11:31RSpliet: karolherbst: Congrats on getting the NIR stuff pushed, great stuff.
14:21mupuf: RSpliet: +1
15:16karolherbst: RSpliet, mupuf: thanks! Sadly, 64-bit values are a little broken and you can run into some weird RA issues. There is a patch for that; it just needs some review, and I didn't want to push something that might affect the TGSI path without reviews
15:16mupuf: makes sense
15:17karolherbst: and "a little broken" means like 10 piglit tests regress or something
15:19karolherbst: mupuf: ohh, btw, do you think you might be able to tell which games/applications use doubles/uint64_t inside their shaders? I would then dump the shaders myself and add them to the shader-db we've got
15:20mupuf: karolherbst: /me thought no games were using it...
15:20karolherbst: yeah dunno
15:20mupuf: but remind me tomorrow, I'll see if I still have access to shaderdb
15:20karolherbst: :) cool
15:20karolherbst: random applications would be nice as well
15:20karolherbst: I could imagine that Blender or similar might use it
15:21karolherbst: I want to check what the benefits are of doing all the lowering inside nir
15:21karolherbst: instead of codegen
15:21karolherbst: especially the int64 stuff
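For readers unfamiliar with what "lowering the int64 stuff" means: on hardware without native 64-bit integer ALU ops, a compiler pass rewrites each 64-bit operation into a sequence of 32-bit ones. The sketch below shows the idea for a 64-bit add, split into low/high halves with an explicit carry. It is a generic illustration only, assuming unsigned wrap-around semantics; it is not the actual NIR or nouveau codegen implementation, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split64(x):
    """Split an unsigned 64-bit value into its (lo, hi) 32-bit halves."""
    return x & MASK32, (x >> 32) & MASK32


def iadd64_lowered(a, b):
    """Hypothetical lowering of a 64-bit add into 32-bit operations only."""
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    # 32-bit add of the low halves, wrapping like a 32-bit register would.
    lo = (a_lo + b_lo) & MASK32
    # Unsigned overflow happened iff the wrapped result is smaller than an input.
    carry = 1 if lo < a_lo else 0
    # 32-bit add of the high halves plus the carry out of the low half.
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | lo
```

For example, `iadd64_lowered(0xFFFFFFFF, 1)` carries out of the low half and yields `0x100000000`. Doing this in NIR means every backend gets the lowering for free, while doing it in codegen lets the backend use hardware carry flags directly; that trade-off is presumably what the comparison above would measure.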
17:27abadilebo: Still WIP, but simulation exposed most of the GPU's strategy; I simulated some of the relevant testbenches. Conceptually I would be ready to try to code it; I am reading and rereading all the time.
17:30abadilebo: moving around helps, like sports, when my brain freezes, because the hw paradigm is so complex to read that this is likely to happen again and again.
17:37abadilebo: https://github.com/VerticalResearchGroup/miaow/blob/master/src/verilog/rtl/fetch/round_robin.v is one of the key files, and I left it for last. I found it highly difficult to read, maniacally hard, but I might have finally unlocked the idea behind it
17:38abadilebo: this small file is the heart of the f_decode_wfid arbiter; I could write several pages explaining how it works
17:51abadilebo: the idea is: it gets feedback from the dispatcher when an instruction has been fully loaded from memory and is available in the instruction cache. The internal code structure then feeds instructions through the decoder into the issue queues in such a way that f_decode_wfid is like a nested-loop entry: the outer loop does 40 iterations but decode_wfid advances by only one, and then it wraps around
17:52abadilebo: in other words, fetch arbitration on GCN is greedy-then-oldest, and PC arbitration is round-robin
17:58abadilebo: actually, even though the whitepaper says this is the case for AMD cards, a lot of people have asked which policy is the default; in the forums it is the most frequently asked question
18:00abadilebo: it is done this way because that method is friendly to cache coalescing, and fetches should always be made with a similar strategy, even though MIAOW is not fully compliant on multi-word fetches
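The round-robin part of the arbitration described above can be sketched as a generic arbiter that, on each call, grants the first requesting slot after the previous winner and wraps around at the end. This is a minimal sketch under stated assumptions: it is not a translation of MIAOW's round_robin.v, the 40-slot default is taken from the "40 iterations" mentioned above, the class and method names are hypothetical, and the greedy-then-oldest fetch policy is not modeled at all.

```python
class RoundRobinArbiter:
    """Hypothetical round-robin arbiter over a fixed number of wavefront slots."""

    def __init__(self, num_slots=40):
        self.num_slots = num_slots
        # Start "just before" slot 0 so the first search begins at slot 0.
        self.last = num_slots - 1

    def grant(self, requests):
        """requests: one boolean per slot. Returns the granted slot or None."""
        requests = list(requests)
        if len(requests) != self.num_slots:
            raise ValueError("need exactly one request flag per slot")
        # Search starting at the slot after the previous winner, wrapping around,
        # so every requesting slot is served before any slot is served twice.
        for offset in range(1, self.num_slots + 1):
            idx = (self.last + offset) % self.num_slots
            if requests[idx]:
                self.last = idx
                return idx
        return None
```

With four slots and slots 0, 2 and 3 requesting continuously, successive grants go 0, 2, 3, 0: the pointer rotates rather than always favoring the lowest index, which is what keeps one wavefront from starving the others.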
18:04abadilebo: I think it is inappropriate to go into full, methodical hw details from my braindump, because it looks like a whole book's worth of information, yeah, 500 pages on how modern hw works
18:17abadilebo: and if such a book were written, there would be no unimportant detail among those pages. Still, I'll describe one more important general detail to begin with: the hw tries to refetch from the fetch queues whenever the valid bits remain 1, which is the condition where, after some amount of time, the instruction has still not been issued, for example when the instruction has a dependency
18:25abadilebo: I am about to go crazy. I've studied something that is way over my head, and I cannot easily put it into practice, even though I studied some parts reasonably well; but I'm 35 years old, and the clock is ticking
18:28abadilebo: the clock is ticking so ruthlessly that I may not have another year to spend polishing this stuff
18:33abadilebo: If I cannot somehow miraculously negotiate something now, or at least during the next month, my life is ruined all round and all the time I spent was wasted; however, I still cannot allow myself to waste any more time. Bye.