07:27cheetahpixie: I've seen worse puns in my time
07:27cheetahpixie: and you know, thinking about it
07:27cheetahpixie: if asahi's driver for the m* gpus were to implement metal, it'd really be full circle
07:27cheetahpixie: since that driver is written in rust and all
13:20fdobridge: <jekstrand> Why am I getting `ILLEGAL_INSTRUCTION_ENCODING`?!? I have exactly one instruction and it's bit-identical to the one codegen generates. 🤦
13:34fdobridge: <jekstrand> Because I forgot to set `nvk_shader::stage` and it was assuming it was a VS and uploading a header.... 🤦
13:35fdobridge: <jekstrand> Now I'm getting a fault on my LD which I'm 100% sure is because my dependency stuff is non-existent.
13:55fdobridge: <karolherbst🐧🦀> yeah... so you need to get the sched stuff right
14:25fdobridge: <jekstrand> Yeah
14:25fdobridge: <jekstrand> If you want to stick something on your list of things to try and get from nvidia: the real table of instruction latencies would be amazing.
15:06fdobridge: <karolherbst🐧🦀> I think I have that info 🙃
15:09fdobridge: <karolherbst🐧🦀> but in a weird way
15:10fdobridge: <karolherbst🐧🦀> at least if certain instructions have a minimum wait latency, that's what is explicitly stated, the other things depend on the unit they are executed on
15:10fdobridge: <karolherbst🐧🦀> but I also know what needs a read/write barrier
16:50HdkR: Just use the worst case scheduling option. flush + stall for every instruction, including ALU ops :P
17:01fdobridge: <nanokatze> for alu ops on new nv you need to explicitly specify stall cycles which seems to be the culprit here (edited)
17:03fdobridge: <nanokatze> oops, nevermind, it's ld...
17:05fdobridge: <karolherbst🐧🦀> you can always specify the worst case thing
17:05fdobridge: <karolherbst🐧🦀> there is a default "wait on everything and stall the max amount of time" thing
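The "wait on everything and stall the max amount of time" fallback described above can be sketched as follows. This is only an illustration: `SchedInfo`, `MAX_STALL`, and `ALL_BARRIERS` are hypothetical names, and the 4-bit stall field and six barrier slots are assumptions, not values confirmed in this log.

```rust
/// Hypothetical per-instruction scheduling info (not a real NVK/codegen type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct SchedInfo {
    /// Cycles to stall before the next instruction issues.
    stall: u8,
    /// Bit b set means "wait for barrier b before issuing".
    wait_mask: u8,
}

/// Assumption: the stall field is 4 bits wide, so 15 is the maximum.
const MAX_STALL: u8 = 15;
/// Assumption: the hardware exposes six barrier slots.
const ALL_BARRIERS: u8 = 0b11_1111;

/// The most conservative setting: wait on every barrier, stall as long as possible.
fn worst_case() -> SchedInfo {
    SchedInfo { stall: MAX_STALL, wait_mask: ALL_BARRIERS }
}

/// Give every instruction in a program of `num_instrs` instructions the worst case.
fn schedule_worst_case(num_instrs: usize) -> Vec<SchedInfo> {
    vec![worst_case(); num_instrs]
}
```

Something like this is slow but useful as a bring-up baseline: if a fault disappears with it, the scheduling information is the likely culprit.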
20:25fdobridge: <jekstrand> This hardware is annoyingly rust-unfriendly. 👿
20:25fdobridge: <jekstrand> If I'm reading the codegen code right, the delay on each instruction is to delay the next instruction.
20:26fdobridge: <jekstrand> Which means I either need to walk backwards or I need lookback in my loop.
20:26fdobridge: <jekstrand> *grumble*
20:31anarsoul: who said linked list? :)
20:31anarsoul: jekstrand: welcome to rust world!
20:44fdobridge: <karolherbst🐧🦀> or you loop twice
20:46fdobridge: <jekstrand> I'm just walking backwards. That should work fine.
20:46fdobridge: <jekstrand> It's just not the direction I expected. 🤷
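A minimal sketch of the backwards walk discussed above, assuming fixed latencies only. `DelayInstr` and `assign_delays` are hypothetical names, not the real codegen or NVK IR, and write-after-write hazards are ignored. The delay stored on an instruction is the number of cycles to wait before the next one issues, so walking from the end lets each producer see how far away its first consumer already is.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified instruction (register numbers are plain u8s).
struct DelayInstr {
    dst: Option<u8>,
    srcs: Vec<u8>,
    /// Assumed fixed number of cycles before `dst` may be read.
    latency: u32,
}

/// Returns, for each instruction, the delay to apply before issuing the next one.
fn assign_delays(instrs: &[DelayInstr]) -> Vec<u32> {
    let mut delays = vec![1; instrs.len()];
    // For each register: cycles between the issue of the instruction that
    // follows the current one and the earliest later read of that register.
    let mut until_use: HashMap<u8, u32> = HashMap::new();

    for (i, instr) in instrs.iter().enumerate().rev() {
        // Every instruction takes at least one issue cycle.
        let mut delay: u32 = 1;
        if let Some(dst) = instr.dst {
            if let Some(&gap) = until_use.get(&dst) {
                // Stall only for the part of the latency not already covered.
                delay = delay.max(instr.latency.saturating_sub(gap));
            }
        }
        delays[i] = delay;

        // Re-base all distances so they are measured from this instruction.
        for gap in until_use.values_mut() {
            *gap += delay;
        }
        // Earlier writers of `dst` are not read by the consumers seen so far.
        if let Some(dst) = instr.dst {
            until_use.remove(&dst);
        }
        // This instruction reads its sources at distance zero.
        for &src in &instr.srcs {
            until_use.insert(src, 0);
        }
    }
    delays
}
```

For two independent producers with latency 4 followed by one consumer of both, only the second producer needs a long delay, because the first producer's latency is already covered by the time spent on the second.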
21:10fdobridge: <karolherbst🐧🦀> well.. that doesn't work if you have to use barriers
21:11fdobridge: <karolherbst🐧🦀> because that's decided by the source instruction 🙃
21:11fdobridge: <karolherbst🐧🦀> so some instructions (like f2i) have a variable runtime and consumers have to wait on the instruction to finish via a barrier
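The barrier case can be sketched with a forward walk, since the producing instruction decides which barrier it signals and later consumers only need to know which barriers to wait on. This is a rough illustration: `BarInstr`, `BarSched`, and `assign_barriers` are hypothetical names, and the count of six barrier slots is an assumption.

```rust
/// Assumption: six barrier slots are available.
const NUM_BARRIERS: usize = 6;

/// Hypothetical, simplified instruction.
struct BarInstr {
    dst: Option<u8>,
    srcs: Vec<u8>,
    /// True for instructions whose runtime is not fixed (f2i, loads, ...).
    variable_latency: bool,
}

/// Hypothetical barrier annotations for one instruction.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct BarSched {
    /// Barrier this instruction signals when its result is written.
    write_barrier: Option<u8>,
    /// Bit b set means "wait for barrier b before issuing".
    wait_mask: u8,
}

fn assign_barriers(instrs: &[BarInstr]) -> Vec<BarSched> {
    // pending[b] is the register whose in-flight write barrier b tracks.
    let mut pending: [Option<u8>; NUM_BARRIERS] = [None; NUM_BARRIERS];
    let mut out = Vec::with_capacity(instrs.len());

    for instr in instrs {
        let mut sched = BarSched::default();

        // Wait on any barrier guarding a register we read or overwrite.
        for b in 0..NUM_BARRIERS {
            if let Some(reg) = pending[b] {
                if instr.srcs.contains(&reg) || instr.dst == Some(reg) {
                    sched.wait_mask |= 1 << b;
                    pending[b] = None;
                }
            }
        }

        // A variable-latency producer claims a barrier for its result.
        if instr.variable_latency {
            if let Some(dst) = instr.dst {
                let slot = match pending.iter().position(|p| p.is_none()) {
                    Some(free) => free,
                    None => {
                        // Out of barriers: wait on slot 0 here and reuse it.
                        sched.wait_mask |= 1;
                        0
                    }
                };
                pending[slot] = Some(dst);
                sched.write_barrier = Some(slot as u8);
            }
        }
        out.push(sched);
    }
    out
}
```

Fixed-latency delays and barrier waits would then be combined per instruction, which is why a single backwards pass is not enough on its own.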
23:28fdobridge: <jekstrand> Great success! I have a compute shader working which means constant buffers and global load/store 😄
23:29fdobridge: <jekstrand> I don't totally suck at compilers after all. 🙂
23:39mhenning: jekstrand: cool! sounds like progress
23:39mhenning: karolherbst: did you ever write a patch for fixing the latency of OP_BAR on ampere? If not, I might poke at it this weekend.
23:40fdobridge: <marysaka> does anyone know what PRI stands for?
23:42fdobridge: <airlied> in what context?
23:43fdobridge: <marysaka> like the prefix of some of open-gpu-doc's manuals
23:43fdobridge: <marysaka> https://github.com/NVIDIA/open-gpu-doc/blob/master/manuals/volta/gv100/pri_mme.ref.txt
23:47fdobridge: <marysaka> I got lost in a rabbit hole while analysing some MME macro I dumped ^^'
23:49fdobridge: <marysaka> basically there is some macro that talks to method "FALCON04", which seems to write values at a given address/offset
23:51fdobridge: <marysaka> searching the nvgpu kernel driver for one of the values used in a call returned a match named "gr_pri_gpcs_setup_debug_r"
23:55fdobridge: <marysaka> that value is also in a whitelist range for the "NVGPU_DBG_GPU_IOCTL_REG_OPS" ioctl