07:27 cheetahpixie: I've seen worse puns in my time
07:27 cheetahpixie: and you know, thinking about it
07:27 cheetahpixie: if asahi's driver for the m* gpus were to implement metal, it'd really be full circle
07:27 cheetahpixie: since that driver is written in rust and all
13:20 fdobridge: <j​ekstrand> Why am I getting `ILLEGAL_INSTRUCTION_ENCODING`?!? I have exactly one instruction and it's bit-identical to the one codegen generates. 🤦
13:34 fdobridge: <j​ekstrand> Because I forgot to set `nvk_shader::stage` and it was assuming it was a VS and uploading a header.... 🤦
13:35 fdobridge: <j​ekstrand> Now I'm getting a fault on my LD which I'm 100% sure is because my dependency stuff is non-existent.
13:55 fdobridge: <k​arolherbst🐧🦀> yeah... so you need to get the sched stuff right
14:25 fdobridge: <j​ekstrand> Yeah
14:25 fdobridge: <j​ekstrand> If you want to stick something on your list of things to try and get from nvidia: the real table of instruction latencies would be amazing.
15:06 fdobridge: <k​arolherbst🐧🦀> I think I have that info 🙃
15:09 fdobridge: <k​arolherbst🐧🦀> but in a weird way
15:10 fdobridge: <k​arolherbst🐧🦀> at least if certain instructions have a minimum wait latency, that's what is explicitly stated, the other things depend on the unit they are executed on
15:10 fdobridge: <k​arolherbst🐧🦀> but I also know what needs a read/write barrier
16:50 HdkR: Just use the worst case scheduling option. flush + stall for every instruction, including ALU ops :P
17:01 fdobridge: <n​anokatze> for alu ops on new nv you need to explicitly specify stall cycles which seems to be the culprit here (edited)
17:03 fdobridge: <n​anokatze> oops, nevermind, it's ld...
17:05 fdobridge: <k​arolherbst🐧🦀> you can always specify the worst case thing
17:05 fdobridge: <k​arolherbst🐧🦀> there is a default "wait on everything and stall the max amount of time" thing
20:25 fdobridge: <j​ekstrand> This hardware is annoyingly rust-unfriendly. 👿
20:25 fdobridge: <j​ekstrand> If I'm reading the codegen code right, the delay on each instruction is to delay the next instruction.
20:26 fdobridge: <j​ekstrand> Which means I either need to walk backwards or I need lookback in my loop.
20:26 fdobridge: <j​ekstrand> *grumble*
20:31 anarsoul: who said linked list? :)
20:31 anarsoul: jekstrand: welcome to rust world!
20:44 fdobridge: <k​arolherbst🐧🦀> or you loop twice
20:46 fdobridge: <j​ekstrand> I'm just walking backwards. That should work fine.
20:46 fdobridge: <j​ekstrand> It's just not the direction I expected. 🤷
21:10 fdobridge: <k​arolherbst🐧🦀> well.. that doesn't work if you have to use barriers
21:11 fdobridge: <k​arolherbst🐧🦀> because that's decided by the source instruction 🙃
21:11 fdobridge: <k​arolherbst🐧🦀> so some instructions (like f2i) have a variable runtime and consumers have to wait on the instruction to finish via a barrier
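
The backward walk discussed above can be sketched roughly as follows. This is a minimal illustration only, not NVK's or codegen's actual code: the `Instr` struct, the `assign_delays` function, `MIN_DELAY`, and the latency numbers are all hypothetical. It assumes every instruction has a fixed result latency and that the `delay` field stalls the issue of the next instruction. As karolherbst points out, variable-latency instructions need barriers chosen by the producing instruction, which this sketch does not handle.

```rust
use std::collections::HashMap;

/// Hypothetical instruction model (an assumption for illustration).
#[derive(Debug)]
struct Instr {
    dst: Option<u32>, // register written, if any
    srcs: Vec<u32>,   // registers read
    latency: u32,     // made-up fixed result latency, in cycles
    delay: u32,       // cycles to stall before the *next* instruction issues
}

/// Assumed minimum delay between two consecutive instructions.
const MIN_DELAY: u32 = 1;

/// Walk the block backwards and give each instruction the smallest delay
/// that lets its result be ready before the first later read.
fn assign_delays(instrs: &mut [Instr]) {
    // Per register: cycles between the issue of the instruction that follows
    // the current one and the nearest later read of that register.
    let mut cycles_to_use: HashMap<u32, u32> = HashMap::new();

    for instr in instrs.iter_mut().rev() {
        let mut delay = MIN_DELAY;
        if let Some(dst) = instr.dst {
            if let Some(&dist) = cycles_to_use.get(&dst) {
                // Only stall for the part of the latency that the
                // instructions in between do not already cover.
                delay = delay.max(instr.latency.saturating_sub(dist));
            }
            // Earlier instructions are hidden from later reads by this write.
            cycles_to_use.remove(&dst);
        }
        instr.delay = delay;

        // Every pending read is now `delay` cycles further away.
        for dist in cycles_to_use.values_mut() {
            *dist += delay;
        }
        // This instruction reads its sources at its own issue time.
        for &src in &instr.srcs {
            cycles_to_use.insert(src, 0);
        }
    }
}
```

Walking backwards means the first consumer of each register is already known when its producer is visited, so no lookback buffer is needed. The cost is that anything decided by the producer, such as barrier allocation, would need a separate forward pass.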
23:28 fdobridge: <j​ekstrand> Great success! I have a compute shader working which means constant buffers and global load/store 😄
23:29 fdobridge: <j​ekstrand> I don't totally suck at compilers after all. 🙂
23:39 mhenning: jekstrand: cool! sounds like progress
23:39 mhenning: karolherbst: did you ever write a patch for fixing the latency of OP_BAR on ampere? If not, I might poke at it this weekend.
23:40 fdobridge: <m​arysaka> does anyone know what PRI stands for?
23:42 fdobridge: <a​irlied> in what context?
23:43 fdobridge: <m​arysaka> like the prefix of some open-gpu-doc's manual
23:43 fdobridge: <m​arysaka> https://github.com/NVIDIA/open-gpu-doc/blob/master/manuals/volta/gv100/pri_mme.ref.txt
23:47 fdobridge: <m​arysaka> I got lost in a rabbit hole while analysing some MME macro I dumped ^^'
23:49 fdobridge: <m​arysaka> basically there is some macro that talks with method "FALCON04", which seems to write values at a given address/offset
23:51 fdobridge: <m​arysaka> searching for one of those values, used for a call in the nvgpu kernel driver, returned a match with the name "gr_pri_gpcs_setup_debug_r"
23:55 fdobridge: <m​arysaka> that value is also in a whitelist range for the "NVGPU_DBG_GPU_IOCTL_REG_OPS" ioctl