00:08gfxstrand[d]: mhenning[d]: Fedora has packages, too. Look for instructions on the RPMFusion website.
00:09gfxstrand[d]: x512[m]: It's fine. The command processor just gobbles up data and processes it. It's only a problem if the kernel inserts something between the command we started and the data we're trying to submit with it.
00:10gfxstrand[d]: Also, this isn't really used much pre-Turing.
00:12gfxstrand[d]: Turing+, rather. We use it for indirect draws pre-Turing because the MME lacks the ability to read arbitrary data and all data has to be pushed.
00:35mhenning[d]: I started hacking at some hopper/blackwell instruction encoding stuff here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34334
00:35mhenning[d]: (just using nvdisasm, not on hardware)
00:37snowycoder[d]: pavlo_kozlenko[d]: Not yet, there are still a lot of missing features, but that is the plan in a few weeks (I hope).
00:37snowycoder[d]: Right now it can run a lot of compute or an untextured triangle and that's it
00:41gfxstrand[d]: Textures shouldn't be hard. They're just a bit of typing.
00:42gfxstrand[d]: Then again, most of Kepler bringup is "just a bit of typing".
00:42gfxstrand[d]: You're doing great, BTW.
00:44mhenning[d]: "just a bit of typing" feels a bit reductive to me
00:44mhenning[d]: like this: https://xkcd.com/722/
00:46redsheep[d]: If you already know what to type then the entire driver would be "just a bit of typing" lol
00:47redsheep[d]: With some clairvoyance fixing my issues with the nouveau kmd would be just a bit of typing
00:48redsheep[d]: Clearly I don't possess that power
00:53redsheep[d]: Maybe I just need to read all of the display code again
01:04redsheep[d]: Given that last time I couldn't actually find where some parts were implemented I'm worried the actual issue is in nouveau specific code or an issue with the monitors I'm working with having bad edids. But if that were the case why isn't amdgpu broken?
01:04redsheep[d]: isn't in*
01:16airlied[d]: oh uldc is now ldcu and is in a different spot
01:52airlied[d]: oh I've moved to misaligned address
03:08airlied[d]: https://gitlab.freedesktop.org/airlied/mesa/-/commits/nvk-wip-gb20x is where I'm playing around, but getting misaligned address, and lots of fail
03:27airlied[d]: looks like r2ur moved
03:31gfxstrand[d]: snowycoder[d]: I started typing something for you: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34336
03:32gfxstrand[d]: mhenning[d]: It is
03:32gfxstrand[d]: Sorry
03:32gfxstrand[d]: I didn't mean to be dismissive
03:32airlied[d]: hmm I wonder if qmd has changed
03:32gfxstrand[d]: Almost certainly
03:33gfxstrand[d]: I would be shocked if it hasn't
03:35airlied[d]: dEQP-VK.compute.pipeline.basic.empty_shader seems like the only compute shader that passes
03:35orowith2os[d]: How noble of an effort would it be to write a nova equivalent for pre-Turing (Kepler specifically) :akipeek:
03:35orowith2os[d]: I'm guessing it would be Not Too Bad, since there's already nouveau for that
03:36airlied[d]: not sure why you would pre-gsp, fix nouveau is probably easier
03:37orowith2os[d]: Funsies? You saw me on r4l's zulip, right?
03:37orowith2os[d]: A while back
03:37orowith2os[d]: First reason would be because I want to. Next up is to see if I can clean it up at all, compared to normal nouveau, and maybe redo things in a way that makes a bit more sense
03:59airlied[d]: okay got 55 dEQP-VK.draw.* tests to pass in a row
04:06redsheep[d]: orowith2os[d]: Having nouveau as reference surely helps but that sounds like a pretty crazy undertaking. "Just" fixing nouveau would be awesome but I assume you're talking about r4l cuz you don't want to work on C
04:06orowith2os[d]: Correct
04:06orowith2os[d]: I'd be more confident with myself if I were writing Rust
04:07orowith2os[d]: I could probably pull it off with C, but I wouldn't be happy about it
04:07orowith2os[d]: Maybe if the kernel were more friendly to some C++ features, I'd like it more
04:09redsheep[d]: I wonder if it would be remotely possible to convert to rust in parts
04:10orowith2os[d]: Probably
04:10orowith2os[d]: I'd just have to expose some symbols for it to link against from C
04:11redsheep[d]: Then just rewrite the display code
04:11orowith2os[d]: I definitely wouldn't rewrite anything facing userspace
04:11airlied[d]: no converting to rust in part would be hard, too many structs
04:11orowith2os[d]: orowith2os[d]: So yeah, probably more just interfacing with the hardware directly, like reclocking the GPU or whatever
04:23gfxstrand[d]: airlied[d]: Yeah. People have tried that. It doesn't go well.
04:41orowith2os[d]: Any links to past effort?
05:30airlied[d]: there might be some stuff around panthor conversion to rust on dri-devel
07:42snowycoder[d]: gfxstrand[d]: That's a lot of typing hahaha.
07:42snowycoder[d]: Thanks, it also makes a lot of IR more readable
07:46marysaka[d]: For anyone using envyhooks, I pushed some changes that should make Blackwell work for dumping (cc airlied[d] mohamexiety[d] gfxstrand[d])
07:48marysaka[d]: It should also be easier to update on newer drivers as the wrapper.h file is now autogenerated + channel types are now detected in a generic way instead of hardcoding each gens
08:30matt_schwartz[d]: I did put the latest gb20x kernel on my Blackwell rig as well in case you need it
12:02gfxstrand[d]: snowycoder[d]: Yeah, the fact that we weren't printing the masks anywhere was pretty rubbish.
12:03gfxstrand[d]: Don't take the top patch, obviously, but I think the others mostly work. (I left them CTSing last night.)
14:30snowycoder[d]: gfxstrand[d]: The first patch could work if we restrict the target to sm32 though, right?
14:34gfxstrand[d]: That top patch should work everywhere, it's just unnecessary.
14:34gfxstrand[d]: I wrote it just to test `sust.b` to make sure I encoded it correctly.
14:37gfxstrand[d]: gfxstrand[d]: Also, there's some issue with the crazy 10 and 11-bit float format which I haven't looked into. Something with rounding or denorm flushing, I suspect.
14:50gfxstrand[d]: Yeah, looks like all the issues are 10 and 11-bit stores
15:15gfxstrand[d]: But yeah, if you drop the `HACK:` patch and implement `sust.p` and `suld.b`, that should get you image support on Kepler.
15:49babblebones[d]: airlied[d]: Any of the GSP work in place to properly flag priority on the workloads?
15:49babblebones[d]: If not hopefully soon :BongoTap:
15:52gfxstrand[d]: No, we don't support priorities yet.
16:08snowycoder[d]: gfxstrand[d]: Thanks, I'll try after work.
16:08snowycoder[d]: p.s. do you have any CTS paths to test basic images and basic textures? All I can find are some complex uses that also test filtering and mipmaps
16:08gfxstrand[d]: For basic images, search for `with_format`.
16:09gfxstrand[d]: For basic textures, maybe search for "texelfetch"?
16:10snowycoder[d]: gfxstrand[d]: ok, perfect!
16:12gfxstrand[d]: Also, IDK if you've found this but if you run `deqp-vk --deqp-run-mode=txt-caselist`, it'll dump all the test names to a file.
16:13snowycoder[d]: Oh nice, I've been grepping the caselist folders until now
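The caselist search suggested above can be sketched in a few lines of Python. This assumes the file written by `deqp-vk --deqp-run-mode=txt-caselist` has one entry per line, prefixed with `TEST: ` or `GROUP: `; that layout is an assumption here, so check it against the file you actually get. `filter_cases` is a hypothetical helper name, not part of the CTS.

```python
import re
import sys
from typing import Iterable, Iterator


def filter_cases(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield test names matching `pattern`, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        line = line.strip()
        # Assumed layout: "GROUP: ..." lines name test groups, not tests.
        if line.startswith("GROUP:"):
            continue
        # Assumed layout: "TEST: dEQP-VK.a.b"; bare names are accepted too.
        if line.startswith("TEST:"):
            line = line[len("TEST:"):].strip()
        if line and rx.search(line):
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python filter_cases.py <caselist file> <pattern>
    with open(sys.argv[1], encoding="utf-8") as f:
        for name in filter_cases(f, sys.argv[2]):
            print(name)
```

Running it once with `with_format` and once with `texelfetch` should give two short lists that can be fed back to deqp-vk as a caselist.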
16:18mhenning[d]: You could also try the sascha willems "texture" demo
16:22snowycoder[d]: Thanks, I'm also building a little testbed project to test things surgically (mostly to debug IPAs)
19:40mohamexiety[d]: https://docs.nvidia.com/cuda/pdf/CUDA_Binary_Utilities.pdf table 8 here has the blackwell instruction set
19:41mohamexiety[d]: they grouped both sm_100 and 120 though :/
19:41mohamexiety[d]: (it's just the names, no encoding or any fancy info but thought it may be helpful)
19:42mohamexiety[d]: trying to play with Kuter's tool but seeing some weird stuff with sm_120. for starters, the SASS file I dumped through cuobjdump was _2.4GB_. if I dump sm_89 instead it's 383MB
19:45airlied[d]: it's bigger due to the delay stuff I think
19:52airlied[d]: unfortunately doesn't have any tex instructions in it
19:52mohamexiety[d]: the pdf? yeah it feels like it's mostly sm_100
19:57gfxstrand[d]: mohamexiety[d]: Yeah, I had limited success with it on SM120. I haven't run SM100 yet.
19:58mohamexiety[d]: let me try dumping sm_100 actually
19:58gfxstrand[d]: And I don't know how to produce the HTML from the cache file.
20:00mohamexiety[d]: yeah I didnt get that far
20:00mohamexiety[d]: don't think he actually published a way to produce the HTML, but not sure. there is a table creating .py file :thonk:
20:05mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357083434081714508/image.png?ex=67eee9f0&is=67ed9870&hm=60e682daa6a97d381254307c547262f93fb007c10326792a6b1823189b9ca958&
20:05mohamexiety[d]: so sm_100 is still pretty fat, but quite a lot less than sm_120
20:08mohamexiety[d]: gfxstrand[d]: `nv-isa-solver-scan` seems to run fine with sm_100 SASS, which is kinda interesting
20:08mohamexiety[d]: with sm_120 it ran for a few instructions but then there was _a lot_ of "couldn't parse [...]" spam
20:09gfxstrand[d]: Yeah, that's what I saw, too.
20:09gfxstrand[d]: I got plenty out of cublas but then it failed to parse most of it
20:09mohamexiety[d]: yeah
20:09mohamexiety[d]: sm_100 seems to run fine though so I wonder if we can use that temporarily :thonk:
20:10gfxstrand[d]: But I don't know how important that step is. I suspect we just won't get everything but we'll still get some stuff.
20:10mohamexiety[d]: but also I am really curious what even is so different with sm_120 that it's almost twice the size, and it can't be parsed
20:11gfxstrand[d]: mohamexiety[d]: I don't know. I suspect there's differences but I don't know how significant. They skipped two whole CUDA versions, though, so I wouldn't bet on it being super close.
20:11gfxstrand[d]: Okay, back to βοΈ mode.
20:11mohamexiety[d]: yeah that's the other weird bit
20:11mohamexiety[d]: gfxstrand[d]: oop, sorry. good luck!
20:20redsheep[d]: mohamexiety[d]: This would be a question for kuter7639
20:20mohamexiety[d]: yeah been meaning to ping but wanted to gather more info first
20:30mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357089924725346364/gb200_cache.txt?ex=67eeeffb&is=67ed9e7b&hm=1887957312b30e7122de0911ce78ffd40c6b2b9b4dd32c20e6591d14e66c9bd8&
20:30mohamexiety[d]: sm_100 cache if anyone is interested
20:31mohamexiety[d]: I guess I could run sm_120 even with the errors
20:33mohamexiety[d]: the main discrepancy that sticks out can be seen here:
20:33mohamexiety[d]: ~/dev/nv-isa-solver/nv_isa_solver$ nv-isa-solver-scan --arch SM120 --cache_file 5090_cache.txt libcublasLt_sm120.sass
20:33mohamexiety[d]: Cache could not be loaded
20:33mohamexiety[d]: Distilling LDC R1, c[0x0][0x37c] &wr=0x0 ?trans8
20:33mohamexiety[d]: Couldn't parse b'\x0cx\x00\x04\x01\x00\x00\x00pb\xf0\x03\x00\xda/\x00'
20:33mohamexiety[d]: Couldn't parse b'M\x89\x00\x00\x00\x00\x00\x00\x00\x00\x80\x03\x00\xea\x1f\x00'
20:33mohamexiety[d]: Couldn't parse b'\x0cx\x00\x04\x10\x00\x00\x00p`\xf2\x03\x00\xe2\x0f\x04'
20:33mohamexiety[d]: Couldn't parse b'\x12x\x06\x04\x0f\x00\x00\x00\xff\xc0\x8e\x07\x00\xe2\x0f\x00'
20:33mohamexiety[d]: Distilling LDCU.64 UR6, c[0x0][0x358] &wr=0x0 ?trans1
20:33mohamexiety[d]: `sm_100` did not have the `&wr=0x0 ?trans8` stuff, and also all the things it couldnt parse. so I guess the SASS for 120 has more info, but not sure what type of info it is
20:35mohamexiety[d]: `wr` could be write/read and `trans` could be transcendental at a guess, but these are pure uneducated guesses. (and also don't make sense? why would LDC have mentions of transcendental stuff)
20:37mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357091694348079335/5090_cache.txt?ex=67eef1a1&is=67eda021&hm=e4e73dd76f5488129a033097c7b5d60661b981447cd156647f4086e69b4c4ad0&
20:37mohamexiety[d]: for sm_120 it ends up only getting info for 10 instructions. the rest it completely fails on
20:38redsheep[d]: mohamexiety[d]: Oh, sorry
20:38mohamexiety[d]: it's finee
20:58mohamexiety[d]: hm I think I may have figured what's up. can't send things here without massively spamming the irc side of the bridge but I suspect the tool can't cope with the extra stuff, which is why it's failing
20:58mohamexiety[d]: if we look at a single instruction, this is what it looks like in the sm_120 sass:
20:58mohamexiety[d]: /*0020*/ ISETP.GE.AND P0, PT, R4, 0x1, PT &req={1} ?WAIT13_END_GROUP; /* 0x000000010400780c */
20:59mohamexiety[d]: this is what it looks like in the sm_100 sass:
20:59mohamexiety[d]: /*0020*/ ISETP.GE.AND P0, PT, R6, 0x1, PT ; /* 0x000000010600780c */
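The `Couldn't parse b'...'` lines in the earlier paste are Python `bytes` reprs of 16-byte instructions. A minimal sketch of unpacking one, assuming each 128-bit instruction is stored as two little-endian 64-bit words. The assumption looks plausible because the low word of the first unparsed instruction comes out as 0x000000010400780c, the same value shown in the comment on the sm_120 ISETP line. `split_words` is a hypothetical helper name.

```python
import struct
from typing import Tuple


def split_words(raw: bytes) -> Tuple[int, int]:
    """Split a 16-byte instruction into (low, high) 64-bit words."""
    if len(raw) != 16:
        raise ValueError("expected a 16-byte instruction")
    # Assumption: two little-endian unsigned 64-bit words, low word first.
    low, high = struct.unpack("<QQ", raw)
    return low, high


# First "Couldn't parse" payload from the log, copied verbatim.
raw = b'\x0cx\x00\x04\x01\x00\x00\x00pb\xf0\x03\x00\xda/\x00'
low, high = split_words(raw)
print(f"low  = {low:#018x}")
print(f"high = {high:#018x}")
```

That match suggests, without proving it, that the tool is choking on ordinary instructions such as this ISETP and not on some new opcode.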
21:00airlied[d]: that's what I meant by including the delay stuff
21:00airlied[d]: sm120 seems to actually decode delay stuff, whereas older sm doesn't decode it
21:00airlied[d]: it's still there, just not decoded
21:00mohamexiety[d]: ah
21:01mohamexiety[d]: yeah from my understanding, volta+ do have this stuff in the instruction. so I guess for 120+ they decided to decode it as well
21:01karolherbst[d]: finally the fact that "yield" isn't a flag is public knowledge
21:01mohamexiety[d]: so theoretically if we modify the parser to understand the extra stuff, things should work :thonk:
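One way to try that idea without touching the parser is to strip the extra tokens before the text reaches it. A minimal sketch, assuming the new annotations are always whitespace-separated tokens starting with `&` or `?` (as in `&req={1}`, `&wr=0x0`, `?WAIT13_END_GROUP`, `?trans8`) and that no ordinary operand starts with those characters. That has only been checked against the lines quoted in this log. `strip_sched_annotations` is a hypothetical name, not part of nv-isa-solver.

```python
import re

# A run of whitespace, then '&' or '?', then everything up to the next
# whitespace or ';'. Assumption: only the new annotations look like this.
_ANNOTATION = re.compile(r"\s+[&?][^\s;]+")


def strip_sched_annotations(line: str) -> str:
    """Remove '&...' and '?...' tokens from one line of SASS text."""
    return _ANNOTATION.sub("", line)


sm120 = ("/*0020*/ ISETP.GE.AND P0, PT, R4, 0x1, PT "
         "&req={1} ?WAIT13_END_GROUP; /* 0x000000010400780c */")
print(strip_sched_annotations(sm120))
```

The result still differs from the sm_100 text in whitespace before the `;`, so a parser that is sensitive to spacing may need one more normalization step.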
21:04snowycoder[d]: mohamexiety[d]: I think that's what `nv_isa_solver/instruction_solver.py` does, I got it to generate some html files
21:05mhenning[d]: There are also some nvdisasm flags that remove some extra information eg. "--print-raw"
21:06mohamexiety[d]: snowycoder[d]: ooh nice then. so just need to fix up the parser and then we should be able to be good to go
21:06mohamexiety[d]: sorry for being a bit slow with this -- my only experience with compiler stuff is reading the Citadel "Dissecting the <arch>" papers :nervous:
21:07mohamexiety[d]: mhenning[d]: alternatively could try this if it'd make the format match the "old" decoded format
21:08snowycoder[d]: I know this only because I tried to execute for kepler (and failed, kepler encoding is strange).
21:08snowycoder[d]: p.s. if it can help I added some utils to nv-shader-tools that could help, I used a "bit-flip" mode that tries to flip each bit in a range and checks what changes (instead of trying all possible values)
21:09snowycoder[d]: It just needs a bit of refactoring
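The "bit-flip" idea can be sketched as follows: flip one bit at a time in a range and record which flips change the disassembly, which costs one disassembler call per bit instead of one per possible field value. This is not the nv-shader-tools code. `bit_flip_scan` and `toy_disasm` are made-up names, and the toy decoder's field layout (a register number in bits 24..31) is invented for illustration, loosely based on the two ISETP encodings in the log differing only in that byte.

```python
from typing import Callable, Dict


def bit_flip_scan(base: int, lo: int, hi: int,
                  disasm: Callable[[int], str]) -> Dict[int, str]:
    """Flip each bit in [lo, hi] once; map bit -> changed disassembly."""
    reference = disasm(base)
    changed: Dict[int, str] = {}
    for bit in range(lo, hi + 1):
        text = disasm(base ^ (1 << bit))
        if text != reference:
            changed[bit] = text
    return changed


def toy_disasm(word: int) -> str:
    # Stand-in for a real disassembler call (e.g. a wrapper around nvdisasm).
    # Invented layout: bits 24..31 hold a source register number.
    return f"ISETP.GE.AND P0, PT, R{(word >> 24) & 0xff}, 0x1, PT"


hits = bit_flip_scan(0x000000010400780c, 16, 39, toy_disasm)
for bit, text in sorted(hits.items()):
    print(bit, text)
```

A real tool would replace `toy_disasm` with something that writes the word to a file and runs the disassembler on it. Single-bit flips find field boundaries cheaply but can miss encodings that are only valid when several bits change together.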
22:27redsheep[d]: mohamexiety[d]: If you save to a file and attach it to your message then discord users can see the text without spam
22:59gfxstrand[d]: snowycoder[d]: That would be a nice addition to nvfuzz.
23:00gfxstrand[d]: Maybe as a flag that puts it into bit-flip mode but still takes a range.
23:01gfxstrand[d]: We should also fix it so it takes --sm instead of the positional arg.
23:55mhenning[d]: I think I have blackwell/hopper atomic encodings fixed on my branch
23:56airlied[d]: oh nice I had started on fixing atom.g yesterday
23:56airlied[d]: but this morning has all been meetinging out
23:58airlied[d]: I've got some kernel instruction encoding errors, but nvdisasm seems happy with the shaders, so have to dig around a bit more
23:59mhenning[d]: Ah, I'm just testing with nvdisasm so far