00:09 airlied[d]: what nvidia driver are people using the nv-shader-tools with? just tried latest 550 and it crashes inside the driver
00:14 gfxstrand[d]: airlied[d]: mhenning[d] Just pushed to Dave's MR with the rebase. I also replaced both delay patches with a new one that can handle any delay that fits in a `u8`
00:19 redsheep[d]: It kind of seems like, much as we were talking about pascal being kind of a maxwell C, ada is a bit like an ampere B
00:27 gfxstrand[d]: Yeah
00:41 mhenning[d]: redsheep[d]: yeah, ada is pretty much just ampere on a new process node
00:42 gfxstrand[d]: Arguably, everything is Kepler, Maxwell, or Turing. 😝
00:43 gfxstrand[d]: Turing is definitely the biggest cut point in the hardware iteration.
00:44 redsheep[d]: I still haven't seen a ton of detail but it doesn't seem like blackwell has deviated enough to not also be something you could call... idk I guess Turing... E?
00:45 gfxstrand[d]: Yeah, Blackwell is more Turing.
00:45 gfxstrand[d]: They haven't done much on the 3D side. They just keep adding MMA ops.
00:46 mhenning[d]: they're also messing with raytracing
00:46 gfxstrand[d]: Yeah
00:46 redsheep[d]: I'm still not clear on whether the Rubin architecture the recent roadmap had will also be the gaming arch, I wonder if that will finally be the point where they do another major change. I expected blackwell to be much more different than it was
00:46 gfxstrand[d]: Some of that is hardware
00:48 gfxstrand[d]: redsheep[d]: I wouldn't be surprised if Blackwell ends up being the new Volta. They've spent a lot of time doing R&D and it's about time for a major shift. But I also don't know what needs shifting or what would justify it. The Turing ISA has aged fairly well.
00:48 gfxstrand[d]: But I'm just speculating.
00:49 redsheep[d]: Rubin has this huge leap in matrix perf and in the ratios of what they are talking about as inference vs training and that makes me think they have something novel planned
00:50 redsheep[d]: Even if it has aged well it has been quite a long time on iterations of turing at this point
00:50 gfxstrand[d]: We'll find out when they drop it
00:51 redsheep[d]: I don't want to see nvidia end up in the intel core scenario where they've forgotten how to do anything meaningfully different. Even if I don't know what the next thing looks like I doubt turing is the ideal way for a gpu to work
00:53 gfxstrand[d]: I don't think they will. Nvidia has a very different engineering culture from Intel.
00:54 mhenning[d]: Fermi to turing was ~8 years. They don't need fundamental changes all that often
00:54 redsheep[d]: Even if the crazy zeus 1c people are wildly overestimating how effective their gpu will be at raytracing I think they're right that there's some much greater raytracing perf to be found in taking a fundamentally different approach
00:57 redsheep[d]: But I also struggle to see how leaning super hard into RT will coexist with the insatiable hunger for matrix perf and maintaining compatibility and comparable performance in traditional graphics workloads
00:58 gfxstrand[d]: <a:shrug_anim:1096500513106841673>
00:58 redsheep[d]: Maybe I am wrong but I doubt anybody who manages to actually achieve 10x a 5090 in some hyper specific RT benchmark will do much else particularly well
01:00 gfxstrand[d]: What I will say is that Nvidia needs to do something. That gaming perf of both Ada and Blackwell has been quite disappointing. They've got lots of interesting new tech but it kinda falls flat when presented with existing titles.
01:01 gfxstrand[d]: It doesn't help their case when Jensen says a 5070 is as powerful as a 4090 and it can't even beat a 4070 in practice.
01:01 redsheep[d]: I think 4090 perf was very impressive, but the rest of the lineup basically just used the shrink to improve margins and shuffle the stack a bit. I fully agree though that blackwell wasn't what I had hoped
01:02 gfxstrand[d]: Yup. Another generation or two of this and AMD will have clear gaming dominance again.
01:03 gfxstrand[d]: Like, the new DLSS stuff is cool but no one's going to turn on temporal frame generation in an actual game. It'd murder your latency.
01:04 redsheep[d]: gfxstrand[d]: I think the real problem isn't as much that blackwell itself is at all bad, it's just marketing. They named them wrong, priced them wrong, and then got on stage and claimed frame gen is performance. And the worst part is they can't make enough of them so none of this hurts them in the short term, only in long term sentiment
01:04 gfxstrand[d]: Maybe generate one frame to bump 50 FPS to 90 but not for a shooter and you don't want to stretch it any further than that.
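A back-of-envelope model of why interpolated frame generation costs latency. It assumes the interpolator must hold back one rendered frame before it can show the generated in-between frame, and it ignores generation time and any latency-reduction features real implementations pair with it. The function names are illustrative only.

```python
# Simplified model (assumption): interpolation must wait for the *next* rendered
# frame before it can show the generated in-between frame, so what you see trails
# rendering by about one base frame time. Generation cost is ignored.


def frame_time_ms(fps: float) -> float:
    """Time between rendered frames, in milliseconds."""
    return 1000.0 / fps


def framegen_estimate(base_fps: float, generated_per_real: int) -> tuple:
    """Return (ideal displayed FPS, added latency in ms) under the model above."""
    displayed_fps = base_fps * (1 + generated_per_real)  # no overhead assumed
    added_latency_ms = frame_time_ms(base_fps)           # one real frame held back
    return displayed_fps, added_latency_ms


if __name__ == "__main__":
    for base, extra in [(50, 1), (30, 3)]:
        fps, lat = framegen_estimate(base, extra)
        print(f"{base} FPS rendered, {extra} generated per real frame: "
              f"~{fps:.0f} FPS shown, about +{lat:.1f} ms latency")
```

Under this model, 50 FPS plus one generated frame shows at most 100 FPS (the ~90 above presumably reflects generation overhead) and adds roughly 20 ms; stretching a 30 FPS base to 4x adds about 33 ms while input is still only sampled 30 times a second.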
01:05 gfxstrand[d]: redsheep[d]: Yup. Pretty much...
01:05 redsheep[d]: gfxstrand[d]: Especially with the rumors around rdna5/udna and how much better fsr4 is, yeah absolutely. They've already taken advantage of the situation well enough that everyone I know who was actively looking for a card bought AMD.
01:06 airlied[d]: I should plug my 5080 into something I suppose
01:07 gfxstrand[d]: redsheep[d]: Another aspect of the marketing is that they probably underpowered the 5070. If they'd come out with that first and given it a little more juice, they could have claimed an actually decent bump. But they bet too hard on AI shit and ended up making a nothingburger.
01:07 gfxstrand[d]: airlied[d]: Wanna submit CTS on it?
01:08 airlied[d]: I think there's a bit of RE I have to do first 🙂
01:09 gfxstrand[d]: Hehe
01:09 redsheep[d]: gfxstrand[d]: The entire lineup needed more juice, yeah. Trouble is they had nowhere to go in terms of power. I feel like they probably expected better efficiency than what they got, which was approximately nothing and sometimes regression.
01:10 redsheep[d]: Also, having 5080 vs 5090 be a full doubling of almost every part just feels like a misstep; everything 80 and below needed like at least 15% more die
01:12 airlied[d]: okay the only outstanding question on my latency branch is why reader is set, just cts'ing it not being set
01:12 redsheep[d]: But we're still stuck at them not being able to make enough so what can you do? As long as nvidia doesn't outright cede gaming to AMD, spare capacity and more aggressive positioning later on could put them right back on top whenever they feel like it
01:16 redsheep[d]: airlied[d]: Is there still kernel work to get it booting?
01:17 airlied[d]: nope I think it boots, at least my gb203 laptop boots off the branch
01:20 gfxstrand[d]: airlied[d]: Yay!
01:21 gfxstrand[d]: airlied[d]: I'm running on T400 again. I didn't make any fixes, though, besides the delay stuff.
01:21 airlied[d]: I've pushed all the fixes for the things mhenning[d] found
01:23 gfxstrand[d]: cool
01:23 gfxstrand[d]: Rough shape looked okay today and I have to trust you on the numbers. I'll read it again in detail tomorrow.
01:24 gfxstrand[d]: Does anyone want to perf test that branch on a game or two?
01:24 redsheep[d]: mohamexiety[d]: airlied[d] I'm just now remembering this older message. I assume you're working on bringing up nvk for blackwell? Have you been able to find out whether graphics and compute just aren't separate subchannels anymore, or what is going on with this?
01:25 redsheep[d]: gfxstrand[d]: Which one? I possibly can later tonight
01:25 skeggsb9778[d]: redsheep[d]: no, subchannels etc work basically the same
01:25 skeggsb9778[d]: there's still a separate compute class
01:25 gfxstrand[d]: redsheep[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33573
01:26 gfxstrand[d]: I wouldn't be surprised if it doesn't give us an FPS or two in a few things.
01:26 skeggsb9778[d]: i believe hw just doesn't wait for idle when switching from using one subchannel to the next anymore
01:26 gfxstrand[d]: I guess I could put my 4060 in and fire up DA:TV
01:26 gfxstrand[d]: skeggsb9778[d]: That would be nice of it.
01:27 gfxstrand[d]: We should already have the necessary WFIs in place so I don't think that'll break anything.
01:28 gfxstrand[d]: airlied[d]: Should we preemptively make nouveau GL not enumerate on Blackwell?
01:28 airlied[d]: gfxstrand[d]: I'm not sure it'll move a lot of needles, I suspect our slowdowns are elsewhere, but fingers crossed
01:28 skeggsb9778[d]: i haven't looked at it in detail, but presumably it means you don't need a separate channel for async compute anymore
01:28 skeggsb9778[d]: gfxstrand[d]: it already doesn't, the gl driver has checks against "chipset" in nvc0_screen.c (and winsys too for some reason)
01:28 gfxstrand[d]: airlied[d]: Yeah, I don't expect much. But I know our worst case estimates weren't great.
01:29 gfxstrand[d]: skeggsb9778[d]: Sweet.
01:29 airlied[d]: I wonder how long nvfuzz 0..128 0 0 0 0 would take 😛
01:29 gfxstrand[d]: A long ass time. But a shorter ass time if you have more CPU cores. 😛
01:31 redsheep[d]: I've always wondered about the duration of a standard ass time
01:32 gfxstrand[d]: airlied[d]: https://github.com/kuterd/nv_isa_solver will do something a little smarter than nvfuzzing the entire solution space.
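To put "a long ass time" in numbers: exhaustively enumerating a 128-bit instruction word means 2^128 encodings. The sketch compares that with probing only the one-bit neighborhood of a known-valid encoding, which is one simple way to shrink the search. This is an illustration, not a description of how nv_isa_solver works, and the throughput figure is a made-up assumption.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_years(bits: int, per_core_per_sec: float, cores: int) -> float:
    """Years needed to try every encoding of a `bits`-wide instruction word."""
    return (2 ** bits) / (per_core_per_sec * cores) / SECONDS_PER_YEAR


def single_bit_flips(seed: int, bits: int) -> list:
    """All encodings that differ from `seed` in exactly one bit position."""
    return [seed ^ (1 << i) for i in range(bits)]


if __name__ == "__main__":
    THROUGHPUT = 1e6  # hypothetical: disassembler calls per second per core
    for cores in (16, 128):
        print(f"{cores} cores: {exhaustive_years(128, THROUGHPUT, cores):.2e} years")
    print(len(single_bit_flips(0x0, 128)), "probes for a one-bit neighborhood")
```

More cores divide the exhaustive time linearly, which does not help much against 2^128; per-bit probing is linear in the word width instead.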
01:32 redsheep[d]: They're always long and short, never just average
01:33 gfxstrand[d]: https://tenor.com/view/snoopy-peanuts-charlie-brown-laugh-cartoon-gif-10063073985247548142
01:37 airlied[d]: gfxstrand[d]: okay I think that MR is mergeable from my end now
01:43 gfxstrand[d]: \o/
01:43 gfxstrand[d]: I'll read again tomorrow
01:44 gfxstrand[d]: Hopefully I'll land it tomorrow
02:24 mohamexiety[d]: I do want to work on Blackwell support but the cards have been kinda unobtainable here. If I still can’t get a big card soon I’ll probably yoink a 5060/Ti since allegedly those release soon, alternatively could just go for the 5070 since it’s kinda available for MSRPish on amazon
02:25 mohamexiety[d]: skeggsb9778[d]: Just for compute and 3d or is more general? (iirc there was a separate copy class for example)
02:53 skeggsb9778[d]: i'm not sure, i haven't looked super deep into it yet
02:54 gfxstrand[d]: Right now we run the copy engine in synchronous mode. That's very not great
02:56 skeggsb9778[d]: how come? all you need to do is allocate a separate channel? iiuc, all the sync etc is the app's responsibility with vk - right?
02:56 airlied[d]: would need to expose transfer queues
02:57 airlied[d]: I think I started hacking on it once and wandered off
02:57 skeggsb9778[d]: *separate channel with NOUVEAU_FIFO_ENGINE_CE instead of _GR
02:57 gfxstrand[d]: We need to sort out channel enumeration better and expose actual transfer/compute queues
02:57 gfxstrand[d]: Also, we need to figure out how intra-channel sync works when not running in synchronous mode.
02:57 skeggsb9778[d]: semaphores
02:58 gfxstrand[d]: Well, yes. I know that
02:58 gfxstrand[d]: But I need to design something to use the semaphores
02:58 gfxstrand[d]: It's on the ToDo list
02:58 gfxstrand[d]: Compute needs the same
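The general shape of semaphore-based ordering between channels can be modeled with a payload that only ever increases: the producer releases a value once its work is done, and the consumer's acquire blocks until the payload is at least that value. This is a CPU-side Python model of the idea, close to how Vulkan timeline semaphores behave; the class and function names are hypothetical and say nothing about the design NVK will end up with or the hardware semaphore methods.

```python
import threading
from typing import List, Optional


class TimelineSemaphore:
    """Toy CPU model of a semaphore whose payload only ever increases."""

    def __init__(self, initial: int = 0) -> None:
        self._value = initial
        self._cond = threading.Condition()

    def release(self, value: int) -> None:
        """Producer side: publish that all work up to `value` is done."""
        with self._cond:
            if value < self._value:
                raise ValueError("timeline payloads must not go backwards")
            self._value = value
            self._cond.notify_all()

    def acquire(self, value: int, timeout: Optional[float] = None) -> bool:
        """Consumer side: wait until the payload is >= value; False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= value, timeout)


def demo() -> List[str]:
    """A hypothetical copy channel feeds a 3D channel through one semaphore."""
    sem = TimelineSemaphore()
    log: List[str] = []

    def copy_channel() -> None:
        log.append("copy: upload done")
        sem.release(1)  # signal point 1

    def gfx_channel() -> None:
        sem.acquire(1)  # do not draw until the upload has landed
        log.append("gfx: draw with uploaded data")

    gfx = threading.Thread(target=gfx_channel)
    copy = threading.Thread(target=copy_channel)
    gfx.start()  # start the consumer first to show that it really waits
    copy.start()
    copy.join()
    gfx.join()
    return log


if __name__ == "__main__":
    print(demo())
```

The ordering comes entirely from the acquire/release pair, not from the order the threads start, which is the property a non-synchronous copy or compute channel would need.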
03:10 gfxstrand[d]: airlied[d]: Looks like maybe 5% in DA:TV?
03:10 gfxstrand[d]: 42ms -> 40ms or so
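Converting between frame time and FPS makes the "~5%" concrete. A minimal set of helpers (names are illustrative):

```python
def fps_from_ms(frame_time_ms: float) -> float:
    """Frames per second for a given frame time."""
    return 1000.0 / frame_time_ms


def pct_less_time(before_ms: float, after_ms: float) -> float:
    """How much shorter each frame got, in percent."""
    return (1.0 - after_ms / before_ms) * 100.0


def pct_more_fps(before_ms: float, after_ms: float) -> float:
    """How much the frame rate went up, in percent."""
    return (fps_from_ms(after_ms) / fps_from_ms(before_ms) - 1.0) * 100.0


if __name__ == "__main__":
    before, after = 42.0, 40.0
    print(f"{fps_from_ms(before):.1f} -> {fps_from_ms(after):.1f} FPS")
    print(f"{pct_less_time(before, after):.1f}% less time per frame")
    print(f"{pct_more_fps(before, after):.1f}% more FPS")
```

So 42 ms to 40 ms is about 4.8% less time per frame, or 5% more frames per second (roughly 23.8 to 25 FPS).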
03:11 gfxstrand[d]: That's about what I expected
03:11 gfxstrand[d]: Not much but not nothing
03:11 gfxstrand[d]: kuter7639: I'm running SM90 right now, BTW.
03:12 orowith2os[d]: I think I just got my hands on a 20 series card, would y'all want me to set up a VM or something you can use for CTS or whatever?
03:12 gfxstrand[d]: IDK how long scan takes to run. It seems to be doing an instruction every couple seconds.
03:13 gfxstrand[d]: orowith2os[d]: 20-series as in Turing?
03:13 orowith2os[d]: I think so
03:13 orowith2os[d]: I'm taking the PC apart right now to be sure
03:14 gfxstrand[d]: Nah, I've got a pretty good Turing collection at this point.
03:14 orowith2os[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1356466679789650010/IMG_20250331_221341.jpg?ex=67ecab8a&is=67eb5a0a&hm=f55be50ee36a2dba002e6bb5f2b5eded270c3ba7e2a766fccaca982250c46b63&
03:14 orowith2os[d]: I only really need enough storage, memory, and CPU to build The Entire Fucking World, and not the GPU
03:15 gfxstrand[d]: hehe
03:15 gfxstrand[d]: Ugh... I just realized that at this point Mesa 25.1 won't have Blackwell support. 😢
03:16 orowith2os[d]: Geh, looks like it might be a 2060
03:16 redsheep[d]: orowith2os[d]: The spiderwebs and dust, my god
03:16 gfxstrand[d]: I could rush a card here and try to get everything done but I don't think I'll have the time.
03:16 orowith2os[d]: There are NO markings on this whatsoever
03:16 x512[m]: May I ask a stupid question: what is the hardware effect of allocating subchannels, and why is it needed?
03:17 redsheep[d]: If that's a stupid question then I'm a certified caveman
03:17 orowith2os[d]: redsheep[d]: My dad's out-of-country and I've been suffering with building webkitgtk, ffmpeg, and friends on my laptop, so I decided to see if he was fine with me borrowing this for a bit.
03:17 orowith2os[d]: It's probably been two years since he's used it, let alone cleaned it.
03:18 orowith2os[d]: 64GB of memory <a:eyes:1205249674961883197>
03:19 gfxstrand[d]: x512[m]: Not much. Modern NV hardware has subchannels pretty well fixed these days. At least, that's what I'm given to understand.
03:20 x512[m]: Nothing works if I don't ask the KMD to create objects for TURING_A etc.
03:20 gfxstrand[d]: I think the original idea might have been to have things a bit mix-and-match but they gave up on that. There's 3D, compute, and copy these days, where 3D can do everything, compute can do copy, and copy is just copy.
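The nesting described here (3D can do everything, compute can also copy, copy is just copy) is what a driver would lean on to pick the most specialized engine that still covers a queue's needs. A minimal Python sketch; the engine names and `pick_engine` are hypothetical, not nouveau or NVK identifiers.

```python
from enum import Flag, auto


class Cap(Flag):
    COPY = auto()
    COMPUTE = auto()
    GRAPHICS = auto()


# Nesting as described: each engine class is a superset of the one below it.
# Engine names here are illustrative, not nouveau/NVK identifiers.
ENGINES = {
    "copy": Cap.COPY,
    "compute": Cap.COPY | Cap.COMPUTE,
    "3d": Cap.COPY | Cap.COMPUTE | Cap.GRAPHICS,
}


def pick_engine(needed: Cap) -> str:
    """Most specialized engine whose capabilities cover everything in `needed`."""
    covering = [name for name, caps in ENGINES.items() if (needed & caps) == needed]
    if not covering:
        raise ValueError(f"no engine supports {needed}")
    # Fewer capability bits means a more specialized engine.
    return min(covering, key=lambda name: bin(ENGINES[name].value).count("1"))


if __name__ == "__main__":
    for need in (Cap.COPY, Cap.COMPUTE | Cap.COPY, Cap.GRAPHICS):
        print(need, "->", pick_engine(need))
```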
03:20 gfxstrand[d]: Example?
03:21 orowith2os[d]: HA! It's a 3090!
03:21 orowith2os[d]: gfxstrand[d]: how's that?
03:21 orowith2os[d]: Useful or no?
03:21 x512[m]: gfxstrand[d]: Comment out this code block and it will stop working: https://github.com/X547/mesa/blob/7e715b311da45b779a5d3482247e4c49616e0994/src/nouveau/vulkan/nvkmd/nvrm/nvkmd_nvrm_ctx.c#L138
03:22 mhenning[d]: envytools has some background info on channels/subchannels https://envytools.readthedocs.io/en/latest/hw/fifo/intro.html
03:22 gfxstrand[d]: orowith2os[d]: Pretty nice....
03:22 gfxstrand[d]: Not something I need to SSH into but that's a pretty sweet GPU.
03:22 orowith2os[d]: Knowing damn well I won't use it, if there's anything I can do to set it up for y'all, just lmk
03:27 orowith2os[d]: I'm not sure if there's anything I can do to hook it up to Mesa's CI and maybe automatically build and test PRs for Ampere
03:28 redsheep[d]: I am skimming the latency MR now and I am really glad nvidia provided docs, REing all of these would have been insane
03:29 gfxstrand[d]: Valve's got 8 4060s they're about to turn on for CI. And generally we prefer not to run CI out of people's homes because internet, power, etc. aren't reliable enough.
03:29 gfxstrand[d]: redsheep[d]: Yup
03:30 orowith2os[d]: gfxstrand[d]: Not merge-blocking CI, of course, but if y'all want it there (even with the potential flakiness)
03:30 orowith2os[d]: :akipeek:
03:31 orowith2os[d]: I don't know what all the Mesa NVIDIA HW CI looks like, so I'm not sure
03:31 gfxstrand[d]: My test rig has 2x 3060s
03:32 orowith2os[d]: So we're good then, I can just disconnect the 3090 and leave it be?
03:32 gfxstrand[d]: Yeah. Sad to see a beautiful piece of HW like that powered off, though. 😢
03:33 orowith2os[d]: If I owned it, I'd hand it to one of y'all :glorp:
03:33 orowith2os[d]: Just @ me if you need anything ran on it, and I'll let you know if I can
03:34 skeggsb9778[d]: x512[m]: the driver still uses the info to allocate appropriate engine context buffers, etc
03:35 skeggsb9778[d]: once upon a time you had to describe objects in a "special" section of vram (instance memory)
03:36 x512[m]: To understand NVRM objects, I made a tool to dump object hierarchy that looks like this:
03:37 x512[m]:sent a code block: https://matrix.org/oftc/media/v1/media/download/AQoHV1yBiylve12cGtJ-GYzBE3wctp-Ma9Els03tNXfpFxYQYmXN27KiwHxtZ4NvyqwMElu1SCUIrFAt3p_-clxCeWN4W1ZwAG1hdHJpeC5vcmcvSHZIeXNMUVVnUUNRdGtMeE9RbXFTRmVi
03:37 gfxstrand[d]: Ugh... That's turning into a link that doesn't make its way through all the bridges
03:40 orowith2os[d]: If I had a nickel for every time a discord <-> matrix bridge broke...
03:40 orowith2os[d]: Oh, well, it's Discord <-> IRC <-> matrix, isn't it?
03:40 orowith2os[d]: Same idea.
03:40 gfxstrand[d]: discord <-> IRC <-> matrix, even. Extra bridges make it extra reliable!
03:41 orowith2os[d]: Someone toss me a thousand dollars so I don't have to worry about money and maybe I can write the 60th bridge ;)
03:41 redsheep[d]: We only need the 4th bridge
03:42 orowith2os[d]: Revolt?
03:42 orowith2os[d]: Gasp. Tulip.
03:42 orowith2os[d]: Zulip, bleh.
03:42 redsheep[d]: No, rather indirectly referencing a fantasy novel
03:45 orowith2os[d]: Dammit, ADHD. Now I might look at a crate for the zulip API to write a native desktop client for it.
03:45 orowith2os[d]: :blobcatnotlikethis:
03:46 redsheep[d]: https://tenor.com/view/its-time-to-stop-stop-clock-time-gif-5001372
03:46 orowith2os[d]: I got ONE opportunity to hyperfocus and that was on an unfinished Wayland compositor that I probably now won't touch for another two weeks
03:46 orowith2os[d]: ❤️
03:47 x512[m]: https://gist.github.com/X547/f6626b9d6cca66d1b31fdcda76e9aa0b#file-nvrm-2-tree-log-L130
03:49 gfxstrand[d]: x512[m]: Yeah, so that looks like one 3D+compute+copy queue and two compute+copy queues
03:50 gfxstrand[d]: Then later another channel group which is just 3D
03:52 mohamexiety[d]: gfxstrand[d]: How much time do we have?
03:52 gfxstrand[d]: Nominally, branching on the 14th
03:52 mohamexiety[d]: I can rush a card from Amazon for a little bit of markup, and it would arrive in ~1.5 weeks
03:52 mohamexiety[d]: Hmm
03:53 gfxstrand[d]: I could have one here tomorrow if I thought I'd have time to get it all done.
03:53 mohamexiety[d]: I do know I will have the time but the issue is logistics 😦
03:53 mohamexiety[d]: Local options are out of the question because they need you to buy a full PC alongside the card
03:53 gfxstrand[d]: Okay, maybe not that fast
03:54 gfxstrand[d]: But in a day or two
03:54 redsheep[d]: This sounds like a job for ssh
03:54 mohamexiety[d]: Problem is it’s a bit more complicated than that for Faith
03:55 gfxstrand[d]: Actually... I could get one early next week and give mohamexiety[d] SSH access.
03:56 mohamexiety[d]: That could work yeah. Alternatively Amazon has the 5070 with an April 6 arrival day too
03:57 mohamexiety[d]: I am fine with either option so it depends on what you find more convenient
03:59 gfxstrand[d]: <a:shrug_anim:1096500513106841673>
04:00 mohamexiety[d]: gfxstrand[d]: Btw any reason in particular? SM90 is Hopper
04:03 matt_schwartz[d]: since times a factor I don’t mind volunteering an ssh into a rig with Blackwell for the immediate future if it would help make the mesa 25.1 deadline
04:04 mhenning[d]: also blackwell being both 100 and 120 makes me worry that we have multiple architectures this gen
04:05 mohamexiety[d]: mhenning[d]: Yeah 100 is the DC version and Spark (the UMA AI device) and Thor. 120 is the consumer gfx line
04:07 mohamexiety[d]: The DC card doesn’t do Vulkan I think but I think Spark/Thor can work with nvk
04:10 gfxstrand[d]: Yeah, I'll probably pick one up next week.
04:10 gfxstrand[d]: I can probably get it working in 2 weeks unless something major comes up
04:11 gfxstrand[d]: And we can backport some patches if needed
04:11 mohamexiety[d]: I will order the 5070 just in case too
04:12 gfxstrand[d]: okay
04:15 mohamexiety[d]: matt_schwartz[d]: I am not too experienced with using ssh tbh but I could take you up on that effort till the card arrives if it’s not too much trouble. Main thing I’ll want to do is CTS and mesa builds
04:16 airlied[d]: from a quick look at lot of the encoding looks the same
04:16 gfxstrand[d]: I'd be surprised if they changed much of the simple stuff.
04:16 gfxstrand[d]: MMA is probably all of the map but the basics like IADD probably haven't moved.
04:18 mohamexiety[d]: Tbh even MMA may not have changed much beyond FP4
04:18 gfxstrand[d]: Don't forget FP6!
04:18 mohamexiety[d]: Per SM per clock matrix throughput of Blackwell is 1:1 Ada
04:18 gfxstrand[d]: But yeah, first step is for someone to boot one and run dEQP on it
04:19 mohamexiety[d]: SM 100 does get a lot of matrix goodies (e.g tensor cores have a dedicated SRAM scratchpad now) but nothing that carried over to 120 it seems
04:19 gfxstrand[d]: Yeah, that sounds expensive
04:22 redsheep[d]: gfxstrand[d]: Oh my god I thought you were joking, this exists... whyyyyyyyyyy
04:23 matt_schwartz[d]: mohamexiety[d]: its got a 9800x3d and plenty of ram so mesa compilation shouldnt be a problem. i can give it its own ethernet connection and even a spare monitor via displayport or hdmi. i have cachyos (arch) on there right now but could put workstation fedora on it if you're more familiar with that.
04:23 gfxstrand[d]: Because you can fit more of them in memory than FP8 but they're higher precision than FP4.
04:24 redsheep[d]: Hmm. I guess with 4x4 matrix it can always go into 24 bits so it's not quite as terrible as I thought, just nearly as terrible
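To make the packing point concrete: four 6-bit codes fill exactly 24 bits, i.e. three bytes, with no padding. The sketch packs and unpacks raw 6-bit codes only; it deliberately does not decode them into numbers (FP6 has more than one layout, e.g. E2M3 and E3M2 in the OCP Microscaling spec), and the low-bits-first ordering is an arbitrary assumption, not Blackwell's in-register layout.

```python
from typing import List


def pack_fp6x4(codes: List[int]) -> bytes:
    """Pack four raw 6-bit codes into 3 bytes (first code in the lowest bits)."""
    if len(codes) != 4 or any(not 0 <= c <= 0x3F for c in codes):
        raise ValueError("need exactly four 6-bit codes (0..63)")
    word = 0
    for i, code in enumerate(codes):
        word |= code << (6 * i)  # 4 x 6 = 24 bits, no padding
    return word.to_bytes(3, "little")


def unpack_fp6x4(data: bytes) -> List[int]:
    """Inverse of pack_fp6x4."""
    if len(data) != 3:
        raise ValueError("need exactly 3 bytes")
    word = int.from_bytes(data, "little")
    return [(word >> (6 * i)) & 0x3F for i in range(4)]


if __name__ == "__main__":
    packed = pack_fp6x4([1, 2, 3, 63])
    print(len(packed), packed.hex(), unpack_fp6x4(packed))
```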
04:25 gfxstrand[d]: Blackwell is all about packing as big a model as possible into the physical limitations of that rack of GPUs.
04:28 mohamexiety[d]: matt_schwartz[d]: Yeah that’s plenty fine. I don’t think distro choice matters in this case so you can keep it I think. Give me a shout when it’s ready and in the meantime I’ll play around with ssh here locally to get a better feel of things. Thanks so much! <a:ablob_heart:432929453644185602>
04:30 redsheep[d]: The only part of using ssh that is at all complicated is the keys, and it's not bad. It's just a very simple remote terminal
04:32 redsheep[d]: The part that could suck here is having the remote machine's kernel fall over
04:37 redsheep[d]: Whenever I'm figuring out what parts are needed for a server room I never want to drive to, the servers have to have a BMC, and then a remote-controlled power strip as a fallback. Driving 4 hours for a locked up machine is not fun.
04:38 matt_schwartz[d]: i'm gonna assume you want the machine on the gb20x kernel w/ that specific linux-firmware, right?
04:38 mohamexiety[d]: Yeah
04:39 mohamexiety[d]: Though actually I noticed that there’s been another update
04:39 mohamexiety[d]: skeggsb9778[d]: does the other branch support gb20x?
04:43 gfxstrand[d]: redsheep[d]: That's why both of my desktops have PiKVMs.
05:38 airlied[d]: skeggsb9778[d]: so did the fmc firmware become an elf file deliberately?
05:39 airlied[d]: oh it doesn't have the hand written header, and uses elf now that is probably nicer
05:52 redsheep[d]: gfxstrand[d]: Got set back up to do verified tests with good precision, just had to remember how I had been doing it. The witness sees no discernible difference with a margin for error of about 1%
05:57 redsheep[d]: Deep Rock Galactic is seeing about a 2% improvement
06:05 redsheep[d]: Test methodology to make talos 2 work consistently is really annoying but I think I've got it; once the mad stuttering dies down the frametimes are 5% better. Mind you that's 12 fps, but still that's the best I have seen so far
06:05 redsheep[d]: That's with nanite and software RT and every bell and whistle turned on
06:05 matt_schwartz[d]: mohamexiety[d]: i made a fresh kernel build on the 03.01-gb20x branch and so far get nouveau to grab the framebuffer even w/ the correct linux-firmware :thonk:
06:05 matt_schwartz[d]: [ 3.782320] nouveau 0000:01:00.0: NVIDIA GB202 (1b2000a1)
06:05 matt_schwartz[d]: [ 3.782413] nouveau 0000:01:00.0: gsp ctor failed: -2
06:05 matt_schwartz[d]: [ 3.782499] nouveau 0000:01:00.0: probe with driver nouveau failed with error -2
06:06 matt_schwartz[d]: cant get nouveau to grab the framebuffer** i mean
06:07 matt_schwartz[d]: i have everything else set up for you to ssh once i get this figured out
06:10 matt_schwartz[d]: ls /usr/lib/firmware/nvidia/gb202/gsp/
06:10 matt_schwartz[d]: .rw-r--r-- 199k root 31 Mar 22:00  bootloader-570.133.07.bin.zst
06:10 matt_schwartz[d]: .rw-r--r-- 200k root 31 Mar 22:00  fmc-570.133.07.bin.zst
06:10 matt_schwartz[d]: lrwxrwxrwx - root 31 Mar 22:00  gsp-570.133.07.bin.zst -> ../../ga102/gsp/gsp-570.133.07.bin.zst
06:10 matt_schwartz[d]: firmware in place. no nvidia drivers present on the machine at all.
06:11 airlied[d]: the l-f repo and that branch are out of sync
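The "-2" in "gsp ctor failed: -2" is a negated errno value. On Linux errno 2 is ENOENT ("No such file or directory"), which is consistent with the branch requesting a firmware file that the installed linux-firmware does not provide, i.e. the two being out of sync. A small helper to decode such codes with only the standard `errno` and `os` modules; it cannot tell you which file was requested.

```python
import errno
import os


def decode_kernel_error(code: int) -> str:
    """Decode a kernel-style return value such as -2 into its errno name and text."""
    num = abs(code)
    name = errno.errorcode.get(num, f"errno {num}")
    return f"{name}: {os.strerror(num)}"


if __name__ == "__main__":
    # "gsp ctor failed: -2" from the dmesg output above
    print(decode_kernel_error(-2))
```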
06:13 redsheep[d]: I'm beginning to remember why I so often stop at testing the witness. It takes like 3 minutes for doom eternal to stop acting like it's hung
06:13 airlied[d]: https://gitlab.freedesktop.org/nouvelles/kernel/-/commits/03.01-gb20x has some hacks to reconverge things until ben rebases
06:16 x512[m]: As I understand dEQP is some kind of Vulkan test suite? Is it hard to run it? Does it include fence and [timeline] semaphore tests? Opaque FD import/export?
06:17 redsheep[d]: airlied[d]: your latency branch looks good from a perf perspective. Not earth shattering but you didn't expect it to be. Doom eternal is maybe 4% faster. The heavier the game the more the uplift from what I've seen so far, which is where it counts
06:17 airlied[d]: VK-GL-CTS is the test suite
06:17 airlied[d]: based on deqp
06:18 airlied[d]: it includes lots of tests, but I don't think I'd say it's easy to get running
06:19 mangodev: i wonder how to test why a program is running slower than expected; is there a type of profiler that can find what calls are running abnormally slow?
06:20 matt_schwartz[d]: does nouveau support gpuvis?
06:21 airlied[d]: you mostly just guess
06:21 airlied[d]: and after guessing for about 5 years you start to make more educated guesses
06:21 mangodev: i'm wondering why kde spectacle seems to run abnormally slow under nvk+zink
06:22 mangodev: would the vulkan screenshot layer help (if it's even compatible with nvk)?
06:22 redsheep[d]: I found a vulkan layer that gave some insight but the info wasn't super reliable or useful
06:23 mangodev: from what i've read, the screenshot layer shouldn't do much with spectacle, as the screenshot itself doesn't seem to cause the lag
06:23 redsheep[d]: There's some amount you can get from renderdoc as well but renderdoc isn't exactly easy to use and there's not much I could find there either tbh
06:24 airlied[d]: yeah renderdoc timings can give a rough sketch as well
06:24 mangodev: i kinda wanna try using renderdoc (or similar) on no man's sky to see why it runs at a locked 2fps on lowest settings
06:24 mangodev: is it a possibility that mesa could be using lavapipe as a fallback since nvk doesn't have mesh shader support yet?
06:25 mangodev: although i'd think there'd be some info about that from dxvk if so
06:25 redsheep[d]: Oh nice, last game I am testing tonight, talos 1, and it's looking great! 5-6% from the latency MR, but as a whole is running more than 50% faster than just a couple months ago
06:26 airlied[d]: perf top would probably show if you are using lvp
06:26 redsheep[d]: redsheep[d]: I wonder if that was the instruction scheduler or something else, either way it runs at 120 fps now which is awesome
06:27 mangodev: how do you even tell what games run well and which ones don't? it feels backward to me because some complex games run fine while some seemingly simple ones don't run at all
06:29 mangodev: minecraft w/ vulkanmod runs flawlessly, maybe even *better* than proprietary, yet anything with shadowmapping seems to chug on my end
06:30 mangodev: i do love how the desktop runs so far though
06:30 redsheep[d]: From what I've seen it tends to be older titles and CPU bound titles in that runs great category
06:31 tiredchiku[d]: NVK is missing zcull (which helps with performant shadows) and color compression (which helps with anti-aliasing)
06:31 mangodev: ohhhhh interesting
06:31 redsheep[d]: For GPU bound titles it tends to be newer games struggle most but it's pretty well all over the place
06:31 mangodev: i never learned enough about zcull to know it's involved with shadowmapping
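ZCull is NVIDIA's coarse, per-tile depth culling. The general hierarchical-Z idea behind it can be sketched as follows, under simplifying assumptions (a LESS depth test, one conservative maximum depth per screen tile, no hardware details): if a primitive's nearest depth within a tile is no nearer than the farthest depth already stored there, every fragment would fail the depth test, so the whole tile is skipped without per-pixel work. Shadow-map passes are depth-only and often have a lot of overdraw, which is presumably why missing ZCull shows up there. The class and method names are hypothetical.

```python
from typing import List


class CoarseDepthTiles:
    """Per-tile *maximum* depth for a LESS depth test (smaller depth = nearer).

    Hypothetical illustration of hierarchical-Z style culling, not NVIDIA's ZCull.
    """

    def __init__(self, tiles_x: int, tiles_y: int, clear_depth: float = 1.0) -> None:
        self.max_depth: List[List[float]] = [[clear_depth] * tiles_x for _ in range(tiles_y)]

    def can_reject(self, tx: int, ty: int, prim_min_depth: float) -> bool:
        # If the primitive's nearest point is no nearer than the farthest depth
        # already stored in the tile, every fragment fails LESS: skip the tile.
        return prim_min_depth >= self.max_depth[ty][tx]

    def update_full_cover(self, tx: int, ty: int, prim_max_depth: float) -> None:
        # A primitive covering the whole tile leaves every pixel at depth
        # <= prim_max_depth, so the tile max can be tightened. Partial coverage
        # leaves some pixels unchanged, so it must not call this.
        self.max_depth[ty][tx] = min(self.max_depth[ty][tx], prim_max_depth)


if __name__ == "__main__":
    tiles = CoarseDepthTiles(2, 1)
    tiles.update_full_cover(0, 0, 0.3)   # e.g. a wall close to the light
    print(tiles.can_reject(0, 0, 0.5))   # geometry behind it: culled
    print(tiles.can_reject(1, 0, 0.5))   # other tile is still clear: drawn
```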
06:31 tiredchiku[d]: so turning down shadows and AA gives you a big boost
06:31 mangodev: interesting
06:31 redsheep[d]: That's also just true of games in general though
06:31 tiredchiku[d]: also it still doesn't have the RT extensions and fancier things like that
06:32 redsheep[d]: Nvk gets hurt more than usual but there's clearly a loooooot more to the story
06:32 tiredchiku[d]: yup
06:32 mangodev: may explain why we <3 katamari reroll ran so slowly on highest settings (despite being a relatively simple game graphically)
06:32 redsheep[d]: Believe me I've been hunting for that simple narrative for what's wrong since like 18 months ago lol
06:34 tiredchiku[d]: spoiler: lots
06:34 mangodev: i'm wondering if some of my compositing issues are kde-related, zink-related, or actual nvk issues
06:34 redsheep[d]: I don't think a simple answer exists. The cbuf stuff still seems like it's got the power to massively swing performance but since that's a difficult area and the same optimizations hurt and help seemingly at random it's not great
06:34 mangodev: i've been pondering if a vulkan-composited wlroots desktop would run better than kde+zink; would there be any tangible performance/functionality improvements for doing so?
06:36 mangodev: i'm wondering if the stutter from moving my mouse between monitors would be fixed using native vulkan wsi instead of zink
06:38 mangodev: it's most noticeable *specifically* when moving my mouse from one monitor to another monitor with the discord voice chat view open 🙃
06:38 redsheep[d]: I have noticed the zink session seems a bit more prone to locking up from background stuff going on than nouveau gl seemed to be. I'm just glad that session is working reliably and correctly now though
06:38 mangodev: yeah i was wondering
06:39 mangodev: the explicit sync doesn't feel truly explicit if that makes sense
06:39 mangodev: it works, though only partially; the system (mouse included) still lags if one application is lagging
06:40 mangodev: i do want to find out why electron windows keep recurrently silently "crashing" though, definitely something to do with nvk/zink, as it didn't happen with proprietary
06:41 mangodev: every now and then, electron windows just kinda blank out for a sec… maybe a memory leak?
06:42 redsheep[d]: What version are you running? That sounds like some bugs I had thought were fixed
06:42 mangodev: 34
06:42 redsheep[d]: No I mean are you running recent mesa main, or what?
06:42 mangodev: git
06:42 mangodev: from a few hours ago
06:43 redsheep[d]: Hmm. Maybe I just don't leave it going long enough to see it, I dunno. Does anything show up in dmesg when it appears to do those partial crashes?
06:44 mangodev: hard to say, there's usually a lot of logspam
06:44 mangodev: i regret updating my drivers today because there's a little oopsie in mesa as of today
06:44 mangodev: nv12 was added as a mesa color format, but was only implemented in etnaviv :|
06:45 mangodev: meaning anything expecting nv12 (such as a video) spams logs *every frame* saying the color format is invalid
06:47 mangodev: tried bringing it up in #dri-devel, but that channel is dead silent
06:47 redsheep[d]: That's not great. Well if you do find that it puts something in dmesg that would be good to know (not that anyone is likely to be keen on more zink+chromium debugging fun times lol)
06:49 mangodev: i want to speak positively about nvk, because i do have a lot of positivity about it as well, but there's so many potential issues that i don't know where else to bring up :(
06:49 mangodev: i don't wanna be a nag, i want to help the driver go forward
06:51 redsheep[d]: I'm right there with you, it's not easy to balance coming off as overly negative while talking about pain points
06:51 mangodev: also — a little off-topic, but what does the [H: 4(x)] mean over the text input in irc? is it users typing, or users with the channel focused? i'm very new to irc and still learning the ropes :D
06:51 redsheep[d]: And there's only so much good it can do talking about how great things are
06:51 mangodev: agreed
06:52 mangodev: especially because i've been in a weird mood lately, the littlest things drive me insane and i don't know why
06:53 mangodev: i love the driver so far, but it's hard to say that it's "complete" in its current state, or even truly a "daily driver"
06:55 redsheep[d]: It's getting there but there are a few things that would be blockers for quite a few folks. For me it's display stuff having to do with the kernel and I haven't been able to make heads or tails of what could be wrong with the code so I just live with it when testing, and don't use my nvk install daily.
06:56 mangodev: interesting
06:56 redsheep[d]: If you don't play recent titles and use a very typical monitor, preferably only one of them, then you can probably do just great
06:56 mangodev: what "display stuff with the kernel"?
06:58 redsheep[d]: Well, a lot of things. I stopped bringing them up because they should be fixed with nova and there's probably not much point trying to fix nouveau for some of this, but it basically boils down to lack of displaystream compression and HDMI FRL support, mixed with modes that should be absolutely fine either not working or being inexplicably missing.
06:59 matt_schwartz[d]: mohamexiety[d]: alright i've got pretty much everything set up for you on the blackwell rig. i'll DM you on discord with how to connect. you can do anything except mine frogcoins on it 🐸
07:00 redsheep[d]: Oh, and nouveau isn't super great at picking when to use chroma subsampling vs lowering bits per channel: my display drops to 6 bpc rather than just using 4:2:0 or something, which would be a whole lot less painful
07:03 tiredchiku[d]: mangodev: what irc client are you using
07:03 redsheep[d]: The heuristics could really use some love. A little bit of fringing on text vs massive banding isn't even close to being a similar level of tradeoff
07:05 mangodev: tiredchiku[d]: weechat through kitty
07:05 redsheep[d]: Like, IMO dropping to 6 bpc should be something the kmd only does when there's literally no other way to drive the display successfully
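The tradeoff redsheep describes can be made concrete with a minimal sketch of a format-picking policy that keeps 8 bpc and subsamples chroma before falling back to 6 bpc RGB. It uses idealized raw bits per pixel (samples per pixel times bpc) and ignores blanking, link encoding overhead, DSC, and how HDMI actually packs YCbCr 4:2:2. `pick_format`, the candidate order and the `supported` set are hypothetical, not nouveau's heuristics.

```python
# Average color samples per pixel: 4:4:4 keeps full chroma, 4:2:2 halves
# chroma horizontally, 4:2:0 halves it horizontally and vertically.
SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}

# Hypothetical preference order: keep 8 bpc and subsample chroma before
# falling back to 6 bpc full-chroma RGB.
CANDIDATES = [("4:4:4", 8), ("4:2:2", 8), ("4:2:0", 8), ("4:4:4", 6)]


def bits_per_pixel(chroma, bpc):
    """Idealized raw bits per pixel, ignoring blanking and link overhead."""
    return SAMPLES_PER_PIXEL[chroma] * bpc


def pick_format(budget_bpp, supported):
    """Return the first (chroma, bpc) the sink supports that fits the budget."""
    for chroma, bpc in CANDIDATES:
        if chroma in supported and bits_per_pixel(chroma, bpc) <= budget_bpp:
            return chroma, bpc
    return None  # nothing fits; a real driver might try DSC or reject the mode
```

With a 20 bpp budget, a sink that only accepts 4:4:4 ends up at 6 bpc (18 bpp), while a sink that also accepts 4:2:0 keeps 8 bpc at 12 bpp. Since 8 bpc 4:2:0 needs less bandwidth than 6 bpc RGB, the 6 bpc fallback only wins on bandwidth when subsampling is unavailable.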
07:06 tiredchiku[d]: 4 users have that channel as their focused channel
07:06 tiredchiku[d]: so, one of them is you, one's probably the irc-matrix bridge, one's the irc-discord bridge :p
07:09 mangodev: interesting
07:09 mangodev: and the parentheses are people typing i assume?
07:10 tiredchiku[d]: nope, irc doesn't do that
07:10 tiredchiku[d]: I forget what the parentheses are, sadly
07:10 redsheep[d]: What I really don't get is why I can't just have explicit control over my chroma subsampling, bpc, and DSC, but I get that it would require quite a lot of plumbing and is apparently a niche thing to want to change, go figure
07:11 mangodev: qq, what does "plumbing" mean in the nouveau space? i see it a lot
07:11 redsheep[d]: Informal term. Just means connecting stuff up, basically
07:13 redsheep[d]: Lots of layers and agreement needed to get something like that to the point where I would have UI to control it, and probably involving a lot of people who wouldn't see the point and don't want users to have controls that could degrade their experience when usually the kernel knows better
07:13 mangodev: "connecting stuff up" implies fully implementing already partially implemented features, yes?
07:14 mangodev: i've seen a lot of recent commits about adding helper functions and similar to the codebase; does using those across the codebase count as "plumbing?" or would that be different
07:17 redsheep[d]: It's a vague term, you could call almost anything in programming "plumbing", just that it tends to mean getting different parts talking with each other in a way they didn't before. Specifically I am talking about making it so the kernel's choice of how to set up my display could be overridden, exposing that to user control somehow.
07:19 redsheep[d]: Chroma subsampling is implemented, different possible bit depths are implemented, me being able to choose what those are set to isn't.
07:19 mangodev: isn't chroma subsampling needed for some color formats?
07:20 mangodev: iirc such as ypbpr 4:2:2 in hdmi?
07:22 redsheep[d]: Yes
07:23 redsheep[d]: At least I believe so, if I understand you correctly
07:26 mangodev: or if you want to support analog output formats >:)
07:26 mangodev: ~~vga support soon?~~
07:45 phomes_[d]: I am doing some game testing as well today. Before I test the MR I am updating the benchmark baseline with current main. Wow. Compared to my last baseline, we improve by ~6% in many games, with one even going up 39%
07:47 phomes_[d]: vkd3d games got ~7%, dxvk ~11%, and the largest improvements happened in vulkan native games (one with 39% and one with 21%)
08:05 redsheep[d]: phomes_[d]: Was your last baseline before the greedy scheduler?
08:16 phomes_[d]: yes
08:19 phomes_[d]: some is from the greedy scheduler. I did upgrade some system packages, so some improvements could also come from the kernel etc. I should rerun baselines from a few weeks back to pinpoint where the improvements come from
08:20 phomes_[d]: MR 33573 adds another 2-4% improvement on top of all that
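Because the MR is measured on top of the new baseline, the gains compound multiplicatively rather than add. A quick sketch of that arithmetic; the function names are illustrative and the inputs in the usage note are just the figures quoted above, not new measurements.

```python
def pct_change(baseline, new):
    """Percent improvement of `new` over `baseline` (e.g. average FPS)."""
    return (new - baseline) / baseline * 100.0


def stack(*pcts):
    """Combine successive percent gains, each measured on the previous result."""
    factor = 1.0
    for p in pcts:
        factor *= 1.0 + p / 100.0
    return (factor - 1.0) * 100.0
```

`stack(6, 3)` comes out at about 9.18, so a ~6% gain from main plus a 3% gain from the MR is slightly more than 9% over the old baseline.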
09:18 x512[m]: <mangodev> "i'm wondering why kde spectacle..." <- Maybe the same problem as in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34247?
11:55 moogai: Hi, I'm using Ubuntu 24.04 and I have a laptop with dual graphics, but frankly I'm not sure my graphics is working correctly, can I somehow do a smoke test to see if the nouveau driver is loaded correctly?
11:58 DodoGTA: moogai: You can try running an OpenGL application with DRI_PRIME=1 environment variable
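One way to script that suggestion: run `glxinfo -B` (from the `mesa-utils` package on Ubuntu) with `DRI_PRIME=1` and read the renderer string. If offloading to the NVIDIA GPU works, the renderer should name the NVIDIA chip rather than the integrated GPU. The helper names are illustrative, and the sample text used to exercise the parser is hand-written.

```python
import os
import subprocess


def renderer_from_glxinfo(text):
    """Extract the value of the 'OpenGL renderer string' line, if present."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("OpenGL renderer string:"):
            return line.split(":", 1)[1].strip()
    return None


def prime_renderer():
    """Run glxinfo with DRI_PRIME=1 and return the reported renderer."""
    env = dict(os.environ, DRI_PRIME="1")
    result = subprocess.run(["glxinfo", "-B"], env=env,
                            capture_output=True, text=True, check=True)
    return renderer_from_glxinfo(result.stdout)
```

Calling `prime_renderer()` on the laptop and comparing it with plain `glxinfo -B` output shows whether the second GPU is actually being picked up.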
12:05 moogai: https://gist.github.com/aucampia/0584481a66d8c40d48c4fb3412bc2a78
12:52 snowycoder[d]: Have I broken something or does Kepler codegen fail all of `dEQP-VK.texture`?
13:51 gfxstrand[d]: You haven't broken anything
13:52 gfxstrand[d]: Oh, wait, codegen as in the old compiler?
13:52 gfxstrand[d]: Not sure. That could be totally broken. I honestly don't know how much I tested it.
13:53 snowycoder[d]: Welp, textures might be harder than I thought, but they are the only missing feature (except shared atomics)
13:56 snowycoder[d]: I should probably install an old nvidia driver that supports kepler to compare outputs -.-
13:58 gfxstrand[d]: snowycoder[d]: I don't think they'll be too bad. There's some text in the old compiler describing where all the inputs go, you just have to implement that in the lowering pass.
14:01 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#L972
14:03 gfxstrand[d]: Why did they move everything around multiple times? :shrug_anim:
14:05 snowycoder[d]: ahah, thanks!
14:05 snowycoder[d]: That was also the compiler used for opengl, right?
14:05 gfxstrand[d]: Yeah
14:07 gfxstrand[d]: Once Kepler porting is done, we should really rip codegen support out of NVK and move codegen back inside the gallium driver.
14:07 gfxstrand[d]: Have you been using codegen much for bringing up NAK on Kepler?
14:07 snowycoder[d]: gfxstrand[d]: (I'm only doing "Kepler+", SM30 needs another encoder)
14:08 snowycoder[d]: gfxstrand[d]: It helped a lot since it's the only semi-functional driver
14:09 gfxstrand[d]: Heh
14:09 gfxstrand[d]: snowycoder[d]: Yeah... 😭
14:09 snowycoder[d]: I should install old nvidia drivers but I'm a bit scared that they'll break my daily driver
14:09 gfxstrand[d]: Fair
14:10 gfxstrand[d]: And a lot of stuff is annoyingly hard to compare between NVK and the blob driver.
14:10 snowycoder[d]: Wait, aren't shaders mostly similar?
14:11 gfxstrand[d]: snowycoder[d]: In that case maybe we'll keep it until someone does KeplerA/Fermi.
14:12 gfxstrand[d]: snowycoder[d]: The arithmetic is but our resource binding is a bit different so texture ops and things like that might not match up as well as you want.
14:13 gfxstrand[d]: But congrats on getting most of KeplerB working! :frog_party:
14:13 gfxstrand[d]: Textures really shouldn't be hard
14:14 snowycoder[d]: Thanks! It's been quite fun
14:14 gfxstrand[d]: Storage images will need a little work but I can walk you through that easily enough.
14:14 snowycoder[d]: One step at a time😂
14:15 gfxstrand[d]: 😁
14:23 pavlo_kozlenko[d]: gfxstrand[d]: theoretically, changing the frequency to maximum could break it?
14:27 gfxstrand[d]: Huh?
15:16 gfxstrand[d]: snowycoder[d]: do you have a draft MR up for the Kepler stuff yet?
15:16 snowycoder[d]: gfxstrand[d]: Not yet, but I can push one
15:49 gfxstrand[d]: mhenning[d]: airlied[d] I've rebased the MR, pulled in Mel's scheduler patch, fixed a couple things, and pushed. I'm running on Ada now. I'll run on Turing in an hour or so.
15:49 gfxstrand[d]: I think it's probably good to go now if the CTS passes with `NAK_DEBUG=cycles`
15:49 gfxstrand[d]: (One of the changes was to put the cycle count assert behind `NAK_DEBUG=cycles`)
16:04 gfxstrand[d]: Ada run is looking good so far:
16:04 gfxstrand[d]: Pass: 562189, Skip: 547807, Timeout: 4, Duration: 16:12, Remaining: 23:51
16:05 gfxstrand[d]: Unfortunately, I don't have my primary CTS box right now so no 25 min CTS runs for me.
16:27 gfxstrand[d]: `Pass: 1390398, Warn: 1, Skip: 1353590, Timeout: 12, Duration: 38:40, Remaining: 0`
16:27 gfxstrand[d]: Time to plug in Turing
16:27 snowycoder[d]: gfxstrand[d]: I'm rebasing on main and there's an `self.sm == 70` in `sm50.rs` that should never trigger (right?)
16:31 gfxstrand[d]: Yeah, that should never trigger
16:31 gfxstrand[d]: Go ahead and add a patch to get rid of it
16:31 gfxstrand[d]: gfxstrand[d]: Turing is running now
17:32 snowycoder[d]: gfxstrand[d]: MR for minor fixes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34327
17:32 snowycoder[d]: Draft MR for Kepler+ support: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34329
17:32 snowycoder[d]: (it might contain debug code or wrongly encoded ops)
17:39 gfxstrand[d]: snowycoder[d]: Thanks!
18:27 gfxstrand[d]: Okay, found one more tiny bug in the Turing code. Looks like Turing gets another CTS run before we merge this monster.
18:41 redsheep[d]: You really should have used the collabora blog to do an "NVK now has no bugs" April fools post
18:42 mohamexiety[d]: airlied[d]: skeggsb9778[d] so I am working on gb20x on nvk through matt_schwartz[d]'s system. seeing something weird though; the card exists, and nouveau initializes it fine, but nvk doesn't seem like it's latching onto it at all. is there something missing?
18:42 mohamexiety[d]: ❯ sudo dmesg | grep nouveau
18:42 mohamexiety[d]: [ 5.835764] nouveau 0000:01:00.0: NVIDIA GB202 (1b2000a1)
18:42 mohamexiety[d]: [ 5.959767] nouveau 0000:01:00.0: gsp: RM version: 570.133.07
18:42 mohamexiety[d]: [ 5.959871] nouveau 0000:01:00.0: vgaarb: deactivate vga console
18:42 mohamexiety[d]: [ 6.444606] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.460716] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.710842] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.725901] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.794940] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.795578] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.813035] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.825940] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.831763] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.832060] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.958171] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.959039] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.959315] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.964010] nouveau 0000:01:00.0: gsp:msg fn:4124 len:0x21/0x1 res:0x0 resp:0x0
18:42 mohamexiety[d]: [ 6.993486] nouveau 0000:01:00.0: drm: VRAM: 32607 MiB
18:42 mohamexiety[d]: [ 6.993487] nouveau 0000:01:00.0: drm: GART: 0 MiB
18:42 mohamexiety[d]: [ 22.261280] nouveau 0000:01:00.0: DRM: failed to idle channel 64 [DRM]
18:42 mohamexiety[d]: [ 22.261284] nouveau 0000:01:00.0: drm: ce test: failed to idle channel
18:42 mohamexiety[d]: [ 23.336842] nouveau 0000:01:00.0: [drm] Registered 4 planes with drm panic
18:42 mohamexiety[d]: [ 23.336844] [drm] Initialized nouveau 1.4.0 for 0000:01:00.0 on minor 0
18:42 mohamexiety[d]: [ 23.343423] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
18:42 mohamexiety[d]: [ 23.359626] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
18:42 mohamexiety[d]: [ 23.362585] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
18:43 mohamexiety[d]: I am not sure what the `msg`s are saying but given it proceeds it looks like it's initialized
18:43 mohamexiety[d]: on the userspace/vulkan side it's seeing just llvmpipe. the card also exists in `/dev/dri`
18:44 mhenning[d]: mohamexiety[d]: have you modified nvk's version checks? I don't think we enumerate on unknown cards by default
18:46 gfxstrand[d]: Oh, yeah, you need `NVK_I_WANT_A_BROKEN_VULKAN_DRIVER=true`
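The gate mhenning and gfxstrand describe can be pictured roughly as follows. This is a hypothetical sketch, not NVK's code: `should_enumerate`, the `is_known_chipset` input and the accepted truthy spellings are assumptions; only the environment variable name comes from the discussion.

```python
import os

# Assumed spellings of "true"; Mesa's own boolean option parsing may differ.
TRUTHY = {"1", "true", "yes", "y"}


def should_enumerate(is_known_chipset, env=None):
    """Expose a GPU only if it is known-good or the user explicitly opts in."""
    env = os.environ if env is None else env
    value = env.get("NVK_I_WANT_A_BROKEN_VULKAN_DRIVER", "")
    return is_known_chipset or value.strip().lower() in TRUTHY
```

A check like this only matters once device probing gets that far; if probing fails earlier, the variable has no effect.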
18:47 mhenning[d]: also, here's a random patch for something you'll run into https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34161/diffs?commit_id=0296c8600300671e3b0f40ed54106d6c7a95f16b
18:49 orowith2os[d]: I found a Quadro M5000 when digging through some of these old PCs :akipeek:
18:49 gfxstrand[d]: *grumble*
18:49 gfxstrand[d]: Why did nvidia skip 11?
18:49 mohamexiety[d]: good question :KEKW:
18:49 mohamexiety[d]: mhenning[d]: oh, thanks!
18:50 mohamexiety[d]: will check the version checks, though `nvkmd_pdev` and `nvkmd_dev` didn't seem to have anything stand out at a glance
18:50 redsheep[d]: Maybe an architecture that was scrapped
18:51 redsheep[d]: That would go some way to explaining Blackwell having some aspects seem rushed even though it came out kind of later than expected
18:51 mhenning[d]: *mumbles something about DirectX 4*
18:55 tiredchiku[d]: windows 9 :ha:
18:57 gfxstrand[d]: mohamexiety[d]: You'll also need to add Blackwell to `sm_for_chipset()`.
18:57 gfxstrand[d]: tiredchiku[d]: There's actually a really good reason for that one
18:57 gfxstrand[d]: It's stupid but there's a reason
18:58 tiredchiku[d]: oh?
18:58 mohamexiety[d]: gfxstrand[d]: yeah just checked nvk_physical_device.c, it will need this to run. but I think it's failing before we get to that point because it should output the "if you know what you are doing pass in the env var" tidbit :thonk:
18:58 mohamexiety[d]: I tried passing it in anyways and it didn't change things
18:59 gfxstrand[d]: A LOT of apps, and I do mean a LOT, have "win 9x" hard-coded to be "windows 95 or 98" and if they released a Windows 9 they would either have to do something weird with those interfaces to prevent matching or thousands of apps would break with no real ability to update or fix them. Microsoft's stubborn commitment to backwards compatibility therefore says Windows 9 is right out.
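A hypothetical illustration of the kind of check gfxstrand means. It is not taken from any particular app, but it shows why a product literally named "Windows 9" would be misdetected by a prefix match, and how an exact match avoids it.

```python
def is_win9x_prefix(os_name):
    """Fragile check: treats anything starting with 'Windows 9' as 95/98."""
    return os_name.startswith("Windows 9")


def is_win9x_exact(os_name):
    """Robust check: matches only the versions actually meant."""
    return os_name in {"Windows 95", "Windows 98"}
```

`is_win9x_prefix("Windows 9")` returns True, which is exactly the false positive that would have hit a real Windows 9.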
18:59 tiredchiku[d]: ...
18:59 tiredchiku[d]: amazing
19:02 airlied[d]: mohamexiety[d]: I don't think the kernel is fully setting up the hw properly yet for accel, if you turn off the accel tests in the kernel it'll run nvk up until the first channel exploding
19:02 airlied[d]: but of course the accel tests should work, so there is a bit of a gap somewhere 🙂
19:03 mohamexiety[d]: and here I was excited to get out of the kernel mines for a bit!
19:03 mohamexiety[d]: I guess that's a good lead to follow then, thanks!
19:07 mohamexiety[d]: gfxstrand[d]: where are the chipset values from?
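The chipset number in the dmesg line `NVIDIA GB202 (1b2000a1)` comes from the GPU's BOOT0 register. A minimal sketch, assuming the `(boot0 & 0x1ff00000) >> 20` layout nouveau has historically used, plus a toy chipset-to-SM table in the spirit of `sm_for_chipset()`. Only the 0x1b0 to SM 120 entry comes from this discussion; the table is deliberately incomplete and should be checked against `nvk_physical_device.c`.

```python
def decode_boot0(boot0):
    """Split BOOT0 into (chipset, chiprev), assuming nouveau's bit layout."""
    chipset = (boot0 & 0x1FF00000) >> 20
    chiprev = boot0 & 0xFF
    return chipset, chiprev


# Illustrative, deliberately incomplete (lo, hi_exclusive, sm) table.
SM_RANGES = [
    (0x160, 0x170, 75),   # Turing TU10x
    (0x190, 0x1A0, 89),   # Ada AD10x
    (0x1B0, 0x1C0, 120),  # Blackwell GB20x, per the discussion
]


def toy_sm_for_chipset(chipset):
    """Hypothetical stand-in for NVK's sm_for_chipset()."""
    for lo, hi, sm in SM_RANGES:
        if lo <= chipset < hi:
            return sm
    return None  # unknown to this toy table
```

Here `decode_boot0(0x1b2000a1)` gives chipset 0x1b2 (GB202) and chiprev 0xa1.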
19:07 airlied[d]: I think Ben has some ideas on what is missing, and has been working on it
19:08 skeggsb9778[d]: the channel exploding isn't the kernel's fault, it blows up for me with ILLEGAL_INSTR_ENCODING
19:09 skeggsb9778[d]: though, yes, once the GPFIFO fills up, the kernel will currently think the channel has stalled
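skeggsb's point is that a full GPFIFO is not the same thing as a hung channel. This is a minimal ring-buffer sketch, assuming the usual PUT (CPU write) / GET (GPU fetch) pointer scheme with one slot left empty so that "full" and "empty" can be told apart. The progress-based `looks_hung` check is a hypothetical heuristic, not what nouveau currently does.

```python
class GpFifo:
    """Ring of `size` entries; one slot stays empty so full != empty."""

    def __init__(self, size):
        self.size = size
        self.put = 0  # next entry the CPU will write
        self.get = 0  # next entry the GPU will fetch

    def pending(self):
        return (self.put - self.get) % self.size

    def free(self):
        return self.size - 1 - self.pending()

    def submit(self):
        if self.free() == 0:
            raise BufferError("GPFIFO full: wait for GET to advance")
        self.put = (self.put + 1) % self.size

    def consume(self):
        if self.pending():
            self.get = (self.get + 1) % self.size


def looks_hung(get_before, get_after, pending):
    """Hypothetical check: hung only if work is queued and GET made no progress."""
    return pending > 0 and get_before == get_after
```

A full ring whose GET keeps advancing is just back-pressure; only a GET that stops moving while work is queued suggests a real stall.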
19:10 airlied[d]: do you decode ILLEGAL_INSTR_ENCODING from a hex value?
19:11 airlied[d]: https://paste.centos.org/view/8d3babe6 is from my gb203 yesterday
19:12 skeggsb9778[d]: oh, you're probably hitting places where removed methods are being sent by nvk - i hacked those out to get to the illegal opcode bit
19:12 airlied[d]: ah makes more sense, I should hack those off then 🙂
19:13 skeggsb9778[d]: probably be nicer if you could see which ones 😛 i had the benefit of rm logs
19:13 mhenning[d]: we don't have updated method headers yet, do we?
19:14 skeggsb9778[d]: https://bpa.st/RKJA
19:14 skeggsb9778[d]: that's the hacks i had
19:14 airlied[d]: I'll ask today for the class headers to be released
19:14 mohamexiety[d]: mhenning[d]: they're not public no 😦
19:16 mohamexiety[d]: skeggsb9778[d]: `1b0` is gb20x?
19:16 skeggsb9778[d]: yeah
19:16 mohamexiety[d]: shouldnt it return 120 then?
19:16 skeggsb9778[d]: possibly - i wouldn't read too much into what's there, it was a quick 5m exercise to see what'd blow up
19:17 mohamexiety[d]: ah ok
19:21 airlied[d]: on the gb203 I just get a channel hang later now with vkcube. Was the illegal instr in the RM logs, or does it show up in the kernel?
19:22 skeggsb9778[d]: i thought that showed up as a plain Xid message from RM, but i'll double-check in a min
19:23 airlied[d]: I should probably figure out how to plug the 5080 in, 3 8-pins is fun 😛
19:23 skeggsb9778[d]: did vkcube actually render something before it hung?
19:24 airlied[d]: no, black window appeared, but I'm using xe as the primary
19:24 mohamexiety[d]: oh there actually is an sm 104. interesting... I wonder where that one is used. iirc 101 is Spark
19:24 mohamexiety[d]: anyways, anything I can help with in this area?
19:32 skeggsb9778[d]: ah, it was actually starting Xorg (with hacked up nvc0 driver) that got me the ILLEGAL_INSTR_ENCODING
19:32 skeggsb9778[d]: [ 222.175774] nouveau 0000:01:00.0: gsp: Xid:13 Graphics SM Warp Exception on (GPC 0, TPC 0, SM 0): Illegal Instruction Encoding
19:32 skeggsb9778[d]: [ 222.175781] nouveau 0000:01:00.0: gsp: Xid:13 Graphics Exception: ESR 0x505730=0x2c0009 0x505734=0x0 0x505728=0x1c81fb60 0x50572c=0x1174
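Until there is a proper decoder, a small parser can at least pull the structured fields out of GSP Xid lines like these two. This sketch only tokenizes the two formats shown; it does not interpret the ESR register values, which would need NVIDIA's documentation. The regular expressions are assumptions fitted to these samples.

```python
import re

# Fitted to the two sample lines only; other Xid formats will not match.
WARP_RE = re.compile(
    r"Xid:(?P<xid>\d+) (?P<what>.+?) on "
    r"\(GPC (?P<gpc>\d+), TPC (?P<tpc>\d+), SM (?P<sm>\d+)\): (?P<reason>.+)$")
REG_RE = re.compile(r"(0x[0-9a-fA-F]+)=(0x[0-9a-fA-F]+)")


def parse_warp_exception(line):
    """Return the Xid, unit location and reason from an SM warp exception line."""
    m = WARP_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("xid", "gpc", "tpc", "sm"):
        fields[key] = int(fields[key])
    return fields


def parse_esr(line):
    """Return {register_offset: value} for every 'reg=value' pair in the line."""
    return {int(reg, 16): int(val, 16) for reg, val in REG_RE.findall(line)}
```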
19:33 skeggsb9778[d]: i just re-pushed the -gb20x branch too (and it's actually in sync with the fw now)
19:34 mohamexiety[d]: nice, thanks!
19:35 mohamexiety[d]: skeggsb9778[d]: so I guess before this can work, we need the updated methods. and then we need to fix up blackwell in nak/the compiler?
19:38 skeggsb9778[d]: airlied[d]: ` if (queue->engines & NVKMD_ENGINE_COMPUTE) {
19:38 skeggsb9778[d]: if (pdev->info.cls_compute >= VOLTA_COMPUTE_A) {
19:38 skeggsb9778[d]: uint64_t temp = 0xfeULL << 24;
19:38 skeggsb9778[d]: - P_MTHD(p, NVC3C0, SET_SHADER_SHARED_MEMORY_WINDOW_A);
19:38 skeggsb9778[d]: - P_NVC3C0_SET_SHADER_SHARED_MEMORY_WINDOW_A(p, temp >> 32);
19:38 skeggsb9778[d]: - P_NVC3C0_SET_SHADER_SHARED_MEMORY_WINDOW_B(p, temp & 0xffffffff);
19:38 skeggsb9778[d]: +// P_MTHD(p, NVC3C0, SET_SHADER_SHARED_MEMORY_WINDOW_A);
19:38 skeggsb9778[d]: +// P_NVC3C0_SET_SHADER_SHARED_MEMORY_WINDOW_A(p, temp >> 32);
19:38 skeggsb9778[d]: +// P_NVC3C0_SET_SHADER_SHARED_MEMORY_WINDOW_B(p, temp & 0xffffffff);
19:38 skeggsb9778[d]: `
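For reference, the commented-out methods split one 64-bit shared memory window address into high and low 32-bit halves. Working the same expressions through in Python (this only mirrors the arithmetic in the diff, not the methods themselves):

```python
temp = 0xFE << 24              # same value as 0xfeULL << 24 in the C code
window_a = temp >> 32          # high 32 bits -> SET_SHADER_SHARED_MEMORY_WINDOW_A
window_b = temp & 0xFFFFFFFF   # low 32 bits  -> SET_SHADER_SHARED_MEMORY_WINDOW_B

print(hex(temp), hex(window_a), hex(window_b))  # prints: 0xfe000000 0x0 0xfe000000
```

So the window sits at 0xfe000000, with a zero high word.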
19:39 skeggsb9778[d]: you might need that one too, now I get illegal instr trying to start Xorg with zink 😛
19:39 skeggsb9778[d]: mohamexiety[d]: yeah, though you may have luck playing spot the difference between traces of (say) ada vs blackwell with nv's vk driver
19:40 mohamexiety[d]: yeah I was thinking of something like that. just not sure how to go about it yet
19:42 mohamexiety[d]: the instruction encodings I think we can get via https://github.com/kuterd/nv_isa_solver?tab=readme-ov-file
19:43 airlied[d]: skeggsb9778[d]: that stops some of the spam, but now I just get a channel timeout
19:44 airlied[d]: no illegals
19:44 airlied[d]: okay smoke triangle seems to pass
19:45 airlied[d]: all the cts smoke tests pass
19:52 orowith2os[d]: Kepler obtained 🔥
19:52 orowith2os[d]: I guess I'll load this and the Maxwell into a PC and rig everything up to a VM
19:54 mohamexiety[d]: skeggsb9778[d]: gfxstrand[d] marysaka[d] do we have anything for this btw? I don't remember the RE tools we have, sorry
19:54 airlied[d]: so mufu and ipa might be the first instructions I'd look for
19:54 marysaka[d]: mohamexiety[d]: We only have stuff for shaders you build yourself; envyhooks doesn't have enough to grab shaders, sadly
19:54 marysaka[d]: (it would be awesome to implement tho)
19:55 skeggsb9778[d]: i was thinking more (mostly) for finding the changed methods
19:58 gfxstrand[d]: skeggsb9778[d]: We really need a decoder for those messages. What we got out of pre-GSP nouveau was pretty nice. The GSP errors are damn near useless without asking an NVIDIA person to run it through the tool.
19:58 gfxstrand[d]: skeggsb9778[d]: No more shared memory windows? Or did the methods just change?
19:59 airlied[d]: oh looks like the encoding for the scoreboards might have changed
20:00 gfxstrand[d]: That's annoying
20:00 airlied[d]: airlied@p53el8f:~/devel/nv-shader-tools$ cargo run --bin nvfuzz SM80 0..1 00007326 000000ff 000e001f 000e0400
20:00 airlied[d]: Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.03s
20:00 airlied[d]: Running `target/debug/nvfuzz SM80 0..1 00007326 000000ff 000e001f 000e0400`
20:00 airlied[d]: With 0x0: ipa.pass r0, a[0x7c]
20:00 airlied[d]: airlied@p53el8f:~/devel/nv-shader-tools$ cargo run --bin nvfuzz SM120 0..1 00007326 000000ff 000e001f 000e0400
20:00 gfxstrand[d]: Do we have Blackwell class headers?
20:00 airlied[d]: Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.03s
20:00 airlied[d]: Running `target/debug/nvfuzz SM120 0..1 00007326 000000ff 000e001f 000e0400`
20:00 airlied[d]: With 0x0: ipa.pass r0, a[0x7c] &wr=0x0 ?wait2_end_group
20:00 airlied[d]: not yet, I'll ask for them today
20:01 gfxstrand[d]: airlied[d]: Interesting... The disassembler is printing scoreboard info now?
20:02 gfxstrand[d]: Also, if you've got a valid ipa.pass, you can fuzz the scoreboard info with nvfuzz. 🙂
20:09 airlied[d]: throwing 16 bits at nvfuzz, might get there in an hour or two 😛
20:09 airlied[d]: oh, 26 bits
20:18 airlied[d]: okay weird, maybe it's just the dis printing the scoreboards and the encoding hasn't changed
20:19 gfxstrand[d]: I think it's just printing the scoreboards now
20:20 gfxstrand[d]: The cool thing is that now I have proper notation that I can plub into nvdis
20:21 gfxstrand[d]: Oh, reuse_mask is gone. It's other stuff now
20:22 gfxstrand[d]: nvfuzz SM120 122..126 00007326 000000ff 000e001f 000e0400
20:22 gfxstrand[d]: With 0x0: ipa.pass r0, a[0x7c] &wr=0x0 ?wait2_end_group
20:22 gfxstrand[d]: With 0x1: ipa.pass r0, a[0x7c] &wr=0x0 ?wait2_end_group ?batch_start
20:22 gfxstrand[d]: With 0x2: ipa.pass r0, a[0x7c] &wr=0x0 ?wait2_end_group ?batch_start_tile
20:22 gfxstrand[d]: With 0x3: ipa.pass r0, a[0x7c] &wr=0x0 ?wait2_end_group ????3
20:22 gfxstrand[d]: With 0x4: ipa.pass r0, a[0x7c] &wr=0x0 ?wait2_end_group ?batch_end
20:22 gfxstrand[d]: With 0x5: ipa.pass r0, a[0x7c] &wr=0x0 ?wait2_end_group ?barrier_exempt
20:22 gfxstrand[d]: With 0x8: ipa.pass r0, a[0x7c] &wr=0x0 ?wait2_end_group
20:22 gfxstrand[d]: With 0x9: ipa.pass r0, a[0x7c] &wr=0x0 ?wait2_end_group ?batch_start
20:22 gfxstrand[d]: With 0xa: ipa.pass r0, a[0x7c] &wr=0x0 ?wait2_end_group ?batch_start_tile
20:22 gfxstrand[d]: With 0xb: ipa.pass r0, a[0x7c] &wr=0x0 ?wait2_end_group ????3
20:22 gfxstrand[d]: With 0xc: ipa.pass r0, a[0x7c] &wr=0x0 ?wait2_end_group ?batch_end
20:22 gfxstrand[d]: With 0xd: ipa.pass r0, a[0x7c] &wr=0x0 ?wait2_end_group ?barrier_exempt
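Roughly what fuzzing a bit range means, as a sketch: hold the other bits of the 128-bit instruction fixed and try every value of the chosen field. The word order (first hex word holds bits 0-31) and the half-open `lo..hi` range are assumptions about nvfuzz, though the latter is consistent with `122..126` producing 4-bit values 0x0 through 0xd above. It also shows why 26 bits is slow: 2**26 is 67,108,864 candidates.

```python
def words_to_int(words):
    """Pack hex words into one integer, word 0 holding bits 0..31 (assumed)."""
    value = 0
    for i, word in enumerate(words):
        value |= int(word, 16) << (32 * i)
    return value


def int_to_words(value, count=4):
    """Split an integer back into `count` zero-padded 32-bit hex words."""
    return [f"{(value >> (32 * i)) & 0xFFFFFFFF:08x}" for i in range(count)]


def fuzz_range(words, lo, hi):
    """Yield (field_value, words) for every value of bits lo..hi-1."""
    base = words_to_int(words)
    width = hi - lo
    mask = ((1 << width) - 1) << lo
    for x in range(1 << width):
        yield x, int_to_words((base & ~mask) | (x << lo), len(words))
```

Each candidate would then be handed to the disassembler, and only the ones it accepts get printed.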
20:22 gfxstrand[d]: And what used to be `.yld` is different, too
20:22 airlied[d]: yeah I dropped yld here
20:24 airlied[d]: doh probably the vertex shader that is busted, and I'm staring at fragment
20:40 mohamexiety[d]: hm that's a bit odd, still can't get nvk to pick it up even with adding the sm definitions and commenting out the methods mentioned here
20:43 mohamexiety[d]: I'll push a super early draft MR just to collect things to do/hacks though
20:47 mohamexiety[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34331 there. people here should be able to push directly to it/add commits
20:53 gfxstrand[d]: mohamexiety[d]: Do you have a device in /dev/dri?
20:54 mohamexiety[d]: yeah:
20:54 mohamexiety[d]: ❯ ls /dev/dri
20:54 mohamexiety[d]: drwxr-xr-x - root 1 Apr 01:11 by-path
20:54 mohamexiety[d]: crw-rw---- 226,0 root 1 Apr 01:11 card0
20:54 mohamexiety[d]: crw-rw-rw- 226,128 root 1 Apr 01:11 renderD128
20:55 mohamexiety[d]: iGPU is disabled (and doesn't appear in vulkaninfo so it really isn't working)
21:06 x512[m]: Are there some common utility for reference counted objects?
21:12 airlied[d]: strace vulkaninfo and see where it dies,
21:18 gfxstrand[d]: Latency MR assigned to Marge. :transtada128x128:
21:19 gfxstrand[d]: x512[m]: There is in gallium but not for all of Mesa. You have to roll your own.
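Rolling your own usually comes down to a count plus a destroy callback. Here is a language-neutral sketch in Python, using a lock where C code would use atomics (Mesa's `util/u_atomic.h` provides helpers such as `p_atomic_inc` and `p_atomic_dec_zero`). The `RefCounted` class is hypothetical.

```python
import threading


class RefCounted:
    """Object that runs `destroy` when its last reference is dropped."""

    def __init__(self, destroy):
        self._count = 1          # the creator holds the first reference
        self._lock = threading.Lock()
        self._destroy = destroy

    def ref(self):
        with self._lock:
            assert self._count > 0, "ref() on a destroyed object"
            self._count += 1
        return self

    def unref(self):
        with self._lock:
            assert self._count > 0, "unref() underflow"
            self._count -= 1
            last = self._count == 0
        if last:
            self._destroy()      # run the destructor outside the lock
```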
21:19 pavlo_kozlenko[d]: Is there still a nouveau-experimental option for -D vulkan-drivers? Or is there only nouveau left?
21:20 x512[m]: Also I suppose I need to implement custom allocator for semaphore pushbuffers in NVRM backend.
21:23 mhenning[d]: pavlo_kozlenko[d]: There's only the `nouveau` option, not `nouveau-experimental` any more
21:23 x512[m]: Wait/signal entries will be converted to semaphore acquire/release pushbuffer commands.
21:25 pavlo_kozlenko[d]: mhenning[d]: Thanks
21:25 pavlo_kozlenko[d]: ❤️
21:31 airlied[d]: gfxstrand[d]: I requested blackwell latency info today, no idea what the latency on getting it will be 🙂
21:32 karolherbst[d]: airlied[d]: did you ask for Ada as well?
21:34 airlied[d]: I think we've sort of come to the conclusion that Ada should be like Ampere
21:34 karolherbst[d]: "should"
21:35 karolherbst[d]: though probably right
21:35 pavlo_kozlenko[d]: snowycoder[d]: with these patches we get a working vulkan on keplerB?
21:36 mohamexiety[d]: strace is so noisy, damn
21:36 mohamexiety[d]: I can tell it does open the device though at least:
21:36 mohamexiety[d]: readlink("/sys", 0x7fff5ae25bf0, 1023) = -1 EINVAL (Invalid argument)
21:36 mohamexiety[d]: readlink("/sys/dev", 0x7fff5ae25bf0, 1023) = -1 EINVAL (Invalid argument)
21:36 mohamexiety[d]: readlink("/sys/dev/char", 0x7fff5ae25bf0, 1023) = -1 EINVAL (Invalid argument)
21:36 mohamexiety[d]: readlink("/sys/dev/char/226:0", "../../devices/pci0000:00/0000:00"..., 1023) = 60
21:36 mohamexiety[d]: readlink("/sys/devices", 0x7fff5ae25bf0, 1023) = -1 EINVAL (Invalid argument)
21:36 mohamexiety[d]: readlink("/sys/devices/pci0000:00", 0x7fff5ae25bf0, 1023) = -1 EINVAL (Invalid argument)
21:36 mohamexiety[d]: readlink("/sys/devices/pci0000:00/0000:00:01.1", 0x7fff5ae25bf0, 1023) = -1 EINVAL (Invalid argument)
21:36 mohamexiety[d]: readlink("/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0", 0x7fff5ae25bf0, 1023) = -1 EINVAL (Invalid argument)
21:36 mohamexiety[d]: readlink("/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/drm", 0x7fff5ae25bf0, 1023) = -1 EINVAL (Invalid argument)
21:36 mohamexiety[d]: readlink("/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/drm/card0", 0x7fff5ae25bf0, 1023) = -1 EINVAL (Invalid argument)
21:36 mohamexiety[d]: readlink("/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/drm/card0/device", "../../../0000:01:00.0", 1023) = 21
21:36 mohamexiety[d]: readlink("/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0", 0x7fff5ae25bf0, 1023) = -1 EINVAL (Invalid argument)
21:37 mohamexiety[d]: openat(AT_FDCWD, "/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/vendor", O_RDONLY) = 5
21:37 mohamexiety[d]: fstat(5, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
21:37 airlied[d]: the last thing it does should be the problem
21:47 airlied[d]: looks like the ureg range might have expanded on blackwell
21:48 mohamexiety[d]: airlied[d]: this is the weird thing, it doesn't seem there's anything that stands out as problematic
21:50 airlied[d]: and iadd3 seems to have moved
21:55 airlied[d]: oh the cb src register is different
21:57 airlied[d]: yes, it doesn't seem to support the c[][] encoding anymore
21:58 airlied[d]: looks like you have to uldc/ldc those
21:58 mhenning[d]: That's pretty surprising
21:59 gfxstrand[d]: x512[m]: They got rid of bindless cbufs in ALU? That's disappointing.
21:59 mhenning[d]: Are you sure the encoding didn't just change?
21:59 x512[m]: gfxstrand[d]: ?
22:00 airlied[d]: I've dumped all the SASS from their library: for Ada it does c[], for Blackwell it never does
22:00 airlied[d]: I guess expanding ur range from 63 to 255 ran out of bits
22:03 airlied[d]: I wonder if they've provisioned 255 UREGs or if they've just changed the encoding
22:04 airlied[d]: to allow for it in the future
22:04 airlied[d]: but urZ is now ur255
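These observations line up if the uniform-register index field widened from 6 to 8 bits and the zero register stayed at the all-ones index, so URZ moves from ur63 to ur255. That is an inference from the disassembly in this discussion, not a documented encoding, and the helpers are hypothetical.

```python
def urz_index(field_bits):
    """Zero register index, assuming URZ is the all-ones value of the field."""
    return (1 << field_bits) - 1


def check_ureg(index, field_bits):
    """Raise if a uniform register index does not fit in the encoding field."""
    if not 0 <= index <= urz_index(field_bits):
        raise ValueError(f"ur{index} does not fit in {field_bits} bits")
    return index
```

Under that assumption every ureg operand costs two more bits than before, which fits the guess that the cbuf operand forms ran out of encoding space.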
22:06 x512[m]: The nv_push code generator doesn't correctly generate the semaphore class:
22:06 x512[m]: src/nouveau/headers/nv_push_clc36f.c: In function 'P_DUMP_NVC36F_MTHD_DATA':
22:06 x512[m]: src/nouveau/headers/nv_push_clc36f.c:656:9: error: duplicate case value
22:06 x512[m]:  656 | case NVC36F_WFI_SCOPE_CURRENT_VEID:
22:06 x512[m]:  | ^~~~
22:06 x512[m]: src/nouveau/headers/nv_push_clc36f.c:653:9: note: previously used here
22:07 x512[m]:  653 | case NVC36F_WFI_SCOPE_CURRENT_SCG_TYPE:
22:09 mhenning[d]: x512[m]: there are some cases that the generator manually handles. I forget if duplicate values are one of them
22:10 x512[m]: How do I fix it? I am adding new classes for the NVRM KMD backend.
22:12 mhenning[d]: not sure
22:13 gfxstrand[d]: x512[m]: Sorry. Wrong reply.
22:15 gfxstrand[d]: x512[m]: Yes, you're going to need to write some sort of allocator or reuse gadget for sync pushes if you can't put them straight on the ring.
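One shape such a reuse gadget could take is a fixed pool of pushbuffer slots, each returned to the free list once the submission that used it has completed. This sketch assumes every submission gets a monotonically increasing sequence number that can be read back as "completed up to N"; the names are hypothetical and not part of NVK or the NVRM backend.

```python
from collections import deque


class SyncPushPool:
    def __init__(self, slot_count):
        self.free = list(range(slot_count))
        self.in_flight = deque()  # (seqno, slot), in submission order

    def reclaim(self, completed_seqno):
        """Return slots whose submissions have finished to the free list."""
        while self.in_flight and self.in_flight[0][0] <= completed_seqno:
            _, slot = self.in_flight.popleft()
            self.free.append(slot)

    def alloc(self, completed_seqno):
        """Get a free slot, or None if every slot is still in flight."""
        self.reclaim(completed_seqno)
        return self.free.pop() if self.free else None

    def submit(self, slot, seqno):
        self.in_flight.append((seqno, slot))
```

When `alloc` returns None, the caller has to wait for the GPU to make progress or grow the pool.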
22:16 gfxstrand[d]: mhenning[d]: I don't think so. Generally there are no duplicate class methods.
22:18 airlied[d]: gfxstrand[d]: I assume to move these out of ALUs I'd want to do it earlier like from_nir rather than later in legalize?
22:18 gfxstrand[d]: Yeah. And make copy prop refuse to propagate them
22:19 mhenning[d]: gfxstrand[d]: I don't think it's a method, I think it's a duplicate enum
22:19 gfxstrand[d]: There's already a cbuf rules thing in copy prop
22:20 gfxstrand[d]: mhenning[d]: Then we may have to special case it somehow. We don't want to modify the files from Nvidia.
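Special-casing it in the generator instead of editing NVIDIA's headers could look something like this: when emitting the dump switch, skip any enum name whose value was already used and leave a comment instead. This is hypothetical Python, not Mesa's actual header generator. The value 1 in the check is a placeholder, since the compiler error only tells us the two names share a value.

```python
def emit_enum_switch(var, enum_values):
    """Emit a C switch over `var`, one case per distinct value.

    `enum_values` is a list of (name, value); later names that alias an
    earlier value get a comment instead of a duplicate `case` label.
    """
    out = [f"switch ({var}) {{"]
    first_name = {}
    for name, value in enum_values:
        if value in first_name:
            out.append(f"/* {name} has the same value as {first_name[value]} */")
            continue
        first_name[value] = name
        out.append(f"case {name}:")
        out.append(f'    return "{name}";')
    out.append("default:")
    out.append('    return "unknown";')
    out.append("}")
    return "\n".join(out)
```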
22:20 airlied[d]: okay guess that sorts out what I'm doing today then
22:21 mhenning[d]: Instruction form changes from ampere to blackwell: https://gitlab.freedesktop.org/mhenning/re/-/snippets/7832
22:21 mhenning[d]: yeah, looks like all the C forms are gone
22:21 gfxstrand[d]: airlied[d]: Replacing bindless cbufs with more UGPRs kinda makes sense. Most ALU can take them and it's probably easier in the hardware to access registers than this cbuf thing that can cache miss. They probably saved some hardware that way.
22:23 gfxstrand[d]: airlied[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/compiler/nak/opt_copy_prop.rs?ref_type=heads#L8
22:23 gfxstrand[d]: We probably want a `NoBindless` rule and set that on Blackwell+
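The proposed `NoBindless` rule can be modeled as follows. Copy propagation consults a per-source rule before folding a constant-buffer reference into an instruction, and on Blackwell+ bindless references would be refused. The enum and function names here are hypothetical Python, not NAK's Rust definitions in `opt_copy_prop.rs`.

```python
from enum import Enum, auto


class CBufRule(Enum):
    YES = auto()          # any cbuf reference may be folded in
    NO = auto()           # never fold a cbuf reference into this source
    NO_BINDLESS = auto()  # bound cbufs are fine, bindless ones are not


def may_fold_cbuf(rule, is_bindless):
    """Decide whether copy propagation may fold a cbuf operand into a source."""
    if rule is CBufRule.NO:
        return False
    if rule is CBufRule.NO_BINDLESS:
        return not is_bindless
    return True


def alu_cbuf_rule(blackwell_plus):
    """Hypothetical per-target choice for ALU sources."""
    return CBufRule.NO_BINDLESS if blackwell_plus else CBufRule.YES
```

If, as mhenning's form diff suggests, the bound cbuf forms are gone too, the target would pick `NO` instead.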
22:23 matt_schwartz[d]: if any other devs want to ssh into my Blackwell rig for any reason feel free to ping me for access since mohamexiety[d] is done with it for now
22:25 gfxstrand[d]: I may next week if airlied[d] doesn't sort it out between now and then.
22:25 gfxstrand[d]: Or I might order a 5070 to add to my GPU library
22:25 mhenning[d]: blackwell has some cool new stuff: `uffma__URURI_URURI`, `ldg_256_uniform__Ra32`
22:26 mhenning[d]: looks like texture ops got shuffled around a bit
22:28 gfxstrand[d]: I hope we don't end up with a new encoder for blackwell...
22:29 mhenning[d]: It doesn't look like it needs a new encoder, but a number of instruction forms are different
22:29 gfxstrand[d]: mhenning[d]: Oh, uffma?!? Nice! That was my #1 gripe with uniform instructions: They didn't have any float stuff.
22:29 gfxstrand[d]: If we have uffma, suddenly uniform gets a hell of a lot more interesting
22:30 mhenning[d]: yeah, it looks like uniform is a lot more capable
22:30 gfxstrand[d]: Nice!
22:38 mohamexiety[d]: a bit too tired atm so will stop for today. what do you think I should look into for tomorrow on the blackwell stuff?
22:39 mohamexiety[d]: I also may have figured a way to run Kuter's ISA gen tool so could try that for SM_120 tomorrow if it works. downside is it'll probably take a lot of time on regular machines (he mentioned it was a few hours on a 128c EPYC)
22:42 matt_schwartz[d]: mohamexiety[d]: does the work have to be done on the same unit as the nvidia gpu? I also have a threadripper on the network but it’s a different machine.
22:43 mohamexiety[d]: nah -- should be fully independent
22:45 mohamexiety[d]: (does need CUDA tho afaiu so it needs _a_ NV GPU in that sense)
22:48 mhenning[d]: I don't think it needs a gpu? It's just running the compiler and disassembler right?
22:50 mohamexiety[d]: doesn't it rely on CUDA being installed to get cuBLAS though?
22:51 mhenning[d]: You can install cuda without a gpu
22:52 mohamexiety[d]: oh I thought it'd just error out if it doesn't detect the NV driver/a NV GPU. that could be worth trying then
22:53 mhenning[d]: oh, I never use the official installer - I use the arch package
22:53 mhenning[d]: but yeah, I run the cuda compiler without the kernel module loaded all the time for RE stuff
22:55 mohamexiety[d]: yeah I see, that's super nice and convenient
22:58 matt_schwartz[d]: i could set up access to the 7970x too in that case (i have no idea what you're even talking about doing) 🫡
23:45 x512[m]: https://gitlab.freedesktop.org/mesa/mesa/-/commit/0915b3131fa65709733a66227ef9137235fc6d78
23:46 x512[m]: Is this trick even valid? Is it allowed to break single command group between FIFO segments?
23:47 x512[m]: Or it is fine as long no semaphore write or flush is done in between?