07:16snowycoder[d]: GTX 770 CTS run results: `Pass: 1134253, Fail: 27, Crash: 24, Warn: 12, Skip: 1709331, Timeout: 19, Flake: 95, Duration: 17:08:21`
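A summary line like the one above can be broken down with a few lines of Python. This is a minimal sketch: the function name `parse_cts_summary` and the choice to group Fail, Crash, and Timeout as the results worth a closer look are assumptions made for illustration, not part of any CTS tooling.

```python
def parse_cts_summary(line):
    """Split a 'Name: value, ...' summary line into integer counts and a duration string."""
    fields = dict(part.split(": ", 1) for part in line.split(", "))
    duration = fields.pop("Duration")
    counts = {name: int(value) for name, value in fields.items()}
    return counts, duration


summary = (
    "Pass: 1134253, Fail: 27, Crash: 24, Warn: 12, Skip: 1709331, "
    "Timeout: 19, Flake: 95, Duration: 17:08:21"
)
counts, duration = parse_cts_summary(summary)

# Tests that actually ran: everything except the skipped ones.
executed = sum(counts.values()) - counts["Skip"]
# Assumption: Fail, Crash and Timeout are the results that need a closer look.
bad = counts["Fail"] + counts["Crash"] + counts["Timeout"]
print(f"{bad} of {executed} executed tests need a closer look ({duration} total)")
```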
12:16gfxstrand[d]: What are the fails/crashes?
12:37zmike[d]: gfxstrand[d]: Mechanically they are very similar in dri, which is why they share some codepaths
14:21gfxstrand[d]: I don't know how anything works anymore. I refactor a bit of EGL code and now a thing that was segfaulting in GLX is working. 🤦🏻‍♀️
14:32gfxstrand[d]: Okay, steam doesn't work with DRI_PRIME=1 on AMD+AMD. It doesn't work with prime at all. I'm not gonna try to make it work.
14:33asuasuasu[d]: so uh, i've always had weirdness with anything chromium or CEF gpu selection in general tbh
14:33asuasuasu[d]: like DRI_PRIME= never seemed to do what i wanted and would spin up my dGPU when i don't want it to
14:34gfxstrand[d]: Yeah, Chrome is clearly doing some weird GLX shenanigans
14:34asuasuasu[d]: i really don't know what's up with that
14:34asuasuasu[d]: they have flags for GPU selection, but this didn't work for me
14:34asuasuasu[d]: my workaround was, uh.
14:34asuasuasu[d]: `alias igpu 'systemd-run -t -p "Type=oneshot" -p "InaccessiblePaths=/dev/dri/by-path/pci-0000:04:00.0-render" --user'`
14:35asuasuasu[d]: i don't use this for steam (it does run but god knows what it does to actually running games), but i successfully do get chromium/vsc/obsidian on the right GPU
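The same workaround can be wrapped in a small launcher script. The sketch below only rebuilds the `systemd-run` command line from the alias above; the function name `igpu_argv` is hypothetical, and the render node path is the one from that alias, so it will differ on other machines.

```python
import shlex


def igpu_argv(command, hidden_node="/dev/dri/by-path/pci-0000:04:00.0-render"):
    """Build a systemd-run command line that hides one render node from `command`."""
    return [
        "systemd-run", "-t",
        "-p", "Type=oneshot",
        "-p", f"InaccessiblePaths={hidden_node}",
        "--user",
        *command,
    ]


# Print the command instead of running it, so the sketch is safe to try anywhere.
print(shlex.join(igpu_argv(["chromium"])))
```

Passing the returned list to `subprocess.run` would launch the program with the dGPU render node hidden.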
14:40gfxstrand[d]: That's actually a pretty clever workaround. :silvy_sweat:
14:41karolherbst[d]: cleaned up my MR to eliminate 22% of the spills to memory in shader-db
14:42asuasuasu[d]: gfxstrand[d]: i was really pissed at my GPU fans ramping up and i was very amused that this worked
14:42karolherbst[d]: though the impact outside of parallel_rdp is rather small, lol
14:43karolherbst[d]: but it reduces slm size in No Man's Sky as well
14:49asuasuasu[d]: actually, maybe a silly question, if you have, say, an intel iGPU, an AMD dGPU, monitors plugged into the iGPU, and you run software via DRI_PRIME on the dGPU
14:49asuasuasu[d]: how much is the userland responsible for in all the iGPU<->dGPU shenanigans? is the userland GL stack of the iGPU involved at all, or is it all compositor/kernel side?
15:03gfxstrand[d]: Which process?
15:04gfxstrand[d]: In the client process, only the dGPU is involved. We might keep the iGPU open for `$reasons` but it's not used. The rendering is done on the dGPU and then copied using the dGPU into a linear dma-buf that lives in system RAM. The iGPU is never invoked. When the compositor gets that buffer, it's already visible to the iGPU and the dGPU is never involved in reading from it.
15:05asuasuasu[d]: that does answer my question nicely
15:05asuasuasu[d]: is it any more complicated for, say, a dGPU to different dGPU case?
15:05gfxstrand[d]: We may allocate the memory using the iGPU driver in some cases but that's an artifact of how we deal with display requirements with mobile GPUs where they are basically always doing PRIME.
15:06gfxstrand[d]: asuasuasu[d]: Nope. It's exactly the same.
15:06gfxstrand[d]: In theory, with SLI we could maybe share VRAM directly but we've never hooked that up in the kernel AFAIK.
15:07gfxstrand[d]: And there is also some potential if the two GPUs share tiling formats to share a tiled image directly instead of doing a copy. However, you still don't want to render straight to system RAM on a discrete card. Unless you're something as simple as glxgears, rendering in VRAM and copying to system RAM at the end is going to be faster than rendering straight to system RAM.
15:08asuasuasu[d]: makes sense
15:11gfxstrand[d]: snowycoder[d]: Just plugged in my 780. We'll see if I have the same bugs
15:16snowycoder[d]: gfxstrand[d]: Sorry, I have these:
15:16snowycoder[d]: dEQP-VK.info.device_extensions
15:16snowycoder[d]: dEQP-VK.api.driver_properties.conformance_version
15:16snowycoder[d]: dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.buffer.guard_nonlocal.image.frag
15:16snowycoder[d]: dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.buffer.guard_local.image.frag
15:16snowycoder[d]: dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.image.guard_local.buffer.frag
15:16snowycoder[d]: dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.image.guard_local.image.frag
15:16snowycoder[d]: The others are just instabilities I guess?
15:16snowycoder[d]: Maybe it was caused by running all the tests with the 0f pstate (the card was at 60 degrees 0_o)
15:17gfxstrand[d]: Okay, I've got 6 fails so far. Not sure what
15:19snowycoder[d]: message passing fails seem to be caused by instr latencies
15:29gfxstrand[d]: That's plausible
15:30gfxstrand[d]: On Maxwell+, there was some funky stuff around barrier where we had to wait some number of cycles before the barrier would take effect
15:57karolherbst[d]: gfxstrand[d]: submitted a nvk talk for XDC?
15:59gfxstrand[d]: Always
16:03gfxstrand[d]: snowycoder[d]: I'm also seeing some image reinterpret fails. Did those exist before?
16:04gfxstrand[d]: I'm not surprised. The r11g11b10 tests are horribly twitchy
16:06karolherbst[d]: gfxstrand[d]: I might want to sneak in for 5 minutes or so and give Nvidia a thanks for their help and opening up, or you do it, or other Nouveau folks or something
16:07gfxstrand[d]: Sure
16:07gfxstrand[d]: We can do a joint talk again if you want
16:07gfxstrand[d]: I should be there in-person this time so it wouldn't be too hard to do
16:07magic_rb[d]: Can't wait for XDC and meeting all you folks
16:08gfxstrand[d]: It's usually a good time
16:09karolherbst[d]: I still have to book my trip 😄
16:09karolherbst[d]: but it's like a train trip for me
16:09magic_rb[d]: I do have one ask, if I start turning yellow, let me know
16:09karolherbst[d]: anyway, I need to submit my talks, lol
16:09karolherbst[d]: Maybe I talk about SVM...
16:09karolherbst[d]: maybe I talk about coop matrix...
16:09magic_rb[d]: I tend to indulge in alcohol at these events and I've got Gilbert's syndrome, soooo lowered liver function
16:10karolherbst[d]: If I submit a coop matrix talk, might also bring it up myself
16:10karolherbst[d]: but also somebody has to review the patches 🙃
16:10gfxstrand[d]: But, yeah, I think one of the things I'll definitely talk about is getting the real latencies in there
16:10gfxstrand[d]: Also Blackwell
16:10karolherbst[d]: ahh
16:10karolherbst[d]: real latencies might also be a good spot to thank nvidia 🙃
16:10gfxstrand[d]: Yeah
16:13karolherbst[d]: magic_rb[d]: oh no
16:14karolherbst[d]: I think a lightning talk is enough for coop matrix, don't you think? 🙃
16:15karolherbst[d]: not even sure it's interesting enough, dunno
16:15gfxstrand[d]: Joint lightning talk with marysaka[d] ?
16:15karolherbst[d]: maybe
16:15magic_rb[d]: karolherbst[d]: It's fine, I just need to be told I'm turning into a Simpsons character
16:16karolherbst[d]: `number #50.` :ferrisUwU:
16:19marysaka[d]: yeah not sure if there is much to talk about on it so lightning should be good :aki_thonk:
16:19karolherbst[d]: the matrix layout is sure funky
16:19karolherbst[d]: but...
16:19karolherbst[d]: `cmat_convert` is a disaster
16:19marysaka[d]: yeah :blobcatnotlikethis:
16:21karolherbst[d]: and I think fp64 is slightly different but not sure.. it feels like it looks the same if you pretend it's fp32 vec2 values, but....
16:22karolherbst[d]: I should dig into it at some point
16:22karolherbst[d]: ohh and the tensor stuff is new.. maybe I'll get to it before XDC
16:22karolherbst[d]: mhhh yeah maybe there will be enough for a lightning talk
16:23karolherbst[d]: marysaka[d]: so wanna join the lightning talk?
16:23karolherbst[d]: ohh LDSM is also funky
16:23karolherbst[d]: like real funky
16:25karolherbst[d]: using LDSM nukes like 600 instructions in that one benchmark 🙃
16:25marysaka[d]: maybe? not sure what I could mention more than "I wrote a shader executor in python to test LDSM and other funny stuffs"
16:25karolherbst[d]: heh fair
16:25karolherbst[d]: I have done 0 reverse engineering on nvidia for this 🙃
16:27karolherbst[d]: that one shader looks sooo good
16:27karolherbst[d]: marysaka[d]: well can always change it later
16:37karolherbst[d]: ~~maybe we should move to 120 chars per line~~
16:38gfxstrand[d]: *grumble*
16:39gfxstrand[d]: What's the Rust default? 100? 120?
16:41karolherbst[d]: no idea 🙂 120 I'd bet
16:41karolherbst[d]: ohh it's more complicated
16:41karolherbst[d]: 100
16:42karolherbst[d]: but rustfmt sometimes wraps shorter lines
16:42karolherbst[d]: which sometimes annoys me actually 😄
16:42karolherbst[d]: could go with 100...
16:42karolherbst[d]: we have wide screens these days 😄
16:43karolherbst[d]: anyway.. I can probably clean up some of the coop matrix code, but it was a lot harder to read with 80
16:44karolherbst[d]: could maybe move more things into functions...
16:44gfxstrand[d]: I'm okay creeping over 80 if it genuinely is easier to read. There's a few places where that's true. And then there's a lot of really long lines that have very natural wrapping
16:45karolherbst[d]: yeah.. I should do another round of line width stuff
16:45karolherbst[d]: it helps with `determine_matrix_type` tho
16:45karolherbst[d]: mhh could add `is_fp16` and stuff...
16:46karolherbst[d]: thing is.. we should just use clang-format 😄
16:46karolherbst[d]: is it set up for nvk?
16:46gfxstrand[d]: karolherbst[d]: Yeah, that one is basically a table. Keep it as a table.
16:48karolherbst[d]: also.. maybe I should document `cmat_convert` a bit better...
16:49karolherbst[d]: ~~adding tensor floats~~
16:50mohamexiety[d]: karolherbst[d]: that one is bad. rustfmt just works fine, no?
16:50karolherbst[d]: I meant for the C code
17:12x512[m]: What time zone is most common here?
17:27gfxstrand[d]: People are all over
17:33zmike[d]: East Coast is best coast
17:36HdkR: North coast is pretty good as well
17:57gfxstrand[d]: I think most people are either NA east coast or central(ish) European. And then there's Dave hanging out in Australia.
17:57mohamexiety[d]: yeah it's a bit funny. timezone difference can range from 1 hour to 12 hours :KEKW:
17:58tiredchiku[d]: :catHiding:
17:58snowycoder[d]: Italy is bestaly 🇮🇹
17:59gfxstrand[d]: And then there's Mary, who's in France but I swear lives in a US timezone. 😂
17:59asuasuasu[d]: i was going to say timezones are a mere suggestion to people with broken sleep schedules :P
18:00gfxstrand[d]: And Karol lives on roughly Hawaii time, in spite of being in Europe.
18:00gfxstrand[d]: Time is an illusion. Lunchtime, doubly so.
18:24marysaka[d]: Does anyone have a cross file for a 32-bit build on Fedora? I'm fighting with paste being built for 64-bit and messing up the nil build :blobcatnotlikethis:
18:26HdkR: https://github.com/FEX-Emu/RootFS/blob/main/Scripts/Fedora/cross_x86 cross for 32-bit x86?
18:26HdkR: Might be a bit old at this point and missing some new rusty bits :D
18:26marysaka[d]: I tried fex one without much success sadly
18:26HdkR: faaah
18:26HdkR: I'll need to fix that.
18:27gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1393297719291936788/x86.cross?ex=6872a91e&is=6871579e&hm=c84078ed9872b5531dd8aec4a2ea86826ddd01d1632544fbf8bde923d5ca7e13&
18:27gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1393297719631810637/pkg-config-lib32?ex=6872a91e&is=6871579e&hm=9847554945c2b1b974428b8e52ee0db715bdbbd014362c2cbdfd64b9a7f1b74e&
18:28gfxstrand[d]: marysaka[d]: ^^
18:28marysaka[d]: huh so not using i686-redhat-linux-gnu-pkg-config I see
18:29gfxstrand[d]: My scripts are ancient. If you want to use the one Fedora packages, that's fine.
18:29marysaka[d]: mhmm, was mostly doing diff to see what might be different :aki_thonk:
18:30karolherbst[d]: marysaka[d]: I do
18:30HdkR: The bindgen arguments at least look important
18:31karolherbst[d]: marysaka[d]: https://gist.githubusercontent.com/karolherbst/8e424bdeccd6706e5cc76c1ac716f850/raw/5760514711a308a07ed24ba12d5777dbb9a2f711/gistfile1.txt
18:31marysaka[d]: we should maybe document it in the cross compile docs in mesa too
18:31HdkR: Did Fedora fix their recent 32-bit llvm dev package problems? It's one of the reasons why I haven't updated the FEX one in a while.
18:31marysaka[d]: karolherbst[d]: hmm isn't llvm-config-32 gone on F42 :aki_thonk:
18:31karolherbst[d]: HdkR: ohh good question
18:31karolherbst[d]: marysaka[d]: mhh yeah, but I used it before that!
18:33gfxstrand[d]: marysaka[d]: My cross file works on F42. I used it yesterday
18:33marysaka[d]: okay it works now 🎉
18:33gfxstrand[d]: HdkR: They did, finally.
18:34marysaka[d]: https://gist.github.com/marysaka/f0885d7ebe2e6ec36ab8c0c2eefd772f
18:34marysaka[d]: so yeah switched to rustc from rustup too
18:34marysaka[d]: but the pkgconfig is the only diff with Faith now
18:36HdkR: gfxstrand[d]: Nice. Now I just need Debian/Ubuntu to fix it :D
18:37gfxstrand[d]: marysaka[d]: I should switch my pkgconfig line. I've just been carrying that little wrapper around since the dawn of time.
18:41x512[m]: Is it a good idea to implement 2D shifting by allocating a 64x64 buffer and recording a command buffer with source -> buffer, buffer -> dest commands? Will it be 2 times slower?
18:42gfxstrand[d]: Yeah, that's about what you have to do for an actual 2D memmove
18:42gfxstrand[d]: IDK if 64x64 is the right size, though.
18:43x512[m]: Something that will fit in the GPU cache?
18:44gfxstrand[d]: Well, it's a trade-off. You want good caching but you also want the hardware to be able to go wide so you get good memory throughput.
18:44gfxstrand[d]: IDK where that trade-off is. You'd have to benchmark it.
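The staging-buffer approach discussed above can be sketched on the CPU as follows. This is a plain Python model, not GPU code: the image is a list of rows, `move_rect_2d` and its parameters are hypothetical names, and the tile size is left as a parameter because the right value would have to be benchmarked. The tile order is picked from the move direction, the same way `memmove` picks its copy direction, so writing a destination tile never clobbers source pixels that have not been read yet.

```python
def move_rect_2d(img, src, dst, size, tile=64):
    """Move a w x h rectangle inside `img` from `src` to `dst` via a staging tile.

    `img` is a list of rows (lists); `src` and `dst` are (x, y) origins.
    The rectangles may overlap.
    """
    (sx, sy), (dx, dy), (w, h) = src, dst, size
    xs = list(range(0, w, tile))
    ys = list(range(0, h, tile))
    # Walk tiles starting from the side we are moving towards, so every
    # overwritten source pixel belongs to a tile that was already read.
    if dx > sx:
        xs.reverse()
    if dy > sy:
        ys.reverse()
    for ty in ys:
        th = min(tile, h - ty)
        for tx in xs:
            tw = min(tile, w - tx)
            # Pass 1: source -> staging buffer.
            staging = [img[sy + ty + r][sx + tx:sx + tx + tw] for r in range(th)]
            # Pass 2: staging buffer -> destination.
            for r in range(th):
                img[dy + ty + r][dx + tx:dx + tx + tw] = staging[r]


# Shift a 6x6 block by (+2, +1) inside an 8x8 image using 4x4 tiles.
image = [[y * 8 + x for x in range(8)] for y in range(8)]
move_rect_2d(image, src=(0, 0), dst=(2, 1), size=(6, 6), tile=4)
```

Every pixel is copied twice in this scheme, once into the staging tile and once out of it. On a real GPU the two copies would also need to be ordered with barriers, which this model does not show.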
18:47karolherbst[d]: gfxstrand[d]: any opinions on leaving dead code around like the matrix sizes only available with int4? Could also just remove it and just leave a comment on what the encoding is in case anybody ever cares
19:10gfxstrand[d]: Generally, I like leaving enums around. No need to leave piles of code that generates nothing. But if it's just an enumerant, that's fine.
19:32orowith2os[d]: asuasuasu[d]: I'll be conversing with people on the other side of the world and still be up in four hours for work.
21:37snowycoder[d]: The only part in Kepler-NVK that I'm not confident in is the instruction timings.
21:37snowycoder[d]: Can't we just ask Nvidia for some tables or internal docs? They would be really helpful for stable drivers
21:43x512[m]: Is Kepler EOL?
21:43orowith2os[d]: For a long time, yeah
21:44orowith2os[d]: August of '21, looks like