02:11 airlied[d]: gfxstrand[d]: gonna poke doom by saying I think the only major problem left on blackwell is the bound vs bindless texture choosing in flow control
02:13 mhenning[d]: Does everything work then if you use bindless everywhere?
02:14 mhenning[d]: It's also not difficult to just check if the current block is divergent in nir as long as divergence information has been calculated
02:14 airlied[d]: I suppose that is worth a try, just seems horrible 🙂
02:14 airlied[d]: yes, though I don't think divergence info has been calculated at the time we lower textures
02:14 mhenning[d]: yeah, it's not a long-term solution but worth trying for bringup
02:14 airlied[d]: hence why I'm considering letting gfxstrand[d] figure out a happy strategy
02:15 mhenning[d]: you can just calculate divergence info then
02:16 mhenning[d]: `nir_metadata_require(impl, nir_metadata_divergence);`
02:22 airlied[d]: oh and some host image copy stuff might need more thought
02:32 gfxstrand[d]: airlied[d]: Really tempting fate there. 😉
02:33 gfxstrand[d]: When's the next branch point?
02:34 mhenning[d]: i feel like we just branched 25.1 yesterday
02:34 gfxstrand[d]: July 16
02:34 gfxstrand[d]: So we've got about 5 weeks
02:35 gfxstrand[d]: mhenning[d]: I'm fine with going 100% bindless for now.
02:36 airlied[d]: oh I see one thing I could do better: we do the handle load in uniform space while the texture instruction is in flow control, but I then add the new extra handle load in flow control too
02:36 gfxstrand[d]: As long as it's all landed by the branch point, that's fine. We can cherry pick the enable patch.
02:36 airlied[d]: I could try and stick it beside the original one
03:07 airlied[d]: okay my branch now has better divergence and inserts the load near the old one, and hacks around a legalization problem
03:20 airlied[d]: talos run, I'm done 😛
03:32 airlied[d]: okay from dEQP-VK.api*: dEQP-VK.api.copy_and_blit.sparse.image_to_image.array.array_to_array_whole_mipmap_d24_unorm_s8_uint,Crash
03:32 airlied[d]: dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic,Crash
03:32 airlied[d]: dEQP-VK.api.driver_properties.conformance_version,Fail
03:32 airlied[d]: dEQP-VK.api.object_management.multiple_shared_resources.device_group,Crash
03:32 airlied[d]: dEQP-VK.api.object_management.multiple_unique_resources.device_group,Crash
03:32 airlied[d]: dEQP-VK.api.object_management.multithreaded_per_thread_resources.device_group,Crash
03:32 airlied[d]: dEQP-VK.api.object_management.multithreaded_shared_resources.device_group,Crash
03:32 airlied[d]: dEQP-VK.api.object_management.single.device_group,Crash
03:38 airlied[d]: gfxstrand[d]: submitted https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35483 for the texture header rework
04:02 airlied[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35484 and that has some misc blackwell bits that we can probably land
04:06 airlied[d]: mhenning[d]: so if something is convergent, but inside flow control, my idea won't detect the problematic case 😦
04:07 airlied[d]: I suppose I could check if the block is divergent
04:17 mhenning[d]: Not totally sure what case you're talking about
04:19 airlied[d]: con 64 %13 = @ldc_nv (%11 (0x1), %12 (0x10)) (access=none, align_mul=8, align_offset=0, is_tex_ld=1)
04:19 mhenning[d]: now that I think about it checking block divergence actually might not be the right way to check if we can write a ureg - there might be edge cases where it's different
04:19 airlied[d]: {%r17 %r18} = ldc.unpack.b64 c[0x1][+0x10]
04:19 airlied[d]: it translates a convergent handle load into warp regs when inside flow control
04:20 mhenning[d]: yeah, you need to check control flow divergence
04:20 mhenning[d]: can't write a ureg from nonuniform control flow right now
04:21 airlied[d]: so the problem is when I do divergence analysis on the shader when we lower tex, the block gets marked as convergent
04:21 mhenning[d]: yeah, that might be the edge case I was worried about
04:22 airlied[d]: so the only safe option is to bail to bindless if the handle load is inside any flow control
04:22 mhenning[d]: you can have a loop with a convergent header but other blocks divergent later on, and nak will treat this case as if the whole loop is divergent
04:23 mhenning[d]: you could probably traverse the control flow tree and check for any loops with divergent breaks or continues
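The traversal mhenning describes can be illustrated with a toy model. This is only a sketch under loud assumptions: `Block`, `Jump`, `If`, `Loop`, `loop_has_divergent_jump` and `can_write_uniform` are hypothetical names invented for illustration, not NIR or NAK types, and the real rule for when a ureg can be written may differ.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Toy stand-ins for a structured control-flow tree. These are NOT NIR types;
# every name here is hypothetical and only models the idea from the chat.

@dataclass
class Block:
    name: str

@dataclass
class Jump:
    kind: str  # "break" or "continue"

@dataclass
class If:
    divergent: bool  # True if the condition can differ between invocations
    then_body: List["Node"] = field(default_factory=list)
    else_body: List["Node"] = field(default_factory=list)

@dataclass
class Loop:
    body: List["Node"] = field(default_factory=list)

Node = Union[Block, Jump, If, Loop]


def loop_has_divergent_jump(loop: Loop) -> bool:
    """True if a break/continue belonging to `loop` sits under a divergent if."""
    def walk(nodes, under_divergent_if):
        for n in nodes:
            if isinstance(n, Jump):
                if under_divergent_if:
                    return True
            elif isinstance(n, If):
                inner = under_divergent_if or n.divergent
                if walk(n.then_body, inner) or walk(n.else_body, inner):
                    return True
            # Nested loops own their jumps, so do not descend into them here.
        return False
    return walk(loop.body, False)


def can_write_uniform(enclosing: List[Node]) -> bool:
    """`enclosing` lists the If/Loop nodes around an instruction, outermost first."""
    for n in enclosing:
        if isinstance(n, If) and n.divergent:
            return False
        # Conservative: treat the whole loop as divergent, even its header.
        if isinstance(n, Loop) and loop_has_divergent_jump(n):
            return False
    return True
```

For example, a loop whose header block is convergent but which contains `If(divergent=True, then_body=[Jump("break")])` later in its body is rejected as a whole, which matches the conservative treatment described above.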
04:24 mhenning[d]: airlied[d]: this would also work for now
04:24 airlied[d]: that seems above my "not a compiler person" threshold, though I'm not sure how best in nir to detect a block is inside flow control
04:26 mhenning[d]: that's fair
04:27 mhenning[d]: I'll think about it a bit. We might be able to do some work to align nir's concept of block divergence with nak's concept of block divergence
04:33 airlied[d]: I've got the super hack in my branch, if the instr isn't in the start block
04:52 airlied[d]: probably time to start using the 5080 in a real box instead of testing on this laptop, CTS is estimating 16hours
06:27 mohamexiety[d]: airlied[d]: anything I could poke around with Blackwell?
06:30 airlied[d]: Still some misc CTS fails, but not sure if there is anything super big left other than turning the patches in my branch into acceptable submissions
06:33 airlied[d]: otherwise run CTS, pick a fail or a crash and dig in 🙂
06:34 airlied[d]: Pass: 186189, Fail: 5234, Crash: 104, Skip: 200827, Timeout: 54, Flake: 92, Duration: 1:45:41, Remaining: 11:02:10 is where mine is at, so still a few gaps
06:34 airlied[d]: if I had to guess, something with multisample sample positions might have moved
06:35 mohamexiety[d]: airlied[d]: Why is it so slow 😮
06:35 mohamexiety[d]: But yeah fair
06:55 airlied[d]: Pre release Laptop and I'm not sure if the CPU per mgmt is working very well!
06:57 airlied[d]: I'll get the 5080 in a real box running tomorrow
07:05 mohamexiety[d]: https://forums.developer.nvidia.com/t/blackwell-integer/320578/137 this may be useful
07:13 airlied[d]: I think I have the Blackwell latencies spreadsheets, just have to do the typing
09:54 airlied[d]: okay also some instr encoding errors in the dmesg, so have to track down those
10:09 mohamexiety[d]: airlied[d]: Could you please send the list of failed tests when it’s done btw?
10:09 mohamexiety[d]: (If you will stop it before the full 11hr :KEKW:)
10:45 pavlo_kozlenko[d]: Will I be able to use my Kepler GT 630 video card to record video in OBS via VA-API?
11:02 gfxstrand[d]: mhenning[d]: I don't remember why I did that. I'm sure I had a reason but I don't remember offhand.
13:23 karolherbst[d]: airlied[d]: did you also get the updates for ampere?
16:03 karolherbst[d]: yeah...maybe I should use that because otherwise the code will be too messy
16:05 mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1382752643566534686/IMG_0459.jpg?ex=684c4c40&is=684afac0&hm=29915c1512793463d6b2abc83028f847112c094d096f908573d8d7cbc4b47612&
16:05 mohamexiety[d]: Sorry for phone pic but anyone know what this is?
16:06 mohamexiety[d]: Trying to run Blackwell and Ada at the same time
16:07 mohamexiety[d]: I guess let me try 6.16rc first though
16:38 mohamexiety[d]: yeah idk, there's something really weird
16:39 mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1382761134297452696/message.txt?ex=684c5428&is=684b02a8&hm=f6ff24acd551a902d3b21b16c588f85bcfdb5fbedb384333d3965ce463686e06&
16:41 mohamexiety[d]: RTX 5070 (GB205 at 0000:01:00.0) + RTX 4000 Ada (AD103 at 0000:07:00.0), displays are 1 monitor connected via DP in the RTX 4000 Ada, and 1 monitor via HDMI to the iGPU
16:41 mohamexiety[d]: with 6.16 I could boot at least, so it does recover somewhat? but with 6.14.9 I couldnt boot
16:43 tiredchiku[d]: doesn't 6.14 not have GSP 570 support? or did it get backported
16:43 mohamexiety[d]: it doesnt but it shouldnt matter as I dont really have anything connected to the 5070
16:45 mohamexiety[d]: beyond that dmesg spam and segfault in gnome, it looks like nvk and all initializes properly :thonk:
16:45 mohamexiety[d]: Device Properties and Extensions:
16:45 mohamexiety[d]: =================================
16:45 mohamexiety[d]: GPU0:
16:45 mohamexiety[d]: VkPhysicalDeviceProperties:
16:45 mohamexiety[d]: ---------------------------
16:45 mohamexiety[d]: apiVersion = 1.4.305 (4210993)
16:45 mohamexiety[d]: driverVersion = 25.0.7 (104857607)
16:45 mohamexiety[d]: vendorID = 0x10de
16:45 mohamexiety[d]: deviceID = 0x27b2
16:45 mohamexiety[d]: deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
16:45 mohamexiety[d]: deviceName = NVIDIA RTX 4000 Ada Generation (NVK AD104)
16:45 mohamexiety[d]: pipelineCacheUUID = 2de8bc90-8b13-1363-a64f-292164f5411b
17:59 mohamexiety[d]: hm another weird thing, it doesn't like the 5070, and also doesn't load in the newer gsp. ugh..
18:00 mohamexiety[d]: [ 4.781502] nouveau 0000:01:00.0: NVIDIA GB205 (1b5000a1)
18:00 mohamexiety[d]: [ 4.781536] nouveau 0000:01:00.0: gsp ctor failed: -2
18:00 mohamexiety[d]: [ 4.781539] nouveau 0000:01:00.0: probe with driver nouveau failed with error -2
18:08 mohamexiety[d]: mohamexiety[d]: one thing I wonder is if maybe it's getting confused because while the RTX 4000 is normally AD104, this particular GPU is based on a cut down AD103
18:14 mohamexiety[d]: also I deleted every single instance of 535 GSP in /lib/firmware/nvidia and somehow it's still loading in 535 .-.
18:14 mohamexiety[d]: [ 5.072300] nouveau 0000:07:00.0: gsp: RM version: 535.113.01
18:16 mhenning[d]: maybe the firmware is getting loaded from initrd or something?
18:16 mhenning[d]: do you load nouveau before or after mounting root
18:27 mohamexiety[d]: ...yeah. I am dumb, sorry
18:28 mohamexiety[d]: interestingly I didnt need to do this in the past on this system, it would regenerate by itself. but when I regenerated it after you asked and rebooted, it properly loaded in the newer GSP and both GPUs are running it. 5070 also initialized now
18:28 mohamexiety[d]: mohamexiety[d]: there's still this -- no idea what it is. the session recovers after it and everything is working fine but could be relevant, not sure
20:23 airlied[d]: Pass: 1354318, Fail: 37194, Crash: 374, Warn: 5, Skip: 1458391, Timeout: 339, Flake: 775, Duration: 12:28:52, Remaining: 0
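A summary line like the one above can be turned into a rough pass rate. This is a minimal sketch: `parse_counts` and `pass_rate` are hypothetical helpers, and the choice to count every non-skipped status (including Flake and Timeout) as an executed test is an assumption, not something the test runner defines.

```python
import re

# Summary line copied from the chat message above.
SUMMARY = ("Pass: 1354318, Fail: 37194, Crash: 374, Warn: 5, Skip: 1458391, "
           "Timeout: 339, Flake: 775, Duration: 12:28:52, Remaining: 0")


def parse_counts(line):
    """Collect 'Name: integer' pairs; 'Duration' is skipped because its
    value is not a plain integer followed by a comma or end of line."""
    pairs = re.findall(r"(\w+): (\d+)(?=,|$)", line)
    return {name: int(value) for name, value in pairs if name != "Remaining"}


def pass_rate(counts):
    """Fraction of executed tests that passed (assumption: executed means
    every status except Skip)."""
    executed = sum(v for name, v in counts.items() if name != "Skip")
    return counts["Pass"] / executed if executed else 0.0


counts = parse_counts(SUMMARY)
print(counts)
print(f"{pass_rate(counts):.2%} of executed tests passed")
```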
20:25 airlied[d]: https://people.freedesktop.org/~airlied/scratch/failures.csv.gz
20:41 mohamexiety[d]: damn you actually left it the whole time. thanks!
20:42 airlied[d]: any ds/sparse ones should be fixed
20:44 airlied[d]: definitely something multisample/sample_locations still broken
20:52 skeggsb9778[d]: i'm impressed it survived 😛
20:58 airlied[d]: indeed, one reason I just let it go, that and I was asleep 🙂
20:58 airlied[d]: also with the failures list there are a lot of sideswipes, so not all the fails are real
22:23 airlied[d]: okay fixed one instruction encoding with suatmo
23:37 airlied[d]: multisample was another instruction encoding