00:02 airlied[d]: I think bisecting when it started would have the biggest chance of fixing it
00:05 jStefan: I have no idea what version was the first to cause those errors.
00:07 jStefan: going further back in time may also risk using a less mature version of the driver
01:32 airlied[d]: okay not trivially reproducing on tu104 at least, will try tu11x more
01:36 jStefan: is that to me?
01:56 airlied[d]: no the other bug
01:57 airlied[d]: the chances of me looking at fermi are very low, got enough problems in life 😛
02:01 jStefan: :'(
03:35 airlied[d]: esdrastarsis[d]: what kernel was this on? Running those tests on mesa main with 7.1-rc1 on a tu116 isn't throwing me any problems
10:13 airlied[d]: https://paste.centos.org/view/raw/f06b21a3 tu116 cts fails
10:17 airlied[d]: but didn't see any page faults, just two [17140.102215] nouveau 0000:09:00.0: gsp: Xid:13 Graphics SM Warp Exception on (GPC 1, TPC 0, SM 0): Misaligned Address
10:17 airlied[d]: [17140.102261] nouveau 0000:09:00.0: gsp: Xid:13 Graphics Exception: ESR 0x50c730=0x500000f 0x50c734=0x0 0x50c728=0x4c1ab72 0x50c72c=0x174
10:17 airlied[d]: [17140.103372] nouveau 0000:09:00.0: gsp: rc engn:00000001 chid:60 gfid:0 level:2 type:13 scope:1 part:233 fault_addr:0000003ffdf5c034 fault_type:ffffffff
10:17 airlied[d]: [17140.103380] nouveau 0000:09:00.0: fifo:000000:003c:003c:[deqp-vk[148287]] errored - disabling channel
10:17 airlied[d]: [17140.103393] nouveau 0000:09:00.0: deqp-vk[148287]: channel 60 killed!
10:17 airlied[d]: [17151.575573] nouveau 0000:09:00.0: gsp: Xid:13 Graphics SM Warp Exception on (GPC 1, TPC 0, SM 0): Misaligned Address
10:17 airlied[d]: [17151.575622] nouveau 0000:09:00.0: gsp: Xid:13 Graphics Exception: ESR 0x50c730=0x500000f 0x50c734=0x0 0x50c728=0x4c1ab72 0x50c72c=0x174
10:17 airlied[d]: [17151.576713] nouveau 0000:09:00.0: gsp: rc engn:00000001 chid:57 gfid:0 level:2 type:13 scope:1 part:233 fault_addr:0000003ffdf5c034 fault_type:ffffffff
10:17 airlied[d]: [17151.576721] nouveau 0000:09:00.0: fifo:000000:0039:0039:[deqp-vk[148475]] errored - disabling channel
10:18 airlied[d]: [17151.576736] nouveau 0000:09:00.0: deqp-vk[148475]: channel 57 killed!
10:20 airlied[d]: I also expect some of those fails are sideswipes from other fails
10:22 airlied[d]: dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.compute.opselect_different_strides generates the Xid
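A quick way to sanity-check the "Misaligned Address" report is to pull `fault_addr` out of the `rc` lines pasted above and see which power of two it is aligned to. The sketch below does only that. It assumes `fault_addr` is the address of the offending access, which the log itself does not confirm, and the helper names (`parse_faults`, `natural_alignment`) are made up for illustration.

```python
import re

# The two "rc" lines from the log above, with the timestamp and device prefix trimmed.
LOG = """\
gsp: rc engn:00000001 chid:60 gfid:0 level:2 type:13 scope:1 part:233 fault_addr:0000003ffdf5c034 fault_type:ffffffff
gsp: rc engn:00000001 chid:57 gfid:0 level:2 type:13 scope:1 part:233 fault_addr:0000003ffdf5c034 fault_type:ffffffff
"""

# Capture the channel id (decimal) and the fault address (hex) from each line.
FAULT_RE = re.compile(r"chid:(\d+).*?fault_addr:([0-9a-fA-F]+)")


def natural_alignment(addr: int) -> int:
    """Return the largest power of two that divides addr (0 if addr is 0)."""
    return addr & -addr


def parse_faults(text: str) -> list[tuple[int, int]]:
    """Return (channel id, fault address) pairs found in the text."""
    return [(int(m.group(1)), int(m.group(2), 16)) for m in FAULT_RE.finditer(text)]


faults = parse_faults(LOG)
for chid, addr in faults:
    print(f"chid {chid}: fault_addr {addr:#x} is {natural_alignment(addr)}-byte aligned")
```

Both faults report the same address, and it is 4-byte aligned but not 8-byte aligned. Whether that matters depends on the width of the access that faulted, which this log does not show.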
11:24 gfxstrand[d]: Mind filing an issue if it’s reproducible? We shouldn’t be getting misaligned addresses in the CTS
12:11 chikuwad[d]: decided to just use meson devenv but I'm still not able to get deqp-gles31 to pick llvmpipe 🤔
12:11 chikuwad[d]: still uses zink w/ nvk
12:12 chikuwad[d]: am I going to have to build without nvk/zink
12:14 mohamexiety[d]: you should always have llvmpipe i think, it should be the last gfx device exposed
12:14 chikuwad[d]: it is, yes
12:15 chikuwad[d]: but I need to run the gles3.1 CTS with llvmpipe
12:15 chikuwad[d]: and it's just been picking zink w/ nvk
12:16 mohamexiety[d]: the vk cts has a config option to choose the device, doesn't the gles cts have something similar?
12:19 jannau: does `LIBGL_ALWAYS_SOFTWARE=true` work
12:20 chikuwad[d]: mohamexiety[d]: not that I can find
12:21 chikuwad[d]: jannau: that does seem to help, ty
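For scripted runs, the suggestion above amounts to setting `LIBGL_ALWAYS_SOFTWARE=true` in the environment of the CTS process. A minimal sketch follows, assuming a locally built `deqp-gles31` binary. The binary path and the case pattern in the usage comment are placeholders, not values taken from this conversation.

```python
import os
import subprocess


def software_gl_env(base=None):
    """Copy an environment and force Mesa's software rasterizer in the copy."""
    env = dict(os.environ if base is None else base)
    env["LIBGL_ALWAYS_SOFTWARE"] = "true"
    return env


def run_deqp(binary, case, env):
    """Run one dEQP case pattern with the given environment."""
    return subprocess.run([binary, f"--deqp-case={case}"], env=env, check=False)


# Usage (hypothetical path and case pattern, adjust to the local build):
# result = run_deqp("./deqp-gles31", "dEQP-GLES31.info.*", software_gl_env())
# print(result.returncode)
```

Building the environment as a copy keeps the variable scoped to the CTS process, so other GL clients started from the same shell still pick the hardware driver.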
12:25 phomes_[d]: About xcom2: I looked deeper and see that several other passes show the same pattern, where draw call duration increases with each call.
12:25 phomes_[d]: Render pass 2 has the same pattern up to a point where it resets back to half of the max duration in the pass, and then even gradually drops.
12:25 phomes_[d]: I will continue to look more into it.
12:26 phomes_[d]: For gfxreconstruct I have not been able to get it to work. I have tried several combinations of versions/commits of nvk and gfxreconstruct. When I do a full capture from the start I am able to replay it up to where it should show the main menu. There it crashes with:
12:26 phomes_[d]: `[gfxrecon] FATAL - API call at index: 458123 thread: 5 vkCreateImageView returned error value VK_ERROR_INVALID_OPAQUE_CAPTURE_ADDRESS that does not match the result from the capture file: VK_SUCCESS. Replay cannot continue.
12:26 phomes_[d]: Replay has encountered a fatal error and cannot continue: A buffer creation or memory allocation failed because the requested address is not available. A shader group handle assignment failed because the requested shader group handle information is no longer valid.`
12:28 karolherbst[d]: phomes_[d]: well you can also just make a dump of the full thing
12:28 karolherbst[d]: well
12:28 karolherbst[d]: for one frame
12:29 chikuwad[d]: guh, it might legit just be easier for me to test on radv
12:29 phomes_[d]: if I trigger it to grab just one frame then I just get a greenish image that looks like some texture
12:29 karolherbst[d]: I mean when running the game
12:29 karolherbst[d]: it's going to be a looooooot, but maybe that's fine 🙃
12:30 karolherbst[d]: it's just about finding a pattern, and it's better than nothing
12:30 karolherbst[d]: but maybe X4 also shows the same behavior?
12:30 karolherbst[d]: could do the capturing in other games and see if you can dump it easily there
12:30 phomes_[d]: one thing I could try is to load the renderdoc capture with NVK_DEBUG=push_dump. Then grab the output of a single timing
12:30 karolherbst[d]: yeah
12:31 karolherbst[d]: or that
12:31 phomes_[d]: I will do both. I think I will create an issue with all the data about xcom2, and then try to look for the same pattern in a native vulkan game
12:32 karolherbst[d]: but yeah.. if we have a perf issue with big render passes that would be great, because uhm.. I suspect this will improve perf across the board
12:38 phomes_[d]: also the spikes where the duration is higher than on prop for some passes correlate with the number of draw calls in the pass. So maybe some weirdness with timestamps causes renderdoc to compare the start/end of a draw call to the wrong timestamp
12:42 karolherbst[d]: yeah....... that can also be the case
13:01 zmike[d]: phomes_[d]: -m rebind
13:01 zmike[d]: or disable descriptor buffer
13:02 karolherbst[d]: but yeah.. maybe the way we do timestamps is just wrong 🙃
13:03 chikuwad[d]: yeah, I'm gonna test these CTS fails with zink/radv
13:03 chikuwad[d]: I'm not able to reproduce them on llvmpipe/softpipe for some reason
13:04 zmike[d]: gfxr does not have usable DB or heap support so you'll have to disable it for now
17:04 nazikiller8492: <3
17:10 chikuwad[d]: <3
18:54 mohamexiety[d]: mhenning[d]: gfxstrand[d] added all the r-b tags and tested by tags, do i assign marge and go?
18:54 mohamexiety[d]: for the gstream mr
18:54 anholt: anyone up for taking a quick look at the pile of nv fails in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41243
18:54 mhenning[d]: mohamexiety[d]: I think so yeah
18:55 mohamexiety[d]: got it, thanks!
19:01 mhenning[d]: anholt: I'll see if I can reproduce those locally. I've been running cts 1.4.5.3 for a bit but I normally run against the vulkan headless platform which might be why I haven't seen those issues.
19:01 anholt: mhenning[d]: thanks! There's a list of cherry-picks in the MR -- sure hope those don't regress things, but who knows?
19:04 mohamexiety[d]: yeah i dont think anyone here runs wsi tests. i wonder what happened, these are a lot of fails 🐸
19:08 anholt: the memory allocation failure I saw on all the logs in the job artifacts was a different signature than I've seen from wsi fails before.
20:44 gfxstrand[d]: mohamexiety[d]: Yup
20:45 mohamexiety[d]: yep done
20:45 mohamexiety[d]: thanks for the reviews!
21:01 airlied[d]: gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15369 done
21:15 airlied[d]: it's weird, the stores seem aligned correctly