09:11 kode54: I'm wondering how adequately RDNA3 is supported by Linux nowadays
09:11 kode54: or would I be better off seeking an RDNA2 card instead?
09:28 hakzsam: best support is RDNA2 for sure, but RDNA3 is going great as well (as far as RADV is concerned)
10:00 soreau: MrCooper: Did you find out any other information about using scissor with the fast clear path in radeonsi? It usually works fine, just this one case happens to demonstrate the bug I've been experiencing
10:01 MrCooper: nope, I thought fast clear couldn't work with scissor, I might be wrong though
10:02 soreau: like I said, I think it has something to do with using slow clear immediately followed by fast clear (without glFlush in between) and then rendering
10:03 soreau: the rendering either happens before the second clear or just is skipped somehow
10:04 soreau: but eh, I actually made a test where I rendered, then made a pixel buffer of all red pixels, used glReadPixels to write it and it came back all 0's
10:04 soreau: of course only in this 'perfect storm' case
10:09 soreau: what GFX version is RX580 anyway?
10:10 soreau: because si_fast_clear() in si_clear.c definitely takes different paths depending on the chip
10:30 soreau: looks like GFX8
10:33 soreau: I guess vi_dcc_enabled() means DCC enabled on volcanic islands?
11:38 pepp: soreau: yes, GFX8
11:39 soreau: pepp: thanks
11:39 pepp: soreau: would you be able to confirm that this issue was present in older Mesa versions?
11:39 soreau: pepp: I tried one from 2022, about a year old, and even tried different llvm versions
11:39 soreau: it was still present
11:41 soreau: but I'm pretty confident it's in the slow clear->fast clear path, something to do with scissor set
11:41 soreau: pepp: do you happen to know if the dimensions in the si_clear.c clear functions should represent the scissor rect?
11:42 soreau: because I don't think they do, but 'crossing' the 'too_small' barrier that seems to go from slow to fast seems to trigger the problem
11:43 pepp: the "clear" function from pipe_context says "* \param scissor_state the scissored region to clear"
11:45 soreau: pepp: but specifically, I mean this part https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/drivers/radeonsi/si_clear.c#L658
11:45 soreau: it says fb.. but does this represent the unscissored fb or what?
11:45 pepp: soreau: unscissored I think
11:46 soreau: ok, that's what my debug messages showed
11:47 soreau: I notice in si_clear(), there is a scissor_state arg but it's unused
11:47 soreau: I guess this isn't a problem?
11:48 pepp: radeonsi doesn't support PIPE_CAP_CLEAR_SCISSORED so AFAICT scissor_state should be NULL
11:48 pepp: (see st_Clear)
11:49 soreau: did you get a chance to review the issue report? https://gitlab.freedesktop.org/mesa/mesa/-/issues/9830 specifically this part https://git.sr.ht/~soreau/wl-gears/tree/radeonsi-bug-repro/item/gears.c#L1044-1062
11:49 pepp: yes I did take a look but couldn't repro on my main machine (because it's not gfx8)
11:49 pepp: it's on my todo list...
11:50 soreau: well the comments in that snippet of code show how the scissor is somehow involved
11:50 soreau: pepp: great
11:51 pepp: if you use scissor, st_Clear won't use the st->clear() functions but instead it will clear by drawing quads
11:51 pepp: and this issue looks similar to the one fixed by https://gitlab.freedesktop.org/mesa/mesa/-/commit/573d6451335a0b3c947aa2823619e31017d0362c but in a different setting
11:52 soreau: not sure exactly what you mean but this patch fixes it http://ix.io/4GHQ basically bypassing the fast clear path altogether afaik (since there's no env var to do it)
11:53 pepp: hmm ... actually I misread the comment in gears.c :)
11:53 soreau: ah
11:54 soreau: I guess it could have been clearer
11:54 soreau: no pun intended :)
11:56 pepp: do your comments mean that adding only "glScissor(0, 0, window->window_size.width, 1);" is enough to fix flickering?
11:58 soreau: pepp: no, commenting out that line fixes the problem
11:59 soreau: the code as it stands, causes the flickering
11:59 pepp: ok got it
11:59 soreau: 👍
12:02 soreau: pepp: but if you have any wild ideas for a mesa patch before you get a chance to test locally, I would be willing to try
12:02 soreau: or if you want some info printed etc
12:04 KitsuWhooa: I wonder if my issue with flickering on compiz is the same one
12:05 soreau: hm, maybe try starting X with AMD_DEBUG=notiling to see
12:05 soreau: or just whatever app is flickering
12:06 KitsuWhooa: I suspect it'll be fixed by the time I have access to my computer with the RX580, but I'll keep it in mind and test before updating mesa
12:07 KitsuWhooa: It seems to be any window, randomly
12:07 soreau: during resize?
12:07 KitsuWhooa: nah, just sitting
12:07 soreau: huh, that's probably a bitrotting glx/tfp :P
12:07 KitsuWhooa: mmm
12:08 soreau: does compiz even do the present dance?
12:08 KitsuWhooa: I have absolutely no idea. I just watched the video in the bug report and went "that looks very familiar"
12:14 pepp: soreau: I assume that "mesa_glthread=false GALLIUM_THREAD=0 ./wl-gears" doesn't modify the outcome?
12:15 soreau: pepp: nope, still flickers
12:21 pepp: soreau: AMD_DEBUG=nodcc makes it worse, right? What about AMD_DEBUG=nodccclear?
12:21 soreau: pepp: it uses the CMASK path
12:22 soreau: AMD_DEBUG=nodccclear has no effect
12:24 soreau: but yes, AMD_DEBUG=nodcc makes it to where when it reaches a certain size (> 512 * 512?) it disappears completely and doesn't show again until less than that size
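The size limit soreau describes here, together with the 'too_small' barrier mentioned earlier, can be modeled as a simple pixel-count check. This is only a sketch of the heuristic as discussed in the chat, not the actual si_clear.c code: the 512 * 512 threshold is soreau's own guess, and the function name and the "slow"/"fast" labels are hypothetical.

```python
# Assumption: the slow/fast clear switch happens at 512 * 512 pixels,
# as soreau guesses in the log. Not taken from si_clear.c.
TOO_SMALL_PIXELS = 512 * 512


def clear_path(width, height, threshold=TOO_SMALL_PIXELS):
    """Return which clear path a surface of this size would take in this model."""
    too_small = width * height <= threshold
    return "slow" if too_small else "fast"


# Resizing a window across the threshold flips the path, which is the
# transition soreau suspects of triggering the bug.
for size in [(400, 400), (512, 512), (513, 512), (800, 600)]:
    print(size, clear_path(*size))
```

In this model a window that grows from 512x512 to 513x512 moves from the slow path to the fast path, which matches the "crossing the barrier" behaviour described above.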
12:25 pepp: amusing... I can reproduce the nodcc issue on gfx9
12:26 soreau: you mean it disappears after passing the certain limit?
12:26 soreau: size limit
12:27 pepp: but not the flickering
12:27 pepp: yes
12:27 soreau: it's a clue, at least..
12:27 soreau: interesting indeed
12:27 pepp: the cmask thing is odd; unless I'm misremembering it should only be used for msaa surfaces
12:28 pepp: can you run the app with AMD_DEBUG=tex and upload the output?
12:28 soreau: pepp: AMD_DEBUG=nodcc,notiling fixes things
12:28 soreau: if that's the right way to set those
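For reference, AMD_DEBUG takes a comma-separated list of flags, so nodcc,notiling is a valid way to combine the two. A minimal sketch of building such an environment for a test run follows; the helper name amd_debug_env is hypothetical, and the flags are the ones mentioned in this log.

```python
import os


def amd_debug_env(flags, base_env=None):
    """Return a copy of the environment with AMD_DEBUG set to a comma-separated flag list."""
    env = dict(os.environ if base_env is None else base_env)
    env["AMD_DEBUG"] = ",".join(flags)
    return env


env = amd_debug_env(["nodcc", "notiling"], base_env={})
print(env["AMD_DEBUG"])  # nodcc,notiling
```

The resulting dictionary can be passed as the env argument of subprocess.run(["./wl-gears"], env=...) to launch the reproducer with those flags set.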
12:29 soreau: yes
12:30 soreau: pepp: http://ix.io/4GO0
12:33 pepp: thanks, will take a look in a moment
12:33 soreau: sure
12:38 soreau: maybe somehow vi_dcc_enabled() is not right so it always returns false on RX580/GFX6-8
12:47 soreau: oh it does return true sometimes
12:54 Wallbraker: Hello hello, trying to make a little bit of sense of the output of RadeonGPUProfiler. I rewrote a shader and am looking at a before and after. Before I had a much larger UBO and dispatched a single instance with a Z of 1 to select between two large chunks of data. Now I flattened the shader so dispatched two with a Z of 1 and instead had two UBOs with the different data. What I noticed was that my occupancy was effectively halved. This is with a
12:54 Wallbraker: Vega64. The shader is probably very much memory limited, so wondering essentially if hitting a memory bandwidth limit shows up as lower occupancy?
13:28 Venemo: Wallbraker: it's difficult to answer your question without looking at the code. Occupancy depends on register usage and shared memory usage mostly.
13:33 Wallbraker:uploaded an image: (114KiB) < https://matrix.org/_matrix/media/v3/download/matrix.org/MNjrxajnacUoCtoDByAwfWzS/layer-squasher-compare_01.png >
13:33 Wallbraker:uploaded an image: (245KiB) < https://matrix.org/_matrix/media/v3/download/matrix.org/qUiUZKsuEFqniaFjsTqhrYbm/layer-squasher-compare_02.png >
13:33 Wallbraker:uploaded an image: (550KiB) < https://matrix.org/_matrix/media/v3/download/matrix.org/PSpNnxgElSSbQlfOGSHmnuVi/layer-squasher-compare_03.png >
13:34 Wallbraker: https://gitlab.freedesktop.org/monado/monado/-/blob/f4aca1cc830ffd6f1fe6d4e27570fd8b6ef6c207/src/xrt/compositor/shaders/layer.comp
13:34 Wallbraker: New version of the shader is there.
13:34 Wallbraker: Old: https://gitlab.freedesktop.org/monado/monado/-/blob/main/src/xrt/compositor/shaders/layer.comp
13:39 Wallbraker:uploaded an image: (189KiB) < https://matrix.org/_matrix/media/v3/download/matrix.org/JRVugbsLDljaEfvGxtKbJoDo/layer-squasher-compare_03.png >
13:43 Wallbraker:uploaded an image: (256KiB) < https://matrix.org/_matrix/media/v3/download/matrix.org/vgDYbwBZmxzyFIilVmfDdjBw/layer-squasher-compare_02.png >
13:43 mareko: pepp: CMASK is used for fast clear without MSAA on older chips
13:44 Wallbraker: The shaders should have full occupancy, but it looks like they only use half of the available shader units on the device (if you look at the first image).
14:04 pepp: mareko: oh ok, thx
14:40 Venemo: Wallbraker: can you upload the RGP captures somewhere so I can take a look?
14:43 Venemo: it is not immediately obvious why the new version would have half the occupancy
14:43 Venemo: it seems like both versions are 10/10
14:43 Wallbraker: Sure
14:45 Wallbraker: Venemo: https://drive.google.com/file/d/1mvdLN4M1yvMTSzI8wi5zqlTNP_NUvdin/view
14:45 Wallbraker: And made it accessible
14:56 Venemo: Wallbraker: I don't see why RGP reports half occupancy, but at the same time I don't fully get why the second version should be better
14:57 Venemo: the shaders look very similar, it just seems you split it into two dispatches instead of one
14:58 Venemo: the total number of wavefronts is also the same
14:59 Wallbraker: Yeah, it's strange. I did reduce the number of branches, by reordering the shader a bit.
15:02 Venemo: not sure if the occupancy graph is buggy or what
15:02 Wallbraker: Hmm could be
15:03 Wallbraker: Thanks for looking into it!
15:03 Venemo: mareko: why does RGP display about 50% occupancy in the graph when the shader's occupancy is 10/10 here?
15:05 Wallbraker: Kernel: 6.4.10-060410-generic
15:05 Wallbraker: Mesa: 23.1.7~kisak1~j
16:55 mareko: Venemo: 10/10 in what units?
16:55 Wallbraker: Run 10 wavefronts out of 10 wavefronts per SIMD.
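The "10/10" figure and Venemo's remark that occupancy depends mostly on register usage can be illustrated with a rough calculation. The sketch below assumes GCN-style limits (a budget of 256 VGPRs per lane per SIMD, allocation in steps of 4, at most 10 waves per SIMD); only the 10-wave maximum comes from this log, the other numbers and the function name are assumptions, and SGPR and shared memory limits are ignored.

```python
def waves_per_simd(vgprs, max_waves=10, vgpr_budget=256, granularity=4):
    """Estimate how many wavefronts fit on one SIMD, limited by VGPR usage only."""
    if vgprs < 1:
        raise ValueError("a shader uses at least one VGPR")
    # Round the register count up to the allocation granularity.
    allocated = -(-vgprs // granularity) * granularity
    return min(max_waves, vgpr_budget // allocated)


for vgprs in (24, 32, 64, 128):
    print(vgprs, "VGPRs ->", waves_per_simd(vgprs), "of 10 waves")
```

Under these assumptions a shader needs 24 VGPRs or fewer to reach 10/10, so two shader variants with similar register usage would report the same per-SIMD occupancy.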
17:02 mareko: what does the graph mean?
17:58 Wallbraker: mareko: It shows how well the compute units are occupied during the execution.
18:23 Lyude: JFYI: it seems there's been some sort of kernel regression in amdgpu on the stable branch between 6.4.14-6.4.15. I'm not sure of the specifics yet but I do have a video: https://drive.google.com/file/d/13mLtlvl3kEU0Z_hXtQQiKeIakhT-xciK/view?usp=sharing you can see there's a weird pattern around the center of the screen when the image changes, it's got sort of a checkerboard pattern to
18:23 Lyude: it (blitting?)
18:24 Lyude: fwiw this was on 01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X] [1002:aae0]
18:24 Lyude: I will see if I can bisect it soon
18:28 Remco: Is the linked video right? Seems to be a 3D printer video from 2023-08-09 so quite a while ago already
18:30 Lyude: oh oops
18:30 Lyude: sorry haha, wrong video
18:31 Lyude: give me a second
18:32 Lyude: Remco: https://lyude.net/~lyudess/PXL_20230919_181159850~2.mp4 there is the correct video (just threw it on my server since it's much smaller than the other file)
18:32 _ds_:sees no amdgpu-related changes between 6.4.14 and 6.4.15
18:33 Lyude: oh that's weird then, i'm able to get the bug to go away by switching kernels
18:33 Lyude: perhaps something else changed
18:38 Remco: Different firmware?
18:40 Lyude: maybe? i'd be quite surprised if this was firmware related since I'm not sure where firmware would be involved with page flips or blitting. but also my knowledge of amd hardware isn't the strongest, so I suppose it's possible
18:40 _ds_: That seems unlikely, but maybe there's an initrd which didn't get rebuilt?
18:41 Lyude: i'm curious if something else in drm changed
18:42 Lyude: eh, I don't have the time to dig in just yet but I figured I'd let y'all know. I'll definitely dig into it in the near future though and see if I can narrow things down more
18:54 Remco: The RX 550/560 is an older card right? Is it even using amdgpu?
18:55 Remco: Or is it one of the new ones because numbers are hard for marketing
18:55 Remco: Ah, it's the new one. Sorry
18:59 kchibisov: Remco: it's polaris.
18:59 Remco: So it is old? It's so confusing
18:59 kchibisov: it's old, but it's the first GPU series which worked without quirks with amdgpu.
19:00 kchibisov: Ah, maybe RX 4XX was the first, I know that for some of them you need extra quirks.
19:00 kchibisov: The next GPUs after polaris were navi, so it's not that old.
19:02 _ds_: Vega, not navi.
19:02 kchibisov: it's a bit confusing since vega was around the same time?
19:03 kchibisov: I just remember that navi sort of replaced polaris, but you still had vega during both polaris and navi GPUs.
19:03 kchibisov: But the GPU market is weird...
19:04 kchibisov: Yeah, some polaris cards were released after vega.
19:07 agd5f: polaris is gfx8, vega is gfx9, navi is gfx10
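agd5f's summary, combined with the cards named earlier in the log, can be written down as a small lookup table. This is only a convenience sketch: the dictionary and function names are hypothetical, and the card-to-family entries are limited to what the log itself states.

```python
# GFX level per GPU family, as stated by agd5f.
GFX_LEVEL = {"polaris": 8, "vega": 9, "navi": 10}

# Cards mentioned in this log and the family they belong to.
CARD_FAMILY = {
    "RX 580": "polaris",
    "RX 560": "polaris",
    "RX 550": "polaris",
    "Vega 64": "vega",
}


def gfx_level(card):
    """Return the GFX level for a card name listed in CARD_FAMILY."""
    return GFX_LEVEL[CARD_FAMILY[card]]


print(gfx_level("RX 580"))   # 8
print(gfx_level("Vega 64"))  # 9
```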
19:40 mareko: Wallbraker: it seems like it's CU occupation
19:40 mareko: Wallbraker: i.e. which CUs are occupied
19:45 mareko: Wallbraker: either the shader is so simple that waves finish faster than they start, or the RGP trace doesn't store something correctly for gfx9
20:17 Wallbraker: Ah okay, being too simple makes sense.
23:24 illwieckz: I get random annoying GPU resets with my Radeon PRO W7600. Like: I'm working, and suddenly my screen goes OFF and ON again and my session is lost. After this happens the dmesg log says a GPU reset happened. I got it like 3 times in less than 24h. It's the first day I run that card.
23:24 illwieckz: I'm running Ubuntu 23.04 with Linux 6.2.0-33. Is it possible that my GPU is a bit too new? Would I have more luck with the amdgpu-pro dkms module and firmware?
23:24 illwieckz: I'm going to report the bug on Ubuntu kernel issue tracker, but if there is something I can do to avoid some of those resets, or if something is already known about such issue, I'm listening! :D
23:29 bnieuwenhuizen: illwieckz: not much confidence on an Ubuntu issue here, gitlab.freedesktop.org/drm/amd would be better but your kernel is behind a bit.
23:30 bnieuwenhuizen: in general if you get hangs during "normal work" it may be worth trying to disable GPU accel and/or HW video decode in your browser
23:30 bnieuwenhuizen: (not that I have a specific known issue in mind, but those tend to be more susceptible to random issues if you're not running anything like games)
23:30 illwieckz: details: https://pastebin.com/Kxtk72a7
23:31 illwieckz: I played games for hours without problem: Unvanquished, Quake Champions, Halo Anniversary…
23:31 illwieckz: But yesterday I got a reset while doing some desktop things.
23:31 illwieckz: And I got it twice in the past hours in the same way.
23:32 bnieuwenhuizen: bleh, Xorg, that doesn't help much as far as error attribution to avoid things
23:33 illwieckz: The first time it reset today, I was interacting with a website (editing this table: https://my.calcs.quest/u/587?ref=587 ), so there was no video in that tab. Maybe there was some open tab to a video website (like Youtube), but I have not played videos in a browser for half a day.
23:34 illwieckz: The second time it reset today (just before I sent my IRC message), I don't even remember what I was doing precisely, maybe I was reading a document, or a page, or chatting on Discord…
23:35 illwieckz: So it feels very random.
23:35 bnieuwenhuizen: my best bet would be seeing if you can get a newer upstream kernel and if that helps
23:35 illwieckz: OK.
23:37 illwieckz: Can the amdgpu-pro dkms help me or not? As it may use newer backported amdgpu code, and I would keep the stock kernel.
23:37 bnieuwenhuizen: I have no idea
23:38 bnieuwenhuizen: it might be more recent theoretically, but I have no idea how recent the drivers are for that
23:38 illwieckz: OKOK.
23:45 airlied: random misc resets can also be hw issue with things like power supply transitions, but not sure best way to diagnose them
23:47 airlied: latest mesa bit might also be a good plan, since that's a new card and fixes always accumulate
23:59 illwieckz: Ah, I have a self-built mesa in a folder, but I don't really know how to run the whole desktop on it
23:59 illwieckz: (I'm usually only testing rusticl stuff with it :D)
23:59 illwieckz: something I haven't done between last boot and last reset anyway