00:00 soreau: oh
00:00 soreau:didn't review the video
00:00 vaxry: I attached a video, they look like noise, kinda like UAF stuff that I'd sometimes do
00:00 vaxry: random fuzzy colors
00:00 soreau: do you have any other hw that supports vulkan?
00:00 soreau: might be another option
00:01 soreau: for testing
00:01 vaxry: I have a laptop with an AMD iGPU, dunno if that supports tearing
00:01 vaxry: might try
00:01 soreau: what does it need to support specifically?
00:01 vaxry: DRM_MODE_PAGE_FLIP_ASYNC I guess
00:01 vaxry: IIRC intel doesn't
00:01 vaxry: nvidia is wonky
00:02 soreau: indubitably
00:02 soreau: but if it doesn't happen there..
00:02 soreau: could rule out a bug in your impl
00:03 vaxry: I asked two other people, one on -zen with the same gpu (6700 xt) and another on regular linux with 6750 xt and both didnt have either artifacts or the log spam
00:04 soreau: have you tried any of the RADV debug options outlined in mesa env vars page?
00:05 soreau: maybe nodisplaydcc or nofastclears or something
00:05 vaxry: nope, I can try em, let's see
00:05 soreau: also aco debug opts
00:06 soreau: or RADV_DEBUG=llvm
00:06 soreau: to skip aco
00:06 soreau: might have to run the compositor with these set, not just the game
00:07 soreau: does it happen in nested compositor instance?
00:08 vaxry: nested can't tear because they aren't drm
00:08 soreau: also if you can find some test where it happens that isn't a full game, might be easier to track down
00:08 vaxry: yeah havent found a single thing that would do this outside cs2
00:09 vaxry: tried a dozen games and a few simple gl apps
00:09 soreau: hm
00:10 soreau: well there are some things you can try, hope that helps
00:11 vaxry: yeah I'll see, so far "RADV_DEBUG="llvm,nodisplaydcc,nofastclears"" has just made it worse I think lol
00:11 vaxry: I'll get back to ya if I find anything, thanks a lot
00:12 soreau: np
00:14 soreau: and since it only happens in that game, maybe tweak some of the video options (but I expect you might have tried that already)
00:14 soreau: in the game
00:14 vaxry: yeah, I did. Nothing really helps
00:26 soreau: vaxry: looking at the video, is the problem what looks like the mouse cursor over the game image?
00:30 soreau: (if so, does it happen if you set the env var to enable software cursor for wlroots?)
00:41 vaxry: it's not the cursor, unforto already discussed with other wlrootseans
00:41 vaxry: software cursors dont do anything to fix this, neither does completely removing em
00:42 vaxry: however, removing mouse input from csgo _does_ fix the issue, it only happens when moving the character's viewpoint it seems
00:47 soreau: maybe it's just a bug in the game or something
00:48 soreau: I guess with tearing, comes artifacts
00:48 soreau: in certain situations
00:49 soreau: btw, does it show up in a video like with wf-recorder?
00:49 soreau: or does that not work in this case
00:57 vaxry: nope, on video they are not visible
00:57 vaxry: buffers game <-> compositor seem A-OK
00:57 soreau: hm
01:27 soreau: bnieuwenhuizen: something more like this? http://ix.io/4HI5
02:18 mareko: soreau: yes except the DCC flags, and multisamples aren't supported by modifiers
02:21 soreau: mareko: ok, but which are the multisamples?
02:21 mareko: soreau: just saying, modifiers don't support MSAA
02:22 soreau: does the patch do anything with MSAA? afaict, it doesn't
02:22 mareko: also I think they don't support multiplane images either, at least not with amdgpu
02:22 mareko: right
02:22 soreau: ok
02:22 soreau: mareko: thanks
02:22 soreau:tries again
02:26 soreau: mareko: http://ix.io/4HI5
02:27 mareko: each add_modifier call should set all the fields
02:29 mareko: you also need GB_ADDR_CONFIG fields PIPE_INTERLEAVE_SIZE and ROW_SIZE
02:29 soreau: mareko: what do you mean by 'all the fields'?
02:29 soreau: like just make one call out of that or'ed together?
02:29 mareko: yes
02:29 soreau: ok
02:30 soreau: now for PIPE_INTERLEAVE_SIZE and ROW_SIZE..
02:30 soreau:greps around
02:54 soreau: mareko: http://ix.io/4HI5 not sure what to do with ROW_SIZE
02:55 soreau: seems like it's already handled in gfx_vX_0.c files
02:58 soreau: there is no adev->gfx.config.gb_addr_config_fields.row_size..
02:59 soreau: mareko: I'm on kernel 6.5.0 FWIW
04:47 mareko: soreau: it's part of GB_ADDR_CONFIG
04:48 soreau: mareko: but that define isn't available in amdgpu_dm_plane.c afaik
04:48 mareko: it's actually adev->gfx.config.mem_row_size_in_kb
04:48 soreau: oh
04:49 soreau: that's why I couldn't find it
04:49 soreau:grep fail
04:53 soreau: mareko: but I need to assign it to tiling_info->gfx8...?
04:54 soreau: the compiler is telling me there aren't such fields on that side
04:54 mareko: you need to get it there somehow
04:54 mareko: the MASKs are placeholders I guess
04:56 soreau: mareko: should it be assigned to gfx6 maybe?
04:56 soreau: or gfx8 in this case
04:57 mareko: you really need just one substructure for gfx6-8
04:58 soreau: oh no memeber named gfx6
04:58 mareko: also old hw doesn't use DC
04:58 soreau: so I guess it's gfx8
04:59 soreau: I removed the DC bits.. but do I need AMD_FMT_MOD_SET(DCC, 0)?
05:00 mareko: no
05:01 soreau: ok
05:14 soreau: mareko: This is what I have now http://ix.io/4HI5 but I just added those fields to tiling_info->gfx8 struct, so they're set but not hooked up to anything (read anywhere) yet
05:19 soreau: the other thing is, that struct is called dc_tiling_info..
06:20 mareko: you can ignore PIPE_INTERLEAVE_SIZE, but you need ROW_SIZE in the modifier
07:16 soreau: mareko: had a power outage, the last message I got from you was <mareko> no
07:17 soreau: if there was another message after that, I didn't get it
07:19 Venemo: soreau: https://people.freedesktop.org/~cbrill/dri-log/?channel=radeon&date=2023-09-30
07:21 Venemo: the channel is logged so don't worry if you got disconnected
07:24 soreau: oh right, I forget
07:24 soreau: Venemo: thanks
08:12 soreau: marcheu: ok, http://ix.io/4HI5
08:12 soreau: mareko: ok, http://ix.io/4HI5
08:14 soreau: I got the 28 from https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdgpu/sid.h#n1170
08:15 soreau: not sure if that's right for AMD_FMT_MOD_GFX6_ROW_SIZE_SHIFT
08:15 soreau: but it builds :-)
08:52 soreau: oh I think it should be 23
08:57 soreau: because it shifts the mask by shift and needs room for each bit in the mask
09:02 soreau: bnieuwenhuizen: mareko: Does this look good/complete or does it need more work? http://ix.io/4HI5
10:41 mareko: a lot more work
11:15 haasn: how much faster than subgroupAdd(...) is subgroupBallotBitCount(subgroupBallot(...))?
11:15 haasn: I wonder whether it would be worth to use two ballots instead of a single add if I want the low 2 bits of every invocation
11:27 soreau: mareko: should it look more like bnieuwenhuizen's commit here? (should I follow this as an outline?) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=faa37f54ce0462b2b5ec99bb8239a568bb8ba8d5
11:27 Venemo: haasn: ballot bit count is a scalar instruction, while subgroupAdd is several vector instructions
11:28 haasn: I see, so probably the former will run almost for free
11:28 haasn: since SALU and VALU are separate?
11:28 bnieuwenhuizen: well, also have to extract the 2 bits
11:29 Venemo: it depends on what is in the subgroupBallot(...<here>...), but it should still be cheaper than a reduction
11:30 bnieuwenhuizen: yes, I suspect the ballot route would be faster
11:30 haasn: the motivation is to get a ballot of (x > 0u) but with a sort of soft roll-off
11:31 haasn: so instead of only one bit I want to have a 2 bit result, basically computing the sum of min(x, 3u)
11:31 haasn: and my idea is to use two ballots, one checking for x == 1u || x == 3u and the other checking for x > 1u
11:31 haasn: || x >= 3u I mean
11:32 haasn: or alternatively min(x, 3u) & 1u and min(x, 3u) & 2u
11:35 haasn: hmm actually this approach won't work anyway, back to drawing board
11:46 soreau: mareko: after reviewing the commit I just linked, it seems I have most things covered.. what more is there? separation for GFX6,7,8?
11:48 soreau: or does it need stuff in relevant amdgpu/gfx_vX_X.c files?
11:48 soreau: or do you mean mesa?
11:50 Venemo: haasn: sounds like you are trying to implement a reduction, in which case you may as well just use an actual reduction
11:51 haasn: by reduction you mean subgroupSum()?
11:52 haasn: my understanding is that this is forced to add 32 bits, whereas we can get away with only 2 bits in my case since I don't need it with such high precision
11:52 haasn: anyway I'll implement and bench it
11:52 haasn: that should solve all questions
12:25 Venemo: you can look at the shader disasm to see which shader looks better
15:48 MrCooper: vaxry: how does your compositor respond to the async page flip failing? Since you get tearing, does it maybe blit to the scanout buffer? If so, the artifacts look like unresolved DCC compression or something like that
15:49 vaxry: so, if a page flip fails, wlroots should "rollback changes", I believe, when it comes to my compositor it just kinda goes "oh well" and waits for another frame to be ready
15:50 vaxry: I get a lot of fails on different games too, notably minecraft as it can push like 1.5k fps, but no artifacts
15:50 MrCooper: do some async flips succeed though?
15:50 vaxry: yeah, most
15:50 vaxry: in cs2 I'd say 4/5 do
15:51 vaxry: maybe more, if I don't move my mouse 99% do succeed
15:51 MrCooper: then it sounds like wlroots / your compositor isn't properly waiting for flip completion events before submitting the next flip
15:51 vaxry: but... async flips...?
15:51 MrCooper: can tear, that's all
15:51 MrCooper: still need to wait for the completion event
15:52 vaxry: yeah but isnt the definition of that "we don't wait for a finish"?
15:52 MrCooper: nope
15:52 vaxry: so... wait. What you are saying is that I should wait for drm to send a frame event?
15:52 MrCooper: the flip ioctl is always asynchronous in that sense
15:53 vaxry: I'm not sure I understand this one
15:53 vaxry: when we'd do sync, the flip event would come at the next vblank
15:53 MrCooper: DRM_MODE_PAGE_FLIP_ASYNC means "not synchronized to vertical blank"
15:53 vaxry: yes
15:53 MrCooper: nope, it comes when the flip completes
15:53 vaxry: so... vblank?
15:53 MrCooper: not with DRM_MODE_PAGE_FLIP_ASYNC
15:53 vaxry: yes, here I understand
15:54 MrCooper: if you submit the next flip before the previous one has completed, you get EBUSY
15:54 vaxry: so what you are saying is that with ASYNC, I should _not_ push another frame if drm hasn't sent me a page-flip notification yet
15:54 MrCooper: yep, always, the ASYNC flag is irrelevant for that
15:54 vaxry: okay, thanks a lot. Will impl that and see what happens
16:07 vaxry: OH MY GOD man you did it, that was it. No more log spam, no more artifacts. Thank you so much :D
16:07 vaxry: and it still tears :D
16:21 soreau: MrCooper++;
20:27 mareko: soreau: it should kind of look like bnieuwenhuizen's commit, though that's not all
20:27 soreau: mareko: I noticed there were more commits in that series
20:28 soreau: but some of the items are already in place AFAICT
20:29 mareko: tilings_flags_to_modifiers will be different for gfx6-8
20:29 mareko: because all your tiling flags and modifier flags will be different
20:31 soreau: mareko: you mean i.e. fill_gfx9_plane_attributes_from_modifiers()?
20:31 soreau: that calls fill_gfx9_tiling_info_from_modifier()
20:32 soreau: which in turn calls fill_gfx9_tiling_info_from_device()
20:32 mareko: yesconvert_tiling_flags_to_modifier
20:32 mareko: convert_tiling_flags_to_modifier
20:33 soreau: well I see there is fill_gfx8_tiling_info_from_flags() already
20:34 soreau: but you're talking about drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:701
20:36 soreau: I already made a small modification to that function to return version = AMD_FMT_MOD_TILE_VER_GFX6 in the final case
20:36 soreau: and set version = AMD_FMT_MOD_TILE_VER_GFX9 in the case if (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(9, 0, 0))
20:37 soreau: but that's the only thing I've changed in that function so far
20:38 soreau: I'm guessing switch (swizzle & 3) case 3: might need filling in?
20:39 soreau: and not sure if (has_xor) will be hit but that could stand a case for AMD_FMT_MOD_TILE_VER_GFX6
20:40 soreau: and if (dcc_offset != 0) will not be true for gfx6-8?
20:40 soreau: if so, that's like half the function
22:32 mareko: none of that code is relevant for gfx6-8
23:03 soreau: mareko: so it just needs to be skipped by returning early in convert_tiling_flags_to_modifier()?