04:57kode54: https://gitlab.freedesktop.org/drm/intel/-/issues/8325 hmm, fascinating
04:58kode54: wouldn't know the first thing to bisect to find what may be causing this
08:03danvet: emersion, https://lore.kernel.org/dri-devel/20230411222931.15127-2-ville.syrjala@linux.intel.com/ for you I think
08:03danvet: pq not here ...
08:03danvet: daniels, ^^ maybe
08:03danvet: emersion, this makes me wonder, do we need to formalize kms uapi reviewers?
08:04danvet: it's a bit hard to catch with a MAINTAINERS entry because it's all over
08:21emersion: ah, i've actually never used CTM
08:21emersion: JoshuaAshton: ^ does this match your experience?
08:22emersion: danvet, that would be nice. i think we already ask people to CC wayland-devel for new props?
08:22danvet: emersion, is that documented anywhere?
08:23danvet: I'm also wondering whether we could at least make some pattern matches in MAINTAINERS
08:23danvet: like matching the uapi header file is easy, but many props are just all over
08:24emersion: hm i thought so, but apparently not
08:24danvet: plus often the important stuff is in the kerneldoc and I don't think there's a way to match that
08:25danvet: emersion, I guess at least a "DRM KMS UAPI" MAINTAINERS entry that matches include/uapi/drm/drm_mode.h?
08:25danvet: as a start at least
08:25jannau: that ctm order is what Apple display processor FW expects and is used by kwin 5.27.3 and later
08:25emersion: maybe we can move the KMS prop docs to a rst
08:26emersion: hm, not sure that'd be a welcome change
08:26emersion: but yeah maybe start with drm_mode.h and we can improve later
09:40javierm: danvet: why are you looking at fbmem and fbcon again?
10:41Lynne: how is it possible a memory load after 10s of barrier(); and memoryBarrier(); and everything else I could think of still differs between compute local invocations within a workgroup?
10:52ishitatsuyuki: Lynne: IIUC, neither barrier nor memoryBarrier will flush caches unless the variable is marked as coherent
10:52ishitatsuyuki: hm, but you mentioned within a workgroup
10:54ishitatsuyuki: so, technically, barriers only ensure coherency for shared memory
10:54ishitatsuyuki: but as far as AMD hardware is concerned, memory access within the same workgroup should be coherent
11:19Lynne: I've tried both nvidia's binary drivers and radv, the behavior is the same
11:19Lynne: the only way I know I'm somewhat sane is that if I do another dispatch with a memory barrier, the memory looks fine
11:20Lynne: but I really want to do the processing in a single shader, since barriers completely kill performance
11:21Lynne: adding coherent doesn't fix the issue either
11:21Lynne: I'm using BDAs if it matters
11:23Lynne: this is my shader if anyone's interested - https://paste.debian.net/1277097/
11:24Lynne: issue is at line 313
11:25Lynne: the same value should get splatted horizontally, but yet, after x = height[0], the old value is written instead, as if prefix_sum() wasn't even called
11:26Lynne: if I put a hardcoded value for a, it works, if I don't call prefix_sum() the second time it works
12:22ishitatsuyuki: Lynne: if I understand correctly, the decoupled lookback algorithm only waits for indices up to the thread's job to complete before continuing. hence loading v[4096200] feels a bit suspicious to me, unless this is conditioned to only happen for the last workgroup
12:23Lynne: it's just a convenient index, since it's non-zero, it breaks with any index
12:28ishitatsuyuki: Are the prefix sums global? Is there sufficient synchronization before and after the prefix sum to ensure src/dst writes are visible?
12:32Lynne: global?
12:32ishitatsuyuki: Are you doing prefix sums across workgroups?
12:32Lynne: no, there's only a single workgroup dispatch
12:32ishitatsuyuki: so vkCmdDispatch(1,1,1)?
12:33Lynne: yup
12:34ishitatsuyuki: weird
12:36Lynne: I can push the code somewhere with instructions to run it if you find it interesting enough
12:38ishitatsuyuki: looks like RDNA can actually have two L0 caches within a dual compute unit, where the same workgroup can be scheduled
12:38ishitatsuyuki: when you added coherent, where did you add it to?
12:39Lynne: in both the buffer definition (coherent buffer DataBuffer) and the pushConstants pointer
12:40ishitatsuyuki: I guess that should do the job
12:41ishitatsuyuki: busy with other stuff right now but I'll check later to see if we're emitting the correct instruction flags for this case (workgroup running on dual CU)
12:42ishitatsuyuki: can you try barrier with explicit memory scope on nvidia?
12:43Lynne: explicit memory scope?
12:44ishitatsuyuki: lemme lookup the syntax but it's from vulkan memory model
12:46ishitatsuyuki: controlBarrier(gl_ScopeWorkgroup, gl_ScopeWorkgroup, gl_StorageSemanticsBuffer, gl_SemanticsAcquireRelease)
12:50Lynne: same output on both radv and nvidia after putting it right after the barrier() call
12:50Lynne: same == wrong
12:57ishitatsuyuki: I think you need it before and after the prefix sum too
12:58ishitatsuyuki: (for the src/dst synchronization I mentioned above)
13:04Lynne: hey, that did something!
13:05Lynne: still not correct, but it looks a bit more correct now
13:05HdkR: karolherbst: Congrats on rusticl for radeonsi merge! Next stop rusticl for Freedreno? :D
13:15Lynne: hmm, it still looks as wrong in my regular use-case rather than this hacked-up simplified version
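The barrier placement ishitatsuyuki suggests at 12:57 (synchronizing before and after the prefix sum as well as inside it) can be illustrated with a small CPU-side simulation. This is only a sketch, not Lynne's shader: it assumes a Hillis-Steele style inclusive scan with one simulated invocation per element, and the function names `scan_with_barriers` and `scan_without_barriers` are hypothetical. The "barrier" is modelled by finishing every read of a round before any write begins; the unsynchronized variant shows one possible interleaving in which invocations see partially updated data.

```python
def scan_with_barriers(values):
    """Inclusive prefix sum, one simulated invocation per element.

    Each round is split into a read phase and a write phase, which
    stands in for a workgroup barrier between the two.
    """
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        # Read phase: every invocation fetches its partner's value.
        reads = [data[i - stride] if i >= stride else 0 for i in range(n)]
        # (simulated barrier: all reads finish before any write)
        for i in range(n):
            data[i] += reads[i]
        # (simulated barrier: all writes land before the next round)
        stride *= 2
    return data


def scan_without_barriers(values):
    """Same rounds, but each invocation reads and writes immediately.

    Invocations run in ascending order here, which is just one of the
    interleavings an unsynchronized workgroup could produce.
    """
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        for i in range(n):
            if i >= stride:
                # May read a value already updated in this round.
                data[i] += data[i - stride]
        stride *= 2
    return data


if __name__ == "__main__":
    ones = [1] * 8
    print(scan_with_barriers(ones))
    print(scan_without_barriers(ones))
```

The synchronized version yields the expected running totals, while the unsynchronized one diverges as soon as a later round reads values that earlier invocations have already overwritten. On a GPU the interleaving is not fixed, so the wrong output can also vary between runs and drivers.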
13:16danvet: narmstrong, so you'll apply the bridge maintainer patch with all the fixups?
13:17narmstrong: danvet: yup, or you can resend as you wish
13:17danvet: nah happy if you take care :-)
13:17narmstrong: danvet: sure!
13:18javierm: danvet: I think narmstrong's suggestion was to fix the author field though
13:18javierm: while in the past you asked to add both S-o-B tags
13:18narmstrong: I can do either just tell me
13:21karolherbst: HdkR: probably just works.. or maybe not as it needs some func pointers, dunno...
13:21karolherbst: I don't have hardware :D
13:25narmstrong: danvet: ok to add both SoB then?
13:25danvet: sure
13:26mlankhorst: danvet: is the merge window for v6.4-rc1 still open?
13:29danvet: mlankhorst, I guess one last -next pull or something
13:33mlankhorst: sent, didn't look too spectacular. :)
13:38HdkR: karolherbst: Sounds like a good reason to get such hardware :P
14:27danvet: mlankhorst, did you prep drm-misc-next-fixes already?
14:35mlankhorst: danvet: hm no, should that not be drm-next after merge?
14:57danvet: mlankhorst, yup, and that's pushed now so you can forward
15:44robclark: danvet: how to link to rst?
15:45danvet: robclark, https://dri.freedesktop.org/docs/drm/doc-guide/sphinx.html#cross-referencing
15:45danvet: note that functions should auto-link anywhere
15:46danvet: https://dri.freedesktop.org/docs/drm/doc-guide/kernel-doc.html#cross-referencing-from-restructuredtext
15:46danvet: there's actually good docs for the kernel's doc system :-)
15:46robclark: so just use full path then
15:47danvet: well if you want a specific chapter in an .rst then it gets more tricky
15:49danvet: need to put a sphinx tag above the heading and reference that
15:52robclark: full path I think is better.. because it is also something that looks sensible to someone looking at code instead of html ;-)
16:08robclark: `make htmldocs` does take some time..
16:17danvet: robclark, incremental is a lot better
17:27mlankhorst: done
17:28mlankhorst: drm-misc-next is closed!
17:38javierm: danvet, jfalempe: any idea how my patch could have made things worse? That's very surprising... https://lists.freedesktop.org/archives/dri-devel/2023-April/399981.html
19:17DavidHeidelberg[m]: robclark: a630 showing huge performance boost, any particular commit you suspect? :D https://gitlab.freedesktop.org/mesa/mesa/-/issues/7144#note_1865796
19:37DavidHeidelberg[m]: gallo: performance go brrrrr ^ :)
19:57DavidHeidelberg[m]: this could be offender: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148 but most likely this one: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098
20:22robclark: more likely the latter
20:24robclark: there was also a kernel fix for anything doing short duration (<10ms) fence waits.. not sure if that shows up in any traces (stk does this)
20:25robclark: (hmm, well the kernel fix might actually make the trace appear slower since the trace is just a pre-recorded # of fence waits and nothing to do with reality)
21:25karolherbst: dcbaker: seems like bindgen is doing incompatible changes and we have to know the version in meson :(
22:46Lynne: ishitatsuyuki: any ideas on how to fix this?