07:06airlied: sima: you or anyone else know about the imx dpu stuff? 14 revs seems like we should make a decision :)
07:08sima: hm, haven't looked
07:08sima: airlied, is it a new driver or something?
07:10airlied: yup I just saw it go past
07:20sima: airlied, I'm not sure we have anyone consistently pushing new drivers in, I kinda stopped
07:21sima: I think tzimmermann and javierm have done a few in the past, but it's a bit of a mess
07:22sima: probably should just land it and hand out some commit rights
07:22sima: mlankhorst, ^^ since mripard also doesn't seem to be around
07:22airlied: yeah I think the imgtec one is out there as well
07:22sima: airlied, v14 is defo process fail somewhere :-/
07:22sima: yeah I've outright lost track of what's floating around :-(
07:24sima: airlied, related, I also have no idea whether we're doing an ok job in adding regular contributors as committers or not
07:24sima: since people just don't ask for that, maybe also because it's not very common in the kernel
07:37sima: airlied, different thing, do you plan to do a backmerge into drm-next before the merge window pr?
07:37sima: I think we should create the drm-ci topic branch asap and get sfr to include it :-)
08:48airlied: sima: I think I have to do one for the last msm pull so maybe I should do it tomorrow
09:38mripard: so hopefully it will work this time
09:38mripard: for imgtec, I think most of the issue was that there was no review until recently on blocking stuff
09:38mripard: like the UAPI
09:39mripard: fortunately gfxstrand and Michel have reviewed it recently so I think we're making progress there
10:37daniels: emersion: btw, I didn't see what you pushed it as (have been away the last few days), but GPLv2-only is fine by me for the doc patch, or any other prevailing license if that makes life easier
10:40_jannau__: `ls
10:40_jannau__: err, wrong window
10:59Dan: https://www.phoronix.com/news/Mesa-Terakan-R600-Vulkan-Driver does anyone know if there's a way to install this repository version of Mesa rn? I have an aging AMD HD 6000 and I would like to make use of Vulkan if possible
11:02ccr: the link to the git repo is in the post? though I doubt it is in a usable state, and the last commits seem to have been pushed months ago
11:03ccr: oops, sorry. there is activity, looked at wrong branch.
11:04emersion: daniels: thx, i pushed it as GPL2 as this was the most common thing in Documentation/
11:04emersion: and figured i could always fix it later if there were objections
11:16daniels: emersion: perfect, thankyou! :)
11:31karolherbst: I have a few issues with this nir and I'm wondering if you can help me figure out which nir passes to run and/or which passes to fix https://gist.githubusercontent.com/karolherbst/d07dfc2e5e0ad9288e64ce9439490e0a/raw/096799f52d635ac3f61a7cba3f2ef672a6c80699/gistfile1.txt
11:32karolherbst: though I can't really pinpoint my pain points yet; one of the problems is that values are reloaded all the time
11:33karolherbst: though in theory the input addresses can be the same, and I suspect that's where the optimization bails?
11:33karolherbst: so because it can't exclude the possibility of those buffers being the same, it has to assume the store voids the content
11:34karolherbst: yeah.. changing the global to constant makes it _way_ faster
11:34karolherbst: I might just fix the CTS
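(A minimal OpenCL C sketch of the aliasing issue described above; the kernel and its argument names are made up, not taken from the gist. With the read-only input in __global memory the compiler cannot prove the stores to the output don't alias it, so the loads get re-issued; __constant rules that out.)

```c
/* Hypothetical kernel illustrating the reload problem: with `in` in
 * __global memory, the compiler must assume the store to out[i] may alias
 * in[], so in[0..2] is reloaded on every iteration. Declaring `in` as
 * __constant guarantees it is read-only and cannot alias out[], so the
 * loads can be hoisted out of the loop. */
__kernel void blend(__global float *out,
                    __constant float *in,  /* was: __global const float *in */
                    int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[0] * 0.5f + in[1] * 0.25f + in[2] * 0.25f;
}
```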
11:38DavidHeidelberg[m]: karolherbst: btw, about ubsan: it just means that if we're going to test with ubsan, we have to omit the Nouveau build, but if you don't need it, it's fine
11:38karolherbst: not saying that enabling ubsan is bad, it's just annoying if it reports such false positives (well.. not strictly, but the problem is something else)
11:39karolherbst: it doesn't complain about any other indexing with an int
11:39karolherbst: why with enums?
11:39karolherbst: and then also in the wrong way
11:42DavidHeidelberg[m]: no idea.. (never used ubsan before).. it's like with every compiler: it catches useful stuff, but it's usually 99.x% noise :(
11:43zmike: my experience with ubsan so far has been 100% false positives and useless reports
12:16karolherbst: gfxstrand: what's the intended way of getting `nir_intrinsic_load_global_constant` from CL constant buffers? let them be emitted as ubos and then use a global ptr format?
12:17karolherbst: currently nir_lower_io turns `nir_var_mem_constant` things into normal global memory after lower_explicit_io
12:18karolherbst: or maybe I just figure out how to make ubos work, but I wanted to use the mem_constant way for it first anyway
13:58sima: javierm, https://lore.kernel.org/dri-devel/20230809192514.158062-1-jfalempe@redhat.com/ I guess you're looking at this?
13:59sima: since tzimmermann isn't around
13:59dottedmag: libgbm seems to be pretty vague about what gets returned from gbm_bo_get_fd/gbm_bo_get_handle. Can one assume when DRI backend is in use that former returns dmabuf fd, and latter GEM handle?
14:02emersion: it's always the case
14:03emersion: no matter the backend
14:03dottedmag: aha, thanks
14:04emersion: i'd merge a patch making this more obvious
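(A small sketch of the gbm calls being discussed, assuming a struct gbm_device already created from a DRM fd; per the answer above, gbm_bo_get_fd() returns a dmabuf fd and gbm_bo_get_handle() the GEM handle, regardless of backend.)

```c
#include <gbm.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch: assumes `gbm` was created with gbm_create_device() on a DRM fd. */
static void dump_bo_ids(struct gbm_device *gbm)
{
    struct gbm_bo *bo = gbm_bo_create(gbm, 1920, 1080,
                                      GBM_FORMAT_XRGB8888,
                                      GBM_BO_USE_RENDERING);
    if (!bo)
        return;

    /* dmabuf file descriptor; a new fd is returned, the caller closes it */
    int dmabuf_fd = gbm_bo_get_fd(bo);

    /* driver-specific handle; for the DRI backend this is the GEM handle */
    union gbm_bo_handle handle = gbm_bo_get_handle(bo);

    printf("dmabuf fd %d, GEM handle %u\n", dmabuf_fd, handle.u32);

    close(dmabuf_fd);
    gbm_bo_destroy(bo);
}
```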
14:05sima: mlankhorst, jani, https://paste.debian.net/hidden/a25331e1/ any idea why we have pattern-depth 1 for this dim script?
14:09jani: sima: no idea at all, and when I look at get_maintainer.pl I don't understand what it does either. (and I may very well have added that option to dim!)
14:09jani: sima: I think e.g. dim fixes is overeager at adding Cc's
14:17sima: jani, yeah I think it's bad either way
14:17sima: depth 1 limits how much people get added
14:17sima: but otoh if you have depth and the file is deep in a driver, not even the driver author is cc'ed
14:17sima: depth=1 I mean
14:18sima: and both amd and i915 are very deep at this point
14:18sima: also many others
14:39linkmauve: enunes, anarsoul, what would be needed for robustness support in lima? I’m getting a bit tired of phosh encountering the timeout and not being able to recover from that, and I think robustness is the way to go.
14:46javierm: sima: I'm on PTO until next week. But I do have it in my TODO list to take a look
14:54enunes: linkmauve: I never looked into robustness. I think someone helping to debug the actual reasons for the pinephone issues first would be more helpful
14:55linkmauve: enunes, I can reproduce with any shader taking more than 1s to execute.
14:56linkmauve: I then see it timeout in dmesg, and the program will continue executing with the context having been lost.
14:56enunes: well then it hits the job execution timeout, and that's really not normal; from what I understand, in the pinephone issues it's coupled with power management switches going on at the same time
14:58pq: linkmauve, would your program use robustness extensions if they were exposed?
14:58linkmauve: pq, not yet, but I would add support to it yes.
15:00pq: linkmauve, doesn't any existing API call return you errors after the timeout?
15:00linkmauve: glGetError() keeps returning 0, and the rest of the GLES 2.0 APIs don’t return any error AFAIK.
15:01pq: I was thinking more like EGL.
15:01pq: since contexts are an EGL thing, and egl funcs might return errors, like SwapBuffers
15:04pq: AFAIU, robustness will only ensure your app doesn't crash, and you can know about losing context and if you were the culprit or not, but the remedy is always to tear everything down and re-create from scratch.
15:05pq: so if EGL happened to tell you about it already, maybe you wouldn't need robustness
15:05linkmauve: That’s fine I think, as a compositor I’d be fine with recreating everything on crash.
15:07pq: There has been some talk about GPU reset handling on dri-devel@ recently, what should happen.
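(A rough sketch of what the robustness path would look like on the client side, assuming EGL_EXT_create_context_robustness and GL_EXT_robustness are exposed; display/config setup and extension checks are omitted, and the "recreate everything" step is left as a hypothetical hook.)

```c
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

/* Request a robust GLES2 context that loses the context on reset. */
static EGLContext create_robust_context(EGLDisplay dpy, EGLConfig config)
{
    static const EGLint attribs[] = {
        EGL_CONTEXT_CLIENT_VERSION, 2,
        EGL_CONTEXT_OPENGL_ROBUST_ACCESS_EXT, EGL_TRUE,
        EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_EXT,
            EGL_LOSE_CONTEXT_ON_RESET_EXT,
        EGL_NONE,
    };
    return eglCreateContext(dpy, config, EGL_NO_CONTEXT, attribs);
}

/* GL_EXT_robustness: anything other than GL_NO_ERROR means the context was
 * reset; the only recovery is tearing everything down and recreating it. */
static int context_was_reset(void)
{
    PFNGLGETGRAPHICSRESETSTATUSEXTPROC get_status =
        (PFNGLGETGRAPHICSRESETSTATUSEXTPROC)
            eglGetProcAddress("glGetGraphicsResetStatusEXT");

    return get_status && get_status() != GL_NO_ERROR;
}
```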
15:13enunes: I think work on robustness is orthogonal to fixing the actual known driver issue. someone still needs to put the needed effort to go in the code and debug these issues which only happen on the pinephone
15:14enunes: would be great if someone from the pinephone community stepped up to do that, with our current contributor capacity it might take another while
15:15enunes: and I think it needs to be better than those issues with 100 maybe-relevant drive-by comments from people hitting any timeout for any reason
15:21linkmauve: enunes, how would you go about debugging this kind of issue?
15:22linkmauve: I have a PinePhone, but I have no access to any hardware debugging tool I think.
15:23enunes: enable some power management debugging in the kernel, try to see if the driver is using the subsystem correctly, maybe look in the kernel mali driver to see how they handle it and if our kernel mali driver is missing something
15:25enunes: there are also issues for frequency scaling and other for device suspend, so it could be that for each of these
15:29enunes: maybe, for example, devfreq needs to be guarded in some place where it currently isn't, so it's happening at the wrong moment, etc
15:38gfxstrand: karolherbst: If the driver controls lowering (recommended), then they can pick out UBOs while they still have deref chains. Then anything that path can't convert to UBOs, you convert to global.
15:39gfxstrand: karolherbst: The other way to do it would be to use a different addr format for constant memory (which is possible since it can't be included in generic) and lower it to something else.
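(A hedged sketch of that second option: give nir_var_mem_constant its own address format instead of folding it into generic global memory. The passes are real Mesa NIR passes, but the chosen formats and pass ordering here are assumptions, not a drop-in recipe.)

```c
/* Lower generic/global pointers and constant memory with separate address
 * formats, so the backend can treat constant loads differently (e.g. as a
 * UBO-style binding+offset, or as load_global_constant). */
static void
lower_cl_pointers(nir_shader *nir)
{
    /* Generic/global pointers stay plain 64-bit global addresses. */
    NIR_PASS_V(nir, nir_lower_explicit_io,
               nir_var_mem_global,
               nir_address_format_64bit_global);

    /* Constant memory lowered separately with its own format. */
    NIR_PASS_V(nir, nir_lower_explicit_io,
               nir_var_mem_constant,
               nir_address_format_32bit_index_offset);
}
```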
15:42karolherbst: gfxstrand: mhh.. yeah, maybe. I just want to emit a load_global_constant, or maybe just turn them all into ubos and the driver does their thing anyway....
15:42karolherbst: I should probably just do the ubo thing right away
15:43karolherbst: yeah... https://opencl.gpuinfo.org/displaydeviceinfo.php?name=CL_DEVICE_MAX_CONSTANT_ARGS
15:43karolherbst: I think it's fair to use ubos there given what limits other drivers expose
15:46ids1024[m]: When buffers are allocated for scanout in Iris, the ISL_SURF_USAGE_DISPLAY_BIT usage is set, and in https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/intel/isl/isl.c#L1727-1749 that adjusts the row pitch to be suitable for scanout, considering the alignment required by Nvidia or AMD. It looks like pitch alignment is also an issue for importing dmabufs on Nvidia and AMD cards, though, not just scanout? On Wayland
15:46ids1024[m]: linux-dmabuf-unstable-v1 has a scanout flag, but as far as I can tell there's currently no other way to handle alignment requirements for import on a different GPU.
15:48emersion: ids1024[m]: yes.
15:54gfxstrand: bnieuwenhuizen, airlied: What does RADV do if it runs out of command buffer space? Do you hand a pile to the kernel or do some sort of goto/chain thing?
16:02anarsoul: enunes: linkmauve: the timeout issue might well be a bug in the Allwinner A64 arch timer
16:03anarsoul: the timer is buggy and experiences time jumps, maybe workaround in the kernel isn't sufficient to fix it
16:05bnieuwenhuizen: gfxstrand: chain
16:05anarsoul: linkmauve: enunes: to rule it out, you need to add support for A64 into timer-sun4i and switch to timer-sun4i clocksource
16:05gfxstrand: bnieuwenhuizen: On all hardware or is it really restricted?
16:05bnieuwenhuizen: Except on old HW where we can batch multiple buffers
16:05gfxstrand: bnieuwenhuizen: I see a use_ib thing
16:06bnieuwenhuizen: Which, with power-of-two growth, also works
17:04anarsoul: linkmauve: btw, we have #lima channel to discuss lima issues
17:12anarsoul: linkmauve: it looks like timer-sun4i already supports A64. So I guess you can make sure that it's enabled in the kernel and just try to switch clocksource
18:38Venemo: I can't get over how difficult it is to read NIR 2.0 now
18:39zmike: just let your eyes lose focus like you're staring at a magic eye puzzle
18:39Venemo: :D
18:51alyssa: Venemo: what's the problem?:(
18:51Venemo: just need to get used to it, I guess
18:52Venemo: two things that look weird to me are how the bit sizes are tabulated and how intrinsics without definitions are tabulated
18:52alyssa: Oh, you mean the nir_print
18:52alyssa: I thought you meant the nir_def
19:06sima: airlied, I created topic/drm-ci and asked sfr to add it in case you missed my reply
19:06sima: koike, robclark daniels ^^ more acks from (driver) maintainers would be real good
19:38robclark: sima: probably better to ping folks who _haven't_ already acked it... I don't think you can just add my a-b 5 times :-P
19:38sima: robclark, I figured you're most motivated to find those people :-P
19:39robclark: I think i915 has the most devices so far, in drm/ci
19:41Venemo: alyssa: I mean the result of nir_print_shader yes
20:18karolherbst: gfxstrand: is there a way in vulkan to query how much private memory a pipeline/shader/whatever consumes?
20:18karolherbst: I'm sure vulkan does not bother with that information at all
20:18karolherbst: just wanted to double check
20:25pixelcluster: karolherbst: unless we talk raytracing there is no standardized way, no
20:26pixelcluster: private mem doesn't exist as a concept in vulkan, except in vulkan for rt pipeline stacks
20:26alyssa: karolherbst: is this for ruzticl?
20:27pixelcluster: if you know the driver you could perhaps filter it from pipeline executable properties, if it includes that info there
20:28pixelcluster: s/in vulkan for rt pipeline stacks/in raytracing for the stacks used in recursive calls/
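(A best-effort sketch of the pipeline-executable-properties route mentioned above. It assumes the pipeline was created with VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR, that the extension entry points have been loaded via vkGetDeviceProcAddr, and that the driver reports a scratch/spill statistic at all; the statistic names are driver-defined, so the string match is a guess.)

```c
#include <string.h>
#include <vulkan/vulkan.h>

#define MAX_STATS 32

/* Returns the driver-reported scratch size in bytes, or 0 if no matching
 * statistic is found. Whether such a statistic exists, and what it is
 * called, is entirely up to the driver. */
static uint64_t query_scratch_bytes(VkDevice dev, VkPipeline pipeline)
{
    VkPipelineExecutableInfoKHR exec = {
        .sType = VK_STRUCTURE_TYPE_PIPELINE_EXECUTABLE_INFO_KHR,
        .pipeline = pipeline,
        .executableIndex = 0,   /* first executable only, for brevity */
    };

    uint32_t count = 0;
    vkGetPipelineExecutableStatisticsKHR(dev, &exec, &count, NULL);
    if (count > MAX_STATS)
        count = MAX_STATS;

    VkPipelineExecutableStatisticKHR stats[MAX_STATS] = {0};
    for (uint32_t i = 0; i < count; i++)
        stats[i].sType = VK_STRUCTURE_TYPE_PIPELINE_EXECUTABLE_STATISTIC_KHR;
    vkGetPipelineExecutableStatisticsKHR(dev, &exec, &count, stats);

    for (uint32_t i = 0; i < count; i++) {
        /* Guessing at the name; check what your driver actually reports. */
        if (strstr(stats[i].name, "cratch") &&
            stats[i].format == VK_PIPELINE_EXECUTABLE_STATISTIC_FORMAT_UINT64_KHR)
            return stats[i].value.u64;
    }
    return 0;
}
```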
20:30karolherbst: yeah
20:30karolherbst: alyssa: ^^
20:30karolherbst: pixelcluster: mhhh
20:30alyssa: karolherbst: >:)
20:31karolherbst: pixelcluster: I think I'll let zink just use nir->scratch_size
20:31karolherbst: the good thing is... I can return whatever value
20:31karolherbst: this is just a hint in CL
20:31karolherbst: no idea what applications would do with that though
20:31karolherbst: maybe account for VRAM and see if their stuff fits? dunno
20:38gfxstrand: karolherbst: no
20:38karolherbst: :')
20:39karolherbst: gfxstrand: ohh.. is there a property for compute shaders to know how many threads can be in a block per compiled pipeline/object?
20:39gfxstrand: max local workgroup size? Yes.
20:39karolherbst: with variable block size that is
20:39karolherbst: ahh, cool
20:39karolherbst: where could I find that?
20:40gfxstrand: VkLimits
20:40gfxstrand: Or maybe VkDeviceLimits?
20:40karolherbst: not device limits, I need it per compiled shader
20:42zmike: gfxstrand: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20841
20:42karolherbst: seems like clvk just returns the device limit...
20:43karolherbst: (that's not very helpful)
20:43karolherbst: thing is... launching a kernel with the value returned there has to _always_ succeed
20:43karolherbst: if the device limit is 1024, but a given compiled shader can only support 512 threads, then it would be a bug to return 1024
20:44gfxstrand: If it's possible for a compiled shader to only support 512 threads, then it's a bug to return 1024 in the VkDeviceLimits
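(For reference, the device-wide limit in question lives in VkPhysicalDeviceLimits; a minimal query looks like the sketch below. The point of the discussion is that nothing finer-grained exists per compiled shader.)

```c
#include <stdio.h>
#include <vulkan/vulkan.h>

/* Minimal sketch: these limits are per-device, not per-compiled-shader. */
static void print_compute_limits(VkPhysicalDevice pdev)
{
    VkPhysicalDeviceProperties props;
    vkGetPhysicalDeviceProperties(pdev, &props);

    const VkPhysicalDeviceLimits *l = &props.limits;
    printf("maxComputeWorkGroupInvocations: %u\n",
           l->maxComputeWorkGroupInvocations);
    printf("maxComputeWorkGroupSize: %u x %u x %u\n",
           l->maxComputeWorkGroupSize[0],
           l->maxComputeWorkGroupSize[1],
           l->maxComputeWorkGroupSize[2]);
}
```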
20:45karolherbst: mhhh
20:46karolherbst: the spec wording doesn't make me very confident about that
20:46gfxstrand: vkCmdDispatch doesn't have a way to safely fail
20:46karolherbst: ahh, fair enough then
20:47karolherbst: so drivers have to be very pessimistic about what they report through maxComputeWorkGroupInvocations then
20:47gfxstrand: Most drivers don't have Intel's insane flexibility
20:47karolherbst: (or lower it)
20:47karolherbst: I think it's also a problem on nvidia hardware
20:47gfxstrand: Could be
20:48robclark: it's a problem on all hw I think ;-)
20:48karolherbst: you have 64k GPRs, and you can allocate up to 255 per thread
20:48karolherbst: but you can launch 1024 threads
20:48karolherbst: so you'd have to report 256 threads max to be safe
20:50karolherbst: ehh seems like newer GPUs can even do 2048 or 1536 threads...
20:50karolherbst: uhm.. not per block
20:51karolherbst: seems like nvidia reports 1024 though...
20:51karolherbst: so either they cheat it, or you can kinda deal with nonsense here
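(A back-of-the-envelope version of that register math; the 64K register file and 255 registers-per-thread figures are the NVIDIA numbers quoted above, and real launch limits also depend on other per-SM constraints.)

```c
/* With a 64K-entry register file and up to 255 registers per thread, only
 * 65536 / 255 = 257 threads fit, which rounds down to 256 at a 32-thread
 * warp granularity: well below the advertised 1024-thread block size. */
static unsigned
max_block_threads(unsigned regfile_size, unsigned regs_per_thread,
                  unsigned warp_size)
{
    unsigned threads = regfile_size / regs_per_thread; /* 65536 / 255 = 257 */
    return threads - (threads % warp_size);            /* -> 256 */
}
```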
21:25Venemo: karolherbst: what exactly do you mean by private memory there?
21:25karolherbst: memory an implementation will have to allocate to run a certain shader
21:26karolherbst: like scratch memory or other funky stuff
21:26karolherbst: might even include spilled values and other things
21:26karolherbst: essentially any kind of global memory you'd have to allocate to run something
21:29columbarius: emersion: thanks
21:42Venemo: karolherbst: I don't think these are defined in the API in any way
21:42Venemo: they are entirely implementation dependent
21:58alyssa: karolherbst: You would presumably need a vk ext for this
22:02karolherbst: or I just don't care :P
22:02karolherbst: there are more important things to figure out :D
22:07alyssa: valid
22:11karolherbst: does vulkan have a way to report the native GPU pointer size?
22:15alyssa: karolherbst: 64
22:15karolherbst: I guess that's fair
22:15karolherbst: so if your GPU has 32 bit pointers, you can't do vulkan? :P
22:15karolherbst: (though I guess you just zero the high bits)
22:19alyssa: :P
22:19alyssa: eric_engestrom: We think we've found a Mesa EGL bug (though there's a small chance it's a CTS bug)
22:20alyssa: The symptom is that Mesa does not pass CTS if gles1 is disabled at build-time (-Dgles1=disabled).
22:21alyssa: Presumably this wasn't caught because the gles1 option is default true and people are running CTS on their development builds, and not the wacko space-optimized release builds that system integrators come up with
22:21alyssa: (But it looks like at least 1 major distro is unwittingly shipping non-conformant Mesa due to this issue)
22:22alyssa: One failing test is `dEQP-EGL.functional.create_context.no_config`
22:23alyssa: This test unconditionally creates contexts of all APIs, and then skips based on the error code
22:23alyssa: It looks like Mesa is returning the wrong error code for GLES1 on -Dgles1=disabled builds
22:23alyssa: causing the gles1 portion of the test to fail rather than be correctly skipped
22:24alyssa: apparently other tests like `dEQP-EGL.functional.create_context.rgb888_no_depth_no_stencil` are also failing... I can't tell why, since they don't seem to exercise GLES1, but maybe I'm looking at the wrong CTS source code.
22:24alyssa: This is based on debugging on Asahi, but presumably all Mesa drivers are affected
22:25alyssa: I'm hopeful this will turn out to be a 1-line Mesa patch... and not a CTS bug...
22:25alyssa: Regardless, could you take a look at this tomorrow? Thank you! :-)
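(A hedged sketch of the test pattern being described, not the actual dEQP code; the specific error values treated as "unsupported" here are assumptions, and the authoritative list is in the CTS sources.)

```c
#include <EGL/egl.h>

/* Create a GLES1 context and classify the outcome purely from the error
 * code, as the CTS does: 1 = supported, 0 = cleanly unsupported (skip),
 * -1 = unexpected error (the suspected Mesa bug with -Dgles1=disabled). */
static int try_gles1_context(EGLDisplay dpy, EGLConfig config)
{
    static const EGLint attribs[] = { EGL_CONTEXT_CLIENT_VERSION, 1, EGL_NONE };

    eglBindAPI(EGL_OPENGL_ES_API);
    EGLContext ctx = eglCreateContext(dpy, config, EGL_NO_CONTEXT, attribs);
    if (ctx != EGL_NO_CONTEXT) {
        eglDestroyContext(dpy, ctx);
        return 1;
    }

    /* Assumed: a spec-allowed "unsupported client API/version" error means
     * the sub-case should be skipped; anything else is a failure. */
    EGLint err = eglGetError();
    return (err == EGL_BAD_CONFIG || err == EGL_BAD_MATCH) ? 0 : -1;
}
```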
23:02karolherbst: anyway....
23:02karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24839
23:59zmike: reviewed