IRC Logs of #dri-devel on irc.freenode.net for 2024-01-04

00:21 mareko: Mesa has a tiny x86 assembler, so in theory it could do NIR -> x86
00:23 mareko: the GL select mode uses a geometry shader now, so draw is only used by rasterpos and feedback mode, and rasterpos is a single shader thread that could run quickly on x86
00:32 airlied: we have aarch64 users though, so I'm not sure we'd care for x86 only answers
00:32 airlied: since we have llvm for x86 anyway
00:33 airlied: unless we start adding NIR cpu backends :-P
00:34 zmike: I can't imagine such a thing would ever work
00:37 HdkR: <- This guy using AArch64 + Radeon https://cdn.discordapp.com/attachments/702040551893106732/1192265498260406345/IMG_1482.jpg?ex=65a872c6&is=6595fdc6&hm=c35650b84cd9a8ce3f874f41c1ca9870d9a2796f104734fde5cb62e6a073b340&
00:42 airlied: zmike: you don't have to imagine it, just fund it :-P
00:43 zmike: I'll put in $20 that says it doesn't work
00:44 karolherbst: zmike: why wouldn't it tho?
00:44 karolherbst: shit... now I gonna have to do it, no?
00:46 zmike: I don't know why you'd bother, there's no way you'd ever get it to even run glxgears
00:47 karolherbst: your taunts do nothing
00:47 karolherbst: 🙃
00:47 zmike: just stating facts
00:47 zmike: you're too busy with rusticl anyway
00:47 karolherbst: true
00:48 Sachiel: software rasterizer in OpenCL C
00:48 karolherbst: no
00:48 karolherbst: somehow just compile mesa for CL C
00:49 airlied: I wonder which I could make happen first, this 18 hours cts run or a nir exec :-P
00:50 karolherbst: my bets are on nir exec
00:51 zmike: airlied no.
00:51 zmike: focus.
00:51 karolherbst: just do it airlied
00:51 zmike: just think of all the useful things you could be doing instead
00:51 zmike: like
00:52 karolherbst: supporting function calls in aco
00:52 karolherbst: zmike: oh btw, have any ideas to fix the use-after-free? I wouldn't even mind fixing it myself
00:55 airlied: I'm trialing introducing coffee so I do have to restrain major impulses to write stuff
00:56 mareko: coffee is a hell of a drug
00:57 zmike: karolherbst: yes, in short the zink_batch_state structs need to become screen-owned such that, upon a context being created or destroyed, they are cached/retrieved under lock from the screen instead of being destroyed
00:57 karolherbst: ahh
00:57 zmike: I was planning to do it tomorrow or friday since I have fewer calls, but feel free if you want to
00:58 karolherbst: let me give it a shot tomorrow and I'll tell you if I get annoyed or give up
00:58 zmike: ok
00:59 zmike: the gist is you need to still clear all the batch states on ctx destroy but then not free them, and then on ctx/batch_state creation you need to reinit the cached batch states so they "belong" to the new ctx
00:59 karolherbst: yeah...
00:59 zmike: very little new code, just some moving
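
The screen-owned caching zmike describes can be sketched roughly as follows. This is only an illustration under assumptions: `struct screen`, `struct context`, `struct batch_state`, and both helper functions are hypothetical stand-ins and not zink's actual code. On context destroy the states are cleared and parked on the screen under a lock, and on creation a cached state is taken back out and reassigned to the new context.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not zink's real structs. */
struct context;

struct batch_state {
    struct context *ctx;       /* current owner, NULL while cached */
    unsigned pending_work;     /* placeholder for per-batch resources */
    struct batch_state *next;  /* link in a context or screen list */
};

struct screen {
    pthread_mutex_t lock;
    struct batch_state *free_states; /* cached states, owned by the screen */
};

struct context {
    struct screen *screen;
    struct batch_state *states;      /* states currently owned by this context */
};

/* On context destroy: clear every state, then cache it instead of freeing it. */
static void context_release_states(struct context *ctx)
{
    pthread_mutex_lock(&ctx->screen->lock);
    while (ctx->states) {
        struct batch_state *bs = ctx->states;
        ctx->states = bs->next;
        bs->pending_work = 0;        /* "clear" the batch state */
        bs->ctx = NULL;              /* it no longer belongs to any context */
        bs->next = ctx->screen->free_states;
        ctx->screen->free_states = bs;
    }
    pthread_mutex_unlock(&ctx->screen->lock);
}

/*
 * On batch state creation: reuse a cached state if one exists and make it
 * belong to the new context. Returns NULL when the cache is empty, in which
 * case a real driver would allocate a fresh state.
 */
static struct batch_state *context_acquire_state(struct context *ctx)
{
    pthread_mutex_lock(&ctx->screen->lock);
    struct batch_state *bs = ctx->screen->free_states;
    if (bs)
        ctx->screen->free_states = bs->next;
    pthread_mutex_unlock(&ctx->screen->lock);

    if (!bs)
        return NULL;

    assert(bs->ctx == NULL);         /* cached states must be unowned */
    bs->ctx = ctx;                   /* reassign the owner */
    bs->next = ctx->states;
    ctx->states = bs;
    return bs;
}
```

The owner reassignment in `context_acquire_state` is the step that is easy to miss: a cached state that still points at a destroyed context reintroduces the use-after-free.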
01:00 mareko: or rusticl could be enabled for softpipe
01:00 karolherbst: pain
01:01 zmike: it would have to have a compelling name to justify the work
01:01 zmike: and surely no one is that clever
01:08 jenatali: Softicl?
01:21 mareko: vulkan on softpipe could be called cloggedpipe
01:36 alyssa: zmike: congrats on your chairmanship :)
01:36 karolherbst: what did mike get himself into this time?
01:36 HdkR: Something like a Herman Miller or more a recliner situation? :P
01:36 airlied: khr gl/gles chair
01:36 karolherbst: fun
01:36 alyssa: an aeron
01:37 karolherbst: congrats tho
01:37 airlied: konstantin, zmike : finally the lvp descriptor size reduction is assigned to marge
01:50 karolherbst: zmike: guess that wasn't hard after all 🙃
01:50 karolherbst: basically done
01:52 karolherbst: well.. I have a bug, I never reuse the batches :D
02:00 karolherbst: ohh.. I forgot to reassign the context...
02:05 alyssa: airlied: re 54232bee06a ("llvmpipe: flush resources on sampler view binding"),
02:05 alyssa: is that actually required?
02:05 alyssa: It seems to be papering over a test bug
02:12 alyssa: The GLES version of the test has the barrier, the GL one does not
02:13 karolherbst: zmike: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26889
02:14 alyssa: divergence is already in "Import Khronos OpenGL CTS"
02:14 alyssa: whee
02:14 karolherbst: ehh...
02:15 karolherbst: I forgot to update `last_free_batch_state` in one area ..
02:16 karolherbst: fixed
02:19 alyssa: divergence is old
02:22 airlied: alyssa: that wouldn't surprise me, I was probably in a just pass tests mode when I wrote it
02:23 alyssa: airlied: ack
02:23 alyssa: I am.. pretty sure the test is busted and the bug dates back to 2014
02:33 alyssa: airlied: filed https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/4867
02:35 airlied: alyssa: nice, will happily burn that fix once that lands
02:35 alyssa: :)
03:34 JoshuaAshton: Lynne: Thanks for hooking up ffmpeg h265 encode, it's much easier to follow than that weird NVIDIA sample that's incredibly confusing and weird
09:29 dj-death: should the divergence analysis pass be dealing with registers?
09:29 dj-death: looks like nothing does currently for registers created with nir_lower_locals_to_regs()
09:42 dj-death: div 32 %117 = @decl_reg () (num_components=1, num_array_elems=0, bit_size=32, divergent=0)
09:42 dj-death: ah maybe there is another problem actually :)
10:30 jfalempe: If I add a macro like drm_for_each_legacy_plane (https://elixir.bootlin.com/linux/latest/source/include/drm/drm_plane.h#L923) checkpatch complains with
10:30 jfalempe: ERROR: Macros with complex values should be enclosed in parentheses
10:31 jfalempe: but adding parentheses here is not possible, is that ok as there are already a few macros like this in this file ?
10:58 MrCooper: jfalempe: I'd say yes, checkpatch is more like a guideline than a law anyway
11:06 Lynne: JoshuaAshton: did you test my branch?
11:13 jfalempe: MrCooper, ok, thanks.
11:28 MrCooper: huh, ssh's ObscureKeystrokeTiming feature makes gitk run very slowly via X11 forwarding
11:46 jani: jfalempe: you do need to wrap plane in parens in for_each_if (plane->type == DRM_PLANE_TYPE_OVERLAY)
11:48 tzimmermann: jfalempe, about that caching issue: did you read https://www.kernel.org/doc/html/next/x86/mtrr.html ?
11:49 tzimmermann: there's a section on removing mtrrs via /proc/mtrr
11:49 tzimmermann: how does the RT process behave if you remove the VRAM's mtrr?
11:49 jani: jfalempe: -> has a higher precedence than e.g. & or *. passing &plane to that macro would end up being parsed as &(plane->type) instead of (&plane)->type as intended
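
The precedence point jani makes can be illustrated with a small sketch. The names here (`struct plane`, `PLANE_TYPE_OVERLAY`, `PLANE_IS_OVERLAY`, `count_overlays`) are hypothetical stand-ins and not the DRM definitions; the unparenthesized variant is kept as a comment only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the DRM types. */
enum plane_type { PLANE_TYPE_OVERLAY, PLANE_TYPE_PRIMARY };
struct plane { enum plane_type type; };

/*
 * Unparenthesized argument, shown only as a comment:
 *   #define PLANE_IS_OVERLAY_UNSAFE(p) (p->type == PLANE_TYPE_OVERLAY)
 * PLANE_IS_OVERLAY_UNSAFE(&pl) expands to (&pl->type == ...), and because
 * -> binds tighter than unary &, that is read as &(pl->type), not
 * (&pl)->type.
 */

/* Parenthesized argument: whatever expression is passed is evaluated first. */
#define PLANE_IS_OVERLAY(p) ((p)->type == PLANE_TYPE_OVERLAY)

static int count_overlays(struct plane *planes, int n)
{
    int count = 0;

    assert(planes != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        /* planes + i is a compound expression; (p) keeps it intact. */
        if (PLANE_IS_OVERLAY(planes + i))
            count++;
    }
    return count;
}
```

With the parentheses in place, callers can pass `&pl`, `planes + i`, or any other pointer expression without caring how the macro is written.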
13:05 jfalempe: jani, I already put all plane under (plane), but that's not enough for checkpatch, it wants the whole macro under ()
13:07 jfalempe: tzimmermann, Yes I read a bit about mtrr, but it should do the same as removing the devm_arch_phys_wc_add() call ?
13:09 jfalempe: tzimmermann, ah so it can be done from userspace, without having to modify the mgag200 driver ?
13:09 tzimmermann: jfalempe, exactly
13:10 tzimmermann: i'm currently trying your instructions on my dl120 machine
13:10 tzimmermann: clearing the mtrr reg seems to have an impact on the latency
13:10 jfalempe: tzimmermann, looks strange to change the memory mapping in the "back" of the driver, but I can try that.
13:11 tzimmermann: jfalempe, i have to boot the kernel with 'nopat' to enable the mtrr
13:12 tzimmermann: then `echo "disable=2" >| /proc/mtrr`
13:12 tzimmermann: reg02 is the framebuffer, hence disable=2
13:12 tzimmermann: and then the latency goes down for the test
13:13 tzimmermann: you could do this via ioctl from within the RT process
13:13 tzimmermann: or in a wrapper script
13:13 tzimmermann: and then re-enable the mtrr if the RT process goes away
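
A minimal sketch of doing the same thing from inside a process, assuming the text interface of `/proc/mtrr` (the same one the `echo` command uses) rather than the ioctl route. The helper names `mtrr_format_disable`, `mtrr_write_command`, and `mtrr_disable` are made up for this example; the register number differs per machine, and writing to `/proc/mtrr` needs root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the command that disables one MTRR register, e.g. "disable=2\n".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int mtrr_format_disable(char *buf, size_t len, unsigned reg)
{
    int n = snprintf(buf, len, "disable=%u\n", reg);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/*
 * Write one command line to the MTRR control file. The path is normally
 * "/proc/mtrr"; it is a parameter so the helper can be tried on a scratch
 * file first. Returns 0 on success, -1 on any error.
 */
static int mtrr_write_command(const char *path, const char *cmd)
{
    assert(path != NULL && cmd != NULL);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fputs(cmd, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Disable register `reg`, mirroring: echo "disable=<reg>" >| /proc/mtrr */
static int mtrr_disable(const char *path, unsigned reg)
{
    char cmd[32];

    if (mtrr_format_disable(cmd, sizeof(cmd), reg) != 0)
        return -1;
    return mtrr_write_command(path, cmd);
}
```

A wrapper would call `mtrr_disable("/proc/mtrr", 2)` before starting the RT process. Restoring the entry afterwards means writing the original base, size, and type back, which this sketch leaves out.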
13:15 jfalempe: tzimmermann, ok, I 'll try to get access to that server again, but that sounds good.
13:22 tzimmermann: i could not find a way to manipulate the PAT entries from userspace, though
13:25 jfalempe: tzimmermann, are there side effects to disabling PAT ?
13:27 tzimmermann: you have to try. on my dl120, pat only affects the framebuffer memory
13:27 tzimmermann: there's none here
13:27 tzimmermann: have a look at https://www.kernel.org/doc/html/v5.7/x86/pat.html
13:27 tzimmermann: under "PAT debugging"
13:39 tzimmermann: jfalempe, i'm looking at this comment: https://elixir.bootlin.com/linux/latest/source/include/linux/io.h#L152
13:40 tzimmermann: IIUC the PAT tables are only relevant if we want to mmap the vram pages to userspace
13:40 tzimmermann: but we don't do this any longer
13:40 tzimmermann: maybe there's a little driver cleanup lurking here
13:41 tzimmermann: i have to investigate this
13:42 jfalempe: yes, there is no need for the userspace to directly write the VRAM.
13:43 tzimmermann: if we remove the call devm_arch_io_reserve_memtype_wc() from mgag200, it's like using nopat
13:43 tzimmermann: and devm_arch_phys_wc_add() would be a no-op; so there's no mtrr set up for the framebuffer
13:44 tzimmermann: BTW: i've found that we need to use devm_ioremap_wc() at https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/mgag200/mgag200_drv.c#L153
13:44 tzimmermann: the current call is incorrect
13:44 tzimmermann: there's no apparent difference in practice, but still
13:54 jfalempe: ok so we can remove devm_arch_io_reserve_memtype_wc() and devm_arch_phys_wc_add(), and change devm_ioremap() to devm_ioremap_wc() ?
13:55 tzimmermann: jfalempe, let me test first
13:55 tzimmermann: we should keep devm_arch_phys_wc_add(), but it's a no-op if PAT has been enabled
13:56 tzimmermann: let me do some testing and see how the latency changes
13:56 jfalempe: ok
13:57 jfalempe: on my server, the /proc/mtrr approach works. (it was reg08, so disable=8)
13:58 tzimmermann: great. so we have sound fallback if the driver cleanup doesn't do it
15:06 Company: here's a random question that confuses me:
15:06 Company: It turns out I forgot glBindAttribLocation() calls in my code, yet everything worked fine for me on all my machines and with all drivers and versions - software, zink, Intel, AMD
15:06 Company: that part is fine
15:07 Company: however, mclasen - who has the same laptop as me - had broken rendering due to that
15:07 Company: the only difference being that he's on rawhide and I'm on F39
15:07 Company: but they're both Mesa 23.3
15:07 Company: what can cause that?
15:08 Company: and the difference was with both Intel and swrast
15:14 karolherbst: UB being UB
15:15 Company: I'm interested in what's causing it
15:15 karolherbst: could be related to MESA_NO_ERROR
15:19 Company: that would mean that rawhide runs with MESA_NO_ERROR?
15:20 karolherbst: mhhh... does rawhide have different compile flags?
15:22 karolherbst: but I'd assume that you run with MESA_NO_ERROR for whatever reasons, as this will just skip over API errors... but anyway, kinda hard to judge what's going on here without debugging it
15:23 Company: running with MESA_NO_ERROR=1 doesn't cause any issues for me at least
15:23 Company: but maybe there's still shader caches
15:24 Company: I'm mostly curious so I can detect things like that when they happen in the future
15:24 karolherbst: what about when you run with `MESA_NO_ERROR=0` and your glBindAttribLocation call removed
15:25 karolherbst: mhh
15:25 karolherbst: Company: tried creating a gl debug context?
15:25 karolherbst: I think debug builds of mesa do that by default
15:25 karolherbst: might also change some of the UB or something
15:25 karolherbst: MESA_DEBUG=context
15:26 MrCooper: it could just be uninitialized memory, which happens to contain different values on different systems?
15:26 Company: MrCooper: I tried both asan and valgrind and they found nothing
15:26 karolherbst: something like that
15:28 karolherbst: anyway, I'd check with "MESA_DEBUG=context" with your glBindAttribLocation calls removed, just to see if any errors are printed
15:28 Company: nope
15:28 Company: there aren't any
15:29 karolherbst: maybe there are on mclasen's system
15:29 karolherbst: I guess it's a gtk4 app or something?
15:29 karolherbst: or some other toolkit?
15:29 karolherbst: maybe something changed there as well
15:30 Company: it's the new GTK4 renderer - but we're both running the same commit
15:30 karolherbst: of gtk4?
15:30 Company: yeah
15:30 karolherbst: interesting...
15:30 Company: https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/6588
15:30 karolherbst: worst case do an apitrace and see what's different
15:31 karolherbst: mhh
15:31 Company: I'm pretty sure the apitrace would be identical
15:31 karolherbst: seems like mclasen doesn't have any additional errors
15:31 karolherbst: well
15:32 Company: I mean it works with the BindAttribLocation calls added
15:32 Company: but I'm still stumped by why seemingly identical code leads to different results
15:32 karolherbst: welcome to driver development
15:33 Company: and I haven't yet been brave enough to dive into how mesa assigns locations
15:33 Company: well, it happened the same for all drivers
15:33 Company: including software
15:34 karolherbst: sure, but the bug exists or doesn't depending on the system
15:34 Company: right - independent of hardware apparently
15:34 karolherbst: the hw can still be different in subtle ways
15:34 karolherbst: or it's indeed some memory issue
15:34 karolherbst: something something
15:34 Company: sure, it could be things like inodes
15:34 Company: or caches
15:35 karolherbst: yeah.. anyway, nothing surprising with the same code having different results in driver development
15:35 Company: yeah
15:36 Company: I can live with it remaining a mystery, but I'd love to figure it out
15:36 karolherbst: maybe a reboot would trigger the bug for you, or fix it for mclasen, who knows
15:36 Company: I've developed on this thing for a few months and never had an issue
15:36 Company: and I've used mesa versions from 23.2 to git main - on multiple machines
15:37 karolherbst: tried with the same testcases?
15:37 Company: yes
15:37 Company: make check with 0 failures for me and 50% of tests failing for him
15:37 Company: so it's really obvious
15:37 karolherbst: could also be a case of "you have to run those test cases in order to trigger it"
15:37 karolherbst: ohhh
15:37 karolherbst: I see
15:38 karolherbst: interesting
15:38 Company: all his shaders were mixing attribs seemingly randomly
15:38 karolherbst: well, maybe you got lucky or mclasen got unlucky
15:38 Company: and all my shaders kept perfect order
15:40 karolherbst: maybe the GPU is less/more busy on your system and it just changes things in weird ways
15:40 karolherbst: anyway
15:40 Company: ... for software rendering
15:40 karolherbst: at this point I'd make an apitrace
15:41 Company: I would start reading Mesa's code next if I cared - but I'm not that curious
15:41 karolherbst: that wouldn't really help
15:41 Company: trying to figure out how it selects attrib locations when none are set
15:47 jenatali: Company: I've run into that before. There's a quicksort call, which isn't a stable sort, and depending on the result you can get different bindings assigned
15:47 jenatali: I saw a test fail on Windows but pass on Linux due to that
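
The effect jenatali describes comes from `qsort()` making no promise about the order of elements that compare equal. One common way to get the same result from any `qsort()` implementation is to give the comparator a tie-breaker so that no two elements ever compare equal. The sketch below assumes a made-up `struct attrib` and sort key; it is not how Mesa assigns attribute locations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for illustration; not Mesa's data structure. */
struct attrib {
    const char *name;
    int slots;       /* sort key: attributes using more slots go first */
    int decl_index;  /* declaration order, used only to break ties */
};

static int attrib_cmp(const void *pa, const void *pb)
{
    const struct attrib *a = pa;
    const struct attrib *b = pb;

    /* Primary key, descending: the larger slot count sorts earlier. */
    if (a->slots != b->slots)
        return (a->slots < b->slots) - (a->slots > b->slots);

    /*
     * Tie-breaker, ascending: equal keys keep declaration order, so the
     * result does not depend on the qsort() implementation.
     */
    return (a->decl_index > b->decl_index) - (a->decl_index < b->decl_index);
}

static void sort_attribs(struct attrib *attribs, size_t count)
{
    assert(attribs != NULL || count == 0);

    /* Record the incoming order before sorting. */
    for (size_t i = 0; i < count; i++)
        attribs[i].decl_index = (int)i;

    if (count > 1)
        qsort(attribs, count, sizeof(*attribs), attrib_cmp);
}
```

Without the second comparison, three attributes that each use one slot could come back in any order, and that order may change from one libc version to the next.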
15:48 Company: glibc:
15:48 Company: Fedora Rawhide 2.38.9000-30.fc40
15:48 Company: Fedora 39 2.38-14.fc39
15:48 Company: that sounds like a possible culprit
15:52 jenatali: Company: I'm curious if that does end up being it. Let me know if you confirm it
15:53 Company: hard to test without installing a new libc
15:54 Company: not sure I dare installing the rawhide libc on my F39
15:56 jenatali: Yeah fair
15:57 robclark: reminds me of https://gitlab.freedesktop.org/mesa/mesa/-/issues/10217 which was also an f40 qsort undefined ordering change
15:58 Company: I stopped using qsort() because it isn't stable (glib has a copy that is stable)
16:00 Company: but this might be good to know for all the GL tools that will now break because of this in F40 ;)
16:02 ccr: quicksort is an unstable algorithm by definition. of course many of the "qsort()" implementations like the one in glibc are not actually quicksort but something else, not that it matters per se ..
16:05 tzimmermann: jfalempe, it's weird
16:05 tzimmermann: i absolutely have to use the nopat parameter to get the reduced latency on my test system
16:05 Company: ccr: the qsort part in glibc used to be stable - only the non-qsort insertion sort part wasn't
16:06 tzimmermann: even just having pat enabled (without WC for the vram) gives worse results
16:06 Company: but IMO all sort algorithms should be stable by default, because nobody expects sorts to be unstable and that causes bugs
16:07 Company: it's especially bad in C because afaik there's not even a stable sort available in libc
16:08 jfalempe: tzimmermann, that's weird indeed, for me disabling wc was enough and pat was always enabled before.
16:08 Company: jenatali: I installed rawhide glibc now
16:09 Company: jenatali: everything kept working
16:09 Company: jenatali: then I deleted the shader cache
16:09 jenatali: Huh
16:09 Company: jenatali: and now I reproduced it
16:09 jenatali: Ah makes sense
16:09 Company: so yes, F40 libc is the culprit
16:09 jenatali: Libc wouldn't be part of the cache key
16:10 tzimmermann: jfalempe, i removed devm_arch_io_reserve_memtype_wc() and also used plain ioremap(). so there was no entry in the PAT list
16:10 tzimmermann: but i do use a debugging build. it could be that something else interferes here
16:10 ccr: Company, agreed about the "should be stable", yes. unstable sorts .. well, I suppose they have their uses if they are more performant and there's no need for stability.
16:11 tzimmermann: jfalempe, and if i now use nopat results are always good. changing /proc/mtrr doesn't seem to have much of an effect
16:11 Company: in my experience, they're not more performant in almost all cases
16:12 Company: if you want a fast sort, use timsort
16:12 jfalempe: tzimmermann, even with WriteCombine enabled ? that's surprising.
16:13 jenatali: FWIW Windows/MSVCRT qsort has always been unstable so cross-platform GL code should be fine at least
16:13 tzimmermann: yes, even when i have mtrrs set to WC
16:13 jfalempe: tzimmermann, let me try that too.
16:13 tzimmermann: jfalempe, i'll send out a patch that cleans up mgag200 to do the right thing for the common case. from there, nopat + /proc/mtrr should still be an option
16:14 jfalempe: tzimmermann, ok, sounds good. even "nopat" alone should be good, if I can reproduce.
16:14 Company: jenatali: everybody should just use glBindAttribLocation() - but I guess simple tools can forget the call as long as it works fine
16:15 jenatali: Yeah
16:15 tzimmermann: jfalempe, i'm currently testing with this code: https://etherpad.opensuse.org/p/mgag200
16:15 Company: I expect a bunch of tools to break with F40
16:15 tzimmermann: plain ioremap + mtrr setup
16:16 Company: yay, dnf downgrade is a thing, I easily can get my libc back
16:16 tzimmermann: that does not create a PAT entry, but an mtrr (if nopat given)
16:19 jfalempe: tzimmermann, ok, I'm rebuilding a kernel, I should have the results shortly.
16:31 tzimmermann: jfalempe, see you tomorrow
16:33 jfalempe: tzimmermann, see you, thanks for all the help.
17:58 karolherbst: zmike: no regressions on my side with my zink fix + using a real buffer for cb0, so I kinda want to merge it soonish
18:30 eric_engestrom: zmike: no, `backport-to:` is not case-sensitive (btw neither is `fixes: $sha` or `cc: mesa-stable`); there is however a bug right now, where if you specify the line two or more times, only the first match is parsed; I haven't looked into fixing that yet but I have a script that detects any such commit and I handle them manually
18:30 eric_engestrom: zmike: are you asking because something was not backported properly?
18:33 zmike: eric_engestrom: no I was asking because I was going to start using it
18:34 zmike: karolherbst: you can start hassling me if it's been more than 24 hours since you posted a MR
18:34 zmike: it's barely been 12
18:35 eric_engestrom: zmike: ack
18:37 karolherbst: oh, sorry
20:15 mareko: DavidHeidelberg: ping on libdrm
22:26 JoshuaAshton: Lynne: Did not test ffmpeg myself, just was peeking at the code in that branch to see wtf I was missing in my own Vulkan Video Encode setup
22:26 JoshuaAshton: This stuff is so hard to follow :sweat_smile: