IRC Logs of #dri-devel on irc.freenode.net for 2023-03-16

03:45 Venemo: I think this is an interesting issue, but not sure how to label it as it affects many different parts of mesa: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4742 it seems that it fell through the cracks due to not being labeled
03:59 airlied: Venemo: probably one to throw at the mailing list instead, or add all the tags to
03:59 airlied: that really need some sort of toplevel owner to make it happen
04:45 Venemo: airlied: yes I agree
06:11 mupuf: anholt: thanks for letting me know! I'm on it!
06:11 mupuf: seems like the MR to disable it did not land
06:26 mupuf: it's back up, and I am combing the logs to see what happened
06:26 mupuf: but I would understand if you guys would rather wait
06:26 mupuf: CI is already flaky enough
09:26 javierm: airlied, jani, tzimmermann: I was reading the thread about firmware versions and initrd and wondered if there's really a need to have a native DRM driver in the initrd
09:27 javierm: if it is only to enter your LUKS password or troubleshoot the system if it fails to mount the rootfs partition, etc then simpledrm should be enough I believe
09:29 javierm: airlied, jani, tzimmermann: didn't answer in the thread to not derail the current conversation but maybe avoiding to add the DRM drivers (and related firmware) could be a solution for this as well
09:31 tzimmermann: javierm, i'm not in the business of building initrds, but that would be my take on this as well. having native drm drivers in the initrd is a nice-to-have. if one of them interferes, it has to remain on the root partition
09:33 javierm: tzimmermann: I've no business in building initrds either :) but with my fedora distro developer hat, I believe that would be the best approach
09:33 javierm: especially since huge initrds are a big problem, for example for folks booting over the network (PXE and HTTP boot) and so forth
09:47 q4a: Hi. Simple questions: is it possible to use some gallium drivers (like Panfrost, Nine) on Android? Maybe someone can add a description of how Mesa is limited on Android to docs like https://docs.mesa3d.org/android.html ?
10:03 airlied: javierm: I talked to karolherbst about it, he said for a lot of laptops simpledrm isn't enough
10:03 airlied: esp boot with lid closed you get nothing until a drm driver loads often
10:08 javierm: airlied: ah, that's a good point
10:25 skinkie: I have the recurring issue where video= (or even user space) just does not take a specific mode, likely because the name of the mode is equal to another mode.
10:25 skinkie: And the difference here is the framerate, so 1920x1080@50e becomes 1920x1080@60. While for example xrandr would be perfectly capable to switch it to 50p. Now also see the variant where 50 becomes 50i, without actually specifying interlacing.
10:47 madhavpcm: hello
10:52 madhavpcm: Is this the right place to ask for questions related to the ideas mentioned in this page (https://www.x.org/wiki/SummerOfCodeIdeas/).
10:53 karolherbst: yes
10:54 madhavpcm: Great! Im interested to work in the GUI projects namely,adriconf enhancements and Vulkan GPU preference tool
10:59 madhavpcm: I have some experience with the tools mentioned, Can anyone help me with other prerequisites if any?
11:00 madhavpcm: I also noticed a patch was to be submitted and it would be great if I could get some details on that :)
11:51 zmike: mareko: I've got cts changes up for 2/3 the issues, but I'm still missing some context for the KHR-GL46.gpu_shader_fp64.fp64.state_query one (i.e., what exactly is the error you get)
12:34 karolherbst: gfxstrand: I have bad news for you: we need alu ops in nir which will never flush denorms regardless of what the fp mode is
12:35 karolherbst: jenatali: you might be interested as well, because that's required for proper fp16 support
12:36 jenatali: karolherbst: The fp mode has separate states for fp32/fp16/fp64
12:36 karolherbst: yes, but no
12:36 jenatali: You can set a mode that's fp32 flush denorm, fp16 preserve
12:36 karolherbst: vstore/vload_half _can't_ flush
12:36 karolherbst: regardless of the mode
12:37 jenatali: Can't flush 32-bit denorms?
12:37 karolherbst: ehh
12:37 karolherbst: must not
12:37 karolherbst: never ever
12:37 jenatali: Are we talking about fp16 denorms or fp32 denorms?
12:37 karolherbst: so if the application wants denorm flushed, vload/vstore_half still mustn't
12:37 karolherbst: both
12:38 karolherbst: I guess?
12:38 karolherbst: but for now I indicate to drivers that fp16 shouldn't be flushed and that works for vload/vstore_half
12:38 karolherbst: the issue is once you advertize fp16, you also agree to the denorm madness
12:38 karolherbst: by default you can say you don't support denorms with fp16
12:38 karolherbst: _but_
12:38 jenatali: Oh, I see
12:38 karolherbst: vload/vstore_half still must not flush them
12:39 karolherbst: and we do this rounding conversion madness there as well
12:39 jenatali: The D3D spec only allows denorm control for 32bit, the requirements are no flushing for 16 or 64, so I didn't notice
12:39 karolherbst: ahh
12:39 karolherbst: I guess you don't advertize fp16 support in cl yet?
12:40 jenatali: No, I think I tried but something blew up
12:40 karolherbst: figures
12:40 karolherbst: -cl-denorms-are-zero is problem :P
12:40 jenatali: Doesn't that only apply to 32bit?
12:40 jenatali: Removing 16bit denorms really limits the usefulness of fp16
12:41 karolherbst: "This option is ignored for double precision numbers if the device does not support double precision or if it does support double precision but not double precision denormalized numbers i.e. CL_FP_DENORM bit is not set in CL_DEVICE_DOUBLE_FP_CONFIG."
12:41 karolherbst: at least
12:41 karolherbst: not sure about fp16
12:41 karolherbst: let me check..
12:42 karolherbst: I mean.. I wouldn't mind to advertize CL_FP_DENORM always
12:42 karolherbst: for fp16
12:44 karolherbst: jenatali: I think you might be right
12:44 karolherbst: `This option controls how single precision and double precision denormalized numbers are handled.`
12:45 karolherbst: and usually the spec already takes extensions like fp16 into account
12:45 karolherbst: sometimes
12:45 karolherbst: might want to file a spec bug to get some clarification on it
12:50 jenatali: karolherbst: oh I think our software rasterizer blew up. WARP had some bad fp16 conversion logic IIRC
12:50 karolherbst: heh
12:50 jenatali: I recently fixed it for VK fp16
12:51 jenatali: At some point I should try CL again
12:52 karolherbst: yeah.. but I think just requiring CL_FP_DENORM is probably good enough
12:52 jenatali: I also finally fixed the bug with non-uniform indexing of constant buffers because VK hits it
12:52 karolherbst: and if drivers can't handle not flushing denorms then no fp16 for them I guess
12:52 karolherbst: nice
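
[Editor's note] The fp16 denorm point above can be illustrated with a minimal sketch. It is not Mesa or NIR code: `load_half`, `is_half_denorm` and the `flush_denorms` parameter are hypothetical names made up for this illustration. It only shows what is lost when a half-precision denormal is flushed to zero during a half-to-float load, which is what karolherbst says vload_half/vstore_half must never do.

```python
import struct


def is_half_denorm(bits: int) -> bool:
    """True if the 16-bit pattern encodes an fp16 denormal (exponent 0, mantissa != 0)."""
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    return exponent == 0 and mantissa != 0


def load_half(bits: int, flush_denorms: bool = False) -> float:
    """Convert a raw fp16 bit pattern to a Python float.

    flush_denorms is a hypothetical switch modelling a flushing load;
    the default (False) models a load that preserves denormals.
    """
    if flush_denorms and is_half_denorm(bits):
        # A flushing implementation replaces the denormal with a signed zero.
        return -0.0 if bits & 0x8000 else 0.0
    # struct's 'e' format is IEEE 754 binary16 and keeps denormals intact.
    return struct.unpack("<e", struct.pack("<H", bits))[0]


smallest_denorm = 0x0001  # 2**-24, the smallest positive fp16 value
print(load_half(smallest_denorm))                      # preserved
print(load_half(smallest_denorm, flush_denorms=True))  # flushed to zero
```

In this sketch, normal values such as 0x3C00 (1.0) are unaffected by the switch; only the denormal range differs.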
12:58 dolphin: danvet, airlied: drm-intel-gt-next PR sent, will be OoO next week
13:21 karolherbst: I have a terrible idea for an optimization
13:22 karolherbst: make use of alignment information to know if adding an offset to a pointer won't touch the upper bits
13:22 karolherbst: so you could use 32 bit maths on 64 bit pointers
13:23 jenatali: Is that really more efficient? Wouldn't that require splitting the pointer to be able to recombine the high bits
13:26 alyssa: jenatali: that's free unless your compiler sucks
13:26 jenatali: Fair enough
13:26 alyssa: (in your specific case, your compiler being the underlying dx12 driver. it doesn't matter if you have some extra moves in your dxil.)
13:29 jenatali: Yeah, it'd be shifts and masks
13:30 jenatali: But it also doesn't matter because we already do all of our pointer math as 32bit because it's actually a buffer index in the high bits
13:31 jenatali: Also ugh another build break snuck in for the Windows build while it was disabled......
13:34 alyssa: 13:29 < jenatali> Yeah, it'd be shifts and masks
13:34 alyssa: as I said. if the underlying backend compiler can't coalesce them it sucks :p
13:34 jenatali: Yeah
13:57 karolherbst: most will lower it to 32 bit anyway
13:57 karolherbst: and the next opt loop over that would tidy up all the masks
13:57 karolherbst: or whatever you have
13:57 karolherbst: but yeah.. the question is how to do that without making it hard to optimize
13:58 karolherbst: I mean.. nvidia hardware doesn't have 64 bit int ops anyway
13:58 karolherbst: sooo.. it's clearly a win there
14:00 karolherbst: actually
14:00 karolherbst: this is even simpler
14:00 karolherbst: we just have to replace the iadd with an ior
14:01 karolherbst: because check if the offset would touch any of the bits above alignment
14:01 karolherbst: though I can see the corner case of applying two offsets in a chain to the same pointer and stuff, mhhh
14:02 karolherbst: but then we'd have to track down the entire chain and could just do the or on the base ptr again?
14:07 madhavpcm: Hey I had to go offline, did I miss some reply to the earlier msg I put?
14:20 jenatali: karolherbst: You can't track down the entire chain, a pointer could've been stored in e.g. groupshared or scratch memory after a first addition
14:20 karolherbst: yes, and then we use the normal add
14:20 karolherbst: _but_
14:20 karolherbst: we still have the alignment information in a few places
14:21 karolherbst: and we must be able to trust that information
14:22 karolherbst: I just don't think it will matter all that much, still an interesting experiment
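
[Editor's note] A minimal sketch of the iadd-to-ior idea discussed above, written as plain integer arithmetic rather than as a NIR pass. The function name `add_offset` and its parameters are hypothetical. It assumes the pointer really is a multiple of a power-of-two alignment; under that assumption, an offset smaller than the alignment only occupies bits that are zero in the pointer, so OR and ADD give the same result and no carry can reach the upper bits.

```python
def add_offset(ptr: int, offset: int, align: int) -> int:
    """Add offset to ptr, which is assumed to be a multiple of align (a power of two)."""
    assert align > 0 and align & (align - 1) == 0, "align must be a power of two"
    assert ptr % align == 0, "ptr must honour the claimed alignment"
    if 0 <= offset < align:
        # The low log2(align) bits of ptr are zero, so OR cannot produce a carry.
        return ptr | offset
    # Otherwise fall back to the normal (possibly 64-bit) add.
    return ptr + offset


base = 0x0000_0001_0000_1000      # 4 KiB aligned 64-bit address
print(hex(add_offset(base, 0x10, 0x1000)))
```

The result of the OR is in general no longer aligned to `align`, which is one way to read the chained-offset corner case raised in the discussion: a second offset cannot reuse the same alignment fact without going back to the base pointer.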
14:46 pundir: Hi, need some help in figuring out how to fix this mesa/main build error for AOSP. https://www.irccloud.com/pastebin/N02SnfBA/
14:48 karolherbst: you know what python version is used there?
14:49 karolherbst: mhh, but also what's the mesa version?
14:49 karolherbst: or git hash
14:52 pundir: @karolherbst python3.8.10, mesa HEAD at 5c5c114fa290 meson: correct typo in comment
14:53 karolherbst: so I guess it's caused by this: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21754
14:56 karolherbst: weird...
14:56 karolherbst: pundir: I wonder why `list` is an object there
14:57 karolherbst: ohh wait..
14:57 karolherbst: it needs to be uppercase for 3.8 I think
14:58 karolherbst: https://docs.python.org/3/library/typing.html#typing.List
14:58 karolherbst: pundir: can you check if replacing list[str] with List[str] fixes it?
14:59 pundir: @karolherbst checking..
14:59 karolherbst: though not sure if the code would run as is
15:05 pundir: @karolherbst that didn't help. "NameError: name 'List' is not defined"
15:06 pundir: @karolherbst and this failure is indeed introduced in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21754. If I checkout yesterday's tree then I dont see this build error.
15:06 pundir: A new error but not this one :)
15:07 karolherbst: might have to add a `from typing import List` as well
15:19 pundir: @karolherbst "from list import List" helped move the build further, but now i'm stuck at "AttributeError: 'str' object has no attribute 'removeprefix'"
15:19 karolherbst: mhh
15:19 pundir: https://www.irccloud.com/pastebin/4aRgGmz4/
15:19 karolherbst: seems like we stopped being compatible with python 3.8 then
15:20 karolherbst: removeprefix was added in 3.9
15:20 pundir: ohh
15:20 karolherbst: could you file a bug and saying that mesa doesn't compile with python 3.8? Not sure if that's something we care about or not, but if ASOP still uses it, we might want to
15:20 karolherbst: *AOSP
15:21 pundir: so python 3.8.10 is ubuntu-20.04 problem and not an AOSP problem likely
15:22 karolherbst: right..
15:22 pundir: i'll try upgrading python version on my desktop and see if that helps
15:22 pundir: thanks a lot for your time @karolherbst
15:22 karolherbst: though I think if we require python 3.9 we should catch that earlier in the process
15:23 karolherbst: please report back if using python 3.9 works (or any other version you'd be using)
15:23 karolherbst: but testing 3.9 is kind of preferred
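
[Editor's note] The two Python 3.8 incompatibilities hit above can be reproduced and worked around in a few lines. This is a sketch and not the actual Mesa script: `remove_prefix` and `strip_all` are hypothetical helpers. Subscripting the builtin `list` (as in `list[str]`) and `str.removeprefix` both arrived in Python 3.9, while `typing.List` and a manual `startswith` check also work on 3.8.

```python
from typing import List  # typing.List works on 3.8; builtin list[str] needs 3.9+


def remove_prefix(text: str, prefix: str) -> str:
    """Fallback for str.removeprefix, which only exists on Python 3.9+."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


def strip_all(names: List[str], prefix: str) -> List[str]:
    """Apply remove_prefix to every entry of a list of names."""
    return [remove_prefix(name, prefix) for name in names]


print(strip_all(["gl_Vertex", "gl_Color", "other"], "gl_"))
```

Whether a project keeps such fallbacks or simply raises its minimum Python version is a policy decision, as noted in the discussion.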
16:03 alyssa: anholt: Do you think we should land https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20553 ?
16:04 alyssa: I recall you lamenting not having clang-format enforced for i915g a while back, this would solve that at the cost of a little more CI
16:04 alyssa: With a bit more of a clear head then when I rage-closed that MR originally -- I guess I'm neutral
16:06 alyssa: I've locally added a pre-push script that does a clang-format lint, so even if I have a vim formatting fail, non-clang-formatted code should never leave my machine
16:06 alyssa: So it doesn't make a big difference to me and my workflow
16:06 alyssa: However it does change the calculus vaguely for drive-by contributions, unsure if for better or for worse.
16:06 alyssa: eric_engestrom: daniels: mupuf: as other CI stakeholders
16:07 alyssa: lina: bbrezillon: italove: as other driver stakeholders for panfrost & asahi
16:07 alyssa: (That MR enables the lint for Asahi, but Panfrost would presumably follow if it's merged. Currently Panfrost is not clang-format clean though that shouldn't be hard to fix.)
16:08 eric_engestrom: pundir: `from typing import List`, not `from list`
16:09 eric_engestrom: maybe you made a typo on irc and not in your code
16:09 eric_engestrom:should read everything before replying
16:14 eric_engestrom: alyssa: I think the cost is negligible (few seconds of CI), so the rate of false-positives (0 I expect), and the benefit is not having to care about reviewing formatting anymore, which is small but non-negligible, so in favour overall, but not strongly enough to argue over it :]
16:15 alyssa: very fair
16:16 alyssa: the false positive risk I see is "drive-by contributor who doesn't know they need clang-format, we're so used to not reviewing formatting because we have proper editor configuration, we review their patch and assign to marge and the pipeline comes back red 4 hours later because the brace was in the wrong place"
16:16 alyssa: IDK if there's an effective way to mitigate that.
16:17 alyssa: otoh, if that MR is merged (status quo), then when I rebase there'll be a formatting fail so then my script won't
16:17 alyssa: let me push any of my code until I fix their fail
16:17 alyssa: which is not the end of the world either
16:17 alyssa: but not ideal
16:20 daniels: yeah I don’t have any objection to merging format as long as we can make it clear how people can do it locally
16:25 alyssa: daniels: where would you like that documented specifically?
16:28 eric_engestrom:was asking themselves the same question
16:28 eric_engestrom: https://docs.mesa3d.org/codingstyle.html I guess?
16:52 daniels: yeah + making sure that the CI job failure message gives a clear ‘run ./bin/indentme to fix this’ error message
16:53 alyssa: Not fully sure how to do that
16:53 alyssa: also would really love if someone wants to take over that MR because I am still not convinced this is a good idea which makes me the wrong person to drive it, lol
16:54 alyssa: (and because apparently studying full time while also working on multiple drivers is contra-indicated)
16:56 alyssa: daniels: any objection to merging as-is and improving the UX after? because not sure when I will have time to work on this again and regenerating the container isn't great
16:56 alyssa: (or at least merging the "add clang-format to the container" change now)
16:56 alyssa: (especially given the MR is now 2x r-b'd)
16:57 daniels: alyssa: sure, I’m on holiday so am hardly likely to spend much/any time on CI review
16:57 alyssa: oh, didn't realize. happy holidays then
16:58 daniels: grazie!
16:59 alyssa: "Merge blocked: the source branch must be rebased onto the target branch."
16:59 alyssa: uh
17:00 alyssa: shrug
17:03 daniels: Marge will do that
17:04 alyssa: yeah but i
17:04 alyssa: i did rebase it
17:04 alyssa: :p
17:49 Lynne: what's the penalty like for using VK_KHR_buffer_device_address instead of normal storage buffers on amd/intel?
17:53 pendingchaos: for AMD, 64-bit address calculation instead of 32-bit
17:53 pendingchaos: we can usually assume that different SSBO bindings don't alias each other, which helps conclude that memory read is read-only
17:53 pendingchaos: which can't be done with BDA
17:53 dj-death: Lynne: not much different for intel I think
17:54 pendingchaos: BDA doesn't have bounds checking, but it's mostly free with SSBOS, if you care about that
17:55 pendingchaos: otoh, BDA load/store can be vectorized more easily than SSBOs if robust_buffer_access2 is used, because there are no offset wraparound edge cases
17:56 dj-death: Lynne: for intel it's mostly descriptor_indexing that is costly
17:58 Lynne: interesting, I'd expect robustBufferAccess2 to make it slower rather than faster
17:59 pendingchaos: robustBufferAccess2 doesn't make it faster, it's just unaffected by robustBufferAccess2
17:59 Lynne: you can mark BDAs with write/read_only like regular buffers
17:59 Lynne: ah, I misread
18:00 alyssa: tangentially related but how are backend compilers supposed to decide which memory loads/stores need dependencies/waits/barriers and which don't?
18:01 alyssa: if there's a memoryBarrier() in the app that's an easy case but otherwise
18:01 alyssa: presumably that hits some messy alias analysis, unless NIR is already helping with that somehow?
18:14 Lynne: I was looking to use BDAs for compute with buffers ~100mb in size, so I'll go ahead with it, thanks
18:14 Lynne: maybe I'll convert everything to BDA while I'm at it, I don't need bounds checking, my code is perfect
18:15 alyssa: Lynne: https://social.treehouse.systems/@alyssa/110033324649036607
18:21 pendingchaos: 64-bit address calculation is roughly 2x the cost of 32-bit address calculation
18:21 pendingchaos: so unless you need the functionality or helps vectorization a lot, it's probably slightly worse than normal SSBOs
18:26 Lynne: all my loads are sequential, so they're easily vectorizable
18:28 pendingchaos: the problem with ssbo vectorization is alignment, for example: a vec2 load at offset -4 wouldn't work, so we can only vectorize if we're sure that never happens or it's vec2 aligned
18:29 Lynne: that's why the buffer alignment has to be specified for BDAs
18:29 Lynne: my data is all just 4x4 matrices anyway
18:34 Lynne: speaking of, I get roughly 4x the performance loss going from vectors to matrices for the prefix sum shader I have, aren't there dedicated matrix units?
18:38 pendingchaos: no, at least none that are useful for the matN GLSL type
18:38 pendingchaos: I think they require an extension to use and take fp16
18:41 Lynne: yeah, nvidia have an extension to use their tensor units in glsl, though theirs are fp32
18:42 Lynne: cooperative_matrix, sadly their compiler doesn't seem smart enough either to use them without it
19:07 airlied: alyssa: is there a possible problem with clang-format version divergence?
19:07 airlied: also can you share your hook somewhere
19:22 alyssa: airlied: I've pinned a particular format (clang-format-13) both in CI and locally
19:22 alyssa: so version divergence should be limited to some churn once a version bump
19:23 alyssa: s/format/version/
19:23 airlied: i was more thinking on random contributor end
19:23 alyssa: oh. well random contributors need to pin the same version too then
19:24 airlied: or repeat fling at CI
19:24 alyssa: clang-format-13 is packaged in debian from oldstable to sid
19:24 airlied: which is what i do for all projects that enforce clang format
19:24 alyssa: idk what other distros do
19:25 alyssa: if other distros are only packaging a single version that might be trickier.
19:26 alyssa: fwiw lint passes on clang-format-14 too so hopefully not *too* much divergence
19:27 alyssa: airlied: as for scripts
19:27 alyssa: instead of git pushing I use `pub`
19:27 alyssa: https://rosenzweig.io/pub
19:27 alyssa: with this check-format.sh https://rosenzweig.io/check-format.sh
19:28 alyssa: that one is new as of this morning, wanted to see if linting only changed files was materially better
19:28 alyssa: the alternative is what we do in CI (with gnu parallel because 8 cores go brr)
19:28 alyssa: https://rosenzweig.io/format-agx-check.sh
19:28 alyssa: i have some, uh, rube gitberg machines https://rosenzweig.io/agx-mr
20:14 DavidHeidelberg[m]: Did anyone try LVP -> Venus -> Zink ?
20:27 alyssa: yo i heard you like mesa
20:27 alyssa: so we put mesa in your mesa
20:29 jenatali: LVP -> Venus doesn't work, does it?
20:29 jenatali: Also, isn't the Venus renderer on the host Vulkan? How would that layer on zink?
20:33 airlied: I'd have put the arrows the other way
20:33 airlied: zink on venus on lvp
20:33 jenatali: Ohhh
20:33 jenatali: That makes a lot more sense
20:34 DavidHeidelberg[m]: airlied: right. app -> zink -> venus -> lvp (or HW)
20:36 DavidHeidelberg[m]: in CI we have jobs for zink -> lvp and venus -> lvp, so I'm just wondering if there is currently anything why it shouldn't work
20:46 alyssa: why not throw virgl in there for nested virt for funsies
20:47 airlied: app -> virgl -> zink -> venus -> lvp
20:47 alyssa: yes that one
20:47 airlied: needs more pikachu
20:48 airlied: like we should just create one single CI pipeline that tries to use every part of the stack in one run :P
20:49 airlied: like back it all onto some sort of DMX systems running across a range of GPUs
20:49 alyssa: airlied: can we fit openswr and i915c in there somehow
20:49 airlied: opensewer? :-P
20:49 airlied: we need to also fit angle and swiftshader in to appears our google overlords :-P
20:50 airlied: appease
20:50 airlied: I'd blame autocomplete but it was in my brain
20:52 alyssa: airlied: ANGLE can do GLES-to-GL so we can stick it on the top
20:54 DavidHeidelberg[m]: shame we can't do emulation GL -> Vulkan, we could make never ending loop
20:54 alyssa: DavidHeidelberg[m]: oh yes we can
20:55 alyssa: https://github.com/DragonJoker/Ashes
20:56 DavidHeidelberg[m]: alyssa: thanks, had to give a star to this beautiful project
20:57 alyssa: :D
20:58 alyssa: oh also stick some dozen + vkd3d-proton loops in there somewhere
20:58 DavidHeidelberg[m]: so GL -> virgl -> zink -> venus -> ashes -> zink (just for sure) -> anv... I like this train
20:58 DavidHeidelberg[m]: haha, we could play this like a game... attach another wagon :D
20:59 DavidHeidelberg[m]: d3d9on12 or something like that would be nice
21:00 alyssa: nine -> zink -> dozen -> vkd3d-proton -> ashes -> virgl -> angle -> venus -> lavapipe
21:00 alyssa: prince of persia not go brr
21:03 DavidHeidelberg[m]: let me enhance it just to be this train aesthetically pleasing: d3d8to9 -> nine -> zink -> dozen -> vkd3d-proton -> ashes -> virgl -> angle -> venus -> lavapipe
21:03 DavidHeidelberg[m]: btw. we could make contest. Who stack most of tech on top of each other and it still can render triangle :D
21:05 HdkR: I'm sure you could stick FEX in there somewhere
21:06 kisak: need to add a contest rule to not repeat layers
21:13 Fijxu: Hello, i wanted to try NOUVEAU driver because the nvidia-open drivers are really a mess since i can't fully shutdown the GPU on there (Since i am using a Thinkpad with a 3070 Max-Q and the idle power is 11W and that is a lot) and well, it works better (in terms of power management) since the dedicated one fully powers off when is not used. But i wanted to try the NVK Mesa driver, i compiled and installed successfully the NVK Mesa but vulkan doesn't
21:13 Fijxu: work using `vkcube`. Any clue? I would ask this on #nouveau-vk but is kinda empty
21:14 Fijxu: And of course i used DRI_PRIME=1 variable on every command
21:20 airlied: you want #nouveau more likely
21:35 Fijxu: airlied: Thanks, i will try to ask there ;)
21:36 Fijxu: This thing of nvidia is the only bad thing of my thinkpad power usage
22:13 Company: random GL question: glTexParameter() - are those properties set on the texture referenced by the id or are they set on the texture unit of the current context?
22:14 Company: ir if context1 sets the filter of a texture, does that affect the filter of the same texture used in context2?
22:16 Company: *ie
22:19 pendingchaos: glTexParameter() affects the texture, not the texture unit
22:19 pendingchaos: so that sets the filter for both contexts (though I don't know if you can share textures across contexts like that)
22:22 Company: (you can, but properties are only guaranteed to be synced at predefined sync points and a bunch of stuff is pretty undefined if both modify stuff at the same time)
22:22 Company: yes, I had to debug GLsync stuff recently
22:23 Company: so if I want to share texture data between 2 contexts but use 2 different filters in each, what's the best way to achieve that?
22:24 pendingchaos: sampler objects can be used to split the filter mode (and some other settings) from the texture object
22:25 Company: which is not in GLES2, hurray
22:26 pendingchaos: texture views, maybe? besides just constantly changing the filter mode
22:27 pendingchaos: looks like GLES2 doesn't have texture views
22:27 Company: that's GL 4.3+ only
22:28 Company: okay, but there are solutions, worst case we need to fallback to manual copying for GLES
23:59 alyssa: Company: the short answer to multi-context GL is "don't"