07:43pq: I seem to have a knack for killing any discussion I participate in, at least on dri-devel.
08:28darkbasic4: Can someone please confirm if rusticl does work with nouveau on mesa 23.1? I'm trying to finalize the Gentoo ebuild but I can't remember if it's currently limited to Intel+radeonsi
09:34karolherbst: there are still discussions on dri-devel?
09:34daniels: one or two (thousand) posts about colour management
09:35karolherbst: sounds more like bikeshedding than discussions :P
09:35daniels: no, it’s just really really hard tbf
09:35karolherbst: ahh
09:36daniels: and there’s no clear objectively ‘good’ answer, it’s all tradeoffs as well as some guesswork around future hardware etc
09:36karolherbst: pq: +1 though
09:37kode54: showing how important VM_BIND is to DG2
09:37kode54: Ryujinx runs unplayably slow on i915, but works just fine on Xe
09:51kode54: um
09:59pq: karolherbst, is that +1 like "I do that too" or "pq should keep doing that" or "pq is harmful to discussions"? :-)
10:00karolherbst: nah, to your last comment. out of tree drivers don't exist in terms of discussion what goes into the kernel :P
10:00karolherbst: no idea why people still try to make exceptions
10:02pq: ah, cool.
10:03pq: except drm_fourcc pixel formats? ;-)
10:03karolherbst: huh? it's used in open source
10:03karolherbst: maybe there are some formats we don't use yet?
10:04karolherbst: But the nvidia stuff is more or less used if you do tegra + nouveau stuff on tegra chips afaik
10:04pq: they are not required to the used in the kernel, but are they required to have FOSS code somewhere still?
10:04pq: *be used
10:04kode54: I saw what I needed from the changelog
10:04karolherbst: good question, it's not really uapi in the sense of uapi...
10:04kode54: I needed BuildStream 1.6.8
10:05karolherbst: but I think they are all valid to be used with nouveau
10:05karolherbst: but that fourcc stuff is also highly dynamic
10:06karolherbst: anyway.. it's actually required for tegra chips even in userspace, otherwise... uhh.. things break
10:06karolherbst: so yeah
10:06pq: karolherbst, I don't think it's so much different in concept from the COLOROP element definitions. There could be an out-of-tree driver that makes use of a drm_fourcc while no upstream does.
10:07karolherbst: it kinda depends on the stack tho
10:07karolherbst: but yeah, a specific value might be only used on a prop driver, due to exposed formats or whatever
10:08karolherbst: but it is all supported in upstream userspace
10:10karolherbst: in the end it kinda means how strongly defined the meaning of bits in data is. Drivers can always use undefined bits for random things and a specific undefined value can do so as well
10:11karolherbst: but once we do define those and non open source userspace breaks, what then?
10:11karolherbst: is it a regression? is it none?
10:11karolherbst: does the "we don't break userspace" also cover closed source userspace?
10:11karolherbst: and if so, do we really want undefined things to happen?
10:13karolherbst: so the only choice we have is to reject undefined bits/values and require open source implementors/users
10:18kode54: lovely, BuildStream 1.6.8 isn't compatible with Python 3.11
10:35kode54: good, now it's rapidly building things
10:35kode54: all I needed was to install python 3.10 and set up a venv for it
10:53sima: karolherbst, drm_fourcc is slightly different because upstream serves as the official registry for some khr extensions
10:53sima: https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html#open-source-user-waiver
10:53karolherbst: sima: ahh, good to know
10:53sima: pq, ^^ in case you ever need the doc reference
10:54karolherbst: so if you want to define custom prop only formats you'll have to get a waiver for it first
10:54karolherbst: and a very _strong_ reason
10:56sima: well the 2nd paragraph is supposed to cover that
10:57karolherbst: fair
11:17karolherbst: airlied: what's the subgroup situation with llvmpipe? It kinda returns bogus results at runtime not matching the CAPs
11:17karolherbst: is the subgroup size fixed? Is it variable?
11:17karolherbst: How big are those?
11:23karolherbst: ohh it's lp_native_vector_width / 4
11:23karolherbst: ehh lp_native_vector_width / 32
11:26airlied: karolherbst: yeah it's 8 usually and sometimes 4 if you have a crap cpu
11:26karolherbst: also works with forcing it to 512 apparently
11:26karolherbst: but yeah..
11:26karolherbst: the cap returns 32 :)
11:26karolherbst: anyway, got the CTS subgroup stuff to pass on llvmpipe now
11:27airlied: k
11:28karolherbst: 1024 returns weird values, but I suspect it's just not supported
11:28karolherbst: and as long as it's behind an env var I'm inclined to not care
11:31karolherbst: so let's see if I made it work on iris, llvmpipe and radeonsi now :)
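[Editor's sketch] The relation mentioned above (subgroup size is lp_native_vector_width / 32) can be illustrated with a small C helper. This is only a sketch: `sketch_subgroup_size` is a hypothetical name and not llvmpipe's actual code, and it assumes the vector width is given in bits and counted in 32-bit lanes.

```c
#include <assert.h>

/* Hypothetical helper, NOT llvmpipe's real code: number of 32-bit lanes
 * that fit in a native vector of the given width (in bits). */
static unsigned sketch_subgroup_size(unsigned native_vector_width_bits)
{
    /* Assumption: the width is a whole number of 32-bit lanes. */
    assert(native_vector_width_bits % 32 == 0);
    return native_vector_width_bits / 32;
}
```

With these assumptions a 256-bit vector gives 8 lanes and a 128-bit vector gives 4, matching the "8 usually and sometimes 4" above; a forced width of 512 would give 16.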
12:56jani: mripard: tzimmermann: mlankhorst: ack on merging this via drm-intel? https://patchwork.freedesktop.org/patch/msgid/20230511103714.5194-1-juhapekka.heikkila@gmail.com
12:56tzimmermann: jani, sure, makes sense
14:05alyssa: jenatali: Thanks for the review!
14:05alyssa: Can I count you down for reviewing the rest of the atomics series too? ;-)
14:07alyssa: This was MR 1, "add the new intrinsics and port a bunch of drivers to them"
14:08alyssa: next few MRs are the straggler drivers that didn't make it into that MR, honestly that one is already bigger than I wanted
14:08alyssa: once all the backends are converted .. there's the rest of the tree
14:08alyssa: TBD whether I can tell a nice story for that one or it ends up being a big ol' flag day
14:17jenatali: alyssa: Sure, I'm happy to review what I can
14:18alyssa: :-D
14:18alyssa: thank you
14:18jenatali: It's a good / much-needed change. Our backend is going to get much simpler
14:19alyssa: Literally every backend I looked at except for Midgard is simplified
14:20alyssa: and once all the backends are converted, core is going to shed a LOT of code :3
14:22alyssa: jenatali: and do ping me if you ever need NIR review
14:22jenatali: Yeah. I don't know how long you want to let that sit waiting for more reviewers. Probably at least want acks from the backend maintainers (that aren't already you)
14:23alyssa: > that aren't already you
14:23alyssa: ...Lol
14:23alyssa: Alyssa: I am stepping down from Panfrost. My new job is not working on Apple GPUs.
14:23alyssa: Also Alyssa: Writes patches for all 3 compilers on her first day.
14:27alyssa: ---
14:28alyssa: robclark: anholt_: dschuermann: airlied: italove: bbrezillon: The atomic rework MR contains a significant patch to your backend that you haven't yet reviewed. Jesse has reviewed it so I am comfortable landing but giving you a heads-up in case you would like to review yourself as well before merge https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22914
14:28alyssa: Cheers :)
14:29alyssa: (Apologies for the freedreno/ir3 and radv/aco labels getting lost, I added those commits after the MR was labelled)
14:30alyssa: absent further review I'll probably merge tomorrow
15:16karolherbst: I wonder if I should test on nouveau before that
15:17karolherbst: guess I'll do some testing once I'm done with testing subgroups
15:21kj: How come there's no p_atomic_dec() in u_atomic.h? I feel like there's probably a reason that I'm not aware of
15:22kj: I meant p_atomic_sub()
15:22karolherbst: sub == add
15:22jenatali: Is there a good reason to have one? Would it do anything other than add a negative value?
15:24kj: Fair enough. I thought there would have been a difference between using `__atomic_sub_fetch()` and the add with a negative value
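[Editor's sketch] The "sub == add" point can be shown with the GCC/Clang `__atomic` builtins. The function names below are hypothetical (u_atomic.h has no p_atomic_sub(), as noted above), and unsigned arithmetic is used so that negating the operand wraps in a well-defined way.

```c
#include <stdint.h>

/* Hypothetical name: subtract by atomically adding the negated operand.
 * Unsigned negation wraps modulo 2^32, so this is well defined. */
static inline uint32_t sketch_atomic_sub_via_add(uint32_t *v, uint32_t i)
{
    return __atomic_add_fetch(v, (uint32_t)(0u - i), __ATOMIC_ACQ_REL);
}

/* Hypothetical name: the same operation using the dedicated builtin. */
static inline uint32_t sketch_atomic_sub_builtin(uint32_t *v, uint32_t i)
{
    return __atomic_sub_fetch(v, i, __ATOMIC_ACQ_REL);
}
```

Both helpers return the new value and leave the same result in memory, which is why a separate subtract wrapper adds little beyond readability.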
15:24HdkR: Where's my atomic two's complement negate? :)
15:26karolherbst: HdkR: cursed
15:27HdkR: karolherbst: Tell that to x86 :P
15:27karolherbst: I'm sure doing atomic mul isn't that trivial, but maybe it is? dunno
15:28karolherbst: but yeah, negate would be more simple
15:29HdkR: Everyone loves sticking weird ops in their memory pipelines I guess
15:54robclark: just need that atomic strlen instruction
15:54mareko: atomic_sub should give you negation
15:56mareko: actually no
15:56HdkR: Wrong way around sadly
16:01HdkR: If anyone actually wants to use atomic negate, I would give that code a very strong eyeballing
16:02mareko: cmpswap then
16:02HdkR: `lock neg` is all you need
16:04HdkR: https://github.com/FEX-Emu/FEX/blob/main/unittests/ASM/PrimaryGroup/3_F7_03_2.asm#L14 Need to update my unittests to test cross-cacheline...
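[Editor's sketch] The "cmpswap then" suggestion above can be sketched as a compare-and-swap loop. An atomic subtract computes `*v - x`, not `x - *v`, so it cannot negate the stored value by itself. `sketch_atomic_negate` is a hypothetical name; the code uses the GCC/Clang `__atomic` builtins and is an illustration, not code from Mesa or FEX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically replace *v with its two's complement
 * negation and return the previous value. */
static inline uint32_t sketch_atomic_negate(uint32_t *v)
{
    uint32_t old = __atomic_load_n(v, __ATOMIC_RELAXED);

    /* On failure the builtin stores the current value of *v into old,
     * so the negated value is recomputed on every retry. */
    while (!__atomic_compare_exchange_n(v, &old, 0u - old, false,
                                        __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
        ;
    return old;
}
```

On x86 a single `lock neg` does the same job without a retry loop, which is the point HdkR makes above.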
16:08mareko: HdkR: I would also need the transistors that do "lock neg" in my GPU, and those aren't easy to get
16:09HdkR: :D
16:09HdkR: Bit spicy to do on the GPU for a worthless atomic operation
16:12mareko: there is a huge transistor inequality, only the biggest companies can get the transistors they need
16:13alyssa: fight transistor inequality!
16:13alyssa: transistors, transbrothors, and transiblors unite!
16:13alyssa: :p
16:14HdkR: hehe
19:09alyssa: 1 line docs/python patch, any takers? =)
19:09alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22929/diffs?commit_id=d40fbf33ba2d44be21a746270f0533ab375fc4a7
21:14karolherbst: gfxstrand: do you think you'll have some time to discuss global mem initializers? I've looked at how we potentially could support it, but it looks like pain to implement atm. So there is already some code inside nir to handle this, but I'm convinced it's all wrong. We can't do the lowering while in nir or rather.. we can only do it as long as we still have all functions, so it's entirely incompatible with the per kernel approach we are
21:14karolherbst: doing atm. vtn kinda has to set the layout of the global data buffer of a spirv + provide a buffer with the init data. The frontend takes that buffer and preinitializes memory with it. The frontend lowers load_global_base_ptr. And nir simply translates access to that buffer to offsets according to the layout set by vtn (mostly the order of all variables, maybe we can still sneak some DCE in).
21:15karolherbst: at least those are my initial thoughts on how to implement it
21:15karolherbst: can discuss it tomorrow or next week or whenever fits for you
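[Editor's sketch] The layout idea described above (variables placed in order in one global data buffer, plus a buffer of init data that the frontend copies into memory) could look roughly like the following. Every name here is hypothetical and nothing is taken from vtn, nir or rusticl; the sketch assumes power-of-two alignments and zero-fills variables that have no initializer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one global variable. */
struct sketch_global_var {
    size_t size;       /* size in bytes */
    size_t align;      /* assumed to be a power of two */
    const void *init;  /* initializer data, or NULL for zero-init */
    size_t offset;     /* filled in by sketch_layout_globals() */
};

/* Assign offsets in declaration order; returns the total buffer size. */
static size_t sketch_layout_globals(struct sketch_global_var *vars, size_t count)
{
    size_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        offset = (offset + vars[i].align - 1) & ~(vars[i].align - 1);
        vars[i].offset = offset;
        offset += vars[i].size;
    }
    return offset;
}

/* Build the init-data buffer that a frontend would upload. */
static void sketch_fill_init_data(const struct sketch_global_var *vars,
                                  size_t count, unsigned char *buf,
                                  size_t buf_size)
{
    memset(buf, 0, buf_size);
    for (size_t i = 0; i < count; i++) {
        if (vars[i].init)
            memcpy(buf + vars[i].offset, vars[i].init, vars[i].size);
    }
}
```

Accesses to a variable would then become the buffer's base pointer plus the stored offset, which is why the layout has to be decided before the module is split per entry point.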
21:21jenatali: karolherbst: Can we pass in a "hidden" additional kernel arg (i.e. ubo var) for the base address of the persistent global mem?
21:21jenatali: At least that's kind of how I expected it to look
21:21karolherbst: yeah, but the frontend can just do it
21:21karolherbst: when it sees load_global_base_ptr, it can add an internal argument for it
21:22jenatali: Oh I see. Yeah alright
21:22karolherbst: I just don't really know how to deal with the initial data actually
21:23karolherbst: the pain part is, that currently we don't translate one spirv to multiple nirs, but rather do it per entrypoint
21:23jenatali: Right
21:23karolherbst: not sure if that will get us all of the memory initializers even
21:24jenatali: It won't if you DCE
21:24karolherbst: I think this feature might even be the point where we say "alright.. spirv_to_nir will one nir with all entrypoints and frontends have to dup it to compile against one"
21:24karolherbst: *generate
21:24jenatali: Yeah, you could already do that if you ask for the nir as a library though, can't you?
21:25karolherbst: mhhhh
21:25karolherbst: good question
21:25karolherbst: I think there are some benefits in doing that anyway once we get rid of inlining even
21:26karolherbst: like.. we could pre optimize the nir before an entry point gets picked
21:26karolherbst: maybe I'll play around with that approach
21:28karolherbst: but yeah.. that would allow us to optimize the nir already, do some initial DCE and extract the global init data then before splitting the nir
21:28jenatali: Yeah
21:28karolherbst: could also DCE function arguments if we aren't doing it already
21:29karolherbst: though I'd keep them for entry points, because they are pain to deal with, because API validation rules
21:29karolherbst: not sure
21:29karolherbst: but that's for later anyway
21:30karolherbst: mhh.. create_library is used quite a bit, have to read through the code and see what might need adjustments
21:35jenatali: I suspect it's mostly used for raytracing these days
23:07karolherbst: and the libclc library
23:28karolherbst: jenatali: uhh.. this starts to become a real headache project. We also have per kernel information stored in the shader_info struct, e.g. the workgroup size
23:28jenatali: Yeah
23:29karolherbst: maybe pytorch doesn't need it and I can ignore this problem for a while longer 🙃
23:29karolherbst: but seems like that for general HIP support on top of CL we'll need it
23:31karolherbst: maybe gfxstrand has a smart idea, but I suspect if we want to support it properly we really have to convert to a model where nir can handle multiple entry points
23:32karolherbst: even for initializers we'll need it, because... we have to decide on a global variable buffer layout anyway