IRC Logs of #dri-devel on irc.freenode.net for 2024-02-15

08:15 dj-death: karolherbst: this case I'm dealing with is kind of annoying
08:15 dj-death: karolherbst: how do I build a non cast deref with a different type than the structure or structure fields
08:16 dj-death: I guess I could do this with arrays...
08:18 dj-death: because I can't have a load_deref of the original struct
08:23 dj-death: I guess all I can do here is teach nir_lower_vars_to_ssa about those llvm derefs
08:41 dj-death: ah wtf
08:41 dj-death: even if I call nir_lower_vars_to_explicit_types
08:41 dj-death: it doesn't give an explicit stride to some of the temp variables
08:50 dj-death: lol
08:50 dj-death: yeah, all structs are created without an explicit_stride
09:15 dj-death: and of course all the stride fields are not saved in blobs
09:33 u-amarsh04: is anyone able to help get me started with git-bisecting mesa under Debian?
09:45 karolherbst: dj-death: uhh.. that doesn't sound too good :') guess we may have to focus on cleaning this all up
10:04 dj-death: karolherbst: yeah :(
10:05 dj-death: karolherbst: it's kind of hard if you have non vec4 types of structs
10:08 dj-death: I'm loading indirect structs and they're vec5
10:08 karolherbst: oh no
10:08 karolherbst: I do wonder though what we want to do about implicit strides...
10:09 karolherbst: like in the trivial case where the cast has a stride, but we haven't assigned one to the struct fields
10:14 dj-death: I have this struct which is local and 18 dwords
10:14 dj-death: it gets packed into 10 dwords
10:15 dj-death: how stuff gets packed in the local data is completely irrelevant
10:15 dj-death: but yeah we're completely unable to put all of that in ssa values atm
10:16 dj-death: I almost wonder if it's not just easier, once everything has been lowered to scratch load/stores, to build back ssa values out of it :)
10:21 karolherbst: dj-death: I _think_ this is the proper fix for the original kernel: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/d9d83f40de680f49928c40869d414b2130ed7c68
10:24 karolherbst: but yeah.. still hitting an assert.. let's see what's the problem there..
10:25 karolherbst: 32x4 %54 = @load_global (%52) (access=none, align_mul=8, align_offset=0)
10:25 karolherbst: @store_scratch (%54, %7 (0x0)) (align_mul=256, align_offset=0, wrmask=xyzw)
10:25 karolherbst: 64 %55 = @load_scratch (%25 (0x8)) (align_mul=256, align_offset=8)
10:25 karolherbst: 64 %56 = @load_scratch (%7 (0x0)) (align_mul=256, align_offset=0)
10:25 karolherbst: 🙃
10:28 karolherbst: I think that's your vec4 problem
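The NIR dump above stores one 32x4 value to scratch offset 0 and then reads it back as two 64-bit values at offsets 0x0 and 0x8. A minimal sketch of that access pattern follows, assuming a little-endian byte layout; the helper names `store_vec4_u32` and `load_u64` are hypothetical and only model the byte overlap, they are not NIR or Mesa API.

```python
import struct


def store_vec4_u32(scratch, offset, values):
    """Write four 32-bit components at `offset` (models the 32x4 store_scratch)."""
    struct.pack_into("<4I", scratch, offset, *values)


def load_u64(scratch, offset):
    """Read one 64-bit value at `offset` (models a 64-bit load_scratch)."""
    return struct.unpack_from("<Q", scratch, offset)[0]


# 16 bytes of scratch: exactly one vec4 of 32-bit components.
scratch = bytearray(16)
store_vec4_u32(scratch, 0, (0x11111111, 0x22222222, 0x33333333, 0x44444444))

# Each 64-bit load covers two of the stored 32-bit components.
low = load_u64(scratch, 0)   # components x and y
high = load_u64(scratch, 8)  # components z and w
```

The store and the loads disagree on both bit size and component count, so a pass that matches accesses purely by type cannot pair them up, even though the byte ranges line up exactly.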
10:34 naveenk2: HI Nemesa
10:35 nemesaga: Hi Naveen
10:36 adarshgm: test msg
10:38 nemesaga: Hi All
10:42 u-amarsh04: I'm trying to git bisect mesa under Debian / Devuan and have only git bisected kernels before. Are there any up-to-date guides to help?
10:43 karolherbst: u-amarsh04: you want to use `meson devenv` so you won't have to install stuff
10:43 karolherbst: but that's pretty much it
10:43 karolherbst: just write your reproduce script and use `git bisect run`
10:43 karolherbst: or do it manual if you can't automate it
10:47 u-amarsh04: so if I have git source in /usr/src/mesa and from in that directory have run "meson setup ../meson-test", what next?
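A minimal sketch of the reproduce script suggested above, assuming the build directory was created with `meson setup` and that the test command exits with 0 on success. The script name, the `verdict` and `bisect_step` helpers, and the argument layout are hypothetical; the exit codes follow what `git bisect run` documents (0 is good, 125 means skip, other codes from 1 to 127 are bad).

```python
#!/usr/bin/env python3
"""Hypothetical reproduce script for `git bisect run` (a sketch, not a Mesa tool)."""
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def verdict(build_rc, test_rc):
    """Map the build and test return codes to a bisect exit code."""
    if build_rc != 0:
        return SKIP  # this commit does not build, so it cannot be judged
    return GOOD if test_rc == 0 else BAD


def bisect_step(build_dir, test_cmd):
    """Rebuild the current checkout, then run the test inside `meson devenv`."""
    build_rc = subprocess.run(["ninja", "-C", build_dir]).returncode
    if build_rc != 0:
        return verdict(build_rc, None)
    # `meson devenv` sets up the environment so the freshly built,
    # uninstalled libraries are the ones the test command picks up.
    test_rc = subprocess.run(
        ["meson", "devenv", "-C", build_dir, *test_cmd]
    ).returncode
    return verdict(build_rc, test_rc)


# Only act when called with a build directory and a test command.
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(bisect_step(sys.argv[1], sys.argv[2:]))
```

With the layout from the question this could be invoked as `git bisect run ./bisect_step.py ../meson-test <test command>`, after marking one good and one bad commit with `git bisect start`, `git bisect good` and `git bisect bad`.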
10:55 dj-death: karolherbst: I'm starting to think lower_vars_to_ssa needs an upgrade
10:56 dj-death: karolherbst: we should really be able to remap everything if all accesses are not indexed with dynamic values
10:56 karolherbst: yeah...
10:56 dj-death: take the whole struct size
10:56 dj-death: split it into vec4s
10:57 dj-death: rebuild the casts
10:57 dj-death: I mean even vec16 right
10:57 dj-death: if it's all constant offsets, it'll get split correctly
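The idea sketched above (take the whole struct size, split it into vectors, remap constant-offset accesses) can be illustrated roughly as follows. This is only a model of the arithmetic, not a NIR pass; `split_into_vecs` and `locate` are hypothetical names, and dword-aligned constant offsets are assumed.

```python
def split_into_vecs(num_dwords, max_comps=4):
    """Return the component count of each vector chunk covering `num_dwords`."""
    chunks = []
    remaining = num_dwords
    while remaining > 0:
        comps = min(max_comps, remaining)
        chunks.append(comps)
        remaining -= comps
    return chunks


def locate(byte_offset, max_comps=4, dword_size=4):
    """Map a constant, dword-aligned byte offset to (chunk index, component)."""
    if byte_offset % dword_size:
        raise ValueError("offset is not dword aligned")
    return divmod(byte_offset // dword_size, max_comps)


# The 18-dword local struct mentioned earlier, split into vec4-sized chunks.
chunks = split_into_vecs(18)
# The same struct split with vec16 as the widest chunk.
wide_chunks = split_into_vecs(18, max_comps=16)
# A constant access at byte offset 40 lands in one chunk and one component.
where = locate(40)
```

This only works when every access uses a constant offset, which is the condition stated in the discussion; a dynamically indexed access could touch any chunk.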
10:58 karolherbst: it would probably be enough to just take into account the actual location instead of just looking at the logical paths
10:58 karolherbst: though we don't always have this information when calling into vars_to_ssa
10:58 karolherbst: we could require explicit types, but then we need to fix a couple of passes that can choke on that
10:59 karolherbst: though maybe only nir_opt_memcpy needs fixing
10:59 karolherbst: ehh
10:59 dj-death: explicit types is another issue on its own :)
10:59 karolherbst: maybe those things got fixed actually
10:59 karolherbst: well
10:59 karolherbst: you don't know the size of a struct without explicit types
11:00 karolherbst: so taking the size into account is impossible without that
11:00 dj-death: the fact that we lower explicit types and the explicit_stride is still 0
11:00 dj-death: that's completely broken
11:00 dj-death: that's probably step0
11:00 karolherbst: I sadly don't know what was the reason for that, but gfxstrand might be able to tell
11:02 karolherbst: though I think explicit stride is only set in vtn by creating the type
11:03 dj-death: it's never set on structs
11:03 dj-death: only arrays & matrix
11:03 karolherbst: yeah, because structs don't have explicit strides
11:04 karolherbst: I think the bug is rather that we shouldn't rely on the explicit stride being set at all
11:04 karolherbst: _though_
11:04 karolherbst: having like this duality also sucks
11:05 karolherbst: but this smells like "spend a month to rework all of this" to change how we do things here
11:08 dj-death: can't we just special case with nir->info.stage == MESA_SHADER_KERNEL and glsl_get_cl_size ? :)
11:09 karolherbst: I don't see how that would actually help? Because the fix is to not rely on the explicit stride anyway
11:09 karolherbst: I think
11:10 karolherbst: though I think that assigning a stride to every type is probably the better long term solution here and just rely on that. In any case, we should discuss with gfxstrand before somebody spends a lot of time on this and then we decide something else
12:05 thellstrom: Has there been any discussion about using __GFP_ACCOUNT for user-mode triggered allocations in the kernel mode drivers? I can imagine starting adding that in existing drivers may break existing setups. But for new drivers, is this a good time to audit for such allocations and add it?
12:16 alyssa: would like to get rid of SHADER_KERNEL someday
12:21 sima: thellstrom, doing that for system ram gpu allocations is pretty much what the last big cgroups discussion boiled down to
12:21 sima: it's just kinda a pile of work
12:22 karolherbst: alyssa: that requires fixing zink :')
12:22 sima: thellstrom, wrt the backwards compat thing, I figured we'll just do a Kconfig
12:22 karolherbst: or well.. rework how gl sampler/textures work
12:22 sima: since it's kinda a distro choice
12:23 sima: but that means when we have it, we should at least try to account consistently across drivers
12:23 karolherbst: not sure we have any other reasons why there is still KERNEL
12:23 sima: which I don't think is that much work since almost everyone uses helpers nowadays
12:23 sima: the other fun part is that you kinda need a cgroups aware shrinker or it'll not work out great at all
12:25 thellstrom: sima: I was thinking of starting adding it initially to things we can't really shrink. Like persistent structure allocations triggered by user-space etc. But yeah for shrinkable buffer object memory, cgroups-aware shrinkers would be needed.
12:26 sima: thellstrom, I fear a bit that if we're very piecemeal then we need a new opt-in every time we add a substantial amount of memory
12:26 sima: but if we start out with gem bo accounting first, then adding the other bits should only ever really catch abusive applications that e.g. create a ton of ctx they don't actually use
12:27 sima: thellstrom, t j mercier did work on this last, including some charge transfer stuff that android would need, for otherwise it all lands in the central allocator binder process
12:29 sima: thellstrom, I chatted with mlankhorst about this earlier this week and also dropped a few links there, can dig them out again if you want
12:32 thellstrom: NP. I can ping mlankhorst if needed. This was more like a general question whether it was a "No, don't do that" thing.
12:57 zmike: dj-death: is there any progress on making anv work again or do I need to just locally revert however many patches
12:58 dj-death: zmike: not yet
12:58 zmike: 🤕
12:59 karolherbst: could use llvm-16 locally
13:03 dj-death: zmike: probably going to implement that stupid scratch to ssa plan
13:06 u-amarsh04: karolherbst - thanks, I think that I have it working now
13:06 karolherbst: dj-death: how would that work though?
13:06 karolherbst: like...
13:06 karolherbst: we need to be able to optimize it all before explicit_types
13:06 karolherbst: or rather...
13:06 karolherbst: before explicit_io
13:07 karolherbst: but maybe running explicit_types twice or moving scratch size calculation elsewhere might actually work?
13:07 dj-death: after explicit_io
13:07 dj-death: because I know my shaders
13:07 dj-death: with a special intel_clc --fuck-you-llvm-17
13:08 karolherbst: but then you can't calculate a new scratch_size
13:08 karolherbst: ahh.. so intel_clc specific pass then?
13:08 dj-death: I assume it'll be 0
13:08 karolherbst: right...
13:08 dj-death: dirty
13:08 karolherbst: probably good enough until we properly fix it
13:08 dj-death: but at least that'll make zmike happy
13:08 dj-death: yeah
13:10 zmike: generally being able to init drivers does make me happy, yes
13:10 zmike: MrCooper: did you have additional comments or just the one
14:19 dj-death: karolherbst: it's interesting that all the function stuff is using scratch
14:19 dj-death: karolherbst: and yet once you inline it goes away magically
14:20 alyssa: dj-death: i hit this with libagx too, yeah
14:20 alyssa: The big culprit are return values
14:21 dj-death: but that seems okay for me
14:21 alyssa: which are derefs in NIR so we're forced to use scratch variables for me
14:21 alyssa: s/me/them/
14:21 alyssa: but after inlining, vars_to_ssa chews through them and makes the variables go away
14:21 alyssa: but scratch_size is never decremented
14:56 dj-death: zmike: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27637 working for you?
14:57 zmike: dj-death: will test in a bit
15:03 karolherbst: dj-death: why is it interesting that function uses scratch? where else would you put it (besides SSA values)?
15:03 karolherbst: or do you mean the function parameters?
15:03 dj-death: karolherbst: yeah parameters
15:03 karolherbst: that's probably because kernels are cursed...
15:04 karolherbst: you can read up on some of the kernel wrapper we emit inside spirv_to_nir
15:04 karolherbst: but memory has to go somewhere anyway
15:06 dj-death: could be something else
15:06 dj-death: like stack
15:06 karolherbst: but I think most of it is due to llvm placing anything bigger always in memory? but the spirv way of passing things by value is also by putting them into function memory
15:06 karolherbst: and then pass in pointers
15:06 karolherbst: and yeah.. CL allows you to take pointers to stack memory :)
15:07 karolherbst: though I think the reason we end up with so many pointers as function arguments is because that's how LLVM and the translator work
15:07 dj-death: because for instance execute in RT pipelines goes into a different memory location than scratch for us
15:09 karolherbst: mhhh
15:09 karolherbst: we could potentially make use of other memory regions, but not quite sure how that would all look, as CL C really has a stronger model here than GLSL
15:22 jenatali: dj-death: isn't scratch the same as stack?
15:23 dj-death: jenatali: what do you mean?
15:23 jenatali: alyssa: yeah see the long discussion around scratch_size yesterday and how it's currently terrible. You have to reset it and re-run vars to explicit types after optimizations
15:23 alyssa: jenatali: yep (:
15:24 jenatali: dj-death: my interpretation of scratch is that it's equivalent to the C concept of stack memory
15:25 karolherbst: I fixed it for rusticl today 🙃 https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27634
15:26 dj-death: jenatali: yeah, that's not really how our HW works
15:26 dj-death: jenatali: we have a special way to store scratch, which offsets magically per lane
15:26 karolherbst: jenatali: depends on how you look at it... technically CL private is more like thread local storage and CL local is more like CPU C stack?
15:26 karolherbst: just without dedicated hardware
15:26 dj-death: jenatali: but that only works if you don't reorder thread through shader calls (which our HW does)
15:27 dj-death: jenatali: so we have to store stack stuff somewhere else, independent of HW thread location
15:27 karolherbst: wait...
15:27 karolherbst: that's kinda cursed
15:27 jenatali: Oh sure, scratch is like stack from the single threaded POV, where local is like stack from the SIMT POV
15:28 karolherbst: dj-death: so you basically have to use a global buffer and offset per thread manually?
15:28 dj-death: karolherbst: correct
15:28 dj-death: for the stack
15:29 dj-death: we only use that in the RT pipelines atm
15:29 karolherbst: yeah.. figures
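The "global buffer plus manual per-thread offset" scheme confirmed above comes down to simple address arithmetic. A rough sketch, assuming each thread owns one fixed-size, contiguous slice of the buffer; the function name, parameters and layout are hypothetical and do not describe the actual Intel RT pipeline layout.

```python
def stack_address(base, thread_id, stack_size, offset):
    """Address of `offset` within the stack slice owned by `thread_id`."""
    if not 0 <= offset < stack_size:
        raise ValueError("offset is outside the per-thread stack slice")
    return base + thread_id * stack_size + offset


# Thread 3, 256 bytes of stack per thread, variable at byte offset 8.
addr = stack_address(0x1000, 3, 256, 8)
```

Because the address depends only on a logical thread index and not on which hardware thread is running, the data stays reachable if threads get reordered across shader calls.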
15:31 karolherbst: anyway... trying to figure out how to fix this other issue with scratch..
15:34 karolherbst: dj-death: uhh.. what's the easiest way to figure out which kernels (as in the CL C code) cause issues?
15:35 dj-death: karolherbst: from what I've seen it's mostly structure
15:35 dj-death: karolherbst: and temps
15:36 karolherbst: I mean.. sure, but I still want to see the code so I can copy it instead of trying to figure out the examples myself
15:36 dj-death: karolherbst: like a fairly large private structure (20 dwords) passed by pointer to a function
15:36 dj-death: and the function picking some values to do something else with it
15:36 dj-death: I can only give you some example of what is causing problems there
15:37 dj-death: s/there/here/
15:37 karolherbst: yeah, that would be good enough for now
15:37 dj-death: okay, let me try to generate a few
15:37 karolherbst: like.. do you have the code or should I just dump whatever gets passed into compiler_shader?
15:38 dj-death: I mean it's all upstream
15:38 dj-death: like this : https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/intel/shaders/generate.cl?ref_type=heads#L10
15:39 dj-death: but then I can simplify it
15:39 karolherbst: I think I just want a way to dump what's generated or so...
15:39 karolherbst: or well..
15:39 dj-death: that will use scratch too : https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/intel/shaders/generate.cl?ref_type=heads#L81
15:39 karolherbst: maybe I just hack clc and dump the code for me or something
15:40 dj-death: yeah you can dump out what comes out of clc
15:40 dj-death: I'm not sure what form you want
15:40 dj-death: SPIRV? OpenCL C? NIR?
15:40 karolherbst: CLC
15:42 dj-death: you can run the intel-clc command with -t /tmp/out.txt
15:42 karolherbst: ahh, nice
15:42 dj-death: it'll print out the concatenation of all .cl files
15:43 dj-death: just take whatever is in the ninja file
15:43 dj-death: or just touch one of the cl file and ninja --verbose
16:00 zmike: dj-death: yes
16:00 dj-death: zmike: thanks
16:09 karolherbst: okay.. seems like my fix already takes care of some of those issues
16:19 karolherbst: dj-death: that's how I'm testing that stuff now with :D https://gist.github.com/karolherbst/2c6c6779dbfb8c229f3b36f2a59e95e2
16:19 karolherbst: and piglits cl-program-tester
16:20 karolherbst: (and adjusted -I flags)
16:22 karolherbst: it's kinda nice, I can just change variables to constant to simplify the code and stuff
16:31 dj-death: karolherbst: oh nice
16:32 karolherbst: but yeah.. looking at that vec4 stuff now.. that's an odd one
16:33 dj-death: yep
16:33 dj-death: tweaking the structure size, I've seen different patterns
16:33 dj-death: with vec2 as well
16:33 dj-death: I suppose for 6 dwords
16:39 karolherbst: pain
16:41 karolherbst: like.. the difference between llvm-16 and 17 here is.. "small" but it leads to entirely different code paths
16:42 karolherbst: with llvm-16 I see a cast to u8*, with llvm-17 a cast to u64*
16:43 abhinav__: vsyrjala jani hi GM. Can I pls get your reviews on https://patchwork.freedesktop.org/patch/577863/ and https://patchwork.freedesktop.org/patch/578154/
16:43 karolherbst: the reason llvm-17 ends up with u64* here is that the translator "guesses" the function signature of the called function by looking at the pointer types it has atm, because LLVM doesn't have typed pointers anymore
16:43 karolherbst: soooo
16:43 karolherbst: e.g. if you call into memcpy, it might not be a void* thing, but a u64* thing or whatever
16:44 karolherbst: anyway.. now we optimize that one cast to a `deref_struct` thing :D
16:45 karolherbst: soooo
16:45 karolherbst: okay
16:45 karolherbst: here is the _actual_ difference
16:45 karolherbst: `nir_opt_memcpy` was able to recognize that the copy copies between two values of the same type and deconstructs it
16:45 karolherbst: or rather
16:46 karolherbst: converts memcpy_deref -> copy_deref
16:47 karolherbst: and we end up using scratch with llvm-17 because of nir_lower_memcpy
16:54 karolherbst: but ultimately that means that the cast -> deref_struct optimization, breaks this use pattern
17:03 jani: abhinav__: when I see patches don't apply, I usually just move on and wait for them to be fixed https://lore.kernel.org/r/170776056427.1183349.16027608030000138738@5338d5abeb45 and https://lore.kernel.org/r/170786949600.1218994.4500927935579436266@5338d5abeb45
17:18 abhinav__: jani strange, I had sent it on top of drm-next
17:18 abhinav__: is there a different branch for the intel CI i should follow
17:19 jani: abhinav__: it's recommended to do development on drm-tip
17:20 jani: it's a bit like linux-next for graphics
17:21 jani: atm drm-next is a few pull requests behind at least on drm-misc-next and drm-intel-next
17:54 abhinav__: jani thanks, I will use drm-tip and post those again. we will need one help though, once reviewed and merged onto intel tree, we will need a tag with those patches so that we can base our tree on top of that as one series of ours needs those.
18:04 lumag: jani, regarding abhinav__'s request. The depending series has mostly passed the reviews, so if possible we'd like to skip unnecessary delays.
18:34 karolherbst: do we have a handy nir helper to iterate over all uses of a def?
18:34 karolherbst: oh... *_foreach_use_*
18:39 karolherbst: dj-death: fixed :3
18:43 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27640
18:46 dj-death: karolherbst: oh thanks, will give this a try
18:46 karolherbst: what a mess honestly..
18:47 karolherbst: but I think a fix here might actually also help optimizing better in a few corner cases regardless of llvm being weird
18:48 karolherbst: maybe we want to have better memcpy opts 🤷
18:48 dj-death: testing ETA 45mn :)
19:02 HdkR: 8
20:32 zmike: mareko: you probably know this off the top of your head - is it legal for e.g., a vertex attrib to be set with vertexArrayAttribFormat(R8_UINT) and then the shader attribute is type int?
20:32 zmike: I've skimmed core and glsl specs and I haven't yet found anything disallowing it
23:05 sarthakbhatt: Hi, I'm trying to fork the mesa repo but unfortunately I'm not able to fork it. I'm kinda new to mesa.
23:08 Sachiel: sarthakbhatt: spam problems required new users be verified, https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues/new?issuable_template=User%20verification
23:29 sarthakbhatt: thank you for the help. now I can fork it.