07:13tzimmermann: danvet, how do you feel about making simple_encoder the default for drm_encoder_init() that has no funcs argument?
08:21danvet: tzimmermann, the trouble I see is that strictly simple_encoder only works with drmm_
08:22danvet: if if the encoder is embedded into the drm_device structure (well the overall driver struct that also contains drm_device)
08:22danvet: or by allocating it with kzalloc and having a kfree in your encoder->ops->cleanup function
08:22danvet: or with drmm_kzalloc
08:23danvet: so for most drivers which don't have their own ->cleanup there's a bug
08:23danvet: only exceptions are the simple ones
08:23danvet: hence why I think we should try to figure out how to cut encoders over to drmm_
08:23danvet: or at least, how that works together really neatly
08:23danvet: to make sure we're going in the right direction
08:24danvet: that was at least my plan, atm I'm busy getting rid of the interim drmm_add_final_kfree :-)
08:24danvet: pq, thx for writing my reply for me
08:25danvet: j4ni, you too :-)
08:27danvet: j4ni, I already mentioned somewhere else that we need this in i915+nouveau
08:27danvet: or upstream wont care
08:27danvet: 2nd part implied
08:33tzimmermann: danvet, just asking before i convert all these drivers to the simple encoder. i cannot use it for writeback connectors, as they are in core and simple encoder is in helpers. but it's no big deal
08:35danvet: tzimmermann, yeah essentially my idea would be that with the drmm-ification we'd also move the default cleanup to core
08:35danvet: or something like that
08:36danvet: but I haven't looked at any of this in detail yet, so my gut feeling is totally out of tune most likely
08:36danvet: this = drmm + simple encoder
08:36tzimmermann: danvet, yeah
08:37danvet: I'm also pondering a lot how to make sure bugs don't come back
08:37danvet: e.g. if we have a drmm_encoder_init (automatically simple or not tbh)
08:37danvet: to make sure that drivers _dont_ allocate this with devm_kzalloc (which would be buggy)
08:38danvet: need to work on some debug patch for devres and discuss that with gregkh
08:38danvet: but the drmm_add_final_kfree removal I'm working on also has some patch for devres.c already that needs discussion
08:39danvet: https://cgit.freedesktop.org/~danvet/drm/log/ sneak peek if you want, it starts with "drivers/base: Always release devres on device_del"
08:39danvet: pinchartl, btw if you have time for some early reactions on https://cgit.freedesktop.org/~danvet/drm/commit/?id=6a39a64d8a5d0b6c6f55b8a0344f75443432b910
08:39tzimmermann: i thought that drmm-code is supposed to become available for all subsystems? sort of compementing devres?
08:39danvet: tzimmermann, atm it's drm only
08:39danvet: well drm_device only
08:42danvet: that idea was more "could be done if someone is interested"
08:42danvet: tzimmermann, which other subsystem would you want this?
08:42tzimmermann: danvet, that's quite a lot of description for a two-line patch :)
08:43tzimmermann: danvet, i don't know about the other subsystems.
08:46tzimmermann: danvet, i'm confused by the naming: shouldn't devm_drm_dev_alloc() be called drmm_dev_alloc()?
08:47danvet: because the cleanup action of devm_drm_dev_alloc is a devm action, which calls drm_dev_put
08:47danvet: so devm prefix
08:47danvet: it does the drmm_add_final_kfree to set that up, but that will soon be an implementation detail
08:48danvet: only after the drm_device is set up can you do drmm_ stuff
08:48tzimmermann: danvet, then it makes sense. thanks for explaining
08:49danvet: this is the entire problem here, there's 2 distinct auto-cleanup things now, tied to two different lifetimes
08:49danvet: and often very tricky to figure out which one should be used
08:49danvet: hence why distinct names (so that people at least ask the question when reading code)
08:49danvet: and also trying to have debug code to make sure you can't mix them
08:49danvet: like devm_kzalloc together with a drmm_encoder_init
08:51tzimmermann: danvet, true. it can be confusing to understand the requirements of when to GC each structure
09:10pq: danvet, ahaha, you're welcome :-)
09:13j4ni: danvet: ditto ^
09:14j4ni: pq: also happy that the responses aren't just i915 centric
09:17MrCooper: pq: and for me too o/\o
09:19pinchartl: danvet: for what it's worth, converting test drivers to real drivers is what V4L2 has done
09:19pinchartl: (well, not convert them, they were real drivers from the start)
09:20pinchartl: and I think that would be a nice model, as otherwise we may run into other issues caused by the fact they're not real drivers, and will have to fix those too
09:20pq: This is the best feedback I've got... ever? :-o :-D
09:21danvet: pinchartl, I guess we can bikeshed that on-list ... it only affects 3 drivers really
09:21danvet: plus the final cleanup, which I haven't typed yet
11:03daniels: ivyl: i've pushed a change which will reduce igt-ci-tags impact on fd.o storage cost a little bit, but could you please look into using the ci-templates project? that will make your containers smaller and more efficient, and as a bonus also eliminate a lot of the open-coding you currently do \o/
12:03ivyl: daniels: thanks! I have switch on my todolist. I had a lovely chat with a some of the people involved and for us this would need few new things in ci-taempaltes first
12:11daniels: ivyl: oh nice, that's great, thanks
12:11daniels: ivyl: please let me know how you get on
12:33danvet: daniels, btw are we tracking anywhere the various project specific efforts for size/bw reduction
12:34danvet: hm wrong chat
12:34danvet: maybe for including in Lyude's summaries
12:34daniels: yeah would be a good one to track, not currently anywhere apart from IRC
13:13hakzsam: is marge borken?
13:14danvet: daniels, I guess there's no easy way to search specific issues across all projects?
13:15danvet: so that we could file issues with projects and still track them globally
13:16danvet: daniels, like maybe some kind of global milestone we could assign them all to?
13:17danvet: i can only create milestones for groups
13:17daniels: not that I'm aware of; the bronze solution is just referencing them all from one fdo master issue and tracking by hadn't
13:18danvet: well we do seem to have milestones
13:18danvet: I think that's a recent addition
15:15MrCooper: shadeslayer: please always check the "Allow commits from members who can merge to the target branch" box when creating an MR
15:16MrCooper: so others can rebase & control pipelines
15:23robclark: it would be nice if gitlab could let us make that checked by default
15:26Venemo: daniels: CI is giving me trouble again. this time it's the x86 build: ERROR: Preparation failed: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
15:30shadeslayer: MrCooper: oh, I wasn't aware, I'll make sure to keep that in mind next time
15:34MrCooper: anholt: any idea why Marge isn't merging https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4373 ?
15:36kisak: MrCooper: Marge Bot @marge-bot merged 1 day ago?
15:36hakzsam: yeah, it's merged?
15:36MrCooper: sorry, I mean https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4349
15:37hakzsam: yeah, it's weird
15:37hakzsam: oh, pipeline passed this time, it's a start
15:52shadeslayer: Could someone reassign https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4377 to marge bot?
16:20hakzsam: MrCooper: marge failed to merge it again but pipeline passed ...
16:30anholt: MrCooper: https://gitlab.freedesktop.org/hakzsam/mesa/pipelines/127171/builds shows a bunch of erstarts of meson-clang jobs such that the pipeline probably lasted longer than marge's timeout
16:30MrCooper: no, that's not the issue
16:31MrCooper: Marge is just sitting there doing nothing even if the pipeline passed
16:31MrCooper: (or failed)
16:31anholt: so, what was going on with those meson-clang jobs in that pipeline?
16:32MrCooper: some of the packet runners were having issues
16:32MrCooper: see #freedesktop
16:32anholt: I hear that, but was that not the pipeline that for marge or something?
16:32anholt: if that pipeline spanned as long as the "hours ago"s listed in that link, then I would expect marge to time out
16:34MrCooper: Marge was still sitting there doing nothing even after that pipeline had failed
16:35MrCooper: until she timed out
16:36MrCooper: then I restarted the failed jobs and re-assigned the MR to Marge, and she again didn't do anything, even after the pipeline passed, until she timed out
16:37anholt: hmm. if you restarted the failed jobs then reassigned to marge, I would expect the old pipeline to go from failed back to running, and marge would have the MR back in the queue to be re-rebased and a new pipeline run. but maybe my expectations for marge are wrong?
16:37MrCooper: that same procedure worked before
16:37anholt: We unfortunately only get about 3 minutes of logs from marge, so there's not much I can do after the fact.
16:37MrCooper: Marge just waited for the pipeline to pass and then merged it
16:38MrCooper: but now she seems to be having trouble getting the pipeline status (notifications) or something
16:39anholt: don't see any upstream changes to pull
16:39anholt: for marge
16:42MrCooper: let's see what happens with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4084
16:55MrCooper: she merged that, so far so good
16:55SolarAquarion: i'm interested in this https://gitlab.freedesktop.org/cubanismo/mesa/-/tree/nouveau-improve_format_modifiers
16:56SolarAquarion: although it's quite a big development afaik because nvidia doesn't like gbm
18:03hakzsam: daniels: meson-windows-vs2019 seems to take a lot of time before being picked up by a runner
18:03robclark: anholt: just had a look at scrollback in #freedreno-ci .. # of reported flakes is down *massively* in last couple days :-)
18:04daniels: robclark: \o/ \o/
18:04anholt: yeah, I think most of them are just un-rebased branches
18:04daniels: hakzsam: guess it's popular then ... surprising as it only takes 6min per job and we have one job per pipeline. i'll look at what we can do for capacity.
18:05robclark: yeah.. the garbage ubwc thing seems to have been the biggest issue..
18:05daniels: thanks a lot for fixing that up both of you
18:05hakzsam: daniels: great, thanks
18:44jcristau: is gitlab the right place to report possible mesa security issues?
18:44anholt: jcristau: probably, in an issue marked confidential/non-public (for what that's worth)
18:49krh: robclark: fantastic, nice catch anholt
18:54anholt: krh: we still need to sort out the initialization bit, but it's nice to have the freedreno flood of intermittent failures in my inbox gone
18:54anholt: now it's s390x and collabora's lava lab
18:56ajax: is s390x about flaky tests or about timeouts?
18:57imirkin: anholt: memset of the uwbc flags fixed everything?
18:58anholt: ajax: the timeouts
18:58anholt: I think it's "spinning up 150 qemus takes too long when the system is under load" as we see single-process tests sometimes take a few seconds.
18:59anholt: we could maybe just crank the timeout on that test up to 5 min or something
18:59anholt: but "5 minutes of faffing in emulating the compiler" makes me think that we might be doing something silly.
19:01ajax: the compiler's not emulated, it's cross-building. the tests get emulated.
19:01anholt: I meant the glsl compiler
19:02anholt: (and, ftr, I'm pretty sure the "something silly" is "initializing glsl builtins 150 times" and making that less slow is actually hard because we've tried many times already)
19:06ajax: enh? it's just ninja test, it's not running piglit or anything. there are glsl compiler tests but the format checking stuff is way more of the runtime afaict.
19:12daniels: anholt: yeah, I don't know why our LAVA lab has so suddenly gone so bad, but I've not seen those failures before. we're trying to work it out, and if we can't then we'll disable it until we can
19:12anholt: ajax: even a non-timeout job suggests that glcpp tests are the slow ones under qemu https://gitlab.freedesktop.org/llandwerlin/mesa/-/jobs/2130141
19:13anholt: (compared to formats)
19:13ajax: blah, clearly i meant blend
19:13robclark: imirkin: well, it at least fixed teh UBWC related flakes.. (we may still want to move the memset to blitter.. or maybe the hw gives us some bit we can set to indicate that there is no valid ubwc).. there are still some xfb flakes (which is actually nothing to do w/ xfb and something to do w/ varyings.. which might be related to the unexplained nops blob sometimes inserts at start of frag shader)
19:14ajax: and, tbqfh, the primary reason to want the s390x tests is keeping llvmpipe linking and the format code big-endian-clean
19:14ajax: so if there's a way to skip those jobs if you're only touching iris, i'm totally fine with that
19:19mattst88: there are various unit tests in mesa that fail on big endian (u_format_test comes to mind, https://gitlab.freedesktop.org/mesa/mesa/issues/1036)
19:19gitbot: Mesa issue 1036 in mesa ">=19.0.0 fails u_format_test on x86/ppc/ppc64" [Bugzilla, Opened]
19:20anholt: mattst88: the tests have incorrect expectations on BE for certain classes of formats, unfortunately.
19:20mattst88: yeah, that was my conclusion as well
19:21mattst88: it's somewhere way down my todo list to reinvestigate...
19:21anholt: (there's also some hilarity in llvmpipe needing formats to be mis-specified on BE in a certain way to work around llvmpipe handling formats wrong!)
19:21mattst88: just got to make sure you have an even parity of errors
19:21anholt: it's the big endian way!
19:23krh: I like big endians and I cannot lie
19:25anholt: let's try this https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4411
19:34robclark: jekstrand: nir_from_ssa question.. is this a bad idea? https://gitlab.freedesktop.org/robclark/mesa/-/commit/4ce810a17e7c89a0792bde5d7158baef218af1a7 .. it is a massive help for flow-control-heavy shaders, perhaps *too* good..
19:38anholt: I'd noticed that pattern back at broadcom, too. would be nice to get a fix!
19:38krh: huh, yea... seems unfortunate to turn that into a long dependency chain
19:41jekstrand: robclark: Oh, boy.....
19:41jekstrand: robclark: In principle, it should be ok but it might break the algorithm
19:41jekstrand: I would have to give it a VERY careful read to be sure it doesn't
19:42robclark: I *think* restricting it to ssa should be ok, or at least I've not managed to convince myself otherwise..
19:42robclark: but it is a pretty massive win in some flow control heavy shaderytoy ;-)
19:44jekstrand: robclark: Is it also a win on Intel? We don't have delay slots but a MOV is like 14 cycles
19:46jekstrand:reads nir_from_ssa code
19:47robclark: not sure about intel.. this is possibly (assuming correct) the sort of thing we want to make a driver configurable thing?
19:48jekstrand: Oh, I'm not worried about making it worse. I want to know if it makes something better. :-)
19:48robclark: could be higher register pressure by keeping the ssa value live longer, too
19:49robclark: but yeah, I'd assume not having dependent read/write changes could help
19:49robclark: jekstrand: fwiw, this was the shader I was looking at: https://www.shadertoy.com/view/3l23Rh
19:49anholt: why, I happen to have a fun little tool here that I would like to throw a perf change at over lunch right now. let me build that hack.
19:51jekstrand: robclark: My gut is that what you did there is probably mostly correct (might need a bit more tweaking). The primary issue with it is that it leads to registers beeing "freed" more slowly and increases the chance we hit a cycle and have to allocate a new register to resolve.
19:52robclark: it looks (from a limited # of sample points) like it helps manhattan a bit too, although most of the flow control there is VS
19:53robclark: jekstrand: hmm, ok.. don't suppose you have an example evil shader? I can look at that some more after lunch.. (at least having an example would help me understand the logic there better)
19:54jekstrand: robclark: No, I don't
19:54jekstrand: robclark: But I suspect that, if we restrict to SSA only, we'll be very unlikely to find an "evil" shader.
19:54jekstrand: robclark: I'm going to shader-db this
19:55robclark: ok, cool
19:55jekstrand: robclark: I suspect it's a good idea
19:55jekstrand: I seem to have recalled thinking about this when I wrote out-of-SSA
19:55anholt:throwing it at 80 traces
19:55jekstrand: Which also makes me think I ruled it out for some reason. :-(
19:57jekstrand: robclark: Ok, I'm pretty sure now that your patch is NOT correct as-is
19:58jekstrand: Hrm... Or maybe it is?
19:58jekstrand: Bah. This algorithm is tricky
20:02jekstrand: robclark: I think I found a better patch that's more obviously correct
20:02jekstrand: robclark: It also works even in non-SSA cases :)
20:02robclark: ok, cool
20:04jekstrand: robclark: How big of a delta did you see in that shader toy?
20:05robclark: not quite 2x.. but close to it!
20:06robclark: so either there is something wrong about my patch, or there are a *lot* of loop iterations ;-)
20:07jekstrand: I can't remember, is --disable-gpu-sandbox enough to run a custom mesa with Chrome?
20:10robclark: hmm, oh, yeah, could be you need tha if you are trying to override LD_LIBRARY_PATH?
20:10robclark: jekstrand: kmscube has a shadertoy mode fwiw ;-)
20:11robclark: so you can watch clouds on a spinning cube
20:17jekstrand: robclark: Does this give you the same benefit?
20:17robclark: let's see..
20:18jekstrand: robclark: Do you have a simple shader_test file which shows this off?
20:19robclark: yeah, https://paste.centos.org/view/7a88a556
20:20robclark: yup, looks like I get the same gain
20:24anarsoul: robclark: nice
20:24anarsoul: (shadertoy mode)
20:28jekstrand: robclark: I find your numbers extremely suspect
20:28jekstrand: robclark: Also, the thing that's doing nasty things for you is a constant.
20:29jekstrand: Which your back-end may not be able to figure out if they're not ssa values.....
20:29jekstrand: I guess I tend to forget how spoiled we are with FS's copy-prop.
20:30jekstrand: robclark: Shader-db resulted in no change on Intel
20:31jekstrand: robclark: Probably because this only tends to come up in reality when you splat 0 out to a vector and things of that like
20:31jekstrand: I'm going to double-check that shader-db though...
20:31jekstrand: robclark: Also, That's a tiny change in a very large shader. How does it result in 2x perf?????
20:32krh: maybe a lot of iterations
20:32jekstrand: But there's piles of unconditional math in that loop
20:33jekstrand: I bet it's just twiddling with RA and you're getting massive differences due to that.
20:34krh: maybe it increases register pressure, which bumps us up into the next register bucket, which let's us pipeline a lot more math...?
20:34jekstrand: That would also make sense
20:35jekstrand: In any case, I'm very sure this isn't the real problem
20:35jekstrand: That said, what it was doing before was stupid if MOVs cost anything at all so I'm happy to land it.
20:40anholt: no statistically significant perf difference over lunch on glmark2, bgfx, dolphin, humus, supertuxkart
20:49jekstrand: robclark, anholt: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4412
20:50jekstrand: No shader-db impact on Intel. I checked twice.
20:50jekstrand: But I still think it's better so there you go
20:51jekstrand: Oh, and when I say "no shader-db impact" I'm including cycle estimates. Likely no shader changed at all.
20:51anholt: well, that would also explain my intel results :)
20:51anholt: though I find it surprising given how often I remember a pattern like this from v3d days
20:51jekstrand: I think our copy-prop is just more awesome than yours
20:51jekstrand: It gobbles through that chain and undoes it
20:52jekstrand: Those kinds of direct MOV chains are what a good non-SSA copy prop pass eats for breakfast.
20:55anholt: yeah, I think I got my v3d copy prop to do it, too.
21:04robclark: jekstrand: yeah, our backend isn't clever enough to clean that up.. I do think that other than generic things (ie. spending less time w/ threads diverged) that it is helping reducing our GPR read conflicts (since a lot of the movs from regA to regB turn into things w/ an immediate encoded, so no GPR read)
21:05robclark: iirc I wasn't seeing a change in register footprint.. so that isn't the thing for us (but let me double check that)
21:10robclark: anholt: hmm, I think your query resource_used() change needs some locking?
21:13anholt: cool, hidden under #ifdef DEBUG
21:15robclark: yeah, might be good to do debug builds in CI..
21:18robclark: jekstrand: yeah, it is helping a lot on GPR reads.. the short version is this shader has a lot of SFU (sin/cos/rsq/etc).. which like tex/mem instructions execute async.. ie. they even read their src regs and begin exec async from alu instructions.. the hw can sustain 2 gpr read per cycle, which is cool if you are all 2src alu instructions.. but throw in mad's and SFU/TEX also competing w/ ALU instructions to read
21:18robclark: from register file and you start getting stalls.. so replacing those mov's w/ mov-from-immed helps
21:41anholt: my trace perf testing tool is now updated with renderdoc support: https://crates.io/crates/gpu-trace-perf
21:55jekstrand: anholt: Doesn't inserting a timestamp query per draw nuke perf?
21:59robclark: shouldn't really.. although timestamp is less useful for tilers
22:03robclark: (well, it has some effect on perf.. but not as bad as needing a WFI between draws)