00:43 airlied: yes only enabling it for gnome-shell because that is all I'm paid to care about
00:43 DemiMarie: airlied: makes sense!
00:45 DemiMarie: To rephrase my original question, is the Xe driver significantly simpler than the parts of i915 used when running on Xe architecture hardware? Simpler code generally has fewer vulnerabilities, all else equal, and I have no reason to believe that the Xe driver will be of inferior quality once it does reach mainline.
00:45 airlied: no, it's just a different driver; without paying for an analysis there is no vector along which you could say it is "simpler"
00:46 DemiMarie: ah
00:46 DemiMarie:wonders why Intel is paying for it
00:50 airlied: because the current driver reached a level of unmaintainium
00:59 DemiMarie: Is drm_sched the current blocker?
01:00 airlied: I think there's a todo list
01:08 mareko: Marge for libdrm when?
01:09 airlied: do we ever have enough conflicts to require marge?
01:11 DemiMarie: airlied: yup, https://docs.kernel.org/gpu/rfc/xe.html
01:46 mareko: oh yeah I could use the merge button
02:35 YaLTeR[m]: <DemiMarie> "kchibisov: GTK3 used Cairo for..." <- Not sure if my messages come through here, but VTE (the terminal library) still uses cairo even on GTK 4 and has some questionable repaint logic on top, making it this slow
02:35 kchibisov: YaLTeR[m]: the thing is that other gtk apps were not that much faster.
02:36 kchibisov: But yeah, it depends on particular apps, I guess.
02:37 karolherbst: anyway... it seems like even on 4k and iGPUs gtk4 apps aren't necessarily slower or less smooth... so I suspect the sw implementation is just bad or something?
02:37 kchibisov: Could be something like that, pixman is ok on 4k from what I can tell.
02:37 kchibisov: Though, I use A380, which is a bit faster igpu.
02:38 karolherbst: I forced sw rendering and it's not _that_ slow
02:39 karolherbst: it's not smooth, but also good enough so it won't matter
02:39 kchibisov: I just use bad gtk apps I guess.
02:40 karolherbst: well.. it burns 5 of my 12 cores though
02:41 karolherbst: anyway... cpu rendering is going to be slow
02:42 karolherbst: or maybe something else is going wrong
02:44 karolherbst: maybe it's also a xorg vs wayland thing or something
03:01 DemiMarie: karolherbst: I think the reason almost nobody else worries as much as I do is that, for the vast majority of systems, it hardly matters
03:08 DemiMarie: It’s only with the work on virtio-GPU native contexts that one can have GPU ioctls exposed to a VM for example. And that is not even merged upstream yet.
05:27 YaLTeR[m]: Hi, so we're debugging the weirdest performance problem in Smithay involving overlay planes. It seems that on my RX 6700M laptop, AMDGPU is taking much longer to move an overlay plane to the right, compared to all other directions. Or something. Enough to start dropping frames left and right on a 165 Hz screen. Does this ring any bells?
05:27 YaLTeR[m]: There's a kworker that runs after submitting a frame to KMS, and it just starts taking 10x more time when a plane is moving to the right
05:28 YaLTeR[m]: The flamegraph shape for that kworker seems the same as normal, just takes longer
07:15 tzimmermann: airlied, hi. will there be another PR from drm-next to upstream before -rc1? i have a patch for a newly added bug in linus' tree
08:16 sima: tzimmermann, a -fixes pr in the 2nd week (or maybe two if there's something time critical) is the rule, first week of merge window is kinda like -rc0
08:16 sima: since linus doesn't appreciate if you send big stuff in the 2nd week
08:16 tzimmermann: sima, thanks
08:54 airlied: tzimmermann: yes what sima said, thu/fri this week
09:00 kode54: whoa
09:00 kode54: that Z range limit env var
09:01 kode54: I need to check if that fixes not just one, but two bugs
09:01 kode54: if that fixes anything, does it mean that it ends up in a future copy of the default mesa compat config file?
09:52 karolherbst: DemiMarie: yeah.. but it also doesn't seem to be a huge issue. I mean, other projects/users are hit by this as well whenever upstreaming of acceleration support is delayed; e.g. nouveau generally doesn't have day-1 support for new GPUs. In my experience all that software fallback stuff isn't all that bad. You still notice it, but I wouldn't say it becomes really slow. It might on very old hardware with less powerful CPUs.
11:57 zmike: karolherbst: ok
11:57 zmike: mareko: maybe once it's more robust
12:24 alyssa: DavidHeidelberg[m]: !25030 should have been git pushed to main, not Marged, please
12:25 alyssa: otherwise it's just extending the length of the outage by $LENGTH_OF_A_CI_RUN for no good reason
12:27 alyssa: Perhaps we need an explicit policy for this (trivial patches disabling dead farms/hardware due to unexpected outages/failures, rather than planned maintenance, should be pushed rather than MR'd, at least if the person has been around Mesa long enough to make that judgement call)
12:27 alyssa: eric_engestrom: and I have talked about this... The benefits really do outweigh the drawbacks. It just needs to be written down somewhere. And followed.
12:28 alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25027#note_2070221
12:28 alyssa: This is the delay I mean in particular
12:28 alyssa: If the "freedreno is dead" MR is instead git pushed, then there'd be no delay..
12:29 alyssa: git pushing does shoot down a single MR's CI. but if that CI was going to fail anyway (due to running the dead job), this is a net positive.
12:33 alyssa: [I realize that !25030 should have ordinarily been merged 20 minutes ago, and the only reason it hasn't is an infra snafu that's causing the build jobs to time out... but the point still stands. Building mesa isn't instant.]
12:33 ccr: Quantum Mesa not here yet?
12:33 alyssa: ~that was a joke, ha ha, fat chance~
12:33 DavidHeidelberg[m]: alyssa: why?
12:34 dottedmag: DavidHeidelberg[m]: it was a reference to Portal's "Still Alive" song
12:35 alyssa: correct
12:35 alyssa: everything before the last line was serious =)
12:36 alyssa: I see !25030 is now merged
12:36 eric_engestrom: +1 to the exception "experienced devs can direct push for emergency-disable of a farm that's unexpectedly down"
12:36 alyssa: which kinda just underscores my point
12:36 alyssa: going through Marge for this one turned an outage of length N into an outage of length N + 1 hour
12:37 alyssa: or at least, N + 40 minutes
12:37 DavidHeidelberg[m]: dottedmag: I meant alyssa's point about git push vs. Marge
12:37 alyssa: dottedmag: I think I did explain why?
12:38 alyssa: DavidHeidelberg[m]: ^
12:38 eric_engestrom: btw for the disable MR itself, I have a local change that is almost done to skip containers & builds, so that `sanity` is the only job that runs in farm-disable MRs
12:38 alyssa: TL;DR Forcing farm disables through Marge artificially extends the length of outages, increasing developer unhappiness.
12:39 alyssa: eric_engestrom: That would help! although it doesn't help the "force Marge to prioritize this MR" part
12:39 eric_engestrom: yeah I definitely agree :)
12:40 alyssa: and -- with my UX hat on rather than my dev hat on -- I don't see a reasonable way to teach Marge about priorities without it being abused
12:40 alyssa: The obvious developer answer is "a magic CI EMERGENCY label that forces Marge to pick emergency MRs over all other MRs regardless of assigned time"
12:41 alyssa: That has the extremely obvious UX problem that people will tag their broken driver patches with "CI EMERGENCY" on flaky CI days :~)
12:41 zmike: if people abuse it then their developer access can be revoked
12:41 zmike: not a hard problem to solve
12:41 alyssa: Don't get me wrong.. I am - overall - in favour of good CI. I am very aware of the crap I & others try to merge ;)
12:44 alyssa: The "unassign everything, then reassign" dance requires O(# of assigned MRs) human steps
12:45 alyssa: which, again from a UX standpoint, is problematic given that on days with a CI outage that # is presumably higher on average, due to people repeatedly reassigning MRs patiently trying to merge
12:47 alyssa: Possibly the UX answer would be a script that does the "disable given farm/job and git push" as a single atomic operation?
12:47 alyssa: That hides the push, to prevent people from abusing it.
12:48 alyssa: and can ensure the change is good and that somebody won't clumsily git push a typo in the .yml and end up making everything worse ;)
12:48 alyssa: sounds like more engineering effort though..
12:59 eric_engestrom: yeah a script for 3 commands (`git mv`, `git commit`, `git push`) seems a bit overkill :]
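Overkill or not, a minimal sketch of what that wrapper could look like, assuming the farm-disable convention is moving a per-farm file from a `.ci-farms/` directory into `.ci-farms-disabled/` (the paths, commit message, and push target here are illustrative assumptions, not necessarily Mesa's actual layout):

```python
#!/usr/bin/env python3
"""Hypothetical farm-disable helper: git mv + commit + push as one step.

The .ci-farms/ -> .ci-farms-disabled/ layout and the push target are
assumptions for illustration only.
"""
import subprocess
import sys


def run(*args: str) -> None:
    # Echo each command before running it; abort the whole script on failure.
    print("+", " ".join(args))
    subprocess.run(args, check=True)


def disable_farm(farm: str) -> None:
    run("git", "mv", f".ci-farms/{farm}", f".ci-farms-disabled/{farm}")
    run("git", "commit", "-m", f"ci: disable {farm} farm (unexpected outage)")
    # The direct push is the whole point: skip the Marge queue for emergencies.
    run("git", "push", "origin", "HEAD:main")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit(f"usage: {sys.argv[0]} <farm-name>")
    disable_farm(sys.argv[1])
```

Wrapping the three commands does buy one thing over typing them by hand: the push cannot happen unless the mv and commit succeeded, which speaks to the "clumsily git push a typo in the .yml" worry above.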
13:02 karolherbst: alyssa: could require certain people to ack such MRs, but yeah...
13:02 alyssa: karolherbst: that moves the problem ... time spent waiting for an ack is time that ci is down for everybody
13:02 karolherbst: like have a group of 10? devs, and if one of them approves the MR in the GUI, the label takes effect
13:03 karolherbst: but yeah...
13:03 karolherbst: I don't think there will be a technical solution to the problem that devs might abuse that power
13:03 karolherbst: we can only say: if you abuse it, your dev access gets removed
13:05 eric_engestrom: yeah I agree with that
13:05 eric_engestrom: basically "trust until there's a reason not to"
13:05 eric_engestrom: since this is among a pool of people who have already gained enough trust to get there
13:06 karolherbst: I think I see broken communication as the major reason it will cause issues, but...
13:06 eric_engestrom: but I think direct push is much simpler than teaching Marge about some magic labels and such
13:11 alyssa: eric_engestrom: karolherbst: my objection to a "CI Emergency" label is less about trust and more that its UX affordances are wrong... it's begging to be abused by well-meaning people who don't understand the finer points of policy
13:11 eric_engestrom: true
13:12 alyssa: the concern is good faith actors who will erroneously use the escape hatch when it isn't warranted, at risk to others
13:12 alyssa: not bad faith actors with merge access, they can already git push today
13:12 karolherbst: that's what I meant with broken communication
13:15 eric_engestrom: (the change for making farm-disabling MRs instant: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25032)
13:34 alyssa: eric_engestrom: nice!
13:58 dj-death: anybody managed to compile mesa with tsan?
14:23 alyssa:tries to figure out how to model non-SM5 shifts in NIR nicely
14:24 alyssa: Apparently it does matter for Dolphin fps.
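For context, the distinction being modeled (as I read it; the lowering shown is one possibility, not necessarily what NIR ends up with): SM5-style shifts mask the count to the low bits of the shift amount, while full-range shifts such as PowerPC's slw, which Dolphin has to emulate, define counts >= the bit width to produce 0. A Python sketch of both semantics and a lowering of one onto the other:

```python
MASK32 = (1 << 32) - 1


def shl_sm5(x: int, s: int) -> int:
    """SM5-style shift: hardware uses only the low 5 bits of the count."""
    return (x << (s & 31)) & MASK32


def shl_full_range(x: int, s: int) -> int:
    """Full-range shift (e.g. PowerPC slw): counts >= 32 yield 0."""
    return (x << s) & MASK32 if s < 32 else 0


def lower_shl_full_range(x: int, s: int) -> int:
    """One way to lower the full-range form onto masked-shift hardware:
    do the masked shift, then select 0 when the count was out of range
    (roughly an ishl + bcsel pattern in NIR terms)."""
    return shl_sm5(x, s) if (s & ~31) == 0 else 0


assert lower_shl_full_range(1, 1) == shl_full_range(1, 1) == 2
assert lower_shl_full_range(1, 33) == shl_full_range(1, 33) == 0
assert shl_sm5(1, 33) == 2  # the masked shift alone gets this wrong
```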
14:37 dj-death: -ltsan fixed it :)
14:51 dj-death: arg
14:51 dj-death: looks like you can't post backtraces in the gitlab comments :(
15:01 zmike: yup
17:51 karolherbst: dj-death: you can't?
17:51 karolherbst: not even if wrapped in a code block?
18:50 dj-death: karolherbst: nope
18:52 karolherbst: odd
18:52 karolherbst: I'm sure I used to do that...
18:53 airlied: yeah I thought the old ``` used to work
18:53 karolherbst: dj-death: maybe some spam protection because there are a bunch of "references"
18:53 karolherbst: but anyway.. this used to work
19:49 dj-death: it used to indeed
19:50 dj-death: not anymore
22:10 alyssa: airlied: can you look at the llvmpipe fail on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24965 ?
22:29 airlied: alyssa: I've seen traces flake in the last few days
22:30 airlied: so I suspect it's just an infra issue or an llvmpipe slowdown change
22:30 airlied: might want to drop those two traces
22:30 alyssa: ok
22:30 alyssa: it's entirely possible that my patch causes a slowdown, though
22:44 alyssa: otoh.. not super likely either without second order effects
22:48 alyssa: the timeouts do seem to be repeatable https://gitlab.freedesktop.org/mesa/mesa/-/jobs/48512211
22:48 alyssa: I don't know what to make of that
22:48 alyssa: It's not a particularly spicy patch
22:48 alyssa: (now it's 3 traces..)
22:49 zmike: it's a shame trace jobs don't have per-trace timing info more readily available
22:49 zmike: koike: ^maybe another thing for the dashboard, visualizing trace runtimes to track which ones are timing out or near to timing out
22:52 alyssa: what really gets me is that it's a different set of traces timing out
22:52 alyssa: at least 4 different llvmpipe-traces timing out across different pipelines for that commit
22:53 zmike: cpu jobs don't have dedicated runners, so all that means is the cpus are under load
22:53 alyssa: right..
22:53 alyssa: I genuinely don't know if this is something wrong with my patch or not
22:54 alyssa: maybe it's causing spilling
22:54 zmike: have you tried running the traces locally
22:55 zmike: should be easy to time them before/after to evaluate
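For anyone reaching for that locally, a rough before/after harness along these lines would do, assuming the traces replay with apitrace's glretrace (`-b` is its benchmark mode; the trace filenames below are placeholders):

```python
import subprocess
import time

TRACES = ["example1.trace", "example2.trace"]  # placeholder trace files


def time_trace(trace: str) -> float:
    """Wall-clock one benchmark replay of a trace via apitrace's glretrace."""
    start = time.monotonic()
    subprocess.run(["glretrace", "-b", trace], check=True, capture_output=True)
    return time.monotonic() - start


for trace in TRACES:
    # Take the best of a few runs to dampen noise from a loaded machine
    # (the same problem the shared CI runners have).
    best = min(time_trace(trace) for _ in range(3))
    print(f"{trace}: best of 3 = {best:.2f}s")
```

Run it once on a build without the patch and once with it; a consistent delta points at the patch rather than CI load.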
22:56 alyssa: fine i'll delete more code
22:57 airlied: yeah I doubt it gets any slower really
22:57 alyssa: If it's spiking register pressure => increasing spilling => LLVM path of doom? could be
22:58 zmike: it could be hitting one of those forever llvm optimizer loops
22:58 alyssa: i sure do love llvm
22:58 zmike: though you'd need to be testing on the same version as ci (14?) to know
22:58 alyssa:reconsiders her plan to make llvm a mandatory part of the mesa build process
23:07 alyssa:pushed a change that should avoid reg pressure regression, hopefully that fixes llvmpipe
23:08 psykose: love the smell of blown registers in the morning
23:30 psykose: what's next on the chart
23:34 zmike: cybersecurity mesh 🤕