IRC Logs of #dri-devel on irc.freenode.net for 2023-10-09

00:45 kurufu: i haven't had luck running games through renderdoc, usually i use gfxrecon to capture games. Which afaik doesn't disable any extensions.
00:47 kurufu: (and then push the replay through renderdoc)
01:32 Company: I haven't tried it outside of my app, so I'm afraid that's all the info I have
06:10 airlied: zmike, konstantin : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25609 okay I think I've covered it
06:23 airlied: zmike: could we rename zink-lvp job to zink-lavapipe, so I can write a regex for the ci run script?
06:41 DavidHeidelberg: Rather unify all jobs from lvp to lavapipe or otherway :)
06:46 airlied: DavidHeidelberg: I want all to match on .*l.*pipe :)
06:46 airlied: because I'm crap at writing regexps that involve two completely different patterns
08:17 DavidHeidelberg: airlied: ".*(lvp|lavapipe|llvmpipe).*", but I agree that on first sight, lvp doesn't resemble lavapipe that much
08:18 DavidHeidelberg: I think the original (and ongoing :/ ) reason for this is character limited gitlab UI presentation of job names
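The two patterns discussed above can be compared with a short Python sketch. The job names below are hypothetical stand-ins (only "zink-lvp" is named in the log), and the use of `search` semantics is an assumption, since the log does not say how the CI run script applies its pattern.

```python
import re

# Hypothetical job names for illustration; only "zink-lvp" appears in the log.
JOB_NAMES = ["zink-lvp", "lavapipe-vkcts", "llvmpipe-piglit", "radv-vkcts"]

# airlied's single-pattern idea: an "l", anything, then "pipe".
SINGLE = re.compile(r".*l.*pipe")
# DavidHeidelberg's alternation covering all three spellings.
ALTERNATION = re.compile(r".*(lvp|lavapipe|llvmpipe).*")


def matching(pattern, names):
    """Return the names in which the pattern is found (search semantics assumed)."""
    return [name for name in names if pattern.search(name)]


if __name__ == "__main__":
    print("single:     ", matching(SINGLE, JOB_NAMES))
    print("alternation:", matching(ALTERNATION, JOB_NAMES))
```

Under these assumptions the single pattern misses "zink-lvp" because that name contains no "pipe", which is why the rename to zink-lavapipe was requested; the alternation picks up all three spellings without a rename.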
09:08 MrCooper: immibis: core Wayland has always been CSD only, since long before GNOME had any Wayland support
09:08 MrCooper: SSD is an optional extension
09:08 MrCooper: which would be difficult to support in mutter
09:09 emersion: no need to start that discussion again…
09:09 MrCooper: just setting the record straight, no intention of discussing it further
12:16 tomeu: anholt: what testing framework would you recommend to use with deqp-runner if I was to write new tests from scratch?
13:42 zmike: DavidHeidelberg: what is the ci script that starts the xserver instance in ci?
13:50 zmike: I guess it's the aptly named "init-stage2.sh"
13:56 kisak: eric_engestrom: morning, https://cgit.freedesktop.org/mesa/mesa/commit/?id=e42c5b86d0f7fccf3c3866b1452309ad65833b4b caught my eye. Around the branchpoint there's a window of time where merge requests are getting merged, but the submitter of the merge request hasn't mentally acknowledged that the new release branch exists and should be notated yet. A nice to have extra for backport-to: to also nominate
13:56 kisak: the N-1 branch marked commits for the new release branch between the new branchpoint and maybe XX.Y.0 ... but that's probably an annoying timeframe to turn into code.
14:01 DavidHeidelberg: zmike: or .gitlab-ci/common/start-x.sh
14:04 kisak: Hypothetical, maybe Backport-to: could accept a + marking to be everything newer than branch to cover that scenario? Backport-to: 19.2+
14:06 kisak: (the intent is to cover the common usage of Cc: mesa-stable so that it can be removed without a functional loss)
14:35 eric_engestrom: kisak: the `+` is a good idea, but actually I don't think there's ever a case where you *don't* want the + behaviour, so I think I'll make it always work like this
14:36 eric_engestrom: (I'm on holiday today, I'll do that when I'm back)
14:38 kisak: yeah, no hurry
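The "always behave like `+`" reading of Backport-to: can be sketched as follows. This is a minimal illustration, not the actual Mesa pick script: the function names and the branch list are hypothetical, and versions are assumed to be plain `major.minor` strings.

```python
def parse_version(text):
    """Turn a 'major.minor' string such as '19.2' into a comparable tuple."""
    major, minor = text.split(".")
    return int(major), int(minor)


def backport_targets(tag_value, release_branches):
    """Return every release branch at or after the nominated one.

    A trailing '+' is accepted but makes no difference, matching the
    proposal to make 'everything newer' the default behaviour.
    """
    base = parse_version(tag_value.rstrip("+"))
    return [b for b in release_branches if parse_version(b) >= base]


if __name__ == "__main__":
    branches = ["19.1", "19.2", "19.3", "20.0"]  # hypothetical branch list
    print(backport_targets("19.2+", branches))
    print(backport_targets("19.2", branches))
```

Comparing parsed tuples instead of raw strings avoids ordering "19.10" before "19.2".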
15:23 DavidHeidelberg: XDC reminder: we're organising small hack-weekend in Barcelona, so far only focused on CI, but we welcome any Mesa3D folks to join :) if anyone wants to join also one room is still available in the accommodation :)
15:25 zmike: DavidHeidelberg: is there a way to get the xorg log off ci? I'm trying to add some startup logging but none of the prints show up anywhere
15:26 DavidHeidelberg: what do you mean? Xorg usually logs loading if I'm not mistaken
15:26 zmike: the log isn't preserved in artifacts
15:26 DavidHeidelberg: yup
15:27 DavidHeidelberg: mv /Xorg.0.log /results/ or something like that before the job end should do it I guess
15:27 DavidHeidelberg: or just change the path in .gitlab-ci/common/start-x.sh and stage2 to results/Xorg.0.log
15:35 gfxstrand: I don't know how to manually kick off CI anymore...
15:35 gfxstrand: How many manual jobs do I have to run?!?
15:46 gfxstrand: IDK what motivated the recent re-structuring of CI jobs but it's made CI utterly useless for developers trying to run CI on MRs.
15:48 gfxstrand: To be clear, before all I had to do was kick off x86_64-build_base, arm-build_base, and x86_64-test_base, and I'd get CI. Now I have no clue how to run CI. I keep starting misc jobs but it's not at all clear how to actually kick it off.
15:48 dj-death: similar feeling here
15:48 zmike: ci_run_n_monitor.py with a glob ?
15:49 gfxstrand: Uh, that's a thing?
15:49 dj-death: that script doesn't work for me
15:49 dj-death: it creates a pipeline but doesn't start anything
15:49 zmike: works fine for me and has been since forever
15:49 zmike: the only caveat is you can't start it before the job is extant
15:49 dj-death: it has undocumented dependencies too, you have to install packages but you don't know what versions you need
15:54 gfxstrand: And a LOT of those deps aren't in Fedora
15:54 zmike: I'm using fedora 🤔
15:58 robclark: gfxstrand: ci_run_n_monitor.sh does pip stuff to deal with the dependencies
15:58 gfxstrand: robclark: Oh, okay. That helps
15:59 robclark:was in same boat
16:00 gfxstrand: Still not a fan of personal access tokens but I guess there's not much to be done about that.
16:06 anholt: gfxstrand: yeah, I really dislike how CI has been recently changed to remove the ability to just click run on container jobs. I use ci_run_n_monitor all the time, but I don't want to have to pull it out and construct a glob every time when I just want to pre-review CI run someone's MR.
16:07 gfxstrand: and IDK what I'm even globbing
16:07 gfxstrand: like --target "zink*" doesn't do anything
16:07 gfxstrand: --target anv-tgl works
16:07 anholt: gfxstrand: sorry, regex not glob
16:08 zmike: zink.* ?
16:08 gfxstrand: Yeah, --target "zink.*" doesn't work, either
16:09 anholt: it gives you a link to the pipeline, are there zink jobs in that pipeline?
16:09 gfxstrand: Wait, what?!? Now everything is cancelled?
16:13 gfxstrand: Okay, I think I have it all running now
16:19 gfxstrand: IDK why it sets everything not in the glob to cancelled. That seems like an antifeature
16:25 zmike: DavidHeidelberg: followup: why are none of the deqp logs from https://gitlab.freedesktop.org/mesa/mesa/-/jobs/50060636 available as artifacts?
16:37 zmike: actually I don't have log files for any of the failing jobs 🤔
16:41 anholt: gfxstrand: my guess is "some stuff would end up running automatically if it wasn't canceled, so just cancel everything because that's easier"
16:42 gfxstrand: Yeah, probably.
16:44 anholt: zmike: guess the log saving isn't happening for the special missing case.
16:44 zmike: I think I managed to repro locally, but it was surprising
16:48 gfxstrand: Oh, good, asan tests are failing in code I didn't touch...
17:33 gfxstrand: Apparently, --target ".*" doesn't work.
17:47 airlied: can you get jobs to run after they have been cancelled?
17:47 gfxstrand: IDK. Not easily?
17:49 airlied: that annoyed me because my regex guesses were missing jobs
17:49 airlied: so i wanted to just hit a couple of others
17:50 anholt: I've had success clicking the button in the job's page. but it's irritating.
17:50 anholt: !25473 may help with some of the frustrations here
17:54 gfxstrand: "run cancelled jobs if they are in the targets list"
17:54 gfxstrand: That sounds helpful
17:57 airlied: yeah tokens also suck when you develop on 5-10 machines
17:58 airlied: but i can deal with that pain
19:47 alyssa: how do I trigger a manual CI pipeline running whatever marge will, but not e.g. nightlies?
19:48 alyssa: for an open mr
19:56 daniels: anholt: if anyone uses ci_run_n_monitor on stable branches, the post-container jobs are all on_success, so you need to cancel the others so you don’t cascade job starts down
20:26 gfxstrand: alyssa: ci_run_n_monitor.sh (not .py, the .sh one does python magic for you)
20:27 gfxstrand: alyssa: I just learned about this a few hours ago
20:52 alyssa: ....sh?
20:53 alyssa: i don't see what that fixes
20:54 gfxstrand: It invokes pythonenv and pip and stuff to make sure you have the dependencies
20:54 alyssa: that's not the problem
20:55 alyssa: it's what to pass to it to run the premerge
21:03 gfxstrand: IDK. I did --target 'anv.*|zink.*|radv.*|a.*_vk' and got a decent selection.
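A rough Python sketch of how a `--target` pattern like the one above selects jobs, and why a glob such as "zink*" behaves differently when read as a regex. The job names other than "anv-tgl" are hypothetical, and full-match semantics are an assumption about how ci_run_n_monitor applies the pattern, not something the log confirms.

```python
import fnmatch
import re

# Hypothetical job names; only "anv-tgl" is taken from the log.
JOBS = ["anv-tgl", "zink-anv-tgl", "radv-stoney-vkcts", "a630_vk", "llvmpipe-piglit"]

# The alternation quoted in the log, applied to the whole job name (assumed).
TARGET = re.compile(r"anv.*|zink.*|radv.*|a.*_vk")
selected = [job for job in JOBS if TARGET.fullmatch(job)]

# Glob vs regex: as a glob, "zink*" means "zink" followed by anything.
glob_hit = fnmatch.fnmatchcase("zink-anv-tgl", "zink*")
# As a regex, "zink*" means "zin" followed by zero or more "k" characters.
regex_hit = re.fullmatch(r"zink*", "zink-anv-tgl") is not None
# The regex equivalent of the glob is "zink.*".
fixed_hit = re.fullmatch(r"zink.*", "zink-anv-tgl") is not None

if __name__ == "__main__":
    print("selected:", selected)
    print("glob 'zink*':", glob_hit, "| regex 'zink*':", regex_hit, "| regex 'zink.*':", fixed_hit)
```

This only models the pattern matching. It does not model the cancellation of non-matching jobs or the nightly jobs that such a broad pattern also starts.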
21:04 dcbaker: gfxstrand: I'm trying to accelerate the cargo patches so we can have them in 1.3.0. Whether that will branch in time is another question. I personally don't hate the idea of having the wraps in tree, at least until we decide that enough people have a new enough Meson?
21:04 anholt: gfxstrand: the problem is that that also kicks off the nightly jobs that take forever.
21:05 gfxstrand: dcbaker: Once it's in a meson version, I'm happy to hard-require that version for NVK.
21:05 dcbaker: gfxstrand: I think we'll want to have Meson add the cargo dependencies into our artifact tarballs anyway (which it can do), so that no one has to have an active internet connection to build a Meson tarball
21:06 gfxstrand: That would be neat
21:07 dcbaker: I reviewed the first part of Xavier's work today, the only thing that was major in it is that we've abstracted the rust crate information a bit since he reworked my patches, so that we can correctly handle building a static lib and a dynamic lib at the same time (we've gone to a rust_abi flag, and made proc-macro its own thing so that we can enforce that you're cross compiling proc-macros the right way)
21:07 dcbaker: I don't think that will take too long for him to fix
21:07 gfxstrand: Cool. Yeah, I added notifications to both of his MRs so I saw your comments fly by.
21:08 gfxstrand: dcbaker: I also need the features PR for proc_macro2
21:08 dcbaker: Yeah, I have his second series on my todo-list to look at. I'm just sorta neck deep in teaching llvm's build system about pkg-config and meson about said pkg-config...
21:09 gfxstrand: Oh my...
21:09 gfxstrand: Good luck! (You're gonna need it...)
21:11 dcbaker: I've got it working correctly in about 33% of cases I think (although that's not to say that it's in a shape that it could land...)
21:11 dcbaker: they apparently want to drop llvm-config, and that's a bit of a problem for anyone who wants to consume llvm and isn't using cmake...
21:12 airlied: the whole linux distro world?
21:12 dcbaker: lol, yeah
21:13 dcbaker: among such notable projects: Mesa and PostgreSQL
21:14 gfxstrand: Isn't that kind-of on them to sort out?
21:21 gfxstrand: But anyway, I can't use meson's crate support until I have features because proc_macro2 and friends have quite a few of them, some of which I need to be able to turn on for stuff I'm using in NAK.
21:21 gfxstrand: So it looks like we'll be using wraps for a bit.
21:25 dcbaker: gfxstrand: should they sort that out? yes. Will they sort that out before they break the entire ecosystem and leave me trying to figure out why meson's cmake dependency system doesn't work right in some strange corner cases and pull my hair out for months before writing pkg-config files and then still pulling my hair out for months because there's at least one major version of LLVM that is really hard to use without using cmake? probably
21:26 gfxstrand: dcbaker: lmao, fair
21:28 dcbaker:remembers when LLVM dropped autotools support and then everyone found out that basically all of Linux was using exclusively autotools and things like symbol versioning didn't work with cmake...
21:29 ccr: \:D\
21:40 airlied: yeah cmake is not well tested in the multi-llvm versions + cross compile stuff at all
22:48 alyssa: I want to preface this saying that I'm good at chaos
22:49 alyssa: So if I were being paid by a billion dollar company to disrupt an upstream project, an effective strategy would be burning out the top developers until there's nobody left to improve things
22:50 alyssa: Slow people down, frustrate people, argue back when they protest, until finally one by one they leave "on their own terms" because they realize that there's no point to staying
22:51 alyssa: But of course, doing that would meet resistance. The way to succeed would be coating in the name of progress. Instead of "think of the children", be able to push back any protest with a "think of our users"
22:51 alyssa: Since the project presumably values correctness & their users, an effective way is to target testing.
22:52 alyssa: Nobody is allowed to say no to more testing, right? think of the poor users, lest bugs happen
22:52 alyssa: So with the financial backing, I would target testing. Make it terrible, make the testing so terrible that nobody can get their work done. Make it bad enough to give stomach aches to the top devs.
22:53 alyssa: And I'd make it mandatory, so that anybody who dares bypass the shibboleth is threatened.
22:53 alyssa: There would be no consequences for me breaking the developers. but there would be consequences for breaking the code
22:54 alyssa: I would have testing that doesn't work and that I know doesn't work, but that looks plausible. and if anyone protested, I would argue back until I win by default, because I'm being paid to fight this and they're being paid to do productive work and so I can shout louder and longer than they can.
22:55 alyssa: But here's the kicker.
22:55 alyssa: You don't need a bad actor.
22:55 alyssa: You don't need malice.
22:55 alyssa: You don't need to be trying to disrupt a project
22:55 alyssa: You don't need to be trying to burn people out.
22:57 alyssa: You can be well-intentioned but as long as you disregard the externalities -- disregard the harms you're doing to developers that are only incidental to your ostensibly good goal -- what you're doing is hard to distinguish from the bad actor
22:58 alyssa: I'm told that things are getting better. The reality is that every time I come back to upstream mesa, CI is somehow in worse shape than it was last time.
22:58 alyssa: to the point where I can't do my job
22:59 alyssa: to the point where I'm forced to fork or switch to working on other projects instead
22:59 alyssa: and I'm not the only one
22:59 alyssa: I have no real power here
22:59 alyssa: I can't stop what's happened to mesa
22:59 alyssa: I can't get the project back
23:00 alyssa: I know that -- my health being what it is these days -- when you come angrily replying to me, that you'll be able to type a response longer and larger than anything I can, and that I will be too exhausted to reply in kind, and you'll win by default
23:01 alyssa: and you'll win
23:01 alyssa: and mesa will lose.
23:01 anholt: you do, in fact, have real power here. you can write MRs and review MRs related to CI. I agree that there's a problem, and in my view most of the problem comes from having a group of CI developers who are not driver developers. Back in my day we had Mesa testing being driven by Mesa developers, but driver developers quit doing that work because it was hard and no fun. But we needed testing. So people got hired to do that work, except
23:01 anholt: that they don't see the problems it causes to developers because they're just trying to do their jobs which is not driver dev.
23:02 alyssa: It's frustrating to see so many talks at XDC this year talking about how great mesa ci is, and if only we would expand coverage further
23:02 alyssa: but the reality is that the current state is worse than it was last year
23:02 alyssa: and I'm out
23:02 alyssa: I'm sorry but I can't do this anymore
23:02 anholt: I am also really grumpy at the state of CI. I'm on calls weekly complaining about the situation. I participate in MRs and poke holes in how it's going to break driver dev. But I wish I didn't feel alone in that.
23:02 anholt: s/weekly/biweekly/
23:07 zmike: ci is definitely better now than it was a month or two ago when every job was failing
23:07 Company: if you wanted to force things, you could just agree to work on a mesa-next or mesa-staging branch where all the code goes that doesn't pass (enough of) CI yet
23:08 Company: that's kinda what happens when stuff gets too big - like, Linux and Mozilla have those release branches that feed from whatever the -next branch is
23:09 alyssa: Company: that's effectively where i'm at
23:10 Company: the tricky part is that you need people who actually do the release engineering and merging things from -next into -release
23:10 anholt: Company: we don't have releng, though. we can barely get releases out the door as is, where releasing is theoretically just wait a while to catch any remaining regressions and then make a tarball.
23:10 dcbaker: and CI actually is a big contributor to slow release process
23:11 dcbaker: I don't get tagged to pull patches that turn off known dead machines
23:11 dcbaker: I don't get tagged to turn them back on
23:11 dcbaker: Some tests don't run and it's not clear if they're being disabled by design or if there's something wrong
23:12 dcbaker: patches get tagged that apply cleanly but cause regressions, and then the maintainer has to figure out who to ask, or try to figure out if there's something else (say in the original series) that is needed
23:13 dcbaker: I can't speak for eric_engestrom, but CI turnaround is long and I often pull a bunch of patches say first thing in the morning, do a local build test, and send them to CI, then get into something more interesting/pressing and don't get back to looking at those CI results for 4 hours
23:16 anholt: dcbaker: I agree, current "CI is on fire" issue is hour-long pipelines. we were supposed to be holding ourselves to "10ish minute turnaround on HW jobs for the whole capacity of a farm", but everyone's slipped on how long 10ish is, plus automatic retries were added instead of bottoming out instabilities, then people added automatic retries on the automatic retries, and that plus higher overall load on the farms from more users (more mesa
23:16 anholt: devs, plus DRM CI, plus the --stress tool etc.) means that we need to crank down our usage.
23:16 alyssa: i recall being told recently that 20min is acceptably close to 10min and, no.
23:16 anholt: alyssa: /o\
23:17 alyssa: it's the externalities that get me though
23:18 dj-death: and cts grows fast too :(
23:18 alyssa: and i guess me working on common code is what burned me fastest.
23:18 alyssa: because i got to run thru everyone's ci and wow
23:19 anholt: alyssa: common code also was awful pre-CI, because you instead got to wait for intel and amd and etc. to manually run your code for you (or have a room full of machines you ran it on yourself), then also remote-debug with someone when you landed regressions anyway and the release was blocked on you.
23:20 alyssa: yeah, fair. no winning there.
23:21 dcbaker: and I'll be fair, it is nice that we have a lot less regressions in stable branches, but if I could ask for one thing from CI it would be to bring down the runtime, and to have a better way to tag CI stuff that needs to be pulled back to stable branches
23:21 Company: alyssa: I remember watching a talk a while ago of some driver guys and loving the fact that they got new features in common stuff enabled - so I would guess it's better than before?
23:22 anholt: dcbaker: are you watching the CI label in gitlab?
23:22 anholt: (seems like for farm enable/disables and stuff that would be the way)
23:22 alyssa: tangent, who is responsible for microsoft with jesse OoO
23:23 dcbaker: No, but I should. Maybe I can update the pick script to look for things labeled for CI
23:23 alyssa: spirv2dxil job failin and I don't even have a windows machine
23:23 zmike: alatiera in #freedesktop I think
23:23 alyssa: zmike: it's a real fail from my patch I just can't really debug myself
23:24 zmike: relatable
23:24 alyssa: probably preexisting bug
23:24 zmike: surely
23:27 Company: that reminds me: is Mesa generating better shader code from GLSL than from spirv?
23:28 Company: because I have the same shader code essentially and when I have benchmarks where complex fragment shaders are the bottlenecks, Vulkan is the one with lower fps
23:28 Company: where "Vulkan" means my Vulkan stuff and my GL-over-zink
23:30 anholt: Company: I would expect equivalent shader code on radv and freedreno. I'm suspicious of the intel compiler for vulkan but don't have any hard evidence.
23:30 anholt: (radv and turnip vs radeonsi and freedreno)
23:30 Company: I'm on radv
23:31 Company: I'm suspicious because I used glslc with -O and that spirv resulted in massively worse shader code
23:31 anholt: we've got a lot of zink-on-radv vs radeonsi perf hits in our traces collection, but I haven't gone digging into them.
23:32 anholt: by "worse shader code" you mean shader-db reports from the driver, or something else?
23:32 alyssa: zmike: i mean it works on everything else and the diff looks right
23:32 bnieuwenhuizen: anholt: on AMD we have the ACO vs. LLVM thing going on
23:32 alyssa: d3d12 job passes
23:32 alyssa: just not spirv2dxil units
23:32 bnieuwenhuizen: which might actually matter for shader perf
23:32 Company: I mean I had some dumb ubershader test in my code that I benchmarked that took 2s on radeonsi and 6s on radv
23:33 Company: and after I removed the -O (for optimize) from glslc it took 3s
23:33 Company: zink took 3s, too
23:34 pendingchaos: if no one can look at the spirv2dxil fail soonish, then maybe you could just disable the job
23:34 pendingchaos: I assume spirv2dxil is a command line tool, anyways?
23:34 pendingchaos: the comment at the top of spirv2dxil.c says it's for testing
23:35 alyssa: tempting
23:36 pendingchaos: maybe it's possible to compile the tool on linux and reproduce the assertion failure
23:38 anholt: or update the xfails and file an issue?
23:44 pendingchaos: that's probably a better idea
23:48 airlied: alyssa: maybe you just didn't realise how bad things were before we had CI
23:48 airlied: that you think this is worse
23:48 airlied: like it's bad, but it's in no way worse than the mesa pre-CI
23:49 airlied: let me merge my regressions faster because I'm an experienced developer isn't the argument that will move the needle on this
23:52 airlied: you probably had a nice time living in drivers which weren't central to the world, but dealing with the core and not regressing one of the major drivers is hard
23:53 airlied: you either wait for CI or you have to wait for as anholt said approvals and testing from amd, intel, zink, llvmpipe etc
23:54 airlied: like the run n monitor changes are there because people complained CI was overloaded, so instead of letting everyone just click go on all the pipelines slowing down merges, you do some targeted pre-merge testing