IRC Logs of #dri-devel on irc.freenode.net for 2023-03-11

00:04 daniels: anholt_: if you can add me somehow as an admin I’ll take care of them
01:19 edt: anyone know what has happened to mesa's staging/23.0 branch? Lots of commits from March 1-8 are no longer there.
02:28 bnieuwenhuizen: alyssa: IIRC the container should be based on the tags I think, so the second time it should reuse it even with rebase
02:29 bnieuwenhuizen: MR and Marge might be different for security reasons though, no idea really
02:47 alyssa: bnieuwenhuizen: Weeee
02:49 alyssa: mupuf: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/37881777 navi21 seems unhappy
02:49 alyssa: arm64 rootfs is still building somehow, wooof
02:50 alyssa: vulkancts has entered the chat
03:06 alyssa: ERROR: Job failed: execution took longer than 1h0m0s seconds
03:06 alyssa: oh you're kidding me
03:06 alyssa: 60 minutes later
03:07 alyssa: eric_engestrom: DavidHeidelberg[m]: Can one of you get !20553 landed? Thank you :)
03:07 alyssa: Merging container changes is above my CI-fu
08:09 mareko: is there a NIR pass that deduplicates identical loads?
08:16 mareko: hm, nir_opt_cse should do that in theory
08:31 eric_engestrom: alyssa: reassigned to marge, let's hope the container build takes slightly less than an hour this time? 🤷
08:32 eric_engestrom: but I've already encountered that, and imo we should bump the timeout for these jobs from 60 to 90 minutes or so
08:34 eric_engestrom: in 99% of pipelines these jobs take about 15 seconds, but in the 1% of container bumps an hour is clearly not enough
10:35 tomeu: have noticed that spirv-llvm-translator seems to be defaulting to double for floating point literals in opencl kernels
10:35 tomeu: which can be a problem on my HW, which doesn't support fp64
10:36 tomeu: does anybody know why and if it is a bug?
11:23 daniels: eric_engestrom, alyssa: if the container’s already built, it doesn’t need to be rebuilt again
12:18 pendingchaos: mareko: nir_opt_cse works for ubo and read-only ssbos
12:18 pendingchaos: nir_opt_copy_prop_vars can also do that for deref loads, but I think slightly better in some cases (like writable ssbos and "vec4 a = var; float b = var.y" would become "vec4 a = var; float b = a.y;")
12:18 pendingchaos: for lowered loads, nir_opt_load_store_vectorize can combine intersecting loads in the same block
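(Sketch, not from the log: the idea behind deduplicating identical loads from read-only resources, as described for nir_opt_cse above, can be illustrated with a toy IR. This is not NIR's implementation; the `Load` type, its fields, and `dedup_read_only_loads` are hypothetical names made up for illustration. The assumption is that two loads from the same read-only resource at the same offset must return the same value, so the second can reuse the first, while loads from writable resources are left alone.)

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    """Toy load instruction (hypothetical, not a NIR structure)."""
    dest: str         # name of the value this load defines
    resource: str     # buffer being read, e.g. "ubo0"
    offset: int       # byte offset inside the buffer
    read_only: bool   # True for UBOs and read-only SSBOs


def dedup_read_only_loads(block):
    """Drop repeated read-only loads; return (kept_loads, replacements).

    replacements maps the dest of each removed load to the dest of the
    earlier identical load that can be used instead.
    """
    seen = {}          # (resource, offset) -> dest of first matching load
    replacements = {}
    kept = []
    for load in block:
        key = (load.resource, load.offset)
        if load.read_only and key in seen:
            # Identical read-only load already available: reuse it.
            replacements[load.dest] = seen[key]
            continue
        if load.read_only:
            seen[key] = load.dest
        # Writable loads are always kept: a store could have changed the
        # value in between, and this toy pass does not track stores.
        kept.append(load)
    return kept, replacements


block = [
    Load("a", "ubo0", 0, True),
    Load("b", "ubo0", 0, True),     # duplicate of "a"
    Load("c", "ssbo1", 4, False),
    Load("d", "ssbo1", 4, False),   # same address, but writable
]
kept, replacements = dedup_read_only_loads(block)
print([l.dest for l in kept], replacements)
```

(Handling the writable case safely needs knowledge of intervening stores, which is the part the log attributes to nir_opt_copy_prop_vars for deref loads.)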
12:40 alyssa: eric_engestrom: sure, yeah
12:40 alyssa: daniels: so ideally we wouldn't be building the container in a marge context.. so IDK what went wrong here
12:40 alyssa: maybe because I had cancelled the earlier pipeline and the image didn't get cached or something like that
12:41 alyssa: (I cancelled a lot of pipelines in an effort to conserve resources, lolz)
12:41 eric_engestrom: daniels: sure, but it does need to be built a first time, which is where the 1h timeout is too short
12:42 alyssa: eric_engestrom: nominally you can do that with a manual run before assigning to Marge, so you don't hog up the Marge queue
12:42 alyssa: (since then the first container build can run in parallel with regular MRs)
12:42 alyssa: IDK why that didn't work here
12:42 eric_engestrom: but that'll be in the fork's namespace, not mesa, so the image won't be available in the mesa context
12:42 eric_engestrom: (I think?)
12:43 alyssa: right..
12:43 alyssa: does Marge build in fork namespace too?
12:43 alyssa: if not it would presumably be rebuilt post-merge
12:43 alyssa: which I guess defeats the point
12:43 eric_engestrom: marge runs the merge pipeline in the mesa namespace I think
12:44 daniels: yep, that's right
12:46 alyssa: ah
12:47 alyssa: yes, I see the problem then :|
12:47 alyssa: so bumping the timeout to 90min++
12:49 daniels: yeah, we did think about that, but a pretty consistent complaint is that the biggest issue is throughput and the inability for stuff to complete quickly
12:49 daniels: so I've not wanted to push the timeout too much higher given that
12:49 daniels: (the rootfs build also doesn't usually take more than 60min last time I checked ...)
12:51 alyssa: the arm64 rootfs build took like 60 minutes and 15 seconds or something
12:51 alyssa: but also Marge gave up hope for that MR a long while before
12:52 alyssa: so you need two roundtrips through Marge to get a container change in anyway
12:55 tomeu: today I had to rebuild the arm64 rootfs and it was slower than I remembered
12:56 tomeu: timed out and had to increase the timeout value in my fork
13:00 daniels: perhaps it's got much slower than the last time I looked then, but it used to be like 30min
13:00 daniels: what I'm saying is that before we allow every pipeline to take 90min (potentially making throughput, a really bad issue, worse), we should understand if this is a global problem or if it's something that happened to someone once
13:02 alyssa: daniels: I think there are 2 distinct problems here
13:02 alyssa: One is that the container/rootfs *jobs* have a per-job 60min timeout, that they might not complete in
13:03 alyssa: The other is that the whole *pipeline* for Marge has a 60min timeout
13:03 alyssa: If we bump the job timeout but not the pipeline timeout, then at least the container builds will finish long after Marge has gotten bored and moved on
13:03 alyssa: and that way when you reassign the MR a second time, the image is cached so it'll get merged quickly (hopefully)
13:04 alyssa: and that won't affect throughput for normal pipelines because the pipeline timeout is the same and normal pipelines finish their container jobs in seconds (and if the container job for a regular MR is hung, whether it's hung for 60min or for 90min before timing out seems immaterial because Everything Is Broken at that point)
13:04 alyssa: having to take two roundtrips through the Marge queue to merge a container change is silly, don't get me wrong, but at least it'll work
13:05 alyssa: right now with a per-job timeout of 60min on the arm64 lava rootfs, and that taking sometimes a hair longer, it just can't be done
13:05 alyssa: (reliably)
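(Sketch, not from the log: the job-timeout versus pipeline-timeout argument above can be checked with a small model. It assumes, as alyssa describes, that a container job which finishes within its per-job timeout leaves a cached image even if Marge has already given up on the pipeline. The function name, the 20-minute figure for the rest of the pipeline, and the retry limit are hypothetical; the 15-second cached-job time and the 60 minutes 15 seconds build time come from the discussion.)

```python
CACHED_MIN = 0.25  # a container job with a cached image: about 15 seconds


def marge_round_trips(build_min, rest_min, job_timeout, pipeline_timeout,
                      cached=False, max_tries=3):
    """Return how many Marge attempts are needed to merge, or None.

    build_min        -- minutes for an uncached container/rootfs build
    rest_min         -- minutes for the rest of the pipeline (assumed)
    job_timeout      -- per-job timeout in minutes
    pipeline_timeout -- Marge's whole-pipeline timeout in minutes
    """
    for attempt in range(1, max_tries + 1):
        container_min = CACHED_MIN if cached else build_min
        if container_min > job_timeout:
            # The job is killed before finishing, so nothing gets cached
            # and the next attempt starts from scratch.
            continue
        # The job ran to completion, so the image is cached from now on,
        # even if the pipeline as a whole turns out to be too slow.
        cached = True
        if container_min + rest_min <= pipeline_timeout:
            return attempt
    return None


# arm64 rootfs taking 60 min 15 s, with a hypothetical 20 min of other jobs.
print(marge_round_trips(60.25, 20, job_timeout=60, pipeline_timeout=60))
print(marge_round_trips(60.25, 20, job_timeout=90, pipeline_timeout=60))
# A normal MR whose container image is already cached.
print(marge_round_trips(60.25, 20, job_timeout=90, pipeline_timeout=60, cached=True))
```

(Under these assumptions the 60-minute job timeout never succeeds, the 90-minute one succeeds on the second round trip, and pipelines with a cached image are unaffected by the change.)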
13:06 daniels: sure, make it 90min
13:07 daniels: I have no idea what's been added to the jobs that they now take that long, but I guess that boat's sailed
13:09 alyssa: in my case, clang-format? :-p
13:11 daniels: heh, more llvm is always helpful
13:12 alyssa: alyssa: no llvm in my mesa tyvm
13:12 alyssa: also alyssa: thou shalt use clang-format
13:21 zmike: maybe we should be integrating chrome builds somehow
13:21 tomeu: with android, please
13:21 ccr: and firefox, for good measure.
13:22 alyssa: ccr: nah, we can compile firefox to wasm and run it inside the android chrome build
13:23 alyssa: more secure that way
13:23 ccr: ahh. excellent.
13:23 ccr: turtles.. I mean browsers all the way down.
14:38 alyssa: "Possible inputs: A period of time written in natural language."
14:38 alyssa: i'm sorry gitlab but this is terrible docs
14:59 psykose: does it recognise 'a fortnight ago'
15:00 daniels: alyssa: ‘90m’ or ‘1h30m’ are both valid
15:01 alyssa: daniels: yeah, but eric_engestrom put "90 minutes" and I did a doubletake on whether that was valid
15:01 alyssa: psykose: that's the problem with such docs, the spec says it really should ;~P
15:02 alyssa: "une heure" that's natural language
15:02 alyssa: well, language naturelle
15:02 psykose: free-range organic language
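(Sketch, not from the log: to make the "what counts as a valid duration" question concrete, here is a deliberately strict toy parser that accepts the three spellings mentioned above, '90m', '1h30m' and '90 minutes', and rejects everything else. It is not GitLab's parser and makes no claim about what GitLab actually accepts; the unit table and the function name `parse_duration` are assumptions for illustration.)

```python
import re

# Hypothetical unit table: only hours, minutes and seconds in English.
UNIT_SECONDS = {
    "h": 3600, "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
    "m": 60, "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
}

# One "<number><unit>" token, with optional whitespace around the number.
TOKEN = re.compile(r"\s*(\d+)\s*([a-z]+)")


def parse_duration(text):
    """Parse strings like '90m', '1h30m' or '90 minutes' into seconds."""
    text = text.strip().lower()
    if not text:
        raise ValueError("empty duration")
    total = 0
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None or match.group(2) not in UNIT_SECONDS:
            raise ValueError(f"cannot parse duration: {text!r}")
        total += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
        pos = match.end()
    return total


for spelling in ("90m", "1h30m", "90 minutes"):
    print(spelling, "->", parse_duration(spelling), "seconds")
```

(With this grammar, 'a fortnight ago' and 'une heure' both raise ValueError, which is roughly the gap between "natural language" and what a parser can reasonably promise.)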