06:44 svuorela: I've gotten a feeling that gitlab has gotten slower ?
06:45 MrCooper: compared to a pre- or post-migration baseline?
07:04 svuorela: definitely compared to a post-migration baseline, maybe even compared to a pre-migration baseline.
07:05 svuorela: (I'm primarily in poppler if that makes a difference)
07:38 bilboed: hm... indeed
08:02 slomo: i think it's still a bit faster than pre-migration, but not as much as right after the migration
08:13 bentiss: one thing that could explain it is the gitaly backups that I only enabled last Sunday
08:13 bentiss: the daily backup takes 6h, and we are 4h12m in
08:15 bentiss: side note: I've enabled fastly for all *.freedesktop.org Pages sites as of this morning. Of course I screwed up the DNS a bit, so if this is not working yet, wait a little longer for the DNS to get cached properly by fastly
08:15 bentiss: (IOW, mesa.freedesktop.org is using fastly, mesa3d.org is not)
08:27 eric_engestrom: bentiss: womp womp... we need to add other tags to fdo runners, otherwise a job with just `priority:low` gets picked up by any runner that has that tag, such as... a steamdeck in mupuf's farm: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/73883719
08:27 eric_engestrom: I think we need to have the fdo runners register both the priority tag and an `fdo-runner` tag or something like that, and jobs need to require both
08:30 eric_engestrom: (ci-tron jobs are fine because they always have a tag for the farm they run on, so they don't risk being picked up by fdo runners, it's only the other way around that's a problem right now)
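A minimal `.gitlab-ci.yml` sketch of the dual-tag idea eric_engestrom describes above; `fdo-runner` is the tag name proposed in the discussion (not necessarily what was finally merged), and the job itself is a placeholder:

```yaml
# Sketch only: a job requires both tags, so it can only land on a runner
# that registers both.
example-job:
  stage: test
  script:
    - echo "only fdo-hosted runners register both tags"
  tags:
    - fdo-runner      # hypothetical tag that only the fdo runners would carry
    - priority:low    # the shared scheduling-priority tag from the discussion
```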
08:32 bentiss: sigh, the tagging mechanism in gitlab is just shitty
08:32 eric_engestrom: yeah :/
08:34 eric_engestrom: I think my solution should work though, what do you think?
08:36 bentiss: I just checked, these runners from mupuf are the only ones having priority:low (mupuf-gfx10-vangogh-1 and mupuf-gfx10-vangogh-5), so I wonder if we shouldn't address that instead
08:36 bentiss: your solution works, but I feel like that's not the best
08:40 bentiss: mupuf: do you use the priority in mupuf-gfx10-vangogh-*?
08:42 eric_engestrom: yeah we use it, but we can rename it to eg. `ci-tron-priority:*`
08:43 eric_engestrom: or `ci-tron:priority:*` to be more in line with our other tags
08:49 eric_engestrom: bentiss, mupuf: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34358
08:49 eric_engestrom: (I haven't renamed the tag on the farm side, I'll do that when merging this)
08:50 bentiss: thanks!
08:59 mupuf: I proposed an alternative name
09:30 __tim: we've been seeing loads of "WARNING: Uploading artifacts as "archive" to coordinator... 500 Internal Server Error" since yesterday, is that related to the hetzner S3 problems or something else?
09:31 __tim: (and failed jobs/pipelines as a result)
09:41 mupuf: bentiss:
09:41 mupuf: is fastly still hammering our bandwidth?
09:42 mupuf: I keep getting KVM jobs timing out, seemingly due to a slow network, but it could also be insane CPU usage
10:04 bentiss: mupuf: runner-x86-1 is pulling a lot of data from RIPE-ERX-146-75-0-0
10:05 mupuf: bentiss: any idea what this is?
10:05 bentiss: nothing seems abnormal on the runner
10:05 bentiss: android-ndk is running, maybe that's related
10:06 bentiss: it eventually stopped
10:07 bentiss: mupuf: also, one thing to remember is that those runners now only have a single Gbit line, whereas the Equinix ones had 10 Gbit (maybe dual)
10:08 mupuf: bentiss: ack, but it shouldn't be using *that* much network
10:08 bentiss: mupuf: link?
10:08 bentiss: __tim: yeah, hetzner is still having a little bit of issues with their object storage
10:09 __tim: "little bit" 😆
10:09 mupuf: bentiss: https://gitlab.freedesktop.org/samueldr/ci-tron/-/jobs/73885848
10:09 mupuf: this very same step takes less than 5 minutes on all the gateways we have, and none have as good a connection as one would expect from hetzner
10:10 mupuf: at equinix, it took less than 2 minutes
10:11 bentiss: I just don't know what I'm supposed to see
10:11 mupuf: it shouldn't be pulling much data at all, no more than 100 MB... most of it coming from a DNF update
10:11 mupuf: there isn't much to see, indeed
10:12 mupuf: maybe I could re-run the job and from there you could tell if there is a high cpu load or something?
10:12 bentiss: pulling, creating, and initializing the container only took 32 secs, so I guess it's not a network issue
10:12 mupuf: yeah, but sometimes I saw that just revalidating an artifact (a HEAD request) would take over a minute
10:13 bentiss: artifacts is different, as mentioned above hetzner is having issues
10:14 mupuf: it's basically been like this ever since you moved the kvm runner, so it pre-dates the issues at hetzner
10:14 bentiss: k
10:33 mupuf: bentiss, eric_engestrom: the gitlab runner priority has a little bit of a bug
10:34 mupuf: no jobs of a lower priority will be picked up as long as there is at least one high-priority job executing
10:34 mupuf: the script was designed for `parallel: 1`
10:37 mupuf: in other words, we should probably drop the parallel and register as many runners as we can run in parallel
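A hypothetical gitlab-runner config.toml sketch of what mupuf is proposing here: instead of one registered runner taking N jobs, register N runner entries that each take a single job, so the priority script can gate every slot on its own. This is an illustration of the proposal, not the deployed fdo config; names, token, and values are placeholders.

```toml
concurrent = 2                     # total jobs this host may run at once

[[runners]]
  name     = "example-runner-slot-1"
  url      = "https://gitlab.freedesktop.org"
  token    = "REDACTED"
  executor = "docker"
  limit    = 1                     # one job per registered runner
  [runners.docker]
    image = "alpine:latest"

[[runners]]
  name     = "example-runner-slot-2"
  url      = "https://gitlab.freedesktop.org"
  token    = "REDACTED"
  executor = "docker"
  limit    = 1
  [runners.docker]
    image = "alpine:latest"
```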
10:44 eric_engestrom: ah indeed, good catch
10:45 eric_engestrom: I haven't looked at how bentiss integrated my code into the fdo infra; do you have a link I could look at?
10:51 mupuf: https://gitlab.freedesktop.org/freedesktop/helm-gitlab-infra/-/tree/main/cloud-init/ci-baremetal/files.d/etc/gitlab-runner?ref_type=heads is what I've found
10:51 mupuf: but not sure how this works
10:52 mupuf: https://gitlab.freedesktop.org/freedesktop/helm-gitlab-infra/-/blob/main/cloud-init/ci-baremetal/runcmd.d/70-prep-gitlab-runner.gotmpl?ref_type=heads seems like this is the invocation of the templates
10:56 eric_engestrom: and https://gitlab.freedesktop.org/freedesktop/helm-gitlab-infra/-/blob/main/cloud-init/ci-baremetal/files.d/usr/local/bin/gitlab_runner_priority.py for the imported script
11:30 bentiss: yep to all three files
11:31 bentiss: https://gitlab.freedesktop.org/freedesktop/helm-gitlab-infra/-/commit/0123a6a14bd806be7c1063955c9293db54ce2b3a is the commit for handling concurrency
11:32 bentiss: mupuf, eric_engestrom: IIRC I fixed that in the deployed version. Each runner has a concurrent variable set to the number of threads, and the commit above ensures each thread is independent of the others
11:33 bentiss: https://gitlab.freedesktop.org/freedesktop/helm-gitlab-infra/-/blob/main/cloud-init/ci-baremetal/files.d/etc/gitlab-runner/config.toml?ref_type=heads for the deployed config
11:37 mupuf: bentiss: great!
11:38 bentiss: I just realized I promised eric_engestrom a MR with my changes... sorry
13:00 eric_engestrom: bentiss: no worries! I applied some already
13:01 eric_engestrom: the concurrent change didn't make enough sense to me so I didn't apply it back for now
13:01 eric_engestrom: also, there's `os.cpu_count()` instead of calling `nproc` :)
13:05 eric_engestrom: also, we might want a check that `cpu_count % concurrent == 0` to make sure we're not leaving some cpus unreachable
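A rough Python sketch of the two suggestions above; this is illustrative only, not the actual gitlab_runner_priority.py code:

```python
# Use os.cpu_count() instead of shelling out to nproc, and check that the
# CPUs split evenly across the configured concurrency so none go unused.
import os


def cpus_per_job(concurrent: int) -> int:
    cpu_count = os.cpu_count() or 1
    if cpu_count % concurrent != 0:
        raise ValueError(
            f"{cpu_count} CPUs do not divide evenly by concurrent={concurrent}"
        )
    return cpu_count // concurrent


if __name__ == "__main__":
    # trivially safe usage example: one job per CPU thread
    print(cpus_per_job(os.cpu_count() or 1))
```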
13:53 bentiss: eric_engestrom: sure, I'll take any upgrade to that script. It was kind of a "let's get this thing working" situation
13:54 __tim: ooc, how does mesa get any MRs merged? does mesa not do any artefact upload/download?
14:00 bentiss: __tim: looks like they get a few merged regularly, so not sure if they have an issue
14:01 __tim: yes exactly, I'm wondering if we're the only ones having issues :) (esp since it's with our own runners)
14:01 bentiss: oh... link to a failed job?
14:03 __tim: artefact download on mac os runner (this did pass on the 15th retry though): https://gitlab.freedesktop.org/gstreamer/cerbero/-/jobs/73923885
14:03 __tim: usually it's the uploads that are failing
14:04 __tim: and yes, I guess that's the hetzner s3 issue, but why is mesa not so affected?
14:04 bentiss: maybe they use smaller artifacts?
14:05 __tim: maybe :)
14:06 bentiss: but yeah, right now, there isn't much I can do. Worst case we'll have to use a different bucket location, but that means we'd have to move all of the data first, which is a PITA
14:06 __tim: ouch
14:07 bentiss: the artifacts data, not the git data
14:08 bentiss: https://status.hetzner.com/incident/da6b6285-b8a3-450f-b54b-19849ee9a09e is still "investigating"
14:08 bentiss: I put the data there, to be closer to the machines
19:21 DemiMarie: Was anyone ever concerned about the security of Hetzner’s bare-metal offerings in light of hardware/firmware infection attacks?
20:32 pinchartl: DemiMarie: are you volunteering to go camp in the data centre to keep watch ? :-)
20:33 DemiMarie: pinchartl: what I mean is “was it wise to pick a bare-metal offering run by a not-that-high-end provider, as opposed to one of the big name vendors or a colo”
20:34 DemiMarie: https://eclypsium.com/blog/the-missing-security-primer-for-bare-metal-cloud-services/
20:35 pinchartl: are the big names inherently safer ? especially when considering that many of them are USA companies, and are covered by the USA cloud act ?
20:35 pinchartl: I don't think anyone can answer that question with any certainty
20:35 DemiMarie: More resources to spend on things like custom board designs
20:36 DemiMarie: I believe that generally the people who are really concerned about security go for colos
20:36 DemiMarie: or their own datacenters if the scale justifies it (which this does not)
20:37 pinchartl: it reminds me of https://xkcd.com/641/. do you pick the cereals guaranteed 100% free of asbestos, or the ones guaranteed 100% free of plutonium ?
20:37 DemiMarie: see above w.r.t. colos
20:38 DemiMarie: (read: giving up on cloud and using dedicated hardware)
20:38 pinchartl: I don't think fdo can afford building its own data centre indeed :-)
20:39 DemiMarie: I think the general rule is that if security is the top priority, you want to own hardware, not rent it
20:40 pixelcluster: honestly hetzner isn't exactly a no-name provider either is it
20:40 pixelcluster: this really seems like a "if it's so important to you, feel free to provide the resources to make it happen" scenario to me
20:40 * pixelcluster is not too involved in infra tbc
20:50 DragoonAethis: DemiMarie: Would you consider "Oracle" to be enough of a big name to trust?
20:51 DemiMarie: DragoonAethis: for me, “big name” in the cloud space means “AWS/Azure/GCP”, especially Amazon or Google
20:52 DragoonAethis: And all 3 of these options are at least an order of magnitude more expensive than what Hetzner gets you
20:52 DemiMarie: personally, I would have gone with a colo facility and bought servers from a vendor, but if fd.o doesn’t have the resources for that it makes sense why they had to go with a different option
20:52 DemiMarie: DragoonAethis: you get what you pay for in the hosting space
20:54 DemiMarie: my concern, of course, is that someone would target https://gitlab.freedesktop.org so they can backdoor Mesa or one of the other giant projects
20:55 vyivel: i would just pay someone to push vulnerable code
20:56 vyivel: sounds much easier
20:56 DemiMarie: vyivel: am I too paranoid?
20:57 DragoonAethis: DemiMarie: kinda?
20:57 DragoonAethis: This is a massive project that you would like to run at a hyperscaler's level of corporate security
20:58 DragoonAethis: Whereas the backend gets 3 part-time admins mostly trying to keep it held together with duct tape
20:58 vyivel: oh right bribing/blackmailing an admin is even "better"
20:59 pixelcluster: infecting a server with bare-metal malware to (I guess?) alter some files in the git repo honestly sounds like the most elaborate and expensive setup for the smallest possible result to me
21:04 airlied: indeed, if you wanted to run a botnet on hetzner it might be okay, or if you were hoping someone with corp secrets would provision the same server after you, but for a server hosting open source git repos, it's probably not worth it
21:26 alanc: "if security is the top priority" - for fd.o though, security cannot be the top priority - something the org can afford (from both a monetary and admin time perspective) has to be the top priority, since otherwise the project is just dead
21:27 alanc: security is important, and a high priority, but at a level appropriate to the project, not excluding everything else
22:09 zmike: anyone know what's going on with CI jobs on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34235
22:09 zmike: seems like trace jobs are having issues maybe?
22:58 robclark: for another example, https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/1396465 .. the traces are not ok