03:47airlied: bentiss: just in case you missed it, gl down :-)
04:34daniels: I bet it's Redis
04:34daniels: wa-hey ...
06:13whot: and it's back, whee!
06:46bentiss: regarding redis.... I just don't understand: initial deployment on Hetzner: 4 weeks without trouble, then one full week, then 5 days, then 3 :(
06:52colinmarc: their s3 product has also been unstable, right? maybe worth using aws for that stuff and peering?
06:57bentiss: colinmarc: I don't think the redis problem is related to Hetzner. Either a new kubernetes version enforces the 10 GB limit when the old one didn't, or a gitlab upgrade made redis consume a lot more data
06:57bentiss: for s3, I agree that it's not ideal. Though nuremberg seems to be less affected
07:03jani: DragoonAethis: that's just silly imo
07:45pq: bentiss, today I have created an MR and it was merged without any hiccup. :-)
07:45pq: thank you
07:51bentiss: \o/
08:00eric_engestrom: sergi, mupuf: I just realised I read my calendar wrong, and I can't make it to the meeting this afternoon :(
08:00daniels: colinmarc: have you ever seen redis corrupting the AOF happily?
08:02daniels: I was thinking this morning it would be nice to have a preflight script for the pod which would just check the db and nuke it if it was bad, but i couldn't see an arg to redis-server we could use for that
08:07bentiss: daniels: before redis has its data corrupted, we get a disk full error
08:07bentiss: so not surprising
08:07bentiss: an "easy" solution is to re-enable persistent disks, but that means that if the data is corrupted, we'll have to manually wipe it, which is a PITA
08:23daniels: we have to do that with what we have anyway
08:24daniels: EmptyDir still persists with the pod, so redis was fruitlessly restarting and reading the same corrupted AOF
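The preflight idea above could be sketched as a small wrapper that runs before `redis-server` starts. Everything here is an assumption — the `AOF_DIR` path, the multi-part-AOF manifest layout, and wiring it into the pod — but `redis-check-aof` is the stock checker shipped with Redis:

```shell
#!/bin/sh
# Hypothetical preflight for the redis pod: verify the AOF before starting
# redis-server, and wipe it if the checker reports corruption, so the pod
# stops crash-looping on the same corrupted file.
# AOF_DIR and CHECK_CMD are assumptions, overridable for other layouts.
AOF_DIR="${AOF_DIR:-/data/appendonlydir}"
CHECK_CMD="${CHECK_CMD:-redis-check-aof}"

preflight() {
    # No AOF yet: nothing to verify, redis starts clean.
    [ -d "$AOF_DIR" ] || { echo "no AOF, starting clean"; return 0; }
    if $CHECK_CMD "$AOF_DIR"/*.manifest >/dev/null 2>&1; then
        echo "AOF ok"
    else
        # It's only a cache: better to lose it than to crash-loop on it.
        echo "AOF corrupt, wiping $AOF_DIR"
        rm -rf "$AOF_DIR"
    fi
}

# preflight && exec redis-server /etc/redis/redis.conf
```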
08:24bentiss: yeah, the question is which size we need
08:24bentiss: agree
08:25sergi: eric_engestrom, mupuf: np see you next one, and talk here
08:25bentiss: daniels: one supposition I had too was that when we get the disk full issue, it might be because of a different process eating all the ram, or too much of it, and redis can't write to what's left
09:42eric_engestrom: bentiss, daniels: is the new gitlab ssh key documented somewhere? a colleague asked me and I have no idea
09:43bentiss: eric_engestrom: it's shown everytime you hit on "Code -> Clone with SSH"
09:43eric_engestrom: all I know about is https://gitlab.freedesktop.org/help/instance_configuration#ssh-host-keys-fingerprints but it's empty
09:44eric_engestrom: oh, checking
09:44bentiss: oh, sorry I misread
09:45eric_engestrom: yeah, I was about to say that ^^
09:46eric_engestrom: (I'm on train wifi right now btw, my messages might be sent late and I might drop out)
09:46bentiss: they should be in the SSHFP dns entry
09:46bentiss: (don't know how to retrieve them from that field though)
09:48eric_engestrom: ah right
09:48eric_engestrom: `ssh-keygen -r ssh.gitlab.freedesktop.org` fetches that
09:48eric_engestrom: thanks!
09:50bentiss: hmm... they might not be correct
09:51eric_engestrom: hmm, the question is how to map from the record format to what ssh shows when connecting though
09:51eric_engestrom: they don't look anything alike
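The two formats do encode the same digest: an SSHFP fingerprint of type 2 is the SHA-256 of the raw public key blob printed as hex, while ssh shows that same digest base64-encoded after `SHA256:`. A sketch of the conversion (the keyscan usage line is an assumption about checking the live host):

```shell
# Turn the base64 key blob (third field of known_hosts / ssh-keyscan output)
# into the hex digest stored in an SSHFP record (fingerprint type 2 = SHA-256).
key_to_sshfp_hex() {
    printf '%s' "$1" | base64 -d | sha256sum | awk '{print $1}'
}

# Example usage against the live host (network access assumed):
#   blob=$(ssh-keyscan -t ed25519 ssh.gitlab.freedesktop.org 2>/dev/null | awk '{print $3}')
#   key_to_sshfp_hex "$blob"   # compare against the DNS-published SSHFP fingerprint
```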
09:51bentiss: eric_engestrom: ssh-keyscan -D ssh.gitlab.freedesktop.org seems to show them based on the keyscan it did
09:51bentiss: but it seems they are wrong
09:51eric_engestrom: ack
09:52bentiss: daniels: should we update the DNS entries for the new SSHFP as reported by `ssh-keyscan -D ssh.gitlab.freedesktop.org`?
09:53bentiss: eric_engestrom: and if that is correct, you can then add `VerifyHostKeyDNS yes` in .ssh/config and this should be good :)
09:53bentiss: was looking at https://www.sindastra.de/p/3405/how-to-verify-ssh-host-keys-and-optionally-using-sshfp-in-dns
09:54eric_engestrom: yeah, that would be good
09:55eric_engestrom: I guess any migration procedure that involves an ssh server should include a check that `ssh-keyscan -D` and `ssh-keygen -r` agree (that is, the fingerprint from connecting to ssh matches the fingerprint published in the dns records)
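That check could be sketched as a normalize-and-compare step; to query what DNS actually publishes, `dig +short SSHFP` is the usual tool. The field positions are assumptions about the usual `dig` (`alg fptype hex`) and `ssh-keyscan -D` (`host IN SSHFP alg fptype hex`) output formats:

```shell
# Migration-check sketch: do the DNS-published SSHFP records match what a
# live scan of the server reports?
sshfp_match() {
    # $1 = `dig +short SSHFP host` output, $2 = `ssh-keyscan -D host` output
    dns=$(printf '%s\n' "$1" | awk 'NF==3 {print $1, $2, tolower($3)}' | sort)
    live=$(printf '%s\n' "$2" | awk '$3=="SSHFP" {print $4, $5, tolower($6)}' | sort)
    [ -n "$dns" ] && [ "$dns" = "$live" ]
}

# sshfp_match "$(dig +short SSHFP ssh.gitlab.freedesktop.org)" \
#             "$(ssh-keyscan -D ssh.gitlab.freedesktop.org)" && echo match || echo MISMATCH
```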
09:59bentiss: the weird part is I'm almost certain I kept the old keys around. So I might have messed up, but maybe SSHFP stores the DNS entry as well, or it was wrong for a long time
10:04eric_engestrom: I take it that it's not possible to update the dns record automatically?
10:07eric_engestrom: and what about the gl.fd.o/help page linked above?
10:08bentiss: eric_engestrom: there are tools to automatically update DNS records, but I'll probably have to write one for SSHFP, and it'll be quicker to update by hand
10:09bentiss: for the help page, no idea why it's not showing the keys, but I can't tell you if it was the case previously
10:15svuorela: if a project wants to do some additional 'extra custom ci steps, maybe with own hardware', is there a how-to document or best practices or anything?
11:11__tim: heh, freetype is inventing its own ci-templates it seems :)
11:23eric_engestrom: svuorela: depends mostly on the hardware you want/have and who physically handles it
11:30svuorela: eric_engestrom: the custom hardware might just be some cloud data somewhere that needs access to non-public content (This is poppler (pdf library) render regression tests)
11:31eric_engestrom: "somewhere" is kinda important here :P
11:33eric_engestrom: but it sounds like maybe it's already exposed in some way; if it's as a gitlab runner, then you need its admin to register it with the gitlab.freedesktop.org instance, and then you can just add a job in your .gitlab-ci.yml like any other
11:33eric_engestrom: if it's not a gitlab runner, there might be tools to connect anyway, depending on what it is
11:42svuorela: why is somewhere that important? Isn't it the same if it is in an amazon cloud or on a machine under my desk?
11:53daniels: bentiss: yeah, updating sshfp sounds like a good idea
11:54daniels: svuorela: like eric_engestrom says, what we do within the other projects is to register a new instance-wide runner with specific tags you can select to run on that. some of those are just Linux machines with magic things available (access tokens, network routes, etc), and some of those are raw hardware devices which we do full boots on
12:05eric_engestrom: yep
12:05__tim: do you actually want to run it for all MR pipelines, or would it be enough to cover post-merge / scheduled pipelines and MR pipelines of developers/maintainers? Because in that case it might just be enough to add some kind of env vars with some access keys for the non-public documents, right? (assuming all devs have access anyway)
12:06eric_engestrom: svuorela: if you tell us a bit more about the machine you want to run on, we can help you more
12:07svuorela: eric_engestrom: We are very early in the figuring out phase, but it might just present itself as a webserver unfortunately (it's something someone else is building ..)
12:07svuorela: I'm still on a .. fact finding mission
12:07eric_engestrom: haha
12:08svuorela: trying to approach both ends (this and the people who want to build it) to figure out how to get the ends to actually meet
12:08eric_engestrom: you always have the option to have a "normal" software job on a generic fdo runner, which just pokes the remote device via http
12:09eric_engestrom: as long as the interface has a way to get progress, you can have the gitlab job poll on that
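That poll-until-done fallback might look like the following; the URL, the status values, and the interval are all assumptions about the hypothetical device interface:

```shell
# Hypothetical CI job body: poll a remote test box over HTTP until it reports
# a terminal status. POLL_CMD defaults to curl but can be stubbed out.
STATUS_URL="${STATUS_URL:-http://poppler-testbox.example.com/status}"
POLL_CMD="${POLL_CMD:-curl -fsS}"
POLL_INTERVAL="${POLL_INTERVAL:-30}"

poll_until_done() {
    while :; do
        status=$($POLL_CMD "$STATUS_URL") || return 1   # device unreachable
        case "$status" in
            passed) echo "remote run passed"; return 0 ;;
            failed) echo "remote run failed"; return 1 ;;
            *) sleep "$POLL_INTERVAL" ;;                # still running, keep waiting
        esac
    done
}
```

GitLab then reports the job green or red based on the function's exit code, which is all the MR pipeline needs.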
12:10svuorela: yeah. that's also what is at least the fallback if I don't get better ideas from my fact finding mission
12:20eric_engestrom: svuorela: well, the "best" option is to install gitlab-runner on that machine and `gitlab-runner register --url https://gitlab.freedesktop.org --tag-list some-tag-you-will-use-in-your-job ...` and then your job's `script:` will be exactly what you want to run on that machine, no need for wrapping it through an http interface or anything
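Put together, the runner side and the job side could look like this; the tag name, executor, and script contents are placeholders, and registration needs a token from a GitLab admin:

```shell
# One-time, on the machine itself (token comes from the GitLab admin area):
#   gitlab-runner register --url https://gitlab.freedesktop.org \
#       --tag-list poppler-regression --executor shell
# Then in the repo, a job pinned to that runner via the tag:
cat > .gitlab-ci.yml <<'EOF'
render-regression:
  tags:
    - poppler-regression       # placeholder tag from the register step
  script:
    - ./run-render-tests.sh    # hypothetical test entry point
EOF
```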
14:31robclark: anyone seeing issues with the new?merge_request pages returning 404 despite being logged in? This seems to happen if I access the page before logging in and then login. It's like something is getting invalidly cached?
14:36robclark: and in fact if I rename the branch, I can see the new?merge_request page for that branch... this smells like some caching lolz
15:12colinmarc: <daniels> "Colin Marc: have you ever seen..." <- that's a new one for me, and I've run redis in production a few times. but never on k8s
15:14colinmarc: how does gitlab use redis? does it need disk persistence at all? is it just a cache or also queue etc?
15:14daniels: it's just a lookaside cache, can be binned at will
15:15bentiss: robclark: 404 should be cached for 10 seconds only in the latest deployments
15:17colinmarc: daniels: then why not just turn off the AOF?
15:18colinmarc: in my experience, redis is great so long as you're not using disk persistence or clustering
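Since it's purely a lookaside cache, binning persistence entirely would come down to two redis.conf directives (whether the deployment's chart exposes them this way is an assumption):

```
# redis.conf fragment for a cache-only instance: no AOF, no RDB snapshots
appendonly no
save ""
# equivalent runtime change on a live instance:
#   redis-cli CONFIG SET appendonly no
#   redis-cli CONFIG SET save ""
```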
15:30robclark: bentiss: hmm, I waited for more than just 10sec