00:29 Wally: Do we use this? https://nvidia.github.io/open-gpu-doc/
00:30 Wally: [I assume so]
00:33 airlied: not sure how used it has been for development, definitely referenced later to confirm reverse engineering results
00:34 imirkin: yeah, it's mostly been post-facto
00:48 anholt: has anyone here used a tk1? got a kernel command line or kernel image I could copy that would get me serial console?
02:33 imirkin: anholt: i've used it, but it dies whenever i use network heavily
02:33 imirkin: dunno if it's my unit, my configuration, or what
02:33 imirkin: works fine with l4t though
02:35 imirkin: let me see if i can find something
02:42 imirkin: anholt: i have an ancient thing from gnurou who helped me get it going. happy to share. basically a script which takes a u-boot bin + some stuff + dtb. i think i was having trouble with getting the installed u-boot to boot off tftp or something? and was afraid of flashing things.
03:44 anholt: imirkin: would take whatever you've got. I'm using the last l4t release for tk1, and whether it's their kernel or mine, ttyS0 or ttyS3 or earlyprintk + earlycon on any of the serials, I get nothing after "Starting kernel..."
03:44 imirkin: anholt: ah, well at least my thing boots
03:44 imirkin: and definitely l4t worked just fine
03:44 imirkin: wait, are you saying that the "built-in" l4t doesn't work?
03:45 imirkin: iirc it just boots and dhcp's, so you can ssh in
03:45 anholt: well, I flashed 21.8 because I was trying to work toward getting tftpboot set up
03:45 anholt: I did see kernel serial console with what was stock on the boards
03:45 imirkin: ok. well let me tar this thing up
03:47 imirkin: anholt: https://people.freedesktop.org/~imirkin/tegra/
03:47 imirkin: this came from gnurou
03:47 imirkin: i have some local modifications too, but i think you should start from his thing :)
03:47 imirkin: his sample cmdline was "./boot-kernel.py PM375_Hynix_2GB_H5TC4G63AFR_RDA_924MHz.bct /path/to/zImage /path/to/tegra124-jetson-tk1.dtb"
03:49 imirkin: this makes use of the tegrarcm tool
03:50 imirkin: which is sorta like fastboot i guess
03:52 anholt: hmm. doesn't look like I have a tegrarcm with those args.
03:53 imirkin: grab a tegrarcm-du-jour? i think the day was 2015 or so?
04:03 anholt: oho. I need to set fdt_addr_r to make space for the kernel
04:04 anholt: except that my "I'll just stuff it in my boot script I supply to flashing" didn't actually take.
04:05 imirkin: i can also give you the current state of that directory. i've hacked up that boot-kernel script tremendously (i suspect i just used it to save some outputs) and a bunch of various img files which presumably didn't work extremely well. not sure how helpful it'd be.
04:08 imirkin: anyways, i gtg, and shut my box off (scheduled power outage). good luck!
17:03 glennk: karolherbst, which kernel did you run with nv4x? anything much past 5.4 runs into random vm errors and lockups
17:04 karolherbst: ehh
17:04 karolherbst: 5.18 just works perfectly fine for me
17:04 karolherbst: well
17:04 karolherbst: besides random issues
17:05 karolherbst: but I might have tried to run with wayland, which doesn't seem to work out all that great :D
17:09 glennk: worked well enough on gnome/wayland on 5.4 for me
17:10 glennk: but any kernel past that was just a lot of random rendering errors and lockups
17:10 karolherbst: weird
17:10 glennk: and the bisect was basically "well we refactored ttm and nouveau all at once"
17:10 karolherbst: classic
17:11 karolherbst: speaking of ttm, I found a workaround for the stalls on nv50: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156#note_1385770
17:11 karolherbst: :(
17:12 glennk: so basically if engine is idling it takes too long to ramp up and complete before the fence runs out?
17:12 karolherbst: I have no idea
17:12 glennk: s/idling/at idle clocks/
17:13 karolherbst: just that waiting on the fence explicitly makes it work, that's all I know
17:13 karolherbst: ttm does call into dma_fence_wait, but that simply times out
17:13 karolherbst: anyway.. I wouldn't be surprised if our ttm integration is all busted in weird ways
17:13 glennk: isn't that just timing out in itself and then the next wait succeeds?
17:14 karolherbst: the next wait also times out
17:14 karolherbst: the annoying part is, it doesn't happen to all.. also if the GPU is quick enough this isn't a problem either
17:15 karolherbst: I wouldn't be even surprised if that helps on nv40
17:15 karolherbst: the main issue here is, that if the dma_fence wait fails, we fallback to software
17:16 karolherbst: and I suspect this messes things up, because on the GPU side, everything still runs or something
17:16 glennk: only sw fallback on nv4x is the T&L path
17:17 glennk: or if the engine can't init at all you get softpipe/llvmpipe at context init time
17:17 karolherbst: I meant on the kernel side
17:17 karolherbst: so we do a ttm memcpy thing, which does mapping/unmapping and a CPU memcpy
17:17 glennk: right but it doesn't notify any existing contexts about that afaik
17:17 karolherbst: dunno
17:17 glennk: so they just try submitting and stuff hangs
17:18 karolherbst: well
17:18 karolherbst: this happens on the drm channel
17:18 karolherbst: at least on nv50 it did
17:18 karolherbst: but yeah.. applications just appeared to be frozen with that
17:18 glennk: i think nvc0 has slightly more robust recovery
17:19 karolherbst: it's not about recovery though
17:19 glennk: but <= nv4x it's reboot time when that happens
17:19 karolherbst: oh wow
17:19 karolherbst: all I know is, that my nv4x gpus are booting fine
17:21 glennk: they tend not to reflow themselves on fan failures at least
17:22 glennk: but the dinky spinny thing on my 7800gt is hard to not notice when it's spinning
17:23 karolherbst: ehh
17:25 glennk: i'd guess though that the 5700 ultra fan is probably still the record holder for most dB out of a gpu
17:26 karolherbst: I actually own three passively cooled nv4x gpus
17:26 glennk: yeah i have a 6600, it's just a lot slower than the 7800
17:26 karolherbst: yeah... they are all terribly slow :D
17:27 glennk: well, gnome on the 6600 is paint drying levels slow, but the 7800 feels intel integrated sort of speed
17:32 karolherbst: at least something
17:32 karolherbst: glennk: did you run with kasan or kcsan enabled?
17:32 karolherbst: maybe we just trash host memory somewhere
17:33 glennk: no, that old system is slow enough already
17:33 karolherbst: but what kind of errors are you seeing anyway? or well.. is it userspace or kernel space command submission being bonkers?
17:33 glennk: also kernel driver was outside my space of caring :-p
17:34 glennk: same userspace ran fine on 5.14, and lots of issues on later kernels
17:34 karolherbst: 5.14 or 5.4?
17:34 glennk: err, sorry, ran fine on 5.4, and badly up to at least 5.14
17:35 karolherbst: ehhhh
17:35 karolherbst: that's annoying
17:35 karolherbst: on 5.4 networking doesn't work on my desktop :D
17:35 karolherbst: but anyhow.. it runs fine
17:35 karolherbst: but let me check if wayland vs xorg makes a huge difference here
17:35 glennk: throw random pci ethernet card in machine?
17:35 karolherbst: mhhh
17:36 glennk: i found wayland vs xorg made no difference in stability
17:36 glennk: all kernel from what i could tell
17:37 karolherbst: okay.. let's see
17:38 karolherbst: ehh.. I think my threading fixes break things on nv4x.. good to know
17:41 karolherbst: glennk: the only thing which renders incorrectly is the background here
17:41 karolherbst: well.. and other random bits :D oh boi
17:41 karolherbst: but no errors in dmesg
17:42 glennk: the icons on the gnome desktop use a lot of glScissor
17:42 karolherbst: wayland looks a little worse
17:42 glennk: had a patch for that somewhere
17:42 karolherbst: right..
17:42 karolherbst: I tried yours but it didn't really improve things
17:42 glennk: like i mentioned, you have to run a working kernel driver too
17:42 karolherbst: but you said on 5.4 most bits are actually fine? or do you mean on 5.4 it still renders garbage but at least it boots?
17:43 glennk: no it renders perfectly on 5.4
17:43 karolherbst: interesting....
17:48 airlied: glennk: so 5.5 breaks? there isn't a lot of nouveau or ttm changes in that gap
17:48 glennk: i don't remember unfortunately
17:49 airlied: I would expect a bisect to nail it down pretty exactly if we can reproduce it
17:49 glennk: it was basically one patch that changed oodles of nouveau stuff in one go
17:51 airlied: outside of some svm patch not seeing much in there for nouveau
17:51 glennk: there is also a chance i'm misremembering if it was 5.4 or something even older
17:51 karolherbst: I mean.. I can boot something like 4.19 or so, I just don't have any network :)
17:51 glennk: what network card is on that thing?
17:52 glennk: nforce?
17:52 airlied: 5.6 has a bunch more nouveau
17:53 airlied: glennk: so on a real nv40 gpu?
17:53 airlied: doesn't have much nv40 outside of the G5 box, that I've no idea if that even boots and is locked in an office
17:53 glennk: 7800gt, is that nv42?
17:54 karolherbst: glennk: ehh.. it's a fairly new motherboard
17:54 karolherbst: like 2021 new
17:55 glennk: airlied, g70 aka nv47
17:55 karolherbst: I think llvmpipe on that system is more power efficient and faster than any nv4x GPU :D
17:55 glennk: karolherbst, ah, i plug nv4x into period correct hardware...
17:55 glennk: karolherbst, a core2 duo system
17:55 karolherbst: ehhh no... I pair it with an i7-12700 :D
17:56 glennk: i don't think that combo has ever worked well
17:56 karolherbst: need to keep hw in sync, GPU and CPU equally fast
17:56 glennk: new cpus speculate a lot more
17:56 karolherbst: well
17:56 karolherbst: it does boot and gnome does start
17:56 glennk: and well, a fair bit faster too
17:56 karolherbst: some GPUs are just broken if you put them into the PEGP slot, but...
17:57 karolherbst: but that's broken on the firmware level
17:57 glennk: i'd expect a new cpu/mb to expose more coherency issues
17:57 glennk: and well, nv4x kernel driver has 'em
17:58 karolherbst: let's start with 4.19 shall we :D
18:01 anholt: tagr: recognize "tegra-pcie 1003000.pcie: failed to power ungate: -110" on the tk1? boot hangs just after that. kernel 5.16 or .17.
18:11 mynacol: is also pairing an i5-11400F CPU with a GT 730 GPU =D
18:12 karolherbst: mynacol: that's way too close :P
18:13 airlied: glennk: I'd be interested in any bisection result you could turn up, or even just knowing 5.5 vs 5.6 works
18:14 karolherbst: I will probably already bisect it, as I need nv30 to work
18:16 karolherbst: airlied, glennk: well... 4.19 is just as broken
18:16 karolherbst: probably it's worse even