00:29Wally: Do we use this? https://nvidia.github.io/open-gpu-doc/
00:30Wally: [I assume so]
00:33airlied: not sure how used it has been for development, definitely referenced later to confirm reverse engineering results
00:34imirkin: yeah, it's mostly been post-facto
00:48anholt: has anyone here used a tk1? got a kernel command line or kernel image I could copy that would get me serial console?
02:33imirkin: anholt: i've used it, but it dies whenever i use network heavily
02:33imirkin: dunno if it's my unit, my configuration, or what
02:33imirkin: works fine with l4t though
02:35imirkin: let me see if i can find something
02:42imirkin: anholt: i have an ancient thing from gnurou who helped me get it going. happy to share. basically a script which takes a u-boot bin + some stuff + dtb. i think i was having trouble with getting the installed u-boot to boot off tftp or something? and was afraid of flashing things.
03:44anholt: imirkin: would take whatever you've got. I'm using the last l4t release for tk1, and whether it's their kernel or mine, ttyS0 or ttyS3 or earlyprintk + earlycon on any of the serials, I get nothing after "Starting kernel..."
03:44imirkin: anholt: ah, well at least my thing boots
03:44imirkin: and definitely l4t worked just fine
03:44imirkin: wait, are you saying that the "built-in" l4t doesn't work?
03:45imirkin: iirc it just boots and dhcp's, so you can ssh in
03:45anholt: well, I flashed 21.8 because I was trying to work toward getting tftpboot set up
03:45anholt: I did see kernel serial console with what was stock on the boards
03:45imirkin: ok. well let me tar this thing up
03:47imirkin: anholt: https://people.freedesktop.org/~imirkin/tegra/
03:47imirkin: this came from gnurou
03:47imirkin: i have some local modifications too, but i think you should start from his thing :)
03:47imirkin: his sample cmdline was "./boot-kernel.py PM375_Hynix_2GB_H5TC4G63AFR_RDA_924MHz.bct /path/to/zImage /path/to/tegra124-jetson-tk1.dtb"
03:49imirkin: this makes use of the tegrarcm tool
03:50imirkin: which is sorta like fastboot i guess
03:52anholt: hmm. doesn't look like I have a tegrarcm with those args.
03:53imirkin: grab a tegrarcm-du-jour? i think the day was 2015 or so?
04:03anholt: oho. I need to set fdt_addr_r to make space for the kernel
04:04anholt: except that my "I'll just stuff it in my boot script I supply to flashing" didn't actually take.
04:05imirkin: i can also give you the current state of that directory. i've hacked up that boot-kernel script tremendously (i suspect i just used it to save some outputs) and a bunch of various img files which presumably didn't work extremely well. not sure how helpful it'd be.
04:08imirkin: anyways, i gtg, and shut my box off (scheduled power outage). good luck!
17:03glennk: karolherbst, which kernel did you run with nv4x? anything much past 5.4 runs into random vm errors and lockups
17:04karolherbst: 5.18 just works perfectly fine for me
17:04karolherbst: besides random issues
17:05karolherbst: but I might have tried to run with wayland, which doesn't seem to work out all that great :D
17:09glennk: worked well enough on gnome/wayland on 5.4 for me
17:10glennk: but any kernel past that was just a lot of random rendering errors and lockups
17:10glennk: and the bisect was basically "well we refactored ttm and nouveau all at once"
17:11karolherbst: speaking of ttm, I found a workaround for the stalls on nv50: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156#note_1385770
17:12glennk: so basically if engine is idling it takes too long to ramp up and complete before the fence runs out?
17:12karolherbst: I have no idea
17:12glennk: s/idling/at idle clocks/
17:13karolherbst: just that waiting on the fence explicitly makes it work, that's all I know
17:13karolherbst: ttm does call into dma_fence_wait, but that simply times out
17:13karolherbst: anyway.. I wouldn't be surprised if our ttm integration is all busted in weird ways
17:13glennk: isn't that just timing out in itself and then the next wait succeeds?
17:14karolherbst: the next wait also times out
17:14karolherbst: the annoying part is, it doesn't happen to all.. also if the GPU is quick enough this isn't a problem either
17:15karolherbst: I wouldn't be even surprised if that helps on nv40
17:15karolherbst: the main issue here is, that if the dma_fence wait fails, we fallback to software
17:16karolherbst: and I suspect this messes things up, because on the GPU side, everything still runs or something
17:16glennk: only sw fallback on nv4x is the T&L path
17:17glennk: or if the engine can't init at all you get softpipe/llvmpipe at context init time
17:17karolherbst: I meant on the kernel side
17:17karolherbst: so we do a ttm memcpy thing, which does mapping/unmapping and a CPU memcpy
17:17glennk: right but it doesn't notify any existing contexts about that afaik
17:17glennk: so they just try submitting and stuff hangs
17:18karolherbst: this happens on the drm channel
17:18karolherbst: at least on nv50 it did
17:18karolherbst: but yeah.. applications just appeared to be frozen with that
17:18glennk: i think nvc0 has slightly more robust recovery
17:19karolherbst: it's not about recovery though
17:19glennk: but <= nv4x it's reboot time when that happens
17:19karolherbst: oh wow
17:19karolherbst: all I know is, that my nv4x gpus are booting fine
17:21glennk: they tend not to reflow themselves on fan failures at least
17:22glennk: but the dinky spinny thing on my 7800gt is hard to not notice when it's spinning
17:25glennk: i'd guess though that the 5700 ultra fan is probably still the record holder for most dB out of a gpu
17:26karolherbst: I actually own three passively cooled nv4x gpus
17:26glennk: yeah i have a 6600, it's just a lot slower than the 7800
17:26karolherbst: yeah... they are all terribly slow :D
17:27glennk: well, gnome on the 6600 is paint drying levels slow, but the 7800 feels intel integrated sort of speed
17:32karolherbst: at least something
17:32karolherbst: glennk: did you run with kasan or kcsan enabled?
17:32karolherbst: maybe we just trash host memory somewhere
17:33glennk: no, that old system is slow enough already
17:33karolherbst: but what kind of errors are you seeing anyway? or well.. is it userspace or kernel space command submission being bonkers?
17:33glennk: also kernel driver was outside my space of caring :-p
17:34glennk: same userspace ran fine on 5.14, and lots of issues on later kernels
17:34karolherbst: 5.14 or 5.4?
17:34glennk: err, sorry, ran fine on 5.4, and badly up to at least 5.14
17:35karolherbst: that's annoying
17:35karolherbst: on 5.4 networking doesn't work on my desktop :D
17:35karolherbst: but anyhow.. it runs fine
17:35karolherbst: but let me check if wayland vs xorg makes a huge difference here
17:35glennk: throw random pci ethernet card in machine?
17:36glennk: i found wayland vs xorg made no difference in stability
17:36glennk: all kernel from what i could tell
17:37karolherbst: okay.. let's see
17:38karolherbst: ehh.. I think my threading fixes break things on nv4x.. good to know
17:41karolherbst: glennk: the only thing which renders incorrectly is the background here
17:41karolherbst: well.. and other random bits :D oh boi
17:41karolherbst: but no errors in dmesg
17:42glennk: the icons on the gnome desktop use a lot of glScissor
17:42karolherbst: wayland looks a little worse
17:42glennk: had a patch for that somewhere
17:42karolherbst: I tried yours but it didn't really improve things
17:42glennk: like i mentioned, you have to run a working kernel driver too
17:42karolherbst: but you said on 5.4 most bits are actually fine? or do you mean on 5.4 it still renders garbage but at least it boots?
17:43glennk: no it renders perfectly on 5.4
17:48airlied: glennk: so 5.5 breaks? there isn't a lot of nouveau or ttm changes in that gap
17:48glennk: i don't remember unfortunately
17:49airlied: I would expect a bisect to nail it down pretty exactly if we can reproduce it
17:49glennk: it was basically one patch that changed oodles of nouveau stuff in one go
17:51airlied: outside of some svm patch not seeing much in there for nouveau
17:51glennk: there is also a chance i'm misremembering if it was 5.4 or something even older
17:51karolherbst: I mean.. I can boot something like 4.19 or so, I just don't have any network :)
17:51glennk: what network card is on that thing?
17:52airlied: 5.6 has a bunch more nouveau
17:53airlied: glennk: so on a real nv40 gpu?
17:53airlied:doesn't have much nv40 outside of the G5 box, that I've no idea if that even boots and is locked in an office
17:53glennk: 7800gt, is that nv42?
17:54karolherbst: glennk: ehh.. it's a fairly new motherboard
17:54karolherbst: like 2021 new
17:55glennk: airlied, g70 aka nv47
17:55karolherbst: I think llvmpipe on that system is more power efficient and faster than any nv4x GPU :D
17:55glennk: karolherbst, ah, i plug nv4x into period correct hardware...
17:55glennk: karolherbst, a core2 duo system
17:55karolherbst: ehhh no... I pair it with a i7-12700 :D
17:56glennk: i don't think that combo has ever worked well
17:56karolherbst: need to keep hw in sync, GPU and CPU equally fast
17:56glennk: new cpus speculate a lot more
17:56karolherbst: it does boot and gnome does start
17:56glennk: and well, a fair bit faster too
17:56karolherbst: some GPU are just broken if you put them into the PEGP slot, but...
17:57karolherbst: but that's broken on the firmware level
17:57glennk: i'd expect a new cpu/mb to expose more coherency issues
17:57glennk: and well, nv4x kernel driver has 'em
17:58karolherbst: let's start with 4.19 shall we :D
18:01anholt: tagr: recognize "tegra-pcie 1003000.pcie: failed to power ungate: -110" on the tk1? boot hangs just after that. kernel 5.16 or .17.
18:11mynacol:is also pairing a i5-11400F CPU with a GT 730 GPU =D
18:12karolherbst: mynacol: that's way too close :P
18:13airlied: glennk: I'd be interested in any bisection result you could turn up, or even just knowing 5.5 vs 5.6 works
18:14karolherbst: I will probably already bisect it, as I need nv30 to work
18:16karolherbst: airlied, glennk: well... 4.19 is just as broken
18:16karolherbst: probably it's worse even
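
[Editor's sketch] The bisection airlied asks for above is a binary search over an ordered history for the first entry that misbehaves. The snippet below is only an illustration under stated assumptions: the release list and the "first bad" release are made up for the example (the log never establishes which release regressed), and `first_bad` and `is_good` are hypothetical names. In practice `git bisect` does the same search over commits rather than release tags, and it only works if a known-good starting point exists, which the 4.19 result above puts in doubt for at least one setup.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_good() is False.

    Assumes versions are ordered oldest to newest, versions[0] is known
    good, versions[-1] is known bad, and there is a single good -> bad
    transition somewhere in between.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid  # regression is after mid
        else:
            hi = mid  # regression is at or before mid
    return versions[hi]


# Hypothetical data: pretend the regression landed in 5.6.
releases = ["5.4", "5.5", "5.6", "5.7", "5.8"]
made_up_first_bad = "5.6"  # assumption for the example, not a finding


def boots_and_renders(version: str) -> bool:
    # Stand-in for "boot this kernel and check for lockups or corruption".
    return releases.index(version) < releases.index(made_up_first_bad)


result = first_bad(releases, boots_and_renders)
print(result)
```

Each probe halves the remaining range, so five releases need at most two test boots here; the same logic is why a commit-level bisect over thousands of commits stays at a dozen or so boots.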