06:45 pmoreau: karolherbst: No idea, sorry. I haven’t looked into SVM at all, you are the specialist on that front. :-)
06:49 pabs3: karolherbst: got another GPU hang, managed to recover by killing Xorg via ssh: https://paste.debian.net/hidden/38bb3748/ (Linux 5.2.9-2 from Debian)
06:56 karolherbst: pabs3: well, sadly that doesn't tell us more than "sorry, the GPU context was killed"
06:57 karolherbst: pabs3: is that with modesetting or nouveau ddx?
07:12 karolherbst: pmoreau: uff... I already see me discussing that inside khronos
07:37 pabs3: karolherbst: modesetting
07:38 karolherbst: mhhhh
07:38 karolherbst: so obviously we do something stupid inside mesa, I guess
07:38 pabs3: I assume modesetting anyways, Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
07:39 karolherbst: oh, it's more about whether there is a "modesetting" doing the bits or "nouveau"
07:39 karolherbst: it should be quite obvious
07:41 pabs3: I'm not sure what I should look for in the logs, sorry :)
07:42 karolherbst: well, either there is a bunch of "nouveau(0)" lines or "modeset(0)" or so
07:42 karolherbst: somewhere at the beginning
07:42 karolherbst: it is quite obvious
07:43 pabs3: (II) modeset(0): using drv /dev/dri/card0
07:44 pabs3: looks like modeset
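The check described above (look for "modeset(0)" or "nouveau(0)" lines near the start of the Xorg log) can be sketched in a few lines of Python. This is only an illustration: the function name `detect_ddx` is hypothetical, and matching case-insensitively is an assumption, since the exact capitalisation of the driver tag may differ between drivers.

```python
import re

# Xorg prefixes driver messages with "<name>(<screen>):". The two names are the
# ones mentioned in the chat; case-insensitive matching is an assumption.
DDX_TAG = re.compile(r"\b(modeset|nouveau)\(\d+\):", re.IGNORECASE)


def detect_ddx(log_text):
    """Return "modeset", "nouveau", or None if neither tag appears (hypothetical helper)."""
    counts = {"modeset": 0, "nouveau": 0}
    for line in log_text.splitlines():
        match = DDX_TAG.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    if counts["modeset"] == 0 and counts["nouveau"] == 0:
        return None
    # Whichever tag shows up more often is taken to be the active DDX.
    return max(counts, key=counts.get)


if __name__ == "__main__":
    print(detect_ddx("(II) modeset(0): using drv /dev/dri/card0"))
```

Fed the line pabs3 pasted, the helper reports the modesetting DDX.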
07:47 pmoreau: karolherbst: Sounds like a ton of fun! 🙃
09:54 AndrewR: karolherbst, so mesa master (git-0c6ca0a647) compiles for me ok, thanks for fix!
09:55 karolherbst: np
13:05 linearain: hello
13:06 linearain: when i run glxgears i get nv30_screen_create:592 - nv30_screen_init failed: -22
13:06 linearain: libGL error: failed to create dri screen
13:06 linearain: libGL error: failed to load driver: nouveau
13:08 linearain: latest debian i686, nvidia fx 5600xt
13:09 karolherbst: linearain: uhm.. something seems wrong with your installation if nothing else happens..
13:09 karolherbst: linearain: dmesg?
13:09 linearain: the glxgears do load up
13:10 linearain: but very slow
13:10 karolherbst: yeah, that's software rendering
13:10 linearain: sorry im a noob at linux
13:10 karolherbst: I am sure the kernel module complains about something
13:10 linearain: dmesg
13:10 linearain: and thats it?
13:11 karolherbst: execute dmesg
13:11 karolherbst: it should print a log
13:11 karolherbst: mind pastebining it somewhere?
13:11 linearain: ok give me a minute
13:13 linearain: do you need all of it?
13:15 karolherbst: probably
13:15 karolherbst: or probably not
13:15 karolherbst: but it's hard to filter
13:15 karolherbst: and there is no sensitive information anyway
13:16 karolherbst: just upload it somewhere
13:16 karolherbst: there are enough websites allowing you to paste text in it
13:16 linearain: ok
13:19 linearain: you know what ill reboot cause the log is massive, i was messing around with the machine
13:19 linearain: brb very soon
13:31 linearain: karolherbst, https://pastebin.com/uUAPfN4f
13:31 linearain: although after reboot i dont get that error anymore with glxgears...
13:32 karolherbst: mhhh
13:32 karolherbst: "imem: OOM: 00001000 00001000 -28"
13:32 karolherbst: OOM probably means out of memory
13:32 karolherbst: linearain: I think we just run out of memory on that GPU after a while
13:32 karolherbst: and we don't really handle it all that well
13:32 karolherbst: linearain: what desktop are you running?
13:33 linearain: lxqt
13:33 karolherbst: mhhhh
13:33 linearain: that gpu has 128mb ram and my agp aperture is now 64mb in bios
13:33 karolherbst: I seriously have no experience with lxqt on low end hardware... always used xfce, but maybe even that is too much these days
13:33 linearain: so no wonder
13:33 karolherbst: linearain: how much was it before?
13:33 linearain: what was before?
13:34 karolherbst: "agp aperture is now 64mb in bios"
13:34 karolherbst: wondering about the "now"
13:34 linearain: aperture? was 64mb too
13:34 karolherbst: sounded like you changed it
13:34 karolherbst: ahh
13:34 karolherbst: okay
13:34 linearain: im planning to run this machine headless anyway and use GUI just sometimes
13:34 linearain: but just wondering
13:34 linearain: doom3demo runs at 5fps with artifacts
13:35 linearain: and on this same machine i used to play half-life 2 just fine on windows
13:35 karolherbst: might be we don't handle everything correctly.. but running out of memory is a big issue
13:35 karolherbst: lxqt might use OpenGL a lot
13:35 karolherbst: as it's essentially qt5
13:35 karolherbst: and qt5 uses OpenGL for accelerating stuff
13:37 linearain: i thought youtube is laggy because nouveau is not being used but i guess its just the poorly optimized firefox and really old hardware on my end
13:38 linearain: or maybe im dumb and youtube has nothing to do with gpu
13:38 karolherbst: it will probably use the CPU for decoding though
13:38 karolherbst: but the GPU wouldn't be able to anyway
13:38 karolherbst: and your CPU is probably also way too slow for h.264 or even mpeg-4 content
13:38 linearain: true
13:39 linearain: im gonna run this machine headless anyway
13:39 karolherbst: linearain: you could give xfce4 a shot though
13:39 karolherbst: I think it should have lower overhead in general
13:39 karolherbst: might work better
13:39 linearain: well lxqt is performing quite well
13:39 karolherbst: sure.. but random applications are failing
13:40 karolherbst: and I am just wondering if the situation would be better with xfce4
13:40 linearain: maybe ill try it some day
13:41 linearain: i want to ask something else
13:41 linearain: is it possible to get gpu temperature or modify fan speed with nouveau?
13:43 karolherbst: with certain GPUs yes. We use hwmon as the interface for that
13:43 karolherbst: "sensors" is a userspace tool which can list the available sensors on your machine
13:43 linearain: yeah, sensors wont give me gpu sensors
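For reference, the hwmon interface mentioned above is exposed through sysfs, so whether a GPU sensor exists can also be checked without "sensors". The following is a minimal sketch, assuming the usual layout (`/sys/class/hwmon/hwmonN/name` plus `tempM_input` files in millidegrees Celsius). The helper name `read_hwmon_temps` is hypothetical, and the chip name a given driver registers (for example "nouveau") is an assumption to verify on the actual machine.

```python
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {chip_name: [temperatures in degrees C]} for each hwmon device under root."""
    result = {}
    if not os.path.isdir(root):
        return result
    for entry in sorted(os.listdir(root)):
        dev = os.path.join(root, entry)
        try:
            with open(os.path.join(dev, "name")) as f:
                name = f.read().strip()
        except OSError:
            continue  # not a hwmon device directory, or unreadable
        temps = []
        for fname in sorted(os.listdir(dev)):
            if fname.startswith("temp") and fname.endswith("_input"):
                try:
                    with open(os.path.join(dev, fname)) as f:
                        # sysfs reports millidegrees Celsius
                        temps.append(int(f.read().strip()) / 1000.0)
                except (OSError, ValueError):
                    pass  # sensor present but not readable right now
        result.setdefault(name, []).extend(temps)
    return result


if __name__ == "__main__":
    for chip, temps in read_hwmon_temps().items():
        print(chip, temps)
```

If no entry for the GPU driver shows up in the output, the driver did not register a sensor for that card, which matches what "sensors" reported here.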
13:44 karolherbst: then it's probably not possible. Older GPUs had fans with a fixed speed and stuff
13:44 karolherbst: so unless it's possible with the nvidia driver (on windows eg) then I doubt there is anything we can do
13:44 karolherbst: linearain: do you know how many pins the fan connector has?
13:44 karolherbst: 2/3/4?
13:44 linearain: let me open up the case
13:46 linearain: looks like only black and red wires going
13:47 linearain: its not a big deal, i just want to see if i can remove the gpu fan for running the machine headless
13:48 linearain: maybe you know if the gpu is doing any work at all if desktop environment is disabled and monitor is unplugged from gpu?
13:49 linearain: i could just take out the gpu from mobo, but sometimes i might need to manage it graphically
13:49 karolherbst: linearain: old GPUs have really terrible power management
13:50 karolherbst: so it might not even matter if they are doing anything or not
13:50 karolherbst: they might just produce the same amount of heat
13:50 linearain: wow
13:50 karolherbst: they don't even have different clock states
13:50 karolherbst: so they operate on fixed frequencies anyway
13:51 linearain: https://www.ixbt.com/video/itogi-video/fx5600xt-128b.jpg
13:51 linearain: with a tiny heatsink like this it probably isnt generating much anyway
13:51 linearain: well i hope so
13:51 karolherbst: well, the fan helps a lot
13:51 karolherbst: anyway, it's a two pin fan, so you aren't able to control it properly
13:52 linearain: its fine
15:10 tagr: karolherbst: if the OOM goes away after a reboot it could indicate that we're leaking somewhere
15:10 karolherbst: tagr: well.. it's 4000 seconds after boot
15:11 karolherbst: but yeah
15:11 tagr: I vaguely remember seeing some weird WARN about something PTC or so not being empty when I was testing module unload
15:11 karolherbst: possible
15:11 tagr: and this was perhaps 100 seconds after boot
15:12 tagr: I think there was also an accompanying nvmm warning that it wasn't empty when it was getting destroyed
15:13 tagr: or was it drm_mm?
15:13 karolherbst: well
15:13 karolherbst: I wouldn't be surprised if we have VRAM leaks
15:14 tagr: could've been something Tegra specific, to be fair
15:15 tagr: could've also been just one tiny buffer that wasn't getting freed, in which case it probably wouldn't have been related to this
15:16 tagr: while 64/128 MiB is pretty constrained, it should be enough to run glxgears
15:17 tagr: on the other hand, depending on where you leak you may not be able to allocate a big enough contiguous chunk in VRAM
15:17 tagr: VRAM is always allocated in contiguous chunks, isn't it?
15:18 karolherbst: I think so
15:18 tagr: actually I'm not even sure what the MMU support is like on nv30, perhaps I should just shut up /o\
15:19 * karolherbst has no idea either
15:20 karolherbst: tagr: OOM: 00001000 00001000 means size: 0x1000 and align is 0x1000
15:20 karolherbst: which would be weird if we fail to allocate that much
15:20 karolherbst: well
15:20 karolherbst: due to not having enough contiguous VRAM
15:21 tagr: heh, yeah
15:21 karolherbst: OOM is a disaster with nouveau right now anyway
15:21 karolherbst: we don't even know how much VRAM we have left
15:21 karolherbst: or not really
15:22 karolherbst: there are some bits inside libdrm but I have no idea how reliable that is
15:22 tagr: actually, it's weird that you fail to allocate 4 KiB in any case
15:23 tagr: isn't there something in TTM's debugfs for that?
15:23 karolherbst: well...
15:23 karolherbst: the thing is, what does actually "used" VRAM mean
15:23 karolherbst: or when are you actually out of VRAM
15:24 karolherbst: it's a fuzzy concept to begin with and you can really just answer it based on whatever shaders/operations you are executing, etc... sometimes having 12GB mapped on a 4GB VRAM GPU can be totally valid
15:26 tagr: nv30 would be using nv04_instobj_new(), right? and if this was a kzalloc() failure there should've been more splat in dmesg, so this must've been nvkm_mm_head() failing
15:30 karolherbst: probably yes
15:30 karolherbst: I trust you in reading kernel code :p
15:31 tagr: for align 0x1000 it seems like nvkm_mm_head() will only fail with -ENOSPC
15:31 tagr: both -ENOMEM cases would be slab allocation failures, so should produce a splat in dmesg
15:33 karolherbst: ufff.. I hate the kernel for not erroring when you pass in an enum as a bool :/
15:33 karolherbst: but I only do so, because I convert a bool -> enum
15:34 tagr: heh... actually the OOM message says that it's -ENOSPC (-28)...
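The reading of the dmesg line discussed above (two hex fields taken as size and alignment, followed by a negative errno) can be decoded mechanically. A small sketch, assuming the field order is as karolherbst describes it; the helper name `decode_oom` and the regular expression are hypothetical and not part of nouveau.

```python
import errno
import re

# "imem: OOM: <size hex> <align hex> <negative errno>", field meaning per the chat.
OOM_RE = re.compile(r"OOM:\s*([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+(-?\d+)")


def decode_oom(line):
    """Decode an OOM line into size, alignment and symbolic errno (hypothetical helper)."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    ret = int(match.group(3))
    return {
        "size": int(match.group(1), 16),
        "align": int(match.group(2), 16),
        # Kernel return codes are negated errno values.
        "errno": errno.errorcode.get(-ret, "unknown"),
    }


if __name__ == "__main__":
    print(decode_oom("imem: OOM: 00001000 00001000 -28"))
```

On Linux, errno 28 maps to ENOSPC, so the message describes a failed 4 KiB allocation with 4 KiB alignment due to lack of space rather than a slab allocation failure.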
15:42 tagr: karolherbst: wait a minute, this only runs out of instance memory, of which there's only 512 KiB to begin with
15:43 karolherbst: ufff
15:43 karolherbst: right
15:43 karolherbst: tagr: do you know how instance memory gets allocated in general?
15:44 tagr: karolherbst: I think it's always used for things like GPU objects
15:44 tagr: so things like channels each get a chunk of instance memory
15:45 tagr: page tables also go into instance memory I think (not sure if that's even relevant for nv30, though)
15:48 karolherbst: okay
15:48 tagr: not the whole page tables, though, I think it's only the descriptor of the page table
15:48 karolherbst: so if my theory is right, and lxqt is using qt5 and qt5 uses OpenGL for every single application to do acceleration, we have a _huge_ problem on such GPUs
15:48 tagr: but I get these things confused sometimes, so I'm definitely not an authority on this topic
15:49 tagr: yeah, that sounds like it could potentially exhaust those 512 KiB pretty quickly
15:49 karolherbst: I have not the faintest idea what qt5 does if they see a 1.4 GL context or so
15:49 karolherbst: but still
15:49 tagr: of those 512 KiB some 128 KiB seem to be reserved to begin with
15:52 tagr: hm... some of this seems to be done differently pre-nv50, which is not totally unexpected
15:56 karolherbst: yeah.. if there would be a way to just force disable opengl in qt5 on linux... :/
16:02 tagr: karolherbst: if I read things correctly, on nv30 each gr channel is about 24 KiB, which I guess isn't catastrophic depending on how many OpenGL contexts there are (assuming that there's really only one gr channel per context)
16:03 tagr: but then every gr object is bound to instance memory using 16 bytes
16:03 karolherbst: well... if there are just 15 qt5 applications active doing OpenGL, you are already in big trouble I guess
16:04 tagr: yeah, I suppose so
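The numbers quoted in this exchange (512 KiB of instance memory, roughly 128 KiB reserved, about 24 KiB per gr channel, 16 bytes per bound gr object) give a rough upper bound on how many channels fit. A back-of-the-envelope sketch follows; the function name is hypothetical, the figures are the approximate ones from the chat, and real usage includes other allocations that this ignores.

```python
KIB = 1024


def max_gr_channels(total=512 * KIB, reserved=128 * KIB,
                    per_channel=24 * KIB, objects_per_channel=0, per_object=16):
    """Estimate how many gr channels fit in instance memory (rough, hypothetical model)."""
    usable = total - reserved
    cost = per_channel + objects_per_channel * per_object
    return usable // cost


if __name__ == "__main__":
    # Channels only, no bound objects.
    print(max_gr_channels())
    # Assume, purely for illustration, 64 bound objects per channel.
    print(max_gr_channels(objects_per_channel=64))
```

With channels alone the budget is 384 KiB / 24 KiB = 16 channels, so an X server plus around 15 OpenGL clients would already exhaust it, in line with the estimate above.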
16:05 karolherbst: just wondering if this is indeed the case
16:11 tagr: looking at xf86-video-nouveau, there's plenty of nouveau_object_new() calls as well, each of those will be one GPU object
16:11 tagr: I guess it's slightly less bad for those because they should only exist once per X session