06:45pmoreau: karolherbst: No idea, sorry. I haven’t looked into SVM at all, you are the specialist on that front. :-)
06:49pabs3: karolherbst: got another GPU hang, managed to recover by killing Xorg via ssh: https://paste.debian.net/hidden/38bb3748/ (Linux 5.2.9-2 from Debian)
06:56karolherbst: pabs3: well, sadly that doesn't tell us more than "sorry, the GPU context was killed"
06:57karolherbst: pabs3: is that with modesetting or nouveau ddx?
07:12karolherbst: pmoreau: uff... I can already see myself discussing that inside Khronos
07:37pabs3: karolherbst: modesetting
07:38karolherbst: so obviously we do something stupid inside mesa, I guess
07:38pabs3: I assume modesetting anyways, Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
07:39karolherbst: oh, it's more about whether there is a "modeset" doing the bits or "nouveau"
07:39karolherbst: it should be quite obvious
07:41pabs3: I'm not sure what I should look for in the logs, sorry :)
07:42karolherbst: well, either there is a bunch of "nouveau(0)" lines or "modeset(0)" or so
07:42karolherbst: somewhere at the beginning
07:42karolherbst: it is quite obvious
07:43pabs3: (II) modeset(0): using drv /dev/dri/card0
07:44pabs3: looks like modeset
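The log check described above can be sketched in a few lines of Python. This is only an illustration: `detect_ddx` is a hypothetical helper, and it assumes the Xorg log tags each driver line with "modeset(0)" or "nouveau(0)" as quoted in the conversation. The match is case-insensitive in case the log prints the tag in capitals.

```python
import re

# Assumption: driver lines in the Xorg log carry a tag such as "modeset(0)"
# or "nouveau(0)". detect_ddx is a hypothetical helper, not an existing tool.
DDX_TAG = re.compile(r"\b(modeset|nouveau)\(\d+\)", re.IGNORECASE)

def detect_ddx(log_text):
    """Return 'modeset', 'nouveau', or None, based on the first tagged line."""
    for line in log_text.splitlines():
        match = DDX_TAG.search(line)
        if match:
            return match.group(1).lower()
    return None

sample = "(II) modeset(0): using drv /dev/dri/card0"
print(detect_ddx(sample))
```

For the line pabs3 pasted, this reports the modesetting DDX.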
07:47pmoreau: karolherbst: Sounds like a ton of fun! 🙃
09:54AndrewR: karolherbst, so mesa master (git-0c6ca0a647) compiles for me ok, thanks for fix!
13:06linearain: when i run glxgears i get nv30_screen_create:592 - nv30_screen_init failed: -22
13:06linearain: libGL error: failed to create dri screen
13:06linearain: libGL error: failed to load driver: nouveau
13:08linearain: latest debian i686, nvidia fx 5600xt
13:09karolherbst: linearain: uhm.. something seems wrong with your installation if nothing else happens..
13:09karolherbst: linearain: dmesg?
13:09linearain: the glxgears do load up
13:10linearain: but very slow
13:10karolherbst: yeah, that's software rendering
13:10linearain: sorry im a noob at linux
13:10karolherbst: I am sure the kernel module complains about something
13:10linearain: and thats it?
13:11karolherbst: execute dmesg
13:11karolherbst: it should print a log
13:11karolherbst: mind pastebining it somewhere?
13:11linearain: ok give me a minute
13:13linearain: do you need all of it?
13:15karolherbst: or probably not
13:15karolherbst: but it's hard to filter
13:15karolherbst: and there is no sensitive information anyway
13:16karolherbst: just upload it somewhere
13:16karolherbst: there are enough websites allowing you to paste text in it
13:19linearain: you know what ill reboot cause the log is massive, i was messing around with the machine
13:19linearain: brb very soon
13:31linearain: karolherbst, https://pastebin.com/uUAPfN4f
13:31linearain: although after reboot i dont get that error anymore with glxgears...
13:32karolherbst: "imem: OOM: 00001000 00001000 -28"
13:32karolherbst: OOM probably means out of memory
13:32karolherbst: linearain: I think we just run out of memory on that GPU after a while
13:32karolherbst: and we don't really handle it all that well
13:32karolherbst: linearain: what desktop are you running?
13:33linearain: that gpu has 128mb ram and my agp aperture is now 64mb in bios
13:33karolherbst: I seriously have no experience with lxqt on low end hardware... always used xfce, but maybe even that is too much these days
13:33linearain: so no wonder
13:33karolherbst: linearain: how much was it before?
13:33linearain: what was before?
13:34karolherbst: "agp aperture is now 64mb in bios"
13:34karolherbst: wondering about the "now"
13:34linearain: aperture? was 64mb too
13:34karolherbst: sounded like you changed it
13:34linearain: im planning to run this machine headless anyway and use GUI just sometimes
13:34linearain: but just wondering
13:34linearain: doom3demo runs at 5fps with artifacts
13:35linearain: and on this same machine i used to play half-life 2 just fine on windows
13:35karolherbst: might be we don't handle everything correctly.. but running out of memory is a big issue
13:35karolherbst: lxqt might use OpenGL a lot
13:35karolherbst: as it's essentially qt5
13:35karolherbst: and qt5 uses OpenGL for accelerating stuff
13:37linearain: i thought youtube is laggy because nouveau is not being used but i guess its just the poorly optimized firefox and really old hardware on my end
13:38linearain: or maybe im dumb and youtube has nothing to do with gpu
13:38karolherbst: it will probably use the CPU for decoding though
13:38karolherbst: but the GPU wouldn't be able to anyway
13:38karolherbst: and your CPU is probably also way too slow for h.264 or even mpeg-4 content
13:39linearain: im gonna run this machine headless anyway
13:39karolherbst: linearain: you could give xfce4 a shot though
13:39karolherbst: I think it should have lower overhead in general
13:39karolherbst: might work better
13:39linearain: well lxqt is performing quite well
13:39karolherbst: sure.. but random applications are failing
13:40karolherbst: and I am just wondering if the situation would be better with xfce4
13:40linearain: maybe ill try it some day
13:41linearain: i want to ask something else
13:41linearain: is it possible to get gpu temperature or modify fan speed with nouveau?
13:43karolherbst: with certain GPUs yes. We use hwmon as the interface for that
13:43karolherbst: "sensors" is a userspace tool which can list the available sensors on your machine
13:43linearain: yeah, sensors wont give me gpu sensors
13:44karolherbst: then it's probably not possible. Older GPUs had fans with a fixed speed and stuff
13:44karolherbst: so unless it's possible with the nvidia driver (on windows eg) then I doubt there is anything we can do
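To see what hwmon exposes without the "sensors" tool, the sysfs files can be read directly. The sketch below assumes the standard hwmon layout (a `name` file and `temp*_input` files holding millidegrees Celsius under `/sys/class/hwmon/hwmon*`); `list_hwmon_temps` is a hypothetical helper name. If the GPU does not show up here, the driver has not registered any sensor for it.

```python
from pathlib import Path

def list_hwmon_temps(root="/sys/class/hwmon"):
    """Return {chip name: [temperatures in degrees C]} for devices under root."""
    readings = {}
    for dev in sorted(Path(root).glob("hwmon*")):
        name_file = dev / "name"
        if not name_file.is_file():
            continue
        name = name_file.read_text().strip()
        temps = []
        for temp_file in sorted(dev.glob("temp*_input")):
            try:
                # hwmon reports temperatures in millidegrees Celsius
                temps.append(int(temp_file.read_text().strip()) / 1000.0)
            except (OSError, ValueError):
                continue  # sensor listed but not readable
        # Note: two chips with the same name would overwrite each other here.
        readings[name] = temps
    return readings
```

Calling `list_hwmon_temps()` with no arguments reads the live system; passing another directory is only useful for testing.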
13:44karolherbst: linearain: do you know how many pins the fan connector has?
13:44linearain: let me open up the case
13:46linearain: looks like only black and red wires going
13:47linearain: its not a big deal, i just want to see if i can remove the gpu fan for running the machine headless
13:48linearain: maybe you know if the gpu is doing any work at all if desktop environment is disabled and monitor is unplugged from gpu?
13:49linearain: i could just take out the gpu from mobo, but sometimes i might need to manage it graphically
13:49karolherbst: linearain: old GPUs have really terrible power management
13:50karolherbst: so it might not even matter if they are doing anything or not
13:50karolherbst: they might just produce the same amount of heat
13:50karolherbst: they don't even have different clock states
13:50karolherbst: so they operate on fixed frequencies anyway
13:51linearain: with a tiny heatsink like this it probably isnt generating much anyway
13:51linearain: well i hope so
13:51karolherbst: well, the fan helps a lot
13:51karolherbst: anyway, it's a two pin fan, so you aren't able to control it properly
13:52linearain: its fine
15:10tagr: karolherbst: if the OOM goes away after a reboot it could indicate that we're leaking somewhere
15:10karolherbst: tagr: well.. it's 4000 seconds after boot
15:11karolherbst: but yeah
15:11tagr: I vaguely remember seeing some weird WARN about something PTC or so not being empty when I was testing module unload
15:11tagr: and this was perhaps 100 seconds after boot
15:12tagr: I think there was also an accompanying nvmm warning that it wasn't empty when it was getting destroyed
15:13tagr: or was it drm_mm?
15:13karolherbst: I wouldn't be surprised if we have VRAM leaks
15:14tagr: could've been something Tegra specific, to be fair
15:15tagr: could've also been just one tiny buffer that wasn't getting freed, in which case it probably wouldn't have been related to this
15:16tagr: while 64/128 MiB is pretty constrained, it should be enough to run glxgears
15:17tagr: on the other hand, depending on where you leak you may not be able to allocate a big enough contiguous chunk in VRAM
15:17tagr: VRAM is always allocated in contiguous chunks, isn't it?
15:18karolherbst: I think so
15:18tagr: actually I'm not even sure what the MMU support is like on nv30, perhaps I should just shut up /o\
15:19karolherbst: has no idea either
15:20karolherbst: tagr: OOM: 00001000 00001000 means size: 0x1000 and align is 0x1000
15:20karolherbst: which would be weird if we fail to allocate that much
15:20karolherbst: due to not having enough contiguous VRAM
15:21tagr: heh, yeah
15:21karolherbst: OOM is a disaster with nouveau right now anyway
15:21karolherbst: we don't even know how much VRAM we have left
15:21karolherbst: or not really
15:22karolherbst: there are some bits inside libdrm but I have no idea how reliable that is
15:22tagr: actually, it's weird that you fail to allocate 4 KiB in any case
15:23tagr: isn't there something in TTM's debugfs for that?
15:23karolherbst: the thing is, what does actually "used" VRAM mean
15:23karolherbst: or when are you actually out of VRAM
15:24karolherbst: it's a fuzzy concept to begin with and you can really only answer it based on whatever shaders/operations you are executing, etc... sometimes having 12GB mapped on a 4GB VRAM GPU can be totally valid
15:26tagr: nv30 would be using nv04_instobj_new(), right? and if this was a kzalloc() failure there should've been more splat in dmesg, so this must've been nvkm_mm_head() failing
15:30karolherbst: probably yes
15:30karolherbst: I trust you in reading kernel code :p
15:31tagr: for align 0x1000 it seems like nvkm_mm_head() will only fail with -ENOSPC
15:31tagr: both -ENOMEM cases would be slab allocation failures, so should produce a splat in dmesg
15:33karolherbst: ufff.. I hate the kernel for not erroring when you pass in an enum as a bool :/
15:33karolherbst: but I only do so, because I convert a bool -> enum
15:34tagr: heh... actually the OOM message says that it's -ENOSPC (-28)...
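The reading of the OOM line worked out above (size, alignment, then a negative errno) can be reproduced with a short Python sketch. The field layout is an assumption taken from this conversation, and `decode_oom` is a hypothetical helper.

```python
import errno
import re

# Assumption: the message layout is "OOM: <size hex> <align hex> <errno>",
# as interpreted in the discussion. decode_oom is a hypothetical helper.
OOM_LINE = re.compile(r"OOM: ([0-9a-fA-F]+) ([0-9a-fA-F]+) (-?\d+)")

def decode_oom(line):
    """Split an OOM message into size, alignment and symbolic errno name."""
    match = OOM_LINE.search(line)
    if match is None:
        return None
    code = -int(match.group(3))  # the kernel prints the errno negated
    return {
        "size": int(match.group(1), 16),
        "align": int(match.group(2), 16),
        "errno": errno.errorcode.get(code, "unknown"),
    }

print(decode_oom("imem: OOM: 00001000 00001000 -28"))
```

On Linux this decodes to a 4096-byte request with 4096-byte alignment that failed with ENOSPC, matching what tagr reads from the message.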
15:42tagr: karolherbst: wait a minute, this only runs out of instance memory, of which there's only 512 KiB to begin with
15:43karolherbst: tagr: do you know how instance memory gets allocated in general?
15:44tagr: karolherbst: I think it's always used for things like GPU objects
15:44tagr: so things like channels each get a chunk of instance memory
15:45tagr: page tables also go into instance memory I think (not sure if that's even relevant for nv30, though)
15:48tagr: not the whole page tables, though, I think it's only the descriptor of the page table
15:48karolherbst: so if my theory is right, and lxqt is using qt5 and qt5 uses OpenGL for every single application to do acceleration, we have a _huge_ problem on such GPUs
15:48tagr: but I get these things confused sometimes, so I'm definitely not an authority on this topic
15:49tagr: yeah, that sounds like it could potentially exhaust those 512 KiB pretty quickly
15:49karolherbst: I have not the faintest idea what qt5 does if they see a 1.4 GL context or so
15:49karolherbst: but still
15:49tagr: of those 512 KiB some 128 KiB seem to be reserved to begin with
15:52tagr: hm... some of this seems to be done differently pre-nv50, which is not totally unexpected
15:56karolherbst: yeah.. if there would be a way to just force disable opengl in qt5 on linux... :/
16:02tagr: karolherbst: if I read things correctly, on nv30 each gr channel is about 24 KiB, which I guess isn't catastrophic depending on how many OpenGL contexts there are (assuming that there's really only one gr channel per context)
16:03tagr: but then every gr object is bound to instance memory using 16 bytes
16:03karolherbst: well... if there are just 15 qt5 applications active doing OpenGL, you are already in big trouble I guess
16:04tagr: yeah, I suppose so
16:05karolherbst: just wondering if this is indeed the case
16:11tagr: looking at xf86-video-nouveau, there's plenty of nouveau_object_new() calls as well, each of those will be one GPU object
16:11tagr: I guess it's slightly less bad for those because they should only exist once per X session
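As a rough back-of-the-envelope check of the numbers mentioned in this discussion, the instance memory budget can be worked out in Python. All figures are the participants' approximate readings of the kernel code, not verified values, and `max_gr_channels` and the 64-objects-per-channel figure are made up for illustration.

```python
# Figures quoted in the discussion; approximate and unverified.
KIB = 1024
INSTMEM_TOTAL = 512 * KIB      # total instance memory
INSTMEM_RESERVED = 128 * KIB   # reserved to begin with
GR_CHANNEL_SIZE = 24 * KIB     # per gr channel on nv30
GR_OBJECT_SIZE = 16            # bytes of instance memory per bound gr object

def max_gr_channels(objects_per_channel=0):
    """Upper bound on gr channels, ignoring all other instance memory users."""
    available = INSTMEM_TOTAL - INSTMEM_RESERVED
    per_channel = GR_CHANNEL_SIZE + objects_per_channel * GR_OBJECT_SIZE
    return available // per_channel

print(max_gr_channels())     # 16
print(max_gr_channels(64))   # 15 (64 objects per channel is a made-up figure)
```

Under these assumptions the ceiling sits at roughly 15 to 16 channels before anything else is counted, which is in line with the estimate that about 15 OpenGL applications would already cause trouble.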