01:33 tertl3: evenin
01:40 tertl3: ive yet to contribute to the power/fan or video decoding on nv136 although the hope is still alive
01:41 imirkin: for which one?
13:58 TheLemonMan: is there a way to check what's keeping the discrete GPU always powered on?
13:59 TheLemonMan: lsof of /dev/dri/card* nodes maybe?
14:12 RSpliet: TheLemonMan: what is /sys/kernel/debug/vgaswitcheroo/switch saying?
14:23 TheLemonMan: RSpliet, http://ix.io/2VBN looks good to me
14:24 TheLemonMan: but the card runtime_status is always active and runtime_suspended_time is always zero
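The two sysfs attributes mentioned above can be read programmatically. What follows is a minimal sketch, assuming the discrete GPU sits at PCI address 0000:01:00.0 (a guess based on the audio function at 0000:01:00.1 quoted later in the log); the helper names are hypothetical.

```python
from pathlib import Path


def read_runtime_pm(device_dir):
    """Return (runtime_status, runtime_suspended_time in ms) for a PCI device dir."""
    power = Path(device_dir) / "power"
    status = (power / "runtime_status").read_text().strip()
    suspended_ms = int((power / "runtime_suspended_time").read_text().strip())
    return status, suspended_ms


def looks_stuck_awake(status, suspended_ms):
    """True when the device is active and has never been runtime-suspended."""
    return status == "active" and suspended_ms == 0


if __name__ == "__main__":
    # Assumed PCI address; adjust it to the address lspci reports for the GPU.
    dev = Path("/sys/bus/pci/devices/0000:01:00.0")
    if dev.exists():
        status, suspended_ms = read_runtime_pm(dev)
        print(status, suspended_ms, looks_stuck_awake(status, suspended_ms))
```

A status of "active" combined with a suspended time of zero matches the symptom described here: the card never enters runtime suspend.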
14:24 RSpliet: Yeah, I would expect DIS to say "DynOff" rather than DynPwr.
14:24 RSpliet: Got a dmesg by any chance?
14:25 RSpliet: (just raw unfiltered/ungrepped please)
14:32 RSpliet: TheLemonMan: ^
14:34 RSpliet: Actually, bbl. But I'll read backlogs
15:05 TheLemonMan: RSpliet, sure thing, see PMs
15:15 RSpliet: TheLemonMan: interestingly, this looks to be an issue with snd_hda_intel
15:16 RSpliet: "snd_hda_intel 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)"
15:16 TheLemonMan: interesting, is that a known problem?
15:19 RSpliet: Well, not really. I've had trouble with snd_hda_intel in the past, not properly initialising because the GPU's HDA controller doesn't have a codec, but haven't seen this error message before
15:23 pmoreau: imirkin: Finally looking through your MR again, sorry for taking so long. Currently going through the fixes to half-registers; what’s the situation again on Tesla, does it only have half-regs, or 16-bit or 32-bit registers depending on the version?
15:24 RSpliet: TheLemonMan: Better get in touch with the alsa people, explain that this error stops snd_hda_intel from registering DIS-Audio with vgaswitcheroo, in turn preventing the GPU from entering powersave.
15:24 RSpliet: And then fingers crossed this isn't a BIOS error
15:26 RSpliet: Or well, that's my first guess. I also don't know why nouveau is repeatedly trying to load nve7_fuc084
15:27 TheLemonMan: heh awesome news
15:28 TheLemonMan: I think your guess is correct, I've just found this https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907212
15:28 RSpliet: unbinding works 'eh?
15:30 TheLemonMan: not really, runtime_status is still 'active'
15:30 TheLemonMan: and vgaswitcheroo is still reporting DynPwr
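The DynPwr/DynOff state discussed here comes from /sys/kernel/debug/vgaswitcheroo/switch. The sketch below parses that file's text, assuming each line has the layout index:name:active:power:PCI-address. The sample text is invented for illustration and is not the content of the paste linked above.

```python
def parse_switch(text):
    """Parse vgaswitcheroo 'switch' text into a list of client dicts."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The PCI address itself contains colons, so split only the first four.
        idx, name, active, power, pci = line.split(":", 4)
        clients.append({
            "index": int(idx),
            "name": name,                       # e.g. IGD, DIS, DIS-Audio
            "active": active == "+",
            "dynamic": power.startswith("Dyn"),  # driver-controlled power
            "powered": power.endswith("Pwr"),
            "pci": pci.strip(),
        })
    return clients


# Invented sample that mirrors the reported symptom (DIS stuck at DynPwr).
SAMPLE = "0:IGD:+:Pwr:0000:00:02.0\n1:DIS: :DynPwr:0000:01:00.0\n"

if __name__ == "__main__":
    for client in parse_switch(SAMPLE):
        print(client)
```

With working runtime power management, the DIS entry would be expected to report DynOff while the discrete GPU is idle.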
15:33 RSpliet: Oh
15:33 RSpliet: hmm
15:34 RSpliet: imirking, karolherbst: this is a continuation of an LKML/ML discussion from back in January
15:34 RSpliet: https://lkml.org/lkml/2020/12/14/25
15:34 RSpliet: *imirkin
15:35 tertl3: https://www.reddit.com/r/linux/comments/mo0ay0/hacker_figures_how_to_unlock_vgpu_functionality/
15:54 RSpliet: TheLemonMan: just breathed some fresh life into the accompanying mailing list thread. Hopefully someone will pick it up soon
15:57 TheLemonMan: RSpliet, thank you, I'm always up for testing and debugging new patches
15:59 tertl3: i signed up for the mailing list but I never get mail?
16:01 karolherbst: RSpliet: oh, that might work
16:19 tertl3: ok I completed the mailing list confirmation I should be getting the mail from here on out
18:03 imirkin: pmoreau: it has both, always
18:03 imirkin: pmoreau: similar to x86 -- $r0 is like "ax" and $r0h is like "ah" and $r0l is like "al"
18:05 pmoreau: Okay; is there a mode for byte-size ones too?
18:05 imirkin: no
18:05 imirkin: just like x86 ;)
18:05 pmoreau: :-)
18:05 imirkin: but certain ops may produce u8/s8 values
18:05 imirkin: (esp cvt)
18:05 imirkin: but they'd be dumped into 16-bit regs
18:06 imirkin: generally there are very few ops which mix 16- and 32-bit regs
18:06 imirkin: cvt (and mad) are the only ones i can think of
18:06 imirkin: (and yeah, technically $a is 16-bit, but i'm not counting that)
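The x86 analogy above can be illustrated with plain bit arithmetic. This is only a sketch of the aliasing idea, assuming $r0l and $r0h name the low and high 16 bits of the 32-bit $r0; the function names are hypothetical and nothing here models the actual hardware register file.

```python
def split_halves(r32):
    """Return (low, high) 16-bit halves of a 32-bit register value."""
    r32 &= 0xFFFFFFFF
    return r32 & 0xFFFF, (r32 >> 16) & 0xFFFF


def write_low(r32, v16):
    """Write the low half; the high half of the full register is preserved."""
    return (r32 & 0xFFFF0000) | (v16 & 0xFFFF)


def write_high(r32, v16):
    """Write the high half; the low half of the full register is preserved."""
    return (r32 & 0x0000FFFF) | ((v16 & 0xFFFF) << 16)


if __name__ == "__main__":
    r0 = 0x12345678
    low, high = split_halves(r0)
    print(hex(low), hex(high))
    print(hex(write_low(r0, 0xABCD)), hex(write_high(r0, 0xABCD)))
```

As with ax/ah/al, a write to one half is visible through the full-width name, and there is no separate byte-sized view.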
18:07 pmoreau: I see
19:16 imirkin: pmoreau: btw, i discovered i have some CB flush issues, if i super-flush always, that fixes it, but obviously not desirable...
19:34 dob1: hi, I have problems with an old geforce4 ti 4200, the mouse pointer/cursor is not visible
19:35 imirkin: dob1: pastebin dmesg + xorg log
19:55 dob1: imirkin, https://paste.debian.net/hidden/a91e088e/ and https://paste.debian.net/hidden/1e576e49/
19:55 imirkin: thanks
19:56 imirkin: give me a min
19:56 imirkin: dob1: is this a laptop btw?
19:56 dob1: no wait a second
19:56 dob1: because I removed it and now I am not able to install as it was
19:56 imirkin: [ 6.570711] ata2.01: limited to UDMA/33 due to 40-wire cable
19:56 imirkin: lol
19:57 imirkin: i remember those days.
19:57 dob1: it says Loading /usr/lib/xorg/modules/drivers/nouveau_drv.so but I don't see it on lsmod
19:57 imirkin: dob1: those logs seem to indicate nouveau is used though
19:57 dob1: ah ok
19:57 imirkin: that's not going to show up in lsmod
19:57 imirkin: that's an X driver
19:57 imirkin: not a kernel driver
19:57 imirkin: lsmod should show 'nouveau' (unless you built it into the kernel)
19:57 dob1: I was sure it was on lsmod output
19:58 dob1: before I uninstalled it
19:58 imirkin: yeah, you should see `nouveau` in there
19:58 dob1: it's not here right now
19:58 dob1: anyway if it's loaded it's ok
19:58 dob1: ah no, wrong pc :)
19:58 dob1: sorry !|
19:58 dob1: it is on lsmod!
19:58 imirkin: lol
19:59 dob1: ok so the problem: the mouse works, I can press buttons etc etc
19:59 dob1: but no cursor
19:59 imirkin: what DE are you using?
19:59 imirkin: (if any)
19:59 imirkin: also did this ever work?
19:59 dob1: well I tried first xfce4 (changing the mouse pointer cursor to try but nothing), then openbox nothing, now I am using i3 because I don't need a mouse with it
20:00 imirkin: hehe
20:00 imirkin: ok, just making sure it wasn't some sort of fanciness with compositors/etc
20:00 imirkin: i'm at a bit of a loss
20:00 dob1: if I remove the xorg driver and the kernel driver I think it will go with vesa and the cursor is there
20:00 imirkin: wait, maybe this does sound familiar ... let me do some searching
20:02 imirkin: dob1: https://bugs.freedesktop.org/show_bug.cgi?id=54700#c69
20:03 imirkin: dob1: and actually perhaps even more interestingly
20:04 imirkin: dob1: https://bugs.freedesktop.org/show_bug.cgi?id=54700#c61
20:04 imirkin: can you get a copy of envytools, and do a "nvapeek 100080" and let me know what it returns?
20:04 dob1: if it's on debian
20:05 imirkin: dunno
20:05 dob1: it's not in its repository
20:08 imirkin: ok
20:08 imirkin: well, you should get it then
20:08 imirkin: or try booting with nouveau.config=NvAGP=0
20:08 imirkin: which will disable agp
20:09 dob1: I try
20:14 dob1: I am not sure where to use this options
20:14 imirkin: you need to pass this on the kernel cmdline
20:14 imirkin: i don't know how to do that in your setup, perhaps you can search around online or ask in a distro support channel
20:15 dob1: at grub screen I edited the boot options
20:15 imirkin: sure, that works
20:15 dob1: there were some of them and the path to the kernel
20:15 dob1: I added a line with this option
20:15 imirkin: just stick that at the end
20:15 dob1: but nothing changed
20:15 dob1: ah at the end
20:15 imirkin: of the kernel cmdline
20:15 imirkin: (or beginning, or middle, doesn't matter)
20:15 imirkin: iirc in grub the kernel cmdline is the first line
20:16 imirkin: and the other lines are like extra options?
20:16 imirkin: i forget
20:16 imirkin: (like initrd, etc)
20:17 dob1: I have like a lot of options / lines the last one is initrd /boot/initrd.img.xxxxxxx
20:17 dob1: I write this options after this line?
20:18 imirkin: no
20:18 imirkin: on the first line.
20:18 imirkin: at the end of the first line.
20:18 imirkin: all lines after the first line are various grub options
20:18 imirkin: not the kernel cmdline
20:20 ccr: it's the line that starts with "linux" .. followed by something like /boot/vmlinuz... etc
20:20 dob1: it's there!
20:20 dob1: I got the mouse
20:21 dob1: BUT
20:21 dob1: how do I make this option permanent ?
20:21 imirkin: you'll want to investigate this with your distro
20:22 imirkin: we don't do distro support here
20:22 dob1: I will do, you just helped me a lot :)
20:22 dob1: thanks
20:22 imirkin: (too many distros, insufficiently interested in keeping track of them all)
20:22 imirkin: dob1: you should be able to keep agp around though, if you feel like things are slowing down
20:22 imirkin: someone else tracked down the vbios init doing something dodgy
20:22 imirkin: which nouveau should potentially undo
20:23 imirkin: but i'd want to see if it works for you first
20:23 ccr: in Debian edit /etc/default/grub and edit/add line like GRUB_CMDLINE_LINUX="nouveau.config=NvAGP=0" and run update-grub
20:23 dob1: imirkin, it's an old pc I am going to dismiss it, to be honest I was reporting it more than resolving it, but with your help I resolved it too
20:24 imirkin: dob1: ok cool
20:29 dob1: ccr, I did it and it's ok, thanks
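After a reboot, one way to confirm the option really reached the kernel is to look at /proc/cmdline. A minimal sketch follows; the helper name is hypothetical and the sample command line in the comment is invented.

```python
def has_kernel_param(cmdline, param):
    """True if param appears as a whole whitespace-separated token."""
    return param in cmdline.split()


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with,
    # e.g. "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 ro nouveau.config=NvAGP=0"
    # (invented sample).
    try:
        with open("/proc/cmdline") as f:
            print(has_kernel_param(f.read(), "nouveau.config=NvAGP=0"))
    except OSError:
        print("no /proc/cmdline on this system")
```

Matching whole tokens avoids a false positive from a longer option that merely contains the same text.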
23:17 Celmor[m]: I've switched from nvidia to nouveau. there are no nvidia related modules loaded but I get a bunch of nvidia relating errors in dmesg: https://termbin.com/5f1z
23:18 imirkin: you're using some bit of userspace which requests that module to be loaded
23:18 Celmor[m]: how can I find out what bit of userspace requests it?
23:18 imirkin: dunno
23:18 imirkin: you can just uninstall nvidia drivers
23:19 imirkin: could be some opencl thing? dunno
23:19 Celmor[m]: I'm still checking if nouveau is a usable alternative
23:19 karolherbst: imirkin: do you think you'll have some time to look over the general idea of making the fence list race free in nouveau? I am missing a few places, but I'd like to add helper functions for everything and assert on the lock being taken before accessing fence data: https://github.com/karolherbst/mesa/commit/02fe1404611357eb445246c61773c5c778fef88c
23:19 Celmor[m]: according to this everything is "TODO" for my card (1050ti) but it seems that matrix hasn't been updated in a while https://nouveau.freedesktop.org/VideoAcceleration.html
23:20 imirkin: Celmor[m]: that's right
23:20 Celmor[m]: which part?
23:20 imirkin: matrix hasn't been updated
23:20 Celmor[m]: so it might already be "DONE"?
23:20 karolherbst: send patches :p
23:20 imirkin: i barely even know what the rows mean tbh
23:21 karolherbst: ohhh
23:21 karolherbst: that's the video accel stuff
23:21 karolherbst: imirkin: which row?
23:21 imirkin: karolherbst: like any of them
23:21 karolherbst: the video engine one?
23:21 karolherbst: I think that's pretty obvious, no?
23:22 karolherbst: Celmor[m]: there is like no progress on video acceleration, as... it's just less important than GL and we require firmware which is a pita to get to users systems anyway
23:23 imirkin: karolherbst: oh, he was pointed at the video accel matrix. my bad
23:23 karolherbst: yes
23:23 imirkin: i certainly know what that one means ;)
23:23 karolherbst: the other one should be fairly updated :D
23:23 imirkin: Celmor[m]: no video decoding accel for maxwell+
23:23 imirkin: (including your gpu)
23:23 imirkin: not much point in updating it ... should be pretty obvious it's all TODO
23:24 imirkin: (and no one's interested in doing)
23:24 karolherbst: well I would be, I just prefer to work on more important bits :p
23:24 karolherbst: imirkin: btw.. my threading branch has 0 regressions in all deqp :3
23:25 imirkin: karolherbst: cool
23:25 Celmor[m]: imirkin: opengl is now imported anyway
23:25 imirkin: Celmor[m]: that page is not about opengl
23:25 karolherbst: there is just one problem left to fix.. and I am still thinking about how to fix it
23:25 karolherbst: _but_
23:25 imirkin: it's about video accel
23:25 karolherbst: it's good enough to run stuff like the android emulator :D
23:25 karolherbst: so thinking about pushing smaller self contained patches up so we get rid of the races one by one
23:25 Celmor[m]: Guess nouveau still isn't usable enough, at least for my card
23:25 imirkin: karolherbst: so what i'm most concerned by are deadlocks
23:26 karolherbst: imirkin: yeah I know
23:26 imirkin: Celmor[m]: ah ok, enjoy the nvidia experience!
23:26 karolherbst: but at least what I have works out just fine
23:26 Celmor[m]: imirkin: would a 980 ti have support?
23:26 karolherbst: the fence patch has a very ugly hack to workaround one terribly annoying deadlock
23:26 imirkin: Celmor[m]: for what?
23:26 karolherbst: imirkin: https://github.com/karolherbst/mesa/commit/02fe1404611357eb445246c61773c5c778fef88c#diff-3d285ef008ea8a4a002c5919503236e8d779734e9b7c33b94f3ff38a04959870R201
23:27 Celmor[m]: meh, xorg keeps freezing
23:27 karolherbst: but that is fine, just ugly
23:27 imirkin: Celmor[m]: are you using xf86-video-nouveau
23:27 imirkin: or modesetting?
23:27 karolherbst: didn't think about a less ugly solution
23:27 imirkin: if the latter, switch to -nouveau
23:27 Celmor[m]: i suppose the former
23:27 karolherbst: for the pushbuffers I still have to think about how I want it to be in the end
23:27 imirkin: Celmor[m]: confirm that by looking at your xorg log
23:28 karolherbst: but for the nouveau_mm code _and_ the nouveau_fence code I have a good enough understanding on what works best
23:28 imirkin: some distros think it's funny to prefer -modesetting
23:28 imirkin: and leaving me with the support headache
23:28 imirkin: sigh
23:28 Celmor[m]: if I could, xorg keeps freezing
23:28 karolherbst: imirkin: you can ignore those until the fixes all land :D
23:28 imirkin: karolherbst: doing unlock / lock sequences basically admits that your locking logic is all bunk
23:29 karolherbst: imirkin: it's the only place and it's because the push logic touches fence state again
23:29 imirkin: Celmor[m]: don't need X to look at X log...
23:29 Celmor[m]: i can't switch tty
23:29 karolherbst: the callback doing fence shit :/
23:29 karolherbst: it's super annoying
23:29 imirkin: karolherbst: so this spot is what caused me tons of heartache in my attempt as well
23:29 karolherbst: yep
23:30 karolherbst: and I think this is the best way out without rewriting like everything
23:30 imirkin: Celmor[m]: sounds like you should just switch back to nvidia
23:30 karolherbst: I mean.. we could also call *_unlocked variants in the callback... I guess
23:30 imirkin: 980ti is unlikely to provide you with any better experience
23:30 Celmor[m]: I've followed this guide for setting up nouveau https://wiki.archlinux.org/index.php/Nouveau
23:31 imirkin: i have no idea how to help you if you can't look at logs
23:31 imirkin: so all i can say is "good luck"
23:31 karolherbst: imirkin: but the painful part of requiring the kick_notify handler to require the locked fence state is, that we will have to lock the list whenever we do literally anything
23:32 imirkin: karolherbst: yeah, sounds like too fine-grained of a locking thing?
23:32 karolherbst: yep
23:32 imirkin: i had much wider locks
23:32 imirkin: because of things like that
23:32 karolherbst: releasing the lock in the fence code in a controlled way allows us to lock the locks way less often
23:32 imirkin: i.e. it didn't make sense to make it super-granular
23:32 karolherbst: sure
23:32 imirkin: but perhaps you can reorganize the code so that this wouldn't matter as much
23:33 imirkin: e.g. instead of emitting stuff to pushbuf directly
23:33 karolherbst: but you don't have to lock the fence stuff all that often
23:33 imirkin: you could stage it into a "buffer"
23:33 imirkin: and then emit all in one go
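The staging idea can be sketched generically: commands are collected under a lock, and the actual emit happens in one go after the lock is released, so an emit callback that touches the same state again does not deadlock. This is an illustration under assumptions, not Mesa code; StagedPush and its emit callback are hypothetical names.

```python
import threading


class StagedPush:
    """Collect commands under a lock, then emit them as one batch."""

    def __init__(self, emit):
        self._lock = threading.Lock()   # non-recursive, like a simple mutex
        self._staged = []
        self._emit = emit               # hypothetical callback taking a list

    def stage(self, cmd):
        with self._lock:
            self._staged.append(cmd)

    def flush(self):
        # Swap the staged list out while holding the lock...
        with self._lock:
            batch, self._staged = self._staged, []
        # ...and emit without the lock held, so the callback may call stage().
        if batch:
            self._emit(batch)


if __name__ == "__main__":
    push = StagedPush(lambda batch: print("emitting", batch))
    push.stage("cmd0")
    push.stage("cmd1")
    push.flush()
```

The trade-off is that submission order now depends on when flush() runs, which is the same ordering concern raised for multiple pushbuf objects.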
23:33 karolherbst: imirkin: yeah, I plan this to fix the pushbuf stuff for real
23:33 karolherbst: still we touch fence code
23:33 imirkin: of course the libdrm_nouveau logic has stuff to keep track of memory used
23:33 karolherbst: when kicking
23:33 imirkin: and emit when it reaches a watermark
23:33 karolherbst: imirkin: we can actually just use multiple nouveau_pushbuf objects
23:34 karolherbst: and memory is tracked per push buffer
23:34 imirkin: sure, but that doesn't solve the issue
23:34 Celmor[m]: imirkin: there's no "modesetting" in my Xorgs log
23:34 karolherbst: we just have to be careful about submission order
23:34 imirkin: Celmor[m]: what about modeset(0)
23:34 karolherbst: imirkin: not in regards to fences, correct
23:34 imirkin: Celmor[m]: or do you see NOUVEAU(0)
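The check being asked for here amounts to searching the Xorg log for the driver tag in front of each message. A minimal sketch follows; the NOUVEAU(0) and modeset(0) markers come from the conversation, while the NVIDIA(0) marker and the function name are assumptions added for illustration.

```python
# Marker text in the Xorg log mapped to the X driver that prints it.
DDX_MARKERS = {
    "NOUVEAU(0)": "xf86-video-nouveau",
    "modeset(0)": "modesetting",
    "NVIDIA(0)": "nvidia (proprietary)",   # assumed marker
}


def detect_ddx(log_text):
    """Return the sorted names of all X drivers whose marker appears in the log."""
    return sorted(name for marker, name in DDX_MARKERS.items()
                  if marker in log_text)


if __name__ == "__main__":
    line = ("[  5149.966] (EE) NOUVEAU(0): failed to set gamma "
            "with 256 entries: Permission denied")
    print(detect_ddx(line))
```

An empty result would suggest the wrong log file, or a session in which X never got as far as initialising a driver.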
23:34 Celmor[m]: nope
23:35 imirkin: karolherbst: this somewhat improved by adding an ioctl which allows the kernel to emit the fence
23:35 imirkin: Celmor[m]: pastebin xorg log
23:35 Celmor[m]: "initializing extension XFree86-VidModeExtension"
23:35 karolherbst: imirkin: yeah well sure.. but we still have this fence struct :/
23:36 karolherbst: the really painful part is just one of those fence functions to kick the pushbuf...
23:36 karolherbst: _maybe_ we should just rework the code so that doesn't happen
23:37 karolherbst: and then the problem just goes away
23:37 imirkin: just remember that something like nouveau_bo_wait() or nouveau_bo_map() can cause a push kick
23:37 Celmor[m]: termbin.com/cwb3
23:37 karolherbst: sure
23:37 karolherbst: but that's irrelevant for the fences for now
23:37 imirkin: Celmor[m]: you're using nvidia driver in that log
23:37 karolherbst: soo.. the issue is when you call nouveau_fence_kick
23:37 karolherbst: and this adds stuff to the pushbuf
23:37 karolherbst: which can lead to a kick
23:38 karolherbst: either by PUSH_SPACE or nouveau_pushbuf_kick
23:38 imirkin: so *originally*
23:38 karolherbst: which calls nvc0_default_kick_notify later on, which touches fence state
23:38 imirkin: i didn't have that PUSH_SPACE there
23:38 imirkin: HOWEVER
23:38 imirkin: you can end up in situations where we emit a bunch of fences in sequence
23:39 Celmor[m]: then it has to be the wrong log, sry
23:39 imirkin: and so the extra 64 or whatever bytes we reserve in PUSH_SPACE end up getting used up and we run out of space
23:39 imirkin: i don't remember precisely the situation, but it's something like that
23:39 karolherbst: imirkin: what if we make the caller of nouveau_fence_kick responsible for reserving space and assert on that?
23:39 imirkin: you can look at the commit log when i added that PUSH_SPACE
23:39 Celmor[m]: oh wait, i did try to load Xorg a few times and only fixed xorg config later
23:40 Celmor[m]: so the start of the log isn't related to the current Xorg session
23:40 imirkin: Celmor[m]: there's one log file per session
23:40 karolherbst: imirkin: the space isn't all that relevant anyway
23:40 karolherbst: we still have that nouveau_pushbuf_kick call
23:42 karolherbst: so in the end nouveau_fence_wait and nouveau_fence_work can lead to a kicked push buffer
23:42 karolherbst: and I am wondering if we either leave it like that and do this crappy unlock/lock cycle
23:42 karolherbst: or
23:42 imirkin: right
23:42 karolherbst: we don't push the buffer
23:42 karolherbst: and make the callees do it
23:42 imirkin: you have to push
23:42 imirkin: o
23:43 imirkin: how would that help
23:43 karolherbst: we just have to move the push outside the locked areas
23:43 karolherbst: essentially
23:43 karolherbst: so, the idea was, if you touch fence state, you have to take a lock
23:43 karolherbst: because.. it gets read and written quite a lot
23:43 Celmor[m]: imirkin: this should be the correct one, sry for the confusion termbin.com/0g7s
23:43 karolherbst: and locking internally is super painful
23:44 karolherbst: because.. recursive locking and weird call trees and what not
23:44 karolherbst: it was easier to throw in a bunch of asserts and make callees to take the lock
23:44 imirkin: Celmor[m]: looks good
23:44 karolherbst: ehhh
23:44 karolherbst: s/callee/caller/
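The "callers take the lock, helpers assert on it" pattern can be sketched as follows. This is a generic illustration, not the code from the linked commit; FenceList and its methods are hypothetical names.

```python
import threading


class FenceList:
    """Fence state whose helpers never lock; every caller must hold .lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self._fences = []

    def _assert_locked(self):
        # Lock.locked() only reports that *some* thread holds the lock, not
        # that the current thread does; it still catches forgotten locking.
        assert self.lock.locked(), "caller must hold FenceList.lock"

    def add(self, fence):
        self._assert_locked()
        self._fences.append(fence)

    def pending(self):
        self._assert_locked()
        return list(self._fences)


if __name__ == "__main__":
    fences = FenceList()
    with fences.lock:           # the caller, not the helper, takes the lock
        fences.add("fence-1")
        print(fences.pending())
```

Because the helpers never take the lock themselves, there is no recursive locking to reason about, and a missed lock shows up as an assertion failure rather than a silent race.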
23:45 imirkin: [ 5149.966] (EE) NOUVEAU(0): failed to set gamma with 256 entries: Permission denied
23:45 imirkin: Celmor[m]: that implies that you should have some errors in dmesg
23:45 imirkin: i'd like to see them
23:45 imirkin: (sounds like a display hang)
23:45 karolherbst: the problem is just, once we get to kick_notify, we have to take the fence lock, because.. we touch fence state there again
23:46 karolherbst: my solution is crappy but works, I am wondering if there is a nice solution without rewriting the world
23:46 imirkin: karolherbst: you could flip that
23:46 Celmor[m]: imirkin: https://termbin.com/kzjl4
23:47 karolherbst: imirkin: in what sense?
23:47 imirkin: you could require anything which MIGHT kick to take the fence lock
23:47 karolherbst: well...
23:47 karolherbst: that's like everything
23:47 imirkin: yea
23:47 imirkin: and then you don't have fine-grained locks :)
23:47 karolherbst: I wanted to not do that :D
23:47 imirkin: Celmor[m]: thanks, gimme a min
23:47 karolherbst: I had some test apps which really got slowed down by locking
23:48 karolherbst: so it is noticeable
23:48 karolherbst: and it's really not a problem of doing it fine grained
23:48 karolherbst: because if you mess it up while working on the code, you get asserts
23:49 karolherbst: imirkin: also.. simple_mtx_t is _cheap_
23:49 karolherbst: like really cheap
23:50 karolherbst: maybe I play around with the idea of moving the push_kick stuff out.. but that could be painful as well :/
23:50 imirkin: Celmor[m]: that log ends on Apr 11 00:38, but the errors in Xorg log are like an hour after that (unless i'm misreading)
23:50 Celmor[m]: must've gotten truncated then, I've read some errors from journalctl about that
23:51 imirkin: can you run "dmesg" and pastebin that?
23:51 Celmor[m]: not sure how else to get you the whole log
23:51 Celmor[m]: I've already rebooted hence why I did "journalctl -xb -1"
23:51 imirkin: oh
23:51 imirkin: well, anyways, my theory is that we did something to upset the display controller
23:51 karolherbst: ohhh, I have an idea :O
23:51 imirkin: which in turn led to things going downhill quickly
23:51 karolherbst: will.... try stuff out for a few days :D
23:51 Celmor[m]: still trying to get you the full log
23:54 Celmor[m]: it appears the paste was simply too large. it's 3.6M
23:54 imirkin: pmoreau: btw, thanks for taking a look
23:55 imirkin: pmoreau: i don't think i'll be able to figure out the CB issues tonight, but i'm probably going to push those prep patches anyways
23:55 imirkin: since it's a lot closer to working now than before :)
23:55 imirkin: Celmor[m]: ok, i'm more interested in the last bits than the first bits
23:56 Celmor[m]: imirkin: alright, reversed the list: https://termbin.com/rivb
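When a log is too large to paste and only the end matters, an alternative to reversing it is to keep just the last N lines. A minimal sketch; the function name and the file name in the demo are hypothetical.

```python
from collections import deque


def tail_lines(lines, n):
    """Return the last n items of any iterable of lines, in original order."""
    return list(deque(lines, maxlen=n))


if __name__ == "__main__":
    # "journal.txt" is a hypothetical dump, e.g. saved journalctl output.
    try:
        with open("journal.txt", errors="replace") as f:
            print("".join(tail_lines(f, 500)), end="")
    except FileNotFoundError:
        print("journal.txt not found")
```

The bounded deque holds at most n lines at a time, so even a multi-megabyte log is processed without loading it all into memory.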
23:59 imirkin: Celmor[m]: looks like you have an electron-based app which triggered a hang
23:59 Celmor[m]: yeah, I've noticed the hang when I wanted to copy a message. using element.io/matrix to chat here
23:59 Celmor[m]: element.io aka matrix*