01:33tertl3: evenin
01:40tertl3: ive yet to contribute to the power/fan or video decoding on nv136 although the hope is still alive
01:41imirkin: for which one?
13:58TheLemonMan: is there a way to check what's keeping the discrete GPU always powered on?
13:59TheLemonMan: lsof of /dev/dri/card* nodes maybe?
14:12RSpliet: TheLemonMan: what is /sys/kernel/debug/vgaswitcheroo/switch saying?
14:23TheLemonMan: RSpliet, http://ix.io/2VBN looks good to me
14:24TheLemonMan: but the card runtime_status is always active and runtime_suspended_time is always zero
14:24RSpliet: Yeah, I would expect DIS to say "DynOff" rather than DynPwr.
14:24RSpliet: Got a dmesg by any chance?
14:25RSpliet: (just raw unfiltered/ungrepped please)
14:32RSpliet: TheLemonMan: ^
14:34RSpliet: Actually, bbl. But I'll read backlogs
15:05TheLemonMan: RSpliet, sure thing, see PMs
15:15RSpliet: TheLemonMan: interestingly, this looks to be an issue with snd_hda_intel
15:16RSpliet: "snd_hda_intel 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)"
15:16TheLemonMan: interesting, is that a known problem?
15:19RSpliet: Well, not really. I've had trouble with snd_hda_intel in the past, not properly initialising because the GPU's HDA controller doesn't have a codec, but haven't seen this error message before
15:23pmoreau: imirkin: Finally looking through your MR again, sorry for taking so long. Currently going through the fixes to half-registers; what’s the situation again on Tesla, does it only have half-regs, or 16-bit or 32-bit registers depending on the version?
15:24RSpliet: TheLemonMan: Better get in touch with the alsa people, explain that this error stops snd_hda_intel from registering DIS-Audio with vgaswitcheroo, in turn preventing the GPU from entering powersave.
15:24RSpliet: And then fingers crossed this isn't a BIOS error
15:26RSpliet: Or well, that's my first guess. I also don't know why nouveau is repeatedly trying to load nve7_fuc084
15:27TheLemonMan: heh awesome news
15:28TheLemonMan: I think your guess is correct, I've just found this https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907212
15:28RSpliet: unbinding works 'eh?
15:30TheLemonMan: not really, runtime_status is still 'active'
15:30TheLemonMan: and vgaswitcheroo is still reporting DynPwr
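The runtime PM check described above (runtime_status stuck at 'active', runtime_suspended_time stuck at zero) can be sketched as a small script. This is only an illustration: the helper names are made up, and the device directory you pass in (for example /sys/bus/pci/devices/0000:01:00.0 for the GPU function) is an assumption, since the log only names 0000:01:00.1 for the audio function.

```python
from pathlib import Path


def read_runtime_pm(device_dir):
    """Return (runtime_status, runtime_suspended_time) for a sysfs device dir.

    device_dir is expected to contain a power/ subdirectory, as PCI devices
    under /sys/bus/pci/devices/ do.
    """
    power = Path(device_dir) / "power"
    status = (power / "runtime_status").read_text().strip()
    suspended = int((power / "runtime_suspended_time").read_text().strip())
    return status, suspended


def looks_stuck_awake(device_dir):
    """True if the device reports 'active' and has never been suspended."""
    status, suspended = read_runtime_pm(device_dir)
    return status == "active" and suspended == 0
```

Calling looks_stuck_awake() on the GPU's sysfs directory would return True for the symptom reported here; it says nothing about which driver is holding the device awake.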
15:33RSpliet: Oh
15:33RSpliet: hmm
15:34RSpliet: imirking, karolherbst: this is a continuation of an LKML/ML discussion from back in January
15:34RSpliet: https://lkml.org/lkml/2020/12/14/25
15:34RSpliet: *imirkin
15:35tertl3: https://www.reddit.com/r/linux/comments/mo0ay0/hacker_figures_how_to_unlock_vgpu_functionality/
15:54RSpliet: TheLemonMan: just breathed some fresh life into the accompanying mailing list thread. Hopefully someone will pick it up soon
15:57TheLemonMan: RSpliet, thank you, I'm always up for testing and debugging new patches
15:59tertl3: i signed up for the mailing list but I never get mail?
16:01karolherbst: RSpliet: oh, that might work
16:19tertl3: ok I completed the mailing list confirmation, I should be getting the mail from here on out
18:03imirkin: pmoreau: it has both, always
18:03imirkin: pmoreau: similar to x86 -- $r0 is like "ax" and $r0h is like "ah" and $r0l is like "al"
18:05pmoreau: Okay; is there a mode for byte-size ones too?
18:05imirkin: no
18:05imirkin: just like x86 ;)
18:05pmoreau: :-)
18:05imirkin: but certain ops may produce u8/s8 values
18:05imirkin: (esp cvt)
18:05imirkin: but they'd be dumped into 16-bit regs
18:06imirkin: generally there are very few ops which mix 16- and 32-bit regs
18:06imirkin: cvt (and mad) are the only ones i can think of
18:06imirkin: (and yeah, technically $a is 16-bit, but i'm not counting that)
18:07pmoreau: I see
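The register aliasing described above ($r0 as the full register, $r0h and $r0l as its halves) can be modelled roughly like this. It is a toy model, not the ISA encoding: the class name is hypothetical, and it assumes that $r0l is the low 16 bits and $r0h the high 16 bits of the 32-bit register, by analogy with the x86 comparison.

```python
class Reg32:
    """Toy model of a 32-bit register with two aliased 16-bit halves."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFF

    @property
    def lo(self):
        # Assumed to correspond to $r0l: the low 16 bits.
        return self.value & 0xFFFF

    @lo.setter
    def lo(self, half):
        self.value = (self.value & 0xFFFF0000) | (half & 0xFFFF)

    @property
    def hi(self):
        # Assumed to correspond to $r0h: the high 16 bits.
        return (self.value >> 16) & 0xFFFF

    @hi.setter
    def hi(self, half):
        self.value = (self.value & 0x0000FFFF) | ((half & 0xFFFF) << 16)
```

Writing to one half changes the full register and leaves the other half alone, which is the point of the "ax"/"ah"/"al" comparison.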
19:16imirkin: pmoreau: btw, i discovered i have some CB flush issues, if i super-flush always, that fixes it, but obviously not desirable...
19:34dob1: hi, I have problems with an old geforce4 ti 4200, the mouse pointer/cursor is not visible
19:35imirkin: dob1: pastebin dmesg + xorg log
19:55dob1: imirkin, https://paste.debian.net/hidden/a91e088e/ and https://paste.debian.net/hidden/1e576e49/
19:55imirkin: thanks
19:56imirkin: give me a min
19:56imirkin: dob1: is this a laptop btw?
19:56dob1: no wait a second
19:56dob1: because I removed it and now I am not able to install as it was
19:56imirkin: [ 6.570711] ata2.01: limited to UDMA/33 due to 40-wire cable
19:56imirkin: lol
19:57imirkin: i remember those days.
19:57dob1: it says Loading /usr/lib/xorg/modules/drivers/nouveau_drv.so but I don't see it on lsmod
19:57imirkin: dob1: those logs seem to indicate nouveau is used though
19:57dob1: ah ok
19:57imirkin: that's not going to show up in lsmod
19:57imirkin: that's an X driver
19:57imirkin: not a kernel driver
19:57imirkin: lsmod should show 'nouveau' (unless you built it into the kernel)
19:57dob1: I was sure it was on lsmod output
19:58dob1: before I uninstalled it
19:58imirkin: yeah, you should see `nouveau` in there
19:58dob1: it's not here right now
19:58dob1: anyway if it's loaded it's ok
19:58dob1: ah no, wrong pc :)
19:58dob1: sorry !|
19:58dob1: it is on lsmod!
19:58imirkin: lol
19:59dob1: ok so the problem: the mouse works, I can press buttons etc etc
19:59dob1: but no cursor
19:59imirkin: what DE are you using?
19:59imirkin: (if any)
19:59imirkin: also did this ever work?
19:59dob1: well I tried xfce4 first (changing the mouse pointer cursor as a test, but nothing), then openbox, nothing; now I am using i3 because I don't need a mouse with it
20:00imirkin: hehe
20:00imirkin: ok, just making sure it wasn't some sort of fanciness with compositors/etc
20:00imirkin: i'm at a bit of a loss
20:00dob1: if I remove the xorg driver and the kernel driver I think it will go with vesa and the cursor is there
20:00imirkin: wait, maybe this does sound familiar ... let me do some searching
20:02imirkin: dob1: https://bugs.freedesktop.org/show_bug.cgi?id=54700#c69
20:03imirkin: dob1: and actually perhaps even more interestingly
20:04imirkin: dob1: https://bugs.freedesktop.org/show_bug.cgi?id=54700#c61
20:04imirkin: can you get a copy of envytools, and do a "nvapeek 100080" and let me know what it returns?
20:04dob1: if it's on debian
20:05imirkin: dunno
20:05dob1: it's not in its repository
20:08imirkin: ok
20:08imirkin: well, you should get it then
20:08imirkin: or try booting with nouveau.config=NvAGP=0
20:08imirkin: which will disable agp
20:09dob1: I try
20:14dob1: I am not sure where to use this option
20:14imirkin: you need to pass this on the kernel cmdline
20:14imirkin: i don't know how to do that in your setup, perhaps you can search around online or ask in a distro support channel
20:15dob1: at grub screen I edited the boot options
20:15imirkin: sure, that works
20:15dob1: there were some of them and the path to the kernel
20:15dob1: I added a line with this option
20:15imirkin: just stick that at the end
20:15dob1: but nothing changed
20:15dob1: ah at the end
20:15imirkin: of the kernel cmdline
20:15imirkin: (or beginning, or middle, doesn't matter)
20:15imirkin: iirc in grub the kernel cmdline is the first line
20:16imirkin: and the other lines are like extra options?
20:16imirkin: i forget
20:16imirkin: (like initrd, etc)
20:17dob1: I have like a lot of options / lines the last one is initrd /boot/initrd.img.xxxxxxx
20:17dob1: do I write this option after this line?
20:18imirkin: no
20:18imirkin: on the first line.
20:18imirkin: at the end of the first line.
20:18imirkin: all lines after the first line are various grub options
20:18imirkin: not the kernel cmdline
20:20ccr: it's the line that starts with "linux" .. followed by something like /boot/vmlinuz... etc
20:20dob1: it's there!
20:20dob1: I got the mouse
20:21dob1: BUT
20:21dob1: how do I make this option permanent?
20:21imirkin: you'll want to investigate this with your distro
20:22imirkin: we don't do distro support here
20:22dob1: I will do, you just helped me a lot :)
20:22dob1: thanks
20:22imirkin: (too many distros, insufficiently interested in keeping track of them all)
20:22imirkin: dob1: you should be able to keep agp around though, if you feel like things are slowing down
20:22imirkin: someone else tracked down the vbios init doing something dodgy
20:22imirkin: which nouveau should potentially undo
20:23imirkin: but i'd want to see if it works for you first
20:23ccr: in Debian edit /etc/default/grub and edit/add line like GRUB_CMDLINE_LINUX="nouveau.config=NvAGP=0" and run update-grub
20:23dob1: imirkin, it's an old pc I am going to dismiss it, to be honest I was reporting it more than resolving it, but with your help I resolved it too
20:24imirkin: dob1: ok cool
20:29dob1: ccr, I did it and it's ok, thanks
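After editing /etc/default/grub and rebooting, one way to confirm the option actually reached the kernel is to look for it in /proc/cmdline. A minimal sketch, with a hypothetical helper name; it splits on whitespace only, so it would not cope with quoted values containing spaces.

```python
def has_kernel_option(cmdline, option):
    """Return True if `option` is one of the whitespace-separated tokens."""
    return option in cmdline.split()
```

For example, has_kernel_option(open("/proc/cmdline").read(), "nouveau.config=NvAGP=0") should be True on the machine above once the GRUB change has taken effect.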
23:17Celmor[m]: I've switched from nvidia to nouveau. there are no nvidia related modules loaded but I get a bunch of nvidia relating errors in dmesg: https://termbin.com/5f1z
23:18imirkin: you're using some bit of userspace which requests that module to be loaded
23:18Celmor[m]: how can I find out what bit of userspace requests it?
23:18imirkin: dunno
23:18imirkin: you can just uninstall nvidia drivers
23:19imirkin: could be some opencl thing? dunno
23:19Celmor[m]: I'm still checking if nouveau is a usable alternative
23:19karolherbst: imirkin: do you think you'll have some time to look over the general idea of making the fence list race free in nouveau? I am missing a few places, but I'd like to add helper functions for everything and assert on the lock being taken before accessing fence data: https://github.com/karolherbst/mesa/commit/02fe1404611357eb445246c61773c5c778fef88c
23:19Celmor[m]: according to this everything is "TODO" for my card (1050ti) but it seems that matrix hasn't been updated in a while https://nouveau.freedesktop.org/VideoAcceleration.html
23:20imirkin: Celmor[m]: that's right
23:20Celmor[m]: which part?
23:20imirkin: matrix hasn't been updated
23:20Celmor[m]: so it might already be "DONE"?
23:20karolherbst: send patches :p
23:20imirkin: i barely even know what the rows mean tbh
23:21karolherbst: ohhh
23:21karolherbst: that's the video accel stuff
23:21karolherbst: imirkin: which row?
23:21imirkin: karolherbst: like any of them
23:21karolherbst: the video engine one?
23:21karolherbst: I think that's pretty obvious, no?
23:22karolherbst: Celmor[m]: there is like no progress on video acceleration, as... it's just less important than GL and we require firmware which is a pita to get to users' systems anyway
23:23imirkin: karolherbst: oh, he was pointed at the video accel matrix. my bad
23:23karolherbst: yes
23:23imirkin: i certainly know what that one means ;)
23:23karolherbst: the other one should be fairly updated :D
23:23imirkin: Celmor[m]: no video decoding accel for maxwell+
23:23imirkin: (including your gpu)
23:23imirkin: not much point in updating it ... should be pretty obvious it's all TODO
23:24imirkin: (and no one's interested in doing)
23:24karolherbst: well I would be, I just prefer to work on more important bits :p
23:24karolherbst: imirkin: btw.. my threading branch has 0 regressions in all deqp :3
23:25imirkin: karolherbst: cool
23:25Celmor[m]: imirkin: opengl is now imported anyway
23:25imirkin: Celmor[m]: that page is not about opengl
23:25karolherbst: there is just one problem left to fix.. and I am still thinking about how to fix it
23:25karolherbst: _but_
23:25imirkin: it's about video accel
23:25karolherbst: it's good enough to run stuff like the android emulator :D
23:25karolherbst: so thinking about pushing smaller self contained patches up so we get rid of the races one by one
23:25Celmor[m]: Guess nouveau still isn't usable enough, at least for my card
23:25imirkin: karolherbst: so what i'm most concerned by are deadlocks
23:26karolherbst: imirkin: yeah I know
23:26imirkin: Celmor[m]: ah ok, enjoy the nvidia experience!
23:26karolherbst: but at least what I have works out just fine
23:26Celmor[m]: imirkin: would a 980 ti have support?
23:26karolherbst: the fence patch has a very ugly hack to workaround one terribly annoying deadlock
23:26imirkin: Celmor[m]: for what?
23:26karolherbst: imirkin: https://github.com/karolherbst/mesa/commit/02fe1404611357eb445246c61773c5c778fef88c#diff-3d285ef008ea8a4a002c5919503236e8d779734e9b7c33b94f3ff38a04959870R201
23:27Celmor[m]: meh, xorg keeps freezing
23:27karolherbst: but that is fine, just ugly
23:27imirkin: Celmor[m]: are you using xf86-video-nouveau
23:27imirkin: or modesetting?
23:27karolherbst: didn't think about a less ugly solution
23:27imirkin: if the latter, switch to -nouveau
23:27Celmor[m]: i suppose the former
23:27karolherbst: for the pushbuffers I still have to think about how I want it to be in the end
23:27imirkin: Celmor[m]: confirm that by looking at your xorg log
23:28karolherbst: but for the nouveau_mm code _and_ the nouveau_fence code I have a good enough understanding on what works best
23:28imirkin: some distros think it's funny to prefer -modesetting
23:28imirkin: and leaving me with the support headache
23:28imirkin: sigh
23:28Celmor[m]: if I could, xorg keeps freezing
23:28karolherbst: imirkin: you can ignore those until the fixes all land :D
23:28imirkin: karolherbst: doing unlock / lock sequences basically admits that your locking logic is all bunk
23:29karolherbst: imirkin: it's the only place and it's because the push logic touches fence state again
23:29imirkin: Celmor[m]: don't need X to look at X log...
23:29Celmor[m]: i can't switch tty
23:29karolherbst: the callback doing fence shit :/
23:29karolherbst: it's super annoying
23:29imirkin: karolherbst: so this spot is what caused me tons of heartache in my attempt as well
23:29karolherbst: yep
23:30karolherbst: and I think this is the best way out without rewriting like everything
23:30imirkin: Celmor[m]: sounds like you should just switch back to nvidia
23:30karolherbst: I mean.. we could also call *_unlocked variants in the callback... I guess
23:30imirkin: 980ti is unlikely to provide you with any better experience
23:30Celmor[m]: I've followed this guide for setting up nouveau https://wiki.archlinux.org/index.php/Nouveau
23:31imirkin: i have no idea how to help you if you can't look at logs
23:31imirkin: so all i can say is "good luck"
23:31karolherbst: imirkin: but the painful part of requiring the kick_notify handler to require the locked fence state is, that we will have to lock the list whenever we do literally anything
23:32imirkin: karolherbst: yeah, sounds like too fine-grained of a locking thing?
23:32karolherbst: yep
23:32imirkin: i had much wider locks
23:32imirkin: because of things like that
23:32karolherbst: releasing the lock in the fence code in a controlled way allows us to lock the locks way less often
23:32imirkin: i.e. it didn't make sense to make it super-granular
23:32karolherbst: sure
23:32imirkin: but perhaps you can reorganize the code so that this wouldn't matter as much
23:33imirkin: e.g. instead of emitting stuff to pushbuf directly
23:33karolherbst: but you don't have to lock the fence stuff all that often
23:33imirkin: you could stage it into a "buffer"
23:33imirkin: and then emit all in one go
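The staging idea suggested above (collect commands in a buffer, then emit them in one go) might look roughly like this. It is a sketch of the general pattern only: the class and the `emit` callback are hypothetical and do not correspond to any libdrm_nouveau or Mesa API.

```python
class StagedPush:
    """Collect command words locally and hand them over in a single call."""

    def __init__(self, emit):
        # `emit` is a hypothetical callable taking a list of words; in the
        # real driver this would be whatever writes into the push buffer.
        self._emit = emit
        self._staged = []

    def stage(self, *words):
        # Nothing is emitted here, so staging cannot trigger a kick.
        self._staged.extend(words)

    def flush(self):
        """Emit everything staged so far; return the number of words sent."""
        if not self._staged:
            return 0
        words, self._staged = self._staged, []
        self._emit(words)
        return len(words)
```

With this shape, the only place a kick could happen is inside flush(), so that single call is the one that has to be placed carefully with respect to the fence lock.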
23:33karolherbst: imirkin: yeah, I plan this to fix the pushbuf stuff for real
23:33karolherbst: still we touch fence code
23:33imirkin: of course the libdrm_nouveau logic has stuff to keep track of memory used
23:33karolherbst: when kicking
23:33imirkin: and emit when it reaches a watermark
23:33karolherbst: imirkin: we can actually just use multiple nouveau_pushbuf objects
23:34karolherbst: and memory is tracked per push buffer
23:34imirkin: sure, but that doesn't solve the issue
23:34Celmor[m]: imirkin: there's no "modesetting" in my Xorgs log
23:34karolherbst: we just have to be careful about submission order
23:34imirkin: Celmor[m]: what about modeset(0)
23:34karolherbst: imirkin: not in regards to fences, correct
23:34imirkin: Celmor[m]: or do you see NOUVEAU(0)
23:34Celmor[m]: nope
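The Xorg log check being asked for here (does the log show NOUVEAU(0) or modeset(0)?) can be scripted. A minimal sketch with a hypothetical function name; it only recognises the two prefixes named above and returns None for anything else.

```python
import re

# The two log prefixes named in the discussion, mapped to the X driver
# they indicate.
_PREFIXES = {"NOUVEAU": "xf86-video-nouveau", "modeset": "modesetting"}


def ddx_in_use(log_text):
    """Return the X driver suggested by the first matching log line, or None."""
    for line in log_text.splitlines():
        match = re.search(r"\b(NOUVEAU|modeset)\(\d+\):", line)
        if match:
            return _PREFIXES[match.group(1)]
    return None
```

Run it over the text of the Xorg log for the current session; a None result means neither prefix appeared, which is what a log from a different driver would produce.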
23:35imirkin: karolherbst: this somewhat improved by adding an ioctl which allows the kernel to emit the fence
23:35imirkin: Celmor[m]: pastebin xorg log
23:35Celmor[m]: "initializing extension XFree86-VidModeExtension"
23:35karolherbst: imirkin: yeah well sure.. but we still have this fence struct :/
23:36karolherbst: the really painful part is just one of those fence functions to kick the pushbuf...
23:36karolherbst: _maybe_ we should just rework the code so that doesn't happen
23:37karolherbst: and then the problem just goes away
23:37imirkin: just remember that something like nouveau_bo_wait() or nouveau_bo_map() can cause a push kick
23:37Celmor[m]: termbin.com/cwb3
23:37karolherbst: sure
23:37karolherbst: but that's irrelevant for the fences for now
23:37imirkin: Celmor[m]: you're using nvidia driver in that log
23:37karolherbst: soo.. the issue is when you call nouveau_fence_kick
23:37karolherbst: and this adds stuff to the pushbuf
23:37karolherbst: which can lead to a kick
23:38karolherbst: either by PUSH_SPACE or nouveau_pushbuf_kick
23:38imirkin: so *originally*
23:38karolherbst: which calls nvc0_default_kick_notify later on, which touches fence state
23:38imirkin: i didn't have that PUSH_SPACE there
23:38imirkin: HOWEVER
23:38imirkin: you can end up in situations where we emit a bunch of fences in sequence
23:39Celmor[m]: then it has to be the wrong log, sry
23:39imirkin: and so the extra 64 or whatever bytes we reserve in PUSH_SPACE end up getting used up and we run out of space
23:39imirkin: i don't remember precisely the situation, but it's something like that
23:39karolherbst: imirkin: what if we make the caller of nouveau_fence_kick responsible for reserving space and assert on that?
23:39imirkin: you can look at the commit log when i added that PUSH_SPACE
23:39Celmor[m]: oh wait, i did try to load Xorg a few times and only fixed xorg config later
23:40Celmor[m]: so the start of the log isn't related to the current Xorg session
23:40imirkin: Celmor[m]: there's one log file per session
23:40karolherbst: imirkin: the space isn't all that relevant anyway
23:40karolherbst: we still have that nouveau_pushbuf_kick call
23:42karolherbst: so in the end nouveau_fence_wait and nouveau_fence_work can lead to a kicked push buffer
23:42karolherbst: and I am wondering if we either leave it like that and do this crappy unlock/lock cycle
23:42karolherbst: or
23:42imirkin: right
23:42karolherbst: we don't push the buffer
23:42karolherbst: and make the callees do it
23:42imirkin: you have to push
23:42imirkin: o
23:43imirkin: how would that help
23:43karolherbst: we just have to move the push outside the locked areas
23:43karolherbst: essentially
23:43karolherbst: so, the idea was, if you touch fence state, you have to take a lock
23:43karolherbst: because.. it gets read and written quite a lot
23:43Celmor[m]: imirkin: this should be the correct one, sry for the confusion termbin.com/0g7s
23:43karolherbst: and locking internally is super painful
23:44karolherbst: because.. recursive locking and weird call trees and what not
23:44karolherbst: it was easier to throw in a bunch of asserts and make callees to take the lock
23:44imirkin: Celmor[m]: looks good
23:44karolherbst: ehhh
23:44karolherbst: s/callee/caller/
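The approach described above (callers take the lock, helpers only assert that it is held) can be sketched in a few lines. This is an illustration of the pattern, not the Mesa code: the class and method names are hypothetical, and Python's threading.Lock stands in for simple_mtx_t.

```python
import threading


class FenceList:
    """Fence state whose helpers expect the caller to already hold the lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self._fences = []

    def _assert_locked(self):
        # Lock.locked() only reports that *some* thread holds the lock, not
        # that the current thread does, so this is a weaker check than an
        # owner-tracking assert would be.
        assert self.lock.locked(), "caller must hold the fence lock"

    def add(self, fence):
        self._assert_locked()
        self._fences.append(fence)

    def pending(self):
        self._assert_locked()
        return list(self._fences)
```

Because the helpers never take the lock themselves, there is no recursive locking; the cost is that any path reaching them without the lock trips the assert, which is the kick_notify problem discussed here.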
23:45imirkin: [ 5149.966] (EE) NOUVEAU(0): failed to set gamma with 256 entries: Permission denied
23:45imirkin: Celmor[m]: that implies that you should have some errors in dmesg
23:45imirkin: i'd like to see them
23:45imirkin: (sounds like a display hang)
23:45karolherbst: the problem is just, once we get to kick_notify, we have to take the fence lock, because.. we touch fence state there again
23:46karolherbst: my solution is crappy but works, I am wondering if there is a nice solution without rewriting the world
23:46imirkin: karolherbst: you could flip that
23:46Celmor[m]: imirkin: https://termbin.com/kzjl4
23:47karolherbst: imirkin: in what sense?
23:47imirkin: you could require anything which MIGHT kick to take the fence lock
23:47karolherbst: well...
23:47karolherbst: that's like everything
23:47imirkin: yea
23:47imirkin: and then you don't have fine-grained locks :)
23:47karolherbst: I wanted to not do that :D
23:47imirkin: Celmor[m]: thanks, gimme a min
23:47karolherbst: I had some test apps which really got slowed down by locking
23:48karolherbst: so it is noticeable
23:48karolherbst: and it's really not a problem of doing it fine grained
23:48karolherbst: because if you mess it up while working on the code, you get asserts
23:49karolherbst: imirkin: also.. simple_mtx_t is _cheap_
23:49karolherbst: like really cheap
23:50karolherbst: maybe I play around with the idea of moving the push_kick stuff out.. but that could be painful as well :/
23:50imirkin: Celmor[m]: that log ends on Apr 11 00:38, but the errors in Xorg log are like an hour after that (unless i'm misreading)
23:50Celmor[m]: must've gotten truncated then, I've read some errors from journalctl about that
23:51imirkin: can you run "dmesg" and pastebin that?
23:51Celmor[m]: not sure how else to get you the whole log
23:51Celmor[m]: I've already rebooted hence why I did "journalctl -xb -1"
23:51imirkin: oh
23:51imirkin: well, anyways, my theory is that we did something to upset the display controller
23:51karolherbst: ohhh, I have an idea :O
23:51imirkin: which in turn led things to go downhill quickly
23:51karolherbst: will.... try stuff out for a few days :D
23:51Celmor[m]: still trying to get you the full log
23:54Celmor[m]: it appears the paste was simply too large. it's 3.6M
23:54imirkin: pmoreau: btw, thanks for taking a look
23:55imirkin: pmoreau: i don't think i'll be able to figure out the CB issues tonight, but i'm probably going to push those prep patches anyways
23:55imirkin: since it's a lot closer to working now than before :)
23:55imirkin: Celmor[m]: ok, i'm more interested in the last bits than the first bits
23:56Celmor[m]: imirkin: alright, reversed the list: https://termbin.com/rivb
23:59imirkin: Celmor[m]: looks like you have an electron-based app which triggered a hang
23:59Celmor[m]: yeah, I've noticed the hang when I wanted to copy a message. using element.io/matrix to chat here
23:59Celmor[m]: element.io aka matrix*