00:35coderobe: Hey, occasionally when using DRI_PRIME (modesetting + nouveau) the application hangs forever and after a while locks up the device - requiring a hard reset. DMESG logs this during the event: https://tmp.codero.be/UbiquitousBlackTarsier6.txt - any ideas on how to fix this?
00:36coderobe: The kworker will continue to hang, and the cpu usage on the threads the program is running on rises to 100%
01:28Lyude: Is there a reason we actually free and re-request the main nvkm irq in nouveau on suspend/resume instead of just leaving it allocated and letting the kernel handle it?
14:05karolherbst: mupuf: do you have a GPU with "real" FP64 support?
14:07mupuf: karolherbst: What gens would be that?
14:07mupuf: I thought all of them had it, albeit a little slow
14:07karolherbst: no gen really
14:07mupuf: the titan should have fast FP64
14:07karolherbst: the GK110 780 Ti doesn't have it, but the Titan does
14:07karolherbst: ohh, you actually have a titan?
14:08karolherbst: ohh right
14:08karolherbst: but on those FP64 was like 1/4 of FP32, right?
14:08mupuf: I paid more import taxes on this than I paid for most other GPUs
14:08karolherbst: yeah, I remember
14:08mupuf: well, something like this
14:09mupuf: why should it matter?
14:09mupuf: Are you making an instruction scheduler?
14:09karolherbst: just testing
14:09karolherbst: I just check general availability
14:09mupuf: btw, I ran IGT on nouveau last weekend
14:10karolherbst: I would like to have a collection of GPUs I can work on whenever I want, so I want to keep what you have in consideration as well
14:11mupuf: I will check again that everything was set up properly and that it was actually using nouveau
14:11karolherbst: not so nice?
14:11karolherbst: yeah, it seems a bit odd
14:11karolherbst: but yeah
14:11mupuf: well, only ~4 fails are nouveau issues
14:11karolherbst: good work nevertheless
14:11karolherbst: yeah, sure
14:11karolherbst: but 4k skips
14:11mupuf: the rest is just IGT complaining about the lack of Intel-provided stuff
14:12karolherbst: might want to move those into drm core maybe
14:12mupuf: at best, we could get 1500 tests
14:12karolherbst: better than 150 ;)
14:12mupuf: yep :p
14:12mupuf: actually, right now, only ~60 are completely free of Intel-isms
14:12karolherbst: anyway, things start to finally move here so I can jump into real testing as well
14:12mupuf:wants to check what it looks like on the blob too :D
14:15coderobe: reposting this from yesterday: occasionally when using DRI_PRIME (modesetting + nouveau) the application hangs forever and after a while locks up the device. The kworker will continue to hang, and the cpu usage on the threads the program is running on rises to 100% - requiring a hard reset. DMESG logs this during the event https://tmp.codero.be/Flu
14:16coderobe: does anyone have a clue as to what causes this?
14:19karolherbst: coderobe: nouveau.runpm=0 probably helps
14:19coderobe: i'm assuming that would disable runtime power management entirely
14:20karolherbst: what kind of device do you have?
14:20karolherbst: I mean laptop
14:21coderobe: this is a lenovo y50-70 with a muxless gtx 860m
14:21coderobe: can i somehow manually manage the power state of the gpu when runpm is disabled?
14:21coderobe: assuming that fixes the issue
14:21karolherbst: coderobe: in theory you can do the ACPI calls yourself
14:22karolherbst: it highly depends on your actual issues
14:22karolherbst: I want to dig into those bugs anyway, so I kind of want to know what issues we have on what machines
14:23coderobe: yeah, well i'd like to be able to provide more info but i don't think i can get any more info out of dmesg
14:23karolherbst: coderobe: I think the main issue is that _PR3 support (some win8/10+ runpm feature) is causing trouble
14:23coderobe: also, this issue only occasionally happens
14:24coderobe: as in, sometimes offloading like this works without issues, and sometimes it renders nothing and then hangs like described above when killed
14:24karolherbst: an acpi_osi overwrite might also fix the bug
14:24karolherbst: like acpi_osi='!windows8' or so
14:24karolherbst: I never looked into that
14:24karolherbst: and I don't really know what to put there
14:30coderobe: should i try overriding acpi_osi first?
14:31karolherbst: you could, but you need to do some investigation on what to put in there
14:42karolherbst: mupuf: but do we actually support fp64 right now? I always thought we just do crappy lowering with crap precision and that's why we had somebody working on getting better fp64 lowering working?
14:52coderobe: karolherbst: it seems like acpi_osi=!windows8 fixed it - i reproduced the issue before rebooting into the new cmdline and after the reboot i can't manage to reproduce it anymore
14:52karolherbst: yeah, makes sense
14:52karolherbst: thanks for confirming
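A minimal sketch for checking, after a reboot, that a parameter such as `acpi_osi` or `nouveau.runpm` actually reached the kernel, by parsing a command line of the kind exposed in `/proc/cmdline`. The helper names `parse_cmdline` and `read_cmdline` are illustrative, not part of any tool mentioned in the channel, and values that contain quoted spaces are not handled.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}.

    Bare flags (e.g. "quiet") map to None. If a parameter is repeated,
    the last occurrence wins, which is a simplification.
    """
    params = {}
    for token in cmdline.split():
        # Split on the first "=" only, so "root=UUID=abc" keeps "UUID=abc".
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path: str = "/proc/cmdline") -> dict:
    """Parse the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return parse_cmdline(f.read())
```

Calling `read_cmdline().get("acpi_osi")` on the running system would return the value the kernel booted with, or `None` if the parameter is absent.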
14:52coderobe: this doesn't cripple the power management functionality, does it?
14:53karolherbst: not really
14:53karolherbst: maybe you save 1W less or something
14:53coderobe: heh alright
14:53coderobe: thanks for the help!
14:53karolherbst: does powertop report the power usage?
14:53karolherbst: usually if the GPU is off the usage should be somewhere around 5-14W depending on the laptop
14:53karolherbst: on idle
14:54karolherbst: and around +7W or more if the GPU is on
14:55coderobe: how long does it take for the gpu to drop to idle?
14:55karolherbst: around 5 seconds
14:55karolherbst: 5 seconds idle means it will be turned off
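Whether the GPU really powers down after the idle period can also be observed through the kernel's runtime PM attributes in sysfs. The sketch below reads `runtime_status`, `control` and `autosuspend_delay_ms` from a device's `power/` directory; the function name `runtime_pm_info` is hypothetical, and the PCI address of the discrete GPU (often something like `0000:01:00.0`) is an assumption that has to be checked with `lspci` on the actual machine.

```python
from pathlib import Path


def runtime_pm_info(device_dir):
    """Return selected runtime PM attributes of one sysfs device directory.

    Attributes that are missing or unreadable are reported as None.
    """
    power = Path(device_dir) / "power"
    info = {}
    for attr in ("runtime_status", "control", "autosuspend_delay_ms"):
        try:
            info[attr] = (power / attr).read_text().strip()
        except OSError:
            # Not every device exposes every attribute.
            info[attr] = None
    return info
```

For example, `runtime_pm_info("/sys/bus/pci/devices/0000:01:00.0")` would be expected to show `runtime_status` as `suspended` once the card has been idle long enough, and `active` while an offloaded application is running.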
15:00coderobe: not quite sure how to get a power report when on AC - the "usage" percentage in powertop drops from 45% to 0% when closing all offloaded applications though
15:07karolherbst: coderobe: there should be a value in W or J as well
15:07karolherbst: but if not, then maybe Linux doesn't support reading it out for you
15:08karolherbst: coderobe: actually I doubt it will ever report it when on AC
15:08karolherbst: because it usually only gets read out from the battery
15:18coderobe: yeah that's what i thought
15:19coderobe: i guess i'll notice if my battery empties faster
15:20coderobe: next time i'm undocked and all
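When running on battery, the draw can be read directly from the power supply class in sysfs, which is also where tools get their numbers from. A rough sketch, assuming the battery shows up as `/sys/class/power_supply/BAT0` (the name varies between machines) and using a hypothetical helper `battery_power_watts`: it prefers `power_now` (microwatts) and otherwise falls back to `current_now` (microamps) times `voltage_now` (microvolts).

```python
from pathlib import Path


def _read_int(path):
    """Read an integer sysfs attribute, or return None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def battery_power_watts(supply_dir):
    """Estimate the current battery power in watts, or None if unknown."""
    supply = Path(supply_dir)
    power_uw = _read_int(supply / "power_now")
    if power_uw is not None:
        return power_uw / 1e6  # microwatts -> watts
    current_ua = _read_int(supply / "current_now")
    voltage_uv = _read_int(supply / "voltage_now")
    if current_ua is None or voltage_uv is None:
        return None
    return (current_ua * voltage_uv) / 1e12  # uA * uV -> watts
```

Comparing `battery_power_watts("/sys/class/power_supply/BAT0")` with the GPU suspended and with an offloaded application running should show roughly the difference described above, if the hardware reports these values at all.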
17:32Lyude: Is it possible to test suspending a nouveau GPU (and only test that) without putting the entire machine in S3?
18:09Lyude: nevermind, figured out a way I can test this without actually stopping the hw watchdog on this machine :)
18:21Lyude: btw mwk regarding the subdevice init order from yesterday: it seems a lot of stuff runs fini before pci init
18:21Lyude: so i'd assume something in there is probably getting us interrupts
19:00Lyude: I guess the bigger question now is whether there was a reason we tore down interrupts before suspend like this
19:23RSpliet: pmoreau: RE bug 104784, note Max is using Debian. You are permitted to question the age of their kernel
19:31pmoreau: RSpliet: True, but Karol’s patch for preventing the kernel from locking up when reclocking a suspended card hasn’t landed yet, so it doesn’t matter how old his kernel is. :-)
19:34Lyude: If interrupts are disabled on the MC subdev, I'm assuming that disables them for all of the sub devices
19:36feaneron: is it usual in nouveau/linux development that patches take so long to land?
19:40pmoreau: feaneron: They can take some time, yes. Thinking of any patches in particular? Also, it possible to miss some sometimes, and if no one pings about them, they’ll just never land.
19:40pmoreau: *it’s possible
19:47feaneron: i'm thinking about karol's patch that prevents the kernel from freezing
19:49pmoreau: Somebody that has some knowledge about that stuff (which is very few people, most likely 2) needs to find some time to review those patches.
19:49Lyude: feaneron: could you link to the patch you're referring to?
19:51pmoreau: Lyude: I think that it’s this series: https://patchwork.freedesktop.org/series/33967/
19:52pmoreau: Yep, patch 02 is the one that should prevent the kernel from freezing when reclocking a suspended card.
21:15feaneron: Lyude: what pmoreau said. that patch works wonders for dual gpu laptops.
21:16feaneron: i'm not sure what's the proper way of "upvoting" this patch series. adding a comment for that is obviously out of the question