00:00 ericonr: is this an example of situations where nvidia doesn't have docs or that their docs don't provide enough info?
00:03 imirkin: ericonr: well, they provide nearly no documentation
00:03 imirkin: however it's not uncommon for the hardware team to say "do this sequence of 75 things and the good times will roll" without much further explanation
00:04 imirkin: i expect that the internal nvidia docs are pretty good, but even there i'm sure new hw bringup is a difficult affair
00:05 imirkin: we might not have docs, but we have something they don't have
00:05 imirkin: -- a working driver to analyze :)
00:06 skeggsb: yes, there are *some* advantages to reverse-engineering over implementing from docs - that is one of them :P
00:06 ericonr: kinda sucks that you can end up with an open implementation where the "base knowledge" is still closed off
00:06 ericonr: but I think this is an issue for other drivers as well, right?
00:07 imirkin: it's not easy knowledge to internalize
00:07 imirkin: even if the average graphics driver developer received a full dump of nvidia docs, it'd take them a year to make any sense of them
00:07 ericonr: does debugging the driver work well, then? I always figured that they must do some crazy magic to make it impossible for you to extract firmware from their drivers to make the card run at proper speeds
00:08 imirkin: nothing like that
00:08 imirkin: we're just lazy, and know that we can't redistribute the firmware anyways
00:08 skeggsb: i can extract them just fine, that's different to being useful as we can't ship them
00:08 imirkin: and they've recently made it more difficult (although i guess they're back to the original method now?)
00:09 imirkin: skeggsb: is it cpu-upload now? or do they still DMA it?
00:09 ericonr: so you're saying you could run a modern card on nouveau at adequate speeds?
00:09 imirkin: no.
00:09 imirkin: not with current code.
00:09 ericonr: fair enough
00:09 skeggsb: they still DMA them, but if you intercept and walk the GPU page tables...
00:09 imirkin: skeggsb: yeah, that's somewhat painful.
00:10 imirkin: skeggsb: how do you intercept btw?
00:10 skeggsb: i know, but i did it to debug turing acr
00:10 imirkin: ;)
00:10 skeggsb: well, on gp102 and above, with enough random modifications/options to nvidia's driver, you can extract it from a mmiotrace
00:11 skeggsb: on earlier GPUs you'd need to stick the extraction code *into* mmiotrace
00:11 imirkin: so basically gm20x
00:11 skeggsb: with some fun tricks so you can access registers from within the page fault
00:11 karolherbst: skeggsb: sounds painful
00:11 skeggsb: it's... not simple :P
00:12 skeggsb: i basically capture the entire WPR, then have code to pull the individual firmwares out of that and write them out to the format nvidia give them to us in
00:13 karolherbst: imirkin, skeggsb: anybody of you have a tesla gpu ready
00:13 karolherbst: ?
00:13 skeggsb: ericonr: and no, just having the fw is *not* enough, there's a *massive* amount of code to write to interface with PMU for that
00:13 karolherbst: or nearby or whatever?
00:13 ericonr: I once thought of extracting the firmware and distributing the program to retrieve it so people could stick it into nouveau. That part would be hard, then it would have to be necessary to make nouveau work with said firmware, right?
00:13 karolherbst: I'd like to know over what shaders we trip over in https://gitlab.freedesktop.org/mesa/mesa/-/issues/3066#note_516610
00:13 imirkin: karolherbst: G84
00:14 ericonr: skeggsb: makes sense. Sucks, too :c
00:14 karolherbst: apparently you just need to visit https://store.google.com/ with firefox
00:15 karolherbst: and I noticed I still didn't push the RA fix... uff
00:15 karolherbst: I thought I did.. oh well
00:15 imirkin: fwiw i did fix another error mozilla reported for G84
00:15 imirkin: er
00:15 imirkin: well, tesla series
00:15 imirkin: but i'm pretty sure it wasn't that. some use-after-free
00:15 karolherbst: but not an RA crash, was it?
00:16 imirkin: no - commit 1288ac7632b31a20497a0e75f374f66ce3d5bc3c
00:16 karolherbst: right..
00:16 karolherbst: I remember that one
00:16 imirkin: not use-after-free
00:16 imirkin: more like use-out-of-bounds
00:16 karolherbst: mhh.. ../src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp:903 now I need to know which version of mesa
00:16 imirkin: and very rarely out-of-bounds = next page, aka fail.
00:16 imirkin: (if it's not mapped)
00:17 imirkin: ok, so i just need to run firefox on the G84 right?
00:17 karolherbst: apparently
00:17 karolherbst: and visit the google store :p
00:17 imirkin: how do i check if it worked?
00:17 karolherbst: no clue
00:18 karolherbst: "Most users that experienced this crash hit it by visiting https://store.google.com with a recent version of Firefox (both release and ESR)." is all I got
00:18 karolherbst: ahh 20.0.4.. let's see
00:18 imirkin: success
00:18 imirkin: it crashes.
00:18 karolherbst: :)
00:18 karolherbst: nice
00:18 imirkin: no clue how to get info
00:19 karolherbst: if you give me the glsl file I can just try it locally
00:19 imirkin: nothing on stdout
00:19 karolherbst: dump shaders?
00:19 imirkin: oh yeah ok
00:19 imirkin: how do i do that again?
00:19 imirkin: MESA_GLSL=dump ?
00:19 imirkin: there's a thing that outputs shader_test files, no?
00:19 karolherbst: MESA_SHADER_CAPTURE_PATH
00:20 imirkin: fffffff shader cache
00:20 imirkin: how do i kill that?
00:20 karolherbst: MESA_GLSL_CACHE_DISABLE
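The environment variables named above can be combined in a small wrapper. This is only a minimal sketch, not something anyone in the channel ran: the function name `run_with_shader_capture` and the example directory are made up for illustration, and it assumes a Mesa build (such as the 20.0.x series discussed here) that honors `MESA_SHADER_CAPTURE_PATH` and `MESA_GLSL_CACHE_DISABLE`.

```shell
#!/bin/sh
# Hypothetical helper: run a GL program with Mesa's shader capture enabled
# and the on-disk shader cache disabled, so every shader is compiled afresh.
run_with_shader_capture() {
    capture_dir=$1
    shift
    mkdir -p "$capture_dir" || return 1
    # MESA_SHADER_CAPTURE_PATH: directory that receives .shader_test files.
    # MESA_GLSL_CACHE_DISABLE:  skip the shader cache so compiles really happen.
    env MESA_SHADER_CAPTURE_PATH="$capture_dir" \
        MESA_GLSL_CACHE_DISABLE=true \
        "$@"
}

# Example (not run here): run_with_shader_capture /tmp/shaders firefox
```

A sandboxed program such as Firefox may still keep the capture from working as expected, so an empty capture directory does not by itself mean the variables were ignored.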
00:21 imirkin: uh what
00:21 imirkin: are those disabled in an opt build or something?
00:21 karolherbst: uhh..
00:22 karolherbst: doesn't look like it
00:23 imirkin: and LD_LIBRARY_PATH doesn't work coz of some dumb sandbox thing i think
00:23 karolherbst: ohh
00:23 karolherbst: no
00:23 karolherbst: I think the env vars don't get passed to the rendering thread :p
00:23 karolherbst: or.. something dumb
00:23 imirkin: no they do
00:23 imirkin: coz it looks for nouveau_dri.so
00:23 karolherbst: ehh
00:23 imirkin: in the right place
00:24 imirkin: and doesn't find it
00:24 imirkin: and it works with glxinfo
00:25 karolherbst: I meant more like firefox cleaning the env vars or something stupid
00:25 imirkin: https://hastebin.com/umijujehiq.sql
00:25 imirkin: do you have another explanation?
00:26 karolherbst: LD_DEBUG=libs usually points out silly things
00:26 karolherbst: but yeah.. no idea what's wrong there
00:28 karolherbst: mhh.. the stack points to this line: https://gitlab.freedesktop.org/mesa/mesa/-/blob/d3586b5291e1be729023d978601be5d4af4a01ff/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#L903
00:28 karolherbst: I guess it could also be a fallout from doing multiple RA iterations and I really should just push the fix.. (I had it locally on the master branch though...)
00:29 karolherbst: just noticed it today
00:30 imirkin: ok - progress.
00:30 imirkin: i got it to load my thing
00:32 imirkin: ok, got it
00:33 imirkin: ffs, MESA_GLSL=dump truncates it.
00:33 imirkin: and MESA_SHADER_CAPTURE_PATH doesn't seem to work
00:34 karolherbst: probably some sandboxing crap and it's written somewhere else
00:35 imirkin: no
00:35 imirkin: it gets written too late
00:35 karolherbst: :/
00:36 imirkin: let me try moving up
00:36 karolherbst: imirkin: mind running it with a different gpu
00:36 karolherbst: and see if the shader_test files crash with tesla?
00:36 imirkin: won't work well
00:36 karolherbst: might be simpler
00:36 imirkin: hold on
00:38 imirkin: got it
00:40 imirkin: fck
00:40 imirkin: doesn't fail outside firefox
00:40 karolherbst: ffs
00:40 karolherbst: nvm
00:40 imirkin: hold on
00:40 karolherbst: give me the shaders :D
00:40 imirkin: maybe it does.
00:40 karolherbst: I use libasan
00:41 karolherbst: if libasan doesn't find it, it's pointless to debug anyway :p probably
00:41 imirkin: no it fails
00:41 imirkin: ok
00:41 imirkin: er, i mean fails to repro
00:41 karolherbst: but are there RA fails or any other messages?
00:41 imirkin: https://paste.debian.net/1149669/
00:41 imirkin: no
00:42 karolherbst: ehh.. is it the only shader?
00:42 imirkin: oh lol
00:42 imirkin: i was running it on the wrong gpu
00:42 imirkin: hold on
00:42 karolherbst: ahhh
00:43 imirkin: yay, it fails
00:43 karolherbst: I am sure it's something dumb
00:43 imirkin: https://hastebin.com/ajimezogac.shell
00:43 imirkin: the instruction is null
00:43 imirkin: this sounds familiar.
00:43 karolherbst: fun
00:43 imirkin: i thought i even fixed that. i guess there's another instance.
00:45 imirkin: 876 if (isShortRegOp((*def)->getInsn()))
00:45 imirkin: (gdb) p **def
00:45 imirkin: $5 = {value = 0x0, origin = 0x0, insn = 0x0}
00:45 imirkin: so ... that's not COMPLETELY ideal...
00:47 imirkin: hm, looks like that lvalue is defined by a phi node
00:47 imirkin: anyways, you have the shader. enjoy
00:47 imirkin: the simplest thing to do will likely be to grab the tgsi, and then feed it to nouveau_compiler
00:47 imirkin: note that it has a small-ish buffer by default, so just 10x it
00:47 karolherbst: yeah.. just.. it doesn't seem I can reproduce it :O
00:48 imirkin: hm?
00:48 imirkin: did you do the stuff i mentioned?
00:49 karolherbst: NV50_PROG_DEBUG=3 NV50_PROG_CHIPSET=0x84 DRI_PRIME=1 run_local_mesa ./run -d1 tmp.shader_test
00:49 karolherbst: it just compiles..
00:49 karolherbst: maybe some random local patches fix it..
00:51 imirkin: dunno
00:52 karolherbst: the dumped binary is also a valid tesla binary...
00:52 karolherbst: oh well
00:52 imirkin: ok
00:52 karolherbst: I will figure it out
00:52 imirkin: well i didn't make it up ;)
00:53 karolherbst: yeah.. I was running on a local branch with like 50 patches :p
00:53 karolherbst: could be anything
00:53 imirkin: could be gcc10 =]
00:53 karolherbst: gcc fixing issues? doubtful :p
00:54 karolherbst: imirkin: what commit did you test on?
00:55 imirkin: mmm
00:55 imirkin: bf3c9d27706dc2362b81aad12eec1f7e48e53ddd (sorta - i have 2 local changes, but they should have nothing to do with this)
00:56 karolherbst: maybe some stupid difference between nvc0 and nv50 setting up codegen differently and things are different enough...
00:57 karolherbst: how can I force the nv50 path?
00:57 imirkin: grab the tgsi
00:57 imirkin: then feed it to nouveau_compiler -a 84
00:57 imirkin: that's 99% identical
00:57 karolherbst: but the TGSI could already be different :/
00:57 karolherbst: or not?
00:58 imirkin: between two wildly different drivers - yes
00:58 imirkin: based on various options they set
00:58 imirkin: however in practice, i think the differences between nv50 and nvc0 settings in this area are minor
00:58 imirkin: but let me get you the tgsi
00:58 karolherbst: yeah.. should be better if
00:58 imirkin: https://hastebin.com/diriyiyote.md
00:58 karolherbst: thanks
00:59 imirkin: note that you'll probably have to modify nouveau_compiler slightly
00:59 imirkin: to make it accept larger data input
00:59 imirkin: just add a 0 into that array size
00:59 karolherbst: the TGSI is different :p
00:59 imirkin: what about it is different?
01:00 karolherbst: UIF vs UCMP seems like the biggest difference
01:00 imirkin: uhm
01:00 imirkin: that's surprising
01:00 imirkin: something is doing optimizations in your version maybe?
01:00 imirkin: could just be difference of commits
01:00 karolherbst: maybe
01:01 imirkin: i can't think of a reason that would be different between the two
01:01 karolherbst: I am on your commit now.. let me see if the difference is still there
01:01 karolherbst: 84, right?
01:01 imirkin: yes
01:02 karolherbst: yeah.. still different
01:02 karolherbst: this is mine: https://gist.githubusercontent.com/karolherbst/8346dccfd412188fb10727f182148bf1/raw/d1ef054a0444d25042417cfde05613f01f818947/gistfile1.txt
01:02 imirkin: weird.
01:02 karolherbst: yeah well
01:02 imirkin: not sure why that'd be the case.
01:02 karolherbst: nvc0 vs nv50 :p
01:02 karolherbst: could be some weird CAP
01:02 imirkin: can't think which one
01:02 karolherbst: me neither
01:03 imirkin: would be worth tracking down imo
01:03 imirkin: i hate not understanding stuff
01:04 karolherbst: your TGSI compiles here :/ *sigh*
01:05 karolherbst: could be some weird memory shit for real...
01:05 karolherbst: I will check again tomorrow
01:06 imirkin: let me try it...
01:09 imirkin: karolherbst: i get the segfault with nouveau_compiler
01:09 imirkin: just make sure to bump up tokens and the other thing
01:09 karolherbst: heh...
01:09 karolherbst: yeah.. let me verify it
01:10 imirkin: https://hastebin.com/cozopilewu.m
01:10 imirkin: nouveau_compiler -a 84, and then feed the tgsi on stdin
01:10 imirkin: (or iirc it might be able to take a filename arg?)
01:10 karolherbst: still no crash
01:11 karolherbst: ./build/src/gallium/drivers/nouveau/nouveau_compiler -a 0x84 tmp2.frag
01:11 imirkin: -a 84
01:11 karolherbst: same
01:11 karolherbst: I am even on bf3c9d27706dc2362b81aad12eec1f7e48e53ddd
01:11 imirkin: https://hastebin.com/wikuxifoho.coffeescript
01:11 karolherbst: let me turn on libasan
01:11 karolherbst: just in case
01:13 karolherbst: ahhhh
01:13 karolherbst: ==962268==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150002c4510 at pc 0x000000435fdb bp 0x7ffc4c7fa580 sp 0x7ffc4c7fa570 :)
01:13 karolherbst: even if this shouldn't be the issue, we should fix it :p
01:14 karolherbst: but yeah
01:14 karolherbst: it's the bug
01:14 karolherbst: imirkin: https://gist.githubusercontent.com/karolherbst/a329cb5437b06cd84cbbf7dc7dc47b34/raw/21c734052a0bb8def99c2fbb9cea1679232927f7/gistfile1.txt
01:14 karolherbst: looks like it, no
01:14 karolherbst: ?
01:14 karolherbst: but also looks like the spilling bug
01:15 karolherbst: ahh yeah..
01:16 karolherbst: imirkin: mind verifying that 4b8a5b0c1881d81f1a8047b58bf003abdfdca2ee from my repo fixes it?
01:17 imirkin: not right now
01:17 imirkin: my repo's in a weird state
01:17 karolherbst: I will go to bed now anyway :p
01:17 imirkin: k, see ya
03:59 imirkin: karolherbst: would you be able to run some blob mmt's for me?
04:17 imirkin: skeggsb: i'm looking for the piece of logic which determines whether we use snooped or unsnooped pci mappings for "gart" buffers. but it's a huge pile of spaghetti, esp with ttm in there
04:17 imirkin: can you give me some pointers?
04:20 imirkin: oh wait. i think i just realized something ...
04:21 imirkin: if you have any coherent mappings, userspace can modify the underlying data right after a glDraw()
04:21 imirkin: without waiting for the draw to read the data in the first place
04:21 imirkin: which means that we have to essentially synchronize the pipeline AFTER any such draw
04:21 imirkin: hrm.
04:21 imirkin: although that's not what's happening here
06:51 imirkin: karolherbst: fyi -- https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5276/diffs?commit_id=2900ea3cd7c581f0afe88e0249d9782f8db87a8b
06:52 imirkin: and AndrewR, in case you're looking at the logs - should fix blender
08:48 karolherbst: imirkin: does this fix the issue with this weird application which got reported here?
08:49 karolherbst: ohh wait.. that got solved differently :D
16:40 imirkin: karolherbst: iirc that was an application bug with forgetting to set gl_PointSize
16:41 imirkin: unless you were talking about a different issue
16:41 karolherbst: yeah.. I just remembered too late and read your sentence after the link too late
16:41 imirkin: ;)
16:42 imirkin: well, at least i'm glad i didn't have to turn off the whole coherent mapping mechanism for nouveau
16:42 karolherbst: yeah..
16:42 karolherbst: that would be annoying
16:42 imirkin: but i don't think it's a path that could previously be hit directly via GL
16:43 imirkin: doing the coherent map thing is a good idea for maxwell+
16:43 imirkin: i'd kinda rather have the userptr for the earlier gpu's though
16:44 imirkin: which could just take immediate vertex submissions directly
16:44 karolherbst: this will come with the UAPI rework
16:44 imirkin: no
16:44 imirkin: i mean the regular user ptr
16:44 imirkin: i.e. client-side buffer
16:44 imirkin: that i can read from directly
16:44 karolherbst: yeah
16:44 imirkin: and stick the data into the pushbuf
16:44 imirkin: not the SVM thing.
16:44 karolherbst: I know :p
16:44 imirkin: doesn't need any UAPI rework
16:44 karolherbst: you mean adding a page entry for CPU memory into the GPU, no?
16:44 imirkin: just what mesa/vbo decides to do
16:44 karolherbst: ohh
16:44 karolherbst: I see
16:45 imirkin: i guess for glVertex, it's gotta go somewhere
16:45 imirkin: (unless you make more dedicated APIs for submitting that stuff directly)
16:45 imirkin: hm
16:46 karolherbst: I am actually more interested in just being able to add host memory to the GPUs VM and not bother with a limited variant of that :p
16:47 karolherbst: imirkin: the old user mem stuff only works for things which can explicitly accept host memory if you set GART vs VRAM on the command, right?
16:59 imirkin: i just meant a pointer to the client buffer
16:59 imirkin: that the driver can read vertex data from.
16:59 imirkin: no gpu involvement.
17:00 karolherbst: ohh, I see
17:00 imirkin: not backed by a pipe_resource
17:00 imirkin: aka a user_buffer
17:19 imirkin: karolherbst: any opinion on the patch, or should i just push?
17:20 imirkin: i'm gonna run some CTS tests first
17:20 karolherbst: ahh cool
17:20 karolherbst: yeah, the patch is r-by me, I was mainly wondering if I want to run some tests first or not
17:22 imirkin: not a ton of cts coverage
17:22 imirkin: there are a few piglits
17:44 imirkin: well, piglit tests pass
20:08 urkk: Hi, since some months ago, I'm unable to suspend my system when using Xorg and nouveau, but from the virtual console works fine
20:09 imirkin: which GPU?
20:10 urkk: VGA compatible controller: NVIDIA Corporation GT216M [GeForce GT 230M] (rev a2)
20:10 urkk: I tried unbinding the vtconsole device /sys/class/vtconsole/vtcon1/bind
20:10 urkk: But still I'm unable to rmmod the nouveau module
20:11 urkk: As suggested as a workaround before suspending
20:12 imirkin: weird
20:12 urkk: Also, I'm unable to obtain any logs after the lines: kernel: PM: suspend entry (deep)
20:13 imirkin: which kernel did this work with?
20:13 urkk: 5.6.15-arch1-1
20:13 urkk: Ah sorry
20:14 urkk: thats the one I use now
20:14 urkk: But I don't remember when it was introduced
20:14 imirkin: hm ok
20:14 urkk: But I believe about 6 months ago
20:14 imirkin: ok, so like 5.3 or something
20:15 urkk: Yeah, I think so
20:16 urkk: Can I extract some debugging information somehow?
20:16 imirkin: pm is one of the worst things to debug =/
20:16 imirkin: so what happens when it fails?
20:16 imirkin: does it just insta-resume, or hang, or?
20:16 urkk: It hangs
20:17 urkk: Fans still working, and leds are on
20:17 imirkin: do the displays turn off?
20:17 urkk: Screen goes black
20:17 urkk: And the caps lock led doesnt work
20:17 urkk: In the laptop keyboard
20:17 imirkin: yeah, that's pretty hung
20:17 karolherbst: do you have a log?
20:17 karolherbst: urkk: what's your kernel version btw?
20:18 urkk: http://ix.io/2nQ1
20:18 urkk: karolherbst: 5.6.15-arch1-1
20:18 karolherbst: urkk: why does nouveau get unloaded anyway?
20:18 imirkin: so ... rmmod nouveau is a really bad idea on suspend
20:18 karolherbst: so your issue is not suspend
20:18 karolherbst: but nouveau doesn't unload because something is using it?
20:18 karolherbst: doesn't sound like a bug to me, but just your system being configured in a bad way
20:19 imirkin: you certainly can't rmmod nouveau while X is running
20:19 urkk: It doesn't suspend
20:19 karolherbst: urkk: no
20:19 karolherbst: that's not true
20:19 karolherbst: your config makes your system not suspend
20:19 karolherbst: it has nothing to do with nouveau
20:19 urkk: But I read a workaround to remove the nouveau driver here: https://wiki.archlinux.org/index.php/Power_management/Suspend_and_hibernate#Instantaneous_wakeups_from_suspend
20:19 karolherbst: urkk: don't follow those
20:20 karolherbst: those are workarounds and might break
20:20 karolherbst: if you have an issue with the stock config
20:20 karolherbst: then it makes sense for us to look into it
20:20 karolherbst: but if a workaround stops working after an update, there is nothing we can do about it
20:20 urkk: So, what should I do? Can I provide more info?
20:20 karolherbst: remove the workarounds first
20:20 karolherbst: as it is clearly buggy
20:21 urkk: done!
20:21 karolherbst: cool
20:21 karolherbst: now check if it all works now or if you still have issues
20:21 urkk: Okay, see you after the reboot
20:25 urkk: Back
20:25 urkk: So I rebooted and login into the virtual console
20:25 urkk: I can suspend and come back fine
20:25 urkk: Then I start Xorg, and try to suspend and it hangs
20:25 karolherbst: mhh
20:25 karolherbst: do you have a log from that?
20:25 urkk: So I waited for 1 minute and reboot
20:27 karolherbst: urkk: do you have a second system so you could check with ssh what's going on?
20:27 urkk: http://ix.io/2nQj
20:27 karolherbst: could be a bug inside nouveau indeed (and I assume that as you probably installed the workaround for this reason)
20:27 karolherbst: mhhh
20:28 urkk: karolherbst: yes, already tried but it doesnt respond to ping
20:28 urkk: yep
20:28 karolherbst: you might have more success with netconsole
20:28 karolherbst: but netconsole is annoying to set up
20:28 karolherbst: but usually better than ssh to debug
20:29 karolherbst: imirkin: aware of any system suspend issues with tesla gpus?
20:29 urkk: interesting
20:29 karolherbst: I could imagine that we mess up when storing the VRAM contents to system RAM... but hard to say if we don't have any logs
20:30 imirkin: no
20:30 imirkin: so the issue is insta-resume, right?
20:31 urkk: what is insta-resume?
20:31 imirkin: you try to suspend, and it resumes instantaneously
20:31 urkk: karolherbst: I will try with netconsole
20:31 imirkin: there's a PM debug thing you can enable
20:31 urkk: imirkin: no the problem is that it hangs
20:31 imirkin: which iirc provides a bit more info
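imirkin does not name the knob. One plausible candidate is the kernel's PM debugging interface under `/sys/power`, which exists only on kernels built with PM debug support. The following is a minimal sketch under that assumption: the helper name `pm_debug_enable` and the overridable `POWER_DIR` variable are hypothetical, added so the logic can be tried against a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper; POWER_DIR is overridable so the logic can be tried
# against a scratch directory instead of the real sysfs.
POWER_DIR=${POWER_DIR:-/sys/power}

pm_debug_enable() {
    level=${1:-devices}
    # Both files only exist on kernels built with PM debugging support.
    if [ ! -w "$POWER_DIR/pm_debug_messages" ]; then
        echo "no writable pm_debug_messages in $POWER_DIR" >&2
        return 1
    fi
    # Ask the PM core to log extra messages during suspend and resume.
    echo 1 > "$POWER_DIR/pm_debug_messages"
    # Optionally stop the suspend sequence early at the given test level.
    if [ -w "$POWER_DIR/pm_test" ]; then
        echo "$level" > "$POWER_DIR/pm_test"
    fi
}
```

Run as root, `pm_debug_enable devices` followed by `echo mem > /sys/power/state` should make the kernel go only as far as suspending devices, wait a few seconds, and resume, which helps narrow down the phase that hangs. Writing `none` to `pm_test` restores normal suspend.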
20:31 imirkin: the log shows that it resumed...
20:32 urkk: Yes, from the virtual console
20:32 urkk: See: May 30 22:22:21 hop systemd[1]: Starting Suspend...
20:32 imirkin: weird.
20:32 karolherbst: urkk: what's your desktop?
20:33 urkk: karolherbst: i3+Xorg
20:33 karolherbst: urkk: mind sharing your full dmesg?
20:36 urkk: https://privatebin.net/?cca88556e415718e#J1JbiAS5zwmq36hWgWDrDG8XGcmUEudAG1fqZpFcJFKD
20:37 karolherbst: urkk: do you know if we power down the GPU on your system?
20:37 karolherbst: cat /sys/kernel/debug/vgaswitcheroo/switch might show it
20:38 karolherbst: but from a quick look it doesn't look like it
20:38 karolherbst: just wanting to make sure
20:38 urkk: Doesn't exist
20:38 karolherbst: mhhh
20:38 urkk: sudo cat /sys/kernel/debug/vgaswitcheroo/switch
20:38 urkk: cat: /sys/kernel/debug/vgaswitcheroo/switch: No such file or directory
20:39 urkk: You mean if I can tell by the fans or something?
20:39 karolherbst: and "cat /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_status" returns "active"?
20:39 urkk: yep
20:40 karolherbst: okay
20:40 karolherbst: so at least it shouldn't be that.. mhh
20:40 karolherbst: I am out of ideas..
20:40 karolherbst: well, logs might help
20:40 urkk: Notice that this dmesg is without attempting to suspend
20:40 karolherbst: yeah
20:40 urkk: > imirkin: there's a PM debug thing you can enable
20:40 RSpliet: urkk: just to be sure, what are these kernel options for -> intel_pstate=disable intel_idle.max_cstate=1 ?
20:40 urkk: Maybe I can try that
20:41 urkk: RSpliet: hah that's going to be funny
20:41 RSpliet: Could do with a laugh during times like these
20:41 urkk: It's because if I let the cpu go beyond cstate 1 y whines
20:41 urkk: it*
20:41 karolherbst: uff
20:41 karolherbst: urkk: when was the last time you tested it?
20:41 RSpliet: as in, coil whine?
20:41 karolherbst: might be fixed by now
20:42 urkk: Probably a year ago
20:42 karolherbst: mhh
20:42 urkk: Yeah, I can try
20:42 urkk: Do you think it may be related?
20:42 karolherbst: dunno, could be, could be not
20:42 karolherbst: we don't know :p
20:42 urkk: RSpliet: yes coil whine, is kind of common in this CPU
20:42 urkk: let me try :D
20:43 karolherbst: I had a powermac once for which there was a developer extension to enable power savings on the CPU. But once you enabled it you got noise on the audio output
20:43 karolherbst: .. super weird
20:46 urkk: I observed no changes in the same tests
20:46 RSpliet: http://forum.notebookreview.com/threads/studio-xps-1647-cpu-whine.544428/
20:46 urkk: From the vtconsole can suspend, from Xorg hangs
20:46 RSpliet: Wow... how were we ever under the impression that Intel could make good CPUs?
20:46 urkk: And the whine is still there
20:46 karolherbst: urkk: mhh, yeah.. I guess netconsole should help us here
20:47 urkk: Let see if I can setup netconsole
20:47 karolherbst: urkk: just make sure to execute "dmesg -n 8" before testing netconsole
20:47 karolherbst: otherwise you end up like me
20:47 karolherbst: trying for an hour and getting annoyed that I don't receive anything
20:47 urkk: :D
20:47 karolherbst: urkk: usually you can leave a lot of stuff out
20:48 karolherbst: netconsole=@/$interface,6667@$target_machine_ip/$target_mac_first_hop
20:48 urkk: I'm connected to my wifi router via an USB dongle
20:49 karolherbst: yeah...
20:49 urkk: Not sure if that is ok
20:49 karolherbst: it is
20:49 karolherbst: but then it depends on the router
20:49 urkk: But I can use a crossover cable to my other laptop otherwise
20:49 karolherbst: if clients can connect to each other directly, you need the target machine's mac address
20:49 karolherbst: if not, you need the routers mac address
20:49 karolherbst: check /sbin/arp -n
20:49 karolherbst: if the target machine is listed there then you use its mac address otherwise the routers one
20:50 karolherbst: ip always from the target
20:50 karolherbst: and $interface is the name of the wifi one
20:51 karolherbst: and on the target you do something like "nc -uklp 6667"
20:51 karolherbst: and it should start printing when it receives stuff
20:51 karolherbst: in case you have a stupid netcat
20:52 karolherbst: do this instead: nc -klu -p 6667 --sh-exec "cat > /proc/$$/fd/1"
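The netconsole recipe above can be put together as follows. This is a rough sketch: the helper name `netconsole_param` and every example value (interface, IP, MAC) are placeholders rather than values from the log, and it assumes netconsole is built as a module and loaded with `modprobe`.

```shell
#!/bin/sh
# Hypothetical helper: assemble the netconsole parameter in the form
# karolherbst gives above, leaving source port and source IP empty.
netconsole_param() {
    iface=$1
    target_ip=$2
    target_mac=$3
    port=${4:-6667}
    printf 'netconsole=@/%s,%s@%s/%s\n' "$iface" "$port" "$target_ip" "$target_mac"
}

# Example values below are placeholders, not taken from the log.
# On the machine being debugged (as root):
#   dmesg -n 8
#   modprobe netconsole "$(netconsole_param wlan0 192.168.1.20 aa:bb:cc:dd:ee:ff)"
# On the receiving machine:
#   nc -uklp 6667
```

The MAC address is that of the first hop: the target machine itself if `arp -n` lists it, otherwise the router.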
20:52 RSpliet: urkk: sorry for the distraction then :-) You may be able to try other max_cstate values, depending on your ACPI table max_cstate=2 could also solve the whine while allowing your laptop to go into a slightly deeper CState (C3 for some processors) - saving a bit more power/battery. But perhaps focus on one issue at a time!
20:54 urkk: karolherbst: I piped `dmesg -w | nc $ip 6667` to test and looks like is not buffered
20:54 karolherbst: urkk: yeah.. but this could break up too soon
20:54 karolherbst: you can try though
20:54 karolherbst: maybe that's enough
20:54 urkk: Also mac shows in `arp -n`
20:57 urkk: I rename my interfaces with udev rules, I'm assuming the name is the one I set
20:57 karolherbst: mhhh...
20:57 karolherbst: should work yes
20:57 karolherbst: if not you'd notice :p
20:58 karolherbst: but yeah.. I also always use the udev set names
21:00 urkk: Let see! :)
21:06 urkk: No messages
21:06 urkk: Also dmesg | grep netconsole doesn't show anything
21:07 urkk: (Apart from the command line)
21:08 urkk: Oh, I needed to load the module :P
21:12 urkk: Okay, so I'm getting some telemetry haha
21:13 urkk: http://ix.io/2nRK
21:13 urkk: Aaand thats all :/
21:14 urkk: Maybe "Freezing remaining freezable tasks" includes wpa_supplicant
21:15 urkk: karolherbst: I assume I should expect more output (?) (I set dmesg to level 8 before attempting to suspend)
21:22 karolherbst: urkk: that's over netconsole?
21:23 urkk: Yes
21:23 urkk: Using the ethernet cable I get one more line
21:23 urkk: printk: Suspending console(s) (use no_console_suspend to debug)
21:29 karolherbst: mhhh
21:30 karolherbst: urkk: mind trying with "echo N | sudo tee /sys/module/printk/parameters/console_suspend" ?
21:30 karolherbst: this should prevent suspending the consoles
21:31 karolherbst: but normally oopses are still shown
21:31 karolherbst: weird...
21:31 urkk: I booted with no_console_suspend=1
21:32 urkk: Not sure if is the same
21:32 urkk: But only gained one extra line
21:32 urkk: ath0: deauthenticating from xxxx by local choice (Reason: 3=DEAUTH_LEAVING)
21:32 karolherbst: ehh :/
21:32 karolherbst: annoying
21:33 karolherbst: maybe Lyude has any ideas on how to debug this?
21:33 karolherbst: I am out of ideas
21:33 karolherbst: urkk: but I guess the network gets torn down before suspending
21:34 karolherbst: but that shouldn't matter really
21:34 urkk: Note I'm using ethernet now
21:34 karolherbst: at least not for ethernet
21:34 karolherbst: yeah.. it's weird
21:36 urkk: Same with `dmesg -n debug`
21:36 karolherbst: urkk: I guess you also don't see anything in the journal?
21:36 karolherbst: normally disc should get synced if it succeeded
21:36 karolherbst: but mhh
21:37 karolherbst: urkk: mind checking if the ssh connection holds over ethernet?
21:37 karolherbst: maybe we just loop somewhere for eternity
21:38 karolherbst: and simply block the kernel
21:38 urkk: journal misses the last 5 lines
21:38 karolherbst: urkk: do you know if arch has a debug kernel with hangcheck and all that random stuff enabled?
21:38 karolherbst: maybe suspending with a debug kernel booted and waiting for a few minutes will give you more lines
21:39 karolherbst: hangcheck is what you really want to be sure is enabled
21:39 urkk: If the kernel is alive, shouldn't my caps key work?
21:40 urkk: hangcheck is set as a module
21:45 urkk: haha I love the module description: "detects when the system has gone out to lunch past a certain margin"
21:45 karolherbst: :)
21:46 urkk: I set hangcheck_dump_tasks to one
21:47 urkk: tick 60s, margin 30s
21:51 urkk: Oh, ethernet leds are off
21:51 karolherbst: :/
21:52 karolherbst: urkk: might be that the ethernet device gets shutdown due to suspending
21:52 karolherbst: and.. well
21:52 karolherbst: I guess at this point maybe only a real serial console would help :(
21:52 karolherbst: *sigh*
21:52 urkk: I have a serial adapter
21:53 karolherbst: ahh cool
21:53 urkk: But not sure where to plug it
21:53 karolherbst: don't know if usb could get shut down as well though
21:53 karolherbst: ohh
21:53 karolherbst: you mean a proper one
21:53 karolherbst: mhh
21:53 urkk: Is usb to uart
21:53 karolherbst: yeah.. can't help you with that
21:53 karolherbst: normally there are pins on the board
21:53 karolherbst: and sometimes it's even documented
21:53 urkk: haha
21:54 karolherbst: but usually you have to solder the pins yourself :p
21:54 karolherbst: it's a mess
21:54 urkk: But I can build a usb to usb serial if you meant that
21:54 urkk: Yeah, no problem with soldering :P
21:54 karolherbst: I am not sure.. because if everything works when suspending.. stuff gets shut down
21:54 RSpliet: Is this a laptop with a docking port?
21:54 urkk: RSpliet: nope
21:55 RSpliet: Shame, the "serial port" pins are on the inside! :-D
21:55 karolherbst: I sadly have no experience with debugging suspend related issues that deeply
21:55 karolherbst: I think imirkin had some hints on what one could do in such cases
21:56 urkk: Maybe another approach is to bisect until I find which version worked
21:56 RSpliet: Yeah ideally you'd just want the resume log entries dumped to your harddrive, and reboot with like the magic sysrq combo
21:56 RSpliet: bisect is good
21:56 karolherbst: urkk: I assume that the suspend issue might be there forever
21:56 karolherbst: because you added the workaround for a reason I guess
21:57 RSpliet: Oh right, yes bisect is bad
21:57 karolherbst: the reason the workaround might not work is just userspace changing how it does suspend and keeps a reference on the driver
21:57 karolherbst: which .. is fine
21:57 urkk: no no
21:57 karolherbst: so that wouldn't help at all
21:57 urkk: The workaround never worked
21:57 karolherbst: urkk: ohh, I see
21:57 urkk: I added it today just in case
21:57 karolherbst: ahhhh
21:57 karolherbst: okay
21:57 karolherbst: yeah
21:57 karolherbst: then a kernel bisect would help
21:57 urkk: But I think it was introduced in 5.4
21:57 karolherbst: I thought that was out of question
21:57 karolherbst: my mistake then
21:58 karolherbst: urkk: yeah.. in this case please go ahead and git bisect the kernel...
21:58 karolherbst: it's... painful though
21:58 karolherbst: so it can take a day
21:58 urkk: yeah, but that would be for another day :D
21:59 RSpliet: With a bit of luck you can limit your search to drm only
21:59 RSpliet: Also: any chance you can try like a 5.7 release candidate kernel before you embark on your search
22:00 karolherbst: urkk: ohh.. did you try to suspend without nouveau?
22:00 RSpliet: Y'know, just in case the bug was fixed
22:00 karolherbst: I am wondering why you think it's a nouveau bug
22:00 karolherbst: urkk: mind checking booted with nouveau.modeset=0?
22:01 karolherbst: if it still fails, it's probably not a nouveau bug at all
22:01 karolherbst: as.. at this point we really don't know anything about the issue
22:01 urkk: RSpliet: I can try 5.7 yes
22:02 urkk: karolherbst: suspending with nouveau without xorg loaded works fine
22:02 urkk: In fact is my current workaround, exit i3 and then suspend from the command line
22:02 karolherbst: urkk: sure, but that doesn't say anything about suspending with xorg while nouveau isn't loaded :p
22:02 urkk: Didn't test without nouveau though
22:02 urkk: Ah okk
22:02 RSpliet: sounds easy enough to test
22:04 urkk: 5.3.8 suspends fine from Xorg
22:04 RSpliet: urkk: I don't see any obvious changes in 5.7 that may make a difference, to be perfectly honest. I'm just shouting generic advice here :-)
22:05 urkk: Should I try with nouveau.modeset or continue ~bisect?
22:07 RSpliet: If you're already bisecting I'd go for it!
22:09 urkk: Yeah, bisect in 2 tries
22:09 urkk: 5.4.1 fails
22:10 imirkin: a bisect would typically use 'git'
22:10 urkk: I have no more packages in the cache to try
22:10 imirkin: what you're doing is more like "trying random versions"
22:10 imirkin: which is fine for finding a "good" version
22:12 urkk: I'm first trying the versions I used
22:12 urkk: it looks like something bad happened between 5.3.8 and 5.4.1
22:13 urkk: As git bisect will take me a while :P
22:17 RSpliet: Yeah that helps. Unlucky for you, kernel 5.4 had quite a lot of movement in nouveau.
22:17 urkk: karolherbst: Also, suspending from Xorg with nouveau.modeset=0 works fine with 5.6.15
22:18 imirkin: modeset=0 == "don't load nouveau, basically"
22:19 RSpliet: patches like "nouveau: simplify nouveau_dmem_migrate_to_ram" sound... suspicious
22:19 imirkin: anything with dmem is unlikely to matter for tesla
22:19 imirkin: unless it accidentally the whole thing
22:21 urkk: Damn, now I cannot go back from suspension (using modeset=0)
22:21 RSpliet: I'm still wondering what verb that sentence required :-)
22:23 imirkin: it's a reference to an old thing... let me see if i can find it
22:24 imirkin: https://external-preview.redd.it/iFlGDZ-uNRbadP13XtM_9cGyHOni-Cn7rk0C0DvUhvM.png?auto=webp&s=61b91ece585f4adbe6c65482552afaf6db9f9295
22:25 imirkin: someone messing with a help line, basically
22:28 RSpliet: TIL
22:29 imirkin: i think i saw it in like 2008 or so. might be older though.
22:32 urkk: I will try to bisect it properly with git in the following days, and tell you if I find the first commit
22:32 urkk: thanks for the help :)
22:33 urkk: first *bad commit
22:33 RSpliet: Thanks! And good luck
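The git-based bisect discussed above, including RSpliet's idea of limiting the search to drm, could be started roughly like this. It is only a sketch: the helper name `start_kernel_bisect`, the tree location, and the choice of the mainline tags `v5.3` and `v5.4` are assumptions, based on 5.3.8 suspending fine and 5.4.1 failing.

```shell
#!/bin/sh
# Hypothetical helper: start a bisect in a kernel git tree between a known
# good and a known bad revision, optionally restricted to one subdirectory.
start_kernel_bisect() {
    tree=$1
    good=$2
    bad=$3
    path=${4:-}
    if [ -n "$path" ]; then
        # Only commits touching $path are considered as candidates.
        git -C "$tree" bisect start "$bad" "$good" -- "$path"
    else
        git -C "$tree" bisect start "$bad" "$good"
    fi
}

# Example (not run here; tree location and tags are assumptions):
#   start_kernel_bisect ~/src/linux v5.3 v5.4 drivers/gpu/drm
# Then build and boot the revision git checks out, test suspend from Xorg,
# and report the result with "git bisect good" or "git bisect bad" until
# git prints the first bad commit.
```

Restricting the path makes the bisect shorter, but it misses the culprit if the regression came from outside `drivers/gpu/drm`. In that case the bisect has to be repeated without the path argument.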