02:03 karolherbst: imirkin: mhhh.. so the annoying thing about those shader errors is that at some point they just go away :(
02:04 imirkin_: when something gets reuploaded?
02:04 karolherbst: well.. it takes several seconds
02:04 imirkin_: do you know if there's tess going on?
02:04 karolherbst: no
02:04 karolherbst: only vp + fp
02:04 imirkin_: hrmph.
02:04 imirkin_: oh
02:04 imirkin_: do you disable the "other" shaders?
02:04 karolherbst: several seconds is like 4 or 5
02:04 karolherbst: imirkin_: what do you mean by that?
02:04 imirkin_: bleh, cgit is still dead
02:04 karolherbst: I essentially do the same thing as on screen create and mark the entire context state as dirty
02:05 imirkin_: nvc0_tevlprog_validate
02:05 imirkin_: BEGIN_NVC0(push, NVC0_3D(MACRO_TEP_SELECT), 1);
02:05 imirkin_: PUSH_DATA (push, 0x30);
02:05 imirkin_: etc
02:05 imirkin_: ah
02:05 karolherbst: yeah, nvc0_tevlprog_validate is being called by state_validate
02:05 karolherbst: anyway
02:05 karolherbst: 0x22 means fp+vp
02:06 imirkin_: yeah, that runs
02:06 karolherbst: it's a bitmask of enabled stages
02:06 imirkin_: ah ok
02:06 imirkin_: which gpu?
02:06 karolherbst: gm206
02:07 imirkin_: does it matter? hm, probably not
02:07 imirkin_: i was thinking some weird shader header issue
02:07 karolherbst: not sure
02:07 karolherbst: for reproducing those crashes it does :D
02:07 imirkin_: but it should all be the same throughout
02:07 karolherbst: yeah..
02:07 karolherbst: the shaders are fine
02:07 imirkin_: (or at least same enough)
02:07 karolherbst: it's not like new shaders are getting created
02:07 imirkin_: fermi and kepler+ have pretty different compute dispatch
02:07 karolherbst: I just think there is a state mismatch between the shaders we have and what the GPU state is
02:08 imirkin_: speaking of compute
02:08 karolherbst: no compute either
02:08 imirkin_: do you reinit the compute stuff too?
02:08 karolherbst: sure
02:08 karolherbst: even calling nvc0_screen_init_compute
02:08 imirkin_: mmkay
02:09 karolherbst: the only thing which is different from a normal screen create is creating those bufctx and bos
02:09 karolherbst: and memory allocations and the like
02:09 karolherbst: I don't call nvc0_blitter_create though, but I hope that doesn't matter...
02:10 karolherbst: yeah.. shouldn't
02:10 karolherbst: I mean.. there are no rendering artefacts or anything
02:10 karolherbst: it looks perfect apart from those short hangs and the slowdowns from interrupt hell
02:11 karolherbst: it's probably something only set through an API call
02:11 karolherbst: and the application just "fixes" the state after a while
02:13 karolherbst: now it took 0.1 seconds to get repaired..
02:14 karolherbst: at least the error is always the same... 0x44020e and 0x04020e
02:14 karolherbst: *sigh*
02:16 karolherbst: mhhhh
02:16 karolherbst: maybe I should call nvc0_upload_tsc0?
02:18 karolherbst: sometimes I did see "black texture" issues... but .. let's see
02:19 karolherbst: mhhh
02:19 karolherbst: doesn't fix it either
02:22 karolherbst: imirkin_: or maybe it's just that query stuff
02:22 karolherbst: the game uses occlusion queries
02:22 imirkin_: that's about buffer contents though
02:23 imirkin_: ohhh
02:23 imirkin_: but
02:23 karolherbst: welll
02:23 karolherbst: nvc0_render_condition ;)
02:23 imirkin_: it needs to be rebound
02:23 imirkin_: tsc0 has nothing to do with render conditions
02:23 karolherbst: yeah..
02:23 karolherbst: I know
02:23 karolherbst: but textures are fine
02:23 karolherbst: I tried that
02:23 imirkin_: it only matters for texelFetch
02:23 karolherbst: didn't fix things
02:23 imirkin_: oh also
02:23 imirkin_: it's only for fermi :)
02:23 karolherbst: right
02:23 karolherbst: anyway
02:24 karolherbst: looking at nvc0_render_condition now
02:24 karolherbst: and it does touch state
02:24 imirkin_: (maybe?) i forget.
02:24 imirkin_: yeah, it sets the render condition.
02:24 imirkin_: it's stored in nvc0->cond_cond & co
02:24 karolherbst: let's see if after calling it, the state gets fixed
02:24 imirkin_: remember that the render condition goes across multiple draws
02:24 karolherbst: ehhh
02:24 karolherbst: doesn't get called
02:25 karolherbst: could be anything inside nvc0_query_hw.c though
02:26 karolherbst: ehh
02:26 karolherbst: nvc0_hw_begin_query gets called way too often
02:27 karolherbst: I think something funky with queries is probably the best bet atm
02:29 karolherbst: PIPE_QUERY_OCCLUSION_COUNTER is the only query type used as it seems
02:30 karolherbst: question is.. why would a shader care?
02:33 karolherbst: ehhhhhhhh
02:35 imirkin_: welll
02:35 imirkin_: yea dunno
02:37 karolherbst: imirkin_: nvc0 before resetting it: https://gist.githubusercontent.com/karolherbst/b8096a1f6637130b3b7c2ca006a286f9/raw/380953f484d948003ccbe60d80abbf2e164780f9/gistfile1.txt
02:38 karolherbst: ehh
02:38 karolherbst: seems like some stuff is already set to dirty..
02:38 karolherbst: damn gdb
02:38 imirkin_: "some"?
02:38 imirkin_: aka "all"?
02:38 karolherbst: meant fields
02:39 karolherbst: tfbbuf_dirty isn't set :)
02:39 karolherbst: but nvc0->state is still in the pre reset state
02:39 imirkin_: huh? it's set to 0xf?
02:39 karolherbst: what?
02:39 imirkin_: tfbbuf_dirty = 15 '\017',
02:39 karolherbst: yeah
02:40 karolherbst: ohh
02:40 karolherbst: mhh
02:40 karolherbst: guess that never gets cleared
02:40 imirkin_: well dirty_3d/cp are also 0xffffffff
02:40 karolherbst: yeah.. let me dump new data without being messed up
02:40 karolherbst: used the wrong line with until
02:41 karolherbst: imirkin_: https://gist.githubusercontent.com/karolherbst/4d7a795e340fe8e63810d5c146901f4c/raw/e0208d23038d58c1daf6a7943e43257d7edf3fa0/gistfile1.txt
02:42 karolherbst: I guess tfbbuf_dirty stays dirty as long as there is no tfb bound
02:42 karolherbst: actually....
02:42 karolherbst: is tfbbuf_dirty ever set to 0?
02:43 karolherbst: :O
02:43 karolherbst: uhm
02:43 karolherbst: I think we never clear tfbbuf_dirty
02:53 karolherbst: imirkin_: :D okay, I am super confident it's texture/sampler related
02:53 karolherbst: I actually played the game a little and was able to trigger READ faults on textures
02:53 karolherbst: also was able to see corrupted textures and the like
02:55 karolherbst: even by looking around I was able to get those sph errors to start again
02:56 karolherbst: it survived like 100 channel resets though:D but now X crashed
02:56 karolherbst: but yeah... maybe the sampler state is a little messed up or something
02:57 karolherbst: oh well.. maybe I'll figure it out tomorrow
02:58 karolherbst: synced pb submissions prevent those cpu_prep stalls btw... so I guess "dead" pb submissions are causing the issue here
03:28 imirkin_: just feed a tsc cache flush or something
05:58 glennk: does NV40_3D_MRT_BLEND_ENABLE actually work for nv40?
07:37 imirkin_: glennk: no clue. seems like it should, no?
07:38 glennk: no errors reported, but doesn't appear to do anything
09:22 karolherbst: imirkin_: yeah.. I guess
09:35 karolherbst: well.. it's not a TSC_FLUSH at least which is missing
09:43 karolherbst: duh....
09:43 karolherbst: as no texture/sampler state changes it also doesn't get flushed
10:10 karolherbst: ehh..
10:10 karolherbst: maybe I should look from a different perspective on this
10:10 karolherbst: disable stuff so I get the same errors before we reset the context
11:44 karolherbst: disabling all state validate thingies gives me 0x040800 ...
13:44 KitsuWhooa: It looks like suspend is broken on a GeForce 8400 GS (G86) after updating to mainline kernel 5.13.9. It was working fine on 5.4 that shipped with ubuntu 20.04. Any idea what could be going wrong?
13:44 KitsuWhooa: https://tasossah.com/txt/nouveau_suspend_trace
13:44 KitsuWhooa: I'd like to avoid a full bisect between the two versions, given how long it'd take to compile a kernel on this computer
13:48 karolherbst: KitsuWhooa: yeah.. not quite sure. Can you install other kernel versions and try with those?
13:48 karolherbst: sadly ubuntu makes it hard to install random kernel versions even though they have the deb files for it :(
13:48 KitsuWhooa: Yeah, they bumped up the libc version requirement and you can only install the new kernels on 21.04
13:49 KitsuWhooa: I'll find their last mainline build that works with 20.04 and try that
13:49 karolherbst: okay, cool
13:51 glennk: funny, resume isn't working on 5.4 for the nv43 i'm currently testing with
13:52 KitsuWhooa: whoops :p
13:52 glennk: and later kernels have working resume but something is going wrong in ttm somewhere leading to random hangs
13:52 karolherbst: yeah.. that stuff is quite annoying
13:54 karolherbst: I can check with my Tesla GPUs, but atm busy with other things :/
13:58 karolherbst: imirkin_: fun... disabling occlusion queries makes the shader errors go away.. :/
14:01 karolherbst: okay...
14:01 karolherbst: so it's queries for real I guess
14:01 karolherbst: fun that the game even launches though
14:02 karolherbst: as disabling PIPE_CAP_OCCLUSION_QUERY gives us GL 1.4 :)
14:02 KitsuWhooa: So, with 5.8.18 it did suspend and resume after taking a while but it threw these
14:02 KitsuWhooa: [ 146.414682] nouveau 0000:06:00.0: bus: MMIO write of 00000000 FAULT at 00fd94
14:02 KitsuWhooa: [ 146.415274] nouveau 0000:06:00.0: bus: MMIO write of 00000000 FAULT at 103d94
14:03 KitsuWhooa: I'm not sure if they are related to it or not
14:03 karolherbst: KitsuWhooa: those are usually not relevant
14:03 KitsuWhooa: ah, thanks!
14:03 karolherbst: they just mean we read from a register which doesn't exist
14:03 karolherbst: or
14:03 karolherbst: are disabled
14:03 karolherbst: or eh..write to it
14:03 karolherbst: of course they could indicate errors
14:03 karolherbst: but on their own they are usually harmless
14:04 KitsuWhooa: Yup, makes sense
14:04 karolherbst: KitsuWhooa: I assume you can't install 5.9 or newer?
14:05 KitsuWhooa: I was going through the build logs to see if I could tell what I can and can't install but I didn't find anything, so I picked 5.8 at random
14:05 KitsuWhooa: I'll try 5.9 now
14:07 KitsuWhooa: Hopefully I'll narrow it down enough and then do a localmodconfig bisect
14:07 KitsuWhooa: Building a generic kernel on a Q6600 will take forever :p
14:07 karolherbst: KitsuWhooa: I usually use the config files of the distribution kernels, but yeah.. localmodconfig should be better :D
14:08 karolherbst: KitsuWhooa: although that CPU isn't _that_ slow :D
14:09 KitsuWhooa: It's still holding up surprisingly well :p
14:09 karolherbst: my 8700 seems to be 2 to 7 times faster depending on the benchmark
14:09 karolherbst: as compilers usually don't use fancy SSE or AVX thingies, I assume compilation might be 2-3 times slower
14:09 karolherbst: which is still not that bad
14:10 karolherbst: but it also benefits from having tons of cores...
14:10 KitsuWhooa: Yup :p
14:11 karolherbst: but sounds like you can easily achieve +20% through overclocking :D
14:11 karolherbst: oh well...
14:12 KitsuWhooa: I have the not so great stepping that heats up more, so unless I get a better cooler I probably won't bother
14:12 KitsuWhooa: I don't remember the details
14:12 karolherbst: ahh
14:12 karolherbst: annoying
14:16 KitsuWhooa: 5.9 took even longer to suspend, but it did eventually do so and also resumed. There were two of these as well
14:16 KitsuWhooa: [ 98.142284] nouveau 0000:06:00.0: fb: trapped read at 000000b018 on channel -1 [1fed0000 unknown] engine 06 [BAR] client 08 [PFIFO_READ] subclient 00 [FB] reason 00000002 [PAGE_NOT_PRESENT]
14:16 karolherbst: mhhh
14:16 karolherbst: KitsuWhooa: okay... sooooo
14:17 karolherbst: KitsuWhooa: how often did you try with 5.13?
14:17 karolherbst: maybe you just hit a bad timing
14:17 KitsuWhooa: All of the times I tried it seemed to fail
14:17 karolherbst: ahh
14:17 KitsuWhooa: I can try it again
14:17 karolherbst: question is, did you try on purpose or did it only break whenever you tried to suspend?
14:18 KitsuWhooa: First time was because I needed to suspend it, second time was to check if it was a one off
14:19 KitsuWhooa: And then the third and fourth times were the same thing
14:20 karolherbst: okay
14:20 KitsuWhooa: You think it works some of the time with the new kernel and I'm just unlucky? :p
14:20 KitsuWhooa: I can try it a few times
14:20 karolherbst: not sure
14:20 karolherbst: might be
14:20 karolherbst: that's important to know before bisecting :D
14:21 KitsuWhooa: Indeed. I assumed that it was flat out broken but it might not be the case
14:21 KitsuWhooa: Especially since it takes at least a minute to suspend now
14:22 karolherbst: yeah
14:22 KitsuWhooa: It seems like I always get the worst issues to bisect :p
14:25 KitsuWhooa: First attempt to resume, it took 2 minutes and then it gave up and woke back up
14:26 KitsuWhooa: On 5.13.X
14:26 KitsuWhooa: Without me pressing anything
14:26 KitsuWhooa: I'll do it 4 more times I guess
14:45 KitsuWhooa: Yeah, all 5 attempts failed the same way
14:45 karolherbst: imirkin_: .... I figured it out...
14:46 karolherbst: imirkin_: I think the problem is that the queries return garbage :)
14:48 karolherbst: okay... how annoying
15:00 karolherbst: imirkin_: yeah... so if I return 0 for queries started with the old channel, I significantly reduce the amount of errors....
15:01 karolherbst: so I guess the game pushes the value through uniforms or something and some smart algorithm is doing... something
15:02 karolherbst: and that annoys the hw
15:02 karolherbst: if the values are bogus
15:10 KitsuWhooa: That's interesting. 5.11 won't suspend, but it won't give up and wake up either. It stayed there with the fans spinning until I pressed a key, and then the machine rebooted on its own
15:11 karolherbst: :(
15:11 karolherbst: as you see, not many users do actually suspend their machines
15:12 KitsuWhooa: I only suspend my main desktop when the ups battery is running low, so I agree :p
15:12 karolherbst: ah
15:14 KitsuWhooa: Let's try again I guess, because the trace in the logs showed unrelated acpi functions, which is a bit concerning
15:18 KitsuWhooa: Yup. 5.11 suspended fine now :p
15:19 karolherbst: imirkin_: not great, but better than before: https://gist.githubusercontent.com/karolherbst/6058a49704a15124743a6e60fdb153b0/raw/dea79912797bbe84ed940d5339d007e0a4ce9677/gistfile1.txt
15:22 karolherbst: but it is still strange..
15:22 karolherbst: I'll ask nvidia about those errors
15:24 karolherbst: oh wow.. the game survived 88 channels :)
15:24 karolherbst: KitsuWhooa: which GPU was it? 8400?
15:25 KitsuWhooa: Yup, 8400 GS
15:25 KitsuWhooa: I'm currently testing 5.12
15:26 KitsuWhooa: Oooh. 5.12 failed to suspend and woke up
15:26 karolherbst: I'll try to reproduce now
15:26 KitsuWhooa: So it's somewhere between 5.11 and 5.12
15:26 KitsuWhooa: Awesome, thanks! I'll start the proper bisect now
15:26 karolherbst: let's hope my 8600 hits it
15:27 KitsuWhooa: Just keep in mind it might take more than a minute for it to actually suspend :p
15:28 KitsuWhooa: I can bisect that too afterwards but I'm not too fussed about it at the moment
15:28 karolherbst: yeah, no worries :D
15:28 karolherbst: those old GPUs always are so super noisy
15:29 karolherbst: KitsuWhooa: mhh.. mine went off in an instant, but that was without a desktop
15:29 karolherbst: well
15:29 karolherbst: resuming doesn't work out so well though
15:30 KitsuWhooa: Mine has a broken fan and the previous owner bodged a generic fan on it, so it's not that loud :p
15:37 karolherbst: ehhh, rebooted the wrong machine
15:39 KitsuWhooa: I know the feeling :p
15:40 karolherbst: ehh.. yeah.. so at least on my g84 everything is fine
15:40 karolherbst: ohh
15:40 karolherbst: I have a 8400GS
15:41 karolherbst: sometimes you lose the overview
15:42 KitsuWhooa: I have the feeling that the failure is specific to my machine
15:42 KitsuWhooa: Knowing my luck
15:44 karolherbst: KitsuWhooa: or your desktop
15:45 KitsuWhooa: It happens even while in the lightdm greeter
15:45 karolherbst: yeah well.. that 8400 suspends just fine, where the 8600 just failed :)
15:45 karolherbst: doesn't display shit though
15:45 KitsuWhooa: :D
15:45 karolherbst: wait it does
15:46 karolherbst: the blinking cursor
15:46 karolherbst: ewww ugh
15:47 karolherbst: guess restarting gdm helped
15:47 karolherbst: KitsuWhooa: mind trying without any graphical interface?
15:47 KitsuWhooa: sure
15:47 KitsuWhooa: systemctl suspend, right?
15:47 karolherbst: yeah.. but I meant with stopping whatever runs as well
15:48 KitsuWhooa: yup
15:48 karolherbst: so it works for me with gnome and without
15:49 KitsuWhooa: So, without the desktop running, it immediately "woke up" with the nouveau suspend trace
15:49 KitsuWhooa: and the "some devices failed to suspend" message
15:49 karolherbst: ehhh
15:50 KitsuWhooa: Wait. Can you try suspending without any monitors plugged in?
15:50 karolherbst: mind pastebining your dmesg?
15:50 karolherbst: uhm... sure? worst case you use ssh
15:50 KitsuWhooa: No, I meant that I have no monitors plugged in to the nvidia card
15:50 KitsuWhooa: So maybe that's the problem
15:50 karolherbst: ohhhh
15:50 karolherbst: mhhh
15:51 KitsuWhooa: and yeah, sure
15:51 karolherbst: so it's your secondary GPU or something?
15:51 KitsuWhooa: Yeah, it's a secondary card
15:51 karolherbst: mhhh
15:51 karolherbst: it shouldn't matter, but...
15:53 KitsuWhooa: this is the log with the suspend without X running
15:53 KitsuWhooa: https://tasossah.com/txt/nouveau_suspend_kernel_log
15:53 KitsuWhooa: X did initially start on the login screen with lightdm, but I stopped it and then suspended
15:53 karolherbst: well.. my desktop CPU also has an intel GPU :)
15:54 karolherbst: yeah.. works just as well
15:55 karolherbst: heh
15:55 karolherbst: I have a different 8400 GS
15:55 karolherbst: 01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1)
15:55 karolherbst: G98 :)
15:55 KitsuWhooa: 06:00.0 VGA compatible controller: NVIDIA Corporation G86 [GeForce 8400 GS] (rev a1)
15:55 KitsuWhooa: ah :p
15:56 karolherbst: not even the awesome wiki tables know about that combination
15:56 karolherbst: yeah.. that's slightly annoying
15:58 karolherbst: mhh
15:58 karolherbst: KitsuWhooa: guess you need to bisect then
15:58 KitsuWhooa: Yup. Just started
15:58 karolherbst: although "nouveau 0000:06:00.0: fb: trapped read at 0100443034 on channel -1 [1fed0000 unknown] engine 06 [BAR] client 08 [PFIFO_READ] subclient 01 [IN] reason 00000002 [PAGE_NOT_PRESENT]" doesn't look so good
15:59 glennk: one chip is just mounted upside down
15:59 KitsuWhooa: that happened when I stopped lightdm
15:59 karolherbst: ahh
15:59 KitsuWhooa: so it was before I suspended
15:59 karolherbst: I am sometimes wondering if we should just ignore most of the errors and suspend regardless...
16:00 karolherbst: device gets power cycled anyway
17:12 KitsuWhooa: karolherbst: 14 kernel rebuilds later...
17:13 KitsuWhooa: 64f7c698bea9cf84cb224fd4352964c2af7252d9 is the first bad commit
17:13 KitsuWhooa: It doesn't look too helpful...
17:13 karolherbst: bisecting the kernel is _always_ annoying
17:14 karolherbst: I usually need like 2-3 bisects
17:14 karolherbst: KitsuWhooa: at least you ended up with a nouveau commit
17:14 karolherbst: KitsuWhooa: can you verify that the commit before that is absolutely fine?
17:14 KitsuWhooa: Yeah, when it narrowed down to nouveau, I stopped worrying
17:14 KitsuWhooa: I'm going to right now :p
17:17 KitsuWhooa: just to make sure, I should test at ab0db2bd853d4a61bf440d2846b046a1d11ce027
17:17 KitsuWhooa: correct?
17:17 karolherbst: yes
17:17 KitsuWhooa: awesome
17:19 karolherbst: soo nouveau on your GPU obviously used g84_fifo_chan_engine
17:20 karolherbst: where now it uses g84_fifo_engine_id...
17:20 karolherbst: soo
17:20 karolherbst: let's see if some mapping is bonkers
17:20 karolherbst: ehh lol
17:20 karolherbst: it's different
17:21 karolherbst: yeah
17:21 karolherbst: that's broken
17:22 karolherbst: KitsuWhooa: mind making g84_fifo_engine_id return the values there minus 1?
17:22 karolherbst: add an "int ret;", assign it in the switch and just "return ret - 1;" at the end or so
17:23 karolherbst: instead of returning in the switch
17:24 KitsuWhooa: Right, just tested it. I can confirm that that commit works :p
17:24 karolherbst: skeggsb: 64f7c698bea9cf84cb224fd4352964c2af7252d9 is broken :)
17:24 karolherbst: for g84 the new values are one too high
17:24 karolherbst: KitsuWhooa: do you want to fix it yourself or shall I write a patch?
17:25 KitsuWhooa: I wouldn't mind fixing it myself
17:25 KitsuWhooa: give me a bit first
17:25 karolherbst: I just don't know if it fixes other stuff :D
17:25 karolherbst: but should be fine if you test on top of 64f7c698bea9cf84cb224fd4352964c2af7252d9 directly
17:26 KitsuWhooa: Right, let me try doing what you suggested
17:26 KitsuWhooa: it takes a while :p
17:31 KitsuWhooa: So, question. Does that mean that the defines in the header are wrong?
17:32 karolherbst: ehh.... dunno tbh
17:32 KitsuWhooa: Because if that's the case, aside from testing, returning ret - 1 doesn't sound like a good idea :p
17:32 karolherbst: yeah.. let me check
17:33 karolherbst: KitsuWhooa: if you are on top of the 64f7c698bea9cf84cb224fd4352964c2af7252d9 you can also change the defines to have a lower value
17:33 karolherbst: they are new as it seems
17:34 KitsuWhooa: Yeah, I am, but I already did return CONST - 1; for each one in the switch
17:34 KitsuWhooa: faster than making a new variable and assigning it
17:34 karolherbst: okay
17:34 KitsuWhooa: and adding breaks
17:34 karolherbst: :D
17:34 karolherbst: true
17:38 KitsuWhooa: karolherbst: looks like that worked
17:38 karolherbst: okay...
17:38 karolherbst: thanks for figuring out which commit broke it :)
17:39 karolherbst: KitsuWhooa: mind testing on something newer like.. I dunno.. 5.13 with a patch changing the defines?
17:39 karolherbst: or 5.14?
17:39 KitsuWhooa: Yeah, sure thing
17:39 karolherbst: just to make sure that fix doesn't break other things
17:39 karolherbst: cool, thanks
17:39 KitsuWhooa: https://tasossah.com/nouveau_fifo_engine_id.patch
17:39 KitsuWhooa: that's what I did basically
17:40 karolherbst: yeah
17:40 karolherbst: good enough for a quick test :)
17:42 KitsuWhooa: hmmm
17:42 KitsuWhooa: nvkm/engine/fifo/channv50.h:#define G84_FIFO_ENGN_SW 0
17:42 karolherbst: mhhh
17:43 KitsuWhooa: I get the feeling that's not meant to be -1 :p
17:43 karolherbst: KitsuWhooa: SW is essentially fake stuff
17:43 karolherbst: just use... I dunno 15 or the biggest one
17:43 KitsuWhooa: aaaah I see
17:43 karolherbst: other archs use 15
17:44 KitsuWhooa: 15 sounds good then
17:44 karolherbst: yeah... not 100% sure on this though... things are a bit .. weird
17:45 karolherbst: skeggsb: would probably know best how to fix it
17:45 karolherbst: he might be up in 4 hours or so :D
17:45 KitsuWhooa: NV50 is 0 as well
17:45 KitsuWhooa: Yeah, fair enough. I'll play with it, and if I get the OK I might make a patch
17:45 karolherbst: yeah...
17:46 karolherbst: might be the numbers being different in various places ¯\_(ツ)_/¯
17:46 KitsuWhooa: Given how much time I spent on this I'd like to at least submit a patch, but eh :p
17:47 karolherbst: I don't want to say that those old GPUs are bonkers in a lot of ways, so let's just say they are less consistent :D
17:48 KitsuWhooa: I think bonkers sounds about right
17:51 KitsuWhooa: Hmmm. I tried logging in and the whole computer froze
17:51 karolherbst: yeah...
17:51 karolherbst: I guess newer patches mess something up
17:51 karolherbst: KitsuWhooa: I have a different idea
17:52 karolherbst: KitsuWhooa: on top of a new kernel, just add an -1 after the call to engine_id inside g84_fifo_chan_engine_addr
17:52 KitsuWhooa: I didn't even get around to testing the define edits because the latest HEAD won't build
17:52 karolherbst: ohh
17:52 karolherbst: annoying
17:52 KitsuWhooa: I tried logging in to look at the config changes I have to make that I don't remember off the top of my head, and it froze
17:53 KitsuWhooa: I really should get a monitor and plug it in to that card to make sure it actually works, too :p
18:07 KitsuWhooa: So, got 5.14.0-rc5 from HEAD built with https://tasossah.com/nouveau_fifo_engine_id2.patch on top
18:08 KitsuWhooa: I stopped lightdm on the login screen, suspended, resumed successfully, and then plugged in a monitor and started X and logged in.
18:08 KitsuWhooa: Looks like it's displaying video just fine
18:09 karolherbst: mhh, cool
18:09 karolherbst: and GL also works and everything?
18:09 KitsuWhooa: DRI_PRIME=1 glxgears renders
18:09 karolherbst: KitsuWhooa: and is it suspending instantly or is there still a delay?
18:12 KitsuWhooa: It suspends instantly without the desktop running. It still takes a while with the desktop running.
18:12 karolherbst: mhhh
18:12 KitsuWhooa: That probably started before 5.8
18:12 karolherbst: yeah...
18:13 karolherbst: if you don't mind you could figure that one out as well :D
18:13 KitsuWhooa: Sure. Only downside is that bisect cycles would take ages due to how long it takes :p
18:14 karolherbst: well.. you already finished one bisect within 2 hours
18:14 karolherbst: I always plan an entire day for that :)
18:15 KitsuWhooa: I told it to suspend a few seconds before I sent the message saying it's still taking a while
18:15 KitsuWhooa: I'm still waiting :P
18:16 KitsuWhooa: I get the feeling that it won't actually suspend now at all
18:16 karolherbst: :(
18:17 KitsuWhooa: I wonder if it's because I left glxgears running
18:17 karolherbst: ehh.. hopefully not, but maybe
18:18 KitsuWhooa: *sigh*
18:20 KitsuWhooa: I give up, I'll press a key
18:21 KitsuWhooa: Yeah I think it's frozen. RIP
18:21 karolherbst: anything in dmesg or just nothing?
18:21 KitsuWhooa: Well, the screens are off as it's still trying to suspend
18:21 KitsuWhooa: I'd need to reboot, and I assume doing that won't write anything to disk
18:22 KitsuWhooa: Maybe I should get out the good 'ol null modem cable and enable the serial console
18:22 karolherbst: I got an RS232 one for my desktop :D
18:22 KitsuWhooa: I'm "lucky" in the sense that both the desktop with the nvidia card and this laptop have RS232 ports
18:23 KitsuWhooa: Anyway, hit the reset button
18:27 KitsuWhooa: https://tasossah.com/uploader/files/nouveau_glxgears.png this is what I was referring to earlier
18:35 karolherbst: I think there is something broken with suspending engines or something.. mhh
18:39 KitsuWhooa: ...oh no
18:40 KitsuWhooa: https://tasossah.com/nouveau.log
18:40 KitsuWhooa: this is why it never suspends now
18:40 karolherbst: uhhh.. oops
18:41 karolherbst: I think that one got fixed
18:41 KitsuWhooa: I can't tell if it's because of my patch, or because of something else
18:41 karolherbst: let me find the commit
18:41 KitsuWhooa: Ah, so it didn't make it to the main repo?
18:41 karolherbst: mhh should have actually
18:42 KitsuWhooa: this is v5.14-rc5-256-g0aa78d17099b
18:42 karolherbst: ehh might have been something else.. let me see
18:43 karolherbst: ehh wait
18:43 karolherbst: no that could be because of your patch actually... the trace is just weird
18:44 karolherbst: a3a9ee4b5254f212c2adaa8cd8ca03bfa112f49d was the fix I was thinking about
18:44 karolherbst: but that's in since rc3
18:47 karolherbst: ehhh
18:47 KitsuWhooa: I'll try this patch on that commit that introduced the issue in the first place
18:47 KitsuWhooa: that'll clear up if it's my patch or something else
18:47 karolherbst: yeah
18:47 KitsuWhooa: it just never ends! :p
18:47 karolherbst: I suspect future changes might depend on the different values
19:02 KitsuWhooa: karolherbst: It took a while but it suspended and resumed with my patch. That said, it spewed these out
19:02 KitsuWhooa: https://tasossah.com/txt/nouveau_suspend_kernel_log_2
19:03 KitsuWhooa: But honestly, I've had enough for now :p
19:03 karolherbst: KitsuWhooa: yeah.. something is not right :D
19:03 karolherbst: maybe skeggsb has a GPU to trigger those issues
19:04 KitsuWhooa: Yeah, I'll wait for their opinion for now, because I'm tired :p
19:45 karolherbst: imirkin: that's the recovery commit btw: https://github.com/karolherbst/mesa/commit/d977a08822a8c68e66c64113e32b6fcf732b699c
19:46 karolherbst: might need some cleanup, but maybe you spot something obviously wrong with it or if we will just have bad luck here
19:46 imirkin_: that seems ... surprisingly non-invasive
19:46 karolherbst: yeah
19:46 karolherbst: did you expect anything else?
19:47 imirkin_: i didn't give it enough thought to expect one thing or another
19:47 karolherbst: well.. I only implemented it for nvc0
19:48 imirkin_: PUSH_KICK (push, screen, screen);
19:48 imirkin_: 
19:48 imirkin_: one screen too many? (nouveau_video.c)
19:48 karolherbst: ehhh
19:48 karolherbst: yeah.. I messed up my sed
19:48 karolherbst: that's guarded by this weirdo #ifdef
19:48 imirkin_: yea
19:49 karolherbst: there are a few issues though.. what happens if the channel dies inside some random PUSH_SPACE command.. ohh... I should check PUSH_SPACE as well
19:49 karolherbst: or what happens if it crashes during validation and stuff
19:50 karolherbst: but I guess we should just restart validation and give up if it crashes again
19:50 karolherbst: but the kernel isn't ready yet anyway
19:52 karolherbst: but with synced pushbuffers I didn't run into this bo_wait problem at least
19:53 karolherbst: anyway.. it will need this libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/188 would be cool to get it in, or I just merge it or something
19:54 imirkin_: karolherbst: btw, double-screen in nv98_video_* as well
19:56 karolherbst: yeah... I will rework the entire patch anyway
19:56 karolherbst: although I doubt I can do anything better for PUSH_KICK?
19:56 karolherbst: mhh
20:03 imirkin_: looks nice overall
20:04 karolherbst: yeah... I am actually surprised how well it works
20:04 imirkin_: well
20:04 imirkin_: the problem is that simulated hangs are different from real ones
20:05 karolherbst: it's not simulated
20:05 karolherbst: the channel dies for real
20:05 imirkin_: sure
21:12 karolherbst: ohh.. I have a better idea on how to deal with recovery :)