02:03karolherbst: imirkin: mhhh.. so the annoying thing about those shader errors is, that at some point they just go away :(
02:04imirkin_: when something gets reuploaded?
02:04karolherbst: well.. it takes several seconds
02:04imirkin_: do you know if there's tess going on?
02:04karolherbst: no
02:04karolherbst: only vp + fp
02:04imirkin_: hrmph.
02:04imirkin_: oh
02:04imirkin_: do you disable the "other" shaders?
02:04karolherbst: several seconds is like 4 or 5
02:04karolherbst: imirkin_: what do you mean by that?
02:04imirkin_: bleh, cgit is still dead
02:04karolherbst: I essentially do the same thing as on screen create and mark the entire context state as dirty
02:05imirkin_: nvc0_tevlprog_validate
02:05imirkin_: BEGIN_NVC0(push, NVC0_3D(MACRO_TEP_SELECT), 1);
02:05imirkin_: PUSH_DATA (push, 0x30);
02:05imirkin_: etc
02:05imirkin_: ah
02:05karolherbst: yeah, nvc0_tevlprog_validate is being called by state_validate
02:05karolherbst: anyway
02:05karolherbst: 0x22 means fp+vp
02:06imirkin_: yeah, that runs
02:06karolherbst: it's a bitmask of enabled stages
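A minimal sketch of how such a stage bitmask could be decoded, assuming bit i of the mask corresponds to shader stage slot i, with slot 1 being the vertex program and slot 5 the fragment program. That slot numbering and the names `stage_names` and `decode_stage_mask` are assumptions made for this example; they are not taken from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical slot names; the numbering is an assumption for this sketch. */
static const char *const stage_names[6] = {
    "vp_a", "vp", "tcp", "tep", "gp", "fp"
};

/* Write a '+'-separated list of the stages enabled in mask into out.
 * Returns the number of stage names written. */
static int decode_stage_mask(unsigned mask, char *out, size_t out_len)
{
    int count = 0;
    size_t used = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';

    for (unsigned i = 0; i < 6; i++) {
        if (!(mask & (1u << i)))
            continue; /* stage slot i is not enabled */

        int n = snprintf(out + used, out_len - used, "%s%s",
                         count ? "+" : "", stage_names[i]);
        if (n < 0 || (size_t)n >= out_len - used)
            break; /* output would not fit; stop appending */

        used += (size_t)n;
        count++;
    }
    return count;
}
```

Under these assumptions, calling `decode_stage_mask(0x22, buf, sizeof buf)` fills `buf` with the two stages mentioned above, since 0x22 has only bits 1 and 5 set.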
02:06imirkin_: ah ok
02:06imirkin_: which gpu?
02:06karolherbst: gm206
02:07imirkin_: does it matter? hm, probably not
02:07imirkin_: i was thinking some weird shader header issue
02:07karolherbst: not sure
02:07karolherbst: for reproducing those crashes it does :D
02:07imirkin_: but it should all be the same throughout
02:07karolherbst: yeah..
02:07karolherbst: the shaders are fine
02:07imirkin_: (or at least same enough)
02:07karolherbst: it's not like new shaders are getting created
02:07imirkin_: fermi and kepler+ have pretty different compute dispatch
02:07karolherbst: I just think there is a state mismatch between the shaders we have and what the GPU state is
02:08imirkin_: speaking of compute
02:08karolherbst: no compute either
02:08imirkin_: do you reinit the compute stuff too?
02:08karolherbst: sure
02:08karolherbst: even calling nvc0_screen_init_compute
02:08imirkin_: mmkay
02:09karolherbst: the only thing which is different from a normal screen create is creating those bufctx and bos
02:09karolherbst: and memory allocations and the like
02:09karolherbst: I don't call nvc0_blitter_create though, but I hope that doesn't matter...
02:10karolherbst: yeah.. shouldn't
02:10karolherbst: I mean.. there are no rendering artefacts or anything
02:10karolherbst: it looks perfect besides those short hangs and slowdowns through interrupt hell
02:11karolherbst: it's probably something only set through an API call
02:11karolherbst: and the application just "fixes" the state after a while
02:13karolherbst: now it took 0.1seconds to get repaired..
02:14karolherbst: at least the error is always equal... 0x44020e and 0x04020e,
02:14karolherbst: *sigh*
02:16karolherbst: mhhhh
02:16karolherbst: maybe I should call nvc0_upload_tsc0?
02:18karolherbst: sometimes I did see "black texture" issues... but .. let's see
02:19karolherbst: mhhh
02:19karolherbst: doesn't fix it either
02:22karolherbst: imirkin_: or maybe it's just that query stuff
02:22karolherbst: the game uses occlusion queries
02:22imirkin_: that's about buffer contents though
02:23imirkin_: ohhh
02:23imirkin_: but
02:23karolherbst: welll
02:23karolherbst: nvc0_render_condition ;_
02:23karolherbst: ;)
02:23imirkin_: it needs to be rebound
02:23imirkin_: tsc0 has nothing to do with render conditions
02:23karolherbst: yeah..
02:23karolherbst: I know
02:23karolherbst: but textures are fine
02:23karolherbst: I tried that
02:23imirkin_: it only matters for texelFetch
02:23karolherbst: didn't fix things
02:23imirkin_: oh also
02:23imirkin_: it's only for fermi :)
02:23karolherbst: right
02:23karolherbst: anyway
02:24karolherbst: looking at nvc0_render_condition now
02:24karolherbst: and it does touch state
02:24imirkin_: (maybe?) i forget.
02:24imirkin_: yeah, it sets the render condition.
02:24imirkin_: it's stored in nvc0->cond_cond & co
02:24karolherbst: let's see if after calling it, the state gets fixed
02:24imirkin_: remember that the render condition goes across multiple draws
02:24karolherbst: ehhh
02:24karolherbst: doesn't get called
02:25karolherbst: could be anything inside nvc0_query_hw.c though
02:26karolherbst: ehh
02:26karolherbst: nvc0_hw_begin_query gets called way too often
02:27karolherbst: I think something funky with queries is probably the best bet atm
02:29karolherbst: PIPE_QUERY_OCCLUSION_COUNTER is the only query type used as it seems
02:30karolherbst: question is.. why would a shader care?
02:33karolherbst: ehhhhhhhh
02:35imirkin_: welll
02:35imirkin_: yea dunno
02:37karolherbst: imirkin_: nvc0 before resetting it: https://gist.githubusercontent.com/karolherbst/b8096a1f6637130b3b7c2ca006a286f9/raw/380953f484d948003ccbe60d80abbf2e164780f9/gistfile1.txt
02:38karolherbst: ehh
02:38karolherbst: seems like some stuff is already set to dirty..
02:38karolherbst: damn gdb
02:38imirkin_: "some"?
02:38imirkin_: aka "all"?
02:38karolherbst: meant fields
02:39karolherbst: tfbbuf_dirty isn't set :)
02:39karolherbst: but nvc0->state is still in the pre reset state
02:39imirkin_: huh? it's set to 0xf?
02:39karolherbst: what?
02:39imirkin_: tfbbuf_dirty = 15 '\017',
02:39karolherbst: yeah
02:40karolherbst: ohh
02:40karolherbst: mhh
02:40karolherbst: guess that never gets cleared
02:40imirkin_: well dirty_3d/cp are also 0xffffffff
02:40karolherbst: yeah.. let me dump new data without being messed up
02:40karolherbst: used the wrong line with until
02:41karolherbst: imirkin_: https://gist.githubusercontent.com/karolherbst/4d7a795e340fe8e63810d5c146901f4c/raw/e0208d23038d58c1daf6a7943e43257d7edf3fa0/gistfile1.txt
02:42karolherbst: I guess tfbbuf_dirty stays dirty as long as there is no tfb bound
02:42karolherbst: actually....
02:42karolherbst: is tfbbuf_dirty ever set to 0?
02:43karolherbst: :O
02:43karolherbst: uhm
02:43karolherbst: I think we never clear tfbbuf_dirty
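For context, a generic sketch of the dirty-bit pattern being discussed: everything is marked dirty, a validate pass re-emits each dirty slot, and the bit is cleared afterwards so the next pass skips it. All names here (`sketch_state`, `sketch_mark_all_dirty`, `sketch_validate`) are hypothetical, and this is not a claim about how the driver handles `tfbbuf_dirty`.

```c
#include <stdint.h>

#define SKETCH_NUM_BUFFERS 4

/* Hypothetical state for illustration; not the driver's real structure. */
struct sketch_state {
    uint8_t buf_dirty;               /* one dirty bit per buffer slot */
    int uploads[SKETCH_NUM_BUFFERS]; /* how often each slot was re-emitted */
};

/* Mark every slot dirty, e.g. after a context reset. */
static void sketch_mark_all_dirty(struct sketch_state *s)
{
    s->buf_dirty = (uint8_t)((1u << SKETCH_NUM_BUFFERS) - 1); /* 0xf for four slots */
}

/* Re-emit every dirty slot and clear its bit afterwards. */
static void sketch_validate(struct sketch_state *s)
{
    for (unsigned i = 0; i < SKETCH_NUM_BUFFERS; i++) {
        if (!(s->buf_dirty & (1u << i)))
            continue;

        s->uploads[i]++;                     /* stand-in for re-emitting the binding */
        s->buf_dirty &= (uint8_t)~(1u << i); /* clear so the next validate skips it */
    }
}
```

If the clearing step were missing, the mask would stay at 0xf (printed by gdb as `15 '\017'`) and every validate pass would re-emit all slots.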
02:53karolherbst: imirkin_: :D okay, I am super confident it's texture/sampler related
02:53karolherbst: I actually played the game a little and was able to trigger READ faults on textures
02:53karolherbst: I was also able to see corrupted textures and the like
02:55karolherbst: even by looking around I was able to get those sph errors to start again
02:56karolherbst: it survived like 100 channel resets though:D but now X crashed
02:56karolherbst: but yeah... maybe the sampler state is a little messed up or something
02:57karolherbst: oh well.. maybe I'll figure it out tomorrow
02:58karolherbst: synced pb submission prevents those cpu_prep stalls btw... so I guess "dead" pb submissions are causing the issue here
03:28imirkin_: just feed a tsc cache flush or something
05:58glennk: does NV40_3D_MRT_BLEND_ENABLE actually work for nv40?
07:37imirkin_: glennk: no clue. seems like it should, no?
07:38glennk: no errors reported, but doesn't appear to do anything
09:22karolherbst: imirkin_: yeah.. I guess
09:35karolherbst: well.. it's not a TSC_FLUSH at least which is missing
09:43karolherbst: duh....
09:43karolherbst: as no texture/sampler state changes it also doesn't get flushed
10:10karolherbst: ehh..
10:10karolherbst: maybe I should look from a different perspective on this
10:10karolherbst: disable stuff so I get the same errors before we reset the context
11:44karolherbst: disabling all state validate thingies gives me 0x040800 ...
13:44KitsuWhooa: It looks like suspend is broken on a GeForce 8400 GS (G86) after updating to mainline kernel 5.13.9. It was working fine on 5.4 that shipped with ubuntu 20.04. Any idea what could be going wrong?
13:44KitsuWhooa: https://tasossah.com/txt/nouveau_suspend_trace
13:44KitsuWhooa: I'd like to avoid a full bisect between the two versions, given how long it'd take to compile a kernel on this computer
13:48karolherbst: KitsuWhooa: yeah.. not quite sure. Can you install other kernel versions and try with those?
13:48karolherbst: sadly ubuntu makes it hard to install random kernel versions even though they have the deb files for it :(
13:48KitsuWhooa: Yeah, they bumped up the libc version requirement and you can only install the new kernels on 21.04
13:49KitsuWhooa: I'll find their last mainline build that works with 20.04 and try that
13:49karolherbst: okay, cool
13:51glennk: funny, resume isn't working on 5.4 for the nv43 i'm currently testing with
13:52KitsuWhooa: whoops :p
13:52glennk: and later kernels have working resume but something is going wrong in ttm somewhere leading to random hangs
13:52karolherbst: yeah.. that stuff is quite annoying
13:54karolherbst: I can check with my Tesla GPUs, but atm busy with other things :/
13:58karolherbst: imirkin_: fun... disabling occlusion queries makes the shader errors go away.. :/
14:01karolherbst: okay...
14:01karolherbst: so it's queries for real I guess
14:01karolherbst: fun that the game even launches though
14:02karolherbst: as disabling PIPE_CAP_OCCLUSION_QUERY gives us GL 1.4 :)
14:02KitsuWhooa: So, with 5.8.18 it did suspend and resume after taking a while but it threw these
14:02KitsuWhooa: [ 146.414682] nouveau 0000:06:00.0: bus: MMIO write of 00000000 FAULT at 00fd94
14:02KitsuWhooa: [ 146.415274] nouveau 0000:06:00.0: bus: MMIO write of 00000000 FAULT at 103d94
14:03KitsuWhooa: I'm not sure if they are related to it or not
14:03karolherbst: KitsuWhooa: those are usually not relevant
14:03KitsuWhooa: ah, thanks!
14:03karolherbst: they just mean we read from a register which doesn't exist
14:03karolherbst: or
14:03karolherbst: are disabled
14:03karolherbst: or eh..write to it
14:03karolherbst: of course they could indicate errors
14:03karolherbst: but on their own they are usually harmless
14:04KitsuWhooa: Yup, makes sense
14:04karolherbst: KitsuWhooa: I assume you can't install 5.9 or newer?
14:05KitsuWhooa: I was going through the build logs to see if I could tell what I can and can't install but I didn't find anything, so I picked 5.8 at random
14:05KitsuWhooa: I'll try 5.9 now
14:07KitsuWhooa: Hopefully I'll narrow it down enough and then do a localmodconfig bisect
14:07KitsuWhooa: Building a generic kernel on a Q6600 will take forever :p
14:07karolherbst: KitsuWhooa: I usually use the config files or the distribution kernels, but yeah.. localmodconfig should be better :D
14:08karolherbst: KitsuWhooa: although that CPU isn't _that_ slow :D
14:09KitsuWhooa: It's still holding up surprisingly well :p
14:09karolherbst: my 8700 seems to be 2 to 7 times faster depending on the benchmark
14:09karolherbst: as compilers usually don't use fancy SSE or AVX thingies, I assume compilation might be 2-3 times slower
14:09karolherbst: which is still not that bad
14:10karolherbst: but it also benefits from having tons of cores...
14:10KitsuWhooa: Yup :p
14:11karolherbst: but sounds like you can easily achieve +20% through overclocking :D
14:11karolherbst: oh well...
14:12KitsuWhooa: I have the not so great stepping that heats up more, so unless I get a better cooler I probably won't bother
14:12KitsuWhooa: I don't remember the details
14:12karolherbst: ahh
14:12karolherbst: annoying
14:16KitsuWhooa: 5.9 took even longer to suspend, but it did eventually do so and also resumed. There were two of these as well
14:16KitsuWhooa: [ 98.142284] nouveau 0000:06:00.0: fb: trapped read at 000000b018 on channel -1 [1fed0000 unknown] engine 06 [BAR] client 08 [PFIFO_READ] subclient 00 [FB] reason 00000002 [PAGE_NOT_PRESENT]
14:16karolherbst: mhhh
14:16karolherbst: KitsuWhooa: okay... sooooo
14:17karolherbst: KitsuWhooa: how often did you try with 5.13?
14:17karolherbst: maybe you just hit a bad timing
14:17KitsuWhooa: All of the times I tried it seemed to fail
14:17karolherbst: ahh
14:17KitsuWhooa: I can try it again
14:17karolherbst: question is, did you try on purpose or did it only break whenever you tried to suspend?
14:18KitsuWhooa: First time was because I needed to suspend it, second time was to check if it was a one off
14:19KitsuWhooa: And then the third and fourth times were the same thing
14:20karolherbst: okay
14:20KitsuWhooa: You think it works some of the time with the new kernel and I'm just unlucky? :p
14:20KitsuWhooa: I can try it a few times
14:20karolherbst: not sure
14:20karolherbst: might be
14:20karolherbst: that's important to know before bisecting :D
14:21KitsuWhooa: Indeed. I assumed that it was flat out broken but it might not be the case
14:21KitsuWhooa: Especially since it takes at least a minute to suspend now
14:22karolherbst: yeah
14:22KitsuWhooa: It seems like I always get the worst issues to bisect :p
14:25KitsuWhooa: First attempt to resume, it took 2 minutes and then it gave up and woke back up
14:26KitsuWhooa: On 5.13.X
14:26KitsuWhooa: Without me pressing anything
14:26KitsuWhooa: I'll do it 4 more times I guess
14:45KitsuWhooa: Yeah, all 5 attempts failed the same way
14:45karolherbst: imirkin_: .... I figured it out...
14:46karolherbst: imirkin_: I think the problem is that the queries return garbage :)
14:48karolherbst: okay... how annoying
15:00karolherbst: imirkin_: yeah... so if I return 0 for queries started with the old channel, I significantly reduce the amount of errors....
15:01karolherbst: so I guess the game pushes the value through uniforms or something and some smart algorithm is doing... something
15:02karolherbst: and that annoys the hw
15:02karolherbst: if the values are bogus
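The workaround described here (returning 0 for queries that were started on a channel that has since been reset) could be sketched roughly as follows. The structures, the generation counter, and all `sketch_*` names are hypothetical and only illustrate the idea; the log does not show how the actual patch tracks which channel a query belongs to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the driver's real structures. */
struct sketch_context {
    uint32_t channel_generation; /* bumped on every channel reset */
};

struct sketch_query {
    uint32_t start_generation; /* generation at the time the query began */
    uint64_t raw_result;       /* value read back from the query buffer */
};

static void sketch_begin_query(const struct sketch_context *ctx,
                               struct sketch_query *q)
{
    q->start_generation = ctx->channel_generation;
    q->raw_result = 0;
}

static void sketch_reset_channel(struct sketch_context *ctx)
{
    ctx->channel_generation++;
}

/* Returns true if the result belongs to the current channel. For a query
 * started before the last reset, report 0 instead of the stale value. */
static bool sketch_get_query_result(const struct sketch_context *ctx,
                                    const struct sketch_query *q,
                                    uint64_t *result)
{
    assert(result != NULL);

    if (q->start_generation != ctx->channel_generation) {
        *result = 0;
        return false;
    }
    *result = q->raw_result;
    return true;
}
```

The point of the design is that the application never sees whatever happens to be in the query buffer after a reset, only a well-defined 0.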
15:10KitsuWhooa: That's interesting. 5.11 won't suspend, but it won't give up and wake up either. It stayed there with the fans spinning until I pressed a key, and then the machine rebooted on its own
15:11karolherbst: :(
15:11karolherbst: as you see, not many users do actually suspend their machines
15:12KitsuWhooa: I only suspend my main desktop when the ups battery is running low, so I agree :p
15:12karolherbst: ah
15:14KitsuWhooa: Let's try again I guess, because the trace in the logs showed unrelated acpi functions, which is a bit concerning
15:18KitsuWhooa: Yup. 5.11 suspended fine now :p
15:19karolherbst: imirkin_: not great, but better than before: https://gist.githubusercontent.com/karolherbst/6058a49704a15124743a6e60fdb153b0/raw/dea79912797bbe84ed940d5339d007e0a4ce9677/gistfile1.txt
15:22karolherbst: but it is still strange..
15:22karolherbst: I'll ask nvidia about those errors
15:24karolherbst: oh wow.. the game survived 88 channels :)
15:24karolherbst: KitsuWhooa: which GPU was it? 8400?
15:25KitsuWhooa: Yup, 8400 GS
15:25KitsuWhooa: I'm currently testing 5.12
15:26KitsuWhooa: Oooh. 5.12 failed to suspend and woke up
15:26karolherbst: I'll try to reproduce now
15:26KitsuWhooa: So it's somewhere between 5.11 and 5.12
15:26KitsuWhooa: Awesome, thanks! I'll start the proper bisect now
15:26karolherbst: let's hope my 8600 hits it
15:27KitsuWhooa: Just keep in mind it might take more than a minute for it to actually suspend :p
15:28KitsuWhooa: I can bisect that too afterwards but I'm not too fussed about it at the moment
15:28karolherbst: yeah, no worries :D
15:28karolherbst: those old GPUs always are so super noisy
15:29karolherbst: KitsuWhooa: mhh.. mine went off in an instant, but that was without a desktop
15:29karolherbst: well
15:29karolherbst: resuming doesn't work out so well though
15:30KitsuWhooa: Mine has a broken fan and the previous owner bodged a generic fan on it, so it's not that loud :p
15:37karolherbst: ehhh, rebooted the wrong machine
15:39KitsuWhooa: I know the feeling :pp
15:39KitsuWhooa: *:p
15:40karolherbst: ehh.. yeah.. so at least on my g84 everything is fine
15:40karolherbst: ohh
15:40karolherbst: I have a 8400GS
15:41karolherbst: sometimes you lose the overview
15:42KitsuWhooa: I have the feeling that the failure is specific to my machine
15:42KitsuWhooa: Knowing my luck
15:44karolherbst: KitsuWhooa: or your desktop
15:45KitsuWhooa: It happens even while in the lightdm greeter
15:45karolherbst: yeah well.. that 8400 suspends just fine, where the 8600 just failed :)
15:45karolherbst: doesn't display shit though
15:45KitsuWhooa: :D
15:45karolherbst: wait it does
15:46karolherbst: the blinking cursor
15:46karolherbst: ewww ugh
15:47karolherbst: guess restarting gdm helped
15:47karolherbst: KitsuWhooa: mind trying without any graphical interface?
15:47KitsuWhooa: sure
15:47KitsuWhooa: systemctl suspend, right?
15:47karolherbst: yeah.. but I meant with stopping whatever runs as well
15:48KitsuWhooa: yup
15:48karolherbst: so it works for me with gnome and without
15:49KitsuWhooa: So, without the desktop running, it immediately "woke up" with the nouveau suspend trace
15:49KitsuWhooa: and the "some devices failed to suspend"
15:49KitsuWhooa: message
15:49karolherbst: ehhh
15:50KitsuWhooa: Wait. Can you try suspending without any monitors plugged in?
15:50karolherbst: mind pastebining your dmesg?
15:50karolherbst: uhm... sure? worst case you use ssh
15:50KitsuWhooa: No, I meant that I have no monitors plugged in to the nvidia card
15:50KitsuWhooa: So maybe that's the problem
15:50karolherbst: ohhhh
15:50karolherbst: mhhh
15:51KitsuWhooa: and yeah, sure
15:51karolherbst: so it's your secondary GPU or something?
15:51KitsuWhooa: Yeah, it's a secondary card
15:51karolherbst: mhhh
15:51karolherbst: it shouldn't matter, but...
15:53KitsuWhooa: this is the log with the suspend without X running
15:53KitsuWhooa: https://tasossah.com/txt/nouveau_suspend_kernel_log
15:53KitsuWhooa: X did initially start on the login screen with lightdm, but I stopped it and then suspended
15:53karolherbst: well.. my desktop CPU also has an intel GPU :)
15:54karolherbst: yeah.. works just as well
15:55karolherbst: heh
15:55karolherbst: I have a different 8600 GS
15:55karolherbst: 01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1)
15:55karolherbst: G98 :)
15:55KitsuWhooa: 06:00.0 VGA compatible controller: NVIDIA Corporation G86 [GeForce 8400 GS] (rev a1)
15:55KitsuWhooa: ah :p
15:56karolherbst: not even the awesome wiki tables know about that combination
15:56karolherbst: yeah.. that's slightly annoying
15:58karolherbst: mhh
15:58karolherbst: KitsuWhooa: guess you need to bisect then
15:58KitsuWhooa: Yup. Just started
15:58karolherbst: although "nouveau 0000:06:00.0: fb: trapped read at 0100443034 on channel -1 [1fed0000 unknown] engine 06 [BAR] client 08 [PFIFO_READ] subclient 01 [IN] reason 00000002 [PAGE_NOT_PRESENT]" that doesn't look so well
15:59glennk: one chip is just mounted upside down
15:59KitsuWhooa: that happened when I stopped lightdm
15:59karolherbst: ahh
15:59KitsuWhooa: so it was before I suspended
15:59karolherbst: I am sometimes wondering if we should just ignore most of the errors and suspend regardless...
16:00karolherbst: device gets power cycled anyway
17:12KitsuWhooa: karolherbst: 14 kernel rebuilds later...
17:13KitsuWhooa: 64f7c698bea9cf84cb224fd4352964c2af7252d9 is the first bad commit
17:13KitsuWhooa: It doesn't look too helpful...
17:13karolherbst: bisecting the kernel is _always_ annoying
17:14karolherbst: I usually need like 2-3 bisects
17:14karolherbst: KitsuWhooa: at least you ended up with a nouveau commit
17:14karolherbst: KitsuWhooa: can you verify that the commit before that is absolutely fine?
17:14KitsuWhooa: Yeah, when it narrowed down to nouveau, I stopped worrying
17:14KitsuWhooa: I'm going to right now :p
17:17KitsuWhooa: just to make sure, I should test at ab0db2bd853d4a61bf440d2846b046a1d11ce027
17:17KitsuWhooa: correct?
17:17karolherbst: yes
17:17KitsuWhooa: awesome
17:19karolherbst: soo nouveau on your GPU obviously used g84_fifo_chan_engine
17:20karolherbst: where now it uses g84_fifo_engine_id...
17:20karolherbst: soo
17:20karolherbst: let's see if some mapping is bonkers
17:20karolherbst: ehh lol
17:20karolherbst: it's different
17:21karolherbst: yeah
17:21karolherbst: that's broken
17:22karolherbst: KitsuWhooa: mind making g84_fifo_engine_id return the values there but -1?
17:22karolherbst: add a "int ret;" thing and assign it and just return "return ret -1;" at the end or so
17:23karolherbst: instead of returning in the switch
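The test hack suggested here (assign in the switch, subtract one at the end) might look like the sketch below. The enum, the `SKETCH_ENGN_*` values, and the function name are hypothetical stand-ins; the real engine types and defines in the nouveau source are not reproduced in the log, so nothing here should be read as the actual kernel code.

```c
/* Hypothetical engine types and IDs, for illustration only. */
enum sketch_engine {
    SKETCH_ENGINE_A,
    SKETCH_ENGINE_B,
    SKETCH_ENGINE_C
};

#define SKETCH_ENGN_A 1
#define SKETCH_ENGN_B 2
#define SKETCH_ENGN_C 3

/* Map an engine type to its ID, shifted down by one as a quick test of the
 * "new values are one too high" theory. Unknown engines still yield -1. */
static int sketch_engine_id(enum sketch_engine engine)
{
    int ret;

    switch (engine) {
    case SKETCH_ENGINE_A:
        ret = SKETCH_ENGN_A;
        break;
    case SKETCH_ENGINE_B:
        ret = SKETCH_ENGN_B;
        break;
    case SKETCH_ENGINE_C:
        ret = SKETCH_ENGN_C;
        break;
    default:
        return -1; /* do not shift the error value */
    }

    return ret - 1;
}
```

Writing `return CONST - 1;` directly in each case is equivalent and avoids the extra variable and the `break` statements.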
17:24KitsuWhooa: Right, just tested it. I can confirm that that commit works :p
17:24karolherbst: skeggsb: 64f7c698bea9cf84cb224fd4352964c2af7252d9 is broken :)
17:24karolherbst: for g84 the new values are one too high
17:24karolherbst: KitsuWhooa: do you want to fix it yourself or shall I write a patch?
17:25KitsuWhooa: I wouldn't mind fixing it myself
17:25KitsuWhooa: give me a bit first
17:25karolherbst: I just don't know if it fixes other stuff :D
17:25karolherbst: but should be fine if you test on top of 64f7c698bea9cf84cb224fd4352964c2af7252d9 directly
17:26KitsuWhooa: Right, let me try doing what you suggested
17:26KitsuWhooa: it takes a while :p
17:31KitsuWhooa: So, question. Does that mean that the defines in the header are wrong?
17:32karolherbst: ehh.... dunno tbh
17:32KitsuWhooa: Because if that's the case, aside from testing, returning ret - 1 doesn't sound like a good idea :p
17:32karolherbst: yeah.. let me check
17:33karolherbst: KitsuWhooa: if you are on top of the 64f7c698bea9cf84cb224fd4352964c2af7252d9 you can also change the defines to have a lower value
17:33karolherbst: they are new as it seems
17:34KitsuWhooa: Yeah, I am, but I already did return CONST - 1; for each one in the switch
17:34KitsuWhooa: faster than making a new variable and assigning it
17:34karolherbst: okay
17:34KitsuWhooa: and adding breaks
17:34karolherbst: :D
17:34karolherbst: true
17:38KitsuWhooa: karolherbst: looks like that worked
17:38karolherbst: okay...
17:38karolherbst: thanks for figuring out which commit broke it :)
17:39karolherbst: KitsuWhooa: mind testing on something newer like.. I dunno.. 5.13 with a patch changing the defines?
17:39karolherbst: or 5.14?
17:39KitsuWhooa: Yeah, sure thing
17:39karolherbst: just to make sure that fix doesn't break other things
17:39karolherbst: cool, thanks
17:39KitsuWhooa: https://tasossah.com/nouveau_fifo_engine_id.patch
17:39KitsuWhooa: that's what I did basically
17:40karolherbst: yeah
17:40karolherbst: good enough for a quick test :)
17:42KitsuWhooa: hmmm
17:42KitsuWhooa: nvkm/engine/fifo/channv50.h:#define G84_FIFO_ENGN_SW 0
17:42karolherbst: mhhh
17:43KitsuWhooa: I get the feeling that's not meant to be -1 :p
17:43karolherbst: KitsuWhooa: SW is essentially fake stuff
17:43karolherbst: just use... I dunno 15 or the biggest one
17:43KitsuWhooa: aaaah I see
17:43karolherbst: other archs use 15
17:44KitsuWhooa: 15 sounds good then
17:44karolherbst: yeah... not 100% sure on this though... things are a bit .. weird
17:45karolherbst: skeggsb: would probably know best how to fix it
17:45karolherbst: he might be up in 4 hours or so :D
17:45KitsuWhooa: NV50 is 0 as well
17:45KitsuWhooa: Yeah, fair enough. I'll play with it, and if I get the OK I might make a patch
17:45karolherbst: yeah...
17:46karolherbst: might be the numbers being different in various places ¯\_(ツ)_/¯
17:46KitsuWhooa: Given how much time I spent on this I'd like to at least submit a patch, but eh :p
17:47karolherbst: I don't want to say that those old GPUs are bonkers in a lot of ways, so let's just say they are less consistent :D
17:48KitsuWhooa: I think bonkers sounds about right
17:51KitsuWhooa: Hmmm. I tried logging in and the whole computer froze
17:51karolherbst: yeah...
17:51karolherbst: I guess newer patches mess something up
17:51karolherbst: KitsuWhooa: I have a different idea
17:52karolherbst: KitsuWhooa: on top of a new kernel, just add an -1 after the call to engine_id inside g84_fifo_chan_engine_addr
17:52KitsuWhooa: I didn't even get around to testing the define edits because the latest HEAD won't build
17:52karolherbst: ohh
17:52karolherbst: annoying
17:52KitsuWhooa: I tried logging in to look at the config changes I have to make that I don't remember off the top of my head, and it froze
17:53KitsuWhooa: I really should get a monitor and plug it in to that card to make sure it actually works, too :p
18:07KitsuWhooa: So, got 5.14.0-rc5 from HEAD built with https://tasossah.com/nouveau_fifo_engine_id2.patch on top
18:08KitsuWhooa: I stopped lightdm on the login screen, suspended, resumed successfully, and then plugged in a monitor and started X and logged in.
18:08KitsuWhooa: Looks like it's displaying video just fine
18:09karolherbst: mhh, cool
18:09karolherbst: and GL also works and everything?
18:09KitsuWhooa: DRI_PRIME=1 glxgears renders
18:09karolherbst: KitsuWhooa: and is it suspending instantly or is there still a delay?
18:12KitsuWhooa: It suspends instantly without the desktop running. It still takes a while with the desktop running.
18:12karolherbst: mhhh
18:12KitsuWhooa: That probably started before 5.8
18:12karolherbst: yeah...
18:13karolherbst: if you don't mind you could figure that one out as well :D
18:13KitsuWhooa: Sure. Only downside is that bisect cycles would take ages due to how long it takes :p
18:14karolherbst: well.. you already finished one bisect within 2 hours
18:14karolherbst: I always plan an entire day for that :)
18:15KitsuWhooa: I told it to suspend a few seconds before I sent the message saying it's still taking a while
18:15KitsuWhooa: I'm still waiting :Pp
18:16KitsuWhooa: I get the feeling that it won't actually suspend now at all
18:16karolherbst: :(
18:17KitsuWhooa: I wonder if it's because I left glxgears running
18:17karolherbst: ehh.. hopefully not, but maybe
18:18KitsuWhooa: *sigh*
18:20KitsuWhooa: I give up, I'll press a key
18:21KitsuWhooa: Yeah I think it's frozen. RIP
18:21karolherbst: anything in dmesg or just nothing?
18:21KitsuWhooa: Well, the screens are off as it's still trying to suspend
18:21KitsuWhooa: I'd need to reboot, and I assume doing that won't write anything to disk
18:22KitsuWhooa: Maybe I should get out the good 'ol null modem cable and enable the serial console
18:22karolherbst: I got an RS232 one for my desktop :D
18:22KitsuWhooa: I'm "lucky" in the sense that both the desktop with the nvidia card and this laptop have RS232 ports
18:23KitsuWhooa: Anyway, hit the reset button
18:27KitsuWhooa: https://tasossah.com/uploader/files/nouveau_glxgears.png this is what I was referring to earlier
18:35karolherbst: I think there is something broken with suspending engines or something.. mhh
18:39KitsuWhooa: ...oh no
18:40KitsuWhooa: https://tasossah.com/nouveau.log
18:40KitsuWhooa: this is why it never suspends now
18:40karolherbst: uhhh.. oops
18:41karolherbst: I think that one got fixed
18:41KitsuWhooa: I can't tell if it's because of my patch, or because of something else
18:41karolherbst: let me find the commit
18:41KitsuWhooa: Ah, so it didn't make it to the main repo?
18:41karolherbst: mhh should have actually
18:42KitsuWhooa: this is v5.14-rc5-256-g0aa78d17099b
18:42karolherbst: ehh might have been something else.. let me see
18:43karolherbst: ehh wait
18:43karolherbst: no that could be because of your patch actually... the trace is just weird
18:44karolherbst: a3a9ee4b5254f212c2adaa8cd8ca03bfa112f49d was the fix I was thinking about
18:44karolherbst: but that's in since rc3
18:47karolherbst: ehhh
18:47KitsuWhooa: I'll try this patch on that commit that introduced the issue in the first place
18:47KitsuWhooa: that'll clear up if it's my patch or something else
18:47karolherbst: yeah
18:47KitsuWhooa: it just never ends! :p
18:47karolherbst: I suspect future changes might depend on the different values
19:02KitsuWhooa: karolherbst: It took a while but it suspended and resumed with my patch. That said, it spewed these out
19:02KitsuWhooa: https://tasossah.com/txt/nouveau_suspend_kernel_log_2
19:03KitsuWhooa: But honestly, I've had enough for now :p
19:03karolherbst: KitsuWhooa: yeah.. something is not right :D
19:03karolherbst: maybe skeggsb has a GPU to trigger those issues
19:04KitsuWhooa: Yeah, I'll wait for their opinion for now, because I'm tired :p
19:45karolherbst: imirkin: that's the recovery commit btw: https://github.com/karolherbst/mesa/commit/d977a08822a8c68e66c64113e32b6fcf732b699c
19:46karolherbst: might need some cleanup, but maybe you spot something obviously wrong with it or if we will just have bad luck here
19:46imirkin_: that seems ... surprisingly non-invasive
19:46karolherbst: yeah
19:46karolherbst: did you expect anything else?
19:47imirkin_: i didn't give it enough thought to expect one thing or another
19:47karolherbst: well.. I only implemented it for nvc0
19:48imirkin_: PUSH_KICK (push, screen, screen);
19:48imirkin_: 
19:48imirkin_: one screen too many? (nouveau_video.c)
19:48karolherbst: ehhh
19:48karolherbst: yeah.. I messed up my sed
19:48karolherbst: that's guarded by this weirdo #ifdef
19:48imirkin_: yea
19:49karolherbst: there are a few issues though.. what happens if the channel dies inside some random PUSH_SPACE command.. ohh... I should check PUSH_SPACE as well
19:49karolherbst: or what happens if it crashes during validation and stuff
19:50karolherbst: but I guess we should just restart validation and give up if it crashes again
19:50karolherbst: but the kernel isn't ready yet anyway
19:52karolherbst: but with synced pushbuffers I didn't run into this bo_wait problem at least
19:53karolherbst: anyway.. it will need this libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/188 would be cool to get it in, or I just merge it or something
19:54imirkin_: karolherbst: btw, double-screen in nv98_video_* as well
19:56karolherbst: yeah... I will rework the entire patch anyway
19:56karolherbst: although I doubt I can do anything better for PUSH_KICK?
19:56karolherbst: mhh
20:03imirkin_: looks nice overall
20:04karolherbst: yeah... I am actually surprised how well it works
20:04imirkin_: well
20:04imirkin_: the problem is that simulated hangs are different from real ones
20:05karolherbst: it's not simulated
20:05karolherbst: the channel dies for real
20:05imirkin_: sure
21:12karolherbst: ohh.. I have a better idea on how to deal with recovery :)