00:04airlied: yeah but I wonder is sys_pitch like an errno
00:04airlied: or actually a negative pitch
00:07karolherbst: maybe ENOMEM
00:07imirkin: probably an errno...
00:07imirkin: the product is -256k
00:07imirkin: which means it must be a POT error :)
00:09imirkin: should probably just try to repro
00:10karolherbst: mhh I checked the prime factors and 12 is the only sane value I can get
00:10karolherbst: 2^2 * 3 * 5 * 17 * 257
00:11imirkin: not quite 256k
00:11imirkin: $ factor 262140
00:11imirkin: 262140: 2 2 3 5 17 257
00:11karolherbst: ohh there is a factor tool :D
00:11imirkin: yeah, could easily be ENOMEM
00:24Wolf480pl: karolherbst, do I understand correctly that you managed to replicate the GPU lockups that happened to me on 3 highest cstates?
00:25Wolf480pl: so it's ok if I don't read this channel for a couple days?
00:26karolherbst: yeah I think I will somehow find out what nouveau needs to do on my own now
00:33imirkin: Local0 += Arg0 = (VRMB (0x04) + Local0)
00:34imirkin: i wonder if that's the same thing as "local0 += arg0; local0 = (foo + local0)"
00:34karolherbst: imirkin: I don't think so
00:34karolherbst: Arg0 gets set too
00:35imirkin: karolherbst: this is AML
00:35imirkin: or ASL or whatever
00:36karolherbst: ohh okay
01:44imirkin: actually that 0x3fffc feels a lot more like 0xffff * 4
02:07karolherbst: mhh, would also make sense :D
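The arithmetic in this exchange can be checked with a short script. This is only a sketch: `prime_factors` is a hypothetical helper written for illustration, standing in for the `factor` tool used above.

```python
def prime_factors(n):
    """Return the prime factors of n in ascending order, with multiplicity."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


value = 0x3fffc
print(value)                 # 262140
print(prime_factors(value))  # [2, 2, 3, 5, 17, 257]
print(value == 0xffff * 4)   # True
print(256 * 1024 - value)    # 4, so "not quite 256k"
```

The factors 3 * 5 * 17 * 257 multiply to 65535 (0xffff), which is why the "0xffff * 4" reading fits the number exactly.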
09:49imirkin: mwk: any idea if a nv3x implements the nv2x/nv1x 3d classes?
09:50imirkin: mwk: iirc nv2x/nv1x are all backwards compatible with one another
09:53karolherbst: mupuf_: okay, the pdaemon counters can be tinkered with and the blob upclocks accordingly :)
10:25karolherbst: awesome, I can't trace the blob anymore :/
10:25pmoreau: What happens?
10:26karolherbst: mmiotrace: unexpected secondary hit for address 0xffffc90010001070 on CPU 0.
10:26karolherbst: then BUG: unable to handle kernel paging request at ffff8800f6000008
10:26imirkin_: ohhhh that's sad
10:26pmoreau: Oh yeah, I hit that as well
10:26karolherbst: big stack
10:26karolherbst: I know it is somehow kernel config related
10:26imirkin_: that means that they're using a funny instruction
10:26imirkin_: that mmiotrace doesn't fully support
10:26imirkin_: (at least iirc)
10:26karolherbst: instruction as in x86 instruction?
10:27imirkin_: can you pastebin the full thing?
10:27karolherbst: I compile my kernel with native optimizations
10:27karolherbst: imirkin_: is pstate good enough or should I bother my kernel log
10:27imirkin_: karolherbst: i meant the BUG
10:28karolherbst: yeah lol, journalctl makes it hard to copy paste, what a pain :/
10:29karolherbst: imirkin_: https://gist.github.com/karolherbst/2d39c069fcf657f169b2
10:29imirkin_: you're missing some bytes
10:30imirkin_: the Code: line is cut off
10:31karolherbst: nope, there isn't such line
10:31imirkin_: it's there, it's just cut off
10:32imirkin_: in width
10:32imirkin_: there should be more letters at the end
10:32imirkin_: at least i think there should be
10:33imirkin_: that's better!
10:33karolherbst: yeah, journalctl uses less all the time :/
10:33imirkin_: except... it might be the wrong code
10:34karolherbst: it's so annoying
10:34imirkin_: solution: don't use systemd
10:34karolherbst: or pass through to system logger
10:34imirkin_: sure, you can patch around the idiocy
10:34imirkin_: or you can just excise it
10:35karolherbst: I could disable the gcc optimizations and try again
10:35karolherbst: or is it something inside nvidia?
10:35imirkin_: it's in the blob code
10:35karolherbst: ohh okay
10:35karolherbst: I think I will just remove nvidia-smi and see what happens
10:35karolherbst: that tool is useless for me anyway
10:35imirkin_: also that code is from the wrong function =/
10:35imirkin_: very sad.
10:36imirkin_: i need to have a closer look at mmiotrace
10:36imirkin_: last i looked closely was back when pq was hacking on it
10:36karolherbst: like nvidia-settings doesn't care about nvidia-smi :D
10:37karolherbst: again :/
10:37mwk: imirkin_: nv3x implements the nv2x classes
10:37mwk: but not the nv1x classes
10:38mwk: this apparently includes supporting NV20 VP code (by translating it to NV30 VP ISA)
10:39mwk: have a look at the support table at http://envytools.readthedocs.org/en/latest/hw/graph/intro.html for exact classes supported
10:39imirkin_: mwk: awesome thanks
10:39imirkin_: i might grab a pci nv3x so i can test all generations at the same time
10:39imirkin_: and add a hack to optionally allow using nouveau_vieux with nv3x
10:43karolherbst: what a mess that is
10:45imirkin_: looks like i should get NV25_3D
10:46imirkin_: mwk: those variants seem off btw... NV10 had NV15_3D?
10:47imirkin_: also seems likely that NV11 would have had NV11_3D (unless NV11 is ordered after NV17? shouldn't be)
10:47mwk: ugh... right, NV15_3D seems off
10:47mwk: and as for NV11_3D, this is an ugly one
10:48imirkin_: coz NV1A is pre-NV11? :)
10:48mwk: whatever that class is (I'm not sure), it's definitely not present on NV11
10:48mwk: but it's called 0x1196 by nv blob
10:49imirkin_: yeah, this is how we pick the class for vieux: http://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/dri/nouveau/nv10_context.c#n472
10:50mwk: FWIW weird shit *is* common with the variants
10:50mwk: eg. NV5_SIFM that doesn't exist on actual NV5 :)
10:50imirkin_: hehe fair enough
10:51mwk: but the NV15_3D thing is a mistake, I'll fix that
10:52imirkin_: that one should be nv15+, aka nv11+ if you look in numerical space
10:54mwk: rnndb seems to have the right variants...
11:00karolherbst: imirkin_: reading the mmiotrace doc really helps understanding what's going wrong :D
11:00imirkin_: cool :)
11:01karolherbst: like page faults is the way mmiotrace does stuff
11:01karolherbst: and when a page fault can't be handled, mmiotrace did it
11:02karolherbst: so I think this is the important thing: "mmiotrace: unexpected secondary hit for address 0xffffc90014001070 on CPU 0."
11:02RSpliet: nasty, isn't it... I'm sure pagefaults weren't invented for this ;-)
11:03RSpliet: (oh... and considering each page fault is over 1000 cycles, you'll get why mmiotrace is slow :-D)
11:03karolherbst: they are faster with mmiotrace
11:03karolherbst: because mmiotrace handles them
11:03karolherbst: not the _normal_ kernel thingy
11:06RSpliet: doesn't make much of a difference unfortunately :-D
11:07karolherbst: mhh: ./arch/x86/mm/kmmio.c: pr_info("unexpected secondary hit for address 0x%08lx on CPU %d.\n",
11:07karolherbst: inside kmmio_handler
11:08karolherbst: I knew it
11:08karolherbst: so this is what happens:
11:08karolherbst: mmiotrace does mark the page as not there, and triggers a page fault
11:08karolherbst: and while handling this page fault, on the same address another page fault is triggered
11:08mlankhorst: this is why mmiotrace offlines all cpu's
11:08mlankhorst: except boot
11:08imirkin_: recursive pagefault
11:09imirkin_: but it shouldn't happen
11:09imirkin_: unless the page is really not there
11:09imirkin_: it might do the wrong thing in that fallback case
11:09karolherbst: this is what the comment says: "A second fault on the same page means some other condition needs handling by do_page_fault(), the page really not being present is the most common."
11:10mlankhorst: or forgot to insert the page
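The sequence described above (mark the page not present, take the fault, then fault again on the same page) can be illustrated with a toy model. This is a heavily simplified sketch, not the kernel's kmmio code: the class `ToyKmmio`, its methods, and its return strings are all hypothetical names chosen for illustration, and it models a single CPU only.

```python
class ToyKmmio:
    """Toy single-CPU model of tracing MMIO via not-present pages."""

    def __init__(self, armed_pages):
        self.armed = set(armed_pages)  # pages the tracer watches
        self.present = set()           # armed pages start out not present
        self.active_page = None        # page currently being stepped over
        self.log = []

    def fault(self, page):
        """Called when an access hits a not-present page."""
        if page not in self.armed:
            return "not ours"          # leave it to the normal fault handler
        if self.active_page == page:
            # We already made this page present, yet it faulted again:
            # some other condition applies, so defer to the normal handler.
            self.log.append("unexpected secondary hit")
            self.active_page = None
            return "deferred"
        self.log.append(f"trace access to page {page:#x}")
        self.present.add(page)         # disarm so the access can be retried
        self.active_page = page
        return "handled"

    def single_step_done(self):
        """Called after the faulting instruction has been re-executed."""
        self.present.discard(self.active_page)  # re-arm the page
        self.active_page = None


k = ToyKmmio([0x1000])
print(k.fault(0x1000))  # handled
print(k.fault(0x1000))  # deferred
print(k.log[-1])        # unexpected secondary hit
```

In the normal case `single_step_done()` runs between two faults on the same page, so the second branch is never taken; the message only shows up when the retried access faults again before that point.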
11:14karolherbst: what does it mean when a kmmio_context is active?
11:15karolherbst: then I need to debug this issue with nvapeek :/
11:17imirkin_: or use an older blob
11:18karolherbst: I don't think it is caused by the driver directly :/
11:18karolherbst: I had this problem with much older version already
11:18karolherbst: and older kernel
11:18karolherbst: and it just disappears and appears again
11:43imirkin_: airlied: hey, so i was just perusing the ACPI 5.0 spec, and looks like _DSM is actually supposed to take a PACKAGE object as its last arg, while nouveau passes a buffer. do you know if earlier specs required a buffer?
11:44pmoreau: imirkin_: Earlier did as well
11:44imirkin_: looking at ACPI 4.0, also package
11:45imirkin_: ACPI 3.0, also package
11:45pmoreau: But as some ACPI tables were using BUFFERs, it is said in the spec that it can be a BUFFER, even if it is supposed to be a PACKAGE
11:45pmoreau: I had some patches to improve the ACPI warnings in Nouveau
11:47pmoreau: I made them as part of adding support for dual Nvidia GPU optimus
11:48karolherbst: imirkin_: it is messed up, it's implemented wrongly everywhere on the laptops
11:50imirkin_: can a package be evaluated as a buffer?
11:52pmoreau: I'd say so, but not 100% sure
11:52pmoreau: -^ one of the patches I proposed
11:53pmoreau: I think using `acpi_evaluate_dsm` rather than `acpi_evaluate_dsm_typed` is better, as it lets ACPI handle whether it is a PACKAGE or BUFFER
11:55karolherbst: imirkin_: do you think it is coincidence that the M parameter in the clk pll is always 0x1f with the blob? :D
11:56karolherbst: and P always 0x1
11:56imirkin_: pmoreau: ah
11:56karolherbst: Wolf480pl: wanna try out a patch today? :D
11:56karolherbst: this also explains why the blob only has like 15MHz steps
12:27karolherbst: mupuf_: okay, even the pll values from the blob don't help
12:27karolherbst: at all
12:28karolherbst: gpu clocked +50MHz with nouveau is already unstable and I use a higher voltage than nvidia with +135MHz
12:28karolherbst: I even poked nvidia PCLOCK stuff into the gpu
12:28karolherbst: and it didn't help
12:29airlied: imirkin_: everyone implemented things wrong from what I can tell
12:30imirkin_: airlied: how well do you know acpi?
12:30imirkin_: airlied: do you know if CreateWordField is legal on a Package object?
12:31airlied:has paged acpi out completely
12:31imirkin_: so there was a time when you knew it!
12:31airlied: probably not at that depth, I can read ASL without throwing up, but anything complex is beyond me
12:35karolherbst: ohh wow
12:35karolherbst: the blob failed soo hard now, that the kernel switched from tsc to hpet :D
12:39karolherbst: can anybody make nouveau as fast as the blob? thanks :D
12:39imirkin_: karolherbst: only you
12:40imirkin_: ... can prevent forest fires
12:41karolherbst: my entire system went into perma hung after I nvapeeked :/
12:46karolherbst: imirkin_: by the way, in pixmark_piano I get 75% blob performance
12:47karolherbst: which is a benchmark with like no gpu memory stuff
12:47imirkin_: get your patches ready for ben...
12:47imirkin_: and bug him early and often
12:47imirkin_: he's forgetful and busy
12:47karolherbst: I already do :D
12:47karolherbst: I am already nervous enough
12:48karolherbst: poked him thursday the last time
12:48imirkin_: do it again... he's putting a pull request together for dave
13:03airlied: imirkin_: good point
13:04airlied: skeggsb: this -next is very late :-)
13:13karolherbst: noo, my gddr5 patch :D
14:29RSpliet: airlied: Ben still alive and in one piece? :-P
14:31airlied: he was on holidays last week, I think he made it back :-P
14:32imirkin_: i spoke with him yesterday, so he's def alive
14:32imirkin_: or was, at least
14:34RSpliet: haha good stuff :-)
15:11karolherbst: he was back wednesday by the way already :D
15:20glennk: back to the feature
15:27mupuf_: karolherbst: hey
15:27mupuf_: what did you check exactly?
15:32karolherbst: mupuf_: 134000 0x1000 range
15:33karolherbst: the other one
15:33karolherbst: 137000 0x1000
15:35mupuf_: ok, check also in the FB area
15:36karolherbst: mupuf_: which is it?
15:36mupuf_: forgot on fermi+
15:45gryffus: Was this patch reintroduced by a proper fix? http://cgit.freedesktop.org/mesa/mesa/commit/?id=d0c22560a151a1ea726df4a6e001048a7c5b225e I'm having crash with "xe: nvc0/nvc0_screen.c:543: nvc0_screen_fence_emit: Assertion `PUSH_AVAIL(push) >= 5' failed." error. Full backtrace is here: https://bpaste.net/show/371b1b641283
15:45imirkin_: gryffus: yes it was
15:46gryffus: i'm on nvc0... any clues why i'm having this error?
15:46imirkin_: gryffus: but apparently not proper enough
15:46imirkin_: gryffus: what mesa version are you on?
15:47gryffus: which patch should solve it? I'm using gallium nine branch from https://github.com/iXit/Mesa-3D
15:47imirkin_: hrmph... looks like a fail on my part
15:47imirkin_: errrr... or not.
15:48imirkin_: this is the case i was afraid of. ok. if i give you a patch, will you be able to test it?
15:49gryffus: imirkin_: no problem
15:49imirkin_: gryffus: http://hastebin.com/oroxopiwot.coffee
15:53hakzsam: imirkin_, this assert continues to give us some troubles :)
15:53imirkin_: hakzsam: well it's a legitimate assertion
15:53imirkin_: hakzsam: the problem isn't the assert... it's that it's being hit :)
15:53imirkin_: i just made the problem more visible
15:54imirkin_: instead of getting weird crashes and memory corruption
15:54hakzsam: I know
15:54imirkin_: you now get an assertion error. seems like a reasonable trade.
15:55imirkin_: the fun part is that it's generally OK if you go over a bit -- libdrm reserves a few bytes at the end for a "return" anyways. so you don't end up corrupting memory, you just end up messing up the cmdstream
15:55hakzsam: and I assume this PUSH_SPACE(0) will kick-off the pushbuf, right?
15:55imirkin_: it will ensure that there's enough space for a fence emission
15:56imirkin_: gryffus: errr, that won't build. try this: http://hastebin.com/tirikofini.coffee
15:57imirkin_: hakzsam: i kinda had an implicit assumption that some joker wouldn't be emitting fences all the time. however that can happen if you just call flush over and over
15:59hakzsam: yeah, I see
15:59imirkin_: i also have logic not to emit any fence if nothing refs the current fence
16:00imirkin_: since nothing could possibly care
16:00imirkin_: however if you supply a fence to ->flush() then it will cause fences to get rotated
16:00hakzsam: makes sense
16:00imirkin_: it's all very fragile, obviously
16:01imirkin_: but i can't think of a non-fragile way to handle it
16:01imirkin_: basically we can be called on to emit a fence from a callback at any point in time
16:01imirkin_: and we can't allocate more space from that callback
16:01imirkin_: (coz it's the callback that allocates more space)
16:02imirkin_: so we must guarantee that no matter what, there shall always be room for a fence to get put in there
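The invariant described here (a fence may have to be emitted from the kick callback, which cannot allocate, so room for it must always be held back) can be sketched with a toy buffer. This is an illustrative model only, not the mesa or libdrm code: `ToyPushbuf`, `FENCE_SPACE`, and the method names are hypothetical, and the value 5 is taken from the `PUSH_AVAIL(push) >= 5` assertion quoted earlier in the log.

```python
FENCE_SPACE = 5  # room a fence needs, per the assertion quoted in the log


class ToyPushbuf:
    """Toy command buffer that always keeps room for one fence."""

    def __init__(self, size):
        self.size = size
        self.used = 0
        self.kicks = 0

    def avail(self):
        return self.size - self.used

    def space(self, n):
        """Make sure n slots plus the fence reserve are free, kicking if not."""
        assert n + FENCE_SPACE <= self.size, "request can never fit"
        if self.avail() < n + FENCE_SPACE:
            self.kick()

    def emit(self, n):
        """Write n slots of commands, never eating into the fence reserve."""
        self.space(n)
        self.used += n

    def kick(self):
        # The fence is emitted from inside the kick, where asking for more
        # space is impossible, so the reserve must already be there.
        assert self.avail() >= FENCE_SPACE, "no room for a fence"
        self.used += FENCE_SPACE
        self.kicks += 1
        self.used = 0  # buffer submitted; start over with an empty one


pb = ToyPushbuf(16)
for _ in range(10):
    pb.emit(4)
print(pb.kicks)    # 4
print(pb.avail())  # 8
```

In this model the assertion in `kick()` can only fire if some caller bumps `used` without going through `space()`, which is one way to picture how the real assertion gets hit.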
16:02imirkin_: errrrrrrrr hm
16:02imirkin_: i just had a realization
16:02imirkin_: there's a rsvd_kick
16:03imirkin_: which lets libdrm ensure this
16:04imirkin_: i might rethink my strategy
16:05imirkin_: really i just need to do diff things from kick context and non-kick
16:15aaaa: when i try and use 2 monitors with different outputs on each with my laptop it crashes. the screens turn off and i cannot switch to another tty. mirroring the displays does not crash. it worked fine with the proprietary drivers. i am running debian testing and using xrandr.
16:20aaaa: from where?
16:20imirkin_: dmesg is a good start
16:21aaaa: where should i upload?
16:22aaaa: dmesg log: https://dpaste.de/9vo4
16:27imirkin_: aaaa: boot with nouveau.modeset=0
16:27imirkin_: er wait
16:27imirkin_: i thought this was an optimus setup
16:27imirkin_: you need this patch: http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=f153acb3a41432e74fdbdfba9a005007e2957c1c
16:27imirkin_: and possibly some others
16:28aaaa: how do i apply it?
16:29imirkin_: aaaa: do you have an optimus option in your bios?
16:29imirkin_: aaaa: i'd enable it if i were you... intel graphics are a lot more reliable than nvidia. plus it uses less power.
16:30aaaa: i dont my laptop supports it. i wish i could use integrated.
16:30imirkin_: hm ok
16:30aaaa: *do not think
16:31imirkin_: well i def didn't see the pci device in your log, but it'd be hidden if it were disabled in the bios
16:31aaaa: my laptop cannot use the intel graphics
16:33aaaa: so how do i apply the patch?
16:34imirkin_: things will go easier if you first install linux 4.3
17:00aaaa: well, i did a system update and now xorg crashes but my computer does not freeze and the tty is returned. here is the log https://dpaste.de/LDV1
17:02imirkin_: gryffus: probably a different issue
17:02imirkin_: gryffus: that bug is for the people for whom GPOB didn't help. but we're not even enabling GPOB for GK107
17:03skeggsb: imirkin_: we are now
17:03gryffus: imirkin_: oh, so sorry for confusion
17:03imirkin_: skeggsb: with your patch which is slated for 4.4, yes
17:03imirkin_: skeggsb: but not with kernel 4.2 or even 4.3
17:05aaaa: so what should i try?
17:13imirkin_: aaaa: the thing i said? install linux kernel 4.3 first
17:19imirkin_: aaaa: that will make it easier to build ben's tree
17:19aaaa: ok, trying to update now
17:27imirkin_: gryffus: did my patch help btw? i'm going to send a better one later.
17:33aaaa: imirkin_: it's not in any debian repository, any way without it?
17:34gryffus: imirkin_: waiting for rebuild https://build.opensuse.org/package/show/home:gryffus:branches:home:pontostroy:gallium-nine/Mesa and doing a system update meanwhile, i will let you know
17:34imirkin_: you could build your own kernel... but if you knew how to do that, you probably wouldn't be asking.
17:34imirkin_: aaaa: you can boot with nouveau.noaccel=1
17:34imirkin_: aaaa: this should disable acceleration, but give you working monitors
17:34imirkin_: gryffus: ok thanks
17:34aaaa: will try
17:35aaaa: just as a kernel parameter?
17:35imirkin_: aaaa: yep
17:47aaaa: imirkin_: did not work
19:02imirkin: aaaa: hmmm... maybe you also need nouveau.nofbaccel=1 in addition to it