00:05imirkin: yeah, c7 becomes the driver constbuf
00:05karolherbst: that means all compute shaders can access it?
00:06karolherbst: ohh wait
00:06karolherbst: CONT -> c7
00:06karolherbst: but it looks odd
00:06karolherbst: well, to me at least
00:10imirkin: no. address of const is loaded from c7, which is the driver constbuf
00:10imirkin: same way that ssbo's are
00:12karolherbst: what I don't get is, why there are so many c7 accesses added when going into SSA form
00:13karolherbst: ssbo stuff is also loaded from c7
00:13imirkin: c7 is the driver constbuf
00:13imirkin: it has all the info about ... stuff.
00:13imirkin: (mostly buffer addresses for compute shaders)
00:14imirkin: normally it's c15
00:14imirkin: but since there are only 8 bindable buffers, it's c7 for kepler+ compute
00:14imirkin: (p.s. wtf nvidia... why make such annoying restrictions)
00:17karolherbst: okay, anyway, need to go to bed. I am sure we will be able to track the issue down tomorrow. At least it's inside those 128 shaders somewhere (most likely)
02:48Horizon_Brave: quiet night
02:50nyef: Sunday night, and many people have to work in the morning, maybe?
03:36Horizon_Brave: nyef: I know that..was just sayin'
08:54karolherbst: hum.. vulkan only games, now it is getting interesting
09:00dboyan_: Are there vulkan-only games in the wild?
09:00karolherbst: not yet
09:03dboyan_: I guess vulkan-only game will be a hell for optimus laptops, at least now
09:05dboyan_: I once tried vulkan renderer of dota 2, and had to configure X and xrandr and let nvidia card to take over the whole rendering
09:35karolherbst: dboyan_: well everybody is free to pimp bumblebee to support vulkan
09:35karolherbst: should be fairly easy with primusrun
09:38dboyan_: Well, there was problem with blob iirc. It tries to connect to DISPLAY internally, and errors out somehow. I guess people will sort it out sooner or later
09:40dboyan_: karolherbst: I think I've found the solution to caching compute program binaries, turns out quite simple and clean. I also found a memory leak in my code. Will update my branch several hours later if I have time to verify and test my code today.
09:51karolherbst: dboyan_: awesome :)
09:51karolherbst: finally reasonable loading times for new games
09:52karolherbst: dboyan_: there are also noticeable delays with hitman, I think this is due to shaders being read too late
09:52karolherbst: aka at drawing time
09:52karolherbst: but this should be unrelated to your work?
09:52karolherbst: no idea
09:53karolherbst: anyway, try to read the shader binary as soon as possible, wasting RAM is no concern
09:55karolherbst: maybe it would even make sense to optimize the cache to load shaders in advance for certain applications or so as well
10:21dboyan_: karolherbst: Now the cache will try loading shaders just before the program is translated. If cache hits, the translate pass won't be needed; if it misses it will start translating as normal.
10:21dboyan_: sometimes there will be translating and uploading just before draw calls
10:22karolherbst: just before drawing is too late
10:22karolherbst: you don't want to have IO delaying stuff
10:22karolherbst: do it at compile time at the latest
10:22karolherbst: applications/games compile their shaders in advance, so that they don't need to do it later
10:22karolherbst: the shader cache _has_ to respect that
10:23karolherbst: loading just before draw -> bug inside the shader cache, that simple
10:24karolherbst: or let me rephrase it: the shader shouldn't arrive later from the cache than it does without the cache
10:25karolherbst: I expect that the issue I found in hitman is more due to the mesa core part of the shader cache
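The requirement discussed above (consult the cache when the application compiles, never at draw time) can be sketched roughly as follows. This is a minimal illustration and not nouveau's or Mesa's actual code: `ShaderCache`, `translate`, `compile_program` and `draw` are hypothetical names, and the cache is a plain in-memory dict standing in for an on-disk store.

```python
import hashlib


class ShaderCache:
    """Toy stand-in for a shader cache (hypothetical, in-memory only)."""

    def __init__(self):
        self._blobs = {}

    def get(self, key):
        return self._blobs.get(key)

    def put(self, key, binary):
        self._blobs[key] = binary


def translate(source):
    # Stand-in for the expensive translate/compile step (assumption).
    return b"BIN:" + source.encode()


class Program:
    def __init__(self, source, binary):
        self.source = source
        self.binary = binary


def compile_program(cache, source, stats):
    """Resolve the binary at compile time: cache hit, or translate and store."""
    key = hashlib.sha1(source.encode()).hexdigest()
    binary = cache.get(key)
    if binary is None:
        binary = translate(source)
        cache.put(key, binary)
        stats["translated"] += 1
    else:
        stats["hits"] += 1
    return Program(source, binary)


def draw(program):
    # By the time a draw is issued the binary must already be resident;
    # no cache I/O and no translation happens here.
    assert program.binary is not None
    return len(program.binary)
```

With this shape a cached shader is available no later than an uncached one would be, because the lookup sits exactly where the translation used to be.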
10:27dboyan_: well, I know what you mean, but I'm not yet clear what is needed to achieve that
10:29dboyan_: I guess the compiler infrastructure in nouveau is not as well-designed as that of, for example, radeonsi, which has fancy things like multi-threaded compiling, in-memory binary cache. But i just wonder if they will make a lot of difference
10:29dboyan_: And I gotta leave now, be back later
12:07pmoreau: dboyan_: Cool stuff! What’s your solution for caching compute program binaries? I should have a go at your branch at some point, to try it out.
13:08dboyan_: pmoreau: I decided to just store the symbol table for compute programs in the cache, it is actually no more difficult than the relocs and fixups, which are already stored
13:08pmoreau: Ah, okay :-)
13:09dboyan_: karolherbst: It seems I made a mistake with git operations yesterday, which resulted in pushing an outdated version. I'll update my code again in a moment, I think the new version should be better
13:17karolherbst: dboyan_: nice :)
13:17karolherbst: will check today
13:42dboyan_: karolherbst, pmoreau: I force-pushed my branch: https://github.com/dboyan/mesa/tree/nouveau-cache, this has compute program cache enabled and fixed several previous issues
13:43pmoreau: dboyan_: Awesome, thanks! I’ll try to give it a try this week (I will have to rebase my work on top of your branch).
13:50karolherbst: dboyan_: nice, I hope I don't see the FPS drops anymore
15:08Lyude: Do we have any (probably community-made) documentation on macros like "GM204_3D.GRAPH.MACRO_UNK0124 = 0x3"?
15:08Lyude: Trying to figure out what other registers we need to poke to make NV_fill_rectangle work and I'm wondering if those might have something to do with this
15:08imirkin: no, but demmt interprets the macro
15:09Lyude: (also, I confirmed the shader test I wrote for that extension does indeed work properly with nvidia's blob)
15:09imirkin: the macros are initialized somewhere up top
15:09imirkin: and then they're invoked, and passed arguments
15:09imirkin: demmt evaluates them based on those arguments
15:09Lyude: Oh, you can define macros on nvidia firmware?
15:09Lyude: that's pretty nifty
15:09imirkin: this works for everything except indirect arguments which demmt doesn't properly capture
15:10imirkin: mmm... well, it's hardware, not firmware...
15:10imirkin: (ok, for all i know there's some firmware in the chips too, but that's not at all visible)
15:11imirkin: there's a well-defined macro language to write the graph macros in, which can do things like reading/writing pgraph methods, loop, etc.
15:11imirkin: envydis/macro.c defines the ISA
15:12imirkin: and there are nouveau macros written in nvc0/mme
15:12Lyude: nvidia definitely does a lot of stuff on their chips I haven't seen before
15:12imirkin: either way, demmt parses the macros as they're configured, and then when they're invoked, it applies them. note the diff between PB: and PM:
15:12imirkin: PB: is commands in the pushbuf
15:12imirkin: PM: is commands issued by the macro
15:13imirkin: well, normally the command processor has firmware that you can insert stuff like that into
15:13imirkin: on nvidia, the command processor has no configurability
15:13imirkin: so that was built into the GRAPH object instead
15:14imirkin: but that means you can only write macros about GRAPH things, as they don't have access to anything outside that engine
15:14Lyude: Ah, so what do you think the chance is that one of these macros might be related to why NV_fill_rectangle causes prims to go rogue and ignore their vertex points? or are macros not normally used by the blob for those purposes
15:14imirkin: on the bright side, it's runtime-configurable
15:14imirkin: well, i did notice something quite odd
15:15imirkin: which was that instead of writing some regs directly
15:15imirkin: it would invoke a macro which took the reg and value, and then wrote the value into the reg
15:15imirkin: now, that could be a workaround for a very weird hw bug
15:15imirkin: or it could be used to disable those writes conditionally (it checked some scratch method's value before doing the write)
15:15imirkin: my guess is the latter - they wanted to be able to compute cmdstreams once
15:16imirkin: but under some odd situation those writes shouldn't be done
15:17Lyude: And I'm guessing you mean some of the writes for telling the GPU to draw the rectangle right?
15:17imirkin: anyways, macros are used for various things
15:17imirkin: well, the way most (all?) gpu's work
15:17imirkin: is you set a whole bunch of values
15:18imirkin: which does nothing
15:18imirkin: and then you hit the "go" button
15:18imirkin: which actually kicks off the draw
15:18Lyude: i figured that part out at least :P
15:18imirkin: [by looking at the values that you set]
15:18imirkin: note that the FILL_RECTANGLE name is my own
15:18imirkin: for all i know it's MESS_UP_RASTERIZATION
15:19imirkin: or my favorite, TFB_UNFUCKUP_OFFSET_QUERIES
15:21imirkin: Lyude: i'd recommend recording a trace where you (a) draw without fill rectangle, (b) draw with fill rectangle and (c) draw without fill rectangle again
15:21imirkin: [a mmt trace of the blob, that is]
15:22Lyude: good idea
15:22imirkin: and maybe (d) draw with fill rectangle again for good measure
15:22imirkin: basically on the first draw, a ton of stuff is set up unrelated to that draw
15:22imirkin: which makes it hard to find anything useful
15:23imirkin: and then flipping back and forth will hopefully point out the relevant bits.
15:23imirkin: there could also be an ordering thing going on - i.e. you need to write x before you write y
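The a/b/c/d trace idea can be sketched as a small filter: keep only the writes whose value flips with the feature and flips back. This is a rough illustration under stated assumptions. The input format (`NAME = 0xVALUE`, one write per line, already split per draw) is a simplification and not demmt's real output, and `SETUP_X` and `MODE` in the usage note are made-up names.

```python
def last_writes(lines):
    """Map each written name to the last value written within one draw."""
    state = {}
    for line in lines:
        name, sep, value = line.partition("=")
        if sep:
            state[name.strip()] = int(value.strip(), 16)
    return state


def flips_with_feature(off1, on1, off2, on2):
    """Names whose value differs off/on and repeats on the second off/on pair.

    One-time setup writes from the first draw drop out, because they are
    not repeated in the second 'off' draw.
    """
    names = set(off1) | set(on1) | set(off2) | set(on2)
    result = {}
    for name in sorted(names):
        off_val, on_val = off1.get(name), on1.get(name)
        if (off_val != on_val
                and off2.get(name) == off_val
                and on2.get(name) == on_val):
            result[name] = (off_val, on_val)
    return result
```

For example, with draws `["SETUP_X = 0x1", "MODE = 0x0"]`, `["MODE = 0x3"]`, `["MODE = 0x0"]`, `["MODE = 0x3"]`, only `MODE` survives the filter. Ordering effects (write x before y) are not captured by this kind of diff.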
15:23Lyude: pst, where can I find instructions on how to start a mmio trace?
15:23imirkin: (these methods often store values, but they can also have logic behind them)
15:24imirkin: mmt, not mmio
15:24Lyude: mmt right
15:24imirkin: Lyude: https://nouveau.freedesktop.org/wiki/Valgrind-mmt/
15:24Lyude: awesome, thanks
15:24imirkin: for mmiotrace, the best guide is probably https://wiki.ubuntu.com/X/MMIOTracing
15:25jamm: imirkin: okay, i'm using debian stretch now. It installed with nouveau by default, but the mouse pointer got stuck in the top left corner (but it was still moving around, i could see the pointer change shapes as i go over text etc). After installing nvidia's proprietary drivers the problem went away. Could this have something to do with pascal?
15:25imirkin: jamm: yes, it was a bug with pascal on iirc 4.8 or 4.9 kernels
15:25imirkin: jamm: for accel to work on pascal you need a much later kernel anyways (drm-next i think)
15:27jamm: imirkin: i have access to this PC with pascal and a mac, do i need another machine to test nouveau changes or will i have to suck up and reuse the same PC for testing new changes?
15:27Lyude: also, is that VEX stuff upstream? or will I still need to compile it here
15:27imirkin: as long as you're not hacking on kernel, and not afraid of occasional hangs, same machine should be fine
15:28imirkin: Lyude: don't think so
15:28imirkin: Lyude: follow the instructions
15:28imirkin: and all will be well
15:28jamm: great! :D
15:28jamm: so if i want to bring-up pascal, like you mentioned yesterday, i'd have to compile drm-next and use that instead of 4.9?
15:28Lyude: alright, just checking :P. this laptop isn't the fastest at compiling
15:29imirkin: jamm: that'd be the easiest, yes
15:32jamm: imirkin: thanks a lot! since i'm really new to kernel level stuffs (the lowest level i have worked on is on D3D API's at wine) i might bother you a little while compiling drm-next nouveau and related dependencies. I'll try to refer existing resources as much as I can before I get to you though ^_^
15:43karolherbst: jamm: the good thing about nouveau is, that after some time, coding isn't your main concern, but to figure out how stuff works, after that coding is the easy part :p
15:44jamm: karolherbst: right, that's going to take me a while before i can even get to my first line of code here XD
15:44karolherbst: other drivers (mainly radeonsi and intel) is just reading up in some docs and put this into code. this is annoying, because boring :p
15:45jamm: so to start off, i need to compile nouveau with drm-next, and for that is following https://nouveau.freedesktop.org/wiki/InstallNouveau/ appropriate?
15:46karolherbst: you could use this
15:46karolherbst: I never used drm-next
15:48dboyan_: jamm, I guess you still need drm-next for proper Pascal accel
15:50karolherbst: dboyan_: well I rebase nouveau master on stable kernel trees, so I have current master on 4.10
15:52dboyan_: karolherbst: yeah, the out-of-tree nouveau repo should work fine, but the master branch of skeggsb's kernel repo there is a little bit old
15:53karolherbst: kernel tree is just for releases
15:53karolherbst: and drm-next doesn't contain everything as well
15:53karolherbst: if you want the newest stuff, you need to use the out of tree repo
15:55jamm: karolherbst: is this the one? https://cgit.freedesktop.org/~airlied/linux/?h=drm-next
15:56jamm: there's also a drm-tip https://cgit.freedesktop.org/drm-tip
16:11imirkin_: airlied's one
16:11imirkin_: no clue what drm-tip is
17:08karolherbst: imirkin_: time to help me with that compute OOR issue?
17:26karolherbst: dboyan_: got a segfault now
17:28karolherbst: dboyan_: on the second run on your branch: https://gist.github.com/karolherbst/c3c8f2341103f4429654f92424a3674c
17:29karolherbst: mhhh wait
17:29karolherbst: make clean first then I check again
17:40karolherbst: dboyan_: yeah, it crashes
18:00karolherbst: dboyan_: you can leave compute shaders out in your first version, that is fine
18:01karolherbst: also the long black screen is also there by having only the tgsi/glsl cache enabled
18:02karolherbst: dboyan_: also the tgsi/glsl cache has a nice big effect: 2m45s -> 1m6s
18:23Lyude: hey imirkin_ do i have to pass any special flags to that valgrind tool other than --mmt-trace-nvidia-ioctls to get the formatted MMT debugging output you got? (e.g. something that shows the name of the registers being written/read, values, colors, etc.) just running a plain trace on piglit's shader runner doesn't give me much useful info in the logs
18:24imirkin_: yeah... the nouveau tracing of it is pretty much broken
18:24imirkin_: for like 3-4 different reasons
18:24imirkin_: reason #1: nvif - you can disable that in mesa
18:25Lyude: jfyi this is tracing the nvidia blob, not nouveau
18:25imirkin_: well you need --log-file=foo
18:25Lyude: i've got that
18:25imirkin_: i think i usually also supply --mmt-something=/dev/nvidia0 etc
18:27Lyude: Just to make sure, is it possible I might need root for this?
18:29imirkin_: not possible.
18:30imirkin_: can you just share the trace? want to look at something
18:30imirkin_: (xz -9 it)
18:30Lyude: imirkin_: sure, gimme just a sec
18:34Lyude: imirkin_: https://people.freedesktop.org/~lyudess/archive/3-20-2017/gm204-nv_fill_rectangle.mmt.xz
18:35imirkin_: Lyude: add -m 126 to the demmt line
18:35imirkin_: it can't detect the chipset version
18:35imirkin_: which it tells you on the last line :)
18:37Lyude: imirkin_: weird, that argument still isn't getting it to spit out any more info
18:37imirkin_: update envytools? i pushed a fix a little while back
18:38imirkin_: [and rebuild]
18:41Lyude: imirkin_: ah, there we go, thanks!
18:43karolherbst: what is insbf?
18:43imirkin_: insert bitfield
18:43imirkin_: aka BFI
18:44karolherbst: does that make any sense? insbf u32 $r6 $r5 0x00000c04 $r63
18:44imirkin_: yeah, that zeroes out bits 12..15 on $r5
18:44imirkin_: (i think)
18:44imirkin_: or bits 4..15
18:44imirkin_: i never remember the args
18:44imirkin_: there's too many, and they're too confusing.
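Since the argument order is admittedly hard to remember, here is a generic bitfield-insert sketch to experiment with. It rests on loud assumptions that the log does not confirm: the immediate is taken to encode the bit offset in its low byte and the field width in the next byte (so 0x0c04 would mean offset 4, width 12), and `base`/`insert` are hypothetical operand roles, not a statement about which nouveau source operand is which.

```python
MASK32 = 0xFFFFFFFF


def insbf(base, insert, spec):
    """Insert the low `width` bits of `insert` into `base` at `offset`.

    Assumed encoding of `spec`: bits 0..7 = offset, bits 8..15 = width.
    """
    offset = spec & 0xFF
    width = (spec >> 8) & 0xFF
    field = (((1 << width) - 1) << offset) & MASK32
    cleared = base & ~field & MASK32          # make room in the base value
    placed = (insert << offset) & field       # shifted field, clipped to width
    return cleared | placed


# Under the assumed encoding, 0x0c04 selects bits 4..15:
# insbf(0, 0x12345678, 0x0c04) == 0x6780
# insbf(0xFFFFFFFF, 0, 0x0c04) == 0xFFFF000F
```

Whether the real instruction reads its operands in this order is exactly the part to verify against the emitter or the hardware docs before relying on it.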
18:55Lyude: ah, I see the difference with non-fill-rectangle using the macro vs. fill-rectangle writing all the registers by hand now
19:01imirkin_: but i've never seen writes from a macro be any different than writes into the pushbuf...
19:01Lyude: What do you mean by that?
19:01imirkin_: macro is just a way to execute commands
19:02imirkin_: (and do so conditionally)
19:02imirkin_: and/or control-flow-ly
19:02karolherbst: I just noticed, I missed the shader-db shaders in my fma patch... well nvm
19:02Lyude: Ah, but aren't we seeing it do basically what the macro does except just by touching the registers by hand? or did I misinterpret what you said before
19:03Lyude: that would make sense if they had to do the writes in a special order to make things actually work
19:03imirkin_: we are seeing it... with PM: instead of PB:
19:03imirkin_: that said, it writes FILL_RECTANGLE directly it seems
19:03imirkin_: the macro stuff was for POLYGON_OFFSET stuff
19:03imirkin_: which for some weird reason doesn't work right on gm200
19:03imirkin_: but works fine on gm107
19:23karolherbst: that sel pass is problematic... I have some shaders which increase their gpr usage by a lot (one from 25 to 31)
19:23imirkin_: yeah, that can happen
19:24karolherbst: I could be smart in the pass and check if the selp can at least load one immediate inside
19:24karolherbst: this would resolve that issue
19:25karolherbst: but I can't check that if I don't have the selp yet
19:25karolherbst: or I just throw the selp away later again
20:08karolherbst: this looks acceptable for a selp pass: https://gist.githubusercontent.com/karolherbst/40d0e1b8d662988b4010c0234de6d3fd/raw/c552ab62f24ca0b4367d8bd34568965000b6b045/gistfile1.txt
20:10imirkin_: huh, surprising that it has that effect on gpr's
20:10karolherbst: this is a small effect on gprs
20:11karolherbst: in my current version I only do the conversion if both are immediates, and both can be loaded (aka only 0 or limms)
20:11karolherbst: I meant short imms
20:11karolherbst: imirkin_: the effect on the gprs is easily explained
20:12karolherbst: imagine this:
20:12karolherbst: $p0 mov $r1 0x....
20:12karolherbst: not $p0 mov $r1 0x....
20:12imirkin_: oh, i'm sure it is.
20:12karolherbst: if both can't be loaded
20:12imirkin_: yea i know. but ideally the set should go away...
20:12karolherbst: you have to track more values
20:12karolherbst: next step
20:12imirkin_: does it?
20:12imirkin_: oh, it doesn't :)
20:12imirkin_: that's unfortunate
20:13karolherbst: there is no pass for it, is there?
20:13imirkin_: oh, but the stupid branches
20:13karolherbst: the stupid branches go away
20:13karolherbst: now I have set+selp
20:13imirkin_: oh right.
20:13karolherbst: 3766: set ftz u8 $p0 le f32 $r4 abs c1[0x74]
20:14karolherbst: 3767: selp u32 $r4 $r63 0xffffffff not $p0
20:14karolherbst: guess what
20:14karolherbst: maybe I should do the selp only for 0 and -1
20:14imirkin_: so ... i'd convert that into cvt $r4 not $p0
20:14imirkin_: and then have an algebraic opt
20:14imirkin_: to "fix the glitch"
20:14imirkin_: or something
20:15karolherbst: I think I will remove my hacky detection for good situations
20:15karolherbst: and just do the selp opt if I get a bool output aka u32 0/-1
20:16karolherbst: can't I simply convert the set+selp into a slct?
20:17karolherbst: ohhhh wait
20:20karolherbst: that selp is pretty much useless in that case...
20:21imirkin_: set can output a thing directly
20:21karolherbst: imirkin_: the best is where $r4 gets used later: 3771: set u32 $r4 ne $r4 $r63 (8)
20:21imirkin_: yeah. it's all fantastic.
20:21imirkin_: improving such sequences has long been on my list
20:21imirkin_: but never made it to the top
20:22karolherbst: okay, so next step is: set $p + selp 0 -1 -> set $r
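The proposed fold (set to a predicate followed by a selp of 0/-1 becoming a single set to a register) can be checked with a toy model. This is a sketch under assumptions: `selp(a, b, pred)` is modelled as "a if pred else b", a set writing a u32 register is modelled as producing 0xffffffff for true and 0 for false, and the `ftz`/`abs` modifiers from instruction 3766 are ignored.

```python
TRUE32 = 0xFFFFFFFF


def set_pred_le(a, b):
    """set $p0 le a b: comparison result as a predicate."""
    return a <= b


def selp(a, b, pred):
    """selp dst a b pred: pick a when pred holds, else b (assumed semantics)."""
    return a if pred else b


def two_step(a, b):
    # Mirrors the 3766/3767 pair: set $p0, then selp 0 / 0xffffffff on 'not $p0'.
    p0 = set_pred_le(a, b)
    return selp(0, TRUE32, not p0)


def folded_set_u32_le(a, b):
    """set u32 $r le a b: boolean written straight to a register."""
    return TRUE32 if a <= b else 0
```

Both paths agree for every input in this model, which is the property the optimization relies on; the real pass additionally has to match the operand order and polarity of the selp.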
20:25karolherbst: imirkin_: where would you put such an opt?
20:25karolherbst: algebraic or constant?
20:41karolherbst: okay, now I need a good brain
20:42karolherbst: imirkin_: does this look correct to you? https://gist.githubusercontent.com/karolherbst/0ae548a011ac88cb8a800c176dde0b29/raw/cde93ee05da3f8add9fefefe5e89a7a85ea6285b/gistfile1.txt
20:44karolherbst: mhh, I don't respect the values of the result for now anyway, oh well, that's getting complicated now
20:47karolherbst: it stayed the same with respecting the result
20:52karolherbst: uhm, that one set misses a cc
20:56Lyude: By the way, what -is- PGRAPH anyway
20:57imirkin_: it's an engine.
20:57karolherbst: the important stuff
20:57imirkin_: the nvidia gpu is composed of several engines
20:57imirkin_: which behave somewhat independently
20:58imirkin_: GRAPH contains all the drawing-related items
20:59karolherbst: shouldn't inverseCondCode(lt) be gte?
21:00imirkin_: karolherbst: inverseCondCode is different than reverseCondCode
21:00imirkin_: or something
21:00imirkin_: it's all extremely confusing, and i don't remember specifics :)
21:00Lyude: Ah, btw: I'm not seeing any notable differences between the mmt traces for fill rect nv vs. fill, so I'm looking to see if the blob might be doing something differently in the macro it uses to set GL_POLYGON_MODE_FRONT/BACK
21:00karolherbst: inverse looks wrong
21:00imirkin_: karolherbst: but long story short is those functions are *correct*, if you think they're incorrect, adjust your thinking.
21:00karolherbst: reverse it is
21:00karolherbst: reverse is also wrong
21:01imirkin_: i've thought those functions were wrong many times too
21:01imirkin_: but in the end, they've come up victorious
21:01karolherbst: no, I meant in my situation I used them wrong
21:02imirkin_: oh, that's possible =]
21:02karolherbst: .............. but still
21:02karolherbst: reverse of lt is neu?
21:03karolherbst: I see
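One common reading of the two helpers is that "inverse" negates the comparison result while "reverse" swaps the operands. The sketch below illustrates that distinction on ordered integers only. The tables are hypothetical and are not Mesa's encoding, and the unordered variants (such as neu) that matter for floats with NaN are deliberately left out.

```python
import operator

OPS = {
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
    "eq": operator.eq, "ne": operator.ne,
}

# Negation: (a INVERSE[cc] b) == not (a cc b)
INVERSE = {"lt": "ge", "le": "gt", "gt": "le", "ge": "lt", "eq": "ne", "ne": "eq"}

# Operand swap: (b REVERSE[cc] a) == (a cc b)
REVERSE = {"lt": "gt", "le": "ge", "gt": "lt", "ge": "le", "eq": "eq", "ne": "ne"}


def holds(cc, a, b):
    """Evaluate condition code `cc` on the pair (a, b)."""
    return OPS[cc](a, b)
```

So in this model the inverse of lt is ge, while the reverse of lt is gt; mixing the two up flips results only for some inputs, which makes the bug easy to miss.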
21:04karolherbst: at least everything looks broken in hitman after getting it wrong
21:04karolherbst: so I am fairly sure when I get it right
21:08karolherbst: this can't be right... can it
21:09imirkin_: look at where those functions are already used.
21:09karolherbst: I just completely lost track in my own code..
21:09kattana: arli said something about 'threads & context' not working well between virgl/nouveau. What does it mean?
21:10imirkin_: multithreaded GL calls + nouveau = sadface
21:10imirkin_: although, amusingly enough, mesa_glthread=true in the environment will probably work around the worst of it
21:11karolherbst: imirkin_: I try to copy the first set by making a second set with the def from the phi node and try to take into account the cc of the bra, the values of the immediates inside the branches and where the 0/-1 is: https://gist.github.com/karolherbst/9babc35caa9bfcd0ebfacd6d78ae3a54
21:11karolherbst: but I have no idea if this is even "correct"
21:11kattana: I wonder whether this has to do with mplayer+vdpau hard blocks. Even with a tiny resolution video.
21:11imirkin_: "hard blocks"?
21:12kattana: imirkin_: I asked because starting a vm using e17 with compositing on I haven't see any errors so far.
21:12kattana: imirkin_: instant kill of vid card.
21:12imirkin_: are you on a G92?
21:13kattana: only with vdpau, this used to happen last summer and I thought by now would've been fixed.
21:13imirkin_: not aware of any vdpau hangs on GK104
21:14imirkin_: (that said, if you try to do vdpau in one thread and GL in another thread, then same thing - boom!)
21:14kattana: I recall trying to get valgrind to work with it and sent you something but never got around doing it.
21:14imirkin_: but mplayer -vo vdpau should work fine
21:14imirkin_: there *is* an issue where sometimes blocks appear when decoding H.264 issues which has not been diagnosed, on all generations.
21:15kattana: imirkin_: it doesn't always fail, the first vid can survive, it's towards the third or fourth.
21:15imirkin_: and this is with plain mplayer, and vo=vdpau?
21:15kattana: by the way -vo vdpau doesn't fully use vdpau, mplayer needs something else.
21:15imirkin_: not with mpv or mplayer2 or whatever derivative
21:16imirkin_: yeah, you also need -vc ffh264vdpau, etc
21:18kattana: imirkin_: well right now I wanted to know that about 'threads & context'. I'll hard lock my machine some other time.
21:22karolherbst: it is preoptimized neu... that's why I got a neu
21:23karolherbst: this looks correct to me: https://gist.githubusercontent.com/karolherbst/9babc35caa9bfcd0ebfacd6d78ae3a54/raw/f0208fff6c08d7455ab93ce18d5faa21807a6c34/gistfile1.txt
21:24btborg: My nouveau broke and won't connect to X
21:24karolherbst: except the sType is missing :/
21:24karolherbst: ohh wait
21:24karolherbst: sType == dType
21:24btborg: Segfaults, hexdumps, and really strange errors in journalctl
21:24imirkin_: karolherbst: sounds right
21:25btborg: GPU is GTX980Ti (Maxwell)
21:25karolherbst: well piglit might tell me more
21:25imirkin_: btborg: pastebin dmesg
21:26btborg: will do
21:26btborg: hold on let me get on my desktop
21:27Lyude: some of this asm for the polygon mode front/back macros for the GM200 seems bizarre. "mov $r2 (add $r2 0xffffe4ff)"
21:28imirkin_: oh interesting.
21:28imirkin_: which macro?
21:29Lyude: One for setting POLYGON_MODE_BACK in the nvidia blob, 0x2f
21:29btborg: back on desktop
21:29Lyude: I'm assuming this must be where some of the magic is happening for NV_fill_rectangle if I can't see any other noticeable register differences
21:30imirkin_: Lyude: ok, well, the macro language is not exactly the most versatile
21:30imirkin_: Lyude: that's to check if the polygon mode == 0x1b01
21:30imirkin_: (which is FILL)
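The odd-looking constant is simply the 32-bit two's complement of 0x1b01, so adding it yields zero exactly when the register holds 0x1b01. A small sketch of that arithmetic, assuming 32-bit wrapping registers (the helper names are made up for illustration):

```python
MASK32 = 0xFFFFFFFF
CHECKED_VALUE = 0x1B01        # the polygon mode value the macro tests for
ADDEND = 0xFFFFE4FF           # constant from "mov $r2 (add $r2 0xffffe4ff)"


def add32(a, b):
    """32-bit wrapping add, as a register-width adder would do."""
    return (a + b) & MASK32


def matches_checked_value(mode):
    """True when mode == 0x1b01, using only add-and-compare-with-zero."""
    return add32(mode, ADDEND) == 0


# The addend is -0x1b01 modulo 2**32:
# (-CHECKED_VALUE) & MASK32 == ADDEND
```

This is the usual trick in an ISA that can branch on zero but has no compare-with-immediate: subtract the constant by adding its negation.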
21:31imirkin_: looks like it does some weird shit with method 0xf14 - no clue what that is :(
21:31Lyude: yeah i kept seeing that register get set to 0 in the traces and never get set to anything else
21:31imirkin_: SP[0x3] == GS, SP[0x4] == FS
21:32imirkin_: maybe it has to get set to something funny if e.g. you're drawing patches but have a tess evaluation shader outputting in point mode
21:33imirkin_: btw, fyi, branches in the macro language execute the next instr too unless there's a "annul" there
21:33imirkin_: just like MIPS :)
21:33Lyude: i will sadly admit I have never done assembly before so this is new to me
21:33btborg: heres the relevant logs
21:34imirkin_: Lyude: oh, well, [hrm, my example was going to involve building a cpu, but no asm probably means you haven't built a cpu either...]
21:34imirkin_: btborg: i've been seeing that too =/ try booting with nouveau.runpm=0
21:35imirkin_: Lyude: imagine you're a CPU processing instructions... and you have an instruction that tells you to fetch the next op from some other place
21:35imirkin_: Lyude: if you have no pipeline, no biggie - just fetch it from the other place and move on
21:35btborg: also relevant:
21:35btborg: Mar 20 17:19:32 _Nadia_ kernel: nouveau 0000:01:00.0: bus: MMIO write of 800000dc FAULT at 10eb14 [ IBUS ]
21:35btborg: single line
21:35imirkin_: Lyude: but if you have, e.g., a 2-stage pipeline, it's convenient to not worry about flushing the pipeline and just executing the next instruction anyways
21:35imirkin_: btborg: that's likely harmless.
21:36btborg: this too?
21:36btborg: I saw this all the time when it was working
21:36btborg: Mar 20 17:19:32 _Nadia_ kernel: nouveau 0000:01:00.0: DRM: Pointer to flat panel table invalid
21:36imirkin_: btborg: boot with nouveau.runpm=0 and hopefully your situation will improve.
21:36imirkin_: yea, that's fine
21:36btborg: nouveau.runpm=0 ? does this just disable nouveau for intel?
21:37imirkin_: btborg: it disables runtime suspend of the GPU, which probably isn't a thing for you anyways. it's only a thing for laptops.
21:37btborg: ah, I see
21:37btborg: will this hopefully make X and Nouveau work again (in theory)?
21:37imirkin_: (and super-high-end motherboards which can power off individual devices)
21:38btborg: where do I set that parameter?
21:38imirkin_: kernel cmdline
21:38btborg: where's that, in grub?
21:38imirkin_: depends on how your machine is set up
21:38imirkin_: but yes, many machines use grub to set that up.
21:38btborg: ok, I'll check the relevant ArchWiki article
21:42Mortiarty: imirkin, regarding "there *is* an issue where sometimes blocks appear when decoding H.264" - I can confirm that on a gk106. Can this be firmware related or if not where could that problem arise?
21:42imirkin_: Mortiarty: i've spent days trying to figure out the issue. i failed.
21:43imirkin_: Mortiarty: works fine with nvidia blob, so they know something about H.264 that we don't
21:43imirkin_: i have a video that reproduces the issue very early on which i have yet to analyze
21:43imirkin_: mostly just need someone to take a mmt trace on blob for me
21:44Lyude: imirkin_: so like, https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/mme/com9097.mme#n98 that would start fetching the instructions at #locn_0a_pmf, perform maddr0x36b, and then jump to #locn_0a_pmf if $r7 == 0?
21:44imirkin_: Lyude: correct.
21:44imirkin_: it's a 2-stage fetch-and-decode pipeline. so by the time the branch ends up "executing", it's already fetched the next op
21:45imirkin_: MIPS and probably a few other RISC architectures used to do this too
21:46Mortiarty: imirkin, i will gladly help you with an mmt trace - can i use any video or do you have a clip for me for testing. it occurs btw. with some, not all h.264 encoded material
21:46imirkin_: [and this is why branching is so bad on deep pipelines - you end up having to throw away a ton of work]
21:46imirkin_: Mortiarty: well, the nice thing about this trace is that the issue happens on like the first frame
21:47imirkin_: basically the further you go, the more things tend to diverge
21:47imirkin_: since there's many ways to use the engine... like reference frame management, etc
21:47Lyude: if that conditional branch ends up not doing the jump because $r7 is nonzero, would the maddr 0x36b get... discarded somehow?
21:47imirkin_: and it becomes difficult to compare what nouveau does to what blob does
21:47imirkin_: Lyude: no...
21:47Lyude: okay then I think I get it
21:48imirkin_: Lyude: basically there are 2 pointers... the currently-executing instruction, and the currently-fetching instruction
21:48imirkin_: [this is also why branch prediction is a thing... want to make sure you're filling your pipeline with the right instructions]
21:49imirkin_: normally you end up having to stuff nop's in the delay slot, although the macro isa allows the "annul" modifier which doesn't execute the delay slot
21:50imirkin_: [presumably by internally inserting a bubble into the pipeline]
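The delay-slot behaviour described above can be modelled with a tiny interpreter. This is a toy sketch, not the real macro ISA: the tuple-based instruction format is invented for illustration, only `emit` is allowed in the delay slot, and "annul" is assumed to suppress the delay slot only when the branch is taken.

```python
def run(program, regs):
    """Execute a toy program and return the list of emitted labels.

    Instructions (hypothetical format):
      ("emit", label)               record label
      ("bz", reg, target, annul)    branch to target if regs[reg] == 0
      ("exit",)                     stop
    """
    out = []
    pc = 0
    while pc < len(program):
        op = program[pc]
        if op[0] == "emit":
            out.append(op[1])
            pc += 1
        elif op[0] == "bz":
            _, reg, target, annul = op
            if regs[reg] == 0:
                # Taken: the already-fetched next instruction still runs,
                # unless the branch carries the annul modifier.
                if not annul and pc + 1 < len(program):
                    slot = program[pc + 1]
                    assert slot[0] == "emit", "toy model: emit-only delay slot"
                    out.append(slot[1])
                pc = target
            else:
                # Not taken: plain fall-through, nothing is discarded.
                pc += 1
        else:  # "exit"
            break
    return out


def make_program(annul):
    return [
        ("bz", "r7", 3, annul),
        ("emit", "slot"),
        ("emit", "fallthrough"),
        ("emit", "target"),
        ("exit",),
    ]
```

With `r7 == 0` and no annul the run emits `slot` and then `target`; with annul it emits only `target`. With `r7 != 0` everything executes in order, which matches the answer that the instruction after the branch is not discarded.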
21:52nyef: ... I just spent far too long trying to track down a regression with HDMI audio on gt215 with 3D output. And it turns out that it's a display-specific problem.
21:52imirkin_: Mortiarty: i don't have the trace on me, but i could probably share it with you later on
21:52imirkin_: nyef: doh!
21:52nyef: Audio works fine on a different panel, but it doesn't work on my PS 3D display.
21:52imirkin_: does it work without 3d?
21:53nyef: Not on the 3D display.
21:53nyef: And this also invalidates the "doesn't work if the DPort connector is in use" thing.
21:53btborg: No luck
21:53Mortiarty: imirkin, you meant the clip, right? And I make a trace?
21:53imirkin_: btborg: pastebin dmesg
21:53imirkin_: Mortiarty: yes.
21:54Lyude: imirkin_: alright, I think I understand this :)
21:54nyef: So, it's going to be about a week before I can do a proper regression test on the 3D patches for gt215.
21:54btborg: has anyone else reported errors like this with Maxwell cards?
21:54imirkin_: Lyude: it's really a little tangential to the whole thing :) btw, there's a bug in envyas that makes it unable to compile that mme file. have to comment something out in the ISA to make it work =/
21:54imirkin_: btborg: if you're seeing that error with nouveau.runpm=0, that's ... unlikely.
21:55btborg: is this likely an error of xorg or nouveau?
21:55imirkin_: depends what "this" is
21:55imirkin_: the error you were seeing before is nouveau's fault most likely... or something in drm core
21:55Lyude: imirkin_: oh. thanks for letting me know lol
21:55nyef: Which leaves the problems that I was having with the gk104 not running external displays at all, sorting out the frame-packing stuff, and starting in on the userland stuff.
21:55imirkin_: btborg: but you are unlikely to be seeing that error now. unless you didn't properly add that param.
21:55Lyude: also, I am guessing that 0xf14 register hasn't come up on other generations?
21:55btborg: do I type "set nouveau.runpm=0" into grub?
21:55imirkin_: btborg: it has to be on the kernel cmdline.
21:56imirkin_: i highly doubt that that's the proper grub syntax
21:56imirkin_: i think that just sets a variable in grub
21:56imirkin_: Lyude: i don't think so.
21:56btborg: where is that? in a config somewhere?
21:56imirkin_: btborg: depends on your setup.
21:56btborg: arch linux, systemd
21:56imirkin_: can't help ya. esp with the systemd bit.
21:57Lyude: interesting, maybe this 0xf14 register is the answer. i'll have to play with it later
21:58btborg: oh i get it now
21:58btborg: i did it wrong
21:58btborg: edited the wrong line in grub
22:00Mortiarty: imirkin, do i need https://nouveau.freedesktop.org/wiki/Valgrind-mmt/ ?
22:01btborg: wow omg, VICTORY
22:01btborg: X works again!!!
22:02imirkin_: Mortiarty: yes.
22:02btborg: thank you so, so much!
22:02imirkin_: airlied_: ok, so ... this is like the third report of runpm fail with kernel 4.10
22:03btborg: so what's the problem I was experiencing?
22:03airlied_: imirkin_: sounds like bisect time or lucky guess
22:03airlied_: should prob look at 4.10 log
22:04btborg: ah, so this is both a problem of nouveau and linux itself?
22:06imirkin_: btborg: nouveau is part of the linux kernel
22:07imirkin_: (like every other hw driver)
22:07imirkin_: airlied_: seems like the issues are largely happening on machines that should have no runpm in the first place
22:07imirkin_: (i.e. desktops/laptops without acpi methods)
22:07btborg: Is pm a laptop feature?
22:08imirkin_: btborg: depends ... in this case, powering PCI devices off, is a platform feature largely reserved to laptops.
22:08nyef: Hypothesis: GF119 worked fine with audio because the audio infoframe is handled by magic. GT215 doesn't because the audio infoframe is supplied by the driver and is hardcoded.
22:08imirkin_: nyef: magic does tend to work best.
22:09nyef: Thus, gt215 working for audio at all is hit or miss.
22:09btborg: yes, that explains why my system wouldn't shutdown properly!
22:09btborg: had to hard reset
22:09nyef: Also, g84 working for audio would *also* be hit or miss.
22:09btborg: probably should've mentioned... :/
22:09btborg: slipped my mind
22:10nyef: Alternately, something went wrong with my 3D display.
22:10nyef: Or both.
22:15nyef: ... gk104, 3D modes work. Audio not tested.
22:17Mortiarty: imirkin, compiled and tested valgrind - ready when you are.
22:18imirkin_: not till tonight
22:20Mortiarty: imirkin_, ok - btw i must congratulate you guys on the gk106 implementation. it runs much faster than the current nvidia-blob 378.13. i noticed a performance loss with the blob but never actually went into it until yesterday.
22:20Mortiarty: nvidia auto: glmark2 Score: 2939
22:20Mortiarty: nvidia max : glmark2 Score: 2967
22:20Mortiarty: nouveau 07 : glmark2 Score: 712
22:20Mortiarty: nouveau 0a : glmark2 Score: 1782
22:20Mortiarty: nouveau 0f : glmark2 Score: 4570
22:21imirkin_: Mortiarty: that sounds ... surprising
22:22imirkin_: we must be cheating somehow
22:22Mortiarty: imirkin_, i know karolherbst said something like that... the older nvidia drivers worked much faster but i can't use them anymore with recent kernels... so
22:23Mortiarty: nvidia must have changed something and i tried a lot already
22:23imirkin_: very odd.
22:23imirkin_: well, glad it works :)
22:23Mortiarty: and the score matches 2 other results from the web... for nvidia blob
22:23Mortiarty: yeah! yay!!!
22:24airlied_: imirkin_: 4.10 has runtime pm for pcie ports
22:24imirkin_: airlied_: nouveau.runpm=0 fixes it
22:24imirkin_: and it's death by lock imbalance
22:28imirkin_: RSpliet: ping re https://github.com/envytools/envytools/issues/88
22:36airlied_: imirkin_: I'll take a dig today if my memory works :)
22:36imirkin_: airlied_: and if not, i hear you can get replacements cheap now :p
22:39imirkin_: [or you can try the memory doubler, johnny mnemonic-style.]
22:54RSpliet: imirkin_: sorry, haven't had a lot of time since. I'll send that trace home tomorrow so I can take a proper look
22:55imirkin_: RSpliet: ok. feel free to pass it on to me as well.
23:01nyef: Okay, gk104 audio works with 3D.
23:01imirkin_: could be a pre-existing bug on gt215. although i do think people generally said it worked.
23:01nyef: Hell, *I've* said it worked.
23:01imirkin_: otoh it had that eld bug since v4.3, i guess people don't test it a ton
23:01imirkin_: well - back in the 3.x days :)
23:02nyef: Sure, that was back before the driver hardcoded the infoframe.
23:04nyef: So, gt215 final test needs to wait about a week, until I have access to the other 3D panel, but everything else works as expected.
23:05nyef: And that leaves the frame-packing damage to sort out and then userland work.
23:06nyef: Whatever userland work I do needs to leave at least a concession towards the 3D Vision kit, but at this point I'm not expecting to have the drivers for that sorted any time soon.
23:16karolherbst: imirkin_: okay, I figured out the difference between inverseCC and reverseCC
23:41karolherbst: when I only enable my SELFolding pass, hitman pro still runs, but if I add other opts, I get "asynchronous wait on fence nouveau:HitmanPro:80006d51 timed out" :/
23:41karolherbst: I bet algebraic opt does something odd
23:42nyef: Yeah, HDMI audio on gt215 doesn't stop working if there's something on the DPort. It's dependent on the panel liking whatever the driver is doing.
23:42nyef: That's a simpler problem to debug, in a way.
23:43nyef: At the very least, it's a fundamentals thing, not a "this hardware does something weird" thing.
23:44nyef: Also had a coldplug failure on the DPort.
23:44karolherbst: yep, it is algebraic opt
23:48karolherbst: imirkin_: .............. hilarious: https://gist.githubusercontent.com/karolherbst/ef47a42fd073d14c6a23e2d215cd0d70/raw/d9494570aba7259e51442bece962cf3aaff7dfe4/gistfile1.txt
23:48karolherbst: check the and
23:50karolherbst: I guess an early DCE will fix that