00:07karolherbst: those vbios are in the stupid nvflash format from nvidia :/
00:10karolherbst: I have a really bad feeling now
01:05R3d_Sky: how is maxwell first gen (GM107) support?
01:05R3d_Sky: is it plug&play or is there anything to worry about?
02:29skeggsb_: imirkin: well, i fixed that, but i don't know that it can even actually happen
04:35ptero_1: i've set up vdpau, like it said in the archwiki (i have a gtx260 and nouveau driver, nouveau-fw is installed). when i tried to play a video, both with mplayer and vlc, it has taken for over 30 seconds to load the video (playback didn't start) and then it has hanged my system (i was still able to move the mouse, but clicking and keyboard didn't work). with vdpau disabled in mplayer's config, videos start, but i get lags on ful
05:41mupuf: https://github.com/envytools/envytools/commit/2bfe3d7ef43b7bfaf22a35bcb9a25aa6b622f2d8 <- Karol must have been tired :D
05:41mupuf: they all point to unk64 :D
07:20karolherbst: mwk: the vbios is somewhat weird I've found. They have this nvflash header in front, but that can be removed. Otherwise they are two-part vbios where the bit tables of part 1 reference data in part 2
07:20karolherbst: but the odd part about this is, that part 2 has like a lot of random data in front
07:31karolherbst: NPDE, what that might be
07:33karolherbst: okay, this is some form of encrypted stuff but with the plain text after that?
07:38karolherbst: not that this makes any more sense after I cut that out
07:40karolherbst: but I hit all table headers...
08:03mupuf: karolherbst: https://github.com/envytools/envytools/commit/2bfe3d7ef43b7bfaf22a35bcb9a25aa6b622f2d8 <-- oopsie?
08:03mupuf: you forgot to update the power->unk64 part
08:04karolherbst: mupuf: exactly
08:04karolherbst: mupuf: but the pascal tables are all above 16bit range too
08:04karolherbst: I found some pascal vbios
08:04karolherbst: a lot of fun
08:04mupuf: cool, but the ones I find usually are unreadable by nvbios
08:05karolherbst: mupuf: just look at this: https://gist.github.com/karolherbst/f4da2dc7cec1ffbe8176529a4335751f
08:05karolherbst: mupuf: yeah
08:05karolherbst: mupuf: they have two parts
08:05karolherbst: mupuf: and the second part starts with a 0x11000 block of blob data
08:05karolherbst: starts with the string NPDE
08:05karolherbst: after you remove that, it can be somewhat read (+ my 32bit offset patches)
08:05mupuf: more encrypted crap....
08:05karolherbst: but it still is mostly garbage
08:06mupuf: karolherbst: care to make a script to convert the traces?
08:06mupuf: err, the vbios
08:06karolherbst: mupuf: https://gist.github.com/karolherbst/a280409a6abd78ff426489a4197c2320
08:06mupuf: you can push it in the script folder of the vbios repo
08:06karolherbst: at least I hit all the table headers
08:06karolherbst: but that doesn't make much sense
08:06karolherbst: it just looks so wrong
08:06karolherbst: mupuf: anyway, we have a gm206 vbios which also needs 32bit table offsets
08:07karolherbst: a lot of fun within nvbios
08:07karolherbst: more fun in the kernel
08:07mupuf: -- type: TOGGLE, duty_range: [0:0]%, fan_div: 0 --
08:07karolherbst: that's not the biggest concern
08:07karolherbst: there are a few things different
08:07karolherbst: 1. no CSTEP table
08:08karolherbst: 2. no VOLT table
08:08karolherbst: 3. sense, rail, vmap table totally borked
08:08mupuf: yeah, but that may explain why Ilia's card pretends it is a toggle fan
08:08karolherbst: he has a pascal?
08:09karolherbst: now I get it
08:09karolherbst: yeah, happens :D
08:09karolherbst: also the power budget table is totally bricked
08:09karolherbst: but as you see, I get reasonable table headers usually
08:09karolherbst: ohh I forgot -v
08:12mupuf: the performance table is borked, wrong offset
08:14karolherbst: 0x19c55. looks good to me
08:14mupuf: ok, possible
08:14karolherbst: and version 80
08:14mupuf: can't have a closer look now
08:15karolherbst: maybe it is just a reference table now
08:15mupuf: well, it would have gone from 0x50 to 0x80
08:15karolherbst: you mean from 64 to 80
08:16karolherbst: 0x40 to 0x50 ;)
08:16karolherbst: it makes sense somewhat, but the content of those tables is just broken
08:16karolherbst: the headers are fine
08:16karolherbst: no idea
08:16karolherbst: something is wrong, but I don't see it
08:17karolherbst: and there are always those NPDE and NPDS strings all over the vbios
08:18karolherbst: but some OC guys already said that maybe with pascal vbios mods aren't possible anymore
08:22karolherbst: mupuf: maybe we have to do something like that: get the data from the second half of the second vbios part. merge it with the first part, add some kind of key to it and we get the real table....
08:23mupuf: sounds insane
08:23karolherbst: you see how much sense those vbios make :D
08:23karolherbst: and the binary blob has the same size as the second vbios part
08:23karolherbst: there is no doubt about that
08:24karolherbst: and the table offsets also point to table headers
08:31karolherbst: well at least those m tables aren't in the second part
10:54karolherbst: I think we can just enable memory reclocking for maxwell1 gpus
10:54karolherbst: it doesn't seem worse than kepler
11:56karolherbst: anything special anybody wants to know from the eon devs for figuring out the performance issues we have?
12:37RSpliet: karolherbst: are we having performance issues with eon? :-P
12:40karolherbst: but seriously, I am talking with an eon dev who wants to take a look
12:40karolherbst: and he asked me what we would need to investigate
12:41RSpliet: do they have specific problems with nouveau, or is it just the "more efficient RA, better insn scheduling" dance?
12:41karolherbst: performance sucks
12:41karolherbst: big time
12:41RSpliet: how does eon differ with other modern games in that respect? :-P
12:41karolherbst: like a lot
12:41karolherbst: a lot
12:41karolherbst: RSpliet: let me put it this way: tomb raider is able to run at like 70% nvidia speed
12:42karolherbst: with saints row 3 it feels more like 15%
12:42karolherbst: well, bioshock infinite also runs pretty good
12:42karolherbst: but this is dx10 to gl
12:42karolherbst: and not dx9 to gl
12:43RSpliet: wait, is it using Wine?
12:43karolherbst: also memory clock has no effect on the performance, so I am sure something completely stalls the pipeline
12:43karolherbst: they have their own translating thing
12:43karolherbst: but basically they take the windows engines as it is
12:43RSpliet: okay... that memclk doesn't make a difference is surprising
12:43karolherbst: metric-ipc is below 0.4
12:44karolherbst: but I doubt that our compiler is _that_ bad
12:46pmoreau: Is Tomb Raider a native port or is it using some eon/Wine/etc.?
12:48RSpliet: karolherbst: have you got other metrics? L1/L2 misses?
12:48karolherbst: well. if they would work, I could get them
12:48RSpliet:pokes hakzsam... can we have those please? :-P
12:49karolherbst: I am interested in those *_utilization metrics too
12:49RSpliet: and what about register usage for their shaders
12:50karolherbst: RSpliet: not an issue afaik
12:50RSpliet: karolherbst: have you got the exact count?
12:50karolherbst: RSpliet: let me put it this way: we execute maybe 10% of instructions compared to pixmark_piano, while having the same performance
12:51RSpliet: karolherbst: but if you use 127 registers in the process, you're still not getting a lot out of it
12:51RSpliet: l1_global_load_miss could be interesting
12:51karolherbst: well, the only thing I am sure of is that something stalls. no idea what though
12:51RSpliet: karolherbst: stalls aren't a problem if there's enough parallelism to mask it
12:51RSpliet: hence the register count matters
12:52karolherbst: I meant like _any_ stall
12:52RSpliet: so do I
12:52karolherbst: it could be even a stall in the API or something like that
12:52karolherbst: fact is, the gpu is bored
12:52karolherbst: but the pmu counters show a high load on the GR
12:53RSpliet: did you actually get the GPR count for their shaders?
12:53hakzsam: RSpliet, hehe, bugfixes time before because it's the RC window :)
12:53karolherbst: I dumped them
12:53karolherbst: RSpliet: l1_global_load_miss doesn't work yet
12:54RSpliet: karolherbst: where? the link you sent only contains a list of available perf counters
12:54RSpliet: oh that's a shame!
12:54karolherbst: RSpliet: repository, we have one for shaders
12:54karolherbst: and I usually throw a lot of shaders in it
12:54RSpliet: oh I don't have access to that I don't think
12:56karolherbst: maybe active_cycles is interesting
12:56karolherbst: compared to other applications
12:57karolherbst: 200M active cycles
12:57karolherbst: that sounds like not much
12:59karolherbst: at 10fps...
13:00karolherbst: pixmark_piano: 22M @ 22fps
13:02karolherbst: 2.70G inst_executed@7fps (pixmark) 80M@10fps (saints_row)
13:02karolherbst: ^^ this is a big difference
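[editor's note] A rough cross-check of the Saints Row numbers quoted above. It assumes that metric-ipc is simply inst_executed divided by active_cycles and that both counters cover the same sampling window; the log does not state either, and the function name `ipc` is made up for illustration.

```python
def ipc(inst_executed: float, active_cycles: float) -> float:
    """Instructions per active cycle (assumed definition of metric-ipc)."""
    if active_cycles <= 0:
        raise ValueError("active_cycles must be positive")
    return inst_executed / active_cycles


# Numbers from the log: 80M inst_executed and 200M active cycles (Saints Row).
sr3_ipc = ipc(80e6, 200e6)
print(f"saints_row ipc ~ {sr3_ipc:.2f}")
```

Under those assumptions the two counters agree with the "below 0.4" figure mentioned earlier in the channel, which suggests the counters are at least internally consistent.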
13:03RSpliet: how's dual issue for those shaders?
13:03karolherbst: around 50% with my patches
13:04karolherbst: I am really worried why the inst_executed count is so low
13:34karolherbst: uhh 20M thread launched
13:34karolherbst: is that much?
13:35karolherbst: and around 700k warps launched
13:36RSpliet: I wouldn't say 20M is shocking
13:36RSpliet: is that per-frame?
13:39karolherbst: with tomb raider I get like 60M
13:41karolherbst: and around 200M inst_executed with 3-4x times fps
13:41karolherbst: so yeah, I still think that the gpu isn't exactly occupied with work but something stupid
13:45karolherbst: metric-ipc is around 1.50 in TR2013
13:46karolherbst: so I am quite sure, that this value being low in SR3 does show us why the perf is bad, but sadly it doesn't show us why the value is that low
13:51RSpliet: well, an ipc of 0.4 gives a bit of a hint
13:51karolherbst: yeah, I know, but I still didn't figure out why that is
13:52karolherbst: the ipc counter is also a bit odd
13:52karolherbst: ohh wait, no, it is actually fine
13:55RSpliet: well, here's a list of ideas: 1) relatively large density of memory reads (be it through the TPC or otherwise) means it stalls a lot. 2) Lots of cache misses due to placement collisions 3) Insufficiently benefiting from the distance between a load op and a usage op
13:56RSpliet: and those are only general ideas, I'm sure graphics pipeline experts can come up with a load of other stuff too :-P
13:59imirkin: skeggsb: cool. like you said, i doubt that's what was actually happening there.
14:06karolherbst: RSpliet: but if it would be large amount of memory reads, shouldn't a higher memory clock actually help?
14:07RSpliet: karolherbst: not if you're saturating your MSHRs
14:14karolherbst: RSpliet: what's MSHR?
14:14RSpliet: "Miss Status and Handling Register"
14:17RSpliet: karolherbst: here's an interesting read on the topic: http://ece.uvic.ca/~lashgar/files/15heart.pdf
14:19RSpliet: (note that the "PRT" thing is patented by NVIDIA, funny how those "researchers" talk about reverse engineering (in a more elaborate version of the paper) while it's all documented there.)
14:23RSpliet: hmm, maybe I should be a bit more PC in a publicly logged channel like this... it's an informative read still :-)
14:23karolherbst: why shouldn't that have been "PC"? :D
14:24karolherbst: if you use "PC" to cover the truth you have a problem anyway :p
14:25RSpliet: well, putting «researchers» between quotation marks based on one sample of their work is a bit premature to say the least ;-)
14:26karolherbst: but who is "mature" _and_ "honest" in public anyway :p
14:26RSpliet: Vladimir Putin, and we all know it
14:26karolherbst: well :D
14:30karolherbst: RSpliet: well the thing is now, how can we find out?
14:53karolherbst: hakzsam: by the way, compute shaders still crash TR2013
14:54karolherbst: hakzsam: I just got told I actually have to restart the game to test that ...
14:54karolherbst: well, crashes the gpu
15:07pmoreau: karolherbst: Ok. Have you tried running Tomb Raider through Wine to see how well/bad it performs there?
15:09karolherbst: pmoreau: tomb raider isn't the issue ;) but on wine it uses the dx9 rendering path and on linux the dx10 one (as one of the devs told me)
15:09karolherbst: even on windows the dx9 path is like 25% faster
15:09karolherbst: and the same is usually true for native vs wine
15:09karolherbst: but I am more concerned about saints row 3 ;)
15:09pmoreau: Mhh… it’s a native port but using DX10 on Linux? You lost me
15:10karolherbst: pmoreau: I think they just abstract the same rendering path
15:11karolherbst: yeah, the dev just said it
15:11karolherbst: they have two "completely" different paths
15:11karolherbst: and one is the dx9, the other the dx10 path
15:11karolherbst: but they use gl
15:11karolherbst: it's just the way they call it
15:12pmoreau: I was thinking of comparing TR native vs wine, to see how much you lose by moving to wine, even if it’s not a good measurement. But if TR still runs at 60% through wine, then eon has most likely some issues
15:13karolherbst: pmoreau: TR wine-csmt-dx9 was faster than dx10-nvidia-native
15:13karolherbst: pmoreau: but you forget that bioshock infinite is also a dx10 game
15:13karolherbst: and this runs pretty good
15:13karolherbst: on nouveau and nvidia
15:14karolherbst: but the dev already said that there are issues with their dx9 stuff
15:14karolherbst: nouveau is just really bad
15:14karolherbst: even compared to nvidia
15:14karolherbst: where I get like 60 fps with nvidia, I get hardly 15 with nouveau
15:15pmoreau: do I have any of those eon games? I guess The Witcher 2 is one of them?
15:15imirkin: karolherbst: if you're talking to eon guys, try to get them to cough up some keys for mesa developers
15:16karolherbst: pmoreau: I can give you access to sr2/sr3
15:16karolherbst: pmoreau: we've got keys
15:16karolherbst: imirkin: there is one issue
15:17karolherbst: imirkin: they don't have publishing rights
15:17imirkin: karolherbst: based on some things you're saying, it's sounding like poor resource placement is a large part of the issue.
15:17karolherbst: imirkin: so it costs them
15:17imirkin: better them than me
15:17karolherbst: I have both games anyway, and we got sr2/sr3 keys
15:17karolherbst: and this should be enough to get started
15:18pmoreau: sr2 == Saints Row 2, not Normandy SR-2! Okay :-D
15:18karolherbst: yeah :D
15:18imirkin: ok, good luck
15:18karolherbst: those are also games where nouveau has issues
15:18karolherbst: so, it should keep us busy
17:43mwk: alright, the llvm assembler is reasonably well-tested at this point
17:43mwk:considers it closed for v3 and proceeds to beating codegen into shape
19:11karolherbst: by the way, when nvagetbios says "Card has second bios", that means I have like 2 rom chips?
19:26hakzsam_: I'm going to make a list of tested/supported games with nouveau. Can you guys send me a list of games you tested/played with some notes like (good perf, no known issues, totally broken, etc) ? Thanks
19:26hakzsam_: the list will probably be available on the wiki
19:27karolherbst: nv134 is the chipset name for gp104
19:28karolherbst: and nvagetbios -s prom still seems to work :)
19:29karolherbst: the vbios is really odd :D
19:42pmoreau: hakzsam_: I guess (for the perf), a comparison to the blob is required?
19:45hakzsam_: pmoreau, not really, whether the game is playable is enough I would say
19:48pmoreau: Then, for GK107: Crusader Kings II: runs fine, Stellaris: playable as long as you do not look at the galaxy map which makes framerate drop (and I run out of VRAM too, but the game remains stable)
19:49hakzsam_: which mesa version?
19:50pmoreau: 11.2, and kernel is Karol’s kepler stable reclocking v5, so 4.6-rc7 IIRC
19:51karolherbst: hakzsam_: well do you plan to make that public?
19:52hakzsam_: yeah, why not?
19:52karolherbst: hakzsam_: I am sure that some "gamers" would like to fill stuff and crap in there
19:52karolherbst: just asking
19:52karolherbst: good stuff and crap :D
19:52hakzsam_: but editing the wiki requires rights :/
19:52hakzsam_: karolherbst, do you have an account btw?
19:53karolherbst: winehq also somehow works
19:53karolherbst: I should have
19:53karolherbst: maybe not
19:53karolherbst: I don't know
19:53karolherbst: hakzsam_: but honestly, I think it is a bad idea if it would require a fdo account
19:54karolherbst: appdb on winehq is also community managed
19:54karolherbst: and it works quite well
19:54hakzsam_: do you have a better idea? :)
19:54karolherbst: no, I talk about appdb/winehq just for fun :p
19:55karolherbst: well, I am sure we can have something community driven there
19:55karolherbst: especially when reclocking will be kind of stable on kepler/maxwell1 and people actually use nouveau for playing games
19:56karolherbst: Smiles has a 750 Ti running with my maxwell_reclocking branch and it seems to work quite well
19:56karolherbst: which is I think the fastest maxwell1 goes
19:57karolherbst: maybe a 860m is faster
19:57karolherbst: the faster 860m is kepler...
19:58karolherbst: hakzsam_: if it goes out of hand, because we get too much crap, we can always make it more moderated or something
19:58karolherbst: but usually the ones who play the games know best how well nouveau works on that
19:59hakzsam_: but let's start with that
19:59hakzsam_: and improve later if needed
20:00karolherbst: yeah, just saying, so that we can migrate easily to whatever we would like to use later
20:00hakzsam_: if the list grows up
20:01karolherbst: so, I pushed the gp104 vbios I got today through nvagetbios
20:02karolherbst: if somebody wants to take a look at the (most likely) encryption bs, feel free to do that :p
20:19Yoshimo: and how well does the nouveau toolchain parse the pascal bios?
20:51pmoreau: Since I was adding `high(a.low * b.low)`, I thought I didn’t need the carry. That’s true, as long as the initial operation was a MUL and not a MAD…
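[editor's note] One possible reading of the remark above, assuming it is about lowering a 64-bit multiply(-add) to 32-bit halves: for a plain MUL the low word is exact, so only `high(a.low * b.low)` feeds the high word, but for a MAD the low-word addition can overflow and its carry has to be propagated. This is a sketch, not the actual codegen; all helper names (`split`, `mul64`, `mad64`, `propagate_carry`) are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def split(x: int) -> tuple[int, int]:
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return x & MASK32, (x >> 32) & MASK32


def mul64(a: int, b: int) -> int:
    """Low 64 bits of a * b built from 32-bit halves; no carry term needed."""
    a_lo, a_hi = split(a)
    b_lo, b_hi = split(b)
    lo_full = a_lo * b_lo  # full 64-bit product of the low halves
    lo = lo_full & MASK32
    hi = ((lo_full >> 32) + a_lo * b_hi + a_hi * b_lo) & MASK32
    return (hi << 32) | lo


def mad64(a: int, b: int, c: int, propagate_carry: bool = True) -> int:
    """Low 64 bits of a * b + c; the low-word add may carry into the high word."""
    a_lo, a_hi = split(a)
    b_lo, b_hi = split(b)
    c_lo, c_hi = split(c)
    lo_full = a_lo * b_lo
    lo_sum = (lo_full & MASK32) + c_lo
    carry = (lo_sum >> 32) if propagate_carry else 0
    hi = ((lo_full >> 32) + a_lo * b_hi + a_hi * b_lo + c_hi + carry) & MASK32
    return (hi << 32) | (lo_sum & MASK32)


a, b, c = 1, 0xFFFFFFFF, 1
print(hex(mad64(a, b, c)))                         # carry propagated
print(hex(mad64(a, b, c, propagate_carry=False)))  # carry dropped
```

With `a = 1`, `b = 0xFFFFFFFF`, `c = 1` the low-word addition wraps, so dropping the carry loses the whole result; the same inputs with `c = 0` behave like a plain MUL and need no carry.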
20:51hakzsam: just played Serious Sam 3 on GF119, it works just fine
21:54RSpliet: karolherbst: oh joy, everything is upside down in the VBIOS :-D
21:54RSpliet: well, at least the RAM type table is still intact
21:56karolherbst: RSpliet: well basically it is like this: the vbios has two parts
21:56karolherbst: 1. 0-0xf200
21:56karolherbst: 2. 0xf200 - 0xf200 + 0x11000+stuff
21:56karolherbst: 2. part begins with the 0x40 meta data
21:56karolherbst: then comes a 0x11000 crypto? block
21:56karolherbst: then 0x11000 data
21:57karolherbst: if you cut out the crypto? block, then the P table points to the right headers
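[editor's note] The layout described above can be sketched as follows. The default offsets (part 2 at 0xf200, 0x40 bytes of metadata, a 0x11000-byte opaque block) are taken from this one dump as described in the channel and are assumptions for any other image; `cut_block` is a hypothetical helper, not the script from the gist or anything in envytools.

```python
def cut_block(rom: bytes, part2_start: int = 0xF200,
              meta_len: int = 0x40, block_len: int = 0x11000) -> bytes:
    """Return rom with block_len bytes removed right after part 2's metadata."""
    block_start = part2_start + meta_len
    block_end = block_start + block_len
    if block_end > len(rom):
        raise ValueError("image is too short for the assumed layout")
    return rom[:block_start] + rom[block_end:]


# Synthetic stand-in for a dump: part 1, metadata, opaque block, table data.
fake = (b"\x01" * 0xF200 + b"\x02" * 0x40
        + b"\x03" * 0x11000 + b"\x04" * 0x100)
trimmed = cut_block(fake)
print(hex(len(fake)), "->", hex(len(trimmed)))
```

Keeping the offsets as parameters makes it easy to try other split points, since the channel has not established that they are the same across boards.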
21:57karolherbst: RSpliet: check my pascal branch on envytools
21:57RSpliet: haven't I seen VBIOSes where they cluncked the FALCON firmwares directly after it?
21:57karolherbst: RSpliet: the issue is, since maxwell, the tables have 32bit offsets
21:58karolherbst: not 16bit as we always thought
21:58karolherbst: so tables starting above 0xffff happens
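[editor's note] A small illustration of why a 16-bit read breaks once a table lives above 0xffff. It uses 0x19c55, the performance table offset quoted earlier in the log, and assumes little-endian pointers; `read_offset` is a hypothetical helper and does not model where the pointers actually sit in the BIT tables.

```python
import struct


def read_offset(data: bytes, pos: int, width: int) -> int:
    """Read a little-endian table offset that is 2 or 4 bytes wide."""
    fmt = {2: "<H", 4: "<I"}[width]
    return struct.unpack_from(fmt, data, pos)[0]


entry = struct.pack("<I", 0x19C55)    # pointer as it would be stored
print(hex(read_offset(entry, 0, 2)))  # 16-bit read drops the upper bits
print(hex(read_offset(entry, 0, 4)))  # 32-bit read keeps the full offset
```

A parser that still reads 16 bits would silently land on 0x9c55 instead of failing, which is one way to end up with plausible-looking but wrong table contents.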
21:58karolherbst: and yes, we have to change the kernel module for that :D
21:58karolherbst: we got a maxwell vbios two days ago
21:58RSpliet: btw, I thought I fixed the RAM script -v output a long time ago, but that looks very broken
21:59RSpliet: *RAM type
21:59karolherbst: RSpliet: nv124/Lekensteyn
21:59karolherbst: RSpliet: GDDR5X
21:59karolherbst: but yeah, maybe everything is broken in this pascal vbios
21:59karolherbst: first we have to fix this nv124 vbios
21:59RSpliet: RAM script -v output is broken for Fermi too
21:59karolherbst: then we can have a look at pascal
22:06RSpliet: don't you mean NV134/Unseen2?
22:06karolherbst: RSpliet: no
22:07karolherbst: RSpliet: we have to fix that maxwell first
22:07karolherbst: check the P table
22:07karolherbst: you will see it
22:08RSpliet: small things first
22:08karolherbst: we need this for pascal one way or another
22:08karolherbst: but maxwell is currently "supported"
22:09karolherbst: check this PR for fixing nvbios up: https://github.com/envytools/envytools/pull/55
22:09RSpliet: I was looking at smaller things right now
22:09RSpliet: that NV134, where does it come from? what is it? 1070 or 1080?
22:09RSpliet: so GDDR5X?
22:10karolherbst: and it was retrieved via nvagetbios -s prom
22:10imirkin_: 1070 will probably also be NV134, with 1060 being NV136. if previous patterns are followed at all.
22:10karolherbst: yes, 1070 is confirmed to be nv134
22:11imirkin_: not that any of this matters
22:11RSpliet: do we have anything else, like a strap peek or trace?
22:11karolherbst: not yet
22:11karolherbst: he said he might do a trace on the weekend
22:15RSpliet: here's a bit of N=1 statistics trickery... let's wait with pushing until we've seen a strap peek or some more samples
22:15RSpliet: (small... ;-P)
22:16imirkin_: RSpliet: well, induction tells that if it works for n=1, it works for all n...
22:16RSpliet: only if it works for n=n+1 too
22:17imirkin_: poke holes in my theory, why don't ya
22:19RSpliet: is "af" a new initscript opcode?
22:21karolherbst: I also thought there might be a new opcode
22:21karolherbst: you mean 0x0000b7ed, right?
22:22RSpliet: can't say what it is, but byte 3 indicates how many 32-bit words follow
22:24RSpliet: those 32-bit words are regs
22:35RSpliet: oh... the 1152 bytes that follow are probably part of the same opcode
22:35RSpliet: (6 * 0x30 * 4)
22:42RSpliet: skeggsb, karolherbst: can I leave you to play with this http://paste.fedoraproject.org/373912/49073341 ?
22:42RSpliet: you probably need trace to solve all the mysteries around it, but well, I'm just hitting the sack :-P
22:45RSpliet: oh, heh, forgot a break; there
22:46RSpliet: more "oh, heh"
22:46RSpliet: it's the training set 1 upload routine
23:02imirkin_: i like it ;)
23:03RSpliet: I guess they couldn't get away with untrained memory in low speeds anymore
23:08dcomp: guys does this mean anything?
23:08dcomp: [ 696.582080] x86/PAT: nvagetbios:7041 conflicting memory types a0000000-b0000000 uncached-minus<->write-combining
23:08dcomp: [ 696.582084] x86/PAT: reserve_memtype failed [mem 0xa0000000-0xafffffff], track uncached-minus, req uncached-minus
23:08imirkin_: yeah, that's fine, ignore it
23:09dcomp: [ 696.584336] nouveau 0000:07:00.0: bus: MMIO read of 00000000 FAULT at 6013d4 [ IBUS ]
23:09dcomp: sorry that last line was the one I meant to paste
23:10dcomp: happens every time I run nvagetbios
23:10imirkin_: that means that something tried to read mmio register 0x6013d4 and that reg is not there
23:10imirkin_: now, that happens to be the vga-style 0x3d4 reg, i think
23:10imirkin_: which isn't there on "3D Accelerator" cards
23:10imirkin_: (as opposed to "VGA compatible adaptor" cards)
23:11dcomp: actually second time it tries to read 619f04
23:11imirkin_: probably nvagetbios ends up touching it for some reason
23:11imirkin_: and it probably shouldn't. but it's harmless as well
23:11imirkin_: the card just raises an error, and nouveau reports on that error
23:12RSpliet: 0x619f04 is a pointer to the VBIOS memory location. Its lack of existence could mean nvagetbios fails to find your VBIOS
23:13RSpliet: in my personal experience, a better way to obtain the VBIOS is by running nouveau and grabbing it from /sys/kernel/debug/dri/<number>/vbios.rom
23:13RSpliet: (but I understand there's situations in which this is impossible ;-))
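[editor's note] A minimal sketch of the debugfs route mentioned above, assuming debugfs is mounted at /sys/kernel/debug, nouveau is loaded, and the caller has permission to read it (typically root). Both function names are made up for illustration.

```python
import glob
import os


def find_vbios_roms(dri_root: str = "/sys/kernel/debug/dri") -> list[str]:
    """List every <dri_root>/<number>/vbios.rom that is visible to us."""
    return sorted(glob.glob(os.path.join(dri_root, "*", "vbios.rom")))


def save_vbios(src: str, dst: str) -> int:
    """Copy one vbios.rom to dst and return the number of bytes written."""
    # Plain read/write: debugfs files do not always report a useful size.
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as out:
        out.write(data)
    return len(data)
```

Typical use would be `for rom in find_vbios_roms(): save_vbios(rom, "vbios.rom")`; an empty list usually means debugfs is not mounted, nouveau is not bound to the card, or the process lacks permission.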
23:13karolherbst: RSpliet: XD
23:14RSpliet: karolherbst: ?
23:14karolherbst: nothing, just your last sentence ;)
23:14karolherbst: doesn't nvagetbios default to pramin?
23:15karolherbst: and yes, 0x619f04 is pramin related
23:16karolherbst: dcomp: nvagetbios -s prom
23:17dcomp: karolherbst: now I get spam of mmios read failures 3f84dc > 3ffff4
23:17karolherbst: dcomp: mobile chip?
23:17karolherbst: I guess it is one of those ACPI thingies?
23:18karolherbst: dcomp: but I can give you your vbios from your gm108 if you really want to have it
23:24dcomp: managed to get it... Should the mem train table be full of zeros?
23:40dcomp: Should 11d65c onwards be the same between nouveau and nvidia?
23:41dcomp: As far as I can tell it's related to ram init