00:19imirkin: hrmph. apparently we also fail some tests where geometry shaders emit points. coincidence?
03:05JodaZ: anyone got an idea how i could just write to something resembling a framebuffer on windows from kernel mode?
07:55karolherbst: hmpf, nouveau doesn't write the pmu scripts in one go and then demmio won't read the SEQ script out of it :/ I wonder how I did that earlier
08:09hussam: karolherbst: hello. The patch enabled reclocking. It was not as fast as the nvidia proprietary driver but noticeably faster.
08:09karolherbst: hussam: I see. So most likely memory wasn't the big bottleneck in your case
08:10karolherbst: skeggsb: maybe we should indeed enable engine reclocking on fermis and try to solve all upcoming issues there, because for some it might make a noticeable difference
08:11hussam: there is however an issue which I don't know the root cause of. under nouveau/mesa, resizing gtk3 CSD windows causes terrible tearing. windows where mutter draws the decoration resize very smoothly.
08:13karolherbst: hussam: how much more performance do you get in your usecase with the reclocking enabled?
08:14hussam: I didn't know how to get exact numbers but the desktop lag stopped.
08:14karolherbst: that's a good enough reason anyway
08:15hussam: Yes, thank you very much for your help yesterday.
08:15karolherbst: no worries
08:15karolherbst: it was something I proposed months ago anyway
08:16karolherbst: but because it shouldn't make much of a difference in general it was kind of dropped
08:16karolherbst: but maybe we should just enable it and not look for bs arguments why not to
09:23karolherbst: RSpliet, mupuf: does anybody of you know how I can parse the nouveau PMU memx scripts?
09:23mupuf: RSpliet is the one to talk to for this
09:24karolherbst: yeah, I asked you both :p
09:24karolherbst: I did this once, but for some reasons it doesn't work now
09:27karolherbst: meh I have a file called "''$'\033'"....
09:27karolherbst: how do I remove this :D
09:28cosumu: oh god
09:48karolherbst: mhh, I don't find the stuff in my IRC logs :/
10:05karolherbst: ohh right, the nouveau scripts are value -> reg pairs
10:15Tom^: sigh, next time im ordering a 2000$ router.
10:17karolherbst: aereaux: your stuff looks odd :/
10:22karolherbst: aereaux: ahh, got it now
10:22karolherbst: aereaux: mem clock on your gpu was 5GHz,right?
10:25RSpliet: karolherbst: we don't have a tool to parse those scripts, but there's only 6 opcodes
10:31karolherbst: RSpliet: I am sure I compared what nouveau does with what nvidia does, but I can't remember how I did that :/
10:32karolherbst: but parsing those by hand shouldn't be a problem, although it might make sense to add support for that in demmio
10:35karolherbst: RSpliet: can you think of any reason other than a messed up reclocking sequence in the PMU script for the gpu crashing after executing the script?
12:50karolherbst: okay, GR is at full load while running those eon games :/
12:51karolherbst: graph does all the shader stuff, right?
13:01mupuf: but is it waiting on the memory controller or is it ALU-limited?
13:18karolherbst: just GR load
13:18karolherbst: really stupid waits
13:18karolherbst: memory load doesn't matter
13:18karolherbst: 0f perf == 0a perf
13:18mupuf: and what difference is there between the two perf domains?
13:18karolherbst: the GPU also doesn't consume much power
13:19karolherbst: mupuf: 1.6 GHz vs 4.0 GHz mem
13:19karolherbst: engine clocks stay the same
13:19mupuf: well, it may be hitting L2 most of the time
13:19mupuf: but yeah, that is unlikely
13:19karolherbst: mhh maybe
13:19karolherbst: well the odd thing is
13:19mupuf: there are perf counters for this
13:19karolherbst: the issues slots are around 20-40% filled
13:20karolherbst: mupuf: yeah, non working ones :D
13:21karolherbst: well anyway, I think something stalls in the MP internally
13:21karolherbst: memory doesn't seem to be the issue in itself
13:22karolherbst: improving the shaders doesn't help much either
13:22karolherbst: especially the dual issuing patch changed nothing
13:22karolherbst: but that's because of the under utilization of the issue slots
13:36karolherbst: mupuf: well anyway, the shaders also don't have that many texture operations
14:00karolherbst: RSpliet: okay, I already found some differences in the memory reclocking script :)
14:00karolherbst: aereaux: if you have some time later (like in 4 hours?) we could try to check what exactly went wrong
15:37karolherbst: RSpliet: when nvidia writes to PBFB_BROADCAST.MEM_TIMINGS_9 and we don't, could that cause big issues?
15:52karolherbst: ahh it is already set to the right value
16:53imirkin: alright, plugged in my GK208 instead of the GF108 now. ftr, it's a NV106 (GK208B).
16:53imirkin: reclocking on it works. way faster than the GF108 without reclocking :)
16:55imirkin: aaaand DRI_PRIME works on the NV34. eeexcellent.
16:55imirkin: so now i can test nvc0, nv50, and nv30 all without rebooting
16:56imirkin: hm. but the actual buffer sharing isn't QUITE 100% right
16:57imirkin: skeggsb: nouveau 0000:09:01.0: DRM: Moving pinned object ffff88008a6d7c00!
16:57imirkin: skeggsb: should i expect buffer sharing to work between a NV34 and a GK208?
16:58aereaux: karolherbst: I may have some time to test stuff today. How long do you think it'll take?
16:59karolherbst: aereaux: no clue
17:03aereaux: OK, I'm not sure how busy I'll be later yet, but I have some time now for a bit to test stuff.
17:04karolherbst: aereaux: can you try out this patch for 0f? https://gist.github.com/karolherbst/abd53ea152c59d5185c7af699cf03729
17:04aereaux: karolherbst: applied to your tree?
17:14imirkin: wow, elemental demo looks *way* better at 30fps than 3fps
17:16Tom^: technically it should just look a bit more sped up
17:16karolherbst: imirkin: stock nouveau or my branch?
17:17imirkin: karolherbst: 4.6 stock
17:18imirkin: Tom^: well, when you see only one frame out of ... many ... you lose some of the intended effect :)
17:18imirkin: btw, it looks like the first view of the lava flow is messed up with the GL 4.3 version
17:18imirkin: the second scene with the lava flow is fine
17:18karolherbst: imirkin: well, I guess it may crash any time then :p
17:19imirkin: karolherbst: why?
17:21imirkin: meh. it's set to 1V. should be enough no?
17:21karolherbst: depends on the clocks and gpu
17:21imirkin: at least according to the hwmon thing
17:21karolherbst: those cards can do 1.2V usually
17:21imirkin: it's a GT 730 with DDR3 vram
17:22karolherbst: doesn't matter
17:22karolherbst: mine hardly reaches 1V
17:22imirkin: mine's a desktop chip
17:22karolherbst: thats why you can expect more
17:23imirkin: AC: core 901 MHz memory 1800 MHz
17:23karolherbst: 901MHz is a bit low though :/
17:23imirkin: as high as it goes :)
17:23imirkin: this is the "free" adapter that came in a random dell box
17:23karolherbst: mhhh well I would say 1.02-1.03V would be appropriate
17:23imirkin: i was surprised it had 2GB vram
17:24karolherbst: a 780 Ti is clocked at like 900MHz too
17:24karolherbst: and goes up to 1.15V usually
17:24imirkin: it started at 0.9V initially
17:24imirkin: went to 1V on reclock
17:25karolherbst: well. If you give me your vbios I can check what nouveau actually should have set
17:25karolherbst: but maybe because the memory is the big bottleneck, maybe the engines don't get enough load to crash :/
17:27karolherbst: aereaux: well what nouveau does looks fine somewhat, but something bad happens :/
17:27karolherbst: aereaux: I don't think the patch actually changes anything, but it's the only thing which stands out
17:30hakzsam: imirkin, does elemental look fine on your kepler?
17:30imirkin: hakzsam: with the hud, yea. i think there's a bit of lava that looks wrong compared to the GL 3.2 version
17:32imirkin: and without the hud i get:
17:33imirkin: sounds like it dislikes something about the texture state? also for the short few frames it rendered for, there was a weirdly-textured rectangle up on the screen
17:34hakzsam: imirkin, btw, does this patch improve the thing http://hastebin.com/qucozelefi.coffee ?
17:34hakzsam: without the HUD
17:35hakzsam: anyways, we don't need to invalidate the compute CBs for Kepler
17:35imirkin: although something that just occurred to me is that you need to also invalidate the driver constbuf
17:35imirkin: for fermi
17:36hakzsam: because they are not aliased
17:36hakzsam: yeah, most likely
17:37imirkin: oh, you already dirty it
17:37hakzsam: in the driverconst() hooks yeah
17:39imirkin: that invalidate thingie should go into nvc0_compute_validate_constbufs
17:39imirkin: instead of being in the main launch_grid
17:39hakzsam: on fermi you mean?
17:40hakzsam: yeah, but nvc0_compute_upload_input() might upload uniforms too
17:40imirkin: mmmmm ok
17:40imirkin: ok wtvr
17:40hakzsam: this thing needs to be improved :)
17:40hakzsam: imirkin, what about the patch?
17:41imirkin: seems reasonable
17:41hakzsam: but does it improve?
17:41imirkin: oh, i didn't test
17:41imirkin: should it?
17:41hakzsam: that could be nice, yeah
17:42aereaux: karolherbst: I tried building nouveau as a module again, and it builds fine, but then I get some dmesg errors about unknown symbols.
17:42karolherbst: aereaux: which ones?
17:43imirkin: hakzsam: btw, you need to do more bufctx clearing when you invalidate
17:43imirkin: hakzsam: all these state validate functions just add things onto the bufctx
17:43imirkin: so you can end up with this ever-growing list
17:43imirkin: normally you have a foo_set() function which clears the bufctx and sets the dirty bit
17:44imirkin: i think you just need to add bufctx_resets() everywhere you invalidate
17:46aereaux: karolherbst: nvm, I got it. I needed to modprobe some dependencies.
17:46hakzsam: imirkin, mmh, right
17:46hakzsam: imirkin, we also need to reset the thing before validating images on kepler
17:47imirkin: hakzsam: no help, fyi
17:47imirkin: hakzsam: actually resetting in validate is probably wrong. need to reset when you set the dirty bit.
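The bufctx point above can be illustrated with a toy model. This is only a sketch in Python, not the real nvc0 code: `ToyContext`, `set_constbuf`, `invalidate_without_reset` and `validate` are hypothetical names made up for illustration. It shows why a validate step that only appends needs a matching reset wherever the dirty bit is set.

```python
class ToyContext:
    """Toy stand-in for a driver context; not the real nvc0 structures."""

    def __init__(self):
        self.bufctx = []      # buffer references accumulated by validate()
        self.dirty = False
        self.constbuf = None

    def set_constbuf(self, buf):
        # The "foo_set()" pattern: drop stale references, then set the dirty bit.
        self.bufctx.clear()
        self.constbuf = buf
        self.dirty = True

    def invalidate_without_reset(self):
        # The problematic pattern: only the dirty bit is set, nothing is cleared.
        self.dirty = True

    def validate(self):
        # Validation only ever adds to the bufctx; it never removes anything.
        if self.dirty and self.constbuf is not None:
            self.bufctx.append(self.constbuf)
            self.dirty = False


def grow_without_reset(rounds):
    """Return the bufctx length after `rounds` invalidate/validate cycles."""
    ctx = ToyContext()
    ctx.set_constbuf("cb0")
    ctx.validate()
    for _ in range(rounds):
        ctx.invalidate_without_reset()
        ctx.validate()
    return len(ctx.bufctx)
```

In this model the list grows by one entry per cycle when only the dirty bit is set, while going through `set_constbuf` each time keeps it at a single entry.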
17:47hakzsam: thanks for testing
17:48imirkin: hakzsam: same exact failure... some sort of texture fail
17:48hakzsam: but the patch will reduce the number of flushes :)
17:49imirkin: not against it
17:49hakzsam: I will send it just after dinner time
17:49hakzsam: and look at the invalidation issues on my gf119
17:57karolherbst: aereaux: how does it look?
17:59aereaux: karolherbst: OK, so writing to pstate doesn't change anything now. it gives me some errors about ramcfg data for 0mhz not found, and unable to find matching pll values (what I got before), and then doesn't freeze. however, when I read from upstate ac says 0mhz, and the * is at 0f.
17:59aereaux: * read from pstate.
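For checking the "ac says 0mhz" symptom without eyeballing the file, a minimal sketch of a pstate line parser follows. It assumes lines shaped like the one quoted earlier in this log ("AC: core 901 MHz memory 1800 MHz"); the optional `low-high` range form and the handling of a trailing `*` marker are assumptions, and `parse_pstate_line` and `looks_unclocked` are hypothetical helper names.

```python
import re

# Matches "<domain> <mhz> MHz" and, as an assumption, "<domain> <low>-<high> MHz".
_CLOCK = re.compile(r"(\w+) (\d+)(?:-(\d+))? MHz")


def parse_pstate_line(line):
    """Return (name, clocks, is_current).

    clocks maps a clock domain to a (min_mhz, max_mhz) tuple.
    is_current is True when the line ends with the '*' marker.
    """
    name, _, rest = line.partition(":")
    is_current = rest.rstrip().endswith("*")
    clocks = {}
    for domain, low, high in _CLOCK.findall(rest):
        # An unmatched optional group comes back as '', so fall back to `low`.
        clocks[domain] = (int(low), int(high or low))
    return name.strip(), clocks, is_current


def looks_unclocked(clocks):
    """True when every reported clock is 0 MHz, as in the failure described above."""
    return bool(clocks) and all(high == 0 for _, high in clocks.values())
```

Feeding it the AC line after a failed reclock would make `looks_unclocked` return True, which is a quick way to script the check across several attempts.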
18:08karolherbst: I guess the memory controller is still upset and the GPU got reset or something like that :/
18:08imirkin: or the gpu is runpm'd
18:09karolherbst: ohh right, but that should only happen after
18:09karolherbst: aereaux: well try to use the GPU then
18:12aereaux: karolherbst: trying to use the GPU freezes the screen for a bit, and then I get it back, but it exits with an error. bunch of errors in dmesg.
18:13aereaux: and I do have nouveau.runpm=0 currently set.
18:15karolherbst: aereaux: what does cat /sys/module/nouveau/parameters/runpm give you?
18:15aereaux: and it somehow froze my statusbar
18:17aereaux: a bunch of programs are frozen, including the sudos that I try to start to read that parameter.
18:18aereaux: and glxgears is running at 115% CPU, with no display.
18:20karolherbst: aereaux: :/
18:20karolherbst: aereaux: well something is bad, but I have no idea what
18:21aereaux: karolherbst: anything else I should try, or should I reboot?
18:21karolherbst: aereaux: reboot
18:25aereaux: karolherbst: any other ideas, or is it just not going to work?
18:25karolherbst: aereaux: no clue yet
18:26karolherbst: well there is one thing
18:26karolherbst: aereaux: load nouveau with runpm=0 config=NvMemExec=0
18:26karolherbst: and see if you can reclock this way (although memory reclocking is disabled)
18:27karolherbst: and if this works, we know at least that it is indeed memory related
18:27aereaux: with the new modified kernel module?
18:29imirkin: i assume no one has any interest in reviewing nv30 3d patches? i'm planning on just pushing stuff...
18:34aereaux: karolherbst: seems to be working with those parameters.
18:35aereaux: karolherbst: I used your branch without the patch you gave me if it makes any difference.
18:40karolherbst: aereaux: nope, it is just for memory
18:40karolherbst: so memory reclocking is the issue
18:41karolherbst: I guess it is some stupid corner case we missed
18:48hakzsam: imirkin, the patch should not hurt anything, but I can check tomorrow
18:48imirkin: probably best
18:55karolherbst: skeggsb: I think the GPU should get reset whenever we encounter this: "R 0x000140 0xffffffff"
18:56karolherbst: no point in continuing and trying to get anything to work
19:02karolherbst: ahh okay right
19:02karolherbst: the PMU doesn't reply
19:09karolherbst: aereaux: okay, I think I have an idea what might go wrong, let me check
19:24karolherbst: aereaux: can you retest with this patch applied? https://gist.github.com/karolherbst/47288d4393d2632f99288be5ae37a86f
19:25aereaux: karolherbst: any module options you want me to apply?
19:25karolherbst: aereaux: just runpm
19:27karolherbst: aereaux: and install envytools
19:28aereaux: karolherbst: what for?
19:29karolherbst: debugging the GPU
19:37aereaux: karolherbst: with the patch it locks up a bit, and then pstate gives ac as 0mhz and glxgears doesn't work on the discrete GPU.
19:37aereaux: after attempting reclocking
19:37karolherbst: yeah, as expected
19:37karolherbst: it should just speedup timeouts
19:37karolherbst: aereaux: dmesg?
19:38karolherbst: aereaux: nvapeek 0x00137390
19:39karolherbst: aereaux: and nvapeek 0x00110974
19:39aereaux: karolherbst: nvapeek gives bad0011f for both of those.
19:41karolherbst: crap :/
19:42karolherbst: aereaux: can you git checkout the modified file?
19:43aereaux: karolherbst: yeah, what for?
19:43imirkin: glennk: would you expect nv3x (geforce fx) to do GL_*_MIPMAP_LINEAR?
19:44karolherbst: aereaux: to reset the file to stock :D
19:44aereaux: karolherbst: do you still want the dmesg?
19:44karolherbst: aereaux: yeah
19:47glennk: imirkin, for POW2 textures only
19:47aereaux: karolherbst: jmad.org/tmp/dmesg_patch1.log
19:48imirkin: glennk: right, of course.
19:48glennk: it has a bug with s3tc where it interpolates the endpoint colors incorrectly
19:48imirkin: glennk: deqp is having some precision errors =/
19:48karolherbst: aereaux: okay
19:48imirkin: it's happy with mipmap nearest though
19:48karolherbst: aereaux: can you try out this patch? https://gist.githubusercontent.com/karolherbst/47288d4393d2632f99288be5ae37a86f/raw/7a548b33f3f7a887241cc23b50aa84cf5343dbd7/tmp.patch
19:49glennk: if i remember correctly it has 6 subtexel bits of precision for interpolation
19:49imirkin: does GLES2 expose such information somehow?
19:50glennk: not that i know of
19:50glennk: d3d 11 specifies some minimum values, gl doesn't afaik
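To get a feel for what 6 subtexel bits would mean for test tolerances, here is a small worked sketch. The figure of 6 bits is the one recalled above and is not a verified hardware spec; rounding to nearest (rather than truncating) is also an assumption, and `quantize_weight`, `lerp` and `worst_case_channel_error` are hypothetical names.

```python
SUBTEXEL_BITS = 6  # value recalled in the discussion, not a verified spec


def quantize_weight(w, bits=SUBTEXEL_BITS):
    """Round an interpolation weight in [0, 1] to a fixed-point fraction."""
    steps = 1 << bits
    return round(w * steps) / steps


def lerp(a, b, w):
    """Linear interpolation between a and b with weight w."""
    return a + (b - a) * w


def worst_case_channel_error(channel_range=255, bits=SUBTEXEL_BITS):
    """Upper bound on the lerp error from weight rounding alone.

    With round-to-nearest the weight is off by at most half a step,
    so the result is off by at most channel_range / (2 * 2**bits).
    """
    return channel_range / (2 * (1 << bits))
```

Under these assumptions the weight error is at most 1/128, which on a full-range 8-bit channel is just under 2 levels. That is the kind of deviation a strict reference comparison could flag.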
19:50imirkin: i'm force-enabling gles2 on it to see how bad it'd be... seems like it's mostly ok. i'm going to pipe the st bits through to make it possible to advertise gles2 on it
19:51glennk: i guess the main issue would be with fbos?
19:54imirkin: fbo's are fine
19:54imirkin: [for gles2]
20:05karolherbst: aereaux: I am pretty sure that patch will help somehow, but it is also the last thing I can think of trying out today. If that doesn't help somebody has to take a deeper look into the traces
20:20aereaux: karolherbst: with this patch, it actually seems to be working. pstate shows the highest accel level, glxgears is running fine, no hang
20:25aereaux: trying stellaris, runs much better than before.
20:25karolherbst: skeggsb: https://gist.github.com/karolherbst/47288d4393d2632f99288be5ae37a86f this helps aereaux reclocking his card
20:25karolherbst: skeggsb: he even had trouble reclocking to 07 or 0a
20:25karolherbst: aereaux: can you check if reclocking to 07 and 0a also works?
20:26karolherbst: aereaux: by the way, I have no idea what this reg is there for
20:27karolherbst: aereaux: I just noticed nvidia isn't touching it, so I thought, maybe nouveau shouldn't touch it too :D
20:28karolherbst: mhh and on my GPU nvidia touches this reg
20:28karolherbst: very strange indeed
20:28karolherbst: well anyhow, at least we figured this out quite fast
20:29aereaux: karolherbst: 0a and 07 work.
20:30karolherbst: okay, good
20:30karolherbst: yep, now we have to figure out when to touch 0x62c000 and when not
20:30mupuf: karolherbst: more fun :p
20:31karolherbst: but at least we are lucky we found the bad reg
20:34karolherbst: aereaux: so, have fun with your increased performance now :p
20:37mupuf: do we have a mmiotrace and vbios?
20:37mupuf: we need at the very least this to figure out why
20:37karolherbst: already uploaded
20:37mupuf: and the email address to be able to contact aereaux again
20:37karolherbst: ohh yeah, that might come in handy
20:38karolherbst: aereaux: could you send us your email address in a private message?
20:38mupuf: aereaux: just to karolherbst :)
20:38mupuf: karolherbst: update the vbios repo when you get it :)
20:39aereaux: karolherbst: sure, thanks so much for your help. any idea what the best way for me to use this fix would be for now?
20:39karolherbst: aereaux: apply this patch for every kernel you use
20:39karolherbst: you want my branch anyway
20:39karolherbst: I really hope we get this thing merged for 4.8
20:41karolherbst: mupuf: well first I check if there are other cards where this reg doesn't get touched
20:41mupuf: sounds like a good start
20:42mupuf: we need to get an idea about what this reg does
20:42mupuf: and when we do, I am sure we will know when to touch it or not
20:43mupuf: that may also be because of the brand of the ram chips
20:43mupuf: aereaux: do you have easy visual access to the RAM chips?
20:44aereaux: mupuf: not really, it's a laptop, and I haven't taken it apart or anything before.
20:44mupuf: aereaux: ok, good to know, it likely is DDR3
20:44karolherbst: mupuf: ohh, some of the traces I added from the gmail box aren't touching it either
20:45karolherbst: mupuf: no
20:45karolherbst: mupuf: gddr5 with 5GHz ;)
20:45aereaux: karolherbst: did you get my private message? I'm not entirely sure how to do this.
20:45karolherbst: got it
20:46karolherbst: mupuf: I would say around 10%
20:47karolherbst: mupuf: any idea how we want to save email -> IRC mappings?
20:47karolherbst: or should I just create a file in top dir and save some
20:48Hijiri: kepler cards still fairly expensive because of SLI
20:49karolherbst: Hijiri: what did you expect. Kepler cards are still pretty fast
20:49mupuf: karolherbst: keep just the email
20:49mupuf: if you really want to keep the nick, well, you can also encode this in the folder name
20:50mupuf: but seriously, email is enough
20:50karolherbst: but then we would have to encode it in every chipset subdir :D
20:50karolherbst: ohh odd
20:50karolherbst: 0x62c000 is really interesting
20:50karolherbst: mupuf: 00ffffff is the value for it
20:50karolherbst: but 0x0f0f0f00 is poked
20:50karolherbst: aereaux: can you do nvapeek 0x62c000 ?
20:51mupuf: karolherbst: link training?
20:51karolherbst: mupuf: well it is the first reg we touch
20:51karolherbst: and the last one
20:51karolherbst: or the second and second last
20:51karolherbst: something like that
20:51mupuf: yeah, likely related to link training then
20:51karolherbst: let's check the vbios then
20:52mupuf: and either this is the pattern you want to test
20:52mupuf: or this something to reset all the partitions
20:52aereaux: karolherbst: gives badf1300
20:52karolherbst: that helps a lot
20:53karolherbst: now the serious answer
20:53karolherbst: why does poking it mess up the MC
20:53karolherbst: it is something which isn't there on all GPUs
20:53karolherbst: and we have to figure out how to detect that
20:53imirkin: keep in mind these aren't simple registers
20:54imirkin: they're effectively function calls
20:54karolherbst: yeah I know
20:54karolherbst: but I always thought badf.... always means like: here is nothing
20:54karolherbst: or GPU messed up
20:54karolherbst: or something like that
20:54karolherbst: maybe the number tells us why exactly the reg is bad
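The read values seen in this log (bad0011f, badf1300, and the 0xffffffff from the earlier trace line) can be sorted mechanically. The sketch below only classifies by bit pattern and makes no claim about what the low bits mean; `classify_read` and `sentinel_detail` are hypothetical helpers, and treating any value whose top 12 bits are 0xbad as an error sentinel is an assumption based on the values quoted here.

```python
def classify_read(value):
    """Classify a 32-bit MMIO read value by its bit pattern."""
    value &= 0xFFFFFFFF
    if value == 0xFFFFFFFF:
        return "all-ones"        # e.g. the "R 0x000140 0xffffffff" case
    if value >> 20 == 0xBAD:
        return "bad-sentinel"    # e.g. bad0011f or badf1300
    return "ok"


def sentinel_detail(value):
    """Return the low 20 bits of a bad-sentinel value, or None otherwise.

    What these bits encode is not established in this discussion.
    """
    if classify_read(value) != "bad-sentinel":
        return None
    return value & 0xFFFFF
```

Collecting `sentinel_detail` across registers and cards would be one way to test the idea that the number says why the register is bad.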
21:01imirkin: gr. all these stupid gles2 tests which test loops that the nv30 hw can't do =/
21:05loonycyborg: imirkin: you said you can test several videocards at the same time due to DRI_PRIME. Do you have them plugged into same motherboard?
21:05imirkin: loonycyborg: yep
21:05imirkin: 2 PCIe and 1 PCI
21:06loonycyborg: so it's mb with 2 pci-express 16x?
21:06imirkin: (i have 2 PCI slots, but the PCIE one happens to hang over one of them...)
21:06imirkin: yeah. although actually one of those PCIe ones is x8 :)
21:07loonycyborg: and the third is plain old pci?
21:07imirkin: and i bet i don't actually have a full 2x x16's worth of lanes. but i don't really care.
21:07imirkin: NV34 (GeForce FX 5200)
21:08imirkin: which, in theory, will allow me to test the nouveau_vieux driver as well since it has hw-level backward compatibility with NV20. but that has yet to be tested.
21:09loonycyborg: It seems it's very hard to get any sort of multi-seat to work without separate videocards, but my mb has only 1 pcie 16x slot
21:09loonycyborg: the rest are 1x
21:10imirkin: you can actually buy x1 pcie gpu's
21:11imirkin: skeggsb: looks like you don't expose the nv2x classes in gr/nv3*.c =/
21:12loonycyborg: seems not in the shop that is close to me :P
21:12loonycyborg: even cheapest videocards there are 16x
21:13imirkin: loonycyborg: http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&IsNodeId=1&N=100007709%20600007854
21:13loonycyborg: and there's not even a sorting criterion for pcie arity
21:14imirkin: not a ton of them, but they're out there.
21:15imirkin: nv30/nvfx_vertprog.c:74:temp: Assertion `0' failed.
21:15imirkin: gr. finally defeated.
21:15imirkin: i thought gles2 didn't even require this dynamic loop junk
22:02karolherbst: mupuf: his CON table is empty
22:02karolherbst: mupuf: mine is not
22:02karolherbst: mupuf: if CON non empty -> poke 0x62c000
22:02karolherbst: now, let me increase the sample size
22:03karolherbst: another with no poke to 0x62c000 and empty CON table :)
22:04karolherbst: yeah, maybe that's it
22:04imirkin: karolherbst: the real test is to fiddle with your own vbios
22:04imirkin: and see what the blob does when you do that
22:05karolherbst: but now I have at least an idea
22:05karolherbst: and it seems to fit for at least 5 cards now
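The working hypothesis above ("if CON non empty -> poke 0x62c000") can be written down so it is easy to check against more traces. This is a sketch of the heuristic only, not what nouveau or the blob actually does; `should_poke_62c000` and `hypothesis_fits` are hypothetical names, and the representation of a card as a pair of booleans is an assumption.

```python
POKE_REG = 0x62C000  # the register identified in the discussion


def should_poke_62c000(con_table_entries):
    """Heuristic under test: poke the register only if the CON table has entries."""
    return len(con_table_entries) > 0


def hypothesis_fits(samples):
    """Check the heuristic against observed traces.

    samples is an iterable of (con_table_is_empty, blob_poked_reg) booleans,
    one pair per card. Returns True when every card agrees with the rule.
    """
    return all(poked == (not con_empty) for con_empty, poked in samples)
```

A single card with an empty CON table and a poke in its trace (or the reverse) would make `hypothesis_fits` return False, which is the cheap counterpart to fiddling with a vbios and watching what the blob does.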
22:16karolherbst: stupid nvidia, causes kernel crashes again
22:36RSpliet: +1 lamme kutnerd!
22:37RSpliet: whoops, wrong chan