01:35 Caterpillar: Reading the following table http://nouveau.freedesktop.org/wiki/VideoAcceleration/ I don't understand if nouveau drivers on a GeForce GT240 support H.264 hardware decoding acceleration
01:37 Caterpillar: Kernel is 4.1.7-200.fc22.x86_64
01:38 Caterpillar: ah it should be VP4.0: NVA3-NVA8, NVAF (GeForce 200, 300 series; corresponds to VDPAU feature set C)
01:38 mupuf: Caterpillar: ;)
01:39 mupuf: yep, just checked, as long as you have the fw, you may use mplayer to decode your videos
01:40 Caterpillar: mupuf: why the firmware?
01:41 Caterpillar: nvidia no longer pays attention to 340xx drivers so it's time to dismiss them
01:41 mupuf: Caterpillar: because no-one here spent the gigantic amount of time necessary to both reverse engineer the hw and reimplement them in nouveau
01:41 mupuf: the RE-ing happened for VP2
01:41 mupuf: and it is a beast
01:42 mupuf: and aside from mwk, I do not know anyone who would have had the patience for it! mwk rates really high on the reverse engineering skills scale!
01:43 Caterpillar: mupuf: ah okay I did not know about those things
01:43 Caterpillar: I did not read the whole webpage
01:46 mupuf: that last part was apparent from your question :p
01:46 Caterpillar: concerning
01:46 Caterpillar: $ wget http://us.download.nvidia.com/XFree86/Linux-x86/325.15/NVIDIA-Linux-x86-325.15.run
01:46 Caterpillar: can I use another and more recent file?
01:47 Caterpillar: for example 340xx drivers
01:47 mwk: Caterpillar: better not
01:47 Caterpillar: :O
01:47 mwk: the extraction script is rather tied to the exact image
01:47 mwk: it's not particularly smart
01:48 Caterpillar: then shame on nvidia that uses tricks to hide its software
01:50 mupuf: indeed. And newer releases will not bring you any support you care about anyway
01:50 mupuf: while they will drop the firmware for older gpus
01:51 mupuf: when video decoding is added to maxwell, we will need to take action. But not before
01:51 Caterpillar: maxwell is the lastest GPU architecture right?
01:51 mupuf: yes
01:57 Caterpillar: concerning my personal computer, I have to keep proprietary drivers due to CUDA usage
02:25 karlmag: mupuf: so.. is there a way to get more mwks? Duplication, cloning, etc? :-P
02:32 mupuf: karlmag: well, it would take years to grow a clone, and they would likely not yield the same experience because it requires more than genetics :p
02:33 karlmag: true that... instant duplication would be better
02:47 RSpliet: karlmag: you could consider just encouraging the natural process
02:47 RSpliet: of breeding
02:48 karlmag: breeding is one thing, directing interest is another thing
02:51 RSpliet: then aim for redundancy, there's got to be one good copy amongst them
03:05 pmoreau: RSpliet: Could you please check on Tesla+ for reads of 0x8807c? And check 0x88080, 0x8807c if they have the EXT_TAG enabled or not.
03:06 karlmag: RSpliet: law of big numbers..
03:06 RSpliet: pmoreau: G200 does not have 0x88080 reads, I think the trace should be in NVBIOS
03:06 pmoreau: Hum...
03:07 pmoreau: RSpliet: But does it have a 0x88080 write? O:-)
03:07 pmoreau: I'll have a look tonight
03:07 RSpliet: well, yes it does, but wait a tick
03:09 RSpliet: G94 does not have any mention of 0x8807c, G200 does, but the first occurrence is a write
03:09 RSpliet: although that might be VBIOS execution?
03:10 RSpliet: no, that's not VBIOS for sure :-)
03:10 RSpliet: no lost events
03:11 pmoreau: A write to 0x8807c? But it's supposed to be PCI.CAP iirc
03:11 RSpliet: yes, that doesn't make a lot of sense...
03:12 pmoreau: What are the other occurrences?
03:12 pmoreau: Could it be that the lower limit is G96 rather than G84? I'll have to check that
03:13 RSpliet: okay, ignore some of this... I believe I might just be silly. That write to 0x8807c does not exist on G200
03:14 pmoreau: Ah ah ah ah! :D
03:14 RSpliet: G94 has no mention of reads to 8807c
03:14 RSpliet: but don't be fooled, there could be some other conditions in play here :-P
03:14 pmoreau: True
03:14 RSpliet: 88080 is def. in play
03:15 RSpliet: as is 154c btw
03:17 pmoreau: What value do you have for 154c on the G94 and G200?
03:17 RSpliet: anyway, you can take a look at trace yourself, and rest assured that the card is the only parameter that differs
03:17 RSpliet: (same motherboard, same computer, same driver version)
03:18 pmoreau: Yeah
03:18 RSpliet: G94: boot 0x7c -> 0x7d -> 0xfd
03:20 RSpliet: G200 same, but with UNK8 set by the VBIOS
03:20 pmoreau: Ok
03:21 pmoreau: Probably have to take into account the current PCIe version, which could mean I'll have to change the value along PCIe version changes from Karol's patches
03:22 RSpliet: possibly
04:56 karolherbst: okay, the reclocking algorithm of gk20a kind of works :/
04:59 karolherbst: mupuf: with nouveau + userspace daemon using cstate debugfs interface: https://gist.githubusercontent.com/karolherbst/3ba4ab8aca0daa5f131d/raw/8c123d2597ddf5bb137aabcc44bd5e2d3e9eab0f/pwr_read
05:00 karolherbst: was playing antichamber at nearly constant 60 fps
05:02 karolherbst: mhhh, the algorithm needs 0.4 seconds until it clocks up though :/
05:07 karolherbst: RSpliet: do you know any good reclocking algos? I try to look up what radeon does, but maybe I won't find it :)
05:10 RSpliet: you mean an algorithm to dynamically clock up?
05:10 RSpliet: no, haven't given that too much consideration so far
05:11 karolherbst: I see
05:11 karolherbst: I tried out the gk20a algo
05:11 karolherbst: it has its weaknesses
05:11 RSpliet: but it likely involves getting the load of various engines, and clock up when one of them hits 70%
05:11 RSpliet: and clock back down when they all drop to sth like 20 or 30%
05:11 RSpliet: NVIDIA clocks up whenever a new GL program starts
05:11 RSpliet: like, all the way up
05:11 karolherbst: RSpliet: the algo is already much smarter than this :)
05:11 karolherbst: RSpliet: really?
05:12 karolherbst: not for me
05:12 RSpliet: not any more?
05:12 karolherbst: the algo takes around 0.4 seconds
05:12 karolherbst: sooo
05:12 karolherbst: until it reaches full power
05:12 RSpliet: are you sure that's not the measurement method being inaccurate?
05:12 karolherbst: no
05:12 karolherbst: the algo waits until you get to 90% avg load
05:13 karolherbst: I will show you
05:13 karolherbst: this is the important part of gk20a: https://github.com/karolherbst/nouveau/blob/master/drm/nouveau/nvkm/subdev/pmu/gk20a.c#L79-L87
05:13 karolherbst: load is the average load though
05:13 karolherbst: which is usually calculated as (old load + new load) / 2
05:14 karolherbst: the else part clocks down below 70
05:14 karolherbst: and clocks slowly up above 70
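The decision logic described in the last few messages can be sketched in C. This is a hypothetical illustration, not nouveau's or gk20a's actual code: the threshold values, step sizes, and function names are all assumptions made up for the example.

```c
#include <assert.h>

/* Illustrative sketch of a gk20a-style DVFS step decision: average the
 * previous and current load samples, clock down when the average falls
 * below ~70%, and ratchet up slowly when it is above. All constants
 * and names here are invented for the example. */
#define LOAD_TARGET_LOW  70  /* below this: clock down one step */
#define LOAD_TARGET_HIGH 90  /* above this: clock up aggressively */

static int avg_load(int old_load, int new_load)
{
	return (old_load + new_load) / 2;
}

/* Returns the relative cstate change: negative means clock down. */
static int dvfs_step(int old_load, int new_load)
{
	int load = avg_load(old_load, new_load);

	if (load < LOAD_TARGET_LOW)
		return -1;  /* idle enough: go one cstate down */
	if (load > LOAD_TARGET_HIGH)
		return 3;   /* heavily loaded: jump several cstates up */
	if (load > LOAD_TARGET_LOW)
		return 1;   /* slowly climb */
	return 0;           /* exactly at the threshold: hold */
}
```

With a 0.1 s poll interval, the "slowly up" branch is what makes the ramp to full speed take several polls, matching the ~0.4 s delay mentioned above.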
05:14 RSpliet: right, but for non-IGPs you probably want to monitor memory load as well
05:14 karolherbst: and by slowly I mean really slowly
05:14 karolherbst: yeah
05:14 karolherbst: but this is easier
05:14 karolherbst: usually you have only 3 memory states
05:14 karolherbst: for the gpu core though I "only" have 36
05:14 RSpliet: yes, but you can saturate the memory bus without saturating your cores
05:15 karolherbst: and some high-perf desktop Keplers have like 50
05:15 karolherbst: RSpliet: right
05:15 karolherbst: I was testing this at 0f pstate
05:15 karolherbst: but changing pstates when memory load is too high isn't that difficult
05:15 karolherbst: just go one up
05:15 karolherbst: and if that's not enough, go another up
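The "go one up, and another if that's not enough" idea from the messages above can be sketched as a tiny step function. This is a hypothetical example; the threshold and all names are invented, nothing here exists in nouveau.

```c
#include <assert.h>

/* Illustrative sketch of single-step memory pstate escalation: when
 * memory load is too high, move up one pstate per poll; if that is
 * still not enough, the next poll goes another one up. */
#define MEM_LOAD_HIGH 70  /* invented threshold for the example */

static int next_mem_pstate(int cur, int max, int mem_load)
{
	if (mem_load > MEM_LOAD_HIGH && cur < max)
		return cur + 1;  /* one step up; re-evaluate next poll */
	return cur;
}
```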
05:15 RSpliet: (although, arguably when you saturate the bus your cores will be busy stalling, so will have a high load)
05:16 karolherbst: yeah, but after pstate change the gpu core may drop
05:16 karolherbst: and a lower cstate will be used
05:16 karolherbst: finding the correct algo for memory may be more difficult though, although it is easy to do :/
05:16 karolherbst: for the core the cstates are like 20MHz steps
05:16 karolherbst: but memory ...
05:17 RSpliet: you don't have to be spot on straight away
05:17 karolherbst: I know
05:17 karolherbst: but I don't want the driver to change pstate all the time :D
05:17 RSpliet: I don't think that's a serious issue
05:17 RSpliet: under load it'll just clamp to the highest speed
05:17 karolherbst: not as long as perf doesn't drop, right
05:18 RSpliet: in desktop use it might swing around, but nobody will notice
05:18 karolherbst: I meant pstate switches every 0.1 seconds
05:18 karolherbst: because the load is high at 0a, but under 50% at 0f
05:18 karolherbst: so with a bad algo...
05:19 karolherbst: I think I need to read the memory clocks for each pstate
05:19 karolherbst: and use them in my algo
05:19 karolherbst: even if that is inaccurate
05:21 karolherbst: also I need the load of the "display" engine :D
05:21 RSpliet: well, not the load, but you do need to make an estimate of bandwidth required for scanout
05:22 RSpliet: that's mostly relevant on Tesla, where for two big displays you'll see NVIDIA never drops to the lowest memclk speed
05:22 karolherbst: yeah
05:22 karolherbst: but we need something like that for nouveau I guess too
05:22 RSpliet: eventually yes
05:22 karolherbst: if someone just plugs some 4K displays in
05:23 RSpliet: oh that won't work for other reasons
05:23 karolherbst: fermi supports 16K res, right? :)
05:23 karolherbst: there was a patch so that it works
05:23 karolherbst: RSpliet: http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=ee1013254158feb34e952115f87c7311e5aa0658
05:23 karolherbst: there was a reason for that patch ;)
05:23 RSpliet: I've seen it, but the output ports have clock limitations
05:24 karolherbst: I know, but if you plug in like 3 displays
05:24 karolherbst: and each at very high res
05:25 RSpliet: iirc, 4K monitors only work with Kepler on max. 30Hz
05:25 RSpliet: but yes
05:25 RSpliet: more pixels equals more bandwidth
05:25 karolherbst: ohh I need to rebase my pcie stuff :/
05:48 karolherbst: okay, done:)
05:53 karolherbst: mhh well at least I didn't find anything useful in the radeon source :/
05:53 karolherbst: RSpliet: any idea what an algo for memory reclocking could look like?
05:54 karolherbst: I somehow doubt that it behaves like the core load and it will most likely not reach those 70/90 load targets
06:00 karolherbst: what are the pmu counters currently used for anyway? Or what will they be used for? I don't want to claim too many slots all the time so that nothing else is able to use them anymore
06:26 karolherbst: imirkin, RSpliet: do you think it may be useful to let the pcie speed be clockable without a pstate change? Maybe this is just some corner case for some hardware, which has different "lowest" cstates across pstates, but increasing pcie speed also always means increasing memory clock :/
06:27 karolherbst: mupuf: do you know if the pmu can also reclock? one radeon guy told me, that for newer cards the gpu reclocks itself?
06:56 RSpliet: karolherbst: PMU can currently not do autonomous reclocking
06:57 karolherbst: okay
06:57 karolherbst: so it has to be done inside the driver
06:57 karolherbst: good to know
06:57 RSpliet: the driver instructs the PMU to do memory reclocking for us, but the whole script is uploaded
06:58 karolherbst: okay
06:58 karolherbst: but is this because of latencies, since the pmu can do it quicker and with fewer race conditions?
06:59 RSpliet: mostly stability, the PMU can pause the bus engine which stops anything from accessing RAM
07:00 karolherbst: ohh okay
07:00 karolherbst: makes sense
08:13 Franz__: hi, i want to load a custom edid but it fails to load, i don't know why... it says "platform DP-1: firmware: failed to load edid/8bpc.bin (-2)"
08:13 Franz__: i created /usr/lib/firmware/edid/8bpc.bin before
08:13 karolherbst: :) pstate changes also implemented now
08:14 Franz__: and added drm_kms_helper.edid_firmware=edid/8bpc.bin to the kernel line in grub
08:14 Franz__: like it said in the nouveau wiki
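For reference on the EDID problem above: error -2 is -ENOENT, i.e. the kernel's firmware loader could not find the file at the path it searches. A common cause is that the driver initializes from the initramfs, which does not contain the EDID blob. The following is a sketch of the setup under those assumptions, using the paths from the messages above; the dracut step is a hypothetical example for dracut-based distros (such as the Fedora system mentioned earlier) and may differ elsewhere.

```shell
# Place the EDID blob where the firmware loader looks
# (path taken from the messages above; some distros use /lib/firmware):
mkdir -p /usr/lib/firmware/edid
cp 8bpc.bin /usr/lib/firmware/edid/

# Kernel command line, as per the nouveau wiki:
#   drm_kms_helper.edid_firmware=edid/8bpc.bin
#
# If the driver probes from the initramfs, the file must be present
# there too -- e.g. with dracut (hypothetical invocation):
dracut --include /usr/lib/firmware/edid /usr/lib/firmware/edid --force
```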
08:39 karolherbst: RSpliet: what is the "quality" target for dyn clocks: same perf compared to full speed all the time?
08:40 karolherbst: or is it fine if the perf decreases by less than 5%?
08:43 RSpliet: karolherbst: ideally not noticeable
08:43 RSpliet: but for a little swoosh in Gnome Shell nobody is going to care if your animation was only 30fps
08:45 karolherbst: mhh
08:45 karolherbst: right
08:45 karolherbst: I want to let the driver check every 0.1 seconds like it is done for gk20a
08:45 karolherbst: and usually 4 checks are needed at full load to start reclocking
08:45 karolherbst: and after 7-8 checks it is at full speed
08:46 karolherbst: though we might have to test a lot of stuff to find perfect params
08:46 karolherbst: RSpliet: this is my userspace daemon currently: https://github.com/karolherbst/envytools/commit/6baa060eaefeaf6de80a4381ad7883ddd175f9a2
08:47 karolherbst: calc_new_state does the work
08:47 karolherbst: it handles already mem+core, core and pcie speeds
08:47 karolherbst: mem alone isn't worth considering somehow, because the mem load is weird
08:49 karolherbst: or maybe I just need different counters for that :/
08:50 RSpliet: btw, when you said "PMU", you meant the Falcon engine right? because that's not the performance monitoring unit, but the performance management unit :-P
08:50 karolherbst: I write into/read from PMU regs
08:51 karolherbst: mhh, -5% in heaven :/ but why, the daemon didn't change clocks there :D
08:52 karolherbst: ohh it dropped pstate
09:08 karolherbst: okay, when running the daemon fast enough, it still changes pstate sometimes, but the perf is the same as with max speed always
09:10 imirkin: and this is why there's a falcon in the pmu, so that you can do this polling from within the gpu
09:10 imirkin: and send interrupts to the host when you want to reclock
09:11 karolherbst: :) I understand
09:11 karolherbst: saw that in the gk20a source already
09:11 karolherbst: imirkin: this thingy right? nvkm_timer_alarm(tmr, 100000000, alarm);
09:12 imirkin: no.
09:12 karolherbst: ohh
09:12 imirkin: there are several CPUs inside of the GPU
09:12 imirkin: aka "falcons"
09:12 imirkin: they run little RT OS's on them
09:12 imirkin: which do things
09:12 karolherbst: okay
09:13 karolherbst: I just looked over how the gk20a part does stuff
09:13 karolherbst: and thought I'd do it the same way
09:13 imirkin: i dunno how they do it in nouveau
09:13 imirkin: but in the blob, there's a RTOS running on the pmu, telling the host when to switch clocks
09:13 karolherbst: nvkm_timer_alarm(tmr, 100000000, alarm); and then some driver code
09:14 karolherbst: mhh okay
09:14 imirkin: at least i'm moderately sure that's the case
09:14 karolherbst: would that apply to all gens?
09:24 karolherbst: imirkin: so what would be the prefered way to do it in your opinion?
09:36 imirkin: karolherbst: the RTOS way
09:37 imirkin: karolherbst: have the dedicated falcon processor monitor the GPU
09:37 imirkin: karolherbst: and send the host interrupts when it's ready
09:37 imirkin: karolherbst: this is tricky though
09:37 karolherbst: yeah I figured
09:37 karolherbst: but does it work with every chipset?
09:37 karolherbst: if not I would rather implement something which works everywhere :)
09:38 RSpliet: I'm sure it *can* work for GT21x+
09:38 karolherbst: okay, so not for older chips?
09:38 RSpliet: nope
09:38 RSpliet: older chips only had HWSQ, no "falcon"
09:38 karolherbst: then we would need a driver implementation which polls the pmu stuff anyway
09:39 RSpliet: (fuc, as they're called in nouveau)
09:39 imirkin: yes
09:39 karolherbst: then I will do this first :D
09:39 imirkin: but only for pre-GT215
09:39 karolherbst: it also works on my kepler
09:39 karolherbst: so
09:39 karolherbst: we can have it for now and after that I could take a look at this rtos thingy
09:40 karolherbst: anyway, for the algorithm itself it doesn't matter if it runs on the falcons or on the cpu, right?
09:40 imirkin: right
09:40 karolherbst: okay
09:41 imirkin: except iirc falcons have access to a ton more data than is available over mmio
09:41 karolherbst: figuring out a good algo is the hardest part anyway
09:41 karolherbst: ohhh
09:41 karolherbst: okay
09:41 karolherbst: but that would be in the domain of improving the algo
09:41 imirkin: sure
09:42 karolherbst: if you want you can try my daemon out :D you would only need my pstate/cstate interface patches
09:42 karolherbst: I have a 10% perf drop in valley though
09:42 karolherbst: ohh no, only 1%
09:42 karolherbst: :D
09:43 karolherbst: I think while switching the scenes, the daemon clocks down, and raises the clock too late
09:43 karolherbst: maybe I should add some kind of "only actually clock down, if load is low for 10 seconds"
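The "clock up early, clock down late" rule that follows from the last two messages can be sketched with a hold-down counter. Again a hypothetical illustration: the thresholds, the struct, and the function are invented for the example, only the 0.1 s poll interval and the 10-second hold come from the discussion above.

```c
#include <assert.h>

/* Illustrative sketch: clock up immediately on high load, but only
 * clock down after the load has stayed low for HOLD_DOWN_MS worth of
 * consecutive polls (one poll per POLL_INTERVAL_MS, as discussed). */
#define POLL_INTERVAL_MS 100
#define HOLD_DOWN_MS     10000
#define LOW_LOAD         30  /* invented threshold */
#define HIGH_LOAD        90  /* invented threshold */

struct downclock_state {
	int low_polls;  /* consecutive polls seeing low load */
};

/* Returns +1 to clock up, -1 to clock down, 0 to stay. */
static int poll_decision(struct downclock_state *s, int load)
{
	if (load >= LOW_LOAD) {
		s->low_polls = 0;                /* reset the hold-down */
		return load > HIGH_LOAD ? 1 : 0; /* clock up early */
	}
	if (++s->low_polls >= HOLD_DOWN_MS / POLL_INTERVAL_MS)
		return -1;                       /* clock down late */
	return 0;
}
```

The asymmetry is deliberate: a scene-load dip then resets nothing but the counter, so brief idle periods (like the scene switches mentioned above) never trigger a down-clock.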
09:46 imirkin: clock up early, clock down late will give you the best perf probably
09:48 karolherbst: yeah, I think in games "loading" stuff is the biggest issue for such algos
09:49 karolherbst: usually for workloads where the gpu is at 100% load, the daemon clocks up in under 1 second
09:49 karolherbst: is this "fast" enough?
09:56 karolherbst: imirkin: dyn clocks _require_ a module option, right?
09:56 imirkin: ?
09:56 karolherbst: I mean, we shouldn't add this without having an option for now
09:56 karolherbst: *parameter
10:58 karolherbst: PDAEMON.COUNTER_MASK[0x1] <= { PVLD | PPDEC | PPPP | 0x20000 } this 0x20000 seems to be highly video related somehow :D
11:02 karolherbst: stupid X server, closing all the time, just because i press some keys wtf :/
11:05 karolherbst: what is NVLeaveVT ?
11:06 karolherbst: after that my x server just closes
11:06 imirkin_: that's called on shutdown
11:07 imirkin_: note that if you're doing -sharevts
11:07 karolherbst: I know
11:07 imirkin_: then your secondary X server gets all of your primary X server's input
11:07 karolherbst: yeah, but why does the second X server want to stop anyway?
11:07 karolherbst: that happens just too often to be able to do stuff
11:07 karolherbst: mhhh, now it works without -sharevts
11:08 karolherbst: so if it works without it, I shouldn't use sharevts?
11:09 imirkin_: it doesn't work -- it's sitting inactive
11:09 imirkin_: which in turn means you can't actually do anything with it
11:10 karolherbst: well I can play videos through vdpau though :/
11:10 imirkin_: hm. a little surprising.
11:11 karolherbst: it only launches sometimes without -sharevts though
11:11 karolherbst: mhhh
11:11 karolherbst: how can I encode through vdpau (or somehow on the nvidia card?)
11:11 imirkin_: nvenc
11:11 imirkin_: no one's looked at it for nouveau
11:12 imirkin_: it's kepler+
11:12 karolherbst: nvenc is the engine?
11:12 imirkin_: and i only recently gained access to a kepler board
11:12 karolherbst: I bet 0x20000 is the bit for nvenc then
11:15 karolherbst: wow, yeah makes sense
11:15 karolherbst: reg is 0x00000070 on fermi, 0x00020070 on kepler
11:16 imirkin_: no reason for nvenc to be on for vdpau
11:16 karolherbst: I know, but the blob uses 4 counters on pdaemon
11:16 karolherbst: 1. core load 2. mem load 3. pcie load 4. video accel load
11:17 karolherbst: and this bit is enabled along the other 3 video accel bits
11:17 imirkin_: ah makes sense
11:17 karolherbst: sometimes even in a separate counter though
11:17 karolherbst: but then alone
11:18 karolherbst: imirkin_: like this https://gist.github.com/karolherbst/b605f3559873d84bef82
11:18 imirkin_: it's probably the same clock domain
11:18 karolherbst: maybe
11:18 karolherbst: could be something else though, but the nvenc bit is missing anyway
11:19 karolherbst: and there is no other bit enabled on all cards which isn't REed yet
11:19 karolherbst: would like to confirm this one though
11:20 karolherbst: what is new on maxwell?
11:20 karolherbst: mhhh
11:20 karolherbst: maxwell looks strange
11:21 imirkin_: maxwell video decoding situation is all different
11:21 karolherbst: ohh no, it was just 2 cards in one trace
11:21 imirkin_: but as i've said before, i think that each person only has it in them to RE a single video decoding engine per lifetime
11:21 imirkin_: and i've spent mine
11:21 karolherbst: :D
11:22 karolherbst: too much work or each too different?
11:22 imirkin_: too much really boring really repetitive work
11:22 karolherbst: ohhh okay
11:22 imirkin_: i think mlankhor1t had a more principled approach than i did
11:23 imirkin_: but i'm pretty sure he also lost the will to live over doing vp4, so dunno in the end
11:23 karolherbst: I could encode with the blob though
11:23 imirkin_: yeah, the video encoding might actually be a lot easier
11:23 imirkin_: depends on the api
11:23 karolherbst: I meant verifying that one bit :D
11:24 imirkin_: in an ideal world it's "configure engine with parameters; feed frames; read bitstream"
11:24 imirkin_: but in a practical world i'm sure it won't be that easy
11:25 karolherbst: ohh right, nouveau clocks the video engines per pstate
11:25 karolherbst: and not per cstate like the blob does
11:33 karolherbst: imirkin_: something in mesa increased cpu overhead :/
11:33 karolherbst: I am not getting even close to my old glxspheres64 perfs now
11:33 imirkin_: debug build?
11:33 karolherbst: no
11:33 imirkin_: this is since yesterday?
11:33 imirkin_: i haven't pushed anything... if you can bisect, that'd be quite useful
11:33 karolherbst: kind of, but I reinstalled system mesa
11:39 karolherbst: the algo from gk20a is very good though
11:39 karolherbst: never actually chooses a cstate with reduced perf for given load
11:43 karolherbst: ohh wait
11:43 karolherbst: kernel bug again
11:43 karolherbst: after suspend, my cpu is with disabled turbo
11:44 karolherbst: ahhh much better now :)
11:44 karolherbst: okay, there is no mesa issue
11:48 karolherbst: I need a 4k video or something :/
11:48 karolherbst: I need to put heavy load on the video engine
11:53 karolherbst: ohh wow
11:53 karolherbst: nouveau is too weak for 4K vdpau decoding :/
11:53 karolherbst: at least for me
11:53 karolherbst: wait
11:53 karolherbst: stupid daemon
11:54 imirkin_: :)
11:54 imirkin_: at the lowest perf level, sure is!
11:55 karolherbst: even at 0f
11:56 imirkin_: hrmph
11:56 imirkin_: sad.
11:56 karolherbst: even cpu mplayer isn't fast enough
11:56 karolherbst: vlc plays it fine though
11:56 imirkin_: i suspect we're not making the best use of the engine
11:56 imirkin_: there are these funny ringbuffers
11:56 karolherbst: maybe
11:57 karolherbst: will check the clocks
11:57 imirkin_: iirc we don't size them properly
11:58 karolherbst: yeah, don't scale well up
11:58 karolherbst: VDEC at 504MHz
11:58 karolherbst: 540
11:58 karolherbst: 405 at 07
11:58 karolherbst: so mhh
11:59 karolherbst: video load at 100% though
11:59 karolherbst: soo
12:01 karolherbst: ha fun, the blob doesn't even want to play this 4K video
12:02 imirkin_: kepler should handle 4k videos
12:02 imirkin_: the video engine was specifically updated for it
12:02 karolherbst: "Protocol name not provided,"
12:02 karolherbst: something stupid
12:02 imirkin_: probably.
12:11 karolherbst: ohh okay
12:12 karolherbst: the clocks are indeed the same
12:12 karolherbst: at least the clocks displayed by mupuf's tool
12:12 karolherbst: which are plenty
12:12 karolherbst: mhhh wait a second
12:14 karolherbst: how did that happen
12:14 karolherbst: I got a much higher vdec freq than usual
12:39 karolherbst: anybody here with a tesla gt215+ card?
12:40 imirkin_: me, but not right now
12:40 karolherbst: okay
12:40 karolherbst: I just want to check if there is really no PCIE counter on tesla
13:41 karolherbst: antichamber is a really nice gpu mem load test :) I am able to clock down to the 4th or 5th cstate (around 450MHz) while running at 0f pstate and get constant 60 fps :)
13:41 karolherbst: and 0a pstate produces a lot of gpu core stalls
14:09 karolherbst: imirkin_: with borderlands I get sometimes this: https://gist.github.com/karolherbst/bb3340599d7388541e71
14:09 karolherbst: may this be clocking related?
14:09 karolherbst: or could there be seriously something wrong
14:09 imirkin_: is this new?
14:10 karolherbst: no
14:10 karolherbst: while figuring out the gddr5 situation I got those a lot
14:10 karolherbst: but
14:10 karolherbst: mhh
14:10 karolherbst: mupuf's pwm patches may select too low a voltage for me
14:10 imirkin_: i dunno how to read those traps tbh
14:11 karolherbst: core load at 100% anyway
14:11 karolherbst: even after application exit
14:12 karolherbst: reclocking still works
14:12 karolherbst: mhh
14:15 karolherbst: either way, it isn't always reproducible, so I just assume this is a voltage issue
14:15 karolherbst: nouveau drives my card at a lower voltage than the blob anyway
14:29 karolherbst: mhh usually even a 90% target load is too low and higher clocks than actually needed are chosen
14:29 karolherbst: we should play it safe and use 75 or 80
14:32 karolherbst: imirkin_: I get the same perf in borderlands as in my benchmark with 2/3 of the core clock :/
14:32 karolherbst: but it isn't 60 fps
14:32 karolherbst: 33% stalls in the core?
14:33 karolherbst: I always think my daemon does stupid things, because it doesn't clock higher even when games are running below 60 fps, but raising clocks doesn't do anything
16:12 karolherbst: what is PMFB and BFB_NISO?
16:12 skeggsb_: PMFB is ltc
16:12 skeggsb_: nfi what BFB_NISO really maps to
16:13 karolherbst: ohhh I don't know what ltc nor nfi is either :D
16:14 imirkin_: you know what nfi is.
16:14 skeggsb_: ltc is "level two cache"
16:14 imirkin_: trust me.
16:14 skeggsb_: nfi is "no fucking idea" ;)
16:15 karolherbst: :D
16:15 karolherbst: I see
16:15 karolherbst: okay, I usually know ltc as L2 cache
16:16 skeggsb_: it's nvidia's name for that device
16:16 karolherbst: okay
16:17 karolherbst: PMFB doesn't seem to be used by the blob (in the pdaemon counters)
16:17 skeggsb_: they'd care more about the traffic that actually makes it to the ram chips
16:17 karolherbst: BFB_PART0_REQ is used for memory load though
16:18 karolherbst: okay, I think I made my tool ready for fermi+ chips
16:19 karolherbst: skeggsb_: wanna try out a userspace dynamic reclocking daemon?
16:19 karolherbst: I need some tests to check if the algorithm itself is good enough
16:20 karolherbst: and if it works across devices, too
16:20 skeggsb_: there's a lot more factors than load that need to be taken into account :P
16:21 karolherbst: I just looked at the gk20a source
16:21 karolherbst: :D
16:21 karolherbst: they only care about core load
16:21 karolherbst: but I know
16:21 karolherbst: there is a bunch of crappy stuff and all, but across all applications I used, I still got max perf at lower clocks than full 0f
16:21 karolherbst: so I am happy already
16:22 karolherbst: skeggsb_: that's what I wrote so far: https://github.com/karolherbst/envytools/blob/943a8eb84ba1a8db2c46a892c085a2c17a25dca4/nva/nvagpuload.c#L94-L179
16:25 karolherbst: it's basically the gk20a algo adapted to handle pcie speed, memory speed, and clock stalls caused by memory and the video engine
16:33 karolherbst: skeggsb_: do you have any idea what kind of algorithm is useful for memory?
16:46 karolherbst: imirkin_: you said there were some kind of binaries executed by the falcons. Is it the stuff written into PDAEMON.DATA or is this something else?
16:47 imirkin_: the stuff written to PDAEMON.CODE :)
16:47 imirkin_: data feeds data to the falcons... somehow. i forget the exact mechanism.
16:47 karolherbst: ohhh
16:47 karolherbst: okay
16:48 imirkin_: i forget if demmt auto-decodes it
16:48 karolherbst: the algorithm how nvidia decides to reclock should be in the binaries?
16:48 imirkin_: envydis definitely knows about the isa
16:48 imirkin_: heh. yes and no.
16:48 imirkin_: yes it's there, no you won't find it
16:48 imirkin_: it's a full RTOS iirc
16:48 imirkin_: aka huge
16:48 karolherbst: ohhh
16:49 imirkin_: but i could be misremembering
16:49 imirkin_: feel free to have a look
16:51 karolherbst: hee wait
17:24 skeggsb_: karolherbst: no, resman makes those decisions, pmu does various power/clock gating things on its own, but clock freqs etc are in resman
17:24 karolherbst: what is resman?
17:25 imirkin_: skeggsb_: it doesn't report gpu loads/etc to the kernel?
17:25 imirkin_: ultimately the kernel module makes the decision
17:25 skeggsb_: imirkin_: yes, i believe pmu monitors that stuff (at least, tegra's does), but RM is what consumes that data
17:25 imirkin_: right
17:25 skeggsb_: karolherbst: the primary part of nvidia's kernel module
17:25 skeggsb_: all the closed bits, basically
17:25 imirkin_: that's in line with my understanding as well
17:25 karolherbst: ohh I thought it is some kind of chip on the gpu
17:26 karolherbst: okay, so resman is just "normal" software
17:26 skeggsb_: yes
17:26 karolherbst: okay, but the gk20a code pretty much looked like recklocking already
17:26 skeggsb_: it's roughly the equivalent of the stuff nouveau has under nvkm/
17:26 skeggsb_: gk20a's pm code in nouveau is primitive compared to nvgpu (android gk20a driver)
17:27 karolherbst: it even calls nvkm_clk_astate
17:27 karolherbst: ohhh okay
17:27 karolherbst: was looking at the nouveau code
17:27 karolherbst: yeah I already wondered why it is so trivial
17:27 karolherbst: it works well enough though
17:29 karolherbst: skeggsb_: okay so basically the driver configures some PMU counters, reads them at a specific interval, does some magic and then it reclocks, right?
17:30 karolherbst: I guess there is more data collected as well
20:42 imirkin: skeggsb_: just sent a patch to fix OF loading... let me know if you had something substantially different in mind
20:42 skeggsb_: nah, that looks fine, there's not really a non-hackish way to do it i don't think
20:43 imirkin: yeah, couldn't think of anything simple
20:43 imirkin: but then i didn't spend too much time trying either
20:47 skeggsb_: did you get very far with the mesa be issues?
20:47 imirkin: not yet
20:47 imirkin: but i'm about to try
20:47 imirkin: i didn't really try earlier tbh
20:48 skeggsb_: it's pretty keen of you to try at all :P
20:48 imirkin: i expended so much effort on that gr be bug a while back
20:48 imirkin: that i didn't turn the thing back on until just now
20:52 imirkin: it also doesn't help that it seems pretty hang-y
20:53 imirkin: i dunno if this one's just damaged or if it's a more general issue... benh apparently no longer tests on g5's
21:19 imirkin: oh well this REALLY STINKS
21:19 imirkin: the dri driver path is *hardcoded* into the X server binary
21:20 imirkin: grrrrrrrr
21:23 imirkin: whoa. it works.
21:23 imirkin: that's... unexpected.
22:36 imirkin: skeggsb_: hm, are you aware of any pageflipping issues on nv34? looks like sync to vblank isn't working quite right on the ppc64 nv34
22:37 skeggsb_: not off the top of my head, no
22:38 imirkin: i'm getting odd hangs every so often in glxgears when it's sync'd to vblank
22:38 imirkin: on a VGA display
22:40 imirkin: without vblank sync i get 640fps
22:40 imirkin: so it's not like the card is underpowered
23:05 imirkin: fire: nouveau_fence.c:102: nouveau_fence_emit: Assertion `fence->state == 1' failed.
23:05 imirkin: that seems bad
23:05 imirkin: (on nv30)
23:05 imirkin: well, it'd be bad anywhere. hit it on the nv34 + ppc.
23:07 imirkin: and the overall thing is buggy tooo...
23:34 imirkin: urgh. i bet the kick handler is doing something dodgy :(