00:00 AndrewR: still not work, with this path http://www.fpaste.org/276795/ I still only have http://www.fpaste.org/276796/ (volt_table_Version: 0) and obviously all downstream functions doesn't work
00:19 AndrewR: still something wrong, I used 0x10 instead of 0x16 for this case switch (like in nvbios source), but now it thinks I have 0uv voltage ...!
00:21 AndrewR: http://www.fpaste.org/276803/
00:25 AndrewR: with this patch http://www.fpaste.org/276804/43755131/
00:28 AndrewR: ups, forgot to add same case in nvbios_volt_entry_parse ... recompiling
00:35 AndrewR: |o? it works!
00:35 AndrewR: \o/
00:36 AndrewR: 20: core 300 MHz shader 300 MHz memory 1000 MHz AC DC *
00:36 imirkin: what does the AC line say?
00:38 AndrewR: http://www.fpaste.org/276805/
00:39 AndrewR: imirkin, in theory - there can be different perf level if machine works from line power AC or battary
00:39 AndrewR: but because this is desktop ..it should be irrelevant...now lets see if I lose my display at highest perf level
00:42 imirkin: the last line is the only thing that's relevant
00:42 imirkin: the other lines are just for show
00:44 AndrewR: at highest perf level it still fail ....
00:56 AndrewR: imirkin, ah, sorry ..initial line (boot perf level) was AC: core 100 MHz shader 100 MHz memory 501 MHz
00:58 imirkin: AndrewR: i mean once you reclock, what does that line say?
00:58 AndrewR: imirkin, AC: core 300 MHz shader 300 MHz memory 1000 MHz :}
00:58 imirkin: cool
00:58 AndrewR: a bit confusing ....
00:59 AndrewR: like, cat'ing pstate file gives you 3 (in my case) lines, 2 real perf levels and boot one. After switch happen - boot level disappear!
01:00 imirkin: actually it's the available pstates, and the current one
01:02 AndrewR: imirkin, ok (so, there is no easy way to return card to boot clocks? without full reset ...)
01:02 imirkin: no way at all.
01:05 AndrewR: imirkin, at least very fast switch to perflevel max and back not resulted in display & machine hang ...
01:05 AndrewR: echo 21 > /sys/class/drm/card0/device/pstate && echo 20 > /sys/class/drm/card0/device/pstate
01:07 AndrewR: but if I leave it at 21 - even without any 3d, in fb (kms) console - it will surely hang (monitor lost signal, machine can't be powered off via power button) after some seconds
02:05 karolherbst: AndrewR: yeah, sounds a bit like a voltage issue :D
02:05 karolherbst: usually without load the gpu should be able to stay at higher perf level and low voltage, but as long it does something, it is likely to crash
07:03 hakzsam: imirkin, gr: ILLEGAL_CLASS ch 5 [004f9c2000 Xorg[6997]] subc 1 class 91c0 mthd 335c data 00000000
07:03 hakzsam: weird, because the GF110 should support that class
07:05 hakzsam: works fine with the NVC0_COMPUTE_CLASS
07:05 hakzsam: well, I'll use it instead of that weird 0x91c0 class
07:17 karolherbst: mupuf: you like to forget about files, don't you? on your ppwr_rework the pwr.c file is missing
07:17 karolherbst: I meant, perf.c
07:23 mupuf: karolherbst: I do :D
07:23 mupuf: it must be on reator
07:23 mupuf: src/nouveau
07:23 karolherbst: no worries though
07:24 karolherbst: I won't rebase your patches, I just write the perf one from scratch
07:24 karolherbst: and if I need anything from the other ones, I will try to cherry-pick them
07:24 karolherbst: mupuf: I just wanted this perf.c file, because there is the pdaemon <=> host interaction I wanted to check :D
07:24 karolherbst: or so I assumed
07:26 mupuf: you are rewriting the fuc code?
07:28 karolherbst: the perf.fuc file
07:29 karolherbst: which is kind of empty currently ;)
07:29 karolherbst: your code hasn't support for the pcie counter anyway
07:36 mupuf: what?
07:36 mupuf: the fuc file is there
07:37 mupuf: http://cgit.freedesktop.org/~mperes/nouveau/commit/?h=ppwr_rework&id=27310fa8fdc39e54a3f4383fada96a3562c5a022
07:37 mupuf: adding support for one more counter is a matter of copy-pasting
07:37 karolherbst: do I need anything else for that one commit?
07:37 karolherbst: and the missing file is from http://cgit.freedesktop.org/~mperes/nouveau/commit/?h=ppwr_rework&id=c79744f19e8ff656b7c372c08b5ed89724d2a2a6
07:38 karolherbst: perf.c
07:39 karolherbst: mupuf: sadly it isn't as trivial as that, I need to hack in support for more than 4 counters, then different bit combinations for differenct architectures, stuff like that
07:39 karolherbst: but this should be trivial as well
07:40 mupuf: yes, just use the m4 macros that are already in :)
07:41 mupuf: as for perf.c, ack, check it out on reator
07:41 mupuf: you can toy with the code as much as you want, I doubt you will get anything much different than the current code
07:41 mupuf: but feel free to learn how to code in fuc by reimplenting that
07:41 mupuf: it is trivial code anyway
07:42 mupuf: just let me check it out before pushing that in nouveau
07:44 mupuf: I definitely encourages you to learn about fuc :)
07:44 karolherbst: will the compiler prevent me from doing stupid stuff? :D
07:44 mupuf: nope
07:44 mupuf: it is not a compiler
07:44 mupuf: it is an assembler
07:44 karolherbst: ohh right
07:45 mupuf: and it is a pain when you do not know the ISA
07:45 mupuf: but you have a fair amount of code now that you can use as an example
07:45 mupuf: and you can use the documentation of the ISA mwk put together
07:45 mupuf: it is really good
07:46 karolherbst: k
07:46 karolherbst: and NVKM_PPWR_CHIPSET is kind of important I assume :D
07:48 mupuf: a bit :D
07:48 mupuf: see, it is a macro :)
07:49 mupuf: as for the PCIE signal, I did not expose it because I was not sure at the time
07:49 mupuf: speaking about this, I still am not sure what it exactly is
09:15 karolherbst: mupuf: me neither
09:15 AndrewR: hm, at least nvbios thinks this board has max6649 (driven by lm90 kernel hwmon driver?) chp as extdev. But nouveau itself doesn't see/use it?
09:15 karolherbst: mupuf: it doesn't reach 100% when a higher pcie speed it needed
09:15 karolherbst: even at 20% it makes sense to increase pcie speed
10:05 karolherbst: any clue what PMFB could be?
10:11 AndrewR: in http://cgit.freedesktop.org/~darktama/nouveau/tree/drm/nouveau/nvkm/subdev/clk/nv40.c there is comment around line 107: case nv_clk_src_href /*XXX: PCIE/AGP differ*/ . I have AGP board. does this mean I should search mmio trace for specific info, or just hack different value here and see if my 'hang at highest perf level' problem will be solved?
10:13 AndrewR: also, can nvai2cspy tool help anyhow in finding why apparently i2c->monitoring chip fails in nouveau case? (by looking at nvidia's binary driver comm?)
10:13 AndrewR: *nvaspyi2c
10:25 karolherbst: mupuf: instructions can only have one "constant" in the sources?
10:28 imirkin: AndrewR: fyi, people aren't ignoring you... just no one has the answers.
10:30 AndrewR: imirkin, thanks :}
10:31 karolherbst: AndrewR: maybe you should stay focus on one thing and don't try to search for _the_issue_ randomly
10:31 karolherbst: nouveau doesn't find the gpios for you, why?
10:31 karolherbst: this is the thing you have to find out first
10:32 karolherbst: voltage won't work as long the gpio problem isn't solved
10:32 GaivsIvlivs: Any plans on xorg's side for kernel 4.3?
10:33 AndrewR: karolherbst, no, gpio was/is fine after I added my case in switch statement
10:33 karolherbst: AndrewR: ahh okay, didn't know that
10:33 karolherbst: so the gpios are getting set now?
10:33 karolherbst: and so the voltagE?
10:34 karolherbst: we could try something out
10:34 AndrewR: karolherbst, volt: current voltage: 1300000uv
10:34 karolherbst: yeah okay, this looks good
10:34 karolherbst: on the higher pstate you should get 1.4V though
10:37 karolherbst: AndrewR: did you boot with nouveau.debug=debug and verified that the voltage gets set at reclock?
10:37 AndrewR: karolherbst, http://www.fpaste.org/277189/ - looks like it being set (with this specific line setting pstate 21 and just after this again 20)
10:38 karolherbst: okay, looks good
10:39 karolherbst: AndrewR: you could disable core/memory reclocking
10:39 karolherbst: and check if one of it works
10:39 karolherbst: like does it work when you only reclock your memory?
10:39 karolherbst: or the shader?
10:42 AndrewR: (I just thinking now it can be done. Failing mem/core reclock artificially for a test?)
10:43 AndrewR: I see there is tool for uploading custom (edited?) vbios ..but.. i don't have tools for editing it
10:44 karolherbst: AndrewR: you can either change it in the nouveau code though
10:46 karolherbst: AndrewR: https://github.com/karolherbst/nouveau/blob/master/drm/nouveau/nvkm/subdev/clk/base.c#L175
10:46 karolherbst: there you see the ram thingy
10:46 karolherbst: nvkm_cstate_prog is for the cores
10:52 AndrewR: karolherbst, so, I can just add my fixed memory clock here , in khz?
10:53 AndrewR: karolherbst, I also see some comments saying pstate can be 'statically' provided (not from vbios?) , but ..how?
10:55 AndrewR: karolherbst, also, according to this I'm already at max mem speed? http://www.fpaste.org/277206/ even if pstate not highest ...
10:58 karolherbst: AndrewR: I meant you can simply disable those mem reclocking calls :D
11:03 AndrewR: karolherbst, ya, but I prefer to disable core reclocking up to 500Mhz, because I suppose it more likely to hang everything, compared to 'just' shader ...
11:03 karolherbst: well usually there is no big difference in reclocking core or shader
11:04 karolherbst: memory is usually the more complicated part
11:05 AndrewR: karolherbst, but well, unless my card need some additional programming exactly IF shader/core runs (consumes data from memory faster) at 500 but not 300 ... because 300/300/1000 seems to work fine!?
11:07 imirkin: karolherbst: performance characteristics of nv40 might differ a bit from kepler :)
11:10 karolherbst: imirkin: I know
11:10 karolherbst: AndrewR: ohh the memory clock doesn't change at all, didn't know that
11:11 karolherbst: AndrewR: you can disable the clocking of one domain
11:13 karolherbst: AndrewR: https://github.com/karolherbst/nouveau/blob/master/drm/nouveau/nvkm/subdev/clk/nv40.c#L146
11:13 karolherbst: you see the gclk and sclk thingy?
11:13 karolherbst: ohhh wait
11:14 karolherbst: you could compare those regs on nouveau and on the blob
11:14 karolherbst: you might find a difference
11:14 karolherbst: I somehow doubt that, but who knows
11:15 AndrewR: karolherbst, yes I see them ... (in code).
11:16 karolherbst: AndrewR: did you check the pstate file after reclocking to 21?
11:16 AndrewR: karolherbst, I fear just seeing _some_ difference will not fix it here - because all those equations ...blob and nouveau can use slightly different algos, and I have no idea now it all work
11:16 karolherbst: depends
11:16 AndrewR: karolherbst, moment, I'll construct bash line for this and hope it will not hang too fast :}
11:17 karolherbst: :)
11:18 AndrewR: http://www.fpaste.org/277217/
11:20 karolherbst: looks good
11:22 karolherbst: AndrewR: npll_ctrl is 0x004000, npll_coef is 0x4004, spll is 0x4008 and ctrl is 0x00c040
11:22 karolherbst: you could check which values the blob is using for those
11:22 karolherbst: and compare with your 20 pstate and 21 pstate
11:24 imirkin: oh wow
11:24 imirkin: i figured out what was going on with the crash
11:24 imirkin: hakzsam: --^
11:24 imirkin: the thing is that nouveau_bo_wait could trigger a kick as well. that emits a fence into the buffer
11:24 imirkin: but doesn't rotate it
11:25 imirkin: and then when the rotate comes around, we want to emit another fence into it.
11:26 imirkin: sneaky.
11:35 AndrewR: karolherbst, http://www.fpaste.org/277222/ so, spll not changed?
11:36 karolherbst: AndrewR: the important thing is to compare with the blob
11:36 karolherbst: not what the blob itself is doing
11:36 karolherbst: what are the values with nouveau
11:36 AndrewR: karolherbst, good question ....
11:37 karolherbst: AndrewR: nvapeek ;)
11:37 AndrewR: karolherbst, good answer!
11:39 AndrewR: karolherbst, 00004004: 10031906 (for pstate 20)
11:40 AndrewR: karolherbst, 00004000: c001001c
11:43 AndrewR: karolherbst, and at 21: http://www.fpaste.org/277228/
11:45 karolherbst: AndrewR: you could check if it is better with the blob values
11:45 karolherbst: but then you need to set all 4 regs to the values used by the blob
11:45 karolherbst: mhh
11:46 karolherbst: allthough only 0x4000 and 0x4004 are important
11:46 karolherbst: and 0x4000 looks fine
11:46 karolherbst: 0x4004 is different
11:47 karolherbst: but different doesn't mean wrong here
11:48 AndrewR: karolherbst, ya
11:54 AndrewR: karolherbst, so, I nvapoke'd 0x0b041b04 into 0x004004, it seems to stay here...time to hang?
12:00 AndrewR: it hanged
12:06 karolherbst: AndrewR: when did you poke it?
12:06 karolherbst: it is usually not safe to just poke the plls though :/
12:07 karolherbst: I highly doubt though that the issue is there, it is one possibility, but not the most likely one
12:07 AndrewR: karolherbst, before issuing echo 21 > /sys ... ya, and may be at reclock kernel/nouveau overwrites it anyway?
12:07 karolherbst: yes
12:07 karolherbst: ohh wait
12:07 karolherbst: so before pstate it was kind of "stable"?
12:07 karolherbst: you may want to cat the pstate file after poking
12:07 karolherbst: the clocks should change
12:08 AndrewR: karolherbst, I haven't touched ctr stuff (4000).
12:08 karolherbst: maybe you want to poke the pll after writing into pstate, and cat pstate and go back to 20
12:08 karolherbst: just to see what happens to the clock
12:09 AndrewR: karolherbst, ok
12:12 AndrewR: karolherbst, http://www.fpaste.org/277233/
12:19 pmoreau: imirkin: What is the role of nv50_prog_info? For example, why does it stores immediate values and type? Aren't they already included in the NV50 IR code produced?
12:20 imirkin_: dunno.
12:20 imirkin_: but i have a guess.
12:20 imirkin_: with nvfx, there is no such thing as "consts" or "immediates"
12:20 imirkin_: they're sprinkled in between instructions
12:21 imirkin_: i'm guessing calim had aspirations of adapting codegen to do nvfx
12:21 imirkin_: and this would include the "linkage" information there
12:21 pmoreau: Hum, ok. What is nvfx btw? :D
12:21 imirkin_: nv30/nv40
12:22 imirkin_: well, technically just nv30
12:22 imirkin_: but nv40 has a very similar ISA and encoding
12:22 imirkin_: they just move everything over by 1 bit to maximize decoding pain
12:22 pmoreau: :D
12:23 imirkin_: marvel at the beauty of the NVFX_VP macro: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nv30/nvfx_shader.h#n10
12:24 imirkin_: er wait, that's nto it
12:24 imirkin_: although that's a good one too
12:24 pmoreau: How does the "environment" (OpenGL or OpenCL) "knows" which reg in NV50 IR corresponds to a var?
12:24 pmoreau: So this is the shader ISA?
12:24 imirkin_: er hm, maybe it is.
12:25 imirkin_: errrr huh?
12:25 imirkin_: i dunno what question you're asking
12:25 imirkin_: please be more specific
12:25 imirkin_: afaik immediates are entirely unused in nv50_prog_info though
12:26 imirkin_: perhaps calim wanted to play with inlining constbufs
12:26 imirkin_: that'd make more sense.
12:26 imirkin_: but i guess he discovered that'd be a bad idea, or at least not worth it
12:26 pmoreau: Like, how do you link r0 to variable foo, for example, if you are modifying a kernel arg named foo
12:28 pmoreau: I noticed that the nv50_ir_from_tgsi is doing some mkStore(EXPORT) on some shader's output, but I guess there is more to it
12:29 imirkin_: you use OP_EXPORT to write to outputs
12:29 imirkin_: which are then lowered to writes to fixed registers
12:31 pmoreau: Ack, but how does the calling program knows that fix_reg_0 contains the value of foo, and fix_reg_1 bar's value, and not the opposite?
12:32 imirkin_: there's nv50_varying_info or something like that
12:32 imirkin_: but... there's a lot of very fixed assumptions in GL programs
12:32 imirkin_: like the color outputs of frag progs have to go in very specific places
12:32 pmoreau: Make sense
12:32 imirkin_: varying writes on nvc0 are to special storage locations, not registers
12:33 imirkin_: varying writes on nv50 are anywhere you want (iirc) but then there's a remapping thingie
12:33 pmoreau: Do you have any idea how it works for kernel args?
12:34 imirkin_: i don't know anything at all about compute
12:34 imirkin_: you can read some of the existing code
12:34 imirkin_: afaik it worked at one point or another, at least on kepler
12:46 pmoreau: I'll have a look at the compute code on Kepler
12:46 imirkin_: but note that nv50 probably is very different from kepler
12:46 imirkin_: i think there might be some nv50 support code too... maybe not upstream though
12:47 imirkin_: iirc curro worked on it
12:47 pmoreau: I have curro's code
12:49 pmoreau: But afaik, it doesn't generate any NV50 IR code, so there might not be clues on how to link kernel args to actual NV50 IR code
12:49 imirkin_: someone had a branch for supporting compute on nv50
12:49 imirkin_: maybe it was plombo
12:50 pmoreau: there is https://github.com/curro/mesa/tree/nv50-compute
12:50 imirkin_: ah right
12:53 imirkin_: i should probably look over that branch
12:57 imirkin_: pmoreau: i don't think there's a thing as what you're describing
12:57 imirkin_: pmoreau: it's just reads/writes to a buffer
12:57 imirkin_: you look up the right place to read based on your invocation. or something like that.
13:00 pmoreau: Oh, really?
13:00 pmoreau: Hum..
13:00 imirkin_: i don't think there's any arg fetching as there is for e.g. vertex shaders
13:01 pmoreau: So even though you pass separate args to the kernel, they all get merged into a buffer under the hood?
13:02 imirkin_: do you pass separate args?
13:02 imirkin_: how do you pass args?
13:02 pmoreau: At least in CUDA, and quite sure in OpenCL as well
13:02 imirkin_: what is the mechanism for passing args
13:03 pmoreau: Under the hood, I don't know, but for the CUDA API, it is some_kernel_name<<<grid_dim, block_dim>>>(arg1, arg2, arg3, arg0, arg5, ...)
13:04 imirkin_: and what is arg1?
13:04 imirkin_: you don't invoke a kernel on a *single* argument right?
13:04 pmoreau: whatever you specified it to be
13:05 imirkin_: you hvae like 10000000 values for that argument
13:05 imirkin_: that you want the gpu to parallel-ly compute
13:05 imirkin_: right?
13:05 pmoreau: It could be a pointer to some memory location on the GPU, or just a float that you pass by value
13:06 imirkin_: ok, so then you can just pass that stuff in via constbuf
13:07 pmoreau: Isn't constbuf "const", so read-only like uniforms?
13:07 imirkin_: the arg values are constant for all the grid invocations right?
13:07 pmoreau: Except if you specified so, otherwise no
13:07 imirkin_: when would it be different?
13:08 pmoreau: you could have some kernel: void kernel(float* foo) { foo[0] = 3.2 * foo[1]; }
13:08 imirkin_: so the kernel takes one argument, a pointer
13:08 imirkin_: right?
13:09 imirkin_: and each invocation gets the same pointer, right?
13:09 pmoreau: You could have void kernel(float* foo, float bar, unsigned int foo2)
13:09 pmoreau: Each invocation get the same args, right
13:09 imirkin_: whereby again each invocation gets the same 3 args
13:09 imirkin_: aka constbuf :)
13:10 pmoreau: If you say so :-)
13:10 pmoreau: But then, constbuf isn't read-only, right?
13:10 imirkin_: no, it is
13:10 imirkin_: but you never need to modify foo
13:10 imirkin_: only the memory pointed to by foo
13:10 pmoreau: Ha, ok
13:10 imirkin_: the value of "foo" (i.e. the pointer value) is the same for all invocations
13:10 imirkin_: aka constbuf
13:11 imirkin_: right?
13:11 pmoreau: constbuf == T * const
13:11 pmoreau: ?
13:11 imirkin_: constbuf = values that don't change across invocations
13:11 imirkin_: doesn't matter if it's a bool, float, integer, or pointer. they're all just sequences of bits.
13:11 imirkin_: a pointer is just an integer, you know
13:12 pmoreau: I do know that :D
13:12 pmoreau: But it's different to say that you can't change the pointer, or you can't change the value of the pointer.
13:12 AndrewR: hm, interesting. running blob + windowed glxgears only raises temperature (according to nvidia-settings display) to 83-84-86c, and it works OK in this mode (clocks are at 500Mhz). But if I maximize gears to fullscreen - temp raised to 88+, and then ..same (symptomatically) hang as with nouveau@highest perf level.
13:13 AndrewR: so, may be it 'just' temperature-related shudown
13:13 imirkin_: pmoreau: ... who cares about the value of the pointer? for all you know, the kernel just does pointer arithmetic.
13:13 imirkin_: pmoreau: actually, i don't know that there's a distinction between "the pointer" and "the value of the pointer"
13:13 imirkin_: pmoreau: there's "the value pointed to by the pointer", which is a separate thing
13:14 pmoreau: Sorry, I meant the latter by the value of the pointer
13:14 imirkin_: the kernel doesn't receive the data pointed at by the pointer in its args. just the pointer.
13:14 imirkin_: i.e. constant across all invocations.
13:15 pmoreau: Right, but I was wondering if constbuf would also include the data pointed by the pointer, and so also apply its constness to it
13:15 imirkin_: constbuf contains whatever you stick into it
13:16 imirkin_: i'm suggesting you stick the kernel args into it
13:17 pmoreau: Yeah, of course --" Kinda mixing everything again...
13:18 pmoreau: The data would have been allocated separetely on the GPU, so no need to care about it. We just need the pointer to it and offset all accesses to the element by the pointer value
13:22 pmoreau: I don't need to export then, as there is no return value, and the data is stored separetely
13:23 imirkin_: :)
13:23 imirkin_: glad you figured it out all by yourself
13:23 pmoreau: :-D
13:24 imirkin_: note that nv50 has some funny restrictions around memory accesses from shaders
13:24 imirkin_: specifically it's limited to 32-bit
13:24 imirkin_: ... but it's a 40-bit virtual address space
13:24 pmoreau: I still needed to be carefully babysitted by you. Thanks! :-)
13:24 imirkin_: a bit sad.
13:25 imirkin_: so you have to make sure all the resources fit within 32 bits of each other, and set the base address accordingly
13:25 AndrewR: in theory nvapeek 0x15b4 should give me 'raw' integrated temp. sensor value? for nv43 .... after nvapoke 0x15b0 0xff
13:26 pmoreau: Instant cooling? :D
13:28 pmoreau: I don't understand how you can get to 88+ degrees while just running glxgears in full screen mode with the blob.
13:29 imirkin_: my fanless nv44 gets that hot easily
13:29 imirkin_: however my fanful nv42 stays pretty cool
13:29 pmoreau: Wow
13:30 AndrewR: imirkin, may be something else failed
13:30 pmoreau: The card did accumulate to much dust?
13:31 AndrewR: pmoreau, very possible
13:31 pmoreau: That could be an easy fix then :D
13:32 AndrewR: as far as I understand nouveau disables tmp sensor (integrated) due to lack of those >if (!sensor->slope_div || !sensor->slope_mult || !sensor->offset_num || !sensor->offset_den)
13:41 AndrewR: from mmiotrace .. [0] 437.810457 MMIO32 R 0x0015b4 0x0800012c PBUS.THERM.STATUS => { SENSOR_RAW = 0x2c <skipped> [0] 591.968575 MMIO32 R 0x0015b4 0x08000031 PBUS.THERM.STATUS => { SENSOR_RAW = 0x31 | ADC_CLOCK_XXX = 0x4 } . So, it looks like it uses integrated sensor? But it only 'talks' if temp actually change (not pulled each N msec for example)?
13:44 pmoreau: I guess there is a system of alarm, to generate a CPU interrupt if temp goes too high?
13:44 pmoreau: Maybe there is another reg to get the temp
13:45 pmoreau: And this other one gets pulled regularly?
13:51 AndrewR: pmoreau, anyway, I'll try this nvapeek'ing on blob (while watching temperature monitor), amy be table with 'real' temperature vs sensor values will help
13:51 AndrewR: *may be
13:51 AndrewR: but not right now ...
13:58 karolherbst: mupuf: okay, init the counters is done now :)
14:26 karolherbst: mupuf: I end today with this: https://github.com/karolherbst/nouveau/commit/d7b02585493895643f93581b89a7fada958054eb#diff-5e5cb4582f6faff078d1cad6144b248a
14:26 karolherbst: doesn't store the values, but init seems okay and the reset works
15:31 mupuf: cool, I did not remembered that my patch series to simplify writing code for ppwr landed :)
16:56 karolherbst: can I do a floating point div on the falcon?
16:56 imirkin_: i don't think falcon does floating point at all
16:57 imirkin_: i wouldn't be surprised if there were no idiv either
16:57 karolherbst: there is div
16:57 karolherbst: and mod
16:57 imirkin_: ah ok
16:57 karolherbst: mhh I can also do stuff with integer percentage values though :/
16:58 karolherbst: but then I need 64 bit values kind of
16:58 karolherbst: mhhh
16:58 imirkin_: no div on the xtensa procs... i found a very weird function in their logic which i eventually determined was idiv :)
16:58 karolherbst: I have to divide two integers with max size 0x80000000
16:58 karolherbst: :D
16:59 imirkin_: why not just report both integers
16:59 imirkin_: and let userspace deal with it?
16:59 imirkin_: note that there's no floating point in the linux kernel either
16:59 karolherbst: this is for dyn reclock
16:59 karolherbst: and the counters are checked every 0.1 seconds
16:59 karolherbst: we could let the kernel handle that, but interrupts every 0.1 seconds?
17:00 imirkin_: wait, what are you trying to do
17:00 karolherbst: read counters out
17:00 imirkin_: if x / y > 0.5 or something?
17:00 karolherbst: calculate load of the engines
17:00 imirkin_: in order to compare to something?
17:00 karolherbst: throw that into a calc cstate/pstate algo
17:00 imirkin_: or in order to report the load?
17:00 karolherbst: and notify the host, if cstate/pstate should change
17:00 imirkin_: so... to compare
17:00 imirkin_: remember that if x / y > 0.5 is the same thing as if 2x > y
17:01 karolherbst: yeah
17:01 karolherbst: but these numbers are a litlte bit bigger
17:01 imirkin_: if the values are large
17:01 imirkin_: shr both of them
17:01 karolherbst: biggest I saw currently is 0x108f5af
17:01 imirkin_: ok, but you don't really need that much precision. just shr both of them by 8 and move on
17:01 karolherbst: basically I let the counter run for 0.1 seconds
17:01 karolherbst: devide all of them through the ticks in that time
17:02 imirkin_: how many ticks?
17:02 imirkin_: a lot right? more than 256?
17:02 karolherbst: gk104 has 324 per us
17:02 imirkin_: so... more than 256 in 0.1s :)
17:02 karolherbst: yes :D
17:02 imirkin_: just shr both values by 8
17:02 karolherbst: basically I want to do that:
17:02 karolherbst: counter * 100 / total_ticks
17:03 imirkin_: and make sure that for your a/b fraction, both a and b are < 256
17:03 imirkin_: and then you can just mul both values by the relevant things and compare
17:03 imirkin_: and everything stays nice and integer.
17:03 karolherbst: counter is always less than total_ticks
17:03 imirkin_: you lose a tad bit of precision but who cares.
17:03 karolherbst: yeah
17:04 karolherbst: I compare that with two values basically
17:04 karolherbst: target_load and max_load
17:04 karolherbst: target_load is something like 70 or 80
17:04 karolherbst: max_load 95
17:04 imirkin_: hopefully you understood what i was saying.....
17:04 imirkin_: my point is about plain math
17:04 karolherbst: I know
17:04 imirkin_: how to do things with integers
17:04 imirkin_: when all you want to do is compare, not get absolute numbers
17:04 karolherbst: if there would be no silly overflow, it would be easy :D
17:05 imirkin_: i guess you didn't understand what i was saying.
17:05 karolherbst: mupuf added a mul_32_32_64 function, but I want to keep it simplier
17:05 imirkin_: my way there's no overflow.
17:05 karolherbst: ohhh
17:05 karolherbst: yeah right
17:05 karolherbst: now I udnerstood :D
17:05 imirkin_: as long as you don't go crazy on trying to specify a precision for the comparison... limit yourself to numerator and denominator < 256 and you're good
17:06 karolherbst: why 256?
17:07 imirkin_: because you shr both values by 8
17:07 imirkin_: if it goes over 256, you get back into overflow land
17:07 karolherbst: mhh
17:07 imirkin_: let's try this another way
17:07 imirkin_: let's say you want to determine if x/y > a/b
17:07 imirkin_: that's the same as doing
17:07 imirkin_: (x / 256) / (y / 256) > a/b
17:08 imirkin_: which is the same as
17:08 imirkin_: b * (x / 256) > a * (y / 256)
17:08 imirkin_: so as long as a and b are < 256, you can never overflow
17:08 imirkin_: you'll lose a tad bit of precision on the values of x and y, but... meh
17:08 imirkin_: you said those values are big, so it won't matter much
17:09 karolherbst: the absolute max value I can get is 0x5f5e100
17:09 karolherbst: and I will get that
17:10 imirkin_: it doesn't matter
17:10 imirkin_: even if your max value was 0xffffffff my method would still work.
17:10 imirkin_: as logn as the min value is at least 0x100
17:10 karolherbst: nope
17:10 karolherbst: it can be 0x0
17:11 imirkin_: how?
17:11 karolherbst: nothing happens?
17:11 imirkin_: i thought you said it would increment by at least 300 in 1us
17:11 karolherbst: on my gpu there are some counters at 0x0
17:11 imirkin_: ok, i see.
17:11 karolherbst: I think on a deskopt that won't happen, ohh wait
17:11 karolherbst: for video load
17:12 imirkin_: what's the smallest the denominator can be?
17:12 imirkin_: i.e. the "total" that you divide by
17:12 karolherbst: the total is pretty constant
17:12 imirkin_: if that's less than 0x100, then we're toast
17:13 karolherbst: I will read the counters out every 100ms
17:13 imirkin_: otherwise as long as you don't care about tiny fractions, you're good with my method
17:13 karolherbst: and the ticks per 1us are constant
17:13 imirkin_: like 0.5% will stay under the radar... maybe
17:13 imirkin_: but who cares.
17:13 imirkin_: you're going to be comparing to like 10% at least
17:14 karolherbst: yeah, doesn't really matter
17:14 karolherbst: so I just shift both sides by 8 and then I am fine
17:15 karolherbst: ohh right, now I get why the denominator should be at least 256 :D
17:15 karolherbst: I really have to get used to all that
17:15 karolherbst: mhhh
17:15 karolherbst: I was thinking
17:15 karolherbst: why don't I just calculate per 256 instead of percent?
17:16 karolherbst: so I only have to shift the denominator and spare one shift and one multiplication :D
17:16 imirkin_:&
17:16 karolherbst: I also get increased precision that way
17:41 karolherbst: imirkin_: are there any gpu regs where I can just write garbage into and they do nothing?
17:46 mwk: karolherbst: 0, for one
17:46 mwk: PMC_ID
17:46 mwk: it's even used that way in init scripts
17:46 karolherbst: okay
17:46 karolherbst: and all bits are changeable
17:46 karolherbst: nice
17:46 karolherbst: ohhh wait
17:46 karolherbst: 0x0 can't be written to
17:47 mwk: ah, then you use a different meaning of "they do nothing" :)
17:47 karolherbst: yeah
17:47 mwk: are you on PWR?
17:47 mwk: there are plenty of scratch regs on that thing
17:47 karolherbst: I don't want to add all those overhead to get values out of the falcons
17:47 karolherbst: so I just want to write into regs for now
17:47 karolherbst: pwr?
17:48 karolherbst: ohh yeah
17:48 mwk: http://envytools.readthedocs.org/en/latest/hw/pm/pdaemon/host.html
17:48 mwk: knock yourself out
17:48 mwk: just make sure noone else is using them
17:49 karolherbst: DSCRATCH regs?
17:49 mwk: yep
17:49 mwk: or H2D/D2H
17:49 karolherbst: wait
17:50 karolherbst: MMIO 0x5d0+i*4 / I[0x17400+i*0x100], i<4: DSCRATCH[i] ?
17:50 karolherbst: :D
17:50 mwk: it's at 0x10a5d0 in MMIO space, 0x17400 in falcon I[] space
17:50 karolherbst: ohh okay
17:50 mwk: the first one
17:50 karolherbst: okay, these seems to be 0x0 for me
17:51 karolherbst: all 4
17:51 mwk: second is 0x10a5d4 / 0x17500, and so on
17:51 karolherbst: yeah, okay
17:51 karolherbst: I was a bit confused by that "/"
17:51 karolherbst: okay, so I would write into l[0x17400] on falcon?
17:51 mwk: yes
17:51 karolherbst: nice
17:51 karolherbst: I only need 4 anyway :D
17:52 karolherbst: with st?
17:52 mwk: no
17:52 mwk: http://envytools.readthedocs.org/en/latest/hw/falcon/io.html
17:52 mwk: wait, what falcon gen are you using?
17:52 karolherbst: ohh right,
17:52 karolherbst: 4
17:53 karolherbst: but I have to code for all
17:53 karolherbst: okay, nv_iowr it is
17:53 mwk: then it's just I[0x5d0]
17:54 mwk: falcon addressing is... confusing
17:54 mwk: right, there should be helper functions for that
17:54 karolherbst: mhhhh
17:54 karolherbst: maybe just nv_iowr(0x5d0, $r..) ?
17:55 mwk: they should also take care of address shifting for you
17:55 karolherbst: like done everywhere else
17:55 mwk: likely yes
17:55 karolherbst: mwk: that's my current thingy already: https://github.com/karolherbst/nouveau/commit/516c0404e0b13c06ef7b520024edc8f6c0246caa#diff-5e5cb4582f6faff078d1cad6144b248a
17:57 karolherbst: mhhh
17:57 karolherbst: I get 0xffffffff :/
17:57 karolherbst: but the card isn't dead :D
17:57 karolherbst: just my calculation is somehow off
17:58 karolherbst: okay, this looks wrong: shr b32 $r1 0xff
17:58 karolherbst: and then div $r2 $r2 $r1 :D
17:58 karolherbst: I am stupid
17:59 karolherbst: okay, now I get 0x0, this looks much better
17:59 karolherbst: yay
17:59 karolherbst: it works
18:07 karolherbst: mwk: I bet the falcons doesn't know how many pstates and cstates the gpu has, so I need to feed that information from the host?
18:07 karolherbst: can I do that through perf_init?
18:22 xen---: anyone having problems when they plug in their display port cables? I'm using an Quadro 2800M
18:23 xen---: the card is essentially useless when this happens. I have the problem with Ubuntu 15.04 and Fedora 22. If I drop back to Centos 7 (3.10), everything is OK.
21:17 imirkin: skeggsb_: did you see my comments about the kick situation?
21:39 AndrewR: some observations for my nv43: pwm fan control under nouveau work! I can set any speed, and it sounds different, and raw temp sensor data seems to confirm speed changes.
21:41 AndrewR: also, nvidia at least in its control panel (nvidia-settings) says it uses internal sensor for temperature graph. so, may be nouveau just missed where exactly those calibration (?) parameters stored in exactly this vbios
21:44 AndrewR: also, nouveau at boot level gives bigger values in this temp register than under nvidia. This can mean nouveau not disables as much as nvidia .. but back to my temperature-related shutdown hypothesis - it was falsified by setting under nouveau fan to 1% of its full speed and watching raw temp sensor value goes up to 49 (blob died at highest perf level when same reg was at just 45)
22:10 AndrewR: imirkin, I've added bug for my reclocking problem on unmodified nouveau (so it will not lost, hopefully). But I'm not sure if attached by bugzilla as text/plain vbios still ok?
22:10 AndrewR: https://bugs.freedesktop.org/show_bug.cgi?id=92377
22:10 imirkin: attach it as application/octet-stream
22:12 AndrewR: imirkin, reattached
22:13 imirkin: could have just changed the mime type...
22:16 AndrewR: at least md5sum of orig file and just downloaded stays the same.
22:18 AndrewR: is there any specific category for apitrace tool? (want to enter my bug with unconditional sse2-enabled compilation)
22:20 imirkin: github.com/apitrace/apitraces/issues ?
22:20 AndrewR: imirkin, ok (I was looking at freedesktop.org bugzilla)
22:45 imirkin: hmmmm... i wonder if i can limit fence emission to situations where the fence refcnt > 1...
22:46 AndrewR: https://github.com/apitrace/apitrace/issues/394 - someone might be surprised such old machines still in use :}
22:47 imirkin: AndrewR: i'm trying to make nouveau work better on a ppc g5 with a nv34
22:47 AndrewR: imirkin, g5 has altivec and surprizingly hv virtualization ...so, not as old feature-wise :}
22:49 imirkin: ooh, nice. celeron 1000 -- is that from the original p4 line? p3 only went up to ~600mhz iirc...
22:49 AndrewR: imirkin, no, its from p3 line :} p4-based ones had sse2
22:49 imirkin: ah right.
22:49 imirkin: oh, it was p2 that went up to 500mhz or so
22:49 imirkin: p3 went higher, that's right.
22:50 imirkin: still one of the slot ones?
22:50 imirkin: or already back to socket?
22:51 AndrewR: imirkin, s370 as far as I saw ..now machine sandwiched between two other cases
22:51 imirkin: heh ok
22:53 AndrewR: https://techuman.wordpress.com/2014/03/24/g4-and-g5-powerpc-virtualization-with-qemu-2-0-and-kvm/ - I was surprised even g4 had hw virtualization ....
22:54 imirkin: well, i'll leave virtualization to another day
22:54 imirkin: this box runs pretty unreliably as is
22:54 imirkin: i don't need additional problems
22:56 AndrewR: imirkin, I wish you good luck
22:56 imirkin: thanks
22:58 imirkin: unfortunately my laziness is counteracting me quite aptly