00:09 karolherbst: how can I execute a game with no-op shaders?
01:09 karolherbst: imirkin: is there any documentation about all the known/used gpu instructions in nouveau?
01:45 RSpliet: pmoreau: I received a box! ;-)
01:45 RSpliet: thanks!
01:48 pmoreau: Wow, that was quick!!
01:48 pmoreau: Is the card ok?
01:50 RSpliet: I haven't opened it yet
01:50 RSpliet: will do so when I get home :-)
01:50 pmoreau: ;-)
02:08 karolherbst: :)
02:08 karolherbst: I would really like to be able to debug some shaders and gpu code, but didn't find much on how to do so :/
02:15 RSpliet: karolherbst: debug is a broad term
02:16 RSpliet: have you spotted a bug?
02:16 karolherbst: I meant like to execute noop code instead of the real thing, even partly. "disable" shaders or specific types of them
02:16 karolherbst: no, just performance stuff
02:17 RSpliet: hmm, I don't think the driver has a lot of sight on which shader is which
02:17 karolherbst: I already got the instructions to be printed in stdout, so that's at least something
02:17 RSpliet: 3D programs generate and upload them, at least nouveau has no metadata to identify specific shaders
02:18 karolherbst: mhhh
02:18 karolherbst: so I can only noop all of them and check how much better the performance is?
02:18 RSpliet: while you're not seeing a thing? :-D
02:19 karolherbst: yep
02:19 karolherbst: :D
02:19 RSpliet: I'm not sure what that information will help you with
02:19 RSpliet: but you can try...
02:19 karolherbst: but the optimized shaders have a lot of mul and mad
02:19 RSpliet: yep
02:19 karolherbst: usually I would think these could be somehow combined, but :/
02:19 RSpliet: mad already is a combined multiply+addition
02:20 karolherbst: I am not that good with assembly in general, so I would really need a description of the "parameters"
02:20 karolherbst: yeah, that much I know
02:20 karolherbst: but I don't know anything about the order of the regs and such
02:20 RSpliet: you can't just combine more muls in there, having one fat multiplier in your ALU is already bad enough
02:21 karolherbst: there are also some neg
02:21 karolherbst: neg is just *-1 ?
02:21 RSpliet: yes
02:21 RSpliet: well
02:21 karolherbst: src first?
02:21 karolherbst: or is dest first?
02:21 karolherbst: :D
02:22 RSpliet: for ints neg is implemented as ^ (~0) + 1 usually, that's a lot cheaper
02:22 RSpliet: it's a property for each of the inputs
02:23 RSpliet: or, on NV50 I think it was... maybe not each input though
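The integer negation RSpliet mentions (XOR with all ones, then add 1) is ordinary two's-complement negation. A minimal sketch, assuming 32-bit registers; the helper names `neg32` and `to_signed` are made up for illustration and are not nouveau functions:

```python
MASK32 = 0xFFFFFFFF  # assume 32-bit integer registers

def neg32(x: int) -> int:
    """Two's-complement negate: flip every bit (x ^ ~0), then add 1."""
    return ((x ^ MASK32) + 1) & MASK32

def to_signed(x: int) -> int:
    """Reinterpret a 32-bit pattern as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

print(hex(neg32(5)))        # 0xfffffffb
print(to_signed(neg32(5)))  # -5
```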
02:23 karolherbst: so if I have something like "629: neg ftz f32 $r5 $r23 (8)"
02:23 RSpliet: that looks like the end result is negated
02:24 karolherbst: so
02:24 karolherbst: $r5 is written negated into $r23?
02:25 karolherbst: or each of them are negated into themselves?
02:25 RSpliet: the usual assembly style is <op> <rdest> <rs1> <rs2> <rs3>
02:25 karolherbst: okay
02:25 karolherbst: so $r23 negated into $r5
02:26 RSpliet: no, the result of ftz(-$r23) is written into $r5
02:26 karolherbst: ahh
02:26 karolherbst: what is ftz?
02:26 RSpliet: see http://envytools.readthedocs.org
02:26 RSpliet: there's some isa documentation on there; despite not being complete, it can be quite useful
02:27 karolherbst: https://envytools.readthedocs.org/en/latest/hw/falcon/isa.html this one?
02:28 RSpliet: nope
02:28 RSpliet: that's falcon
02:28 RSpliet: you want pgraph
02:30 karolherbst: so somewhere here: https://envytools.readthedocs.org/en/latest/hw/graph/index.html
02:31 RSpliet: yep
02:34 karolherbst: RSpliet: does this look like something which can be optimized? https://gist.github.com/karolherbst/4e652910b239f28ab37f
02:35 karolherbst: or at least could be
02:35 RSpliet: not really tbh
02:36 RSpliet: unless there's a way of optimising c1 access
02:37 karolherbst: yeah
02:37 karolherbst: I was thinking about anything SIMD related
02:37 karolherbst: but always the fourth instruction is odd
02:39 karolherbst: what would be best way to check what the blob is doing?
02:39 karolherbst: mmt?
02:42 karolherbst: RSpliet: what about this? https://gist.github.com/karolherbst/4f1f368bc6e89661bea0
02:42 karolherbst: couldn't the 2.000000 already be moved into the mov?
02:42 karolherbst: and just move the doubled 0xbf800000 value
02:43 karolherbst: ohh, mad is different than I thought I guess
02:43 karolherbst: rdest = (rs1 * rs2) + rs3
02:44 RSpliet: yes, so that's not going to work
02:44 RSpliet: and you have to work with limitations like only one immediate in an instruction (for mad always the second param)
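The operand order discussed above (destination first, then sources, with mad computing rs1 * rs2 + rs3) can be sketched as a toy register model. This is only an illustration: the `mad` helper and the dict of registers are hypothetical, and the check that a constant may appear only in the second source slot simply mirrors RSpliet's remark rather than any documented encoding rule.

```python
def mad(regs, rdest, rs1, rs2, rs3):
    """Toy model of mad: regs[rdest] = rs1 * rs2 + rs3.

    rs1 and rs3 must be register names; rs2 may be a register name
    or a constant (the single constant slot mentioned in the chat).
    """
    if not isinstance(rs1, str) or not isinstance(rs3, str):
        raise ValueError("only the second source may be a constant")
    second = regs[rs2] if isinstance(rs2, str) else float(rs2)
    regs[rdest] = regs[rs1] * second + regs[rs3]
    return regs[rdest]

regs = {"$r1": 3.0, "$r2": -1.0}
mad(regs, "$r0", "$r1", 2.0, "$r2")
print(regs["$r0"])  # 3.0 * 2.0 + (-1.0) = 5.0
```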
02:44 karolherbst: I see
02:45 karolherbst: that would be my other idea to eliminate the mov :D
02:46 RSpliet: eliminating a single mov is not going to gain you a lot of performance either
02:46 karolherbst: yeah I know, but it's still better than doing nothing about it :)
02:46 RSpliet: time is lost in memory access, not in insns
02:46 karolherbst: okay
02:52 karolherbst: what is stuff like BB:3 (3 instructions) - idom = BB:0, df = { BB:4 }?
02:52 karolherbst: branches?
02:55 RSpliet: basic block I think
02:57 karolherbst: okay
05:36 karolherbst: mupuf: what would you say about PDAEMON stuff directly after writes into PCLOCK.CLK0_* ?
05:37 mupuf: karolherbst: can I see your trace?
05:37 karolherbst: mupuf: https://gist.github.com/karolherbst/321a8f69ffe2a19f1a88#file-gistfile1-txt-L515-L522
05:37 karolherbst: the trace is in git too
05:37 karolherbst: mmiotrace-nvidia-355.06-GTX-770M-clocking.log.xz
05:37 karolherbst: should only contain stuff I did while nvidia-settings was running
05:38 mupuf: it does not look super promising
05:38 karolherbst: sometimes the second write is just 0x4
05:38 karolherbst: and the fourth 0x18b82c
05:38 karolherbst: into DATA
05:39 karolherbst: but CLK0 is gpc?
05:39 mupuf: you think the script uploaded to pdaemon is the one handling the voltage?
05:39 karolherbst: don't know, could be
05:39 mupuf: now that I think of it ... it may make sense
05:39 karolherbst: can't find anything else near the CLK0 stuff
05:39 karolherbst: which would do that
05:40 karolherbst: and it's like always there
05:40 karolherbst: also 0x0 W after R from CLK0
05:40 mupuf: since nvidia wanted to control more tightly what is going on on their gpu, having pdaemon changing the voltage could help for this
05:40 karolherbst: some lines before
05:40 karolherbst: yeah
05:41 karolherbst: you could check my trace from git, it's pretty much related
05:41 mupuf: still at work :)
05:41 karolherbst: found this like 6 times now, same order and stuff
05:41 karolherbst: I see
05:42 mupuf: well, I guess it would be good to check why demmio did not decode the script
05:42 karolherbst: there are some more which don't get decoded
05:42 karolherbst: I was surprised after I saw one decoded
05:42 mupuf: maybe because they are sent to a process we do not know
05:44 karolherbst: there are also sometimes 0 writes into PDAEMON after CLK0 writes, but that may be related to the fact, that the voltage doesn't need to be changed
05:46 karolherbst: wow yeah
05:46 karolherbst: if there is no COEF write there is only empty PDAEMON data
05:46 karolherbst: if COEF gets written, there is something written into PDAEMON.data
05:49 karolherbst: there is a lot of PDAEMON stuff going on in general anyway
05:50 karolherbst: mupuf: should all PDAEMON.DATA writes be decoded into some kind of assembly?
05:58 mupuf: nope, it may not always
05:58 mupuf: most of the time, it is RPC-like information
05:58 mupuf: so we should see function calls, not assembly :p
05:59 karolherbst: I see
06:00 karolherbst: I think the next good question would be: how can I RE such PDAEMON writes and check if these are really scripts we just don't recognize as such
06:37 karolherbst: mupuf: seems like it really is no SEQ script
06:37 mupuf: it would be decoded if it was
06:37 karolherbst: mhh right
06:38 karolherbst: I guess there is no pretty way to RE what this thing does?
06:39 karolherbst: mhh wait
06:39 karolherbst: ...
06:41 karolherbst: mupuf: does the voltage stuff work for the other nve6 inside git?
06:41 karolherbst: seems to be one of yours
06:41 mupuf: yes, it works on my nve6
06:42 mupuf: and all my kepler cards
06:42 karolherbst: okay nice
06:42 karolherbst: because this card has the same PDAEMON stuff
06:42 mupuf: and the only pretty way to reverse this ... is to look at the assembly code of pdaemon
06:42 mupuf: good
06:42 mupuf: the pdaemon stuff is likely for power gating then
06:42 karolherbst: what does the voltage thing look like in demmio?
06:43 mupuf: well, look for GPIO
06:43 karolherbst: okay
06:43 mupuf: it is super obvious
06:43 mupuf: I already compared the two traces
06:43 mupuf: but suit yourself
06:48 karolherbst: tag 0x73 ?
06:48 karolherbst: the VID gpio for core voltage
07:03 mupuf: one of the VIDs
07:03 mupuf: there are more than one
07:20 karolherbst: I think I need to create a mmiotrace like the one I created here locally :/ does X over ssh work with reator?
07:31 imirkin: karolherbst: nv50 isa is at least partly documented on envytools.rtfd.org -- but it's all mostly self-explanatory. let me know if you have questions.
07:34 imirkin: karolherbst: in general the compiler is pretty good at doing arithmetic optimizations, but its knowledge is definitely not infinite -- there will often be small things that you could do better, but probably don't have that much effect
07:34 karolherbst: okay
07:34 karolherbst: It would be just nice to know where the real bottleneck is for a given application
07:34 imirkin: karolherbst: an interesting bit of nvc0 (and i think nv50) isa is that there are post-multipliers on the FMUL instruction, i.e. you can do a * b * N where N \in (0.125, 0.25, 0.5, 2, 4, 8)
07:35 imirkin: the compiler tries to do a good job with those, but it's not infinitely smart about it
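The FMUL post-multiplier imirkin describes can be pictured with a small sketch. The set of factors is taken from the message above; the `fmul` helper and its `post` parameter are hypothetical names for illustration and say nothing about how the hardware encodes the modifier.

```python
# Post-multiplier factors listed in the chat for FMUL.
POST_MULTIPLIERS = (0.125, 0.25, 0.5, 2.0, 4.0, 8.0)

def fmul(a, b, post=None):
    """Model a * b * N, where N is an optional power-of-two post-multiplier."""
    if post is not None and post not in POST_MULTIPLIERS:
        raise ValueError(f"unsupported post-multiplier: {post}")
    result = a * b
    return result if post is None else result * post

print(fmul(1.5, 2.0))       # 3.0
print(fmul(1.5, 2.0, 4.0))  # 12.0
```

Because every factor is a power of two, folding a separate "multiply by 2" into the preceding FMUL this way does not change the result for values that stay in the normal range.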
07:35 karolherbst: I see
07:36 imirkin: ftz = flush denorms to 0
07:36 karolherbst: so reg set to 0 before actual write?
07:36 imirkin: before read actually
07:36 karolherbst: I see
07:37 karolherbst: but only for the dest reg
07:37 imirkin: (i think!)
07:37 karolherbst: mhhh
07:37 imirkin: no, for the source regs
07:37 karolherbst: ohh
07:37 imirkin: on alu input basically
07:37 imirkin: this is important if e.g. you're comparing something to 0
07:37 karolherbst: yeah okay
07:37 imirkin: a denorm would be > or < 0 by a trivial amount
07:38 mwk: shouldn't that be on output too?
07:38 imirkin: but if you use ftz, the comparison turns out equal
07:38 karolherbst: mhh
07:38 karolherbst: but why using ftz on mul and mad?
07:38 imirkin: mwk: i'm really not sure tbh where the ftz'ing happens
07:38 mwk: I'd think both...
07:38 imirkin: could be, yeah
07:39 imirkin: karolherbst: results are different... denorm * big number = big number. 0 * big number = 0
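imirkin's point about denorm * big number versus 0 * big number can be reproduced in software. A minimal sketch, assuming ftz is applied to the inputs (the chat itself is unsure whether it also applies to the output); `to_f32` and `ftz` are illustrative helpers, and Python doubles stand in for the ALU:

```python
import math
import struct

F32_MIN_NORMAL = 2.0 ** -126  # smallest normal float32 magnitude

def to_f32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def ftz(x: float) -> float:
    """Flush float32 denormals to (signed) zero, leave everything else alone."""
    x = to_f32(x)
    if x != 0.0 and abs(x) < F32_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x

denorm = to_f32(1e-40)  # below 2**-126, so a float32 denormal
big = to_f32(1e30)

print(denorm * big > 0.0)        # True: without ftz the product is a small positive number
print(ftz(denorm) * big == 0.0)  # True: with ftz on the input the product is zero
print(ftz(denorm) == 0.0)        # True: a comparison against 0 now turns out equal
```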
07:39 karolherbst: totally forgot about this
07:39 mwk: I'm brewing up a sw implementation of the Tesla/Fermi float and sfu instructions btw, should be added to hwtest soonish
07:40 mwk: would be nice to verify the ISA stuff
07:41 mupuf: ah ah, nice :)
07:42 mwk: I have most of the code already for sfu, I just need to make a test runner :)
07:42 karolherbst: mupuf: for my card there is something unknown on PTHERM: 0x020344
07:43 karolherbst: and never touched for your kepler
07:43 karolherbst: 0x020340 is another one
07:45 mwk:had a lot of fun figuring out how they get reasonable accuracy using two 26-bit LUTs until he found out it's actually three LUTs, 26-bit, 16-bit and 10-bit
07:46 karolherbst: :D
07:49 mupuf: mwk: ... wow
07:49 mupuf: make that a blog post!
07:49 mupuf: karolherbst: interesting finding!
07:49 karolherbst: there is also a third unknown one, but this also happens for yours
07:49 imirkin: mwk: wow awesome! can you figure out wtf fchk does?
07:49 karolherbst: 0x020144
07:50 mupuf: this one I may already know
07:50 karolherbst: 0x020340 is always 0x60
07:51 karolherbst: but it also gets written
07:51 karolherbst: 0x020344 looks more interesting
07:51 karolherbst: values are 0x26 0x28 0x2f
07:52 karolherbst: writes the same but like 0x80000026
07:52 karolherbst: the 8 is always there
07:52 karolherbst: reg doesn't seem to change without the writes
07:52 karolherbst: 0x30 and 0x2d are also possible
07:53 karolherbst: card starts with 0x30
07:53 mwk: imirkin: fchk?
07:53 mwk: that's a Fermi thing, isn't it?
07:53 imirkin: mwk: yes
07:53 mwk: yeah, I suppose
07:53 imirkin: mwk: i've never seen an nvidia shader use it
07:53 imirkin: so it probably isn't that useful
07:53 imirkin: but i've been curious ;)
07:54 imirkin: mwk: there's also a FCHK.DIVIDE variant of it
07:54 karolherbst: maybe some d3d only stuff?
07:54 mwk: imirkin:
07:54 mwk: "Table 5. Kepler Instruction Set"
07:54 mwk: "FCHK FP32 Division Test"
07:54 imirkin: where is that?
07:54 mwk: according to nv ISA documentation
07:54 mwk: http://www.ece.lsu.edu/koppel/gp/refs/cuobjdump.pdf
07:55 mwk: unfortunately I still have no idea what this is :p
07:55 imirkin: well, i knew it had something to do with division based on the FCHK.DIVIDE thing ;)
07:55 tobijk: testing if the fp is accurate? :D
07:56 imirkin: and CHK definitely gave me a clue it was testing something
07:56 mwk: hmm
07:57 mwk: maybe the final fixup iteration to get a/b out of a * rcp(b)?
07:57 imirkin: that table doesn't have TMML
07:57 mwk: no idea really
07:57 mwk: yeah, that table only lists stuff that could be possibly used by CUDA
07:57 imirkin: ah ok
07:57 mupuf: karolherbst: looks like you found a very possible candidate!
07:57 karolherbst: ...
07:57 mupuf: I do not remember seeing this reg on my maxwell
07:58 karolherbst: there is also 0x2e
07:58 karolherbst: wait a second
07:58 mupuf: 20340 would be the DIV
07:58 mupuf: and 20344 would be the DUTY
07:58 mupuf: bit 31 is commit
07:58 mupuf: a small value like 60 is expected
07:59 mupuf: because the PWM frequency should be high
07:59 karolherbst: where is the stupid voltage table :/ :D
07:59 mupuf: somewhere in the bios, I guess :)
08:00 mupuf: and on top of that, we need to check what is the clock tree for this PWM controller
08:00 mupuf: I would first check if setting a low voltage yields to a crash
08:00 karolherbst: don't know if I saw the values inside nvbios or because of code hacks
08:00 mupuf: yeah, maybe the PWM value is written as the voltage tag
08:00 mupuf: have fun checking this out
08:01 mupuf: as for me, I am still at work!
08:01 karolherbst: mhhh
08:01 karolherbst: voltage tags are not bigger than 47
08:01 karolherbst: and bigger than 10
08:01 karolherbst: wait
08:01 karolherbst: I dumped the cstates
08:02 karolherbst: tag is 10 for 07 pstate
08:02 karolherbst: 10-47 for 0a and 0f cstates
08:03 karolherbst: but I won't rely on that
08:03 karolherbst: nouveau only clocks down to 405MHz anyway
08:03 karolherbst: and the blob even does 135MHz
08:07 karolherbst: mupuf: what's the most noticeable reaction if that's the right thingy?
08:08 mupuf: black screen :D
08:08 karolherbst: okay... now the gpu crashed while going to 0a
08:08 karolherbst: nouveau E[ PGRAPH][0000:01:00.0] TRAP ch 2 [0x00bf88f000 glxspheres[4258]]
08:08 karolherbst: fan got loud too
08:09 mupuf: and the screen is black?
08:09 karolherbst: how should I know? my card doesn't have one
08:09 karolherbst: or you mean like intel too?
08:09 mupuf: ah, sorry, laptop
08:09 mupuf: well, that's as good as it can be
08:09 mupuf: but do not reclock
08:09 mupuf: just drop the voltage
08:10 mupuf: until it crashes
08:10 mupuf: then upclock
08:10 mupuf: and drop the voltage until it crashes
08:10 mupuf: then downclock
08:10 mupuf: and do the same thing
08:10 mupuf: then you can check that the values where the gpu crashes are values close to what the blob sets
08:10 karolherbst: okay
08:10 karolherbst: so highest clock first?
08:10 mupuf: with close not having to be really close
08:11 mupuf: it does not matter
08:11 karolherbst: well 07 doesn't have any cstates
08:11 karolherbst: so I have to use 0a anyway
08:11 karolherbst: how fortunate that I am able to clock :O
08:11 mupuf: hehe
08:11 karolherbst: stupid hacks :D
08:14 karolherbst: 0x80000027 and gone
08:14 karolherbst: on highest cstate
08:14 karolherbst: 0x80000028 to 0x8000002f worked without issues
08:15 karolherbst: I am sure 0x80000026 worked on 07 pstate
08:15 karolherbst: fans are loud as crazy after setting the reg
08:17 karolherbst: mupuf: I hope you don't mind if I don't test all the 36 cstates and just test like 4 or 5
08:17 karolherbst: 37 actually
08:17 mupuf: test 3
08:17 mupuf: there is no freaking point in testing everything
08:17 mupuf: test the extremes
08:17 karolherbst: testing 20: {gpc: 1307000, mem: 810000, voltage: 31} now
08:17 karolherbst: pretty much in the middle
08:17 karolherbst: 36: {gpc: 1725000, mem: 810000, voltage: 47} is highest
08:18 karolherbst: 0x80000027 works now
08:20 karolherbst: mupuf: !
08:20 karolherbst: cstate 20 gone at 0x80000017
08:20 karolherbst: like 16 under the other one :D
08:21 karolherbst: coincidence?
08:21 karolherbst: :)
08:21 karolherbst: imirkin_: I think I found it :)
08:23 karolherbst: 0 cstate works with 0x80000017
08:23 karolherbst: doesn't like 0x80000011 though
08:24 mupuf: well, sounds good
08:24 mupuf: congrats! You found it for your kepler
08:24 karolherbst: ...
08:24 mupuf: hopefully, it is there too for my maxwell
08:24 karolherbst: yeah maybe
08:24 karolherbst: I check
08:24 karolherbst: nv117?
08:25 karolherbst: mupuf: guess what
08:25 karolherbst: mupuf: https://gist.github.com/karolherbst/11a3e7b08fd3da2aafd7
08:26 mupuf: how the heck did I miss it? ...
08:26 karolherbst: solution too dull
08:26 mupuf: well, you have earned a virtual cookie!
08:27 karolherbst: I saw this damn reg weeks ago
08:27 karolherbst: but I thought: PTHERM yeah temp sensor or stupid stuff
08:27 karolherbst: ....
08:27 mupuf: yeah, PTHERM is misnamed
08:27 mupuf: it is a beast
08:27 karolherbst: dull solution: find regs in my trace but not in yours
08:27 mupuf: check 20340 plz
08:28 karolherbst: 0x60
08:28 mupuf: great
08:28 karolherbst: wait....
08:29 karolherbst: ohh no, don't know what the 0x60 means
08:29 karolherbst: so
08:29 karolherbst: what next?
08:29 karolherbst: mhh envytools PR :)
08:30 karolherbst: I think my EC is messed up
08:30 karolherbst: CPU at 50°C and still full speed fan
08:31 mupuf: yeah, you need to reboot
08:31 karolherbst: suspend?
08:31 karolherbst: well
08:31 karolherbst: that seems to have worked
08:32 karolherbst: mupuf: name PTHERM.PWM_VID ?
08:32 karolherbst: nve6+?
08:33 mupuf: nve4+
08:33 karolherbst: k
08:34 mupuf: when DIV=0x60, and if the input clock is 27MHz, we get a PWM frequency of 281250 Hz
08:35 mupuf: try to look for this number in your vbios
08:35 mupuf: the input clock is likely 27MHz, as it is the frequency of the Quartz
08:35 mupuf: err, crystal
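mupuf's 281250 Hz figure follows directly from dividing the assumed 27 MHz crystal clock by DIV. A small sketch of that arithmetic; the duty-cycle helper additionally assumes that the on-time fraction is DUTY / DIV, which is a common PWM convention but is not confirmed anywhere in this log:

```python
CRYSTAL_HZ = 27_000_000  # assumed input clock (the 27 MHz crystal)

def pwm_frequency_hz(div: int, input_hz: int = CRYSTAL_HZ) -> float:
    """PWM frequency for a given divider: input clock / DIV."""
    if div <= 0:
        raise ValueError("DIV must be positive")
    return input_hz / div

def duty_fraction(duty: int, div: int) -> float:
    """Fraction of each period spent high, ASSUMING duty cycle = DUTY / DIV."""
    if div <= 0:
        raise ValueError("DIV must be positive")
    return duty / div

print(pwm_frequency_hz(0x60))                # 281250.0
print(round(duty_fraction(0x26, 0x60), 3))   # 0.396
```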
08:37 karolherbst: yeah
08:37 karolherbst: 27MHz sounds right
08:39 karolherbst: cstate 20 has voltage 31: in bios -- ID = 31, link: 35, voltage_min = 800000, voltage_max = 912500 [µV] --
08:40 karolherbst: cstate 20 was gone at 0x17
08:42 karolherbst: but there is no 281250 Hz in the vbios
08:51 karolherbst: strange
08:55 karolherbst: mupuf: looks good for now? https://github.com/karolherbst/envytools/commit/dde9a31023bf460da53ce4a5f80cbf6e23951628
08:55 karolherbst: after we figure out what the other reg is about and what the values are saying I will add the information in that commit
09:05 mupuf: <reg32 offset="0x344" name="PWM_VID_DUTY" variants="GK104-">
09:05 mupuf: <reg32 offset="0x340" name="PWM_VID_DIV" variants="GK104-"/>
09:06 mupuf: it is present on all maxwells and all kepler
09:06 mupuf: the register does not magically appear when an assembler requires it :)
09:11 karolherbst: how should I call the 0-7 bits?
09:13 koz_: Can this card reclock properly under nouveau? https://www.thinkpenguin.com/gnu-linux/geforce-8400gs-1gb-pci-express-20-video-card-gnulinux-full-low-profile-brackets
09:14 specing: lol, that thing is ancient
09:14 koz_: Well, yeah, the 1G memory is kinda a hint to that. :P
09:17 specing: it is actually from ~ 2007/2008
09:18 mupuf: karolherbst: DUTY
09:19 mupuf: did you use nvascan to check out how large the reg is?
09:28 imirkin_: koz_: there are a few iterations of the 8400GS
09:28 imirkin_: koz_: i can't guarantee it, but it *looks* like that one's probably the GT218, which can reclock on linux 3.19+
09:28 imirkin_: koz_: the other possibility is that it's the G98, in which case, no dice (at least for now... Roy's looking at it)
09:28 imirkin_: it's PCIe 2.0, which means it's not the G84
09:29 imirkin_: they don't provide any of the additional details which sometimes help to differentiate
09:30 imirkin_: but note that either way it's a very weak card, irrespective of any inefficiencies that nouveau might have.
09:30 koz_: imirkin_: Thanks for the clarification.
09:31 imirkin_: koz_: take a look at https://en.wikipedia.org/wiki/GeForce_8_series#Technical_summary
09:32 imirkin_: i guess ddr2 vs ddr3 would be the differentiator
09:32 imirkin_: but it doesn't say on there
09:32 imirkin_: (also i think there's often variation in those things)
09:32 imirkin_: 1G ram suggests it's the GT218
09:32 imirkin_: but again, these things aren't 100% accurate.
09:33 koz_: I think I need to ask the guy behind ThinkPenguin to clarify this.
09:33 imirkin_: (aka don't get mad at me when you buy it and it turns out to not be the GT218)
09:33 koz_: imirkin_: It's OK - I'll ask for clarification before putting down any cash.
09:35 imirkin_: koz_: btw, why are you looking at buying an older gpu?
09:36 koz_: imirkin_: I was more checking it out for curiosity's sake - I stumbled across it by accident.
09:36 koz_: Also partly the fact that my current one still doesn't properly reclock...
09:37 imirkin_: koz_: ah ok. what do you have?
09:37 koz_: How do I check the exact model?
09:37 imirkin_: lspci -nn -d 10de:
09:37 koz_: I'll just SSH into that computer and tell you.
09:38 koz_: GK107
09:38 imirkin_: ah yeah
09:38 imirkin_: that's a lot newer. and a lot faster at the lowest perf level than the GT218 will ever be.
09:38 koz_: OK, thanks for clearing that up.
09:38 koz_:hopes there'll be progress on my card's reclocking.
09:38 imirkin_: (i sort of assume... i guess i don't know for sure)
09:40 karolherbst: mupuf: https://github.com/karolherbst/envytools/commit/f2d9344f7aa30bd1beb3eb966ad12ed6415dce0c
09:40 karolherbst: mupuf: no
09:40 karolherbst: mupuf: 020344: 0000002d 00000fff 00000000
09:40 imirkin_: so high=12
09:41 karolherbst: not 11?
09:42 mupuf: yep, 11
09:43 karolherbst: https://github.com/karolherbst/envytools/commit/1004c7bb5b533de289faeee7456a003d0388f5fa
09:46 imirkin_: er, duh. 12 bits :p
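The "high=12 / not 11?" exchange is the usual off-by-one between a field's width and its highest bit index. A sketch using the 0x00000fff mask from the nvascan line and the commit bit mupuf identified earlier; treating the mask as the DUTY field is the reading taken in this log, and the helper names are hypothetical:

```python
DUTY_MASK = 0x00000FFF  # writable bits reported for 0x020344
COMMIT_BIT = 1 << 31    # "bit 31 is commit"

def field_bounds(mask: int):
    """Return (low_bit, high_bit, width) of a contiguous bit mask."""
    low = (mask & -mask).bit_length() - 1
    high = mask.bit_length() - 1
    return low, high, high - low + 1

def decode(value: int) -> dict:
    """Split a written value into its commit flag and duty field."""
    return {"commit": bool(value & COMMIT_BIT), "duty": value & DUTY_MASK}

def encode(duty: int) -> int:
    """Build a write value with the commit bit set."""
    if not 0 <= duty <= DUTY_MASK:
        raise ValueError("duty does not fit in the 12-bit field")
    return COMMIT_BIT | duty

print(field_bounds(DUTY_MASK))  # (0, 11, 12): 12 bits wide, highest bit is 11
print(decode(0x80000026))       # {'commit': True, 'duty': 38}
print(hex(encode(0x26)))        # 0x80000026
```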
09:48 karolherbst: mupuf: there has to be some indicator which tells the driver how to set the voltage I assume
09:50 karolherbst: oh well
09:50 karolherbst: found a bunch more unknown PTHERM regs
09:51 karolherbst: 0x0203c0 to 0x0203dc
09:51 mupuf: your gist looks good :)
09:51 karolherbst: that's a commit, but okay, nice :)
09:51 mupuf: as for the indicator, it is likely in the vbios
09:51 karolherbst: https://github.com/envytools/envytools/pull/16
09:51 mupuf: oops, sorry :p
09:52 karolherbst: ohhhh
09:52 karolherbst: mupuf: it's easy
09:52 mupuf: what is easy?
09:52 karolherbst: GPIO 0: line 0 tag 0x81 [VID_PWM] OUT DEF 0 param 1 gpio: normal SPEC_OUT 0x5d [???]
09:52 karolherbst: this tells us
09:53 mupuf: yes, it tells us to use PWM
09:53 mupuf: but it does not tell us how to compute DIV nor DUTY?
09:53 karolherbst: mhhh
09:53 karolherbst: wait
09:53 karolherbst: 0x17 should be 0.8V
09:54 karolherbst: at least for me
09:55 karolherbst: wait...
09:55 karolherbst: ahh no, okay fine
09:56 karolherbst: the bios uses 0.006250V steps
09:56 karolherbst: but this is for the voltage ranges
09:56 mupuf: yes :)
09:57 karolherbst: 0x27 was too low for highest cstate
09:57 karolherbst: which needs 0.9V min
09:58 karolherbst: 1/16 = 0.0625
09:58 karolherbst: but I am wrong. 0x18 is 0.8V and 0x28 would be 0.9V
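karolherbst's two data points (0x18 for 0.8 V, 0x28 for 0.9 V) imply 6250 µV per duty step, matching the 0.00625 V step he reads from the bios. A sketch of that linear guess, valid at best for this one Kepler card; the function name and the assumption of linearity outside those two points are illustrative, not something the log establishes:

```python
# Two points inferred in the chat for this specific card (values in microvolts).
DUTY_A, UV_A = 0x18, 800_000
DUTY_B, UV_B = 0x28, 900_000
STEP_UV = (UV_B - UV_A) // (DUTY_B - DUTY_A)  # 6250 uV per duty step

def duty_to_uv(duty: int) -> int:
    """Guess the core voltage for a duty value, ASSUMING a linear mapping."""
    return UV_A + (duty - DUTY_A) * STEP_UV

print(STEP_UV)           # 6250
print(duty_to_uv(0x27))  # 893750, just under the 0.9 V the highest cstate needs
print(duty_to_uv(0x17))  # 793750, just under the 0.8 V minimum of cstate 20
```

Under this guess, the two crash points reported earlier (0x27 on the highest cstate, 0x17 on cstate 20) both sit one step below the respective minimum voltage.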
09:59 karolherbst: checking the maxwell card now
10:00 karolherbst: maxwell card used 0x41 and 0x37 for its values
10:01 mupuf: The values are stored in the vbios somehow
10:01 mupuf: either as an affine function
10:01 mupuf: or anything else
10:01 karolherbst: 0x41 = 1.09375V and 0x37 = 1.08V
10:01 karolherbst: mhhh okay
10:02 karolherbst: checking
10:02 karolherbst: I am sure its not the id though
10:04 karolherbst: mhhh
10:08 mupuf: can't find anything
10:08 mupuf: not sure if the PWM output is to produce a DC voltage that is used as an input by the VR
10:08 mupuf: or if the VR itself gets the PWM as an input
10:10 karolherbst: mhh
10:10 karolherbst: that was weird
10:10 mupuf: karolherbst: http://www.analog.com/media/en/technical-documentation/data-sheets/ADuM5000.pdf look at the description of pin4
10:11 karolherbst: I got the pwm down to 0xc
10:11 karolherbst: even fps dropped from 430 to 330 in glxspheres
10:11 mupuf: what?
10:12 mupuf: I doubt the gpu would actively monitor timings and react
10:12 karolherbst: it doesn't
10:12 mupuf: but .... you never know how crazy they could have gone
10:12 mupuf: it doesn't?
10:12 karolherbst: it was stable at 430 from 0x2d to around 0x10
10:12 imirkin_: mupuf: could be though... some sort of tdp situation
10:12 karolherbst: but then there were also errors in dmesg
10:12 mupuf: imirkin_: how could it do that?
10:12 imirkin_: but in the opposite direction :)
10:12 mupuf: karolherbst: are you running the blob?
10:13 imirkin_: karolherbst: well normally you just need higher voltage to make the damn thing work at all, it's not to increase/decrease speed of something
10:13 imirkin_: and increasing voltage doesn't, by itself, improve anything once the thing works
10:13 karolherbst: no
10:13 karolherbst: nouveau
10:13 imirkin_: of course increased voltage often allows you to also increase clocks
10:13 mupuf: exactly, when the voltage is not high-enough to keep the timings, all hell breaks loose
10:13 mupuf: and BOOM ... nothing happens anymore
10:14 karolherbst: I'm trying to get my dmesg somehow
10:14 karolherbst: but journald ....
10:14 mupuf: really have to go, see you!
10:16 karolherbst: nouveau E[ PFIFO][0000:01:00.0] LB_ERROR
10:16 karolherbst: nouveau E[ PFIFO][0000:01:00.0] FB_FLUSH_TIMEOUT
10:17 karolherbst: Aug 13 19:06:19 pingu kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
10:17 karolherbst: https://gist.github.com/karolherbst/9eec4945ead6dcfea0cb
10:17 karolherbst: that's what I got
10:17 rektide: is NV110 at all usable? on the FeatureMatrix it's listed as WIP
10:17 karolherbst: working on it
10:17 karolherbst: like currently
10:17 rektide: :)
10:18 karolherbst: currently figured out how to set voltage a bit
10:18 karolherbst: needed for my kepler card, too
10:19 imirkin_: rektide: kinda sorta
10:19 imirkin_: rektide: it's not great, but it sorta works
10:19 karolherbst: imirkin_: any idea about those errors? or is it just fallout because voltage toooo low
10:19 imirkin_: the thing that karolherbst is working on is largely unrelated to anything you'd immediately care about
10:19 imirkin_: karolherbst: i'd guess the FB just fell off the bus, leading to confusion
10:30 karolherbst: does pstates work on maxwell?
10:30 karolherbst: or not
10:30 karolherbst: I mean pstates itself
10:31 imirkin_: karolherbst: no reclocking has been done on maxwell. i'm not sure that ben even really figured out where all the pstate info is to begin with
10:31 imirkin_: much less the reclocking of it
10:32 imirkin_: also nvbios was def not updated for it
10:38 imirkin_: glennk: assuming that blend/alphatest isn't enabled, there's no difference between R32F, R32UI, R32I rt format right?
10:39 imirkin_: [i'm trying to think of some big hammers to use to sort out some fermi-only issues with blitting Z32F_X24S8... same exact logic works on both nv50 and kepler]
10:52 glennk: imirkin_, guess it depends on if the blitter preserves NaN values etc or not
10:53 glennk: or if the tiling block layouts are different
10:55 imirkin_: glennk: hrmph.... right. interesting point about tiling.
10:56 imirkin_: i wonder if that's why a bunch of the MS depth/stencil piglits fail
10:56 imirkin_: glennk: the NaN thing is why i want to switch it from RG32F to RG32UI
10:56 imirkin_: it's just a texture format, so the shader won't be the wiser
10:57 glennk: radeons have a bit to set in the blender if nans are preserved or not
11:01 karolherbst: imirkin_: mhh I still need a hack like this so that reclocking works for me: https://github.com/karolherbst/nouveau/commit/5554a27415b61a59f1667074cd2162c9f2470cdf
11:01 karolherbst: I think there is some kind of arithmetic issue inside the code
11:01 karolherbst: but never bothered enough to actually check it
11:02 karolherbst: there are others with the same issue as well
11:02 imirkin_: start printing like it's going out of style ;)
11:03 imirkin_: i.e. what is the requested voltage, and what are the options
11:03 karolherbst: yeah I know, did it once, but never went deep enough
11:03 karolherbst: I think this is a mismatch between a calculation and a table lookup
11:44 karolherbst: imirkin_: what are these entries in debug=debug? [ 5691.218740] nouveau D[ VOLT][0000:01:00.0] VID 00: 600000uv
11:44 karolherbst: [ 5691.218741] nouveau D[ VOLT][0000:01:00.0] VID 01: 610176uv
11:44 karolherbst: I mean, where do they come from
11:44 imirkin_: uhm... rtfs? :)
11:44 imirkin_: [vbios somewhere]
11:44 karolherbst: wait ...
11:44 karolherbst: I think
11:45 karolherbst: are these calculated?
11:45 karolherbst: vbios doesn't have these
11:46 imirkin_: you mean nvbios doesn't print them? that's entirely possible.
11:46 karolherbst: I have different values
11:46 karolherbst: "Voltage map table"?
11:46 imirkin_: just read the source for where nouveau reads those from.
11:46 imirkin_: don't worry about nvbios
11:47 karolherbst: mhh you have to know, that nouveau can't read my current voltage anyway
11:47 karolherbst: but the nvbios stuff seems right
11:48 imirkin_: figure out where the prints are coming from
11:48 imirkin_: and figure out what they're printing
11:49 imirkin_: and you'll have your answer
11:49 karolherbst: mupuf: https://gist.github.com/karolherbst/1c96ed575c1d1105e7ee
11:49 phillipsjk256: voltage in µV seems very... precise.
11:50 karolherbst: ohh wait
11:50 karolherbst: found it
11:50 karolherbst: completely forgot about my patch
11:51 karolherbst: imirkin_: "Voltage table"
11:51 karolherbst: imirkin_: this patch is needed to get them all in envytools: https://github.com/karolherbst/envytools/commit/f684eadaa33eb8d4677f5900b80db0e1e7788aec
11:55 karolherbst: imirkin_: okay the issue is, it gets the min voltage value out of the Voltage map table with the VID from the cstate and tries to find the exact voltage from the Voltage table
11:55 karolherbst: which are completely off
12:51 karolherbst: imirkin_: what about unknown regs inside PGRAPH?
12:52 imirkin_: karolherbst: what about them?
12:52 imirkin_: there are like 100 million of them
12:52 karolherbst: I created a list of unknown regs inside my trace and just wanted to know if it's worth digging into that
12:53 imirkin_: up to you
12:57 karolherbst: k
13:01 karolherbst: but there should be some memory voltage thing as well for me?
13:01 imirkin_: yea
13:01 imirkin_: probably in PNVIO or PTHERM or some random place
13:02 karolherbst: this is the only GPIO thingy I have in my reclocking trace: [0] 18353.851804 MMIO32 R 0x00d638 0x00007000 PGPIO.GPIO[0xa] => { SPECIAL_IDX = NVIO_SLI_SENSE_0 | MODE = NORMAL | OUT | OE | IN }
13:02 karolherbst: some more of them but only the same reg
13:02 imirkin_: yeah, could be that PDAEMON takes care of it
13:02 imirkin_: which would be extremely annoying
13:04 karolherbst: PGPIO.GPIO_OUT_TRIGGER too but that's not that important
13:04 karolherbst: mhh
13:04 karolherbst: wait
13:04 karolherbst: 0x02044c reg is strange for me
13:04 karolherbst: PTHERM.I2C_SLAVE+0x4c never written but has values 0x6c 0x6d 0x6e 0x6f 0x70
13:04 karolherbst: but unrelated I guess
13:04 karolherbst: in PTHERM there is nothing interesting else
13:05 karolherbst: PNVIO: https://gist.github.com/karolherbst/e00b58e9c364f0405ef5
13:05 karolherbst: mhh
13:06 imirkin_: maybe in the SEQ script?
13:07 karolherbst: the script is pretty big
13:07 karolherbst: but there should be this PTHERM reg maybe
13:07 karolherbst: yeah
13:07 karolherbst: 000040: R[0x00d638] := 0x00006000
13:08 karolherbst: https://gist.github.com/karolherbst/65f1c5bc6719b711db9f
13:09 imirkin_: why are you focused on that one reg?
13:09 karolherbst: because there isn't much left in PTHERM
13:10 karolherbst: ohh wrong reg anyway
13:10 karolherbst: ...
13:10 karolherbst: will go to bed I guess, will have a look tomorrow then
19:04 marcosps1: imirkin: Hi there :) This problem you reported is kind of tricky, but I'm reading a lot of code here :)
19:04 imirkin: marcosps1: cool
19:05 imirkin: marcosps1: basically figure out what's getting set incorrectly by the parser and then... don't do that ;)
19:06 marcosps1: imirkin: hehe, so, about tess_ctrl/tess_eval, in both cases we can have those empty brackets?
19:06 marcosps1: imirkin: is there some documentation I could read to understand all these shader things?
19:06 imirkin: marcosps1: yeah. so basically the idea is that both the inputs/outputs of tess ctrl are 2-dimensional (except patch vars), and all the inputs of tess eval are 2-dimensional (again, except patch vars)
19:07 imirkin: the 2nd dimension is unsized
19:07 imirkin: which is expressed with the empty brackets
19:07 imirkin: however that unsized-ness is also implied
19:07 imirkin: i.e. the printer just writes [] and moves on
19:10 marcosps1: imirkin: Ok... I tried to follow the code up to where the output gets generated, but this is really craziness... the code is big, and for a newcomer, it is confusing :)
19:10 imirkin: as expected
19:11 marcosps1: imirkin: But, I could reach the tgsi_parse_tokens, and I could look at the property/declarations/imm token reading...
19:11 marcosps1: And, why use OO here..? :)
19:11 imirkin: marcosps1: well, look at why it ends up printing [][0][0]
19:12 imirkin: i dunno, it was like that when i got there :p
19:12 marcosps1: imirkin: yes, I'm trying to reach the code that prints that problem you reported hehehe
19:42 marcosps1: imirkin: still around?
19:42 imirkin: ya
19:43 marcosps1: It's weird, but the nouveau_compiling is not getting compiled anymore... I run make, but the binary inside driver/nouveau isn't updated...
19:43 imirkin: you mean nouveau_compiler?
19:43 imirkin: should be...
19:44 marcosps1: Opps, sorry :) Damn typo!
19:44 imirkin: i've never had such an issue
19:44 imirkin: make an edit and then run make and then pastebin the log of that
19:50 marcosps1: imirkin: I did a make clean here... let's wait for this to work :)
19:50 marcosps1: Damn, it doesn't compile.. I'll edit the output for you..
19:51 marcosps1: imirkin: https://paste.fedoraproject.org/254981/20686143/
19:51 marcosps1: I'm executing make on mesa top dir...
19:52 imirkin: that's probably why it wasn't rebuilding
19:52 imirkin: because it had a compile error
19:52 imirkin: definitely a bit surprising that it's building *every dir* except nouveau
19:53 marcosps1: sh*t
19:53 marcosps1: I need to execute make inside driver/nouveau...
19:53 marcosps1: sorry for the noise...
19:55 marcosps1: But should it compile from top level dir...?
19:55 imirkin: no
19:55 imirkin: it should get called from top-level
19:56 imirkin: which leads me to believe you did something odd with your configure line
19:56 imirkin: what does 'head config.log' show?
20:01 marcosps1: imirkin: https://paste.fedoraproject.org/254984/14395212/
20:01 imirkin: marcosps1: ah, that doesn't build nouveau :)
20:01 marcosps1: So, could I send patch about it...?
20:01 imirkin: marcosps1: you want something like --with-gallium-drivers=nouveau,swrast --with-dri-drivers=i965 --enable-gallium-llvm --enable-texture-float --enable-debug
20:04 marcosps1: imirkin: Hum... thanks for this tip..
20:04 marcosps1: imirkin: As the compiler code is complicated, I'll fill it with debug statements :)
20:40 marcosps1: imirkin: still hacking??
20:40 marcosps1: imirkin: I believe I found the problem...
23:14 karolherbst: mupuf: found another thingy related to clocking: 0x02044c
23:15 karolherbst: temp sensor :O
23:15 karolherbst: 51°C : 0x73, 50°C 0x72
23:15 karolherbst: and so on
23:15 karolherbst: 55: 0x77
23:20 mupuf: 20400 is already the temperature
23:20 mupuf: that's funny that they would have another temperature probe that could go under 0°C
23:21 mupuf: -64°C and onward
23:21 mupuf: well, document that
23:21 mupuf: I don't recall seeing this
23:21 mupuf: check on a few more cards
23:23 karolherbst: 0x020450 reads always 0x40
23:23 karolherbst: checking 20400
23:24 karolherbst: mhh
23:24 karolherbst: they are kind of different
23:24 karolherbst: but usually a constant diff
23:25 karolherbst: nvidia settings says "GPU Internal" for me
23:27 karolherbst: mupuf: you know something funny? nvidia doesn't use 0x020400 for me
23:27 mupuf: sure, the offset will be 64°C
23:27 mupuf: 20400 counts from 0
23:28 karolherbst: maybe 0x02044c is a newer sensor? or maybe there are just more now
23:29 mupuf: I think it is the same sensor
23:29 karolherbst: okay
23:29 karolherbst: but just different values
23:30 karolherbst: okay, the blob doesn't seem to use 0x020400 on nve+ cards
23:30 karolherbst: ohh wait
23:30 karolherbst: ohh no, nouveau uses it
23:31 karolherbst: yeah, blob uses 2044c on nce*
23:31 karolherbst: *nve*
23:32 karolherbst: maybe there were some bugs with cards starting in extreme situations? :D
23:32 mupuf: fun
23:32 karolherbst: some embedded nvidia cards in the arctic or something
23:32 mupuf: yes, it has been a concern of mine, but I never checked what happened in this case :D
23:32 mupuf: I can easily get to 130°C, but the opposite is hard
23:32 karolherbst: would be fun to know since when the reg is there, kepler?
23:33 karolherbst: so 020450 is the offset and 2044c is the value; value - offset = °C
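The conversion worked out above can be checked against the readings quoted earlier in the log (0x72 at 50°C, 0x73 at 51°C, 0x77 at 55°C, with 0x020450 always reading 0x40). This is a minimal sketch; the names `SENSOR_OFFSET` and `raw_to_celsius` are made up here and are not register or function names from envytools or nouveau.

```python
# Offset as read back from 0x020450 in the log (0x40 == 64).
SENSOR_OFFSET = 0x40


def raw_to_celsius(raw, offset=SENSOR_OFFSET):
    """Convert a raw 0x02044c reading to degrees Celsius: value - offset."""
    return raw - offset


# Raw readings quoted in the log.
for raw in (0x72, 0x73, 0x77):
    print(hex(raw), raw_to_celsius(raw))
```

A raw reading of 0 maps to -64°C under this formula, which matches mupuf's remark about the range starting at -64°C.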
23:34 karolherbst: names?
23:35 karolherbst: wait..
23:36 karolherbst: there is no 0x020440 for me
23:36 karolherbst: ohh wrong reg
23:36 karolherbst: 0x020444 isn't there either, which is TEMP_LOW?
23:38 karolherbst: and now it's there
23:38 karolherbst: and now gone
23:41 karolherbst: nvd has them too
23:45 karolherbst: mupuf: GF117 seems to be the first card having those
23:46 karolherbst: https://github.com/karolherbst/envytools/commit/7ab7d11a61fbde8db98a8fe2746e4dd3074f8d91 needs better names
23:53 karolherbst: weird, they aren't there on maxwell either?