14:57 Tom^2: karolherbst: is envytools borked? was just about to get your temp readings from nvidia and nvaforcetemp just errors with: WARN: Can't probe 0000:01:00.0 and PCI init failure!
14:57 karolherbst: huh
14:57 karolherbst: you have to run that as root
14:58 Tom^2: yea i am
14:58 karolherbst: pretty sure you don't :p
14:59 Tom^2: karolherbst: http://i.imgur.com/9GAPw7J.png well. *shrug*
15:00 karolherbst: run "id"
15:00 Tom^2: uid=0(root)
15:00 Tom^2: or want a scrot of that too? xD
15:01 karolherbst: mhhh
15:01 karolherbst: well
15:01 karolherbst: for me it works
15:01 karolherbst: mwk: any ideas?
15:02 Tom^2: it has worked before, so either something changed or i'm missing something
15:13 Tom_2_: karolherbst: it's something nvidia has changed then, because with nouveau booted i can issue the nva tools without errors
15:14 karolherbst: huh
15:14 karolherbst: which driver version?
15:14 karolherbst: although that shouldn't matter at all
15:15 Tom_2_: karolherbst: 364.19
15:15 karolherbst: super odd
15:15 karolherbst: I have like 367.18
15:16 Tom_2_: unless envytools required a bunch of lib32-libs that got brought in because of lib32-mesa-libgl
15:16 Tom_2_: but that shouldn't be the case
15:17 karolherbst: nope
15:17 karolherbst: I am sure you messed something up :p
15:18 Tom_2_: karolherbst: https://gist.github.com/anonymous/c9ea95e191ca0fbbed53d13c61ec210f well no idea but i'll bring it back.
15:19 karolherbst: Tom_2_: anyway, the nvidia driver shouldn't have any effect on those nva tools
15:21 Tom_2: karolherbst: it sure does here
15:21 karolherbst: Tom_2: well gdb helps then :p
15:25 Tom_2: karolherbst: https://gist.github.com/anonymous/5252fa1633587327aaaf81e082e34d99 that resource0 file doesn't exist.
15:26 karolherbst: hum
15:27 karolherbst: this is like super odd then
16:13 karolherbst: Tom^: found anything?
16:13 Tom^: nope, that file is missing and it makes nva_init bork. i think
16:14 Tom^: and then i rebooted to windows and indulged my senses i pointless procrastination.
16:14 Tom^: *in
16:15 karolherbst: very odd though
20:30 gregory38: karolherbst: hello
20:30 gregory38: where did you get the register allocation info ?
20:30 karolherbst: which one?
20:30 karolherbst: you mean the ir dump?
20:31 gregory38: yesterday you told me that my new shader used 20 registers
20:31 karolherbst: ahh
20:31 karolherbst: I checked which was the highest used reg
20:31 karolherbst: but there is a way to print that info
20:31 imirkin_: gregory38: we output stats via debug info
20:31 imirkin_: gregory38: you can see it in glretrace, or you can add your own debug handler in your application
20:31 imirkin_: only in debug contexts though
20:32 karolherbst: imirkin_: maybe it makes sense to print it with NV50_PROG_DEBUG enabled though
20:32 imirkin_: karolherbst: i'm going to revamp the env vars at some point
20:32 imirkin_: the current thing is dumb for lots of reasons
20:32 karolherbst: right
20:32 gregory38: oh ok
20:33 gregory38: So far I set the env var in PCSX2 and I redirect stderr to a file
20:33 imirkin_: gregory38: https://cgit.freedesktop.org/mesa/shader-db/tree/run.c#n575
20:33 imirkin_: that will give you notifications whenever we compile
20:33 imirkin_: (in a debug gl context)
20:33 gregory38: ok, I will redirect it
20:34 imirkin_: well, it's up to you what you do with it - it just calls a function
20:34 imirkin_: that function can print the message to stderr, or it can go make coffee - doesn't really matter
20:34 gregory38: love coffee :p
20:34 imirkin_: anyways, that includes a bunch of metrics i felt were interesting to keep track of
20:35 imirkin_: instruction count, gpr usage, local memory, and byte size
20:35 imirkin_: for nvc0+, byte size == 8 * instruction count
20:35 imirkin_: but nv50 has both 4- and 8-byte encodings
20:35 imirkin_: [actually nvc0 does too, but neither we nor the blob use them]
20:36 gregory38: Nice :)
20:36 imirkin_: if local memory > 0, that means you lost
20:36 imirkin_: you really want it to be 0
20:36 gregory38: why ?
20:36 imirkin_: GFxxx / GK10x have 64 registers, GK110+ has 256 registers
20:36 imirkin_: coz it's slow
20:37 imirkin_: but we have to use it if you e.g. have an array that you indirectly access
20:37 imirkin_: (so don't do that)
20:37 gregory38: ok
20:37 imirkin_: or if you use too many registers and we have to spill
20:37 imirkin_: (so don't do that)
20:38 gregory38: 64 registers <= I guess for a group of threads? Or per thread?
20:38 imirkin_: or if you're using opencl and have an alloca() call. so don't do that :)
20:38 imirkin_: 64 registers per thread
20:38 imirkin_: but there's a finite number of registers in a SM
20:38 gregory38: by thread, I mean a shader run for a fragment (or a vertex)
20:38 imirkin_: so by using more registers per thread, you end up losing parallelism
20:39 gregory38: ok
20:39 imirkin_: for fermi, there's 32K registers/SM, for kepler+ there's 64K
20:39 imirkin_: [except the mythical GK210 which has 128K]
20:39 karolherbst: gk210?
20:40 karolherbst: ohhh
20:40 Calinou: 640K ought to be enough for anybody.
20:40 karolherbst: k80
20:40 imirkin_: Tesla K80
20:40 gregory38: are registers 32/64/128 bits?
20:40 imirkin_: 32-bit
20:40 karolherbst: Calinou: not thrtr yet :p
20:40 karolherbst: *there
20:41 imirkin_: on G80, each register is actually 16-bit, but most registers are accessed via 32-bit views on that space.
20:41 imirkin_: and there are 128 32-bit registers, or 256 16-bit ones, depending how you look at it
20:42 imirkin_: but short encodings can only address up to 64 regs, so it's better to keep under that bound
20:42 gregory38: thanks for all the info.
20:42 imirkin_: probably a lot more than you wanted to hear :)
20:43 gregory38: Well I'm quite curious so it is fine
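[Note: the register numbers imirkin_ gives above (32K registers per SM on fermi, 64K on kepler+, 128K on GK210) can be turned into a rough upper bound on how many threads fit on one SM for a given per-thread gpr count. The sketch below is only that arithmetic. It assumes "32K" means 32 * 1024, that the register file is the only limit, and it ignores allocation granularity and any other hardware caps on resident threads. The names REGS_PER_SM and max_threads_by_registers are made up for illustration and are not part of nouveau or mesa.]

```python
# Rough register-limited thread count per SM, using the figures quoted in
# the chat. Assumption: "32K" means 32 * 1024 32-bit registers. Real
# hardware also has other limits that this sketch deliberately ignores.
REGS_PER_SM = {
    "fermi": 32 * 1024,
    "kepler": 64 * 1024,
    "gk210": 128 * 1024,
}


def max_threads_by_registers(regs_per_thread, regs_per_sm):
    """Upper bound on resident threads if only the register file limits them."""
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    return regs_per_sm // regs_per_thread


if __name__ == "__main__":
    # Compare the 20-gpr shader mentioned in the chat with a 64-gpr one.
    for name, total in REGS_PER_SM.items():
        for gprs in (20, 64):
            threads = max_threads_by_registers(gprs, total)
            print(f"{name}: {gprs} gprs/thread -> at most {threads} threads/SM")
```

[Under these assumptions, going from 20 to 64 gprs per thread cuts the register-limited thread count on fermi to less than a third, which is the "losing parallelism" effect described above.]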
20:43 karolherbst: gregory38: well if you see stupid things the compiler does, you can always tell us. I tried to find some simple things we could improve, but somehow the simple things are mostly gone :/
20:44 imirkin_: yeah, one thing to remember is that it's really easy to optimize things in your head, but a compiler can have a more difficult time
20:44 karolherbst: yes....
20:44 gregory38: yeah I know
20:45 gregory38: Better tune glsl a bit
20:45 karolherbst: I think some of my opts are safe enough, but I will still hold them for some time because I seem to run into an issue every week
20:46 gouchi: hi
20:46 gouchi: the backtrace I got for mpv: pushbuf.c:238: pushbuf_krel: Assertion `bkref' failed.
20:46 gouchi: http://www.hastebin.com/tiletavibe.pas
20:47 gouchi: I will have to recompile mpv with debug symbols
20:47 imirkin_: gouchi: expected.
20:47 karolherbst: multithreading again?
20:47 imirkin_: yep
20:47 imirkin_: and my patch won't help him either
20:47 imirkin_: gouchi: use mplayer.
20:47 karolherbst: k
20:47 gouchi: imirkin: mpv which is for of mplayer and mplayer2
20:48 gouchi: fork*
20:48 imirkin_: gouchi: mplayer. mplayer. not mplayer fork. mplayer.
21:03 gregory38: https://gist.github.com/gregory38/4f7ae7a18d8d09501174dba4ae0f25ad
21:03 gregory38: is it normal to use that much gpr ?
21:04 gregory38: the glsl shader just moves stuff around (creates 2 triangles (quad) from a line)
21:05 imirkin_: it's unfortunate
21:06 imirkin_: the issue is that something decides to (a) load ALL the inputs and then (b) emit stuff
21:06 imirkin_: [something = varying packing]
21:06 imirkin_: and we don't have an instruction scheduler
21:06 imirkin_: which means that we also don't do anything to reduce register pressure/etc
21:07 gregory38: ah ok. Doesn't help
21:07 imirkin_: if you can redo things to always use vec4's, that would "solve" that problem, since it wouldn't add stupid varying packing bs
21:07 imirkin_: oh, and just fyi, $r63 == hard wired to 0
21:08 gregory38: well I need to move either the X/U or the Y/V stuff
21:08 imirkin_: or i think if you use explicit locations, that will also avoid the varying packer
21:08 gregory38: you mean for the interface?
21:08 imirkin_: ya
21:09 gregory38: I'm waiting for tim's patches to support the missing 4.4 extensions
21:09 gregory38: but good to know
21:09 imirkin_: you mean enhanced layouts?
21:09 gregory38: yes
21:09 gregory38: I'm using interface block
21:10 imirkin_: i think you can set explicit locations on items within the iface block, no?
21:10 gregory38: in enhanced layouts ;)
21:11 gregory38: anyway, it is a low priority
21:14 imirkin_: i wish the varying packer didn't suck so much on geometry shaders
21:14 karolherbst: though nouveau sucks here too
21:14 imirkin_: yeah, but it's creating unnecessary suckage for nouveau
21:15 imirkin_: and i think ken fixed it for tess
21:15 imirkin_: just have to hook that same thing up for geom
21:15 karolherbst: I am sure this entire thing can be done in like 30 instructions if we were clever about it
21:17 gregory38: well you need 18 export + 6 emit
21:18 gregory38: 30 instructions feels a bit optimistic IMHO ;)
21:24 imirkin_: it's better on nv50 where you can export as you load :)
21:25 karolherbst: yeah... do we have geometry shaders there already?
21:25 imirkin_: yep. that was my second large contribution to nouveau :)
21:25 karolherbst: ui
21:25 karolherbst: ohh
21:25 karolherbst: mhh
21:25 karolherbst: now I know why my tesla spit out 3.0 today
21:26 karolherbst: my glxinfo was just too old then