02:06 pmoreau: jayhost: nobody working on it, IIRC
02:26 zeq: karolherbst: I'm running with the Fermi reclocking enabling patches, since the pstate interface has been removed how do I know if it's working? Is it fully automatic now?
02:26 karolherbst: zeq: no, you still need nouveau.pstate=1
02:26 karolherbst: ohh wait
02:27 karolherbst: zeq: it is in /sys/kernel/debug/dri now
02:27 zeq: ok
02:27 karolherbst: 0 or 1
02:27 karolherbst: depends on what gpus you have
02:28 karolherbst: imirkin: how can I do min(abs(x), 0) = 0?
02:29 zeq: cat /sys/kernel/debug/dri/2/pstate
02:29 karolherbst: just do a OP_MOV?
02:29 zeq: 03: core 50 MHz memory 135 MHz
02:29 zeq: 07: core 202 MHz memory 324 MHz
02:29 zeq: 0f: core 672 MHz memory 1569 MHz
02:29 zeq: AC: core 202 MHz memory 324 MHz
02:29 karolherbst: zeq: then echo 0f in it
02:29 karolherbst: ohh
02:29 karolherbst: your gpu boots with 202?
02:29 zeq: apparently
02:30 zeq: hmm
02:30 zeq: there's no asterisk there though
02:30 zeq: I echoed 0f and I get:
02:31 zeq: cat /sys/kernel/debug/dri/2/pstate
02:31 zeq: 03: core 50 MHz memory 135 MHz
02:31 zeq: 07: core 202 MHz memory 324 MHz
02:31 zeq: 0f: core 672 MHz memory 1569 MHz AC DC *
02:31 zeq: AC: core 670 MHz memory 324 MHz
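The listing above has a regular shape, so checking which performance level carries the asterisk can be scripted. This is a minimal sketch, assuming the line format is exactly what zeq pasted; `parse_pstate` and its field names are hypothetical and not part of nouveau.

```python
import re

# Matches one row of the pasted listing, e.g.
# "0f: core 672 MHz memory 1569 MHz AC DC *"
ROW_RE = re.compile(
    r"^(\w+):\s+core\s+(\d+)\s+MHz\s+memory\s+(\d+)\s+MHz(.*)$"
)

def parse_pstate(text):
    """Return one dict per recognised row; unrecognised rows are skipped."""
    rows = []
    for raw in text.splitlines():
        m = ROW_RE.match(raw.strip())
        if m is None:
            continue
        flags = m.group(4).split()
        rows.append({
            "id": m.group(1),
            "core_mhz": int(m.group(2)),
            "memory_mhz": int(m.group(3)),
            "selected": "*" in flags,  # the asterisk zeq was looking for
        })
    return rows

# The second paste from the log, with the IRC prefixes removed.
SAMPLE = """\
03: core 50 MHz memory 135 MHz
07: core 202 MHz memory 324 MHz
0f: core 672 MHz memory 1569 MHz AC DC *
AC: core 670 MHz memory 324 MHz
"""

if __name__ == "__main__":
    for row in parse_pstate(SAMPLE):
        print(row)
```

On a real system the text would come from reading the pstate file under /sys/kernel/debug/dri, which normally requires root.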
02:31 karolherbst: zeq: then check how stable the gpu runs with that
02:31 zeq: glxgears is still running
02:32 zeq: seems ok
02:32 karolherbst: with vsync disabled?
02:33 karolherbst: unigine heaven or gputest furmark are good stresstests
02:34 zeq: I've got xonotic installed, I'll try that
02:34 zeq: do I need to echo the pstate file every time the gpu comes "online"?
02:34 zeq: (using PRIME here)
02:36 karolherbst: yeah
02:36 karolherbst: or
02:36 karolherbst: mhh not sure
02:36 karolherbst: I think yes, but I might be wrong
02:39 zeq: okay, it definitely seemed a fail bit faster, still slow though. Need to run an actual benchmark to know for sure, obviously. I just tried echoing 03 into pstate, it doesn't work:
02:39 zeq: hangs with "clk: unable to find matching pll values" in kernel log.
02:39 zeq: s/fail/fair/
02:39 karolherbst: ohh
02:40 karolherbst: yeah reclocking isn't well tested on fermi, so you might run into troubles
02:40 karolherbst: but at least with that we can find them now
02:40 zeq: when I say it hangs I mean the echo hangs (blocks)
02:40 karolherbst: yeah
02:40 karolherbst: it shouldn't though
02:41 zeq: 0f caused no problems
02:41 karolherbst: zeq: might want to give me the stacktrace of the hanging bash?
02:44 karolherbst: imirkin: ... all those max(abs(x), 0) => abs(x) thingies are used in muls and mads :/ so now there are a bunch of abs instructions in the shader :D
02:44 karolherbst: but I think this is nice for dual issuing?
02:47 zeq: karolherbst: I've clearly not had enough coffee yet this morning, how do I get the stacktrace?
02:47 karolherbst: zeq: you should see a dead bash in htop
02:48 karolherbst: take its pid and cat /proc/$pid/stack
02:48 zeq: oh, I've found it, was looking for bash, but should have searched for su!
02:52 karolherbst: imirkin: mul ftz f32 $r21 $r21 $r21 ...
02:52 zeq: karolherbst: [<ffffffffc06d24c0>] nvkm_pstate_calc+0x80/0x99 [nouveau]
02:52 zeq: [<ffffffffc06d2b3e>] nvkm_clk_ustate+0x3a/0x3c [nouveau]
02:53 karolherbst: zeq: gdb nouveau.ko
02:54 karolherbst: ohh right, no symbols :/
02:55 karolherbst: ohh that doesn't help anyway
02:55 karolherbst: zeq: there should be also a dead kworker in htop
02:55 karolherbst: I would need that stack of that one too
03:00 zeq: the hang caused more problems than a blocked bash - had to reboot!
03:02 karolherbst: RSpliet: do you know if nv50 ir keeps track of the values of registers being > 0 or < 0?
03:05 karolherbst: ohh shit, the slct checks against lt :/
03:05 karolherbst: meh
03:05 karolherbst: the selector might be 0
03:06 zeq: karolherbst: I've found the cause
03:07 zeq: karolherbst: when power state is D3cold writes to pstate cause "trouble"
03:08 karolherbst: ohh right
03:08 karolherbst: I had a dirty fix for that somewhere
03:16 zeq: karolherbst: maybe writes to pstate should produce an error with D3cold power state?
03:16 pmoreau: RSpliet: Current quote on bugzilla: "< RSpliet> WHOA! that might have been the first time someone used "love" and "nv50" in one sentence, without negations" ;-D
03:22 RSpliet: pmoreau: hahaha
03:23 RSpliet: I didn't know the list was actively updated
03:24 RSpliet: karolherbst: sorry, I don't know the IR quite that well
03:24 karolherbst: yeah, it was a good idea, but didn't turn out well
03:24 RSpliet: only done a few minor changes to the compiler, very ad-hoc
03:24 karolherbst: RSpliet: imirkin told me you did some post-ra stuff :D
03:25 RSpliet: yeah, this thing with mad imm compression or sth
03:26 karolherbst: yep
03:26 karolherbst: I need that for nvc0
03:26 karolherbst: mad ftz f32 $r21 $r22 $r14 $r21
03:26 karolherbst: and move a limm into this mad
03:26 RSpliet: you might just be able to enable the pass for NVC0, after you double-checked it can be emitted properly
03:27 RSpliet: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp line 3328
03:28 zeq: imirkin, glennk: I tried enabling PIPE_CAP_NPOT_TEXTURES on nv35. It enabled gnome-shell [pretty sure it was using OpenGL; so not surprising] to "work" but with plenty of corruption. weston too with MESA_GLES_VERSION_OVERRIDE=2.0 again corruption, it was definitely using GLES2.
03:28 karolherbst: RSpliet: k
03:29 karolherbst: RSpliet: yay, one instruction cut out
03:30 RSpliet: note that NV50 might have some restrictions in the ISA that NVC0 doesn't have (src3 must be equal to dst if src2 is an imm)
03:30 karolherbst: RSpliet: it doesn't switch the sources yet?
03:30 zeq: imirkin, glennk: 04:21 < glennk> NPOT in gles 2.0 doesn't require mipmaps or wrap modes other than clamp_to_edge
03:30 zeq: Apparently weston must be using mipmaps or wrap modes with NPOT?
03:30 karolherbst: RSpliet: same on nvc0
03:31 karolherbst: but only for limms
03:31 karolherbst: if you want to move the limm in, dst == src3
03:31 karolherbst: RSpliet: I guess if src1 is the limm, swap(src1, src2)?
03:33 RSpliet: I think it should already attempt to swap the inputs accordingly somewhere else
03:33 karolherbst: RSpliet: this post-ra thing: 101935 -> 101808 (-0.12%)
03:33 karolherbst: in shader-db
03:34 karolherbst: RSpliet: mhh okay
03:34 karolherbst: RSpliet: well, mov u32 $r22 0x3c8b4396; mad ftz f32 $r21 $r22 $r14 $r21; wasn't optimized
03:35 RSpliet: hmm, well, feel free to check in the pass and do the swap otherwise :-)
03:35 karolherbst: RSpliet: I could do that in the pre-ra pass though
03:35 karolherbst: swapping this should be rather cheap
03:36 RSpliet: I'd add it to the post-RA pass instead
03:36 RSpliet: because it only makes sense if your dst == src3
03:36 RSpliet: and it's looking for a mov imm anyway
03:37 RSpliet: line 2881-2884, play around with that a little to do a swap if deemed necessary
03:38 RSpliet: surely that's valid on NV50 too
03:38 karolherbst: yeah
03:40 karolherbst: RSpliet: so I do the same check with def = i->getSrc(0)->getInsn(); again if that is false and swap with setSrc and swap the mods?
03:40 karolherbst: or do I have to do something else?
03:40 RSpliet: sounds about right, but just play with it
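The rule discussed here (fold the immediate only when dst equals the third source, and swap the two multiplicands first if the immediate sits in the first one) can be modelled outside the compiler. This is a toy sketch, not the Mesa pass: `Mad`, `fold_limm` and `mov_imms` are hypothetical names, operands are plain strings, and source modifiers (which would also have to be swapped) are left out.

```python
from dataclasses import dataclass

@dataclass
class Mad:
    dst: str
    srcs: list  # [multiplicand, multiplicand, addend] as register names

def fold_limm(mad, mov_imms):
    """Try to replace a multiplicand with the immediate loaded by a mov.

    mov_imms maps a register name to the immediate a preceding
    'mov reg imm' put there. Returns True if the mad was rewritten.
    """
    a, b, c = mad.srcs
    # The encoding discussed in the chat needs dst == third source.
    if mad.dst != c:
        return False
    if b not in mov_imms:
        if a not in mov_imms:
            return False
        # Multiplication commutes, so move the immediate into position.
        a, b = b, a
    mad.srcs = [a, mov_imms[b], c]
    return True

if __name__ == "__main__":
    # mov u32 $r22 0x3c8b4396; mad ftz f32 $r21 $r22 $r14 $r21
    m = Mad("$r21", ["$r22", "$r14", "$r21"])
    print(fold_limm(m, {"$r22": 0x3C8B4396}), m)
```

No swap happens when the second multiplicand already comes from a mov of an immediate, which matches RSpliet's later remark about the first version of the patch.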
03:46 karolherbst: nice
03:46 karolherbst: cut around 45 instructions in pixmark_piano
03:46 karolherbst: now I am at 3795
03:46 karolherbst: I was at around 3900 before I started this :D
03:47 karolherbst: and with the swap another 101935 -> 101843 (-0.09%) in shader-db
03:48 karolherbst: RSpliet: does this look good? https://github.com/karolherbst/mesa/commit/d2e0e6b02577ad545681735f1f60cfeafe8a8387
03:50 karolherbst: RSpliet: mhh 135: mad ftz f32 $r18 $r21 neg 0.200000 $r18
03:50 karolherbst: do you think it makes sense to optimize that neg away?
03:51 RSpliet: doesn't make a difference
03:52 karolherbst: ohh wait my change has an error
03:53 RSpliet: it doesn't look entirely right, in the sense that I'd expect it to not swap if SRC2 has a mov imm as source
03:54 RSpliet: also, I'd look around to doublecheck that this is the right way to swap
03:55 RSpliet: I think it only requires flipping around pointers, I'm sure I've seen it written differently in other places
03:55 RSpliet: might even be a method for that
03:55 karolherbst: no, this swap increases instruction count here in shader-db :/
03:55 karolherbst: something went wrong
03:56 RSpliet: insn count or bytes?
03:58 imirkin: zeq: yeah it def won't work out of the box... if you're interested in debugging stuff yourself, i can assist you, but i don't think i'll have time to investigate myself
03:59 karolherbst: RSpliet: insn count
03:59 imirkin: karolherbst: could i convince you to do an mmt on the blob for me?
03:59 karolherbst: RSpliet: it helped 40, hurt 15 :/
03:59 karolherbst: imirkin: depends :D
03:59 imirkin: RSpliet: we always use 8-byte instructions on fermi+
03:59 karolherbst: imirkin: currently I am still optimizing
04:00 karolherbst: imirkin: does this look right for you? https://github.com/karolherbst/mesa/commit/d2e0e6b02577ad545681735f1f60cfeafe8a8387
04:01 karolherbst: imirkin: this is the post-ra limm mad stuff
04:01 imirkin: karolherbst: no
04:01 imirkin: it does not.
04:02 karolherbst: ahhh
04:02 karolherbst: I think it always swaps
04:02 karolherbst: ...
04:02 karolherbst: and there might be other issues
04:02 imirkin: the split thing is to handle the idiotic nv50 integer mad situation which does u16 muls
04:03 zeq: imirkin: if I do so, should I split the NPOT extensions into GLES1/2 and OpenGL(/GLES3?) so that the GLES version doesn't support anything it doesn't need to?
04:04 imirkin: zeq: well, ARB_texture_non_power_of_two is an actual extension
04:04 zeq: imirkin: OpenGL extension
04:04 imirkin: the fact that we check Const.Extensions.ARB_texture_non_power_of_two is a sign of laziness...
04:05 karolherbst: imirkin: mhh okay, the initial mad $r0 $r1 limm $r0 thing did a -0.13% change and if I add support for mad $r0 limm $r1 $r0, there will be another -0.20% :)
04:05 imirkin: so yeah, you could split it up
04:05 imirkin: BUT
04:05 imirkin: i don't think that nv30 will *magically* support NPOT normalized coords just like that
04:05 glennk: GL_ARB_texture_rectangle is supportable on nv3x, texture_non_power_of_two isn't
04:05 imirkin: even without mips/etc
04:06 imirkin: i think that for normalized coords to work, you might have to scale them in the shader
04:06 imirkin: i'm not sure, i'd have to double check
04:06 glennk: yeah, if memory serves r300 does that?
04:06 karolherbst: ahh no I understand the code, k
04:06 imirkin: zeq: this might be of interest to you: https://github.com/envytools/envytools/blob/master/rnndb/graph/nv30-40_3d.xml
04:07 imirkin: zeq: specifically this: https://github.com/envytools/envytools/blob/master/rnndb/graph/nv30-40_3d.xml#L526
04:08 zeq: RECT only NV40?
04:09 imirkin: well, that bit of it...
04:09 imirkin: checking how rect works on nv30...
04:10 imirkin: right.
04:10 imirkin: there are rect and non-rect formats
04:10 imirkin: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nv30/nv30_fragtex.c#n116
04:12 imirkin: so for normalized coords i wouldn't exclude the possibility that you have to scale the coords in the shader. would need to test it out and just see what happens.
04:12 imirkin: with a simple test
04:13 zeq: if r300 does that, perhaps the code could be shared. I have to confess, I know practically nothing about shaders...
04:14 glennk: the technique can be shared, the implementation, not so much
04:14 imirkin: zeq: this is a huge whammy as well, esp for modern software attempting to run on ancient hw: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nv30/nv30_state.c#n373
04:14 imirkin: zeq: they're programs. they take inputs, and produce outputs.
04:15 zeq: imirkin: my hope was to get modern compositors working
04:15 imirkin: yeah, i get what you're trying to do
04:16 imirkin: and if you file concrete bugs against the driver, i might even look at fixing them. i wouldn't bother with something as open-ended as "weston doesn't work"
04:16 imirkin: but you could make an apitrace of it not working
04:16 imirkin: and that might be something more concrete to go on
04:16 zeq: I can not believe it's not possible to do enough to get it to be performant for that use case
04:17 imirkin: you'd be right in not believing that
04:18 zeq: just enabling the extension (and expecting breakage) was enough to get weston to start
04:18 karolherbst: imirkin: https://github.com/karolherbst/mesa/commit/37d7f8eae194ee6e5d60a75a2c9e57a210485e93
04:18 karolherbst: this should be better
04:18 zeq: textures were very wrong
04:18 karolherbst: though the diff looks bad :D
04:19 zeq: imirkin: I'll have a read of the code and see if I can get my head around it.
04:21 imirkin: zeq: have a look in http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nv30/nv30_miptree.c
04:21 imirkin: right now it's set up to assume that npot means "lay out like rect"
04:21 imirkin: but you need to add a nv30 case for laying out npot by rounding up to the next pot
04:22 karolherbst: imirkin: can we do this? https://gist.github.com/karolherbst/37909ab9a1207dd6e0cb
04:23 karolherbst: or could that cause problems
04:23 imirkin: and if you want clamp_to_edge to work, you might need to copy the edge pixels all the way down at transfer_unmap time
04:23 karolherbst: imirkin: and what mmt do you need?
04:24 imirkin: karolherbst: bin/arb_framebuffer_no_attachments-atomic -fbo -auto
04:24 imirkin: karolherbst: yeah, that sounds like you can do that...
04:24 imirkin: karolherbst: seems like a pretty rare situation though :)
04:24 karolherbst: well I found it :D
04:25 karolherbst: imirkin: anyway, I am now at total instructions in shared programs : 103007 -> 101716 (-1.25%), total gprs used in shared programs : 15083 -> 15103 (0.13%) with all my changes in shader-db
04:25 zeq: imirkin: it would be nice to have some kind of generic solution which could work with other cards without full OpenGL2 NPOT support. Maybe completely unrealistic?
04:26 imirkin: zeq: you could do the emulation at the st/mesa level
04:26 imirkin: karolherbst: neat... will have to take a look
04:26 glennk: thing is those old cards typically also don't have enough vram for a compositor to run well
04:26 zeq: I'm sure there are quite a few d3d9 cards out there that can *nearly* do OGL2/GLES2
04:26 karolherbst: glennk: every card can do composited xfwm though
04:26 imirkin: zeq: however we've been moving away from putting junk into st/mesa and letting drivers worry about it
04:27 glennk: karolherbst, non-composited yes
04:27 imirkin: glennk: dunno, nv3x had 64M or more usually
04:27 karolherbst: glennk: also composited
04:27 karolherbst: glennk: xfwm doesn't use opengl
04:27 glennk: karolherbst, it does in its git version
04:27 karolherbst: oh really
04:27 zeq: glennk: how much does a compositor need vram-wise? It doesn't all have to be in VRAM, or are AGP transfers too slow?
04:27 karolherbst: well then 4.12 doesn't do opengl :D
04:28 glennk: zeq, every surface takes w*h*4 bytes + alignment
04:28 glennk: plus two for your screen buffers
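glennk's rule of thumb can be turned into a quick estimate. This is a rough sketch only: the 4096-byte rounding is an assumed stand-in for "alignment", and `surface_bytes` and `compositor_vram` are hypothetical helper names.

```python
BYTES_PER_PIXEL = 4
ALIGNMENT = 4096  # assumption: allocations are rounded up to this size

def surface_bytes(width, height, alignment=ALIGNMENT):
    """w*h*4 bytes, rounded up to the assumed allocation alignment."""
    raw = width * height * BYTES_PER_PIXEL
    return -(-raw // alignment) * alignment  # ceiling division

def compositor_vram(windows, screen):
    """Bytes for every window surface plus two screen-sized buffers."""
    total = sum(surface_bytes(w, h) for w, h in windows)
    return total + 2 * surface_bytes(*screen)

if __name__ == "__main__":
    windows = [(800, 600)] * 10
    needed = compositor_vram(windows, (1920, 1080))
    print(f"{needed / 2**20:.1f} MiB")
```

The estimate ignores everything else that lives in VRAM (textures for applications, cursors, driver allocations), so it is a lower bound at best.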
04:29 karolherbst: imirkin: mad(abs(x), abs(x), y) = mad(x, x, y) should be done in algebraicopt?
04:29 imirkin: karolherbst: i think so, yes
04:29 karolherbst: ohh mad(neg(x), neg(y), z) = mad(x,y,z) would be also a good idea
04:30 imirkin: sure
04:30 imirkin: i wonder if that's a good idea for integers...
04:30 imirkin: neg(MIN_INT) = MIN_INT =/
04:30 karolherbst: ohh
04:31 karolherbst: I would go for the reasonable idea a dev had. If they depend on such hacks...
04:31 karolherbst: nobody would ever do i = neg(MIN_INT), because then they would do i = MAX_INT in the first place
04:32 karolherbst: well, maybe not nobody, but well
04:32 imirkin: yeah, but they might do a = neg(b)
04:32 karolherbst: yes, and they expect b to change signs
04:32 imirkin: and b could happen to turn out to be MIN_INT
04:33 karolherbst: imirkin: but I think if you multiple with MIN_INT you will get other issues anyway
04:33 karolherbst: *multiply
04:33 imirkin: integer math is pretty well-defined
04:33 imirkin: and if you test your algorithm assuming regular integer math, this could cause problems
04:33 karolherbst: mhh
04:34 imirkin: but it's fine for floats
04:34 karolherbst: so I should only do this for floats for now
04:34 karolherbst: which is like the general case anyway?
04:34 imirkin: yeah
04:34 karolherbst: k
04:34 karolherbst: how do I check for that?
04:35 imirkin: isFloatType(i->dType)
04:35 karolherbst: thanks
04:35 imirkin: (so that you catch the even less common F64 case...)
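The corner case imirkin raises is easy to reproduce with 32-bit two's-complement arithmetic, while the float rewrite is exact because negating both factors leaves the product unchanged. This is a small illustration in plain Python, not compiler code; `wrap32`, `neg32` and `mad_f` are hypothetical helpers.

```python
INT32_MIN = -2**31

def wrap32(v):
    """Reduce a Python int to a signed 32-bit two's-complement value."""
    v &= 0xFFFFFFFF
    return v - 0x100000000 if v & 0x80000000 else v

def neg32(v):
    """32-bit integer negation with wrap-around."""
    return wrap32(-v)

def mad_f(x, y, z):
    """Float multiply-add, computed as a separate multiply and add."""
    return x * y + z

if __name__ == "__main__":
    # Negating the most negative value wraps back onto itself.
    print(neg32(INT32_MIN) == INT32_MIN)
    # For floats, mad(neg(x), neg(y), z) matches mad(x, y, z).
    print(mad_f(-1.5, -0.25, 2.0) == mad_f(1.5, 0.25, 2.0))
```

The sketch only shows why the integer case deserves a second look; it does not by itself decide whether a particular integer rewrite is safe, which is why the chat settles on restricting the transformation to float types.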
04:36 karolherbst: yeah I know
04:37 karolherbst: imirkin: for mad/mul I have to check the src definition->op thing, because the src.mod won't ever have abs?
04:38 imirkin: it might
04:38 imirkin: esp if you're running things in a loop
04:38 imirkin: and AlgebraicOpt might run after ModifierFolding
04:38 karolherbst: mhh okay
04:38 imirkin: and iirc mul can take modifiers... dunno
04:39 imirkin: hm, nope
04:39 imirkin: both mul/mad take neg, neither takes abs
04:39 imirkin: but you shouldn't rely on that in core code
04:39 imirkin: although it's true on nv50 as well
04:39 karolherbst: k
04:40 karolherbst: imirkin: seems like they never have an abs mod in this shader, so I will concentrate on the def first
04:47 karolherbst: imirkin: can I do i->src(0) == i->src(1) or do I have to do something special?
04:48 imirkin: mmmmmmmm
04:48 imirkin: you can, but it's not great
04:48 imirkin: what you really want to do is like
04:49 zeq: glennk: 128MB should typically be enough VRAM for a compositor, at least with a single head up to any reasonable resolution, right?
04:49 imirkin: i->getSrc(0)->equals(i->getSrc(1))
04:50 karolherbst: k, got two hits with this :)
04:50 karolherbst: src0->equals(src1) && src0->getInsn()->op == OP_ABS
04:51 karolherbst: and then I move the src0->getInsn->getSrc(0) into i
04:51 imirkin: you might also check for a cvt with an abs modifier
04:52 karolherbst: mhh okay
04:52 imirkin: but yeah, this should be rare.
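The check karolherbst describes (both multiplicands are the same value, and that value is defined by an abs) can be sketched on a toy SSA representation. This is only a model of the idea: `Value`, `Insn` and `fold_abs_square` are hypothetical, operand identity stands in for the value comparison, and the cvt-with-abs case imirkin mentions is not handled.

```python
class Value:
    def __init__(self, name, insn=None):
        self.name = name
        self.insn = insn  # defining instruction, or None

class Insn:
    def __init__(self, op, dtype, srcs):
        self.op = op
        self.dtype = dtype
        self.srcs = list(srcs)

FLOAT_TYPES = {"f16", "f32", "f64"}

def fold_abs_square(i):
    """Rewrite mul/mad(abs(x), abs(x), ...) into mul/mad(x, x, ...).

    Restricted to float types, as agreed in the chat. Returns True if
    the instruction was changed.
    """
    if i.op not in ("mul", "mad") or i.dtype not in FLOAT_TYPES:
        return False
    s0, s1 = i.srcs[0], i.srcs[1]
    if s0 is not s1:           # both factors must be the same value
        return False
    d = s0.insn
    if d is None or d.op != "abs":
        return False
    i.srcs[0] = i.srcs[1] = d.srcs[0]  # abs(x) * abs(x) == x * x
    return True

if __name__ == "__main__":
    x = Value("%r16185")
    a = Value("%r16202", Insn("abs", "f32", [x]))
    y = Value("%r16204")
    mad = Insn("mad", "f32", [a, a, y])
    print(fold_abs_square(mad), [s.name for s in mad.srcs])
```

After the rewrite the abs instruction may have no remaining uses, in which case a later dead-code pass would be expected to remove it.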
04:52 karolherbst: ohh right, your mmt :D
04:55 glennk: zeq, 128 is probably a minimum spec
04:56 karolherbst: imirkin: http://filebin.ca/2URuiym9gONd/arb.log.xz
04:56 imirkin: karolherbst: thanks. it passed i assume?
04:56 karolherbst: yes
04:58 imirkin: cool
04:58 imirkin: now to see what they're doing that i'm not
05:01 karolherbst: imirkin: https://github.com/karolherbst/mesa/commit/8244127d082e3415878c3aeee95a68cb20db6712
05:01 karolherbst: this did something strange
05:01 karolherbst: 2888: abs ftz f32 $r28 $r28; 2889: mul ftz f32 $r28 $r28 0.400000 ==> 2888: mul ftz f32 $r28 $r28 $r28
05:01 karolherbst: I guess I don't know something
05:02 imirkin: what was $r28 before the abs?
05:02 karolherbst: mad ftz f32 $r28 $r23 $r13 $r28
05:03 imirkin: hmmm
05:03 karolherbst: also it didn't catch "323: abs ftz f32 $r25 $r19; 332: mad ftz f32 $r21 $r25 $r25 $r21"
05:03 imirkin: hold on
05:05 imirkin: i dunno what's going wrong
05:05 imirkin: but your function seems fine
05:05 imirkin: you left off a break, but that wouldn't matter here.
05:06 karolherbst: I will print the instructions and check what's going on
05:07 karolherbst: mul f32 %r13327 %r13325 %r13326 (0) abs f32 %r13325 %r13324 (0) mov u32 %r13326 0x3ecccccd (0)
05:08 karolherbst: i, src0->getInsn, src1->getInsn
05:09 karolherbst: there is no way that %r13325 %r13326 can be equal, well it could be, but I doubt the compiler is that smart
05:14 imirkin: PIGLIT: {"result": "pass" }
05:14 imirkin: finally.
05:14 imirkin: karolherbst: uhm
05:17 zeq: imirkin, glennk: r300g does indeed have OGL2 NPOT emulation. http://cgit.freedesktop.org/mesa/mesa/commit/src/gallium/drivers/r300?id=13359e6a4b732335cdd8da48276960d0b176ffe3 and later commits
05:17 imirkin: zeq: actually that commit does it for 3D textures, not 2D :)
05:18 zeq: I did say later commits :)
05:18 imirkin: coz 2D were already done i guess
05:18 zeq: http://cgit.freedesktop.org/mesa/mesa/log/src/gallium/drivers/r300?qt=grep&q=NPOT
05:18 karolherbst: imirkin: I guess you are out of ideas here? :/
05:18 imirkin: karolherbst: oh
05:18 imirkin: heh
05:19 imirkin: right.
05:19 imirkin: there should be a check in there, if the id is negative, then check the other id :)
05:19 karolherbst: ?
05:20 imirkin: or screw that
05:20 imirkin: just do == and move on
05:20 imirkin: i.e. if the ptr's are equal
05:20 karolherbst: mhh, they are never equal :/
05:21 karolherbst: it is in pre-RA though: 335: abs ftz f32 %r16202 %r16185 (0) 338: mad ftz f32 %r16205 %r16202 %r16202 %r16204 (0)
05:22 imirkin: which things are you == 'ing?
05:22 imirkin: .src() or ->getSrc()?
05:22 imirkin: .src() is a ValueRef
05:22 imirkin: and those will be different
05:23 karolherbst: getSrc
05:23 karolherbst: I removed the OP_ABS check
05:23 karolherbst: and now I have some hits
05:24 karolherbst: but not the one I am after
05:24 karolherbst: ohhhh
05:24 karolherbst: imirkin: I guess a later optimization is creating this
05:25 imirkin: i thought you had it in a loop
05:25 karolherbst: yes I have
05:26 karolherbst: ahh have to run the loop another time
05:27 karolherbst: this cuts 20 instructions in this shader :/
05:33 karolherbst: imirkin: which modifiers can abs have, but mul/mad not?
05:34 karolherbst: or shouldn't I worry about this at that point?
05:35 RSpliet: karolherbst: well, it's not so much that you can't have certain modifiers, but some modifiers are not commutative
05:36 RSpliet: eg, you can't just migrate them from a src to the mad/mul's dst
05:37 imirkin: karolherbst: i don't think abs can have any modifiers tbh
05:37 imirkin: karolherbst: but cvt can.
05:38 karolherbst: imirkin: https://github.com/karolherbst/mesa/commit/d1e4077065340c3f89f946e9b395083088242a44
05:38 karolherbst: so this is safe as it is for now
05:38 karolherbst: ohh I have to clear the modifiers?
05:39 karolherbst: ahh well I just copy them over for now
05:46 karolherbst: add ftz f32 $r14 $r14 1.800000; add ftz f32 $r17 $r14 -0.100000 ==> add ftz f32 $r17 $r14 1.800000
05:46 karolherbst: ...
05:46 karolherbst: add ftz f32 $r14 $r14 1.800000; add ftz f32 $r17 $r14 -0.100000 ==> add ftz f32 $r17 $r14 1.700000
05:46 karolherbst: imirkin: or is there something wrong with that?
05:47 imirkin: nope, just requires building up expression trees and balancing them
05:48 karolherbst: can't I just get the value of the immediates and merge them together?
05:48 imirkin: i suspect a good amount of research has gone into how to do this efficiently and effectively
05:48 imirkin: sure, but that's just like a pretty limited case
05:48 karolherbst: ohh you meant for like everything at once
05:48 imirkin: what if it's like 0.1 + a + b + c + d + 0.2
05:49 imirkin: it'd be nice to combine the 0.1 and 0.2
05:49 RSpliet: karolherbst: are you sure that $r14's liveness ends after the second add?
05:49 karolherbst: no
05:49 karolherbst: but it might
05:49 imirkin: RSpliet: i think he's talking about the general idea
05:49 karolherbst: in other cases
05:49 imirkin: RSpliet: i haven't been able to get him to use the pre-RA things, which make way more sense
05:49 imirkin: but have larger numbers
05:50 karolherbst: somehow I see those stuff in the post-ra stuff easier :D
05:50 karolherbst: RSpliet: add ftz f32 $r22 abs $r14 -0.250000
05:50 karolherbst: RSpliet: add ftz f32 $r14 $r14 1.500000
05:50 karolherbst: the 1.8 could be merged into the last use as well
05:51 karolherbst: mhhh
05:51 RSpliet: is that called immediate propagation? :-P
05:51 karolherbst: but that would require some reordering
05:51 RSpliet: anyway, it's not a super simple problem :-)
05:51 karolherbst: ...
05:51 karolherbst: yeah I see the problems now
05:51 karolherbst: k, then I will find other stuff
05:52 imirkin: RSpliet: it's called expression balancing
05:52 imirkin: basically you have an expression tree, and you want to move things around in an advantageous manner
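The limited case karolherbst asks about, and imirkin's `0.1 + a + b + c + d + 0.2` example, both reduce to collecting the constant terms of a flattened sum. This is a deliberately naive sketch: `fold_add_chain` is a hypothetical helper, the chain is already flattened into a list, and it ignores the liveness question RSpliet raises as well as the fact that reassociating floating-point additions can change rounding.

```python
def fold_add_chain(terms):
    """Combine all immediates of a flattened add chain into one.

    terms holds register names (str) and float immediates. The
    registers keep their order; the combined constant goes last and
    is dropped if it is zero.
    """
    regs = [t for t in terms if isinstance(t, str)]
    const = sum(t for t in terms if not isinstance(t, str))
    return regs + ([const] if const != 0 else [])

if __name__ == "__main__":
    # Two chained adds with immediates collapse into a single constant.
    print(fold_add_chain(["$r14", 1.5, -0.25]))
    # The constants at both ends of the chain end up combined.
    print(fold_add_chain([0.1, "a", "b", "c", "d", 0.2]))
```

A real expression-balancing pass works on the expression tree rather than a flat list, which is where the difficulty mentioned in the chat comes from.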
05:53 karolherbst: what does sat do?
05:54 imirkin: saturate
05:54 imirkin: clamp(0, 1)
05:56 karolherbst: imirkin: only mad src2 can be a immediate, right?
05:56 imirkin: only the second src, iirc
05:56 karolherbst: uhm, yeah, I meant that
05:56 imirkin: usually src1
05:57 imirkin: but depends how you count :)
05:57 karolherbst: ohh right
05:57 karolherbst: "mul ftz f32 $r20 $r20 $r20"
05:57 karolherbst: this looks like something can be done to id
05:57 karolherbst: *it
05:58 imirkin: like what?
05:58 karolherbst: no idea
05:59 karolherbst: yeah, I think I found something
06:00 imirkin: if you later take its sqrt you can do something :)
06:00 karolherbst: what happens with rsq f32 $r21 0x0?
06:01 imirkin: ka-boom
06:01 karolherbst: real ka boom?
06:01 karolherbst: or just $r21 contains stuff
06:01 imirkin: i dunno, NaN i assume
06:01 karolherbst: mhh
06:01 imirkin: 1/sqrt(0) = not great
06:01 karolherbst: k
06:01 karolherbst: so I have this idea:
06:01 karolherbst: $r20 is >=0 (for mad/mul reasons)
06:01 imirkin: but if it feeds into a mul.dnz then nan -> 0
06:02 karolherbst: rsq f32 $r21 abs $r20
06:02 karolherbst: slct ftz f32 $r20 lt $r21 $r63 neg $r20
06:02 karolherbst: $r20 should be either >0 or NaN
06:02 karolherbst: so what does slct do in that case?
06:03 imirkin: dunno
06:03 karolherbst: example here: https://gist.github.com/karolherbst/37909ab9a1207dd6e0cb
06:03 imirkin: i don't even know what slct does offhand :p
06:03 imirkin: (not precisely)
06:03 karolherbst: select
06:04 imirkin: yeah, i know the basic idea :p
06:04 karolherbst: I think it checks the last arg against 0
06:04 karolherbst: ahh k
06:04 karolherbst: I think we might be able to optimize this slct away
06:05 karolherbst: which would allow us to drop some instructions if $r63 is selected
06:05 karolherbst: but I think $r21 should be selected there cause neg $r20 < 0
06:05 karolherbst: but the general idea behind this
06:06 zeq: imirkin: util_next_power_of_two() seems to be a helper for implementing POT alignment. I'm reading through the r300 driver atm
06:06 imirkin: zeq: it's just a function that will produce the next power of two :)
06:07 imirkin: quite useful in many scenarios
06:07 imirkin: including, but not limited, to this one
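For reference, rounding up to the next power of two, and using it to pad an NPOT texture extent as suggested earlier for the nv30 layout, looks roughly like this. It is a Python stand-in written for illustration, not the Mesa helper itself; `next_power_of_two` and `pot_extent` are hypothetical names.

```python
def next_power_of_two(x):
    """Smallest power of two that is >= x (1 for x <= 1)."""
    if x <= 1:
        return 1
    return 1 << (x - 1).bit_length()

def pot_extent(width, height):
    """Pad an NPOT texture size up to power-of-two dimensions."""
    return next_power_of_two(width), next_power_of_two(height)

if __name__ == "__main__":
    for size in [(640, 480), (1024, 768), (1, 1)]:
        print(size, "->", pot_extent(*size))
```

Padding this way costs memory: a 640x480 texture ends up occupying a 1024x512 allocation.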
06:21 zeq: imirkin: yup, I worked that one out! :)
06:50 karolherbst: imirkin: I found such a case: min(max(max(a,0),max(max(b,0),max(c,0)), 0)
06:51 karolherbst: ... maybe I can simplify that a bit
06:52 karolherbst: ohh wait, I think I did something wrong
06:52 karolherbst: yeah I did something wrong :/
07:14 karolherbst: imirkin: set ftz u8 $p0 lt f32 abs $r14 $r17, $r17 can only be a short immediate?
07:14 imirkin: karolherbst: check nv50_ir_target_nvc0.cpp
07:14 imirkin: and/or envydis
07:15 karolherbst: well I only see shorts in the binary dump, so I assume it
07:15 karolherbst: imirkin: imm = 0x2 means short?
07:16 imirkin: you mean the instruction type?
07:16 imirkin: iirc for SM20/SM30 all the limm ops are &7 == 2
07:16 karolherbst: in the opProperties struct
07:17 imirkin: that means the immediate can only ever go on the second source
07:18 imirkin: it's a bitfield
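Read as a bitfield, a value of 0x2 has only bit 1 set, which is why it means "second source only". A tiny sketch of that interpretation follows, assuming bit s corresponds to source s as imirkin's answer implies; `can_take_imm` is a hypothetical helper and not the accessor used in the target code.

```python
def can_take_imm(imm_mask, src_index):
    """True if the bitfield allows an immediate in source src_index."""
    return bool(imm_mask & (1 << src_index))

def imm_sources(imm_mask, num_srcs=3):
    """List the source slots that may hold an immediate."""
    return [s for s in range(num_srcs) if can_take_imm(imm_mask, s)]

if __name__ == "__main__":
    print(imm_sources(0x2))  # slots allowed when imm = 0x2
```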
07:18 karolherbst: what does "BB:15 (0 instructions) - idom = BB:14, df = { BB:16 } -> BB:16 (forward)" mean by the way?
07:18 karolherbst: empty branch jump to BB16?
07:18 karolherbst: mhh no, doesn't make sense
07:19 imirkin: that's what it means
07:19 imirkin: there used to be code in the block
07:19 imirkin: but now it's all gone
07:19 imirkin: but it's easier to just leave that block in place
07:19 imirkin: there's no actual branch generated
07:19 imirkin: that's just a print of the control flow graph
07:19 imirkin: (aka CFG)
07:19 karolherbst: okay
07:19 imirkin: and the forward is a lie :( unfortunately it's all mislabeled, and i haven't had the heart to attempt to fix it
07:20 karolherbst: k
07:22 mangeurdenuage: hello everyone, sorry to interject, but I'm trying to make an MMIO trace with this https://wiki.ubuntu.com/X/MMIOTracing and at the echo part " echo 64000 > /sys/kernel/debug/tracing/" the folder "/tracing/trace_pipe" does not exist. Did somebody already bump into this problem?
07:23 imirkin: mangeurdenuage: you don't have CONFIG_MMIOTRACE or CONFIG_TRACING
07:23 imirkin: or you don't have tracefs mounted (least likely option)
07:23 imirkin: grep trace /proc/filesystems
07:25 mangeurdenuage: I made the "grep trace /proc/filesystems " there is nothing
07:25 mangeurdenuage: I thought that mmiotrace was part of the kernel.
07:25 imirkin: it is... like a lot of things
07:25 imirkin: you just have to make sure to turn it on when building your kernel
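The `grep trace /proc/filesystems` check can also be done programmatically. This is a minimal sketch, assuming the usual /proc/filesystems layout of an optional `nodev` column followed by the filesystem name; `has_filesystem` and `kernel_has_tracefs` are hypothetical helper names.

```python
def has_filesystem(proc_filesystems_text, name):
    """True if name is listed in text taken from /proc/filesystems."""
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        # The filesystem name is the last column; 'nodev' may precede it.
        if fields and fields[-1] == name:
            return True
    return False

def kernel_has_tracefs(path="/proc/filesystems"):
    """Read the running kernel's list and look for tracefs."""
    with open(path) as f:
        return has_filesystem(f.read(), "tracefs")

if __name__ == "__main__":
    sample = "nodev\tsysfs\nnodev\ttracefs\n\text4\n"
    print(has_filesystem(sample, "tracefs"))
```

A missing entry matches imirkin's diagnosis that the kernel was built without the tracing options, rather than the filesystem merely being unmounted.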
07:25 mangeurdenuage: OOOOk
07:27 mangeurdenuage: so I need to compile my own kernel for this ? if I understand
07:27 karolherbst: mangeurdenuage: did you mount tracefs?
07:27 mangeurdenuage: nope
07:27 imirkin: karolherbst: he doesn't *have* tracefs
07:27 karolherbst: ohh okay
07:28 mangeurdenuage: I'm looking into how to, actually
07:31 karolherbst: imirkin: meh, this commit https://github.com/karolherbst/mesa/commit/495613b09ac06d54196c5a7a887af3b43f155e4d messed up unigine :/
07:32 imirkin: that pass may be best suited to nv50
07:32 imirkin: i forget
07:32 imirkin: but it should be generalizable
07:34 karolherbst: this pass is pretty small though :/
07:35 imirkin: and yet it's had a *ton* of bugs
07:35 imirkin: it clearly does some very nv50-specific things
07:35 imirkin: like checking that reg < 64
07:36 imirkin: although that happens to work out for fermi/kepler1 as well :)
07:36 imirkin: but on nv50 there are diff encodings used for "small" regs and "big" regs (i.e. < 64 and 64-127)
07:40 karolherbst: imirkin: that part causes trouble: https://github.com/karolherbst/mesa/blob/495613b09ac06d54196c5a7a887af3b43f155e4d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp#L2952-L2954
07:41 karolherbst: remove those three lines and it works
07:41 imirkin: ah yeah
07:41 imirkin: only do those if typeSizeof(i->sType) == 2
07:42 imirkin: (which was necessarily the case for nv50)
07:42 imirkin: actually
07:42 imirkin: change the if (isFloatType(ty)) to
07:42 imirkin: if (typeSizeof(i->sType) > 2)
07:46 karolherbst: imirkin: for which chips can I enable that pass then? I only added [c0,f0)
07:46 imirkin: in theory for all of them
07:46 imirkin: in practice, i don't think we support the limm emission everywhere
07:46 imirkin: take a look at...
07:47 imirkin: https://github.com/imirkin/mesa/commit/16b04998c02365ce572cd96d27b5a79b52d8d229
07:47 imirkin: pretty sure that's wrong, but...
07:47 imirkin: or rather, not wrong, just incomplete
07:47 imirkin: or maybe it's totally fine
07:47 imirkin: heh
07:48 imirkin: oh, i also have a thing in RA which attempts to allocate dstreg == src2reg for mad's
07:48 imirkin: but that's only enabled on nv50
07:48 imirkin: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#n1469
07:49 imirkin: you *probably* want to add a thing which checks that src1 points to a mov with an immediate in it for nvc0. for nv50 it's advantageous either way since we can use the short encoding if it's all regs
07:54 karolherbst: imirkin: I think digging into RA is a task for another day, for now I want to search for simple optimizations to get used to the compiler stuff
07:55 imirkin: well... it's not really *digging*, it's more like copying a short block of code :)
07:55 imirkin: but up to you!
07:58 karolherbst: imirkin: anyway: https://gist.github.com/karolherbst/9ac253187cd7ad74623a
07:58 karolherbst: I think I'll try to check some shaders in shader-db with a high gpr count and try to find something there to improve :D
07:58 karolherbst: otherwise I feel bad for hurting some shaders here
07:59 imirkin: they can't feel the pain
07:59 imirkin: i'd say this is well worth it
07:59 imirkin: # of gpr's isn't *necessarily* a good thing to minimize at the cost of instruction count
07:59 imirkin: you obviously don't want to double it
07:59 karolherbst: yeah I know
07:59 imirkin: but such a minimal change doesn't really matter
08:00 imirkin: also i have a larger shader-db i can run things on
08:00 karolherbst: I got a little perf drop in unigine though
08:00 imirkin: unfortunately the public one kinda stinks
08:01 karolherbst: imirkin: what is allowed to be added there?
08:02 imirkin: the public shader db?
08:02 karolherbst: yeah
08:02 imirkin: things that are redistributable
08:03 karolherbst: k
08:04 karolherbst: imirkin: there is indeed one shader with gpr:63
08:04 karolherbst: 5226 instructions D:
08:04 imirkin: the orbital explorer one?
08:04 karolherbst: yeah
08:04 imirkin: that thing's crazy
08:05 karolherbst: I think I will look at this one now :D
08:08 karolherbst: imirkin: what does this do? set ftz f32 $r14 lt $r14 $r63
08:09 karolherbst: min($r14, $r63)?
08:09 imirkin: no
08:09 imirkin: sets $r14 to 0x0 or 0xffffffff depending on whether $r14 < 0
08:09 karolherbst: ohh okay
08:10 karolherbst: imirkin: what does emit do?
08:11 imirkin: EmitVertex() in glsl
08:11 imirkin: a geometry shader produces multiple vertices
08:11 karolherbst: yeah okay, I was just confused because of the two $r63
08:11 imirkin: emit basically sends the current outputs out as a vertex to the relevant vertex stream
08:15 karolherbst: slct u32 $r1 eq $r63 0xffffffff $r1 .... does it check $r1 against 0 and if it's 0 $r1=0 ?
08:15 karolherbst: otherwise $r1=0xffffffff ?
08:15 imirkin: maybe?
08:15 imirkin: i forget what the args of slct mean
08:16 karolherbst: I think src0 for true, src1 for false and src2 is checked
08:17 imirkin: yeah, that sounds right
08:17 imirkin: and the check is against 0
08:17 imirkin: so that slct is basically to clean up a boolean
08:17 imirkin: int -> boolean
08:17 imirkin: same thing as !!x
08:17 karolherbst: set u32 $r1 eq $r32 0x00000001; slct u32 $r1 eq $r63 0xffffffff $r1
08:18 imirkin: except bools are represented as 0xffffffff
08:18 imirkin: yeah
08:18 imirkin: so
08:18 imirkin: there's a lot of stupidity
08:18 karolherbst: :D
08:18 imirkin: as a result of us not knowing the values
08:18 imirkin: coz if it's *any* int, then we can't optimize
08:18 imirkin: but if we know that it's 0/1 or 0/-1 then we can be cleverer about it
08:19 karolherbst: well set sets $r1 to either 0x0 or 0xffffffff right?
08:20 imirkin: right.
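[Editor's note: a minimal C++ model of the two instructions discussed above, assuming the operand meaning settled on in the conversation: slct tests src2 against zero and picks src0 when the condition holds and src1 otherwise, while set compares its two sources and yields 0xffffffff or 0x0. The names Cond, set_u32 and slct_u32 are made up for illustration and are not nv50_ir API.]

```cpp
#include <cstdint>

// Only the two condition codes that appear in the discussion.
enum class Cond { EQ, NE };

// set: compare a against b; true is all-ones, false is zero.
uint32_t set_u32(Cond cc, uint32_t a, uint32_t b) {
    const bool hit = (cc == Cond::EQ) ? (a == b) : (a != b);
    return hit ? 0xffffffffu : 0u;
}

// slct: test src2 against zero; pick src0 if the test holds, else src1.
uint32_t slct_u32(Cond cc, uint32_t src0, uint32_t src1, uint32_t src2) {
    const bool hit = (cc == Cond::EQ) ? (src2 == 0u) : (src2 != 0u);
    return hit ? src0 : src1;
}
```

[Under these assumptions, slct_u32(Cond::EQ, 0, 0xffffffff, x) maps zero to 0 and any nonzero x to 0xffffffff, which is the "!!x with -1 as true" cleanup described above.]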
08:20 imirkin: so set can also produce predicates
08:20 imirkin: i forget where that's all touched up
08:20 karolherbst: mhh okay
08:21 imirkin: potentially in the RA? dunno
08:21 karolherbst: but the slct sounds pretty useless to me in that case
08:21 karolherbst: because it doesn't change $r0
08:21 imirkin: yeah, that slct should become a set
08:21 imirkin: and we have things that optimize multiple lists of sets
08:21 karolherbst: why does it become a set?
08:22 imirkin: well, if it becomes a set, then we can merge the multiple sets together (sometimes)
08:22 imirkin: there's an opt for that
08:22 imirkin: in constantfolding
08:22 karolherbst: I am sure this can be completely removed already
08:22 karolherbst: https://gist.github.com/karolherbst/5a46d794a4cb115af69f
08:23 imirkin: so slct eq (or ne) 0 -1 x -> set eq (or ne) 0 x
08:23 imirkin: and then another thing can merge multiple sets together if they're there
08:23 karolherbst: ahh okay
08:24 karolherbst: you look at the more general case
08:24 imirkin: yes.
08:24 imirkin: opts should attempt to be as general as possible
08:24 karolherbst: okay
08:24 karolherbst: and if that opt doesn't help here we can still think of a more clever one
08:24 imirkin: we can look at why it doesn't help
08:24 imirkin: coz it should
08:25 karolherbst: okay
08:25 karolherbst: algebraic opt again?
08:25 imirkin: but sometimes there are other things preventing it from working, and then we can fix those too
08:25 imirkin: no, constfolding
08:25 imirkin: another case for opnd3
08:25 karolherbst: ohh right, makes more sense
08:25 imirkin: er no
08:25 imirkin: not opnd3
08:25 imirkin: expr actually, i guess
08:25 imirkin: since we want src0 and src1 to be specific immediates
08:26 karolherbst: expr(i, src0, src1, src2)
08:26 imirkin: no
08:26 karolherbst: ohh
08:26 karolherbst: right
08:26 imirkin: expr(i, src0, src1)
08:26 karolherbst: okay
08:26 imirkin: hold on
08:26 karolherbst: no, src2 is also immediate in the overload
08:26 imirkin: yeah
08:26 imirkin: it'll have to go into both =/
08:27 karolherbst: ...
08:27 karolherbst: k
08:27 imirkin: or just call the 2-arg one from the 3-arg one
08:27 imirkin: for OP_SLCT
08:27 karolherbst: yeah makes sense somehow
08:27 imirkin: although you'll need fancier logic
08:27 imirkin: since everything else produces an immediate
08:27 imirkin: whereas here you just want to modify the op
08:28 imirkin: so... be careful :)
08:28 karolherbst: I'll try
08:30 karolherbst: meh, I don't get the instruction through "if (imm0.isInteger(0) && imm1.isInteger(-1))" :/
08:30 imirkin: why not
08:30 karolherbst: ahh I have to reorder
08:30 karolherbst: imm1 == 0 and imm0 == -1 works
08:31 imirkin: ?
08:36 karolherbst: imirkin: with "if (imm0.isInteger(-1) && imm1.isInteger(0))" it works
08:36 imirkin: ah
08:36 imirkin: yeah, you need to allow both
08:36 imirkin: resulting in flipping the set between eq and ne
08:37 karolherbst: ohh you are right, it has a ne
08:37 imirkin: you have to handle all the various cases
08:37 imirkin: also note that nvc0 has something called "boolean" floats
08:38 imirkin: which are 1.0/0.0 -- set can produce those optionally
08:38 imirkin: instead of producing -1/0
08:39 karolherbst: imirkin: is imm0 here always src0?
08:40 imirkin: i believe so, yes
08:41 karolherbst: yep, it is
08:42 karolherbst: imirkin: where do I find that ne?
08:42 imirkin: i->cc? something like that. check nv50_ir_print.cpp when in doubt.
08:43 karolherbst: yeah cc
08:47 karolherbst: or maybe not? :/
08:47 karolherbst: weird
08:51 karolherbst: imirkin: I tried "imm1.isInteger(0) && imm0.isInteger(-1) && i->asCmp()->cc == CC_NE" and "imm1.isInteger(0) && imm0.isInteger(-1) && i->cc == CC_NE" and neither works
08:51 karolherbst: but print prints "slct u32 %r8161 ne %r8160 %r8156 %r8159 (0)"
08:52 imirkin: is it i->asCmp()->setCond?
08:52 imirkin: that stuff was always confusingly overlapping =/
08:52 karolherbst: that works, thanks
08:55 karolherbst: imirkin: "slct u32 $r1 ne 0xffffffff 0x0 $r1" => "set u32 $r1 ne 0x0 $r1" ?
08:55 karolherbst: or set u32 $r1 ne $r1 0x0?
08:56 imirkin: doesn't matter
08:56 imirkin: i slightly prefer the latter
08:56 karolherbst: ohh, set compares both sources, right
08:56 imirkin: yea
08:56 imirkin: order matters for lt but not ne/eq :)
08:56 karolherbst: that makes it somewhat easier
08:57 karolherbst: then I do the former
08:57 karolherbst: and just move src1=> src0 and src2=>src1 and change op to SET
08:57 karolherbst: mhh
08:57 karolherbst: or I just move src2=>src0 :D
08:58 imirkin: src2 -> src0 makes sense.
08:58 imirkin: and src1 = 0
09:03 karolherbst: imirkin: in the end, both the set and the slct were optimized away :D
09:03 karolherbst: because there was a third one
09:04 karolherbst: 19 instructions cut in that shader :)
09:04 karolherbst: I did something wrong though
09:05 karolherbst: ohh no, just forgot NV50_PROG_DEBUG
09:11 imirkin: or to remove the i->print()
09:11 imirkin: or to initialize the stupid colors thing
09:13 karolherbst: well nv-report complained
09:14 karolherbst: imirkin: I am pretty sure I made a lot of mistakes here: https://github.com/karolherbst/mesa/commit/dca41e6e240dd48fd4a0b871da1bbc2f94448d12
09:15 imirkin: yeah that won't work
09:15 imirkin: or actually it might, but by luck
09:15 imirkin: i->asCmp()->setCond == CC_NE
09:15 karolherbst: well it works in that shader
09:15 imirkin: you don't want to have this check
09:16 karolherbst: ohh, why not?
09:16 imirkin: here's what you want
09:16 imirkin: cc = i->asCmp()->setCond
09:16 imirkin: if (cc != ne && cc != eq) goodbye
09:16 imirkin: if (imm0.isInteger(0) && imm1.isInteger(-1)) { flip cc, flip imm0, imm1 }
09:17 imirkin: and make sure to return if you do anything
09:17 imirkin: coz the stuff after the while are bad news for you
09:17 imirkin: and make sure to up the foldCount, otherwise more things won't happen
09:17 karolherbst: foldCount?
09:22 imirkin: yeah.
09:23 imirkin: https://github.com/karolherbst/mesa/commit/dca41e6e240dd48fd4a0b871da1bbc2f94448d12#diff-46df282115646e17aa1321265c4e170aR682
09:23 imirkin: you won't hit that line
09:24 imirkin: coz you're going to return before it gets there
09:24 imirkin: but you still want that
09:24 karolherbst: ohh okay
09:24 karolherbst: but I break?
09:25 imirkin: you should return
09:25 imirkin: you don't want the stuff that comes at the end
09:25 karolherbst: okay
09:25 imirkin: it's designed for all the stuff that just folds everything in
09:25 imirkin: like add and stuff
09:27 imirkin: ravior: any luck with the blob fw?
09:28 ravior: imirkin: None. I'm still getting those warnings/errors. The firmware however was loaded correctly.
09:29 imirkin: ravior: are you still getting the hangs?
09:29 imirkin: (or were you never getting hangs?)
09:30 ravior: I haven't got any hanging as of yet, but I haven't tested it that much today.
09:31 ravior: I was getting hangs usually preceded by those errors first.
09:31 imirkin: i think the main thing is that the blob fw recovers while the nouveau fw dies
09:31 imirkin: so the errors aren't necessarily "bad"
09:31 imirkin: (i mean, they are, but not fatal)
09:31 ravior: I'll come back with more details once I get a hang.
09:32 ravior: I understand.
09:32 ravior: If there is any way I can help with testing/debugging this, please let me know.
09:33 imirkin: well, if you can come back and say that you're pretty sure that the blob fw "fixes" things for you, that will hopefully give skeggsb a bit more pause before declaring that nobody needs to use the blob fw
09:34 karolherbst: imirkin: like this? https://github.com/karolherbst/mesa/commit/2677840e587c940537faa5e049eb4047e3827ab5
09:35 imirkin: karolherbst: hehehe
09:35 imirkin: now you ALWAYS do it
09:35 karolherbst: ohh have to add more immediate checks
09:35 imirkin: even if it's not 0/-1
09:35 karolherbst: ...
09:35 karolherbst: yeah noticed that
09:35 karolherbst: but otherwise?
09:35 imirkin: other than the fact that it's totally broken? yeah, looks great :p
09:35 karolherbst: :D
09:35 karolherbst: but for what is the cc flip actually needed?
09:36 imirkin: well
09:36 imirkin: heh
09:36 imirkin: basically you're taking
09:36 imirkin: a = (x == 0 ? 1 : 0) and converting it into a = (x == 0)
09:37 imirkin: however sometimes it's
09:37 imirkin: a = (x == 0 ? 0 : 1)
09:37 imirkin: in which case that needs to become a = (x != 0)
09:37 imirkin: right?
09:37 karolherbst: ohh okay
09:37 imirkin: and if it was a = (x != 0 ? 0 : 1) then it needs to become a = (x == 0)
09:37 imirkin: etc
09:38 imirkin: [except the way bools work on gpu's is that they tend to be -1 for true, but i've replaced that with 1 above for illustrative purposes]
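[Editor's note: the cc flip above can be sketched as a small self-contained C++ model, assuming slct tests its third source against zero and that true is 0xffffffff. The helper names (flip, eval_slct, eval_set, slct_to_set) are hypothetical and do not exist in nv50_ir; this is not the actual ConstantFolding code.]

```cpp
#include <cstdint>

enum class Cond { EQ, NE };

Cond flip(Cond cc) { return cc == Cond::EQ ? Cond::NE : Cond::EQ; }

// Reference semantics: slct picks imm0 when (x cc 0) holds, else imm1.
uint32_t eval_slct(Cond cc, uint32_t imm0, uint32_t imm1, uint32_t x) {
    const bool hit = (cc == Cond::EQ) ? (x == 0u) : (x != 0u);
    return hit ? imm0 : imm1;
}

// Reference semantics: set yields all-ones when (a cc b) holds, else zero.
uint32_t eval_set(Cond cc, uint32_t a, uint32_t b) {
    const bool hit = (cc == Cond::EQ) ? (a == b) : (a != b);
    return hit ? 0xffffffffu : 0u;
}

// Decide whether "slct cc imm0 imm1 x" can become "set cc' x 0".
// Returns false (and leaves *out alone) for any other immediates.
bool slct_to_set(Cond cc, uint32_t imm0, uint32_t imm1, Cond *out) {
    const uint32_t kTrue = 0xffffffffu;
    if (imm0 == kTrue && imm1 == 0u) { *out = cc; return true; }       // same sense
    if (imm0 == 0u && imm1 == kTrue) { *out = flip(cc); return true; } // inverted
    return false;
}
```

[The third branch matters: without it the rewrite would fire for immediates that are not 0/-1, which is the "now you ALWAYS do it" problem pointed out earlier.]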
09:39 imirkin: but not always -- e.g. on adreno, the output of cmp is 1/0
09:39 imirkin: so we end up having to negate its output in order to conform with gallium's expectations (esp to be compatible with values loaded into uniforms)
09:40 karolherbst: sounds messy
09:41 imirkin: compilers are hard :)
09:41 imirkin: appreciate gcc.
09:44 orbea: yea, I tried to use tcc once, doesn't even compile kernel headers
09:45 orbea: it was fast though!
09:45 imirkin: tcc? you mean turbo c++?
09:45 orbea: tinycc
09:46 imirkin: ah ok. i was gonna say that turbo c(++) from like 1995 is probably *not* going to build a modern kernel
09:46 imirkin: esp as it was a 16-bit compiler
09:46 orbea: yea P
09:46 karolherbst: imirkin: like that? https://github.com/karolherbst/mesa/commit/047286160999115f2ab47e627f31cbcbb94c3c29
09:46 orbea: :P
09:47 karolherbst: the change in the output looks odd though
09:48 imirkin: that looks like it could work...
09:48 karolherbst: imirkin: https://gist.github.com/karolherbst/5a46d794a4cb115af69f
09:49 imirkin: so... it worked, right?
09:49 imirkin: i guess something still uses $r1 so it couldn't DCE that?
09:50 imirkin: i wonder if a single set can produce both the flag *and* the register dest... i don't think so
09:50 karolherbst: I just find it funny why the $r1 suddenly becomes 0x00000001 in the second set
09:50 imirkin: huh?
09:51 imirkin: no, it inlines the whole thing in
09:51 imirkin: the first arg changes to $r32 as well :p
09:51 karolherbst: ohhh
09:51 imirkin: i.e. it "sees through" the slct, which became a set
09:52 karolherbst: imirkin: $r1 might be used in another block
09:52 imirkin: right
09:52 imirkin: which is very sad
09:52 imirkin: esp since it could probably end up using the predicate instead
09:53 imirkin: but we're not extremely smart about it
09:53 imirkin: fyi there are also set_and/set_or/set_xor variants
09:53 imirkin: which allow you to do the set op condition
09:54 imirkin: so like a = (b == c) && x
09:54 imirkin: where x is some predicate
09:54 karolherbst: imirkin: https://gist.github.com/karolherbst/5a46d794a4cb115af69f
09:54 karolherbst: $r50 and $r51
09:54 karolherbst: mhh, okay
09:54 imirkin: lame.
09:55 imirkin: the mov's happen post-ra
09:55 imirkin: er
09:55 imirkin: during ra
09:55 karolherbst: ...
09:55 imirkin: because it has to condense the values into a block
09:55 karolherbst: these are stupid ones
09:55 imirkin: i.e. it has to be a sequential set of 4 regs
09:55 karolherbst: mhh okay
09:56 imirkin: this stuff should be handled better
09:56 imirkin: but i haven't had the heart to fix it up
09:56 karolherbst: well that's in the block where also $r62 is used
09:56 karolherbst: would be nice to reduce the load/stores there a bit
10:02 karolherbst: imirkin: there are 2382 movs in that shader...
10:02 imirkin: you're looking at the orbital explorer one? just let it go :)
10:02 karolherbst: :D
10:02 imirkin: it's gone.
10:02 karolherbst: ohh okay
10:03 imirkin: look at something useful
10:03 karolherbst: so they did something really stupid in it and replaced that with something better?
10:03 imirkin: like shaders from heaven or something
10:03 imirkin: oh, i dunno
10:03 imirkin: i'm just saying... not worth worrying about.
10:03 karolherbst: imirkin: I will take a look at shaders with a high gpr count, no idea if that makes sense, but reducing instruction counts seems easy enough, so I'd rather concentrate on shaders where I could accidentally also reduce gpr count
10:04 imirkin: you can make an apitrace of your favorite game
10:04 karolherbst: found some nice tesseract and unity ones
10:04 imirkin: and then run it with ST_DUMP_SHADERS=foo
10:04 imirkin: and then use split-to-files.py
10:04 imirkin: and then use the runner thing on it
10:04 karolherbst: I planned that for later
10:06 karolherbst: imirkin: those unity shaders are also used in games or mainly benchmarks?
10:06 karolherbst: or no clue?
10:06 imirkin: unity is an engine
10:06 imirkin: which i believe is used by a number of games
10:06 imirkin: but not AAA ones
10:07 imirkin: unigine tends to be used for simulations, i don't think any games use it
10:07 imirkin: unreal... i think some AAA games use it, not 100% sure
10:08 ravior: imirkin: It just hung while I was using the proprietary firmware.
10:08 imirkin: :(
10:08 imirkin: oh well, it was something to try
10:12 karolherbst: imirkin: I would like to have a way to check in which shader the gpu uses the most time :/ that would be somewhat useful
10:12 imirkin: yes, quite
10:12 imirkin: i think apitrace has some stuff to help with that
10:12 imirkin: i.e. timing each draw call
10:14 karolherbst: mhhh ls changed
10:14 karolherbst: weird
10:15 imirkin: karolherbst: btw, you may be interested in https://github.com/chrisbmr/Mesa-3D/commits/sched-wip -- calim's instruction scheduling branch
10:15 imirkin: 4 years old... would need a bit of rebasing, but i doubt things have changed that much
10:17 karolherbst: imirkin: can I also create those shaders without apitrace?
10:17 imirkin: karolherbst: which shaders?
10:17 karolherbst: like just run the game with ST_DUMP_SHADERS?
10:17 imirkin: oh yeah of course
10:17 karolherbst: okay, because some games are not apitraceable :D
10:17 imirkin: but apitrace has tooling to tell you how long things take
10:18 imirkin: sad. i was hoping he had something with images. oh well.
10:19 pmoreau: :-/ One of the passes seems to change some value type to TYPE_NONE (NV50_PROG_DEBUG=255: https://phabricator.pmoreau.org/P76)
10:20 imirkin: what line am i looking for?
10:20 karolherbst: saints row IV is a joke of a port ...
10:21 karolherbst: what the hell is going on with that game
10:21 pmoreau: Well, it sadly can't be seen in the trace
10:21 karolherbst: no cpu load
10:21 karolherbst: no gpu load
10:21 karolherbst: ...
10:21 karolherbst: and like 10 fps
10:21 imirkin: pmoreau: what evidence of failure am i looking for?
10:21 pmoreau: imirkin: the mad 64 still being there at the end, and not having been split
10:22 imirkin: oh
10:22 imirkin: that's expected iirc
10:22 imirkin: or did you add mad splitting?
10:22 pmoreau: Gonna link the source link
10:23 imirkin: oh, i bet it's the zero-insertion logic
10:23 imirkin: the zero it inserts is a plain 32-bit zero
10:23 imirkin: potentially TYPE_NONE, dunno
10:24 imirkin: or i could imagine the RA creating TYPE_NONE merged register nodes
10:24 imirkin: you don't want to rely on the TYPE of those things
10:24 imirkin: you want to rely on i->dType/i->sType
10:24 pmoreau: https://phabricator.pmoreau.org/diffusion/MESA/browse/spirv_1.0/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp;bc10731897ae2ad5ef5999c292ff0d22c2a98eb1$589
10:25 pmoreau: Well, when I ran GDB, the immediates had the correct type, but not the %r0d
10:25 imirkin: yeah, don't dick around with that stuff... just if i->dType == whatever
10:25 pmoreau: K
10:25 imirkin: the type of the value doesn't really matter...
10:25 imirkin: values are basically typeless
10:26 imirkin: i don't even know where reg.type is ever used
10:26 imirkin: maybe for some system values, dunno
10:26 pmoreau: Especially since getScratch() won't set it to any type, and passing it an instruction won't modify it either
10:27 imirkin: sure, but like things can be reinterpreted too
10:27 imirkin: values are basically typeless
10:27 imirkin: it's the instruction that defines what type the value is
10:28 pmoreau: Understood
10:28 imirkin: via i->sType and i->dType
10:28 imirkin: normally those are the same
10:28 imirkin: but in some rare cases they're not
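[Editor's note: "values are basically typeless" can be illustrated outside the compiler with a plain bit reinterpretation. This sketch assumes a 32-bit IEEE-754 float (checked by the static_assert); as_f32 and as_u32 are illustrative helpers, not Mesa functions.]

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

// Read the same 32 bits as a float, the way an f32 instruction would.
float as_f32(uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Read the same 32 bits as an unsigned integer, the way a u32 instruction would.
uint32_t as_u32(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

[The register holds 0x3f800000 either way; a u32 mov copies those bits and an f32 add consuming them sees 1.0.]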
10:28 karolherbst: imirkin: now I got like 3400 shaders :D does split debug search for duplicates?
10:28 imirkin: karolherbst: i don't remember.
10:31 karolherbst: imirkin: https://gist.github.com/karolherbst/2bdc049f1cd3f95a2b77 :D
10:31 imirkin: oh yeah... i have a bunch of fixes locally for my script
10:31 karolherbst: but still
10:31 karolherbst: no locals used?
10:31 imirkin: that's common
10:32 karolherbst: mhh okay
10:32 imirkin: local = lmem
10:32 karolherbst: ohh okay
10:32 imirkin: that gets used if you have to spill or if you have variable array accesses
10:32 imirkin: neither of which normal people do
10:32 karolherbst: okay
10:32 karolherbst: mhh but the gpr usage went pretty up with my changes here :/
10:33 pmoreau: Ah ah ah!! Even the conditional was wrong! `if (i->getSrc(0)->reg.type != TYPE_U32 || i->getSrc(0)->reg.type != TYPE_S32 || i->getSrc(1)->reg.type != TYPE_U32 || i->getSrc(1)->reg.type != TYPE_S32) return NULL`
10:33 pmoreau: Of course it would always return NULL
10:33 imirkin: pmoreau: yeah i wasn't sure what you were trying to achieve
10:34 imirkin: pmoreau: fyi there's a typeSizeof() and a isFloatType()
10:35 imirkin: hakzsam: fyi i've updated my atomic3 branch to just have ssbo/atomic again, images are off on their own now
10:35 pmoreau: Ah, nice :-)
10:36 karolherbst: 79 BBs in a 340 instruction shader :D
10:36 imirkin: pmoreau: so i guess if (typeSizeof() != 8 || isFloatType()) return NULL?
10:36 imirkin: karolherbst: but i bet most of the control flow goes away
10:36 imirkin: karolherbst: and just becomes predicated instructions
10:37 karolherbst: imirkin: but the compiler will optimize less, right?
10:37 pmoreau: imirkin: Right, or even if (isFloatType()) return NULL, since the split64BitOpPostRA starts by returning if typeSizeof() != 8
10:37 imirkin: karolherbst: well, the big thing is we don't have GCM
10:37 imirkin: but other than that, about the same
10:37 imirkin: pmoreau: ah right
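[Editor's note: the always-true conditional quoted above is a De Morgan slip. A minimal sketch, using a stand-in enum whose names mirror the ones in the chat but which is not the real nv50_ir DataType definition:]

```cpp
// Stand-in for the IR's type enum; only the names are borrowed from the chat.
enum Type { TYPE_NONE, TYPE_U32, TYPE_S32, TYPE_F32 };

// As written in the chat: no value equals both U32 and S32, so at least one
// of the two inequalities always holds and this rejects every input.
bool buggy_reject(Type t) { return t != TYPE_U32 || t != TYPE_S32; }

// What was presumably meant: reject only when t is neither U32 nor S32.
bool intended_reject(Type t) { return t != TYPE_U32 && t != TYPE_S32; }
```

[As suggested in the conversation, checking the instruction's type with typeSizeof() and isFloatType() avoids the per-value type test altogether.]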
10:38 pmoreau: Works better now O:-)
10:38 imirkin: awesome
10:38 pmoreau: Thanks!
10:38 imirkin: np
10:38 imirkin: pmoreau: if you had to gauge your % completion for SPIR-V
10:38 imirkin: where would you say you're at?
10:38 imirkin: like 5% or like 30%?
10:39 pmoreau: All features equal? And how would you define a feature?
10:39 imirkin: pmoreau: time to completion
10:39 pmoreau: Ugh…
10:39 pmoreau: Well, still have to get out of SSA…
10:40 pmoreau: That should take at least two months I think
10:41 imirkin: and control flow? :)
10:41 pmoreau: Dealing with all the decorations might take some time as well (alignments, fixed insn, etc.)
10:41 pmoreau: I got to run some simple if/else that had no phi node
10:41 imirkin: heheheh
10:41 pmoreau: https://phabricator.pmoreau.org/w/mesa/opencl_through_spirv_current_status/ and https://phabricator.pmoreau.org/w/mesa/testing_opencl_through_spirv/
10:41 pmoreau: If you're interested
10:42 imirkin: heh. those items are not of equal size :)
10:42 imirkin: the operators should be pretty easy to knock out
10:42 pmoreau: Sure they aren't
10:42 imirkin: and give you a sense of accomplishment
10:42 pmoreau: Right
10:43 pmoreau: Some of the builtins shouldn't be too hard either
10:43 CyberShadowL: Hi all, I'm getting an "Oops: 0002 [#1] PREEMPT SMP" on load.
10:43 karolherbst: can $r13 in "add ftz f32 $r13 neg $r15 $r13" be replaced by an immediate?
10:43 pmoreau: Array and structures should be almost supported already.
10:43 imirkin: CyberShadowL: please pastebin the whole thing
10:44 CyberShadowL: imirkin: http://dump.thecybershadow.net/5fdbe2c18150d8dc239c713c0dac92a8/nouveau-dmesg
10:44 imirkin: karolherbst: i dunno, check nv50_ir_target_nvc0 :p
10:44 CyberShadowL: This is with two GPUs, a GTX Titan and 8600 GTS
10:44 imirkin: [ 3.862334] BUG: unable to handle kernel paging request at ffff88110fa7cffc
10:44 imirkin: [ 3.862369] IP: [<ffffffffa0c868aa>] evo_wait+0x5a/0x130 [nouveau]
10:44 imirkin: impressive.
10:45 karolherbst: imirkin: well the blob does, so...
10:45 pmoreau: Oh!
10:45 pmoreau: Rings a bell
10:45 imirkin: CyberShadowL: did you have the blob loaded before nouveau?
10:45 pmoreau: Going to find the bug report for that
10:45 imirkin: er hm, probably not 3 seconds into the boot
10:45 CyberShadowL: Well, yes, before a reboot. I deleted it from the system and rebooted.
10:45 CyberShadowL: Do I need to power cycle? :)
10:45 imirkin: CyberShadowL: could you try booting with nouveau.config=NvForcePost=1 ?
10:46 CyberShadowL: OK, let me try that.
10:46 pmoreau: https://bugs.freedesktop.org/show_bug.cgi?id=82714
10:46 pmoreau: CyberShadowL: Is the Titan the main card?
10:46 CyberShadowL: Yes, pmoreau
10:47 CyberShadowL: By "main" you mean the one with monitors attached?
10:47 pmoreau: Could be the same bug as https://bugs.freedesktop.org/show_bug.cgi?id=82714
10:47 pmoreau: Right
10:48 imirkin: pmoreau: oh, so it was actually the posting that nouveau does that was breaking it? :)
10:48 pmoreau: Well… it seems to change from time to time :-D
10:48 pmoreau: When I initially tested, I don't think it was the case
10:49 pmoreau: Haven't been able to get back to it since I have been moving a lot, and still don't have a desktop computer to plug it back.
10:58 karolherbst: so, "82: mov u32 $r14 0x3f800000; 90: add ftz f32 $r13 $r14 neg $r2" =LoadPropagation=> "62: mov u32 $r13 0x3f800000; 70: add ftz f32 $r13 neg $r15 $r13"
10:58 karolherbst: imirkin: in which opt would you put add(imm, a) => add(a, imm)?
10:58 imirkin: uhhhh
10:58 CyberShadowL: imirkin: Still getting the oops with NvForcePost=1.
10:59 imirkin: karolherbst: LoadPropagation should try to swap orders
10:59 imirkin: CyberShadowL: sad. unfortunately i have no idea what can cause that.
10:59 imirkin: CyberShadowL: i'd recommend filing a bug on bugs.freedesktop.org xorg -> Driver/nouveau with the relevant info
10:59 imirkin: i.e. dmesg + lspci -nn -d 10de:
10:59 CyberShadowL: OK, I'll do that, thanks.
11:00 pmoreau: imirkin: Wouldn't it be the same as the one I pointed at?
11:00 karolherbst: imirkin: well the immediate is moved to src1 in loadpropagation, but the mov is still there
11:00 imirkin: pmoreau: could be, who knows
11:00 imirkin: karolherbst: should get DCE'd
11:00 imirkin: karolherbst: or perhaps it's not allowed if there's a neg? check the target :p
11:00 pmoreau: imirkin: Probably no one :-D
11:01 karolherbst: imirkin: blob does something like that: add ftz f32 $r4 neg $r2 0x3ca3d70a
11:01 karolherbst: ohh it is u32
11:01 karolherbst: mhh
11:01 imirkin: hm, the only thing is we don't allow saturation
11:02 imirkin: i touched that code recently, perhaps i messed it up
11:02 imirkin: try to figure out why it doesn't get inlined
11:02 karolherbst: and the inline happens in DCE?
11:02 imirkin: looks like add sat + limm is disallowed, but you're not saturating
11:02 imirkin: no
11:02 imirkin: inline happens in LoadPropagation
11:02 karolherbst: okay
11:26 CyberShadowL: imirkin: Sorry, just wanted to confirm - I'm filing the bug against the Xorg driver, even though the fault is in the kernel module?
11:26 imirkin: CyberShadowL: yes.
11:26 CyberShadowL: OK
11:26 imirkin: CyberShadowL: there is no separate component for the kernel driver, and historically what was in the xorg driver moved into the kernel
11:27 CyberShadowL: OK. If you hadn't directed me I'd probably have gone to bugzilla.kernel.org.
11:29 imirkin: yeah, we don't really pay attention to the kernel bugzilla
11:32 karolherbst: imirkin: mhh in the same shader it works for an add where the immediate is already at the second src and the mov gets eliminated, but for the other add only the srcs are swapped, no idea why
11:41 heiko: good evening
11:49 imirkin: karolherbst: figure out why?
11:49 imirkin: karolherbst: i recently messed around with the switching logic, i could have messed it up somehow
11:49 CyberShadowL: imirkin: It works after a power cycle. (I'm not going crazy, am I?)
11:54 imirkin: CyberShadowL: probably not
11:54 imirkin: CyberShadowL: sounds like blob did something that persisted across reboot that nouveau wasn't able to properly fix up
11:54 imirkin: or some state that nouveau assumed was ok but wasn't. dunno.
12:04 karolherbst: imirkin: k....
12:05 karolherbst: imirkin: mov $r1 0x5436; mov $r2 $r1; add $r3 neg $r4 $r2;
12:05 karolherbst: any idea what happens now? :D
12:05 imirkin: well, if it's a u32 add, i dunno if there's an imm version
12:05 karolherbst: nope
12:05 karolherbst: add $r3 neg $r4 $r1 happens
12:06 karolherbst: and then it is done with it
12:06 imirkin: right, LocalCSE should do that
12:06 imirkin: ohhhhh
12:06 imirkin: i see
12:06 imirkin: something is generating a mov
12:06 karolherbst: yes
12:06 imirkin: one of the opts, like ConstantFolding
12:06 karolherbst: maybe
12:06 imirkin: and normally we use CSE to fix that up
12:06 imirkin: but that happens outside of your loop
12:06 karolherbst: okay, then I will check where that comes from
12:06 imirkin: try adding LocalCSE into the loop -- it'll be mega-slow, but would prove the theory
12:06 karolherbst: k
12:08 karolherbst: mhh no, doesn't change it
12:09 karolherbst: ohhh wait
12:09 karolherbst: imirkin: the mov is u32 and the add is f32
12:09 karolherbst: could that cause troubles?
12:11 mwk: karolherbst: there's only one kind of 32-bit mov anyhow
12:11 karolherbst: ohhh okay
12:11 imirkin: karolherbst: no.
12:12 karolherbst: gdb annoyed me a bit anyway, so I will do another debugging round :/
12:17 karolherbst: imirkin: ConstantFolding does something odd
12:18 imirkin: karolherbst: it generates MOV's
12:18 imirkin: that's expected.
12:18 karolherbst: mov u32 %r720 0x00000000; mov u32 %r721 0x3f800000; mad ftz f32 %r722 %r705 %r720 %r721; add ftz f32 %r742 %r722 neg %r734
12:18 karolherbst: so the mad becomes a mov
12:18 imirkin: right
12:18 mwk: ... shouldn't that be mov b32?
12:19 imirkin: mwk: this is nv50 ir
12:19 imirkin: mwk: it prints whatever type happens to be on there
12:19 mwk: ah
12:19 karolherbst: imirkin: and which pass should be able to clean that up?
12:19 karolherbst: localcse?
12:19 imirkin: karolherbst: LocalCSE i think
12:19 imirkin: or... hm
12:20 imirkin: karolherbst: CopyPropagation
12:20 imirkin: stick that at the end of the loop
12:20 karolherbst: yep
12:20 karolherbst: works
12:20 imirkin: cool
12:20 karolherbst: after dead code?
12:21 karolherbst: nice, 3 instructions down
12:29 karolherbst: imirkin: by the way, it was a nice idea to generate all those shaders :D
12:29 karolherbst: will run it on some aaa titles and then I have a really big database here
12:30 karolherbst: imirkin: ohh the black gun issue in shadow warrior is still there?
12:30 imirkin: :)
12:31 imirkin: i have some patches that aren't ready
12:31 karolherbst: game runs awesome by the way
12:31 karolherbst: highest settings and really smooth :)
12:31 karolherbst: good port
12:41 karolherbst: imirkin: AssertionError: tessellation evaluation :/
12:41 karolherbst: with unigine
12:42 karolherbst: ahh well nvm
12:42 imirkin: in split-to-files?
12:42 imirkin: yeah, the output changed. marek sent a patch.
12:42 karolherbst: ohh okay
12:44 karolherbst: mhh some spilling issues in some shadow warrior shaders, maybe that is related to the black gun stuff?
12:45 imirkin: no
12:45 imirkin: i know what's going on there
12:45 imirkin: just unclear how to fix it
12:45 imirkin: it's an issue in the RA
12:45 karolherbst: ohh okay
12:48 imirkin: basically the fixed regs of the division call are breaking it
12:48 karolherbst: imirkin: mhh but when I run them through shader_runner myself, they don't crash :/
12:52 karolherbst: ohh wait, I should run it with -1
13:09 urjaman: this is a bit "what?" ... current behavior: browsing tumblr crashes firefox (because "Crash Annotation GraphicsCriticalError: |[0][GFX1-]: GLContext is disabled due to a previous crash.") ....
13:09 urjaman: and [198593.549853] nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 1 [DRM] subc 0 mthd 0060 data 80000002
13:10 urjaman: (that could be associated or not... i'm running KSP also at the moment)
13:16 imirkin: tesla gpu?
13:17 urjaman: G96 (9400 GT)
13:17 urjaman: same as before when i was here ...
13:17 karolherbst: imirkin: by the way: the loop isn't noticeable in a 22s shader-db run
13:17 urjaman: (obviously you cant remember but anyways)
13:19 imirkin: urjaman: there's an undiagnosed issue where something gets messed up sometimes and you see those CACHE_ERROR messages
13:20 urjaman: this still does all kinds of stuff, like DMA_PUSHER, DATA_ERROR and CACHE_ERROR
13:20 imirkin: yes
13:20 imirkin: all related
13:20 imirkin: probably DMA_PUSHER error 406040 or 400040?
13:21 imirkin: the fifo somehow gets out of sync, and sadness ensues
13:21 urjaman: behaviour varies from firefox crashing to Xorg crashing (sysrq kill all tasks wakes it up) to just all graphics gone (sysrq reboot is what i do at that point)
13:21 urjaman: yep 406040
13:23 urjaman: it's just very curious how now i cant seem to be able to browse _tumblr_ (for more than 5min) without a firefox crash... works hours as long as i dont tumblr -.-
13:23 imirkin: moral of the story: don't use tumblr
13:23 urjaman: ha ha :P
13:25 urjaman: btw i've disabled vdpau (by taking the _nouveau library out of its place) and that makes video playback a little bit less crashy, but i guess this is something more generic and vdpau just happens to be talking to gpu lots...
13:26 karolherbst: imirkin: I found a case where running algebraicopt once again causes +3 instructions, +2 max gpr
13:27 imirkin: karolherbst: it can happen
13:27 imirkin: karolherbst: it's not perfect.
13:27 imirkin: karolherbst: but it does a lot more good than harm
13:27 karolherbst: yeah, but maybe I figure out why
13:27 karolherbst: imirkin: yeah I know, but sometimes running it twice instead of a third time is better :/
13:28 imirkin: lol
13:30 karolherbst: for some reason there is some add => mad conversion going on
13:30 karolherbst: with no benefit
13:30 karolherbst: sometimes it even adds a few movs
13:31 karolherbst: imirkin: "add u32 %r467 %r349 0x00000010" becomes "mov u32 %r847 0x00000010; mov u32 %r863 0x00000030; mad u32 %r467 %r339 %r863 %r847"
13:32 karolherbst: but
13:32 imirkin: it doesn't know about IMAD32I
13:32 imirkin: right.
13:33 imirkin: wouldn't help here but there's also a ISCADD which is a shift + add
13:33 karolherbst: I will disable all other optimizations after the third algebraic
13:33 karolherbst: maybe that shows what the real issue is
13:33 imirkin: no, it's working as intended
13:34 imirkin: that %r349 is a mul, right?
13:34 karolherbst: yes
13:34 karolherbst: mul u32 %r349 %r339 0x00000030
13:35 imirkin: right, so it takes a mul + add and makes it into a mad
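The mul + add to mad rewrite discussed here can be checked with a small sketch. It models u32 arithmetic with a 32-bit mask and reuses the register names from the quoted instructions; the helper functions are illustrative assumptions, not part of the nouveau compiler.

```python
MASK32 = 0xFFFFFFFF  # model u32 wraparound on Python's unbounded ints


def mul_u32(a, b):
    return (a * b) & MASK32


def add_u32(a, b):
    return (a + b) & MASK32


def mad_u32(a, b, c):
    # multiply-add: a * b + c, truncated to 32 bits
    return (a * b + c) & MASK32


def before(r339):
    # mul u32 %r349 %r339 0x30 ; add u32 %r467 %r349 0x10
    r349 = mul_u32(r339, 0x30)
    return add_u32(r349, 0x10)


def after(r339):
    # mov %r847 0x10 ; mov %r863 0x30 ; mad u32 %r467 %r339 %r863 %r847
    r847 = 0x10
    r863 = 0x30
    return mad_u32(r339, r863, r847)
```

Both forms compute the same value; the difference is only in instruction count, since the mad form needs its immediates moved into registers first.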
13:35 karolherbst: yes
13:36 karolherbst: but leaving that add would mean a higher optimization
13:36 karolherbst: in this case
13:38 imirkin: that's not the pass's concern
13:41 karolherbst: that's not really fixable, right?
13:49 karolherbst: imirkin: do you have a patch or do you know any way to list all hurt shaders?
13:50 karolherbst: I have a lot where the instruction count gets higher and I want to check why
13:52 imirkin: yeah, just stick a print in
13:52 karolherbst: mhh but I think this will be the case for every shader
13:52 imirkin: hold on
13:52 karolherbst: this add+mul => mad thing
13:52 imirkin: karolherbst: this is my current patch: http://hastebin.com/qoyonidugo.pl
13:52 imirkin: just uncomment the print's you're interested in
13:55 karolherbst: imirkin: https://gist.github.com/karolherbst/ffcbc373cb2f1025c31f :/ that doesn't look that good anymore
13:56 karolherbst: though I guess gprs have a smaller perf impact than instruction count
13:56 imirkin: one can hope
13:57 karolherbst: okay, found one real bad one
14:04 karolherbst: I wonder if this vfetch export stuff can be optimized in a smart way :/
14:05 karolherbst: one eliminated mov caused 3 moves to be added in the FlatteningPass
14:09 karolherbst: but I think I should not worry about movs :D
14:13 wvuu: hello
14:14 wvuu: I am running dolphin emu with nouveau and check out the LSD results --> https://imgur.com/a/jBWDI
14:14 wvuu: I have other apps such as gimp, mplayer, etc, nothing shows that type of glitches.
14:16 karolherbst: imirkin: I wonder what the result of this is "set ftz u32 $r2 ge f32 $r0 neg $r0"
14:16 koz_: Hi friends - how good would the performance of Guild Wars 2 be on the latest Nouveau?
14:16 karolherbst: koz_: with kepler good enough
14:18 karolherbst: stupid neg :/
14:18 koz_: karolherbst: I have a 680, so that should be OK, right?
14:18 karolherbst: should be
14:19 koz_: OK, thanks. Clearly my performance issues aren't related to Nouveau. I have this weird thing with fonts in GW2.
14:19 karolherbst: you might get some reclocking issues though, but you can try out some patches in such cases
14:19 koz_: Like, they don't display properly *at all*.
14:19 koz_: karolherbst: Oh, I reclock just fine.
14:19 karolherbst: k
14:19 koz_: Could the fonting issue be Nouveau-related somehow, or is this an issue elsewhere in the stack?
14:19 karolherbst: koz_: is that with wine?
14:19 koz_: karolherbst: Yep - GW2 has no native client for GNU/Linux.
14:20 karolherbst: then it could also be a wine issue
14:20 koz_: (they have a Mac one, but it's apparently rather shitty)
14:20 karolherbst: koz_: you could try out gallium-nine and see if that's better
14:20 koz_: I suspect as much - I've reported it, but haven't heard back. So pursuing other options.
14:20 koz_: Gallium-nine?
14:20 karolherbst: yeah, like native d3d9 on linux
14:20 karolherbst: no d3d9-> opengl translation
14:20 imirkin: wvuu: what version of mesa?
14:20 karolherbst: you get also better quality, not only better performance
14:21 koz_: karolherbst: I have not heard of this. Could you link me?
14:21 wvuu: hold on
14:21 karolherbst: koz_: https://wiki.ixit.cz/d3d9
14:21 imirkin: wvuu: i recently fixed a bug that affected dolphin a lot like that
14:21 koz_: Thanks - I'll investigate it!
14:21 wvuu: 11.1.0
14:21 imirkin: try 11.1.1 -- that has the fix
14:22 wvuu: not in portage tree :(
14:22 imirkin: ok
14:22 imirkin: it's in my portage tree...
14:22 wvuu: ok
14:22 imirkin: are you sure it's not there?
14:22 wvuu: syncing
14:23 imirkin: https://packages.gentoo.org/packages/media-libs/mesa
14:23 imirkin: 11.1.1 added 10 days ago
14:23 karolherbst: imirkin: can neg(and(a, 0x1)) be somewhat optimized?
14:23 imirkin: heh
14:24 wvuu: do you know by any chance what's the nature of the error? I am curious
14:24 imirkin: wvuu: i got too excited with an opt? :)
14:24 imirkin: wvuu: http://cgit.freedesktop.org/mesa/mesa/commit/?h=11.1&id=1415b6b0aebd37f9d472a2c24b9ce603023d0475
14:24 karolherbst: imirkin: a is either true or false in my case
14:24 wvuu: ha ha ha!!!
14:24 karolherbst: I saw that several times already
14:24 imirkin: karolherbst: incredibly dumb right? coz it's a no-op
14:24 wvuu: lol
14:25 imirkin: karolherbst: if you can detect what's going on, you could make an algebraic opt
14:25 karolherbst: usually I am pretty weak with bit operations :D
14:25 imirkin: karolherbst: no time to learn like the present!
14:26 karolherbst: yeah I know
14:26 koz_: imirkin: Have the STK people bothered to do anything with regard to the whole issue I reported? Have you guys had any contact/patches from them, or am I hoping for too much?
14:26 imirkin: koz_: can you remind me what issue you reported?
14:27 imirkin: (i remember it was something, but can't remember what)
14:27 koz_: imirkin: https://github.com/supertuxkart/stk-code/issues/2386#issuecomment-161967394
14:27 imirkin: i did semi-recently fix an issue where we were crashing sometimes due to indirect draws
14:27 karolherbst: imirkin: still I don't get why that is a nop? 0xffffffff => -1, 0 .... okay, I get it
14:27 imirkin: but that should be included in 11.1.1 as well
14:27 karolherbst: :D
14:27 imirkin: karolherbst: ;)
14:28 karolherbst: I usually do stuff like that in python, but python does stupid things
14:28 imirkin: koz_: hm, random corruption? i'd need an apitrace to be able to investigate.
14:28 koz_: How would I give you one of those?
14:28 imirkin: koz_: https://github.com/apitrace/apitrace
14:28 karolherbst: imirkin: is there something like source is result of boolish operation?
14:29 karolherbst: so we know the source is either 0 or -1?
14:29 imirkin: karolherbst: no, we don't know that. but if it comes out of a OP_SET you know it's good
14:29 imirkin: since that's all that OP_SET can produce
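A minimal sketch of why neg(and(a, 0x1)) is a no-op when a comes from a comparison, assuming the comparison yields either 0 or 0xffffffff (-1) as stated above. Python integers are unbounded, so the helpers mask to 32 bits explicitly; the function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
TRUE32 = 0xFFFFFFFF   # -1 as a 32-bit pattern
FALSE32 = 0x00000000


def and_u32(a, b):
    return a & b & MASK32


def neg_u32(a):
    # two's-complement negation truncated to 32 bits
    return (-a) & MASK32


def neg_and_one(a):
    return neg_u32(and_u32(a, 0x1))
```

For the two boolean patterns the result equals the input, so the pair of instructions can be dropped. For any other input (for example 2) the result differs, which is why the source has to be known to be 0 or -1 first.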
14:29 karolherbst: yeah
14:29 koz_: imirkin: Alrighty, I'll try and get you one. I should report this via the bugtracker I assume?
14:29 karolherbst: but there might be other ops
14:29 karolherbst: ;)
14:29 imirkin: koz_: first make sure that replaying the trace you see the same issue
14:30 imirkin: karolherbst: you're talking about value-range propagation (aka "VRP"). it's a tricky business.
14:30 karolherbst: yeah, I was thinking about that several times already
14:30 karolherbst: but I was thinking about something simpler
14:30 karolherbst: like
14:30 karolherbst: hasBooleanResult(src->op)
14:31 imirkin: ah yeah, but that misses stuff
14:31 imirkin: for example
14:31 imirkin: OP_NEG -- that doesn't have a boolean result
14:31 imirkin: BUT
14:31 imirkin: what if you knew that its input was only ever 0 or 1
14:31 imirkin: then all of a sudden it has a boolean result
14:31 karolherbst: I see
14:31 koz_: imirkin: What do you mean 'replaying the trace'? I don't see any such option in its usage instructions.
14:31 imirkin: but your thing would be fine too -- that would catch OP_SET and its friends
14:31 imirkin: OP_SET_AND, OP_SET_OR, OP_SET_XOR
14:31 karolherbst: hasBooleanResult(src->op, hasBooleanResult(src->src->op)) ;)
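The recursive check being discussed could look roughly like the sketch below. It works on a toy instruction class invented for this example (the `Instr` type, the op names as strings, and both helper functions are hypothetical and are not the nv50_ir API). "Boolean" here means the value is 0 or 0xffffffff; "bit" means 0 or 1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SET_OPS = {"set", "set_and", "set_or", "set_xor"}


@dataclass
class Instr:
    op: str                            # e.g. "set", "neg", "and", "imm", "reg"
    srcs: Tuple["Instr", ...] = ()     # defining instructions of the sources
    imm: Optional[int] = None          # only used when op == "imm"


def is_imm(insn, value):
    return insn.op == "imm" and insn.imm == value


def has_bit_result(insn):
    """True if the value is known to be 0 or 1."""
    if insn.op == "imm":
        return insn.imm in (0, 1)
    if insn.op == "and":
        # and(x, 1) is 0 or 1 no matter what x is
        return any(is_imm(src, 1) for src in insn.srcs)
    return False


def has_boolean_result(insn):
    """True if the value is known to be 0 or 0xffffffff."""
    if insn.op in SET_OPS:
        return True
    if insn.op == "imm":
        return insn.imm in (0, 0xFFFFFFFF)
    if insn.op == "neg":
        # neg of a 0/1 value gives 0/-1
        return has_bit_result(insn.srcs[0])
    return False
```

This only looks one or two definitions deep. Anything it cannot prove is reported as not boolean, which is the conservative answer; a full value-range propagation pass would cover more cases.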
14:31 imirkin: koz_: glretrace
14:32 koz_: imirkin: Ah, right. OK.
14:32 koz_: Will give it a go and see what I find.
14:32 imirkin: i.e. make sure that the thing you're giving me shows the issue you're seeing
14:32 imirkin: coz if it doesn't, then it's all for naught
14:32 koz_: I see.
14:32 imirkin: koz_: also for my sanity (and your bandwidth) please use a small window when making the trace
14:32 imirkin: these things get pretty big at 2560x1440
14:33 karolherbst: imirkin: I have to optimize the and away, right?
14:33 imirkin: karolherbst: well, you can't just "optimize it away"
14:33 imirkin: but you can detect a certain algebraic sequence
14:33 karolherbst: because something else can optimize neg(set(...))
14:33 karolherbst: yeah I meant after I detect the case
14:33 imirkin: but that's the thing
14:33 imirkin: set doesn't produce 1/0
14:33 imirkin: it produces -1/0
14:33 koz_: imirkin: So *don't* run it full-screen?
14:33 imirkin: so the AND will change its value
14:33 imirkin: koz_: no :p
14:34 karolherbst: ohhh
14:34 koz_: imirkin: OK, low-res, gotcha.
14:34 karolherbst: right
14:34 imirkin: unless the issue only happens when you're full-screen. in that case we're sunk.
14:34 imirkin: koz_: basically keep in mind that my GPU is about 1000x slower than yours
14:34 imirkin: koz_: so when i go to replay, i don't want to have to sit there for half an hour :)
14:34 koz_: imirkin: Really? That big a difference? Woah.
14:34 imirkin: well, i might be exaggerating
14:35 karolherbst: the 680 is pretty fast though
14:35 imirkin: but i have a slow GF108
14:35 koz_: I guess your point is 'not everyone has a 680 to tinker on'.
14:35 imirkin: which is the slowest fermi of them all
14:35 koz_: (should I send you guys one?)
14:35 karolherbst: well my gpu is pretty close
14:36 koz_: karolherbst: Yeah, I recall you run some monster mobile GPU.
14:38 karolherbst: imirkin: okay, so the thing is neg(add(set, 1)) is set, mhh, but I always have to start with the last instruction in a pattern, so I need to check the neg if the source is neg(bool, 1)
14:38 karolherbst: *add(bool, 1)
14:41 karolherbst: imirkin: does it also only apply to floats?
14:41 karolherbst: ohh wait
14:51 imirkin: yes.
15:14 karolherbst: seems like somebody knows what he is doing: 398: not $p0 mul ftz f32 $r1 $r1 340282346638528859811704183484516925440.000000
15:14 karolherbst: :D
15:16 karolherbst: imirkin: looks good? https://github.com/karolherbst/mesa/commit/cbb074f815d3d702b5aa6814a212709308dbaa73
15:18 wvuu: after compiling new mesa, is there a need to recompile any other X related package?
15:20 imirkin: wvuu: nope
15:20 imirkin: karolherbst: that's infinity i think
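The immediate can be checked by looking at its f32 bit pattern. The sketch below uses only the standard `struct` module; the helper name is an assumption for this example. It suggests the constant is the largest finite f32 (0x7f7fffff) rather than the infinity pattern (0x7f800000).

```python
import struct


def f32_bits(x):
    # Pack as a little-endian float32, then reinterpret the bytes as uint32.
    return struct.unpack("<I", struct.pack("<f", x))[0]


IMMEDIATE = 340282346638528859811704183484516925440.0  # from the shader dump

immediate_bits = f32_bits(IMMEDIATE)
infinity_bits = f32_bits(float("inf"))
```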
15:20 wvuu: is the 'xvmc' USE flag relevant to nouveau?
15:20 wvuu: I always thought it was an ancient thingy.
15:21 karolherbst: imirkin: that change in saints row iv shaders: total instructions in shared programs : 255227 -> 245545 (-3.79%)
15:21 karolherbst: ....
15:21 imirkin: wvuu: yes, but probably not to your gpu -- it's only useful for NV30-G98
15:21 imirkin: karolherbst: very cool
15:21 karolherbst: will run the game and see if anything looks odd
15:22 karolherbst: imirkin: and total gprs used in shared programs : 25536 -> 25410 (-0.49%)
15:22 wvuu: I see mad old
15:22 wvuu: goes for a pizza
15:22 karolherbst: I hope there is more stuff like that in that game :D
15:23 imirkin: grrr! SULD.P doesn't seem to work on nvc0 :(
15:32 karolherbst: imirkin: it also helped bioshock quite a lot
15:32 karolherbst: and I bet it will help each eon game
15:36 wvuu: imirkin: works now
15:38 imirkin: wvuu: cool :)
15:41 wvuu: I was lucky to find you and that you knew exactly the problem.
15:41 karolherbst: imirkin: which ops also generate bools, set, slct, ..?
15:41 wvuu: that doesn't happen with m$$$
15:46 karolherbst: mhhh sub(abs($r0), floor(abs($r0)))
15:48 imirkin: that's.... trunc(abs($r0))
15:48 imirkin: which can be done with a single cvt
15:49 karolherbst: why trunc? I thought trunc and floor are somewhat the same, but different
15:50 karolherbst: doesn't sub(abs($r0), floor(abs($r0))) just remove the integer part of $r0 leaving the floating point part?
15:52 karolherbst: ohh right, trunc was rounding towards 0
15:53 mwk: karolherbst: depends on what you consider to be the integer part of -1.5
15:53 karolherbst: :D
15:53 karolherbst: right
15:53 karolherbst: but you got what I meant, right?
15:53 mwk: and using floor results in annoying loss of precision if the number is negative and close to 0
15:53 mwk: while using trunc always gives you exact results
15:54 mwk: as in, you can put the integer and float parts back together and get your number back
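The difference mwk describes can be seen in a short sketch. It uses Python floats (64-bit) rather than the shader's f32, so the exact threshold differs, but the effect is the same; the two helper names are made up for illustration. Note that in the quoted shader the argument is abs($r0), which is never negative, so floor and trunc agree there.

```python
import math


def split_floor(x):
    # integer part rounds towards minus infinity
    integer = math.floor(x)
    return integer, x - integer


def split_trunc(x):
    # integer part rounds towards zero
    integer = math.trunc(x)
    return integer, x - integer


tiny_negative = -1e-20
floor_int, floor_frac = split_floor(tiny_negative)
trunc_int, trunc_frac = split_trunc(tiny_negative)
```

With floor, the fractional part of a tiny negative number is rounded up to 1.0 and the original value can no longer be recovered by adding the parts back together. With trunc, the fractional part is the number itself.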
15:54 karolherbst: mwk: well I got "floor ftz f32 $r2 abs $r0; sub ftz f32 $r6 abs $r0 $r2" and $r0 could be anything
15:56 karolherbst: I still don't see how that can be trunc(abs($r0))
15:56 imirkin: er
15:56 imirkin: did i say trunc?
15:56 karolherbst: yes
15:57 imirkin: i meant fract
15:57 karolherbst: :D
15:57 karolherbst: ahh
15:57 karolherbst: yeah, that makes sense
15:57 imirkin: which, actually, we can't do
15:57 karolherbst: mhh
15:57 karolherbst: sad
15:57 imirkin: yes.
15:57 karolherbst: why not?... I mean,... ugh
15:57 karolherbst: wtvr :D
15:57 karolherbst: I have that stuff several times in that shader here
15:59 glennk: the compiler doesn't do gvn or cse does it?
15:59 imirkin: glennk: it does CSE
15:59 imirkin: mostly local, and some fake-o global
16:00 imirkin: i.e. if all of the sources of a phi node produce identical values, it will replace the phi node with that value
16:00 karolherbst: imirkin: so software frac will produce this floor, sub thing? :/ meh
16:01 imirkin: yes.
16:01 imirkin: unless mwk finds a frac op we missed
16:02 imirkin: mwk: btw, looks like SULD.P is broken on fermi :( that makes me very sad.
16:02 imirkin: at least SUST.P works
16:02 jayhost: I'd like to help gm107 reclock but pmoreau says nobody's working on it. Can I start Mmio trace?
16:03 imirkin: jayhost: what's needed is people who will work on it, not people who just have the hardware
16:03 imirkin: although having the hardware is an important first step
16:04 karolherbst: imirkin: add ftz f32 $r13 $r6 $r6; add ftz f32 $r1 $r1 neg $r13... can this be done as mad?
16:04 karolherbst: looks like it
16:04 karolherbst: or was there no neg for mad?
16:04 imirkin: - r6 * 2 + r1 -- should be possible.
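A quick sketch of the equivalence, ignoring the ftz (flush-to-zero) modifier and using Python floats instead of f32. The function names are illustrative assumptions. Doubling a float is exact (barring overflow), so both forms round only once, in the final addition.

```python
def two_adds(r1, r6):
    # add ftz f32 $r13 $r6 $r6 ; add ftz f32 $r1 $r1 neg $r13
    r13 = r6 + r6
    return r1 + (-r13)


def one_mad(r1, r6):
    # mad with a negated source: (-r6) * 2 + r1
    return (-r6) * 2.0 + r1
```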
16:05 jayhost: @imirkin is it a job that requires many people and many hours?
16:05 imirkin: jayhost: dunno about many people, but def many hours :)
16:05 karolherbst: yeah
16:06 karolherbst: that might work actually
16:06 karolherbst: imirkin: do you think if we optimize add(a, a) to mul(a, 2) that other passes will do the mad stuff automagically? :D
16:07 imirkin: actually we optimize mul(a, 2) -> add(a, a) ;)
16:07 karolherbst: :D
16:07 karolherbst: mhh and why?
16:07 imirkin: add is faster than mul?
16:07 karolherbst: ohhh
16:07 karolherbst: so 2 adds can be faster than mad+neg?
16:08 imirkin: i dunno. ask mwk.
16:11 karolherbst: mwk: soo, any clue?
16:17 andril: hello
16:18 andril: do i need any additional drivers for this device? 01:00.0 3D controller: NVIDIA Corporation GF117M [GeForce 610M/710M/820M / GT 620M/625M/630M/720M] (rev a1) - in debian 8
16:21 karolherbst: imirkin: max($r1, 0x3a83126f) == $r63, that is always false, right?
16:22 karolherbst: I am not sure about the instructions though
16:22 karolherbst: https://gist.github.com/karolherbst/c1086ae5400dd008aea9
16:22 imirkin: >>> struct.unpack("f", struct.pack("I", 0x3a83126f))
16:22 imirkin: (0.0010000000474974513,)
16:22 jayhost: I'll start envtools/mmio on gtx750ti and try to learn me something.
16:23 imirkin: karolherbst: so... yeah. i think that's always false.
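The reasoning can be spelled out with the same `struct` trick. This sketch assumes that $r63 reads as zero and that the comparison is a float equality; neither is confirmed by the log, and the helper names are hypothetical.

```python
import struct


def bits_to_f32(bits):
    # Reinterpret a 32-bit pattern as a float32 value.
    return struct.unpack("<f", struct.pack("<I", bits))[0]


LOWER_BOUND = bits_to_f32(0x3A83126F)  # roughly 0.001


def clamped_equals_zero(r1):
    # max(r1, 0.001) is at least 0.001 for ordinary inputs, so it never equals 0
    return max(r1, LOWER_BOUND) == 0.0
```

Since the clamped value is always at least about 0.001, the equality with zero fails for every ordinary input, and a NaN input fails the comparison as well.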
16:23 imirkin: andril: i have no idea what debian 8 ships with. however you're unlikely to get too much out of that GPU even with the latest kernel/everything -- no reclocking on fermi supported for now
16:23 imirkin: andril: you're probably better off sticking with the intel igp
16:24 imirkin: i guess it depends on your needs
16:24 karolherbst: imirkin: what does it actually mean when $p0 is set to false? or what is $p0?
16:24 imirkin: karolherbst: predicate register
16:24 karolherbst: ahhh okay
16:24 imirkin: can be used to predicate other instructions
16:24 andril: imirkin, so the defaults loaded is ok - i just get issues with apps like docky red boxes
16:25 imirkin: andril: are you actually using nouveau though? unless you've got prime configured, you're just using the intel igp
16:26 imirkin: if you're seeing missing textures, chances are you're missing libtxc_dxtn
16:26 karolherbst: imirkin: maybe I will take a look at value-range stuff, but mhh, I guess this is work for several weeks
16:26 imirkin: karolherbst: yeah it's tricky business
16:30 karolherbst: imirkin: where is the tessellation patch for shader-db? then I would also add the unigine stuff locally
16:31 andril: imirkin, https://imgur.com/2bqPoAA
16:35 karolherbst: imirkin: found it
16:48 imirkin: andril: is there a problem?
16:48 imirkin: (not sure what it's supposed to look like)
16:49 imirkin: andril: either way, run "glxinfo" - i suspect you'll see that you're using intel
17:39 andril: sorry for the delay imirkin http://pastebin.com/37fPKQHD - the red boxes were the issue
17:40 imirkin: andril: and as you can clearly see, nouveau has nothing to do with it
17:40 imirkin: i note that's a pretty ancient version of mesa, i might recommend upgrading
17:41 imirkin: btw, i have _no clue_ what red boxes you're talking about...
22:09 imirkin: images are kinda-sorta getting there. some things are passing. i still have addressing all wrong for image loads... blob does something funny with all 3 coords.
23:24 Jayhost: echo 64000 > /sys/kernel/debug/tracing/buffer_size_kb
23:24 Jayhost: Enable the mmio tracer, and start recording the log:
23:24 Jayhost: root@Ein:~# echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
23:24 Jayhost: Eh I was trying to ask if the commands on the ubuntu MMIO were outdated