07:35 mupuf_: imirkin_: shall I plug a kepler and verify that Nicolai's patchset for enhanced layout works for nouveau too?
07:41 imirkin: mupuf_: i plan on working out what's necessary over the weekend
07:41 imirkin: i doubt it'll work as-is, but .. who knows
07:41 mupuf_: imirkin_: ok, I will let you do it since you already worked on it
07:41 imirkin: but for now ... bed.
07:43 mupuf_: hehe, sleep tight
12:12 karolherbst: uhh, that mad folding also works with a predicate on gk110+ :)
12:17 karolherbst: "total instructions in shared programs : 2818120 -> 2807943 (-0.36%)" :)
12:18 mupuf: :)
12:18 karolherbst: also implemented for gk110 and gm107 isa :)
12:18 karolherbst: maybe due to the predicate it also helps more on gk110+, but well
12:19 karolherbst: is there a cheaper way of doing "(i->src(2).mod | Modifier(NV50_IR_MOD_NEG)) != Modifier(NV50_IR_MOD_NEG)" ?
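One equivalent way to write the check asked about above is to mask out the NEG bit and test whether anything remains. The sketch below uses plain bit flags rather than the real Modifier class; the flag names and values are hypothetical stand-ins, and whether the masked form is actually cheaper depends on the compiler.

```cpp
#include <cstdint>

// Hypothetical stand-ins for source-modifier flags (values are illustrative,
// not taken from the Mesa headers).
constexpr std::uint8_t MOD_ABS = 1u << 0;
constexpr std::uint8_t MOD_NEG = 1u << 1;
constexpr std::uint8_t MOD_SAT = 1u << 2;

// Form quoted in the log: OR in NEG, then compare against NEG.
// True when any modifier other than NEG is set.
constexpr bool has_other_mods_or(std::uint8_t mod) {
    return (mod | MOD_NEG) != MOD_NEG;
}

// Masked form: clear the NEG bit and test whether any bit is left.
constexpr bool has_other_mods_mask(std::uint8_t mod) {
    return (mod & static_cast<std::uint8_t>(~MOD_NEG)) != 0;
}
```

Both functions return false for "no modifier" and for "NEG only", and true as soon as any other bit is set.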
12:25 karolherbst: imirkin: if I have mov $r2 0x24242424; mad $r0 $r1 neg $r2 $r0 and then I just do mad->setSrc(1, mov->getSrc(0)); do I have to apply the mod on the immediate or will that be done automatically?
12:33 karolherbst: ohh
12:33 karolherbst: gm107 can also do that for U32
12:33 karolherbst: *S32
12:41 karolherbst: hakzsam: I heard it is possible to fake another gpu while running shader-db for nouveau now? Or do I need a patch for this?
12:43 mupuf: karolherbst: the commit tells you how to do it
12:43 mupuf: NV50_CHIPSET=... IIRC
12:43 karolherbst: right, but I don't want the commit
12:43 mupuf: ?
12:44 karolherbst: *find
12:44 karolherbst: NV50_PROG_CHIPSET ahh
12:46 karolherbst: mhh, doesn't seem to have worked out though
12:46 karolherbst: https://cgit.freedesktop.org/mesa/mesa/commit/src/gallium/drivers/nouveau?id=ac859d68f474694f9cb1de007997c936d735a48c
12:46 karolherbst: titles are important :D
12:47 karolherbst: mhh, tried "NV50_PROG_CHIPSET=f0 ./run -d 1 shaders/ ../nvidia_shaderdb/ > old_f0" but it didn't work
12:48 karolherbst: or the generated stuff is really the same
12:48 karolherbst: which I highly doubt
12:50 karolherbst: NV50_PROG_CHIPSET=241 .. yay
12:50 karolherbst: ahh 0xf1 works
12:53 hakzsam_: try with 0xf0 you will see it works :p
12:54 karolherbst: yeah, I already figured that out
12:54 karolherbst: I am sure the code emitter will crash for either gk110 or gm107
12:55 hakzsam_: because of what?
12:56 karolherbst: post ra mad folding
12:56 karolherbst: I even enabled that for s32 on gm107
12:56 hakzsam_: ah yeah
12:56 karolherbst: mhh
12:56 karolherbst: 0x107 and 0xf1 generate the same binary
12:56 karolherbst: ohhh right
12:57 karolherbst: 117 is maxwell
12:57 hakzsam_: 0x117
12:59 karolherbst: hakzsam_: by the way, do you have some shaders you still haven't uploaded yet?
13:00 karolherbst: mhh differences between e6 and 117: total instructions in shared programs : 2818144 -> 2824280 (0.22%)
13:03 karolherbst: hihi, it crashes for f1 indeed
13:03 karolherbst: and for 117
13:04 hakzsam_: no new shaders on my side
13:04 karolherbst: k, maybe I will generate some more starting next weekend
13:29 dcomp: so I just tried to run nouveau with only config=NvClkMode=7 and It hangs the machine
13:30 karolherbst: dcomp: which version?
13:30 karolherbst: and which gpu?
13:30 dcomp: I guess I still need runpm=0
13:31 dcomp: GM108 and your stable reclocking v6 branch
13:31 karolherbst: mhh
13:31 karolherbst: maybe I still have a bug in there on boot
13:31 karolherbst: odd
13:31 karolherbst: I am sure I fixed those though
13:32 karolherbst: hakzsam_: how well do you know the gk110 emitter code?
13:42 karolherbst: uhm
13:43 karolherbst: I think I found a precision issue
13:43 karolherbst: 0.2 is 0x3e4ccccd right?
13:44 karolherbst: but mesa makes 0x3e5ccccd out of it
13:44 karolherbst: which is 0.215625
13:44 karolherbst: mhh
13:44 karolherbst: maybe it was me messing up
13:44 karolherbst: let me check
13:46 karolherbst: yeah... odd
13:46 karolherbst: it is even right in the mov
13:46 karolherbst: 11: mov u32 %r220 0x3e4ccccd
13:48 karolherbst: yeah, emitForm_L does something funny
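The two encodings quoted above can be compared bit by bit. The sketch below decodes both IEEE 754 binary32 patterns by hand and XORs them. The helper name decode_f32 is made up for illustration, and it only handles normal, finite values. The result shows that the patterns differ in exactly one bit (bit 20 of the word). The log does not say which field that bit collides with.

```cpp
#include <cstdint>

// Encoding seen in the mov (0.2f) and the one observed after emission.
constexpr std::uint32_t kExpected = 0x3e4ccccdu;
constexpr std::uint32_t kEmitted  = 0x3e5ccccdu;

// Bits that differ between the two encodings: only bit 20 is set.
constexpr std::uint32_t kDiff = kExpected ^ kEmitted;

// Decode a normal, finite binary32 bit pattern into a double
// (no handling of zero, denormals, infinities or NaNs).
constexpr double decode_f32(std::uint32_t bits) {
    const int exponent = static_cast<int>((bits >> 23) & 0xffu) - 127;
    const double mantissa =
        1.0 + static_cast<double>(bits & 0x7fffffu) / 8388608.0; // 2^23
    double scale = 1.0;
    for (int i = 0; i < exponent; ++i) scale *= 2.0;
    for (int i = 0; i > exponent; --i) scale /= 2.0;
    const double value = mantissa * scale;
    return (bits >> 31) ? -value : value;
}
```

Decoding kExpected gives roughly 0.2 and kEmitted roughly 0.215625, matching the numbers in the conversation.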
13:52 dcomp: is drm_device dev->led guaranteed to be non null
13:52 karolherbst: dcomp: ohhhhh
13:52 karolherbst: dcomp: I didn't have the fix yet on that branch
13:53 karolherbst: dcomp: revert a4027e2d513797f2123728a2169155547f9bcdff
13:55 dcomp: karolherbst and it seems to work
13:56 karolherbst: nice
13:57 karolherbst: okay, that srcId(i->src(s), s ? 42 : 10); kills it :/
13:59 karolherbst: uhh
13:59 karolherbst: right
13:59 karolherbst: messa
13:59 karolherbst: *messy
14:02 karolherbst: now it looks good :)
14:10 karolherbst: https://github.com/karolherbst/mesa/commit/17b104749f5638a6d5f5043f563d990231b0cddb :)
14:11 karolherbst: gm107 will be interesting
14:41 vita_cell: guys
14:42 vita_cell: now my GPU is not detected in "xrandr --listproviders" (only)
14:42 karolherbst: you don't need this with dri3
14:42 vita_cell: you said to unload dri3 and use dri2
14:42 vita_cell: not?
14:43 karolherbst: nope, I was telling you to use dri3
14:43 vita_cell: I did this
14:43 vita_cell: Section "Device"
14:43 vita_cell: Identifier "Intel Graphics"
14:43 vita_cell: Driver "intel"
14:43 vita_cell: Option "DRI" "2"
14:43 vita_cell: EndSection
14:43 vita_cell: in /etc/X11/xorg.conf.d/20-intel.conf
14:51 vita_cell: karolherbst
14:51 vita_cell: http://dpaste.com/37FGYSM
14:51 vita_cell: is this what you talked about?
14:51 karolherbst: yeah
14:52 vita_cell: and then?
14:52 vita_cell: xrandr --setprovideroffloadsink 1 0 ??
14:52 karolherbst: not needed anymore
14:52 vita_cell: so I just need run games with DRI_PRIME=1 parameter
14:52 karolherbst: yeah
14:52 vita_cell: thanks you
15:08 vita_cell: karolherbst what is difference Xonotic SDL vs GLX?
15:08 vita_cell: what should I run
15:08 karolherbst: no clue
15:19 karolherbst: mhh odd
15:19 karolherbst: so far I didn't found any mads on gm107 with an long immediate :/
15:19 karolherbst: *a
15:36 karolherbst: uhh
15:36 karolherbst: 2% more perf in pixmark_piano due to that folding pass
15:39 mupuf: nice
15:44 karolherbst: soo, now are there other things like that? :D
15:44 karolherbst: I've heard SAD is a thing, but I don't think we actually use this yet?
15:47 karolherbst: uhh
15:47 karolherbst: sad sounds like not really that much usable
15:48 mupuf: SAD?
15:48 karolherbst: Sum of absolute differences.
15:48 karolherbst: abs(a-b) + c
15:49 karolherbst: we could opt add(add(a, neg(b)), c) to sad(a, b, c)
15:49 mupuf: ahah, well, unlikely to need
15:49 karolherbst: well
15:50 karolherbst: I am sure a sad is faster than two adds
15:50 karolherbst: and as long as an add with a neg is used in another one, we could opt that into a sad
15:50 karolherbst: mhhh
15:50 karolherbst: no idea if we can add a neg opt to sad
15:50 karolherbst: *mod
15:51 karolherbst: doesn't seem like it
15:51 karolherbst: ohh wait
15:51 karolherbst: add(abs(add(a, neg(b))), c) to sad(a, b, c)
15:51 karolherbst: this looks far less useful all of a sudden now
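The pattern being discussed can be written out as a small sketch. It assumes 32-bit signed integers and ignores overflow; the helper names are illustrative and do not correspond to codegen functions.

```cpp
#include <cstdint>

constexpr std::int32_t iabs(std::int32_t x) { return x < 0 ? -x : x; }
constexpr std::int32_t add(std::int32_t x, std::int32_t y) { return x + y; }
constexpr std::int32_t neg(std::int32_t x) { return -x; }

// sad(a, b, c) = abs(a - b) + c
constexpr std::int32_t sad(std::int32_t a, std::int32_t b, std::int32_t c) {
    return iabs(a - b) + c;
}

// The unfused pattern a pass would have to recognise:
// add(abs(add(a, neg(b))), c)
constexpr std::int32_t unfused(std::int32_t a, std::int32_t b, std::int32_t c) {
    return add(iabs(add(a, neg(b))), c);
}

// The first guess in the log, without the abs. It is not equivalent to sad.
constexpr std::int32_t without_abs(std::int32_t a, std::int32_t b, std::int32_t c) {
    return add(add(a, neg(b)), c);
}
```

For a = 3, b = 7, c = 10 both sad and unfused give 14, while the abs-less form gives 6, which is why the candidate pattern is narrower than it first looked.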
15:52 mupuf: well, if nvidia added it, it likely is :)
15:52 karolherbst: first we would need to add opt passes to produce sads
15:52 karolherbst: cause I doubt we get any at all
15:55 karolherbst: mupuf: you know what we need? A CI system where we could just send our patches to, and it would automatically run piglit on tesla+ chipsets :)
16:27 karolherbst: what is this instruction? { 0x0000000000000002ull, 0xf800000000000007ull, T(addop), DST, T(acout3a), SESTART, N("mul"), T(high6), T(us32_7), SRC1, T(us32_5), LIMM, SEEND, DST }
16:27 hakzsam_: iscadd
16:28 karolherbst: ohhh
16:28 karolherbst: I doubt we use it often enough?
16:28 karolherbst: it also has a limm form like mad
16:28 karolherbst: 0x0000000000000002 gives me $p0 add $r0 (mul u32 $r0 u32 0x0) $r0
16:28 hakzsam_: I worked on that part
16:29 hakzsam_: and I have a patch which will use iscadd more often
16:29 karolherbst: ohh nice :)
16:29 karolherbst: right I saw the patches I think
16:30 hakzsam_: I already pushed MAD to ISCADD/SHLADD
16:30 hakzsam_: the new one is ADD+SHL to SHLADD
16:30 karolherbst: all using the same form?
16:31 karolherbst: mhh wait
16:31 karolherbst: ohhh now I see
16:32 karolherbst: that thing I found is basically add_op(mul(a, b), c)
16:32 karolherbst: where b is the limm
16:32 karolherbst: isn't that like plain mad?
16:33 vita_cell: karolherbst what do you think about this?
16:33 vita_cell: [vita@es2l ~]$ dmesg | grep fail
16:33 vita_cell: [ 0.236640] acpi PNP0A08:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
16:34 karolherbst: no idea
16:34 vita_cell: thank you, every game is working fine now!!
16:34 karolherbst: seems like only gm107 added a bunch more of those
16:34 vita_cell: that "fail" makes my system crash/hang
16:35 karolherbst: hakzsam_: is the fadd32i thing a +limm + b?
16:35 hakzsam_: karolherbst, yes, mad or iscadd if b is pow2
16:36 hakzsam_: fadd32i uses LIMM form yes
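The "iscadd if b is pow2" remark rests on the identity a * 2^k + c == (a << k) + c. A minimal sketch with unsigned 32-bit arithmetic follows; the function names are illustrative and are not the codegen's.

```cpp
#include <cstdint>

// True if v has exactly one bit set.
constexpr bool is_pow2(std::uint32_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// log2 of a power of two (position of the single set bit).
constexpr unsigned log2_pow2(std::uint32_t v) {
    unsigned shift = 0;
    while ((v >>= 1) != 0) ++shift;
    return shift;
}

// Plain multiply-add: a * b + c (wraps modulo 2^32).
constexpr std::uint32_t mad(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return a * b + c;
}

// Shift-and-add form: (a << shift) + c.
constexpr std::uint32_t shladd(std::uint32_t a, unsigned shift, std::uint32_t c) {
    return (a << shift) + c;
}
```

When is_pow2(b) holds, mad(a, b, c) and shladd(a, log2_pow2(b), c) produce the same value, e.g. 5 * 8 + 3 and (5 << 3) + 3 are both 43.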
16:36 karolherbst: mhhh
16:36 karolherbst: seems like that pass is indeed just useful for mad and sad
16:36 karolherbst: and I really don't bother enough for sad yet
17:15 karolherbst: hakzsam_: NV50_IR_SUBOP_MOV_FINAL are those movs to write into outputs or something like that. without that check a bunch of things get removed
17:17 hakzsam_: yeah, expotr
17:17 hakzsam_: *export
17:18 karolherbst: anyway, I wrote a post ra dce pass once, and this thing worked, with post_ra_dead I had a bunch of other issues I don't really remember anymore, but there was some messed up stuff going on
17:26 karolherbst: hakzsam_: I will move the RA commit after the pass so that the effect is obvious
17:26 karolherbst: without that the pass is just " total instructions in shared programs : 2818606 -> 2811662 (-0.25%)"
17:27 hakzsam_: okay, cool
17:27 karolherbst: the noise was like 40 helped and 70 hurt < 0.00% change
17:35 karolherbst: hakzsam_: are those piglit tests new?
17:36 karolherbst: ohh
17:36 karolherbst: generated_tests
17:37 karolherbst: hakzsam_: they pass for me
17:37 karolherbst: ohh not all
17:40 karolherbst: mhhh
17:40 karolherbst: hakzsam_: https://gist.github.com/karolherbst/fe6707b6348d09cf27e5fdad2e5aa515
17:41 karolherbst: " 21: phi u32 %r67 %r58 %r85 (0)" gets opted away
17:41 karolherbst: and that modifies the sustp
17:42 hakzsam_: yeah, this probably removes too much things :)
17:42 karolherbst: mhhhh
17:42 karolherbst: no, I don't think so
17:43 karolherbst: I think something can't handle this " 31: not %p84 sustp 2D $r0 $s0 f32 # u8 %r81d c7[0x424] %p78 %r66 %r66 %r68 %r69 (0)"
17:43 karolherbst: question is rather if we really shouldn't handle this
17:43 karolherbst: or try to handle that
17:43 karolherbst: of course that means we would need to add a pseudo mov $r67 $r66 if we don't want to handle it
17:46 karolherbst: mhhh it still seems odd that this fails
17:47 karolherbst: $r5 and $r6 aren't set though
17:47 karolherbst: I meant $r6 and $r7
17:57 karolherbst: hakzsam_: and you read the patches in the wrong order ;)
18:18 hakzsam_: karolherbst, replied with more details :)
18:29 karolherbst: hakzsam_: I don't see why codegen should start using the limm forms all of a sudden just by adding the code to the emitter
18:30 karolherbst: but I've already added asserts to catch those situations
18:30 karolherbst: but again, why should anything start using this just by having those two commits?
18:41 karolherbst: longIMMD in gm107 is wrong anyway
18:41 karolherbst: it checks against 0xfff, but usually the short imms are just 19 bits
18:42 karolherbst: yeah, it is always 19 or 32
18:45 karolherbst: not even insnCanLoad checks for that
18:46 hakzsam_: after double-checking, insnCanLoad() prevents such a situation, I missed that MAJOR point :) so all is good
18:46 karolherbst: yeah
18:47 karolherbst: but I just found out it is broken for gm107
18:47 karolherbst: gm107 short imms are 19 bit, not 20
18:47 hakzsam_: I know
18:47 karolherbst: but the emitter will try to emit 20 bit imms as short ones
18:48 karolherbst: see CodeEmitterGM107::longIMMD
18:48 hakzsam_: yeah, that part is dumb
18:48 karolherbst: I should add support for those mads within insnCanLoad at some point :/ that method is just a major headache...
18:49 hakzsam_: true :)
18:49 karolherbst: but now I have to fix that, cause I found the short imm bug :(
18:49 hakzsam_: a bunch of the compiler parts are actually headaches, not only that insnCanLoad() func
18:50 karolherbst: like RA? :D
18:50 hakzsam_: yup
18:51 karolherbst: mhh
18:51 karolherbst: it seems like on gf100 short imms for integers are also just 19 bit long
18:51 karolherbst: ohh wait
18:52 karolherbst: if (reg.data.s32 > 0x7ffff || reg.data.s32 < -0x80000)
18:52 karolherbst: that is 20
18:52 karolherbst: for gm107 it has to be if (reg.data.s32 > 0x3ffff || reg.data.s32 < -0x40000)
18:53 imirkin: iirc it should actually be abs(reg.data.s32) < 0x80000
18:54 imirkin: and we need to stuff the abs(reg.data.s32) into the immd field
18:54 imirkin: along with the neg sign
18:54 imirkin: but it's been a while since i've looked at it
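The bounds quoted above follow from the usual two's-complement range of an n-bit field, [-2^(n-1), 2^(n-1) - 1]. The sketch below checks both the 20-bit and 19-bit cases, plus the sign-plus-magnitude variant imirkin recalls. The function names are made up for illustration, and which encoding the hardware really uses is not settled in the log.

```cpp
#include <cstdint>

// True if v fits in a two's-complement field of `bits` bits (1 <= bits <= 32).
constexpr bool fits_signed(std::int32_t v, unsigned bits) {
    return static_cast<std::int64_t>(v) >= -(std::int64_t{1} << (bits - 1)) &&
           static_cast<std::int64_t>(v) <= (std::int64_t{1} << (bits - 1)) - 1;
}

// Sign-plus-magnitude variant: abs(v) must fit in `magnitude_bits` bits,
// with the sign stored separately (e.g. abs(v) < 0x80000 for 19 bits).
constexpr bool fits_sign_magnitude(std::int32_t v, unsigned magnitude_bits) {
    return (v < 0 ? -static_cast<std::int64_t>(v) : static_cast<std::int64_t>(v)) <
           (std::int64_t{1} << magnitude_bits);
}
```

With bits = 20 the accepted range is -0x80000..0x7ffff, and with bits = 19 it is -0x40000..0x3ffff, matching the two conditions quoted in the conversation. The sign-plus-magnitude form with 19 magnitude bits accepts -0x7ffff..0x7ffff.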
18:56 karolherbst: mhh
18:56 karolherbst: I just turn this thing into an even bigger mess
18:58 karolherbst: https://github.com/karolherbst/mesa/commit/89ad9796326f265e8332fceca19f929af8856bd3
18:59 karolherbst: ...
18:59 karolherbst: imirkin: when you have time, might take a look at this? https://gist.github.com/karolherbst/fe6707b6348d09cf27e5fdad2e5aa515 ?
18:59 karolherbst: changes are " 21: phi u32 %r67 %r58 %r85 (0)" gets eliminated
19:00 karolherbst: and "32: not %p84 sustp 2D $r0 $s0 f32 # u8 %r81d c7[0x424] %p78 %r66 %r67 %r68 %r69 (0)" $r67 source is set to $r66
19:00 karolherbst: and then it fails later
19:00 karolherbst: just want to know if the phi removal is the big issue or if RA is just silly
19:08 imirkin: karolherbst: i THINK that it's just RA being silly, but i'm not 100% sure
19:08 imirkin: karolherbst: which test is this?
19:09 mupuf: karolherbst: yes, this is one goal too
19:10 mupuf: we have this at intel
19:10 mupuf: (related to CI)
19:21 karolherbst: imirkin: one of those hakzsam_ listed... wait
19:22 karolherbst: imirkin: generated_tests/spec/glsl-4.30/execution/built-in-functions/cs-all-bvec4-using-if.shader_test
19:23 imirkin: k
19:49 hakzsam_: karolherbst, did you already use pts for your own benchmarks btw?
19:50 karolherbst: yeah, but it isn't as useful :/
19:50 hakzsam_: why?
19:51 karolherbst: couldn't get stable enough results, also I had to modify the source to actually use my own mesa installation and accept DRI_PRIME
19:52 hakzsam_: LD_LIBRARY_PATH doesn't work with pts?
19:52 hakzsam_: that sounds weird
19:53 karolherbst: I think it clears the environment or something like that
19:53 hakzsam_: ahah, not cool :)
19:53 karolherbst: took me quite some time to find out why DRI_PRIME didn't work
19:55 hakzsam_: the shl+add to shladd opt
19:56 hakzsam_: total instructions in shared programs :2286901 -> 2284473 (-0.11%)
19:56 karolherbst: sounds good :)
19:56 hakzsam_: yup
19:56 karolherbst: also for nvc0?
19:56 hakzsam_: nvc0+
19:56 karolherbst: nice
19:56 hakzsam_: coz nv50 doesn't support iscadd
19:57 karolherbst: maybe I write that sad pass
19:57 karolherbst: or do we have one already?
19:57 hakzsam_: I don't think
19:58 karolherbst: are the hurt shaders RA randomness as usual?
19:59 hakzsam_: yes, expected
20:00 karolherbst: hakzsam_: "s ? 0 : 1" => "s ^ 1"
20:01 karolherbst: ohh wait
20:01 karolherbst: no, should be fine
20:01 karolherbst: :D
20:01 karolherbst: no idea which one looks better though
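The two spellings compared above agree only while s is 0 or 1, which is presumably the case for a source index here. A tiny sketch with illustrative function names:

```cpp
// Ternary form: maps any non-zero s to 0, and 0 to 1.
constexpr int pick_ternary(int s) { return s ? 0 : 1; }

// XOR form: flips the lowest bit of s and keeps all other bits.
constexpr int pick_xor(int s) { return s ^ 1; }
```

For s = 0 both give 1 and for s = 1 both give 0. For s = 2 the ternary form gives 0 while the XOR form gives 3, so the rewrite is only safe when s is known to be 0 or 1.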
20:04 karolherbst: I should respond in mails instead of IRC ...
20:04 karolherbst: (more often)
20:24 karolherbst: hakzsam_: :O a tesseract shader has indeed the sad situation
20:25 karolherbst: mhh
20:25 karolherbst: but instructions can't have two c0[] sources, right?
20:28 vita_cell2: karolherbst what is the most effective benchmark for cpu?
20:35 karolherbst: what, sad is integer only? :/ uhhh
20:35 karolherbst: useless?
20:39 imirkin: quite sad :)
20:42 imirkin: i think SAD is an op in one of the weird extensions, dunno why it made into their ISA
20:53 karolherbst: well
20:54 karolherbst: it would be kind of useful if you could use it for f32
21:05 karolherbst: imirkin: do you plan to look into that RA issue?
21:06 imirkin: i do.
21:06 karolherbst: ohh, I could add those missing max/min opts
21:07 karolherbst: k thanks.
21:07 karolherbst: at one point I also need to learn how to dig into RA and figure things out like that..
21:08 imirkin: the issue is generally not with the RA itself
21:08 imirkin: it's often with some little detail going into RA or coming out of it
21:08 imirkin: and for all i know, the issue isn't RA-related at all
21:08 karolherbst: I see
21:09 imirkin: as i said... investigation required :)
21:09 karolherbst: I am sure something can't handle that the quad op is $r66 $r66 $r68 $r69 in pre RA
21:09 karolherbst: *quad reg
21:16 imirkin: it gets condensed into a single arg
21:16 imirkin: with a merge op
21:16 imirkin: which in turn generates constraint moves
21:16 imirkin: or at least, it should
21:16 imirkin: also please don't confuse $r66 with %r66
21:16 imirkin: %r66 is a ssa value. $r66 is an allocated register.
21:16 karolherbst: okay
21:17 karolherbst: I was wondering why nothing hit the else branch in AlgebraicOpt::handleMINMAX, but there are no mods yet...
21:20 karolherbst: imirkin: I am sure I saw something like that, but is there a method which gets me the Modifier for a src like NEG_ABS even when src.mod = NEG and src->insn->op is ABS?
21:21 imirkin: ModifierFolding fixes it all up
21:21 imirkin: have a look at how it works
21:21 karolherbst: sure
21:21 karolherbst: but algebraicopt is before that
21:22 karolherbst: ohh
21:22 karolherbst: Modifier(op), right, that was a thing
21:55 karolherbst: doesn't this look a little dumb? https://gist.github.com/karolherbst/e3644af6fe5ad7ec9870e4ae1d5e6dd8
21:58 karolherbst: the heck? https://gist.github.com/karolherbst/848d4a63be0435d9f9888e59c3e1fce9 :D
21:58 karolherbst: do I miss something or is "761: ld u32 $r16 l[0x8]; 762: st u32 # l[0x8] $r16" pretty much pointless?
22:01 imirkin: quite.
22:01 karolherbst: uhh, this is spilling related I think
22:02 imirkin: this is a result of the spilling logic being ... shall we say ... imperfect
22:02 karolherbst: currently looking at the orbital shader...
22:02 karolherbst: it looks very weird
22:02 karolherbst: bbs with 20 phis.. why not
22:05 imirkin: don't worry about that one
22:05 imirkin: it's a crazy shader. just ignore it. make sure it doesn't crash the compiler, but that's about it.
22:08 karolherbst: mhh yeah, seems like it
22:08 karolherbst: the pow thing comes next anyway