12:54 karolherbst: uhh what does "unhandled TGSI property 18" mean?
12:56 yoshimo: i see we ask ourselves the same questions
12:56 hakzsam: it's because TGSI_NEXT_SHADER is not supported by our codegen
12:56 hakzsam: don't worry about that
12:56 hakzsam: it's not a bg
12:56 hakzsam: *bug
12:57 karolherbst: k
12:57 karolherbst: I just created payday 2 shaders and am checking if we can optimize something there :)
12:58 karolherbst: mhh -0.43% with my pending patches, that isn't that much
13:00 karolherbst: mhh all those shaders are pretty simple
13:05 karolherbst: I think it would be a good idea to make RA more aware of those dtq instructions so that we need fewer movs
13:10 karolherbst: mhh
14:48 karolherbst: set ftz u32 $r1 lt f32 $r63 $r0
14:48 karolherbst: what are the possible hardware boolean values for this? 0x1 and 0x0 or something else?
15:30 mwk: karolherbst: ISTR -1 being involved
15:30 karolherbst: mhh I think there was also a difference between floats and ints?
15:31 mwk: set u32 gives you 0xffffffff/0, set f32 gives you... 0x3f800000/0, I guess
15:31 mwk: no idea how it works on Fermi, on Tesla all comparisons give you either -1 or 0
15:33 karolherbst: mhh okay
15:33 karolherbst: because I found something like this:
15:33 karolherbst: set ftz u32 $r1 lt f32 $r63 $r0
15:33 karolherbst: and u32 $r1 $r1 0x00000001
15:33 karolherbst: could be merged to a slct maybe
15:34 karolherbst: but I was thinking there is a simpler way
15:41 karolherbst: and then cvt f32 $r1 s32 $r1 ...
15:41 karolherbst: mwk: 0x3f800000 float is 1 right?
15:41 karolherbst: then set u32, and u32, cvt f32, should be a simple set f32 right?
15:45 mwk: yep, 1.0
15:46 mwk: sounds good
15:46 mwk: btw and+cvt is pointless either way, you could just use cvt.neg
15:46 karolherbst: that would be simpler
15:46 karolherbst: but only cuts one instruction
15:46 karolherbst: k
15:47 karolherbst: mhh
15:47 karolherbst: I don't see the and+cvt thing though
15:53 karolherbst: mwk: what does and u32 0x1 do when the input is 0x3f800000? the result is just 0x0?
16:09 karolherbst: mwk: ahh now I got it
16:10 karolherbst: and u32 a 0x1 + cvt f32 d s32 a is pointless...
16:10 karolherbst: mhh
16:10 karolherbst: I think this is only valid for boolean inputs anyway
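The equivalence discussed above can be checked with a small simulation. This is only a sketch under assumptions: it takes mwk's description at face value (set u32 yields 0xffffffff/0, set f32 yields 0x3f800000/0), and it models the neg modifier of cvt.neg as negating the integer source before conversion. The helper names are hypothetical and are not part of the nouveau codegen.

```python
import struct

MASK32 = 0xFFFFFFFF

def set_u32(cond):
    # Assumed integer-typed set result: all ones (-1) for true, 0 for false.
    return MASK32 if cond else 0

def set_f32(cond):
    # Assumed float-typed set result: bit pattern of 1.0 for true, 0 for false.
    return 0x3F800000 if cond else 0

def and_u32(a, b):
    return a & b & MASK32

def to_s32(bits):
    # Reinterpret a 32-bit pattern as a signed integer.
    return bits - (1 << 32) if bits & 0x80000000 else bits

def cvt_f32_s32(bits, neg=False):
    # Convert a signed 32-bit integer to the bit pattern of an f32.
    # Assumption: neg negates the integer source before the conversion.
    value = -to_s32(bits) if neg else to_s32(bits)
    return struct.unpack("<I", struct.pack("<f", float(value)))[0]

for cond in (True, False):
    chain = cvt_f32_s32(and_u32(set_u32(cond), 0x1))    # set u32; and 0x1; cvt f32
    with_neg = cvt_f32_s32(set_u32(cond), neg=True)     # set u32; cvt.neg
    print(cond, hex(chain), hex(with_neg), hex(set_f32(cond)))
```

As noted in the log, this only holds when the input to the and is a boolean produced by a set; for arbitrary integers the three forms differ.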
16:31 bublic: hi
16:31 bublic: who is on?
16:31 bublic: i have seen this page
16:31 bublic: https://nouveau.freedesktop.org/wiki/CodeNames/
16:32 bublic: i do not find my vcard there which is a 870m, does it mean that it is not supported?
16:32 karolherbst: bublic: it is a GK104 right? well it is supported
16:32 karolherbst: and if you wait a few months you should be able to fully reclock it too
16:33 bublic: ok, thx
16:34 karolherbst: well you can also use my branch if you want to, just depends on if you need the performance or not
16:34 karolherbst: but I guess you want to play games with such a gpu
16:34 bublic: karolherbst
16:34 bublic: no
16:34 bublic: i want to do
16:34 bublic: http://www.x.org/wiki/SummerOfCodeIdeas/
16:34 bublic: the project idea
16:34 karolherbst: ahhh
16:34 karolherbst: for nouveau?
16:34 bublic: Switch OpenMAX state tracker in Mesa/Gallium to use Tizonia
16:35 bublic: yah
16:35 karolherbst: ohh okay
16:35 karolherbst: I have no idea if nouveau supports openmax though
16:36 karolherbst: bublic: but you can also come up with your own ideas
16:36 karolherbst: bublic: just choose something _you_ want to do
16:36 karolherbst: like what do you miss the most when using nouveau or open source drivers and see if it fits in a gsoc project (shouldn't be too easy)
16:37 bublic: ok, thx
16:37 karolherbst: I also have some ideas which aren't on the list if you are interested
17:21 karolherbst: mwk: that was time well spent: total instructions in shared programs : 2216481 -> 2216473 (-0.00%)
17:22 karolherbst: why do I always find those useless optimization potentials...
18:47 imirkin: karolherbst: because nouveau does fairly well already at the simple stuff
18:48 karolherbst: yeah...
18:49 karolherbst: but this cut 8 instruction out of a 270 instruction shader
18:49 karolherbst: but it was the only one affected
18:50 karolherbst: but maybe you see a more generic optimization for this: https://github.com/karolherbst/mesa/commit/12ba65498008d92bf1ddd57ed00ad9c8ac915145
18:50 sarnex: karolherbst: hey do you know if any nouveau-related lockups were resolved in the past couple of months? i've had issues with lockups even though my main gpu is AMD, blacklisting nouveau fixed it
18:50 karolherbst: sarnex: well depends on the lockup
18:50 karolherbst: did you reclock?
18:50 sarnex: no
18:50 sarnex: i didnt even use the gpu
18:50 sarnex: it was just loaded so PRIME would work if i used it
18:51 karolherbst: depends on the error then
18:51 sarnex: it was a super strange error
18:51 karolherbst: I guess X was messing around
18:51 sarnex: like it would kill 1 cpu core, i would get a dmesg error about cpu core stall detected
18:51 sarnex: and then like 30 seconds later it would lockup
18:52 sarnex: i guess ill try un-blacklisting it
18:52 karolherbst: well just by loading nouveau nothing bad should happen
18:52 karolherbst: this should be fixed either way
18:52 sarnex: ok thanks ill report in if it comes back
18:53 sarnex: imirkin: hey do you remember the DRI_PRIME command that uses the pci location?
18:53 imirkin: karolherbst: uhhhh what? that AND 1 is necessary...
18:53 imirkin: sarnex: not offhand... you feed it the udev name
18:53 imirkin: udevadm info /dev/dri/renderD129
18:53 imirkin: E: ID_PATH_TAG=pci-0000_04_00_0
18:54 imirkin: i think you feed that tag into DRI_PRIME
18:54 imirkin: or maybe the ID_PATH above it
18:55 sarnex: imirkin: yep thats right thanks a lot
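A minimal sketch of the lookup imirkin describes: pull the ID_PATH_TAG value out of `udevadm info` output and use it as the DRI_PRIME value. The function name is hypothetical, the ID_PATH line in the sample text is an assumed illustration of "the ID_PATH above it", and `glxinfo` is only an example program to launch.

```python
def dri_prime_from_udev(info_text):
    """Return the ID_PATH_TAG value from `udevadm info` output, or None."""
    for line in info_text.splitlines():
        line = line.strip()
        if line.startswith("E: ID_PATH_TAG="):
            return line.split("=", 1)[1]
    return None

# Sample text: the ID_PATH_TAG line is taken from the log; the ID_PATH line
# is an assumed example of what precedes it.
sample = """E: ID_PATH=pci-0000:04:00.0
E: ID_PATH_TAG=pci-0000_04_00_0"""

tag = dri_prime_from_udev(sample)
print(f"DRI_PRIME={tag} glxinfo")
```

On a real system the input text would come from running `udevadm info /dev/dri/renderD129` (or whichever render node belongs to the secondary GPU).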
18:55 sarnex: does 'unhandled TGSI property 18' matter?
18:56 karolherbst: imirkin: set u32; and u32 0x1; cvt f32 u32
18:56 karolherbst: isn't this equal to set f32?
18:58 karolherbst: sarnex: hakzsam says it is to be expected
18:58 sarnex: ok, thanks again guys
19:00 karolherbst: imirkin: I thought set u32 produces -1/0, and -1/0 and 0x1 makes 1/0, and cvt of 1/0 to f32 makes 1.0/0, which is the result of set f32, or is there something wrong?
19:00 imirkin: karolherbst: oh, with the cvt, yes
19:00 imirkin: but you never check for the cvt
19:00 karolherbst: I sure do
19:01 karolherbst: ohh
19:01 imirkin: oh duh
19:01 karolherbst: well the other opts could modify insn->op to something else
19:01 imirkin: of course you do
19:01 imirkin: i'm getting forgetful in my old age
19:01 karolherbst: there should be some break out of the switch if insn->op changes
19:01 karolherbst: but yeah, only one shader in payday2 was affected
19:02 karolherbst: none in my other games nor in shader-db
19:02 karolherbst: mhh
19:02 imirkin: yeah, it's not very common
19:02 imirkin: they must have done something odd
19:02 karolherbst: set_and; and could be optimized...
19:03 imirkin: yep
19:03 imirkin: you also could just flip the dType on the original insn
19:03 imirkin: i mean on the original set insn
19:03 imirkin: er hm
19:03 imirkin: i guess not really, yeah
19:03 imirkin: what you did is fine
19:03 karolherbst: yeah I was there too at first
19:05 karolherbst: the game runs like 50% slower though
19:05 karolherbst: seems to be a pretty decent port though
19:05 karolherbst: no crash at least
19:07 karolherbst: ...
19:07 karolherbst: what is happening there
19:07 karolherbst: after the set,and,cvt thing
19:07 karolherbst: add ftz f32 $r0 $r0 $r1
19:07 karolherbst: cvt ftz s32 rz $r0 f32 $r0
19:07 karolherbst: shl u32 $r17 $r0 0x00000006
19:07 karolherbst: ...
19:08 imirkin: someone's stupidly introducing floating point math when they totally don't need to
19:08 karolherbst: I try to find it in the glsl
19:09 karolherbst: uhh
19:09 karolherbst: int() call
19:09 karolherbst: tmpvar_18 = textureProj (shadow_depth, (tmpvar_16 * global_shadow_projection[(int(dot (vec4(ivec4(lessThan (tmpvar_17, xlv_TEXCOORD5.wwww))), vec4(1.0, 1.0, 1.0, 1.0))) - 1)]));
19:09 karolherbst: and array access based on the cast value
19:10 imirkin: yep, that's it
19:10 imirkin: instead of the stupid dot, which is done as floating point
19:10 imirkin: they could have just summed the whole thing
19:10 imirkin: and kept it as integers
19:10 imirkin: however that sort of idiocy could be difficult to detect
19:11 karolherbst: well
19:12 karolherbst: at least I remove the one int cast
19:12 karolherbst: ohh wait
19:12 karolherbst: I eliminated the ivec...
19:13 karolherbst: and the vec4() constant
19:14 karolherbst: ohh
19:14 karolherbst: what is left are 4 sets, whose results are summed up
19:17 imirkin: right, but no reason to use floating point to do the summing
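The GLSL index expression quoted above, `int(dot(vec4(ivec4(lessThan(a, b))), vec4(1.0))) - 1`, can be compared against the integer-only sum imirkin suggests. This is a sketch in plain Python rather than GLSL, with hypothetical function names and made-up input values; it only illustrates that both paths pick the same array index.

```python
def less_than(a, b):
    # Component-wise comparison, like GLSL lessThan() on two vec4s.
    return [x < y for x, y in zip(a, b)]

def index_float_path(thresholds, w):
    # What the shader does: bools -> ints -> floats, dot with vec4(1.0), cast back.
    mask = less_than(thresholds, [w] * 4)
    as_float = [float(int(m)) for m in mask]
    dot = sum(x * y for x, y in zip(as_float, [1.0] * 4))
    return int(dot) - 1

def index_int_path(thresholds, w):
    # The integer-only alternative: just count the true comparisons.
    return sum(int(m) for m in less_than(thresholds, [w] * 4)) - 1

thresholds = [1.0, 2.0, 4.0, 8.0]   # illustrative values only
for w in (0.5, 1.5, 3.0, 5.0, 9.0):
    print(w, index_float_path(thresholds, w), index_int_path(thresholds, w))
```

Both functions return the same index for every input, which is why the float dot product and the cast back to int are avoidable.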
19:17 karolherbst: imirkin: when you have time, I still have those two opts (PostRADCE and sub(a,0)) here: https://github.com/karolherbst/mesa/commits/to_upstream
19:18 karolherbst: finally found a user for the latter one
19:35 imirkin: try to get hakzsam to review them
19:36 karolherbst: k, I'll try to ask him tomorrow then
20:14 martm: i have nv98 too
20:15 martm: ouh it's g98m geforce 9200 on my laptop, seems to have ogl 3.3 by default with nouveau