12:54karolherbst: uhh, what does "unhandled TGSI property 18" mean?
12:56yoshimo: i see we ask ourselves the same questions
12:56hakzsam: it's because TGSI_NEXT_SHADER is not supported by our codegen
12:56hakzsam: don't worry about that
12:56hakzsam: it's not a bug
12:57karolherbst: I just created payday 2 shaders and am checking if we can optimize something there :)
12:58karolherbst: mhh -0.43% with my pending patches, that isn't that much
13:00karolherbst: mhh all those shaders are pretty simple
13:05karolherbst: I think it would be a good idea to make RA more aware of those dtq instructions so that we need fewer movs
14:48karolherbst: set ftz u32 $r1 lt f32 $r63 $r0
14:48karolherbst: what are the possible hardware boolean values for this? 0x1 and 0x0 or something else?
15:30mwk: karolherbst: ISTR -1 being involved
15:30karolherbst: mhh I think there was also a difference between floats and ints?
15:31mwk: set u32 gives you 0xffffffff/0, set f32 gives you... 0x3f800000/0, I guess
15:31mwk: no idea how it works on Fermi, on Tesla all comparisons give you either -1 or 0
15:33karolherbst: mhh okay
15:33karolherbst: because I found something like this:
15:33karolherbst: set ftz u32 $r1 lt f32 $r63 $r0
15:33karolherbst: and u32 $r1 $r1 0x00000001
15:33karolherbst: could be merged to a slct maybe
15:34karolherbst: but I was thinking there is a simpler way
15:41karolherbst: and then cvt f32 $r1 s32 $r1 ...
15:41karolherbst: mwk: 0x3f800000 float is 1 right?
15:41karolherbst: then set u32, and u32, cvt f32, should be a simple set f32 right?
15:45mwk: yep, 1.0
15:46mwk: sounds good
15:46mwk: btw and+cvt is pointless either way, you could just use cvt.neg
15:46karolherbst: that would be simpler
15:46karolherbst: but only cuts one instruction
15:47karolherbst: I don't see the and+cvt thing though
15:53karolherbst: mwk: what does and u32 0x1 do when the input is 0x3f800000? the result is just 0x0?
16:09karolherbst: mwk: ahh now I got it
16:10karolherbst: and u32 a 0x1 + cvt f32 d s32 a is pointless...
16:10karolherbst: I think this is only valid for boolean inputs anyway
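The equivalence discussed above can be checked with a small Python model. This is only a sketch under stated assumptions: `set_u32`, `set_f32`, `cvt_f32_s32`, `set_and_cvt`, and `set_cvt_neg` are hypothetical helpers, not nouveau codegen functions, and the boolean encodings (0xffffffff/0 for set u32, 0x3f800000/0 for set f32) are taken from mwk's description, not from hardware documentation. The `neg` flag is assumed to negate the integer source before conversion.

```python
import struct

MASK32 = 0xFFFFFFFF


def f32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def set_u32(cond: bool) -> int:
    # Assumed encoding: all ones (-1) for true, 0 for false.
    return MASK32 if cond else 0


def set_f32(cond: bool) -> int:
    # Assumed encoding: bit pattern of 1.0 for true, 0 for false.
    return f32_bits(1.0) if cond else 0


def to_s32(bits: int) -> int:
    """Reinterpret a 32-bit pattern as a signed integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def cvt_f32_s32(bits: int, neg: bool = False) -> int:
    """Model of cvt f32 <- s32, optionally negating the source first."""
    value = to_s32(bits)
    if neg:
        value = -value
    return f32_bits(float(value))


def set_and_cvt(cond: bool) -> int:
    # set u32; and u32 0x1; cvt f32 s32
    return cvt_f32_s32(set_u32(cond) & 0x1)


def set_cvt_neg(cond: bool) -> int:
    # set u32; cvt.neg f32 s32 (mwk's shorter variant)
    return cvt_f32_s32(set_u32(cond), neg=True)


for cond in (True, False):
    print(cond, hex(set_and_cvt(cond)), hex(set_cvt_neg(cond)), hex(set_f32(cond)))
```

Under these assumptions all three sequences produce the same bit pattern for a boolean input, which is why the fold only makes sense when the source of the `and` is a `set`.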
16:31bublic: who is on?
16:31bublic: i have seen this page
16:32bublic: i do not find my vcard there which is a 870m, does it mean that it is not supported?
16:32karolherbst: bublic: it is a GK104 right? well it is supported
16:33karolherbst: and if you wait a few months you should be able to fully reclock it too
16:33bublic: ok, thx
16:34karolherbst: well you can also use my branch if you want to, just depends on if you need the performance or not
16:34karolherbst: but I guess you want to play games with such a gpu
16:34bublic: i want to do
16:34bublic: the project idea
16:34karolherbst: for nouveau?
16:34bublic: Switch OpenMAX state tracker in Mesa/Gallium to use Tizonia
16:35karolherbst: ohh okay
16:35karolherbst: I have no idea if nouveau supports openmax though
16:36karolherbst: bublic: but you can also come up with your own ideas
16:36karolherbst: bublic: just choose something _you_ want to do
16:36karolherbst: like what do you miss the most when using nouveau or open source drivers and see if it fits in a gsoc project (shouldn't be too easy)
16:37bublic: ok, thx
16:37karolherbst: I also have some ideas which aren't on the list if you are interested
17:21karolherbst: mwk: that was time well spent: total instructions in shared programs : 2216481 -> 2216473 (-0.00%)
17:22karolherbst: why do I always find those useless optimization potentials...
18:47imirkin: karolherbst: because nouveau does fairly well already at the simple stuff
18:49karolherbst: but this cut 8 instructions out of a 270 instruction shader
18:49karolherbst: but it was the only one affected
18:50karolherbst: but maybe you see a more generic optimization for this: https://github.com/karolherbst/mesa/commit/12ba65498008d92bf1ddd57ed00ad9c8ac915145
18:50sarnex: karolherbst: hey do you know if any nouveau-related lockups were resolved in the past couple of months? i've had issues with lockups even though my main gpu is AMD, blacklisting nouveau fixed it
18:50karolherbst: sarnex: well depends on the lockup
18:50karolherbst: did you reclock?
18:50sarnex: i didnt even use the gpu
18:50sarnex: it was just loaded so PRIME would work if i used it
18:51karolherbst: depends on the error then
18:51sarnex: it was a super strange error
18:51karolherbst: I guess X was messing around
18:51sarnex: like it would kill 1 cpu core, i would get a dmesg error about cpu core stall detected
18:51sarnex: and then like 30 seconds later it would lockup
18:52sarnex: i guess ill try un-blacklisting it
18:52karolherbst: well just by loading nouveau nothing bad should happen
18:52karolherbst: this should be fixed either way
18:52sarnex: ok thanks ill report in if it comes back
18:53sarnex: imirkin: hey do you remember the DRI_PRIME command that uses the pci location?
18:53imirkin: karolherbst: uhhhh what? that AND 1 is necessary...
18:53imirkin: sarnex: not offhand... you feed it the udev name
18:53imirkin: udevadm info /dev/dri/renderD129
18:53imirkin: E: ID_PATH_TAG=pci-0000_04_00_0
18:54imirkin: i think you feed that tag into DRI_PRIME
18:54imirkin: or maybe the ID_PATH above it
18:55sarnex: imirkin: yep that's right, thanks a lot
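The lookup imirkin describes can be sketched in Python. Assumptions, stated loudly: `extract_id_path_tag` and `prime_env` are hypothetical helper names, the sample text contains only the line quoted in the chat, and the sketch assumes that DRI_PRIME takes the ID_PATH_TAG value (imirkin was not certain whether it is the tag or the ID_PATH above it).

```python
import os


def extract_id_path_tag(udev_output: str):
    """Return the ID_PATH_TAG value from `udevadm info` text, or None."""
    prefix = "E: ID_PATH_TAG="
    for line in udev_output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def prime_env(tag: str, base_env=None) -> dict:
    """Copy an environment and set DRI_PRIME to the given tag."""
    env = dict(os.environ if base_env is None else base_env)
    env["DRI_PRIME"] = tag
    return env


# Sample line as quoted in the chat; real output has many more E: lines.
sample = "E: ID_PATH_TAG=pci-0000_04_00_0"
tag = extract_id_path_tag(sample)
print(tag)
print(prime_env(tag, {})["DRI_PRIME"])
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.run` when launching the program that should render on the secondary GPU.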
18:55sarnex: does 'unhandled TGSI property 18' matter?
18:56karolherbst: imirkin: set u32; and u32 0x1; cvt f32 u32
18:56karolherbst: isn't this equal to set f32?
18:58karolherbst: sarnex: hakzsam says it is to be expected
18:58sarnex: ok, thanks again guys
19:00karolherbst: imirkin: I thought set u32 produces -1/0, and'ing -1/0 with 0x1 makes 1/0, and cvt of 1/0 to f32 makes 1.0/0, which is the result of set f32, or is there something wrong?
19:00imirkin: karolherbst: oh, with the cvt, yes
19:00imirkin: but you never check for the cvt
19:00karolherbst: I sure do
19:01imirkin: oh duh
19:01karolherbst: well the other opts could modify insn->op to something else
19:01imirkin: of course you do
19:01imirkin: i'm getting forgetful in my old age
19:01karolherbst: there should be some break out of the switch if insn->op changes
19:01karolherbst: but yeah, only one shader in payday2 was affected
19:02karolherbst: none in my other games nor in shader-db
19:02imirkin: yeah, it's not very common
19:02imirkin: they must have done something odd
19:02karolherbst: set_and; and could be optimized...
19:03imirkin: you also could just flip the dType on the original insn
19:03imirkin: i mean on the original set insn
19:03imirkin: er hm
19:03imirkin: i guess not really, yeah
19:03imirkin: what you did is fine
19:03karolherbst: yeah I was there too at first
19:05karolherbst: the game runs like 50% slower though
19:05karolherbst: seems to be a pretty decent port though
19:05karolherbst: no crash at least
19:07karolherbst: what is happening there
19:07karolherbst: after the set,and,cvt thing
19:07karolherbst: add ftz f32 $r0 $r0 $r1
19:07karolherbst: cvt ftz s32 rz $r0 f32 $r0
19:07karolherbst: shl u32 $r17 $r0 0x00000006
19:08imirkin: someone's stupidly introducing floating point math when they totally don't need to
19:08karolherbst: I try to find it in the glsl
19:09karolherbst: int() call
19:09karolherbst: tmpvar_18 = textureProj (shadow_depth, (tmpvar_16 * global_shadow_projection[(int(dot (vec4(ivec4(lessThan (tmpvar_17, xlv_TEXCOORD5.wwww))), vec4(1.0, 1.0, 1.0, 1.0))) - 1)]));
19:09karolherbst: and array access based on the cast value
19:10imirkin: yep, that's it
19:10imirkin: instead of the stupid dot, which is done as floating point
19:10imirkin: they could have just summed the whole thing
19:10imirkin: and kept it as integers
19:10imirkin: however that sort of idiocy could be difficult to detect
19:12karolherbst: at least I remove the one int cast
19:12karolherbst: ohh wait
19:12karolherbst: I eliminated the ivec...
19:13karolherbst: and the vec4() constant
19:14karolherbst: what is left are 4 sets, whose results are summed up
19:17imirkin: right, but no reason to use floating point to do the summing
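A minimal Python sketch of imirkin's point, assuming the shader computes an array index as `int(dot(vec4(ivec4(lessThan(a, b))), vec4(1.0))) - 1`. The function names are hypothetical, and Python floats are doubles, not f32, but sums of at most four 1.0 values are exact in both. `index_via_float_dot` mirrors the GLSL expression, while `index_via_int_sum` keeps the whole computation in integers.

```python
def less_than(a, b):
    """Component-wise a < b, like GLSL lessThan()."""
    return [x < y for x, y in zip(a, b)]


def index_via_float_dot(a, b):
    # bvec4 -> ivec4 -> vec4, then dot with vec4(1.0), then int(), then - 1.
    as_float = [float(int(flag)) for flag in less_than(a, b)]
    ones = [1.0, 1.0, 1.0, 1.0]
    dot = sum(x * y for x, y in zip(as_float, ones))
    return int(dot) - 1


def index_via_int_sum(a, b):
    # Same result with no floating point involved: count the true lanes.
    return sum(int(flag) for flag in less_than(a, b)) - 1


# Illustrative values only; the real shader inputs are not known here.
thresholds = [0.1, 0.3, 0.6, 0.9]
for w in (0.0, 0.2, 0.5, 0.7, 1.0):
    b = [w] * 4
    print(w, index_via_float_dot(thresholds, b), index_via_int_sum(thresholds, b))
```

Both functions return the same index for every input, which is the property an optimization pass would need to prove before replacing the float add and cvt sequence with integer adds.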
19:17karolherbst: imirkin: when you've got time, I still have those two opts (PostRADCE and sub(a,0)) here: https://github.com/karolherbst/mesa/commits/to_upstream
19:18karolherbst: finally found a user for the latter one
19:35imirkin: try to get hakzsam to review them
19:36karolherbst: k, I'll try to ask him tomorrow then
20:14martm: i have nv98 too
20:15martm: ouh it's g98m geforce 9200 on my laptop, seems to have ogl 3.3 by default with nouveau