01:30 karolherbst: imirkin: okay, I understood what GlobalCSE is doing. I think instead of moving the phi around, we could simply add a mov bld.getSSA phi.def to the current bb instead of moving that phi instruction
01:31 karolherbst: ohh, we don't need a new SSA value, but simply reuse the def of the phi globalCSE wants to whipe out
01:32 karolherbst: *wipe
01:48 karolherbst: fun, that original phi was "phi u32 %r354 %r207 %r207"
01:52 karolherbst: inserting this mov causes worse code to be generated
01:57 karolherbst: here's another attempt to fix that, but it generates worse code. Funnily enough, for this shader it doesn't matter whether GlobalCSE is turned on or not, the produced result after RA is the same, so maybe we could just drop that moving of phis around in GlobalCSE completely? I shall ask shader-db what it thinks about this. https://github.com/karolherbst/mesa/commit/2abf0999dbe1e876c4e58c1541144cda3f5be1ac
02:17 technohacker: Hello again, sorry for disturbing, I wanted to know whether OpenCL is supported on NVC0. Wanted to use my GPU in Blender3D
02:20 imirkin: not with nouveau.
02:21 technohacker: imirkin: Oh ok, thanks
09:20 karolherbst: ugh, "precise" is a mess. It looks like it has to be fully analyzed in glsl, because if you assign a result of a function to a precise value, the internal operations of that function don't need to care about precise at all :/
09:20 karolherbst: should check how that's handled in nir
09:23 karolherbst: yeah, in nir the instructions are tagged as precise, which makes perfect sense
10:57 karolherbst: hakzsam, imirkin: could either of you do a quick review of my tgsi-precise patches today? I've never done anything there, so I would like to show them to you before posting them on the ML
11:08 karolherbst: yay, that flickering is fixed :)
11:08 karolherbst: I know there are a few things missing (like parsing that PRECISE modifier from the tgsi string), but I think it's a good first start: https://github.com/karolherbst/mesa/commits/gallium_precise
11:44 karolherbst: okay nice, that other tombRaider issue isn't caused by optimizations, let's look into this one
12:57 karolherbst: anybody an idea what can cause something like this? EXT_shader_integer_mix
12:57 karolherbst: ...
12:57 karolherbst: https://i.imgur.com/w31cIwa.png
13:56 imirkin: karolherbst: you're on the right track, but there's more to precise.
13:56 imirkin: karolherbst: for example, if you have
13:57 imirkin: foo = a + b
13:57 imirkin: bar = c + a + b + d
13:57 imirkin: you can't rewrite it as
13:57 imirkin: bar = c + foo + d
13:58 imirkin: [because floats suck]
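A minimal sketch of why that rewrite is unsafe, assuming IEEE 754 doubles as used by Python floats (shader floats are usually 32-bit, but the effect is the same). The values are picked for illustration and do not come from the discussion:

```python
# Float addition is not associative, so reusing foo = a + b inside
# c + a + b + d changes the rounding steps and can change the result.
a, b, c, d = 1e16, -1e16, 0.5, 0.0

foo = a + b                    # exactly 0.0
direct = ((c + a) + b) + d     # left to right: c is absorbed by a, result 0.0
rewritten = (c + foo) + d      # CSE-style rewrite: c survives, result 0.5

print(direct, rewritten)
```

Both expressions are mathematically equal, yet they round differently, which is exactly what a precise computation must not be exposed to.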
14:04 karolherbst: imirkin: I know, but I just wanted to have enough to fix the issue I've encountered
14:04 karolherbst: adding more rules is optional for now
14:04 karolherbst: important is: 1. tgsi infrastructure 2. fixing that issue. Everything else we can also add with follow ups
14:05 imirkin: otoh, we don't have stuff like algebraic rebalancing in nouveau
14:05 imirkin: so the above rewrite won't end up happening.
14:05 imirkin: so the issue really is when we do inexact algebraic transformations
14:05 karolherbst: mhh I think it could happen through some CSE magic by accident
14:05 imirkin: which is the float mul+add -> fma
14:06 karolherbst: yeah
14:06 imirkin: and some annoying things involving NaN's
14:06 karolherbst: that's also the example written in the spec
14:06 imirkin: which i don't think matter
14:06 karolherbst: that fma thing
14:07 karolherbst: imirkin: but to be honest, that a + b + c = a + c + b thing won't matter for precise as long as we do it _always_
14:08 karolherbst: the idea behind precise is, to do the same calculation the same way. Nouveau breaks this, because there is this refcount check
14:08 karolherbst: for fma
14:08 imirkin: no
14:08 karolherbst: it's the reason written inside the spec
14:08 imirkin: the idea of precise is for the same calculation to come out the same way in different shaders
14:08 karolherbst: well, that's what I meant
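The mul+add versus fma difference can be sketched as follows. This assumes IEEE 754 doubles and emulates the fused operation with exact rational arithmetic followed by a single rounding; `fused_mul_add` is a hypothetical helper for illustration, not an API of Mesa or nouveau:

```python
from fractions import Fraction

def mul_add(a, b, c):
    # Two roundings: once after the multiply, once after the add.
    return a * b + c

def fused_mul_add(a, b, c):
    # One rounding: compute a*b + c exactly, then round to a double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

unfused = mul_add(a, b, c)        # a*b rounds to 1.0, so the sum is 0.0
fused = fused_mul_add(a, b, c)    # keeps the tiny term: -2**-60

print(unfused, fused)
```

If one shader ends up with the fused form and another with the split form, the same source expression produces different values, which is the situation precise is meant to rule out.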
14:09 imirkin: karolherbst: mind handling 'invariant' while you're at it? should be expressible in terms of 'precise'
14:09 karolherbst: imirkin: in glsl to nir invariant = precise
14:09 karolherbst: so yeah
14:10 imirkin: that should account for a handful of CTS fails
14:10 karolherbst: yeah, I guess so
14:10 karolherbst: maybe I should fix most of those while I work at it
14:11 karolherbst: the TGSI is just looking a bit ugly though. Any better idea to express precise? maybe the same way as saturate? https://gist.github.com/karolherbst/8a574602e6be19a4e2292faf112d80be
14:13 imirkin: yeah, _PRECISE can work.
14:13 imirkin: or maybe at the very end
14:13 imirkin: although that may cause confusion for parsing
14:13 karolherbst: yeah
14:15 karolherbst: although then we can get OP_SAT_PRECISE, but whatever
14:18 karolherbst: _PRECISE version: https://gist.github.com/karolherbst/42d2e26e6f92713bd74d9b0d83950d0a
14:18 karolherbst: yeah, I think this looks better
14:23 karolherbst: imirkin: any idea about this other issue though?
14:25 karolherbst: the thing which is broken is this fire light refraction
14:25 imirkin: hm?
14:25 karolherbst: this one: https://i.imgur.com/w31cIwa.png
14:26 imirkin: uhm ok
14:26 karolherbst: this is on intel: https://i.imgur.com/wcAog81.png
14:26 karolherbst: and this is on intel before that refraction gets rendered: https://i.imgur.com/Beil0Qe.png
14:26 imirkin: fun.
14:26 karolherbst: yeah
14:27 karolherbst: I think they reuse stuff from the prior frame or something like this
14:27 imirkin: probably.
14:39 swedave: Hello all. I want to try to reclock my nvidia 970 with the nouveau driver and mount custom fans on my GPU. I was told that reclocking works but not the fans. How can I reclock it? I'm on Linux Mint 18.1 with kernel 4.12.0-rc4
14:40 imirkin: karolherbst: care to provide instructions?
14:40 imirkin: swedave: it's not straightforward.
14:41 imirkin: [have to get the nouveau pmu firmware onto the gpu after the nvidia pmu firmware boots the gpu]
14:41 swedave: ok. I thought it was like changing the pstate in the kernel. I'm not good at linux, but I like to try stuff ;)
14:41 imirkin: it's not for end users.
14:44 karolherbst: well it's not hard, but messy
14:44 karolherbst: 1. disable secboot and build 2. load nouveau 3. reclock 4. unload nouveau 5. enable secboot and build 6. load nouveau (and all that without rebooting)
14:45 karolherbst: one could automate this with a script on boot though with two nouveau modules
14:45 karolherbst: or something fancy like this
14:45 swedave: When I updated my kernel I noticed that the nouveau driver with wined3d9 is almost as fast as the Nvidia driver. Graphics and sound are better with nouveau. I run my monitor over hdmi.
14:46 karolherbst: swedave: yeah well. We were thinking about adding a flag or something for this, because the fans on laptops are controlled by the EC in most cases
14:46 imirkin: karolherbst: would this be as easy as adding a "do secboot" module param?
14:46 karolherbst: but I got stuck with odd PMU issues and secboot
14:46 karolherbst: imirkin: kind of, but I should rather search for the real bug why loading the nouveau PMU image after secboot doesn't work
14:47 swedave: ok. I have a home-built desktop.
14:48 swedave: I noticed now that with the nouveau driver, the fans don't spin.
14:49 karolherbst: swedave: yes, because we need signed firmware to control those
14:49 karolherbst: but we can reclock and set whatever voltage, ironic, isn't it?
14:49 swedave: Yeah, read about it.
14:51 swedave: Yeah, that's ironic. That's why I thought of mounting custom fans and reclocking. The nvidia driver with wine sux because it converts d3dx9 to opengl. World of Warcraft is not playable now :( That's why I thought I'd get messy ;)
14:51 karolherbst: imirkin: mhh, that TGSI parser doesn't like me :/ I guess I missed a place to set that precise flag
14:53 karolherbst: ohh, found it
14:53 swedave: karolherbst: Can you email me instructions on how I can try to get reclocking to work with my nvidia 970 4gb?
14:57 karolherbst: swedave: first build nouveau from source and get this working ;) you could use a drm-next kernel or 4.11 or so
15:06 karolherbst: imirkin: do you think it makes sense to print the precise flag in nv50_ir as well?
15:07 imirkin: dunno... maybe at the end (p) or something
15:07 karolherbst: it may be confusing if you look at the ir and think "huh, that could be optimized"
15:09 karolherbst: but I also don't mind it that much, because you will see it inside the TGSI as well
15:10 imirkin: btw, your code doesn't deal with a situation like
15:10 imirkin: foo = a * b;
15:10 imirkin: precise bar = a * b;
15:10 imirkin: foo2 = foo + c;
15:10 imirkin: precise bar2 = bar + c;
15:11 imirkin: [or ... something]
15:11 imirkin: basically once you CSE something, you can lose the precise bit
15:11 karolherbst: well, glsl does everything
15:11 karolherbst: ohhh
15:11 karolherbst: I see
15:11 imirkin: so once you CSE, you should copy the precise bit from the removed value if any
15:11 karolherbst: yeah, I was thinking about situations where we build new instructions and forget to check for precise flags
15:11 imirkin: exactly.
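The CSE rule described above (copy the precise bit from the removed value) can be sketched on a toy IR. The `Instr` class and `cse` function are hypothetical illustrations and are not nv50_ir's data structures or its actual pass:

```python
from dataclasses import dataclass

@dataclass
class Instr:
    dest: str
    op: str
    srcs: tuple
    precise: bool = False

def cse(instrs):
    """Toy local CSE that keeps the precise bit of any instruction it removes."""
    seen = {}     # (op, srcs) -> surviving instruction
    rename = {}   # removed dest -> surviving dest
    kept = []
    for ins in instrs:
        # Rewrite sources that referred to an already removed value.
        srcs = tuple(rename.get(s, s) for s in ins.srcs)
        key = (ins.op, srcs)
        survivor = seen.get(key)
        if survivor is None:
            survivor = Instr(ins.dest, ins.op, srcs, ins.precise)
            seen[key] = survivor
            kept.append(survivor)
        else:
            # The duplicate goes away, but its precise requirement must not.
            survivor.precise = survivor.precise or ins.precise
            rename[ins.dest] = survivor.dest
    return kept

program = [
    Instr("foo", "mul", ("a", "b")),
    Instr("bar", "mul", ("a", "b"), precise=True),
    Instr("foo2", "add", ("foo", "c")),
    Instr("bar2", "add", ("bar", "c"), precise=True),
]
result = cse(program)
for ins in result:
    print(ins)
```

Merging the flags with a logical OR is the conservative choice: the surviving mul and add both end up precise, so a later mul+add fusion would have to leave them alone.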
15:11 karolherbst: but seriously, I would rather concentrate on real issues and CTS fails for now
15:12 karolherbst: otherwise I could spend a month reading all codegen to figure out every corner case ;)
15:12 imirkin: it's important to get this right.
15:12 imirkin: a month well spent imho
15:12 karolherbst: sure it is, not saying we shouldn't do it
15:12 swedave: Someone needs to get out more ;)
15:12 karolherbst: I should have a focus and later on we can fix up all the other parts
15:12 karolherbst: *just
15:13 imirkin: anyways, difficult to get EVERY case, but we should cover the ones we can think of.
15:13 karolherbst: or first the ones which matter ;) second what we can think of and third whatever bugs arise
15:13 karolherbst: fixing the CTS fails may be challenging enough already
15:14 imirkin: the CSE case matters.
15:14 karolherbst: I know
15:14 imirkin: and is easy to fix.
15:14 karolherbst: yes
15:14 karolherbst: just need a way to run selected test with the CTS somehow
15:16 karolherbst: not really looking forward to running every case. who was it with the fancy scripts to make those tests run under piglit, airlied?
15:18 imirkin: those are checked in
15:18 imirkin: there are cts_foo profiles
15:19 karolherbst: ohh, how do I run those?
15:19 imirkin: not easily.
15:19 imirkin: it'll take you an hour to work it out the first time
15:19 imirkin: but then it'll be easy to reuse.
15:19 karolherbst: :/
15:19 karolherbst: I could also just run the CTS binaries and select subtests though
15:22 swedave: Im gonna let you highiq folks towork on . Im gonna go to the gym . Thats for explaining the situation . I just have to wait for updates then ;( Take care .
15:22 swedave: thanks *
15:24 karolherbst: imirkin: do you know how to list all tests in the cts?
15:24 imirkin: yeah, but it doesn't work
15:24 imirkin: they recently added a master file
15:24 imirkin: which has all the tests you're supposed to pass
15:24 karolherbst: I see
15:25 karolherbst: external/openglcts/data/mustpass
15:26 karolherbst: ohhh nice
15:26 karolherbst: I can run it for 3.0
15:27 karolherbst: gpu_shader5.precise_qualifier
15:29 karolherbst: mhhh
15:30 karolherbst: now it would be nice to know what exactly went wrong
15:30 karolherbst: :D
15:31 imirkin: look at TestResults.qpa
15:32 imirkin: iirc there may be options you can pass to the runner to get more debug output
15:32 karolherbst: it also fails with NV50_PROG_OPTIMIZE=0 :/
15:33 karolherbst: it passes on intel though
15:33 imirkin: yeah, st/mesa might do some stuff
15:34 karolherbst: yeah
15:34 karolherbst: just wanted to ask how to disable those opts there
15:34 imirkin: although i can't imagine what...
15:34 imirkin: it does copy prop
15:34 imirkin: and dce
15:34 imirkin: and various register renumbering
15:34 imirkin: but ... none of that should affect results
15:34 imirkin: so in combination with disabling all nouveau-side opts, should be fine
15:35 imirkin: welll... good luck :)
15:35 imirkin: feel free to file a card
15:35 imirkin: on the CTS board
15:35 imirkin: and include your analysis
15:35 karolherbst: it's not nv50_irs fault :p
15:37 karolherbst: so how can I disable all st/mesa opts?
15:37 imirkin: judicious use of "//"
15:37 imirkin: oh wait
15:37 imirkin: there are a couple of lame ones in there
15:37 imirkin: hold on
15:38 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/state_tracker/st_glsl_to_tgsi.cpp#n1548
15:38 imirkin: =]
15:39 imirkin: karolherbst: oh, one more thing - i think for OP_MAD, for precise, nv50_ir_from_tgsi should emit a mul + add instead.
15:39 imirkin: or maybe do it in lowering in nvc0
15:39 karolherbst: ...
15:39 karolherbst: well
15:39 imirkin: (but not for OP_FMA, of course)
15:39 karolherbst: only ADDs are tagged as precise in the TGSI
15:39 karolherbst: the TGSI looks wrong anyway
15:40 karolherbst: but I also don't find the glsl
15:40 * imirkin hates that there's both MAD and FMA
15:41 imirkin: i wish i could take like ... a year off ... and fix all this shit
15:41 karolherbst: imirkin: make a list, somebody might be able to take care of it soon
15:41 imirkin: =]
15:46 karolherbst: ohh
15:46 karolherbst: imirkin: it succeeds now :D
15:46 karolherbst: wondering why that is
15:46 imirkin: after doing what?
15:46 imirkin: commenting the try_emit_mad's?
15:46 karolherbst: yes
15:47 imirkin: coz they become FMA
15:47 karolherbst: yeah
15:47 imirkin: but in a different shader, the try_emit_mad fails
15:47 imirkin: and so it comes out as mul+add
15:47 imirkin: hence diff result
15:47 karolherbst: but the Precise flag should be set there as well already
15:47 karolherbst: so I can disable this st/mesa pass if the instruction is flagged as precise
15:47 imirkin: imo we should lower mad -> mul + add in nouveau unconditionally
15:48 imirkin: irrespective of precise.
15:48 karolherbst: no
15:48 imirkin: and then recombine into mad if possible.
15:48 karolherbst: compilers are allowed to do the mul+add=mad opt
15:48 imirkin: as an optimization
15:48 karolherbst: well sure, but if we split it, we would just opt it again
15:48 imirkin: yep
15:48 karolherbst: because the precise flag isn't there
15:49 karolherbst: so it's kind of bogus to do that
15:49 imirkin: shouldn't there be a precise flag in this case though?
15:49 imirkin: in which case it wouldn't get recombined
15:49 karolherbst: not really
15:49 imirkin: o..
15:49 imirkin: that's a problem.
15:49 karolherbst: I just check for precise in that st/mesa pass
15:49 karolherbst: it's simple enough
15:50 karolherbst: or we could simply remove those peepholes
15:51 imirkin: look ... you're not trying to hack your way around things until this one test passes
15:51 imirkin: you want to do something that is logically consistent
15:51 imirkin: logically, MAD = mul + add; FMA = fma
15:52 karolherbst: I know, but if the expression is tagged as precise, st/mesa isn't allowed to do this mad peephole as well
15:52 imirkin: should be possible for MAD to have a precise flag
15:52 imirkin: sure it is.
15:52 karolherbst: why?
15:52 imirkin: why not?
15:52 imirkin: MAD = mul + add. all's well.
15:52 karolherbst: because it's tagged as precise
15:52 imirkin: it's nouveau that implements it as fma, which is wrong.
15:53 karolherbst: mhh, okay, I see your point
15:53 imirkin: which is also why we should disassociate MAD on input
15:53 imirkin: and then reassociate as possible.
15:53 imirkin: that said, the st/mesa opt is pretty dumb too
15:53 imirkin: so wtvr
15:54 imirkin: but the st/mesa thing isn't wrong. it's just dumb.
15:54 karolherbst: I see
15:54 karolherbst: fun thing is, that those mul
15:54 karolherbst: +adds aren't tagged as precise in the first place
15:54 imirkin: what's the glsl say?
15:55 karolherbst: I try to find it
15:55 imirkin: MESA_GLSL=dump helps
15:55 karolherbst: ohhhhh
15:55 karolherbst: "precise float result = (p.x*w.x + p.y*w.y) + (p.z*w.z + p.w*w.w);"
15:55 karolherbst: all subexpressions need to be tagged as well afaik
15:56 imirkin: of course, yea
15:56 karolherbst: https://gist.github.com/karolherbst/64456e53890819fb3ee6e083492ef347
15:58 imirkin: yeah, you need to see how glsl -> nir handles it
15:58 imirkin: or ask around
15:58 karolherbst: I already checked
15:58 karolherbst: and I was sure I do exactly the same thing
15:58 imirkin: could be that the nir thing is incomplete and intel just gets lucky
15:59 karolherbst: well nouveau also gets lucky
15:59 imirkin: or could be that nir has some logic on its side to propagate precise's
15:59 karolherbst: mhhh, doubt it
16:01 karolherbst: imirkin: what's "hir" again?
16:01 imirkin: hierarchical ir visitor
16:02 imirkin: [i have no clue]
16:02 imirkin: it goes ast -> hir -> ir
16:02 imirkin: or something
16:02 karolherbst: ahh okay
16:02 karolherbst: so unrelated to nir
16:02 imirkin: correct.
16:02 karolherbst: there is only one line regarding precise in glsl_to_nir
16:03 karolherbst: and it's where assignments are checked
16:03 karolherbst: but maybe nir has some more information and can analyze the entire assignment tree to tag more as precise
16:03 karolherbst: and we would need to do that in glsl_to_tgsi as well
16:05 imirkin: they might treat the whole ssa chain as precise - i dunno.
16:06 imirkin: like i said ... worth asking
16:06 karolherbst: yeah
17:30 karolherbst: imirkin: nice, got it working with a crappy hack :/
17:30 karolherbst: the mads are flagged as precise as well though
17:31 karolherbst: this is super ugly: https://github.com/karolherbst/mesa/commit/4e65f8a1e5e97038adf8fd35ef48dacc355da815
17:31 karolherbst: especially because I've added state to glsl_to_tgsi_visitor
17:32 karolherbst: the lines around "ir->rhs->accept(this);" is the important part
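The "state in the visitor" approach described here can be sketched roughly like this. It is a toy model with hypothetical names (`Visitor`, `visit_assign`, `visit_expr`), not the actual glsl_to_tgsi_visitor code from the linked commit:

```python
class Visitor:
    """Toy expression visitor that tags every instruction emitted while
    the right-hand side of a precise assignment is being visited."""

    def __init__(self):
        self.precise = False   # the extra piece of visitor state
        self.emitted = []      # (op, dest, srcs, precise)
        self._next_tmp = 0

    def visit_expr(self, expr):
        if isinstance(expr, str):          # leaf: a variable name
            return expr
        op, lhs, rhs = expr
        srcs = (self.visit_expr(lhs), self.visit_expr(rhs))
        dest = f"t{self._next_tmp}"
        self._next_tmp += 1
        self.emitted.append((op, dest, srcs, self.precise))
        return dest

    def visit_assign(self, dest, rhs, dest_is_precise):
        saved = self.precise
        self.precise = dest_is_precise     # set before descending into the rhs
        try:
            value = self.visit_expr(rhs)
            self.emitted.append(("MOV", dest, (value,), self.precise))
        finally:
            self.precise = saved           # restore for the next statement

# precise float result = (p.x*w.x + p.y*w.y) + (p.z*w.z + p.w*w.w);
rhs = ("ADD",
       ("ADD", ("MUL", "p.x", "w.x"), ("MUL", "p.y", "w.y")),
       ("ADD", ("MUL", "p.z", "w.z"), ("MUL", "p.w", "w.w")))

v = Visitor()
v.visit_assign("result", rhs, dest_is_precise=True)
v.visit_assign("other", ("MUL", "a", "b"), dest_is_precise=False)
for ins in v.emitted:
    print(ins)
```

Saving and restoring the flag keeps the tagging limited to the subexpressions of the precise assignment, so the four MULs and three ADDs are tagged while the later unrelated MUL is not.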
17:34 imirkin: hmmm
17:34 imirkin: yea
17:34 imirkin: not the worst thing in the world
17:36 karolherbst: but I think other ways especially if it comes to subtree parsing are much more complex than this
17:37 karolherbst: especially because precise has no effect if it's declared out of scope
17:38 karolherbst: interesting is stuff like float f = a * b; precise float g = 2*f+a
17:38 karolherbst: is the expression a*b also tagged as precise?
17:38 imirkin: ;)
17:38 karolherbst: but I think it's a yes here
17:38 imirkin: talk to kayden
17:39 imirkin: and/or jekstrand
17:39 karolherbst: no, it's a yes
17:39 imirkin: i believe one of them did the i965 precise stuff
17:39 karolherbst: it's basically in the spec
17:39 karolherbst: https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_gpu_shader5.txt
17:39 karolherbst: "Some examples of the use of "precise" include:" part
17:39 imirkin: and can share some insights with you about how to properly implement it
17:39 karolherbst: the main function there
17:39 karolherbst: but I think glsl handles this already
17:40 karolherbst: it looked like it looking at the tomb_raider shader output
17:40 karolherbst: anyway, now I implement this mad->mul+add split in nv50_ir and then we should be good to go
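A rough sketch of such a split, assuming a toy instruction tuple of `(op, dest, srcs, precise)`. The function name and the temporary naming are hypothetical, and the linked commit may do this differently:

```python
def lower_precise_mad(instrs):
    """Split precise MAD into MUL + ADD; leave FMA and non-precise MAD alone."""
    out = []
    tmp_count = 0
    for op, dest, srcs, precise in instrs:
        if op == "MAD" and precise:
            a, b, c = srcs
            tmp = f"%split{tmp_count}"    # hypothetical temporary name
            tmp_count += 1
            # Both halves stay precise so they are not fused again later.
            out.append(("MUL", tmp, (a, b), True))
            out.append(("ADD", dest, (tmp, c), True))
        else:
            out.append((op, dest, srcs, precise))
    return out

program = [
    ("MAD", "r0", ("a", "b", "c"), True),    # must be split
    ("MAD", "r1", ("a", "b", "c"), False),   # free to become an fma
    ("FMA", "r2", ("a", "b", "c"), True),    # explicitly fused, keep as is
]
lowered = lower_precise_mad(program)
for ins in lowered:
    print(ins)
```

This follows the reading from earlier in the discussion that MAD means mul + add while FMA means a fused operation, so only the former needs splitting when precision matters.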
17:45 karolherbst: pass :)
17:47 karolherbst: imirkin: what do you think? https://github.com/karolherbst/mesa/commit/1a89b4afc582df4d08aa5086384119da282ac1c5
17:48 imirkin: sounds reasonable
17:49 karolherbst: would be fun if that other tomb raider bug is fixed now as well
17:50 karolherbst: sadly no
17:57 karolherbst: fun
17:57 karolherbst: 'KHR-GL44.shaders.arrays.constructor.float4_fragment' passes
17:58 karolherbst: but causes a segfault when 'KHR-GL44.shaders.arrays.constructor.float4_*' is run
17:58 karolherbst: https://gist.github.com/karolherbst/6f06886e7fc0ac47bd3e98074ad76a15
17:58 karolherbst: sounds like memory corruption somewhere
18:06 imirkin: annoying, right?
18:06 imirkin: there are a few of those.
18:06 karolherbst: super annoying
18:12 karolherbst: but it seems like we pass quite a lot of tests already
18:18 karolherbst: this looks interesting: https://gist.github.com/karolherbst/ef7ab8b4b5e5201d6967f104a2664db6
18:20 imirkin: oh yeah... i seem to recall debugging that... something gets left in a bin that's not supposed to
18:23 karolherbst: imirkin: wanna look at my whole precise series again and then after fixing your comments I post it as RFC on mesa-dev?
18:30 imirkin: karolherbst: just post it...
18:31 karolherbst: okay
18:42 karolherbst: mhhh, I should alter my CC settings :/ sorry
20:06 karolherbst: imirkin: ohh, I think I found the error for that memory thing
20:06 karolherbst: the same entry is twice inside the list
20:07 karolherbst: mhh
20:07 karolherbst: that's odd though
20:18 karolherbst: interesting
20:19 karolherbst: when I ignore on_flush in nvc0_bufctx_fence and always use the pending bufctx, there is no invalid read.
20:24 karolherbst: KHR-GL44.cull_distance.functional is broken on a glsl ir level
20:25 karolherbst: but doesn't fail on intel? ...
20:33 karolherbst: yeah, that looks wrong: "(declare (location=17 shader_out ) (array float 0) gl_ClipDistance)"
20:39 tobijk: karolherbst: it is right, cull and clip get narrowed together ;>
20:40 tobijk: see lower_distance.cpp or whatever it's called
20:41 karolherbst: tobijk: I get an error though
20:41 karolherbst: "ir_variable has maximum access out of bounds (1 vs 0) (declare (location=17 shader_in ) (array (array vec4 2) 1) gl_ClipDistanceMESA)"
20:42 tobijk: not said its implemented right, just that it is put into one slot
20:42 karolherbst: I see
20:45 tobijk: karolherbst: mh there was something with array size=1 and cull, but i can't remember exactly
20:46 imirkin: karolherbst: yeah, it's a weird one
20:47 imirkin: karolherbst: double-weird is that it passes on radeon :)
20:52 karolherbst: and on intel
20:53 tobijk: karolherbst: well intel does not use the lowering and gallium
20:54 tobijk: two places to get things wrong :/
20:54 karolherbst: "Conditional jump or move depends on uninitialised value(s)" :)
20:54 karolherbst: https://gist.github.com/karolherbst/be24928c1270cb209fb96eeb0a0ce9ba
20:56 karolherbst: tobijk: is this array an implicit_sized_array?
20:58 tobijk: uhhhhm...
20:58 tobijk: should be :>
20:59 karolherbst: okay, at least that "!fields[i].implicit_sized_array" is not initialized
21:05 tobijk: karolherbst: what bothers me: normally there should be lower_distance.cpp in your call stack
21:05 karolherbst: tobijk: I think I compiled with Ofast
21:05 tobijk: validate_ir_tree() is called right after the lowering, not somewhere later
21:06 karolherbst: ohh wait
21:06 karolherbst: actually with O0
21:10 tobijk: karolherbst: mh the lowering pass looks different to the original one
21:10 karolherbst: what do you mean?
21:11 tobijk: its a bit different to what i remember :)
21:11 karolherbst: ahh I see
21:11 karolherbst: well code changes :p
21:20 tobijk: karolherbst: am i right, there is no clip size defined?
21:20 tobijk: or better it is 0, only cull is used in the test?
21:22 tobijk: karolherbst: https://cgit.freedesktop.org/mesa/mesa/tree/src/compiler/glsl/lower_distance.cpp#n674 make this cull_size and give it a test :>
21:30 airlied: pretty sure clip size is correct
21:30 tobijk: airlied: shouldn't it be cull_size if clip_size is 0?
21:30 airlied: no
21:30 airlied: its an offset
21:31 airlied: where to start the cull dists
21:31 tobijk: uhm the offset is after that
21:31 airlied: no it isnt
21:31 airlied: there are two initialiser paths for the isiotr
21:32 airlied: visitor
21:32 airlied: i have vague memories of writing that pass
21:33 airlied: only a year ago, how quickly i forget
21:35 karolherbst: anyhow, clip_size is 0
21:35 karolherbst: okay, makes sense though
21:35 tobijk: airlied: hmhm so new_size is 0 and max_array_access ends up as -1
21:36 tobijk: not sure if that matters
21:36 karolherbst: ohh, when is the glsl shader printed, after it linked or before that?
21:37 karolherbst: I was wondering, because I don't think the shader I've got has anything to do with the error but is a new shader all along
21:38 tobijk: karolherbst: which test are you running exactly and can i get my hands on it? :D
21:38 karolherbst: tobijk: ./glcts --deqp-case='KHR-GL44.cull_distance.functional'
21:39 tobijk: well the cts are not for everyone as i remember?
21:39 karolherbst: I only have access to the public stuff
21:39 karolherbst: tobijk: https://github.com/KhronosGroup/VK-GL-CTS
21:39 tobijk: oh i'm out of the loops :>
21:40 karolherbst: removed that abort() call now
21:40 karolherbst: let's see
21:41 airlied: tobijk: new size is never 0
21:41 karolherbst: ahh, now it makes more sense
21:41 airlied: total size has to be > 0
21:44 tobijk: airlied: yep, overlooked the orig->total_size ~_~
21:45 karolherbst: mhhh, it seems like there are multiple tests ran at once
21:50 tobijk: hrmpf the build system does not honor my prefix given to configure everywhere :/
21:50 tobijk: */usr/bin/install: cannot remove '/usr/share/glvnd/egl_vendor.d/50_mesa.json': Permission denied*
21:50 karolherbst: hihi
21:51 karolherbst: don't install it
21:51 karolherbst: why would you?
21:51 tobijk: well configure mesa with prefix and just hit make install
21:51 tobijk: goes to the prefix path
21:51 karolherbst: ohhh mesa
21:51 tobijk: well it did in the good old days :-)
21:52 karolherbst: maybe that glvnd path is hardcoded?
21:52 tobijk: i think they forgot to include the prefix or something
21:52 karolherbst: well
21:52 karolherbst: it's a fixed path
21:53 tobijk: jep, still not my favored behavior
21:54 karolherbst: tobijk: no, I meant according to spec the path is fixed. It doesn't make sense to install it somewhere else
21:54 karolherbst: but yeah
21:54 karolherbst: they should honor it
21:54 tobijk: mhm damn that spec :/
22:15 tobijk: karolherbst: do you run the cts with additional args to work with prime?
22:16 karolherbst: tobijk: second X server
22:16 tobijk: meh
22:16 karolherbst: tobijk: also you need to force 4.5 450 with nouveau ;)
22:17 karolherbst: tobijk: but you can also use DRI_PRIME
22:17 karolherbst: you just need to force a higher gl version
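A small sketch of the invocation described above. The chat only says to "force 4.5 450"; mapping that to Mesa's `MESA_GL_VERSION_OVERRIDE` and `MESA_GLSL_VERSION_OVERRIDE` environment variables is an assumption here, and `cts_invocation` is a hypothetical helper:

```python
import os

def cts_invocation(case, use_prime=False):
    """Build argv and environment for one CTS case without running it."""
    env = dict(os.environ)
    # Assumed mapping of "force 4.5 450" to Mesa's override variables.
    env["MESA_GL_VERSION_OVERRIDE"] = "4.5"
    env["MESA_GLSL_VERSION_OVERRIDE"] = "450"
    if use_prime:
        # Render on the secondary GPU instead of using a second X server.
        env["DRI_PRIME"] = "1"
    argv = ["./glcts", f"--deqp-case={case}"]
    return argv, env

argv, env = cts_invocation("KHR-GL44.cull_distance.functional", use_prime=True)
print(" ".join(argv))

# To actually run it from the CTS build directory:
#   import subprocess
#   subprocess.run(argv, env=env, check=False)
```

Keeping the overrides in a per-process environment avoids changing the settings of the main session.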
22:17 tobijk: ah
22:25 tobijk: karolherbst: oh with the test clip and cull sizes are 0 and thus get not lowered at all
22:25 karolherbst: there are several though
22:25 tobijk: ?
22:25 karolherbst: there are several tests where both values are 0
22:27 karolherbst: heh
22:29 karolherbst: this should be the shader messing up on nouveau: https://gist.github.com/karolherbst/1a0b741155e8c15cd4b7beaeb9190089
22:39 karolherbst: tobijk: shader_test file: https://gist.githubusercontent.com/karolherbst/24f83b4367231268d937c3b03a1d7bbb/raw/c590f4a3d8d7ff4a6770b6e9e5c7bc6c4ea1863c/gistfile1.txt
22:41 tobijk: karolherbst: ay thanks
22:47 karolherbst: tobijk: do you see how lower_clip_cull_distance isn't called for intel?
22:48 karolherbst: ohhhhhhhh
22:48 karolherbst: imirkin: you will love this
22:48 karolherbst: ../src/compiler/glsl/linker.cpp:4951
22:52 tobijk: karolherbst: yeah as i said, they do not lower :)
22:52 karolherbst: yeah...
22:52 karolherbst: I guess the lowering code is a bit faulty then
22:52 karolherbst: does it even work on AMD?
22:52 tobijk: dunno
22:52 tobijk: i only see cull sizes of 0 instead of 8
22:52 karolherbst: I would be surprised if it succeeds
22:57 karolherbst: tobijk: hum, I think there is something odd
22:58 tobijk: where?
22:58 karolherbst: this "in float clipdistance_data[1];" confuses me
22:59 karolherbst: there is no reason why gl_ClipDistanceMESA should be accessed at index 1, because nothing reads from it
22:59 karolherbst: or I am just blind
22:59 karolherbst: I can't see why the compiler generates this expression: "(declare (location=17 shader_in ) (array (array vec4 2) 1) gl_ClipDistanceMESA)"
23:00 tobijk: mhm im no further :/
23:01 karolherbst: he
23:02 karolherbst: tobijk: after removing all ASSIGN_CULL_DISTANCE calls in the geometry shader, it compiles
23:03 tobijk: heh well you just removed the error then :) (nothing uses cull, just discard it)
23:04 karolherbst: yeah
23:04 karolherbst: but the error is about clip
23:04 karolherbst: not cull
23:04 tobijk: no its clipDistanceMesa
23:04 karolherbst: ohhh
23:04 tobijk: clipDistance and cullDistance get lowered to that
23:04 karolherbst: it's an array for both?
23:05 karolherbst: okay, makes sense then
23:05 karolherbst: wondering why that array is of size 0 then
23:06 tobijk: yep it should be 1
23:07 tobijk: (we lower to 2x4)
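The layout discussed here can be sketched as index arithmetic, assuming (from the description above, not from reading lower_distance.cpp) that clip distances come first, cull distances start at offset `clip_size`, and the combined floats are packed into an array of vec4s. `lowered_slot` is a hypothetical helper:

```python
def lowered_slot(kind, index, clip_size):
    """Map gl_ClipDistance[i] or gl_CullDistance[i] to (vec4 index, component)
    in the combined lowered array."""
    if kind == "clip":
        flat = index
    elif kind == "cull":
        flat = clip_size + index   # cull distances start after the clip ones
    else:
        raise ValueError(f"unknown kind: {kind}")
    return divmod(flat, 4)         # 4 floats per vec4

# With no clip distances, the first cull distance lands in vec4 0, component 0.
print(lowered_slot("cull", 0, clip_size=0))
# With 2 clip distances, cull distance 5 is flat element 7: vec4 1, component 3.
print(lowered_slot("cull", 5, clip_size=2))
```

Under this model a combined total of up to 8 floats needs at most two vec4s, which matches the "2x4" remark.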
23:10 karolherbst: going to bed sounds like a reasonable plan now :)
23:10 tobijk: yep good idea :)
23:10 airlied: I'm pretty sure radeon passes that test fine
23:10 karolherbst: airlied: but how?
23:11 karolherbst: gallium complains if that lowering isn't done
23:11 airlied: I'm pretty sure I even wrote the lowering pass on radeon
23:11 karolherbst: ohh, so there is special handling for radeon for this?
23:11 airlied: nope
23:11 karolherbst: mhhh
23:12 karolherbst: would you mind to check that?
23:12 karolherbst: because it doesn't look like it should work
23:12 airlied: yup just going to now
23:12 karolherbst: maybe somebody just broke something
23:14 airlied: might take me a few mins, the machine I normally run CTS on seems to be dead
23:15 tobijk: a quick search did only bring up two commits really: https://cgit.freedesktop.org/mesa/mesa/commit/?id=64e201ab8f08daa2c189ab615a4096daf60c27c5 and https://cgit.freedesktop.org/mesa/mesa/commit/?id=fc707f570f918ab0defd33405c8c82f307196d17
23:16 imirkin: it's something about some lowering which is done differently between nouveau and radeon
23:16 tobijk: imirkin: and silly me thought it was run unconditionally if you use gallium
23:17 karolherbst: it should run on both actually
23:17 airlied: https://paste.fedoraproject.org/paste/u0bLNMAN9QFzEXrKdE1HmA
23:17 airlied: is from 17.0.5 just building master
23:17 karolherbst: mhhh
23:17 karolherbst: interesting
23:18 imirkin: i mean, some unrelated lowering
23:18 imirkin: like ... e.g. return lowering
23:18 karolherbst: ohh, I see
23:18 karolherbst: yeah, that might explain it
23:19 imirkin: which in turn is triggering a bug in one of the passes that radeon/intel don't trigger
23:20 airlied: it's also valgrind clean here
23:21 tobijk: airlied: master? so we are back to square 1
23:22 airlied: oh debug radeon build shows it here
23:22 karolherbst: that un-init read?
23:22 airlied: the assert
23:22 karolherbst: ohhh
23:22 karolherbst: odd
23:23 karolherbst: ohh wait
23:23 karolherbst: on nouveau as well
23:23 karolherbst: on nouveau there is just a silly test fail: Fail (Vertex is unexpectedly clipped or invisible at gl3cCullDistanceTests.cpp:2437)
23:23 karolherbst: on release build
23:23 airlied: we pass the test on release build
23:23 karolherbst: fun
23:24 airlied: let me go see if I can work it out
23:24 karolherbst: but at least radeon hits the same assert in the glsl code as well
23:24 karolherbst: now the world makes sense again :)
23:25 karolherbst: tobijk: fyi, running CTS and piglit on a second X server doesn't bring up those annoying windows all the time ;)
23:26 airlied: PIGLIT_NO_WINDOW=1 gets rid of most of them
23:27 karolherbst: well sure, but I also don
23:27 tobijk: oh piglit gets mature :D
23:27 karolherbst: 't like that my main X server gets messed up
23:27 karolherbst: sometimes.... it gets weird
23:27 karolherbst: especially if a test causes to hang the GPU or stuff like this
23:28 tobijk: karolherbst: well then i'm fucked anyway as my external screen is connected to the nvidia card
23:28 karolherbst: true
23:28 karolherbst: but I've heard nouveau can recover from faults now
23:28 karolherbst: kinda
23:29 tobijk: karolherbst: actually i saw it recover from some minor errors just fine
23:29 karolherbst: yeah I know it works most of the time, but I still get nouveau to crash real hard
23:29 tobijk: yeah
23:32 airlied: looks like it's validating the wrong level of array if I had to guess, not sure how max_array_access works for multi-level arrays
23:33 tobijk: airlied: you got to the lowering pass with actual sizes for clip and cull in that test?
23:34 tobijk: max_array_access was my idea earlier on where i bugged you about, but it looks like it does not even get there
23:34 karolherbst: have fun, I am outta here
23:34 tobijk: karolherbst: good night
23:34 airlied: max_array_access is being set wrong
23:36 airlied: https://paste.fedoraproject.org/paste/xDQiVUptpzVFmYurfLr2IQ fixes it here
23:39 tobijk: airlied: let me do a clean compile
23:47 airlied: patch sent to list
23:50 tobijk: mh ok then: one bug less, more to go for nouveau