04:06 imirkin: someone should look at GL_NV_scissor_exclusive -- seems like an easy ext to implement -- but it's turing+
04:07 skeggsb: RT exts would be more fun:P
04:08 HdkR: Please implement RT for my TU102 :)
04:08 imirkin: HdkR: do you run nouveau?
04:09 skeggsb: HdkR: that'll be nice and fun at boot clocks :P
04:09 HdkR: imirkin: Once you've gone RT do you want to live your life without RT anymore?
04:09 HdkR: :P
04:11 HdkR: mmm, boot clock RT sounds perfect for a couple rays per frame
04:12 imirkin: you live in a low-light world :)
04:12 HdkR: Very noisy
04:16 raket: a? bgc¥m
04:16 imirkin: time to change password?
04:16 raket: phone + irssi :)
04:17 imirkin: hehe ok
14:37 karolherbst: imirkin: mhh.. is there anything special the compiler needs to do for tfb to work? I thought that's all driver side?
14:38 karolherbst: ahh ehh
14:38 karolherbst: silly me
14:38 karolherbst: should have checked dmesg before
14:38 karolherbst: "gr: GPC1/TPC1/MP trap: global 00000000 [] warp 3f0009 [ILLEGAL_INSTR_ENCODING]"
14:38 imirkin: yeah, i mean compiler has to respect varying layout or else nothing works
14:38 imirkin: but other than that, it doesn't care about tfb
14:38 karolherbst: yeah.. right, but the compiler needs to respect that for other things anyway :p
14:39 karolherbst: just trying to get the remaining CTS fails out of the way
14:39 karolherbst: 4 fails in a 4.6 run
14:39 imirkin: what are the failing ones?
14:39 imirkin: btw, i assume you're not working on some ancient tree?
14:39 karolherbst: nir
14:40 imirkin: i did have some mild xfb fixes for some failing CTS tests
14:40 imirkin: i pushed those a couple months ago, i think
14:40 imirkin: (or maybe they were failing GTF tests... i don't remember. something was failing.)
14:41 karolherbst: KHR-GL45.direct_state_access.textures_parameter_setup_errors but that's on master
14:41 imirkin: yeah, that sounds like a "global" issue
14:41 karolherbst: this is the only fail with TGSI (besides CTS and API bugs)
14:41 karolherbst: ahh right.. probably a core bug as well
14:41 imirkin: i mean something due to core mesa
14:41 imirkin: nouveau (or even st/mesa for that matter) doesn't handle GL errors
14:41 karolherbst: only nir special fails are KHR-GL46.shader_subroutine.ssbo_atomic_image_load_store and GTF-GL46.gtf40.GL3Tests.transform_feedback3.transform_feedback3_geometry_instanced
14:42 imirkin: oh man
14:42 imirkin: i don't remember all this instanced business
14:42 karolherbst: yeah.. but the tfb one throws encoding errors
14:42 imirkin: would have to RTFS
14:42 imirkin: (s = spec in this case)
14:42 imirkin: ah, well check if it's legit
14:42 karolherbst: yeah :)
14:42 imirkin: or if we're overwriting something we shouldn't be
14:49 karolherbst: ehh.. nvdisasm is happy with it :/
14:49 imirkin: must be perfect!
14:49 imirkin: does nvdisasm match nvdisasm with tgsi
14:49 imirkin: or rather
14:49 karolherbst: https://gist.githubusercontent.com/karolherbst/19808c02012076c718cad7be2a61e0cb/raw/d251245de6306657cd2ff12523e87f980bc6b9d1/gistfile1.txt
14:50 karolherbst: well.. geometry shaders are annoying to compare
14:50 imirkin: yea....
14:50 karolherbst: out.EMIT_THEN_CUT maybe?
14:50 imirkin: lemme see if i see anything obvious
14:50 karolherbst: especially as nir optimizes writes after the last emit away
14:50 karolherbst: and codegen doesn't
14:50 imirkin: wouldn't matter if you do EMIT / CUT or if you do it in one go.
14:51 imirkin: oh
14:51 imirkin: um
14:51 imirkin: this is points, right?
14:51 imirkin: i don't think you're supposed to have CUT in there
14:51 imirkin: (or EMIT_THEN_CUT)
14:51 imirkin: i could imagine that causing the illegal ops
14:51 karolherbst: not quite sure...
14:51 imirkin: check if they show up in tgsi
14:52 imirkin: (this must be points since it's multi-stream output)
14:52 imirkin: alllso ...
14:52 imirkin: /*0158*/ OUT.EMIT R8, RZ, RZ ; /* 0xfbe000800ff7ff08 */
14:52 imirkin: not 100% sure that's legal
14:52 karolherbst: mhhh
14:52 imirkin: the final arg may *have* to be an immediate
14:52 imirkin: i.e. 0x0
14:52 imirkin: vs RZ
14:52 imirkin: (yeah, i know, it's dumb)
14:52 imirkin: (why have the encoding if it doesn't work? dunno. i don't make the rules.)
14:53 imirkin: again - check against tgsi
14:53 karolherbst: " 67: emit u32 $r4 $r4 $r255" is what TGSI does :p
14:53 imirkin: ok
14:53 imirkin: so i guess RZ works
14:53 karolherbst: the test does like 8 or so geometry shaders
14:53 imirkin: does it do cut's?
14:53 karolherbst: and only one fails
14:53 karolherbst: restart in codegen, right?
14:53 imirkin: i don't remember.
14:53 imirkin: probably
14:53 imirkin: just check the nvdisasm :)
14:54 karolherbst: trying to find the one causing issues in my dumped output :)
14:54 karolherbst: soo yeah.. no "emit restart" in TGSI
14:55 karolherbst: but seeing restarts with TGSI
14:56 karolherbst: restart == OUT.CUT
14:56 karolherbst: emit == OUT.EMIT
14:56 karolherbst: and "emit restart" == OUT.EMIT_THEN_CUT
14:56 karolherbst: so I'd assume the EMIT_THEN_CUT may be illegal
14:58 karolherbst: ohh yeah.. with TGSI we have some special handling
14:58 karolherbst: if (stream && op == OP_RESTART) break;
14:58 karolherbst: and if (info->prop.gp.maxVertices == 0) break;
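A minimal C++ sketch of the two early-outs quoted above, for readers following along. `GsOp`, `mnemonic` and `legalizeGsOp` are hypothetical stand-ins, not codegen's real types; the mnemonics follow the mapping given in this log (restart == OUT.CUT, emit == OUT.EMIT, emit restart == OUT.EMIT_THEN_CUT). Keeping only the emit half of a combined op on a non-zero stream is an assumption that mirrors the TGSI rule; the log does not confirm it.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical geometry-shader ops, named after the codegen ops in the log.
enum class GsOp { Emit, Restart, EmitRestart };

// Hardware mnemonics as mapped in the discussion above.
std::string mnemonic(GsOp op) {
  switch (op) {
  case GsOp::Emit:        return "OUT.EMIT";
  case GsOp::Restart:     return "OUT.CUT";
  case GsOp::EmitRestart: return "OUT.EMIT_THEN_CUT";
  }
  return "";
}

// Returns the op to encode, or std::nullopt if it should be dropped.
std::optional<GsOp> legalizeGsOp(GsOp op, unsigned stream, unsigned maxVertices) {
  // layout(max_vertices = 0): nothing can be emitted, so drop the op.
  if (maxVertices == 0)
    return std::nullopt;
  if (stream != 0) {
    // Multi-stream output implies points, where a cut is meaningless.
    if (op == GsOp::Restart)
      return std::nullopt;
    // Assumption: keep the emit half of a combined op, drop the cut half.
    if (op == GsOp::EmitRestart)
      return GsOp::Emit;
  }
  return op;
}
```

The point of returning an optional is that callers which build ops straight from nir (where emit and cut can arrive fused) go through the same legality check as the TGSI path.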
14:59 imirkin: that's a super-weird case
14:59 imirkin: some dEQP test hit it
14:59 imirkin: where someone puts in layout (max_vertices = 0) and then does an emit
14:59 karolherbst: ahhh
14:59 imirkin: i.e. "wtf were you thinking"
15:00 karolherbst: I think with nir that gets optimized away.. at least I hope it will
15:00 karolherbst: mhhh
15:00 karolherbst: no illegal encoding, but still a fail
15:01 karolherbst: at least tfb reports 0 for everything now
15:03 karolherbst: ehh...
15:03 karolherbst: my fault
15:03 karolherbst: you know.. you put "// fallthrough" for a reason and then you don't see it
15:04 karolherbst: soo.. passing
15:12 imirkin: break; // fallthrough
15:12 imirkin: heh
15:13 imirkin: those can't *both* be right...
15:13 karolherbst: maybe a static code analyzer complained :p
15:14 karolherbst: the one spirv fail is also slightly annoying
15:14 linkmauve: In C++ you now have [[fallthrough]] to annotate that the absence of a break; was wanted.
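For reference, a small C++17 illustration of the attribute linkmauve mentions. `[[fallthrough]];` marks an intentional fall-through, so warnings such as GCC/Clang `-Wimplicit-fallthrough` only fire on unannotated ones. `stepsForLevel` is a made-up example, not nouveau code.

```cpp
#include <cassert>

// Hypothetical: level 2 needs its own step plus everything level 1 needs.
int stepsForLevel(int level) {
  int steps = 0;
  switch (level) {
  case 2:
    ++steps;
    // A "break; // fallthrough" here would silently drop case 1's step.
    [[fallthrough]];
  case 1:
    ++steps;
    break;
  default:
    break;
  }
  return steps;
}
```

Unlike a `// fallthrough` comment, the attribute is checked by the compiler: placing it anywhere other than directly before the next case label is an error.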
15:14 karolherbst: "Too many compute shader image uniforms (9 > 8)" :/
15:15 karolherbst: and I am wondering if anything stupid messes up completely
15:19 karolherbst: imirkin: can we really only do 8 images per stage?
15:20 imirkin: iirc yes
15:20 imirkin: 8 on fermi
15:20 karolherbst: mhhh...
15:21 imirkin: we never cared to bump the limit higher on other hw
15:21 karolherbst: wondering what nvidia does to pass the test
15:21 karolherbst: ohh
15:21 karolherbst: what hw can support more?
15:21 imirkin: kepler+ is all bindless
15:21 karolherbst: ahh, right
15:21 imirkin: kepler can support infinity
15:21 imirkin: maxwell+ has to have a TIC, but can still support a lot.
16:32 karolherbst: imirkin: mhh.. your 2D/3D image lowering seems to either show a bug or breaks something here :/
16:32 karolherbst: don't see anything obviously wrong with this shader: https://gist.githubusercontent.com/karolherbst/08c5b5d85222cbc96fcebbd298482740/raw/af73ba59fd1202922d8d40b72198cc2d9b4db454/gistfile1.txt
16:32 karolherbst: ohh wait...
16:33 karolherbst: there is an arg too many on the suldp
16:34 imirkin: it's very fiddly.
16:34 karolherbst: yeah...
16:34 karolherbst: nir added support for explicit lods on images
16:35 imirkin: mmm ... i suspect we can support that, but would require explicit logic to do so.
16:35 karolherbst: so now I have an additional lod = 0 param... and my code isn't really protected against that
16:35 imirkin: ah oops!
16:35 karolherbst: yeah..
16:35 karolherbst: it used to be different back then :D
16:36 imirkin: one of the downsides of an actively-developed thing ... you gotta keep up!
16:36 karolherbst: my logic wasn't good though :p
16:36 karolherbst: I just checked against src counts
16:36 imirkin: btw, if you're looking to make nir equivalent, you need to pipe gl_ViewportMask through
16:36 karolherbst: already done
16:36 imirkin: k cool
16:37 karolherbst: was like 1 loc :p
16:37 karolherbst: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/b93d9a5a87fe5e1604cfe07caf0cb0c73e993cfb
16:38 karolherbst: or maybe I missed something?
16:38 imirkin: dunno
16:38 imirkin: run the tests?
16:38 karolherbst: this is for nv_viewport_array2 right?
16:38 karolherbst: or did you refer to something else?
16:39 imirkin: NV_viewport_array2, yes
16:39 imirkin: i pushed piglit tests
16:39 karolherbst: ahh yeah, all piglit test pass
16:39 imirkin: (or at least wrote them ... hopefully pushed them)
16:48 karolherbst: mhh.. sadly it's something else it seems
16:48 karolherbst: RA goes nuts for some reason and optimizes the suldb away :/
16:49 karolherbst: or well.. rather makes it dead
16:50 imirkin: karolherbst: oh, also make sure you pipe through the EXT_texture_shadow_lod support
16:51 karolherbst: also done :p
16:51 imirkin: i believe nir already supported it previously, but it will require some ... updates ... to pass it through to nouveau correctly
16:51 imirkin: ok
16:51 karolherbst: I have a bunch of stuff here: https://gitlab.freedesktop.org/karolherbst/mesa/-/commits/nv_nir_fixes/
16:51 imirkin: and support for this GL_EXT_demote_to_helper_invocation ?
16:51 karolherbst: yep :p
16:52 karolherbst: although not 100% about EXT_texture_shadow_lod actually...
16:52 imirkin: i think that's it then.
16:52 karolherbst: no texture_lod related regressions though
16:52 imirkin: there are tests somewhere
16:52 imirkin: for it. either CTS or dEQP
16:52 imirkin: i used them for testing my impl
16:52 karolherbst: I have a fail with ext_demote_to_helper_invocation .. but yeah
16:53 karolherbst: ohh, right.. wanted to check GLES after piglit and the GL CTS
16:53 imirkin: (for ext_texture_shadow_lod that is)
16:53 karolherbst: some piglit regressions are super annoying...
16:53 karolherbst: like nir does such a good job, it optimizes away uniforms the test relies on...
16:53 karolherbst: or well... uses
16:54 imirkin: does no one use piglit anymore?
16:54 karolherbst: no fail in piglit was never a goal :p
16:54 imirkin: super.
16:54 karolherbst: but not sure if that's piglit's or nir's fault though
16:54 karolherbst: generated_tests/spec/glsl-120/execution/variable-indexing/fs-temp-array-mat2-index-row-wr.shader_test
16:55 karolherbst: "index" is gone eg
16:55 karolherbst: but that sounds weird...
16:55 karolherbst: or maybe not
16:55 karolherbst: and it always returns vec4(1.0, 0.0, 0.0, 1.0)...
16:55 karolherbst: or the other value
17:24 karolherbst: yeah.. something is broken in the lowering... ufff
17:25 karolherbst: in an odd way
17:43 karolherbst: imirkin: I found it... ufff
17:43 karolherbst: imirkin: https://gist.githubusercontent.com/karolherbst/58fc84a99acbfe5caded3d05ec55231b/raw/057eb8d31738de31f3dcbe4aec6088e5fada3d62/gistfile1.txt
17:44 karolherbst: branched code.. both sustp use the same regs
17:44 imirkin: why is that bad?
17:44 karolherbst: apparently bad things happen
17:44 karolherbst: but I am confused why it happens at all
17:45 karolherbst: the lowering pass changes all sustps using those same values
17:45 karolherbst: so if the second gets lowered the first is changed as well
17:45 imirkin: oh, literally the same pointers
17:45 karolherbst: yeah..
17:46 imirkin: that means something isn't getting disconnected the way it should
17:46 karolherbst: in nir I have a register.. mhh
17:46 karolherbst: I guess
17:46 imirkin: or two things are connected that normally wouldn't be
17:46 karolherbst: I think it's because def.replace is used
17:46 imirkin: right, but that's normally safe i guess
17:46 karolherbst: pre ssa?
17:46 imirkin: dunno
17:46 karolherbst: yeah.. not sure either
17:46 imirkin: maybe not
17:47 karolherbst: I am wondering why nir places a register there anyway...
17:47 karolherbst: mhhh
17:47 karolherbst: right
17:47 karolherbst: the last store uses the same values
17:48 karolherbst: in BB:4
17:48 karolherbst: and the values are filled by an image_load
17:50 karolherbst: yeah.. mhhh
17:51 karolherbst: imirkin: this is the step when the first def.replace is called: https://gist.github.com/karolherbst/6f7b4ba2a8d409eed6a9c647d09a95f3
17:51 karolherbst: and this just looks wrong
17:51 karolherbst: not directly
17:51 karolherbst: but it is
17:52 karolherbst: I have an idea
17:53 karolherbst: we just reuse the old value...
17:54 karolherbst: but not sure what difference that should make.. *sigh*
17:54 karolherbst: ohh, it actually makes a difference
18:04 karolherbst: "Passed: 1/1 (100.0%)" mhhh
18:05 karolherbst: imirkin: https://gist.github.com/karolherbst/d0cda2739dd5dcaf35e0077a27dbb86a
18:05 imirkin: uhm
18:06 imirkin: maybe?
18:06 imirkin: not a HUGE fan of that
18:06 imirkin: iirc it won't quite work
18:06 imirkin: you can't have a = union(a, b)
18:07 karolherbst: why not though?
18:07 imirkin: RA will sometimes screw it up
18:07 karolherbst: ahh
18:07 imirkin: that's my recollection
18:07 karolherbst: :/
18:07 imirkin: you can scroll through my ancient commits about that...
18:07 karolherbst: well
18:07 karolherbst: you don't end up with a = union(a, b) though
18:08 imirkin: 5b8f1a0f7c5b1412577a913d374192a2329fa615
18:08 karolherbst: something replaces some other bits later on or so..
18:08 imirkin: i think that was the thing.
18:08 imirkin: so might actually be fine
18:08 imirkin: and i misremembered
18:08 karolherbst: well.. a is defined when inserting this union :p
18:09 imirkin: heh. this was a fun one. d3a5cf052c38087b395871b5b46776e2a7d4a7d7
18:09 karolherbst: yeah.. well, will run the CTS and see how bad it is
18:10 karolherbst: but before RA I also have such unions: union u32 %r183 %r168 %r164 %r182 (0)
18:11 karolherbst: when going into SSA this all gets fixed up I guess
18:11 imirkin: right ... and nothing is reused...
18:11 imirkin: but the claim is that all of the sources are predicate-ly defined
18:11 karolherbst: I see a phi node resolving this
18:11 karolherbst: right..
18:11 karolherbst: that they are
18:11 imirkin: phi nodes are based on edges
18:11 imirkin: union nodes are not.
18:12 karolherbst: I mean.. you have two unions being the source of the same phi value
18:12 karolherbst: which is used at the end of the shader
18:12 imirkin: that should be fine.
18:12 karolherbst: yeah
18:13 karolherbst: maybe I create a new SSA value and just mov to the old one...
18:13 karolherbst: mhhh
18:13 karolherbst: kind of pointless though
18:14 karolherbst: anyway.. the sources are from the two suldps and the mov 0 and the predicates we have correctly set up
18:14 karolherbst: and if a union is fine with writing to one of its sources then.. yeah
18:14 karolherbst: fine then :p
18:15 karolherbst: but I'd prefer to set new values on the suldps
18:15 karolherbst: do the union magic
18:15 karolherbst: and then mov into the old value
18:15 karolherbst: yeah...
18:18 imirkin: that's fine.
18:19 imirkin: so yeah. def.replace() is a bit dodgy pre-ssa
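A simplified C++ toy model of why a global replace is dodgy pre-SSA. This is NOT nv50_ir: `Value`, `Insn`, `replaceAllUses` and `storeStillReadsR` are hypothetical, and the predicate/union step from the log is omitted. Two loads in different branches write the same register `r`, and a store at the join reads it. Lowering the first load to a fresh value `t` and then redirecting every read of `r` also redirects the read that belongs to the other branch; writing `t` and then moving it back into `r` (the approach karolherbst describes above) leaves the store alone.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy IR: pre-SSA, several instructions may define the same Value.
struct Value { std::string name; };
struct Insn {
  std::string op;
  Value *def;                // nullptr if the instruction defines nothing
  std::vector<Value *> srcs;
};

// Rough analogue of a replace-all-uses: every read of oldV now reads newV.
void replaceAllUses(std::vector<Insn> &prog, Value *oldV, Value *newV) {
  for (Insn &i : prog)
    for (Value *&s : i.srcs)
      if (s == oldV)
        s = newV;
}

// Lower the first load; return whether the final store still reads r.
bool storeStillReadsR(bool useMov) {
  Value r{"r"}, t{"t"};
  std::vector<Insn> prog = {
      {"suldp", &r, {}},         // branch A
      {"suldp", &r, {}},         // branch B, same register pre-SSA
      {"sustp", nullptr, {&r}},  // join: store reads r from either branch
  };
  prog[0].def = &t;  // lowered load writes a fresh value
  if (useMov)
    prog.insert(prog.begin() + 1, Insn{"mov", &r, {&t}});  // copy back into r
  else
    replaceAllUses(prog, &r, &t);  // also hijacks branch B's result
  return prog.back().srcs[0] == &r;
}
```

In the buggy variant the store ends up reading `t`, which only branch A defines, so the value from branch B is silently lost.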
19:46 karolherbst: imirkin: do you think anything special needs to be done to bump the image limit or should it just work on kepler+?
19:46 karolherbst: thinking about bumping it to 16
19:49 imirkin: need to make room in the thing
19:49 karolherbst: right..
19:49 imirkin: where the image info is stored
19:49 imirkin: otherwise no
19:49 karolherbst: k
19:54 karolherbst: mhh.. not sure what to do about fermi though... :/ probably need to fix the CTS test anyway.. or maybe nvidia already has a patch
20:01 imirkin: yeah dunno
20:01 imirkin: fermi is hard limit of 8 though, and frag shader only
20:01 karolherbst: mhhh
20:01 imirkin: (nvidia driver supports it in other stages, but we don't, since "we" are lazy, aka me)
20:02 karolherbst: yeah well..
20:02 karolherbst: there is no good reason to support it, right?
20:02 karolherbst: or what's the difficult part here?
20:02 imirkin: nothing uses images outside frag/compute afaik
20:02 imirkin: oh, it's just annoying... there's an overall limit of 8 images
20:02 imirkin: which is global
20:02 imirkin: or something
20:02 imirkin: i forget
20:03 karolherbst: ehh...
20:03 karolherbst: sounds annoying
20:03 imirkin: and the spec only requires frag/compute
20:03 imirkin: (well, spec doesn't talk about compute. but the compute spec requires 8 images :) )
20:04 karolherbst: what was the website with the gl limits per driver?
20:05 HdkR: https://opengl.gpuinfo.org/ ?
20:05 karolherbst: yeah.. maybe
20:05 karolherbst: ahh yeah
20:06 HdkR: I've heard of a game dev using images in the vertex stage, so not completely impossible :P
20:07 karolherbst: heh
20:07 karolherbst: nvidia always only reports 8 images
20:07 karolherbst: even on newer gens
20:08 karolherbst: per stage that is
20:08 karolherbst: ehh...
20:09 karolherbst: let me run the last released cts version then :D
20:09 imirkin: HdkR: yeah, i guess it's possible
20:09 imirkin: HdkR: but i think you'll agree that it's at least rare
20:09 imirkin: and spec doesn't require support for it
20:09 imirkin: afaik on pre-GCN amd it's completely impossible to do it outside frag
20:09 karolherbst: sadly we can't do it on tesla either :/
20:10 imirkin: well, tesla is compute-only
20:10 imirkin: can't do images in frag
20:10 imirkin: ES 3.10 only requires images in compute btw
20:10 karolherbst: wondering if somebody writes lowering code....
20:10 karolherbst: but ssbos are also compute only, right?
20:11 HdkR: Yes, it's a super rare use case, and I think the only reason they used Images over SSBOs is to get the free format conversion
20:11 airlied: yeah pre-GCN only has compute/frag
20:11 airlied: and Haswell I think as well
20:11 karolherbst: mhhhh
20:11 karolherbst: I have a crazy idea...
20:12 karolherbst: just ower vps and fps into compute shaders :p
20:12 karolherbst: *lower
20:12 airlied: yeah haswell has only frag/compute image uniforms
20:16 imirkin: karolherbst: yeah, do T&L in compute and then just feed the results into a no-op rast pipeline
20:16 karolherbst: exactly
20:17 karolherbst: but I am indeed wondering why there is no ssbo support in fps...
20:17 karolherbst: sounds like.. uff..
20:17 karolherbst: ohh wait.. tesla wasn't unified design, right?
20:17 karolherbst: that's fermi+
20:17 imirkin: i believe tesla is considered unified
20:17 imirkin: even though it's not 100%
20:18 karolherbst: mhh
20:18 karolherbst: those numbers...
20:18 karolherbst: 128 shader cores in the high-end version :D
20:18 karolherbst: cute
20:20 karolherbst: airlied: ohh, are you willing to review nir_to_llvm patches? :p
20:20 karolherbst: airlied: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5480
20:20 karolherbst: could allow you to get rid of image derefs :d
20:20 karolherbst: :D
20:20 karolherbst: the MR.. not the one patch
20:20 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5480/diffs?commit_id=b784a610586ceff8188b687401d883ffa5305e3c
20:20 karolherbst: is what I need a review for
21:57 airlied: karolherbst: hakzsam or bas might be best, maybe just cc them on it