04:06imirkin: someone should look at GL_NV_scissor_exclusive -- seems like an easy ext to implement -- but it's turing+
04:07skeggsb: RT exts would be more fun:P
04:08HdkR: Please implement RT for my TU102 :)
04:08imirkin: HdkR: do you run nouveau?
04:09skeggsb: HdkR: that'll be nice and fun at boot clocks :P
04:09HdkR: imirkin: Once you've gone RT do you want to live your life without RT anymore?
04:11HdkR: mmm, boot clock RT sounds perfect for a couple rays per frame
04:12imirkin: you live in a low-light world :)
04:12HdkR: Very noisy
04:16raket: a? bgc¥m
04:16imirkin: time to change password?
04:16raket: phone + irssi :)
04:17imirkin: hehe ok
14:37karolherbst: imirkin: mhh.. is there anything special the compiler needs to do for tfb to work? I thought that's all driver side?
14:38karolherbst: ahh ehh
14:38karolherbst: silly me
14:38karolherbst: should have checked dmesg before
14:38karolherbst: "gr: GPC1/TPC1/MP trap: global 00000000  warp 3f0009 [ILLEGAL_INSTR_ENCODING]"
14:38imirkin: yeah, i mean compiler has to respect varying layout or else nothing works
14:38imirkin: but other than that, it doesn't care about tfb
14:38karolherbst: yeah.. right, but the compiler needs to respect that for other things anyway :p
14:39karolherbst: just trying to get the remaining CTS fails out of the way
14:39karolherbst: 4 fails in a 4.6 run
14:39imirkin: what are the failing ones?
14:39imirkin: btw, i assume you're not working on some ancient tree?
14:40imirkin: i did have some mild xfb fixes for some failing CTS tests
14:40imirkin: i pushed those a couple months ago, i think
14:40imirkin: (or maybe they were failing GTF tests... i don't remember. something was failing.)
14:41karolherbst: KHR-GL45.direct_state_access.textures_parameter_setup_errors but that's on master
14:41imirkin: yeah, that sounds like a "global" issue
14:41karolherbst: this is the only fail with TGSI (besides CTS and API bugs)
14:41karolherbst: ahh right.. probably a core bug as well
14:41imirkin: i mean something due to core mesa
14:41imirkin: nouveau (or even st/mesa for that matter) doesn't handle GL errors
14:41karolherbst: only nir special fails are KHR-GL46.shader_subroutine.ssbo_atomic_image_load_store and GTF-GL46.gtf40.GL3Tests.transform_feedback3.transform_feedback3_geometry_instanced
14:42imirkin: oh man
14:42imirkin: i don't remember all this instanced business
14:42karolherbst: yeah.. but the tfb one throws encoding errors
14:42imirkin: would have to RTFS
14:42imirkin: (s = spec in this case)
14:42imirkin: ah, well check if it's legit
14:42karolherbst: yeah :)
14:42imirkin: or if we're overwriting something we shouldn't be
14:49karolherbst: ehh.. nvdisasm is happy with it :/
14:49imirkin: must be perfect!
14:49imirkin: does nvdisasm match nvdisasm with tgsi
14:49imirkin: or rather
14:50karolherbst: well.. geometry shaders are annoying to compare
14:50karolherbst: out.EMIT_THEN_CUT maybe?
14:50imirkin: lemme see if i see anything obvious
14:50karolherbst: especially as nir optimizes writes after the last emit away
14:50karolherbst: and codegen doesn't
14:50imirkin: wouldn't matter if you do EMIT / CUT or if you do it one go.
14:51imirkin: this is points, right?
14:51imirkin: i don't think you're supposed to have CUT in there
14:51imirkin: (or EMIT_THEN_CUT)
14:51imirkin: i could imagine that causing the illegal ops
14:51karolherbst: not quite sure...
14:51imirkin: check if they show up in tgsi
14:52imirkin: (this must be points since it's multi-stream output)
14:52imirkin: alllso ...
14:52imirkin: /*0158*/ OUT.EMIT R8, RZ, RZ ; /* 0xfbe000800ff7ff08 */
14:52imirkin: not 100% sure that's legal
14:52imirkin: the final arg may *have* to be an immediate
14:52imirkin: i.e. 0x0
14:52imirkin: vs RZ
14:52imirkin: (yeah, i know, it's dumb)
14:52imirkin: (why have the encoding if it doesn't work? dunno. i don't make the rules.)
14:53imirkin: again - check against tgsi
14:53karolherbst: " 67: emit u32 $r4 $r4 $r255" is what TGSI does :p
14:53imirkin: so i guess RZ works
14:53karolherbst: the test does like 8 or so geometry shaders
14:53imirkin: does it do cut's?
14:53karolherbst: and only one fails
14:53karolherbst: restart in codegen, right?
14:53imirkin: i don't remember.
14:53imirkin: just check the nvdisasm :)
14:54karolherbst: trying to find the one causing issues in my dumped output :)
14:54karolherbst: soo yeah.. no "emit restart" in TGSI
14:55karolherbst: but seeing restarts with TGSI
14:56karolherbst: restart == OUT.CUT
14:56karolherbst: emit == OUT.EMIT
14:56karolherbst: and "emit restart" == OUT.EMIT_THEN_CUT
14:56karolherbst: so I'd assume the EMIT_THEN_CUT may be illegal
14:58karolherbst: ohh yeah.. with TGSI we have some special handling
14:58karolherbst: if (stream && op == OP_RESTART) break;
14:58karolherbst: and if (info->prop.gp.maxVertices == 0) break;
14:59imirkin: that's a super-weird case
14:59imirkin: some dEQP test hit it
14:59imirkin: where someone puts in layout (max_vertices = 0) and then does an emit
14:59imirkin: i.e. "wtf were you thinking"
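[editor's sketch] The mapping at 14:56 and the two early-outs quoted at 14:58 can be modelled roughly as below. This is a minimal sketch, not Mesa's code: the enum `GeomOp` and the functions `mnemonic` and `keepGeomOp` are hypothetical names made up for illustration. Only the op-to-mnemonic mapping and the two skip conditions come from the log. How the combined "emit restart" op should interact with non-zero streams is not settled in the discussion, so the filter covers only plain emit and restart.

```cpp
#include <string>

// Hypothetical stand-ins for the codegen ops named in the log.
enum class GeomOp { Emit, Restart, EmitRestart };

// Disassembly mnemonic for each op, as listed at 14:56.
std::string mnemonic(GeomOp op)
{
    switch (op) {
    case GeomOp::Emit:        return "OUT.EMIT";
    case GeomOp::Restart:     return "OUT.CUT";
    case GeomOp::EmitRestart: return "OUT.EMIT_THEN_CUT";
    }
    return "";
}

// Mirrors the two quoted conditions for plain emit/restart:
//   if (stream && op == OP_RESTART) break;
//   if (info->prop.gp.maxVertices == 0) break;
// Returns true when the op should actually be generated.
bool keepGeomOp(GeomOp op, unsigned stream, unsigned maxVertices)
{
    if (stream != 0 && op == GeomOp::Restart)
        return false; // restart on a non-zero stream is dropped
    if (maxVertices == 0)
        return false; // layout(max_vertices = 0): nothing is emitted
    return true;
}
```

With these definitions, a restart on stream 1 is dropped while an emit on stream 1 is kept, and both are dropped when `maxVertices` is 0.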
15:00karolherbst: I think with nir that gets optimized away.. at least I hope it will
15:00karolherbst: no illegal encoding, but still a fail
15:01karolherbst: at least tfb reports 0 for everything now
15:03karolherbst: my fault
15:03karolherbst: you know.. you put "// fallthrough" for a reason and then you don't see it
15:04karolherbst: soo.. passing
15:12imirkin: break; // fallthrough
15:13imirkin: those can't *both* be right...
15:13karolherbst: maybe a static code analyzer complained :p
15:14karolherbst: the one spirv fail is also slightly annoying
15:14linkmauve: In C++ you now have [[fallthrough]] to annotate that the absence of a break; was wanted.
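[editor's sketch] For reference, this is roughly what the C++17 `[[fallthrough]]` attribute looks like in use. The function `stepsFrom` is a made-up example, unrelated to the nouveau code being debugged.

```cpp
// Counts how many steps run when starting at a given level.
int stepsFrom(int level)
{
    int steps = 0;
    switch (level) {
    case 2:
        ++steps;
        [[fallthrough]]; // intentional: level 2 also does level 1's work
    case 1:
        ++steps;
        break;           // ends the switch; no fallthrough past this point
    default:
        break;
    }
    return steps;
}
```

Unlike a `// fallthrough` comment, the attribute is checked by the compiler: it must be followed by a case label, and compilers with implicit-fallthrough warnings enabled treat it as the marker that the missing `break;` was deliberate.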
15:14karolherbst: "Too many compute shader image uniforms (9 > 8)" :/
15:14karolherbst: and I am wondering if anything stupid messes up completely
15:19karolherbst: imirkin: can we really only do 8 images per stage?
15:20imirkin: iirc yes
15:20imirkin: 8 on fermi
15:21imirkin: we never cared to bump the limit higher on other hw
15:21karolherbst: wondering what nvidia does to pass the test
15:21karolherbst: what hw can support more?
15:21imirkin: kepler+ is all bindless
15:21karolherbst: ahh, right
15:21imirkin: kepler can support infinity
15:21imirkin: maxwell+ has to have a TIC, but can still support a lot.
16:32karolherbst: imirkin: mhh.. your 2D/3D image lowering seems to either show a bug or breaks something here :/
16:32karolherbst: don't see anything obviously wrong with this shader: https://gist.githubusercontent.com/karolherbst/08c5b5d85222cbc96fcebbd298482740/raw/af73ba59fd1202922d8d40b72198cc2d9b4db454/gistfile1.txt
16:32karolherbst: ohh wait...
16:33karolherbst: there is an arg too many on the suldp
16:34imirkin: it's very fiddly.
16:34karolherbst: nir added support for explicit lods on images
16:35imirkin: mmm ... i suspect we can support that, but would require explicit logic to do so.
16:35karolherbst: so now I have an additional lod = 0 param... and my code isn't really protected against that
16:35imirkin: ah oops!
16:35karolherbst: it used to be different back then :D
16:36imirkin: one of the downsides of an actively-developed thing ... you gotta keep up!
16:36karolherbst: my logic wasn't good though :p
16:36karolherbst: I just checked against src counts
16:36imirkin: btw, if you're looking to make nir equivalent, you need to pipe gl_ViewportMask through
16:36karolherbst: already done
16:36imirkin: k cool
16:37karolherbst: was like 1 loc :p
16:38karolherbst: or maybe I missed something?
16:38imirkin: run the tests?
16:38karolherbst: this is for nv_viewport_array2 right?
16:38karolherbst: or did you refer to something else?
16:39imirkin: NV_viewport_array2, yes
16:39imirkin: i pushed piglit tests
16:39karolherbst: ahh yeah, all piglit test pass
16:39imirkin: (or at least wrote them ... hopefully pushed them)
16:48karolherbst: mhh.. sadly it's something else it seems
16:48karolherbst: RA goes nuts for some reason and optimizes the suldb away :/
16:49karolherbst: or well.. rather makes it dead
16:50imirkin: karolherbst: oh, also make sure you pipe through the EXT_texture_shadow_lod support
16:51karolherbst: also done :p
16:51imirkin: i believe nir already supported it previously, but it will require some ... updates ... to pass it through to nouveau correctly
16:51karolherbst: I have a bunch of stuff here: https://gitlab.freedesktop.org/karolherbst/mesa/-/commits/nv_nir_fixes/
16:51imirkin: and support for this GL_EXT_demote_to_helper_invocation ?
16:51karolherbst: yep :p
16:52karolherbst: although not 100% about EXT_texture_shadow_lod actually...
16:52imirkin: i think that's it then.
16:52karolherbst: no texture_lod related regressions though
16:52imirkin: there are tests somewhere
16:52imirkin: for it. either CTS or dEQP
16:52imirkin: i used them for testing my impl
16:52karolherbst: I have a fail with ext_demote_to_helper_invocation .. but yeah
16:53karolherbst: ohh, right.. wanted to check GLES after piglit and the GL CTS
16:53imirkin: (for ext_texture_shadow_lod that is)
16:53karolherbst: some piglit regressions are super annoying...
16:53karolherbst: like nir does such a good job, it optimizes away uniforms the test relies on...
16:53karolherbst: or well... uses
16:54imirkin: does no one use piglit anymore?
16:54karolherbst: no fail in piglit was never a goal :p
16:54karolherbst: but not sure if that's piglit's or nir's fault though
16:55karolherbst: "index" is gone eg
16:55karolherbst: but that sounds weird...
16:55karolherbst: or maybe not
16:55karolherbst: and it always returns vec4(1.0, 0.0, 0.0, 1.0)...
16:55karolherbst: or the other value
17:24karolherbst: yeah.. something is broken in the lowering... ufff
17:25karolherbst: in an odd way
17:43karolherbst: imirkin: I found it... ufff
17:43karolherbst: imirkin: https://gist.githubusercontent.com/karolherbst/58fc84a99acbfe5caded3d05ec55231b/raw/057eb8d31738de31f3dcbe4aec6088e5fada3d62/gistfile1.txt
17:44karolherbst: branched code.. both sustp use the same regs
17:44imirkin: why is that bad?
17:44karolherbst: apparently bad things happen
17:44karolherbst: but I am confused why it happens at all
17:45karolherbst: the lowering pass changes all sustps using those same values
17:45karolherbst: so if the second gets lowered the first is changed as well
17:45imirkin: oh, literally the same pointers
17:46imirkin: that means something isn't getting disconnected the way it should
17:46karolherbst: in nir I have a register.. mhh
17:46karolherbst: I guess
17:46imirkin: or two things are connected that normally wouldn't be
17:46karolherbst: I think it's because def.replace is used
17:46imirkin: right, but that's normally safe i guess
17:46karolherbst: pre ssa?
17:46karolherbst: yeah.. not sure either
17:46imirkin: maybe not
17:47karolherbst: I am wondering why nir places a register there anyway...
17:47karolherbst: the last store uses the same values
17:48karolherbst: in BB:4
17:48karolherbst: and the values are filled by an image_load
17:50karolherbst: yeah.. mhhh
17:51karolherbst: imirkin: this is the step when the first def.replace is called: https://gist.github.com/karolherbst/6f7b4ba2a8d409eed6a9c647d09a95f3
17:51karolherbst: and this just looks wrong
17:51karolherbst: not directly
17:51karolherbst: but it is
17:52karolherbst: I have an idea
17:53karolherbst: we just reuse the old value...
17:54karolherbst: but not sure what difference that should make.. *sigh*
17:54karolherbst: ohh, it actually makes a difference
18:04karolherbst: "Passed: 1/1 (100.0%)" mhhh
18:05karolherbst: imirkin: https://gist.github.com/karolherbst/d0cda2739dd5dcaf35e0077a27dbb86a
18:06imirkin: not a HUGE fan of that
18:06imirkin: iirc it won't quite work
18:06imirkin: you can't have a = union(a, b)
18:07karolherbst: why not though?
18:07imirkin: RA will sometimes screw it up
18:07imirkin: that's my recollection
18:07imirkin: you can scroll through my ancient commits about that...
18:07karolherbst: you don't end up with a = union(a, b) though
18:08karolherbst: something replaces some other bits later on or so..
18:08imirkin: i think that was the thing.
18:08imirkin: so might actually be fine
18:08imirkin: and i misremembered
18:08karolherbst: well.. a is defined when inserting this union :p
18:09imirkin: heh. this was a fun one. d3a5cf052c38087b395871b5b46776e2a7d4a7d7
18:09karolherbst: yeah.. well, will run the CTS and see how bad it is
18:10karolherbst: but before RA I also have such unions: union u32 %r183 %r168 %r164 %r182 (0)
18:11karolherbst: when going into SSA this all gets fixed up I guess
18:11imirkin: right ... and nothing is reused...
18:11imirkin: but the claim is that all of the sources are predicate-ly defined
18:11karolherbst: I see a phi node resolving this
18:11karolherbst: that they are
18:11imirkin: phi nodes are based on edges
18:11imirkin: union nodes are not.
18:12karolherbst: I mean.. you have two unions being the source of the same phi value
18:12karolherbst: which is used at the end of the shader
18:12imirkin: that should be fine.
18:13karolherbst: maybe I create a new SSA value and just mov to the old one...
18:13karolherbst: kind of pointless though
18:14karolherbst: anyway.. the sources are from the two suldps and the mov 0 and the predicates we have correctly set up
18:14karolherbst: and if a union is fine with writing to one of its sources then.. yeah
18:14karolherbst: fine then :p
18:15karolherbst: but I'd prefer to set new values on the suldps
18:15karolherbst: do the union magic
18:15karolherbst: and then mov into the old value
18:18imirkin: that's fine.
18:19imirkin: so yeah. def.replace() is a bit dodgy pre-ssa
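[editor's sketch] The aliasing problem described from 17:44 onward (two sustp instructions holding "literally the same pointers", so lowering one also changes the other) can be illustrated with a toy model. This is not nv50_ir: `Value`, `Instr`, `replaceAllUses`, and `replaceInOne` are hypothetical names. `replaceAllUses` only mimics the effect attributed to `def.replace` in the log, namely that every user of the old value gets rewritten.

```cpp
#include <vector>

// Toy IR: a value is just an id, an instruction holds pointers to its sources.
struct Value { int id; };
struct Instr { std::vector<Value*> srcs; };

// Global style: every instruction that uses oldVal is switched to newVal.
// If two stores in different branches share the pointer, both change.
void replaceAllUses(std::vector<Instr*>& all, Value* oldVal, Value* newVal)
{
    for (Instr* insn : all)
        for (Value*& src : insn->srcs)
            if (src == oldVal)
                src = newVal;
}

// Local style: only the instruction currently being lowered is rewritten,
// so other users of oldVal keep seeing the original value.
void replaceInOne(Instr& insn, Value* oldVal, Value* newVal)
{
    for (Value*& src : insn.srcs)
        if (src == oldVal)
            src = newVal;
}
```

In this model, the approach preferred at 18:15 (give the suldps fresh values, build the union from those, then mov into the old value) corresponds to keeping the old value intact for its other users instead of rewriting it everywhere.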
19:46karolherbst: imirkin: do you think anything special needs to be done to bump the image limit or should it just work on kepler+?
19:46karolherbst: thinking about bumping it to 16
19:49imirkin: need to make room in the thing
19:49imirkin: where the image info is stored
19:49imirkin: otherwise no
19:54karolherbst: mhh.. not sure what to do about fermi though... :/ probably need to fix the CTS test anyway.. or maybe nvidia already has a patch
20:01imirkin: yeah dunno
20:01imirkin: fermi is hard limit of 8 though, and frag shader only
20:01imirkin: (nvidia driver supports it in other stages, but we don't, since "we" are lazy, aka me)
20:02karolherbst: yeah well..
20:02karolherbst: there is no good reason to support it, right?
20:02karolherbst: or what's the difficult part here?
20:02imirkin: nothing uses images outside frag/compute afaik
20:02imirkin: oh, it's just annoying... there's an overall limit of 8 images
20:02imirkin: which is global
20:02imirkin: or something
20:02imirkin: i forget
20:03karolherbst: sounds annoying
20:03imirkin: and the spec only requires frag/compute
20:03imirkin: (well, spec doesn't talk about compute. but the compute spec requires 8 images :) )
20:04karolherbst: what was the website with the gl limits per driver?
20:05HdkR: https://opengl.gpuinfo.org/ ?
20:05karolherbst: yeah.. maybe
20:05karolherbst: ahh yeah
20:06HdkR: I've heard of a game dev using images in the vertex stage, so not completely impossible :P
20:07karolherbst: nvidia always only reports 8 images
20:07karolherbst: even on newer gens
20:08karolherbst: per stage that is
20:09karolherbst: let me run the last released cts version then :D
20:09imirkin: HdkR: yeah, i guess it's possible
20:09imirkin: HdkR: but i think you'll agree that it's at least rare
20:09imirkin: and spec doesn't require support for it
20:09imirkin: afaik on pre-GCN amd it's completely impossible to do it outside frag
20:09karolherbst: sadly we can't do it on tesla either :/
20:10imirkin: well, tesla is compute-only
20:10imirkin: can't do images in frag
20:10imirkin: ES 3.10 only requires images in compute btw
20:10karolherbst: wondering if somebody writes lowering code....
20:10karolherbst: but ssbos are also compute only, right?
20:11HdkR: Yes, it's a super rare use case, and I think the only reason they used Images over SSBOs is to get the free format conversion
20:11airlied: yeah pre-GCN only has compute/frag
20:11airlied: and Haswell I think as well
20:11karolherbst: I have a crazy idea...
20:12karolherbst: just lower vps and fps into compute shaders :p
20:12airlied: yeah haswell has only frag/compute image uniforms
20:16imirkin: karolherbst: yeah, do T&L in compute and then just feed the results into a no-op rast pipeline
20:17karolherbst: but I am indeed wondering why there is no ssbo support in fps...
20:17karolherbst: sounds like.. uff..
20:17karolherbst: ohh wait.. tesla wasn't unified design, right?
20:17karolherbst: that's fermi+
20:17imirkin: i believe tesla is considered unified
20:17imirkin: even though it's not 100%
20:18karolherbst: those numbers...
20:18karolherbst: 128 shader cores in the high-end version :D
20:20karolherbst: airlied: ohh, are you willing to review nir_to_llvm patches? :p
20:20karolherbst: airlied: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5480
20:20karolherbst: could allow you to get rid of image derefs :d
20:20karolherbst: the MR.. not the one patch
20:20karolherbst: is what I need a review for
21:57airlied: karolherbst: hakzsam or bas might be best, maybe just cc them ont