00:54 AndrewR: 03:45:08 up 10 days, 26 min :}
01:03 imirkin: is that a lot?
01:04 karolherbst: not really :p
01:06 imirkin: i've had 6-mo uptimes with nouveau... maybe more
09:44 RSpliet: imirkin: Only my NVAC ever achieved decent uptime. If it wasn't for sort-of-fortnightly kernel updates of course
14:12 imirkin: skeggsb: fyi, i skipped the kbuild files because i figured they were derivative of the kbuild system, which is gpl. but i didn't give it too much thought, and certainly have no objection to marking them MIT
14:51 karolherbst: mhh
14:51 karolherbst: I am sure we have to license the Kbuild files under GPL
14:53 karolherbst: doesn't matter anyway
17:05 karolherbst: imirkin: mhh, I am currently wondering why we have the shared memory size passed into codegen. Normally we never adjust that value, right? I think nir would allow us to reduce the API given amount in case we eliminate some arrays or something though. Or was there something funky going on I don't see?
17:06 karolherbst: pmoreau: weren't you saying we use shared memory for the kernel input on nv50? Can't we just move it to being a "normal" ubo?
17:08 imirkin: yeah, compute kernel params show up in s[] on nv50
17:08 karolherbst: but that's because we decided to do that, right?
17:08 imirkin: no
17:08 karolherbst: ohh
17:08 karolherbst: we have to?
17:08 karolherbst: :/
17:08 imirkin: welllll
17:08 imirkin: we probably don't HAVE to
17:09 imirkin: also in GL, there are no parameters
17:09 imirkin: parameters are only a thing in CL, iirc
17:09 karolherbst: sure, but in CL we have no uniforms
17:09 karolherbst: so we use c0[] for the input in nvc0
17:11 imirkin: yeah, i mean it could work
17:11 imirkin: iirc on nv50, you just dump them into the pushbuf
17:11 karolherbst: CL has to support at least 8 constant memory args though... but.. we can also just do whatever we do for compute shaders
17:11 imirkin: and they show up in s[]
17:11 karolherbst: ohh
17:11 karolherbst: how does it make a difference performance-wise?
17:11 karolherbst: I'd expect accesses to c[] to be faster than s[]
17:11 imirkin: more like an easiness win :)
17:12 karolherbst: how fast is c[] on nv50? as fast as register reads or slower? or does nobody know?
17:12 imirkin: yeah, no clue
17:13 karolherbst: how big can shared memory be on nv50?
17:14 karolherbst: CL requires 16kb of "local memory" which is shared in nv terms
17:17 HdkR: shared is slower than constant unless you're accessing constant in a divergent manner
17:17 HdkR: Since constant accesses serialize
17:17 karolherbst: uhh
17:18 karolherbst: that's good to know
17:18 HdkR: While shared works in banks that will be serviced simultaneously
17:18 karolherbst: mhh, interesting
17:20 HdkR: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared-memory-5-x Has information on the shared side
17:20 imirkin: HdkR: is that true on SM 1.0?
17:20 HdkR: imirkin: I don't know anything about old hardware :P
17:22 HdkR: memory on those things is wacky regardless
17:23 imirkin: the big thing on nvc0+ is that i don't think you can load s[] from any op other than LD
17:23 imirkin: while on nv50, all kinds of ops can do all kinds of stuff
17:23 HdkR: ah
17:23 karolherbst: ohhh, right... that was the thing about s[] on nv50
17:23 karolherbst: mhhh
17:24 imirkin: of course on average, an nv50 op has trouble accessing *registers*, much less memory of any kind...
17:24 karolherbst: he?
17:24 imirkin: depending on the encoding, the upper 64 regs may not be available
17:24 karolherbst: ohh, there was also this weirdo address file for nv50
17:24 imirkin: that's also a fun one.
17:24 karolherbst: was that for indirects?
17:24 karolherbst: or what was that about?
17:25 imirkin: the address registers with sticky bits to support DX-whatever
17:25 imirkin: yeah, only need it for indirect access
17:26 karolherbst: we handle that with lowering passes, right? I was thinking removing it entirely from the TGSI/NIR -> nv50ir code and just get it added later on
17:26 imirkin: (so that add didn't overflow it back down to 0)
17:27 imirkin: the address stuff? i dunno... there's all kinds of shenanigans.
17:27 imirkin: among other things, it's a separate register file on nv50
17:27 karolherbst: :/
17:27 imirkin: you can do stuff like shl $a0, $r0, 2
17:27 imirkin: (which is pretty common as you might imagine)
17:28 karolherbst: right
17:28 imirkin: it's a lot easier to make a compiler that supports 1 piece of hw than a bunch
17:28 imirkin: look on the bright side - we don't support the vector isa of nv30/nv40 :)
17:29 karolherbst: :)
17:30 karolherbst: imirkin: I was asking because I am wondering if we actually have to store the shared memory size... normally we get this value from the state tracker anyway, no?
17:31 imirkin: at compile time, i think
17:31 imirkin: oh
17:31 imirkin: heh
17:31 imirkin: totally forgot
17:31 imirkin: unfortunately these gpu's have a finite amount of memory
17:31 karolherbst: yeah.. that's why I was wondering about the s[] situation on nv50
17:31 imirkin: hm - dunno if that is affected by shared memory
17:31 imirkin: i was thinking about regs * invocs <= max
17:32 karolherbst: because if we have to split s[] across "local" memory and "kernel input" we kind of have to deal with it inside the compiler
17:32 karolherbst: but for nvc0 it really doesn't matter, as we don't run into this problem
17:33 karolherbst: mhh, well the cheap way out would be to let codegen do the hashing and it can hash differently on how it behaves differently across chipsets I think
17:33 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nv50/nv50_compute.c#n219
17:33 imirkin: it's just super-convenient.
17:34 imirkin: shared memory size is fixed for a program though
17:34 imirkin: it's not variable per-invocation
17:34 karolherbst: not in CL
17:34 karolherbst: you can reuse the same kernel with different sizes
17:34 imirkin: how do you specify it?
17:35 imirkin: (it = the size)
17:35 karolherbst: not inside the kernel
17:35 karolherbst: it's just a pointer into memory
17:35 karolherbst: so you specify the size while setting the kernel parameter
17:35 karolherbst: clSetKernelArg(kernel, idx, 0x100, NULL)
17:36 karolherbst: and if idx is a "int local* shared" the runtime will create a 0x100 sized local memory buffer
17:36 imirkin: harsh.
17:36 karolherbst: yes
17:36 imirkin: good thing the kernel params come first =]
17:36 karolherbst: ohh wait
17:36 karolherbst: yeah
17:37 karolherbst: was just thinking about it
17:37 karolherbst: so I guess we should be safe
17:37 karolherbst: I am just wondering if s[] is big enough on nv50
17:38 karolherbst: 16 kb (local) + 1kb for the input.. oh well, that's not that big actually
17:39 karolherbst: mhh, but yeah, in the end the size doesn't really matter, except we have to guard against out of bound accesses
17:39 karolherbst: but... I guess we can handle that outside the shader?
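The s[] layout discussed above (kernel parameters first, CL local memory after them, with the local size only known once the kernel arguments are set) can be sketched roughly as follows. This is a minimal illustration, not nouveau code: the helper names and the 1 KiB / 16 KiB figures are assumptions taken from the conversation, and the real nv50 limits are not confirmed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes taken from the discussion, not from hardware docs. */
#define INPUT_BYTES 1024u          /* "1kb for the input" */
#define LOCAL_BYTES (16u * 1024u)  /* CL minimum for local memory */

/* Offset of the first local buffer if kernel parameters come first in s[]. */
static uint32_t local_base(uint32_t input_bytes)
{
    return input_bytes;
}

/* Total s[] footprint of one launch. local_bytes is only known after the
 * kernel arguments have been set, so it may differ between launches. */
static uint32_t shared_footprint(uint32_t input_bytes, uint32_t local_bytes)
{
    return input_bytes + local_bytes;
}

/* Returns 1 if [offset, offset + size) lies inside the footprint.
 * Written so that offset + size cannot overflow. */
static int shared_access_ok(uint32_t offset, uint32_t size, uint32_t footprint)
{
    assert(size > 0);
    return offset <= footprint && size <= footprint - offset;
}
```

Because the parameters sit at a fixed base, changing the local size between launches only moves the end of the window, which is the sort of check that could be done outside the shader.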
20:56 ReinUsesLisp: here's the select id used by Nvidia's OpenGL driver to implement queries: https://pastebin.com/nf4ke0G0
20:56 ReinUsesLisp: you probably already know this though
20:57 imirkin: ReinUsesLisp: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c#n186
21:00 imirkin: lmk if you have any questions about that
21:00 imirkin: i already explained some query-related stuff on-list to someone (you?)
21:01 ReinUsesLisp: probably Blinkhawk (Fernando), he was implementing the non-query related stuff of conditional rendering
21:01 imirkin: ah ok
21:01 ReinUsesLisp: although there's one Q, the blob uses a select id of 26 (0x1A) and I don't seem to be able to trick the driver to use it
21:02 ReinUsesLisp: it uses it on an NVN game, it could be something not exposed in OpenGL, but that would be weird from Nvidia
21:02 imirkin: let's see...
21:02 imirkin: wtf is select id...
21:03 ReinUsesLisp: https://github.com/envytools/envytools/blob/715cba01cb983fed0c856382a66943f734e6edc2/rnndb/graph/gf100_3d.xml#L1145-L1150
21:03 imirkin: thanks =]
21:03 imirkin: yeah, nouveau doesn't have that one
21:04 imirkin: we have 0x1b (tcp launches)
21:04 ReinUsesLisp: I know that game uses compute, so it could be COMPUTE_SHADER_INVOCATIONS_ARB
21:04 imirkin: oh wait
21:04 imirkin: we do have it
21:04 ReinUsesLisp: but when I tried to query that from Nvidia's driver it was just using zero
21:04 imirkin: case NVC0_HW_QUERY_TFB_BUFFER_OFFSET:
21:04 imirkin: /* indexed by TFB buffer instead of by vertex stream */
21:04 imirkin: nvc0_hw_query_get(push, q, 0x00, 0x0d005002 | (q->index << 5));
21:04 imirkin: this is needed to implement glDrawTransformFeedback()
21:05 ReinUsesLisp: nice, I'll check if it can be queried from GL
21:05 imirkin: esp in the presence of stream pausing/restarting
21:05 imirkin: it can't, not directly
21:06 imirkin: actually yeah, i guess it's just for stream pause/restart
21:06 imirkin: when you restart, you need to know where the first thing stopped
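The query value quoted above can be taken apart with a small sketch. The constant and the `index << 5` term come from the pasted nouveau line; the select-id field position (bits 23 to 27) and the limit of four buffers are assumptions that happen to reproduce the 0x1A select id mentioned earlier, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Base value copied from the nvc0_hw_query_get() call quoted above. */
#define TFB_BUFFER_OFFSET_GET 0x0d005002u

/* Assumption: four transform feedback buffers, so index is 0..3. */
#define MAX_TFB_BUFFERS 4u

/* Assumed field layout: select id in bits 23..27. */
#define SELECT_SHIFT 23u
#define SELECT_MASK  0x1fu

/* Builds the query "get" word for one transform feedback buffer. */
static uint32_t tfb_offset_query_get(uint32_t index)
{
    assert(index < MAX_TFB_BUFFERS);
    return TFB_BUFFER_OFFSET_GET | (index << 5);
}

/* Extracts the select id under the assumed layout. */
static uint32_t query_select_id(uint32_t get)
{
    return (get >> SELECT_SHIFT) & SELECT_MASK;
}
```

Under that assumed layout the buffer index never touches the select bits, so every buffer yields the same select id.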
21:08 ReinUsesLisp: ok, we'll probably have to ignore it and find a hacky way to implement TFB
21:08 ReinUsesLisp: it's tempting to use the NV_ extensions here, since they can be specified from GL instead of GLSL :P
21:08 ReinUsesLisp: thanks
21:15 karolherbst: ReinUsesLisp: ohh btw, did somebody start with pattern matching to convert a group of hardware commands/shader instructions to _one_ glsl function?
21:15 karolherbst: eg for texlod
21:15 karolherbst: uhm.. textureLod
21:16 karolherbst: or.. wait, what was txd again
21:17 karolherbst: ahh, derivatives
21:18 imirkin: ReinUsesLisp: what NV_* extension did you have in mind?
21:18 karolherbst: we have this funky handleManualTXD thing :/
21:19 imirkin: karolherbst: for derivatives, sure. he's talking about xfb though, which has nothing to do with shader contents
21:19 karolherbst: no, it's unrelated
21:19 karolherbst: just a general question
21:19 imirkin: well - most of the time on maxwell should fit into the builtin txd
21:19 imirkin: so that's easy
21:19 karolherbst: yeah
21:19 imirkin: i mean who does textureGrad with a 3d or shadow texture
21:19 imirkin: hopefully no one :)
21:19 karolherbst: I am just wondering if they encounter shaders where they hit tons of lowering
21:20 karolherbst: which would be a lot faster on AMD/intel than just translating that crap 1:1
21:20 karolherbst: and for that you need some good pattern matching thing in order to translate the lowering back to glsl functions
21:20 imirkin: well, quadop isn't generically implementable with glsl
21:20 imirkin: some exts allow you to get close though
21:21 karolherbst: yeah.. that's why I am wondering if somebody worked on that already.. I thought there was something where they hit bad perf issues due to something like this
21:21 karolherbst: but my recollection of the state is like half a year old
21:22 karolherbst: anyway, just wondering
21:32 ReinUsesLisp: imirkin: the NV_transform_feedback family
21:33 ReinUsesLisp: karolherbst: gdkchan from Ryujinx made an optimized shader decompiler
21:33 ReinUsesLisp: yuzu currently translates them 1 by 1 and lets the host compiler do the work
21:33 karolherbst: how much does it help in general?
21:33 karolherbst: and how much in specific cases?
21:33 imirkin: ReinUsesLisp: GL_NV_transform_feedback2 doesn't seem like it would help either...
21:34 imirkin: perhaps i'm missing something?
21:34 ReinUsesLisp: I didn't profile it, the goto removal algorithm helped a lot on pre-GCN and old Intels
21:34 karolherbst: ReinUsesLisp: well, the host compiler can only do so much... yuzu requires a pattern matcher, otherwise that won't fly probably... I highly doubt that any compiler would be able to optimize that in the same way
21:34 ReinUsesLisp: NV_transform_feedback lets you specify the TFB layout after the shader has been compiled, core OpenGL requires it in the shader
21:35 imirkin: ReinUsesLisp: well, nvc0+ kinda wants it in the shader too
21:35 imirkin: with nv50, you could do a lot of remapping
21:35 imirkin: (actually, i don't 100% remember how nvc0 xfb setup works either)
21:36 ReinUsesLisp: karolherbst: I'm a bit skeptical about optimizing shaders a lot, we solved Z-fighting issues by adding precise on FADD and FMUL decompilation
21:37 karolherbst: ReinUsesLisp: I don't mean "optimizing" though
21:37 karolherbst: ReinUsesLisp: there is a lot of stuff the hardware simply can't do
21:37 ReinUsesLisp: ah, like variable AOFFI
21:37 karolherbst: so some glsl functions can explode to 100s of instructions
21:37 karolherbst: ...
21:37 karolherbst: and reversing this would be very helpful
21:38 ReinUsesLisp: there are games using BRX, guess how we implemented it :P
21:38 karolherbst: especially all those image operations are lowered to multiple surface instructions on hardware
21:38 karolherbst: but.. some things are just super evil
21:39 ReinUsesLisp: anyways, most of yuzu's overhead is not on the GPU (unless we are talking about games using BRX a lot)
21:39 karolherbst: ReinUsesLisp: and sometimes you hit the jackpot like explicit derivatives in texture instructions: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#n1171
21:39 karolherbst: ohh, interesting
21:39 karolherbst: so most is still CPU overhead?
21:40 ReinUsesLisp: OpenGLState takes around 5% of profiling time on some games
21:40 karolherbst: :/
21:40 karolherbst: mhhh
21:40 ReinUsesLisp: (that's our state tracker)
21:40 karolherbst: pattern matcher _could_ reduce CPU overhead as well
21:40 ReinUsesLisp: it could be optimized a bit
21:40 karolherbst: as the OpenGL runtime has less to do then
21:40 ReinUsesLisp: https://github.com/yuzu-emu/yuzu/blob/master/src/video_core/renderer_opengl/gl_state.cpp
21:40 karolherbst: well, the compiler
21:40 karolherbst: mhh, and maybe yuzu could also cache? hardware shader -> glsl source
21:41 ReinUsesLisp: we cache it in disk
21:41 karolherbst: :)
21:42 karolherbst: ReinUsesLisp: but yeah.. the FMA stuff is quite stupid :/
21:42 karolherbst: the maxwell isa only has one variant :/
21:43 karolherbst: so FMA is always fused
21:43 karolherbst: and there is no unfused version
21:43 karolherbst: we are still hitting precise bugs in mesa as well from time to time
21:43 ReinUsesLisp: there are games that use different floating-point rounding modes, we can't control that in GLSL
21:44 ReinUsesLisp: (aside from rounding towards even, it uses .RZ)
21:46 karolherbst: ReinUsesLisp: mhh, are you sure?
21:46 karolherbst: it might just be pow lowering
21:46 karolherbst: pow lowering sets the dnz flag on mul
21:46 ReinUsesLisp: when we nvdisasm those shaders they say RZ on some FMUL/FADD operations
21:47 karolherbst: pow = (ex2 (preex2 (mul_dnz b (lg2 a))))
21:47 karolherbst: ohh
21:47 karolherbst: mhh, I think there was something as well
21:48 karolherbst: ReinUsesLisp: do you know if the native toolchain allows it?
21:48 ReinUsesLisp: NVN and OpenGL both use GLSL
21:49 karolherbst: okay... then how would they be able to specify it?
21:49 ReinUsesLisp: it's probably the driver expanding a more complex expression
21:49 karolherbst: yeah.. probably
21:49 karolherbst: could be some CVT merging
21:49 karolherbst: or.. something
21:50 karolherbst: how easy is it to get the sdk officially? I guess paying tons of bucks is the only way?
21:50 karolherbst: I only see us setting the RZ flag for conversions :/
21:50 karolherbst: ReinUsesLisp: do you have such a shader? maybe it's something more or less obvious
21:51 ReinUsesLisp: yes, one sec
21:54 ReinUsesLisp: karolherbst: https://pastebin.com/tTGkDXPu
21:56 karolherbst: those {} brackets :/
21:56 karolherbst: there was a reason nvidia adds those
21:56 karolherbst: but they are kind of pointless
21:57 karolherbst: ufff
21:57 karolherbst: imirkin: this looks a bit like optimized long to float stuff, no?
21:58 karolherbst: ahh
21:58 karolherbst: ReinUsesLisp: see the first MUL.RZ thing?
21:58 ReinUsesLisp: yes
21:58 karolherbst: one operand is a i2f thing
21:58 karolherbst: and the result is pushed into a ftrunc
21:59 karolherbst: to int again
21:59 karolherbst: I am sure there is some good reason to compile it like this
21:59 karolherbst: anyway, it seems like there is float/int conversions involved
21:59 karolherbst: and it might be the glsl shader had explicit conversions
22:00 ReinUsesLisp: this VS is used in almost all home-cooked Nintendo games to render fonts
22:00 karolherbst: would be interesting to write ptx code which would give us the same shader
22:00 ReinUsesLisp: they render all fonts in the screen with a single instanced draw
22:00 karolherbst: ahh
22:00 karolherbst: makes sense
22:00 karolherbst: low CPU overhead :)
22:01 ReinUsesLisp: it's funny that those shaders didn't render properly on AMD's proprietary driver
22:01 karolherbst: yeah.. it's not the best opengl implementation
22:01 ReinUsesLisp: they have a bug on variable vec4 indexing, which we use to implement LDC and friends
22:02 karolherbst: or well, it's considered one of the worst actually on linux :p
22:02 ReinUsesLisp: I consider it top tier compared to mobile drivers :P
22:02 karolherbst: :D
22:02 karolherbst: that's true
22:03 ReinUsesLisp: LDG and STG were fun to implement
22:03 karolherbst: ReinUsesLisp: but that mul.rz is always consumed by a f2i.trunc
22:03 karolherbst: that's interesting
22:03 karolherbst: ReinUsesLisp: ssbos :p
22:03 karolherbst: and drop the bound checks
22:03 ReinUsesLisp: or NV_shader_buffer_store
22:04 karolherbst: heh? but that's atomic stuff, no?
22:04 ReinUsesLisp: it lets you write stuff like
22:04 ReinUsesLisp: uniform float* foo;
22:04 ReinUsesLisp: foo[5] = 342.2f;
22:04 karolherbst: uff
22:04 karolherbst: yeah well
22:04 karolherbst: but you could also just use ssbos
22:05 ReinUsesLisp: you can do pointer arithmetic
22:05 ReinUsesLisp: or double dereferences
22:05 ReinUsesLisp: uniform float** foo;
22:05 ReinUsesLisp: pack them in structs, fun stuff like that
22:05 karolherbst: right, but do all implementations implement this extension?
22:05 ReinUsesLisp: just NV's blob
22:05 karolherbst: ;)
22:05 karolherbst: so you need to use ssbos anyway
22:06 ReinUsesLisp: yeah
22:06 karolherbst: it's the only way to get those stg or ldg through glsl anyway
22:06 karolherbst: besides those insane extensions nobody uses
22:06 ReinUsesLisp: please don't summon HdkR :P
22:06 karolherbst: those are mainly enterprise shit, because somebody just put a lot of money on nvidia's desk, so they kind of implemented such extensions :p
22:06 ReinUsesLisp: 80% of yuzu users use Nvidia's blob :|
22:07 karolherbst: or sometimes they just push those out for no reason
22:07 karolherbst: besides "because we can"
22:07 karolherbst: ReinUsesLisp: well, AMD wouldn't work, right? :p
22:07 ReinUsesLisp: they are useful for Nintendo Switch developers
22:07 karolherbst: but they don't use it
22:07 karolherbst: I am sure they don't
22:07 ReinUsesLisp: how can you tell for sure?
22:07 karolherbst: because nobody is that insane to use those
22:08 ReinUsesLisp: Shovel Knight uses bindless textures :shrugh:
22:08 karolherbst: yeah, but that's implemented on most drivers :p
22:08 karolherbst: and some newer linux ports use it as well
22:08 karolherbst: as they run out of samplers
22:09 ReinUsesLisp: some home cooked games use the CST
22:09 karolherbst: ReinUsesLisp: anyway, if you see a stg/ldg with a bound check, that's a plain ssbo
22:09 ReinUsesLisp: NV doesn't do the bound checking though
22:09 karolherbst: ufff
22:09 karolherbst: mhhh
22:09 karolherbst: I mean, it's optional in opengl as well :/
22:09 karolherbst: it's part of the robustness stuff
22:09 ReinUsesLisp: I can compile arbitrary GLSL with the blob, I've already checked it
22:09 karolherbst: maybe on a switch they don't care and just tell game devs to fix their shit
22:09 karolherbst: huh?
22:10 karolherbst: against the opengl driver?
22:10 ReinUsesLisp: some games ship with an offline compiler to build GLSL on the fly
22:10 ReinUsesLisp: you inject your own GLSL there
22:10 karolherbst: ohh, interesting
22:10 karolherbst: guess they don't do robustness then
22:10 karolherbst: but are you sure you checked with ssbos?
22:11 ReinUsesLisp: lmk if you want to test what a GLSL binary looks like
22:11 ReinUsesLisp: I'll try enabling robustness from the GLSL
22:11 karolherbst: yeah.. maybe that changes it
22:11 karolherbst: normally we just always do bound checks, because otherwise the shader traps :)
22:11 karolherbst: or well, worse, crashes the system
22:12 ReinUsesLisp: SUST has a trap flag, sadly STG doesn't seem to do so
22:12 ReinUsesLisp: nouveau uses ST anyways
22:12 karolherbst: right
22:12 karolherbst: maybe stg is bound checked?
22:12 karolherbst: imirkin: do we actually know the differences between st and stg?
22:14 imirkin: hdkr claims that stg doesn't get the stupid memory windows taken out of it
22:14 imirkin: i think that stg also goes through a slightly different sequencing path
22:18 karolherbst: imirkin: ohh, maybe on shared memory GPUs the bound checking isn't required? that would be weirdly cool
22:18 ReinUsesLisp: I think ST can write to local memory
22:18 karolherbst: although.. for ssbos you really want that
22:20 imirkin: bounds checking is most definitely required
22:20 imirkin: ReinUsesLisp: yeah, there are stupid memory windows
22:20 imirkin: that you can configure
22:20 ReinUsesLisp: https://pastebin.com/pTMDcXz8 compiles down to https://pastebin.com/s327YT1H
22:21 karolherbst: mhhh
22:22 ReinUsesLisp: keep in mind that doesn't follow the OpenGL rules, since I'm compiling it from an NVN game
22:22 karolherbst: what if you add an indirect array access
22:22 ReinUsesLisp: let me try
22:23 ReinUsesLisp: nope https://pastebin.com/UM7Nvzi7
22:23 karolherbst: ReinUsesLisp: but that's not indirect?
22:23 karolherbst: you need to use a uniform for the index
22:23 ReinUsesLisp: I thought "value[index]" was indirect
22:24 karolherbst: if index is a constant then it's a constant access
22:24 karolherbst: and if you know the size you won't need to bound check
22:24 ReinUsesLisp: "index" is from the SSBO
22:25 karolherbst: ohh, I missed the LDG.E
22:25 karolherbst: uff
22:25 karolherbst: and I didn't see the glsl code
22:26 karolherbst: heh... am I tired or something?
22:26 karolherbst: okay, but hey, that's weird
22:26 karolherbst: imirkin: for that shader we would assume that we would have to add a bound check, no?
22:26 ReinUsesLisp: anyways, this is not following the OpenGL rules
22:27 karolherbst: yeah
22:27 karolherbst: well, in opengl you need robustness support for it
22:27 ReinUsesLisp: (because it doesn't have to follow them)
22:27 karolherbst: maybe ssbos even mandate it
22:27 karolherbst: yeah.. I kind of see the point
22:27 karolherbst: bound checks are slowing down the shader... but... well, compared to the global memory access the bound check doesn't matter all that much
22:28 karolherbst: but then you kind of have to guarantee that your shader is bug free
22:28 ReinUsesLisp: you have to read the size :|
22:28 karolherbst: and your game
22:28 karolherbst: ReinUsesLisp: const buffer
22:28 karolherbst: it's essentially for free
22:28 ReinUsesLisp: ah yeah, I thought for a sec that it was in global memory
22:28 karolherbst: so you do a setp with the offset and c[][]
22:28 karolherbst: and predicate the stg
22:29 ReinUsesLisp: yup, I was inspecting nouveau shaders some time ago
22:29 karolherbst: ReinUsesLisp: what if you read from the ssbo and write it somewhere else?
22:30 karolherbst: would make a difference, because out of bound could be defined to be 0
22:30 karolherbst: or undefined
22:30 karolherbst: a write could just end up doing nothing if it is out of bound
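The guard described above (compare the offset against a buffer size held in a constant buffer, then predicate the access) can be modelled on the CPU as a rough sketch. The function names are hypothetical, and returning 0 for an out-of-bounds read is just one of the behaviours mentioned in the discussion, not something the hardware or the spec is confirmed to require.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plays the role of the setp: is a 4-byte access at offset inside size? */
static int in_bounds(uint32_t offset, uint32_t size)
{
    return offset <= size && size - offset >= sizeof(uint32_t);
}

/* Predicated store: an out-of-bounds write is silently dropped.
 * Returns 1 if the store happened, 0 otherwise. */
static int guarded_store(uint32_t *buf, uint32_t size, uint32_t offset,
                         uint32_t value)
{
    assert(buf != NULL && offset % 4 == 0);
    if (!in_bounds(offset, size))
        return 0;
    buf[offset / 4] = value;
    return 1;
}

/* Predicated load: an out-of-bounds read yields 0 (an assumed policy). */
static uint32_t guarded_load(const uint32_t *buf, uint32_t size,
                             uint32_t offset)
{
    assert(buf != NULL && offset % 4 == 0);
    return in_bounds(offset, size) ? buf[offset / 4] : 0u;
}
```

Here `size` and `offset` are in bytes, and the size stands in for the value that would be read from the constant buffer.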
22:32 ReinUsesLisp: some Switch devs are kind of mad
22:33 ReinUsesLisp: they use TFB and compute in the same application
22:33 ReinUsesLisp: (as well as quads and u8 indices)
22:33 karolherbst: ReinUsesLisp: it's all fun and games until you try to run those applications on 10 years older hardware and the undefined bits are behaving differently and now you have games not running on modern implementations because of that crap
22:33 karolherbst: bugs like that happen... sadly
22:33 karolherbst: because undefined behaviour is different on different hardware
22:35 ReinUsesLisp: btw how does nouveau implement OpenGL compat depth/stencil sampling?
22:35 ReinUsesLisp: (with compat I mean 3 and older, since it doesn't support compat)
22:36 karolherbst: uhm... either it's correct by accident or we have code somewhere...
22:36 karolherbst: ReinUsesLisp: do you know what's special about it with compat?
22:36 ReinUsesLisp: sampling non-R components behaves differently, I couldn't find the specific bit in the spec that makes them different
22:36 karolherbst: normally we don't really have to change anything in our driver, as stuff just works
22:37 ReinUsesLisp: although some games behave properly on compat but not on core
22:37 karolherbst: sometimes you need to add support for some stuff in tess and geom shaders
22:37 ReinUsesLisp: and we need quads, u8 indices, and legacy attributes anyways, so we moved to compat
22:37 ReinUsesLisp: err... u8 indices are core*
22:37 karolherbst: mhh
22:37 karolherbst: most of the compat stuff is handled inside gallium
22:37 karolherbst: if something doesn't work it's because nobody tested it before :)
22:38 ReinUsesLisp: hehe, makes sense :P
22:41 imirkin: ReinUsesLisp: you mean the broadcast vs not-broadcast for sampling depth?
22:42 ReinUsesLisp: I'm not sure what that is
22:42 imirkin: it's a pretty subtle distinction
22:42 imirkin: let me find what we do... hold on
22:42 ReinUsesLisp: those are the worst distinctions :P
22:47 imirkin: yeah
22:53 imirkin: bleh. can't find it.
22:57 imirkin: ReinUsesLisp: this is what we do:
22:57 imirkin: src/mesa/main/texobj.c: obj->DepthMode = ctx->API == API_OPENGL_CORE ? GL_RED : GL_LUMINANCE;
22:57 imirkin: depth mode, in turn, corresponds to GL_DEPTH_TEXTURE_MODE
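The effect of the default quoted above can be sketched as follows: with a luminance depth mode the depth value is broadcast to R, G and B, while with a red depth mode only R carries it. The enum, struct and function names are hypothetical stand-ins, not Mesa's types.

```c
#include <assert.h>

/* Hypothetical stand-ins for GL_RED and GL_LUMINANCE depth texture modes. */
enum depth_mode { DEPTH_MODE_RED, DEPTH_MODE_LUMINANCE };

struct rgba { float r, g, b, a; };

/* Mirrors the quoted default: core profile gets RED, otherwise LUMINANCE. */
static enum depth_mode default_depth_mode(int is_core_profile)
{
    return is_core_profile ? DEPTH_MODE_RED : DEPTH_MODE_LUMINANCE;
}

/* Expands a sampled depth value d into the vector a shader would see. */
static struct rgba expand_depth(float d, enum depth_mode mode)
{
    assert(mode == DEPTH_MODE_RED || mode == DEPTH_MODE_LUMINANCE);
    struct rgba out = { d, 0.0f, 0.0f, 1.0f };
    if (mode == DEPTH_MODE_LUMINANCE) {
        out.g = d;  /* broadcast depth to green */
        out.b = d;  /* and to blue */
    }
    return out;
}
```

A shader that only reads the red component sees the same value in both modes, which is why the difference shows up only when the other components are sampled.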
22:58 ReinUsesLisp: nice, that confirms what I guessed from reading the spec
22:58 imirkin: note that hardware can sample depth and stencil values at the same time
22:58 imirkin: but GL does not allow this
22:59 imirkin: this is done using the much-newer ARB_stencil_texturing ext
23:10 ReinUsesLisp: forcing "glTextureParameteri(texture, GL_DEPTH_TEXTURE_MODE, GL_RED)" on compat makes it behave like core
23:11 ReinUsesLisp: that solves a long-running mystery, thanks :)
23:11 imirkin: np
23:11 imirkin: note that this can mess some things up if you have fixed function shaders
23:11 imirkin: for some of the stages
23:11 imirkin: however i'm guessing that won't be a problem :)
23:12 ReinUsesLisp: nope, we might use fixed function attributes at some point, but never fixed function shaders
23:13 imirkin: (technically, fixed function stages ... obviously can't have fixed function shaders)
23:13 imirkin: shaders can be used to implement a fixed function stage...
23:14 imirkin: [and are, on nv40+]
23:14 ReinUsesLisp: some games use compatibility-only AST addresses like gl_Fog :|
23:15 ReinUsesLisp: (I don't remember the exact varying, it was probably gl_FrontColor)
23:17 imirkin: by the time you're doing gl_Fog, you're already using a shader
23:17 imirkin: i'm talking about fixed function stages ;)
23:18 imirkin: (gl_FogColor iirc?)
23:20 ReinUsesLisp: it's interesting that those varyings are not emulated on nvc0
23:21 karolherbst: why would they?
23:23 ReinUsesLisp: IMO it's weird to keep support for legacy varyings in the hardware itself
23:23 ReinUsesLisp: but it makes sense
23:23 karolherbst: the question is rather, how hard would it be to emulate it and how expensive is it to just add it to the hardware
23:25 imirkin: ReinUsesLisp: some of it is lowered further up
23:25 imirkin: case TGSI_SEMANTIC_FOG: return 0x2e8;
23:26 imirkin: https://gallium.readthedocs.io/en/latest/tgsi.html#tgsi-semantic-fog
23:27 karolherbst: imirkin: which of the offsets are actually driver defined? I never looked into it
23:27 karolherbst: or who controls the values?
23:28 imirkin: none
23:28 imirkin: the hardware designers control them
23:28 imirkin: it's not just input/output
23:29 imirkin: they have semantic meaning attached to them
23:29 karolherbst: right, but I mean for the generic ones we can put whatever values into those, no?
23:29 imirkin: sure
23:29 imirkin: but then it won't have the desired effect
23:29 imirkin: like for example the texcoords are subject to pointcoord replacement
23:29 karolherbst: so 0x0:0x7f is whatever the hardware does and 0x80- is generic and we fully decide the values?
23:29 imirkin: if they're not in the proper place, then fail
23:29 karolherbst: ohh
23:29 imirkin: no
23:30 imirkin: up to 0x270 (non-inclusive)
23:30 imirkin: actually, the 0x270 might not be special
23:30 karolherbst: okay. so only generic are the ones we define
23:30 imirkin: https://nvidia.github.io/open-gpu-doc/Shader-Program-Header/Shader-Program-Header.html
23:31 imirkin: the generic ones are generic.
23:31 imirkin: the non-generic ones are not generic :)
23:31 karolherbst: yeah.. I know that we have some header fields to configure the varyings we use
23:31 karolherbst: right :)
23:31 karolherbst: I was just wondering how much of it we just define for convenience so it's always in the same place
23:31 imirkin: the header fields map to the addresses fairly well
23:31 imirkin: afaik all the non-generic ones have special meaning attached
23:31 karolherbst: and how much it's actually hardware defined
23:32 karolherbst: okay
23:32 ReinUsesLisp: Nvidia's blob doesn't set all those fields
23:32 imirkin: clipvertex is just us being cheeky -- that has no real semantic meaning
23:32 karolherbst: ReinUsesLisp: well, they do. Most of it is just 0 :p
23:32 ReinUsesLisp: e.g. it doesn't set gl_ClipDistance when it uses gl_ClipDistance
23:32 karolherbst: you can't not set the bytes
23:32 ReinUsesLisp: :P
23:32 imirkin: ReinUsesLisp: uhhh ... that's surprising.
23:33 ReinUsesLisp: set as in std::bitset::set
23:33 imirkin: i'm guessing you're misunderstanding.
23:33 imirkin: or perhaps that clip distance is disabled.
23:33 imirkin: (also, note that these are effectively clip/cull distances -- there's a method that controls whether each one is clip or cull)
23:34 ReinUsesLisp: I added an assert when gl_ClipDistance was used but not set on the header, it hit all the time gl_ClipDistance was used
23:34 ReinUsesLisp: (gl_ClipDistance subindex, ofc)
23:35 imirkin: that's weird
23:35 imirkin: my guess is that those clip planes are disabled
23:35 imirkin: but they still emit in the shader?
23:36 ReinUsesLisp: err... I can build a shader using clip distances and get the header
23:36 ReinUsesLisp: one sec
23:39 ReinUsesLisp: xxd https://pastebin.com/LjaCe66X
23:39 ReinUsesLisp: glsl https://pastebin.com/PiB5De2W
23:39 imirkin: and did you enable the clip distances?
23:39 imirkin: glEnable(GL_CLIP_DISTANCE0);
23:39 imirkin: otherwise they have no effect.
23:40 karolherbst: it's not OpenGL ;)
23:40 karolherbst: but yeah, that stuff is API enabled/disabled
23:40 ReinUsesLisp: yeah, this binary is stateless
23:40 imirkin: btw, hexdump -C is much nicer to look at
23:40 karolherbst: ReinUsesLisp: do you know if the API has some knobs for that?
23:41 ReinUsesLisp: I doubt they modify the header at runtime, the whole shader is built outside the application
23:41 karolherbst: it's not about the shader
23:41 karolherbst: ReinUsesLisp: or is the header part of it?
23:42 karolherbst: I mean.. the runtime can just |= the stuff into it at runtime
23:42 karolherbst: anyway, with GL that is runtime defined
23:42 ReinUsesLisp: it emits header + code
23:42 ReinUsesLisp: (and ARB)
23:42 karolherbst: sure, but they can just mask it in
23:42 karolherbst: that's why I was asking if there are some API calls for it
23:42 ReinUsesLisp: yes, unless we are not emulating it, I haven't seen it doing so at runtime either
23:43 ReinUsesLisp: I can't find any symbol naming ClipDistances
23:43 ReinUsesLisp: (and it has stuff like nvnCommandBufferSetAlphaRef)
23:44 imirkin: ReinUsesLisp: might be clip plane
23:44 imirkin: ReinUsesLisp: do you see anything about cull?
23:45 imirkin: those clip distances may be flipped to a cull distance
23:45 imirkin: with a method call
23:46 ReinUsesLisp: nvnTextureBuilderGetZCullStorageSize, nvnTextureGetZCullStorageSize, nvnCommandBufferSaveZCullData, nvnCommandBufferRestoreZCullData
23:46 imirkin: zcull is something else
23:47 imirkin: ReinUsesLisp: do you have a SetUserClipOp ?
23:47 imirkin: (0x1940)
23:48 ReinUsesLisp: it might be defined in a struct member (which I can't see from function pointer queries)
23:48 imirkin: and SetUserClipEnable (0x1510)
23:48 imirkin: anything about UserClip?
23:48 ReinUsesLisp: nope :|
23:49 ReinUsesLisp: I guess I can post symbol names
23:49 imirkin: hm right - we have the enable which is separate from the header
23:49 imirkin: ok yeah, i can believe the header does little.
23:49 imirkin: dunno.
23:49 imirkin: or perhaps they're enabled / disabled in tandem
23:50 ReinUsesLisp: https://pastebin.com/DGFeh0M2
23:51 ReinUsesLisp: I wish this API is published some day :P
23:51 ReinUsesLisp: some symbols might be missing since I used two different games to gather them
23:59 imirkin: have you tested clip distance stuff?