00:23pmoreau: imirkin_: Do you know how to specify which dimension you want when running a dim texture query? I couldn’t find it out in the tgsi code.
00:24pmoreau: I’ll have another look at the emission code, as I know where those bits are located.
00:24imirkin_: you get all of them
00:24imirkin_: you mean for textureSize()?
00:26pmoreau: Well, get_image_width() in OpenCL, which is translated to `TXQ.T R12, RZ, TEX_HEADER_DIMENSION, 0x51, 0x1;`
00:26pmoreau: If there is one that returns everything, that’s even better :-)
00:26pmoreau: (Cause the SPIR-V equivalent expects everything, so it’s easier for me if I get everything at once O:-) )
00:28imirkin_: that's on maxwell+?
00:28imirkin_: is get_image_width() the size of a texture or the size of an image?
00:28pmoreau: I’m currently on a kepler1 (sm_30)
00:28imirkin_: either way, have a look at how TXS and IMGQ are implemented
00:30pmoreau: It’s a read only piece of memory, with filtering operations on it. :-) SPIR-V does not differentiate between images and textures (nor does OpenCL it seems), but NVIDIA is using tex operations for it, so I’d guess it’s a texture
00:30pmoreau: Will have a look, thanks
00:30imirkin_: ok, so a texture if you can texture it
00:30imirkin_: look at TGSI_OPCODE_TXS (iirc)
00:30imirkin_: i don't remember how it works
00:30imirkin_: but ... however it works, it works
00:31imirkin_: for a read/write image, it'll be different
00:31imirkin_: might be OPCODE_TXQ
00:31pmoreau: I was looking at handleTXQ which has a TexQuery::TXQ_DIMS, but I can’t see where the dim is selected there.
00:32imirkin_: all of them.
00:32imirkin_: x, y, arrays, levels (iirc)
00:32imirkin_: are returned in a vec4 thing
00:33imirkin_: oh, i think that's == TEX_HEADER_DIMENSION
00:33imirkin_: you see the 0x1?
00:33imirkin_: that means only the .x component is returned, aka width
00:33imirkin_: if you did height, i bet it'd say 0x2
00:33pmoreau: It does
00:33imirkin_: and if you did depth, it'd say 0x4 :)
00:33imirkin_: and levels, 0x8
00:34imirkin_: and 0xf you get all 4
00:35pmoreau: How do you configure that though :-D
00:36pmoreau: I can see it in the binary, but I am having a hard time finding that out in handleTXQ
00:37imirkin_: it's by the defs
00:37imirkin_: it takes up to 4 defs
00:37imirkin_: those 4 defs determine the mask
00:37imirkin_: er ... no
00:37imirkin_: there's i->tex.mask
00:38imirkin_: or texMask
00:38imirkin_: or something
00:38imirkin_: tex->tex.mask |= 1 << c;
00:38imirkin_: so like def(0) is going to get the value that corresponds with the first-set channel of the mask
00:38imirkin_: def(1) will get the second-set channel, etc
00:39pmoreau: I was setting it, but maybe I messed up, let me have another look at it.
00:39imirkin_: so for width, it's tex->tex.mask == 1
00:39imirkin_: for height, tex->tex.mask == 2
00:40imirkin_: (if def(0) always gets the desired result)
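The relationship between the component mask and the defs described above can be sketched as follows. This is a standalone illustration, not the nouveau codegen: `QueryChannel`, `buildMask` and `channelForDef` are hypothetical names, and the bit order (width, height, depth, levels) is taken from the discussion above.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Bit positions as described in the chat: 0x1 = width, 0x2 = height,
// 0x4 = depth, 0x8 = levels. The enum itself is hypothetical.
enum QueryChannel : unsigned { WIDTH = 0, HEIGHT = 1, DEPTH = 2, LEVELS = 3 };

// Build the 4-bit component mask, mirroring `tex->tex.mask |= 1 << c`.
inline uint8_t buildMask(std::initializer_list<QueryChannel> channels) {
    uint8_t mask = 0;
    for (QueryChannel c : channels) {
        assert(c <= LEVELS);
        mask |= static_cast<uint8_t>(1u << c);
    }
    return mask;
}

// Which channel ends up in def(defIndex): the defIndex-th set bit of the
// mask, counting up from bit 0. Returns -1 if too few bits are set.
inline int channelForDef(uint8_t mask, int defIndex) {
    for (int c = 0; c < 4; ++c) {
        if (mask & (1u << c)) {
            if (defIndex == 0)
                return c;
            --defIndex;
        }
    }
    return -1;
}
```

With a mask of 0x2, `channelForDef(0x2, 0)` yields the height channel, which matches the "for height, mask == 2" case when only def(0) is used.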
00:58pmoreau: So I had the mask correctly set it seems, but nvdisasm won’t accept decoding the instruction unless I set bit 20 of code (otherwise I get “nvdisasm error : Opclass 'TXQ', undefined value 0x0 for table 'TXQModeDim' at address 0x00000000”).
00:59pmoreau: But the emitTXQ seems to never emit that bit.
01:02imirkin_: that happens.
01:02imirkin_: i've noticed that.
01:02imirkin_: nvdisasm is wrong.
01:03pmoreau: Could very well be: I can’t run the code yet, as I hit
01:03pmoreau: Thread 1 "test_basic" received signal SIGSEGV, Segmentation fault.
01:03pmoreau: 0x00007fffebb654fa in nvc0_stage_set_sampler_views (nvc0=0x555556cdb970, s=5, nr=1, views=0x0) at ../../../../../mesa_spirv/src/gallium/drivers/nouveau/nvc0/nvc0_state.c:512
01:03pmoreau: 512 if (views[i] == nvc0->textures[s][i])
01:04pmoreau: Since I never set up those, yet
01:05imirkin_: is that legal?
01:05imirkin_: nr > 0, views == 0?
01:06pmoreau: BTW, what are “R” and “S” for, in TexInstruction->tex: resource and sampler?
01:09pmoreau: nr > 0 && views == 0 comes from clover, which is always passing a NULL pointer (while giving the actual size) after launching a grid.
01:12imirkin_: hm. maybe it's supposed to unbind everything
01:12imirkin_: and we just don't handle that use-case
01:12pmoreau: That would be my guess as well
01:16pmoreau: Should it call nvc0_sampler_view_destroy to unbind the views? Having “destroy” in the name seems to do a bit more than just unbinding though.
01:18imirkin_: what it's doing is fine
01:18imirkin_: iirc most gallium api's support this kind of usage
01:18imirkin_: although it's weird
01:18imirkin_: alternatively just pass in an array of null views
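A minimal sketch of the "nr > 0, views == NULL means unbind" reading discussed above. It is not the nvc0 driver code: `SamplerView`, `StageTextures`, `MAX_SLOTS` and `setSamplerViews` are hypothetical stand-ins, and treating a null array as an array of null views is the assumption being illustrated.

```cpp
#include <cassert>
#include <cstddef>

struct SamplerView { int id; };        // hypothetical stand-in for a view

constexpr std::size_t MAX_SLOTS = 32;  // assumed slot count for this sketch

struct StageTextures {
    const SamplerView *bound[MAX_SLOTS] = {};
};

// Bind `nr` views starting at slot 0. A null `views` pointer is handled
// as if it were an array of `nr` null views, so those slots get unbound
// instead of dereferencing the null pointer.
// Returns the number of slots whose binding changed.
inline std::size_t setSamplerViews(StageTextures &st, std::size_t nr,
                                   const SamplerView *const *views) {
    assert(nr <= MAX_SLOTS);
    std::size_t changed = 0;
    for (std::size_t i = 0; i < nr; ++i) {
        const SamplerView *view = views ? views[i] : nullptr;
        if (view == st.bound[i])
            continue;  // nothing to do for this slot
        st.bound[i] = view;
        ++changed;
    }
    return changed;
}
```

The only difference from the crashing pattern (`views[i] == ...` with `views == 0`) is the null check before indexing.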
02:28karolherbst: pmoreau: you can also look at how I do things. It should be much clearer than looking at the TGSI code
03:35karolherbst: imirkin: segfault in demmt :( https://gist.githubusercontent.com/karolherbst/24ceac92e24e009a9eb6fe0a11b0aba2/raw/2f7a4b747511f2652097a8f6a328e39690a51a09/gistfile1.txt
03:36imirkin_: did you do the thing i said?
03:36imirkin_: i.e. turn off nvif?
03:36imirkin_: oh ... and i think there's more stuff. i might have some local patches.
03:36karolherbst: it decodes stuff, but at some point it crashes
03:36imirkin_: it's all in a state of some disrepair :(
03:36karolherbst: I see
03:37karolherbst: without disabling nvif I got "unknown ioctl" errors :)
03:37imirkin_: it needs someone to love it
03:38karolherbst: maybe it would be easier to just add something to libdrm_nouveau to dump stuff we push to the kernel
03:38karolherbst: and parse that
03:46karolherbst: imirkin_: I think this issue is unrelated to prim_id... I will investigate the other geom fails I have, maybe I get around trying to fix everything to be able to do a mmt trace :)
04:00karolherbst: imirkin_: mhh, the error is, that all vertex IDs are simply 0, but geometry shader seems to work in general...
04:01karolherbst: maybe I should search for the issue within the vertex shader ....
04:12karolherbst: mhh, but vertex-id also works in general, weird
04:22karolherbst: imirkin_: okay, that tg4 issue in nir is, that it doesn't support multiple offsets
04:24karolherbst: ohh wait, you wrote exactly this
04:24karolherbst: I should learn to read more carefully
14:57pmoreau: I am not sure I completely understand how textures work nor where all the information comes from (I feel like TGSI and NIR give more information than what SPIR-V does).
14:58imirkin_: sampler view (= texture) has the BO reference + things that are properties of that bo. like size, format, etc.
14:58imirkin_: as well as a pointer to things like min/max layer, level, and the such.
14:58pmoreau: The tex instructions seem to take a resource and sampler ID, which reference a texture/sampler slot? somewhere?
14:58imirkin_: sampler = parameters that tell tex wtf to do with the data
14:58imirkin_: i.e. filtering type, wrapping, etc
14:59pmoreau: And there is the rIndirect/sIndirect which is an offset on top of the start of the texture mem address?
14:59imirkin_: on all (G80+) arches, sampler views and samplers are stored in (large) tables -- TIC and TSC respectively
14:59imirkin_: on pre-kepler
14:59imirkin_: you have to *bind* a slot to a TIC/TSC entry
15:00imirkin_: i.e. you bind texture slots to specific TIC entries, and sampler slots to TSC entries
15:01pmoreau: So, the tex instruction references the index of those entries, and the driver uploads the TIC and TSC tables
15:02pmoreau: As well as doing the binding, which is done in nvc0_stage_sampler_states_bind & co
15:02pmoreau: (I assume)
15:04karolherbst: pmoreau: I assume that SPIR-V can't tell you the correct texture/sampler slots anyway so you need to look it up somewhere
15:05pmoreau: karolherbst: I think you are right, and SPIR-V is bindless.
15:05karolherbst: pmoreau: well with NIR I can simply do this in all cases: mkTex(op, target.getEnum(), insn->texture_index, insn->sampler_index, defs, srcs);
15:05karolherbst: insn is the nir instruction
15:06pmoreau: Yes, I had a look at your code
15:06karolherbst: and defs/srcs is whatever has to be filled in depending on the tex op and its arguments
15:06pmoreau: And then tried to see in the spirv_2_nir how those texture_index/sampler_index are computed.
15:06karolherbst: yeah.. for me the TGSI code was a bit too much spread around
15:07karolherbst: so I decided to not do that :)
15:08imirkin_: for pre-kepler, the tex instruction references the bound slots
15:08imirkin_: for kepler+, the tex instruction references an integer that has the TIC and TSC id encoded
15:09imirkin_: (except it can also reference a special constbuf which contains those integers, so an index into that constbuf)
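To make "an integer that has the TIC and TSC id encoded" concrete, here is a small sketch of packing both table indices into one 32-bit handle. The bit split used here (low 20 bits for the TIC id, high 12 bits for the TSC id) is an assumption for illustration and is not stated in the discussion; `packTexHandle`, `handleTic` and `handleTsc` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: TIC id in the low 20 bits, TSC id in the high 12 bits.
// The chat only says that both ids are encoded in a single integer.
constexpr uint32_t TIC_BITS = 20;
constexpr uint32_t TIC_MASK = (1u << TIC_BITS) - 1;
constexpr uint32_t TSC_MAX = (1u << (32 - TIC_BITS)) - 1;

// Combine a TIC (sampler view) index and a TSC (sampler) index.
inline uint32_t packTexHandle(uint32_t ticId, uint32_t tscId) {
    assert(ticId <= TIC_MASK && tscId <= TSC_MAX);
    return (tscId << TIC_BITS) | ticId;
}

// Recover the two table indices from a handle.
inline uint32_t handleTic(uint32_t handle) { return handle & TIC_MASK; }
inline uint32_t handleTsc(uint32_t handle) { return handle >> TIC_BITS; }
```

Under this assumed layout, a shader would either read such a handle from a register (the bindless case) or use an index into a constant buffer holding these handles.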
15:09karolherbst: imirkin_: but this gets all handled inside gallium/codegen, right?
15:09pmoreau: Ah, the 0x1fff from `TXQ.B.T R0, R0, TEX_HEADER_DIMENSION, 0x1fff, 0x1;`, I guess?
15:09imirkin_: so .B = bindless
15:10imirkin_: it comes from R1 or R2 or something
15:10imirkin_: (or maybe even R0 for the txq case)
15:11imirkin_: without the .B, the 0x1fff means "offset 0x1fff into special constbuf that has the TIC/TSC combo references"
15:11pmoreau: Do you know what .T, .P and .NODEP are? I have seen those in the binary generated by the blob. https://hastebin.com/uhureniqar.swift
15:12pmoreau: Okay, good to know
15:20imirkin_: NODEP == liveonly
15:20imirkin_: i think
15:20imirkin_: T and P are confusing. see how the emitter sets them.
15:21pmoreau: liveonly == only execute on live pixels of a quad (optimization), so similar to it being predicated?
15:22karolherbst: pmoreau: ignore it ;) I tried to figure it out and ran into all kind of issues
15:23karolherbst: pmoreau: https://github.com/karolherbst/mesa/commit/20d79f1b982f4e4f664c377199f700845a12d8ba
15:28pmoreau: Ah, found it:
15:28pmoreau: if (isNextIndependentTex(i))
15:28pmoreau: code |= 0x080; // t mode
15:28pmoreau: code |= 0x100; // p mode
15:29pmoreau: So I do not need to care about that flag, nice :-)
15:31pmoreau: karolherbst: Thanks for the pointer. It seems the blob applies that flag to more than just fragment shaders (if liveonly == NODEP).
15:31karolherbst: pmoreau: yeah I know
15:31karolherbst: but I was fighting with some freezes due to this opt
15:31karolherbst: but I think that NODEP may be something else
15:31karolherbst: or maybe not
15:31pmoreau: 12.5% improvement in furmark, not too bad! :-)
15:32karolherbst: actually faster than nvidia
15:32karolherbst: at some point I want to get this right, because it really makes a difference in some applications
15:32karolherbst: especially memory limited workloads
15:32karolherbst: I think
15:32karolherbst: not quite sure
15:33pmoreau: So looking at spirv_to_nir, it does not set insn->texture_index nor insn->sampler_index, but instead sets insn->texture and insn->sampler, which I would guess is the bindless approach?
15:35karolherbst: not quite sure
15:35karolherbst: I think this is just meta information
15:35karolherbst: and it gets processed later
15:36karolherbst: or maybe it is
15:36karolherbst: "If this is null, use texture_index instead." :)
17:49martestan: is bejesus present here?
18:07martestan: bejesus: i heard one of the real trolls on osdev is making claims that a quasimodo like this can manage to live family life? bejesus, are you bragging with this too? Bejesus or are you willing to answer to some programming questions?
18:09martestan: i was actually thinking about the lifetime of warps, in different scenarios, i.e multipass sso and whatnot, do you have clearity on this subject?
18:25Guest171: bejesus: tell me if a clueless quasi like you, do you brag also that you have another clueless one to put your dick into? or is it that you do some programming instead of trolling also sometimes?
18:32BootI386: Peace and chocolate
19:40annadane: i have chocolate. it is quite good
19:41karolherbst: annadane: unfair
19:41karolherbst: :( nothing appears here
19:44karolherbst: but still thanks for trying
21:18pmoreau: Expected: 0.184314 0.666667 0.768627 0.592157
21:18pmoreau: Actual: 0.674510 0.039216 0.498039 0.549020
21:18pmoreau: “Actual” is no longer “0 0 0 0” :-D
21:18pmoreau: I’ll blame floating point precision and ship it! :-p
21:19pmoreau: I guess I should worry a bit about “linked tsc: 0”
21:20karolherbst: I am wondering if you could just pipe that through spir-v to nir... or how many missing bits we would detect here
21:21pmoreau: That might work, haven’t really looked into it.
21:22karolherbst: yeah :)
21:22karolherbst: then you would be able to compare
21:22karolherbst: looking at what TGSI does helped me quite a lot
21:31imirkin_: pmoreau: ok, so linked tsc means ...
21:32imirkin_: "ignore sampler id. always use the tic id as the tsc id"
21:32imirkin_: so that means that the tsc/tic lists have to be grown/set in tandem
21:32imirkin_: this maps well to GL but not to DX (or, i imagine, vulkan)
21:33pmoreau: How do you grow it in tandem? By creating simultaneously a sampler view and a sampler?
21:33imirkin_: by having one object that represents both
21:33imirkin_: it doesn't map well to gallium
21:34imirkin_: we always set linked tsc = 0 :)
21:34pmoreau: Ah, so when in OpenGL you do not create a separate sampler object but rely on the one created by the driver/default one?
21:34imirkin_: but that's not how gallium works.
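The "linked tsc" rule quoted above ("ignore sampler id, always use the tic id as the tsc id") amounts to a one-line selection. The sketch below only restates that rule; `TexSlot` and `effectiveTscId` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair of indices carried by a texture instruction.
struct TexSlot {
    uint32_t ticId;      // index into the TIC (sampler view) table
    uint32_t samplerId;  // index into the TSC (sampler) table
};

// With linked TSC enabled the sampler id is ignored and the TIC id is
// reused as the TSC id, so both tables must be filled in lockstep.
// With linked TSC disabled the two indices are independent.
inline uint32_t effectiveTscId(const TexSlot &slot, bool linkedTsc) {
    return linkedTsc ? slot.ticId : slot.samplerId;
}
```

Setting linked tsc to 0 corresponds to the second case, where samplers and sampler views are separate objects.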
21:43karolherbst: imirkin_: some optimization potential: https://gist.githubusercontent.com/karolherbst/b1c8b9fcbe5f652ea3a3fe2ac1e6fa99/raw/c795f7ffb9bc13d90c7ee2c7b6655356136c6c40/gistfile1.txt
21:56imirkin_: karolherbst: very weird.
21:56imirkin_: feels like 2 opts fighting or something
21:59karolherbst: maybe yes
22:09martinjok: hello, god is back -- so who wants bejesus to be crucified?
22:37karolherbst: imirkin_: uhh, for a cvt f32 -> f16 can I select the split? there is this weird subOp = c & 1 for TGSI_OPCODE_UP2H
22:37karolherbst: like so that I can skip the & 0xffff and >> 16 operations?
22:39imirkin_: "select the split"
22:39imirkin_: f32 -> f16 is not a split
22:39imirkin_: the subop is for integer conversions
22:40imirkin_: do you mean f16 -> f32?
22:40karolherbst: ohhh right, it is unpack, not pack
22:40karolherbst: well the nir ops are called unpack_half_2x16_split_x and unpack_half_2x16_split_y
22:45imirkin_: after the glsl ir ops presumably
22:46imirkin_: but basically it should grab the high word using whatever means (e.g. a shr), and then convert to f32
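A CPU-side sketch of what the two split ops compute: pick the low or high 16-bit word, then convert that half-precision value to f32. This is plain C++ for illustration, not codegen; `halfBitsToFloat`, `unpackHalfSplitX` and `unpackHalfSplitY` are hypothetical names, and the conversion assumes IEEE 754 binary16 input.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Convert IEEE 754 binary16 bits (1 sign, 5 exponent, 10 mantissa) to float.
inline float halfBitsToFloat(uint16_t h) {
    const int sign = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1f;
    const int mantissa = h & 0x3ff;
    float magnitude;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24.
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 0x1f) {
        // Infinity or NaN.
        magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                             : std::numeric_limits<float>::infinity();
    } else {
        // Normal: (1024 + mantissa) * 2^(exponent - 15 - 10).
        magnitude = std::ldexp(static_cast<float>(mantissa | 0x400),
                               exponent - 25);
    }
    return sign ? -magnitude : magnitude;
}

// Low word: mask with 0xffff, then convert.
inline float unpackHalfSplitX(uint32_t packed) {
    return halfBitsToFloat(static_cast<uint16_t>(packed & 0xffff));
}

// High word: shift right by 16, then convert.
inline float unpackHalfSplitY(uint32_t packed) {
    return halfBitsToFloat(static_cast<uint16_t>(packed >> 16));
}
```

A conversion instruction that can select the source half directly would make the explicit mask and shift unnecessary, which is the shortcut being asked about.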
22:46karolherbst: yeah, it seems much easier to do the f16 -> f32 cvt with the subop
22:51imirkin_: not 100% sure that it's an option, but if it is, go for it
22:52karolherbst: well TGSI does the same
22:52imirkin_: then it's an option ;)
22:52imirkin_: i don't really remember how that all works
22:53imirkin_: i figured it out, implemented it, and promptly forgot
22:53karolherbst: well as long as the test passes it isn't that wrong
22:53karolherbst: now load_per_vertex_output :) last bit missing for full tessellation pass
22:58karolherbst: imirkin_: can it be, that those vertex outputs can only be read once? otherwise I don't really get that vtxBaseValid handling inside Converter::getOutputBase
22:59karolherbst: uhh, that is just some nasty opt in the converter
23:00karolherbst: or maybe not, ...
23:02imirkin_: Value *vtxBase; // base address of vertex in primitive (for TP/GP)
23:02imirkin_: so ....
23:03imirkin_: for a geom shader
23:03imirkin_: you get N vertices on input
23:03imirkin_: for each one you have to do a PFETCH to find its "base" address
23:03imirkin_: to be fed into a VFETCH or whatever
23:03imirkin_: vtxBase caches those base addresses
23:04imirkin_: vtxBaseValid keeps track of which things are valid in there
23:04karolherbst: well right, but getOutputBase simply does some adds if it isn't cached
23:05imirkin_: the real question is ...
23:05imirkin_: why is s 0..4
23:05imirkin_: gimme a sec
23:05imirkin_: s is the argument in the tgsi op
23:05imirkin_: ok, so imagine you have a SINGLE tgsi opcode
23:06imirkin_: which takes an arg like ...
23:06imirkin_: now, this gets split up into 4 logical ops
23:06imirkin_: to fetch all 4 inputs
23:06imirkin_: however you only want to fetch the vertex base *once* since it'll be the same for all 4 things
23:07imirkin_: so getVertexBase keeps track of the vertex base for a particular argument of a particular tgsi op
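The caching idea described above (compute the vertex base once per argument, reuse it for all four components) can be sketched like this. It is a simplified model and not the Converter code: `VertexBaseCache`, `get` and `computeCount` are hypothetical, and the base value is a plain integer instead of an IR value.

```cpp
#include <cassert>
#include <cstdint>

constexpr int MAX_ARGS = 5;  // the chat mentions s in the range 0..4

struct VertexBaseCache {
    uint32_t base[MAX_ARGS] = {};  // cached base address per argument
    uint8_t valid = 0;             // bit s set => base[s] is usable
    int computeCount = 0;          // how often the expensive path ran

    // Return the base for argument `s`, calling `compute` only when no
    // cached value exists (the stand-in for emitting a PFETCH or adds).
    template <typename ComputeFn>
    uint32_t get(int s, ComputeFn compute) {
        assert(s >= 0 && s < MAX_ARGS);
        if (!(valid & (1u << s))) {
            base[s] = compute();
            valid |= static_cast<uint8_t>(1u << s);
            ++computeCount;
        }
        return base[s];
    }

    // Invalidate everything, e.g. when moving on to the next instruction.
    void reset() { valid = 0; }
};
```

Fetching the four components of one argument then calls `get` four times but runs the computation once; after `reset()` the next lookup recomputes.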
23:07pmoreau: I’ll get those images working later; I’d better get back to the clover series.
23:07karolherbst: imirkin_: ahh, I see
23:08karolherbst: but still, in the output case nothing gets fetched
23:08imirkin_: with a tess control output
23:08imirkin_: it also has a base
23:08imirkin_: and you might be READING from the output
23:08imirkin_: so ... same basic idea.
23:09imirkin_: any particular arg will only ever be an input or an output
23:09imirkin_: but the way you calculate that base address to pass to VFETCH is different
23:09karolherbst: ahh, I see
23:10imirkin_: the specific details behind this were not easy to figure out =/
23:11imirkin_: i learned a lot about tess :)
23:12imirkin_: and hopefully you're learning a lot right now too
23:12imirkin_: about all the various things that can exist
23:34karolherbst: pass :)
23:34karolherbst: but I set the hdr field with a fixed value... so I need to figure that out now
23:46imirkin_: just min/max read output tcs slot
23:47imirkin_: (i assume that's what you're talking about?)
23:56karolherbst: I didn't set the oread flag on the outputs
23:57karolherbst: imirkin_: one thing I am wondering about: maybe we could get something like that shader_info thing for TGSI as well to save us the trouble of iterating over all instructions twice