09:40 martty: hi, i was wondering about the following: nvidia recommends using combined sampler-image descriptors (in their vulkan dos and don'ts doc)
09:40 martty: does someone know why and how much of a difference that makes?
09:42 HdkR: It makes a difference. How much of one is ¯\_(ツ)_/¯
09:43 HdkR: And you can make the assumption that it makes less of an impact the newer the generation of GPU you test
09:46 martty: i was hoping there was some insight on what happens differently when using a combined descriptor
09:54 HdkR: something about TIC and TSC caches
09:54 HdkR: So you know that it just has some special caching which makes it faster :P
12:28 karolherbst: marex: fyi, the nv50 suspend fix got queued for 5.14
12:28 karolherbst: ohh
12:29 karolherbst: wait
12:29 karolherbst: you got the email as well :D
14:15 marex: karolherbst: the suspend-takes-forever fix ?
14:16 karolherbst: ehh, no, the suspending-is-completely-broken one
14:16 marex: ahh
14:16 karolherbst: you should have gotten the email from greg as well
14:16 marex: karolherbst: I am about to get to that part
14:18 karolherbst: well, there is nothing to do from our end, it's just a fyi email :D
14:19 marex: karolherbst: the event stuff might be nice to fix too :)
14:20 karolherbst: yeah...
14:22 marex: karolherbst: oh hey, there is also email about this hitting stable, lovely
14:23 karolherbst: that's what I meant
18:19 imirkin: martty: HdkR: the hardware supports two modes -- "linked" and "unlinked" modes. but a driver kinda has to pick one and work with it, i doubt it's practical to switch at runtime. so if the vk driver picks the "linked" mode, then there's a 1:1 relationship between view and sampler. and it would have to jump through a lot of hoops to let you pick an arbitrary sampler in that case. dunno if the vk driver picks linked mode though.
18:21 martty: hm, interesting
18:21 imirkin: GL tends to run in linked, and DX tends to run in unlinked
18:22 imirkin: (and since gallium is kinda DX, nouveau also runs in unlinked)
18:23 karolherbst: imirkin: is there actually a practical advantage over using linked?
18:23 imirkin: there are additional complications on e.g. nv50 where you can directly address 128 views but only 16 samplers. so for GL, running in linked mode is the only way to have > 16 textures.
18:23 martty: do the modes require different instructions or just different setup?
18:23 imirkin: martty: different setup
18:23 imirkin: in the linked mode, the "sampler" index is just ignored, and it uses the view index to index into the TSC
18:24 imirkin: (fermi as well btw, but it can address 32 samplers)
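The linked/unlinked distinction above can be sketched as a small lookup model. This is only an illustration, not driver code: `resolve`, `NUM_VIEWS`, and `NUM_SAMPLERS` are hypothetical names, the limits are the nv50 figures quoted in the discussion, and it assumes the TIC is the view table and the TSC is the sampler table.

```python
# Hypothetical model of the two texture addressing modes described above.
# Limits are the nv50 figures from the discussion: 128 views, 16 samplers.
NUM_VIEWS = 128
NUM_SAMPLERS = 16


def resolve(view_idx, sampler_idx, linked):
    """Return the (TIC index, TSC index) pair that would be used for a lookup."""
    if not 0 <= view_idx < NUM_VIEWS:
        raise ValueError("view index out of range")
    if linked:
        # Linked mode: the sampler index is ignored and the view index
        # is used to index the sampler table (TSC) as well.
        return view_idx, view_idx
    if not 0 <= sampler_idx < NUM_SAMPLERS:
        # Unlinked mode: only NUM_SAMPLERS samplers are directly addressable.
        raise ValueError("sampler index out of range")
    return view_idx, sampler_idx
```

In this model, linked mode can reach view 100 with its own sampler state, while unlinked mode is capped at 16 distinct samplers, which is the ">16 textures for GL" point made above.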
18:31 martty: do you know if this changed for kepler? IIRC fermi does not support vulkan
18:31 imirkin: sort of
18:31 imirkin: kepler is bindless.
18:32 imirkin: (even nv50 supports vulkan, nvidia just chose not to make drivers for those)
18:32 imirkin: basically kepler+ operates on texture "handles", whose upper bits are an explicit sampler index reference
18:32 imirkin: however in linked mode, same deal as before, sampler index is ignored
18:38 martty: hmm, so on kepler+ combining image and sampler is just a bitor?
18:38 martty: (in unlinked)
18:47 imirkin: martty: yep
18:48 imirkin: but also a lot more bits, so you can index much deeper into the table
18:48 imirkin: (since you don't have bindings, you sorta have to)
18:48 imirkin: modifying the table requires flushes, so you don't want just the active set in the table
18:50 martty: then I don't see why NV would recommend a pre-bitored handle (i assume that is what it is then) over doing it in the shader
18:50 imirkin: if for whatever reason vulkan runs in linked mode
18:50 imirkin: then you can't index samplers directly
18:54 martty: i suppose if the handles don't both contribute 32 bits, then you are wasting some bits loading both
18:54 imirkin: the whole handle is 32 bits :)
18:55 imirkin: but you don't get more "index bits" in linked mode afaik
18:55 imirkin: it's just that the sampler bits are ignored, and it uses the view bits for both
18:56 martty: oh i thought it would be 64 :), but i guess the same still applies (but it is just one extra load)
18:56 imirkin: i guess 20 bits of view and 12 of sampler
18:56 karolherbst: imirkin: ehh,, should be correct..
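A minimal sketch of the handle layout discussed above, assuming the guessed 20/12 split with the sampler index in the upper bits. The exact widths are not confirmed in the conversation, and `pack_handle` and `unpack_handle` are hypothetical helper names used only for illustration.

```python
# Assumed layout of a 32-bit texture handle, per the guess in the discussion:
# low 20 bits = view index, upper 12 bits = sampler index.
VIEW_BITS = 20
SAMPLER_BITS = 12
VIEW_MASK = (1 << VIEW_BITS) - 1
SAMPLER_MASK = (1 << SAMPLER_BITS) - 1


def pack_handle(view_idx, sampler_idx):
    """Combine a view index and a sampler index with a shift and a bitwise OR."""
    if view_idx & ~VIEW_MASK or sampler_idx & ~SAMPLER_MASK:
        raise ValueError("index does not fit in its bit field")
    return (sampler_idx << VIEW_BITS) | view_idx


def unpack_handle(handle, linked=False):
    """Split a handle into (view, sampler) indices.

    In linked mode the sampler bits are ignored and the view bits
    are used for both lookups.
    """
    view = handle & VIEW_MASK
    sampler = (handle >> VIEW_BITS) & SAMPLER_MASK
    if linked:
        sampler = view
    return view, sampler
```

Under these assumptions, `pack_handle(5, 3)` gives `(3 << 20) | 5`, and unpacking the same handle in linked mode yields view 5 for both lookups.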
18:58 karolherbst: imirkin: btw.. is there something really annoying we have to fix in this regard?
18:58 karolherbst: I remember the bindless stuff being annoying, but besides that?
18:58 imirkin: karolherbst: switching to linked mode?
18:58 imirkin: bindless isn't any more annoying than binded...
18:59 karolherbst: generally I mean
18:59 imirkin: nothing's broken to begin with...
18:59 karolherbst: or why would you want to switch to linked mode?
18:59 karolherbst: ohh.. for nv50 and/or fermi
18:59 imirkin: well, to have >16 or >32 textures on nv50/fermi
18:59 imirkin: (for GL)
18:59 karolherbst: is it useful?
18:59 imirkin: if the application expects that many textures, then yes, very useful
19:00 imirkin: otherwise totally useless ;)
19:00 karolherbst: mhh.. I think only wine uses more than 32 actually
19:00 imirkin: but an application ported from DX to GL (or DX emulated on GL) might use all 128 views
19:00 karolherbst: yeah..
19:00 imirkin: DX10 specifies 128 views, 16 samplers
19:00 imirkin: DX11 is 32 samplers
19:02 karolherbst: imirkin: soo.. what I know is, that there isn't just "this is full bindless" in hw, but some weird combinations. like you can specify the texture and the ptr just points to the sampler
19:03 imirkin: isn't that what i said?
19:03 karolherbst: I meant for bindless textures
19:03 imirkin: no such thing
19:03 imirkin: it's all bindless.
19:03 karolherbst: well, on hw it is
19:03 imirkin: on hw it's all bindless.
19:03 karolherbst: *in
19:03 imirkin: the bindings are just pre-baked handle combos retrieved from TEX_CB
19:04 karolherbst: right, but you don't have to go full bindless, that's what I mean
19:04 karolherbst: the ptr could just specify the sampler and the texture is retrieved from the constant
19:04 imirkin: sure. you can assemble it in the shader.
19:04 karolherbst: nope
19:04 imirkin: but i don't see how that's any different.
19:04 karolherbst: I meant the instruction supports it
19:05 imirkin: ok, that's news to me
19:05 karolherbst: hence me mentioning it
19:05 karolherbst: and if it could help with anything
19:05 imirkin: afaik the instr only supports retrieving the full value from TEX_CB
19:05 imirkin: or not retrieving anything from TEX_CB
19:05 imirkin: i.e. TEX vs TEX.B
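The TEX vs TEX.B distinction as described here can be modelled roughly as follows. This is a toy sketch under the stated understanding (bound textures are pre-baked handles stored in TEX_CB); `tex`, `tex_b`, and the list standing in for the constant buffer are hypothetical, and the handle values are arbitrary.

```python
def tex_b(handle):
    """Model of TEX.B: the handle is supplied directly, nothing is read from TEX_CB."""
    return ("sample", handle)


def tex(tex_cb, slot):
    """Model of TEX: the full handle is fetched from TEX_CB at the bound slot."""
    return tex_b(tex_cb[slot])


# Toy constant buffer holding two pre-baked handles (arbitrary values).
tex_cb = [0x00100002, 0x00300005]
```

In this model `tex(tex_cb, 1)` and `tex_b(tex_cb[1])` are the same operation, which is the sense in which a binding is just a handle that was stored ahead of time.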
19:05 karolherbst: mhhh, it could be that not all gens support this
19:05 imirkin: perhaps
19:07 karolherbst: but if you do bindless in GL, you have both bindless or something else? never looked at it really from an API perspective
19:07 imirkin: sure
19:07 imirkin: you still have bindings
19:07 imirkin: but you also have the option of creating API-level bindless handles, and using them in shaders
19:07 karolherbst: right
19:07 karolherbst: but that handle points to both or just the texture/sampler?
19:07 imirkin: (you have to declare which ones you're going to use)
19:08 imirkin: both
19:08 karolherbst: ahh, okay
19:08 imirkin: nouveau does it slightly "wrong"
19:08 imirkin: (or perhaps "not flexible enough")
19:08 karolherbst: then I wonder why there are those mixed modes :/
19:08 imirkin: coz for GL, it's a lot more natural to keep them linked 1:1
19:09 karolherbst: so what might be the benefit of loading the texture from the instruction encoding and the sampler as a passed in handle
19:09 imirkin: well, #1 it's uniform
19:09 karolherbst: sure
19:10 imirkin: realistically, i'm not entirely sure.
19:10 karolherbst: I mean.. that could have some benefits, but I mean, in what cases does it make sense to have this full bindless handle for samplers, but not textures
19:10 imirkin: like what's the difference between TEX and MOV c[] + TEX.B? dunno.
19:10 karolherbst: probably none
19:10 imirkin: at least not offhand.
19:11 imirkin: no, i'm sure there is some.
19:11 karolherbst: in Volta+ you can even specify the cb buffer now
19:11 imirkin: in the ISA?
19:11 karolherbst: yes
19:11 imirkin: neat
19:12 karolherbst: I guess before that it's stored in the context somewhere
19:12 imirkin: yes
19:12 imirkin: TEX_CB_INDEX
19:12 karolherbst: yeah.. I think we don't set it anymore for volta..
19:12 imirkin: ah ok
19:13 imirkin: gotta come from somewhere :)
19:13 karolherbst: or we still do.. dunno
19:13 karolherbst: the hw doesn't care
19:13 karolherbst: but for emitTEX we do emitField(54, 5, prog->driver->io.auxCBSlot);
19:13 karolherbst: so..
19:14 imirkin: heh ok
19:15 karolherbst: ohh wait.. we use the mixed bindless forms for indirects
19:16 imirkin: for indirects, we just load the handle from the tex cb directly
19:16 imirkin: at least that's what we used to do.
19:17 karolherbst: yeah... I think that changed with volta
19:17 karolherbst: or maybe not?
19:17 imirkin: ah ok. i haven't been following volta at all.
19:17 karolherbst: all this lowering is a bit complicated
19:18 karolherbst: we do check for "insn->tex.rIndirectSrc" though and use bindless once it's 0+
19:18 imirkin: oh yeah, the lowering attempts to deal with indirection in R and S
19:18 imirkin: which would be a thing on e.g. fermi
19:18 imirkin: but on kepler that concept is kinda shot
19:18 imirkin: and since it never happens in practice, hard to care.
19:19 karolherbst: yeah... mhh
19:21 karolherbst: I guess this mixed form could be used for indirects somehow
19:21 karolherbst: but we still do our own thing for now