00:40karolherbst: imirkin: okay, so TEXS is Texture Fetch with scalar/non-vec4 source/destinations and I think there isn't really much more to it than that
00:40karolherbst: there is also TLD4S
00:40karolherbst: and TLDS
00:41imirkin: you can make a wild guess what those do :)
00:41imirkin: anyways -- iirc TEXS has 2 dests, which are 2-wide
00:41imirkin: and 2 sources, which i think are just that -- two sources
00:41imirkin: so this only works for basic 2d lookups
00:41karolherbst: the encoding is a bit non descriptive
00:41karolherbst: or well
00:41karolherbst: what envydis and nvdisasm output at least
00:42imirkin: the 2-dest thing threw me off originally
00:42karolherbst: but at least here: "texs 0x0 $r32 $r31 $r29 0x10 t2d rgb"
00:42karolherbst: all regs are scalar ones
00:42karolherbst: but what is this 0x0 and what is this 0x10?
00:42imirkin: 0x10 is the texture id
00:43imirkin: 0x0 i suspect is one of the 2 dst's (i.e. RZ)
00:43karolherbst: okay, that should be easy to verify
00:43karolherbst: 0x1111 as the texture id?
00:43karolherbst: I mean
00:44karolherbst: the max value is 0x1fff
00:44imirkin: same as with TEX...
00:45karolherbst: nvdisasm: TEXS.T RZ, R32, R31, R29, 0x10, 2D, R;
00:45karolherbst: so yeah, RZ
00:46imirkin: right, so it's actually R and not RGB
00:46imirkin: that makes more sense
00:46imirkin: i think that second dst is actually the first one, logically
00:46karolherbst: I guess I will fix the envydis output regarding that rgb stuff
00:46imirkin: so this is saying R32 = texture(x=R31, y=R29).R
00:47karolherbst: and what is that .T thing
00:50karolherbst: valid color combinations: r,g,b,a,rg,ra,ga,ba
00:54imirkin: all are valid
00:54imirkin: rgba is valid too
00:54imirkin: each dest is a pair
00:54karolherbst: it doesn't fit
00:54karolherbst: or well
00:54karolherbst: at least from what envydis is telling me
00:54karolherbst: there is a 0x1c mask for it
00:54karolherbst: so 8 values
00:54karolherbst: but yeah... I saw rgb with nvdisasm
00:57karolherbst: imirkin: it depends on if the second dest is set
00:57imirkin: if the second dest is not set
00:57imirkin: then you can only have 2 components
00:57karolherbst: the above combinations are valid if the second is RZ
00:58karolherbst: rgb, rga, rba, gba, rgba, and invalid 5/6/7 with two dests
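A minimal sketch of the component field described above, in Python. The 0x1c mask leaves 3 bits, so 8 values, and which list applies depends on whether the second dest is RZ. The index order simply follows the order in which the combinations were listed in the channel, which is an assumption, and `texs_components` is a hypothetical helper name, not envydis code.

```python
# Component selections for TEXS, as listed in the discussion.
# ASSUMPTION: field values 0..7 map to these names in the order they
# were listed in the chat; this has not been verified against hardware.
ONE_DEST = ["r", "g", "b", "a", "rg", "ra", "ga", "ba"]            # second dest is RZ
TWO_DEST = ["rgb", "rga", "rba", "gba", "rgba", None, None, None]  # 5/6/7 are invalid


def texs_components(field, second_dest_is_rz):
    """Return the component string for a 3-bit field value, or None if invalid."""
    if not 0 <= field <= 7:
        raise ValueError("component field is 3 bits wide")
    table = ONE_DEST if second_dest_is_rz else TWO_DEST
    return table[field]


if __name__ == "__main__":
    print(texs_components(4, True))   # second dest is RZ
    print(texs_components(4, False))  # both dests in use
```

Under this ordering, the same field value reads as a two-component selection when the second dest is RZ and as a three- or four-component selection otherwise.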
00:59karolherbst: sooo how do we make envydis do the right thing here
00:59karolherbst: two variants?
01:00imirkin: just do it as part of the dests
01:00imirkin: i.e. have one table which handles the r/g/b/a stuff AND the argument
01:00imirkin: er, dest
01:03karolherbst: mhh, makes sense
01:04imirkin: also it's first-match
01:04imirkin: so you could make the zero-second-dest ones come first
01:04imirkin: and match the fact that the zero dest is ... a zero
01:05imirkin: pretty sure that should be safe.
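The first-match idea can be sketched as follows, in Python rather than the C tables envydis actually uses. Everything about the bit layout here is hypothetical (second dest in bits 0..7, component field in bits 8..10), and encoding RZ as register 255 is an assumption. The point is only the ordering: rows that also require the second dest to be RZ come first, so the generic rows act as the fallback.

```python
RZ = 0xFF                 # ASSUMPTION: zero register encoded as index 255
DST2_MASK = 0xFF          # hypothetical: second dest in bits 0..7
COMP_SHIFT = 8            # hypothetical: component field in bits 8..10
COMP_MASK = 0x7 << COMP_SHIFT


def row(comp, name, dst2_is_rz):
    """Build one (mask, value, name) table row."""
    if dst2_is_rz:
        # Match both the component field and a second dest equal to RZ.
        return (COMP_MASK | DST2_MASK, (comp << COMP_SHIFT) | RZ, name)
    # Match only the component field; any second dest is accepted.
    return (COMP_MASK, comp << COMP_SHIFT, name)


# RZ-specific rows first, generic two-dest rows afterwards.
TABLE = (
    [row(i, n, True) for i, n in enumerate(["r", "g", "b", "a", "rg", "ra", "ga", "ba"])]
    + [row(i, n, False) for i, n in enumerate(["rgb", "rga", "rba", "gba", "rgba"])]
)


def first_match(word):
    """Return the name of the first row whose masked bits match, else None."""
    for mask, value, name in TABLE:
        if (word & mask) == value:
            return name
    return None
```

If the generic rows came first, a word with an RZ second dest would be reported with the two-dest name, which is why the order matters in a first-match table.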
01:05karolherbst: this is my current plan
01:10karolherbst: texs 0x0 $r32 $r31 $r13 0x10 t2d rg
01:10karolherbst: texs $r239 $r32 $r31 $r13 0x10 t2d rgba
01:11karolherbst: imirkin: https://github.com/karolherbst/envytools/commit/08212d9341c66ca4d48b48947df809cc8769df0b
01:14karolherbst: that .T thing seems to be always a .T
01:14karolherbst: okay, so this makes sense then after all
01:18karolherbst: mhh there are weird flags though
01:18karolherbst: or more like, types
01:22karolherbst: imirkin: any idea what ll, lz, dc, aoffi, mz could mean for texs and tlds?
01:23karolherbst: ohh we have those for tex as well
01:53imirkin: karolherbst: ll = level lod, lz = level zero, dc = depth compare, aoffi = offset, mz ... dunno that one. ms maybe? that's ms :)
02:03nyef: Maybe it's for rendering via a DOS executable file?
02:27imirkin: karolherbst: there's also "la" which is level auto
02:28karolherbst: imirkin: I see. Well I was focusing on the ones we got with texs and tlds
02:28imirkin: of course if you can only have 2 args, i dunno how useful any of those are ... each one of those takes an extra arg, so ... with 2 args ...
02:28imirkin: not a lot of room for coordinates
02:29imirkin: only useful one in there would be lz
02:29karolherbst: it goes only up to 3d or 2darray
02:29imirkin: how would that work?
02:29imirkin: do the args become doubles in some cases?
02:29karolherbst: I don't see any other way
02:29imirkin: like i said ... needs some RE :)
02:30karolherbst: but I expect that you've got two vec2 srcs effectively
02:30karolherbst: there is also cube
02:31imirkin: 99% sure i've seen texs 2d with the x/y args in separate args
02:31karolherbst: that's right
02:31karolherbst: but for 3d it could be xy in one and z in the other
02:31karolherbst: or xz in one and the other is y
02:33imirkin: could be.
02:33imirkin: or could just not be usable
02:33karolherbst: anyway, should be fairly easy to check
02:38karolherbst: imirkin: TEXS.NODEP.P RZ, R0, R4, R3, 0x58, 3D, R;
02:39karolherbst: MOV R4, c[0x0][0x150];
02:39karolherbst: MOV R5, c[0x0][0x154];
02:39karolherbst: MOV R3, c[0x0][0x158];
02:39karolherbst: before the TEXS
02:39karolherbst: xy in src0
02:39karolherbst: z in src1
02:40imirkin: check if i was right about how it worked for 2d
02:40imirkin: i could have been confused
02:40karolherbst: I have the pixmark_piano shader from nvidia
02:40karolherbst: "00006448: f2272927 d822010f texs nodep 0x0 $r39 $r41 $r34 0x10 t2d rgb"
02:42karolherbst: anyway, with OpenCL code doing 2d images, I also get something like that:
02:42karolherbst: TEXS.NODEP.P R6, R4, R2, R3, 0x58, 2D, RGBA;
02:42karolherbst: wondering why I get .P though and not .T
02:43imirkin: you can look at the isDependentSomething() logic for selecting T vs P
02:43imirkin: tbh i don't quite remember what those mean
02:45karolherbst: it is in the sched stuff
02:46imirkin: used to be an explicit flag
02:46imirkin: dunno about maxwell
02:47karolherbst: it is
02:50karolherbst: for texs a 0 sched opcode is even legal
02:50karolherbst: with -1
02:51karolherbst: 0x20000 = T, 0x40000 = P
02:51karolherbst: 6 = INVALIDPHASE3
02:52karolherbst: that is the ru field
02:59karolherbst: imirkin: array_2d: src0: index, src1: xy
02:59karolherbst: actually no
02:59karolherbst: src0: index+x, src1: y
03:05karolherbst: tlds with 2dms: TLDS.LZ.MS.NODEP.P R6, R4, R2, R8, 0x58, 2D, RGBA;
03:05karolherbst: src0: x+y src1: sample
03:10karolherbst: no idea why array2d is so weird
03:11karolherbst: the fuck
03:11karolherbst: imirkin: array1d: src0: index+x, no src1
03:12karolherbst: and a weirdo 0xf value
03:12karolherbst: ohh wait
03:12karolherbst: it is TEX, not TEXS
03:12karolherbst: okay, no array1d then
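The source layouts observed so far can be collected in a small table. This is a Python sketch with hypothetical names (`LAYOUT`, `pack_sources`), not codegen code. A src holding two names means both values travel in that source. For 3D that was the register pair R4/R5. How index+x is packed for 2D arrays is not stated in the log, so the order given there is an assumption.

```python
# Observed TEXS/TLDS source packing, taken from the disassembly in the log.
# Each entry is (names carried by src0, names carried by src1).
# ASSUMPTION: the order of names inside a source follows the chat wording.
LAYOUT = {
    ("texs", "2d"): (("x",), ("y",)),
    ("texs", "3d"): (("x", "y"), ("z",)),
    ("texs", "array_2d"): (("index", "x"), ("y",)),
    ("tlds", "2d_ms"): (("x", "y"), ("sample",)),
}


def pack_sources(op, target, values):
    """Split a dict of named coordinates into the two source groups."""
    src0_names, src1_names = LAYOUT[(op, target)]
    src0 = [values[name] for name in src0_names]
    src1 = [values[name] for name in src1_names]
    return src0, src1


if __name__ == "__main__":
    print(pack_sources("texs", "3d", {"x": "R4", "y": "R5", "z": "R3"}))
```

A lookup for a combination that is not in the table raises `KeyError`, which matches the state of the investigation: array1d, for instance, turned out not to exist for TEXS.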
13:35rhyskidd: put up some low-level documentation to envytools
13:36rhyskidd: RSpliet: you might be interested in the FECS work, given your recent publication instrumenting FECS firmware on Kepler
13:43mwk: rhyskidd: these commands are a property of the running firmware, not hardware
13:43mwk: ideally it'd live in a separate xml file
13:44mwk: but since we don't really have one at the moment, just add a big comment above it saying that's just nvidia firmware's enum
13:54rhyskidd: the FECS ones?
14:00rhyskidd: updated with the comment
14:01JayFoxRox: rhyskidd = Echelon9 ?
14:04rhyskidd: JayFoxRox: think that we both worked on Cxbx and XQEMU previously?
14:04JayFoxRox: I always have a hard time putting your name to anything. but your name keeps popping up :P
14:04JayFoxRox: I know that you have some relation with Cxbx [the non -reloaded portion]. I did not know you worked on XQEMU in the past tbh
14:05JayFoxRox: I personally never worked on any upstream Cxbx, but I did work with a private fork of Cxbx and eventually other Cxbx inspired projects; then did a lot of work on XQEMU and now the whole XboxDev ecosystem
14:07JayFoxRox: [I can't seem to find your name in the espes/xqemu or xqemu/xqemu contributor list either?]
16:24karolherbst: imirkin: do you think it would work to add a Target::scalarTEXLegal(Instruction*) and have the emitter and RA just assume the other will do the right thing? My current plan is to check that when lowering the TEX instructions and build the TEXS layout there, so RA could just skip processing the tex completely. At some point we have to reorganize the sources, and I would rather do it when lowering than inside RA
16:26karolherbst: or maybe we even add a tex.scalar field and if this is set while lowering, RA and the emitter know what to do
16:29karolherbst: or we check that inside RA and set a tex.scalar bit there
16:29karolherbst: but it seems like we might end up reordering stuff again
16:31karolherbst: mhh, maybe I go for the bit field, then we could adjust the name when printing
16:40imirkin: karolherbst: up to you.
16:56karolherbst: okay, then I'll just see what works well enough
16:56karolherbst: imirkin: maxwell is the first gen with those scalar texture ops, right?
17:00karolherbst: mhh, volta seems to have a weird mix
17:00karolherbst: 2 dests, but vec4 coords source
17:05karolherbst: imirkin: any tex state you want to have in the prints? I will probably add some to ease debugging, but maybe there are some which you always miss seeing.
17:53karolherbst: imirkin: mhhh, we need DCE to get the proper texture mask :/
17:55imirkin: that's why doing this in the texConstraintFoo thing is a good idea :p
17:56imirkin: it already has logic to partition the various args into 2 args
17:56imirkin: the split isn't always at 4
17:59karolherbst: yeah, seems like it
19:26karolherbst: imirkin: changes for TEX.2D: https://gist.githubusercontent.com/karolherbst/de8cd9e6752fb57125a3e5a3a9d46415/raw/d6014ee25f2d112f98c785cba6dc544f8cd3d560/gistfile1.txt
19:35karolherbst: oh wow, but that's mainly because our RA isn't really that smart: https://gist.githubusercontent.com/karolherbst/f5cd18ef311b0b52064450f3d9d76860/raw/47412c525eaae641690c0c90180cb1b943f3dcf2/gistfile1.txt
20:11imirkin: karolherbst: not a lot of ways around it ... but why is it taking a coalesced 64-bit arg?
20:11imirkin: shouldn't it be 2 separate args?
20:11imirkin: that would solve your RA woes
20:11karolherbst: no, that is the old way
20:12imirkin: well then yeah. that's the point of TEXS :)
20:12karolherbst: with my stuff that becomes:
20:12karolherbst: texs 2D $r8 $s0 r f32 $r4 $r0 $r2 (8)
20:12karolherbst: texs 2D $r9 $s0 ra f32 $r0d $r0 $r2 (8)
20:12karolherbst: (the texs is simply print() stuff, op is OP_TEX)
20:13karolherbst: should be less confusing, otherwise you don't really know offhand how to interpret the sources/defs
20:13karolherbst: or maybe it would be obvious in most cases. Don't know yet
20:13imirkin: seems nice.
20:33karolherbst: mhh with all possible OP_TEX and OP_TXL combinations I get insn: -1.27% gprs: -0.70%... oh well, I guess tld(4) with a similar impact
20:34karolherbst: *will have
21:07rhyskidd: pendingchaos,karolherbst: some nice opt improvements there in the backlog
21:16karolherbst: rhyskidd: yeah. That TEXS stuff is quite nice. I hope I get to -2% insn, -1% gprs in the end
21:19karolherbst: and with the XMAD stuff we should be quite up to speed on maxwell. Still need to review that one though
21:33rhyskidd: anyone noticed there appears to be Falcon v6 on Pascal (for some of the microprocessors)? envytools docs are silent on anything above v5
21:33rhyskidd: time for some additional docs!
21:34mwk: yeah, there is
21:34mwk: so what
21:34mwk: we barely have a clue about v5
21:34rhyskidd: PVDEC, PSEC2 and PGRAPH.CTXCTL are v6 on Pascal
21:38rhyskidd: sure. well, at least not too hard for me to write up that the v6 is at least a thing
21:38mwk: go for it then
21:57karolherbst: mhh, there is quite a big incentive to write a vulkan driver for nouveau. With that vkd3d, dxvk and dxup stuff, running dx10-dx12 applications is like super fast
21:58HdkR: I fully back that idea ;)
21:59pendingchaos: there are also Vulkan-only games I think
21:59pendingchaos: (though I don't know how well they would run with Vulkan-compatible hardware)
22:02karolherbst: kind of
22:02karolherbst: Rise of the Tomb Raider is vulkan only
22:02karolherbst: and those games are usually running quite fast with Nouveau
22:02karolherbst: but yeah... might be different with a new vulkan driver
22:03karolherbst: anyway, we can have vulkan on kepler hardware, so this is a win
22:03karolherbst: if nobody starts with that in like 1 or 2 months, I just go ahead and work on that :p
22:04rhyskidd: there's other Vulkan-only titles like F1 2017, Thrones of Britannia and perhaps one or two in the Feral pipeline
22:06nyef: Can we have vulkan on tesla hardware, or is that too much to ask?
22:11pendingchaos: I think it's too much to ask
22:11pendingchaos: it doesn't look like it supports the required features?
22:13pendingchaos: wikipedia, opengl.gpuinfo.org and mesamatrix.net all seem to show it at OpenGL 3.3
22:13karolherbst: I don't even know if fermi can do it for real
22:13karolherbst: with the nvidia driver it seems to be kepler only
22:13karolherbst: kepler+ only
22:13pendingchaos: nvidia had a Vulkan driver for Fermi
22:13pendingchaos: but they removed support for some reason
22:14karolherbst: pendingchaos: mhh
22:14karolherbst: "February 23rd, Windows 356.43, Linux 355.00.28"
22:15karolherbst: but I think it was a mistake actually
22:24pendingchaos: yeah, apparently it never worked
22:44karolherbst: pendingchaos: yeah, although the ISA is quite the same, they had to change a few things
22:45karolherbst: and I don't think fermi even supports bindless textures
22:45rhyskidd: a series of Falcon cleanups to envytools' docs are on the PR list for review by any takers :)
22:49pendingchaos: karolherbst: what's the status of https://github.com/karolherbst/mesa/commits/opt_codegen_slcts_v2 btw?
22:50karolherbst: unreviewed I guess
22:50karolherbst: and untested
22:51karolherbst: I think the patches are correct though
22:53karolherbst: but yeah, those patches should be merged, because they kind of benefit all chipsets
22:54karolherbst: and they don't make things worse
22:57pendingchaos: HdkR: fcmp/icmp
22:57pendingchaos: (on Maxwell/Pascal)
22:57karolherbst: HdkR: (a compare 0) ? b : c ;)
22:58HdkR: Ah, comparison + conditional selection, I see
23:16mooch2: rhyskidd, i just reviewed both of them :)
23:19imirkin: nyef: nothing that i'm aware of should preclude vulkan from running on tesla hardware.
23:19imirkin: it will take a bunch of work, since compute hasn't been seriously brought up on tesla
23:20skeggsb_: memory management at least would be tricky/impossible to do efficiently, and not sure how badly the lack of bindless will go
23:20imirkin: skeggsb_: doesn't seem to stop the intel folk
23:21imirkin: skeggsb_: and i dunno - memory management may not be so hard, since vulkan's pretty explicit about stuff
23:21imirkin: skeggsb_: iirc the issues center around moving buffers between vram and gart?
23:22skeggsb_: more the lack of dual page tables, and not being able to mix small/large pages in the same massive chunk of virtual address-space
23:23rhyskidd: mooch2: thanks
23:23skeggsb_: the explicitness is exactly what makes that trickier :P
23:23skeggsb_: not a problem on fermi and up though
23:28imirkin: skeggsb_: hmmm... i guess i don't have a perfect view of the constraints. seems like it should be manageable -- it's the user's problem, and the kernel just rejects illegal things
23:28imirkin: skeggsb_: what's the dual page table thing? you just mean being able to flip in and out between contexts?
23:28imirkin: via a single update vs rewriting the pde each time
23:29skeggsb_: there's multiple page tables for different page sizes covering the same area of virtual address-space on fermi, so you can mix small/large pages
23:30skeggsb_: on tesla, that has to be decided in advance, and can't mix within a large (512MiB?) area
23:31skeggsb_: so, with vulkan, you'd potentially end up with memory objects that you can't bind to certain images/buffers
23:31skeggsb_: of course there are probably ways we can work around it, just not sure if it'd even be worth the trouble
23:32imirkin: well, 512MB isn't that big -- that's just for VA space, right?
23:33imirkin: are there restrictions, like some stuff HAS to be in large pages (or small ones)?
23:33skeggsb_: if you want compression/zbc/etc, you need to use large
23:33imirkin: i think when you allocate, you have to say what all it can be used for
23:34imirkin: i'd have to check details
23:34skeggsb_: yeah, i think we *could* make it work, it'll just be tricky
23:38imirkin: so it looks like at vkAllocateMemory time, you have a size (duh), as well as a memory type.
23:38karolherbst: imirkin: huh... what would a TLD.LL look like inside codegen? OP_TXF with !levelZero?
23:38imirkin: karolherbst: yes.
23:38imirkin: except i think the .LL is implied
23:38imirkin: so it wouldn't be listed
23:38karolherbst: well, not with TLDS
23:38skeggsb_: not on volta either :P
23:39karolherbst: in TLDS there are either LL or LZ variants
23:39imirkin: well ... what level would it fetch from?
23:40karolherbst: that TLD4S.aoffi stuff will be fun...
23:40karolherbst: anyway. TEXS is complete as far as I can tell
23:41karolherbst: _maybe_ we can do an indirect texture reference
23:41karolherbst: as there is still one source slot left
23:59karolherbst: imirkin: mhh, for TXF we do texi->tex.levelZero = ms;, but there are LL and LZ variants at least from nvdisasm