02:50 gnurou: RSpliet: afraid I don't, sorry :(
15:04 RSpliet: gnurou: shame! Thanks
16:02 gdk: hello, does anyone know the size of the gpr registers on the gm107 gpu? I assume it's 64 bits
16:08 imirkin_: 32-bits on all nvidia gens
16:08 imirkin_: some ops will take a pair of regs to operate as a 64-bit quantity (like the various double ops)
16:08 imirkin_: [contiguous regs]
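A minimal sketch of the register-pair idea described above: two contiguous 32-bit registers are treated as one 64-bit value. The helper names are hypothetical, and the sketch assumes the lower-numbered register holds the low 32 bits, which the chat does not state.

```python
import struct

MASK32 = 0xFFFFFFFF

def pair_to_double(lo, hi):
    """Combine two 32-bit register values into one float64.

    Assumption (not stated in the chat): the lower-numbered register
    of the pair holds the low 32 bits.
    """
    bits = ((hi & MASK32) << 32) | (lo & MASK32)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def double_to_pair(value):
    """Split a float64 into (lo, hi) 32-bit halves, the inverse of pair_to_double."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & MASK32, bits >> 32
```

For example, `double_to_pair(1.0)` gives the two 32-bit words that a double op would read from a pair such as `$r0`/`$r1`.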
16:15 gdk: ok, thanks
16:15 gdk: also, about the constant buffer offsets
16:16 gdk: it seems to hold the offset in bytes * 16, is that correct?
16:16 gdk: for what are the lower 4 bits used?
16:18 imirkin_: const buffer offsets are specified in multiples of the access width iirc
16:18 imirkin_: so if you're accessing it as 32-bit, then it's in multiples of 4 bytes
16:18 imirkin_: etc
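A small sketch of the scaling rule as recalled above ("iirc"), where the encoded offset counts units of the access width. The function name is hypothetical and the rule itself is taken from the chat, not checked against hardware.

```python
def cb_byte_offset(encoded_offset, access_bits):
    """Convert an encoded const buffer offset to a byte offset.

    Assumes the offset is stored in multiples of the access width,
    as recalled in the chat (unverified).
    """
    if access_bits not in (8, 16, 32, 64, 128):
        raise ValueError("unsupported access width")
    return encoded_offset * (access_bits // 8)
```

Under this assumption, a 32-bit access with encoded offset 0x38 would address byte 0xe0.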
16:19 imirkin_: why do you ask btw?
16:19 gdk: I've been trying to understand maxwell shaders
16:19 imirkin_: use nvdisasm :)
16:19 imirkin_: or envydis
16:19 imirkin_: it knows how to process this stuff and normalize it
16:19 gdk: I use envydis but it seems to have some bugs
16:19 imirkin_: point them out
16:20 imirkin_: or better yet -- fix them
16:20 gdk: when I try to disassemble some shaders it seems to get off by one byte at some point
16:20 imirkin_: inconceivable.
16:20 imirkin_: your source data must be wrong
16:22 gdk: I checked against the file in a hex editor and it seems to get off by one byte and read the wrong opcode values at some point
16:22 gdk: it doesn't happen with all shaders though
16:24 gdk: anyway, the last shader I looked at uses ld const buffer, with a register offset
16:24 gdk: and the register offset seems to be pretty much always left-shifted by 4 before it is used with the ld instruction
16:25 gdk: the access size is 64-bits
16:25 imirkin_: that's common because of how GL uniforms work
16:25 imirkin_: in layout (std140), even if you have like float[5], it's supposed to be laid out with a 128-bit stride
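To illustrate the std140 point above: each element of a `float[5]` array occupies a 16-byte (128-bit) slot, so element `i` sits at `i * 16` and a left shift by 4 turns an index into a byte offset. This is only a sketch; the helper names are made up for illustration.

```python
import struct

STD140_ARRAY_STRIDE = 16  # bytes; array elements are padded to a vec4-sized slot

def std140_float_offset(index, base=0):
    """Byte offset of element `index` of a float array in std140 layout."""
    return base + (index << 4)  # same as index * 16

def pack_std140_float_array(values):
    """Pack floats with a 16-byte stride; the padding bytes are left as zero."""
    buf = bytearray(len(values) * STD140_ARRAY_STRIDE)
    for i, v in enumerate(values):
        struct.pack_into("<f", buf, std140_float_offset(i), v)
    return bytes(buf)
```

So `float[5]` takes 80 bytes rather than 20, which is the stride one would expect to see in a const buffer dump if the data really is a std140 array.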
16:27 gdk: hmm
16:27 gdk: looking at the const buffer dump it doesn't seem to have those strides though
16:27 imirkin_: ok, well, without seeing the shader, it's hard for me to tell
16:28 gdk: I can send you the shader if you want and have the time to look into it :P
16:32 gdk: the "CB_POS" register is an offset in bytes, right?
16:32 imirkin_: sure, pastebin the envydis output
16:33 imirkin_: CB_POS is in bytes, yeah
16:37 gdk: it's better to send the binary file I think as envydis can't disassemble the full shader here
16:39 imirkin_: that means your binary file is corrupt
16:41 gdk: it shows "00000210: ??????04 ???????? ??? [unknown: ??????04 ????????] [incomplete] [unknown instruction]"
16:41 gdk: even though there is data at 0x210 and after that as well
16:43 gdk: I can get it to go further by manually replacing some instructions by 0
16:43 imirkin_: probably a missing sched?
16:44 imirkin_: you're supposed to have a sched every 4th op
16:46 gdk: yea it does have the sched
16:46 imirkin_: fine, send me the raw file
16:46 imirkin_: i'll have a look to see why envydis hates it
16:47 gdk: I guess its a problem on my build
16:48 imirkin_: or you're using envydis wrong
16:48 gdk: here: https://www.dropbox.com/s/t5hcskc00miz9x1/spl_sh.bin?dl=0
16:51 imirkin_: worksforme
16:51 imirkin_: you do seem to have some gunk at the end...
16:51 imirkin_: a trailing 0x00
16:52 imirkin_: gdk: https://hastebin.com/izegozelad.bash
16:52 gdk: thanks, I built envydis with msys2 and I'm on windows so maybe that's the issue
16:53 imirkin_: hmmm ... dunno. could be some kind of msys fail.
16:53 imirkin_: i know fairly little about it
16:53 imirkin_: could also be 32-bit fail -- did you build a 32- or 64-bit binary?
16:54 gdk: 64-bit
16:54 imirkin_: ok, it's not that then :)
16:55 gdk: the part I was talking about is at around 0x210
16:55 gdk: ld b64 $r0 c3[$r6]
16:55 imirkin_: and then note the next op
16:55 imirkin_: so it's really loading 128 bits of data
16:56 imirkin_: into $r0..$r3
16:56 gdk: yea, I just don't understand the iscadd with the left shift by 4
16:56 imirkin_: well
16:56 imirkin_: the shader is like
16:56 gdk: it's basically doing 0xe0 << 4
16:56 imirkin_: vec4 foo[5]
16:56 imirkin_: no
16:56 imirkin_: it's doing (n<<4) + 0xe0
16:56 gdk: oh
16:57 imirkin_: the array foo starts at a base of 0xe0, and n is the index into that array
16:57 gdk: ah, that's it then, thanks
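The address computation discussed above can be sketched as a shift-and-add: the index `n` is shifted left by 4 (16 bytes per `vec4`) and added to the array base 0xe0. The function names are hypothetical, and the wrap to 32 bits is an assumption based on the 32-bit register size mentioned earlier in the log.

```python
MASK32 = 0xFFFFFFFF

def iscadd(a, shift, b):
    """Shift-and-add as described in the chat: (a << shift) + b, wrapped to 32 bits."""
    return ((a << shift) + b) & MASK32

def vec4_element_offset(n, base=0xE0):
    """Byte offset of foo[n] for a 'vec4 foo[5]' whose storage starts at `base`."""
    return iscadd(n, 4, base)
```

With these, `vec4_element_offset(2)` is 0x100, i.e. the third `vec4` in an array based at 0xe0, rather than the `0xe0 << 4` reading of the operands.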
16:57 gdk: I mean an immediate shifted by another doesn't make sense, the compiler would have optimized that :P
16:58 imirkin_: not always. but usually.
16:59 gdk: the only place where this makes sense is when the value doesn't fit in the immediate, as the immediate operands usually have fewer than 32 bits
17:02 pendingchaos: ping on https://lists.freedesktop.org/archives/mesa-dev/2018-May/194602.html
17:03 imirkin_: gdk: yeah, but compilers aren't perfect, so sometimes things are done after some opt passes
17:05 gdk: thank you so much for your help imirkin_, my problem is now solved
17:07 imirkin_: you're welcome
17:12 gdk: I have another puzzling shader here if you want to take a look
17:12 imirkin_: sure
17:12 imirkin_: (i've seen a lot of shaders...)
17:14 gdk: here: https://pastebin.com/FBdVSnHa
17:15 gdk: fset sets some float values on registers ($r3/$r6), and they are later used as integers, not floats
17:28 imirkin_: fset produces 0/-1 integers
17:28 imirkin_: fset bf produces boolean floats (i.e. 0/1.0)
17:28 imirkin_: fsetp produces predicates ($pN)
17:29 imirkin_: so basically fset + i2i abs == compare + bool -> int
17:31 imirkin_: (since bool -> int wants 0/1, not 0/-1)
17:31 gdk: so fset produces 0 or 0xffffffff?
17:31 imirkin_: this could just as well work as fset bf + f2i, of course
17:31 imirkin_: correct.
17:31 imirkin_: all the sets do
17:32 imirkin_: fset, iset, dset, pset
17:32 imirkin_: (cset, if such a thing exists? might only be a csetp)
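A rough model of the set variants described above, using plain Python functions with made-up names. It only mirrors the result values mentioned in the chat (0/0xffffffff for `fset`, 0.0/1.0 for `fset bf`) and is not an emulation of the real instructions.

```python
import operator

MASK32 = 0xFFFFFFFF

def fset(a, b, cmp):
    """Integer result: all ones (-1) when the comparison holds, else 0."""
    return MASK32 if cmp(a, b) else 0

def fset_bf(a, b, cmp):
    """Boolean-float result: 1.0 when the comparison holds, else 0.0."""
    return 1.0 if cmp(a, b) else 0.0

def i2i_abs(x):
    """Absolute value of a 32-bit two's-complement integer, so -1 becomes 1."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return abs(signed) & MASK32

# compare + bool -> int, in the two equivalent forms mentioned in the chat
as_int_a = i2i_abs(fset(1.0, 2.0, operator.lt))
as_int_b = int(fset_bf(1.0, 2.0, operator.lt))
```

Both forms produce the 0/1 value that a bool-to-int conversion wants, which is why the float compare result shows up later as an integer.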
18:23 HdkR: <3 flexible set instructions
18:24 imirkin_: oh and of course hset2
18:25 imirkin_: which actually has a few output modes
18:25 imirkin_: as i don't have the hw, no clue how they all work
18:26 imirkin_: pendingchaos: btw, funny story - apparently SULD.P is a thing on maxwell+ -- there's an ext to expose it, and i wonder if we shouldn't always just use SULD.P.
18:26 imirkin_: if you're interested, happy to tell you more.
18:26 HdkR: imirkin_: Sounds like you need an X1 :P
18:26 imirkin_: actually i need a GT 1030
18:26 imirkin_: coz then i could fix up HDMI 2.0
18:27 imirkin_: but they're way too expensive
18:27 HdkR: pfft, that's practical things to work on
18:27 imirkin_: i might be willing to go as high as $20 for one, but, meh
18:27 HdkR: lol
18:27 imirkin_: that's already 2x my usual per-card budget
18:28 imirkin_: but you know, for twice the hdmi, might be worth it :)
18:29 imirkin_: but they're still at like $80-$90, so wtvr
18:30 HdkR: wtf, why are they $20 more than they should be? Stupid GPU market
18:30 imirkin_: you mean $60 more than they should be?
18:30 HdkR: well fine, $20 more than the MSRP :P
19:34 pendingchaos: imirkin_: by changing the TIC's format to match that of the image binding, instead of that of the texture?
19:35 imirkin_: pendingchaos: the TIC should have the format anyways
19:35 imirkin_: but currently we use SULD.B (since that was the only option on kepler and earlier)
19:35 imirkin_: and then bit-hack
19:36 imirkin_: using SULD.P would enable us to not have to specify format in shader, e.g. for GL_EXT_shader_image_load_formatted
19:38 juri_: hmm. i have ten NVS310 cards, and they have four different bios revisions. *ponders*
19:40 imirkin_: congratulations?
19:43 pendingchaos: "the TIC should have the format anyways" I'm not sure I understand your response
19:43 pendingchaos: When you said "i wonder if we shouldn't always just use SULD.P",
19:43 pendingchaos: I was asking if doing so would be done by switching from SULD.B to SULD.P for all image loads on GM107+ and ensuring the format in the TIC matches that of the image binding (instead of the texture's format)
19:43 juri_: maybe. i still have yet to get one line of my haskell running on them. :(
19:44 imirkin_: pendingchaos: the TIC's format should be correct already
19:44 imirkin_: pendingchaos: but currently we use SULD.B. perhaps we should just use SULD.P on maxwell+
19:51 pendingchaos: ah I misread the source
19:51 pendingchaos: I thought the TIC was set to the texture's format, not that of the image binding
19:51 pendingchaos: I suppose that would make sense, since image stores are typed
20:13 HdkR: imirkin_: Woo free format interpretation? No need to bit swizzle things anymore? :P
20:33 imirkin_: HdkR: well, presumably it's happening somewhere
20:33 imirkin_: nothing in life is free
20:33 imirkin_: and i suppose it depends what you do with the data
20:57 HdkR: imirkin_: Sure, just free from the perspective of the shader code :P