00:34 feaneron: what's the difference between nvc0 and nv50?
00:34 feaneron: why are they split?
00:34 imirkin_: tesla vs fermi+
00:34 imirkin_: they changed pushbuf formats
00:35 imirkin_: and also rearranged a lot of methods in the 3d class
00:35 imirkin_: fermi+ has been largely stable
00:36 feaneron: hm, nv50 is reasonably old then
00:36 imirkin_: reasonably
00:36 imirkin_: not as old as nv30
00:37 imirkin_: or as the ENIAC
00:39 feaneron: sure :)
00:42 feaneron: i also see in https://mesamatrix.net/ that nvc0 implements all gl extensions up to 4.5, but (and i don't know if i'm doing something wrong) it reports support for 4.3
00:43 imirkin_: all correct.
00:43 feaneron: i suppose there's an important reason for that, right? heh
00:43 imirkin_: KHR said they'll get mad if you advertise OpenGL 4.4+ without submitting (passing) conformance test results
00:44 imirkin_: (this might come as some surprise, but we don't pass the conformance testsuite)
00:44 feaneron: hmmmm... is that piglit?
00:44 imirkin_: no
00:44 imirkin_: they have their own. they open-sourced most of it
00:44 imirkin_: https://github.com/KhronosGroup/VK-GL-CTS
00:45 feaneron: fantastic, thanks imirkin_
00:45 feaneron: apologies if my questions are... too easy
00:45 imirkin_: you may be interested in this too: https://trello.com/b/lfM6VGGA/nouveau-cts
00:46 imirkin_: i like easy questions. those i can actually answer.
00:48 feaneron: well, apparently https://trello.com/c/XrBM08pB/15-https-bugsfreedesktoporg-showbugcgiid103140 is fixed
00:50 imirkin_: it's not perfectly maintained...
00:52 feaneron: didn't assume that :) sorry if it sounded finger-pointing
00:52 imirkin_: nope
00:52 imirkin_: not at all
00:53 feaneron:knows the gory details about maintaining things
06:56 almeida1: https://github.com/VerticalResearchGroup/miaow/blob/6ed88045a2dacd0314594e48900c5b3d851e74dd/src/verilog/rtl/alu/alu_controller.v
06:57 almeida1: here is part of the magic shown...another part is in vgpr.v
06:57 almeida1: to elaborate: readbacks happen via instances in parallel, while opcode flops get serialized
06:58 almeida1: obviously because it is not possible to have that number of alus
06:59 almeida1: https://github.com/VerticalResearchGroup/miaow/blob/6ed88045a2dacd0314594e48900c5b3d851e74dd/src/verilog/rtl/vgpr/reg_1024x32b_3r_1w.v
07:00 almeida1: there are the last read address flops cached
07:01 almeida1: https://github.com/VerticalResearchGroup/miaow/blob/6ed88045a2dacd0314594e48900c5b3d851e74dd/src/verilog/rtl/vgpr/vgpr_2to1_rd_port_mux.v and this is the mux of 2048 read instances in parallel
07:19 almeida1: so phase1 readbacks go over the mux rd_en signal, then execute is pulled in a cycle later, which accesses the regfile slightly below the mux
07:20 almeida1: btw. it is the only real way it can ever work
07:21 almeida1: if the reg in regfile has spotted a change, the alu will be programmed from issue flops, the alu bus
07:21 almeida1: if not it will not happen and inst. will get skipped, the opcode is zeroed and goes to new iteration
07:24 almeida1: the previous add rd0 last stuff, was for pointer accesses, but the flop of changed reg detection is in 256b file
07:25 almeida1: meaning it's a flop that keeps also two states
07:26 almeida1: i entirely agree with the flow, although in miaow case it can be somewhat slimmed down...still the hw designers have done good job even in this miaow case
07:27 almeida1: all the methodical spec of that is available starting from fragment program arb enabled gpus
07:27 almeida1: which support texture stuff, aka arrays
07:32 almeida1: so whoever wants to argue, i can say determinedly that every feature including scheduling can be implemented with pointers, including atomics, because there are also scalar units
07:32 almeida1: and scheduling is still, as mentioned some time ago, just a pathetic 5 instructions in the queue
10:28 karolherbst: imirkin_: so, final results: total instructions in shared programs : 5581871 -> 5558451 (-0.42%) total gprs used in shared programs : 646834 -> 645989 (-0.13%) :)
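The percentages quoted above follow directly from the raw counts. A quick Python check of the arithmetic (the helper name `pct_change` is made up for illustration):

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Raw counts taken from the shader-db summary quoted above.
instructions = pct_change(5581871, 5558451)
gprs = pct_change(646834, 645989)

print(f"instructions: {instructions:.2f}%")
print(f"gprs: {gprs:.2f}%")
```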
10:28 karolherbst: imirkin_: mind taking a look at those patches at https://github.com/karolherbst/mesa/commits/opt_codegen_slcts ?
10:28 karolherbst: I will run piglit and some affected games later today to verify I didn't mess up
10:29 karolherbst: there are still some set+slcts pairs left, but those result from ConstantFolding converting slcts->set and AlgebraicOpt doesn't get a chance to clean it up again
10:30 karolherbst: and I could also opt unordered sets + slcts, but for that I am not 100% sure how unordered comparisons work, so I left it for now
10:36 karolherbst: imirkin_: maybe we should split AlgebraicOpts internally into opts which handle modifiers and run it a second time after ConstantFolding but skipping those which don't handle modifiers
10:36 karolherbst: I expect a ~2% improvement in avg
10:44 karolherbst: mhh, only 0.03% in avg, sad
10:55 karolherbst: mhh, forgot to increase foldCount, so even more improvements after that, nice
13:16 karolherbst: imirkin: ohh by the way, that spilling thing :)
13:16 karolherbst: I think we should sort that out _before_ 18.2 :)
13:17 imirkin: "that spilling thing"
13:17 imirkin: which spilling thing is that?
13:18 karolherbst: that nv50 tex patch of yours which broke spilling on kepler
13:18 karolherbst: or at least for the tests I was testing against
13:20 karolherbst: imirkin: https://github.com/karolherbst/mesa/commit/008c5c32d6dae46c07a072092c1f1cbc75537081
13:20 imirkin: fck
13:21 imirkin: fine. reviewed-by: me
13:21 imirkin: although i'd rather you change the subject
13:21 imirkin: to something like
13:22 karolherbst: sure it doesn't regress the fix? You should check that or tell me how to check this, just to be sure
13:22 imirkin: nv50/ir: only avoid spilling constrained def if a mov away is added
13:22 imirkin: s/away//
13:22 imirkin: yeah, 99% sure it doesn't matter, but i'll test it tonight
13:22 imirkin: just push it though
13:22 imirkin: if it's broken, we'll fix more.
13:23 karolherbst: I can wait until tomorrow to push it, if users didn't complain until today, they probably won't tomorrow :)
13:23 sigod_: what's this about kepler?
13:24 karolherbst: only 63 regs
13:24 karolherbst: so spilling happens more likely
13:24 karolherbst: well and fermi actually as well
13:24 karolherbst: kepler2+ have 255 regs, so it really doesn't matter
13:25 imirkin: except for compute and lots of invocs
13:25 karolherbst: right
13:25 karolherbst: it is just super rare
13:25 karolherbst: imirkin: but I thought kepler2 has enough regs in total anyway?
13:26 karolherbst: ohh, indeed
13:26 karolherbst: same amount of total regs per thread block
13:27 karolherbst: GK210 has x2 though LD
13:27 karolherbst: :D
13:27 imirkin: i'll believe it when i see it.
13:27 karolherbst: and more shared memory
13:27 karolherbst: SM3.7 = GK210
13:27 karolherbst: right
13:27 karolherbst: like if somebody would run nouveau on a gk210
13:28 karolherbst: imirkin: seems like your original commit indeed got into 18.1?
13:28 imirkin: no clue
13:28 imirkin: i just tell people who have problems to build HEAD
13:29 karolherbst: ohh wait
13:29 karolherbst: no
13:29 karolherbst: it was after the branch
13:29 imirkin: git tag --contains
13:29 karolherbst: I simply checked the release date
13:36 karolherbst: imirkin: uhm.. which test profile do you use with piglit now?
13:38 karolherbst: ohh, I just have to write "gpu", not tests/gpu.py
16:33 karolherbst: pendingchaos: no regressions on a gk208 with https://cgit.freedesktop.org/mesa/mesa/commit/src/gallium/drivers/nouveau?id=66ca7e400b8cf736943feddafef7f76adabf9120
16:34 pendingchaos: cool
16:34 karolherbst: piglit doesn't catch everything, but at least it is a good sign nevertheless
16:36 imirkin_: deqp had much more thorough interp tests
16:37 karolherbst: true
16:40 karolherbst: but I will get back to work more on the CTS anyway, and there should be plenty of related tests. it was just as a quick check that that commit didn't break anything, because otherwise we would have reverted it
16:40 karolherbst: I think
16:41 karolherbst: imirkin_: do you know anything about that funny "ioremap reserve_memtype failed -16" thing?
16:42 imirkin_: what's -16?
16:42 karolherbst: ohh "x86/PAT: tex3d-maxsize:31916 conflicting memory types d0720000-f0720000 uncached-minus<->write-combining" and "x86/PAT: reserve_memtype failed [mem 0xd0720000-0xf071ffff], track write-combining, req write-combining" before that
16:42 karolherbst: 16 is EBUSY
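For readers who want to check the mapping themselves: on Linux, errno 16 is EBUSY, and the conflicting range in the PAT message spans 512 MiB. A small Python sketch that decodes both; the numbers come from the log lines above, and the errno lookup assumes a Linux host:

```python
import errno
import os

# The kernel message reports -16; errno values are the positive counterpart.
code = 16
name = errno.errorcode.get(code)
print(name, "-", os.strerror(code))

# Range from "conflicting memory types d0720000-f0720000".
start, end = 0xD0720000, 0xF0720000
size_mib = (end - start) // (1024 * 1024)
print(f"conflicting range size: {size_mib} MiB")
```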
16:43 karolherbst: random error code which doesn't seem to make any sense?
16:43 mwk: "this piece of memory is currently busy being reserved in this other mode"?
16:44 karolherbst: mwk: do those modes ever change anyway? from my experience those are rather static in general
16:44 karolherbst: not generally though
16:44 karolherbst: only if it comes to device memory
16:45 mwk: *shrug*
16:45 mwk: don't know much about them
16:46 karolherbst: I know that PAT is a fancy way of doing stuff you used MTRRs for before, or something
16:48 karolherbst: pendingchaos: by the way, if you have some time and want to spend time also reviewing others patches, you could take a look at my slct optimization patches: https://github.com/karolherbst/mesa/commits/opt_codegen_slcts
16:48 karolherbst: I am never sure I actually did the correct thing here
17:47 pendingchaos: karolherbst: thoughts about the first patch: https://hastebin.com/episobixiy.txt
17:48 karolherbst: I actually caused piglit regressions with my patches :( *sigh*
17:50 karolherbst: pendingchaos: thanks
18:08 pendingchaos: I don't see anything wrong with the second and third though the use of swapSources() in the third is a bit confusing IMO
18:08 pendingchaos: I think it could just be changed into "slct->setSrc(1, bld.mkImm(0));" or "slct->setSrc(1, slct->getSrc(0));"
18:08 pendingchaos:disappears for food
18:44 pendingchaos: what are all the cc codes after CC_ALWAYS/CC_TR?
18:47 pendingchaos: (the CC_*U ones)
18:49 HdkR: pendingchaos: unordered?
18:50 karolherbst: pendingchaos: some nasty NaN stuff
18:52 pendingchaos: like changing what "foo < NaN" results in (previously false iirc)?
18:53 karolherbst: yeah
18:53 pendingchaos: what would it result in with e.g. CC_LTU?
18:54 karolherbst: I think it also makes 1.0 == 1.0 false for unordered compares
18:54 karolherbst: pendingchaos: mhh I think it looks at the NaN bits more closely
18:54 karolherbst: but.. I can only guess, never actually checked what those do
18:54 karolherbst: NaN is quite a big range of values in the end
18:55 pendingchaos:nods
18:55 karolherbst: like if you compare 1.0/0 and 4.0/0 the result should be different afaik
18:55 karolherbst: I mean the produced NaNs
18:55 karolherbst: and you can compare those as well
18:55 karolherbst: uhm
18:55 karolherbst: although I guess that returns inf
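The guess above matches IEEE 754: a finite non-zero value divided by zero gives an infinity, not a NaN, and infinities compare equal to each other. NaNs do come in many bit patterns, but every ordinary comparison involving one is false regardless of payload. A minimal Python sketch (Python raises on `1.0 / 0`, so `math.inf` stands in for the hardware result; the two NaN bit patterns are arbitrary examples):

```python
import math
import struct

def double_from_bits(bits):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

# Two quiet NaNs that differ only in their payload bits.
nan_a = double_from_bits(0x7FF8000000000001)
nan_b = double_from_bits(0x7FF8000000000002)

both_nan = math.isnan(nan_a) and math.isnan(nan_b)
# Every ordered comparison involving a NaN is false, whatever the payload.
any_true = (nan_a == nan_b) or (nan_a < nan_b) or (nan_a > nan_b)

# Infinities, by contrast, are ordinary ordered values.
inf_equal = math.inf == math.inf

print(both_nan, any_true, inf_equal)
```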
18:56 pendingchaos: also on the first patch: I think the line 1911 case could be merged with the previous ones by handling CC_EQ and CC_NEQ
18:57 karolherbst: pendingchaos: well I will look at those piglit fails first and check if I get any regressions inside games, but yeah, I think there are some bugs and some code can be rearranged after taking care of those
18:57 karolherbst: also set.neu + slct.ne came up in a few shaders
18:59 mwk: as for U
18:59 mwk: the way floating-point comparison works, it can give 4 different results
18:59 mwk: L aka less than, E aka equal, G aka greater than, U aka unordered
19:00 mwk: if any of the two compared numbers is a NaN, you get U; otherwise you get L if a < b, E if a == b, G if a > b
19:00 mwk: the cc codes are simply a list of matching conditions
19:00 mwk: so CC_LT means "true if comparison was L", while CC_LTU means "true if comparison was L or U"
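mwk's description can be sketched in a few lines of Python: a comparison yields exactly one of L, E, G, U, and a condition code is just the set of outcomes it accepts. The table below is illustrative and only covers a handful of codes; the function names are made up and this is not the Mesa implementation.

```python
import math

def compare(a, b):
    """Classify a floating-point comparison as L, E, G or U."""
    if math.isnan(a) or math.isnan(b):
        return "U"  # unordered: at least one operand is a NaN
    if a < b:
        return "L"
    if a == b:
        return "E"
    return "G"

# Each condition code lists the outcomes for which it is true.
COND_CODES = {
    "LT": {"L"},
    "LTU": {"L", "U"},
    "NE": {"L", "G"},
    "NEU": {"L", "G", "U"},
    "U": {"U"},
}

def cond_true(cc, a, b):
    return compare(a, b) in COND_CODES[cc]

print(cond_true("LT", 1.0, math.nan))   # False: the outcome is U
print(cond_true("LTU", 1.0, math.nan))  # True: U is accepted
```

With this view, "foo < NaN" is false under LT but true under LTU, which answers the question asked earlier.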
19:02 pendingchaos:nods
19:03 karolherbst: okay, that makes more sense
19:03 pendingchaos: thanks
19:03 karolherbst: mwk: I guess there is also a CC_U?
19:03 karolherbst: ahh yeah, there is
19:04 mwk: yep
19:04 karolherbst: what are those O C S A things? in CondCode?
19:04 mwk: umm
19:04 mwk: link?
19:05 pendingchaos: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir.h#n298
19:05 karolherbst: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir.h#n298 but I could also check the code
19:05 mwk: ah, those
19:05 mwk: those are for integers
19:06 mwk: and are only used with the actual CC register
19:06 karolherbst: I see
19:07 mwk: G80 and up have a condition code register
19:07 mwk: like most CPUs
19:07 mwk: with 4 bits: overflow, sign, carry, zero
19:07 mwk: they have pretty much the same semantics as they do on CPUs
19:07 mwk: for integers
19:08 mwk: O/NO are "overflow flag set/not set", same for C/NC and S/NS
19:09 mwk: A is "above" which expands to !Z && C or something like that
19:09 karolherbst: I see
19:09 mwk: for floating-point, O and C are always 0, while S and Z encode the condition code in a manner compatible with integer comparisons, so you can use the same cond codes
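A rough Python sketch of how the four flags could fall out of a 32-bit integer subtraction, CPU-style. It assumes the ARM-like convention where carry set means "no borrow", which is what makes mwk's "A is !Z && C" read as unsigned greater-than; whether G80 uses exactly this convention is not confirmed in the log, and the function names are made up.

```python
MASK32 = 0xFFFFFFFF

def sub_flags(a, b):
    """Flags after a 32-bit a - b (assumed convention: C set = no borrow)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    zero = result == 0
    sign = bool(result >> 31)
    carry = a >= b  # unsigned a >= b means no borrow was needed
    # Signed overflow: operand signs differ and the result sign differs from a.
    overflow = bool(((a ^ b) & (a ^ result)) >> 31)
    return {"O": overflow, "S": sign, "C": carry, "Z": zero}

def above(flags):
    """'A' as described in the log: !Z && C, i.e. unsigned a > b."""
    return flags["C"] and not flags["Z"]

print(sub_flags(5, 3), above(sub_flags(5, 3)))
print(sub_flags(3, 5), above(sub_flags(3, 5)))
```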
19:11 mwk: but um
19:11 mwk: that condcode enum looks quite worrying to me
19:11 mwk: it's missing a "!U" code
19:11 mwk: which is 7 in hardware
19:12 mwk: while CC_ALWAYS is 0xf in hardware
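One way to read the two hardware values quoted here (7 for "!U", 0xf for always) is as a 4-bit mask with one bit per comparison outcome. The bit assignment below (L=1, E=2, G=4, U=8) is an assumption chosen to be consistent with those two numbers; the log does not spell it out.

```python
# Assumed one-hot encoding of the four comparison outcomes.
L, E, G, U = 1, 2, 4, 8

ORDERED = L | E | G      # "!U": true for any non-NaN comparison
ALWAYS = L | E | G | U   # true for every outcome
LTU = L | U              # "less than or unordered"

def cc_matches(mask, outcome):
    """A condition code matches if the outcome's bit is set in its mask."""
    return bool(mask & outcome)

print(hex(ORDERED), hex(ALWAYS))
print(cc_matches(LTU, U), cc_matches(ORDERED, U))
```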
19:20 karolherbst: pendingchaos: actually I wrote the wrong comments in the patch and fixed that up later
19:21 karolherbst: pendingchaos: so yeah, for slct.eq I probably have to use reverse()
19:21 karolherbst: but since I only do the opts for slct.ne, I am probably fine :)
19:21 karolherbst: pendingchaos: updated: https://github.com/karolherbst/mesa/commit/0b7c7f3b6202c5834e28a09016707edac7b8e5e0
21:42 mwk: who in here is familiar with pre-K1 Tegra GPUs?
21:43 imirkin_: tagr
21:43 HdkR: kusma would be a good target as well right? :)
21:43 imirkin_: also some dude is sending a bunch of patches bringing up video decoding stuff on them
21:43 imirkin_: yeah, but he's not here ;)
21:43 karolherbst: mwk: those GPUs are nothing like nvidia gpus, right?
21:43 imirkin_: they're a lot like nv40 afaik
21:44 karolherbst: really?
21:44 mwk: sort of
21:44 imirkin_: more like nv40 than, say, adreno ;)
21:44 karolherbst: okay, maybe there is a resemblance, but they're still different enough, right?
21:44 mwk: they seem to have the vertex shaders copied straight from NV40
21:44 karolherbst: ahh
21:44 mwk: but that seems to be all
21:44 imirkin_: they would not be easily supportable in nouveau
21:45 karolherbst: yeah, requiring a new driver is enough to say those are different enough :)
21:45 mwk: and I have a hypothesis that the pixel shaders might just happen to be the same thing as NV40
21:45 karolherbst: mwk: as in same architecture or even as in same ISA?
21:45 imirkin_: mwk: kusma is in #dri-devel
21:46 imirkin_: he did a bunch of the grate work afaik
21:46 mwk: except that you program them directly in their internal VLIW microcode, instead of having an ISA decoder
21:46 karolherbst: uhhhh
21:46 karolherbst: that sounds... annoying?
21:46 mwk: yeah well
21:46 mwk: from what I've read about Tegra pixel shaders, they sound very annoying
21:46 mwk: so yeah
21:47 imirkin_: mwk: you probably know about https://github.com/grate-driver
21:47 mwk: yes
21:47 imirkin_: https://github.com/grate-driver/grate/wiki/Fragment-Shader-ISA
21:48 karolherbst: actually, this looks like fun
21:48 mwk: see? it's horrible in many ways :)
21:48 karolherbst: :D
21:49 karolherbst: what a luck I don't own such hardware
21:49 imirkin_: otherwise you might be tempted to make it work? :)
21:49 mwk: also as for video decoding / encoding
21:49 karolherbst: I don't know if "work" is the right term here :D
21:50 mwk: AFAICT Tegras have "loose" Falcon-equipped video decode/encode engines similar to desktop GPUs
21:50 mwk: maybe even the same thing
21:50 mwk: but they're, technically, not part of the GPU
21:51 mwk: matter of fact, Tegras have sort of two GPUs
21:51 mwk: GR2D and GR3D
21:51 imirkin_: you should talk to digetx about it if you're interested
21:51 imirkin_: he seems to be on top of it all
21:52 mwk: sounds good
21:52 mwk: right now I'm in the process of figuring which way is up
21:54 imirkin_: start with "down" :)
21:55 mwk: also, an unrelated question
21:55 mwk: there's a Falcon firmware file format
21:55 mwk: that starts with the bytes 'de 10'
21:55 mwk: do we know anything about it?
21:56 imirkin_: 10de is the nvidia PCI vendor id...
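The link between the two is byte order: the bytes 'de 10' read as a little-endian 16-bit value give 0x10de. A one-line check in Python:

```python
# First two bytes of the file, as quoted in the log.
header = bytes.fromhex("de10")

# Interpreted as a little-endian 16-bit integer.
value = int.from_bytes(header, "little")
print(hex(value))
```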
21:56 mwk: it is
21:57 mwk: but it also seems to be a signature for the android falcon firmware files
21:57 imirkin_: https://github.com/envytools/firmware/blob/master/scanner.go#L98
21:57 imirkin_: that's the only format i know.
21:58 mwk: mmh, that one
21:58 mwk: I know that one, and it doesn't match
21:58 mwk: it's quite specific to pgraph, anyhow
21:58 imirkin_: right
22:16 karolherbst: imirkin_: is there any CondCode slct can't accept, but set can? (except those for the U32 case)
22:19 karolherbst: imirkin_: because in the end the entire opt should boil down to something like that: https://github.com/karolherbst/mesa/commit/a6ec6b6368d74a431528587188599f9d0c7db4f3
22:19 karolherbst: which is kind of easy to understand at the same time (compared to what I had before)
22:20 imirkin_: tons
22:20 imirkin_: iirc slct can only accept a handful
22:20 karolherbst: mhh
22:20 karolherbst: well the emitter seems to be happy though
22:20 imirkin_: e.g. only eq/ne for u32
22:20 karolherbst: okay, right, but what about unordered for s32 and f32?
22:21 karolherbst: at least the emitter is happy about all that :(
22:21 imirkin_: doubtful, dunno
22:21 imirkin_: check the emitted data
22:21 karolherbst: well, thing is, it doesn't really come up with shader-db, except NEU.. once
22:22 karolherbst: but yeah, I could check with nvdisasm what it supports at least
22:26 karolherbst: imirkin_: mhh /*0000*/ @P0 FCMP.NEU R0, R0, R0, R0; /* 0x3e80000000000000 */
22:27 karolherbst: @P0 FCMP.LEU R0, R0, R0, R0; /* 0x3d80000000000000 */
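Since the two instructions differ only in their condition code, XOR-ing the encodings shows which bits change between NEU and LEU. A small Python sketch using the two values from the nvdisasm output above; it only locates the differing bits and does not establish where the full condition-code field starts or ends.

```python
# Encodings quoted in the nvdisasm output above.
fcmp_neu = 0x3E80000000000000
fcmp_leu = 0x3D80000000000000

# Bits that differ between the two encodings.
diff = fcmp_neu ^ fcmp_leu
differing_bits = [i for i in range(64) if (diff >> i) & 1]

print(hex(diff), differing_bits)
```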
22:29 karolherbst: okay, for ICMP there are no U variants
22:30 karolherbst: at least according to nvdisasm
22:31 karolherbst: mwk: do you know something about that? FCMP supporting unordered condCodes, but ICMP not?
22:31 mwk: uh, obviously
22:31 mwk: integers have no NaNs, so there's no use for unordered codes...
22:32 karolherbst: ohhhh
22:32 karolherbst: silly me
22:32 imirkin_: karolherbst: hm, maybe FCMP/ICMP support everything. dunno.
22:32 imirkin_: should check what the nv50 situation is
22:33 karolherbst: yeah well, my nvdisasm is too new :(
22:33 mwk: envydis is pretty much rock-solid for g80 though
22:33 karolherbst: ahh
22:33 karolherbst: mwk: but do you know anything about fcmp not supporting unordered condcodes?
22:33 karolherbst: imirkin_: nv50 doesn't know slct :)
22:34 imirkin_: right ok
22:34 karolherbst: sooo the code lowering that stuff has to deal with that
22:34 mwk: not really
22:34 mwk: fcmp should support unordered
22:34 karolherbst: okay
22:34 mwk: g80 does have slct btw
22:35 karolherbst: really?
22:35 mwk: not a particularly expressive one, but there is such instruction
22:35 karolherbst: ahh, I see
22:35 karolherbst: I guess it was easier to lower everything instead
22:35 karolherbst: yeah okay, then this makes my code wonderfully easy
22:35 karolherbst: uhm, simple
22:36 karolherbst: and on tesla we still get that original OP_SET removed
22:38 karolherbst: only thing left are those slct->set + slct pairs opted by ConstFolding :(
22:39 karolherbst: *left after
22:41 karolherbst: checking if there are other condcodes on slct with set(0, a) on src2
23:00 karolherbst: ahhh, there is a bug in my code
23:01 karolherbst: or... maybe it is okay?
23:09 karolherbst: imirkin_: also reminder on that patch: https://github.com/karolherbst/mesa/commit/008c5c32d6dae46c07a072092c1f1cbc75537081