00:34feaneron: what's the difference between nvc0 and nv50?
00:34feaneron: why are they split?
00:34imirkin_: tesla vs fermi+
00:34imirkin_: they changed pushbuf formats
00:35imirkin_: and also rearranged a lot of methods in the 3d class
00:35imirkin_: fermi+ has been largely stable
00:36feaneron: hm, nv50 is reasonably old then
00:36imirkin_: not as old as nv30
00:37imirkin_: or as the ENIAC
00:39feaneron: sure :)
00:42feaneron: i also see in https://mesamatrix.net/ that nvc0 implements all gl extensions up to 4.5, but (and i don't know if i'm doing something wrong) it reports support for 4.3
00:43imirkin_: all correct.
00:43feaneron: i suppose there's an important reason for that, right? heh
00:43imirkin_: KHR said they'll get mad if you advertise OpenGL 4.4+ without submitting (passing) conformance test results
00:44imirkin_: (this might come as some surprise, but we don't pass the conformance testsuite)
00:44feaneron: hmmmm... is that piglit?
00:44imirkin_: they have their own. they open-sourced most of it
00:45feaneron: fantastic, thanks imirkin_
00:45feaneron: apologies if my questions are... too easy
00:45imirkin_: you may be interested in this too: https://trello.com/b/lfM6VGGA/nouveau-cts
00:46imirkin_: i like easy questions. those i can actually answer.
00:48feaneron: well, apparently https://trello.com/c/XrBM08pB/15-https-bugsfreedesktoporg-showbugcgiid103140 is fixed
00:50imirkin_: it's not perfectly maintained...
00:52feaneron: didn't assume that :) sorry if it sounded finger-pointing
00:52imirkin_: not at all
00:53feaneron:knows the gory details about maintaining things
06:57almeida1: here is part of the magic shown...another part is in vgpr.v
06:57almeida1: to elaborate, readbacks happen via instances in parallel ..while opcode flops get serialized
06:58almeida1: obviously cause there is not possible to have that amount of alus
07:00almeida1: there are the last read address flops cached
07:01almeida1: https://github.com/VerticalResearchGroup/miaow/blob/6ed88045a2dacd0314594e48900c5b3d851e74dd/src/verilog/rtl/vgpr/vgpr_2to1_rd_port_mux.v and this is the mux of 2048 read instances in parallel
07:19almeida1: so phase1 readbacks go over the mux rd_en signal, then execute is pulled in a cycle later, which accesses the regfile slightly below the mux
07:20almeida1: btw. it is the only real way it can ever work
07:21almeida1: if the reg in regfile has spotted a change, the alu will be programmed from issue flops, the alu bus
07:21almeida1: if not it will not happen and inst. will get skipped, the opcode is zerod and goes to new iteration
07:24almeida1: the previous add rd0 last stuff, was for pointer accesses, but the flop of changed reg detection is in 256b file
07:25almeida1: meaning it's a flop that keeps also two states
07:26almeida1: i entirely agree with the flow, although in miaow case it can be somewhat slimmed down...still the hw designers have done good job even in this miaow case
07:27almeida1: all the methodical spec of that is available starting from fragment program arb enabled gpus
07:27almeida1: which support texture stuff, aka arrays
07:32almeida1: so whoever wants to argue, i can say determinedly that every feature including scheduling can be implemented with pointers, including atomics, because there are also scalar units
07:32almeida1: and scheduling is still as mentioned times ago, just pathetic amount of 5instruction in the queue
07:35almeida1: i've been violated quite heavy with utter crap in my country, i will make all of them responsible for their shit trown at me
07:38almeida1: the beauty is there are also new ultra low voltage high density and cheap fabrication mram methods that i talked about, for such chip the circuits remain similar, but can be changed even though tools are compatible
07:44almeida1: due to their lightning fast memory accesses lot of the circuit is shaved down, and many modules are just abandoned, and fifo queues are filled without fetch, dispatch and schedule so to speak
07:49almeida1: i did read some about the technology, and to be honest, tech those days will go entirely insanely fast and cheap
07:50almeida1: enough to worry about, if there will morans filled in such channels in an amount like this, then robots take over the world
07:51almeida1: what kind of brain do you at all have, it's as small as a nut
07:51almeida1: as it appears probably
09:56almeida1: so basically i have finished my research, and i do not care about you at all, reality wise you are stupid and arrogant, but bans are somewhat different case than the real sanctionizers/violaters who i put a large effort soon to get them all killed
09:56almeida1: they need to whatever and however end up dead preferebly without me taking further sanctions
10:28karolherbst: imirkin_: so, final results: total instructions in shared programs : 5581871 -> 5558451 (-0.42%) total gprs used in shared programs : 646834 -> 645989 (-0.13%) :)
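The percentages in these shader-db totals follow from the raw counts. A minimal Python sketch of the arithmetic, where `pct_change` is a hypothetical helper name and not part of shader-db:

```python
def pct_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Totals quoted in the message above.
instructions = pct_change(5581871, 5558451)
gprs = pct_change(646834, 645989)

print(f"instructions: {instructions:.2f}%")
print(f"gprs: {gprs:.2f}%")
```

Rounded to two decimals these give -0.42% and -0.13%, matching the figures quoted.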
10:28karolherbst: imirkin_: mind taking a look at those patches at https://github.com/karolherbst/mesa/commits/opt_codegen_slcts ?
10:28karolherbst: I will run piglit and some affected games later today to verify I didn't mess up
10:29karolherbst: there are still some set+slcts pairs left, but those result from ConstantFolding converting slcts->set and AlgebraicOpt doesn't get a chance to clean it up again
10:30karolherbst: and I could also opt unordered sets + slcts, but for that I am not 100% sure how unordered comparisons work, so I left it for now
10:36karolherbst: imirkin_: maybe we should split AlgebraicOpts internally into opts which handle modifiers and run it a second time after ConstantFolding but skipping those which don't handle modifiers
10:36karolherbst: I expect a ~2% improvement in avg
10:44karolherbst: mhh, only 0.03% in avg, sad
10:55karolherbst: mhh, forgot to increase foldCount, so even more improvements after that, nice
13:16karolherbst: imirkin: ohh by the way, that spilling thing :)
13:16karolherbst: I think we should sort that out _before_ 18.2 :)
13:17imirkin: "that spilling thing"
13:17imirkin: which spilling thing is that?
13:18karolherbst: that nv50 tex patch of yours which broke spilling on kepler
13:18karolherbst: or at least for the tests I was testing against
13:20karolherbst: imirkin: https://github.com/karolherbst/mesa/commit/008c5c32d6dae46c07a072092c1f1cbc75537081
13:21imirkin: fine. reviewed-by: me
13:21imirkin: although i'd rather you change the subject
13:21imirkin: to something like
13:22karolherbst: sure it doesn't regress the fix? You should check that or tell me how to check this, just to be sure
13:22imirkin: nv50/ir: only avoid spilling constrained def if a mov away is added
13:22imirkin: yeah, 99% sure it doesn't matter, but i'll test it tonight
13:22imirkin: just push it though
13:22imirkin: if it's broken, we'll fix more.
13:23karolherbst: I can wait until tomorrow to push it, if users haven't complained by today, they probably won't tomorrow :)
13:23sigod_: what's this about kepler?
13:24karolherbst: only 63 regs
13:24karolherbst: so spilling happens more likely
13:24karolherbst: well and fermi actually as well
13:24karolherbst: kepler2+ have 255 regs, so it really doesn't matter
13:25imirkin: except for compute and lots of invocs
13:25karolherbst: it is just super rare
13:25karolherbst: imirkin: but I thought kepler2 has enough regs in total anyway?
13:26karolherbst: ohh, indeed
13:26karolherbst: same amount of total regs per thread block
13:27karolherbst: GK210 has x2 though :D
13:27imirkin: i'll believe it when i see it.
13:27karolherbst: and more shared memory
13:27karolherbst: SM3.7 = GK210
13:27karolherbst: like if somebody would run nouveau on a gk210
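The point about "compute and lots of invocs" can be sketched with simple division: with a fixed register budget, the per-thread register limit only helps if the whole block still fits. The Python below is a rough illustration. `REGFILE_SIZE = 65536` is an assumed example figure that does not come from this conversation, the helper names are hypothetical, and allocation granularity is ignored.

```python
# Assumed example budget of 32-bit registers shared by one thread block.
REGFILE_SIZE = 65536

def max_regs_per_thread(regfile_size: int, threads: int) -> int:
    """Largest per-thread register count that lets `threads` threads fit."""
    return regfile_size // threads

def max_threads(regfile_size: int, regs_per_thread: int) -> int:
    """How many threads fit if each one uses `regs_per_thread` registers."""
    return regfile_size // regs_per_thread

# A 1024-invocation compute block leaves about 64 registers per thread,
# so a 255-register ISA limit does not prevent spilling in that case.
print(max_regs_per_thread(REGFILE_SIZE, 1024))
# Threads that fit at the two per-thread limits mentioned above.
print(max_threads(REGFILE_SIZE, 63), max_threads(REGFILE_SIZE, 255))
```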
13:28karolherbst: imirkin: seems like your original commit indeed got into 18.1?
13:28imirkin: no clue
13:28imirkin: i just tell people who have problems to build HEAD
13:29karolherbst: ohh wait
13:29karolherbst: it was after the branch
13:29imirkin: git tag --contains
13:29karolherbst: I simply checked the release date
13:36karolherbst: imirkin: uhm.. which test profile do you use with piglit now?
13:38karolherbst: ohh, I just have to write "gpu", not tests/gpu.py
16:33karolherbst: pendingchaos: no regressions on a gk208 with https://cgit.freedesktop.org/mesa/mesa/commit/src/gallium/drivers/nouveau?id=66ca7e400b8cf736943feddafef7f76adabf9120
16:34karolherbst: piglit doesn't catch everything, but it's at least a good sign nevertheless
16:36imirkin_: deqp had much more thorough interp tests
16:40karolherbst: but I will get back to work more on the CTS anyway, and there should be plenty of related tests. it was just as a quick check that that commit didn't break anything, because otherwise we would have reverted it
16:40karolherbst: I think
16:41karolherbst: imirkin_: do you know anything about that funny "ioremap reserve_memtype failed -16" thing?
16:42imirkin_: what's -16?
16:42karolherbst: ohh "x86/PAT: tex3d-maxsize:31916 conflicting memory types d0720000-f0720000 uncached-minus<->write-combining" and "x86/PAT: reserve_memtype failed [mem 0xd0720000-0xf071ffff], track write-combining, req write-combining" before that
16:42karolherbst: 16 is EBUSY
16:43karolherbst: random error code which doesn't seem to make any sense?
16:43mwk: "this piece of memory is currently busy being reserved in this other mode"?
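Kernel messages report errors as negative errno values, so -16 can be decoded with the standard errno table. A small Python sketch, assuming a Linux host where the userspace errno numbers match the kernel's; `describe_kernel_error` is a hypothetical helper name:

```python
import errno
import os

def describe_kernel_error(ret: int) -> str:
    """Turn a negative kernel-style return value into 'NAME: message'."""
    code = -ret
    name = errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"

print(describe_kernel_error(-16))
```

On Linux the name printed for -16 is EBUSY, which fits the "conflicting memory types" message: the range is already reserved with a different caching mode.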
16:44karolherbst: mwk: do those modes ever change anyway? from my experience those are rather static in general
16:44karolherbst: not generally though
16:44karolherbst: only if it comes to device memory
16:45mwk: don't know much about them
16:46karolherbst: I know that PAT is a fancy way of doing stuff you used MTRRs for before or something
16:48karolherbst: pendingchaos: by the way, if you have some time and want to spend time also reviewing others patches, you could take a look at my slct optimization patches: https://github.com/karolherbst/mesa/commits/opt_codegen_slcts
16:48karolherbst: I am never sure I actually did the correct thing here
17:47pendingchaos: karolherbst: thoughts about the first patch: https://hastebin.com/episobixiy.txt
17:48karolherbst: I actually caused piglit regressions with my patches :( *sigh*
17:50karolherbst: pendingchaos: thanks
18:08pendingchaos: I don't see anything wrong with the second and third though the use of swapSources() in the third is a bit confusing IMO
18:08pendingchaos: I think it could just be changed into "slct->setSrc(1, bld.mkImm(0));" or "slct->setSrc(1, slct->getSrc(0));"
18:08pendingchaos:disappears for food
18:44pendingchaos: what are all the cc codes after CC_ALWAYS/CC_TR?
18:47pendingchaos: (the CC_*U ones)
18:49HdkR: pendingchaos: unordered?
18:50karolherbst: pendingchaos: some nasty NaN stuff
18:52pendingchaos: like changing what "foo < NaN" results in (previously false iirc)?
18:53pendingchaos: what would it result in with e.g. CC_LTU?
18:54karolherbst: I think it also makes 1.0 == 1.0 false for unordered compares
18:54karolherbst: pendingchaos: mhh I think it looks at the NaN bits more closely
18:54karolherbst: but.. I can only guess, never actually checked what those do
18:54karolherbst: NaN is quite a big range of values in the end
18:55karolherbst: like if you compare 1.0/0 and 4.0/0 the result should be different afaik
18:55karolherbst: I mean the produced NaNs
18:55karolherbst: and you can compare those as well
18:55karolherbst: although I guess that returns inf
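A small sketch of the IEEE 754 behaviour being guessed at here, written in Python with float32 bit patterns because Python raises an exception on float division by zero. In IEEE 754 a nonzero value divided by zero gives infinity, while 0/0 gives NaN. NaN covers many bit patterns, but every ordered comparison involving a NaN is false regardless of payload. `f32_from_bits` is a hypothetical helper.

```python
import math
import struct

def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

inf = f32_from_bits(0x7F800000)    # exponent all ones, mantissa zero
nan_a = f32_from_bits(0x7FC00000)  # quiet NaN, payload 0
nan_b = f32_from_bits(0x7FC00001)  # quiet NaN, payload 1

print(math.isinf(inf), math.isnan(nan_a), math.isnan(nan_b))
# Ordered comparisons with a NaN operand are all false...
print(nan_a < 1.0, nan_a > 1.0, nan_a == nan_b, nan_a == nan_a)
# ...while "not equal" is true, which is the unordered-aware case.
print(nan_a != nan_a)
```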
18:56pendingchaos: also on the first patch: I think the line 1911 case could be merged with the previous ones by handling CC_EQ and CC_NEQ
18:57karolherbst: pendingchaos: well I will look at those piglit fails first and check if I get any regressions inside games, but yeah, I think there are some bugs and some code can be rearranged after taking care of those
18:57karolherbst: also set.neu + slct.ne came up in a few shaders
18:59mwk: as for U
18:59mwk: the way floating-point comparison works, it can give 4 different results
18:59mwk: L aka less than, E aka equal, G aka greater than, U aka unordered
19:00mwk: if any of the two compared numbers is a NaN, you get U; otherwise you get L if a < b, E if a == b, G if a > b
19:00mwk: the cc codes are simply a list of matching conditions
19:00mwk: so CC_LT means "true if comparison was L", while CC_LTU means "true if comparison was L or U"
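mwk's description can be modelled as a four-way comparison result plus, for each condition code, the set of results that make it true. The Python below is only an illustrative sketch: `compare`, `CC` and `cc_true` are hypothetical names, not Mesa's CondCode enum or the hardware encoding, and treating NE as "L or G" is an assumption that follows from the "list of matching conditions" description.

```python
import math

def compare(a: float, b: float) -> str:
    """Four-way floating-point comparison: 'L', 'E', 'G' or 'U' (unordered)."""
    if math.isnan(a) or math.isnan(b):
        return "U"
    if a < b:
        return "L"
    if a == b:
        return "E"
    return "G"

# Each condition code lists the comparison results it accepts.
CC = {
    "LT": {"L"}, "EQ": {"E"}, "GT": {"G"},
    "LE": {"L", "E"}, "GE": {"G", "E"},
    "NE": {"L", "G"},  # assumption: ordered "not equal"
    "U": {"U"},
}
# The *U variants accept the same results plus "unordered".
for name in ["LT", "EQ", "GT", "LE", "GE", "NE"]:
    CC[name + "U"] = CC[name] | {"U"}

def cc_true(cc: str, a: float, b: float) -> bool:
    """True if condition code `cc` matches the comparison of a and b."""
    return compare(a, b) in CC[cc]

nan = float("nan")
print(cc_true("LT", 1.0, nan), cc_true("LTU", 1.0, nan))
print(cc_true("NE", nan, nan), cc_true("NEU", nan, nan))
```

Under this model the ordered and unordered variants only differ when an operand is NaN, which is why a pair such as set.neu and slct.ne cannot be treated as the same condition without care.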
19:03karolherbst: okay, that makes more sense
19:03karolherbst: mwk: I guess there is also a CC_U?
19:03karolherbst: ahh yeah, there is
19:04karolherbst: what are those O C S A things? in CondCode?
19:05karolherbst: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir.h#n298 but I could also check the code
19:05mwk: ah, those
19:05mwk: those are for integers
19:06mwk: and are only used with the actual CC register
19:06karolherbst: I see
19:07mwk: G80 and up have a condition code register
19:07mwk: like most CPUs
19:07mwk: with 4 bits: overflow, sign, carry, zero
19:07mwk: they have pretty much the same semantics as they do on CPUs
19:07mwk: for integers
19:08mwk: O/NO are "overflow flag set/not set", same for C/NC and S/NS
19:09mwk: A is "above" which expands to !Z && C or something like that
19:09karolherbst: I see
19:09mwk: for floating-point, O and C are always 0, while S and Z encode the condition code in a manner compatible with integer comparisons, so you can use the same cond codes
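The four integer flags can be sketched for a 32-bit subtraction. This Python model is an assumption-laden illustration, not a description of the hardware: it assumes the carry flag means "no borrow" (computing a - b as a + ~b + 1), which is the convention under which mwk's tentative "A = !Z && C" corresponds to an unsigned "above". `sub_flags` and `above` are hypothetical names.

```python
MASK = 0xFFFFFFFF

def sub_flags(a: int, b: int) -> dict:
    """Flags after a 32-bit a - b, assuming carry means 'no borrow'."""
    a &= MASK
    b &= MASK
    wide = a + ((~b) & MASK) + 1   # a - b computed as a + ~b + 1
    r = wide & MASK
    return {
        "Z": r == 0,                            # zero
        "S": bool(r >> 31),                     # sign bit of the result
        "C": bool(wide >> 32),                  # carry out of bit 31
        "O": bool(((a ^ b) & (a ^ r)) >> 31),   # signed overflow
    }

def above(flags: dict) -> bool:
    """Unsigned 'a > b' under the assumed carry convention."""
    return (not flags["Z"]) and flags["C"]

print(above(sub_flags(5, 3)), above(sub_flags(3, 5)), above(sub_flags(4, 4)))
```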
19:11mwk: but um
19:11mwk: that condcode enum looks quite worrying to me
19:11mwk: it's missing a "!U" code
19:11mwk: which is 7 in hardware
19:12mwk: while CC_ALWAYS is 0xf in hardware
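The two hardware values mwk quotes are consistent with a one-bit-per-outcome encoding. The bit assignment in this Python sketch (L=1, E=2, G=4, U=8) is an assumption made for illustration; only the values 7 for "!U" and 0xf for "always" come from the conversation.

```python
# Assumed bit assignment: one bit per comparison outcome.
L, E, G, U = 1, 2, 4, 8

def cc_matches(cc: int, outcome: int) -> bool:
    """True if the condition code accepts the given comparison outcome."""
    return bool(cc & outcome)

NOT_U = L | E | G        # "ordered": any result except unordered
ALWAYS = L | E | G | U   # every result, including unordered

print(NOT_U, hex(ALWAYS))
print(cc_matches(NOT_U, U), cc_matches(ALWAYS, U))
```

With this assignment, "true for any ordered result" and "always true" are different codes, and they give different answers only when an operand is NaN.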
19:20karolherbst: pendingchaos: actually I wrote the wrong comments in the patch and fixed that up later
19:21karolherbst: pendingchaos: so yeah, for slct.eq I probably have to use reverse()
19:21karolherbst: but since I only do the opts for slct.ne, I am probably fine :)
19:21karolherbst: pendingchaos: updated: https://github.com/karolherbst/mesa/commit/0b7c7f3b6202c5834e28a09016707edac7b8e5e0
19:22almeida1: can't help not thinking if there is something that you did still not understand:(? you thought i was bluffing, i mean who are you aside from mental illness experts who get humiliated in real work, what else can you do?
19:22karolherbst: also modifiers are done _after_ algebraicOpt :)
19:22karolherbst: I probably regret this, but...
19:23karolherbst: almeida1: don't use that kind of language to anybody
19:23almeida1: so miaow has veriperl code according to signal to issue a pointer
19:24almeida1: karolherbst: i do not understand, you have violated my rights most the time, and banned me for telling the truth!
19:24almeida1: how's that me being a bad guy here overall?
19:25karolherbst: almeida1: I won't discuss that. Telling somebody or a group of people that they are "mental illness experts who get humiliated in real work". Just don't.
19:25karolherbst: and this applies to everybody speaking here
19:29almeida1: than why do you keep telling me that i am mentally ill, of course you can not tell that i am physically ill i understand? why are those nonsense estonians ruining my life, they get their penalties cause of that, do you have any information on this what you tried to share?
19:29karolherbst: pendingchaos: ohh wait, I actually enable that for slct.eq as well
19:31almeida1: well let's make you some example , well vcs the synopsys or cadence or whatever verilog compiler, it merges compilation units together by default, and there are two basic stages in the issue ps alu flops of opcode, the lsu opcode which comes in the state of lsu_select, and alu_flops
19:32almeida1: this is hardcoded logic that miaow uses mostly that flops have two states alternating
19:33karolherbst: almeida1: I didn't, and whoever did shouldn't have, unless they are experts and were able to take a look at you in the appropriate setting. And if I ever told you that, I am sorry for it.
19:33almeida1: it's a bit longer story, since i know most the code how it works there, it takes couple of days to have this discussion for ensurance, that my ideas are most likely 96percent going to work easily
19:35HdkR: Sounds like you should be a hardware designer. Too bad nobody here is designing the hardware.
19:35almeida1: i can confirm that to 98 percent success prediction, when also simulating the logic though
19:36almeida1: it can be run with sinmulator for instance the input_flops and fifo's and see what it does though, but easier is to code and test straight for the real gpu
19:38almeida1: you know i have had very difficult life in some sense , cause i am largely hated guy, but the things you should not expect anyone giving anything to you without hard work, i have to do the same
19:46almeida1: so anyways programming is planX to earn something and probably i won't make it there, since i was strategically very motivated in youth ages, and conpirators struck me off my feet , what is the point to conflict on irc channel i can not see at all btw.
19:56almeida1: what i am saying i have previously done run-up of some technologies, that noone knows about in foreign countries, and i did not get even respect but only shit spilled on me
20:00almeida1: so anyhow, as you saw from the files, it keeps and flops the last pointer address and exchanges in between those
20:00almeida1: i looked at compiler of glsl about whole-array spec, since i have not read all the glsl spec itself
20:00almeida1: those are on khronos page
20:02almeida1: judging by only the appareance of the verilog and glsl code there, i think the handler to change the reg into array, should be something like mov reg4 regpinter, or something very primitive close to this
20:02almeida1: need to test
21:42mwk: so um
21:42mwk: who in here is familiar with pre-K1 Tegra GPUs?
21:43HdkR: kusma would be a good target as well right? :)
21:43imirkin_: also some dude is sending a bunch of patches bringing up video decoding stuff on them
21:43imirkin_: yeah, but he's not here ;)
21:43karolherbst: mwk: those GPUs are nothing like nvidia gpus, right?
21:43imirkin_: they're a lot like nv40 afaik
21:44mwk: sort of
21:44imirkin_: more like nv40 than, say, adreno ;)
21:44karolherbst: okay, maybe there is resemblance, but they're still different enough, right?
21:44mwk: they seem to have the vertex shaders copied straight from NV40
21:44mwk: but that seems to be all
21:44imirkin_: they would not be easily supportable in nouveau
21:45karolherbst: yeah, requiring a new driver is enough to say those are different enough :)
21:45mwk: and I have a hypothesis that the pixel shaders might just happen to be the same thing as NV40
21:45karolherbst: mwk: as in same architecture or even as in same ISA?
21:45imirkin_: mwk: kusma is in #dri-devel
21:46imirkin_: he did a bunch of the grate work afaik
21:46mwk: except that you program them directly in their internal VLIW microcode, instead of having an ISA decoder
21:46karolherbst: that sounds... annoying?
21:46mwk: yeah well
21:46mwk: from what I've read about Tegra pixel shaders, they sound very annoying
21:46mwk: so yeah
21:47imirkin_: mwk: you probably know about https://github.com/grate-driver
21:48karolherbst: actually, this looks like fun
21:48mwk: see? it's horrible in many ways :)
21:49karolherbst: what a luck I don't own such hardware
21:49imirkin_: otherwise you might be tempted to make it work? :)
21:49mwk: also as for video decoding / encoding
21:49karolherbst: I don't know if "work" is the right term here :D
21:50mwk: AFAICT Tegras have "loose" Falcon-equipped video decode/encode engines similar to desktop GPUs
21:50mwk: maybe even the same thing
21:50mwk: but they're, technically, not part of the GPU
21:51mwk: matter of fact, Tegras have sort of two GPUs
21:51mwk: GR2D and GR3D
21:51imirkin_: you should talk to digetx about it if you're interested
21:51imirkin_: he seems to be on top of it all
21:52mwk: sounds good
21:52mwk: right now I'm in the process of figuring which way is up
21:54imirkin_: start with "down" :)
21:55mwk: also, an unrelated question
21:55mwk: there's a Falcon firmware file format
21:55mwk: that starts with the bytes 'de 10'
21:55mwk: do we know anything about it?
21:56imirkin_: 10de is the nvidia vendor id...
21:56mwk: it is
21:57mwk: but it also seems to be a signature for the android falcon firmware files
21:57imirkin_: that's the only format i know.
21:58mwk: mmh, that one
21:58mwk: I know that one, and it doesn't match
21:58mwk: it's quite specific to pgraph, anyhow
22:16karolherbst: imirkin_: is there any CondCode slct can't accept, but set can? (except those for the U32 case)
22:19karolherbst: imirkin_: because in the end the entire opt should boil down to something like that: https://github.com/karolherbst/mesa/commit/a6ec6b6368d74a431528587188599f9d0c7db4f3
22:19karolherbst: which is kind of easy to understand at the same time (compared to what I had before)
22:20imirkin_: iirc slct can only accept a handful
22:20karolherbst: well the emitter seems to be happy though
22:20imirkin_: e.g. only eq/ne for u32
22:20karolherbst: okay, right, but what about unordered for s32 and f32?
22:21karolherbst: at least the emitter is happy about all that :(
22:21imirkin_: doubtful, dunno
22:21imirkin_: check the emitted data
22:21karolherbst: well, thing is, it doesn't really come up with shader-db, except NEU.. once
22:22karolherbst: but yeah, I could check with nvdisasm what it supports at least
22:26karolherbst: imirkin_: mhh /*0000*/ @P0 FCMP.NEU R0, R0, R0, R0; /* 0x3e80000000000000 */
22:27karolherbst: @P0 FCMP.LEU R0, R0, R0, R0; /* 0x3d80000000000000 */
22:29karolherbst: okay, for ICMP there are no U variants
22:30karolherbst: at least according to nvdisasm
22:31karolherbst: mwk: do you know something about that? FCMP supporting unordered condCodes, but ICMP not?
22:31mwk: uh, obviously
22:31mwk: integers have no NaNs, so there's no use for unordered codes...
22:32karolherbst: silly me
22:32imirkin_: karolherbst: hm, maybe FCMP/ICMP support everything. dunno.
22:32imirkin_: should check what the nv50 situation is
22:33karolherbst: yeah well, my nvdisasm is too new :(
22:33mwk: envydis is pretty much rock-solid for g80 though
22:33karolherbst: mwk: but do you know anything about fcmp not supporting unordered condcodes?
22:33karolherbst: imirkin_: nv50 doesn't know slct :)
22:34imirkin_: right ok
22:34karolherbst: sooo the code lowering that stuff has to deal with that
22:34mwk: not really
22:34mwk: fcmp should support unordered
22:34mwk: g80 does have slct btw
22:35mwk: not a particularly expressive one, but there is such instruction
22:35karolherbst: ahh, I see
22:35karolherbst: I guess it was easier to lower everything instead
22:35karolherbst: yeah okay, then this makes my code wonderfully easy
22:35karolherbst: uhm, simple
22:36karolherbst: and on tesla we still get that original OP_SET removed
22:38karolherbst: only thing left are those slct->set + slct pairs opted by ConstFolding :(
22:39karolherbst: *left after
22:41karolherbst: checking if there are other condcodes on slct with set(0, a) on src2
23:00karolherbst: ahhh, there is a bug in my code
23:01karolherbst: or... maybe it is okay?
23:09karolherbst: imirkin_: also reminder on that patch: https://github.com/karolherbst/mesa/commit/008c5c32d6dae46c07a072092c1f1cbc75537081