00:09 karolherbst: imirkin_: uhm... fp16 support will be fun as well I guess
00:09 karolherbst: maybe I should keep that in mind already
00:10 imirkin_: will have to look at how that works exactly
00:10 imirkin_: i saw there's HADD stuff
00:10 imirkin_: but i dunno if that's 2x16, or 1x16
00:10 imirkin_: (but schedulable so that 2x of them exec at the same time)
00:11 karolherbst: well, I am more interested in how that works out with inputs/outputs, or are those still at least 32bit?
00:11 imirkin_: definitely.
00:11 imirkin_: slots are all 32-bit on all hardware
00:11 karolherbst: and then will there be a hvec4 in nir?
00:11 imirkin_: anything else is just api's trying to be generic
00:12 imirkin_: can't speculate about nir
00:12 imirkin_: although i think nir already has f16 support
00:12 karolherbst: there are pack and unpack instructions for that at least
00:12 imirkin_: i believe they have a bitsize concept now
00:12 karolherbst: mhh
00:12 karolherbst: yeah
00:12 imirkin_: no, it has full fp16 support
00:12 karolherbst: ohh right
00:12 karolherbst: vec4 16
00:12 karolherbst: :)
00:12 imirkin_: makes sense.
00:13 imirkin_: the question is whether this needs to be SIMD'd by us on the fly (i.e. allocated into high/low halves of regs)
00:13 imirkin_: or whether 1 val = 1 reg, but scheduling allows them to exec 2x as fast
00:13 karolherbst: mhhh
00:13 imirkin_: that will require someone to look at the pascal (and separately, volta) isa
00:13 karolherbst: I am sure we will have to work a lot inside codegen for this and changing tgsi/nir is just a slim part of that
00:14 imirkin_: i dunno....
00:14 imirkin_: i think it should be quite easy
00:14 imirkin_: i just don't have a gpu that supports it
00:14 karolherbst: mhh
00:15 karolherbst: things like variable indexed arrays could be painful on the nir side, because we get 0x2 offsets, which we have to translate into shifts or hi/lo loads or whatever there is
00:15 karolherbst: or maybe we can load at 0x2 then...
00:15 karolherbst: too many questions
00:16 karolherbst: or ld f16 $r0 l[0x2] would fill the high bits?, also a possibility
00:16 imirkin_: yes, it would.
00:16 karolherbst: nice
00:16 imirkin_: i don't know if there are short loads from lmem
00:16 imirkin_: but there are definitely short loads from gmem
00:16 karolherbst: okay, then this makes it easier
00:16 imirkin_: (i.e. 8- and 16-bit)
00:16 karolherbst: somewhat
00:16 imirkin_: i kinda assume the others can do it too, but i dunno
00:16 karolherbst: but I guess we don't want to deal with that before doing RA
00:17 karolherbst: and just have $r0h values or whatever
00:17 imirkin_: the question is how do the ops want it
00:17 imirkin_: yeah, that's how nv50 does it
00:17 imirkin_: for mul
00:17 karolherbst: I see
00:17 imirkin_: you can actually get hi/lo addressing there
00:17 karolherbst: and it is fine with g[0x2] in inputs?
00:17 imirkin_: yea
00:17 imirkin_: (for 16-bit loads)
00:17 karolherbst: okay, so only 64bit stuff is painful
00:18 karolherbst: mhh, interesting
00:18 imirkin_: it all has to be aligned to load size
00:18 imirkin_: if you load 32-bit, it has to be aligned to 32-bit
00:18 imirkin_: etc
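The alignment rule imirkin_ describes here can be sketched as a small check. This is only an illustration: `isAlignedAccess` is a hypothetical helper, not a function from nouveau codegen, and it assumes access sizes are powers of two given in bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of nouveau codegen): returns true when a
// memory access of sizeBytes at byte offset `offset` is naturally aligned.
// Assumption: sizeBytes is a power of two (1, 2, 4, 8, 16).
bool isAlignedAccess(uint32_t offset, uint32_t sizeBytes)
{
    assert(sizeBytes != 0 && (sizeBytes & (sizeBytes - 1)) == 0);
    return (offset & (sizeBytes - 1)) == 0;
}
```

Under this model a 16-bit load from g[0x2] is fine (`isAlignedAccess(0x2, 2)`), while a 32-bit load from the same offset is not (`isAlignedAccess(0x2, 4)` is false).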
00:18 karolherbst: well right, but 64 bit also requires those merge/splits
00:18 karolherbst: otherwise it would be child's play as well
00:18 imirkin_: that's a failing of RA
00:18 karolherbst: right
00:18 imirkin_: (i think)
00:18 karolherbst: yeah, it is
00:18 karolherbst: quite sure
00:19 karolherbst: everything is fine as long as there is not this phi node messup
00:19 imirkin_: maybe i'll have time at some point this week to grok the RA code and just fix it?
00:20 karolherbst: I really think this would be the better solution, because this would keep the code much cleaner and the nvir print outs much easier to read :)
00:21 karolherbst: I would look into it myself if you won't have time for it and try to figure something out
00:21 karolherbst: I doubt the issue is _that_ hard to solve, it just needs a careful fix, because we don't have sources with the original values, but those are part of the def now
00:21 karolherbst: and this might be the issue here really
00:22 karolherbst: for compounds you just increase the livei range by the arguments of that pseudo op, more or less
00:29 karolherbst: ...
00:29 imirkin_: yeah
00:29 karolherbst: check your inbox
00:30 imirkin_: am i supposed to see something there?
00:30 karolherbst: "nv50/ir/ra: Fix copying compound for moves"
00:30 imirkin_: oh
00:30 imirkin_: mesa-dev
00:30 karolherbst: :)
00:30 karolherbst: right
00:30 imirkin_: skips my inbox :)
00:31 karolherbst: my fault
00:31 karolherbst: yeah, for me not, because CC
00:31 karolherbst: you are in CC as well
00:31 imirkin_: still skips ;)
00:31 karolherbst: fix your filters :p
00:32 imirkin_: anyways, give it a whirl
00:32 imirkin_: cwabbott: thanks =]
00:32 karolherbst: :)
00:32 imirkin_: cwabbott: so all i need to do is point at a line in some code and you'll fix it? hmmm... dangerous precedent!
00:33 cwabbott: np :)
00:33 cwabbott: lol
00:33 cwabbott: don't expect to be too lucky like that
00:33 imirkin_: hope it was fun.
00:33 cwabbott: it was, in a way
00:33 imirkin_: (or at least, interesting)
00:33 karolherbst: ...
00:33 cwabbott: i've worked on mesa's allocator, so it was nice seeing a slightly different take on it
00:33 karolherbst: cwabbott: at least compile check your changes :p
00:34 cwabbott: nah, that's for wimps :)
00:34 karolherbst: well
00:34 karolherbst: obviously not, because you just sent out a broken patch, duh!
00:34 cwabbott: ok, fine, i'll get it set up
00:35 karolherbst: I am curious about the testing you did ;)
00:35 imirkin_: it's unfortunate that building codegen requires libdrm_nouveau
00:35 imirkin_: it's largely a failing of the build system more than anything
00:35 imirkin_: there are no actual code deps there
00:35 imirkin_: but it's _so_ rare that anyone wants it...
00:35 karolherbst: well, we get rid of it when moving into src/nouveau or src/nvidia or src/compiler/... or something
00:36 cwabbott: karolherbst: nothing, since i don't have an nvidia gpu atm
00:36 cwabbott: you thought you could get away scot-free? :)
00:36 karolherbst: cwabbott: if I get significantly fewer crashes, you are my preliminary hero of the year :p
00:37 karolherbst: but it might be that I just hit my own asserts now everywhere...
00:38 karolherbst: nir just stores vec4 64 values in shader outputs :)
00:38 karolherbst: and I don't handle that currently that well
00:38 imirkin_: karolherbst: iirc you can do 64-bit exports
00:38 imirkin_: as long as the slot id's are all good, you're fine
00:38 karolherbst: slotting
00:38 imirkin_: (and 64-bit aligned)
00:38 cwabbott: well, configure didn't give up, so i suppose i already have libdrm_nouveau
00:38 karolherbst: ;)
00:39 imirkin_: cwabbott: ftr, you can use nouveau_compiler and feed it tgsi directly (on stdin)
00:39 imirkin_: (although i recommend sticking it in a file first, unless you're REALLY familiar with tgsi...)
00:39 cwabbott: doesn't the problem only show up with nir though?
00:40 imirkin_: i haven't tried with tgsi
00:40 imirkin_: i think it should be possible.... maybe.
00:40 imirkin_: i'd have to plan it out carefully.
00:40 imirkin_: bbl
00:40 cwabbott: i think the allocator will choke on all the splits/merges and not optimize things properly
00:41 kherbst: really...
00:41 imirkin_: it actually cleans up a lot of them :)
00:42 imirkin_: there's a MergeSplits thing
00:42 imirkin_: or SplitMerges
00:42 imirkin_: i forget
00:42 karolherbst: well I got CTXSW_TIMEOUT
00:42 karolherbst: cwabbott: seems like your patch made it worse :(
00:42 imirkin_: ok. now i'm gone for real. ttyl
00:42 cwabbott: well, that's unfortunate
00:43 cwabbott: hopefully the commit message explains things enough that it's easier to figure out what the actual issue is
00:43 orbea: imirkin_: my guess is that mpv now works because wm4 finally implemented your original suggestion of "dont do that", more specifically looking in the man page shows hwdec=vdpau-copy which is described as only working for some video cards and instantly crashes, well at least not the entire system now.
00:44 cwabbott: i kinda guessed that that was the issue, although i might've been wrong
00:44 cwabbott: or maybe my patch uncovered more issues
00:46 karolherbst: cwabbott: well, what if there is no mov to begin with?
00:46 karolherbst: like the phi points to a merge and mul instruction
00:46 cwabbott: the RA inserts moves
00:47 cwabbott: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#n446
00:48 cwabbott: basically, all the "hard" stuff is deferred until you start coalescing moves
00:48 karolherbst: might be some other fail then
00:48 karolherbst: skeggsb: sec2: unhandled intr 00000010
00:49 karolherbst: I see
00:49 cwabbott: what happens now, btw?
00:50 karolherbst: what do you mean?
00:50 karolherbst: well, it seems like some crashes are fixed, but some of the tests which passed are failing now... maybe I get a proper result somehow without my machine crashing again
00:51 cwabbott: ah, ok
00:51 cwabbott: it's entirely possible there's another issue this uncovered
00:52 karolherbst: I run piglit again :)
00:54 karolherbst: it could have been unrelated as well, sometimes piglit just messes up the gpu for real
00:54 skeggsb: karolherbst: i have nfi about anything to do with that engine
00:54 karolherbst: :)
00:54 karolherbst: who has?
00:54 skeggsb: nvidia...
00:55 karolherbst: let's ask them....
00:55 karolherbst: do you want to talk about that tomorrow?
00:55 skeggsb: we only use it for secboot stuff, which is basically a black box
00:55 karolherbst: and?
00:56 skeggsb: how'd you trigger it?
00:56 karolherbst: cwabbott: okay, seems to have been bad luck
00:56 karolherbst: skeggsb: piglit
00:57 skeggsb: guessing in response to a page fault?
00:57 karolherbst: engine crash
00:58 karolherbst: but yeah
00:58 karolherbst: nouveau 0000:01:00.0: fifo: read fault at 00003a6000 engine 00 [GR] client 05 [GPC0/PE_1] reason 00 [PDE] on channel 5 [00ffc64000 shader_runner[6106]]
00:58 karolherbst: kind of?
00:58 karolherbst: usually nouveau manages to restart the engine
00:58 skeggsb: yeah, so it's a secboot bug when trying to reset gr
00:59 karolherbst: cwabbott: well I will keep your patch and see if it fixes the issues I am after
00:59 karolherbst: but I can't tell, because I still need to fixup 64 bit slotting and stuff like that
01:00 skeggsb: i think that's the falcon halt interrupt
01:00 skeggsb: assuming they haven't changed the bitfield on newer boards
01:00 karolherbst: yeah, most likely
01:13 karolherbst: cwabbott: okay, your patch fixes all crashes that hit that compound assert, which were 2 in total currently, because all the other 3000 fail because of new asserts in my nir code :)
01:35 Lyude: i think i'll start getting some work done on powergating again tomorrow, was hoping that staying home and fixing this dock wouldn't take too long but now I am not so sure of that :(
01:39 karolherbst: Lyude: :)
01:39 karolherbst: Lyude: pro tip: never make any plans, everything just needs as much time as you need for it anyway
01:39 Lyude: true..
02:42 karolherbst: uhm right...
02:42 karolherbst: cwabbott: I found a bug
02:42 karolherbst: 0: mov u32 $r0 c1[0x60] (8)
02:42 karolherbst: 1: mov u32 $r1 c1[0x64] (8)
02:42 karolherbst: 2: mov u32 $r0 c1[0x68] (8)
02:42 karolherbst: 3: mov u32 $r1 c1[0x6c] (8)
02:42 karolherbst: $r0 and $r1 get overwritten
02:43 karolherbst: pre RA: https://gist.githubusercontent.com/karolherbst/c58aa8ff07db82f672f4ec4092075a30/raw/cac2fa69a941b9696bf727377af89429b209b177/gistfile1.txt
02:44 karolherbst: I guess I will split the stores for now
02:46 cwabbott: karolherbst: hmm, interesting
02:47 karolherbst: maybe a mov helps
02:47 karolherbst: let me try
02:47 cwabbott: i'd add some debug prints to see what the compMask is
02:47 karolherbst: like I also need to add a mov for exports
02:47 cwabbott: nah, this is definitely a bug with RA's handling
02:48 cwabbott: from my reading of the sources, %r147d and %r150d will be replaced with a merged value in the store
02:49 cwabbott: so after coalescing has done its thing, %r145 should have a compMask of 0001 0001
02:49 cwabbott: %r146 should have 0010 0010
02:50 cwabbott: %r148 should have 0100 0100
02:50 cwabbott: and %r149 should have 1000 1000
02:50 cwabbott: and they should all be merged together
02:50 karolherbst: how can I dump those, debug=7?
02:50 cwabbott: add some debug prints
02:50 karolherbst: okay
02:51 cwabbott: basically, the compMask tells you what the address mod 8 can be
02:51 karolherbst: ahh
02:53 cwabbott: except two-wide registers will have two bits set instead of one
02:53 karolherbst: %145:55 %146:aa %148:55 %149:aa
02:54 cwabbott: so for example, %r147d should have a compMask of 0011 0011
02:54 cwabbott: and that means it can start at 0 or 4 mod 8
02:54 karolherbst: interesting
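The compMask reading cwabbott gives above (one bit per possible register position mod 8, with wider values setting several adjacent bits) can be sketched as follows. This is a hedged illustration of the description in this log, not the code in nv50_ir_ra.cpp; `allowedStarts` is a hypothetical name, and `width` is assumed to be the value's size in 32-bit registers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: list the start positions (mod 8) that a value of
// `width` 32-bit registers may occupy according to its compMask.
// Assumption: a start position is valid when it is a multiple of `width`
// and all `width` bits beginning at that position are set in the mask.
std::vector<int> allowedStarts(uint8_t compMask, int width)
{
    assert(width == 1 || width == 2 || width == 4 || width == 8);
    std::vector<int> starts;
    const unsigned span = (1u << width) - 1; // `width` consecutive bits
    for (int pos = 0; pos + width <= 8; pos += width) {
        if (((compMask >> pos) & span) == span)
            starts.push_back(pos);
    }
    return starts;
}
```

With this reading, `allowedStarts(0x33, 2)` yields positions 0 and 4, matching the "0 or 4 mod 8" statement for %r147d.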
02:54 karolherbst: I know of a super annoying bug we have in RA
02:55 karolherbst: spilling of wide registers -> fail
02:55 cwabbott: i saw some hacks in there around it
02:55 karolherbst: yeah, doesn't work
02:55 cwabbott: maybe cbrill was trying to get it to work and never did
02:55 karolherbst: well
02:55 karolherbst: we don't spill those
02:55 karolherbst: even if we would have to
02:55 cwabbott: that's good then :)
02:56 karolherbst: well no :D
02:56 cwabbott: oh, maybe not so good
02:56 karolherbst: some CTS tests fail due to this
02:56 karolherbst: on kepler that is relevant
02:56 karolherbst: and we also have some games in wine which fail due to this
02:56 karolherbst: using a lot of tex and in the wrong order :)
02:57 imirkin: cwabbott: calim wrote all that stuff
02:57 imirkin: he's basically been away from nouveau since 2014 or so
02:57 karolherbst: I think there is something using 20 tex with all quad registers and we spill neither of those
02:57 cwabbott: oh, right, wrong person
02:58 cwabbott: anyways... yeah, those values look wrong
02:58 karolherbst: :)
02:58 karolherbst: I think I hit that bug while trying to fix that spilling
02:58 karolherbst: I got the exact same issue
02:58 cwabbott: like it's not doing the second merge
02:58 karolherbst: and I always thought I was wrong...
02:58 cwabbott: i.e. it's not merging %r147d and %r150d
02:58 imirkin: the spilling of wide values is problematic due to unrelated matters
02:59 karolherbst: imirkin: in my fix I hit that issue that regs got overwritten
02:59 cwabbott: maybe you can find out the compMask for those?
02:59 karolherbst: I try
02:59 karolherbst: ahh
02:59 karolherbst: I got them
02:59 karolherbst: wait
02:59 karolherbst: %147:33 %150:cc
03:00 karolherbst: cwabbott: and they get printed out by debug=4 nicely :)
03:00 karolherbst: https://gist.githubusercontent.com/karolherbst/a590e31f5dc7a4bd187c29fa8b7fd7b3/raw/9d35411581d2de71f479d141ba8e1fbc06654da9/gistfile1.txt
03:02 cwabbott: karolherbst: oh, i know what the issue is
03:02 cwabbott: my fix wasn't quite correct
03:05 cwabbott: https://hastebin.com/eyogijoveb.hs
03:05 cwabbott: tbh, i'm not sure how that assert didn't get triggered
03:05 cwabbott: i realized it's a bogus assert
03:06 cwabbott: actually... no, it's not a bogus assert
03:07 cwabbott: thinking about it a bit more, my patch should've disabled coalescing %r147d and %r150d
03:08 karolherbst: still the same issue though
03:08 cwabbott: since you'd be trying to coalesce two compound things
03:08 cwabbott: since in general, that's nasty, although in this case it's easy
03:08 karolherbst: and the output didn't change with your small patch
03:08 karolherbst: mhh
03:08 karolherbst: well I could not do that
03:09 karolherbst: it wouldn't be a problem
03:09 cwabbott: i realized my small patch is bogus
03:10 cwabbott: does %r147d get coalesced with anything besides %r145 and %r146?
03:11 karolherbst: no
03:11 karolherbst: well except that thing: merge b128 %r296q %r147d %r150d
03:11 karolherbst: but I guess you asked about something else
03:11 cwabbott: it does get merged with %r296q?
03:12 cwabbott: that's what i was asking about
03:12 karolherbst: into
03:12 cwabbott: hmm
03:13 cwabbott: i thought both %r147d and %r296q would be marked compound
03:13 cwabbott: so coalesceValues() would bail out
03:13 karolherbst: well, shouldn't the %r150d compound be fixed up if it is used for another compound?
03:14 cwabbott: yes, it should
03:15 cwabbott: it seems like its compMask is correct
03:15 cwabbott: as if it was merged
03:15 karolherbst: mhh actually
03:15 karolherbst: both values need to be fixed
03:16 karolherbst: %145:55 -> %145:11, %146:aa -> %146:22 %148:55 -> %148:44 and %149:aa -> %149:88?
03:17 cwabbott: yeah, that's what it should look like, if the double registers are merged into the quad register
03:17 cwabbott: that's basically what my patch was trying to do
03:17 cwabbott: my small patch
03:18 karolherbst: mhh
03:18 cwabbott: but i thought my earlier patch would've disabled the merging
03:18 karolherbst: I think the sources need to be fixed
03:18 karolherbst: or... mhh
03:19 karolherbst: mhhh
03:20 karolherbst: no, the compound stuff looks correct actually
03:20 karolherbst: 296 gets the reg 0 assigned
03:21 karolherbst: and the sources aren't checked
03:21 karolherbst: so this is fine
03:21 cwabbott: i'd try and figure out why %r147d is being merged into %r296q
03:21 cwabbott: despite them both being compound
03:21 karolherbst: 128 bit load with two 64bit sources
03:21 cwabbott: in my patch, i added something to disable that
03:21 karolherbst: store actually
03:21 karolherbst: well
03:22 karolherbst: merge b128 %r296q %r147d %r150d
03:22 karolherbst: st b128 # l[0x60] %r296q
03:22 cwabbott: right
03:22 cwabbott: that gets added by the InsertConstraintsPass
03:22 karolherbst: but I think the compound code should be correct, just the code where the parts of the compounds get their regs assigned should have a bug
03:23 karolherbst: pre RA: st b128 # l[0x60] %r147d %r150d
03:24 karolherbst: sources: merge u64 %r147d %r145 %r146 and merge u64 %r150d %r148 %r149
03:24 karolherbst: mhh
03:25 karolherbst: well before SSA this was two stores
03:25 karolherbst: st s64 # l[0x60] %r3d and st s64 # l[0x68] %r4d
03:25 karolherbst: and both got a merge, because it reads from 64 bit c[] value
03:25 cwabbott: so some pass optimized the two stores into one, i guess
03:25 karolherbst: memoryOpt
03:25 cwabbott: makes sense
03:25 cwabbott: and it inserted the merge
03:26 cwabbott: so far, so good
03:26 karolherbst: actually, not
03:26 cwabbott: why not?
03:26 karolherbst: the merges were always there
03:26 cwabbott: oh, right
03:26 karolherbst: we can't read 64bit values from c[] afaik, so I have to put two 32bit reads there and merge them
03:27 karolherbst: I could also just convert the nir opcode to two stores
03:27 karolherbst: but... that wouldn't fix the problem :)
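The "two 32bit reads and merge them" approach mentioned here amounts to packing a low and a high word into one 64-bit value, and a split is the reverse. A minimal sketch, assuming the first merge source is the low half (as in `merge u64 %r147d %r145 %r146` above); the function names are hypothetical and this models only the data layout, not the IR instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical model of a 64-bit merge: first source is the low 32 bits,
// second source is the high 32 bits.
uint64_t mergeU64(uint32_t lo, uint32_t hi)
{
    return (uint64_t(hi) << 32) | lo;
}

// Hypothetical model of a 64-bit split: returns {low word, high word}.
std::pair<uint32_t, uint32_t> splitU64(uint64_t value)
{
    return { uint32_t(value), uint32_t(value >> 32) };
}

// A merge directly followed by a split gives back the original halves,
// which is why such pairs can be cleaned up by an optimization pass.
void checkRoundTrip(uint32_t lo, uint32_t hi)
{
    const std::pair<uint32_t, uint32_t> parts = splitU64(mergeU64(lo, hi));
    assert(parts.first == lo && parts.second == hi);
}
```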
03:27 cwabbott: oh, ok... i see
03:27 cwabbott: yeah, this seems unrelated to the problem
03:27 karolherbst: but I think the current code is actually fine
03:28 karolherbst: just the last bit, where the compound sources are getting their "part" of the compound reg assigned should be buggy
03:28 cwabbott: the problem is clearly with the RA and its coalescing stuff
03:28 karolherbst: the RA part itself looks correct
03:28 karolherbst: well the register choosing bit
03:28 cwabbott: does it?
03:28 karolherbst: yes
03:29 cwabbott: it shouldn't be merging %r147d and %r296q
03:29 karolherbst: %r147d doesn't get a reg assigned
03:29 karolherbst: only %r296q
03:29 karolherbst: and %r296q gets 0
03:29 cwabbott: yeah, because apparently RA decided to coalesce them together
03:29 karolherbst: so what should happen is, that %r147d gets 0 and %r150d gets 2 based on what %r296q got
03:30 karolherbst: %296:ff && %147:33 -> 33 : 0/1
03:30 cwabbott: and that's not happening since %r147d and %r150d are getting the wrong compMask
03:30 karolherbst: %296:ff && %150:cc -> cc : 2/3
03:30 karolherbst: right?
03:31 cwabbott: yeah, that looks correct
03:31 karolherbst: %147:33 && %145:55 => 11 : 1
03:31 cwabbott: that also looks fine
03:31 karolherbst: %147:33 && %146:aa => 22 : 2
03:32 cwabbott: that's what should happen, in theory
03:32 karolherbst: %150:cc && %148:55 => 44 : 3
03:32 karolherbst: I found the bug I think :O
03:32 karolherbst: compound: %147:ff <- %145:55
03:32 karolherbst: vs
03:32 karolherbst: compound: %296:ff <- %147:33
03:32 karolherbst: compound: %150:ff <- %148:55
03:32 karolherbst: vs
03:32 karolherbst: compound: %296:ff <- %150:cc
03:32 karolherbst: I think the old mask is used to select the children of 150 and 147
03:33 karolherbst: not the one from the parent compound
03:33 cwabbott: right
03:33 karolherbst: I have no idea how to fix that :)
03:33 cwabbott: that's why my second patch goes through and ANDs the compMask of all the coalesced defs
03:34 karolherbst: ...
03:35 karolherbst: lsrc->compMask &= ldst->compMask?
03:35 cwabbott: so that when we merge %296 and %147, we and %145 and %146 with 33
03:35 cwabbott: yes
03:35 karolherbst: mhh, I swapped them, but it didn't do anything though
03:36 karolherbst: ohhh
03:36 karolherbst: we need to go one level deeper
03:36 karolherbst: because now we fixed the doubles, but not the single regs
03:37 cwabbott: right... basically that patch is trying to go one level deeper
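The "one level deeper" idea can be sketched as a recursive AND of each value's compMask with the mask of the compound it was merged into. This is a stand-alone model using the masks printed earlier in this log; `MaskNode`, `narrowToParent` and `masksFromLogExample` are hypothetical names and do not reflect how nv50_ir_ra.cpp actually stores coalesced defs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a value with a compMask and the values that
// were merged into it.
struct MaskNode {
    uint8_t compMask;
    std::vector<MaskNode *> parts;
};

// Restrict a node to the positions its parent allows, then push the
// narrowed mask down to its own parts.
void narrowToParent(MaskNode &node, uint8_t parentMask)
{
    node.compMask &= parentMask;
    assert(node.compMask != 0); // assumption: an empty mask means a bug
    for (MaskNode *part : node.parts)
        narrowToParent(*part, node.compMask);
}

// Rebuilds the example from the log: %296 (quad) <- %147, %150 (doubles)
// <- %145, %146, %148, %149 (single registers).
std::vector<uint8_t> masksFromLogExample()
{
    MaskNode r145{0x55, {}}, r146{0xaa, {}};
    MaskNode r148{0x55, {}}, r149{0xaa, {}};
    MaskNode r147{0x33, {&r145, &r146}};
    MaskNode r150{0xcc, {&r148, &r149}};
    MaskNode r296{0xff, {&r147, &r150}};
    narrowToParent(r296, 0xff);
    return {r145.compMask, r146.compMask, r148.compMask, r149.compMask};
}
```

In this model the four single registers end up with 0x11, 0x22, 0x44 and 0x88, the values both participants expected after a full merge.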
03:37 karolherbst: mhh
03:38 cwabbott: what confuses me is that 296 should never get coalesced with 147 in the first place
03:38 karolherbst: why not?
03:38 cwabbott: since they're both compound, and in my first patch i disallowed that in coalesceValues()
03:39 cwabbott: with a // TODO: handle this case
03:40 cwabbott: if they're both compound, but one of them has a compMask of ff, it's copying a subregister to the whole thing, so we could handle it with something like my second patch
03:40 cwabbott: maybe i should rephrase that
03:43 karolherbst: force == true
03:43 karolherbst: :)
03:43 cwabbott: if one of src and dst has compMask == ff, say src, then it's the same as when src isn't compound, since we're copying a subregister to a whole register or vice-versa
03:43 karolherbst: right
03:43 karolherbst: OP_MERGE get force = true
03:43 cwabbott: don't we insert extra moves before OP_MERGE?
03:44 karolherbst: and rep->compound is 0
03:44 karolherbst: it seems like not in this case
03:44 karolherbst: https://gist.githubusercontent.com/karolherbst/4fff8c76e8160156e259d0affc61d1c2/raw/cc1a033ea01ea79e325227f142356dc87d861c5a/gistfile1.txt
03:45 cwabbott: i mean, in RA
03:45 cwabbott: in the InsertConstraintMoves pass
03:46 cwabbott: what does the IR look like after InsertConstraintMoves runs?
03:46 imirkin: cwabbott: happy to share some of my collection with you if you like
03:46 imirkin: (if you're still in nyc)
03:46 karolherbst: cwabbott: should be the second one
03:47 cwabbott: karolherbst: ok, so it does insert the moves
03:47 karolherbst: well right, but not between two merges
03:49 cwabbott: yeah, that's weird
03:49 cwabbott: the code totally isn't set up to handle that
03:49 karolherbst: :)
03:50 karolherbst: imirkin has been telling me this for days basically
03:50 cwabbott: it just blindly sets the compMask for the source of the merge, assuming it's not compound or anything funky
03:51 cwabbott: which if it isn't... that would be way too much of a pain to handle
03:51 karolherbst: :)
03:51 karolherbst: well, a bit
03:51 karolherbst: I guess with movs in between it would be a bit easier?
03:52 cwabbott: well, without the moves, you also have to worry about sources interfering with each other
03:52 cwabbott: or rather, interfering with the destination
03:52 karolherbst: mhhh
03:52 cwabbott: all kinds of fun stuff
03:52 karolherbst: mhh
03:52 cwabbott: so the moves are essential for correctness
03:53 karolherbst: if I decide to split the store, I would end up with a merge+split pair
03:53 karolherbst: which could be cleaned up, but this is also more of a workaround
03:53 cwabbott: i would just try and figure this out
03:54 cwabbott: the moves are supposed to be inserted by this code: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#n2322
03:56 cwabbott: figure out why it isn't triggering for 296, maybe
04:00 karolherbst: mhhh
04:01 karolherbst: it hits that continue
04:01 karolherbst: well right, because the defi->op == OP_MERGE
04:02 karolherbst: ohh wait, that is fine
04:02 cwabbott: wait, yeah... that's broken
04:03 karolherbst: okay, it isn't fine :)
04:04 karolherbst: pass
04:04 cwabbott: normally, it would be ok, if it were any other op... since defi only has one use (our use), we don't have to worry about the source interfering with the dest
04:04 karolherbst: if (cst->getSrc(s)->refCount() == 1 && !defi->constrainedDefs() && defi->op != OP_MERGE) { ?
04:04 cwabbott: yeah
04:04 karolherbst: OP_SPLIT as well?
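The condition proposed at 04:04 can be modelled in isolation as a predicate that decides whether the constraint move may be skipped for a source. This is a sketch under assumptions: `SketchOp` is a local stand-in enum rather than the real opcode list, `canSkipConstraintMove` is a hypothetical name, and the OP_SPLIT question is left open exactly as it is in the log.

```cpp
#include <cassert>

// Local stand-in for a few opcodes; not the real nv50_ir operation enum.
enum SketchOp { SK_OP_MOV, SK_OP_MUL, SK_OP_MERGE };

// Returns true when no extra move is needed before the constrained
// instruction: the source has a single use, its defining instruction has
// no constrained defs, and (the proposed addition) it is not a merge.
bool canSkipConstraintMove(int refCount, bool hasConstrainedDefs, SketchOp defOp)
{
    assert(refCount >= 1); // a source that is being used has at least one use
    return refCount == 1 && !hasConstrainedDefs && defOp != SK_OP_MERGE;
}
```

With the extra check, a source defined by a merge always gets a move, which is the case that was previously falling through to the `continue`.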
04:06 karolherbst: well, lets see what piglit has to say about this now
04:06 karolherbst: cwabbott: well, we need something to remove those silly moves post RA though
04:07 cwabbott: RA should be able to coalesce them away though
04:07 karolherbst: https://gist.githubusercontent.com/karolherbst/351c407c7f09725386851166122890e1/raw/85c60280394e66524a0acc6a2ddcef2f3e89563a/gistfile1.txt
04:07 karolherbst: :)
04:07 karolherbst: those 4 movs are quite pointless
04:08 karolherbst: because the store could just read from $r4
04:08 cwabbott: ah, yeah
04:08 cwabbott: that's probably caused by what i was talking about
04:08 cwabbott: the case where both are compound and compMask == 0xff
04:08 cwabbott: for one of them
04:09 cwabbott: and we can handle it, but we have to fix up the child defs
04:10 karolherbst: I doubt that, I think that value just gets the reg 0 assigned, because it is free
04:10 karolherbst: actually
04:10 karolherbst: the value thinks that 4-7 are taken
04:10 cwabbott: no, i'm pretty sure that's it
04:11 cwabbott: it assigns the second quadword to r0
04:11 karolherbst: https://gist.githubusercontent.com/karolherbst/a4d61b85e49f0aac41c5975f2289526a/raw/f4da4b3572cc984beb916f3c68313ea4b794f3c3/gistfile1.txt ?
04:11 cwabbott: and the doublewords to r4 and r6
04:11 karolherbst: mhh
04:12 cwabbott: if you coalesced the second quadword with the doublewords, it would know to assign the quadword to r4 instead
04:12 karolherbst: ahh, right
09:18 pmoreau: imirkin: From what I remember reading about fp16 support in GM20B+ (and looking at the CUDA C API), operations like hadd still take 2 32-bit sources and output to 1 32-bit destination, but the instruction will do one 16-bit add on the lower part and another one on the higher part.
09:19 pmoreau: I think those are done simultaneously with a single hw instruction, but I’d need to check again.
09:20 pmoreau: As for loading/extracting 2 fp16 to/from a 32-bit register, I haven’t checked if there is an instruction for that or whether the driver has to do it.
09:21 pmoreau: I guess I could write a small kernel and see what’s being done. :-) Let’s do that!
09:42 pmoreau: imirkin, karolherbst: Here are a couple of simple CUDA kernels using fp16, and the corresponding SASS for SM 6.0: https://hastebin.com/kujanolubo.cs https://hastebin.com/icixodacot.swift
09:43 pmoreau: So, HADD2 is indeed a single hw instruction that takes 32-bit registers as source/destination.
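The register layout pmoreau describes (two 16-bit values packed into one 32-bit register, with the operation applied to the low and high halves independently) can be sketched like this. The sketch deliberately takes the per-lane operation as a parameter instead of implementing fp16 arithmetic, so it shows only the lane structure; all names are hypothetical and nothing here is taken from the SASS or the CUDA API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Pack two 16-bit lanes into one 32-bit register image (low lane first).
uint32_t packLanes(uint16_t lo, uint16_t hi)
{
    return (uint32_t(hi) << 16) | lo;
}

uint16_t lowLane(uint32_t reg)  { return uint16_t(reg & 0xffff); }
uint16_t highLane(uint32_t reg) { return uint16_t(reg >> 16); }

// Apply `op` to the low lanes and to the high lanes separately, the way a
// 2x16 instruction treats its 32-bit sources. Nothing carries over from
// one lane into the other.
uint32_t perLane(uint32_t a, uint32_t b,
                 const std::function<uint16_t(uint16_t, uint16_t)> &op)
{
    assert(static_cast<bool>(op));
    return packLanes(op(lowLane(a), lowLane(b)),
                     op(highLane(a), highLane(b)));
}
```

Passing an fp16 add as `op` would give the HADD2-style behaviour; passing an integer add gives something closer to the VADD comparison made later in the log.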
10:39 imirkin: ok, so it's like VADD*
10:40 imirkin: oh lol, that's cheating. there's just a f16vec2 type on which the hadd thing operates? super.
10:42 imirkin: XMAD.PSL.CLO R0, R4, 0x1, R2;
10:42 imirkin: i guess that's faster than BFI somehow?
10:47 pmoreau: Could be. XMAD just does so many things :-D
15:57 karolherbst: imirkin: is there any kind of 64 bit rdsv or are all results 32 bit?
16:04 imirkin_: all 32-bit
16:04 imirkin_: some values are 64-bit
16:04 imirkin_: so you have to do 2x rdsv
16:04 imirkin_: (like clocks)
16:06 karolherbst: ahh, I see
16:28 RSpliet: Imma just going to drop this here: https://cs.unc.edu/~anderson/papers/rtss17c.pdf . I don't think it contains anything shocking about work scheduling, but nice to see it on paper.
16:29 RSpliet: Wonder if we can use the priority mechanism to prioritise full-screen workloads (read: games) over say the compositor... or whether it's a compute-context only property
16:31 RSpliet: And whether it might help close the gap in perf between playing games on gnome-shell vs. something like i3
16:39 karolherbst: mupuf: .... "remote: fatal: Out of memory, malloc failed (tried to allocate 292587109 bytes)"
16:39 karolherbst: mupuf: that apitrace repository
16:39 mupuf: where is that?
16:40 karolherbst: apitraces actually
16:40 karolherbst: I guess somebody pushed a trace too big for your machine....
16:40 mupuf: holy shit!
16:41 mupuf: add a swap file
16:41 mupuf: and clone it ;)
16:41 karolherbst: ....
16:41 karolherbst: uhm
16:41 karolherbst: it is your machine which runs out of ram
16:41 RSpliet: can't even allocate 280MiB? Wuss! :-P
16:41 karolherbst: "remote: Compressing objects"
16:42 mupuf: you mean the server?
16:42 karolherbst: yes
16:42 mupuf: it has 2 GB
16:42 karolherbst: :D
16:42 karolherbst: not enough
16:42 karolherbst: check it out
16:42 karolherbst: I pull again
16:42 karolherbst: or clone
16:43 mupuf: and 512 MB of swap
16:43 mupuf: I can increase that
16:43 karolherbst: :) well more RAM would be better
16:43 karolherbst: I think the issue is, that the entire repository is compressed
16:43 karolherbst: at clone time
16:43 karolherbst: and I doubt it caches stuff on the disc
16:43 karolherbst: so it keeps every trace in ram
16:44 RSpliet: Luckily cloning is infrequent, so increasing swap space slightly is not an unreasonable solution :-)
16:44 mupuf: yep
16:44 karolherbst: "slightly"
16:44 karolherbst: it failed before even reaching 50% of all objects
16:44 mupuf: karolherbst: but yeah, your git process is only increasing in size
16:44 karolherbst: :)
16:44 karolherbst: 7/27
16:44 mupuf: 75% of the RAM now
16:45 karolherbst: see the problem?
16:45 mupuf: I see ;)
16:45 RSpliet: mupuf: perhaps kill systemd to clear up some ram :-P
16:45 karolherbst: mupuf: you could put 8GB ram + zram
16:45 karolherbst: that should work for a while
16:45 karolherbst: or maybe reconfigure the git server to do less silly things like that
16:45 karolherbst: I don't know what the proper way here is
16:46 karolherbst: or maybe setup a git lfs server?
16:46 karolherbst: or maybe we should use plain ftp for the traces...
16:46 mupuf: karolherbst: do you know where this server is?
16:46 karolherbst: I have no idea
16:46 mupuf: Roubaix, France
16:46 karolherbst: :) nice
16:47 mupuf: it is a dedicated server
16:47 mupuf: hosted by OVH
16:47 karolherbst: your machine or rented?
16:47 mupuf: rented
16:47 RSpliet: mupuf: Just add a Lille bit of RAM ;-)
16:47 mupuf: added 8 GB of swap
16:47 mupuf: it will get slow, but it should be enough
16:48 karolherbst: mupuf: k, compressing should be slow enough to not matter much here
16:48 mupuf: if 10 GB is not enough to clone the repo, we have a big problem
16:48 karolherbst: I guess I hit swap already?
16:48 mupuf: nope
16:48 karolherbst: okay
16:48 karolherbst: it is slow anyway
16:48 mupuf: and yeah, PTI probably slows the entire machine down even more
16:48 karolherbst: doubtful
16:49 karolherbst: compressing stuff usually doesn't include syscalls
16:49 karolherbst: or you do it wrong
16:49 RSpliet: Just asking for a friend: why do we need version control on traces?
16:49 karolherbst: I know somebody who wanted that
16:49 RSpliet: Isn't git simply the wrong tool for the job?
16:49 mupuf: RSpliet: nope, but that gave easy access control
16:50 RSpliet: Ah! mupuf: That means solutions don't necessarily imply hardware upgrades. Rather "just" time... don't know which is cheaper :-D
16:50 mupuf: ;)
16:51 mupuf: karolherbst: now it is swapping a little
16:51 mupuf: ok now it is swapping properly
16:51 karolherbst: mhh
16:51 karolherbst: I am at 33%
16:52 mupuf: let's hope the biggest traces were at the start
16:52 RSpliet: Or try cloning an older revision and work your way up step-wise to reduce per-pull RAM usage
16:53 mupuf: yep :)
16:56 mupuf: I guess it made it out of the hardest part. the memory usage went down
16:57 karolherbst: well, still at 33%
16:58 mupuf: it does not use CPU anymore
16:58 karolherbst: mhhh
16:58 karolherbst: maybe it gave up :/
16:58 karolherbst: I'll try again
16:58 mupuf: no, let it continue
16:59 mupuf: Gosh, my irc is slower because of you :D
16:59 karolherbst: duh
16:59 mupuf: it is just swapping, that's why it is slow
16:59 mupuf: let it finnish
16:59 mupuf: finish*
16:59 karolherbst: k
17:04 mupuf: now it is using both cores, I guess it is making progress
17:09 mupuf: karolherbst: welcome to fd.o!
17:09 mupuf: right now it is a little restricted, because of meltdown
17:09 karolherbst: wow, that was fast
17:09 karolherbst: :) thanks
17:10 mupuf: Y w
17:32 karolherbst: nice, I think I'll get the nir stuff pretty complete by the end of this week :) 64 bit types are working quite nicely now
17:34 imirkin_: karolherbst: no more RA crashes?
17:34 karolherbst: exactly :)
17:34 karolherbst: only input/output slotting issues left
17:34 imirkin_: yay
17:34 imirkin_: is that with cwabbott's thing, or with the split/merge stuff?
17:35 karolherbst: with cwabbott's thing
17:35 karolherbst: we debugged outstanding issues with that patch yesterday
17:35 karolherbst: and fixed it :)
17:36 karolherbst: and I have a fairly good understanding of that compound stuff now as well :)
17:37 karolherbst: I still do some split/merges though, but usually only for pack/unpack stuff and where we really have to do 32 bit stuff like c[] access or 64 bit op lowering stuff and so on
17:42 karolherbst: imirkin_: with doubles and int64 enabled: [26075/26075] skip: 1576, pass: 22505, warn: 9, fail: 1971, crash: 14
17:42 karolherbst: :)
17:44 imirkin_: nice
17:44 imirkin_: were you able to remove all your hacks?
17:46 karolherbst: 64 bit tyoe related, yes
17:46 karolherbst: *type
17:47 imirkin_: cool
17:47 imirkin_: why all the fail?
17:47 imirkin_: you were doing much better before
17:47 karolherbst: slotting
17:47 imirkin_: ah :(
17:47 karolherbst: uhm, well
17:47 imirkin_: yes, that is important, and there are a LOT of tests for it
17:47 karolherbst: before 64 bit stuff was disabled :)
17:47 karolherbst: the amount of passes went up, right?
17:48 karolherbst: so everything is fine :)
17:48 karolherbst: mhh
17:48 karolherbst: I still split access to FILE_MEMORY_BUFFER
17:48 karolherbst: can I do a 64 bit load from FILE_MEMORY_BUFFER?
17:49 karolherbst: well, I clean up the code properly and fix all the regressions I encounter, should be easier
17:50 karolherbst: I want to move all the load and store stuff into two methods anyway
17:55 karolherbst: mhh and bool type conversions sometimes have some odd things, but there is no way I can clean that up and let codegen handle it, because we don't have bool types :)
17:55 karolherbst: or at least 32 bit bools
18:11 imirkin_: 64-bit memory loads are possible
18:11 imirkin_: that's actually a place where nir will do better than tgsi
18:11 imirkin_: with tgsi, we have to load 32-bit at a time
18:11 imirkin_: since we have to check for length for every 32-bit component
18:12 imirkin_: since we can't distinguish 2x 32-bit reads vs 1x64-bit read
18:12 imirkin_: with nir, if you load 1x64-bit and it's partly out of bounds, you can just nuke the whole thing
18:12 imirkin_: that said, the lowering/emitter might not be ready for 64-bit loads/stores. however if that's the case, it should be trivial to fix.
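A minimal sketch of the bounds-checking difference described above. It assumes a buffer modelled as a Python `bytes` object and out-of-bounds reads that return zero; the function names `load_2x32` and `load_1x64` are hypothetical, and this is not the nouveau lowering code.

```python
import struct

def load_2x32(buf, offset):
    """Two independent 32-bit loads: each component is bounds-checked on its own."""
    out = []
    for i in range(2):
        off = offset + 4 * i
        if off >= 0 and off + 4 <= len(buf):
            out.append(struct.unpack_from("<I", buf, off)[0])
        else:
            out.append(0)  # assumption: an out-of-bounds component reads as zero
    return tuple(out)

def load_1x64(buf, offset):
    """One 64-bit load: a single check, and a partial overrun drops the whole value."""
    if offset >= 0 and offset + 8 <= len(buf):
        return struct.unpack_from("<Q", buf, offset)[0]
    return 0  # assumption: the whole load is zeroed when any byte is out of bounds

buf = bytes(range(1, 13))  # 12-byte buffer, so an 8-byte read at offset 8 overruns
lo, hi = load_2x32(buf, 8)
whole = load_1x64(buf, 8)
```

With a 12-byte buffer and offset 8, the split version still returns the low word and zeroes only the high one, while the single 64-bit load is discarded entirely.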
18:23 karolherbst: imirkin_: okay, I can't use 64 bit buffers loads :)
18:23 karolherbst: *buffer
18:24 karolherbst: I end up with a 21: union u32 %r125d %r123 %r124 (0) which the emitter doesn't like
18:28 imirkin_: uhhh
18:28 imirkin_: that's wrong
18:28 imirkin_: why is it a 64-bit dest
18:28 imirkin_: but 32-bit src type and args?
18:29 karolherbst: https://gist.githubusercontent.com/karolherbst/cbb9216ff26b81719333bd5adcaf0168/raw/311ae2dcf009f075ef54ff5b323de13a460c058e/gistfile1.txt
18:29 karolherbst: ld u64 %r10d b[0xa0]
18:30 imirkin_: ok. so it's the lowering pass that fucks up
18:30 imirkin_: lowering_nvc0::handleLOAD probably
18:30 karolherbst: maybe yes
18:30 karolherbst: I guess it doesn't expect 64 bit loads :)
18:31 karolherbst: ohh yeah, all unions are created with U32
18:31 karolherbst: and it is handleLDST
18:32 karolherbst: I better not mess with it at this point
18:32 karolherbst: it is one line change for me to workaround that issue, so it is fine
18:33 imirkin_: k
18:33 karolherbst: skeggsb: we might have some memory leaks still, after some piglit runs I see generally higher memory consumption, but I don't really know if that is related
18:34 karolherbst: also I get the feeling that the machine crashes happen after a certain number of runs, not randomly
18:53 karolherbst: mupuf: I had to restart, but git clone was just reaching 40%
19:00 yann-kaelig: elo
19:00 yann-kaelig: What's Up, Doc?
19:54 karolherbst: imirkin: are 64 bit outputs a thing?
19:54 karolherbst: I mean, in terms of having a dvec4 shader output
20:08 karolherbst: imirkin: let's ship it: [26075/26075] skip: 1576, pass: 24318, warn: 9, fail: 158, crash: 14
20:08 karolherbst: :p
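A small sketch for comparing the two piglit summary lines quoted in this log (17:42 and 20:08). The regex and the helper name `parse_summary` are assumptions made for illustration; piglit itself is not used here.

```python
import re

# Matches the "name: count" pairs in a piglit-style summary line.
SUMMARY_RE = re.compile(r"(skip|pass|warn|fail|crash): (\d+)")

def parse_summary(line):
    """Return a dict mapping each result category to its count."""
    return {name: int(count) for name, count in SUMMARY_RE.findall(line)}

before = parse_summary(
    "[26075/26075] skip: 1576, pass: 22505, warn: 9, fail: 1971, crash: 14")
after = parse_summary(
    "[26075/26075] skip: 1576, pass: 24318, warn: 9, fail: 158, crash: 14")

# Per-category change between the two runs.
delta = {name: after[name] - before[name] for name in before}
print(delta)
```

Between the two runs the fail count drops by exactly the amount the pass count rises, with skips, warnings and crashes unchanged.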
20:10 mupuf: karolherbst: nir?
20:11 karolherbst: yeah
20:11 mupuf: how does it compare to the non-nir path?
20:11 karolherbst: quite similar, I think the tgsi one has like 6 less crashes and 50 less fails
20:12 karolherbst: piglit summary: 38313/38502 vs 37926/38148
20:12 mupuf: and how stable are piglit results anyway?
20:12 mupuf: Gosh, we really need cibuglog out in the open
20:12 karolherbst: there are quite a lot of disabled arb_shader_image_load_store tests
20:13 karolherbst: mhh
20:13 karolherbst: the RealisticRendering Demo still doesn't like the nir path
20:14 karolherbst: well, I guess I need to fix the other fails and after that run deqp to find all the other fails
20:16 karolherbst: ohh there are some tests with dvec4 outputs
20:16 karolherbst: nice
20:20 karolherbst: ... linterp flat f32 %r2d a[0xa0] ....
20:20 karolherbst: fail
20:26 pmoreau: mupuf: What is cibuglog? :-)
20:26 mupuf: well, it collects CI results and maps failures to known bugs using filters
20:27 pmoreau: Fancy!
20:27 mupuf: The first version I made (read-only): https://intel-gfx-ci.01.org/cibuglog/
20:27 mupuf: and this does not have filters
20:27 mupuf: the new version is almost ready
20:28 mupuf: and it has filters, allowing to track multiple trees, and allowing regexps on stdout, stderr, and dmesg
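A rough sketch of the filter idea described above: regexps applied to a result's stdout, stderr, or dmesg, scoped per tree, mapping a failure to known bugs. Every name here (`Filter`, `match_bugs`, the field names, the bug IDs) is hypothetical; this is not cibuglog's actual data model or API.

```python
import re
from dataclasses import dataclass

@dataclass
class Filter:
    bug: str      # hypothetical bug identifier this filter maps to
    tree: str     # regex on the tree name the result came from
    field: str    # which output to search: "stdout", "stderr" or "dmesg"
    pattern: str  # regex that marks the failure as a known issue

def match_bugs(result, filters):
    """Return the bugs whose filters match this failed result."""
    bugs = []
    for f in filters:
        in_tree = re.search(f.tree, result["tree"]) is not None
        hit = re.search(f.pattern, result.get(f.field, "")) is not None
        if in_tree and hit:
            bugs.append(f.bug)
    return bugs

filters = [
    Filter("BUG-1", r"^drm-tip$", "dmesg", r"GPU hang"),
    Filter("BUG-2", r".*", "stderr", r"Assertion .* failed"),
]
result = {"tree": "drm-tip", "stdout": "", "stderr": "",
          "dmesg": "[  12.3] GPU hang on ring 0"}
print(match_bugs(result, filters))
```

Failures that match no filter would be the ones left for a human to triage.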
20:29 pmoreau: I like the output of the first version :-)
20:35 mupuf: lol
20:36 mupuf: if you want more graphs: https://intel-gfx-ci.01.org/cibuglog/metrics.html
20:37 mupuf: damn, 31 checkboxes to tick to say that everything is fine security-wise
20:37 mupuf: sometimes, you have to document along the way
20:38 mupuf: this is going to take me another whole day just for that!
20:38 mupuf: Don't implement backdoors: OK!
20:38 mupuf: :D
20:38 pmoreau: :-D
20:38 mupuf: sorry, wrong window :D
20:38 pmoreau: Yup ;)
20:39 mupuf: so yeah, cibuglog's graphs are quite a good way to know how the CI system is being used
20:40 mupuf: and allow developers to know when they will get feedback
20:41 pmoreau: Yes, more graphs! \o/
22:31 Lyude: mupuf: what was the watchdog thing that you made for your systems a while back
22:33 Lyude: apparently msi had the brilliant idea of disabling the hardware watchdog on this ryzen board
22:35 imirkin_: i have taken 'msi' off of my list of companies i can buy hw from
22:35 imirkin_: they blatantly lie on their product descriptions
22:36 imirkin_: they gave me a G84 marketed as a GeForce 210 with DX 10.1 support.
22:36 imirkin_: [which G84 does not, btw]
22:45 mupuf: Lyude: it is called wtrpm
22:45 mupuf: I have some schematic
22:46 mupuf: and some software, although the new hardware is not supported yet (still working on it)
22:46 mupuf: it is on my desk and I fiddle with it more physically than figuratively
22:48 Lyude: imirkin_: yeah, I just got them because I figured it couldn't be that bad. I mean, no one could be clueless enough to disable a watchdog chip that's already soldered onto the board, right?
22:48 Lyude: (wrong, apparently)
22:49 mupuf: Lyude: watchdogs are anything but trustworthy
22:49 mupuf: one of my SKL has a watchdog that indeed cuts the power but fails to resume :o
22:49 Lyude: mupuf: i've actually had really good luck with them when I could actually get the hw working
22:49 mupuf: lucky you...
22:49 Lyude: yeah; some of the thinkpad carbons will have a heart attack if you enable the watchdog then s/r
22:50 Lyude: however, considering lenovo's incredible prowess with firmware i'm amazed those machines even turn on half the time..
22:52 Lyude: mupuf: would the software work for any old system with a few gpio ports on it
22:53 mupuf: Lyude: provided you hack it a little bit, yes :)
22:54 Lyude: sure thing
23:23 karolherbst: sigh... struct TEXCOORD10_centroid { vec4 Data; } -> type->is_array() ? type->without_array()->components() : type->components() is 0
23:23 karolherbst: annoying
23:26 karolherbst: nice
23:26 karolherbst: OpenGL 4.3 RealisticRenderer works now as well :)