00:00karolherbst: imirkin: target->isLimm(ImmediateValue*)?
00:00karolherbst: although mhh
00:01karolherbst: we actually need an insnCanLoad where we can just overwrite the op
00:01karolherbst: still the issue with depending on what other srcs have set
01:14karolherbst: imirkin: volta: IMAD R0, R0, R2, 0x12345678;
01:24HdkR: karolherbst: Hm?
01:26karolherbst: HdkR: before that you could only stick a limm on a fmad
01:26karolherbst: not imad
01:26karolherbst: I think
01:27karolherbst: not 100% sure
01:27karolherbst: and it was actually on src1...
01:27karolherbst: I have to try more stuff
01:27skeggsb: volta doesn't really have "limm", it's 32-bit on all instructions that take them
01:28skeggsb: some (but not all) can take it on src1 or src2
01:28karolherbst: well, "IMAD R0, R0, 0x12345678, R2;"
01:28karolherbst: skeggsb: yeah, I figured as much
01:28karolherbst: bits 32-63 or something, right?
01:28skeggsb: yeah, for both forms
01:28karolherbst: are there alu instructions that can't do that?
01:28karolherbst: I guess that's valid for all
01:28karolherbst: and always the same place
01:29skeggsb: not all of them support the src1 immediate form
01:29skeggsb: or maybe the other way around, can't recall off the top of my head
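A minimal sketch of the immediate placement described above. It assumes, as karolherbst and skeggsb suggest, that the 32-bit immediate sits in bits 32..63 of the first 64-bit word of the 128-bit Volta instruction, for both the src1 and the src2 form. The helper names `set_imm32`/`get_imm32` are hypothetical, and the remaining instruction fields are not modelled:

```python
MASK32 = 0xFFFF_FFFF

def set_imm32(lo_word: int, imm: int) -> int:
    """Return lo_word with bits 32..63 replaced by a 32-bit immediate.

    Bits 0..31 (opcode/register fields, not modelled here) are preserved.
    Assumption: the immediate lives in bits 32..63 of the low 64-bit word.
    """
    if not 0 <= imm <= MASK32:
        raise ValueError("immediate must fit in 32 bits")
    return (lo_word & MASK32) | (imm << 32)

def get_imm32(lo_word: int) -> int:
    """Extract the 32-bit immediate from bits 32..63 of the low word."""
    return (lo_word >> 32) & MASK32

# Hypothetical low bits 0x1234 stand in for the rest of the encoding.
lo = set_imm32(0x1234, 0x1234_5678)
print(hex(lo), hex(get_imm32(lo)))  # 0x1234567800001234 0x12345678
```

Because the field position is assumed to be the same for both forms, only the opcode/source-selection bits (not modelled) would distinguish an immediate in src1 from one in src2.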
01:29karolherbst: I guess for some it doesn't make a difference
01:29karolherbst: like add3
01:29karolherbst: maybe there is even max3/min3 now
01:30karolherbst: sadly no
01:30karolherbst: skeggsb: I found a difference between sm_70 and sm_72
01:31skeggsb: yeah, not surprised, figured there'd be some reason for them to distinguish it :P
01:31karolherbst: but hum...
01:31skeggsb: nfi what chipset they map to though
01:31karolherbst: "IMAD.U32 R2, RZ, RZ, c[0x0][0x160];" they do stuff like that
01:32karolherbst: opt level 3
01:32karolherbst: any clue what this is all about?
01:32skeggsb: none... is there no MOV instruction? :P
01:32karolherbst: seems like sm_70 doesn't know IMAX
01:33karolherbst: or at least not IMNMX
01:33karolherbst: or maybe it is slow
01:33karolherbst: wouldn't have thought that a max(max(a, b), c) would show a difference between sm70 and sm72
01:33skeggsb: yeah, i didn't find IMAX/IMNMX in my scans apparently
01:34karolherbst: I am sure it is there
01:34skeggsb: doesn't mean it's not there, there's a couple of others i had to hunt extra hard for, because nvdisasm is dumb
01:36karolherbst: nvdisasm says no
01:36karolherbst: skeggsb: do you know clcc?
01:36karolherbst: although, I guess you wouldn't need it
01:38karolherbst: skeggsb: maybe it is broken
01:39karolherbst: skeggsb: I am actually wondering what happens if we put the SM_72 imnmx opcode into the hardware
07:23hakzsam: imirkin: no, I think the late algebraic pass has been added after (for shldadd)
07:23hakzsam: add3 should be there I guess
07:58tomeu: tagr: you should be able to use the system 6.0 LLVM with the two top commits in this branch: https://github.com/tomeuv/SPIRV-LLVM-Translator/commits/for_nouveau_6.0
08:49tagr: tomeu: interesting, thanks
08:49tagr: tomeu: what's your development target? Tegra?
08:49tomeu: tagr: was using a jetson tk1 for these experiments, yeah
08:50tagr: tomeu: cool, I've only tested on TX1 so far, but great to hear that it's working on TK1 as well
08:51tagr: though, admittedly, it isn't very surprising
08:51tomeu: things were smoother than I was expecting
08:53tagr: tomeu: I like it when that happens =)
08:58karolherbst: hakzsam: I guess I'll take care of that already. But we found a nice opt we could move earlier: iadd(neg(a), neg(b)) -> iadd3(a, -b, -c), otherwise we would have to do iadd(neg(a), -b)
09:00karolherbst: but maybe this should stay later as well, because it could interfere with opting into mad or something
09:00karolherbst: but ModifierFolding is after algebraic opt anyway
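A small model of why folding into an add3 helps: one three-source add carrying two neg modifiers computes the same 32-bit result as negating both operands separately and then adding. This sketch assumes the constraint karolherbst mentions later in the log (a neg on src0 or src1, plus one on src2), so it places the negated operands in src0 and src2 with a zero in src1. `iadd3` and `fold_neg_add` are hypothetical helpers, not nouveau codegen functions:

```python
MASK32 = 0xFFFF_FFFF

def neg32(x: int) -> int:
    """Two's-complement negation, truncated to 32 bits."""
    return (-x) & MASK32

def iadd(a: int, b: int) -> int:
    """Plain 32-bit add with wrap-around."""
    return (a + b) & MASK32

def iadd3(srcs, negs) -> int:
    """Model of a 3-source add; negs[i] applies a neg modifier to srcs[i]."""
    total = 0
    for value, neg in zip(srcs, negs):
        total += neg32(value) if neg else value
    return total & MASK32

def fold_neg_add(a: int, b: int):
    """Rewrite iadd(neg(a), neg(b)) as one iadd3 with two neg modifiers.

    Assumed placement: neg on src0 and src2, zero in src1.
    """
    return (a, 0, b), (True, False, True)

a, b = 7, 0xFFFF_FFF0
before = iadd(neg32(a), neg32(b))      # reference: separate negs, then add
after = iadd3(*fold_neg_add(a, b))     # single folded instruction
print(hex(before), hex(after))         # 0x9 0x9
```

The equivalence is plain modular arithmetic, so whether the rewrite pays off is only a question of instruction count and of whether it blocks other folds such as the mad fusion discussed above.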
09:03tagr: btw, has anyone else been seeing this: http://paste.debian.net/1022628/
09:03karolherbst: tagr: uhhh
09:04karolherbst: systemd again
09:04karolherbst: I wouldn't even know why systemd-logind needs a GPU context
09:04HdkR: Because it wants to do EVERYTHING
09:04karolherbst: there are things it's good for it to do
09:04karolherbst: like saving backlight status
09:05karolherbst: but, you don't need a GPU context for that
09:05karolherbst: or doing permission things, like only allowing real users to use the GPU for displaying stuff or something
09:05karolherbst: but still
09:07karolherbst: tagr: mind figuring out what exactly systemd-logind is doing here?
09:07karolherbst: I know we have some reports with systemd-logind around, but never really got to figure out what the hell is going on
09:10karolherbst: mhh, list of open files doesn't show anything obvious to me as well
09:19tagr: hmm... I was assuming that this was just reporting some top-level systemd-logind process that had spawned some GUI, but seems like that systemd-logind process is standalone
09:19tagr: what would it even do pushing stuff to Nouveau?
09:31karolherbst: tagr: yeah, exactly
09:31karolherbst: well there seems to be some memory copy stuff going on
09:31karolherbst: but ... I have no clue
09:31karolherbst: tagr: mind checking what files systemd-logind has opened? maybe there is something obvious
09:42karolherbst: imirkin, hakzsam: do you know if we do shl(neg(a), b) -> neg(shl(a, b))?
09:44karolherbst: I didn't find anything yet, but this should allow us to eliminate some neg(a) instructions
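The rewrite shl(neg(a), b) -> neg(shl(a, b)) is sound in 32-bit two's complement: a left shift by s is multiplication by 2^s modulo 2^32, and negation commutes with multiplication modulo 2^32. A quick check, assuming shift amounts in 0..31 (how the hardware treats larger shift amounts is not covered here):

```python
MASK32 = 0xFFFF_FFFF

def neg32(x: int) -> int:
    """Two's-complement negation, truncated to 32 bits."""
    return (-x) & MASK32

def shl32(x: int, s: int) -> int:
    """32-bit left shift; s is assumed to be in 0..31."""
    assert 0 <= s < 32
    return (x << s) & MASK32

def shl_neg_equivalent(a: int, s: int) -> bool:
    """Check shl(neg(a), s) == neg(shl(a, s)) for one input pair."""
    return shl32(neg32(a), s) == neg32(shl32(a, s))

samples = [0, 1, 5, 0x7FFF_FFFF, 0x8000_0000, MASK32]
print(all(shl_neg_equivalent(a, s) for a in samples for s in range(32)))  # True
```

Moving the neg outward like this lets a later pass fold it as a source modifier into whichever instruction consumes the shift result.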
10:00tagr: karolherbst: looks like it's opened /dev/dri/card0, but I think that's just for monitoring, I don't see any code in systemd-logind that would actually talk to Nouveau via pushbuffer
10:01karolherbst: tagr: I was more thinking about using libdrm or something
10:01karolherbst: doing some kms stuff
10:01tagr: well, unless there's something implicitly pushing data, like maybe fbcon
10:01karolherbst: might be
10:01karolherbst: I am not really familiar with this area
10:01tagr: this happens on both 4.14 LTS and 4.16 kernels, by the way
10:01tagr: it's odd that I've never noticed until a couple of days ago
10:02karolherbst: well, it's something coming from userspace
10:02karolherbst: the kernel version shouldn't matter
10:02tagr: I usually notice because X freezes for a couple of seconds (the freeze is actually longer/indefinite on v4.16, I /think/)
10:02karolherbst: are you using the modesetting DDX?
10:03karolherbst: but, hum, this shouldn't appear as logind
10:04tagr: and yes, I'm using the DDX
10:05tagr: should I retry this with glamor?
10:07karolherbst: modesetting is a DDX as well and uses glamor for accel
10:07karolherbst: I was more asking about if you use modesetting or the nouveau one
10:07tagr: I meant "Nouveau DDX"
10:07tagr: sorry =)
10:08karolherbst: then I am kind of out of ideas
10:10tagr: it's really odd that this shows up as systemd-logind, X is nowhere near a child of that process
10:10karolherbst: right, it was just a random thought
10:10tagr: oh wait... I think this must be logind forwarding the file descriptor
10:11tagr: for rootless X I assume
10:11karolherbst: might be, yes
10:11karolherbst: maybe the nouveau ddx does something which is illegal for tegra?
10:11karolherbst: then again
10:11karolherbst: you don't run any ddx on the nvidia GPU usually
10:12karolherbst: I mean on tegra
10:12karolherbst: the stuff is done on the tegra display thingy, no?
10:12tagr: oh... sorry this is on a desktop =\
10:12karolherbst: tagr: this might be the offloading copy from nouveaudrm to tegradrm?
10:12tagr: I suck at bug reports =)
10:12karolherbst: it was just my assumption
10:12karolherbst: what GPU?
10:13tagr: 01:00.0 VGA compatible controller: NVIDIA Corporation G96GL [Quadro FX 380] (rev a1)
10:14karolherbst: what desktop environment?
10:14tagr: on X
10:14karolherbst: okay, so nothing running really
10:15tagr: well, there's a ton of chromium tabs, as usual
10:15tagr: come to think of it, I've only ever seen the freezes when doing something in chromium
10:15karolherbst: do you do something special before that happens or are those just random messages without any consequences?
10:16karolherbst: tagr: maybe it is something trivial like running out of vram and we usually don't care
10:16tagr: it's happened twice when I was sorting through spam on gmail, and it's happened earlier when I was doing something in a bug tracker
10:16karolherbst: does it happen if you open tons of tabs?
10:16karolherbst: but doing real stuff
10:16karolherbst: no default tabs
10:16tagr: (I remember because I was annoyed to have to retype the message I had just carefully crafted for 10 minutes)
10:17tagr: honestly, I haven't tried with just a few tabs because I can't get myself to close all of them, perhaps I should
10:17karolherbst: I assume the pushbuf is real for now and that no corruption or anything happened (this could happen, but only with tons of contexts and stuff)
10:18karolherbst: tagr: well, my thought was if it is possible to crash the GPU by only opening a lot of tabs
10:18karolherbst: it obviously tries to do a copy from/to a non-present page
10:19karolherbst: and it might be just due to being out of vram and usually we have no real mechanism to detect that or even prevent crashing
10:19karolherbst: at least this is my working theory for now
10:19karolherbst: I am sure imirkin might have a better guess
10:19tagr: hmm... sounds plausible
10:19tagr: I'm pretty sure it's never happened out of the blue, I actually need to actively do something in chromium to trigger it
10:20karolherbst: mhh another thing to test
10:20karolherbst: could you check what the status is?
10:21karolherbst: if it doesn't do hw accel, chromium might not be that involved, although I don't know what chromium does and whether it's all faulty in the nouveau ddx
10:23tagr: some bits seem to be accelerated, canvas and compositing for example
10:23karolherbst: that's enough to cause problems :)
10:24tagr: opening/closing a bunch of default tabs seems to trigger the INVALID_CMD errors
10:24tagr: but no freeze
10:24karolherbst: at least something
10:25karolherbst: the copy might be also some race condition or so
10:25tagr: not reliably, though, it seems
10:25karolherbst: triggered only if you have a lot of vram allocated
10:28tagr: yeah, opening a couple of video related websites does the trick, but still no freeze
10:28tagr: only INVALID_CMD
10:29tagr: oh... and the DATA_ERROR, seemingly when I closed the tabs
10:29karolherbst: okay, nice, I think this helps
10:29karolherbst: imirkin knows way better how to debug all those things
10:30karolherbst: it is all async, so we have to put some debug prints into the ddx and log all that
10:30tagr: oh wow, how reproducible all of a sudden =)
10:32tagr: also causes some minor visual corruption in unrelated windows
10:32karolherbst: the m2m error is different
10:33karolherbst: tagr: how much VRAM? 512?
10:34karolherbst: or only 256?
10:37karolherbst: well, nouveau is crappy in terms of memory management, so I wouldn't be surprised if that's just being out of vram
10:38karolherbst: maybe we should work on exporting how much VRAM is left :) although that's not easily trackable if stuff can get moved out from/in to system memory
10:48tagr: karolherbst: aside from the actual difficulties involved, out-of-VRAM is something that could be handled gracefully, right? either swap out unused memory to system memory or as a last resort refuse to create new buffers
10:49tagr: I'm aware that I'm grossly oversimplifying here =)
10:49karolherbst: well, we kind of do those things, but some stuff can't be moved to system ram
10:49tagr: right... there's also the matter that not all APIs are aware of stuff moving around
10:50karolherbst: well, the bigger problem is, that you can place basically everything into system ram
10:50karolherbst: but when it's needed it might be moved to vram
10:50karolherbst: and when allocating you never know what's the case
10:50skeggsb: unless you see a "fail -12" or something in the kernel log, the issue has nothing to do with running out of vram..
10:51skeggsb: and *that* issue, is because userspace doesn't listen to failed command submissions, and continues on its merry way, ignoring that the GPU state doesn't match the sw state (because the submission failed), and causing further errors
10:52karolherbst: skeggsb: okay, makes sense
10:52skeggsb: not that any of that helps you tagr :P that's something more weird, your pushbufs are getting corrupted for *some* reason
10:52karolherbst: in this case userspace should be the nouveau DDX
10:52skeggsb: in this case, it's nothing related to running out of memory too ;)
10:53tagr: skeggsb: well, that at least narrows it down =)
10:53karolherbst: it was just a guess from my side anyway
10:53skeggsb: tagr: fx380 you say?
10:53tagr: skeggsb: yeah
10:53skeggsb:looks to see if he still has one here
10:53skeggsb: i might have sent it back to Qe
10:53karolherbst: skeggsb: I guess any g96 should do
10:54karolherbst: tagr: do you have other tesla gpus?
10:54tagr: karolherbst: no, unfortunately not
10:54karolherbst: would be nice to know if it's just causing troubles on a g96 or maybe only on a quadro or so
10:55karolherbst: well, I have two G92s I could try things on, but not today and not tomorrow
10:55skeggsb: nope, i must've sent it back
10:55karolherbst: skeggsb: do you have any other G9x GPU?
10:56skeggsb: yes, but it's not necessarily going to affect them too
10:56tagr: I'd be happy to try any patches, but I'm pretty sure I won't be able to come up with anything sensible by myself =\
10:56karolherbst: I know, but it seems easy enough to test
10:56skeggsb: it might. i had a g96 plugged in yesterday, testing something else, but i'll try harder
10:57tagr: skeggsb: do you want me to file a bug for this?
11:01karolherbst: tagr: I guess you should do that, otherwise we might forget
11:02tagr: karolherbst: okay, will do
12:16karolherbst: hakzsam, imirkin: now I am kind of getting somewhere with iadd3: https://gist.githubusercontent.com/karolherbst/7a5fee17d3624317b7858040273464d0/raw/8998c311743409c5d1d16ecc9e369f3ac17e60bc/gistfile1.txt
12:25karolherbst: imirkin: actually, how does the CC bit work on add3? in theory we can have two overflows, no?
12:40karolherbst: made a small mistake in the last run. did the iadd3 opt even if a shladd opt happened before, but added more cases: https://gist.githubusercontent.com/karolherbst/7a5fee17d3624317b7858040273464d0/raw/c5d479764bdece4bcf5ba7634814e9854a07ac6c/gistfile1.txt
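The arithmetic behind the question: with three 32-bit inputs the full sum can reach 3 * (2^32 - 1), so the carry-out ranges over 0, 1 or 2 and does not fit in a single carry bit. How the hardware actually reports this is not established in this log; the sketch below, using a hypothetical helper `add3_with_carry`, only shows the arithmetic:

```python
MASK32 = 0xFFFF_FFFF

def add3_with_carry(a: int, b: int, c: int):
    """Add three 32-bit values; return (32-bit result, carry-out).

    Since a + b + c <= 3 * (2**32 - 1), the carry-out can be 0, 1 or 2,
    so it does not fit in a single CC (carry) bit.
    """
    full = a + b + c
    return full & MASK32, full >> 32

print(add3_with_carry(MASK32, MASK32, MASK32))  # (4294967293, 2)
```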
12:40karolherbst: still good
13:19imirkin: tagr: this is the famous 406040 error:
13:19imirkin: Apr 30 12:17:53 ulmo kernel: nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 2 [systemd-logind] get 0020017fe0 put 0020013dc4 ib_get 000001ab ib_put 000001cf state 80000024 (err: INVALID_CMD) push 00406040
13:20imirkin: happens on all tesla's afaik
13:20imirkin: no clue what causes it
13:20imirkin: once you get that error, you're basically fucked
13:21karolherbst: imirkin: well, we are kind of sure it is coming from the nouveau ddx
13:21karolherbst: and tagr is able to trigger it quite fast with chromium
13:22imirkin: the fd gets passed around, so you can basically never know what really triggers it
13:22skeggsb: it's X
13:22imirkin: also, it's something to do with how we operate the fifo in general
13:22karolherbst: he has only i3 running + chromium
13:22skeggsb: the channel id is the give-away
13:22imirkin: there are tons of reports about this
13:22imirkin: skeggsb: ok, maybe in this instance
13:22karolherbst: yeah, and we think it is the DDX
13:22karolherbst: but yeah
13:22imirkin: but in general it happens wherever
13:22karolherbst: maybe the actual reason is something else
13:23karolherbst: but maybe this might help us tracking it down, if it comes from the DDX, because this is a kind of controlled environment from our perspective
13:24imirkin: sure. good luck.
13:24imirkin: i tried and failed back when tesla was a more common thing
13:25karolherbst: maybe running valgrind/helgrind on the Xorg server might give us something
13:25imirkin: my suspicion was that it was in the context switch
13:25karolherbst: might be
13:25skeggsb: it's happening at the fifo level, and that's all done in hw
13:25skeggsb: (ie. we can't fuck it up)
13:25karolherbst: except we forget to set MMIO_REG_TO_NOT_FUCK_UP :p
13:28imirkin: skeggsb: sounds like a challenge!
13:28karolherbst: imirkin: anyway, iadd3 is nice as it allows us to have an add with two neg mods applied (one at src0 or src1 and one at src2). I am just not quite sure where to really do this opt
13:28karolherbst: merging two adds with each having a neg, sure, that's in lateAlgOpt
13:29imirkin: skeggsb: we could be not saving something off in the graphics context
13:29karolherbst: but besides that?
13:29imirkin: or the context switch comes at the wrong time and we don't prevent it?
13:29imirkin: or ... something
13:46glennk: what alignment do the IBs use?
14:40karolherbst: imirkin: do you think it makes sense to have a more flexible insnCanLoad? currently we can only check whether src0 of a given instruction can be loaded into srcX of insn, but what we need is a way to override insn->op and to choose a different source of the "load" instruction
14:40imirkin: yeah dunno
14:40karolherbst: so we could do insnCanLoad(add, OP_ADD3, 2, add, 1) or something
14:41karolherbst: but then we also need something in the case we want to shuffle sources around
14:41imirkin: or you can always move it out of the way
14:41imirkin: and then do a manual propagation
14:41karolherbst: well, limms are a problem
14:42karolherbst: this is what I have currently: https://github.com/karolherbst/mesa/compare/8b7358fe4376aecee0c29ea622f88f9ef07e6b11...karolherbst:nvir_opt_shladd_const
14:42imirkin: i mean always stick the imm into a separate op
14:42imirkin: and then do a conditional propagate
14:43karolherbst: the problem is, I still have the add not being the add3. I could do add->op = OP_ADD3 and revert it after checking, but that's kind of ugly
14:44karolherbst: but this issue can be solved by easily overwriting the op, but still doesn't solve the issue of shuffling srcs around
14:44imirkin: you don't understand my point
14:44karolherbst: like for add(c, neg(a)) I want to do add(0, c, -a)
14:45imirkin: but ... whatever
14:45imirkin: do whatever, as long as it works in the end
14:45imirkin: once you have something, we can discuss further
14:45karolherbst: well I have the patches ready, it works afaik
14:45karolherbst: I just don't like how it is done
14:46imirkin: well i can't look right now either way
14:46karolherbst: I was just hoping you might have a good idea about what to add to the Target class so we don't have to add hacks to peephole
14:50karolherbst: but maybe we need something completely different, like "Layout validLayout(OP_ADD3, srcs, constraints)" and it just tells us where each src has to go or returns an error if no valid combination is possible
14:51karolherbst: constraints being something like [a, b] can only be at src0, src1 and c has to be at src2
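A minimal sketch of the `validLayout` idea above: each source carries a set of slots it may occupy (e.g. a and b only in src0/src1, c only in src2), and the helper searches for an assignment of distinct slots, returning None when no valid combination exists. `valid_layout` is hypothetical and not an existing nouveau `Target` method; a brute-force search over permutations is used because an instruction has at most three slots:

```python
from itertools import permutations
from typing import Dict, List, Optional, Set

def valid_layout(srcs: List[str],
                 allowed: Dict[str, Set[int]],
                 num_slots: int = 3) -> Optional[Dict[str, int]]:
    """Assign each source to a distinct slot that its constraint allows.

    Returns a mapping source -> slot, or None if no valid combination exists.
    Slots left unassigned would be filled with zero (e.g. add3(0, c, -a)).
    """
    if len(srcs) > num_slots:
        return None
    # Try slot tuples in lexicographic order; the first valid one wins.
    for slots in permutations(range(num_slots), len(srcs)):
        if all(slot in allowed[src] for src, slot in zip(srcs, slots)):
            return dict(zip(srcs, slots))
    return None

# a and b may only go to src0/src1, c has to go to src2.
print(valid_layout(["a", "b", "c"], {"a": {0, 1}, "b": {0, 1}, "c": {2}}))
```

Returning None instead of a partial layout lets the caller simply skip the transformation, which avoids temporarily rewriting `insn->op` and reverting it after the check.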
17:27docmax: could it be possible to hack the hardware to be able to do things quadro cards can do?
17:27docmax: provide vGPU support for example on geforce cards?
17:29imirkin_: if it uses stuff like virtual pci functions, then highly unlikely
17:29imirkin_: also afaik there's tons of driver-level support to make this go
17:30imirkin_: (this is all speculation, btw)
18:24karolherbst: docmax: depends on the feature really
18:25karolherbst: docmax: for example we exposed the power consumption on pre-Maxwell GPUs although nvidia never did that
18:25karolherbst: which is also kind of a quadro feature
18:25karolherbst: or was
18:33Lyude: imirkin_: sure; it'll need to wait until I'm in the office tomorrow though since I didn't bring any hardware home I could use for testing that
18:33imirkin_: Lyude: this has been like 12 months in the making. another day won't hurt too bad :)
18:34imirkin_: also note that none of my changes really fix anything, but hopefully they'll provide us with more information