04:19karolherbst: imirkin: darkbasic tested the radeonsi scheduling patches and found out it's faster than catalyst in nearly all benchmarks now
04:20karolherbst: metro got a +75% perf boost
04:21karolherbst: and unigine none
04:25karolherbst: and they were already using the generic scheduler from llvm before
06:52echo083: "Pointer to TMDS table invalid", is it important?
06:52echo083: followed by "Pointer to flat panel table invalid"
08:21imirkin: karolherbst: like i said, instruction scheduling is important :)
08:51karolherbst: yeah, but I got the feeling that you weren't sure it would make a big difference
08:52imirkin: karolherbst: it *may* be, but it's difficult to get it there
09:00RSpliet: karolherbst: the next difficulty is figuring out *what* to schedule for :-)
09:01RSpliet: eg. what would be the best strategy to improve performance; it likely means finding a strategy to better saturate the memory bus
09:01RSpliet: but does that mean scheduling for more adjacent read ops; or rather scheduling for minimum register pressure?
09:02RSpliet: (and thus maximising the number of threads that can run "concurrently")
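The register-pressure side of that trade-off can be illustrated with a toy calculation. This is only a sketch: `ToyCore`, `residentThreads`, and every hardware number used with them are assumptions made for illustration, not figures for any real chip.

```cpp
#include <algorithm>
#include <cassert>

// Toy model of one multiprocessor. Both fields are hypothetical values
// chosen for illustration, not real hardware parameters.
struct ToyCore {
    int regFileSize;  // total registers shared by all resident threads
    int maxThreads;   // hard cap on resident threads
};

// Number of threads that fit at once for a given per-thread register count.
// Fewer registers per thread means more threads available to hide latency.
int residentThreads(const ToyCore &core, int regsPerThread)
{
    assert(regsPerThread > 0);
    return std::min(core.maxThreads, core.regFileSize / regsPerThread);
}
```

With an assumed 32768-register file and a 1536-thread cap, 16 registers per thread hits the cap, while 64 registers per thread leaves room for only 512 threads.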
09:54karolherbst: RSpliet: maybe it makes sense to decide based on what the current load is?
09:54karolherbst: or what kind of programs there are in total
09:56karolherbst: but I think the easiest thing is just to keep the core doing work all the time and reduce waits for now
09:59imirkin: karolherbst: do it in terms of live values.
09:59imirkin: and instruction latencies
10:34karolherbst: I guess this information is somehow known already or do I have to figure it out?
10:34karolherbst: the latter one
10:37imirkin: karolherbst: targ->getLatency()
10:37imirkin: the info's not exactly perfect, but it's the place where such info should be improved
10:53RSpliet: imirkin, karolherbst: instruction latencies I think will only help if you want to start doing dual issue efficiently
10:53imirkin: RSpliet: dual issue is a separate thing
10:53imirkin: this is "how long until the result is ready"
10:55RSpliet: yes, and with dual issue you are likely to dispatch two instructions of unequal length; now I don't know what the pipeline looks like, but it could make sense trying to issue two insns of roughly equal length *or* trying to fill up slots only when you know the particular subcomponent is free
10:57imirkin: RSpliet: and if you use the result immediately, that means stalling until the thing is there
10:57imirkin: RSpliet: however if you don't use the result immediately, then you can keep the pipeline going
10:57imirkin: RSpliet: dual-issue, afaik, is about issuing within the *same cycle* to 2 different units
10:57imirkin: but i'm not 100% sure
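The "how long until the result is ready" point can be sketched with a toy in-order, single-issue model. Everything here (`LatInsn`, `lastIssueCycle`, the latency numbers) is hypothetical and much simpler than real hardware or the values behind targ->getLatency().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One instruction in a toy straight-line program.
struct LatInsn {
    int latency;            // cycles until its result is ready (assumed value)
    std::vector<int> deps;  // indices of earlier instructions it reads from
};

// Cycle at which the last instruction issues, on a machine that issues one
// instruction per cycle in order and stalls until all sources are ready.
int lastIssueCycle(const std::vector<LatInsn> &prog)
{
    std::vector<int> issue(prog.size(), 0);
    for (std::size_t i = 0; i < prog.size(); ++i) {
        int cycle = (i == 0) ? 0 : issue[i - 1] + 1;
        for (int d : prog[i].deps) {
            assert(d >= 0 && static_cast<std::size_t>(d) < i);
            // Cannot issue before the producer's result is available.
            cycle = std::max(cycle, issue[d] + prog[d].latency);
        }
        issue[i] = cycle;
    }
    return prog.empty() ? 0 : issue.back();
}
```

For a 20-cycle load followed directly by its consumer and then two independent ops, the last instruction issues at cycle 22 in this model. Moving the two independent ops between the load and its consumer brings that down to 20, because the wait is filled with useful work.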
11:05karolherbst: sooo, for example we would try to find the best binary code for a given program and try to analyze it deeper to understand how the hardware works?
11:05RSpliet: imirkin: there's plenty of considerations, but I'm sure proper utilisation of a dual-issue pipeline is a scheduling problem :-)
11:07RSpliet: consider how far it can continue in a program if one of the two instructions cannot be dispatched at a given moment...
11:08RSpliet: (whether it's pairwise or "sliding window")
11:11imirkin: RSpliet: 100% agreed. dual-issue is a thing to care about in there
11:11imirkin: however... let's not try to fix the WHOLE universe at once
11:13RSpliet: just the universe? :-(
11:14imirkin: karolherbst: step 1 is to create the proper infrastructure for experimentation
11:15imirkin: i.e. something that works on the basis of various constraints, and then we can futz with those constraints
11:15karolherbst: I see
11:15RSpliet: speaking of which... look up work done by "shinpei"
11:16imirkin: yusukesuzuki: btw, did you get anywhere with your single-step thing? is it available somewhere?
11:17karolherbst: I still get this phi issue... and it only happens with really few programs and usually everything still runs except unigine heaven :/
11:18karolherbst: should have something to do with live ranges
11:18imirkin: karolherbst: are you doing stuff roughly the way i suggested to do it?
11:18karolherbst: are there any constraints between an instruction "writing" into a phi source and its position?
11:18karolherbst: yeah, and it works for most of the stuff
11:18imirkin: phi nodes must come at the start of a basic block
11:18karolherbst: yeah I know
11:18imirkin: unless you mess around with cfg edges, shouldn't matter
11:19karolherbst: the program also looks fine to me
11:19imirkin: there is, however, a bb->joinAt thing
11:19karolherbst: yeah, saw it
11:19imirkin: which iirc is a pointer to the join instruction
11:19imirkin: in a different bb
11:19imirkin: however that *also* shouldn't matter
11:19karolherbst: yeah, but this pointer stays the same
11:19RSpliet: karolherbst: on second thought, no scrap that... shinpei scheduled everything except instructions I think :-D
11:19karolherbst: I don't touch it
11:19yusukesuzuki: imirkin: oh, thank you! i'm now stuck on how to modify the MP registers without using the NVIDIA binary blob gr firmware... right now i'm writing the articles, but after that's done, i'll reinvestigate it :D
11:20imirkin: yusukesuzuki: cool. if you could post the stuff you _have_ done (using blob fw), perhaps we can help. unless you wanted to do it yourself.
11:20karolherbst: basically this check fails: "nRep->livei.overlaps(nVal->livei)"
11:21imirkin: yusukesuzuki: it's not urgent or anything, just wanted to check up on status :)
11:21yusukesuzuki: imirkin: great! I attempted to modify the MP registers through nouveau SW interface, but it seemed that the timing is not good (context is switched during this?).
11:21imirkin: yusukesuzuki: hmmm... you added sw methods and it didn't work?
11:22imirkin: anyways, i guess it has to be in the ctxsw firmware somewhere
11:22karolherbst: rep is the node of "dst->join->asLValue();"
11:22karolherbst: nval the node of "src->join->asLValue();"
11:22imirkin: karolherbst: are either src or dst coalesced nodes?
11:22karolherbst: I can't do much with that and don't really see what the "real" issue is sadly
11:23yusukesuzuki: imirkin: yup. added SW method to nouveau and hit it from the DRM channel, but it does not work.
11:23karolherbst: imirkin: it happens inside GCRA::coalesceValues
11:23yusukesuzuki: and nouveau side https://github.com/CPFL/linux/commit/4a371765e0b1e51ab592629cab1d58200696d5cc
11:23imirkin: karolherbst: can you output the program both before and after your manipulations, and indicate to me what src and dst are?
11:23karolherbst: do you need something else? because I can't do it right now, do you have some time later?
11:25imirkin: yusukesuzuki: hmmmmm... i'd need to brush up on the details, but that doesn't *seem* quite right
11:25imirkin: yusukesuzuki: i can't look at it now though... but perhaps later
11:26imirkin: karolherbst: basically i need to see what code the thing is choking on. and when you send me some giant program, it'll take me forever to check
11:26imirkin: karolherbst: however if i know *exactly* what instruction it's dying on, that's much easier
11:27yusukesuzuki: imirkin: oh! sounds nice. I thought the sw method is called, since the nv_error I inserted is called with the specified values.
11:28imirkin: yusukesuzuki: oh no, it's called. i just don't know that you're doing the right thing inside it ;)
11:30yusukesuzuki: imirkin: ah, makes sense.
11:30imirkin: but perhaps you are, like i said, it's been a while since i looked at that stuff
12:31wildross_: Greetings, I have a Lenovo w541 laptop with Optimus hardware. I've got external monitors sort of working with Fed 22 and the Nouveau drivers. Any pointers to further setup? (Right now I'm vanilla and I get some boot errors)
12:31imirkin: wildross_: what are you trying to achieve?
12:32wildross_: Being able to reliably dock and undock the laptop w/o it crashing.
12:32imirkin: and presumably that's not the case right now?
12:32wildross_: Nope. Locks the laptop either going onto or off of the dock.
12:34wildross_:thought he was done having to mess with video a decade ago....
12:35imirkin: just use decade-old hardware and you'll be all set ;)
12:36imirkin: anyways, the first thing i'd do is grab the latest and greatest kernel
12:36wildross_: Well, I do drive a 15 year old truck and a 11 year old SUV....so I'm trying
12:36wildross_: hmm, ok.
12:37imirkin: if the issue still happens, then try to get a trace using netconsole or something
12:37imirkin: i'm not actively aware of any such issues with docking, but then again it's a pretty rare setup
12:37imirkin: and i'm forgetful :)
12:37wildross_: ok, I'll try a newer kernel...
12:38Hoolootwo: my decade-old hardware is really really slow, so I would recommend not doing that
12:39imirkin:just picked up a PowerMac7,3 - dual 2GHz with a NV34. heh.
12:39imirkin: still trying to get it to boot though
12:40Hoolootwo: my old dell (2.4GHz, NV17) still works but is really too slow, which is really not that surprising
12:41Hoolootwo: a 2GHz dell with some radeon chip is fast enough though, so I'm not sure what the difference is
12:43imirkin: well, GHz aren't all the same
12:44imirkin: the 3Ghz athlon xp from 2003 isn't quite the same as a 2GHz skylake chip
12:44imirkin: and not just coz of the core count
12:45imirkin: although that certainly doesn't hurt
12:46Hoolootwo: it's a pentium 4M on the slow one and a Pentium M on the fastish one
12:46Hoolootwo: so they shouldn't be too different
12:46Hoolootwo: but still, there is probably a bottleneck somewhere which I'm missing
12:46imirkin: er, those two are entirely different arch's
12:47imirkin: p4 = slow, pentium-m = the initial "core" arch, no?
12:51Karlton: calculating the FLOPS is a bit more accurate in guessing the performance than just knowing the frequency of a cpu
13:09Hoolootwo: yeah, that would explain the difference
13:46xiay_: Hi folks, what is the difference between cechan and channel in struct nouveau_drm?
13:49imirkin: cechan probably refers to a copy engine channel
13:50imirkin: i think it's a separate context used to move buffers around
13:50imirkin: not sure
13:50imirkin: [i.e. when ttm says to move a buffer in/out of vram, copy engine is used]
13:51imirkin: i think the tegras might not have copy engines though
13:51imirkin: [and/or copy engines were integrated into pfifo for kepler+? i'm weak on the details]
13:52xiay_: so CE refers to copy engine?
13:56RSpliet: xiay_: that's how I know the abbreviation yes
14:00RSpliet: imirkin: http://envytools.readthedocs.org/en/latest/hw/fifo/pcopy.html?highlight=pcopy implies there's been one since GT21x
14:23Hoolootwo: http://i.imgur.com/6mspoOL.jpg something happened
14:24Hoolootwo: I couldn't get a regular screenshot which showed the problem with any utility I know of
14:27Hoolootwo: unfortunately dmesg has nothing useful because it filled up with messages about my wifi adapter
14:27RSpliet: Hoolootwo: I'm afraid you have to pay Canal+ for a descrambler, you can't just watch that without a subscription
14:28RSpliet: srsly though: one thing you can do is increase the size of the kernel log so that next time it happens, the likelihood of useful messages is higher
14:28Hoolootwo: yeah will do
14:29RSpliet: it looks like it's scanning out uninitialised memory
14:31imirkin: Hoolootwo: so that means that the software thinks it's doing the right thing
14:31imirkin: but reality begs to differ
14:32Hoolootwo: yeah, weird things have been happening lately with that laptop, I'm going to do a bisect to see if I can pinpoint when
14:32Hoolootwo: it's not easy though since sometimes it just works
14:32Hoolootwo: this is my NVS 3100M one
14:34Hoolootwo: is there any debug info I can pull from it or should I reset?
14:36imirkin: anything in dmesg? does the screen update at all, or semi-constant garbage?
14:38imirkin: is that still a thing?
14:40Hoolootwo: it is updating, with what appears to be the cpu, ram, etc. bars
14:40Hoolootwo: also I can see an Xchat logo which lit up when I was highlighted
14:41Hoolootwo: it happened when I plugged in a monitor, but it did refuse to sleep beforehand
14:42Hoolootwo: I'm not sure if the not-suspending thing is related though
14:42Hoolootwo: and dmesg is not changing since I turned off wifi
14:44imirkin: yeah, i was sort of expecting a PDISP-related splat
14:45imirkin: Hoolootwo: what if you go through a modeset operation...
14:45imirkin: e.g. try to change your screen resolution
14:49karolherbst: imirkin: https://gist.github.com/karolherbst/c337f012471226677e5c instruction 124
14:49Hoolootwo: it did not seem to change anything
14:50Hoolootwo: the external monitor is still blank
14:50karolherbst: first source
14:50imirkin: 124: phi u32 %r429 %r629 %r640 (0)
14:50imirkin: and it hates %r629?
14:50karolherbst: insn->getSrc(c) c is 0
14:50karolherbst: I can also give you the live ranges
14:51imirkin: 134: break BB:21 (0)
14:52imirkin: oh nm. found it.
14:52imirkin: [the BB that is]
14:52karolherbst: the one mov instruction with %r629 moved up quite a lot, does it matter?
14:53karolherbst: there is also a merge
14:54karolherbst: ohh could it be, that the one mov shouldn't jump over the merge?
14:55karolherbst: mhh, treated merge as fixed now and it stops at 129 phi then
14:56karolherbst: for the second one a mov jumps over a split
14:57imirkin: karolherbst: dunno, seems reasonable to me
14:57karolherbst: treating split as fixed doesn't help though now
14:57imirkin: that def shouldn't matter
14:58karolherbst: https://gist.github.com/karolherbst/15cb83ffb08c553a825a instruction 129 source 0
14:58karolherbst: 129: phi u32 %r434 %r634 %r645
14:58karolherbst: maybe this shows the real issue more clearly
15:00karolherbst: rep node ranges are 131-134 and 135-180, which overlaps with val node range 171-183
15:02karolherbst: both instructions in play clearly switched relative order
15:02karolherbst: 180 and 171
15:03karolherbst: but I don't really understand what that means yet
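The overlap being reported can be reproduced with a toy version of a live-interval check. This is a sketch only: `ToyRange`, `rangesOverlap`, and `intervalsOverlap` are hypothetical names, and treating each range as half-open [bgn, end) is an assumption of the sketch rather than something taken from the real livei code.

```cpp
#include <cassert>
#include <vector>

// One live range, treated here as half-open [bgn, end) (an assumption).
struct ToyRange {
    int bgn;
    int end;
};

// Two ranges overlap when each one starts before the other ends.
bool rangesOverlap(const ToyRange &a, const ToyRange &b)
{
    assert(a.bgn <= a.end && b.bgn <= b.end);
    return a.bgn < b.end && b.bgn < a.end;
}

// A live interval is a list of ranges; two intervals overlap if any pair of
// their ranges does.
bool intervalsOverlap(const std::vector<ToyRange> &a,
                      const std::vector<ToyRange> &b)
{
    for (const ToyRange &ra : a)
        for (const ToyRange &rb : b)
            if (rangesOverlap(ra, rb))
                return true;
    return false;
}
```

Fed the numbers quoted above, rep = {131-134, 135-180} and val = {171-183}, the check reports an overlap because 135-180 and 171-183 share 171 through 179. A purely hypothetical rep range ending at 171 would not overlap in this model.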
15:03imirkin: so were we just getting lucky before?
15:03Hoolootwo: I'm resetting this thing because I need to rebuild my servers' array and I need a laptop not booted off the server
15:04karolherbst: imirkin: you mean it worked by luck before?
15:04karolherbst: and there is some kind of dependency missing or something?
15:06imirkin: karolherbst: could be, not sure
15:07karolherbst: wanna see my current patch?
15:11imirkin: sorry, not right now
15:30imirkin: karolherbst: ok, i think i know what's up
15:30imirkin: karolherbst: you need to not touch the mov's inserted by PhiMovesPass
15:30karolherbst: nice :) and I tried to understand phi, which I thought I understood, but it didn't make sense in the blocks :/
15:30imirkin: mark those fixed or something
15:30karolherbst: inside PhiMovesPass?
15:31imirkin: or give them a subop or... something to recognize them
15:31imirkin: yeah, there's a PhiMovesPass::something which inserts OP_MOV's left and right
15:31imirkin: to the end of bb's
15:31imirkin: those movs need to stay at the end i think
15:31imirkin: sounds right :)
15:32karolherbst: okay, found the place
15:33karolherbst: but fixed sounds kind of right to me?
15:34karolherbst: nice, works
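The idea of marking those movs as fixed so a reordering pass leaves them alone can be sketched on a toy basic block. `BlockInsn`, its `priority` field, and `reorderMovable` are all hypothetical; the real nv50_ir instruction type and whatever scheduling pass is being written will look different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction inside one basic block.
struct BlockInsn {
    std::string name;
    int priority;  // lower value schedules earlier (made-up metric)
    bool fixed;    // true: the scheduler must keep it in its current slot
};

// Reorders only the movable instructions by priority. Fixed instructions,
// such as movs placed at the end of a block for phi sources, keep their slot.
void reorderMovable(std::vector<BlockInsn> &bb)
{
    std::vector<BlockInsn> movable;
    for (const BlockInsn &insn : bb)
        if (!insn.fixed)
            movable.push_back(insn);

    std::stable_sort(movable.begin(), movable.end(),
                     [](const BlockInsn &a, const BlockInsn &b) {
                         return a.priority < b.priority;
                     });

    // Write the sorted movable instructions back into the non-fixed slots.
    std::size_t next = 0;
    for (BlockInsn &slot : bb)
        if (!slot.fixed)
            slot = movable[next++];
    assert(next == movable.size());
}
```

Even if the fixed mov has the best priority in the block, it stays last; only the instructions in front of it change places.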
15:37karolherbst: ohh right, this was a bioshock shader
15:42karolherbst: I get one "WARNING: out of code space, evicting all shaders." with bioshock
15:44imirkin: that's normal
15:44imirkin: or rather, not a problem
15:44imirkin: it's a perf issue
15:45imirkin: [actually there's a minor bug around how we do it, but highly unlikely that anyone will notice]
15:45karolherbst: now the only issue left is the "out of spilling candidates" in the one heaven shader
15:45karolherbst: everything else seems to work fine now
15:45imirkin: does that still happen?
15:46karolherbst: yeah, in the 4000 ins heaven shader
15:46karolherbst: currently debugging this
15:46karolherbst: "DLLIST_ADDHEAD(&hi, &nodes[i]);" with i = 5895
15:46karolherbst: could this be a problem?
15:50karolherbst: nodes should be like 17000 big
15:50karolherbst: hi contains a lot of garbage
15:51karolherbst: ptr value is also quite high (@0x7ffd2b92afc0)
15:51karolherbst: somehow the memory gets kind of corrupted
15:52imirkin: probably not fun when used with heaven :)
15:52karolherbst: GCRA this pointer: @0x7ffd2b92ae90
15:53karolherbst: should be 64bit?
15:53imirkin: but if you can just use this shader with nouveau_compiler should be easier
15:53karolherbst: then its all right
15:53karolherbst: I have it
15:54karolherbst: yeah got some invalid writes
15:57karolherbst: same place
16:15karolherbst: okay, hi's next and prev point into some memory location with only 0s
16:16karolherbst: but somehow valgrind isn't that useful here :/ *sigh*
16:48marcosps: imirkin: did you try to cross compile mesa to 32bit to test with steam?
16:52marcosps: imirkin: as Dead Island is 32bit, maybe compiling to 32bit, and use LD_LIBRARY_PATH, could make this work here hehe
17:01karolherbst: can this "gr: TRAP ch 2 [00bf890000" be memory or core related?
17:01karolherbst: my core usually runs at too low a voltage
17:05RSpliet: when providing too little juice to a core, any kind of malfunction can happen... that's going to be hard to deduce
17:26karolherbst: imirkin: at least when I give RA only one try, then it doesn't segfault :/
17:29karolherbst: ohh what a nasty bug
17:29karolherbst: seems to have effect on other compilations, too :/
17:37karolherbst: strike, found it
17:37karolherbst: nv50_ir_emit_nvc0.cpp:3055: v->reg.size is like 1073741824
17:40karolherbst: ohh what a pity
17:40karolherbst: only occurs inside valgrind
19:21kb9vqf: I have an interesting bug that's happened twice now on my GTX 680: when switching to a new X session, the session locks up almost immediately and the console shows "PGRAPH engine fault on channel 5"
19:21kb9vqf: any ideas?
19:21kb9vqf: killing the new Xorg session with -9 allows the older, backgrounded Xorg session to be restored without issue