09:58 RSpliet: pmoreau: you're right, OpenQL is the future now...
10:16 pmoreau: RSpliet: OpenQL? o_O
10:16 pmoreau: Query Language?
10:16 RSpliet: Open Quantum Library (OpenQL)
10:17 pmoreau: :-D
17:53 karolherbst: imirkin: was this the commit with the RA stuff? https://github.com/imirkin/mesa/commit/671aade0a6fb67919c70ccc0620942c560faa359
18:09 Lyude: oh karolherbst btw; I realized I lied, i've actually had a branch with the powergating work uploaded on github that I forgot about (just did a force push to it though just to make sure it's up to date) https://github.com/Lyude/linux/tree/wip/fermi%2B-clockgating-v4
18:09 Lyude: probably needs to be rebased
18:09 karolherbst: ahh nice
18:10 karolherbst: ohh a kernel tree, oh well
18:12 Lyude: yeah sorry D:, I don't usually work on the OOT nouveau repo. using the normal kernel repo integrates a lot better with my workflow and plethora of scripts
18:12 karolherbst: the second patch looks nice. I guess I will try it out at home on my GPU and see how big the benefit is
18:13 karolherbst: I guess I'll also go over the patches and leave some comments or so
18:15 karolherbst: hihi "total local used in shared programs : 1 -> 21909 (2190800.00%)"
18:18 Lyude: karolherbst: yeah, for the rest of them (for BLCG, SLPG or whatever the one that starts with an S should be just adding other register writes and maybe more hooks)
18:18 Lyude: oops, for the rest of them it should just be scanning the mmio traces and copying the registers
18:18 Lyude: i am very curious about ELPG too though, have pondered some solutions for figuring out the inter-engine power gating dependencies
18:23 mupuf: Lyude: ELPG will require interactions with the pmu
18:24 Lyude: oh, that I did not realize
18:24 mupuf: never managed to make it work
18:24 mupuf: but there are regs with nice names in the nvgpu tree
18:24 karolherbst: mupuf: is it something like we also disable all means of communications from the host, so that only the PMU can do stuff to enable things again?
18:25 mupuf: one of them being HISTOGRAM, that probably stores a histogram of how often we power gate (with each bin meaning a certain time of activity)
18:26 Lyude: so other than setting the powergating regs, any idea what the other interactions we're supposed to have with PMU are?
18:26 mupuf: Lyude: nope, but you can check nvgpu, it has all you need
18:26 Lyude: hm, alright
18:27 mupuf: I just could not reproduce it on nouveau
18:27 mupuf: that was years ago though
18:45 karolherbst: imirkin: could it become a problem if spills use SSA registers, which got removed by RA? pre-spill: merge u64 %r227d %r165 %r169 after spill: ld u64 %r228d l[0x0]
18:51 karolherbst: ohh, I see what is going on with the shader. It just needs more retries, because the spilling code adds new live ranges and requires more values to be spilled due to that, annoying
20:05 imirkin_: karolherbst: yes.
20:05 imirkin_: (that was the commit with the RA stuff)
20:05 imirkin_: but it's ... not quite right
20:07 karolherbst: yeah, I found a TGSI where it isn't working
20:08 karolherbst: last failed state in RA (same output even with that patch): https://gist.github.com/karolherbst/5f3dfa048152a98dc5fe7e12482b9e2b
20:12 imirkin_: yes, well, like i said - the patch has issues ;)
20:39 karolherbst: imirkin_: any idea why the marked edge is important in this case? https://gist.github.com/karolherbst/2f9ab5d372c24a9d29101fea6e5ea512
20:39 karolherbst: or is even there
20:40 imirkin_: the RA algorithm has resisted understanding from me
20:41 imirkin_: which is double-sad, since i've implemented one myself a very very very long time ago
20:41 imirkin_: in a galaxy far far away
20:41 dcomp: Any changes to maxwell1 (840M) (GM108). I'm sorry to say I've been stuck on the red team (radeon) whilst my other half needed my nvidia laptop. Is reclocking mainstream yet?
20:41 imirkin_: for MIPS :)
20:41 karolherbst: well sure, but I don't see how the marked edge is even important for deciding on spilling the value or not
20:41 karolherbst: just doesn't make any sense to be there
20:41 imirkin_: dcomp: should be able to reclock since 4.12
20:42 imirkin_: karolherbst: oh, you mean for determining which stuff to spill?
20:42 karolherbst: yea
20:42 imirkin_: yeah, not 100% sure on that.
20:42 karolherbst: my assumption now is, that maybe those dependencies might be not 100% correct
20:42 karolherbst: and that's why we don't spill something we actually should spill
20:43 karolherbst: this example wouldn't matter much, but what if there are edges missing
20:46 karolherbst: at least I have a small lead now I can follow
20:57 Lyude: dcomp: remember it's not dynamic though, you have to change the perf levels by hand through sysfs
21:02 dcomp: Lyude: does it reclock on boot, I think I couldn't use it without a reclock on boot.
21:06 karolherbst: dcomp: nouveau.config=NvClkMode=15
21:06 karolherbst: or put the number of the perf level you want the module to boot ot
21:06 karolherbst: *to
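A minimal sketch of building that boot option from a perf level id. It assumes the ids are listed in hex (so perf level 0f is the 15 in karolherbst's example) and that the option takes the decimal number; the helper name `nvclkmode_option` and the 0..0xff range check are hypothetical, not part of nouveau.

```python
def nvclkmode_option(pstate_hex: str) -> str:
    """Return the kernel command-line option for a perf level given in hex."""
    level = int(pstate_hex, 16)  # "0f" -> 15
    # Assumption: perf level ids fit in one byte; adjust if that is wrong.
    if not 0 <= level <= 0xFF:
        raise ValueError(f"unexpected perf level: {pstate_hex!r}")
    return f"nouveau.config=NvClkMode={level}"


print(nvclkmode_option("0f"))
```

The resulting string goes on the kernel command line; whether a given perf level is stable on a particular GPU is a separate question.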
21:07 dcomp: will that work with runpm or do I still need to disable that
21:08 imirkin_: iirc it should set that mode on resume from runpm. but if there were other reasons why you had to disable runpm, those are unlikely to have changed.
21:29 pmoreau: Yes, clCreateProgramWithSource() working again! (Thank god I found a previous patch, as I could just reuse it to have it work again.)
21:30 Lyude: \o/
21:31 pmoreau: If I didn’t have labs to supervise tomorrow at 08:00, I would try to get the OpenCL CTS up and running
21:32 pmoreau: Lyude: Are you attending XDC?
21:34 Lyude: pmoreau: ye
21:35 pmoreau: Ah, nice! Enjoy it then! :-)
21:35 pmoreau: If I didn’t have a paper deadline + 2 labs to supervise, I would have attended it… Hopefully next time
21:37 Lyude: thanks!
21:38 karolherbst: imirkin_: okay, I think I found the issue
21:39 karolherbst: imirkin_: 12: split b128 { %r149 %r161 %r173 %r185 } %r261q
21:39 karolherbst: %r173 is never checked for whether the value should be spilled, e.g.
21:39 karolherbst: same for all the other values
21:39 karolherbst: none of them are ever spilled
21:39 imirkin_: yeah, coz it checks r261q probably
21:40 imirkin_: they're all part of the same RIG_Node
21:40 karolherbst: well, "SIMPLIFY: pushed %261(spill)"
21:40 karolherbst: but those aren't spilled
21:41 karolherbst: so they stay being live values
21:41 karolherbst: and then we get above the actual register limit
21:42 karolherbst: https://gist.github.com/karolherbst/b499d8d702f24d4406b8ddc3d5ea7e36
21:42 karolherbst: every other line are the recent live values
21:42 karolherbst: register limit is 8
21:42 karolherbst: and prior to instruction 15, r149 could be spilled
21:43 karolherbst: ohh wait
21:43 karolherbst: it couldn't
21:43 karolherbst: but r146 could have been
21:43 karolherbst: odd
21:43 pmoreau: anEpiov: Pushed the patch for clCreateProgramWithSource() and edited the instructions and script (you need to replace @@LLVM_INSTALL_PREFIX@@ by the correct path, in src/gallium/state_trackers/clover/spirv/invocation.cpp).
21:44 karolherbst: now that I think about it, why isn't r146 spilled
21:45 karolherbst: it is written to in instruction 10
21:45 karolherbst: and read in 83
21:45 karolherbst: as part of a merge
21:45 karolherbst: okay, r146 isn't checked for being spilled either
21:48 karolherbst: 146 isn't part of any RIG_Node
21:48 karolherbst: imirkin_: could it be, that we simply miss generating some RIG_Nodes? because now it really looks like that
22:02 karolherbst: mhhh interesting
22:02 karolherbst: RIG_Node[%146]($[1]-1): 1 colors, weight inf, deg 0/8 X
22:07 karolherbst: but it can be spilled!
22:12 karolherbst: yeah, it seems like all the RIG_Nodes of those split/merge values are totally unusable
22:16 karolherbst: imirkin_: does this make sense to you: any RIG_Node of the values of a split needs to contain their individual edges as well? It doesn't make sense to spill the original value anymore, because it doesn't exist due to the split
22:17 imirkin_: which is why those aren't spilled
22:17 karolherbst: exactly
22:17 imirkin_: this isn't necessarily a great idea btw
22:17 karolherbst: and that's why we can't compile shaders with _tons_ of tex with merged regs as well
22:17 imirkin_: how all this merging stuff is done.
22:17 karolherbst: don't know about that
22:17 karolherbst: it kind of makes sense to have something like this, but maybe how it is done is not good
22:19 karolherbst: anyhow, I would first try to get the RIG_Nodes built properly and maybe this fixes most of our current issues here
22:19 imirkin_: RIG_Node's are built properly afaik
22:19 karolherbst: no
22:19 karolherbst: if weight is inf, it isn't spilled
22:19 karolherbst: at least this part is wrong
22:20 imirkin_: why is the weight inf?
22:20 karolherbst: well the weight for the split regs
22:20 karolherbst: no outgoing/incoming edges
22:20 karolherbst: afaik
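The point about infinite weights can be sketched as follows. This is an illustration of a generic spill heuristic (lowest weight per interference edge wins), not the actual nouveau RA code; the `Node` class, `pick_spill_candidate`, and the weight 12.0 are assumptions made up for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    weight: float  # spill cost; math.inf means "never pick this one"
    degree: int    # number of interfering neighbours


def pick_spill_candidate(nodes):
    """Return the node with the lowest weight/degree score, or None."""
    best, best_score = None, math.inf
    for node in nodes:
        # Guard against degree 0 so the division is always defined.
        score = node.weight / max(node.degree, 1)
        if score < best_score:  # inf < inf is False, so inf nodes never win
            best, best_score = node, score
    return best


nodes = [Node("%146", math.inf, 0), Node("%261", 12.0, 8)]
print(pick_spill_candidate(nodes).name)
```

With this kind of selection, a node whose weight stays at infinity is never chosen, which matches the behaviour described for the split values; if every remaining node has infinite weight, there is no candidate at all.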
22:20 imirkin_: anyways, i dunno
22:20 imirkin_: just saying ... this stuff is subtle. be careful.
22:20 karolherbst: will try
22:20 imirkin_: getting it wrong causes subtle issues.
22:20 karolherbst: I can imagine that
22:20 imirkin_: i know coz i've fixed a bunch of them :)
22:21 karolherbst: :D
22:21 imirkin_: 300b5ad023962ee95322e890a9ba57396392407e
22:22 imirkin_: dfb0ca16065c1d251101bb094f2cfd08cf3cda15
22:22 imirkin_: good times.
22:23 imirkin_: took me 2 tries to fix the same issue :)
22:24 karolherbst: ugh
22:39 optlink: I'm getting hangs whenever I try to reclock my GTX 960M on an optimus system. Running 4.12.13 now but I don't think reclocking has ever worked on this system.
22:41 karolherbst: optlink: the GPU has to be enabled
22:41 karolherbst: if it is suspended, the echo might cause troubles
22:41 karolherbst: but if you run something on the GPU and reclock then, it should work (t)
22:41 karolherbst: *(tm)
22:42 optlink: karolherbst: that's interesting. If I try to reclock before running something I can sometimes run for a few minutes before hanging. I just tried what you suggested earlier and I was hung instantly
22:43 karolherbst: interesting
22:43 karolherbst: optlink: do you have any dmesg or so when it does that?
22:45 optlink: I do not. It's kind of hard to do anything when nothing works except mouse movement
22:45 karolherbst: ssh?
22:45 optlink: i haven't tried that one yet, I'll see what I can do
22:46 karolherbst: in older kernel versions, intel behaved more sane as well. sane means not hanging the system if the offloaded GPU doesn't respond
22:47 optlink: that might explain why I could sometimes still use X and some commands in older versions
22:48 karolherbst: well prior to 4.12 you shouldn't be able to reclock the GPU anyhow
22:49 optlink: I've been able to do this since ~4.10 if I recall correctly. Prior to that I think the GPU didn't register as an offloadable GPU
22:49 optlink: but that was using a driver I built from git
22:56 karolherbst: mhh okay, could be that only memory reclocking was disabled on maxwell
22:57 karolherbst: optlink: we still have a few reclocking issues on kepler/maxwell GPUs, it would help us if you could create a mmiotrace on your GPU on nvidia and provide a vbios
22:58 optlink: i'm not sure how to do either of those things. Is there a document I can refer to?
23:09 karolherbst: imirkin_: the range of those split values: RIG_Node[%185]($[1]-1): 1 colors, weight inf, deg 0/8, range 4294967295/0
23:10 karolherbst: makes sense, doesn't it ;)
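A small sketch of why that range looks broken: 4294967295 is 2^32 - 1, which is what -1 looks like when printed as an unsigned 32-bit value. Reading the two numbers as "begin/end" of the live range is an assumption about karolherbst's debug output; `as_u32` and `looks_empty` are hypothetical helpers.

```python
U32_MAX = 0xFFFFFFFF


def as_u32(value: int) -> int:
    """Reinterpret an integer as an unsigned 32-bit value."""
    return value & U32_MAX


def looks_empty(begin: int, end: int) -> bool:
    """Assumption: a range whose begin lies after its end covers nothing."""
    return begin > end


print(as_u32(-1))
print(looks_empty(4294967295, 0))
```

Under that reading, a begin of -1 and an end of 0 describe a node with no live range at all, which would fit a node whose interval was emptied.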
23:12 karolherbst: fun
23:12 karolherbst: those "silly movs" have the same kind of broken RIG_Node
23:20 optlink: karolherbst: I managed to get a good 10 seconds of unigine heaven out of it. Here's my dmesg log: https://hastebin.com/bawigigoka
23:21 karolherbst: mhh
23:22 karolherbst: I was expecting a bit more of information
23:23 karolherbst: I think you might want to wait longer?
23:24 optlink: you mean just let it sit there while hung?
23:27 optlink: karolherbst: I have managed to dump the vbios (I think)
23:27 karolherbst: optlink: yeah
23:27 karolherbst: like 1-2 minutes
23:28 optlink: karolherbst: ok sounds good
23:33 karolherbst: okay, seems like coalesce(insns) "breaks" those Rig_nodes
23:44 optlink: karolherbst: This should be a bit more informative: https://hastebin.com/simoqiweko
23:46 karolherbst: huh
23:47 karolherbst: optlink: and this only happens after reclocking?
23:47 optlink: as far as I have tested, yes
23:48 karolherbst: ohhh
23:48 karolherbst: imirkin_: Interval::unify "destroys" the given interval :/
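A sketch of the behaviour being described, assuming unify moves all ranges out of its argument into the receiver and leaves the argument empty. This is a toy Python model, not the Mesa `Interval` class; the list-of-tuples representation and the `extend` merge rule are assumptions for illustration.

```python
class Interval:
    """Toy live interval: a sorted list of (begin, end) ranges."""

    def __init__(self, ranges=()):
        self.ranges = sorted(ranges)

    def extend(self, bgn, end):
        """Add a range, merging it with any range it overlaps or touches."""
        merged = []
        for b, e in sorted(self.ranges + [(bgn, end)]):
            if merged and b <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((b, e))
        self.ranges = merged

    def unify(self, other):
        """Absorb other's ranges; other is left empty afterwards."""
        assert self is not other
        for b, e in other.ranges:
            self.extend(b, e)
        other.ranges = []  # the "destroyed" source interval


a = Interval([(0, 4)])
b = Interval([(3, 9), (20, 25)])
a.unify(b)
print(a.ranges, b.ranges)
```

Anything that still looks at `b` after the call sees an interval with no ranges, which is one way a node could end up with an empty live range after coalescing.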
23:53 optlink: karolherbst: any ideas? I really appreciate the help, by the way.
23:54 karolherbst: optlink: not really
23:54 karolherbst: optlink: which perf level did you test?
23:54 karolherbst: or do you have only 2?
23:55 optlink: karolherbst: I tested 0f. I have two other pstates: 07 and 0a
23:55 karolherbst: optlink: what happens if you clock to 07 and 0a?
23:57 optlink: karolherbst: roughly the same behavior. I get hangs at some point though if I'm lucky it won't be until I close the application on 0a. 07 seems to be the default but I can get hangs if I try to set it when AC reads 0 and 0