00:13karolherbst: imirkin_: disabled this line and then it compiles and generates _much_ less spills: https://github.com/karolherbst/mesa/blob/master/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#L946
00:14karolherbst: maybe it also broke stuff
00:14karolherbst: yep, broken as hell
00:17karolherbst: now it looks better
00:18karolherbst: imirkin_: what do you say about removing those two lines: https://github.com/karolherbst/mesa/blob/master/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#L1088-L1089
00:23optlink: karolherbst: I experienced a rather severe hang after trying to reclock to 07: https://hastebin.com/zumekobagu
00:24karolherbst: yeah, somehow the GPU doesn't really like the reclock
00:24karolherbst: I am _quite_ sure it is something _really_ silly going on
00:25karolherbst: optlink: could you boot with nouveau.config=NvMemExec=0 and try reclocking then?
00:26optlink: karolherbst: yeah I can do that
00:35optlink: karolherbst: my system hung immediately after reclocking to 0f
00:36karolherbst: optlink: what about lower perf levels?
00:37optlink: I'll try the other two and get back
00:51optlink: karolherbst: ok so I started heaven, clocked to 07 and ran it through fine with no hangs. I then clocked up to 0a and within a few moments the window went black: the process was totally hung and unresponsive. After a minute or two it cleared up just like before
00:54optlink: so I guess it hung but x was still working this time
16:48karolherbst: skeggsb: the maxwell2 PMU hack: https://github.com/karolherbst/nouveau/commit/cb9063847480c8efbf87039a76e596dc667afe87
16:54RSpliet: karolherbst: nitpick: helpfull -> helpful
16:55RSpliet: hah, nm, you already knew
17:24optlink: sorry to bother again but I'm still having problems reclocking my 960M. The middle and high performance options are unstable, causing system hangs.
17:28RSpliet: optlink: is that with nouveau.config="NvMemExec=0" as a kernel param?
17:34optlink: RSpliet: yes it is
18:01optlink: RSpliet: I'm also having a problem where attempting the command to change the clock itself is hanging if done after a previous reclock attempt
18:33RSpliet: optlink: if it is, then... the fix shouldn't be too complex, but you might have to get an mmiotrace of the official driver to see what they do differently from nouveau
18:34RSpliet: unfortunately, I can't help you with that right now, but if the on-line guides prove useless, you might be able to get help from some other people here
18:34RSpliet: (although they're nearly all at a conference in California at the moment, so replies might be delayed)
18:43optlink: RSpliet: I understand. Is valgrind-mmt the tool I need to use? and is there a certain application I should run it with?
18:43imirkin_: mmiotrace, not valgrind-mmt
18:43optlink: I believe I can figure
18:43optlink: ah ok
18:45imirkin_: optlink: this is a good guide: https://wiki.ubuntu.com/X/MMIOTracing
18:47optlink: imirkin_: thank you. I see that I should be starting X with nvidia. My system doesn't have any outputs on the dGPU so I'm not sure if I can do that
18:48optlink: or does that not matter?
19:28rhyskidd: Lyude: "yeah, for the rest of them (for BLCG, SLPG or whatever the one that starts with an S should be just adding other register writes and maybe more hooks)" <= SLCG is Second Level Clock Gating
19:45karolherbst: Lyude: look for patches here: https://android.googlesource.com/kernel/tegra.git/+log/android-tegra-dragon-3.18-marshmallow-dr-dragon/drivers/gpu/drm/nouveau
19:45karolherbst: mupuf: ^^nouveau patches
19:47karolherbst: zcull stuff: https://android.googlesource.com/kernel/tegra.git/+/77b7865b631d2af135ca24915e0c2937ebd29879
19:57Lyude: karolherbst: cool, thanks!
20:05pmoreau: Quite a few patches over there
20:23RSpliet: karolherbst: that zcull patch is courtesy of sooda ;-)
20:24sooda: that's a super simple one, likely most zcull stuff hidden under the hood and you know better than me what's supposed to be in the pushbuffers, as i've said iirc :P
20:25sooda: there's indeed quite a few patches; however, from what i remember many of them don't get things right the first try so features are kind of split...
20:25RSpliet: sooda: no need to downplay your contributions :-) I thought I'd drop that to let him know that in the worst case we can communicate about this work too. Of course it's better if we let you work on your own chromium work as I bet your boss doesn't want you to spend too much time on upstreaming (boo at your boss if true :-P)
20:26sooda: yeah surely if there's anything odd just ask me
20:27sooda: we did discuss internally about upstreaming all the pixel c things, but somehow other things got prioritized over ;(
20:28RSpliet: yeah, corporate life isn't always compatible with ideals
20:28RSpliet: (nor with good solutions. In SoC world the quicker seems to always be the better)
20:29RSpliet: pretty cool though that it took you guys like... 140 relatively small patches?
20:30sooda: i didn't count them, but the coolest thing i remember is that getting the simplest things to work with our userspace needed very little kernel changes
20:31sooda: even perf got pretty close. likely just zcull and compression and such is the big remaining delta
20:31RSpliet: no, closer to 275 patches, but still. that includes iterative changes :-)
20:34RSpliet: and a lot of it is DVFS and clock gating, which is just something that has had rather little love (or well.. rather lots of "tough love" :-P) from our side
20:34sooda: ha :)
20:34RSpliet: seeing that makes me feel quite confident about the quality of nouveau as it is.
20:37sooda: i'm feeling that the big rm driver is pretty fat due to each and every tricky corner case and rare bug workaround and sli and such things. the (open-source) nvgpu driver for tegra isn't too big
20:38RSpliet: Oh yeah, and I bet there's a lot of legacy code in rm for cards that are no longer supported - but untangling it is a bit messy
20:39RSpliet: plus... if I see how much effort AMD is putting in their display abstraction layer I bet there's a million quirks that we don't know about for monitors that might run somewhere in a cellar :-P
20:41karolherbst: RSpliet: that DVFS stuff is stupid and silly and unusable
20:42karolherbst: I already have better code
20:44karolherbst: well unusable on dedicated GPUs
20:44RSpliet: karolherbst: I would personally appreciate it if you could value this "given horse" a bit better. It might not be perfect for our broader use-cases, but I'm sure it's valuable code nonetheless.
20:44karolherbst: sadly, not really
20:44karolherbst: most of the other stuff is useful, but not the DVFS part
20:46karolherbst: RSpliet: and it depends on the tegra PMU images, there is mostly nothing on the host, except setting up the counters, a little
20:46sooda: how's xdc btw? it's rather far away this time :p
20:46karolherbst: it's nice
20:46sooda: quite a few attendees in the list
20:48pmoreau: sooda: Hard to be closer than last year! ;-)
20:48karolherbst: RSpliet: and if they would post those patches on our ML, I would answer the same
20:48sooda: if i was an organizer, it would happen in the otaniemi aalto univ campus which is much closer to my place! :D
20:51karolherbst: sooda: well, there is always next year
20:54RSpliet: karolherbst: I know. There's a difference between "valuable" and "upstreamable" though, any line of code is free documentation that gives away hardware specs and/or engineer mindset of someone who had access to those. I'm not asking you to ack code like that, just be a bit more appreciative of the effort of hardworking people on open source software, as I'd hate to see hostility between people who want to reach similar goals (albeit in
20:54RSpliet: different forms or shapes).
20:54karolherbst: well, the DVFS also didn't give us any more information
20:55RSpliet: you can always ask marcheu for the rationale behind his decisions ;-)
20:57karolherbst: look at those patches, then your questions shall be answered
20:58karolherbst: you won't see me saying that nothing of this is of value, just that those DVFS patches are not and if we wanna go through all those patches, we need to filter out the useless ones either way
20:58karolherbst: and sooner or later we would come to the conclusion, that some patches are useless
20:59karolherbst: and I would rather spend time to appreciate useful patches, than spending time discussing why useless patches should be still appreciated. Sure we could do that, but it won't get us anywhere
21:00karolherbst: at worst it would even tell nvidia, that useless stuff is still "fine"
21:00karolherbst: the power/clock gating is super useful, that's why I showed it to Lyude
21:03RSpliet: karolherbst: Yes. I got that. I'm telling you that we're dealing with human beings. There's no need to call their work stupid and silly, those kinds of remarks are counterproductive in creating goodwill. You may decide their work is not of great use to our goal of upstream nouveau, but slagging off other people's hard work just because it's incompatible with your (or our) use-case should not be tolerated.
21:03marcheu: RSpliet: the decision was something like "I have 30 min to implement something, what can I do within that time"
21:04karolherbst: RSpliet: okay, that's true, my phrasing was terrible
21:04karolherbst: marcheu sorry about that, no offense meant
21:06RSpliet: karolherbst: thanks, appreciate it. :-)
21:07RSpliet: marcheu: wow... :-D Are you still involved with nouveau/tegra on the chromium side?
21:08marcheu: RSpliet: we did pixel C...
21:08marcheu: karolherbst: np, this code is not great :p
21:09karolherbst: pmoreau: hihi your work is mentioned
21:09pmoreau: karolherbst: Woot, where?! o_O
21:09sooda: what do you think about this not great thing btw :D https://android.googlesource.com/kernel/tegra.git/+/2c3e13563bdb2be13fe60656b567aa62aed944a2%5E%21/
21:10karolherbst: pmoreau: clover. SPIR-V -> nv50ir
21:10sooda: i managed to understand only half of the object madness and our userspace needed to call methods of objects created by the kernel, which wasn't allowed
21:10karolherbst: sooda: :D
21:11pmoreau: karolherbst: Where in clover? I’m confused
21:11sooda: the whole thing kinda looked like the kernel-created objs are "soon" going to be replaced by some new mesa code or something
21:11karolherbst: pmoreau: on the slide?
21:11marcheu: sooda: you know we were running the closed source user space on top of this right? :)
21:12karolherbst: pmoreau: XDC presentation, state of gpgpu
21:12sooda: i know i wrote the parts for the closed source driver to talk to that api
21:13pmoreau: karolherbst: Ah, ok, I was completely missing the context. Yes, Tom sent me some emails to know what the current status is. :-)
21:13marcheu: sooda: did you work w/ Lauri?
21:15RSpliet: marcheu: perhaps superfluous, but sooda == Konsta
21:16marcheu: ah, then yes we were in that together :p
21:16imirkin_: sooda: there's now 'nvif' which allows a lot more of that
21:16imirkin_: [calling stuff directly]
21:18sooda: marcheu: a little bit, yes. he started the "v2" of the submit api for explicit fencing and stuff iirc and then moved on :P
21:18sooda: (that's the one we use with the closed userspace thing)
21:19sooda: imirkin_: iirc that's what i used already
21:19karolherbst: well conclusions: we should talk more about stuff ;)
21:19karolherbst: and sending patches to the ML also helps a lot to initiate it
21:19sooda: just needed to nvif-call some stuff that was initialized from the kernel
21:19imirkin_: sooda: oh ok
21:20sooda: spent a nontrivial amount of time wrapping my head around it :D
21:20imirkin_: i don't remember when nvif came about, nor when you forked things
21:20imirkin_: yeah, it's confusing as hell
21:20imirkin_: if it makes you feel any better, it's slightly less confusing than it was ~2 rewrites ago
21:21karolherbst: sooda: well you also could just add more nvif methods to do that
21:21sooda: i think stuff like the zcull and this need it https://android.googlesource.com/kernel/tegra.git/+/b8fc4450d9d8d7c79b5d5bdc25165fba6c35fd1f
21:21karolherbst: well, at least this would be the cleaner way
21:21sooda: "A new KEPLER_SET_CHANNEL_PRIORITY mthd on fifo channel objects" yeah sounds relevant, that just was like two years ago so i've lost the details :p
21:22imirkin_: it's a lot easier when you understand how the hw works :p
21:22sooda: like an ioctl inside an ioctl
21:23imirkin_: pretty much.
21:24sooda: the nvgpu driver is much less magic in the sense that all such objects are just ioctls on certain fds
21:24imirkin_: i believe ben had dreams of virtualization
21:24sooda: that's what you explained to me years ago :)
21:24sooda: i pasted a few lines of irc logs to our wiki. still there i guess
21:25sooda: just compare these two
21:25RSpliet: karolherbst: does Ben still have these dreams? Does he talk in his sleep? :-P
21:25karolherbst: RSpliet: no clue
21:27sooda: sleep ->
21:29marcheu: I see Ben right now and he's awake, not dreaming
21:29RSpliet: sooda: oh that timeslice tuning actually seems quite nice. Would be nice if we can use per-context "load" information to tune timeslice lengths in-kernel altogether...
21:33karolherbst: imirkin_: I broke the register selection of the splits now and all values aren't assigned in one step :/
21:35karolherbst: meh, it's a bit more complicated than I thought, but I guess I will be able to manage
21:44karolherbst: well, still looks wrong though
21:44karolherbst: but I am getting there
21:50karolherbst: something is messed up with the merges
22:50karolherbst: imirkin_: does this look correct to you? https://gist.github.com/karolherbst/7788a7ea12156b43ee9c9e66d272fc20
22:50karolherbst: limiting to 7 regs
22:53imirkin_: 1: st b128 # l[0x20] $r0q (8)
22:53imirkin_: 2: ld u32 $r0 l[0x20] (8)
22:53imirkin_: seems a tad wasteful :p
22:53imirkin_: and shouldn't happen ... iirc the restore should only run if it's not the next instruction =/
22:54imirkin_: anyways... this seems odd. only goes up to r4
22:54imirkin_: i'd do 8
22:54karolherbst: yeah, I know
22:55imirkin_: 49: st u32 # l[0x64] $r0 (8)
22:55imirkin_: like ... where does that get retrieved?
22:55karolherbst: I think there is too much spilled actually
22:55imirkin_: nowhere i see
22:55imirkin_: so yeah, i think you're spilling a bit too much.
22:56imirkin_: and i think some of your offsets are wrong
22:57imirkin_: 36: export u64 # o[0x40] %r205 %r208 (0)
22:57imirkin_: 77: ld u32 $r0 l[0x20] (8)
22:57imirkin_: 78: ld u32 $r1 l[0x20] (8)
22:57imirkin_: 79: export u64 # o[0x40] $r0d (8)
22:57karolherbst: I think it is still broken
22:59imirkin_: (it's supposed to be 2 diff values, but there it's loading the same value)
23:00karolherbst: well, I didn't touch the offset stuff, just how the node dependencies are tracked
23:01karolherbst: but it really helps me to understand how the code works, so I suspect I will throw my changes away several more times until I get something working for real
23:09karolherbst: imirkin_: you are right, I need to fix the offsets, because for spilled merges/splits, the offset in the local space isn't considered for the values
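The offset problem discussed above (both halves of the exported 64-bit value reloaded from l[0x20]) can be illustrated with a minimal sketch. It is not code from nv50_ir_ra.cpp: the function name `component_slots` and its parameters are hypothetical, and it only assumes that each 32-bit component of a spilled wide value lives at the spill base plus its own byte offset.

```python
# Hypothetical illustration; none of these names come from nv50_ir_ra.cpp.

def component_slots(base_offset, total_bytes, component_bytes=4):
    """Return the local-memory offset of each component of a spilled wide value.

    Assumes the components are stored back to back starting at base_offset.
    """
    if total_bytes % component_bytes:
        raise ValueError("wide value must be a whole number of components")
    count = total_bytes // component_bytes
    return [base_offset + i * component_bytes for i in range(count)]


# A 64-bit value spilled at l[0x20] and refilled as two 32-bit halves:
# each half gets its own offset instead of both reading l[0x20].
halves = component_slots(0x20, 8)
print([hex(offset) for offset in halves])
```

Under that assumption the two refills in the paste would read from different slots, which matches the remark that the export is supposed to consume two different values.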
23:15karolherbst: imirkin_: but I think it's not my fault that too much is spilled, because that's how the code basically works. I think we could spill a little less, but it is never a problem usually because it was 60 vs 62 live values, not 4 vs 6 or so
23:15karolherbst: not quite sure about it though
23:15imirkin_: it really likes to align at 4
23:16imirkin_: so if you have 7
23:16imirkin_: then ... that's not great ;)
23:16karolherbst: probably, yeah
23:17karolherbst: I know the issue
23:17karolherbst: it's like that: either all of the split values are spilled or none
23:18imirkin_: "over-joining" =/
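The remark about aligning at 4 with a 7-register limit can be made concrete with a small sketch. It assumes, purely for illustration, that a value spanning `width` consecutive registers must start at a register index that is a multiple of `width`; the helper names are hypothetical and not part of the nouveau code generator.

```python
# Hypothetical illustration of aligned register groups; not nouveau code.

def aligned_slots(num_regs, width):
    """Start indices where a `width`-register value fits, assuming the
    start index must be a multiple of `width`."""
    return list(range(0, num_regs - width + 1, width))


def stranded_regs(num_regs, width):
    """Registers that cannot be part of any aligned `width`-register group."""
    return num_regs - len(aligned_slots(num_regs, width)) * width


for limit in (7, 8):
    print(limit, aligned_slots(limit, 4), stranded_regs(limit, 4))
```

With a limit of 7 only one quad ($r0 to $r3) fits and three registers cannot hold a second 128-bit value, while a limit of 8 fits two quads with nothing stranded, which is one way to read the earlier suggestion to test with 8 instead.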
23:28karolherbst: pro tip: don't extend the live range of the vfetch value
23:43optlink: How long is a mmiotrace supposed to take?
23:44imirkin_: it doesn't take time
23:44imirkin_: it starts and stops whenever you tell it
23:45imirkin_: karolherbst: you say that, but when e.g. that vfetch goes directly into a tex arg...
23:45optlink: Sorry. I meant I'm trying to start X during a trace and it hasn't started yet even after an hour
23:45imirkin_: ah. should take a few mins at most.
23:45imirkin_: 10 at the outside.
23:45karolherbst: imirkin_: mhh, right
23:45karolherbst: imirkin_: but it doesn't matter
23:45imirkin_: karolherbst: pro tip: RA is hard ;)
23:46karolherbst: currently the value of the vfetch is extended beyond the split and all the live ranges of all split values are merged into the vfetch value as well
23:46karolherbst: that's what I was referring to
23:46imirkin_: well, that's just coz the RIG nodes are merged
23:46karolherbst: but this doesn't happen if there is no split/merge to begin with
23:47imirkin_: it is a questionable thing.
23:47imirkin_: yeah, but there would be a split/merge ;)
23:47karolherbst: I basically removed that merging
23:47imirkin_: i do have a split/merge pass fixer
23:47imirkin_: that i did for doubles
23:47imirkin_: but it only works if things match up perfectly
23:47imirkin_: which they might not for textures
23:58optlink: imirkin_: I'm trying to trace on an optimus system. Is the trace still useful if I use bumblebee? Nvidia-xrun is what I'm using now, apparently it doesn't work
23:59imirkin_: sure, should be about the same