10:18 mwk: Lyude: do you have envytools push access?
10:19 mwk: hm, apparently not
11:12 RSpliet: sooda: the download.nvidia.com FTP is down?
11:13 skeggsb: RSpliet: yeah, it's down while they upload all the docs and stuff for us ;)
11:13 sooda: doesn't seem to work internally either
11:13 RSpliet: sooda: tnx. Apparently I can still access the open-gpu-doc structures over http
11:13 mupuf: RSpliet: yeah, the http link has been the only one working for some time
11:15 RSpliet: oh hadn't noticed before... I guess FTP is outdated tech so can't blame them for taking it off-line. Wouldn't want to lose access to open-gpu-doc though :-D
11:16 pmoreau: skeggsb: Was that a pun or are they really taking it down to upload stuff for us? :-D
11:17 RSpliet: mwk: know why on Kepler NVIDIA sometimes introduces nops to make sure "far" backwards branches end up at the start of the 64-byte instruction block (in fact, pointing at the sched codes)? Is that an I-cache optimisation or just a limitation in the ISA?
11:17 skeggsb: pmoreau: i'd hoped my ";)" gave away that i was dream-typing
11:17 pmoreau: skeggsb: :-)
11:18 RSpliet: *far backward branch targets
11:18 karolherbst: :O
11:24 RSpliet: hmm, doesn't seem to be an ISA encoding requirement...
11:26 skeggsb: i don't suppose their compiler just pads out each basic block with nops?
11:27 mwk: RSpliet: I would guess cache alignment
11:27 mwk: it's quite common on many archs
11:29 karolherbst: maybe they read the instruction blocks at once, due to the sched opcodes?
11:29 RSpliet: karolherbst: yes, and conveniently 64-byte is also the minimal granularity of a DRAM operation
11:30 RSpliet: (with a burst length of 8 and 64-bit channels...)
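The 64-byte figure follows directly from the two numbers quoted above. A minimal sketch of the arithmetic, assuming only the burst length and channel width stated in the chat (the function name is illustrative):

```python
# Minimum DRAM transfer size = burst length x channel width.
BURST_LENGTH = 8         # beats per burst (figure quoted in the chat)
CHANNEL_WIDTH_BITS = 64  # bits transferred per beat (figure quoted in the chat)


def min_dram_access_bytes(burst_length=BURST_LENGTH, width_bits=CHANNEL_WIDTH_BITS):
    """Return the smallest DRAM transfer in bytes for the given burst parameters."""
    return burst_length * width_bits // 8


print(min_dram_access_bytes())  # 64
```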
11:30 karolherbst: RSpliet: do they fill up to 6 nops? or simply one?
11:30 karolherbst: or more?
11:30 RSpliet: I've seen it fill up with 5 nops, but that's a special case
11:31 karolherbst: ohh right, 6 makes no sense
11:31 karolherbst: or does it? let me think
11:31 karolherbst: okay yeah, 6 might make sense
11:32 RSpliet: well, that special case being the expansion of OpenCL's async_work_group_copy()
11:32 RSpliet: they might just be blindly inlining some ASM for that
12:12 RSpliet: Ignoring the async_work_group_copy() it almost looks like an ad-hoc optimisation to avoid having the start of a loop at the end of an instruction block. Found one loop where they insert a single nop, and three nested loops where they didn't bother introducing 5 or 6. I'd have to look at more shaders, but perhaps it's a small threshold they assume is beneficial
12:29 karolherbst: RSpliet: I think for nested loops the overhead of executing those NOPs could be too big. If they know they only have to do it once: fine, otherwise: better not to
12:29 RSpliet: karolherbst: for the outer loop it shouldn't matter
12:30 karolherbst: why not? the outer loop could be executed a few times as well
12:33 RSpliet: yes, but the nops leading up to it aren't
12:35 karolherbst: yes
12:35 karolherbst: I think that's what I said, wasn't it?
12:35 skeggsb: the nops seem like a good way to avoid fetching useless (ie. before the jump target) instructions into the cache
12:35 karolherbst: ohh, okay, I think I see what I wrote wrong
12:36 karolherbst: RSpliet: I meant if they know they execute NOPs once, it's fine for them, this also includes NOPs in front of outer loops
12:36 karolherbst: skeggsb: they have to fetch those anyway (sched)
12:36 karolherbst: ohh right
12:36 karolherbst: that's what you meant
12:36 karolherbst: sorry
12:37 RSpliet: skeggsb: yeah exactly, an I-cache optimisation :-)
12:38 RSpliet: just curious where NVIDIA would've decided the threshold lies
17:23 Lyude: mwk: nope
17:27 RSpliet: karolherbst: well every inner loop has a high likelihood of being executed more often than its enclosing loops. I suspect the maths always work out
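The nop counts discussed above (1, 5, "6 might make sense") can be reproduced with a small sketch. It assumes the Kepler layout of 8-byte instructions with one sched word in slot 0 of every 64-byte block followed by 7 instructions; the function name and the padding policy are hypothetical, not NVIDIA's actual compiler logic:

```python
INSN_BYTES = 8    # assumed: 8-byte instruction words on Kepler
BLOCK_BYTES = 64  # 64-byte block: slot 0 is the sched word, slots 1-7 are instructions


def nops_to_align(loop_head_addr):
    """Nops to insert before a loop head so that it lands in slot 1 of a block.

    A backward branch can then target the sched word in slot 0, i.e. the
    start of the 64-byte block.
    """
    slot = (loop_head_addr % BLOCK_BYTES) // INSN_BYTES
    if slot == 0:
        raise ValueError("slot 0 holds the sched word, not an instruction")
    if slot == 1:
        return 0      # already the first instruction of its block
    return 8 - slot   # fill slots slot..7, the loop head moves to the next block


for addr in (0x08, 0x10, 0x18, 0x38):
    print(hex(addr), nops_to_align(addr))
```

Under these assumptions the padding ranges from 0 to 6 nops, and it is paid once per entry into the loop, while the aligned fetch benefits every iteration.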
21:06 z411: hello everyone, quick question: is vsync breaking when using xrandr a known issue?
21:20 pmoreau: z411: Hi. What kind of use of xrandr is breaking vblank? Like just running `xrandr` without arguments?
21:24 z411: pmoreau, Hi, thanks for the response. It happens when I do anything with it like switching mode, refresh rate or turning screens on/off.
21:24 z411: https://pastebin.com/raw/VDaKz0Bc
21:25 z411: For example, here I tried running glxgears and it works fine (it caps at 60 FPS), but running a mode change, vsync seemingly stops working, even after I turn it back to the original mode again (1920x1080)
21:25 z411: The same happens if I change the refresh rate, or if I turn on my second monitor.
21:28 pmoreau: Mhh… I *think* it's not expected to work with multiple screens, regardless if using xrandr or not.
21:30 z411: Will disconnect my second monitor to see if it's related to that, a sec...
21:31 z411: Mhm, seems to still happen even with one monitor connected, it seems just changing the mode triggers the issue
21:32 z411: I might be going old though, I'm on kernel 4.9.25, will still try on mainline
21:32 pmoreau: 4.9 is not too old yet :-)
21:35 z411: Vsync does work with two monitors as long as I don't manipulate anything with xrandr
21:35 z411: Pretty weird, should I report this?
21:35 pmoreau: You should open a bug report with kernel + X version, which DRI version you are using, whether you are using the Nouveau DDX (and which version) or the modesetting one, include your Xorg.0.log + the test you pasted
21:36 karolherbst: I reworked my pmu counter stuff. If anybody wants to look before I send out a new version of the series, please check it out: https://github.com/karolherbst/nouveau/commits/pmu_counters_v2
21:36 pmoreau: That way people who know how it works can have a look
21:36 pmoreau: karolherbst: The last 7 commits, right?
21:37 z411: pmoreau, I see, will do that. Thanks.
21:38 pmoreau: z411: Thanks for reporting
21:43 pmoreau: karolherbst: So, load is reported as a hex value between 0x00 and 0xff? Wouldn't it be "easier" to have it as a percentage?
21:44 mupuf: pmoreau: don't lose precision!
21:44 mupuf: and a value is never reported as hex or binary or anything ... unless it is in a string ;D
21:44 mupuf: It is just a u8
21:44 mupuf: </pedantic> :D
21:44 pmoreau: :-p
21:46 karolherbst: pmoreau: it
21:47 karolherbst: 's easier this way
21:47 karolherbst: a simple div on the falcon
21:47 pmoreau: Would you lose too much precision if it was reported as 37.8%? It's only the value reported to userspace, not the one used for dyn reclocking, so it shouldn't matter too much, should it?
21:47 karolherbst: pmoreau: check the PMU code and it makes sense
21:47 pmoreau: I mean the value outputted in debugfs
21:47 karolherbst: 2nd commit
21:47 karolherbst: ahh I see
21:47 karolherbst: oh well
21:47 karolherbst: I could print it out as %
21:48 karolherbst: details ;)
21:48 pmoreau: I agree it's way easier to keep it as u8 on the falcon
21:48 pmoreau: :-)
21:48 karolherbst: we could even report the full u8 value to userspace
21:48 karolherbst: and let the gallium_hud convert it to %
21:50 pmoreau: True, gallium_hud could certainly do the conversion. Or any userspace script.
21:50 karolherbst: yeah
21:51 mupuf: yes, push all the precision to the userspace and let the HUD do its magic
21:51 karolherbst: going higher than u8 doesn't make much sense though, because we really don't need that much precision
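A userspace conversion of the kind suggested above could look like the following minimal sketch. It assumes that 0xff corresponds to 100% load, which the chat does not confirm; the function name is illustrative and not part of gallium_hud or nouveau:

```python
def load_to_percent(raw):
    """Convert a raw u8 load value (0x00-0xff) to a percentage.

    Assumption: 0xff means 100% load.
    """
    if not 0 <= raw <= 0xff:
        raise ValueError("load must fit in a u8")
    return raw * 100.0 / 0xff


print(round(load_to_percent(0x60), 1))  # 37.6
```

Keeping the raw u8 up to userspace preserves all 256 steps (about 0.39 percentage points each), whereas rounding to an integer percentage in the kernel would leave only 101 distinct values.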
21:53 karolherbst: I was even thinking about sending the full u32 values to the kernel, but then I couldn't read it out in one go and would need to change too much in the PMU-host communication, so I kept it as u8
21:54 mupuf: karolherbst: but there may be more than 4 channels in the future :p
21:54 karolherbst: the code supports 8 actually
21:54 mupuf: I will check the code later, still working on this article
21:54 mupuf: oh, cool
21:54 karolherbst: only gt215 has 4 channels
21:54 karolherbst: gf100+ has already 8
21:54 karolherbst: and this is plenty
21:55 karolherbst: even nvidia doesn't have anything useful to do with those
21:55 pmoreau: :-D
21:55 karolherbst: they use 5 or 6 usually
21:55 karolherbst: and 2 of those are kind of not needed actually
21:55 karolherbst: like they fill 2 slots for video accel stuff....
21:56 karolherbst: if I didn't make any mistakes, this is what nvidia uses: https://gist.githubusercontent.com/karolherbst/1eb3759be936406734bcfa308c2652b2/raw/56dd0ea5c1396f70fdcec445455dea8b27773260/gistfile1.txt
22:03 mupuf: yeah, it looks familiar
22:03 mupuf: and it was funny that they included pcopy and GR together on older platforms
22:03 mupuf: btw, nva5 also was using PCOPY IIRC
22:04 karolherbst: maybe just bad naming on our end
22:04 karolherbst: I am wondering how dyn reclocking worked on older gens
22:05 karolherbst: or maybe they just monitored the FPS in userspace and complained about bad perf?
22:07 mupuf: karolherbst: they used pcounter
22:08 mupuf: check mmiotraces
22:08 RSpliet: wasn't it more crude... like "when launching a new context/game/we, crank the clocks up to max"
22:08 mupuf: RSpliet: this is still true
22:08 RSpliet: yeah
22:08 mupuf: no, they were monitoring the perf counters
22:08 RSpliet: ok
22:09 RSpliet: never looked at that much
22:09 mupuf: from pcounter
22:09 mupuf: the problem is that they had to disable dyn reclocking when the userspace wanted to use the perf counters
22:09 mupuf: whereas, with the pmu, both of them are independent
22:10 karolherbst: ohh I see
22:10 karolherbst: well dyn reclocking was kind of crappy on Tesla anyway
22:13 mupuf: in what sense?
22:13 mupuf: only the reclocking process was chaotic and very chip-dependent
22:14 mupuf: dyn reclocking was quite alright, aside from the fact that memory usage was not taken into account IIRC
22:14 karolherbst: well there weren't many perf levels
22:14 mupuf: ah, right!
22:14 mupuf: yes, only 4 at best, only on laptops
22:15 mupuf: desktop PCs had funnier vbioses
22:15 karolherbst: yep
22:25 z411: pmoreau: Thanks for mentioning the modesetting driver, I can now confirm it only happens with the DDX one
22:35 pmoreau: z411: Which version of the Nouveau DDX did you test?
22:43 z411: pmoreau, 1.0.13, which is the latest Debian sid offers, although experimental has 1.0.15; might try getting that one
22:51 z411: Still happens with 1.0.15, sadly.
23:11 karolherbst: yay, getting notified and setting new thresholds is working :)
23:16 mupuf: :p
23:17 karolherbst: but something is wrong on the PMU set, the max value isn't set to the value I've expected...
23:30 karolherbst: a falcon C compiler would have been nice now :D