09:24 danuker: Hi guys! I wish to thank you for your work, I was surprised that my GTX 850M works in 3D games with nouveau!
09:56 danuker: I am currently compiling LLVM (need it to use Mesa from git) to try out the latest version. @imirkin_: Would it help to provide piglit results on it? (NV117 / GM107)
09:59 karolherbst: danuker: you don't need llvm if running only nouveau
09:59 karolherbst: just disable radeonsi and llvm support when building mesa
10:00 danuker: oh, thanks! it was quite slow
10:00 karolherbst: yeah.. building llvm is pain
10:00 karolherbst: but you usually don't need your own llvm to build mesa anyway
10:00 karolherbst: just a new enough version is enough as well
10:01 danuker: well, debian testing only has LLVM 7, and mesa required 8
10:01 karolherbst: ahh, I see
10:01 karolherbst: danuker: https://apt.llvm.org/
10:02 danuker: wow, thanks xD
12:11 danvet: https://patchwork.freedesktop.org/patch/325648/ some testing/review very much appreciated ...
14:45 imirkin_: karolherbst: minor point, but llvm is used by nouveau via gallivm for GL_SELECT/FEEDBACK render modes.
14:45 imirkin_: (well, more like used by st/mesa)
14:45 karolherbst: ohhh
14:45 karolherbst: interesting
14:46 karolherbst: why is llvm used for that?
14:46 imirkin_: well, "draw" is used.
14:46 karolherbst: I see
14:46 imirkin_: which in turn will use gallivm if available
14:46 imirkin_: (there's also a DRAW_USE_LLVM env var iirc)
14:46 karolherbst: what happens if you compile with llvm=false then?
14:46 karolherbst: slower path used?
14:46 imirkin_: yep
14:46 karolherbst: k
14:47 imirkin_: but again ... only for SELECT/FEEDBACK, which are exclusively used by old versions of blender
14:47 imirkin_: and software that was last updated in the past century.
14:47 karolherbst: and I assume there is no nice way to do that on the GPU
14:47 imirkin_: mareko and i have talked about it in the past
14:47 imirkin_: it's doable, but lots of effort, esp in the presence of an actual geometry shader (in a compat context)
14:48 imirkin_: actually i think his "clip in compute" logic can be used for 97% of this
14:56 imirkin_: danvet: what precisely needs to be tested?
14:56 imirkin_: is this a "does it work" thing, or does it need to be thorough in some way?
14:57 imirkin_: you mention that it affects relocations, so pre-nv50 ... i have a nv34 plugged in, so it wouldn't be an enormous amount of trouble.
14:57 danvet: imirkin_, "does it work" but on a pre-nv50 apparently
14:58 danvet: something that still uses relocs
14:58 imirkin_: danvet: is there a way to force the "slowpath" logic?
14:58 danvet: so yeah if you can give it a spin, very much appreciated
14:59 danvet: imirkin_, as long as you hit a reloc path it should be hit
14:59 danvet: i.e. first batch
15:00 danvet: the slowpath is hit unconditional if we do any relocs
15:00 imirkin_: ok. pretty much all pre-nv50 batches will have relocs
15:00 danvet: I meant: reloc and we actually have to apply them
15:00 danvet: after buffers are placed we should go back to the fastpath
15:00 imirkin_: mmmaybe... tbh i'm weak on how relocs work out in practice
15:00 imirkin_: [on nouveau]
15:01 imirkin_: it's one of those "it works, so i'm happy" things :)
15:03 imirkin_: danvet: if all goes to plan (what are the chances), i should be able to do a basic test tonight
15:03 imirkin_: danvet: if you don't hear from me by tomorrow, ping me again
15:13 bbigras: Is this enough to report a hang? https://pastebin.com/Kb1z4PdU I mean for the log part, I can try to provide details and system info.
15:14 imirkin_: bbigras: don't bother
15:14 imirkin_: there's not a ton that can be done
15:15 bbigras: imirkin_: what do you mean?
15:15 imirkin_: i mean providing more info won't lead to the issue being investigated or fixed
15:22 bbigras: In the sense that nobody gives a damn or because it's not really reproducible without the same hardware and os or something like that?
15:30 karolherbst: reproducibility is a big issue
15:31 karolherbst: and then it depends on the actual issue
15:31 karolherbst: some are just caused by a big issue which can't be easily fixed without spending weeks/months of dev time
15:36 imirkin_: bbigras: a bit of both. i've tilted at that particular windmill many times... my last theory is that our fifo ctx switch logic for nv50 is bogus, but fixing it requires ... work
15:36 imirkin_: (and i have no time to invest into such an effort)
15:38 imirkin_: sorry that's a shitty answer, but i don't really have a better one
15:38 imirkin_: nouveau is in desperate need of proper developer attention
15:39 imirkin_: unfortunately such people don't present themselves often
15:39 karolherbst: or they run away :p
15:39 imirkin_: and the ones who do eventually burn out / move on / etc
15:55 bbigras: No worries I understand. Thanks to you both.
16:05 imirkin_: bbigras: basically the current situation is that nv50-era gpus sometimes hang. it's relatively rare, but as more and more toolkits decide that GL is necessary to accelerate the drawing of single pixels, it will become more and more frequent.
16:14 meeku_: so it seems the old vga register is a no go, I'm now looking at ways to detect vsync in a device specific manner
16:15 imirkin_: tbh i'm not even sure how it works on nouveau
16:15 imirkin_: it must be that we get an intr...
16:15 imirkin_: (after we ask for it)
16:15 meeku_: i would think so
16:15 meeku_: the Intel docs work in a similar fashion
16:16 meeku_: but i did note the hw sequencer can detect it
16:16 meeku_: wondering if that might not be easier to setup
16:17 imirkin_: "hw sequencer"?
16:20 meeku_: just trying to find the damn thing now
16:20 meeku_: https://github.com/envytools/envytools/blob/master/docs/hw/bus/hwsq.rst
16:20 imirkin_: oh. that hwsq.
16:21 imirkin_: is it a thing on your target hw?
16:21 meeku_: not sure if it's only on newer cards or only on older ones
16:22 meeku_: because it has 0x5f: ewait - waits for an event [3 bytes] [NV41+ only]
16:22 imirkin_: according to those docs, it's gone starting fermi.
16:23 meeku_: bugger
16:23 meeku_: i'd be happy to have something for newer cards only
16:24 meeku_: maybe there is a status bit that is directly probable from the CRTC regs
16:30 imirkin_: so basically you want to wait until next vblank period, right?
17:51 meeku_: back
17:51 meeku_: yep
17:51 meeku_: it really should be as simple as polling on some bit
17:51 meeku_: "should"
17:51 meeku_: being a probably non-operative word :)
17:52 meeku_: GPUs .. a lesson in how to over-engineer and over-complicate something
18:04 meeku_: possibly:
18:04 meeku_: #define NV_PDISP_FE_EVT_STAT_HEAD_TIMING(i) (0x00611800+(i)*4) /* RW-4A */
18:05 meeku_: #define NV_PDISP_FE_EVT_STAT_HEAD_TIMING_VBLANK 2:2 /* RWIVF */
18:05 meeku_: #define NV_PDISP_FE_EVT_STAT_HEAD_TIMING_VBLANK_INIT 0x00000000 /* R-I-V */
18:05 meeku_: #define NV_PDISP_FE_EVT_STAT_HEAD_TIMING_VBLANK_NOT_PENDING 0x00000000 /* R---V */
18:05 meeku_: #define NV_PDISP_FE_EVT_STAT_HEAD_TIMING_VBLANK_PENDING 0x00000001 /* R---V */
18:05 meeku_: #define NV_PDISP_FE_EVT_STAT_HEAD_TIMING_VBLANK_RESET 0x00000001 /* -W--V */
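
For reference, the defines above can be read as "bit 2 of the per-head register at 0x00611800 + i*4 is 1 while a vblank event is pending, and writing 1 to that bit resets it". A minimal sketch in C of polling on that bit follows. The names `rd32_fn`, `wr32_fn` and `wait_next_vblank` are hypothetical, the MMIO accessors have to be supplied by the caller, and the reset-then-poll order is an assumption; nothing in the log confirms this works on real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset, stride and bit position taken from the defines quoted above. */
#define HEAD_TIMING_BASE   0x00611800u
#define HEAD_TIMING_STRIDE 4u
#define VBLANK_SHIFT       2u /* field 2:2 */

/* Address of NV_PDISP_FE_EVT_STAT_HEAD_TIMING(head). */
uint32_t head_timing_addr(uint32_t head)
{
    return HEAD_TIMING_BASE + head * HEAD_TIMING_STRIDE;
}

/* True when the VBLANK field of a register value reads PENDING (1). */
bool vblank_pending(uint32_t stat)
{
    return ((stat >> VBLANK_SHIFT) & 1u) != 0;
}

/* Register value that places RESET (1) in the VBLANK field. */
uint32_t vblank_reset_value(void)
{
    return 1u << VBLANK_SHIFT;
}

/* Hypothetical MMIO accessors; how the registers are mapped is up to the caller. */
typedef uint32_t (*rd32_fn)(uint32_t addr);
typedef void (*wr32_fn)(uint32_t addr, uint32_t val);

/* Clear any stale event, then poll until VBLANK is pending or max_polls reads are used up. */
bool wait_next_vblank(rd32_fn rd32, wr32_fn wr32, uint32_t head, unsigned max_polls)
{
    uint32_t addr = head_timing_addr(head);

    /* Assumption: writing RESET clears a PENDING left over from an earlier vblank. */
    wr32(addr, vblank_reset_value());

    for (unsigned i = 0; i < max_polls; i++) {
        if (vblank_pending(rd32(addr)))
            return true;
    }
    return false; /* gave up after max_polls reads */
}
```

The bounded loop is deliberate: if the bit never fires on a given chip, the caller gets `false` back instead of hanging.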
18:26 imirkin_: meeku_: displays have gotten more complicated
18:26 imirkin_: think about getting audio to some specific monitor along a DP-MST chain.
18:26 imirkin_: it's just ... not simple.
18:27 imirkin_: it was much simpler in the cirrus days
18:29 meeku_: I agree that things are more complicated due to more features and expectations of devices over time, but i'm 99% positive it could be re-designed to be orders of magnitude less complex to use, but that's not going to happen unless one of us feels like starting our own gpu company .. but then you realise a lot of the problems stem from lower level standards like hdmi/dp.. so you might as well rebuild all the protocols and
18:29 meeku_: interfaces too :)
18:31 meeku_: like all the timing related info for modes.. it's all still based around a CRT timing model, which really shouldn't be necessary any more.. all the porches etc.. and if the monitor and gpu interface was smarter and simpler, enumerating possible modes and setting them should be a piece of cake
18:31 meeku_: but i digress.. all i need is vsync :)
23:00 RSpliet: karolherbst: phoa, that's been ages ago. https://envytools.readthedocs.io/en/latest/hw/pm/gt215-clock.html shows the best I can offer you in terms of docs
23:01 RSpliet: There's like a million ways you can configure your clock tree, and only one of them is Crystal->PLL->PLL - which is the most useful for the highest perflvl
23:02 RSpliet: Square boxie things with A and X are muxes, A are selection lines, X are inputs.
23:05 RSpliet: I *think* PLL_BYPASS was useful as a temporary state while you reconfigure the PLL. That way the clock won't be cut off or go all wobbly.
23:08 RSpliet: And in some cases the output clock was just one of your RPLL clocks, optionally divided by the VCO_DIV signal. The PLL should be bypassed then as it's useless. Definitely seen that being configured on NVA8 at lower clocks
23:10 RSpliet: This is a while ago, but I think the RPLLs were configured to 810MHz, and the low/mid perflvl were 135/405MHz, ergo RPLL/2 and RPLL/6.
23:11 RSpliet: In fact!
23:13 RSpliet: When I RE'd this, before I went wrinkly and bald, the blob would refuse to use the PLL for clocks below 810MHz. It would always use the RPLL+VCO_DIV and leave the PLL in bypass. Needless to say there were huge rounding errors. Anything between 405 and 810MHz would either result in 405 or 810MHz.
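
The arithmetic RSpliet describes can be sketched in C as below. The 810MHz RPLL figure and the /2 and /6 dividers come from the chat. The integer divider range 1..8, the nearest-match rounding and the name `rpll_nearest_khz` are assumptions made for illustration; the blob's actual selection rule is not given here.

```c
#include <stdint.h>

#define RPLL_KHZ 810000u /* RPLL frequency as recalled in the chat */
#define MAX_DIV  8u      /* assumption: integer dividers 1..8 */

/* Return the RPLL/divider frequency closest to target_khz; store the divider in *div_out. */
uint32_t rpll_nearest_khz(uint32_t target_khz, uint32_t *div_out)
{
    uint32_t best_khz = 0, best_div = 0, best_err = UINT32_MAX;

    for (uint32_t div = 1; div <= MAX_DIV; div++) {
        uint32_t khz = RPLL_KHZ / div;
        uint32_t err = khz > target_khz ? khz - target_khz : target_khz - khz;

        if (err < best_err) {
            best_err = err;
            best_khz = khz;
            best_div = div;
        }
    }
    if (div_out)
        *div_out = best_div;
    return best_khz;
}
```

With these assumptions, a 405MHz request maps to divider 2 and a 135MHz request to divider 6, while a 600MHz request falls back to 405MHz and a 700MHz request jumps to 810MHz, which is the rounding error described above.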
23:37 karolherbst: RSpliet: okay.. but one thing is, nvidia never touches the BYPASS bit
23:38 RSpliet: karolherbst: on what GPU?
23:38 karolherbst: all
23:38 RSpliet: NVA3/5/8?
23:38 karolherbst: we have 0 traces where nvidia sets that bit
23:38 karolherbst: _all_ literally
23:38 karolherbst: but on a gf119 it breaks reclocking
23:38 karolherbst: and when the bypass bit is set, the clock signal is super unstable
23:39 RSpliet: Does your search include PMU scripts?
23:39 karolherbst: it's engine reclocking
23:39 karolherbst: that's never done on the PMU
23:40 RSpliet: FB is an engine. It has a DLL rather than a PLL, but for all intents and purposes that makes no difference.
23:40 RSpliet: As for the other engines. I would have to dig reeeally deep for traces and stuff
23:41 karolherbst: I am scanning the traces right now.. but anyway
23:41 karolherbst: setting that bit breaks reclocking on that gf119
23:42 RSpliet: Can't confirm or deny. Clock trees change. Might be skewed too much when you bypass the PLL on that chip, can't say, never looked into it.
23:42 karolherbst: RSpliet: that's the command I am using to scan: $ find . -type f -iname '*trace*' | while read -r line; do echo "$line"; demmio -f "$line" -c 0 2>/dev/null | grep -i BYPASS_PLL_CHECK; done
23:42 karolherbst: RSpliet: well.. what we do is, to disable the bypass, do that PLL_LOCK check
23:42 karolherbst: then enable the bypass again
23:43 karolherbst: question is: what does this bypass do exactly
23:43 RSpliet: ah, no that's a different bit
23:43 karolherbst: and why would we want to turn it on
23:43 RSpliet: that essentially disables the PLL check, making it look locked even if it isn't.
23:43 karolherbst: okay.. and why is it a good idea to enable it?
23:44 RSpliet: Never, just for debugging purposes during board bring-up I imagine
23:44 karolherbst: okay.. and why was the code changed then in the first place?
23:44 RSpliet: "the code"?
23:44 karolherbst: nouveau
23:44 karolherbst: you added that code :p
23:45 RSpliet: mind pointing me at that code?
23:45 karolherbst: https://github.com/skeggsb/nouveau/commit/35d351975dcc2989facddbafcabba204278c4049
23:46 karolherbst: it's the "nvkm_mask(device, addr + 0x00, 0x00000010, 0x00000010);" part which breaks it
23:46 karolherbst: just trying to understand why that line was added and if we can just remove it without any danger
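
For readers unfamiliar with the call quoted above: a mask-style helper is a read-modify-write, so with mask and data both 0x00000010 it sets bit 4 and leaves every other bit of the register alone. The sketch below models that on a plain variable. `mask32` is a hypothetical stand-in and not the kernel's `nvkm_mask`, which operates on device MMIO.

```c
#include <stdint.h>

/* Read-modify-write: clear the bits in mask, OR in data, return the previous value. */
uint32_t mask32(uint32_t *reg, uint32_t mask, uint32_t data)
{
    uint32_t old = *reg;

    *reg = (old & ~mask) | data;
    return old;
}
```

Starting from a value of 0x00000005, `mask32(&reg, 0x10, 0x10)` leaves 0x00000015 in `reg`, and `mask32(&reg, 0x10, 0)` restores 0x00000005, so dropping the line in question only changes whether bit 4 gets set.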
23:47 RSpliet: "and disable test logic when done (presumably to save power)."
23:47 karolherbst: where does that assessment come from?
23:48 karolherbst: I mean.. sure, it might save power, but it also breaks that gf119
23:48 RSpliet: "presumably" -> fuck knows, we're just doing whatever we saw in traces years ago.
23:48 karolherbst: there are 0 traces with that bit
23:49 karolherbst: at least not in the repository
23:49 RSpliet: I have no idea what "the repository" means. I had a handful of traces from the mmio.dumps and/or locally obtained
23:49 karolherbst: our repository with the traces and vbios
23:50 karolherbst: and when in doubt I trust that more than any local traces I can't check :p
23:50 karolherbst: but if there is hardware where nvidia sets that bit, it would be cool to figure out under which conditions nvidia does that
23:50 karolherbst: and when it's not safe to do
23:51 RSpliet: Ah that one, I do have a copy of that. Not sure how many traces are actually in there
23:51 RSpliet: Think I lost access to the git repo
23:51 karolherbst: you didn't :p
23:51 karolherbst: we moved it
23:52 karolherbst: RSpliet: https://gitlab.freedesktop.org/nouveau/nouveau_vbios_trace
23:52 karolherbst: but now it's also git lfs based
23:52 karolherbst: so you can probably just remove your local copy and clone that one
23:52 karolherbst: ohh wait
23:52 karolherbst: do you have a gitlab account?
23:53 karolherbst: huh
23:53 karolherbst: anyway.. now you have access
23:53 karolherbst: thought you were in the nouveau group
23:53 RSpliet: It's been a long time man. By weight I have more dust than GPU in my GPU drawer
23:54 karolherbst: the group is only really important for private projects :p
23:54 karolherbst: and it's just that and the shader-db
23:55 karolherbst: RSpliet: anyway... if you have a local trace where nvidia actually touches that bit, it would be nice if you could add the trace to the repository, bonus points if you add the vbios.rom file as well
23:58 karolherbst: RSpliet: mhh.. actually.. nvidia touches that bit, but only on the PCLOCK.SPLL_CTRL and PCLOCK.NVPLL_CTRL for a nva5 GPU
23:58 karolherbst: reg 0x004200 and 0x004220
23:59 RSpliet: Martin's NVA8/home as well
23:59 karolherbst: huh? really?
23:59 RSpliet: yep