08:02 karolherbst: now the interesting part, what fixed gddr5 reclocking on mupuf nve6 https://gist.github.com/karolherbst/b2bd97e605e95565f33e409009d6d297
08:05 karolherbst: got it messed up again, sad
08:06 karolherbst: ohh, usual desktop reclocking instability as the cause
08:07 karolherbst: nope, something still wrong
08:23 RSpliet: karolherbst: I'd be suspicious for GPIO writes when you do that MR hack - you're altering the VDD range...
08:23 karolherbst: well I added that, because nvidia does it as well
08:24 RSpliet: Well, yes, but they do that to alter the internal voltages of the DRAM chips
08:24 RSpliet: That likely means they alter the supply voltage as well - hence GPIO writes
08:24 karolherbst: ahhh
08:25 karolherbst: MEM_VOLTAGE or MEM_VREF?
08:26 RSpliet: Think you'd have to tell by experimenting, don't know that off the top of my head
08:26 RSpliet: Either way it's pretty likely that you'd want to perform a mask operation, or better, represent this change in gddr5.c instead once the right VBIOS bit is identified
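The "mask operation" RSpliet suggests is a read-modify-write: only the bits selected by a mask change and the rest of the register keeps its old value. A minimal sketch of that idea follows. `mask_bits` is a hypothetical helper name and is not taken from the nouveau source; which MR bits the mask should actually cover is not established in this log.

```c
#include <stdint.h>

/* Hypothetical helper: replace only the bits selected by `mask` in `old`
 * with the corresponding bits of `val`; all other bits are preserved. */
static uint32_t mask_bits(uint32_t old, uint32_t mask, uint32_t val)
{
    return (old & ~mask) | (val & mask);
}
```

Writing the full value unconditionally would instead clobber whatever the VBIOS or an earlier step had placed in the unrelated bits.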
08:27 karolherbst: yeah
08:27 karolherbst: but first, I want to get reclocking stable, so that I know which changes are actually required to have a significant improvement
08:28 RSpliet: These kinds of MR writes all tend to differ per-board. If you don't find the respective VBIOS bit you are most likely to fix it for one board but break it for another
08:29 karolherbst: yeah, and in the end it might be not so important, that's why I am currently simply trying to figure out what's wrong
08:29 RSpliet: This kind of a difference... vital! I encourage you to dive into this properly ;-)
08:30 karolherbst: yeah, I guess I'll do this after I ran out of obvious differences
08:30 karolherbst: there is also the 62c000 vs 62503c difference
08:30 karolherbst: with the WAIT STATUS !(unknown) , 45478000 ns thing
08:31 RSpliet: that's the display range, isn't it... suspect that will have a smaller impact
08:31 karolherbst: well
08:32 karolherbst: touching 62c000 on laptops where there is no display thingy, totally crashed the gpu
08:32 RSpliet: Ah, yes... there is that
08:32 karolherbst: but it is written to in the end
08:32 karolherbst: so it might be not that big of a deal
08:33 karolherbst: and it could be some kind of do this or that thing and we have the GPU configured to use 62c000 and not 62503c or so
08:33 RSpliet: 62c000 is a broadcast register, 62503c is an alias for a single HEAD?
08:33 karolherbst: the differences in the reclocking scripts are rather big, so I am sure I'll have enough fun with it
08:33 karolherbst: RSpliet: might be
08:34 karolherbst: nvidia writes to 62503c prior to FB PAUSE
08:34 karolherbst: and writes 0x62c040 into 0xa and 0x616340 into 0xb
08:35 karolherbst: what worries me more is that nvidia writes different values into 0x1373f4
08:35 karolherbst: the 12th bit set to 1, where we set it to 0
08:36 karolherbst: well
08:36 RSpliet: yeah... I guess the only caution I'd advise you to take is that the code is I think developed for/on desktop GPUs. It makes assumptions that you now discover are probably wrong, but it's easy to make equally wrong assumptions based on one or two optimus set-ups
08:36 karolherbst: or we set it later to 1
08:36 karolherbst: well
08:36 karolherbst: it's a desktop gpu I am working on
08:36 RSpliet: ^ I just made an assumption that was wrong
08:37 RSpliet: they're the mother of all f... well
08:37 karolherbst: that's why I try to keep the differences as small as possible
08:37 karolherbst: and if a super small "fix" breaks something else, it's easier to figure out
08:37 karolherbst: hopefully it's a simple fix in the end
08:38 karolherbst: I like all those comments in the code
08:39 karolherbst: really gives me an idea where to dig deeper when I'll finish with all the other things
08:45 karolherbst: ohh nice, it looks pretty identical now
08:54 karolherbst: RSpliet: okay, regarding those GPIOs, we already do the same nvidia does
08:54 karolherbst: I am sure it will be something else which matters as well
08:55 RSpliet: Are you able to verify in your "perf mode 2" that if you flick the voltage switch in the VBIOS, the MR[7] write disappears or not?
08:56 karolherbst: I am not that far yet
08:56 RSpliet: Not judging... I shouldn't even be on IRC right now ;-)
08:57 karolherbst: ;)
08:57 karolherbst: and I should prepare for tomorrow
08:58 karolherbst: I get the feeling, that the more I do what nvidia does, the more unstable memory reclocking becomes here
08:58 karolherbst: *sigh*
09:10 karolherbst: uhh, how did that happen
09:10 karolherbst: P=0
09:11 karolherbst: yeah, then no wonder it doesn't work
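Why P=0 cannot work is easiest to see from the usual shape of a PLL frequency formula, where P is a post-divider. The sketch below assumes the generic form `out = ref * N / (M * P)` with P as a linear divider; the exact formula for this particular PLL (for example whether P is encoded as a shift, or whether N has a fractional part) is an assumption and is not confirmed by this log. `pll_out_khz` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: output frequency in kHz for a PLL with feedback
 * multiplier n, input divider m and post-divider p. Returns 0 when either
 * divider is zero, since no valid output frequency exists in that case. */
static uint32_t pll_out_khz(uint32_t ref_khz, uint32_t n, uint32_t m, uint32_t p)
{
    if (m == 0 || p == 0)
        return 0;
    return (uint32_t)(((uint64_t)ref_khz * n) / ((uint64_t)m * p));
}
```

With an illustrative 27000 kHz reference, `pll_out_khz(27000, 100, 1, 1)` gives 2700000 kHz, while any call with `p == 0` is rejected instead of dividing by zero.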
09:18 karolherbst: and when I change the plls, I also need to retrain, makes sense
09:24 karolherbst: now we are getting somewhere
09:27 karolherbst: interesting
10:05 karolherbst: RSpliet: *sigh* I think something really small is wrong. Because whenever I reclock 07 -> 0a -> 0f it's much more stable than going 07 -> 0f
11:53 RSpliet: does NVIDIA *ever* go from 7 to f without a in between?
11:54 RSpliet: one hypothesis: PLLs don't always like to be reconfigured drastically, and sometimes simply need to be configured in steps
11:56 karolherbst: RSpliet: yeah, that's also my assumption
11:56 karolherbst: but it works pretty well on most cards
11:56 karolherbst: and nvidia indeed goes directly to f
11:56 RSpliet: ok, so that invalidates the idea
11:57 karolherbst: there are a few differences in the scripts which might be related to this though
11:57 karolherbst: but I am not able to take another look today
11:57 RSpliet: In your own time obvs. Every tiny difference is worth investigating, problems are easily a "piling up" of tiny differences
11:58 karolherbst: doubtful
11:58 karolherbst: or maybe
11:58 RSpliet: I've brought many many GT21x to their knees over tiny differences ;-)
11:58 karolherbst: the big problem currently is that most of the things we do differently, we do differently for the right reason
11:59 karolherbst: and this makes it a bit painful to find the thing we do wrong
11:59 karolherbst: but
11:59 karolherbst: one obvious difference: nvidia only touches one PLL
11:59 karolherbst: and skips mem training
12:00 RSpliet: slight difference in order of commands is often not a huge problem, but ever-so-slightly different values can have big effects. Esp. if it's something like "DRAM has DLL disabled, but we forgot to inform the memory controller"
12:00 RSpliet: ah yeah... they probably configured the other one well in advance?
12:00 karolherbst: yes
12:00 karolherbst: nvidia tends to do this sometimes
12:00 karolherbst: maybe we should prefer to reuse the old pll configuration
12:01 karolherbst: and just configure the second "multiplier" pll
12:01 RSpliet: always. If it's configured well, we don't have to reconfigure
12:01 karolherbst: mhh
12:01 karolherbst: it depends on the prior clocking state
12:01 karolherbst: but usually the 0a and 07 clock states allow us to do this
12:01 karolherbst: which might also explain why 0a -> 0f works and 07 -> 0f not
12:02 RSpliet: simply not allowing enough time for the first PLL to stabilise before configuring the second?
12:02 karolherbst: maybe
12:02 karolherbst: but the second is the less stable one normally
12:03 karolherbst: the second is this crazy M == P == 1 one
12:03 karolherbst: if M != 1 -> unstable
12:03 karolherbst: we could set P to 2 though
12:05 karolherbst: or was it the first pll?
12:05 karolherbst: nvm
12:07 karolherbst: PMPLL.MCLK0_COEF is the one nvidia doesn't touch
12:07 RSpliet: if (ram->mode == 2) around line 1064
12:07 RSpliet: mode == 2 is supposed to mean "PLL"?
12:07 karolherbst: mhhh
12:08 karolherbst: mode == 2 is more like high freq GDDR5 mode
12:08 karolherbst: clocks above ~ 2.2GHz
12:08 RSpliet: ram->mode = (next->freq > fuc->refpll.vco1.max_freq) ? 2 : 1;
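The quoted line picks the mode purely from whether the target frequency exceeds the maximum VCO frequency of the reference PLL. A self-contained sketch of that decision follows; the function name, the kHz units and the plain integer parameters are stand-ins for the driver's own structures, and the reading of mode 2 as "high frequency GDDR5" is karolherbst's description above rather than something the code itself states.

```c
#include <stdint.h>

/* Hypothetical stand-in for the quoted assignment: returns 2 when the
 * requested frequency is above what the refpll's VCO can produce,
 * and 1 otherwise (including the boundary case where they are equal). */
static int pick_ram_mode(uint32_t next_freq_khz, uint32_t vco1_max_khz)
{
    return (next_freq_khz > vco1_max_khz) ? 2 : 1;
}
```

Note that the comparison is strict, so a target exactly at the VCO limit still selects mode 1.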
12:08 RSpliet: ah yes
12:08 RSpliet: so in that case it doesn't program the refpll coefficients at all?
12:08 RSpliet: that seems wrong
12:08 karolherbst: gk104_pll_calc_hiclk calculates the PLLs
12:09 RSpliet: both?
12:09 karolherbst: yes
12:09 RSpliet: ah ok
12:09 karolherbst: because it's easier that way
12:09 karolherbst: or rather, it was the better working solution
12:09 karolherbst: prior to that we simply took PLL1 and calculated PLL2 to match the freq
12:09 karolherbst: which didn't work (at all)
12:10 karolherbst: the line "cur_N = target_khz / cur_clk;" could be cur_N = *N2
12:10 karolherbst: and maybe that's enough then
12:14 RSpliet: line 186 appears to be a wait long enough for PLL locking...
12:14 karolherbst: yes
12:14 RSpliet: and it skips reconfiguring the refpll if it already has the right coefficients
12:15 karolherbst: yes
12:15 RSpliet: maybe the if-statement simply needs to verify the clock isn't just configured right but also enabled (132020 low bit) and not bypassed (132028 bit 19)?
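RSpliet's suggestion amounts to a three-part condition before the refpll reconfiguration is skipped: coefficients match, the PLL is enabled, and it is not bypassed. The sketch below expresses that check on raw register values. The function name is hypothetical, and the polarity of bit 19 in 132028 (set meaning "bypassed") is an assumption, since the log only names the bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: the refpll may be left alone only if its coefficients
 * already match AND it is enabled (132020 bit 0 set) AND it is not bypassed
 * (132028 bit 19 clear; this polarity is an assumption). */
static bool refpll_usable(uint32_t reg_132020, uint32_t reg_132028, bool coef_match)
{
    bool enabled  = (reg_132020 & 0x00000001u) != 0;
    bool bypassed = (reg_132028 & (1u << 19)) != 0;

    return coef_match && enabled && !bypassed;
}
```

If any of the three conditions fails, the caller would fall through to the normal reconfigure-and-wait path instead of skipping it.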
12:16 RSpliet: idk, haven't seen traces of kepler reclocking, that's more Ben's domain
12:16 karolherbst: fun thing is, that nvidia always sets M/P to 1 for the PMPLL.MCLK0_COEF pll
12:17 karolherbst: and never touches it again basically
12:18 karolherbst: mhhh
12:18 karolherbst: interesting indeed
12:37 karolherbst: RSpliet: maybe we should simply calculate that PLL2 in the boot sequence and be done with it, because it isn't used for anything else afaik
12:41 karolherbst: still unstable, sigh
12:41 karolherbst: will take another look tomorrow
13:11 stefanches7: Hello all, just wanted to ask where can I find sources for Mali-4xx GPU project
13:12 karolherbst: not here
13:13 stefanches7: Where then?
13:13 karolherbst: no idea
13:42 anEpiov: is the opencl part compilable at this point? is it testable?
13:43 anEpiov: instead of waiting to be ready I figured I can be a beta tester
13:48 karolherbst: anEpiov: well for a beta to exist, something needs to work
13:51 pmoreau: anEpiov: You can always test my WIP, but you need to also compile a custom version of LLVM and clang, as well as SPIRV-Tools. And you can’t use clCreateProgramWithSource, but rather clCreateProgramWithBinary or clCreateProgramWithIL from OpenCL 2.1.
13:54 RSpliet: pmoreau: anything you'd learn from those tests that's useful?
13:56 anEpiov: pmoreau: ok I will
13:56 anEpiov: pmoreau: which kernel are you using?
13:56 pmoreau: RSpliet: Besides finding bugs, not that much. Getting an idea of which missing features should be prioritised.
13:56 pmoreau: anEpiov: Any kernel should work, I haven’t changed anything there.
13:57 anEpiov: pmoreau: is your version of llvm static?
13:58 pmoreau: It doesn’t look like it.
13:58 anEpiov: argh! system wide install?
13:58 pmoreau: No
13:58 pmoreau: I have some instructions on how to setup things here: https://phabricator.pmoreau.org/w/mesa/testing_opencl_through_spirv/
13:59 pmoreau: But it is missing the part about SPIRV-Tools. :-/
14:00 pmoreau: The SPIRV-Tools repo can be found here https://github.com/pierremoreau/SPIRV-Tools/tree/implement_linker (use the implement_linker branch). I will update the instructions once I get back home from work.
14:00 anEpiov: pmoreau: damn, excellent documentation, are you a teacher.
14:01 pmoreau: Thanks, it went through a couple of iterations, trying to make it as easy as possible, and I think hakzsam was the one rooting for the automatic script when I was asking him to test stuff.
14:01 pmoreau: I’m a teaching assistant.
14:02 pmoreau: I’ll try to finish getting support for clCreateProgramWithSource, but I was having issues with LLVM. And I need to do some reviewing/testing for karolherbst.
14:03 anEpiov: ok let's do it! let see if we can finish opencl in one month together!!
14:04 pmoreau: Uhhh, I would still like to sleep at least a few hours a week! :-D
14:05 pmoreau: Also, clover doesn’t even have OpenCL 1.2 support, so there would be some additional API work to get up to 2.2.
14:07 pmoreau: And, to have OpenCL 1.0 support in Mesa, we need an upstream version of LLVM/clang generating SPIR-V, which is currently not the case, so work on that front will be needed as well.
14:18 karolherbst: pmoreau: :)
14:20 karolherbst: pmoreau: clover only does opencl->TGSI, right?
14:21 pmoreau: There is also a path that uses clang to compile OpenCL -> LLVM IR, which is then used by the radeon driver IIRC.
14:22 karolherbst: pmoreau: ahh
14:23 karolherbst: pmoreau: I find the idea interesting to write a nir -> nvir translator, no idea if it's interesting enough though so that I actually do this
14:23 karolherbst: I would rather rewrite/fix RA prior to that
14:23 karolherbst: which we will have to do to get OpenCL support anyhow
14:23 pmoreau: I don’t know if anyone is using the OpenCL -> TGSI path, as it was broken for some time, before someone fixed it.
14:24 pmoreau: Some modifications would have to be done to NIR for OpenCL/CUDA support, as it currently does not support unstructured control flow. Some pointer support was added relatively recently, but I don’t know whether it is enough or not.
14:27 karolherbst: yeah, that's why fixing RA is most likely the "future"
14:30 RSpliet: karolherbst: could you recap your problems with RA?
14:31 karolherbst: spilling
14:31 karolherbst: especially if you spill registers for tex instructions
14:31 karolherbst: I have a trace where this bug shows
14:32 RSpliet: Ok, but this is a bug rather than a fundamental issue, isn't it?
14:32 karolherbst: there are also some fundamental issues afaik, but nothing I know for sure
14:33 RSpliet: I got the impression that the "core" of current RA is nothing more than building an interference graph and assigning colours in a semi-arbitrary order.
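The scheme RSpliet describes, an interference graph coloured in a fixed order, can be illustrated with a small greedy colouring routine. This is a generic textbook sketch, not nouveau's allocator: the names, the adjacency-matrix representation and the fixed `MAX_NODES` limit are all assumptions made for the example. Each node is a value, an edge means two values are live at the same time, and a colour is a register.

```c
#include <stdbool.h>

#define MAX_NODES 8

/* Greedy colouring sketch: visits nodes in index order and gives each one
 * the lowest colour not already taken by an earlier, coloured neighbour.
 * Nodes that cannot be coloured with k colours get colour -1; the return
 * value is the number of such nodes (the spill candidates). */
static int colour_graph(int n, bool adj[MAX_NODES][MAX_NODES], int k,
                        int colour[MAX_NODES])
{
    int spills = 0;

    for (int v = 0; v < n; v++) {
        bool used[MAX_NODES] = { false };

        /* Mark colours held by neighbours that were coloured earlier. */
        for (int u = 0; u < v; u++)
            if (adj[v][u] && colour[u] >= 0)
                used[colour[u]] = true;

        colour[v] = -1;
        for (int c = 0; c < k && c < MAX_NODES; c++) {
            if (!used[c]) {
                colour[v] = c;
                break;
            }
        }
        if (colour[v] < 0)
            spills++;
    }
    return spills;
}
```

Lowering `k` on the same graph makes more nodes uncolourable, which is the situation where a real allocator has to spill and retry.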
14:33 imirkin: RSpliet: karolherbst: i definitely remember skeggsb talking about multi-step reclocks
14:34 karolherbst: well
14:34 karolherbst: the high clock gddr5 reclock is always done in two steps
14:34 imirkin: RSpliet: RA is fine. the issue is when it fails, we don't clean things up properly.
14:34 imirkin: or ... *an* issue
14:35 RSpliet: imirkin: that's my impression too. That's why I wanted to double-check whether karolherbst was leaning towards rewrite or fix ;-)
14:36 karolherbst: or I evaluate what is wrong in the broken shader here: https://gist.github.com/karolherbst/c2ed0a32eaa6a8451ffa521e9254e214
14:36 karolherbst: RA spilling fail
14:37 karolherbst: fun fact: only broken when 4 b32 stores are merged into one b128 one
14:37 imirkin: i'd recommend avoiding doing any "rewriting" or "fixing" before you *actually* understand what the issue is
14:38 karolherbst: I know
14:38 imirkin: and yes... unmerging merged nodes is where some of the fail is.
14:38 imirkin: so having fewer merged nodes makes fail less likely to happen
14:38 karolherbst: I doubt you know the reason why this happens?
14:39 imirkin: i sort of do
14:39 imirkin: but not enough to fix it.
14:39 RSpliet: I... presume most of the complexity in RA deals with (interference) analysis and phi nodes, which is why it's such a long slab of code. I don't think you'd want to rewrite that unless there's something fundamentally broken. Spilling is almost like a shell around RA that says "okay, this reg with long liveness and lots of interference is now spilling to DRAM and we'll re-try", so hopefully the opposite to "fundamentally broken" :-)
14:39 imirkin: i think i've pointed you at a WIP commit with some fix attempts
14:39 karolherbst: imirkin: yeah
14:40 imirkin: RSpliet: the fundamental breakage might be in node merging though, which would affect RA.
14:40 karolherbst: imirkin: I could imagine that some live range tracking is broken (tm) or that a register is written to, which has still a valid value
14:40 imirkin: (not in the merging itself, of course, but rather in *undoing* it being impossible)
14:41 imirkin: karolherbst: yeah, it's definitely not in any way related to that whatsoever
14:41 RSpliet: imirkin: is this loosely related to the stuff that pmoreau has been trying to debug with merging 8-bit values into a 32-bit register?
14:41 imirkin: unlikely.
14:41 imirkin: there's additional subtlety with units etc
14:41 imirkin: which i doubt has been super-duper-tested
14:42 pmoreau: RSpliet: I do remember some issues with the merging, but I can’t remember which side was responsible for them.
14:43 RSpliet: pmoreau: I suspect "all sides" ;-)
14:43 imirkin: basically on nvc0+, regs are 32-bit
14:43 imirkin: while on nv50, you can address 16-bit components of 32-bit regs
14:43 imirkin: (at least with some instructions)
14:44 imirkin: arguably you can do that on nvc0 as well - some ops can take a byte or word value out - but that's not handled at the RA level
14:44 pmoreau: And SM_5.3+ which can do some 2x16fp, or even 4x8int ops, but we are not there yet :-)
14:44 RSpliet: karolherbst: I feel your pain though... it's unfortunate that due to the nature of spilling you can't create small test cases :-P you might want to consider hacking up nouveau to think there's only 6 registers or something silly, to force spilling on more trivial kernels
14:44 karolherbst: interesting though
14:45 imirkin: no need for SM_5.3
14:45 imirkin: SM_20 has those.
14:45 imirkin: e.g. I2F.B2 or I2F.H1
14:46 imirkin: we deal with those as subops of I2F
14:46 imirkin: the other place it comes up is the V* ops
14:46 imirkin: which we largely ignore
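The I2F.B2 and I2F.H1 forms mentioned above read only part of a 32-bit register. Assuming the suffixes mean "byte 2" and "half-word 1" (the upper 16 bits), which is an interpretation of the naming and not something spelled out in this log, the selection step can be sketched as below. Both helper names are hypothetical, and the integer-to-float conversion itself is left out.

```c
#include <stdint.h>

/* Hypothetical helper: select byte `idx` (0..3) of a 32-bit register value,
 * with byte 0 being the least significant. */
static uint32_t sel_byte(uint32_t reg, unsigned idx)
{
    return (reg >> (8 * (idx & 3))) & 0xffu;
}

/* Hypothetical helper: select 16-bit half `idx` (0 = low, 1 = high). */
static uint32_t sel_half(uint32_t reg, unsigned idx)
{
    return (reg >> (16 * (idx & 1))) & 0xffffu;
}
```

Because the sub-register is picked by the instruction rather than by the register number, the allocator still sees a single 32-bit register in both cases.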
14:47 RSpliet: are those the "SIMD video instructions"?
14:48 pmoreau: I was more thinking of the (r0_low + r1_low, r0_high + r1_high) operations, where r0 and r1 each contain 2xfp16 values in a 32-bit reg, introduced with the TX1, and found in Pascal and above.
14:50 imirkin: pmoreau: your builder died
14:50 imirkin: on sep 4
14:50 RSpliet: pmoreau: yeah, the video instructions seem to be 8/16-bit integer
14:50 imirkin: RSpliet: yeah. kinda like SIMD inside of SIMD :)
14:50 pmoreau: imirkin: Yes, glibc dependency for valgrind-mmt. I haven’t tried fixing it yet, will probably do that tonight.
14:50 RSpliet: yo dawg
14:51 imirkin: SIMIMD
14:51 pmoreau: Might be enough to bump the required version of glibc, or it might not
14:51 pmoreau: imirkin: Wanted to suggest it to Philippe? :-)
14:52 RSpliet: imirkin: and then now there is the 16x16 collaborative "tensor processor" instructions, I can't even imagine an abbreviation for that
14:52 pmoreau: True
14:52 imirkin: pmoreau: yes.
14:52 imirkin: your images are still fine
14:52 imirkin: considerably more recent than, say, kernel 3.2
14:52 pmoreau: Was writing a reply as well, but you were faster ;-)
14:53 imirkin: feel free to suggest that he come here if he needs to speak french
14:54 RSpliet: I wonder... whether NVIDIA might decide after Volta that it's really time for a clean-up overhaul
15:04 imirkin: pmoreau: i thought speaking french was a requirement for nouveau contribution...
15:04 pmoreau: :-p
15:04 karolherbst: imirkin: what
15:05 pmoreau: Ok, from now on we will only speak french here! ;-D
15:05 jamm: haha
15:05 jamm: XD
15:06 pmoreau: Or finish, choose your poison
15:06 pmoreau: *finnish
15:06 imirkin: more french than finnish speakers here, i think
15:07 pmoreau: Possibly
15:07 pmoreau: Should we make a poll? :-D
15:07 pmoreau: (just kidding)
15:09 imirkin: RSpliet and i might tip the scales.
15:34 onext: Hi! I reported bug number 102840, could someone help me, in French if possible? Thanks.
15:51 RSpliet: onext: [in French] sorry, but my French isn't good enough to help you properly ;-) But... have you tried updating your kernel?
15:56 pmoreau: RSpliet: Your French is not that bad, you would probably be able to help. :-) We have been pm’ing.
15:57 RSpliet: [in French] Hahaha, thanks. In that case, I'll let you help him ;-)
15:58 pmoreau: For information, the card is a NV44A.
15:58 RSpliet: Ah... oh dear. That does explain the 32-bit kernel
15:59 onext: My machine is at least 10 years old...
16:01 onext: Thanks a lot to you all! Bye!
16:06 tstellar: pmoreau: Have you seen: https://github.com/google/clspv
16:06 tstellar: pmoreau: This is probably the best solution for OpenCL C -> SPIR-V at the moment.
16:07 airlied: though some of the fixes are to limit opencl c
16:08 airlied: like dont use goto :-)
16:15 imirkin_: oh i fixed some NV44A stuff ... at some point.
16:15 imirkin_: is it a PCI NV44A or an AGP NV44A
16:16 imirkin_: my fixes were for the PCI variant
16:59 pmoreau: tstellar: I have, but it does generate Vulkan SPIR-V rather than OpenCL SPIR-V. But yes, it could be an option.
17:00 pmoreau: tstellar: But feature-wise, the SPIRV-LLVM repo has been more than enough for now. :-)
19:08 Lyude: anyone from nvidia who might be around on IRC today?
19:11 mupuf: Ping ben, he should have access to a couple of them today!
19:12 Lyude: skeggsb: ^ ? if anyone there might be able to help me figure out why their kernel driver panics whenever I try to start it with drm modesetting https://paste.fedoraproject.org/paste/30QqJtKmbtI-fSVmiJpFMA that would be appreciated...
19:13 Lyude: i swear if I sneeze the wrong way this driver will seize up and die...
19:21 mupuf: Lyude: sneezing is not part of the test plan ;)
19:22 Lyude: hehe
19:39 imirkin_: Lyude: they have support forums
19:46 Lyude: sigh. i guess that might be what i'll need to do