08:02karolherbst: now the interesting part: what fixed gddr5 reclocking on mupuf's nve6 https://gist.github.com/karolherbst/b2bd97e605e95565f33e409009d6d297
08:05karolherbst: got it messed up again, sad
08:06karolherbst: ohh, usual desktop reclocking instability as the cause
08:07karolherbst: nope, something still wrong
08:23RSpliet: karolherbst: I'd be suspicious of GPIO writes when you do that MR hack - you're altering the VDD range...
08:23karolherbst: well I added that, because nvidia does it as well
08:24RSpliet: Well, yes, but they do that to alter the internal voltages of the DRAM chips
08:24RSpliet: That likely means they alter the supply voltage as well - hence GPIO writes
08:25karolherbst: MEM_VOLTAGE or MEM_VREF?
08:26RSpliet: Think you'd have to tell by experimenting, don't know that off the top of my head
08:26RSpliet: Either way it's pretty likely that you'd want to perform a mask operation, or better, represent this change in gddr5.c instead once the right VBIOS bit is identified
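A minimal sketch of the masked update RSpliet suggests, assuming the hack only needs to change a few bits of a mode-register value: clear the selected bits, then OR in the new ones, so board-specific bits outside the mask survive. The helper name and the mask/value in the example are hypothetical, not real GDDR5 MR fields.

```c
#include <stdint.h>

/* Replace only the bits selected by `mask` in a mode-register value,
 * leaving every other (board-specific) bit untouched.
 * Hypothetical helper for illustration; not nouveau code. */
static uint32_t mr_masked_update(uint32_t current, uint32_t mask, uint32_t value)
{
	return (current & ~mask) | (value & mask);
}
```

For example, `mr_masked_update(0xABCD, 0x00F0, 0x0050)` yields `0xAB5D`: only the masked nibble changes.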
08:27karolherbst: but first, I want to get reclocking stable, so that I know which changes are actually required to have a significant improvement
08:28RSpliet: These kinds of MR writes all tend to differ per board. If you don't find the respective VBIOS bit, you are most likely to fix it for one board but break it for another
08:29karolherbst: yeah, and it might not be that important in the end, that's why I am currently simply trying to figure out what's wrong
08:29RSpliet: This kind of a difference... vital! I encourage you to dive into this properly ;-)
08:30karolherbst: yeah, I guess I'll do this after I run out of obvious differences
08:30karolherbst: there is also the 62c000 vs 62503c difference
08:30karolherbst: with the WAIT STATUS !(unknown) , 45478000 ns thing
08:31RSpliet: that's the display range, isn't it... suspect that will have a smaller impact
08:32karolherbst: touching 62c000 on laptops where there is no display thingy totally crashed the gpu
08:32RSpliet: Ah, yes... there is that
08:32karolherbst: but it is written to in the end
08:32karolherbst: so it might be not that big of a deal
08:33karolherbst: and it could be some kind of do-this-or-that thing, and we have the GPU configured to use 62c000 and not 62503c or so
08:33RSpliet: 62c000 is a broadcast register, 62503c is an alias for a single HEAD?
08:33karolherbst: the differences in the reclocking scripts are rather big, so I am sure I'll have enough fun with it
08:33karolherbst: RSpliet: might be
08:34karolherbst: nvidia writes to 62503c prior to FB PAUSE
08:34karolherbst: and writes 0x62c040 into 0xa and 0x616340 into 0xb
08:35karolherbst: what worries me more is that nvidia writes different values into 0x1373f4
08:35karolherbst: the 12th bit is set to 1, whereas we set it to 0
08:36RSpliet: yeah... I guess the only caution I'd advise you to take is that the code is, I think, developed for/on desktop GPUs. It makes assumptions that you now discover are probably wrong, but it's easy to make equally wrong assumptions based on one or two optimus set-ups
08:36karolherbst: or we set it later to 1
08:36karolherbst: it's a desktop gpu I am working on
08:36RSpliet: ^ I just made an assumption that was wrong
08:37RSpliet: they're the mother of all f... well
08:37karolherbst: that's why I try to keep the differences as small as possible
08:37karolherbst: and if a super small "fix" breaks something else, it's easier to figure out
08:37karolherbst: hopefully it's a simple fix in the end
08:38karolherbst: I like all those comments in the code
08:39karolherbst: really gives me an idea where to dig deeper once I'm finished with all the other things
08:45karolherbst: ohh nice, it looks pretty identical now
08:54karolherbst: RSpliet: okay, regarding those GPIOs, we already do the same nvidia does
08:54karolherbst: I am sure it will be something else which matters as well
08:55RSpliet: Are you able to verify in your "perf mode 2" that if you flick the voltage switch in the VBIOS, the MR write disappears or not?
08:56karolherbst: I am not that far yet
08:56RSpliet: Not judging... I shouldn't even be on IRC right now ;-)
08:57karolherbst: and I should prepare for tomorrow
08:58karolherbst: I get the feeling, that the more I do what nvidia does, the more unstable memory reclocking becomes here
09:10karolherbst: uhh, how did that happen
09:11karolherbst: yeah, then no wonder it doesn't work
09:18karolherbst: and when I change the plls, I also need to retrain, makes sense
09:24karolherbst: now we are getting somewhere
10:05karolherbst: RSpliet: *sigh* I think something really small is wrong. Because whenever I reclock 07 -> 0a -> 0f it's much more stable than going 07 -> 0f
11:53RSpliet: does NVIDIA *ever* go from 7 to f without a in between?
11:54RSpliet: one hypothesis: PLLs don't always like to be reconfigured drastically, and sometimes simply need to be configured in steps
11:56karolherbst: RSpliet: yeah, that's also my assumption
11:56karolherbst: but it works pretty well on most cards
11:56karolherbst: and nvidia indeed goes directly to f
11:56RSpliet: ok, so that invalidates the idea
11:57karolherbst: there are a few differences in the scripts which might be related to this though
11:57karolherbst: but I am not able to take another look today
11:57RSpliet: In your own time obvs. Every tiny difference is worth investigating, problems are easily a "piling up" of tiny differences
11:58karolherbst: or maybe
11:58RSpliet: I've brought many many GT21x to their knees over tiny differences ;-)
11:58karolherbst: the big problem currently is that most of the things we do differently, we do differently for the right reason
11:59karolherbst: and this makes it a bit painful to find the thing we do wrong
11:59karolherbst: one obvious difference: nvidia only touches one PLL
11:59karolherbst: and skips mem training
12:00RSpliet: slight difference in order of commands is often not a huge problem, but ever-so-slightly different values can have big effects. Esp. if it's something like "DRAM has DLL disabled, but we forgot to inform the memory controller"
12:00RSpliet: ah yeah... they probably configured the other one well in advance?
12:00karolherbst: nvidia tends to do this sometimes
12:00karolherbst: maybe we should prefer to reuse the old pll configuration
12:01karolherbst: and just configure the second "multiplier" pll
12:01RSpliet: always. If it's configured well, we don't have to reconfigure
12:01karolherbst: it depends on the prior clocking state
12:01karolherbst: but usually the 0a and 07 clock states allow us to do this
12:01karolherbst: which might also explain why 0a -> 0f works and 07 -> 0f not
12:02RSpliet: simply not allowing enough time for the first PLL to stabilise before configuring the second?
12:02karolherbst: but the second is the less stable one normally
12:03karolherbst: the second is this crazy M == P == 1 one
12:03karolherbst: if M != 1 -> unstable
12:03karolherbst: we could set P to 2 though
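To make the coefficient trade-off concrete, here is a sketch assuming the common PLL model f_out = f_ref * N / (M * P), in kHz, with a 27 MHz reference chosen purely for illustration (neither the model nor the reference is taken from nouveau's gk104 code). Holding M = 1 and raising P to 2 just means N has to double to reach the same output. Function names are hypothetical.

```c
#include <stdint.h>

/* Assumed PLL model for illustration: f_out = f_ref * N / (M * P), in kHz. */
static uint32_t pll_out_khz(uint32_t ref_khz, uint32_t n, uint32_t m, uint32_t p)
{
	return (uint32_t)(((uint64_t)ref_khz * n) / ((uint64_t)m * p));
}

/* With M and P held fixed, pick the N whose output is closest to the target
 * (rounded to nearest under the model above). */
static uint32_t pll_pick_n(uint32_t ref_khz, uint32_t target_khz, uint32_t m, uint32_t p)
{
	uint64_t num = (uint64_t)target_khz * m * p;
	return (uint32_t)((num + ref_khz / 2) / ref_khz);
}
```

With a 27000 kHz reference and an 810000 kHz target, `pll_pick_n` returns 30 for M = P = 1 and 60 for M = 1, P = 2; both settings give 810000 kHz.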
12:05karolherbst: or was it the first pll?
12:07karolherbst: PMPLL.MCLK0_COEF is the one nvidia doesn't touch
12:07RSpliet: if (ram->mode == 2) around line 1064
12:07RSpliet: mode == 2 is supposed to mean "PLL"?
12:08karolherbst: mode == 2 is more like high freq GDDR5 mode
12:08karolherbst: clocks above ~ 2.2GHz
12:08RSpliet: ram->mode = (next->freq > fuc->refpll.vco1.max_freq) ? 2 : 1;
12:08RSpliet: ah yes
12:08RSpliet: so in that case it doesn't program the refpll coefficients at all?
12:08RSpliet: that seems wrong
12:08karolherbst: gk104_pll_calc_hiclk calculates the PLLs
12:09RSpliet: ah ok
12:09karolherbst: because it's easier that way
12:09karolherbst: or rather, it was the better working solution
12:09karolherbst: prior to that, we simply took PLL1 and calculated PLL2 to match the freq
12:09karolherbst: which didn't work (at all)
12:10karolherbst: the line "cur_N = target_khz / cur_clk;" could be cur_N = *N2
12:10karolherbst: and maybe that's enough then
12:14RSpliet: line 186 appears to be a wait long enough for PLL locking...
12:14RSpliet: and it skips reconfigure of the refpll if already has the right coefficients
12:15RSpliet: maybe the if-statement simply needs to verify the clock isn't just configured right but also enabled (132020 low bit) and not bypassed (132028 bit 19)?
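RSpliet's proposed check, sketched as a pure function: matching coefficients alone would not justify skipping the refpll reprogramming; the PLL must also be enabled (0x132020 bit 0) and not bypassed (0x132028 bit 19). Those bit positions are as stated in the discussion and unverified here. The register values are passed in instead of read from hardware, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether reprogramming the reference PLL can be skipped.
 * ctrl_val:   value read from 0x132020 (bit 0 assumed to mean "enabled")
 * bypass_val: value read from 0x132028 (bit 19 assumed to mean "bypassed") */
static bool refpll_can_skip(uint32_t cur_coef, uint32_t new_coef,
                            uint32_t ctrl_val, uint32_t bypass_val)
{
	bool enabled  = (ctrl_val & (1u << 0)) != 0;
	bool bypassed = (bypass_val & (1u << 19)) != 0;

	/* Same coefficients AND actually running: only then is it safe to skip. */
	return cur_coef == new_coef && enabled && !bypassed;
}
```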
12:16RSpliet: idk, haven't seen traces of kepler reclocking, that's more Ben's domain
12:16karolherbst: fun thing is, that nvidia always sets M/P to 1 for the PMPLL.MCLK0_COEF pll
12:17karolherbst: and never touches it again basically
12:18karolherbst: interesting indeed
12:37karolherbst: RSpliet: maybe we should simply calculate that PLL2 in the boot sequence and be done with it, because it isn't used for anything else afaik
12:41karolherbst: still unstable, sigh
12:41karolherbst: will take another look tomorrow
13:11stefanches7: Hello all, just wanted to ask where can I find sources for Mali-4xx GPU project
13:12karolherbst: not here
13:13stefanches7: Where then?
13:13karolherbst: no idea
13:42anEpiov: is the opencl part compilable at this point? is it testable?
13:43anEpiov: instead of waiting to be ready I figured I can be a beta tester
13:48karolherbst: anEpiov: well for a beta to exist, something needs to work
13:51pmoreau: anEpiov: You can always test my WIP, but you need to also compile a custom version of LLVM and clang, as well as SPIRV-Tools. And you can’t use clCreateProgramWithSource, but rather clCreateProgramWithBinary or clCreateProgramWithIL from OpenCL 2.1.
13:54RSpliet: pmoreau: anything you'd learn from those tests that's useful?
13:56anEpiov: pmoreau: ok I will
13:56anEpiov: pmoreau: which kernel are you using?
13:56pmoreau: RSpliet: Besides finding bugs, not that much. Getting an idea of which missing features should be prioritised.
13:56pmoreau: anEpiov: Any kernel should work, I haven’t changed anything there.
13:57anEpiov: pmoreau: is your version of llvm static?
13:58pmoreau: It doesn’t look like it.
13:58anEpiov: argh! system wide install?
13:58pmoreau: I have some instructions on how to setup things here: https://phabricator.pmoreau.org/w/mesa/testing_opencl_through_spirv/
13:59pmoreau: But it is missing the part about SPIRV-Tools. :-/
14:00pmoreau: The SPIRV-Tools repo can be found here https://github.com/pierremoreau/SPIRV-Tools/tree/implement_linker (use the implement_linker branch). I will update the instructions once I get back home from work.
14:00anEpiov: pmoreau: damn, excellent documentation, are you a teacher?
14:01pmoreau: Thanks, it went through a couple of iterations, trying to make it as easy as possible, and I think hakzsam was the one rooting for the automatic script when I was asking him to test stuff.
14:01pmoreau: I’m a teaching assistant.
14:02pmoreau: I’ll try to finish getting support for clCreateProgramWithSource, but I was having issues with LLVM. And I need to do some reviewing/testing for karolherbst.
14:03anEpiov: ok let's do it! let's see if we can finish opencl in one month together!!
14:04pmoreau: Uhhh, I would still like to sleep at least a few hours a week! :-D
14:05pmoreau: Also, clover doesn’t even have OpenCL 1.2 support, so there would be some additional API work to get up to 2.2.
14:07pmoreau: And, to have OpenCL 1.0 support in Mesa, we need an upstream version of LLVM/clang generating SPIR-V, which is currently not the case, so work on that front will be needed as well.
14:18karolherbst: pmoreau: :)
14:20karolherbst: pmoreau: clover only does opencl->TGSI, right?
14:21pmoreau: There is also a path that uses clang to compile OpenCL -> LLVM IR, which is then used by the radeon driver IIRC.
14:22karolherbst: pmoreau: ahh
14:23karolherbst: pmoreau: I find the idea of writing a nir -> nvir translator interesting, no idea if it's interesting enough that I actually do it though
14:23karolherbst: I would rather rewrite/fix RA prior to that
14:23karolherbst: which we will have to do to get OpenCL support anyhow
14:23pmoreau: I don’t know if anyone is using the OpenCL -> TGSI path, as it was broken for some time, before someone fixed it.
14:24pmoreau: Some modifications would have to be done to NIR for OpenCL/CUDA support, as it currently does not support unstructured control flow. Some pointer support was added relatively recently, but I don’t know whether it is enough or not.
14:27karolherbst: yeah, that's why fixing RA is most likely the "future"
14:30RSpliet: karolherbst: could you recap your problems with RA?
14:31karolherbst: especially if you spill registers for tex instructions
14:31karolherbst: I have a trace where this bug shows
14:32RSpliet: Ok, but this is a bug rather than a fundamental issue, isn't it?
14:32karolherbst: there are also some fundamental issues afaik, but nothing I know for sure
14:33RSpliet: I got the impression that the "core" of current RA is nothing more than building an interference graph and assigning colours in a semi-arbitrary order.
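RSpliet's summary of the "core" of RA, as a toy sketch: an interference graph plus greedy colouring in a given order. This is a generic textbook greedy colouring over an adjacency matrix, not the nouveau codegen RA, which additionally deals with merged nodes, phi nodes and register constraints. All names are hypothetical.

```c
#include <stdbool.h>

#define MAX_NODES 32

/* Toy interference graph: adj[i][j] is true when virtual registers i and j
 * are live at the same time and so may not share a physical register. */
struct igraph {
	int n;
	bool adj[MAX_NODES][MAX_NODES];
};

/* Colour nodes in the given order: each takes the lowest physical register
 * (colour) not used by an already-coloured neighbour. k must not exceed
 * MAX_NODES. Returns how many nodes could not be coloured (colour -1);
 * a real allocator would spill those. */
static int greedy_colour(const struct igraph *g, const int *order, int k, int *colour)
{
	int failed = 0;

	for (int i = 0; i < g->n; i++)
		colour[i] = -1;

	for (int idx = 0; idx < g->n; idx++) {
		int v = order[idx];
		bool used[MAX_NODES] = { false };

		for (int u = 0; u < g->n; u++)
			if (g->adj[v][u] && colour[u] >= 0)
				used[colour[u]] = true;

		int c = 0;
		while (c < k && used[c])
			c++;

		if (c < k)
			colour[v] = c;
		else
			failed++;
	}
	return failed;
}
```

The order matters: the same graph can colour with k registers in one order and fail in another, which is why the "semi-arbitrary order" remark is relevant.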
14:33imirkin: RSpliet: karolherbst: i definitely remember skeggsb talking about multi-step reclocks
14:34karolherbst: the high clock gddr5 reclock is always done in two steps
14:34imirkin: RSpliet: RA is fine. the issue is when it fails, we don't clean things up properly.
14:34imirkin: or ... *an* issue
14:35RSpliet: imirkin: that's my impression too. That's why I wanted to double-check whether karolherbst was leaning towards rewrite or fix ;-)
14:36karolherbst: or I evaluate what is wrong in the broken shader here: https://gist.github.com/karolherbst/c2ed0a32eaa6a8451ffa521e9254e214
14:36karolherbst: RA spilling fail
14:37karolherbst: fun fact: only broken when 4 b32 stores are merged into one b128 one
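A simplified sketch of when four 32-bit stores could be combined into a single 128-bit store. The merged store needs its four source values in one contiguous 4-register group, which presumably is what makes such merged nodes harder for RA to take apart when it has to spill. The 16-byte alignment rule and all names here are assumptions for illustration, not the codegen's actual merge pass.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified store record: base address register id plus byte offset. */
struct store32 {
	int base;
	int32_t offset;
};

/* Hypothetical merge test: four b32 stores can become one b128 store if they
 * share a base, cover four consecutive words, and the first word is 16-byte
 * aligned (alignment requirement assumed here). */
static bool can_merge_b128(const struct store32 s[4])
{
	if (s[0].offset % 16 != 0)
		return false;
	for (int i = 1; i < 4; i++)
		if (s[i].base != s[0].base || s[i].offset != s[0].offset + 4 * i)
			return false;
	return true;
}
```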
14:37imirkin: i'd recommend avoiding doing any "rewriting" or "fixing" before you *actually* understand what the issue is
14:38karolherbst: I know
14:38imirkin: and yes... unmerging merged nodes is where some of the fail is.
14:38imirkin: so having fewer merged nodes makes fail less likely to happen
14:38karolherbst: I doubt you know the reason why this happens?
14:39imirkin: i sort of do
14:39imirkin: but not enough to fix it.
14:39RSpliet: I... presume most of the complexity in RA deals with (interference) analysis and phi nodes, which is why it's such a long slab of code. I don't think you'd want to rewrite that unless there's something fundamentally broken. Spilling is almost like a shell around RA that says "okay, this reg with long liveness and lots of interference is now spilling to DRAM and we'll re-try", so hopefully the opposite to "fundamentally broken" :-)
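RSpliet's description of spilling as a "shell around RA", as a toy sketch: colour, and if that fails, spill the node with the most interference (degree is used here as a stand-in for "long liveness and lots of interference") and retry. This is a generic textbook loop under simplifying assumptions, not the codegen's actual spill logic. Shrinking k, i.e. pretending there are only a handful of registers, makes even tiny graphs take the spill path.

```c
#include <stdbool.h>

#define MAX_NODES 32

/* Toy interference graph; spilled nodes live in memory, not in a register. */
struct igraph {
	int n;
	bool adj[MAX_NODES][MAX_NODES];
	bool spilled[MAX_NODES];
};

/* Interference degree of v, counting only nodes still competing for registers. */
static int live_degree(const struct igraph *g, int v)
{
	int d = 0;
	for (int u = 0; u < g->n; u++)
		if (!g->spilled[u] && g->adj[v][u])
			d++;
	return d;
}

/* Greedy colouring of all non-spilled nodes in index order with k registers
 * (k <= MAX_NODES). Returns -1 on success, else the first node that failed. */
static int try_colour(const struct igraph *g, int k, int *colour)
{
	for (int v = 0; v < g->n; v++)
		colour[v] = -1;
	for (int v = 0; v < g->n; v++) {
		if (g->spilled[v])
			continue;
		bool used[MAX_NODES] = { false };
		for (int u = 0; u < v; u++)
			if (g->adj[v][u] && colour[u] >= 0)
				used[colour[u]] = true;
		int c = 0;
		while (c < k && used[c])
			c++;
		if (c == k)
			return v;
		colour[v] = c;
	}
	return -1;
}

/* Spill-and-retry shell: while colouring fails, spill the non-spilled node
 * with the most interference and try again. A real allocator would also
 * insert spill loads/stores, which changes the graph; that step is omitted.
 * Returns the number of spilled nodes. */
static int allocate_with_spills(struct igraph *g, int k, int *colour)
{
	int spills = 0;
	while (try_colour(g, k, colour) >= 0) {
		int victim = -1, best = -1;
		for (int v = 0; v < g->n; v++) {
			if (g->spilled[v])
				continue;
			int d = live_degree(g, v);
			if (d > best) {
				best = d;
				victim = v;
			}
		}
		g->spilled[victim] = true;
		spills++;
	}
	return spills;
}
```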
14:39imirkin: i think i've pointed you at a WIP commit with some fix attempts
14:39karolherbst: imirkin: yeah
14:40imirkin: RSpliet: the fundamental breakage might be in node merging though, which would affect RA.
14:40karolherbst: imirkin: I could imagine that some live range tracking is broken (tm) or that a register is written to while it still holds a valid value
14:40imirkin: (not in the merging itself, of course, but rather in *undoing* it being impossible)
14:41imirkin: karolherbst: yeah, it's definitely not in any way related to that whatsoever
14:41RSpliet: imirkin: is this loosely related to the stuff that pmoreau has been trying to debug with merging 8-bit values into a 32-bit register?
14:41imirkin: there's additional subtlety with units etc
14:41imirkin: which i doubt has been super-duper-tested
14:42pmoreau: RSpliet: I do remember some issues with the merging, but I can’t remember which side was responsible for them.
14:43RSpliet: pmoreau: I suspect "all sides" ;-)
14:43imirkin: basically on nvc0+, regs are 32-bit
14:43imirkin: while on nv50, you can address 16-bit components of 32-bit regs
14:43imirkin: (at least with some instructions)
14:44imirkin: arguably you can do that on nvc0 as well - some ops can take a byte or word value out - but that's not handled at the RA level
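A plain-C illustration of "taking a byte or word value out" of a 32-bit register before converting it, in the spirit of the I2F.B2 / I2F.H1 subops mentioned below. The semantics are an assumption for illustration: `.B<n>` is taken to select byte n (bits 8n..8n+7) and `.H<n>` the 16-bit half n; only the unsigned case is shown, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed .B<n> behaviour: take byte n (0..3) of reg, convert to float. */
static float i2f_u8_sel(uint32_t reg, unsigned byte_idx)
{
	return (float)((reg >> (8u * byte_idx)) & 0xffu);
}

/* Assumed .H<n> behaviour: take 16-bit half n (0..1) of reg, convert to float. */
static float i2f_u16_sel(uint32_t reg, unsigned half_idx)
{
	return (float)((reg >> (16u * half_idx)) & 0xffffu);
}
```

For `0x12345678`, byte 2 is `0x34` (52.0f) and half 1 is `0x1234` (4660.0f).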
14:44pmoreau: And SM_5.3+ which can do some 2x16fp, or even 4x8int ops, but we are not there yet :-)
14:44RSpliet: karolherbst: I feel your pain though... it's unfortunate that due to the nature of spilling you can't create small test cases :-P you might want to consider hacking up nouveau to think there's only 6 registers or something silly, to force spilling on more trivial kernels
14:44karolherbst: interesting though
14:45imirkin: no need for SM_5.3
14:45imirkin: SM_20 has those.
14:45imirkin: e.g. I2F.B2 or I2F.H1
14:46imirkin: we deal with those as subops of I2F
14:46imirkin: the other place it comes up is the V* ops
14:46imirkin: which we largely ignore
14:47RSpliet: are those the "SIMD video instructions"?
14:48pmoreau: I was more thinking of the (r0_low + r1_low, r0_high + r1_high) operations, where r0 and r1 each contain 2xfp16 values in a 32-bit reg, introduced with the TX1, and found in Pascal and above.
14:50imirkin: pmoreau: your builder died
14:50imirkin: on sep 4
14:50RSpliet: pmoreau: yeah, the video instructions seem to be 8/16-bit integer
14:50imirkin: RSpliet: yeah. kinda like SIMD inside of SIMD :)
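The "SIMD inside of SIMD" idea can be shown in plain C with the classic SIMD-within-a-register trick: lane-wise 8-bit or 16-bit integer adds packed in one 32-bit value, with each lane wrapping and no carry crossing into the neighbouring lane. This illustrates the concept only; it is not the exact semantics of the hardware V* instructions, and the function names are hypothetical.

```c
#include <stdint.h>

/* 4 x 8-bit lane-wise add: add the low 7 bits of each lane (cannot carry out
 * of the lane), then fold in each lane's top bit with XOR so no carry leaks. */
static uint32_t add_u8x4(uint32_t a, uint32_t b)
{
	uint32_t low = (a & 0x7f7f7f7fu) + (b & 0x7f7f7f7fu);
	return low ^ ((a ^ b) & 0x80808080u);
}

/* 2 x 16-bit lane-wise add, same trick with 16-bit lanes. */
static uint32_t add_u16x2(uint32_t a, uint32_t b)
{
	uint32_t low = (a & 0x7fff7fffu) + (b & 0x7fff7fffu);
	return low ^ ((a ^ b) & 0x80008000u);
}
```

For example, `add_u8x4(0xFF000001, 0x01000001)` gives `0x00000002`: the top lane wraps to 0 without disturbing its neighbour.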
14:50pmoreau: imirkin: Yes, glibc dependency for valgrind-mmt. I haven’t tried fixing it yet, will probably do that tonight.
14:50RSpliet: yo dawg
14:51pmoreau: Might be enough to bump the required version of glibc, or it might not
14:51pmoreau: imirkin: Wanted to suggest it to Philippe? :-)
14:52RSpliet: imirkin: and then now there is the 16x16 collaborative "tensor processor" instructions, I can't even imagine an abbreviation for that
14:52imirkin: pmoreau: yes.
14:52imirkin: your images are still fine
14:52imirkin: considerably more recent than, say, kernel 3.2
14:52pmoreau: Was writing a reply as well, but you were faster ;-)
14:53imirkin: feel free to suggest that he come here if he needs to speak french
14:54RSpliet: I wonder... whether NVIDIA might decide after Volta that it's really time for a clean-up overhaul
15:04imirkin: pmoreau: i thought speaking french was a requirement for nouveau contribution...
15:04karolherbst: imirkin: what
15:05pmoreau: Ok, from now on we will only speak french here! ;-D
15:06pmoreau: Or Finnish, choose your poison
15:06imirkin: more french than finnish speakers here, i think
15:07pmoreau: Should we make a poll? :-D
15:07pmoreau: (just kidding)
15:09imirkin: RSpliet and i might tip the scales.
15:34onext: Hi! I reported bug number 102840, could someone help me, in French if possible? Thanks.
15:51RSpliet: onext: (in French) sorry, but my French isn't good enough to help you properly ;-) But... have you tried updating your kernel?
15:56pmoreau: RSpliet: Your French is not that bad, you would probably be able to help. :-) We have been pm’ing.
15:57RSpliet: (in French) Hahaha, thanks. Well then, I'll leave it to you to help him ;-)
15:58pmoreau: For information, the card is a NV44A.
15:58RSpliet: Ah... oh dear. That does explain the 32-bit kernel
15:59onext: My machine is at least 10 years old...
16:01onext: Thanks a lot to you all! Bye!
16:06tstellar: pmoreau: Have you seen: https://github.com/google/clspv
16:06tstellar: pmoreau: This is probably the best solution for OpenCL C -> SPIR-V at the moment.
16:07airlied: though some of the fixes are to limit opencl c
16:08airlied: like don't use goto :-)
16:15imirkin_: oh i fixed some NV44A stuff ... at some point.
16:15imirkin_: is it a PCI NV44A or an AGP NV44A
16:16imirkin_: my fixes were for the PCI variant
16:59pmoreau: tstellar: I have, but it does generate Vulkan SPIR-V rather than OpenCL SPIR-V. But yes, it could be an option.
17:00pmoreau: tstellar: But feature-wise, the SPIRV-LLVM repo has been more than enough for now. :-)
19:08Lyude: anyone from nvidia who might be around on IRC today?
19:11mupuf: Ping ben, he should have access to a couple of them today!
19:12Lyude: skeggsb: ^ ? if anyone there might be able to help me figure out why their kernel driver panics whenever I try to start it with drm modesetting https://paste.fedoraproject.org/paste/30QqJtKmbtI-fSVmiJpFMA that would be appreciated...
19:13Lyude: i swear if I sneeze the wrong way this driver will seize up and die...
19:21mupuf: Lyude: sneezing is not part of the test plan ;)
19:39imirkin_: Lyude: they have support forums
19:46Lyude: sigh. i guess that might be what i'll need to do