04:27seyeongkim: kernel 4.x with a P400, P5000 or P6000 card and more than 64 GB RAM can't boot properly. what area should i research for this issue?
04:28imirkin: iirc someone reported this earlier
04:28imirkin: perhaps you, or an associate?
04:28imirkin: (i can't imagine there are too many of these systems running around)
04:28imirkin: anyways, iirc the comment was that the 64GB thing had nothing to do with nouveau, the system just wouldn't boot even if nouveau was never loaded
04:30imirkin: anyways, not sure what would be special about a 64GB limit as far as nouveau is concerned... i could imagine things going south at 4GB or at 1TB.
04:30imirkin: although the 4GB barrier is fairly well-tested these days
04:30imirkin: i'd have no trouble believing there were issues with a >1TB system though.
04:31seyeongkim: Thanks. and the related machine has 1TB
04:31seyeongkim: need to set kernel parameter mem=64000mb
04:31imirkin: can you try a smaller number than 1TB (but larger than 64GB)?
04:32seyeongkim: I'm going to check it again
04:32imirkin: (the 1TB limit arises from the fact that nvidia gpu's have a 40-bit virtual address space)
04:32HdkR: Almost feel like it would be an issue when you hit the 40bit VA limit
04:32imirkin: otoh, that's just the VA ... i forget if the PTE's can address higher system ram locations.
04:33seyeongkim: if i test it with a 1080, would you expect the same issue there?
04:34seyeongkim: ok, I'm far from the exact machine so I'm thinking about alternatives..
04:34seyeongkim: I'll dig that part thanks imirkin HdkR
04:34imirkin: any G80 or later gpu has this limitation afaik
04:35seyeongkim: ah one more thing.. imirkin do you know which code file has this limitation? e.g. to manually increase it just for testing
04:35imirkin: i'm glancing at the gp100 vmm... i can't quite tell if gp100 is 40- or 48-bit
04:35imirkin: it's an architectural limit
04:35seyeongkim: ah ok
04:36seyeongkim: thanks a lot
04:36imirkin: yeah, looks like GP100 can go up to 47 bits
04:36imirkin: but limited to 40-bit by default
04:37imirkin: er no. limited to 47 bits by default
04:37imirkin: but with an option to use the gm200 setup
04:37imirkin: (which is 40-bit)
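For reference, the sizes under discussion work out as below. This is only the arithmetic behind the 40-bit and 47-bit figures, not anything taken from nouveau; `va_span_bytes` is a made-up helper name.

```python
# Arithmetic only: how many bytes an N-bit address space spans.
# va_span_bytes is a hypothetical helper, not a nouveau function.

GIB = 1 << 30
TIB = 1 << 40

def va_span_bytes(bits: int) -> int:
    """Number of distinct byte addresses in an address space of `bits` bits."""
    return 1 << bits

for bits in (36, 40, 47):
    span = va_span_bytes(bits)
    print(f"{bits}-bit: {span // GIB} GiB")
```

A 40-bit space is exactly 1 TiB and a 47-bit space is 128 TiB, while 64 GiB needs only 36 bits. That matches imirkin's remark that a 64GB threshold does not line up with either limit.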
04:38imirkin: seyeongkim: may i ask what you're doing with such a machine and nouveau?
04:38seyeongkim: actually our customer reported this issue to us, not sure what they really do with this
04:38imirkin: k. someone was in here a couple days ago talking about the same thing
04:39imirkin: there can't be too many people trying this, so probably same person
04:39imirkin: you can check the irc logs (see topic)
04:39seyeongkim: ah ok
04:40HdkR: Obviously the solution would be to send imirkin a system that breaches the 40bit VA limit to work around the problem :P
04:41imirkin: while that'd be self-serving, if you want things fixed, send them to skeggsb
04:47seyeongkim: ok, I checked, he is in a nearby team :) i don't know why he didn't update the case. Thanks
10:22tagr: imirkin: so that patch gets rid of the errors, but I suspect it's only hiding the issue, I get a bunch of these: https://hastebin.com/vevopoviki.apache
10:24tagr: everything is also pretty sluggish, obviously
11:47imirkin: tagr: yeah that's totally bogus
11:47imirkin: sorry for sending you such a half-baked patch
12:26tagr: imirkin: no problem, I'm happy to test anything you think could help
12:28tagr: imirkin: the patch helped confirm that on Linux 4.16.6, the freezes are actually permanent, so display doesn't refresh (other than the cursor)
12:28tagr: with 4.14, it recovers after a couple of seconds at maximum, or in many cases is even hardly noticeable
12:29tagr: which means that something must've changed in that area, right?
12:54imirkin_: tagr: maybe ... could be something silly though. i think skeggsb had the additional theory that if this is related to the software channel, then using DRI3 would reduce the likelihood of issues.
12:54imirkin_: the downside of DRI3 is that it doesn't 100% work with the nouveau ddx
12:54imirkin_: but ... 99.9% :)
12:56karolherbst: imirkin_: did you see this patch? https://lists.freedesktop.org/archives/mesa-dev/2018-April/192430.html it looks fine to me, although usually I wouldn't even bother.
12:57karolherbst: and I guess compilers are smart enough already...
12:57imirkin_: i did
12:57imirkin_: i meant to apply it
12:57imirkin_: but clearly that fell through.
12:58karolherbst: I can push it as well if you don't have time
13:05imirkin_: go for it
13:46tomeu: karolherbst: btw, don't know why, but I needed these changes to build llvm-spirv: https://github.com/tomeuv/SPIRV-LLVM-Translator/commit/9c82149364739b19c85b0db4a0b96dc34c976deb
13:47karolherbst: tomeu: do you really need the first one?
13:47tomeu: karolherbst: don't think so
15:42karolherbst: imirkin_: mhh, something is causing me to have more spilling fails, even for trivial enough shaders
15:48karolherbst: imirkin_: yeah.. maybe I just wrongly ported the RA fix we need for 64 bit values
15:57karolherbst: imirkin_: yeah, something in 5428066f5e1ef5ea6ae04c84019f270023cfc6aa breaks stuff for me :(
15:57karolherbst: or rather this + cwabbotts fix
15:59karolherbst: imirkin_: duh... I know the issue
15:59karolherbst: ohh wait, doesn't make sense
15:59imirkin_: that should have been a no-op
16:00imirkin_: that only affects nv50
16:00karolherbst: yeah... I know
16:00imirkin_: and even then, only in very rare cases
16:00karolherbst: that's why I said + cwabbotts fix
16:00imirkin_: i mean literal no-op
16:00karolherbst: I know
16:00imirkin_: like ... the code should do exactly the same thing
16:00karolherbst: right, but when I revert it, it works
16:00imirkin_: if reverting it helps in any way, that's highly surprising
16:00karolherbst: I have to apply https://github.com/karolherbst/mesa/commit/def1d1ddc2e8dca2ae967557f1c20204c7d9a96a on top of it
16:01karolherbst: last change is relevant
16:01imirkin_: oh, unless connor's fix was in that logic
16:01karolherbst: even then
16:01karolherbst: he basically just added a " && defi->op != OP_MERGE && defi->op != OP_SPLIT) "
16:01imirkin_: ok, so copy that into the code i refactored
16:02karolherbst: ... as if I didn't try that already ;)
16:02imirkin_: i can't imagine why anything else would matter
16:02karolherbst: me neither
16:02imirkin_: try to figure it out -- it should literally be the same lines of code executing before and after the change
16:02karolherbst: except something random is random in a different way
16:02imirkin_: just refactored into a function.
16:05karolherbst: imirkin_: .. guess what
16:06karolherbst: ohh wait, no
16:07karolherbst: I thought I fixed it, but it was just the correct result still stored in VRAM
16:08karolherbst: imirkin_: cwabbott's fix is totally unrelated, it compiles fine just with reverting your commit
16:08karolherbst: I don't need his patch
16:08karolherbst: (in this case)
16:09imirkin_: stupid question, but ... you're not compiling for nv50 are you?
16:09karolherbst: and I actually run the kernel
16:09imirkin_: well, i'll need the details.
16:09karolherbst: I diff the DEBUG=7 output
16:09karolherbst: maybe that gives me something
16:13karolherbst: imirkin_: there are differences like "RIG_Node[%108]($-1): 2 colors, weight inf, deg 12/63" vs "RIG_Node[%108]($-1): 2 colors, weight 7.200000, deg 12/63" right is reverted
16:13karolherbst: the weight is inf on master
16:13karolherbst: I mean, without the revert
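The comparison karolherbst describes (diffing two debug dumps and looking for changed node weights) could be scripted roughly as follows. This is a sketch only: the regular expression is modelled on the two `RIG_Node` lines quoted above and may not cover other variants in the real output, and `rig_weights` / `weight_changes` are hypothetical helper names. If a node is printed more than once in a dump, only the last occurrence is kept.

```python
import re

# Pattern modelled on the two lines quoted in the chat (assumption:
# the real debug output may contain variants this does not match).
RIG_RE = re.compile(
    r"RIG_Node\[%(?P<id>\d+)\]\(\$(?P<reg>-?\d+)\): "
    r"(?P<colors>\d+) colors, weight (?P<weight>[^,\s]+), "
    r"deg (?P<deg>\d+)/(?P<max>\d+)"
)

def rig_weights(log_text: str) -> dict:
    """Map value id -> weight for every RIG_Node line found in a log."""
    return {int(m["id"]): float(m["weight"]) for m in RIG_RE.finditer(log_text)}

def weight_changes(log_a: str, log_b: str) -> dict:
    """Return {id: (weight_a, weight_b)} for nodes whose weight differs."""
    a, b = rig_weights(log_a), rig_weights(log_b)
    return {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}

master = "RIG_Node[%108]($-1): 2 colors, weight inf, deg 12/63"
reverted = "RIG_Node[%108]($-1): 2 colors, weight 7.200000, deg 12/63"
changes = weight_changes(master, reverted)
print(changes)
```

Python's `float()` accepts the string `inf`, so an infinite weight on one side shows up as an ordinary difference.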
16:14imirkin_: that means i'm fucking something up
16:14imirkin_: can you provide both of those files in full?
16:14imirkin_: i can't investigate now, but will try to get to it tonight
16:15imirkin_: i'll also try to stare at the code to see if the issue appears.
16:15imirkin_: to be clear, this is master vs master + revert, right? no other funny RA-related changes, like connor's
16:15karolherbst: all RA changes like connor's are removed, it is on my opencl branch though
16:16karolherbst: but your change is the newest showing up with git log ... nv50_ir_ra.cpp
16:16imirkin_: is the branch somewhere i can see?
16:16imirkin_: [in case i need to double-check anything]
16:17imirkin_: k. i'll have a look tonight.
16:17imirkin_: Lyude: and maybe you can have a look at the DP-MST thing ;)
16:17karolherbst: imirkin_: well, I try to figure out what changed as well. Your change isn't really that big...
16:25karolherbst: imirkin_: that fixes it: https://gist.github.com/karolherbst/52ee5433affd605701a6407cb28bdba7
16:25karolherbst: maybe a smaller patch is needed
16:25karolherbst: but this is basically the changes you did while moving
16:31karolherbst: I found it
16:31karolherbst: ... bah
16:31karolherbst: no :(
16:32karolherbst: imirkin_: https://gist.github.com/karolherbst/52ee5433affd605701a6407cb28bdba7 ;)
16:32imirkin_: how can that matter?
16:33karolherbst: I like that '// doesn't help' comment though
16:33imirkin_: doesn't help... but it hurts!
16:34karolherbst: I'll write a fix
16:34karolherbst: .. well on master
16:40karolherbst: imirkin_: https://github.com/karolherbst/mesa/commit/3c2476c01aee39c8483636c842779c6dc7103881
16:40karolherbst: do you want to check if the test still passes?
16:44imirkin_: yeah..... i'm a bit concerned about sticking the noSpill on there.
16:44imirkin_: i wanted to leave it off.
16:44imirkin_: but perhaps it should go on there
16:44imirkin_: i'll think about it. thanks for tracking it down!
16:47imirkin_: [will look tonight, but now with a much higher likelihood of success]
17:37Lyude: imirkin_: yep! sorry I didn't get a chance yesterday but I brought back the MST stuff I would need to test it
17:38imirkin_: yep, no worries. just keeping it near the top of the proverbial stack.
17:38imirkin_: [until you either do it or tell me to go away]
18:24karolherbst: imirkin_: the second arg of popcount is a mask?
18:24karolherbst: so could I do popcnt $r0 $r0 0xff for chars?
18:26imirkin_: karolherbst: the two args of popcnt are and'd together
18:26imirkin_: note that iirc the second arg is lost on maxwell+
18:27karolherbst: wondering why nvidia still does this then: POPC R0, R0, -0x1;
18:27karolherbst: ohh wait
18:27karolherbst: my mistake :)
18:27karolherbst: I compiled for sm_30
18:28imirkin_: it's conceivable the 2-arg thing still exists, i haven't investigated extensively
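A small model of the behaviour imirkin describes, where the two operands are AND'd before the bits are counted. It only illustrates the semantics as stated in the chat for 32-bit values and has not been checked against hardware; `popcnt` here is a plain Python function, not the ISA instruction.

```python
MASK32 = 0xFFFFFFFF

def popcnt(a: int, b: int = MASK32) -> int:
    """Count the set bits of (a & b), treating both as 32-bit values."""
    return bin(a & b & MASK32).count("1")

# Masking with 0xff counts only the low byte, as in the "chars" question.
print(popcnt(0x1234, 0xFF))
# A second operand of -1 is all ones, so the AND leaves the first operand unchanged.
print(popcnt(0x1234, -1))
```

Under this model a second operand of `-0x1` is equivalent to a plain one-argument population count.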
19:21pendingchaos: imirkin_: what is the source of OP_PIXLD used for?
19:25karolherbst: pendingchaos: you mean for what is pixld used or what the source should be?
19:27pendingchaos: I guess the second. all code that creates a PIXLD instruction seems to supply it zero, though https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#n2658 seems to say it is used for something
19:32imirkin_: pendingchaos: some PIXLD ops take an arg
19:32imirkin_: some don't
19:33imirkin_: pendingchaos: or perhaps it's the RT index. i really don't know tbh.
21:00karolherbst: imirkin_: what's LDG.CI.U8?
21:00karolherbst: the CI especially
21:00karolherbst: load with global address, but it is cached as a const buffer actually?
21:03HdkR: karolherbst: bzz, wrong
21:03karolherbst: HdkR: ?
21:04HdkR: That's not what the CI means :P
21:04karolherbst: are you sure?
21:05karolherbst: or does ldg.ci just mean to cache more aggressively, because the data never changes?
21:06karolherbst: well I am 100% sure it has something to do with caching :p
21:06HdkR: oh wait no, I misread
21:06HdkR: yes, latter
21:07karolherbst: mhh interesting, reads from global are only cached inside L2 on Kepler
21:07karolherbst: but on maxwell with .CI it can be promoted to be cached in L1 as well
21:07karolherbst: or starting with kepler2 actually