08:36 karolherbst: nice, some flickering issues in a game
09:19 karolherbst: uhh, texture doesn't get drawn, weird
09:21 karolherbst: but a frame earlier it got
11:03 npnth: Hi, I've got a machine that appears to be leaking memory in the kernel.
11:03 npnth: If I let it sit for a week or so, I end up with 6G of memory in SUnreclaim (according to /proc/meminfo), which according to slabtop is from 'kmalloc-64'.
11:03 npnth: (In particular, it's *not* from 'ext4_inode_cache' or anything like that.)
11:03 npnth: The memory isn't owned by any process - I can kill everything down to X and crond without significantly affecting memory usage.
11:03 npnth: The above makes me think this is in the kernel. The most obscure thing I run is nouveau, so I figured I'd ask:
11:03 npnth: Are there any known memory leaks with nouveau right now? If so, great, I can stop looking and wait for a fix. If not, I'll keep looking.
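One way to watch the growth described above is to sample the `SUnreclaim` field of /proc/meminfo over time. The following is a minimal sketch, assuming the usual `Name:   value kB` layout of that file; the helper name `read_meminfo` and the sample text are made up for illustration and are not data from the machine being discussed.

```python
def read_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most values are in kB; a few fields (e.g. HugePages_Total) are
    plain counts and are stored as-is.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines that look like "Name:   12345 [kB]".
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

# Made-up sample text (hypothetical values).
sample = """MemTotal:       16308856 kB
Slab:            6812344 kB
SReclaimable:     402112 kB
SUnreclaim:      6410232 kB
"""

info = read_meminfo(sample)
print(info["SUnreclaim"] / (1024 * 1024))  # size in GiB
```

On a live system the same function could be fed `open("/proc/meminfo").read()` at intervals to see whether the value keeps climbing.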
11:28 karolherbst: npnth: well, it could be anything
11:29 karolherbst: and if we knew about any leaks, we would try to fix them
11:30 npnth: Sure, of course. But perhaps somebody found a leak a month ago and is trying to track it down, and there isn't a fix yet, but people know that there's a problem.
11:31 npnth: That's the situation I'm hoping for, because then I can stop trying to figure out what the results of "perf" mean, and which project I can even go to with my bug report.
11:31 karolherbst: well, not that I know of
11:31 npnth: Fair enough - thanks then!
11:31 karolherbst: and I doubt you will be able to figure it out with perf
11:31 npnth: Personally, I doubt that I'll be able to figure it out at all :)
11:31 karolherbst: maybe there is some kind of kernel memory leak detector... maybe not, I never looked into that
11:32 pmoreau: imirkin_: NVC0LegalizeSSA::handleShift assumed the shift would not be an immediate but didn’t check for it. I added a handleImmShift version as you can simplify things a bit when you know the immediate.
11:38 karolherbst: pmoreau: nice
11:40 pmoreau: I have no idea how I did not hit that code path before, since those transforms are at the very least two years old :o
11:40 karolherbst: luck
11:40 karolherbst: the order of opts sometimes matters
11:41 karolherbst: and many of the opts depend on that order
11:41 karolherbst: like early opts might not check for mods
11:41 pmoreau: True
11:43 RSpliet: karolherbst, npnth: you can configure slab with leak detection functionality. Can't tell you exactly how that works...
11:47 npnth: RSpliet: I'm currently reading up on CONFIG_DEBUG_KMEMLEAK, which might be what you mean. That's OT, I guess, but if I find something concrete I'll let people know.
11:47 RSpliet: DEBUG_SLAB_LEAK is the one I thought of, perhaps KMEMLEAK is more useful/relevant
11:50 npnth: RSpliet: That sounds good, too. I'll try it.
12:34 pmoreau: karolherbst, imirkin_: Do you think it is better to have it like this https://hastebin.com/orujikewuj.php or include the imm handling in the existing handleShift function?
12:37 pmoreau: Or even call the handleShiftImm function from within handleShift.
12:44 karolherbst: pmoreau: usually it is easier to handle that in one function
12:44 karolherbst: but
12:47 karolherbst: mhh
12:48 pmoreau: karolherbst, imirkin_: If within the same function, it gives something like this: https://hastebin.com/kabimokani.diff
12:49 karolherbst: try to be smarter ;)
12:49 pmoreau: Minus the assert at the beginning. :-D
12:49 karolherbst: the OR instruction is basically the same in both cases
12:50 pmoreau: Only one arg changes, true
12:50 pmoreau: And it does not have a predicate
12:50 karolherbst: but mhh
12:51 pmoreau: So I am going to have if/else blocks all over the place inside that code (almost).
12:51 karolherbst: maybe a different approach
12:51 karolherbst: if you shift by more than 32 bits, you can simplify that to a 32bit shift entirely
12:51 karolherbst: I mean a shift by x-32
12:52 karolherbst: mhh
12:52 karolherbst: and you need to set something to 0 I think
12:53 pmoreau: The low bits of dst should be zero in that case.
12:55 karolherbst: yeah, something like that
12:55 karolherbst: you just shift the low bits by x-32 into dst.hi and set dst.lo to 0
12:57 pmoreau: That’s what’s happening (minus the dst.lo being set to 0).
12:57 karolherbst: you shift the src.hi into dst.hi, right?
12:57 karolherbst: or what is src[1]?
12:58 pmoreau: src[1] is src.hi
12:58 karolherbst: ahh, okay
12:58 pmoreau: Oh right, I see what you mean.
12:59 karolherbst: you confused me :O
13:00 pmoreau: Ah, but the srcs are swapped if it’s a SHR.
13:00 pmoreau: So...
13:00 karolherbst: ohh, I see
13:01 pmoreau: I *think* the original code is partly wrong.
13:16 pmoreau: Yes, there is an error: The operation after `Compute HI (shift > 32)` should be using src[0] and not src[1].
14:09 NSA: :(
14:09 NSA: apparently i somehow killed my gpu
14:09 NSA: well
14:10 NSA: at least the driver
14:10 karolherbst: NSA: rebooting should help in every case
14:10 NSA: i was wondering if there's any way to avoid that
14:10 karolherbst: try to find the cause
14:11 NSA: it's spamming my journal with plenty of this: https://paste.debian.net/plainh/ff87f901
14:11 NSA: once per second
14:11 NSA: just the ch3: psh line is changing
14:11 karolherbst: uhh
14:11 karolherbst: is there something before that?
14:12 NSA: system froze before that for about an hour due to oom
14:12 karolherbst: I think imirkin might know what is going on here
14:12 NSA: then it was back responding for a few seconds
14:12 NSA: and after that it froze again
14:12 NSA: i was able to ssh in though
14:12 NSA: and it's responding just fine
14:12 NSA: just the display is frozen
14:13 karolherbst: I think it might be the bug where something spawns too many requests and we run out of space in the fifo or something like this
14:13 karolherbst: but maybe there is some initial error in the log
14:16 NSA: https://gist.github.com/Nothing4You/392eb93e8dfab7e48b41a173ac680d1a this should be where it started
14:17 karolherbst: mhh
14:17 karolherbst: that out of memory doesn't sound good to begin with
14:21 NSA: yeah, i've been running oom every once in a while - been thinking about adding some swap, however, i'll have to repartition my ssd for that
14:21 karolherbst: mhh
14:21 karolherbst: you could swap to zram
14:22 karolherbst: zram is basically compressed memory, which you can use as a swap device
14:22 karolherbst: it is still inside RAM, but compressed
14:22 karolherbst: some people get around 2x memory with this
14:22 NSA: lol
14:22 karolherbst: but there is a performance hit, which you also have by swapping to a disc ;)
14:22 NSA: obviously
14:22 karolherbst: just not as bad I figure
14:22 karolherbst: especially if you compare it with a HDD
14:23 karolherbst: I use zram for swap and tmpfs replacement
14:23 NSA: luckily i don't use hdds in my desktop anymore
14:23 NSA: only for storage via lan
14:23 karolherbst: well swapping on SSD should be still slower though, maybe not with those pcie nvme ones
14:24 NSA: yeah but it's far better than hdds
14:24 karolherbst: yeah
14:24 karolherbst: although I would argue that swapping to an ahci ssd is also a pain, because you block other disc IO
14:24 NSA: wait what
14:24 NSA: it just fucking recovered
14:24 NSA: seriously?
14:24 karolherbst: sometimes it does
14:24 NSA: i just waited long enough and it is back
14:25 karolherbst: well, nouveau tries to reset the state and move on
14:25 karolherbst: sometimes it works, sometimes it doesn't
14:25 karolherbst: and sometimes it just needs a lot of time
14:25 NSA: i think i just wasn't able to use my pc for about 1.5h due to problems originating with the OOM
14:26 karolherbst: how much RAM do you have?
14:26 NSA: 32gb
14:26 karolherbst: mhhh
14:26 karolherbst: that should be plenty...
14:26 NSA: /should/
14:26 NSA: that's what people keep telling me
14:26 NSA: yet i run into oom like once or twice in 2 months
14:27 karolherbst: weird
14:27 karolherbst: I usually max around 12GB
14:28 karolherbst: maybe you should keep an eye open and check which application might eat so much RAM, although that isn't really easy in some cases
14:28 karolherbst: or maybe it is also inside the kernel, where some memory leak detection might figure that out
14:29 NSA: idk
14:30 NSA: probably not happening often enough to "easily" figure that out
14:30 karolherbst: well
14:30 karolherbst: maybe
14:31 karolherbst: or run with 16GB, should speed it up :p
14:31 NSA: and i wouldn't be surprised if it was closed source software causing those issues
14:31 NSA: like in this instance it was the steam web helper that was oom killed
14:31 karolherbst: well
14:31 karolherbst: steam web helper is basically chromium
14:31 NSA: that doesn't mean valve is using it properly
14:32 karolherbst: well, we don't know. Web browsers tend to do a lot of allocations and deallocations and might just trigger the oom situation
14:32 karolherbst: could be that something else needed so much RAM
14:32 karolherbst: or leaking it
14:35 NSA: i guess i should just go and add some swap soon
16:12 imirkin: pmoreau: 64-bit shifts can't get an immediate arg merged in
16:12 imirkin: pmoreau: pretty sure that canInsnLoad is set up not to allow that
16:13 imirkin: pmoreau: on input into the IR, you should never have an immediate arg to a real (non-mov) op
16:14 pmoreau: imirkin: I always have the immediates go through a mov first before being used.
16:15 pmoreau: The issue is ConstantFolding folds an imm into MUL (or MAD) for the second operand. Then, turns out the imm is a power of two -> SHL.
16:15 pmoreau: And you get an immediate as the shift value.
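The rewrite pmoreau describes (a multiply by a power-of-two immediate becoming a shift by an immediate) can be sketched as follows. This is only an illustration in Python, not the actual ConstantFolding code; the helper name `mul_to_shl` is hypothetical.

```python
MASK64 = (1 << 64) - 1

def mul_to_shl(imm):
    """Return the shift amount if imm is a power of two, else None."""
    if imm > 0 and imm & (imm - 1) == 0:
        return imm.bit_length() - 1
    return None

x = 0x123456789
for imm in (1, 8, 1 << 33, 12):
    shift = mul_to_shl(imm)
    if shift is not None:
        # Multiplication and left shift agree modulo 2**64.
        assert (x * imm) & MASK64 == (x << shift) & MASK64
    print(imm, shift)
```

Note that the shift amount produced this way is a compile-time constant, which is how a 64-bit SHL can end up with an immediate operand even when the front end never emits one.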
16:39 imirkin: ahh =/
16:40 imirkin: ok yeah, so then handleShift needs a bit of help
16:40 imirkin: i did not anticipate that scenario.
16:41 imirkin: (and we want the MUL to have the imm because that makes split64BitOps work better)
16:42 pmoreau: Right
16:42 imirkin: hopefully my comments were sufficient to explain wtf was going on
16:42 imirkin: it *is* a bit subtle, esp the way i have the std::swap() in there ;)
16:43 pmoreau: Also, there is a small error in handleShift: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#n219 it should be src[0], not src[1], according to the comments earlier (and it works better)
16:43 imirkin: i think one of you was saying i got the args backwards, but i doubt it
16:43 imirkin: that said, it should be trivial to trigger any issues with a piglit test, so go for it
16:43 pmoreau: Yeah, the double swap makes it a bit trickier.
16:44 imirkin: "it works better"?
16:45 imirkin: i.e. you found bugs in the existing logic?
16:45 imirkin: what values does it not handle?
16:45 imirkin: just make a shader_test, should be easy to prove
16:45 pmoreau: Well, I get the same result as doing the same operation on the CPU.
16:45 imirkin: for which values.
16:46 pmoreau: That’s weird, that was for 1ull << 32ull
16:47 pmoreau: But the predicate of that instruction should be false.
16:48 imirkin: ok
16:48 imirkin: well, that clearly should be handled.
16:49 imirkin: let's make a shader_test...
16:49 pmoreau: If we consider a SHL, src[0] == src0.lo and src[1] == src0.hi, right?
16:50 imirkin: i haven't looked at the code
16:51 pmoreau: The comment says "(HI,LO) << x = (LO << (x - 32), 0)", so "HI = (LO << (x - 32))", but the code does "shl dst[1] src[1] x_minus_32", so "shl dst.hi src.hi x_minus_32"
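The identity quoted from the code comment can be checked with a small model that only uses 32-bit halves. This is a sketch under the assumption that the shift amount is in the range 32..63; the function names and the `use_hi_by_mistake` switch are made up to show the effect of picking the wrong source half.

```python
MASK32 = 0xFFFFFFFF

def split64(value):
    """Split a 64-bit value into (lo, hi) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32

def join64(lo, hi):
    """Recombine (lo, hi) 32-bit halves into a 64-bit value."""
    return (hi << 32) | lo

def shl64_large(lo, hi, x, use_hi_by_mistake=False):
    """64-bit left shift for 32 <= x < 64: (HI,LO) << x = (LO << (x - 32), 0)."""
    assert 32 <= x < 64
    src = hi if use_hi_by_mistake else lo
    dst_hi = (src << (x - 32)) & MASK32
    dst_lo = 0
    return dst_lo, dst_hi

lo, hi = split64(1)
good = join64(*shl64_large(lo, hi, 33))
bad = join64(*shl64_large(lo, hi, 33, use_hi_by_mistake=True))
print(hex(good), hex(bad))
```

With the low half as the source, `1 << 33` comes out as expected; with the high half, the set bit is lost.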
16:51 karolherbst: read the code ;)
16:53 pmoreau: Let’s see if I can right a shader_test
16:53 pmoreau: s/right/write
16:53 karolherbst: bld.mkSplit(src, 4, lo->getSrc(0));
16:53 imirkin: pmoreau: https://hastebin.com/ifilefamod.cpp
16:53 imirkin: does this fail for you?
16:53 imirkin: it passes on SM35
16:54 karolherbst: so I guess src[1] is the hi value here, because it gets data.offset += halfSize applied
16:55 pmoreau: imirkin: It skips :-/ You run it with shader_runner, right?
16:55 imirkin: pmoreau: yes
16:55 imirkin: it passes for me
16:55 orbea: this shader commit for RetroArch just didn't work for me, no crash, just immediately closed. Couldn't even get an apitrace. Could there be something nouveau cant handle in it or is the problem more general? https://github.com/libretro/RetroArch/commit/bada13a2152b2092e7c66e656b06f14bcd8ea07c retroarch verbose output at least https://pastebin.com/WKVxw3TP
16:56 imirkin: both with the SM35 code, as well as with the SM30 code
16:56 feep: I'm a newb and I have no idea about the codebase, but ... shouldn't those be informatively named macros?
16:56 feep: GET_HI, SET_HI and such
16:57 karolherbst: feep: in a perfect world, yes
16:57 imirkin: feep: writing a compiler can be subtle work.
16:57 imirkin: figuring out wtf "hi" means in the first place can be tricky.
16:58 pmoreau: imirkin: What are the arguments again to let it run and print the result, rather than displaying a window? It’s something with auto and fb, but I can’t get it right.
16:58 imirkin: one could make single-use macros, but those tend to just obfuscate code more
16:58 imirkin: pmoreau: -auto
16:58 imirkin: pmoreau: green = good, not-green = bad.
16:58 pmoreau: Ah, I was trying with double dash
16:59 imirkin: pmoreau: also -fbo does off-screen rendering, but that doesn't really matter here
16:59 imirkin: pmoreau: anyways, when you find a set of args that mess up that shader, let me know. until then i'm assuming that handleShift is OK :)
16:59 pmoreau: imirkin: "./bin/shader_runner -auto shift.shader_test" could not read file "-auto" what did I mess up :-(
17:00 imirkin: pmoreau: based on tests/spec/arb_gpu_shader_int64/execution/fs-shift-scalar-by-scalar.shader_test btw
17:00 imirkin: filename first, options second
17:00 karolherbst: pmoreau: put args after file
17:00 pmoreau: Eh, ok
17:02 pmoreau: It does pass as well, but the error is in the shift > 32 code path.
17:04 pmoreau: And if I change the shift to 33, and the result to 0x200000000, it does fail. Unless I change src[1] to src[0].
17:04 karolherbst: pmoreau: I am sure you are right here as well
17:05 imirkin: pmoreau: ok. just test it out for all the various cases.
17:06 imirkin: fwiw the SM35 path works ;)
17:06 imirkin: not that that's surprising
17:06 karolherbst: ;)
17:06 pmoreau: I couldn’t properly read my test: I had 1ull << 33ull in my test, not 1ull << 32ull as I previously said.
17:07 karolherbst: ohh, imirkin I found an issue with some textures not being drawn in a few frames
17:07 pmoreau: Anyway, will try the various test cases.
17:07 karolherbst: I have a trace if you want to look at it
17:07 karolherbst: I am sure your GPU can render that trace in under a minute
17:07 karolherbst: well 30 seconds even
17:07 karolherbst: I already tried MESA_DEBUG=flush, but that didn't do anything
17:08 imirkin: pmoreau: lol ok.
17:08 imirkin: pmoreau: just make sure it works with shr
17:08 imirkin: add more swaps as necessary ;)
17:08 pmoreau: Right, was going to try that
17:08 imirkin: karolherbst: i can look, but i have ... no time
17:08 karolherbst: mhh
17:09 karolherbst: imirkin: well, at least you can tell me if it happens on gk110
17:09 pmoreau: I think there are enough swaps :-D
17:09 karolherbst: uhm
17:09 karolherbst: or newer
17:09 imirkin: well - GK208
17:09 karolherbst: I tried to install it on my pascal one, but steam doesn't like my fedora or the other way around...
17:13 karolherbst: imirkin: https://drive.google.com/open?id=19oejmug6I7yC9LVHlhYVaCW0VjvaDQDi
17:13 pmoreau: 0x2ffffffff >> 33 == 1 currently fails, but succeeds if using src[0] as well (which makes sense as src[0] contains the high bits here, and we want those, not the low bits).
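The right-shift case pmoreau tests can be modelled the same way, using only 32-bit halves. This is a sketch for the unsigned case with a shift amount in 32..63; the function name is hypothetical and the operand swapping done in the real lowering code is not modelled.

```python
MASK32 = 0xFFFFFFFF

def shr64_large_unsigned(lo, hi, x):
    """Unsigned 64-bit right shift for 32 <= x < 64 on 32-bit halves."""
    assert 32 <= x < 64
    dst_lo = (hi >> (x - 32)) & MASK32  # only the high word contributes
    dst_hi = 0                          # a signed shift would sign-fill instead
    return dst_lo, dst_hi

value = 0x2FFFFFFFF
lo, hi = value & MASK32, value >> 32
dst_lo, dst_hi = shr64_large_unsigned(lo, hi, 33)
result = (dst_hi << 32) | dst_lo
print(result)

# Shifting the low word by mistake gives a very different answer.
wrong = lo >> (33 - 32)
print(hex(wrong))
```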
17:14 karolherbst: call 959061 is correct on intel and wrong with nouveau
17:14 karolherbst: pmoreau: write a patch
17:14 pmoreau: karolherbst: Planning to :-)
17:14 karolherbst: well, hopefully we have some piglit tests testing this
17:14 karolherbst: maybe
17:14 karolherbst: ;)
17:15 karolherbst: pmoreau: you mean this src[1], right? https://github.com/karolherbst/mesa/blob/master/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#L219
17:15 karolherbst: yeah, that has to be src[0] for sure
17:16 pmoreau: Yes, that one
17:17 karolherbst: weird though
17:17 pmoreau: Going to run the various int64 piglit tests.
17:19 karolherbst: x32_minus_shift is shift - 0x20, and the code wants to do src[0] << -(shift - 0x20) basically
17:19 pmoreau: Yep, which is fine
17:19 pmoreau: Wait no, you mixed up the first one
17:19 karolherbst: I don't think so
17:19 pmoreau: x32_minus_shift == 32 - shift
17:20 karolherbst: ohh wait
17:20 karolherbst: src(0) gets the -
17:20 karolherbst: so
17:20 karolherbst: -shift + 32
17:21 karolherbst: and then
17:21 karolherbst: src[0] << -(-shift + 32)
17:21 karolherbst: which is src[0] << shift - 32
17:21 karolherbst: mhh
17:22 karolherbst: looks a bit like this is more complicated than it should be, but it seems fine though
17:22 pmoreau: Right, but it avoids one operation.
17:59 imirkin: karolherbst: 959061 in that trace is a glBindBuffer call
18:01 karolherbst: yeah
18:02 karolherbst: uhm
18:02 karolherbst: wait
18:02 karolherbst: ahhh, crap, I debugged a different trace I made
18:05 imirkin: karolherbst: pmoreau: yeah, that clearly has to be src[0]
18:05 imirkin: since we want LO << something, and LO is in src[0]
18:05 pmoreau: Right
18:06 imirkin: (finally got a chance to look at the code carefully)
18:06 imirkin: feel free to add even more comments if you like
18:06 pmoreau: should right shader_test more often
18:06 pmoreau: The comments in the beginning are pretty good
18:07 imirkin: it's pretty compact code
18:07 imirkin: but it avoids having to have 4 semi-identical sections of logic
18:07 pmoreau: Updating fs-shift-scalar-by-scalar.shader_test test to include shifts higher than 32.
18:07 imirkin: er, 2 i guess.
18:08 imirkin: yeah, good idea
18:08 imirkin: shader_test's are pretty easy to write
18:08 imirkin: there's a bit of boiler plate, but there's tons of examples
18:08 karolherbst: imirkin: https://drive.google.com/open?id=1hqFiZEOp8hW5ayJju7aF9FV_fJjQzgEe
18:08 imirkin: eventually you can do it all from memory
18:09 imirkin: karolherbst: ok, that's a draw in this one. good start :)
18:10 karolherbst: :D
18:10 karolherbst: well there should be some path rendered on the bottom
18:10 karolherbst: the one in the texture
18:10 karolherbst: on intel it does get rendered
18:11 imirkin: karolherbst: https://i.imgur.com/iosBj7F.png
18:11 imirkin: i assume whatever's supposed to be there is missing?
18:11 imirkin: this is with mesa 17.3.0-rc5
18:11 pmoreau: True, setting up the same thing with OpenCL is a bit more involved. Plus it will depend on all my work, rather than just the regular code.
18:11 karolherbst: imirkin: look at texture0
18:12 imirkin: some black vase-looking thing?
18:12 karolherbst: open it
18:12 karolherbst: and flip it
18:12 imirkin: done
18:13 karolherbst: that is the same call on intel: https://i.imgur.com/zQaRx7X.png
18:13 imirkin: oh i see
18:13 karolherbst: replaying the trace you should see some flickering
18:13 karolherbst: you can imagine that isn't right
18:14 karolherbst: afaik some textures don't get drawn in calls like this one
18:14 karolherbst: no idea why
18:15 imirkin: oooh!!!
18:15 imirkin: i wonder if it's the bug i just fixed
18:15 imirkin: hold on
18:15 karolherbst: :O
18:16 imirkin: no :(
18:16 imirkin: but it was such a good idea
18:16 imirkin: it's *exactly* the case that's broken
18:16 imirkin: i.e. a mix() with a single-arg lerp param
18:16 karolherbst: ?
18:17 imirkin: https://bugs.freedesktop.org/show_bug.cgi?id=103955
18:17 karolherbst: mhhh
18:17 karolherbst: but it works with intel and llvmpipe
18:17 imirkin: yeah. and it doesn't work with that bug fix in place. so it's not that issue.
18:17 imirkin: but it's *close* to that issue ;)
18:17 karolherbst: ahhh
18:18 karolherbst: the shaders didn't look really special
18:18 karolherbst: so I didn't really look into those
18:18 imirkin: gl_FragColor = mix( grey, tex0, Saturation);
18:18 imirkin: and Saturation is a "uniform float"
18:18 imirkin: i think in the tgsi, it comes out as CONST[].xyzw
18:18 imirkin: instead of CONST[].xxxx
18:18 karolherbst: uhhh
18:19 imirkin: in practice, Saturation == 1.0
18:19 imirkin: so the color should be 100% the texture
18:19 karolherbst: okay, I could check that
18:20 imirkin: but like i said, i don't think that's it
18:20 imirkin: (also note that patch has a minor bug... should be oo[2].swizzle, not ->swizzle)
18:20 karolherbst: I just noticed that there is MESA_SHADER_DUMP_PATH and MESA_SHADER_READ_PATH :)
18:21 imirkin: there's another potential problem in that it's a GL_RGB framebuffer
18:21 imirkin: and it's using blending
18:21 imirkin: although GL_DST_ALPHA is never used...
18:21 imirkin: which is the theoretically-problematic case
18:23 imirkin: karolherbst: do try that patch, perhaps i have some other patch which is causing additional troubles
18:25 karolherbst: ../../../src/mesa/state_tracker/st_glsl_to_tgsi.cpp:1365:41: error: base operand of ‘->’ has non-pointer type ‘st_src_reg’
18:25 imirkin: see above
18:25 karolherbst: ahh
18:26 karolherbst: but mhh
18:26 karolherbst: why does it work with llvmpipe then
18:27 imirkin: i prefer not to ask such questions until i find out that the patch helps
18:27 imirkin: if it helps - we can figure it out
18:27 imirkin: perhaps we implement LRP differently
18:27 imirkin: but if it doesn't, then don't worry and move on :)
18:27 karolherbst: right, it doesn't help
18:27 imirkin: ok cool
18:28 imirkin: at least it's consistent.
18:28 imirkin: ok. another thing going on here is that it's an elements draw with a small number of elements
18:28 imirkin: it's conceivable that the push hint stuff logic is broken
18:28 karolherbst: mhh
18:29 karolherbst: but that should happen every frame
18:29 karolherbst: or well, the call should be kind of the same every frame
18:30 karolherbst: except that the scene is a bit different
18:30 imirkin: yep, doesn't help
18:30 karolherbst: see call 953567
18:30 karolherbst: same thing, just one frame earlier
18:32 karolherbst: modelviewProjection has different values though
18:32 karolherbst: which is a uniform in the fragment shader
18:32 karolherbst: but uhm...
18:33 imirkin: anyways
18:33 imirkin: i gtg
18:33 imirkin: good luck
18:33 karolherbst: okay
18:33 karolherbst: I will play around with that a bit
18:35 karolherbst: uhm... vertex shader though
18:40 pmoreau: imirkin: Does https://hastebin.com/yeweqanixe.md seems reasonable considering the input is https://hastebin.com/nojuyevume.cpp ?
18:40 pmoreau: I don’t understand why there is no ISHR64.
18:44 karolherbst: imirkin: okay, that uniform indeed changes the behaviour
18:44 karolherbst: maybe we run into some precision issues here?
18:55 karolherbst: uhhhh
18:55 karolherbst: 64bit int stuff
18:57 imirkin: pmoreau: yes... quite reasonable ... there is both an I64SHR and U64SHR
18:58 imirkin: pmoreau: only a U64SHL since sign doesn't matter for SHL
18:58 karolherbst: here are the shaders: https://gist.github.com/karolherbst/660a390dea931e6c4d92a899d3ffa0f6
18:58 pmoreau: Ugh, yeah, I missed the I64SHR --"
18:59 imirkin: karolherbst: is that with my patch?
18:59 imirkin: karolherbst: 10: LRP TEMP[7], CONST[0][0].xxxx, TEMP[2], TEMP[6]
18:59 karolherbst: no
18:59 karolherbst: ohh wait
18:59 karolherbst: it might be
18:59 imirkin: i think before it would do "CONST[0][0]"
18:59 karolherbst: no, it isn't
18:59 imirkin: hm, oh well.
19:00 imirkin: that's what i had in mind though about fixing
19:05 imirkin: 23: sub ftz f32 $r4 $r3 $r3 (8)
19:05 imirkin: 24: mad ftz f32 $r3 $r4 c0[0x0] $r3 (8)
19:05 imirkin: heh
19:05 imirkin: should probably have some opt that sub (a,a) == 0
19:05 imirkin: i guess it's not 10000% accurate, but it's *pretty* accurate
19:05 imirkin: (since e.g. Infinity - Infinity = NaN, and NaN - anything = NaN)
19:05 karolherbst: wait a second...
19:06 karolherbst: oh no, should be fine
19:06 imirkin: yeah. it's correct. both sides have tex.a as the arg
19:06 imirkin: so a lerp between tex.a and tex.a will always give ... tex.a
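The point about `sub(a, a)` can be reproduced with ordinary IEEE floats. This sketch mirrors the quoted `sub` + `mad` pair as `(a - a) * t + a`; it ignores the `ftz` modifier and any fused rounding, and the function name is made up.

```python
import math

def lerp_as_emitted(a, t):
    """(a - a) * t + a, mirroring the sub + mad pair quoted above."""
    return (a - a) * t + a

finite = lerp_as_emitted(0.25, 0.7)      # finite input: the result is just a
inf_diff = math.inf - math.inf           # nan, so folding sub(a, a) to 0 is not exact
inf_lerp = lerp_as_emitted(math.inf, 0.7)  # the nan propagates through the mad

print(finite, inf_diff, inf_lerp)
```

So folding `a - a` to zero would change the result only for infinite or NaN inputs, which matches imirkin's "pretty accurate" remark.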
19:06 karolherbst: I am sure the issue is within that vertex shader though, or the matrix calculation ends up producing values the fragment shader can't handle
19:07 pmoreau: imirkin: Could you please try this on your SM35? https://hastebin.com/piqakitora.cpp
19:09 imirkin: pass
19:09 imirkin: i think i did a lot less testing on SM30
19:10 imirkin: i basically just did it, then forced the lowering on my box, ran a couple tests and it was fine, so ... must be right
19:12 pmoreau: No worries! Apart from OpenCL address computations, I don’t think int64 paths will be used that much.
19:21 karolherbst: imirkin: well, when I do gl_FragColor = tex0, I see basically the same issue
19:39 karolherbst: imirkin: something is wrong with that matrix multiplication, but I really don't know what
19:44 karolherbst: ahhhh
19:45 karolherbst: mhh interesting
19:47 karolherbst: ha
19:48 karolherbst: okay, here is the deal
19:49 karolherbst: imirkin: in this line: "gl_Position = modelviewProjection * vec4(vPosition, 1.0);" the application depends on modelviewProjection[3] having -z <= w. but the inputs are always like z == -w
19:50 karolherbst: so if I slightly increase w or slightly decrease z the issue kind of disappears
19:50 karolherbst: and decrease z means * 0.99999... and increase w means * 1.000....1
19:51 karolherbst: z being negative and w positive
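Why `z == -w` is fragile can be seen from the default OpenGL clip-space depth condition, `-w <= z <= w`. The sketch below uses made-up numbers and says nothing about what the hardware actually computes; it only shows that a point exactly on the near plane has no margin, and that the nudges karolherbst describes move it back inside.

```python
def inside_depth_clip(z, w):
    """OpenGL's default clip-space depth test: -w <= z <= w."""
    return -w <= z <= w

w = 2.5
z = -w                                    # exactly on the near plane

on_plane = inside_depth_clip(z, w)        # inside, but with no margin at all
with_error = inside_depth_clip(z * (1 + 1e-7), w)  # tiny relative error in z
smaller_z = inside_depth_clip(z * 0.99999, w)      # "decrease z"
larger_w = inside_depth_clip(z, w * 1.00001)       # "increase w"

print(on_plane, with_error, smaller_z, larger_w)
```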
19:51 imirkin_: so it's getting clipped?
19:52 karolherbst: well, might be
19:52 karolherbst: in the shader, z should be c0[0x38] and w c0[0x3c], right?
19:53 karolherbst: mhh
19:53 imirkin_: can't look now
19:53 karolherbst: mhh, well in the end both values are parts of a matrix multiplication into gl_Position
19:54 karolherbst: this line: gl_Position = modelviewProjection * vec4(vPosition, 1.0);
19:56 karolherbst: maybe some hw precision thing?
19:56 imirkin_: unlikely
20:03 tobijk: karolherbst: floar vs double in tgsi rep?
20:03 tobijk: *float
20:05 karolherbst: mhhh
20:05 karolherbst: I don't think those are actually doubles in any way
21:26 Faults: karolherbst: Join #Solus-Chat where we can praise Solus and worship Ikey
21:46 Lyude: Is the API for rnndb documented anywhere?
21:46 Lyude: not entirely sure yet; but I might want to start hooking some stuff up from biopenly to rnndb
21:47 imirkin_: not completely
21:48 imirkin_: rob's been using it for freedreno, you can talk to him about his experiences with it
21:48 imirkin_: eric went with the intel-made thing for vc4
21:48 imirkin_: which is approximately the same as rnndb, but NIH
21:55 karolherbst: I think etnaviv also uses some parts of envytools like rnndb
21:55 imirkin_: ah yeah, they do
21:58 karolherbst: how can I disable clipping in nouveau?
21:58 karolherbst: well, if it can be disabled at all
22:28 karolherbst: imirkin_: oh yeah, by disabling enough stuff that has "clip" in its name, the issue went away
22:28 imirkin_: ;)
22:29 karolherbst: all my changes: https://gist.github.com/karolherbst/d75444fd2e0cc3a54ab160734e12fcdc
22:29 imirkin_: https://gist.github.com/karolherbst/d75444fd2e0cc3a54ab160734e12fcdc#file-clip-patch-L58
22:30 karolherbst: I am sure it is the change in nvc0_state.c which helped
22:30 imirkin_: my guess is that's the hunk that plays.
22:30 karolherbst: :D
22:30 karolherbst: yep
22:31 karolherbst: that's the one
22:33 karolherbst: mhh
22:33 karolherbst: imirkin_: adding NVC0_3D_VIEW_VOLUME_CLIP_CTRL_DEPTH_CLAMP_NEAR to the original path helps
22:34 imirkin_: heh
22:34 imirkin_: i need to spend a few and properly RE that VIEW_VOLUME_CLIP thing
22:34 imirkin_: unfortunately this can only be done with a proper understanding of clipping
22:34 imirkin_: in practice, i only have like a 25%-complete understanding of clipping.
22:35 karolherbst: and I verified that the other two flags don't change anything
22:37 karolherbst: mhh, is clipping really that complicated or is it just like a priority kind of thing?
22:38 imirkin_: well
22:38 imirkin_: the core concept is pretty simple.
22:38 imirkin_: the problem is that various hw offers a wide variety of options for how it's done
22:38 imirkin_: and the various bits implement those variations
22:39 imirkin_: knowing what variations are out there ahead of time sure helps :)
22:39 imirkin_: like ... clamping, for example - that clamps the depth instead of clipping
22:39 imirkin_: then there's depth vs x/y clipping
22:44 imirkin_: is there a polygon offset that's set?
22:44 karolherbst: how can I check?
22:45 imirkin_: qapitrace
22:45 karolherbst: yeah well, how?
22:45 imirkin_: glPolygonOffset() and iirc it should come out in the state
22:45 imirkin_: like GL_POLYGON_OFFSET or something
22:45 karolherbst: okay
22:45 imirkin_: there's a few of them
22:45 imirkin_: (3)
22:45 karolherbst: well right, but not for that call
22:46 imirkin_: i mean - there are a few GL_POLYGON_OFFSET_bla's
22:46 karolherbst: ahh
22:46 imirkin_: SCALE, FACTOR, and CLAMP probably
22:46 karolherbst: they are set to default
22:46 imirkin_: k