01:02 imirkin: skeggsb: when trying vdpau with a 4.15 kernel: nouveau 0000:02:00.0: msvld: intr 00000040
01:02 imirkin: have you tested it at all?
01:04 imirkin: and later when tearing down the channel (due to the hang): [1281335.031035] nouveau: X[18618]:00000000:0000a06f: detach msvld failed, -110
01:05 imirkin: (has anyone tested vdpau with the 4.15 vmm changes?)
01:07 orbea: i did, that is when mplayer crashed my system
01:07 orbea: not sure if vdpau was involved
01:07 orbea: still need to rebuild with KASAN...
01:08 imirkin: boo.
01:55 imirkin: ok... looks like 2d image + level != 0 makes my SUQ impl break
01:55 imirkin: binding level = 0 fixes it, so to speak
01:57 imirkin: i wonder if textureSize() is also broken...
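For reference, GL sizes each dimension of mip level L as max(1, floor(base / 2^L)). A size query on a 2D image bound at a level other than 0 should therefore report that level's dimensions, not level 0's. Below is a minimal C sketch of the rule. `mip_dim` is a hypothetical helper, not nouveau's SUQ lowering:

```c
#include <assert.h>

/* One dimension of mip level `level`, following the GL rule
 * max(1, floor(base / 2^level)). Hypothetical helper. */
static unsigned mip_dim(unsigned base, unsigned level)
{
    assert(base > 0);
    if (level >= 32)            /* avoid an undefined shift; result is 1 anyway */
        return 1;
    unsigned d = base >> level; /* floor(base / 2^level) */
    return d ? d : 1;           /* never smaller than 1 texel */
}
```

For example, a 100x60 image bound at level 3 should report 12x7.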
02:34 imirkin: karolherbst: would appreciate a trace of KHR-GL45.shader_image_size.advanced-nonMS-fs-float on maxwell+ as well - want to see what blob does
10:05 karolherbst: hakzsam: here is the broken and fixed sched opcode for imad hi x: https://gist.githubusercontent.com/karolherbst/4c3fe88529cf4977d56d0748bf0ec4b8/raw/f68a226b9959ae7691eb37b93c4530b032b7d16d/gistfile1.txt
10:05 karolherbst: does this make sense overall?
10:06 karolherbst: I mean, looking at it, the current code is clearly wrong, because it doesn't wait until the imad writing into c has finished, just the imad before that
10:07 karolherbst: (I took care of the things you pointed out, just wondering if there is something I overlooked here)
10:19 karolherbst: hakzsam: ohh, I still need to fix your comment on the second patch
11:03 hakzsam: karolherbst: yeah, looks good
11:14 karolherbst: hakzsam: okay, nice. I also updated the patches and sent them out today
12:22 hakzsam: karolherbst: I don't see the patches on the list
12:22 karolherbst: hum
12:23 karolherbst: I sent them out like two hours ago
12:23 karolherbst: but they aren't on the archive either
12:24 hakzsam: yep
12:24 karolherbst: they are in my inbox though
12:25 karolherbst: ahh, because I am in CC :(
12:25 karolherbst: annoying
13:16 karolherbst: hakzsam: anyway, the two patches are here: https://github.com/karolherbst/mesa/commits/sched_fixes
17:30 karolherbst: imirkin: mailing list is kind of broken today, quick review for the patch from yesterday? https://github.com/karolherbst/mesa/commit/1cb6ebc0281842e875c37d771db9f441fbec5b2c.patch then I would just push it out today
17:40 imirkin: karolherbst: the patch you just linked is r-by me
17:40 imirkin: what was the patch from yesterday?
17:46 karolherbst: yeah
17:47 karolherbst: it was that one
17:47 karolherbst: mad.hi -> shladd
17:48 imirkin: oh ok
18:22 karolherbst: imirkin: so... the nvir OP_MOD instruction, is this more like mod or rem?
18:23 karolherbst: rem is like C's % and mod is mathematical
18:23 imirkin: rem
18:23 imirkin: no computers ever do mathematical
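To make the distinction concrete: since C99, `%` truncates toward zero, so the remainder takes the sign of the dividend. The mathematical (floored) modulo takes the sign of the divisor instead. Below is a small C sketch with hypothetical helper names. The two only differ when the operand signs differ:

```c
#include <assert.h>

/* "rem": C99 % truncates toward zero, so the result has the sign of a.
 * (INT_MIN % -1 is undefined in C and is not handled here.) */
static int rem_trunc(int a, int b)
{
    assert(b != 0);
    return a % b;
}

/* "mod": floored modulo, so the result has the sign of b. */
static int mod_floor(int a, int b)
{
    assert(b != 0);
    int r = a % b;
    if (r != 0 && ((r < 0) != (b < 0)))  /* signs disagree: shift into b's range */
        r += b;
    return r;
}
```

For (a, b) = (-7, 3) this gives rem -1 but mod 2, and for (7, -3) it gives rem 1 but mod -2. When the signs agree, the two functions return the same value.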
18:23 karolherbst: okay
18:24 karolherbst: then I found another bug in my nir pass :)
18:24 karolherbst: and I might want to enable lower_fmod64/lower_fmod32 then if I map irem -> OP_MOD
18:24 karolherbst: ohh wait
18:24 karolherbst: fmod is fine
18:24 karolherbst: or maybe not
18:25 karolherbst: who knows
20:35 imirkin: karolherbst: btw, i'm fairly sure that i messed up the OP_MOD implementation when mod'ing by an immediate. basically glsl doesn't specify what happens when you mod by a negative integer. and i only sorta-handled that case. i think there's some permutation of negatives i didn't handle correctly.
20:35 karolherbst: mhh
20:35 imirkin: it should be easy to fix if necessary, just ... requires testing and thinking.
20:35 imirkin: since i presume that opencl cares.
20:36 imirkin: also, OP_MOD is only integer, not float
20:36 imirkin: fmod is just a - (a/b) * b
20:36 imirkin: and i've weaseled my way into "fixing" the double mod thing where mod(a,a) == a instead of 0 :)
20:37 imirkin: https://github.com/KhronosGroup/VK-GL-CTS/issues/51 -- they agreed with me, to my *vast* surprise
21:03 karolherbst: mhh
21:05 karolherbst: sure it is "a - (a/b) * b"?
21:05 karolherbst: because no matter how you look at it, it doesn't really make sense
21:06 karolherbst: nir writes "src0 - src1 * floorf(src0 / src1)" for fmod
21:08 karolherbst: which still seems to have a quite big error
21:10 cwabbott: doesn't nouveau convert to float32 to do the division? you could enable the NIR lowering and get better precision
21:10 cwabbott: although i think a / a isn't guaranteed to be 1.0 exactly either
21:11 karolherbst: ...
21:11 cwabbott: i think there's a hack in the nir 64-bit fmod lowering to fix the issue imirkin described
21:13 imirkin: karolherbst: yeah, there's a floor in there somewhere ;)
21:14 imirkin: cwabbott: you mean for rcp(f64)? nouveau has a library which does a pretty good job (not quite merged). i think it uses the rcp64h instruction which gets you the top 32-bits of the result, and then it's on you to get the lower 32
21:14 cwabbott: oh, i see
21:14 imirkin: there's a rsq64h as well, but in practice it's not as useful i think
21:14 cwabbott: actually, it is
21:14 imirkin: the current in-tree implementation just says lower 32 == 0 :)
21:15 imirkin: close enough! :)
21:15 cwabbott: i'd have a look at the NIR implementation
21:15 cwabbott: you basically have to copy it, but use your fancy HW instruction
21:15 imirkin: well, we already have the lib
21:16 imirkin: and it's all tested etc
21:16 imirkin: to be within like 1 ULP of the cpu result
21:16 cwabbott: i'm talking about going a step beyond "low 32-bits equals 0"
21:16 imirkin: oh yeah
21:16 imirkin: we have a lib
21:16 imirkin: just not merged yet
21:16 cwabbott: oh, you were just talking about the current thing
21:16 imirkin: i need to just push it out.
21:16 imirkin: yeah. the in-tree is low == 0
21:16 imirkin: which seemed like the expedient thing to do when i was bringing up f64
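To make "low 32 bits == 0" concrete: keeping only the high word of a double leaves the sign, the exponent and the top 20 mantissa bits, so roughly 20 correct bits. Each Newton-Raphson step x' = x * (2 - d * x) roughly doubles the number of correct bits.

The sketch below emulates a high-word-only result by masking a correctly rounded 1.0 / d. This masking is an assumption standing in for the hardware instruction. The sketch is not the in-tree code and not the unmerged library:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Keep the high 32 bits (sign, exponent, top 20 mantissa bits) and zero
 * the low word, standing in for a HW op that only returns the high half. */
static double zero_low_word(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFFFFFF00000000ull;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Newton-Raphson for 1/d: x' = x * (2 - d * x); the error roughly squares. */
static double rcp64_sketch(double d, int steps)
{
    double x = zero_low_word(1.0 / d);  /* "low 32 == 0" seed */
    for (int i = 0; i < steps; i++)
        x = x * (2.0 - d * x);
    return x;
}
```

For d = 3.0 the masked seed is off by about 2.4e-7 relative. One step brings that down to about 6e-14, and a second step brings it to within rounding of 1/3.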
21:16 cwabbott: why isn't rsq64h helpful?
21:16 imirkin: just the way the math works out
21:17 cwabbott: huh?
21:17 imirkin: it doesn't really improve over messing with the exponent
21:17 imirkin: and using a plain rsq on a 32-bit thing + some newton-raphson steps
21:17 cwabbott: oh, i thought it would
21:17 cwabbott: you have to do all the range reduction, etc.
21:17 imirkin: we looked at what blob did... it didn't use it iirc
21:17 cwabbott: interesting
21:18 imirkin: but with rcp64h it still makes sense
21:18 cwabbott: spent way too much time on this at intel
21:18 imirkin: the algo is here:
21:18 imirkin: (hold on, have to find it)
21:18 imirkin: rcp: https://github.com/imirkin/mesa/commit/1da4f35a0789f223d2bc3498fe06608a6dd2b387
21:18 imirkin: rsq: https://github.com/imirkin/mesa/commit/1a8ea0024ad861f6e6ba939de5385c3338b3cb28
21:19 imirkin: ok - got it backwards. rcp64h isn't useful
21:19 karolherbst: imirkin: uhh, by the way, how does fp64 work in general? Because on those high-end compute cards it is supposed to be much faster, but I have no idea how nvidia enables/disables that. Any ideas?
21:19 imirkin: but rsq64h is.
21:20 imirkin: karolherbst: well ... remember that the way we look at programs is all well and good, but that's not actually how things execute
21:20 imirkin: karolherbst: it's some giant AVX512-style thing
21:20 imirkin: so it's just a question of how long the ops take, there are internal resources, etc
21:21 karolherbst: okay, sure
21:23 karolherbst: but I'm wondering: is the hardware actually different, so that there aren't enough blocks for fp64 stuff, or is it just sw-controlled?
21:24 karolherbst: because why would nvidia even produce two different kinds of GK110 chips, for example
21:24 imirkin: no real idea. i assume just internal hw resources.
21:24 karolherbst: mhh
21:25 karolherbst: well at some point we will figure that out
21:25 imirkin: i think e.g. architecturally maxwell has fewer f64 resources than kepler or something
21:25 karolherbst: yeah, well, right. But that's not what I meant
21:26 imirkin: but within a gen, no reason there can't be variety
21:26 imirkin: as long as the same "API" is presented to the executing code
21:26 karolherbst: I was talking about GK110 with fp64 perf vs GK110 without
21:26 imirkin: yeah, tbh i've never heard of that
21:26 imirkin: but i don't see why that'd be impossible.
21:27 imirkin: keep in mind that my state of the art gpu is a GK208, so ... i don't delve in high-end that often ;)
21:27 karolherbst: :D
21:27 karolherbst: right
21:27 imirkin: (technically GM107? not sure which is better.)
21:27 imirkin: i wonder if my GT215 with GDDR5 would beat both of them if it reclocked
21:27 karolherbst: I am just wondering why there is always this rumour about nvidia fearing people will turn their cards into quadros or whatever, which never struck me as giving you any benefit anyway
21:28 karolherbst: besides some cool fancy sw features or so
21:28 imirkin: so that's a really old thing
21:28 imirkin: there's a method
21:28 imirkin: which introduces a stall on regular gpu's, but not on quadro's
21:28 imirkin: or something equally silly
21:28 karolherbst: well a GT215 isn't that fast either
21:29 imirkin: right, but ... GDDR5 is fast
21:29 karolherbst: raw compute power GM107 should be 4x as fast
21:29 imirkin: and memory is like 50% of the battle
21:29 imirkin: and GT215 is actually pretty solid too. it's not as big as the G200, but still big.
21:30 karolherbst: depends on what GT215
21:30 karolherbst: I compared slowest GT215 vs slowest GM107
21:30 imirkin: GT215 with GDDR5 vram :p there's like only one.
21:30 imirkin: (and i have it)
21:30 karolherbst: GT 340?
21:30 imirkin: GT 240
21:30 karolherbst: see, so there are two ;)
21:30 imirkin: do you have a GT 340?
21:31 karolherbst: no
21:31 imirkin: i'll believe it when i see it :p
21:31 karolherbst: it is a OEM card
21:31 karolherbst: and a rebranded 240 :D
21:31 imirkin: and it shipped with GDDR5?
21:31 karolherbst: yeah, all 340 are GDDR5
21:31 imirkin: k
21:32 karolherbst: mhh but GDDR5 on GT215 wasn't exactly fast either
21:32 karolherbst: ~2x as fast as DDR3
21:32 imirkin: a lot faster than ddr3
21:32 karolherbst: where the normal difference is more like ~3x to ~4x on newer gens
21:32 imirkin: but the gm107 is ddr3
21:32 imirkin: it's a GTX 745
21:34 karolherbst: GM107: 28.8 GB/s mem, 800 GFLOPS core vs GT215: 54.4 GB/s mem, 250 GFLOPS core
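Those bandwidth figures follow from bus width times effective data rate. The quick check below assumes both cards use a 128-bit bus, with 1800 MT/s DDR3 on the GTX 745 and 3400 MT/s GDDR5 on the GT 240. These bus and clock figures come from public spec sheets, not from this discussion, and the helper name is hypothetical:

```c
#include <assert.h>

/* Peak bandwidth in GB/s (10^9 bytes/s):
 * bus width in bytes * effective transfers per second (in MT/s). */
static double peak_bw_gbs(unsigned bus_bits, double mtps)
{
    assert(bus_bits % 8 == 0);
    return (bus_bits / 8) * mtps / 1000.0;  /* MB/s -> GB/s */
}
```

peak_bw_gbs(128, 1800.0) gives 28.8 and peak_bw_gbs(128, 3400.0) gives 54.4. That is a ratio of about 1.9x, in line with the "~2x" estimate earlier.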
21:34 karolherbst: mhh
21:35 imirkin: the GTX 745 performed surprisingly well in valley, despite all the render fail
21:35 imirkin: (or maybe as a result of? who knows)
21:35 karolherbst: faster than the GT215? :p
21:35 karolherbst: ohh maybe I could look into that
21:35 imirkin: i dunno, been so long since i plugged it in
21:35 karolherbst: :D
21:35 karolherbst: I see
21:35 imirkin: and ... no reclocking
21:35 karolherbst: right
21:35 imirkin: i got it because it insta-hung on nouveau
21:35 imirkin: tried to fix it for a few days and then gave up
21:35 karolherbst: ah
21:36 imirkin: then ben fixed it after like 5 minutes of looking at it.
21:38 karolherbst: mhh
21:38 karolherbst: which issue you mean in valley exactly?
21:38 karolherbst: those random blue trees flickering or something else?
21:38 karolherbst: because I think there was something else
21:39 imirkin: yes, the random blue trees, and general geometry fail every so often
21:39 karolherbst: ahh
21:39 imirkin: iirc the fail used to look slightly different, dunno
21:39 karolherbst: that also happens with heaven
21:39 imirkin: the blue stuff with msaa x8?
21:40 imirkin: i thought i fixed that
21:40 karolherbst: no
21:40 karolherbst: trees flickering
21:40 karolherbst: it is super rare
21:40 karolherbst: mhh
21:40 karolherbst: interesting
21:40 karolherbst: opt level changes the issues
21:41 imirkin: and iirc MESA_DEBUG=flush fixes them
21:41 imirkin: so to speak :)
21:41 karolherbst: ohh wait, I just used my system install
21:41 karolherbst: mhh
21:41 imirkin: (remember you need a debug build to use MESA_DEBUG)
21:42 karolherbst: yeah
21:44 karolherbst: mhh, that indeed fixes it
21:45 imirkin: good luck =]
21:46 imirkin: issues happen in xonotic too
21:46 karolherbst: imirkin: https://i.imgur.com/fDnNDN2.jpg
21:46 karolherbst: in heaven
21:46 imirkin: even with flush?
21:47 karolherbst: didn't try with
21:47 karolherbst: looks fine with flush
21:47 imirkin: the msaa x8 bug of yore looked like a really cool effect
21:47 imirkin: and was visible everywhere
21:48 karolherbst: right
21:48 karolherbst: we should have made that into an extension.....
21:49 karolherbst: is hoping that intel will fix that buf sync issue at some point...
21:50 imirkin: hehe
21:50 karolherbst: seriously
21:51 karolherbst: even games running at 55 fps or something through DRI_PRIME look like they're running at 10 fos or so
21:51 imirkin: send patches
21:51 karolherbst: *fps
21:51 karolherbst: disable the sync?....
21:51 karolherbst: I think they added some syncing stuff to really kill tearing for that kind of stuff
21:52 karolherbst: but it has some weird side effects
21:52 karolherbst: like if nouveau gets stuck, the entire screen could just freeze :)
21:52 imirkin: yeah
21:52 imirkin: you add sync's to wait on things
21:52 imirkin: but then the problem is that you wait on things :)
21:52 imirkin: can't have it both ways, sadly
21:53 karolherbst: well, you get notified or something
21:53 imirkin: yeah dunno
21:53 imirkin: outside my domain.
21:53 karolherbst: and just draw the entire screen until the dedicated GPU finished drawing
21:53 karolherbst: and then update the content
21:54 imirkin: i can only keep so many things in my head, and this stuff doesn't make the cut
21:54 karolherbst: which might add a latency of (1/Hz) s in the worst case
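Spelled out, and assuming the compositor only picks up the new frame at the next refresh, the worst case is waiting one full refresh period:

```latex
t_{\max} = \frac{1}{f_{\mathrm{refresh}}}, \qquad
f = 60\,\mathrm{Hz} \Rightarrow t_{\max} \approx 16.7\,\mathrm{ms}, \qquad
f = 144\,\mathrm{Hz} \Rightarrow t_{\max} \approx 6.9\,\mathrm{ms}
```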
22:27 imirkin: ok, i think i see what's going on with that advancedNonMS thing. it's not nouveau's fault.
22:29 imirkin: it's the universe's fault :)
22:34 karolherbst: :D
22:34 karolherbst: isn't everything?
22:41 imirkin: some combination of funkiness in st_finalize_texture and image conversion... looking at it now
22:42 imirkin: OH MAN
22:42 imirkin: this is the worst.
22:43 imirkin: the min filter doesn't include mipmaps. so the finalize only takes the first level into account.
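In GL terms: with a non-mipmap min filter (GL_NEAREST or GL_LINEAR), texture completeness only considers the base level. A finalize step driven by sampler state therefore never looks at level 1, even when an image unit is bound there (the level != 0 case from 01:55).

Below is a simplified C model of that rule. The enum values are the standard GL ones. The helper names are hypothetical, and this is not Mesa's st_finalize_texture() or _mesa_is_texture_complete():

```c
#include <assert.h>
#include <stdbool.h>

/* GL min filter enum values, as defined in the GL headers. */
#define GL_NEAREST                0x2600
#define GL_LINEAR                 0x2601
#define GL_NEAREST_MIPMAP_NEAREST 0x2700
#define GL_LINEAR_MIPMAP_NEAREST  0x2701
#define GL_NEAREST_MIPMAP_LINEAR  0x2702
#define GL_LINEAR_MIPMAP_LINEAR   0x2703

static bool min_filter_uses_mipmaps(unsigned filter)
{
    switch (filter) {
    case GL_NEAREST_MIPMAP_NEAREST:
    case GL_LINEAR_MIPMAP_NEAREST:
    case GL_NEAREST_MIPMAP_LINEAR:
    case GL_LINEAR_MIPMAP_LINEAR:
        return true;
    default:  /* GL_NEAREST, GL_LINEAR */
        return false;
    }
}

/* Last level a sampler-based completeness check would consider.
 * Simplified: ignores clamping max_level to the levels actually present. */
static unsigned last_level_checked(unsigned min_filter,
                                   unsigned base_level, unsigned max_level)
{
    assert(max_level >= base_level);
    return min_filter_uses_mipmaps(min_filter) ? max_level : base_level;
}
```

With GL_LINEAR and base level 0, only level 0 is considered. An image bound at level 1 falls outside what such a check validates.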
22:43 imirkin: THE WORST
22:55 karolherbst: :O
22:58 imirkin: on the bright side, i think that explains everything
23:01 imirkin: and _mesa_is_texture_complete is fundamentally wrong too
23:01 imirkin: urrrgh
23:01 imirkin: or rather, *using it* is wrong