01:02imirkin: skeggsb: when trying vdpau with a 4.15 kernel: nouveau 0000:02:00.0: msvld: intr 00000040
01:02imirkin: have you tested it at all?
01:04imirkin: and later when tearing down the channel (due to the hang): [1281335.031035] nouveau: X:00000000:0000a06f: detach msvld failed, -110
01:05imirkin: (has anyone tested vdpau with the 4.15 vmm changes?)
01:07orbea: i did, that is when mplayer crashed my system
01:07orbea: not sure if vdpau was involved
01:07orbea: still need to rebuild with KASAN...
01:55imirkin: ok... looks like 2d image + level != 0 makes my SUQ impl break
01:55imirkin: binding level = 0 fixes it, so to speak
01:57imirkin: i wonder if textureSize() is also broken...
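For reference, a size query on a non-zero level (imageSize() on an image bound at level N, or textureSize() with lod N) should report the level-0 dimensions halved N times and clamped to 1. Mesa has a similar helper (u_minify); the function below is a standalone sketch with a made-up name, not nouveau's SUQ lowering:

```c
#include <assert.h>
#include <limits.h>

/* Size of one dimension at mip level `level`: halved once per level and
 * clamped to 1. `base` is the level-0 size. */
unsigned minify_dim(unsigned base, unsigned level)
{
    assert(base > 0);
    if (level >= sizeof(unsigned) * CHAR_BIT)   /* avoid an undefined shift */
        return 1u;
    unsigned v = base >> level;
    return v ? v : 1u;
}
```

So a 64x16 2D image bound at level 3 should report 8x2, and 2x1 at level 5; a query that ignores the bound level would keep reporting 64x16.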
02:34imirkin: karolherbst: would appreciate a trace of KHR-GL45.shader_image_size.advanced-nonMS-fs-float on maxwell+ as well - want to see what blob does
10:05karolherbst: hakzsam: here is the broken and fixed sched opcode for imad hi x: https://gist.githubusercontent.com/karolherbst/4c3fe88529cf4977d56d0748bf0ec4b8/raw/f68a226b9959ae7691eb37b93c4530b032b7d16d/gistfile1.txt
10:05karolherbst: does this make sense overall?
10:06karolherbst: I mean, looking at it, the current code is clearly wrong, because it doesn't wait until the imad writing into c has finished, just the imad before that
10:07karolherbst: (I took care of the things you pointed out, just wondering if there is something I overlooked here)
10:19karolherbst: hakzsam: ohh, I still need to fix your comment on the second patch
11:03hakzsam: karolherbst: yeah, looks good
11:14karolherbst: hakzsam: okay, nice. I also updated the patches and sent them out today
12:22hakzsam: karolherbst: I don't see the patches on the list
12:23karolherbst: I sent them out like two hours ago
12:23karolherbst: but they aren't on the archive either
12:24karolherbst: they are in my inbox though
12:25karolherbst: ahh, because I am in CC :(
13:16karolherbst: hakzsam: anyway, the two patches are here: https://github.com/karolherbst/mesa/commits/sched_fixes
17:30karolherbst: imirkin: mailing list is kind of broken today, quick review for the patch from yesterday? https://github.com/karolherbst/mesa/commit/1cb6ebc0281842e875c37d771db9f441fbec5b2c.patch then I would just push it out today
17:40imirkin: karolherbst: the patch you just linked is r-by me
17:40imirkin: what was the patch from yesterday?
17:47karolherbst: it was that one
17:47karolherbst: mad.hi -> shladd
17:48imirkin: oh ok
18:22karolherbst: imirkin: so... the nvir OP_MOD instruction, is this more like mod or rem?
18:23karolherbst: rem is like C's % and mod is the mathematical one
18:23imirkin: no computers ever do mathematical
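To make the rem/mod distinction concrete: C's `%` (C99 and later) truncates the quotient toward zero, which is the "rem" flavour, while the floored "mod" only differs when the operands have opposite signs. A small sketch; the function names are made up for illustration and say nothing about OP_MOD beyond the answer above:

```c
#include <assert.h>
#include <limits.h>

/* "rem": C's %, quotient truncated toward zero, so a nonzero result
 * takes the sign of the dividend `a`. */
int rem_trunc(int a, int b)
{
    assert(b != 0 && !(a == INT_MIN && b == -1));   /* both are UB in C */
    return a % b;
}

/* "mod" in the floored (mathematical) sense: a nonzero result takes
 * the sign of the divisor `b`. */
int mod_floor(int a, int b)
{
    int r = rem_trunc(a, b);
    if (r != 0 && ((r < 0) != (b < 0)))
        r += b;   /* opposite signs: shift into the divisor's range */
    return r;
}
```

For the four sign permutations of 7 and 3: (7, 3) gives 1 and 1; (-7, 3) gives -1 and 2; (7, -3) gives 1 and -2; (-7, -3) gives -1 and -1.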
18:24karolherbst: then I found another bug in my nir pass :)
18:24karolherbst: and I might want to enable lower_fmod64/lower_fmod32 then if I map irem -> OP_MOD
18:24karolherbst: ohh wait
18:24karolherbst: fmod is fine
18:24karolherbst: or maybe not
18:25karolherbst: who knows
20:35imirkin: karolherbst: btw, i'm fairly sure that i messed up the OP_MOD implementation when mod'ing by an immediate. basically glsl doesn't specify what happens when you mod by a negative integer. and i only sorta-handled that case. i think there's some permutation of negatives i didn't handle correctly.
20:35imirkin: it should be easy to fix if necessary, just ... requires testing and thinking.
20:35imirkin: since i presume that opencl cares.
20:36imirkin: also, OP_MOD is only integer, not float
20:36imirkin: fmod is just a - (a/b) * b
20:36imirkin: and i've weaseled my way into "fixing" the double mod thing where mod(a,a) == a instead of 0 :)
20:37imirkin: https://github.com/KhronosGroup/VK-GL-CTS/issues/51 -- they agreed with me, to my *vast* surprise
21:05karolherbst: are you sure it is "a - (a/b) * b"?
21:05karolherbst: because no matter how you look at it, it doesn't really make sense
21:06karolherbst: nir writes "src0 - src1 * floorf(src0 / src1)" for fmod
21:08karolherbst: which still seems to have quite a big error
21:10cwabbott: doesn't nouveau convert to float32 to do the division? you could enable the NIR lowering and get better precision
21:10cwabbott: although i think a / a isn't guaranteed to be 1.0 exactly either
21:11cwabbott: i think there's a hack in the nir 64-bit fmod lowering to fix the issue imirkin described
21:13imirkin: karolherbst: yeah, there's a floor in there somewhere ;)
21:14imirkin: cwabbott: you mean for rcp(f64)? nouveau has a library which does a pretty good job (not quite merged). i think it uses the rcp64h instruction which gets you the top 32-bits of the result, and then it's on you to get the lower 32
21:14cwabbott: oh, i see
21:14imirkin: there's a rsq64h as well, but in practice it's not as useful i think
21:14cwabbott: actually, it is
21:14imirkin: the current in-tree implementation just says lower 32 == 0 :)
21:15imirkin: close enough! :)
21:15cwabbott: i'd have a look at the NIR implementation
21:15cwabbott: you basically have to copy it, but use your fancy HW instruction
21:15imirkin: well, we already have the lib
21:16imirkin: and it's all tested etc
21:16imirkin: to be within like 1 ULP of the cpu result
21:16cwabbott: i'm talking about going a step beyond "low 32-bits equals 0"
21:16imirkin: oh yeah
21:16imirkin: we have a lib
21:16imirkin: just not merged yet
21:16cwabbott: oh, you were just talking about the current thing
21:16imirkin: i need to just push it out.
21:16imirkin: yeah. the in-tree is low == 0
21:16imirkin: which seemed like the expedient thing to do when i was bringing up f64
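What "low 32 bits == 0" costs, and how Newton-Raphson recovers it, in a small C sketch. Assumptions: the rcp64h-style result is modelled by computing 1.0 / a and zeroing the low word rather than by a real instruction, the helper names are hypothetical, and this is not the code in imirkin's branch. Keeping only the high word leaves 20 of the 52 stored mantissa bits, so the relative error is below 2^-20; each Newton-Raphson step roughly squares it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Keep only the high 32 bits of a double (sign, exponent, top 20
 * mantissa bits) and zero the low word. */
double keep_high_word(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFFFFFF00000000ull;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* One Newton-Raphson step for 1/a: r' = r * (2 - a * r). */
double rcp_newton_step(double a, double r)
{
    return r * (2.0 - a * r);
}

/* Hypothetical refinement: high-word approximation plus two steps. */
double rcp64_sketch(double a)
{
    assert(a != 0.0);
    double r = keep_high_word(1.0 / a);
    r = rcp_newton_step(a, r);
    r = rcp_newton_step(a, r);
    return r;
}
```

Two steps are enough here: the error goes from about 1e-6 to about 1e-12, then to the limit of double rounding.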
21:16cwabbott: why isn't rsq64h helpful?
21:16imirkin: just the way the math works out
21:17imirkin: it doesn't really improve over messing with the exponent
21:17imirkin: and using a plain rsq on a 32-bit thing + some newton-raphson steps
21:17cwabbott: oh, i thought it would
21:17cwabbott: you have to do all the range reduction, etc.
21:17imirkin: we looked at what blob did... it didn't use it iirc
21:18imirkin: but with rcp64h it still makes sense
21:18cwabbott:spent way too much time on this at intel
21:18imirkin: the algo is here:
21:18imirkin: (hold on, have to find it)
21:18imirkin: rcp: https://github.com/imirkin/mesa/commit/1da4f35a0789f223d2bc3498fe06608a6dd2b387
21:18imirkin: rsq: https://github.com/imirkin/mesa/commit/1a8ea0024ad861f6e6ba939de5385c3338b3cb28
21:19imirkin: ok - got it backwards. rcp64h isn't useful
21:19karolherbst: imirkin: uhh by the way, how does fp64 work in general? On those high-end compute cards it is supposed to be much faster, but I have no idea how nvidia enables/disables that. Any ideas?
21:19imirkin: but rsq64h is.
21:20imirkin: karolherbst: well ... remember that the way we look at programs is all well and good, but that's not actually how things execute
21:20imirkin: karolherbst: it's some giant AVX512-style thing
21:20imirkin: so it's just a question of how long the ops take, there are internal resources, etc
21:21karolherbst: okay, sure
21:23karolherbst: but I am wondering, is the hardware actually different, so that there aren't enough blocks for fp64 stuff, or is it just sw controlled?
21:24karolherbst: because why would nvidia even produce two different kinds of GK110 chips, for example
21:24imirkin: no real idea. i assume just internal hw resources.
21:25karolherbst: well at some point we will figure that out
21:25imirkin: i think e.g. architecturally maxwell has fewer f64 resources than kepler or something
21:25karolherbst: yeah, well, right. But that's not what I meant
21:26imirkin: but within a gen, no reason there can't be variety
21:26imirkin: as long as the same "API" is presented to the executing code
21:26karolherbst: I was talking about GK110 with fp64 perf vs GK110 without
21:26imirkin: yeah, tbh i've never heard of that
21:26imirkin: but i don't see why that'd be impossible.
21:27imirkin: keep in mind that my state of the art gpu is a GK208, so ... i don't delve in high-end that often ;)
21:27imirkin: (technically GM107? not sure which is better.)
21:27imirkin: i wonder if my GT215 with GDDR5 would beat both of them if it reclocked
21:27karolherbst: I am just wondering why there is always this rumour about nvidia fearing people will turn their cards into quadros or whatever, which never struck me as giving you any benefit anyway
21:28karolherbst: besides some cool fancy sw features or so
21:28imirkin: so that's a really old thing
21:28imirkin: there's a method
21:28imirkin: which introduces a stall on regular gpu's, but not on quadro's
21:28imirkin: or something equally silly
21:28karolherbst: well a GT215 isn't that fast either
21:29imirkin: right, but ... GDDR5 is fast
21:29karolherbst: in raw compute power the GM107 should be 4x as fast
21:29imirkin: and memory is like 50% of the battle
21:29imirkin: and GT215 is actually pretty solid too. it's not as big as the G200, but still big.
21:30karolherbst: depends on which GT215
21:30karolherbst: I compared slowest GT215 vs slowest GM107
21:30imirkin: GT215 with GDDR5 vram :p there's like only one.
21:30imirkin: (and i have it)
21:30karolherbst: GT 340?
21:30imirkin: GT 240
21:30karolherbst: see, so there are two ;)
21:30imirkin: do you have a GT 340?
21:31imirkin: i'll believe it when i see it :p
21:31karolherbst: it is an OEM card
21:31karolherbst: and a rebranded 240 :D
21:31imirkin: and it shipped with GDDR5?
21:31karolherbst: yeah, all 340 are GDDR5
21:32karolherbst: mhh but GDDR5 on GT215 wasn't exactly fast either
21:32karolherbst: ~2x as fast as DDR3
21:32imirkin: a lot faster than ddr3
21:32karolherbst: where the normal difference is more like ~3x to ~4x on newer gens
21:32imirkin: but the gm107 is ddr3
21:32imirkin: it's a GTX 745
21:34karolherbst: GM107: 28.8 GB/s mem, 800 GFLOPS core vs GT215: 54.4 GB/s mem, 250 GFLOPS core
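Those peak figures (GM107: 800 GFLOPS, 28.8 GB/s; GT215: 250 GFLOPS, 54.4 GB/s) drop straight into a simple roofline model, a standard back-of-envelope estimate rather than anything measured here: attainable throughput is the lower of peak compute and bandwidth times arithmetic intensity (FLOPs per byte of memory traffic). Function names are hypothetical:

```c
#include <assert.h>

/* Attainable GFLOP/s for a kernel with the given arithmetic intensity:
 * capped by peak compute or by memory bandwidth * intensity. */
double roofline_gflops(double peak_gflops, double bw_gbps, double flops_per_byte)
{
    assert(peak_gflops > 0.0 && bw_gbps > 0.0 && flops_per_byte >= 0.0);
    double mem_bound = bw_gbps * flops_per_byte;
    return mem_bound < peak_gflops ? mem_bound : peak_gflops;
}

/* Intensity above which a chip stops being memory bound. */
double ridge_point(double peak_gflops, double bw_gbps)
{
    assert(bw_gbps > 0.0);
    return peak_gflops / bw_gbps;
}
```

With these peaks the GM107 is memory bound below about 27.8 FLOP/byte and the GT215 below about 4.6, and the GT215 comes out ahead for anything under roughly 8.7 FLOP/byte (250 / 28.8), which is the sense in which memory can be half the battle.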
21:35imirkin: the GTX 745 performed surprisingly well in valley, despite all the render fail
21:35imirkin: (or maybe as a result of? who knows)
21:35karolherbst: faster than the GT215? :p
21:35karolherbst: ohh maybe I could look into that
21:35imirkin: i dunno, been so long since i plugged it in
21:35karolherbst: I see
21:35imirkin: and ... no reclocking
21:35imirkin: i got it because it insta-hung on nouveau
21:35imirkin: tried to fix it for a few days and then gave up
21:36imirkin: then ben fixed it after like 5 minutes of looking at it.
21:38karolherbst: which issue do you mean in valley exactly?
21:38karolherbst: those random blue trees flickering or something else?
21:38karolherbst: because I think there was something else
21:39imirkin: yes, the random blue trees, and general geometry fail every so often
21:39imirkin: iirc the fail used to look slightly different, dunno
21:39karolherbst: that also happens with heaven
21:39imirkin: the blue stuff with msaa x8?
21:40imirkin: i thought i fixed that
21:40karolherbst: trees flickering
21:40karolherbst: it is super rare
21:40karolherbst: opt level changes the issues
21:41imirkin: and iirc MESA_DEBUG=flush fixes them
21:41imirkin: so to speak :)
21:41karolherbst: ohh wait, I just used my system install
21:41imirkin: (remember you need a debug build to use MESA_DEBUG)
21:44karolherbst: mhh, that indeed fixes it
21:45imirkin: good luck =]
21:46imirkin: issues happen in xonotic too
21:46karolherbst: imirkin: https://i.imgur.com/fDnNDN2.jpg
21:46karolherbst: in heaven
21:46imirkin: even with flush?
21:47karolherbst: didn't try with
21:47karolherbst: looks fine with flush
21:47imirkin: the msaa x8 bug of yore looked like a really cool effect
21:47imirkin: and was visible everywhere
21:48karolherbst: we should have made that into an extension.....
21:49karolherbst:is hoping that intel will fix that buf sync issue at some point...
21:51karolherbst: even some games running at 55 fps or so through DRI_PRIME look like they are running at 10 fps or so
21:51imirkin: send patches
21:51karolherbst: disable the sync?....
21:51karolherbst: I think they added some syncing stuff to really kill tearing for that kind of stuff
21:52karolherbst: but it has some weird side effects
21:52karolherbst: like if nouveau gets stuck, the entire screen could just freeze :)
21:52imirkin: you add sync's to wait on things
21:52imirkin: but then the problem is that you wait on things :)
21:52imirkin: can't have it both ways, sadly
21:53karolherbst: well, you get notified or something
21:53imirkin: yeah dunno
21:53imirkin: outside my domain.
21:53karolherbst: and just draw the entire screen until the dedicated GPU finished drawing
21:53karolherbst: and then update the content
21:54imirkin: i can only keep so many things in my head, and this stuff doesn't make the cut
21:54karolherbst: which might add a latency of (1/Hz) s in the worst case
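The worst case in that last message is one refresh interval: if the integrated GPU keeps scanning out the previous frame and only picks up the dedicated GPU's finished frame at the next refresh, a frame can wait up to 1/Hz seconds. As arithmetic (the helper name is hypothetical):

```c
#include <assert.h>

/* Worst-case extra latency in milliseconds when a finished frame has to
 * wait for the next refresh: one full refresh interval, 1/Hz seconds. */
double worst_case_latency_ms(double refresh_hz)
{
    assert(refresh_hz > 0.0);
    return 1000.0 / refresh_hz;
}
```

At 60 Hz that is about 16.7 ms; at 144 Hz about 6.9 ms.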
22:27imirkin: ok, i think i see what's going on with that advancedNonMS thing. it's not nouveau's fault.
22:29imirkin: it's the universe's fault :)
22:34karolherbst: isn't everything?
22:41imirkin: some combination of funkiness in st_finalize_texture and image conversion... looking at it now
22:42imirkin: OH MAN
22:42imirkin: this is the worst.
22:43imirkin: the min filter doesn't include mipmaps. so the finalize only takes the first level into account.
22:43imirkin: THE WORST
22:58imirkin: on the bright side, i think that explains everything
23:01imirkin: and _mesa_is_texture_complete is fundamentally wrong too
23:01imirkin: or rather, *using it* is wrong
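A toy model of the trap: when the min filter is not a mipmap filter, a sampling-oriented completeness check only has to look at the base level, so an image bound to a higher level never gets validated. The GL enum values are the standard ones; the helper names are hypothetical and this is not Mesa's actual st_finalize_texture / _mesa_is_texture_complete logic:

```c
#include <assert.h>
#include <stdbool.h>

/* Standard GL minification filter enums. */
#define GL_NEAREST                0x2600
#define GL_LINEAR                 0x2601
#define GL_NEAREST_MIPMAP_NEAREST 0x2700
#define GL_LINEAR_MIPMAP_NEAREST  0x2701
#define GL_NEAREST_MIPMAP_LINEAR  0x2702
#define GL_LINEAR_MIPMAP_LINEAR   0x2703

static bool min_filter_uses_mipmaps(unsigned min_filter)
{
    return min_filter == GL_NEAREST_MIPMAP_NEAREST ||
           min_filter == GL_LINEAR_MIPMAP_NEAREST ||
           min_filter == GL_NEAREST_MIPMAP_LINEAR ||
           min_filter == GL_LINEAR_MIPMAP_LINEAR;
}

/* Last level a sampling-based check considers: without a mipmap
 * min filter, only the base level counts. */
unsigned last_level_checked_for_sampling(unsigned min_filter,
                                         unsigned base_level, unsigned max_level)
{
    assert(max_level >= base_level);
    return min_filter_uses_mipmaps(min_filter) ? max_level : base_level;
}

/* An image unit bound to `image_level` needs that level validated
 * regardless of sampler state; relying on the sampling check misses it. */
bool image_level_validated(unsigned min_filter, unsigned base_level,
                           unsigned max_level, unsigned image_level)
{
    return image_level >= base_level &&
           image_level <= last_level_checked_for_sampling(min_filter,
                                                          base_level, max_level);
}
```

With GL_LINEAR and levels 0..5, only level 0 is covered, so an image bound at level 2 slips through; with a mipmap filter it is covered.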