00:58karolherbst: imirkin: I posted a fix for constant folding of 64-bit int muls/mads, mind taking a quick look? It is a small patch
00:58karolherbst: otherwise I just push it tomorrow or something
00:59imirkin: just push
00:59imirkin: it seemed fine
00:59imirkin: but i didn't look at the surrounding code
00:59imirkin: to actually make sure of that
01:00karolherbst: you don't know what uses 64-bit int maths though, right? :/
01:00karolherbst: kind of silly that we have no shader_test files inside shader-db
01:01karolherbst: maybe I should write a piglit test for that though
01:04karolherbst: mhh, there are basically just shift tests
01:08imirkin: i don't think you're looking right.
01:08imirkin: has plenty of op-mult tests
01:08karolherbst: ohhh, I didn't check the generated tests :/
01:09karolherbst: mhh, right, okay. But that won't hit the ConstantFolding path
01:09karolherbst: or well, those tests
01:09imirkin: ok. but there are multiplication tests ;)
01:10imirkin: you need a special kind of multiplication
01:10imirkin: the kind that glsl can't const-fold
01:10imirkin: but the kind that nv50_ir can.
01:10karolherbst: I triggered that through CL :/
01:11karolherbst: imirkin: but why would glsl constfold an a * 0x100 multiplication or something?
01:11imirkin: why would it?
01:12imirkin: it would not.
01:12imirkin: right. this is the "shift conversion" thing
01:12imirkin: which worked before, but was extended to 64-bit, but apparently not quite.
01:14karolherbst: I think before the faulty patch it was also triggered for 64-bit
01:14karolherbst: but, the code was different
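[Editor's note] The "shift conversion" discussed above rewrites a multiplication by a power-of-two immediate (such as a * 0x100) into a left shift. The C sketch below only illustrates that idea for 64-bit operands; it is not the nv50_ir code, and the helper names pow2_shift_u64 and mul_imm_u64 are hypothetical. The point it tries to show is that the power-of-two test and the shift amount have to be computed on the full 64-bit immediate.

```c
#include <stdint.h>

/* Hypothetical helper: returns log2(imm) if imm is a power of two, else -1.
 * The check runs on the full 64-bit value, so immediates above bit 31
 * are handled as well. */
static int pow2_shift_u64(uint64_t imm)
{
    if (imm == 0 || (imm & (imm - 1)) != 0)
        return -1;

    int shift = 0;
    while ((imm >> shift) != 1)
        shift++;
    return shift;
}

/* Hypothetical helper: multiplies by an immediate, using a shift when the
 * immediate is a power of two and a plain multiply otherwise. For unsigned
 * 64-bit values both forms wrap modulo 2^64 and give the same result. */
static uint64_t mul_imm_u64(uint64_t a, uint64_t imm)
{
    int shift = pow2_shift_u64(imm);
    return shift >= 0 ? a << shift : a * imm;
}
```

With these helpers, mul_imm_u64(a, 0x100) takes the shift path with a shift of 8, while an immediate like 3 falls back to the ordinary multiply.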
06:29pabs3: is tearing while watching video fullscreen in totem + gnome-shell + Xorg something that is normal with nouveau drivers?
17:51karolherbst: imirkin: can we actually have more than 16 const buffers on maxwell? Some people say that we should have 18 in total
18:11imirkin: should be easy to test
18:11imirkin: just bump the driver constbuf up to 16
18:17karolherbst: imirkin: can we actually expose 16 ubos?
18:18karolherbst: wondering what the actual benefits would be here
18:18imirkin: if we have 18 constbufs, sure
18:18karolherbst: how many does opengl mandate?
18:18imirkin: or 13, i forget
19:02karolherbst: imirkin: btw, would you mind if we change the emitter for NEG/ABS to use some random ALU op instead of f2f and i2i?
19:03karolherbst: or would you prefer having this inside peephole?
19:06karolherbst: mhh, although I would expect that i2i and f2f are slow on every ISA and that we could just use iadd/fadd instead and have some post everything pass to do that for us
19:06karolherbst: like right before the last DCE
19:08karolherbst: (and we should fix that up inside the idiv as well)
19:13karolherbst: pendingchaos_: weren't you looking into that at some point? i2i/f2f ->iadd/fadd?
19:21karolherbst: pendingchaos: did you come up with anything?
19:22pendingchaos: I decided to do it in LateAlgebraicOpt for some reason
19:22pendingchaos: I also have some unsent patches for it
19:22pendingchaos: might be a few issues remaining with them though?
19:22karolherbst: I don't really know if we really want that to be there though
19:23karolherbst: or maybe...
19:27karolherbst: having LoadPropagation makes kind of sense
19:37imirkin: karolherbst: should not be in the emitter. that'll mess up the opclass/etc logic (maybe)
19:37karolherbst: ahh, true
19:38karolherbst: also the sched stuff I guess
19:38karolherbst: I think LateAlgebraicOpts might actually be fine, we just need to be careful that nothing "optimizes" it back to NEG/ABS
21:31HdkR: karolherbst: Oh. Going to increase the UBO numbers?
21:32HdkR: meh? :P
21:33karolherbst: well, does it even matter?
21:33HdkR: It does for Yuzu
21:33HdkR: Because they are hitting UBO limits constantly :D
21:35HdkR: They currently have no SSBO fallback path, which they are working on...apparently
21:36karolherbst: they don't remap, right?
21:36HdkR: Not currently
21:36karolherbst: I am sure nothing uses more than 4 or so ubos
21:37HdkR: Games are using all 18
21:37karolherbst: what games?
21:37HdkR: I think SMO is the big one
21:37karolherbst: are they using 18 or are they using 0, 1 and 18?
21:37karolherbst: uhm, 17
21:38HdkR: Not sure on that one
21:38HdkR: Remapping would save them on that one yes
21:38karolherbst: they have to remap anyway
21:38karolherbst: like literally
21:39HdkR: yea, they will have to eventually
21:39karolherbst: no, they have to
21:39HdkR: and then indirect cbank access will hurt
21:39karolherbst: I highly doubt that
21:39karolherbst: how are you ending up with that anyway?
21:40karolherbst: anyway, with smart glsl code the compiler is able to optimize that away
21:41HdkR: Unless the cbank selection on load is also indirect and impossible to optimize? :P
21:41karolherbst: the issue is you have to implement that inside glsl
21:41karolherbst: do we actually have indirect ubo access in glsl?
21:42HdkR: GLSL doesn't understand indirect UBO binding accesses
21:42karolherbst: okay, so you have to write a helper function which selects the correct ubo
21:42karolherbst: anyway, lots of fun
21:43HdkR: yea, it'll be a switch that masks + remaps + selects. Tons of fun stuff
21:43karolherbst: you know what will be super fun?
21:43karolherbst: if they have an indirect access across 17 ubos
21:43karolherbst: like full range
21:44karolherbst: nothing will do that, but still
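[Editor's note] The remapping idea from this exchange can be sketched as a small table build: guest constant buffer slots that a shader actually uses are packed into a dense range of host UBO slots. This is only an illustration under assumptions, not code from Yuzu or nouveau. The names GUEST_CBUF_COUNT, UNMAPPED and build_cbuf_remap are hypothetical, the count of 18 is the figure quoted in the chat, and the sketch covers statically known slots only, not the indirect bank selection mentioned above.

```c
#include <stdint.h>

#define GUEST_CBUF_COUNT 18   /* guest slot count as quoted in the discussion */
#define UNMAPPED 0xff         /* marker for guest slots the shader never uses */

/* Hypothetical helper: assigns a dense host slot to every guest slot whose
 * bit is set in used_mask. Returns the number of host slots needed, or -1
 * if more than host_limit slots would be required. */
static int build_cbuf_remap(uint32_t used_mask, int host_limit,
                            uint8_t remap[GUEST_CBUF_COUNT])
{
    int next = 0;

    /* Start with everything unmapped so the table is fully defined
     * even when we bail out early. */
    for (int slot = 0; slot < GUEST_CBUF_COUNT; slot++)
        remap[slot] = UNMAPPED;

    for (int slot = 0; slot < GUEST_CBUF_COUNT; slot++) {
        if (!(used_mask & (1u << slot)))
            continue;
        if (next >= host_limit)
            return -1;
        remap[slot] = (uint8_t)next++;
    }
    return next;
}
```

For the "0, 1 and 17" case raised earlier, the table maps guest slot 17 to host slot 2 and needs only three host slots; a shader that really touches all 18 slots still exceeds a host limit of 14 and would need some other fallback.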
21:44karolherbst: HdkR: how many ubos does nvidia actually expose? 14? 15?
21:44HdkR: 14 in GL
21:44karolherbst: 16 in vk?
21:45HdkR: 15 I think?
21:45karolherbst: sounds like fun
21:45karolherbst: 2 internal + 1 for constants I've heard
21:45karolherbst: so that would give us 15
21:46HdkR: Sounds about right
21:47karolherbst: what was that thing we decided on would be the death of perf for yuzu?
21:47HdkR: ...I forgot, was it an instruction being rude?
21:48karolherbst: TXD was that tex op lowered into SHFL and QUADOP
21:49karolherbst: or that insane surface stuff nvidia ends up doing?
21:49HdkR: The formatted loadstores, or the surface reductions?
21:50karolherbst: the reductions
21:50karolherbst: I am sure imirkin is able to come up with something which is just impossible to reimplement through glsl ;)
21:50HdkR: Could have been that one
21:51karolherbst: I have some stuff somewhere
21:52HdkR: They are attempting to figure out how to handle cross-thread communication on hardware with different warp sizes right now as well
21:52karolherbst: HdkR: this isn't the original stuff, but some imageLoad+imageStore on 3D textures ends up in something like that: https://gist.github.com/karolherbst/d8a3e38d2a6d5ccd200486a71dd620da
21:52karolherbst: the glsl was literally imageLoad + imageStore
21:52karolherbst: nothing else
21:53HdkR: lol, what fresh hell is this
21:53karolherbst: I tried to figure out what the heck nvidia is doing there
21:53karolherbst: HdkR: same thing, but through nouveau: https://gist.github.com/karolherbst/190939c7f056c040a320dbaa01b4c1e8
21:53karolherbst: we... skip a few things
21:54karolherbst: ahh, there is the original nvidia one
21:55HdkR: Looks like yours is a bit shorter
21:55karolherbst: we fail some 3d image CTS tests as well :p
21:55karolherbst: big surprise, I know
21:57HdkR: If only there was documentation on how not to fail with the hardware
22:05karolherbst: yeah, if only