00:58 karolherbst: imirkin: I posted a fix for constant folding of 64 bit int muls/mads, mind taking a quick look? It is a small patch
00:58 karolherbst: otherwise I just push it tomorrow or something
00:59 imirkin: just push
00:59 imirkin: it seemed fine
00:59 karolherbst: k
00:59 imirkin: but i didn't look at the surrounding code
00:59 imirkin: to actually make sure of that
01:00 karolherbst: you don't know what uses 64 bit int maths though, right? :/
01:00 karolherbst: kind of silly that we have no shader_test files inside shader-db
01:01 karolherbst: maybe I should write a piglit test for that though
01:01 imirkin: piglit
01:04 karolherbst: mhh, there are basically just shift tests
01:07 imirkin: erm
01:08 imirkin: i don't think you're looking right.
01:08 imirkin: generated_tests/spec/arb_gpu_shader_int64/execution/built-in-functions/
01:08 imirkin: has plenty of op-mult tests
01:08 karolherbst: ohhh, I didn't check the generated tests :/
01:09 karolherbst: mhh, right, okay. But that won't hit the ConstantFolding path
01:09 karolherbst: or well, those tests
01:09 imirkin: ok. but there are multiplication tests ;)
01:09 karolherbst: true
01:10 imirkin: you need a special kind of multiplication
01:10 imirkin: the kind that glsl can't const-fold
01:10 imirkin: but the kind that nv50_ir can.
01:10 karolherbst: mhhhhhh
01:10 karolherbst: I triggered that through CL :/
01:11 karolherbst: imirkin: but why would glsl constfold an a * 0x100 multiplication or something?
01:11 imirkin: why would it?
01:11 imirkin: oh
01:12 imirkin: it would not.
01:12 imirkin: right. this is the "shift conversion" thing
01:12 imirkin: which worked before, but was extended to 64-bit, but apparently not quite.
01:14 karolherbst: I think before the faulty patch it was also triggered for 64-bit
01:14 karolherbst: but, the code was different
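The "shift conversion" mentioned above rewrites a multiply by a power-of-two immediate (such as a * 0x100) into a left shift. Below is a minimal Python sketch of that idea for 64-bit operands. It is not the nv50_ir code: the names `MASK64`, `is_pow2` and `fold_mul_imm` are hypothetical, and the masking only emulates 64-bit wraparound. The point it illustrates is that both the power-of-two check and the shift amount have to be computed on the full 64-bit immediate.

```python
# Hypothetical illustration only; none of these names exist in Mesa.
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound


def is_pow2(v):
    """True if v is a non-zero power of two."""
    return v != 0 and (v & (v - 1)) == 0


def fold_mul_imm(a, imm):
    """Return (a * imm) mod 2**64, using a shift when imm is a power of two."""
    a &= MASK64
    imm &= MASK64
    if imm == 0:
        return 0
    if is_pow2(imm):
        # The shift amount comes from the whole 64-bit immediate,
        # not just from its low 32 bits.
        shift = imm.bit_length() - 1
        return (a << shift) & MASK64
    return (a * imm) & MASK64
```

For example, `fold_mul_imm(3, 0x100)` takes the shift path with a shift of 8, while `fold_mul_imm(5, 3)` falls back to a plain multiply.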
06:29 pabs3: is tearing while watching video fullscreen in totem + gnome-shell + Xorg something that is normal with nouveau drivers?
17:51 karolherbst: imirkin: can we actually have more than 16 const buffers on maxwell? Some people say that we should have 18 in total
18:11 imirkin: dunno
18:11 imirkin: should be easy to test
18:11 imirkin: just bump the driver constbuf up to 16
18:17 karolherbst: imirkin: can we actually expose 16 ubos?
18:18 karolherbst: wondering what the actual benefits would be here
18:18 imirkin: if we have 18 constbufs, sure
18:18 imirkin: none
18:18 karolherbst: how many does opengl mandate?
18:18 imirkin: 14
18:18 imirkin: or 13, i forget
19:02 karolherbst: imirkin: btw, would you mind if we change the emitter for NEG/ABS to use some random ALU op instead of f2f and i2i?
19:03 karolherbst: or would you prefer having this inside peephole?
19:06 karolherbst: mhh, although I would expect that i2i and f2f are slow on every ISA and that we could just use iadd/fadd instead and have some post everything pass to do that for us
19:06 karolherbst: like right before the last DCE
19:08 karolherbst: (and we should fix that up inside the idiv as well)
19:13 karolherbst: pendingchaos_: weren't you looking into that at some point? i2i/f2f ->iadd/fadd?
19:19 pendingchaos: yeah
19:21 karolherbst: pendingchaos: did you come up with anything?
19:22 pendingchaos: I decided to do it in LateAlgebraicOpt for some reason
19:22 karolherbst: mhhh
19:22 pendingchaos: I also have some unsent patches for it
19:22 pendingchaos: might be a few issues remaining with them though?
19:22 karolherbst: I don't really know if we really want that to be there though
19:23 karolherbst: or maybe...
19:27 karolherbst: having LoadPropagation makes kind of sense
19:37 imirkin: karolherbst: should not be in the emitter. that'll mess up the opclass/etc logic (maybe)
19:37 karolherbst: ahh, true
19:38 karolherbst: also the sched stuff I guess
19:38 karolherbst: I think LateAlgebraicOpt might actually be fine, we just need to be careful that nothing "optimizes" it back to NEG/ABS
21:31 HdkR: karolherbst: Oh. Going to increase the UBO numbers?
21:31 karolherbst: meh
21:32 HdkR: meh? :P
21:33 karolherbst: well, does it even matter?
21:33 HdkR: It does for Yuzu
21:33 HdkR: Because they are hitting UBO limits constantly :D
21:35 HdkR: They currently have no SSBO fallback path, which they are working on...apparently
21:36 karolherbst: interesting
21:36 karolherbst: they don't remap, right?
21:36 HdkR: Not currently
21:36 karolherbst: I am sure nothing uses more than 4 or so ubos
21:37 HdkR: Games are using all 18
21:37 karolherbst: what games?
21:37 HdkR: I think SMO is the big one
21:37 karolherbst: are they using 18 or are they using 0, 1 and 18?
21:37 karolherbst: uhm, 17
21:38 HdkR: Not sure on that one
21:38 HdkR: Remapping would save them on that one yes
21:38 karolherbst: they have to remap anyway
21:38 karolherbst: like literally
21:39 HdkR: yea, they will have to eventually
21:39 karolherbst: no, they have to
21:39 karolherbst: ;)
21:39 HdkR: and then indirect cbank access will hurt
21:39 karolherbst: I highly doubt that
21:39 karolherbst: how are you ending up with that anyway?
21:40 karolherbst: anyway, with smart glsl code the compiler is able to optimize that away
21:41 HdkR: Unless the cbank selection on load is also indirect and impossible to optimize? :P
21:41 karolherbst: well
21:41 karolherbst: the issue is you have to implement that inside glsl
21:41 karolherbst: do we actually have indirect ubo access in glsl?
21:42 HdkR: GLSL doesn't understand indirect UBO binding accesses
21:42 karolherbst: okay, so you have to write a helper function which selects the correct ubo
21:42 karolherbst: anyway, lots of fun
21:43 HdkR: yea, it'll be a switch that masks + remaps + selects. Tons of fun stuff
21:43 karolherbst: you know what will be super fun?
21:43 karolherbst: if they have an indirect access across 17 ubos
21:43 karolherbst: like full range
21:44 karolherbst: nothing will do that, but still
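The remapping discussed above can be sketched as follows. This is a minimal Python model, not Yuzu or nouveau code: `build_remap` and `select_ubo` are hypothetical names, and returning 0 for an unmapped slot is an assumption made for the example. It packs the guest slots a shader actually uses (for instance 0, 1 and 17) into consecutive host bindings, and models the switch-style helper that resolves an indirect guest slot to one of those bindings.

```python
# Hypothetical illustration only; these helpers are not from any real project.

def build_remap(used_guest_slots, host_limit):
    """Map sparse guest UBO slots to dense host bindings 0..n-1."""
    slots = sorted(set(used_guest_slots))
    if len(slots) > host_limit:
        raise ValueError("shader uses more UBOs than the host exposes")
    return {guest: host for host, guest in enumerate(slots)}


def select_ubo(remap, host_buffers, guest_slot, offset):
    """Model of a switch-based helper: pick the host buffer for a guest slot."""
    host = remap.get(guest_slot)
    if host is None:
        # Assumption for this sketch: reads from unmapped slots yield 0.
        return 0
    return host_buffers[host][offset]
```

With a host limit of 14, a shader touching guest slots 0, 1 and 17 needs only three host bindings. A shader that really uses more distinct slots than the host exposes still fails, which is where a fallback such as the SSBO path mentioned earlier would come in.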
21:44 karolherbst: HdkR: how many ubos does nvidia actually expose? 14? 15?
21:44 HdkR: 14 in GL
21:44 karolherbst: 16 in vk?
21:45 HdkR: 15 I think?
21:45 karolherbst: ohh
21:45 karolherbst: sounds like fun
21:45 karolherbst: 2 internal + 1 for constants I've heard
21:45 karolherbst: so that would give us 15
21:46 HdkR: Sounds about right
21:47 karolherbst: what was that thing we decided on would be the death of perf for yuzu?
21:47 HdkR: ...I forgot, was it an instruction being rude?
21:47 karolherbst: TXD?
21:48 karolherbst: TXD was that tex op lowered into SHFL and QUADOP
21:49 karolherbst: or that insane surface stuff nvidia ends up doing?
21:49 HdkR: The formatted loadstores, or the surface reductions?
21:50 karolherbst: the reductions
21:50 karolherbst: I am sure imirkin is able to come up with something which is just impossible to reimplement through glsl ;)
21:50 HdkR: Could have been that one
21:51 karolherbst: I have some stuff somewhere
21:52 HdkR: They are attempting to figure out how to handle cross-thread communication on hardware with different warp sizes right now as well
21:52 karolherbst: HdkR: this isn't the original stuff, but some imageLoad+imageStore on 3D textures ends up in something like that: https://gist.github.com/karolherbst/d8a3e38d2a6d5ccd200486a71dd620da
21:52 karolherbst: the glsl was literally imageLoad + imageStore
21:52 karolherbst: nothing else
21:53 HdkR: lol, what fresh hell is this
21:53 karolherbst: I tried to figure out what the heck nvidia is doing there
21:53 karolherbst: HdkR: same thing, but through nouveau: https://gist.github.com/karolherbst/190939c7f056c040a320dbaa01b4c1e8
21:53 karolherbst: we... skip a few things
21:54 karolherbst: ahh, there is the original nvidia one
21:55 HdkR: Looks like yours is a bit shorter
21:55 karolherbst: really
21:55 karolherbst: we fail some 3d image CTS tests as well :p
21:55 karolherbst: big surprise, I know
21:57 HdkR: If only there was documentation on how not to fail with the hardware
22:05 karolherbst: :D
22:05 karolherbst: yeah, if only