00:37 chrisf: anholt, interesting.
00:38 chrisf: anholt, can you ping my work email about it and i'll see who i can light a fire under :P
00:41 chrisf: nvm, i've got things in motion now
12:27 udovdh: hello!
12:27 udovdh: something happened: https://pastebin.com/rthnVYFN
12:28 udovdh: looks like some page fault, but is it a bug or an indicator of an issue in the video stream that was played?
12:28 udovdh: stuff recovered and kept playing after the glitch
12:29 udovdh: happened some more after a little while
12:29 udovdh: did not see mmhub0 mentioned before
12:30 udovdh: kernel 5.7.9 on Gigabyte X570 Aorus Pro, BIOS F20, Fedora 31 with kernel.org, git mesa, etc.
12:34 udovdh: different stuff happened after that: https://pastebin.com/6U7XZjsA
12:34 udovdh: real quality code... :-/
12:34 udovdh: uptime of 12 days and gnome-shell is killed for OOM reasons?
12:36 HdkR: Oom? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6148 Might help
12:40 udovdh: HdkR, how can I be sure that this is the fix for what just happened?
12:40 HdkR: Can't be
12:41 udovdh: Ah, so wait and do a git pull when the commit is merged?
12:41 udovdh: I do not see this issue regularly
12:45 udovdh: was playing HD video off a bluray iso using vlc
12:45 udovdh: fairly high bitrate: 35 GB for 1h38m
12:59 udovdh: due to multiple audio tracks but also hq video
14:28 karolherbst: mhhh...
14:28 karolherbst: divergence analysis
14:28 karolherbst: I need per component info for vec2/3/4
14:29 karolherbst: well.. I can also hack around it :p
15:36 jekstrand: karolherbst: I'm starting to think we want a nir_var_mem_global_const
15:36 jekstrand: karolherbst: And a load_global_const
15:37 karolherbst: jekstrand: probably
15:37 jekstrand: karolherbst: Mostly because load_global_const would be CSE-able and we could ignore barriers for it.
15:37 jekstrand: The more I play around with global memory stuff, the more it looks like a good idea.
15:37 karolherbst: yeah.. I am still not sure how to deal with constant* memory as it's just painful
15:38 jekstrand: I'm also wondering if we don't want some sort of a load_global_const which takes a base pointer and an offset.
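A rough C sketch of the property being discussed: loads from memory that nothing in the shader can write could be treated as reorderable across barriers, which is what would make a dedicated load_global_const CSE-friendly. The helper below is made up for illustration; load_global_const itself is only hypothetical at this point.

```c
#include "nir.h"

/* Hypothetical helper a pass like CSE or copy-prop might consult.  Loads from
 * read-only memory can be merged and moved across barriers because no write
 * in the shader can change what they return; a dedicated load_global_const
 * (not a real intrinsic here) would fall into the same bucket as load_ubo. */
static bool
load_is_barrier_insensitive(const nir_intrinsic_instr *load)
{
   switch (load->intrinsic) {
   case nir_intrinsic_load_ubo:     /* always read-only */
      return true;
   case nir_intrinsic_load_global:  /* may alias writable global memory */
   default:
      return false;
   }
}
```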
15:38 karolherbst: jekstrand: the big issue is, for the kernel there is no difference between input memory and taking the address of _temp values
15:38 jekstrand: karolherbst: Yeah, that's tricky
15:38 jekstrand: karolherbst: CL does have __constant though doesn't it?
15:38 karolherbst: yes
15:39 jekstrand: And inputs aren't __constant?
15:39 karolherbst: yes
15:39 karolherbst: but so are in-kernel constants
15:39 karolherbst: there is no difference between those
15:39 jekstrand: you answered "yes" to a negative. Which way does it go?
15:39 karolherbst: well, inputs are only constant if declared as such
15:40 karolherbst: of course you can't write to the input buffer
15:40 karolherbst: but
15:41 karolherbst: you also don't have direct access to it anyway
15:41 karolherbst: the way stuff works is that there is an implicit wrapper around kernel functions anyway
15:41 karolherbst: and parameters are just parameters
15:41 karolherbst: so you can take the address of the parameter and write to it if you want to
15:41 karolherbst: but if you have a __constant * to something
15:42 karolherbst: it can point to memory you pass as kernel args _or_ in-kernel constant memory
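To make the ambiguity concrete, here is a small, purely illustrative OpenCL C example (not from the discussion): the same __constant pointer can end up pointing at a buffer bound as a kernel argument or at program-scope constant data, and the load itself cannot tell which.

```c
/* Program-scope __constant data and a __constant kernel argument feed the
 * same pointer, so the compiler sees one load that may hit either kind of
 * constant memory. */
__constant float table[4] = {0.0f, 1.0f, 2.0f, 3.0f};   /* in-kernel constant memory */

__kernel void pick(__global float *out,
                   __constant float *arg,                /* constant memory passed as a kernel arg */
                   int use_arg)
{
   size_t gid = get_global_id(0);
   __constant float *p = use_arg ? arg : table;          /* either source, same pointer type */
   out[gid] = p[gid & 3];
}
```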
15:46 jenatali: What we're doing (if it helps) is running a pass on inline constants to see if we can promote them to shader_temp, which maps to an inline constant buffer inside DXIL, otherwise we reflect them out to the runtime and bind them as hidden inputs
15:47 jenatali: That way you can do all kinds of crazy take-address-of-local-const stuff, convert it to an int, store it in a buffer, load it later through a different deref chain, cast it back, and it just works
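A hypothetical OpenCL C fragment in the same spirit (the harder variant jenatali mentions additionally round-trips the integer through a buffer): the address of a local constant is laundered through an integer and read back via a different pointer, so the pass has to keep the constant addressable, e.g. by promoting it to an inline constant buffer.

```c
/* Illustration only; much of this is implementation-defined in standard CL,
 * which is exactly why the compiler has to be defensive about it. */
__kernel void laundered_const(__global float *out)
{
   const float table[2] = {3.0f, 5.0f};         /* local constant whose address escapes */
   uintptr_t as_int = (uintptr_t)&table[1];     /* pointer -> integer */
   const float *p = (const float *)as_int;      /* back through a different deref chain */
   out[get_global_id(0)] = *p;                  /* expected to read 5.0f */
}
```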
15:47 karolherbst: yeah..
15:48 karolherbst: we were mainly wondering on how to deal with actual constant memory though and maybe having a new file could help
15:48 karolherbst: just makes handling of inline constants more annoying
15:48 jenatali: I'm not seeing what global_const buys you that uniform/ubo doesn't?
15:49 karolherbst: jekstrand: so.. here is the big issue: constants can be either global loads with a constant flag or ubos
15:49 karolherbst: soo.. the thing is, we only have 8 constant buffers in total with compute shaders on nv
15:50 karolherbst: 1 used for inputs/uniforms, 1 for driver const buffer
15:50 karolherbst: so we only have 6, but GL mandates more of course
15:50 karolherbst: so what we do is to spill the remaining ubos to global memory loads
15:50 karolherbst: and I bet that marking global memory loads as containing constant data probably has a benefit
15:51 karolherbst: so yeah.. load_global_const would make sense in this case
15:51 jenatali: Ah
15:51 karolherbst: does it make sense for the general case? no, as CL constant memory _are_ ubos
15:51 karolherbst: they have the same size limitation
15:51 karolherbst: they behave the same
15:51 karolherbst: and you could even bind them like in GL
15:51 karolherbst: by having implicit indexing on the args or go full indirect
15:52 karolherbst: the bigger issue for us just would be.. what happens if you have some conditional code selecting either inline constants _or_ spilled ubos?
15:52 karolherbst: but in this case as the inline constants are in a scratch buffer already they could be accessed by global addressing as well
15:53 karolherbst: those details are just super annoying to deal with
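A back-of-the-envelope C sketch of the constraint described above: with 8 hardware constant buffers available to compute and 2 of them reserved (inputs/uniforms plus the driver constbuf), user UBOs beyond the remaining 6 slots have to be read via global memory. The constants and helper name are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical numbers mirroring the discussion, not queried from hardware. */
#define HW_COMPUTE_CBUFS   8   /* constant buffers available to compute */
#define RESERVED_CBUFS     2   /* 1 inputs/uniforms + 1 driver constbuf  */
#define USABLE_USER_CBUFS  (HW_COMPUTE_CBUFS - RESERVED_CBUFS)

/* UBO bindings that don't fit the remaining slots get "spilled": their loads
 * become global loads, ideally tagged as constant (e.g. LDG.CONSTANT). */
static bool
ubo_spills_to_global(unsigned ubo_binding)
{
   return ubo_binding >= USABLE_USER_CBUFS;
}
```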
15:55 jekstrand: I'm less concerned with spilling out to UBOs and more concerned with how the compiler treats them with respect to things like CSE and barriers.
15:55 jenatali: Yup - hence why we promote any inline constants that have a deref chain to them which doesn't end in a load (e.g. a select) into ubos as well
15:56 jekstrand: I think, if we have a different mode which is unaffected by barriers, NIR's copy-prop stuff should be good enough to take care of most of our CSE issues.
15:56 jekstrand: Though I'm not 100% sure on that.
15:57 jekstrand: The other thing is that, in the case where you have a constant offset from some effectively fixed address, we can do much better if we can use a block load instruction.
15:58 jekstrand: Essentially, it loads 32B or 64B of data spread across 8 or 16 SIMD lanes. We then pick up the individual bits with something that looks like subgroup ops. (Not real subgroup ops but that's what they sort-of look like.)
15:58 jekstrand: The result is very wide loads and much lower register pressure.
15:58 karolherbst: right
15:58 jekstrand: For Intel, I couldn't care less about being able to shunt them off to actual UBOs. I just want my big wide loads.
15:58 jekstrand: That's the #1 benefit we get from UBOs
15:59 jekstrand: For Nvidia, I realize you really want to use your magic UBO hardware and that's fine.
15:59 jekstrand: They're sort-of different problems.
15:59 jekstrand: But the "help NIR reason about them" problem is one that I think will only be solved with nir_var_mem_global
16:00 jekstrand: The others will take more creativity, I think.
16:00 karolherbst: jekstrand: but would there be a problem if we just treat them as ubos? using mem_global was always more of a workaround
16:01 jekstrand: karolherbst: Not strictly speaking, no. However, with generic pointers, you may not always know.
16:01 karolherbst: jekstrand: can't cast constant to generic
16:01 jekstrand: Ok, that's helpful
16:01 karolherbst: but
16:01 karolherbst: and this was the reason I added the workaround:
16:02 karolherbst: you can back constant* with SVM memory
16:02 karolherbst: CL 2.0 strikes again
16:02 jenatali: You could just not do CL2.0 :P
16:02 jekstrand: For us, as I said, the only benefits we get from UBOs are a) no 64-bit arithmetic and b) big wide loads if it's a constant offset.
16:02 karolherbst: yeah
16:03 karolherbst: constant buffers in CL are super small
16:03 jekstrand: So if we can figure out how to get our big wide loads with a 64-bit pointer, 64-bit pointers all the way.
16:03 karolherbst: so doing 32bit math is feasible
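A trivial sketch of benefit (a) from above: because CL only guarantees a small __constant allocation (the full-profile minimum for CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE is 64 KiB), offsets into a bound constant buffer always fit in 32 bits, whereas a raw global pointer drags a 64-bit add into every access. The names below are illustrative.

```c
#include <stdint.h>

/* With a bound constant buffer the per-access math can stay 32-bit ... */
static uint32_t
cbuf_offset(uint32_t elem, uint32_t stride)
{
   return elem * stride;                 /* fits: the buffer is at most a few tens of KiB */
}

/* ... while an unbound pointer needs the 64-bit add that UBO hardware avoids. */
static uint64_t
global_addr(uint64_t base_ptr, uint32_t elem, uint32_t stride)
{
   return base_ptr + (uint64_t)(elem * stride);
}
```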
16:03 karolherbst: I think I'd even back up SVM memory with a real ubo when launching the kernel and just hide it entirely
16:03 jekstrand: I also have other reasons to want nir_var_mem_global_const for 64-bit pointers which I can't discuss in detail. :-)
16:03 karolherbst: I think that's what nvidia is doing these days as well
16:04 karolherbst: ahh
16:04 karolherbst: well
16:04 karolherbst: I'd like to have it for spilled ubos :p
16:04 jekstrand: Yeah, I think there are plenty of reasons why it's useful.
16:04 karolherbst: but that's more of a "read this value, but it's constant"
16:04 karolherbst: we have LDG.CONSTANT variants
16:05 jekstrand: Spilled UBOs, Anything where it's just more convenient to pass a 64-bit pointer into the shader, etc.
16:05 karolherbst: and I bet it gets cached more aggressively or so
16:05 jekstrand: That could be.
16:05 karolherbst: jekstrand: that's what we do with GL :)
16:05 karolherbst: well.. we load it from the driver constbuf
16:05 karolherbst: but yeah
16:06 karolherbst: still.. we don't do LDG.CONSTANT with those yet
16:06 jekstrand: See, you could get a perf boost from this! :P
16:06 karolherbst: yeah, that's my hope
16:07 karolherbst: but I still need to write this lowering pass for ubos as well :)
16:07 karolherbst: it's a bit tricky with indirects
16:07 jekstrand: I'm also not sure how I would lower load_global_const to wide loads
16:08 jekstrand: It requires some alignment restrictions and things.
16:08 jekstrand: I guess I could add a nir_intrinsic_assert_aligned_uniform(x, align)
16:08 jekstrand: Which declares that its value is both subgroup-uniform and aligned to the given alignment
16:09 jekstrand: And otherwise it's a total pass-through op.
16:09 karolherbst: mhhh
16:09 jekstrand: Then detect load_global(iadd(aligned_uniform(x, align), const_offset))
16:09 karolherbst: I think we just have to make better use of the alignment info we have
16:09 karolherbst: and set it better
16:09 jekstrand: That would also be an option
16:09 jenatali: Agreed
16:10 jekstrand: Check that align_mul >= 32 && align_offset % 32 == const_offset % 32
16:10 karolherbst: so if you load a vec4 of a ubo, the first entry is 0x10 aligned, the second mul 0x10 + 0x4, etc...
16:10 jekstrand: That probably works
16:10 karolherbst: yeah
16:10 jenatali: One of bbrezillon's pending MRs to upstream has patches to propagate alignment from SPIR-V through explicit IO
16:10 jekstrand: Combine it with uniformity analysis
16:10 karolherbst: well.. aren't things in ubos normally vec4 aligned?
16:10 karolherbst: well.. not for CL at least..
16:11 karolherbst: but I think we could be more clever
16:11 jekstrand: karolherbst: In order to get my fancy wide pulls, I need vec8 alignment and the address to be subgroup-uniform.
16:11 karolherbst: right
16:11 jekstrand: That's tricky to get just from the alignment parameters
16:11 karolherbst: but the first member of a UBO is aligned to the UBO's size
16:11 jekstrand: Well, to some driver-specified UBO alignment
16:12 karolherbst: I bet you could have offsets
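A minimal C sketch of the check outlined above, assuming the load's address is base + const_offset and its alignment is described by the usual NIR align_mul/align_offset pair (align_mul being a power of two); the function name is made up and the subgroup-uniformity check on the base is left out.

```c
#include <stdbool.h>

/* If align_mul >= 32 (a power of two), the address is known modulo 32:
 * addr % 32 == align_offset % 32.  Requiring that to equal const_offset % 32
 * means base = addr - const_offset is 32-byte aligned, so the surrounding
 * 32-byte block can be fetched with one wide load and the wanted dwords
 * picked out afterwards (combined with uniformity analysis for the base). */
static bool
const_offset_fits_32B_block(unsigned align_mul, unsigned align_offset,
                            unsigned const_offset)
{
   return align_mul >= 32 && (align_offset % 32) == (const_offset % 32);
}
```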
19:28 austriancoder: jekstrand: I just looked again at your xdc2019 presentation .. what happened to ibc?
19:30 jekstrand: austriancoder: Kayden is working on it now.
19:31 jekstrand: austriancoder: I had to step away from it back in October/November or so to help out with some of the performance analysis and image compression stuff on Gen12
19:31 jekstrand: austriancoder: So it sat there for probably 4 months with no one doing anything
19:31 jekstrand: But now Kayden's working on it and starting to make real progress again.
19:31 jekstrand: He's just about got spilling working
19:31 jekstrand: And I think he hooked up tessellation shaders.
19:35 jekstrand: I think it's really good to have someone else work on it for a while; that way we have some chance of breaking any design ruts I may have gotten us into.
19:35 jekstrand: I'm really looking forward to it, though. I keep seeing so many cases where real scalars are going to be useful.
19:38 austriancoder: great to hear that there is still some movement
19:41 jekstrand: Yeah, there was chatter a few weeks ago about trying to land it upstream. I really wanted to get spilling implemented before we did that, and to have someone other than myself familiar with the code-base who could say, "Yes, I like this a lot. We should switch to it."
19:42 jekstrand: I wouldn't be surprised if we drop it in behind an environment variable some time soonish.
19:42 jekstrand: But no promises there.
21:35 austriancoder: can I pass custom options to a custom py file based on nir_algebraic.AlgebraicPass()?