00:05jekstrand: jenatali, karolherbst: Also, from what I can see, my generic pointer handling is working a treat. I'm seeing very little generic even though the kernel source uses it for just about everything.
00:15karolherbst: I think the CTS spirv tests might be better tests for that :p
00:15karolherbst: not sure how much they use generic pointers though
00:16karolherbst: probably none :/
00:21jenatali: jekstrand: Awesome :)
04:11jekstrand: jenatali: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871
04:11jekstrand: karolherbst: ^^
04:11jekstrand: karolherbst: At some point, we need to start optimizing stuff in clover "for real" so we can turn some of these things on. :)
04:31jekstrand: jenatali, karolherbst: There's one issue I'd still like to sort out: Alignment casts.
04:31jekstrand: We're not doing nearly as good a job as I'd hoped of getting rid of them.
04:32jekstrand: And they may be preventing some amount of optimization.
04:32jekstrand: How much I don't know yet
04:32jekstrand: Lots of deref-based optimizations just throw up their hands and walk away when they see a cast.
04:33jekstrand: Which is usually fine because our objective is to get rid of casts
04:33jekstrand: But these carry useful information that we can't represent any other way
04:33jekstrand: Any time you have a pointer that we can chase all the way back to the variable, the cast typically comes off and everything's fine
04:33jekstrand: But these pointers are being conjured out of thin air by the kernel
04:33jekstrand: Sometimes with an actual uint->pointer cast
04:35jekstrand: It's pretty clear that LLVM is determining the alignments based on base alignments of struct types
04:35jekstrand: So, for instance, if I have a struct type with a base alignment of 8, I get lots of 8 and 4-byte aligned uints
04:37jekstrand: And NIR could, in theory, figure that out except we're currently being pretty conservative about our usage of type-based alignment inference.
04:39jekstrand: Hrm... Maybe NIR can see through them. They're considered "trivial" by the deref path logic so opt_copy_prop_vars shouldn't see them.
06:04jenatali: jekstrand: I probably won't get a chance to review that til Wednesday or so, but I'll check it out when I can
08:00tjaalton: dcbaker[m]: what's blocking mesa 20.2? the tracker seems clean
08:00tjaalton: getting hard to squeeze that in ubuntu 20.10..
09:38EdB: hello, if all goes as expected, Marge will merge https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6336 (clover: set LLVM min version to 8.0.1)
09:39EdB: this MR removes the use of 'meson-clover-old-llvm:' for the gitlab-ci. so '.use-x86_build_old' is no longer used
09:41doras: MrCooper: I think `xserver` started to fail its build after your recent change around fourcc.h. I'm using autotools.
09:42EdB: maybe a new 'current' with a higher LLVM version could be set up and a new 'old' with an older LLVM version could be set
09:43doras: MrCooper: looks like the -I flag for `libdrm` is missing from the build command of files in the dri3 directory.
09:44doras:sent a long message: < https://matrix.org/_matrix/media/r0/download/matrix.org/wGzIXmNDjlDKhkMBpaTVAcyM/message.txt >
10:21MrCooper: doras: would be more on topic on #xorg-devel
10:21MrCooper: anyway, fourcc.h is in the xserver tree, not in libdrm
10:25MrCooper: oh, this is about the dri3 de-DDX-ification, not the fourcc.h related change
10:26MrCooper: doras: try adding @LIBDRM_CFLAGS@ to AM_CFLAGS in dri3/Makefile.am
10:29MrCooper: wonder why that doesn't seem necessary on my systems nor in CI though
12:45danvet_: mripard, just add it
12:45danvet_: tbh that entire defconfig thing is a bit fail, and I'm hoping to script it with gitlab CI
12:46danvet_: but alas, that's suspended for a while now (hopefully not much longer)
14:30jekstrand: karolherbst: Another thing I've been thinking about is how we're going to support initializers for shared variables.
14:30jekstrand: karolherbst: My inclination is to do the same thing as for __constant and then emit a copy from constant to shared at the start of the kernel.
14:31jekstrand: Likely, for us, that would mean some sort of loop where the whole local workgroup helps to do the copy so we can get max bandwidth.
14:31jekstrand: And I think we'd put the initialization data in the nir_shader::constant_data so clover doesn't need an extra path.
14:32karolherbst: jekstrand: mhhh.. yeah, I have no idea if our hardware supports prefilling shared on launch
14:32jekstrand: And maybe have a special case for when we can just memset to 0
14:32jekstrand: karolherbst: I thought you said you had a nifty "copy global to shared" thing.
14:32karolherbst: ohh, yeah, we have an instruction, but it's relatively new
14:33linkmauve: Hi, in GLES with GL_EXT_texture_format_BGRA8888 enabled, what does it mean for shaders that the internal format must (?) be GL_BGRA_EXT?
14:33karolherbst: jekstrand: yeah.. ampere+
14:33linkmauve: Does it change anything wrt how components are addressed?
14:34linkmauve: This is about https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/2625 doing something not specified.
14:34karolherbst: but let me check what it actually can do.. mhh
14:34jekstrand: linkmauve: It should just mean that you can upload BGRA
14:35jekstrand: linkmauve: Uh... looking at that MR, I'm not sure. There are some weird corners there when you start looking at GL internal formats.
14:35karolherbst: jekstrand: so we can do a zero fill actually :D
14:35jekstrand: karolherbst: Everyone should be able to do a cooperative loop type thing where each invocation takes a vec4
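The cooperative loop described here can be sketched on the CPU as follows. This is only an illustration under stated assumptions, not Mesa or clover code: `coop_copy_invocation`, `coop_copy_workgroup` and `CHUNK_BYTES` are hypothetical names, a vec4 is assumed to be 16 bytes, and the workgroup is simulated by running each invocation in turn.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: one vec4 of 32-bit components, i.e. 16 bytes per chunk. */
enum { CHUNK_BYTES = 16 };

/* Hypothetical helper: the part of the copy done by a single invocation.
 * It copies 16-byte chunks and strides by the workgroup size, so that
 * neighbouring invocations touch neighbouring chunks. */
static void
coop_copy_invocation(unsigned char *shared, const unsigned char *constant_data,
                     size_t size, unsigned local_id, unsigned local_size)
{
   assert(local_size > 0 && local_id < local_size);
   for (size_t off = (size_t)local_id * CHUNK_BYTES; off < size;
        off += (size_t)local_size * CHUNK_BYTES) {
      /* The last chunk may be shorter than a full vec4. */
      size_t n = size - off < CHUNK_BYTES ? size - off : CHUNK_BYTES;
      memcpy(shared + off, constant_data + off, n);
   }
}

/* CPU stand-in for the workgroup: run every invocation one after another.
 * On a GPU these would run in parallel, with a barrier before anything
 * reads the shared memory. */
static void
coop_copy_workgroup(unsigned char *shared, const unsigned char *constant_data,
                    size_t size, unsigned local_size)
{
   for (unsigned id = 0; id < local_size; id++)
      coop_copy_invocation(shared, constant_data, size, id, local_size);
}
```

With 100 bytes and 4 invocations, invocation 0 copies the chunks at offsets 0 and 64, invocation 1 those at 16 and 80, invocation 2 those at 32 and 96 (the last one only 4 bytes long), and invocation 3 the chunk at 48.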
14:35jekstrand: karolherbst: Yeah, I'm not surprised you have accelerated zero-fill.
14:35linkmauve: Yeah, I’d say it’s wrong, even more so in using GL_RGBA for internal format with GL_RGB for the format.
14:36jekstrand:throws -ENOTVULKAN and walks away. :)
14:37jekstrand: karolherbst: All the kernels I've been looking at have huge initializers for shared but I think they're all zero.
14:37jekstrand: karolherbst: I'm not sure what LLVM gives us if it gives us OpNull or actual constants full of zeros
14:37jekstrand: karolherbst: The OpenCL spec says shared and global default-init to zero.
14:38jekstrand: Wait, maybe not
14:38jekstrand: It says "Program scope and static variables in the global address space are zero initialized by default. A constant expression may be given as an initializer."
14:38linkmauve: jekstrand, thanks anyway. ^^
14:38karolherbst: I think that explains those huge initializers
14:39jekstrand: karolherbst: According to the OpenCL 3 spec:
14:39jekstrand: "Variables allocated in the __local address space inside a kernel function cannot be initialized."
14:39jekstrand: Which makes me very confused
14:39karolherbst: jekstrand: and yeah.. at least from the documented stuff it doesn't seem like we can preupload stuff :/
14:39karolherbst: jekstrand: "inside a kernel function" :p
14:40karolherbst: I think it talks about parameters
14:40karolherbst: or maybe declarations inside a function as well? mhh
14:42jekstrand: karolherbst: How do __local input parameters work?
14:42karolherbst: they are just offsets
14:42jekstrand: karolherbst: That's where I'm seeing initializers, I think
14:43karolherbst: maybe something translates the global ones to kernel args?
14:43karolherbst: but in the end all those global ones are just static constants
14:43karolherbst: I think..
14:44karolherbst: but that kind of depends on the runtime where it places the input ones
14:44jekstrand: Wait..... I see now. The ones with initializers are __local variables declared at a function scope.
14:44jekstrand: That's weird....
14:45jekstrand: I didn't think that was legal
14:48jekstrand: Variables allocated in the __local address space inside a kernel function cannot be initialized.
14:49jekstrand: Wow... The OpenCL SPIR-V environment spec is really bad. It says nothing about any of this.
14:51jekstrand: At the very least, I suspect we need to support zero-init
14:51jekstrand: That's what we're getting from LLVM
14:52karolherbst: yeah, and I think it kind of makes sense? dunno. I always assumed those things just start as 0 anyway in hw
14:53jekstrand: Not on everyone's HW :-P
14:53karolherbst: yeah.. not sure about nvidia either
14:53karolherbst: but I don't think we do anything about it
14:53karolherbst: so it is probably fine
14:53karolherbst: otherwise we would have something to 0 init before ampere
14:55jekstrand: I think nvidia auto-zeroes but I'm not sure
14:55karolherbst: for us it's all L2 cache anyway and I would be surprised if the hw wouldn't just clear it
14:55jekstrand: That would be nice, wouldn't it?
14:56karolherbst: is there something inside gallium for opengl to deal with it already?
14:58jekstrand: opengl doesn't guarantee zero-init
14:59karolherbst: yeah... I just saw
15:00jekstrand: Yeah, it's not OpConstantNull
15:00jekstrand: Oh, well
15:00jekstrand: We can still detect it somehow, I'm sure
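One plausible way to detect the zero case, when the initializer arrives as ordinary constants rather than OpConstantNull, is to scan the initializer bytes. This is a minimal sketch only: `initializer_is_all_zero` is a hypothetical helper, not an existing NIR or clover function, and it assumes the initializer has already been flattened into a byte buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when every byte of a flattened
 * initializer is zero, in which case the copy from constant data could be
 * replaced by a plain zero-fill of the shared variable. */
static bool
initializer_is_all_zero(const unsigned char *data, size_t size)
{
   assert(data != NULL || size == 0);
   for (size_t i = 0; i < size; i++) {
      if (data[i] != 0)
         return false;
   }
   return true;
}
```

Note that this is a bitwise test, so a floating-point -0.0 in the initializer would not count as zero here.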
15:01jekstrand: cmarcelo: ^^
15:01jekstrand: I'll add it to the list of things I need to chat with our OpenCL guys about.
15:08kisak: clover only supports nir-based drivers these days right? is it realistic to expect the current push to cover blender's CL 1.2 needs by the time llvm 12 rolls out?
15:09karolherbst: kisak: what does it have to do with llvm?
15:09karolherbst: or why llvm 12 specifically?
15:10EdB: kisak: radeon llvm native path is still supported
15:10kisak: I don't realistically expect the libclc-related enhancements to get widely distributed in llvm 11 since it was branched a fair while ago, nothing special about llvm 12 except it's not branched yet
15:13karolherbst: kisak: ohh, missing things for CL 1.2 you mean?
15:15jekstrand: Yes, all the current push is around NIR and the SPIR-V path but we're trying to not perturb things too much so the LLVM path keeps working.
15:15karolherbst: well, we have a CL 1.2 MR already anyway
15:15jekstrand: One day, we may want to convert radeon to the SPIR-V and NIR path for consistency
15:17EdB: at the moment I'm unable to get blender to compile CL kernel because of GPU register usage on my quite old card
15:17karolherbst: EdB: why should that be a problem? Or doesn't the driver spill?
15:17karolherbst: or well.. llvm
15:18karolherbst: or does blender just fail as the group size is too low?
15:18EdB: yeah, llvm seems to have a problem: I'm at 106 registers for a target set to 104
15:18karolherbst: that should just get spilled...
15:20karolherbst: EdB: what I think we need to take care next is the local size restriction on the compiled binary, which is a bigger perf issue with clover atm
15:20karolherbst: I think r600 and radeonsi assume the worst case, no
15:21EdB: what do you mean by local size restriction?
15:21karolherbst: the max thread count depends on the compiled binary
15:21karolherbst: so if you use more regs -> less threads
15:22karolherbst: clover doesn't support this at all and drivers report thread counts for the worst case always or so
15:22EdB: at the moment I'm trying to understand why CL_MEM_USE_HOST_PTR failed to work on my card :)
15:22karolherbst: needs userptr support
15:22karolherbst: I think the !userptr path is just broken
15:22karolherbst: or well..
15:22karolherbst: causes some synchronization issues
15:23EdB: my drivers says it supports it : dev.allows_user_pointers()
15:23EdB: but I have a dmesg full of VM fault (0x00, vmid 6) at page 1049089, write from 'TC1' (0x54433100) (16)
15:24karolherbst: EdB: check get_max_threads_per_block for instance, it reports 256 threads for llvm shaders
15:25karolherbst: same for radeonsi
15:25EdB: yes, I saw that
15:25karolherbst: that's one of the things we need to fix
15:26karolherbst: we always report 1024 but I think this is wrong as well
15:27EdB: i'm not familiar with that part
15:27karolherbst: mhh.. we can do 1024 threads per block, a block has either 64k or 32k regs
15:28karolherbst: and some gpus can do 63 or 255 regs per thread
15:28karolherbst: and all of those things have to match
15:28karolherbst: so if your kernel uses 255 regs, that means you can only schedule 128 threads on hw with 32k regs
15:29karolherbst: maybe only 124...
15:29karolherbst: the numbers are a bit odd and I am not sure if they are 1000 or 1024 base
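The arithmetic behind these numbers can be written down as a small function. It is a rough sketch under simplifying assumptions: `max_threads_for_kernel` is a hypothetical name, and the model ignores any register allocation granularity the hardware may impose, which could lower the real limit.

```c
#include <assert.h>

/* Hypothetical helper: upper bound on the threads per block for a compiled
 * kernel, given the size of the block's register file, the registers each
 * thread of the kernel uses, and the hardware's fixed thread limit. */
static unsigned
max_threads_for_kernel(unsigned regs_per_block, unsigned regs_per_thread,
                       unsigned hw_max_threads)
{
   assert(regs_per_thread > 0);
   /* Integer division: a thread that only partly fits cannot be scheduled. */
   unsigned by_regs = regs_per_block / regs_per_thread;
   return by_regs < hw_max_threads ? by_regs : hw_max_threads;
}
```

Under this model, a kernel using 255 registers gets 128 threads if 32k means 32768 and 125 if it means 32000, while a kernel using 63 registers on a 64k (65536) register file stays capped at the 1024-thread limit.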
15:32EdB: I'm really not used to those hardware details for the moment
18:27sravn: Anyone up for reviewing "drm/rockchip: fix build + warning" on dri-devel? I would like to fix the build
21:18mareko: does any driver lower uniforms to literals at draw time? I'll probably implement it
21:27imirkin: mareko: i'm fairly sure nvidia driver does some amount of that, esp for uniforms which affect control flow
22:56mareko: is it possible to lower uniforms to literals in vulkan drivers? do vulkan apps make it easier or harder?
22:56imirkin: vk has a thing called "specialization" afaik
22:56imirkin: which was meant for this sort of thing
23:03jekstrand: mareko: Not really possible
23:05mareko: jekstrand: I guess not with constant buffers, but what about push constants? you can certainly read them at draw time
23:09jekstrand: mareko: I suppose you could but shader re-compiles are generally considered bad in Vulkan. The whole API is designed to avoid them.
23:09jekstrand: Well, not the whole API but certainly the pipeline compilation part
23:10karolherbst: I think drivers won't care about this as long as it gives higher runtime perf...
23:10karolherbst: and there is always this game being stupid
23:10jekstrand: Also, basically no one uses push constants for anything interesting
23:12jekstrand: mareko: Is there some use-case you have in mind?
23:13mareko: jekstrand: I'll implement it for radeonsi, but it would be interesting to know if radv can use it too
23:13bnieuwenhuizen: I'd say that specialization constants are the right mechanism and we already do that
23:13jekstrand: Honestly, I'd be more interested in sharing between radeonsi and iris. There's a decent chance we could use it.
23:14bnieuwenhuizen: jekstrand: talking about specialization constants we have a first user putting a SSBO physical device address in there :P
23:14jekstrand: bnieuwenhuizen: Really? I thought the GPU-assisted validation layers already did that.
23:14jekstrand: And RenderDoc
23:15bnieuwenhuizen: ah, maybe I'm wrong about it being the first
23:15jekstrand: I'm not sure if either of those actually use spec constants for it but both do put it in the shader as an immediate of some form.
23:15bnieuwenhuizen: I didn't know the others did, that is going to mess with our disk caches
23:15jekstrand: bnieuwenhuizen: But now you've got me curious. What app?
23:15bnieuwenhuizen: jekstrand: printf-debugging in vkd3d
23:16jekstrand: bnieuwenhuizen: Seems reasonable
23:18jekstrand: Using BDA to avoid adding descriptors seems quite popular for tools
23:18jekstrand: I think the Vulkan printf layer is going to do it too
23:22mareko: jekstrand: sharing with iris is difficult; the NIR passes can be shared, but the specialization at draw time will use the asynchronous stall-free shader compilation in radeonsi, which is kinda crazy
23:27Plagman: is any of the async shader compilation infrastructure reusable?
23:27Plagman: we should use it in radv with the no-optimize bit
23:27Plagman: and maybe spec constants down the line
23:32bnieuwenhuizen: I think the problem with spec constants and delayed compilation is that they're a bit too flexible. Like you can use spec constants for array sizes or the workgroup size of a compute shader
23:32bnieuwenhuizen: that is going to be pretty hard to deal with in a generic shader binary
23:39Plagman: but you'd at least have the option of figuring out it could be a simple branch and use that early pipeline instead
23:40Plagman: you can just as well bail out of that path and go back to the current approach