00:00 anarsoul: yeah, looks like he got it working
00:05 Shibe: is anyone in the know about what all this microsoft "hardware gpu scheduling" about here?
00:05 Shibe: is about*
00:05 Shibe: is it even necessary for DRM drivers, as in will we see it there later?
00:07 jenatali: Hardware GPU scheduling is just about offloading some of the OS's scheduling policies onto the GPU's command processor
00:10 Shibe: yup, i believe you were the person i talked to the other day
00:10 Shibe: it sounds neat, but i guess is it necessary for DRM drivers or was it purely to replace the old crusty vista scheduler on windows
00:11 apinheiro: jekstrand, I think that I got working what you mentioned on the afternoon, if you are curious, this is the main (small) commit:
00:11 apinheiro: https://gitlab.freedesktop.org/apinheiro/mesa/-/commit/ad22ca399848e0e988712a074c54343cb91e39d4
00:11 apinheiro: and this an exra fix I needed to add
00:11 apinheiro: https://gitlab.freedesktop.org/apinheiro/mesa/-/commit/b9a96e5ee011fab0ab2169495e3f5eebba625ed0
00:11 jenatali: I'm a Windows dev, not really too familiar with Linux's GPU scheduling, but high level for us, it wasn't strictly necessary, though it provides opportunity to get some nice perf/power improvements
00:13 apinheiro: with that in place tomorrow I will try if I can get the copy test working with the ideas you had
00:13 apinheiro: if not, I would need to go to check why spilling is not, well, spilling for that test
00:29 jenatali: karolherbst, jekstrand: I think https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5891 is good for another pass of review
00:30 jenatali: I didn't move any existing nir compiler options to the new lowering pass we talked about, because the existing options really do seem driver-specific, as opposed to API-specific
00:53 Shibe: jenatali: interesting! how does it work internally, do you do the scheduling work on some of the gpu's shader cores, or is there some special hardware block to do it?
00:53 jenatali: Shibe: I'm a D3D (usermode) dev, not a kernel dev, so I don't have all the insights. As I understand it, we leave it mostly up to the driver to choose where the scheduling code runs
01:37 jekstrand: apinheiro: Cool!
01:53 alyssa: jekstrand: ....and now you're a CL dev ;)
01:54 alyssa: * jenatali
01:55 jekstrand:kicks his branch of to Jenkins again and hopes Jenkins enjoys it more this time than last.
01:57 jenatali: jekstrand: Clearly our usernames are alphabetically too close
01:58 jekstrand: Yeah... karolherbst and Kayden get me every time too.
01:58 alyssa: jenatali: sorry, jekstrand wins out. Everyone knows names closer to the beginning of the alphabet wins out.
01:58 alyssa: That's how it works.
01:58 alyssa:nods to her A name
01:58 jenatali: Yep, that's a fact
01:59 jekstrand:considers changing his name to Aaaron
01:59 jenatali: A-a-ron
01:59 jekstrand: Why three "a"s? Just to annoy all the Aarons out there
02:00 jekstrand: I have a friend who always puts his name in things as _daniel just for sorting purposes...
02:03 alyssa: _alyssa, hmm...
02:11 imirkin: jekstrand: or go with bobby tables...
02:11 jekstrand: imirkin: Always a good choice!
02:18 anholt: anarsoul: looks like the trick was "lower temps to if ladders (before int_to_float) instead of regs (which has to be after int_to_float)"
02:39 jekstrand: karolherbst: It seems to be early return from inside a nested loop that's throwing it off
03:53 jekstrand: karolherbst: Found it
03:53 jekstrand: karolherbst: Not sure how to fix it yet though
03:55 jenatali: jekstrand: I think I hear a 2.5-hour echo
03:55 jenatali: Er, 4.5-hour
03:55 jekstrand: hehe
03:55 jekstrand: I've got a 13-node CFG where it fails.
03:56 jekstrand: It's totally the loop optimization he talks about at the end of the HTML
03:56 jenatali:nods
03:56 jenatali:hasn't read the paper
03:56 jekstrand: All you need is a loop with two "if (foo) { /* real stuff */ break; }"
03:57 jekstrand: I'm kind-of surprised that nothing in the OpenCL CTS tripped it up
03:57 jekstrand: And only two tests in the Vulkan CTS.
03:57 jenatali: I haven't run the full CTS through the new version, though the v1 of that MR passes the full CL CTS
04:03 jekstrand: karolherbst: I've pushed my for/karolherbst branch. Don't grab anything from it just yet but feel free to give it a read. I've improved the readability a bit, I think. Hopefully, I'll squash the bug tomorrow. I know very precisely what's going wrong now.
04:04 hanetzer: agd5f_: hey. just to make sure I understand you, you're wanting 1. kernel your 'old fix' patch applies to tested and 'earliest kernel with new fix' patch tested?
04:30 agd5f_: hanetzer, right. try the patch that went upstream with an older kernel where the original patch applied and see if it also fixes the issue
04:30 hanetzer: ohhh, so if say, 5.7.5 will accept the old patch, try the new patch on it.
04:31 agd5f_: and if the old patch worked on that kernel version
04:33 hanetzer: gotcha. so three 'works' booleans. 5.7.5 with old patch, 5.7.5 with new patch, and 5.7.6 with new patch
04:33 hanetzer: and maybe we'll do a proper bisect at some point after narrowing stuff.
04:38 hanetzer: agd5f_: oh, and disclaimer. I don't think I tried the old patch with my current gpu (had to upgrade to get enough dp ports and gpu power to drive both displays at max powah
04:39 hanetzer: so it could theoretically be a gpu-specific issue :)
06:10 hanetzer: agd5f_: so. 5.7.5-gentoo with the old patch, no workie. similar failure, maybe not exactly. will have to investigate.
07:17 hanetzer: rak-zero: oh hey, didn't know you hung out here :)
07:18 rak-zero: hey hanetzer, yeah im broadly interested
07:18 rak-zero: https://www.youtube.com/channel/UCrv269YwJzuZL3dH5PCgxUw <-- i try to solve something similiar to this
07:21 rak-zero: channel was created on my birthday lol
07:21 rak-zero: (dri-devel)
07:21 rak-zero: mmmmh
07:30 MrCooper:has flashback of when the DRI project held weekly meetings on this channel
07:31 rak-zero: https://www.youtube.com/watch?v=kEB11PQ9Eo8 2 years ago today
07:31 MrCooper: (some 15-20 years ago)
07:34 rak-zero: it was created 14 years ago
07:34 rak-zero: 13,8 years ago to be exact
07:38 bbrezillon: pendingchaos_, karolherbst, jekstrand: any comments/objections on the latest version of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4917 ?
08:56 tzimmermann: danvet, do you have comments on the latest iomem ptr patches?
09:52 danvet: tzimmermann, which series?
09:52 danvet: I seem to be blind again ...
10:07 tzimmermann: danvet, this one: https://lore.kernel.org/dri-devel/20200806085239.4606-1-tzimmermann@suse.de/
10:11 bbrezillon: jekstrand, karolherbst: I realize we don't have an glsl_type::explicit_align() method, and if we really want to stop using size_align() funcs and move to explicit types in https://gitlab.freedesktop.org/mesa/mesa/-/commit/44e8d7708eec989d8b6b9bcdbd69b6deb25b0ac1?merge_request_iid=5900, I think I need a way to retrieve the explicit alignment
10:12 sravn: tzimmermann: iomem ptr patches is also on my backlog, will try to take a look tonight
10:13 tzimmermann: sravn, thanks. the patchset is still RFC, but it should be complete enough to build on x86_64
10:13 danvet: tzimmermann, ah missed it, will take a look
10:14 tzimmermann: sravn, danvet, if we go with this approach, i'd at least propose something similar to dma-buf's vmap/vunmap.
13:27 tjaalton: eric_engestrom: hi, looks like '-Ddri-drivers=nouveau, i915, i965, r200, r100,' is now invalid with 20.2, assuming it's because of d32144602c1dfd?
13:31 eric_engestrom: tjaalton: yup, you have a trailing `,`
13:31 shadeslayer: eric_engestrom: could you assign https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6125 to Marge Bot please :)
13:32 eric_engestrom: shadeslayer: let me have a look first, but I'm on a conference call right now
13:32 shadeslayer: eric_engestrom: sure, not in a hurry :)
13:45 tjaalton: eric_engestrom: makes it harder to build an array which supports different archs/kernels..
13:47 tjaalton: assuming having a leading ',' is just as broken now
14:12 dcbaker[m]: tjaalton: we can probably fix that in meson itself, can you file an issue?
14:21 tjaalton: or maybe I'll just rework it on the packaging
14:21 kisak: tjaalton: in the mean time, might be easier to sed away the trailing comma instead of carefully crafting the variable? 's/,$//'
14:22 tjaalton: maybe, unless this ends up looking less ugly
14:25 jekstrand: bbrezillon: Hrm...
14:25 jekstrand: bbrezillon: Maybe?
14:25 jekstrand: bbrezillon: At least for struct types, we do.
15:03 jenatali: Hm, looks like https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6112 added a test that doesn't build on Windows but doesn't have any build gating to prevent it, so it just errors :(
15:12 jenatali: karolherbst: Finally got around to rebasing https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5242, should be ready for more review or testing
15:12 jekstrand: karolherbst: I've got the bug fixed, I think. Running through Jenkins again.
15:13 karolherbst: cool
15:53 jenatali: Damn, the image patches broke freedreno Vk somehow
15:59 jekstrand: :(
15:59 tomeu: jenatali: if you tag each of your commits, you can easily bisect by starting pipelines for each of them
16:00 tomeu: and knowing the exact commit could give enough clues to fix it?
16:00 jenatali: tomeu: That's helpful, but in this instance, I'm pretty sure I already know which commit, I'm just not sure how :)
16:00 jekstrand: Did it break Intel?
16:00 jekstrand: That might be easier to debug
16:01 jekstrand: But we're not in that CI system
16:01 jenatali: Dunno :) I don't have access to Intel CI
16:04 jekstrand: craftyguy: ^^
16:05 jekstrand: jenatali: What MR?
16:05 jekstrand: jenatali: Just thinking that might be easier as there's a decent chance you at least have access to hardware. :)
16:05 jekstrand: Though, you probably have some Qualcomm hardware kicking around....
16:05 jekstrand: Not sure how easy it is to run Linux on those devices
16:07 craftyguy: jenatali: I can give you access to trigger mesa testing jobs in the intel CI, if you're interested I can DM you to work out the details
16:08 jenatali: craftyguy: Sure - not sure how often I'll end up using it, but it couldn't hurt?
16:08 jenatali: jekstrand: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5242
16:32 jekstrand: jenatali: I've got it running in CI now
16:32 jenatali: jekstrand: Thanks. Looks like I broke ImageFetch somehow
16:34 jenatali: Oh, I see it :)
16:37 jekstrand: karolherbst, jenatali: I just force-pushed my for/karolherbst branch. It now passes the full Vulkan CTS with unstructured forced on
16:37 jenatali: :D
16:37 karolherbst: nice :)
16:37 jekstrand: karolherbst: I don't expect anything more than an acked-by or tested-by on the patches from me.
16:37 karolherbst: I am fine with that
16:38 jekstrand: karolherbst: THere are a few more fixup ones in there
16:38 karolherbst: okay, cool. Will take a look and merge it into my branch
16:38 jekstrand: karolherbst: Also, it's based on my shamrock branch; please don't include the two iris patches in your MR
16:38 jekstrand: jenatali: I don't know how much testing you want to do on it.
16:39 jekstrand: I'm mentally debating between landing ASAP and making sure it's tested well
16:39 jenatali: jekstrand: I can at least pull it into our tree and run it through some basic tests
16:39 jenatali: A full CL CTS run takes a long time though, and I doubt it'd really find issues - the CL CTS doesn't do a lot of crazy control flow in its test kernels
16:40 vpandya_: https://www.irccloud.com/pastebin/gJ26YKGE/
16:40 jekstrand: jenatali: How about this: Do whatever testing makes you feel comfortable slapping a T-B on my patches. :-)
16:40 jenatali: jekstrand: Works for me
16:40 vpandya_: Hello, I found src/amd/vulkan/radv_pipeline.c
16:40 vpandya_: 4998:VkResult radv_CreateGraphicsPipelines(
16:40 vpandya_: entry point but I don't who is caller to this function.
16:40 vpandya_: Can someone please explain how this works?
16:40 vpandya_: P.S I have never worked with driver.
16:40 jenatali: I'll get to that in a little bit
16:41 jekstrand: jenatali: Maybe let karolherbst do the merge first, assuming he's up for that
16:41 jekstrand: Or I can do the bit of squashing and push karolherbst's branch if he'd rather
16:41 jenatali: I was hoping he'd get to that before I had time to pick it up anyway :)
16:41 jekstrand: Ok
16:41 karolherbst: jekstrand: you can do it as well if you want to :p
16:42 jekstrand: karolherbst: I'll just push your MR branch then
16:42 karolherbst: yep
16:50 pendingchaos: vpandya_: the caller is the Vulkan loader's vkCreateGraphicsPipelines() (and I think the loader obtains the function pointer for it with RADV's vk_icdGetPhysicalDeviceProcAddr())
16:51 jekstrand: karolherbst: Updated. All patches are now either authored by me or RB me.
16:51 karolherbst: cool
16:51 jekstrand: karolherbst: As soon as you and jenatali are happy with it, feel free to merge.
16:51 jekstrand: \o/
16:51 jekstrand: I'm kind-of hoping it'll land this weekl.
16:52 jenatali: Exciting :)
16:52 jekstrand: Now time to figure out how to implement generic pointers.
16:52 jenatali: I should have results this afternoon
16:52 jekstrand: *sigh*
16:52 jenatali: :(
16:52 jekstrand: I think my plan is to let nir_deref_instr::modes be a bitmask of what modes it might be.
16:53 jenatali: I feel like there's some more fundamental pieces that are missing before jumping to generic pointers - unless you've got a specific use case already in mind
16:53 jekstrand: Then generics become global | shader_temp | function_temp | shared
16:53 jekstrand: And we can have a pass which restricts them down based on parent derefs
16:53 jenatali: Clever
16:53 jekstrand: And, hopefully, by the time we get done with function inlining and optimization, none of them are ever left. :-)
16:53 jenatali: Asking selfishly, any chance you could wait a little bit until we're more closely synchronized with upstream? :)
16:53 vpandya_: @pendigchaos I found https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/amd/vulkan/meson.build#L21
16:54 jenatali: Constantly chasing giant refactorings is getting a little tiring :P
16:54 clever:waves
16:54 jenatali: clever: Hah, sorry, didn't realize that was a username :P
16:54 jekstrand: jenatali: Yeah, I feel you.
16:54 clever: happens fairly rarely
16:54 karolherbst: jekstrand: we might want to enforce that hw is able to map all memory into the global address space
16:54 jekstrand: karolherbst: Not all hardware can
16:54 karolherbst: :/
16:54 jekstrand: karolherbst: On Intel, we have to switch on some pointer bits
16:54 jenatali: For us, we'd generate an if-ladder
16:55 karolherbst: jenatali: right...
16:55 jenatali: (Though we don't really care about generic pointers)
16:55 karolherbst: so we need the if ladder fallback anyway
16:55 jekstrand: The way I'm thinking of doing it is to stuff something in the top 2 bits of the pointer
16:55 jekstrand: 0/~0 for global, 1 for shared and 2 for scratch
16:55 jekstrand: Or something like that
16:55 karolherbst: jekstrand: the issue is just that generic pointers can leave the kernel
16:56 jenatali: karolherbst: Pointers leaving the kernel isn't super useful unless you have SVM
16:56 karolherbst: right
16:56 karolherbst: but you could also just put them into shared memory
16:56 karolherbst: just leaves the thread then
16:56 jekstrand: And if you let a shared or shader_temp pointer leave the kernel, it's an invalid pointer the moment the workgroup ends.
16:56 karolherbst: not quite sure what the expectation would be then
16:57 jekstrand: I think it'd be the same as any other garbage pointer deref
16:57 jekstrand: jenatali: If you're ok with the top-two-bits convention I mentioned above, we could probaly generate the if-ladder directly in nir_lower_explicit_io.
16:58 jenatali: jekstrand: Sure, seems fine to me
16:58 jekstrand: jenatali: Effectively, "if ((ptr >> 62) == 1) load_shared(); else if ((ptr >> 62) == 2) load_scratch(); else load_global();"
16:58 jenatali: Though again I don't particularly care about generic pointers :P but if we get it for free, I don't see the harm in flipping a cap to light it up
16:59 jenatali: Yeah, makes sense
16:59 jekstrand: We have to keep 0/3 both reserved for global because the top bits there can be either 0 or ~0
17:00 jekstrand: Not sure what to do about NV where they have the crazy address space mapping
17:00 jekstrand: I guess you'd likely want to just have a load_scratch_base_ptr and load_shared_base_ptr to start the access chain or something like that
17:01 jekstrand: jenatali: I care about generic pointers only so far as I need them to compile these stupid kernels I'm playing with.
17:01 jenatali: Ah, gotcha
17:02 jenatali: Just like how I needed images for the kernels I care about, even though they're technically optional for CL
17:03 karolherbst: jekstrand: yep, that was my plan
17:04 karolherbst: having some intrinsics to convert into global address
17:04 karolherbst: and back again
17:04 jekstrand: karolherbst: I've already added a scratch_base_ptr intrinsic which you get if you use any of the _global address modes for scratch lowering.
17:04 jekstrand: karolherbst: It wouldn't take much to add one for shared
17:06 karolherbst: yeah, I guess :)
17:19 jekstrand: karolherbst: What happens if the .cl file I pass in has #includes?
17:20 jenatali: jekstrand: By default, they're searched on disk, based on "-I" options passed to the compile/build functions
17:20 jekstrand: hrm....
17:20 karolherbst: jenatali: then it includes :p
17:20 jekstrand: I think I need to make my test app smarter then
17:20 jenatali: If you use CL1.1's separate compilation APIs, you can create cl_program objects that can replace #includes
17:20 karolherbst: clang even adds implicit includes
17:21 jenatali: Not sure if Clover supports that feature though :)
17:21 karolherbst: #include <opencl-c.h> I think is the one clang needs :)
17:21 karolherbst: jenatali: doubtful :D
17:21 jenatali: karolherbst: Yeah, I had to do some refactoring to get it to not look for <opencl-c.h> on disk...
17:21 karolherbst: :D
17:21 jekstrand: Is there a simple CL program I can grab from somewhere that takes a bunch of -I and -D args and runs it through the driver without executing it?
17:22 jenatali: I'd say the CL CTS test_compiler, but that does definitely execute it afterwards
17:22 karolherbst: jekstrand: you can just run clang directly to generate llvm ir and push it through llvm-spirv :p
17:28 jenatali: Okay, my libclc MR should be good to review as well, though it can't land until the structurizer is in and libclc upstream is actually updated...
17:34 jekstrand:loves open_memstream
17:39 HdkR: Oh! That is actually really neat. That might be better than a tmpfile hack I'm currently using :D
17:40 mattst88: yeah, that's an awesome function
17:41 jekstrand: I always forget about it and get tempted to start making my own stream data structure
17:44 karolherbst: ohh btw.. how do people feel about the tegra driver advertising "NVIDIA" as the opengl vendor?
17:45 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/gallium/drivers/tegra/tegra_screen.c#L62
17:47 anholt: jenatali: I'll see about bisecting that vk fail
17:47 jenatali: anholt: I found it, no worries
17:47 anholt: oh, cool
17:47 jenatali: If you were curious: CL allows sampling with integer coords, so I mistakenly tried to convert all tex ops with integer coords to float
17:48 jenatali: But things like ImageFetch should actually accept integer coords
17:48 anholt: heh
17:55 jekstrand: karolherbst: Is there something wrong with clGetProgramBuildInfo?
17:56 jekstrand: CL_PROGRAM_BUILD_LOG in particular
18:03 karolherbst: jekstrand: could be?
18:04 karolherbst: but I think retrieving logs should work
18:05 jekstrand: karolherbst: Nah, my log buffer was too small. :)
18:05 jekstrand: Lots of logs
18:05 karolherbst: :D
18:06 karolherbst: just wanted to point you towards this MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5038
18:06 jekstrand: I grabbed your HMM tests and am trying to turn it into a "run my shader please" program
18:09 jekstrand: karolherbst: How do we pass the CL version to LLVM to do the compile?
18:09 karolherbst: -cl-std=CL2.0
18:10 jekstrand: karolherbst: How do I request CL 2.0 on driver init?
18:11 jenatali: jekstrand: You don't? The driver either supports it or doesn't, there's no modality
18:11 jenatali: Oh, nevermind
18:12 jekstrand: Woo! Down to just one error
18:12 jekstrand: input.cl:158:37: error: use of undeclared identifier 'MAX_HW_SIMD_WIDTH'
18:12 jekstrand: Which is totally because someone somewhere isn't including something
18:14 jekstrand: This is fun:
18:14 jekstrand: compile-test: lib/SPIRV/libSPIRV/SPIRVEntry.h:717: void SPIRV::SPIRVComponentExecutionModes::addExecutionMode(SPIRV::SPIRVExecutionMode*): Assertion `IsCompatible(ExecMode, (*I).second) && "Found incompatible execution modes"' failed.
18:14 karolherbst: yeah.. I guess generic is just super broken :p
18:14 karolherbst: ohh wait
18:14 karolherbst: that's something else
18:14 karolherbst: strange
18:15 jenatali: jekstrand: What execution mode is it trying to add?
18:16 jekstrand: Unclear. I don't have those symbols
18:16 jekstrand:rebuilds spirv-llvm-translator
18:19 jekstrand: Ugh... spirv-llvm-translator is blowing up on Intel extensions.
18:19 jekstrand: *sigh*
18:21 karolherbst: :D oh well
18:21 jekstrand: It doesn't like ExecutionModeLocalSize
18:21 karolherbst: mhhh, strange
18:21 karolherbst: I thought that's supposed to work
18:21 jenatali: That... should definitely be handled
18:22 karolherbst: jekstrand: I am on 1e661b2bfb9782ed3d7619a11b198e84abcd7eba
18:22 jekstrand: same
18:22 karolherbst: mhh
18:25 jenatali: Looks like the complaint is that there's two incompatible execution modes
18:25 jenatali: The question is what's the other one
18:26 jekstrand: They're both LocalSize
18:26 jekstrand: And they're the same
18:26 jenatali: How'd you get two of them?
18:26 jekstrand: Good question
18:26 jenatali: That seems odd...
18:28 jenatali: jekstrand, karolherbst: Managed to get your new-new version of the structurizer picked into our tree: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/245
18:29 jenatali: Passed our trivial unit tests, going to run it through some CTS
18:29 jekstrand: \o/
18:35 bl4ckb0ne: is there a way to force the recreation of a container with the ci-templates? i added a package to it and the checks is still passing
18:36 mareko: jekstrand: I'm planning to encode in/out variable info in load_input/output intrinsics and remove all in/out nir_variables from the IR, and nir_shader_gather_info will scan the intrinsics, any concern with that?
18:36 mareko: cwabbott: ^^
18:36 jekstrand: mareko: Might need some nir_shader_gather_info changes but I see nothing wrong with it.
18:36 jekstrand: We typically leave the variables around longer so I'd like that path to keep working.
18:37 jenatali: Us too :)
18:37 jekstrand: zmike: I was thinking a bit about your UBO/SSBO problems a bit last night
18:37 zmike: uh oh
18:37 dcbaker[m]: bl4ckb0ne: delete the container from your registry
18:38 bl4ckb0ne: oh right that can do it
18:38 bl4ckb0ne: thanks
18:38 jekstrand: zmike: I think what you probably want to do is to disable the nir_lower_explicit_io call in gl_nir_lower_buffers and use derefs directly.
18:38 bl4ckb0ne: its time to make that xdg-shell build pass now that the CI is up to date on waffle
18:39 jekstrand: zmike: Thenk you coule map GL UBOs and SSBOs directly to SPIR-V like A Vulkan app would.
18:39 zmike: jekstrand: I'd considered that as another option during the initial UBO impl since it's a thing for samplers
18:40 zmike: I don't remember why I didn't go with it
18:40 jekstrand: It sounds a lot easier to me than just having a array descriptor where each buffer is an array of vec4s
18:41 zmike: uhhhh well that's something else entirely
18:41 zmike: the deref usage and the actual loading are more or less unrelated
18:41 jekstrand: ?
18:42 zmike: you're talking about using derefs for ubo/ssbo initially to replace the dynamic access pass i wrote
18:42 zmike: but then the array of vec4 is the ubo/ssbo memory layout, which only matters for how the memory is accessed once I know the block id
18:42 jekstrand: I think it might help with both?
18:43 bl4ckb0ne: i think it pull the one from the upstream repo by default https://gitlab.freedesktop.org/bl4ckb0ne/waffle/-/jobs/4073895
18:43 zmike: it seems like two separate things to me...the deref thing is just the OpAccessChain base pointer
18:43 zmike: and the vec4 layout is the index array
18:44 zmike: using derefs sure, I could create arrays of ubo/ssbo as a single variable, but then that's still just adding another index into that OpAccessChain index array
18:44 zmike: so ultimately it's separate in how I'm calculating the offset to access beyond that
18:44 jekstrand: Yeah, I guess you could separate them
18:45 zmike: well I'm assuming I'd get a deref_array in this case, which would be another OpAccessChain that I'd then get passed for e.g., load_ubo
18:45 jenatali: karolherbst, jekstrand: New structurizer looks good on CL CTS's test_basic - I don't really think any of the other tests throw any kind of complex control flow at the compiler, so that's probably the best I'll be able to do
18:45 zmike: so at that point I guess all I'd be doing is removing the first index I'm using there
18:45 jenatali: I did throw it through a Photoshop filter too just for fun and that's fine too, so I'm good with adding my Tested-by
18:46 jekstrand: jenatali: \o;/
18:46 HdkR: jekstrand: Dang. You can't call fileno(3) on the FILE* returned from open_memstream. That explains why I didn't use it
18:47 jekstrand: HdkR: Sorry to dissapoint
18:50 zmike: jekstrand: would there be any perf benefit to moving to the deref method vs the pass in my MR?
18:51 jekstrand: zmike: Unknown. Depends on the underlying driver, I suppose.
18:51 jekstrand: If it's mesa, probably not.
18:51 jekstrand: zmike: It just seems like it'd be simpler
18:51 jekstrand: Why lower to an index+offset and then back to derefs when you can just use the derefs
18:52 jekstrand: I guess you probably have to handle index+offset for cbuf0
18:53 zmike: hm
18:54 zmike: I suppose I'll investigate it since I don't actually need the dynamic lowering until zink hits 4.0 outside of my branch, which is likely not going to be anytime soon
18:55 zmike: thanks for the tip
18:55 jekstrand: np
18:56 zmike: certainly it would make the nir output look nicer on the other side of vtn than what I have now
19:04 jekstrand: jenatali: Looks like maybe an issue with a pre-declaration of a function and the function itself
19:06 jekstrand: No, that's not it either
19:06 jekstrand: It's only showing up once
19:06 jekstrand: In the source, that is
19:10 jekstrand: And only once in LLVM
19:10 jekstrand: And one SPIRVFunction
19:14 jekstrand: The kernel is listed in the module multiple times...
19:23 jekstrand:tries adding a generic "don't stick on duplicate execution modes" thing
19:23 jekstrand: Except SPIRV::SPIRVExecutionMode doesn't support ==. Of course, it doesn't.
19:35 jekstrand: For some reason, my llvm::Module has 3 functions, two of which are duplicated in the list twice.
19:35 jekstrand: For a total of 5
19:36 jekstrand: Is this even legal LLVM?
19:36 jekstrand:is clueless
19:37 jekstrand: And by "duplicated", I mean the same llvm::Function pointer twice
19:41 jekstrand: And the order in which I place the kernels in the .cl file determins which ones get duplicated and how many times.
19:41 jekstrand:is thoroughly wierded out
19:43 jenatali: jekstrand: Are you calling one kernel from another?
19:43 jekstrand: jenatali: Nope
19:43 jenatali:shrugs
19:44 jekstrand: LLVM anything is soooooo much code!
19:45 pmoreau: 😲 This is still in LLVM IR, right?
19:45 jekstrand: pmoreau: yes
19:45 pmoreau:hopes it’s not coming from some bugs in spirv-linker
19:45 jekstrand: pmoreau: Very much still in LLVM
19:46 pmoreau: Okay cool, not my bugs then :-D
19:46 jekstrand: But, for some reason, the function is duplicated in the module.
19:46 jekstrand: pmoreau: Do you know SPIRV-LLVM-Translator well enough to have any clue why?
19:46 pmoreau: I know it a bit, but that’s about it…
19:46 pmoreau: If you can share the test program, I can have a look at it
19:47 jekstrand: I can't. :-(
19:47 jenatali: jekstrand: Is this happening *before* sending to the translator or only after?
19:47 jekstrand: Let me see if I can make something simpler that reproduces it
19:47 jekstrand: jenatali: Trying to figure that out now
19:48 jenatali: jekstrand: My go-to for playing around with that kind of thing is godbolt, just throwing all the CL switches on the command line to inspect the bitcode output
19:51 pmoreau: jekstrand: FYI, you can use `CLOVER_DEBUG=llvm` to dump the LLVM IR before it gets to the translator, though you are maybe already doing that.
19:51 jekstrand: pmoreau: Really? That sounds useful
19:53 pmoreau: CLOVER_DEBUG takes a list of comma-separated flags, which can be `clc` (for the source OpenCL C), `llvm` (for the LLVM IR), `native` (not needed in your case), and `spirv` (which will dump the SPIR-V disassembly as outputted by the translator)
19:53 jekstrand: pmoreau: It looks like they each show up once in the LLVM at that point. :-/
19:54 pmoreau: Mmh, so the translator doing something funky then :-/
19:55 pmoreau: Or could be the regularizer from the translator…
19:56 jekstrand: Yeah, the translator must be doing something funky
19:56 pmoreau: You could try adding an extra print after the regularizer (or removing the call altogether) in `src/gallium/frontends/clover/llvm/invocation.cpp:442`.
19:58 pmoreau: (There is also `CLOVER_DEBUG_FILE` which you can set, rather than having the IR and other debugs outputted to `stderr`.)
19:58 jekstrand: pmoreau: After the regularize pass, it still only has one instance
19:59 jekstrand: It does, however, now have an entry to it in something that looks like a function table at the end.
19:59 jekstrand: No idea what's up with that
19:59 pmoreau: No idea either
19:59 jekstrand: It looks some SPIR-V module thing that's generated by the regularizer
20:00 pmoreau: I’ve been lucky enough to not have too many issues with the translator so far. 🤞
20:00 jekstrand: So, yeah, something in the translator is blowing up and duplicating my function
20:01 jenatali: jekstrand: https://gitlab.freedesktop.org/kusma/mesa/-/commit/e4db31b11e2fb57dd435c0f2bf22049742885ae9
20:02 jekstrand: jenatali: Oh, really? Nice!
20:02 pmoreau: You’re getting atwo functions for the price of one! You shouldn’t be complaining, that’s a great deal :-p
20:02 pmoreau: Oh, good to know!
20:02 jekstrand: Why isn't that upstream? :-(
20:02 jenatali: Yeah, sorry took me a while to remember that I had seen something related to regularizing
20:03 jenatali: jekstrand: We have a slightly different CLC -> SPIRV path for CLOn12 that's not currently code sharing with Clover
20:03 jekstrand: jenatali: Yeah, but it'd be really nice when you find those things if you make patches... :-)
20:03 pmoreau: I wonder if that behaviour changed in a later version, or if it had always been that way and I missed it.
20:03 jenatali: jekstrand: Yeah, yeah, I'm working on it :P
20:03 jenatali: That's my whole todo list for the rest of the day, is to send you more patches
20:04 jekstrand: lol
20:04 jekstrand:braces for the dump
20:05 jekstrand: Capability '38' is not supported
20:05 jekstrand: That would be GenericPointer
20:09 jekstrand: Are generic pointers 2.2 or 2.0?
20:10 Sachiel: 2.0
20:10 jekstrand: Ah, it's clover throwing a warning
20:11 jekstrand:adds a hack
20:15 jekstrand: Cool. Now it's failing to link
20:15 jekstrand: Maybe getting somewhere?
20:15 jekstrand: hrm...
20:16 pmoreau: Ah, I should have an easier time helping with that
20:16 jekstrand: I'm seeing a linking failure with
20:16 jekstrand: [Error] At word No.0: "Unresolved external reference to "sort8_descending"."
20:16 jekstrand: But sort8_descending isn't actually used by anything AFAICT
20:16 pmoreau: Mmh
20:16 karolherbst: jekstrand: you look like you are having fun :p
20:16 jekstrand: So it should get dead-coded
20:16 jekstrand: karolherbst: After a fassion
20:17 karolherbst: maybe we should run the spvtools::dce :p
20:17 pmoreau: AFAIR, the linker does not check whether functions are dead-code or not.
20:17 karolherbst: pmoreau: actually.. maybe we should run DCE on the spirv?
20:18 pmoreau: Maybe
20:20 jekstrand: Or run DCE on the LLVM
20:20 jekstrand: Though I guess you want to link first
20:20 karolherbst: 2.1 requires spirv support anyway
20:21 karolherbst:doens't trust llvm enough to run DCE
20:21 karolherbst: imagine they start removing blocks as well and do weird things.. no no
20:21 jekstrand: For me, it's not a problem of trusting LLVM so much as trusting that I know enough about LLVM to not mess it up. :)
20:21 jekstrand: karolherbst: Oh, come on.... You don't think our structurizer is up to it?
20:22 jenatali: jekstrand: Is that just a function declaration that's never called?
20:22 karolherbst: LD
20:22 karolherbst: :D
20:22 karolherbst: has a brilliant idea
20:22 karolherbst: llvm -> spirv -> nir -> optimize -> spirv -> llvm and run it somewhere and hope it's now faster than only using llvm :p
20:23 jekstrand: jenatali: It looks like it's an inline
20:23 jekstrand: which isn't ever used
20:23 jenatali: I don't see how there could be an external reference to it if it's never used...
20:24 jekstrand: jenatali: Hrm... It is used
20:24 jekstrand: But ultimately, it's all dead code
20:24 jekstrand: It's definitely included
20:25 pmoreau: The SPIR-V linker is relatively simple: it builds a table of all functions marked with the LinkageType Import or Export and tries to mix and match them. It will allow unresolved cases if you are building a library, since you might link some more stuff further down the road. But it does not look at whether functions are used or not.
20:25 jenatali: Right, the assumption is that it should only be declared as a needed import if it's actually used
20:25 jekstrand: I see this:
20:25 jekstrand: ; Function Attrs: convergent inlinehint nounwind
20:25 jekstrand: declare spir_func i32 @sort8_descending(i32) local_unnamed_addr #9
20:26 jekstrand: And a use of sort8_descending
20:26 jekstrand: But no declaration
20:26 jekstrand: So I don'tthink the linker is lying
20:26 jekstrand: But it's definitely declared
20:26 jenatali: jekstrand: How are you getting the definition?
20:26 jekstrand: Oh, wait... maybe it isn't
20:26 jenatali: :)
20:27 jekstrand: No, it's definitely in a header that should be getting pulled in
20:27 pmoreau: How is the SPIR-V looking? I wonder if it’s something with the types which aren’t considered the same, something similar to https://github.com/KhronosGroup/SPIRV-Tools/issues/1145
20:28 jenatali: Sounds like the LLVM doesn't have a definition, so I don't think it's a translator/linker issue
20:28 jekstrand: But it doesn't exist in the LLVM so clang isn't generating it for some reason
20:28 jekstrand: Maybe clang is dead-coding half the stuff but not the other half?
20:29 jenatali: Try turning off optimizations?
20:29 jekstrand: how?
20:29 jenatali: -O0?
20:29 jekstrand: Here's a thing which should all give us hope for the future:
20:29 jekstrand: inline uint generic_atomic_add(uint *p, uint val)
20:29 jekstrand: {
20:29 jekstrand: if (to_global(p) != NULL)
20:29 jekstrand: return atomic_add(to_global(p), val);
20:29 jekstrand: if (to_local(p) != NULL)
20:29 jekstrand: return atomic_add(to_local(p), val);
20:29 jekstrand: return 0;
20:29 jekstrand: }
20:37 jenatali: karolherbst: We talked about genericizing the printf lowering that CLOn12's using - I've got the already-generic patches for printf here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6308. Did you want me to try genericizing the printf lowering pass as part of this?
20:37 jekstrand: Running it through cpp with exactly the same args, sort8_descending is defined. :-/
20:38 jenatali: jekstrand: And running through C? Not cpp?
20:39 jekstrand: jenatali: ?
20:39 jenatali: I just could see weird language rule differences between C and CPP causing it to work with CPP, but CLC is based on C
20:39 jenatali:shrugs
20:39 jekstrand: Oh... I guess that's possible but it seems super unlikely
20:40 jekstrand: I wonder if clang is just not generating actual versions of functions it thinks should be inlined
20:40 jenatali: OH! Wait this sounds familiar, hold on
20:40 jenatali: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/432
20:41 jekstrand: -Dinline= it is!
20:41 jekstrand: That seems to do the trick.
20:43 jekstrand: Someone doesn't understand NoSignedWrap.....
20:44 karolherbst: jekstrand: wait...
20:44 pmoreau: IIRC it’s still something we need to accept on the clover side, right now we bail on all extensions. But Karol should remember best (and maybe even have patches).
20:44 karolherbst: jekstrand: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/fd440b794c53d17d4436729b577b555e8597da1d
20:44 karolherbst: we are just super picky
20:45 karolherbst: pmoreau: maybe we should just accept all and fail in spirv_to_nir :p
20:45 jekstrand: Clearly, I just don't have enough commits in my tree. :)
20:45 jekstrand: We should probably fail in clover
20:45 jekstrand: spirv_to_nir does not fail gracefully
20:46 karolherbst: right.. but we kind of need to coordinate what vtn supports or we just enable when we hit something? dunno
20:46 pmoreau: ^
20:46 pmoreau: How does clover know which extensions spirv_to_nir supports?
20:46 karolherbst: it doesn't
20:46 jekstrand: It's in the same code-base
20:47 karolherbst: vtn doesn't check extensions, does it?
20:49 jekstrand: Looks like my shaderc is too old
20:55 jekstrand: Why do all these projects use so much C++? They take forever to compile... :-(
20:56 karolherbst: jekstrand: because people write compiler in C++ :p
20:56 jekstrand: I could brew a coffee in the time it takes shaderc to compile.... at Portland/Seattle hipster speeds!
20:57 jekstrand: Still not as bad as the time I tried to download Amber over hotel wifi during a Khronos meeting.
20:57 karolherbst: but honestly.. do we have any compiler project which is used and is not C++ besides what we have in mesa? :D
20:57 jekstrand: It's > 1GB of source code once you pull all the deps.
20:57 karolherbst: ouch
20:57 pmoreau: :o
20:57 jekstrand: It does pull LLVM.... twice.
20:57 jenatali: Twice?
20:58 pmoreau: For twice the fun, I guess
20:58 jekstrand: Once for DXC and once for OpenCL
20:58 jenatali: Ah
20:58 jenatali: I hate that that makes sense
20:59 jekstrand: You and me both
21:00 karolherbst: and people prefer using llvm as gcc was apparently too painful to deal with :p
21:01 anholt: well, gcc licensing makes it a non-starter for library usage.
21:02 karolherbst: right.. but apparently llvms API is also way easier to use and whatnot
21:02 jekstrand: Oh, great, now I have a newer version of the validator and my SPIR-V doesn't validate. 😠
21:03 jekstrand: This is turning into quite the adventure
21:03 karolherbst: uggghhh :/
21:03 karolherbst:maybe should continue with fixing spirv-llvm-validator :D
21:03 pmoreau: What’s the issue?
21:03 karolherbst: I even added support to run CI with spirv-val
21:03 karolherbst: pmoreau: spirv-llvm being broken :p
21:03 pmoreau: :-D
21:03 karolherbst: it can generate quite a lot of invalid spirv
21:04 jenatali: Yes, yes it can
21:04 jenatali: jekstrand: Maybe just don't validate it? :)
21:04 karolherbst: at least it's easy to write tests
21:04 karolherbst: fixing... is a different story
21:04 pmoreau: > I even added support to run CI with spirv-val
21:04 pmoreau: I remember that
21:04 jekstrand: [Error] At word No.9556: "ID 4288[%lower_x_1] has not been defined
21:04 jekstrand: %4289 = OpCompositeInsert %v4uint %lower_x_1 %4213 0
21:04 karolherbst: jenatali: I think we do that in clover
21:05 karolherbst: ehhh...
21:05 karolherbst: I think I remember that one
21:05 karolherbst: this was something super silly
21:07 jenatali: FWIW you won't be able to pass the CTS with validation enabled, at least with the current translator
21:12 jenatali: Hm, I should probably see about getting developer permissions for Mesa so I can actually label all the MRs I'm creating without bugging people...
21:13 jekstrand: [Error] At word No.0: "Unresolved external reference to "_Z10atomic_minPU3AS1Vff"."
21:13 jekstrand: Do I need mangler patches?
21:13 jenatali: You need libclc
21:13 jekstrand: Oh
21:13 jenatali: Actually, hold on
21:13 jekstrand:just needs more patches
21:14 airlied: just merge all rhe branches
21:14 jekstrand:is going to personally upstream all of jenatali's patches by the end of the week from the looks of it
21:14 jenatali: No, I'd expect that to be an opcode if it was coming from libclc
21:14 jenatali: Oh, unless the translator just doesn't know about that function
21:15 jenatali: Oh, the translator's probably running in 1.2 mode instead of 2.0
21:15 jekstrand: The translator should be running in 2.0 mode
21:16 jekstrand: But maybe not
21:16 jekstrand: I don't know what I'm doing
21:16 jenatali: Wait, those atomics are in 1.2 so nevermind
21:16 jekstrand: This header file is re-declaring them for some reason
21:18 jenatali: Is that supposed to be operating on floats?
21:18 jekstrand: Uh... maybe?
21:19 jenatali: According to the CL spec there's only supposed to versions that operate on int/uint
21:19 jekstrand: There's an intel extension which adds atomic float support
21:19 jekstrand: Maybe I need to implement that in the translator?
21:19 jenatali: Ah, the translator probably just doesn't match that to an opcode
21:20 jekstrand: Hrm... It looks like there isn't a SPIR-V extension for that
21:20 jekstrand: WTH? We've had the hardware for years
21:21 jekstrand: And I know there's an OpenCL extension
21:21 jenatali: Apparently only for source and not for SPIR-V?
21:23 jenatali: jekstrand: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/master/test/InvalidAtomicBuiltins.cl :)
21:28 jenatali: I don't see it as an OpenCL extension in the spec
21:31 jekstrand: Yeah, there isn't one
21:31 jekstrand:doesn't know what's going on
21:31 jenatali: :)
21:44 jekstrand: Looks like I get to write an extension... And clc support and spirv-llvm-translator support and....
21:45 HdkR: Whoa, someone actually caring to write a CL extension, how rare ;)
21:47 jekstrand: I just don't like how much work this is looking like it's going to be
21:55 jekstrand: Are float atomics really different from integer for min/max? If you ignore NaN, I don't think so.
21:59 jenatali: jekstrand: Probably not
22:02 jekstrand: Ok, now I'm onto things I might actually be able to fix
22:03 jekstrand: jenatali: How does translating intrinsics to opcodes work?
22:03 jekstrand: jenatali: Seeing intel_sub_group_ballot
22:03 jekstrand: There is actually an INTEL extension for that
22:03 jenatali: jekstrand: It's hardcoded in the LLVM SPIRV translator
22:03 jenatali: There's a map of mangled name <=> opcode
22:03 jekstrand: Hrm....
22:03 jekstrand: Like maybe test/transcoding/sub_group_ballot.ll
22:04 jekstrand: Well, that's a test
22:04 jenatali: I know I've seen INTEL extensions in that codebase
22:04 jekstrand: Looks like there's some stuff in SPIRVToOCL12
22:04 jekstrand: But I want the other way
22:06 Sachiel: is the translator picking up the required extensions or you, like the cmd line tool, you need to be explicit about it?
22:06 jekstrand: jenatali: Do you happen to know where this table is?
22:06 jenatali: Let me find it again...
22:08 jenatali: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/master/lib/SPIRV/OCLUtil.h
22:10 jenatali: jekstrand: FYI if you wanted to, you could probably use an approach similar to what we do for libclc. For libclc we take an opcode and convert it into a call to a function that comes from a separate NIR shader, which is then inlined
22:10 jenatali: You could have a NIR shader with implementations of those functions, and then you just need to match the call to the impl
22:27 jenatali: Would someone be willing to Marge https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6303 ?
22:42 jekstrand: When is LLVM 11 supposed to be released?
22:43 jekstrand: Wait... they're on 12 already
22:43 jenatali: August 26 according to https://llvm.org/
22:44 jekstrand: ah
22:46 jekstrand:clones LLVM
22:46 jekstrand: How painful is it to build your own LLVM and build llvm-spirv-translator against it?
22:47 jenatali: Shouldn't be bad - I've only done it in-tree personally though
22:48 kisak: jekstrand: llvm reliably misses their target dates by a bit, and it makes sense git-master got version bumped right after the llvm 11 branchpoint
22:49 jekstrand: It looks like the KHR CL extension I want isn't in clang 10
23:10 airlied: jekstrand: once you config llvm its not too bad
23:14 dcbaker[m]: Except for how long it takes to compile
23:18 krh: jenatali: done
23:18 jenatali: krh: Thanks :)
23:19 clever: mripard: would you be able to answer some questions on the pi4 pixel valve muxing? i know that there are 3 HVS channels, but 5 PV's, and there is some form of mux between those, and then some more muxing (or just broadcast?) from each PV to a fixed list of output ports
23:20 krh: jenatali: thanks for fixing
23:38 jenatali: krh: Looks like freedreno's got a flaky CI job... assuming it passes on re-run would you mind re-Marge-ing?
23:51 Kayden: (looks like Rob took care of it)
23:51 jenatali: Yep, thanks robclark
23:52 robclark: np