00:31airlied: Venemo: I think the first patch is okay, the second might cause build issues on windows , I tend to use memset in that code
00:31airlied: Venemo: also that code isn't anything to do with softpipe
00:32airlied: it's gallivm/nir prefix, which is used by llvmpipe
01:52airlied: oh man that libresoc is like the blind leading the blind
01:53imirkin: airlied: https://xkcd.com/386/
01:57airlied: imirkin: my tablet doesnt let me send email :-p
01:58HdkR: But is their GPU any good? :P
01:58airlied: its like designing llvmpipe by feeling elephants
01:59imirkin: that guy like to write long emails, but i can't quite tell what the vision is.
02:46chrisf: they appear to be off in the weeds
02:55anholt: reinvent both larrabee and llvmpipe!
02:58imirkin: i know larabee was a commercial fail, but is there a clear explanation of why anywhere?
03:02imirkin: it doesn't *intrinsically* seem like fail ... is it because it didn't have fixed function rasterizer/clipper/etc?
03:06jekstrand: karolherbst: Nice!
07:36karolherbst: ugh... locking inside a shader is hard :/
07:45TacoCodedSalad: is it normal for libxcb to leak memory in eglMakeCurrent? at xcb_register_for_special_xge https://ghostbin.co/paste/pcjzv
07:47drawat: Is there a way to get visual of VKMS display?
07:54emersion: someone wanted to add writeback support
07:54emersion: but i don't think it's merged
07:54emersion: or even posted on the ML
08:26karolherbst: jekstrand: ehhh.. the lowering works on iris but not on nouveau :D
08:27karolherbst: somehow I don't get my threads properly managed and they end up spinlock looping forever
08:27karolherbst: I am sure the code is correct though :)
08:27karolherbst: now somebody just needs to write up 64 bit shared load/stores in intel
08:28pq: drawat, emersion, ask leandrohrb about VKMS writeback
08:28pq: he's been writing weston code to make use of it
08:38EdB: karolherbst: What I was saying is that it shoulb undedfined by default so the change should not be on clover side.
08:38EdB: Also I'm pretty sure the piglit test is rigth
08:40EdB: well think of that the test would work if defined to zero
08:42EdB: I will have a look to it
08:45EdB: karolherbst: the test is ok
08:56karolherbst: EdB: mhh.. maybe clang doesn't add the define for radeonsi then?
08:56karolherbst: I mean.. my patch works around a clang bug anyway
08:57karolherbst: or maybe some compiler option needs to be set somewhere?
08:57karolherbst: maybe it gets derived from the tripple
08:58karolherbst: ahhh.. yeah...
08:58karolherbst: clang adds it always when the tripple is spir
09:01EdB: I think clang is wrong here
09:01karolherbst: not necessarily
09:01karolherbst: if you emit SPIR claiming image support is correct
09:01karolherbst: or if you just compile your spirv
09:02karolherbst: but loading that spirv on a device not supporting images will obviously fail
09:02karolherbst: (if it ends up using images)
09:02karolherbst: I think we should be more explicit about how to configure clang long term
09:02karolherbst: and not use tripples at all except it's 100% correct
09:03EdB: if you want image support, you define it, it should assume not image support by default
09:03karolherbst: but llvm supporse images
09:04karolherbst: and libclang doesn't know we want to convert it to something else later
09:04karolherbst: we also say that we target spir, not whatever backend device we have
09:05EdB: let's assume you have a code that does 'int have_image = __IMAGE_SUPPORT__
09:06EdB: and then you do a 'if have_image' to chose a code path
09:06EdB: It use the wrong one if the device have no image support
09:07karolherbst: right, but that's nothing clang can fix
09:07karolherbst: we compile spirv afterall
09:08karolherbst: and spirv doesn't know about preprocessor stuff
09:09karolherbst: so all what llvm can do at this point is to compile with image support and generate a spirv claiming it does image does
09:09karolherbst: and it's up to whoever loads the spirv to the device to generate the proper ones
09:09karolherbst: of course there should be a nicer flag to switch it
09:09EdB: I don't think it's up to the compiler to auto define that kind of macro
09:09karolherbst: it actually is in this case
09:10karolherbst: llvm doesn't know anything about the runtime
09:10karolherbst: they won't remove it
09:10EdB: yes, but you still can pass the define at compile time is you want it
09:10karolherbst: removing it would mean breaking every user
09:11karolherbst: also, the spir target is quite well defined
09:11karolherbst: and the output will either use images or won't and you can always compile two versions
09:11karolherbst: which you have to do anyway
09:12EdB: but it should have be undefined but default :/
09:12karolherbst: yeah.. maybe
09:13karolherbst: EdB: https://reviews.llvm.org/D40252
09:13karolherbst: not much of a discussion going on there though
09:14EdB: if you keep it on clover, please make it spirv dependant
09:14EdB: clang also make mistake
09:14karolherbst: I think we could keep it for all targets though as it really doesn't hurt
09:19EdB: It's a spirv workaround, let's no give the impression that we sould undefine all current and future macro OpenCL adds
09:22karolherbst: I don't give the impression we should. And if libclang starts doing more of this I am also happy to add non spirv workarounds the same way
09:22karolherbst: I am just being pragmatic here
09:24EdB: I think clang mistaken having target image support and device image support here
09:34Venemo: airlied: ok, I will change it to memset and open a mr.
09:34Venemo: I know it's not a big deal, but I prefer to see a build without warnings :P
09:38karolherbst: EdB: oh well.. I let curro_ decide if I should move it into clover::llvm::compile_to_spirv or maybe even somewhere else
09:42karolherbst: EdB: btw, I will probably review your notification callback patch for program builds and I think it might make sense to move it into a separate MR + Fixes tags
09:42karolherbst: or is this a 1.2 feature?
09:43EdB: it's not 1.2 specific
09:43karolherbst: this bug is super annoying for CL CTS runs as you have to timeout :)
09:43EdB: I didn't think that 1.2 MR would last that long
09:43karolherbst: and the program tests are the only ones timing out
09:44karolherbst: I think...
09:44EdB: fot CTS, you also need the extra space stuff commit
09:45karolherbst: heh.. with 1.1 I don't run into this issue? mhh
09:46karolherbst: yeah.. it was in test_api, no?
09:46karolherbst: ohh no
09:46EdB: I don't remember
09:46karolherbst: it was compiler
09:46EdB: seems so
09:47karolherbst: ahh.. 5 fails in api without your patches :)
09:47karolherbst: let's see what I need
09:48karolherbst: maybe I review all of those then.. heh
09:48karolherbst: I just want to make sure they don't regress anything
09:48karolherbst: locally I have allocations, atomics and basic without fails :)
09:49EdB: I have CTS test them, so let's hope they don't add regression on your side
09:50EdB: but this is mostly core change so it should not impact target behavior
09:50EdB: I even add piglit tests
09:51karolherbst: mhh.. even with your patch get_kernel_arg_info and get_kernel_arg_info_compatibility still fail
09:51karolherbst: ohh. I see
09:52EdB: here we looks at metadata
09:52EdB: are they missing ?
09:53EdB: ha !
09:53EdB: I do remember that didn't implements them on spirv
09:54EdB: it didn't build on my machine at that time so I could test them, and pass empty data
10:23karolherbst: those fail because clang misses cl_khr_fp64 mhhh
10:37karolherbst: ohh wait.. mhh
10:38karolherbst: okay works with 1.2.. now the spirv bits
11:54eric_engestrom: so, I started doing stuff on blender again after not having touched it in like 6 months, and now it randomly freezes the whole WM/compositor (tried on both sway+xwayland and i3)
11:55eric_engestrom: sometimes the freeze happens after 30 seconds, other times I can use it for an hour or two before anything happens
11:56eric_engestrom: it freezes the entire UI with a loop of the last two frames, but I can still switch to the console and use the computer from there, and if I come back to the UI it just shows black
11:57eric_engestrom: and the UI still responds to the keyboard, so I can save my work in blender, quit the UI going the login manager and restart the WM or compositor
11:57eric_engestrom: I'm not sure where to start debugging this
12:00eric_engestrom: linux 5.8.0-5.8.3, mesa 20.1.4-20.1.6, blender 2.83.4-2.83.5; happy to bisect once I know which component to blame for this: kernel, mesa, something else?
12:12dj-death: eric_engestrom: if you don't see any GPU hang we can probably rule out pure driver stuff
12:12dj-death: eric_engestrom: might be a window system issue
12:13dj-death: eric_engestrom: blender is still X11 right?
12:14eric_engestrom: yup, there's on-going work to support wayland but it's not ready yet and not compiled in on the distro package I use
13:05DPA: Assuming in X11, I see a situation where drmModeAddFB/drmModeAddFB2WithModifiers work, and one where they fail with -2, but both instances are called
13:05DPA: with the same arguments, and the buffer has been allocated using gbm_bo_create with the same arguments, what else could be different?
13:07karolherbst: jenatali: what do you do to pass the api get_kernel_arg_info and get_kernel_arg_info_compatibility tests?
13:38jenatali: karolherbst: We fail those due to https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/599
13:39karolherbst: that works though
13:39karolherbst: they are just.. somewhere else
13:39karolherbst: jenatali: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/530105de2d96518a2e82db9b6047f116249a474f
13:40karolherbst: that works except of type_names
13:40karolherbst: you need to parse the functionparameter attributes
13:40karolherbst: NoAlias and NoWrite
13:40karolherbst: took me a while as well
13:40jenatali: Thanks for that
13:40karolherbst: just the way type names are stored is painful :/
13:40jenatali: You're having problems with names?
13:41karolherbst: I guess you also parse that hack the translator uses?
13:41karolherbst: that kernel_arg_type stuff
13:41jenatali: karolherbst: https://gitlab.freedesktop.org/kusma/mesa/-/blob/msclc-d3d12/src/microsoft/clc/clc_helpers.cpp#L235
13:42karolherbst: yeah.. that was my plan as well
13:42karolherbst: c++ is just being annoying and "core" functions like string::starts_with are C++20 :D
13:42karolherbst: ahh.. find..
13:43karolherbst: your code seems... wrong?
13:43karolherbst: str.find("kernel_arg_type.") should be 0 if the string starts with that
13:43jenatali: Yep, so if it doesn't start with it, we skip it?
13:44karolherbst: ohh.. right..
13:44karolherbst: that works for all args, right?
13:45jenatali: Our only failures are missing const/restrict right now
13:45jenatali: Which I'll try to hook up thanks to what you pointed out - after I'm awake enough to actually understand it :P
13:46karolherbst: yeah.. well I will hook up the type_names now :)
13:46karolherbst: I was just reveiwing the CL 1.2 stuff for clover and noticed that that's LLVM only for now
13:46jenatali: SPIR-V won't be far behind though :)
13:46karolherbst: spir-v can't do it anyway :p
13:46jenatali: Can't do it?
13:47karolherbst: it's not specified how
13:47karolherbst: we just rely on the translator adding that as a OpString
13:47jenatali: What 'it'?
13:47jenatali: Oh the names?
13:47karolherbst: you can't even construct it back from the type declarations :/
13:47karolherbst: you can good enough for every dev
13:48karolherbst: but not good enough for the spec :D
13:48jenatali: Right :P but the string is present if the kernel was compiled with the arg info flag
13:50karolherbst: because the translator puts it there
13:50karolherbst: that all breaks if somebody uses a different toolchain :)
13:50jenatali: Ah, sure
13:51jenatali: Dunno if you're required to support that correctly on SPIR-V-created kernels
13:51jenatali: "Kernel argument information is only available if the program object associated with kernel is created with clCreateProgramWithSource and the program executable was built with the -cl-kernel-arg-info option specified in options argument to clBuildProgram or clCompileProgram."
13:51karolherbst: also.. please use std::getline(istream, line, '.') :D
13:52karolherbst: but now idea if that would be any faster
13:52karolherbst: just easier to write and read
13:53jenatali: Probably not, just less code
13:53jenatali: I didn't write this bit, but yeah could be cleaner
13:54karolherbst: 6 loc for the full parsing :)
13:54karolherbst: uhm.. maybe 10
13:55karolherbst: now let's see..
14:00karolherbst: PASSED :)
14:02karolherbst: EdB_: mind picking up this commit into your MR? https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/5380cc9f39c806d0027f006282b2c780eedacbe1
14:02karolherbst: pmoreau: you might want to take a look as well :)
14:04jenatali: karolherbst: Do you handle the workgroup size metadata?
14:04karolherbst: is that checked in different tests?
14:04jenatali: It's somewhere
14:05jenatali: ... I think
14:06jenatali: test_api kernel_required_group_size
14:07karolherbst: ahh right..
14:08jenatali: The hint isn't tested anywhere though... except as part of the SPIR tests: https://github.com/KhronosGroup/OpenCL-CTS/issues/832
14:09karolherbst: well, how'd you test a hint anyway?
14:09jenatali: At least reflect it from the metadata? Atomic_max the local_id seen in the kernel?
14:10jenatali: I just mean they don't even pass it to you to see if you compile it :P
14:11karolherbst: right.. api is broken as it requires 1.2
14:12karolherbst: if you run it with 1.1 it just uses doubles
14:12karolherbst: and doesn't declare the extension :)
14:12jenatali: Right, the open-source CTS is 1.2+. Apparently pmoreau was trying to add 1.1 support if I recall?
14:12karolherbst: yeah.. I don't mind running with 1.2
14:12karolherbst: I just get to a point where I can actually track regressions
14:13karolherbst: I am just worried enabling 1.2 makes stuff fail
14:13karolherbst: api: Pass 79 Fails 2 Crashes 0 Timeouts 0
14:13jenatali: What's left?
14:13pmoreau: jenatali: Correct; WIP is available at https://github.com/pierremoreau/OpenCL-CTS/commits/support_opencl_1.0_and_1.1.
14:13karolherbst: get_platform_info and kernel_required_group_size :D
14:13jenatali: pmoreau: Cool :)
14:14jenatali: karolherbst: Easy stuff ;)
14:14karolherbst: get_platform_info ehhh...
14:14karolherbst: "ERROR: OpenCL profile version returned is less than 1.2!" yeah...
14:15karolherbst: PASSED :D
14:15pmoreau: Such a hard one to fix… :-D
14:15karolherbst: kernel_required_group_size is also quite simple
14:49karolherbst: jenatali: :
14:49karolherbst: shader: MESA_SHADER_KERNEL
14:49karolherbst: local-size: 64, 14, 1
14:50karolherbst: but now I have no idea how well that works in combination with offsets :D
14:50karolherbst: but it seems fine?
14:50jenatali: Should be fine
14:50karolherbst: the difference is massive for small kernels though
14:51karolherbst: 22 vs 29 instructions
14:52jenatali: Which is why I turn them off when the API didn't pass any offsets in
14:52jenatali: 99% of the time they'll be 0
14:52jenatali: And when they are used, I recompile the kernel to allow them
14:52karolherbst: but that was with offsets
14:53karolherbst: just specified local_size
14:53karolherbst: I guess without offsets it's a bit better
14:59karolherbst: at some point lower_bit_sizes should support converts as well :D
15:00karolherbst: yeah.. we can't do all combinations in hardware
15:00karolherbst: and lower bit sizes does trivial argument/destination scaling
15:00karolherbst: and it doesn't seem like to really deal with operation where dest and sources have different sizes?
15:01jenatali: I've got a patch that helps that
15:01jenatali: It won't make it do converts
15:01jenatali: But bit_count is broken without it since the dest is always 32bit
15:07jenatali: FWIW DXIL doesn't have 8bit types at all (only for metadata, not math). So all of our loads produce 32bit types, then we cast to 8bit as necessary. After lower_bit_size, we end up with (32bit load -> cast to 8bit -> cast to 16bit -> 16bit alu -> cast to 8bit -> cast to 32bit -> 32bit store)
15:08jenatali: And then we run a pass to remove all the (cast to 8bit -> cast to 16bit) and (cast to 8bit -> cast to 32bit) pairs and replace them with masks and shifts
15:08jenatali: And some GPUs don't even expose 16bit types through D3D, so for those we go all the way to 32bit with lower_bit_size and do the same for 16bit casts
15:10karolherbst: our ALU is mostly 32 bit anyway
15:10karolherbst: annoying are just those special cases like 8bit int to 64 bit float
15:11karolherbst: but that's easy, you just assume 32bit int and do a 32 bit to fp64 cast
15:11karolherbst: fp64 to u8 is more annoying
15:11karolherbst: need to convert to 32 bit in an intermediate step
15:13jenatali: Yeah, we do that by detecting the (f2u8, u2<>32) pair, since we're guaranteed to have that second one due to how we do loads/stores/alus, and just truncate the value in the resulting type to the (0, 255) range
15:53karolherbst: ehh.. seems like spilling of 64 bit values is busted for us :/
19:37airlied: imirkin: vectorisation was one of the reasons lrb died
19:37airlied: but I think yeah no fixed function wasn't a great idea
20:41imirkin: airlied: isn't the arch largely similar to GCN wrt vectorization?
20:42imirkin: i.e. any arch that doesn't let you forget that there are other lanes running
20:43imirkin: or is this the vec4-per-lane vs scalar-per-lane discussion?
20:46airlied: lrb was avx512 on pentium cores pretty much
20:47airlied: so i think it relied on compiler vectorisation
20:49airlied: though it had some interleaving stuff as well
20:50imirkin: couldn't it decree that one scalar = one lane, and vectorize that way? or does that not work?
20:50imirkin: (that does require a massive quantity of regs...)
20:51imirkin: (don't mean to beat a dead horse, just trying to understand why such a strategy is impractical)
20:52imirkin: i.e. what's the real differents between SIMD16 on gen and AVX512
20:52airlied: no i dont think it worked like that
20:52airlied: it relied on compiler vectorisatio
20:53imirkin: right. but the compiler could just punt it and skip vectorization and let multiple lanes do their work? i.e. dispatch the data that way.
20:54airlied: has some details
20:54karolherbst: imirkin: AVX512 is super painful though
20:55karolherbst: by using AVX512 you make everything else slower
20:55karolherbst: and it's a separate thing alltogether
20:56airlied: is also good
20:56airlied: karolherbst: that is avx512 on xeon
20:56airlied: the slower thjng
20:56airlied: not in the larrabee ctx
20:57imirkin: karolherbst: that's implementation detail, not architectural fail