00:31 airlied: Venemo: I think the first patch is okay, the second might cause build issues on windows , I tend to use memset in that code
00:31 airlied: Venemo: also that code isn't anything to do with softpipe
00:32 airlied: it's gallivm/nir prefix, which is used by llvmpipe
01:52 airlied: oh man that libresoc is like the blind leading the blind
01:53 imirkin: airlied: https://xkcd.com/386/
01:57 airlied: imirkin: my tablet doesnt let me send email :-p
01:57 imirkin: good.
01:58 HdkR: But is their GPU any good? :P
01:58 airlied: its like designing llvmpipe by feeling elephants
01:59 imirkin: that guy like to write long emails, but i can't quite tell what the vision is.
02:46 chrisf: they appear to be off in the weeds
02:55 anholt: reinvent both larrabee and llvmpipe!
02:58 imirkin: i know larabee was a commercial fail, but is there a clear explanation of why anywhere?
03:02 imirkin: it doesn't *intrinsically* seem like fail ... is it because it didn't have fixed function rasterizer/clipper/etc?
03:06 jekstrand: karolherbst: Nice!
07:36 karolherbst: ugh... locking inside a shader is hard :/
07:45 TacoCodedSalad: is it normal for libxcb to leak memory in eglMakeCurrent? at xcb_register_for_special_xge https://ghostbin.co/paste/pcjzv
07:47 drawat: Is there a way to get visual of VKMS display?
07:54 emersion: someone wanted to add writeback support
07:54 emersion: but i don't think it's merged
07:54 emersion: or even posted on the ML
08:26 karolherbst: jekstrand: ehhh.. the lowering works on iris but not on nouveau :D
08:27 karolherbst: somehow I don't get my threads properly managed and they end up spinlock looping forever
08:27 karolherbst: I am sure the code is correct though :)
08:27 karolherbst: now somebody just needs to write up 64 bit shared load/stores in intel
08:28 pq: drawat, emersion, ask leandrohrb about VKMS writeback
08:28 pq: he's been writing weston code to make use of it
08:38 EdB: karolherbst: What I was saying is that it shoulb undedfined by default so the change should not be on clover side.
08:38 EdB: Also I'm pretty sure the piglit test is rigth
08:39 EdB: mh
08:40 EdB: well think of that the test would work if defined to zero
08:40 EdB: *thinking
08:42 EdB: I will have a look to it
08:45 EdB: karolherbst: the test is ok
08:47 EdB: https://paste.centos.org/view/3e98f9ac
08:56 karolherbst: EdB: mhh.. maybe clang doesn't add the define for radeonsi then?
08:56 karolherbst: I mean.. my patch works around a clang bug anyway
08:57 karolherbst: or maybe some compiler option needs to be set somewhere?
08:57 karolherbst: maybe it gets derived from the tripple
08:58 karolherbst: ahhh.. yeah...
08:58 karolherbst: clang adds it always when the tripple is spir
09:01 EdB: I think clang is wrong here
09:01 karolherbst: not necessarily
09:01 karolherbst: if you emit SPIR claiming image support is correct
09:01 karolherbst: or if you just compile your spirv
09:02 karolherbst: but loading that spirv on a device not supporting images will obviously fail
09:02 karolherbst: (if it ends up using images)
09:02 karolherbst: I think we should be more explicit about how to configure clang long term
09:02 karolherbst: and not use tripples at all except it's 100% correct
09:03 EdB: if you want image support, you define it, it should assume not image support by default
09:03 karolherbst: but llvm supporse images
09:03 karolherbst: *supports
09:04 karolherbst: and libclang doesn't know we want to convert it to something else later
09:04 karolherbst: we also say that we target spir, not whatever backend device we have
09:05 EdB: let's assume you have a code that does 'int have_image = __IMAGE_SUPPORT__
09:06 EdB: and then you do a 'if have_image' to chose a code path
09:06 EdB: It use the wrong one if the device have no image support
09:07 karolherbst: right, but that's nothing clang can fix
09:07 karolherbst: we compile spirv afterall
09:08 karolherbst: and spirv doesn't know about preprocessor stuff
09:09 karolherbst: so all what llvm can do at this point is to compile with image support and generate a spirv claiming it does image does
09:09 karolherbst: and it's up to whoever loads the spirv to the device to generate the proper ones
09:09 karolherbst: of course there should be a nicer flag to switch it
09:09 EdB: I don't think it's up to the compiler to auto define that kind of macro
09:09 karolherbst: it actually is in this case
09:10 karolherbst: llvm doesn't know anything about the runtime
09:10 karolherbst: anyway
09:10 karolherbst: they won't remove it
09:10 EdB: yes, but you still can pass the define at compile time is you want it
09:10 karolherbst: removing it would mean breaking every user
09:11 karolherbst: also, the spir target is quite well defined
09:11 karolherbst: and the output will either use images or won't and you can always compile two versions
09:11 karolherbst: which you have to do anyway
09:12 EdB: but it should have be undefined but default :/
09:12 karolherbst: yeah.. maybe
09:13 karolherbst: EdB: https://reviews.llvm.org/D40252
09:13 karolherbst: not much of a discussion going on there though
09:14 EdB: if you keep it on clover, please make it spirv dependant
09:14 EdB: clang also make mistake
09:14 karolherbst: I think we could keep it for all targets though as it really doesn't hurt
09:19 EdB: It's a spirv workaround, let's no give the impression that we sould undefine all current and future macro OpenCL adds
09:22 karolherbst: I don't give the impression we should. And if libclang starts doing more of this I am also happy to add non spirv workarounds the same way
09:22 karolherbst: I am just being pragmatic here
09:24 EdB: I think clang mistaken having target image support and device image support here
09:34 Venemo: airlied: ok, I will change it to memset and open a mr.
09:34 Venemo: I know it's not a big deal, but I prefer to see a build without warnings :P
09:38 karolherbst: EdB: oh well.. I let curro_ decide if I should move it into clover::llvm::compile_to_spirv or maybe even somewhere else
09:40 EdB: :)
09:42 karolherbst: EdB: btw, I will probably review your notification callback patch for program builds and I think it might make sense to move it into a separate MR + Fixes tags
09:42 karolherbst: or is this a 1.2 feature?
09:43 EdB: it's not 1.2 specific
09:43 karolherbst: right
09:43 karolherbst: this bug is super annoying for CL CTS runs as you have to timeout :)
09:43 EdB: I didn't think that 1.2 MR would last that long
09:43 EdB: yes
09:43 karolherbst: and the program tests are the only ones timing out
09:44 karolherbst: I think...
09:44 EdB: fot CTS, you also need the extra space stuff commit
09:45 karolherbst: heh.. with 1.1 I don't run into this issue? mhh
09:45 EdB: strange
09:46 karolherbst: yeah.. it was in test_api, no?
09:46 karolherbst: ohh no
09:46 EdB: I don't remember
09:46 karolherbst: it was compiler
09:46 EdB: seems so
09:47 karolherbst: ahh.. 5 fails in api without your patches :)
09:47 karolherbst: let's see what I need
09:48 karolherbst: maybe I review all of those then.. heh
09:48 karolherbst: I just want to make sure they don't regress anything
09:48 karolherbst: locally I have allocations, atomics and basic without fails :)
09:49 EdB: I have CTS test them, so let's hope they don't add regression on your side
09:50 EdB: but this is mostly core change so it should not impact target behavior
09:50 EdB: I even add piglit tests
09:51 karolherbst: mhh.. even with your patch get_kernel_arg_info and get_kernel_arg_info_compatibility still fail
09:51 karolherbst: mhh..
09:51 karolherbst: ohh. I see
09:52 EdB: here we looks at metadata
09:52 EdB: are they missing ?
09:53 EdB: ha !
09:53 EdB: I do remember that didn't implements them on spirv
09:54 EdB: it didn't build on my machine at that time so I could test them, and pass empty data
10:23 karolherbst: mhhh
10:23 karolherbst: those fail because clang misses cl_khr_fp64 mhhh
10:37 karolherbst: ohh wait.. mhh
10:38 karolherbst: okay works with 1.2.. now the spirv bits
11:54 eric_engestrom: so, I started doing stuff on blender again after not having touched it in like 6 months, and now it randomly freezes the whole WM/compositor (tried on both sway+xwayland and i3)
11:55 eric_engestrom: sometimes the freeze happens after 30 seconds, other times I can use it for an hour or two before anything happens
11:56 eric_engestrom: it freezes the entire UI with a loop of the last two frames, but I can still switch to the console and use the computer from there, and if I come back to the UI it just shows black
11:57 eric_engestrom: and the UI still responds to the keyboard, so I can save my work in blender, quit the UI going the login manager and restart the WM or compositor
11:57 eric_engestrom: I'm not sure where to start debugging this
12:00 eric_engestrom: linux 5.8.0-5.8.3, mesa 20.1.4-20.1.6, blender 2.83.4-2.83.5; happy to bisect once I know which component to blame for this: kernel, mesa, something else?
12:12 dj-death: eric_engestrom: if you don't see any GPU hang we can probably rule out pure driver stuff
12:12 dj-death: eric_engestrom: might be a window system issue
12:13 dj-death: eric_engestrom: blender is still X11 right?
12:14 eric_engestrom: yup, there's on-going work to support wayland but it's not ready yet and not compiled in on the distro package I use
13:03 swivel:
13:05 DPA: Assuming in X11, I see a situation where drmModeAddFB/drmModeAddFB2WithModifiers work, and one where they fail with -2, but both instances are called
13:05 DPA: with the same arguments, and the buffer has been allocated using gbm_bo_create with the same arguments, what else could be different?
13:07 karolherbst: jenatali: what do you do to pass the api get_kernel_arg_info and get_kernel_arg_info_compatibility tests?
13:38 jenatali: karolherbst: We fail those due to https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/599
13:39 karolherbst: right...
13:39 karolherbst: heh..
13:39 karolherbst: that works though
13:39 karolherbst: they are just.. somewhere else
13:39 karolherbst: jenatali: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/530105de2d96518a2e82db9b6047f116249a474f
13:40 karolherbst: that works except of type_names
13:40 karolherbst: you need to parse the functionparameter attributes
13:40 karolherbst: NoAlias and NoWrite
13:40 jenatali: Ooh...
13:40 karolherbst: yeah..
13:40 karolherbst: took me a while as well
13:40 jenatali: Thanks for that
13:40 karolherbst: just the way type names are stored is painful :/
13:40 jenatali: You're having problems with names?
13:41 karolherbst: I guess you also parse that hack the translator uses?
13:41 karolherbst: that kernel_arg_type stuff
13:41 jenatali: karolherbst: https://gitlab.freedesktop.org/kusma/mesa/-/blob/msclc-d3d12/src/microsoft/clc/clc_helpers.cpp#L235
13:41 karolherbst: right...
13:42 karolherbst: yeah.. that was my plan as well
13:42 karolherbst: c++ is just being annoying and "core" functions like string::starts_with are C++20 :D
13:42 karolherbst: ahh.. find..
13:42 karolherbst: ehh
13:43 karolherbst: heh
13:43 karolherbst: your code seems... wrong?
13:43 karolherbst: str.find("kernel_arg_type.") should be 0 if the string starts with that
13:43 jenatali: Yep, so if it doesn't start with it, we skip it?
13:44 karolherbst: ohh.. right..
13:44 karolherbst: :D
13:44 karolherbst: that works for all args, right?
13:45 jenatali: Our only failures are missing const/restrict right now
13:45 jenatali: Which I'll try to hook up thanks to what you pointed out - after I'm awake enough to actually understand it :P
13:45 karolherbst: :D
13:46 karolherbst: yeah.. well I will hook up the type_names now :)
13:46 karolherbst: I was just reveiwing the CL 1.2 stuff for clover and noticed that that's LLVM only for now
13:46 jenatali: Yeah
13:46 jenatali: SPIR-V won't be far behind though :)
13:46 karolherbst: spir-v can't do it anyway :p
13:46 jenatali: Can't do it?
13:47 karolherbst: well
13:47 karolherbst: it's not specified how
13:47 karolherbst: we just rely on the translator adding that as a OpString
13:47 jenatali: What 'it'?
13:47 jenatali: Oh the names?
13:47 karolherbst: yes
13:47 karolherbst: you can't even construct it back from the type declarations :/
13:47 jenatali: Right
13:47 karolherbst: well..
13:47 karolherbst: you can good enough for every dev
13:48 karolherbst: but not good enough for the spec :D
13:48 jenatali: Right :P but the string is present if the kernel was compiled with the arg info flag
13:50 karolherbst: because the translator puts it there
13:50 karolherbst: that all breaks if somebody uses a different toolchain :)
13:50 jenatali: Ah, sure
13:51 jenatali: Dunno if you're required to support that correctly on SPIR-V-created kernels
13:51 jenatali: "Kernel argument information is only available if the program object associated with kernel is created with clCreateProgramWithSource and the program executable was built with the -cl-kernel-arg-info option specified in options argument to clBuildProgram or clCompileProgram."
13:51 karolherbst: also.. please use std::getline(istream, line, '.') :D
13:52 karolherbst: but now idea if that would be any faster
13:52 karolherbst: just easier to write and read
13:53 jenatali: Probably not, just less code
13:53 jenatali: I didn't write this bit, but yeah could be cleaner
13:54 karolherbst: yeah..
13:54 karolherbst: 6 loc for the full parsing :)
13:54 karolherbst: uhm.. maybe 10
13:55 karolherbst: now let's see..
14:00 karolherbst: PASSED :)
14:00 jenatali: :)
14:02 karolherbst: EdB_: mind picking up this commit into your MR? https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/5380cc9f39c806d0027f006282b2c780eedacbe1
14:02 karolherbst: pmoreau: you might want to take a look as well :)
14:04 jenatali: karolherbst: Do you handle the workgroup size metadata?
14:04 karolherbst: nope
14:04 karolherbst: is that checked in different tests?
14:04 jenatali: It's somewhere
14:04 karolherbst: mhhh
14:05 jenatali: ... I think
14:06 jenatali: test_api kernel_required_group_size
14:07 karolherbst: ahh right..
14:08 jenatali: The hint isn't tested anywhere though... except as part of the SPIR tests: https://github.com/KhronosGroup/OpenCL-CTS/issues/832
14:09 karolherbst: well, how'd you test a hint anyway?
14:09 jenatali: At least reflect it from the metadata? Atomic_max the local_id seen in the kernel?
14:09 jenatali:shrugs
14:10 jenatali: I just mean they don't even pass it to you to see if you compile it :P
14:11 karolherbst: ehhh...
14:11 karolherbst: right.. api is broken as it requires 1.2
14:11 karolherbst: annoying
14:12 karolherbst: if you run it with 1.1 it just uses doubles
14:12 karolherbst: and doesn't declare the extension :)
14:12 jenatali: Right, the open-source CTS is 1.2+. Apparently pmoreau was trying to add 1.1 support if I recall?
14:12 karolherbst: yeah.. I don't mind running with 1.2
14:12 karolherbst: I just get to a point where I can actually track regressions
14:13 karolherbst: I am just worried enabling 1.2 makes stuff fail
14:13 karolherbst: nice
14:13 karolherbst: api: Pass 79 Fails 2 Crashes 0 Timeouts 0
14:13 jenatali: What's left?
14:13 pmoreau: jenatali: Correct; WIP is available at https://github.com/pierremoreau/OpenCL-CTS/commits/support_opencl_1.0_and_1.1.
14:13 karolherbst: get_platform_info and kernel_required_group_size :D
14:13 jenatali: pmoreau: Cool :)
14:14 jenatali: karolherbst: Easy stuff ;)
14:14 karolherbst: get_platform_info ehhh...
14:14 karolherbst: "ERROR: OpenCL profile version returned is less than 1.2!" yeah...
14:15 karolherbst: PASSED :D
14:15 pmoreau: Such a hard one to fix… :-D
14:15 karolherbst: kernel_required_group_size is also quite simple
14:49 karolherbst: jenatali: :
14:49 karolherbst: shader: MESA_SHADER_KERNEL
14:49 karolherbst: local-size: 64, 14, 1
14:49 karolherbst: :)
14:50 jenatali: Huzzah
14:50 karolherbst: but now I have no idea how well that works in combination with offsets :D
14:50 karolherbst: but it seems fine?
14:50 jenatali: Should be fine
14:50 karolherbst: the difference is massive for small kernels though
14:51 karolherbst: 22 vs 29 instructions
14:52 jenatali: Which is why I turn them off when the API didn't pass any offsets in
14:52 jenatali: 99% of the time they'll be 0
14:52 jenatali: And when they are used, I recompile the kernel to allow them
14:52 karolherbst: yeah...
14:52 karolherbst: but that was with offsets
14:53 karolherbst: just specified local_size
14:53 karolherbst: I guess without offsets it's a bit better
14:59 karolherbst: at some point lower_bit_sizes should support converts as well :D
15:00 jenatali: Hm?
15:00 karolherbst: yeah.. we can't do all combinations in hardware
15:00 karolherbst: and lower bit sizes does trivial argument/destination scaling
15:00 karolherbst: and it doesn't seem like to really deal with operation where dest and sources have different sizes?
15:01 jenatali: I've got a patch that helps that
15:01 karolherbst: ahh
15:01 jenatali: It won't make it do converts
15:01 jenatali: But bit_count is broken without it since the dest is always 32bit
15:02 jenatali: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6313/diffs?commit_id=529bae69420bcac4fd98d78f9808ca31ebfa2400
15:02 karolherbst: ahh
15:07 jenatali: FWIW DXIL doesn't have 8bit types at all (only for metadata, not math). So all of our loads produce 32bit types, then we cast to 8bit as necessary. After lower_bit_size, we end up with (32bit load -> cast to 8bit -> cast to 16bit -> 16bit alu -> cast to 8bit -> cast to 32bit -> 32bit store)
15:08 jenatali: And then we run a pass to remove all the (cast to 8bit -> cast to 16bit) and (cast to 8bit -> cast to 32bit) pairs and replace them with masks and shifts
15:08 jenatali: And some GPUs don't even expose 16bit types through D3D, so for those we go all the way to 32bit with lower_bit_size and do the same for 16bit casts
15:09 karolherbst: right..
15:10 karolherbst: our ALU is mostly 32 bit anyway
15:10 karolherbst: annoying are just those special cases like 8bit int to 64 bit float
15:11 karolherbst: but that's easy, you just assume 32bit int and do a 32 bit to fp64 cast
15:11 karolherbst: uhm
15:11 karolherbst: u2f
15:11 karolherbst: fp64 to u8 is more annoying
15:11 karolherbst: need to convert to 32 bit in an intermediate step
15:13 jenatali: Yeah, we do that by detecting the (f2u8, u2<>32) pair, since we're guaranteed to have that second one due to how we do loads/stores/alus, and just truncate the value in the resulting type to the (0, 255) range
15:53 karolherbst: ehh.. seems like spilling of 64 bit values is busted for us :/
19:37 airlied: imirkin: vectorisation was one of the reasons lrb died
19:37 airlied: but I think yeah no fixed function wasn't a great idea
20:41 imirkin: airlied: isn't the arch largely similar to GCN wrt vectorization?
20:42 imirkin: i.e. any arch that doesn't let you forget that there are other lanes running
20:43 imirkin: or is this the vec4-per-lane vs scalar-per-lane discussion?
20:46 airlied: lrb was avx512 on pentium cores pretty much
20:47 airlied: so i think it relied on compiler vectorisation
20:49 airlied: though it had some interleaving stuff as well
20:50 imirkin: couldn't it decree that one scalar = one lane, and vectorize that way? or does that not work?
20:50 imirkin: (that does require a massive quantity of regs...)
20:51 imirkin: (don't mean to beat a dead horse, just trying to understand why such a strategy is impractical)
20:52 imirkin: i.e. what's the real differents between SIMD16 on gen and AVX512
20:52 airlied: no i dont think it worked like that
20:52 airlied: it relied on compiler vectorisatio
20:53 imirkin: right. but the compiler could just punt it and skip vectorization and let multiple lanes do their work? i.e. dispatch the data that way.
20:53 airlied: https://pharr.org/matt/blog/2018/04/18/ispc-origins.html
20:54 airlied: has some details
20:54 karolherbst: imirkin: AVX512 is super painful though
20:55 karolherbst: by using AVX512 you make everything else slower
20:55 karolherbst: and it's a separate thing alltogether
20:56 airlied: http://tomforsyth1000.github.io/blog.wiki.html#%5B%5BWhy%20didn%27t%20Larrabee%20fail%3F%5D%5D
20:56 airlied: is also good
20:56 airlied: karolherbst: that is avx512 on xeon
20:56 airlied: the slower thjng
20:56 airlied: not in the larrabee ctx
20:57 karolherbst: ahh
20:57 imirkin: karolherbst: that's implementation detail, not architectural fail