00:05 jekstrand: karolherbst: Pass 92 Fails 8 Crashes 2 Timeouts 4
00:05 jenatali: \o/
00:05 jekstrand: karolherbst: I'd say it looks like it works from my end
00:05 jenatali: I'm really excited how fast Clover is improving
00:21 airlied: dcbaker: I don't see the GL 4.5 enable patch in taging/20.2 any reason?
00:32 dcbaker[m]: airlied: there were some conflicts and I didn't have time to figure it out yesterday.
00:32 dcbaker[m]: They may be trivial, I just haven't looked yet today
00:33 airlied: dcbaker[m]: ah cool, it might be the relnotes or the CI files I suppose
00:33 airlied: I think it was relnotes only when I cherry-picked it the other day
00:34 dcbaker[m]: Yeah, that wouldn't surprise me. I'll get it figured out tonight after dinner
00:43 airlied: dcbaker[m]: thx!
00:43 airlied: anholt: yeah comes in at 10 mins for that 1/10 split
00:46 airlied: uggh I wonder if I need to add target to gallium image views
00:46 airlied: to get the vulkan semantics
00:50 airlied: hmm maybe I can mangle it
07:42 pmoreau: karolherbst: I could… but my bandwidth is going to be quite limited at the moment, and I would like to get !5038 merged first, and !4974 could be nice too for OpenCL 1.2 (+ some fixes for earlier versions IIRC); after that, cl_khr_extended_versioning + !2078 could be good though not that important at the moment, and it would be nice to land some of the nv50 patches but they need to be cleaned up and fixed first.
07:46 pmoreau: karolherbst: Updates are welcomed for https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4979, especially since you have quite a few tests passing locally it would be good to confirm that what we have in clover is indeed working and change from my temporary “in progress” to “DONE”. Landing it would be good to track what is missing for a particular version.
11:03 shirsch: Hello , I hit an issue with KwinFt wayland compositor, where a resolution for a screen can't be set. The core issue for this seems toi be, that some resolution modes seem to be reported twice for some reason. As a system I use archlinux with mesa 20.1.6 and kernel version 5.8.1. If someone is interested in a drm_info log, I could provide that too.
11:07 romangg: The preferred mode seems to be reported twice (once in the beginning and once in the end). Haven't seen that before.
11:08 shirsch: My graphics card is an amd radeon pro wx4100 by the way.
11:09 romangg: correction: it's once reported without any flags in the beginning and once in the end with current+preferred.
11:10 romangg: drm_info + wayland-info log: https://gitlab.com/kwinft/kwinft/-/issues/70#note_399327397
14:07 tomeu: jekstrand: sorry, the MR you assigned to Marge will fail CI
14:07 tomeu: next in the queue is a patch disabling the jobs that fail atm
14:07 tomeu: so if you reassign to Marge in a short bit, it should pass through
14:07 daniels: tomeu: you can just cancel the jobs and Marge will insta-fail
14:11 tomeu: daniels: yeah, already did that
14:12 daniels: nice!
14:21 pcercuei: I think FB_DAMAGE_CLIPS is broken
14:22 pcercuei: on libdrm side, the blob is set to an array of "struct drm_clip_rect"
14:22 pcercuei: on kernel DRM side, it expects the property to be an array of "struct drm_rect"
14:22 pcercuei: the former uses "unsigned short" fields, the latter uses "int" fields
14:24 pcercuei: (I guess I'm just not using the right userspace struct)
14:25 jenatali: karolherbst: Got any suggestions for a name for an inverted "lower_cs_global_id_from_local"? I could just add "dont" or something :P
14:25 pcercuei: meh. It was "struct drm_mode_rect". Sorry for the noise
14:25 karolherbst: jenatali: has_global_id?
14:25 jenatali: That could work
14:26 jenatali: Cool, will do
14:33 karolherbst: ehh.. looking at the API validation of clover, we need a CL_MESA_no_error extension :D
14:33 jenatali: What would that do?
14:33 karolherbst: skip all API validation
14:33 karolherbst: we have that for GL already
14:33 jenatali: But... why?
14:33 karolherbst: lower CPU overhead?
14:33 jenatali: Ah
14:34 karolherbst: ohh, actually it seems like the GL extension was first: https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_no_error.txt
14:34 karolherbst: :D
14:37 karolherbst: I think modesetting even uses it?
14:37 karolherbst: mhh
14:45 jekstrand: jenatali: I think ImageSizeLod is actually legal in Vulkan; we've just never seen it because glslang doesn't generate it. :-/
14:45 jenatali: :D
14:45 jenatali: That doesn't surprise me at all, it seems really odd that they would've added it just for CL C++
14:53 jekstrand: This is the classic "the CTS only tests the output of GLSLang" problem
14:53 jenatali: I wonder, does DXC generate it?
14:55 jekstrand: Not AFAIK
14:55 jekstrand: jenatali: I just filed a CTS bug about it.
14:55 jenatali: Cool :)
14:56 jekstrand: jenatali: Makes me even more sure that just adding lod to the intrinsic is better
14:56 jenatali: jekstrand: Agreed. Thanks for that
14:57 Kayden: is the lod still required to be zero? if not, we should probably hook it up
14:58 Kayden: it looks trivial to do on radeon and intel; I didn't look at nouveau
15:00 jekstrand: Actually... It's not required for Vulkan:
15:00 jekstrand: OpImageQuerySizeLod, and OpImageQueryLevels must only consume an “Image” operand whose type has its “Sampled” operand set to 1.
15:00 jekstrand: ^^
15:00 jekstrand: So no storage images in Vulkan
15:00 jekstrand: Still, it seems like it'd be easy enough to support. It's just TXS after all.
15:01 jenatali: jekstrand: Where's that wording?
15:01 jekstrand: jenatali: In the Vulkan environment spec for SPIR-V
15:01 jenatali: Ah, got it
15:01 jekstrand: Which is part of the main Vulkan spec
15:02 jekstrand: Looks like OpenCL allows it and just requires LOD to be 0
15:02 jekstrand: So it's just a bit of stupid divergence between the two for absolutely no reason
15:02 jenatali: Just making sure that the SPIR-V translator wasn't doing something stupid, since Sampled is set to 2 (unknown) for CL IIRC
15:02 jekstrand: Yeah, CL sets Sampled to 2
15:02 karolherbst: because you have the runtime sampler, no?
15:03 jekstrand: Yeah
15:03 jenatali: Though I thought CL C++ could use an actual nonzero LOD there?
15:03 jekstrand: This is another one of those places where CL is just weird.
15:03 jenatali: Yep!
15:03 jekstrand: jenatali: Only if you have cl_khr_mipmap_image
15:03 jenatali: Throw it on the pile lol
15:03 jenatali: jekstrand: Well, sure, if you don't have that, there's no such thing as a LOD
15:03 jekstrand: jenatali: Sure. :)
15:04 jenatali: That's on my backlog as a reasonable extension to support at some point
15:04 jenatali: Once we hit core 1.2 then I can start looking at all the 2.x stuff that became optional in 3.0 (like generic pointers) and extensions
15:05 jekstrand: Yeah
15:05 jekstrand: Generic pointers seem like the most obvious additional feature to me
15:05 jekstrand: They're both wonderful (from an app developer POV) and terrible (from a driver POV)
15:05 jenatali: Hooking up SPIR/SPIR-V support into the runtime I think is the most important one
15:05 jekstrand: Have fun with SPIR
15:06 jenatali: We already use it as an intermediate in our CLC compilation pipeline, I'm not concerned
15:06 jekstrand: Oh, ok. Yeah, if we can get recent LLVM to consume SPIR and dump out SPIR-V, it shouldn't be bad.
15:06 jenatali: Yeah we had this discussion a while back - LLVM's bitcode reader is designed to be able to read old versions of LLVM IR and upconvert
15:06 jekstrand: SPIR-V support should be trivial, in theory.
15:07 jenatali: Yep, as long as we don't start getting SPIR-V that looks nothing like what we get from the converter
15:08 jekstrand: heh
15:15 jenatali: I think the other CL2.x optional stuff I'd want to support is read-write images, writable 3D images, program scope variables, and initializers/finalizers
15:16 jenatali: And maybe subgroups, now that we've got some wave ops in DXIL, though I haven't looked at how closely that maps
15:16 karolherbst: speaking of CL2.x... https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6401
15:16 karolherbst: :D
15:17 jekstrand: jenatali: If you use the KHR subgroups stuff, it should map basically 1:1
15:17 karolherbst: jenatali: I think it makes sense to check what SYCL ends up using
15:17 jenatali: karolherbst: Nice :)
15:17 jenatali: If SYCL uses SVM we're already out :P
15:17 karolherbst: :D
15:17 karolherbst: airlied would know
15:17 jenatali: That's just not happening on Windows generically anytime soon
15:18 jenatali: Maybe we can do it in WSL :P
15:18 karolherbst: jenatali: but I am sure you can support SVM. I am quite sure that d3d probably also has optional support for userptrs? no?
15:18 karolherbst: like GL_AMD_pinned_memory for GL
15:18 jenatali: karolherbst: Yes-ish, but there's no guarantee you'll get the same address on the GPU
15:18 karolherbst: that's fine
15:19 karolherbst: it just doesn't work 100% correctly then
15:19 jekstrand: jenatali: There are 4 levels of SVM
15:19 jekstrand: jenatali: I'm sure you can do at least the first one
15:19 karolherbst: synchronizing of shadow buffers _is_ a problem
15:19 karolherbst: but besides that?
15:19 jenatali: Maybe I just haven't read the spec closely enough for what that first level actually is
15:19 karolherbst: 3
15:19 karolherbst: + an atomic bit
15:19 karolherbst: jenatali: just describes how explicit the sharing is
15:20 karolherbst: coarse_grain_buffer: explicit API for allocation + explicit map/unmap
15:20 karolherbst: fine_grain_buffer: explicit API for allocation
15:20 karolherbst: system_grain: no API
15:20 jenatali: Yeah but you still need to be able to use the same pointers inside a kernel that you use on the host
15:20 karolherbst: right
15:20 karolherbst: but because you have to explicitly allocate, the driver can map the pointer into the GPUs VM
15:21 karolherbst: that's what GL_AMD_pinned_memory is doing
15:21 jenatali: Eh sure
15:21 jenatali: We don't have APIs for that in D3D, though I think our memory manager could probably do it
15:21 karolherbst: I see
15:21 karolherbst: I mean.. it's basically a coherent and persiotenly mapped buffer :p
15:22 karolherbst: just with a few additional implications I think
15:22 jenatali: Sure, it's just getting the addresses to match between CPU and GPU that's tricky
15:22 karolherbst: yeah
15:22 jenatali: I mean, DXIL doesn't even have pointers, sooooo
15:22 jekstrand: Ugh... Why does clover have to use exceptions backwards?
15:22 karolherbst: OpenGL doesn't either
15:22 karolherbst: jekstrand: don't ask...
15:22 karolherbst: I also have to force me to not complain about it too often and just go with the flow...
15:22 jekstrand: device::device returns for success and unconditionally throws at the end
15:23 jekstrand: srsly?
15:23 karolherbst: e:shrut
15:23 karolherbst: ...
15:23 karolherbst: ¯\_(ツ)_/¯
15:23 jenatali: karolherbst: The whole point of SVM is to allow dereferencing a host pointer inside a kernel, but we actually can't translate that to DXIL at all, even if we could get the addresses to be the same :P
15:23 karolherbst: jenatali: that's fine
15:24 karolherbst: the application doesn't provide the pointer in the first levels
15:24 karolherbst: it can even be GPU memory
15:24 karolherbst: doesn't matter
15:24 karolherbst: you just should get the same pointer
15:24 karolherbst: so.. what you'd do is to have a normal memory allocation and either map the GPUs pointer into your CPU VM or vice versa
15:24 karolherbst: all of that is runtime controlled
15:25 karolherbst: and when the application _maps_ it just gets the pointer returned by the runtime
15:25 jenatali: Sure... but without GPU pointers in the shader language, that's kind of useless :P
15:25 karolherbst: no?
15:25 karolherbst: textures and stuff?
15:25 karolherbst: guess what GL_AMD_pinned_memory is doing
15:25 karolherbst: it's just not directly SVM
15:25 jenatali: "This form of SVM is similar to non-SVM use of memory; however, it lets kernel-instances share pointer-based data structures (such as linked-lists) with the host program."
15:25 karolherbst: but it depends on the same concept
15:26 karolherbst: jenatali: just because you can't hand in a pointer via the API doesn't mean the shaders won't deal with pointers :p
15:26 karolherbst: in GL you create a buffer with EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD
15:26 karolherbst: and that's essentially your SVM buffer then
15:26 vpandya_: ERROR: [Loader Message] Code 0 : loader_scanned_icd_add: Attempt to retrieve either 'vkGetInstanceProcAddr' or 'vk_icdGetInstanceProcAddr' from ICD /home/vivek/install/lib/x86_64-linux-gnu/libvulkan_xxx.so failed.
15:27 vpandya_: here am I missing some files to be included?
15:27 karolherbst: you just use it as a normal OpenGL buffer
15:27 karolherbst: but with the exception that now the CPU and GPU have the same pointer
15:27 jenatali: karolherbst: If you write a __global pointer inside a CL kernel, that ends up being a (index, offset), and right now DXIL doens't have any way to make it anything else. SVM would require that to be an actual address, so that the host can read it
15:27 karolherbst: jenatali: nope
15:27 karolherbst: ignore the kernel
15:27 karolherbst: so you create an SSBO, right?
15:27 jenatali: Right...
15:27 karolherbst: and in GL that's just a buffer, right?
15:28 karolherbst: and if, in GL you'd create the buffer with EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, it ends up having the same address on the CPU and GPU
15:28 karolherbst: and if you bind this buffer to an SSBO, you operate on the same pointer
15:28 karolherbst: s/to/as/
15:28 jenatali: Then I'm not seeing the point... that doesn't sound like it does anything special
15:29 karolherbst: at runtime the GPU will have to translate the SSBO index/offset to a native pointer anyway
15:29 jenatali: Suer
15:29 karolherbst: with EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD it's just the same as on the CPU
15:29 karolherbst: where without it, it can be different
15:29 jenatali: Sure... that still doesn't sound like CL's SVM
15:29 karolherbst: so you essentially map your CPU memory into the GPUs VM and be done with it
15:29 karolherbst: it is :p
15:29 karolherbst: it's not system SVM
15:30 karolherbst: but it's the first or maybe second level
15:30 karolherbst: not sure if you can implement the second level with that
15:30 karolherbst: but the first one for sure
15:30 jenatali: It still doesn't allow linked lists to be shared between CPU and kernels, which to me sounds like it doesn't meet the requirements of level 1
15:30 karolherbst: first level of SVM doesn't allow that either
15:31 karolherbst: you have to tell the runtime about _every_ allocation beforehand
15:31 jenatali: karolherbst: Re-read the first level description on https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_API.html#shared-virtual-memory
15:31 karolherbst: that stuff is only possible with system SVM without going through that pain
15:31 karolherbst: jenatali: there is an API for that
15:31 karolherbst: wait...
15:32 karolherbst: jenatali: clSetKernelExecInfo + CL_​KERNEL_​EXEC_​INFO_​SVM_​PTRS
15:32 jekstrand: karolherbst: Why does get_compute_param take an ir_type?
15:33 karolherbst: "Non-argument SVM buffers must be specified by passing pointers to those buffers via clSetKernelExecInfo for coarse-grain and fine-grain buffer SVM allocations but not for finegrain system SVM allocations."
15:33 karolherbst: jekstrand: ehhh.. no idea?
15:35 jenatali:just doesn't understand SVM at all
15:35 karolherbst: you get used to it :p
15:35 jenatali: I'm still planning to completely ignore it unless I actually need to care, and I'm really hoping I don't...
15:36 jekstrand: Bindings? Who seriously needs those?
15:36 karolherbst: :D
15:36 karolherbst: yeah, just go bindless all the way
15:36 karolherbst: :p
15:37 prabhakarlad: hello all, currently we are investigating a issue where the tux+console appears on the LCD panel only when modetest is run. But doesnt come up up by default. Any thoughts what could be causing it ?
15:37 karolherbst: jenatali: but remember this: Nvidia got annoyed enough about it, that they actually start implementing it inside their OpenCL driver :p
15:38 karolherbst: it's inevitable
15:38 jenatali: We'll see
15:42 jenatali: karolherbst: I took your feedback for the system values MR, and added an ack to your Clover patch (don't think I can properly review)
15:52 jenatali: jekstrand: When you get a chance, I think the libclc series is good for another look
15:57 jekstrand: jenatali: Ok, I'll give it another read
15:57 jekstrand: jenatali: Given airlied's issues, I think we need to land mem_constant first
15:57 jenatali: Sure, seems fine
15:57 jenatali: Not in any particular rush to actually hit the merge button, but I would like to at least iron out review feedback sooner rather than later
15:58 karolherbst: yeah... I just prefer to land the offset stuff first because that's easier to rebase everything on )D
15:59 karolherbst: :)
15:59 jenatali: karolherbst: I think that one should be in pretty good shape now
15:59 karolherbst: I think you overwrite my latest push though :p
15:59 karolherbst: let me get that right
15:59 jenatali: Only if you pushed while I was working this morning?
16:00 jenatali: If so, sorry
16:01 karolherbst: ehh... no, you are right.. I missread the code :D
16:01 karolherbst: yeah, I think it's good to go and the clover changes are only touching the nir bits, so that's fine
16:01 karolherbst: maybe jekstrand wants to look over the nir bits and give some suggestions or so?
16:02 jenatali: jekstrand: Sorry for monopolizing all of your time with reviewing :P
16:03 karolherbst: jenatali: my only worry is that we missed a place where we'd need to add nir_lower_compute_system_values.. mhh but yeah
16:03 karolherbst: I checked at least and it looked fine
16:03 jenatali: karolherbst: Yeah that was my main concern with splitting it out into its own pass, but that's what you guys wanted :)
16:09 daniels: airlied: see what you've started ... ?! https://gitlab.freedesktop.org/icecream95/mesa/commits/val_pan
16:10 jekstrand: karolherbst: Pass 92 Fails 8 Crashes 2 Timeouts 4
16:10 jenatali:wonders about hooking up vallium to the d3d12 driver...
16:10 jekstrand: karolherbst: I think I'm happy enough with it for now
16:10 karolherbst: jekstrand: what stuff are you failing on?
16:10 jekstrand: karolherbst:
16:11 jekstrand: basic bufferreadwriterect
16:11 jekstrand: basic global_work_offsets
16:11 jekstrand: basic kernel_numeric_constants
16:11 jekstrand: basic kernel_preprocessor_macros
16:11 jekstrand: basic loop
16:11 jekstrand: basic vstore_global
16:11 jekstrand: basic vstore_local
16:11 jekstrand: basic vstore_private
16:11 karolherbst: mhhh, strange, but okay
16:11 imirkin: jekstrand: the "what aren't you failing on" list might have been shorter...
16:11 daniels: jenatali: dude.
16:11 jekstrand: imirkin: That was the fail list
16:11 karolherbst: daniels: I think that's a great idea :p
16:11 jekstrand: imirkin: The pass list has 92 items :)
16:11 karolherbst: ;) not
16:12 karolherbst: imirkin: I only have 7 fails with nouveau :p
16:12 karolherbst: (including crashes and timeouts)
16:12 jekstrand: daniels, airlied: Nice!
16:12 daniels: karolherbst: tbf if you have SVM + events you could write a GPU in CL, right
16:13 karolherbst: probably
16:13 jekstrand: jenatali: No worries, getting NIR in good shape for CL is pretty high-priority for me right now.
16:13 jekstrand: This week, anyway. Next week, that may change. :D
16:13 karolherbst: :D
16:14 jenatali: karolherbst: Right now the async stuff isn't part of the libclc series. Do you think adding them into the existing MR makes sense or should I just add them later?
16:14 jenatali: It's not huge
16:14 karolherbst: I think splitting things up is always a good idea as it makes it easier to review. So if the current libclc stuff only wires up the opcodes we don't support that's fine
16:15 jenatali: It rewires a few already-supported opcodes to better implementations, but for the most part just lights up new stuff
16:16 jenatali: But yeah, I'll add it separately
16:30 jekstrand: jenatali: Headed out. I'll cook up a patch to handle variables "properly" when I get back in an hour or two.
16:33 jenatali: jekstrand: Cool, sounds good
17:07 karolherbst: mhh.. I think I have a few SVM related regressions now.. let's seee
17:42 pmoreau: karolherbst: Do you Rb your updated version of “meson: Raise minimum version for SPIR-V OpenCL deps (v4)”? I’m adding Serge’s Rb as well to the commit while at it.
17:43 karolherbst: pmoreau: I think I added it already, no?
17:43 pmoreau: Not as far as I can tell
17:43 pmoreau: Or maybe you did it locally but forgot to push?
17:46 karolherbst: let's see
17:47 karolherbst: pmoreau: r-by me :D
17:47 karolherbst: I forgot to add it indeed
17:47 pmoreau: Thanks! will add
17:48 pmoreau: How about the updated “clover/llvm: Use the highest supported SPIR-V version (v3)”? :-) I changed to a min and folded in the static_cast.
17:48 karolherbst: that's fine as well
17:49 pmoreau: 👍️
17:49 pmoreau: Thanks! So all patches should have at least one Rb now.
17:59 cwabbott: I assigned !5720 to marge ~5 minutes ago and she still hasn't picked it up, despite nothing else being in her queue, and I didn't get the usual email when I assigned it to her... is something in gitlab broken?
17:59 cwabbott: usually that happens ~instantly when there's no queue
18:07 karolherbst: pmoreau: let's finally merge some stuff :D I need to reduce the amount of local patches :D
18:07 pmoreau: Yes please!
18:09 jenatali: :D
18:12 karolherbst: ehh.. why am I pulling more and more patches :(
18:16 karolherbst: pmoreau: can I interest you in reviewing patches which might help nv50 reducing overall CPU overhead when it comes to using shaders?
18:16 karolherbst: I plan to merge the shader cache stuff for nvc0 shortly (like tomorrow or so?)
18:17 karolherbst: but I don't want nv50 to be left out for too long
18:17 karolherbst: but the nv50 patches are a bit more annoying to review
18:21 pmoreau: I had a quick look for fun, and the initial RFC for the SPIR-V support in clover was sent May 3rd, 2017, containing a SPIR-V linker (which ended up later that year (or maybe in 2018) in SPIRV-Tools, but most patches (implementing cl_khr_il_program and the equivalent bits in 2.1) are still waiting in !2078 :-)
18:22 karolherbst: that's a long time ago :p
18:22 jenatali: That is a long, long time
18:22 pmoreau: It is
18:24 pmoreau: There have been long stretches of me leaving the project on the side (especially lately), tbf. But also due to not being able to get things merged.
18:26 pmoreau: karolherbst: Maybe… I’d like to get nv50 patches merged so that it doesn’t lag too much behind on the OpenCL side, but I still need to figure out what is going wrong with one of them. Can I have this weekend to look at it?
18:26 karolherbst: sure
18:28 airlied: daniels: uggh better try and derail thr insanity
18:29 airlied: what part of vallium isnt for hw is hard to get!
18:29 karolherbst: curro_: mind if we merge !5038 ? that's the improved spirv support
18:29 imirkin: airlied: the "isn't for hw" part :)
18:30 karolherbst: airlied: it's not up to you anymore what happens with vallium :p
18:30 imirkin: that's a typo, right? you meant to say *is* for hw? :p
18:33 airlied: karolherbst: i can nak merging thungs
18:33 airlied: because its a bad idea
18:34 airlied: just write a vulkan driver
18:34 airlied: its less work
18:34 karolherbst: why didn't you just write a vulkan driver then :p
18:34 airlied: because swrast is different
18:34 airlied: the work in swrast is the rast
18:34 karolherbst: true
18:35 airlied: vulkan is a lowlevel api
18:36 airlied: building cpu cmdbuffers is wrong
18:36 karolherbst: sounds like fun
18:36 karolherbst: :D
18:38 airlied: if someone see icecream95 on irc tell them to stop
18:38 karolherbst: what are they up to?
18:39 airlied: vallium on panfrost
18:40 kisak: airlied: drop a comment on one of their commits in gitlab to email them?
18:40 airlied: kisak: did that already
18:40 karolherbst: airlied: I have to say.. if you didn't expect something like that your are quite naive :p
18:40 karolherbst: and I am sure you won't be able to stop it :D
18:41 airlied: karolherbst: i totally expect it
18:41 karolherbst: I like how most of the patches are just breaking things :D
18:41 airlied: but also going to point out the waste of time itis
18:41 airlied: and will never let it be merged
18:42 karolherbst: airlied: but what are the plans to do with it honestly? I kind of expect that having a swrast vulkan driver installed can cause a huge pain
18:42 karolherbst: like applications supporting vulkan and opengl, but prefering vulkan
18:42 karolherbst: and than you only have swrast vulkan
18:42 zmike: it's for zink unit testing
18:42 karolherbst: ahhh
18:42 karolherbst: and what is unittesting valium, zink?
18:42 karolherbst: :p
18:42 zmike: yes
18:42 airlied: karolherbst: then we find that out
18:42 karolherbst: fun
18:43 airlied: also for virgl vulkan ci
18:43 karolherbst: airlied: right.. but like if you run nouveau, you don't want vallium ever
18:43 karolherbst: like never ever
18:44 karolherbst: so this is a huge problem when users are getting it installed
18:44 karolherbst: also, old GPUs and stuff
18:44 airlied: some people want it
18:44 karolherbst: distribution will enable it by default
18:44 karolherbst: maybe not all
18:44 karolherbst: but some will
18:45 airlied: then we eill get feedback and look at it
18:45 karolherbst: we already know the answer though: don't use vallium
18:45 airlied: not sure what apps do yet
18:45 karolherbst: games
18:45 karolherbst: I know a few which does
18:46 airlied: karolherbst: it is tge same arhument againdt llvmpipe
18:46 karolherbst: no
18:46 karolherbst: it's totally not
18:46 airlied: just you have a gl driver
18:46 karolherbst: it's completely different
18:46 karolherbst: games might prefer vulkan, so they load vulkan and see they get a driver, fine, and use that
18:46 karolherbst: if it is vallium? bad luck
18:47 karolherbst: your OpenGL driver would be hw accelerated
18:47 airlied: the driver is marked as cpu
18:47 airlied: so maybe we csn fix the loader
18:47 karolherbst: you expect all games to do the right thing
18:47 karolherbst: yeah...
18:47 airlied: yes until shown otherwise
18:47 karolherbst: I mean.. I can give it a try on those I know about and see what those do
18:48 karolherbst: I am sure wine could also run into this, but wine is fixable, so I don't care about it
18:48 airlied: like i could force fail to load onto it only via envvar
18:48 airlied: its a trivial fix
18:48 karolherbst: you mean you have to set an env var so it fails?
18:49 airlied: envvar to use it at all
18:49 karolherbst: ahh right
18:49 karolherbst: yeah, that might work
18:49 airlied: its not a great option
18:49 karolherbst: at least that would hurt less users than the opposite
18:49 karolherbst: but yeah
18:49 karolherbst: we should check if games are broken in this regard
18:49 airlied: in the future that might not be true though
18:49 airlied: esp users stuck on sw gl
18:50 karolherbst: right
18:50 karolherbst: yeah.. it's a bit tricky
18:50 karolherbst: I just wasn't thinking about this potential issue before mhh
18:50 karolherbst: oh well, we still have time until 20.3 :p
18:50 karolherbst: but I'd really like to get that sorted out
18:50 airlied: writing a nouveau vulkan driver will solve 2 problems :-p
18:51 karolherbst: convince skeggsb to push it :p
18:51 karolherbst: airlied: but it's also about older hardware and stuff
18:52 karolherbst: or other drivers only having GL
18:52 airlied: yeah i can try on cayman hw
18:52 airlied: i should make rpms to test
18:52 airlied: just unplugged the cayman
18:53 karolherbst: I know some games feral ported support vulkan and opengl and default to vulkan.. some default to opengl
18:54 karolherbst: I think there might even be games allowing you to switch at runtime
18:54 karolherbst: dota 2 I think?
18:54 karolherbst: mhhh
18:55 airlied: but i also think apps will eventually get used to living in a world with a cpu vulkan provider
18:55 karolherbst: hopefully
18:55 airlied: also write a layer
18:55 airlied: to block older apps from seeing it i suppose
18:56 karolherbst: yeah.. but how do you decide what's too old and I am sure there will always be new ones messing it up
18:56 karolherbst: games can be.... annoying
18:56 karolherbst: and even if you tell them about issues they might not fix it
18:57 karolherbst: I trust them like 0
18:58 karolherbst: worst case you get a game without a config option at runtime and without a launch parameter always using vulkan if there is a driver, no matter what kind
18:58 airlied: we have cintact with most vendors
18:58 karolherbst: and I am sure those will/do exist
18:58 airlied: but fixes are slow usuallly
18:58 airlied: Plagman: ^
18:58 karolherbst: airlied: right, but doesn't solve the case where the "dev team" was removed :p
18:58 airlied: thoughts?
18:59 karolherbst: I already talked with a publisher and they told me, that they don't have linux devs anymore and won't be able to fix anything
18:59 karolherbst: so...
18:59 karolherbst: old bug, but still
18:59 karolherbst: I am sure that most games will be fine, hopefully, but there will always be those which are not
19:03 cwabbott: ugh, it failed with "CI taking too long" but it didn't actually run CI...
19:05 robclark: cwabbott: if that is one of the a630 runners, !6398 should in theory help..
19:05 cwabbott: robclark: no, it didn't even attempt to run CI I think
19:06 cwabbott: so the error message is nonsensical...
19:06 robclark: huh, got link to the job?
19:07 cwabbott: there is no job, since it didn't run CI
19:07 cwabbott: but it's https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5720
19:08 robclark: huh, ok.. that's a new one.. odd
19:09 airlied: cwabbott: do yo have allow other to change this mr ticked?
19:09 airlied: ah yes it is
19:11 pendingchaos: cwabbott: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5977#note_576737
19:18 jenatali: karolherbst: Did you want to get someone to review the Clover bits for !5891 (offsets)?
19:19 jenatali: Once that's done I think that one's good to merge
19:25 pmoreau: curro_ and karolherbst: Updated to have it as a function: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5038/diffs?commit_id=6e892429b9a29725d4effefa75abf82eef782ae4; I used uint8_t (since major and minor just use one byte each) as argument that are upcasted to uint32_t, rather than uint32_t args that are then ANDed with `0xFF`, since the former makes it more explicit if you are trying to pass in too much data.
19:27 curro_: pmoreau: seems fine, go ahead and merge it :)
19:28 pmoreau: curro_: I do not have the rights to do so, so please go ahead. And thank you for the reviews!
19:37 karolherbst: jenatali: maybe not review, but at least somebody else familiar with nir should take a look... in my first version I also had some ugly bugs in it
19:38 karolherbst: jenatali: actually.. there is a bug :D
19:39 karolherbst: let's see...
19:39 karolherbst: or maybe not?
19:39 karolherbst: pmoreau: mind giving !5891 a go with the CTS on tesla as we only advertise 2 dimensions there?
19:39 karolherbst: test_basic get_global_offset global_work_offsets
19:39 jenatali: karolherbst: Hah, I did actually spot a bug
19:40 karolherbst: jenatali: although I am only 20% sure it's a bug :/
19:40 pmoreau: karolherbst: Will do, but probably tomorrow now.
19:41 karolherbst: mhh. no I think it's fine, offset_vars[3] should be initialized to NULL, so the var ? check further down should be fine
20:00 karolherbst: jenatali: what should check out now? images? :D
20:01 jenatali: That's probably the next big one, if you want to give it a shot
20:01 jenatali: You could also try out libclc + jekstrand's constant series, make sure that's looking good so we can get that in?
20:01 karolherbst: mhhh, good idea
20:01 karolherbst: that's probably more important
20:01 jekstrand: jenatali: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6411
20:02 jekstrand: jenatali: Compile-tested only
20:02 jenatali: :)
20:02 jenatali: I'll give it a shot
20:02 karolherbst: okay.. so what do I need for libclc? :D
20:03 jenatali: LLVM code from trunk, though the actual clang binaries can be older
20:03 karolherbst: yeah.. I really only want to compile libclc though
20:04 karolherbst: are there integer_ops tests requiring libclc btw?
20:04 jenatali: And the SPIRV-LLVM-Translator executable, instead of just the lib
20:04 karolherbst: or just math_brute_force?
20:04 jenatali: karolherbst: Yes
20:04 karolherbst: mhhhh
20:04 karolherbst: which one?
20:04 jenatali: https://github.com/llvm/llvm-project/tree/master/libclc/generic/lib/integer
20:04 karolherbst: I mean.. most of them we can do in nir :p
20:04 jenatali: I don't think we're using all of these, most of them are trivial
20:05 karolherbst: oh well..
20:05 jenatali: https://github.com/llvm/llvm-project/blob/master/libclc/generic/lib/SOURCES#L66 these are the ones we ended up including in the built libclc
20:05 jenatali: I'd have to skim to see how many we're calling out to
20:05 karolherbst: yeah.. let's see
20:06 karolherbst: ehh..
20:06 karolherbst: can we not use fma? :D
20:07 jenatali: We can't
20:07 jenatali: We don't have a fused fma, which means it doesn't have enough precision to pass CL's tests
20:07 karolherbst: mad_sat seems like to be the only int function
20:08 karolherbst: mhhh...
20:08 karolherbst: right..
20:08 iive: there is simpler solution for the cpu vulkan render. have it disable unless an envvar is set. Just like the other stuff for debugging.
20:08 jenatali: Cool, we could probably build a smaller libclc
20:08 Plagman: airlied: i'm not aware of any games that support both GL and vulkan except doom, where enabling vulkan is opt-in by the user, and dota 2
20:08 karolherbst: iive: and what if you really only have software renderer?
20:08 Plagman: there's an attribute in the impl info to declare you're not a hw implementation right?
20:09 Plagman: maybe the loader could have an env var to disallow/allow them
20:09 iive: karolherbst, if you don't have the envvar set, it is like you don't have any render.
20:09 Plagman: steam could set the disallow one
20:10 karolherbst: iive: but then users have to enabled it
20:10 karolherbst: I am not looking for "technical correct solutions", I am looking for a solution which doesn't suck
20:10 pmoreau: karolherbst: Time to rebase: the MR was merged! \o/
20:10 jenatali: :)
20:11 iive: karolherbst, That's the point. If the user prefers slow software vulkan render over fast hw accelerated opengl one, then the user sets the envvar.
20:11 karolherbst: yay.. only 51 patches
20:11 karolherbst: :D
20:11 karolherbst: iive: that goes completly besides the point
20:11 airlied: Plagman: yeah there's a GPU vs CPU implemtatnion falag
20:11 airlied: Plagman: yeah a loader/layer might be an option there
20:11 karolherbst: the user should do nothing if it wants the best outcome
20:12 karolherbst: if a user has to set an env var to get vulkan sw over GL sw and vulkan sw is considered faster, than that solution sucks
20:12 iive: karolherbst, exactly. if the user doesn't set the envvar, he gets hw accelerated opengl.
20:12 karolherbst: iive: and what if the user doens't have hw accelerated stuff at all?
20:12 iive: karolherbst, then it doesn't matter...
20:12 karolherbst: maybe it does
20:12 karolherbst: and if it doesn't, then vallium doesn't matter
20:13 karolherbst: and then there might be only software using only vulkan and not gl, which is fine sw accelerated
20:13 karolherbst: the point is
20:13 karolherbst: the user shouldn't have to set an env var
20:13 iive: i thought that the point of vallium is doing testing. so it is not for general user consumption.
20:13 airlied: Plagman: VkPhysicalDeviceType
20:13 Plagman: yea
20:13 karolherbst: if it's only for testing, then it shouldn't be on master
20:14 airlied: karolherbst: huh?
20:14 karolherbst: or it's so clear for testing only that distributions won't ship it
20:14 airlied: that makes no sense
20:14 airlied: but I don't want to limit it to just testing
20:14 karolherbst: if vallium itself requires an env var, that's fine
20:14 karolherbst: airlied: right
20:14 airlied: people have found use cases for llvmpipe
20:14 karolherbst: exactly
20:14 karolherbst: and that's why I think any solution involving an env var is useless
20:14 iive: karolherbst, the users are also allowed to do testing.
20:15 karolherbst: iive: as airlied said, it's not only for testing
20:15 anholt:imagines firefox will want vallium when they start working on vulkan webrender
20:15 airlied: well an env var for steam to disable it would probably cover 99% of problem cases
20:15 karolherbst: if we want vallium to be enabled, _whatever_ uses it should do the right thing
20:15 karolherbst: or mesa or whatever
20:16 karolherbst: airlied: yeah well.. if a solution only works with steam it's equally useless :p
20:16 airlied: karolherbst: why?
20:16 karolherbst: because there are uses playing games without steam
20:16 karolherbst: *users
20:16 airlied: show me them :-P
20:16 karolherbst: there are enough
20:16 airlied: playing vulkan games?
20:16 karolherbst: I even own linux games which I can't get from steam
20:16 karolherbst: maybe in the future? but yes
20:16 karolherbst: there are also games supporting vulkan you can get from somewhere else
20:17 airlied: now how much effort do I need to put in to find those people?
20:17 karolherbst: does it even matter
20:17 iive: i think wine does have vulkan pass-through, doesn't it?
20:18 karolherbst: the concern is simply, that stuff shouldn't get useing vallium if there is accelerated OpenGL
20:18 karolherbst: *start
20:18 karolherbst: iive: wine can be fixed
20:18 airlied: karolherbst: maybe I can make vallium detect that somehow, not sure though
20:18 karolherbst: maybe.. but probably not
20:18 airlied: don't really want to run glxinfo |grep at startup
20:18 karolherbst: yeah..
20:18 karolherbst: and you won't know if the application will even try
20:19 iive: that's why I propose to use envvar... let the user choose.
20:19 karolherbst: they don't want to decide
20:19 airlied: karolherbst: you seem to understand there is no solution but not willing to accept it :-P
20:19 karolherbst: well.. we could just accept there is no solution, but I am quite confident we would run into that sooner or later
20:20 karolherbst: iive: suggest macos user to set some env var to get something they want :p
20:20 airlied: like the solution might be to just blacklist on an app by app basis
20:20 karolherbst: I left this "linux is just for nerds living at their parents home" phase long time ago :p
20:20 airlied: and fix open source apps that hit it
20:20 karolherbst: yeah..
20:21 karolherbst: that might be the best solution
20:22 ajax: i mean... if your game engine thinks VK_PHYSICAL_DEVICE_TYPE_CPU is better than hardware opengl, perhaps that's your game engine's bug
20:22 karolherbst: airlied: or check once and save a file inside /tmp :D
20:22 ajax: (which vallium ought to be setting itself as. i sincerely hope and expect.)
20:22 karolherbst: ajax: well.. games have bugs and I don't trust them to get it right
20:23 karolherbst: most will
20:23 karolherbst: some won't
20:23 iive: does mesa work macosx?
20:23 ajax: the market will effectively drive those games out of circulation, to the extent that this scenario even arises, which is, i'ma say: rare
20:24 karolherbst: iive: you know what I meant
20:24 ajax: i recommend finding a better thing to worry about?
20:25 ajax: you want an envvar for VALLIUM_HURT_ME_PLENTY=1 to force it to lie about being software, go ahead i guess
20:25 karolherbst: ajax: I mean.. I am fine with a blacklist of broken applications we can't fix, that's just something coming into my mind which could be a problem
20:25 jekstrand: That's called VK_ICD_FILENAMES=/dev/null
20:25 airlied: yeah I also think we wait until it's a problem that we have some scope on
20:26 iive: the default one should be the most desired thing. That is using hardware.
20:26 karolherbst: yeah, fine by me, but the backlisting solution actually sounds good enough I think
20:26 karolherbst: iive: and how do you detect apps only doing vulkan vs apps doing both?
20:26 karolherbst: anyway
20:26 jenatali: jekstrand: Looks like that works :)
20:26 karolherbst: a solution involving env vars isn't one :p
20:26 karolherbst: at least not here
20:28 jekstrand: jenatali: Cool. I think that's probably better than having the caller try to sort it out after it's done the inlining.
20:28 jenatali: jekstrand: Agreed. Less code, and more efficient
20:28 jekstrand: jenatali: And I've got patches which really want it too. :)
20:28 jenatali: jekstrand: Will rebase that into the libclc series
20:28 jekstrand: jenatali: Cool. I'm happy for you to pull it into that series and I can just close my MR
20:29 jenatali: Great, sounds good
20:29 karolherbst: ehh... conflicts
20:29 iive: karolherbst, in most cases, you would not want to run software render, even if it is your only option.
20:29 iive: I just hope that the blacklist is in a config file
20:29 karolherbst: iive: gnome on wayland is not an important enough use case?
20:30 karolherbst: but still, that only does GL for now
20:30 karolherbst: but still a case where you require GL
20:30 karolherbst: even if it's only software
20:30 ajax: that's gnome on anything, tbh
20:30 karolherbst: right
20:30 iive: gnome is slow as it is..
20:30 karolherbst: iive: besides the point?
20:31 ajax: so... i guess i'd care about gnome on vallium if it happens to be faster than gnome on llvmpipe
20:31 karolherbst: :D
20:31 karolherbst: right
20:31 karolherbst: but it might actually be
20:31 karolherbst: or does mutter disable GL validation?
20:31 ajax: what does "disable gl validation" mean
20:32 karolherbst: uhm.. this no_error stuff
20:32 airlied: khr no error
20:32 iive: have in mind, vallium vs llvmpipe may become a race. where you have different winner every other release :D
20:32 ajax: it's not presently asking for that flag, i don't think. oughta be i suppose.
20:32 karolherbst: yeah..
20:32 karolherbst: would make sense to use it fo compositors
20:33 ajax: i think the zeroth-order term there is you don't have any memory bandwidth, and the error checking overhead is down in the noise
20:33 airlied: i expect like all swrast the cpu time is swamped
20:33 airlied: by the rast
20:33 karolherbst: probably right
20:34 ajax: right. which is bandwidth more than it is execution units.
20:34 karolherbst: but for accelerated you still want it to be enabled
20:34 karolherbst: anyway
20:34 karolherbst: it would just make the comparison llvmpipe vs vallium a bit more fair
20:34 karolherbst: but yeah.. probably doesn't matter
20:34 airlied: i have some pipelining fixes for llvmpipe
20:34 airlied: that help gs
20:34 karolherbst: or maybe somebody finds out that vallium consumes 5% less power or so...
20:34 karolherbst: who knows
20:35 airlied: currently it doesnt overlap vs and fs
20:35 airlied: so wint saturate your cores
20:35 ajax: also this is all predicated on mutter growing a vk backend at all, which is still in the future tense iirc
20:35 airlied: i got heaven to saturate
20:35 karolherbst: sure
20:36 karolherbst: but stuff like gtk4 pr qt6 or wahtever could get vulkan support as well and I wouldn't be surprised if somebody goes and says: ehhh.. doesn't matter if the vulkan driver is CPU
20:36 ajax: airlied: would also be cool if it could do swapbuffers in parallel
20:36 airlied: ajax: yeah that was trickier
20:37 airlied: need to use oresent i exoext
20:37 ajax: i'm about >< from wanting to rewrite drisw in xcb anyway
20:37 airlied: fencing access to buffers was a bit tricky with shm image
20:38 airlied: we currently Xsync
20:38 ajax: yeah, because you can't listen for the shm completion event, because the main loop already might be
20:38 airlied: yes that one
20:39 ajax: could do an async xsync? pop a thread, xcb_get_input_focus(), pthread_join
20:40 airlied: i ideally dint want the sync
20:40 ajax: you need a roundtrip some way or other
20:40 airlied: just a shmfence of some sort
20:40 ajax: X's request atomicity means if you emit ShmPutImage and then GetInputFocus, the reply from the focus will come back after the ShmPutImage is _complete_
20:41 ajax: no need to get all fency clever
20:42 airlied: so keep two shm pixmaps
20:42 airlied: and join thread before reuse
20:43 ajax: mmm. have to keep them in sync when you swap though
20:44 ajax: or do buffer_age. we should do buffer_age.
20:44 karolherbst: can somebody just give me the latest libclc.spv files? :D
20:44 airlied: i suppose you could overlap up until rast
20:44 airlied: which might be enough
20:44 ajax: would want to profile to see which part of the pig to apply the lipstick to, i think
20:45 airlied: i had heaven blocking on putimage
20:45 airlied: as the only stall point
20:45 jekstrand: jenatali: So.... this __builtin_has_hw_fma32
20:45 jenatali: Yeah
20:46 airlied: ajax: maybe just delaying until rast with singke pixmap might be enough
20:46 airlied: might add to todo
20:46 karolherbst: jekstrand: I suspect we want to have ffma and fmad :p
20:47 ajax: airlied: would certainly be a win to do that on its own, yeah
20:47 airlied: the vulkan wsi is also putimage i should improve that to use s as well
20:47 airlied: shm
20:47 jekstrand: karolherbst: I think we do
20:47 jekstrand: karolherbst: Right now, you can get ffma by setting exact on ffma
20:48 jekstrand: But we should probably have two opcodes
20:48 karolherbst: or use_fma = false
20:48 jenatali: jekstrand: Some functions in libclc, specifically sin/cos, only work correctly if you have an ffma with increased precision, which DXIL doesn't have
20:48 karolherbst: and has_fma= true
20:48 karolherbst: but yeah..
20:48 karolherbst: for us it's especially annoying
20:48 karolherbst: nv pre nvc0 was fmad, nv nvc0+ is ffma :/
20:49 karolherbst: and we only have one version on hw
20:49 jenatali: There's a software implementation of the high-precision fma, but it's pretty complex. There's also an algorithm which is just slightly worse, instead of requiring the full software fma to get the right accuracy
20:49 jekstrand: jenatali: Right. I'm just wondering if a __builtin to detect it is better than just doing a second lowering of ffma
20:49 jenatali: Yes, it is :)
20:49 jekstrand: jenatali: Oh, if it's completely switching algorithms, that's a different story.
20:49 jenatali: jekstrand: Let me get you the links to show you the difference
20:49 karolherbst: mhhh
20:50 karolherbst: jenatali: we want a few things merged before libclc
20:50 jenatali: jekstrand: https://github.com/llvm/llvm-project/blob/master/libclc/generic/lib/math/sincos_helpers.cl#L110
20:50 karolherbst: ohh wait !6256 is already merged
20:50 karolherbst: mhh
20:50 jenatali: As opposed to the full software fma: https://github.com/llvm/llvm-project/blob/master/libclc/generic/lib/math/clc_fma.cl#L35
20:50 karolherbst: jenatali: mind rebasing that libclc stuff? :D
20:50 jenatali: karolherbst: I'm working on it :)
20:50 karolherbst: cool
20:51 karolherbst: jenatali: I am sure one can do better.. :/
20:51 karolherbst: or maybe not?
20:51 jenatali: karolherbst: If you want the full precision that the CL CTS requires for fma... probably not
20:51 karolherbst: ehhh wait
20:51 karolherbst: wait wait...
20:51 karolherbst: why do you need this anyway?
20:51 karolherbst: the spec is quite clear it doens't care, no?
20:51 jenatali: No
20:52 karolherbst: I am sure the spec if fine with either version
20:52 jenatali: "Returns the correctly rounded floating-point representation of the sum of c with the infinitely precise product of a and b. Rounding of intermediate products shall not occur. Edge case behavior is per the IEEE 754-2008 standard."
20:52 karolherbst: ohh wait
20:52 karolherbst: that was for optimizing add and mul
20:52 karolherbst: for optimization the spec doesn't care
20:52 jenatali: Right
20:52 karolherbst: but I guess for an explicit fma it does?
20:52 jenatali: The spec *requires* a conformant fma
20:52 karolherbst: right.. mhh
20:53 karolherbst: annoying
20:53 jenatali: Agreed :)
20:53 jenatali: That __builtin was the best we could come up with
20:53 karolherbst: mhhh
20:53 karolherbst: I have an idea...
20:54 karolherbst: we can probably deal with that inside vtn or nir when linking
20:54 jenatali: karolherbst: That's the approach we went with, to deal with it inside vtn
20:54 karolherbst: jekstrand: nir_options.has_fma is this for fused or any?
20:55 jenatali: karolherbst: Did you see that first link I pasted to the sincos_helpers?
20:55 karolherbst: ehhh
20:55 karolherbst: fuse_ffma and lower_ffma are the options mhhh...
20:55 karolherbst: jenatali: uff... :/
20:55 jenatali: Yep
20:55 jenatali: We spent a while trying to pass bruteforce :)
20:56 karolherbst: I imagine
20:56 karolherbst: it is _very_ thourough
20:56 jenatali: Understatement of the century, even with emphasis
20:56 karolherbst: :D
20:57 jekstrand: I kind-of want has_hw_fma32 to be a specialization constant.....
20:58 jenatali: Yeah, we had talked about doing that, but I think it was tricky, since the CLC -> SPIR-V path doesn't really have a way to do specialization constants
20:59 jekstrand: Yeah....
20:59 jenatali: I guess we could always patch the SPIRV-LLVM-Translator to turn __builtin_has_hw_fma32 into a specialization constant?
20:59 jekstrand: :-/
20:59 jenatali: Yeah...
20:59 jekstrand: I don't see a "good" solution anywhere....
20:59 jenatali: Indeed
21:00 jenatali: Like I said in the MR comments, I'm happy to split the libclc patches into two, one which deals with upstream stuff that has landed, and another which deals with the stuff that LLVM folks haven't reviewed yet
21:00 jenatali: But yeah, that fma32 is ugly
21:00 jenatali: No matter which way you slice it
21:01 karolherbst: can't we make it better by... using mad? :D
21:02 karolherbst: jenatali: that code should have checked if c is 0 and a or b are 1 tbh :p
21:02 jekstrand: jenatali: What would it look like if we used just what's upstream today?
21:02 karolherbst: ohh wait.. c is checked
21:02 jenatali: jekstrand: The series would stop at "vtn/opencl: Switch some nir-sequence ops to use libclc"
21:03 jekstrand: jenatali: Ah
21:03 jenatali: Ah, actually, I should also refactor fmod into its own patch, that's not in the upstream libclc yet either
21:04 jenatali: But still, it's just chopping off bits of the last few patches
21:07 jekstrand: What do functions which are just prototypes look like in SPIR-V?
21:07 jenatali: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_logicallayout_a_logical_layout_of_a_module #10
21:08 jenatali: OpFunction, OpFunctionParameter, OpFunctionEnd
21:09 jenatali: With an import linkage declaration on it, since libclc is sent through vtn as a library without fully linking it
21:10 karolherbst: jenatali: mhh.. yeah.. so this needs a bit more work :)
21:10 karolherbst: let's see if I can figure it out
21:11 jenatali: karolherbst: What're you seeing?
21:12 karolherbst: asserts
21:12 karolherbst: it's fine
21:12 karolherbst: I'll show a patch once I am done
21:12 jenatali: Great, thanks
21:13 karolherbst: acos fails :D
21:14 karolherbst: but yeah.. seems to work
21:14 jenatali: Fails how?
21:14 karolherbst: nan handling
21:14 karolherbst: ohh
21:14 jenatali: Ah
21:14 karolherbst: a bunch of them fail because of nan
21:14 karolherbst: yeah..
21:14 jenatali: Hm... wonder if there's something ordered vs unordered going on
21:15 karolherbst: jenatali: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/03ee75557e8cac0f676d8ea1ba19b8b5d9a47da3
21:15 jenatali: I'm seeing that we pass them all on our software driver, but it also didn't probably handle the difference between ordered and unordered fne
21:15 jenatali: karolherbst: Ah, duh, thanks :)
21:15 karolherbst: ahh yeah.. that debugging stuff was just merged as well
21:16 karolherbst: it does depend on the constant series though :p
21:16 karolherbst: but you get the idea
21:16 jenatali: Yep
21:16 karolherbst: anyway. it does seem to work
21:16 karolherbst: just two issues
21:16 karolherbst: 1. if libclc isn't found the runtime crashes
21:16 jenatali: :P oops
21:16 karolherbst: 2. it seems nan handling is broken :p
21:16 karolherbst: heh
21:17 karolherbst: ceil and copysign pass :p
21:17 karolherbst: but yeah
21:17 karolherbst: only see nan fails
21:17 jenatali: Cool - will be fun to debug nan handling
21:17 jenatali: I wonder what's going on there
21:17 karolherbst: ohh
21:17 karolherbst: exp and exp2 pass
21:17 karolherbst: those use _huge_ constant lookup tables, no?
21:17 jenatali: I think so
21:18 karolherbst: let's do a proper run :D
21:18 karolherbst: just 100 tests in bruteforce
21:18 jenatali: Still takes a while :P
21:19 karolherbst: well.. most fail.. so ...
21:19 karolherbst: let me enable wimpy mode
21:20 jekstrand: jenatali: Does OpenCL mangle all function names?
21:21 jenatali: jekstrand: All builtins defined by the SPIR-V spec
21:21 jenatali: Not all
21:21 jenatali: Er, by the SPIR spec, sorry
21:21 karolherbst: added -w to my cl cts runner :)
21:21 jekstrand: Does CLC allow overrides as a general language feature or only for builtins?
21:22 jenatali: jekstrand: Overrides?
21:22 jekstrand: jenatali: Same function name, multiple sets of input types
21:22 karolherbst: jenatali: not too bad
21:22 jenatali: jekstrand: Ah, just for builtins
21:22 karolherbst: jenatali: https://gist.github.com/karolherbst/c90e29ea921886bc8f27f130ab692561
21:23 jekstrand: jenatali: k
21:23 jenatali: karolherbst: Not bad :)
21:23 jenatali: jekstrand: Is this line of questioning still related to fma? I feel like you're about to come back with a beautiful solution
21:23 karolherbst: hah.. I don't se lower_ffma nice
21:23 karolherbst: *set
21:24 jenatali: karolherbst: That could easily cause all of your nan failures
21:24 karolherbst: ehh...
21:24 karolherbst: annoying
21:24 jekstrand: jenatali: You'd like that, wouldn't you?
21:24 jekstrand: jenatali: Just thinking about builtins
21:24 jenatali: jekstrand: Honestly I'm not sure - changing course means more work for me :P
21:24 karolherbst: let's try that...
21:24 jekstrand: jenatali: There are some Intel OpenCL extensions which are implemented as builtin functions rather than "real" SPIR-V opcodes and I'm thinking about how bbrezillon's builtin function handling will play into that.
21:25 jenatali: jekstrand: Ah, makes sense
21:25 karolherbst: jenatali: I still get fails, so I don't think it's that
21:25 jenatali: karolherbst: Too bad :(
21:26 karolherbst: could also be some optimizations going wrong in codegen.. let's see
21:26 karolherbst: mhh.. nope
21:26 karolherbst: but yeah.. it does seems to work in general
21:27 jenatali: Cool :)
21:27 karolherbst: also.. can we merge more stuff? :D
21:27 jenatali: karolherbst: Offsets?
21:27 karolherbst: yeah
21:27 karolherbst: mind doing the fix yourself?
21:27 jenatali: Yeah I can do that fix, one sec
21:27 jenatali: Incorporating your Clover patch to the libclc series
21:27 karolherbst: I just don't want to have those branches with 100+ patches again :D
21:27 jenatali: A pain because it conflicts since constant mem isn't merged yet :P
21:28 karolherbst: right
21:28 karolherbst: but that constant mem stuff works perfectly for me
21:28 karolherbst: so I say we merge offsets, then get the constant stuff reviewed
21:28 karolherbst: and then libclc?
21:35 jenatali: karolherbst: Offsets should be good, unless you wanted more than an ack on that last patch
21:35 karolherbst: I am fine with the last patch, no idea what curro_ things if it's okay to push stuff into clover/nir wihtout his ack/review
21:36 karolherbst: but it also is quite internal to the nir stuff anyway
21:42 jenatali: jekstrand: If Intel has extensions that are just unresolved external functions, there's a good chance those shaders won't make it as far as vtn, depending on how Clover (or whatever other frontend) does CL's linking. If you try to use SPIR-V linker, it'll complain that there's unresolved externals
21:42 jenatali: The libclc __builtin only works because libclc doesn't get linked at the SPIR-V level
21:43 jenatali: I.e. you need to inform the SPIR-V linker (or whatever linker is used) that "this specific unresolved external should be allowed" but it should still fail if there's calls to normal unresolved externals
21:44 jekstrand: jenatali: Yeah, the SPIR-V linker causes us problems.
21:45 jekstrand: jenatali: Here's another crazy idea: Do we want to just have separate spirv64-hwffma and spirv64 targets?
21:45 jekstrand: jenatali: Yeah, that's terrible
21:45 jekstrand: I'm not sure if it's more or less terrible
21:45 jenatali: jekstrand: I... don't like it
21:45 jekstrand: I don't like it either
21:45 jenatali: Especially because that means that for e.g. Clover, you have to have both of them available depending on which device you're going to use Clover with
21:45 jenatali: You can't have a platform-wide libclc
21:46 jekstrand: Yeah....
21:48 jenatali: Especially if you want to support devices that may have both 32bit and 64bit addressing, then suddenly your matrix of required libclc binaries doubles again
21:49 jekstrand: yeah
21:49 jekstrand: I really want this to be a spec constant.....
21:50 jekstrand: I just don't know how we'd handle the IDs
21:56 karolherbst: jekstrand: but that stuff gets constant folded anyway, no?
21:56 karolherbst: fma I mean
21:57 karolherbst: so if there is an if it should be all fine
21:57 jenatali: Yeah I think in practice the __builtin function call vs a spec constant is pretty much identical, it's just a matter of cleanliness
21:57 jenatali: One's an abuse of the SPIR-V while the other is the intended mechanism for that kinda thing
21:58 karolherbst: yeah.. I think I like the spec constants stuff more
21:58 karolherbst: it's more ABI like and not a big surprise for consumers
22:00 karolherbst: jenatali: nan only fails for vec16 I think.... let me test a non wimpy run
22:00 karolherbst: but .. well
22:00 karolherbst: it's just vec16 :D
22:01 karolherbst: mhh
22:01 karolherbst: some fail with vec8 as well
22:01 karolherbst: strange
22:01 jenatali: Huh
22:01 jenatali: You can add -1 to force just scalar
22:01 karolherbst: I know
22:01 jekstrand: jenatali: Unfortunately, a "proper" solution to this problem is something the SPIR-V working group in Khronos has been arguing about for probably 3-4 years now and we have yet to solve it. :-(
22:01 jenatali: :D
22:01 karolherbst: ahh atan2 fails with -1
22:02 karolherbst: that's not that terrible to debug
22:02 karolherbst: jekstrand: what did they argue about? two vs one opcode?
22:02 karolherbst: :D
22:03 jekstrand: karolherbst: How to handle shaders that need different code-paths based on what's supported in hardware.
22:03 karolherbst: ahh
22:04 jekstrand: Spec constants sort-of help but it's very limited.
22:04 pmoreau: jenatali: What kind of unresolved dependencies do you want to allow? It already allows for variables/functions decorated with BuiltIn; you can always allow partial linkage by calling `LinkerOptions::SetAllowPartialLinkage(true)`. It currently doesn’t support functions being overriden, though I do have patches for fixing that, that have been laying around for some time.
22:04 jekstrand: pmoreau: Functions can be decorated BuiltIn?
22:04 pmoreau: IIRC yes; let me check
22:05 pmoreau: It’s been a while since I looked at it
22:05 jekstrand: SPIR-V spec for Builtin says " Indicates which built-in variable an object represents. See BuiltIn for more information."
22:05 pmoreau: Ah no
22:06 jenatali: pmoreau: If I wanted to implement an extension without having to have dedicated SPIR-V opcodes, I'd want it to be an "allowed" unresolved external. That way, if the app has a fully-linked binary except for that extension, we can link in the missing piece. But if they are missing implementations of their own functions, we can give them a nice linker error
22:07 jenatali: I guess otherwise we could just allow partial linkage at that point and let vtn emit the error instead
22:08 pmoreau: I probably got confused by OpenCL functions like `get_global_id(dim)` (which are mapped to SPIR-V builtin variables) and thought that functions could be decorated with builtin.
22:09 jekstrand: karolherbst: Where are we at on constants?
22:09 jekstrand: karolherbst: I didn't really get the gist of the status yesterday
22:09 karolherbst: seems to work alright
22:09 jekstrand: karolherbst: Things seem to mostly work on iris. I've got a couple bugs to debug but that's not your problem.
22:09 karolherbst: I think?
22:10 karolherbst: at this point I am not sure if I indeed get mem_constant though...
22:10 jekstrand: karolherbst: What do you mean?
22:10 pmoreau: jenatali: I see, so like you would pass in an array of names to ignore for example? (though you would probably need some information on the arguments as well to handle function override.
22:10 karolherbst: jekstrand: I mean.. in kernel constants are fine, I was more refering to __constant* args
22:11 karolherbst: I think those are still plain global
22:11 jekstrand: karolherbst: Yeah, you access them as global, I think.
22:11 jekstrand: karolherbst: But they have slightly different rules inside the optimizer.
22:11 jekstrand: I think
22:11 jenatali: pmoreau: That was my thinking. But I think we could just let vtn be the one to emit linker errors and that'd be fine
22:11 jenatali: pmoreau: It's not even my ask, this was all jekstrand trying to implement Intel extensions that don't have SPIR-V opcodes :P
22:12 pmoreau: jenatali: :-D
22:13 pmoreau: jekstrand: I should be able to hack a quick patch if needed, though getting it upstreamed would take a bit longer.
22:13 karolherbst: jekstrand: yeah.. I think I will try to move them over to constant as there is no reason to keep them as global
22:13 karolherbst: and see how that goes
22:14 jenatali: jekstrand: Any objections if I go ahead and merge !6393 (enum -> unsigned for nir_variable bitfields)?
22:14 karolherbst: jenatali: ehhh.. let me retest with lower_ffma.. I have no shader cache patches applied :D
22:14 karolherbst: ahh, still broken
22:14 jenatali: Damn
22:15 karolherbst: yeah...
22:15 karolherbst: mhh
22:15 karolherbst: annoying
22:15 karolherbst: I will check with something trivial
22:15 jenatali: If you debug and find answers, let me know
22:15 karolherbst: atan2 generates like 83 instructions, not that bad :)
22:16 karolherbst: ehhh
22:16 karolherbst: our TLS space is not big enough for libclc :O
22:16 karolherbst: that's going to be annoying
22:16 jenatali: TLS?
22:16 karolherbst: thread local storage
22:17 jenatali: Ah
22:17 karolherbst: scratch memory :)
22:17 karolherbst: but our allocation is a bit weird
22:17 karolherbst: so.. we have to alloce a bo for the entire GPU
22:17 karolherbst: and size it accordingly depending on .. SM and thread count :p
22:17 karolherbst: so bigger GPUs need a bigger allocation
22:17 karolherbst: and we kind of do it
22:17 karolherbst: we just never resize if we need more
22:19 karolherbst: jenatali: ERROR: atan2: -nan ulp error at {-0x1.fffffcp-127 (0x807fffff), -0x1.fffffcp-127 (0x807fffff)}: *-0x1.2d97c8p+1 vs. -nan (0xffffffff) at index: 3762
22:20 jenatali: You're producing a nan while doing atan2 on 2x -FLT_MAX
22:20 jenatali: Awesome
22:20 karolherbst: :D seems like it?
22:20 jenatali: Er, -FLT_MIN
22:20 karolherbst: I never bothered actually understanding the format the CTS prints out
22:20 jenatali: Yeah the * is expected
22:20 karolherbst: ahh
22:20 karolherbst: okay
22:21 jenatali: I wish that was clearer...
22:21 karolherbst: uhh wait...
22:21 imirkin: karolherbst: get rid of all the FTZ's and it should work
22:21 karolherbst: imirkin: it's graphics only
22:21 karolherbst: we don't do it for compute shaders
22:21 karolherbst: afaik
22:21 karolherbst: ohhhhh wait....
22:21 karolherbst: let me check something
22:21 imirkin: i'm not 100% sure. definitely started out not doing it for compute
22:21 jenatali: karolherbst: Does Clover set the API cap saying it supports float denorms?
22:21 jekstrand: jenatali: Go for it. (re 6393)
22:21 imirkin: but i think i eventually flipped it on
22:21 karolherbst: imirkin: we have a flag on the QMD header
22:22 karolherbst: for denorm flushing
22:22 karolherbst: maybe that's it
22:22 imirkin: wha....
22:22 karolherbst: :)
22:22 karolherbst: imirkin: https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/compute/clc1c0qmd.h#L145
22:23 karolherbst: but I think by default it should be fine
22:23 imirkin: doesn't apply, i think
22:23 karolherbst: but we still have it
22:23 imirkin: "narrow instruction"?
22:23 karolherbst: no clue
22:23 imirkin: on fermi, you could do 4-byte instructions
22:23 karolherbst: that's for pascal
22:23 imirkin: but we never quite got it going
22:23 imirkin: i'm not aware of any "narrow" instructions on later gpu's
22:23 karolherbst: whatever that means :p
22:23 imirkin: yeah, i guess it could mean anything
22:24 karolherbst: but yeah.. seems like that field went away
22:24 karolherbst: volta doesn't have it anymore
22:24 imirkin: presumably there's a narrow encoding which doesn't have the FTZ bit
22:24 imirkin: and this lets you just set/unset it globally
22:25 karolherbst: yeah...
22:25 karolherbst: the QMD2 format doesn't have it either
22:25 karolherbst: and we use.. 2.1 on pascal?
22:25 karolherbst: yeah.. no FTZ emited
22:26 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#n335
22:26 imirkin: and presumably you're using TYPE_COMPUTE here
22:26 karolherbst: yep
22:26 karolherbst: nvdisasm also agrees
22:26 imirkin: so yeah, i guess i left that in
22:27 karolherbst: I bet when I emit FTZ the test fails even more
22:27 karolherbst: heh
22:27 karolherbst: or not at all...
22:27 karolherbst: now I am confused
22:27 jenatali: :O
22:27 karolherbst: ERROR: atan216: -nan ulp error at {-inf (0xff800000), -inf (0xff800000)}: *-0x1.2d97c8p+1 vs. -nan (0xffffdead) at index: 99
22:27 karolherbst: :D
22:27 karolherbst: fails differently
22:27 karolherbst: super
22:28 jenatali: ffffdead is an excellent nan representation
22:28 karolherbst: let me check something
22:28 karolherbst: we also emit NEU.. mhhh
22:28 karolherbst: but I know that our folding is a bit broken
22:29 karolherbst: emited binary: https://gist.github.com/karolherbst/8f6a44af1ec376a487860efd3c37c36d
22:30 karolherbst: maybe MUFU.RCP kills it?
22:30 karolherbst: but for fp32 that should be fine...
22:31 karolherbst: I blame some nir opt :p
22:31 karolherbst: jekstrand: how can I turn of all algebraic opts?
22:31 jenatali: Don't call nir_opt_algebraic?
22:32 karolherbst: ohh right.. I don't have to fix it up in gallium :D
22:32 karolherbst: ehhh.. somebody removed support for fsub :D
22:33 karolherbst: mhhh.. same
22:34 karolherbst: ohh.. maybe
22:34 karolherbst: strange
22:35 karolherbst: jenatali: so it does try to do atan2 on -0x1.fffffcp-127, -0x1.fffffcp-127 and expects -0x1.2d97c8p+1, right?
22:35 jenatali: Yep
22:36 airlied: another insane optin for clc is to split the libs, have a core lib and smaller ones per ffma func
22:36 airlied: but feels horrible as well
22:36 karolherbst: mhhh
22:36 jenatali: airlied: Yeah... I suppose it'd work but agreed, doesn't feel great
22:37 airlied: like we can do whatever we like since we control the horizontal and the vertical (outer limits fans only)
22:37 curro_: jenatali: karolherbst's R-b for something under clover/nir/ should be good enough, don't wait for me to merge it ;)
22:37 jenatali: curro_: What if karolherbst wrote it? :)
22:38 curro_: jenatali: heh, fine, i'll take a look at the patch then :P
22:39 curro_: what's the MR number?
22:39 jenatali: !5891
22:44 karolherbst: uhhh jenatali I think there is a bug for devices with 2 dims...
22:45 karolherbst: it should be nir_vec3 allways
22:45 jenatali: Where?
22:45 curro_: regarding the libclc discussion, would it be helpful to fork+merge *all* potentially useful variants of libclc into the mesa tree so there is no dependency problem anymore? :P
22:45 karolherbst: jenatali: spir-v requires always a vec3, no?
22:46 jenatali: karolherbst: I believe so - are you talking about in the offsets patch for clover?
22:46 airlied: curro_: would introduce a clang build dep on mesa
22:46 airlied: whihch might be fun to resolve
22:46 karolherbst: jenatali: yes
22:46 karolherbst: nir_vec -> nir_vec3
22:47 karolherbst: the other part of the code handles that correctly already
22:47 jenatali: Cool, will take a look in a minute
22:47 curro_: airlied: hm but clover depends on clang already
22:47 karolherbst: or just put a 3 instead of state->global_dims
22:47 airlied: curro_: not to execute
22:47 airlied: at build time
22:47 karolherbst: also.. we depend on libclang, not clang :p
22:48 karolherbst: for some distribution that matters even
22:48 curro_: doesn't seem like a terribly invasive change since most people with libclang available at run-time are likely to have clang lying around too, or? :P
22:48 airlied: it's not a horrible idea, but it would definitely need more scoping into how bad it might be
22:49 airlied: just not sure we want to fork libclc, but I don't know if it has any other consumers anyways
22:49 karolherbst: I think it is a good idea to make it available to others if we can show that it works
22:49 karolherbst: no point that everbody implements it themselves...
22:49 karolherbst: airlied: also, somebody did add the PTX stuff for a reason I hope?
22:50 airlied: karolherbst: a reason yes, a good reason, who knows :-P
22:50 karolherbst: LD
22:50 karolherbst: :D
22:50 karolherbst: at this point any reason is fine I think :D
22:51 karolherbst: jenatali: anyway, with that vec stuff fixed, I feel comfortable enough with the patch :D I knew there was something fishy, I just didn't notice :p
22:51 karolherbst: uhm
22:51 karolherbst: didn't find it at first
22:52 jenatali: karolherbst: Cool, I'll just replace state->global_dims with 3 I think?
22:52 karolherbst: yeah
22:52 curro_: airlied: or maybe not fork it nor merge it into the mesa tree, instead send a merge request for the one chosen libclc tree collecting the useful bits from every other variant, so it's still somebody else's maintenance problem? ;)
22:58 karolherbst: jenatali: that atan2 lowering is still quite massive :/
22:59 jenatali: karolherbst: Yep
22:59 karolherbst: that will be annoying
22:59 jenatali: There might be easier fails to focus on?
22:59 karolherbst: I thought writing my own test would make it smaller..
22:59 karolherbst: guess what
23:02 karolherbst: jenatali: I can just compare with nvidia and see if they do anything special :D
23:04 karolherbst: mhh.. lowering is different enough :/
23:05 karolherbst: uhhh
23:06 curro_: because nvidia take so much more care for their assembly to be readable? ;)
23:06 karolherbst: I am inclcined to take care of the crash before :D
23:07 jenatali: karolherbst: Those would just be opcodes I added (which maybe should just be lowerings instead based on the post-merge discussion on that MR)
23:07 karolherbst: jenatali: but there seems to be something fundamentally wrong :/
23:07 karolherbst: not quite sure what though
23:07 jenatali: karolherbst: I'm not sure... our software driver passes it, but yeah I'm not getting perfect results on hardware
23:07 jenatali: We'll figure it out
23:08 jenatali: If it's a bug in libclc we'll get it upstreamed there, if it's a bug on our side we'll fix it there, I'm not too concerned
23:09 karolherbst: mhh "Subnormal values supported for floats? NO"
23:09 jenatali: That means the test should be expecting them to be ftz
23:09 jenatali: karolherbst: The values that were failing were the smallest (negative) normal floats, they weren't subnormal
23:09 karolherbst: airlied: clinfo is super slow with libclc
23:10 karolherbst: jenatali: right...
23:10 karolherbst: let me check something...
23:10 karolherbst: what is the default rounding for floats?
23:10 karolherbst: nearest?
23:10 jenatali: rtne, yeah
23:12 airlied: karolherbst: yeah it has to link stuff
23:12 karolherbst: let me still enable denorms and fma
23:12 airlied: karolherbst: should probably figure out how to make that faster
23:13 karolherbst: airlied: it links multiple times :D
23:13 karolherbst: even with the shader cache enabled
23:14 karolherbst: mhhhh
23:15 karolherbst: maybe I can get atan to fail
23:15 karolherbst: that should be a bit more siple
23:16 karolherbst: it does fail for vec16.. but maybe in non wimpy for other precisions as well
23:17 karolherbst: jenatali: divide fails :)
23:17 jenatali: O.o
23:17 jenatali: That's... not libclc :)
23:18 karolherbst: yeah
23:19 karolherbst: so the deal is.... we don't have fdiv...
23:19 karolherbst: because.. nobody has, right?
23:19 jenatali: That... could explain why WARP can get accurate results but not hardware
23:19 karolherbst: yeah..
23:20 jenatali: Maybe we do need a more precise divide implementation than reciprocal multiplication
23:20 jenatali: karolherbst: What's NV's code look like for divide? :)
23:20 karolherbst: (fdiv a b) == (mul a (rcp b)
23:20 karolherbst: )
23:20 jenatali: I mean for the CL divide test
23:21 karolherbst: yes
23:21 jenatali: Ah, ok
23:21 jenatali: Was curious if they did something more complex for CL
23:21 karolherbst: or did you mean the full one? I mean.. there is just the divide and boilerplate :)
23:21 karolherbst: nah.. just the rcp and mul
23:21 karolherbst: so I can totally see this causing it to fail
23:21 karolherbst: imirkin: any ideas?
23:21 karolherbst: let's ask nvidia :D
23:22 karolherbst: mhhh
23:22 imirkin: iirc we try to do it the nvidia way
23:22 karolherbst: div.full
23:22 karolherbst: that looks suspicious
23:22 karolherbst: and this is ptx
23:22 karolherbst: ehhhh
23:23 karolherbst: jep...
23:23 karolherbst: imirkin: https://gist.github.com/karolherbst/94272114ce318dbf5d4a3f57d1037a67
23:23 jenatali: https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#22.10.2%20div: "Beware of the two allowed implementations of divide: a/b and a*(1/b)"
23:24 karolherbst: yeah
23:24 karolherbst: graphics is quite lax on that
23:24 karolherbst: soo.. hardware doesn't have div
23:24 karolherbst: because...
23:24 jenatali: Thanks, that's super helpful to have that lead actually
23:24 karolherbst: you know.. insane instruction
23:24 karolherbst: idiv is an even bigger headache
23:25 karolherbst: sooo.. ptx has div.approx and div.full
23:25 karolherbst: and approx is a*1/b of course
23:26 karolherbst: heh..
23:26 karolherbst: they still insert fsetp
23:26 karolherbst: guess they are not _that_ lax as in graphics
23:26 karolherbst: but yeah...
23:26 imirkin: karolherbst: wait, so what's the diff?
23:26 imirkin: (on nvidia)
23:27 karolherbst: imirkin: fdiv.approx: https://gist.github.com/karolherbst/5e07fb31714dbe70c45244e8c422048a
23:27 imirkin: oh, it messes with the powers
23:27 imirkin: interesting
23:27 karolherbst: yeah
23:27 imirkin: s/powers/exponents/
23:28 imirkin: for super-small/large ones
23:28 karolherbst: jenatali: do we have an fdiv in libclc? :D
23:28 karolherbst: yeah.. it's a common lowering strategy
23:28 jenatali: karolherbst: Good question...
23:28 karolherbst: they do it for a bunch of stuff
23:28 karolherbst: some of the nir lowering I've added does it as well I think
23:28 karolherbst: just scale the input
23:28 airlied:just fixed a GL CTS test that expected a/b == a*(1/b)
23:28 imirkin: yeah, that's quite common
23:28 jenatali: karolherbst: Nah I don't see it
23:28 karolherbst: mhhh
23:28 karolherbst: annoying
23:29 imirkin: airlied: i convinced the cts people that floor(a/a) can be 1.0 ;)
23:29 jenatali: Half_divide and native_divide are both x / y
23:29 karolherbst: imirkin: that was a good one :d
23:29 karolherbst: jenatali: ehhh...
23:29 imirkin: that was a nice swindle
23:29 karolherbst: jekstrand: any idea how to do proper fdiv in nir? :D
23:30 jenatali: karolherbst: We can add one to libclc, so we can write it in C instead of nir
23:30 jenatali: Or we could just write it in nir...
23:30 karolherbst: yeah.. not much work
23:30 karolherbst: jenatali: the issue is we want to lower it to fdiv
23:30 karolherbst: so we can't really do it in libclc
23:30 jenatali: karolherbst: Us too, probably
23:30 jenatali: Well
23:31 karolherbst: we just scale a little
23:31 jenatali: Eh I dunno, we probably don't actually
23:31 karolherbst: why not?
23:31 karolherbst: ohh because some hw can do it?
23:31 karolherbst: mhhh
23:31 karolherbst: doubtful :p
23:31 jenatali: Because our fdiv is unspecified whether it's accurate or not
23:31 karolherbst: show me the hw implementing it correctly (and in fast, I just want to exclude CPUs)
23:32 karolherbst: the two possible scalings are just annoying
23:33 karolherbst: bcsel or control flow?
23:34 karolherbst: ahh, nvidia checks against 0x00800000 and 0x7e800000
23:34 karolherbst: interesting
23:34 karolherbst: guess that could work for all hw then
23:34 karolherbst: jenatali: I'd just push it into vtn_opencl
23:34 jenatali: One of the nice parts of this project is it made me finally learn IEEE floating point :P
23:34 karolherbst: :D
23:34 karolherbst: "nice parts"
23:35 karolherbst: I honestly don't know how frustrating it is for you normally to talk with hw engineers from vendors
23:35 jenatali: Heh I've been in this industry long enough I really should've learned it a while ago...
23:35 karolherbst: "nah, we can't talk about it"
23:35 karolherbst: :p
23:35 jenatali: Depends lol, we have a decent relationship with most vendors
23:35 karolherbst: even nvidia?
23:35 jenatali: Though that's not usually me, it's other folks
23:35 jenatali: Yeah, especially NVIDIA
23:35 karolherbst: interesting
23:36 karolherbst: I mean.. it always depends who you are talking to
23:36 airlied: jenatali: now go learn VAX FP :-P
23:36 karolherbst: and I got nice stuff from nvidia as well, so it's not that bad
23:36 jenatali: D:
23:36 airlied:hopes I've forgotten all of VAX FP
23:36 jenatali:doesn't know what that is
23:36 airlied: but I do have the manual on my shelf
23:37 imirkin: VAX? that's shortcut for saying "i'm old"
23:37 airlied: VAX has an alterante pre-IEEE FP
23:37 jenatali: Apparently I'm too young
23:37 airlied: imirkin: I think even when I worked on VAX it was retro
23:37 karolherbst: the heck
23:38 karolherbst: how terrible is VAX
23:38 imirkin: was that the one with 36-bit words?
23:38 airlied: at least I'm pretty sure when I worked on it they weren't making it anywhere anymore, which I suppose counts as retro
23:39 karolherbst: imirkin: https://nssdc.gsfc.nasa.gov/nssdc/formats/VAXFloatingPoint.htm
23:39 karolherbst: I like how they defined two 64 bit versions with different precisions :D
23:40 karolherbst: I bet that caused quite some confusions from time to time
23:40 karolherbst: but also how those things are just "middle endian" :P
23:41 karolherbst: although I guess at this point 16 bit was the "native" size?
23:45 jenatali: karolherbst: Made the change to !5891
23:46 karolherbst: cool
23:50 karolherbst: now.. how to implement GEU...
23:57 imirkin: karolherbst: should just work, no?
23:57 karolherbst: meant in nir
23:58 karolherbst: but I have it working now
23:58 imirkin: oh. yeah, can't help you :)
23:58 karolherbst: we don't have unordered in nir yet
23:58 karolherbst: but that's... on some todo lists
23:59 imirkin: wouldn't all of nir be unordered?