04:05 airlied:drops some ttm refactoring, got a bit out of hand
06:06 danvet: airlied, intentionally merged the few ttm patches to drm-next?
06:06 danvet: just because your cover letter says probably drm-misc-next
06:08 airlied: danvet: yeah those were a few of the precursor ones
06:08 danvet: yeah worst case we make some backmerges if stuff clashes
06:08 airlied: the big series is pretty orthognal from those anyways
07:19 tzimmermann: danvet, sravn, for the bochs-on-sparc64 problem: did you consider the proposal to add read/write callbacks to gem objects?
07:47 danvet: tzimmermann, uh read/write on buffers has massively fallen out of favour
07:48 danvet: it was once floating around as idea
07:48 danvet: even implemented in i915-gem as pwrite/pread
07:48 danvet: but all userspace switched over to mmap
07:48 danvet: just having read/write for the toio/fromio stuff feels even more awkward than the is_iomem flag sprinkling
07:49 tzimmermann: danvet, i see.
07:49 danvet: tzimmermann, I'm also bheind on dri-devel a bit :-/
07:49 danvet: so haven't gotten around to your latest stuff yet
07:50 tzimmermann: no problem
07:51 tzimmermann: i can't say i like having to return struct pointer {} much
07:52 tzimmermann: it adds complexity to the caller
07:52 tzimmermann: which could have been hidden well in read/write
07:53 tzimmermann: danvet, but how's userspace solving the problem? they must know if they are on I/O memory or system memory
07:53 tzimmermann: i thought i saw a flag in on of the uapi structs, but i can't find it anymore
07:53 tzimmermann: s/on/one
07:57 danvet: tzimmermann, tbh I have no idea
07:57 danvet: but I'd assume somewhere between "badly" and "not at all"
07:57 danvet: if fbdev doesn't have a flag, then it's probably "not at all"
07:57 danvet: or maybe all fbdev userspace from the 90s assumed that it's iomem
07:58 danvet: and unconditionally uses iomem instructions
07:58 danvet: on the arch where this makes a difference
07:58 danvet:no idea
07:59 tzimmermann: ok
08:03 airlied: I think ppc64 is also a funky arch
08:19 MrCooper: don't think it has a separate I/O memory space though, unless they added that compared to PPC32?
08:24 danvet: ppc is special because barriers
08:25 tomeu: MrCooper: btw, this should fix occasional failures when uploading artifacts to MinIO: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6135
08:25 danvet: I think
08:25 danvet: not sure
08:25 tomeu: not sure how often that happens
08:56 tzimmermann: and what about dma-buf. there seems to be no solution either ?
08:58 danvet: yeah
08:59 tzimmermann: danvet, has there ever been a proposal for a solution?
09:00 danvet: nah, no one cares enough I guess
09:00 danvet: if we fix fbdev internally for fbcon, we're probably good for another 10 years :-)
10:09 mlankhorst: at that point we just have to convert fbcon to atomic..
10:16 pendingchaos: jekstrand, cmarcelo: can one of you take a look at this spirv->nir patch: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6063/diffs?commit_id=170ccdb923939d9302b18e6210479f2e8c2b1ea3 ?
10:49 pendingchaos: I created a milestone for the 20.2 branchpoint: https://gitlab.freedesktop.org/mesa/mesa/-/milestones/19
13:14 dcbaker[m]: pendingchaos: thanks for doing that
13:22 zmike: would someone mind marging this for me https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/355
14:08 karolherbst: jenatali: "OpenCL 2.0 allows kernels to be enqueued with global_work_size larger than the compute capability of the NVIDIA GPU." :O they also work on non uniform ND ranges.. nice.. I guess I will take a look at some point :D
14:08 karolherbst: that's with nvidias 450 driver
14:09 jenatali: Awesome
14:09 karolherbst: they supported CL2.0 for a while now, but added and removed features on a whim
14:09 jenatali: CL2.0's got some weird features...
14:09 karolherbst: seemed like they added SVM support back in
14:13 jenatali: Btw karolherbst, dunno if you saw, one more missing CL1.2 feature hooked up in vtn: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/255/diffs?commit_id=6a96e2a0181332ed971046c1127a4192d66737bc
14:13 jenatali: Will get it on our backport queue
14:13 jenatali: Er, not backport, upstream
14:13 karolherbst: jenatali: ahh so you figured out how to make it work without requiring fp16 support?
14:14 jenatali: Yeah, jekstrand's hints for the pack/unpack instructions was helpful. DXIL's got pretty close equivalents to those
14:14 karolherbst: :) cool
14:28 jekstrand: karolherbst: How would you feel about merging some form of jenatali's MR for base workgroup stuff and sorting out lower_system_values on top of that?
14:28 karolherbst: jekstrand: I think it is fine... that system value cleaning up ended up being more annoying than I hoped for anyway :/
14:28 karolherbst: no solution for the iris situation
14:29 jekstrand: karolherbst: Yeah.
14:29 jekstrand: karolherbst: And I think clover copying the options struct and changing one value is fine
14:29 jekstrand: It's not ideal, of course, but I think it's ok.
14:29 karolherbst: jekstrand: we already patch nir->info after spirv_to_nir in clover anyway :/
14:29 karolherbst: so I guess we can do that for now
14:29 jekstrand: Ok, that sounds good
14:29 karolherbst: "nir->info.cs.local_size_variable = true;" is the thing we change :)
14:30 jekstrand: karolherbst: That doesn't get set by spirv_to_nir automatically for kernels?
14:30 karolherbst: jekstrand: it's difficult :/
14:31 jekstrand: ?}
14:31 karolherbst: well.. CL allows you to provide hints for the local size...
14:31 jekstrand: jenatali: what's the status if that system values MR? I made a few comments on it IIRC.
14:31 jenatali: Let me look
14:32 jenatali: jekstrand: Aside from the larger discussion around nir_options, I think the only outstanding comment is whether the base names should have "_id" at the end
14:32 karolherbst: ehh.. wait.. the nir_shader_compiler_options is constant.. mhhh
14:32 karolherbst: okay.. so we need to workaround that a little indeed
14:33 jekstrand: karolherbst: Yeah
14:33 jekstrand: karolherbst: But, like I said, I think copying the nir_options and modifying it is probably fine.
14:33 karolherbst: uhhhmmm.. wait
14:33 jekstrand: ?
14:33 karolherbst: jekstrand: how does that work with serialized nirs?
14:33 jekstrand: Uh...
14:33 jekstrand: It probably doesn't
14:33 karolherbst: yeah.....
14:34 karolherbst: right.. so
14:34 karolherbst: mhhh
14:34 karolherbst: crap
14:34 jekstrand: The problem we have is that we're mixing state-tracker things with back-end things.
14:34 karolherbst: for $reasons (tm) clover shares a serialized nir with the driver
14:35 karolherbst: and the driver deserializes with its compiler options of course
14:35 karolherbst: _but_
14:35 karolherbst: we called nir_lower_system_values before serializing
14:35 karolherbst: so maybe it's fine...
14:35 karolherbst: *sigh*
14:35 jekstrand: For this one, I think it's fine.
14:35 jenatali: Good to track though as a reason we should change it
14:36 jekstrand: Bah
14:37 jekstrand: There's two ways we can look at this, IMO.
14:37 jekstrand: One is to say that compute is findamentally this weird thing that might get carved up into clumps.
14:37 jekstrand: In which case drivers implement work_group_id_zero_base and then there are other things which *may* get added in depending on API.
14:38 jekstrand: The second is to say that compute is simple and you just have a work group size; the fact that OpenCL needs some lowering is OpenCL's problem.
14:38 jenatali: Vulkan still has the workgroup ID offsets though?
14:39 karolherbst: jekstrand: I tend to the latter
14:39 karolherbst: jenatali: but it's explicit in the IR
14:39 karolherbst: so no problem
14:39 jekstrand: In this case, we need a NIR pass which looks at work_group_id etc., adds inputs to the shader as needed, and then produces something with normal work_group_id.
14:39 jekstrand: Vulkan is weird.
14:39 jekstrand: Looking at OpenCL, I'm quickly becoming convinced that the Vulkan design is wrong.
14:40 jekstrand: Because it adds a base workgroup but doesn't deal with the num_work_groups problem.
14:40 karolherbst: mhh, right
14:40 jenatali: Yeah good point
14:40 karolherbst: jekstrand: but I don't think it has to
14:40 jekstrand: It was smashed in at the last moment to support device groups
14:40 jekstrand: So you could easily run part of a workgroup
14:40 karolherbst: like.. which hw does support num_work_groups natively anyway?
14:40 jekstrand: But, as I said, I think we got it wrong.
14:41 karolherbst: I know nv doesn't
14:41 jekstrand: None does
14:42 jekstrand: But my point is that Vulkan got the API wrong
14:42 karolherbst: ahh
14:42 jekstrand: Because num_work_groups is critical to being able to calculation gl_GlobalInvocationIndex
14:42 jekstrand: And so the gl_GlobalInvocationIndex when using baseGroup in Vulkan is complete garbage.
14:42 karolherbst: uhhh... I see
14:43 jenatali: Is it?
14:43 karolherbst: why would the amount of work groups matter? .... yeah, that's strange
14:43 jenatali: Local size matters, not global size
14:43 karolherbst: jenatali: not global size, count of groups ;)
14:44 jenatali: Sure, global size == (count of groups * local size) unless you have non-uniform groups
14:44 karolherbst: right
14:45 karolherbst: still doesn't make sense to base get_global_index on get_num_groups :)
14:45 karolherbst: ehhh.. get_global_linear_id in CL
14:45 jenatali: Yeah, that's CL-only though
14:45 karolherbst: ohhhh wait....
14:45 jenatali: And it's defined to remove offsets
14:46 karolherbst: wait wait...
14:46 jenatali: Er, to remove global offsets, to keep workgroup offsets (since those are hidden anyway)
14:46 karolherbst: you need global size for index even in CL
14:46 jenatali: Yeah, but index is CL-only
14:46 karolherbst: "gl_GlobalInvocationIndex"?
14:46 jenatali: Is that a thing?
14:46 karolherbst: yes
14:46 karolherbst: GL 4.3
14:47 karolherbst: well.. glsl 4.30 rather
14:47 jenatali: Interesting... I was looking at SPIR-V and the built-in GlobalLinearId is listed as a Kernel value
14:47 karolherbst: mhhh
14:48 jekstrand: Global size matters because you have to flatten the 3D thing to 1D
14:48 jekstrand: That requires knowing two of the dimensions.
14:48 karolherbst: yeah.. I see that now
14:48 jekstrand: So, yeah, Vulkan's design is just broken.
14:48 jenatali: I'm having trouble finding references to gl_GlobalInvocationIndex
14:48 karolherbst: jekstrand: how does gl_GlobalInvocationID get emited to spir-v?
14:48 karolherbst: ohh wait
14:49 karolherbst: it's a uvec3...
14:49 karolherbst: fun
14:49 karolherbst: ...
14:49 karolherbst: I messed up
14:49 jenatali: Yeah, GlobalInvocationID is not Index, and ID doesn't need global size (or num groups)
14:49 karolherbst: jenatali: seems like glsl 4.30 added "gl_GlobalInvocationID"
14:49 karolherbst: right
14:49 jekstrand: It's gl_GlobalInvocationIndex that's 1D
14:49 jenatali: Index is CL-only, and that's the 1D thing
14:49 karolherbst: jekstrand: that's vulkan only I guess?
14:50 jenatali: jekstrand: My google-fu must be failing me, I can't find any hits
14:50 karolherbst: but honestly...
14:50 karolherbst: where is gl_GlobalInvocationIndex defined?
14:50 imirkin: there is none
14:50 imirkin: i use http://docs.gl
14:50 imirkin: to quickly find stuff
14:50 imirkin: it's not 100% authoritative, but it's a fast UI
14:50 imirkin: (it's basically the man pages)
14:51 karolherbst: jekstrand: I can't find gl_GlobalInvocationIndex ...
14:51 imirkin: there's only local
14:51 imirkin: no global
14:51 imirkin: there's a gl_GlobalInvocationID
14:51 jenatali: Yeah that's what I'm seeing too
14:51 imirkin: but no index. one could obviously be computed using ... math
14:52 jenatali: Unless you're using workgroup ID offsets :P
14:52 jekstrand: hrm... maybe it doesn't exist? That would certainly save us.
14:52 imirkin: wouldn't the global invocation id account for such offsets?
14:52 karolherbst: jekstrand: I added SYSTEM_VALUE_GLOBAL_INVOCATION_INDEX for CL :p
14:52 jenatali: imirkin: Yes, but not the number of groups
14:53 karolherbst: either way, the lowering still needs it :)
14:53 karolherbst: so it's CL which is weird not vulkan :D
14:53 jekstrand: Ok, yeah, gl_GlobalInvocationIndex doesn't exist. It only exists in CL. Vulkan is safe, I guess.
14:53 jekstrand: Still dumb but technically safe
14:53 jenatali: :)
14:53 imirkin: jenatali: oh, you wouldn't have the overall "width"? annoying. but yeah, seemingly not important.
14:54 karolherbst: imirkin: well.. if you operate on 3d arrays?
14:54 karolherbst: anyway.. dispatching 1D workloads through CL is not slower than dispatching 2D or 3D ones
14:54 karolherbst: so there isn't really a huge point
14:54 jekstrand: So it sounds to me like, for CL, num_work_groups needs to be handled by magic inputs and global_invocation_index needs to be handled using the "real" number of workgroups so it has to be lowered in the CL driver.
14:55 karolherbst: yeah
14:56 jenatali: Yup
14:57 jekstrand: That still leaves the question of what to do with all these bases.
14:57 jekstrand: Should we just have a core NIR lowering pass which adds bases?
14:57 jekstrand: And not bother with zero_base
14:57 jenatali: If you really prefer, we can handle this in clover/CLOn12's compiler rather than vtn, though I'd still like to have dedicated intrinsics for loading so we don't end up with the possibility of adding offsets multiple times
14:58 karolherbst: yeah..
14:58 jekstrand: Ugh... That's still a problem for Intel....
14:58 jekstrand: Becuase we do everything in terms of invocation_index not invocation_id
14:58 jenatali: jekstrand: Global invocation index? Or local?
14:59 karolherbst: jenatali: they don't have local ids afaik
14:59 jekstrand: Oh, wait, those are fine.
14:59 jenatali: Yeah
14:59 jekstrand: Bah. Compute has way too many IDs
14:59 karolherbst: it does
14:59 jenatali: You're not wrong :P
14:59 jekstrand: Ok, so that one's fine.
15:00 jekstrand: Ok, so I see two options and they both suck differently:
15:01 jekstrand: 1. Do the lowering in nir_lower_system values and have a _zero_base version. This requires some way of figuring out how to pass API-specific options into nir_lower_system_values. Or we could just make it do different things with KERNEL.
15:01 karolherbst: jekstrand: level0 doesn't have offsets ;) just in case we want to support it at some point
15:02 alyssa: spec isn't clear to me, does anyone know what sign(NaN) is supposed to be?
15:02 karolherbst: yes?
15:02 karolherbst: alyssa: did you check what the CTS expects?
15:02 jekstrand: 2. Add a shared nir_add_work_group_bases pass or something like that which adds the bases in and dumps out a shader with load_work_group_id not load_work_group_id_zero_base. This lets us avoid perturbing drivers quite so much but you have to be careful and ensure the pass is only called once.
15:02 karolherbst: I could imagine it accepting multiple values
15:03 jekstrand: alyssa: Undefined, I think.
15:03 alyssa: karolherbst: but that's *work* =p
15:03 alyssa: jekstrand: that's my reading of the spec too
15:03 alyssa: since it gives a piecewise definition with compares, none of which are true for NaN by definition
15:04 karolherbst: jekstrand: can't we have a mix? we remove the compute related loweriing out of lower_system_values into it's own pass
15:04 karolherbst: and callers can do whatever they like
15:04 karolherbst: but have to call it
15:04 jekstrand: alyssa: The SPIR-V spec is incredibly unhelpful: "1.0 if x > 0, 0.0 if x = 0, or -1.0 if x < 0." I'm sorry, but NaN is false for all three cases. :-P
15:04 alyssa: exactly!
15:05 jekstrand: alyssa: File a spec bug.
15:05 karolherbst: any value goes :p
15:05 jenatali: jekstrand: We could modify your option #2, and have the pass convert load_work_group_id into zero_base + base?
15:05 karolherbst: I have a silly idea
15:05 karolherbst: let's do the second pass
15:05 karolherbst: and it has an argument: the API
15:05 alyssa: jekstrand: Hm, but if I don't, it remains undefined so I have less things to worry about in my implementation =p
15:06 jenatali: alyssa: CL's sign says 0.0 for NaN if it helps?
15:06 jenatali: "Returns 1.0 if x > 0, -0.0 if x = -0.0, +0.0 if x = +0.0, or -1.0 if x < 0. Returns 0.0 if x is a NaN."
15:06 alyssa: jenatali: That does help, thanks :)
15:06 alyssa: how do we model that in NIR?
15:06 karolherbst: alyssa: I have... uhm... helper funcs for nan
15:06 alyssa: (are we sticking an isnan check in at the clc level..?)
15:06 karolherbst: yes
15:07 jenatali: alyssa: That one maps to nir_op_fsign
15:07 karolherbst: alyssa: "nir_nan_check2"
15:07 jekstrand: alyssa: I think the result is that the spec will explicitly say it's undefined.
15:07 karolherbst: yeah
15:08 karolherbst: I think we implement sign incorrectly at this point for CL, but well...
15:09 alyssa: karolherbst: Ah :)
15:09 karolherbst: alyssa: I think hw simply does the right thing?
15:09 alyssa: jenatali: fsign does not have this semantic though :p
15:09 alyssa: karolherbst: ours doesn't! *clap*
15:09 alyssa: everything is more fun with gles devices
15:09 jenatali: alyssa: There's a lower_fsign option?
15:10 karolherbst: alyssa: we lower fsign a well, but differently
15:10 alyssa: I meant, the constant folding definition of fsign in NIR doesn't seem well behaved for NaN
15:10 karolherbst: ohhh not true :D
15:10 karolherbst: we do the same
15:10 karolherbst: fun
15:10 alyssa: (I think it'd fold `fsign(NaN) to -1`?)
15:11 karolherbst: alyssa: heh?
15:11 karolherbst: shouldn't that be... 0?
15:11 jenatali: Yeah... looks like you're right
15:11 jenatali: The algebraic lowering seems right though
15:12 karolherbst: soo.. we have (fsub false false), right?
15:12 jenatali: karolherbst: Constant folding implementation is ((src0 == 0.0f) ? 0.0f : ((src0 > 0.0f) ? 1.0f : -1.0f))
15:12 karolherbst: ehhhh
15:12 karolherbst: okay.. constant folding is wrong for CL :p
15:12 karolherbst: but the nir lowering is correct
15:12 jenatali: Agreed
15:13 alyssa: I was useful! \o/
15:13 jenatali: Good thing the CTS doesn't do sign() on an inline constant :P
15:13 alyssa: ;P
15:13 karolherbst: yeah....
15:13 alyssa: jenatali: If a tree falls in a forest, and the CTS doesn't check for it...
15:13 karolherbst: jenatali: guess they didn't want the CTS run to take a week
15:13 karolherbst: :p
15:13 jenatali: :D
15:14 jenatali: jekstrand, karolherbst: Back to offsets - did we reach a conclusion?
15:14 jenatali: Sounds like we're trending towards a dedicated pass to add offsets
15:14 jenatali: Though I'd prefer if it didn't have to be fragile against running it twice and getting double-offsets...
15:15 karolherbst: yeah.. I think adding all the new intrinsics is fine
15:15 karolherbst: but
15:15 karolherbst: we can just require runtimes to run the pass with whatever option they prefer
15:15 karolherbst: or just be sane and be the API the option
15:15 jekstrand: We can also handle it in nir_lower_system_values and just look for KERNEL
15:15 karolherbst: so you can say that you want CL behaviour or wahtever
15:15 jekstrand: I don't really like that though.
15:16 karolherbst: jekstrand: does only work for CL though
15:16 jenatali: jekstrand: That's what's checked into our downstream fork, though it's vtn that's checking for KERNEL
15:16 karolherbst: but not for other APIs having kernels
15:16 karolherbst: granted.. we don't care about those yet, but it might become a problem
15:16 jenatali: Personally I'd still prefer options so we can remove offsets (and the corresponding UBO loads) when they're not being used at the API (99% case)
15:16 jekstrand: KERNEL doesn't fix anything for Vulkan
15:17 karolherbst: I think CL is the only compute API having those silly offsets anyway
15:17 karolherbst: jenatali: right...
15:17 karolherbst: so rather a small option struct
15:17 jenatali: Yeah
15:18 karolherbst: so kind of like the stuff I've done except we leave the nir_options stuff alone :p
15:18 jenatali: Yeah and just put it into a dedicated pass
15:18 karolherbst: yep
15:18 jenatali: Alright, sounds reasonable to me
15:18 jekstrand: So what did we settle on?
15:19 jekstrand: What does this separate pass do?
15:19 karolherbst: nir_lower_compute_system_values?
15:19 jenatali: Leave the existing system value -> existing intrinsic lowering as-is in lower_system_values, then add a new pass which can optionally lower them to zero_base + base variants
15:19 jekstrand: Part of the problem is that nir_lower_system_values does two things: 1. convert from load_var to load_<system value> and 2. lowers some system values to other system values.
15:20 karolherbst: yeah.. I think we should leave the 1. in lower_system_values
15:20 karolherbst: or maybe we split the pass up
15:20 karolherbst: and end up with three passes
15:20 karolherbst: :D
15:20 jekstrand: karolherbst: /o\
15:20 karolherbst: nir_lower_{,compute_,graphics_}system_values :p
15:20 jekstrand: jenatali: Ok, I think that's fine. It does mean drivers which use clover might see both load_work_group_id and load_work_group_id_zero_base.
15:20 karolherbst: I already named them, it will happen now :p
15:21 jekstrand: and they'll just have to implement them as the same thing
15:21 jenatali: jekstrand: Yeah - I honestly think that's okay
15:21 jekstrand: Yeah, it probably is.
15:21 jekstrand: I'm a big fan of a system value having exactly one meaning during the entire compile.
15:22 jekstrand: I think I'm good with this plan.
15:22 jekstrand: Let's make it happen before we change our minds. :-D
15:23 jenatali: Cool, I'll see if I can get that done today - my dev machine is a little bogged down with a CTS run on WARP in a VM taking all my CPU cycles though :P
15:24 jenatali: Hm, one opinion question - CL's global invocation index (the 1D value, not 3D) is supposed to be computed from (global ID - offset) for each dimension
15:24 ccr:looks around for bikesheds
15:24 jenatali: Should I change the lowering to always emit the zero_base intrinsic? Should I emit the non-zero-base intrinsic and let optimizations remove (zero_base + base - base)?
15:24 jekstrand: jenatali: What??? *sigh*
15:25 jekstrand: jenatali: Optimizations should remove "+ base - base"
15:25 jenatali: Alright, that's probably fine
15:26 jekstrand: jenatali: Which offset are we talking about here?
15:26 jenatali: Global offset
15:26 jekstrand: Which is different from the implicit work group offset?
15:27 jenatali: Correct
15:27 jekstrand: which is added to split tyhings up
15:27 jekstrand: Ok
15:27 jenatali: E.g. "For 1D work-groups, it is computed as get_global_id(0) - get_global_offset(0)."
15:27 jenatali: Where get_global_offset needs to return the explicit global offset passed to the enqueue API
15:27 jekstrand:runs into a corner and hides.
15:27 jenatali: :)
15:28 jenatali: Hopefully the rest of the CL changes we have on our upstream queue aren't *quite* this controversial :P
15:28 jekstrand: This isn't so much controversial as painful
15:29 jenatali: Fair - I just meant requiring this much discussion to get to a solution that we're moderately happy with
15:29 jekstrand: But, yeah, doing +/- global_offset seems like it should work.
15:30 jekstrand: If NIR doesn't optimize it away, you may have to re-order the math and then it should.
15:30 jenatali: Cool, I'm assuming it'll be fine
15:30 jekstrand: In particular, if you do (work_group_id_zero_base + base_work_group) + global_offset, it should be able to get rid of it.
15:30 jekstrand: But it might not re-associate
15:31 jenatali: It'll be global_id = (work_group_id_zero_base + base_work_group) * work_group_size + global_offset
15:31 jenatali: So global_id - global_offset should hopefully annihilate the global_offset from the overall expression
15:33 jenatali: Btw sorry for dominating this channel with all of my CL talk :P
15:34 jekstrand: jenatali: That's what this channel is for.
15:34 jekstrand: Annoyingly, some people think they can talk about kernel drivers here. :-P
15:35 jekstrand: Honesly, we really need a #mesa channel or something.
15:35 jekstrand: Or maybe #nir
15:44 jenatali: That'd make sense to me, since we're not really talking about dri at all...
15:57 jenatali: Ugh... D3D's default float->float rounding mode is rtz instead of rtne like nir
15:59 karolherbst: jenatali: right :)
15:59 karolherbst: but devices can flip this
15:59 karolherbst: but yeah...
16:03 jenatali: Guess I need a pass to convert f2f16 -> f2f16_rtne, and then a lowering pass to implement that in terms of f2f16_rtz
16:03 jenatali:sighs
16:04 karolherbst: jenatali: I was thinking about adding all those opcodes to nir anyway
16:05 karolherbst: because... CL is annoying and has convert
16:05 jenatali: Yeah, you mentioned. daniels added those to vtn in our tree
16:05 karolherbst: I guess constant folding is where things get annoying :p
16:13 alyssa: jekstrand: #dri-devel gets too much traffic for me to keep up personally
17:04 zmike: jekstrand: I followed your suggestion re: setting variable mode to nir_var_shader_temp and running fixup_deref_modes, but I'm still getting deref instructions for that variable
17:04 zmike: is there another pass that needs to be run?
17:07 zmike: ah nvm I see
18:06 Lyude: Hey - wasn't someone on here working on some sort of libkms?
18:10 emersion: Lyude: what do you have in mind?
18:10 emersion: what would that lib do?
18:11 emersion: there is liboutput, which has greater scope, and libliftoff, which is only about planes
18:11 Lyude: emersion: absolutely nothing - I just remembered someone mentioning something like that :P
18:11 emersion: (liboutput just an idea for now)
18:12 emersion: and then wlroots, which does the same as liboutput plus input and wayland :P
18:12 Lyude: https://gitlab.gnome.org/GNOME/mutter/-/issues/1360 basically we've got some folks from intel working on hdr stuff (some of this has to do with hdr backlights) and they suggested coming up with a color management library. i think that's a great idea, I'm just more curious if there's already a library going that would be better suited for implementing that functionality in
18:12 emersion: there's libplacebo in the color management departement
18:14 Lyude: emersion: ...just to confirm, is that actually the name of the library? or is this a clever joke, lol
18:22 imirkin: isn't there lcms or something? i never quite understood what it did
18:22 imirkin: but then again, color management has remained somewhat of a mystery
18:24 dcbaker[m]: Is colord related to this or different? (Generally curious)
18:24 Lyude: no idea, i'm a bit new to color management as well :)
18:24 Lyude: ( jadahl ^ any idea on any of this btw? )
18:24 Lyude: also - apparently there is a libplacebo, not sure it's what we want here but also not unsure
18:29 ajax: please let's not invent another color management library
18:29 Lyude: ajax: any idea what I should suggest to the person on that gitlab issue I linked a little higher up?
18:32 jadahl: Lyude: i don't know of any library
18:32 ajax: Lyude: lcms2 is already uses in gegl, gimp, gstreamer, gnome-settings-daemon, inkscape, colord...
18:32 ajax: please either use it or build atop it
18:33 Lyude: ajax: sgtm
18:35 ajax: gnome-color-manager (not sure how this is different from what g-s-d does) seems to link against lcms2 but uses the data files from argyllcms?
18:56 emersion: Lyude: yup :) https://github.com/haasn/libplacebo
19:04 jenatali: jekstrand: What's the normal source of half pack/unpack ops in nir?
19:04 jenatali: Is there a GLSL function for it or something?
19:05 imirkin: yes, there's are GLSL functions
19:05 imirkin: to pack/unpack 2x16 in a single uint
19:05 jenatali: Got the names?
19:05 imirkin: jenatali: http://docs.gl
19:05 imirkin: type "pack"
19:05 imirkin: enjoy
19:05 jenatali: Thanks :)
19:31 haasn: Lyude: imirkin: emersion: dcbaker[m]: hi. all three are mostly non-overlapping libraries. lcms is a ICC profile handling library with CPU color management routines. libplacebo is a library for GPU shaders, including one for color management. It depends on lcms (to generate LUT textures). colord is a library for querying/setting/retrieving ICC profiles from the system (per monitor etc.), which is
19:32 haasn: something that's considered out of scope of both lcms and libplacebo (they assume the user already has a void* ICC profile)
19:32 haasn: Using all three at the same time isn't a bad idea, if it's GPU-accelerated color management you want
19:33 Lyude: haasn: I believe it's a combination of gpu accelerated management and also programming the actual lcd panel to enable fancy hdr backlight features along with programming panel tone mapping and other stuff
19:33 haasn: ajax: to complete the picture, argyllcms is a library for creating, modifying, visualizing and otherwise manipulating ICC profiles, which is also mostly orthogonal to what LCMS provides (which is more about *parsing* and using them)
19:33 Lyude: if i'm reading that gitlab issue correctly, was going to respond and make sure I understood it right (and mention all of these libraries to them)
19:33 dcbaker[m]: haasn: thanks for the explanation
19:35 haasn: a typical data flow connecting all these components could look like: 1. user creates a profile using argyllcms (and its frontends, like displaycal); 2. user sets this profile for the display using colord; 3. user program loads the profile from colord and passes the pointer to libplacebo; 4. libplacebo parses it using lcms2 and generates function tables and shaders for use with the GPU
19:35 haasn: but note that the use of colord here is strictly optional, some refuse to use it because it requires dbus. Both X11 and Wayland have their own native ways of querying display ICC profiles (not via colord)
19:36 haasn: (but the wayland one is a WIP draft and I'm not sure if it's implemented anywhere)
19:36 imirkin: haasn: there's also a color space component in all this
19:36 imirkin: the source video is encoded in some color space
19:36 imirkin: and there's a way to change the range of colors displayed on the display
19:36 haasn: yes; you also deal with encodings, fixed function conversions, and a whole range of other fun topics
19:36 haasn: including how to handle out-of-gamut colors
19:37 imirkin: and some of these things are adjustable in hardware (e.g. you can program a LUT in)
19:37 haasn: or how to do soft tone mapping (for HDR->SDR etc.)
19:37 imirkin: while others aren't
19:37 haasn: the full picture also includes e.g. argyllcms generating 1D calibration LUTs that displaycal loads into the VCGT
19:37 imirkin: and some hardware allegedly has 3D LUTs
19:37 haasn: there are lots of moving parts here and I'm happy to answer any technical questions
19:38 haasn: imirkin: Some displays do, but they're generally not accessible from the operating system (except maybe via some weird vendor-specific channels)
19:38 imirkin: i mean display chips, not panels
19:38 haasn: I've never seen a GPU that has a programmable fixed function 3DLUT in the scan-out path
19:38 imirkin: i _thought_ intel did
19:38 haasn: But they could exist, I don't know
19:38 imirkin: but i'm not actually sure.
19:39 imirkin: (not currently exposed even if they do have it)
19:39 haasn: Typically it's 3x1D LUTs and if you're lucky a 3x3 shaper matrix
19:39 imirkin: that's fairly standard
19:39 imirkin: that 3x3 matrix (and LUTs) can also apply in various places along the composition path
19:39 imirkin: e.g. it coudl be per plane
19:39 imirkin: or it could not be per-plane
19:39 haasn: neat
19:40 imirkin: (obviously when i say 'standard', i'm referring to semi-modern hw. if you go far enough back in time, none of this stuff was there, except LUTs which have existed since the dawn of CGA graphics)
19:40 imirkin: (i guess with CGA it was a fixed LUT though)
19:44 ajax: haasn: libplacebo sounds awesome
19:45 ajax: and not just the name
20:37 anholt: chrisf: what would it take to get cts attention on https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/2035 ?
21:29 phire: The CGA LUT is somewhat configurable. That's how CGA 4 color mode gets it's 4 diffrent color palettes.