00:05 bnieuwenhuizen: airlied: do you have a dump for the loader bug?
00:06 airlied: bnieuwenhuizen: https://github.com/KhronosGroup/Vulkan-Loader/issues/386
00:06 gitbot: KhronosGroup issue 386 in Vulkan-Loader "unlocking unlocked mutex in terminator_EnumerateDeviceExtensionProperties " [Open]
00:29 jekstrand: krh, anholt: So... You know how you want better alignments to make std140 work better on freedreno? If we want UBO support on Gen4, we need that too.
01:35 jekstrand: Maybe not.... Still a bit unclear
03:09 mareko: what's the difference between StorageImageMultisample and ImageMSArray SPIR-V caps?
03:12 imirkin: i think one's the vk thing, the other is the spirv thing?
03:13 imirkin: oh, both are SpvCapability* hm
03:16 airlied: mareko: a mistake :-p
03:16 mareko: airlied: duplication?
03:17 airlied: I think they should both always be turned on
03:17 mareko: you "think"
03:18 airlied: mareko: in theory one is for ms images, one is for ms array images
03:18 airlied: but I have vague memories of it being a bit of stupidity that didn't get caught
03:19 airlied: mareko: I have a patch for ARB_gl_spirv to enable them, is that what you are fixing?
03:19 airlied: https://gitlab.freedesktop.org/airlied/mesa/-/commit/b88837b8f19d07d2ddc6a2e53d4d935989264abc
03:19 airlied: though it may not be complete, I expect it's sufficient
03:29 jekstrand: airlied: It's just for if you can do multisampled array images
03:29 jekstrand: mareko: ^^
03:29 jekstrand: mareko: As in samplerMSArray
03:29 jekstrand: It's not really about storage at all
03:29 jekstrand: The other is whether you can do image_load_store with MSAA
03:33 imirkin: jekstrand: image_ms_array isn't set by anv though
03:33 imirkin: so i don't think that's quite right
03:34 imirkin: (unless you're saying you don't support samplerMSArray)
03:35 airlied: I think anv and gl-spirv have a bug there then
03:36 jekstrand: Yeah, we may have a bug there. It just generates pointless warnings but it's still likely a bug
03:36 imirkin: i have this odd recollection that i may have added it...
03:36 imirkin: i feel like i did something related to this
03:37 imirkin: hm no
03:38 imirkin: it was with SpvCapabilityStorageImageExtendedFormats. i was close.
07:15 airlied: Kayden, imirkin : just to make sure I'm not nuts, but iris binder would not work on gen4/7?
10:24 ascent12: The atomic SRC_{X,Y,W,H} KMS properties are in 16.16 fixed point. The docs briefly mention subpixels, but what colour do you get if you scale up a subpixel?
10:28 pq: undefined
10:29 pq: from https://www.kernel.org/doc/html/latest/gpu/drm-kms.html#plane-composition-properties : "The filtering mode when scaling is unspecified."
10:29 pq: and "Drivers are also allowed to round the subpixel sampling positions appropriately, but only to the next full pixel."
10:29 ascent12: Ah right, I missed that bit.
10:30 pq: but if your source rectangle is fully inside one source pixel, then I would not expect anything else than the color of that pixel
10:30 pq: since sampling from outside of the src rect is forbidden
10:31 pq: but that raises the question, what color is around the source pixel if you use e.g. linear sampling filter?
10:31 pq: maybe "pad" from the allowed source edge pixels?
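pq's point that a source rectangle lying entirely inside one source pixel can only show that pixel's colour is easier to see with the 16.16 encoding spelled out. A minimal C sketch, assuming only that SRC_X/SRC_W are unsigned 16.16 fixed point as ascent12 states; the helper names are hypothetical and not part of libdrm or the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: a 16.16 value keeps whole source pixels in the
 * upper 16 bits and the fraction, in units of 1/65536 pixel, in the
 * lower 16 bits. */
static uint32_t fixed_16_16(uint32_t whole, uint32_t num, uint32_t den)
{
   assert(den != 0 && num < den); /* fraction must lie in [0, 1) */
   return (whole << 16) | (uint32_t)(((uint64_t)num << 16) / den);
}

static uint32_t fixed_whole(uint32_t fx)
{
   return fx >> 16;
}

/* True if the source span [x, x + w) stays inside one source pixel,
 * i.e. the last covered 1/65536 step has the same whole part as x.
 * Assumes w > 0. */
static bool span_in_one_pixel(uint32_t x, uint32_t w)
{
   return fixed_whole(x) == fixed_whole(x + w - 1);
}
```

For example, SRC_X = 10.25 (0x000A4000) with SRC_W = 0.5 (0x00008000) covers only pixel 10, so any filter that is forbidden from sampling outside the source rectangle can only reproduce that pixel.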
14:09 Kayden: airlied: I don't think you need/want iris_binder.c with the old memory model
14:10 Kayden: I think you should probably use relocations on old hardware, rather than softpin. and at that point, you can just use u_upload_mgr
14:23 Kayden: the binder, memory zones, and border color special handling should all go away
14:24 Kayden: and the i965 statebuffer should come back
14:42 shadeslayer: anholt: Hey, I'm looking into free form labeling of BO's in DRM and was wondering, how does VC4 use its labels? is there userspace tooling for performance profiling that ends up using these labels?
14:57 alyssa: pendingchaos: (and other ACO people) - I read int16/fp16 on your hw is always done in terms of fp16 vec2s (?), like on bifrost for us. What's the plan for working with that?
14:58 alyssa: My plan was to generalize NIR's scalarize to be able to scalarize in pairs (and quads for int8), but I was curious if you already had a trick.
14:58 imirkin: [nvidia has the same thing -- 2 packed fp16's in a single 32-bit reg]
14:58 alyssa: imirkin: Joy :) no writemasks either?
14:58 imirkin: i'd have to check, but almost sure there are writemasks and argument swizzles
14:59 pendingchaos: I think we're at least going to have a vectorizer
14:59 alyssa: (We have 2-bit swizzles XX/XY/.., but no writemasks fwiw)
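To make the "two fp16s in one 32-bit register, swizzles but no writemasks" model concrete, here is a small C sketch of the register layout only. The lane order, the swizzle bit encoding and all names are assumptions for illustration; this is not the Bifrost, AMD or NVIDIA instruction encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packed register: lane 0 (X) in the low 16 bits, lane 1 (Y)
 * in the high 16 bits, each holding raw fp16 bits. */
typedef uint32_t packed16x2;

static uint16_t lane(packed16x2 r, unsigned i)
{
   assert(i < 2);
   return (uint16_t)(r >> (16 * i));
}

static packed16x2 pack(uint16_t x, uint16_t y)
{
   return (packed16x2)x | ((packed16x2)y << 16);
}

/* Made-up 2-bit swizzle: bit 0 selects the source lane for result lane 0,
 * bit 1 for result lane 1. So 0 = .XX, 1 = .YX, 2 = .XY (identity), 3 = .YY. */
static packed16x2 swizzle(packed16x2 r, unsigned sw)
{
   return pack(lane(r, sw & 1), lane(r, (sw >> 1) & 1));
}

/* Without a writemask, updating only lane 1 means producing a whole new
 * register value: keep the old lane 0 and insert the new lane 1. */
static packed16x2 write_lane1(packed16x2 old, uint16_t y)
{
   return pack(lane(old, 0), y);
}
```

With 1.0 (0x3C00) and 2.0 (0x4000) packed as 0x40003C00, `.XX` gives 0x3C003C00. The extra pack in `write_lane1` is the cost that scalarizing in pairs, or vectorizing, is meant to avoid.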
15:00 pendingchaos: not sure if we're going to change NIR's scalarize to scalarize in pairs
15:00 alyssa: [nir_opt_vectorize] --> [nir_lower_scalar_as_vec]? could be fully generic
15:00 pendingchaos: ( hakzsam, dschuermann )
15:04 hakzsam: vectorize vec2 packed fp16 at NIR level is probably fine
15:07 pendingchaos: one possible issue with scalarizing in pairs is that vec2 code might be less optimizable than scalar code
15:08 hakzsam: if bifrost, nvidia and amd have similar fp16, it makes sense to have something generic I think
15:08 pendingchaos: and maybe the vectorizer could do a better job with vec3 code
15:08 pendingchaos: not sure though
15:09 imirkin: fwiw nvidia also has some "video" opcodes which allow packed 16- and 8-bit integers
15:09 imirkin: i've never seen them used by the blob compiler under any conditions though.
15:09 imirkin: including using NV_gpu_shader5's i8vec4 or whatever
15:16 alyssa: pendingchaos: which opts are you worried about?
15:16 alyssa: CSE, mostly?
15:17 pendingchaos: no opts in particular
15:17 pendingchaos: I haven't thought about it much
15:22 pendingchaos: could maybe make code worse, since optimizing each component separately might make it harder to vectorize
15:56 dschuermann: I think I agree with alyssa for just lowering to vec2 instead of vectorizing, but we will probably need both. I don't know if the fp16 shaders are mainly written for vec2 already or scalar
15:57 alyssa: for GLES, we have lots of vec4/fp16
15:57 imirkin: dschuermann: well presumably a lot of the GLES usage is just "precision mediump float;"
15:58 alyssa: ^^
15:58 imirkin: rather than an explicit request for fp16
15:58 pendingchaos: if we want to vectorize, we probably also want to improve/replace nir_opt_vectorize
15:58 pendingchaos: afaict (from reading the source) it can only vectorize arithmetic if vector instructions already exist
15:58 dschuermann: ok, but we also have to think about the modern rapid math stuff from vulkan and dx12
15:59 alyssa: rapid math?
15:59 dj-death: dcbaker: do you know if it's possible to prevent tests within a piglit group to run concurrently?
16:00 dschuermann: rapid packed math = marketing term for using vec2fp16
16:00 dj-death: dcbaker: I saw run_concurrent=False but I'm not sure it's what I want
16:14 dcbaker: dj-death: if you want to force them to never run concurrently you set run_concurrent=False
16:15 dcbaker: karolherbst: perhaps you'd be interested in reviewing https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4673? :)
16:16 daniels: pq, ascent12: subpixel src co-ords are used when you're doing scaling with a non-nearest filter mode
16:29 karolherbst: dcbaker: why is it x86 specific though? I thought cutting off the high bits is expected
16:30 dcbaker: it relies on the fact that an unrepresentable number returns -MAX_INT
16:30 dcbaker: airlied figured out what was going on
16:30 karolherbst: sure
16:30 dcbaker: but lround returns 0 for an unrepresentable number
16:30 karolherbst: but why is that x86 specific
16:31 karolherbst: heh?
16:31 karolherbst: but that's not the issue I saw
16:31 karolherbst: lround returned a perfectly fine number
16:31 karolherbst: but
16:31 karolherbst: it returns a 64 bit int
16:31 karolherbst: so instead of clamping the high bits get discarded as the result is 32 bit
16:32 dj-death: dcbaker: cool, thanks
16:34 karolherbst: r-by me but I disagree with the description
16:35 dcbaker: I can change the description, but basically it relies on the x86 implementation of undefined behavior
16:35 karolherbst: why is that different on eg arm?
16:35 karolherbst: does 1 << 33 not become 0 on arm?
16:36 dcbaker: 23:25  mattst88  https://www.felixcloutier.com/x86/cvtss2si
16:36 dcbaker: 23:26  mattst88  > If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (2w-1, where w represents the number of bits in the destination format) is returned
16:36 karolherbst: dcbaker: it has nothing to do with floating point
16:36 dcbaker: did I copy the wrong thing? hold on
16:37 imirkin: dcbaker: lroundf works. then you truncate it to integer instead of saturate.
16:38 karolherbst: but i can see that having a floating point number outside of a 64 bit integer range can cause issues still
16:38 karolherbst: but that's not what caused the fail on my system
16:38 alyssa: karolherbst: 1 << 3 = 2 here
16:38 alyssa: er
16:38 karolherbst: alyssa: << 33
16:38 karolherbst: :p
16:38 alyssa: yea
16:39 alyssa: (with -O0. with -O3 it's const propped to 0)
16:39 karolherbst: yep
16:40 alyssa: (if I throw in a scanf and -O3... yeah, it's really 2 :P)
16:40 karolherbst: which.. would be a garbage value :p
16:40 karolherbst: but yeah
16:40 alyssa: well... 1 << 34 = 4, etc
16:40 karolherbst: uhhh
16:40 karolherbst: wait
16:41 karolherbst: 1ll << 34
16:41 karolherbst: my mistake
16:41 karolherbst: well
16:41 karolherbst: 1l rather
16:41 karolherbst: it's long int
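The `1 << 33` tangent comes down to C's shift rule: shifting by at least the operand's bit width is undefined behaviour. That is why alyssa sees 2 at -O0 (x86 masks a 32-bit shift count to 5 bits, so 33 acts like 1) but a constant-folded result at -O3. Widening the operand first, as karolherbst's `1l` does, makes the shift well defined. A minimal sketch follows; the function name is hypothetical, and `uint64_t` is used so the result does not depend on `long` being 64 bits.

```c
#include <stdint.h>

/* Well-defined "1 << n" for up to 64 bits. The operand is widened before
 * shifting, so n = 33 really yields bit 33. Counts >= 64 would be
 * undefined behaviour again, so this sketch returns 0 for them
 * (an arbitrary choice). */
static uint64_t bit64(unsigned n)
{
   return n < 64 ? (uint64_t)1 << n : 0;
}
```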
16:50 anholt: shadeslayer: the biggest thing is debugfs that tells you bo count and total size for each name
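anholt's answer describes per-label accounting: for each label name, a BO count and a total size. A minimal C sketch of that bookkeeping, assuming a small fixed-size table. The struct and function names are hypothetical and this is not the vc4 driver's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-label statistics of the kind a debugfs file could print. */
struct label_stats {
   const char *name;
   uint32_t num_bos;
   uint64_t total_size;
};

/* Account one BO allocation under `label`. Returns 0 on success, or -1 if
 * the label is new and the table (capacity `cap`) is already full. */
static int label_stats_add(struct label_stats *tab, size_t cap, size_t *len,
                           const char *label, uint64_t size)
{
   for (size_t i = 0; i < *len; i++) {
      if (strcmp(tab[i].name, label) == 0) {
         tab[i].num_bos++;
         tab[i].total_size += size;
         return 0;
      }
   }
   if (*len == cap)
      return -1;
   tab[*len] = (struct label_stats){ label, 1, size };
   (*len)++;
   return 0;
}
```

A matching free path would decrement the same counters. Keying on the label string keeps free-form userspace labels cheap to aggregate.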
16:57 dcbaker: karolherbst: I've simplified the commit message to not include anything about x86, does that look more reasonable?
17:01 karolherbst: dcbaker: I'd mention that the problem wasn't lroundf itself, but rather the 64 to 32 bit int cast. util_iround probably doesn't handle out of range values better than lroundf
17:02 karolherbst: when I was debugging it, the returned value from lroundf was totally fine
17:03 karolherbst: lroundf(1.09951163e+12) -> 0x10000000000
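karolherbst's diagnosis can be reproduced without any rounding call. 1.09951163e+12 is 2^40 as a float, so `lroundf` correctly returns 0x10000000000 in a 64-bit `long`. Storing that in a 32-bit `int` keeps only the low 32 bits, which are all zero. The sketch below takes the already-rounded 64-bit value as input. Both helper names are hypothetical, and the saturating variant is one possible fix, not necessarily what the MR does.

```c
#include <stdint.h>

/* What an implicit long -> int store does here: the low 32 bits survive and
 * the high bits are discarded. Conversion to an unsigned type is defined as
 * modulo 2^32. The signed long -> int case is implementation-defined in C,
 * but GCC and Clang define it the same way. */
static uint32_t keep_low32(int64_t rounded)
{
   return (uint32_t)rounded;
}

/* Saturating alternative: clamp the 64-bit result into int32 range. */
static int32_t saturate_to_int32(int64_t rounded)
{
   if (rounded > INT32_MAX)
      return INT32_MAX;
   if (rounded < INT32_MIN)
      return INT32_MIN;
   return (int32_t)rounded;
}
```

This covers only the 64-to-32-bit step. A float beyond the range of `long` would still make `lroundf` itself return an unspecified value.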
17:03 mattst88: karolherbst: yes, lroundf didn't give the clamping behavior we desired, that's not the question; the question is why did IROUND give the desired behavior?
17:03 mattst88: I think the answer to that is that it was provoking undefined behavior
17:03 karolherbst: mattst88: because it returns a 32 bit int not 64
17:04 karolherbst: and IROUND is based on 32 bit math, not 64 as lroundf is
17:04 mattst88: karolherbst: converting a float that is out of the integer range into an int is undefined behavior
17:04 karolherbst: mattst88: but the float was not out of range
17:04 karolherbst: lround returned the correct value
17:04 karolherbst: *lroudnf
17:04 karolherbst: *lroundf
17:04 mattst88: 1.09e12 isn't out of the integer range?
17:04 karolherbst: nope
17:05 karolherbst: 0x10000000000 was the result
17:05 karolherbst: so lroundf did the correct thing
17:05 mattst88: python -c 'print(1.09e12 > (1<<32))'
17:05 mattst88: True
17:05 karolherbst: mattst88: it's _long_int_, 64 bit
17:05 mattst88: I'm not talking about lroundf
17:05 karolherbst: lroundf returns a 64 bit int
17:05 mattst88: I know!
17:05 mattst88: you're not listening...
17:06 mattst88: why did IROUND work before is the question
17:06 mattst88: we understand why lroundf didn't work
17:06 karolherbst: " and IROUND is based on 32 bit math, not 64 as lroundf is"
17:06 mattst88: yes, and it was doing something that was undefined behavior
17:06 karolherbst: why?
17:06 karolherbst: I don't see where the undefined behaviour was
17:07 mattst88: converting a float 1.09e12 to an 'int' is undefined behavior
17:07 karolherbst: but that never happened
17:07 mattst88: where did you get the 1.09e12 then?
17:07 karolherbst: value of obj->Sampler.MaxLod
17:07 mattst88: how was that an input to lroundf but not to IROUND before the patch?
17:08 mattst88: this was a /regression/. the test passed before we switched to lroundf()
17:08 karolherbst: I know
17:09 mattst88: okay, so what am I misunderstanding and how did we not pass 1.09e12 to IROUND?
17:09 karolherbst: I actually didn't check what was the old value we passed into IROUND
17:09 karolherbst: but I'd assume it was the same
17:10 mattst88: oh. well, you stated that it didn't happen earlier...
17:10 karolherbst: because why would it change?
17:10 mattst88: okay, glad we agree on that then
17:10 mattst88: the implementation of IROUND was
17:10 mattst88: return (int) ((f >= 0.0F) ? (f + 0.5F) : (f - 0.5F))
17:10 karolherbst: I just said that the cast of the return value caused issues
17:11 mattst88: if we pass it f=1.09e12, then it's going to convert 1.09e12 -- a float that is out of the integer range -- to an int
17:11 mattst88: that is undefined behavior
17:11 mattst88: where is my logic wrong?
17:11 karolherbst: ohhh, I see what you mean now
17:11 karolherbst: the CTS relies on undefined behaviour
17:11 mattst88: yes
17:12 mattst88: well, probably
17:12 mattst88: I /think/ that this was implicitly clamped in the conversion float -> int because the cvtss2si instruction does that under some circumstances
17:12 karolherbst: mhhh
17:12 karolherbst: now I am wondering
17:13 mattst88: but the C code is undefined behavior, and I think we were simply getting lucky
17:17 mattst88: I left a comment on the MR that explains what we think is happening
17:20 imirkin: arguably mesa relied on undefined behavior to implement GL correctly :)
17:21 imirkin: i don't know what the correct thing is tbh, would require some spec reading
17:22 Kayden: at any rate, getting it doing what it was doing is probably the right call
17:26 mattst88: oh, right. we're not even getting the "clamping" behavior
17:26 mattst88: IROUND was just returning complete nonsense: util_iround(1089999994880.000000) = -2147483648
17:27 mattst88: I don't think getting it doing what it was doing is the right call
17:27 mattst88: this seems very stupidly broken
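mattst88's reading of the old IROUND quoted above: `(int)(f + 0.5F)` with f = 1.09e12 converts an out-of-range float to `int`, which is undefined behaviour. On x86 the truncating conversion yields the "indefinite integer" 2^31 bit pattern, which is the -2147483648 he measured. A range-checked sketch is below. The clamp-to-INT_MAX policy, NaN -> 0, and rounding in `double` are assumptions for illustration. Whether GL actually requires clamping here is the open spec question raised above, though INT_MAX is the 2147483647 the CTS test comment mentions.

```c
#include <limits.h>

/* Hypothetical range-checked IROUND. The old version was:
 *    return (int) ((f >= 0.0F) ? (f + 0.5F) : (f - 0.5F));
 * Checking the range before converting keeps this free of UB. Adding 0.5
 * in double avoids a float rounding trap: 0.49999997f + 0.5f is exactly
 * 1.0f in float arithmetic, which would round 0.49999997 up to 1. */
static int iround_clamped(float f)
{
   if (f != f)                 /* NaN: arbitrary choice for this sketch */
      return 0;
   if (f >= 2147483648.0f)     /* 2^31 is exactly representable as float */
      return INT_MAX;
   if (f <= -2147483648.0f)
      return INT_MIN;
   double d = (f >= 0.0f) ? (double)f + 0.5 : (double)f - 0.5;
   return (int)d;              /* in range now, so truncation is defined */
}
```

Halves still round away from zero, as in the original (2.5 -> 3, -2.5 -> -3). Only the out-of-range and NaN cases change behaviour.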
17:28 Kayden: okay
17:29 mattst88: gotta clone cts, but I'll bet a shiny nickel that it's got a hack that allows this result
17:33 mattst88: okay, I don't even know where to find the code for sgis_texture_lod_basic_getter anymore
17:35 imirkin: mattst88: kc-cts, look in external, there's a fetch thing for it
17:36 imirkin: if you're one of the cool people.
17:36 mattst88: oh, right, the fetch script without the shebang that always screws me over
17:47 mattst88: lol
17:47 mattst88: a comment in the particular test: "is this correct? 2147483647 = (1<<31)-1"
17:51 malice: mattst88: Not even ==?
17:51 mattst88: it's a comment, not actual code
17:52 mattst88: not that int-literal = would mean anything
18:20 anholt: fd-farm ran out of disk space because I forgot to set up its gc scripts. apologies to anyone with broken jobs.
18:22 daniels: anholt: are you using docker-gc or docker-free-space?
18:22 anholt: daniels: nothing at the moment
18:23 anholt: I figure I should probably use the same thing you're using? but helm-gitlab-config seems to be mostly set up for provisioning packet runners
18:23 anholt: so I guess I could copy bits out of it?
18:23 daniels: if you pull it now, it's changed a fair bit
18:24 daniels: dfs is the new script that bentiss and I have been working on which tries much harder to a) preserve important things and b) clean up to a disk space target rather than a schedule
18:25 anholt: I've got 900G of disk, any replacement algorithm at all should be fine.
18:26 anholt: (though I've got serious side-eye for anything but lru or random)
18:26 daniels: dfs is LRU
18:26 anholt: I saw a bunch of stuff going on about tagging classes of images, I thought
18:27 daniels: the main docker-gc flaw is that ejecting old containers and volumes means ejecting your git repo cache
18:28 anholt:wishes we could include a one time clone of mesa in the docker image, and --references it from clones within jobs.
18:28 daniels: partly. there's a whitelist for upstream images which never get ejected, and explicit expiry for downstream images with the date generated on build. everything in between is LRU, and instance-global rather than runner-local LRU
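daniels's description of dfs combines three rules: pinned upstream images are never ejected, downstream images carry an explicit expiry date, and everything else is evicted LRU until usage reaches a disk-space target. A minimal C sketch of that policy follows. The record layout, the order (expired images first, then LRU) and all names are assumptions, since the dfs script itself isn't shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical image record; not docker's or dfs's actual data model. */
struct image {
   const char *name;
   uint64_t size;      /* bytes on disk */
   int64_t last_used;  /* e.g. unix time of the last job that used it */
   int64_t expires;    /* explicit expiry for downstream images, 0 = none */
   bool pinned;        /* upstream whitelist: never ejected */
   bool evicted;
};

static int by_last_used(const void *a, const void *b)
{
   const struct image *x = *(const struct image *const *)a;
   const struct image *y = *(const struct image *const *)b;
   return (x->last_used > y->last_used) - (x->last_used < y->last_used);
}

/* Evict images until `used` drops to `target` (a disk-space target rather
 * than a schedule): expired unpinned images first, then the least recently
 * used unpinned ones. Returns the resulting usage. */
static uint64_t gc_to_target(struct image *imgs, size_t n, uint64_t used,
                             uint64_t target, int64_t now)
{
   for (size_t i = 0; i < n && used > target; i++) {
      if (!imgs[i].pinned && imgs[i].expires && imgs[i].expires <= now) {
         imgs[i].evicted = true;
         used -= imgs[i].size;
      }
   }

   struct image **order = malloc(n * sizeof(*order));
   if (!order)
      return used;
   size_t m = 0;
   for (size_t i = 0; i < n; i++)
      if (!imgs[i].pinned && !imgs[i].evicted)
         order[m++] = &imgs[i];
   qsort(order, m, sizeof(*order), by_last_used);

   for (size_t i = 0; i < m && used > target; i++) {
      order[i]->evicted = true;
      used -= order[i]->size;
   }
   free(order);
   return used;
}
```

Because pinned images are skipped entirely, usage can stay above the target when upstream images alone exceed it. That is the long-term concern anholt raises below.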
18:28 EdB: It's nice, I was about to work again on bringing clover to opencl 1.2 and after that 3.0 will mostly be done :)
18:28 daniels: anholt: yeah ...
18:28 daniels: chasing that up with them currently
18:28 anholt: oh, sweet
18:32 anholt: daniels: so, I'm confused: how is the docker-free-space supposed to work long term if we never eject upstream images (/mesa/*)?
18:33 anholt: some time-based thing, I guess?
18:37 daniels: atm the presumption is that they change infrequently enough that manual ejection is fine. we can easily enough make that just bias the LRU expiry to only eject upstream images after a rather longer time
18:38 daniels: so far the 1.4TB local per-runner has been enough that that hasn't been a problem. also that we only deployed it just over a week ago, and it's not set in stone but we wanted to put it out whilst we worked on other things rather than polish it forever
18:39 daniels: (bentiss has been rewriting our helm deployment so we can move to upstream CN charts more easily, I've been getting us moving +3 psql major versions as well as d3d12/cl)
18:52 Lyude: seanpaul: maybe you know the answer to this, what should I do if I want to get a fix into drm-misc-fixes that doesn't apply cleanly, but does apply cleanly on drm-misc-next? (talking about https://patchwork.freedesktop.org/patch/362928/?series=76446&rev=1 - I have a feeling it's because your work with https://patchwork.freedesktop.org/patch/msgid/20200213211523.156998-3-sean@poorly.run
19:30 Lyude: airlied: any idea about the question I asked a little higher up? ^
19:41 airlied: Lyude: if you want it in 5.7, then put it into fixes
19:41 airlied: it'll get backmerged later into next
19:41 airlied: if you want it in next for CI or testing reasons, put it in next, then cherrypick back to fixes
19:41 airlied: danvet: ^?
19:43 danvet: for -misc I'd put it into -fixes and then tell maintainers to backmerge and how to resolve it
19:43 danvet: once it's all landed in the next -rc
19:46 danvet: Lyude, also seanpaul stepped down as -misc maintainer
19:46 danvet: if the -fixes version is very different, I guess pushing to -next and cherry-picking over to -fixes is also good
19:46 seanpaul: sorry Lyude, I was in meetings
19:46 danvet: either way heads up to -misc people is good for the backmerge
19:53 Lyude: danvet: yeah I know - I just figured they might still know the answer :p, alright though - I'll go ahead and do that
19:54 Lyude: i'll send a cherry-picked version in a bit
20:02 sravn:hopes that Xin Ji will be glad for all the feedback. There is a lot to do to address it all... (some bridge driver on dri-devel)
20:20 Lyude: mripard: btw, I'm pushing a fix to drm-misc-next: https://patchwork.freedesktop.org/patch/362928/?series=76446&rev=1 I want to cc this to stable as well but the patch will need to be slightly different for it to apply to drm-misc-fixes, so I'll send out a backported version to cherry-pick soon
21:15 shadeslayer: anholt: thanks!
21:44 airlied: hmm need to make virgl ci tests trigger for llvmpipe
21:48 anholt: oof, yeah.
22:11 airlied:clearly doesn't understand how to do that after trying to work out what virgl-rules are
22:12 airlied: ah test-sources-dep
22:12 airlied: anholt: just add llvmpipe paths to virgl rules or add llvmpipe rules to virgl jobs?
22:21 airlied:goes with the latter
22:53 anholt: airlied: add llvmpipe paths to virgl rules, I think. otherwise, I think the rules/stage will get overwritten by the later one included
22:55 airlied: anholt: I did this https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4777
22:55 airlied: but yeah I'm not sure about inheritance there
22:56 anholt: that's the thing i'm thinking won't work
22:58 airlied: bleh it's messy to test since any pipeline containing a test also contains the change to .gitlab-ci
22:58 airlied: which means all the pipelines get generated I think
23:02 airlied: anholt: I'll go with adding the paths
23:05 airlied: anholt: okay pushed the alternate plan
23:21 airlied: ickle: do i915 hw contexts work on gen4/5 now?
23:23 airlied:finds 9ce9bdb00dfc7e5cac58c9dc59fe7f2669c0af6b
23:32 airlied: lols /* For the special day when i810 gets merged. */
23:44 HdkR: That would be a very special day indeed
23:47 airlied:wonders if I could even find a functional i810