00:02 Lyude: airlied: so, some context: my normal workflow is compile thing on my machine, then run a script I have that installs it to a temporary prefix, and rsync it to the target machine. it sounds like overkill, but it's been super useful when dealing with slower machines like some of the arm boards I've got lying around. Granted-this powerpc machine in particular is anything but slow, but I guess I'm
00:02 Lyude: just trying to keep the same workflow here as well
00:04 airlied: Lyude: yeah you should probably rethink the workflow there, ppc64 machines are a lot faster to build on
00:05 airlied: and hopefully fixing it isn't a major task
00:06 Lyude: yeah, i've been thinking about doing that at some point. definitely would need to fix the dozens of scripts i've accumulated over the last few years
00:09 Lyude: honestly - thinking about it, a pretty significant chunk of those scripts do things that autotools didn't do, but meson probably does do
00:09 Lyude: so I wonder how many of them I can just get rid of
00:11 Lyude: ('stuff' like setting up out-of-tree buildroots with one command, having a script to figure out what combination of autoreconf|./configure|./autogen.sh is needed for a project, etc.)
00:12 airlied: for igt on ppc64 I'd just git clone, fix it and run away, until you are ready to spend time on CI :-P
00:14 Lyude: yeah, that's my new plan
00:14 Lyude: a shame, but meh
00:39 jenatali_: karolherbst: MR comments aren't a great high-bandwidth conversation channel :)
00:40 jenatali_: Are you suggesting that in the case where a deref chain includes a cast, leave the load/store instructions alone?
00:40 jenatali_: Then lower_explicit_io can be added afterwards to handle those cases?
00:40 karolherbst: :D true
00:40 karolherbst: jenatali_: yes
00:40 karolherbst: exactly
00:40 karolherbst: lower_scratch should only convert the variables and just leave the casts alone
00:41 jenatali_: Yeah, I suppose that could work
00:41 karolherbst: :)
00:41 karolherbst: then we don't need to touch the scratch_size in lower_io
00:41 jenatali_: Yeah, makes sense
00:41 jenatali_: Just to be clear though, that routine's not actually part of lower_explici_io, it's an independent pass
00:42 karolherbst: is it the nir_lower_vars_to_explicit_types stuff?
00:42 jenatali_: Yeah
00:46 karolherbst: mhhhh
00:46 karolherbst: this type stuff is annoying
00:46 karolherbst: I am sure lower_scratch assumes GL types
00:47 jenatali_: Probably
00:47 karolherbst: ohh it does not
00:47 karolherbst: glsl_type_size_align_func is an input
00:47 jenatali_: Oh cool
00:47 karolherbst: so you can define your rules
00:47 karolherbst: so essentially just the cl type helpers
00:47 karolherbst: okay, nice
00:47 karolherbst: so that should work then
00:48 jenatali_: Sure, so just a matter of early-out on lower_load_store in nir_lower_scratch.c if var ends up coming back null (i.e. there's a cast)
00:48 karolherbst: no clue if we already have a convenience declaration for that
00:49 karolherbst: yeah.. I think that should be enough
00:49 jenatali_: There's one in clover and one in our clc frontend
00:49 karolherbst: ahh, maybe we should move it into libnir then
00:49 karolherbst: :)
00:50 jenatali_: Probably not a bad idea
00:50 karolherbst: huh.. I don't see the one in clover actually
00:50 jenatali_: Oh, I just assumed there was...
00:51 karolherbst: clover only uses nir_lower_explicit_io really
00:51 karolherbst: I tried to only have a very minimal amount of passes there
00:51 jenatali_: Ah fair enough
00:51 jenatali_: We use the lower_vars_to_explicit_types for shared memory too
00:52 karolherbst: normally the passes should be up to the driver to choose
00:52 karolherbst: yeah.. maybe clover should too
00:52 karolherbst: I never really spend much time on perfect CL support yet as... well.. should finish up other things first
00:52 jenatali_: Right, forgot clover gives nir over to the driver. For our stuff we're doing pretty much all the lowering upfront
00:52 karolherbst: yeah
00:53 jenatali_: I'm hoping next week to start working through the actual CTS rather than just synthetic tests :) we'll see how far off we've been from just reading the spec and doing our own basic tests hah
00:53 karolherbst: :)
00:53 karolherbst: I was running the CTS in the past
00:53 karolherbst: and still do it mainly for my testing
00:53 karolherbst: the vtn code is pretty solid there
00:53 karolherbst: I usually just run into stupid nouveau issues or things just missing
00:53 jenatali_: Until you touch images ;)
00:53 karolherbst: like image support
00:53 karolherbst: right
00:53 karolherbst: but that's optional :p
00:53 jenatali_: Ehhhh
00:53 karolherbst: :D
00:54 jenatali_: Technically, but in practice not really
00:54 karolherbst: I know
00:54 karolherbst: for ... $reasons the work I am doing on clover is because of system level SVM :p
00:54 karolherbst: so I need to finish this first
00:54 jenatali_: Heh fair enough
00:54 jenatali_: I'll see about getting those image patches on their way upstream soonish
00:55 karolherbst: and the structurizer is quite important for that as well :) and I hopefully be able to get back to it this week
00:56 karolherbst: jenatali_: cool
00:56 jenatali_: Oh the one other annoying thing about using lower_scratch instead of lower_explicit_io is it doesn't have clean support for the new address modes I added, meaning we might get spurious int64s where the optimization pass doesn't perfectly remove them
00:57 jenatali_: (In the cases where the deref chain does go back to a variable)
00:58 jenatali_: Oh... it also only removes variables that have dynamic derefs to them. I'd prefer not to have to implement both static deref chains *and* dynamic deref chains in the DXIL backend if I can avoid it
00:59 jenatali_: Yeah... I'm still leaning towards choosing the explicit_types + explicit_io passes as a superset of lower_scratch
01:00 karolherbst: jenatali_: not indirect ones just get resolved to constant offset loads
01:00 karolherbst: so.. this usually is fine, no?
01:00 karolherbst: because then you can lower them to ssa values
01:01 karolherbst: nir_lower_vars_to_ssa is your friend
01:01 jenatali_: As long as they're basic types (vec4 or less) yeah we can lower them to ssas
01:01 karolherbst: so instead of moving everything to scratch, you only move those with indirect references
01:02 karolherbst: saving space and lower perf impact
01:02 jenatali_: Yeah, we run lower_vars_to_ssa before lower_scratch I'm pretty sure
01:02 karolherbst: :)
01:02 karolherbst: yeah.. so in the best case that resolves all of that
01:03 karolherbst:should really use lower_scratch in nouveau
01:03 karolherbst: right now I just treat all left over variables as spilled memory
01:03 karolherbst: which is scratch :p
01:04 jenatali_: Alright, I'll see if I can construct a case where lower_vars_to_ssa + lower_scratch doesn't end up lowering all non-cast derefs
01:04 karolherbst: yeah.. would be good to know those cases
01:05 karolherbst: I could imagine that just taking a normal reference could kill it
01:05 jenatali_: You guys just have to make my life hard, would be so much easier to just call lower_explicit_io mutually exclusive with lower_scratch and call it done :P
01:05 karolherbst: :p
01:05 karolherbst: where would be the fine in upstreaming then :D
01:05 karolherbst: *fun
01:06 karolherbst: jenatali_: see it this way: this allows you to write high quality code, because it's not you saying "it has to be better code" but others, so it's out of your control :p
01:06 jenatali_: Yeah yeah :P
01:13 karolherbst: jenatali_: do you have a nir showing where we take a refrence to shader_tmp or function_tmp memory and store it somewhere?
01:13 karolherbst: I am wondering how the end of the deref chain looks like
01:13 jenatali_: I don't have a nir handy, but the CLC is pretty simple
01:14 karolherbst: I assume there is something besides load_deref/store_deref which we need to add to nir_lower_vars_to_scratch
01:14 jenatali_: Yeah, deref_var and take the result of that and store it somewhere
01:14 jenatali_: Then deref_cast that result back to a pointer and deref_load that
01:15 karolherbst: mhhhh
01:15 jenatali_: I.e. all the things that lower_explicit_io handles
01:15 karolherbst: mhhhh
01:15 karolherbst: so we essentially have a random deref chain ending with whatever
01:15 karolherbst: and we just store the deref value
01:15 jenatali_: Yep
01:16 karolherbst: mhhh
01:16 karolherbst: I am not sure if I like this and I'd rather see a deref_get_ptr_value to end the chain properly..
01:16 karolherbst: but.. maybe jekstrand is fine with it?
01:17 jenatali_: I mean, it's nothing special for function_temp/shader_temp compared to any other address mode
01:17 karolherbst: I really don't like a deref to end being opaque
01:17 jenatali_: If you use a pointer in CL, you get a deref that ends up being used as an actual value
01:17 karolherbst: yeah.. I know
01:17 jekstrand: karolherbst: What do you mean by deref_get_ptr_value?
01:17 karolherbst: but we never store it in memory really
01:17 karolherbst: jekstrand: like when a deref ssa value get's actually written to memory
01:17 jenatali_: Unless you use it as the value in a deref_store :P
01:18 karolherbst: ssa = deref_struct.. whatever
01:18 karolherbst: and this ssa gets stored
01:18 jekstrand: karolherbst: Right. So right now we don't require any sort of special cast there.
01:18 jekstrand: karolherbst: I think we talked about requiring it some time a long time ago
01:18 karolherbst: yeah...
01:18 karolherbst: I thnk so to
01:18 karolherbst: I don't really have an opinion on that besdies that deref should stay opaque
01:18 jekstrand: karolherbst: I've yet to see something that convinces me it's needed though
01:18 karolherbst: but maybe it's fine
01:18 karolherbst: dunno
01:19 karolherbst: I just see the ugglyness if you want to be smart
01:19 karolherbst: and have different ptr sizes
01:19 karolherbst: like 32 bit for shared memory
01:19 jekstrand: Yeah
01:19 jenatali_: I mean... you're not wrong
01:19 jekstrand: I think some of that ugliness is unavoidable though
01:20 karolherbst: probably
01:20 jekstrand: It's more a question of how we mitigate it
01:20 karolherbst: right
01:20 jenatali_: The lowering/optimization passes do a pretty good job of removing it by the end though, unless you actually need to write it to memory, in which case you need a consistent pointer size
01:20 karolherbst: I don't have a strong opinion on that
01:20 jekstrand: The current strategy is "just make them the size CL thinks they are and write optimizations to fix it if needed"
01:20 karolherbst: just ending a deref chain explicitly is kind of what I'd prefer
01:21 karolherbst: jekstrand: sure, but you could have 32 bits for shared
01:21 karolherbst: and just fill in upper bits
01:21 jekstrand: karolherbst: Yeah. I've just not come up with a way to do that which I like.
01:21 karolherbst: or so
01:21 jekstrand: karolherbst: And that's what jenatali_'s patches do
01:21 karolherbst: uhm.. jenatali_ I meant :D
01:21 karolherbst: yeah..
01:22 karolherbst: I saw those
01:22 karolherbst: I mean.. I am fine either way
01:22 karolherbst: that just makes it hard to use lower_scratch then I think
01:22 jenatali_: Yeah
01:22 jekstrand: karolherbst: I commented on that
01:23 jekstrand: karolherbst: I don't know that saving lower_scratch is that important
01:23 jekstrand: I also don't know that it's that hard
01:23 jenatali_: That's what I'm thinking too
01:23 karolherbst: jekstrand: I'd like to always use lower_scratch in nouveau
01:23 jenatali_: Though, as soon as you have a cast in play, you can't really know which variables might be loaded from scratch vs direct
01:23 karolherbst: instead of doing the ugly reg thing
01:23 jekstrand: lower_io is pretty much a superset of lower_scratch with the one caveat that you don't have control over the sizes of things.
01:24 jekstrand: karolherbst: ugly red thing?
01:24 karolherbst: indirects on registers are spilled memory
01:24 karolherbst: for us
01:24 karolherbst: so..
01:24 karolherbst: right now I force regs to handle this
01:24 jenatali_: If it didn't already get removed from lower_vars_to_ssa
01:25 karolherbst: get rid of all vars as much as possible
01:25 karolherbst: then do nir_lower_locals_to_regs
01:25 karolherbst: and those are my indirects
01:25 jekstrand: Yeah, if it's not already removed by vars_to_ssa then it's indirect and lower_io will lower it to scratch. That sounds like what you want.
01:25 karolherbst: ohh, it does?
01:25 karolherbst: I see
01:25 karolherbst: a lot of stuff changed... I really need to clean up some of the messy bits
01:25 karolherbst: or just write the convert from scratch :D
01:26 karolherbst: *converter
01:26 jenatali_: Which converter?
01:26 jekstrand: In the Intel back-end we also run lower_indirect_derefs to turn indirect derefs into if ladders
01:26 jekstrand: Which works but we don't currently have a knob to tell it to leave large stuff alone so we can lower_io on it.
01:26 jekstrand: So we lower_scratch first and then lower_indirect_derefs
01:26 jekstrand: But if lower_indirect_derefs had such a knob, we could use lower_io instead of lower_scratch.
01:27 jenatali_: Oh, now I understand your comment
01:27 jenatali_: I hadn't seen lower_indirect_derefs yet
01:28 karolherbst: jekstrand: yeah... temps on indirect stuck
01:28 karolherbst: uhmn
01:28 karolherbst: indirects on temps
01:29 jekstrand: Yeah, we can do indirect reads on registers fairly well but we don't actually have that wired up in the back-end
01:29 jekstrand: And it has nasty register pressure implications
01:29 karolherbst: yeah.. we can't :p
01:30 jekstrand: We can't do indirect writes competently
01:31 karolherbst: I just need a proper way to lower indirectly accessed temps to scratch space
01:31 karolherbst: this register business is annoying
01:31 jekstrand: karolherbst: lower_scratch and lower_io will both do that for you. :)
01:31 jenatali_: Well, lower_io doesn't do it quite yet ;)
01:31 jekstrand: Well, lower_io will once jenatali_'s patches have landed
01:31 karolherbst: yeah.. I guess
01:31 karolherbst: well
01:32 karolherbst: I wouldn't mind getting rid of lower_scratch and replace it with lower_io :p
01:32 karolherbst: I guess
01:32 karolherbst: or is there a good reason we should keep lower_scratch?
01:32 karolherbst: I really only want lower_io to do it for all indirects though
01:33 jenatali_: If you've run vars_to_ssa, the only thing that lower_io *can* change would be indirects
01:33 karolherbst: *vars_to_reg
01:33 karolherbst: uhm
01:33 karolherbst: locals_to_reg
01:34 karolherbst: ohhh
01:34 karolherbst: now I see what you mean
01:34 karolherbst: mhhhh
01:34 karolherbst: lower_io is kind of the first pass I run
01:34 jenatali_: Well you don't run it on function_temp right now :)
01:34 jenatali_: So you can just do that one later
01:34 karolherbst: and I gues I could do a second loweR_io after vars_to_ssa
01:34 karolherbst: "NIR_PASS_V(nir, nir_lower_io, nir_var_all, type_size, (nir_lower_io_options)0);" :p
01:35 jenatali_: Ah, I'm only tweaking lower_explicit_io, not nir_lower_io
01:35 karolherbst: ohh, I see
01:35 karolherbst: mhhh
01:35 karolherbst: maybe I could run explicit_io on temps then
01:36 jekstrand: karolherbst: Looks like it's currently getting used by v3d, radeon, r600, and intel
01:36 karolherbst: well either way.. if that helps me to get rid of that indirect regs stuff I am fine
01:36 jenatali_: So, what would you like me to do as part of my patches? :)
01:37 karolherbst: whatever jekstrand thinks is best :p
01:37 jenatali_: The lower_vars_to_explicit constructs the driver_location for all the vars in the address space to be tightly packed contiguous, so sharing the scratch_space with another pass would be a big complex
01:37 jenatali_: a bit* complex
01:37 jekstrand: What is maybing happening as part of patches?
01:38 jenatali_: Just allowing lower_explicit_io to act a replacement for lower_scratch that supports casts
01:38 jekstrand: Right
01:38 jekstrand: How much work is that to do?
01:38 jekstrand: Is it just a matter of nicer handling of nir_shader::scratch_size?
01:39 jenatali_: I mean, everyone who touches it has to cooperate on it
01:39 jekstrand: on scratch_size? sure
01:39 jekstrand: It's not hard though. You just keep adding at every step
01:39 jenatali_: Yeah. I guess I haven't looked at what other passes try to modify it
01:39 jekstrand: Which is to say, you have to do the usual "align and add" dance
01:40 jenatali_: Hmm I guess if I start the offset at scratch_size before doing lower_vars_to_explicit, yeah that's probably not too bad
01:40 jekstrand: shader->scratch_size = align(shader->scratch_size, align_of_var)
01:40 jekstrand: var->data.location = shader->scratch_size
01:40 jekstrand: shader->scratch_size += size_of_var
01:41 jenatali_: Fair enough, I'll do that. Should I also handle sharing info.cs.shared_size with other passes while I'm at it?
01:41 karolherbst: shared_size has no meaning with CL, does it?
01:41 jenatali_: We lower "local" pointers into pointers to shared
01:41 jenatali_: er, offsets to shared
01:41 karolherbst: sure
01:42 karolherbst: but you still have no size
01:42 jekstrand: I think the only thing that's missing is initilizing offset to nir->scrach_size
01:42 jenatali_: karolherbst: We also accumulate the size of shared variables to fill out shared size
01:42 karolherbst: I am sure the kernel could give you a hint on the size though?
01:42 jekstrand: Unless you want to handle "real" function calls. Then it gets to be more fun.
01:42 karolherbst: jekstrand: but the kernel doens't know the size
01:42 karolherbst: ...
01:42 karolherbst: jenatali_:
01:42 jenatali_: jekstrand: No, definitely no function calls here :)
01:42 jekstrand: karolherbst: Doesn't know the size of what?
01:43 jenatali_: karolherbst: It does if you patch it at enqueue time
01:43 karolherbst: jekstrand: shared memory usage
01:43 airlied: at some point we'll have to grow up to real function calls :-
01:43 airlied: :-P
01:43 karolherbst: jenatali_: ehhh... well.. right
01:43 karolherbst: but no :p
01:43 jekstrand: airlied: Yeah.....
01:43 jenatali_: DXIL doesn't support dynamic work group sizes either, so we have to patch at enqueue time anyway
01:43 karolherbst: ufff
01:43 karolherbst: jenatali_: I see that offsets will be fun for you
01:44 jenatali_: :) It's actually not too bad
01:44 karolherbst: well
01:44 karolherbst: it is
01:44 karolherbst: there is no limit
01:44 jenatali_: We read them from kernel inputs for "pointers" coming from kernel args
01:44 karolherbst: your work group can be as big as the application wants it to be
01:44 jekstrand: Are you implementing scratch by attaching your own buffer? I assume DXIL doesn't really have a concept of scratch that's passed through to the driver.
01:44 jenatali_: jekstrand: DXIL is just LLVM so you can do alloca to create scratch space
01:45 jenatali_: HLSL shaders can do that, so drivers have to deal with it
01:45 karolherbst: ufff
01:45 jekstrand: Right... "just LLVM"
01:45 karolherbst: nice
01:45 karolherbst: jenatali_: but with a compile time constant size, right?
01:45 karolherbst: or... how do the driver know how much to allocate
01:46 jenatali_: karolherbst: For "private" memory which translates to scratch/alloca, yes we have a compile time constant
01:46 karolherbst: okay
01:46 karolherbst: having to know the size of CL local mem is painful...
01:46 jenatali_: Eh for "shared" we know how much too. It's inline __local variables plus sizes passed by the app at enqueue time
01:46 karolherbst: because.. for drivers it doesn't matter really
01:46 karolherbst: at least not for compiling
01:48 jenatali_: jekstrand: Opinion on whether lower_vars_to_explicit should support appending to shared_size as well as scratch_size?
01:48 karolherbst: you could even do magic and just patch the passed local "pointers" aka offsets by scratch_size
01:48 karolherbst: and all is good
01:49 jenatali_: karolherbst: https://gitlab.freedesktop.org/kusma/mesa/-/blob/msclc-d3d12/src/microsoft/clc/clc_compiler.c#L1041
01:49 karolherbst: yeah...
01:49 jenatali_: High-level, compiler writes out what the offset should be, runtime writes it into the memory that we read as kernel inputs
01:49 jenatali_: Works well enough at least
01:50 karolherbst: sane enough
01:50 karolherbst: I really would like to move the compilation to an earlier point
01:50 karolherbst: and already have the binary before enqueueing the kernel
01:50 karolherbst: but for you that sounds impossible to do
01:50 jenatali_: Yeah, I'd like that too, but we've got enough stuff that's an impedence mismatch
01:51 karolherbst: so I have to deal with shared_size to be 0 :)
01:51 karolherbst: CL terms are stupid anyway
01:51 karolherbst: local...
01:51 karolherbst: who came up with that
01:52 jenatali_: We also have to lower non-normalized sampler coordinates, since our samplers only support normalized coords, so even if we could patch the workgroup size *and* the shared mem size, we've got at least one more
01:52 karolherbst: :/
01:52 karolherbst: maybe you should fix DXIL :p but I guess that's even more annoying
01:52 karolherbst: and others will hate you for it
01:53 jenatali_: Heh, that's not something I can do on my own, though yes I can obviously provide feedback into making it happen (slowly)
01:53 jenatali_: And I have :P
01:53 karolherbst: :)
01:54 karolherbst: anyway.. should probably go to bed.. 4am here
01:54 jenatali_: Hah, yes. Thanks for your insights
05:45 danvet: nashpa, did I convince you on the drm_crtc_vblank_reset patch?
06:18 hakzsam: jekstrand: yeah, it was me :). I have a patch somewhere, I can give it a new try
07:30 pq: Does any mechanism exist that would let a DRM kernel driver use CLOCK_MONOTONIC_RAW with pageflip timestamps?
08:07 MrCooper: pq: I'm afraid not, it's hard-coded to monotonic
08:12 danvet: pq, idea was that we pick the same as alsa and v4l ... did they change?
08:15 pq: danvet, I'm trying to figure out why someone claims that weston is using CLOCK_MONOTONIC_RAW for presentation, when it doesn't even seems to have the code to reach that conclusion from DRM. So I'm just covering my bases here.
08:17 pq: danvet, as MrCooper hints to, the KMS UAPI does not support anything expect CLOCK_REALTIME or CLOCK_MONOTONIC, AFAIK.
08:17 pq: *except
08:17 pq: and I was wondering if that's still the case, or did I miss something, or could some drivers get it wrong
08:18 danvet: ah ok
08:18 danvet: I thought there's demand for this feature to be added, and wondered where it's coming from
08:18 pq: nope, no demand from my side
08:18 pq: btw. is there any way to get KMS use CLOCK_REALTIME? I'd like to test if my tiemstamp conversion code works.
08:20 pq: or if not, when did the last users of CLOCK_REALTIME in the kernel disappear? only roughly, like couple years or ten years ago?
08:21 MrCooper: 4.15 according to drivers/gpu/drm/drm_vblank.c
08:21 pq: thanks
08:22 MrCooper: np
08:22 pq: so some LTS kernels might still be using REALTIME, but even Debian stable is free from it
08:23 MrCooper: IIRC MONOTONIC has been the default for much longer
08:24 pq: ah
08:24 pq: so realistically, I should never see REALTIME
08:24 MrCooper: think so
08:24 pq: awesome, thanks
08:24 MrCooper: that was back when we didn't have a clue about this stuff yet :)
08:27 danvet: MrCooper, isn't that time a constant 5 years in the past, but for a changing set of "this" :-)
08:27 MrCooper: touché
08:29 zzag: Can a single process use both GLX and EGL? or are they mutually exclusive? (I know it's a weird question)
08:44 emersion: in which case do i get EBUSY from the kernel on an atomic commit?
08:45 emersion: let's say i have a rendering loop that submits a new frame to the kernel after each page-flip event
08:45 emersion: if i want to perform a modeset, do i need to wait for a page-flip event to avoid EBUSY?
08:45 emersion: right now i just do a blocking commit to modeset and it seems to work
08:46 emersion: would that still be the case if i do a non-blocking commit to modeset?
08:57 pq: hmm, I'd assume you'd have to wait always... but I guess not?
08:57 pq: or maybe that's just my personal desire to never do blocking anything
08:59 emersion: yeah, i would've expected the modeset during the rendering loop to fail with EBUSY
09:09 pq: emersion, maybe it's legacy KMS leaking through?
09:10 emersion: yeah, i'm kind of worried about this as well
09:11 pq: Does it hurt though? Either the compositor waits nicely and does not fail, or it doesn't wait, blocks a bit, and doesn't fail?
09:11 pq: if the kernel now changed to returning EBUSY instead of blocking a bit more, it could regress something
09:14 emersion: i'm just trying to understand how the queue works
09:14 emersion: and what's the best way to perform a modeset when you've submitted a page-flip
09:46 MrCooper: with legacy API, it's best to wait for flips to complete before doing a modeset
09:46 MrCooper: otherwise the flip can end up clobbering the modeset
09:48 danvet: MrCooper, do we really have a that bad driver?
09:49 MrCooper: yep, radeon and non-DC amdgpu at least
09:49 danvet: geez
09:49 emersion: MrCooper: and what happens if you do that with the atomic API?
09:50 danvet: emersion, nothing stupid
09:50 danvet: atomic is fully ordered
09:50 danvet: blocking commits always wait for previous ones, and never EBUSY
09:50 emersion: ah, nice
09:50 danvet: the only issue with EBUSY is that if you do a non-blocking modeset, you might get EBUSY on some other crtc
09:50 danvet: if we have to reconfigure stuff on them
09:50 danvet: and atm the uapi sucks on that
09:51 danvet: so you might just get a spurious EBUSY there, but no event, since you didn't ask for it
09:51 emersion: so doing a blocking modeset when a non-blocking one says EBUSY would be an easy way out?
09:51 danvet: yup, that will always work
09:51 danvet: emersion, btw did you see my EBUSY patch?
09:52 danvet: it's stuck in limbo because there's no some igt that fails with it
09:52 danvet: and maybe we need something more useful
09:55 emersion: ah, yes
09:59 emersion: let me find it again
11:35 shadeslayer: Hi, I can't seem to create a MR for igt-gpu-tools
11:36 shadeslayer: I literally do not see the section for creating MRs
11:36 shadeslayer: and manually going to https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/merge_requests gives me a 404
11:37 pq: shadeslayer, looks like it uses email: https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/CONTRIBUTING.md
11:55 shadeslayer: ugh
12:20 emersion: shadeslayer: https://git-send-email.io/
12:41 shadeslayer: emersion: I have everything setup, it's just that I have to go through the subscribe, create filter and mailboxes dance
12:42 shadeslayer: looks like IGT also doesn't have a function to fill a array with random characters
12:42 emersion: you don't need to subscribe
12:42 emersion: and if you do want to subscribe, you can disable email delivery
12:42 shadeslayer: emersion: oh? do you just end up using patchwork?
12:43 emersion: well, you get replies to your patches by email…
12:43 emersion: but if you want to browse other people's patches, yeah patchwork can help
12:56 pq: emersion, your modifiers doc patch a hunk in amdgpu_dm.c?
12:56 pq: +has
12:57 emersion: oh, really?
12:57 emersion: damn
12:57 emersion: sorry about that, let me send v3
12:57 pq: or then my claws-mail is more broken than I thought
12:57 emersion: also forgot to add teh changelog
14:25 danvet: tomba, [PATCH] drm/atomic-helper: reset vblank on crtc reset <- ping for some review on this one?
14:25 danvet: testing also pretty useful
14:35 danvet: emersion, yeah that hunk in v2 shouldn't be there
14:38 emersion: sent a fixed v3
14:51 Zeising: Are there no release tarballs of libglvnd?
14:53 danvet: emersion, ah yes
14:53 danvet: emersion, do you have someone to push all this for you?
14:53 daniels: Zeising: https://github.com/NVIDIA/libglvnd/releases - see 'assets'
14:53 danvet: or want commit rights?
14:54 kisak: Zeising: looks fine? https://gitlab.freedesktop.org/glvnd/libglvnd/-/releases
14:54 danvet: I'd be very happy if we have someone taking some care of the uapi doc stuff
14:54 daniels: kisak: er yeah, forgot it moved :P
14:55 Zeising: kisak, daniels: I guess what I'm asking is if they should be in the same place as other freedesktop.org assests. It might be intentional to pull it straight from gitlab though?
14:57 Zeising: There's a lot of stuff here for instance: https://www.freedesktop.org/software/ and here https://xorg.freedesktop.org/releases/individual/ and so on. I guess I expected to find a tarball somehwere like that as well.
14:57 emersion: daniels: nope, anyone with push rights is welcome to apply it :)
14:58 Zeising: perhaps I should poke eric_engestrom about it? :)
14:58 emersion: ah, i'm interested in taking care of uapi doc stuff, yes
14:59 emersion: pq is probably interested too, but maybe lacks the time to contribute
15:08 robher: danvet: I pinged a few other folks to test out the shmem patch as I still haven't found the time...
15:09 danvet: robher, thx
15:09 danvet: yeah maybe bbrezillon or mripard could test on v3d ...
15:10 danvet: robher, I think I'll give it another week or so, and if nothing comes in leave it at the testing tzimmermann has done on udl
15:11 robher: danvet: yep, bbrezillon is who I asked. He said he could tomorrow.
15:14 danvet: awesome
15:21 danvet: pinchartl, since you apparently fixed this for mxsfb somewhat already, but not quite: "[PATCH] drm/atomic-helper: reset vblank on crtc reset" any takes?
15:37 jekstrand: hakzsam: Thanks! If you're concerned about regressions, running nir_lower_io immediately after spirv_to_nir should get you roughly the current behavior.
15:37 hakzsam: okay
15:38 jekstrand: Well, it does change the way descriptor references work a tiny bit. Speicifically, there's a load_descriptor intrinsic you'll get and have to handle.
15:39 bbrezillon: danvet: I don't have a rpi unfortunately
15:39 jekstrand: hakzsam: You may want to do your testing on top of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5241
15:39 jekstrand: hakzsam: It adds derefs for push constants
15:39 hakzsam: thanks, will do
15:39 eric_engestrom: Zeising: yeah, libglvnd uses git tarballs directly for releases, instead of re-uploading them somewhere else like most projects used to do before github/gitlab were a thing
15:40 eric_engestrom: so that's indeed the "official" URL to use
15:41 eric_engestrom: come to think of it, I'm not even sure libdrm or mesa have any pre-generated artifacts anymore either, so we could move to simple git tarballs
15:41 eric_engestrom:makes a note to fact-check that later
15:42 emersion: no more PGP signatures?
15:42 eric_engestrom: back in the autotools days we needed to pre-generate a ton of things to then ship, starting with ./configure
15:43 eric_engestrom: emersion: the release tags are already pgp-signed with the same key
15:43 emersion: yeah, but that's the Git tags only
15:43 emersion: if a distro wants to check the tarball is what the releaser intended, it's complicated
15:44 emersion: that's the only reason why i'm still generating tarballs locally and uploading them
15:44 emersion: note, you can host them on GitLab, and even script the whole thing, pretty easily
15:45 eric_engestrom: I'm not sure what failure/attack vector or threat model you're thinking about?
15:45 eric_engestrom: yeah I know we can host them on gitlab, the point was about having to do the local-tarball-generation dance
15:46 emersion: i'm thinking of someone getting access to the fd.o infra and serving bad tarballs
15:47 ajax: you think that's bad, even if you install the good tarballs you still end up with X11
15:48 eric_engestrom: but that's not an issue anymore is you're using git tarballs from gitlab
15:48 emersion: hm, what do you mean?
15:48 eric_engestrom: unless you mean if gitlab (the software) gets compromised and someone uses that flaw to make it serve bad tarballs?
15:49 emersion: if you have access to fd.o infra, it's pretty easy to make the reverse proxy serve a bad tarball instead of forwarding the request to gitlab
15:49 eric_engestrom: ah
15:49 eric_engestrom: right
15:49 eric_engestrom: that's true
15:50 emersion: requires the fd.o infra to get compromised or for an authorized bad maintainer to do it, true
15:50 eric_engestrom: ajax: haha
15:50 emersion: but also gitlab.fd.o is hosted in "the cloud", so cloud providers could do it too
15:50 eric_engestrom: emersion: it's a real possibility nonetheless
15:51 eric_engestrom: cloud providers are less of a threat imo
15:51 emersion: once the kernel infra was compromised, so that's why i think about it
15:51 eric_engestrom: but hackers getting in via flaw in the configuration, bug exploits or plain password stuffing is a real threat
15:52 emersion: https://en.wikipedia.org/wiki/Kernel.org#2011_attack
15:52 eric_engestrom: I guess the perfect fix would be for gitlab to serve the .sig alongside the .tar.xz
15:54 eric_engestrom: I remember hearing about this but hadn't looked up the full story
15:56 emersion: oh, gitlab doesn't support attaching arbitrary blobs to tags :/
15:56 emersion: github supports it
15:57 eric_engestrom: no, but it has "releases" which are tags + editable text + uploadable blobs
15:57 daniels: yes
15:57 emersion: hm, i bet uploading a blob there doesn't provide a stable URL :/
15:58 eric_engestrom: it does iirc
15:58 emersion: ah, nice
15:58 daniels: it does
15:58 emersion: cool
15:58 daniels: i've certainly been a fan of something like https://github.com/cgwalters/git-evtag over detached tarball sigs, but otoh the web of trust doesn't actually exist and would anyone genuinely notice if you just signed it with a scratch key?
15:59 emersion: the web of trust doesn't need to exist
16:00 emersion: PGP keys are hardcoded in package metadata files
16:01 ajax: daniels: anyone who ever validated any tarball i released with my first pgp key was taking that signature _entirely_ on faith, as anyone who decided to sign my key did so without telling me, and i never signed anyone else's with it.
16:01 ajax: the web of trust there was entirely "hey, these tarballs are signed by the same key as all these emails"
16:02 eric_engestrom: ajax: that's kind of the point though
16:02 eric_engestrom: doesn't matter who you are
16:02 eric_engestrom: it only matter you are that person
16:02 ajax: indeed
16:03 daniels: sure, I think proof of association is much much better than proof someone standing in a big room saw a possibly fake ID for like ten seconds
16:03 daniels: but on the other hand, look at how many people actually GPG-sign their mails ...
16:03 daniels: so we are more or less asking people to take change-of-key on faith whenever someone new does a release
16:04 ajax: but, once you have that kind of faith in distributed attestation, why is a tarball more reliable than a signed tag, or that any more reliable than a chain of sha1 sums representing a widely replicated tree state
16:04 daniels: i think we'd get far far better security by just requiring everyone who can push to have 2FA enabled tbh
16:05 eric_engestrom: oh definitely; speaking of, I was thinking: would it be ok to make having 2fa a requirement to getting push rights?
16:05 emersion: would be a good idea indeed
16:05 ajax: speaking of widely replicated tree states, i have another 30kish patches in this rebase to attend to
16:05 ajax:vanishes
16:05 emersion: do you get a lot of "i lost my 2fa code" requests, daniels?
16:06 MrCooper: danvet: holy shit, Thomas Hellström is at Intel now? Did you guys need TTM know-how for dGPUs? :)
16:06 danvet: ...
16:06 daniels: emersion: not had any so far!
16:06 emersion: nice
16:07 daniels: (otoh every time someone wanted to change their SSH key on the old infrastructure they had inevitably lost their GPG key like ... three years prior)
16:07 emersion: eh
16:09 eric_engestrom: back to the "mandatory 2fa", is that a knob you can turn in gitlab, or would it have to be enforced by humans knowing that rule and thinking to check, and hope they don't turn it off right after?
16:10 imirkin: you can enforce it on a group iirc
16:10 imirkin: i.e. you can say that all members of mesa project must have 2fa
16:10 eric_engestrom: yup, just found it: https://docs.gitlab.com/ee/security/two_factor_authentication.html#enforcing-2fa-for-all-users-in-a-group
16:10 imirkin: at least i remember doing that on gitlab.com, presumably the CE also has that
16:11 eric_engestrom: ah wait that's on groups, but permissions are based on "role"
16:11 imirkin: yeah, dunno that you can enforce that all e.g. developers, have 2fa, but reporters don't
16:11 imirkin: (making up permission names, but i think those are close, if not accurate)
16:12 daniels: they're 100% accurate!
16:12 ajax: i don't think there's a knob for it, but you could certainly enforce that socially
16:13 anholt_: https://docs.gitlab.com/ee/security/two_factor_authentication.html but maybe ee only?
16:14 daniels: currently 18/128 people with >=developer access to mesa/mesa have 2fa enabled
16:14 daniels: anholt_: we can enforce that now, but as imirkin said, that also means enforcing it for people we give reporter-level access to as well
16:15 bnieuwenhuizen: I wonder if it makes sense to make a lot of the branches marge-only. That way you can at least mostly avoid people hiding whatever nefarious they did
16:15 bnieuwenhuizen: 128 people who can force push are a lot
16:16 anholt_: force pushes are a lot more visible and disruptive than sneaking in a MR, not sure that would help
16:16 eric_engestrom: is restricting a branch to a specific user a thing gitlab can do?
16:16 eric_engestrom: anholt_: wouldn't have to be a force-push, someone could just push something to master instead of making an MR for it
16:17 bnieuwenhuizen: right, even a normal push doesn't tell you who did it in a trustworthy way
16:19 daniels: https://docs.gitlab.com/ee/user/project/protected_branches.html
16:19 eric_engestrom: but yeah, +1 on restricting master to marge-only, and all other branches to readonly (which mean no new branches if someone accidentally pushes their branch to master), except for the staging/XX.Y and XX.Y which maintainers need to be able to push to
16:19 eric_engestrom: *accidentally pushes their branch to mesa/mesa
16:21 MrCooper: I was going to propose restricting push access to maintainers/owners (which includes Marge)
16:24 eric_engestrom: slightly relatedly: I really dislike the 100+ branches of dead-for-years stuff we have in mesa
16:26 eric_engestrom: I went through them like a year ago and deleted the ones (20-30 of them iirc) that had not commit after branching off of master, but there's tons left that have just a WIP patch or a PoC and some other commits that have no value anymore
16:30 eric_engestrom: (but I don't want to be the one to make that "no value" call)
16:52 MrCooper: danvet: drm_framebuffer_remove sometimes seems to leave the CRTC enabled even when disable_crtcs=true
16:53 danvet: huh
16:54 danvet: MrCooper, any other insights already?
16:54 danvet: does the atomic commit fail somehow?
16:57 MrCooper: yeah, returns -EINVAL, presumably because the CRTC is still enabled in the new state, but all its planes are disabled (which I'm changing amdgpu DC to always reject)
16:57 MrCooper: the question is how the CRTC state can remain enabled
16:58 MrCooper: despite disable_crtcs=true
17:01 danvet: MrCooper, at least for a modeset we shut down everything as we should
17:01 danvet: a) mode clear b) active false c) all connectors are off
17:01 MrCooper: ah, it only disables the CRTC when the primary plane is enabled, looks like the problem occurs when an overlay plane is disabled last
17:01 danvet: so might be time to instrument dc atomic_check and see what complains
17:01 MrCooper: primary plane is disabled
17:02 danvet: ah yes
17:02 danvet: if you need that link, probably best to always require a primary plane
17:02 danvet: otherwise rather confusing for everyone
17:03 MrCooper: got it, will try that next, or are there are valid use cases for overlay plane without primary?
17:03 danvet: or at least "primary plane must be on" is rather common "a plane, no matter which one, must be on" is a bit special
17:03 danvet: not sure there's anything else than amdgpu
17:03 anholt_: bnieuwenhuizen, jekstrand: what do you know about deqp's shader cache? does it have any driver dependencies, or could we precompute it at container build time by doing a deqp run against a working driver so that they were all hit once? if we're doing shader caching at runtime, is it safe to have multiple deqp processes pointing at the same file?
17:04 danvet: MrCooper, there are
17:04 danvet: hm ...
17:04 danvet: vsyrjala, how does this work on gen2?
17:04 danvet: MrCooper, I think generally the really flexible hw is happy with any amounts of planes enabled
17:05 danvet: any amount = [0, all] inclusive
17:05 MrCooper: the cursor plane seems special here unfortunately
17:07 daniels: anholt_: RTYI https://patchwork.freedesktop.org/patch/367529/
17:08 danvet: MrCooper, what's it again? cursor needs primary plane to work?
17:11 danvet: MrCooper, maybe we could encode a "cursor requires primary plane" restriction
17:11 danvet: should be less confusing
17:11 danvet: still allows overlay-only video
17:13 jekstrand: anholt_: It's not safe to have multiple processses pointing at the same file. That's a very good way to have very strange failures.
17:13 jekstrand: It usually asserts inside std::string IIRC. :-)
17:14 jekstrand: anholt_: AFAIK, it just caches the SPIR-V generated by GLSLang so probably no driver dependencies
17:16 anholt_: jekstrand: cool, that may be the current source of flakes in CI, and a guide to how to reduce CI's runtime.
17:17 anholt_: (but I wonder how bnieuwenhuizen hasn't tripped over this)
17:25 pinchartl: danvet: queued on my review list :-)
17:25 danvet: pinchartl, thx a lot
17:25 danvet: pinchartl, do I want to ask how long that list is or maybe not?
17:25 danvet: pinchartl, and if you're totally swamped ok to skip, it's kinda both an fyi ping and poke for review
17:26 pinchartl: it's a very long list, but given that I was planning to work on a v2 of the mxsfb patches soon, yours is high on the list
17:33 anholt_: jekstrand: thanks, that was in fact my flakes, it looks like. and now the CI is faster by disabling truncation so that we have a per-thread cache across the entire run for the thread.
17:33 bnieuwenhuizen: anholt_: reusing same filesystem between runs?
17:34 anholt_: bnieuwenhuizen: not reusing fs between runs was the difference between my local cheza and CI
17:34 jekstrand: anholt_: You should pass your notes on to craftyguy. It may be useful for us as well. I don't know what we're doing right now with the cache. I think we're just disabling it.
17:35 jekstrand: But if we can save some CPU cycles (GLSLang can be a pig) by enabling the cache, that'd be good.
17:36 anholt_: though. hmm. given that truncation is the default behavior per deqp invocation, I'm not sure how my local cheza wouldn't be doing the same race as CI
17:37 jekstrand: you can specify the file name so you could do deqp-cache-$SHARD_ID
17:37 anholt_: yep, that's the fix
17:37 anholt_: what I'd love is to have the container pre-bake a shader cache, and populate the shard caches with the container's RO copy.
17:38 anholt_: but that's a problem for future me trying to get us to 100% coverage, rather than "get literally any pre-merge vulkan CI in place"
17:38 jekstrand: Yeah
17:38 jekstrand: You can also tell the CTS to compile all the shaders at build time and do it on a big box
17:39 jekstrand: I don't know how that works though
17:39 jekstrand: We should probably look into that for our CI as well
17:39 jekstrand: But craftyguy's time is valuable and we have many other things to do
17:41 craftyguy: that is an interesting idea though
17:41 anholt_: jekstrand: not seeing the build time thing in the CTS.
17:42 jekstrand: anholt_: I don't remember how it works. :-(
17:43 jekstrand: anholt_: I think it's external/vulkancts/scripts/build_spirv_binaries.py
17:44 jekstrand: anholt_: It's intended precisely for your use-case, though: wimpy CPUs.
17:44 bnieuwenhuizen: there is vk-build-programs in the build dir
17:44 jekstrand: anholt_: There was a huge push for it early on to make Android testing faster
17:44 jekstrand: I've never used it though
17:44 anholt_: nice
17:44 anholt_: will definitely enable that
17:45 jekstrand: I wonder if it properly multi-threads....
17:45 jekstrand: I guess it doesn't matter that much because you only have to run it when you rev your CI build which probably doesn't happen often.
17:46 anholt_: yep, that would be in the container
18:12 Zeising: eric_engestrom: At least for now, mesa and libdrm are hosted elsewhere.
18:14 Zeising: eric_engestrom: But there is no plans for libglvnd to be served other than from gitlab?
18:16 mattst88: Zeising: libglvnd moved from github to gitlab.freedesktop.org so that non-nvidia people could actually make it usable for distros
18:16 Zeising: ok
18:16 mattst88: it's still mirrored on github: https://github.com/NVIDIA/libglvnd
18:16 Zeising: Yeah, that I've seen.
18:17 Zeising: Is it ready for general consumption btw?
18:17 Zeising: From that I can see, most linux distros seem to use it, but I might be wrong.
18:17 kisak: libglvnd? it was ready for general use years ago
18:18 mattst88: heh, I wouldn't quite go that far
18:18 Zeising: :)
18:18 mattst88: but yeah, it's usable now and I suspect all Linux distros are switched over to it
18:18 Zeising: Ok
18:19 mattst88: unfortunately, nvidia-drivers as old/recent as -390 do not quite support libglvnd, so my life as a gentoo maintainer isn't necessarily improved yet
18:20 mattst88: (https://bugs.gentoo.org/713546)
18:21 Zeising: Yeah, fun times indeed.
18:21 vsyrjala: danvet: gen2 is fine with all planes disabled. the only peculiar thing is that the underrun bit is asserted all the time while all planes are off. but we just ignore it. no planes -> nothing can underrun anyway
18:21 danvet: vsyrjala, ah right that was it
18:22 danvet: MrCooper, ^^ fyi
18:22 Zeising: mattst88: Is it possible to see a package list for the gentoo mesa package(s)?
18:22 vsyrjala: another slightly fishy thing is that pipe crc gives a single bogus value when transitioning from n planes -> 0 planes
18:22 vsyrjala: happens on modern hw too
18:22 vsyrjala: iirc
18:22 mattst88: Zeising: a list of files installed by the mesa package?
18:22 Zeising: yes
18:23 vsyrjala: actually not sure the crc issue happens with ancient hw.
18:23 vsyrjala: definitely seen it on ivb, and some skl+ things iirc
18:23 Zeising: and, with the risk of revealing exactly how little I know about gentoo, hos is mesa packaged? One package with everything, or subpackages for each library and so on?
18:23 karolherbst: Zeising: not possible by design
18:24 karolherbst: subpackages are also pointless unless you can split up the compilation process
18:24 karolherbst: there is some magic done in qt5 eg
18:24 karolherbst: but normally there are no subpackages
18:25 Zeising: Ok
18:25 karolherbst: Zeising: also, the use is free to enable or disable compile time features and stuff
18:25 mattst88: Zeising: of course it'll change depending on the configuration of USE flags, but here's the list from my system: http://dpaste.com/04CKWBV
18:25 Zeising: mattst88: thanks!
18:25 mattst88: yw!
18:25 mattst88: Zeising: I assume that you're investigating enabling libglvnd on FreeBSD?
18:26 kisak: Zeising: on gentoo, you can use 'equery files mesa', online, you can use pfl http://www.portagefilelist.de/site/query/listPackageFiles/?category=media-libs&package=mesa&version=20.1.0_rc4&do#result
18:26 karolherbst: yeah.. but that requires having built it already
18:26 Zeising: mattst88: Yes
18:26 mattst88: nice
18:27 Zeising: mattst88: libglvnd is packaged already, and there's a submission to enable it. I'm just trying to 1. wrap my head around it and 2. sneaking a peek into how other projects have packaged stuff. Gentoo is sometimes good to look at since portage and ports have quite a lot of similarities.
18:27 karolherbst: kisak: ehh.. not a single glvnd user?
18:28 mattst88: Zeising: yeah, makes sense :)
18:28 kisak: karolherbst: I don't understand the question
18:29 karolherbst: kisak: in that list you linked to there is not a single entry with glvnd enabled
18:29 karolherbst: which.. is odd
18:30 karolherbst: because there is "/usr/lib64/libGLX_mesa.so.0.0.0" ehh.. weird
18:31 kisak: my understanding the site is based on user submissions, just means nobody gave feedback to that site with that specific version of mesa built with USE=libglvnd
18:34 karolherbst: yeah.. maybe
18:34 karolherbst: maybe it's hidden because libglvnd is the default now...
18:35 Zeising: mattst88: With the risk, once again, of not understanding portage, how is the dependency chain between libdrm, mesa, libglvnd, wayland and xorg? Because that's one of the things I forsee as an issue. Apps will depend on libglvnd to link to various mesa libraries, but something also needs to pull in mesa. I'm just curious how you've handled that.
18:36 Zeising: And another question, should mesa provide a library for every library libglvnd provides, or does libglvnd map several different libraries (dispatch, I guess) into one mesa lib?
18:36 Zeising: sorry for playing 20 questions
18:46 anholt_: danvet: botching up ioctls doc says to align 64-bit-containing struct size to 64 bits still, but I thought we didn't need that for drm
18:47 danvet: anholt_, it's defensive, in case you put them into some kind of array
18:47 danvet: iirc on x86 it doesn't matter, but on others 64bit stuff gets aligned or something like that
18:47 anholt_: ok, so for a top level ioctl struct (the new set/getlabel) we should be fine
18:47 danvet: or I'm just cargo culting this stuff and paranoid doesn't cost much :-)
18:48 danvet: yeah top level shouldn't hurt
18:48 danvet: since the size is encoded anyway
18:48 danvet: so you can still extend at the bottom and it's all good
18:49 danvet: oh, so no drm_ioctl we're fine, since we decode the size field in the ioctl struct number
18:49 danvet: iirc this is why the dma-buf set_label blew up
18:49 Zeising: mattst88: Is it possible to get a similar list from the same system for libglvnd?
18:49 danvet: the pointer got encoded or something like that, and dma-buf doesn't have its own forward/backward compatible magic
18:49 mattst88: Zeising: sure, sec
18:50 Zeising: thanks!
18:50 mattst88: Zeising: http://dpaste.com/33WPGPP
18:52 Zeising: :thumbsup: as they say at work :)
18:52 Zeising: I'm still not sure I understand everything, but it's a start. What confuses me is for instance that there's a libGLESv2 in libglvnd, but there is no corresponding libGLESv2_mesa in mesa.
18:53 Zeising: But I guess that's because libglvnd knows that it's supposed to dispatch calls through libGLESv2 to somewhere else in mesa
18:55 mattst88: Zeising: I think libglvnd's libGLES*,libOpenGL,libGL all dispatch via lib{GLX,EGL}_$vendor.so
18:55 Zeising: .Ok
18:55 vsyrjala: danvet: anholt_: didn't we have some issue with 32bit userland + 64bit kernel when things weren't 64bit aligned? or was that just the standard "i'll just put this long into the ioctl struct, what could go wrong" type of mess?
19:03 anholt_: vsyrjala: which things specifically are you thinking of? struct size or struct members?
19:26 eric_engestrom: does anybody know if someone's already done the work to port VK-GL-CTS to xdg_shell? it still supports only wl_shell, which wlroots/sway doesn't support (and won't), meaning I have to run weston inside sway just to run deqp...
19:28 eric_engestrom: I'm contemplating doing it if nobody has already, but I'd be very slow.. on the other hand, I would learn a lot in the process 🙃
19:41 Lyude: Hey, would anyone have any idea why lib/tests/igt_fork from igt-gpu-tools would fail like this when launched directly, but not when it's launched from meson test?
19:41 Lyude: https://paste.centos.org/view/d7bed856
19:52 EdB: jvesely: change made acording your comment on https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/271/
20:06 ngcortes: heads up: some results on the mesa results site will be culled in light of a site backend update
22:12 jekstrand: karolherbst: Do you have any memory of why OpSignBitSet choses whether or not to do a signed shift based on the number of components? That seems completely bonkers to me.
22:13 karolherbst: jekstrand: no clue, sounds bonkers to me as well
22:13 jekstrand: karolherbst: Well, you wrote it and I reviewed it. :-P
22:13 jekstrand: karolherbst: I'm cleaning it up now.
22:13 karolherbst: ohh wait.. let me check the cl specs then :D
22:14 karolherbst: I am sure there is a comment
22:15 karolherbst: jekstrand: "Test for sign bit. The scalar version of the function returns a 1 if the sign bit in the float is set else returns 0. The vector version of the function returns the following for each component in floatn: -1 (i.e all bits set) if the sign bit in the float is set else returns 0."
22:15 karolherbst: I guess there was some spirv fallout from that
22:16 karolherbst: I will check with the CTS
22:17 jekstrand: karolherbst: That's not what SPIR-V says
22:17 jekstrand: karolherbst: SPIR-V requires the destination to be a boolean
22:17 karolherbst: but I guess we deal with a broken spirv llvm translator here and spirv has a fixed op
22:17 karolherbst: yeah.. let me see
22:20 alyssa:sings the glamor blues
22:23 karolherbst: jekstrand: seems like either one works now... probably something got fixed in the meantime and we also have our 1 bit bools now
22:26 karolherbst: ahh yeaj
22:26 karolherbst: the translator emits a OpSelect choosing the right constant
22:56 alyssa: #0 0x0000aaaac449194c in _start ()
22:56 alyssa: thanks for the helpful bt, X11
22:58 imirkin: that sounds like glibc fail
22:58 imirkin: _start is supposed to call main
22:58 alyssa: Weee
22:58 alyssa: probably not gdb'ing right
22:58 imirkin: did you run?
22:59 alyssa: grabbed the xwayland pid from `ps -aux` and `gdb -p [..]`
22:59 imirkin: oh
22:59 imirkin: should work. weird.
23:00 karolherbst: alyssa: threads apply all bt
23:00 karolherbst: ehh
23:00 imirkin: (or just "i threads" to see threads)
23:00 karolherbst: thread apply all bt
23:02 alyssa: Only seeing a single thread, probably broke things worse than expected
23:06 airlied: X server doesn't really have much threads, well one once it gets input going
23:06 airlied: just printf your way out via the log file :-P
23:07 alyssa: on further thought maybe building Kodi with Wayland support might be easier .. :P
23:15 alyssa:wonders if she has enough glamor for apitrace at least
23:15 alyssa: can build up from there..
23:16 airlied: doesn't sounds like it's getting near glamor
23:16 airlied: you should hvae at least a logfile
23:16 alyssa: :|
23:17 alyssa: otoh apitrace should work with wayland natively ..
23:26 jenatali_: karolherbst: Does Clover support global work ID offsets?
23:27 karolherbst: jenatali_: no
23:27 jenatali_: That's what I thought
23:27 karolherbst: jenatali_: it's a 1.2 feature :p
23:27 alyssa: Patches welcome?
23:27 alyssa:ducks
23:27 jenatali_: Ah, right
23:27 karolherbst: but.. every compliant 1.0 has to support it anyway
23:27 jenatali_: alyssa: Patches incoming ;)
23:27 alyssa: jenatali_: \o/
23:27 karolherbst: there is no way your runtime can get away without supporting offsets
23:28 jenatali_: Yep
23:28 jenatali_: We've got it implemented, but we don't support the explicit intrinsic for querying offsets
23:28 karolherbst: I.. just never went through the troube of splicing the kernel invocations
23:28 jenatali_: I'm trying to decide where to put the lowering of ID => actual index + offset
23:28 karolherbst: yeah...
23:28 karolherbst: I thought about how to do it, but it just sucks
23:28 jenatali_: Yeah
23:28 karolherbst: I think you always have to load the offset
23:28 jenatali_: Don't really want to add a second intrinsic for "this is the real ID without requiring an offset added in"
23:29 jenatali_: Yeah, we do
23:29 jenatali_: I'd just like our backend to see "load ID" and "load offset" as separate intrinsics and dumbly translate them
23:29 karolherbst: yeah
23:29 karolherbst: there is no other way
23:29 karolherbst: no hw supports the real id
23:29 karolherbst: I think
23:29 karolherbst: CL is just stupid here...
23:29 karolherbst: seriously.. no limits on the global work size
23:30 karolherbst: how instane
23:30 jenatali_: So, objections to adding 2 intrinsics? One which is pre-lowering ID (including offset) and one which is post (real hardware thread ID)?
23:30 karolherbst: *insane
23:30 karolherbst: jenatali_: there is no hw supporting the real hw thread id
23:30 karolherbst: it's just pointless to care about it
23:30 jenatali_: ?
23:30 alyssa: so lower in vtn then?
23:31 karolherbst: we lower it in nir anyway
23:31 karolherbst: to local id + local size
23:31 jenatali_: Ah, I understand
23:31 karolherbst: and now we have to add offset as well to this lowering
23:31 karolherbst: jenatali_: and compliant implementations have to splice the kernel invocation up into n invocations because CL is stupid
23:31 karolherbst: so you need to handle it internally anyway
23:32 jenatali_: Oh... that's what you meant by there's no max...
23:32 karolherbst: yeah....
23:32 karolherbst: it's a mess
23:32 jenatali_: Awesome
23:32 karolherbst: :)
23:32 karolherbst: there is a limit on the local group size though
23:32 jenatali_: Yeah
23:33 jenatali_: Alright, I've got some work to do then
23:33 karolherbst: anyway.. for CL we can probably always inject the offset and let the runtime treat it as a system value or something
23:33 jenatali_: Yeah we load it from a UBO
23:33 karolherbst: and.. drivers push the value into a constant buffer like some drivers do for the local group sizes
23:34 jenatali_: Which we'll have to do for local sizes too I think
23:34 karolherbst: jenatali_: is there a concept of offsets in DXIL?
23:34 jenatali_: Nah
23:34 jenatali_: There are global indices though
23:34 jenatali_: But 0-based
23:34 karolherbst: yeah..
23:34 karolherbst: what you need for compute shaders
23:35 karolherbst: I mean.. you can also always recompile the kernel on each slice
23:35 karolherbst: matters... not at all
23:35 karolherbst: if you have to slice the kernel runs for hours anyway
23:35 karolherbst: so...
23:35 karolherbst: but I think it makes sense to support it in a way where you don't have to recompile all the time
23:36 jenatali_: Heh, true, I could pass it into the compiler :P But nah I'd rather have that in a UBO
23:36 karolherbst: mhh, but then we have the 1.2 API with the application offsets
23:36 jenatali_: Local sizes have to be passed into the compiler
23:36 karolherbst: ehh
23:36 jenatali_: Right, that's what we have implemented so far, hadn't realized about splitting
23:36 karolherbst: yeah.. but splitting is a runtime only thing
23:36 karolherbst: just loop over enquekernel :p
23:36 karolherbst: and pass in the offset for each slice
23:37 jenatali_: Yep, pretty much
23:37 karolherbst: I think we can just have a new intrinsic for the offsets thenand emit it when lowering the global id
23:37 karolherbst: or... mhh dunno
23:38 jenatali_: Yeah, maybe do that in vtn
23:38 karolherbst: the end result is the same no matter what we do, so I don't care as much
23:38 karolherbst: yeah.. vtn is probably the sane location
23:38 karolherbst: we know we have a kernel there
23:38 karolherbst: and we can just add the offset to the global id
23:38 jenatali_: Yeah
23:39 karolherbst: just be aware you also have to fix the linear id variant as well :p
23:39 jenatali_: Yep, I see that
23:39 jenatali_: Actually wait
23:40 jenatali_: No, CL defines the linear ID as not having offsets
23:40 karolherbst: sure?
23:40 karolherbst: https://man.opencl.org/get_global_linear_id.html
23:41 jenatali_: For 1D work-groups, it is computed as get_global_id(0) - get_global_offset(0).
23:41 karolherbst: yeah, get_global_offset :p
23:41 karolherbst: ohhhh
23:41 karolherbst: wait
23:41 jenatali_: Which means if we have to do splitting, we need two levels of offsets...
23:41 jenatali_: One that's returned from get_global_offset, and one that's not
23:42 jenatali_: Where get_global_offset only returns the API offset, not any internal splitting
23:42 karolherbst: why does CL have to have stuff like this :(
23:42 karolherbst: this is so annoying
23:42 jenatali_: :D
23:42 karolherbst: yeah whatever.. then we add the offset only for global_id and leave global_linear_id alone
23:42 karolherbst: ....
23:42 karolherbst: those are different sys vals anyway
23:43 jenatali_: Yeah
23:44 karolherbst: jenatali_: if you post the patches please CC me and I will fix it up for nouveau then
23:44 jenatali_: Yep, probably tomorrow or early next week
23:44 jenatali_: Started running the CTS and immediately fell over on a bunch of missing sysvals
23:45 karolherbst: :)
23:45 karolherbst: yeah.. the CTS is quite thorough with checking everything actually
23:45 jenatali_: Wasn't super impressed with the 2700 warnings I got from building it though :P
23:45 karolherbst: well
23:45 karolherbst: same with the GL/VK CTS
23:46 jenatali_: FWIW, looks like the CL CTS has a max global work size of 256...
23:46 jenatali_: #define MAX_GWS 256 // Global Work Size (must be multiple of 16)
23:46 karolherbst: I am sure it uses bigger work groups
23:47 jenatali_: Oh you're right, I found it
23:47 karolherbst: test_conformance/basic/test_global_work_offsets.cpp has some offset tests btw
23:48 jenatali_: Thanks
23:48 karolherbst: test_basic has global_work_offsets and get_global_offset
23:48 jenatali_: Yep, test_basic is where I fell over
23:48 karolherbst: :)
23:49 karolherbst: in the past the CTS required 2.0 :p
23:49 jenatali_: Yes, I've noticed that
23:49 karolherbst: not really.. but some API stuff had to be implemented
23:50 jenatali_: Thanks for the insights, didn't realize about the splitting... I'll figure something out tomorrow and hopefully you don't hate it :P
23:51 karolherbst: jenatali_: got lazy and did this: https://gitlab.freedesktop.org/mesa/mesa/-/commit/333c9d5bb054d5ac5518e830b535e8a4f3f80187
23:51 karolherbst: :D
23:51 karolherbst: enough for the CTS
23:52 jenatali_: Yeah, I've got a few trivial 2.0 functions implemented - I have the full 2.2 set in our runtime (which I'll get around to posting on github probably next week) with most >1.2 stuff stubbed
23:52 karolherbst: yeah...
23:53 karolherbst: jenatali_: It might be that I get super unresponsive over the next 1 or two weeks.. internal stuff comming up
23:53 jenatali_: Thanks for the heads up, no worries
23:54 jenatali_: I'm hopeful that we've gotten through the big hurdles and I just have a bunch of bugfixing and runtime work to go do
23:55 karolherbst: yeah
23:55 karolherbst: most of the ugly bring we already took care of :p
23:55 karolherbst: like the general pointer support
23:55 karolherbst: it was quite fun actually teaching vtn about all the CL semantics
23:55 jenatali_: Yep. Just not "generic" or SVM pointers :P
23:55 karolherbst: well
23:56 karolherbst: SVM pointer don't need compiler changes ;p
23:56 jenatali_: Yeah, I've been enjoying learning nir from this
23:56 karolherbst: I already have system SVM running with nouveau and clover actually
23:56 karolherbst: it's all just runtime stuff
23:57 jenatali_: Makes sense. Windows doesn't really like SVM at the moment
23:57 karolherbst: especially.. because USE_HOST_PTR buffers can be svmed as well :p and you can pass the buffer _and_ the SVM HOST_PTR into the same kernel :p
23:57 karolherbst: it's nightmare
23:57 jenatali_: Oh boy...
23:57 karolherbst: yeah...
23:58 karolherbst: but well
23:58 karolherbst: got all CTS svm tests to pass
23:58 karolherbst: except the ones using generic pointers
23:58 karolherbst: not going to implement generic pointers just for that
23:58 jenatali_: :)
23:59 karolherbst: ohh btw, you can back __constant* by SVM memory as well :p
23:59 karolherbst: so.. uhm