00:26imirkin: zmike: https://cgit.freedesktop.org/mesa/mesa/commit/?id=84821964eb6a6962a862223865d44e3c236df66f -- you actually have to support rgb32 for texture buffers if you want GL 4.0
00:27imirkin: zmike: this is how we handle it in nouveau: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c#n68
00:27zmike: imirkin: that's the image case, not buffers
00:28zmike: thanks for double checking me though 👍
00:28imirkin: aha, yeah, the PIPE_BUFFER case is handled "outside". yeah definitely not supporting rgb 3-component for "regular" textures is a good idea.
00:29zmike: yes, I've sort of been advocating for it for a while
00:29zmike: nice catch on the zs scissor clears btw
00:52imirkin: zmike: it's the only reasonable approach - no hw out there can flexibly support this stuff. some can handle rgb888, but that's the extent of it.
00:53imirkin: with the zs clears, i just happened to be looking at the gallium trace of a test that did depth clears
00:53imirkin: and wasn't seeing the scissors in the clear
00:54zmike: what do you mean by "gallium trace"?
00:54imirkin: it activates the 'trace' driver
00:55imirkin: which is a wrapper which outputs an xml thing of all the calls
00:55imirkin: there are some scripts to help print the xml, or you can just look at it raw
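For reference, the trace driver is normally switched on through the `GALLIUM_TRACE` environment variable, which names the XML file to write. The sketch below only builds such an environment in Python. The helper name `trace_env` and the file name `app.trace` are illustrative assumptions, and it presumes a Mesa build that includes the trace driver.

```python
import os


def trace_env(trace_path, base_env=None):
    """Return a copy of the environment with GALLIUM_TRACE set to trace_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["GALLIUM_TRACE"] = trace_path
    return env


if __name__ == "__main__":
    env = trace_env("app.trace")
    print(env["GALLIUM_TRACE"])
    # To capture a trace, launch the GL program with this environment, e.g.
    #   subprocess.run(["./my_gl_test"], env=env)
    # where "./my_gl_test" is a placeholder for the program under test.
```

The resulting XML can be read raw, as mentioned above, or pretty-printed with the helper scripts that the Mesa tree has carried under `src/gallium/tools/trace/`.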
00:55zmike: that probably would've saved me a lot of time debugging
00:56imirkin: back in the bad old days there also used to be rbug, but that has fallen into deep disrepair by now
00:56imirkin: (which allowed a "remote" debugger to connect and look at various gallium-level state)
00:57imirkin: or ... something. it's been like 5+ years since i've used it, so i don't even remember precisely how it worked
00:57zmike: kinda neat?
00:57zmike: why not just use gdb
00:57imirkin: it let you look at bound surfaces, etc
00:57imirkin: why use qapitrace when there's 'apitrace dump' :)
00:57zmike: oh like a more high level thing?
00:58zmike: could probably do that with gdb python too?
00:58imirkin: i think the application has to be running for this stuff to work
00:59imirkin: otherwise i don't know that you can necessarily look at texture contents
00:59imirkin: but like i said - i barely remember its functions
00:59zmike: still interesting to hear about from a historical perspective
00:59imirkin: it was neat, i used it a handful of times, not sure that it was ever so useful that i couldn't achieve the same task another way
01:00imirkin: the trace thing is super-useful though
01:00zmike: yes, it certainly seems like it will be
01:00zmike: great tip 👍
01:10xyene: pq: thanks! Ideally I'd like to modify the producer to explicitly sync with the consumer, but getting that working in the existing codebase I'm hacking on would require a bunch of not-gfx-related rethinking that I'm deferring to the medium-term (forcing the DMA is definitely not ideal, but still significantly lowers CPU usage versus not using DMA at all)
01:11xyene: I had another random question about dmabufs: does the fragmentation of the physical pages backing the dmabuf impact the transfer speed on typical hardware?
01:11xyene: I can imagine a universe where the DMA scatterlist being long => the DMA taking longer than if it were short, but do we live in this universe?
01:11xyene: A quick benchmark on my amdgpu shows that dmabuf import from shmem is a little slower than using CPU cycles to fill a texture through a glMapBufferRange buffer
01:12xyene: (I didn't have expectations for DMA to be *faster* than spinning the CPU, this is just a personal curiosity)
03:55jekstrand: marex: nir_remove_dead_variables().
08:55pq: imirkin, are you asking if the driver should reject a modifier when userspace attempts to use it in a place where it cannot work? Of course the driver must reject that.
08:57pq: imirkin, I don't think you can rule any specific use out of dmabuf. Submitting depth textures to a compositor is definitely something people have already done in some form.
08:58emersion: mainly for XR compositors
09:05pq: xyene, I have no idea. I think a bigger question is, can the devices you want to share the dmabuf with all actually support non-contiguous memory at all? AFAIK some display blocks don't, which makes scanout-able memory on those systems a precious resource.
09:08MrCooper: xyene: the only thing that might make a difference is if the memory is contiguous or not; if it's not, each page will end up as a separate entry in a page table anyway
09:10MrCooper: (even if it is, it may not make a difference if the buffer is smaller than a huge page, or if the driver can't take advantage of huge pages)
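As a rough illustration of the page-table point above, the following sketch counts mapping entries for a buffer under a deliberately simplified model. The 4 KiB and 2 MiB page sizes, the function name, and the decision to ignore alignment are all assumptions made for the example, not a description of any particular driver.

```python
SMALL_PAGE = 4096               # assumed base page size
HUGE_PAGE = 2 * 1024 * 1024     # assumed huge page size


def mapping_entries(size_bytes, contiguous, huge_pages_usable):
    """Count page-table entries needed to map a buffer (alignment ignored)."""
    small_entries = -(-size_bytes // SMALL_PAGE)  # ceiling division
    if not (contiguous and huge_pages_usable):
        # Fragmented memory, or a driver that cannot use huge pages:
        # every base page gets its own entry.
        return small_entries
    # Contiguous memory: cover as much as possible with huge pages,
    # then map the remainder with base pages.
    huge, rest = divmod(size_bytes, HUGE_PAGE)
    return huge + -(-rest // SMALL_PAGE)


if __name__ == "__main__":
    frame = 1920 * 1080 * 4  # one 32-bit 1080p buffer
    print(mapping_entries(frame, contiguous=False, huge_pages_usable=True))
    print(mapping_entries(frame, contiguous=True, huge_pages_usable=True))
    print(mapping_entries(1 << 20, contiguous=True, huge_pages_usable=True))
```

In this model a buffer smaller than one huge page needs the same number of entries whether or not it is contiguous, which matches the caveat in the parenthetical above.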
11:30emersion: is there a way to get llvmpipe to draw on a buffer i've allocated?
11:30emersion: is this what osmesa is for?
11:31emersion: i'd prefer to use EGL if possible
12:22bnieuwenhuizen: pepp: mareko: any idea how much perf BIG_PAGE can give on GFX10.3 chips and what a good showcase would be?
13:35alyssa: pendingchaos: I wonder if it might make sense to teach peephole_select about pre-lower_io load_deref uniforms, so when it's called in st_nir_opts (before lowering UBOs to uniforms *or* lowering I/O) it can do the right thing
13:35alyssa: and then the UBO issue is totally ignored
13:40danvet: tzimmermann, so all the dma_resv_lock in commit_tail are gone again, and I can start pushing my annotation patches?
13:40pendingchaos: alyssa: it looks like the pass already handles uniform load_deref
13:40tzimmermann: danvet, i still have patches for vbox and ast
13:41tzimmermann: i only merged the patches for shmem so far
13:41danvet: I guess if you have links I'll try and look at them tomorrow or so
13:41tzimmermann: the patchsets are ready, i'll send them out this week
13:41tzimmermann: i'll send out vbox today. 2 patches
13:42alyssa: pendingchaos: so it does... hm
13:42tzimmermann: ast is still 12 patches and depends on vbox stuff
13:43tzimmermann: hans wants to test vbox patches, so i'd like to wait for him
14:11marex: jekstrand: I think I have a bigger problem there
14:11marex: jekstrand: so maybe I can ask about it
14:13marex: jekstrand: it seems that etna_compile_shader_nir() does nir_shader_clone() and then runs all the NIR optimization passes on the clone shader, but that cloned shader is then free()d -- etna_link_shader_nir() then iterates over shader inputs of the un-optimized FS version and connects those inputs to varyings
14:14marex: I guess it isn't all that strange though ... because the compile_shaders part might or might not remove some inputs, so in the link stage we want to connect all possibly existing inputs
14:14marex: and that's why it uses the un-optimized shader
14:14marex: but then, I should be able to run the nir_remove_dead_variables in the link stage (?)
14:43jekstrand: I don't know why it uses the unoptimized shader for link or if that's the right thing to do. I don't know the gallium hooks all that well. Kayden? robclark?
15:06zmike: is this the precompiled shader variant and not the finalized one?
15:14alyssa:doesn't understand finalize_nir
15:30zmike: it's for finalizing
15:31zmike:has all the best answers
15:32imirkin: pq: thanks
15:36jekstrand: All I know is that the name was picked based on brw_finalize_nir() and I know what that does.
15:36jekstrand: But that doesn't mean the gallium hook does anything even close to similar. :-P
15:38bnieuwenhuizen: jekstrand: so what do you do for iris then? both?
15:38jekstrand: bnieuwenhuizen: Don't ask me. :-P
15:39jekstrand: bnieuwenhuizen: I do know iris' finalize_nir doesn't call brw_finalize_nir. :-)
15:39jekstrand: brw_finalize_nir gets called by the back-end compiler right before back-end processing.
15:39jekstrand: Because it's used to apply shader keys
15:40robclark: so finalize_nir() is more interesting if you have shader-cache.. it lets you hook in driver lowering/opt passes before mesa/st serializes
15:49alyssa: robclark: ah, ok
16:04jekstrand: robclark: Since you're here, can you answer marex's question about linking?
16:05robclark: so, I'm not entirely sure why a (non-tiler) backend would have varyings to remove.. mesa/st already does link-time opts
16:05robclark: (a tiler would want to generate a binning pass shader.. which is potentially a thing that could be moved into nir)
16:20danvet: tzimmermann, ok I guess I can merge driver-specific patches, but will wait for you with the one for atomic helpers
16:20danvet: it's all for 5.13 only anyway
16:21tzimmermann: danvet, as i said, hans wants to test the vbox patches
16:22danvet: yeah makes all sense
16:30mareko: bnieuwenhuizen: no idea
16:33mareko: yes, finalize_nir is for driver-specific lowering before st/mesa serializes to disk
16:34mareko: if you change nir too much in there, the shader variant passes in st/mesa might need fixing
16:38mareko: nvidia finally beaten by mesa in most benchmarks
16:38mareko: according to phoronix
16:41alyssa: mareko: \o/
16:41alyssa: imirkin: congrats ^^
16:42imirkin: i don't think mareko is referring to nouveau
16:42imirkin: beating nvidia
16:42zmike: mareko: possibly interesting for you is that nvidia blob is slightly faster in drawoverhead than radeonsi was before your atomics removal, now only about 60%
16:42alyssa: imirkin: I know ;P
16:42imirkin: just a hunch :)
16:43MrCooper: very impressive
16:45mareko: zmike: do you mean nvidia is faster? or was faster?
16:46zmike: only about 10%
16:46MrCooper: looking forward to zink on nvidia Vulkan beating nvidia OpenGL :)
16:46zmike: would have to get fresh numbers to see the gap again now
16:46zmike: gonna be a while
16:46MrCooper: seems like just a matter of time given the current trajectories
16:47zmike: probably need some new extensions before it's even remotely possible
16:47MrCooper: even if it's just one benchmark
16:47mareko: I guess you can mostly thank radv for those results
16:48zmike: well I think I'll beat nv gl if a multidraw extension happens, at least in drawoverhead
16:49mareko: psst, we don't want them to know
16:49MrCooper: I'll take that :)
16:49LordKitsuna: I wonder if zink gets to on par if amd will just go "damn let's just ship this" and stop producing opengl drivers lol
16:49LordKitsuna: That would be hilarious
16:52MrCooper: RX 6800 on par with RTX 3080 overall, holy crap
16:53mannerov: Congrats to you all guys, that's indeed an achievement
16:55alyssa:puts on annoying unknowledgeable user hat
16:55alyssa: how are radeonsi + aco benchmarks? 🙃
16:56HdkR: I'm also looking forward to Radeonsi + ACO
16:58LordKitsuna: We can just call it RadeonSaico
18:59alyssa: "firstname.lastname@example.org: Permission denied (publickey,keyboard-interactive)." uhh
19:02kisak: looks like gitlab.fd.o went down.
19:02zmike: oh no
19:03alyssa: kisak: got it, thought something was wrong on my end, good :)
19:15Lyude: it's probably related to the server maintenance (also i'm so sorry I never noticed the meeting minutes didn't word wrap until now) that got mentioned here: https://www.x.org/wiki/BoardOfDirectors/MeetingSummaries/2021/01-28/
19:16alyssa: pendingchaos: Ah-ha, found the actual culprit -- the limit used in our backend opt loop for peephole_select is much more aggressive than mesa/st uses
19:16alyssa: If, instead of setting uniforms_to_ubo, I run peephole select in the backend and then call lower_uniforms_to_ubo explicitly (and then do the opt loop as normal), I get the old behaviour again.
19:17alyssa: Possibly there's still an underlying issue with peephole_select to be considered, not sure
19:22Lyude: oh-apparently it is still supposed to be up right now, someone's looking into the downtime :)
19:22Lyude: (thanks bentiss!)
19:23imirkin: github's also having issues apparently. something must be going around
19:23Lyude: so the pandemic has made its way to computers. I knew this day would come
19:23alyssa: Lyude: # sudo shutdown -h now
19:24alyssa: still don't get what the -h is for, I didn't need that years ago
19:24imirkin: Lyude: https://www.youtube.com/watch?v=zvfD5rnkTws
19:24Lyude: i've just moved to sudo systemctl poweroff
19:25Lyude: imirkin: lol, i remember this song
19:28imirkin: a simpler time
19:32alyssa: Eww, the shader-db gobbledygook is making sense to me now. I regret paying attention in my statistics class D:
19:36imirkin: i think i zoned out around 'power' -- couldn't handle the conflict with physics.
19:45bwelty: airlied, danvet: are you able to take a look this week at RFC I sent out: "cgroup support for GPU devices"
19:50bl4ckb0ne: is it a good place to ask drm questions or is there a better channel?
19:50Lyude: ask away
19:50alyssa: say no to drm! yarr! ;P
19:51bl4ckb0ne: im getting ENOSPC on drmModeSetCrtc and i dont get why
19:51Lyude: sounds like you're hitting an atomic check failure
19:52bl4ckb0ne: im screwing around with drm dumb buffers
19:52Lyude: a lot of times if you enable debugging output in the kernel (drm.debug=0x10 would probably be what you want) it'll spit out a slightly more sensible error message describing what's going on
19:52Lyude: (in the kernel log of course)
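For context, `drm.debug` is a bitmask of debug categories, and `0x10` selects the atomic category. The sketch below decodes a mask in Python. The category table reflects the `DRM_UT_*` values as they stood in kernels around the time of this log (newer kernels may define more), and the function name is an assumption made for illustration.

```python
# Debug categories selectable through the drm.debug module parameter.
# Later kernels may add further bits beyond the ones listed here.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by mask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0x10))
    print(decode_drm_debug(0x1e))
```

With the categories decoded, it is easier to pick a wider mask (for example adding KMS and DRIVER) when the atomic messages alone do not explain the failure.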
19:56bl4ckb0ne: thanks for the tip
19:58vsyrjala: we should have a faq. this question would be near the top if not 1.
20:30emersion: bl4ckb0ne: https://github.com/swaywm/wlroots/wiki/DRM-Debugging
20:31bl4ckb0ne: oh neat
20:51Prf_Jakob: keithp: Wondering what the status is on the VK_GOOGLE_display_timing extension. I understand if you don't have time and if so is okay if I try to upstream it?
21:17keithp: Prf_Jakob: it's all tied up with the EXT_present_timing stuff; not sure what the best plan is for that whole stack
21:17keithp: And that's all tied up with a pending (private) extension that we're hoping to get done in the next few weeks
21:18Lyude: curious, are you working at google these days? or did that extension just happen to come from google
21:18keithp: it came from google; I'm still consulting for Valve
21:18Lyude: ah cool
21:36Prf_Jakob: keithp: The first patch applies cleanly without any need for the other extension, and is useful for us.
21:37keithp: yeah, maybe just get it merged and plan on doing the rest of the work later
21:37keithp: I have been keeping it rebased with some regularity
21:37keithp: google_display_timing doesn't require any other extensions, it's the 'fixed' version of ext_present_timing that will
21:37keithp: and getting those extensions pushed through khronos is taking all of my available cycles, I'm afraid
21:39Prf_Jakob: So if I get the first patch upstream that is okay with you?
21:40keithp: sounds good
21:40keithp: that would be helpful, I think
21:56Peste_Bubonica: Hi folks. Is it true that we will be able to test AMD Smart Access Memory with mesa 21.0, even without Ryzen 5000 and Radeon 6000?
21:57Peste_Bubonica: I know that it's already available in kernel, with amdgpu
21:57bnieuwenhuizen: Peste_Bubonica: so you need to have some mobo support for some stuff, "above 4g decoding" in particular
21:57Peste_Bubonica: bnieuwenhuizen, yes, my mobo supports it
21:59bnieuwenhuizen: I think most opts for that case should be in 21.0 yes
22:00Peste_Bubonica: I've read on Phoronix that maybe it will work on Ryzen 3000 series and Radeon 5000 series, but the user must enable it, because it may not work in all setups
22:01HdkR: It may also not give you a perf increase
22:01HdkR: May even reduce perf in some cases :D
22:01Peste_Bubonica: HdkR, oh yeah. I see it even in Windows benchmarks, with radeon 6000 :)
22:02HdkR: I should probably retry my setup with the latest bits again. Last time I tried it murdered perf
22:02bnieuwenhuizen: when was last time?
22:02HdkR: Early last year I think
22:03HdkR: Without any of the SAM specific improvements of course
22:03Peste_Bubonica: I'm thinking to build 21.0-rc4 and do some tests with Agesa 18.104.22.168
22:03bnieuwenhuizen: I think in early fall we had some fixes that should avoid the regressions from before that
22:11HdkR: I definitely need to try it again then. I think I was poking at Runescape for someone a while ago with it and noticed the huge performance delta
22:20Lyude: hey just a quick heads up for folks at google, https://opensource.googleblog.com/2021/02/know-prevent-fix-framework-for-shifting-discussion-around-vulnerabilities-in-open-source.html if y'all could speak out against "verified identities" for committing to open source projects that'd be nice :). such a requirement today would literally stop queer folks like myself from maintaining projects. thx
22:20Lyude: (see section "Goal: Authentication for Participants in Critical Software")
22:20Prf_Jakob: keithp: Cool, I'll do that soon then, thanks!
22:27Venemo: Lyude: hey. I'm not a googler, but I took a look at the link, and I don't see what this has to do with that
22:27Lyude: "It is also conceivable that we could have “verified” identities, in which a trusted entity knows the real identity, but for privacy reasons the public does not. This would enable decisions about independence as well as prosecution for illegal behavior."
22:27Peste_Bubonica: HdkR, what GPU did you use in your tests with resizable bar?
22:28HdkR: Peste_Bubonica: rx 5700
22:28Venemo: so you're worried about someone knowing your real identity?
22:28Peste_Bubonica: HdkR, nice, besides the performance problem, did you have any stability issues?
22:28HdkR: Peste_Bubonica: Nah, only perf at that time.
22:28Peste_Bubonica: HdkR, I have a 5700xt and I really want to test it :)
22:29HdkR: Venemo: Anonymity is very important
22:29Venemo: I see
22:29Lyude: i mean yes, because for a lot of trans folks it may not be possible to legally change their names for various reasons, and such a policy would require using your legal name (even though it's a 'trusted' entity, which could really be anything). and yeah-there's plenty of other reasons to be anonymous for code contributions
22:32ccr: I wonder how .. well, whomever .. think they are going to enforce a policy like that on projects / distros.
22:32ccr: probably possible for some projects, at least those that are pretty much owned by some organization, I suppose
22:35Lyude: yeah-I have no idea :S, it's kind of astonishing to me that they don't really seem to consider the fact that google doesn't maintain all of open source
22:40jekstrand: pendingchaos, cwabbott: What would you think about a CAN_REORDER_IN_BLOCK flag for intrinsics? I'm thinking of ways to allow CSE for subgroup ops but while keeping it block-local.
22:41cwabbott: jekstrand: a long time ago i had a proposal for a complete set of flags for intrinsics, like "convergent" in llvm but a bit more detailed
22:41jekstrand: cwabbott: Yeah... I remember that vaguely
22:41cwabbott: I think we should probably have something like that... even if we just use it to CSE it per-block
22:42cwabbott: we should be a bit more rigorous about the semantics of subgroup stuff I think
22:43jekstrand: Right now, most of them are CAN_ELIMINATE with no re-ordering
22:43jekstrand: But we really want to be able to CSE them, I think.
22:43Peste_Bubonica: HdkR, I'm thinking if I must wait for a stable release of mesa 21.0 or try 21.0-rc4 LOL
22:44cwabbott: also, there's theoretically stuff like jump-threading or flattening nested loops that isn't legal but we have no way of specifying that atm
22:45cwabbott: like, coherent is actually stricter than just !CAN_REORDER, at least theoretically
22:46jekstrand: Yeah. I remember something about that but not really the details. :-/
22:53alyssa: is "write to a UBO [via a compute shader or whatever] and read from it later in the same frame" something apps do in practice?
22:53bnieuwenhuizen: if you mean ssbo yes
22:53alyssa: I mean UBO
22:54bnieuwenhuizen: well, you can't write to UBOs in a compute shader, you can at most write to it as SSBO and then read as UBO
22:54alyssa: that then
22:54bnieuwenhuizen: for buffers it happens, not sure about SSBO vs. UBO on read path
22:54alyssa: for promoting UBO reads to push constants, in that corner case we need to flush (or do something clever like insert a compute shader to do the copy etc)
22:55alyssa: don't have enough experience with gl3+ workloads to know if this is a real thing that apps would try to do
22:55HdkR: ...It does happen but it needs a fence of some sort
22:55bnieuwenhuizen: glMemoryBarrier or what the name was?
22:55HdkR: I think that's the one
22:55bnieuwenhuizen: or you can even do transform feedback ;)
23:11Venemo: jekstrand, cwabbott Did we have an agreement on how to proceed with the story of having 2 different float modes in 1 shader?
23:12jekstrand: Venemo: I don't think so. I got grumpy about all of them and didn't have a better plan. :-)
23:13Peste_Bubonica: HdkR, i'm building mesa-git here. Do we need to enable BAR with an environment variable, or will it be enabled automatically?
23:18pendingchaos: jekstrand: I think we should specify this stuff better
23:18pendingchaos: "CAN_REORDER_IN_BLOCK" sounds like movement across discards is valid and movement to blocks with the same set of invocations is invalid though
23:19pendingchaos: there's also some intrinsics where it's valid to add invocations, just not remove invocations
23:19pendingchaos: like shuffle
23:19jekstrand: pendingchaos: Discards are hard. 'cause they're really control-flow.
23:20jekstrand: we just don't model them that way
23:22pendingchaos: for load_ubo/load_ssbo/etc with dynamic binding array indexing, we can remove invocations and only safely add invocations if there's another access with the same index and an intersecting set of invocations
23:22pendingchaos: so CSE is valid but most GVN is not
23:23exit70[m]: hi, is it normal for “glxinfo -i” to give an x error?
23:24airlied: exit70[m]: yes
23:24Venemo: jekstrand: do you still feel grumpy about it?
23:24airlied: indirect rendering is no longer available by default
23:24exit70[m]: i see thanks
23:26jekstrand: pendingchaos: I've thought about having a can_reorder() helper which takes two instructions.
23:26jekstrand: pendingchaos: The semantics, I think, would be something like can_reorder(a, b) == "can I move a to below b?"
23:31pendingchaos: might also be useful to have a "can I move a to above b" or "can I move a to this cursor"
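To make the two questions above concrete, here is a toy Python model of "can I move a to below b?" and "can I move a to above b?" for adjacent instructions. This is not NIR's API. The `Instr` class, the single `reorderable` flag (loosely standing in for CAN_REORDER), and both function bodies are assumptions chosen for illustration, and the model ignores the invocation-set concerns raised elsewhere in this discussion.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str] = None       # value defined by this instruction
    srcs: Tuple[str, ...] = ()       # values this instruction reads
    reorderable: bool = False        # free of ordering constraints


def can_reorder(a, b):
    """Can a (currently just before b) be moved to below b?"""
    if a.dest is not None and a.dest in b.srcs:
        return False  # b consumes a's result
    # Conservative rule: at least one of the two must be order-insensitive.
    return a.reorderable or b.reorderable


def can_move_above(a, b):
    """Can a (currently just after b) be moved to above b?"""
    if b.dest is not None and b.dest in a.srcs:
        return False  # a consumes b's result
    return a.reorderable or b.reorderable


if __name__ == "__main__":
    ballot = Instr("ballot", dest="v0")  # order-sensitive subgroup op
    add = Instr("iadd", dest="v1", srcs=("x", "y"), reorderable=True)
    use = Instr("iadd", dest="v2", srcs=("v0", "x"), reorderable=True)
    store = Instr("store_ssbo", srcs=("v1",))  # order-sensitive
    print(can_reorder(ballot, add))     # unrelated ALU op
    print(can_reorder(ballot, use))     # consumer of the ballot result
    print(can_reorder(ballot, store))   # two order-sensitive instructions
    print(can_move_above(use, ballot))
```

A two-argument helper like this keeps the decision in one place, so finer-grained rules (per-flag, or in terms of adding and removing invocations) could replace the single boolean without changing the callers.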
23:34pendingchaos: I made some local patches a while ago adding more detailed flags than CAN_REORDER for subgroup operations: https://pastebin.com/raw/rQ6B82Eu
23:35jekstrand: pendingchaos: I'm not sure if that's the right set of flags or not. :-/
23:35jekstrand: pendingchaos: But I do expect we probably want to specify it in terms of adding/removing invocations instead of "cannot leave the block"
23:35jekstrand: With that, plus some divergent CF analysis, we can potentially move stuff inside locally uniform control-flow.
23:37jekstrand: We may also want a quad version for "can I add or remove whole quads?" for dealing with texturing and discard
23:56airlied: llvm just got a set rounding intrinsic :-)