00:00anholt:is skeptical -- those haven't looked like big deals according to bloaty
00:00jenatali: anholt: I managed to strip 600KB by removing nir_print_shader
00:01jenatali: Well, I wasn't building a GL driver, so I was able to fully remove the format tables
00:01anholt: nir_print_shader was pulling those in?
00:01jenatali: (Specifically looking at taking the nir -> dxil bits we've built and making them standalone, how small could we make it)
00:02jenatali: nir_print_shader requires them, yeah
00:02jenatali: Though I had to actually remove all the code because there's recursive definitions in there that the compiler wasn't smart enough to see had no external references :D
00:02jenatali: Er, recursive references
00:03anholt:fixed a ton of that stuff in the last month or so
00:03jenatali: anholt: Ah... our nir -> dxil stuff still isn't upstreamed, so we probably don't have that
00:03jenatali: We haven't rebased in... a while
00:04anholt: yeah, you should just be pulling in the format descriptor tables, not the pack/unpack codegen, now
00:04jenatali: yet one more reason we really need to get upstream :P
00:04anholt: those descriptor tables could get shrunk too, probably, but there are very touchy bits in there with llvmpipe
00:04anholt: and endianness
00:05jenatali: Makes sense
00:08jenatali: anholt: After that, looks like the next things are adding constexpr to glsl_type constructors/destructors to move the built-in types to data segments instead of code, and then yeah the SPIR-V opcode/capability strings are still not super tiny
00:09jenatali: But nothing major, both of those would be relatively minor
00:12anholt: jenatali: hmm. gcc, at least, is awful at optimizing switch statements to tables, so some work on the spirv_info codegen to make tables could be a win for everyone
00:13jenatali: MSVC doesn't appear much better
00:13anholt: though spirv's enums mostly look sparse so that could be more painful
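A minimal sketch of a table-based lookup that still copes with sparse enums. It assumes a sorted `{value, name}` array searched with `bsearch()` instead of one `case` per enumerant. The struct and function names are hypothetical and are not the generated `spirv_info.c` API. The few `SpvCapability` values shown should be checked against `spirv.h`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct enum_name {
   unsigned value;
   const char *name;
};

/* Must stay sorted by value for bsearch(). Dense runs and sparse
 * KHR/vendor ranges cost the same: one entry per enumerant.
 * Values believed to match spirv.h; verify before relying on them. */
static const struct enum_name capability_names[] = {
   { 0, "Matrix" },
   { 1, "Shader" },
   { 2, "Geometry" },
   { 6, "Kernel" },
   { 4423, "SubgroupBallotKHR" },
   { 4427, "DrawParameters" },
};

#define NUM_CAPABILITY_NAMES \
   (sizeof(capability_names) / sizeof(capability_names[0]))

/* bsearch() silently returns garbage on an unsorted table. */
static int capability_table_is_sorted(void)
{
   for (size_t i = 1; i < NUM_CAPABILITY_NAMES; i++)
      if (capability_names[i - 1].value >= capability_names[i].value)
         return 0;
   return 1;
}

static int cmp_enum_name(const void *key, const void *elem)
{
   unsigned k = *(const unsigned *)key;
   unsigned v = ((const struct enum_name *)elem)->value;
   return (k > v) - (k < v);
}

static const char *capability_to_string(unsigned value)
{
   assert(capability_table_is_sorted()); /* debug builds only */
   const struct enum_name *e =
      bsearch(&value, capability_names, NUM_CAPABILITY_NAMES,
              sizeof(capability_names[0]), cmp_enum_name);
   return e ? e->name : "unknown";
}
```

A dense array indexed by value would avoid the search. However, it would waste thousands of empty slots across SPIR-V's sparse ranges. A sorted table pays for exactly one entry per enumerant.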
00:14jekstrand: anholt: Yeah... nir_print_shader uses PIPE_FORMAT tables
00:15jekstrand: jenatali: But, also, look at the NIR size then look at the LLVM size and ask yourself which one you really want to throw away. :)
00:15jenatali: jekstrand: I'm talking SPIR-V -> DXIL only, no LLVM at all
00:16jenatali: Taking our CL SPIR-V -> DXIL path and stripping out the LLVM and SPIRV-Tools bits, it's ~2.5MB right now
00:16jenatali: I'm interested to see if I can make it smaller
00:17jekstrand: jenatali: Is that a debug build or release?
00:17jenatali: jekstrand: Release
00:18jekstrand: jenatali: Generally, I'm not a fan of adding a bunch of #defines just to shrink build sizes unless there's a really good reason for it.
00:18jenatali: Fair enough :)
00:18jekstrand: jenatali: I'm going to take a wild stab in the dark and guess that a good chunk of that 2.5M is nir_opt_algebraic. :)
00:18jenatali: jekstrand: Constant folding, actually
00:18jekstrand: I had patches to shave like 200K off it at one point
00:18jekstrand: Oh, yeah, constant folding will do it too
00:19jekstrand: You kind-of want that, though. :P
00:19jenatali: Yep! :)
00:20jenatali: Yeah, not looking to go crazy, just looking to see if there's anything we don't really need for a stripped-down scenario
00:20jekstrand: I don't have a plan to shave down constant folding. Not without dropping opcodes or bit-sizes.
00:20jekstrand: And I don't see either of those as being a particularly good idea.
00:21anholt: constant expressions could probably get some reasonable wins from making helper functions instead of inlining so much code.
00:21jekstrand: Besides, it's OpenCL's fault that we have to constant-fold 16-wide instead of just 4-wide. :P
00:21jenatali: Oh, Meson's default was optimize-for-speed, I bet changing to size would help quite a bit
00:21jekstrand: anholt: Yeah. There's a bunch of cases where we could probably just loop or something.
00:22anholt: and I bet you'd get some huge wins for the per-channel ops by having an outer loop over the channels handing just the scalars in for folding by an op handler.
00:22jekstrand: Actually... We do usually loop
00:22anholt: oh, and, as always, burn the switch statement to the ground
00:22jekstrand: Yeah, we could get rid of the switch statement. That would probably help
00:23jenatali: FWIW I was just playing around today, trying harder is going to come later
00:23jekstrand: I think they all have the same prototype so we could just make a table.
00:24jekstrand: anholt: As far as per-channel goes, yeah, we might be able to have the function pointer just switch on bit size and do the op and then have the loop outside. That sounds like it might make things more complicated, though.
00:24jekstrand: That code is already pretty hard to read
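Here is a rough sketch of the design jekstrand describes. The scalar handlers share one prototype and switch only on bit size. A function-pointer table replaces the big opcode `switch`, and the channel loop moves outside the handlers. Every name here (`const_value`, `fold_op`, `eval_const_op`) is a hypothetical stand-in. The real generated constant-folding code is far larger and differs in detail:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: real NIR has nir_const_value and hundreds of
 * opcodes, with the folding code generated at build time. */
typedef union {
   uint32_t u32;
   uint64_t u64;
} const_value;

enum fold_op { FOLD_IADD, FOLD_INEG, FOLD_IMUL, FOLD_NUM_OPS };

/* Shared prototype: scalar sources in, scalar result out. Each handler
 * only switches on bit size (32/64 here to keep the sketch short).
 * Unsigned arithmetic gives two's-complement wraparound without UB. */
typedef const_value (*scalar_fold_fn)(const_value a, const_value b,
                                      unsigned bit_size);

static const_value fold_iadd(const_value a, const_value b, unsigned bit_size)
{
   const_value r = {0};
   if (bit_size == 32)
      r.u32 = a.u32 + b.u32;
   else
      r.u64 = a.u64 + b.u64;
   return r;
}

static const_value fold_ineg(const_value a, const_value b, unsigned bit_size)
{
   const_value r = {0};
   (void)b; /* unary op: second source unused */
   if (bit_size == 32)
      r.u32 = 0u - a.u32;
   else
      r.u64 = 0u - a.u64;
   return r;
}

static const_value fold_imul(const_value a, const_value b, unsigned bit_size)
{
   const_value r = {0};
   if (bit_size == 32)
      r.u32 = a.u32 * b.u32;
   else
      r.u64 = a.u64 * b.u64;
   return r;
}

/* The table replaces the per-opcode switch. */
static const scalar_fold_fn fold_table[FOLD_NUM_OPS] = {
   [FOLD_IADD] = fold_iadd,
   [FOLD_INEG] = fold_ineg,
   [FOLD_IMUL] = fold_imul,
};

/* The channel loop lives outside the handlers, so per-opcode code only
 * ever sees scalars. Pass src1 == NULL for unary ops. */
static void eval_const_op(enum fold_op op, unsigned num_components,
                          unsigned bit_size, const const_value *src0,
                          const const_value *src1, const_value *dst)
{
   assert(op < FOLD_NUM_OPS);
   assert(bit_size == 32 || bit_size == 64);
   const const_value zero = {0};
   scalar_fold_fn fold = fold_table[op];
   for (unsigned c = 0; c < num_components; c++)
      dst[c] = fold(src0[c], src1 ? src1[c] : zero, bit_size);
}
```

Whether this actually shrinks the binary depends on how much of the current size comes from the switch itself versus the inlined per-opcode bodies. Measuring with bloaty before and after would settle it.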
00:29anholt: yeah, actually, I don't think there's anything to my per-channel idea
00:31anholt: however, nir_eval_const_opcode is 22% of the whole file in release mode, so that should be some easy win.
10:46karolherbst: why is it 2020 and some desktops still run like crap on wayland? .... :/
10:47bnieuwenhuizen: karolherbst: haven't you heard? 2020 is a pretty bad year?
10:47karolherbst: I at least thought that people stuck at home would help with fixing bugs.. :D
10:47karolherbst: but honestly...
10:48karolherbst: sometimes you just hit 10 bugs a day
10:48karolherbst: it's super annoying
10:48pcercuei: instead, they wrote new features, bringing new bugs
10:48pcercuei: a coder's life
10:48karolherbst: honestly.. wayland is probably the main reason to stop using plasma
10:48karolherbst: it's just broken
10:49karolherbst: I am sure the main reason is having two displays with different DPI scaling settings
10:49karolherbst: but... that's also one of the main reasons to use wayland
10:50bnieuwenhuizen: huh, messing with DPI scaling between two different monitors is also why I'm not using Wayland yet (in particular xwayland issues with sway)
10:50pq: and it still fails miserably with X11 apps
10:50karolherbst: well, I have a 4K and a FHD display
10:51karolherbst: pq: for fun reasons, it's fine for X11 apps here
10:51bnieuwenhuizen: well, with X11 I have something that works well enough for me ...
10:51karolherbst: bnieuwenhuizen: you mean, not at all as it's just not supported :p
10:51pq: karolherbst, in that case, I think your X11 apps are broken (doing the DPI scaling themselves) ;-)
10:52karolherbst: I mean.. it kind of works, it's just the compositor being broken
10:52karolherbst: now I have the issue that the cursor gets doubled in size when I hover over wayland applications
10:52karolherbst: and it reverts to normal size over X11 apps
10:52karolherbst: and I think it was caused by switching to a tty for a moment
10:53karolherbst: and X11 apps just stay at FHD on the 4K display
10:53karolherbst: so that is working just fine
10:53karolherbst: I do think that kwin has a better concept of this as the physical size of applications stays the same on both displays
10:54karolherbst: and they just get redrawn at higher resolution at some point
10:54karolherbst: not this half half issue you see with gnome I think
10:54pq: that's how it's supposed to work, as long as you don't use fractional scaling
10:54karolherbst: where an application on both displays is just broken
10:54pq: anyway, there is a fundamental incompatibility: a Wayland compositor is supposed to scale a window for an output if the app didn't already do that. But X11 and therefore Xwayland has no way of knowing what the app did.
10:54karolherbst: so they just stay FHD
10:55karolherbst: which is totally fine
10:55pq: yes, assuming that all Xwayland content is always scale=1 is one workaround, but then X11 apps need to stop scaling themselves.
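A tiny sketch of the arithmetic behind pq's point. It assumes integer scales as advertised by `wl_output.scale` and set by clients with `wl_surface.set_buffer_scale`. The compositor resamples by output scale divided by buffer scale. For Xwayland windows, it can only guess the buffer scale. The function name is hypothetical:

```c
#include <assert.h>

/* Factor by which a compositor resamples a client buffer so it covers
 * the intended physical area on an output:
 *   1.0 -> no resampling (sharp)
 *   >1  -> upscaling (blurry or blocky)
 *   <1  -> downscaling */
static double compositor_resample_factor(int output_scale, int buffer_scale)
{
   assert(output_scale > 0 && buffer_scale > 0);
   return (double)output_scale / (double)buffer_scale;
}
```

A factor above 1.0 means upscaling, and that is where the blurriness comes from. A client that renders at the output scale gets a factor of 1.0 and stays sharp.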
10:55pq: but it does quite sound that kwin is confused with its own scaling
10:56karolherbst: I mean.. usually it works quite nicely, but there are those tons of random bugs happening
10:56bnieuwenhuizen: but if the compositor scales it gets all blurry while if the app does it they can render e.g. text at higher res?
10:56karolherbst: gwenview is also just broken
10:56pq: bnieuwenhuizen, yes.
10:56pq: blurry or blocky, but at least you can read it
10:57karolherbst: the scaled apps are a bit blurry on the 4K display
10:57karolherbst: but you can also keep them on the FHD one
10:57karolherbst: wayland native apps are scaled correctly
10:57karolherbst: so you move them over.. they are scaled up for a moment until they resize to the native resolution
10:58bnieuwenhuizen: well, I got all the apps here configured to scale and I like the non-blurriness
10:58karolherbst: bnieuwenhuizen: question is, do you have a FHD and a 4K display?
10:58bnieuwenhuizen: yes it will show up kinda big on the FHD one, but I like that better than the blur
10:58karolherbst: mhh, I had big trouble getting that even to work as either everything is super big or super small on one display
10:58karolherbst: yeah... I hate that
10:58karolherbst: so "not working at all" :p
10:59bnieuwenhuizen: my FHD display just gets IRC or a few terminals anyway
10:59karolherbst: I like to be able to move things around
10:59bnieuwenhuizen: karolherbst: broken in a way that happens to work better for me compared to wayland :)
10:59karolherbst: biggest pain point is still chromium btw, which is X11 only unless you enable that wayland stuff somehow
10:59karolherbst: and I am too lazy to build chromium myself :p
10:59karolherbst: bnieuwenhuizen: right.... what desktop are you using btw?
11:00bnieuwenhuizen: with no DE but mostly KDE apps + firefox
11:00karolherbst: ahh yeah..
11:00pq: bnieuwenhuizen, in that case you configure your FHD output as scale=2. If it's scale=1 and stuff is "big", it's not working as intended.
11:00karolherbst: I guess all those niche window managers will just never work perfectly with wayland
11:00bnieuwenhuizen: pq: that was the plain X11 case
11:00karolherbst: and atm I'd say only gdm even has the resources to be a "perfect" one
11:01karolherbst: as atm every compositor is broken
11:01bnieuwenhuizen: karolherbst: for wayland I tried sway especially because it seems the most supported tiling compositor
11:01pq: bnieuwenhuizen, oh, I thought you were using Plasma too, saying it works for you.
11:01karolherbst: bnieuwenhuizen: but yeah.. with X you can't get the scaling right
11:01karolherbst: have to use wayland for that
11:02karolherbst: trying to do that in X is just working around how broken X is
11:02bnieuwenhuizen: pq: for me problem on wayland (with sway) is blurry X11 apps on 4k (especially browser), problem on plain X11 is that my apps scale on the FHD monitor too (and text etc. becomes too big)
11:02karolherbst: just use wayland apps :p
11:03pq: bnieuwenhuizen, right. The blurry X apps thing on a Wayland compositor is almost unfixable AFAIK, as I explained above.
11:03karolherbst: wondering if gamescope could help with that
11:03karolherbst: Plagman: could gamescope run all applications at the highest display's res and scale down if moved around? :D
11:03karolherbst: so X apps just run at 4K all the time...
11:03karolherbst: but I guess then things are just small
11:04karolherbst: X11 is just broken
11:04karolherbst: and a waste of time to make it work
11:04pq: karolherbst, you could probably tweak a normal desktop Wayland compositor to do the same: assume all X windows are scale=2, and do whatever X magic is needed to have them draw in 2x.
11:05karolherbst: sounds horrible though
11:05karolherbst: but I had this running when I only used the 4K display
11:05karolherbst: and it mostly worked
11:05bnieuwenhuizen: if that gives me a good wayland experience for an "unfixable" problem I'll take horrible :P
11:05karolherbst: bnieuwenhuizen: it's horrible because desktops are just broken
11:05karolherbst: which was my initial point
11:06karolherbst: I know that this works quite nicely with gnome
11:06karolherbst: but mutter does this annoying scaling when you move windows around :/
11:08karolherbst: I honestly don't know what the problem is
11:08karolherbst: not enough bugs reported?
11:08karolherbst: not enough people working on it?
11:08karolherbst: people just use X11, as wayland is broken (even the devs)?
11:08karolherbst: no clue
11:09karolherbst: but it doesn't feel like there was sufficient progress so one can say that wayland support improved generally
11:09karolherbst: can't say wayland runs better with plasma compared to one year ago
11:09pq: people not using a mixed-DPI monitor setup, because with X11 you can't make it work? :-)
11:10karolherbst: I guess...
11:10pq: or people do use mixed-DPI monitors, but being accustomed to X11, do not want it configured as such
11:11karolherbst: but that kind of comes back to the question on how to deal with X11 being.. EOL pretty much
11:11karolherbst: it's not official, but factually...
11:11pq: I have to say I've been personally avoiding HiDPI monitors on purpose.
11:12bnieuwenhuizen: I can't run my games GPU limited on 1080p anymore so I had to get something higher res to do GPU perf testing :(
11:12pq: why does GPU perf testing have anything to do with a monitor?
11:13pq: it should run off-screen
11:14bnieuwenhuizen: pq: getting ingame offscreen is kinda hard ...
11:15bnieuwenhuizen: maybe for a few games where you can define an automated test profile
11:15pq: oh, right, games are not benchmarks :-/
11:15bnieuwenhuizen: (+ xrandr scaling had other problems with input clipping etc.)
11:25tomeu: bnieuwenhuizen: ooc, why not using traces for this? I guess you aren't looking at optimizing the game logic... :)
11:26danvet: pinchartl, can you maybe join the thread with narmstrong?
11:26bnieuwenhuizen: tomeu: because tracing for Vulkan has been horrible for the longest time
11:27bnieuwenhuizen: also I'm still not sure if we have replay tools that replay fast enough to actually fully load the GPU when replaying a typical game trace?
11:28tomeu: hmm, looks pretty bad, but something we should fix
11:28tomeu: I'm doing opengl first though
11:40linkmauve: bnieuwenhuizen, can’t you use a modeline bigger than your screen instead?
11:41bnieuwenhuizen: honestly not excluding it but I have no clue about that stuff
11:41linkmauve: On Intel at least, I can tell Weston to use a 4K framebuffer and configure the primary plane to scale it down to my actual screen size.
11:42linkmauve: That’s how I actually test HiDPI stuff when I feel like it. :)
11:42bnieuwenhuizen: linkmauve: I guess that is pretty close to what I said with xrandr, but input wasn't quite right there
11:42bnieuwenhuizen: maybe it would be better with wayland
11:42linkmauve: GPU-wise it should be exactly the same usage as normal 4K, it’s only scanout which will differ.
11:43pq: didn't Xorg have panning too? :-P
12:25pinchartl: danvet: sorry, which thread ?
12:27danvet: [PATCH 6/6] drm/meson: add support for MIPI-DSI transceiver
12:27danvet: pinchartl, ^^
12:27danvet: tldr; not very happy about the prospect of reworking dw-* drivers
12:29pinchartl: why are you pulling me in threads I've carefully managed to ignore ? :-)
12:29danvet: narmstrong asked for you
12:30danvet: well maybe overruling me by pulling bridge people in :-)
12:42narmstrong: Personally idc reworking these drivers!
12:58vsyrjala: hmm. wonder how i should apply the dp dfp series...
12:58vsyrjala: airlied: danvet: ok if i just smash all of https://patchwork.freedesktop.org/series/72928/ into drm-intel?
12:59vsyrjala: it has a bunch of new and/or improved dp helpers for the core
12:59danvet: I guess should still all make it into 5.10, so sounds reasonable
12:59danvet: a-b: me
13:00vsyrjala: cool. thanks
13:00danvet: maybe ping vivijim to make sure it makes it into the last 5.10 pull request
13:13danvet: bnieuwenhuizen, now that you have it typed up, ack on https://firstname.lastname@example.org/ ?
13:58warpme_: guys: i'm testing mesa 20.2-rc4 with drm_prime using EGL_LINUX_DMA_BUF_EXT (video playback with v4l2_m2m). It looks like 20.2-rc4 works ok on intel/amd/mali450/t720/t820/t860/g31/v3d - but on vc4 (bcm2835 rpi3) it makes my app (mythtv) segfault. Is this a known issue?
14:05jekstrand: karolherbst: So... I wrote a block-merging pass and, now that I've got the bugs worked out of that, no structurizer fails. :-(
15:11txenoo: warpme_, can you file an issue at https://gitlab.freedesktop.org/mesa/mesa/-/issues? I will have a look at it. Did it start happening with 20.2-rc4?
15:18warpme_: txenoo: OK. thx. before filing a bug report I just want to be sure it is reasonable to do so - that's why i'm here. re: when the regression started manifesting: it's difficult to say exactly when it started, but i tested rc1...rc4 and it happens with all of them. Also i tested git from 17/07 and it also crashes. Any version from 20.1.x seems to work ok (tested 20.1.4 and 20.1.8). What details do you need in the bug report?
15:23warpme_: btw: the issue seems to happen only when i'm using v4l2_m2m video playback, which exports decoded frames using EGL_LINUX_DMA_BUF_EXT in mesa. Playback with sw. decode rendering to OpenGL surfaces works ok. This may explain why there aren't many reports about this regression (as other video players using v4l2_m2m decode render to DRM planes - not to EGL). So i think it is worth looking first at changes in the EGL
15:23warpme_: buf. extensions area...
15:24txenoo: warpme_, here are the guidelines, https://docs.mesa3d.org/bugs.html, but I think you have useful information. The steps to reproduce the bug and the versions you are using would be enough. You can use the #videocore channel also if you prefer.
15:25warpme_: txenoo: ok. thx. I'll switch to #videocore as this seems to be more appropriate I think...
15:26Prf_Jakob: keithp: Any progress on VK_GOOGLE_display_timing?
15:28hikiko: bnieuwenhuizen, can I ask you now?
15:29hikiko: so, I've been looking at the code that imports the memory object and realized that the reason that the buffers tests don't pass wasn't the assertion but that we only implement the callback that imports the memory objects for textures and not buffers
15:30hikiko: and I was thinking that I'd rather create a callback import resource from memobj instead of texture put the common code and then have separate functions for textures, buffers
15:30hikiko: (I was looking why this check for the surface sanity is called for buffers)
15:31bnieuwenhuizen: yeah I think the buffer import is missing, it should not be hard to add
15:31hikiko: no, but do you agree with that approach?
15:31bnieuwenhuizen: basically we can make the same split as is done during normal resource creation
15:31bnieuwenhuizen: we have the common gallium callback and the first thing it does is split between buffer and textures
15:32bnieuwenhuizen: assuming you meant something like that that should be ok
15:32hikiko: so I should do the same for imported objects
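A sketch of the split agreed on here, loosely modeled on Gallium's `resource_from_memobj` screen hook. One common entry point does the shared work and then dispatches on buffer versus texture, the same way normal resource creation does. The types and helpers are simplified, hypothetical stand-ins. The texture size check assumes a linear layout with 4 bytes per texel, purely for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for the resource template, the
 * memory object, and the resulting resource. */
enum res_target { TARGET_BUFFER, TARGET_TEXTURE_2D };

struct res_template { enum res_target target; uint32_t width, height; };
struct memobj { uint64_t size; };
struct resource { enum res_target target; uint64_t offset, size; };

static struct resource *wrap_memory(enum res_target target, uint64_t offset,
                                    uint64_t size)
{
   struct resource *res = malloc(sizeof(*res));
   if (res) {
      res->target = target;
      res->offset = offset;
      res->size = size;
   }
   return res;
}

/* Buffers are a plain byte range, so only the bounds need checking. */
static struct resource *buffer_from_memobj(const struct res_template *t,
                                           const struct memobj *m,
                                           uint64_t offset)
{
   uint64_t size = t->width;
   if (offset > m->size || size > m->size - offset)
      return NULL;
   return wrap_memory(TARGET_BUFFER, offset, size);
}

/* Textures: surface/layout sanity checks belong here only, so buffers
 * never hit them. Assumes linear, 4 bytes per texel (illustrative). */
static struct resource *texture_from_memobj(const struct res_template *t,
                                            const struct memobj *m,
                                            uint64_t offset)
{
   uint64_t size = (uint64_t)t->width * t->height * 4;
   if (t->width == 0 || t->height == 0)
      return NULL;
   if (offset > m->size || size > m->size - offset)
      return NULL;
   return wrap_memory(t->target, offset, size);
}

/* Common entry point: shared work first, then the same buffer/texture
 * split that normal resource creation makes. */
static struct resource *resource_from_memobj(const struct res_template *t,
                                             const struct memobj *m,
                                             uint64_t offset)
{
   assert(t && m);
   if (t->target == TARGET_BUFFER)
      return buffer_from_memobj(t, m, offset);
   return texture_from_memobj(t, m, offset);
}
```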
15:32hikiko: that's what I meant :) thank you very much!
15:32tpalli: hikiko for the depth/stencil problem .. I remember there were some recent fixes for ref/unref of memobjs, I just wanted to mention that maybe the u_transfer_helper code I sent earlier would now work properly, I can test this later and will report what color smoke comes out
15:32hikiko: thanks a lot tpalli :)
15:33tpalli: np, hopefully we get that one solved
16:28mripard: cgit.freedesktop.org seems to be down?
16:28Namarrgon: bnieuwenhuizen: would it be a good idea to file a separate bugreport/feature-request for the missing depth-buffer imports on polaris?
17:14jekstrand: itoral: Where can I find the rpi Vulkan branch?
17:15bnieuwenhuizen: Namarrgon: for X-plane? I'm already in the bug for that and I have a MR ready
17:17itoral: jekstrand: https://gitlab.freedesktop.org/apinheiro/mesa/-/tree/wip/igalia/v3dv
17:18jekstrand: itoral: Thanks!
17:19jekstrand: Wanted to make sure you set lower_ubo_ssbo_access_to_offsets to false. You do. Otherwise, I might have made a small stink. :)
17:19jekstrand:doesn't want to land any new drivers which use the legacy path.
17:21karolherbst:has no idea if nouveau uses any legacy paths in nir
17:23jekstrand: karolherbst: Not that legacy path
17:23karolherbst: at least something
17:23bnieuwenhuizen: danvet: I agree on the principle but given the limitations I'm curious if someone else on the AMD side objects to my impl
17:24itoral: jekstrand: good to know, thanks for checking :)
17:25kisak: mripard: cgit.fd.o is not accessible here as well
17:25danvet: bnieuwenhuizen, so think better to hold off for a bit more?
17:26Namarrgon: bnieuwenhuizen: ah, okay
17:27bnieuwenhuizen: danvet: on second thought I can make rendering work with some fallbacks if people scream so seems to be ok
18:31Venemo: jekstrand: your RT talk managed to get a complete noob like myself to understand the basics, which is great. thanks for doing it!
18:32jekstrand: That was kind-of the point.
18:32jekstrand: Help bring people up to speed. Then maybe someone will review my SPIR-V patches. :P
18:33ajax: jenatali: not sure if i missed it in steve's talk, but: is the Xwayland y'all end up running just using software (and therefore llvmpipe for glx)? and if so, have you thought about how to address that?
18:34bnieuwenhuizen: jekstrand: I have most of them, will finish tonight
18:34jekstrand: bnieuwenhuizen: Oh, sweet!
18:34jekstrand: bnieuwenhuizen: I wouldn't be at all opposed to merging the SPIR-V bits upstream.
18:34jenatali: ajax: Yeah, so far our plans for GPU acceleration in WSL and GUI integration are independent
18:34bnieuwenhuizen: jekstrand: not the entire MR?
18:34jekstrand: bnieuwenhuizen: There's no user for them so I'm also happy to have them sit in a branch for a while yet.
18:34jenatali: ajax: Our plan for merging them together is to get the GLOn12 driver we're building in Mesa running in WSL
18:35jekstrand: bnieuwenhuizen: The entire MR is just SPIR-V bits (and a couple new nir_variable_modes)
18:35jenatali: ajax: With some hand waving for how all of that is actually going to integrate because we haven't looked too closely at that yet :)
18:35bnieuwenhuizen: jekstrand: was thinking might be easier than keeping to rebase over all the CL changes :)
18:35jekstrand: bnieuwenhuizen: Yeah....
18:35Venemo: jekstrand, do you have anything about mesh shaders?
18:36jekstrand: bnieuwenhuizen: Now that we're done adding new nir_variable_modes for CL, the rebasing isn't too bad
18:36jekstrand: Venemo: No, I don't. I've not spent much time with that API yet.
18:36bnieuwenhuizen: Venemo: his SPIR-V MR also adds mesh shader shadertypes :)
18:36jekstrand: Yeah, I did add those
18:36Venemo: I saw, that's why I asked
18:36jenatali: bnieuwenhuizen: I think we're just about done sending major destabilizing CL changes :)
18:36jekstrand: But mostly to keep the Vulkan and Mesa enums consistent.
18:36bnieuwenhuizen: I think mesh shaders don't need a lot of spir-v/nir stuff
18:37Venemo: maybe not
18:37bnieuwenhuizen: besides only having an NV extension at this point
18:37jekstrand: They probably need a handful of intrinsics and system-values but probably not nearly as much as ray-tracing.
18:37Venemo: yeah probably not
18:37Venemo: but... is there anything from KHR that we could already look at?
18:37Venemo: or is it just the NV ext
18:38bnieuwenhuizen: just the NV ext
18:38bnieuwenhuizen: could also look at the D3D12 equivalent to get some ideas of how a KHR might be changed :P
18:38jekstrand: Yeah, there's a D3D12 equivalent and I think I can reasonably say that all the desktop types are looking at it.
18:39jekstrand: But there's nothing from Khronos yet.
18:39jekstrand: jenatali: What's left besides memcpy and clc?
18:39jenatali: jekstrand: Conversions, and printf hasn't landed yet
18:39bnieuwenhuizen: and doing the D3D12 thing on top of NV is kinda annoying at the moment so I'm curious what we need for vkd3d
18:40jekstrand: jenatali: Right... conversions.
18:40Venemo: in fact if we had a mesh stage already, I'm pretty sure NGG GS would be as simple as writing a nir_gs_to_ms
18:40jekstrand: jenatali: Is there an MR out for that yet?
18:40bnieuwenhuizen: since D3D12 can do 3d dispatches from the task/amplification shader but the NV ext only does 1d
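One straightforward way to bridge this mismatch, assuming a layer is free to rewrite both the dispatching side and the mesh shader, is to flatten the 3D group counts. The dispatching side passes D3D12's 3D `DispatchMesh` counts as a single 1D count on the NV path. The mesh shader then rebuilds the 3D group ID from the linear index. This is not necessarily what vkd3d does, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

struct group_id { uint32_t x, y, z; };

/* Dispatching side: collapse a 3D group count into one 1D count. */
static uint32_t flatten_group_count(uint32_t x, uint32_t y, uint32_t z)
{
   uint64_t total = (uint64_t)x * y * z;
   assert(total <= UINT32_MAX); /* device limits are far tighter */
   return (uint32_t)total;
}

/* Mesh shader side: recover the 3D group ID from the linear index,
 * using x-major order (x varies fastest). */
static struct group_id unflatten_group_id(uint32_t linear, uint32_t x,
                                          uint32_t y)
{
   struct group_id id;
   id.x = linear % x;
   id.y = (linear / x) % y;
   id.z = linear / (x * y);
   return id;
}
```

The divides and modulos cost a few ALU ops per group. The flattened count must also fit the device's own 1D task-count limit.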
18:40jenatali: jekstrand: No, what we did downstream isn't what you'll want upstream, I need to rewrite it
18:40jekstrand: jenatali: Ok, sounds good.
18:40jenatali: jekstrand: I'm thinking it makes sense to get our whole stack upstream first, and just take the regression from not having conversions, and then reimplement it upstream directly
18:41jenatali: That way I don't have to rewrite it downstream and port the patches and deal with conflicts
18:41jekstrand: jenatali: If that's what you want to do, that's ok with me.
18:41bnieuwenhuizen: Venemo: the NV extension allows only implementing mesh shaders and not task shaders so likely implementable in navi10 without firmware changes
18:41bnieuwenhuizen: could try it right now if you want to
18:41Venemo: probably after I got NGG GS working like we discussed earlier
18:41jenatali: jekstrand: If we do that, then I think the only thing that really needs to land (to avoid rebase annoyances) is libclc
18:42jekstrand: jenatali: Cool. Still waiting for karolherbst on that one?
18:42karolherbst: I don't think so?
18:42jenatali: jekstrand: I think everyone's acked the concept, but I don't have acks/r-bs on specific patches
18:42karolherbst: but yeah.. I think I wanted to give it a proper review at least once
18:42karolherbst: just not sure if I get to it this or next week ...
18:43karolherbst: things come up which need some short term handling
18:43jenatali: karolherbst: That's alright, I completely understand
18:43jekstrand: karolherbst: Tomorrow's XDC talks look less interesting than the last two days. :)
18:43karolherbst: yeah lol.. I wished I would have refered to XDC :p
18:46jenatali: jekstrand: The other thing that's missing is me rewriting our underaligned load/store pass to use deref alignments instead of intrinsic alignments :P
18:46jenatali: I'll see if I can take care of that today, and then maybe next week we can actually stage the full D3D12+DXIL MR against upstream :O
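A minimal sketch of the splitting step such an underaligned load/store pass needs, assuming only a known power-of-two base alignment is available. Each chunk is the largest power of two that meets three conditions: it is aligned at its offset, it fits in the remaining bytes, and it does not exceed a maximum access size. The function is hypothetical, and jenatali's actual pass may work differently:

```c
#include <assert.h>

/* Split an access of `size` bytes whose base has power-of-two alignment
 * `align` into naturally aligned chunks of at most `max_chunk` bytes.
 * Chunk sizes go to `chunks`; returns how many were produced. */
static unsigned split_underaligned_access(unsigned size, unsigned align,
                                          unsigned max_chunk,
                                          unsigned *chunks,
                                          unsigned max_chunks)
{
   assert(align && (align & (align - 1)) == 0);
   assert(max_chunk && (max_chunk & (max_chunk - 1)) == 0);
   unsigned n = 0, offset = 0;
   while (offset < size) {
      /* Alignment known at this offset: the base alignment, capped by
       * the lowest set bit of the offset. */
      unsigned cur_align = align;
      if (offset) {
         unsigned low = offset & (0u - offset);
         if (low < cur_align)
            cur_align = low;
      }
      unsigned chunk = cur_align < max_chunk ? cur_align : max_chunk;
      while (chunk > size - offset)
         chunk >>= 1; /* shrink to fit the remaining bytes */
      assert(n < max_chunks);
      chunks[n++] = chunk;
      offset += chunk;
   }
   return n;
}
```

For example, a 7-byte access with 8-byte base alignment comes out as 4 + 2 + 1.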
18:46jekstrand: jenatali: Does it run before or after lower_io?
18:46jenatali: jekstrand: Before
18:47jekstrand: Yeah, I'd love to see a real MR :)
18:47Venemo: dschuermann, bnieuwenhuizen w.r.t ray tracing, I think we could solve some of the headache by merging some stages together, eg. callables into their shaders. then we could maintain a stack frame of sorts, in LDS
18:47jenatali: jekstrand: It was simpler to deal with just deref loads/stores rather than the 3 or 4 different types of intrinsics
18:47jekstrand: Venemo: You can't really merge callable; at least not easily.
18:47Venemo: why not, jekstrand?
18:48jekstrand: Venemo: They're based on indices in tables that are bound as part of vkCmdTraceRaysKHR. So if you wanted to inline it, you'd need to make an uber-shader with a switch based on an index that you load from the SBT buffer or something like that.
18:48Venemo: sounded like it is basically a subroutine
18:48bnieuwenhuizen: Venemo: they can be recursive
18:48jekstrand: It can be done
18:48jekstrand: Also, yes, they can be recursive. :)
18:48bnieuwenhuizen: and yes to do it inline you need to inline the entire pipeline library :(
18:49Venemo: that's why I said we can maintain a stack frame too
18:49bnieuwenhuizen: yeah, we'll likely need a calling convention
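Here is a CPU-side sketch of what a software calling convention for recursive callables could look like. Each invocation keeps an explicit stack (in scratch or LDS, as Venemo suggests) that holds saved live values and a resume point. A "return" then becomes popping a frame and jumping to its resume point. The stack size, resume-point enum, and `sum(n)` callable are all hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define STACK_DWORDS 64 /* hypothetical per-invocation stack budget */

struct lane_stack {
   uint32_t data[STACK_DWORDS];
   unsigned sp;
};

static void stack_push(struct lane_stack *s, uint32_t v)
{
   assert(s->sp < STACK_DWORDS); /* overflow = recursion too deep */
   s->data[s->sp++] = v;
}

static uint32_t stack_pop(struct lane_stack *s)
{
   assert(s->sp > 0);
   return s->data[--s->sp];
}

/* Hypothetical resume points ("return addresses") in the caller. */
enum { RESUME_DONE, RESUME_AFTER_RECURSIVE_CALL };

/* Recursive callable sum(n) = n + sum(n - 1), run without a hardware
 * call stack: each call saves its live value (n) and where to resume. */
static uint32_t run_recursive_sum(uint32_t n, struct lane_stack *s)
{
   stack_push(s, RESUME_DONE);
   /* Call phase: keep "calling" until the base case. */
   while (n > 0) {
      stack_push(s, n);
      stack_push(s, RESUME_AFTER_RECURSIVE_CALL);
      n--;
   }
   uint32_t result = 0; /* base case: sum(0) */
   /* Return phase: pop frames and run the rest of each caller. */
   for (;;) {
      uint32_t resume = stack_pop(s);
      if (resume == RESUME_DONE)
         return result;
      result += stack_pop(s); /* caller's code after the call */
   }
}
```

A real implementation would size the stack from the pipeline's `maxPipelineRayRecursionDepth` and per-shader frame sizes rather than a fixed constant.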
18:49Venemo: if we don't merge it, then how do we manage divergence?
18:50dschuermann: in theory we can implement function calls, not sure if that works in all cases, though
18:50Venemo: eg. say, in a Wave64 RT shader, some lanes call a callable, others don't
18:50bnieuwenhuizen: Venemo: the callable returns I think?
18:50bnieuwenhuizen: so you just handle it as an opaque block in an if statement?
18:51jekstrand: The best thing I've been able to come up with as a design for a SW-only implementation is to have a compute shader which basically acts like a scheduler. It'd have several work queues that store what shaders need to be executed. Then, at each iteration, it would try to find a queue with enough work to fill a wave, and execute that shader.
18:51jekstrand: It'd still have to be an uber-shader.
18:51Venemo: yes, of course, but if the callable is a different shader invocation, you have to also take care of propagating the exec mask
18:51jekstrand: executeCallable() would push stuff to the queues and then end the current shader.
18:51jekstrand: Same for traceRayKHR()
18:52bnieuwenhuizen: jekstrand: we can split by jumping to a different piece of code (i.e. we have a scalar call instruction), but register management is going to be interesting for efficiency
18:52jekstrand: And your scheduler CS would keep executing bits of shaders until you're all out of stuff to execute.
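A CPU-side simulation of the scheduler jekstrand outlines, under simplifying assumptions:

- there is one work queue per shader;
- a wave holds `WAVE_SIZE` items;
- the picker prefers any queue that can fill a wave;
- `executeCallable()` is modeled as enqueuing the callable, and the callable's completion enqueues the caller's continuation.

All names are hypothetical. Real GPU code would need atomics on memory-backed queues:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SHADERS 3
#define QUEUE_CAP 256
#define WAVE_SIZE 4

/* Hypothetical shader snippets for this sketch. */
enum { SHADER_RAYGEN, SHADER_CALLABLE, SHADER_CONTINUATION };

struct queue { int items[QUEUE_CAP]; size_t count; };

struct scheduler {
   struct queue queues[NUM_SHADERS];
   int executed[NUM_SHADERS]; /* items run per shader */
   int waves;                 /* waves launched in total */
};

static void push(struct scheduler *s, int shader, int item)
{
   struct queue *q = &s->queues[shader];
   assert(q->count < QUEUE_CAP);
   q->items[q->count++] = item;
}

/* Prefer the first queue that can fill a wave; otherwise the fullest
 * non-empty one. Returns -1 when all work is done. */
static int pick_queue(const struct scheduler *s)
{
   int best = -1;
   for (int i = 0; i < NUM_SHADERS; i++) {
      size_t c = s->queues[i].count;
      if (c >= WAVE_SIZE)
         return i;
      if (c > 0 && (best < 0 || c > s->queues[best].count))
         best = i;
   }
   return best;
}

static void run_item(struct scheduler *s, int shader, int item)
{
   s->executed[shader]++;
   switch (shader) {
   case SHADER_RAYGEN:
      /* executeCallable(): enqueue the callable, end this snippet. */
      push(s, SHADER_CALLABLE, item);
      break;
   case SHADER_CALLABLE:
      /* "Return" = enqueue the caller's continuation. */
      push(s, SHADER_CONTINUATION, item);
      break;
   default:
      break; /* continuation finishes the work for this item */
   }
}

static void run_scheduler(struct scheduler *s)
{
   int shader;
   while ((shader = pick_queue(s)) >= 0) {
      struct queue *q = &s->queues[shader];
      size_t n = q->count < WAVE_SIZE ? q->count : WAVE_SIZE;
      int wave[WAVE_SIZE];
      for (size_t i = 0; i < n; i++)
         wave[i] = q->items[--q->count];
      s->waves++;
      for (size_t i = 0; i < n; i++)
         run_item(s, shader, wave[i]);
   }
}
```

With 10 ray-gen items and a `WAVE_SIZE` of 4, this runs 9 waves, three per shader, which is the minimum possible here. Picking queues without regard to fill level can launch more partially filled waves.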
18:52jekstrand: bnieuwenhuizen: Yeah, we can too if it's a uniform jump.
18:53bnieuwenhuizen: and then we can just do the same thing as for VK_EXT_descriptor_indexing :)
18:53bnieuwenhuizen: divergence can be horribly inefficient though
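A sketch of the descriptor-indexing-style approach bnieuwenhuizen mentions, usually called a waterfall loop. While any lane is active, the loop takes the first active lane's callable index as the uniform value. It then runs that callable for every lane that shares the index and retires those lanes. The sketch simulates a 64-lane wave on the CPU. The iteration count equals the number of distinct indices, which is why divergence gets expensive. Names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define WAVE_LANES 64

/* Stand-in for "run callable `idx` for the lanes in `mask`". */
typedef void (*run_callable_fn)(uint32_t idx, uint64_t mask, void *user);

static unsigned waterfall_dispatch(uint64_t exec,
                                   const uint32_t idx[WAVE_LANES],
                                   run_callable_fn run, void *user)
{
   assert(idx && run);
   unsigned iterations = 0;
   while (exec) {
      /* First active lane, roughly what v_readfirstlane_b32 picks. */
      unsigned first = 0;
      while (!((exec >> first) & 1u))
         first++;
      uint32_t uniform_idx = idx[first];
      /* All active lanes that want the same callable. */
      uint64_t same = 0;
      for (unsigned lane = 0; lane < WAVE_LANES; lane++)
         if (((exec >> lane) & 1u) && idx[lane] == uniform_idx)
            same |= UINT64_C(1) << lane;
      run(uniform_idx, same, user);
      exec &= ~same; /* retire the lanes just served */
      iterations++;
   }
   return iterations;
}

/* Example callback: count how many lanes were served in total. */
static void tally_lanes(uint32_t idx, uint64_t mask, void *user)
{
   unsigned *total = user;
   (void)idx;
   while (mask) {
      *total += (unsigned)(mask & 1u);
      mask >>= 1;
   }
}
```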
18:53jekstrand: bnieuwenhuizen: So I think if we did it in mesa, I think we could come up with something a little more efficient than what's possible in a layer.
18:53Venemo: also, if the callables and whatnots are different invocations, in different waves even, then wouldn't we need to take a perf hit when we launch them, and when we return to the original?
18:53bnieuwenhuizen: Venemo: yes, the reg + stack interaction has overhead
18:54jekstrand: Venemo: It depends on what you mean by "return" :-)
18:55Venemo: I guess the caller can hit some trap, or some sort of s_sendmsg maybe, which radv can detect and launch the other wave of called shaders.
18:55bnieuwenhuizen: could a callable return to its caller?
18:55jekstrand: Venemo: The "return" might not actually be a return in any real sense but rather the execution of a "continuation" shader that just picks up where the one that did the call left off.
18:55Venemo: then when the callable hits s_endpgm, radv would need to resume the previous wave
18:55jekstrand: bnieuwenhuizen: From a shader language perspective, yes. But it may not be implemented as a real return.
18:56bnieuwenhuizen: yeah so without a scheduler we can just return for real but otherwise we likely need a continuation indeed
18:56Venemo: jekstrand: yes, I guess it can be possible to pause the calling wave, launch a new wave with the other shader, then resume the old wave
18:56dschuermann: bnieuwenhuizen we have a fork instruction, but I didn't check yet if that works properly for function calls
18:56jekstrand: That's why the spec explicitly allows for things like gl_SubgroupInvocation to change across executeCallable()
18:56dschuermann: should work with divergence, though
18:56bnieuwenhuizen: dschuermann: you don't want the fork instruction, it is pretty much not useful
18:57Venemo: bnieuwenhuizen: can we pause a wave, while it waits for its callable to finish?
18:57bnieuwenhuizen: dschuermann: the entire "call other shader" thing isn't interesting at all besides the RA implications of it being opaque
18:57jekstrand: Venemo: You can spin in an atomic loop, for sure. But then that wave is doing nothing.
18:57bnieuwenhuizen: the interesting parts are maintaining efficiency with divergence and dealing with recursion
18:57jekstrand: But then you have to be very careful to make sure you don't dispatch too many waves or you may deadlock.
18:58bnieuwenhuizen: dschuermann: it is pretty much just something like s_getpc + a jump to a register
18:58bnieuwenhuizen: no magic involved
18:59jekstrand: I do think some sort of mechanism with shader continuations and a scheduler to dispatch the shader snippets (and process the ray traversal) in a wave-friendly way should be feasible.
18:59Venemo: I think the first question we have to figure out is whether we want to merge some (all?) of the RT pipeline into a single shader binary, or launch new waves for each callable, etc
18:59bnieuwenhuizen: jekstrand: I think continuations are interesting with nir control-flow
18:59bnieuwenhuizen: Venemo: AFAICT we have no HW support for launching new waves so the answer is easy there
18:59jekstrand: bnieuwenhuizen: They would certainly be easier if we didn't have structure....
19:00Venemo: bnieuwenhuizen: why not? I just gave an example on how to do it earlier
19:00dschuermann: bnieuwenhuizen rocm can do that by signalling host, not sure if the bits are also in place in the graphics stack
19:00bnieuwenhuizen: Venemo: how? I'm not seeing it?
19:00Venemo: bnieuwenhuizen: the 'call' could be an s_sendmsg, or a trap, which radv detects and launches a new wave
19:01bnieuwenhuizen: launching via a compute-generated command buffer is way too slow at a per-wave level
19:01bnieuwenhuizen: and we're not going to get userspace/GPU write cmdbuffers anytime soon
19:01Venemo: so it's gonna have to be a single, big binary, then?
19:01jekstrand: I think so
19:01bnieuwenhuizen: well we can do a call, it just won't launch new waves
19:02jekstrand: Though you could probably break it up and implement something with jumps as long as the core scheduler kernel is able to guarantee uniformity.
19:02Venemo:wishes we could discuss in person, and with a couple of beers
19:02jekstrand: So you could compile each shader as its own shader and then patch it together
19:03dschuermann: jekstrand: why uniformity? if one invocation wants to call, everything waits for the return
19:03bnieuwenhuizen: jitsi is open if you want it :) (though I recognize it is not quite in person)
19:03jekstrand: dschuermann: For executeCallable, that might work, but for traceRay, it'll kill perf if you do that.
19:04jekstrand: bnieuwenhuizen: We have jitsi and I have beer in my fridge......
19:04dschuermann: keep me posted, I'm currently outside
19:14jekstrand: Well, that killed the discussion....
19:15bnieuwenhuizen: Venemo: jekstrand: well if we want to continue I just created https://bh.xdc2020.x.org/dri-devel-rt
19:16jekstrand: dschuermann: As far as uniformity goes... You can also easily have the case with traceRay where the different lanes hit different geometries with different any-hit shaders.
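A toy C model of the wave-friendly scheduler idea discussed here, assuming (hypothetically) that each shader is split into snippets that return the ID of the next snippet to run. It is not how RADV or any other driver implements ray tracing; WAVE_SIZE, run_snippet and schedule_wave are made up for illustration. Each pass runs one snippet with only the lanes that want it active, so executed code is always uniform.

```c
#include <assert.h>

/* Hypothetical model: a wave of WAVE_SIZE lanes; each lane stores the ID of
 * the shader snippet it wants to run next (its continuation). */
#define WAVE_SIZE 8
#define NUM_SNIPPETS 4
#define DONE (-1)

/* Hypothetical snippets: 0 = traversal, 1..3 = any-hit shaders. Each runs
 * for one lane and returns the next snippet ID, or DONE. */
static int run_snippet(int id, int lane, int *hit_shader)
{
    if (id == 0)
        return 1 + lane % 3;   /* lanes hit different geometries */
    hit_shader[lane] = id;     /* record which any-hit shader ran */
    return DONE;
}

/* Scheduler loop: each pass picks the snippet wanted by the most lanes and
 * runs it with only those lanes active (masked execution). Returns the
 * number of passes needed until every lane is DONE. */
static int schedule_wave(int next[WAVE_SIZE], int hit_shader[WAVE_SIZE])
{
    int passes = 0;
    for (;;) {
        int count[NUM_SNIPPETS] = {0};
        for (int l = 0; l < WAVE_SIZE; l++)
            if (next[l] != DONE) {
                assert(next[l] >= 0 && next[l] < NUM_SNIPPETS);
                count[next[l]]++;
            }

        int best = DONE, best_count = 0;
        for (int s = 0; s < NUM_SNIPPETS; s++)
            if (count[s] > best_count) {
                best = s;
                best_count = count[s];
            }
        if (best == DONE)
            return passes;     /* every lane has finished */

        for (int l = 0; l < WAVE_SIZE; l++)
            if (next[l] == best)
                next[l] = run_snippet(best, l, hit_shader);
        passes++;
    }
}
```

Starting with every lane at snippet 0, the lanes diverge into three different any-hit shaders, so the wave needs four passes (one traversal, then one per distinct any-hit shader). The cost grows with the number of distinct shaders hit within a wave, which is the perf concern jekstrand raises for traceRay.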
19:38Venemo: bnieuwenhuizen: it's quite late now, but how about tomorrow?
19:39bnieuwenhuizen: Venemo: jason is in here :)
19:52Venemo: bnieuwenhuizen, jekstrand please fill me in tomorrow, then.
20:34agd5f: Venemo, bnieuwenhuizen IIRC, the amd backend in llvm supports functions. Might be a good starting point
20:40karolherbst: jenatali: btw: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/pull/720 got merged
20:41jenatali: karolherbst: I saw :)
21:12dschuermann: agd5f: probably not recursive ones? that would be quite huge
21:13agd5f: dschuermann, not sure. mainly used for compute stuff
21:20dschuermann: agd5f: it's a good idea to check, thx! maybe hcc supports recursive functions
21:38jenatali: jekstrand: Am I missing a place where glsl_type::get_struct_instance takes alignment/packed into account for the hash/compare?
21:39jenatali: I feel like I must be, since it seems crazy I haven't hit problems with bad hash lookups before, but I am hitting problems now and I'm not seeing it...
21:41exit70[m]: Is there a way to check which Mesa OpenGL driver is in use?
21:42imirkin: exit70[m]: glxinfo
21:44kisak: ^and glxinfo -B trims it down to the human readable bits
21:45imirkin: neat. didn't know about that.
21:54exit70[m]: thanks. on my intel laptop it doesn’t tell me if it is i915, i965 or iris?
21:55exit70[m]: the xorg log says dri2 driver i965 so it should be i965?
21:59exit70[m]: I can guess based on hardware generation but would love to know a definitive way to tell which driver is in use
22:00mattst88: exit70[m]: what is the 'OpenGL renderer string' output from glxinfo -B?
22:00mattst88: what about OpenGL vendor string?
22:03exit70[m]: OpenGL renderer string: Mesa DRI Intel(R) Iris(R) Pro Graphics P5200 (HSW GT3)
22:03exit70[m]: OpenGL vendor string: Intel Open Source Technology Center
22:04ajax: that's i965, i think. iris would just say "Intel" for the vendor string
22:04ajax: or at least, my skylake just says Intel, there
22:04Sachiel: and doesn't include DRI in the renderer
22:04kisak: my memory says that iris doesn't support haswell as well?
22:04Sachiel: and wouldn't run on HSW either, right?
22:05ajax: right, iris is broadwell+
22:05imirkin: mattst88: definitely not confusing that there's Iris in the product name...
22:07exit70[m]: right right i know the hardware generation so i can infer it must be i965. assuming i don't know the hardware, is the definitive answer in the xorg log?
22:08kisak: the xorg log is going to tell you xorg ddx info, which is separately called i965
22:08kisak: (or it could tell you modesetting)
22:09ajax: % LIBGL_DEBUG=verbose glxinfo |& grep "pci id for"
22:09ajax: libGL: pci id for fd 4: 8086:1926, driver iris
22:09ajax: ^ kinda gross but doesn't involve the X log, which you wouldn't have if it's Xwayland
22:10ajax: xdriinfo should work too in theory but in practice i think it's busted on Xwayland because no DRI2
22:10exit70[m]: awesome! my output: libGL: pci id for fd 4: 8086:0d26, driver i965
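If the check needs to be scripted, a minimal C sketch that pulls the vendor ID, device ID and driver name out of one such loader line. The parse_pci_id_line helper is hypothetical, and the line format is assumed from the two outputs quoted here rather than from any documented contract of the loader.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of `LIBGL_DEBUG=verbose glxinfo` output of the form
 *   libGL: pci id for fd 4: 8086:0d26, driver i965
 * The format is assumed from the lines quoted in this log; other loader
 * versions may print something different. Returns 1 on success, 0 if the
 * line does not match or the driver name does not fit. */
static int parse_pci_id_line(const char *line, unsigned *vendor,
                             unsigned *device, char *driver, size_t driver_len)
{
    char name[64];
    int fd;

    assert(line && vendor && device && driver);
    if (sscanf(line, "libGL: pci id for fd %d: %x:%x, driver %63s",
               &fd, vendor, device, name) != 4)
        return 0;
    if (strlen(name) + 1 > driver_len)
        return 0;
    strcpy(driver, name);
    return 1;
}
```

Run it over each line of the glxinfo output and stop at the first match. For a one-off check, the grep ajax shows is simpler.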
22:16exit70[m]: related question: in https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/docs/features.txt there is no iris? does that mean i965 and iris have roughly the same features?
22:43jekstrand: jenatali: You may not be.
22:43jekstrand: jenatali: We may have forgotten to add that to the hash. :-/
22:43jenatali: jekstrand: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6767
22:44jenatali: jekstrand: I only added to the comparison, since the alignment also wasn't in the hash
22:45jekstrand: I'd add them to the hash too
22:45jekstrand: It's probably just an oversight
22:45jenatali: Makes sense
22:46jenatali: jekstrand: I managed to get things up and running with your alignment patches instead of load/store_deref intrinsic alignment
22:55jenatali: jekstrand: I think hashing of glsl types just needs some attention... the entire subroutine hash map uses the same hash value
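A C sketch of the invariant being discussed: whatever the compare function checks (including packing and explicit alignment) should also feed the hash. Equal keys must hash equally, and keys differing only in alignment should usually land in different buckets. A hash that returns the same value for every key, as jenatali notes for the subroutine map, stays correct but turns every lookup into a linear scan. The struct_key layout and FNV-1a hash are hypothetical stand-ins, not Mesa's actual glsl_type code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key used to intern struct types. */
struct field_key {
    const char *name;
    unsigned type_id;
};

struct struct_key {
    const struct field_key *fields;
    unsigned num_fields;
    bool packed;
    unsigned explicit_alignment;
};

/* 32-bit FNV-1a, continuing from the running hash h. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Folds in every member that struct_key_equal() checks. */
static uint32_t struct_key_hash(const struct struct_key *k)
{
    uint32_t h = 2166136261u;
    for (unsigned i = 0; i < k->num_fields; i++) {
        h = fnv1a(h, k->fields[i].name, strlen(k->fields[i].name));
        h = fnv1a(h, &k->fields[i].type_id, sizeof k->fields[i].type_id);
    }
    h = fnv1a(h, &k->packed, sizeof k->packed);
    h = fnv1a(h, &k->explicit_alignment, sizeof k->explicit_alignment);
    return h;
}

static bool struct_key_equal(const struct struct_key *a,
                             const struct struct_key *b)
{
    if (a->num_fields != b->num_fields || a->packed != b->packed ||
        a->explicit_alignment != b->explicit_alignment)
        return false;
    for (unsigned i = 0; i < a->num_fields; i++)
        if (a->fields[i].type_id != b->fields[i].type_id ||
            strcmp(a->fields[i].name, b->fields[i].name) != 0)
            return false;
    return true;
}
```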
23:23jenatali: Ah, there's the Phoronix article I expected to see yesterday :P
23:44jekstrand: jenatali: 32 comments, even!
23:45jenatali: jekstrand: And not a single one positive, no surprise :P
23:45jekstrand: sorry. :(
23:45jenatali: Heh, not your fault
23:45anholt: 32 comments of expected caliber!
23:55cwabbott: dschuermann: it does have an actual calling convention, but it's probably going to be completely different for rt stuff anyways given how threads might be divided into wavefronts differently when returning from the callable function
23:57cwabbott: I think it's probably going to be most similar to how some languages have "async" functions which get transformed into a state machine under the hood
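The "async function as state machine" analogy, written out by hand in C. This is a sketch under stated assumptions: the shader, frame layout, shader_step and callable are all hypothetical, and a compiler would perform this split on the IR automatically rather than in source. The shader is cut at the executeCallable() site. Values live across the call are spilled into a frame, and a resume point records where to continue.

```c
#include <assert.h>

/* Hypothetical original shader:
 *     int a = x * 2;
 *     int r = callable(a);   // executeCallable()
 *     return a + r;
 * Split at the call into two resume points. */

enum resume_point { START, AFTER_CALL, FINISHED };

struct frame {
    enum resume_point state;
    int x;          /* input */
    int a;          /* live across the call, so it is kept in the frame */
    int call_arg;   /* argument handed to the callable */
    int call_ret;   /* filled in by whoever runs the callable */
    int result;
};

/* Runs the shader until it needs a call or finishes.
 * Returns 1 if suspended waiting for a call, 0 when finished. */
static int shader_step(struct frame *f)
{
    switch (f->state) {
    case START:
        f->a = f->x * 2;
        f->call_arg = f->a;
        f->state = AFTER_CALL;
        return 1;                 /* suspend: request callable(a) */
    case AFTER_CALL:
        f->result = f->a + f->call_ret;
        f->state = FINISHED;
        return 0;
    case FINISHED:
        break;
    }
    return 0;
}

/* Hypothetical callable shader. */
static int callable(int v) { return v + 1; }

/* Trivial driver loop standing in for the scheduler: whenever the shader
 * suspends, run the callable and store its return value in the frame. */
static int run_to_completion(int x)
{
    struct frame f = { .state = START, .x = x };
    while (shader_step(&f))
        f.call_ret = callable(f.call_arg);
    assert(f.state == FINISHED);
    return f.result;
}
```

Because the frame holds all live state, the second half can be resumed on a different wave or lane than the first. This is the property cwabbott points out about how threads may be regrouped when returning from the callable.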