07:21Lynne: I'm just about sure that mesa is miscompiling my shader
07:21Lynne: could someone look into it?
07:22Lynne: affects all mesa drivers, anv, radv, and lavapipe, so I think the issue is with NIR
07:26airlied: does it work on nvidia?
07:40mareko: what is VUE on Intel?
08:52frankbinns: DemiMarie: the conformance submission I linked is for the open source driver (the proprietary driver passes 1.3 conformance across a large range of GPUs)
08:53frankbinns: we're currently in the process of upstreaming everything for Vulkan 1.0 & GLES 3.1 (via Zink)
08:54frankbinns: with all the extensions, etc we had to implement for Zink the driver can almost expose Vulkan 1.1
08:55Lynne: airlied: yes
08:56Lynne: also I found out that I've fixed llvm in one of my latest rebases
08:56Lynne: err, lavapipe
09:16jani: airlied: sima: so I'm not going to reply to this: https://lore.kernel.org/all/2024110521-mummify-unloved-4f5d@gregkh/
09:19jani: airlied: sima: if *you* say we should look into the process again, we will
09:28sima: jani, airlied yeah I think we'll just keep silently shrugging
09:28sima: with amd and intel doing this regularly and also misc sometimes there's really no point trying to change it, the people (on dri-devel) have spoken
09:28sima: agd5f, ^^ fyi
09:29sima: also my irc fails at rejoining somehow since a few weeks :-/
09:31jani: sima: there just isn't a good answer to having a fix to backport a fix from a non-rebasing -next branch
09:31jani: -"a fix"
09:31jani: the 2nd one
09:32sima: jani, imo it's more fundamental
09:32sima: for gregkh linus' tree is the center of the world
09:32sima: for a driver developer, it's the driver tree, which often is just $driver-next
09:32jani: agreed
09:32sima: and gregkh doesn't understand the different perspective somehow, and not for lack of us trying to explain it
09:51airlied: jani, sima : yeah I think there's almost a deliberate not getting it going on, maybe I should write a blog post to just point at
09:58jani: drm-misc is trying to apply fixes to -fixes and -next-fixes directly instead of cherry-picking, but a) I think it's hard for the large pool of developers to consistently get right, and you end up with cherry-picks anyway, and b) if you have a series with fixes first, you won't be able to merge the entire thing but have to wait for fixes to get applied to -fixes and backmerged to -next, slowing people down
10:01jani: iiuc the only way to make git really understand what's going on is via merges, but I'm also under the impression that Linus doesn't want a ton of backmerges and crossmerges either
10:04sima: jani, yeah we'd need topic branches for every fix that is needed for a feature patch set in -next
10:04sima: I think that's even harder to get committers to consistently do than the current approach
10:04sima: I'm wondering whether we should lean more towards "push to -next and cherry-pick if you need some as fixes" even for drm-misc
10:04sima: to avoid some of the confusion
10:04sima: because it's really the simplest model for committers
10:04sima: airlied, blog sounds good
10:05jani: yes, that's the thing. do what's easiest for developers/committers, don't add more hurdles
10:05sima: yeah
10:06jani: "develop on top of drm-tip or -next, preferably put fixes first in the series, push to -next"
10:07jani: instead of, "well, you see, apply patches depending on what they do to a branch that depends on where we are in the development of the current linus' upstream, and maybe sometimes you'll get it right"
10:08sima: yeah the 2nd one I screw up too since I don't do it daily anymore like way back in the committer-less model
10:08jani: "yes it's a fix, but it's not small enough or important enough to be applied to -fixes at this stage"
10:09sima: yeah or people just dumping their fixes in at -rc6 because "oops forgot there's a kernel release"
10:09sima: into -fixes I mean, not -next
10:10sima: airlied, yeah a blog that just explains how everyone has a different kernel branch that's their center of the world is probably all that's needed ...
10:10jani: since phb crystal ball went down, I wrote this https://jnikula.github.io/linux-kernel-tea-leaves/ so I don't have to answer the questions *and* it's got the feature deadline too ;)
10:36Lynne: is there a way to define a binding for an already defined BDA type?
10:36Lynne: in glsl, I mean
10:37Lynne: I just want to use BDA in a standard descriptor rather than in push data
11:52alyssa: mareko: iirc, VUE is the varying remapping thing for VS->FS i/o (etc)
11:52alyssa: but i'm not an intel dev, just read too much iris&anv code :P
12:59alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964
12:59alyssa: anyone up to trade NIR reviews or something?
14:21karolherbst: oh right.. I wanted to review that 🙃
14:35soreau: emersion: gentle ping on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31725
15:17DemiMarie: frankbinns: are there plans for Vulkan 1.3 and OpenGL 4.6 with the open driver?
15:22bl4ckb0ne: DemiMarie: https://mesamatrix.net/
15:23frankbinns: DemiMarie: we'll be adding the missing bits for Vulkan 1.1 & 1.2 next year
15:23frankbinns: no specific plans for Vulkan 1.3 or OpenGL 4.6 as of yet
15:25DemiMarie: frankbinns: does the HW support it?
15:35frankbinns: DemiMarie: Rogue supports VK 1.3 but not GL 4.6 afaik
15:50sima: airlied, I kinda started drafting, will ping you when I have something
15:50sima: jani, you too and maybe agd5f
15:51DemiMarie: frankbinns: Could one use SW emulation? Asahi is able to get VK 1.3 and GL 4.6 on HW that doesn't even support geometry shaders or tessellation.
15:51agd5f: sima, sounds good
15:52frankbinns: DemiMarie: yeah, I expect that should work
15:53DemiMarie: frankbinns: Does Rogue have enough compute power for “do everything missing in compute shaders” to work, and are the driver devs willing to go as far as alyssa?
15:54DemiMarie: frankbinns: do you want VK 1.3 with enough extensions for Proton & gaming?
16:02frankbinns: DemiMarie: let's just say, there's a lot of different things we can focus on and we don't want to plan too far ahead
16:03DemiMarie: frankbinns: Fair!
16:06DemiMarie: Does Mesa have specific functions it uses to access GPU VRAM from the CPU?
16:13pixelcluster: DemiMarie: in general? no, drivers usually have some (more-or-less) internal way of mapping buffers and reading/writing that mapped memory
16:14pixelcluster: e.g. RADV has buffer_map in radv_radeon_winsys for that
17:07DemiMarie: pixelcluster: The reason I ask is that if all VRAM access had to go through specific Mesa functions, it would unlock dGPU support on many Arm platforms.
17:08DemiMarie: Right now, those require unaligned access emulation in the kernel, which is slow and out of tree.
17:11pixelcluster: DemiMarie: what about applications calling vkMapMemory though? maybe the drivers can be reworked (big maybe btw), but you can't just change all apps, can you?
17:11DemiMarie: pixelcluster: had not thought of that
17:22Company: jenatali: I have another question from my d3d12 adventures - dzn treats D3D11 textures and D3D12 resources interchangeably - is that allowed? I was wondering why VkImportMemoryWin32HandleInfoKHR needs a handle_type and there's one for D3D11 and D3D12 and OPAQUE_HANDLE and whatnot, but ID3D12Device::OpenSharedHandle() doesn't care about handle types at all
17:23jenatali: D3D11 -> D3D12 sharing is a special case of something that we tried to make work transparently
17:24Company: I also found ID3D11Device5::CreateFence()
17:24Company: so now I'm wondering if I can assume that D3D11 and D3D12 can work interchangeably
17:25Company: always? usually? sometimes? rarely? almost never?
17:25jenatali: Usually
17:25Company: so "write code that can deal with it not working, but assume it works"
17:26jenatali: Like, the Windows compositor is D3D11 still, except for cases where it uses D3D12 like for beam-chasing ink rendering or VR late-stage reprojection. So a D3D12 app using ink would do 12 -> 11 -> 12 on the path to the screen
17:27Company: yeah, I wondered about that, too - I can use a lot more APIs in the compositor if I can just turn my D3D12 objects into D3D11 objects
17:30jenatali: For 12 -> 11, the D3D12 resource needs to be either created according to the strict set of "cross API sharing" rules (format restrictions, no mips, no arrays, RTV/SRV only), or needs to use https://github.com/microsoft/DirectX-Headers/blob/main/include/directx/d3d12compatibility.idl#L38 to specify how D3D11 should open it
17:30jenatali: You'll also want to ensure you've got SIMULTANEOUS_ACCESS, since 11 doesn't have any kind of barriers, so no way to express layout transitions on input/output
17:31Company: yeah, we need that anyway
17:32Company: it's all for external apps (read: mainly video players) feeding GTK buffers to display
17:33Company: so I have a limited ability to decide what kind of buffer I get, but I can probably assume people will do what's necessary
17:34jenatali: FYI to avoid taking this channel too off-topic, we do have a dedicated DX Discord server if you want
17:34jenatali: I don't mind either way, I just know it's not what this channel's supposed to be for :)
17:35Company:doesn't even use discord
17:36Company: but if people mind, I'll stop asking
17:36bl4ckb0ne: i was looking into that earlier today, i didn't understand why there are so many different D3D handle types
17:37rpavlik: it's bl4ckb0ne's fault I showed in here, I have extensive d3d handle type trauma
17:38jenatali: Yeah I don't really understand why either tbh
17:38bl4ckb0ne: is there any doc about which one should be preferred?
17:38Company: the one you have probably
17:39bl4ckb0ne: we're using the OPAQUE WIN32 atm for d3d12 but i feel like the D3D12 resource one should be preferred
17:39rpavlik: from the d3d side, in practice, prefer the global dxgi handles because there are weird limitations from the NT handles, despite the global dxgi handles being allegedly deprecated.
17:40Company: bl4ckb0ne: dzn opens OPAQUE first as a heap and then as a resource I think
17:40rpavlik: I do not remember how I picked which type to specify when importing
17:40jenatali: I don't know that it really matters. At the end of the day, an allocation is an allocation, and the driver should be able to interpret its associated metadata to figure out what it is without being told at the API
17:40Company: and for resource type it only tries resource
17:41jenatali: The problem with the KMT handles compared to NT is lifetime and security. With KMT handles, if the original resource goes away before the handle gets opened, the handle becomes invalidated. With NT, the handle has its own lifetime which can outlive the source resource
17:41jenatali: And KMT is session-wide where NT is process-local
17:41rpavlik: yes, but some types (especially depth textures) can't be exported as nt handles
17:42jenatali: Yeah, D3D11 has a bunch of restrictions there because it was never tested. D3D12 throws away all the restrictions
17:42rpavlik: (speaking as a Monado dev, so I'm sharing graphics handles between processes, sometimes, and definitely between apis)
17:42rpavlik: I have joked that my primary Vulkan expertise is in importing and exporting :D
17:44bl4ckb0ne: i feel like we need another handle type to unify all of that :D
17:44jenatali: :P
17:44Company: if you make it a file descriptor, you win
17:44jenatali: The other fun part is that you can actually tell if a handle is an NT handle or not just by looking at the values. NT handles have the low 2 bits set to 0 where the GDI-style handles have a tag stored there
17:45jenatali: So having two handle types for them isn't even necessary
17:52rpavlik: wait there is already a tag there?
17:52rpavlik: 🤦‍♀️
17:53jenatali: In the DDK headers I see #define D3DKMT_GDI_STYLE_HANDLE_DECORATION 0x2
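A minimal sketch of the check jenatali describes, assuming only what is said above: NT handles have the low 2 bits clear, GDI-style handles carry a tag there, and the DDK headers give the tag value 0x2. The function names and sample handle values are hypothetical and do not come from any Windows header.

```python
# Value quoted above from the DDK headers.
D3DKMT_GDI_STYLE_HANDLE_DECORATION = 0x2

# Assumption taken from the chat: the tag lives in the low 2 bits of the handle.
LOW_TAG_MASK = 0x3


def looks_like_nt_handle(handle: int) -> bool:
    """True if the low 2 bits are clear, as described for NT handles."""
    return (handle & LOW_TAG_MASK) == 0


def looks_like_gdi_style_handle(handle: int) -> bool:
    """True if the low 2 bits hold the GDI-style decoration."""
    return (handle & LOW_TAG_MASK) == D3DKMT_GDI_STYLE_HANDLE_DECORATION


# Made-up handle values, for illustration only.
print(looks_like_nt_handle(0x1A4))
print(looks_like_gdi_style_handle(0x40000002))
```

This only classifies a value by its bit pattern; it says nothing about whether the handle is valid or which resource it refers to.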
17:56rpavlik: well that's fancy
17:58rpavlik: ok now I'm gonna have to fire things up here and test stuff
18:33alyssa: ..now im not sure if my own patch is sound *cry*
18:50bluetailoff: last occasions, i did bang the calculator for 2-3months i think to find a way to avoid division by two on access of indexed compress encoded data, and i did not think it was possible (started to think so and give up) but i succeeded on this finally, and it looks pretty elegant, this really ends all my research on among those lines, i think i got maximum out, cleaning up linux based
18:50bluetailoff: systems i can not participate , cause i am not interested in fighting you. I can not really find another human who would work on last module for 2months every day just to remove the bitshift by one out of the way in the formula. I am really bad maximalist, idealist and so tired too cause of that, it bites back i can see :)
18:50bluetailoff: but i promised to leave, just let you know that i quit, and there is no need to go crazy on the issues over me anymore .
19:06bluetailoff: You have my permission to use anything i developed cause i am getting older and older every day and i am a tired man, just to let you know that those models on linux for deep learning can be on linux built by far more superior than others have with licenses currently, and technically that is the way it should go too, but ...whatever.
19:42dj-death: can someone confirm that this is not correct NIR? : https://pastebin.com/UPPABSZu
19:42dj-death: or at least the dominance is not correct right
19:43jenatali: What's the problem?
19:43dj-death: how can a value defined inside an if block, with no phi, be used later on after the if block
19:43jenatali: I don't see %7 used outside the if block?
19:44dj-death: what happens to lanes not going through if block?
19:44jenatali: I think you just deleted too much from the thing you were trying to show. I see %7 defined in an if block and then never used
19:44jenatali: The extract uses %62 and %81 which aren't even in this paste
19:44dj-death: oh my bad
19:44dj-death: it was meant to be 32 %72 = extract_u16 %62, %7
19:45jenatali: Then yeah that looks wrong
19:47pendingchaos: assuming b2 isn't unreachable, then I think it's invalid NIR
19:50dj-death: I forced the dominance rebuild before CSE which is generating that code
19:50dj-death: but that doesn't help
19:50dj-death: I don't get it, it's so basic that it would break lots of stuff
19:51jenatali: Yeah nir_validate should be complaining about this
19:52pendingchaos: that validation is disabled by default
19:52pendingchaos: NIR_DEBUG=validate_ssa_dominance enables it
19:53dj-death: the complete shader for ref : https://pastebin.com/cVNv44mT
19:54dj-death: pendingchaos: doesn't complain
19:54pendingchaos: because b4 is unreachable
19:55pendingchaos: so b5 has only one "actual" predecessor
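A rough sketch of the dominance rule under discussion, not NIR's actual validator. It uses a toy CFG with hypothetical block names and the textbook iterative dominator computation, ignoring blocks that cannot be reached from the entry. It shows why a value defined inside an if block normally cannot be used after the if without a phi, and why the check can stay silent when the other predecessor is unreachable.

```python
def dominators(cfg, entry):
    """Map each reachable block to the set of blocks that dominate it."""
    # Collect blocks reachable from the entry; unreachable ones are ignored.
    reachable, stack = set(), [entry]
    while stack:
        block = stack.pop()
        if block in reachable:
            continue
        reachable.add(block)
        stack.extend(cfg[block])

    # Only reachable predecessors count as "actual" predecessors.
    preds = {b: [p for p in reachable if b in cfg[p]] for b in reachable}

    dom = {b: set(reachable) for b in reachable}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in reachable:
            if b == entry:
                continue
            new = set.intersection(*(dom[p] for p in preds[b])) | {b}
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


# Plain if/else: both arms reach the merge block.
if_else = {"entry": ["then", "else"], "then": ["merge"], "else": ["merge"], "merge": []}

# Same shape, but the second predecessor of "merge" is unreachable.
dead_arm = {"entry": ["then"], "then": ["merge"], "dead": ["merge"], "merge": []}

print("then" in dominators(if_else, "entry")["merge"])
print("then" in dominators(dead_arm, "entry")["merge"])
```

In the first graph a use in "merge" of a value defined in "then" is not dominated by its definition; in the second it is, because the only other predecessor never executes.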
19:55dj-death: this appears to be fixed on main
19:56jenatali: Ooh, yeah that loop is important
19:58dj-death: I'll bisect tomorrow
20:05alyssa: ......I am even less convinced than before this code is correct for sufficiently torturous deref chains
20:10alyssa: this isn't going to work is it.
20:31alyssa: ok, new idea
20:31alyssa: if I cap storage buffers at 2GB, I can use i2i64 instead
20:31alyssa: and then the important thing is nsw instead of nuw
20:31alyssa: which I think will actually work even in crazy cases
20:32alyssa: or... do something at a higher level.... actually...
20:32alyssa: wwwwwwwait
20:57alyssa: or maybe it is fine because derefs
20:57alyssa:rubs head
20:58alyssa: the 'case' that breaks it would be, like, x[-1][1] and expecting to get x[0][0]
20:58alyssa: although if you've gone out-of-bounds.. you've gone out of bounds and IDK that it's my job to save you
20:58alyssa: but I don't.. feel good about this
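A hypothetical illustration of the x[-1][1] case, not alyssa's patch. It assumes each 32-bit array index is widened to 64 bits separately before being multiplied by its stride, and compares zero-extension (u2u64) with sign-extension (i2i64). The helper names, the strides, and the array shape (4-byte elements, inner dimension of length 1) are made up for the example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def u2u64(x: int) -> int:
    """Zero-extend a 32-bit value to 64 bits."""
    return x & MASK32


def i2i64(x: int) -> int:
    """Sign-extend a 32-bit value to 64 bits."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def offset_zero_extended(i: int, j: int, outer_stride: int, inner_stride: int) -> int:
    """Byte offset when each index is zero-extended before the 64-bit math."""
    return (u2u64(i) * outer_stride + u2u64(j) * inner_stride) & MASK64


def offset_sign_extended(i: int, j: int, outer_stride: int, inner_stride: int) -> int:
    """Byte offset when each index is sign-extended before the 64-bit math."""
    return (i2i64(i) * outer_stride + i2i64(j) * inner_stride) & MASK64


# x[-1][1] on an array whose rows hold one 4-byte element each.
print(hex(offset_zero_extended(-1, 1, 4, 4)))
print(hex(offset_sign_extended(-1, 1, 4, 4)))
```

With sign-extension the negative index cancels out and the offset lands on x[0][0]; with zero-extension it ends up far past the start of the buffer. Capping buffers at 2GB would, under this reading, keep every valid offset within the positive range of a signed 32-bit value, so sign-extension would not misinterpret a legitimately large index.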
21:26dj-death: pendingchaos: I found that 60776f87c38f69507d60591b46b3ea2efba8e188 fixed the issue, but sounds more like a workaround :/
21:39dj-death: pendingchaos: do you think that endless loop can be removed?
21:39dj-death: pendingchaos: because opt_dead_cf is doing that later
21:39dj-death: pendingchaos: and that sets all hell loose
21:40dj-death: starting to generate phis to go into extract_u16 src1
21:41glehmann: yeah extract_u* is kind of a hack
21:42dj-death: what's the proper thing?
21:45glehmann: it shouldn't use a ssa def for the source that's required to be constant, but alu instructions can't have something like the intrinsic indices
21:45dj-death: glehmann: I see, I agree
21:46glehmann: I guess as long as extract works the way it does rn, nir_repair_ssa needs to rematerialize constants instead of creating phi(const, undef)
21:47dj-death: I think the root of my issue is really dropping the empty loop
21:47glehmann: dropping an empty loop should be valid
21:47dj-death: shouldn't it just hang?
21:47dj-death: an empty loop with a break, sure
21:48dj-death: but without...
21:48glehmann: infinite loops are undefined behavior
21:48glehmann: so we should be able to remove them
21:49glehmann: *execution of an infinite loop is undefined behavior, the shader is still valid if that else is always skipped afaiu
21:50lensesonthere: So AMD/ARM/Intel never used bus entropy encodings , because... (you can see how immediates get encoded) PCIe has some encoding however. 8b10b I did not have time to look what means pci emulation path, but it is very likely that there should be hardware encoder for pci buses, and emul does encode the stuff on cpu. But it's safe bet that Cornell pdf has never materialized to commercial
21:50lensesonthere: commodity hw. So what say is that i know how to do it with quite little time from sw side quite surely now. And did you want to ask was that tough work what i did!! Very very is answer, you are not superior to me, such patches as you do i could do every day, but there is no point in that.
22:05DemiMarie: glehmann: there needs to be a way for WebGL/WebGPU/etc to ensure that there is no undefined behavior in the code generated, and preventing detection of infinite loops seems very hard.
22:09glehmann: well then webgpu needs to solve the halting problem
22:13agd5f: has anyone else run into issues with cross compiles with 6.12-rc6? Something seems to have changed between -rc5 and -rc6. Seems to be something related to libssl
22:14agd5f: used to work with a 64 bit libssl, but now it seems to require a 32 bit one?
22:14dj-death: glehmann: was 60776f87c38f69507d60591b46b3ea2efba8e188 a way to deal with those invalid cases?
22:15DemiMarie: glehmann: the point is that web APIs are never allowed to have undefined behavior, end of story.
22:15DemiMarie: otherwise it is a security vulnerability
22:16DemiMarie: I believe that an asm block is inserted when generating Metal.
22:17DemiMarie: the solution is to have a way to ensure that infinite loops are well-defined
22:17DemiMarie: where “well-defined” explicitly includes VK_ERROR_DEVICE_LOST but not e.g. out-of-bounds accesses
22:19DemiMarie: If this is not possible then WebGL and WebGPU would be impossible to correctly implement on top of Vulkan.
22:19glehmann: dj-death: that commit was intended to be an optimization, but it will remove all phis that are problematic for extract_. Definitely doesn't feel right to rely on it for correctness tho
22:19DemiMarie: The web platform has a lot of black and white security invariants.
22:20DemiMarie: glehmann: what should webgpu do instead?
22:21glehmann: I have no idea, I try to avoid it at all cost
22:21dj-death: glehmann: yeah... guessing ACO would be sensitive to that too
22:23glehmann: yes, this isn't intel specific in any way
22:25glehmann: requiring ssa def source to be constant is fundamentally broken imo, but it's also hard to fix the existing cases in NIR
22:26dj-death: yeah
22:26dj-death: a nightmare
22:26dj-death: I guess modifications should ensure we can fall back to direct const sources
22:27dj-death: well no
22:27dj-death: because reverting 60776f87c38f69507d60591b46b3ea2efba8e188 brings back the issue
22:27dj-death: I thought maybe copy-prop would be enough :(