IRC Logs of #dri-devel on irc.freenode.net for 2023-12-15

00:51 mareko: agd5f: LLVM seems too bloated to me and evolves too slowly, compile times suck, the lack of proper VMEM clausing hasn't been resolved for years despite being known by the LLVM team, and it doesn't even support graphics and there is no development and no interest to add it, the other team supports graphics shaders by lowering them in LLPC instead of LLVM, while Mesa lowers them in NIR; while NIR gets new
00:51 mareko: passes for gfx regularly, LLVM has exactly 0; don't expect LLVM to get any better, but if we disregard VMEM clauses and some other issues, the GPU performance overall is pretty good
00:52 mareko: the scope of what LLVM is doing is like 50% of NIR, which is like the backend half of NIR, it just doesn't do enough to be a replacement for NIR
00:53 mareko: it's just an SSA -> bytecode translator with good late optimizations
00:54 mareko: while the frontend optimizations for GPUs are non-existent
00:56 iive: when bisecting, llvm is always a huge hurdle because of api changes.
00:58 mareko: NIR is actually the first mover in the graphics compiler space because it's the first SSA-based compiler that actually caters to graphics
01:04 iive: Also, I've been away from mesa3d topics for like 5 years, I do remember hoping llvm would get fastisel back then...
01:06 HdkR: globalisel is the new hotness and it still isn't good enough :)
01:11 mareko: LLVM will forever be just a bytecode translator from an ISA-level IR (who will generate it?), the only way to fix LLVM deficiencies is to add the fucking second half of the compiler that's missing there that nobody wants to add
01:12 mareko: it's a good bytecode translator though, I have to say
01:13 FL4SHK[m]: HdkR: long time no see
01:14 HdkR: uh
01:14 FL4SHK[m]: Dolphin
01:15 HdkR: whew
01:15 FL4SHK[m]: that's where I know your username from
01:15 FL4SHK[m]: I'm not sure how long it's been
01:16 FL4SHK[m]: But it's been at least a few years
01:16 HdkR: I'm always around, just not with Dolphin
01:16 FL4SHK[m]: ah
01:19 iive: mareko, let's say that mesa3d decides to move away from llvm. how could that happen? could existing developers handle the task, or would it require new developers?
01:19 airlied: I think it would be quicker to close the NIR gap on compute features than close the LLVM gap on graphics
01:19 airlied: we already have done it
01:20 mareko: airlied: and also this: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10285
01:21 airlied: gotta turn some of those - into + :-P
01:22 mareko: ACO already has an advantage in how it forms VMEM clauses, which is why the Energy tests are better with ACO
01:29 iive: if llvm is a dead end, move away from it. don't let the sunk cost fallacy keep you down.
01:29 iive: Oh, btw, I'm around because of an R600 regression. I've filed a bug report and provided a trace, it would be nice if somebody took a look.
01:29 iive: n8
08:37 MrCooper: I haven't seen even a proposal for replacing llvmpipe, so "Mesa is moving away from LLVM" seems a tad premature
08:48 pq: Referring to yesterday's discussion, is there any doc on how dmabuf accepting programs should be tested? Maybe even with some notes on how to do it in a pure software (CI) environment?
10:11 Company: pq: when I asked last time, nobody had an option other than "use Mesa if available"
10:11 Company: the only suggestion was dma_heap, but I don't think that works in CI
10:16 Company: fwiw, software dmabuf support is high on my wishlist for 3 reasons:
10:16 Company: 1. CI
10:17 Company: 2. unifying codepaths for software rendering (in particular on thin clients and other devices without GPU)
10:18 Company: 3. the ability to allocate dmabuf memory for things like software video decoders or Cairo renderers
10:19 Company: for example GTK3 could be made to do that so it can use linux-dmabuf instead of shm with its Cairo rendering
10:21 Company: ((3) is a smaller issue, because it can be done with Vulkan, and then you even get GPU memory and not just any memory, but still)
10:31 pq: Company, yeah, I'm curious about what all different kinds of dmabuf should be used in tests and where to get those from, given the discussion from yesterday made the point there are subtly different types, some needing more careful handling than others.
10:34 sima: roughly four: dma-buf where you get cached system memory and don't have to do expensive cache management with SYNC_START/END (you should still do it), dma-buf with write-combine (because reading from those will suck), dma-buf with cached mmap that does require SYNC_START/END, and dma-buf with flat out no mmap
10:34 sima: at least for mmap
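A minimal sketch of the SYNC_START/END bracketing sima mentions, assuming the Linux uapi header `<linux/dma-buf.h>` (`struct dma_buf_sync`, `DMA_BUF_IOCTL_SYNC`, `DMA_BUF_SYNC_START`/`END`/`READ`/`WRITE`/`RW`). The helper names are hypothetical. Since the same code path should work for all the mmap-able kinds above, the sketch issues the ioctl unconditionally; it retries on EINTR/EAGAIN, which is the usual convention for this ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue DMA_BUF_IOCTL_SYNC with the given flags, retrying if the call is
 * interrupted. Returns 0 on success, -1 on error with errno set by ioctl(). */
static int dmabuf_sync(int fd, uint64_t flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/* Hypothetical helpers bracketing CPU access to an mmap()ed dma-buf.
 * access is DMA_BUF_SYNC_READ, DMA_BUF_SYNC_WRITE or DMA_BUF_SYNC_RW and
 * should be the same for the matching begin/end pair. */
static int dmabuf_cpu_access_begin(int fd, uint64_t access)
{
    assert(access != 0 && (access & ~(uint64_t)DMA_BUF_SYNC_RW) == 0);
    return dmabuf_sync(fd, DMA_BUF_SYNC_START | access);
}

static int dmabuf_cpu_access_end(int fd, uint64_t access)
{
    assert(access != 0 && (access & ~(uint64_t)DMA_BUF_SYNC_RW) == 0);
    return dmabuf_sync(fd, DMA_BUF_SYNC_END | access);
}
```

Typical use: mmap() the dma-buf fd, call `dmabuf_cpu_access_begin(fd, DMA_BUF_SYNC_READ)`, read the data, then call `dmabuf_cpu_access_end(fd, DMA_BUF_SYNC_READ)`. This does nothing for the no-mmap kind, and it doesn't make reads from write-combined memory any faster.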
10:35 sima: add in implicit sync for additional fun, but with the import/export ioctl that should be a lot easier to test (together with getting userspace controlled dma_fence from vgem for testing)
10:35 sima: note that outside of testing userspace controlled dma_fence are a very bad idea because they might livelock and then timeout, so functionally buggy under memory pressure :-)
10:36 sima: livelock or deadlock, it depends how you look at it
10:36 sima: Company, pq ^^
10:37 pq: sima, I've heard most of those words, but I have no clue at all how to turn those words into code in my tests.
10:37 sima: yeah, atm not doable, I think adding some "give me a special dma-buf" ioctl to vgem would make sense
10:38 sima: maybe combined with drm_fourcc.h test pattern generation even
10:39 pq: sounds nice
11:08 vsyrjala: looking for review for a fb refcount fix: https://patchwork.freedesktop.org/series/127618/ any takers?
12:21 emersion: mattst88: any objections if i make a pixman release?
12:22 emersion: previous one is 1 year old
12:47 dj-death: I have OpenCL code that parses a structure with 4 uint32_t fields, and it generates the following description for it:
12:47 dj-death: %struct_VkDrawIndirectCommand = OpTypeStruct %uint %uint %uint %uint
12:48 dj-death: and nowhere are the offsets of the fields specified in the spirv
12:48 dj-death: so the created glsl_type in the spirv parser has all the field offset = -1
12:48 dj-death: not sure what I should be doing to fix it
12:49 dj-death: I could assign a default location ?
12:55 dj-death: any tip? :)
12:56 dj-death: maybe call glsl_get_explicit_interface_type() on that type
12:57 dj-death: in the OpenCL case
13:02 jenatali: dj-death: you need to lower to explicit types
13:12 dj-death: jenatali: thanks, nir_lower_vars_to_explicit_types() ?
13:13 jenatali: Yep
13:14 dj-death: jenatali: thanks a lot, that works :)
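What `nir_lower_vars_to_explicit_types()` fixes here is the missing offsets. In Mesa the size/alignment rules come from a callback passed to that pass; the arithmetic for a natural (C-like) layout is roughly the following. This is a standalone illustration with hypothetical names (`field_desc`, `assign_natural_offsets`), not Mesa's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* One field of a struct whose offsets were not given in the SPIR-V. */
struct field_desc {
    size_t size;  /* size in bytes */
    size_t align; /* required alignment in bytes (power of two) */
};

/* Round v up to the next multiple of a (a must be a power of two). */
static size_t align_up(size_t v, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Assign natural offsets to each field, writing them to offsets[].
 * Returns the struct size, rounded up to the largest member alignment. */
static size_t assign_natural_offsets(const struct field_desc *fields,
                                     size_t count, size_t *offsets)
{
    size_t offset = 0, max_align = 1;

    for (size_t i = 0; i < count; i++) {
        offset = align_up(offset, fields[i].align);
        offsets[i] = offset;
        offset += fields[i].size;
        if (fields[i].align > max_align)
            max_align = fields[i].align;
    }
    return align_up(offset, max_align);
}
```

For `%struct_VkDrawIndirectCommand` (four 4-byte `%uint` fields) this gives offsets 0, 4, 8, 12 and a 16-byte struct, matching the C `VkDrawIndirectCommand`.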
13:15 dj-death: jenatali: have you run into CL code that generates clang errors of this type? :
13:15 dj-death: InvalidBitWidth: Invalid bit width in input: 128
13:15 dj-death: by any chance :)
13:16 dj-death: I had my lowering one step too late
13:16 jenatali: Yeah you have to generate with O0
13:16 jenatali: I've never seen 128, that seems like an SSE type
13:17 dj-death: hmm
13:17 dj-death: not working unfortunately
13:17 dj-death: -cl-std=cl2.0 -O0 -D__OPENCL_VERSION__=200 -DGFX_VERx10=110
13:17 dj-death: unless I need to put that -O0 somewhere
13:18 jenatali: What's the source look like? Is there actually a 128 bit type somewhere?
13:18 dj-death: no
13:19 jenatali: I'm not at a PC but if you want to paste the source I can take a look later
13:19 dj-death: jenatali: that's calling this function that is causing problems : https://paste.mozilla.org/duwj1k3L
13:19 jenatali: I'd check godbolt to see where the LLVM IR is getting an i128
13:21 dj-death: https://paste.mozilla.org/yECPJdek/slim
13:21 dj-death: with the caller
13:22 dj-death: dropping that code in #if 1 makes it compile
13:22 dj-death: not sure if godbolt will be able to take that much code in
13:24 jenatali: Well somehow you need to see why LLVM is generating an i128
13:25 dj-death: yeah no way around it
13:25 dj-death: will give it a try
13:40 dj-death: jenatali: I suppose there is no easy way to do this locally
13:41 jenatali: dj-death: you can dump the .bc from clang, I just like how godbolt does the highlighting of which source line produced an IR line
13:46 dj-death: I suppose I need to hook that up with intel_clc
13:57 dj-death: jenatali: heh, reproduced on godbolt
14:00 dj-death: damn the scroll to source doesn't work
14:05 dj-death: jenatali: okay, don't use : 1ull << (some_value < 32)
14:05 dj-death: apparently 1ul is fine
14:32 jenatali: dj-death: weird, I guess long long is 128 bits in clang's data layout for CL/spir
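That guess matches how clang handles OpenCL C: `long`/`ulong` are always 64-bit, while `long long` is a reserved 128-bit type in the OpenCL C spec, and clang's OpenCL mode sizes it that way, so `1ull` becomes an i128. In OpenCL C the 64-bit spelling is `1ul`; on the host side of a shared header, naming the width explicitly avoids depending on either rule. A small host-side C sketch (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Return a 64-bit mask with only bit n set. The explicit uint64_t (ulong in
 * OpenCL C, i.e. 1ul) avoids 1ull, which clang's OpenCL mode makes 128-bit. */
static inline uint64_t bit64(unsigned n)
{
    assert(n < 64); /* shifting by >= the type width is undefined behavior */
    return (uint64_t)1 << n;
}
```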
15:04 dj-death: Where is the meaning of SpvMemoryAccessAlignedMask documented? :)
15:24 dj-death: jenatali: have you fought much with alignment issues?
15:24 jenatali: Very much yes
15:24 dj-death: jenatali: generated spirv code has alignment of 1 for SpvOpCopyMemorySized
15:24 dj-death: jenatali: any tip on that side?
15:25 jenatali: And the memcpy can't be turned into a struct copy in nir?
15:26 jenatali: My fight has mainly been getting my backend to handle unaligned things, less about trying to coerce them into being aligned in the first place
15:26 jenatali: Since we still have to deal with packed structs resulting in unaligned accesses
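For the C side of the same problem, the portable way to read a value that may sit at any byte offset (as fields of a packed struct can) is to `memcpy` it into an aligned local; compilers typically lower this to whatever unaligned load the target supports. A minimal sketch with a hypothetical helper name; this is a general C idiom, not how any particular GPU backend handles unaligned accesses:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a uint32_t from a possibly unaligned address. memcpy has no
 * alignment requirement, so this is well-defined where dereferencing
 * (const uint32_t *)p may not be. */
static inline uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof(v));
    return v;
}
```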
15:28 dj-death: ah right
15:28 dj-death: I packed my struct in the hope things would have the same layout in CL & C
15:28 dj-death: maybe that was a mistake
15:29 jenatali: Yes
15:29 jenatali: Packed structs for whatever reason get alignment of 1 in LLVM
15:29 dj-death: yep
15:29 dj-death: I'm seeing that now
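One way to get an identical layout in C and OpenCL C without `packed` is to order the fields by size and add explicit padding, then pin the layout with static asserts so a mismatch fails at compile time. A sketch with hypothetical struct names, assuming GCC/Clang's `__attribute__((packed))` and C11 `_Alignof`/`static_assert`; in OpenCL C the field types would be `uchar`/`uint` (or typedefs in a shared header):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed: fields may land at any byte offset, so the struct's alignment
 * drops to 1 and accesses through it are treated as unaligned. */
struct __attribute__((packed)) elem_packed {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
};

/* Natural layout with explicit padding: the largest field first, small
 * fields after it, padding spelled out so the size is a multiple of 4. */
struct elem_natural {
    uint32_t b;
    uint8_t  a;
    uint8_t  c;
    uint8_t  pad[2];
};

static_assert(sizeof(struct elem_natural) == 8, "unexpected size");
static_assert(offsetof(struct elem_natural, a) == 4, "unexpected offset");
static_assert(_Alignof(struct elem_natural) == 4, "unexpected alignment");
```

The natural version keeps 4-byte alignment, so loads of `b` can stay aligned, at the cost of two padding bytes per element.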
15:30 dj-death: I'm dealing with lots of small values
15:30 dj-death: like u8
15:30 jenatali: Sounds like you want load/store vectorization
15:30 dj-death: yeah
15:31 dj-death: I'm trying to make all structs a series of u8 that rounds up to u32
15:31 dj-death: or rather aligns to u32
15:31 jenatali: Unions?
15:31 dj-death: no just structs
15:31 dj-death: like the vertex index ;)
15:31 jenatali: Right I'm saying could you union it?
15:32 dj-death: only 33 of them
15:32 dj-death: no it's more I have lots of small values per each of those elements
15:32 dj-death: and I'm trying to have a single load carry all the data for each element
15:33 dj-death: rather than have a u32 field for each value
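One way to have a single load carry several small values without relying on packed-struct layout is to store them in explicit 32-bit words and extract fields with shifts. The sketch below uses hypothetical helper names and assumes 8-bit fields; whether it actually saves registers depends on how the backend handles the extracts.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 8-bit values into one 32-bit word so a single aligned 32-bit
 * load fetches all of them; value i lives in bits [8*i, 8*i + 7]. */
static inline uint32_t pack4_u8(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3)
{
    return (uint32_t)v0 | ((uint32_t)v1 << 8) |
           ((uint32_t)v2 << 16) | ((uint32_t)v3 << 24);
}

/* Extract value i (0..3) from a packed word. */
static inline uint8_t unpack_u8(uint32_t word, unsigned i)
{
    assert(i < 4);
    return (uint8_t)(word >> (8 * i));
}
```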
15:33 jenatali: Can you just run the vectorizer pass in the compiler?
15:33 dj-death: yeah
15:33 dj-death: but I also want to limit bandwidth
15:33 dj-death: and register space
15:34 jenatali: Now you're into problems I've not dealt with
15:34 jenatali: My compiler experience ends at producing an IR
15:34 dj-death: heh
15:34 dj-death: well thanks again anyway
15:34 dj-death: you've put me on the path of solving all my issues so far :)
15:37 jenatali: Glad to help
15:59 mattst88: emersion: I was hoping to resolve https://gitlab.freedesktop.org/pixman/pixman/-/merge_requests/78 but it's been two months... :(
16:00 mattst88: I guess the only thing that would be nice to fix before a release is https://gitlab.freedesktop.org/pixman/pixman/-/issues/85
16:00 mattst88: but yeah, feel free to make a release