IRC Logs of #dri-devel on irc.freenode.net for 2025-01-29

00:26 zmike: uhhh
00:27 zmike: never seen that before
00:28 zmike: hmmm I think I can imagine how it might happen
00:30 zmike: too bad I'm at the pub
00:31 karolherbst: have fun, but also, why are you checking in on IRC while in the pub
00:37 Ristovski: one does not waste optimization opportunities, no matter the occasion
00:39 zmike: I got a ping, I checked the ping
13:48 zmike: jenatali: try https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33285
13:50 jenatali: zmike: will do
16:05 pistolsmoke: maybe someone wants to meet those estonian fuck aces that spam the world with wankterror , da real deal instead of my techtalks. Asses full of biritish scumbags load. Terrible people. still i think encoding banks outside the hash itself that goes into one register, no longer makes sense, there are zero delay interrupts possible yet, and mixture with non-compressed intrinsical methods,
16:05 pistolsmoke: however more than that i do not see getting done, i do not think i am capably in form for this, nested compressed intrinsics would give entire maniac results, but one needs to come out with some addressing schematic, i have not landed to that spot with any of my thoughts. Nested ones are probably possible but that has already complexity, i am not prepared for this and i exhibit no form at
16:05 pistolsmoke: all tbh. in mathematics or programming, it's just what i talked about is very simple schools material before.
16:29 jenatali: zmike: Yep, fixed it
16:30 zmike: jenatali: cool
16:30 zmike: does it work for whatever you were testing too?
16:30 jenatali: It exhibits the same bug I was seeing with the d3d12 backend and softpipe
16:30 jenatali: I.e. not my bug :)
16:30 zmike: cool
16:30 zmike: wait softpipe?
16:31 zmike: or llvmpipe
16:31 jenatali: Softpipe. This is an arm64 laptop and I didn't feel like building llvm
16:31 zmike: coward
16:31 jenatali: Guilty
16:31 jenatali: FWIW this is the bug: https://github.com/telegramdesktop/tdesktop/issues/28905
16:32 zmike: gross
17:06 alyssa: how is host_image_copy supposed to work with sparse? o_O
17:06 zmike: it's not
17:07 alyssa: then why is there CTS for it :clown:
17:07 zmike: or maybe it is and I misremembered
17:09 alyssa:returns not_supported and CTS shuts up
17:09 alyssa: deeply unserious
17:10 zmike: smh how will anyone use this driver
17:11 alyssa: holding it wrong
17:13 HdkR:aims graphics driver directly at foot
17:30 jenatali: ... how do you do sparse with host access at all?
17:31 zmike: very carefully
17:33 alyssa: jenatali: that is my question
17:50 alyssa: I guess lavapipe could do it
18:08 memleak: hello, i've been working with X for many years and trying to help a guy out with a really obscure problem and i'm stuck.
18:09 memleak: he's using an intel board with an i915 GPU that is too new for the 5.4 LTS kernel, so I helped get his system to use fbdev first with simplefb; that didn't work (no suitable framebuffer found) even though /dev/fb0 was there
18:10 memleak: next i tried to get vesa going for him by uninstalling the xorg xserver fbdev driver to help shut vesa up, but now he gets: V_BIOS address 0x0 out of range
18:10 memleak: vesa doesn't want to start if fbdev is available and if /dev/fb0 is present so i satisfied those dependencies for him
18:11 memleak: never in my life saw this V_BIOS address 0x0 out of range problem before and no idea how to fix.
18:11 Ermine: is upgrading to newer kernel an option? There are newer LTS kernels
18:12 memleak: it has to be 5.4 because we're using RTAI
18:12 memleak: (real-time kernel, preempt_rt not fast enough)
18:13 memleak: i maintain the rtai repository for linuxcnc. my only other option looks to be backporting support for meteorlake into 5.4 if i can't get vesa going
18:13 Ermine: Well.. I'd try to blacklist any fbdev modules that show up and try to get xorg up with modesetting ddx
18:15 Ermine: (so simpledrm driver is in charge on kernel side)
18:15 memleak: checking if simpledrm is in 5.4...
18:16 memleak: nope, introduced in 5.14
18:16 Ermine: ugh
18:17 jenatali: alyssa: Do you know how lavapipe does sparse?
18:18 Ermine: just in case, there's also #xorg channel
18:22 memleak: thanks Ermine checking there too :)
18:22 memleak: never saw this memory error come up before
18:29 memleak: i'm going to try backporting simpledrm to 5.4 and see how much work it is
18:29 alyssa: jenatali: I assume mmap(MAP_FIXED)?
18:29 jenatali: Ah yeah, makes sense
18:31 alyssa: yeah, more or less https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29408
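The mmap(MAP_FIXED) idea mentioned above can be illustrated with a small host-side sketch. It is not lavapipe's implementation (see the linked merge request for that); it only shows the general technique under stated assumptions: a Linux host, a temporary file standing in for the backing memory object, and hypothetical helper names (`reserve`, `bind_page`, `demo`). `MAP_FIXED` and `MAP_NORESERVE` are written as their Linux numeric values because Python's `mmap` module may not export them.

```python
import ctypes
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE
PROT_NONE = 0x0
MAP_FIXED = 0x10        # Linux value (assumption: Linux host)
MAP_NORESERVE = 0x4000  # Linux value on x86 and arm64 (assumption)

libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
MAP_FAILED = ctypes.c_void_p(-1).value


def _mmap(addr, length, prot, flags, fd, offset):
    """Thin wrapper around libc mmap that raises OSError on failure."""
    ptr = libc.mmap(addr, length, prot, flags, fd, offset)
    if ptr == MAP_FAILED:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ptr


def reserve(num_pages):
    """Reserve address space for a sparse resource without committing memory."""
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | MAP_NORESERVE
    return _mmap(None, num_pages * PAGE, PROT_NONE, flags, -1, 0)


def bind_page(base, page_index, mem_fd, mem_page):
    """Map one page of the backing file at a fixed spot inside the reservation."""
    return _mmap(base + page_index * PAGE, PAGE,
                 mmap.PROT_READ | mmap.PROT_WRITE,
                 mmap.MAP_SHARED | MAP_FIXED, mem_fd, mem_page * PAGE)


def demo():
    """Bind page 1 of the backing file to page 5 of an 8-page sparse range."""
    with tempfile.TemporaryFile() as backing:
        fd = backing.fileno()
        os.ftruncate(fd, 2 * PAGE)
        base = reserve(8)
        try:
            bind_page(base, 5, fd, 1)
            # A host write through the sparse range lands in the backing memory.
            ctypes.memmove(base + 5 * PAGE, b"sparse", 6)
            return os.pread(fd, 6, PAGE)
        finally:
            libc.munmap(base, 8 * PAGE)
```

Unbound pages stay `PROT_NONE`, so touching them faults, while bound pages are plain host memory. That is roughly why a CPU driver can plausibly combine sparse binding with host access.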
18:38 lumag: sima, vsyrjala: any resolution on the https://lore.kernel.org/dri-devel/it2puzcitkui2inz4tmvkpig47jyz2efeq3udzffnqwomf3r3v@5sylpgnvqdxk/ ?
18:42 alyssa: does dEQP-VK.dynamic_rendering.primary_cmd_buff.local_read.depth_stencil_mapping_to_no_index_depth_clear crash for anyone with recent CTS?
18:42 alyssa: seems to be a CTS regression
18:42 alyssa: but I find it .. challenging to find things in gerrit (:
18:42 zmike: I've literally never seen a CTS crash in my life
18:43 alyssa: dies inside
18:43 alyssa: #0 0x0000000002500d4c in vk::createShaderModule(vk::DeviceInterface const&, vk::VkDevice_s*, vk::ProgramBinary const&, unsigned int) ()
18:46 HdkR: zmike: Closing your eyes when the CTS runs doesn't count
18:46 zmike: lavapipe too strong, too handsome to crash
18:47 alyssa: oh that's https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/5565 I guess
18:47 alyssa: I guess I can just .dynamicRenderingLocalReadDepthStencilAttachments = true, it's not like anything uses DRLR right hahaha? *sweats*
18:48 memleak: nope.. backporting simpledrm is not feasible
18:50 vsyrjala: lumag: i guess i don't really care what, as long as it doesn't add some assumptions that trip up when drivers trigger modesets that don't need any extra state checks
19:18 DemiMarie: memleak: why is preempt_rt too slow?
19:19 memleak: latency needs to be lower for this guy to control stepper motors
19:19 memleak: rtai is always lower
19:36 DemiMarie: Have you considered using a completely separate RTOS for the motor control tasks? Staying on Linux 5.4 isn’t going to be feasible in the long term.
19:37 DemiMarie: This could be done with either dedicated cores hidden from Linux or with a separate microcontroller.
19:42 zmike: tarceri: pls make sure to assign your glsl fix today before the branchpoint
19:54 memleak: Yeah, the real fix is to port LinuxCNC to EVL/Dovetail for lower latency than what you'll get with PREEMPT_RT and it supports the later kernels, there's just no support for it in LinuxCNC
19:55 memleak: I'm trying a different approach with the vesa driver, going to do it in gentoo, maybe something screwy is happening with debian
19:55 memleak: I've never seen vesa not work in gentoo, not in my builds anyway
19:56 memleak: brb
20:29 airlied: karolherbst: moving here from other channel, could we move the intel subgroups into a faster to compile header?
20:29 karolherbst: I'm working on pch support
20:29 karolherbst: that should speed it up a lot
20:29 karolherbst: the intel spirv files are generated roughly 8 times faster with pch
20:30 Ristovski: damn, quite a speedup
20:30 karolherbst: soo the plan is to simply precompile the opencl-c.h file
20:30 karolherbst: and then it doesn't matter
20:31 karolherbst: I have it working on the cli, but it's kinda more painful to get it working with libclang...
20:42 karolherbst: it works.. nice
20:45 karolherbst: with PCH: time ninja src/intel/shaders/intel_gfx_shaders.pch src/intel/shaders/intel_gfx{80,90,110,120,125,200,300}_shaders.spv => real 0m0,482s
20:47 karolherbst: without PCH: time ninja src/intel/shaders/intel_gfx{80,90,110,120,125,200,300}_shaders.spv => real 0m1,471s
20:47 karolherbst: the difference in user is even more impressive
20:47 karolherbst: user 0m7,584s => user 0m1,205s
20:47 karolherbst: guess it's waiting more on IO than burning through the CPU
20:48 karolherbst: though the time spent is kinda variable
20:48 psykose: think it's just parallelism (slowest file went from 1,2 => 0,4, bit random), but they're all faster so the user goes way down
20:49 karolherbst: mhhh yeah fair
20:49 karolherbst: I mean, the PCH stuff needs to parse the header once single threaded
20:49 karolherbst: so the actual compile jobs start later
20:49 karolherbst: anyway...
20:49 psykose: means they're even faster than that 0,482 then for the slowest :)
20:49 karolherbst: yeah..
20:49 karolherbst: it's a small win
20:50 psykose: lines up with the -ftime-trace i saw for a lot of the smaller <5s files, they were mostly 50+% frontend
20:50 psykose: the huge 5 minute ones i think won't be impacted that much
20:50 karolherbst: it's just the intel shader stuff, but we do have a couple of more jobs using mesa_clc
20:50 psykose: er, not 5 minutes, 50 s
20:50 karolherbst: we should port the intel raytracing stuff over to mesa_clc as well
20:52 karolherbst: anyway.. best case without PCH: real 0m1,450s user 0m7,324s
20:52 karolherbst: best case with PCH: real 0m0,480s user 0m1,342s
20:52 karolherbst: dj-death, airlied: ^^
20:53 karolherbst: it won't make CI that much faster, but it's something :D
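For reference, the best-case numbers quoted above work out to roughly a 3x wall-clock and 5.5x CPU-time reduction. The sketch below just redoes that arithmetic. The helper names (`parse_time`, `speedup`, `report`) are made up for illustration, and the parser assumes the `time` output format seen in the log, with either a comma or a dot as the decimal separator.

```python
import re

# Matches shell `time` output such as "0m1,450s" or "0m1.450s".
_TIME_RE = re.compile(r"^(\d+)m(\d+)[,.](\d+)s$")


def parse_time(text):
    """Convert a '<min>m<sec>,<frac>s' string into seconds."""
    match = _TIME_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


def speedup(before, after):
    """Ratio of the time without PCH to the time with PCH."""
    return parse_time(before) / parse_time(after)


# Best-case figures from the log: (without PCH, with PCH).
BEST_CASE = {
    "real": ("0m1,450s", "0m0,480s"),
    "user": ("0m7,324s", "0m1,342s"),
}


def report():
    """Return the speedup factor per metric, rounded to two decimals."""
    return {name: round(speedup(before, after), 2)
            for name, (before, after) in BEST_CASE.items()}
```

The larger gain in `user` than in `real` is consistent with the explanation given in the channel: the jobs run in parallel, so wall-clock time tracks the slowest file while CPU time sums over all of them.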
20:53 psykose: amd stuff has a lot of such files too if you wanna try too
20:53 psykose: i think the c++ especially
20:53 karolherbst: this is just for OpenCL C files
20:53 psykose: yea
20:54 karolherbst: host side PCH is kinda a build system mess I don't want to get involved in yet :D
20:54 karolherbst: the rules are funky, even for CL, but the environment is way more controlled
20:55 karolherbst: like at least with clang you can only have a single pch (though you can chain them by linking a new header pulling in a pch)
20:55 karolherbst: anyway.. it's kinda a mess
20:55 karolherbst: now... I need to clean that mess up
21:06 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33292
21:10 dj-death: karolherbst: we'll just delete this stuff soon, don't bother too much ;)
21:10 karolherbst: heh
21:10 karolherbst: though I wanted to have the pch interfaces anyway
21:14 alyssa: karolherbst: what's this PCH thing?
21:14 karolherbst: precompiled header
21:14 alyssa: I see
21:14 karolherbst: tldr: the AST gets dumped to a file and reused
21:17 alyssa: I guess I should review this
21:17 alyssa: Probably tomorrow though, my brain is too fried rn I think
21:18 karolherbst: in theory we could reuse the same PCH file across all of mesa_clc users as long as they use the same CL version and stuff
21:19 karolherbst: could also add more header files into the PCH, but not sure it matters much as the opencl-c.h is kinda the big one here
21:22 alyssa: karolherbst: i'm kinda confused, why is opencl-c.h not already pulled in?
21:23 karolherbst: what do you mean?
21:23 alyssa: what does this have to do with intel_subgroups
21:23 alyssa: aren't we already #include'ing this?
21:23 jenatali: alyssa: Clang has most CL support in a builtin
21:23 karolherbst: depending on supported extensions we either pull in opencl-c-base.h or opencl-c.h
21:23 alyssa: jenatali: ..Cute
21:23 karolherbst: and the latter is like 15x the size or so
21:23 karolherbst: and it slows down compilation speed by a lot
21:23 alyssa: Ok, I see
21:23 airlied: I wonder if we could just move more things to the builtins
21:24 karolherbst: in llvm? yeah, certainly
21:24 alyssa: ^ or just copy paste the subset we care about and include that instead?
21:24 jenatali: Yeah but that introduces a dependency for new LLVM
21:24 alyssa: that might also require llvm changes
21:24 jenatali: And that's... sometimes painful
21:24 karolherbst: yeah, but then we have to maintain our own file
21:25 alyssa: seems .. fine?
21:25 karolherbst: not sure I want to deal with clang internals there
21:25 alyssa: also I feel like I'm missing context, why are we interested in cl_intel_subgroups?
21:25 karolherbst: you enable it in mesa_clc
21:25 alyssa: Uhoh
21:25 karolherbst: but there are other extensions which might pull in the big header file as well
21:25 alyssa: I think I cargo culted that from the old intel_clc
21:26 alyssa: Not sure anything is using it now that GRL is ogne
21:26 karolherbst: yeah.. the raytracing one will need it for sure
21:26 alyssa: gone
21:26 karolherbst: could also disable it...
21:26 alyssa: I guess grl is still in tree..
21:26 karolherbst: let me check how much pch has an impact with intel_subgroups disabled
21:27 alyssa: yeah I think this is just there because of grl
21:27 alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32472
21:28 karolherbst: but I want to enable intel_subgroups in rusticl sooner or later, so I kinda want PCH support anyway
21:28 alyssa: that's fair
21:28 alyssa: pch applies to rusticl online compilation too?
21:28 karolherbst: not yet
21:28 karolherbst: would need to wire it up
21:29 karolherbst: it's a bit more complicated in online compilation, because you need one per combination of arguments, and I'm not sure yet which ones matter
21:29 karolherbst: I know that the CL version matters...
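One way to picture "one PCH per combination of arguments" is a cache keyed on whatever inputs affect the precompiled header. The sketch below is purely illustrative and is not how rusticl or mesa_clc do it. `pch_key` and `PchCache` are hypothetical names. The CL version is in the key because the log says it matters; the extension set and extra arguments are assumptions about what else might.

```python
import hashlib


def pch_key(cl_std, extensions=(), extra_args=()):
    """Derive a stable key from the inputs assumed to affect the PCH.

    Extension order is normalised (assumed irrelevant); extra argument
    order is preserved (assumed relevant).
    """
    parts = [cl_std, ",".join(sorted(set(extensions)))]
    parts.extend(extra_args)
    digest = hashlib.sha256("\0".join(parts).encode("utf-8"))
    return digest.hexdigest()[:16]


class PchCache:
    """Builds each distinct PCH once and reuses it for matching requests."""

    def __init__(self, build):
        # `build` is a caller-supplied callable that produces the PCH
        # (for example by invoking the compiler); hypothetical interface.
        self._build = build
        self._entries = {}

    def get(self, cl_std, extensions=(), extra_args=()):
        key = pch_key(cl_std, extensions, extra_args)
        if key not in self._entries:
            self._entries[key] = self._build(
                cl_std, tuple(sorted(set(extensions))), tuple(extra_args))
        return self._entries[key]
```

Offline builds sidestep this because the argument set is fixed at build time. An online compiler sees arbitrary user-supplied options, so the key has to cover every option that changes how the header is parsed.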
21:32 alyssa: ok, right
21:32 alyssa: nothing uses intel_subgroups with mesa_clc
21:33 alyssa: but. as a design decision, mesa_clc just enables everything to make it easy for mesa drivers to use
21:33 alyssa: and since mesa supports intel_subgroups, .. yeah
21:33 alyssa: probably doesn't make sense to special case disable this ext in mesa_clc, either
21:34 karolherbst: especially if it's not that much slower with PCH, though I'll do some testing on it
21:46 karolherbst: uhh.. after rebase that's going to be a bit more painful to support.. well.. still possible
21:54 karolherbst: yeah.. so intel_subgroups disabled gives us real 0m0,279s and user 0m1,195s
21:57 karolherbst: intel_subgroups disabled with PCH is in the same ball park
21:57 karolherbst: slightly less user, slightly more real
21:57 karolherbst: so I think it will become faster with more users of the PCH
21:58 alyssa: karolherbst: so what should we do?
21:59 karolherbst: well.. I want the clc bits anyway. I think it also makes sense with intel_subgroups disabled, because the compile time when the pch hasn't changed is massively faster in either case
21:59 karolherbst: => faster development
22:00 karolherbst: but it's opt-in anyway, so might as well keep it
22:01 karolherbst: let me remember which extension also needed it...
22:02 karolherbst: ahh yeah
22:02 karolherbst: kernel_clocks
22:02 karolherbst: needs the big file as well
22:03 karolherbst: I'm sure the coop matrix stuff will also land there
22:04 karolherbst: anyway.. I think it's good to have it in place in case we have stronger needs for it
22:06 alyssa: sounds good
22:06 alyssa: will review tomorrow
23:10 mareko: daniels: debian-ppc64el still has LLVM 15; it's OK to stop building radeonsi and radv for it, right?
23:17 mareko: jenatali: I wonder if it's better to build windows drivers with -Dllvm=disabled, that should be fine for AMD drivers
23:17 mareko: in CI
23:18 jenatali: mareko: We build CL components there
23:18 jenatali: Those need LLVM
23:20 jenatali: mareko: I'm attempting to upgrade my local build from 15 to 19. Assuming it looks fine we should be able to bump CI too