00:26zmike: uhhh
00:27zmike: never seen that before
00:28zmike: hmmm I think I can imagine how it might happen
00:30zmike: too bad I'm at the pub
00:31karolherbst: have fun, but also, why are you checking in on IRC while in the pub
00:37Ristovski: one does not waste optimization opportunities, no matter the occasion
00:39zmike: I got a ping, I checked the ping
13:48zmike: jenatali: try https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33285
13:50jenatali: zmike: will do
16:29jenatali: zmike: Yep, fixed it
16:30zmike: jenatali: cool
16:30zmike: does it work for whatever you were testing too?
16:30jenatali: It exhibits the same bug I was seeing with the d3d12 backend and softpipe
16:30jenatali: I.e. not my bug :)
16:30zmike: cool
16:30zmike: wait softpipe?
16:31zmike: or llvmpipe
16:31jenatali: Softpipe. This is an arm64 laptop and I didn't feel like building llvm
16:31zmike: coward
16:31jenatali: Guilty
16:31jenatali: FWIW this is the bug: https://github.com/telegramdesktop/tdesktop/issues/28905
16:32zmike: gross
17:06alyssa: how is host_image_copy supposed to work with sparse? o_O
17:06zmike: it's not
17:07alyssa: then why is there CTS for it :clown:
17:07zmike: or maybe it is and I misremembered
17:09alyssa:returns not_supported and CTS shuts up
17:09alyssa: deeply unserious
17:10zmike: smh how will anyone use this driver
17:11alyssa: holding it wrong
17:13HdkR:aims graphics driver directly at foot
17:30jenatali: ... how do you do sparse with host access at all?
17:31zmike: very carefully
17:33alyssa: jenatali: that is my question
17:50alyssa: I guess lavapipe could do it
18:08memleak: hello, i've been working with X for many years and trying to help a guy out with a really obscure problem and i'm stuck.
18:09memleak: he's using an intel board with an i915 GPU that is too new for the 5.4 LTS kernel, so I helped get his system to use fbdev first with simplefb. that didn't work (no suitable framebuffer found) even though /dev/fb0 was there
18:10memleak: next i tried to get vesa going for him by uninstalling the xorg xserver fbdev driver to help shut vesa up, but now he gets: V_BIOS address 0x0 out of range
18:10memleak: vesa doesn't want to start if fbdev is available and if /dev/fb0 is present so i satisfied those dependencies for him
18:11memleak: never in my life saw this V_BIOS address 0x0 out of range problem before and no idea how to fix.
18:11Ermine: is upgrading to newer kernel an option? There are newer LTS kernels
18:12memleak: it has to be 5.4 because we're using RTAI
18:12memleak: (real-time kernel, preempt_rt not fast enough)
18:13memleak: i maintain the rtai repository for linuxcnc. my only other option looks to be backporting support for meteorlake into 5.4 if i can't get vesa going
18:13Ermine: Well.. I'd try to blacklist any fbdev modules that show up and try to get xorg up with modesetting ddx
18:15Ermine: (so simpledrm driver is in charge on kernel side)
18:15memleak: checking if simpledrm is in 5.4...
18:16memleak: nope, introduced in 5.14
18:16Ermine: ugh
18:17jenatali: alyssa: Do you know how lavapipe does sparse?
18:18Ermine: just in case, there's also #xorg channel
18:22memleak: thanks Ermine checking there too :)
18:22memleak: never saw this memory error come up before
18:29memleak: i'm going to try backporting simpledrm to 5.4 and see how much work it is
18:29alyssa: jenatali: I assume mmap(MAP_FIXED)?
18:29jenatali: Ah yeah, makes sense
18:31alyssa: yeah, more or less https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29408
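A minimal sketch of the mmap(MAP_FIXED) idea mentioned above: reserve an inaccessible address range, then commit individual pages inside it in place. This is an illustration only, not lavapipe's actual code; the helper names (`reserve`, `bind_page`, `demo`) are hypothetical, it calls libc's `mmap` through ctypes, and the `MAP_FIXED` value is hard-coded on the assumption of a 64-bit Linux host.

```python
import ctypes
import mmap

# Call libc's mmap/munmap directly; Python's mmap module cannot place a
# mapping at a caller-chosen address.
libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

MAP_FIXED = 0x10                        # assumption: Linux value
MAP_FAILED = ctypes.c_void_p(-1).value  # (void *)-1
PROT_NONE = 0
PAGE = mmap.PAGESIZE
ANON = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS


def _mmap(addr, length, prot, flags):
    ptr = libc.mmap(addr, length, prot, flags, -1, 0)
    if ptr == MAP_FAILED:
        raise OSError(ctypes.get_errno(), "mmap failed")
    return ptr


def reserve(num_pages):
    """Reserve address space with no access: the 'unbound' sparse resource."""
    return _mmap(None, num_pages * PAGE, PROT_NONE, ANON)


def bind_page(base, index):
    """Replace one reserved page with usable memory at the same address."""
    return _mmap(base + index * PAGE, PAGE,
                 mmap.PROT_READ | mmap.PROT_WRITE, ANON | MAP_FIXED)


def demo():
    base = reserve(4)
    page = bind_page(base, 2)           # only page 2 is backed
    ctypes.memmove(page, b"tile", 4)    # host write into the bound page
    data = ctypes.string_at(page, 4)
    libc.munmap(base, 4 * PAGE)
    return data
```

Touching any page other than the bound one would fault in this sketch, which is the property a CPU driver would rely on to model unbound sparse regions.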
18:38lumag: sima, vsyrjala: any resolution on the https://lore.kernel.org/dri-devel/it2puzcitkui2inz4tmvkpig47jyz2efeq3udzffnqwomf3r3v@5sylpgnvqdxk/ ?
18:42alyssa: does dEQP-VK.dynamic_rendering.primary_cmd_buff.local_read.depth_stencil_mapping_to_no_index_depth_clear crash for anyone with recent CTS?
18:42alyssa: seems to be a CTS regression
18:42alyssa: but I find it .. challenging to find things in gerrit (:
18:42zmike: I've literally never seen a CTS crash in my life
18:43alyssa: dies inside
18:43alyssa: #0 0x0000000002500d4c in vk::createShaderModule(vk::DeviceInterface const&, vk::VkDevice_s*, vk::ProgramBinary const&, unsigned int) ()
18:46HdkR: zmike: Closing your eyes when the CTS runs doesn't count
18:46zmike: lavapipe too strong, too handsome to crash
18:47alyssa: oh that's https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/5565 I guess
18:47alyssa: I guess I can just .dynamicRenderingLocalReadDepthStencilAttachments = true, it's not like anything uses DRLR right hahaha? *sweats*
18:48memleak: nope.. backporting simpledrm is not feasible
18:50vsyrjala: lumag: i guess i don't really care what, as long as it doesn't add some assumptions that trip up when drivers trigger modesets that don't need any extra state checks
19:18DemiMarie: memleak: why is preempt_rt too slow?
19:19memleak: latency needs to be lower for this guy to control stepper motors
19:19memleak: rtai is always lower
19:36DemiMarie: Have you considered using a completely separate RTOS for the motor control tasks? Staying on Linux 5.4 isn’t going to be feasible in the long term.
19:37DemiMarie: This could be done with either dedicated cores hidden from Linux or with a separate microcontroller.
19:42zmike: tarceri: pls make sure to assign your glsl fix today before the branchpoint
19:54memleak: Yeah, the real fix is to port LinuxCNC to EVL/Dovetail for lower latency than what you'll get with PREEMPT_RT and it supports the later kernels, there's just no support for it in LinuxCNC
19:55memleak: I'm trying a different approach with the vesa driver, going to do it in gentoo, maybe something screwy is happening with debian
19:55memleak: I've never seen vesa not work in gentoo, not in my builds anyway
19:56memleak: brb
20:29airlied: karolherbst: moving here from other channel, could we move the intel subgroups into a faster to compile header?
20:29karolherbst: I'm working on pch support
20:29karolherbst: that should speed it up a lot
20:29karolherbst: the intel spirv files are generated roughly 8 times faster with pch
20:30Ristovski: damn, quite a speedup
20:30karolherbst: soo the plan is to simply precompile the opencl-c.h file
20:30karolherbst: and then it doesn't matter
20:31karolherbst: I have it working on the cli, but it's kinda more painful to get it working with libclang...
20:42karolherbst: it works.. nice
20:45karolherbst: with PCH: time ninja src/intel/shaders/intel_gfx_shaders.pch src/intel/shaders/intel_gfx{80,90,110,120,125,200,300}_shaders.spv => real 0m0,482s
20:47karolherbst: without PCH: time ninja src/intel/shaders/intel_gfx{80,90,110,120,125,200,300}_shaders.spv => real 0m1,471s
20:47karolherbst: the difference in user is even more impressive
20:47karolherbst: user 0m7,584s => user 0m1,205s
20:47karolherbst: guess it's waiting more on IO than burning through the CPU
20:48karolherbst: though the time spent is kinda variable
20:48psykose: think it's just parallelism (slowest file went from 1,2 => 0,4, bit random), but they're all faster so the user goes way down
20:49karolherbst: mhhh yeah fair
20:49karolherbst: I mean, the PCH stuff needs to parse the header once single threaded
20:49karolherbst: so the actual compile jobs start later
20:49karolherbst: anyway...
20:49psykose: means they're even faster than that 0,482 then for the slowest :)
20:49karolherbst: yeah..
20:49karolherbst: it's a small win
20:50psykose: lines up with the -ftime-trace i saw for a lot of the smaller <5s files, they were mostly 50+% frontend
20:50psykose: the huge 5 minute ones i think won't be impacted that much
20:50karolherbst: it's just the intel shader stuff, but we do have a couple of more jobs using mesa_clc
20:50psykose: er, not 5 minutes, 50 s
20:50karolherbst: we should port the intel raytracing stuff over to mesa_clc as well
20:52karolherbst: anyway.. best case without PCH: real 0m1,450s user 0m7,324s
20:52karolherbst: best case with PCH: real 0m0,480s user 0m1,342s
20:52karolherbst: dj-death, airlied: ^^
20:53karolherbst: it won't make CI that much faster, but it's something :D
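As a quick check of the best-case numbers quoted above, the speedup ratios can be worked out from the `time` output. This is only arithmetic on the figures from the log; the helper names are made up for illustration, and the parser assumes the comma-decimal `XmY,ZZZs` format shown.

```python
def parse_time(text):
    """Convert a `time`-style string such as '0m1,450s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds.replace(",", "."))


# Best-case figures quoted in the log.
WITHOUT_PCH = {"real": "0m1,450s", "user": "0m7,324s"}
WITH_PCH = {"real": "0m0,480s", "user": "0m1,342s"}


def speedups(before, after):
    """Ratio before/after per metric; values above 1 mean 'after' is faster."""
    return {key: parse_time(before[key]) / parse_time(after[key])
            for key in before}


if __name__ == "__main__":
    for metric, ratio in speedups(WITHOUT_PCH, WITH_PCH).items():
        print(f"{metric}: {ratio:.2f}x")
```

That works out to roughly 3x in wall-clock time and about 5.5x in user time for this particular set of targets.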
20:53psykose: amd stuff has a lot of such files too if you wanna try too
20:53psykose: i think the c++ especially
20:53karolherbst: this is just for OpenCL C files
20:53psykose: yea
20:54karolherbst: host side PCH is kinda a build system mess I don't want to get involved in yet :D
20:54karolherbst: the rules are funky, even for CL, but the environment is way more controlled
20:55karolherbst: like at least with clang you can only have a single pch (though you can chain them by linking a new header pulling in a pch)
20:55karolherbst: anyway.. it's kinda a mess
20:55karolherbst: now... I need to clean that mess up
21:06karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33292
21:10dj-death: karolherbst: we'll just delete this stuff soon, don't bother too much ;)
21:10karolherbst: heh
21:10karolherbst: though I wanted to have the pch interfaces anyway
21:14alyssa: karolherbst: what's this PCH thing?
21:14karolherbst: precompiled header
21:14alyssa: I see
21:14karolherbst: tldr: the AST gets dumped to a file and reused
21:17alyssa: I guess I should review this
21:17alyssa: Probably tomorrow though, my brain is too fried rn I think
21:18karolherbst: in theory we could reuse the same PCH file across all of mesa_clc users as long as they use the same CL version and stuff
21:19karolherbst: could also add more header files into the PCH, but not sure it matters much as the opencl-c.h is kinda the big one here
21:22alyssa: karolherbst: i'm kinda confused, why is opencl-c.h not already pulled in?
21:23karolherbst: what do you mean?
21:23alyssa: what does this have to do with intel_subgroups
21:23alyssa: aren't we already #include'ing this?
21:23jenatali: alyssa: Clang has most CL support in a builtin
21:23karolherbst: depending on supported extensions we either pull in opencl-c-base.h or opencl-c.h
21:23alyssa: jenatali: ..Cute
21:23karolherbst: and the latter is like 15x the size or so
21:23karolherbst: and it slows down compilation speed by a lot
21:23alyssa: Ok, I see
21:23airlied: I wonder if we could just move more things to the builtins
21:24karolherbst: in llvm? yeah, certainly
21:24alyssa: ^ or just copy paste the subset we care about and include that instead?
21:24jenatali: Yeah but that introduces a dependency for new LLVM
21:24alyssa: that might also require llvm changes
21:24jenatali: And that's... sometimes painful
21:24karolherbst: yeah, but then we have to maintain our own file
21:25alyssa: seems .. fine?
21:25karolherbst: not sure I want to deal with clang internals there
21:25alyssa: also I feel like I'm missing context, why are we interested in cl_intel_subgroups?
21:25karolherbst: you enable it in mesa_clc
21:25alyssa: Uhoh
21:25karolherbst: but there are other extensions which might pull in the big header file as well
21:25alyssa: I think I cargo culted that from the old intel_clc
21:26alyssa: Not sure anything is using it now that GRL is gone
21:26karolherbst: yeah.. the raytracing one will need it for sure
21:26karolherbst: could also disable it...
21:26alyssa: I guess grl is still in tree..
21:26karolherbst: let me check how much pch has an impact with intel_subgroups disabled
21:27alyssa: yeah I think this is just there because of grl
21:27alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32472
21:28karolherbst: but I want to enable intel_subgroups in rusticl sooner or later, so I kinda want PCH support anyway
21:28alyssa: that's fair
21:28alyssa: pch applies to rusticl online compilation too?
21:28karolherbst: not yet
21:28karolherbst: would need to wire it up
21:29karolherbst: it's a bit more complicated in online compilation, because you need one per combination of arguments, and I'm not sure yet which ones matter
21:29karolherbst: I know that the CL version matters...
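One way to picture the "one PCH per combination of arguments" problem is a cache keyed on whatever inputs affect the precompiled header. The sketch below is hypothetical and not how rusticl or mesa_clc does it: the function name and the choice of inputs (CL version, enabled extensions, extra arguments) are assumptions, and as noted above only the CL version is known for sure to matter.

```python
import hashlib


def pch_cache_key(cl_std, extensions, extra_args=()):
    """Derive a stable key for a precompiled header from its inputs.

    cl_std:      CL version string, e.g. "CL3.0" (known to matter).
    extensions:  enabled extension names (assumed to matter; order ignored).
    extra_args:  any other compiler arguments (assumed to matter; order kept,
                 since argument order can change the result).
    """
    parts = [f"std={cl_std}"]
    parts += [f"ext={name}" for name in sorted(set(extensions))]
    parts += [f"arg={arg}" for arg in extra_args]
    return hashlib.sha256("\0".join(parts).encode()).hexdigest()


if __name__ == "__main__":
    key = pch_cache_key("CL3.0", ["cl_intel_subgroups", "cl_khr_fp16"])
    print(key[:16])
```

Two compilations can share a PCH only when their keys match, so the fewer arguments that turn out to matter, the more reuse an online compiler would get.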
21:32alyssa: ok, right
21:32alyssa: nothing uses intel_subgroups with mesa_clc
21:33alyssa: but. as a design decision, mesa_clc just enables everything to make it easy for mesa drivers to use
21:33alyssa: and since mesa supports intel_subgroups, .. yeah
21:33alyssa: probably doesn't make sense to special case disable this ext in mesa_clc, either
21:34karolherbst: especially if it's not that much slower with PCH, though I'll do some testing on it
21:46karolherbst: uhh.. after rebase that's going to be a bit more painful to support.. well.. still possible
21:54karolherbst: yeah.. so intel_subgroups disabled gives us real 0m0,279s and user 0m1,195s
21:57karolherbst: intel_subgroups disabled with PCH is in the same ball park
21:57karolherbst: slightly less user, slightly more real
21:57karolherbst: so I think it will become faster with more users of the PCH
21:58alyssa: karolherbst: so what should we do?
21:59karolherbst: well.. I want the clc bits anyway. I think it also makes sense with intel_subgroups disabled, because when the pch hasn't changed the compile time is massively faster in either case
21:59karolherbst: => faster development
22:00karolherbst: but it's opt-in anyway, so might as well keep it
22:01karolherbst: let me remember which extension also needed it...
22:02karolherbst: ahh yeah
22:02karolherbst: kernel_clocks
22:02karolherbst: needs the big file as well
22:03karolherbst: I'm sure the coop matrix stuff will also land there
22:04karolherbst: anyway.. I think it's good to have it in place in case we have stronger needs for it
22:06alyssa: sounds good
22:06alyssa: will review tomorrow
23:10mareko: daniels: debian-ppc64el still has LLVM 15; it's OK to stop building radeonsi and radv for it, right?
23:17mareko: jenatali: I wonder if it's better to build windows drivers with -Dllvm=disabled, that should be fine for AMD drivers
23:17mareko: in CI
23:18jenatali: mareko: We build CL components there
23:18jenatali: Those need LLVM
23:20jenatali: mareko: I'm attempting to upgrade my local build from 15 to 19. Assuming it looks fine we should be able to bump CI too