01:00 Venemo: is there anyone else here who hates that Mesa needs to recompile half the world when you touch nir.h?
01:02 zmike: yes
01:02 Venemo: then this is your lucky day
01:02 zmike: shame that the gpu stack needs a compiler stack
01:02 Venemo: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33439
01:03 zmike: 🤔
01:46 DemiMarie: FL4SHK: is your goal "have fun", "create something that is used by millions", or something in between?
02:05 FL4SHK[m]: Demi: for which of my projects?
02:05 FL4SHK[m]: "have fun" is the number one goal
02:06 FL4SHK[m]: for all my personal projects, but I also want more than me to be interested in my projects
02:06 DemiMarie: Do you want your ISA to be better than all the others?
02:07 FL4SHK[m]: ... frankly it's okay if that's not the case
02:07 FL4SHK[m]: I'm happy to do something that is "good for an FPGA implementation"
02:07 FL4SHK[m]: btw, my CPU's implementation is generated by a CPU generator thing I wrote
02:08 DemiMarie: Oooh interesting!!!
02:08 FL4SHK[m]: I feed the CPU generator an instruction decoder and a list of kinds of instructions
02:08 FL4SHK[m]: then it spits out the core of a CPU
02:08 FL4SHK[m]: it's best used with RISC type CPUs
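For context, the "decoder + list of instruction kinds" input FL4SHK describes can be pictured like this. This is a hypothetical C sketch, not libsnowshouse's actual input format; the type names are made up, and the opcode values happen to mirror RV32I's major opcodes purely for concreteness:

```c
#include <stdint.h>

/* Hypothetical sketch of the "decoder + list of instruction kinds"
 * input described above; the real generator's format will differ. */
typedef enum {
    INSTR_KIND_ALU,     /* register-register integer ops */
    INSTR_KIND_LOAD,    /* memory reads */
    INSTR_KIND_STORE,   /* memory writes */
    INSTR_KIND_BRANCH,  /* control flow */
} instr_kind;

typedef struct {
    const char *mnemonic;
    instr_kind  kind;
    uint32_t    opcode_mask;  /* bits the decoder matches on */
    uint32_t    opcode_value; /* required value of those bits */
} instr_desc;

/* A generator can walk a table like this and emit the decode and
 * execute stages of an in-order RISC-style core. */
static const instr_desc example_isa[] = {
    { "add", INSTR_KIND_ALU,    0x7f, 0x33 },
    { "lw",  INSTR_KIND_LOAD,   0x7f, 0x03 },
    { "sw",  INSTR_KIND_STORE,  0x7f, 0x23 },
    { "beq", INSTR_KIND_BRANCH, 0x7f, 0x63 },
};
```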
02:09 DemiMarie: Do you want to go out of order or stick with in-order?
02:09 FL4SHK[m]: so far, I've only stuck with in-order, but generating an out of order CPU would be cool some time too
02:10 FL4SHK[m]: the generator only supports integer operations directly right now
02:11 FL4SHK[m]: but it's able to do something like a classically microcoded CPU
02:11 FL4SHK[m]: if you want something not directly supported by the generator
02:11 DemiMarie: Now I'm imagining an HTTP server implemented in the FPGA as an offload device.
02:11 FL4SHK[m]: haha
02:12 FL4SHK[m]: you could do something like that
02:12 DemiMarie: The idea being the fastest web server ever.
02:12 FL4SHK[m]: haha
02:13 DemiMarie: Because it is all-hardware in the fast path.
02:13 FL4SHK[m]: the generator is all open source
02:13 DemiMarie: Nice
02:13 FL4SHK[m]: called libsnowshouse on my github
02:13 DemiMarie: Could be useful for custom DSPs
02:14 DemiMarie: Where there is a domain-specific operation one wants to go really fast.
02:14 FL4SHK[m]: honestly, yeah, that's a good point
02:15 FL4SHK[m]: I made it so the FPGA code can actually be a pretty small *and* fast CPU, in terms of both LUT count and clock frequency
02:15 FL4SHK[m]: that took a few weeks of effort
02:17 FL4SHK[m]: the sample CPU I made with the generator reaches 130 MHz on my Arty A7 100T
02:18 FL4SHK[m]: for that CPU, I actually wrote a GCC port and GNU Binutils port
02:18 FL4SHK[m]: but it doesn't have virtual memory support
02:18 FL4SHK[m]: ... that will come eventually
02:19 FL4SHK[m]: along with support for actual cache coherence
02:40 DemiMarie: FL4SHK: What about using the FPGA as a JIT target?
02:40 DemiMarie: Also, will it have a GPU? What about using the FPGA LUTs themselves as the compilation target?
02:44 FL4SHK[m]: Demi: JIT as in JIT compile FPGA code? or JIT compile to machine code for a CPU running on the FPGA?
02:44 FL4SHK[m]: the former is virtually impossible
02:44 DemiMarie: FL4SHK: The first one of course, the second is too mundane.
02:44 FL4SHK[m]: it's virtually impossible in any practical way
02:45 FL4SHK[m]: takes *way way* too long to compile for FPGAs
02:47 FL4SHK[m]: Full-fledged compilation for FPGAs is an NP-hard problem, and it takes my decently beefy laptop a long time to compile the code all the way down to a bitstream
02:48 Venemo: zmike, karolherbst, alyssa, gfxstrand, for splitting off NIR passes from nir.h, what do you think would be the best way? 1 header file per pass (there are already some examples of this), or one header for all passes? or something in between?
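For reference, the "one header file per pass" option Venemo mentions might look roughly like this; the pass name, file name, and include guard are illustrative, not taken from MR 33439:

```c
/* nir_lower_example.h -- hypothetical per-pass header; the name and
 * layout are illustrative, not from the actual MR. */
#ifndef NIR_LOWER_EXAMPLE_H
#define NIR_LOWER_EXAMPLE_H

#include <stdbool.h>

/* A forward declaration lets users of this pass avoid pulling in all
 * of nir.h, which is the point of the split. */
typedef struct nir_shader nir_shader;

#ifdef __cplusplus
extern "C" {
#endif

/* Returns true if the pass made progress, per NIR convention. */
bool nir_lower_example(nir_shader *shader);

#ifdef __cplusplus
}
#endif

#endif /* NIR_LOWER_EXAMPLE_H */
```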
02:48 FL4SHK[m]: the GPU I'm developing for my custom game system is going to be fixed-function, but its CPU will at least have vector instructions
02:48 FL4SHK[m]: I'm implementing the GPU in PipelineC.
02:49 FL4SHK[m]: It's based on my software rasterizer,
02:50 FL4SHK[m]: which was written in C++. Quite a bit of the code isn't that hard to translate over to PipelineC.
02:57 DemiMarie: FL4SHK: I mean something that is *much* less optimized, but still faster than interpreting the code, if possible.
02:59 FL4SHK[m]: maybe something with partial reconfiguration?
02:59 DemiMarie: I was thinking of a greedy algorithm for placement that stops when it runs out of space, and yes.
02:59 DemiMarie: The idea is that the FPGA modifies its own configuration at runtime.
03:00 FL4SHK[m]: Plus the FPGA synthesis and place-and-route software isn't open source
03:00 DemiMarie: You would need an FPGA for which the software is open source
03:00 FL4SHK[m]: yeah
03:00 FL4SHK[m]: or reverse engineered or something
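A toy version of the greedy placement DemiMarie sketches above: take cells in order, drop each into the first free site, and stop when space runs out. All names here are hypothetical, and real place-and-route also has to handle routing, timing, and site compatibility:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy greedy placer; all types and names are hypothetical. */
typedef struct { bool used; } site;

static bool greedy_place(size_t num_cells, int *cell_site,
                         site *sites, size_t num_sites)
{
    size_t next = 0;
    for (size_t c = 0; c < num_cells; c++) {
        /* Scan forward to the first free site. */
        while (next < num_sites && sites[next].used)
            next++;
        if (next == num_sites)
            return false; /* out of space: stop, as described above */
        sites[next].used = true;
        cell_site[c] = (int)next;
    }
    return true;
}
```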
03:01 DemiMarie: Anyway this is getting off-topic, sorry Venemo.
03:01 FL4SHK[m]: ah sorry
03:01 DemiMarie: A Mesa backend for an FPGA, or a DRM driver for an FPGA, would be on-topic though.
04:36 airlied: cmarcelo: just fyi when I played with it a long while ago I think the sycl llama.cpp backend was 4x faster than the mesa vulkan one
04:36 airlied: from memory I think Mesa was roughly 0.5x my CPU and SYCL was 2x my CPU; I think it was on an IGP, but I also had an A770 so memory is a bit fuzzy
04:39 cmarcelo: airlied: thanks. we might have issues on our side, but I think some tweaking can also be done on the shader. I'll try SYCL tomorrow to see the kind of partitioning they are doing for each thread. my suspicion is that in Vulkan we are trying to handle too much data and getting overwhelmed by spilling.
06:15 neggles: DemiMarie: heard of NextSilicon? :P
07:16 tnt: karolherbst: Actually my bad, it also doesn't work with rusticl, I just didn't notice at first: in my case it causes a segfault (it's a bug, if it finds GL but not GLX/EGL), while with rusticl it just ends up without CL/GL sharing, which I hadn't noticed.
07:16 tnt: But using LD_DEBUG=all I can see the lookups done by rusticl also fail to find either glXGetProcAddress or eglGetProcAddress
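For reference, the kind of lookup tnt is describing typically boils down to dlopen/dlsym probing like this; the exact libraries rusticl tries and its fallback order are not shown here, this is just the general shape:

```c
#include <dlfcn.h>
#include <stddef.h>

typedef void (*generic_fn)(void);

/* Simplified sketch of probing for glXGetProcAddress; rusticl's real
 * lookup logic (and the set of libraries it tries) differs. */
static generic_fn find_glx_get_proc_address(void)
{
    void *lib = dlopen("libGLX.so.0", RTLD_LAZY | RTLD_LOCAL);
    if (!lib)
        lib = dlopen("libGL.so.1", RTLD_LAZY | RTLD_LOCAL);
    if (!lib)
        return NULL;

    /* This is the kind of lookup LD_DEBUG=all shows failing. */
    void *sym = dlsym(lib, "glXGetProcAddress");
    if (!sym)
        sym = dlsym(lib, "glXGetProcAddressARB");
    return (generic_fn)sym;
}
```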
07:55 mivanchev: hey, does anyone know where the megadriver is being dlopen'd on X11 when using OpenGL?
07:55 mivanchev: in the gbm backend?
08:26 MrCooper: mivanchev: no, GBM shouldn't be involved in that scenario
08:26 mivanchev: MrCooper, well, can you help me locate the place, starting from dri3_create_screen?
08:27 mivanchev: I see the entry points in dril_dri and maybe that's the megadriver, but I can't find where it's getting dlopen'd
08:27 MrCooper: I'd have to dig myself, sorry
08:27 MrCooper: dril is only used in Xorg
08:27 MrCooper: client side goes directly to libgallium
08:28 mivanchev: yes I figured that much too! thank you!!!
08:29 MrCooper: so are you asking about in Xorg?
08:30 mivanchev: yes, it's Xorg
08:30 mivanchev: I got confused because I found a dlopen in GBM
08:30 mivanchev: that supposedly loads drivers
08:31 MrCooper: I mean, are you asking about the server side or client side?
08:31 MrCooper: that's the GBM backend
08:34 mivanchev: I need to know, when I start glxgears on X11, where Mesa opens the mega driver and accesses https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/targets/dril/dril_target.c#L558
08:34 mivanchev: which is const __DRIextension **__driDriverGetExtensions_##drivername(void);
08:42 MrCooper: it doesn't, dril is only used on the server side
08:43 mivanchev: can you explain to me shortly what "server-side" means in mesa?
08:54 MrCooper: inside the Xorg process
08:56 MrCooper: dril is just a backwards compatibility layer for the Xorg process, it's not used anywhere else
08:57 mivanchev: OK, thought so
08:58 mivanchev: so the question remains where the megadriver is loaded :/
08:58 MrCooper: the "megadriver" is libgallium now
09:22 mivanchev: MrCooper, I guess I'm trying to find out where, somewhere around 'driCreateNewScreen3', the driver name is used to get the initialization methods
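For background, the historic loader pattern that the `__driDriverGetExtensions_##drivername` declaration comes from looked roughly like this (paths and error handling simplified; as MrCooper says, modern client-side Mesa goes directly to libgallium instead):

```c
#include <dlfcn.h>
#include <stdio.h>

struct __DRIextension;

/* Simplified version of the historic DRI loader dance: build the
 * driver path and symbol name from the driver name, then dlopen +
 * dlsym. The search path here is a typical default, not exhaustive. */
static const struct __DRIextension **
load_driver_extensions(const char *drivername)
{
    char path[256], symbol[256];

    snprintf(path, sizeof(path),
             "/usr/lib/dri/%s_dri.so", drivername);
    snprintf(symbol, sizeof(symbol),
             "__driDriverGetExtensions_%s", drivername);

    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (!handle)
        return NULL;

    const struct __DRIextension **(*get_extensions)(void) =
        (const struct __DRIextension **(*)(void))dlsym(handle, symbol);
    return get_extensions ? get_extensions() : NULL;
}
```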
09:54 MrCooper: zamundaaa[m]: I thought I saw somebody somewhere mention that KWin uses drmModeMoveCursor / DRM_IOCTL_MODE_CURSOR(2) for moving the cursor even when otherwise using atomic KMS; I can't seem to find such code on the current master branch though?
10:15 emersion: i don't think it does?
10:19 MrCooper: do you know of any other compositor which does?
10:31 zamundaaa[m]: MrCooper: it did a very long time ago
10:31 MrCooper: why did it stop?
10:32 zamundaaa[m]: The cursor move invalidates previous atomic test results. In practice that caused atomic commits to fail with hardware rotation on AMD for example
10:39 MrCooper: I see
10:43 MrCooper: did it seem to work fine in general with the nvidia driver though?
10:46 emersion: yeah, mixing legacy and atomic is not a good idea
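Concretely, the problematic mix looks like this; `drmModeMoveCursor` and `drmModeAtomicCommit` are real libdrm calls, but the surrounding setup is omitted and the function itself is just a sketch:

```c
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Sketch of the problematic mix: an atomic TEST_ONLY commit validates
 * one state, then a legacy cursor ioctl changes that state behind the
 * atomic API's back, so the real commit can fail despite the test
 * having passed. */
static int flip_with_legacy_cursor(int fd, uint32_t crtc_id,
                                   drmModeAtomicReq *req,
                                   int cursor_x, int cursor_y)
{
    int ret = drmModeAtomicCommit(fd, req,
                                  DRM_MODE_ATOMIC_TEST_ONLY, NULL);
    if (ret)
        return ret;

    /* Legacy ioctl: invalidates the test result above. */
    drmModeMoveCursor(fd, crtc_id, cursor_x, cursor_y);

    /* This commit may now fail even though the test passed. */
    return drmModeAtomicCommit(fd, req, 0, NULL);
}
```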
10:47 MrCooper: that's my only real worry for mutter at this point; it doesn't use test commits much yet (and when that changes, it doesn't seem hard to suspend asynchronous cursor moves between test commits and the real one)
10:47 emersion: chromeos used to do it as well, and in some cases the cursor just stopped being displayed or got stuck
10:48 MrCooper: I mean, it's essentially the same as what happens with Xorg
10:48 emersion: xorg doesn't use atomic so
10:48 MrCooper: I'm aware
10:48 MrCooper: still fundamentally the same interaction between flips and cursor moves
10:49 emersion: if you only set state which was settable with legacy api, maybe
10:49 emersion: if you set some other state, then no
10:49 emersion: the real fix here is to improve the atomic api
10:50 MrCooper: no time for that for mutter 48
12:18 bbrezillon: sima: just some quick feedback regarding yesterday's discussion around syncobj waits. Turns out I was doing one iteration before being blocked, so I'm not sure the race exists
12:21 sima: bbrezillon, hm
12:21 sima: bbrezillon, I guess mildly refactoring that code, pulling out helpers and reducing those monster stacked if conditions would still be good
12:22 sima: because I can't parse that thing anymore :-/
12:25 bbrezillon: agreed
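The shape of the cleanup sima is suggesting, in the abstract: name the stacked conditions in a predicate helper instead of nesting them. This is a generic illustration, not the actual syncobj wait code:

```c
#include <stdbool.h>

/* Generic illustration of the refactor suggested above, not the
 * actual drm_syncobj logic: the struct and helper are made up. */
struct wait_ctx { bool has_fence; bool timed_out; bool all_signaled; };

/* Pulling the stacked conditions into one named predicate... */
static bool wait_should_block(const struct wait_ctx *ctx)
{
    return ctx->has_fence && !ctx->timed_out && !ctx->all_signaled;
}

/* ...turns nested "if (a) { if (!b) { if (!c) { ... } } }" ladders
 * into a single readable call site. */
static int do_wait(struct wait_ctx *ctx)
{
    if (!wait_should_block(ctx))
        return 0;
    /* ... actually block and re-check here ... */
    return 0;
}
```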