02:50Company: so it turns out my shader compile wasn't inflooping
02:50Company: after 12 minutes, the rpi had compiled it
02:52Company: I suspect it's screwing up dead code elimination and inlining too much
04:16karolherbst: dcbaker: fyi, I'll work on a meson/rust/bindgen feature I want to use with 1.3 and it shouldn't take too much time. bindgen-0.64 (with fixes and support for C enums in 0.65) can generate C wrapper files for static inline functions, so one doesn't have to do it manually anymore. I still want to think about how to design the meson side but I think rust.bindgen should return an array of files (0: .rs 1: .c) and adding an optional `output_inline_wrapper` argument which then adds the required args to `bindgen`
04:16karolherbst: or something
04:16karolherbst: this way it stays backwards compatible
04:19karolherbst: but that probably also requires meson to check for bindgen's version.. probably? dunno
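A minimal sketch of the kind of version check mentioned above, not what meson actually does. It assumes `bindgen --version` prints something like "bindgen 0.64.0"; the function names and the "none"/"basic"/"full" labels are hypothetical. The thresholds are the ones from the discussion: 0.64 can generate the wrappers, 0.65 adds fixes and C enum support.

```python
import re

# Thresholds taken from the discussion above (major, minor).
WRAPPERS_MIN = (0, 64)             # can generate static inline wrappers
WRAPPERS_WITH_ENUMS_MIN = (0, 65)  # adds fixes and C enum support


def parse_bindgen_version(version_output):
    """Return (major, minor, patch) found in the version output, or None.

    Assumption: the output contains a dotted version such as "0.64.0".
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def wrapper_support(version_output):
    """Classify wrapper support as 'none', 'basic' (>= 0.64) or 'full' (>= 0.65)."""
    version = parse_bindgen_version(version_output)
    if version is None or version[:2] < WRAPPERS_MIN:
        return "none"
    if version[:2] < WRAPPERS_WITH_ENUMS_MIN:
        return "basic"
    return "full"
```

A build system could call something like `wrapper_support()` on the captured version string and only pass the wrapper-related arguments when the result is not "none".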
08:07mediaim: Yeah well my phone has hardware tap on, it was arranged at overseas, I have not had very much resources to buy a new one.
08:26mareko: zmike: why don't you just use implicit sync provided by the kernel?
08:29mareko: you don't need any synchronization code in userspace, you just need a buffer list per submit (muhahaha)
08:50kode54: what would I need to do to actually list what's happening in an apitrace log?
08:51kode54: somehow, the last API calls in this D3D11 log cause DXVK to crash the GPU when I use xe.ko
10:31karolherbst: dcbaker: https://github.com/mesonbuild/meson/pull/12263
10:48karolherbst: dcbaker, cmarcelo: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25265
10:53karolherbst: anyway, don't depend on this getting merged soon, because after a meson req bump I still want to wait a few weeks even after the release
12:28zmike: mareko: 🤕
12:52mediaim: https://reverseengineering.stackexchange.com/questions/26975/not-able-to-understand-the-c-switch-statement-in-disassembly
12:53mediaim: Such a simple example, those pointer dereferences can be traced well, they end up as loads, that can be modified
13:12UndeadLeech: I'm running into issues on panfrost where dmabuf framebuffers are apparently leaked when using Firefox. Anyone run into something like this?
14:42cmarcelo: karolherbst: cool
17:28Company: good news everyone!
17:28Company: Mesa only takes about an hour to compile on an rpi
17:28Company: the bad news is that it doesn't include the v3d driver by default
17:29Company: but a recompile after enabling only took 15 minutes
17:29Company: but my shader still takes >10 minutes to compile
17:31HdkR: I assume v3d vulkan driver? v3d GL driver seems to be enabled by default on arm platforms at least
17:32Company: I meant the GL driver - but maybe I screwed something up
17:32Company: I did configure some things, and maybe that made autodetection decide to turn it off
17:33HdkR: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/meson.build?ref_type=heads#L137 Default list shows v3d enabled if auto is selected there :)
17:33Company: to match the Fedora build
17:35Company: yeah, meson.build looks good
17:35Company: I'll assume that was my fault
18:08DavidHeidelberg[m]: karolherbst: we could use tinygrad for some performance rusticl testing https://paste.sr.ht/~okias/523dc5ec01e8db7b70374f0a6dd70fdf663ef17f I guess we could have something similar to trace graphs and reports for LLM
18:09DavidHeidelberg[m]: in tinygrad repo, they're testing it with GPT2 which doesn't need 13G of VRAM (I guess so far we're limited on radeonsi on Valve farms, if we want to test)
18:10DavidHeidelberg[m]: also GPT2 model has ~ 500M, which is nicer to the runners
18:22karolherbst: yeah.. that might be a good idea
18:23karolherbst: though I think I'd kinda want to have some CTS testing on hardware first
18:55DavidHeidelberg[m]: karolherbst: looking at release date of OpenCL CTS (2021-03-25) it doesn't look like OCL is any priority... :/
19:01karolherbst: well.. there has been plenty of git activity
19:07karolherbst: anyway, it's what is used to test conformance, and I'd prefer tracking not regressing anything there than performance at this point. I'm doing it locally so far
19:16DavidHeidelberg[m]: karolherbst: deqp has some basic CL support, so it should be doable in CI
19:18karolherbst: mhh, didn't know. Maybe those tests are actually useful
19:25DavidHeidelberg[m]: karolherbst: https://gitlab.freedesktop.org/anholt/deqp-runner/-/commit/909a37a4778157fa50388524cc3c151e0c33bcba
19:32tertl8: i recently purchased the steamdeck for my first ever amd chip
19:33tertl8: my nvidia box wont post
19:33tertl8: and i dont care enough to fix it
19:49karolherbst: DavidHeidelberg[m]: ahh yeah, sadly that's kinda useless, because it doesn't fully implement querying all subtests
19:50karolherbst: I have my own runner to deal with most of that CL CTS nonsense: https://gitlab.freedesktop.org/karolherbst/opencl_cts_runner/-/blob/master/clctsrunner.py?ref_type=heads
20:14DemiMarie: karolherbst: I’m trying to understand what exactly is different between Nvidia GPUs and others. Would an accurate summary be that on most GPUs, it is insecure to allow more than one GPU context to execute at a time, whereas on Nvidia GPUs, it is possible to safely execute code from different contexts concurrently?
20:21karolherbst: You can't execute more than one GPU context anyway. Also I think you are too hung up on that matter. Nvidia simply just has a head start getting all the security in place, there isn't really more to it
20:25DemiMarie: Makes sense
20:26karolherbst: There are some details like some parts being designed with privilege separation in mind. Like userspace doing things on MMIO ranges and not putting anything security relevant in there. But that's just things you end up doing if you have that use case and think it's important enough to care
20:34DemiMarie: As opposed to requiring everything to go through the kernel?
20:34karolherbst: yeah. It's mostly done for context relevant things, like command submission or engine allocation and other rando things
20:39DemiMarie: That makes sense. Compute really cares about kernel bypass because of latency, whereas graphics probably cares less.
20:40DemiMarie: AGX would be the opposite, providing very little security in the GPU and making the kernel responsible for everything.
20:42DemiMarie: Still, as Asahi Lina has shown, one can write a stable and secure driver for it.
20:42karolherbst: yeah.. there are pros and cons to either approach
20:43DemiMarie: Honestly I prefer doing more stuff in the driver and less in the firmware.
20:44karolherbst: yeah.. some of the nvidia stuff relies on the firmware not being broken, but some of it is also actual hardware level stuff.
20:44karolherbst: but then if it's broken you can't fix it
20:48DemiMarie: Yup! And sometimes it is broken (as mwk can testify to).
20:54Lynne: isn't the new sriov stuff useful for isolation?
20:55Lynne: for amd, iirc even preemption is supported
20:55karolherbst: SR-IOV is just making use of hardware features
20:55karolherbst: SR-IOV is just "marketing" so to speak
20:56karolherbst: but to properly do it, you need the hardware being able to slice blocks in a way you can't leak information across those slices
20:56karolherbst: nvidia hardware is quite dynamic there, like you can assign specific amounts of SMs or VRAM controller to each SR-IOV device
21:34soreau: mareko: FWIW, the transparent texture bug only happens when calling glClear twice in a row with glScissor set first to a dimension height of 1. If I skip the call in this condition, the bug doesn't happen. Does that happen to ring any bells?
21:37soreau: of course the texture is rendered after glClear call for the fb
21:37soreau: and if I double up on glClear calls, there is no consequence, so long as one (the first?) isn't a height of 1
21:38soreau: on the scissor
21:42DemiMarie: karolherbst: nvidia hardware also only supports SR-IOV for compute IIUC
21:42DemiMarie: karolherbst: do you plan to support SR-IOV in nouveau once the GSP firmware is supported?
21:47soreau: mareko: but if I double the calls and always set height = 1 for the first time around, there is flickering and garbage for the blur shaders every frame
21:49karolherbst: DemiMarie: maybe? I have no idea how much work needs to be done or if it's all configured via GSP
21:50DemiMarie: karolherbst: I doubt the GSP is involved, inasmuch as the current driver does not use the GSP for it. Nvidia plans on supporting vGPU in the open driver eventually, but states that GSP firmware changes will be required. I wonder if this is because of licensing.
21:51karolherbst: DemiMarie: nah, it's mostly because GSP will handle most of that in the future
21:52karolherbst: and then you essentially do RPC calls to set up the slices
21:52DemiMarie: karolherbst: my concern is that the GSP will handle license enforcement, meaning that vGPU will be unusable in practice
21:52DemiMarie: unless BIGCORP
21:52karolherbst: mhhh...
21:52karolherbst: yeah.. dunno
21:53DemiMarie: Whereas a reverse-engineered driver could simply skip these checks.
21:56karolherbst: sure, but it might not be possible to only use GSP for some bits and not for others... we'll kinda have to see how it all works out. But somebody also has to put in the work and maintain those special paths and everything. But I also kinda don't see how nvidia could enforce any sort of license there
21:58DemiMarie: true
22:03soreau: mareko: but if the height == 1 dimension is set after the first legit glClear scissor box, it's fine always
22:04soreau: wonders if he can make a simple case to reproduce
22:10Company: spirv vs text compiles of shaders can be quite different I guess?
22:10Company: one of my shaders seems 4x slower in Vulkan than GL
22:21soreau: nope, it must be a perfect storm of events for this bug to happen, though it is reproducible for me the same way nearly every time
22:55soreau: mareko: and if I simply glFlush() in between glClear()s, the bug doesn't happen either
23:03mediaim: It's cause it tries to respond in non adequate, but all accounting and considering way, in case you can toggle the dereferences in all the time, so a variables value can be changed this way.
23:04mediaim: So compiler generates those pointers such as jump table compiler codegen.
23:05mediaim: They appear as memory loads in machine code.
23:06mediaim: But cause runtime can give that address to patch up with DBI
23:07mediaim: You can fixup those addresses to point to code section
23:08mediaim: Cause you had already gotten the code section pointer, cause of pointing to function as with function pointer, or pointing to immediate data value
23:09kode54: do random people constantly join this channel to spout nonsense
23:11mediaim: It considers that pointer can be dereferenced all the time any time
23:11mediaim: That would immediately change the variable pointed to it's value
23:11mediaim: Case table switch generates such code
23:13Sachiel: it's not random people
23:13kode54: it's one person
23:13kode54: or bot
23:13kode54: I take it you've had to deal with this person for ages now
23:14Sachiel: yup
23:15mediaim: The compiler expects that pointer can be dereferenced at any time
23:15mediaim: It generates pointer loads at every basic block
23:16kode54: wonder if I got an answer last night, let's check the backlog
23:19mediaim: That itself makes your parser very thin, cause you search for address in the hex value
23:19mediaim: Like regex query you can perform
23:19mediaim: That's how you get in
23:23mediaim: The program is very easy to trace hence
23:23mediaim: You construct a system call change
23:24mediaim: And output constant that is
23:24mediaim: Casted to some value from immediate
23:24mediaim: Next thing you patch up pointers
23:25mediaim: Cause you had read the memories located at code section
23:26mediaim: It's very easy to trace any program cause they call syscalls
23:27mediaim: And when you present their output as const it will be used to point to inst addl immediate
23:28mediaim: You just interpose the syscall if not you get in from interrupt handler too
23:30mediaim: It just tells that now buf is a new value potentially, doing that twice
23:30mediaim: As pointing something at them
23:31mediaim: So first time it generates a constant in immediate
23:31mediaim: Field of instruction
23:31mediaim: Like say incr or add or anything really
23:32mediaim: Add is good cause you do not risk with page fault
23:32mediaim: Mmiotrace is hence pretty lame
23:33mediaim: Cause it's so darn slow
23:34mediaim: So after you have that pointer you start to read the memory contiguously
23:35mediaim: And then in all basic blocks containing the syscalls
23:35mediaim: You can patch the rest of the problem
23:47bl4ckb0ne: mediaim: you're off topic, this is a chan for gpu dev
23:56DemiMarie: can someone ban mediaim?