03:04 mareko: I'm thinking of pinning Mesa threads to 1 core each including the app thread, so that they can't be moved between CPUs
03:06 mareko: benchmark results are too random without it to the point that CPU performance is not testable
03:19 jenatali: That sounds like something reasonable behind an environment variable. I don't know that I agree with that idea for the general case
03:19 airlied: yeah sounds like something for testing, but would probably screw up in the real world
03:20 mareko: drivers could opt out
03:20 mareko: but it should work fine if the thread->core assignment is randomized
03:21 jenatali: Until you get an app that wants to pin its own threads
03:21 jenatali: Or the random assigns multiple app threads to the same core
03:21 mareko: it can do that before creating a GL context
03:22 jenatali: Yeah but that's not supposed to be part of the GL API. What if a context is bound to multiple threads sequentially?
03:22 airlied: mareko: still seems like a bad idea outside of benchmark comparisons
03:23 airlied: screwing with app thread pinning is definitely hostile behaviour
03:23 mareko: apps pinning their threads won't be affected
03:24 airlied: how do you know other apps won't be affected though?
03:24 airlied: like how do you even know what the app thread is
03:24 jenatali: Right. When do you test that condition? Fighting with an app over that seems bad
03:24 airlied: apps can have multiple threads on multiple contexts
03:24 mareko: the app thread calls MakeCurrent
03:24 jenatali: Definitely seems like driconf at best
03:28 airlied: like just the interactions with firefox or chromium make me shudder, lots of processes getting pinned
03:30 mareko: I've been thinking about it for a while, and the problems you are bringing up are solved problems, e.g. multi context and multi process thread distribution, using adjacent cores for each context, randomizing thread distribution within a core complex, not changing thread affinity of app threads that are already assigned to a single core, etc.
03:30 jenatali: Random pinning also could be bad for benchmark determinism. E.g. if (on Windows at least) you pin to a thread that's responsible for DPCs on one run and not on another
03:31 mareko: for starters let's just consider radeonsi and Linux
03:32 airlied: mareko: but mesa isn't the OS scheduler
03:33 airlied: the gpu driver isn't in charge of making those decisions for the whole desktop env
03:33 mareko: apps do
03:33 mareko: libraries do, engines do, runtimes do
03:34 airlied: and we should add to the mess?
03:35 airlied: also I'm sure this isn't common across x86 vendor cpus, and even less so when it comes to non-gaming apps or video transcoding servers etc
03:35 airlied: this isn't the driver's job, it's the scheduler's job, if you want to become a scheduler developer, go do that :-)
03:35 mareko: there is always a cost-benefit ratio to everything
03:36 jenatali: Obviously if you scope it like that, it doesn't impact me so I don't really have any stake here. Still sounds like a bad idea to do generally but I'm open to changing my mind with data
03:36 airlied: like the driver doesn't have enough info to make decisions any better than anyone else, so it's likely outside of some benchmarks it'll make bad decisions just as often as good ones
03:37 mareko: hypothetically yes
03:37 jenatali: Yeah that's my take too, as an OS developer
03:37 airlied: you don't understand anything about the app's thread layout or usage patterns, you also will screw with the OS ability to power down cores
03:37 airlied: power management is screwy enough
03:38 airlied: doing this in the driver is the wrong hammer, now you could maybe do it in something like gamemode
03:39 airlied: where you say I'm launching a game, and gamemode goes and screws all your thread affinities
03:39 mareko: it can do that
03:39 airlied: but I'd think amd vs intel cpus will want different things
03:40 airlied: or at least amd cpus seem to care a lot more about thread locality
03:40 mareko: the pinning would only happen at initialization based on what the affinity mask is, and then it's up to apps
03:41 mareko: generally you want threads to stay on the same core, so that you don't lose cached data
03:42 mareko: other than that, AMD only needs to pin when you have multiple L3 caches, so e.g. 8-core Zen3 doesn't need pinning at all
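A minimal sketch of the kind of topology check described above, assuming Linux sysfs cache topology files; this is illustrative only and not the actual Mesa implementation. Pinning as discussed is only worth enabling when there is more than one L3 domain:

    /* Sketch only: count distinct L3 cache domains via sysfs.
     * Pinning only pays off when this returns more than 1
     * (e.g. multi-CCD Zen parts). Not the actual Mesa code. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int count_l3_domains(void)
    {
       char path[128], buf[256];
       char seen[16][256];   /* enough distinct L3 domains for a sketch */
       int nseen = 0;
       long ncpus = sysconf(_SC_NPROCESSORS_CONF);

       for (long cpu = 0; cpu < ncpus; cpu++) {
          snprintf(path, sizeof(path),
                   "/sys/devices/system/cpu/cpu%ld/cache/index3/shared_cpu_list",
                   cpu);
          FILE *f = fopen(path, "r");
          if (!f)
             continue;               /* no L3 information for this CPU */
          if (!fgets(buf, sizeof(buf), f)) {
             fclose(f);
             continue;
          }
          fclose(f);

          /* count how many unique shared_cpu_list strings we see */
          int known = 0;
          for (int i = 0; i < nseen; i++)
             if (!strcmp(seen[i], buf))
                known = 1;
          if (!known && nseen < 16)
             strcpy(seen[nseen++], buf);
       }
       return nseen;
    }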
03:43 airlied: you just reminded me of issue 8945, probably need to opt llvmpipe out of any pinning to cpus we aren't allowed to use
03:43 airlied: that we do that is also actively user hostile
03:44 mareko: an app pinning its GL-calling thread to a single core is driver hostile
03:46 mareko: the current L3 cache pinning mechanism in Mesa has the best cost-benefit (i.e. pros/cons) ratio of any solution
03:46 airlied: an app can't be hostile to the driver, it can just have bad performance due to decisions it makes
03:46 airlied: if anyone cares they should contact the app developers and fix it
03:46 airlied: just because you have a driver, it doesn't mean you get to fix the world's problems in it
03:47 airlied: like yes it's easier to just hack it and move on, but it will screw up others trying to do the right thing
03:48 mareko: contacting app developers doesn't work :) ok, let's message 1 million app developers about how they should pin threads
03:49 airlied: they don't all do it wrong though do they
03:49 mareko: I think about it as a cost-benefit and ROI situation
03:49 mareko: the cost of doing it correctly in the kernel is too high
03:51 mareko: for a Mesa person that is
03:54 airlied: yeah that's why I'd suggest something like gamemode
03:54 airlied: since it already screws with all those things
03:57 mareko: it's not possible to implement a thread scheduler in a different thread or process that does exactly what Mesa does right now, because pthreads are not visible in /proc, only the PID is, and thus you can't query which CPU a pthread is running on from a different pthread
03:58 mareko: the only pthread that can open its /proc/.../stat is self
03:59 airlied: huh threads are all under /proc/<pid>/task/<tid> ?
03:59 mareko: nope
04:01 mareko: even if you open the stat file of task/tid and verify that read() works, and you pass the fd to another thread, the read() on the other thread fails
04:03 mareko: the last thing worth trying is sending the opened fd of the stat file over a socket to another thread, so that it's properly duped, but I don't know if that would work
04:05 mareko: other than that, you can only open the main thread's (pid) stat file from any thread, not a tid's
04:08 airlied: seems like cpuset could be used for something like that
04:09 mareko: either the kernel or Mesa must do it
04:10 mareko: realistically it all sucks, practically it's the best we have
04:10 airlied: I would think having gamemode configure cpusets or cgroups would possibly be a better idea, when you say stat do you mean status?
04:10 airlied: stat doesn't seem to contain anything we'd want
04:12 mareko: /proc/pid/task/tid/stat contains the CPU number, which is useful when you want to move a set of threads to the same complex
04:13 mareko: but only the thread itself can read it if it's not the main process thread, so it's as good as sched_getcpu
04:14 airlied: it's weird I can read task from any process/thread my user is running
04:14 airlied: cat /proc/*/task/*/stat
04:14 airlied: or do you get 0 for the cpu in that case?
04:15 mareko: tids shouldn't be visible to getdents according to the documentation, so cat shouldn't print them
04:16 airlied: prints them here
04:16 mareko: but I can see them, hm
04:16 airlied: at least for firefox
04:18 mareko: pthread_create -> SYS_gettid in that thread -> trying to open /proc/self/task/tid/stat fails in another thread for me
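For reference, a sketch of the stat read being debated, assuming the proc(5) layout where "processor" (the CPU a thread last ran on) is field 39. For the current thread it should agree with sched_getcpu(); whether other threads' stat files are readable is exactly the open question above. Illustrative only:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Read the "processor" field from /proc/<pid>/task/<tid>/stat. */
    static int last_cpu_of_tid(pid_t pid, pid_t tid)
    {
       char path[64], buf[1024];
       snprintf(path, sizeof(path), "/proc/%d/task/%d/stat", pid, tid);
       FILE *f = fopen(path, "r");
       if (!f)
          return -1;
       size_t n = fread(buf, 1, sizeof(buf) - 1, f);
       fclose(f);
       buf[n] = '\0';

       /* comm (field 2) may contain spaces, so skip past the closing ')'. */
       char *p = strrchr(buf, ')');
       if (!p)
          return -1;

       /* "processor" is field 39; we are positioned after field 2. */
       int cpu = -1;
       int field = 2;
       for (p++; *p && field < 39; p++)
          if (*p == ' ')
             field++;
       if (field == 39)
          sscanf(p, "%d", &cpu);
       return cpu;
    }

    /* Usage from the current thread:
     *   int cpu = last_cpu_of_tid(getpid(), (pid_t)syscall(SYS_gettid));
     * which should match sched_getcpu(). */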
04:19 mareko: I'm going to pursue the single core pinning idea as driconf at least
04:21 mareko: and possibly make it a default for radeonsi except pinning the app thread
04:22 mareko: zink will likely follow because it likes numbers
04:24 airlied: just get some numbers that aren't just specviewperf
04:25 airlied: real games showing actual fps changes is more likely to persuade people, or persuade phoronix to try it out :-P
04:25 jenatali: +1 multiple benchmarks, ideally from real apps would be great to see
06:37 HdkR: As an additional topic around CPU pinning being a bad idea that I didn't see. In a Big.Little world that we are in, pinning randomly can have catastrophic performance implications
06:38 HdkR: Obviously not a concern when the system has homogeneous CPU cores or cache layout, but Intel, AMD, and ARM devices are all changing this and it should be left to the kernel to schedule properly
06:44 mareko: airlied: there are not many GL games that still work anymore, most of Feral's games don't run on recent distros
06:50 HdkR: As a "fun" aside, on an AMD Phoenix system with Zen4 + Zen4c cores, userspace can't determine which cores are which. CPUID returns identical data. So if you pin on to a Zen4c core, you immediately lose performance.
06:51 HdkR: At least on Intel systems you can query which cores are "Atom" versus "Core"
06:52 HdkR: Getting randomly pinned on Meteor Lake's low power E-Core cluster is going to have a really bad time. Similar to getting randomly pinned to a Cortex-A53
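A sketch of the Intel-side query HdkR mentions, assuming the CPUID hybrid flag (leaf 7, EDX bit 15) and the hybrid core-type leaf 0x1A; CPUID reports for whichever core the code is currently running on, so the thread has to already be pinned there. GCC/Clang builtins:

    #include <cpuid.h>

    /* Returns a description of the core type of the CPU this thread is
     * currently executing on (hybrid Intel parts only). Sketch only. */
    static const char *current_core_type(void)
    {
       unsigned eax, ebx, ecx, edx;

       /* CPUID.07H:EDX[15] advertises a hybrid part. */
       if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) ||
           !(edx & (1u << 15)))
          return "not hybrid";

       /* CPUID.1AH:EAX[31:24] is the core type of the current core. */
       if (!__get_cpuid(0x1a, &eax, &ebx, &ecx, &edx))
          return "unknown";

       switch (eax >> 24) {
       case 0x20: return "Atom (E-core)";
       case 0x40: return "Core (P-core)";
       default:   return "unknown";
       }
    }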
06:52 mareko: that's all nice, but it doesn't help the fact that there is no better solution
06:52 mareko: what Mesa does right now is the best thing we have
06:53 mareko: it's so good that you don't want to run without it in most cases
06:54 HdkR: Indeed, the best solution ends up leaving the scheduling to the kernel knowing the downsides that get incurred there
06:56 mareko: it doesn't leave it to the kernel
06:57 HdkR: You've got the special case for AMD cpu systems that pin to a single L3 arrangement, but everything else should fall back to kernel affinity scheduling?
06:58 mareko: the GL calling thread is free
06:58 mareko: 3 driver threads follow it using sched_getcpu and dynamic thread affinity adjustments
06:59 mareko: enabled by radeonsi and zink
07:00 ishitatsuyuki: there is cache affinity, which mareko just described and is already implemented. then there is also pinning the bottlenecked / single-threaded thread to the faster core, which doesn't seem to be widely done or getting plumbed yet
07:00 mareko: the affinity changes based on which core the GL calling thread is occupying
07:01 mareko: it actually looks pretty awesome in the gnome system monitor if you set the same color to CPUs of the same L3, only the cores of the same color are utilized, and when the app thread jumps to a different L3, the Mesa threads jump to that L3 too
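A simplified sketch of that follow-the-app-thread behaviour, assuming Linux sysfs topology files and glibc's pthread_setaffinity_np; the real radeonsi/util code differs, this only illustrates the mechanism of sampling sched_getcpu() on the GL thread and re-pinning the driver threads to the matching L3 domain:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    /* Build the cpu_set_t of all cores sharing an L3 with "cpu". */
    static int l3_mask_of_cpu(int cpu, cpu_set_t *mask)
    {
       char path[128], buf[256];
       snprintf(path, sizeof(path),
                "/sys/devices/system/cpu/cpu%d/cache/index3/shared_cpu_list", cpu);
       FILE *f = fopen(path, "r");
       if (!f)
          return -1;
       if (!fgets(buf, sizeof(buf), f)) {
          fclose(f);
          return -1;
       }
       fclose(f);

       /* Parse a list like "0-7,16-23". */
       CPU_ZERO(mask);
       for (char *p = buf; *p; ) {
          int lo, hi;
          if (sscanf(p, "%d-%d", &lo, &hi) == 2) {
             for (int i = lo; i <= hi; i++)
                CPU_SET(i, mask);
          } else if (sscanf(p, "%d", &lo) == 1) {
             CPU_SET(lo, mask);
          }
          p = strchr(p, ',');
          if (!p)
             break;
          p++;
       }
       return 0;
    }

    /* Called from the GL calling thread: move a driver thread to whatever
     * L3 domain the app thread is currently running on. */
    static void follow_app_thread(pthread_t driver_thread)
    {
       cpu_set_t mask;
       if (l3_mask_of_cpu(sched_getcpu(), &mask) == 0)
          pthread_setaffinity_np(driver_thread, sizeof(mask), &mask);
    }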
07:03 mareko: the next step is to prevent the app thread from jumping to a different L3, and an even stronger solution would be to disallow threads from moving between cores completely
07:05 mareko: without that, performance is still too random sometimes
07:06 mareko: up to -15% random
07:07 mareko: we'll hardcode this and big.LITTLE stuff in Mesa if we have to
07:10 HdkR: I guess I'll revisit this topic once I get caught by a game getting locked to a Cortex-A520 cluster
07:11 mareko: it's only applied to Zen CPUs for now
07:11 HdkR: That's good
07:12 mareko: usually you get 1 person trying to improve the situation by actually writing the code and 10 naysayers whose only job is to complain
07:13 HdkR: I can confirm that I'm a naysayer
07:15 karolherbst: airlied: fyi, llvmpipe breaks on llvm-18
07:15 karolherbst: "Callsite was not defined with variable arguments! ptr @llvm.coro.end"
07:15 mareko: Valve and low-mid APUs aren't even affected because you need a Zen CPU AND at least 2 L3 caches (12+ cores) to get the thread pinning code enabled
07:16 mareko: on Zen 1-2 you need 6+ cores
07:17 HdkR: and Phoenix isn't affected because it shares L3 due to monolithic die
07:18 mareko: it has 8 cores max, so 1 L3
07:22 airlied: mareko: also we have a lot more apps using GL now than previously with gtk4 all apps will mostly use GL, not sure we want to pin every single desktop app to misc cores
07:23 airlied: again you might get better perf in one game, but are screwing the whole power management
07:23 airlied: like you would likely want to migrate some background tasks to perf cores, having mesa pin them is hostile
07:25 airlied: saying nobody else wants to write code to screw with cpu affinity in a gpu driver project is a bit disingenuous
07:25 airlied: nobody should be writing cpu affinity screwing code in a gpu driver project, we don't have the expertise to make those sort of decisions
07:25 mareko: yeah gtk4 changes the game
07:25 airlied: we might move gtk4 to vulkan sooner though, but initially will be gl4 based I think
07:26 mareko: the kernel folks apparently don't have the expertise either, or maybe it's too difficult to implement it there
07:26 airlied: I'm sure there are lots of scheduler people who would love to talk about a game optimised scheduler in the kernel :-P
07:26 airlied: kernel graphics folks probably don't, you'd want scheduler people
07:26 mareko: that's who I mean
07:27 airlied: I'd have thought amd would have some cpu sched people who were big on getting ccix locality right
07:27 airlied: like cluster aware scheduler was one thing in that area
07:27 mareko: it's been 7 years since Zen launched, where have the sched people been all this time?
07:28 airlied: https://www.phoronix.com/news/AMD-Linux-L2-Cluster-Scheduler
07:28 airlied: I'm also aware that even if the kernel improves nobody will notice with apps if userspace overrides things anyways
07:29 mareko: trust me, the kernel does nothing today, I can test it immediately
07:30 airlied: I'm going to guess nobody has done much work outside of EPYC cpus
07:30 airlied: at AMD at least
07:31 mareko: yep, nothing
07:33 airlied: seems like the cluster scheduler guy at amd would be worth talking to to see if he has any interest in ryzen
07:34 mareko: Linux 6.5.0, the Mesa thread pinning improves performance by 11% in a simple open source game I just ran
07:34 airlied: esp since it seems to need L3 clusters
07:44 mareko: airlied: if Mesa keeps the app thread alone and only prevents its own threads from moving, the kernel scheduler will be free to schedule the rest of the system around those
07:44 mareko: also those threads would be mostly idle with gtk apps
07:45 mareko: it's worth exploring as an improvement
08:08 MrCooper: mareko: instead of going behind the backs of the kernel scheduler and user, you should work toward adding UAPI which gives the former the information to do a better job
08:09 mareko: not my expertise
08:09 MrCooper: too bad then, this definitely isn't Mesa's business though
08:10 mareko: we've already talked about that it is userspace's business though
08:10 MrCooper: I disagree
08:10 mareko: you can
08:12 mareko: the affinity API exists precisely for this
08:13 MrCooper: nope
08:14 MrCooper: see e.g. https://gitlab.freedesktop.org/mesa/mesa/-/issues/8945 for one reason why this can't be handled by Mesa
08:15 MrCooper: the API exists for the user to control which cores an application may use
08:15 airlied: yeah that bug shows that mesa is being actively hostile to other users in places where they've directly requested it not be
08:16 MrCooper: what's needed is new UAPI which tells the kernel scheduler which threads should be kept closely together
08:17 MrCooper: I pointed this out to mareko years ago
08:22 daniels: if it's a userspace rather than kernel responsibility, then why aren't apps already doing it?
08:23 daniels: why Mesa rather than glibc or systemd or PipeWire or libwayland-client?
08:44 mareko: apps are doing it, game engines definitely do it, but not for driver threads
08:47 mareko: any app that cares about multi-core performance has to do it, like HPC apps
08:48 mareko: MrCooper: there are other people who can do programming, I don't have to do everything
08:49 MrCooper: indeed, not everything can be solved in Mesa
08:51 mareko: you can show your disagreement by writing code, wouldn't that work better for everybody?
08:52 jrayhawk: https://people.freedesktop.org/~cbrill/dri-log/ should be working again
08:52 mareko: that gitlab issue is for swrast_dri.so, which doesn't use any of my Mesa thread pinning, so I don't know why people are even talking to me about it
08:53 mareko: other than cpu_detect
08:54 ccr: perhaps that fact should be mentioned in the issue(?)
08:54 mareko: yes
08:54 MrCooper: mareko: can you not try to make me responsible for solving your issues? I'm not the one who's proposing to make Mesa do something it shouldn't
08:58 mareko: MrCooper: There are no issues for me to solve. What we have in Mesa gives me 11% more CPU performance. That's what I care about. Mesa should absolutely continue doing that if there is no other alternative. If you disagree, that's on you.
08:58 MrCooper: guess I'll have to live with that
09:02 mareko: if there are problems with apps, we'll deal with them individually
09:02 MrCooper: if there's no issue to solve, you don't need to do the thing which started this discussion
09:07 mareko: I think you are still bummed out by the fact that my initial implementation broke apps and we had an argument about it, but the current implementation is solid and has been enabled since 2020, adding up to 33% CPU performance, which is like a multi-generational uplift
09:09 MrCooper: no, it's always been clearly the wrong way to achieve this
09:09 mareko: actually since 2018 even
09:10 MrCooper: so you've had 6 years to get in touch with kernel scheduler developers
09:10 mareko: I have, trust me
09:10 mareko: well, AMD ones, that is
09:31 MrCooper: mareko: did you propose to them adding UAPI to tell the kernel scheduler which threads should be kept closely together?
09:32 MrCooper: that might result in even better performance than what Mesa does now, which fights the scheduler and lags behind it
09:49 mareko: they wouldn't even talk about doing anything about the situation
09:50 mareko: it doesn't fight the scheduler, in fact, it complements the scheduler very well and gives the scheduler a lot of freedom because the affinity masks have up to 16 bits set
09:57 mareko: apps are completely unaffected by the current behavior because Mesa doesn't pin threads, it constantly moves them where app threads are scheduled
09:58 mareko: and it only moves its own threads, not app threads
09:59 mareko: a kernel scheduler wouldn't be more app-friendly than that
10:02 mareko: airlied: ^^
10:06 mareko: if an app pins its GL context to an L3 cache, Mesa is nice and moves its threads under that L3 cache too, it's the nicest thing Mesa could do for apps :)
10:11 lynxeye: mareko: The kernel scheduler could proactively move the driver thread to the same L3 cache before the next wakeup when it moves the app thread if it has the information that app and driver thread are closely cooperating.
10:15 lynxeye: it could even move the driver thread closer to the app thread (like same l2) if cpu capacity allows. Also L3 cache isn't the only thing to consider, on a multi-cluster setup like many modern ARM systems you want the threads on the same cluster. The kernel already has all the required topology information, Mesa would need to learn this information for each system.
10:24 MrCooper: lynxeye++
10:25 MrCooper: also, Mesa moving its threads might make the scheduler want to move the application thread somewhere else again in some cases
10:25 MrCooper: that's fighting, not cooperation
11:36 mareko: MrCooper: some of your statements are correct, but you need to choose better words and remove personal biases
11:37 MrCooper: take a look in the mirror
11:37 mareko: now you're just being an asshole
11:37 MrCooper: I could say the same about you
11:38 mareko: I try to use logic, not personal opinions
11:38 MrCooper: not a good word though
11:38 MrCooper: that makes two of us then
11:38 mareko: airlied: you owe me an apology for calling it "app hostile" while it's arguably friendly to apps, and for linking an unrelated ticket
11:39 mareko: a kernel solution would be better, but not by much
11:40 mareko: it wouldn't be very different from what's being done now
11:48 MrCooper: he doesn't owe you anything, certainly not for any of that
11:49 MrCooper: lynxeye explained the difference well
11:56 pq: mareko, please, be nicer than this.
12:04 mareko: MrCooper: there is nothing "app hostile" in Mesa right now, that was undeserved
12:06 mareko: MrCooper: what lynxeye said is correct, but it wouldn't conflict with the current Mesa code, in fact, it would actually work very well with it because it would cause the Mesa code to have no effect
12:07 mareko: I'm not seeing here what you don't like about it
12:09 mareko: yes, we agree that the kernel should do it instead, but it would be more of the same thing, that's cooperation
12:12 pq: mareko, the discussion seemed to be about the pinning you proposed to *add* into Mesa, and not about what Mesa already does.
12:12 pq: mostly
12:13 pq: It seems to me these two different topics are being confused.
12:21 mareko: pq: that seems true for most people, but MrCooper really doesn't like what Mesa currently does, and I have reason to believe that's because it's a continuation of our heated debate from 2018
12:23 pq: alright
12:24 pq: most of the discussion here did look like it was about adding even more CPU pinning, though
12:25 mareko: yes, that is being considered
12:28 mareko: pinning of the app thread wouldn't be friendly, but pinning and moving Mesa threads is only Mesa's business, nobody else's in userspace
12:31 pq: I'm not sure... some cgroup CPU load balancer might very well want to pin threads differently
12:34 mareko: hiding cores from the process will work, but as long as they are visible and the affinity API allows anything and Mesa created those threads, it can do whatever
12:36 mareko: so if you want something different, lower the privileges of the process
12:37 pq: that seems very inconvenient, and I wouldn't know how to do that
12:56 mupuf: robclark: your subscription to igt-dev was removed
13:51 robclark: mupuf: gah... why do we even still use email list for igt?
13:54 pinchartl: robclark: I didn't know you preferred usenet over e-mail
13:57 robclark: heh, well email is getting less and less useful.. not sure about switching to usenet but gitlab would seem like a no-brainer for igt
14:00 mupuf: robclark: yeah, Intel dropped the ball on this..
14:29 any1: Maybe we can vote on what to call the color format property here? Some candidates are "color format" and "force color format".
14:30 any1: sima, pq, emersion, swick[m]: ^
14:41 mripard: color format seems like the most consistent choice
14:55 pq: looking at CTA-861-H, I think color format is the right wording
14:57 pq: hmm, https://drmdb.emersion.fr/properties/3233857728/Output%20format
14:57 pq: implemented only by vc4, wonder what it does
14:58 emersion: on Linux 6.1.0-rpi6-rpi-2712
14:58 pq: so might be downstream? deleted?
14:58 emersion: maybe grep the kernel?
14:59 emersion: it might be a downstream patch, yeah
14:59 pq: are prop names guaranteed to be found in the source as-is verbatim?
15:00 emersion: would be quite confusing if not
15:00 pq: well, yes, it would :-)
15:00 emersion: the kernel tries to never split log messages on multiple lines to make them greppable, for instance
15:01 ccr: matching via md5 sum or rot-13
15:01 pq: and they would be in drivers/gpu/drm, not in headers?
15:03 pq: I can't find 'Output format' in 6.7
15:03 emersion: that's what i'd expect
15:04 mripard: https://github.com/raspberrypi/linux/commit/7b66d84478c9dec27ca47c9fc0644e524c536570
15:04 mripard: it's downstream
15:05 pq: "Color format", "Color Format", "color format", or perhaps colour? or the screaming all-caps version
15:06 pq: "hdmi_output_format" is another downstream version, it seems
15:06 mripard: ALL HAILS THE COLORS
15:07 pq: and "color_format"
15:08 emersion: extra points if you manage to add a new case style
15:08 pq: ha, "color_format" and "hdmi_output_format" are both from rockchip downstream
15:08 pq: middle-case
15:08 pq: ...not
15:08 emersion: i think we're missing camelCase
15:09 emersion: ofc we're missing emoji case as well
15:10 emersion: and zero-width-space-case
15:10 pq: any1, I'd say, pick any capitalization/spelling of "color format" you like, as long as it's not found in https://drmdb.emersion.fr/properties?object-type=3233857728 (just to be paranoid).
15:11 * emersion shrugs
15:11 emersion: if there's a conflict, it's on downstream
15:11 any1: pq: hehe, good point.
15:11 pq: any1, I wouldn't include "force" in it, because "force auto" would be odd, right? And props are always force anyway.
15:11 any1: emersion: Yeah, but there's no point in making their lives difficult. ;)
15:12 emersion: yeah, if it doesn't make the name ugly
15:12 emersion: i suppose it's a good thing the naming scheme is the wild west in the end
15:12 pq: I think all the existing downstream ones are ugly enough, "color" is not popular
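For illustration, a hypothetical kernel-side sketch of attaching such a connector property; the property name and the enum values (auto plus RGB/YCbCr variants) are assumptions here, since that is precisely what is being bikeshedded above:

    #include <drm/drm_connector.h>
    #include <drm/drm_property.h>

    /* Hypothetical value set for the sketch; names/values not decided. */
    static const struct drm_prop_enum_list color_format_list[] = {
    	{ 0, "auto" },
    	{ 1, "rgb" },
    	{ 2, "ycbcr444" },
    	{ 3, "ycbcr422" },
    	{ 4, "ycbcr420" },
    };

    static int attach_color_format_prop(struct drm_connector *connector)
    {
    	struct drm_property *prop;

    	prop = drm_property_create_enum(connector->dev, 0, "color format",
    					color_format_list,
    					ARRAY_SIZE(color_format_list));
    	if (!prop)
    		return -ENOMEM;

    	/* default to "auto", i.e. let the driver pick */
    	drm_object_attach_property(&connector->base, prop, 0);
    	return 0;
    }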
15:13 bl4ckb0ne: what about camelsnake_Case
15:14 emersion: if there is a conflict just pick "сolor format" (yes c ≠ с)
15:14 pq: that's mean
15:14 emersion: i love UTF-8
15:15 bl4ckb0ne: thats just vile
15:15 emersion: bl4ckb0ne: so far we've managed to not be inconsistent inside a prop for the case style
15:15 bl4ckb0ne: ship it
15:15 emersion: that's the new milestone i suppose
15:26 any1: Thanks people, looks like we'll be calling it "цолор формат"
15:26 ccr: :P
15:27 ccr: best to use unicode smileys
15:27 ccr: emojis
15:28 emersion: "🎨⚙️"
15:28 bl4ckb0ne: just call it colour fourmat
15:32 any1: We also haven't exhausted names of famous painters like Michelangelo, Rembrandt and Bob Ross.
15:34 pinchartl: robclark: there's an ongoing trend to care less and less about @gmail.com for mailing lists. not sure if that applies to @google.com too :-)
15:35 pinchartl: Konstantin has been telling people that they should switch to a different e-mail provider to subscribe to mailing lists
15:37 mripard: lei is also a great solution
15:37 robclark: switching to a new email provider is work and probably expense for me.. I'm more likely to not bother since more of the stuff I care about is gitlab these days
15:37 mripard: (and it's free!)
15:38 pinchartl: lei is really good, yes
15:39 pinchartl: robclark: if it works for you. you will be missed :-)
15:39 ccr: gmail is apparently getting rather annoying due to their spam filtering/refusal to talk with mail servers on some seemingly arbitrary set of rules
15:40 robclark: well, I mean mesa shifted off of email, so that mainly leaves dri-devel... and well, igt.. but igt is already hosted at gitlab
15:41 robclark: yeah, I mean gmail can be annoying.. but email is pita for workflow.. we might as well be faxing patches
15:46 mripard: deal
16:29 Company: if glTexImage2D(..., GL_RGBA8, GL_BGRA, ...) is accepted by Mesa's GLES - is that a bug that I should file?
16:30 Company: because nvidia rejects it, and afaics nobody updated https://registry.khronos.org/OpenGL/extensions/EXT/EXT_texture_format_BGRA8888.txt to something more modern than GLES 2
16:35 jenatali: I recall seeing a Mesa extension for something along those lines
16:36 Company: I like that Mesa is not so annoying and allows the GL formats with GLES
16:36 Company: but I don't like that when I code against Mesa, I can end up with code that doesn't work with nvidia
16:37 Company: so I'm never quite sure what the best approach is
16:37 Company: and of course, I might have missed yet another extension somewhere
16:42 daniels: yeah, those extensions have some holes in them wrt ES3's sized internal formats
16:42 daniels: we need another extension to cover the gap
16:49 daniels: I discovered that in https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1382
17:13 Company: daniels: is that only BGRA or is that something more common?
17:14 daniels: Company: only BGRA8
17:14 Company: I'm wondering if it's good enough to add an if (BGRA && is_gles()) internal_format = GL_BGRA;
17:14 daniels: yeah ... I decided earlier today that I was just going to do that
17:16 daniels: because even when we do get the new extension, we still need to run on systems without it
17:17 Company: yeah, a new extension is not gonna help
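A minimal sketch of that fallback, assuming GLES3 headers plus gl2ext.h for GL_BGRA_EXT (which has the same token value as desktop GL_BGRA); function and parameter names are illustrative:

    #include <GLES3/gl3.h>
    #include <GLES2/gl2ext.h>

    static void
    upload_bgra8(GLsizei width, GLsizei height, const void *pixels,
                 GLboolean is_gles)
    {
       /* Desktop GL accepts a sized GL_RGBA8 internal format with BGRA data;
        * on GLES, EXT_texture_format_BGRA8888 wants the unsized GL_BGRA_EXT
        * for both internalformat and format. */
       GLenum internal_format = is_gles ? GL_BGRA_EXT : GL_RGBA8;

       glTexImage2D(GL_TEXTURE_2D, 0, internal_format, width, height, 0,
                    GL_BGRA_EXT, GL_UNSIGNED_BYTE, pixels);
    }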
17:17 gfxstrand: Ah, Arm fixed-rate compression. That isn't scary at all. :sweat:
17:18 daniels: gfxstrand: it's not scary in and of itself; what's scary is that it's actually required
17:18 gfxstrand: Wait, what?!?
17:20 daniels: gfxstrand: RAM has somehow got rather more expensive over time
17:21 daniels: so we have at least one client who needs AFRC because their price point dictates very little RAM, but the feature demands dictate large enough buffers that you need lossy compression
17:21 gfxstrand: *sigh*
17:23 gfxstrand: I mean, I'm all for reducing waste but, uh...
17:23 gfxstrand: How good are the compression rates? I guess it's probably better to do lossy compression than just throw 565 at it and call it a day.
17:24 gfxstrand: For as much as I kinda hate it, lossy render compression is a really neat tiler trick.
17:24 RSpliet: "visually lossless", according to their marketeers
17:25 gfxstrand: Well, yes. Everything's always "visually lossless". That's what they say about all those H.265 artifacts on cable TV, too. :-P
17:25 RSpliet: compression up to 4:1 apparently
17:26 Company: AFRC is configurable apparently: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkImageCompressionFixedRateFlagBitsEXT.html
17:27 Company: I think that's my new favorite Vulkan definition
17:29 daniels: it's not just that it's configurable, it's that you're _required_ to configure it to use it, rather than it just occurring without you wanting it, which is definitely a good thing
17:29 daniels: (AFBC, which is actually lossless, given that it does the CCS trick of only saving memory bandwidth rather than space, just happens automatically)
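A sketch of that explicit opt-in, assuming VK_EXT_image_compression_control is enabled on the device; the image format and the 2 bits-per-component rate are arbitrary choices for illustration:

    #include <vulkan/vulkan.h>

    static VkResult
    create_fixed_rate_image(VkDevice device, uint32_t width, uint32_t height,
                            VkImage *image)
    {
       /* Ask for at most 2 bits per component; the implementation may pick a
        * less aggressive rate (or none) if this one isn't supported. */
       VkImageCompressionFixedRateFlagsEXT rates[1] = {
          VK_IMAGE_COMPRESSION_FIXED_RATE_2BPC_BIT_EXT,
       };

       const VkImageCompressionControlEXT compression = {
          .sType = VK_STRUCTURE_TYPE_IMAGE_COMPRESSION_CONTROL_EXT,
          .flags = VK_IMAGE_COMPRESSION_FIXED_RATE_EXPLICIT_EXT,
          .compressionControlPlaneCount = 1,
          .pFixedRateFlags = rates,
       };

       /* Fixed-rate compression is only considered because this struct is
        * chained into pNext; without it, images stay lossless. */
       const VkImageCreateInfo info = {
          .sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
          .pNext = &compression,
          .imageType = VK_IMAGE_TYPE_2D,
          .format = VK_FORMAT_R8G8B8A8_UNORM,
          .extent = { width, height, 1 },
          .mipLevels = 1,
          .arrayLayers = 1,
          .samples = VK_SAMPLE_COUNT_1_BIT,
          .tiling = VK_IMAGE_TILING_OPTIMAL,
          .usage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT |
                   VK_IMAGE_USAGE_SAMPLED_BIT,
          .sharingMode = VK_SHARING_MODE_EXCLUSIVE,
          .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
       };

       return vkCreateImage(device, &info, NULL, image);
    }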
17:30 daniels: I haven't done much intensive staring at it, but I'm told that you do really notice AFRC visually when you push it down to 2bpc :P
17:30 Company: AFRC is just cutting off bits, no?
17:31 MrCooper: man, CMake sure is sloooow compared to meson
17:31 daniels: Company: it's far smarter than just decimating precision
17:32 Company: is there a description of how it works?
17:32 daniels: the only public sources I know of are patents, and I don't read patents as a rule
17:34 RSpliet: I imagine they pulled some tricks from texture compression algorithms. Well, not ASTC of course, the amount of computation you need for compressing that is bonkers, but simpler algorithms
17:34 Company: then I'll doubt the "far smarter" part for now
17:36 Company: the biggest innovation in there seems to be the guaranteed size reduction
17:36 RSpliet: https://en.wikipedia.org/wiki/Ericsson_Texture_Compression probably gets you a long way. Not sure what that means for image quality, but I'd assume it's tricks like that
17:42 Company: that'd be the typical kind of lossy compression you want for video/photographs
17:42 Company: that kills your desktop UIs and text
18:11 CounterPillow: the video/photo compression mostly kills your UIs and text due to chroma subsampling, which this likely doesn't do
18:53 Company: CounterPillow: all kinds of subsampling kill it, because you lose straight lines
18:54 Company: desktop UIs and fonts are absolute fans of using 1px wide lines - horizontal and vertical, too
19:19 JoshuaAshton: Has there ever been discussion about code that allowed drm_sched jobs to be submitted with timeouts lower than `sched->timeout`? Anyone aware if this has come up before?
19:19 JoshuaAshton: Right now on AMDGPU, the timeouts are like 10s for gfx and 60s for compute
19:19 JoshuaAshton: Which is pretty extreme for Desktop
19:19 JoshuaAshton: Windows has 2s for the default TDR for applications
19:19 JoshuaAshton: It would be nice to be able to control this on a per-job granularity I think, so maybe long running applications can request up to those amounts, but regular desktop applications would get eg. 2s
19:21 sima: JoshuaAshton, yeah those are way too long, but they're kinda that long because people want to run compute jobs on there, and with shorter timeouts those randomly fail
19:22 sima: the issue is also that without preemption if you have anything that actually takes too long your desktop freezes anyway, even if all the desktop stuff finishes rendering quickly
19:22 sima: it's all a bit frustrating, big time suck :-/
19:22 sima: also compute jobs = some cts stuff iirc
19:22 JoshuaAshton: Yeah, amdgpu has 60s for compute.
19:22 JoshuaAshton: Hmmm
19:22 sima: iirc they bumped it from the 10s for other engines
19:22 JoshuaAshton: yea
19:23 sima: anholt iirc set something a _lot_ more reasonable for vc4/v3d
19:23 JoshuaAshton: pixelcluster and I have been doing a bunch of stuff to make GPU recovery actually work and useful on Deck and AMD in general
19:23 JoshuaAshton: one of the things I am seeing now that it actually *works* is that the timeout is a bit long for good-ish UX here
19:23 sima: yeah I think for desktop you probably want below 1s or so and actually recover
19:24 sima: ofc assumes your recovery is reliable enough
19:24 pixelcluster: I'd disagree
19:24 JoshuaAshton: below 1s is probably a bit much
19:24 pixelcluster: often games submit a bunch of stuff in bulk and it happens that frametimes spike over 1s
19:24 sima: hm yeah I guess for games where it's the only thing more is ok
19:25 sima: but apps randomly stalling your desktop for multiple seconds is rather rough
19:25 JoshuaAshton: yea
19:25 JoshuaAshton: 60s for compute is way too much on Deck. I think I will at least put the launch param to be 10s to be equal with gfx
19:25 pixelcluster: tried to work once with a raytracing app running in the background, 600ms per frame consistently
19:25 pixelcluster: can't recommend
19:25 sima: I guess for games the timeout + recovery needs to be quicker than what it takes the user to reach the reset button
19:26 JoshuaAshton: Maybe 5s would be a good compromise for Deck pixelcluster?
19:26 JoshuaAshton: Then we leave compute at 10
19:26 sima: JoshuaAshton, there's also the issue that the timeout is per job and you probably just want an overall frame timeout
19:26 pixelcluster: sounds good to me
19:26 pixelcluster: (5s timeout that is)
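For reference, the timeouts being discussed are exposed through amdgpu's lockup_timeout module parameter (milliseconds per job type); assuming the documented gfx,compute,sdma,video ordering on bare metal, the 5s gfx / 10s compute split above would look roughly like:

    amdgpu.lockup_timeout=5000,10000,10000,10000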
19:27 JoshuaAshton: sima: Yeah, it could be lots of really long small submissions
19:27 JoshuaAshton: Old Chrome used to do lots of small submissions e-e
19:27 JoshuaAshton: I guess my problem is less with games taking a long time, but actual hang detection
19:28 sima: JoshuaAshton, yeah a privileged "kill this" ioctl on a fence might be useful for that
19:28 JoshuaAshton: I didn't consider also handling this from userspace hmmm
19:28 sima: then compositor or whatever can decide when it's getting bad and force everything in that app to reset/recover
19:29 JoshuaAshton: That's interesting
19:29 sima: JoshuaAshton, aside from the issue of overall frame time (which the kernel doesn't really know) it also solves the issue that you want much shorter timeout for desktop than single full screen app
19:30 sima: like the steam ui should probably not be stuck for 5s, but the game might have stalling frames and you don't want to kill it
19:34 sima: hm ... getting the privilege model for that ioctl right might be really tricky
19:34 sima: or at least I don't have any good ideas
19:35 JoshuaAshton: Yeah
19:36 Lynne: JoshuaAshton: with your pr + kernel changes, what kind of hang can be recovered from?
19:36 JoshuaAshton: Another option is letting the submit ioctl pick the timeout (obv that will be min'ed with that of the drm_sched job)
19:37 JoshuaAshton: Lynne: Both soft recovery + MODE2 has been tested to be successful and not stomp others (although MODE2 may take one or two innocent contexts if they had stuff in CUs at the time)
19:38 Lynne: is an out of bounds read (e.g. from BDA) recoverable?
19:40 JoshuaAshton: Yes, page faults should be fully recoverable, even with soft recovery, but you'll need some extra kernel patches to workaround what appears to be a bug with interrupt handling buffer overflows on RDNA2 (and maybe other gens)
19:40 JoshuaAshton: There is also another kernel patch to fix soft recovery hang reporting
19:40 JoshuaAshton: https://share.froggi.es/0dc1820ac132 <-- patches
19:41 Lynne: holy shit that's massive
19:48 JoshuaAshton: You'll need both my MRs for it to work for both vk + gl also
20:56 Plagman: JoshuaAshton: handling it from userspace is more or less what the point of the interface we added was
20:57 Plagman: the two udev rules are like a super minimal/naive implementation of it, but the idea was always that it could be fleshed out further as a bunch of logic in the compositor or session manager to decide what to kill and within what timeframes
20:57 Plagman: and how to message the user
20:57 Plagman: i'd like to see that convo revisited upstream, last time i think vetter took objection to the existence of such an interface, and it got amalgamated with devcoredump, which is not at all the same thing
22:29 JoshuaAshton: Plagman: I think this (picking the TDR timeout time) and that are separate, but yes I think we should re-visit that
22:29 Plagman: yes
22:30 Plagman: agreed
22:30 JoshuaAshton: It would be nice for us to get feedback on an app hang so we can display a nice modal with "Oopsie woopsie" or whatever, and also for getting good feedback in Steam as to what apps are hanging for users.
22:41 zmike: can I just say that "oopsie woopsie" gets my vote
23:02 karolherbst: same