00:02Company: if somebody comes with some closed source game where the Vulkan renderer performs much worse on some GPU than the GL renderer, how would you guys debug why?
00:03HdkR: profiling
00:03Company: oh yeah: I'm assuming the GPU load is different
00:03HdkR: Profile CPU and GPU time, see where the bottleneck is. Then make it go faster
00:03Company: what GPU profiling tools do you use?
00:06airlied: Company: do you have evidence the game's Vulkan renderer is faster on other drivers?
00:06airlied: but yeah probably try and narrow down a frame that is slower and use renderdoc
00:07Company: the question is a bit vague because what I'm trying to do is understand better how GPUs work by learning about what parts of my code could be improved
00:07Company: to be faster on GPUs
00:08Company: concretely: I've been working on my own (Vulkan) renderer for GTK, mostly for experimenting and prototyping fancy features
00:08Company: and then there's GTK's GL renderer, which is solid, fast, but works entirely differently from mine
00:09Company: and on this particular test, mine is 3x slower (200 vs 650fps)
00:09Company: but it's only 50% CPU load
00:09Company: and now I'm trying to understand what I do differently that might take 3x longer on the GPU
00:10Company: am I batching wrong? Are my memory barriers wrong? Should I order stuff differently, stuff like that
00:10Company: or is it just that the Intel Vulkan stuff is slower than the GL stuff for what GTK is doing?
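Editor's note: the frame rates quoted above can be turned into per-frame times, which is usually the easier number to reason about when comparing against GPU timings from a profiler. A minimal sketch, using only the 200 and 650 fps figures from the log; the helper name `frame_time_ms` is made up for illustration.

```python
# Convert the frame rates quoted above into per-frame time budgets.
def frame_time_ms(fps: float) -> float:
    """Return the average time spent per frame, in milliseconds."""
    return 1000.0 / fps


vulkan_fps = 200.0  # Company's Vulkan renderer on this test
gl_fps = 650.0      # GTK's GL renderer on the same test

vulkan_ms = frame_time_ms(vulkan_fps)
gl_ms = frame_time_ms(gl_fps)

print(f"Vulkan: {vulkan_ms:.2f} ms/frame")
print(f"GL:     {gl_ms:.2f} ms/frame")
print(f"Extra time per frame: {vulkan_ms - gl_ms:.2f} ms")
print(f"Slowdown factor: {vulkan_ms / gl_ms:.2f}x")
```

So the gap to explain is roughly 3.5 ms of extra work per frame, not an abstract "3x".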
00:11airlied: you are probably at the point where getting another gpu involved might be useful
00:11airlied: not sure renderdoc would work with gtk though, it's pretty frame based
00:12Company: it does (well, apart from me using fancy new Vulkan extensions atm)
00:12Company: which might also contribute to the problem
00:12airlied: so you'd probably want to use renderdoc to get some timings from that test, another option might be to see how it works on zink
00:12Company: zink is at 300fps, I tried that
00:13airlied: but yeah getting comparison timings from another driver platform (radv/nvidia) might be another tactic
00:14Company: I should do that - I work on my laptop usually, but have an AMD gaming desktop right here
00:15Company: renderdoc is not in Fedora, right?
00:16airlied: not that I know of
00:16Company: so you guys compile it from git?
00:18Company: I did that and it works fine, but I'm wondering if there's config flags or something I should know about
00:18Company: and distro packages would usually have them turned on already
00:19zmike: always build renderdoc from git
00:21airlied: yeah I think that is the advice I normally get, always build from git
00:24cornelius: So I'm a fedora user who's trying to diagnose an issue:
00:24cornelius: I'm having an issue with what I think is the latest amdgpu drivers from Mesa. I'm using SteamVR to run my Valve Index and it has an issue where if the headset disconnects the system hard locks up. Like I can't use ctrl+alt+f3/4/5/etc to switch to tty. Even just typing in my login information, as if only the display isn't working, does not work.
00:24cornelius: I also have a similar issue with OBS, using its replay feature to let me record the last five minutes, causing crashes too. I occasionally get problem reports from ABRT reporter. 9 times out of 10 the "reason" listed will be:
00:24cornelius: WARNING: CPU: 6 PID: 5583 at drivers/gpu/drm/drm_vblank.c:728 drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x28c/0x3a0
00:25cornelius: or something similar
00:25cornelius: My question is how do I debug this further? I can't access the tail end of dmesg or journalctl when my system hard locks because... well... it's hard locked :/ so I'm at a loss as to where to go from here to fix this
00:25cornelius: also my version of mesa is 23.1.3-1.fc38 and llvm version 16.0.5-1.fc38
00:25airlied: seems likely to be a kernel bug
00:25cornelius: That could very well be, Fedora is pretty aggressive with its versions of the kernel
00:26airlied: have you something you can ssh in from, and leave dmesg -w running, you might get more backtrace
00:27cornelius: That's a smart idea, yeah I have a few spare computers I could use to SSH into this one. I'd have to set up SSH on this machine though, that'd take me a minute to figure out
00:28airlied: https://gitlab.freedesktop.org/drm/amd/-/issues is probably a place to start filing or looking for issues related
00:28airlied: but trying some older kernels might help (like a 6.2.x if you are on 6.3.x)
00:29cornelius: I had a similar issue on Nobara with, iirc, 6.2.x, so I'd probably start with something older first
00:30cornelius: the OBS issue is really inconsistent so that one is really hard to debug, there's really no rhyme or reason, and I've even tried using alternative plugins for VAAPI encoding but ended with the same results
00:35haagch: journalctl -k -b -1 shows the kernel log from last boot
00:36cornelius: @haagch that's a huge help, thanks!
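Editor's note: once the previous boot's kernel log has been saved (for example from `journalctl -k -b -1`, as haagch suggests), the WARNING header lines can be pulled out mechanically, which helps when searching the amdgpu issue tracker. A rough sketch follows; the regular expression is an assumption based only on the single line cornelius pasted, and `find_warnings` is a made-up helper name.

```python
import re

# Pattern for the header line of a kernel WARN splat, modelled on the one
# line quoted above. Other kernels may format this differently.
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def find_warnings(log_text):
    """Return one dict per WARNING header line found in log_text."""
    return [m.groupdict() for m in WARN_RE.finditer(log_text)]


# Sample input: the line from the log above. In practice this would be the
# saved text of the previous boot's kernel log.
sample = (
    "WARNING: CPU: 6 PID: 5583 at drivers/gpu/drm/drm_vblank.c:728 "
    "drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x28c/0x3a0\n"
)

for w in find_warnings(sample):
    print(f"{w['file']}:{w['line']} in {w['func']} (CPU {w['cpu']}, PID {w['pid']})")
```

The source file, line and function name are the parts worth quoting in a bug report; CPU and PID vary from run to run.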
06:46arnd: tzimmermann: I've dug through the history of screen_info across architectures some more and come up with a cleanup: https://pastebin.com/raw/S2h26jPp
06:47arnd: I found that on the older architectures (alpha, mips, sparc, ...), it was originally set because it was required by the tty code, but that all moved into vga_console over time, so we can remove a bunch of them completely
06:49arnd: I also isolated the platforms that are actually able to use vga_con to the ones that both initialize conswitchp=&vga_con and allow configuring CONFIG_VGA_CONSOLE, so that's now a much smaller opt-in list instead of a long and incorrect opt-out list
06:50arnd: the arm screen_info usage in dummycon is also purely related to vgacon as far as I can tell, and this is now only used on the ancient footbridge/netwinder machine, which we'll probably remove in a few years
06:51arnd: a couple of other architectures copied the 'if DUMMY_CONSOLE || VGA_CONSOLE' check from arm, but they don't actually need that
06:55tzimmermann: arnd, amazing
06:55tzimmermann: thank you
06:56tzimmermann: could you turn that into a patchset?
06:56tzimmermann: i'd rebase my patches on top of that (if they'd still be useful)
06:57tzimmermann: i also want to go through drivers to limit the use of the global screen_info
06:58arnd: tzimmermann: sure, I can do that. I didn't want to get in your way, but if you want to rebase on top of my changes that works.
06:59arnd: is there anything in drm-next that's not in linux-next but might conflict with this?
07:00tzimmermann: i think the recent work on fb_pgprotect and the fb I/O helpers is not yet in upstream, but probably in drm-next
07:00tzimmermann: or it enters upstream with 6.5
07:00tzimmermann: but those conflicts should be minimal, if any
07:03tzimmermann: i intend to merge the include-cleanups for screen_info.h today. shouldn't conflict much
07:20MrCooper: zmike: alpha may be != 1.0 in other parts of the texture image (glamor handles the 1.0 everywhere case differently), it just needs to end up as 1.0 when loading data in a specific case
07:24MrCooper: anyway, the workaround I did seems cheap, I was just surprised it was necessary
07:31emersion: swick[m]: do you know if there's any progress wrt KMS backlight?
10:45swick[m]: emersion: hans de goede did some changes around loading order and preference but no KMS property stuff yet AFAIK
10:46swick[m]: and I'm going to look at intel hdr backlight stuff as soon as I receive a specific laptop
12:41bbrezillon: dakr: I pushed the 2 fixup commits I have here https://gitlab.freedesktop.org/bbrezillon/linux/-/commits/panthor
12:42bbrezillon: https://gitlab.freedesktop.org/bbrezillon/linux/-/commit/fae4618ecf87344cf4cd52d38ba28b50120610b1
12:42bbrezillon: https://gitlab.freedesktop.org/bbrezillon/linux/-/commit/d56d3261c46a68d28ff62a62b6bdc1941017a79a
12:42dakr: bbrezillon: That's convenient, thanks a lot!
12:42bbrezillon: dakr: np
12:51alyssa: bbrezillon: why the panthor name btw? :D
12:52alyssa: is Thor G610? I forget the codenames
12:52alyssa:does vaguely recall a tTHx
12:53robmur01: Thor was T620 :D
12:53alyssa: oh god was T620 CSF based?
12:53alyssa: is that why it never worked for me?
12:53alyssa: :~P
12:53robmur01: enjoy your nightmare tonight...
12:54alyssa: joke's on you i wasn't going to be able to sleep anyway /s
12:54alyssa: :P
12:58bbrezillon: alyssa: dunno, I guess we were searching for a more inventive name that was still keeping refs to the northern mythology. That's the result of a long brainstorming, and I don't want to do it again :-)
12:58alyssa: haha fair enough
12:59zmike: MrCooper: what did you end up doing?
12:59bbrezillon: so let's pretend T620/Thor never existed, please
12:59alyssa: bbrezillon: i already was ;)
13:00alyssa: bbrezillon: I guess https://cgit.freedesktop.org/drm-misc/tree/MAINTAINERS#n1693 should go..
13:01daniels: alyssa: you didn't keep a copy of codenames.txt?
13:05alyssa: daniels: was on my work laptop :p
13:06alyssa: ("What about the codenames you memorized?" "Only covers late Midgard, Bifrost, and early Valhall")
13:06alyssa: ...Concerningly long list
13:43javierm: arnd: great, I've just reviewed your series too. I'll rebase the FB_CORE series on top once tzimmermann and your patches land
14:19MrCooper: zmike: https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/1131/diffs?commit_id=682b3a7d12659d2b046ff838cda7da3eeacecd77
14:21zmike: ah so you just set 1.0 in the src data
14:22MrCooper: yep
14:22zmike: I was thinking there was some reason you couldn't do that
14:22zmike: but I guess I was overthinking
14:23MrCooper: I even use separate malloc'd temp memory, just in case
14:24MrCooper: I guess it would suck if it blows out the CPU cache, but /shrug
15:29MrCooper: cool idea, KMS drivers could transparently do black frame insertion: https://gitlab.gnome.org/GNOME/mutter/-/issues/2895
15:56Company: MrCooper: wouldn't you want that per client?
15:56Company: or do other clients not mind that insertion being triggered?
15:57MrCooper: not sure what you mean; it would mean the display effectively runs at half the refresh rate, with every other physical refresh cycle being all black
15:58Company: I'm wondering if the app should just submit a black frame every other frame
15:58MrCooper: (or possibly even multiple black cycles between non-black ones)
15:58Company: so that the app has black frames but the rest of the windows don't
15:59MrCooper: as discussed in the issue, that's not as reliable as doing it in the KMS driver or at least the compositor
15:59Company: yeah, that's true
15:59MrCooper: it also wouldn't do anything for the compositor's own drawing
15:59Company: but it allows finer-grained control
16:00MrCooper: if it fails to reliably hit every physical refresh cycle as intended, it'll look pretty bad
16:02Company: xdg_black_frame_this_surface_plz
16:02Company: :)
16:03MrCooper: hehe
16:07Company: that ufo test is fun
16:08Company: my hdr monitor looks terrible, my 10 year old one on Windows looks okay - because Linux can't do vsync in Firefox I guess?
16:10Company: https://i.imgur.com/j7HnVW7.jpg
16:13MrCooper: eric_engestrom: BTW, '.*.' in a regex is equivalent to just '.*'
16:14eric_engestrom:is missing the context
16:14DavidHeidelberg[m]: robclark: Hey! Is this useful for you to see the problem behind? https://gitlab.freedesktop.org/mesa/mesa/-/issues/9247#note_1982390
16:14eric_engestrom: MrCooper: is this about one of the skips/flakes fixes?
16:14MrCooper: Company: works pretty well for me in Firefox (with the native Wayland backend)
16:15MrCooper: eric_engestrom: yep, those fixes left a bunch of '.*.'
16:15eric_engestrom: ah my bad, the trailing `.` is definitely a mistake
16:15eric_engestrom:checks
16:16MrCooper: well, it's not 100% equivalent in general, but presumably it is for those entries
16:16eric_engestrom: yeah it would be because the test name has at least 1 char
16:17jenatali: Ugh... this is why I hate it when CI gets disabled... https://gitlab.freedesktop.org/mesa/mesa/-/jobs/45088776
16:17eric_engestrom: but I don't find any `.*.`?
16:17MrCooper: e.g. src/gallium/drivers/zink/ci/zink-anv-tgl-flakes.txt
16:17eric_engestrom: MrCooper: oh, do you mean in the middle of tests, like `dEQP-VK.foo.*.bar.*` ?
16:18MrCooper: yep
16:18robclark: DavidHeidelberg[m]: tbh I've no idea how kernel config change could cause *that*.. is it only turnip/vk fails with the different kernel config?
16:18eric_engestrom: ah right, yeah those I just ignore because it makes no difference and I feel like it's easier to read
16:19MrCooper: unless it puts the brain in glob mode :)
16:19eric_engestrom: fair point, actually
16:19MrCooper: not a big deal, anyway
16:19DavidHeidelberg[m]: robclark: yes, only turnip, nothing else
16:19eric_engestrom: also, anholt does anyone really use the regex-ness of these, or should we make them globs instead?
16:19DavidHeidelberg[m]: I think maybe it's just something a bit different triggering "existing" bug
16:20MrCooper: just some random Friday beer'o'clock musings
16:20eric_engestrom: indeed, and good point about the time I should go ^^
16:23MrCooper: alternatively, could keep the explicit dots as '\.', that would prevent accidentally matching something the glob pattern wouldn't have
16:24DavidHeidelberg[m]: robclark: because the bug pops up also with kernel 6.3.4 in MesaCI, also with 6.3.1 in gfx-ci/linux build, where 6.3.1 works in MesaCI without this issue
16:26eric_engestrom: MrCooper: yeah, that's the problem, we need to escape all these dots everywhere every time we move these lines from fails to flakes/skips, but since we don't we're left with regex `.` everywhere :/
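Editor's note: the difference between the three readings discussed above (regex with unescaped dots, regex with `\.`, and glob) can be shown with a few lines of Python. This is only an illustration: the test names are invented in the style of `dEQP-VK.foo.*.bar.*`, and Python's `re` and `fnmatch` stand in for whatever matcher the CI tooling actually uses.

```python
import fnmatch
import re

# Hypothetical pattern in the style quoted above, plus an escaped variant.
pattern = r"dEQP-VK.foo.*.bar.*"
escaped = r"dEQP-VK\.foo\..*\.bar\..*"

names = [
    "dEQP-VK.foo.a.bar.b",  # the kind of name the entry is meant to match
    "dEQP-VKxfooxxbarx",    # same shape, but with 'x' where the dots were
]

results = {}
for name in names:
    results[name] = (
        re.fullmatch(pattern, name) is not None,  # unescaped '.' = any char
        re.fullmatch(escaped, name) is not None,  # '\.' = literal dot
        fnmatch.fnmatchcase(name, pattern),       # glob: '.' is literal
    )
    print(name, "regex=%s escaped=%s glob=%s" % results[name])

# '.*.' needs at least one character, '.*' does not, which is why the two
# are only equivalent when something is guaranteed to follow.
print(re.fullmatch(r"a.*.", "a") is None)     # no character left for the final '.'
print(re.fullmatch(r"a.*", "a") is not None)  # '.*' may match nothing
```

Only the unescaped regex accepts the second name, which is the accidental over-matching MrCooper is pointing at; the escaped regex and the glob agree with each other on both names.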
16:27robclark: DavidHeidelberg[m], maybe it could be bisected? Or maybe danylo or anholt has an idea? (But seems like danylo is only in #freedreno)
16:27Guest4888: I'm here
16:28robclark: oh, heh but your nick is wonky
16:28DavidHeidelberg[m]: robclark: my guess is the bug is (probably) present in 6.3, timing/barriers?
16:29DavidHeidelberg[m]: maybe I could try compiling the kernel with Os or something to slow it down a bit and see if it disappears
16:29Guest4888: Did anyone run these tests with ASAN/UBSAN?
16:37robclark: danylo, yeah there was link to asan run on the gitlab issue
16:39Guest4888: from their look they should be reproducible
16:48eric_engestrom: Guest4888: `/nick danylo` to change your nick :)
16:57Company: MrCooper: that driver/kernel/manufacturer whining in https://gitlab.gnome.org/GNOME/mutter/-/issues/2895 - is that a common thing in those communities?
17:00jenatali: CI appreciation: re-enabling Windows CI found 3 regressions that snuck in while it was off...
17:00jenatali: Honestly pretty frustrated that it was turned off due to issues that were expected to auto-resolve and then it didn't get turned back on
17:01jenatali: David Heidelberg: Can I get an ack on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24042?
17:05daniels: jenatali: yeah, I was actually just talking with David (not even prompted by this) about how we need something better for tracking 'temporarily' disabled jobs
17:05jenatali: +1
17:06DavidHeidelberg[m]: jenatali: on a phone, ack for reenable
17:19Hazematman1: Hey, has anyone else dealt with issues related to format promotion in gallium drivers with renderbuffers?
17:21Hazematman1: I'm trying to fix a bug with fbo-blending-formats for RGB16F textures, and the root of the issue is that the format is promoted to RGBA16F, so the driver does blending behind the scenes on garbage data if the blending mode uses DST_ALPHA. I've tried to fix it by "faking" support for RGB16F textures and then trying to work around them in the driver, but ran into a lot of issues with mesa st treating the data in the texture as RGB16F instead
17:21Hazematman1: of RGBA16F
17:22Hazematman1: I'm wondering if any other drivers have run into this issue and how they fixed it. I'm tempted to expose the original texture format when a pipe_resource is created, just so the driver can handle things behind the scenes and mask the alpha channel properly. But that would be a mesa level change instead of just a change to the driver in gallium :/
17:29zmike: which driver is this
17:32zmike: you probably need PIPE_CAP_RGB_OVERRIDE_DST_ALPHA_BLEND
17:33Hazematman1: zmike: thanks, i'll check that out! This is on v3d
17:38alyssa: zmike: why is this a CAP again and not just the default
17:39zmike: look I don't know all the answers
17:39zmike: I only know like 1-2 answers tops
17:40alyssa: because it depends on CAP_INDEP_BLEND_FUNC apparently
17:40karolherbst: some caps are really a mistake
17:41karolherbst: we should probably remove all which are 1/0 and are tied to a callback being provided.. I think we have a couple of those around
17:42zmike: I never understood why that was a thing
17:42karolherbst: same
17:42karolherbst: at least there is a plan to ditch all compute caps :D
17:42zmike: compute caps are one of those things I did at some point and pray I never have to touch again
17:43karolherbst: you'll love https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23870
17:43zmike: no, I still hate it
17:43karolherbst: why though?
17:44zmike: now I gotta review more code
17:45karolherbst: heh
17:53Hazematman1: zmike: that fixed it, thanks! Wasted so much time trying to fix this the hard way before and turns out it's a one line fix :P
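Editor's note: the arithmetic behind the bug is small enough to sketch. An RGB target has no alpha channel, so destination alpha has to behave as 1.0; once the format is promoted to RGBA, a DST_ALPHA blend factor reads whatever happens to be stored in the extra channel. The code below is a toy model, not Mesa code: the function names, the 0.25 "garbage" value and the factor-rewriting table are all assumptions for illustration. It reflects the editor's understanding that PIPE_CAP_RGB_OVERRIDE_DST_ALPHA_BLEND asks for destination-alpha factors to be replaced with one/zero on such formats.

```python
# Blend a single colour channel with src factor DST_ALPHA and dst factor ZERO:
#   result = src * dst_alpha + dst * 0
def blend_dst_alpha(src, dst_alpha):
    return src * dst_alpha


src = 0.8

# No alpha channel in the API-visible format, so dst alpha must act as 1.0.
expected = blend_dst_alpha(src, 1.0)

# After promotion to an RGBA format the hardware reads the padding channel.
garbage_alpha = 0.25  # hypothetical leftover value in the promoted channel
wrong = blend_dst_alpha(src, garbage_alpha)


# Toy version of the workaround: rewrite factors that read destination alpha
# when the format the application asked for has no alpha.
def fixup_factor(factor, format_has_alpha):
    if format_has_alpha:
        return factor
    overrides = {"DST_ALPHA": "ONE", "ONE_MINUS_DST_ALPHA": "ZERO"}
    return overrides.get(factor, factor)


print(expected, wrong)
print(fixup_factor("DST_ALPHA", format_has_alpha=False))
print(fixup_factor("DST_ALPHA", format_has_alpha=True))
```

With the factor rewritten, the contents of the promoted alpha channel no longer influence the result, which matches the "one line fix" outcome reported above.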
18:31zmike: pipe caps in a nutshell
18:33HdkR: Maybe they should switch to opt-out just to flip the problem on its head :P
18:34austriancoder: what does nir_texop_txs return in each component?
18:34HdkR: Won't solve anything of course, but a different perspective is always nice
18:34austriancoder: can not find anything in the docs
18:35alyssa: austriancoder: width, height, depth/array_size
18:36austriancoder: alyssa: thx
18:36alyssa: matches glsl textureSize
18:36alyssa: gfxstrand: the descriptor set song https://www.youtube.com/watch?v=UQHaGhC7C2E
18:37alyssa: (specifically the refrain)
18:58austriancoder: Passed: 38/38
19:04alyssa: Whoop
19:34Hazematman1: I'm waiting for CI to run on my commit removing PIPE_CAP_RGB_OVERRIDE_DST_ALPHA_BLEND 🤞 Hoping I didn't break every driver in the process :P
19:36alyssa: gl
19:38DavidHeidelberg[m]: jenatali: damn, I did these changes when Win farm was down
19:38jenatali: These changes?
19:39DavidHeidelberg[m]: The container manual/automatic if marge trigger
19:42DavidHeidelberg[m]: (rules)
20:24tnt: Hi. I'm trying to get rusticl on intel (12th gen iGPU), so I'm doing `RUSTICL_ENABLE=iris clinfo`. But this ends with 'LLVM ERROR: Broken module found, compilation aborted!' (full error output https://pastebin.com/y2tFKiZC ).
20:55penguin42: tnt: I don't know what it means, but the 'attribute does not match module context' sounds like the earlier nasty sounding error
20:56tnt: penguin42: yeah ... but I also have no idea what it means :/
20:59penguin42: tnt: I don't see that error in mesa source, so I guess it upset llvm somehow
21:02tnt: penguin42: https://pastebin.com/5PuwyFxy
21:02tnt: Yeah backtrace seems to say so.
21:03DavidHeidelberg[m]: robclark: I know weekend is coming... thou I revieved a530_gl (added skips for stuff killing GPU) and in one (only one) run I got all UnexpectedPass on this "series of tests": https://gitlab.freedesktop.org/okias/mesa/-/jobs/45107061#L1142 I suspect, that some other parallel test set something right, and then these pass until something else set other value or ... ?
21:03penguin42: tnt: Poor upset llvm
21:04DavidHeidelberg[m]: Why I'm bringing this up: usually one or two tests flip between fail/pass, but this is a much larger "flake series" compared to normal
21:09DavidHeidelberg[m]: hmm, happens from time to time, always most of the tests in that category: https://gitlab.freedesktop.org/okias/mesa/-/jobs/45108542#L1121
21:13manuel: hello, I want to contribute to mesa 3d, which part of the mesa3d project should i look to study and debug? if i can do it in a 4th gen i3 laptop would be ok
21:14robclark: DavidHeidelberg[m]: those links are 404.. I guess I don't have permissions?
21:15DavidHeidelberg[m]: robclark: that's weird, everyone should have access, daniels (I checked settings, it's for everyone incl. CI/CD)
21:15DavidHeidelberg[m]: meanwhile I did: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9333 robclark
21:24DavidHeidelberg[m]: jenatali: when re-enabling Windows farm, are you ok triggering container build by hand or should I implement check for marge OR make it `always`?
21:24DavidHeidelberg[m]: (ofc only when re-enabling farm, not always like for each run)
21:24jenatali: I think always is fine
21:25DavidHeidelberg[m]: jenatali: ok, only disadvantage will be it'll run twice (once for you locally, second for Marge)
21:25DavidHeidelberg[m]: but it's easiest and cleanest change
21:25jenatali: Yeah but I'd do that anyway
21:25DavidHeidelberg[m]: nice! :) love when problems have simple solutions
21:26jenatali: E.g. I had to add 3 additional changes to get it back working, each time I triggered CI to find the next one / make sure it was fixed
21:29DavidHeidelberg[m]: jenatali: I think this should fix it https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24048
21:30DavidHeidelberg[m]: eventually I could use on_success..
21:31DavidHeidelberg[m]: ... and it's done.
22:07DemiMarie: airlied zmike: has anyone packaged renderdoc for anything?
22:08penguin42: renderdoc/lunar 1.24+dfsg-1build1 amd64
22:08penguin42: Stand-alone graphics debugging tool -- metapackage
22:11DemiMarie: On Intel GPUs, how bad is it to not load the HuC firmware?
22:12DemiMarie: I know that hardware accelerated media will break, but I wonder if anything else will break.
22:12zmike: renderdoc is packaged on most distros, but its development speed usually exceeds its packaging
22:13DemiMarie: zmike: should it be part of Mesa git?
22:13zmike: no
22:13DemiMarie: I’m asking because the HuC handling header parsing smells like a complex binary format being parsed by a proprietary C blob running with high privilege, which is pretty much always a very bad idea.
22:15DemiMarie: This attack surface could easily be a dealbreaker if SR-IOV or virtGPU native contexts are used for isolation, so I am wondering if it is okay to just disable hardware media processing.
22:18DemiMarie: It might well be that the alternative is no GPU virtualization at all, which is even worse, so I am not concerned about the performance impact.
22:27robclark: DemiMarie: the question is what the HuC thing has access to.. like are gpu pgtables visible to it or any other fw or cmdstream (that would be kinda bad) or could the HuC do anything to change what memory is visible to the GPU... if all it has access to is other gpu memory of the same guest process, then that is kinda no concern from security standpoint
22:28DemiMarie: robclark: my understanding is that there is only 1 HuC shared between all guests
22:29robclark: sure.. but each guest process has its own gpu virtual address space
22:29robclark: the question is how context switches happen
22:29DemiMarie: so the question is then whether the HuC uses only GPU VAs or if it has its own memory
22:30robclark: (I've looked at that quite closely on adreno.. not on intel)
22:30DemiMarie: yes
22:30DemiMarie: what is the answer on adreno?
22:30robclark: yeah
22:31robclark: for adreno, the context switch happens on a kernel controlled ringbuffer (ie. the thing that gets commands "branching" to userspace cmdstream) with some special commands that are no-op if they are in anything other than the kernel ringbuffer
22:33DemiMarie: Why did the scratch buffer overwrite exploit require a microcode fix to patch? Or was Freedreno never vulnerable in the first place?
22:34DemiMarie: My assumption here is that the HuC is a microcontroller with its own RAM that is not context switched by the host, in which case the security concerns would apply.
22:35robclark: so there were a couple issues fixed with that sqe update..
22:36robclark: one was a use of an uninitialized reg in sqe which, with the right combination of commands, could let userspace cmdstream disable protected mode (ie. access registers that it shouldn't, including ones related to context switch)
22:37robclark: the other was insufficient memory protections on kernel scratch buffer.. upstream wasn't really vuln to that simply because we hadn't implemented context switching for a6xx
22:37DemiMarie: SQE?
22:39robclark: sqe is the microcontroller thing
23:20DavidHeidelberg[m]: anholt: I looked at my changes to the farm on/off handling and I don't see the possible interference with the nightly manual jobs. I disable/trigger stuff only when .ci-farms-disabled/ is touched, which shouldn't be the case of nightly runs
23:28DavidHeidelberg[m]: nevermind, it's one of the GitLab perks... "A rules: changes job is always added to those pipelines if there is no if that limits the job to branch or merge request pipelines." so I assume "changes: [ .ci-farms-disabled/ ] when: never" is triggered always on scheduled pipelines
23:29DavidHeidelberg[m]: wrong snippet "rules: changes always evaluates to true when there is no Git push event."