00:16 airlied: q66: that's like cacheline dirt, due to something on aarch64 not working like something on x86
00:28 q66: airlied: yeah it seems oddly regular
00:30 q66: on ppc64le things work fine, as they do on x86
00:30 q66: i have a honeycomb lx2 aarch64 machine where i had this gpu and that worked well
00:30 q66: i haven't tested it in a while with the same software as on the altra though
00:30 q66: but this could easily be specific to this board
00:31 q66: some stuff appears more affected than other stuff
00:31 q66: e.g. gnome-shell itself is largely clean
00:34 airlied: generally I think we end up blaming memcpys
00:34 kode54: I have some CCS dirt
00:35 kode54: https://f.losno.co/v/CCS_video_frame_blitting_errors.mp4
00:35 kode54: I'll add it to the Intel Xe issue I started on the Mesa tracker
00:35 kode54: not sure if it's Mesa or the kernel to blame
00:37 q66: airlied: oh, you think my libc's memcpy impl could be at fault?
00:39 airlied: it's usually due to vram being mmapped to userspace and something in hw messing that up
00:39 airlied: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3274 that sort of thing
00:41 q66: airlied: ah, musl uses that memcpy impl on aarch64 now
00:41 q66: so https://gist.github.com/jnettlet/f6f8b49bb7c731255c46f541f875f436 should apply
00:41 q66: lemme try getting that in and see what happens
00:41 q66: even though i just turned off the box :p
00:56 q66: airlied: hm guess not
00:56 q66: tried both applying the patch as well as dropping musl's aarch64-assembly memcpy and it's still equally busted
02:10 DUOLabs[m]: Does anyone know how exactly does venus on the guest send Vulkan commands to qemu/virgl on the host?
02:14 airlied: via virtio
02:20 DUOLabs[m]: That doesn't make things any more clear.
02:22 airlied: not sure what information you need though, the kernel driver talks to qemu via virtio, which is a standard transport mechanism
02:25 DUOLabs[m]: Yes, but looking through qemu's source code, I'm not sure where this takes place: for example, is it when virtio_gpu_simple_process_cmd processes VIRTIO_GPU_CMD_RESOURCE_CREATE_2D, or somewhere else?
02:27 airlied: virgl_cmd_submit_3d
02:28 airlied: though I haven't seen the venus code for qemu
02:32 DUOLabs[m]: I looked there already, but when I traced virgl_renderer_submit_cmd, no ctx passed in every went to venus, but straight to vrend proper, which is why I'm little confused.
02:34 HdkR: It's becoming even more opaque with the whole drm passthrough thing :D
02:35 DUOLabs[m]: s/every/ever/, s/venus/Venus/
02:41 airlied: DRM_IOCTL_VIRTGPU_EXECBUFFER seems to be that path that gets called
02:42 airlied: and that calls VIRTIO_GPU_CMD_SUBMIT_3D
02:44 airlied: and on the qemu side that seems to go into virgl_cmd_submit_3d
02:45 airlied: but yeah I can't see how that hooks up in virglrenderer
02:49 airlied: DUOLabs[m]: I think the proxy layer steps in somewhere on the renderer side
02:50 airlied: proxy_context_submit_cmd
02:54 DUOLabs[m]: No, that can't be it --- only `vrend_decode_ctx_submit_cmd` is ever called.
03:15 airlied: DUOLabs[m]: do you have a venus supporting qemu?
03:16 DUOLabs[m]: Yes, that's what I'm working on.
03:16 airlied: if VIRGL_RENDERER_CAPSET_VENUS is passed then the proxy should be plugged in
03:17 airlied: when virgl_renderer_context_create_with_flags is called
03:21 DUOLabs[m]: Interesting, virgl_renderer_context_create hardcodes VIRGL2 as the only flag passed to virgl_renderer_context_create_with_flags
03:22 DUOLabs[m]: s/VIRGL2/VIRGL_RENDERER_CAPSET_VIRGL2/
03:23 DUOLabs[m]: However, if I do VIRGL_RENDERER_CAPSET_VIRGL2 | VIRGL_RENDERER_CAPSET_VENUS, everything breaks
03:26 DUOLabs[m]: When I trace virgl_renderer_context_create_with_flags, the capset_id passed in is never equal to VIRGL_RENDERER_CAPSET_VENUS.
03:33 airlied: okay then that seems to be where things go off the rails
07:29 narmstrong: mripard: mlankhorst: tzimmermann: I accidentally pushed a non-reviewed bindings patch to drm-misc-next, I've sent a revert: https://lore.kernel.org/all/20230526-revert-bad-binding-v1-1-67329ad1bd80@linaro.org/
07:31 tzimmermann: narmstrong, thank you for taking care and informing us
07:34 narmstrong: tzimmermann: sure, waiting for some feedback on the revert and I'll push it on drm-misc-next
07:52 MrCooper: q66: some ARM SOCs have not-fully-compliant PCIe, which means they can't work correctly with amdgpu
08:34 jfalempe: tzimmermann, I can use the GEM DMA helper for mgag200, but it works only at lower resolutions, because it can't allocate more than 4MB for the framebuffer. I'm wondering how other drivers are using it, are they also hitting this limitation?
08:35 tzimmermann: jfalempe, i've seen your response. i'm thinking how to move forward
08:36 tzimmermann: the other drivers with dma helpers are on SoCs. they set aside CMA areas and it apparently works. i'm a bit surprised that x86 is somewhat fragile about this
08:37 jfalempe: I think on x86, most hardware has scatter-gather DMA and can allocate smaller chunks.
08:42 _jannau_: there should be drivers using GEM DMA helpers on SoC behind an IOMMU. those do not need CMA but can still allocate large framebuffers
08:43 tzimmermann: jfalempe, i don't like that the dma code is "hacked into" the damage handling (for the lack of a better description)
08:46 jfalempe: tzimmermann, you mean I should move more code to the mgag200_dma.c, or that it shouldn't be called from the handle_damage() function ?
08:47 jfalempe: damage is where we copy the pixels to the VRAM, so I don't see how I can do that differently.
08:48 tzimmermann: jfalempe, there is mgag200_handle_damage: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/mgag200/mgag200_mode.c#L432 maybe it should be moved to _dma.c and encapsulate the whole dma-based damage handling
08:48 tzimmermann: all models should use it.
08:49 jfalempe: yes I can move all the handle_damage() logic in _dma.c, that would be cleaner
08:50 tzimmermann: if we ever find a model that cannot use DMA, we could still add a device_func (https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/mgag200/mgag200_drv.h#L252) that handles this model in a different way
08:51 jfalempe: DMA engine is a core part of mgag200, so I think it should work on all models.
08:52 tzimmermann: jfalempe, i'd also put the dma fields into a separate struct mga_dma and init it from each per-model init function (such as https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/mgag200/mgag200_g200.c#L401)
08:54 tzimmermann: maybe the memcpy() can still be avoided with careful use of the SHMEM helpers.
08:56 tzimmermann: if physical SHMEM pages are within the lower 32-bit range, they should be dma-able. with care, they could be flushed and dma-ed directly.
08:56 tzimmermann: that's for a different patchset, though
08:57 jfalempe: hum, they also should be contiguous in physical memory, which may be unlikely.
08:58 jfalempe: my server has 4GB of RAM, so I thought all of it should be in the lower 32-bit range, but in practice, with CMA it often gets addresses over the 32-bit limit.
09:01 jfalempe: tzimmermann, I will do a v2, trying to address all your comments.
09:01 tzimmermann: jfalempe, great thanks
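The two conditions raised above for DMA-ing SHMEM pages directly (all pages in the lower 32-bit range, and physically contiguous) can be sketched as a small check. This is only an illustration under assumptions: the function name, the 4 KiB page size, and the idea of having a plain list of physical page addresses are hypothetical and are not taken from the mgag200 driver or the SHMEM helpers.

```python
PAGE_SIZE = 4096          # assumed page size, for illustration only
DMA32_LIMIT = 1 << 32     # first address above the 32-bit range


def is_directly_dmaable(page_addrs, page_size=PAGE_SIZE, limit=DMA32_LIMIT):
    """Return True if the pages could be handed to a 32-bit, non-scatter-gather engine.

    page_addrs is a list of physical page start addresses in buffer order
    (a hypothetical input; a real driver would get these from the kernel).
    """
    if not page_addrs:
        return False
    # Every page must end at or below the 32-bit limit.
    below_limit = all(addr + page_size <= limit for addr in page_addrs)
    # Each page must start exactly where the previous one ended.
    contiguous = all(b - a == page_size
                     for a, b in zip(page_addrs, page_addrs[1:]))
    return below_limit and contiguous
```

A buffer that fails either test would still need the memcpy() into a separately allocated DMA buffer, which matches the concern that contiguous pages may be unlikely in practice.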
09:02 sgruszka: tzimmermann: hi, could I get your ACK for my drm-misc commit rights request https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues/666 ?
09:05 tzimmermann: sgruszka, looks reasonable. done. someone else has to ack as well.
09:06 tzimmermann: maybe mlankhorst or danvet ^
09:06 sgruszka: tzimmermann: thanks!
09:07 sima: tzimmermann, mripard maybe, I generally try to leave this to you all
09:13 kode54: mismanaging my Fastmail account
09:13 kode54: "oops" using before/after ranges to move every archived message into the archive folders
09:13 kode54: resulted in all my categorized folders being emptied
09:14 kode54: so now I'm moving the messages based on the original rules back into their original category folders
09:14 kode54: "to:@vger.kernel.org" - "Moving 450,774 conversations to Linux Kernel"
10:04 doras: jenatali: any suggestion for the name of the Meson option? Should it be "feature" or "boolean"? What should be the default/auto behavior? If it's a "feature" option, we need to consider that `--auto-features enabled` is commonly used, so the `enabled` case should reflect the more commonly desired case.
10:04 dj-death: is there a NIR function that tells whether a block/instr can run before another block/instr ?
10:05 dj-death: like in the if () { } else {} you can tell if there is no loop wrapping that the if block will not execute before the else block
10:05 dj-death: those are 2 different paths
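One generic way to phrase the question above is control-flow-graph reachability: block A can run before block B only if B is reachable from A along successor edges. The sketch below is a plain graph search, not a NIR function; the dictionary-based CFG and the name `can_run_before` are hypothetical and only illustrate the if/else reasoning.

```python
from collections import deque


def can_run_before(cfg, a, b):
    """Return True if some path of successor edges leads from block a to block b.

    cfg maps a block name to a list of its successor block names.
    """
    seen = set()
    queue = deque(cfg.get(a, ()))
    while queue:
        block = queue.popleft()
        if block == b:
            return True
        if block in seen:
            continue
        seen.add(block)
        queue.extend(cfg.get(block, ()))
    return False


# if/else without a wrapping loop: the two arms cannot reach each other.
if_else = {"entry": ["then", "else"], "then": ["merge"],
           "else": ["merge"], "merge": []}

# Same shape inside a loop: the back edge makes each arm reachable from the other.
looped = {"entry": ["then", "else"], "then": ["merge"],
          "else": ["merge"], "merge": ["entry"]}
```

With `if_else`, `can_run_before(if_else, "then", "else")` is False, while with `looped` the same query is True because of the back edge from `merge` to `entry`.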
10:19 doras: jenatali: my current approach is a "feature" option called `opencl-external-clang-headers`, with its default `auto` behavior being "enabled". I'd rather not tie it to `microsoft-clc` in any way; common code paths changing depending on which users were enabled (and affecting other users) is quite unexpected. It does mean that you'll need to disable it explicitly for the Windows packaging use case (and CI).
10:19 llyyr: mesa vulkan-beta builds have been failing without -Wno-error=missing-prototypes for a while now, is this intended?
10:47 tzimmermann: sravn, if you have the time, i'd appreciate another look at my fbdev I/O patchset https://patchwork.freedesktop.org/series/117672/#rev4
12:04 q66: MrCooper: sure but this is altra, not a random soc
12:05 q66: server cpu
12:05 q66: nvidia gpus work with it at very least
12:06 q66: usually the issue with these socs where things don't work is coherency and in those cases it won't come up at all
12:11 sravn: tzimmermann: Will take a look today or in the weekend. Busy week so far
12:11 tzimmermann: sravn, much appreciated. thanks
12:56 hakzsam: gfxstrand: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23254 any ideas how to properly implement interpolateAtXXX with a builtin as first argument?
12:58 hakzsam: I thought I could lower to load_barycentric_xx in lower_system_values but NIR expects a deref everywhere else
13:12 doras: jenatali: feedback would be appreciated: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23255
14:13 DavidHeidelberg[m]: eric_engestrom: are u having some devices under outage? For example now I see 2 rpi jobs waiting to get triggered, while rest is running
14:13 eric_engestrom: DavidHeidelberg[m]: not that I know
14:13 eric_engestrom: checking
14:13 DavidHeidelberg[m]: so, maybe decrease the number of jobs by two?
14:14 eric_engestrom: yeah, maybe we need to tweak the jobs a bit
14:14 DavidHeidelberg[m]: it's already queued for 5 minutes: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/42514568 + https://gitlab.freedesktop.org/mesa/mesa/-/jobs/42514569
14:15 eric_engestrom: jasuarez: ^
14:15 DavidHeidelberg[m]: eric_engestrom: if some jobs are going to wait, I would decrease the parallelism to avoid setting up new jobs (it still increases the job runtime, but at least not by that much)
14:15 jasuarez: let me check
14:15 DavidHeidelberg[m]: thx! :)
14:16 DavidHeidelberg[m]: anholt: same goes for a630, I see 7 jobs running, 3 waiting. Are there currently 10 devices available?
14:17 eric_engestrom: anholt's on holiday iirc
14:19 DavidHeidelberg[m]: thx. I'll see on the next job, if still 3 devs will wait, I'll drop few tests for now
14:19 DavidHeidelberg[m]: *devs=devices
14:19 jasuarez: David Heidelberg: seems all the devices are already running jobs
14:20 jasuarez: there are 21 rpi4 available
14:21 jasuarez: and now all of busy working in your pipeline :)
14:21 jasuarez: s/of/are
14:22 DavidHeidelberg[m]: ok, so someone had to run some test from different pipeline
14:22 DavidHeidelberg[m]: thanks!
14:23 jasuarez: those jobs are now in run
14:31 DavidHeidelberg[m]: jasuarez: eric_engestrom btw. rpi4 jobs, I see it runs ~ 20 minutes. You need to cut it down to 15, please.
14:32 jasuarez: yes, we need to do some adjustments
14:32 DavidHeidelberg[m]: for LAVA devices it isn't that much pain, since we can do prioritization, but if someone sends two pipelines to your farm just as pre-merge starts, it'll take more than 1h to finish
14:33 DavidHeidelberg[m]: with 15 minutes it would be still doable I think
14:33 jasuarez: yeah, totally agree
14:33 DavidHeidelberg[m]: Thank you :)
14:34 jasuarez: np!
15:07 q66: DavidHeidelberg[m]:
15:07 q66: oh shit
15:07 q66: sorry
15:07 q66: i was in mobile irc and accidentally tapped your name
15:08 DavidHeidelberg[m]: np, just read your report on Ampere and radeonsi an hour ago
15:08 q66: yeah still no luck with that
15:09 q66: I'll dig around more next week
15:11 q66: i'll also be putting in different RAM but i doubt that'll make a diff
15:33 DUOLabs[m]: <airlied> "okay then that seems to be where..." <- What makes this even more interesting is that `vn_instance_init_experimental_features` returns the wrong data --- all properties are shown as 0, which is not what virglrenderer's source code says.
15:33 DUOLabs[m]: This implies that the command actually fails to be sent, and some default value is shown.
16:00 DUOLabs[m]: This probably means that vn_decode_vkGetVenusExperimentalFeatureData100000MESA_reply fails somewhere, causing instance->experimental to be NULL.
16:31 eric_engestrom: DavidHeidelberg[m]: re: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23258#note_1928632 should I cancel the almost done pipeline to add `--color=always`, or should I make it a follow up MR?
16:32 DavidHeidelberg[m]: eric_engestrom: just keep it and squash it with something else
16:32 DavidHeidelberg[m]: I think we seriously need to implement a CI: skip tag
16:32 DavidHeidelberg[m]: or maybe CI: skip-HW-test
16:33 DavidHeidelberg[m]: like for one linting line we run 250 devices for 10 minutes under full load. This is fcking ecology disaster (and also we have to wait)
16:33 DavidHeidelberg[m]: ...ehm not 10 minutes but more like 15-20 minutes
16:34 DavidHeidelberg[m]: *ecological
16:35 eric_engestrom: DavidHeidelberg[m]: ack
16:35 eric_engestrom: and yeah, earlier I merged a .gitlab-ci/*.yml change where the fact the pipeline was created was the verification that the change was correct, but it still ran the entire pipeline with all the hardware tests
16:37 DavidHeidelberg[m]: I'll bring it up at Monday's team meeting, we'll need to implement something for it in Marge-bot I assume.
16:43 eric_engestrom: I think a label would work well, we would just need to add `if [[ "$CI_MERGE_REQUEST_LABELS" = *ci::skip* ]]; then exit 0; fi` at the top of each job that should be skipped (ie. at least all hardware jobs)
16:44 eric_engestrom: (or a !reference to that actually, to make it easier to edit later)
16:45 DUOLabs[m]: <DUOLabs[m]> "This probably means that..." <- However, while most `cs.hdr.ctx_id` is `2` (I'm assuming the "main" Virgl context), some `cs.hdr.ctx_id` is 5, which may refer to the venus context (in that case, why is the capset not for Venus)?
16:45 eric_engestrom: we could even have `ci::skip-hardware-tests` and `ci::skip-everything` for instance, if we want to even skip build jobs as well sometimes
16:48 DUOLabs[m]: However, virgl_context_lookup finds no context with ctx_id 5
17:07 daniels: I can’t wait until everyone just smashes their MRs in with ci::skip-everything because I am awesomely smart and how dare a machine tell me my code isn’t perfect
17:14 eric_engestrom: yeah that's always the risk :/
17:15 eric_engestrom: I guess we could add checks that if any driver code is changed, using these labels fails the pipeline
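The label idea discussed above, including the guard against skipping when driver code changes, could be sketched roughly as follows. This is a minimal illustration, not Mesa's CI: the function name, the `"src/"` prefix standing in for "driver code", and the three return values are assumptions. The only outside fact relied on is that GitLab exposes the merge request labels as a comma-separated string in `CI_MERGE_REQUEST_LABELS`.

```python
SKIP_HW_LABEL = "ci::skip-hardware-tests"  # label name proposed in the discussion


def decide_hw_job(labels_env, changed_paths, driver_prefixes=("src/",)):
    """Return "run", "skip" or "fail" for a hardware job.

    labels_env: value of CI_MERGE_REQUEST_LABELS (comma-separated labels).
    changed_paths: paths touched by the merge request (hypothetical input).
    driver_prefixes: hypothetical path prefixes treated as driver code.
    """
    labels = {label.strip() for label in labels_env.split(",") if label.strip()}
    wants_skip = SKIP_HW_LABEL in labels
    touches_driver = any(path.startswith(driver_prefixes) for path in changed_paths)
    if wants_skip and touches_driver:
        # Skipping hardware tests is refused when driver code changed.
        return "fail"
    if wants_skip:
        return "skip"
    return "run"
```

Matching whole labels instead of a substring keeps `ci::skip-hardware-tests` and `ci::skip-everything` distinct, which a glob such as `*ci::skip*` would not.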
17:30 eric_engestrom: DavidHeidelberg[m]: kernel uprev is breaking X on freedreno :/
17:30 DavidHeidelberg[m]: eric_engestrom: I see :/ just few patchlevels up....
17:32 DavidHeidelberg[m]: robclark: any ideas? "23-05-26 17:04:31 R SERIAL> [ 18.416476] msm-mdss: probe of 900000.display-subsystem failed with error -110"
17:33 eric_engestrom: (also, commit says the hash is for 6.3.4 but the image tag bump says 6.3.3; the image tag doesn't matter but it might be good to be consistent, assuming the latter is the wrong one)
17:33 robclark: DavidHeidelberg[m]: "connection timed out"?
17:34 DavidHeidelberg[m]: btw. a618 started crashing in some tests, for a change
17:34 DavidHeidelberg[m]: eric_engestrom: oh fck. anyway, it's not going to get in. I originally started with 6.3.3 but then wild 6.3.4 appeared...
17:34 DavidHeidelberg[m]: thanks for noticing though
17:37 robclark: what tests started crashing? This is with kernel uprev? Or?
17:42 DavidHeidelberg[m]: robclark: a618 and a530 fails: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23251
17:42 DavidHeidelberg[m]: A530 won't load the display controller and a618 just crashes a few tests
17:48 robclark: DavidHeidelberg[m]: what was previous kernel? Looks like something isn't healthy with that kernel, I guess
17:51 DavidHeidelberg[m]: robclark: 6.3.1
17:51 DavidHeidelberg[m]: I see a large number of msm/ changes between 6.3.1..4
17:55 robclark: huh
17:56 robclark: I don't particularly follow the stable kernels, but wouldn't have really expected that.
18:03 DavidHeidelberg[m]: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v6.3.4
18:06 eric_engestrom: DavidHeidelberg[m]: can you add a comment on the line with the kernel url saying which tag the hash corresponds to, to avoid having to look it up?
18:08 DavidHeidelberg[m]: I added the link into the MR
18:09 robclark: DavidHeidelberg[m]: looks like most of the changes are disp/dpu (no idea why, for ex, "drm/msm/dpu: split SM8550 catalog entry to the separate file" ended up on stable).. so should be pretty much unrelated..
18:12 DavidHeidelberg[m]: robclark: what about https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.3.4&id=3a4e6dc2a6f659ac53a0d280832677b6ecb69944 ?
18:12 DavidHeidelberg[m]: I was thinking this one could affect a530
18:13 DavidHeidelberg[m]: but it doesn't look very likely :/
18:13 robclark: it didn't trigger any igt fails in msm-next/msm-fixes CI
18:14 robclark: IIRC db845c is using a bridge chip for hdmi, so could be something outside of drm/msm .. but maybe you could bisect
18:14 robclark: (or at least narrow it down to which 6.3.y tag started having problems)
18:15 DavidHeidelberg[m]: I'll try to do it, but just as I reduce the 1hr qaiting for the rootfs generation
18:15 DavidHeidelberg[m]: *waiting
18:18 robclark: thx
19:08 sassefa: Hello. I'm Surafel and I'm looking to get into DRI development. Just introducing myself for now 👋️
21:31 karolherbst: any debugging tips when kmalloc crashes on weird addresses? Tried kasan, but my bug doesn't happen once it's enabled
21:52 iive: karolherbst, how about using second crashkernel to make kdump of the crash?
22:18 q66: airlied: so i "fixed" the problem https://gitlab.freedesktop.org/mesa/mesa/-/issues/9100#note_1928940
23:07 Kayden: sassefa: welcome! always nice to see new folks here
23:09 DavidHeidelberg[m]: q66: nice! Good work
23:28 DUOLabs[m]: Got it to work! Forgot to implement CONTEXT_INIT in virgl_cmd_context_create.
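The failure mode traced in this thread can be modelled with a toy dispatcher: if context creation drops the guest's requested capset, every context falls back to the default renderer and commands never reach the Venus proxy. This is only a sketch and not QEMU or virglrenderer code; the function names, the backend strings, the numeric capset values, and the low-byte mask are all illustrative assumptions.

```python
# Toy model only. Names and numeric values are illustrative assumptions.
CAPSET_VIRGL2 = 2
CAPSET_VENUS = 4

contexts = {}  # ctx_id -> backend name


def create_context(ctx_id, context_init=0):
    """Pick a backend for a new context.

    context_init == 0 models a host that ignores (or never receives) the
    guest's requested capset and falls back to the default one.
    """
    capset_id = (context_init & 0xFF) if context_init else CAPSET_VIRGL2
    backend = "venus-proxy" if capset_id == CAPSET_VENUS else "vrend"
    contexts[ctx_id] = backend
    return backend


def submit_cmd(ctx_id):
    """Route a command buffer to whichever backend owns the context."""
    return contexts[ctx_id]
```

In this model, `create_context(5)` followed by `submit_cmd(5)` always lands in "vrend", which mirrors the observation that only `vrend_decode_ctx_submit_cmd` was ever called; passing the Venus capset through at creation time is what changes the routing.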