01:34phire: hi, I'm doing some profiling on the v3d driver for the pi4.
01:34phire: And one of the hot functions is the application's memcpy into the uniform buffer, taking up 15% of CPU time.
01:35phire: I did some testing, and write performance into buffers seems slow. ~800MB/s
01:35phire: I'm expecting closer to the theoretical performance of the DDR4 memory, 4.8GB/s
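[editor's note] The gap phire describes (~800MB/s measured against 4.8GB/s theoretical) can be put in numbers with a small sketch. The function names `fraction_of_peak` and `measure_write_bandwidth` are hypothetical, and the benchmark copies into ordinary cached memory from Python, so its absolute result will not match a write-combined GPU mapping; it only illustrates the measurement method.

```python
import time


def fraction_of_peak(measured_mb_s, peak_mb_s):
    """Ratio of measured throughput to the theoretical peak."""
    return measured_mb_s / peak_mb_s


def measure_write_bandwidth(buf_size=16 * 1024 * 1024, chunk_size=4096, repeats=4):
    """Approximate write throughput in MB/s for chunked copies into a buffer.

    Assumption: this writes to normal cached RAM, not a write-combined mapping.
    """
    src = bytes(chunk_size)
    dst = memoryview(bytearray(buf_size))
    written = 0
    start = time.perf_counter()
    for _ in range(repeats):
        # Only copy whole chunks so every slice assignment has a matching length.
        for off in range(0, buf_size - chunk_size + 1, chunk_size):
            dst[off:off + chunk_size] = src
            written += chunk_size
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / elapsed / 1e6


if __name__ == "__main__":
    # Figures quoted in the chat: ~800 MB/s observed, 4800 MB/s theoretical.
    print(f"fraction of peak: {fraction_of_peak(800, 4800):.2f}")
    print(f"local cached write: {measure_write_bandwidth():.0f} MB/s")
```

With the figures from the chat, the observed rate is roughly one sixth of the theoretical peak.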
01:36phire: I checked the kernel driver and it does appear to be mapping buffers with the writecombine flag enabled
01:37airlied: phire: is it using a neon memcpy?
01:38phire: it's using aarch64's paired register writes
01:39HdkR: ^ Which those will be bottlenecked by memory on Pi (and bigger cores even). No need to go wider
01:40airlied: okey then Im outta ideas :-P
01:41phire: I'm not sure if something is wrong somewhere, or I just have too high expectations of write performance to uncached ram
01:42airlied: I expect the latter :-P
01:44HdkR: You could try the non-temporal load/store pair instructions
01:45HdkR: stnp and ldnp
01:46HdkR: Not sure what the behaviour of the little Cortex cores are in that instance
01:46HdkR: and the big cores effectively say to avoid like the plague because they have bugs up to a certain point :P
01:46phire: these are A72s
01:47HdkR: Avoid like the plague then
03:19phire: I was considering hacking the kernel driver to allow me to map buffers via the GPU's L2 cache.
03:20phire: but after further research it looks like the rpi4 might actually have two GPU caches. One for v3d and one for the old vpu. If so I doubt they are coherent with each other
03:45mareko: why is crucible called crucible? is it a reference to Doom 2016?
03:49airlied: mareko: the README has two definitions of the word which I assume is meant to allude to where it came from
03:49airlied: vulkan -> forge -> crucible
05:50airlied: yay talos menu loads :-P
05:51airlied: oh and the game starts, 1.61 fps :-P
05:55Kayden: hey, at least it's FPS and not SPF!
05:55mareko: airlied: sw vulkan?
05:56airlied: mareko: yeah
05:56HdkR: airlied: You're getting farther than me. I can't even boot Talos yet :P
05:56mareko: airlied: congrats
05:56HdkR: ...I don't think
05:57airlied: some slight rendering issues
05:57HdkR: I see a rainbow, looks normal to me
05:59HdkR: airlied: But how fast does xmoto through zink, through vulkan run? :D
06:01mareko: airlied: now that sw vulkan is done, what's next?
06:02airlied: dota2 still crashes on start :-P
06:02airlied: mareko: probably fixing opencl further :-(
06:04airlied: upstreaming this will be a bit of work though
06:04airlied: llvmpipe multisample, llvmpipe arb_gpu_shader5 then the vallium layer
07:00danvet: agd5f_, where (well which kernel release) is the latest p2p series from könig aimed at?
07:00danvet: everything already in drm.git?
07:01danvet: not entirely sure
07:03MrCooper: AMD developers tend to work on the amd-staging-drm-next branch
07:05MrCooper: anholt_ jekstrand: FWIW, I find the pipeline overview page (click on the pipeline status icon, e.g. https://gitlab.freedesktop.org/anholt/mesa/pipelines/136080) most convenient for triggering jobs (and cancelling unneeded ones); there one can just click away without waiting for anything
07:29danvet: MrCooper, bunch of it is in drm-misc-next, but not sure it's all or whether there's more
07:29danvet: I largely ignored the driver side of this
07:49MrCooper: narmstrong: any idea what's up with the lima/panfrost failures on the 20.0 branch?
07:51narmstrong: MrCooper: LAVA master is down, let me disable them until the lab is back up
07:59MrCooper: narmstrong: I fixed the target branch to staging/20.0
11:57FireBurn: Hi, chromium keeps waking up my prime graphics card, I'm guessing it's checking something which brings it briefly out of sleep each time a tab is opened. Would there be a way to record what it is that's bringing the card out of sleep?
11:58FireBurn: On another note, is there a way to check that vaapi is actually being used?
11:59bnieuwenhuizen: I thought Chrome didn't support vaapi on linux?
12:00FireBurn: It does with the right patch
12:42FireBurn: I'm looking at amdgpu/amdgpu_atpx_handler.c but I'm not sure what wakes the card up
12:52agd5f_: danvet, it's all in drm-misc for 5.8
12:53agd5f_: FireBurn, any access to the card (ioctl, sysfs, etc.)
12:54FireBurn: Is it possible to figure out which access triggered it
12:54FireBurn: I remember a while back there was talk of having some values cached, so the card didn't need to be woken up needlessly
12:55agd5f_: FireBurn, you could put some logging in the kernel
12:56FireBurn: Just trying to figure out where
12:56agd5f_: look for pm_runtime_get_sync()
12:56agd5f_: amdgpu_drm_ioctl in amdgpu_drv.c is the ioctl wrapper
12:57agd5f_: all of the power related sysfs interfaces are in amdgpu_pm.c
12:57karolherbst: bnieuwenhuizen: it does actually.. some distributions have out of tree patches, but there is stuff upstream as well
12:59danvet: agd5f_, thx
13:17MrCooper: kusma: do you and Gert really need all jobs running in every pipeline for your D3D12 work? Mesa egress traffic is looking to go through the roof today
13:19kusma: MrCooper: I've been bisecting a regression that I can only trigger on CI, so I guess the answer to your question is "yes" :-P
13:19MrCooper: only if all jobs run?
13:19kusma: MrCooper: no, but it's surprisingly hard to trigger just a single step, due to how the UI is
13:19MrCooper: (that was kind of rhetorical :)
13:20MrCooper: it's really easy on the pipeline overview page
13:20kusma: MrCooper: I either have to go in and cancel all other steps manually, just to trigger the first step.
13:20kusma: Or cancel them as they start up.
13:20MrCooper: well, step 0 is only triggering the required container jobs in the first place :)
13:21kusma: MrCooper: well, yeah. But those are the only ones you can explicitly trigger, the rest gets implicitly triggered.
13:22MrCooper: e.g. on https://gitlab.freedesktop.org/gerddie/mesa/pipelines/136518 trigger one or two container jobs, then cancel any unneeded build/test jobs
13:22kusma: MrCooper: just for the record, I'm done with my bisecting. It turns out, well, the glsl_type system isn't initialized quite as often as it might be needed ;-)
13:23kusma: MrCooper: I haven't looked at what gert has been triggering, I can ask him
13:23kusma: MrCooper: that particular one is a merge-request, though.
13:23kusma: (but not to the upstream master)
13:24kusma: MrCooper: I suspect that the bulk of what you've been seeing is my bisecting. Sorry for that.
13:28kusma: MrCooper: Also, let me know if you have any good suggestions for how we can prevent triggering too much stuff on our integration-branch. We're currently moving fast, so it's kinda busy. Maybe we should only build windows-stuff per MR, and test other things at a lower frequency or something?
13:41daniels: mind you, the shader-db stuff is run on x86, so we could bisect that locally inside the build container?
15:15MrCooper: collabora-ec2-win-docker-1 is being slow again: https://gitlab.freedesktop.org/hakzsam/mesa/-/jobs/2409796
15:15MrCooper: daniels: ^
15:16MrCooper: more like frozen it looks like :)
15:58Lyude: seanpaul: poke, you around?
15:59seanpaul: Lyude: hey!
16:00Lyude: seanpaul: so-i think we either need to revert the multi-message support for mst or try to add some sort of heuristic for testing if it works or not on hubs
16:00Lyude: i've got a lenovo dock right now that doesn't work at all with it
16:01Lyude: seanpaul: yeah :\, kinda explains why they say not to use it in DP 2.0
16:01seanpaul: probably revert makes the most sense
16:01seanpaul: but i worry about the query stream enc messages that need to be sent periodically
16:03Lyude: seanpaul: how often do they need to be sent btw?
16:04Lyude: also, i'll go ahead and send out a revert
16:07seanpaul: Lyude: every ~2 seconds
16:07seanpaul: for each active stream
16:08seanpaul: yeah, so interleaving messages gets kind of important
16:08Lyude: seanpaul: so-my thought was if we really needed this to work on some hubs, we could eventually implement something to asynchronously test if interleaving messages works
16:09Lyude: although that might get somewhat complicated if you have to deal with nested topologies
16:09Lyude: that, or whitelist hubs where we know interleaving works
16:09seanpaul: yeah, and then i suppose you hope the hub doesn't lose its mind and fails gracefully so we can detect it
16:10Lyude: seanpaul: mhm
16:19daniels: MrCooper: yeah, I think maybe 4 jobs was too aggressive. I've just dropped the limit to 2.
16:22Lyude: seanpaul: huh, actually, I'm trying to confirm right now that it's not the DP monitor I was using causing problems (turns out not a single hub seems to like this particular monitor, and they end up resetting) since just using this hub with some hdmi displays seems to work fine. although I'm also not seeing it send any messages with seqno=1
16:24Lyude: ah-nope, I got it to fail now
16:36anholt_: austriancoder: are you ok with drm-shim defaulting to always being renderD128 for etnaviv?
16:36anholt_: or would you rather retain the "pick an unused node" behavior?
16:51austriancoder: anholt_: rb your MR
17:06Lyude: seanpaul: btw, mst revert is on the ml
18:23Lyude: seanpaul: mhhh-looks like I didn't revert hard enough? I'm still seeing some messages getting sent with seqno=1, I'll send out another patch in a bit
18:23seanpaul: Lyude: iirc, seqno=1 was generated even before that patch
18:24seanpaul: it was just that we weren't interleaving the messages
18:41Lyude: seanpaul: :(, it looks like it suggests that we shouldn't use seqno=1 either in the spec :\, although I don't think I've seen any hubs that break on that other than one specific dell monitor that is just known for being kind of broken
18:42seanpaul: oof, that's going to take a bit more surgery i think
18:42Lyude: yeah, i don't mind handling that. kind of wonder if maybe changing that will finally make this dell monitor somewhat sensible
18:42seanpaul: we should be able to just blindly set seqno=0 with the serialized msgs, but that will leave a bunch of unnecessary code around
18:43seanpaul: if that's the problem, that would be great
18:43seanpaul: i think we've had seqno=1 possible forever
18:43Lyude: it's kind of a shame because I'm not sure I've ever seen any other hubs that can't handle this
18:43Lyude: but dp 2.0 seems to imply we need to clear seqno
18:43seanpaul: yeah, i assume there are others, otherwise they'd just write those hubs off as non-compliant
18:44seanpaul: instead of explicitly chopping off that part of the spec
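[editor's note] The idea seanpaul and Lyude discuss above, sending sideband messages strictly one at a time and always with seqno=0, can be modelled with a tiny queue. This is only a sketch of the serialization concept; `SerializedSidebandQueue` and its methods are hypothetical names and do not correspond to the kernel's DP MST helper code.

```python
from collections import deque


class SerializedSidebandQueue:
    """Toy model: one message in flight at a time, always tagged seqno=0."""

    def __init__(self):
        self.pending = deque()   # messages waiting for the link to be free
        self.in_flight = None    # message currently awaiting a reply
        self.sent = []           # log of (seqno, payload) actually transmitted

    def submit(self, payload):
        self.pending.append(payload)
        self._kick()

    def _kick(self):
        # Only transmit when nothing is outstanding, so messages never interleave.
        if self.in_flight is None and self.pending:
            self.in_flight = self.pending.popleft()
            self.sent.append((0, self.in_flight))

    def reply_received(self):
        # The reply frees the link; the next queued message can go out.
        done = self.in_flight
        self.in_flight = None
        self._kick()
        return done


if __name__ == "__main__":
    q = SerializedSidebandQueue()
    q.submit("QUERY_STREAM_ENC_STATUS")  # hypothetical payload labels
    q.submit("ENUM_PATH_RESOURCES")
    print(q.sent)
    q.reply_received()
    print(q.sent)
```

The cost of this design is the one seanpaul raises earlier: periodic messages have to wait behind whatever is currently in flight.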
18:59jekstrand: bnieuwenhuizen: Can I get an ACK from you on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4675
18:59imirkin: vsyrjala: also fix edid-decode for that dispid off-by-one thing? (or perhaps you already sent a patch?)
19:00jekstrand: krh: Who's the best turnip person for ACKs?
19:00krh: jekstrand: oh sorry, let me do that
19:01krh: I think I can still do that, though turnip is almost moving too fast for me to keep up these days
19:02krh: jekstrand: thanks for looking after turnip
19:03jekstrand: When is turnip going to be good enough we can get some basic CI on it?
19:03jekstrand: I find myself having to touch it probably once a month or so.
19:03krh: I think it is now, but we're having trouble figuring out how to run the VK CTS pre-commit
19:03krh: last I heard it runs the whole CTS without crashing
19:04anholt_: yeah, I could probably just throw a fractional run on the current docker cheza setup already.
19:04jekstrand: Nice work all!
19:04jekstrand: Doing a full CTS run would be a bit on the nuts side
19:05vsyrjala: imirkin: it already decodes it correctly
19:05jekstrand: You only need about 20 of those 80k depth/stencil tests though
19:06imirkin: vsyrjala: hm? https://git.linuxtv.org/edid-decode.git/tree/parse-displayid-block.cpp#n219 -- am i looking at the wrong thing?
19:06anholt_: I think we estimated that we could fit in overnight full runs on the 8 cheza
19:06anholt_: but also as the passrate goes up the runtime should be going down too
19:06imirkin: vsyrjala: oh, i'm blind. it has the 1+ in front, rather than at the end!
19:07bnieuwenhuizen: anholt_: as you enable more extensions the passrate goes up but so does the runtime ... (skip == cheap)
19:07anholt_: bnieuwenhuizen: fair, and flto and cwabbott have been going strong on that front!
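[editor's note] A "fractional run" as mentioned above can be pictured as running every Nth test from the full list. The helper `select_fraction` is a hypothetical illustration, not the option of any particular test runner, and the test names are made up.

```python
def select_fraction(tests, fraction, start=0):
    """Return every `fraction`-th test, beginning at index `start % fraction`."""
    if fraction < 1:
        raise ValueError("fraction must be >= 1")
    return tests[start % fraction::fraction]


if __name__ == "__main__":
    # Hypothetical list standing in for ~80k depth/stencil tests.
    tests = [f"depth_stencil.case_{i}" for i in range(80000)]
    subset = select_fraction(tests, 4000)
    print(len(subset))  # 20 tests out of 80000
```

Varying `start` between pipelines would eventually cover the whole list while keeping each individual run short.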
19:44robclark: hmm, with pubg, according to $debugfs/gem, we somehow end up with ~20k bo's with prsc->bind==PIPE_BIND_VERTEX_BUFFER before running out of memory, and crashing.. that seems a bit.. excessive..
19:45bnieuwenhuizen: tried suballocation?
19:47imirkin: in nouveau, we suballocate everything less than ... something.
19:49robclark: yeah, there are a bunch of things we should suballocate.. but all the same, that doesn't sound like a reasonable number, does it?
19:50imirkin: does seem large :)
19:50bnieuwenhuizen: it is largish, but I think for GL not unreasonably so?
19:50imirkin: 20k non-freed vbo's? seems like a lot
19:50robclark: some of them are also large(ish), like couple hundred kb, so sub-alloc won't help there
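[editor's note] The suballocation idea raised by bnieuwenhuizen and imirkin, and robclark's point that it does not help with larger buffers, can be sketched as a simple bump allocator over slabs. `SlabSuballocator`, the 64KiB slab size, and the 4KiB threshold are all assumptions for illustration; no driver's real policy is implied.

```python
class SlabSuballocator:
    """Toy bump suballocator: small buffers share slab BOs, big ones get their own."""

    def __init__(self, slab_size=64 * 1024, threshold=4096, align=64):
        self.slab_size = slab_size
        self.threshold = threshold
        self.align = align
        self.next_bo_id = 0
        self.current_slab = None  # [bo_id, used_bytes] of the slab being filled
        self.slab_count = 0
        self.dedicated_count = 0

    def _new_bo(self):
        bo_id = self.next_bo_id
        self.next_bo_id += 1
        return bo_id

    def alloc(self, size):
        """Return (bo_id, offset) for a buffer of `size` bytes."""
        if size > self.threshold:
            # Too large to share a slab: one BO per buffer.
            self.dedicated_count += 1
            return (self._new_bo(), 0)
        aligned = (size + self.align - 1) // self.align * self.align
        if self.current_slab is None or self.current_slab[1] + aligned > self.slab_size:
            self.current_slab = [self._new_bo(), 0]
            self.slab_count += 1
        offset = self.current_slab[1]
        self.current_slab[1] += aligned
        return (self.current_slab[0], offset)

    def bo_count(self):
        return self.slab_count + self.dedicated_count


if __name__ == "__main__":
    sa = SlabSuballocator()
    for _ in range(1000):
        sa.alloc(256)        # small vertex buffers share slabs
    sa.alloc(200 * 1024)     # a "couple hundred kb" buffer gets its own BO
    print(sa.bo_count())
```

In this toy setup a thousand 256-byte buffers fit in four slabs, while each large buffer still costs a full BO, which is why suballocation alone would not explain or fix a count that keeps climbing.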
19:51robclark: I disabled the bo cache, so it isn't that..
19:53imirkin: could you be leaking persistent / coherent bo's?
19:53imirkin: seeing some issues with some of the semi-recent work by mareko with that in nouveau -- looks like we don't flush some cache and everything explodes
19:54imirkin: basically glBegin/End data gets dumped into a coherent-mapped bo now (if supported)
19:55robclark: pretty sure something is leaking, the # of bo's climbs until *boom*
20:05labbott: w/in 31
20:42Lyude: wow! this is absolutely nuts
20:42Lyude: seanpaul: using seqno=0 always has just fixed a monitor we have not been able to get working 100% of the time since I started working at red hat five years ago
20:43seanpaul: Lyude: woohoo! and also, owwww!
20:44seanpaul: what's the monitor, btw?
20:44Lyude: seanpaul: we literally had just assumed the monitor was broken because none of the other mst hubs we have ever had any issues with this
20:44seanpaul: i wonder if we've got reports of that badness
20:44Lyude: seanpaul: P2415Qb
20:44Lyude: erm, Dell P2415Qb
20:45imirkin: is that one of the early 4k ones?
20:45Lyude: robclark: btw ^ you will get a kick out of this since you were the first one to try fixing that thing, we finally got the super problematic Dell MST monitor working 100% of the time
20:45Lyude: imirkin: yep
20:46robclark: heh, nice
20:46seanpaul: Lyude: thanks, doesn't look like we have reports, but i'll keep it in mind for future
21:56jekstrand: hakzsam: I've got a patch which fixes the RADV regression
21:56jekstrand: hakzsam: It regresses iris. :-(
21:56jekstrand: hakzsam: I'll run my ANV fossil-db and we'll see what happens there.
21:56jekstrand: hakzsam: I suspect most of the hurt is just from some added loop peeling but it causes us to lose 8 SIMD16 programs and that's not so good
21:58jekstrand: hakzsam: Sadly, running fossils takes like an hour. :-(
22:26kisak: I wonder if a general mesa 20.0.x release tracker milestone would work just as well as a bunch of smaller milestones, since the end result would be exactly the same except a pile of closed issue reports on it as they get resolved
22:27kisak: as an aid to track issues that are known to affect the stable branches
22:31Kayden: yeah, that seems more sane than adding 20.0.(N, N+1, N+2) etc milestones.
23:23kisak: looks like the meson-windows-vs2019 runner is offline