08:53 rellla: if i want to get an already (in master) merged commit to stable, it's just an email to mesa-stable with this "Nominate" subject?
08:55 doras: Does it make sense that under moderate load, even though flips are scheduled to the next vblank, the flip event is received only after ~3-5 vblanks?
09:01 doras: This is with 144hz refresh rate
09:04 doras: The effect of this is that trying to do vsync at the compositor level results in a maximum FPS of ~28-36, which is kind of terrible. Especially since the application renders at around ~60 FPS.
09:08 doras: This is "solved" by enabling vsync at the application level so it is throttled. The compositor then gets to put its frames on the screen at around the same pace. This is really not a solution, however.
09:17 MrCooper: rellla: or an MR for the staging/<version> branch
09:19 MrCooper: doras: that sounds like something's wrong somewhere; the expectation is that the kernel sends the event immediately when processing the HW interrupt signalling flip completion, and userspace should be able to process it well before the next vblank
09:20 MrCooper: doras: where/how are you measuring this?
09:24 doras: MrCooper: Mutter + Xwayland + DXVK/RADV. HITMAN 2, specifically. I'm getting ~50-60 FPS in the game (DXVK_HUD=fps), and 28-36 FPS at the compositor level (CLUTTER_SHOW_FPS=1). I'm saving the current time before calling drmModePageFlip and then I'm using the presentation time from the event to see when it actually occurred.
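The measurement doras describes, and the frame-rate ceiling it implies, can be sketched as plain arithmetic. This is only an illustration: the helper names are hypothetical (not part of libdrm or mutter), and it assumes both timestamps are in nanoseconds on the same clock. A flip that takes 4 to 5 refresh periods at 144 Hz works out to 36 and 28.8 presented frames per second, which matches the reported 28-36 FPS.

```c
#include <stdint.h>

/* Number of refresh periods (rounded up) between the time saved just before
 * drmModePageFlip() and the presentation time reported in the flip event.
 * Hypothetical helper; both timestamps are assumed to share one clock. */
static inline int64_t vblanks_elapsed(int64_t submit_ns, int64_t present_ns,
                                      int refresh_hz)
{
    int64_t period_ns = 1000000000LL / refresh_hz; /* ~6.94 ms at 144 Hz */
    return (present_ns - submit_ns + period_ns - 1) / period_ns;
}

/* Presentation rate if every flip takes `vblanks_per_flip` refresh periods. */
static inline double effective_fps(int refresh_hz, int vblanks_per_flip)
{
    return (double)refresh_hz / vblanks_per_flip;
}
```

For example, a flip submitted at t = 0 and presented 20 ms later on a 144 Hz display counts as 3 refresh periods.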
09:25 pq: doras, what might maybe be happening is that the app, unthrottled, queues up multiple frames' worth of rendering to the GPU, and the compositor's own job for the composite rendering is then starved by the app. This doesn't delay pageflip events from flowing, but it does delay pageflips from happening in the first place.
09:26 doras: I think I'm getting the event at the time of the flip, but I'm a bit surprised it takes 3-6 vblanks to perform single flip.
09:27 doras: a single*
09:27 pq: doras, the flip will wait for all dependent GPU jobs to finish. The last job is the compositor's compositing job, which depends on all application GPU jobs submitted before it (frames submitted to and processed by the compositor).
09:28 pq: so an app that spams the GPU can easily starve the compositor
09:28 pq: doras, GNOME has a tool called sysprof that might tell you in more detail what happens.
09:30 doras: pq: isn't the image "ready" from the application's point of view when the surface commit occurs? Is so, then the only dependencies for the frame are the compositor's own work on the buffer to be swapped.
09:30 doras: If so*
09:31 pq: doras, no. When application posts the frame, often the GPU has not even started on it yet.
09:32 pq: because people like moar fps, and this allows interleaving more stuff to gain more fps at the cost of latency
09:32 doras: I see.
09:32 pq: it's pipelining, so to say
09:33 pq: and less stalling sync between GPU and CPU, which would otherwise just performance all over
09:33 pq: *hurt performance
09:35 malice: People also like less latency
09:35 pq: but the pipelining becomes detrimental if the app is not throttled to the system refresh cycle, as it may queue multiple frames' worth of work that all needs to complete before the first frame can even show (depending on how many frames the display server managed to receive and process, queueing further into the pipeline)
09:35 pq: actually, there are two different things
09:36 pq: One is that the display cannot flip until all dependent GPU jobs have completed.
09:37 pq: The other is that even if the compositor drops app frames on the floor (which it must do if the app is not throttled), the compositor probably takes the latest frame submitted. But that latest GPU job might not be able to complete until all the GPU work that is going to be dropped on the floor has completed too.
09:38 MrCooper: right, so one thing that's needed is for the compositor to be able to determine when the GPU has finished drawing a frame, so it can pick the latest done frame
09:39 pq: IMO running apps not vsync'd is simply a bad idea. :-)
09:40 doras: These are interesting points. What should a compositor do to drop an app's frame? I'd like to check if mutter already does that.
09:40 pq: doras, https://feaneron.com/2019/05/31/profiling-gnome-shell/
09:41 MrCooper: it can't "drop" anything per se, just ignore it
09:41 MrCooper: but the GPU still does the work drawing it for nothing
09:41 pq: the thing that hurts are the jobs queued to the GPU, and those cannot be dropped
09:44 doras: It sounds like the app needs throttling, not vsync necessarily.
09:47 doras: Does Wayland offer any kind of rendering throttling mechanism?
09:47 malice: pq: IMO certain games are unplayably shit with vsync enabled
09:48 emersion: doras: yes, wl_surface.frame
09:49 doras: emersion: this requires the app's cooperation, doesn't it?
09:49 doras: Or Xwayland, in this case.
09:51 pq: doras, EGL implementation will internally throttle to wl_surface.frame, unless the app sets eglSwapInterval to zero. But that's for wayland native apps, through Xwayland I'm not actually sure.
09:51 emersion: wayland clients that don't use another way to throttle their rendering loop do use it
09:51 emersion: but in this case, Xwayland i think doesn't care
09:52 emersion: X11 doesn't have a frame throttling mechanism apart from PRESENT right?
09:52 MrCooper: Xwayland only uses frame callbacks for emulating sync-to-vblank
09:52 emersion: i think there are patches to make PRESENT use the presentation-time protocol
09:52 pq: letting the EGL implementation throttle is a bad idea though, because it blocks the app.
09:52 emersion: well, i guess this doesn't matter too muchy for a game
09:52 emersion: much*
09:53 pq: malice, I think that says more about the game engine than anything else.
09:53 MrCooper: emersion: still won't make it always wait for the frame callback
09:53 malice: pq: Can you get latencies that match non-vsync with vsync?
09:53 emersion: MrCooper: do you mean that a client not using PRESENT won't wait?
09:54 emersion: are event clients using PRESENT won't wait in some cases?
09:54 emersion: or even*
09:55 MrCooper: no, only clients using Present with PresentOptionAsync will not wait
09:55 pq: malice, it depends on who you ask, and how smart the game engine is. IMO yes, others say no. I assume we are talking here about a case where the game is able to finish rendering multiple times per refresh just fine, and not a case like doras'.
09:55 MrCooper: (which basically corresponds to "no sync-to-vblank")
09:56 pq: malice, but I have the belief that most people writing game engines do not bother investing the extra thought of making things work, so end users end up claiming that vsync is bad.
09:56 malice: pq: So, 300fps to get decent latency with vsync? :D
09:56 pq: sorry?
09:57 malice: Multiple times per refresh, aka at least 2, 144Hz*2 = 288fps
09:57 pq: malice, there are also people who *want* tearing, and I have never understood that.
09:57 malice: Although, yeah, vsync is less shit with 144Hz, but still
09:58 malice: pq: I want tearing. Lack of tearing makes me concerned something is trying to eliminate it at the cost of input lag :(
09:59 malice: Also, tearing isn't very noticeable with 144Hz
09:59 pq: malice, isn't the whole point of turning vsync off that your game can render multiple times per refresh, so that the latest frame that actually gets presented (and all the earlier one got discarded) is more fresh (less latency)?
09:59 rellla: MrCooper: thanks, i hope this is acceptable https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4017
09:59 malice: pq: Yes
10:00 doras: So... what are we saying? Say an X app uses PRESENT, could Xwayland throttle it in if it were to use surface.frame?
10:00 pq: doras, didn't you say you toggled vsync off on your game? Don't do that. ;-)
10:00 doras: throttle it if*
10:00 pq: If you do that, you tell it to not throttle, so it won't throttle.
10:04 doras: pq: I'm trying to understand if there is a more general solution to this. Not all apps have this toggle, and it's never the default. Not a very good experience.
10:06 HdkR: ignore SwapInterval setting ;)
10:07 pq: I think fundamentally that is impossible in a display server. Apps do not submit GPU jobs via a display server, they submit GPU jobs straight to the GPU.
10:07 pq: so the GPU driver stack, e.g. EGL implementation, would be the place to override app choices
10:08 pq: but then, people claim the implementation is broken, if it doesn't do what the app says through the API
10:08 MrCooper: Mesa has driconf vblank_mode for that
10:10 doras: Alright then. If we shouldn't limit apps "by force", to get any kind of decent experience we need the GPU to prefer the compositor's jobs over apps', so it at least match their frame rate.
10:11 doras: so it would*
10:12 pq: Reality is often wildly different from the optimal situation. Games are still written as busy-loops, and vblank_mode exists.
10:15 HdkR: If reality was optimal then all games would be front buffered and chasing the beam :P
10:19 MrCooper: doras: the i915 kernel driver has something like that, and I've been playing with adding something like it to amdgpu; however, the application frame selected by the compositor can still cause the page flip to be delayed. To avoid that, the compositor needs to be able to determine when the application frame is done
10:22 MrCooper: and only select application frames that are done
10:25 emersion: explicit sync?
10:25 MrCooper: jadahl: ^ I think this should already be possible with dma-bufs, by checking the dma-buf fd state?
10:28 MrCooper: though I guess a dma-buf fd is only passed the first time a buffer is attached to a surface
10:29 MrCooper: there was talk about adding kernel UAPI for retrieving the implicit sync object for a GEM BO, but I don't know the status of that
10:35 jadahl: doras: one can make the compositor's egl context high priority which may cause it to be prioritized. doing that is a can-of-worm opener though as it requires the binary to have special flags set, causing various issues, but you can experiment with it if you want, mutter supports it.
10:36 emersion: would be nice to always allow the DRM master to get a high-priority context
10:36 jadahl: that'd be convenient indeed
10:37 doras: jadahl: is this based on EGL_IMG_context_priority?
10:38 emersion: yes
10:38 jadahl: doras: yepp
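As a rough illustration of what requesting a high-priority context through EGL_IMG_context_priority looks like, the sketch below only builds the attribute list that would be passed to eglCreateContext(). The helper name is hypothetical and this is not mutter's code. The token values are defined locally so the fragment compiles without EGL headers. The extension treats the priority as a hint, so a real caller would check the granted level with eglQueryContext() afterwards.

```c
#include <stdint.h>

/* Token values from EGL and the EGL_IMG_context_priority extension,
 * defined here only so the sketch builds without <EGL/egl.h>. */
#ifndef EGL_CONTEXT_PRIORITY_LEVEL_IMG
#define EGL_CONTEXT_PRIORITY_LEVEL_IMG 0x3100
#define EGL_CONTEXT_PRIORITY_HIGH_IMG  0x3101
#endif
#ifndef EGL_CONTEXT_CLIENT_VERSION
#define EGL_CONTEXT_CLIENT_VERSION 0x3098
#endif
#ifndef EGL_NONE
#define EGL_NONE 0x3038
#endif

/* Hypothetical helper: fill `attribs` (room for at least 5 entries) with a
 * context attribute list, optionally requesting high priority.
 * Returns the number of entries written, including the EGL_NONE terminator. */
static inline int build_context_attribs(int32_t *attribs, int gles_version,
                                        int high_priority)
{
    int n = 0;
    attribs[n++] = EGL_CONTEXT_CLIENT_VERSION;
    attribs[n++] = gles_version;
    if (high_priority) {
        attribs[n++] = EGL_CONTEXT_PRIORITY_LEVEL_IMG;
        attribs[n++] = EGL_CONTEXT_PRIORITY_HIGH_IMG;
    }
    attribs[n++] = EGL_NONE;
    return n;
}
```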
10:39 jadahl: but I do like emersion's idea about always giving DRM masters it by default
10:40 MrCooper: still won't help with the issue I described above though, as it's the application frame delaying the flip
10:40 jadahl: much better than giving compositor processes special caps triggering various security lock downs
10:40 jadahl: MrCooper: because it's not done in time?
10:40 MrCooper: yep
10:40 jadahl: ah
10:41 jadahl: I wonder if that could be solved by queuing a page flip for every frame it sends, then making the kernel choose whatever is the most recent completed one
10:42 MrCooper: it's probably better to try that in userspace first, and only do it in the kernel if absolutely unavoidable
10:43 MrCooper: as doing it in the kernel would make the kernel implementation and UAPI more complex
10:43 jadahl: you mean making the application not have such a rush, or avoiding a page flip if some dma-buf is not " complete"?
10:44 jadahl: i.e. use some "old" buffer or something
10:44 MrCooper: only selecting application frames which are done for presentation
10:44 jadahl: wouldn't that mean the compositor has to keep a queue? then keep track of which ones are completed
10:44 MrCooper: right
10:46 MrCooper: your suggestion of queuing flips in the kernel is only feasible for direct scanout of application buffers anyway, not for compositing?
10:46 jadahl: that'd violate wayland protocol
10:46 MrCooper: how so?
10:47 jadahl: we have to pick the buffer from the last commit to be perfect
10:47 jadahl: could probably be a bit flexible if e.g. there are no subsurfaces or anything
10:48 MrCooper: the compositor can keep a queue of commits, and select the most recent one for which all buffers are ready?
10:48 doras: How about only committing complete frames?
10:49 emersion: a compositor supporting explicit sync could use the "most recent completed frame"
10:49 emersion: can't remember whether weston actually does that
10:50 emersion: > doras | How about only committing complete frames?
10:50 emersion: can't work, since with implicit sync clients themselves don't know when a frame is "complete"
10:50 MrCooper: sure they can
10:50 emersion: how?
10:51 MrCooper: this is not an issue for the client side drivers
10:52 doras: Why not?
10:52 emersion: how can clients know when a frame is complete?
10:52 MrCooper: but it involves either having a presentation thread, or blocking an application thread
10:53 emersion: threads, yay
10:53 MrCooper: the client drivers can do this via their kernel drivers
10:53 MrCooper: via driver specific UAPI
10:54 ickle: is_my_frame_complete(int dmabuf) { return poll(&(struct pollfd){dmabuf, POLLIN}, 1, 0) == 1; }
10:54 emersion: not really a solution
10:54 emersion: hm
10:54 MrCooper: ickle: right, the problem being the dma-buf fd is only passed once, then the buffer can be reused any number of times
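A slightly fuller form of ickle's one-liner, with headers and an explicit return type, might look like the sketch below. The function name is made up for illustration. It assumes the usual dma-buf poll behaviour, where POLLIN becomes ready once pending writes to the buffer have signalled, and it only helps if the compositor still holds the dma-buf fd, which is the caveat MrCooper raises.

```c
#include <poll.h>
#include <stdbool.h>

/* Non-blocking readiness check (hypothetical name). Returns true if `fd`
 * reports POLLIN right now; for a dma-buf fd this is assumed to mean that
 * the rendering which writes to the buffer has completed. */
static inline bool dmabuf_ready_for_read(int dmabuf_fd)
{
    struct pollfd pfd = { .fd = dmabuf_fd, .events = POLLIN };

    /* Timeout 0: never block, just sample the current state. */
    return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN) != 0;
}
```

The zero timeout keeps the check cheap enough to run for each candidate buffer shortly before the compositor commits to a frame.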
10:55 ickle: the wayland protocol is one per-frame iirc
10:55 ickle: at least you don't want to reuse the frame you give to the compositor until the compositor says it's finished
10:58 pq: jadahl, no, it would not violate Wayland.
10:58 MrCooper: ickle: the dma-buf fd is passed for creating a wl_buffer, which can be reused for any number of frames
10:58 MrCooper: same principle as DRI3
11:00 ickle: sure, but if you are using implicit fences (so cover the entire buffer) to generate the fence for frame-completion, you do not re-use that frame until you have the ack
11:00 ickle: otherwise it will be forever delayed
11:00 jadahl: pq: how so? if you commit state 1A and 1B to two surfaces atomically, then 2A and 2B atomically to the same two, if you post 1A and 2B because 2A was not finished, the state will be synchronized
11:00 ickle: (well not forever, until the fence snapshot is acquired)
11:01 jadahl: eh, not be synchronize
11:01 jadahl: (that is if you have multiple planes)
11:01 MrCooper: ickle: right, but the dma-buf fd is closed after import
11:01 pq: jadahl, you do need to keep state consistent, but other than that the compositor is free to delay presentation just like DRM delays flips until vblank.
11:02 jadahl: pq: well if your frame consists of buffers from two surfaces in two planes, those need to update at the same time
11:02 pq: jadahl, yes, of course.
11:02 jadahl: that's what I meant
11:02 pq: but only for sub-surfaces in sync mode
11:02 jadahl: you can't just pick whatever the most recent buffer in those cases
11:02 MrCooper: jadahl: that's why the queue needs to be for consistent states, not just for individual buffers
11:02 jadahl: right
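The queue of consistent states discussed above can be sketched as follows. The types and names are hypothetical and far simpler than real compositor state: each queued commit records whether every buffer it references has finished rendering, and the compositor presents the newest commit that is ready as a whole, so it never mixes 1A with 2B.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SURFACES 4 /* illustrative limit, not from any protocol */

/* One atomically committed state covering several surfaces. */
struct commit {
    size_t n_buffers;
    bool buffer_done[MAX_SURFACES]; /* has the GPU finished this buffer? */
};

/* A commit is presentable only if all of its buffers are done. */
static inline bool commit_is_ready(const struct commit *c)
{
    for (size_t i = 0; i < c->n_buffers; i++)
        if (!c->buffer_done[i])
            return false;
    return true;
}

/* queue[0] is the oldest commit, queue[n - 1] the newest.
 * Returns the index of the newest fully ready commit, or -1 if none. */
static inline int pick_latest_ready(const struct commit *queue, size_t n)
{
    for (size_t i = n; i-- > 0;)
        if (commit_is_ready(&queue[i]))
            return (int)i;
    return -1;
}
```

With state 1 fully rendered and 2A still pending, the function returns the index of state 1 rather than combining buffers from both states.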
11:03 jadahl: you'd also need to time the page flip queuing to as close to the deadline as possible
11:03 jadahl: well, need is a strong word
11:03 pq: MrCooper, there is no reason to close the dmabuf fd after import. In fact, it cannot be closed, because the compositor needs to re-import on re-use.
11:04 MrCooper: why does it need to re-import? Wouldn't that be an easy DoS by exhausting available file descriptors?
11:05 pq: MrCooper, to assure that all relevant caches get invalidated. AFAIK, EGL/OpenGL offer no other API to do that.
11:05 pq: DoS, sure
11:06 pq: there are so many very easy ways to DoS a compositor, like simply creating wl_regions in a loop.
11:06 MrCooper: if the dma-buf fd is available, ickle's method should work
11:06 pq: the compositor likely runs out of memory before the client runs out of object ids
11:07 ickle: otherwise stash it inside a syncobj
11:07 pq: and making the region complex, you don't necessarily need to even create new protocol objects
11:07 MrCooper: a process only has 1024 file descriptors available by default
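The descriptor budget MrCooper mentions can be checked at run time. The sketch below reads the soft RLIMIT_NOFILE limit with getrlimit(). The wrapper name is hypothetical, and the actual value depends on the system configuration; 1024 is a common default soft limit on Linux.

```c
#include <sys/resource.h>

/* Hypothetical wrapper: store the soft limit on open file descriptors in
 * *limit. Returns 0 on success, -1 if getrlimit() fails. */
static inline int fd_soft_limit(rlim_t *limit)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *limit = rl.rlim_cur;
    return 0;
}
```

A compositor that keeps one dma-buf fd open per client buffer would compare its open-descriptor count against this value.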
11:08 HdkR: DOS any PC by allocating all the RAM. Becomes unusable, least of your worries is allocating wl_regions :P
11:10 pq: MrCooper, such is life
11:11 MrCooper: pq: nothing like that is necessary with DRI3
11:11 pq: *shrug* as I said, I believe this requirement comes from EGL API
11:12 pq: something about creating an EGLImage is allowed to make a copy or something
11:12 pq: or maybe using the EGLImage is allowed to make a copy
11:29 hakzsam: MrCooper: no more comments on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3960 btw?
11:29 CounterPillow: Not sure if anyone involved with the announcements on mesa3d.org is in here, but the 20.0.0 announcement contains a copy-paste related error. It says " Mesa 20.0.0 is a new development release. People who are concerned with stability and reliability should stick with a previous release or wait for Mesa 19.3.1." but that Mesa 19.3.1 should clearly be 20.0.1
11:33 CounterPillow: Also "Mesa 20.0.0 implements the Vulkan 1.1 API, but the version reported by the apiVersion property of the VkPhysicalDeviceProperties struct depends on the particular driver being used." should be 1.2 now, I believe. :)
11:34 doras: MrCooper, jadahl, pq, ickle, emersion: I kind of lost you on the approach to solving this, but if the compositor's own frames are delayed to a much later vblank than the next because of its clients' slow/intensive rendering, this sounds like the main reason for stutter of the entire desktop when compositing multiple windows where only some of them are intensive.
11:35 doras: Animations included.
11:36 MrCooper: doras: yes, I think solving this should be pretty high priority, thanks for raising it; that said, it's usually not so bad with apps which can sustain framerate >= refresh rate
11:38 MrCooper: hakzsam: not really, other than "don't waste bandwidth" :)
11:42 hakzsam: sure thing
11:42 hakzsam: should I wait for !2569 before pushing CI fossilize support?
11:42 doras: MrCooper: which is probably why I started noticing it since I got a 144hz monitor.
11:44 doras: I'm not even sure on which project to report this.
11:45 MrCooper: the display server you're using, mutter I suppose?
11:55 pq: doras, so your issue is that you run your game windowed and it causes the whole desktop to stutter, not only the game?
11:59 MrCooper: it results in presenting at a lower framerate even with the game fullscreen
11:59 doras: ^
12:00 pq: if the game is fullscreen and there is only one monitor, then what is there to fix? Other than turning vsync on in the game.
12:00 MrCooper: same thing: don't present a buffer which isn't done
12:01 MrCooper: otherwise the flip may be delayed
12:01 pq: if the game cannot reach your monitor refresh rate even vsync'd, then it certainly is going to be only worse unthrottled
12:01 pq: MrCooper, I don't understand. In the fullscreen case, there is nothing else to be presented than the game frame that is not done yet.
12:01 MrCooper: nothing else is needed
12:02 pq: you completely lost me
12:02 MrCooper: if the buffer isn't done in time for vblank, the flip gets delayed => lower presentation framerate
12:02 pq: yes
12:03 MrCooper: but most likely a done buffer could have been presented instead, and would have allowed to flip in time
12:03 pq: erm
12:04 MrCooper: doras: maybe talk a bit about your GpuTest test case
12:04 MrCooper: lunch time for me, bbl
12:05 doras: pq: vsync fixes both of the above issues. You just need to turn it on pretty much everywhere.
12:05 pq: 144 Hz monitor, game can achieve 60 Hz max - what use is there for the compositor to pageflip at 144 Hz instead of at the game rate?
12:05 pq: wait, what both cases?
12:06 MrCooper: pq: doras created a test case with GpuTest running at +- 60 Hz, but mutter only presents at ~half that
12:06 danvet: seanpaul, btw what's happening with the drm flight recorder patches?
12:06 doras: Single full screen window and a composited desktop with multiple windows.
12:09 pq: oh, the compositor is not reaching even the game rate but is significantly below that, ok.
12:09 doras: pq: the compositor would only achieve half of the game's frame rate regardless of the frame rate.
12:10 pq: right, I suppose the compositor using only the finished buffers might help, if this is because of the compositor depending on unfinished renderings.
12:10 bnieuwenhuizen: doras: is this a game using radv?
12:11 doras: bnieuwenhuizen: yes. Through Proton/dxvk.
12:11 bnieuwenhuizen: d3d11 or d3d12?
12:12 doras: But it also happens with OpenGL/radeonsi.
12:13 bnieuwenhuizen: oh hitman 2 is plain dxvk/d3d11
12:13 pq: but if it's a question of GPU starvation and the GPU not implementing context priorities or not being able to pre-empt, then throttling is the only solution.
12:13 bnieuwenhuizen: no idea then :(
12:23 doras: bnieuwenhuizen: I'm still looking forward to format modifiers for AMDGPU. It currently prevents dmabuf from being used in Wayland with AMD GPUs. At least with mutter.
12:27 danvet: tzimmermann, did you see my reply?
12:27 danvet: I'm confused why you would want to drop that patch outright
12:29 tzimmermann: danvet, using the _disable() function when you'd want _fini() seemed somewhat half-baked, even if it works. my concern was that someone will look at this code and think that the _fini() is missing
12:29 danvet: mripard, I need a backmerge of drm-next into misc for b8076b5e5b857fd82b3439fd22934a8e638c6ad8 and f3ed67395dca30432b5787c273818d2278f59989
12:29 emersion: doras: this is more of a mutter issue really
12:29 emersion: er
12:29 emersion: doras: this is more of a mesa issue really
12:29 danvet: tzimmermann, yeah I'm going to switch it to _fini
12:30 tzimmermann: danvet, doing _fini() in the disable() callback might not be less confusing. maybe just go with what you have?
12:31 emersion: doras: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2147
12:31 danvet: tzimmermann, https://paste.debian.net/1133111/ I think this is better
12:33 tzimmermann: danvet, maybe just keep the disable() call and add a TODO; saying that _disable() is good enough for now and a managed _fini() will be added later?
12:35 tzimmermann: danvet, once the managed _fini() materializes, i think the usb disable callback should call the _poll_disable() again.
12:35 tzimmermann: at least some kind of comment should indicate this
12:36 tzimmermann: as it's bikeshedding now, i'll probably ack whatever the next iteration of the patch does :)
12:40 danvet: tzimmermann, I don't see a "usb disable callback"
12:40 danvet: so totally confused what exactly you think needs to be changed
12:40 danvet: or where you'd want to insert a drm_kms_helper_poll_disable()
12:41 tzimmermann: danvet, sorry, i'm kind-of doing 3 things in parallel ATM
12:41 danvet: note that the poll_disable/enable is already in udl
12:41 danvet: it's part of the drm_mode_config_helper_suspend/resume helpers
12:41 tzimmermann: danvet, by disable callback, i meant udl_usb_disconnect()
12:41 danvet: and in the ->disconnect hook you very much want _fini, not _disable
12:41 danvet: yeah
12:41 danvet: that one wants _fini
12:42 danvet: that was the mixup I didn't realize in v1
12:42 danvet: which is now fixed in v2
12:44 tzimmermann: danvet, this should be fine then
12:59 linkmauve: “11:36:31 emersion> would be nice to always allow the DRM master to get a high-priority context”, I remember Intel people saying i915 already does that automatically, and this high-priority transfers to any context the DRM master is currently waiting on.
13:00 ickle: the flip specifically
13:01 ickle: it doesn't prevent an app feeding a really slow frame for the compositor and so delaying all the others from being presented on time
13:03 linkmauve: Sure.
13:09 emersion: cool
13:12 danvet: sravn, https://paste.debian.net/1133114/ did I miss anything?
13:26 jadahl: ickle: so in other words, making an egl context of a drm master high priority wouldn't make a difference?
13:26 bnieuwenhuizen: on AMD it would
13:27 jadahl: right, but maybe AMD can change to do the same there, since adding the caps on the binary has awkward side effects
13:28 bnieuwenhuizen: what caps?
13:28 jadahl: the one needed to get the high prio context
13:28 bnieuwenhuizen: just be DRM master?
13:29 pq: CAP_SYS_NICE
13:29 daniels: jadahl: if your client submits a job which takes forever to complete, and your composition is blocked on that job, high-prio context doesn't help at all
13:30 jadahl: daniels: I know, but I'm not talking about that :p
13:30 jadahl: we have patches flying around that makes the gnome-shell EGL context high priority, but setting CAP_SYS_NICE on the binary causes a bunch of issues
13:31 jadahl: so if it can be made redundant if one is DRM master not only on Intel, it'd be great
13:32 pq: bnieuwenhuizen, so what about when the KMS device and the render device are completely separate? Sounds like relying on DRM-master for privileges to get high-prio context is driver-specific.
13:37 bnieuwenhuizen: pq: almost anything like that is going to be driver-specific though? the dependency tracking intel does as well?
13:37 MrCooper: it wouldn't have to be, if i915 used the common GPU scheduler
13:38 ickle: right, we wouldn't have pi
13:39 pq: bnieuwenhuizen, yes, everything except the EGL extension API. How to grant enough privileges for the EGL ext to succeed should really be a common thing and not driver-specific.
13:40 jadahl: could we make requesting high prio egl context work for DRM master without requiring the caps maybe?
13:41 bnieuwenhuizen: I think the argument above is that that wouldn't work cross-device? (when master + renderer are different)
13:41 pq: yeah, unless kernel devs are ok figuring out a way for the two drivers to cooperate on that.
13:42 MrCooper: the takeaway for display server developers: first make sure not to select client buffers which miss the deadline, then you can worry about GPU prioritization / scheduling
13:42 pq: then again, we have DRM leasing, or we could have two KMS devices and one render device, meaning that there would be multiple DRM masters at the same time
13:43 jadahl: MrCooper: that doesn't help the situation where gpu time is starved for some other reason (e.g. maybe its busy-loop-rendering being minimized)
13:43 pq: would it be a problem if all DRM masters could gain high prio context on all render devices, I don't know
13:44 danvet: pq, there's an idea to backfeed priority inversion through dma_fence
13:44 pq: danvet, sounds cool
13:44 danvet: so if you flip and miss, we tell the offending fence to boost its context
13:44 MrCooper: jadahl: prioritizing the GPU work needed for a flip seems most promising for that to me
13:44 danvet: it's what's happening on i915, but ofc only within i915
13:45 danvet: so defacto on i915 if you pageflip, you already get a boost (after one miss, but alas)
13:45 jadahl: MrCooper: what about when we composite into an offscreen that will end up in a flip on another gpu?
13:45 MrCooper: but it's all moot as long as the client buffers can cause the flip to be delayed
13:45 ickle: waitboost is after a miss
13:45 ickle: which is for gpufreq
13:45 danvet: ickle, yeah but it's somewhat sticky I thought
13:45 MrCooper: jadahl: that can all be tracked in theory
13:45 danvet: and doesn't it pull on the scheduler too?
13:45 ickle: I915_PRIORITY_DISPLAY is given in prepare_fb
13:46 pq: requiring a miss does not sound good
13:46 MrCooper: no miss required for scheduling prioritization
13:47 jadahl: MrCooper: where it *wouldn't* help (I suspect) is glReadPixels() :P
13:47 MrCooper: sure, no flip no boost :)
13:48 jadahl: any reason why we can't just make all contexts by the DRM master process high priority?
13:49 MrCooper: as pq pointed out already, won't help when the rendering device isn't the KMS device
13:50 danvet: pq, the miss is for gpu freq boost only
13:50 danvet: don't want to unconditionally boost just in case and fry the battery
13:51 danvet: also reduce overall perf because we exceed the TDP
13:51 pq: ok
13:51 danvet: priority you get always
13:51 ickle: tdp throttles gpu even if cpu bound, go figure
13:51 danvet: jadahl, i915 should boost also there
13:52 danvet: but not auto high-priority
13:52 jadahl: the igpu->dgpu blit is done with the dgpu egl context. would that be boosted?
13:52 danvet: nothing cross driver is atm
13:52 danvet: that's the wishful plan part
13:52 jadahl: not even with the flags set?
13:58 MrCooper: it's automatic, not related to any flags
14:19 narmstrong: danvet: hi, is there a specific reviewer needed for a fourcc change ? https://patchwork.freedesktop.org/patch/354291/?series=73722&rev=2
14:45 danvet: narmstrong, well whomever wants to consume/produce that fourcc on the other end of your pipeline
14:45 danvet: or might otherwise be interested in this
14:46 danvet: since drm/drm_fourcc.h is even in khronos standards the entire "must have open userspace" is relaxed
14:46 danvet: but instead you need to make sure that every possible stakeholder is ok with this
14:46 danvet: even future ones (like maybe you'll have a gl driver eventually producing/consuming this)
14:49 narmstrong: danvet: ok, thx
14:57 Venemo: jekstrand: I've just read your reply about the CI thing. in principle I agree with you, but I would respectfully argue that the CI is currently not as nice as it could be. it has caused me more problems than it solved so far.
14:57 daniels: Venemo: i know this isn't any comfort to you, but you do seem to be exceptionally unlucky
14:58 Venemo: most of the time, I just need to click the "retry" button until panfrost passes its GL CTS
14:59 Venemo: daniels: maybe it's just lack of luck. I don't know
14:59 daniels: Venemo: i watch the results fairly closely and you seem to be the one most stung by random failures
14:59 daniels: (also a lot of your failures are freedreno, not panfrost)
14:59 Venemo: the last 3 are panfrost
15:00 Venemo: but every ARM driver seems to have a share in there
15:00 Venemo: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964/pipelines -> these are all panfrost failures
15:03 daniels: yeah, someone in the same building as our office was doing electricity works and managed to kill our LAVA cluster
15:03 Venemo: ouch
15:03 daniels: quite
15:03 daniels: cf. https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4011
15:04 Venemo: daniels: w.r.t to the bandwidth issue, where does all that bandwidth actually go? what does the CI system do that takes so much bandwidth? sorry if this is a noob question, but I didn't find a good explanation on how it works.
15:05 daniels: Venemo: that's OK, we're trying to figure it out as well. but the short answer is that we run a _lot_ of jobs, and those containers aren't small
15:05 daniels: then you have projects like GStreamer who do builds for iOS and Android and Windows, again not particularly small
15:05 Venemo: I thought that the CI builds actually happen outside of the fdo hosted space
15:05 daniels: they do, but the containers and the job artifacts are stored centrally
15:05 Venemo: ah.
15:06 Venemo: do they have to be?
15:06 daniels: doing the actual builds doesn't cost us financially, but storing & transferring the data from the central service does
15:06 daniels: well, you have to store them somewhere, right?
15:07 Venemo: so the problem is that every ci run has to copy some large container images over the network? and when it's done, it has to copy the results back?
15:07 daniels: right
15:07 daniels: not _every_ build since we do have caching, but seemingly quite a few
15:08 daniels: we're working on analysing the logs atm to find out where the usage is coming from -> what we can fix
15:08 Venemo: okay
15:09 Venemo: maybe a naive question, but do the build artifacts have to be copied back? couldn't they be just hosted on the machine that made the build?
15:10 Venemo: alternatively, could they be copied on-demand? so, skip the copy if no one actually wants to download them
15:12 bnieuwenhuizen: so AFAIU the ingress to gitlab is not an issue, it is mostly egress from gitlab + GCP to things outside GCP (e.g. external runners)?
15:13 bnieuwenhuizen: so the copy back is probably not even the problem in this cost structure
15:13 Venemo: GCP = ?
15:13 Venemo: ah, okay
15:13 bnieuwenhuizen: google cloud platform?
15:13 Venemo: ok
15:14 seanpaul: danvet: re v5 of flight recorder, the email thread is starred and marked as unread in my inbox
15:14 seanpaul: have to make inroads on other stuff before i can get back to it
15:15 danvet: yeah ingress is "free"
15:16 Venemo: I'll be curious to find out what the actual hotspot is
15:23 daniels: their data flows are so asymmetric that ingress is literally free
15:43 jekstrand: Venemo: Yeah, it's not perfect and sometimes you have to retry
15:44 jekstrand: One of the suggestions for reducing CI load was to not run tests on provably unaffected drivers. That would likely solve your problem.
15:45 jekstrand: At least it would mean that Panfrost instability isn't affecting you day-to-day as much
15:50 Venemo: jekstrand: yes, I did suggest that on the ML as well
18:09 daniels: robclark, anholt: two new a6xx flakes https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2147#note_424547
18:13 robclark: daniels: hmm, turns out to be a useful flake.. I wouldn't have noticed that MR to suggest that it is a bad idea (although atm I'm not on a laptop w/ gitlab login so can't comment immediately)
18:13 robclark: daniels: we currently rely on INVALID == legacy path (ie. no modifiers)
18:13 daniels: robclark: yeah? I read the driver and it seems like zero functional change
18:14 robclark: I think the question is how to support the pre-explicit-modifiers case, where some drivers like intel already supported tiled, for example..
18:14 daniels: that's literally what this is enabling - using a different Wayland interface for legacy
18:14 robclark: I think there are some implicit assumptions that INVALID == legacy, ie. not imported w/ explicit modifiers
18:15 emersion: yeah, that's what the wayland protocol uses INVALID for too
18:15 robclark: in freedreno, when we allocate, INVALID means internal buffer, you can use tiled+compressed.. but when we import, INVALID means legacy path (which for freedreno means linear, but other drivers it may not)
18:28 pcercuei: sravn: hi, is there something in place already to control the contrast and brightness of a panel?
18:28 robclark: hmm, I guess it is really more about platform_wayland, rather than the egl extension, as I initially assumed.. so maybe it is ok
18:57 daniels: robclark: yeah nothing changes about the extension at all, and I looked at all the drivers to make sure that they didn't see a behaviour change either (from INVALID appearing in the list with others; legacy path doesn't change at all)
18:57 daniels: it's just about the Wayland platform using a slightly different method of passing the same dmabuf internally
19:02 robclark: yeah, ok, that seems more reasonable
19:15 imirkin_: could i convince someone who doesn't hate tgsi to review the middle change in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4014 ?
19:47 airlied: imirkin_: r-b (don't mistake me as caring too much about tgsi :-P)
19:48 imirkin_: airlied: thanks :) but at least you don't hate it?
19:49 imirkin_: airlied: in a perfect world i might have created a TXL3/TXB3, but ... seems excessive.
19:52 airlied: imirkin_: well I have to care about it since i based a whole virtual gpu on it :-
19:52 airlied: :-P
19:52 imirkin_: hehe
19:53 sravn: pcercuei: only backlight for now
20:13 mdnavare: vsyrjala: kazlaus: With this function drm_for_each_detailed_block((u8 *)edid, get_adaptive_sync_range,
20:13 mdnavare: &info->adaptive_sync); where get_adaptive_sync_range will need to access the range = &data->data.range; from the input edid argument, how can i access that?
20:14 kazlaus: the first parameter passed to get_adaptive_sync is the block
20:14 kazlaus: data is retrieved from that
20:27 mdnavare: kazlaus: so in general timing = &edid->detailed_timings[i] and data = &timing->data.other_data; and range = &data->data.range; but how do i get the timing, data and range from each block?
20:43 mdnavare: kazlaus: okie i think i got it now, the first block passed to get_vrr_range will be detailed_timing and i can obtain data and range from that. cool, will give this a shot
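[Editor's sketch, not part of the original conversation] The callback pattern discussed above can be illustrated roughly as follows. This is a minimal, self-contained sketch under stated assumptions: the struct layouts are simplified stand-ins and not the kernel's EDID definitions, `for_each_block` is a hypothetical iterator standing in for `drm_for_each_detailed_block`, and `struct sync_range` is a hypothetical output type. The 0xfd tag for a display range limits descriptor and the zero pixel clock marking a non-timing descriptor are assumed from the EDID format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the EDID structures (hypothetical, illustration only). */
struct range_limits {
	uint8_t min_vfreq;
	uint8_t max_vfreq;
};

struct non_pixel {
	uint8_t pad1;
	uint8_t type;			/* descriptor tag */
	uint8_t pad2;
	union {
		struct range_limits range;
		uint8_t raw[13];
	} data;
};

struct detailed_block {
	uint16_t pixel_clock;		/* 0 means "descriptor, not a timing" */
	union {
		struct non_pixel other_data;
		uint8_t raw[16];
	} data;
};

/* Hypothetical output type filled in by the callback. */
struct sync_range {
	int min_vfreq;
	int max_vfreq;
};

#define DESCRIPTOR_RANGE_LIMITS 0xfd	/* assumed tag for range limits */

typedef void block_cb(struct detailed_block *block, void *closure);

/* Hypothetical iterator: calls cb once per block, passing closure through. */
static void for_each_block(struct detailed_block *blocks, size_t count,
			   block_cb *cb, void *closure)
{
	assert(cb != NULL);
	for (size_t i = 0; i < count; i++)
		cb(&blocks[i], closure);
}

/* The callback receives the block itself; data and range are derived from it. */
static void get_adaptive_sync_range(struct detailed_block *block, void *closure)
{
	struct sync_range *out = closure;
	struct non_pixel *data = &block->data.other_data;
	struct range_limits *range = &data->data.range;

	if (block->pixel_clock != 0)
		return;			/* an actual timing, skip it */
	if (data->type != DESCRIPTOR_RANGE_LIMITS)
		return;			/* some other descriptor */

	out->min_vfreq = range->min_vfreq;
	out->max_vfreq = range->max_vfreq;
}
```

The point of the sketch is that the callback never needs the `edid` pointer or an index: the iterator hands it one block at a time, and the output location arrives through the closure argument.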
20:58 Venemo: daniels: here is another arm64 script failure: https://gitlab.freedesktop.org/Venemo/mesa/-/jobs/1792469
21:03 daniels: anholt, robclark: ^ think this is a different a6xx es3 flake?
21:08 daniels: Venemo: a6xx and a3xx are freedreno not panfrost, it's a lot easier to just ping the freedreno people directly
21:16 Venemo: ah, sorry daniels I will do that next time.
23:26 imirkin: airlied: gah! i broke tg4 samplerCubeArrayShadow =/ updated patch coming up later tonight.
23:29 imirkin: thankfully CTS tests caught it