11:44MrCooper: eric_engestrom: please remove the "revert-5ff443b8" branch from the main Mesa repository
12:51eric_engestrom: MrCooper: sorry about that, I used the gitlab webui "revert" button and I forgot it creates branches in the upstream repo...
12:51eric_engestrom: that was for https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4112 and I think it only existed for a couple of minutes
12:51eric_engestrom: (it was auto-deleted when marge merged it)
12:58MrCooper: yeah, it's a trap :}
14:23jadahl: can one get a device node of some kind from a dmabuf fd? motivation of doing so would be to try to mmap it via libgbm and not mmap directly
14:26mripard: airlied: hi, could you merge the last rc in drm-next, jernej would need some patches that are in rc6 for drm-misc-next?
14:36ajax: jadahl: "get a device node"?
14:37jadahl: ajax: something that'll get me a gbm_device
14:37jadahl: ajax: lets say I have a dmabuf and nothing else. the "real" question is, how do I get the content to the CPU memory the fastest
14:39daniels: jadahl: call mmap on the dmabuf fd ...
14:39daniels: (you need to call the sync begin/end ioctls before and after mmap)
14:39daniels: oh sorry, I see about 'not mmap directly'. why not directly? does it not work?
14:42jadahl: MrCooper mentioned that on some drivers copying from mmap() might be very slow
14:42jadahl: and the only reliable way is to go via opengl, vulkan or gbm
14:43daniels: MrCooper: ^ does AMD actually give us a different view from gbm_bo_map() vs. mmap(dmabuf_fd)?
14:44jadahl: daniels: btw, whats the name of those ioctls?
14:45jadahl: fwiw the context here is how to handle dmabufs in pipewire stream consumers
14:46daniels: jadahl: https://cgit.freedesktop.org/drm/drm-tip/tree/include/uapi/linux/dma-buf.h
14:46jadahl: daniels: thanks!
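
A minimal sketch of the direct path daniels describes, assuming the `DMA_BUF_IOCTL_SYNC` interface from the `dma-buf.h` header linked above. The helper names `dmabuf_sync` and `copy_from_dmabuf` are hypothetical, and the sync calls here bracket the CPU read of the mapping. It shows only the raw mmap route, so the data comes out in whatever layout the buffer really has and may be slow on uncached memory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/dma-buf.h>

/* Hypothetical helper: issue one DMA_BUF_IOCTL_SYNC call, restarting it
 * if the kernel reports EINTR or EAGAIN. */
static int dmabuf_sync(int fd, __u64 flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/* Hypothetical helper: copy `size` bytes from the start of a dma-buf into
 * `dst`. Returns 0 on success and -1 on any failure. */
int copy_from_dmabuf(int dmabuf_fd, void *dst, size_t size)
{
    assert(dst != NULL);

    void *src = mmap(NULL, size, PROT_READ, MAP_SHARED, dmabuf_fd, 0);
    if (src == MAP_FAILED)
        return -1;

    /* Tell the exporter that a CPU read is about to start. */
    int ret = dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ);
    if (ret == 0) {
        memcpy(dst, src, size);
        /* Tell the exporter that the CPU read has finished. */
        ret = dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
    }

    munmap(src, size);
    return ret;
}
```

Passing a descriptor that is not a dma-buf makes either the mmap or the sync ioctl fail, so the function returns -1 in that case.
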
14:55MrCooper: daniels: GBM mmap is implemented using Gallium transfers I think
14:59daniels: MrCooper: which architectures does it help on?
15:00MrCooper: anything where GPU accessible memory can be uncacheable for the CPU, in particular discrete GPUs
15:04lynxeye: daniels: MrCooper: yes, gbm_bo_map is implemented using gallium transfers, which a) may give you cacheable memory and more importantly b) gives you a linear view regardless of the internal buffer layout
15:07daniels: does cache help that much if you're just doing memcpy() out ... ?
15:07daniels: i'd assume that's what pipewire would be doing, not RMW
15:07emersion: do you mean that regardless of the modifier, mapping the DMA-BUF will give a LINEAR view?
15:07emersion: or does that just happen with implicit modifiers?
15:09MrCooper: daniels: CPU reads from uncacheable memory are excruciatingly slow, more so across a PCIe link
15:09MrCooper: on the order of tens of MB/s
15:10daniels: ah, so Gallium transfer uses the GPU to blit to cacheable CPU?
15:11tomeu: airlied: what kind of applications are you using to test the functionality that you have been adding to llvmpipe?
15:15lynxeye: emersion: no, mmap on the dma-buf will give you the raw data (so you'll see the layout according to modifier). gbm_bo_map() however will give you a linear view.
15:22emersion: ah, interesting
15:25jadahl: so the question still stands then, how do I gbm_bo_map() a dmabuf if I don't already have a gbm_device?
15:27Venemo: why are there so many SPIR-V warnings from the Vulkan CTS? does the CTS really break the spec, or are we too pedantic about it?
15:33pepp: MrCooper: are we good to merge !2569, or should I wait for more reviews?
15:34MrCooper: it's been open for months, I say let's merge it
15:42daniels: jadahl: you'd need to create a gbm_device and gbm_bo_import the dmabuf
15:45jadahl: daniels: create a gbm_device from what?
15:46daniels: well, you can't go from a dmabuf fd to a gbm_device, because the entire point of dmabuf is to be cross-device ...
15:46jadahl: that's the issue right now indeed. dmabuf is cross-device, but I need to know what device it actually is from to fetch its content without being potentially slow
15:47emersion: does it actually matter? can't you use any gbm_device?
15:47jadahl: if I open a random gbm_device, import the dmabuf into a gbm_bo, then mmap it, will I still get the "fast" paths?
15:50jadahl: e.g. if I accidentally open a displaylink backed gbm_device and import an amdgpu dmabuf, then mmap it, will it work?
15:51MrCooper: surely there's no need to risk that, can always use the "primary" device?
15:52jadahl: to get the primary device, I have to use opengl
15:52jadahl: i don't know of any other way that is communicated from the display server
15:53MrCooper: how do Wayland clients pick the device?
15:53jadahl: but then a gstreamer pipeline will suddenly need to open display server connections (X11 or Wayland)
15:53dcbaker: kisak: I'm working with eric_engestrom to do that release. Just trying to get a few other people trained to do releases
15:53jadahl: the compositor initializes the egl display on the primary device
15:54kisak: dcbaker: thanks cool, thanks
15:54jadahl: an alternative is to add a metadata field in the pipewire stream which device it should be opened using
15:55jadahl: so pipewire consumers can continue to ignore windowing systems
15:58jadahl: emersion: was there a consensus in the dmabuf wayland protocol for how to best identify a drm device?
15:58emersion: unfortunately the patch is still pending review
15:58emersion: oh, different question maybe
15:58jadahl: yea, this would be to add some metadata in pipewire
15:58jadahl: so the consumer can open an appropriate gbm device
15:59jadahl: without having to query GLX/EGL
15:59emersion:reads the backlog
16:00emersion: okay, so you want to identify the primary device
16:00emersion: this patch allows it: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/8
16:00jadahl: more or less, because it's likely to be "good enough" to import the dmabuf to
16:01jadahl: emersion: don't want to use any window system protocols here though
16:01emersion: use get_default_hints
16:01emersion: oh, you don't want to use the dmabuf protocol?
16:01jadahl: no, I don't
16:02jadahl: i don't want to add multiple windowing system support in every pipewire consumer wanting to read dmabufs
16:03jadahl: so an idea is to add a field to the pipewire stream telling an appropriate device to create a gbm_device on, to import and gbm_bo_map() the dmabuf
16:03emersion: i see
16:03emersion: then you're looking for https://gitlab.freedesktop.org/wayland/wayland-protocols/issues/10
16:03jadahl: and I remember there was a discussion about using something in some wayland protocol to identify a drm device
16:03gitbot: wayland issue 10 in wayland-protocols "Identifying DRM devices" [Opened]
16:03jadahl: ah, right
16:04jadahl: can pass ID_PATH_TAG of the primary gpu via pipewire as a hint then I guess
16:48shadeslayer: anholt: would you have some time to discuss what buffer labeling in userspace should look like so I can appropriately design the kernel interface?
18:11airlied: mripard: I'll try a backmerge today
18:11airlied: mripard: rc5 just came out, so not sure how someone can require rc6 patches :P
18:13airlied: tomeu: for tessellation I used heaven, but it's mostly been just piglit and vulkan cts (via my vulkan impl)
18:14tomeu: airlied: was thinking of which FOSS games and apps would give a good coverage of the more fragile parts of llvmpipe, when using traces for CI
18:15airlied: tomeu: there isn't really much intersection on new GL features and FOSS games/apps :-P
18:16airlied: my main thing is to not make openarena or gears slower
18:16tomeu: yeah, that's why I was asking :p
18:16tomeu: airlied: we aren't doing performance yet, though I already need that for panfrost
18:17tomeu: np, I think some FOSS engines have been lately adding features that make use of more modern opengl, will look at those in more detail
18:17airlied: would be good to know if supertuxkart or xonotic had any support for newer features
18:18airlied:still hasn't gotten Talos to start on my vulkan impl
18:20airlied: jadahl: I wonder if it would make sense to add an ioctl that would return the initial device
18:21airlied: tomeu: I think the main use for getting llvmpipe features is just getting CI coverage across more of the API, rather than apps :-P
18:22tomeu: airlied: well, apps are able to find bugs even when the test suites pass...
18:23jadahl: airlied: "initial"?
18:24airlied: jadahl: to say who owns the resource, I suppose initial is the wrong thing
18:24airlied: tomeu: oh indeed, and when we have apps in CI I'll be happier :-P
18:25jadahl: airlied: that'd solve the problem at hand I think
18:26tomeu: airlied: yeah, the idea is to capture traces from interesting apps and start replaying them in CI
18:42jadahl: airlied: question is what would such an API return? an open fd to the device? if so with what permissions?
18:44emersion: we already have APIs to get a device node from a FD, e.g. drmGetRenderDeviceNameFromFd
18:44emersion: those return a path
18:45pepp: tomeu: maybe try the bgfx examples app?
18:45jadahl: hmm, what happens if I call that on a dmabuf
18:45emersion: it's not supposed to be called on a DMA-BUF, it's supposed to be called on a device FD
18:45emersion: (primary or render FD)
18:46emersion: i wonder if accepting a prime FD for this function would be too confusing
18:47airlied: jadahl: yeah not sure, it would have to be thought about a bit more,
18:47airlied: sumits and danvet (who is off this week) would be best people
18:48airlied: like it could just return a device node
18:52emersion: a device node path, you mean?
18:54airlied: emersion: a major/minor pair
18:54airlied: like fstat does
19:40tomeu: pepp: looks very interesting, thanks
19:58karolherbst: ehh.. images in CL are dodgy.. they don't have a sampled type
19:59pinchartl: bbrezillon: sorry to mix your lvds-codec series with the imx patch :-)
20:00Lyude: seanpaul: thanks for poking me about the mst patches that need review btw
20:02bbrezillon: pinchartl: no problem, it's actually the same issue
20:23robclark: janesma: I don't suppose you've used frameretrace (or apitrace or ??) for comparing two versions of mesa.. ie. which specific draws are helped vs hurt for a given compiler change, and things like that?
20:32Lyude: Hey mripard jfyi I'm going to go ahead and push some MST changes to drm-misc that touch some bits in radeon, i915, nouveau and amdgpu - is that alright with you (and do you need me to check w/ other driver maintainers?)? Patch series is https://patchwork.freedesktop.org/series/74412/
20:32pinchartl: bbrezillon: I see we'll have fun
20:42Lyude: ( skeggsb, j4ni, hwentlan ^ jfyi)
20:43bbrezillon: pinchartl: and we hijacked Marek's thread :)
20:44airlied: Lyude: btw are those MST fixes wanted for fixes? how big are they, I expect I'll have to sell them to Linus
20:45Lyude: airlied: which ones? if you're asking about the patch series I just linked to that's just misc cleanup stuff, but there's two rather important series of fixes for the current kernel rc on the mailing list https://patchwork.freedesktop.org/series/74295/ https://patchwork.freedesktop.org/series/74407/
20:46janesma: robclark: sure, that is what frameretrace is for
20:46janesma: it's not a workflow that is built directly in to frameretrace
20:47janesma: you can do it fairly well by just opening the same trace in two instances with different LD paths
20:48janesma: to look at a larger workload, it can be easier to capture gpu metrics at intervals throughout a trace, and put the data into a spreadsheet.
20:48airlied: Lyude: yeah the ones you sent out to fix the pbn regression
20:48robclark: ahh, yeah, I figured that was possible, just wondering if there was some better trick.. exporting counters to something that I can dump into a spreadsheet would be useful..
20:49airlied: Lyude: it's just he already complained the last rc was too large, so these patches might put him over the edge, and we should revert instead and fix later
20:49janesma: I hacked up something called "framemetrics" that does this, and we use it to identify frames or renders that have different performance profiles between drivers.
20:50robclark: I see a thing called framestat
20:50janesma: robclark: https://gitlab.freedesktop.org/majanes/frameretrace/-/tree/master/src/framemetrics
20:50janesma: framestat was just frame timing.
20:50janesma:is terrible at picking names
20:51robclark: heheh, so am I
20:51robclark: ok, I'll poke around the framemetrics thing, thx
20:51janesma: you could hack something like framemetrics pretty easily. It runs on the intel perf interface, but I did get it running on the AMD monitor as well. It's not something that I've been pushing others to use, so YMMV.
20:52janesma: things to keep in mind about it: the idea is to capture metrics in more of a "live" renderloop scenario. However, apitrace has tons of parsing overhead. It is likely that you will be driving the GPU at less than 100% because of the parsing overhead.
20:53janesma: This is not a problem for frameretrace, because the apitrace objects for a single frame can be held in memory.
20:53robclark: looks like first I need to figure out why I'm getting an assert(!GL:GetError())..
20:53janesma: for a long trace, you have to parse and delete every call object. Apitrace is terrible at that.
20:54robclark: *probably* a single frame will be ok..
20:54janesma: I hacked the parser to parse ahead in worker threads.
20:54robclark: at this point it is more to figure out where my expectation of what matters for perf, and what the hw thinks matters, differ..
20:55janesma: yeah, so if you want to do a single frame, then you will want to hack the main loop to parse up to the frame, get all the call objects into memory, and iterate over them with begin/end metrics calls, dumping the values out to csv
20:56janesma: this particular tool is just a day of hacking to try to get metric data for myself which is similar to what you are looking for.
20:57janesma: it's not product quality but it's also not hard to hack something else similar.
21:02Lyude: seanpaul: do you think you'll have any time to take a look at https://patchwork.freedesktop.org/series/74295/ ? could use someone else's review before this goes into the current rc
21:05robclark: Lyude: jfyi, I got OoO email from sean earlier today.. so he may or may not be watching irc..
21:05Lyude: robclark: ooo, that is useful thanks for letting me know
21:06Lyude: vsyrjala, mdnavare: either of you two up for this? ^
21:06Lyude: or hwentlan as well I suppose
21:09mdnavare: Lyude: I am not sure i will have time for reviewing this today, but i can take a look at it tomo
21:09Lyude: mdnavare: I think that should be ok
21:09Lyude: airlied ^ when do we need to get these fixes ready by for the next pull btw
21:09robclark: janesma: jfwiw, this "fixed" the GL:GetError() assert I was hitting.. https://paste.centos.org/view/5b372560 .. I didn't dig too much, but this trace I'm looking at throws a few gl errors of its own, so not sure if there is somewhere where frameretrace isn't draining errors that come from the trace itself?
21:10mdnavare: Lyude: btw Alex just reviewed it
21:10Lyude: oh! didn't even notice :)
21:11mdnavare: Lyude: oh wait may be that's a different series, Fix link address probing regressions
21:11Lyude: mh-I think agd5f started on reviewing the other one as well though
21:13janesma: robclark: thanks, you are right that frameretrace ought to drain gl errors (and report them in the UI). I have asserts in there now because I was trying to catch my own GL errors.
21:14robclark: yeah, makes sense
21:33mdnavare: hwentlan: kazlaus: When you set the vrr_capable_property, you check if the range is > 10, is this number specified anywhere?
22:54robclark: janesma: jfyi, that assert I was hitting before.. I don't see on latest master from your gitlab tree.. I hadn't realized you'd moved to gitlab, earlier I was still using your github tree
22:54robclark: (also \o/ for replacing cmake with meson)
22:55imirkin_: oh, is that why there haven't been updates to frameretrace in like 2 years?
22:55janesma: oh, sorry about that. I'll push a branch that drops the code in github and refers people to gitlab.
22:55robclark:should probably do that with a tree or two
23:00robclark: hmm, I might have spoken a bit too soon about the assert.. oh well
23:20karolherbst: ehh.. images in CL are weird.. especially in spirv
23:34karolherbst: jekstrand: so.. in the spirv I have OpTypeImage and OpTypeSampler function parameters, but effectively those will just be scalar constants I want to pass into the function inside the CL kernel wrapper function. Any nice idea on how to handle that?
23:34karolherbst: I guess using the normal pointer deref stuff would be overkill/impossible?
23:35karolherbst: yeah.. maybe I should just do whatever we do in the else case as well and treat it like a regular ssa value
23:37airlied: karolherbst: I really felt we needed to encode the driver descriptors there
23:37airlied: so we'd have to pass around a driver decided descriptor whether that be a vec2/4/8
23:37karolherbst: well.. clover just indexes them by order of occurrence
23:38karolherbst: so first image arg is image0
23:38karolherbst: airlied: why that complicated?
23:38karolherbst: sure we could go the bindless way, but why?
23:40karolherbst: CL only requires us to handle 128 read args at once... which we can actually do, just requires some reworks I guess
23:41airlied: karolherbst: for the kernel inputs bindless seems to be cleaner to me
23:41airlied: otherwise the driver had to manage a binding table that was quite hard to work out how it looked
23:41karolherbst: same as with GL, no?
23:41airlied: not really, since GL has binding points
23:41karolherbst: clover calls into bind_sampler_states and set_sampler_views
23:41karolherbst: and binds them starting from 0 up to nr
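
A rough sketch of the "index by order of occurrence" scheme karolherbst describes. This is not clover's code. The types and the `assign_slots` function are hypothetical, and keeping one counter for images and another for samplers is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument classification for a CL kernel signature. */
enum arg_kind { ARG_SCALAR, ARG_IMAGE, ARG_SAMPLER };

struct kernel_arg {
    enum arg_kind kind;
    int slot; /* binding slot for images and samplers, -1 for anything else */
};

/* Walk the arguments in declaration order and hand out binding slots:
 * the first image argument gets image slot 0, the next one slot 1, and
 * samplers are counted separately in the same way.
 * Returns the number of image slots used and stores the number of
 * sampler slots in *num_samplers. */
size_t assign_slots(struct kernel_arg *args, size_t n, size_t *num_samplers)
{
    size_t images = 0, samplers = 0;

    assert(args != NULL && num_samplers != NULL);
    for (size_t i = 0; i < n; i++) {
        switch (args[i].kind) {
        case ARG_IMAGE:
            args[i].slot = (int)images++;
            break;
        case ARG_SAMPLER:
            args[i].slot = (int)samplers++;
            break;
        default:
            args[i].slot = -1;
            break;
        }
    }

    *num_samplers = samplers;
    return images;
}
```

With this scheme the slot is implied by the argument's position in the signature, so a compiled kernel can refer to a fixed slot number for each image or sampler argument.
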
23:42karolherbst: airlied: in CL all of that is more implicit than in GL actually
23:42airlied: it means the driver has to load an index from the kernel input stream and then load a descriptor from some other const buffer
23:42airlied: using that index
23:43airlied: what would be in the kernel input?
23:43karolherbst: those are opaque
23:43karolherbst: even in CL
23:43airlied: it's the area I ran into major problems trying to work out how to fix with current radeonsi
23:43karolherbst: I even checked what nvidia is doing and it also uses bound texture instructions
23:43airlied: and going bindless just seemed a lot saner
23:43karolherbst: with hardcoded texture/sampler regs
23:43airlied: but maybe I should try and figure it out
23:44karolherbst: well.. those args are opaque
23:44airlied: I think I've attempted images on radeonsi/llvm twice now and felt ill in that area
23:44airlied: but maybe it was due to me thinking the API was better than it was
23:44karolherbst: maybe llvm just sucks here :/
23:44karolherbst: anyway, I'll try to make it work with nvc0 and see how that goes
23:44airlied: I think managing the binding buffer was the problem
23:45karolherbst: but I really don't see clover doing anything special
23:45karolherbst: it just binds the sampler/samplerviews
23:47airlied: i think it was linking the input up to it
23:47karolherbst: mhh.. actually it doesn't write anything into the input
23:50airlied: yeah maybe I was confused about how it should work
23:50airlied: oh wait there was something about not having all texture info at the time
23:50airlied: I'd have to probably go try implement it again to pull my hair out :-p
23:56karolherbst: airlied: well.. we need to support 128 textures
23:56karolherbst: which mesa doesn't
23:56karolherbst: but.. the painful part is to fix st/mesa, not gallium afaik
23:58airlied: karolherbst: it does write values into the input from what I can see
23:58airlied: insert(ctx.input, v)
23:58imirkin: karolherbst: you don't have to make st/mesa support >32 textures
23:58imirkin: just make sure it doesn't use MAX_SAMPLER_VIEWS or whatever directly