15:54 haritz: hello
15:59 haritz: I was wondering, I'm in a situation where I have to send tiling header information to kernel space when creating a frame buffer
16:00 haritz: I thought of doing it with drmModeAddFB2WithModifiers, but the offset parameter it takes is meant to be a byte offset to the beginning of actual FB pixel data
16:00 haritz: whereas the underlying display device requires an offset expressed in tile numbers, for FB compression
16:00 haritz: I've been researching how to pass this information over to the DRM display driver through one of the DRM userspace API functions, but couldn't find anything that matches this requirement
16:01 haritz: is there any canonical way of doing this, other than perhaps extending the DRM driver with a custom IOCTL ?
16:47 robclark: haritz: for drm/msm UBWC compressed buffers, we use a single plane but the start of the buffer is compression meta/flag data (ie. header).. we pass the offset to start of that as the start of the plane.. the offset from that to the start of pixel data is calculated based one fourcc/pitch/height (so the kernel is aware of a subset of the rules for layout of header vs pixel data.. ie. enough to deal with simple single
16:47 robclark: layer/level 2d textures)
16:50 robclark: mareko: is the intention that after replace_buffer_storage the src and dst pipe_resources can be used interchangably? I'm seeing a replace_buffer_storage() followed by create_stream_output_target() with the dst.. followed by transfer_map(src) .. https://paste.centos.org/view/raw/946ccb00
16:51 robclark: (if they can be used interchangeably that causes some lolz for batch dependency tracking / re-ordering.. because transfer_map() will not wait for the correct rendering to complete)
16:53 robclark: What I'd *expect* (and would work properly for me) is for tc to always pass threaded_resource::latest to driver.. but not sure if there is a good reason you didn't do it that way.. or if it was just something that happened to work out for radeonsi
16:53 robclark: (and if I change it to always use threaded_resource::latest, would that cause problems for radeonsi?)
17:18 haritz: robclark: sorry, I find your explanation a bit confusion
17:19 haritz: Is there any section in the DRM documentation that explains this in more detail?
17:20 robclark: the short version is we always pass the start of the header as if it were the start of the pixel data, and the driver knows the details about how to calculate the offset from the start of teh header to start of pixel data
17:20 robclark: probably not in docs.. intel does it a bit differently with two planes (one for header and one for pixel data)
17:21 robclark: but that approach has it's own drawbacks and krh didn't want to go down that path for msm
17:21 robclark: the details of how the figure out the header size are ofc somewhat hw and potentially format specific
17:21 haritz: well, that would greatly simplify things for me, because if I can simply pass the header size in number of compressed tiles down to the display driver, calculating there start of pixel data thereof is a very cheap operation
17:21 haritz: but calculating tile information from physical buffer size dimensions isn't
17:22 haritz: and the display hardware, when operating in compressed mode, has to be given the size of the compression header in number of tiles
17:22 haritz: however DRM documentation claims that the offset parameter in the drmModeAddFB2WithModifiers call always must be interpreted as a byte offset
17:23 haritz: I didn't heed this but then someone pointed it out to me and claimed it has to be corrected
17:24 robclark: it very much must be a byte offset.. the question is a byte offset to *what*.. if you make it byte offset to start of header, that is perfectly fine (because it is an opaque format from the PoV of anyone outside the driver who wants to do cpu access)
17:25 robclark: how hard can it be to calculate the header size??
17:25 haritz: it's a rather cumbersome operation that right now is being done in userspace
17:26 haritz: and reckoned from physical frame buffer dimensions and a user-supplied compression format
17:26 haritz: the compression format is then written into the modifier variable when calling drmModeAddFB2WithModifiers
17:27 haritz: however, for programming the display driver in compressed mode, for some reason the hardware engineer who designed it made mandatory specifying the size of the header in number of tiles
17:27 haritz: besides the actual byte offset to pixel data
17:27 haritz: so doing crunching the numbers one way is quite easy, but not the other way round
17:27 haritz: and I was trying to avoid having the display driver doing these sort of calculations in kernel space
17:28 robclark: I mean # of tiles should be not so hard to calculate.. it is a function of the pitch and height
17:29 haritz: it probably isn't a very expensive operation, but I'm resorting to a userspace library for working the numbers out
17:29 haritz: and doing it in kernel space would entail linking the display driver module against that library
17:30 haritz: which would somehow fatten the binaries, and also would end up calculating the same figures both in user and kernel space
17:30 haritz: which sounds like a bit of a waste of time
17:30 robclark: I mean it should just entail knowing the block size and some div_round_up
17:31 robclark: you're userspace library probably has to deal with a lot more cases than the kernel..
17:32 robclark: because getting into array/3d + mipmap and other cases userspace has to deal with for gpu is where the complexity is
17:32 robclark: the cases that the kernel has to deal with are very limited/trivial
17:32 haritz: mmmm, I'll look into this
17:33 haritz: so you claim that just with width, height, pitch and modifier information, calculating header tile size should be trivial?
17:35 robclark: yeah.. so stride should already be aligned to block size.. height may not be so you might have to round up.. for us, after that, we round up to page boundary
17:36 haritz: thanks
17:36 haritz: I'll have a look into this
17:36 robclark: so, blocksize comes from the format, it's roughly like ROUND_UP((stride / block_width) * (ROUND_UP(height, block_height) / block_height, PAGE_SIZE)
17:37 robclark: (and block w/h will be PoT so those can be shifts instead of actual divide)
17:37 haritz: that looks a lot like how I was calculatin byte offset from the number of tiles
17:38 robclark: (oh I guess I forgot to multiply by the compression header block size)
22:24 mareko: robclark: IIRC, replace_buffer_storage is a way to allow buffer invalidation (in place reallocation) without syncing tc or the driver. The idea is that transfer_map with invalidation on a buffer (let's call it dst) will allocate a new buffer in tc (let's call it src), map it, and return it to the user. In order to make that work while the driver hasn't even processed commands with dst, it enqueues
22:24 mareko: replace_buffer_storage, which means dst unreferences its storage and adopts the storage of src for all future calls.
22:26 mareko: for a small fraction of time, both src and dst will use the same storage, but src won't be used in any calls except unsynchronized transfer_map
22:26 mareko: .. I think
22:26 mareko: yes that's right
22:34 mareko: so tc can do invalidate, map buffer, invalidate, map buffer.... without ever syncing
22:36 robclark: mareko: ok.. I guess the question is which buffer to leave the tracking associated with.. if transfer_map and other uses were consistent with using the same pipe_resource ptr that would help.. but the problem now is transfer_map uses one and other pipe api's use the other.. so transfer_map() doesn't wait for the right thing.. need to think a bit on what a sane solution might be (but just headed out for walk+errands
22:36 robclark: so bbl)
22:42 mareko: robclark: transfer_map should always be unsynchronized with the src
22:44 mareko: robclark: let's forget about freedreno and let's think about how you would implement buffer invalidation for a buffer that has been used this frame and get the pointer to it right now when the driver hasn't even started executing anything for the current frame
23:54 robclark: mareko: the problem is freedreno is doing it's own draw-reordering and tracking the dependency between reads and writes to various resources.. which breaks when you mix use of the old and new version of a prsc.. the funny thing is it already has it's own resource shadowing internally (not for *exactly* the same reason, but similar.. to avoid forcing a tile pass flush)
23:56 robclark: maybe I can just move that dependency tracking to the fd_bo (since really it is about the underlying storage)