00:00 karolherbst: uhhh...
00:00 karolherbst: well
00:01 karolherbst: ahh, it's a vbo
00:23 HdkR: :D
00:23 karolherbst: mhh, there is a solution, just not a good one
00:24 karolherbst: HdkR: CL has this weird USE_HOST_PTR thing... but guess what, you can create a CL buffer from that, and because it's so insane, use your host_ptr as an SVM pointer directly as well
00:24 karolherbst: and now you have to guarantee consistency between the buffer and your host ptr
00:24 karolherbst: it all sucks
00:25 karolherbst: you can fake host_ptrs pretty well as long as you don't support SVM
00:25 karolherbst: but if you do.. you are screwed
00:25 HdkR: :D
00:25 imirkin: karolherbst: yeah, so small allocations will be backed in memory
00:25 HdkR: OpenCL SVM really just seems like it wants to have CUDA memory handling
00:27 karolherbst: imirkin: well, the issue is that if you map that CL buffer, your host ptr has to contain the valid data, and if you unmap, the GPU side needs the valid data; that plays out fine as long as all you can pass into the kernel is the CL buffer
00:27 karolherbst: but what if now the SVM gets passed directly
00:27 karolherbst: and some weird offset of that
00:27 karolherbst: passed in through structs
00:28 karolherbst: and the struct _could_ be passed as a CL host ptr buffer of course..., but contains members having SVM pointers...
00:28 karolherbst: it's super annoying
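(A minimal host-side sketch of the trap described above, assuming an OpenCL 2.0 implementation with fine-grained system SVM; the kernel, sizes, and offset are made up:)

```c
#include <CL/cl.h>
#include <stdlib.h>

/* host_mem is plain malloc'd memory. With fine-grained system SVM it is
 * already a valid SVM pointer; wrapping it with CL_MEM_USE_HOST_PTR creates
 * a second view of the same storage, and the implementation must keep the
 * buffer and the host pointer consistent across map/unmap. */
void use_host_ptr_and_svm(cl_context ctx, cl_command_queue q, cl_kernel k)
{
    size_t n = 1024, sz = n * sizeof(float);
    cl_int err;
    float *host_mem = malloc(sz);

    cl_mem buf = clCreateBuffer(ctx, CL_MEM_USE_HOST_PTR, sz, host_mem, &err);

    clSetKernelArg(k, 0, sizeof(cl_mem), &buf);   /* view 1: the CL buffer */
    clSetKernelArgSVMPointer(k, 1, host_mem + 4); /* view 2: raw SVM, at a
                                                     weird offset          */
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
    clFinish(q);
    clReleaseMemObject(buf);
    free(host_mem);
}
```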
00:29 imirkin: for obvious reasons, we don't use this for persistent mappings
00:29 imirkin: should probably mark them as persistent and move on?
00:29 karolherbst: well, persistent mappings is GPU memory
00:29 imirkin: since they basically are
00:29 karolherbst: there you don't have this issue
00:29 karolherbst: but
00:29 imirkin: i mean the PIPE_BIND_PERSISTENT or whatever
00:29 karolherbst: we talk about CPU memory here
00:29 karolherbst: malloced memory
00:29 imirkin: won't happen
00:29 imirkin: if you use that bind flag
00:30 karolherbst: the application mallocs
00:30 imirkin: oh
00:30 karolherbst: and passes that pointer into the driver
00:30 imirkin: super
00:30 karolherbst: ;)
00:30 imirkin: can't help you with that
00:30 HdkR: woo pinned_memory
00:30 karolherbst: yeah
00:30 karolherbst: in GL we have AMD_pinned_memory for that
00:30 imirkin: i thought you were talking about the small malloc's that we do in the driver
00:30 karolherbst: in CL it's essentially fine grained system SVM
00:30 karolherbst: ahh, no
00:30 imirkin: to temporarily back a buffer
00:31 karolherbst: we need to support that for vulkan anyway
00:31 karolherbst: but...
00:31 karolherbst: maybe we can get the new mm API with only our GL driver?
00:32 karolherbst: the problem with using HMM for userptr is... that the graphics class can't replay page faults
00:32 karolherbst: so... the memory needs to be migrated to the GPU
00:32 karolherbst: which... is super annoying if the application is stupid and just accesses the memory and it gets migrated back behind our back
00:33 karolherbst: yeah.. the intention of AMD_pinned_memory was that the GPU just accesses system memory directly
00:35 imirkin: yes.
00:37 HdkR: Also fun thing with pinned_memory. Nothing stopping the application from fussing with the mapping behind the driver's back on that one :D
00:38 HdkR: I planned on doing nefarious things with it if I managed to get the Nvidia driver team to implement it
00:39 karolherbst: :D
00:39 karolherbst: please don't
00:39 HdkR: They told me no, so I got bored and did something else :P
00:40 imirkin: like implement ARB_buffer_storage in dolphin?
00:40 HdkR: That's already there, that's fine
00:40 imirkin: but at the time...
00:40 imirkin: iirc dolphin started with AMD_pinned_whatever
00:40 HdkR: oh yea, because buffer_storage wasn't available yet
00:41 karolherbst: ufff.. adding support to nouveau_buffer_transfer_map for user_ptr...
00:41 karolherbst: fun
00:41 karolherbst: if user_ptr return user_ptr; at the top?
00:41 karolherbst: but the offset handling...
00:41 karolherbst: uff
00:42 karolherbst: ahh
00:42 karolherbst: it's already handled
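(A hypothetical sketch of that early-out, not the actual nouveau code; the user_ptr field is an assumption:)

```c
/* Inside nouveau_buffer_transfer_map(): a user-pointer-backed resource has
 * no bo to map, so the "map" is just pointer arithmetic, with box->x being
 * the offset handling mentioned above. */
static void *
transfer_map_userptr_sketch(struct nv04_resource *buf,
                            const struct pipe_box *box)
{
    if (buf->user_ptr)                            /* hypothetical field */
        return (uint8_t *)buf->user_ptr + box->x;
    return NULL;  /* ...fall through to the normal bo mapping paths... */
}
```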
11:50 Shred00: is anyone successfully using vdpau with nouveau lately? specifically i have a gt218. the last time i tried, it crashed pretty much out of the gate. but i am noticing that proprietary driver support for that chipset is EOL at the end of this year and Fedora has already dropped support for the 340.* drivers in F31.
15:58 imirkin_: Shred00: some amd-related changes kinda killed video decoding for a few releases, but it should be back to its original state
15:58 imirkin_: Shred00: 19.2 should work, i think
15:58 imirkin_: note that the original state is hardly perfect
15:58 Shred00: is that what's in 5.3.7[-301.fc31]?
15:58 imirkin_: that's a kernel number
15:58 imirkin_: i'm talking about mesa release
15:59 Shred00: oh.
15:59 imirkin_: any kernel past, say, 3.13 or so, should work fine
15:59 Shred00: 19.2.2 here
15:59 Shred00: how "imperfect" is the original state?
15:59 imirkin_: not sure what the units are
16:00 imirkin_: but "some videos decode with artifacts"
16:00 imirkin_: your collection may be entirely unaffected
16:00 imirkin_: i think it was relatively rare on videos in the wild
16:00 imirkin_: but like the very first video i ever tested with was affected
16:00 imirkin_: heh
16:01 imirkin_: (some kind of h264 demo trailer of the simpsons movie)
16:01 Shred00: good thing i don't watch trailers then. :-D
16:01 imirkin_: the artifacts are only on h264, not with mpeg4 (part 2) or whatever (or vc1 or mpeg2, which no one uses)
16:02 imirkin_: i spent a _lot_ of time trying to figure out what was up, with zero progress made
16:02 imirkin_: it must either come down to reference frame mismanagement, or inter-engine buffer mismanagement
16:03 imirkin_: coz everything else is 100% identical
16:04 Shred00: as much as i hate the way nvidia keeps their driver closed and doesn't provide info on their hardware, i'd hate it only slightly less if they'd at least publish their driver and specs when they EOL it themselves. surely any IP in EOL products is of little or no value at that point.
16:05 imirkin_: mooch is still sitting here waiting for riva128 docs. i doubt it'll happen for gt218 :)
16:06 Shred00: oh, no doubt at all. i'm jus' sayin'.
16:09 HdkR: You don't want the drivers opened. You want the documentation opened instead
16:10 mooch: it's true lmao
16:12 imirkin_: in practice, i think the way that the most recent decoders work is somewhat similar to what was done on the gt218
16:12 imirkin_: at least at the OS <-> GPU interface level
16:12 imirkin_: it's a single engine now rather than 3, but i think the basics still hold
16:13 imirkin_: i'm guessing they got tired of rewriting their decoding logic every time
16:13 imirkin_: (and it's basically identical between vp3, 4, and 5, at the OS level)
16:13 imirkin_: vp5 doesn't have "user" firmware anymore, but that's a minor bit as far as the decoding logic is concerned
16:34 karolherbst: imirkin_: btw... I have a proper idea for the spilling issue
16:35 imirkin_: i like proper
16:35 karolherbst: I think we have to postpone the decision to bail out. A bit later in the code we collect the actual values which have to be spilled (after simplifying), so at that point we know what _has_ to be spilled, not what potentially needs to be spilled
16:36 karolherbst: problem is, at that point we already assigned registers
16:36 karolherbst: sooo
16:36 karolherbst: I think we have to collect the spill values + have a map of ssa value -> reg value and do the final assigning when we know everything is good to go
16:36 karolherbst: something like this
16:37 karolherbst: the point where we currently bail out is just "we might not be able to spill it, so let's fail already"
16:37 karolherbst: or well
16:37 karolherbst: also not being able to assign a reg
16:39 karolherbst: I think I could come up with a patch by tomorrow or so. I doubt it will be hard to do overall
16:39 karolherbst: and it's much more obvious what's going on this way
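(A very loose C skeleton of the proposed scheme; every name here is invented, and the real nv50_ir register allocator is C++ and far more involved:)

```c
#include <stdbool.h>

struct ra_state {
    int *reg_of_ssa;   /* tentative map: SSA value -> register, not yet
                          written back into the IR                       */
    int *must_spill;   /* values that actually have to be spilled, known
                          only after graph simplification                */
    int  num_spill;
};

/* The point: don't bail out (or touch the IR) while we only know what
 * *might* need spilling. Color first, collect the real spill set, and do
 * the final register assignment as a separate, last step. */
static bool ra_assign(struct ra_state *ra)
{
    /* 1. build + simplify the interference graph                        */
    /* 2. select registers into reg_of_ssa; push failures into must_spill */
    if (ra->num_spill > 0)
        return false;  /* caller inserts spill code and retries */
    /* 3. everything is good to go: rewrite the IR from reg_of_ssa */
    return true;
}
```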
19:20 danvet: imirkin_, feel like giving the patch a spin today?
19:21 danvet: mlankhorst r-b stamped it, so I'm inclined to just merge it and handle any screaming when it's in my inbox :-)
21:04 Lyude: skeggsb: you around? Wondering if you know whether or not it's intentional that we set the or depth to 0 in nv50_pior_enable()[dispnv50/disp.c] for bpc values != (10, 8, 6)
21:05 Lyude: Asking because I'm trying to move the depth (or at the very least, bpp/bpc settings) calculation into the atomic check and out of the actual atomic state commit to fix an issue I found with mst
21:06 Lyude: it seems like we'd want to set the depth to the highest value possible if we see a bpc we don't have a depth value for, not the lowest, but maybe it's some sort of thing nvidia hw expects?
21:10 skeggsb: that's... complicated.. HW error-checks the values, and accepts different things for different ORs / protocol values / various other configurations. "highest" isn't correct either necessarily, you have to limit that to what the sink can handle too / available bandwidth etc.
21:11 skeggsb: "0" is actually DEFAULT, and that also depends on the OR type, and also not accepted everywhere
21:11 Lyude: ahhh, I gotcha
21:12 skeggsb: what we do currently isn't ideal, but requires some more thought on what to actually do.. i believe imirkin was looking into that stuff for supporting higher depth displays etc
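(The code being discussed, paraphrased from memory from nv50_pior_enable() in dispnv50/disp.c; check the tree for the exact encodings:)

```c
switch (nv_connector->base.display_info.bpc) {
case 10: asyh->or.depth = 0x6; break;
case  8: asyh->or.depth = 0x5; break;
case  6: asyh->or.depth = 0x2; break;
default: asyh->or.depth = 0x0; break; /* DEFAULT, per skeggsb above */
}
```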
21:12 imirkin_: this will come up more for the hdmi high-bpc stuff too
21:12 imirkin_: i think ville's idea is to expose a "max bpc" property, and otherwise set the highest allowable
21:13 Lyude: mhhh. basically what I'm trying to accomplish here is just to make it so we settle on a depth/bpp value during the atomic check, and then only relying on the values for depth/bpp from the atomic state when performing the actual commit
21:13 Lyude: imirkin_: ah yeah, there's already a max bpc property btw
21:13 imirkin_: right
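(The property plumbing lives in the DRM core; a minimal sketch of attaching it and clamping against it, where the sketch_ helpers are made up but drm_connector_attach_max_bpc_property() and max_requested_bpc are the real core API:)

```c
#include <drm/drm_connector.h>

/* attach during connector init: userspace can cap bpc per connector */
static void sketch_attach(struct drm_connector *connector)
{
    drm_connector_attach_max_bpc_property(connector, 6, 12);
}

/* at atomic check time, pick the highest bpc allowed by sink + property */
static int sketch_pick_bpc(const struct drm_connector_state *state,
                           int sink_bpc)
{
    int bpc = sink_bpc;
    if (bpc > state->max_requested_bpc)
        bpc = state->max_requested_bpc;
    return bpc;
}
```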
21:13 imirkin_: this all interacts fairly interestingly with yuv420 stuff
21:13 imirkin_: e.g. do you do yuv420 + lower bpc, or yuv444 + higher bpc
21:14 imirkin_: i haven't figured out (or really tried to figure out) how to get yuv420 going yet, so it's a bit hypothetical
21:14 imirkin_: [i.e. how to make the hardware allow it]
21:15 Lyude: mhhh, now I'm more confused. I see us checking asyh->base.depth in nv50_head_atomic_check_dither(), but then we come up with our own depth in nv50_msto_enable() and nv50_sor_enable() (but only sometimes?) and nv50_pior_enable()
21:15 Lyude: and also nv50_dac_enable() it seems
21:16 Lyude: so, everything makes sense (and needs to be moved like I described above as a result) except for the nv50_head_atomic_check_dither() implementation
21:16 Lyude: is that just a mistake?
21:38 imirkin_: probably just subtle.
21:39 imirkin_: what's the question?
21:39 imirkin_: oh, looking at asyh->base.depth...
21:45 Lyude: which we don't set until commit time
21:46 Lyude: so unless I'm missing something that's likely mistakenly checking what amounts to the calculated depth for the previous state (since it wouldn't have been set yet) and not the current one
21:50 imirkin_: seems believable
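(Schematically, the reordering Lyude describes would look something like this; nv50_head_atom exists in nouveau, but the or.bpc field and the helpers here are hypothetical:)

```c
/* atomic check: decide bpc/depth once and stash it in the head atom, so
 * nv50_head_atomic_check_dither() reads an already-valid value for the
 * *current* state instead of leftovers from the previous one */
static int head_atomic_check_sketch(struct nv50_head_atom *asyh, int bpc)
{
    asyh->or.bpc = bpc;   /* hypothetical field, validated at check time */
    return 0;
}

/* commit time: *_enable() only consumes what the check computed */
static void sor_enable_sketch(struct nv50_head_atom *asyh)
{
    program_or_depth(asyh->or.bpc);   /* hypothetical helper, no recompute */
}
```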