00:03 redsheep[d]: Doesn't seem terribly constraining if the video hardware only works with the bottom 40 bits of address space, right?
00:05 karolherbst[d]: I suspect the reasons are "IP blocks" and nvidia not having bothered
00:12 airlied[d]: yeah amd had some 32-bit only blocks for a few revs, painful to fence
00:12 redsheep[d]: A terabyte of address space isn't exactly small, even on a big cluster; just don't have the video hardware interact with the upper bits and it wouldn't break, I'd assume. That's still like 22 full hours at a pretty crazy 100 Mbps
00:13 karolherbst[d]: well
00:13 redsheep[d]: Imagine using a dgx as an in memory only NVR :kekw:
00:13 karolherbst[d]: you don't use that fixed-function hardware on those video files anyway 🙃
00:15 karolherbst[d]: also.. it's still shifted
00:15 karolherbst[d]: so we have 48 bits in total
00:15 redsheep[d]: redsheep[d]: It really puts the NV in NVR lol
00:29 airlied[d]: well the problem is sometimes you want the IP block to write a value into a 64-bit address space
00:30 airlied[d]: though I think a lot of the problems have also been with the interface layers vendors add to the IP blocks
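A minimal sketch of the shift karolherbst mentions above: if the hardware keeps a 40-bit field holding the address shifted right by 8 bits, the reachable VA space is still the full 48 bits, just at 256-byte granularity. The 8-bit shift is an assumption inferred from "40 bits" plus "48 bits in total", not something confirmed in the chat.

```c
#include <stdint.h>

/* Assumed 8-bit shift: a 40-bit field of addr >> 8 spans 1ull << 48 bytes. */
static uint64_t encode_va(uint64_t va)    { return va >> 8; }    /* fits in 40 bits */
static uint64_t decode_va(uint64_t field) { return field << 8; } /* 256-byte granularity */
```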
02:45 gfxstrand[d]: I just realized that once mhenning[d]'s transfer queue MR lands, I'm gonna have to re-run conformance on all the old hardware because Maxwell+ will be able to do 1.4.
02:46 gfxstrand[d]: https://tenor.com/view/luke-skywalker-no-star-wars-mark-hamill-hanging-on-gif-11368455
02:51 steel01[d]: Maxwell is capable of bleeding edge vulkan? Wow.
02:51 steel01[d]: Where's kepler stand?
02:54 chikuwad[d]: 1.2
03:21 gfxstrand[d]: Kepler is forever at 1.2.
03:23 steel01[d]: Not surprised there's hardware limitations. Wonder how long it'll take before maxwell hits similar. I wouldn't have expected it to last this long. Gonna be fun to get a shield tv fired up with brand new vulkan stuff running at like 3 seconds per frame. 😛
03:25 steel01[d]: A shield tablet with vulkan 1.2 will still be pretty slick, though.
04:30 gfxstrand[d]: I just need to get *something* fired up first. 😢
04:30 steel01[d]: gfxstrand[d]: Still no luck on that nano? Or other stuff took priority?
04:31 gfxstrand[d]: I got the kernel installed at the end of Wednesday. I wasn't in the office today. I'll probably go in for at least a little bit tomorrow. Hopefully 6.10.30 works.
04:32 steel01[d]: I'm trying to get something tegra k1 fired up and recently found out that the driver just locks up the kernel. Without a stack trace. >< So I gotta figure out how to debug that.
04:32 steel01[d]: Need to finish stabilizing everything else first, though. Can run with swiftshader for that.
10:30 snowycoder[d]: I'm chasing a weird OOB trap on Kepler tessellation control shaders.
10:30 snowycoder[d]: The encoding is correct and I'm dealing with small shaders, I think it's overflowing the attribute buffer?
10:30 snowycoder[d]: When a warp execution needs 4 threads, do the other threads still run?
11:13 karolherbst[d]: depends on which threads
11:14 karolherbst[d]: but yeah, the hardware might run helper invocations for various reasons
11:14 karolherbst[d]: but it's usually within quads
11:14 karolherbst[d]: but if threads 0, 1, 2 and 4 are running, there might be threads 3, 5, 6 and 7 as helper invocations
11:15 karolherbst[d]: and they do memory loads
11:16 karolherbst[d]: but their writes shouldn't be visible, at least not anymore
11:17 karolherbst[d]: it's fixed since 6.5 or something
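A minimal illustration of the quad behavior karolherbst describes: execution gets padded out to full 2x2 quads (groups of four consecutive lanes), so with lanes 0, 1, 2 and 4 live, lanes 3, 5, 6 and 7 run as helper invocations. The helper below is illustrative only, not driver code.

```c
#include <stdint.h>
#include <stdio.h>

/* Pad an active-lane mask out to whole quads; helpers = padded minus active. */
static uint32_t helper_mask(uint32_t active)
{
    uint32_t padded = 0;
    for (int q = 0; q < 32; q += 4) {
        if ((active >> q) & 0xf)   /* any live lane in this quad? */
            padded |= 0xfu << q;   /* then the whole quad executes */
    }
    return padded & ~active;
}

int main(void)
{
    /* Lanes 0, 1, 2, 4 active -> lanes 3, 5, 6, 7 become helpers. */
    printf("0x%08x\n", helper_mask(0x17)); /* prints 0x000000e8 */
}
```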
11:41 gfxstrand[d]: snowycoder[d]: Some tessellation tests OOB. We shut the warning off on Maxwell+. We don't have any ability to smash the register without kernel help on Kepler.
11:42 snowycoder[d]: But why does it go OOB? Is it because of triangles?
11:43 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/vulkan/nvk_cmd_draw.c#L234
11:43 karolherbst[d]: gfxstrand[d]: ohhh another bit?
11:44 karolherbst[d]: snowycoder[d]: 1cb9e2ef66d53b020842b18762e30d0eb4384de8 see for the helper invoc thing, if you figure out the right register for kepler, you can write a similar patch for that
11:46 karolherbst[d]: it's great, because we have 0 documentation on those 🥲
11:47 karolherbst[d]: I suspect it's also `0x419e44` on kepler tbh
11:47 gfxstrand[d]: Not quite zero. I found them in some of the headers we have. But IDK that we have anything as far back as Kepler.
11:48 karolherbst[d]: I think it's the same
11:48 karolherbst[d]: on gm107 it's set to 0x00d3eff2
11:48 karolherbst[d]: on kepler1 it's 0x0013eff2
11:49 karolherbst[d]: in both cases the bit 14 is set to 1
11:49 karolherbst[d]: while nvk sets it to 0, right?
11:49 gfxstrand[d]: Yup
11:49 karolherbst[d]: snowycoder[d]: `gk104_grctx_init_sm_0` on kepler 1
11:49 karolherbst[d]: just clear bit 14 😄
11:50 karolherbst[d]: `{ 0x419e44, 1, 0x04, 0x0013eff2 },` => `{ 0x419e44, 1, 0x04, 0x0013aff2 },`
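As a sketch, that one-liner would look something like this in the kernel, assuming the gk104 table lives in nvkm/engine/gr/ctxgk104.c with the usual `struct gf100_gr_init` entries (same shape as the gm107 fix in 1cb9e2ef66d5):

```c
/* drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk104.c (sketch):
 * clear bit 14 (0x4000) of 0x419e44 so the OOR_ADDR shader exception
 * stops firing, mirroring what nvk relies on for Maxwell+. */
const struct gf100_gr_init
gk104_grctx_init_sm_0[] = {
	/* ... */
	{ 0x419e44,   1, 0x04, 0x0013aff2 },	/* was 0x0013eff2 */
	/* ... */
	{}
};
```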
11:51 karolherbst[d]: mmhhhhhh
11:51 karolherbst[d]: actually...
11:51 karolherbst[d]: the SW channel can write that reg
11:51 karolherbst[d]: 😄
11:51 karolherbst[d]: `gf100_gr_mthd_set_shader_exceptions`
11:51 karolherbst[d]: method `0x1528`
11:51 karolherbst[d]: well..
11:51 karolherbst[d]: it sets it to either `0xffffffff` or `0x0` so that's not helpful
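For reference, the handler in question looks roughly like this in nouveau (an approximate reconstruction, so treat the exact register list as an assumption): it can only blast the whole exception-enable set to all-ones or zero, which is why it can't clear just bit 14.

```c
/* nvkm/engine/gr/gf100.c (approximate): SW-channel method 0x1528. */
static void
gf100_gr_mthd_set_shader_exceptions(struct nvkm_device *device, u32 data)
{
	nvkm_wr32(device, 0x419e44, data ? 0xffffffff : 0x00000000);
	nvkm_wr32(device, 0x419e4c, data ? 0xffffffff : 0x00000000);
}
```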
11:56 gfxstrand[d]: If we had ever landed your SW channel stuff... 🙃
11:57 gfxstrand[d]: Not a slam on you. I was also 100% onboard the "we'll figure out Kepler when we get there" plan. Then we got there. 🙃
11:59 snowycoder[d]: karolherbst[d]: Thank you! I'll check it later, it's weird seeing kepler logs while tests pass
11:59 karolherbst[d]: gfxstrand[d]: well, we now just do it for every context, so whatever
11:59 karolherbst[d]: snowycoder[d]: ohh, if it's in the logs, we can actually verify it's the same thing... mind pasting it?
12:00 karolherbst[d]: gfxstrand[d]: the initial idea was to use it for the shader trap handler
12:01 karolherbst[d]: which... and I repeat this every time I remember it exists, we should figure out and wire up lol
12:01 karolherbst[d]: in theory it's simple 😄
12:01 karolherbst[d]: you whack the sm trap mask regs, install the shader trap handler, write some code and you got a GPU debugger
12:02 karolherbst[d]: I got the trap handler to trigger on some GPUs
12:02 gfxstrand[d]: snowycoder[d]: As long as it's just those, it's probably okay. But one of our sources of kernel instability is exception logging. If there's too many of them we hit race conditions or something. I'm not sure exactly what goes wrong. I just know that kernel stability is directly proportional to the amount of dmesg spam.
12:03 karolherbst[d]: but it wasn't really reliable in terms of debugging experience
12:03 karolherbst[d]: but
12:03 karolherbst[d]: in theory
12:03 karolherbst[d]: we can dump all registers of all trapping threads
12:03 karolherbst[d]: the PC, local memory, you name it
12:03 gfxstrand[d]: I got that working on Intel forever ago. IDK if the patch ever landed, though.
12:04 karolherbst[d]: there are some sys vals which are useful there
12:04 karolherbst[d]: e.g. amount of allocated regs
12:05 karolherbst[d]: oh.... uhm...
12:05 karolherbst[d]: so
12:06 karolherbst[d]: there is also a sys val that gives you the shared mem size in bytes 🙃
12:06 karolherbst[d]: oh wait, it's a constant
12:06 karolherbst[d]: nvm
12:06 karolherbst[d]: oh both exists, the constant and the allocated one
12:06 gfxstrand[d]: Yeah, allocated regs is pretty important. 😅
12:07 karolherbst[d]: mhhhhhhh
12:07 karolherbst[d]: it's the _current_ registers, keep in mind there was something that can change the amount of allocated regs dynamically
12:09 karolherbst[d]: starting with hopper
12:09 gfxstrand[d]: Yes but if you don't have that and you try to scan through the register file, you'll end up hitting an exception in your trap handler and that's not gonna be fun for anybody. 😂
12:10 karolherbst[d]: `USETMAXREG`
12:10 karolherbst[d]: I think traps within the trap handler are disabled
12:11 karolherbst[d]: `USETMAXREG` sounds a bit insane, but what do I know
12:12 karolherbst[d]: well not all traps, but I think you can just read any reg and you'll just get a 0
13:24 snowycoder[d]: karolherbst[d]: Running `dEQP-VK.tessellation.geometry_interaction.passthrough.tessellate_triangles_passthrough_geometry_no_change` on my KeplerB outputs this in dmesg:
13:24 snowycoder[d]: [23871.108286] nouveau 0000:09:00.0: gr: TRAP ch 3 [007fbb0000 deqp-vk[75926]]
13:24 snowycoder[d]: [23871.108306] nouveau 0000:09:00.0: gr: GPC0/TPC0/MP trap: global 00000000 [] warp 3c000e [OOR_ADDR]
13:24 snowycoder[d]: [23871.154160] nouveau 0000:09:00.0: gr: TRAP ch 3 [007fbb0000 deqp-vk[75926]]
13:24 snowycoder[d]: [23871.154178] nouveau 0000:09:00.0: gr: GPC0/TPC0/MP trap: global 00000000 [] warp 3e000e [OOR_ADDR]
13:25 snowycoder[d]: karolherbst[d]: You're saying that we could create a gdbserver for shaders? 0_0
13:26 karolherbst[d]: snowycoder[d]: yep
13:26 karolherbst[d]: like `cuda-gdb` is also a thing
13:26 karolherbst[d]: not sure it works just as well for 3D shaders tho
14:04 snowycoder[d]: karolherbst[d]: So, what's the best solution?
14:04 snowycoder[d]: I'm not familiar at all with nouveau kernel code but I can learn it.
14:04 snowycoder[d]: Why can userspace write all the Falcon(?) registers to 0x0/0xffffffff?
14:04 snowycoder[d]: Should we flip that bit as in 1cb9e2ef66d53b020842b18762e30d0eb4384de8, or should we implement a more general solution with SW channels?
14:04 snowycoder[d]: I guess we can also trade a bit of kernel instability, it's not triggered that often and Kepler has other bugs.
14:05 karolherbst[d]: karolherbst[d]: just do that snowycoder[d]
14:06 karolherbst[d]: might also have to do it for other gens.. like gm107 and gk110
14:06 karolherbst[d]: but for testing on gk104 this should be enough
14:07 karolherbst[d]: like yeah, would be cool to allow userspace access to some of those regs, but it's kind of a pain, and if we always flip a bit, then there is no point in bothering further besides "write a kernel patch"
14:07 karolherbst[d]: like we can't grant blanket access, so it's going to be on a reg by reg case anyway
14:07 karolherbst[d]: the only situation where having the SW channel thing wired up helps is when there is a need to toggle values
14:07 karolherbst[d]: or the values depend on API bits
16:51 gfxstrand[d]: Being able to toggle exception bits is useful
16:53 gfxstrand[d]: Okay, I really don't know how this video spec is supposed to work. There are lots of images in flight and there are extents and offsets that get provided per-image but we only have one stride for all luma and one for all chroma and one tiling for everything.
16:53 gfxstrand[d]: I don't see any VUs that make this possible
17:04 gfxstrand[d]: Okay, I sent an e-mail to the NVIDIA guy. I suspect we have to manage shadow copies of everything in the `VkVideoCodingSession` and just copy out at the end.
17:04 gfxstrand[d]: Which also means that we need copy queue support in our video queue which is a bit of a problem...
17:05 gfxstrand[d]: Unless video has copy stuff...
17:07 gfxstrand[d]: skeggsb9778: How sure can I be that the list of classes returned by the kernel is complete and that there aren't a few missing?
17:08 gfxstrand[d]: I guess I could just try doing a `SET_OBJECT` and see what happens
17:11 gfxstrand[d]: I'm gonna be annoyed if I have to start juggling queues...
17:12 gfxstrand[d]: I mean, it's not the end of the world. I think I can abstract it away okay. It's just really annoying that I have to bother.
17:15 gfxstrand[d]: But yeah, the more I look at the spec and the hardware the less it looks like decoding directly into VkImages was ever intended except as an optimization.
17:29 gfxstrand[d]: Which isn't to say that no one can do it. Just that Nvidia has enough bandwidth that they don't care.
18:00 gfxstrand[d]: Especially compared to the PCI bandwidth being used to upload the raw data. And you probably want to make sure that all your reference images are in VRAM, anyway, even if someone wants you to decode to linear.
18:27 gfxstrand[d]: In any case, this means I have a lot of reworking to do. And probably figuring out how to add transfer to video queues. 😩
18:28 gfxstrand[d]: Fortunately, a couple of Mel's patches should help with the last bit. The asserts she added can also be used for switching queues when we go to submit the command buffer if we plumb them through.
18:28 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1403444853601210440/image.png?ex=6897935f&is=689641df&hm=f01f729ed777152de3c8b6943a83418130a20123a6e3e33fcdfcbe49d8df1319&
18:28 mangodev[d]: moment of truth…
18:30 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1403445271748284436/image.png?ex=689793c3&is=68964243&hm=84a74a184ea9ac52fefad5a31f7ec4eb05caba7a5390d4a73d279df426d9427d&
18:30 mangodev[d]: lmao, it still does this
18:30 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1403445357051908157/image.png?ex=689793d7&is=68964257&hm=9fc85ba44983e9581086cd8d0a365e8f31cf2f8831899d6781d073822f39993b&
18:30 mangodev[d]: uh oh
18:30 mangodev[d]: that's new
18:35 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1403446537530376302/image.png?ex=689794f1&is=68964371&hm=69772c36844b4fd772121d757e92a924d4c9fa297a13f428a494f6658c68928d&
18:35 mangodev[d]: okay wait moment of real truth
18:35 mangodev[d]: just had to verify integrity
18:35 mangodev[d]: testing to see if the actual game runs (as it did extremely poorly before)
18:35 mangodev[d]: testing nms specifically because it's vulkan already, might test satisfactory too since it also has a vulkan renderer
18:36 mangodev[d]: rip
18:36 mangodev[d]: that's still a hard nope
18:36 mangodev[d]: nvk doesn't like something it does, runs at a whopping 1fps at lowest settings
18:37 mangodev[d]: steam runs really smooth though :D
18:41 mangodev[d]: in my testing, if it runs above 120fps, it *shouldn't* stutter the mouse
18:41 mangodev[d]: -# very few things run above 120fps
18:41 mangodev[d]: i would try testing in wlroots vulkan mode to see if it's zink related, but i don't have a ton of time anymore and don't have a test pc (my test pc is my daily driver)
18:44 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1403448922898432020/image.png?ex=6897972a&is=689645aa&hm=d223ec35edb4dc25326f181f9da7fbd746f6046a8380d43a57b1f7bdb738b2dc&
18:44 mangodev[d]: this is gonna be fun
18:49 mangodev[d]: mangodev[d]: nevermind
18:49 mangodev[d]: i think if the game is gpu bottlenecked, it lags the compositor
18:49 mangodev[d]: but not the actual game itself
18:49 mangodev[d]: it could be running 300 fps and look like 10 because the gpu just *doesn't care* about the compositor :P
18:56 gfxstrand[d]: mangodev[d]: That sounds like something going across the PCI BAR when it really shouldn't.
18:58 gfxstrand[d]: mangodev[d]: Oh, well that's plausible. We don't have any sort of priority stuff.
18:59 gfxstrand[d]: But still, I suspect it's thrashing PCI or something like that.
19:02 mangodev[d]: gfxstrand[d]: i think there's a nonzero chance i'm completely missing WSI
19:03 mangodev[d]: i tried to launch a plasma x11 session
19:03 mangodev[d]: it hard crashed
19:03 mangodev[d]: the wayland one i've been using since the start of me using NVK, but it feels *far* too easy to slow down
19:04 mangodev[d]: and it doesn't seem right because people seem to benchmark stressful games like horizon: zero dawn just fine
19:05 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1403453964019695768/image.png?ex=68979bdb&is=68964a5b&hm=77a0d7ba5ddb8a8267d6bfdff35c7013597b370abaa2bf802a75d1cf387003b3&
19:05 mangodev[d]: mangodev[d]: journalctl doesn't speak much
19:06 mangodev[d]: is this an issue?
19:06 mangodev[d]: sddm-helper-start-wayland[1030]: "kwin_scene_opengl: Could not delete render time query because no context is current\n"
19:07 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1403454624895340544/image.png?ex=68979c79&is=68964af9&hm=004328eceb099f77e7dfe374774c3ce37bd80a3422a2bb45b2d77ecb7175379f&
19:07 mangodev[d]: there's also this, but this error has existed for me even on proprietary and also on a previous install
19:09 mangodev[d]: mangodev[d]: i think i may've found part of the reason?
19:09 mangodev[d]: sddm-helper-start-wayland[1030]: "kwin_wayland_drm: drmSetClientCap for Atomic Mode Setting failed. Using legacy mode on GPU \"/dev/dri/card0\"\n"
19:11 mangodev[d]: is wayland AMS something that the mesa driver explicitly has to support, or is it handled by gallium, a different part of mesa, or a different library?
19:12 steel01[d]: Is there a reason atomic isn't on by default yet? I have to pass a kernel bootarg to enable it to make android happy too.
19:13 mangodev[d]: steel01[d]: so wait
19:13 mangodev[d]: is atomic mode enabled by the driver, not the desktop?
19:13 steel01[d]: Correct.
19:13 mangodev[d]: 🙃
19:14 mangodev[d]: maybe *that's* why my wayland session cries at the sight of moving colors
19:14 steel01[d]: Like nouveau.atomic=1. On my phone atm, so can't pull my bootargs string.
19:14 mangodev[d]: that's it?
19:15 steel01[d]: Afaik. Fwiw, my target is tegra. But pretty sure desktop cards get the same default.
19:16 mangodev[d]: is this on the side of the nouveau *kernel driver,* or the nouveau *gl driver?*
19:16 steel01[d]: Kernel driver.
19:16 gfxstrand[d]: There's no NVK on Tegra yet
19:16 steel01[d]: No, but the atomic setting in kernel is the same either way.
19:17 mhenning[d]: steel01[d]: I think the kernel people never got around to it
19:17 mangodev[d]: are these all the flags i need (turing+), or is there more i should add?
19:17 mangodev[d]: nouveau.NvGspRm=1 nouveau.atomic=1
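For context, the failing call in that kwin log line is the standard libdrm atomic probe. A minimal sketch of what a compositor does at startup; on nouveau it only succeeds once the kernel driver was loaded with atomic enabled (e.g. nouveau.atomic=1):

```c
#include <fcntl.h>
#include <stdio.h>
#include <xf86drm.h>

int main(void)
{
    int fd = open("/dev/dri/card0", O_RDWR);
    if (fd < 0)
        return 1;
    /* Fails on nouveau unless the kernel was booted with nouveau.atomic=1. */
    if (drmSetClientCap(fd, DRM_CLIENT_CAP_ATOMIC, 1))
        puts("atomic modesetting unavailable, falling back to legacy");
    else
        puts("atomic modesetting enabled");
    return 0;
}
```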
19:19 steel01[d]: mhenning[d]: I keep wondering if there's broken stuff I just haven't hit yet that will pop up eventually. So far, no issues related to it that I've seen, though.
19:22 esdrastarsis[d]: gfxstrand[d]: I think you forgot to add the extension in `new_features.txt`: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36681/diffs?commit_id=1cf5c68d50224d6f350f5607045fbe1bdb198b39
19:23 gfxstrand[d]: Yeah, I did
19:30 mangodev[d]: steel01[d]: ty
19:30 mangodev[d]: made discord multiple times smoother :|
19:30 mangodev[d]: still not perfect, but a good amount better than before
20:09 airlied[d]: gfxstrand[d]: decoding in Vulkan into a VkImage should be fine, you just have to set the image up when you get the video decode usage flags, but also don't confuse output images and DPB images
20:10 airlied[d]: Though I haven't looked at NVIDIA closely enough to see if they keep them separate
20:27 gfxstrand[d]: airlied[d]: The problem is that there's a single stride and tiling for all images (actually, two strides, one for chroma and one for luma). Also, you're allowed to decode into subregions of images. There's absolutely nothing I can find in the spec that gives us any useful guarantees that all the images are the same size and therefore the same stride.
20:27 gfxstrand[d]: Maybe I've missed something but I can't find it
20:33 airlied[d]: I think that is just how DPB images have to work, you have to allocate them at a certain size
20:33 airlied[d]: or you force them into an array
20:33 airlied[d]: using the flag that says use an array image for this
20:35 airlied[d]: VK_VIDEO_CAPABILITY_SEPARATE_REFERENCE_IMAGES_BIT_KHR and then you have to decide if DPB and DST images coincide
20:35 airlied[d]: but yes all your references usually have to have same stride and tiling
20:36 airlied[d]: there should be no implicit copy into the output image unless the firmware does it like on AMD
20:40 gfxstrand[d]: Oh, there's a cap bit for that? Okay. I missed that
20:41 airlied[d]: There is also a coincide bit, but i don't have the CPU to read the spec right now 🙂
20:42 gfxstrand[d]: Yeah, so the cap I think fixes it. There are still bugs (array slices are currently ignored) but I think that's what I was missing.
20:58 airlied[d]: have to check what nvidia exposes there
21:05 airlied[d]: dang vulkan gpuinfo is down
21:08 gfxstrand[d]: It's up in the US but it's down for other countries. I have to VPN
21:09 gfxstrand[d]: I just poked Sascha about it
21:09 gfxstrand[d]: Unfortunately, it doesn't help because it doesn't list video props
21:21 airlied[d]: so the reason we don't use an array is that it is usually oversized compared to what is needed; if you have a stream that you know won't need a full DPB to decode, you can use separate images, or if you want to also use those images as destinations
21:22 airlied[d]: at least on AMD the newer hardware for decode doesn't require arrays, only encode
21:26 gfxstrand[d]: Well, NVIDIA needs arrays for everything, unless it's changed on newer hardware (which I don't think it has)
21:53 gfxstrand[d]: So is the client responsible for copying stuff into the DPB if it's going to reference it later?
21:54 gfxstrand[d]: and/or decode directly into the DPB
21:55 airlied[d]: No the dpb is only written by the hw
21:56 airlied[d]: Each slot is only written on a decode operation on that slot
21:56 airlied[d]: After that it is used read only as a reference image
21:57 airlied[d]: And depending on the decode|dpb usage bits the client can use it as the output
21:57 airlied[d]: VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR
21:57 airlied[d]: VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR
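A sketch of the capability query those bits come from, assuming `profile` is an already-filled VkVideoProfileInfoKHR and error handling is elided; the flag checks map to the bits airlied named:

```c
VkVideoDecodeCapabilitiesKHR decode_caps = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_DECODE_CAPABILITIES_KHR,
};
VkVideoCapabilitiesKHR caps = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_CAPABILITIES_KHR,
    .pNext = &decode_caps,
};
vkGetPhysicalDeviceVideoCapabilitiesKHR(physical_device, &profile, &caps);

if (!(caps.flags & VK_VIDEO_CAPABILITY_SEPARATE_REFERENCE_IMAGES_BIT_KHR)) {
    /* All DPB slots must come from a single array image, so every
     * reference shares one stride and tiling. */
}
if (decode_caps.flags & VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR) {
    /* The decode output may be the DPB slot itself. */
}
if (decode_caps.flags & VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR) {
    /* Output and DPB may be separate images. */
}
```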
22:02 airlied[d]: Not sure NVIDIA supports distinct
22:07 gfxstrand[d]: Okay, so in that case `pSetupReferenceSlot` is what gets written and the client has to copy out of that?
22:07 gfxstrand[d]: Clunky but it makes sense
22:08 gfxstrand[d]: I'm still confused what all this slot remapping is doing but things are starting to make sense now
22:09 mhenning[d]: wait, is the CE in NOUVEAU_FIFO_ENGINE_CE compute engine or is it copy engine?
22:14 airlied[d]: copy engine
22:17 mhenning[d]: okay, thanks. here I was thinking it was compute for the past three days
23:28 mhenning[d]: I think I remember someone talking about a kernel issue where jobs sometimes timeout that shouldn't, like we don't update the fence correctly?
23:28 mhenning[d]: is that supposed to be fixed?
23:29 mhenning[d]: I'm wondering if the issue I'm seeing in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36617#note_3045737 might be a kernel problem
23:29 mhenning[d]: or if userspace isn't setting up sync properly or something
23:31 airlied[d]: I think I've debugged something weird about 5 times and never found it and then something else fixed it enough to make it not happen
23:33 airlied[d]: I keep suspecting a race on the irq/event enable/disable handling paths but fail to find it
23:34 airlied[d]: Esp around the allowed handling
23:37 mhenning[d]: I'm not sure what allowed handling is
23:37 mhenning[d]: but yeah that sounds plausible
23:38 mhenning[d]: The issue is happening on that branch + NVK_DEBUG=push_sync + talos principle if you want to play with it
23:38 mhenning[d]: (push_sync isn't required but it might make it more likely)
23:39 airlied[d]: Grep for allowed in nvkm
23:40 airlied[d]: You can force allowed to true always to see if it helps, like we shouldn't miss irqs but sometimes I wonder