01:07 dwlsalmeida[d]: hey guys, is there a way to tell the caller that a given command failed to be recorded? the VkCmd* stuff all return void
01:13 airlied[d]: it gets returned at end command buffer
01:15 airlied[d]: vk_command_buffer_set_error
01:21 dwlsalmeida[d]: gfxstrand[d]: Sorry, I screwed up slightly here and in similar places:
01:21 dwlsalmeida[d]: #[derive(Copy, Clone, Debug, PartialEq)]
01:21 dwlsalmeida[d]: pub struct SetControlParams {
01:21 dwlsalmeida[d]: pub codec_type: SetControlParamsCodecType,
01:21 dwlsalmeida[d]: pub gptimer_on: u32,
01:21 dwlsalmeida[d]: pub ret_error: u32,
01:21 dwlsalmeida[d]: pub err_conceal_on: u32,
01:21 dwlsalmeida[d]: pub error_frm_idx: u32,
01:21 dwlsalmeida[d]: pub mbtimer_on: u32,
01:21 dwlsalmeida[d]: pub ec_intra_frame_using_pslc: u32,
01:21 dwlsalmeida[d]: pub all_intra_frame: u32,
01:21 dwlsalmeida[d]: pub reserved: u32,
01:21 dwlsalmeida[d]: }
01:21 dwlsalmeida[d]: We should optimize 1bit values to bool. Would you be willing to take a patch to change that once you return?
01:22 dwlsalmeida[d]: dwlsalmeida[d]: Also, we should probably derive #Default everywhere we can
01:24 dwlsalmeida[d]: I don't think this will be a problem, I think there's only one file using `nv_push_rs` for now
01:29 gfxstrand[d]: Yeah, TRUE/FALSE values being bool is a did idea. IDK that we want to do it for every 1-bit value, though. For Default, I'm less sure. IDK that 0 is always a good default, or even allowed.
01:29 dwlsalmeida[d]: TRUE/FALSE are already bool,
01:30 airlied[d]: why do we derive Copy? I find not having copy derives finds place where we should pass things by reference a lot quicker
01:31 dwlsalmeida[d]: dwlsalmeida[d]: if the Python script reads the keywords TRUE and FALSE, it already sets the type to bool
01:31 dwlsalmeida[d]: it's the 1 bit stuff that was missed
01:32 dwlsalmeida[d]: airlied[d]: I think...it was just thrown in for good measure
01:42 dwlsalmeida[d]: fn class_to_subc(class: u16) -> u8 {
01:42 dwlsalmeida[d]: match class & 0xff {
01:42 dwlsalmeida[d]: 0x97 => 0,
01:42 dwlsalmeida[d]: 0xc0 => 1,
01:42 dwlsalmeida[d]: 0x39 => 2,
01:42 dwlsalmeida[d]: 0x2d => 3,
01:42 dwlsalmeida[d]: 0xb5 => 4,
01:42 dwlsalmeida[d]: _ => panic!("Invalid class: {class}"),
01:42 dwlsalmeida[d]: }
01:42 dwlsalmeida[d]: }
01:42 dwlsalmeida[d]: This is hardcoded? Shouldn't this be set according to the recent `SET_OBJECT` calls?
01:47 airlied[d]: I think it should be fine at least from kepler forward
02:05 skeggsb9778[d]: It's hardcoded in HW from kepler onwards
02:05 skeggsb9778[d]: You'll get a CLASS_SUBC_MISMATCH if you try to bind the wrong class to a subchannel
02:12 skeggsb9778[d]: The 0x39 in that list also refers to NV_MEMORY_TO_MEMORY_FORMAT, which doesn't exist after fermi
02:12 skeggsb9778[d]: You probably want 0x40 (for KEPLER_INLINE_TO_MEMORY)
02:17 dwlsalmeida[d]: as below?
02:17 dwlsalmeida[d]: drivers/gpu/drm/nouveau/include/nvif/class.h
02:17 dwlsalmeida[d]: 58:#define KEPLER_INLINE_TO_MEMORY_A 0x0000a040
02:17 dwlsalmeida[d]: 59:#define KEPLER_INLINE_TO_MEMORY_B 0x0000a140
02:18 skeggsb9778[d]: That'd be the one
02:19 dwlsalmeida[d]: __push_mthd(p, SUBC_NVC5B0, NV906F_SET_OBJECT);
02:19 dwlsalmeida[d]: P_NV906F_SET_OBJECT(p, { .nvclass = 0xC5B0,
02:19 dwlsalmeida[d]: .engine = 0 });
02:19 dwlsalmeida[d]: I read on envytools that `engine==0` is a special `software` engine
02:19 dwlsalmeida[d]: I wonder if that is still applicable..
02:19 skeggsb9778[d]: yeah, no
02:21 skeggsb9778[d]: https://github.com/NVIDIA/open-gpu-doc/blob/master/manuals/turing/tu104/dev_pbdma.ref.txt#L1170
02:22 skeggsb9778[d]: (subchannels 5-7 on current HW are treated as software engines)
02:22 skeggsb9778[d]: you achieve that in various different ways on earlier HW
02:23 skeggsb9778[d]: (passing ENGINE=0x1f in SET_OBJECT on ~fermi, or ENGINE=<sw engine id> in an object's RAMHT entry on earlier HW)
02:30 airlied[d]: skeggsb9778[d]: the kernel vmm code mentions dual page tables, is that still relevant on turing+?
02:30 skeggsb9778[d]: sure is
02:31 skeggsb9778[d]: PD0 (the level before the page tables) had pointers to both small and big page tables
02:32 skeggsb9778[d]: it's so you can mix small/big pages within the same virtual address region
02:32 skeggsb9778[d]: prior to that (fermi), if you mapped a small page at, say, +0x1000, the whole area covered by that same PDE would have to be small pages too
02:37 airlied[d]: what's the difference between pd0 and pd1?
02:38 skeggsb9778[d]: different entry format
02:38 skeggsb9778[d]: pd1-3 all have the same format, pd0 has the dual page table pointers
02:41 airlied[d]: okay I think I get it now :-), just trying to recrate enough vmm code to setup bars and map something into bar1
02:46 dwlsalmeida[d]: I guess one last thing for today (as it's 11:45pm so almost tomorrow 😂 ), do you have any tips to debug this:
02:46 dwlsalmeida[d]: > gsp: mmu fault queued
02:46 dwlsalmeida[d]: > gsp: rc engn:00000001 chid:24 type:31 scope:1 part:233
02:46 dwlsalmeida[d]: > fifo:000000:0003:0018:[ffmpeg[15027]] errored - disabling channel
02:47 airlied[d]: that with video?
02:48 dwlsalmeida[d]: yeah, I can inspect the data I am submitting more thoroughly, but I wonder if there's some trick I don't know about
02:48 skeggsb9778[d]: you can decode the "type" by grepping "ROBUST.*31" in open-gpu-kernel-modules, but that's a mmu fault
02:49 skeggsb9778[d]: for the moment, if you want that decoded, you'll have to apply timur's gsp log patches, and send the log to me
02:59 airlied[d]: it's usually some undersized allocation, which is quite likely since I think I don't calculate all those things very well
03:04 dwlsalmeida[d]: airlied[d]: where this comes from?
03:04 dwlsalmeida[d]: VKAPI_ATTR VkResult VKAPI_CALL
03:04 dwlsalmeida[d]: nvk_GetVideoSessionMemoryRequirementsKHR(VkDevice _device, VkVideoSessionKHR videoSession,
03:04 dwlsalmeida[d]: uint32_t *pMemoryRequirementsCount,
03:04 dwlsalmeida[d]: VkVideoSessionMemoryRequirementsKHR *pMemoryRequirements)
03:04 dwlsalmeida[d]: {
03:04 dwlsalmeida[d]: // VK_FROM_HANDLE(nvk_device, dev, _device);
03:04 dwlsalmeida[d]: // VK_FROM_HANDLE(nvk_video_session, vid, videoSession);
03:04 dwlsalmeida[d]: uint32_t memory_type_bits = (1u << 2) - 1;
03:04 dwlsalmeida[d]: VK_OUTARRAY_MAKE_TYPED(VkVideoSessionMemoryRequirementsKHR, out, pMemoryRequirements, pMemoryRequirementsCount);
03:04 dwlsalmeida[d]: vk_outarray_append_typed(VkVideoSessionMemoryRequirementsKHR, &out, m) {
03:04 dwlsalmeida[d]: m->memoryBindIndex = 0;
03:04 dwlsalmeida[d]: m->memoryRequirements.size = 16384;
03:04 dwlsalmeida[d]: m->memoryRequirements.alignment = 4096;
03:04 dwlsalmeida[d]: m->memoryRequirements.memoryTypeBits = memory_type_bits;
03:04 dwlsalmeida[d]: }
03:04 dwlsalmeida[d]: vk_outarray_append_typed(VkVideoSessionMemoryRequirementsKHR, &out, m) {
03:04 dwlsalmeida[d]: m->memoryBindIndex = 1;
03:04 dwlsalmeida[d]: m->memoryRequirements.size = 122880;
03:04 dwlsalmeida[d]: m->memoryRequirements.alignment = 4096;
03:04 dwlsalmeida[d]: m->memoryRequirements.memoryTypeBits = memory_type_bits;
03:04 dwlsalmeida[d]: }
03:04 dwlsalmeida[d]: vk_outarray_append_typed(VkVideoSessionMemoryRequirementsKHR, &out, m) {
03:04 dwlsalmeida[d]: m->memoryBindIndex = 2;
03:04 dwlsalmeida[d]: m->memoryRequirements.size = 12288;
03:04 dwlsalmeida[d]: m->memoryRequirements.alignment = 4096;
03:04 dwlsalmeida[d]: m->memoryRequirements.memoryTypeBits = memory_type_bits;
03:04 dwlsalmeida[d]: }
03:04 dwlsalmeida[d]: return vk_outarray_status(&out);
03:04 dwlsalmeida[d]: }
03:05 dwlsalmeida[d]: also the memory for the video session will only be used to house the out of band stuff right like SPS PPS and etc, right
03:06 airlied[d]: I think I just took those values from the binary driver playing one video
03:07 airlied[d]: the video session memory is used by the GPU internally
03:07 airlied[d]: it's meant to be opaque to the user, I don't think it stores SPS or PPS in it
03:11 airlied[d]: looks like coloc, mbhist and history buffers used by the hw to store state
03:12 airlied: but I've no idea how to size those
03:17 airlied[d]: my rough plan was to throw a bunch of different sized videos at nvidia driver and see what they produced
03:20 dwlsalmeida[d]: I will try that, for this issue in particular, I am sure I introduced it when porting stuff to Rust, since your code actually somewhat works for a single I frame
03:20 dwlsalmeida[d]: so I will be able to figure this out eventually
03:21 dwlsalmeida[d]: I was asking more generally because sometimes people have hidden tricks or debug tools etc that can be useful
03:21 airlied[d]: check the tiling bits
03:25 airlied[d]: but yeah if mine does one frame, I'd dump the cmd streams and check
03:54 dwlsalmeida[d]: Yeah the pushbuf is never submitted, I forgot you have to manually copy it to the right place, at least for the Rust version
03:55 dwlsalmeida[d]: We should probably assert that size != 0 when allocating BOs, I don’t see how that can be ever right anyways
03:56 dwlsalmeida[d]: Faith, we ended up not merging the Rust code to debug the rust push, but I wonder why we can’t pipe that through vk_push_print anyways ?
03:56 dwlsalmeida[d]: And basically reuse the C stuff already there
10:13 ahuillet[d]: mohamexiety[d]: what are you trying to achieve? anything specific you need help with?
10:16 mohamexiety[d]: ahuillet[d]: the main/original goal is enabling DCC compression but first it needs some kernel changes (DCC needs large pages and currently nouveau uses small pages for everything). I am just not as familiar with that part of the stack so it’s a lot of poking around in the dark first
10:17 ahuillet[d]: (DCC isn't a thing in our terminology, FWIW)
10:17 ahuillet[d]: (color compression/depth compression, just "compression" basically)
10:19 ahuillet[d]: mohamexiety[d]: right, what Ben says then indeed.
10:20 karolherbst[d]: do large pages need proper physical alignment?
10:21 ahuillet[d]: I don't know but would be surprised if they didn't, that's what pages are for after all
10:21 karolherbst[d]: yeah...
10:22 mohamexiety[d]: ahuillet[d]: Yeah I noticed. This was confusing at first because I thought it would be closer to the marketing terminology and that general compression is something else :KEKW:
10:22 mohamexiety[d]: (e.g. texture compression or such)
10:23 karolherbst[d]: I _think_ our UAPI has at least an alignment field, but I suspect userspace has to provide a sufficient alignment on allocation
10:23 karolherbst[d]: otherwise it's gonna be an UAPI change for "this allocation might be used for compression"
10:24 karolherbst[d]: ehh mhhh
10:24 karolherbst[d]: mhhh
10:24 karolherbst[d]: there isn't
10:24 karolherbst[d]: pain
10:24 mohamexiety[d]: karolherbst[d]: Yeah and from what Ben said earlier, it will be user space that would guarantee all the conditions
10:24 karolherbst[d]: it can't
10:26 karolherbst[d]: like if the memory needs physical alignment (which is a safe bet) then on allocation this information that this _might_ be used for depth buffers and hence needs to be suitable for large pages needs to be there
10:26 karolherbst[d]: so I doubt you would be able to achieve that without an UAPI change
10:27 karolherbst[d]: and the memory allocation behaving differently with VM_BIND enabled is also kinda weird
10:28 karolherbst[d]: so as of now I don't really see a way besides using `tile_mode` and `tile_flags` to transport the information of "this allocation needs to be proper for large pages"
10:38 airlied[d]: gem_new has an align field
10:40 airlied[d]: though maybe that's an out field
10:45 karolherbst[d]: ohhh mhhh
10:45 karolherbst[d]: it's 32 bit
10:46 karolherbst[d]: mhhh
10:46 karolherbst[d]: let's see..
10:47 karolherbst[d]: airlied[d]: oh yeah.. you are right, it's used and all that, I totally missed that one
10:48 karolherbst[d]: guess the uapi is suitable afterall then
10:48 karolherbst[d]: (just need to be aware of the 32 bit part)
13:43 dwlsalmeida[d]: is there any particular reason why this is not upstream?
13:43 dwlsalmeida[d]: https://gitlab.freedesktop.org/airlied/mesa/-/blob/nv-pushbuf-dump-tool-hacks-video/src/vulkan/nv-shader-dump-layer/dump_layer.c?ref_type=heads
13:43 dwlsalmeida[d]: i.e.: legal reasons or something
16:12 mhenning[d]: dwlsalmeida[d]: I think people use this instead https://gitlab.freedesktop.org/nouveau/nv-shader-tools/-/blob/main/src/bin/nvdump.rs
16:14 dwlsalmeida[d]: Ah, Mary's tool
16:14 dwlsalmeida[d]: but..what I actually want is to dump command buffers through `vk_push_print`, I don't care about shaders really
16:15 mhenning[d]: For that I think people use envyhooks
17:55 airlied[d]: Not upstream because it's a bit hacky
20:02 mhenning[d]: karolherbst[d]: Do your docs talk about what ERRBAR is used for?
20:02 mhenning[d]: I've seen the cuda compiler pair MEMBARs with an ERRBAR on ampere. nak doesn't do this, and I'm wondering if it might cause problems
20:03 karolherbst[d]: it doesn't cause any problems
20:04 karolherbst[d]: it only really matters if there are e.g. warp errors happening, but nothing in normal operation
20:58 dwlsalmeida[d]: marysaka[d]: marysaka[d] is there a particular ordering to the dumps generated by envyhooks?
21:16 marysaka[d]: dwlsalmeida[d]: It dumps all command buffer when the submission is kicked off
21:16 dwlsalmeida[d]: so I implemented the dynamic class tracking via SET_OBJECT in nv_push_dump, and I've concatenated all the pushbuf dumps into one file, so that we keep track of all calls to SET_OBJECT
21:16 dwlsalmeida[d]: This doesn't quite work, because the data that I want seems to be there, but there's a SET_OBJECT just before that for the same subchannel, so everything gets interpreted as copy commands instead of video
21:17 marysaka[d]: marysaka[d]: well more precisely, when the GP_PUT is set right before the doorbell is ring
21:18 marysaka[d]: dwlsalmeida[d]: The command buffer are probably not for the same context so that may be why
21:19 marysaka[d]: If it's not that, then it may be my deduplication code acting up maybe?
21:19 marysaka[d]: you can disable that behaviour by setting `EHKS_DISABLE_PUSHBUF_CACHE=1`
21:19 marysaka[d]: it will get quite verbose tho
21:20 dwlsalmeida[d]: If instead of keeping track of SET_OBJECT, we force everything on subchannel 4 to be parsed as video, then we get the video stuff, but copy is broken
21:20 dwlsalmeida[d]: let me try this disable cache thing
21:20 marysaka[d]: Does COPY and VIDEO overlap?
21:20 marysaka[d]: (in term of methods)
21:21 dwlsalmeida[d]: they both share subchannel 4, from what I understand, other than that not sure
21:22 marysaka[d]: in any cases, maybe you want to tryu to add some identifier to indicate what pushbuf belong to what context when writing to file
21:23 marysaka[d]: you can likely pass the GpFifoChannelInfo::handle to dump_pushbuf here and change the filepath format https://gitlab.freedesktop.org/nouveau/envyhooks/-/blob/main/src/nvrm/tracker/pushbuf.rs?ref_type=heads#L575
21:24 dwlsalmeida[d]: the disable cache thing did not work, but what is this?
21:24 dwlsalmeida[d]: 4 [0x00006766] HDR 1fff0 subch N/A SUB_DEVICE_OP
21:24 dwlsalmeida[d]: 5 mthd 0001 SET_SUBDEVICE_MASK
21:24 dwlsalmeida[d]: 6 .VALUE = 0xfff
21:24 dwlsalmeida[d]: says `mthd 0001`, isn't this mthd 0000 being misparsed?
21:24 marysaka[d]: it's some specific GPFIFO opcode
21:25 dwlsalmeida[d]: this comes right before the first video command, so if this was SET_OBJECT instead, it would make sense really I think
21:25 marysaka[d]: we fixed that with karolherbst[d] some months ago but I think think you need to think too much about those
21:25 karolherbst[d]: yeah... I think it's something you can ignore for now
21:25 marysaka[d]: dwlsalmeida[d]: maybe it's something different here
21:26 marysaka[d]: maybe there is some implicit SET_OBJECT happening depending on the channel being selected when creating the context (that's only speculation)
21:26 karolherbst[d]: newer hardware kinda works like that
21:27 marysaka[d]: I remember that the golden context stuffs had that at least for 2D/3D/... on GM20B
21:27 karolherbst[d]: older hardware was way more dynamic, it might be they got rid of it on very new gens once they moved to nvdec/nvenc entirely
21:27 karolherbst[d]: and yeah.. the kernel could also set up things in a specific way
21:32 dwlsalmeida[d]: marysaka[d]: yeah, printing the context in the file name quickly shows that this is the case
21:32 dwlsalmeida[d]: so maybe concatenating everything together is not the answer?
21:33 dwlsalmeida[d]: but rather concatenating the ones with the same context?
21:33 marysaka[d]: you probably want to separate by context
21:33 marysaka[d]: and concatenate in order of submission
21:33 dwlsalmeida[d]: ok this is extra extra helpful
21:33 dwlsalmeida[d]: thanks
21:35 karolherbst[d]: ohh right... the blob does that
21:37 dwlsalmeida[d]: what is `entry_index`?
21:37 dwlsalmeida[d]: (should I care?)
21:37 dwlsalmeida[d]: I guess entry index is just...the index of the command buffer when multiple are submitted in a batch?
21:38 marysaka[d]: dwlsalmeida[d]: so it's the index in the current submission
21:39 marysaka[d]: so basically I dump when GP_PUT is being written and that gave me the range of pushbufs that are part of the submissions
21:39 marysaka[d]: so there is likely always more than 1 pushbuf being submitted in general
21:57 dwlsalmeida[d]: ok, this finally works!!! 🎉 :triangle_nvk:
21:57 dwlsalmeida[d]: concatenating as you said did the trick
22:06 dwlsalmeida[d]: would you take a patch to make it permanent? the naming of the file stuff
22:07 marysaka[d]: sure!
23:56 dwlsalmeida[d]: What's the inverse of this?
23:56 dwlsalmeida[d]: VkResult
23:56 dwlsalmeida[d]: nvk_cmd_pool_alloc_mem(struct nvk_cmd_pool *pool, bool force_gart,
23:56 dwlsalmeida[d]: struct nvk_cmd_mem **mem_out)
23:57 dwlsalmeida[d]: I need memory to copy the data from the Rust push into, but I see no way to give it back to the pool once the command buffer is destroyed