06:50 buduar: ECE4760, as I explained to gfxstrand, uses 3 of the 13 DMA channels on the Raspberry Pi RP2040. That can be generalized to a Linux API: two channels are muxed together in the pure-adder case, the other ops derive from that, and the third channel is the one you use to toggle the feedback. Modern hw has those DMA engines everywhere: on GPUs, on network cards, on the CPU interconnect, etc. "Toggle the feedback" means that channel is pointed at the DMA registers themselves,
06:50 buduar: first to split the channels and then to merge them back on demand. It was expected, but the Cornell people did nice work exposing it on a very nice page; I knew that after a long period of theoretical work, indeed.
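A minimal sketch of the split/merge/feedback mechanics described above, assuming the RP2040 pico-sdk; the buffers, sizes, and channel roles are illustrative assumptions, not taken from the ECE4760 material, and the adder itself is not shown:

    // Hedged sketch, assuming the RP2040 pico-sdk (hardware_dma):
    // two DMA channels chained so one triggers the other, plus a third
    // channel that re-arms the first by writing its registers -- the
    // "toggle the feedback" idea. Names and sizes are made up.
    #include "pico/stdlib.h"
    #include "hardware/dma.h"

    static uint32_t src[16], mid[16], dst[16];

    int main(void) {
        int ch_a  = dma_claim_unused_channel(true);
        int ch_b  = dma_claim_unused_channel(true);
        int ch_fb = dma_claim_unused_channel(true);

        // Channel A: src -> mid, chained ("muxed") to channel B.
        dma_channel_config ca = dma_channel_get_default_config(ch_a);
        channel_config_set_transfer_data_size(&ca, DMA_SIZE_32);
        channel_config_set_read_increment(&ca, true);
        channel_config_set_write_increment(&ca, true);
        channel_config_set_chain_to(&ca, ch_b);
        dma_channel_configure(ch_a, &ca, mid, src, 16, false);

        // Channel B: mid -> dst, runs as soon as A completes.
        dma_channel_config cb = dma_channel_get_default_config(ch_b);
        channel_config_set_transfer_data_size(&cb, DMA_SIZE_32);
        channel_config_set_read_increment(&cb, true);
        channel_config_set_write_increment(&cb, true);
        dma_channel_configure(ch_b, &cb, dst, mid, 16, false);

        // Feedback channel: a control block copied into channel A's
        // register aliases (write addr, count, read addr; the last write
        // triggers A), so A is re-armed with no CPU involvement.
        static uint32_t ctrl_block[3];
        ctrl_block[0] = (uint32_t) mid;  // -> al3_write_addr
        ctrl_block[1] = 16;              // -> al3_transfer_count
        ctrl_block[2] = (uint32_t) src;  // -> al3_read_addr_trig (go)
        dma_channel_config cf = dma_channel_get_default_config(ch_fb);
        channel_config_set_transfer_data_size(&cf, DMA_SIZE_32);
        channel_config_set_read_increment(&cf, true);
        channel_config_set_write_increment(&cf, true);
        dma_channel_configure(ch_fb, &cf,
                              &dma_hw->ch[ch_a].al3_write_addr,
                              ctrl_block, 3, false);

        dma_channel_start(ch_a);                     // first pass: A then B
        dma_channel_wait_for_finish_blocking(ch_b);
        dma_channel_start(ch_fb);                    // feedback re-arms A
        // (a full loop would re-arm channel B the same way; omitted)
        dma_channel_wait_for_finish_blocking(ch_b);
        return 0;
    }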
06:55 buduar: since on both of the chipsets talked about on #intel-gfx the latency of every instruction in the composed stream is constant, on both the DMA and the CPU stream (what I defined as the rule-based approach), you do not need any locks, semaphores or anything alike. It also serves as an anti-Vulkan argument: multiple contexts are handled by DMA channels, and that works naturally, since everything on a GPU is handled by DMA anyway, except the shader engine, where the shading processors are.
06:56 buduar: so the shading language is compiled in the more modern ways that I tried to explain
06:57 buduar: so the CPU or GPU DMA can be used to accelerate the ALUs in the compilation phase inside the driver, but the shader processor is more aggressive on runtime functionality
06:59 buduar: their communication can be trimmed to exactly one-to-one latency; even there, if you really want, locks do not have to be used
06:59 buduar: this model would be a lot simpler, because doing that avoids a big chunk of deadlock bugs
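The "no locks needed" claim is easiest to picture with the standard lock-free pattern it echoes; the following is a generic single-producer/single-consumer ring in C11, not code from the discussion, and the atomics are kept for correctness on real, non-constant-latency hardware:

    // Generic lock-free SPSC ring buffer: no locks or semaphores, each
    // side only ever writes its own index. Illustrative sketch only.
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define RING_SIZE 256u  // power of two, arbitrary

    struct spsc_ring {
        uint32_t buf[RING_SIZE];
        _Atomic uint32_t head;  // written only by the producer
        _Atomic uint32_t tail;  // written only by the consumer
    };

    static bool ring_push(struct spsc_ring *r, uint32_t v) {
        uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
        uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
        if (head - tail == RING_SIZE)
            return false;  // full
        r->buf[head % RING_SIZE] = v;
        atomic_store_explicit(&r->head, head + 1, memory_order_release);
        return true;
    }

    static bool ring_pop(struct spsc_ring *r, uint32_t *v) {
        uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
        uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
        if (head == tail)
            return false;  // empty
        *v = r->buf[tail % RING_SIZE];
        atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
        return true;
    }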
07:03 buduar: now, video codecs are today handled by AI kits; I am not sure how they all work in terms of memory usage and efficiency, but the DLVC concept, or that type of thing, is that it scrapes the screen for one picture, compares it against the scanout buffer, and builds a codec on its own
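Stripped of the AI framing, "compares it against the scanout buffer" boils down to per-block frame differencing; a toy version, with made-up block size and buffer layout:

    // Toy sketch of the scrape-and-compare idea: diff the current frame
    // against the previous scanout contents block by block; a real codec
    // would encode only the dirty blocks. Sizes/layout are assumptions.
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define BLOCK 16  // 16x16-pixel blocks, arbitrary

    static size_t count_dirty_blocks(const uint32_t *prev, const uint32_t *cur,
                                     int width, int height) {
        size_t dirty_blocks = 0;
        for (int by = 0; by < height; by += BLOCK) {
            for (int bx = 0; bx < width; bx += BLOCK) {
                int w = (bx + BLOCK <= width) ? BLOCK : width - bx;
                int dirty = 0;
                for (int y = by; y < by + BLOCK && y < height && !dirty; y++) {
                    const uint32_t *p = prev + (size_t)y * width + bx;
                    const uint32_t *c = cur  + (size_t)y * width + bx;
                    dirty = memcmp(p, c, (size_t)w * sizeof *p) != 0;
                }
                dirty_blocks += dirty;
            }
        }
        return dirty_blocks;
    }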
07:04 buduar: I have so much experience that I can offer all those goodies pretty quickly as a paid job
07:06 buduar: I was very much in a "being attacked" position, but I even worked upfront on the FPGA research, which performs best with that compression on LUTs; I did most of the needed theoretical work upfront too.
07:08 buduar: unlike nuclear chips, those are something an average man with some resources (in contrast to me currently) can get the best performance out of; that would be an FPGA on a modded synth, and Tetramem seems very similar, but as a hardware-based ReRAM hack.
11:25 haagch: is undervolting a 6900xt supposed to work? even with ludicrously large numbers like echo "vo -500" > /sys/class/drm/card1/device/pp_od_clk_voltage, /sys/class/drm/card1/device/pp_od_clk_voltage shows OD_VDDGFX_OFFSET -500mV but it doesn't seem to have an actual effect (I did try small numbers first)
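For reference, the kernel's pp_od_clk_voltage interface stages OD writes until a separate "c" (commit) write, and overdrive has to be enabled via the amdgpu.ppfeaturemask module parameter; whether a missing commit explains the behaviour above is not established. A sketch of the documented sequence (equivalent to two echo commands; card index and offset are illustrative):

    /* Hedged sketch of the documented amdgpu overdrive sequence:
     * stage a GFX voltage offset, then commit it with "c". Assumes
     * overdrive is enabled (amdgpu.ppfeaturemask OD bit set) and that
     * card1 is the right device -- both illustrative assumptions. */
    #include <stdio.h>

    int main(void) {
        const char *path = "/sys/class/drm/card1/device/pp_od_clk_voltage";
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); return 1; }
        fprintf(f, "vo -50\n");  /* stage a -50 mV offset */
        fflush(f);               /* one write() per flush */
        fprintf(f, "c\n");       /* commit: staged values apply here */
        fclose(f);
        return 0;
    }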
11:30 haagch: 6.4.10-zen2. other settings seem to work, so for now I set max core clock to 2500 to keep temperatures in check
14:17 MrCooper: mareko: cd7e20f51388 ("radeonsi: specialize si_draw_rectangle using a C++ template") caused a regression here on Navi 14 with alacritty: ../src/gallium/drivers/radeonsi/si_state_draw.cpp:2379: void si_draw_rectangle(blitter_context*, void*, blitter_get_vs_func, int, int, int, int, float, unsigned int, blitter_attrib_type, const blitter_attrib*) [with amd_gfx_level GFX_VERSION = GFX10; si_has_ngg NGG = NGG_ON; si_has_pairs HAS_PAIRS = HAS_PAIRS_OFF; blitter_get_vs_func = void* (*)(blitter_context*)]: Assertion `sctx->ngg == NGG' failed.
14:18 MrCooper: call path: u_transfer_helper_transfer_unmap → si_flush_resource → si_blit_decompress_color → util_blitter_restore_vertex_states → si_draw_vbo<(amd_gfx_level)11, (si_has_tess)0, (si_has_gs)1, (si_has_ngg)0, (si_has_pairs)0> → si_draw<(amd_gfx_level)11, (si_has_tess)0, (si_has_gs)1, (si_has_ngg)0, (si_is_draw_vertex_state)0, (si_has_pairs)0, (util_popcnt)0> → si_emit_draw_packets<(amd_gfx_level)11, (si_has_tess)0, (si_has_gs)1, (si_has_ngg)0, (si_is_draw_vertex_state)0, (si_has_pairs)0>
14:46 MrCooper: si_init_draw_vbo_all_pipeline_options needs to use sctx->screen->use_ngg instead of checking for NO_NGG
16:04 mareko: indeed
16:04 mareko: thanks for reporting
16:10 mareko: fixed in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24759
16:51 MrCooper: thanks
17:46 johnny0: haagch: Core voltage offset worked for me a while back with a 6600XT but there was definitely a voltage floor in place that's higher than in windows. The card had to be under load and at a high clock speed for the offset to actually have an effect. I wasn't able to get below this floor via PPT either.
17:48 haagch: i was running gravitymark and looked at the temperature + voltage in the lact gui and saw no change when setting the voltage offset both higher and lower
18:22 Armada: haagch: I think this is a nice little project: https://github.com/sibradzic/amdgpu-clocks - simple config, can run as a systemd service. I also tried to undervolt directly through /sys/class/drm, and none of the guides I used worked for me. Consider giving it a shot.