01:52sanityinsert: so the execution can, but does not have to, depend on data access methods. I looked at the arithmetic ieee1164 core a pretty long time ago and haven't inspected it recently, but it is very likely that such a pattern of execution through subtract elimination works because it's essentially a hardware modulus rather than a logarithm
01:52sanityinsert: https://www.csus.edu/indiv/p/pangj/166/f/d8/5_Modulo%202%20Arithmetic.pdf .
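The linked notes cover modulo-2 arithmetic, where addition and subtraction are the same XOR with no carries or borrows, so long division turns into repeated shift-and-XOR and what is left over is a remainder. A minimal Python sketch, treating the bits of an int as polynomial coefficients over GF(2); `mod2_remainder` is a hypothetical helper, not something taken from the PDF.

```python
def mod2_remainder(dividend: int, divisor: int) -> int:
    """Remainder of dividend / divisor in modulo-2 (carry-less) arithmetic.

    Each int's bits are polynomial coefficients over GF(2).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    dlen = divisor.bit_length()
    # While the dividend is at least as long as the divisor, "subtract"
    # (XOR) the divisor aligned under the dividend's leading 1 bit.
    while dividend.bit_length() >= dlen:
        shift = dividend.bit_length() - dlen
        dividend ^= divisor << shift
    return dividend
```

For example, 0b11010110110000 (a 10-bit message with four zero bits appended) divided by 0b10011 leaves 0b1110; XORing that remainder back in gives a value the divisor divides exactly, which is how CRCs use this.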
03:32DemiMarie: How good is rusticl compared to clvk?
06:15WhyNotHugo: o/
08:46colinmarc: nowrep: I finally got hierarchical coding working on my 6650 (navi 23 I think) - thanks for all the help over multiple MRs. Unfortunately the bitrate... really really sucks. It's like 10x worse than IPPP, even if I up the QP significantly on the upper layers. Do you know why that might be or is it probably just not optimized for that?
08:46colinmarc: This is hevc btw. I haven't managed to get h264 working yet
09:19emilyhunter: https://en.wikipedia.org/wiki/Kogge%E2%80%93Stone_adder it's one of the popular adders; the whole computation before the final XOR is shown there.
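The Kogge-Stone adder computes every carry with a parallel prefix network over (generate, propagate) pairs in log2(width) stages, and the sum is then propagate XOR carry. A minimal bit-level Python sketch of that structure; `kogge_stone_add` is a hypothetical name and the 8-bit default width is arbitrary.

```python
def kogge_stone_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned width-bit ints with a Kogge-Stone prefix network.

    Returns (a + b) mod 2**width; the final carry-out is dropped.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b  # generate: bit i produces a carry on its own
    p = a ^ b  # propagate: bit i passes an incoming carry along
    # Prefix stages with distances 1, 2, 4, ...: after the stage with
    # distance d, bit i's (G, P) covers the contiguous span [i-2d+1, i],
    # clipped at bit 0.
    G, P = g, p
    d = 1
    while d < width:
        G = G | (P & (G << d))
        P = P & (P << d)
        d <<= 1
    # The carry into bit i is the group generate of bits [0, i-1].
    carries = (G << 1) & mask
    # Final step: sum bit = propagate XOR carry-in.
    return (p ^ carries) & mask
```

Every stage combines each bit with the one d positions below it in parallel, which is why the depth is logarithmic instead of the linear chain of a ripple-carry adder.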
09:35emilyhunter: so the risk of getting bitten is almost zero there: the remainder cannot go wrong in a way that would confuse the banks or bases of operand sums. It could only happen if you had duplicates in the results as negative values, in which case the final result will still be correct. The electronic circuit corrects such errors in the adder hardware, where the result can even go above or below
09:35emilyhunter: the expected range.
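One reading of the point about results going above or below the expected range: an n-bit adder computes modulo 2^n, so overflow in the intermediate steps of a chain of two's-complement additions is harmless as long as the final result fits. A minimal Python sketch under that interpretation; `wrap_add`, `to_signed` and `wrapped_sum` are hypothetical helpers and 8 bits is an arbitrary width.

```python
def wrap_add(a: int, b: int, width: int = 8) -> int:
    """Add like a width-bit adder: keep only the low bits (mod 2**width)."""
    return (a + b) & ((1 << width) - 1)


def to_signed(x: int, width: int = 8) -> int:
    """Interpret a width-bit pattern as a two's-complement signed value."""
    return x - (1 << width) if x & (1 << (width - 1)) else x


def wrapped_sum(values, width: int = 8) -> int:
    """Sum signed values through a width-bit adder, then read as signed."""
    acc = 0
    for v in values:
        # Negative inputs become their two's-complement bit patterns.
        acc = wrap_add(acc, v & ((1 << width) - 1), width)
    return to_signed(acc, width)
```

With 8 bits, 100 + 100 overflows the signed range and reads as -56, yet `wrapped_sum([100, 100, -90])` still returns the correct 110.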
09:45emilyhunter: And there is nothing that involves logarithms, as in photonic circuits, or FFTs, as in sampling of the domains. It's the adders' correction circuitry in the electronic core.
09:45emilyhunter: and this is more likely a modulus kind of operation when the terms get merged.
10:23emilyhunter: the science of adders is pretty complex, and I do not claim to understand it fully, but the rule of thumb is contiguous indexes; that should be the foundation for the order manipulation.
11:38nowrep: colinmarc: if you set the bitrate and fps for each layer correctly, it shouldn't be a huge difference. it also makes sense to use a higher bitrate for the base layer because the references are far from each other there. changing qp will have no effect if you enabled rate control
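For the per-layer fps, a dyadic hierarchy splits the frame rate predictably: the base layer gets one frame per mini-GOP and each layer above it carries twice as many frames as the one below (layers 0 and 1 both get one). A sketch, assuming one frame per time slot and a mini-GOP of 2^(L-1) frames for L layers; `dyadic_layer_rates` is hypothetical, and whether a given rate-control API wants the per-layer rate or the cumulative rate of layers 0..k is an assumption to check against its spec.

```python
from fractions import Fraction
from itertools import accumulate


def dyadic_layer_rates(total_fps, num_layers):
    """Per-layer and cumulative frame rates of a dyadic temporal hierarchy.

    Assumes a mini-GOP of 2**(num_layers - 1) frames, one frame per slot.
    """
    gop = 2 ** (num_layers - 1)
    fps = Fraction(total_fps)
    per_layer = []
    for k in range(num_layers):
        # Frames of layer k in one mini-GOP: 1 for k == 0, else 2**(k-1).
        frames = 1 if k == 0 else 2 ** (k - 1)
        per_layer.append(fps * frames / gop)
    # Cumulative rate: what a decoder sees when it keeps layers 0..k.
    cumulative = list(accumulate(per_layer))
    return per_layer, cumulative
```

At 60 fps with 4 layers (an 8-frame mini-GOP) the layers carry 7.5, 7.5, 15 and 30 fps, or 7.5, 15, 30 and 60 fps cumulatively.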
11:39colinmarc: @nowrep I was trying fixed QP, with low QP for the base level and higher for each layer above. That's what the papers I found mentioned testing. Is that not the right strategy?
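The fixed-QP cascade described above (low QP on the base layer, higher on each layer above it) fits in one function. A minimal sketch; `cascade_qp`, the base QP of 22 and the step of 2 are hypothetical values, and 0..51 is the 8-bit H.264/HEVC QP range.

```python
def cascade_qp(temporal_id: int, base_qp: int = 22, step: int = 2,
               qp_min: int = 0, qp_max: int = 51) -> int:
    """Fixed QP for a frame: base_qp on layer 0, +step per layer above."""
    return max(qp_min, min(qp_max, base_qp + step * temporal_id))
```

Fed the temporal IDs of an 8-frame dyadic mini-GOP, `[cascade_qp(t) for t in (0, 3, 2, 3, 1, 3, 2, 3)]` gives `[22, 28, 26, 28, 24, 28, 26, 28]`.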
11:40colinmarc: so rate control disabled basically
11:41colinmarc: I didn't change the fps per layer, but I wonder if that's making a difference. I'm not sure how to set the fps with vulkan video
11:42nowrep: if you don't need rate control then you can just set QP for each frame, and ignore the layers
11:43colinmarc: Yeah, the reference structure is layered but I'm not using layered rate control
11:43colinmarc: and the resulting stream size is 10x higher :(
11:44colinmarc: s/higher/bigger/
11:44colinmarc: intuitively I get that there are bigger differences on the bottom layer, but I would expect that to be offset by the higher QP on higher layers
11:45colinmarc: Is your suggestion to use layered VBR then?
11:46nowrep: yeah you can try with vbr
11:48colinmarc: does 10x sound realistic to you? or does it sound like I messed something up? :)
11:48nowrep: sounds wrong
11:49colinmarc: hrm, ok, thanks
11:53jkqxz: What does the prediction structure look like, and which frames are large?
11:54jkqxz: That device doesn't support B frames in HEVC, so it's inevitably going to be significantly larger, since it has to predict the same thing multiple times, but 10x is unreasonable.
11:56colinmarc: jkqxz: P-only, hierarchical, with 4/8/16 length mini-GOP. It's a good question which layer is contributing to the size. I should measure that.
11:57colinmarc: (this one: https://docs.vulkan.org/spec/latest/_images/h26x_layer_pattern_dyadic.svg)
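The dyadic hierarchical-P pattern can be generated from the frame index: frame i of the mini-GOP references frame i - lowbit(i), where lowbit(i) is the lowest set bit of i, and the smaller that distance, the higher its temporal layer. A sketch under that reading of the pattern; `dyadic_structure` is a hypothetical helper, and the base frame is assumed to reference the previous mini-GOP's base frame.

```python
def dyadic_structure(gop: int):
    """(temporal_id, reference_offset) per frame of a dyadic hierarchical-P mini-GOP."""
    if gop <= 0 or gop & (gop - 1):
        raise ValueError("gop must be a power of two")
    out = []
    for i in range(gop):
        if i == 0:
            # Base layer: references the previous mini-GOP's base frame.
            out.append((0, -gop))
            continue
        lowbit = i & -i  # lowest set bit of i
        # Reference distance lowbit -> layer log2(gop / lowbit).
        tid = (gop // lowbit).bit_length() - 1
        out.append((tid, -lowbit))
    return out
```

For an 8-frame mini-GOP this yields layers 0, 3, 2, 3, 1, 3, 2, 3, and every frame references a frame on a lower layer, so dropping the top layers leaves a decodable stream.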
11:57nowrep: i've tried different qp for each level with vaapi/radeonsi and the resulting file size was roughly the same
11:59colinmarc: nowrep: roughly the same vs single-level IPPP?
11:59nowrep: yes
11:59colinmarc: ok, that's good to know. I will try a few more things
12:00colinmarc: should I already file an issue or is this not really plausibly a bug in radv?
12:00jkqxz: That sort of structure could plausibly give you 2x rate at same quality if the video has lots of activity (because many changes will have to be coded multiple times), but 10x says that something is going very wrong.
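A crude way to see why the hierarchy costs extra: in IPPP every P frame is 1 frame from its reference, while in the dyadic structure the distances grow with the layer depth, so the same motion gets coded across several references. If residual cost simply scaled with reference distance (a rough assumption that ignores the higher QP on upper layers and motion compensation), the mean distance would be the overhead factor. A sketch; `mean_reference_distance` is hypothetical.

```python
def mean_reference_distance(gop: int) -> float:
    """Mean frame distance to the reference in a dyadic hierarchical-P mini-GOP.

    IPPP has a distance of 1 for every P frame, for comparison.
    """
    total = gop  # base frame references the previous mini-GOP's base frame
    for i in range(1, gop):
        total += i & -i  # frame i references frame i - lowbit(i)
    return total / gop
```

This works out to 1 + log2(gop)/2: 2.0, 2.5 and 3.0 for the 4/8/16 mini-GOPs, the same ballpark as the 2x estimate and far from 10x.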
12:02nowrep: first try whether changing qp per frame works at all; if it does, then there is no reason why it would break if you use a frame other than the last one as reference
12:03colinmarc: that's what I've already tried. but maybe I did something wrong
12:03colinmarc: oh, you mean with IPPP. ok, I can try that
16:27karolherbst: jenatali: you already responded with "looks reasonable to me" on this patch, but can I read it as an rb? I kinda want to land that MR before XDC: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30831?commit_id=addea00eb273c3a4fd87c17913a5cef087a03954
16:28jenatali: karolherbst: sure
16:28karolherbst: cool, thanks
18:36DavidHeidelberg: karolherbst: I'm at the airport, flying to Montreal, but I could give u a Tested-by after running it on freedreno CI