07:52 Ermine: do I need to be subscribed to dri-devel to be able to post there?
08:06 emersion: no
08:07 emersion: but you can be registered and have delivery off
08:09 MrCooper: and if you're not subscribed, your posts may get delayed due to going through the moderation queue
08:30 tagr: sima: thanks, will do that
08:31 Ermine: emersion: ok, thank you, will do that
08:59 tzimmermann: jfalempe, thanks for the reviews. if nothing else comes in for the client lib, i intend to merge it by the end of the week, which should unblock drm_log
09:01 jfalempe: tzimmermann: you're welcome, and thanks for the client lib work, that's really appreciated.
12:43 zmike: daniels / MrCooper: probably one of you should take a closer look at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31658
12:44 * MrCooper adds to list
16:00 zamundaaa[m]: When using writeback connectors, the WRITEBACK_PIXEL_FORMATS property only tells me what formats I can use, but not which modifiers... is it just assumed to always be linear?
16:33 emersion: zamundaaa[m]: good question
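For readers following along: the WRITEBACK_PIXEL_FORMATS property blob is a packed array of 32-bit little-endian DRM fourcc codes with no modifier information at all, which is what prompts the question. A minimal Python sketch of decoding such a blob (the sample fourcc values are illustrative):

```python
import struct

def decode_writeback_formats(blob: bytes) -> list[str]:
    """Decode a WRITEBACK_PIXEL_FORMATS property blob: a packed array of
    little-endian u32 DRM fourcc codes. Note there is no modifier field
    anywhere in the blob, hence the 'is linear just assumed?' question."""
    count = len(blob) // 4
    fourccs = struct.unpack(f"<{count}I", blob[: count * 4])
    # Each fourcc packs four ASCII characters, least significant byte first.
    return ["".join(chr((f >> (8 * i)) & 0xFF) for i in range(4)) for f in fourccs]

# Hypothetical blob advertising DRM_FORMAT_XRGB8888 ('XR24') and
# DRM_FORMAT_ARGB8888 ('AR24'):
blob = struct.pack("<2I", 0x34325258, 0x34325241)
print(decode_writeback_formats(blob))  # → ['XR24', 'AR24']
```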
16:52 linkmauve: K900, speaking of the NPU of that SoC, I’m really unhappy with the teflon API, which depends on the Python library tflite_runtime and doesn’t expose anything to make it usable from any other library.
16:53 K900: I honestly don't know much about Teflon other than that people are working on it
16:53 K900: I don't really have any use for the NPU myself
16:53 linkmauve: It is very different from OpenGL, Vulkan or any other API implemented by Mesa, which can be used by any program through a C ABI; here it exposes a single function to which you have to pass a tflite_runtime graph object, and which will modify that graph to run parts of it on the NPU.
16:54 linkmauve: I currently have two uses for it, automated translation and noise reduction, both of which are way too slow to do in real time on the Cortex-A76.
16:57 linkmauve: I use tract for most of my experiments in that area, but it would have to reimplement the exact memory layout of tflite_runtime to be able to make use of teflon, which is most likely not the way forward. But I don’t think I’d be able to design a cross-vendor, cross-platform, cross-everything API that would allow NPUs to be exposed by Mesa to any program through a proper C ABI.
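For context, Teflon plugs into tflite_runtime through TensorFlow Lite's external-delegate mechanism, which is the single entry point linkmauve is describing. A hedged sketch of what that usage looks like; the delegate filename `libteflon.so` and the install path are assumptions that depend on how Mesa was built and installed:

```python
def run_on_npu(model_path: str, input_data, delegate_path: str = "libteflon.so"):
    """Sketch: run a TFLite model with the Teflon external delegate.

    tflite_runtime and a built libteflon.so are assumed to be available;
    the delegate path in particular depends on where Mesa installed it.
    """
    # Imported lazily so the function can be defined (and read) on a
    # machine without tflite_runtime installed.
    from tflite_runtime.interpreter import Interpreter, load_delegate

    interpreter = Interpreter(
        model_path=model_path,
        # The delegate takes over the parts of the tflite graph it can
        # run on the NPU; everything else falls back to the CPU.
        experimental_delegates=[load_delegate(delegate_path)],
    )
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], input_data)
    interpreter.invoke()
    out = interpreter.get_output_details()[0]
    return interpreter.get_tensor(out["index"])
```

This also illustrates the complaint: everything goes through the tflite_runtime graph object, so a framework like tract has no C-ABI surface to target.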
17:17 K900: Yeah, I run my 3588s as server boxes
17:17 K900: So nothing like that
17:38 Ermine: speaking of 3588s - they're served by the panthor driver, right? But it doesn't expose DRIVER_MODESET or DRIVER_ATOMIC
17:39 daniels: Ermine: yes, it's panthor, which is only a GPU driver; the display driver is completely separate
17:39 daniels: on Rockchip SoCs it's rockchip-drm
17:39 daniels: (which does support atomic)
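The GPU/display split daniels describes is visible directly in sysfs: each DRM card node's parent device has a `driver` symlink naming the kernel driver bound to it. A small sketch that maps card nodes to drivers; on an RK3588 board you would typically see one node bound to panthor (render only) and another to rockchip-drm (the KMS side). The `drm_class_dir` parameter exists so the function can be exercised against a fake tree:

```python
import os

def drm_card_drivers(drm_class_dir: str = "/sys/class/drm") -> dict[str, str]:
    """Map each DRM card node (card0, card1, ...) to the kernel driver
    bound to its parent device, by resolving the sysfs 'driver' symlink."""
    drivers = {}
    for node in sorted(os.listdir(drm_class_dir)):
        # Skip connector entries like card0-HDMI-A-1 and render nodes.
        if not node.startswith("card") or "-" in node:
            continue
        link = os.path.join(drm_class_dir, node, "device", "driver")
        if os.path.islink(link):
            drivers[node] = os.path.basename(os.readlink(link))
    return drivers
```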
17:43 linkmauve: K900, even as a server, you might want to do automated translation for instance.
19:55 Ermine: daniels: oic
20:17 galileohasit: in my opinion it is all fixed by operands stored in the materials. so once you feed a 1 it is 10000....and samewise other operand is 5 101000, it could only yield an answer set in range 6 , and six is made up from 3+3 1+5 2+4 0+6, so only 4 combinations, that means, so how to get the subindex, so 1 yields 64 in format 561 and 5 does 64+66=130+64=194 at that index you access 1st
20:17 galileohasit: combination at subindex 1 of course. So the question with multiply, as to how many combinations one has to expect in the hash, that might be logarithmic loop indeed but that is not performance sensitive, it depends as to how many fixed integer multiplies of two operands yield the same result. So now how many combinations have to be pinned or expected? well since none of the operands
20:17 galileohasit: collide in internal adder once you subindex we can assume a linear mapping isn't it, so expected combinations are the amount of adder collisions not more, so when you fix the operands to a sum of 1 or 2 or 3 or 4 or 5 or 7, the collisions is way over the count of possible answers at in max 7 variations, so but collision amount is that what counts, does it then make sense of saying that
20:17 galileohasit: on several rounds of packing when collision amount shrinks the combinations amount grows or shrinks or what it would do, imo it shrinks and it is allowed to right? so 4.2xxx billion given subindex 1 can only yield a max at (max-1) addend but as you expect max operand variance collisions are 4.2billion you say, what i say any of the 561 format combinations yielding to 561 is enough, cause
20:17 galileohasit: until it really went to max it did not matter what the operands were, so 561+1024+32 is 0to33 where 32 is 0 and 32+33=highest power, so 561+1024+32 is maximum, so in the context of multiplier you encode some powers for value 1 is 32+33=65 and value 5 is 32+33+35=100 their sum is 165 and that pins the results of multiply that might be gotten from 6, so at1=5 at2=8 at3=9 are the only
20:17 galileohasit: combos. you subindex to 1 aka 32+65 from 165. so remember 6 can be also 2+4 3+3 and 0 +6. now the worry is: amount of internal adder collisions if anything else as you see and that they can be gotten rid of by either within the intrinsic, or just encode one or both operands one level deeper and summing them will yield no collisions, so 5 is 33+35+32=100 is now 32+3+32+6+32+7=112 in two
20:17 galileohasit: rounds however 4 is different as 72, so now 2 is 107, so 107+72=179 where but 112+71 is now 183 , where 71 is double rounds done for 33. it's symmetric routine can be gotten back to, as well as can be mapped for dependencies during execution, to understand the execution btw, took me less time then getting to data and io selective access, cause the theory is simpler though the work is
20:17 galileohasit: larger at them.
21:04 Ermine: dwfreed: ^
21:06 dwfreed: ack
21:08 Ermine: thank you
22:53 DemiMarie: jenatali: do you by any chance know what the “PCI location path” Windows uses for Discrete Device Assignment is, and if it has any analog on Linux (sysfs?)?
22:54 DemiMarie: This is somewhat on-topic because it is needed for GPU passthrough in a somewhat reasonable way.
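On the Linux side, the closest analog is the sysfs device path itself (e.g. what `readlink -f /sys/bus/pci/devices/0000:01:00.0` resolves to), since it already encodes the full bridge topology from the host bridge down to the device. A sketch of translating that into something shaped like Windows' `DEVPKEY_Device_LocationPaths` value; the exact `PCIROOT(...)#PCI(...)` text format produced here is an assumption modeled on Windows' output, not a verified spec:

```python
import re

def sysfs_to_location_path(sysfs_path: str) -> str:
    """Sketch: derive a Windows-style PCI location path from a Linux
    sysfs device path such as
    /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0.
    Each hop contributes PCI(<device><function>) in hex, rooted at
    PCIROOT(<root bus>)."""
    root = re.search(r"pci([0-9a-f]{4}):([0-9a-f]{2})", sysfs_path)
    hops = re.findall(r"([0-9a-f]{4}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])", sysfs_path)
    parts = [f"PCIROOT({int(root.group(2), 16)})"]
    for _domain, _bus, dev, fn in hops:
        parts.append(f"PCI({int(dev, 16):02X}{int(fn, 16):02X})")
    return "#".join(parts)

print(sysfs_to_location_path(
    "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0"))
# → PCIROOT(0)#PCI(0100)#PCI(0000)
```

Unlike bus/device/function numbers, this topology path stays stable across rescans, which is presumably why DDA keys on it.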
22:54 jenatali: No idea
22:58 DemiMarie: Thanks jenatali.