08:16mivanchev: hey, developer of static-wine32 here, I have a question regarding 24.3.0. How are the GLX drivers loaded now if driOpenDriver is missing? is there a new architecture without the mega driver?
08:51MrCooper: mivanchev: are you asking about the client side or the X server side?
08:52mivanchev: MrCooper, client side. Specifically I want to know what's required to load the GL.so statically
08:52mivanchev: I'm confused as to what substitutes driOpenDriver
08:54MrCooper: the new loadable binary is libgallium-<Mesa version>.so
08:56mivanchev: MrCooper, is this still a mega driver and how is it initialized?
08:56MrCooper: yes, and via the usual GLX/EGL/... APIs
08:58mivanchev: ok, so hypothetically if i have libgallium-<Mesa version>.so's code in a static mesa, no further steps will be needed from the Wine side, just calling GL functions?
08:59MrCooper: driOpenDriver was always Mesa-internal API AFAICT
08:59MrCooper: not sure it's that simple, it's presumably dlopen'd
09:00mivanchev: yes, i saw the dlopen so I thought something happens in the initialization code which i can't find
11:23eric_engestrom: mareko: I think you might know the answer? ^
11:23eric_engestrom: oop ignore me, for some reason I was missing all the replies from MrCooper 🤦
11:36Lynne: do GPUs suffer the same performance issues as CPUs when it comes to denormals?
11:45kode54: presumably they can have the same level of denormal filtering that SSE has on CPUs
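kode54's comparison with SSE's flush-to-zero (FTZ) mode can be made concrete with a small numeric model. The Python sketch below (the language is an arbitrary choice, and the helper names are hypothetical) rounds values to IEEE binary32 and, like FTZ, replaces any subnormal result with a zero of the same sign. It models only the numerics, not how any particular GPU or CPU implements the flush or what it costs.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def is_f32_denormal(x: float) -> bool:
    """True if x, rounded to binary32, is a nonzero subnormal.

    A binary32 subnormal has an all-zero exponent field and a nonzero mantissa.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return exponent == 0 and mantissa != 0

def flush_to_zero(x: float) -> float:
    """Emulate FTZ on a binary32 result: subnormals become a same-signed zero."""
    x = to_f32(x)
    if is_f32_denormal(x):
        return math.copysign(0.0, x)
    return x

# Two normal binary32 inputs whose product falls below 2**-126,
# the smallest normal binary32 magnitude.
a = to_f32(1e-20)
product = to_f32(a * a)
```

The product of two perfectly normal inputs can already land in the subnormal range, so denormal handling matters even when no input is tiny; under FTZ such a result silently becomes zero.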
11:45pendingchaos: Lynne: for recent AMD GPUs, I don't think denormals should have a large performance impact
11:45pendingchaos: omod (free multiplication by 2/4/0.5) doesn't work and most inexact transformations are disabled
11:45pendingchaos: before RDNA2/GFX10.3, we had v_mad_f32 (unfused multiply-add), which was great but didn't support denormals
11:45pendingchaos: we can use v_fma_f32 instead but that's inexact and also slow until Vega/GFX9
11:47glehmann: I think on gfx6 and 7 enabling fp32 denorms caused a slowdown to fp64 rate
11:48Lynne: why would a fused mult add be less exact? isn't the point of fusing to increase accuracy as well as speed?
11:48pendingchaos: it's not exactly the same as what the programmer wrote, so it's inexact
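pendingchaos's point can be shown numerically: contracting `a*b + c` into an FMA removes the rounding of the intermediate product, so the result can differ from the separately rounded multiply and add that the shader source describes. The sketch below uses Python's binary64 floats rather than the fp32 of v_mad_f32/v_fma_f32, and models a single-rounding FMA with exact `Fraction` arithmetic; the function names are hypothetical.

```python
from fractions import Fraction

def unfused_mad(a: float, b: float, c: float) -> float:
    """a*b + c with two roundings: the product is rounded before the add."""
    return a * b + c

def fused_fma(a: float, b: float, c: float) -> float:
    """a*b + c with one rounding: the exact result is rounded once.

    Fraction arithmetic on floats is exact, and float(Fraction) rounds
    the exact quotient to the nearest double.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0
```

Here the exact product 1 - 2**-60 rounds to 1.0 in the unfused version, so the sum cancels to 0.0, while the fused version keeps the -2**-60 term. Neither answer is wrong in general, but they differ, which is why an expression marked precise or invariant cannot be silently contracted to v_fma_f32 unless the shader calls fma() explicitly.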
11:49glehmann: > and most inexact transformations are disabled
11:49glehmann: pendingchaos: isn't it the other way around? explicitly flushing denorms disables inexact patterns, explicitly preserving doesn't matter
11:50pendingchaos: glehmann: right. I misremembered and thought inexact transforms were disabled if any float control was used
11:51pendingchaos: if the shader author uses the invariant or precise keyword for an expression, we can't use v_fma_f32
11:54pendingchaos: unless the fma() builtin is used IIRC
13:09Lynne: by the way, monthly reminder that no profilers for pure compute-only vulkan exist
13:53glehmann: Lynne: what do you mean? I think radv RGP capture support for compute should be possible, so if you want that, create a mesa feature request issue
13:58Lynne: nope, RADV RGP only triggers if there's a frame that the client draws
13:58Lynne: modifying it isn't so easy either
13:59Lynne: I gave it a shot and I was able to get it to trigger on a dispatch, but it generated blank files, and if I forced it, it simply crashed the GPU
14:07glehmann: well the chances of someone with a better understanding of the RGP code looking into it are going to be a lot higher with a feature request issue
17:25vasilyvisniak: As you have recognized or noticed, the video encoder/decoder needs to be rewritten, as is the case with many other things. I implemented an in-place algorithm for FS satashing. I had taken a trace of all number theory and numbers, and their algebra has been done entirely correctly by scientists. Mathematics is the language of the universe (cited from Grigori Perelman's talks). But i will not be
17:25vasilyvisniak: doing any of this anytime soon, because I need to fill my wallet a bit before.
17:39vasilyvisniak: We need to talk about the underlying maths a bit more at another time to clear things up. What some legal research seems to suggest is exponentiation, FFT/DFT and the like, which cause very high losses; such algorithms, publicly endorsed, lead to high entropy of things, and the real science of numbers is actually done differently. The actual methods are based on base-10 logarithms, but even this is
17:39vasilyvisniak: misleading, since those logarithms are taken upfront, rounded and looped for compilation in contiguous uniformity, and at runtime they never get called either. But the science mentioned earlier is being done deliberately to drive higher sales of electronics and of the energy needed to run them. Planned obsolescence is not actually even a thing on most electronics on
17:39vasilyvisniak: its own though.
17:41mareko: eric_engestrom: libgallium-$version.so is a linked lib now, not dlopen'd
17:42mareko: Mesa doesn't dlopen gallium drivers anymore
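mareko's point (libgallium is linked, not dlopen'd) can be checked on an installed system: a library linked at build time appears as a DT_NEEDED entry in the dynamic section of whatever links it, whereas a driver that is only dlopen'd does not. The sketch below parses GNU `readelf -d` output. Which frontend libraries (for example libGLX_mesa.so.0 or libEGL_mesa.so.0) list libgallium is an assumption to verify on your own install, library paths are distro-specific, and the helper names are hypothetical.

```python
import re
import subprocess

# GNU readelf -d prints dynamic-section lines such as:
#  0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
_NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")

def needed_libs(readelf_output: str) -> list[str]:
    """Extract DT_NEEDED entries from `readelf -d` text output, in order."""
    return _NEEDED_RE.findall(readelf_output)

def links_gallium(path: str) -> bool:
    """True if the ELF file at `path` lists a libgallium-*.so as DT_NEEDED.

    Requires GNU binutils' readelf on PATH.
    """
    out = subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    ).stdout
    return any(lib.startswith("libgallium-") for lib in needed_libs(out))
```

Only the parsing is exercised without readelf; `links_gallium` is meant to be pointed at the GLX or EGL frontend library wherever your distribution installs it.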
17:56stsquad: can the mesa intel drivers be built for non-x86 platforms? (I need to test the QEMU native context patches and wondered if cross-arch would work)
17:59eric_engestrom: mareko: ack, thanks!
18:40dj-death: stsquad: seems to build fine on an M1
20:20stsquad: dj-death: excellent.. just need to tweak my buildroot test image build
20:37eric_engestrom: dj-death: last I heard, i915 doesn't build on aarch64 (because libdrm_intel doesn't build); has that been fixed?
20:37eric_engestrom: but crocus and iris build fine
21:46kismetrealm: so what we have is a tree of index+const to derive other specifiers. for example the algorithm goes like this: 883 is 115+768, index component is just contiguous whereas data powers are in fact taken more carefully to reflect certain non-colliding minified powers to perform calculus. But the other things first. The modulation of 883 is 443+440 as told, where the first term goes as
21:46kismetrealm: base of 256 +value of 72 index of 115 and second term as base of 256 + length/distance of 69 + index of 256, now final term being valuedelta 141 which comes as 1024-883 and indexdelta coming as 1024-883-115=26, now the last calculus is also hashsum capable i call them sumvector capable but more on this later. Now very important construct is 397 inverse index. Now we start to
21:46kismetrealm: separate/synthesize the value: 883-397-141-16-320=9 (notice the cross-referencing) now 141 aka 69+72+397+397+397+397-883-883=-37 so 883+883-69-72-397-397-397-397=37 so 9 added to it is 46 so we add 26 and the result comes back as 72. Now this algorithm is the last in-place version, considered slightly superior to the last i offered, since now all you maintain is a tree of index+const
21:46kismetrealm: or twice inverse index+index aka, 397+397+115-26 is also 883. Now we demonstrate/show as to how we do sumvector calculations on getting the ideltas. And the relevant arithmetics is here: 894.00+883+888+120−115−126=2544 2560-2544=16 894.00+883+888+115−126−120=2534 2560-2534=26 894.00+883+888+126−115−120=2556 2560-2556=4 , where we added one more subterm to hashsum 894 is
21:46kismetrealm: 126+72+256+57+126+256=894 also coming as said before as 126+386+386-4. So that's all folks today the in place versions work great , and filesystem is possible to be built. Last real algorithm was not in place and was spread to the world by me as nickname boratfromkz 2024-11-11 17:33 on #dri-devel i.e same channel. Have fun.