00:08 gfxstrand[d]: Ugh... I really don't want to debug more kopper bugs...
00:11 orowith2os[d]: Is it only NVK, or consistent on AMD and Intel too?
00:13 orowith2os[d]: I really want to say I've run Steam on AMD with Zink before, within the timespan of this issue, but I'll have to double check
00:20 mhenning[d]: orowith2os[d]: Some people have reported nvk working fine too. It's not totally clear what's required for the bug to trigger
00:21 mangodev[d]: mhenning[d]: i wonder
00:21 mangodev[d]: what desktops/wms does the bug *not* happen on?
00:21 mhenning[d]: I have no idea if it can happen on amd or not
00:22 mangodev[d]: i wonder if it's a conflict between steam, kopper, nvk, and the way kde handles things
00:22 mhenning[d]: In my case, I can reproduce under gnome
00:22 mangodev[d]: interesting
00:23 mangodev[d]: according to the thread, it also happens under x11, so it's not a wayland thing
00:23 mhenning[d]: but yeah, the link to kopper/winsys is new; before, it wasn't clear where things were failing
00:27 mangodev[d]: progress is progress
00:27 mangodev[d]: and finding a specific part of mesa that's the culprit is big
03:14 airlied[d]: just FYI, latest F42 with a nightly mesa copr and latest steam seems okay to me right now under gnome
03:52 mangodev[d]: airlied[d]: "latest" as in latest stable, or latest beta channel?
03:54 airlied[d]: latest stable I'd say
03:56 airlied[d]: oh I had gpu rendering option off already
03:56 airlied[d]: now it's a flicker fest
03:57 airlied[d]: I had "Enable GPU accelerated rendering in web views"
03:57 airlied[d]: turned off
03:57 mangodev[d]: airlied[d]: for me it's my mouse that's a flicker fest
03:57 mangodev[d]: is steam native wayland now?
03:57 mangodev[d]: steam is giving me multiple signs that it's now a native wayland window instead of using xwayland
03:58 mangodev[d]: the cpu renderer runs really smooth, but the cursor does a weird shake any time it changes type
03:58 mangodev[d]: …which is also done in a chromium window if you don't specify `WaylandWindowDecorations`
04:02 airlied[d]: ah indeed, zero vram goes all black screen
04:02 soreau: What is the breakdown of which libraries/platforms steam games use? Are most of them native windows games run through wine/proton, or does some large percentage use SDL/x11? Just wondering whether a native steam-on-wayland is planned or if it's already a thing
04:13 HdkR: soreau: something around 50% of games on Steam are Unity+Windows.
04:14 soreau: Ah, Unity is windows-only?
04:17 mangodev[d]: soreau: no, windows is just a common build target for unity
04:21 airlied[d]: disabling kopper gets me a weird loop of nothing rendering
04:26 mangodev[d]: airlied[d]: maybe it's not all kopper?
04:27 mangodev[d]: what if something in nvk isn't playing nice with kopper
04:30 mhenning[d]: airlied[d]: when steam crashes, it auto-restarts, which can result in it repeatedly popping up and closing windows
04:30 mangodev[d]: what if it's the same thing that causes discord to segfault and soft crash
04:30 mangodev[d]: both are chromium, and both are nvk-specific crashes
04:30 mangodev[d]: or at least something related
04:37 HdkR: If the browser also isn't making forward progress on a render to get the heartbeat response out within ten seconds, then it'll force-restart the process as well
04:38 HdkR: So if it is hanging on a fence, you'll see it receive a SIGKILL eventually.
04:44 airlied[d]: gfxstrand[d]: btw if you want to knock CTS time down a bit more diff --git a/external/vulkancts/modules/vulkan/renderpass/vktRenderPassMultisampleResolveTests.cpp b/external/vulkancts/modules/vulkan/renderpass/vktRenderPassMultisampleResolveTests.cpp
04:44 airlied[d]: index 58d53cda9..29a437c2e 100644
04:44 airlied[d]: --- a/external/vulkancts/modules/vulkan/renderpass/vktRenderPassMultisampleResolveTests.cpp
04:44 airlied[d]: +++ b/external/vulkancts/modules/vulkan/renderpass/vktRenderPassMultisampleResolveTests.cpp
04:44 airlied[d]: @@ -450,7 +450,7 @@ vector<AllocationSp> MultisampleRenderPassTestBase::createBufferMemory(const vec
04:44 airlied[d]: {
04:44 airlied[d]: VkBuffer buffer = **buffers[memoryNdx];
04:44 airlied[d]: VkMemoryRequirements requirements = getBufferMemoryRequirements(vkd, device, buffer);
04:44 airlied[d]: - de::MovePtr<Allocation> allocation(allocator.allocate(requirements, MemoryRequirement::HostVisible));
04:44 airlied[d]: + de::MovePtr<Allocation> allocation(allocator.allocate(requirements, HostIntent::R));
04:44 airlied[d]: VK_CHECK(vkd.bindBufferMemory(device, buffer, allocation->getMemory(), allocation->getOffset()));
04:44 airlied[d]: memory[memoryNdx] = safeSharedPtr(allocation.release());
07:36 snowycoder[d]: How bad is it if I use gotos directly from nir in Kepler to avoid unwanted syncs?
12:03 snowycoder[d]: gfxstrand[d]: Ok, found an answer and it's quite simple: if we clamp a negative number, it gets clamped to 0; if we clamp a positive, way-too-big number, it gets clamped to the positive bound
12:57 karolherbst[d]: I wonder if I want to convert matrices stored in function temp memory to SSA values entirely and implement all those nir intrinsics on top of SSA values (and add new intrinsics for it or so)...
12:58 karolherbst[d]: so just have a nv_mat_load (from global or shared) doing the layout stuff, and then everything else is just plain operations without the matrix semantics (except maybe mma)
13:03 gfxstrand[d]: snowycoder[d]: Oh, right. That makes sense. Except we never actually want to clamp, just bounds check.
13:04 gfxstrand[d]: snowycoder[d]: Very
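A minimal C sketch of the distinction drawn above between clamping an out-of-range index and merely bounds-checking it (illustrative only; the helper names are made up and this is not the actual NAK code):
```c
#include <stdbool.h>
#include <stdint.h>

/* Clamp: a negative index becomes 0 and a too-large index becomes the upper
 * bound, so the access still happens, just at the wrong element.
 * (Assumes count > 0.) */
static uint32_t
clamp_index(int64_t idx, uint32_t count)
{
   if (idx < 0)
      return 0;
   if ((uint64_t)idx >= count)
      return count - 1;
   return (uint32_t)idx;
}

/* Bounds check: out-of-range indices are rejected outright, so the access
 * can be skipped or given a well-defined result instead of being redirected. */
static bool
index_in_bounds(int64_t idx, uint32_t count)
{
   return idx >= 0 && (uint64_t)idx < count;
}
```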
13:05 gfxstrand[d]: karolherbst[d]: We're not already?
13:06 karolherbst[d]: gfxstrand[d]: we do, but the nir code often loads/store from/to private memory, and then we have a lot of roundtrips and offset math
13:06 karolherbst[d]: not sure if it all gets optimized away properly
13:06 karolherbst[d]: probably yes
13:07 karolherbst[d]: it's just that a lot of the math relies on the lane id
13:07 karolherbst[d]: it usually gets all optimized to SSA values, but I'm also seeing some weird code around I'm not entirely sure how to get rid of
13:08 gfxstrand[d]: A matrix store isn't a terrible idea
13:13 karolherbst[d]: the issue is that the nir intrinsics all operate on memory, not on SSA values
13:35 gfxstrand[d]: They're intended to be lowered to intrinsics on SSA values.
13:35 gfxstrand[d]: It's just that that intrinsic on Intel or AMD is just `broadcast()`.
13:36 gfxstrand[d]: Or maybe some DP4
13:59 karolherbst[d]: mhhh
14:34 karolherbst[d]: the issue with nvidia is that the individual values in the matrices need to be in the right SSA value in the right thread, and there are a couple of different layouts, which are kinda weird. Like it's not really about the bit size: int16/fp16/fp32 matrices have the same layout, but then fp64 is different and looks more like it's merging two fp32 values and just extending the overall size. Still need
14:34 karolherbst[d]: to understand how movm fits into it all. Sure, it can be used to transpose a 16x16 matrix of 16-bit values, but... not sure if we can use it on other data types easily, or if it's just a Turing limitation and stuff...
15:09 karolherbst[d]: okay.. the membar issue is solved with `nir_opt_barrier_modes` 🙂
15:13 karolherbst[d]: mhenning[d]: want to push that MR through shader-db or whatever you are all using? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35055
15:13 karolherbst[d]: skipping membar still gives more perf, but it's now +5% and not +60% 😄
15:14 karolherbst[d]: I think the main benefit here isn't the membar itself, but the +7 waits per membar in a hot loop
15:15 karolherbst[d]: shaders doing membars in a loop could benefit from it
15:15 karolherbst[d]: but not sure how often games do that
15:17 karolherbst[d]: 903 -> 865 instructions in this benchmark here
15:17 mhenning[d]: Sure.
15:18 mhenning[d]: Have you cts'd that? Last time I tried turning on nir_opt_barrier_modes it broke cts eg. dEQP-VK.memory_model.write_after_read.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.buffer.guard_local.image.comp
15:18 karolherbst[d]: nah, I just tested it
15:18 mhenning[d]: I never went though the trouble of figuring out why
15:18 karolherbst[d]: mhhhh
15:18 karolherbst[d]: maybe I could investigate it
15:20 mhenning[d]: My hunch is that it has requirements for when the pass is called, so trying to mimic the pass order anv uses might help
15:20 karolherbst[d]: yeah...
15:20 karolherbst[d]: could be something related to nir_opt_combine_barriers
15:20 karolherbst[d]: dunno tho
15:21 karolherbst[d]: it's funny that such a "small" change sometimes makes such a perf difference
15:25 karolherbst[d]: mhenning[d]: still passes here
15:25 karolherbst[d]: do you have your commit where it broke?
15:27 mhenning[d]: No, I did this quite a while ago
15:27 mhenning[d]: all I have are the notes that it broke that test
15:27 karolherbst[d]: I see
15:53 mhenning[d]: karolherbst[d]: Oh, wait, I actually just remembered the commit is in an MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26347/diffs?commit_id=363c7514246841895c5df2267d3c080d3f3c7c9e
15:53 mhenning[d]: That was where I tried to add it, in 2023
15:55 karolherbst[d]: mhhhh
15:56 karolherbst[d]: only significant difference is the order of those two passes
15:56 karolherbst[d]: passes regardless here
15:56 gfxstrand[d]: karolherbst[d]: Then that needs to be sorted out with bits on the intrinsic
15:57 mhenning[d]: karolherbst[d]: well, yeah. It's possible other stuff has been fixed in the meantime. Anyway, I wouldn't worry about it if cts passes currently
15:59 mhenning[d]: although, anv does call nir_opt_barrier_modes during preprocess and it might make sense to mirror that
16:00 mhenning[d]: since they're the only vulkan driver that uses it right now
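For reference, a rough C sketch of what mirroring anv's placement could look like; the wrapper function is hypothetical, only NIR_PASS, nir_opt_barrier_modes and nir_opt_combine_barriers are real Mesa names, and the order shown is just one of the two possibilities discussed above:
```c
#include "nir.h"

/* Hypothetical preprocess hook: run nir_opt_barrier_modes early, the way anv
 * does during its shader preprocessing, with nir_opt_combine_barriers next to
 * it.  Per the discussion above, the relative order of these two passes was
 * the only notable difference between the 2023 attempt and the current MR, so
 * it is the knob to vary when chasing the old memory_model CTS failure. */
static void
preprocess_barriers(nir_shader *nir)
{
   NIR_PASS(_, nir, nir_opt_barrier_modes);
   NIR_PASS(_, nir, nir_opt_combine_barriers);
}
```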
16:23 karolherbst[d]: gfxstrand[d]: currently it's nak-specific intrinsics that turn it all into SSA values, plus loads/stores from memory. It does work, but I'm not entirely happy about keeping the round-trips to function memory
16:56 karolherbst[d]: mhhhh
16:57 karolherbst[d]: can we store driver-private data in nir_variables?
16:58 karolherbst[d]: could use driver_location...
16:59 karolherbst[d]: the big issue is making it all sane enough for cmat_convert, because that can force a layout change which does require moving values between threads given how things are designed atm
17:02 karolherbst[d]: though if you do cmat_muladd + cmat_convert + cmat_muladd then you won't be able to get rid of those anyway
18:31 gfxstrand[d]: karolherbst[d]: Yeah, we want to get rid of function memory. It's not as terrible on Nvidia as on some hardware, but we're never going to hit max flops if we can't keep stuff in registers.
18:37 karolherbst[d]: nir passes are able to get rid of it all, but I don't like that we rely on nir passes here
18:37 karolherbst[d]: though maybe it's fine
19:19 gfxstrand[d]: Oh, if NIR passes get rid of it all, I'm not too worried.
19:19 karolherbst[d]: I'm more worried about losing the matrix layout information
19:19 gfxstrand[d]: Sure
19:19 karolherbst[d]: so keeping the variables around does make sense
19:19 karolherbst[d]: I'm just wondering if we can do better than load + matrix op + store
19:20 karolherbst[d]: but maybe that's fine
19:20 karolherbst[d]: but we also don't need that information after lowering anymore, so... dunno
19:21 karolherbst[d]: I think I'm just very unhappy how conversion works
19:22 karolherbst[d]: atm I've written the layout conversion code by hand and it's a bunch of inefficient shuffles
19:22 gfxstrand[d]: Well, for coop vector, we may end up adding massive vector sizes to NIR at which point we might be able to make it all SSA values eventually.
19:22 karolherbst[d]: good thing is, for nvidia you can kinda treat it all as multiples of 8x8 vectors and you'll be fine
19:23 karolherbst[d]: mostly
19:23 gfxstrand[d]: But at least for the initial parsing, we can't do that because the number of SSA components depends on subgroup size.
19:23 karolherbst[d]: right...
19:24 karolherbst[d]: and then you have those cursed layouts with replicated values like the old 8x8x4 matrix stuff 🙃
19:24 karolherbst[d]: but I'm considering moving it to a branch and let it rot there
19:24 karolherbst[d]: and if somebody really cares, they can dig it out again
19:24 karolherbst[d]: I kinda want to prototype fp64 support tho sooner or later
19:25 karolherbst[d]: though I think I'll need a new GPU for that
19:25 karolherbst[d]: ehh no, ampere has DMMA
19:42 airlied[d]: karolherbst[d]: are there any hw instructions for conversions?
19:42 airlied[d]: like at a matrix level
19:42 karolherbst[d]: movm probably?
19:42 karolherbst[d]: I think the transpose might be good enough to avoid having to do shuffles, but not sure
19:42 karolherbst[d]: but the type conversion we have to do manually
19:43 karolherbst[d]: most of the time you don't even have to shuffle
19:43 karolherbst[d]: it's just when you convert to int8 types for IMMA
19:43 karolherbst[d]: *to/from
19:43 karolherbst[d]: int16/fp16/fp32 all have the same layout
19:43 karolherbst[d]: so it's just i2f/f2i
19:43 airlied[d]: ah yes, other than movm there are no cross-warp magic ones
19:43 karolherbst[d]: and no shuffle
19:44 karolherbst[d]: just the int8 matrices have a slightly different layout, so values need to be moved around (and the old 884 HMMA ones)
19:45 karolherbst[d]: it's all very quad based, so _maybe_ we can make use of quad ops.. or maybe I just check what nvidia is doing 🙃
19:45 karolherbst[d]: like uhm... let me check my notes
19:46 karolherbst[d]: T0: needs 0,0 0,1 0,2 0,3 (new layout) located in T0.x, T0.y, T1.x, T1.y (old layout)
19:46 karolherbst[d]: T1: needs 0,4 0,5 0,6 0,7 (new layout) located in T2.x, T2.y, T3.x, T3.y (old layout)
19:46 karolherbst[d]: T2: needs 0,8 0,9 0,10 0,11 (new layout) located in T0.z, T0.w, T1.z, T1.w (old layout)
19:46 karolherbst[d]: T3: needs 0,12 0,13 0,14 0,15 (new layout) located in T2.z, T2.w, T3.z, T3.w (old layout)
19:46 karolherbst[d]: and then the pattern just repeats
19:47 karolherbst[d]: T0: needs 0,0 0,1 0,8 0,9 (new layout) located in T0.x, T0.y, T2.x, T2.y (old layout)
19:47 karolherbst[d]: T1: needs 0,2 0,3 0,10 0,11 (new layout) located in T0.z, T0.w, T2.z, T2.w (old layout)
19:47 karolherbst[d]: T2: needs 0,4 0,5 0,12 0,13 (new layout) located in T1.x, T1.y, T3.x, T3.y (old layout)
19:47 karolherbst[d]: T3: needs 0,6 0,7 0,14 0,15 (new layout) located in T1.z, T1.w, T3.z, T3.w (old layout)
19:47 karolherbst[d]: for the other way around
19:47 karolherbst[d]: the painful part is those xy <-> zw shuffles
19:48 karolherbst[d]: it _looks_ like something that doesn't need 8 shuffles and 4 bcsels 😄
19:50 karolherbst[d]: also... you can probably just pack the int8 data and....
19:50 karolherbst[d]: like move the int8 values around, then it's 2 shuffles and maybe some prmt
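A throwaway host-side C simulation of the quad remap spelled out above (old layout: lane Tt holds columns 2t, 2t+1 in .x/.y and 8+2t, 8+2t+1 in .z/.w of the row; new layout: lane Tt holds columns 4t..4t+3 contiguously). It only verifies the mapping table; it is not how the shuffles/bcsels would actually be emitted:
```c
#include <assert.h>
#include <stdio.h>

int
main(void)
{
   /* One quad: lanes T0..T3, each holding a vec4 of one matrix row.
    * Store the column index as the value so the remap is easy to eyeball. */
   int old_reg[4][4], new_reg[4][4];

   /* Old layout: T{t}.x/.y = columns 2t, 2t+1; T{t}.z/.w = columns 8+2t, 8+2t+1. */
   for (int t = 0; t < 4; t++) {
      old_reg[t][0] = 2 * t;
      old_reg[t][1] = 2 * t + 1;
      old_reg[t][2] = 8 + 2 * t;
      old_reg[t][3] = 8 + 2 * t + 1;
   }

   /* New layout: T{t} holds columns 4t..4t+3. */
   for (int t = 0; t < 4; t++) {
      for (int c = 0; c < 4; c++) {
         int col = 4 * t + c;
         int src_t = (col % 8) / 2;                 /* lane that held it */
         int src_c = (col < 8 ? 0 : 2) + col % 2;   /* xy half vs zw half */
         new_reg[t][c] = old_reg[src_t][src_c];
         assert(new_reg[t][c] == col);
      }
      printf("T%d: %2d %2d %2d %2d\n",
             t, new_reg[t][0], new_reg[t][1], new_reg[t][2], new_reg[t][3]);
   }
   return 0;
}
```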
19:53 karolherbst[d]: airlied[d]: movm isn't even cross-warp
19:54 karolherbst[d]: it's actually a quad operation
19:55 karolherbst[d]: mhh well..
19:55 karolherbst[d]: kinda
19:56 karolherbst[d]: mhh yeah maybe actually movm doesn't work for this...
19:57 karolherbst[d]: mhhh
20:05 karolherbst[d]: maybe I should write some PTX code and see what nvidia is doing 😄
20:09 karolherbst[d]: I don't think cuda supports it....
20:15 snowycoder[d]: Kepler shfl.up requires 128-bit aligned lane.
20:15 snowycoder[d]: Only for shfl.up, not for down, idx or bfly -.-
20:16 HdkR: butterfly is your friend :D
20:16 snowycoder[d]: Why?
20:18 HdkR: Been a while since I've looked at it but it kind of lets you brute force reductions without touching shared mem.
20:20 HdkR: Also it has a great name.
21:22 mhenning[d]: gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35033 is ready for another round of review whenever you have time
22:00 gfxstrand[d]: Probably Wednesday.
22:01 gfxstrand[d]: snowycoder[d]: That's annoying.
22:27 gfxstrand[d]: Maybe we just lower shfl.up?
23:00 mhenning[d]: gfxstrand[d]: snowy already added a legalization which I think is reasonable https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35059
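Not necessarily what that MR does, but as a lane-level illustration of why an up-shuffle is straightforward to legalize: it is just an indexed shuffle from lane (id - delta), with out-of-range lanes keeping their own value. A plain C warp simulation (function name made up):
```c
#include <stdint.h>
#include <stdio.h>

#define WARP_SIZE 32u

/* Emulate shfl.up(delta) with shfl.idx(lane - delta): lanes whose computed
 * source would fall below lane 0 keep their own value. */
static void
shfl_up_via_idx(const uint32_t in[WARP_SIZE], uint32_t out[WARP_SIZE],
                unsigned delta)
{
   for (unsigned lane = 0; lane < WARP_SIZE; lane++) {
      int src = (int)lane - (int)delta;
      out[lane] = src >= 0 ? in[src] : in[lane];
   }
}

int
main(void)
{
   uint32_t in[WARP_SIZE], out[WARP_SIZE];
   for (unsigned i = 0; i < WARP_SIZE; i++)
      in[i] = i;

   shfl_up_via_idx(in, out, 3);
   for (unsigned i = 0; i < WARP_SIZE; i++)
      printf("%u ", out[i]);
   printf("\n");
   return 0;
}
```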