00:01 steel01[d]: gfxstrand[d]: Well, that's already looking better. That's an awful lot of skip, though. GM20B too old? Thought I'd heard that it should support the current vulkan spec, though.
00:01 karolherbst[d]: uhh.. seems like I need to use the filter callback stuff.. oh well.. something for tomorrow
00:01 airlied[d]: there is always a lot of skip
00:08 mhenning[d]: yeah, to give you an idea, on a recent desktop run I got `Pass: 1401922, Warn: 4, Skip: 1449293, Duration: 1:41:10, Remaining: 0`
00:20 gfxstrand[d]: Yeah. Half skip is expected even on full featured drivers.
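The "half skip" figure can be checked against the summary line mhenning quotes. A minimal sketch, assuming only the `Pass: ..., Warn: ..., Skip: ...` layout shown above; the `skip_fraction` helper and its regex are illustrative and not part of any CTS runner:

```python
import re

# Summary line copied from the chat above.
SUMMARY = "Pass: 1401922, Warn: 4, Skip: 1449293, Duration: 1:41:10, Remaining: 0"

def skip_fraction(summary: str) -> float:
    """Return skipped / (pass + warn + skip) for a line in the quoted format."""
    # Only the three counters that appear in the quoted line are parsed.
    counts = {k: int(v) for k, v in re.findall(r"(Pass|Warn|Skip): (\d+)", summary)}
    total = sum(counts.values())
    return counts["Skip"] / total

print(f"{skip_fraction(SUMMARY):.1%}")
```

For the quoted run this comes out at a little over half of all test cases.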
01:13 airlied[d]: karolherbst[d]: btw we should reorder the advertised coopmats in the same order as NVIDIA, they are meant to be listed in the preferred usage order
01:13 airlied[d]: I got stuck on radv, so switching gears to adding a few coopmat2 bits to nvk for now
01:41 gfxstrand[d]: gfxstrand[d]: This is gonna be a problem...
01:41 gfxstrand[d]: > There must be at least one memory type with both the `VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT` and `VK_MEMORY_PROPERTY_HOST_COHERENT_BIT` bits set in its `propertyFlags`.
01:43 airlied[d]: it's called uncached
01:43 airlied[d]: nobody said it had to be fast 🙂
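The spec requirement quoted above boils down to a bitmask test over the advertised memory types. A rough sketch: the bit values are the ones from the Vulkan headers, but the two memory-type tables are hypothetical and are not what NVK actually exposes on Tegra.

```python
# Bit values of VkMemoryPropertyFlagBits from the Vulkan headers.
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT = 0x1
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT = 0x2
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT = 0x4
VK_MEMORY_PROPERTY_HOST_CACHED_BIT = 0x8

REQUIRED = (VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
            | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT)

def has_host_coherent_type(property_flags):
    """True if at least one memory type is both HOST_VISIBLE and HOST_COHERENT."""
    return any((flags & REQUIRED) == REQUIRED for flags in property_flags)

# Hypothetical tables for illustration only.
cached_only = [
    VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
    VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
]
# Adding an uncached mapping: host-visible and coherent, but not cached.
with_uncached = cached_only + [REQUIRED]

print(has_host_coherent_type(cached_only))
print(has_host_coherent_type(with_uncached))
```

The first table fails the requirement and the second satisfies it, which is why an uncached (slow but coherent) type is enough to be conformant.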
01:50 mhenning[d]: Do we have uncached?
01:50 mhenning[d]: I think we don't have it on desktop although I'm not totally sure why
01:51 HdkR: Did the uapi never expose uncached? Bit hard to map otherwise.
01:57 airlied[d]: guess we need to work that out first 🙂
01:58 gfxstrand[d]: It's exposed in the API but Mary claims it's broken. :frog_upside_down:
02:07 gfxstrand[d]: Okay, I'm gonna try using them from NVK and see how badly things blow up.
02:07 steel01[d]: 🔥
02:19 gfxstrand[d]: gfxstrand[d]: Yeah, getting misc bus errors. 😢
02:25 gfxstrand[d]: So, yeah, we're going to have to fix the kernel
02:25 gfxstrand[d]: I think my plan is to clean up my branch some more tomorrow and land what I have.
02:26 gfxstrand[d]: It's all still hidden behind NVK_I_WANT_A_BROKEN_VULKAN_DRIVER so landing something that only mostly works is okay.
02:27 gfxstrand[d]: My query patches kinda suck but they're better than using COHERENT for queries, at least for now.
02:27 gfxstrand[d]: We can revert them when we get COHERENT working.
02:29 HdkR: Interesting
02:35 gfxstrand[d]: I suspect the COHERENT bug is pretty fixable but I honestly don't know where it comes from.
02:35 gfxstrand[d]: gfxstrand[d]: But as per the above, we can't claim Vulkan support until we fix it.
03:38 HdkR: Today's a sweepy day.
03:39 gfxstrand[d]: IDK why he's tagging me. :shrug_anim:
03:39 gfxstrand[d]: But also, that's not a PM. 😂
03:43 HdkR: I do like the idea of DMing across a bridge by tagging someone. That's a typo away from a mistake :D
05:47 HdkR: Those times when one is about to fall asleep, but then it feels like one's falling. Definitely the worst thing in the world, yuppers.
07:44 karolherbst[d]: airlied[d]: yeah.. I think the spec says "most perf on the top", but not sure if just copying what nvidia does is the right way? Though I'd order fp16 above fp32, and the bigger matrix sizes on top of smaller ones
07:44 karolherbst[d]: not sure if that's what nvidia is doing tho
08:45 airlied[d]: I think they wrote the spec so I hope they got it right in their driver
09:20 karolherbst[d]: oh sure, but performance characteristics can be different across the drivers, and I'd rather have an algo doing the proper sorting than having a fixed list that might even change between gens or something
09:46 airlied[d]: I think it will be pretty much what NVIDIA measure, but maybe use the perf app, I think an algo would be overkill
09:48 karolherbst[d]: algo as in "put biggest matrices on top" or something, nothing complicated
09:48 karolherbst[d]: the bigger your matrices are, the less address calculation you have to do overall
09:49 karolherbst[d]: and the more likely you can use bigger ldsm, etc...
09:49 karolherbst[d]: so something trivial like that might be good enough, but maybe I should look at NVIDIA's list and see if I can figure out something obvious
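The "nothing complicated" ordering karolherbst describes (fp16 above fp32, bigger matrices above smaller ones) could be as small as a two-part sort key. A sketch only: the `CoopMatConfig` type, the single `dtype` field, and the shapes listed are hypothetical, and none of this is NVK's or NVIDIA's actual list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CoopMatConfig:
    m: int
    n: int
    k: int
    dtype: str  # simplification: one element type per config

# Lower rank sorts first, so fp16 entries land above fp32 ones.
TYPE_RANK = {"fp16": 0, "fp32": 1}

def preferred_order(configs):
    """Sort by element type first, then by M*N*K with the biggest on top."""
    return sorted(configs, key=lambda c: (TYPE_RANK[c.dtype], -(c.m * c.n * c.k)))

# Hypothetical shapes for illustration.
configs = [
    CoopMatConfig(16, 8, 8, "fp32"),
    CoopMatConfig(16, 8, 8, "fp16"),
    CoopMatConfig(16, 16, 16, "fp16"),
    CoopMatConfig(16, 8, 16, "fp16"),
]

for c in preferred_order(configs):
    print(c)
```

Whether this matches measured performance is exactly the open question in the discussion above; the sort only encodes the heuristic.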
10:20 snowycoder[d]: Newer VK-GL-CTS versions do not check if the `vulkanMemoryModel` feature is enabled?? 0_o
11:31 gfxstrand[d]: snowycoder[d]: New tests have bugs? :shrug_anim:
12:51 snowycoder[d]: gfxstrand[d]: They seem to be old tests, and the bug (if I'm not mistaken) is in vktTestCase.
12:51 snowycoder[d]: How was it even working 0.0
12:53 gfxstrand[d]: 😲
13:01 gfxstrand[d]: snowycoder[d]: Is dEQP-VK.api.ds_color_copy.d32_sfloat_s8_uint_r8_uint_stencil_level0_to_level0_att_usage failing for you?
13:02 gfxstrand[d]: Might just be a Tegra thing
13:04 gfxstrand[d]: I should really level-set on 1.4.4.0 on Ampere
13:07 snowycoder[d]: gfxstrand[d]: KeplerB passes
13:07 gfxstrand[d]: Okay, it's a Tegra thing. 😩
13:08 gfxstrand[d]: Probably missing an invalidate in the tests. They're new tests.
13:14 gfxstrand[d]: Ugh... vulkan-cts-1.4.4.0 isn't building for me
13:14 gfxstrand[d]: It built yesterday on my aarch64 machine
13:17 gfxstrand[d]: TIL about `fetch_sources.py --clean`
13:21 jja2000[d]: Btw, forgot to ask. What was the gist about doing nvk on armv7? There were some architectural issues w.r.t. memory or caching right?
13:21 mohamexiety[d]: same
13:22 gfxstrand[d]: jja2000[d]: Should be fine. There are two primary issues:
13:22 gfxstrand[d]: 1. We have a kernel bug around coherent maps. That just needs to be fixed.
13:22 gfxstrand[d]: 2. We need a new ioctl for explicitly flushing maps. Shouldn't be too hard to type and then plumb through. I've already left the plumbing hooks in place.
13:23 karolherbst[d]: ohh.. SM90 has `elect` 🙃
13:23 gfxstrand[d]: Ooh! Fancy!
13:24 karolherbst[d]: yeah.. just saw it in PTX, but it requires sm90 😄
13:24 karolherbst[d]: but there is also an `ELECT` instruction
13:24 gfxstrand[d]: :frog_upside_down:
13:24 jja2000[d]: I thought there was some inherent issue with armv7 that made it a whole lot harder. That's really nice to hear! The bulk of Tegra devices that are in the hands of mortals have the ARMv7 K1 SoC (or X1, but not everyone wants to touch their Switch).
13:25 gfxstrand[d]: Mostly the cache flushing thing. 32-bit arm doesn't let you do CPU cache maintenance from userspace.
13:25 gfxstrand[d]: So we need an ioctl
13:25 jja2000[d]: Yes, I think it was that, but from what I gather it's not that huge of a roadblock?
13:25 gfxstrand[d]: No. Just need to type some code.
13:26 gfxstrand[d]: The coherent maps bug scares me more
13:26 jja2000[d]: Hmmm
13:27 gfxstrand[d]: But that bug affects all tegra
13:27 jja2000[d]: Yes, I figured from reading here. This was a nouveau bug right?
13:27 gfxstrand[d]: yeah
13:39 monkey: Hi!
13:41 snowycoder[d]: Hello!
15:30 karolherbst[d]: does vulkan allow arbitrary workgroup sizes? Like what's the expectation in regards to compute shader launches with a workgroup size of e.g. 200x1x1 on nvidia?
15:35 karolherbst[d]: oh nvm, I messed up testing
15:51 karolherbst[d]: oh no.... I'm using `%warpid` 🙃 and uhm... yeah.. it's a physical identifier
15:51 gfxstrand[d]: oops
15:51 karolherbst[d]: I totally forgot about that and also why nvk lowers it
15:54 karolherbst[d]: `%warpid is intended mainly to enable profiling and diagnostic code to sample and log information such as work place mapping and load distribution.` ahhh
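Since `%warpid` is a physical identifier, a logical warp index has to be derived from the thread's position in the workgroup instead. A minimal sketch of that arithmetic, assuming 32-wide warps and that threads are packed into warps in linear x-then-y-then-z order; the helper names are made up here and this is not necessarily how NVK's lowering is written.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA hardware

def logical_warp_and_lane(local_id, workgroup_size):
    """Return (warp index within the workgroup, lane within the warp)."""
    x, y, z = local_id
    sx, sy, _sz = workgroup_size
    # Flatten the 3D local ID, x fastest.
    linear = x + y * sx + z * sx * sy
    return divmod(linear, WARP_SIZE)

def warps_per_workgroup(workgroup_size):
    """Number of warps needed, rounding up for a partial last warp."""
    sx, sy, sz = workgroup_size
    return -(-(sx * sy * sz) // WARP_SIZE)

# The 200x1x1 workgroup from the question above: the last warp is only partly full.
print(warps_per_workgroup((200, 1, 1)))
print(logical_warp_and_lane((199, 0, 0), (200, 1, 1)))
```

Unlike `%warpid`, the value computed this way is stable for a given thread no matter which hardware slot the warp is scheduled on.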
16:20 karolherbst[d]: okay more importantly, I need a name for this thing
16:23 mohamexiety[d]: rusti(cuda)l
16:33 snowycoder[d]: RustiCluda
16:34 HdkR: ShouldaCUDAWoulda
16:51 jja2000[d]: Uda, Cuda without C (I don't know what we're talking about)
16:53 chikuwad[d]: call it Jeffrey
16:55 f_: uda
17:51 gfxstrand[d]: cludge
18:02 mhenning[d]: you could name it something obvious like cl-on-cuda
18:03 mhenning[d]: boring but gets the point across
18:08 karolherbst[d]: mhhhh
18:08 karolherbst[d]: working name is "cluda" but that's kinda also boring
18:09 karolherbst[d]: "claudia" also came into my mind, but I don't really want to use like a proper name here 🙃
18:46 zmike[d]: ferricuda.
18:47 snowycoder[d]: Clocu (cl on cuda)
18:49 f_: claude
18:49 f_: oh wait that's already taken nevermind
18:52 snowycoder[d]: karolherbst[d]: CludAI so you can use more AI
19:01 gfxstrand[d]: I still like cludge
19:02 f_: Sounds like bridge
19:02 HdkR: CludAI is a good way to get half an industry to ignore it, if that's a goal.
19:02 chikuwad[d]: clobber
19:26 airlied[d]: Muddy, mesa, cuda and it muddies the waters
19:34 f_: clutter :P
19:36 mhenning[d]: clutter is already taken by GNOME. see eg https://blogs.gnome.org/clutter/about/
19:44 mohamexiety[d]: airlied[d]: oh yeah this is genius actually. there's also muda in similar vein
19:53 notthatclippy[d]: The MUDA project always makes me smile. It means bollocks.
19:55 HdkR: There's a 金玉 joke in there somewhere for people that enjoy that.
23:01 karolherbst[d]: okay lol.. I need `nir_op_fmad` 🙃
23:03 gfxstrand[d]: :frog_upside_down:
23:05 karolherbst[d]: yeah... so unless opt level is 0, it might or might not fuse fmul and fadd, unless you use the `mad` instruction 🙃
23:06 karolherbst[d]: I only found legacy options to disable it, and those options don't do anything anyway
23:06 karolherbst[d]: on the ptx CLI tool there is a flag doing _something_ tho...
23:08 mhenning[d]: :/ there's no exact modifier?
23:09 mhenning[d]: From https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-mul : "The default value of rounding modifier is .rn. Note that a mul instruction with an explicit rounding modifier is treated conservatively by the code optimizer. A mul instruction with no rounding modifier defaults to round-to-nearest-even and may be optimized aggressively by the code optimizer.
23:09 mhenning[d]: In particular, mul/add and mul/sub sequences with no rounding modifiers may be optimized to use fused-multiply-add instructions on the target device."
23:10 mhenning[d]: So you can probably turn .exact into a rounding modifier
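Why fusing matters for exact math: a separate mul and add round twice, a fused multiply-add rounds once, and the results can differ. A small sketch in double precision (the GPU case is fp32, but the effect is the same); `fused_mul_add` emulates the single rounding with exact rationals and is an illustrative helper, not a PTX or NIR API.

```python
from fractions import Fraction

def mul_then_add(a, b, c):
    """Unfused: the product is rounded to a double, then the sum is rounded."""
    return a * b + c

def fused_mul_add(a, b, c):
    """Emulated fused multiply-add: exact a*b+c, rounded once at the end."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

# a*b is exactly 1 - 2**-60, which rounds to 1.0 as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(mul_then_add(a, b, c))
print(fused_mul_add(a, b, c))
```

The unfused path loses the low-order term entirely and yields zero, while the fused path keeps it, so a compiler that silently contracts mul/add changes observable results.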
23:11 karolherbst[d]: mhhhh
23:11 karolherbst[d]: let me try that
23:12 karolherbst[d]: ohh that seems to work
23:13 karolherbst[d]: given that nir already contracts to ffma, maybe I just set them always for add and mul and call it a day for now