08:18 karolherbst[d]: airlied[d]: planning to push any more to your flexible cmat MR? Want to rebase my changes today
08:19 airlied[d]: karolherbst[d]: not at the moment, I think it has all the changes I wanted to make and passes CTS again
08:19 karolherbst[d]: cool cool
08:20 karolherbst[d]: probably will have to write my stuff from scratch tho 😄
08:25 airlied[d]: I think I at least have all the nir bits for reduce and per-element ops written as well
08:42 karolherbst[d]: yeah.. I'll see. the CTS was happy with your previous version at least
08:52 karolherbst[d]: well.. that was easy ` Failed: 0/11644 (0.0%)`
08:53 karolherbst[d]: `vk_cooperative_matrix_perf --correctness` is also happy
08:53 karolherbst[d]: well.. that's only 16x16x16 HMMA
08:53 karolherbst[d]: let's wire up everything else now 😄
08:53 karolherbst[d]: (and drop a bunch of lowering code)
09:45 karolherbst[d]: airlied[d]: mhhhhhh.. so there is an annoying issue: we can do 8x8x16, 16x8x16 and 16x8x32 INT8 IMMA ops, but if you look at a B matrix that is e.g. 32x8, it's hard to tell whether you should lower it to 16x8 or keep it as 32x8, because you have no idea how the A or accum matrices are sized
09:45 karolherbst[d]: at least inside `get_rowcol_gran` this information doesn't really exist
09:47 karolherbst[d]: the situation will get worse if we also support e.g. int4 or other weird types where just looking at the accum you might not be able to tell either
09:47 karolherbst[d]: like int4 supports 8x8x32, 16x8x32 and 16x8x64, so with a 16x8 accum you don't know what to do with it
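A worked expansion of the shapes above (assuming the usual MxNxK convention, where A is MxK, B is KxN and the accumulator is MxN):
- INT8: 8x8x16 and 16x8x16 both have a 16x8 B, while 16x8x32 has a 32x8 B — so a 32x8 B on its own could be one 16x8x32 operand or two stacked 16x8 tiles.
- int4: 8x8x32 has an 8x8 accum, but 16x8x32 and 16x8x64 both have a 16x8 accum — so a 16x8 accum on its own doesn't pin down K.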
09:48 karolherbst[d]: though that might actually be fine..
10:07 karolherbst[d]: but like the matrix sizes also don't really matter to NVK besides for matmul...
10:10 airlied[d]: Not sure how to do that unless you do a prepass to work it out from the matmuls
10:10 karolherbst[d]: could only split the matrices for matmul...
10:10 airlied[d]: Then give it the optimal MNK
10:10 airlied[d]: I have to split them for everything
10:11 airlied[d]: At least for flexible dims to work properly
10:11 karolherbst[d]: right...
10:12 karolherbst[d]: we also only have a vec16...
10:12 jja2000[d]: steel01[d]: I'm a proxy in this case, but it was suggested to me that you need to check the GPU-related clocks, and if you'd like to test U-Boot without the secure-mode measures, try abusing fusée-gelée to load U-Boot that way
10:12 karolherbst[d]: I could add a specific matmul callback, but that would suck...
10:12 karolherbst[d]: mhh maybe not
10:12 karolherbst[d]: actually...
10:13 karolherbst[d]: yeah.. should be fine to have a op specific callback
10:13 airlied[d]: I think a prepass to work out the preferred matrix sizes for the shader from the ops might work, then lower to that
10:13 karolherbst[d]: we _might_ also want to split differently for LDSM and STSM
10:13 karolherbst[d]: airlied[d]: point is.. it doesn't really matter for nvk
10:14 karolherbst[d]: like... if we use bigger sizes and then have to split for muladd that's fine
10:15 karolherbst[d]: so maybe just a second callback giving you the intrinsic would be good enough for now
10:15 airlied[d]: It might not be worth using flex dim lowering then for nvk internals
10:15 karolherbst[d]: we'd have the same problem for flex dims
10:15 karolherbst[d]: like a 8x8x64 vs 16x8x64 IMMA would be lowered differently
10:16 karolherbst[d]: former has to use 8x8x16, where the latter can use 16x8x32
10:16 airlied[d]: But we would lower flex dims to whatever granularity we specify for each matrix type
10:17 airlied[d]: So get flex dim props would list supported sizes as granularity and we would have to pick the size from nvk I suppose
10:17 karolherbst[d]: karolherbst[d]: and you don't know that just looking at the B matrix
10:18 karolherbst[d]: airlied[d]: well.. but then we have to pick between supporting flex dims on 8x8x16 or 8x8x32, but not both
10:18 karolherbst[d]: ehh
10:19 karolherbst[d]: 8x8x16 or 16x8x16
10:19 karolherbst[d]: so either we could allow 8x8x32 (and leave perf on the floor for 16x8x32) or...
10:19 airlied[d]: Have to look at what NVIDIA exposes I suppose
10:20 karolherbst[d]: I doubt they have this problem because in the end it really doesn't matter
10:20 karolherbst[d]: like.. just split muladd, everything else is whatever
10:21 karolherbst[d]: even the data layout is the same
10:21 karolherbst[d]: it's just more components on bigger matrices
10:22 karolherbst[d]: I think I'll see if adding a cb for muladd is good enough
10:22 karolherbst[d]: don't see a reason why it would cause any issues
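A minimal sketch of the muladd-callback idea in C; the struct, function name and selection heuristic here are illustrative assumptions, not the actual API of the flexible-dimensions MR:
```c
/* Hypothetical driver callback: instead of a per-matrix get_rowcol_gran()
 * that only sees one operand (where a 32x8 B is ambiguous), the pass hands
 * the driver the full MxNxK of the muladd so it can pick one hardware
 * shape for A, B and the accumulator at once. */
struct imma_shape { unsigned m, n, k; };

/* INT8 IMMA shapes from the discussion above. */
static const struct imma_shape int8_shapes[] = {
   { 8, 8, 16 }, { 16, 8, 16 }, { 16, 8, 32 },
};

static struct imma_shape
pick_int8_imma_shape(unsigned m, unsigned n, unsigned k)
{
   struct imma_shape best = { 0, 0, 0 };
   for (unsigned i = 0; i < sizeof(int8_shapes) / sizeof(int8_shapes[0]); i++) {
      const struct imma_shape *s = &int8_shapes[i];
      /* Prefer the largest shape that tiles the requested matmul evenly;
       * with the whole MxNxK in hand there is no ambiguity. */
      if (m % s->m == 0 && n % s->n == 0 && k % s->k == 0 &&
          s->m * s->n * s->k > best.m * best.n * best.k)
         best = *s;
   }
   /* best.m == 0 means nothing tiles evenly; the caller would fall back to
    * the smallest shape plus extra splitting. */
   return best;
}
```
Whether the real callback would take the raw NIR intrinsic or just the dimensions is exactly the design question being discussed here.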
10:27 karolherbst[d]: anyway.. I really want to get rid of the internal lowering because it's just a mess, it's quite a bit of code, and there's no real benefit in carrying it
10:29 karolherbst[d]: but we also don't advertise 8x16x32 or something 🙃
10:29 karolherbst[d]: ` Failed: 0/11644 (0.0%)` yeah.. figures
10:29 karolherbst[d]: let me try with 8x16x32 just to see if it blows up
10:29 karolherbst[d]: yeah...
10:30 karolherbst[d]: well.. "future me problem" (tm)
10:30 karolherbst[d]: it's good enough for all the internal lowering
10:47 karolherbst[d]: airlied[d]: does 16x16x32 int `./vk_cooperative_matrix_perf --correctness` run successfully for you on radv?
10:48 karolherbst[d]: it passes the CTS but `vk_cooperative_matrix_perf` throws an error
10:49 karolherbst[d]: looking at the nir it feels like the lowering is slightly wrong somewhere
10:56 karolherbst[d]: mhh yeah...
10:56 karolherbst[d]: internally we keep the B matrix, but flex dim lowering keeps A.. mhh
10:57 karolherbst[d]: maybe I messed it up somewhere..
11:04 karolherbst[d]: mhh that's not it, something else is odd
11:06 karolherbst[d]: ohh your lowering pass doesn't split B.. now it makes sense
11:10 karolherbst[d]: ahh yeah.. I see it now 🙂
11:18 karolherbst[d]: might be our bug heh
11:20 karolherbst[d]: mhh the split properly uses different variables.. but I wonder if something merges them for... weird reasons
12:47 karolherbst[d]: ohh pain.. gallium trace doesn't work there.. *sigh* I'm sure it's some pipe_caps fuck up... wasn't there an issue like that somewhere?
13:00 hentai: Where should I carry my stack of "nouveau: kernel rejected pushbuf: No such device" traces from 6.12.38 to?
13:01 hentai: Here is one, if anyone wants a read: https://paste.c-net.org/CrisisMotor
14:26 zmike[d]: karolherbst[d]: it should work everywhere
14:27 karolherbst[d]: yeah no, I forgot to add support for semaphore_create
18:14 karolherbst[d]: airlied[d]: okay.. I think it's your bug.. the load_store code calculates an offset of 0 for some of the lowering here
18:15 karolherbst[d]: `col_offset=1`, `offset=8` and `deref_bytes_size=16`, so `(col_offset * offset) / deref_bytes_size` is just 0
18:15 karolherbst[d]: ohh
18:15 karolherbst[d]: I know what it was...
18:15 karolherbst[d]: uhm..
18:15 karolherbst[d]: I made the same mistake...
18:15 karolherbst[d]: let me check..
18:18 karolherbst[d]: right.. vector elements
18:19 karolherbst[d]: you need to throw in a ` const unsigned vec = glsl_get_vector_elements(deref->type);` somewhere
18:19 karolherbst[d]: I _think_
18:20 karolherbst[d]: could be something else
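A hedged sketch of the kind of fix being described; the helper and the byte-based bookkeeping are assumptions on my part, while glsl_get_vector_elements()/glsl_get_bit_size() are the real Mesa helpers:
```c
#include "compiler/glsl_types.h"

/* The backing deref can be e.g. a uvec4 array while the matrix element is a
 * uchar, so sub-element offsets must not be divided by the deref element
 * size too early: with col_offset=1, offset=8, deref_bytes_size=16 the
 * expression (col_offset * offset) / deref_bytes_size truncates to 0.
 * Doing the math in bytes and splitting into (array index, byte within
 * element) only at the end keeps the remainder around. */
static void
split_byte_offset(const struct glsl_type *deref_type, unsigned byte_offset,
                  unsigned *array_index, unsigned *byte_in_elem)
{
   const unsigned vec = glsl_get_vector_elements(deref_type);
   const unsigned deref_bytes = vec * glsl_get_bit_size(deref_type) / 8;

   *array_index = byte_offset / deref_bytes;
   *byte_in_elem = byte_offset % deref_bytes;
}
```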
19:51 moth: Does "compute" (listed as WIP for all supported cards in the feature matrix) mean the usage of the GPU for non-graphics tasks?
19:53 mhenning[d]: if so, then that page is out of date
19:54 mhenning[d]: we definitely support compute shaders on everything nvc0+, I think we have support on nv50 too but I'm less certain of that.
19:57 karolherbst[d]: I think it was left as "WIP" because OpenCL support didn't exist back in the day
19:57 mhenning[d]: oh, yeah if it means opencl then that's different
19:57 karolherbst[d]: but codegen's "compute" support was always bad, soo... technically, "WIP" is a truthful statement for the gallium driver
19:58 karolherbst[d]: but GL compute also didn't really allow much
19:58 karolherbst[d]: with NVK it's an entirely different story
19:58 mhenning[d]: well, yeah codegen's known bugs are more annoying on compute
19:59 karolherbst[d]: anything non int32 is pretty broken sadly
19:59 karolherbst[d]: not sure I trust its subgroup and atomic support either 😄
20:00 karolherbst[d]: anyway
20:00 karolherbst[d]: might want to set it to done with a huge asterisk
20:27 moth: Is there compute stuff that still needs work on nouveau's end? I know C and have an RTX 2070 Super (NV164).
22:17 airlied[d]: karolherbst[d]: is your nak/flexible branch what I should debug on?
22:18 karolherbst[d]: airlied[d]: yeah, but it had the old stuff, I pushed it now
22:18 karolherbst[d]: but the tldr is that the deref can be of any type
22:19 karolherbst[d]: and the stride is based on the deref type, not the matrix
22:19 airlied[d]: makes sense
22:19 karolherbst[d]: had to deal with this stuff for LDSM as well: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36363/diffs#2811255f0166cfecac738d47dd92cdc0a7a25c96_571_572
22:20 karolherbst[d]: I didn't quite get it to work... somehow I still messed up the stride, based on your branch
22:21 karolherbst[d]: but I think using a cast and then ptr_as array messed it up in some way I don't want to think about today 😄
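For context, a rough sketch of the cast + ptr_as_array approach mentioned above; the helper is hypothetical, and picking the right stride is exactly the part that is still going wrong here:
```c
#include "nir_builder.h"

/* Reinterpret the incoming deref (e.g. a uvec4 shared-memory array) as an
 * array of the matrix element type and index it in those units.  The
 * ptr_stride passed to the cast is the matrix element size in bytes; if the
 * offsets fed in here are still in deref-type units, the result is off by
 * exactly the kind of factor discussed above. */
static nir_deref_instr *
cmat_elem_deref(nir_builder *b, nir_deref_instr *base,
                const struct glsl_type *elem_type, nir_def *elem_index)
{
   const unsigned elem_bytes = glsl_get_bit_size(elem_type) / 8;
   nir_deref_instr *cast =
      nir_build_deref_cast(b, &base->def, base->modes, elem_type, elem_bytes);
   return nir_build_deref_ptr_as_array(b, cast, elem_index);
}
```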
22:22 airlied[d]: I'll see if I can figure it out today
22:23 karolherbst[d]: it's a bit annoying that the VK CTS doesn't test any of those corner cases...
22:25 airlied[d]: well, it's a vendor extension; the fact it has CTS coverage at all is impressive
22:25 karolherbst[d]: I mean it doesn't even test it for the core feature
22:25 airlied[d]: ah could propose it as a feature request then
22:25 karolherbst[d]: like having a uvec4 shared mem array and then... loading any sort of matrix from it is just legal in core
22:26 karolherbst[d]: yeah.. should probably do so
22:33 karolherbst[d]: in the spirv it looks like this e.g.:
22:33 karolherbst[d]: `%503 = OpTypeCooperativeMatrixKHR %uchar %uint_3 %lK %lN %uint_1`
22:33 karolherbst[d]: `%514 = OpAccessChain %_ptr_Workgroup_v4uint %Bsh %513`
22:33 karolherbst[d]: `%523 = OpSpecConstantOp %uint UDiv %STRIDE_B_SH %uint_16`
22:33 karolherbst[d]: `%524 = OpSpecConstantOp %int Select %BColMajor %int_1 %int_0`
22:33 karolherbst[d]: `%525 = OpCooperativeMatrixLoadKHR %503 %514 %524 %523 MakePointerVisible|NonPrivatePointer %uint_2`
22:33 karolherbst[d]: ehh and `%Bsh = OpVariable %_ptr_Workgroup__arr_v4uint_320 Workgroup`
22:36 karolherbst[d]: Anyway.. tested it on Ampere so 16x16x32 gets lowered to 16x8x32 IMMA
22:36 karolherbst[d]: disable the `nir_lower_cooperative_matrix_flexible_dimensions` call in `nak_nir.c` to get NAKs own lowering if you want to compare at some point
22:38 karolherbst[d]: I wonder if the `u32vec4` thing makes sense for alignment reasons...
22:38 karolherbst[d]: anyway, gonna sleep 😄 good night
23:04 mhenning[d]: moth: Sure, compute shaders should already work correctly under nvk, but there's still plenty of work to do on performance and maybe some missing features
23:04 mhenning[d]: if you want to contribute, then compiling nvk is probably a good first step
23:10 snowycoder[d]: mhenning[d]: It would be helpful to have a list of missing features or performance optimizations to know the state of the project and to help newcomers
23:15 snowycoder[d]: An entry in the list that I know of would be "take functional units into consideration in instruction scheduling"; that should give a minor performance lift (but it requires documentation or RE).
23:19 mhenning[d]: snowycoder[d]: That would be helpful, yes, but it's also a lot of work.
23:19 mhenning[d]: There's been some attempt already at tracking this stuff in gitlab issues