00:58gfxstrand[d]: Ugh... I'm not seeing any QMDs in these crucible dumps
01:00nsITobin: How's the support on the 20 series cards these days?
01:01gfxstrand[d]: Should be good
01:01airlied[d]: gfxstrand[d]: the push dumper that avoids things starting with 0s is broken
01:01airlied[d]: in envyhooks
01:01gfxstrand[d]: *sigh*
01:01airlied[d]: I had to comment that out to get proper dumps. I think there is a bug in envyhooks' understanding of gpfifo
01:02nsITobin: While I primarily use AMD graphics because that is what I have, I did pick up a barely used 20 series card. I ran it on winders for a bit and it was decent except on the newest stuff, but I keep it as a backup or in case I build another system.. I can also throw it in here for testing, if testing won't burn the card out lol
01:03gfxstrand[d]: 20xx is generally decent, at least as good as anything is with NVK
01:06gfxstrand[d]: I get okay dumps out of deqp tests just not crucible. IDK why
01:07nsITobin: another question.. will the kernel modules work built in to the kernel? my kernel has amdgpu firmware built in but of course nvidia needs extras besides the firmware
01:07gfxstrand[d]: Yeah, the usual nouveau module shipped with your distro will work
01:08nsITobin: gfxstrand[d]: you're funny.. while currently I am on an EL8 userspace with a 6.10.x kernel and standalone manually managed grub I am indeed heading more and more to LFS.. so soon I WILL be my own distro ;)
01:09gfxstrand[d]: heh
01:11gfxstrand[d]: You probably want something newer than 6.10, though. That's getting kinda old
01:11gfxstrand[d]: It'll work, though.
01:14nsITobin: so far 6.7 to 6.10 works well on this machine.. 4x and 5x have IO issues that vanished at 6.7 .. really only affected wine and virtualization .. doesn't matter which one vbox vmware or kvm (just never EVER at the same time) I haven't tried anything newer but when i looked at 6.11 and plans for 6.12 i wasn't too impressed overall.. I need more C.
01:16nsITobin: anyway when I just put that card in here to do The Things(tm) i will do some poking at it .. and learn to build it. Thanks gfxstrand[d] for answering mah questions :)
01:16nsITobin: next put that card in*
03:01airlied[d]: gfxstrand[d]: https://paste.centos.org/view/raw/dc1558cc just to confirm I can read, that does say no visible vram at all doesn't it?
03:15gfxstrand[d]: Yup. So it seems.
03:16gfxstrand[d]: VRAM on Arm is hard and we're UMA so why bother? :shrug_anim:
03:19airlied[d]: coherent except the CPU can't access it 🙂
03:20airlied[d]: they do have distinct CPU and GPU RAM though
03:21airlied[d]: got like 96MB VRAM on the GPU and 480MB RAM on the CPU
03:21HdkR: Coherent, except when it isn't.
06:33mohamexiety[d]: nvlink-c2c is kinda :nervous: with gh100, it’s not really true UMA :/
06:35HdkR: Kind of like Apple Ultra effectively then :P
10:58mohamexiety[d]: gfxstrand[d]: did you try grepping for `3bc4`? that always worked for me and my crucible shaders were super simple
11:39snowycoder[d]: Buffer image load/store works, but I currently don't support min_alignment (buffers should be aligned at 0x100).
11:39snowycoder[d]: The only reason for that is that it would require an additional descriptor load + add instruction, nothing unreasonable.
11:49snowycoder[d]: Should I force 0x100 buffer alignment for Kepler cards or fix it and degrade performance for everyone?
11:52gfxstrand[d]: Buffers really should have a low alignment.
11:53snowycoder[d]: So it's better to have a bit less performance and keep low alignment, right?
11:59gfxstrand[d]: Yup
12:00gfxstrand[d]: An extra add in the shader won't cost much
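A rough sketch of the trade-off discussed above, assuming the hardware-side base can only be expressed at 0x100 granularity: the driver keeps the aligned base, and the shader adds the dropped low bits back in. The function names are hypothetical and this is not NVK's actual code.

```python
HW_ALIGN = 0x100  # alignment mentioned in the chat; used here as an assumed default


def split_buffer_address(addr, align=HW_ALIGN):
    """Split addr into (aligned_base, low_offset); align must be a power of two."""
    assert align > 0 and align & (align - 1) == 0
    low_offset = addr & (align - 1)
    return addr - low_offset, low_offset


def shader_byte_address(aligned_base, low_offset, byte_index):
    # The "extra add": the shader re-applies the low bits the aligned base lost.
    return aligned_base + (low_offset + byte_index)


base, off = split_buffer_address(0x12345)
```

With this split, the application can bind a buffer at any offset, at the cost of one extra offset load and add per access.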
12:36karolherbst[d]: okay.. how do I build the VK-GL-CTS now 🙃
12:37karolherbst[d]: upgraded to fedora 42 (g++-15.1) and it's just broken
12:37snowycoder[d]: karolherbst[d]: I recently had to use `-DCMAKE_CXX_FLAGS="-m64 -include cstdint"`
12:37snowycoder[d]: Don't really know why though
12:37karolherbst[d]: yeah...
12:38karolherbst[d]: I mean the cstdint including is missing for sure
12:38karolherbst[d]: but it also fails to compile for c++ reasons...
12:38karolherbst[d]: like one translation unit needs -std=c++20 passed in
12:42mohamexiety[d]: wow so I am not alone
12:42mohamexiety[d]: I ran into this yesterday with 14.2.1 and thought I messed up
12:43mohamexiety[d]: which is why I tested 1.4.2.0 only
12:58snowycoder[d]: snowycoder[d]: I abused `imadsp`, now I don't even need an extra add.
12:58snowycoder[d]: They gave me too much power, muhahaha
13:09gfxstrand[d]: karolherbst[d]: Oh boy... I haven't rebuilt the CTS yet. I had enough trouble last night with openrm.
13:10gfxstrand[d]: So much for C's legendary backwards compatibility. 🤡
13:10gfxstrand[d]: I get tempted to complain about Rust breaking shit but then I have to remember this happens every other Fedora release or so.
13:11gfxstrand[d]: snowycoder[d]: Glad you're having a good time. 😂
13:12gfxstrand[d]: I'm gonna have to review this mess, aren't I? :silvy_sweat:
13:18karolherbst[d]: heh
13:18snowycoder[d]: gfxstrand[d]: I'm documenting it as much as I can, I swear
13:30mohamexiety[d]: snowycoder[d]: you could even say... umad
14:32snowycoder[d]: Where can I put multisample coords offsets?
14:32snowycoder[d]: It's a small 16 bytes table needed to lower multisampling into real addresses.
14:32snowycoder[d]: codegen expected them at a fixed cbuf location that was pushed with `PUSH_DATA` (search for `NVC0_CB_AUX_MS_INFO`).
14:32snowycoder[d]: If we store it in cbuf too, something like `nir_opt_large_constants` would have been perfect, but it's called early.
14:32snowycoder[d]: (`lop3` could have been perfect if it wasn't sm70+ only)
14:41gfxstrand[d]: You'll have to stick them in the descriptor
14:41gfxstrand[d]: We already do that for images today with the Maxwell+ descriptors
14:42gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/vulkan/nvk_descriptor_set.c?ref_type=heads#L143
14:45snowycoder[d]: This way it's duplicated for each storage_image descriptor though, isn't it?
14:59gfxstrand[d]: yup
14:59gfxstrand[d]: But given how much we're already burning per-descriptor, that little table is tiny
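A minimal sketch of what such a 16-byte table could look like, assuming one (dx, dy) byte pair per sample for up to 8 samples. The layout, the offset values, and the function names are illustrative assumptions, not the layout codegen or NVK actually uses.

```python
import struct

# Hypothetical 8-sample table: one (dx, dy) byte pair per sample, 16 bytes total.
SAMPLE_OFFSETS = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 0), (2, 1), (3, 1)]


def pack_ms_table(offsets):
    """Flatten the (dx, dy) pairs into 16 raw bytes, as they might sit in a descriptor."""
    flat = [c for pair in offsets for c in pair]
    return struct.pack("16B", *flat)


def lower_ms_coord(table, x, y, sample, log2_w=2, log2_h=1):
    """Map (pixel x, pixel y, sample) to a coordinate in the underlying surface."""
    dx, dy = table[2 * sample], table[2 * sample + 1]
    return (x << log2_w) + dx, (y << log2_h) + dy


table = pack_ms_table(SAMPLE_OFFSETS)
```

Duplicating 16 bytes like this per storage image descriptor is the cost being weighed in the messages above.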
15:59gfxstrand[d]: Got an exception out of a compute shader!
15:59gfxstrand[d]: [ 4144.931430] nouveau 0000:65:00.0: gsp: Xid:13 Graphics Exception: SKEDCHECK05_LOCAL_MEMORY_TOTAL_SIZE failed
15:59gfxstrand[d]: [ 4144.931442] nouveau 0000:65:00.0: gsp: Xid:13 Graphics Exception: ESR 0x407020=0x1000 0x407028=0x0 0x40702c=0x110 0x407030=0x0
16:00gfxstrand[d]: At least that means the GPU is trying
16:35gfxstrand[d]: running 1 test
16:35gfxstrand[d]: test hw_tests::test_sanity ... ok
16:35gfxstrand[d]: test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 34 filtered out; finished in 0.06s
16:37mohamexiety[d]: on the right track!
17:20snowycoder[d]: gfxstrand[d]: Wait we can fold them in lop3 then, we can save 2 instrs and descriptor space
17:22mhenning[d]: does kepler have lop3? I thought that was sm70+
17:34snowycoder[d]: Yep I meant for sm70+
17:52gfxstrand[d]: running 1 test
17:52gfxstrand[d]: test hw_tests::test_iadd64 ... ok
17:56gfxstrand[d]: gfxstrand[d]: mohamexiety[d] airlied[d] ^^
17:57gfxstrand[d]: Kinda hard to see in all the spam
18:09mhenning[d]: karolherbst[d]: Do you know what LDGDEPBAR does?
18:09karolherbst[d]: yes
18:10karolherbst[d]: basically once it completes all previous load instructions have written to their dest regs
18:10mohamexiety[d]: gfxstrand[d]: yooo awesome!!
18:10karolherbst[d]: "completes" as in signals the scoreboard
18:10mohamexiety[d]: what did it take to get it to run?
18:11mhenning[d]: karolherbst[d]: okay, so it sounds like it's an alternative to using the write barrier. makes sense
18:11karolherbst[d]: read
18:12karolherbst[d]: ohh wait
18:12karolherbst[d]: no write 🙃
18:12karolherbst[d]: I confused myself
18:12karolherbst[d]: and yeah.. ldgdepbar signals on the write scoreboard
18:13karolherbst[d]: not really sure for what it would be helpful tho...
18:13karolherbst[d]: feels kinda pointless
18:13karolherbst[d]: mhhh
18:13karolherbst[d]: guess if you run out of scoreboards it could help
18:14karolherbst[d]: ohhhhhhhh
18:14karolherbst[d]: mhhhhhhh
18:14karolherbst[d]: to make use of it you need depbar practically
18:15mhenning[d]: how so?
18:15karolherbst[d]: you can have multiple batches of reads and all LDGDEPBAR signal on the same scoreboard
18:15karolherbst[d]: and then with depbar you can wait on a specific batch to complete
18:16karolherbst[d]: though not sure for what that's even useful
18:16mhenning[d]: I'm not sure I follow
18:16karolherbst[d]: the scoreboards are counters
18:16mhenning[d]: yeah
18:17karolherbst[d]: and with depbar you can wait on them to become a specific value or lower or whatever
18:17mhenning[d]: right
18:17karolherbst[d]: so you can have multiple groups each ended with LDGDEPBAR
18:17karolherbst[d]: and all LDGDEPBAR use the same scoreboard
18:17karolherbst[d]: then you can depbar to wait until n are done
18:17karolherbst[d]: instead of all
18:17karolherbst[d]: but again.. no idea why you'd want to do that
18:18mhenning[d]: this is where I'm lost. Does LDGDEPBAR convert the LDG to a barrier wait somehow?
18:18karolherbst[d]: it waits until all previous loads are completed
18:18mhenning[d]: earlier I thought you meant it just waits on LDGs immediately
18:18karolherbst[d]: the previous loads won't have to signal a scoreboard at all with this
18:19karolherbst[d]: LDGDEPBAR signals one for all previous loads
18:19karolherbst[d]: though maybe it's just ldg and ldgsts
18:19karolherbst[d]: well.. it's ldgdepbar so I guess it's only global loads
18:20mhenning[d]: okay, so then LDGDEPBAR doesn't actually wait, it just says to signal a barrier at some point in the future?
18:20karolherbst[d]: it signals its write scoreboard once all previous global loads are done writing to their dest register
18:21mhenning[d]: okay, makes sense
18:21karolherbst[d]: but yeah...
18:21karolherbst[d]: no idea how that's useful
18:21karolherbst[d]: maybe some cuda stuff
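A toy model of the batching idea described above, assuming the scoreboard is a plain counter and that batches complete in issue order. The class and method names are made up for illustration; this is not a statement of how the hardware actually implements it.

```python
class Scoreboard:
    """Toy counter: goes up when a batch is closed, down when that batch lands."""

    def __init__(self):
        self.pending = 0

    def ldgdepbar(self):
        # Close the current batch of global loads; one signal is now outstanding.
        self.pending += 1

    def batch_completed(self):
        # Every load in the oldest batch has written its destination register.
        assert self.pending > 0
        self.pending -= 1

    def depbar_ready(self, max_pending):
        # DEPBAR-style check: proceed once at most max_pending batches remain.
        return self.pending <= max_pending


sb = Scoreboard()
for _ in range(3):
    sb.ldgdepbar()  # three batches, all on the same scoreboard
```

Waiting with `depbar_ready(2)` then unblocks after the first batch lands, while `depbar_ready(0)` waits for all three.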
18:31gfxstrand[d]: mohamexiety[d]: Some educated guessing
18:31gfxstrand[d]: And smashing some bits I don't know what they are. :frog_upside_down:
18:31mohamexiety[d]: Yeeah…
18:34gfxstrand[d]: But the educated guessing got me a ways
18:37gfxstrand[d]: mohamexiety[d]: My current blackwell branch has everything but one line. I'm still trying to figure out what it does
18:46gfxstrand[d]: Okay, branch now works. I don't know why
18:58mohamexiety[d]: gfxstrand[d]: hey we take those wins :KEKW:
19:00gfxstrand[d]: Yeah, I need a shader that spills before I can really test some of those bits
19:03mohamexiety[d]: gfxstrand[d]: in theory wouldn't taking my regfile RE test and going with a large x like 300 or so work?
19:03gfxstrand[d]: Yeah, quite possibly
19:04gfxstrand[d]: I've been mostly focused on stabilizing the really important stuff like the program address and cbuf0
19:04mohamexiety[d]: yeah
19:04mohamexiety[d]: really nice finds!
19:04gfxstrand[d]: But it looks like most of the SSBO tests are passing
19:04mohamexiety[d]: I am also surprised the program address is now shifted :blobcatnotlikethis:
19:04gfxstrand[d]: I'm not. Blackwell has more pointer bits than Hopper
19:05gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1370114375217840289/9tblwv.png?ex=681e51f2&is=681d0072&hm=a7f825f65fd1ba6dcd392452c4ea1013accf66cfde3515d02aae7c6035855823&
19:05mohamexiety[d]: wait it does? o_o I see
19:06gfxstrand[d]: It's got 6-level page tables
19:06mohamexiety[d]: wait so how long is the address now? still 52 or do we go to 64 now :KEKW:
19:06gfxstrand[d]: like 58, I think
19:06gfxstrand[d]: You need a lot of bits if you're going to address that petabyte of VRAM on that mega-cluster.
19:06mohamexiety[d]: hmm. fair enough I see
19:07gfxstrand[d]: It's still ridiculous
19:07mohamexiety[d]: yeah :nervous:
19:07gfxstrand[d]: The PCI cards might have fewer levels. IDK. But they still have all the bits in the various fields.
19:10gfxstrand[d]: I'm a little surprised they only shifted by 4, though. program addresses are already required to be 256B-aligned.
19:10gfxstrand[d]: They could have shifted by 8
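The arithmetic behind that remark, as a small sketch. It assumes the "like 58" address width quoted above, which is unconfirmed, and the helper names are hypothetical.

```python
ADDR_BITS = 58        # "like 58, I think" from the chat; unconfirmed
PROGRAM_ALIGN = 256   # program addresses must be 256B-aligned


def encode_program_address(addr, shift):
    """Drop the low `shift` bits, which must all be zero for an aligned address."""
    assert addr % PROGRAM_ALIGN == 0 and addr < (1 << ADDR_BITS)
    assert addr & ((1 << shift) - 1) == 0
    return addr >> shift


def field_bits_needed(shift):
    # Width of the field that has to hold the shifted address.
    return ADDR_BITS - shift


top = (1 << ADDR_BITS) - PROGRAM_ALIGN  # highest aligned address in this model
```

Under these assumptions a shift of 4 needs a 54-bit field while a shift of 8 would need only 50, and no information is lost either way because the low 8 bits are always zero.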
19:10gfxstrand[d]: Test run totals:
19:10gfxstrand[d]: Passed: 12130/12160 (99.8%)
19:10gfxstrand[d]: Failed: 30/12160 (0.2%)
19:10gfxstrand[d]: Not supported: 0/12160 (0.0%)
19:10gfxstrand[d]: Warnings: 0/12160 (0.0%)
19:10gfxstrand[d]: Waived: 0/12160 (0.0%)
19:10gfxstrand[d]: That's `dEQP-VK.ssbo`
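For reference, the percentages in that summary follow directly from the counts. A quick check, with a helper name chosen only for illustration:

```python
def percent(count, total):
    """Share of total as a percentage, rounded to one decimal like the dEQP summary."""
    return round(100.0 * count / total, 1)


passed, failed, total = 12130, 30, 12160
pass_rate = percent(passed, total)
fail_rate = percent(failed, total)
```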
19:12mohamexiety[d]: :vibrate: that's some big progress
19:17gfxstrand[d]: I think something fishy is going on with UGPRs
19:19mhenning[d]: Maybe the ugpr file size matches the number of gprs we request?
19:20mhenning[d]: I was honestly surprised that the ugpr file size is static on earlier gens
19:29gfxstrand[d]: mhenning[d]: I don't think so but also I have no way to know yet without docs.
19:29gfxstrand[d]: mhenning[d]: I'm not that surprised. The whole UGPR file cost 2 registers. It didn't affect occupancy that much. It's a little worse now but not massively.
19:29gfxstrand[d]: AMD does the same thing
19:30gfxstrand[d]: (We won't talk about Intel)
19:32mhenning[d]: gfxstrand[d]: yeah, that's fair
19:33mhenning[d]: gfxstrand[d]: well, you could request all the registers and see if that fixes things
19:35gfxstrand[d]: I found a SSBO fail that doesn't use that many UGPRs so I'm gonna try to fix that first
19:38gfxstrand[d]: imnmx now has a pile of predicates
19:41airlied[d]: Would also not be shocked if we have a few more places that need urZ forced
19:46gfxstrand[d]: gfxstrand[d]: Yeah, imnmx now returns 2 predicates and takes an extra one. No idea what they all do
19:47gfxstrand[d]: fmnmx has not grown extra predicates
19:47gfxstrand[d]: Also, IMnMx now has a 64-bit mode
19:48gfxstrand[d]: I wonder if ISetP does
19:52gfxstrand[d]: Doesn't look like it
19:52airlied[d]: I can probably answer those questions if I can find out where I left the files 🙂
19:53airlied[d]: oh maybe VIMNMX
19:53airlied[d]: that takes two extra predicates vs IMNMX
19:54airlied[d]: one is for whether the resultant value(s) came from the lower SIMD lane(s) of Ra, and one is for the upper SIMD lane(s)
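One way to picture those two predicates is a software model of a packed 2x16-bit unsigned min. The lane width, the unsigned compare, and the tie-break toward Ra are all assumptions made for illustration, and the function name is hypothetical.

```python
def packed_min_u16x2(ra, rb):
    """Return (result, lo_from_ra, hi_from_ra) for a 2x16-bit unsigned min."""
    lanes = []
    preds = []
    for shift in (0, 16):
        a = (ra >> shift) & 0xFFFF
        b = (rb >> shift) & 0xFFFF
        from_a = a <= b  # tie-break toward Ra is an assumption
        lanes.append(a if from_a else b)
        preds.append(from_a)
    result = lanes[0] | (lanes[1] << 16)
    return result, preds[0], preds[1]


res, lo_from_ra, hi_from_ra = packed_min_u16x2(0x00050009, 0x00070003)
```

Here the low lane takes its value from Rb and the high lane from Ra, so the two predicates differ.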
19:55gfxstrand[d]: Regular IMNMX also returns 2 predicates
19:58airlied[d]: docs are not documenting that then 🙂
19:59gfxstrand[d]: I hate that all the failing SSBO tests are thousands of instructions.
20:00airlied[d]: there appears to be a new {?pm_pred} on every instruction, no idea what it does
20:00airlied[d]: it's in with all the req/rd/wr stuff
20:23gfxstrand[d]: I think I might give up on these last 10 tests and find something else to debug.
20:32airlied[d]: gfxstrand[d]: does serial fix them?
20:32gfxstrand[d]: nope
20:33gfxstrand[d]: It's a register stomp somewhere
20:40gfxstrand[d]: I'm gonna push all UGPRs above 63 and see what fails
21:00gfxstrand[d]: OMG these shaders are pathetic. We have so much address optimization to do. 😳
21:02gfxstrand[d]: gfxstrand[d]: Okay, that does cause a lot more fails
21:04gfxstrand[d]: mhenning[d]: I don't think this is quite the case but I'm starting to wonder if there's some funky aliasing going on or something. Like maybe we have to avoid r0 if we use ur64+.
21:07gfxstrand[d]: Ugh... Earlier writes are fine and they also use ur76/77/78
21:08gfxstrand[d]: I've got nothin'
21:08gfxstrand[d]: I'm going home
21:24karolherbst[d]: gfxstrand[d]: yeah
21:24karolherbst[d]: I'm sure if nobody else figures those out, I'll have to, to make coop matrix go brrr 🙃
21:26gfxstrand[d]: There's a LOT of stupid we could get rid of before we worry about new instruction forms. SO MANY MOVS!
21:27karolherbst[d]: there is a patch for that
21:27karolherbst[d]: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/5e1e925cc6b71464375723206069a05f3908bd02
21:27karolherbst[d]: that should get rid of most of them
21:41airlied[d]: gfxstrand[d]: by the end of my branch I had proper ureg address calcs working, but probably need to do more validation
22:03gfxstrand[d]: I've got most of your patches in a big squash. I'm picking things out and cleaning them up as it makes sense to do so.
22:05airlied[d]: the address work is not in the blackwell stuff, it's in the coop mat fun
22:06airlied[d]: not sure I've got enough spite to make gh100 work with the limitations it has
22:11mhenning[d]: karolherbst[d]: I'm not a big fan of the "One big change is to ignore phi webs for vectors." part of this. I tried that when I was writing the phi web code and it made things worse on average for shaderdb (although it did improve some shaders)
22:11karolherbst[d]: yeah... I didn't dig into the details of the patch
22:12mhenning[d]: I've been hacking a little at vector RA heuristics this week, still have more ideas to try out
22:12karolherbst[d]: however, there is a giant issue with movs around vectors, and if there are better solutions for it, then great
22:12karolherbst[d]: cool
22:12mhenning[d]: yeah, we definitely need improvement, I just think that some of the details of that patch are a little too naive
22:12karolherbst[d]: it's just that getting rid of those movs gives +20% perf in coop benchmarks 🙃
22:13karolherbst[d]: yeah.. possibly
22:14airlied[d]: arrgh getting illegal red encoding on an int, but disasm is fine
23:58gfxstrand[d]: Doing a full run to level set again. It'll probably die in an hour or two. As long as nothing catches fire, I don't care.