01:36gfxstrand[d]: gfxstrand[d]: Also note that `bound` is intentionally non-uniform. UGPRs don't count towards the limit (at least they didn't on Turing) so we want to ensure that we get lots of GPRs
02:11tiredchiku[d]: mhenning[d]: I suppose, yeah 😅
03:14x512[m]: tiredchiku[d]: Removed nvRmApi* ctl/dev separation. Currently all ioctl calls go to ctl node (/dev/nvidiactl).
03:27airlied[d]: mhenning[d]: looks like there is an .lb.lc it's just hidden somewhere else
03:28airlied[d]: 04007f60 4a040602 0d1e3f02 00016400
03:30airlied[d]: bit 59 seems part of it
03:33mhenning[d]: ah, it's possible I missed it
03:36airlied[d]: I only found it while trying to work out 3d/lb/ulc, and found that nvidia used lb/lc
04:38tiredchiku[d]: x512[m]: okay
04:51gfxstrand[d]: So... I just realized that I think we can do an SSBO bounds check in a single IAdd3 instruction on Turing+.
04:52gfxstrand[d]: It'll require figuring out how to optimize to that but I think opt_algebraic can do it.
04:54orowith2os[d]: How does this affect the US economy and how large of an impact does it have on the security of the federal government :ferrisHmm:
04:54gfxstrand[d]: `iadd %offset %load_size -%ssbo_size` and take the carry bit.
04:54gfxstrand[d]: Who needs comparison instructions when you have carry bits?
04:58gfxstrand[d]: With that plus predication, we should be able to get bounds-checked loads and stores pretty cheap.
04:58gfxstrand[d]: Not as cheap as skipping altogether, of course, but a lot cheaper than they are today.
05:12airlied[d]: okay my branch has what I think is the correct truth table for the LOD modes
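[editor's note] The carry-bit bounds check described above can be sketched in software. This is a minimal model, not NAK or hardware code: `iadd3_carry` and `ssbo_access_out_of_bounds` are hypothetical names, and the 32-bit three-input add is simulated with Python integers. One deliberate deviation from the message: it feeds the bitwise complement of the size rather than its arithmetic negation, because with plain two's-complement negation the carry would already fire when `offset + load_size == ssbo_size`, which is still a valid access. How the real IADD3 negate modifier and carry-out behave is not modelled here.

```python
MASK32 = 0xFFFF_FFFF


def iadd3_carry(a, b, c):
    """Model a 32-bit three-input add (hypothetical helper).

    Returns (low 32 bits of the sum, True if the sum carried out of bit 31).
    """
    total = (a & MASK32) + (b & MASK32) + (c & MASK32)
    return total & MASK32, total > MASK32


def ssbo_access_out_of_bounds(offset, load_size, ssbo_size):
    """True when [offset, offset + load_size) does not fit in ssbo_size bytes.

    Assumption: the size is passed as its bitwise complement (~size), so the
    carry is set exactly when offset + load_size > ssbo_size. A wrap-around
    in offset + load_size also produces a carry, so it is rejected too.
    """
    _, carry = iadd3_carry(offset, load_size, ~ssbo_size & MASK32)
    return carry


if __name__ == "__main__":
    # A 4-byte load ending exactly at the end of a 16-byte buffer is allowed.
    print(ssbo_access_out_of_bounds(12, 4, 16))
    # One byte further is rejected.
    print(ssbo_access_out_of_bounds(13, 4, 16))
```

The resulting flag is what a predicated load or store would be guarded by, so no separate comparison instruction is needed in this model.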
05:31gfxstrand[d]: Woo
05:32gfxstrand[d]: There's also two extra LOD modes on Kepler. I doubt they're related to Blackwell, though.
05:35airlied[d]: just got shadows and tg4 offsets failing in dEQP-VK.glsl.tex*
05:35airlied[d]: oh and some weirdness with two ms texture size queries crashes the gpu
11:37mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1360217173301264424/image.png?ex=67fa5075&is=67f8fef5&hm=134f5bd9487e18a1f47b5f8504f6eaf6c4b8463f790a6a3a5fc46ee482f3dc8c&
11:37mohamexiety[d]: hm interesting, Spark seems to have RT cores
11:38x512[m]: Why not RISC-V? :(
12:16avhe[d]: Is there info on what their DGX OS is based on?
12:16avhe[d]: NVRM kernel or tegra kernel specifically
12:30mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1360230485225242735/image.png?ex=67fa5cdb&is=67f90b5b&hm=960da26465942e713b565ae908ff7241986a3e9594386bca471cc86d784b8efa&
12:30mohamexiety[d]: avhe[d]: it's not clear 😦
12:30mohamexiety[d]: but it does look to be NVRM
12:31mohamexiety[d]: in the video he says "the same OS as our DGXs run in datacenter" which would aid that too as I'd imagine GH100/GB200/etc use NVRM
12:31x512[m]: NVRM has Tegra and SoC mentions in its source code.
12:31avhe[d]: mohamexiety[d]: good news imo
12:32avhe[d]: the tegra software stack is a bit dodgy
12:32mohamexiety[d]: yeah
12:32avhe[d]: x512[m]: Yes that was discussed some time ago. Apparently they use a display driver from nvrm on some tegras
18:39rasterbandgap: 366+328+324+332+−40+362+322−512−72−144−144 the trick actually is based off that the hash is formed so that it has high and low, where high and slightly smaller term, i.e almost two equal halves. So that one can cancel itself and collide into smaller high or low to yield a value. but it is very simple to generate since the smaller term has to result in 144 to cancel itself from this
18:39rasterbandgap: arithmetic. 362+322+332−512−144−144−72=144 (because the subsequent arithmetic removes 1024 36 and 144 from it), that minus 40 is just a hack. to manipulate spectrum. since after 144 removes itself from subsequent arithmetic the terms subtracted by 1024 were -8 and -6, hence it reverses the polarity and has 40+40+2 as result so 82 is gotten like this:
18:39rasterbandgap: 366+328+324+332+−40+362+322-144-144-72-512=1122 1994+1122-72-144-1024=1912 and 1994-1912=82 where as the other way around is -82. I do not know what your arrogant asses are trying to prove here, or what economy are you talking about , however it works on bigger sequences too, cause when the douplet of terms is treated with a mod, as after subtracting element buffers as modified procedure
18:39rasterbandgap: or derivate of previous should get to 366+328+324+332+362+322−2−144−1024−144−144−144−72−72−144−144=0, and to merge the contents 2034−512−72−144−144−512−72−144−144−328+80+40 , but if one wants to avoid 40 which comes as 80/2 with bitshift, such as dma or gpu codes would lack bitshift however gpu has mul 0.5 always present, but for dma half spectrum could be used to have 42 used as max,
18:39rasterbandgap: it's longer story you can use odd's too and have two cells for single value etc. But all the trick adheres to the fact that you lock the values together at constant difference such that 366+328+324-144-144-72-512=146 and 332+322+362-144-144-72-512=144 and proportionally you compile so to cancel with the help of those while allowing selection to succeed as per one or more results fetched.
20:35gfxstrand[d]: How do I enable VM logging again?
20:35rasterbandgap: There is no whole lot of fixes needed to these ideas as they should be in working condition, and the technology was brought to you by Mart Martin the most violated person in the country of Estonia, his hardowrking pribate labs, and there will be war started against those people i promise. Same wise i mention again that I have not had issues against Russians at all, and according to the
20:35rasterbandgap: proportional assault clause in law for jack, i said already clearly, i go back and kill him, because i have lifetime neck injury and i had a man's neck which he does not have so proportional attack from behind is a kill off anyways, but if he turns himself in for the jail for couple of years he survives, he has to do time there or he's soon dead. couple similar offers to others like Alex
20:35rasterbandgap: enrico sif , but nastiest terrorists are all finished during next few years. So this is a clear signal for the future, that they were not allowed to terror a childhood champion to cripple.
20:36gfxstrand[d]: gfxstrand[d]: I strongly suspect something page table related is going on with Kepler images.
20:49redsheep[d]: What did the GPU from New York say when it ran out of TLB?
20:49redsheep[d]: I'm walkin' here!
23:53butterflies[d]: avhe[d]: Orin uses NVRM for display and nvgpu for gpu
23:53butterflies[d]: And there's an official NV way to use Orin (w/ full feature set) with a stock kernel but it involves quite some DKMS modules