02:24 worldhacker: hi all
09:13 anonymous: Hi, a noob question here: does nouveau work stably on the 40 series (Ada Lovelace) desktop GPUs?
16:49 mhenning[d]: snowycoder[d]: I suppose we could ask, although there's no guarantee we'd get anything, and if we do get anything it'll likely be given to red hat under nda.
16:50 karolherbst[d]: mhhhhh
16:50 karolherbst[d]: stuff for older gens is generally more difficult because it draws resources from newer stuff
16:50 karolherbst[d]: but yeah.. could ask...
16:50 karolherbst[d]: so far still waiting on the ampere/ada ones
16:51 karolherbst[d]: and with ampere/ada I mean ones that are correct for non SM80 ampere 😄
19:35 TheHypervisor[m]: This is a bit out of scope for Nouveau but how does NVIDIA implement PCIe 1.0 x1 on their CMP/Mining GPUs? There are a lot of these GPUs on the used market for cheap and I've been wondering if this is just a firmware/driver restriction as that would make these an amazing value if the clocks and PCIe version could be upgraded.
19:36 TheHypervisor[m]: Does Volta use the GSP?
19:39 gfxstrand[d]: Volta does not use GSP. Volta+Nouveau is a very expensive, very hot paperweight.
19:47 HdkR: It's shiny though.
19:48 HdkR: Has the weird bronze/gold colouring that NVIDIA loves :P
19:49 TheHypervisor[m]: Lol I can't tell if that's a good or bad thing for the prospects of using the CMP 100HX for gaming in a VM
19:50 TheHypervisor[m]: I imagine they'd implement the FP64 nerf in the GSP, right?
20:08 gfxstrand[d]: I think fp64 is fine but we can't reclock it so it's slooooow
20:09 gfxstrand[d]: And fp64 doesn't matter for games
20:09 gfxstrand[d]: But reclocking matters for everything.
20:11 mohamexiety[d]: Isn’t CMP 100HX ampere?
20:11 mohamexiety[d]: It’s cut down A100 iirc
20:12 mohamexiety[d]: 8GB of HBM (to make sure it’s useless for AI), 1.0x1 (to make sure it’s useless for anything serious), no display out (duh), and no NVLink so you don’t get any funny ideas :KEKW:
20:13 mohamexiety[d]: Oh nah it’s Volta. TIL
20:18 mohamexiety[d]: (CMP 170HX is the ampere one. If you don’t mind potentially completely wasting the money could look into that as it should have the GSP. Note though that this is based on A100 Ampere not the consumer lineup)
20:31 TheHypervisor[m]: <mohamexiety[d]> "8GB of HBM (to make sure it’s..." <- x1 can be fixed... (full message at <https://matrix.org/oftc/media/v1/media/download/ATGhwidAqSShuQ_lndSVNH1LUiZWLCSnFVMzupF0e5IQYYxfZw5prDoNP-drkr1dYWv9BbLItVNz5VQnK3KCiwtCeYSG7fiAAG1hdHJpeC5vcmcvUGJyQ21vU2tuclRVYldHc1JsWm9GTkNz>)
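To put the PCIe 1.0 x1 link in perspective, here is a back-of-the-envelope sketch of per-direction link bandwidth. It assumes only the nominal per-lane signalling rates and line encodings of each PCIe generation (2.5 GT/s with 8b/10b for 1.0, 8 GT/s with 128b/130b for 3.0, and so on) and ignores packet and protocol overhead, so real-world throughput is lower. The function name `link_bandwidth_gbps` is hypothetical, for illustration only.

```python
# Rough per-direction PCIe link bandwidth from nominal rates and encodings.
# ASSUMPTION: protocol/packet overhead is ignored, so these are upper bounds.

GENERATIONS = {
    # gen: (gigatransfers per second per lane, encoding efficiency)
    "1.0": (2.5, 8 / 10),      # 8b/10b encoding
    "2.0": (5.0, 8 / 10),      # 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
    "4.0": (16.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gbps(gen: str, lanes: int) -> float:
    """Usable bandwidth in gigabytes per second, one direction."""
    gt_per_s, efficiency = GENERATIONS[gen]
    # Each transfer moves one bit per lane; divide by 8 to get bytes.
    return gt_per_s * efficiency * lanes / 8


if __name__ == "__main__":
    x1 = link_bandwidth_gbps("1.0", 1)
    x16 = link_bandwidth_gbps("3.0", 16)
    print(f"PCIe 1.0 x1:  {x1:.3f} GB/s")
    print(f"PCIe 3.0 x16: {x16:.2f} GB/s ({x16 / x1:.0f}x more)")
```

With these assumptions a 1.0 x1 link tops out at 0.25 GB/s, roughly 63 times less than a 3.0 x16 slot, which is why anything that streams data over the bus suffers on these cards.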
21:28 mohamexiety[d]: TheHypervisor[m]: I know that FMA in general is gimped on these cards, not just F64 FMA so not sure how it will work out for games but I guess worth a try :thonk:
21:28 mohamexiety[d]: but yeah with idle clocks (nouveau on Volta) you have bigger issues to worry about re perf
21:29 TheHypervisor[m]: mohamexiety[d]: How would it work out on Ampere or Turing with proper reclocking?
21:29 mohamexiety[d]: it would just work. the card will get to full clocks and all that
21:30 TheHypervisor[m]: Even F64?
21:31 TheHypervisor[m]: So something like the CM50 just becomes a cheap 2080 ti with 10 GB of VRAM at PCIe 1x?
21:31 mohamexiety[d]: oh if you mean the nerfs, I am not sure honestly. it depends if they're HW level/firmware level or just done in the driver. if they're HW level or firmware level then FMA will remain nerfed even on nouveau since all that stuff isn't handled by us
21:35 TheHypervisor[m]: Turing firmware signing got bypassed a while back so I wonder if I can flash a modified RTX 2080 Ti firmware
21:37 TheHypervisor[m]: Huh, even Ada Lovelace has a bypass
21:37 TheHypervisor[m]: But I don't know if it's a full signing bypass
21:59 karolherbst[d]: you can't really flash the firmware, only flash the vbios
22:04 austriancoder: I am looking at NIL and wondering what unit GOBs represents
22:09 karolherbst[d]: pixels
22:09 karolherbst[d]: or well..
22:09 karolherbst[d]: a specific amount of them
22:12 sonicadvance1[d]: GOB: A collection of like-minded pixels.
22:12 karolherbst[d]: austriancoder: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/nil/tiling.rs#L15
22:13 karolherbst[d]: maybe it's bytes..
22:14 austriancoder: ahh.. thx
22:14 sonicadvance1[d]: I think it doesn't necessarily mean pixels since storage buffers can be in GOB format these days?
22:14 karolherbst[d]: yeah..
22:14 karolherbst[d]: but GOBs are like the smallest part in the tiling layout
22:15 karolherbst[d]: maybe "element" is the better phrase?
22:15 karolherbst[d]: there are more comments on the tiling below
22:16 sonicadvance1[d]: I was super excited about blackwell gaining support for tiled storage buffers, but I personally don't have a use for it anymore 😄
22:17 karolherbst[d]: 😄
22:17 karolherbst[d]: I mean...
22:17 karolherbst[d]: ~~could always use them as 1D images~~
22:18 mohamexiety[d]: austriancoder: it's Group Of Bytes. basically a bunch of bytes; 512B, arranged in a 64x8 layout on all current discrete GPUs.
22:19 mohamexiety[d]: Tegra has a different layout but I _think_ it's still 512B/64x8 just arranged differently. it's documented in the TRM for each SoC though
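A small sketch of the coarse GOB arithmetic for the discrete-GPU layout (512 bytes as 64 bytes wide by 8 rows tall). It also shows why "bytes" is the right unit: how many pixels fit in one 64-byte GOB row depends on the format (16 RGBA8 pixels, 64 R8 pixels). Assumptions: the byte ordering inside a GOB is swizzled by the hardware and is not modelled here, and real allocations group GOBs into larger blocks with extra padding, which this also ignores. The helper names are hypothetical and are not NIL's API.

```python
# Coarse GOB ("Group Of Bytes") arithmetic for the discrete-GPU layout:
# 512 bytes arranged as 64 bytes wide x 8 rows tall.
# Intra-GOB byte swizzling and block grouping/padding are NOT modelled.

GOB_WIDTH_BYTES = 64
GOB_HEIGHT_ROWS = 8
GOB_SIZE_BYTES = GOB_WIDTH_BYTES * GOB_HEIGHT_ROWS  # 512


def gobs_for_surface(width_px: int, height_px: int, bytes_per_px: int) -> tuple[int, int]:
    """Number of GOBs needed horizontally and vertically (rounded up)."""
    row_bytes = width_px * bytes_per_px
    gobs_x = -(-row_bytes // GOB_WIDTH_BYTES)  # ceiling division
    gobs_y = -(-height_px // GOB_HEIGHT_ROWS)  # ceiling division
    return gobs_x, gobs_y


def gob_coords(x_px: int, y_px: int, bytes_per_px: int) -> tuple[int, int]:
    """Column and row of the GOB that holds pixel (x_px, y_px)."""
    return (x_px * bytes_per_px) // GOB_WIDTH_BYTES, y_px // GOB_HEIGHT_ROWS
```

For example, `gobs_for_surface(1920, 1080, 4)` returns `(120, 135)`: 16,200 GOBs, or exactly 1920 × 1080 × 4 bytes, because both dimensions happen to divide evenly.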
22:19 karolherbst[d]: tegra has a different layout?
22:19 karolherbst[d]: pain
22:20 mohamexiety[d]: it's arranged differently for sure, but no clue if the older SoCs had even more differences
22:20 mohamexiety[d]: when starting out with host copy we used the Orin TRM as a reference and it was wrong
22:20 karolherbst[d]: how does that work out in regards to modifiers tho?
22:21 mohamexiety[d]: Tegra has different modifiers
22:21 sonicadvance1[d]: Orin/Thor should have gained support for Desktop tiling format at least.
22:21 karolherbst[d]: aren't those just desktop GPUs 😛
22:21 mohamexiety[d]: weeell :KEKW:
22:21 sonicadvance1[d]: :>
22:22 mohamexiety[d]: Orin is sm_87 rather than _86, and Thor... isn't out yet but we have die shots and it's really weird
22:22 karolherbst[d]: yeah I mean funky shader stuff, because they need to do AI/ML in fast, but don't want it on consumer cards 😛
22:22 karolherbst[d]: probably
22:22 karolherbst[d]: or they have even weirder reasons
22:22 mohamexiety[d]: Thor should be the same thing as Spark/GB10, and the released die shot/CGI from NVIDIA shows a design that's kind of a hybrid between GB200 (SMs and compute, with cut down tensor cores?) and GB20x (graphics parts)
22:23 mohamexiety[d]: shouldn't have to wait long to find out for real at least. allegedly they're finally launching July 22
22:23 karolherbst[d]: apparently sm_87 has more shared memory
22:23 sonicadvance1[d]: A month late but better than usual.
22:24 karolherbst[d]: I should start looking into the tensor stuff, but I don't have any GPU for it 😄
22:24 karolherbst[d]: at least I know how it works, lol
22:25 mohamexiety[d]: wait didn't you already do that?
22:25 karolherbst[d]: no?
22:25 mohamexiety[d]: huh wasn't that the coop matrix stuff? 😮
22:25 karolherbst[d]: nope
22:25 karolherbst[d]: tensors use special instructions which are well.. tensor instructions
22:26 karolherbst[d]: https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#hopper-instruction-set
22:26 karolherbst[d]: "Tensor Memory Access Instructions"
22:27 mohamexiety[d]: oh that's not relevant is it
22:27 mohamexiety[d]: I thought it was exclusive to sm_100/103 only
22:27 mohamexiety[d]: so GB200/GB300 only
22:27 karolherbst[d]: blackwell also got "tmem" yeah
22:28 mohamexiety[d]: consumer blackwell?
22:28 karolherbst[d]: no clue
22:28 karolherbst[d]: there is "Uniform Matrix Multiply and Accumulate" with blackwell
22:28 karolherbst[d]: which is a tensor op
22:28 karolherbst[d]: it's all kinda weird
22:29 karolherbst[d]: pre-Blackwell, all the MMA stuff is just.. well.. plain global memory load/stores
22:29 karolherbst[d]: and you have a fancy ALU doing a thing
22:29 karolherbst[d]: (and ldsm which is just crazy)
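As a reference for what the "fancy ALU" computes once the operands have been loaded (via ordinary loads or ldsm), here is a plain-Python sketch of the matrix multiply-accumulate semantics, D = A × B + C on small fixed-size tiles. The m16n8k16 shape is one of the shapes PTX's `mma` instruction exposes, picked here only for illustration. Assumptions: real instructions restrict data types (e.g. f16 inputs with f16 or f32 accumulation) and distribute each tile across the registers of a 32-thread warp, neither of which this sketch models. `mma_reference` is a hypothetical name.

```python
# Reference semantics of a tile matrix multiply-accumulate: D = A x B + C.
# Shape chosen for illustration only; data types and the per-thread register
# distribution used by real hardware are not modelled.

M, N, K = 16, 8, 16


def mma_reference(a, b, c):
    """a: MxK, b: KxN, c: MxN (lists of lists). Returns d = a @ b + c."""
    assert len(a) == M and all(len(row) == K for row in a)
    assert len(b) == K and all(len(row) == N for row in b)
    assert len(c) == M and all(len(row) == N for row in c)
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(K)) for j in range(N)]
        for i in range(M)
    ]
```

Multiplying by an identity A with a zero C returns B unchanged, which makes a convenient sanity check when comparing against hardware output.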
22:30 karolherbst[d]: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/6a661dada5b7bfb5df350b5dacfd8c31392f27a7
22:47 mhenning[d]: mohamexiety[d]: yeah, I think consumer blackwell got it. that's why they're trying to push the neural rendering garbage so hard
22:48 mohamexiety[d]: the PTX docs implied it was 100a/101/103 exclusive so I am not sure. the neural rendering stuff comes from another thing, I think: consumer blackwell allows for concurrent vector and tensor within a SMSP now. on previous archs it was one or the other
22:49 mohamexiety[d]: thing with tensor memory is it's _a lot_ of SRAM
22:50 mohamexiety[d]: and while it's not exactly a clear-cut comparison, because the tensor cores themselves are a lot bigger, a single die in GB200 (the full GPU is dual-die) is reticle-sized... with only 74 SMs. GB202 is 750 mm² and has 192 SMs
22:51 mohamexiety[d]: so I don't think consumer Blackwell has the ~1MB of extra SRAM per SM given the amount of SMs for the die size
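The die-size argument can be made concrete as area per SM. This sketch uses the SM counts and the GB202 area from the chat; the one number not in the chat is the reticle size, taken here as the ~858 mm² (26 mm × 33 mm) lithography exposure field, which is an assumption and may differ from the actual GB200 die area. Like the original comparison, it is crude: neither figure subtracts memory controllers, I/O, caches or other non-SM area.

```python
# Back-of-the-envelope die area per SM, from the figures quoted in the chat.
# ASSUMPTION: "reticle sized" is taken as the ~858 mm^2 (26 mm x 33 mm)
# exposure field; the real GB200 die area may be somewhat smaller.

RETICLE_MM2 = 26 * 33   # 858 mm^2, assumed area of one GB200 die
GB200_DIE_SMS = 74      # SMs per GB200 die, as stated in the chat

GB202_DIE_MM2 = 750     # as stated in the chat
GB202_SMS = 192         # as stated in the chat


def mm2_per_sm(area_mm2: float, sms: int) -> float:
    """Average die area per SM (includes all non-SM area, so very rough)."""
    return area_mm2 / sms


datacenter = mm2_per_sm(RETICLE_MM2, GB200_DIE_SMS)
consumer = mm2_per_sm(GB202_DIE_MM2, GB202_SMS)
print(f"GB200 die: ~{datacenter:.1f} mm^2 per SM")
print(f"GB202:     ~{consumer:.1f} mm^2 per SM")
print(f"ratio:     ~{datacenter / consumer:.1f}x")
```

Under these assumptions each datacenter SM gets roughly 11.6 mm² versus about 3.9 mm² on GB202, close to three times as much area, which leaves room for bigger tensor cores and a large extra SRAM on the datacenter part and much less on the consumer one.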
22:58 mhenning[d]: oh, that's possible
23:27 gfxstrand[d]: austriancoder: Group of Bytes. It's 64 bytes by 8 rows.
23:43 gfxstrand[d]: mhenning[d]: Neural rendering, as far as I can tell, is just hmma.