03:22fdobridge: <gfxstrand> That's in vkcube?!? Why is vkcube taking derivatives?!?
03:23fdobridge: <gfxstrand> @Mr Fall🐧 I'm gonna assume you're asleep right now so I don't expect an answer right away but did you make any progress on reclocking?
06:46fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I just used `NVK_USE_NAK=all` 🐸
08:38fdobridge: <DadSchoorse> does nvidia have special hw for derivatives or do they use subgroup ops like amd?
09:30fdobridge: <karolherbst🐧🦀> what do you mean by using subgroup ops?
10:17fdobridge: <karolherbst🐧🦀> I'm not sure if it's possible at all with the current code... I can get the core to reclock, but not memory
10:20fdobridge: <karolherbst🐧🦀> @gfxstrand if you have a benchmark only needing high core clocks (like pixmark_piano, but for vulkan) this should be good enough: https://gitlab.freedesktop.org/karolherbst/nouveau/-/commit/abc4e378ad5cff28b96077ca65c6431f907ac528
10:21fdobridge: <karolherbst🐧🦀> I tried to boot our old PMU firmware image, but uhhh.. that stuff is highly incompatible with how we are doing things these days
10:22fdobridge: <karolherbst🐧🦀> might be possible somehow tho
10:24fdobridge: <karolherbst🐧🦀> maybe some odd lowering?
10:29fdobridge: <Ella> it's a hack for lighting
11:22fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> ?
11:51fdobridge: <Ella> taking derivatives of the texture coordinates gives you tangents. taking the cross product of the tangents gives you the surface normal
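The trick Ella describes can be sketched in a few lines. This is a minimal illustration, not vkcube's actual shader: it assumes a 3D per-pixel quantity sampled at three pixels of a 2x2 quad, and the helper names (`quad_normal` and friends) are made up for the example.

```python
import math

def sub(a, b):
    """Component-wise a - b for 3-vectors."""
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

def quad_normal(p00, p10, p01):
    """Approximate a surface normal from screen-space differences.

    p00 is the value at a pixel, p10 the value one pixel to the right,
    p01 the value one pixel down (hypothetical layout for this sketch).
    """
    ddx = sub(p10, p00)  # what fddx would return
    ddy = sub(p01, p00)  # what fddy would return
    return normalize(cross(ddx, ddy))
```

For a flat surface in the z = 0 plane, `quad_normal((0, 0, 0), (1, 0, 0), (0, 1, 0))` points along +z.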
12:03fdobridge: <karolherbst🐧🦀> but anyway.. adding support for fddx/fddy should be trivial
12:44fdobridge: <DadSchoorse> Amd uses v_sub and subgroup quad swizzle for it.
12:45fdobridge: <karolherbst🐧🦀> mhh, yeah.. I was under the impression we have special instructions for that, but seems like nvidia ditched it and it's some subgroup stuff + shuffle
12:51fdobridge: <DadSchoorse> Amd had a special tex instruction on terascale.
13:06fdobridge: <karolherbst🐧🦀> yeah.. not sure if ours is a tex instruction, but it's called `dfdx/dfdy` afaik
13:08fdobridge: <karolherbst🐧🦀> anyway... it's like 2 or 3 instructions on maxwell after they dropped it
13:08fdobridge: <karolherbst🐧🦀> we have a special modifier on shuffle to do this efficiently
13:15fdobridge: <Mohamexiety> bit of a dumb question, sorry, but how does this give you a derivative? I can understand v_sub, but how does the shuffle/swizzle after that take you there? (edited)
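One way to picture the answer to the question above: a screen-space derivative is just a finite difference between neighbouring pixels in a 2x2 quad, and the swizzle is what lets each lane read its neighbour's value before the subtract. The simulation below is a sketch under assumptions: lanes 0..3 of each quad are taken to be top-left, top-right, bottom-left, bottom-right, and the function names are invented here. It does not model AMD's actual instruction encoding or the NVIDIA shuffle modifier.

```python
def quad_swizzle(values, pattern):
    """Lane i reads the value held by lane pattern[i % 4] of its own quad."""
    return [values[(i & ~3) + pattern[i & 3]] for i in range(len(values))]

def ddx_fine(values):
    """Right neighbour minus left neighbour, per row of the quad."""
    left = quad_swizzle(values, (0, 0, 2, 2))
    right = quad_swizzle(values, (1, 1, 3, 3))
    return [r - l for l, r in zip(left, right)]

def ddy_fine(values):
    """Bottom neighbour minus top neighbour, per column of the quad."""
    top = quad_swizzle(values, (0, 1, 0, 1))
    bottom = quad_swizzle(values, (2, 3, 2, 3))
    return [b - t for t, b in zip(top, bottom)]
```

With one quad holding `[10, 13, 20, 25]`, `ddx_fine` gives 3 for the top row and 5 for the bottom row, and `ddy_fine` gives 10 for the left column and 12 for the right column.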
13:20fdobridge: <gfxstrand> I already have that patch
13:21fdobridge: <gfxstrand> It's a texture instruction on Qualcomm as well (edited)
13:22fdobridge: <gfxstrand> Ok, you made it sound like maybe you had a plan for that but now it sounds like you don't
13:23fdobridge: <karolherbst🐧🦀> I still have, but that plan doesn't really work on the recent kernel
13:23fdobridge: <karolherbst🐧🦀> could potentially compile a very old one and use that tho...
13:24fdobridge: <gfxstrand> Throw me whatever branch you have. As long as I don't need hang recovery, I should be okay. 😅
13:25fdobridge: <gfxstrand> Might need to backport the helper patch. 🤷‍♀️
13:31fdobridge: <karolherbst🐧🦀> @gfxstrand on top of 4.19: https://github.com/karolherbst/nouveau/commit/1883fba75c633e84fa9f8ab687af4793f2650d06 and https://github.com/karolherbst/nouveau/commit/241342504559eb76bba49fe6e3bd906dd9ea32df
13:31fdobridge: <gfxstrand> kk
13:32fdobridge: <karolherbst🐧🦀> for the helper invoc it might be enough to run the turing thing on maxwell, just with a different register
13:33fdobridge: <karolherbst🐧🦀> `0x419f78` should be the kepler/maxwell/pascal one
13:33fdobridge: <karolherbst🐧🦀> so no need to backport the kernel patch.
13:35fdobridge: <gfxstrand> How do I use that branch? The rest of the kernel isn't there?
13:35fdobridge: <karolherbst🐧🦀> yeah.. we used to have an out of tree thing for nouveau
13:35fdobridge: <karolherbst🐧🦀> there uhm.. was a thing, but might be easier to just use patch directly
13:36fdobridge: <gfxstrand> kk
13:36fdobridge: <karolherbst🐧🦀> or copy
13:36fdobridge: <karolherbst🐧🦀> it isn't much
13:36fdobridge: <gfxstrand> there
13:36fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> How about ~~Thermi~~ Fermi?
13:36fdobridge: <gfxstrand> it's like 15 patches
13:36fdobridge: <karolherbst🐧🦀> mhh?
13:36fdobridge: <karolherbst🐧🦀> just the two
13:36fdobridge: <karolherbst🐧🦀> don't need any of the others
13:37fdobridge: <karolherbst🐧🦀> they are mostly for protection and stuff
13:37fdobridge: <gfxstrand> It's all clk and therm
13:37fdobridge: <karolherbst🐧🦀> like overheating things in case somebody runs it on a GPU with driver managed fans
13:38fdobridge: <karolherbst🐧🦀> none of those other patches matter really
13:38fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> What if someone wants to do a NVIDIA GPU explosion any% speedrun? 😅
13:39fdobridge: <karolherbst🐧🦀> the GPU shuts itself down at some point anyway
13:39fdobridge: <karolherbst🐧🦀> it's just around 113° or something
13:39fdobridge: <karolherbst🐧🦀> depends on the GPU
13:42fdobridge: <Mohamexiety> so core reclocking works fine on laptops since it's not the driver controlling the fans but the laptop itself?
13:47fdobridge: <karolherbst🐧🦀> more or less, I just never upstreamed the patches because it's also quite wonky and I'm sure things like suspend/resume will break it
13:49fdobridge: <Mohamexiety> yeah..
13:57fdobridge: <gfxstrand> Ok, now I pulled @Mr Fall🐧 's nouveau branch and it grabbed me a kernel to go with it?!?
13:57fdobridge:<gfxstrand> is confused
13:57fdobridge: <karolherbst🐧🦀> I meant you can just copy those two changes into a 4.19 kernel tree and it should be fine
13:57fdobridge:<gfxstrand> is failing to use git
13:58fdobridge: <karolherbst🐧🦀> yeah.. don't bother with git, just copy the changes.. there was some git magic to make it work, but...
13:59fdobridge: <karolherbst🐧🦀> you can probably also skip that `NvFanless` part
13:59fdobridge: <gfxstrand> Oh, derp... because I pulled the wrong branch 🙄
13:59fdobridge: <karolherbst🐧🦀> 😦
13:59fdobridge: <gfxstrand> It's a monday...
14:01fdobridge: <gfxstrand> Ok, I'm gonna let this kernel build...
15:06tomman: hi there~
15:06tomman: I'm using phram to use the surplus VRAM on some old GeForce card as swap space
15:06tomman: is there a way to tell nouveau to limit the amount of VRAM it can use? (other drivers like radeon and even uvesafb allow to do so)
15:07tomman: I already got a stern warning from the kernel because I was getting too greedy, apparently
15:07tomman: (I don't use X11 - it's an old routerbox whose main RAM can't be expanded for reasons, and my last recourse is an ancient 64MB MX4000 I had on my box of parts)
15:08tomman: I would like to limit the card to 8MB VRAM, and leave the rest unused for phram
15:13tomman: brb, gonna reboot a few times while I fix this mess
15:25karolherbst: tomman: you are probably better off using zram, as using the VRAM of your GPU isn't possible to do reliably
15:26karolherbst: the PCI memory region isn't what people claim it is
15:27tomman: yeah, I had to switch to uvesafb
15:27karolherbst: anyway, you probably have more sys memory than 128MB, so you should use zram
15:27tomman: Both the card and the machine are OLD, so I'm not sure if zram is the best option
15:28karolherbst: another issue is, that using VRAM like that is also going to be slow
15:28tomman: The machine has 256MB, but one service is really gluttonous
15:28karolherbst: how much has the GPU?
15:28tomman: karolherbst: not as slow as a rust spinner, tho
15:28tomman: I use this machine as a routerbox
15:28tomman: years ago I used uvesafb+phram with no issues, as uvesafb does let me limit VRAM
15:29tomman: at the cost of a slower framebuffer console
15:29karolherbst: though I can imagine that the compression/decompression with zram is going to be faster, because in the best case it's just to page out unused stuff
15:29tomman: Oh, the CPU is a single-core Pentium III
15:29karolherbst: I don't see this working with a driver loaded
15:29tomman: (well, a Celeron, actually)
15:29karolherbst: the issue is that the PCI area is configurable and can even point to system ram
15:29karolherbst: and by default it points to... something
15:29karolherbst: might be VRAM once the GPU is posted
15:30tomman: with modern cards and AGP/PCIe systems I see that being a problem, and I get why the kernel warns me of that
15:30tomman: but this system is quite old anyway
15:30tomman: With nouveau + phram the one that gets angry is... nouveau, the rest of the system keeps trucking along
15:30karolherbst: I mean it's probably fine without a driver, but I'd still check if using zram is better here
15:30tomman: Will research on that, for sure
15:31karolherbst: some use it to double/triple system RAM
15:31karolherbst: you have some compression overhead but it might be fine
15:31tomman: expanding RAM on an i810E chipset is anything but fun because it hates most 128/256MB sticks
15:31karolherbst: it's at least better than disk swap and more reliable than VRAM
15:31tomman: (otherwise a 512MB routerbox would be killer)
15:31karolherbst: mhh.. annoying
15:32karolherbst: tried any other sticks?
15:32tomman: it's ye olde' "do not use high density sticks"
15:32karolherbst: I wouldn't be surprised if people have a lot of old ones around they'd throw away
15:32tomman: Finding 256MB SDRAM sticks these days is difficult in my country, but all the ones I've found haven't worked with this thing at all
15:32tomman: (but they do work on similar vintage, non-Intel chipsets)
15:33tomman: either the machine doesn't detect them, or it does detect but will fail any memory check
15:33tomman: It's not my first rodeo with nVidia cards and abusing VRAM as swap
15:34tomman: years ago I did it on an even older box (a Socket 7 i430VX box!)
15:34tomman: but that one couldn't load the nouveau driver because the machine lacked ACPI
15:34tomman: so uvesafb did the job nicely
15:34karolherbst: mhh.. we hard depend on ACPI?
15:34tomman: for whatever reason yeah
15:34tomman: surprisingly radeon doesn't
15:34karolherbst: I mean we use it for laptop stuff, just didn't know we also require it
15:34tomman: (and that one DOES allow to limit VRAM)
15:34karolherbst: but it's kinda safe to assume it's there...
15:35karolherbst: it kinda depends on how the VRAM is made available to the OS
15:35karolherbst: it's just kinda funky on nvidia as the access window can and will be configured by various pieces
15:35tomman: In my case lspci sees two memory windows
15:36tomman: Memory at 40000000 (32-bit, non-prefetchable) [size=16M]
15:36tomman: Memory at 48000000 (32-bit, prefetchable) [size=128M]
15:36karolherbst: the first one are gpu registers
15:36tomman: of course I'm using the second one
15:36tomman: but I also know the card only has 64MB (and both uvesafb and nouveau report that correctly)
15:36karolherbst: I mean the second region can point to system RAM just as well
15:37karolherbst: it can be configured and the vbios and nouveau mess with it
15:37karolherbst: I don't know what the vbios usually points it to after POSTing it, but.. could also be some random part of the VRAM
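For reference, the arithmetic behind carving swap out of that window can be sketched as below. It leans on assumptions that this conversation explicitly questions: that the prefetchable BAR at 0x48000000 maps VRAM linearly from offset 0, that the first 8MB stays with the framebuffer driver, and that the `phram` module takes `name,start,len` with hexadecimal values. The function name and the region name `vramswap` are hypothetical.

```python
MIB = 1024 * 1024

def phram_param(bar_base, bar_size, vram_size, reserved_for_driver,
                name="vramswap"):
    """Build a phram=name,start,len string for the VRAM left after a reservation.

    Assumes the BAR exposes VRAM linearly starting at bar_base, which is
    not guaranteed on NVIDIA hardware.
    """
    usable = min(bar_size, vram_size)  # the window can be larger than VRAM
    if reserved_for_driver >= usable:
        raise ValueError("nothing left after the driver's reservation")
    start = bar_base + reserved_for_driver
    length = usable - reserved_for_driver
    return f"phram={name},{start:#x},{length:#x}"
```

With the numbers from this log (128M window, 64MB card, 8MB kept for the console) the result covers 56MB starting 8MB into the window.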
15:38tomman: Doubt that's happening here (I would expect a fiery crash in that case), but how can I verify?
15:38tomman: (dmesg lists a bunch of memory mappings here and there)
15:39karolherbst: uhh not sure, I never messed with that stuff myself
15:40tomman: I would assume that it should be safe as long as it doesn't crash and burn, and as long as my framebuffer console doesn't go Funkvision™ :D
15:40tomman: of course I would expect this trick to be kinda unusable/pointless if the card had loads of memory
15:41tomman: (this card isn't even AGP, but PCI - the box doesn't have AGP slots but has an onboard Intel IGP whose framebuffer is too terrible to be of any use)
15:41tomman: certainly I wouldn't want to try on, say, a ReBAR system
15:41karolherbst: PCI is kinda slow
15:42karolherbst: the issue is that random access RW stuff over PCI is also very slow on top of that
15:42tomman: I'm aware of that, but it still beats an ancient 80GB PATA HDD over an even slower UDMA66 interface
15:43tomman: (and no one makes PATA SSDs despite the surprisingly large base of still-alive PATA boxes out there)
15:43tomman: brb, rebooting again (this time for updates!)
15:49tomman: yeah, uvesafb looks FAR more stable than nouveau
15:49tomman: console is slower, but eh, I can live with that
15:50tomman: those extra 56MB over the PCI bus really help, until hopefully someday I find a 256MB SDRAM stick that this i810E junk wants to accept
15:50karolherbst: yeah.. though I'd still suggest trying out zram to see if that's good enough, as this will also allow you to get more swap space
15:50tomman: I will research on that, indeed
15:50tomman: hopefully it doesn't impact performance that much on dinosaurs
15:51karolherbst: yeah.. it shouldn't
15:51karolherbst: people use it a lot in low memory systems
15:52karolherbst: could force an algorithm with good compression/decompression speed
15:53karolherbst: I think lz4 would be the one?
15:53karolherbst: zstd has best compression ratio, but could be very slow
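A minimal sketch of what the zram suggestion amounts to in practice. It only builds the list of steps and does not execute anything (running them needs root). The sysfs files `comp_algorithm` and `disksize` under `/sys/block/zram0` are the zram driver's interface, and the algorithm has to be written before the size; the function name, the 256M size in the example, and the swap priority are assumptions for illustration, and whether `lz4` is available depends on the kernel build.

```python
def zram_swap_plan(disksize_mib, algorithm="lz4", device="zram0", priority=100):
    """Return the ordered steps to set up a zram swap device.

    Each step is either ("run", argv) or ("write", sysfs_path, value).
    """
    if disksize_mib <= 0:
        raise ValueError("disksize must be positive")
    sysfs = f"/sys/block/{device}"
    return [
        ("run", ["modprobe", "zram"]),
        # The compressor must be chosen before the device is sized.
        ("write", f"{sysfs}/comp_algorithm", algorithm),
        ("write", f"{sysfs}/disksize", f"{disksize_mib}M"),
        ("run", ["mkswap", f"/dev/{device}"]),
        # A higher priority than disk swap so zram is used first.
        ("run", ["swapon", "-p", str(priority), f"/dev/{device}"]),
    ]
```

Printing `zram_swap_plan(256)` shows the five steps in order; swapping `algorithm="zstd"` in is the trade-off discussed above.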
15:55tomman: the target CPU in this case is a Celeron (Coppermine) @766MHz, although I don't rule out upgrading it in the future for a proper Pentium III, maybe a 1GHz one if I can find one for cheap
16:55fdobridge: <gfxstrand> Well, this is a good start... 😓
16:55fdobridge: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1102639459327033374/rn_image_picker_lib_temp_421f3b63-5de2-4e00-9881-f4fd979fc4d5.jpg
16:57fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> What is waldorf? 🐸
16:59fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I see this on Wikipedia (so this might be it): <https://en.wikipedia.org/wiki/Waldorf_education>
17:01fdobridge: <karolherbst🐧🦀> RIP
17:03fdobridge: <karolherbst🐧🦀> but did it reclock memory? 😄
17:12fdobridge: <gfxstrand> No, it failed to load firmware
17:13fdobridge: <gfxstrand> Waldorf is a character from the Muppets
17:13fdobridge: <gfxstrand> https://en.wikipedia.org/wiki/Statler_and_Waldorf
17:13fdobridge: <gfxstrand> I name my computers after muppets. This is a cranky old laptop so Waldorf seemed like a reasonable name.
17:14fdobridge: <gfxstrand> @Mr Fall🐧 `nouveau 0000:01:00.0: Direct firmware load for nvidia/gm204/gr/sw_nonctx.bin failed with error -2`
17:15fdobridge: <gfxstrand> Hrm... Maybe it doesn't know what to do with xz'd firmwares?
17:17fdobridge: <karolherbst🐧🦀> ohh, that might be
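Error -2 is `-ENOENT`, which fits the guess: as far as I know, kernels older than 5.3 have no support for loading compressed firmware, so a 4.19 kernel would look for the plain `sw_nonctx.bin` and not find it. One possible workaround, sketched here as an assumption and not as what was actually done in this log, is to write uncompressed copies next to the `.xz` files. The function name is hypothetical.

```python
import lzma
from pathlib import Path

def decompress_xz_firmware(fw_dir):
    """Write an uncompressed copy next to every *.xz file under fw_dir.

    Existing uncompressed files are left alone. Returns the paths written.
    """
    written = []
    for src in sorted(Path(fw_dir).rglob("*.xz")):
        dst = src.with_suffix("")  # "sw_nonctx.bin.xz" -> "sw_nonctx.bin"
        if dst.exists():
            continue
        dst.write_bytes(lzma.decompress(src.read_bytes()))
        written.append(dst)
    return written
```

Pointed at a firmware tree such as the `nvidia/gm204` directory from the error message, it would leave the `.xz` files in place and add the `.bin` files the old kernel expects.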
17:18fdobridge: <gfxstrand> Ugh... Also, it's failing to remount / read-write, probably because it's btrfs and this is an ancient kernel. 😕
17:18fdobridge: <gfxstrand> I may have spoken too soon when I said I didn't care how ancient. 😅
17:20fdobridge: <gfxstrand> I can re-install with ext4
17:22fdobridge: <karolherbst🐧🦀> 😄
17:22fdobridge: <karolherbst🐧🦀> could see how far you can rebase those two patches
17:22fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Interesting Ekstrand lore
17:23fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I gave up on rebasing the OGK ones for :triangle_nvk:
17:23fdobridge: <gfxstrand> Naming them after muppets is actually pretty recent. 🤷‍♀️
17:23fdobridge: <karolherbst🐧🦀> should be safe on any kernel they safely apply
17:25fdobridge: <gfxstrand> Let's start with an ext4 filesystem. That way I can actually use this thing with an old kernel.
17:26fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I have 3 named computers: ShintelPotato, RenoirBeast and RISCFruit
17:31fdobridge:<gfxstrand> reheats pizza while waiting for the re-install
17:32fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Do you eat frozen pizza?
17:34fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I basically always eat it from a pizza place 🐸
17:34fdobridge: <gfxstrand> Leftovers from a pizza place.
17:34fdobridge: <gfxstrand> We do frozen sometimes.
17:45fdobridge: <karolherbst🐧🦀> that reminds me, I have to eat something
17:52fdobridge: <Mohamexiety> we need to set up a reminder for karol to take care of himself and eat 😦
20:06fdobridge: <🌺 ¿butterflies? 🌸> """kinda"""
20:38fdobridge: <karolherbst🐧🦀> I mean.. if the alternative is a rotating HDD disk from 15 years ago 😛
20:55fdobridge: <gfxstrand> Ok, I've got the system re-installed with ext4 and 4.19 built and working. It's now patched and re-building with the patches. 😄
20:56fdobridge: <gfxstrand> Time to see if I can melt my laptop. 😈
21:05fdobridge: <gfxstrand> `AC: core 1037 MHz memory 5009 MHz`
21:07fdobridge: <gfxstrand> Now time to install Steam again and see if I can play a game
21:08fdobridge: <gfxstrand> @ASDQueerFromEU where did you stash your DXVK branch? Do I need anything besides a Vulkan 1.1 hack patch?
21:17fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I don't have a single public branch with all of my local changes due to the fear of issue spam 🐸
22:26fdobridge: <karolherbst🐧🦀> btw, if you boot with `nouveau.config=NvBoost=1` or `=2` you can increase the clocks even further, but then you should kinda be careful about thermals 😄
22:26fdobridge: <karolherbst🐧🦀> it's fine in most cases (tm), it's not fine with furmark
22:26fdobridge: <karolherbst🐧🦀> `=0` (which is the default) kinda means "safe in all cases"
22:27fdobridge: <karolherbst🐧🦀> for =1 and =2 we kinda would have to implement power capping and thermal throttling and all that fun stuff
22:32fdobridge: <gfxstrand> Cool. Thanks! Games are downloading now. I'll actually try some out tomorrow. If I can hit 30 FPS @FHD, I'll be happy
22:38fdobridge: <karolherbst🐧🦀> yeah, should be possible.. I hope 😄
23:24fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I could probably send a tarball of all the patches I used though 🐸
23:25fdobridge: <Esdras Tarsis> Are you trying to implement reclocking for maxwell v2 and pascal?
23:29fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I think it's incomplete (because externally-controlled fans won't work)
23:29fdobridge: <Esdras Tarsis> Well, it's better than nothing, awesome work