IRC Logs of #nouveau on irc.freenode.net for 2024-07-17

10:06 EisNerd: hi, I have trouble with hybrid graphics on a Turing chip, resulting in issues with Skype and LibreOffice: "[ 39.228446] nouveau_pmops_runtime_resume+0xdd/0x180 nouveau", kernel 3.8.12; with and without "config=NvGspRm=1" (at least as far as I can judge, maybe worth checking whether it was really picked up when it was set)
10:06 EisNerd: [ 47.577938] Loading firmware: nvidia/tu117/acr/bl.bin | 0000:01:00.0 3D controller: NVIDIA Corporation TU117GLM [T550 Laptop GPU] (rev a1) | [ 15.914682] [drm] Initialized nouveau 1.4.0 20120801 for 0000:01:00.0 on minor 1 | [ 14.379849] nouveau: detected PR support, will not use DSM | [ 14.490900] nouveau 0000:01:00.0: bios: version 90.17.95.00.a6
10:06 karolherbst: ohh interesting....
10:07 karolherbst: those issues go away with `nouveau.runpm=-0`?
10:07 EisNerd: -0?
10:07 karolherbst: ehh
10:07 karolherbst: 00
10:07 karolherbst: 0
10:07 karolherbst: I mean
10:07 karolherbst: anyway, mind pastebinning your entire dmesg output?
10:07 EisNerd: I can try, would you recommend using NvGspRm?
10:08 karolherbst: yeah, you should use NvGspRm=1 if that works
10:08 EisNerd: at least it doesn't seem to make things worse
10:08 karolherbst: that option will enable dynamic reclocking and power management and stuff
10:09 karolherbst: so you can get good perf finally
10:09 karolherbst: highly recommend using it
10:09 EisNerd: ok so I'll try altering to both options
10:10 EisNerd: and then double check after reboot if they have been picked up
10:11 karolherbst: "nouveau.runpm=0" will disable suspending the GPU so it will draw battery way quicker
10:11 karolherbst: but it's a good way to figure out if runpm is responsible for those issues
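
One way to double check after reboot that the options were picked up is to look for them on the kernel command line. The following is a minimal sketch, assuming the options were passed on the command line as `nouveau.runpm=...` and `nouveau.config=...`; the function name is made up for illustration. Options set through modprobe configuration would not show up there.

```python
def nouveau_options(cmdline: str) -> dict:
    """Collect nouveau.* options from a kernel command line string."""
    opts = {}
    for token in cmdline.split():
        if not token.startswith("nouveau."):
            continue
        key, _, value = token[len("nouveau."):].partition("=")
        if key == "config":
            # nouveau.config holds a comma-separated list such as
            # NvGspRm=1,NvPmEnableGating=1; keep the entries separate.
            opts["config"] = value.split(",")
        else:
            opts[key] = value
    return opts
```

Feeding it the contents of `/proc/cmdline` (the command line of the running kernel) should return something like `{"runpm": "0", "config": ["NvGspRm=1"]}` if both options were applied.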
12:14 OftenTimeConsuming: Hmm, /sys/kernel/debug/dri/0/pstate isn't telling me what the default pstate is.
12:16 karolherbst: there isn't really a default one
12:30 OftenTimeConsuming: What does it default to then and how do I go back to that default after clocking up? Performance is less bad when I set pstate to 0f, but that uses a bit too much power for idle, so I set it to go back to 07, but power state 07 is unstable.
12:47 karolherbst: you can't
12:48 karolherbst: if 07 is being unstable we should probably fix it, because it shouldn't
12:48 karolherbst: be
13:04 OftenTimeConsuming: There was some instability by default in 6.8.7-rc7, but that was seemingly fixed by 6.10.0-rc7. I'll try changing to power mode 07 and see what happens.
13:05 OftenTimeConsuming: It's nice that Tor Browser no longer freezes at random and partial video streams no longer hang Xorg, but running past to end of the yt-dl buffer seems to hang the video, but oh well.
13:05 OftenTimeConsuming: *to the end of the still buffering
13:23 karolherbst[d]: notthatclippy[d]: ahuillet skeggsb9778[d] some thoughts I have in regards to "broken edid" bugs like those: https://gitlab.freedesktop.org/drm/misc/kernel/-/issues/44
13:23 karolherbst[d]: Would it be possible if nvidia could kinda upstream those workarounds? I've had the idea several years ago, but it kinda went nowhere.
13:24 karolherbst[d]: it's something affecting all drivers, so it would be kinda cool to have something in the kernel fixing up those edids for all drivers
13:42 OftenTimeConsuming: >snd_hda_intel 0000:01:00.1: IRQ timing workaround is activated for card #1. Suggest a bigger bdl_pos_adj. Huh
14:19 OftenTimeConsuming: Still seems to have issues but at least it doesn't crash; https://termbin.com/e7nd
14:20 asdqueerfromeu[d]: karolherbst[d]: Hopefully that will finally fix this issue: <https://youtu.be/ZvoVajW4qVE?t=84>
14:34 karolherbst: OftenTimeConsuming: mhhh.. do you have other power states, e.g. 0a?
14:34 karolherbst: and does that one work?
14:34 karolherbst: though those errors could be simply coincidental
14:34 OftenTimeConsuming: Yes; 0a: core 324-692 MHz memory 1620 MHz
14:35 karolherbst: mhh yeah, could use that one for now if that's better. Though what mesa version are you running anyway?
14:35 OftenTimeConsuming: 24.0.8
14:35 OftenTimeConsuming: 5W power difference so bah
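
For comparing power states, the entries read from `/sys/kernel/debug/dri/0/pstate` can be parsed into numbers. This is a rough sketch that assumes every line has the shape quoted above (`0a: core 324-692 MHz memory 1620 MHz`); the real file may contain other fields, and the function name is hypothetical.

```python
import re

# Matches "<domain> <low>[-<high>] MHz", e.g. "core 324-692 MHz".
_CLOCK = re.compile(r"(\w+) (\d+)(?:-(\d+))? MHz")


def parse_pstate_line(line: str):
    """Return (state_id, {domain: (min_mhz, max_mhz)}) for one pstate line."""
    state, _, rest = line.partition(":")
    clocks = {}
    for domain, low, high in _CLOCK.findall(rest):
        # A single value means the clock is fixed, so min and max are equal.
        clocks[domain] = (int(low), int(high or low))
    return state.strip(), clocks
```

With the line from the log, `parse_pstate_line("0a: core 324-692 MHz memory 1620 MHz")` gives the state id `0a`, a core range of 324 to 692 MHz and a fixed memory clock of 1620 MHz.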
14:36 karolherbst: but anyway, "ILLEGAL_INSTR_ENCODING" might indicate something going wrong with memory
14:37 OftenTimeConsuming: VRAM or system RAM?
14:37 karolherbst: VRAM
14:38 OftenTimeConsuming: Issues only seem to happen with playing videos in mpv with gpu-next and Tor Browser.
14:38 karolherbst: mhhh
14:38 OftenTimeConsuming: Minetest works but it runs at 7fps.
14:39 karolherbst: does that also happen with default clocks?
14:39 karolherbst: like the ones you boot with
14:39 OftenTimeConsuming: Yes, but not that exact error.
14:39 karolherbst: then I suspect it's just a mesa bug you don't hit if the GPU is fast enough
14:40 OftenTimeConsuming: I can update mesa, but updating it doesn't seem to make much difference.
14:41 karolherbst: there is something you might be able to try
14:41 OftenTimeConsuming: Btw I have nouveau.config=NvPmEnableGating=1 enabled - I forgot.
14:41 karolherbst: ohh mhh
14:42 karolherbst: that _can_ cause random errors
14:42 karolherbst: might want to try disabling it and see if the errors go away
14:42 karolherbst: what GPU do you have?
14:42 OftenTimeConsuming: 780 Ti.
14:42 karolherbst: right..
14:42 OftenTimeConsuming: I don't really care about errors as long as Xorg doesn't crash - which seems to be the case now.
14:42 karolherbst: I think I've seen random errors caused by NvPmEnableGating being enabled
14:43 karolherbst: so worth a shot disabling it
14:44 karolherbst: another thing to try out is to set the env variable NOUVEAU_LIBDRM_DEBUG=2
14:46 OftenTimeConsuming: The log I posted previously appears to be something to do with slow interrupts (https://termbin.com/e7nd) at the top you can see that an interrupt took too long.
14:46 karolherbst: nah, that should be unrelated
14:47 OftenTimeConsuming: Don't PCIe data transfers rely on interrupts?
14:47 karolherbst: yeah, but it's a different thing
14:48 OftenTimeConsuming: It seems that the version of coreboot used for the KGPE-D16 just has slow interrupts and nobody has fixed that yet.
14:56 karolherbst: OftenTimeConsuming: again, it's probably unrelated, because you see this error also on other systems. Just don't worry about that one
14:56 OftenTimeConsuming: Yep I checked and it seems that issue is actually unrelated unlike previous seemingly interrupt related issues that now look fixed.
16:00 notthatclippy[d]: karolherbst[d]: We don't actually have anything for this model. Or _any_ Samsung monitor at all for that matter.
16:00 karolherbst[d]: notthatclippy[d]: huh......
16:00 karolherbst[d]: but the edid is clearly wrong
16:00 karolherbst[d]: maybe there is another more generic workaround at play here?
16:00 notthatclippy[d]: Your wider message is received, though.
16:02 karolherbst[d]: yeah, hopefully now things are in a better shape to get those things going
16:03 notthatclippy[d]: As usual, no promises, but I'll bring it up.
16:04 karolherbst[d]: I seem to remember that one argument on why that can't be done is that some workarounds depend on errata given under NDA
16:04 notthatclippy[d]: Yes. And also just hunting down which of them are under NDA is significant effort.
16:11 karolherbst[d]: yeah.. though could start with bug reports like those
16:11 karolherbst[d]: but if you say this one specific model doesn't have one.. mhhh
16:11 karolherbst[d]: maybe there is a secret edid somewhere or something silly
16:13 skeggsb9778[d]: karolherbst[d]: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/common/modeset/timing/nvt_edidext_861.c#L181
16:13 skeggsb9778[d]: it's not related to this?
16:14 skeggsb9778[d]: err, line 354
16:15 skeggsb9778[d]: i didn't look very heavily into it, but it looks like a mention of 5120x1440 at least 😛
16:16 karolherbst[d]: mhhhh
16:16 karolherbst[d]: let's see...
16:18 notthatclippy[d]: karolherbst[d]: Oh, it might be in NBSI
16:22 karolherbst[d]: mhh could also be that the edid parser doesn't support some of that
16:24 karolherbst[d]: anyway, the linux edid parser doesn't know any `5120x1440` modes at all 🥲
16:25 notthatclippy[d]: notthatclippy[d]: Nope, belay that. Ideacenter is a desktop.
16:27 karolherbst[d]: seems like the "Resolution Identification" stuff is missing
16:36 karolherbst[d]: skeggsb9778[d]: do you know from what spec that all is?
16:36 karolherbst[d]: or able to figure it out?
16:36 skeggsb9778[d]: it mentions it above the list of modes i think - one moment
16:37 skeggsb9778[d]: oh, no, that was something else
16:39 karolherbst[d]: seems like we also don't have the `CTA-861-G_FINAL_revised_2018_Errata_2.pdf` stuff applied
16:39 karolherbst[d]: and drm uses different timings
16:40 karolherbst[d]: and I also don't see the `VIC 193-219` entries
16:41 karolherbst[d]: ehh wait, those are there
16:41 karolherbst[d]: but also slightly different timings
16:43 karolherbst[d]: anyway I don't have access to those docs
16:43 karolherbst[d]: so maybe someone else should verify that, because nvidia declares those in a weird way
16:43 karolherbst[d]: or drm does
16:49 skeggsb9778[d]: karolherbst[d]: the last paragraph of https://en.wikipedia.org/wiki/Extended_Display_Identification_Data#CEA-861-E mentions the standard, i *think*
16:50 skeggsb9778[d]: it seems those modes come from the VFDBs
16:50 skeggsb9778[d]: oh, maybe that's just 4:2:0 support though
16:54 karolherbst[d]: mhhh
16:55 karolherbst[d]: yeah no idea
19:09 tiredchiku[d]: openrm will be the only nvidia module starting with driver 560: https://developer.nvidia.com/blog/nvidia-transitions-fully-towards-open-source-gpu-kernel-modules/
19:09 tiredchiku[d]: `We’re now at a point where transitioning fully to the open-source GPU kernel modules is the right move, and we’re making that change in the upcoming R560 driver release.`
20:00 babblebones[d]: Noiceeee
20:03 babblebones[d]: Hmm begs the question though what of the 10 series and lower?
20:03 babblebones[d]: Does it just toggle detect modules at install?
20:06 babblebones[d]: Ah they mention
21:34 snektron[d]: tiredchiku[d]: After this the only remaining blob is the gsp firmware which is executed on the device, am i right?
21:34 snektron[d]: Which would be similar to how the AMD driver works
21:35 clangcat[d]: babblebones[d]: Yea I mean the 10 series is a sad state
21:36 clangcat[d]: Mainly just the choice of either using prop or sacrificing your performance with Nouveau.
21:38 karolherbst[d]: snektron[d]: userspace is all closed source
21:38 clangcat[d]: clangcat[d]: But yea no fault on Nouveau really. Or even really the Nvidia peeps I imagine they don't want to spend lots of time and money working on GSP like firmware for old cards.
21:39 airlied[d]: old cards don't have a GSP core to do GSP-like anything
21:39 clangcat[d]: karolherbst[d]: Yea that too. Nvidia's, what is it, UMD driver? Whatever the actual name for the userspace driver is, plus all the libraries it uses. Nvidia's EGL implementation and such.
21:40 clangcat[d]: airlied[d]: Yea but I just mean in general allowing Nouveau (or any FOSS driver) access to reclock the cards. They aren't going to want to spend time on that for old cards.
21:41 clangcat[d]: But at least for the time being 10 series and even some older cards are still supported by prop. Which is cool.
21:41 snektron[d]: karolherbst[d]: Ah, right. So does this change anything for nvk? Like would it be able to use this instead of nouveau, or would that be too complicated?
21:42 clangcat[d]: snektron[d]: I mean technically, if you reimplement the things nvk needs on the kernel side (assuming they aren't already there)
21:43 clangcat[d]: Though I imagine nouveau and open nvidia module probably work very differently
21:44 karolherbst[d]: snektron[d]: what would be the point though?
21:44 karolherbst[d]: supporting the nvidia kernel driver is a huge PITA due to unstable UAPI
21:44 snektron[d]: karolherbst[d]: I dont know, im just wondering if this opens any doors for nvk
21:45 karolherbst[d]: so if we'd support mesa running on it, we'd have to special case probably every minor driver update
21:45 karolherbst[d]: and this needs a very very very good reason why we should even bother
21:46 clangcat[d]: snektron[d]: I mean technically speaking they are both open source it's possible it's just the question of if it's worth the effort.
21:47 clangcat[d]: karolherbst[d]: Yea I mean I agree it would be cool. But from what I understand the actual kernel modules aren't much good with that closed source Userspace code anyways.
21:49 karolherbst[d]: the uapi is just entirely different, so it adds support burden nobody really wants to take, but yeah.. most of the magic happens inside userspace anyway
21:49 karolherbst[d]: the kernel module usually just allocates memory and gives you an interface to submit stuff
21:50 karolherbst[d]: (and handles display stuff)
21:50 karolherbst[d]: some of it
21:50 clangcat[d]: karolherbst[d]: Yea one of those technically possible. Just how much effort you want to put into something and how much use you will get out of it. The Nvk running on it I mean.
21:51 clangcat[d]: Though I can hardly talk, I programmed a Virtio GPU driver because, idk, I wanted to see if I could
21:51 karolherbst[d]: it's also just a bad user experience, because once the prop driver gets updated, nvk won't be able to run
21:51 karolherbst[d]: until updated
21:52 karolherbst[d]: and the mesa update gets shipped in like... 2 years, because you are running ubuntu LTS
21:53 clangcat[d]: karolherbst[d]: Yea welp ideally if it ever came to that systems would speed up their update cycles/make exceptions. Though whether they would is another thing.
21:53 karolherbst[d]: the only feasible approach here is if downstream owns the compat
21:54 clangcat[d]: But yea better to stay on mesa and nouveau/nova for stability's sake.
21:54 notthatclippy[d]: Forgive my ignorance, but what would one gain from running mesa on nvidia.ko at all?
21:54 karolherbst[d]: at which point you arrive at the decision "postpone nvidia driver update just for nvk compat?" in which case they'll say "nah, fuck it, CUDA is more important" or something
21:54 karolherbst[d]: notthatclippy[d]: stability maybe
21:54 notthatclippy[d]: I figured cuda on nouveau would be more interesting
21:55 karolherbst[d]: working on it (tm)
21:55 karolherbst[d]: with 100 footnotes
21:55 clangcat[d]: notthatclippy[d]: It's cool I would say that. But I don't think it would be a super feasible or widely used thing
21:55 karolherbst[d]: ~~there is HIP on top of CL~~
21:55 karolherbst[d]: "mom, I want cuda" - "we have cuda at home" - the cuda at home: HIP
21:56 karolherbst[d]: though seeing vulkan and SyCL replacing CUDA as the only option in a couple of places is kinda cool to see
21:56 clangcat[d]: clangcat[d]: Just cause like most people are either gonna use Nouveau/mesa/nvk or Nvidia(prop/open)/Nvidia userspace stuff. Even if a mix and match was an option
21:57 notthatclippy[d]: karolherbst[d]: I can actually see that being useful. Develop on a local machine with HIP+CL+nouveau, deploy to CUDA+nvidia.ko servers
21:57 karolherbst[d]: notthatclippy[d]: yeah...
21:57 notthatclippy[d]: And then spend a week hunting subtle bugs
21:57 karolherbst[d]: it's just that all those SyCL and hip runtimes require SVM or USM
21:57 karolherbst[d]: and like...
21:57 karolherbst[d]: that's quite the pain to implement in mesa
21:58 karolherbst[d]: and even in proper vendor stacks it's mostly based on good luck
21:58 karolherbst[d]: I still need to figure out a good way to synchronize the CPU and GPU VM without that being a huge headache
21:58 clangcat[d]: Though one neat thing from Nvidia open modules.
21:58 clangcat[d]: I do osdev.
21:59 clangcat[d]: Nvidia is now a lot easier to do in OS-dev XD
21:59 notthatclippy[d]: clangcat[d]: Only if your OS can execute linux (or bsd or solaris) userspace somehow.
21:59 karolherbst[d]: I've checked what intel is doing, and they just mmap and allocate stuff in that place
21:59 clangcat[d]: Not exactly time feasible. But still a lot more than it was when it was all closed source.
21:59 karolherbst[d]: on the GPU side
22:00 karolherbst[d]: and maybe I should just do the same, it's just that some drivers e.g. iris in mesa are reserving ranges for internal use, because 3D (tm)
22:01 clangcat[d]: notthatclippy[d]: No, as in the open kernel modules provide a bit more of a window into the kernel side operation of the device. i.e. people have reverse engineered the Linux Intel driver to get Intel UHD graphics in their OS despite not using the exact same layout as the Linux Intel driver.
22:01 clangcat[d]: I don't mean like a drag and drop solution
22:01 clangcat[d]: XD
22:01 notthatclippy[d]: karolherbst[d]: Could you use nvidia-uvm.ko without nvidia.ko? Hell, just copy it over...
22:01 karolherbst[d]: what I really need is an mmap call where I can say "please allocate in this region"
22:01 karolherbst[d]: notthatclippy[d]: nah, the kernel interfaces aren't the issue
22:02 karolherbst[d]: modern drivers will use VM_BIND, which is "userspace manages the GPU's VM"
22:02 karolherbst[d]: which solves like almost all of those issues
22:02 snektron[d]: notthatclippy[d]: Maybe zluda would have less resistance
22:02 snektron[d]: There's the scale thing now too
22:02 karolherbst[d]: I think I'm just gonna reserve a big chunk of stuff from the GPU driver
22:02 karolherbst[d]: and use `mmap` to allocate host memory
22:02 karolherbst[d]: and somehow make it work
22:03 karolherbst[d]: I already have a prototype kinda working
22:03 clangcat[d]: karolherbst[d]: I mean you can provide the first argument to mmap but I think that's only a suggestion.
22:03 karolherbst[d]: clangcat[d]: there is `MAP_FIXED` to make it a req
22:03 karolherbst[d]: the issue is just, that mmap is a terrible interface
22:03 clangcat[d]: karolherbst[d]: Ahhh neat to know.
22:03 karolherbst[d]: though
22:03 karolherbst[d]: there is `MAP_FIXED_NOREPLACE`
22:03 karolherbst[d]: because otherwise you'd just overwrite existing mappings
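
The difference between a hint, `MAP_FIXED` and `MAP_FIXED_NOREPLACE` can be tried out from Python through `ctypes`. This is only a sketch for Linux: the numeric flag value is an assumption that holds on x86-64 and arm64, and `map_at` / `map_noreplace` are hypothetical helper names. Kernels older than 4.17 ignore the flag and treat the address as a hint, so the returned address is checked as well.

```python
import ctypes
import errno
import mmap
import os

PROT_NONE = 0x0
MAP_FIXED_NOREPLACE = 0x100000  # assumed value (Linux x86-64 / arm64)

libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
MAP_FAILED = ctypes.c_void_p(-1).value


def map_at(addr, length, extra_flags=0):
    """Anonymous PROT_NONE mapping; returns the address or raises OSError."""
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | extra_flags
    res = libc.mmap(addr, length, PROT_NONE, flags, -1, 0)
    if res == MAP_FAILED:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return res


def map_noreplace(addr, length):
    """Map exactly at addr, or return None if something already lives there."""
    try:
        got = map_at(addr, length, MAP_FIXED_NOREPLACE)
    except OSError as e:
        if e.errno == errno.EEXIST:
            return None          # range is occupied, nothing was overwritten
        raise
    if got != addr:
        # Old kernel: flag ignored, mapping landed elsewhere. Undo it.
        libc.munmap(got, length)
        return None
    return got
```

With plain `MAP_FIXED` the second mapping would silently replace whatever was at that address, which is the overwrite problem mentioned above.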
22:04 karolherbst[d]: I had the idea of just reserving like 64GiB of VM space
22:04 karolherbst[d]: but then you'd have every CL process to use 64GiB of VM space 😄
22:04 notthatclippy[d]: IIRC CUDA lets you use regular malloc on the CPU side and the heap VA space is reserved on the GPU.
22:04 karolherbst[d]: which... might not even matter
22:04 clangcat[d]: karolherbst[d]: I mean I would've hoped it would just fail rather than overwrite but I mean I guess it's "neat" you can do that XD
22:04 karolherbst[d]: notthatclippy[d]: right.. that's with mmu_notifiers
22:05 karolherbst[d]: the pain with that is just that.. like.. the hell knows where the memory lives atm
22:05 karolherbst[d]: and implicit migration is quite the terrible perf pitfall I'm sure
22:05 notthatclippy[d]: karolherbst[d]: You mean physically? Why do you care?
22:05 karolherbst[d]: notthatclippy[d]: what if the memory ping pongs constantly
22:06 notthatclippy[d]: pin it? ¯\_(ツ)_/¯
22:06 karolherbst[d]: also memory faults in the shader, yuck
22:06 karolherbst[d]: well...
22:06 karolherbst[d]: what if the application doesn't?
22:06 karolherbst[d]: it's mostly just bad APIs I think
22:06 clangcat[d]: karolherbst[d]: I mean to an extent you can't guard an app from everything they may do wrong.
22:06 clangcat[d]: you can only really do what is reasonably within your power to guard them
22:06 karolherbst[d]: yeah, but the SVM interface for implicit everything is just terrible
22:07 karolherbst[d]: in the explicit SVM mode, the application tells you what range of the allocation it accesses
22:07 karolherbst[d]: and when it stops
22:07 karolherbst[d]: so you can handle migration and everything explicitly
22:07 karolherbst[d]: with the "system SVM" stuff, the runtime doesn't know shit, because only the kernel really knows
22:08 karolherbst[d]: so you can't really do much besides "please move everything into device memory, because it might be accessed"
22:08 karolherbst[d]: and it relies on applications to provide the proper hints
22:08 notthatclippy[d]: No, you really wanna do lazy pagefaults from the device....
22:09 notthatclippy[d]: Hrm.. Could you do this with userfaultfd....
22:09 karolherbst[d]: I question how you can get good perf with that model
22:09 HdkR: karolherbst[d]: As someone that consumes 256TB of VA space daily, 64GB is fast and cheap, just do it :P
22:09 karolherbst[d]: HdkR: I was considering having different SVM models, and the "reliable" one would pre allocate the VM
22:10 HdkR: If it's PROT_NONE by default then it doesn't even consume any PML4 space
22:10 karolherbst[d]: yeah...
22:10 karolherbst[d]: that was my plan
22:10 karolherbst[d]: just need to write a memory allocator
22:10 karolherbst[d]: notthatclippy[d]: mhh.. that might allow me to know what the application accessed
22:11 notthatclippy[d]: Yeah, vaspace is cheap. I implement all my std::vectors as mmap of a few gigs and MAP_NORESERVE
22:11 clangcat[d]: notthatclippy[d]: This is why I like this chat I find weird functions I ain't ever heard of. (Though largely cause I've never had to deal or catch page faults)
22:11 karolherbst[d]: notthatclippy[d]: does that even do something on top of `PROT_NONE`?
22:12 notthatclippy[d]: I don't do PROT_NONE, and just let it grow organically.
22:12 karolherbst[d]: I see...
22:12 notthatclippy[d]: (this isn't NV code btw)
22:12 karolherbst[d]: yeah.. my idea was to reserve VM space on the host and on the GPU
22:12 karolherbst[d]: and just make it work somehow
22:13 karolherbst[d]: and the `PROT_NONE` was mostly to prevent `malloc` or anything else to use the region I just reserved for the GPU
22:13 karolherbst[d]: and I kinda need to reserve this, because I don't want to teach mesa drivers to be SVM aware
22:13 karolherbst[d]: as they do internal allocations and....
22:13 karolherbst[d]: so I just want them to give me a VM range I can manage myself
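
The "reserve address space first, back it later" idea can be sketched on the host side as follows. This is an illustration under assumptions, not how any mesa driver does it: Linux only, the `MAP_NORESERVE` value is the x86-64 / arm64 one, and `reserve` / `commit` are hypothetical names. The reservation is `PROT_NONE`, so nothing else gets placed there and any stray access faults; pieces become usable only after `mprotect`.

```python
import ctypes
import mmap
import os

PROT_NONE, PROT_READ, PROT_WRITE = 0x0, 0x1, 0x2
MAP_NORESERVE = 0x4000  # assumed value (Linux x86-64 / arm64)

libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.mprotect.restype = ctypes.c_int
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
MAP_FAILED = ctypes.c_void_p(-1).value


def _raise_errno():
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))


def reserve(length):
    """Reserve address space only: no access rights, no swap accounting."""
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | MAP_NORESERVE
    base = libc.mmap(None, length, PROT_NONE, flags, -1, 0)
    if base == MAP_FAILED:
        _raise_errno()
    return base


def commit(addr, length):
    """Make a page-aligned slice of the reservation readable and writable."""
    if libc.mprotect(addr, length, PROT_READ | PROT_WRITE) != 0:
        _raise_errno()
```

A call like `reserve(64 << 30)` would ask for the 64 GiB discussed above; whether that succeeds depends on the address space limits of the process.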
22:14 HdkR: I didn't actually check if MAP_NORESERVE with something other than PROT_NONE reserves the PTE
22:14 HdkR: Although spicy if you're expecting "non-mapped" pages to fault
22:14 karolherbst[d]: yeah...
22:14 notthatclippy[d]: Read /proc/self/maps, find where your heaps are, then reserve that VA range on the GPU and use malloc...?
22:14 karolherbst[d]: it's a fun topic no matter how you look at it
22:14 karolherbst[d]: notthatclippy[d]: uhhhh
22:15 karolherbst[d]: I'd rather not use malloc
22:15 karolherbst[d]: luckily nothing really needs system SVM, so I don't really need to map existing allocations
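
For reference, the `/proc/self/maps` approach suggested above could look roughly like this. It is a sketch with hypothetical names that only parses the text; note that glibc's `malloc` also serves requests from mmap'd regions outside `[heap]`, so the `[heap]` entry alone does not cover every malloc'd pointer.

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps_line(line: str) -> Mapping:
    # Fields: address range, permissions, offset, device, inode, optional path.
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], path)


def find_heap(maps_text: str) -> Optional[Mapping]:
    """Return the [heap] mapping from the text of a maps file, if any."""
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        entry = parse_maps_line(line)
        if entry.path == "[heap]":
            return entry
    return None
```

Passing the contents of `/proc/self/maps` to `find_heap` yields the range one would then have to reserve on the GPU side.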
22:15 clangcat[d]: karolherbst[d]: I mean you could just use anything to allocate the memory at that point? no?
22:15 karolherbst[d]: OpenCL has `clSVMAlloc` for the other SVM levels
22:16 karolherbst[d]: so I allocate the host memory as well
22:16 Ermine: nova.ko FTW
22:16 karolherbst[d]: clangcat[d]: so the thing is, I need to allocate memory in such a way, that it has the same address on the CPU side and the GPU side
22:17 karolherbst[d]: the issue with malloc is, that it _might_ return an address which is already reserved by the mesa driver for internal use
22:17 karolherbst[d]: and I'd rather not malloc until I get an address I could use
22:17 notthatclippy[d]: karolherbst[d]: Reserved on the GPU?
22:17 karolherbst[d]: anyway... we have `vma_heap` as a C primitive to manage VM spaces, so I wouldn't even need to write the core part of a memory allocator
22:17 karolherbst[d]: notthatclippy[d]: yes
22:18 notthatclippy[d]: Are these hardcoded?
22:18 karolherbst[d]: sometimes
22:18 karolherbst[d]: sometimes they also do on the fly allocations
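
To illustrate what "managing a VM space" means here, below is a toy first-fit range allocator. It is not mesa's `vma_heap` and does not mirror its API; the class and method names are made up. It hands out addresses inside a reserved range and touches no memory itself, so the same address can then be bound on both the CPU and the GPU side.

```python
class RangeAllocator:
    """First-fit allocator over [base, base + size); tracks addresses only."""

    def __init__(self, base: int, size: int):
        self.free = [(base, size)]  # sorted list of (start, length) holes

    def alloc(self, size: int, align: int = 4096) -> int:
        for i, (start, length) in enumerate(self.free):
            addr = (start + align - 1) // align * align  # round up to align
            pad = addr - start
            if pad + size > length:
                continue
            # Replace the hole with what is left before and after the block.
            pieces = []
            if pad:
                pieces.append((start, pad))
            tail = length - pad - size
            if tail:
                pieces.append((addr + size, tail))
            self.free[i:i + 1] = pieces
            return addr
        raise MemoryError("no hole large enough")

    def release(self, addr: int, size: int) -> None:
        # Put the block back and merge holes that touch each other.
        self.free.append((addr, size))
        self.free.sort()
        merged = [self.free[0]]
        for start, length in self.free[1:]:
            last_start, last_len = merged[-1]
            if last_start + last_len == start:
                merged[-1] = (last_start, last_len + length)
            else:
                merged.append((start, length))
        self.free = merged
```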
22:18 karolherbst[d]: but anyway, the idea was "allocate VRAM sized VM space from the driver" and "mmap the same region on the CPU with PROT_NONE" and go from there
22:19 notthatclippy[d]: That works fine if you require custom alloc functions for it.
22:19 karolherbst[d]: yeah, in CL you have `clSVMAlloc`
22:20 karolherbst[d]: except for system SVM where all allocated memory is also valid on the GPU, but for that I could just say "mmu_notifier" support or.. well.. it won't work
22:20 notthatclippy[d]: Right, but it limits interop with other components, you'll likely have to do some copying on the CPU
22:20 karolherbst[d]: yeah...
22:20 karolherbst[d]: the idea was to do explicit migration
22:20 karolherbst[d]: which is fine
22:20 karolherbst[d]: there is `clEnqueueSVMMap` and `clEnqueueSVMUnmap` the application has to use to access it on the host
22:20 karolherbst[d]: so it's all explicit
22:21 karolherbst[d]: and `clEnqueueSVMMap` has size + offset arguments
22:21 karolherbst[d]: well.. not "offset", but "ptr" which doesn't need to be start of an SVM allocation
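
The contract described above (host access is only valid between an explicit map and unmap, and the mapped pointer may sit anywhere inside the allocation) can be modelled in a few lines. This is a toy model for illustration, not the OpenCL API; all names are hypothetical.

```python
class CoarseBuffer:
    """Toy model of one coarse-grained SVM allocation."""

    def __init__(self, base: int, size: int):
        self.base, self.size = base, size
        self.mapped = {}  # ptr -> size of each active host mapping

    def map(self, ptr: int, size: int) -> None:
        # ptr may point anywhere inside the allocation, not only at its start.
        if not (self.base <= ptr and ptr + size <= self.base + self.size):
            raise ValueError("range is outside the allocation")
        self.mapped[ptr] = size

    def unmap(self, ptr: int) -> None:
        del self.mapped[ptr]

    def host_may_access(self, addr: int) -> bool:
        # The host may only touch bytes covered by an active mapping.
        return any(p <= addr < p + s for p, s in self.mapped.items())
```

Because the runtime sees every map and unmap, it knows exactly which ranges have to be migrated or synchronized and when.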
22:23 karolherbst[d]: there is also `Fine-grained` SVM which is in between, which has explicit allocation, but implicit synchronization
22:24 karolherbst[d]: the basic "Coarse-grained" just doesn't really need driver support besides `VM_BIND` so it's a nice default and implementable on every GPU
22:24 notthatclippy[d]: The problem with such explicit migration is that tools like HIP/SCALE/ZLUDA/etc will never be able to execute existing CUDA code on that target because the code might be accessing any malloc'd pointer from the device.
22:24 notthatclippy[d]: Which may or may not be a consideration for you
22:24 karolherbst[d]: the implicit synchronization will be the PITA part in all of this and is left for later if something really cares
22:25 karolherbst[d]: notthatclippy[d]: mhhh, but the compiler can do explicit things under the hood, no?
22:26 karolherbst[d]: though the interesting part will be when you have concurrent access from the CPU and GPU
22:26 notthatclippy[d]: Well.. The code gets a pointer and then dereferences it. Compiler can't know where the pointer came from.
22:27 karolherbst[d]: mhh right, and there can always be an indirection
22:27 notthatclippy[d]: The runtime might be able to detect when a CPU heap pointer gets sent to the GPU and rewrite it...
22:27 karolherbst[d]: but I think implicit synchronization needs some sort of kernel support, no?
22:27 karolherbst[d]: (except when you are unified memory)
22:28 notthatclippy[d]: karolherbst[d]: Yes. That's probably the bulk of nvidia_uvm.ko
22:28 karolherbst[d]: how did that work before replayable page faults?
22:28 notthatclippy[d]: It didn't AFAIK.
22:28 karolherbst[d]: okay..
22:28 karolherbst[d]: so it only works on niche hardware and nvidia 😛
22:29 notthatclippy[d]: There were a fair number of incremental steps along the way
22:29 karolherbst[d]: okay... anyway
22:29 karolherbst[d]: for now I want to just implement the "everything explicit" level, because that's tractable
22:29 notthatclippy[d]: Anyway, I'm off to bed. Sorry for derailing this.
22:30 clangcat[d]: notthatclippy[d]: nini.
22:30 karolherbst[d]: implicit synchronization might be possible in a simple way on unified memory archs, but devices with discrete VRAM probably need HMM/mmu_notifier support or whatever you want to name it
22:30 karolherbst[d]: dw
22:30 karolherbst[d]: I still need to learn about the details of SVM, because it's just a huge topic honestly