14:12jbakita: Is anyone here familiar with how bare channels are scheduled in the runlist?
14:13jbakita: I'm only familiar with the round-robin scheduling between TSGs/cgrps in the runlist
14:14RSpliet: Hello Joshua!
14:16jbakita: Hello! Are you the Roy I chatted with at some length at ECRTS a few years ago?
14:17karolherbst: jbakita: we do actually have docs on that stuff from nvidia, I just don't know if the details you want to know are in there.. let me find the link
14:18jbakita: karolherbst: Thanks!
14:18jbakita: RSpliet: How have you been? Are you still actively fiddling with GPUs?
14:19karolherbst: jbakita: https://github.com/NVIDIA/open-gpu-doc/blob/master/manuals/volta/gv100/dev_ram.ref.txt
14:19karolherbst: jbakita: there are more files in the directory and they are focused on programming
14:20jbakita: karolherbst: Unfortunately I've already been through those pretty closely =(
14:20karolherbst: skeggsb should know more about it, but Ben is located in australia
14:20karolherbst: jbakita: ahh...
14:20karolherbst: yeah.. so maybe Ben can answer your questions later
14:22karolherbst: ohh wait.. I think I might have something..
14:23karolherbst: jbakita: did you see this and does it cover what you asked for? https://github.com/NVIDIA/open-gpu-doc/blob/master/manuals/ampere/ga100/dev_pbdma.ref.txt#L42
14:23karolherbst: there are quite a lot of references to scheduling
14:25karolherbst: anyway.. if you didn't look at the ampere versions of the files yet, I think it would be worth a shot checking those out as they appear to be better documented
14:26jbakita: Maybe? I think that's focused on the scheduling of commands associated with a single channel (?), whereas I'm trying to understand how it schedules between multiple channels when there's no associated TSG
14:26jbakita: That file is a bit difficult to parse though, so I've spent less time reading it than the other stuff in the open-gpu-doc repo
14:26karolherbst: yeah... not sure.. I think this goes deeper into what the firmware is doing and we usually don't need to do that ourselves
14:27jbakita: Unfortunately probably true
14:29jbakita: I'm trying to investigate if there are better ways to structure the runlist for safety-critical/latency-sensitive applications, but exactly how the runlist is run is a bit unclear
14:30karolherbst: yeah... I think Ben might be able to help here
14:30karolherbst: Ben deals with all this stuff on the kernel side
14:30RSpliet: jbakita: yes still fiddling with GPUs. But now employed by Imagination, so not with NVIDIA GPUs anymore
14:31RSpliet: Sounds like you haven't given up on your dream to make NVIDIA GPUs more HRT-friendly
14:31RSpliet: speaking of the devil!
14:32jbakita: RSpliet: Yeah, I took a brief detour into trying to better manage memory interference for mixed-criticality real-time on COTS multicore, but I'm trying to move back in the GPU direction
14:33RSpliet: Turns out all of that stuff is near-impossible if you have no control over the DRAM controller :-(
14:34jbakita: Page coloring can get you pretty far, but it's really hard to make any strong guarantees
14:35jbakita: Particularly with stuff like MSHR contention (yuck)
14:36RSpliet: Yep, there's the usual cache/DRAM bank partitioning tricks, and some people have tried TDM solutions... but ultimately I feel that all those controls are so far away from the DRAM controller scheduling decisions that it remains approximate. Or "mixed-criticality", as you said :-P
14:38jbakita: What specifically are you working on at Imagination (if you can say)?
14:41RSpliet: jbakita: nothing researchy currently, just performance modelling. Hope to get back to research at some point, but all in good time. Had to take a breather to recover from finishing that PhD
14:41RSpliet: speaking of which, in a solid act of patting myself on the back, have you seen the 12-page summary of my PhD in TC?
14:41jbakita: karolherbst: What's the rule for nouveau folks looking at NVIDIA's Jetson driver for reference? Are there issues because it's not all GPL?
14:42jbakita: RSpliet: I have not, I'll look it up now
14:42karolherbst: jbakita: the kernel side is all GPL, no?
14:44jbakita: karolherbst: I think it's a mix of MIT and GPL
14:45karolherbst: ehh, yes and no
14:45karolherbst: nouveau is also GPL
14:46karolherbst: you can't ship a linux kernel without _all_ being GPL
14:46karolherbst: that nouveau on its own is MIT is nice and all
14:46karolherbst: but when shipped as a whole it's GPL. period.
14:46karolherbst: MIT just allows you to ship the source also as GPL
14:47karolherbst: same with nvgpu
14:48karolherbst: licensing the drm drivers as MIT has the advantage that BSD folks can grab the source and include it
14:48orbea: it also has the advantage that some proprietary OS could use all the code while giving nothing back
14:49karolherbst: yeah well.. that's for the BSD folks to decide
14:50karolherbst: but yeah..
14:50karolherbst: it also allows vendors to ship their drivers as a non GPL version (akin to nvidia) based on the MIT source code
14:50karolherbst: amdgpu-pro is something like that? or was.. I don't know if they have some closed kernel module
14:50karolherbst: but yeah..
14:53jbakita: Well, that's a nice thing with nvgpu (the Jetson driver), as they have a much more nuanced approach to runlist management
14:53karolherbst: yeah.. well.. I would be surprised if ours would be better tbh :D
14:55jbakita: Haha, nouveau appears to just dump all the bare channels in and then all the channel groups. nvgpu appears to have the capability to take priority into account in some way when doing some sort of interleaving
14:55karolherbst: yeah.. but this might actually be a thing more requested by compute workloads and for graphics it might matter less.. dunno
14:56jbakita: Very good point, I've been very focused on compute work so far
14:56karolherbst: well at least we have working OpenCL for nouveau now.. for some definitions of "working"
14:56jbakita: I need to think more about the implications for graphics
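The two runlist layouts contrasted above can be sketched roughly as follows. This is a toy model, not code from nouveau or nvgpu: the `Entry` type, the `priority` field, and the specific interleaving rule (repeat every higher-priority entry before each lower-priority one) are all assumptions chosen for illustration, since the discussion only says that nvgpu takes priority into account "in some way".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str       # hypothetical label for a channel or channel group
    is_tsg: bool    # True for a channel group (TSG), False for a bare channel
    priority: int   # hypothetical field: higher value means more urgent


def flat_runlist(entries):
    """All bare channels first, then all channel groups, in submission order."""
    bare = [e for e in entries if not e.is_tsg]
    groups = [e for e in entries if e.is_tsg]
    return bare + groups


def interleaved_runlist(entries):
    """Assumed policy: replay all higher-priority entries before each lower one."""
    levels = sorted({e.priority for e in entries}, reverse=True)
    runlist = []
    for level in levels:
        current = [e for e in entries if e.priority == level]
        if not runlist:
            # Highest level: entries appear once, in submission order.
            runlist = current
            continue
        merged = []
        for e in current:
            merged.extend(runlist)  # everything more urgent runs again first
            merged.append(e)
        runlist = merged
    return runlist


if __name__ == "__main__":
    work = [Entry("tsg0", True, 2), Entry("ch0", False, 1), Entry("ch1", False, 1)]
    print([e.name for e in flat_runlist(work)])
    print([e.name for e in interleaved_runlist(work)])
```

Under this assumed policy a high-priority entry gets a turn between every pair of lower-priority entries, at the cost of a longer runlist.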
14:57jbakita: RSpliet: Do you have a link to the TC article? My university accidentally let our IEEE subscription lapse...
14:58jbakita: RSpliet: Your dissertation looks excellent though
14:59jbakita: karolherbst: Do you know what happened to that brief attempt by NVIDIA to get CUDA working with nouveau?
15:02RSpliet: jbakita: can you request the full-text through the Cambridge repo? https://www.repository.cam.ac.uk/handle/1810/318427
15:05jbakita: RSpliet: Yeah, but I think you have to approve the request
15:06RSpliet: I do. I bet it's greylisted, I'll approve as soon as I see the e-mail (... I hope they don't have an older e-mail address!)
15:25RSpliet: jbakita: it looks as if I'm not receiving the requests for full-texts from the repository. :-( I've sent them an e-mail to see if they are sending these requests to an outdated e-mail address... I don't think I have the PDF at hand myself at the moment. Bear with me!
15:36jbakita: RSpliet: No problem! I think IEEE policy now is that you're allowed to also post papers on personal websites etc if you want to
19:07marex: karolherbst: hello again, I need help with my computer :)
19:08marex: karolherbst: so now that the kernel doesn't go bonkers anymore, I'm still running into suspend/resume taking forever when wlroots/sway is running
19:08marex: there is some buffer that is being held by the compositor in the kernel (I believe), do you have any hints how to go about debugging that part ?
19:08marex: or at least how to narrow it down a bit ?
20:39karolherbst: marex: do you have any error thrown?
20:46marex: karolherbst: suspend/resume takes minutes
20:46marex: karolherbst: there is no error, although it is visible from kernel log timestamps
20:46marex: karolherbst: the nouveau module does have some debug options, doesnt it ?
20:52karolherbst: marex: maybe boot with nouveau.debug=trace
20:52karolherbst: it's _very_ verbose, but maybe it helps...
20:52karolherbst: on second thought...
20:52karolherbst: maybe use nouveau.debug=debug first
20:55marex: karolherbst: lets see if I can rmmod/modprobe nouveau instead
20:57marex: I can't, hum, so lets reboot
21:09marex: grumb, kernel log buffer too small
21:11marex: try two
21:17Lyude: i always do log_buf_len=10M
21:19marex: Lyude: that box isnt mainline for debugging kernel
21:19marex: ugh ...
21:21marex: OK, paste.debian.net recognizes the nouveau kernel log as "do not spam"
21:21marex: what the ...
21:25marex: karolherbst: https://dpaste.org/hQ3v try this
21:25marex: Lyude: ^
21:25marex: notice these two "suspend completed in <2s>" ...
21:30karolherbst: marex: ehhhh....
21:30karolherbst: something looks wrong :D
21:31karolherbst: skeggsb: any idea on how to debug that?
21:35skeggsb: will need to see a more complete log
21:36marex: skeggsb: there isnt really all that much else
21:36marex: skeggsb: can I make it a bit more verbose ?
21:37marex: or rather , what am I looking for ?
21:37skeggsb: the verbosity is fine, but i need all the messages *before* the ones in the paste too, to see more specifically what part of suspend is taking longer
21:37skeggsb: nouveau: DRM-master:00000000:00000080: suspend running...
21:37skeggsb: nouveau: DRM-master:00000000:00000080: suspend completed in 1780003us
21:37skeggsb: that is the time taken for that, and all children (which will be earlier in the log)
21:40skeggsb: ah, and actually, can you use "trace" and not "debug"
21:40marex: skeggsb: https://dpaste.org/Kkan
21:40marex: skeggsb: try that ^
21:41marex: skeggsb: I would really like to know how can you tell what went on from this spew
21:41skeggsb: yeah sorry, i misspoke above, i need the trace output instead
21:44marex: hold on, its suspending ...
21:51marex: 32 MiB kernel log buffer was insufficient, hold on
22:08marex: skeggsb: https://dpaste.org/tmb8
22:08marex: skeggsb: the whole log has some 70 MiB
22:08marex: the imem looks suspicious ?
22:08skeggsb: yeah, it's what i thought it would be, backing up all the page tables and engine contexts etc
22:09skeggsb: that's a read-back over vram, and can be *veeeeery* slow
22:09marex: how much is there to back up if its just wlroots running ?
22:10skeggsb: varies a lot based on GPU (graphics context size can be very different)
22:10marex: but the suspend still takes minutes, not seconds ... maybe I should take a more general approach and profile the suspend itself, to see whether this is even a nouveau problem
22:11skeggsb: hmm yes actually, that value is "only" 1.8 seconds
22:11marex: the odd thing is, when I stop the wlroots/sway, the suspend is almost instant
22:11marex: so there must be something with the sway ...
22:11skeggsb: in the log, what are the kernel timestamps from the very start of nouveau suspend messages, until the end?
22:13marex: 252...491 seconds from boot, give or take
22:14marex: lemme add initcall_debug just for a try
22:14skeggsb: ah, not from boot, just from when nouveau starts its suspend (ie. to see if it's nouveau that's taking ages somewhere)
22:15marex: yep, suspend commences at 252 mark, and the system disables CPUs around 491
22:15marex: in those ~3 minutes, "something" happens ...
22:23marex: OK, so it is nouveau
22:27marex: ahhh, I think I see it now
22:28marex: skeggsb: https://dpaste.org/v1ME#L367 look at line 302 and then line 367 and notice the timestamps there
22:28marex: could it be the I2C got turned OFF and then the FAN control still tries to frob with the fan (likely over i2c) and that is where it accumulates massive timeouts ?
22:28marex: Lyude: karolherbst: ^
22:37karolherbst: marex: unlikely
22:38marex: so why are there these 1 second long gaps ?
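Spotting gaps like these by eye in a 70 MiB log is tedious. A minimal sketch of automating it, assuming the log is plain dmesg output where each line starts with a `[seconds.microseconds]` timestamp; the function name `find_gaps` and the 1-second default threshold are illustrative choices, not part of any existing tool.

```python
import re

# Matches the leading "[  356.934800]" style timestamp of a dmesg line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_gaps(lines, threshold=1.0):
    """Return (gap_seconds, previous_line, next_line) for each jump >= threshold."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp (e.g. continuation lines)
        now = float(match.group(1))
        if prev_time is not None and now - prev_time >= threshold:
            gaps.append((now - prev_time, prev_line, line))
        prev_time, prev_line = now, line
    return gaps


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python find_gaps.py < suspend.log
    for gap, before, after in find_gaps(sys.stdin):
        print(f"{gap:10.3f}s between:\n  {before.rstrip()}\n  {after.rstrip()}")
```

The lines on either side of each reported gap show which message was the last one printed before the stall and which one ended it.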
22:39karolherbst: it might not even be a nouveau bug directly...
22:39karolherbst: the entire suspend process is a bit... annoying
22:40karolherbst: marex: do you know when the kernel gets notified about suspending?
22:40karolherbst: does it wait mostly before that or after?
22:45marex: karolherbst: doesn't the initcall debug indicate that nouveau is what is taking forever to suspend ?
22:45marex: karolherbst: as for the later part of your question, uh ... what am I looking for in the kernel log ? somewhere around freezing processes or something else ?
22:45karolherbst: why does it?
22:45karolherbst: yeah.. something like that
22:45karolherbst: it is quite obvious
22:46karolherbst: "PM: suspend entry" I think
22:47karolherbst: before that userspace is doing random stuff to prepare going into suspend
22:47karolherbst: and sway might just do this
22:47karolherbst: not saying it's not nouveau's fault, but we shouldn't assume that nouveau actually slows down suspending; it might just confuse sway or whatever
22:48karolherbst: or maybe something else is going on...
22:49marex: [ 89.986823] PM: suspend entry (deep)
22:49marex: [ 90.348508] nouveau 0000:01:00.0: PM: calling pci_pm_suspend+0x0/0x160 @ 370, parent: 0000:00:03.0
22:49marex: [ 90.388790] pcieport 0000:00:1c.7: PM: pci_pm_suspend+0x0/0x160 returned 0 after 1 usecs
22:50marex: [ 356.934800] nouveau 0000:01:00.0: PM: pci_pm_suspend+0x0/0x160 returned 0 after 266587034 usecs
22:50marex: [ 356.934935] pcieport 0000:00:03.0: PM: calling pci_pm_suspend+0x0/0x160 @ 993, parent: pci0000:00
22:50marex: [ 356.935977] nouveau 0000:01:00.0: PM: calling pci_pm_suspend_late+0x0/0x30 @ 993, parent: 0000:00:03.0
22:50marex: [ 356.936001] nouveau 0000:01:00.0: PM: pci_pm_suspend_late+0x0/0x30 returned 0 after 0 usecs
22:50karolherbst: yeah.. okay, that's quite obvious that it's nouveau stalling :D
22:50marex: [ 356.954454] nouveau 0000:01:00.0: PM: calling pci_pm_suspend_noirq+0x0/0x2c0 @ 7, parent: 0000:00:03.0
22:50marex: [ 356.954462] nouveau 0000:01:00.0: PM: pci_pm_suspend_noirq+0x0/0x2c0 returned 0 after 0 usecs
22:50marex: [ 356.976172] Disabling non-boot CPUs ...
22:51marex: that should be all that is relevant, the rest looks like ... churn
22:51karolherbst: yeah.. so the pci_pm_suspend call into nouveau is taking so long
22:51marex: yep, I think so too
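The same conclusion can be pulled out of a large `initcall_debug` log mechanically. The sketch below assumes the "returned ... after N usecs" line format shown in the paste above and ranks device PM callbacks by duration; the function name `slowest_callbacks` is made up for this example.

```python
import re

# Matches lines such as:
# [ 356.934800] nouveau 0000:01:00.0: PM: pci_pm_suspend+0x0/0x160 returned 0 after 266587034 usecs
RETURNED = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<driver>\S+)\s+(?P<device>\S+): PM: "
    r"(?P<callback>\S+) returned (?P<ret>-?\d+) after (?P<usecs>\d+) usecs"
)


def slowest_callbacks(lines, top=5):
    """Return up to `top` (usecs, driver, device, callback) tuples, slowest first."""
    results = []
    for line in lines:
        match = RETURNED.match(line)
        if match:
            results.append(
                (
                    int(match["usecs"]),
                    match["driver"],
                    match["device"],
                    match["callback"],
                )
            )
    results.sort(reverse=True)
    return results[:top]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python slowest.py < suspend.log
    for usecs, driver, device, callback in slowest_callbacks(sys.stdin):
        print(f"{usecs / 1e6:10.3f}s  {driver} {device}  {callback}")
```

Lines that only announce a call ("PM: calling ...") do not match the pattern and are ignored, so only completed callbacks are ranked.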
22:53karolherbst: skeggsb: btw.. is there anything critical in the MR? If not, I'd use this one to figure out what I can do with a merge bot making the process less painful (like adding s-o-b tags, rebasing, adding Link: tags, etc...) If yes, we might want to short-circuit the critical things and I play around with the less important patches
22:54karolherbst: on the other hand.. we are at rc5 and it will take like 3-4 weeks until those patches will be pulled into drm anyway
22:54karolherbst: into upstream
23:18skeggsb: marex: what's dmesg say after nouveau loads for "using <X> for buffer copies"
23:18skeggsb: karolherbst: most of them are trivial bits iirc, Lyude wanted her crc stuff pulled in properly though
23:21karolherbst: skeggsb: all of Lyude's patches I assume?
23:21karolherbst: anyway.. it shouldn't take me too long to figure out those fings
23:24marex: skeggsb: I can mail you the whole log if you want, redhat or gmail mail ?
23:25marex: it is like 1.5 MiB compressed
23:26skeggsb: sure, doesn't matter which address
23:29marex: skeggsb: sent to redhat one, compressed to max