00:07imirkin: on boot:
00:07imirkin: [ 38.358531] nouveau 0000:01:00.0: disp: outp 00:0006:0344: training (min: 1 x 270 MB/s)
00:07imirkin: on resume:
00:07imirkin: [ 72.344911] nouveau 0000:01:00.0: disp: outp 00:0006:0344: training (min: 4 x 540 MB/s)
00:08imirkin: that's not going to go so well ... that's DP 1.2, not supported by your GPU
00:08imirkin: so, question is, why does it think it needs so much bw, and why does it even consider 540MB/s
00:09pedahzur: Haven't a clue. :)
00:09imirkin: on boot: disp: outp 00:0006:0344: data 169155 KB/s link 0 KB/s mst 0->0
00:10imirkin: on resume: disp: outp 00:0006:0344: data 281925 KB/s link 0 KB/s mst 0->
00:10pedahzur: Note that log includes resuming twice.
00:10imirkin: oh. right. the "on boot" is actually "on first resume"
00:11pedahzur: So the black screen results from an invalid DP version read?
00:12imirkin: well, it's the result of unsuccessful DP link training
00:12imirkin: which is a pre-requisite for DP things working
00:28pedahzur: Does it need to retry? Or have a "special case" for that card that says "No! It's not DP 1.2...don't believe it!" :)
00:29pedahzur: Is there anything I can run at the command line to retry the train/init?
00:49skeggsb: there's two bugs here
00:50skeggsb: in the failing case, we're trying to use 30bpc (nfi why, haven't looked at that yet), and there's not enough DP bandwidth for that
00:50skeggsb: that's the core issue
00:51skeggsb: secondary bug where we pick an insane rate to train at (also don't know why yet)
00:51skeggsb: the second issue only happens because of the first one, so, that should be the focus :P
00:51pedahzur: logging off for the day. If there is something you would like me to try, please reply to the mailing list thread.
00:54skeggsb: i'm not sure i'll be able to properly look more until tomorrow at least anyway, perhaps imirkin will see what causes the first issue before i get there
01:07imirkin: unlikely that i ever will :)
01:08imirkin: maybe some read returning 0xffffffff or something?
01:32skeggsb: fixed the second issue while i was eating lunch.. but won't help the actual bug
01:34imirkin: that's a good one
01:34imirkin: apparently units matter, huh
01:35imirkin: index into array != value in array... who knew
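The bug class imirkin is joking about (an array index handed back where the array's value was expected) can be sketched roughly as follows. This is a hypothetical Python illustration, not the nouveau code: the table `LINK_RATES_MBPS` and both functions are made up for the example, and only the 162/270/540 MB/s per-lane figures are taken from the log.

```python
# Hypothetical illustration only; this is NOT the actual nouveau code.
# Per-lane DP link rates in MB/s, using the figures quoted in the log.
LINK_RATES_MBPS = [162, 270, 540]


def min_rate_buggy(required_mbps):
    """Return the first sufficient rate... except it returns the index."""
    for i, rate in enumerate(LINK_RATES_MBPS):
        if rate >= required_mbps:
            return i  # bug: this is a position in the table, not MB/s
    return len(LINK_RATES_MBPS) - 1


def min_rate_fixed(required_mbps):
    """Return the first sufficient rate as a value in MB/s."""
    for rate in LINK_RATES_MBPS:
        if rate >= required_mbps:
            return rate
    return LINK_RATES_MBPS[-1]  # nothing is fast enough: cap at the maximum


if __name__ == "__main__":
    print(min_rate_buggy(200))  # a table position, meaningless as a rate
    print(min_rate_fixed(200))  # a rate in MB/s
```

A caller that treats the buggy return value as a rate in MB/s will compare or program the wrong quantity, which is one way to end up requesting a nonsensical training rate.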
01:35skeggsb: best guess on the other is that connector->display_info.bpc is "0" on resume, and our default is 10bpc in that case
01:35skeggsb: and we also don't bother to validate bw when selecting that value..
01:35imirkin: oh yeah, that seems reasonable
01:36imirkin: although it should be able to do the higher bw with 2x240MB/s
01:36imirkin: er, 2x270
01:36skeggsb: no, the sink only has 1 lane
01:36imirkin: is 324MB/s DP1.2?
01:37skeggsb: no, but either end of a DP connection is allowed to support 1/2/4 lanes
01:37imirkin: oh, 324MB/s = 162x2, nevermind
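The numbers in the log are consistent with a jump from 6bpc to 10bpc: 281925 / 169155 = 30 / 18. A small sketch of that arithmetic follows. The pixel clock of 75180 kHz is an assumption inferred from the 169155 KB/s figure at 6bpc (it does not appear in the log), KB is assumed to mean 1000 bytes, and the helper names are hypothetical.

```python
# Sketch of the bandwidth arithmetic; helper names are hypothetical.
# Assumption: pixel clock inferred from 169155 KB/s at 6 bpc (not in the log).
PIXEL_CLOCK_KHZ = 75180


def data_rate_kbps(pixel_clock_khz, bpc):
    """Payload rate in KB/s: 3 colour components per pixel, 8 bits per byte."""
    return pixel_clock_khz * 3 * bpc // 8


def link_rate_kbps(lanes, per_lane_mbps):
    """Usable link rate in KB/s for `lanes` lanes at `per_lane_mbps` MB/s each."""
    return lanes * per_lane_mbps * 1000


if __name__ == "__main__":
    for bpc in (6, 10):
        need = data_rate_kbps(PIXEL_CLOCK_KHZ, bpc)
        have = link_rate_kbps(1, 270)  # the sink has a single lane
        print(f"{bpc} bpc: need {need} KB/s, 1 x 270 MB/s gives {have} KB/s, "
              f"fits: {need <= have}")
```

Under these assumptions 6bpc fits in a single 270 MB/s lane while 10bpc does not. Two lanes at 270 MB/s would carry it, but the sink only has one, so the only remaining option is the 540 MB/s rate the GPU cannot do.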
01:38imirkin: so that would mean that we try to link train before edid becomes available?
01:39skeggsb: i'm not too sure how that happens, we don't train the link until modeset, and presumably we have edid at that point
01:41imirkin: but maybe the decision is done before?
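skeggsb's guess is that a reported bpc of 0 falls back to 10bpc with no bandwidth check. One way to picture a fallback that does validate bandwidth is sketched below. This is not what the driver does, nor the fix that was eventually applied; the function name, the candidate list and the 75180 kHz pixel clock used in the example calls are all assumptions made for illustration.

```python
# Hypothetical sketch: choose a bpc, but only one the link can actually carry.
CANDIDATE_BPC = (16, 12, 10, 8, 6)  # assumption: depths tried, highest first


def pick_bpc(display_bpc, pixel_clock_khz, link_kbps, default_bpc=10):
    """Return the highest bpc <= the requested one that fits in link_kbps.

    display_bpc == 0 means "unknown", in which case default_bpc is used
    as the starting point. Returns None if even the lowest depth is too big.
    """
    wanted = display_bpc if display_bpc else default_bpc
    for bpc in CANDIDATE_BPC:
        if bpc > wanted:
            continue
        needed_kbps = pixel_clock_khz * 3 * bpc // 8
        if needed_kbps <= link_kbps:
            return bpc
    return None


if __name__ == "__main__":
    one_lane_270 = 1 * 270 * 1000  # KB/s, single lane at 270 MB/s
    print(pick_bpc(6, 75180, one_lane_270))  # sink reports 6 bpc
    print(pick_bpc(0, 75180, one_lane_270))  # bpc unknown on resume
```

With the assumed pixel clock, the unknown-bpc case would step down from 10 to a depth that fits a single 270 MB/s lane instead of requesting a link the hardware cannot train.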
02:04imirkin: cosurgi: did you get a chance to try out that patch?
11:55tretinha: hello! i have a GeForce 920MX, running arch linux with nouveau drivers
11:56tretinha: i'd like to record with obs but i have NVENC problems
11:56tretinha: do any of you know any solution to this?
11:57tretinha: kernel: 5.4.10-arch1-1
11:58tretinha: i have installed "libva-mesa-driver" and "mesa-vdpau" from arch linux with no luck
12:24karolherbst: tretinha: we don't support video encoding
12:24tretinha: there's no way around it?
14:10joepublic: sure there is, but it would take many coders and much time
14:14karolherbst: imirkin: before digging too much into it, but I think the crash in RA I tried to fix is essentially caused by this line: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#n963
14:14karolherbst: does that make sense to you as well?
14:16imirkin: sure, "caused"
14:17imirkin: it's like a core part of the algorithm though, so not easily removed
14:17karolherbst: yeah... but I had this idea to save it somewhere else and do the actual step later
14:17karolherbst: but... that sounds a bit messy to do for this one
14:18karolherbst: but maybe I figure something out...
14:18karolherbst: just.. it feels messy
16:01karolherbst: uff, actually, it wasn't that bad
16:02karolherbst: might even send a patch out tomorrow
16:09karolherbst: imirkin: something like that, but less ugly? https://github.com/karolherbst/mesa/commit/3e9c41238b08a45d74e81153d19d945d778decab
16:10imirkin_: heh. map on Value * ... what could possibly go wrong
16:10imirkin_: would it make sense to index it by RIG node id?
16:14karolherbst: I also have to handle removed Values/nodes
16:15imirkin_: the whole Value removing stuff is dodgy
16:15imirkin_: you really don't want to *delete* them
16:20karolherbst: well, it happens when you delete instructions
16:21karolherbst: which we do when we spill
16:27karolherbst: sometimes I am thinking about moving to ralloc and only marking objects as dead so we can assert on that.. but that would also require quite some time and we already have some pooled allocation stuff going on actually
16:27karolherbst: it's just not hierarchical
16:28imirkin_: yes, values are alloc'd out of a pool
16:28imirkin_: calim was trying to avoid doing too much c++ stuff
16:28imirkin_: since mesa was fairly anti-c++ at the time
16:28imirkin_: that's why there are all these semi-stl things
16:28karolherbst: he could have defined new container types with custom allocators.. would be less painful I think
16:29imirkin_: was trying to avoid templates and whatnot
16:29karolherbst: like using basic_list to have a pooledList
16:29karolherbst: but yeah..
16:29imirkin_: i went rogue with the tr1 unordered_map stuff :)
16:29karolherbst: now we require c++14 anyway
16:30imirkin_: like i said - different time.
16:30karolherbst: we could clean up so much stuff.. maybe I even go ahead and do it
16:32karolherbst: anyway, I will think about the RA fix a bit, but I think the current approach goes in the right direction.. just need to make it look nicer and handle those weirdo corner cases
17:53pqatsi: karolherbst: Do you mind building the kernel-headers package in https://copr.fedorainfracloud.org/coprs/karolherbst/Nouveau_Testing/ too?
17:53karolherbst: the headers would be the same
17:54karolherbst: or... wait...
17:54karolherbst: they wouldn't
17:54pqatsi: karolherbst: I've been using your kernel since we found that a patch of yours stopped a nouveau crash on my machine, so I don't use the upstream kernel anymore. But now I'm being forced to build a module that's outside the mainline kernel
17:54pqatsi: And the Fedora version doesn't match yours
17:55pqatsi: "Package kernel-headers-5.4.7-200.fc31.x86_64 is already installed." :(
17:55karolherbst: ohh wait
17:55karolherbst: the repo contains the kernel-headers
17:55karolherbst: you might just need to upgrade
17:55pqatsi: karolherbst: it isn't here https://copr.fedorainfracloud.org/coprs/karolherbst/Nouveau_Testing/build/1144524/ and dnf doesn't find it either
17:56karolherbst: mhh, interesting
17:56karolherbst: pqatsi: actually, I think you hit a different error
17:56karolherbst: you need kernel-devel
17:56karolherbst: kernel-headers are just the UAPI
17:58karolherbst: I don't have issues building out of tree modules, so I would have noticed if something would be missing
17:59pqatsi: karolherbst: hmmm, What i'm facing is this: https://paste.centos.org/view/54610f7f
17:59karolherbst: # ls /usr/src/kernels/: 5.4.10-9001.fc31.x86_64 5.4.7-9001.fc31.x86_64 5.4.8-200.fc31.x86_64 here
17:59karolherbst: pqatsi: do you have kernel-devel installed?
18:00karolherbst: or just do "dnf install /usr/src/kernels/5.4.10-9001.fc31.x86_64"
18:00pqatsi: [root@manauara rts5139]# rpm -qi kernel-devel-`uname -r` | grep ^Version
18:00pqatsi: Version : 5.4.10
18:00karolherbst: use the dnf install command
18:00karolherbst: that should help
18:01pqatsi: karolherbst: "dnf install /usr/src/kernels/5.4.10-9001.fc31.x86_64" says its already installed
18:01karolherbst: check "ls /usr/src/kernels/" then
18:01karolherbst: ohhh wait
18:01karolherbst: Documentation/Kconfig doesn't exist for real
18:02karolherbst: the same thing with the fedora kernel
18:02pqatsi: karolherbst: Hmmm, I'll look for a way to just remove it from the module I'm trying to build
18:03karolherbst: the file does exist in the kernel tree though
18:03karolherbst: maybe that could be considered a fedora bug?
18:04pqatsi: hmmm :(
18:04karolherbst: with_doc in the spec file
18:05pqatsi: karolherbst: Also https://bugzilla.redhat.com/show_bug.cgi?id=1478726
18:05karolherbst: mhh, I could enable it.. let me see how to do that
18:06pqatsi: karolherbst: If you don't mind, please. I think I'll need to do with my Realtek card reader what I already do with NVidia: make changes at every kernel upgrade :(
18:08karolherbst: pqatsi: triggered a new build.. might take 10 hours or so
18:09pqatsi: karolherbst: thank you so much :)
18:10karolherbst: pqatsi: https://copr.fedorainfracloud.org/coprs/build/1144781
18:10karolherbst: pqatsi: I think there will be a new kernel-doc package when it's done
18:12pqatsi: Ok! I'll keep my eyes on the build. Thanks!
18:30pqatsi: karolherbst: A bit offtopic, but my module requires an include contained in the kernel-debug package. Do you know how I can include the path of the debug symbols in make? (Ref.: https://paste.centos.org/view/f1e1181e)
18:31karolherbst: pqatsi: it could be that the module simply doesn't compile against a newer kernel...
18:32pqatsi: Hmmm, fair :(
18:32karolherbst: ohh.. but that header isn't shipped either
18:33karolherbst: pqatsi: I think for this error you could file a bug against fedora
18:35pqatsi: karolherbst: I'll need to think of a way to boot with nouveau blacklisted in the fedora kernel, as I think developers may not accept a report against a non-official kernel.
18:35karolherbst: boot with nouveau.modeset=0
18:35pqatsi: karolherbst: Anyways, I found this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1758710
18:38pqatsi: karolherbst: Maybe this https://bodhi.fedoraproject.org/updates/FEDORA-2019-68d7f68507 patch isn't applied to your kernel?
18:38karolherbst: I use the fedora sources
18:39pqatsi: weird :/
18:50pqatsi: karolherbst: As pointed out in https://patchwork.openembedded.org/patch/155873/, a "cp -R /usr/src/debug/kernel-5.4.fc31/linux-5.4.11-9001.fc31.x86_64/security/selinux/include/ /usr/src/kernels/5.4.10-9001.fc31.x86_64/security/selinux/" worked. Is this an upstream bug?
18:52pqatsi: same goes for "LANG=en_US cp -R /usr/src/debug/kernel-5.4.fc31/linux-5.4.11-9001.fc31.x86_64/tools/include/tools/be_byteshift.h /usr/src/kernels/5.4.10-9001.fc31.x86_64/tools/include/tools/" :(
21:30imirkin_: Lyude: since you're the resident DP expert ... can you look at the mail thread on nouveau@ subject 'Display broken after resume from suspend' ?
21:30Lyude: imirkin_: sure thing
21:30imirkin_: and see if there's (a) anything odd that jumps out at you and (b) any reason for the bpc to be messed up
21:30Lyude: oh no bpc
21:31imirkin_: is that like "oh, no bpc"? or "oh no! bpc!"? :)
21:31Lyude: second :P, I fixed some bpc related stuff recently
21:31imirkin_: it _seems_ like it gets the bpc wrong on resume
21:32imirkin_: but unclear how that would happen
21:32imirkin_: but also haven't read oodles of code
21:32Lyude: iirc on certain connectors sometimes the bpc can get changed between suspend/resume, although we're supposed to never change the bpc when applying duplicated states (so when we resume, we always keep the exact same bpp we had before suspending with the expectation that if connectors changed userspace will do a modeset to change it)
21:33pedahzur: Lyude: That's my thread. The latest kernel dump is two resumes. The first one works, the second one fails: laptop is awake, can ssh in, screen back-light on, but black screen.
21:33cosurgi: imirkin_: I didn't try this patch yet ( https://github.com/skeggsb/nouveau/commit/dd09ebc623e3b3f2ee1ebd9df53bb0754b1dc79b ) . Sorry.
21:33imirkin_: cosurgi: oh, no worries
21:33Lyude: pedahzur: what kernel is this btw?
21:33imirkin_: cosurgi: you're the one with SIGBUS's :)
21:33pedahzur: Lyude: One sec. I think it's in the top-of-thread e-mail.
21:33imirkin_: i'm in no rush
21:33pedahzur: Lyude: Linux joyful 5.3.0-26-generic #28-Ubuntu SMP Wed Dec 18 05:37:46 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
21:34cosurgi: imirkin_: I will let you know definitely. Yeah, I know ;)) Before I will be starting another round of calculations I will reboot with it.
21:34pedahzur: Ubuntu 19.10
21:34cosurgi: imirkin_: I want current calculations to finish first. Will take a couple more days.
21:34Lyude: ahh, pedahzur - think you could try a newer kernel and see if that fixes your issue?
21:34imirkin_: cosurgi: sure, whenever
21:35pedahzur: Lyude: Certainly could. I'm up to date on that box. From where do you want me to pull it?
21:35Lyude: pedahzur: for ubuntu uhhhh, one sec
21:35imirkin_: cosurgi: you should start a datacenter at your place
21:35imirkin_: get some quad-socket boards or something
21:37Lyude: pedahzur: I -think- this should be what you want, note I don't run ubuntu myself: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.5-rc6/
21:37pedahzur: Lyude: I mean, this is a "scratch" system. I could try 20.04...but I'm not even sure that's Alpha yet. :)
21:38imirkin_: Lyude: are you talking about the thing where you added 8bpc caps?
21:38Lyude: pedahzur: I don't think you'll get a mainline kernel with 20.04 though
21:38Lyude: imirkin_: yeah
21:38imirkin_: that's just a hack that will happen to work out here
21:39Lyude: imirkin_: yeah, when I've got time i need to implement proper max bpc support
21:39imirkin_: but the issue here is that connector_info.bpc is wrong
21:39imirkin_: not the max bpc support.
21:40Lyude: imirkin_: yes, but that patch series also moves the bpc into the atomic state so we use that when resuming a state, instead of connector_info.bpc
21:40imirkin_: (or at least, that's our guess ... coz it starts at 6, and then ends up with the 10 settings)
21:40imirkin_: aha, ok
21:40imirkin_: so that should be much better then
21:41imirkin_: Lyude: this one, right? https://github.com/skeggsb/nouveau/commit/880b2e7ab3a2aacb16ea48d2c92eb11b93cc59ea
21:41Lyude: imirkin_: bingo
21:41pedahzur: Lyude: I'll give it a go! :)
21:42imirkin_: pedahzur: alternatively patch the above into your tree
21:42imirkin_: yeah, looking at connector->display_info seems quite dodgy
21:42pedahzur: Yeah, if i can find an image it's already in, then I'm happy with that. :)
21:42pedahzur: So 880b2e7ab3a2aacb16ea48d2c92eb11b93cc59ea is in 5.5-rc6?
21:43Lyude: pedahzur: yep
21:43imirkin_: that commit hash is bogus
21:43Lyude: also ^
21:43imirkin_: but a proper variant against the linux kernel tree is there, i think
21:44imirkin_: that's an actual commit in the kernel tree.
21:53cosurgi: imirkin_: the electricity is damn too expensive. I prefer to put most of my servers in the university server room. Where electricity is free ;))
21:55pedahzur: Lyude: Welp...5.5.0-rc6. Same behavior. First resume works, second does not. Do you want more dmesg logs?
21:56Lyude: pedahzur: yeah, suspend/resume with "drm.debug=0x116 log_buf_len=50M" added to your kernel commandline at boot (remove the quotes)
21:56Lyude: then get me the dmesg from that
21:58imirkin_: oooh, 116, fancy
21:58imirkin_: everyone has their favorite drm.debug setting :)
21:58Lyude: imirkin_: hehe
21:59pedahzur: My favorite debug setting is ALL_THE_THINGS. :)
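For readers wondering what Lyude's 0x116 selects: `drm.debug` is a bitmask of debug categories. The sketch below decodes a mask using the category bits as I understand them from the kernel's drm_print.h around the 5.x series; treat the table as an assumption and check it against the kernel you are actually running. The function name is made up for the example.

```python
# Assumed drm.debug category bits (kernel 5.x era); verify against your kernel.
DRM_DEBUG_BITS = {
    0x001: "core",
    0x002: "driver",
    0x004: "kms",
    0x008: "prime",
    0x010: "atomic",
    0x020: "vbl",
    0x040: "state",
    0x080: "lease",
    0x100: "dp",
}


def decode_drm_debug(mask):
    """Return the names of the categories enabled in `mask`, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0x116))
```

Under that table, 0x116 enables the driver, kms, atomic and dp categories, which matches what you would want for a DP link-training problem across suspend/resume.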
22:04pedahzur: Lyude: Only 432K...are you sure we have the right debug settings? :)
22:05Lyude: pedahzur: yep
22:05pedahzur: OK, with log len set to 50M, I was expecting more.:)
22:05Lyude: i just set it to 50M so I never have to ask for a log twice :P
22:06pedahzur: Lyude: Sent to the list.