00:13 imirkin: i think in general, most drivers aren't as affected, since they use u_vbuf
00:31 edgecase: hey I think I've got the same bug as https://bugzilla.kernel.org/show_bug.cgi?id=203033 on GT218 here
00:32 edgecase: not sure where is the best place to ask advice, i'm bisecting kernel now but it's an intermittent problem after linux 5.0.0
00:33 edgecase: one clue is [ 506.196840] [TTM] Buffer eviction failed
00:36 edgecase: stuck process is wine, wchan dma_fence_default_wait
00:38 edgecase: this is a regression from Ubuntu 19.04 -> 19.10. If I use the older 19.04 kernel v5.0.0, everything works. 19.10's kernel is 5.3, ees brok
01:14 edgecase: try latest from git https://github.com/skeggsb/nouveau.git
01:35 edgecase: doesn't build vs 5.5.7 ?
01:37 edgecase: https://nouveau.freedesktop.org/wiki/InstallNouveau/ suggests 5.5 should work?
01:45 edgecase: ugh full kernel compile
01:55 imirkin: edgecase: just go with upstream
02:30 edgecase: er, upstream is https://github.com/skeggsb/linux.git ?
02:35 imirkin: that or torvalds
02:35 imirkin: note that the master branch on ben's tree is ... silly
02:35 imirkin: you want to pick a linux-* branch
02:42 imirkin: alright, got all the EXT_texture_shader_lod tests to pass
02:43 edgecase: oh, i forgot to mention ubuntu's ppa kernel 5.5.7 is broken the same, so torvalds (= kernel.org stable yes?) would likely be the same?
03:19 edgecase: maybe someone can spell it out for me, if i use kernel.org mainline is that the newest, or would skeggsb's branch linux-5.7 be newer?
03:20 imirkin: i think there could be something in the linux-5.7 branch which is newer
03:31 edgecase: git log from linux-5.7 branch shows from Dec 11, 2019, kernel.org mainline updated 14 days ago, so maybe not?
03:31 imirkin: can't really go by dates
03:31 imirkin: i could have developed a change 30 years ago
03:31 imirkin: that never went to mainline
03:32 edgecase: ah, pull-request getting merged, takes today's date?
03:33 imirkin: well, the merge commit is the merge commit
03:33 imirkin: but the actual commits have whatever date they have
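
The point about dates can be illustrated with a toy commit graph. This is only a sketch: the `Commit` class, the commit names, and every date except Dec 11, 2019 are made up for illustration, and it models history in plain Python rather than calling git. It shows that the newest date on a branch says nothing about whether a given commit is in it; only reachability from the branch tip does.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for a git commit: a name, an author date, parents."""
    name: str
    authored: date
    parents: tuple = ()


def ancestors(tip):
    """Return the names of all commits reachable from `tip`, including itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit.name in seen:
            continue
        seen.add(commit.name)
        stack.extend(commit.parents)
    return seen


def contains(tip, commit):
    """True if `commit` is part of the history that `tip` points at."""
    return commit.name in ancestors(tip)


# Made-up history: one old fix that lives only on a topic branch,
# and a mainline that kept moving afterwards.
base = Commit("base", date(2019, 12, 1))
old_fix = Commit("old_fix", date(2019, 12, 11), (base,))
mainline_tip = Commit("mainline_tip", date(2020, 2, 17), (base,))
topic_tip = old_fix

# Mainline has the later date but does not contain the fix.
mainline_is_later = mainline_tip.authored > topic_tip.authored
mainline_has_fix = contains(mainline_tip, old_fix)

# Once the topic branch is merged, the merge commit carries its own date,
# while old_fix keeps the date it was authored with.
merge = Commit("merge", date(2020, 3, 1), (mainline_tip, topic_tip))
merged_has_fix = contains(merge, old_fix)
```

With real repositories the equivalent question is whether one commit is an ancestor of a branch tip, which is what git itself can answer; comparing the dates shown in the log cannot.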
03:35 edgecase: sorry, I'll work on git training elsewhere... I just want to try the newest code to see if the issue is resolved. Kind of like a cave man who wants the biggest fire, because fire is powerful.
03:36 edgecase: will compile linux-5.7 branch and see how well it burns.
03:37 edgecase: I am amazed that nouveau works so well, RE isn't easy!
03:40 imirkin: most people are amazed it works so poorly
03:40 edgecase: drm/nouveau: zero vma pointer even if we only unreference it rather than free
03:40 edgecase: I'm not sure this affects anything, but best be safe.
03:40 edgecase: from log msg, sounds promising
03:41 edgecase: sounds vaguely like dmesg i got
03:41 imirkin: dunno if it's made it to the main branch, but i think this change could also help with some stability issues: https://github.com/skeggsb/nouveau/commit/ed875d952dc099516c467b04e0070fb84a60c71b
03:52 edgecase: any relation to [ 20.532562] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -12 ??
03:52 imirkin: cma? probably not
03:53 edgecase: that was around linux 5.1.21. anywhoo, will compile skeggsb's and try
12:48 edgecase: well new error, that's progress I guess: [ 188.332109] refcount_t: underflow; use-after-free.
12:49 edgecase: [ 188.332272] ttm_bo_put+0x27e/0x370 [ttm]
12:49 edgecase: [ 188.332357] nouveau_gem_new+0x8a/0x120 [nouveau]
12:55 edgecase: last I heard, debugging memory management and heaps = not fun
12:57 edgecase: is there an ElectricFence for ttm?
13:49 edgecase: what should I do next to try and solve this?
15:01 imirkin_: edgecase: there's KASAN and some other thing too
15:01 imirkin_: refcount underflow sounds plausible for this issue ... skeggsb is probably going to be the only one capable of tracking it down though =/
15:02 imirkin_: (and his availability varies greatly... sometimes he's super-responsive, and sometimes he's entirely awol)
15:02 imirkin_: [and other times it's awl, i guess... heh.]
16:48 edgecase: should I rebuild kernel with KASAN on?
16:49 imirkin_: unfortunately i don't have any concrete advice
16:49 imirkin_: sorry!
17:03 edgecase: np. KASAN claims to help debug SLAB and SLUB allocators, but maybe TTM is using something else?
17:03 edgecase: i could try and bisect again, I think I skipped past the problem because there was some randomness about it
17:04 imirkin_: it's likely the problem has always been there
17:04 imirkin_: but just became more triggerable
17:45 edgecase: aye
19:06 Lyude: imirkin_: curious, do you have any idea where you got the numbers in get_tmds_link_bandwidth from?
19:07 imirkin_: yeah, not 100% sure.
19:07 imirkin_: which numbers specifically?
19:07 Lyude: if (drm->client.device.info.family >= NV_DEVICE_INFO_V0_KEPLER) return 297000;
19:07 imirkin_: yeah, that's the most suspect of them all
19:07 imirkin_: vs e.g. 300 or 340 for the later gens
19:08 imirkin_: i think it was a combination of thin air, looking at specs, and looking at what blob did
19:08 Lyude: what if I said 570425
19:08 imirkin_: then you'd be mistaken
19:08 imirkin_: that's too high for hdmi without the hdmi 2.0 stuff
19:08 imirkin_: max is 340mhz
19:09 Lyude: mhh, there's definitely something weird with these clock caps then
19:09 imirkin_: with hdmi 2.0, you can go up to 600mhz iirc
19:09 imirkin_: to achieve that you need to enable scrambling and tmds clock division, both of which require cooperation of the sink
19:09 Lyude: note however, the evo channel doesn't actually hang when setting pixel clocks around that value
19:10 Lyude: (gtx760 here, jfyi)
19:10 Lyude: (nve4)
19:10 imirkin_: yeah
19:10 imirkin_: hdmimhz can be used to override whatever
19:10 imirkin_: but the rating of the cable/etc is for 340mhz
19:10 imirkin_: or lower
19:11 Lyude: oh, you know what
19:12 Lyude: i wonder if the 570425 limit is for dual-link dvi
19:12 imirkin_: no, that's 330mhz
19:12 imirkin_: 165mhz per link
19:12 Lyude: huh, we're reading the number wrong then :s
19:12 imirkin_: this is basically what i went through when trying to find values in the bios
19:12 imirkin_: and the end result is the code you see in front of you =]
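
The clock limits quoted in this exchange can be put side by side in a small sketch. The numbers are the ones stated in the conversation (the 600 MHz figure was given as "iirc"), they are not checked against any specification here, and the constant and function names are hypothetical, not taken from the nouveau source. Values are in kHz, matching the 297000 returned by the quoted line.

```python
# Limits in kHz, as stated in the chat; names are illustrative only.
LIMITS_KHZ = [
    ("single-link DVI", 165_000),            # "165mhz per link"
    ("Kepler cap in quoted code", 297_000),  # "return 297000"
    ("dual-link DVI", 330_000),              # two links at 165 MHz each
    ("HDMI without 2.0 features", 340_000),  # "max is 340mhz"
    ("HDMI 2.0", 600_000),                   # "up to 600mhz iirc"
]


def limits_exceeded(clock_khz):
    """Return the names of every limit that `clock_khz` is above."""
    return [name for name, limit in LIMITS_KHZ if clock_khz > limit]


def fits_within(clock_khz, limit_name):
    """True if `clock_khz` does not exceed the named limit."""
    limits = dict(LIMITS_KHZ)
    return clock_khz <= limits[limit_name]


# The value Lyude read out: above every limit except the HDMI 2.0 one,
# which is why it cannot be a dual-link DVI or plain HDMI cap.
over = limits_exceeded(570_425)
```

Here `over` lists every entry except "HDMI 2.0", and `fits_within(570_425, "dual-link DVI")` is false, which matches the conclusion in the chat that the number is being read wrong.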
20:53 edgecase: imirkin_ so after 18 hours, the frozen graphics unfroze!
20:54 edgecase: i was compiling last night and this morning, then left it for a few hours, came back, it's working!
20:58 imirkin_: problem solved!
20:58 imirkin_: just wait 18 hours, and everything is fine
20:58 imirkin_: working as intended :)
21:04 edgecase: are you russian by any chance O.o
21:05 imirkin_: kinda
21:06 edgecase: hmm, i should see what it's doing, maybe it's not retrying in a loop, and eventually succeeds?
21:07 imirkin_: there's various recovery
21:07 imirkin_: it works relatively poorly
21:07 imirkin_: could have been a fence that finally timed out. dunno
21:07 edgecase: that [TTM] Buffer eviction failed, seems like a clue
21:07 imirkin_: that means it's pinned
21:08 imirkin_: (i think)
21:08 imirkin_: or you're out of free system memory
21:08 edgecase: could be the timeout is inside a loop
21:08 imirkin_: i'm not very familiar with that logic, unfortunately
21:09 edgecase: another idea, maybe there are ttm/drm improvements in linux-mainline, not in skeggsb's tree?
21:10 imirkin_: his tree is based on mainline.
21:10 edgecase: maybe it's a bit behind? kernel version in what i built is 5.5.0-rc2+
21:11 imirkin_: it'll get rebased on 5.6 eventually
21:12 edgecase: 12 days | Merge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-fixes | Dave Airlie | 5 | -0/+48
21:13 edgecase: that's a commit message in mainline
21:14 edgecase: but i'm using branch linux-5.7 ?
21:14 imirkin_: that's more "for-linux-5.7"
21:15 imirkin_: i.e. commits that will be sent to dave when the 5.7-rc window opens
21:15 edgecase: maybe i should use skeggsb's linux-5.6 branch?
21:15 edgecase: since that's what is being merged into mainline?
21:31 edgecase: git doesn't tell you what people's workflow is, osmosis does
21:32 edgecase: ok so exit program, start again, appears frozen. yay for reproducibility!
21:33 edgecase: [31689.515680] Trying to vfree() bad address (0000000098832c10)
21:33 edgecase: [31689.515697] WARNING: CPU: 3 PID: 1940 at mm/vmalloc.c:2282 __vunmap+0x1ec/0x210
21:33 edgecase: heh heh it's confused and angry now
21:34 edgecase: Mar 2 16:30:07 hammer kernel: [31568.635057] nouveau 0000:01:00.0: fb: trapped read at 0000e30000 on channel -1 [1fec0000 unknown] engine 06 [BAR] client 08 [PFIFO_READ] subclient 00 [FB] reason 00000002 [PAGE_NOT_PRESENT]
