00:13 imirkin: i think in general, most drivers aren't as affected, since they use u_vbuf
00:31 edgecase: hey I think I've got the same bug as https://bugzilla.kernel.org/show_bug.cgi?id=203033 on GT218 here
00:32 edgecase: not sure where is the best place to ask advice, i'm bisecting kernel now but it's an intermittent problem after linux 5.0.0
00:33 edgecase: one clue is [ 506.196840] [TTM] Buffer eviction failed
00:36 edgecase: stuck process is wine, wchan dma_fence_default_wait
00:38 edgecase: this is a regression from Ubuntu 19.04 -> 19.10. If I use the older 19.04 kernel v5.0.0, everything works. 19.10's kernel is 5.3, ees brok
01:14 edgecase: try latest from git https://github.com/skeggsb/nouveau.git
01:35 edgecase: doesn't build vs 5.5.7 ?
01:37 edgecase: https://nouveau.freedesktop.org/wiki/InstallNouveau/ suggests 5.5 should work?
01:45 edgecase: ugh full kernel compile
01:55 imirkin: edgecase: just go with upstream
02:30 edgecase: er, upstream is https://github.com/skeggsb/linux.git ?
02:35 imirkin: that or torvalds
02:35 imirkin: note that the master branch on ben's tree is ... silly
02:35 imirkin: you want to pick a linux-* branch
02:42 imirkin: alright, got all the EXT_texture_shader_lod tests to pass
02:43 edgecase: oh, i forgot to mention ubuntu's ppa kernel 5.5.7 is broken the same, so torvalds (= kernel.org stable yes?) would likely be the same?
03:19 edgecase: maybe someone can spell it out for me, if i use kernel.org mainline is that the newest, or would skeggsb branch linux-5.7 be newer?
03:20 imirkin: i think there could be something in the linux-5.7 branch which is newer
03:31 edgecase: git log from linux-5.7 branch shows from Dec 11, 2019, kernel.org mainline updated 14 days ago, so maybe not?
03:31 imirkin: can't really go by dates
03:31 imirkin: i could have developed a change 30 years ago
03:31 imirkin: that never went to mainline
03:32 edgecase: ah, pull-request getting merged, takes today's date?
03:33 imirkin: well, the merge commit is the merge commit
03:33 imirkin: but the actual commits have whatever date they have
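
The point about dates can be illustrated with a small model. This is only a sketch with a made-up three-commit history (the ids `base`, `fix`, `main1` and their dates are hypothetical, not taken from either tree): a branch can hold commits that mainline lacks even when mainline's newest commit carries a later date. Comparing what is reachable from each tip, roughly what `git log mainline..branch` reports, answers the question; comparing dates does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str            # hypothetical short id
    author_date: str    # ISO date the change was written, not when it was merged
    parents: tuple = ()


def reachable(tip, commits):
    """Return ids of all commits reachable from `tip` via parent links."""
    seen, stack = set(), [tip]
    while stack:
        sha = stack.pop()
        if sha not in seen:
            seen.add(sha)
            stack.extend(commits[sha].parents)
    return seen


def newest_date(tip, commits):
    """Latest author date among commits reachable from `tip`."""
    return max(commits[sha].author_date for sha in reachable(tip, commits))


# Made-up history: both branches fork from "base".
history = [
    Commit("base", "2019-11-01"),
    Commit("fix", "2019-12-11", ("base",)),     # only on the topic branch
    Commit("main1", "2020-02-17", ("base",)),   # only on mainline
]
commits = {c.sha: c for c in history}

# Commits the topic branch has that mainline does not.
only_in_topic = reachable("fix", commits) - reachable("main1", commits)
# Mainline's newest date is later, yet it still lacks "fix".
mainline_looks_newer = newest_date("main1", commits) > newest_date("fix", commits)

print(sorted(only_in_topic), mainline_looks_newer)
```

Here mainline "looks newer" by date while the topic branch still carries a change mainline does not have, which is the situation described above.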
03:35 edgecase: sorry, I'll work on git training elsewhere... I just want to try the newest code to see if the issue is resolved. Kind of like a cave man who wants the biggest fire, because fire is powerful.
03:36 edgecase: will compile linux-5.7 branch and see how well it burns.
03:37 edgecase: I am amazed that nouveau works so well, RE isn't easy!
03:40 imirkin: most people are amazed it works so poorly
03:40 edgecase: drm/nouveau: zero vma pointer even if we only unreference it rather than free
03:40 edgecase: I'm not sure this affects anything, but best be safe.
03:40 edgecase: from log msg, sounds promising
03:41 edgecase: sounds vaguely like dmesg i got
03:41 imirkin: dunno if it's made it to the main branch, but i think this change could also help with some stability issues: https://github.com/skeggsb/nouveau/commit/ed875d952dc099516c467b04e0070fb84a60c71b
03:52 edgecase: any relation to [ 20.532562] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -12 ??
03:52 imirkin: cma? probably not
03:53 edgecase: that was around linux 5.1.21. anywhoo, will compile skeggsb's and try
12:48 edgecase: well new error, that's progress I guess: [ 188.332109] refcount_t: underflow; use-after-free.
12:49 edgecase: [ 188.332272] ttm_bo_put+0x27e/0x370 [ttm]
12:49 edgecase: [ 188.332357] nouveau_gem_new+0x8a/0x120 [nouveau]
12:55 edgecase: last I heard, debugging memory management and heaps = not fun
12:57 edgecase: is there an ElectricFence for ttm?
13:49 edgecase: what should I do next to try and solve this?
15:01 imirkin_: edgecase: there's KASAN and some other thing too
15:01 imirkin_: refcount underflow sounds plausible for this issue ... skeggsb is probably going to be the only one capable of tracking it down though =/
15:02 imirkin_: (and his availability varies greatly... sometimes he's super-responsive, and sometimes he's entirely awol)
15:02 imirkin_: [and other times it's awl, i guess... heh.]
16:48 edgecase: should I rebuild kernel with KASAN on?
16:49 imirkin_: unfortunately i don't have any concrete advice
16:49 imirkin_: sorry!
17:03 edgecase: np. KASAN claims to help debug SLAB and SLUB allocators, but maybe TTM is using something else?
17:03 edgecase: i could try and bisect again, I think I skipped past the problem because there was some randomness about it
17:04 imirkin_: it's likely the problem has always been there
17:04 imirkin_: but just became more triggerable
17:45 edgecase: aye
19:06 Lyude: imirkin_: curious, do you have any idea where you got the numbers in get_tmds_link_bandwidth from?
19:07 imirkin_: yeah, not 100% sure.
19:07 imirkin_: which numbers specifically?
19:07 Lyude: if (drm->client.device.info.family >= NV_DEVICE_INFO_V0_KEPLER) return 297000;
19:07 imirkin_: yeah, that's the most suspect of them all
19:07 imirkin_: vs e.g. 300 or 340 for the later gens
19:08 imirkin_: i think it was a combination of thin air, looking at specs, and looking at what blob did
19:08 Lyude: what if I said 570425
19:08 imirkin_: then you'd be mistaken
19:08 imirkin_: that's too high for hdmi without the hdmi 2.0 stuff
19:08 imirkin_: max is 340mhz
19:09 Lyude: mhh, there's definitely something weird with these clock caps then
19:09 imirkin_: with hdmi 2.0, you can go up to 600mhz iirc
19:09 imirkin_: to achieve that you need to enable scrambling and tmds clock division, both of which require cooperation of the sink
19:09 Lyude: note however, the evo channel doesn't actually hang when setting pixel clocks around that value
19:10 Lyude: (gtx760 here, jfyi)
19:10 Lyude: (nve4)
19:10 imirkin_: yeah
19:10 imirkin_: hdmimhz can be used to override whatever
19:10 imirkin_: but the rating of the cable/etc is for 340mhz
19:10 imirkin_: or lower
19:11 Lyude: oh, you know what
19:12 Lyude: i wonder if the 570425 limit is for dual-link dvi
19:12 imirkin_: no, that's 330mhz
19:12 imirkin_: 165mhz per link
19:12 Lyude: huh, we're reading the number wrong then :s
19:12 imirkin_: this is basically what i went through when trying to find values in the bios
19:12 imirkin_: and the end result is the code you see in front of you =]
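
The limits quoted in this exchange can be collected into a quick check. This is a rough sketch, not nouveau's actual logic: the function and table names are invented for illustration, the figures are the ones mentioned above (165 MHz per DVI link, 340 MHz for HDMI without the 2.0 features, 600 MHz with them, the last stated only "iirc"), and the only assumption modelled is that the higher HDMI limit needs a sink that cooperates with scrambling and TMDS clock division.

```python
# Clock limits in kHz, as quoted in the discussion (not read from any VBIOS).
LIMITS_KHZ = {
    "dvi_single": 165_000,   # one TMDS link
    "dvi_dual": 330_000,     # two links at 165 MHz each
    "hdmi": 340_000,         # HDMI without the 2.0 features
    "hdmi20": 600_000,       # HDMI 2.0, "iirc" in the chat
}


def max_tmds_khz(link, sink_supports_hdmi20=False):
    """Return the highest pixel clock (kHz) assumed usable on `link`.

    `sink_supports_hdmi20` stands for a sink that cooperates with
    scrambling and TMDS clock division; it only matters for HDMI.
    """
    if link == "hdmi" and sink_supports_hdmi20:
        return LIMITS_KHZ["hdmi20"]
    return LIMITS_KHZ[link]


def mode_fits(clock_khz, link, sink_supports_hdmi20=False):
    """True if a mode's pixel clock is within the assumed link limit."""
    return clock_khz <= max_tmds_khz(link, sink_supports_hdmi20)


# The 570425 kHz figure exceeds both plain HDMI and dual-link DVI.
print(mode_fits(570_425, "hdmi"))
print(mode_fits(570_425, "dvi_dual"))
print(mode_fits(570_425, "hdmi", sink_supports_hdmi20=True))
```

Under these numbers 570425 kHz only fits if the sink supports the HDMI 2.0 features, which matches the objections raised above.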
20:53 edgecase: imirkin_ so after 18 hours, the frozen graphics unfroze!
20:54 edgecase: i was compiling last night and this morning, then left it for a few hours, came back, it's working!
20:58 imirkin_: problem solved!
20:58 imirkin_: just wait 18 hours, and everything is fine
20:58 imirkin_: working as intended :)
21:04 edgecase: are you russian by any chance O.o
21:05 imirkin_: kinda
21:06 edgecase: hmm, i should see what it's doing, maybe it's not retrying in a loop, and eventually succeeds?
21:07 imirkin_: there's various recovery
21:07 imirkin_: it works relatively poorly
21:07 imirkin_: could have been a fence that finally timed out. dunno
21:07 edgecase: that [TTM] Buffer eviction failed, seems like a clue
21:07 imirkin_: that means it's pinned
21:08 imirkin_: (i think)
21:08 imirkin_: or you're out of free system memory
21:08 edgecase: could be the timeout is inside a loop
21:08 imirkin_: i'm not very familiar with that logic, unfortunately
21:09 edgecase: another idea, maybe there are ttm/drm improvements in linux-mainline, not in skeggsb's tree?
21:10 imirkin_: his tree is based on mainline.
21:10 edgecase: maybe it's a bit behind? kernel version in what i built is 5.5.0-rc2+
21:11 imirkin_: it'll get rebased on 5.6 eventually
21:12 edgecase: 12 days | Merge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-fixes | Dave Airlie | 5 | -0/+48
21:13 edgecase: that's a commit message in mainline
21:14 edgecase: but i'm using branch linux-5.7 ?
21:14 imirkin_: that's more "for-linux-5.7"
21:15 imirkin_: i.e. commits that will be sent to dave when the 5.7-rc window opens
21:15 edgecase: maybe i should use skeggsb's linux-5.6 branch?
21:15 edgecase: since that's what is being merged into mainline?
21:31 edgecase: git doesn't tell you what people's workflow is, osmosis does
21:32 edgecase: ok so exit program, start again, appears frozen. yay for reproducibility!
21:33 edgecase: [31689.515680] Trying to vfree() bad address (0000000098832c10)
21:33 edgecase: [31689.515697] WARNING: CPU: 3 PID: 1940 at mm/vmalloc.c:2282 __vunmap+0x1ec/0x210
21:33 edgecase: heh heh it's confused and angry now
21:34 edgecase: Mar 2 16:30:07 hammer kernel: [31568.635057] nouveau 0000:01:00.0: fb: trapped read at 0000e30000 on channel -1 [1fec0000 unknown] engine 06 [BAR] client 08 [PFIFO_READ] subclient 00 [FB] reason 00000002 [PAGE_NOT_PRESENT]
21:56 ambienthero: if someone has shown one's superiority in thinking while having resilient resistance against getting knocked down by many persistent agents from various countries, how hard is it to anticipate, when sanctioned 10 years in a row in my home country, that my glamorous breakthrough will end up in/during this period? You still did not have any success in forcing or framing the meds into my life!
21:58 ambienthero: What happens is our corrupted scammers will be removed from "governing" our country and justifyingly defense police gives a real act also a license to kill , that was your only result.