10:19karolherbst: imirkin: do you want to review the nv30 patch or should I just push? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12054
14:40imirkin_: i thought you were going to remove the map += start thing
14:40imirkin_: karolherbst: --^
14:49karolherbst: imirkin_: mhh.. yeah.. I have no opinion on how we should fix it :D What do you prefer?
14:49imirkin_: i'd prefer nuking the map += start thing, and passing in the start argument
14:49imirkin_: let's look at how it's done on nv50
14:50imirkin_: karolherbst: hrmph
14:50imirkin_: nv50 has the map += start thing
14:50imirkin_: const void *data = info->index.user;
14:50imirkin_: leave it how you have it.
14:51karolherbst: :) okay
15:16imirkin_: karolherbst: will you be doing more nv30 fixing?
15:16imirkin_: if you want to be scared, run dEQP-GLES2 :)
15:17karolherbst: I did it once for my mt fixes I think? :D
15:17karolherbst: but yeah..
15:17karolherbst: wasn't looking great
15:17karolherbst: I wanted to fix my MT issue with resized .text https://gist.githubusercontent.com/karolherbst/e16c0a3c3e423aaf3baee79604e030a0/raw/c1ad2e3265771ac401e3cdf639dddcad0bbaa1db/gistfile1.txt
15:18karolherbst: now I hit this :)
15:18imirkin_: you're just shuffling this stuff around.
15:18imirkin_: moving tsan error from one spot to another
15:18karolherbst: that's asan :D
15:18imirkin_: what nobody told you: tsan errors can't be eliminated!
15:18imirkin_: yeah, they morph from one into another
15:18imirkin_: resilient buggers
15:19karolherbst: accessing a fence after it got destroyed...
15:19karolherbst: well that shouldn't happen
15:19imirkin_: tsan error afterall!
15:20karolherbst: oh wow
15:20karolherbst: imirkin_: there is no second context :)
15:21imirkin_: tsan error without threads
15:21HdkR: Multiple threads, one context :D
15:21karolherbst: what the
15:21karolherbst: imirkin_: check the backtraces very carefully...
15:22karolherbst: so one of the work items killed the fence
15:22imirkin_: HdkR: i prefer the other way
15:22imirkin_: HdkR: that's actually not legal :) there's a MESA ext for that, but nobody uses it
15:22imirkin_: (and i think even MESA stopped implementing it)
15:23HdkR: MakeCurrent enough and anything is possible
15:23karolherbst: using the same context from multiple threads without making it current?
15:23HdkR: That mesa extension was wacky
15:24karolherbst: imirkin_: here is what happens: we have too many work items and kick the fence, which might end up kicking the pushbuffer, which might end up derefing the fence and delete it :)
15:24karolherbst: I am sure this happens without my patches as well
15:25imirkin_: karolherbst: i think i limit the number of work items per fence
15:25karolherbst: yeah, we do
15:25karolherbst: to 64
15:25imirkin_: that was required since otherwise some tests would generate infinity work items for the fence
15:25karolherbst: then we kick it out
15:26imirkin_: i wasn't SUPER happy about adding it
15:26imirkin_: but i also couldn't come up with a genius other solution
15:26karolherbst: imirkin_: the thing is.. I just reference memory after that happens
15:26karolherbst: this is new
15:26karolherbst: should be super easy to fix
15:34karolherbst: k.. fixed :D
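[editor's note] The use-after-free described at 15:24 (adding a work item kicks the fence, and the kick can end up dropping the last reference to it) is commonly avoided by holding a temporary reference across the kick. The following is a minimal sketch of that pattern only, not the actual nouveau code or the fix that was pushed. The struct layout and the names fence_new, fence_ref, fence_unref, fence_kick, fence_add_work and fences_freed are all hypothetical. Only the limit of 64 work items comes from the conversation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified fence; NOT the real nouveau structures. */
#define MAX_WORK_ITEMS 64 /* limit mentioned in the chat */

struct fence {
    int refcount;   /* number of holders keeping the fence alive */
    int work_count; /* work items queued on this fence */
};

static int fences_freed = 0; /* lets the example observe destruction */

struct fence *fence_new(void)
{
    struct fence *f = calloc(1, sizeof(*f));
    if (f)
        f->refcount = 1; /* reference owned by the creator */
    return f;
}

void fence_ref(struct fence *f)
{
    f->refcount++;
}

void fence_unref(struct fence *f)
{
    if (--f->refcount == 0) {
        free(f);
        fences_freed++;
    }
}

/* Assumption: kicking flushes the queued work and drops the
 * creator's reference, which may be the last one. */
void fence_kick(struct fence *f)
{
    f->work_count = 0;
    fence_unref(f);
}

/* Queues one work item and returns how many are still pending.
 * The temporary reference keeps f valid while it is read after
 * the kick. */
int fence_add_work(struct fence *f)
{
    int pending;

    fence_ref(f);            /* keep f alive across a possible kick */
    if (++f->work_count >= MAX_WORK_ITEMS)
        fence_kick(f);       /* may drop the creator's reference */
    pending = f->work_count; /* safe: we still hold a reference */
    fence_unref(f);          /* may free f; do not touch f after this */
    return pending;
}
```

Without the fence_ref/fence_unref pair in fence_add_work, the read of f->work_count after the kick would touch freed memory, which is the kind of access asan reports.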
15:35karolherbst: maybe I should run the entire CTS with libasan enabled....
15:36imirkin_: enjoy debugging CTS issues
15:36imirkin_: although i think anholt did it
15:36imirkin_: so perhaps CTS is more fixed up than it might otherwise be
15:36karolherbst: well atm I have fun with cs:go....
15:37karolherbst: the key is to not compile the CTS with libasan
15:37karolherbst: then you only see mesa+libdrm issues
15:52felco: Hello guys, I have a GTX650Ti and I'm having a bad time with nouveau on Debian 10 with kernel 5.10bp... I'm getting lots of nouveau channel <x> killed kernel messages, and whenever that happens my screen hangs, most likely along with all the work loaded in X
15:53felco: I have two kernel traces, if that helps
15:53imirkin_: karolherbst: do you remember what kernel nouveau got broken in?
15:54imirkin_: (the GEM stuff)
16:01karolherbst: imirkin_: ehh.. 5.14?
16:01karolherbst: or did you mean something else?
16:01karolherbst: we have so many regressions, I lose track
16:01karolherbst: felco: does it happen like quickly or just after using your machine for a while?
16:02imirkin_: karolherbst: the GEM thing was around for a while i thought
16:02imirkin_: a couple of releases at least
16:02karolherbst: could be
16:02karolherbst: but if it got fixed it should have been backported or it's simply too new
16:03felco: That is a tough question, because it varies a lot, but I have a feeling that whenever hardware acceleration is used the chances of a hang increase
16:03karolherbst: yeah well, that much is obvious
16:03karolherbst: channel killed essentially means we fed up the GPU
16:03karolherbst: by doing wrong things
16:04karolherbst: usually it helps to figure out what applications are triggering it
16:04felco: Like if I don't use RDP or Firefox/Chrome, it may not hang, but then one time it hung after a gnome-shell thread did something nouveau didn't like
16:04felco: I have two traces
16:05karolherbst: yeah.. the nouveau messages before the killed channel message might help
16:05felco: Let me put it in a pastebin-like service, which one do you guys like to use?
16:05karolherbst: whatever users are using
16:08karolherbst: this bug...
16:08karolherbst: should have been more careful about the GPU model...
16:08karolherbst: uhhh, yeah
16:08karolherbst: that's an annoying one
16:08karolherbst: we have no idea what's causing that :)
16:08karolherbst: it seems like all gk10x GPUs are somewhat affected
16:08karolherbst: some more some less
16:08felco: Is that Ryzen's fault?
16:09karolherbst: it's probably a nouveau bug, we just have no idea what exactly atm
16:09karolherbst: I was able to hit this on one of my GPUs recently, but didn't find time to investigate
16:09felco: On my GTX1080 the same thing happens
16:09karolherbst: I'd assume on your GTX1080 you see different errors
16:10felco: I can't tell... But I may get further into it later
16:10felco: Is that helpful?
16:10karolherbst: not sure. depends more on the context. Once you have logs from your gtx 1080 as well, seeing the full dmesg might help
16:10felco: Ok, I will keep that in mind
16:14imirkin_: unfortunately there are about 10000 diff issues which manifest in "channel hang" or "screen hang"
16:27felco: I'm shooting in every direction here... is there any chance that using a DisplayPort-to-HDMI adapter may cause an unexpected issue leading to these behaviors?
16:28felco: I use this on both cards
16:29felco: Also I'm using two monitors: one is on HDMI out, one is on the adapter on DP
16:30felco: What really intrigues me is that I have used that same GTX650Ti for years with nouveau without issues
16:39karolherbst: felco: so it started recently?
16:40karolherbst: might be that you hit a different issue after all
16:40karolherbst: but always hard to say
16:40karolherbst: what helps is if you can say: kernel version a is completely reliable, and kernel version b is not
16:40karolherbst: and then one could figure out what commit broke it
16:43felco: I think I would need to go back too much, like kernel 2.x on Ubuntu, that is what I used back there
16:43felco: But I'm not even sure, I would need to dig a bit
16:44felco: But I can't do that right now, and I need another disk to setup that system
16:44felco: Another thing changed, I'm using UEFI
16:45karolherbst: one problem might be more applications using OpenGL and context switching issues are getting more likely
17:15felco: seems that disabling accel stabilizes the system, let's see if that holds true
17:18felco: I can pretty much trigger it by opening up a video stream on Chrome and moving a window in front of the video
17:18karolherbst: ohh, interesting
17:20felco: And I can't be sure of that, but using nvidia blobs I was getting a computer freeze kind of thing going... So I moved to nouveau so I could at least see something
17:21felco: I was suspecting a hardware issue, but replacing the 1080 didn't help, and using Windows is out of the question
17:22felco: But on Windows the machine rocks for days without a glitch
17:26karolherbst: yeah.. not saying it's not a nouveau issue, such issues are just super hard to track down unless you have a very reliable and very quick way of reproducing
17:35felco: oh yeah, who needs accel I play quake ^^
17:37felco: Disabling accel stabilizes the system and it seems more responsive in general
19:18ciscon: raket and i also use nouveau to play quake, but because of vidlag issues (and we're definitely reclocking)
19:34ciscon: i can say that i had issues with the "proper" bios on the card when using the blob, with certain high gpu load applications it'd throw out errors and depending on the application either freeze for a bit or crash out. changing to a "better" bios fixed it though
21:11felco: I'm running pretty solid since I disabled accel
21:11felco: I'm using nouveau.noaccel=1 nouveau.nofbaccel=1
21:11felco: I just ran Quake for a bit and man... who cares, the game runs just fine
21:15felco: I think I have software accel because glxgears works, but vainfo and vdpauinfo don't work at all
22:05karolherbst: felco: fbaccel should be fine though
22:07felco: I guess... and maybe I could use some sort of FB driver for xorg?