00:33 imirkin: pmoreau: btw, probably just have to purge on store. for a load, just skip it (without purging)
01:40 DMJC: I have a 460gtx SLI setup I don't use
01:41 DMJC: are there any nouveau developers in Australia that would be interested in hacking SLI with these?
06:15 ask6155: hello
06:16 ask6155: when running minecraft with mods I get this error: java: ../libdrm-2.4.105/nouveau/pushbuf.c:723: nouveau_pushbuf_data: Assertion `kref' failed.
06:16 ask6155: this crashes minecraft instantly
06:16 ask6155: help?
06:16 imirkin: ask6155: iirc minecraft is multithreaded, which upsets some things.
06:17 ask6155: is there a workaround or a fix?
06:18 imirkin: i believe karolherbst has been working on a "fix" for multithreading issues. not 100% sure that this would help your particular problem
06:18 imirkin: you'd need to build mesa from source to try it
06:19 ask6155: so the 'fix' hasn't been released but has been committed to mesa?
06:19 ask6155: how hard is it to build mesa?
06:19 imirkin: ask6155: no, the fix is being worked on
06:20 imirkin: but you can grab the patches and build your own copy
06:20 ask6155: aah
06:20 imirkin: (this is an issue which has existed since the dawn of time, so it's not exactly a new thing, nor particularly easy to fix)
06:20 ask6155: how hard is it to build mesa? I can compile programs from source but I don't quite understand what mesa is so...
06:21 imirkin: i mean ... pretty easy - run a few commands, have a good time.
06:21 imirkin: not sure what your level is
06:21 imirkin: so hard to judge
06:21 ask6155: i mean I can compile a binary in a folder and run it but is mesa like that too?
06:21 imirkin: the fact that you refer to the word 'folder' makes me more worried.
06:22 imirkin: mesa is a software package
06:22 ask6155: I'm talking about static binaries
06:22 imirkin: in this case you'd have to clone a git tree
06:22 ask6155: I build them sometimes for fun
06:23 ask6155: anyways is there a website with build instructions?
06:23 imirkin: and then you run a few commands to build the libs which will land them in some location-of-choice, and then when running minecraft you'd point at those libs instead
06:23 imirkin: probably
06:23 imirkin: i hear google is good at finding things on the internet
06:23 ask6155: ah ok
06:24 imirkin: afaik this is the branch you want: https://gitlab.freedesktop.org/karolherbst/mesa/-/commits/nvc0_threading
06:25 imirkin: (actually ... stupid question ... what GPU do you have?)
06:25 ask6155: GT 730
06:25 imirkin: yeah ok, that should be fine
06:26 imirkin: man, what is it with everyone getting GT 730's?
06:26 imirkin: you're like the third person with a GT 730 in the past few days
06:26 ask6155: I dunno why others are getting it but I got it because I'm stupid
06:27 ask6155: should've spent my money on an amd cpu with the same amount of gpu performance...
06:29 imirkin: :)
06:30 imirkin: certainly what i would have recommended, yes
06:30 imirkin: is it at least the GK208 GT 730? or is it a GF108?
06:30 ask6155: GK208
06:32 imirkin: ok, that's good at least
06:32 ask6155: okay got the source, switched to nvc0_threading
06:32 imirkin: you get reclocking. so you can crash faster :)
06:36 ask6155: i did meson ..
06:36 ask6155: how do I compile them to a different folder and not overwrite my own libs?
06:36 imirkin: --prefix=/home/ask6155/install
06:39 imirkin: and then run minecraft with LD_LIBRARY_PATH=/home/ask6155/install/lib64
06:40 ccr: you also may want to set LIBGL_DRIVERS_PATH
06:42 imirkin: nope
06:42 imirkin: don't set that
06:42 imirkin: ever
06:43 ccr: hmm
06:43 ccr: I've had to set it
06:43 ask6155: this might take some time
06:44 imirkin: ccr: nope
06:44 imirkin: setting it just confuses things greatly
06:44 imirkin: don't set it
06:44 imirkin: ever.
06:45 ccr: okay. perhaps it's an artifact from Mesa's past, but I'm 100% sure that I've had to set it for test builds (that I install under /opt)
06:46 ccr: because things didn't work without it
06:47 ccr: imirkin, you are correct that I don't seem to need it with current Mesa at least. hrm.
06:47 imirkin: ccr: that's because you didn't install libGL
06:47 imirkin: and were trying to point the dir to a non-install directory
06:47 imirkin: don't do that.
06:47 imirkin: that's the only case where that thing is useful
06:47 imirkin: but don't do that
06:49 ccr: heh. well, I wish I knew what I did wrong back then .. how could I manage to not install libGL?
06:49 ccr: anyway, sorry for confusing things.
06:50 ccr: oh, wait, I think I know what happened. it was probably some lib/ vs lib64/ thing.
06:50 imirkin: the LIBGL_* thing is only needed when you're doing "weird shit". if you're just installing + using that installed thing, that's not weird :)
06:56 ccr: Weird Science
07:10 ask6155: it is... done
07:11 ask6155: inside install there is lib share and include
07:12 ask6155: there is no lib64
07:14 ccr: just use LD_LIBRARY_PATH=/home/ask6155/install/lib then. lib/ should contain libGL etc
07:14 ask6155: I'm using multimc and as a wrapper command I'm adding `LD_LIBRARY_PATH=/home/ask6155/install/lib`
07:15 ask6155: nope that failed
07:15 ccr: failed how?
07:16 ask6155: well the wrapper thing failed
07:16 ask6155: maybe I have to pass it as a java arg?
07:16 ccr: no
07:16 ccr: just "LD_LIBRARY_PATH=/home/ask6155/install/lib multimc"
07:16 imirkin: lib4
07:16 imirkin: lib64
07:16 ask6155: yeah but that is for multimc
07:17 ccr: [10:12] <ask6155> there is no lib64
07:17 imirkin: weird
07:17 ask6155: multimc launches minecraft from within
07:17 ccr: well, usually the environment variables get inherited
07:18 ccr: you could try (assuming bash or similar): export LD_LIBRARY_PATH=/home/ask6155/install/lib
07:18 ask6155: ok I'll try that then
07:18 ccr: and after that run whatever
07:23 imirkin: if multimc launches minecraft, you may have to teach it about the extra env var
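[Editor's sketch: the exchange above relies on environment variables being inherited by child processes. The Python below is only an illustration of that idea, not how multimc itself launches Minecraft; the helper name run_with_lib_path is hypothetical, and the path is the one quoted in the log.]

```python
import os
import subprocess
import sys


def run_with_lib_path(cmd, lib_dir):
    """Run cmd with lib_dir placed first in LD_LIBRARY_PATH (hypothetical helper)."""
    env = dict(os.environ)
    old = env.get("LD_LIBRARY_PATH")
    # Put the custom install first so its libraries are searched before system ones.
    env["LD_LIBRARY_PATH"] = lib_dir if not old else lib_dir + os.pathsep + old
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# The child here is just another Python process that reports what it inherited.
result = run_with_lib_path(
    [sys.executable, "-c", "import os; print(os.environ['LD_LIBRARY_PATH'])"],
    "/home/ask6155/install/lib",
)
child_value = result.stdout.strip()
print(child_value)
```

[If a launcher rebuilds the environment for the game process instead of passing its own along, the variable would have to be set in the launcher's configuration, which is what imirkin hints at.]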
07:30 ask6155: well I had to export LD_LIBRARY_PATH in my profile and something different has happened
07:30 ask6155: it had a different crash
07:30 imirkin: yay!
07:30 ask6155: it crashed with sigsegv
07:31 ccr: it's progress, at least :P
07:31 ask6155: problematic frame: nouveau_dri.so+0x981b2d
07:32 ask6155: is the coredump of any use?
07:35 imirkin: ideally you could explain your exact software to karolherbst and he can take a look when he gets up
07:35 imirkin: speaking of
07:35 imirkin: it's time i went to sleep
07:36 imirkin: good luck
07:37 ask6155: well I'm out of time now... I'll come back tomorrow when he'll be awake I guess
10:51 karolherbst: imirkin: minecraft does use multithreading
10:52 karolherbst: but the code isn't complete so asserts might be hit
10:57 karolherbst: imirkin: btw, still want to land those patches already: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8765
12:39 RSpliet: What in the hell... nouveau just went from rock solid to crash-within-minutes, without any updates today except usbredir.
12:39 RSpliet: I'll post some logs later tonight.
12:52 karolherbst: RSpliet: sometimes the position of the moon affects things
12:55 RSpliet: Ah, in that case let me just nuke the moon. That'll solve things!
12:56 RSpliet: Nobody'll miss it, right?
13:03 karolherbst: I am sure it would be fine
13:07 ccr: oh noes, my secret evil moonbase!
15:09 RSpliet: https://paste.centos.org/view/2a1ea77d
15:10 RSpliet: I think web_content is firefox... but with everything being an electron app these days I'm not holding my hand in the fire for that one
15:12 ccr: yeah, 'Web Content' is a FF thread typically
15:12 ccr: Electron apps are based on Chrome/Chromium
15:13 RSpliet: https://paste.centos.org/view/cd6a3f81
15:13 RSpliet: Then it was my entire Wayland session disappearing, nothing happening but a scroll-up in Pidgin due to a new msg
16:22 karolherbst: RSpliet: not sure if FF is threaded these days
16:22 karolherbst: but yeah...
16:23 karolherbst: our recovery from broken channels is... bad
16:23 karolherbst: I want to fix it after I fixed multithreading
16:24 pmoreau: How hard could it be to store a constant at a constant address in shared, read from that fixed location, and store it in global (for char3, given that char2 works in more complex scenarios)? Apparently a lot harder than I would think… 😅
16:59 pmoreau: Sneaky ConstantFolding for SPLIT transforming `mov u32 %r40 0x00000001; split u32 { %r41s %r42s } %r40` into `mov u32 %r41s 0x00000001` because it had TYPE_U32 hardcoded in it; the storing/loading/storing of constants now works.
17:00 imirkin: pmoreau: what's the problem with that?
17:00 imirkin: oh, type = u32
17:00 pmoreau: :-)
17:00 imirkin: i made that work a while back
17:00 imirkin: mostly to make mul's work better
17:00 imirkin: s/better/with immediates/
17:01 imirkin: i wasn't always super-careful with that stuff
17:01 imirkin: "sorry"
17:01 pmoreau: No worries
17:02 imirkin: on the bright side, half-reg emission should be moderately solid now
17:02 pmoreau: I realised today that char2 was working purely by mistake yesterday, since instead of giving a byte size to mkSplit, I was giving it TYPE_U16… 🤦
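[Editor's sketch: the SPLIT folding issue discussed above comes down to splitting an immediate by the actual destination size rather than a hardcoded type. The Python below is not the Mesa ConstantFolding code; split_immediate is a hypothetical helper, and it assumes the first destination receives the low bits.]

```python
def split_immediate(value, total_bytes, part_bytes):
    """Split an immediate of total_bytes into equal parts of part_bytes, low part first."""
    if part_bytes <= 0 or total_bytes % part_bytes != 0:
        raise ValueError("total_bytes must be a positive multiple of part_bytes")
    bits = 8 * part_bytes
    mask = (1 << bits) - 1
    count = total_bytes // part_bytes
    # Shift each part down to bit 0 and mask it to the destination width.
    return [(value >> (bits * i)) & mask for i in range(count)]


# The case from the log: a 32-bit 0x00000001 split into two 16-bit halves.
halves = split_immediate(0x00000001, 4, 2)
print([hex(h) for h in halves])
```

[Under these assumptions each half is a 16-bit value, so folding the split into a single u32 mov for one half, as described in the log, loses the size information.]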
17:03 karolherbst: imirkin: btw, I have a "fun" solution for this recursive locking thing... just not sure how "safe" it is
17:03 karolherbst: https://gist.github.com/karolherbst/6a1325b3d33e3e26ae6894f7ddca883b
17:03 karolherbst: skeggsb mentioned that mesa shouldn't use kick_notify for that kind of stuff
17:04 karolherbst: and that change _seems_ to work just fine
17:04 karolherbst: just unsure if there are weird side effects
17:04 imirkin: is it "yolo"? :)
17:04 imirkin: coz that's my favorite solution to any locking problem
17:05 karolherbst: or if there are more places we have to
17:05 karolherbst: imirkin: not quite
17:06 imirkin: mmmmmmmmmmmmmmmmm
17:06 karolherbst: yep...
17:06 imirkin: it miiiight
17:06 imirkin: it's a free country!
17:06 karolherbst: I think we have to do that in a couple more cases _or_ we move that into places where we kick it explicitly
17:06 karolherbst: and just ignore implicit flushes
17:07 karolherbst: uhm..
17:07 karolherbst: submissions
17:07 imirkin: bah, can't find the simpsons clip
17:11 RSpliet: karolherbst: With a bit of luck I don't need it after we get MT rolling proper :-)
17:20 karolherbst: ahh heh.. yeah, that doesn't work if there is no flush
20:01 pmoreau: Note to self: after two tests triggering “gr: TRAP_MP - TP0: 00000001 [LOCAL_LIMIT_READ]”, it is time to reset the GPU cause it will refuse to do anything more.
20:01 karolherbst: :D
20:01 karolherbst: happens
20:02 imirkin: pmoreau: eh, i haven't really seen that
20:02 pmoreau: What would be the best way to reset it without restarting (and hopefully without rmmod’ing nouveau, cause I need my iGPU to drive the screen)?
20:03 imirkin: you can unbind + bind nouveau to the device
20:03 imirkin: or rather, other way around
20:04 pmoreau: I get a bunch of “gr: TRAP_MP_EXEC - TP 0 MP 0: 00000008 [TIMEOUT] at 0003d0 warp 5, opcode 303f01fd 6c0087c8” for any additional test I try to run, some WARN in nouveau_bo_move, etc.
20:04 pmoreau: How do I bind/unbind a device to a driver? That is something new to me
20:05 imirkin: cd /sys/somewhere
20:05 imirkin: echo 0.1.23 > unbind
20:05 imirkin: etc
20:08 pmoreau: Thanks, found it: /sys/bus/pci/drivers/nouveau
20:08 imirkin: now you can just ls in that dir
20:08 imirkin: and the symlink names are what you can echo into unbind
20:09 imirkin: and then the same value back into bind
20:09 imirkin: (the pci address iirc)
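[Editor's sketch: the unbind/bind steps described above, written as a small Python helper. The function names are hypothetical, the PCI address in the usage comment is a made-up placeholder, and on a real system this needs root; as the log itself notes, this is not a well-tested path.]

```python
from pathlib import Path


def bound_devices(driver_dir):
    """List the symlink names in a driver directory (these are the PCI addresses)."""
    return sorted(p.name for p in Path(driver_dir).iterdir() if p.is_symlink())


def rebind(driver_dir, pci_address):
    """Write the address to 'unbind', then the same value to 'bind'."""
    driver = Path(driver_dir)
    (driver / "unbind").write_text(pci_address)
    (driver / "bind").write_text(pci_address)


# Hypothetical usage on a real system (requires root, address is a placeholder):
#   print(bound_devices("/sys/bus/pci/drivers/nouveau"))
#   rebind("/sys/bus/pci/drivers/nouveau", "0000:01:00.0")
```

[The driver directory is passed in as a parameter so the helper can be tried against a scratch directory before touching sysfs.]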
20:11 pmoreau: Welp, reboot it is cause looks like the laptop froze.
20:11 imirkin: wait 30s or so
20:12 imirkin: sometimes it times out the channels
20:12 pmoreau: But should it impact the screen when it is being driven by my other GPU?
20:13 imirkin: if nouveau itself locks up, yes
20:13 imirkin: locking should be per-device though
20:13 pmoreau: Okay, sure
20:16 pmoreau: Good to know about that unbind/bind dance, assuming it doesn’t lock up every time I try to use it.
20:19 pmoreau: Mmh, so I got a couple of WARN in nouveau_display_destroy, and some nullptr dereference in the following cleanup.
20:19 imirkin: you sure you unbound the right one?
20:19 imirkin: they all look alike ;)
20:20 pmoreau: Yeah, 02 is the dGPU and 03 is the integrated.
20:20 pmoreau: But I think the display sharing via the apple gmux might make things a bit funky.
20:21 imirkin: ah maybe
20:21 imirkin: but still unbind from nouveau shouldn't affect that? maybe?
20:21 imirkin: this isn't a super-tested path
20:21 imirkin: esp with the gmux thing
20:21 imirkin: just be glad it loads :p
20:22 pmoreau: :-D
20:23 pmoreau: I did spend like 2-3 years getting that laptop to boot without locking up, soooo
20:23 pmoreau: That’s how I started with Nouveau
20:26 pmoreau: After the device gets removed, vgaswitcheroo gets disabled, and that is followed by some refcount underflow when trying to get the EDID; I guess vgaswitcheroo getting disabled triggers a connection probe that does not go well.
20:27 pmoreau: Let’s see if I can find a way to detect when we might run into those LOCAL_LIMIT_{READ|WRITE} and disallow launching the grid.
20:28 imirkin: how are you even getting those in first place
20:28 imirkin: oh... local limit
20:29 imirkin: yeah, that could be worse, since it could lead to a shader infinite loop or something
20:29 karolherbst: pmoreau: ahh yeah.. mmhhh
20:29 imirkin: if the wrong thing gets messed up
20:29 karolherbst: imirkin: well.. it's CL so those things can happen sadly :/
20:30 karolherbst: imirkin: do we have a nice way of stopping all work on the GPU?
20:30 karolherbst: or well.. a channel?
20:30 karolherbst: I am thinking of ctrl+c where we end up waiting on something to happen
20:31 imirkin: karolherbst: dunno sorry
20:31 karolherbst: couldn't we just require that the kernel automatically kills the channel once we destroy the userspace side of it?
20:51 pmoreau: < https://matrix.org/_matrix/media/r0/download/matrix.org/yPkdXOwpTlcTvVqLaDqURXol/message.txt >
20:51 pmoreau: Why am I not getting an error that I am getting over max tls? 🤔
20:55 imirkin: is there such an error?
20:55 imirkin: oh, you mean a LAUNCH error?
20:56 pmoreau: When doing a nv50_tls_realloc it does check against max_tls_size
20:56 imirkin: iirc that never gets called
20:56 pmoreau: But I think I might not be computing it properly
20:56 imirkin: maybe i'm thinking of a different function
20:56 pmoreau: It is, from nv50_program_upload_code
21:23 karolherbst: uhh
21:24 karolherbst: I might even fix that fix problem we had with the shader-db runner shim
21:24 karolherbst: imirkin: soooo... imagine a fence with ref==1
21:24 karolherbst: we never emit such a fence
21:36 pmoreau: Oh
21:37 pmoreau: Just noticed one thing: the tls buffer info is set up once in nv50_screen_compute_setup and that’s it, but when realloc is called it updates using the NV50_3D macros…
21:38 imirkin: pmoreau: yeah. i didn't think it could ever change
21:38 imirkin: maybe i'm thinking of something else
21:38 imirkin: iirc it doesn't update the bufctx either
21:39 pmoreau: It doesn’t look like it.
21:40 pmoreau: Unless it’s managed by nouveau_bo_new?
21:41 imirkin: maybe i was thinking of something else
21:41 imirkin: there's definitely something which looked like it could get reallocated but never actually was
21:57 pmoreau: Looks like this was the cause for my LOCAL_LIMIT_{READ|WRITE}; bumping the default allocation allowed me to run more tests before hitting the issue (if I didn’t make any mistake). Unfortunately, the test whether to allocate or not is `tls_space < screen->cur_tls_space`, so I was a bit surprised to see it reallocate for “16 temps” when that’s the value I used for the default…