IRC Logs of #nouveau on irc.freenode.net for 2023-12-03

01:09 fdobridge: <airlied> sneil: can you file an issue at the above link? That Pastebin will time out I'd say
02:52 sneil: Sure, here: https://gitlab.freedesktop.org/drm/nouveau/-/issues/283
04:10 fdobridge: <gfxstrand> We really need a plan for that. Otherwise, distros will be stuck at non-GSP forever. Another option would be to try to find/load GSP and fall back to non-GSP if it fails. That way the user will get auto-upgraded as soon as an acceptable GSP is installed.
04:14 gfxstrand: cmiller, karolherbst: IDK if Linux Libre does this but some libre distros are so eager to delete firmwares that they don't realize that the 770 firmware is open-source RE and they delete it in spite of the fact that the sources are in the Linux kernel tree. 🤡
08:10 fdobridge: <airlied> Generally we just provide a config option to enable it by default, then tell distros to enable it
08:10 fdobridge: <airlied> But upstream leaves it default n
15:27 karolherbst: gfxstrand: they used to, but that was fixed a long time ago
15:55 xiphmont: Ah, hard to fork/PR on a ro repository. I guess I'll stage on the gitlab mirror.
16:01 xiphmont: Oh wow, and there *has* been work nearby very recently. Interesting.
16:01 xiphmont: It feels like Nouveau is picking up speed.
16:18 xiphmont: OK, so, despite the absurdity of submitting a PR to a mirror, I have a URL and will send mail to [nouveau] as well.
16:18 xiphmont: https://gitlab.com/freedesktop-mirror/drm/-/merge_requests/1
16:38 xiphmont: ...and mail sent
16:55 xiphmont: thanks for the guidance all.
17:33 soreau: not sure if anyone actually looks on gitlab.com since it's a mirror and not upstream, but if you sent to the ML, maybe you will get some feedback
17:34 xiphmont: yeah, advice last night was "get a URL to point to somewhere, then mail it to the list"
17:35 xiphmont: Though at that point, I thought I could fork and PR from gitlab.freedesktop.org, which of course I can't.
17:35 xiphmont: Oh right, you were part of the advice :-)
17:36 soreau: oh, you couldn't fork https://gitlab.freedesktop.org/drm/nouveau even after login? huh
17:37 xiphmont: nope.
17:37 soreau: well that's not very convenient :P
17:37 xiphmont: repo is RO for non-maintainers
17:38 xiphmont: can't even fork into a new namespace. I suppose I could just ask for that permission, but... I kinda have too many hobbies already, don't want to get roped in ;-)
17:38 xiphmont: [I should stop whinging and just ask]
17:39 xiphmont: anyway, it's probably sensible to be fairly locked down really.
17:39 xiphmont: I can always pester for attention if appropriate
17:40 soreau: you could ask in #dri-devel I know they've had certain amounts of spam and CI abuse issues
17:41 xiphmont: right after submitting all this, airlied followed me on Twitter, so I suspect I've got sufficient attention.
18:01 fdobridge: <airlied> Yeah we did just fix some stuff in that area for the new API, just not sure who understands the old API well enough, maybe we can work through it
18:38 xiphmont: Yeah, the old API is obviously kinda confused on a few details.
18:39 xiphmont: [one thing is certain: it's inconsistent in ways causing crashes]
18:39 xiphmont: All the recent cleanup was necessary to find that.
18:42 fearsonga: Now when the memory management and access selection gets sedated after which it's stable, next thing to do, is to stash instruction stream into the hash, you'll be selecting out the banks with minor procedures described for branching , but you do not have to , even if the generic driver for access never worked alu based stashing is operand based, not register file index -- but gladly both work, so index and operand based schema can be intermixed. It
18:42 fearsonga: could be mixed anyways, as when straight line index based accessing was not working, you could implement alu instruction based access still, but generic is so simple, as said first thing is to write a little state tracker a public domain apitrace and kick out the fixed function from the api. So now they ban me again in fury. But we found a simple cross stitch or lock stitch on the procedure of sw register file i suppose regardless of the bans. It
18:42 fearsonga: seems to work
20:03 airlied: xiphmont: I expect userspace should possibly be fixed here
20:03 airlied: we could make the kernel reject things better, but that might break older userspace which is just as annoying
20:33 xiphmont: airlied: by userspace, you mean mesa, or...?
20:34 fdobridge: <airlied> I'm assuming it's mesa or maybe xorg-x11-drv-nouveau if you are using that
20:37 airlied: the no_prefetch stuff is important afaik
20:38 xiphmont: Not using xorg, pretty stock F39 right now (so also 6.6, not 6.7)
20:38 xiphmont: I don't know at what layer requests are being pushed into the buffers; I've not any experience as yet crossing over the user/kernel boundary. Above the ioctl, I have little idea what's going on.
20:39 airlied: what app are you using since you mention CAD?
20:39 xiphmont: there's no guarding or checking pushbuf size (and no advertisement of limits AFAIK), until we get to encoding the DMA request, which has a warning, but does it anyway and boom.
20:40 xiphmont: FreeCAD.
20:40 xiphmont: But also PrusaSlicer
20:40 xiphmont: (which is also locking up on really big meshes)
20:41 airlied: yes so the fix would be to reject pushbufs that are illegal there, but I expect that would just be useful to track down userspace bug
20:41 airlied: though if not rejecting them just crashes things, then we should reject them always
20:41 xiphmont: we don't reject; we warn, bug and then do it anyway.
20:43 xiphmont: FWIW, this pops up now and then even without large models, even without load, using no GL whatsoever. Every week or so, eg, Firefox will also push too big a buffer just doing.... whatever.
20:43 xiphmont: I don't have insight into the specific requests, just their sizes.
20:44 xiphmont: I had been assuming that for nv cards everything is coming through gallium... and so a central fix there. Not correct?
20:45 xiphmont: [also, we'd need an interface to inform the upper layer that there is a limit and what it is, I did not see such an interface]
20:46 xiphmont: so, spot fix: small, easy, feels wrong. correct fix: whoooo here we go.
20:46 airlied: https://paste.centos.org/view/raw/21343576
20:46 airlied: should reject them
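The "reject instead of warn and proceed" idea discussed above can be sketched roughly as follows. This is an illustration only, not the content of the linked paste: the function name and the `EXAMPLE_PUSH_MAX_LENGTH` constant are hypothetical, and the limit value is an assumption based on the "8 MB-ish" figure mentioned later in the conversation.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical limit, for illustration only: the largest push length (in
 * bytes) that the DMA encoding is assumed to handle. Not taken from the
 * kernel source. */
#define EXAMPLE_PUSH_MAX_LENGTH 0x7fffffu

/* Validate a push length before anything is encoded for the hardware.
 * Returns 0 if the length is acceptable and -EINVAL otherwise, so the
 * caller can fail the request instead of warning and submitting anyway. */
static int example_validate_push(uint32_t length)
{
    if (length > EXAMPLE_PUSH_MAX_LENGTH)
        return -EINVAL;
    return 0;
}
```

With a check like this at the ioctl boundary, an oversized request would surface in userspace as an EINVAL rather than reaching the DMA encoding path.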
20:46 airlied: if you aren't using xorg-x11-drv-nouveau then yes the fix will be in the gallium driver
20:47 xiphmont: OK, and if I am it will be in the xorg driver. Only two places? That's not too bad.
20:48 airlied: it might be a fix in libdrm_nouveau where some code is shared, might also be possible to shove an assert in somewhere to get a better idea where userspace is going wrong
20:49 airlied: nouveau_pushbuf_data seems like a suspect
20:51 xiphmont: so, in concept--- what is going 'wrong'? Were pushbufs/DMA transfers never meant to get anywhere this large?
20:51 xiphmont: anywhere near
20:51 airlied: probably the one in nvc0_vbo.c in mesa
20:52 airlied: I'm not sure how much the hw cares, but when skeggsb added the no prefetch handling he limited things but userspace doesn't seem to have gotten the message
20:55 xiphmont: I recall seeing code in libdrm that was consolidating pushbuffers, I wonder if it's just going crazy.
20:55 xiphmont: well, not crazy, it just doesn't know it ever has to stop.
20:55 xiphmont: wild speculation
20:59 xiphmont: still the case that it's hard to give userspace a specific limit message when there's only a few hardware-specific limits here and there and no advertisement.
20:59 nosecondgear: but in this case, kernel brings it down deliberately, user space does not honor the limit, but kernel knows anyways.
20:59 xiphmont: ? Kernel does not bring it down.
20:59 xiphmont: You can request a 32MB pushbuf, we hand it back happily.
21:00 nosecondgear: ok
21:00 xiphmont: heck, I think 16MB is the default allocated size in several parts.
21:01 xiphmont: er... that number isn't right. it has to be > 1<<23, dmem is allocating buffers four times that size. And nv50 is the only thing with that specific limit anyway.
21:02 xiphmont: [there's also confusion where some code is using words for buffer size, others are using bytes. I think that's actually the source of one size disconnect in the code]
21:03 xiphmont: I think only pushbufs are using bytes; the hardware can only work in words AFAICT.
21:04 xiphmont: [I'm partly thinking aloud here, I have a grand total of two days experience in this code. So I obviously only understand a pretty narrow slice]
21:07 xiphmont: switching gears: what do the maintainers use to watch calls across the kernel / userspace boundary (if anything). Not a necessity but man, if there is an automated tool, I'd love to know about it.
21:10 nosecondgear: yeah, valgrind-mmt devs used as well as mmiotrace, but i still think that all the stack should go through a modernization, i assume nouveau would be stable once and for all too .
21:17 xiphmont: OK, so we need a few more things. Fixes to the userspace drivers, a new interface to tell them what the limits actually are, and code to reject ala what arlied linked.
21:17 xiphmont: airlied that is
21:18 xiphmont: And until then, no CAD :-|
21:18 xiphmont: [well, except for ME ;-]
21:19 xiphmont: also, if we're cleaning things up, userspace should NOT be submitting chipset specific flags as part of a length argument.
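To illustrate why a flag carried inside a length argument is awkward, here is a minimal sketch. It assumes a layout in which bit 23 of the length word is a "no prefetch" flag and the low 23 bits are the byte length; the `EXAMPLE_*` names and the struct are hypothetical and are not the real uAPI definitions.

```c
#include <stdint.h>

/* Assumed layout, for illustration only: bit 23 of the raw length word is a
 * "no prefetch" flag and the low 23 bits hold the length in bytes. */
#define EXAMPLE_NO_PREFETCH (1u << 23)
#define EXAMPLE_LENGTH_MASK (EXAMPLE_NO_PREFETCH - 1u)

struct example_push {
    uint32_t length;      /* byte length with the flag bit removed */
    int      no_prefetch; /* 1 if the flag bit was set, else 0 */
};

/* Split a raw length word into its flag and length parts. */
static struct example_push example_decode(uint32_t raw)
{
    struct example_push p;

    p.length = raw & EXAMPLE_LENGTH_MASK;
    p.no_prefetch = (raw & EXAMPLE_NO_PREFETCH) != 0;
    return p;
}
```

Under this assumed layout, a caller that passes a plain length of exactly 8 MiB (`0x800000`) decodes as a zero-length push with the flag set, so an oversized length and a flagged request cannot be told apart. A separate flags field would avoid that ambiguity.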
21:20 nosecondgear: i have my memory management otherwise i would look more into others code, its the most tiring thing to do, on my mobile i get message that they solve something with memgpt, tiered memory, but its time consuming to look others code.
21:21 xiphmont: is that about right, or is it molehill -> mountain?
21:21 xiphmont: Enh. I mean, I can fix it any number of ways, but it's nice to know ahead of time which will be accepted. Like, this patch works right now, but I agree, it just works around a different problem.
21:22 xiphmont: Ugly? Sure. But my machine is no longer locking up. Like it has been for an entire decade.
21:24 xiphmont: OK, actually, maybe this is now an Issue rather than an MR. I should go make that conversion.
21:25 xiphmont: I also have enough to go see where userspace is going wrong. And I have the motivation to go look, so I can do that.
21:26 fdobridge: <airlied> Yeah I think my patch is needed in kernel then just wait to try and catch an EINVAL somewhere in userspace
21:27 fdobridge: <airlied> For userspace we can hardcode the kernel 8mbish limit and just try and split things there
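The userspace side of that suggestion, splitting one large submission into pieces under a hardcoded limit, might look something like the sketch below. The function name, the caller-supplied `max_chunk` parameter, and the output array are all hypothetical. The sketch also ignores a real constraint: an actual command stream would presumably have to be split at command boundaries, not at arbitrary byte offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split `total` bytes into chunks of at most `max_chunk` bytes.
 * Writes up to `out_cap` chunk lengths into `out` and returns the number of
 * chunks needed, which may exceed `out_cap`. Hypothetical helper for
 * illustration; not part of libdrm or mesa. */
static size_t example_split(uint64_t total, uint32_t max_chunk,
                            uint32_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(max_chunk > 0); /* a zero limit would never make progress */

    while (total > 0) {
        uint32_t chunk = total > max_chunk ? max_chunk : (uint32_t)total;

        if (n < out_cap)
            out[n] = chunk;
        n++;
        total -= chunk;
    }
    return n;
}
```

For example, with an assumed limit of `0x7ffffc` bytes, a 16 MiB buffer would be submitted as two full-size chunks followed by an 8-byte remainder.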
21:27 xiphmont: [is airlied talking elsewhere? Did I miss mail?]
21:28 xiphmont: yeah, actually, that's probably not terrible. the xorg driver is nv specific, I suppose the nv path is too.
21:28 xiphmont: er, the path in gallium.
21:29 xiphmont: I had it in mind that the nouveau_gem ioctl interface was just the NV instance of a generic drm ioctl that had to support many chipsets/drivers.
21:29 xiphmont: if that's not the case then pfft, yeah, ok, that's a lot easier.
21:30 xiphmont: and non-ginormohumungous DMA pushes are probably a good thing.
21:42 nosecondgear: it's discord, i tried, got banned after getting bullied in one day out of all invitational channels :>o so anyway overall i only wanted to make one presentation about low-level handling of memory and instructions so it could reach to a specification that others follow, otherwise i have no time at all anymore, drivers license i have to do again, competitions, other work etc.
21:43 xiphmont: ah, a discord ok
21:44 xiphmont: huh. FOSS people using discord on purpose, I guess you have to interact with other software people too :-)
21:45 xiphmont: I'll take it over Slack any day.
22:14 xiphmont: OK, yeah, the ~ same fix could technically go into one place in libdrm.
22:16 xiphmont: [not suggesting it should, just found the other side is all]
22:26 airlied: yeah just not sure libdrm should be doing it or higher layers, that flag is the tricky thing to handle
22:44 xiphmont: well, I really need to read more before saying/doing anything else.
22:44 xiphmont: so, that for now
22:45 xiphmont: [also, some CAD catchup!]
22:50 nosecondgear: radeon logged by talk :P , so final method goes to presentation, i have more than five more methods, bt the last combines the wisdom i studied for five years, five years ago i looked at calculator output and figured that it only needs to determine where and when to inject 512 and where and when 1024 with favorably changing operands, and self rolling or interactively permuting hashes are just fun if results start to come, five years ago
22:50 nosecondgear: i fluke hit occasional erratic correct output, i've learned i think.
23:10 nosecondgear: anyways it's near brilliant time to do this code, but the key infrastructure i mean procedures took so much time that i ran out of time my own :P, it's ill to think that 5years was not enough, but i am still one of those pioneers, cause others did not even try to mess with unforgiving numbers.
23:16 nosecondgear: but i have nowhere to run, currently don't even have permanent home, i can not say i can not go to work that i got on goldmine and i be rich in 1year if i worked very incredibly hard on the last year, so yeah nothing to do, i wasted my time my own. i need to respond to phone calls i need to do other things that get bread on the table :)
23:44 nosecondgear: altogether ten years behind , but i knew it, since i lost the battle for more than ten years in a row to get equipment , the egl stack was offered round about 12-14 years ago, and that same shift of momentum got me miserable that i was not on form yet when it came out.