06:12tanriol: Hi. Is it expected that trying to use GSP on TU117GLM causes a WARNING in ad102_gsp_new, "nouveau 0000:01:00.0: disp: one-time init failed, -110" and an idle laptop consuming 40+W power?
06:14airlied: tanriol: uggh sounds like the devinit bits
06:14airlied: maybe need to only respect devinit when it comes to display
06:14tanriol: (kernel 6.8.4, Gentoo default, based on fedora configs)
06:16airlied: yeah if you can find 6.8.1 it'll work
06:16airlied: https://gitlab.freedesktop.org/drm/nouveau/-/issues/349
06:16fdobridge: <rinlovesyou> yeah nouveau isn't the only one dying with 6.8.2+, nvidia proprietary is failing too :Hehe:
06:20tanriol: Thanks, will remember to check gitlab issues next time.
06:21airlied: I'll try and write a patch for it now
06:24airlied: tanriol: booting with nouveau.disp=0 might get past that (not sure)
06:42airlied: okay patch is out
06:49airlied: https://lore.kernel.org/dri-devel/20240408064243.2219527-1-airlied@gmail.com/T/#u
06:49airlied: dakr: can you review that but don't apply it, I'll send it to Linus asap to fix 6.8
07:43yusisamerican: ping
07:45fdobridge: <ahuillet> pong
07:48yusisamerican: ahuillet: Is dri-devel being moved away from irc?
07:54yusisamerican: Either way my issue is unrelated to hosting/bridging: I have been working on lowering nv50,nvc0 vbo overhead but I keep running into an issue where the kernel keeps rejecting my pushes in nvc0_draw_arrays (or at least that is what the pushbuf dump shows)...[cont]
08:00fdobridge: <Sid> @ahuillet is there a way I can do what nvidia-modprobe does without nvidia-modprobe?
08:00fdobridge: <Sid> basically I'm missing /dev/nvidia* on boot and can't use nvidia-modprobe for reasons
08:00yusisamerican: Oh
08:00yusisamerican: Sid I have a script for that
08:00yusisamerican: lemme give it to you in pastebin or whatever
08:00fdobridge: <Sid> :o
08:01fdobridge: <Sid> yeah you can use my paste service if you want to: https://paste.sidonthe.net
08:01yusisamerican: I have so many scattered scripts for the nvidia-open driver
08:02yusisamerican: Uhhh I think I have 3 that do what you want to do, try all of them and see what works or see what it does and do it yourself
08:02fdobridge: <Sid> I don't understand it well enough to write my own e-e
08:02fdobridge: <Sid> mm, that'll be helpful, thanks
08:02yusisamerican: https://paste.sidonthe.net/pasta/hawk-hawk
08:03yusisamerican: Its very simple I think
08:03yusisamerican: it's based off an official script, but the perl script is written by chatgpt so your mileage may vary
08:04fdobridge: <Sid> heh, fair
08:04fdobridge: <Sid> can you point me to the official script?
08:04yusisamerican: lost it
08:04airlied: yusisamerican: I've had reports in the past of submitting too many vertices causing us to collide with the flags
08:05yusisamerican: airlied: Yes, I'm submitting wayyyy too many vertices in my scripts, anything I can do to fix the flag collision?
08:06airlied: don't send too many :-P
08:07yusisamerican: airlied: anything I can do in the driver to not do that? My scripts break even on official mesa :o
08:09airlied: can't find the bug I looked at previously
08:09yusisamerican: Sid: basically you're supposed to: mknod -m 666 /dev/nvidia(g) c 195 (g) where (g) are the pci minor numbers(???) of your GPUs, and then you also want to `mknod -m 666 /dev/nvidiactl c 195 255` so nvidiactl boots up. lspci | grep -i NVIDIA is your friend if you don't know your minor nums. I have no idea why the driver doesn't do this for you, probably some legacy bloat.
08:10Sid127: oh, neat
08:10fdobridge: <ahuillet> @tiredchiku it's open source so you can inspect what it does https://github.com/NVIDIA/nvidia-modprobe
08:10Sid127: driver doesn't do it for me cuz I'm on an unsupported distro
08:10Sid127: musl based system
08:11Sid127: also thanks arthur 💖
08:11fdobridge: <ahuillet> you're most welcome
08:11yusisamerican: Sid: found the official bootstrap script
08:12yusisamerican: no idea if its proprietary or not, probably worse than compiling nvidia-modprobe
08:12yusisamerican: https://paste.sidonthe.net/pasta/mouse-wolf-bear here
08:12Sid127: I'll compile nvidia-modprobe first, then try the rest
08:12Sid127: thanks tho, both of you
08:13fdobridge: <ahuillet> @karolherbst is that what you talked to me about, that pushing too fast creates issues?
08:13Sid127: will try it in ~4h, have classes
08:13airlied: yusisamerican: https://gitlab.com/freedesktop-mirror/drm/-/merge_requests/1 was something someone opened in wrong place a while back
08:13airlied: not sure if it's related
08:19yusisamerican: airlied: maybe ill look into seeing if that fixes my issues since you had no luck...
08:19fdobridge: <ahuillet> yusisamerican: do you have a log of what you see happening? how much are you pushing?
08:20yusisamerican: ahuillet: Not a whole whole lot
08:20yusisamerican: but the code I have is centered around pushing as many Multidraw requests as possible
08:20yusisamerican: GL_PointS
08:20fdobridge: <ahuillet> where is the vertex data, in the pushbuffer?
08:21fdobridge: <ahuillet> can you show code?
08:21yusisamerican: Yeah, its this code, but in a while(1) loop: https://paste.centos.org/view/edced257
08:22fdobridge: <ahuillet> would you happen to have runnable code lying around?
08:22yusisamerican: nvc0_draw_arrays is only called 7 times before it crashes
08:22yusisamerican: arthur: https://paste.centos.org/view/3842f51f, applied to mesa demos
08:25yusisamerican: 0x2FB45710 is pushed as the start type a couple of times but otherwise it seems okay
08:30fdobridge: <ahuillet> I don't know if I can trivially make time to investigate, but I could take a look how the blob handles it (or you can dump it yourself, it may be as quick as inspecting the blob source actually). anyway, it would make sense to understand how Nouveau references the vertices
08:31fdobridge: <ahuillet> if it's a vidmem VBO then you shouldn't see the data memcpy'd into the pushbuffer
08:35yusisamerican: ahuillet: > vidmem VBO then you shouldn't see the data memcpy'd \ so that means that...I shouldnt expect that to happen?
08:36yusisamerican: > you can dump it yourself \ Theres a tool for that?
08:37fdobridge: <ahuillet> I don't know if you shouldn't expect it to happen, it depends on what it is Nouveau does exactly. is this a VBO and is it in vidmem? or is it client arrays?
08:39fdobridge: <ahuillet> suppose you have client arrays or a sysmem VBO, you need to get the vertex data to the GPU, and one of the methods (not the only one) is to inline it onto the pushbuffer, which obviously takes a proportional amount of space there. maybe this is what's happening and Nouveau has a bug with that approach when there are too many vertices. there's other things that work, but your main perf case shouldn't be client arrays anyway
08:40yusisamerican: ah
08:41yusisamerican: I think thats what nvc0_push_vbo is doing... at first glance
08:42yusisamerican: !!!
09:10fdobridge: <karolherbst🐧🦀> probably yes
09:11fdobridge: <karolherbst🐧🦀> depending on the error, we could retry after a bit of time
09:11fdobridge: <ahuillet> and there are similar things in Vk?
09:12fdobridge: <karolherbst🐧🦀> could be that vk already handles it properly
09:13yusisamerican: it looks almost identical when I looked at it...
09:13fdobridge: <karolherbst🐧🦀> I see..
09:13fdobridge: <karolherbst🐧🦀> the error code is -16, right?
09:13yusisamerican: No such device?
09:14fdobridge: <karolherbst🐧🦀> EBUSY
09:14yusisamerican: nouveau: kernel rejected pushbuf: No such device
09:14yusisamerican: channel killed
09:14fdobridge: <karolherbst🐧🦀> ohh
09:14fdobridge: <karolherbst🐧🦀> that's a different error
09:14fdobridge: <karolherbst🐧🦀> the context got destroyed
09:14yusisamerican: oooh.....why does the context kill itself?
09:14fdobridge: <karolherbst🐧🦀> means the GPU ran into some kind of trap or memory fault
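The mix-up above is between two errno values: -16 is EBUSY, while the "No such device" text in the pushbuf message corresponds to ENODEV (19). A small sketch for decoding a kernel-style negative return code; describe_ret is a hypothetical helper name, and the numeric values are the Linux ones.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Document the Linux values the discussion relies on. */
static_assert(EBUSY == 16, "EBUSY is 16 on Linux");
static_assert(ENODEV == 19, "ENODEV is 19 on Linux");

/* Hypothetical helper: turn a kernel-style return code (negative errno)
 * into the libc message for it. */
const char *describe_ret(int ret)
{
    return strerror(ret < 0 ? -ret : ret);
}
```

With glibc in the default C locale, describe_ret(-ENODEV) yields the same "No such device" text that libdrm printed.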
09:14fdobridge: <ahuillet> any error message why? surely nouveau can tell you why it killed your channel?
09:15fdobridge: <karolherbst🐧🦀> yeah, there should be stuff in dmesg
09:15fdobridge: <ahuillet> not /enough/ stuff as we discussed, but more than 0 I'd hope!
09:15yusisamerican: [135345.575211] nouveau 0000:01:00.0: gsp: mmu fault queued [135345.575214] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:56 type:31 scope:1 part:233 [135345.575217] nouveau 0000:01:00.0: fifo:1eae1001:0007:0038:[drawoverhead[248650]] errored - disabling channel [135345.575220] nouveau 0000:01:00.0: Xorg[7828]: channel 56 killed!
09:16fdobridge: <karolherbst🐧🦀> yeah, sounds like a normal memory fault
09:16fdobridge: <karolherbst🐧🦀> which might or might not be Userspace's fault
09:17yusisamerican: anyway I can debug that?
09:18fdobridge: <ahuillet> starting to wonder if your test case is valid :)
09:19fdobridge: <ahuillet> are you staying in bounds of your vertex array?
09:20fdobridge: <ahuillet> mesa-demos https://gitlab.freedesktop.org/mesa/demos.git ? that file isn't in that repository AFAICT
09:21yusisamerican: wut
09:21yusisamerican: lemme check.....
09:21yusisamerican: oh no it is, just in an ancient version
09:22yusisamerican: https://paste.centos.org/view/89826848, https://gitlab.freedesktop.org/mesa/demos/-/blob/93d267ea28187124b28d761d1873c27260e49c59/src/perf/drawoverhead.c
09:23fdobridge: <ahuillet> got it, removed by https://gitlab.freedesktop.org/mesa/demos/-/commit/9a57ab98e440b56f683b926bc55185a82b77059e
09:24yusisamerican: lemme actually read the drawoverhead program right now...
09:24yusisamerican: I just adhoc changed it to use MultiDraw
09:26yusisamerican: const GLint first = 0; /* probably should be an array ngl */
09:28fdobridge: <ahuillet> should very much be an array
09:28yusisamerican: drawcount may be where the whole issue is though, since I think its being incremented every single draw call :oops:
09:29fdobridge: <ahuillet> with the "const" specifier I'm not sure what the compiler does, but if it's on the stack then you're pretty much feeding your return address as "first" which is obviously OOB
09:29fdobridge: <ahuillet> if not on the stack, well, same thing really
09:30fdobridge: <ahuillet> ? not following, what's "drawcount"?
09:30yusisamerican: Reading the OpenGL spec, the count parameter thats currently there
09:31fdobridge: <ahuillet> at any rate, you have a bug with "first" and it's sufficient to explain the symptom
09:31yusisamerican: Should I handle it in userspace mesa or nahhh?
09:31fdobridge: <ahuillet> absolutely not
09:32fdobridge: <ahuillet> you can't, actually.
09:32fdobridge: <ahuillet> drawcount is fine -- it's the size of the arrays. but "first" needs to be an array.
09:32fdobridge: <ahuillet> otherwise the driver reads past the bounds of the 1-size array "first" that you gave, reading whatever random data that will eventually definitely be greater than 0, and boom on the GPU if not on the CPU
09:33fdobridge: <ahuillet> you can't handle that driver side, and even if you could you probably wouldn't want to. you need to fix your app.
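The fix described above is on the app side: glMultiDrawArrays(mode, first, count, drawcount) reads first[i] and count[i] for every i below drawcount, so both must point at arrays with at least drawcount elements. A minimal sketch of building such arrays and checking them before the call; the sizes and helper names are illustrative and not taken from drawoverhead.c, and plain int stands in for GLint/GLsizei.

```c
#include <assert.h>

enum { DRAWS = 4, VERTS_PER_DRAW = 3 };  /* illustrative sizes */

/* Fill one first/count entry per draw: draw i covers vertices
 * [3*i, 3*i + 3) of the vertex buffer. */
void fill_draws(int first[DRAWS], int count[DRAWS])
{
    for (int i = 0; i < DRAWS; i++) {
        first[i] = i * VERTS_PER_DRAW;
        count[i] = VERTS_PER_DRAW;
    }
}

/* App-side sanity check: returns 1 if every draw stays inside a vertex
 * buffer holding vbo_vertices vertices, 0 otherwise. Only the app can
 * do this, because only the app knows how long its arrays are. */
int draws_in_bounds(const int *first, const int *count,
                    int drawcount, int vbo_vertices)
{
    assert(drawcount >= 0 && vbo_vertices >= 0);
    for (int i = 0; i < drawcount; i++) {
        if (first[i] < 0 || count[i] < 0)
            return 0;
        if ((long long)first[i] + count[i] > vbo_vertices)
            return 0;
    }
    return 1;
}
```

Once draws_in_bounds passes, the real call would be glMultiDrawArrays(GL_POINTS, first, count, DRAWS), with DRAWS matching the length of both arrays.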
09:34yusisamerican: cant I make it boom on the CPU first? And give an assert()?
09:34fdobridge: <ahuillet> and that's our episode of today's "driver engineer tells you it's an app bug", thank you for watching ;) (edited)
09:35fdobridge: <ahuillet> CPU (and GPU actually) MMU work based on pages, so no, you can't make it boom before it reads out of the page. asan (definitely) and valgrind (maybe) should be catching the error though
09:35fdobridge: <ahuillet> generally speaking, a library cannot sanity-check arrays passed to it in C, because there's no concept of "the size of the array" that the library could check against
09:36yusisamerican: [strikethrough] drawcount
09:36fdobridge: <ahuillet> (it's just a pointer and a size, your job as the programmer is to make sure there are at least <size> valid elements starting at the pointer)
09:36fdobridge: <ahuillet> negative, that's how much you're telling the library to read
09:36fdobridge: <ahuillet> it has no way to know how much storage is actually valid at the start address you give it
09:36yusisamerican: unless we segfault?
09:36fdobridge: <ahuillet> but, y'know, AddressSanitizer is designed to catch this
09:37fdobridge: <ahuillet> can't do it reliably (page based MMU, as long as you don't cross a page you won't get a CPU fault)
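The page-granularity point can be made concrete with a little address arithmetic. This sketch only computes how far a read can run past an address before it leaves that page; the helper names are made up, and a 4 KiB page size is an assumption in the usage note. Even when the overrun does cross a page, it only faults if the next page happens to be unmapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes readable starting at addr before the access leaves addr's page.
 * page_size must be a power of two. */
size_t bytes_left_in_page(uintptr_t addr, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return page_size - (size_t)(addr & (page_size - 1));
}

/* Returns 1 if reading len bytes starting at addr touches a second page. */
int read_crosses_page(uintptr_t addr, size_t len, size_t page_size)
{
    return len > bytes_left_in_page(addr, page_size);
}
```

For example, a single 4-byte "first" placed 64 bytes before the end of a 4 KiB page can be over-read by 15 more ints without the MMU noticing, which is why AddressSanitizer is the better tool for this class of bug.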
09:45fdobridge: <ahuillet> fond recollection of debugging a MultiDrawArraysIndirect hang where the indirect BO was written by a compute shader and a bug made it draw 2**31 vertices, over and over again...
09:47fdobridge: <ahuillet> that ended up not being an *episode* but a *season* of "driver engineer tells you it's an app bug"...
09:48yusisamerican: What, did it make it draw 2^31 vertices, or did it draw 2^31 vertices over and over again (゜ロ゜;)
09:48fdobridge: <ahuillet> indirect buffer contained count = stupidly high number
09:48fdobridge: <ahuillet> indirect buffer was big, so many times stupidly high number
09:49fdobridge: <ahuillet> it was in bounds, I don't recall how, but it was just so big the drawcall would have taken days if allowed to execute
10:32yusisamerican: Alright, thanks for helping me with the app issue, drawoverhead with multidraw after applying my mesa patches provides a 2x improvement (づ。◕‿‿◕。)づ
10:32yusisamerican: yay!
10:55yusisamerican: logging off
11:51fdobridge: <Sid> compiled this on my distro, it's seemingly still not creating /dev/nvidiactl
11:51fdobridge: <Sid> hmm
12:15fdobridge: <Sid> but hey
12:15fdobridge: <Sid> don't need it anymore, since now I can get it going with just mknod
12:25fdobridge: <Sid> this is what I did in the end, in my rc.local
12:25fdobridge: <Sid> https://paste.sidonthe.net/raw/fish-seal-ape
12:25fdobridge: <Sid> /etc/rc.local, that is
12:25fdobridge: <Sid> since this distro ships freeBSD coreutils
13:44ad__: hi, i could finally patch nouveau to have working nv_backlight on Ada Lovelace AD107M
13:44ad__: this is the first rough patch, not yet in a shape to be sent to the list
13:44ad__: https://pastecode.dev/s/jwylrlj9
13:45ad__: issue i have is the max_brightness, had to increase it now to 2048 to have a decent max
13:45ad__: any review/help is welcome
13:59fdobridge: <ahuillet> that sort of matches what I described the other day, except you're using a pre-existing facility to do these register writes?
14:01fdobridge: <ahuillet> as for the brightness value, maybe you need to play with the different options inside drm_edp_backlight_enable?
17:30fdobridge: <gfxstrand> I'm working to try and land https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27397 right now. It's currently CTSing one last time and running CI build tests. Annoyingly, it involves bumping the build containers (we need cbindgen) so it'll take a bit to get CI happy.
17:32fdobridge: <gfxstrand> @asdqueerfromeu You may have to tweak the Arch package to add the new build dep.
17:33fdobridge: <gfxstrand> Also, it pulls in the `paste` crate. Shouldn't be a big deal for anybody but it may also involve a tiny bit of work by packagers.
18:04fdobridge: <gfxstrand> NGL, I'm in love with cbindgen....
18:26fdobridge: <karolherbst🐧🦀> mhhh.. I never used it, but... how magic is it?
19:17fdobridge: <gfxstrand> NIL doesn't have a C header file anymore
19:17fdobridge: <gfxstrand> It's autogenerated
19:17fdobridge: <gfxstrand> And the amount of work required to generate an entrypoint is pretty low
19:17fdobridge: <gfxstrand> It's actually less work than C++
20:40fdobridge: <rinlovesyou> Yeah it's great, it creates great c bindings so using rust in a c codebase becomes a lot easier
20:41fdobridge: <rinlovesyou> For small things i usually just write my own bindings, but it's a lifesaver for bigger things
21:13Lyude: gfxstrand: i would love to see a better bindgen in the kernel
21:14Lyude: There's a number of types I don't see us getting away with not writing our own wrappers for, but there's tons of types where we basically literally don't need anything beyond a new container type or just importing a static table of data
21:15Lyude: (like fourcc)
21:16Lyude: airlied, dakr - I just hit a strange bug on this new laptop: https://paste.centos.org/view/6f5b9aa0
21:17Lyude: Unfortunately I don't get the feeling it's easy to reproduce since this is the first time I've seen it after using this laptop for ~2 weeks
21:20karolherbst: Lyude: cbindgen is the other way around
21:21karolherbst: so generating C headers/bindings for Rust code
21:21airlied: Lyude: ouch, it tries to allocate a lot of memory and it's probably fragmented
21:22Lyude: airlied: so just low memory?
21:22Lyude: that's totally plausible since this laptop kind of is lacking in the RAM department at the moment
21:23Lyude: karolherbst: ah gotcha
21:23fdobridge: <gfxstrand> Landed! @mohamexiety go ahead with the modifiers rebase whenever you please. Let me know if you run into trouble.
21:23fdobridge: <mohamexiety> alright, got it
21:23fdobridge: <mohamexiety> good job with the rewrite you and @dwlsalmeida! ❤️
21:23fdobridge: <rinlovesyou> oh my bad, just woke up and just had "bindgen" in my head (edited)
21:23fdobridge: <gfxstrand> @mohamexiety Also, FYI: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28639
21:24fdobridge: <gfxstrand> I'm really looking forward to that one. 😄
21:24fdobridge: <karolherbst🐧🦀> if you want to go a step further you could do the token concept thing
21:24fdobridge: <gfxstrand> Token concept thing?
21:25fdobridge: <karolherbst🐧🦀> like.. you have a private Token type and instead of having Phantom Data you restrict usage by how that thing can be called/constructed in the first place
21:25fdobridge: <mohamexiety> ooo. nice! that should be pretty useful given sometimes units get a bit too much
21:26fdobridge: <gfxstrand> I'm not following. I've done stuff with private types but that's not at all the same thing
21:27fdobridge: <karolherbst🐧🦀> mhh yeah, it's a bit hard to explain, but I'm also going to sleep soon, because tomorrow I'll be flying to Chicago
21:28fdobridge: <rinlovesyou> how long until all of nvk is oxidized? after that all of mesa maybe? /s (edited)
21:28fdobridge: <karolherbst🐧🦀> I'm sure we have a few people around who don't want to write rust code in mesa 🥲
21:28fdobridge: <rinlovesyou> :Hehe:
21:53fdobridge: <gfxstrand> All of NVK? IDK. I want to eventually but there are a lot of problems to solve first (edited)
21:54Lyude: karolherbst: regarding the token stuff: that sounds exactly like how rust kms currently handles atomic states :)
22:09airlied: Lyude: yeah it tries to allocate an order 7 page for dma and fails
22:10Lyude: well, good excuse to see if i can expense a ram upgrade for this machine then :P
22:10airlied: though I think we shouldn't need to allocate a coherent order 7
22:10airlied: since we have a radix tree mapping to the gpu
22:11airlied: so we might be able to use a vmalloc
22:11Lyude: i can try writing a patch in a bit to do that and add it to my kernel rpm and see what happens
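For scale, an order-n allocation asks the page allocator for 2^n physically contiguous pages, which is why a fragmented system can fail it even with free memory left. A tiny sketch of the arithmetic; the helper names are made up, and the 4 KiB page size in the usage note is an assumption about the laptop in question.

```c
#include <assert.h>
#include <stddef.h>

/* Number of contiguous pages in an order-`order` allocation: 2^order. */
size_t order_to_pages(unsigned order)
{
    assert(order < 8 * sizeof(size_t));
    return (size_t)1 << order;
}

/* Size in bytes of an order-`order` allocation for a given page size. */
size_t order_to_bytes(unsigned order, size_t page_size)
{
    return order_to_pages(order) * page_size;
}
```

With 4 KiB pages, order_to_bytes(7, 4096) is 524288 bytes, so the failing request needed 128 contiguous pages (512 KiB). A vmalloc-style allocation only needs the pages to be virtually contiguous, which is the point of the suggestion above.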
22:59fdobridge: <rinlovesyou> It's certainly not impossible but from what I've seen of the code there's no precedent for a full rust driver in Mesa lol
23:36airlied: dakr: I hand unrolled everything, it's pt->memory->ptrs that is NULL