01:00makinbacon21: re: gsp being a hw decision, wouldn't that mean that orin would support it? given it's an ampere die?
01:04makinbacon21: we don't have "gsp.bin" per se but we have three acr-gsp bins, not sure exactly what can be used there or what format you get.
02:44balisj: hi everyone
02:45balisj: I'm curious if there is a roadmap to having cuda equivalent support for parallel programs with Nouveau
02:46balisj: and if this seems unrealistic, I'm interested to hear why
03:37fdobridge: <airlied> An actual ampere die? Or a die with ampere compute cores? But acr-gsp bins might be meaningful
03:40fdobridge: <airlied> A roadmap is currently unrealistic, and so is implementing cuda, but maybe there are some other paths
03:56pabs: the person asking about cuda left after an hour, so missed your answer
04:47fdobridge: <airlied> doh can't see on discord 😛
06:04fdobridge: <246tnt> `RUSTICL_ENABLE=nouveau` ? 😅 I'm not sure what it takes for a driver to support that TBH.
08:34airlied: TimurTabi: I left a comment on the github repo, hopefully you can see it
13:24TimurTabi: airlied: I think it got cut off, so I'm still a little confused. My concern is that I don't want a link between Nouveau and the proprietary driver. You wouldn't install both drivers anyway, and even if you did, it's highly unlikely that you would install the exact same version of both.
13:25TimurTabi: The proprietary driver is always going to be more recent than what Nouveau uses.
14:31makinbacon21: airlied (dont know how to ping on here) jetson orin (t23x) is an actual ampere die, ga10b, and yea we have 3 acr-gsp bins (different mem regions)
14:32makinbacon21: i dont think it's just compute cores
14:36makinbacon21: but re: the driver stuff, my understanding is that the way tegra drm works, you need both a drm-compatible framebuffer from whatever inits the dc (previously the tegradrm driver) and a render node from nouveau, and the two don't interact until mesa connects them (as per thierry's commit here: https://gitlab.freedesktop.org/mesa/mesa/-/commit/1755f608f5201e0a23f00cc3ea1b01edd07eb6ef)
14:37makinbacon21: that would imply that with some modification, one could use the already existent nvidia-drm driver as that framebuffer and dc init, and would not need to interact with nouveau at all
14:37makinbacon21: edit: and nouveau itself would not need to interact with the proprietary driver at all
14:38makinbacon21: tho fwiw, it looks a bit like a fully oss tegradrm-like driver for t234+ is coming anyway if thierry's github is anything to go by
17:56airlied: TimurTabi: no it doesnt need the other driver and i picked paths to avoid conflicts with where the binary installs put things
17:57airlied: TimurTabi: you will always install linux-firmware
17:57TimurTabi: airlied: I still don't understand what's wrong with the current paths
17:58airlied: i just prefer to keep nvidia names for the files
17:58airlied: and link to them
17:58airlied: makes it easier to see at a glance how many gsp vers we have
17:59TimurTabi: So /lib/firmware/nvidia/tu102/gsp/ will contain the file gsp_tu10x.bin and the symlink gsp-535.54.03.bin -> gsp_tu10x.bin ?
18:00airlied: no id put the files somewhere separate
18:00TimurTabi: I'm not sure I see the value of that.
18:01airlied: i can look in lib/firmware/nvidia/gsp/
18:01airlied: and see every version we used in one place
18:02airlied: instead of having to dig into subdirs
18:03airlied: makes it easier to audit that we dont screw up
18:03TimurTabi: Do you want that scheme for just gsp.bin or all the other firmware files?
18:03airlied: i think just gsp.bin
18:04TimurTabi: well, okay. I understand now what you're asking for, but I don't think it's an improvement, personally.
18:04airlied: the other files are pretty chipset specific
18:05airlied: do you not think it makes it easier to see what we are using? esp as we start using more bin files
18:06airlied: like now with one fw its not bad
18:06airlied: but that probably wont stay that way for too long
18:06TimurTabi: I don't think it's important enough to warrant two different paths for the same group of files.
18:07airlied: i also like to keep the original names, as i guess those may change over time
18:10airlied: do you have a script to generate those links now?
18:10airlied: for gsp.bin
18:10TimurTabi: Yes, it's in the WHENCE file
18:12TimurTabi: I don't have a script like extract-firmware-nouveau.py for gsp.bin, if that's what you're asking. I do the moves manually.
18:14airlied: hmm i suppose its hard to write a future proof script since the tu vs ga decision is in driver code
18:14TimurTabi: and it will change
18:16airlied: like i did a trial update to try latest 535 fw and its non trivial effort :-)
18:17karolherbst: airlied: what's the status of the kernel module loading stuff btw?
18:18airlied: karolherbst: the fw loading?
18:19airlied: need to go and track down where my last idea derailed
18:19airlied: someone wanted versions in fw names
18:20airlied: to be meaningful
18:20airlied: when they are anything but
18:20TimurTabi: what's wrong with versions in filenames?
18:21airlied: just that nobody can agree on them
18:21karolherbst: I think the problem is rather, that version in filenames might not be in linear order (and syntax)
18:21airlied: where to put them, how to order them
18:21karolherbst: just do semantic versioning, everything is wrong :P
18:21karolherbst: end of discussion
18:22karolherbst: at least I would ask if we could agree on semantic versioning, and if not, then we agree on nothing in regards to versioning. It's really pointless to have a rando discussion there
18:22airlied: it wasnt agreeing on versioning
18:23airlied: it was deciding if we could even use it
18:23TimurTabi: $ strings gsp_tu10x.bin | grep 535.86.05
18:23TimurTabi: Driver Version: 535.86.05
18:23karolherbst: yeah.. I doubt we could
18:23TimurTabi: In theory, we could just extract the version from the binary itself.
18:23karolherbst: we can't
18:23karolherbst: well.. we can use it to rename the file/symlink
18:23airlied: this problem space is for the whole kernel
18:24karolherbst: but we can't use it for loading
18:24airlied: not just nouveau
18:24karolherbst: it's pointless to have it inside the binary anyway
18:24airlied: and some fw version strings are very inventive
18:24karolherbst: we could however verify in nouveau if the file name matches the binaries version
18:25airlied: karolherbst: if we had major vers abi stability it would be useful
18:25airlied: but we dont
18:25karolherbst: mhhh... yeah, fair enough
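The check discussed above (read the version string out of the blob, then compare it with the file name) can be sketched as follows. This is a minimal illustration, not what nouveau does: it assumes the blob carries an ASCII "Driver Version: X.Y.Z" string as in the strings output quoted earlier, and that files follow the gsp-<version>.bin naming mentioned in the log. The function names are hypothetical.

```python
import re

# Assumption: the blob embeds an ASCII string such as
# "Driver Version: 535.86.05" somewhere in its contents.
VERSION_RE = re.compile(rb"Driver Version: (\d+(?:\.\d+)+)")


def embedded_version(blob: bytes):
    """Return the embedded version string, or None if no match is found."""
    match = VERSION_RE.search(blob)
    return match.group(1).decode("ascii") if match else None


def name_matches_blob(filename: str, blob: bytes) -> bool:
    """True when a name like gsp-535.86.05.bin agrees with the blob contents."""
    version = embedded_version(blob)
    return version is not None and filename == f"gsp-{version}.bin"
```

As noted in the discussion, this only helps with renaming or auditing. It says nothing about ABI compatibility, so it cannot drive the choice of which file to load.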
18:26TimurTabi: Well, I'm not a fan of complicating the directory structure just so that we can keep the gsp_tu10x.bin filename. I don't see anything wrong with renaming the gsp.bin file to something that Nouveau expects, and I don't see any problem with having to do "find /lib/firmware/nvidia -name 'gsp*.bin'" to find all the binaries.
18:28airlied: i would ideally like to audit things against run files in the future to make sure we actually have what nvidia releases
18:29karolherbst: but for that it doesn't matter if the version is in the path or in the file name, does it?
18:29airlied: it saves parsing whence
18:29airlied: to work out tu10x vs ga10x
18:30airlied: like we rename both nvidia files to gsp-vers.bin and bury them in a symlink forest
18:31karolherbst: we could also change what nouveau expects
18:31airlied: and WHENCE is only source of truth
18:32karolherbst: but we already have the problem with e.g. tu116 vs tu106, those could also become different firmware files, or such a case might arise in the future, where we kinda have to parse it anyway
18:32TimurTabi: The problem is that "gsp_ga10x.bin" is not necessarily only for GA10x. In fact, there was a time (internally, at Nvidia) that gsp_tu10x was for Hopper.
18:32karolherbst: right, but we already have this problem within a generation
18:33karolherbst: we have to know if a generation has its own file or shares one with another, regardless of the architecture
18:33TimurTabi: So the actual filename that Nvidia provides is not as meaningful as it sounds. The actual difference between gsp_tu10x.bin and gsp_ga10x.bin is the libos version embedded in the bin that bootstraps GSP-RM.
18:34karolherbst: yeah.. we shouldn't focus on the "tu" vs "ga" bit, that's entirely arbitrary
18:34airlied: its also something we can revisit later
18:34airlied: by adding symlinks
18:35airlied: but the process to add a new fw is non trivial and I felt it made it easier
18:36karolherbst: yeah.... I _think_ with gsp it makes sense to just move to a `nvidia/$version/$chipset` structure (also on the nouveau side), because that's kinda mapping to the hierarchy quite nicely
18:36TimurTabi: I think a bigger problem is that the WHENCE file contains the version number for all the filenames, so when I eventually have to push a new version of GSP-RM to linux-firmware, it'll be a mess.
18:37karolherbst: or rather
18:37karolherbst: why would it change?
18:37karolherbst: we have to add all entries anyway
18:37airlied: yeah whence is additive
18:37TimurTabi: Do we expect to keep old and new versions of gsp-rm in linux-firmware at the same time?
18:37karolherbst: we have to keep them all
18:37airlied: yes we have to, forever
18:37karolherbst: like _all_
18:37TimurTabi: So will WHENCE install all versions or just the newest?
18:37karolherbst: why do you think that the file size is such a pita
18:38TimurTabi: I was hoping it would just be the newest.
18:38airlied: nope all
18:38karolherbst: and newer kernels also have to support loading old firmware
18:38karolherbst: because $regressions
18:38karolherbst: we can't require changes to userspace when updating the kernel
18:38karolherbst: it has to still work
18:38TimurTabi: So 5 years from now, when there are 3 versions of GSP-RM, WHENCE will install 180MB of images, of which only 60MB will actually be used?
18:39TimurTabi: That seems silly.
18:39karolherbst: it's not silly :P
18:39karolherbst: there is a reason for it
18:39airlied: but which 60MB
18:39karolherbst: and the reason is: we don't regress userspace
18:39karolherbst: there are some ideas on how to decrease the cost of initramfs file generation
18:39airlied: new kernels cant require new fws
18:40karolherbst: but in the firmware repository (and the users system) we kinda have to keep them all
18:39airlied: for existing hw
18:40airlied: distros could in theory cull old ones
18:40karolherbst: distributions are free to do whatever they want here
18:40airlied: but linux firmware cant
18:40karolherbst: but it's helpful if you have to boot older kernels
18:40karolherbst: e.g. for a `git bisect`
18:41TimurTabi: I suppose.
18:41TimurTabi: Well, I need to get lunch. I'll be back.
18:41airlied: TimurTabi: thats why i want a toplevel :-)
18:42airlied: and simpler audit trail
18:42karolherbst: yeah.. I think personally we want something like nvidia/r535/tu10x_gsp.bin and then a bunch of symlinks to it
18:43karolherbst: and then an update just adds another directory
18:43airlied: yeah i was suggesting nvidia/gsp/535.54.08/
18:44airlied: since nvidia/535.54.08 is where the nvidia packaging puts em and i dont want to conflict
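The layout under discussion (one directory per GSP version holding the file under its original nvidia name, plus per-chipset symlinks named the way nouveau expects) can be sketched like this. It is only an illustration under assumptions: the nvidia/gsp/<version>/ store and the nvidia/<chipset>/gsp/gsp-<version>.bin link names are taken from the proposals in the log, while the helper names and the chipset list in the usage note are hypothetical.

```python
import os


def install_gsp(root, version, nvidia_name, blob, chipsets):
    """Store one blob under nvidia/gsp/<version>/ and symlink it per chipset."""
    store = os.path.join(root, "nvidia", "gsp", version)
    os.makedirs(store, exist_ok=True)
    target = os.path.join(store, nvidia_name)  # keeps the original nvidia name
    with open(target, "wb") as f:
        f.write(blob)
    for chip in chipsets:
        link_dir = os.path.join(root, "nvidia", chip, "gsp")
        os.makedirs(link_dir, exist_ok=True)
        link = os.path.join(link_dir, f"gsp-{version}.bin")
        if os.path.lexists(link):
            os.remove(link)
        # Relative link, so the tree can be relocated as a whole.
        os.symlink(os.path.relpath(target, link_dir), link)


def known_versions(root):
    """List every stored GSP version (lexicographic, not release order)."""
    return sorted(os.listdir(os.path.join(root, "nvidia", "gsp")))
```

For example, install_gsp(root, "535.54.03", "gsp_tu10x.bin", blob, ["tu102", "tu116"]) would leave a single copy of the file and two links to it, and known_versions(root) gives the "every version in one place" view without digging into the chipset subdirectories. Adding a new release only adds another directory.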
18:45karolherbst: that won't be confusing at all, but uhhh
18:45karolherbst: can we ask nvidia to not install into /lib/firmware? :D
18:53airlied: karolherbst: we could, but they've got a bunch of run files doing it already that won't change
19:09airlied: TimurTabi: I also see a thing called CrashCat in the newest openrm, is that new logging?
19:10TimurTabi: Yes it is
19:35TimurTabi: On a side note, do we care that Ben's GSP-RM code only works on little-endian platforms?
19:36fdobridge: <airlied> probably not, I don't think nvidia supports any big-endian platforms anyways
19:36TimurTabi: I think it would be cool if Nouveau worked on big-endian PPC systems
19:44TimurTabi: There is some endian-aware code:
19:44TimurTabi: + u32 rate = (le16_to_cpu(rates[i]) * 200) / 10;
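What le16_to_cpu buys in that line can be shown with a small sketch. It decodes a 16-bit value stored little-endian, independent of host byte order, and applies the same arithmetic as the quoted line. The function name and the sample bytes are hypothetical, and the meaning or units of the rate are not stated in the log.

```python
import struct


def rate_from_le16(raw: bytes) -> int:
    """Decode a little-endian u16 and scale it as in the quoted line."""
    (value,) = struct.unpack("<H", raw)  # "<H": explicit little-endian u16
    return (value * 200) // 10


def rate_misread_as_be16(raw: bytes) -> int:
    """Same arithmetic, but misreading the bytes as big-endian."""
    (value,) = struct.unpack(">H", raw)
    return (value * 200) // 10
```

With the bytes 0a 00, the first function yields 200 while the second yields 51200, which is the kind of silent error that appears when fixed-endian data is read with native loads on a big-endian host.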
19:57airlied: I don't think any big-endian PPC systems really exist since nv40 times
19:58airlied: TimurTabi: yeah we cared back in the nv40 days
19:58airlied: and some code probably does do the correct wrappers, but it's not a major concern anymore
19:59TimurTabi: I wonder why we don't have "depends on LITTLE_ENDIAN" in the Kconfig
19:59airlied: just because it used to work on nv40
19:59airlied: and probably still does
20:01airlied: I tried getting 535.104.05 to boot, doesn't get past booter-loader
20:03airlied: but I noticed in the traces the dmem setup is quite different
20:03airlied: https://paste.centos.org/view/raw/e7e3a206 is the working 535
20:03airlied: https://paste.centos.org/view/raw/33bb0975 is updated
20:03airlied: note dmem ranges 5d00 and above are empty in the second
20:04airlied: not sure if that is the problem, I'll trace openrm load
20:10TimurTabi: If you apply my debug patch and send me the contents of loginit and logrm, I can see if something comes up. It's on my to-do list, but it'll be a while before I get to it.
20:39airlied: TimurTabi: I don't even get past booter-load, not sure I've even setup loginit
20:40airlied: probably have though, will see how I go
20:41TimurTabi: Can you try booter from 535.54.03 with gsp-rm from 535.104.05? That should actually work.
20:41TimurTabi: or booter-load or whatever
20:55airlied: ah that dies as well, with a different mbox code, I'll try and get more out of it later
21:39TimurTabi: ok, I'm almost done reviewing the 44 patches. I'll debug this issue tomorrow.
21:41karolherbst: big endian is kinda best effort...
21:41karolherbst: but besides that I don't think it's even worth caring
21:42karolherbst: mesa is a dumpster fire on big-endian anyway
21:42karolherbst: it kinda works, but a lot of the format handling is just wrong
21:42karolherbst: kinda works as long as you only do plain rgba
21:43karolherbst: we can make it work, but that requires somebody with a loooot of time to look at the big endian handling in mesa and do it from scratch
21:43TimurTabi: I feel sad for my former PowerPC brothers.
21:44karolherbst: yeah.. but there aren't enough people to care and make it work
21:44karolherbst: mesa's BE support is basically "make gnome run somehow"
21:44karolherbst: and mostly for s390x
21:44karolherbst: and only CPU
21:45karolherbst: if somebody cares to fix it all, be our guest
21:46karolherbst: mesa's BE support is basically on a "we just reverse the channel order" level, and as you might have guessed, it's broken for packed formats and other funky bits
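One way to see why reversing the channel order is not enough for packed formats: for four 8-bit channels in a 32-bit word, a byte swap is exactly a channel reversal, but for a 5-6-5 format the channels straddle the byte boundary, so a byte swap scrambles them. The sketch below is an illustration of that general point only. It does not model mesa's format code, and all names are hypothetical.

```python
import struct


def pack565(r, g, b):
    """Pack 5-bit red, 6-bit green, 5-bit blue into one 16-bit word."""
    return (r << 11) | (g << 5) | b


def unpack565(word):
    """Split a 16-bit word back into (r, g, b)."""
    return (word >> 11) & 0x1F, (word >> 5) & 0x3F, word & 0x1F


def channels8888(word):
    """Split a 32-bit word into four 8-bit channels, high byte first."""
    return tuple((word >> shift) & 0xFF for shift in (24, 16, 8, 0))


def byteswap16(word):
    return struct.unpack("<H", struct.pack(">H", word))[0]


def byteswap32(word):
    return struct.unpack("<I", struct.pack(">I", word))[0]
```

Here channels8888(byteswap32(0x11223344)) is (0x44, 0x33, 0x22, 0x11), a clean reversal. By contrast, unpack565(byteswap16(pack565(31, 0, 0))) is (0, 7, 24), which is neither the original pure red nor its channel-reversed form (0, 0, 31).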
22:01RSpliet: AFAIK you can also run a PowerPC processor in little-endian mode. There is no reason to suffer https://catfox.life/2018/11/03/clearing-confusion-regarding-modern-powerpc-endianness/
22:03RSpliet: actually, a closer-to-the-source source: https://www.ibm.com/support/pages/just-faqs-about-little-endian
22:14fdobridge: <mohamexiety> PPC + NV was interesting given iirc Pascal + Power9 was the first uniform memory system combo?
22:14fdobridge: <mohamexiety> but I may be really badly misremembering things
22:14fdobridge: <mohamexiety> I know it was Power9 and _something_... either Volta or Pascal. then nothing until Hopper reintroduced it
22:23airlied: there are certain OpenGL things that are really hard to do in BE as well
22:23airlied: like qbos
22:23airlied: unless you fix the apps and cts etc