00:24 imirkin: oh. that works better if i also set the number of user parameters.
00:30 imirkin: ok, dEQP-GLES31.functional.shaders.builtin_var.compute.* all pass now. yay.
01:22 imirkin: now i just need to figure out how to get it to wait properly.
01:23 imirkin: (and hook up samplers/views, but that's easy)
06:07 imirkin: ok, textures work. now just need to hook up images. looks like it's actually not so hard -- the (semi-)difficult bit is supporting writeonly (format-less-in-shader) but i can punt that until later. and 3d is a slight pain, but i think it's the same deal as on nvc0, so can just copy what we did there
06:10 imirkin: [and figure out how to deal with the insufficient flushing...]
07:09 imirkin: pmoreau: do you have any blob traces of nv50 compute things?
07:11 imirkin: wait what?? atomicCounterDecrement works but increment fails??
07:12 imirkin: oh ffs
07:12 imirkin: i think $r63 isn't 0
07:12 imirkin: in compute
07:12 imirkin: wtf
07:33 imirkin: nuked r63 usage from compute, everything works now
07:36 imirkin: for dEQP-GLES31.functional.ssbo.layout.*:
07:36 imirkin: Test run totals:
07:36 imirkin: Passed: 1999/2007 (99.6%)
07:36 imirkin: Failed: 7/2007 (0.3%)
07:37 ccr:cheers
07:38 imirkin: the failures appear to be where the hw thinks the accesses are out-of-bounds
07:39 imirkin: (at least some are)
07:39 imirkin: hm
07:41 imirkin: pmoreau: so my plan is to get the images stuff going at least reasonably well, and then take all the nv50/ir and nv50 patches, and make a series out of it. i'll be modifying some of your stuff, taking other bits as-is (and adding a bunch of my own). hope that's alright.
07:41 imirkin: pmoreau: i expect i'll have something towards the end of the week
09:06 damo22: what is missing from nouveau for some kind of compatibility for opencl?
09:08 RSpliet: damo22: good question. karolherbst, pmoreau: what's the current status of opencl? was stuff merged yet, or still brewing?
09:08 damo22: i just discovered the hard way that NVIDIA is a pain in the arse
09:08 RSpliet: Yeah, sorry
09:08 damo22: due to their binary driver situation
09:08 damo22: but you guys must be doing something right
09:09 RSpliet: there's a lot of people doing things here that are right, but it's essentially a 4-man team, so it's also quite a bit behind on features
09:09 damo22: i have a GTX 280
09:09 damo22: its compute 1.3
09:09 damo22: in cuda terms
09:10 damo22: but i have no idea how that could map onto something free
09:10 RSpliet: I hope that's "NVA0"
09:10 damo22: how do i tell?
09:11 RSpliet: dmesg says something along the lines of
09:11 RSpliet: [ 9.596159] nouveau 0000:03:00.0: NVIDIA MCP79/MCP7A (0ace80b1)
09:11 damo22: ok let me blacklist nvidia and reboot my server
09:11 RSpliet: The second and third hex digit (here "ac") tell you.
09:11 RSpliet: Wow.
09:13 damo22: ?
09:14 damo22: [ 41.964716] nouveau 0000:01:00.0: NVIDIA GT200 (0a0080a2)
09:15 RSpliet: there you go, NVA0
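The rule of thumb above (second and third hex digits of the ID in parentheses) can be sketched in Python. This is a minimal illustration only: the function name `chipset_from_dmesg` is hypothetical, and it implements just the two-digit rule stated in the channel, so it makes no claim about newer GPUs whose IDs may need more digits.

```python
import re

def chipset_from_dmesg(line):
    """Return a chipset name such as 'NVA0' from a nouveau dmesg line, or None."""
    # The 8-hex-digit ID is printed in parentheses at the end of the line.
    match = re.search(r"\(([0-9a-fA-F]{8})\)", line)
    if match is None:
        return None
    boot0 = match.group(1)
    # Second and third hex digits, per the rule of thumb given in the channel.
    return "NV" + boot0[1:3].upper()

print(chipset_from_dmesg("[ 41.964716] nouveau 0000:01:00.0: NVIDIA GT200 (0a0080a2)"))
```

Applied to the two dmesg lines quoted above, this gives NVAC for the MCP79 and NVA0 for the GT200.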
09:15 RSpliet: and a kernel bring-up that's four times as slow as my Atom board :-P
09:16 RSpliet: Anyway, that's a bit of a special GPU when it comes to OpenCL. I can't promise that the work currently in the pipeline already works on that
09:16 damo22: i modprobed it manually
09:16 RSpliet: :_D
09:16 RSpliet: *:-D
09:17 damo22: ok
09:18 damo22: i had a look at the binary driver "source" its a blob with a bunch of stuff around it
09:18 damo22: is anyone looking into the blob?
09:19 RSpliet: It's used as a black box when reverse-engineering the GPU. Nobody uses it in a disassembling-sense because that may come with legal risks
09:20 damo22: okay
09:21 RSpliet: Up to this day, AMD and Intel have been more committed to open source. NVIDIA's contributions to nouveau code are sparse and GPU docs are patchy. Most of what you see is the result of reverse-engineering. Hence people's mileage may vary wildly
09:24 damo22: i see
09:26 damo22: im pretty annoyed that nv is using free software as a platform for selling proprietary drivers for hardware
09:26 damo22: ie, binary driver restrictions etc
09:27 damo22: and having binary toolchain is a mega pain in the arse
09:27 damo22: for gnu compute
09:27 damo22: gpu*
09:28 damo22: but AMD ROCm is a perfect example of how NOT to distribute a gpu toolchain
09:28 damo22: its soo messy
09:28 RSpliet: on AMD you get upstream OpenCL support I think
09:29 FLHerne: It doesn't really work reliably yet
09:29 damo22: yeah i have an AMD card that also has no support unfortunately,
09:29 RSpliet: oh
09:29 RSpliet: FLHerne: what does "not reliable" mean?
09:30 damo22: i have a NAVI Radeon RX 5700
09:31 FLHerne: It tends to miscompile, or fail to compile, a lot of the kernels that people would want to use
09:31 damo22: its pretty useless at this point, or was a few months ago
09:31 FLHerne: And unlike graphics, you can't really get away with "it works sometimes" because you need to trust the results
09:32 damo22: FLHerne: ive heard that the consumer cards also miscompute sometimes, so to really trust results you may have to recompute and compare
09:32 FLHerne: AMD haven't done much on it for years, it's only been a hobbyist thing
09:33 FLHerne: (until this year, when Intel/MS did a lot of work on the Clover frontend, but most of that is for CL-over-nir which isn't the default path on AMD hardware)
09:34 damo22: but the reason i am raising the discussion here, and including AMD stuff is because i think you guys need to do it "right" and learn from the mess that AMD did
09:34 FLHerne: AMD's supported thing is ROCm, which as damo22 says is "open" in a remarkably useless way :p
09:37 damo22: i think it could be a game changer if nouveau supported opencl and was upstreamed... i have no idea how much work that is for the toolchain
09:39 damo22: is there a document/roadmap for that?
09:50 RSpliet: damo22: nouveau's main problems are 1) no adequate clock control pretty much across all generations of GPUs, 2) no vulkan, 3) no OpenCL, 4) fundamental issues with multi-threaded accelerated applications, 5) a spectacular lack of manpower.
09:50 RSpliet: All of these issues feel like they haven't changed in the past 2 years because the few devs we have have to play catch-up with every new gen to get display and basics up and running and then run into the brick wall of "NVIDIA requires signed firmwares since 2nd gen Maxwell, which they release very late and incomplete"
09:51 damo22: aww
09:52 RSpliet: There's the will to do things proper, but it's hard to get good driver devs for such complex hw. And find someone who can employ them (most of them here are Red Hat, one hero currently spends his precious free time on improving user space)
09:53 damo22: yeah thats incredible
09:53 RSpliet: We've had occasional contributions from NVIDIA for bits that seem like more niche features
09:54 damo22: is there a source repo i can poke around for opencl?
09:54 RSpliet: and sometimes less niche - heard they sent out some video decoder patches recently
09:55 RSpliet: I don't know where this lives
09:55 damo22: ok no worries
09:55 RSpliet: suspect it's branches in https://gitlab.freedesktop.org/pmoreau/mesa
09:56 RSpliet: but where to start, no idea
09:56 RSpliet: ^ that guy is also working to finish up a PhD.
09:57 FLHerne: damo22: karolherbst was working on it more recently
09:57 RSpliet: FLHerne: yeah they work together on this lately
09:58 RSpliet: seems to have fallen off the internet
09:58 FLHerne: damo22: It should generally work on upstream Mesa now
09:58 RSpliet: he'll be back today ;-)
09:58 damo22: so mesa is the 3d driver?
09:58 damo22: sorry i have no idea how things fit together
09:59 RSpliet: yep
09:59 RSpliet: mesa is the userspace acceleration thing. Has quite a lot of components to it - 3D, compute, video decoding the main branches
10:00 damo22: so if i wanted to add support for something for my card i would start in mesa?
10:00 RSpliet: eh, video decoding being VDPAU/VAAPI. GStreamer/ffmpeg/whatever VLC uses still have the actual codecs I think
10:00 RSpliet: Depends on what you want to add
10:00 damo22: im not actually interested in the graphics just the compute
10:02 RSpliet: kernel handles device memory allocation, isolated contexts, clocks, firmwares and a lot of the display mode setting stuff
10:02 damo22: im guessing there needs to be some kind of API for the memory in the kernel exposed to userspace so opencl can work, is that already in place?
10:03 RSpliet: Yeah all that stuff exists
10:03 RSpliet: and I'm always amazed at how complex that code gets
10:03 damo22: ok so i probably dont need to delve into kernel
10:04 damo22: what would be calling the kernel api to do compute?
10:05 RSpliet: mesa
10:05 damo22: ok i'll look in there
10:48 damo22: im looking in mesa: src/gallium/drivers/nouveau it looks like nva0 is a missing folder?
10:49 RSpliet: nv50
10:51 damo22: ah 0xa0 -> nv84
10:51 damo22: at least for the decoder
10:52 RSpliet: The differences between NV50-NVAC are minor, so they share code in the nv50 folder
10:55 damo22: gotcha
10:56 damo22: is there anything i can dump from my card that would be useful?
11:35 damo22: what is gallium and how does clover fit in for opencl
11:41 damo22: it appears to be a llvm frontend
11:43 RSpliet: gallium is the driver framework. It's supposed to abstract away the various APIs (OpenCL, OpenGL, DirectX, VAAPI, VDPAU) and inputs (GLSL, SPIR-V) into more generic concepts
11:44 damo22: where does gallium run, as a library?
11:44 RSpliet: mesa tends to be built as a monolith IIRC
11:45 RSpliet: anyway, LLVM is only a tiny bit of it, and apart from SPIR-V, nouveau doesn't really use LLVM.
11:46 damo22: so when i write an opencl program, and JIT compile it, it calls into mesa to figure out how to lower my code into something my gfx card understands?
11:46 damo22: or is that baked into llvm
11:46 RSpliet: the kernel goes into mesa, which translates it to an intermediate called NIR (or TGSI for some pipelines), and the nouveau "codegen" bits then translate it to assembly for your specific graphics card
11:47 RSpliet: SPIR-V is, or should be, involved somewhere. I *think* the steps are OpenCL C->SPIR-V->NIR->NV50IR
11:48 RSpliet: SPIR-V is in there because programs can also ship with pre"compiled" kernels, which ship in SPIR-V
11:48 damo22: ok, so how come i cant find any opencl function implementations like a library in mesa
11:48 damo22: are they currently missing?
11:49 RSpliet: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/frontends/clover
11:49 damo22: api/interop.cpp: * TODO: implement the CL functions
11:49 RSpliet: it's an old working name, and a pun. "CLover"
11:50 RSpliet: just like "nine" is the front-end for DirectX 9
11:50 RSpliet: not hugely descriptive, but might avoid trademark issues (the OpenCL trademark may or may not be in the hands of Apple)
11:52 damo22: what about a math library for opencl for example, is that not dependent on the driver somehow?
11:53 RSpliet: what kind of math library? you mean something like a BLAS library. That'd be a separate library linked against your application, which calls the OpenCL APIs
11:53 RSpliet: doesn't have to know about the driver
11:53 damo22: oh ok
11:54 RSpliet: anyway, sorry, I hate to leave you hanging further, but got to get back to work
11:57 damo22: thanks for your help
13:16 RSpliet: kherbst: damo22 was asking about the current state of OpenCL and nouveau (generally, and also specifically for NVA0). And tbh I've lost track too, so I'm kind of curious
13:17 kherbst: should work (tm)
13:17 kherbst: ehh... more or less
13:17 kherbst: nva0 is tesla, isn't it?
13:18 RSpliet: yep. Somewhere between first and second gen ;-P
13:26 RSpliet: kherbst: also, on F33 it doesn't work out of the box on GM10x. clinfo reports the Clover platform, but no device regardless of whether I define DRI_PRIME=1 or not. F33 stack too old? More instructions required?
13:27 kherbst: RSpliet: NOUVEAU_ENABLE_CL=1
13:28 RSpliet: makes no difference
13:28 RSpliet: mesa-libOpenCL is installed, the Clover platform is reported by clinfo
13:29 RSpliet: There is this note: our OpenCL library declares to support OpenCL 3.0, but it seems to support up to OpenCL 2.2 only.
13:29 RSpliet: But since it's a note, it shouldn't make a difference
13:29 RSpliet: (famous last words)
13:42 kherbst: RSpliet: then something is broken on your end :P
13:43 kherbst: ahh.. wait, you need to compile yourself
13:43 RSpliet: Ah right, there's the catch
13:43 bbear: hello, do you know if the new GT710 cards are well supported with the nouveau driver ?
13:43 RSpliet: Think F34 will have it enabled by default? I've lived without it for 5 years on this laptop, can survive another 3 months :-P
13:44 RSpliet: bbear: should be, but someone complained recently that there was trouble
13:52 kherbst: RSpliet: dunno
13:52 kherbst: doubt it makes sense as I don't think any application would just work out of the box atm
13:53 RSpliet: it's hidden behind an environment variable, so even if it doesn't work well it doesn't hurt either
13:54 RSpliet: I'm personally of the opinion that we should make the cost to entry as low as possible for people who want to experiment. Compiling mesa is a relatively high bar
13:55 RSpliet: We've got the OCL equivalent of a triangle rendering (and quite a bit more), which used to be good enough for an upstream driver :-P far from perfect, but good enough for a bit of play and bug-hunting by ppl like damo22
14:59 imirkin: RSpliet: i'm actually doing some of the compute bring-up work for tesla right now
14:59 imirkin: not all of it is opencl-specific
14:59 imirkin: however some of it should help pmoreau whose goal has been opencl
14:59 imirkin: mwk: any idea why $r63 isn't 0 on a G84 in compute shaders?
15:50 dviola: just a question...
15:50 dviola: the "wait for idle" comment is a reference to the nvkm_msec line right? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c#n105
15:51 dviola: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c#n107
15:51 imirkin: mwk: specifically i'm having trouble with using $r63 in g[$r63] with "ld add" style atomic ops
15:51 imirkin: dviola: yes.
15:52 dviola: imirkin: thanks
15:52 imirkin: dviola: it waits for some period of time for some condition to be true. that condition, ideally, is that the thing is idle.
15:52 dviola: nvkm_wr32(device, 0x10a014, 0x0000ffff); <-- I guess this would be "Inhibit interrupts" then
15:52 imirkin: yeah, i think that's the intr mask?
15:53 imirkin: dviola: https://github.com/envytools/envytools/blob/master/rnndb/falcon.xml#L86
15:54 dviola: cool
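The "wait for some period of time for some condition to be true" pattern described above is a bounded poll. A minimal userspace sketch in Python follows; it is an analogy only, not the kernel's `nvkm_msec` macro, and the names `wait_for` and `poll_interval_s` are hypothetical.

```python
import time

def wait_for(condition, timeout_ms, poll_interval_s=0.001):
    """Poll condition() until it returns True or timeout_ms elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)

# Example: a fake "idle" flag that becomes true on the third poll.
polls = {"count": 0}

def fake_idle():
    polls["count"] += 1
    return polls["count"] >= 3

print(wait_for(fake_idle, timeout_ms=100))
```

As noted in the channel, the caller only knows the condition became true or the time ran out; whether that condition really means "idle" depends on what is being polled.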
16:04 mwk: imirkin: I seem to recall there was some problem with specifically g[$r63]
16:04 mwk: no idea why
16:04 imirkin: weird
16:04 mwk: yes
16:04 imirkin: and very annoying, might i add
16:05 mwk: oh yes
17:11 imirkin: kherbst: do you have a nva0 btw?
17:11 kherbst: uhm... nvae or something
17:11 imirkin: definitely not nvae
17:12 imirkin: nvaf is the higher-end IGP of the series
17:12 imirkin: there's also nvaa/nvac. but nva0 is the monster of the generation
17:12 kherbst: MCP78 or so
17:12 imirkin: yea, that's nvaf iirc
17:12 imirkin: nva0 has fp64, nothing else does
17:12 imirkin: (in tesla)
17:12 kherbst: ahh
17:13 kherbst: you know what is annoying? my flat comes with ethernet ports all around the house, sadly, only one out of 6 ports works :/
17:13 imirkin: lol
17:14 kherbst: yeah.... for whatever reason the first I tried was the working one
17:45 RSpliet: Also, I think NVA0, like NVAA/NVAC doesn't have PMU, but is still HWSQ
17:46 imirkin: sounds right
17:47 RSpliet: it's one of those classic transitional frankenGPUs :-P
17:50 imirkin: how so?
17:51 imirkin: it's basically identical to G92
17:51 imirkin: but with fp64 bolted on
17:51 imirkin: (and the ability to do xfb stream pause/restart, but that seems minor)
17:55 RSpliet: according to wikipedia, it's also a different number of active warps per SM
17:56 RSpliet: I *thought* there were other ISA additions that made it closer to NVA3/5/8, but I may be mistaken
17:56 imirkin: yeah, but diff warps/sm happened all over, i'd think
17:57 imirkin: the isa changes were mostly in nva3, with DX10.1 features
17:57 RSpliet: yeah, think it might also support a wider range of atomics
17:57 imirkin: 64-bit atomics, yea
17:57 RSpliet: and atomic to shared mem
17:57 imirkin: but that kinda goes hand-in-hand with fp64 support in gpu
17:58 imirkin: i'll have to check on that, i think that might be nva3+
17:58 RSpliet: my source is wikipedia :-P
17:58 RSpliet: (and a very hazy memory)
17:58 imirkin: like wikipedia knows anything :p
17:59 imirkin: i was looking at this like yesterday, but mostly i just remember that g84 does *not* have shared atomics
18:00 imirkin: on the bright side, the shared atomics logic is very similar to nvc0
18:00 imirkin: with the lock/unlock thing
22:11 damo22: good to hear you guys are working on it currently, that brings hope
22:12 damo22: how do i enable opencl on NVA0 on fc33?
22:12 imirkin: damo22: it's a relatively recent spurt of effort
22:12 imirkin: damo22: you can't
22:12 damo22: can i compile mesa?
22:12 imirkin: you can, but it won't help
22:12 damo22: ok
22:12 imirkin: pmoreau has a branch
22:12 imirkin: but i'm not sufficiently opencl-savvy to have gotten the opencl bit of it to work
22:12 damo22: i can wait for fc34
22:13 damo22: will it work there?
22:13 imirkin: dunno, probably not
22:13 damo22: heh
22:13 imirkin: (i have no clue what the fedora release cycles are)
22:13 Lyude: note that most drivers don't match distro release cycles
22:13 imirkin: i think there's a modest chance that mesa 21.1 will have enough to make tesla + opencl be a thing
22:14 damo22: okay well its fantastic you are doing it
22:14 damo22: thank you
22:14 damo22: i'll try it when it works
22:14 imirkin: damo22: just curious - what do you need it for?
22:15 damo22: my work bought some TESLA T4s and i have a NVA0, i want to learn opencl at home before i unleash onto the monsters at work
22:15 damo22: id rather learn opencl than cuda
22:15 imirkin: ok - i don't know what Tesla T4 is, but i'm going to guess it's about 100 generations ahead of the nva0
22:15 damo22: yes
22:15 imirkin: nvidia does expose opencl i thought?
22:16 imirkin: i could be wrong
22:16 imirkin: it's been quite a while
22:16 damo22: oh maybe i can use the binary driver with plain opencl?
22:16 imirkin: you should double-check, but i do think they had an opencl lib too
22:16 damo22: ok cool
22:16 imirkin: opencl 1.1, nothing too fancy, but you're not trying to be fancy either
22:17 damo22: but its good to know there is freedom in my hardware coming
22:17 imirkin: hopefully
22:17 imirkin: keep in mind this is a volunteer team
22:17 imirkin: so ... no precise timelines. people do what they want.
22:17 damo22: is there anything i can do like dumping from my card something?
22:17 imirkin: nope
22:17 damo22: ok
22:18 RSpliet: imirkin: fwiw Fedora does biannual releases. F34 happens in May*, 35 in November*, 36 in May 2022* etc
22:18 imirkin: there's a small chance i might want someone to do some testing at some point, but it probably won't need a nva0 exactly
22:18 RSpliet: * They never actually make their release date
22:18 Lyude: there's always dev work to do though :)
22:18 damo22: the nv50 lowering code looks crazy, i have no clue how you guys figure that stuff out
22:19 imirkin: RSpliet: ah ok. so then fc35 is probable
22:19 imirkin: damo22: happy to explain a particular portion of it
22:19 imirkin: it's a compiler though, and compilers tend to be complicated
22:19 RSpliet: they regularly roll newer mesa versions in, it's not bound to a release. But yes, sounds like a plausible target
22:20 damo22: for example, do you have an ISA description and the instruction set for the GPUs?
22:20 imirkin: sure
22:20 imirkin: damo22: https://envytools.readthedocs.io/en/latest/hw/graph/tesla/cuda/isa.html#
22:20 damo22: are there any instructions missing from your nv50 compiler?
22:21 imirkin: dunno, i'll find out as i implement more of compute
22:21 imirkin: all the GL stuff gets used
22:21 damo22: cool
22:21 imirkin: it's possible that some instruction forms are never emitted? dunno
22:21 imirkin: probably not very useful ones though
22:22 damo22: so is CL a graphics pipeline with no display?
22:22 RSpliet: no, that's Brock
22:22 imirkin: depends
22:23 RSpliet: *Brook
22:23 imirkin: on the nv50 series, compute has access to global memory directly while graphics shader stages don't
22:23 damo22: oh hmm thats interesting, are there any instructions in CL that manage the memory?
22:24 imirkin: you can read/write buffers and images from the shader, yea
22:24 RSpliet: damo22: also, OpenCL doesn't use the entire rasterisation part of the graphics pipeline.
22:27 damo22: thanks for your help, i need to work now
22:32 damo22: btw, the cards we have are NVIDIA TU104 (164000a1)
22:33 imirkin: makes sense that they'd be called T4 :)
22:33 damo22: they seem to be detected by nouveau already
22:33 imirkin: yeah, they should work
22:33 imirkin: but you're not going to get good perf out of them because we can't change clocks on them
22:33 damo22: wow good job
22:34 imirkin: so they come up at like 1milli-hertz to save power, and that's what you get
22:34 damo22: what if someone discovered a command to change the clocks somewhere in the blob
22:34 Lyude: we would have done it a long time ago if that was the case
22:34 imirkin: that's not really how it works
22:34 Lyude: it's more complicated than that
22:35 damo22: bummer
22:35 imirkin: unfortunately you can't just reclock the memory willy-nilly
22:35 imirkin: you have to be "inside" the GPU to do it
22:35 imirkin: and that firmware must now be signed by nvidia
22:35 damo22: arrgh
22:35 imirkin: we could try to reuse the firmware they ship in some driver, but that wouldn't be redistributable
22:36 damo22: cryptographic locks like management engine :(
22:36 RSpliet: and even then we'd need someone who is willing to strap themselves to a computer for a few months just to figure out all the tedious bits of configuring memory
22:37 damo22: f$$k nv
22:37 RSpliet: but this time without the help from being able to alter the VBIOS on a running system
22:38 RSpliet: (which is where all the big tables with memory params are hidden)
22:38 damo22: ah
22:38 RSpliet: because, AFAIK, the VBIOS too is signed these days
22:38 damo22: eg, like which clock speed you are allowed on which card
22:38 RSpliet: oh that's the easy part
22:39 imirkin: damo22: there are a bunch of clocks, but the most important one is the memory
22:39 imirkin: the memory itself has to be configured differently in order to operate at various clock speeds
22:39 damo22: i imagine there are a bunch of different IP cores with different roles
22:39 imirkin: different boards will have different memory chips
22:39 RSpliet: a DRAM data sheet comes with 22 different timings. You may have heard of CAS, there's loads more. And then there's voltage regulation, link training, DLLs, all the stuff.
22:39 imirkin: and all this memory configuration stuff is wildly proprietary anyway
22:40 damo22: omg, you have to train memory on the gpu?
22:40 imirkin: the information is in the vbios somewhere ... but where
22:40 imirkin: damo22: if you change clocks, yea
22:40 damo22: oh shiiit
22:40 damo22: i did some memory init for x86 in coreboot
22:40 damo22: it was not fun
22:40 imirkin: yeah. and that's at boot time
22:40 RSpliet: I did it for your NVA0
22:41 damo22: :)
22:41 imirkin: when memory doesn't have anything too useful in it
22:41 RSpliet: Well, not 100% it works, I did it for *a* NVA0
22:41 imirkin: whereas this is at runtime, in the middle of rendering
22:41 damo22: holy crap
22:41 imirkin: gotta coordinate it with scanout too, otherwise you get blinks
22:41 imirkin: it _really_ helps to have hardware docs for all this
22:42 RSpliet: damo22: trust me, it's not *difficult*, just play monkey-see-monkey-do with the closed-source driver. it's just very very tedious. Get one bit wrong and you get to reboot.
22:43 imirkin: and diff boards are different
22:43 imirkin: so when we ship a driver, we can't just have it work on OUR boards
22:43 imirkin: it has to be data-driven
22:43 imirkin: somehow
22:43 imirkin: but we don't necessarily know how the blob driver is making its decisions
22:44 damo22: yeah i wrote a radare2 plugin for pulling apart AMD atombios crap, they wrote a virtual ISA for doing the gfx init in the vbios
22:44 imirkin: yeah, nvidia has that too
22:44 imirkin: but that's not where the complexity is
22:46 damo22: https://user-images.githubusercontent.com/3164348/93406389-e1014980-f8d2-11ea-9212-51dbdcfe5236.png
22:46 imirkin: delightful
22:47 RSpliet: damo22: if you ever get bored, you can run your NVIDIA video bios through https://people.freedesktop.org/~imirkin/nvbios/
22:47 RSpliet: /sys/kernel/debug/dri/<X>/vbios.rom
22:47 damo22: cool
22:47 imirkin: the vbios is executed on-board now i think though?
22:48 imirkin: i.e. on the gpu
22:48 imirkin: not sure
22:48 RSpliet: imirkin: think you can add a little box to fill in a strap value? :-D
22:48 imirkin: RSpliet: hehe sure
22:48 imirkin: i don't remember how to feed it in, but i'll work it out
22:48 RSpliet: yeah... it normally reads it from a file called strap_peek. Maybe we need a param to provide it on the command line
22:49 RSpliet: Anyway, nice-to-have, not nearly as cool as compute support, don't get too distracted by it :-P
22:50 imirkin: i just realized i'm going to have to add some remapping for image support, so that'll be fun
22:50 imirkin: since buffers and images are the same
22:53 RSpliet: Not sure what else they would be... mapped textures?
22:53 damo22: opencl generic compute numbers?
22:53 imirkin: RSpliet: well, on fermi+, images are bound separately
22:53 imirkin: and then you keep track of buffers separately
22:54 imirkin: whereas here i have one set of indices in hardware
22:54 imirkin: to deal with 2 sets of indices in the shader
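The remapping idea (two index spaces in the shader, one in hardware) can be illustrated with a small Python sketch. Everything here is hypothetical: the function name `build_slot_remap` and the "buffers first, then images" ordering are assumptions for illustration, not what the nv50 driver does.

```python
def build_slot_remap(num_buffers, num_images):
    """Assign shader buffer and image indices to one shared range of hardware slots."""
    remap = {}
    slot = 0
    # Assumed ordering: all buffers first, then all images.
    for i in range(num_buffers):
        remap[("buffer", i)] = slot
        slot += 1
    for i in range(num_images):
        remap[("image", i)] = slot
        slot += 1
    return remap

remap = build_slot_remap(num_buffers=2, num_images=2)
print(remap[("image", 0)])  # hardware slot used for shader image 0
```

The point of the table is that image 0 and buffer 0 in the shader no longer collide once both are translated to their own hardware slot.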
22:56 RSpliet: Actually sounds kind of cool if they could be textures. Get to sample them using that dedicated pipe rather than using open-coded pointer arithmetic... guess you still want the ability to sample them anywhere in the image though
22:56 imirkin: no open-coded pointer arithmetic
22:56 imirkin: you give it x/y coords
22:56 imirkin: but you have to do the buffer conversion
22:56 imirkin: er
22:57 imirkin: the format conversion
22:57 imirkin: to/from the packed data
22:57 damo22: so you apply an inverse format conversion for your generic data and let the hardware compute the format of the "pixels"
22:57 RSpliet: cvt doesn't cover that does it?
22:58 imirkin: RSpliet: let me know which cvt arg goes between a 32-bit value and r11fg11fb10f
22:58 imirkin: it's all doable, but it's not like a single op
22:58 RSpliet: imirkin: exactly :-D
22:59 imirkin: and it's a different set of ops per format
22:59 imirkin: but i've already written it for nvc0, so easily adapted
22:59 imirkin: unfortunately for nv50, also have to do it for *writing*
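To show why this is more than a single op, here is a sketch in Python of unpacking one 32-bit word in the r11f/g11f/b10f format into three floats. It assumes the usual packed layout (red in bits 0-10, green in bits 11-21, blue in bits 22-31, unsigned floats with a 5-bit exponent and bias 15); the function names are hypothetical and this is not the nv50 or nvc0 lowering code.

```python
def unpack_small_float(bits, mantissa_bits):
    """Decode an unsigned small float with a 5-bit exponent (bias 15)."""
    exponent = bits >> mantissa_bits
    mantissa = bits & ((1 << mantissa_bits) - 1)
    fraction = mantissa / (1 << mantissa_bits)
    if exponent == 0:
        # Zero and denormals: no implicit leading 1.
        return fraction * 2.0 ** -14
    if exponent == 31:
        return float("inf") if mantissa == 0 else float("nan")
    return (1.0 + fraction) * 2.0 ** (exponent - 15)

def unpack_r11g11b10f(word):
    """Split a packed 32-bit word into (r, g, b) floats."""
    r = unpack_small_float(word & 0x7FF, 6)          # 11 bits: 5 exp + 6 mantissa
    g = unpack_small_float((word >> 11) & 0x7FF, 6)  # 11 bits: 5 exp + 6 mantissa
    b = unpack_small_float((word >> 22) & 0x3FF, 5)  # 10 bits: 5 exp + 5 mantissa
    return r, g, b

# 1.0 is exponent 15 with a zero mantissa in each channel.
one = 0x3C0 | (0x3C0 << 11) | (0x1E0 << 22)
print(unpack_r11g11b10f(one))
```

Each channel needs shifts, masks, and exponent handling, and other formats need a different sequence, which matches the "different set of ops per format" remark above. Writing requires the inverse packing step as well.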
23:01 imirkin: and, if we support the full functionality, we have to support the case where the format is not known in the shader
23:01 imirkin: so ... very annoying. and nv50 doesn't have the "library" stuff hooked up (yet, apparently)
23:02 RSpliet: Does NVIDIA even support that level of functionality?
23:02 imirkin: on nv50?
23:02 imirkin: in CUDA? dunno
23:02 imirkin: definitely not in GL
23:02 RSpliet: OpenCL?
23:02 imirkin: dunno
23:15 dviola: I just imported my vbios to https://people.freedesktop.org/~imirkin/nvbios and I get a lot of failed to parse, etc errors
23:15 imirkin: ok
23:15 imirkin: it's just a webassembly version of the nvbios tool in envytools
23:15 imirkin: feel free to fix it up
23:17 dviola: I don't know what I'm even looking at
23:18 RSpliet: a partial view of the data and scripts in the VBIOS of course!
23:21 damo22: is there a disasm tool for the code in the vbios?
23:21 RSpliet: nvbios does that too
23:21 damo22: i could potentially write it into a r2 plugin
23:23 dviola: can I get someone else vbios to compare? is there a list somewhere?
23:24 RSpliet: dviola: if it's just for kicks, there's plenty of overclockers forums with lists of VBIOSes
23:24 dviola: I'll take a look, thanks
23:40 imirkin: dviola: https://people.freedesktop.org/~imirkin/traces/ -- a bunch in there, enjoy
23:54 dviola: thanks
23:54 imirkin: at some point i should dump all of my gpu's
23:54 imirkin: i haven't been good about that
23:54 imirkin: a bunch of them are in our vbios repo
23:55 imirkin: dviola: https://gitlab.freedesktop.org/nouveau/nouveau_vbios_trace
23:55 dviola: should something like this appear in the vbios dump: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c#n108 ?
23:55 imirkin: no
23:55 dviola: 0x10a04c
23:55 dviola: ok
23:55 imirkin: i mean, i guess i dunno
23:55 imirkin: it might
23:56 imirkin: i think it usually does a more hard-core reset
23:56 imirkin: but i dunno
23:59 dviola: hrm, ok