00:56 airlied[d]: I was contemplating using nom for ptx parsing but I didn't know enough to start yet 🙂
01:01 gfxstrand[d]: It looks like nom probably only pulls in one dep but it's hard to tell with crates.io.
08:32 marysaka[d]: gfxstrand[d]: yeah it only depends on memchr
09:42 snowycoder[d]: airlied[d]: Nom is quite easy and minimal, I already built some parsers with it
09:56 magic_rb[d]: Nom is great, I've used it ages ago, and most recently I've used the Haskell version, which is also great. In general they're called parser combinator libraries
09:57 magic_rb[d]: The outcome is a recursive descent parser
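(For context: a hand-rolled sketch of the parser-combinator idea being described, not nom's actual API. A parser is a function from input to `Result<(remaining_input, output)>`, and combinators compose small parsers into bigger ones; the `%r<N>` operand grammar below is purely illustrative, not real PTX.)

```rust
// Sketch of the parser-combinator idea behind nom (NOT nom's real API):
// a parser maps input to Result<(rest_of_input, parsed_value)>.
type PResult<'a, T> = Result<(&'a str, T), ()>;

// Like nom's `tag`: match a literal prefix and return the remainder.
fn tag<'a>(t: &'a str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input| input.strip_prefix(t).map(|rest| (rest, t)).ok_or(())
}

// Like nom's `digit1`: match one or more ASCII digits.
fn digit1(input: &str) -> PResult<'_, &str> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        Err(())
    } else {
        Ok((&input[end..], &input[..end]))
    }
}

// Compose them into a "register operand" parser for a made-up
// "%r<N>" syntax -- this composition is the recursive descent part.
fn reg(input: &str) -> PResult<'_, &str> {
    let (rest, _) = tag("%r")(input)?;
    let (rest, idx) = digit1(rest)?;
    Ok((rest, idx))
}

fn main() {
    // Consumes "%r12", leaves ", %r3", yields the index "12".
    assert_eq!(reg("%r12, %r3"), Ok((", %r3", "12")));
    // Fails cleanly on non-matching input instead of panicking.
    assert!(reg("add.f16").is_err());
    println!("ok");
}
```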
10:52 pac85[d]: Time for Haskell in mesa
11:17 magic_rb[d]: oh im down :)
20:04 airlied[d]: mhenning[d]: I've asked nvidia for info on the fp16 stuff, will let you know what comes out of it, I could just drop that patch from the series, it's probably not essential
20:20 mhenning[d]: airlied[d]: Does it still pass cts without that patch?
20:27 airlied[d]: I'd say so, I'll try and run it today
20:28 karolherbst[d]: airlied[d]: you are aware that the fp16 stuff is always coupled on ampere, right?
20:30 airlied[d]: not according to the docs you gave me 🙂
20:30 airlied[d]: it might be Tegra SMs that are different, hence why we don't see it
20:31 airlied[d]: are there production Tegra Ampere SMs?
20:31 Jasper[m]: SM?
20:31 airlied[d]: mhenning[d]: does nvcc generate tegra code?
20:31 airlied[d]: shader m
20:32 airlied[d]: oh streaming multiprocessor?
20:32 Jasper[m]: It would be on the Orin's
20:32 karolherbst[d]: airlied[d]: yes
20:32 karolherbst[d]: they are according to the docs
20:33 airlied[d]: they really aren't, the docs clearly say "Redirected" and talk about scoreboarding fp16 operations
20:33 karolherbst[d]: it says coupled 🙃
20:33 karolherbst[d]: I have it open here and checked
20:33 airlied[d]: the opcode extension doc?
20:34 karolherbst[d]: the tables
20:34 airlied[d]: go read the opcode extension doc, it disagrees
20:34 karolherbst[d]: the extension doc was written by a technical writer
20:34 karolherbst[d]: specifically for us
20:34 karolherbst[d]: it's not used internally at nvidia
20:34 karolherbst[d]: and it's wrong in a couple of places
20:34 karolherbst[d]: apparently
20:35 karolherbst[d]: the tables are correct afaik
20:36 airlied[d]: the ampere latency tables at least don't say redirected, so it's probably fine, I only care about Turing since we don't enable fp16 on it yet
20:36 karolherbst[d]: mhhh... oh right, I haven't given you the entire docs...
20:36 karolherbst[d]: or have I?
20:37 karolherbst[d]: nah, I did
20:37 karolherbst[d]: check the xls files, first sheet
20:38 airlied[d]: yup, have read that, so we are fine on Ampere, it's just Turing that has a question
20:38 karolherbst[d]: afaik it's true on turing that it's redirected
20:39 karolherbst[d]: but maybe nvidia really sometimes omits the barriers on turing there?
20:39 airlied[d]: yup so it's strange nvcc doesn't do it
20:39 airlied[d]: I've got to type in the rest of the Ampere latencies this week if I get some time
20:41 karolherbst[d]: mhhh wikipedia also claims a 2:1 FLOPS ratio for fp16:fp32... strange
20:41 karolherbst[d]: maybe it's a mistake in the doc...
20:41 karolherbst[d]: or something funky going on
20:43 karolherbst[d]: mhhhhh
20:43 karolherbst[d]: sooo
20:43 karolherbst[d]: apparently the ISA docs do list the barriers as valid things for fp16 instructions...
20:43 karolherbst[d]: maybe they added it just in case, but never created hardware where it's actually slow?
20:44 karolherbst[d]: maybe they planned to do so for the 16 series but then reconsidered?
20:44 airlied[d]: yeah it might just be no product ever shipped with this
20:53 mohamexiety[d]: karolherbst[d]: this should be true in Turing
20:53 mohamexiety[d]: packed FP16
20:53 mohamexiety[d]: not true for ampere onwards though, because FP32 also got a doubling
20:54 mhenning[d]: airlied[d]: That's a good question. I know you can run cuda code on tegra so there's definitely some version of nvcc that works for tegra, but I don't know if it's the same as the desktop one or not
20:55 mhenning[d]: nvk doesn't support tegra yet so I haven't figured out much for reverse engineering there
20:56 airlied[d]: I suppose Xavier was Volta based? so it probably suffers from this
20:57 mhenning[d]: isn't fp16 more common in gles? I'd be a little surprised if tegra had worse fp16 than desktop
20:59 mhenning[d]: Also, I did check volta and it also shows fixed latency HADD2
21:00 karolherbst[d]: there is no turing tegra
21:01 karolherbst[d]: maybe that's the product they skipped, but dunno
21:05 mohamexiety[d]: yeah, no turing tegra
21:05 mohamexiety[d]: iirc it was Maxwell > Volta > Ampere > Blackwell
21:08 Jasper[m]: Pascal as well
21:08 Jasper[m]: Kepler before that
21:10 mohamexiety[d]: I see, forgot the Pascal ones
21:56 gfxstrand[d]: I think I'm getting a Volta Tegra. Someone found it lying around the Collabora office and said, "Faith should have this." 😅
21:56 gfxstrand[d]: Maybe an excuse to fix the kernel
22:02 redsheep[d]: Poor volta
22:02 airlied[d]: ah must be a xavier then
22:16 gfxstrand[d]: On the upside, it'll mean I have a Volta. 🤷🏻‍♀️
22:16 gfxstrand[d]: Possibly the most cursed GPU in the line-up, though. 😂