05:01maniacs: It is actually meant exactly that way: where the event loop is being polled, DMA packets flow-control the devices (https://thenumb.at/cpp-course/sdl2/06/06.html). It should not be a complex paradigm for any random programmer's profile, and not at all a headache for a medium-size maintenance team; it goes as fast as possible. I think dcbaker might merge his efforts into something like this instead of doing mindless work. I can spot some depression he has;
05:01maniacs: same here, I also do not want to do pointless work.
05:13maniacs: So Cornell people also implemented the DMA compute machine on chips that lack enough DMA channels, like Microchip parts that have no bus master (however they called it); it only has 4 system-used channels. This hack was also known to me, so on such chips the I/O works in slightly different ways, but those are microcontrollers. I am not aware of devices other than those with fewer than 8 channels in total.
05:45Venemo: Hey guys
05:45Venemo: what happened to freedesktop gitlab? is it still down?
05:45emersion: yes
05:48maniacs: We run on those chipsets, a slightly improved version. The DMA controller's job is to print variables in the answer set and drive GPIO, so if you have 1024 intermediate answers, it can, without enough DMA channels, use a bytemask on the offsets and decrement and increment, so it still ends up being very fast. The CPU lifts it back to a 32-bit representation, because with big numbers DMA would otherwise delay. That happens only on some Microchip controllers that are
05:48maniacs: not enumerated GX or up.
08:34pq: Lynne, I have so much FUD and paranoia about being legally tainted by Dolby Vision that I'm hesitant to even want to know anything about it.
09:01karolherbst: maybe we have to work on a proper solution for how to deal with proprietary technology like that, in a more general sense, and see if there is a way to move forward, even if that means not supporting such techs on a case-by-case basis. If we want the Linux desktop to succeed, we also kinda need a solution for those problems, and I'm wondering what we could do here to also protect developers getting involved with such IP-heavy techs
09:51slattann: Test Mesg
10:37bddvlpr: Hey, any updates on the GitLab downtime? Are there any available mirrors atm?
10:49emersion: you can join #freedesktop to follow along the latest updates, bddvlpr
11:54Lynne: pq: what it means is that you don't have to worry about it that much
11:55Lynne: the blu-ray profile carries regular HDR with its main 4k stream, it's only when combining the extra stream that you need dovi processing
11:55Lynne: as for the streaming profile, you can just let some other library optionally convert it to regular hdr
11:59Lynne: it's not going to light up the Dolby™ light on the TV or receiver, but if done properly, no one would miss it
12:00Lynne: reminds me of the issue with AC3/S/PDIF passthrough: even though HDMI supports uncompressed PCM, some users reported the Dolby light does not come on unless AC3 is sent instead
12:04karolherbst: if all that proprietary stuff could be dealt with inside userspace, that would be very helpful indeed. Does something like that already exist, and are the drawbacks known?
13:06Lynne: yeah, libplacebo supports the streaming profile (profile 7)
13:08Lynne: drawback-wise, I guess it matters if the target HDR colorspace is a subset of what the TV supports, and the TV has better knowledge of what it supports, so that it can better convert the dovi source into its own native colorspace
13:10Lynne: but with displays these days needing dithering to get 10ish bits out of an 8-bit panel, not sure if that could happen
13:12pq: I'm sure it can happen, as display metadata (EDID etc.) seems to be unwilling to tell what the display actually supports.
13:12pq: as in, true hardware capabilities, rather than legacy colorimetry or desired signal properties
13:24gfxstrand: zmike: Trying to become a moderator on all the channels? :P
13:25zmike: no, I don't want to do it
13:29haasn: yeah the whole point of dovi is for the dovi special chip inside the TV to do custom tone-mapping based on the known display characteristics
13:30haasn: taking into account peak luminance at different window sizes
13:30haasn: and power limits etc.
13:30haasn: we can't replicate that in software without knowing all those details about the hardware
13:30haasn: good luck
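For contrast, the kind of generic software fallback discussed in this thread only knows a couple of numbers about the display. A minimal sketch, assuming a single display peak luminance and a known content peak (both hypothetical parameters), using the extended Reinhard curve; this is a generic operator, not Dolby Vision's proprietary mapping, and it deliberately ignores the window-size and power-limit behaviour haasn mentions.

```c
#include <assert.h>

/* Generic global tone mapping with the extended Reinhard curve.
 * Not Dolby Vision: it knows only two numbers about content and display,
 * nothing about per-window peak luminance or power limits. */
static double tonemap_nits(double nits, double content_peak, double display_peak)
{
    assert(content_peak > 0.0 && display_peak > 0.0);
    if (content_peak <= display_peak)
        return nits;                          /* content already fits: pass through */

    double x = nits / display_peak;           /* luminance relative to display peak */
    double w = content_peak / display_peak;   /* content peak maps exactly to display peak */
    double y = x * (1.0 + x / (w * w)) / (1.0 + x);
    return y * display_peak;
}
```

With a 4000-nit master on a 1000-nit display, 4000 nits lands on 1000 and everything below is compressed smoothly; a TV with real knowledge of its panel can do better, which is the point being made.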
14:38karolherbst: that's fair I guess, though it's something we could consider as a viable fallback or something
14:44pq: If one is legally allowed to simply pass through dovi data, then that could be done if the dovi content is fullscreen and guaranteed exclusive.
14:45karolherbst: I have by no means actual knowledge about the HDR stuff here, I'm just trying to approach this issue from a consumer/user perspective. And if users/consumers want to have this all supported from an upstream kernel, I'm wondering how we can find a solution here to make it work for everybody.
14:45pq: The problem is that such pass through interface likely becomes dovi specific and useless for anything else, and I wouldn't know if that's ok.
14:46karolherbst: could there be other proprietary formats like it? Maybe we need such an interface for formats where we just hand wave on the actual result and the hardware just does what it thinks is right
14:47karolherbst: it's a painful issue, but I suspect there are users wanting to use that (or something similar) today or in the near future
14:48Lynne: I remember hearing from some company (plex, I think) that they were told that if they only pass through the dovi data without processing it, they don't have to license anything, but I'm not a lawyer (especially not one of those super-paranoid lawyers who thinks it's okay to cut aac decode features from a library and still say it's compliant)
14:48pq: or we need to make the open solution good enough, that dovi loses its appeal
14:49zamundaaa[m]: pq: That is my biggest problem with it. Maybe in embedded use cases only playing HDR video well in fullscreen is fine, but I don't think it's something that would be good on desktop systems
14:49karolherbst: Lynne: might help to have that in writing
14:49pq: and for that I have big wishes for SBTM
14:50karolherbst: yeah.. just annoying if you don't have the hardware for fancy new replacements
14:50pq: I don't have any dovi supporting hw that I recall either
14:51karolherbst: yeah well.. but random users might
14:52karolherbst: maybe it's also a non-issue and it only matters for downstream/vendors/whoever
14:52pq: I shall invest my time making open stuff better.
14:53karolherbst: fair, and I don't think anybody expects you specifically to help implement all that stuff. And if nobody among the maintainers wants to accept it, then that will be the end of this feature
14:54karolherbst: but then we should be clear we don't want it so nobody wastes their time trying to upstream it
15:02Kayden: gitlab status - https://floss.social/@XOrgFoundation/110966947933348638 - can listen in on #freedesktop if needed for the latest
16:07DavidHeidelberg[m]: karolherbst: `ggml_opencl: device FP16 support: false`, `RUSTICL_FEATURES=fp16` radeonsi is listed as supported in features
16:12karolherbst: yeah... mesamatrix is a bit broken there
16:12karolherbst: ehh wait
16:12karolherbst: DavidHeidelberg[m]: is it though?
16:13karolherbst: ohh wait.. mhh
16:13karolherbst: yeah, setting that env var _should_ make it work, but maybe there is something not compiling properly or something?
16:13karolherbst: dunno
16:13karolherbst: would need more details
16:14DavidHeidelberg[m]: maybe bug in the code, trying different version
16:14DavidHeidelberg[m]: (the application)
16:14karolherbst: mhh
16:14karolherbst: maybe it's shader caching...
16:14karolherbst: or something weird
16:14karolherbst: though I thought I hooked that up
16:17DavidHeidelberg[m]: `fp16_support = strstr(ext_buffer, "cl_khr_fp16") != NULL;`
16:17DavidHeidelberg[m]: this should work I guess
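The `strstr` test does work for `cl_khr_fp16`, but a plain substring match can in principle also hit a longer extension name that merely starts with the same text. A small sketch of a whole-token match over the space-separated `CL_DEVICE_EXTENSIONS` string; `has_cl_extension` is a hypothetical helper, and fetching the string itself (e.g. through `clGetDeviceInfo`) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole space-separated token in `exts`. */
static bool has_cl_extension(const char *exts, const char *name)
{
    assert(exts != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return false;
    /* Walk every substring match and accept only ones bounded by spaces or the string ends. */
    for (const char *p = exts; (p = strstr(p, name)) != NULL; p++) {
        bool starts = (p == exts) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return true;
    }
    return false;
}
```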
16:17karolherbst: yeah...
16:17karolherbst: else check with clinfo
16:17karolherbst: DavidHeidelberg[m]: is your mesa tree new enough though? :D
16:18DavidHeidelberg[m]: karolherbst: nightly build :)
16:18karolherbst: mhhh
16:18karolherbst: yeah.. then no idea. Is it listed with clinfo at least?
16:19DavidHeidelberg[m]: karolherbst: not listed: cmd: `RUSTICL_FEATURES=fp16 RUSTICL_DEVICE_TYPE=gpu RUSTICL_ENABLE=radeonsi clinfo`
16:19karolherbst: mhhh
16:20karolherbst: DavidHeidelberg[m]: mind debugging `si_get_shader_param` with `PIPE_SHADER_CAP_FP16` inside `si_get.c`?
16:20DavidHeidelberg[m]: nvm, oops, I see it's not nightly :/
16:20karolherbst: heh
16:20DavidHeidelberg[m]: 23.1.6-1 debian probably downgraded as FDO was down
16:20karolherbst: might also be that radeonsi disables it by default
16:20karolherbst: there is this `return sscreen->info.gfx_level >= GFX8 && sscreen->options.fp16;` check
16:21karolherbst: and fp16 defaults to off
16:21karolherbst: uhm.. false
16:21DavidHeidelberg[m]: hmm, that's only for INT16
16:22karolherbst: and PIPE_SHADER_CAP_FP16
16:22DavidHeidelberg[m]: (ignore me, right, fallthrough)
16:22karolherbst: fp64/fp16 is basically untested :)
16:23DavidHeidelberg[m]: with LLMs I think that's not going to be a problem soon after people start playing with it
16:23karolherbst: yeah... also LLM won't run into the broken parts
16:24karolherbst: though if somebody gives me a non-painful tutorial on how to do LLM stuff, I might even make sure it works properly and stuff
16:24karolherbst: but...
16:25DavidHeidelberg[m]: btw. I overrode the check in the program and it works
16:25DavidHeidelberg[m]: (at least it seems)
16:25karolherbst: yeah.. it should
16:25karolherbst: the biggest issues are just denorms + missing libclc builtins
16:26DavidHeidelberg[m]: and we still don't talk about bf16 which isn't even in OpenCL spec :'(
16:27karolherbst: no idea if anybody even works on it
16:28karolherbst: but supporting bf16 in mesa will be pain
16:28DavidHeidelberg[m]: No mention anywhere, so it would be nice if secretly it'll show and everyone instantly supports it, but I'm too old to believe in fairy tales :D
16:28karolherbst: it's going to be quite a bit of work sadly
16:29karolherbst: or mhhh
16:29karolherbst: is bf16 supposed to be faster than fp32 in terms of processing or just reducing memory bandwidth?
16:30karolherbst: I'm sure we could just treat it as fp32 and only store/load 16 bits....
16:30karolherbst: unless there are GPUs supporting it natively
16:30DavidHeidelberg[m]: I assume it's mostly reducing memory consumption; if you have a large LLM, it makes sense to have a bit lower precision and more nodes?
16:30karolherbst: ohh. I'm more wondering about alu speed, because on modern GPUs fp16 is twice as fast as fp32
16:31karolherbst: if that's not the case with bf16, then it's fairly trivial to support
16:31DavidHeidelberg[m]: karolherbst: btw. the output: https://paste.sr.ht/~okias/cb912df89bbd80509049d3a34b41f56de8bbb078
16:31karolherbst: meaning we just treat it as fp32, just 16 bit sized
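This works because bf16 is exactly the upper 16 bits of an IEEE fp32 value (1 sign, 8 exponent, 7 mantissa bits), so "fp32 in registers, 16 bits in memory" reduces to one conversion on load and one on store. A minimal sketch of those two conversions, assuming round-to-nearest-even on store and keeping NaNs quiet; the function names are hypothetical and nothing here is Mesa or NIR code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Load: a bf16 value is the top half of an fp32 bit pattern, so widening is a shift. */
static float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Store: keep the top 16 bits, rounding to nearest with ties to even. */
static uint16_t f32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7f800000u) == 0x7f800000u && (bits & 0x007fffffu) != 0)
        return (uint16_t)((bits >> 16) | 0x0040u);  /* NaN: truncate, force the quiet bit */
    uint32_t lsb = (bits >> 16) & 1u;                /* lowest bit that survives */
    bits += 0x7fffu + lsb;                           /* carries into bit 16 iff we round up */
    return (uint16_t)(bits >> 16);
}
```

Because the exponent range is the same as fp32, this conversion never needs range checks beyond NaN; large finite values simply round up to infinity as RNE requires.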
16:31karolherbst: yeah, so radeonsi disables it
16:46zmike: fp16 had this issue https://gitlab.freedesktop.org/mesa/mesa/-/issues/5953
16:51DavidHeidelberg[m]: btw. updated to nightly mesa-opencl-icd (as FDO is up now) and now rusticl works much better
16:51DavidHeidelberg[m]: quietly selling the Debian nightly repo :D
16:54DavidHeidelberg[m]: airlied: tinygrad is producing output reasonably quickly on an AMD 6800 XT, for the first time. Though, from 13 G, now 15.7/16.3 G of RAM is gone and megabytes are disappearing quickly. I think more than one complex question won't fit into 16 G :D
16:56DavidHeidelberg[m]: .. and it does some cleanup from time to time, so maybe it'll work. Also, while it's not producing totally stupid things, I wouldn't let it recommend me a reliable BMW engine :D
16:57DavidHeidelberg[m]: *VRAM cleanup
17:02mareko: karolherbst: only RDNA 3 has BF16 and only for fdot and the cooperative matrix ops
17:03mareko: BF16 for all FP ops is unlikely
17:10mareko: karolherbst: FP16 isn't 2x faster. It can be faster in marketing specs, but the reality is more complicated. RDNA 3 should have equal FP32, FP16, and Int16 performance in Wave64. Wave32 is more complicated and perf is compiler-dependent. RDNA 2 has half the theoretical ALU perf of RDNA 3. You can get 2x FP16 and Int16 perf on RDNA 2, but you have to generate vec2 16 ops to get close to what RDNA 3 has,
17:10mareko: otherwise you're out of luck.
17:10Lynne: do the matrix units support f16 or are they bf16 only?
17:12karolherbst: mareko: sure, but nvidia claims to be able to do twice as many fp16 ops as fp32, but I never actually checked
17:12karolherbst: but yeah... I think we should probably model bf16 as fp32 in nir, otherwise it's going to be pain
17:12karolherbst: and have instructions doing bf16 stuff that eat fp32? dunno...
17:13karolherbst: or we use fp16..., but then float operations are going to be a mess
17:13mareko: all marketing "claims" for FP32 and FP16 are only for fma16 and fma32 under ideal circumstances
17:15HdkR: karolherbst: Make sure to fuzz which fp16 operations your hardware is missing :P
17:15karolherbst: seems like nvidia only claims it for turing anyway
17:16mareko: the problem with using fma to market GFLOPS is that fma is the only instruction that can do 2 ops per issue (mul+add), so it's at least 2x the performance of all other instructions
17:17karolherbst: yeah, fair
17:17mareko: even mov is 2x slower than fma, so 1/2 GLOPS for mov
17:17mareko: *GFLOPS
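mareko's arithmetic can be written out. A sketch with made-up hardware numbers (`lanes` and `clock_ghz` are placeholders, not the specs of any real GPU): the marketed peak counts each FMA lane as two floating-point operations per clock, so any single-op instruction, mov included, tops out at half that figure at the same issue rate.

```c
#include <assert.h>

/* Peak rate in G(FL)OPS for `lanes` ALU lanes issuing one instruction per lane per clock.
 * ops_per_instr is 2 for fma (mul + add) and 1 for add, mul, mov, ... */
static double peak_gflops(unsigned lanes, double clock_ghz, unsigned ops_per_instr)
{
    assert(lanes > 0 && clock_ghz > 0.0);
    return (double)lanes * clock_ghz * (double)ops_per_instr;
}
```

For example, 4096 hypothetical lanes at 2.0 GHz give 16384 "GFLOPS" for fma but 8192 for everything else, before any fp16 packing is considered.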
17:19karolherbst: right, but this was about fp32 vs fp16, not fma vs anything else
17:19mareko: it's about the same perf on RDNA 3
17:19karolherbst: okay, so the win is really just less memory bandwidth
17:20mareko: and register usage
17:20karolherbst: yeah, in the optimal case
17:20mareko: RDNA 3 can source half a GPR
17:20mareko: and can select the low or high half
17:21mareko: RDNA 2 must use vec2 to save registers
17:22mareko: while RDNA 3 can just do register allocation at 16-bit granularity
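The register-saving difference can be pictured as a 32-bit register holding two 16-bit lanes, which is roughly what a vec2 16-bit op on RDNA 2 works on (RDNA 3 can additionally address each half on its own). A portable sketch, not GPU code: `pack2x16`, `lane`, and `pk_add_u16` are hypothetical names, and the key property shown is that a carry out of the low lane must not spill into the high lane.

```c
#include <assert.h>
#include <stdint.h>

/* Two 16-bit lanes in one 32-bit "register": lo in bits 0-15, hi in bits 16-31. */
static uint32_t pack2x16(uint16_t lo, uint16_t hi)
{
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Read lane i: 0 is the low half, 1 is the high half. */
static uint16_t lane(uint32_t r, unsigned i)
{
    assert(i < 2);
    return (uint16_t)(r >> (16 * i));
}

/* Packed add with per-lane wraparound: each half is added independently,
 * so an overflow in the low lane does not carry into the high lane. */
static uint32_t pk_add_u16(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000ffffu;
    uint32_t hi = ((a & 0xffff0000u) + (b & 0xffff0000u)) & 0xffff0000u;
    return lo | hi;
}
```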
17:46mareko: Lynne: they support fp16, bf16, i8, u8, i4, u4
18:01Lynne: ah, ok, I was wondering how the coop matrix extension would work, since vulkan doesn't make a distinction
18:07DavidHeidelberg[m]: karolherbst: btw. iris also doesn't show fp16
18:09karolherbst: yeah.. iris doesn't support fp16 at all
18:35DavidHeidelberg[m]: kk, then ok :)
18:36tnt: What happens when you ask for an fp16 texture then? Does it just use fp32?
19:07Kayden: your textures can be in whatever format you want, but when you request data from the sampler, it returns it as float32/sint32/uint32
19:08Kayden: there is hardware support for fp16 send/return messages these days, and we ought to use it, but don't yet
19:08Kayden: not as much as we should
19:09Kayden: it got put on the backburner quite a bit because fp16 support was kind of mixed, in that there were a variety of painful restrictions, and performance was originally not really better, or in many cases not better - again, kinda complicated
19:09Kayden: I think it's worth using these days
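For a 16-bit float texture, what Kayden describes amounts to widening each IEEE binary16 texel to binary32 before it reaches the shader. A bit-level sketch of that widening, for illustration only (the sampler does this in hardware, and `half_to_float` is a hypothetical name). Every binary16 value, subnormals included, is exactly representable in binary32, so no rounding is involved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Widen an IEEE 754 binary16 bit pattern to binary32. Exact for every input. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1fu;
    uint32_t man = h & 0x3ffu;
    uint32_t bits;

    if (exp == 0x1fu) {                 /* inf or NaN: keep the payload */
        bits = sign | 0x7f800000u | (man << 13);
    } else if (exp != 0) {              /* normal: rebias exponent from 15 to 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {              /* signed zero */
        bits = sign;
    } else {                            /* subnormal half: normalize into a binary32 normal */
        uint32_t e = 113u;              /* binary32 exponent field of 2^-14 */
        while ((man & 0x400u) == 0) {
            man <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((man & 0x3ffu) << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```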
19:13karolherbst: at least fp16 in CL is really just the alu stuff, and maybe that wouldn't be too hard to wire up
19:14karolherbst: but it's kinda pointless without proper libclc support
22:41bnieuwenhuizen: Lynne: the vulkan extension doesn't support bf16 and i4/u4 :(
23:18Lynne: eh, you only need bf16 if you're doing NN stuff without taking care of normalization, there's plenty you can do with f16
23:19Lynne: 16x16 is a large block, you can implement a DCT or some other transform in just a matrix multiply
23:21Lynne: as long as it's possible to justify the loading/storing overhead in case you need to gather samples
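On Lynne's point about 16x16 blocks: a separable 2D transform of an N×N block is Y = T·X·Tᵀ, i.e. two matrix multiplies, which is the shape cooperative-matrix hardware accelerates. A plain-C sketch of that structure with one deliberate substitution: it uses the 16×16 Walsh-Hadamard matrix (entries ±1/4) instead of DCT cosines, so the example needs no libm and the arithmetic stays exact; a DCT would use the same two multiplies with a cosine basis matrix. All names are hypothetical and none of this is Vulkan code.

```c
#include <assert.h>

#define N 16

/* Orthonormal Walsh-Hadamard matrix: T[i][j] = (-1)^popcount(i & j) / sqrt(N).
 * For N = 16, sqrt(N) = 4, so every entry is exactly +0.25 or -0.25. */
static void hadamard16(float t[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            unsigned v = (unsigned)(i & j), parity = 0;
            while (v) {                 /* parity of the set bits of i & j */
                parity ^= v & 1u;
                v >>= 1;
            }
            t[i][j] = parity ? -0.25f : 0.25f;
        }
}

/* c = a * b, or a * b^T when transpose_b is non-zero; all N x N, c must not alias. */
static void matmul16(float a[N][N], float b[N][N], int transpose_b, float c[N][N])
{
    assert(c != a && c != b);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float acc = 0.0f;
            for (int k = 0; k < N; k++)
                acc += a[i][k] * (transpose_b ? b[j][k] : b[k][j]);
            c[i][j] = acc;
        }
}

/* 2D transform y = T * x * T^T. T is symmetric and orthonormal, so the same call inverts it. */
static void transform2d(float t[N][N], float x[N][N], float y[N][N])
{
    float tmp[N][N];
    matmul16(t, x, 0, tmp);
    matmul16(tmp, t, 1, y);
}
```

A constant block ends up entirely in the `y[0][0]` coefficient, and applying `transform2d` twice returns the original block; the gather/load cost Lynne mentions is outside this sketch.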