00:12mareko: gfxstrand: what is NAK?
00:13airlied: nvidia kompiler
00:13mareko: so not NCO
00:15mareko: maybe the NVPTX LLVM backend is the answer, who knows
00:16karolherbst: the answer to what? ptx is a high level language
00:18airlied: where's my ptx to spir-v translator :-P
00:19airlied: https://github.com/gthparch/NVPTX-SPIRV-Translator oh someone wrote it :-P
00:19karolherbst: I wonder how often we can translate in circles until something crashes
00:20airlied: or someone could write tcg like layer for nvidia binaries :-P
00:21karolherbst: uhhh
00:35alyssa: karolherbst: dozen + vkd3d all the things
00:36karolherbst: mhhh
00:37alyssa: or vkd3d + dozen if you prefer
01:20idr: karolherbst: It's like that game of translating some bit of text through various human languages until you get total gibberish.
01:27alyssa: It's like that game of translating a bit of text between several human languages until it makes no sense
01:50idr: https://media.tenor.com/ts_UxTASGroAAAAC/cant-understand-your-accent-spongebob.gif
03:03tuxayo: Hi, hola, saluton :) Does anyone know about what could be missing in an AppImage to have Vulkan support? Someone worked on an AppImage for the 0ad game and when enabling the Vulkan renderer, it doesn't detect support (probes for VK_KHR_surface) on what seems to be any Intel and AMD GPUs in general (so it seems to have something to do with Mesa).
03:03tuxayo: And it falls back on OpenGL.
03:03tuxayo: On an NVIDIA GPU it works (I'm assuming it was the non-libre driver).
03:03tuxayo: So it can find the right mesa stuff when using OpenGL but when using Vulkan it doesn't find it. But it does find the non-libre NVIDIA Vulkan driver...
03:03tuxayo: Any clue? Here is the main build script and it seems to do nothing in particular to give us good Mesa OpenGL support: https://github.com/0ad-matters/0ad-appimage/blob/trunk/workflow.sh
03:03tuxayo: And here is the head-scratching so far:
03:03tuxayo: https://discourse.appimage.org/t/vulkan-disabled-when-running-0ad-appimage-with-intel-or-amd-chipsets/2908
03:03tuxayo: https://github.com/0ad-matters/0ad-appimage/issues/19
03:23airlied: tuxayo: probably missing the vulkan loader
03:27airlied: but probably also need the mesa vulkan drivers
03:27airlied: not sure how appimage works there
03:31airlied: tuxayo: maybe also the headers to build against, not sure how NVIDIA works
03:44tuxayo: airlied: thanks for the hints. So likely linuxdeploy/AppRun, which builds the AppImage, takes care of the basic mesa stuff but lacks the vulkan loader/mesa vulkan drivers/headers
03:50airlied: yeah if I had to guess
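For context: the Vulkan loader discovers Mesa's drivers through ICD JSON manifests (normally under /usr/share/vulkan/icd.d/), and an AppImage bundling its own Mesa Vulkan drivers can point the loader at them with VK_ICD_FILENAMES. A minimal probe for what 0ad checks, assuming only the standard loader API (nothing here is taken from the 0ad build scripts):

```c
/* If the AppImage ships neither the loader nor Mesa's ICD manifests, the
 * instance extension list comes back without VK_KHR_surface and the game
 * falls back to GL. */
#include <stdio.h>
#include <string.h>
#include <vulkan/vulkan.h>

int main(void)
{
    uint32_t count = 0;
    vkEnumerateInstanceExtensionProperties(NULL, &count, NULL);

    VkExtensionProperties props[256];
    if (count > 256)
        count = 256;
    vkEnumerateInstanceExtensionProperties(NULL, &count, props);

    for (uint32_t i = 0; i < count; i++) {
        if (!strcmp(props[i].extensionName, VK_KHR_SURFACE_EXTENSION_NAME)) {
            printf("VK_KHR_surface is available\n");
            return 0;
        }
    }
    printf("VK_KHR_surface missing (loader or ICDs not found?)\n");
    return 1;
}
```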
04:03marcan: looks like gitlab is unhappy...
04:34Nefsen402: It's an issue for me as well so it isn't localized
07:49javierm: tzimmermann: hi, I haven't reviewed your optional fbdev series yet, but wondered what did you different than what I attempted in https://lore.kernel.org/lkml/20210827100027.1577561-1-javierm@redhat.com/t/
07:51javierm: tzimmermann: ah, I see. You want to hide all the fbdev uAPI (/dev/fb?, sysfs, etc) while I tried to only disable the "real" fbdev drivers (but keeping emulated fbdev uAPI)
07:51javierm: tzimmermann: so you plan to only keep the bare minimum to support fbcon, makes sense
07:53tzimmermann: javierm, it occurred to me that we spoke about that change at some point. but i didn't remember that you even sent a patchset. i'll give you credit in the next iteration of the patchset.
07:53tzimmermann: javierm, i'm not sure what the difference is. but i was just reading the old discussion and I left a comment about the existence of the fb device
07:54tzimmermann: in my patches i remove all of that. everything in devfs, sysfs and procfs is gone
07:54javierm: tzimmermann: yeah, I wasn't sure about the difference but after reading your cover letter I understand the difference of the approach now
07:54javierm: tzimmermann: I tried to keep the emulated DRM fbdev while you are also getting rid of that
07:54tzimmermann: fb_info will only be a data structure that connects the framebuffer device with fbcon
07:55javierm: tzimmermann: I think I did that because something still depended on it (maybe plymouth?) but that has been fixed already
07:55javierm: so I agree that your approach is better, get rid of all the uAPI for fbdev and just keep fbcon for now
07:55tzimmermann: there's not much in userspace that requires fbdev. i guess most doesn't even support it
07:56javierm: tzimmermann: yeah
07:56javierm: tzimmermann: I see that you will post a v2. I'll review that then
07:56tzimmermann: two thirds of these patches are actually bugfixes :)
07:56javierm: :)
07:57tzimmermann: javierm, your review is very welcome. i'll keep the current version up a bit longer.
08:19javierm: tzimmermann: sure, I'll review v1 then
08:22MrCooper: DavidHeidelberg[m]: my main point was that the commit logs don't accurately reflect the situation and trade-off being made
08:36tzimmermann: thanks, javierm
09:01siddh: Hello, can anyone merge the revert commit [1] I had sent some time ago regarding drm macros? IIRC, the author had blindly used coccinelle and did not consider the unintended change. It is part of the drm macro series, but even if the series is not considered for merge, the revert should be since the change was incorrect.
09:01siddh: [1] https://lore.kernel.org/dri-devel/e427dcb5cff953ace36df3225b8444da5cd83f8b.1677574322.git.code@siddh.me/
09:36jani: siddh: it no longer applies, needs a rebase
09:37dj-death: gfxstrand: do you remember what led you to disable compression for Anv's attachment feedback loop?
09:38siddh: @jani: oh okay... will send after doing a rebase
09:38dj-death: gfxstrand: there is a comment about aux data being separately
09:38dj-death: gfxstrand: but that makes no sense to me
09:39dj-death: gfxstrand: texturing & rendering having different caches does, but I'm failing to see where the compressed data fits in there
09:58javierm: tzimmermann: not sure I got your comment about the page_size on the ssd130x driver, that's not the system memory pages but the ssd130x controller "pages", that is how they divide the screen
09:59javierm: tzimmermann: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/solomon/ssd130x.c#L442
10:00javierm: tzimmermann: also, the GEM shmem allocation is only done for the shadow buffer and that's bigger than the actual screen, since it is DRM_SHADOW_PLANE_MAX_{WIDTH,HEIGHT}
10:00javierm: or am I wrong on that?
10:31tzimmermann: javierm, what i mean is: userspace allocates a GEM buffer, say 800 x 600. those sizes are aligned to a multiple of 64. so you'd allocate a memory block of 832 x 640 bytes. if these sizes are not divisible by 'page_size' and you do a DIV_ROUND_UP, you might end up with values that refer to areas outside the memory, for example during pageflip's memcpy(). i don't know if that can actually happen in the driver. i was just concerned that the page_size might interfere here
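A minimal sketch of the concern being described, with illustrative names and constants rather than the actual ssd130x code:

```c
/* Illustration only: the shadow buffer height may be aligned (e.g. 600 ->
 * 640), the controller update walks 8-row "pages", and a page count derived
 * with DIV_ROUND_UP must not let the last iteration copy rows that lie past
 * the end of the allocation. */
#include <linux/kernel.h>
#include <linux/minmax.h>
#include <linux/types.h>

#define SSD130X_PAGE_HEIGHT 8	/* controller page = 8 pixel rows */

static void update_pages(const u8 *shadow, unsigned int height,
			 unsigned int pitch)
{
	unsigned int pages = DIV_ROUND_UP(height, SSD130X_PAGE_HEIGHT);
	unsigned int p;

	for (p = 0; p < pages; p++) {
		unsigned int first = p * SSD130X_PAGE_HEIGHT;
		/* clamp the final page: when height is not a multiple of the
		 * page size, the unclamped range would reach rows >= height */
		unsigned int last = min(first + SSD130X_PAGE_HEIGHT, height);
		const u8 *src = shadow + (size_t)first * pitch;

		/* ... transfer rows [first, last) from src to the panel ... */
		(void)src;
		(void)last;
	}
}
```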
10:36Hazematman: Hey, I'm working on a driver that doesn't have native support for PIPE_FORMAT_R32G32B32_FLOAT. If an OpenGL app requests that format as a RB, gallium seems to convert it to PIPE_FORMAT_R32G32B32A32_FLOAT (which is supported). Does anyone know where this happens? I've been trying to dig through the gallium infrastructure to see where it handles surface conversion, to find if it's possible to access the natively requested format. Any guidance on where I should look would be appreciated
11:11danylo: Hazematman: I think it chooses the compatible format with `choose_renderbuffer_format`. I guess to see where it handles the mismatch between formats you'd have to search for where `->InternalFormat` is used.
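Roughly what that fallback boils down to in gallium terms; this is an illustrative sketch built around the screen's is_format_supported hook, not the literal choose_renderbuffer_format code:

```c
/* Pick the first pipe format the driver can actually render to, which is
 * why a GL_RGB32F renderbuffer ends up as R32G32B32A32_FLOAT on hardware
 * without native RGB32 render targets. */
#include "pipe/p_defines.h"
#include "pipe/p_format.h"
#include "pipe/p_screen.h"
#include "util/macros.h"

static enum pipe_format
pick_rb_format(struct pipe_screen *screen)
{
   static const enum pipe_format candidates[] = {
      PIPE_FORMAT_R32G32B32_FLOAT,    /* what the app requested */
      PIPE_FORMAT_R32G32B32A32_FLOAT, /* padded fallback */
   };

   for (unsigned i = 0; i < ARRAY_SIZE(candidates); i++) {
      if (screen->is_format_supported(screen, candidates[i],
                                      PIPE_TEXTURE_2D, 0, 0,
                                      PIPE_BIND_RENDER_TARGET))
         return candidates[i];
   }
   return PIPE_FORMAT_NONE;
}
```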
11:38javierm: tzimmermann: ah, got it. Good point, I'll check whether that can happen and if it's a possibility I can fix it on top. Thanks!
12:42swick[m]: Lyude: I'm looking at https://gitlab.freedesktop.org/drm/intel/-/issues/8425 again. The intel eDP proprietary backlight control has a bunch of registers and control bits unused which sound like they could be the cause.
12:43swick[m]: jani: ^
12:43swick[m]: are there more details on them? I don't have the hardware to test any of that...
14:34mareko: DavidHeidelberg[m]: do any amd CI tests use LLVM < 15?
14:39DavidHeidelberg[m]: mareko: I don't think so, so far all images are 15
15:43mareko: great, thanks
16:18mareko: karolherbst: when do you think we can drop clover support from radeonsi?
16:35karolherbst: mareko: I want to wait until proper function calling support
16:36karolherbst: that's more or less the biggest regression compared to clover
16:37karolherbst: I kinda plan to prototype this with llvmpipe and radeonsi given they use LLVM so it shouldn't be too hard to do, but nir needs some fixes here and there
16:38mareko: NIR->LLVM can't do function calls
16:38karolherbst: I know
16:39karolherbst: but without that we sometimes get shaders with like 2 million SSA values and RA eats through RAM and takes hours
16:40karolherbst: there are still some unknowns on how to do things, but my initial plan was to kinda only have function calls between the kernel and libclc
16:40karolherbst: and maybe only for functions being of specific size
16:40karolherbst: some of those libclc functions are massive and even use LUTs
16:41mareko: I don't know if LLVM supports function calls with the Mesa (non-native) ABI
16:42mareko: LLVM compiles shaders slightly differently for radeonsi, RADV, PAL, and clover (same as ROCm)
16:44mareko: there is an LLVM target triple that we set, radeonsi sets amdgcn--, RADV sets amdgcn-mesa-mesa3d, and I don't know what clover sets
16:44karolherbst: clover has a CAP for it: PIPE_COMPUTE_CAP_IR_TARGET
16:45karolherbst: it's amdgcn-mesa-mesa3d as it seems
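A rough sketch of that query as a compute frontend would issue it through gallium; only PIPE_COMPUTE_CAP_IR_TARGET is taken from the discussion, the ir_type argument here is an assumption and error handling is omitted:

```c
#include <stdio.h>
#include "pipe/p_defines.h"
#include "pipe/p_screen.h"

static void print_ir_target(struct pipe_screen *screen)
{
   char triple[64] = {0};

   /* PIPE_COMPUTE_CAP_IR_TARGET returns the LLVM target triple as a string */
   screen->get_compute_param(screen, PIPE_SHADER_IR_NIR,
                             PIPE_COMPUTE_CAP_IR_TARGET, triple);
   /* radeonsi reports something like "amdgcn-mesa-mesa3d" here */
   printf("compute IR target: %s\n", triple);
}
```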
16:45mareko: ok
16:45karolherbst: we don't need an ABI because I'm not planning to link GPU binaries, so as long as the final binary works it's all fine
16:45karolherbst: or rather, not a stable one
16:46karolherbst: so whatever llvm does internally for function calls doesn't really matter here
16:46mareko: arsenm on #radeon might know if amdgcn-- supports function calls
16:48karolherbst: besides that we have a little delete clover tracker here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19385
16:48karolherbst: fp16 is kinda the only other thing missing, but that should be fairly trivial to add
18:07mareko: karolherbst: given what arsenm said, the only missing thing is function call support in ac_nir_to_llvm and probably adjacent places
18:23airlied: karolherbst adding functions to llvmpipe was a bit of a pain, radeonsi might be easier at least as long as it's using llvm
18:23airlied: but I think with llvmpipe the overheads of sticking stuff onto the stack was quite noticeable
18:36mareko: wow the Marge queue has 15 MRs
18:36karolherbst: airlied: yeah.. that's why I only want to turn calls into huge libclc functions into proper calls
18:37mareko: for Mesa
18:37karolherbst: where copying them multiple times would just hurt everything
18:40karolherbst: airlied: I kinda want to figure out why those luxmark benchmarks explode in size and just do function calls to deal with that problem
18:50mareko: radeonsi also unrolls aggressively
18:50mareko: see si_get.c
18:50mareko: loops with up to 128 iterations are unrolled
18:51mareko: probably regardless of the loop body size
18:51karolherbst: mhhhh, that would be.. bad
18:52karolherbst: anyway, I didn't check why those shaders explode in size, I just know they end up with millions of SSA values
18:52karolherbst: maybe I should do that
18:53karolherbst: mareko: seems like opt_loop_unroll checks for 26 instructions
18:53karolherbst: ehh wait
18:53karolherbst: iterations
18:54karolherbst: or is it instructions?
18:54karolherbst: yeah.. it's instructions
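Illustrative only, not the actual nir_opt_loop_unroll logic: the kind of instruction-budget check being discussed looks roughly like this:

```c
/* Unroll only while the unrolled body stays under some instruction budget;
 * a generous budget combined with a large driver-side iteration cap is how
 * big libclc loop bodies can balloon into millions of SSA values. */
static bool should_unroll(unsigned trip_count, unsigned body_instrs,
                          unsigned instr_budget)
{
   return trip_count * body_instrs <= instr_budget;
}
```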
19:16karolherbst: mareko: btw, will you create the MR for the vectorization stuff?
19:18HdkR: karolherbst: How does one setup rust in meson's cross files for 32-bit? Or do I just ignore 32-bit rusticl?
19:18karolherbst: HdkR: good question, but I guess you just set rustc and set a 32 bit target as a compiler flag
19:18karolherbst: I actually did that...
19:19HdkR: Currently meson just complains that `rust compiler binary not defined in cross or native file`
19:19karolherbst: ahh yes.. HdkR: rust = '/home/kherbst/.rustup/toolchains/1.59-i686-unknown-linux-gnu/bin/rustc' 🙃
19:20HdkR: ah
19:21karolherbst: I think you can potentially also set the target, but I think just pointing to a toolchain is the proper way.. dunno.. I guess it depends on how your distribution handles it if you are not using rustup
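A rough sketch of what such a cross file can look like, assuming a rustup-installed i686 toolchain like the path above; keys beyond the standard [binaries]/[host_machine]/[built-in options] sections are not implied:

```ini
# 32-bit cross file sketch for building rusticl (paths are examples)
[binaries]
c = 'gcc'
rust = '/home/kherbst/.rustup/toolchains/1.59-i686-unknown-linux-gnu/bin/rustc'

[built-in options]
c_args = ['-m32']
c_link_args = ['-m32']

[host_machine]
system = 'linux'
cpu_family = 'x86'
cpu = 'i686'
endian = 'little'
```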
19:22HdkR: Currently poking around at ArchLinux
19:23HdkR: ah, blocked by them not supporting spirv-tools for 32-bit and I'm too lazy to build that :)
19:24karolherbst: :)
19:24HdkR: Oh well, not too concerned about 32-bit CL anyway
19:25karolherbst: one user actually filed a bug, because some 32 bit windows app ran into problems with rusticl
19:26HdkR: I guess they can figure that out if they want it running under FEX :P
19:26karolherbst: :D
19:26karolherbst: fair enough
19:26karolherbst: at some point I also have to check out FEX on my macbook
19:26HdkR: Finally getting around to creating an Arch image so Asahi users can have a nicer experience
19:26karolherbst: but CL doesn't run there very well except for llvmpipe
19:27karolherbst: ahh, cool
19:27karolherbst: but the new and hot asahi distribution is fedora based :P
19:27HdkR: Next step Fedora I guess
19:47DemiMarie: Is the simplest solution to the LLVM problems to stop using LLVM? Walter Bright wrote a C frontend to Digital Mars D in ~5000 lines of D, and I suspect Mesa has far more code than that that just works around LLVM problems. LLVM isn’t magic, and from what I have read it seems that its optimizations don’t really do anything useful. If one needed a C++ frontend that would be another matter, but my understanding is that none is needed.
19:49karolherbst: 1. we'd still have to maintain it 2. llvmpipe 3. C isn't just the language
20:01karolherbst: I won't say no if somebody comes around and writes a full C compiler + all the OpenCL API nonsense bits, but I won't do it
20:02DemiMarie: I see
20:02DemiMarie: Clang having OpenCL support is not something I expected.
20:02karolherbst: yeah.. we just use clangs support there
20:02karolherbst: they deal with most of the extension + header nonsense
20:03karolherbst: well.. builtins at this point, using the headers is slower than using the new and fancy stuff, which isn't headers
20:03karolherbst: kinda don't want to replicate all of that
20:05karolherbst: also.. writing a new C frontend is all cool and everything, but 5k just for parsing/lexing? kinda brutal
20:06karolherbst: anyway.. the CL bits dealing with LLVM are small, most of it is dealing with spir-v stuff.
20:06karolherbst: the part where LLVM matters more is on the backend side
20:06HdkR: Considering lexing is my least favourite part, I'll never do that :D
20:07karolherbst: llvmpipe and radeonsi do a lot of LLVM backend stuff, none of it is even remotely frontend related
20:07karolherbst: radeonsi problem will be solved with ACO, probably
20:07karolherbst: and to replace LLVM's use in llvmpipe we'd have to support _multiple_ CPU architectures with all their nonsense
20:07karolherbst: no thank you :D
20:09DemiMarie: Yeah LLVM is awesome at generating CPU code.
20:10HdkR: LLVM and the CPU side, great
20:10karolherbst: we even found an auto vectorization issue recently ...
20:10karolherbst: now radeonsi calls a nir pass to vectorize so LLVM can still mess up and we won't care
20:11DemiMarie: Not suprising. I imagine llvmpipe generates very easily vectorizable code.
20:11karolherbst: probably
20:11karolherbst: airlied: you might want to call nir_opt_load_store_vectorize :D
20:11karolherbst: in llvmpipe
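A rough sketch of wiring that pass into a driver's NIR optimization loop; the option and callback fields should be checked against nir.h for the tree in use, and the merge policy below is only an example:

```c
#include "nir.h"

/* example policy: allow merging neighbouring loads/stores up to a 128-bit
 * vec4, which is what removes the duplicated address math on scalar loads */
static bool
mem_vectorize_cb(unsigned align_mul, unsigned align_offset,
                 unsigned bit_size, unsigned num_components,
                 nir_intrinsic_instr *low, nir_intrinsic_instr *high,
                 void *data)
{
   return bit_size * num_components <= 128;
}

static void
vectorize_mem_access(nir_shader *nir)
{
   const nir_load_store_vectorize_options opts = {
      .modes = nir_var_mem_global | nir_var_mem_ssbo | nir_var_mem_shared,
      .callback = mem_vectorize_cb,
   };

   NIR_PASS_V(nir, nir_opt_load_store_vectorize, &opts);
}
```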
20:13DemiMarie: Curious: what does llvmpipe-generated code wind up bottlenecking on?
20:16jenatali: Yeah WARP's JIT backend for multiple CPU architectures is a mess...
20:16HdkR: llvmpipe is usually bottlenecked on vertex processing isn't it?
20:17HdkR: Since that was one of the things that SWR targeted as an improvement
20:35karolherbst: antoniospg____: btw, did you made some progress on fp16 support?
20:35airlied: for most workkloads it bottlenecks on memory bandwidth around fragment shading
20:36airlied: there are some vertex heavy workloads where binning hits hard
20:36Lynne: isn't the mess in writing custom jit mostly in the platform ABI differences?
20:36karolherbst: good thing is: we have no ABI to care about
20:37airlied: yeah I'd hate to have to write backends for every processor in mesa itself
20:37karolherbst: anyway...
20:37airlied: karolherbst: not sure, llvmpipe doesn't do vectors like others do vectors
20:38karolherbst: airlied: you still want to give nir_opt_load_store_vectorize a go :D somehow llvm is too dumb to merge loads and ditch repeated address calculations in loops
20:38karolherbst: nah.. it has like _nothing_ to do with vectors
20:38karolherbst: airlied: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9139#note_1940698 and following comments
20:38karolherbst: just vectorizing loads ditches some alus on address calculations
20:38karolherbst: it's very dumb
20:39karolherbst: might be some amdgpu backend specific issue though
20:39airlied: yeah as I said, when llvmpipe translates from nir it doesn't do a whole lot of address translations itself
20:39karolherbst: mhh, fair enough then
20:39airlied: but yeah I should throw it in at some point
20:39airlied: but I've no real way to notice it working :-P
20:39karolherbst: I'm just super surprised it even matters for radeonsi
20:39airlied: shaderdb someday :-P
20:40karolherbst: heh
20:40karolherbst: maybe I should check with luxmark
20:44DemiMarie: airlied: bottlenecking on memory bandwidth explains why llvmpipe works so well on Apple Silicon.
20:44DemiMarie: Because they have loads of it.
20:44karolherbst: kinda, but less on the CPU side sadly
20:45HdkR: 800GB/s is very much in dGPU territory :)
20:45DemiMarie: Why can GPUs have so much better memory bandwidth?
20:45karolherbst: because it needs more
20:45karolherbst: the CPU seems to have slower access but it might also be because the CPU is too slow
20:46DemiMarie: I’m more interested in what is different about the GPU memory systems, especially on iGPUs where the DRAM and DRAM controllers are identical.
20:46HdkR: To note, CPUs tend to have lower latency on their memory accesses
20:47DemiMarie: Why is there a latency vs throughput tradeoff there?
20:47karolherbst: CPUs cheap out on memory bandwidth because they are still DIMM
20:47airlied: the other things GPUs have is tiling
20:47karolherbst: and it's all very limiting
20:47karolherbst: Apple uses 128 bit for memory transfers
20:47karolherbst: where on x86 you always get 64
20:47airlied: tiled textures are a big advantage if you are doing all the address translation in hw
20:48karolherbst: and the "channel" situation with x86 memory is also just silly
20:48DemiMarie: karolherbst: for iGPUs both the CPU and GPUs have the same DRAM chips, so DIMMs are not relevant here.
20:48karolherbst: but how do you connect the memory?
20:48karolherbst: the DIMM spec specifies memory operation latencies + transfer rates
20:49karolherbst: can't really fix that
20:49karolherbst: so you are just stuck with whatever that uses
20:49DemiMarie: How is this relevant? My point is that iGPUs have the same memory the CPU does, so it must be something other than the RAM chips.
20:49karolherbst: CPU memory is _slow_ on x86 systems
20:49DemiMarie: what part of the CPU memory is slow?
20:49karolherbst: you get like 50 GB/s on normal consumer systems
20:49karolherbst: the DIMM :)
20:50DemiMarie: then why does i915 not have garbage performance?
20:50karolherbst: it does have garbage perf
20:50karolherbst: the M2 is like 8 times that?
20:50karolherbst: the normal M2
20:50DemiMarie: Is this Intel-specific or does AMD also have bad memory bandwidth?
20:50karolherbst: same on AMD
20:51karolherbst: it's just that consumer systems are dual channel 64 bit at most
20:51karolherbst: and that's like around 50-60 GB/s
20:51HdkR: M2 gets 100GB/s
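Where these numbers come from: peak bandwidth is just bus width times transfer rate. A quick back-of-the-envelope check (the part speeds are the commonly quoted ones, not taken from this discussion):

```c
#include <stdio.h>

/* peak GB/s = bus width in bytes * transfers per second */
static double peak_gbps(unsigned bus_bits, unsigned mtps)
{
    return (bus_bits / 8.0) * mtps / 1000.0;
}

int main(void)
{
    printf("dual-channel DDR4-3200 (128-bit): %6.1f GB/s\n", peak_gbps(128, 3200));  /*  51.2 */
    printf("Steam Deck LPDDR5-5500 (128-bit): %6.1f GB/s\n", peak_gbps(128, 5500));  /*  88.0 */
    printf("Apple M2 LPDDR5-6400 (128-bit):   %6.1f GB/s\n", peak_gbps(128, 6400));  /* 102.4 */
    printf("M2 Ultra LPDDR5-6400 (1024-bit):  %6.1f GB/s\n", peak_gbps(1024, 6400)); /* 819.2 */
    return 0;
}
```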
20:51DemiMarie: Does this mean that the M2 could do faster shading in software than i915 can in hardware?
20:51airlied: steamdeck does 88GB/s
20:51DemiMarie: At least on some workloads where fixed function isn’t a bottleneck
20:51karolherbst: mhhh.. probably not
20:52karolherbst: airlied: quad channel?
20:52karolherbst: or what is different on the steamdeck apu?
20:52HdkR: 128-bit bus, technically quad channel because of DDR5
20:52karolherbst: ahhh
20:52airlied: karolherbst: yeah
20:52karolherbst: 128 bit then
20:52karolherbst: well.. it's easily fixable on consumer hardware, but no vendor is tackling it
20:52HdkR: Desktop class would get roughly equivalent on DDR5
20:53karolherbst: Dell kinda tried that with replacing DIMMs, but that's not going anywhere as it seems
20:53karolherbst: HdkR: that's so sad
20:53HdkR: It's a shame that desktop class has been stuck on 128-bit for so long
20:54HdkR: 192-bit would be a cool upgrade for that segment
20:54karolherbst: yeah..
20:54HdkR: Or just go straight 256-bit since most every board supports quad dimms anyway
20:54karolherbst: but that's not getting you 400 or even 800 GB/s :D
20:54DemiMarie: Can we start working on some design specs and see of SiFive can actually build a fast chip?
20:54psykose: people would have riots if you removed replacable dimms
20:55DemiMarie: What does get one that?
20:55karolherbst: psykose: well.. dell suggested something better
20:55karolherbst: but...
20:55psykose: haha
20:55karolherbst: but we can also just stick with slow memory :D
20:55karolherbst: something has to change or it's the end of x86 for real
20:55HdkR: Does CAMM allow 128-bit per module?
20:55DemiMarie: karolherbst: x86 needs to die
20:55karolherbst: HdkR: good question
20:56psykose: riscv also needs to die but nobody wants to hear it
20:56HdkR: Four dimms of 128-bit each would get desktops a hell of a lot closer
20:56airlied: riscv will eat itself
20:56karolherbst: HdkR: probably if you call it QIMM and bump it to 128 bit :P
20:56DemiMarie: airlied: eat itself?
20:56HdkR: :D
20:56karolherbst: or do 96 bit first and call them TIMMs
20:56airlied: it'll just be incompatible fork after incompatible fork, until there is no "risc-v"
20:57DemiMarie: karolherbst: the task is not keeping x86 alive, but rather ensuring that open platforms do not die with it.
20:57karolherbst: don't use DIMMs
20:57karolherbst: that's the way
20:57karolherbst: just do whatever apple did with memory
20:57airlied: solder that shit down
20:57HdkR: karolherbst: Anything more than 64-bit per DIMM would be an upgrade and I'm all for it.
20:57airlied: or HBM it up
20:57karolherbst: yeah.. soldering is the solution here
20:57DemiMarie: Why????
20:57karolherbst: but "people need to replace it" no
20:57karolherbst: well
20:58karolherbst: it's either that or it dies :)
20:58HdkR: Memory on package is how Apple managed to get those numbers
20:58karolherbst: yep
20:58airlied: the DIMM socket is an impediment to speed
20:58karolherbst: and how GPUs get those numbers for years
20:58airlied: all sockets are
20:58HdkR: It's infeasible in a current spec socketed system
20:58DemiMarie: karolherbst: are you saying that replacable memory simply cannot be anywhere near as fast as soldered memory?
20:58karolherbst: the Dell thing was interesting, but not sure what peak speeds they have
20:58psykose: it's pretty much electrically impossible yes
20:58karolherbst: DemiMarie: correct
20:59psykose: there's too many wires and length of wire to make it fast
20:59DemiMarie: Even with active signal regeneration in the sockets?
20:59karolherbst: the RAM on the M2 is right beside the SoC
20:59karolherbst: like literally right beside it
20:59karolherbst: and it's super small
20:59DemiMarie: Maybe we need optical on-board interconnects
20:59karolherbst: the entire SoC is smaller than an entire DIMM module
20:59psykose: IBM was doing some serial memory thing with firmware on the ram modules
20:59psykose: weren't they
21:00HdkR: optical would introduce /more/ latencies. Short runs of optical are actually slower than just copper. Ask people that use direct-attached-copper cables in networks
21:00puck_: psykose: there's also CXL now
21:00psykose: interesting
21:00karolherbst: ahh CAMM is the Dell thing.. right
21:00DemiMarie: HdkR: Signal propagation velocity is _not_ the limiting factor here.
21:01karolherbst: I actually don't know if it fixes the perf problem
21:01puck_: i'm reminded of the AMD 4700S
21:01puck_: which is very distinct and has 16GB of soldered RAM used for both the CPU and what would be the GPU but i think they fused off the APU bits
21:02DemiMarie: Even in optical fiber light still goes 6cm in a single clock cycle.
21:02DemiMarie: CPU clock
21:02puck_: ..but it's 16GB of *GDDR6* as main memory
21:02DemiMarie: At 3GHz
21:02karolherbst: yeah but you also have to translate it into electrical signals and all that
21:02puck_: which is fast but has higher latency
21:02DemiMarie: karolherbst: my point is that the signal integrity problems simply vanish
21:02karolherbst: but it comes with massive latency costs
21:02DemiMarie: Why
21:02DemiMarie: ?
21:03karolherbst: because translating to optical signal isn't for free?
21:03karolherbst: we talk single digit ns here
21:03DemiMarie: Let me guess: the real limitation is cost?
21:03DemiMarie: I know
21:03karolherbst: maybe?
21:04karolherbst: but in any case, just soldering it together solves the problem in a simpler way
21:04DemiMarie: And I am 99.99% certain that e.g. optical modulators have latencies far, far lower than that
21:04karolherbst: close to nobody actually upgrades RAM
21:04DemiMarie: fair
21:04karolherbst: and it needs more space to be replaceable and everything
21:04DemiMarie: my point was that a high-speed socketed system is possible, not that it is going to be cost-effective
21:04puck_: i wonder if we'll see an era where there's soldered-on RAM plus CXL if you really need more memory (aka more distinct tiers of RAM)
21:05karolherbst: soldered RAM even leads to less e-waste on average, because it needs way less space and everything
21:05DemiMarie: True
21:05DemiMarie: Honestly what I really want is for Moore’s Law to finally peter out.
21:05karolherbst: like to match the 800GB/s you need like... 16 DIMM slots I think? :D
21:05HdkR: Say that optical does solve the signal integrity problem. You now need 16 DIMMs worth of bus width to match M1/2 Ultra bandwidth
21:05karolherbst: but yeah...
21:05karolherbst: DIMM is stupid
21:05HdkR: Sixteen!
21:06DemiMarie: HdkR: yeah, not practical
21:06HdkR: Because the M1/2 Ultra has 8 LPDDR5 128-bit packages on it
21:06karolherbst: maybe CAMM would need less
21:06karolherbst: but...
21:06DemiMarie: Serious question: is there room for something that is a GPU/CPU hybrid?
21:06karolherbst: it's still huge
21:06karolherbst: the M2 24GB memory is so _tiny_
21:06dj-death: airlied: what's the current rule to update drm-uapi headers in mesa? take drm-next or drm-tip?
21:06airlied: dj-death: drm-next usually
21:06DemiMarie: Something made for those workloads that are easy to parallelize, but are somewhere between hard and impossible to meaningfully vectorize?
21:07karolherbst: DemiMarie: good question.. intel kinda tries that with AVX512, no?
21:07karolherbst: but....
21:07DemiMarie: karolherbst: anti-AVX512
21:07karolherbst: yeah well.. more threads would help
21:07karolherbst: but we are already going there
21:07DemiMarie: I’m thinking of stuff where the hard part is “what the heck do I do next?”
21:07HdkR: SVE2-512bit :P
21:08karolherbst: yeah.. more threads if you can parallelize
21:08karolherbst: more low power ones even to make use of low power consumption at lower clocks
21:08DemiMarie: Modern compilers are highly parallelizable, but nigh impossible to vectorize
21:08karolherbst: I think most CPU manufacturers will see that high perf cores give you nothing
21:08DemiMarie: Same
21:08karolherbst: and we'll end up with 4+20 core systems, 4 high perf, 20 low perf
21:08DemiMarie: Except for security holes
21:09DemiMarie: Yup
21:09karolherbst: intel kinda moves into having same high/low perf cores :D
21:09karolherbst: it's kinda funky
21:09DemiMarie: Xen is having a really hard time with HMP right now
21:09DemiMarie: Mostly because Xen’s scheduler is not HMP aware
21:09dj-death: airlied: apparently some amdgpu headers where pulled from neither : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21986
21:09karolherbst: it's also funky that the difference between 12700 and 12900 was not more perf cores, but 4 more energy cores
21:10DemiMarie: Not at all surprised.
21:10karolherbst: heh
21:10karolherbst: 13th gen is already there
21:10karolherbst: 8 high perf, 16 low perf :D
21:10DemiMarie: is that a comment on Xen?
21:10karolherbst: https://en.wikipedia.org/wiki/Raptor_Lake#Raptor_Lake-S
21:10karolherbst: kinda totally forgot about that
21:11dj-death: airlied: not quite sure what to do since we want to update the intel ones to the next drm-next
21:11karolherbst: so yeah.. intel is already there
21:12karolherbst: I wonder when Intel kills hyperthreading
21:13DemiMarie: To me the problem with big cores is that the stuff they do well on are:
21:13DemiMarie: 1. Wizard-optimized programs written with lots of SIMD intrinsics or even assembler.
21:13DemiMarie: 2. Legacy single-threaded programs that cannot be parallelized.
21:13HdkR: Once the SRAM cost of duplicating all the register state takes up too much die area for them :P
21:13DemiMarie: 3. have lots of security holes
21:13karolherbst: DemiMarie: well... some things are hard to parallelize, like game engines
21:13DemiMarie: karolherbst: why?
21:13karolherbst: because things depend on each other
21:14karolherbst: AI in games is not trivial
21:14karolherbst: game developers can probably explain it a lot more
21:14karolherbst: there are things which can happen in parallel, but it's not as trivial as it might sound at first
21:15karolherbst: also think sound mixing and stuff
21:15DemiMarie: Sound mixing should happen on another thread.
21:15karolherbst: yeah
21:15karolherbst: so that's what I meant with some things can happen in parallel
21:16karolherbst: but you still need high single thread cores if you want to mix more sources in realtime
21:16karolherbst: in some games you notice that sound sources get disabled on demand, because of load
21:16DemiMarie: Should that be handled by a dedicated DSP?
21:16karolherbst: maybe?
21:16karolherbst: maybe not
21:17karolherbst: might be not flexible enough
21:17DemiMarie: I wonder if graph-reduction machines might help.
21:17karolherbst: but the point is rather, that there will be need for perf cores
21:17HdkR: Just throw another E-core at the problem, homogeneous programming model is better here
21:17DemiMarie: Basically a processor designed for Haskell and other purely functional languages, where everything is safe to run in parallel unless a data dependency says otherwise.
21:18DemiMarie: Where if there is a hazard that means the program has undefined behavior because someone misused unsafePerformIO or similar.
21:18psykose: even "in haskell" the above issues apply
21:18psykose: parallelism is not magic
21:19karolherbst: also.. caches
21:19DemiMarie: HdkR: Mobile devices have lots of DSPs IIUC
21:19psykose: strong '1 person in 12 months, 12 people in 1 month' manager vibes
21:19HdkR: DemiMarie: And nobody's game uses them directly
21:19HdkR: Burn a Cortex-A53 to do the sound mixing, let the OS use the DSP for mixing
21:20DemiMarie: HdkR: maybe we need higher-level sound APIs that have “sound shaders” or similar
21:20karolherbst: in theory everything can be perfect, but practically we have the best outcome possible :P
21:20HdkR: DSP also takes up AI and modem responsibilities there...
21:20karolherbst: DemiMarie: cursed
21:20HdkR: OpenAL 2.0
21:20DemiMarie: karolherbst: cursed?
21:20karolherbst: very
21:20DemiMarie: I meant, “define cursed”
21:21karolherbst: it just sounds cursed
21:22Lynne: don't AMD have some weird GPU sound mixing thing?
21:22DemiMarie: I mean eBPF and P4 are basically shading languages for network devices.
21:22karolherbst: yeah, and many think eBPF is very cursed
21:22karolherbst: not saying I disagree, but...
21:22DemiMarie: part of that is because of the need to prove termination
21:23DemiMarie: In hardware that can be solved by having a timer interrupt.
21:23karolherbst: well.. on the kernel side you can also just kill a thread
21:23karolherbst: but you don't want to do that
21:23karolherbst: like never
21:23DemiMarie: longjmp()?
21:24karolherbst: so... you can't really do that with random applications, because they have to be aware of getting nuked at random points
21:24karolherbst: so they have to be written against that
21:24karolherbst: otherwise you risk inconsistent state
21:24DemiMarie: the other possibility is that if your program doesn’t finish soon enough, that’s a bug
21:24karolherbst: if the modules work strictly input/output based, then yeah, might be good enough
21:25karolherbst: but then it's more of a design thing
21:25karolherbst: oh sure
21:25karolherbst: but you still can't kill it if it doesn't know it will be killed randomly
21:25airlied: dj-death: just pull the intel ones and agd5f can chase down what happened with amd ones maybe
21:25karolherbst: you kinda have to enforce that in the programming model
21:25DemiMarie: yeah
21:28DemiMarie: Also, I hope these conversations are interesting and not wasting people’s time! (Please let me know if either of those is false.)
21:29karolherbst: nah, it's fine
21:31psykose: what else would we be discussing
21:35mattst88: development of dri?
21:41karolherbst: that X component?
21:41karolherbst: hell no!
21:41mattst88: might as well just ramble on about optical interconnects and haskell machines for 90 minutes instead :P
22:37Lynne: I understand fences are not quite a replacement for mutexes
22:38Lynne: but damn it, they should've added a wait+unsignal atomic operation on fences in vulkan
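What that boils down to with the current API: the wait and the reset are two separate calls, so there is no atomic wait-and-unsignal. A minimal sketch (error handling trimmed):

```c
#include <stdint.h>
#include <vulkan/vulkan.h>

/* wait for a fence, then unsignal it -- two calls, not one atomic op */
static VkResult wait_and_reset(VkDevice dev, VkFence fence)
{
    VkResult res = vkWaitForFences(dev, 1, &fence, VK_TRUE, UINT64_MAX);
    if (res != VK_SUCCESS)
        return res;

    /* window here: the fence is still signaled, so another thread waiting
     * on it can also wake up before this reset lands */
    return vkResetFences(dev, 1, &fence);
}
```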
23:40memleak: Hello, I'm using kernel 6.4-rc5 (patched with PREEMPT_RT) and DRM/KMS works just fine on both AMDGPU and Radeon, I'm using an R9 290 (Hawaii) however when starting SDDM or LightDM, USB breaks
23:40memleak: If I use radeon then I get garbage on the screen, USB is dead, if I use AMDGPU, the screen at least looks fine but USB also dead.
23:41memleak: This problem does not exist on 6.1.31 (have not tried 6.1.32 yet)
23:43airlied: memleak: anything in dmesg?
23:43memleak: I set the panic timeout to -1 (instantly reboot on panic) and enabled panic on oops, the cursor for the login screen keeps blinking and the system stays on.
23:44memleak: I can't quite check it once the USB is dead lol i may have to grab a PS/2 keyboard if that works I don't have serial debug either
23:44memleak: I'll try and get dmesg output
23:45memleak: I have to head out, I'll be back later, just wanted to get this down in the channel. airlied nice to see you again btw, it's NTU/Alec from freenode lol
23:49memleak: Oh, just want to note that USB works indefinitely as long as X doesn't start :)
23:50airlied: oh hey, have you another machine to ssh in from?