09:58llyyr: does anyone know what DCN X.XX means in drm/amd? I'm trying to bisect an issue so it'd be helpful to know what DCN3 and DCN3.1 refer to
09:58emersion: display core next
09:59emersion: https://kernel.org/doc/html/latest/gpu/amdgpu/driver-core.html#gpu-hardware-structure
09:59llyyr: thanks
12:10DottorLeo: hi!
12:13DottorLeo: karolherbst: did you check FluidX3D with rusticl? It seems a nice OpenCL app to test the cards :)
12:13DottorLeo: https://github.com/ProjectPhysX/FluidX3D
12:13karolherbst: uhh, fancy
12:15karolherbst: is it packaged anywhere?
12:17karolherbst: the license is a little.. mhh custom?
12:19DottorLeo: phoronix has a test profile, but on the github there are also linux downloads for fp16 and fp32
12:19psykose: it's not open source no
12:19DottorLeo: yeah, i think it's free and open to use unless commercial
12:20karolherbst: psykose: well... it's open source, but I suspect it's not GPL compatible
12:20DottorLeo: karolherbst: can't you use it personally only to test rusticl?
12:20psykose: it's not either
12:20karolherbst: it's basically like MIT I think, just disallows military and commercial use
12:21karolherbst: ahh yeah.. looks like MIT was the template for this license
12:21DottorLeo: there are already some numbers from amd/intel/nvidia to test rusticl against :D
12:21karolherbst: huh, where? :D
12:22DottorLeo: go down on the github page
12:22karolherbst: ohh
12:22karolherbst: I thought you meant tests running rusticl :D
12:22karolherbst: but yeah.. I might look into it and see if it runs, but it should if it's just CL 1.2
12:22karolherbst: should just work (tm)
12:23DottorLeo: oh no ^^" sorry, but i think it's useful because it uses both mixed FP16/FP32 and pure FP32
12:23DottorLeo: and it's quite taxing as simulation :D
12:23karolherbst: yeah.. fp16 is atm not supported, because of.... reasons
12:23karolherbst: are there examples/demos one can use?
12:23DottorLeo: it's not experimental??
12:23karolherbst: it's broken, but people are free to enable it
12:24karolherbst: I think it should be fine if you don't use any of the OpenCL builtins with fp16 types
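For illustration, "fine without the fp16 builtins" roughly means a kernel that enables cl_khr_fp16 and sticks to plain half arithmetic, without calling the half-precision overloads of the OpenCL builtin functions. A minimal sketch; the kernel and helper names are made up for the example:

```c
#include <string.h>
#include <CL/cl.h>

/* Hypothetical kernel: enables cl_khr_fp16 and only does plain half
 * arithmetic, without calling any fp16 overloads of OpenCL builtins. */
static const char *fp16_src =
    "#pragma OPENCL EXTENSION cl_khr_fp16 : enable\n"
    "__kernel void scale_half(__global half *data) {\n"
    "    size_t i = get_global_id(0);\n"
    "    data[i] = data[i] * (half)2.0f; /* arithmetic only, no builtins */\n"
    "}\n";

/* Only build such a kernel if the device actually reports cl_khr_fp16. */
static int device_has_fp16(cl_device_id dev)
{
    char exts[8192] = {0};
    clGetDeviceInfo(dev, CL_DEVICE_EXTENSIONS, sizeof(exts) - 1, exts, NULL);
    return strstr(exts, "cl_khr_fp16") != NULL;
}
```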
12:25DottorLeo: https://github.com/ProjectPhysX/FluidX3D/issues/8, here is the benchmark issue
12:27karolherbst: I want to see stuff though :D
12:27karolherbst: forget the benchmark, I want to see the simulation
12:28karolherbst: I wonder if with rusticl it's faster than ROCm...
12:28karolherbst: I think I have one of the GPUs listed there
12:29DottorLeo: karolherbst: i asked you in the past whether, in theory, rusticl can use different gpus simultaneously, right?
12:29DottorLeo: let's say nvidia+amd+intel igpu
12:29karolherbst: yeah... though I think the memory model is a bit broken on this
12:29karolherbst: not sure, but it e.g. worked with luxmark just fine
12:30karolherbst: so if you see any issues there feel free to report it
12:30DottorLeo: so that software could use ALL the computing units from a PC, CPU+all the gpus? :D
12:30karolherbst: yeah
12:30karolherbst: just
12:30karolherbst: llvmpipe is slower than pocl :D
12:30karolherbst: it got better, but there are still some issues left to resolve
12:30karolherbst: llvmpipe is really bad at utilizing the CPU
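For context, an application sees all of those devices through the normal platform/device enumeration; whether a platform comes from rusticl or another ICD makes no difference. A minimal sketch (error handling mostly omitted):

```c
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);

    for (cl_uint p = 0; p < num_platforms; p++) {
        cl_device_id devs[16];
        cl_uint num_devs = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL,
                           16, devs, &num_devs) != CL_SUCCESS)
            continue;

        for (cl_uint d = 0; d < num_devs; d++) {
            char name[256] = {0};
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME,
                            sizeof(name) - 1, name, NULL);
            printf("platform %u, device %u: %s\n", p, d, name);
        }
    }
    return 0;
}
```

Note that a cl_context can only contain devices from a single platform, so spreading work across vendors still means the application creates a context and queue per device (or per platform) and partitions the work itself.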
12:31DottorLeo: wow, you know that once rusticl is conformant on all platforms it will be seen as the second coming of OpenCL? :D
12:31karolherbst: maybe?
12:31karolherbst: my goal is just to make _some_ compute API available to the linux desktop
12:32karolherbst: as in: people can rely on it being functional
12:32karolherbst: at this point, Nvidia's and Intel's stacks are the only ones I'd consider somewhat functional
12:33DottorLeo: it's interesting that the author of FluidX3D used OpenCL instead of just CUDA, and not only for the multivendor support. He says that if done right, OpenCL on Nvidia is as good as CUDA
12:33karolherbst: yeah, it is
12:33karolherbst: most code is just bad
12:33karolherbst: and most runtimes are
12:33karolherbst: nvidia's CL impl is really the best so far
12:34karolherbst: but they also have the compiler to back it up
12:34karolherbst: I mean.. there are computationally heavy benchmarks where rusticl outperforms ROCm by 20%
12:34karolherbst: it's a bit disappointing to be honest
12:34DottorLeo: why?
12:35karolherbst: I expected a serious business company like AMD would put more effort into this
12:35penguin42: but then again I've got kernels where ROCm wins for me; so shrug
12:36DottorLeo: Maybe rusticl will be used on some GPU computation farms instead of ROCm; i think it doesn't matter to the final client, only speed and correctness matter :)
12:36karolherbst: penguin42: yeah.. but rusticl isn't optimized at all
12:37DottorLeo: did you try it with blender?
12:37karolherbst: blender dropped CL
12:37karolherbst: so.. it's either CUDA or HIP
12:37karolherbst: there is a HIP on CL implementation, but it's not ready for blender
12:38DottorLeo: yeah, sorry, I meant the HIP implementation
12:38karolherbst: but rusticl also still has huge issues so it's still gonna take a while
12:38DottorLeo: and SYCL from intel?
12:39karolherbst: it's progressing
12:39karolherbst: the issue with SYCL from intel is that they produce invalid spir-v
12:39karolherbst: a lot
12:40DottorLeo: karolherbst: one last thing: when you merged the optional image support, did that also enable it on r600? it was one of the missing things in clover for those cards
12:41karolherbst: ohh, images were supported since day 1
12:41karolherbst: the optional stuff are just more formats
12:42DottorLeo: yes, but when you merge a feature, is it enabled on all the supported platforms that use rusticl? Sorry, i'm trying to understand how it works when you add new stuff to rusticl :)
12:43karolherbst: yeah
12:43karolherbst: sometimes there are driver bits to it
12:43karolherbst: but we try to be accurate in the features.txt file
12:43karolherbst: also
12:43karolherbst: https://mesamatrix.net/#RusticlOpenCL
12:43DottorLeo: because @gerddie said on the MR for r600 that image support was missing
12:44karolherbst: doesn't list r600 yet, because it's broken
12:44karolherbst: ehh.. should be fine
12:44karolherbst: there is a bit missing for r600, I just don't have the hardware to test it
12:44penguin42: karolherbst: If you need a test run on r600 I can do that for you, this <--- laptop has one
12:45DottorLeo: i should have an old 5450 (cedar) to test it :D
12:46DottorLeo: @illwieckz has probably all the R600 cards :D
12:46DottorLeo: it's impressive
12:47penguin42: 'AMD Thames [Radeon HD 7550M/7570M/7650M]'
12:47karolherbst: amazing.. Intel's CL stack OOMs my system
12:47penguin42: (Very oddly configured HP Elitebook I found in a 2nd hand shop; nice i7, 8G RAM, Radeon, every interface you can imagine, and a shit 1366x768 display...)
12:48karolherbst: hashcat MD5 test: Intel: Speed.#1.........: 231.4 MH/s (1.53ms) @ Accel:16 Loops:16 Thr:64 Vec:1
12:48karolherbst: rusticl: Speed.#1.........: 2002.3 GH/s (0.00ms) @ Accel:2048 Loops:1024 Thr:32 Vec:1
12:48karolherbst: hashcat benchmarks in the most silly way though
12:48karolherbst: so whatever
12:48karolherbst: penguin42: yeah.. so somebody needs to implement the `get_compute_info` hook
13:02penguin42: tries to get himself past his existing patches first
13:06karolherbst: the key to the compute info stuff is really calculating how many threads can be launched
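For anyone wondering what that driver bit looks like: assuming this refers to gallium's pipe_screen::get_compute_param callback, the missing piece is mostly reporting the block/grid thread limits. A very rough, illustrative sketch; the value below is a placeholder rather than a real r600 number, and the exact interface may differ across Mesa versions:

```c
#include <string.h>
#include <stdint.h>
#include "pipe/p_defines.h"
#include "pipe/p_screen.h"

/* Illustrative only: report the maximum number of threads in a block.
 * A real driver derives this from wavefront size, register usage and
 * shared-memory limits of the hardware. */
static int example_get_compute_param(struct pipe_screen *screen,
                                     enum pipe_shader_ir ir_type,
                                     enum pipe_compute_cap param, void *ret)
{
    (void)screen;
    (void)ir_type;

    switch (param) {
    case PIPE_COMPUTE_CAP_MAX_THREADS_PER_BLOCK: {
        uint64_t threads = 256; /* placeholder, not a real r600 value */
        if (ret)
            memcpy(ret, &threads, sizeof(threads));
        return sizeof(threads);
    }
    default:
        return 0; /* unhandled caps */
    }
}
```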
15:44penguin42: karolherbst: Nora is asking for 'Btw, please add the CL_PLATFORM_HOST_TIMER_RESOLUTION and CL_PLATFORM_HOST_TIMER_RESOLUTION device info queries in api/device.rs' - aren't those platform.rs queries?
15:50karolherbst: yeah looks like CL_PLATFORM_HOST_TIMER_RESOLUTION is indeed a platform query
15:50penguin42: ah yes Nora confirmed that
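For reference, CL_PLATFORM_HOST_TIMER_RESOLUTION is an OpenCL 2.1 platform query that returns the host timer resolution in nanoseconds; on the application side it looks roughly like this:

```c
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_uint n = 0;
    if (clGetPlatformIDs(1, &platform, &n) != CL_SUCCESS || n == 0)
        return 1;

    cl_ulong resolution_ns = 0;
    clGetPlatformInfo(platform, CL_PLATFORM_HOST_TIMER_RESOLUTION,
                      sizeof(resolution_ns), &resolution_ns, NULL);
    printf("host timer resolution: %llu ns\n",
           (unsigned long long)resolution_ns);
    return 0;
}
```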
18:42karolherbst: jenatali: sooo.. I looked a bit into which LLVM passes actually help: EarlyCSEPass gives roughly a 10% cut in spir-v size, MergeFunctions roughly a 50% cut. A 10% cut is great, so enabling EarlyCSE is probably what we should do. However, MergeFunctions can generate function pointers, and the translator only allows us to use it with SPV_INTEL_function_pointers
18:43karolherbst: but a 50% reduction is kinda neat... but a lot of LLVM passes just generate random stuff we can't handle, so maybe it's better to really rely on spirv-opt here instead :/
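As a side note, running just a hand-picked set of passes like this is possible through the LLVM-C new pass manager API; the pipeline string below is only an example, not what rusticl actually does:

```c
#include <llvm-c/Core.h>
#include <llvm-c/Error.h>
#include <llvm-c/TargetMachine.h>
#include <llvm-c/Transforms/PassBuilder.h>

/* Run EarlyCSE and GVN per function, then MergeFunctions on the module.
 * Returns 0 on success.  Illustrative pass selection only. */
static int run_cleanup_passes(LLVMModuleRef mod)
{
    LLVMPassBuilderOptionsRef opts = LLVMCreatePassBuilderOptions();
    LLVMErrorRef err = LLVMRunPasses(mod, "function(early-cse,gvn),mergefunc",
                                     NULL /* no target machine */, opts);
    LLVMDisposePassBuilderOptions(opts);

    if (err) {
        LLVMConsumeError(err);
        return -1;
    }
    return 0;
}
```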
18:44jenatali: I don't know that I see much point in running spirv-opt, vtn is pretty lightweight I feel like
18:45karolherbst: it's more about reducing the size of the spir-v
18:45karolherbst: like.. hashcat generates a 7MB spirv by default
18:45karolherbst: for one hash function
18:46karolherbst: but smaller spirv also means less time spent in clc_parse_spirv, which.. kinda makes up a huge amount of CPU overhead at that size
18:46karolherbst: but smaller spirv also helps with the disk cache and everything
18:47karolherbst: and I also suspect compilation gets quicker the earlier we drop massive amounts of code
18:47karolherbst: but anyway...
18:47karolherbst: would be cool to just be able to use MergeFunctions on the LLVM IR level
18:47karolherbst: but... it generates function pointers :(
18:47karolherbst: sometimes
18:50karolherbst: another problem is that linking spirvs isn't cheap either :/ and even with a single spirv file we kinda have to do it, because... random nonsense
19:21karolherbst: mhh GVN seems to also help a lot, nice
22:05DemiMarie: Is the compilation happening ahead of time or at runtime? If the latter, could LLVM IR be translated directly to NIR, without going through SPIR-V?
22:06karolherbst: no
22:06karolherbst: the point is to use spirv
22:06DemiMarie: Ah
22:06DemiMarie: sorry, I was missing some context