06:15songafear: If no one wants to talk to me, I may have trouble sending things out for review, so in the longer run I would prefer to make a small presentation on the web. If XDC is not desired or possible, that could be a net-meeting conference, or just offline slides and/or an offline video presentation; that way it would be possible to steer development onto the right path. I wasn't kidding earlier, I would love to talk with the X.Org crew about
06:15songafear: how to move the systems to the more modern end. Oh yeah, I am mentally stable, have been for some time already; a lot of mental work went into getting there.
09:20hch12907: gentle ping on issue #10185 (dev access to mesa/demos)?
09:22hch12907: btw, I think giving all mesa/mesa developers dev access to every other mesa repo (piglit, demos, ...) would be a better solution overall.
13:02karolherbst: jenatali: ever looked into subgroup support for clon12?
13:21jenatali: karolherbst: not yet
14:00karolherbst: jenatali: sad... I was looking into openvino and apparently disabling subgroup support makes it work :')
14:01jenatali: Oof
14:02jenatali: karolherbst: does CL subgroup support require independent forward progress? Or is that something else?
14:02karolherbst: only for cl_khr_subgroups
14:02karolherbst: however.. intel came up with cl_intel_subgroups which is cl_khr_subgroups without independent forward progress (for pre CL 3.0) + a bunch of additional subgroup ops
14:02karolherbst: they emulate the other ops
14:02karolherbst: so that might be broken as well
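(For context, a minimal hypothetical sketch of what those subgroup ops look like in kernel code; the kernel name and logic are made up, while sub_group_reduce_add and the other sub_group_* built-ins come from cl_khr_subgroups:)

```c
#pragma OPENCL EXTENSION cl_khr_subgroups : enable

/* Hypothetical example: sum `in` across each subgroup and let one
 * invocation per subgroup write the result. Assumes full subgroups
 * and a linear global index; a sketch, not a robust reduction. */
__kernel void sg_sum(__global const float *in, __global float *out)
{
    float v = in[get_global_id(0)];
    float sum = sub_group_reduce_add(v);  /* subgroup-wide reduction */
    if (get_sub_group_local_id() == 0)    /* first lane only */
        out[get_global_id(0) / get_max_sub_group_size()] = sum;
}
```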
14:03jenatali: I see
14:03jenatali: D3D doesn't have IFP guarantees. And I know that WARP can't do it for example
14:03karolherbst: yeah.. you don't need IFP support to advertise subgroups in CL 3.0
14:04karolherbst: `CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS`
14:04jenatali: Any GPU that can run Nanite can do it though
14:04karolherbst: it's only required for cl_khr_subgroups, which is pre-CL 3.0
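(For reference, a minimal sketch of querying that property on a CL 3.0 device; error handling is elided and the helper name is made up:)

```c
#include <CL/cl.h>

/* Sketch: CL 3.0 lets a device expose subgroups while reporting
 * CL_FALSE here; pre-3.0 cl_khr_subgroups implied IFP. */
static cl_bool device_has_subgroup_ifp(cl_device_id dev)
{
    cl_bool ifp = CL_FALSE;
    clGetDeviceInfo(dev, CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS,
                    sizeof(ifp), &ifp, NULL);
    return ifp;
}
```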
14:04jenatali: Oh cool. I need to do a full CL3.0 run and actually flip on that switch
14:04jenatali: It's been a few years
14:04karolherbst: I don't think the CL CTS actually tests it
14:04karolherbst: just the API consistency bits
14:04karolherbst: I think...
14:05karolherbst: dunno :D
14:05karolherbst: nvidia doesn't claim support for subgroups apparently....
14:06karolherbst: mhh maybe opencl.gpuinfo is just weird
14:10karolherbst: mhh.. but on ROCm (which only advertises cl_khr_subgroups) it works... maybe my implementation of subgroups is indeed a bit broken...
14:12karolherbst: it also crashes the GPU with radeonsi... and disabling subgroups also makes it work there.. *sigh*
14:13karolherbst: jenatali: do you know of anything important that uses subgroups? Kinda want to test subgroup support with something else besides multi-layer AI/ML stuff :D
14:13jenatali: 🤷
14:40karolherbst: sad
14:49alyssa: 14:03 jenatali | Any GPU that can run Nanite can do it though
14:49alyssa: cries in m1
14:50alyssa: karolherbst: Is it expected to have to spill for large block sizes?
14:51alyssa: => is it expected that launch_grid might trigger a shader variant for variable block size?
15:02karolherbst: sadly yes
15:02alyssa: ugh
15:03alyssa: thanks
15:03karolherbst: however
15:03karolherbst: well.. not however
15:03karolherbst: but you can pin the block size in CL
15:04karolherbst: reqd_work_group_size
15:05karolherbst: and if that's set in the kernel, it's illegal to launch it with a different local size
15:05karolherbst: alyssa: so you can e.g. compile all variants with specific `reqd_work_group_size` ahead of time and just use those...
15:06karolherbst: could even have them all in the same source file
15:06karolherbst: and they just call into a common function
15:06alyssa: whee.
15:06karolherbst: `__attribute__((reqd_work_group_size(X, Y, Z)))` on the kernel
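(A minimal sketch of that approach; the entry-point names and body are made up. Several kernels pin different block sizes and share one body, so each variant can be compiled ahead of time:)

```c
/* common body shared by all pinned variants */
static void body(__global float *buf)
{
    buf[get_global_id(0)] *= 2.0f;
}

__attribute__((reqd_work_group_size(64, 1, 1)))
__kernel void k_64(__global float *buf) { body(buf); }

__attribute__((reqd_work_group_size(256, 1, 1)))
__kernel void k_256(__global float *buf) { body(buf); }
```

At launch the host picks the variant whose pinned size matches; enqueueing k_64 with any local size other than (64, 1, 1) is illegal per the attribute, which is what lets the compiler specialize each variant up front.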
15:35pinchartl: is anyone working on DP MST support for an ARM-based platform ?
15:39HdkR: pinchartl: Which ARM platform? I'm sure Tegras already support MST
15:41HdkR: The question is very broad
15:41pinchartl: indeed
15:41pinchartl: any platform that would use drm_bridge
15:41pinchartl: so not tegra :-)
15:41pinchartl: I was trying to find prior art and didn't see any in mainline or on the list
15:42pinchartl: as far as I can see, only i915, nouveau and amdgpu have MST support
15:42songafear: So only x86
15:43songafear: Weird, it's a fun thing, that MST
15:43pinchartl: HdkR: I don't see any mention of MST in drivers/gpu/drm/tegra/
15:44songafear: You can easily control the sync of the display mux by creating such control/sync packets in the stream
15:45HdkR: pinchartl: Oh sorry, newer tegra which should just use nouveau bits
15:45HdkR: Tegra the SoC rather than the drm API :)
15:45pinchartl: :-)
15:45songafear: Yeah wow, so Tegra has MST docks
15:46HdkR: Slap a Radeon GPU into an ARM device and we could technically claim that one is an ARM platform as well :P
15:48songafear: But how is the wire split from the shared port?
15:48songafear: To additional separate DP ports, right?
15:49songafear: So it's analogous to stream aggregation
15:50songafear: And the packets flow through the data channel
15:50songafear: So the monitor or TV displays the media stream
15:51songafear: The data is A/V
15:51songafear: Audio and video streams
15:53jenatali: karolherbst: Kernels that came from CL C 1.2 (or are annotated a certain way) need to use a fixed grid size too, just not necessarily statically defined
15:53karolherbst: there is no restriction on the grid size afaik
15:53jenatali: Just that it's fixed
15:53jenatali: You can't have some work groups using different sizes than others. Varying work group size wasn't a thing until CL2
15:53karolherbst: you mean block or grid?
15:54jenatali: Er, block I guess
15:54karolherbst: yeah.. we were talking about the block size
15:54jenatali: Sorry this terminology is foreign to me still
15:54karolherbst: yeah...
15:54karolherbst: doesn't help that every vendor has different names either
15:54karolherbst: I think block/grid is what nvidia uses and where it comes from, but not sure
15:55karolherbst: maybe in the r600 days AMD used the same terms?
15:55jenatali: Anyway, block size has to be uniform for CL1
15:55karolherbst: dunno :)
15:55karolherbst: yes
15:55karolherbst: but it was about the block size specified through clEnqueueNDRangeKernel
15:55karolherbst: or I guess what the runtime picks for a given grid size...
15:56jenatali: Right, you just can't use a different size for the last block
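(A minimal host-side sketch of that constraint; error handling is elided and the helper name is made up. In CL 1.x every work-group has the same size, so the global size is rounded up to a multiple of the block size instead of using a smaller last block, and the kernel has to guard against the tail:)

```c
#include <CL/cl.h>

static cl_int enqueue_uniform_1d(cl_command_queue q, cl_kernel k,
                                 size_t n, size_t local)
{
    /* round the global size up to a multiple of the block size;
     * CL 1.x does not allow a partial last work-group */
    size_t global = ((n + local - 1) / local) * local;
    return clEnqueueNDRangeKernel(q, k, 1, NULL, &global, &local,
                                  0, NULL, NULL);
}
```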
15:57songafear: It actually wasn't that you couldn't do that, it's just that the hw wasn't controlled to run kernels on different CUs without restrictions; they could only run copies of the same kernel, with no sync
15:58jenatali: Which now that I re-read it, isn't what alyssa was asking about :)
15:58karolherbst: :)
15:58alyssa: (:
16:03songafear: There were no synchronization primitives, nor scheduling to control the CUs under various loads; you could, though, vary the block size, in which case two streams run one kernel and five run some other kernel, and they would graduate together, or something like this
16:04songafear: But that's just one developer's unimportant detail, it still works well for modern compute and graphics
16:06songafear: But on all my chips the best I had was CL 2.1
16:08songafear: Those are very good, and CUDA is also very good, single source
17:57illwieckz: could a kind person properly tag my issue here: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10224 😊
18:13anarsoul: illwieckz: I added intel and iris labels
18:44illwieckz: anarsoul, thanks a lot!
21:42robclark: pinchartl: some qc things support MST.. but not sure if anyone is working on that yet
21:44pinchartl: robclark: so someone will need to be the first to interface this with drm_bridge, and everybody is hoping someone else would do the work ? :-)
21:47robclark: that is plausible
21:47pinchartl: it wouldn't be a first
21:48pinchartl: when that happens to me, if I wait long enough and the frustration builds up too much, it usually explodes in a desire to rewrite the whole subsystem
21:49pinchartl: (I'm not very frustrated with DRM/KMS if anyone is wondering ;-))
21:49robclark: tbh I'm not 100% sure why we need a bridge in that case (MST is only with external dp, no physical bridges involved.. but I've managed to avoid the dp code)
21:49pinchartl: I've managed to avoid the MST code so far
21:49robclark: maybe abhinav__ is aware of some plans on the mst side of things
21:50pinchartl: wouldn't it be in theory feasible to have a DSI-to-DP bridge with MST support ?
21:50pinchartl: DSI has virtual channels
21:51pinchartl: (not that I wish anyone would make such hardware)
21:51robclark: idk.. probably.. but I've not seen it
21:54abhinav__: pinchartl robclark we do have plans to add DP MST support, perhaps in the next 1-2 months. I can shed more light on this at that time
21:54pinchartl: abhinav__: nice
21:55pinchartl: well, before saying nice, I should wait to see what it looks like, and whether it's one of those cases where the hardware designers should have stayed in bed on that fateful day
21:55* pinchartl should go to bed
21:56abhinav__: pinchartl I will remember to CC you on the changes for DP MST when we post them
21:56pinchartl: thank you