00:11zmike: DavidHeidelberg[m]: is there any way to see the log for trace jobs? I wanted to look at https://gitlab.freedesktop.org/mesa/mesa/-/jobs/47941379 but there's no info
00:57anholt: zmike: you mean other than the problems html from the URL it prints to look at?
00:57anholt: the python doesn't print useful info, it's all in the generated results.
07:40tnt: Does anyone know what Intel means by Q Pitch?
08:01austriancoder: karolherbst: thanks - vec8/vec16 are gone now \o/
08:28karolherbst: nice
08:28karolherbst: I've updated my patch, because there were a few corner cases, not sure if you tried the latest version
08:28karolherbst: austriancoder: ^^
08:29karolherbst: ohh, I guess you did
08:29karolherbst: but yeah, they are also all gone on my end
08:29austriancoder: karolherbst: pulled your branch about 40 min ago
08:30karolherbst: I kinda hate how it's done as it relies on copy_prop not being smart enough to reverse the things...
08:31karolherbst: kinda have to sit down and rethink the entire process, but if you are e.g. before io lowering, you still have vec8/16 derefs and we can't split them up... or if the vec4 is based on anything else which is still a vec8, copy prop might reverse it
08:32karolherbst: like imagine you have hardware which can do a vec16 load (like if your loads are actually byte based and you can do a 16 byte load, vec1x16)
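The splitting idea discussed above can be sketched in a few lines. This is only an illustration of chopping a wide vector into vec4-sized pieces; `split_vector` and `rejoin` are hypothetical names, not the actual NIR pass, and `rejoin` just mimics the kind of recombination a clever copy propagation could perform.

```python
def split_vector(components, max_width=4):
    """Split a wide vector into chunks of at most max_width components."""
    return [components[i:i + max_width]
            for i in range(0, len(components), max_width)]


def rejoin(chunks):
    """Concatenate chunks back into one wide vector (what the lowering must avoid)."""
    wide = []
    for chunk in chunks:
        wide.extend(chunk)
    return wide


vec16 = list(range(16))        # stand-in for a 16-component value
parts = split_vector(vec16)    # four vec4-sized pieces
vec8_parts = split_vector(list(range(8)))  # two vec4-sized pieces
```

If a later pass performs the equivalent of `rejoin(parts)`, the wide vector is back, which is the fragility karolherbst describes.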
10:10DavidHeidelberg[m]: zmike: it seems the downloaded trace didn't match its checksum, pretty happy to see that the check works sometimes. Otherwise it would run the trace and probably generate an incorrect screenshot due to the damaged trace. But yes - no logging for piglit.
11:11zmike: DavidHeidelberg[m]: not sure exactly what that means?
11:11zmike: why did it crash
11:21DavidHeidelberg[m]: zmike: the downloaded file doesn't match the checksum provided by S3, so the file was (probably) damaged
11:21DavidHeidelberg[m]: this shouldn't usually happen, maybe some corruption of filesystem on runner
11:22DavidHeidelberg[m]: zmike: the "crash" here is the only way to communicate this, because it's not a fail, pass or timeout
11:23zmike: ahh ok
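A minimal sketch of the check described above: hash the downloaded bytes, compare against the expected checksum, and report a separate status on mismatch instead of replaying a damaged trace. The use of MD5 and the status strings are assumptions for illustration; the chat only says the checksum is "provided by S3" and that the result shows up as a "crash".

```python
import hashlib


def classify_download(data: bytes, expected_md5: str) -> str:
    """Return 'ok' if the data matches the expected checksum, else 'crash'.

    'crash' is a hypothetical stand-in for "not pass, fail or timeout".
    """
    actual = hashlib.md5(data).hexdigest()
    if actual != expected_md5.lower():
        return "crash"
    return "ok"


good = b"pretend this is a trace file"
checksum = hashlib.md5(good).hexdigest()
damaged = good[:-1] + b"X"  # simulate corruption on the runner
```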
11:23zmike: also would it be possible to change the output to not have the ' at the end of the problems.html link
11:24DavidHeidelberg[m]: zmike: I have it in not-yet-merged MR :D
11:24DavidHeidelberg[m]: I got pissed by it a few times already :P
11:25DavidHeidelberg[m]:sneaked the commit into 6.4 kernel uprev, but there was some fighting with Cheza boards in the last minute before merge :)
11:26zmike: it's always bothered me
11:26zmike: but not enough to do anything about it
11:36zmike: DavidHeidelberg[m]: also have you had time to look at that blender trace?
11:44DavidHeidelberg[m]: zmike: would you mind making the trace performance-testing compatible (3x2 frames)? I checked, it works for me, but crashes iris when I try to replay it in perf mode
11:44DavidHeidelberg[m]: if I dropped last frame (which probably does some cleaning) I think it would work
11:44DavidHeidelberg[m]: s/3x2/initial frame + 3x2)
11:45zmike: so...7 frames?
11:45DavidHeidelberg[m]: yup
11:46DavidHeidelberg[m]: previous Blender traces didn't work reliably in performance testing, but maybe recent Blender builds will be better
11:46zmike: ok updated
11:48DavidHeidelberg[m]: nice, looks good. loop=1500, no extra memory consumption, 33 fps on Intel.. so far so good
11:49DavidHeidelberg[m]: hehe, 150 runs 60 fps.. I think I need to improve my laptop cooling :D
13:50karolherbst: gfxstrand: I got told, that for Vulkan SPIR-V it's technically valid to use vec8 and vec16? I was kinda under the impression it's all vec4 at most.
14:00zmike: I'm not sure the first part of that is accurate
14:00zmike: vec16 is only legal with the Kernel cap, and that cap is not legal in vulkan afaik
14:01zmike: vec8 is a bit more nebulous, but I imagine if you try to use a vec8 somewhere then vvl will tell you why you can't
16:07karolherbst: nah, it's actually in the spir-v spec, it's just a bit hidden :D
16:07karolherbst: "Vector types must be parameterized only with 2, 3, or 4 components, plus any additional sizes enabled by capabilities."
16:07zmike: yes
16:07zmike: that's not hidden
16:07zmike: and Vector16 requires the Kernel cap
16:08zmike: Vector8 doesn't seem to exist in the base spirv spec
16:08karolherbst: well.. you won't find it if you search for "OpTypeVector" :D
16:08karolherbst: well..
16:08zmike: sounds like someone needs to improve their spec-fu
16:08karolherbst: Vector16: Uses OpTypeVector to declare 8 component or 16 component vectors.
16:08zmike: mm fair enough
16:08zmike: still not useful
16:08karolherbst: yeah..
16:08karolherbst: it's kernel only :)
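The rule quoted above can be written as a small check. This is a sketch built only from what the chat states: base sizes 2, 3 and 4, with 8 and 16 enabled by `Vector16`, which in turn needs `Kernel`. The table and function names are hypothetical, and other capabilities from the SPIR-V spec are deliberately left out.

```python
BASE_SIZES = {2, 3, 4}
# Extra vector sizes unlocked by a capability, per the quoted spec text.
EXTRA_SIZES = {"Vector16": {8, 16}}
# Capability dependencies mentioned in the chat.
REQUIRES = {"Vector16": "Kernel"}


def vector_size_allowed(n, capabilities=frozenset()):
    """Return True if an n-component vector is legal with the given capabilities."""
    allowed = set(BASE_SIZES)
    for cap in capabilities:
        needed = REQUIRES.get(cap)
        if needed is not None and needed not in capabilities:
            continue  # capability declared without its prerequisite
        allowed |= EXTRA_SIZES.get(cap, set())
    return n in allowed
```

With no capabilities, `vector_size_allowed(8)` is false, which matches zmike's point that Vulkan SPIR-V (no `Kernel`) stays at vec4.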
16:11alyssa: DavidHeidelberg[m]: panfrost-g52-vk job seems slow
16:11alyssa: given that we've sunsetted the g52 vk experiment (and would remove the code from tree if not for $politics), it really shouldn't be a premerge job imho
16:11alyssa: (It has 0 users and, unless I'm very mistaken, is not being developed.)
16:12zmike: careful, those sound like the words of a panfrost developer
16:12DavidHeidelberg[m]: Give me numbers or links:) i'll try to look at it today :)
16:12alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/47991585
16:13alyssa: I don't understand why the job is running in pre-merge at all, the driver is not shipped and not intended to be shipped, it's served its purpose
16:13zmike: karolherbst: there's a few more in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24839 that seem like they could easily be merged
16:13DavidHeidelberg[m]: 20min is still +- in 15min range
16:13karolherbst: probably
16:13alyssa: DavidHeidelberg[m]: Um, wasn't it a 10 minute limit?
16:13alyssa: Since when was 20 minutes ok?
16:13DavidHeidelberg[m]: I guess because it's supported HW?
16:14DavidHeidelberg[m]: Nah, we have 15min, but some jobs are ranging 10 - 20
16:14alyssa: Ok, it should be 10 minutes
16:14alyssa: Also, the job is super flaky because panvk on g52 is broken
16:14karolherbst: zmike: you are free to rb any of those patches and I might extract more of them
16:14alyssa: and again, nobody is going to be fixing it because it's not a developed project
16:14DavidHeidelberg[m]: Hmmm..... and now back to reality ...
16:14DavidHeidelberg[m]: Well, then we should remove it from mesa if it has no users
16:15zmike: karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24839#note_2054681 ?
16:15alyssa: DavidHeidelberg[m]: Yes, at least when I was still at Collabora, that was in the cards
16:15alyssa: It's being kept in tree because deleting it would cause $problems, but the driver as upstreamed is unused and should not be in CI
16:16karolherbst: zmike: if you think the global_binding one is in order, then yeah, I guess
16:16karolherbst: _but_
16:16karolherbst: I think it's causing issues with anv
16:16DavidHeidelberg[m]: daniels: ^^ Alyssa message
16:16karolherbst: or maybe that was something else
16:16alyssa: and again, 20 minutes is unacceptable for the job
16:17alyssa: that's not 20 minutes due to a retry, that's 20 minutes of actual deqp-runner time
16:17alyssa: if we bumped the expectation from 10 minute limit, that was a mistake.
16:17zmike: I think 10 has always been aspirational
16:17DavidHeidelberg[m]: kk, we'll talk about it tomorrow, anyway in worst case I'll xut it to 15
16:18DavidHeidelberg[m]: *cut
16:18alyssa: so now that job has flaked after 20 minutes (because panvk is known broken) and is now being retried
16:18alyssa: so that will be 45 minutes for a job that is providing no value
16:18alyssa: also, for some reason t860-traces is running too?
16:19alyssa: since when is that not manual?
16:19alyssa: does marge run manual jobs now?
16:19alyssa: I see it listed as panfrost-midgard-manual-rules but it was triggered in https://gitlab.freedesktop.org/mesa/mesa/-/jobs/47991577
16:21alyssa: frankly i don't know why we have these jobs at all, they're not providing value
16:22alyssa: but zmike is right, i'm sounding like a panfrost developer
16:23alyssa: I thought we had a simple rule, jobs go into premerge if we expect them to provide significant value, to take 10 minutes or less of execution time not including setup/teardown overhead, and to be robust against flakes
16:24alyssa: something like panfrost-g52-vk fails all 3 principles. I don't know why it's there. Adding testing for the sake of adding testing is actively counterproductive.
16:24alyssa: and we have a lot of that.
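The three-part rule alyssa states can be expressed as a simple predicate. This is a sketch, not Mesa CI configuration: the `Job` fields and `premerge_violations` are hypothetical, and the example values for panfrost-g52-vk are just the claims made in the chat (about 20 minutes, flaky, no value).

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    provides_value: bool
    exec_minutes: float  # execution time, excluding setup/teardown
    flaky: bool


def premerge_violations(job, limit_minutes=10.0):
    """List which of the three pre-merge principles a job fails."""
    problems = []
    if not job.provides_value:
        problems.append("no significant value")
    if job.exec_minutes > limit_minutes:
        problems.append("over time limit")
    if job.flaky:
        problems.append("not robust against flakes")
    return problems


g52_vk = Job("panfrost-g52-vk", provides_value=False, exec_minutes=20, flaky=True)
healthy = Job("example-job", provides_value=True, exec_minutes=8, flaky=False)
```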
16:24alyssa: but i should shut up before i get yelled at again for acknowledging the problems that lots of people are thinking
16:24alyssa: so... never mind
16:25karolherbst: could send an MR to disable it and then we merge it (or something)
16:25alyssa: mesa ci is perfect, please don't punish me again like last time.
16:26zmike: karolherbst: I did some other reviews
16:28karolherbst: cool, thanks
16:52DemiMarie: alyssa: what is the problem with panvk?
16:52Lynne: does any hardware offer real 8-component vectors anyway?
16:55jenatali: Fwiw I think the dzn jobs are closer to a 15 minute average, which I was told was fine when deciding what level of fraction to use
16:58gfxstrand: Lynne: Some Mali hardware does (-:
16:58gfxstrand: vec4, f16vec8, and u8vec16 (-:
17:12Lynne: what about 16-component vectors?
17:16DemiMarie: https://github.com/gpuweb/gpuweb/wiki/implementation-status Is there something wrong with GPU drivers on Linux that causes Chrome to disable WebGPU there?
17:17DemiMarie: And if so, is this specifically the Nvidia problem?
17:22robclark: DemiMarie: given vk's lack of guarantees about undefined behavior, do you _really_ want webgpu enabled without separate sandboxes for the usermode driver?
17:23DemiMarie: robclark: what makes desktop Linux different than ChromeOS in this regard, especially with lacros?
17:25DemiMarie: the answer to your question is of course “no”, but desktop Linux being inconsistent with ChromeOS is confusing.
17:26robclark: I guess the main difference w/ CrOS is we know what drivers we ship and are in control of uprev'ing them... I'm also not sure what the status of gpu-process sandbox is w/ chrome/ium on desktop linux
17:27robclark: (but even in the CrOS case I think we need more hardening)
17:29DemiMarie: robclark: If you decided “we don’t support WebGPU with X11 or with non-Mesa drivers” I would support that. It’s the inconsistency that is confusing.
17:31robclark: I don't think x11 really changes much.. as much as the random unknown driver thing.. even with mesa drivers we know what versions we ship and can push out an update... that said, I wasn't involved in this decision wrt. linux vs cros, just speculating
17:31robclark: w/ distro linux, it could be some ancient version of mesa, for ex..
17:34DemiMarie: robclark: _insert rant about LTS distros here_
17:37robclark: ;-)
17:37DemiMarie: Simplest solution for the user-space drivers would be to ship Chromium as a Flatpak and use the up-to-date version of Mesa in the relevant runtime.
17:38DavidHeidelberg[m]: Demi: that will haunt me in my dreams :D
17:38DemiMarie: David Heidelberg: what will?
17:39robclark: webgpu haunts me in my dreams
17:39DavidHeidelberg[m]: but if I get it right, you can give Chromium flatpak some beta runtime with recent Mesa
17:40DemiMarie: the stable ones do not have recent Mesa? That’s a problem.
17:41DemiMarie: Anyway, I’ll stop so that this does not take away people’s time any more than it has.
17:41DavidHeidelberg[m]: I've been told you can somehow use the system ones, just haven't taken notes on how :D
17:41DavidHeidelberg[m]: *system Mesa3D
17:42DavidHeidelberg[m]: GPU-Viewer reports 23.1.4
17:44robclark: The real thing would be to have webgl/webgpu canvas's spin off their own private sandboxed gpu-process.. we don't really want usermode part of gpu stack to need to be a security barrier
17:44robclark: (ofc the sandbox thing itself takes a bit of maintenance and sometimes needs to change with mesa versions)
17:45DemiMarie: Should Mesa have its own built-in sandbox?
17:48robclark: I'm not _entirely_ sure how that could work.. I mean if dri_foo.so is dynamically linked against something that hasn't been allowed then we can't even load mesa in the first place.. but there are plenty others who know the mechanics of deploying the sandbox better than I do
17:50robclark: usually that sort of thing doesn't change _too_ often so it hasn't been enough of a pain point for CrOS to try to come up with something better.. but as they say, patches welcome ;-)
18:06austriancoder: robclark: have you had time to look at the isaspec doc poc !23763 ? If we could define what information should be shown and how, I can spend some time on it
18:13robclark: austriancoder: idk if we could generate table, and then example syntax (which might just be dumping the display str w/ instruction name plugged in??).. I think that would be easier to read.. ie:
18:13robclark: https://usercontent.irccloud-cdn.com/file/NGoTZHS7/image.png
18:15austriancoder: robclark: images .. hmm .. lets see
18:15robclark: (that is from arm arm, fwiw)
19:40apteryx: are there free drivers for the likes of Matrox M9128 GPUs?
19:40apteryx: c.f.: https://video.matrox.com/en/products/graphics-cards/m-series/m9128-lp-pcie-x16
19:43Lynne: airlied: ping on the scaling list PR
19:47gfxstrand: apteryx: Given that that's obviously a server card, it almost certainly works on Linux (they wouldn't be able to sell it otherwise) and probably out-of-the-box.
19:48gfxstrand: apteryx: I wouldn't call it a GPU, though.
19:49gfxstrand: It looks like pretty much just a display card.
19:49airlied: yeah I've no idea, matrox kinda fail
19:49airlied: but a lot of their gpus are now just rebadged other people's gpus
19:49karolherbst: apparently it supports GL 2.0 :D
19:50apteryx: karolherbst: if it works without proprietary binary firmware blobs, that's already better than AMD!
19:50karolherbst: ehhhh....
19:50karolherbst: no
19:50apteryx: (for my needs)
19:50karolherbst: yeah, if all you need is a display then yeah.. probably
19:51gfxstrand: And D3D9!
19:51karolherbst: though I suspect not on linux
19:51karolherbst: but also kinda depends on what GPU that actually is
19:51gfxstrand: But only Windows 7 and earlier because no WDDM2 (-:
19:52karolherbst: it's probably some old AMD gpu
19:52karolherbst: or something
19:52gfxstrand: Could be a GeForce2 or similar
19:52karolherbst: DDR3 128bit ehhhh
19:52karolherbst: *DDR2
19:54apteryx: gfxstrand: vista support is advertized for what it's worth: https://video.matrox.com/en/media/957/download
19:54karolherbst: I kinda hate that they just don't say what it is
19:54apteryx: could still be their own ASIC
19:55karolherbst: mhhh... maybe
19:55airlied: could be a g450 in disguise :-P
19:55karolherbst: maybe it's also just software gl
19:55airlied: probably one of their P series
19:55airlied: which they never supported
19:57apteryx:is giving them a call
19:59karolherbst: would be cursed if they have actual GL and if it is their own ASIC and somebody does write a mesa driver for it
20:06milek7: >High-resolution two DisplayPort monitor support: Support resolutions up to 2560x1600 per output
20:06milek7: high resolution, yeah...
20:09karolherbst: well.. that's a 2012 card
20:09karolherbst: or something
20:09ids1024[m]: When they advertise "Native PCI express x16 performance" and Windows XP support, would that be PCIe... gen 1?
20:10karolherbst: I wonder what they mean by "native PCI express" though there were a bunch of GPUs which just had a PCIe to PCI bridge on the board
20:13apteryx: their 2008 line looks very similar in terms of supported resolutions and memory, and was made of their own ASICs: https://www.techpowerup.com/64033/matrox-introduces-five-new-quadhead-graphics-cards
20:13ids1024[m]: My interpretation of "Native PCI express x16 performance" is that it actually uses 16 PCIe lanes?
20:17apteryx: couldn't get them on the phone
20:23apteryx: otherwise from what year did the mainstream GPUs (AMD, nVIDIA) start requiring signed firmware?
20:30apteryx: seems their latest offering is powered by Intel ARC: https://www.phoronix.com/news/Matrox-Intel-Arc-Graphics
20:33glennk: apteryx, https://vgamuseum.info/images/demiurge/m9128/img0062.jpg looks like one of the parhelia variants
20:33apteryx: and this message suggests their previous GPUs were using AMD ones: https://www.phoronix.com/forums/forum/hardware/graphics-cards/1384974-matrox-announces-luma-graphics-cards-powered-by-intel-arc-graphics?p=1385583#post1385583
20:35apteryx: how hard would it be to make a crappy 2D video card using an FPGA?
20:35glennk: anything newer than that card from matrox is probably rebranded radeons or geforce
20:35karolherbst: apteryx: probably easier to use the CPU for that
20:35karolherbst: (and more power efficient)
20:36glennk: apteryx, https://github.com/Wren6991/PicoDVI
20:38apteryx: karolherbst: I guess that stops being true the minute I'd want to implement video acceleration?
20:38karolherbst: depends
20:38karolherbst: you'd have to compete with modern CPUs or whatever CPU you have with all their SIMD units
20:39karolherbst: it's probably not hard to be smart about all of it and make it power efficient
20:39karolherbst: but these days we also don't really have any 2D APIs
20:39karolherbst: and it's all going through 3D _anyway_
20:39apteryx: oh!
20:39karolherbst: I think nvidia is the only GPU vendor still having a native 2D interface
20:40karolherbst: and it hasn't been updated for 10+ years
20:40karolherbst: so if you want acceleration you have to think about 3D and potentially shaders and....
20:41apteryx: hm, and that raises the bar for entry
20:41karolherbst: at which point it's a hell of a project and you'd have to consider if it's worth spending time on :D
20:41karolherbst: though
20:41karolherbst: with X you still can get 2D
20:41karolherbst: but then you need to write your own X driver
20:41airlied: and it's kinda pointless
20:42airlied: since most modern stuff uses paths that really need a 3d accel path
20:42airlied: not seeing anyone implementing Xrender in hw :-P
20:43karolherbst: I wonder if we could implement Xrender on top of nvidia's 2d stuff :D
20:43karolherbst: I'm sure nvidia has done it
20:43airlied: no I don't think their 2d engine is that featureful
20:44karolherbst: it even has polylines
20:44airlied: that's ancient X core rendering, not X render
20:44karolherbst: ahh, fair enough
20:44karolherbst: so more blending stuff?
20:44airlied: alpha blending and compositing
20:44karolherbst: let's see...
20:44glennk: trapezoids with compositing and masking
20:45karolherbst: yeah, it supports blending
20:45karolherbst: it even has two blend modes, but I never figured out how to actually use it
20:46glennk: i think all those methods are firmware emulation
20:46glennk: shader turtles all the way down
20:46karolherbst: maybe
20:46karolherbst: but also not likely
20:46karolherbst: or maybe it would be
20:46karolherbst: dunno
20:46glennk: silicon validation is pricy
20:47karolherbst: sure, but if you layer it on shaders, why even keep it in hardware?
20:48glennk: backwards compat for old os:es
20:48karolherbst: on newer GPUs?
20:48karolherbst: also.. nothing talks with it directly, it all goes through drivers
20:49karolherbst: it also doesn't invalidate any of the 3D or compute state using that stuff
20:50glennk: host visible state
20:51karolherbst: given that all state generally lives in buffers, that's hard to believe
20:51apteryx: seems one approach is going straight to vulkan: https://www.phoronix.com/news/Libre-RISC-V-February-Designing; would that be usuable for a general purpose video card?
20:52karolherbst: anyway, it makes more sense for it to be their dedicated stuff to fast-path certain operations
20:52karolherbst: generally in memory
20:52airlied: apteryx: yeah those guys don't really know what's going on
20:53apteryx: back to the boring real world: I'm recommended this for a cheap, free software friendly GPU: https://www.phoronix.com/review/asus-50-gpu
20:53apteryx: It seems an AMD RX 580X would also be a fine choice, running the radeon driver
20:53apteryx: according to https://h-node.org/videocards/view/en/2024/Advanced-Micro-Devices--Inc---AMD-ATI--Ellesmere--Radeon-RX-470-480-570-570X-580-580X-/1/1/undef/2017/works_with_3D/undef/video-card-works/undef
20:53karolherbst: Intel burned a lot of money on making their CPU ISA viable for 3D
20:53karolherbst: the conclusion was: don't do it
20:56karolherbst: anyway
20:56karolherbst: are they still doing this RISC-V GPU thing or is that abandoned?
20:56karolherbst: ahh, looks like it's dead
20:57apteryx: this must be keeping Luke's busy: https://redsemiconductor.com/
20:57apteryx: Luke*
21:06agd5f: in the R600 days we actually had a set of shaders and that emulated the old 2D engine. You could actually use the old 2D pm4 packets if you loaded the right state and shaders. not sure if it ever got productized.
21:19Lynne: karolherbst: there's some EU funded project for a custom from scratch GPU using the PPC ISA
21:20airlied: yeah they got distracted into some sort of network accelerator sidetrack as well
21:20Lynne: as for risc-v, it's still young, give it time, right now there are no CPUs out that you can buy with the vector extension
21:21Lynne: though I do feel like the vector ISA may be a bit too flexible/rigid for a GPU
21:22Lynne: they'd have to noop every instruction to set the vector size, and swizzles are afaik not supported
21:22airlied: yeah like doing a risc-v gpu should really just be more around the effort of an open isa than reusing the risc-v isa
21:23airlied: and creating a gpu/compute isa
21:23airlied: that is scalar
21:36Lynne: pretty much all popular RISC ISAs are unsuitable as a GPU ISA base I think, they all use 32-bit instructions which leaves no room for immediates to allow for swizzles
21:38Lynne: x86 may still be the most optimal general purpose ISA to build a GPU ISA around, avx 512 has the right ideas about swizzles via k-registers which most instructions support
21:39Lynne: as long as each wavefront has a decently sized uop cache the decoder footprint wouldn't be larger than a CPU's
21:40karolherbst: no
21:41karolherbst: it's not
21:41karolherbst: the best thing about GPU ISAs is that they are scalar
21:42airlied: yeah scalar with subgroup ops seems to be the winner
21:42karolherbst: you don't need swizzles
21:42karolherbst: so that's a pointless argument against RISC
21:43Lynne: really? what about vectors?
21:43karolherbst: they don't exist
21:43karolherbst: only in memory load/stores
21:43karolherbst: thats all
21:43HdkR: Vectors are a figment of your imagination~ They can't hurt you anymore~
21:43ccr: "there is no spoon."
21:43karolherbst: nvidia is purely scalar since nearly forever
21:43Lynne: huh, I was under the impression GPUs had vector units for 4-component float vectors
21:43karolherbst: I think pre nv50 is vectorized?
21:44karolherbst: Lynne: silly GPUs do
21:44karolherbst: but scalar GPU ISAs are always the winner
21:44karolherbst: because vectorized ISAs are just evidence of the wrong mindset
21:44Lynne: ah, alright, I stand corrected then, RISC-V is a good base, especially with compressed instructions
21:44karolherbst: well
21:44karolherbst: no
21:45karolherbst: the thing is, that most of the "let's use CPU ISAs on GPUs" miss the point on what makes GPUs fast
21:45karolherbst: and it's not the ISA
21:45karolherbst: it's the programming model
21:45karolherbst: it's a moot point, because running CPU code on GPUs is a lost battle
21:46karolherbst: GPU programming model is fast on GPUs, because parallelism is _implicit_
21:46airlied: wish someone would tell that to luxcore :-P
21:46karolherbst: like you run a shader per primitive and only think of each primitive as a scalar program
21:46karolherbst: and the parallelism happens under the hood
21:48karolherbst: in theory you can do great things even with x86 SIMD units, but you won't achieve it if you are using C as your language
21:48karolherbst: because C's programming model maps poorly to GPUs and compilers rely on auto vectorization (which GPUs don't even need)
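The "implicit parallelism" point can be illustrated with a toy model: the program is written for a single element, and something else decides how many copies run. This is a sketch only; `saturate_add` and `dispatch` are hypothetical names, and the plain loop stands in for what real hardware would run in parallel.

```python
def saturate_add(a, b):
    """Scalar 'shader': written for ONE element, with no vectors and no loops."""
    return min(a + b, 255)


def dispatch(kernel, *buffers):
    """Stand-in for the driver/hardware: invoke the scalar kernel once per element.

    A GPU would run these invocations concurrently; the kernel author never
    has to express that.
    """
    return [kernel(*args) for args in zip(*buffers)]


out = dispatch(saturate_add, [10, 200, 250], [5, 100, 3])
```

Nothing in `saturate_add` mentions width or lanes, so making the machine "wider" only changes `dispatch`, not the program.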
21:49Lynne: so what happened to VLIW GPUs?
21:49karolherbst: they are dead
21:49gfxstrand: They're a bad idea
21:49Lynne: they explicitly specify parallelism
21:49Lynne: what replaced them?
21:49karolherbst: scalar ISAs
21:50Lynne: oh, I read implicit as explicit
21:50karolherbst: ahh, yeah
21:52karolherbst: Lynne: anyway, I can recommend reading this series of blog posts: https://pharr.org/matt/blog/2018/04/18/ispc-origins it's not _that_ technical, but it explains the problems pretty well and what's the _actual_ problem
21:52karolherbst: in theory you can also make SIMD ISAs work, they are just pointless and have drawbacks you can't really fix
21:52karolherbst: at least for GPUs
21:59airlied: yeah the auto vectorise story is great
22:01gfxstrand:looks at Intel's ISA
22:01gfxstrand: The "best" SIMD...
22:01Lynne: thanks, that looks very interesting
22:02Lynne: why did they split SPIR-V into kernel and non-kernel mode btw? did it have something to do with opencl and silly vector GPUs?
22:03karolherbst: llvmpipe even uses the masked SIMD instruction thing described later in the series as well
22:03karolherbst: yeah
22:04karolherbst: CL is... weird
22:04karolherbst: so
22:04karolherbst: the main differences are that CL has vec8/vec16 types
22:04karolherbst: and that CL doesn't require a structured CFG
22:04karolherbst: everything else being different are just details
22:04karolherbst: those are the two major things
22:05karolherbst: but you can implement CL just fine on Vulkan SPIR-V
22:05karolherbst: it's just more work
22:06HdkR: gfxstrand: The best SIMD because you have effectively infinite encoding space to fix past mistakes :P
22:07Lynne: I still think vectors have a place in GPUs, if you're the type of maniac who hand-writes assembly :)
22:07karolherbst: no
22:08karolherbst: it just doesn't make sense for GPUs
22:08karolherbst: it makes sense for a couple of instructions, but not for general ALU stuff
22:09karolherbst: the problem is really, what if you can't make use of the full vector hardware? then it's just wasted silicon with no inherent benefit, as everything is already threaded anyway
22:10karolherbst: high end GPUs run like 10k+ threads at once
22:10HdkR: Just scale the scalar GPU hardware wider if you need a wider "SIMD" unit :)
22:10karolherbst: at which point SIMD just doesn't make a difference
22:11karolherbst: the entire GPU is already a huge "SIMD" unit, you just describe what each of those threads do
22:11karolherbst: (divergency is bad on GPUs though for this reason)
22:13Company: so you're saying the maniacs should code stuff with more threads instead of with more SIMD?
22:14airlied: I did wonder if we should just do an enable "some kernel SPIRV" in vulkan, instead of trying to do all the CL stuff
22:14Company: like, one thread for each color channel instead of using SIMD for rgba?
22:14karolherbst: yeah... but...
22:14airlied: find the most interesting bits, though I suspect the CL extended instruction set is probably a lot of it
22:14karolherbst: airlied: though maybe the middle ground is to allow the CL extended instruction set
22:14karolherbst: but...
22:15airlied: karolherbst: yes that + vec8/16 but definitely structured cfg :-P
22:15karolherbst: yeah.. thing is... we can also just handle it in the driver for everybody instead :D
22:15karolherbst: not as if they would do anything different
22:15karolherbst: GPU optimized libclc might be a good argument
22:16karolherbst: but then you can also just write a vk extension adding that instruction set
22:18karolherbst: but yeah... we'll see how it turns out
22:19karolherbst: I really should upstream rusticl on zink and file for official conformance :D
22:19karolherbst: radv is just being a bit annoying with more crashes then anv or lvp
22:19karolherbst: *than
22:19karolherbst: and running on nvidia is kinda slow for whatever reason
22:27alyssa: karolherbst: ruzticl conformance .. i hope that deprecates clvk >:)
22:28karolherbst: heh
22:28karolherbst: I'll just try to be conformant before them
22:28karolherbst: I think I ignored it for long enough
22:28karolherbst: this xdc conformant rusticl on zink?
22:28alyssa: >:)
22:28alyssa: 30 day window, yeah you could make it
22:28karolherbst: I don't have _that_ much time
22:28karolherbst: but also
22:29karolherbst: I think I need to fix like one or two bugs?
22:29karolherbst: I'm mostly there
22:30karolherbst: there are just annoying things to figure out
22:30karolherbst: "Pass 2374 Fails 69 Crashes 11 Timeouts 0" on anv, where 60 fails are CL_FILTER_NEAREST fails, which the conformance test doesn't check at all
22:30alyssa: nice
22:31karolherbst: so 9 fails and 11 crashes out of 2400
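The arithmetic behind that summary, using only the numbers quoted above (the variable names are illustrative):

```python
passes, fails, crashes, timeouts = 2374, 69, 11, 0
# CL_FILTER_NEAREST fails that, per the chat, the conformance test doesn't check.
unchecked_fails = 60

total = passes + fails + crashes + timeouts   # all tests run
remaining_fails = fails - unchecked_fails     # fails that still matter
open_issues = remaining_fails + crashes       # what is left to fix
```

The exact total is 2454, which is the "out of 2400" rounded down.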
22:31alyssa: karolherbst: when you get a chance btw I want to talk CL on M1 >:)
22:31karolherbst: radv is causing more issues.. and I manage to crash Nvidia
22:31karolherbst: alyssa: I should upstream my branch :D
22:31alyssa: we now have ES3.1 class compute/images + working 8-bit/16-bit/64-bit tested against dEQP-VK
22:31karolherbst: yeah...
22:32alyssa: right now the thing I'm most worried about are float controls
22:32karolherbst: I have it all working
22:32alyssa: I have no idea if there's flush-to-zero in hardware, if it's always on, never on, if we can control in the driver..
22:32alyssa: I'm nervous about those contractions tests
22:32karolherbst: ohh, that's fine
22:32alyssa: can't.
22:32alyssa: won't.
22:32alyssa: shouldn't.
22:32karolherbst: you don't need flush-to-zero supported
22:32karolherbst: it's entirely optional
22:33alyssa: the problem is that our libclc build assumes ftz
22:33alyssa: and I don't want to sign up for building a denorm-aware libclc
22:33karolherbst: ohhh...
22:33alyssa: for Mali, I had to enable the hardware ftz to pass the contractions tests
22:33karolherbst: uhhh...
22:33karolherbst: I don't think I had issues with that?
22:33alyssa: I know it's not an opencl requirement but I think it effectively is a rusticl one ..
22:34karolherbst: yeah.. at this point I don't support denorms yet
22:34alyssa: karolherbst: See b261a185508 ("panfrost: Honour flush-to-zero controls on Valhall")
22:35bnieuwenhuizen: dcbaker: what happened to 23.2? I see rc2 happened more than 3 weeks ago. Any issues we can help with?
22:35alyssa: karolherbst: Presumably, you don't see issues because the float_controls_execution_mode is honoured by the underlying drivers
22:35karolherbst: alyssa: mhh.. I can certainly take another look, now that I have working vec8/16 to vec4 lowering
22:35alyssa: karolherbst: what's that got to do with anything?
22:35alyssa: AGX is scalar
22:35alyssa: vec8 stuff gets lowered via alu_width + mem_width
22:36karolherbst: not all of it
22:36karolherbst: uhh
22:36karolherbst: maybe it does now
22:36karolherbst: but you were left with some leftover vec8/16 stuff
22:36karolherbst: I never bothered checking actually
22:37karolherbst: anyway
22:37karolherbst: I have some leftover patches I can submit an MR for
22:37DavidHeidelberg[m]: btw. Intel gles + vk StarWars tellusim engine crashes on recent mesa (but I have nightly build without debug). Btw. it worked on stable mesa
22:37DavidHeidelberg[m]: I'll compile with debug and drop logs
22:38karolherbst: alyssa: though what I really need is a proper agx_get_compute_state_info implementation
22:38karolherbst: specifically that `max_threads` property
22:39kisak: DavidHeidelberg[m]: file an issue report on gitlab if it hasn't been already so that your findings aren't buried in the backscroll.
22:41DavidHeidelberg[m]: kisak: sure sure, I also asked for a more up-to-date version, if there is any from Tellusim (this is 20221109), anyway still it should work, let me compile one beautiful mesa with intel vk and iris :P
22:46alyssa: karolherbst: should be easy
22:46alyssa: In the compiler agx_occupancy_for_register_count gives you the max_threads (grep for it in the src)
22:47karolherbst: nice
22:47alyssa: so just need to add that to agx_shader_info and then you'll get the value as part of the agx_compiled_shader
22:47karolherbst: yeah, that should be good enough
22:47alyssa: can't 1000% guarantee correctness but it should be a good enough approximation
22:47karolherbst: the CTS will run into issues if that value is too high
22:48alyssa: good, I want to hear about those since if I got this wrong, perf will suffer
22:48illwieckz: speaking about rusticl on zink, once terakan works, would it be possible to use rusticl on zink to get opencl on terascale?
22:49illwieckz: or the vulkan version/features supported would not be enough?
22:49karolherbst: uhhh
22:49karolherbst: if it supports bda
22:49illwieckz: what's bda?
22:49karolherbst: real pointers
22:49illwieckz: ah ok
22:49karolherbst: which I doubt terascale could support
22:50karolherbst: not sure
22:50karolherbst: you kinda need an MMU for that and all that
22:50illwieckz: now that you say it I feel like I already asked and we already had this conversation 🤷‍♀️
22:50karolherbst: because with CL you can use arbitrary pointers
22:50karolherbst: probably
22:51illwieckz: the information flows in my brain like if that's not the first time it happened
23:16karolherbst: do gallium drivers expect the frontend to call nir_lower_pack?
23:16karolherbst: at least the glsl linker does it mhhh...
23:18karolherbst: at least zink seems to expect that..
23:56Lynne: airlied: tests passed, but I think someone needs to press the rebase button