00:40karolherbst: well.. sure, but the point I was making here, I already have shaders with _millions_ of SSA values
00:40karolherbst: literally
00:40alyssa: I mean
00:40alyssa: if you have shaders executing millions of instructions I kinda feel like you're already toast?
00:40karolherbst: it's not millions of instructions
00:40karolherbst: it's just control flow
00:40alyssa: oh
00:40alyssa: meh?
00:40karolherbst: and very nasty one
00:41karolherbst: the thing is.. those shaders usually run if LLVM compiles them to AMD
00:41karolherbst: but on mesa you OOM your system
00:41karolherbst: so any heuristic where we always inline functions based on argument types won't work
00:41karolherbst: because what if that function is called bazillion times?
00:41karolherbst: then we are again toast
00:42karolherbst: some shaders also do switches on type parameters to call into certain functions and other funky bits
00:42karolherbst: like hand rolled function tables
00:43karolherbst: some of the compute kernels are just massive and wild
00:44karolherbst: but if we allow function calls, we can also just duplicate functions with generic arguments and call the variant we actually need
00:44karolherbst: might be better than if-else-ladders resolving generic pointers
00:44karolherbst: but we also kinda want to make use of hardware supporting generic pointers natively
00:45karolherbst: which is the best case and solves a lot of the pain points here
01:13alyssa: sure
01:13alyssa: I suspect they're in the minority, though?
01:13alyssa: I mean, Mali does but it's deeply terrible
01:14alyssa: and honestly i'd be tempted to do 62-bit on mali
01:16alyssa: karolherbst: FWIW, Apple claims that they force inline functions that read stack or constant mem
01:17alyssa: citing "SROA, Buffer preloading"
01:25DemiMarie: karolherbst: is this some sort of numerical algorithm or scientific computing code? If so, this would not surprise me at all.
01:53youmukonpaku1337: hey guys
01:53youmukonpaku1337: so im trying to do a bit of trickery with my mainlined ebook and get usb display
01:53youmukonpaku1337: and it WORKS but its using llvmpipe instead of lima and i get this
01:53youmukonpaku1337: libGL error: failed to load driver: gud
01:53youmukonpaku1337: libGL error: MESA-LOADER: failed to open gud: /usr/lib/dri/gud_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/arm-linux-gnueabihf/dri:\$${ORIGIN}/dri:/usr/lib/dri,
01:53youmukonpaku1337: suffix _dri)
01:53youmukonpaku1337: am i missing something? am using mesa from debian repos
01:54youmukonpaku1337: oh and the way i got GUD is very fucky (compiling out of tree with kernel headers) but it seems fine
01:56youmukonpaku1337: i have the module and theres a drm device at card1
01:59youmukonpaku1337: oh
01:59youmukonpaku1337: right
01:59youmukonpaku1337: i probs need gl4es lol
02:03youmukonpaku1337: es2gears also works but uhh
02:03youmukonpaku1337: same err
02:17kode54: I have to test a regression in ANV since 23.1.6
02:18kode54: It renders that game, The Spirit and The Mouse, into a colorful and flickery mess
05:53airlied:fails to get a gitlab container to run deqp tests locally, the docs don't seem to be up to date, or just don't tell you how to run deqp/piglit tests against a build
05:54airlied: or at least the docs explain builds, but not how to test already built artifacts
06:10airlied: okay hacked it around, and now the tests don't hit the assert in my container they hit in the CI one
06:12youmukonpaku1337: huh lima DOES have desktop gl
06:12youmukonpaku1337: why is mesa looking for a gud.so though
06:12youmukonpaku1337: *gud_dri.so
06:13youmukonpaku1337: what did i mess up lmao
06:13Sachiel: what the hell is gud?
06:13youmukonpaku1337: generic usb display
06:13youmukonpaku1337: essentially a way to get display output with a pi turned into a usb gadget
06:14youmukonpaku1337: it *works* (kinda) but mesa freaks out and spits this: libGL error: failed to load driver: gud
06:14youmukonpaku1337: libGL error: MESA-LOADER: failed to open gud: /usr/lib/dri/gud_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/arm-linux-gnueabihf/dri:\$${ORIGIN}/dri:/usr/lib/dri,
06:14Sachiel: oh, if that's using a specific kernel driver, then something that doesn't recognize it might be trying to find a userspace driver matching the name, thus the failed search for gud_dri.so
06:14youmukonpaku1337: suffix _dri)
06:14youmukonpaku1337: ah
06:15youmukonpaku1337: any way to make it not do that?
06:15youmukonpaku1337: but yea theres no userspace driver
06:15Sachiel: try MESA_LOADER_DRIVER_OVERRIDE=whateveryouexpecttowork
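A minimal sketch of Sachiel's suggestion, assuming the goal is to launch a single program with the override set rather than exporting it globally. The helper name `env_with_driver_override` is hypothetical, and `lima` is only the driver the conversation hopes to end up on; whether that override actually works with this display device is not established here.

```python
import os


def env_with_driver_override(driver, base_env=None):
    """Return a copy of the environment with Mesa's loader override set.

    The current process environment is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MESA_LOADER_DRIVER_OVERRIDE"] = driver
    return env


# Build an environment for a child process; "lima" is an example value.
env = env_with_driver_override("lima", base_env={"PATH": "/usr/bin"})
print(env["MESA_LOADER_DRIVER_OVERRIDE"])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run`, for example when starting `es2gears`.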
06:15youmukonpaku1337: oh true
06:15youmukonpaku1337: also for some reason even with an override es2gears and glxgears run at 15fps
06:15youmukonpaku1337: ;-;
06:16youmukonpaku1337: i doubt the mali 400 is *that* bad
06:16youmukonpaku1337: maybe i should test with wayland instead of X
06:19youmukonpaku1337: anyway i guess ill test once im home lol
06:19youmukonpaku1337: still kinda cool that im able to get any display output at all on an ebook
06:20youmukonpaku1337: using wifi pins for usb lol
06:20youmukonpaku1337: https://youmu.i-am-in-your.systems/EzbdBigmCqRj
06:21youmukonpaku1337: i changed out the about-to-short usb port for a breakout board now but its still about as cursed
06:22youmukonpaku1337: also had to compile GUD out of tree because it isnt enabled in linux-image-armmp :(
06:42kode54: cool
06:42kode54: I found the bad commit, or commits
06:42kode54: it outright crashes on them
06:42kode54: I'm building a full debug build now to produce proper backtraces
06:43kode54: the thing I hate about debug builds of mesa is that this full build results in about a 2GB install footprint
06:43kode54: most of which is the debugging symbols package
06:43kode54: it also takes upwards of 10-15 minutes for the strip/objcopy process that pulls the debug data off the binaries and stuffs it into a debug package
06:44kode54: the default mesa-tkg-git package config and script, the PKGBUILD hardcodes b_ndebug=true, and the config file defaults to --strip --buildtype release
07:00kode54: crap
07:00kode54: doesn't crash in debug build
07:00kode54: but it does have the rendering bugs
07:48karolherbst: DemiMarie: ray tracer
07:49karolherbst: alyssa: I'm sure we could do it for _most_ functions, but we have to be mindful about how we do it all. There will be situations where we can't simply inline certain types of functions, because it would blow up the kernel. If inlining works for 99% of the applications, good, but we need a fallback for the 1%
07:56pq: youmukonpaku1337, I don't think that USB display drivers (that is, *not* USB-C DP alt mode) would support hardware rendered content (dmabuf), which is the reason why you'd get software rendering on GUD. There could be hardware+display specific exceptions, but I don't know about those.
07:57pq: youmukonpaku1337, a Wayland compositor could implement hardware rendering and then do a CPU copy into GUD's buffers, but I don't know if anyone implemented that.
07:59pq: oh right, Mutter does at least
08:02pq: youmukonpaku1337, a USB display driver is probably always going to shovel pixels with the CPU, so that will always hurt.
09:07youmukonpaku1337: pq: oh yea i see (btw is there a way to use mutter alone without gnome)
09:08pq: umm... mutter can run without gnome-shell, but I'm not sure how useful that is
09:08pq: other than testing
09:10youmukonpaku1337: pq: though i probs gotta test weston too, might work
09:10youmukonpaku1337: ~~as long as it doesnt use waaaay too much ram~~
09:10pq: who knows, maybe you could configure even Xorg to render with lima and copy to GUD...
09:11pq: I don't remember Weston having such copy you'd need, but it has had some multi-DRM-device patches I haven't really looked into what they do.
09:12pq: for Xorg, if you can get it to recognize both rendering and GUD devices, playing with xrandr --setprovideroutputsource / --setprovideroffloadsink
09:13pq: ..might do something maybe
09:33turol_: is it possible for non-developers to get rights to add tags to issues?
09:33turol_: labels, whatever gitlab calls them
10:16DavidHeidelberg[m]: to everyone who is manually running pipelines to test their MR: we currently have too many rootfs images hitting the caches, please always rebase before running a pipeline, if you can.
11:07alyssa: DavidHeidelberg[m]: rebase on upstream to pick up the latest image?
11:09turol_: alyssa: the nir if condition change also seems to apply to loops
11:09turol_: that caused a regression
11:09turol_: was it intended to apply to loops?
11:10turol_: issue 9750 if you want more details
11:12alyssa: uh oh
11:13turol_: it triggered unrolling of a loop that previously wasn't
11:13alyssa: turol_: What's the regression?
11:13alyssa: Being able to unroll more loops is a good thing..
11:13turol_: causing increased register pressure and lowered subgroups per SIMD
11:13turol_: not when there's a texture read inside the loop
11:14alyssa: ok, but that's a deficiency in the loop unrolling heuristic then (deciding to unroll loops when it's not beneficial)
11:15alyssa: not the fault of last night's patch
11:15alyssa: and also, unrolling loops with a texture read inside may *still* be a win in practice?
11:15alyssa: you get lower occupancy, but you get more ILP to hide the latency, and might come out ahead despite the pipeline stats
11:16DavidHeidelberg[m]: alyssa: if you have an older MR whose .gitlab-ci/image-tags.yml was produced a long time ago, the CI (until assigned to Marge) will use the old images
11:16DavidHeidelberg[m]: and the old images aren't usually that much cached
11:16alyssa: DavidHeidelberg[m]: +1, got it
11:16alyssa: thx
11:17alyssa: turol_: see the discussion in https://gitlab.freedesktop.org/mesa/mesa/-/issues/7161
11:17turol_: just tried, 142 fps unrolled, 146 fps not unrolled
11:17alyssa: OK. That's a more interesting statistic then
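To put turol_'s two numbers on a common scale, the frame rates can be converted to frame times. This is only arithmetic on the figures quoted in the log (142 fps unrolled, 146 fps not unrolled); it says nothing about measurement noise or about how much of the frame this one shader accounts for.

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


unrolled = frame_time_ms(142)      # loop unrolled
not_unrolled = frame_time_ms(146)  # loop left as-is

delta_ms = unrolled - not_unrolled
slowdown_pct = 100.0 * delta_ms / not_unrolled

print(f"{unrolled:.2f} ms vs {not_unrolled:.2f} ms "
      f"(+{delta_ms:.2f} ms, {slowdown_pct:.1f}% slower)")
# prints: 7.04 ms vs 6.85 ms (+0.19 ms, 2.8% slower)
```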
11:17turol_: it's the slowest shader of SMAA
11:17turol_: the others are pretty simple
11:18turol_: it's actually a little bit infamous for causing issues in both spirv-tools and spirv-cross
11:18pendingchaos: I'm not sure this particular form of loop unrolling is a good idea (it's that weird nested if form) since it usually doesn't overlap iterations
11:18pendingchaos: but we would need LICM/GCM to fully replace it
11:18pendingchaos: it's not very beneficial for this particular shader (just doing LICM for a descriptor load)
11:21pendingchaos: (complex_unroll() in nir_opt_loop_unroll.c, I think)
11:27turol_: on nvidia proprietary driver unrolling or not affects the binary size but not register count
11:27turol_: fps seems identical
11:27turol_: don't have other amds to easily test
11:28turol_: does someone have instructions for setting up a chroot/vm for compiling mesa for the steam deck?
11:28pendingchaos: you can use RADV_FORCE_FAMILY and Fossilize to look at how shaders compile for other gpus
11:29turol_: but that doesn't let me test the fps
11:29alyssa: pendingchaos: will NIR ever grow a dedicated LICM? or is that purely part of nir_opt_gcm?
11:30alyssa: (Every time I look at opt_gcm, it blows up my reg pressure and slows things down)
11:30pendingchaos: no idea
11:31turol_: and like i mentioned in the issue while i can fix this shader for myself that doesn't help everyone else who's used it in their proprietary game
11:31turol_: on the other hand in more complicated render it's proportionately less important
11:31pendingchaos: maybe nir_opt_gcm can be modified so that it can only do LICM
11:32alyssa: pendingchaos: fair
11:32alyssa: the other case that comes up is duplicated stuff on both sides of an if
11:33alyssa: another case that's not nearly as problematic of gcm's usual thing of "move EVERYTHING!!"
11:33alyssa: but opt_gcm seems like a blunt hammer, idk
12:45pq: swick[m], what property setting ioctls did you refer to in the email?
12:50swick[m]: pq: DRM_IOCTL_MODE_SETPROPERTY, etc
12:50pq: why would you use those?
12:51swick[m]: to set the property of a connector?
12:51swick[m]: it's all hidden in libdrm
12:51pq: no, that's atomic commit ioctl
12:51pq: let's see...
12:51swick[m]: mhh, is it?
12:52pq: it wouldn't be atomic, if each property was set with a separate ioctl
12:52swick[m]: oh, you're right...
12:52swick[m]: I mean, it could still be atomic
12:54pq: atomic commit ioctl argument is struct drm_mode_atomic, and it seems to contain the whole lot.
12:55swick[m]: yes, it actually only issues one ioctl
12:55swick[m]: my bad
12:56pq: I thought I missed something :-)
12:56swick[m]: just saying, that's not a requirement for it to be atomic, just like in wayland where we built up state in the compositor and then start using it on a commit message
12:57pq: right, if DRM_IOCTL_MODE_SETPROPERTY staged stuff
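The point swick[m] and pq are circling can be illustrated with a toy model: writes are only staged, and nothing becomes visible until a single commit validates and applies them together. This is a sketch of the idea, not the KMS or Wayland API; the class `StagedState` and its methods are hypothetical. In libdrm the real path builds a request with `drmModeAtomicAddProperty` and submits it in one go with `drmModeAtomicCommit`.

```python
class StagedState:
    """Toy model: property writes are staged, then applied all at once."""

    def __init__(self):
        self.current = {}  # state that is actually in effect
        self.pending = {}  # staged writes, not yet visible

    def set_property(self, obj_id, prop, value):
        # Staging never changes the state in effect.
        self.pending[(obj_id, prop)] = value

    def commit(self, validate=lambda state: True):
        # Validate the combined state first; apply everything or nothing.
        candidate = {**self.current, **self.pending}
        self.pending.clear()  # design choice: a rejected commit drops the staged writes
        if not validate(candidate):
            return False
        self.current = candidate
        return True


state = StagedState()
state.set_property("connector-1", "Colorspace", "BT2020_YCC")
state.set_property("crtc-1", "ACTIVE", 1)
print(state.current)   # still empty: nothing applied yet
print(state.commit())  # both properties take effect together
```

Whether the staging happens in one ioctl argument (as with `struct drm_mode_atomic`) or across several requests is an implementation detail; what makes it atomic is that the commit is all-or-nothing.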
12:58zamundaaa[m]: Please don't reinvent the atomic API in worse
12:58pq: we're not
13:13pq: Is it so that KMS has no way of choosing BT.2100 ICtCp as video stream colorimetry?
13:15pq: not in v6.5 it seems
13:16swick[m]: do sinks support that?
13:17pq: I dunno, but CTA-861 defines it
13:18pq: no hits in linuxhw/EDID, so I guess not
13:18swick[m]: oh, in 861-H
13:18swick[m]: pretty new then
13:19pq: oh, yeah, I'm reading H, and wasn't there I already too?
13:19swick[m]: only YCbCr in 861-G
13:21pq: there is no RGB variant of it, is there?
13:21pq: or you mean BT2020_YCC?
13:22pq: swick[m], t
13:23pq: swick[m], this reminds me, should the new color pipeline UAPI replace the automatic RGB/YCC selection from the start?
13:23swick[m]: yeah, bt 2020 YCC is defined in CTA-861-G already but not ICtCp
13:24swick[m]: it's only for the plane right now, so I don't think so
13:24pq: right, memory is slowly coming back
13:24pq: and it can be added later with "auto"
13:29pq: were the diagrams supposed to appear as rendered images in https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html ? I see source, e.g. Overview.
14:41mareko: llvmpipe-traces times out randomly: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/48524383
14:41zmike: yeah something to do with new infra
14:41zmike: being discussed in #freedesktop
14:43karolherbst: jasuarez: I'm kinda looking into compute stuff for v3d, but I'm running into issues with fences. At least it looks like some aren't signaled and I wonder what's the best approach here to debug it
14:48jasuarez: I never deal with such issues so not sure what's the best approach
14:49jasuarez: I don't remember to have anything special for that
14:54karolherbst: mhh.. maybe I'm doing something incorrectly, but I also don't see the GPU faulting, or at least nothing in dmesg
14:56karolherbst: jasuarez: do you know if all memory (a.k.a. pipe_resources) need to be referenced before work can be launched/waited on or something odd like that? I'm currently not doing this, so maybe I want to figure out how to properly do it in v3d
14:56karolherbst: but then again it's a bit odd to not see any errors
14:59karolherbst: yeah mhh.. doesn't seem to be it either
15:20jasuarez: Pretty sure Iago dealt with them when developing v3dv, but he is not connected now. I could ping him tomorrow
15:20karolherbst: cool
16:15mareko: rustcuda when
16:16karolherbst: I wonder if layering HIP on CL is good enough here, at least that's my hope and that the project in question works out :D
16:16karolherbst: but I also don't know if AMD plans to stay compatible with CUDA forever or not
16:16youmukonpaku1337: amd is compatible with cuda??
16:16youmukonpaku1337: the hell
16:17mareko: youmukonpaku1337: it's called HIP
16:17karolherbst: well.. HIP is basically `s/cu/hip/` + some mistakes or something
16:17youmukonpaku1337: i see
16:17mareko: I'm hearing nobody uses OpenCL
16:18karolherbst: yeah, hearing that a lot from AMD people
16:18youmukonpaku1337: yep thats true
16:18youmukonpaku1337: most stuff uses cuda
16:18karolherbst: yeah, but the reason is, that all the CL stacks were horrible in the past :D
16:18karolherbst: but yeah..
16:19karolherbst: at least there are a couple of companies still invested in CL.. anyway.. I think layering CUDA/HIP/whatever on top of CL or whatever is probably the best strategy here
16:19karolherbst: and such projects already exists
16:19youmukonpaku1337: yep that could work
16:19youmukonpaku1337: ~~the zink of opencl~~
16:19karolherbst: HIP on CL on zink on....
16:20mareko: .. glide
16:20zmike: nope shut it down
16:20karolherbst: *layers weren't supposed to be layered on top of layers*
16:20mareko: or zink on r600
16:20zmike: don't encourage them
16:20karolherbst: uhhh
16:21karolherbst: anyway... all those HIP on CL layers require insane extensions
16:21karolherbst: e.g. SVM
16:21karolherbst: :')
16:21youmukonpaku1337: anyway unrelated but why the hell does an e reader need a dedicated video encoder/decoder chip on the soc LOL
16:21karolherbst: mhh
16:21karolherbst: copyrighted embedded videos?
16:21youmukonpaku1337: am not complaining but its kind of funny
16:21youmukonpaku1337: nope
16:21karolherbst: (with DR)
16:22karolherbst: *DR
16:22youmukonpaku1337: the reader never plays any videos of sort
16:22karolherbst: ... my M is stuck
16:22karolherbst: huh.. weird
16:22karolherbst: adds?
16:22karolherbst: :D
16:22karolherbst: ehh
16:22karolherbst: ads
16:22youmukonpaku1337: its just the allwinner a13 has a cedar VPU and they couldnt be bothered to get another soc lol
16:23karolherbst: uhh
16:23youmukonpaku1337: ads are impossible, this device doesnt have wifi (by default at least)
16:23youmukonpaku1337: HOWEVER
16:23youmukonpaku1337: you can uh
16:23youmukonpaku1337: do something so utterly cursed
16:23karolherbst: like using the sound card?
16:23youmukonpaku1337: that its just plain insane
16:23youmukonpaku1337: karolherbst: no sound card to speak of
16:23mareko: lavacuda would be interesting, zink can help I'm sure
16:24karolherbst: sooo... there is this cuda driver API we could potentially implement
16:24karolherbst: which is libcuda.so
16:24karolherbst: but I have no idea how painful that would be
16:24youmukonpaku1337: karolherbst: check out this monstrosity i made https://youmu.i-am-in-your.systems/MtWKwDaqmTye https://youmu.i-am-in-your.systems/fnpkpGMuwCpB
16:25youmukonpaku1337: so technically this system doesnt have wifi *right*
16:25youmukonpaku1337: BUT the usb interface used for it works and is very easy to use
16:25youmukonpaku1337: so its like a nice usb interface
16:26youmukonpaku1337: i should probs boost the 3.3v it provides to 5v
16:26youmukonpaku1337: instead of external power
16:26karolherbst: cursed
16:26youmukonpaku1337: very
16:26youmukonpaku1337: but it works (horribly)
16:26youmukonpaku1337: i love compiling wifi drivers for an hour
16:26youmukonpaku1337: best pastime
16:27youmukonpaku1337: thank god i didnt have to cross comp mesa and lima is included in stock debian mesa lol
16:30youmukonpaku1337: i just hope that i can get desktop gl lima to work
16:30youmukonpaku1337: because if so this makes this actually workable instead of pure hell
16:50gildekel: @pq @emersion Hi! I am currently going through review with Intel on a series that suggests a fix around complete link-training failures, in which in these cases, the effective bandwidth of a connector is set to 0Gbps, which will cause all its modes to be pruned in the next probing. The risk here is introducing a change that userspaces are not expecting. The intuition suggests that connectors without modes should be ignored...
16:50gildekel: The series is here: https://patchwork.freedesktop.org/series/122850/
16:50gildekel: I would love to get your input as weston/sway maintainers (hope I got it right)
16:52gildekel: And, needless to say, anyone else here who feel like this change is relevant to their product stability
17:06zamundaaa[m]: For KWin connectors with zero modes would be fine; this already happened in the past (don't remember in what circumstances though) so we have a workaround in place
17:12gildekel: That's good. The approach here is that upon link-training failure, userspace will get a uevent in which it will see the failed connector is "sterile", so ignoring it, or marking it in a bad state is the goal. At least that's what we would like to see in ChromeOS.
19:06airlied: karolherbst: ptx parser in rust?
19:07HdkR: Don't even need to parse PTX, plenty of Switch emulators proved you can just take the raw ISA and translate it :P
19:07karolherbst: airlied: why not tho...
19:07airlied: HdkR: that assumes you have raw isa though
19:08karolherbst: I just think it makes more sense to have an open ecosystem besides CUDA, so opting in into supporting CUDA is kinda a double edge one here
19:08airlied: but yeah sass to nir translator
19:08karolherbst: HdkR: right... we could even pattern match common lowering, it shouldn't be all too hard
19:08karolherbst: for compute almost none of this exists anyway
19:09karolherbst: though cuda supports texgrad and other evil things :')
19:10airlied: also does nvidia have a documented calling convention to their sass kernels?
19:12karolherbst: uhm... no idea
19:12karolherbst: airlied: you mean sass kernels as in normal compute shaders?
19:13karolherbst: the elf binaries usually document all the constant buffers
19:13karolherbst: but not sure how flexible they are with that
19:13karolherbst: but there doesn't really exist any kind of calling convention here besides some internal data passed in via const buffers
19:13karolherbst: at fixed locations
19:16airlied: dont they have libs you link against, or is it just pre made kernels?
19:17karolherbst: they have some internal binaries and I'm sure there is some kind of calling convention for those, but I don't actually know what they are doing there
19:18karolherbst: anyway, nvidia does not want you to target SASS
19:18karolherbst: so I doubt they document anything
19:20karolherbst: and I'm sure I won't even be allowed to help out writing a SASS parser....
19:20karolherbst: or at least that might bring me in a icky legal situation
19:56jtatz[m]: cuobjdump can dump SASS, and the ISA is somewhat documented https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#instruction-set-reference
19:59jtatz[m]: Also for JIT kernels you can use CUPTI to grab it at runtime
20:06alyssa: considerably more docs than I was expecting, neat
20:08youmukonpaku1337: okay this is not working out
20:08youmukonpaku1337: cant change even resolution
20:08youmukonpaku1337: no gles2 in es2info with lima
20:08youmukonpaku1337: no desktop gl either
20:08youmukonpaku1337: and X is buggy as all hell
20:09youmukonpaku1337: and gud + lima arent in xrandr providers
20:09DemiMarie: youmukonpaku1337: try Wayland
20:10youmukonpaku1337: weston literally locks up the system
20:10youmukonpaku1337: so uh
20:10youmukonpaku1337: anything else
20:14youmukonpaku1337: DemiMarie: can i run mutter separate from gnome? it seems to support this kind of trickery
20:16karolherbst: "somewhat documented" :D
20:16karolherbst: yeah...
20:18DemiMarie: youmukonpaku1337: Try Sway or KWin.
20:18youmukonpaku1337: kwin... definitely no
20:18youmukonpaku1337: am gonna try mutter first because pq mentioned it having support for trickery like what im doing
20:19DemiMarie: And report a kernel bug, because Weston should not lock up the system in a way that killing it cannot correct.
20:19youmukonpaku1337: i mean, system is more or less alive but it crashes GUD it seems
20:20DemiMarie: GUD?
20:25youmukonpaku1337: Generic USB Display
20:40DemiMarie: Ah
20:40DemiMarie: Probably a kernel bug; I would report it to the relevant mailing lists.
20:42youmukonpaku1337: hmm
20:42youmukonpaku1337: how can i run mutter with PRIME
20:49youmukon1: youch
20:49youmukon1: 4fps under mutter with es2gears wayland
20:49youmukon1: also permission denied with kmscube
20:50youmukon1: oh nvm
20:51youmukon1: okay so i can get 2fps in kmscube if i run it on card0 which is software accelerated (and badly at that) GUD
20:52youmukon1: question is how can i use lima to render to card0
20:52youmukon1: if i do mesa loader override to use lima it just throws an invalid modeset argument error
20:54Sachiel: mesa drivers expect to talk to their corresponding kernel driver, not just some random out of tree thing, so if you are having issues with some random out of tree thing, go ask their authors for support. I don't think you'll find much help here with that
20:55youmukon1: GUD is in mainline lol
20:56zmike: sounds like bugs
20:57youmukon1: ehhh
20:57youmukon1: its probably intended and it uses sw accel by default
20:57youmukon1: question is how do i make it offload to lima
21:00airlied: might have to hack gud kmsro, not sure if that would help
21:01youmukon1: what's kmsro
21:02youmukon1: as long as it isn't *too* difficult im fine with a little trickery
21:03airlied: kmsro is mesa internal thing to link accel and display drivers
21:03youmukon1: aha
21:03youmukon1: i see
21:04airlied: i think you might need to write code in mesa, but that is close to the limit of what i know about it
21:04youmukon1: oh fuck
21:04youmukon1: i have almost 0 knowledge of how to write C lol
21:05karolherbst: does setting `DRI_PRIME=1` help or did you already try that?
21:05youmukon1: tried that
21:05youmukon1: nope
21:05youmukon1: also
21:05youmukon1: kmscube shows renderer as mali400
21:05youmukon1: but uhh i somehow doubt that's right
21:06karolherbst: why not?
21:06youmukon1: 2.5 frames per second
21:06karolherbst: there might be a different reason it's so slow
21:06karolherbst: maybe it's CPU overhead
21:07youmukon1: hm
21:07karolherbst: the content of the frames kinda need to be copied over to the display driver
21:07karolherbst: and if there is no accelerated path for that the performance is kinda toast
21:07airlied: also copied over usb
21:07karolherbst: try LIBGL_ALWAYS_SOFTARE=1 and see if that changes anything
21:07youmukon1: thats possible but also es2 info and glxinfo list driver as llvmpipe
21:07youmukon1: oh
21:07youmukon1: will test
21:07youmukon1: in a sec
21:08karolherbst: LIBGL_ALWAYS_SOFTWARE=1 I mean
21:08karolherbst: but kmscube is kinda special
21:08karolherbst: there might be a different way for kmscube to use llvmpipe
21:12youmukon1: karolherbst: libgl always software makes kmscube throw "failed to set mode: invalid argument"
21:13karolherbst: fair
21:13youmukon1: it does show that its using llvmpipe before that
21:13youmukon1: hm
21:14youmukon1: i would go the kmsro route but i have absolutely no idea how to program C so i suppose thats not an option lol
21:16youmukon1: is there a way to test rendering speed headlessly?
21:16karolherbst: I think it's already working as intended
21:16karolherbst: it's just that the kernel driver doesn't provide what we need for proper offloading here
21:16karolherbst: at least that's my working theory
21:16karolherbst: did you check the CPU load?
21:16youmukon1: am about to do that
21:16karolherbst: and where it spends most of the CPU cycles at
21:17karolherbst: or rather what process uses most of the CPU
21:20youmukon1: kmscube was not using much cpu at all
21:20karolherbst: yeah.. so it's indeed not software rendering, or if it is, the bottleneck is something else
21:21youmukon1: and how would i go about finding it i guess?
21:21karolherbst: is your CPU busy nonetheless?
21:21youmukon1: what do you consider busy
21:22youmukon1: around 10% with htop open
21:22karolherbst: mhh, that's not much
21:22karolherbst: like total or is one core at 100%?
21:22youmukon1: theres a single core lol
21:22karolherbst: heh
21:22karolherbst: yeah.. so I guess something with the GUD driver and usb and... other things is why it's slow
21:23daniels: there’s no magic bullet here - the only possible solution (pipelining rendering) absolutely requires knowing C
21:23youmukon1: ugh
21:23daniels: fundamentally, you have a very slow GPU rendering to system memory, then copying back out over USB, and waiting for this to take effect, every frame
21:26youmukonpaku1337: yeah i can see why itd be slow...
21:26youmukonpaku1337: wait
21:26youmukonpaku1337: how can i check usb bandwidth?
21:27youmukonpaku1337: i may have a hunch something went horribly wrong and im running over usb1.1 bandwidth
21:27karolherbst: uhh.. that would be terrible indeed
21:28youmukonpaku1337: very
21:28karolherbst: but would the bandwidth be enough for displaying anything?
21:28glennk: lsusb -t should show the theoretical bandwidth for each port
21:28youmukonpaku1337: karolherbst: for a tty should be lol
21:28glennk: i'm also guessing this platform is stuck with single channel memory too?
21:29youmukonpaku1337: OH
21:29youmukonpaku1337: LMFAO
21:29youmukonpaku1337: it is running at 12mbit bandwidth
21:29youmukonpaku1337: the entire hub
21:29karolherbst: RIP
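A back-of-the-envelope sketch of why a 12 Mbit/s link hurts so much. The 800x600 resolution and 16 bits per pixel are assumptions for illustration only (the log does not state the panel's mode), and the calculation ignores USB protocol overhead as well as any compression or partial-update scheme the display protocol might use, so real numbers can land on either side of this bound.

```python
def max_fps(width, height, bytes_per_pixel, link_mbit_per_s):
    """Upper bound on full-frame updates per second over a raw link."""
    frame_bits = width * height * bytes_per_pixel * 8
    return link_mbit_per_s * 1_000_000 / frame_bits


# Assumed mode: 800x600 at 2 bytes per pixel (hypothetical values).
for name, mbit in (("full speed", 12), ("high speed", 480)):
    print(f"{name}: {max_fps(800, 600, 2, mbit):.1f} fps upper bound")
# prints:
# full speed: 1.6 fps upper bound
# high speed: 62.5 fps upper bound
```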
21:29youmukonpaku1337: ugh
21:29youmukonpaku1337: guess im gonna have to somehow use the main port
21:29glennk: all pixels, line up in single file...
21:30youmukonpaku1337: glennk: i mean i doubt it would have dual channel 256mb ram lol
21:30youmukonpaku1337: and no its single channel
21:30youmukonpaku1337: but yea
21:30glennk: gpu + cpu + usb memory access
21:30youmukonpaku1337: i think i found the problem lmao
21:31youmukonpaku1337: lemme try uh
21:31youmukonpaku1337: the main micro b port
21:31youmukonpaku1337: it didnt work before but you never know
21:34youmukonpaku1337: oh GREAT
21:34youmukonpaku1337: guess we're back to the roots of this
21:34youmukonpaku1337: ;-;
21:35youmukonpaku1337: how can i check mode of a usb port?
21:39youmukonpaku1337: i do have dr_mode set to host in DT but i dont think it works lol
21:40glennk: cat /sys/bus/usb/devices/<device>/speed and version is one way
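glennk's sysfs hint can be scripted to list every device at once. This is a minimal sketch, assuming the usual Linux layout where each USB device directory under `/sys/bus/usb/devices` has a `speed` attribute reporting Mbit/s; the function name `usb_speeds` is hypothetical. Entries without a `speed` file (interface nodes, for example) are skipped.

```python
from pathlib import Path


def usb_speeds(root="/sys/bus/usb/devices"):
    """Map each USB device name to the contents of its sysfs 'speed' file."""
    speeds = {}
    base = Path(root)
    if not base.is_dir():
        return speeds  # not Linux, or sysfs not mounted
    for dev in sorted(base.iterdir()):
        speed_file = dev / "speed"
        if speed_file.is_file():
            try:
                speeds[dev.name] = speed_file.read_text().strip()
            except OSError:
                continue  # device vanished or attribute unreadable
    return speeds


for name, speed in usb_speeds().items():
    print(f"{name}: {speed} Mbit/s")
```

A value of 12 corresponds to full speed and 480 to high speed. Note that this reports link speed only, not whether a port is in host or peripheral mode.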
21:41youmukonpaku1337: mode as in peripheral or host
21:41youmukonpaku1337: not speed
21:47youmukonpaku1337: hm
21:47youmukonpaku1337: it should be host
21:47youmukonpaku1337: this is quite weird
21:59alyssa: karolherbst: hmm, this is a spicy problem
21:59alyssa: I /really/ want read-only buffers to be read with load_global_constant, not load_global
21:59alyssa: Is that even a thing in CL? I guess _constant?
22:00alyssa: doesn't work with generic ptrs, though
22:03alyssa: ok I guess I can just not use generic ptrs, fine
22:09alyssa: OK, yeah, using constant does what I need. Cool
22:09alyssa: thanks
22:10karolherbst: alyssa: yeah.. I want to use ubos for those things in the near future
22:10alyssa: I don't =D
22:10alyssa:flexes her agx hardware
22:11karolherbst: ehh...
22:11karolherbst: it should at least use load_global_constant though I think
22:11alyssa: yeah I just wrote that patch
22:11alyssa: https://rosenzweig.io/0001-nir-lower_io-Use-load_global_constant-for-OpenCL.patch
22:11karolherbst: but some hardware benefits from those being actual ubos
22:12karolherbst: yeah... I think I have a patch like that somewhere as well
22:12karolherbst: I suspect you lower ubos to load_global_constant in agx?
22:12alyssa: yes
22:12alyssa: in the gl driver
22:12karolherbst: do you load the descriptor at runtime?
22:13alyssa: there is no descriptor
22:13karolherbst: I mean.. the actual address
22:13alyssa: it's pushed in
22:13karolherbst: mhhh
22:13alyssa: AGX is literally the CL model
22:13karolherbst: right...
22:13alyssa: pass __constant pointers in and read em
22:14karolherbst: yeah, that's fair, I'm just wondering if there is a significant overhead when using ubos and if we want to make that optional
22:14karolherbst: on nvidia it really should be an ubo e.g.
22:14karolherbst: I probably don't even need a new cap for, if the constant buffer size is below 1M it's probably a hardware thing :D
22:15karolherbst: `PIPE_CAP_MAX_SHADER_BUFFER_SIZE_UINT` is what I'm currently reporting as constant memory size
22:17karolherbst: alyssa: the thing on nvidia is, that you can literally use ubos as sources for alu instructions
22:17karolherbst: and they are as fast as gprs
22:17karolherbst: but I know that some drivers really just map them to raw memory
22:18karolherbst: so I guess this situation all warrants a flag (which st/mesa might also do in the far future if at all)
22:18karolherbst: though I think there are also robustness arguments, so drivers are supposed to bound check them?
22:24alyssa: i can literally use arbitrarily complicated expressions on UBOs as sources for alu instructions as fast as gpr :~)
22:24karolherbst: sounds cursed
22:25alyssa: nir_opt_preamble
22:40karolherbst: right.. but I guess you don't have like 1MB of space there
22:41karolherbst: anyway, you still bind it as a normal buffer
22:42karolherbst: not as some special ubo thing
22:43alyssa: 1KiB, so not quite as big as you ;)
22:44karolherbst: right.. I'm just wondering if it makes sense to add complexity to bind constant buffers as ubos or normal global memory depending on what the driver wants or if it's good enough to always bind them as ubos in the future
22:47alyssa: always UBOs is fine
22:48karolherbst: okay, cool
23:53emersion: gildekel: yeah i think 0 modes is a kernel bug