IRC Logs of #dri-devel on irc.freenode.net for 2023-09-08

00:46 anholt_: zmike: so, I think I've got some hope for dropping the deqp-runner build in the rootfs -- I can cargo deb them during CI pipelines, and then when I tag a release I do a release build pipeline that makes them permanent artifacts..
00:47 anholt_: if I figure out this workflow, it might be interesting to do for deqp.
00:53 airlied:throws llvmpipe functions at marge
01:09 karolherbst: airlied: I suspect doing the same with radeonsi will be way more interesting
01:09 karolherbst: also in terms of performance
01:16 karolherbst: airlied: nice.. the other benchmarks are running with your branch :3
01:16 karolherbst: not sure it renders correctly :D
01:18 karolherbst: and now we need a proper inliner with heuristics and uhhh...
01:25 airlied: yeah a proper inliner is definitely a thing we would want, but it requires a bit more focus
01:25 airlied: I started on radeonsi, but it didn't work in my first effort
01:26 karolherbst: airlied: huh.. it seems like you can't run luxmark with llvmpipe on aarch64
01:28 karolherbst: something something only floats with fpext
01:28 airlied: I'm surprised luxmark works at all on aarch64
01:28 karolherbst: and apparently it gets int inputs
01:28 karolherbst: it does with asahi :D
01:29 airlied: karolherbst: if it crashes throw me a backtrace
01:29 airlied: do you get 128 or 256 wide vectors?
01:29 karolherbst: the llvm is invalid
01:30 airlied: yeah so that's likely some path that is falling off the x86 paths that just needs fixing u
01:30 karolherbst: https://gist.githubusercontent.com/karolherbst/4fc414deda4b0cdbb8173794b59c5dd5/raw/4a1d8a8226b5aef309c1f5253b5f4ff114617e6c/gistfile1.txt
01:31 airlied: ah looks like 128-bit vectors, I wonder can I reproduce on x86
01:31 kode54: Any word on GuC logging for errors in xe.ko?
01:31 kode54: Would be nice to log these reproducible crashes I have
01:32 kode54: So far they mostly only crash the app
01:33 airlied: karolherbst: looks like missing f16c could cause that
01:34 karolherbst: heh..
01:34 karolherbst: sounds plausible at least
01:35 airlied: though could be a backend bug, there does appear to be a bitcast
01:35 airlied: karolherbst: can you get a GALLIVM_DEBUG=ir run?
01:35 airlied: would need to see the nir
01:37 airlied: does apple have 8-wide vector instructions?
01:37 karolherbst: no idea
01:38 karolherbst: Features are "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 bti ecv" btw
01:39 karolherbst: it seems to be armv8.5-a
01:40 HdkR: 8-wide with what size elements?
01:40 HdkR: 32-bit? You won't get that
01:40 HdkR: Apple only supports 128-bit wide
01:41 karolherbst: airlied: https://gist.githubusercontent.com/karolherbst/367540934fc3851c744e2b954192d0b1/raw/e3e2694b25fb37840410686bc9fee11dd2dc3b5e/gistfile1.txt
01:52 karolherbst: airlied: btw, on an updated luxmark-v3.1 llvmpipe renders correctly even with functions
01:55 airlied: karolherbst: yeah I built illwieckz version here to make sure
01:55 airlied: HdkR: any f16c equiv on aarch64?
01:55 karolherbst: and it's not faster on my system :D
01:55 karolherbst: but kinda equally fast
01:56 airlied: karolherbst: okay 16-bit is busted if you don't have f16c, not sure how best to fix it yet
01:56 karolherbst: mhh
01:56 HdkR: airlied: Yea, Arm64 supports the conversions natively
01:56 airlied: HdkR: on all arm64s? so I should just set the flag always there?
01:56 HdkR: fcvt is the instruction that does half to 32-bit/64-bit
01:57 HdkR: Scalar is always available, vector isn't always available
01:58 HdkR: `fphp` in the featurelist that Karol posted is the vector support
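A minimal sketch of the check HdkR describes: looking for a flag as a whole token in the feature list karolherbst pasted. The helper name `has_feature` is made up for illustration, and the sketch only tests for presence of the flag; it does not say anything about how llvmpipe or LLVM detect CPU features.

```python
# Feature list copied verbatim from karolherbst's message above.
FEATURES = (
    "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
    "cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm "
    "dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm "
    "bf16 bti ecv"
)


def has_feature(feature_line: str, name: str) -> bool:
    """Return True if `name` is a whole token of a space-separated list.

    Hypothetical helper: splitting first avoids substring false positives,
    e.g. "fp" matching inside "fphp".
    """
    return name in feature_line.split()


if __name__ == "__main__":
    print("fphp present:", has_feature(FEATURES, "fphp"))
    print("f16c present:", has_feature(FEATURES, "f16c"))  # x86 flag name
```

Matching whole tokens rather than substrings matters here because several flags share prefixes (`fp`, `fphp`, `flagm`, `flagm2`).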
01:58 airlied: karolherbst: https://paste.centos.org/view/raw/897c85de might fix it
01:58 airlied: I can probably let llvm take care of that part of it
01:59 karolherbst: airlied: works
02:00 karolherbst: and llvmpipe is like... 60% faster than on my i7-10850H
02:00 karolherbst: and that's a M2 air...
02:02 karolherbst: did I say 60%? I meant of course 90%
02:02 airlied: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25104
02:03 karolherbst: m2 (passive cooled): 666, i7-10850H (jet fans): 359
02:03 HdkR: Delta Airlines called, they want their plane engines back
02:04 karolherbst: _sadly_ llvmpipe isn't fast enough for the other scenes and it fails at image validation :D
02:05 karolherbst: but anyway...
02:05 airlied: karolherbst: I'm honestly not sure it's a speed thing, but it's hard to tell
02:05 karolherbst: it's a speed thing
02:05 karolherbst: the more samples you get, the closer you get to the reference image
02:05 airlied: yeah on the other scenes I only ever get black though
02:06 karolherbst: heh
02:06 karolherbst: I'm not
02:06 karolherbst: they render fine with your function MR
02:06 karolherbst: well
02:06 karolherbst: "fine"
02:06 karolherbst: the hotel one doesn't validate with a 75% mismatch
02:06 karolherbst: Neumann looks fine tho...
02:07 ANDROID_IOS_user: mmmmmm
02:07 airlied: oh cool
02:07 karolherbst: heh
02:07 airlied: so maybe tuning the inliner might get them over the line
02:07 karolherbst: 0.61% difference :D
02:07 karolherbst: but that's not that hard when 60% of the image is a white background
02:08 karolherbst: mhh doubtful
02:08 karolherbst: the CPU is just not fast enough
02:09 karolherbst: the fun part is, they have a C++ version and it's just ultra fast...
02:10 karolherbst: but maybe embree is also just that optimized and their CL raytracer isn't
02:11 karolherbst: the C++ run of the hotel scene shows 35% different pixels :D
02:11 karolherbst: and it's like 10x as fast
02:20 HdkR: They validate the image after time has passed rather than number of samples gathered?
02:45 airlied: mripard: just fyi I'll leave fixes pull until next week as it needs an rc1 backmerge
04:38 mareko: dcbaker: any comment on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25042 ?
04:47 dcbaker: mareko: ack
06:02 kode54: HdkR: yes
06:02 kode54: you can fail validation due to not enough samples being gathered in the time limit
06:46 mripard: airlied: ack, sorry, I didn't notice
07:05 airlied: okay 2 jobs failed to marge not sure what farms died
07:07 kode54: dj-death: I retested the Borderlands GOTY Enhanced issue
07:07 kode54: it reduces significantly, but is still bad, if I use my compositor's fullscreen hotkey to toggle it to a 1080p window
07:07 kode54: it was just as bad on GNOME, so I don't know what's up
07:08 kode54: I can try setting my display modes to 1080p
07:16 kode54: it reduces even further if I set my xwayland scaling to 100%, then display it in a tiny window in the middle of the screen
07:16 kode54: it's still rendering at 1080p, but it's blitting to a tiny window
07:32 kode54: okay
07:32 kode54: yeah
07:32 kode54: that's bork
07:33 kode54: why should the upscaling blitter be so slow
09:12 jfalempe: tzimmermann, do you have an opinion regarding https://patchwork.freedesktop.org/patch/554504/?series=122897&rev=2 ?
09:15 tzimmermann: well, i have mixed feelings
09:19 jfalempe: you think it's too restrictive or too permissive ? I tried to get the different point of view, and have a clear compromise.
09:22 javierm: jfalempe: I think what you document is a good compromise
09:24 javierm: also, I expect that the optimization of discarding the alpha component when not used/supported is something that most display controllers do by HW
09:24 tzimmermann: it opens the door for all kinds of conversions and 'performance fixes'
09:24 tzimmermann: javierm, you can leave this optimization to userspace
09:24 tzimmermann: it's better done there
09:24 javierm: tzimmermann: yeah, but then you can't have user-space that's HW agnostic
09:25 tzimmermann: javierm, in which way?
09:26 javierm: tzimmermann: in that you need to configure it to use RGB888 instead of ARGB8888
09:26 javierm: for the specific case that jfalempe is trying to solve in the driver
09:26 jfalempe: tzimmermann, I explained why it's not always the case here: https://lists.freedesktop.org/archives/dri-devel/2023-August/419620.html
09:27 tzimmermann: it *is* HW agnostic. it just needs to support another color format.
09:28 tzimmermann: i'd argue that it's not even clear whether that is an optimization at all. by dripping that X from XRGB, you're reducing bus bandwidth at the expense of processing each transferred pixel individually.
09:29 tzimmermann: so better leave this to userspace, which can generate the correct format in the first place
09:29 tzimmermann: 'dropping'
09:29 jfalempe: when I asked some mesa devs they told me that supporting RGB888 in userspace is very complex. llvm and opengl don't like this.
09:30 jfalempe: also doing the conversion in userspace would be slower, because you can't do it on the fly during the transfer.
09:30 tzimmermann: jfalempe, exactly. but that doesn't mean you should use it in the driver
09:31 tzimmermann: it's even worse in the kernel. no sse-like instructions. and for the conversion output, you have to move byte-wise over the output array.
09:32 jfalempe: I don't see that as being worse than doing memcpy().
09:33 jfalempe: memcpy() is so slow, the CPU can do thousands of instructions when copying 4 bytes to the mgag200 VRAM.
09:33 tzimmermann: you mean memcpy_toio() ?
09:33 jfalempe: yes
09:35 tzimmermann: but that's because of your fast cpu. the tradeoff could be different on other systems. and if so, you could then drop the X component in userspace and send the RGB888 date to the kernel
09:35 tzimmermann: 'data'
09:35 tzimmermann: i see no upside for running this code in the kernel
09:36 jfalempe: Yes it could be, if you use a very slow CPU with a Matrox, but such thing doesn't exist in practice.
09:37 javierm: tzimmermann: just for compatibility with XRGB8888
09:37 tzimmermann: you've not seen my g200 test system :p
09:37 javierm: tzimmermann: but you could also say the same about copying an (unused) alpha component, what could be the upside ?
09:37 javierm: I guess avoiding the computation of dropping it?
09:37 tzimmermann: javierm, simplicity
09:37 tzimmermann: yes
09:37 tzimmermann: leave that to userspace
09:38 tzimmermann: and the only in-kernel user, fbcon, can handle rgb888
09:38 tzimmermann: i'd even argue to use rgb565 for matrox
09:40 jfalempe: rgb565 is different, because in this case you lose color information
09:41 jfalempe: and it's really noticeable
09:43 jfalempe: so basically the main argument for not doing software color conversion in the kernel is performance. But for you simplicity does also count ? even if that means slower performance with current userspace and less features (lower resolution) ?
09:48 MrCooper: if performance is an argument, XRGB8888 is faster with jfalempe's patch than without "conversion"
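The conversion being argued about can be sketched as follows. This is a plain Python illustration of dropping the X byte, not the code from jfalempe's patch; the function name is hypothetical, and it assumes the usual DRM little-endian layout (XRGB8888 stored as bytes B, G, R, X; RGB888 as B, G, R).

```python
def xrgb8888_to_rgb888(src: bytes) -> bytes:
    """Drop the unused X byte from each little-endian XRGB8888 pixel.

    Illustrative sketch only. Each 4-byte input pixel (B, G, R, X)
    becomes a 3-byte output pixel (B, G, R).
    """
    if len(src) % 4:
        raise ValueError("XRGB8888 data must be a multiple of 4 bytes")
    dst = bytearray(len(src) // 4 * 3)
    dst[0::3] = src[0::4]  # blue
    dst[1::3] = src[1::4]  # green
    dst[2::3] = src[2::4]  # red
    return bytes(dst)


if __name__ == "__main__":
    # Two pixels; the X bytes (0xFF and 0x00) are discarded.
    pixels = bytes([0x10, 0x20, 0x30, 0xFF, 0x40, 0x50, 0x60, 0x00])
    packed = xrgb8888_to_rgb888(pixels)
    print(packed.hex(), len(pixels), "->", len(packed), "bytes")
```

The output is three quarters the size of the input, which is the bus-bandwidth saving under discussion; the cost is that every pixel has to be touched individually, which is tzimmermann's objection.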
09:52 Kayden: is it possible to test the d3d12 driver's nir_to_dxil.c code on a linux system?
09:53 Kayden: looks like I broke it while trying to fix freedreno regressions
09:53 Kayden: I suspect the thing I asked for isn't representable in DXIL but I'm not familiar enough to know off-hand why it's breaking
10:36 pq: javierm, I would expect display controllers to do bulk memory read bursts from whatever memory they are scanning out from, meaning they also read the X channel and ignore it. So no savings in memory bandwidth. Dropping every fourth byte is kinda difficult if you operate on 4-byte or wider words. Or display cachelines.
10:43 karolherbst: does virgl operate on DRM renderer nodes?
10:43 karolherbst: or rather, how are virtualized GPUs exposed to the OS for solutions we care about inside mesa
10:44 karolherbst: We kinda need to figure out what CL device matches a GL one and I was wondering if using the renderer node is something we could do here
10:50 daniels: Kayden: yeah, if you're building with spirv-to-dxil=true, then you'll build a spirv2dxil executable which does what it says on the box
10:53 Kayden: daniels: thanks!
10:53 Kayden: found that, ran my piglit test through glslc to get spirv then trying to run it through spirv2dxil
10:54 Kayden: doesn't handle my silly glsl features so I'm trying to hack it to do that, heh
10:56 Kayden: hm, with a lot of hacking, it produced dxil
11:19 javierm: pq: I see
11:31 steve--w: hey guys, noob to mesa. trying to build windows, using msys2.
11:31 steve--w: I get :
11:31 steve--w: meson.build:876:2: ERROR: Problem encountered: Python (3.x) mako module >= 0.8.0 required to build mesa.
11:31 steve--w: installed mako using pip
11:37 steve--w: seems don't use msys2, use windows command prompt & native meson/ninja, etc. is the way.
11:52 steve--w: next up :
11:52 steve--w: FAILED: src/compiler/glsl/glsl_lexer.cpp
11:52 steve--w: "C:\Program Files (x86)\GnuWin32\bin\flex.EXE" "-DYY_USE_CONST=" "-D__STDC_VERSION__=199901" "-o" "src/compiler/glsl/glsl_lexer.cpp" "Z:/dev_public/FFmpeg_dev/mesa/src/compiler/glsl/glsl_lexer.ll"
11:52 steve--w: C:\Program Files (x86)\GnuWin32\bin\flex.EXE: unknown flag 'D'. For usage, try
11:52 steve--w: C:\Program Files (x86)\GnuWin32\bin\flex.EXE --help
11:56 zmike: anholt_: oh that'd be great
13:10 agd5f: sima, if you get a chance. radeonfb regression https://gitlab.freedesktop.org/drm/amd/-/issues/2826
13:25 karolherbst: I kinda wished I'd know what's going on here.. https://gist.githubusercontent.com/karolherbst/407dea07c0d8fd9ff04b28d81823614f/raw/634bddac47f4f961acf8d200355b99fff870ba81/gistfile1.txt
14:07 karolherbst: ../src/compiler/nir/nir_serialize.c:1277:70: runtime error: left shift of 524280 by 13 places cannot be represented in type 'int'
14:07 karolherbst: ../src/compiler/nir/nir_serialize.c:1274:70: runtime error: left shift of 524224 by 45 places cannot be represented in type 'long int'
14:07 karolherbst: mhhhh
14:08 karolherbst: but I guess that's practically fine
14:08 karolherbst: or rather, it's doing what it shall do
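The arithmetic behind the two sanitizer messages can be checked with a short sketch. Python integers do not overflow, so the script computes the exact results and compares them against the signed 32-bit and 64-bit ranges. The last step assumes the shifted value is later shifted right arithmetically to sign-extend a 19-bit field; that is a guess at why the result is "practically fine", since the log does not show the code at those lines. All helper names are made up for illustration.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is representable as a signed integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def wrap_signed(value: int, bits: int) -> int:
    """Two's-complement wraparound of `value` to `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


if __name__ == "__main__":
    for operand, shift, bits in ((524280, 13, 32), (524224, 45, 64)):
        exact = operand << shift
        wrapped = wrap_signed(exact, bits)
        print(f"{operand} << {shift} = {exact:#x}, "
              f"fits in {bits}-bit signed: {fits_signed(exact, bits)}")
        # Assumed follow-up step: an arithmetic right shift by the same
        # amount yields the operand sign-extended from 19 bits.
        print(f"  wrapped: {wrapped}, shifted back: {wrapped >> shift}")
```

Both exact results exceed the signed maximum, which is what the sanitizer reports. On two's-complement hardware the wrapped values, shifted back, come out as -8 and -64, i.e. the original 19-bit patterns read as signed numbers.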
14:32 zmike: mareko: need your ack on tc part of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25121
14:38 steve--w: this apparently : https://github.com/lexxmark/winflexbison/releases - compiled now :)
14:44 steve--w: don't get vaon12_drv_video.dll
15:15 eric_engestrom: tnt: oct 25 will be the branchpoint I think
15:27 tnt: eric_engestrom: thanks.
15:35 idr: So... when I try to send mail to mesa-dev, it gets spam blocked. That seems broken.
15:37 idr: Hrm... maybe I sent from the wrong address.
15:40 idr: Yup. PEBKAC
15:48 cheako: Mesa 23.2.0~rc3-1, arc a770, Starfield seems no good? The #intel-gfx seems abandoned. I understand it could take some time, but I haven't found any recognition acknowledging it.
15:49 karolherbst: cheako: tried filing a bug?
15:49 tnt: Doesn't starfield also render bug out on windows though ?
15:50 HdkR: No need to compare the Windows experience to the Linux experience. The driver stacks are completely different
15:50 cheako: Yeah, I'm not one to stick myself in the middle of a group of ppl who act as individuals.
15:50 karolherbst: ?
15:51 karolherbst: I'm confused
15:51 cheako: https://twitter.com/IntelGraphics/status/1697395355229331631
15:51 tnt: HdkR: I was more pointing to the game itself doing bad things :)
15:51 HdkR: Well, Bethesda game, nothing new there :)
15:51 karolherbst: cheako: isn't that for windows?
15:51 cheako: Team A blames Team B and Team B says only Team A can fix it, they don't realize they are both at fault.
15:52 karolherbst: ?
15:52 karolherbst: who is blaming who
15:53 cheako: Right, hence why I'm asking... it's fine if the linux platform is being ignored, just that I and others know it's being ignored.
15:53 karolherbst: if you are running into a bug, I'm sure that people are willing to look into it once it's reported
15:53 karolherbst: cheako: so you filed a bug and it got ignored?
15:54 karolherbst: I honestly don't understand what you are trying to say here and how we can/what we can do to help...
15:54 cheako: It usually doesn't go well when I file bugs.
15:55 cheako: ppl get all defensive.
15:55 karolherbst: I mean.. there is probably a reason to get defensive
15:55 karolherbst: otherwise they wouldn't
15:55 cheako: Yeas, I have autism.
15:56 karolherbst: anyway, I doubt anybody here is doing anything on purpose, just that there is often way more work to do than people have time
15:56 cheako: There are a bunch of things wrong with me, so I prefer chat to email.
15:56 karolherbst: but I wouldn't say intel gpu on linux is abandoned while developers are paid to work on it
15:57 cheako: The chat channel, I posted yesterday and no reply.
15:57 cheako: 14 members and all.
15:57 karolherbst: sounds like the wrong channel then
15:58 karolherbst: and I also didn't see any message from you in the mentioned channel
15:59 karolherbst: maybe something messed up with your client or some other reason your message didn't get through?
15:59 cheako: Libera.Chat is bad.
16:00 karolherbst: anyway, if you try to write in #intel-gfx the IRC server should either respond on why you can't write there... or maybe it doesn't work with your client. Dunno, but in either case, I don't see any message from you there, so I'd say that simply nobody saw your message
16:03 cheako: Thanks that helps a lot.
16:03 karolherbst: np
16:04 MrCooper: a chat channel isn't an issue tracker anyway
16:05 cheako: I don't like issue trackers, look at the above conversation and ask if it just screams that needed to happen on such a platform?
16:06 cheako: I don't mind marking down in stone the products of such conversations, but to just talk about something it's a poor choice.
16:06 karolherbst: anyway, it seems we have currently one starfield related bug in our issue tracker: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9759
16:06 karolherbst: but that's AMD related
16:07 MrCooper: just don't expect issues to get fixed if they're not reported in the place created for reporting issues
16:07 karolherbst: cheako: did you try it yourself yet? Or do you want to know if it works at all before e.g. buying the game?
16:08 cheako: Yes, I've exhausted the resources google suggested and moved on to asking about it.
16:08 karolherbst: cheako: we have protondb to check how well windows games run on proton/steam: https://www.protondb.com/app/1716740
16:08 karolherbst: but it doesn't seem like anybody with an Intel GPU tried yet
16:09 karolherbst: or rather.. didn't leave a report
16:09 karolherbst: cheako: Steam also has the option to refund games and I've done it multiple times with games not working through proton, in case that's an option for you
16:09 karolherbst: but I think availability for this depends on the country?
16:09 karolherbst: not sure
16:10 karolherbst: ohh wait
16:10 karolherbst: there is a report
16:10 karolherbst: cheako: https://www.protondb.com/app/1716740#f7JmskE1ky
16:11 lumag: robclark, generic question regarding libdrm. Currently modetest lacks attention from libdrm maintainers. There are 13 MRs open against modetest. emersion wrote that he is not really interested in this tool. Do you know if there is any way to reinstantiate tests/modetest maintenance within libdrm? I/we can volunteer reviewing and merging/declining these patches. I'm asking because otherwise my only option is to fork it using git filter-repo and maintain
16:11 lumag: it as a separate tool (if that sounds more acceptable from drm maintainers POV).
16:11 karolherbst: cheako: anyway.. seems like the best you can do for now is to wait and see until it gets fixed
16:12 cheako: I noticed there are a few IDs for Starfield, be sure you've the correct one. I can't seem to tell what ID was used on my system.
16:13 gildekel: lumag: I am surprised by this. modetest is a great tool. Would love to see support for it continue. Will help if I can as well!
16:14 lumag: Yep. Marijn[m] also has been pinging me, as this tool is of great use during bringup
16:15 karolherbst: cheako: I suspect the game itself is checking what GPU it sees and just doesn't run on Intel... or something. It's unknown to me what the reason for this problem is, but I'm confident that it will get resolved at some point
16:15 gildekel: we use it daily for dev and debugging on ChromeOS, so it'll be quite tragic to have it drop without replacement. seanpaul_: FYI
16:18 robclark: lumag: if you want to maintain it, I'm all for that
16:18 seanpaul_: yeah. modetest, while awkward, is pretty useful
16:19 lumag: robclark, then how can we proceed?
16:21 idr: cheako: #intel-3d on OFTC is more likely to be active.
16:21 robclark: lumag: I added you as a "Developer".. looks like libdrm isn't using margebot yet, not sure if there is a good reason for that or if something needs to be done.. but I think "Developer" should probably be enough to merge MRs
16:24 gildekel: kudos for picking up the glove, lumag. Appreciated.
16:26 alyssa: robclark: re sample locations, it was my understanding that gl_SamplePositions returns the default positions regardless, not that it was undefined
16:27 alyssa: IDK where I saw that claim originally, though
16:27 alyssa: (yes, this also makes gl_SamplePosition kinda useless.)
16:28 robclark: as best I can tell, the `rgetpos` instruction doesn't even return that when user sample locs are enabled... so it matches the specified (ie. undefined) vk behavior ;-)
16:29 alyssa: fair
16:29 lumag: robclark, seems to work, thank you. Waiting for the MR to be merged
16:29 alyssa: Honestly I'm very surprised that you have ISA level support for it, lol?
16:29 robclark: lumag: cool, thx for volunteering
16:30 alyssa: (As opposed to lowering to gl_SampleID + a uniform + some math)
16:31 robclark: alyssa: yeah, and I could lower it to const lookup for sampleloc case.. but that would be extra shader variant and more draw-time overhead.. and not really thinking that it is worth adding that fraction of drawoverhead overhead
16:31 lumag: gildekel, seanpaul_: I'll review outstanding MRs. Feel free to ping me in future if there is anything pending
16:32 robclark: alyssa: I guess we could do some common lowering since zink has the same issue.. but I'm pretty meh about that combination of features ;-)
16:32 alyssa: robclark: meh?
16:32 robclark: esp when the one single user afaict is a single piglit test
16:32 alyssa: i'm definitely not suggesting a shader variant
16:33 alyssa: I'm more wondering why Qualcomm added in rgetpos in the first place
16:33 robclark: oh.. less draw overhead ;-)
16:33 alyssa: instead of doing a little bit of ALU and maybe some preamble stuff
16:33 robclark: idk, maybe it was supposed to work w/ samplepos
16:33 alyssa: yeah..
16:33 alyssa: FWIW, Mali has the 'interesting' approach where the sample positions ptr is pushed directly into a special register in the shader
16:34 alyssa: so you still recover gl_SamplePosition with math, not a special op
16:34 alyssa: but there's no extra draw overhead, your
16:34 alyssa: you couldn't do anything else with that uniform, etc
16:34 robclark: not sure if dx has anything to say about this.. if vk doesn't care and there is no gles extension then I could see them maybe not testing this
16:34 alyssa: ...Of course I think that's also broken with custom sample locs.
16:35 alyssa: robclark: IDK, I think working with AGX has massively increased my tolerance for shader lowering shenanigans
16:35 alyssa: =)
16:36 alyssa: (Both because Apple does tremendous amounts of lowering to fit Metal onto this wacky hw, and also because I'm implementing APIs that the HW wasn't designed for.)
16:38 robclark: lowering as long as not variant dependent isn't the end of the world.. but this would also mean pushing more const state.. so more draw time overhead.. so I'm sticking with the excuse that it is not worth penalizing other use cases when it is easier to say "yer holding it wrong" if someone tries to use gl_SamplePosition with user sample locs ;-)
16:38 alyssa: mood
16:39 alyssa: Like I said. AGX has massively increased my tolerance for shenanigans
16:41 alyssa: (It probably helps that the min-spec CPU I care about is massively faster than what freedreno has to care about, so I'm not nearly as worried about drawoverhead #s.)
16:48 alyssa: robclark: Oh wait, now I'm noticing my entire samplepositions approach is silly tho
16:48 alyssa: if it's supposed to return default positions I may as well just calculate as ALU in shader entirely
16:48 alyssa: since I already have the num_samples
16:49 alyssa: for the monolithic case, that means load_sample_position will constant fold away entirely
16:50 alyssa: for the non-monolithic case, that means a bit more math. but probably not a huge deal
16:50 alyssa: (and if it *is* a problem, I can preload the constants with a bit of linking magic. but I expect it's irrelevant.)
16:50 robclark: it isn't just cpu overhead, at this point our cpu overhead is low enough that even on small cores, drawoverhead is CP limited without FD_MESA_DEBUG=nohw.. and pushing extra state for this one misconceived use-case is dumb
16:51 alyssa: oh, yeah, that's true
16:52 alyssa: ..and now I'm observing that prologs/epilogs totally breaks preambles
16:52 alyssa: Erghhhh
16:52 alyssa: maybe I want function ptrs instead of prologs/epilogs then :|
16:55 HdkR: alyssa: Time for real subroutines? :D
17:03 alyssa: HdkR: maybe
17:03 alyssa: Actually no, I think if I run nir_opt_preamble on the fragment shader, the resulting preamble will still be valid for the linked pixel shader
17:04 alyssa: It /does/ mean that we can't use preambles for the prologs/epilogs themselves, but that would be the case if they were subroutines too
17:04 alyssa: unfortunately I could see that being a problem for e.g. load_blend_const
17:05 alyssa: unless we just always reserve uniforms for the blend const unconditionally when epilogs are used
17:05 alyssa: not the end of the world if so
17:49 alyssa: strictly could concatenate preambles too but unlikely to be worth it
18:12 pzanoni: In a Vulkan Mesa driver, when I need to allocate memory, when am I supposed to use malloc()-family functions vs when am I supposed to use vk_alloc()-family functions? Is there some documentation about it somewhere? What are the trade-offs?
20:00 aleasto-: hi, is there a public api to get the number of planes for a format-modifier combo before allocating a buffer?
20:00 aleasto-: driver agnostic
20:44 mareko: RGB888 isn't supported by any or most hw, it can only do power-of-two pixel sizes
20:45 airlied: rgb888 in userspace isn't ever going to happen
20:46 airlied: if the kernel can accel xrgb by converting to packed rgb then that is the only place to do it
20:56 aleasto-: oh great, gbm_device_get_format_modifier_plane_count exists
21:07 alyssa: mareko: bizarrely Mali supports it
21:07 alyssa: and IIRC I benchmarked a small perf improvement from using it over rgbx :\
21:07 alyssa: bizarre decisions
22:04 idr: pzanoni: I don't have a full answer to your question, but... Vulkan provides a mechanism so that the application can control memory allocation. The system memory allocations done by the driver might call out to the application.
22:05 pzanoni: idr: yeah, I'm also aware that CTS plays with things by making allocations fail and verify if programs are behaving as expected, that's why I gravitated more towards the vk_ versions, but I was recently asked to change some code to a version that used plain malloc() and started wondering about when each one was more desired/acceptable
22:55 Kayden: hmm. was wondering if a barrier intrinsic with WORKGROUP execution and memory scope, and no memory modes, even does anything useful. But I guess that's exactly what SPIR-V's OpControlBarrier is
23:05 pendingchaos: you could probably combine it with non-control memory barriers to get what's effectively a combined memory+control barrier
23:09 alyssa: Kayden: Isn't that a GLSL barrier()?
23:10 alyssa: Kayden: Oh, I guess the funny thing is having a non-NONE memory scope and no memory modes
23:10 alyssa: Probably opt_barriers should drop the memory scope if there are no modes left?
23:12 Kayden: yeah, it probably should
23:12 Kayden: nir_to_dxil.c is having trouble with that, it seems that barriers need to have some kind of memory mode
23:15 Kayden: I can try and have it set memory_scope = NONE then
23:15 Kayden: should I leave memory semantics at ACQ|REL? seems like we always do that
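The cleanup alyssa suggests (drop the memory scope once no memory modes are left) can be sketched on a toy model. Everything below is hypothetical: the `Barrier` class, the enums and `drop_unused_memory_scope` are stand-ins for illustration, not NIR's actual data structures or the real opt_barriers pass. Memory semantics are deliberately left out, since the log does not settle what they should be.

```python
from dataclasses import dataclass, replace
from enum import Enum, Flag


class Scope(Enum):
    """Hypothetical stand-in for a barrier scope."""
    NONE = 0
    WORKGROUP = 1
    DEVICE = 2


class Modes(Flag):
    """Hypothetical stand-in for the set of memory modes a barrier covers."""
    NONE = 0
    SHARED = 1
    GLOBAL = 2


@dataclass(frozen=True)
class Barrier:
    execution_scope: Scope
    memory_scope: Scope
    memory_modes: Modes


def drop_unused_memory_scope(barrier: Barrier) -> Barrier:
    """Reset memory_scope to NONE when the barrier orders no memory modes.

    The execution scope is left alone, so a pure control barrier remains.
    """
    if barrier.memory_modes == Modes.NONE and barrier.memory_scope != Scope.NONE:
        return replace(barrier, memory_scope=Scope.NONE)
    return barrier


if __name__ == "__main__":
    odd = Barrier(Scope.WORKGROUP, Scope.WORKGROUP, Modes.NONE)
    print(drop_unused_memory_scope(odd))
```

A barrier that still names at least one memory mode passes through unchanged, so the rewrite only affects the odd combination Kayden ran into.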
23:53 Kayden: okay, this .gitlab-ci/bin/ci_run_n_monitor.py is pretty handy!
23:53 zmike: amen
23:54 Kayden: detects the wrong pipeline by default (the one in my personal repo, rather than the MR, which doesn't have all the d3d12 jobs), but easy enough to --pipeline-url
23:54 Kayden: got a PKGBUILD for python-filecache too, I guess I should figure out how to put that in AUR