00:00jenatali: jekstrand: Yeah none of those 4->16 fixes did the trick (unsurprisingly)
00:02jekstrand: jenatali: can_propagate_through_alu looks wrong to me
00:07jekstrand: Nah, it's ok
00:07jekstrand: jenatali: Can you isolate which part of opt_if is causing the problem?
00:07jekstrand: jenatali: There are half a dozen different optimizations in there
00:07jenatali: jekstrand: I can try
00:11jenatali: jekstrand: As expected, removing opt_if_evaluate_condition_use from opt_if_safe_cf_list doesn't break
00:12jenatali: Not sure if there's a specific thing you want me to try to drill more into?
00:15jenatali: jekstrand: Since it's an if condition that's being rewritten, the path seems pretty clear, to evaluate_if_condition, which just does dominance checks
00:16jekstrand: jenatali: But it checks that it's dominated by the then or the else which is clearly false
00:16jekstrand: jenatali: Unless metadata isn't getting reset somewhere
00:17jenatali: Ah hm, you're right...
00:17jenatali: Well that's fun
00:19jenatali: jekstrand: Is there a way to force dominance re-indexing?
00:20jekstrand: jenatali: Really?
00:20jenatali: jekstrand: I'm not positive, but it must be, right?
00:21jekstrand: jenatali: You can throw an impl->valid_metadata = 0 before the nir_metadata_require
00:23jenatali: Still fails
00:23jenatali: jekstrand: The one thing I saw: /* If a block is unreachable, then nir_block::dom_pre_index == INT16_MAX
00:23jenatali: This large shader might be hitting INT16_MAX
00:24jekstrand: jenatali: I bet that's it. :)
00:24jekstrand: jenatali: Time to make those block counters 32-bit. 😭😭😭😭😭😭
00:25jekstrand: Actually... 16304 isn't anywhere close to INT16_MAX
00:25jekstrand: But it's scarily close
00:25jekstrand: By "anywhere close" I mean, 2x more and we'll hit it
00:26jenatali: jekstrand: Are the dominance indices close to the block IDs?
00:26jekstrand: Oh.... it's about 2x the block index....
00:26jekstrand: Oh, my
00:26jekstrand: Why is it signed?
00:27jekstrand: I guess we probably use -1 for something?
00:27jenatali: Yeah, the next line of that comment :P
00:27jenatali: * and nir_block::dom_post_index == -1. This allows us to trivially handle
00:27jenatali: * unreachable blocks here with zero extra work.
00:27jekstrand: Make them 32-bit and see if that fixes it. :)
00:27jekstrand: It's 2-per-block; it's not going to bloat the NIR that much
00:27jekstrand: But also, dang
00:28jenatali: Serialization probably also needs a fix? Unless this data is just expected to be recalculated after deserializing
00:28jekstrand: No, serialization throws away most metadata
00:29jenatali: My initial hunch about the shader just being huge was right, I was just wrong about what was blowing up as a result :P
00:29jenatali: jekstrand: Yep, that was it
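The failure mode discussed above can be sketched in a few lines of C. This is only an illustration, not the Mesa code: `struct dom_range`, `make_range`, `wrap_to_int16`, `dominates` and `indices_fit_int16` are hypothetical names, and the interval test is merely modeled on the "pre/post index" dominance check being described. It assumes a counter that starts at 0 and hands out two indices per block, so the indices stop fitting in a signed 16-bit field once there are more than 2^14 blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two per-block dominance indices. */
struct dom_range {
   int16_t pre;
   int16_t post;
};

/* Reduce a counter value to what a 16-bit two's-complement field would hold. */
static int16_t wrap_to_int16(uint32_t v)
{
   int32_t low = (int32_t)(v & 0xFFFFu);
   return (int16_t)(low >= 0x8000 ? low - 0x10000 : low);
}

/* Build a range from full-width counter values, truncating like the field would. */
static struct dom_range make_range(uint32_t pre, uint32_t post)
{
   struct dom_range r = { wrap_to_int16(pre), wrap_to_int16(post) };
   return r;
}

/* Interval test: a dominates b when b's [pre, post] range nests inside a's. */
static bool dominates(struct dom_range a, struct dom_range b)
{
   return a.pre <= b.pre && b.post <= a.post;
}

/* With a counter starting at 0 and two indices per block,
 * the largest index handed out is 2 * num_blocks - 1. */
static bool indices_fit_int16(uint32_t num_blocks)
{
   return num_blocks == 0 || 2u * num_blocks - 1u <= (uint32_t)INT16_MAX;
}
```

In this sketch, a parent numbered (32760, 32775) really does enclose a child numbered (32770, 32771), but once the values are truncated to 16 bits the child's pre index becomes negative and the test reports "does not dominate".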
00:29jekstrand: oh, man.....
00:30jekstrand: jenatali: We could also start our indexing at 1 instead of 0 and make it unsigned
00:30jenatali: That'd buy us one order of magnitude I guess
00:30jenatali: 32-bit's probably the better fix
00:30jenatali: Well, I guess we could do both
00:31jekstrand: IMO, both sounds like a good plan
00:32jekstrand: jenatali: That bug makes me sad. :-(
00:32jekstrand: jenatali: Also, you should add in an assert while you're at it
00:33jenatali: jekstrand: I'm glad I only wasted a day on it, that feels like that could've been much worse
00:33jekstrand: Not that overflowing uint32_t is likely to happen. :)
00:33jenatali: Good call
00:34jenatali: jekstrand: I'm sure someone said the same about overflowing int16_t ;)
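A rough sketch of the "do both, plus an assert" idea, under stated assumptions: the names here (`struct dom_range32`, `next_dom_index`, `unreachable_range32`, `last_index_for_chain`) are made up for illustration and are not Mesa's. It assumes real indices start at 1 in unsigned 32-bit fields, and that an unreachable block is marked with pre = UINT32_MAX and post = 0, which is the unsigned analogue of the INT16_MAX / -1 pair quoted from the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit, unsigned version of the per-block indices. */
struct dom_range32 {
   uint32_t pre;
   uint32_t post;
};

static struct dom_range32 make_range32(uint32_t pre, uint32_t post)
{
   struct dom_range32 r = { pre, post };
   return r;
}

/* Assumed sentinel for an unreachable block. Real indices start at 1 and
 * never reach UINT32_MAX, so this range never encloses a reachable block. */
static struct dom_range32 unreachable_range32(void)
{
   return make_range32(UINT32_MAX, 0);
}

/* Same interval test as before, now on unsigned 32-bit values. */
static bool dominates32(struct dom_range32 a, struct dom_range32 b)
{
   return a.pre <= b.pre && b.post <= a.post;
}

/* Hand out the next index. The first value returned is 1, and the assert
 * fires before the counter could ever reach the UINT32_MAX sentinel. */
static uint32_t next_dom_index(uint32_t *counter)
{
   assert(*counter < UINT32_MAX - 1);
   return ++(*counter);
}

/* Number a chain of blocks (two indices each) and return the last index
 * handed out. Only meant for small illustrative block counts. */
static uint32_t last_index_for_chain(uint32_t num_blocks)
{
   uint32_t counter = 0;
   for (uint32_t i = 0; i < 2 * num_blocks; i++)
      next_dom_index(&counter);
   return counter;
}
```

With this layout, a block count that overflowed the 16-bit fields (anything above 2^14) is numbered without trouble, and the assert documents the remaining limit instead of letting it wrap silently.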
00:34jekstrand: jenatali: But also, how much stuff is in this shader? Clearly too much, I suppose.
00:35jenatali: jekstrand: It's literally just sin on a float16
00:35jenatali: Just, libclc's sin is... large
00:35jenatali: Which it has to be to pass the tests
00:35jenatali: At least there's not fmas in there which need to be further inlined to a software fma ;)
00:36jekstrand: Some days, I think NIR needs "real" function calls....
00:36jenatali: Yeah, if we lowered libclc by copying the functions, instead of inlining them, then we could optimize it once, instead of 16 times
00:37jenatali: That's... probably something that's worth doing
00:37jekstrand: jenatali: You could optimize the libclc before inlining
00:37jenatali: That's true
00:37jenatali: Maybe "should" is the better verb there
00:37jenatali: Though it's also huge and most apps won't use it...
00:37jekstrand: That's what a shader cache is for?
00:38jenatali: Good point. I even added one to D3D12's API for just this reason
00:38jenatali: I should actually hook it up at some point
00:38HdkR: jekstrand: RT should mean NIR needs function calls today to handle callable :)
00:40jenatali: jekstrand: FYI, you missed the 'R' when adding my r-b to your patches ;)
00:40jekstrand: HdkR: Callable in RT is.... not quite the same. :)
00:41jekstrand: jenatali: drp
00:41jekstrand: HdkR: That's more like function pointers. :)
00:42HdkR: Almost the same deal though. Need to handle being able to call that pointer
00:42jekstrand: Yes, it does require a calling convention
00:42HdkR: plez no subroutine solution
00:43jekstrand: But, depending on your hardware design, it may not be that similar in the end.
00:43jekstrand: In particular, you may have to spill out everything and deal with a shader continuation.
00:43jekstrand: But for regular non-inline functions, you probably want some of it to live in registers and have some sort of save/restore convention
00:44jekstrand: That or some sort of globally-aware register allocation
00:45HdkR: It ends up being messy, just like it being a real function :D
00:46jenatali: Now to clean up my branch and get these patches out
00:47jekstrand: HdkR: Oh, they're similar quantities of messy. Just different types of messy. :)
00:48italo: can someone give me a quick help with understanding nir_uabs_usub?
00:48HdkR: Now just to allow goto in glsl
00:48jekstrand: HdkR: We can handle that now. We have a structurizer. :D
00:48jekstrand: HdkR: Not that I want to encourage such behavior....
00:49jekstrand: italo: Not sure what you're confused about. It's |a - b| but without any overflow issues
00:50jekstrand: italo: nir_opcodes.py has the C constant-folding expression
00:52italo: jekstrand: I was confused why it doesn't seem to be generated in my code, is it that it can only be generated by the absoluteDifference() from the INTEL_shader_integer_functions2 extension?
00:53jenatali: jekstrand: I've got a couple more 4 -> MAX_VEC_COMPONENTS patches on top of what you have in !6655. Should I just open a separate MR?
00:53jekstrand: italo: Quite possibly. Why do you expect your code to generate it?
00:54jekstrand: jenatali: Push another patch to the same MR and I'll review it
00:54jenatali: jekstrand: Sounds good
00:54italo: jekstrand: I thought it was generated if I had something like abs(x-y) in my code, but I guess it makes sense that it doesn't
00:54jekstrand: italo: abs(x-y) doesn't have quite the same semantics
00:54jekstrand: italo: In particular, it behaves differently wrt. roll-over, I think.
00:55italo: jekstrand: and what about the overflow issues? is it different as well?
00:56jekstrand: italo: uabs_usub(UINT32_MAX, 0) = uabs_usub(0, UINT32_MAX) = UINT32_MAX
00:57jekstrand: italo: But iabs(isub(UINT32_MAX, 0)) = iabs(isub(-1, 0)) = iabs(-1) = 1
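The two results above can be reproduced with a small C sketch. The function names mirror the NIR opcodes, but this is plain C written for illustration, not the constant-folding expression from nir_opcodes.py. The signed case is carried in `uint32_t` bit patterns (an assumption made here so the wrap-around is well defined in C).

```c
#include <stdint.h>

/* |a - b| for unsigned inputs: subtract the smaller from the larger,
 * so the result never wraps. */
static uint32_t uabs_usub(uint32_t a, uint32_t b)
{
   return a < b ? b - a : a - b;
}

/* iabs(isub(a, b)) on 32-bit two's-complement bit patterns: the
 * subtraction wraps, then a result with the sign bit set is negated. */
static uint32_t iabs_isub_bits(uint32_t a, uint32_t b)
{
   uint32_t d = a - b;
   return (d & 0x80000000u) ? 0u - d : d;
}
```

Calling `uabs_usub(UINT32_MAX, 0)` gives UINT32_MAX in either argument order, while `iabs_isub_bits(UINT32_MAX, 0)` reinterprets the difference as -1 and returns 1.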
01:07italo: jekstrand: what about uabs_isub? im trying to follow the const expr but I'm not sure, it seems like it would give -UINT32_MAX, but that doesn't make sense given that it's supposed to return a positive number
01:09italo: jekstrand: nvm I think it returns 1 as well, so what would be the difference between this one and iabs(isub())?
01:26jenatali: jekstrand: Pushed
01:32jekstrand: jenatali: Thanks
01:41italo: jekstrand: nvm I got it :)
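For the uabs_isub question, one plausible reading is sketched below, assuming the opcode takes signed inputs, produces an unsigned result, and computes the difference at a width where it cannot overflow. The helper `iabs_isub_wrapped` is a hypothetical comparison function, not a NIR opcode. Under that assumption both forms agree on (-1, 0) and only diverge when the true difference does not fit in a signed 32-bit value.

```c
#include <stdint.h>

/* |a - b| for signed inputs with an unsigned result. Widening to 64 bits
 * keeps the subtraction from overflowing; the result fits in 32 bits. */
static uint32_t uabs_isub(int32_t a, int32_t b)
{
   int64_t wa = a, wb = b;
   return (uint32_t)(wb > wa ? wb - wa : wa - wb);
}

/* iabs(isub(a, b)) with 32-bit wrap-around, done on the unsigned bit
 * patterns so the wrap is well defined in C. */
static uint32_t iabs_isub_wrapped(int32_t a, int32_t b)
{
   uint32_t d = (uint32_t)a - (uint32_t)b;
   return (d & 0x80000000u) ? 0u - d : d;
}
```

For example, with a = INT32_MIN and b = INT32_MAX the true distance is UINT32_MAX, which `uabs_isub` returns, whereas the wrapped 32-bit subtraction yields 1.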
01:59jenatali: jekstrand: FYI, the fdot8/fdot16 doesn't exist in stable I think. The rest of that patch is good for backport
02:25jekstrand: jenatali: Yeah, I'll let the backport people sort that out. :)
02:25jekstrand: Which is to say, they'll figure out it doesn't apply and ask me to backport it in about 4 days. :P
02:25jenatali: It'll apply, it just won't build
02:33dcbaker[m]: If it applies, ship it in stable
02:37krh: jenatali: more than 2^16 blocks?
02:38jenatali: krh: 2^14
02:38jenatali: Signed, plus each block gets 2 indices
02:39krh: that's still crazy
02:40jekstrand: Yeah, it's nuts
02:40airlied: the compiler is getting all grown up :-P
02:41imirkin: is that like when the kernel was no longer able to represent all pages with an int32?
02:41jekstrand: airlied: Either that or jenatali needs to stop torturing it. :P
02:41jekstrand: I swear, all this OpenCL is compiler abuse.
02:47jenatali: I'm just glad I didn't have to start from scratch for CL support
02:54airlied: you'd just spirv->llvm and try and write an llvm backend
02:54airlied: and never be heard from again
03:00jenatali: Yeah... that was pretty much where we were at before kusma suggested we use Mesa
03:01jenatali: Otherwise we'd have to deal with spirv->llvm3.7
07:49pq: pinchartl, your use case has been imagined before: https://gitlab.freedesktop.org/wayland/weston/-/issues/244
07:51pinchartl: pq: thanks for the pointer
07:55pq: IOW, I'd welcome a good implementation in Weston upstream. :-) It seems mvlad was doing something towards it.
07:56pq: pinchartl, but the Weston side is just half the story. You have to get your client side to behave: use exactly two wl_surfaces (perhaps one as a sub-surface) and allocate scanout-able buffers for them.
07:57pq: if you get even one more wl_surface (well, cursor should be fine I guess), then it'll fall apart. Like, a drop-down box.
07:57pq: or tooltip
07:58pinchartl: yes, the client would need to render everything on a single surface
07:59pq: and if your KMS planes had size restrictions, your client side would need to magically know that
08:01pq: pinchartl, hmm, but the reason you do this is because you want to have the two surfaces from different processes? You may need to hack the weston window management to ensure the z-ordering and positioning is correct.
08:03pq: that shouldn't be too hard either, you might even just replace the whole WM with the skeleton WM Weston uses in its test suite and add some special cases in it to position your windows right
08:07pinchartl: pq: no, it's from the same process
08:08pq: oh, ok, so I assume using the same Wayland connection too? Then sub-surfaces work fine and OOTB.
08:08pq: from Weston perspective
08:11pinchartl: I'd have to figure out how to get hold of the wayland connection created by Qt
08:15pq: mmhm, that I don't know
08:15pq: I'd assume you can
08:36daniels: yeah, you query the native handle
09:27danvet: melissawen, thx for your review
09:27danvet: can you pls also review the vgem patch right before in the same series?
09:27danvet: it's practically the same thing
09:28melissawen: danvet, welcome! yes, I'm doing it right now :)
09:29danvet: melissawen, btw on the "dereference before the pointer check", compilers have in the past used that as an excuse to optimize out the pointer check outright
09:29danvet: "can't be NULL here, it was dereferenced right above"
09:30danvet: so pretty important to get this right
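A minimal sketch of the pattern being described, with a made-up `struct buf` and function names (not taken from the patches under review). In the first function the load happens before the NULL check, so a compiler is allowed to assume the pointer is non-NULL and drop the check. The second function checks first.

```c
#include <stddef.h>

/* Hypothetical structure, for illustration only. */
struct buf {
   int len;
};

/* Risky order: b is dereferenced before it is checked. Because
 * dereferencing NULL is undefined, the compiler may treat the check
 * below as always false and remove it. Never call this with NULL. */
static int len_check_too_late(const struct buf *b)
{
   int len = b->len;
   if (b == NULL)
      return -1;
   return len;
}

/* Safe order: check the pointer, then dereference it. */
static int len_checked_first(const struct buf *b)
{
   if (b == NULL)
      return -1;
   return b->len;
}
```

Both functions return the same value for a valid pointer. Only the second has defined behaviour when passed NULL.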
09:36melissawen: hmm, these compiler details mess my head up a little bit.. thanks
10:19mszyprow|home: may I ask for help merging "[PATCH v10 00/30] DRM: fix struct sg_table nents vs. orig_nents misuse" patchset to drm-misc-next?
13:39pcercuei: linusw: RE: DBI patchset, do you happen to have a ILI9341 panel to test with? :)
13:40pcercuei: patch 4/6 ("drm/tiny: Add TinyDRM for DSI/DBI panels") is also untested, because I have no hardware to test it with. My Newvision panel does not support the MIPI_DCS_WRITE_MEMORY_START command
14:37mszyprow|home: is there any drm-misc maintainer willing to help?
14:54amylizzle: so I'm working on adding support for my weird wireless card to the broadcom driver, and I've hit a bit of a bump
14:56amylizzle: I'm pretty sure I've got the firmware out of the windows driver correctly, and it's getting sent to the card, but failing to initialise and I'm not sure how to debug why
15:03kisak: amylizzle: hello, in general this channel focuses on open source graphics. You might have some luck over at #linux-wireless
15:17amylizzle: ah, my bad, thanks for the tip
15:17karolherbst: jenatali: so it was something annoying afterall :/
15:17jenatali: karolherbst: Yep
15:18karolherbst: jenatali: fun fact: ubsan would have detected that :)
15:18karolherbst: but we also have a lot of noise with ubsan
15:18jenatali: Signed integer overflow?
15:18jenatali: I guess technically, assigning an out-of-range unsigned to a signed value
15:19karolherbst: it also reports signed shifts of 31 bits and stuff :)
15:19karolherbst: some say it's well defined, others say: probably bad coding :p
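On the shift point: in C, `1 << 31` with a 32-bit `int` shifts a 1 into the sign bit, which the standard leaves undefined and which UBSan's shift check reports. A common way to sidestep the argument is to do the shift on an unsigned value. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Top-bit mask built with unsigned arithmetic, so the shift is well
 * defined regardless of how `1 << 31` on a signed int is treated. */
static uint32_t top_bit_mask(void)
{
   return UINT32_C(1) << 31;
}

/* Mask for bit n. The shift count is reduced to 0..31 so it can never
 * reach the width of the type. */
static uint32_t bit_mask(unsigned n)
{
   return UINT32_C(1) << (n & 31u);
}
```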
15:21jenatali: I'll merge that one later today though, it's pretty straightforward and already reviewed
15:26linusw: pcercuei: sadly No... I don't have that panel.
16:04alimon: tomeu: hi, to run mesa deqp on freedreno devices, which mustpass files does it use? https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/.gitlab-ci/deqp-runner.sh
16:08zmike_: did we just run out of space on the git server?
16:09ajax: looks like...
16:09ajax: gitlab admin panel is telling me the /gitlab-data partition is at 393G/393G
16:10ajax: being worked on, i'm told
16:11anholt: alimon: .gitlab-ci/build-deqp*.sh
16:20daniels: fixed now
16:21tagr: daniels: not sure if you're the right person for this, but I'm wondering if there's some document that describes how the gitlab CI works on freedesktop?
16:21daniels: tagr: uh
16:22daniels: tagr: generally, GitLab itself has quite good docs; for Mesa specifically there are some explanatory docs, especially for particular areas like running on-hardware testing; the ci-templates project has docs explaining how the general container+build flow works
16:22daniels: what do you want to know?
16:23tagr: daniels: basically I'm wondering how we could participate from the Tegra side
16:24tagr: daniels: if we wanted to enable some tests, what would be the best way to do so? would we need to set up a lab that the CI can connect to, or would it be better if we donated hardware that could be hooked up in some existing lab?
16:25daniels: tagr: it's better if you can set one up and maintain it yourself, unless you can convince Collabora or anholt to physically maintain it for you :P
16:26daniels: there's a fair bit of information on https://docs.mesa3d.org/ci/index.html
16:26daniels: Collabora uses LAVA (which has a separate page linked from there) for all our test work, because it means that we can share the workload with both KernelCI and internal tests as well (e.g. boot-testing on the same devices for customer projects)
16:26anholt: and if there's missing stuff, let me know! I'm trying to make those docs enough to reasonably set up your own lab.
16:27daniels: Google/freedreno and also the etnaviv WIP use the 'bare-metal' setup, which is rather more simple as it has fewer moving parts than LAVA, but does mean you have to dedicate your setup to only GitLab CI
16:27daniels: (again that's also well documented under that page)
16:29alimon: anholt: i see thx
16:30tagr: daniels: okay, that's a great reference page
16:31tagr: I'll work through that and then think about if we can hook that up with our existing testing infrastructure
16:32tagr: I'm not sure the existing infrastructure is going to be a good fit because it does sometimes have intermittent failures, especially on some of the older boards (Tegra20, Tegra30)
16:32tagr: although I guess those aren't all that interesting from a graphics point of view (yet)
16:32daniels: tagr: you can mostly thank anholt for those docs
16:33tagr: anholt: thanks for the docs =)
16:33daniels: tagr: re intermittent failures, it depends what they are - if you can detect those within e.g. dEQP runs, we already have skip/flake lists we use to blacklist anything which isn't reliable
16:33daniels: if it's more like physical infrastructure failure, then mm ... that's not ideal, but if you can autodetect it it's certainly better
16:40tagr: daniels: I think most of the time when it happens the devices will just refuse to boot
16:40tagr: I suppose that would fall in the category of "easily detectable"
16:44anholt: tagr: yep. in baremetal we're watching the serial and when we get matches to known issues we just try rebooting into the tests again.
17:00ajax: i know i've asked this before, and probably i should ask mesa-dev, but: does _anyone_ care about building on macos?
17:00anholt: I don't know of anyone shipping anything on macos
17:01bnieuwenhuizen: zink on moltenvk maybe?
17:01anholt: I think there is some long-tail usage in ports
17:02ajax: bnieuwenhuizen: i mean, that'd be hilarious and awesome, but afaict the thing we build for macos is only a libGL that would work with X11.app. we don't build a driver that would plug in under OpenGL.framework.
17:04bnieuwenhuizen: ajax: consumer I have in mind is codeweavers/wine (though not entirely sure of official plans there or whether it materialized at all), so likely wouldn't even be native GL
17:05bnieuwenhuizen: (and not saying you should block it on this, as I'm not even sure anything materialized yet)
17:08dcbaker[m]: ajax: homebrew was still shipping llvmpipe when we did the meson conversion, and let me know I broke everything, lol
17:26ajax: dcbaker[m]: blink. llvmpipe? not the cgl plugin thing?
17:27ajax: i mean i understand wanting to avoid apple's gl in like a general sense, but picking llvmpipe over hardware accel is an interesting choice
17:28dcbaker[m]: I think it's mostly people doing osmesa kind of stuff. My understanding is that to do hardware accel you have to use Apple's GL frontend and LLVM
17:29ajax: the applegl code in mesa dates to well before the whole gcc gplv3 llvm kerfuffle
17:30ajax: the whole reason it's there is to make glx + x11.app go fast
17:33daniels: ajax: zink on moltenvk is literally a thing
17:33daniels: I know of at least one commercial product using it
17:33ajax: daniels: that's glorious
17:33daniels: (not wine)
17:34ajax: what's the winsys binding even look like for that. i guess molten has something there?
17:37daniels: the last I heard, it was literally ReadPixels
17:37daniels: (i.e. they had their own thing to do ReadPixels followed by native present, not that they plumbed that through AGL or whatever)
18:04airlied: ajax: vlee seems to keep building on osx as well I think
20:52bnieuwenhuizen: cheakoirccloud: I don't see explained what your issue is
21:06cheakoirccloud: I always do that 😂
21:11krh: can somebody please layer d3d on top of zink on top of moltenvk?
21:13bnieuwenhuizen: krh: the d3d -> moltenvk route directly is already being done
21:14krh: bnieuwenhuizen: but I wanted all the APIs in there
21:27vsyrjala: to run one set of tests and get full coverage for all APIs at once? :)
21:28dcbaker[m]: What about mantle? I won't be impressed until mantle is in there
21:29bnieuwenhuizen: dcbaker[m]: grvk (mantle on vulkan) is being worked on :)
21:30dcbaker[m]: I know. I didn't realize they'd released a spec for it or I might have tried my hand at it. Gotta get my tell games with mantle support working in wine, lol
21:31bnieuwenhuizen: though it is hard to put it as an intermediate layer. sounds like we might need to go grvk -> vallium+d3d12 -> vkd3d -> vulkan -> moltenvk but then we miss GL
21:31dcbaker[m]: *two games
21:32dcbaker[m]: Someone just needs to write a vulkan->OpenGL layer. I know someone's thought about it
21:33bnieuwenhuizen: can we do vallium on top of virgil on top of GL ?
21:35vsyrjala: should layer all the apis based on the dates when they were published
21:36kisak: going in and out of gles is going to be painful
21:37kisak: or just gles 3.2?
21:38ajax: i won't be happy until we have frontends for glide, rredline, and quickdraw 3d rave
21:38ajax: maybe irisgl in there too, why not