00:00karolherbst: anholt: okay so yeah, it was just stale files I had locally
00:00karolherbst: sorry for the noise
00:01karolherbst: hehe, "ERROR: unknown nir_intrinsic_op convert_alu_types" :) now to implement it for real
00:13jekstrand: karolherbst: Wrote my own libCLC linker: https://paste.centos.org/view/101a96c9
00:13jekstrand: 21 LOC :)
00:13karolherbst: jekstrand: will the convert intrinsic have fully qualified bit_sizes in its type declarations?
00:13karolherbst: ohh, it does
00:14jekstrand: karolherbst: Yes, nir_validate ensures you have fully qualified bit_sizes everywhere.
00:14karolherbst: ahh, cool
00:14jekstrand: Wanted to make it easy to use
00:14karolherbst: that makes stuff easy for me
00:14jekstrand: deqp-vk: ../src/intel/compiler/brw_fs_nir.cpp:5429: void fs_visitor::nir_emit_intrinsic(const brw::fs_builder&, nir_intrinsic_instr*): Assertion `!"unknown intrinsic"' failed.
00:14jekstrand: Hey, look, convert_alu_types!
00:14jekstrand:is hitting it too
00:14jekstrand:turns on the lowering pass because he's lazy
00:19jenatali: jekstrand: That libclc linker won't work, you need to copy all the variables too
00:19jekstrand: jenatali: Bah!
00:19jekstrand: You're right!
00:19jekstrand: Good thing I don't care about trig functions today. :)
00:19jenatali: Or async? :P
00:19jekstrand: or that
00:19jenatali: Those need the system value variables
00:20jekstrand: I figured there was a reason......
00:25AndrewR: karolherbst, https://pastebin.com/7M30ZczW - !6569 fails on 32-bit ....
00:26karolherbst:gccs error prints with C++ makes me not want to debug such issues
00:26airlied: lols pastebin warned it was potentially offensive
00:27karolherbst: pastebin isn't wrong :p
00:27AndrewR: karolherbst, thing is, it was compiling with clang-10 ....
00:28karolherbst: even java generates better errors
00:29karolherbst: but honestly?
00:29karolherbst: no idea what I did wrong there...
00:30karolherbst: maybe it just broke due to rebases? no idea..
00:30karolherbst: I'll give it a compile
00:31AndrewR: karolherbst, may be upgrading my stl_pair.h file will help (not sure if it will break something else ..)
00:32karolherbst: jekstrand: sooo.. let's see how much is broken
00:32karolherbst: but it does seem to work :)
00:33karolherbst: jekstrand: does it also convert conversion to/from bool?
00:33karolherbst: mhhh, I guess it does
00:34jenatali: I don't think you'll get that from the SPIR-V generator, but you probably will from the lowering pass
00:34karolherbst: need to check where OpenGL regresses
00:34karolherbst: " 1 file changed, 51 insertions(+), 44 deletions(-)" :D
00:34karolherbst: I kind of hoped I'd remove more code
00:35jenatali: So close
00:35karolherbst: well.. I had to add some helpers
00:35karolherbst:should make more use of Converter::convert(nir_alu_type type)
00:36karolherbst: I can potentially remove quite a lot of code then
00:36karolherbst: and undef needs special handling anyway :/
00:37karolherbst: but I guess it's worth the clean up
00:37karolherbst: AndrewR: yeah.. could be
00:39karolherbst: ahh yeah...
00:40karolherbst: the hw hates me :D
00:40karolherbst: " gr: GPC1/TPC2/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 140009 [ILLEGAL_INSTR_ENCODING]"
00:40karolherbst: but those are because I don't lower invalid combinations
00:42karolherbst: AndrewR: the libstdc++ is from gcc, no?
00:42karolherbst: 5.5.0 feels "old"
00:42AndrewR: karolherbst, yeah, but new libstdc++ breaks api/abi, no?
00:42karolherbst: it never broke the ABI/API by default
00:43karolherbst: you had to explicitly tell it to use the new one
00:43karolherbst: but c++14 should be supported in 5.5
00:43karolherbst: maybe I used something newer? dunno
00:48karolherbst: AndrewR: yeah.. no idea, it compiles fine here
00:49karolherbst: I could imagine that I just use a newer feature than I should, but no idea how to tell gcc to check that
00:53AndrewR: karolherbst, is there way to force mesa build to use clang/llvm (?) version of headers?
01:02AndrewR: karolherbst, https://github.com/abseil/abseil-cpp/issues/731 ?
01:21AndrewR: interesting - after I installed beignet (intel's old openCL impl) piglit started to fail: File "/home/guest/botva/src/piglit/framework/core.py", line 174, in collect_system_info result[name] = out.decode('utf-8') UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 2988: invalid start byte
01:24AndrewR: https://pastebin.com/RHQ5hhS9 - piglit error
03:19AndrewR: may be this? https://en.cppreference.com/w/cpp/language/class_template_argument_deduction
03:21dcbaker[m]: AndrewR: can we just forget about beignet?
03:42AndrewR: dcbaker[m], sure, I just tried to compile it .... was not expecting piglit being upset :}
03:43dcbaker[m]: It has a lot of bugs. I seem to remember that it caused clinfo to output garbage, this the results you're seeing
04:18AndrewR: dcbaker[m], no problem , it was easy to deactivate
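The UnicodeDecodeError above comes from strictly decoding tool output that is not valid UTF-8. A minimal sketch of a tolerant decode follows, assuming the goal is only to keep collecting system info when a tool prints garbage. The function name `decode_tool_output` is hypothetical and this is not piglit's actual code or fix.

```python
def decode_tool_output(raw: bytes) -> str:
    """Decode subprocess output without raising on invalid UTF-8 (hypothetical helper)."""
    # errors="replace" substitutes U+FFFD for each undecodable byte
    # instead of raising UnicodeDecodeError.
    return raw.decode("utf-8", errors="replace")


# 0xa0 cannot start a UTF-8 sequence, which is the "invalid start byte"
# case from the traceback. The sample bytes are made up for illustration.
garbled = b"Device Name\xa0Intel"
text = decode_tool_output(garbled)
```

The trade-off is that bad bytes are silently turned into replacement characters, so the stored result no longer matches the raw output byte for byte.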
04:19AndrewR: karolherbst, http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20190812/283920.html - vaguely similar error with resolution http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20190812/283936.html
04:20airlied: anholt: are tracie results delayed somewhere?
04:20airlied: reports some results, but not seeing them on the dashboard
04:20sumits: sravn, were you able to get to the panel patches? If not, and there aren't any more review comments, I could go ahead and merge them.
04:52anholt: airlied: apparently they show up "eventually"
04:52anholt: I made a fuss about it in a bug, nothing happened.
04:53anholt: I had to go look at the individual fail images from the log, which worked, and compare them to an existing run on the dashboard
04:57lrusak_: airlied, I sorted the issue. It seems I wasn't using VK_QUEUE_FAMILY_IGNORED when defining the VkImageMemoryBarrier. So I guess intel and amd are different when it comes to graphicsqueue and present queue
05:08sravn: sumits: Still way behind - sorry. What panel patches do you have in mind?
06:38sumits: sravn - the novatek poco f1 panel ones
06:44sumits: sravn: https://patchwork.kernel.org/project/dri-devel/list/?series=341993 - it is in its v7, and it didn't look like there were any more comments; bamse has also given his r-b
07:40mareko: well, we'll need much better CPU topology cpuid code in Mesa, because it doesn't really do anything on some Ryzens
07:41mareko: I'm also adding pipe_context::multi_draw, but only generated by u_threaded_context
07:56kusma: anholt, ajax, jekstrand: I guess you've pieced things together by now, but the drisw-stuff was really just there to get results quickly. airlied implemented "proper" DRI-support a long time ago. We suspected that the drisw code-path would make things at least *work* on NVIDIA etc, but that doesn't actually seem to work, beats me why...
07:58kusma: But yes, implementing Vulkan WSI support is probably a good idea in the longer term. In fact, I might even have a business-reason for looking into that soonish.
09:01icecream95: AndrewR: For the error on !6569, you could try replacing "new root_buffer" with "std::make_unique<root_buffer>"
09:48daniels: anholt: being worked on
09:56tpalli: mesa CI says 'Pipeline blocked' but even manual 'play button' does not seem to work, maybe there's something wrong there
09:57tpalli: this happened with MR 7025
10:02MrCooper: tpalli: looks like no detached pipeline was created, maybe because Eleni doesn't have at least developer status in mesa/mesa?
10:02MrCooper: the non-detached pipeline seems to work though on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7025/pipelines
10:02tpalli: MrCooper ok that could be it .. I tried manually but it did not seem to launch itself
10:03MrCooper: if you mean the "Checking pipeline status" on the Overview tab, that might be waiting for a detached pipeline
10:04MrCooper: it should work fine when assigning to Marge though
10:05tpalli: I'll wait for possible objections and assign to Marge later
10:51AndrewR: icecream95, you saved me from upgrading whole toolchain ! It works!
10:52AndrewR: [710/710] skip: 73, pass: 495, fail: 43, timeout: 1, crash: 98 :}
11:11AndrewR: and with CL 1.2 override (both fixed !6569 and !4974 applied) : [710/710] skip: 57, pass: 499, fail: 52, timeout: 1, crash: 101
11:17AndrewR: https://yadi.sk/d/wPDkTyuU40sj4g?w=1 piglit results (json). You can download them by clicking on small 'down' arrow in top right corner of page .. (not best interface, but at least seems to work)
12:26MrCooper: tpalli: now a detached pipeline was created, maybe it was just fluke before
12:53tpalli: MrCooper 👍
15:02jadahl: am I supposed to be able to drmModeAtomicCommit() a transaction containing things (primary plane, connector mapping) for CRTC A in one commit (with allow-mode-set, page-flip-event flags, async flags), then do the same but for CRTC B, without waiting for the former one to finish?
15:03jadahl: or for mode sets, do I have to update all CRTCs within one commit?
15:15daniels: the former is completely OK
15:16daniels: but be aware of https://gitlab.freedesktop.org/wayland/weston/-/issues/24
15:16daniels: and the dri-devel thread on that recently
15:20jadahl: daniels: i get EBUSY on just the second CRTC mode set without hotplug
15:20daniels: any substantial reconfiguration of the first CRTC will cause that
15:21daniels: if you do anything which effectively touches the FIFO - turn planes on/off, resize planes, change tiling modes, etc - Intel will pull in all CRTCs into the global state to redraw its watermarks
15:21jadahl: it wasn't reconfigured, I only enabled a second monitor. it was "re"configured with the same property values though
15:22jadahl: maybe it's still what happens
15:27jadahl: seems to indeed be that. making mode sets sync fixed it
15:29jadahl: another thing, if I first configure e.g. a cursor plane with one commit, then with a second commit reconfigure the primary plane and set the page flip event flag, is this not supposed to be allowed? I get EBUSY here too
15:29daniels: not allowed
15:29daniels: you get one commit per CRTC per cycle
15:30jadahl: so I can only commit once per page flip per CRTC
15:30daniels: nice to see mutter moving along with atomic :)
15:30jadahl: it's kind of working with a single monitor atm
15:30jadahl: and the second too after the mode set being sync
15:31jadahl: but I should probably change that to commit all crtcs together (which I assume will be allowed)
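The "one commit per CRTC per cycle" rule above can be sketched as simple bookkeeping: while a commit is in flight for a CRTC, later updates for that CRTC are merged and held until its page flip event arrives. This is only an illustrative sketch. The class `CrtcCommitGate`, the `submit` callback, and the property names are hypothetical, and no real libdrm call is made.

```python
class CrtcCommitGate:
    """Hold back updates for a CRTC that already has a commit in flight."""

    def __init__(self, submit):
        self._submit = submit      # hypothetical callable(crtc_id, props) doing the real commit
        self._in_flight = set()    # CRTCs waiting for their page flip event
        self._queued = {}          # crtc_id -> merged properties waiting for the next cycle

    def update(self, crtc_id, props):
        """Commit now if the CRTC is idle, otherwise merge into the queued state."""
        if crtc_id in self._in_flight:
            self._queued.setdefault(crtc_id, {}).update(props)
            return False
        self._in_flight.add(crtc_id)
        self._submit(crtc_id, dict(props))
        return True

    def on_page_flip_event(self, crtc_id):
        """The previous commit completed, so flush anything queued meanwhile."""
        self._in_flight.discard(crtc_id)
        pending = self._queued.pop(crtc_id, None)
        if pending:
            self.update(crtc_id, pending)
```

Note that this does not cover the case daniels describes, where a substantial reconfiguration of one CRTC pulls the other CRTCs into the same global state. Committing all CRTCs together, or making mode sets synchronous, would still be needed there.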
15:39jadahl: the cycle "deadline", can it be queried? or is it the same as the drm page flip event timestamp?
15:46karolherbst: airlied: atm clover fails to load the device if libclc isn't there, but I guess this is fine
15:47jenatali: karolherbst: Yeah, we had a bunch of IRC discussion about that and it sounded like that was fine
15:47karolherbst: I just wished we had an explicit error about it
15:47karolherbst: just set up that stuff on another machine and I was wondering why clinfo didn't show my device :D
15:49karolherbst: test_allocation buffer gives me this now: BUG: kernel NULL pointer dereference, address: 0000000000000000 :D
15:50karolherbst: kernel regressions, the best kind of regressions
16:25MrCooper: jadahl: the timestamp corresponds to the end of vertical blank, but the deadline is normally at the start of vertical blank (or possibly even earlier)
16:26jadahl: MrCooper: "possibly even earlier" -> not something I can predict I suspect then
16:26jadahl: and I don't get to know about the start of the vblank either right?
16:28MrCooper: I'd say better leave at least 1ms gap before vblank, to avoid missing the deadline due to e.g. delays in the mutter process getting scheduled
16:28MrCooper: start of vblank can be calculated from the last timestamp and the mode timings
16:30jadahl: yea, wouldn't want to do it *right* before
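MrCooper's suggestion to derive the start of vblank from the last timestamp and the mode timings can be sketched as below. It assumes the flip timestamp marks the end of vertical blank (as stated above), a fixed refresh rate, and a progressive mode. The `Mode` class and function names are hypothetical, and the 1 ms margin is the gap suggested in the discussion.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    """Subset of mode timings needed here (hypothetical container)."""
    clock_khz: int   # pixel clock in kHz
    htotal: int      # pixels per scanline, including horizontal blanking
    vdisplay: int    # number of visible scanlines


def next_vblank_start_ns(flip_timestamp_ns: int, mode: Mode) -> int:
    """Estimate when the next vertical blank begins."""
    # Active scanout takes vdisplay * htotal pixels at the pixel clock.
    # pixels / kHz gives milliseconds, so scale by 1e6 to get nanoseconds.
    active_ns = mode.vdisplay * mode.htotal * 1_000_000 // mode.clock_khz
    return flip_timestamp_ns + active_ns


def commit_deadline_ns(flip_timestamp_ns: int, mode: Mode,
                       margin_ns: int = 1_000_000) -> int:
    """Latest time to submit, leaving a safety margin before vblank."""
    return next_vblank_start_ns(flip_timestamp_ns, mode) - margin_ns


# Standard 1920x1080@60 timings: 148.5 MHz pixel clock, 2200 pixels per line.
mode_1080p60 = Mode(clock_khz=148500, htotal=2200, vdisplay=1080)
deadline = commit_deadline_ns(0, mode_1080p60)
```

Since the real deadline may be even earlier than the start of vblank, this is an upper bound rather than a guarantee.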
16:32jekstrand: karolherbst, jenatali: I typed up a new libclc loader: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7034
16:32jekstrand: Haven't tested a line of it though. :)
16:38karolherbst: jekstrand: yeah.. because it looks like the HAVE_STATIC_CLC path wouldn't work at all at this point :p
16:39jekstrand: karolherbst: :P
16:39karolherbst: but I guess you planned to have a header file containing the blob?
16:39jekstrand: karolherbst: Yeah, there's zero build-system stuff setup for it :)
16:39karolherbst: I think I kind of like the idea of being able to embed it like this
16:39jekstrand: karolherbst: Yeah, that's the plan. The libclc_spirv_mesa3d_spv.h file is something I'm assuming dcbaker will make magically show up. :-)
16:40jenatali: We'll add the patches for setting up that header file when we get our upstream patches ready
16:40karolherbst: just wondering if the increased file size of libnir or whatever it gets linked into trips people up
16:40jekstrand: We can also zlib it like we do with some other stuff
16:40jenatali: jekstrand: For what it's worth, ours ended up named "spirv64-mesa3d-.spv.h"
16:40jenatali: Yeah, zlib is an interesting idea...
16:40karolherbst: jekstrand: I guess that works
16:40karolherbst: or maybe xz?
16:40jekstrand: jenatali: Works for me. I don't really care. You and dcbaker[m] can sort that out.
16:41jekstrand: karolherbst: Mesa already has a zlib dep
16:41karolherbst: but zlib is... so 1990 :p
16:41karolherbst: eh 1995 actually
16:41jekstrand: Which is why we can depend on it in a graphics driver without worrying about the fallout. :-)
16:41dcbaker[m]: karolherbst: zstd
16:41karolherbst: or zstd :p
16:42dcbaker[m]: we already do that other places and the API is really nice
16:42karolherbst: ahh, cool
16:42karolherbst: I guess that's the cooler xz these days?
16:42jekstrand: We already use zstd?
16:42jekstrand: Yeah, looks like that's what the disk cache uses
16:42dcbaker[m]: yeah, we use it instead of zlib if it's available
16:42jekstrand: So, yeah, use zstd for the static one
16:43jekstrand: I don't even think you need the zlib fall-back if jenatali is ok with zstd
16:43karolherbst: I guess zstd compression speed sucks but if we move that to compile time "nobody" will bother
16:43dcbaker[m]: I'm assuming we'll do the same thing we do in the cache (and the patches I sent Ken) and do zstd if possible and fall back to zlib
16:43jekstrand: karolherbst: That's the whole point of zstd. Compression time sucks decompression time rocks.
16:43jenatali: I think I'm fine with whatever here
16:44dcbaker[m]: cool, I'd rather not screw around with zlib and its 1990 streaming API :)
16:44jekstrand: jenatali: My point there is that I expect distros to use non-static because that's the "linux distro way" so I doubt anyone other than you will care about the static path.
16:44jenatali: jekstrand: Yep, understood
16:44jekstrand: Well, and maybe me when I want to rsync things to machines that may not have libclc installed. :)
16:44jenatali: jekstrand: it is convenient ;)
16:46jekstrand: dcbaker[m]: I've marked the MR as "devs can push" so feel free to push over it.
16:46jenatali: dcbaker[m]: FYI I had a couple patches to xxd.py for libclc usage: https://gitlab.freedesktop.org/kusma/mesa/-/commit/fac8529620c84a349054f6f7457723dcf370f4f4 and https://gitlab.freedesktop.org/kusma/mesa/-/commit/54eceec88d71894d71a7fec453fe0ad357bf7b07
16:46jenatali: I'm happy to create an MR for these, or anyone can cherry-pick them
16:47jekstrand: jenatali: That second one amuses me and makes me very sad. :-/
16:47jenatali: jekstrand: We also embed opencl-c.h and opencl-c-base.h using xxd.py FYI
16:47jenatali: I was really worried when I first bumped into that stupid limit
16:47karolherbst: what the
16:48karolherbst: how is that not a bug of msvc? :D
16:48karolherbst: but I bet the C standard has some silly exceptions for broken runtimes :p
16:49jenatali: Yeah, I dunno. It's easy enough to work around though
16:49anholt: yeah, longstanding limit, and iirc msvc wasn't the only one.
16:49jekstrand: It's not nearly as bad as if you hit the 65k limit on the number of types allowed. :-)
16:49anholt: note that the char array can be really slow to compile
16:50jekstrand: Actually, I think that one's 32K. It's a signed short IIRC.
16:50jekstrand: But, yeah, a big project with too many templates can do that to you, apparently. :-)
16:50anholt: (sigh, C not having include_bytes!())
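The idea of compressing the blob at build time and embedding it as a C byte array can be sketched as below. This is not Mesa's xxd.py. The function name, the generated symbol names, and the sample blob are hypothetical, and zlib is used only because it ships with Python, whereas the discussion above leans toward zstd. Emitting an array of byte values instead of a string literal is assumed here to be the workaround for the compiler limit mentioned above.

```python
import zlib


def blob_to_c_header(blob: bytes, name: str, per_line: int = 12) -> str:
    """Return C source declaring `name` as a zlib-compressed byte array."""
    compressed = zlib.compress(blob, 9)
    lines = []
    for i in range(0, len(compressed), per_line):
        chunk = compressed[i:i + per_line]
        # One "0xNN," token per byte, a fixed number of bytes per line.
        lines.append("   " + " ".join(f"0x{b:02x}," for b in chunk))
    body = "\n".join(lines)
    return (
        f"/* Generated file: zlib-compressed blob, {len(blob)} bytes uncompressed. */\n"
        f"static const unsigned char {name}[] = {{\n{body}\n}};\n"
        f"static const unsigned int {name}_uncompressed_size = {len(blob)};\n"
    )


# Stand-in payload: the SPIR-V magic number (little-endian) repeated.
sample_blob = b"\x03\x02\x23\x07" * 64
header_text = blob_to_c_header(sample_blob, "libclc_spv_example")
```

The loader side would then decompress the array once at startup, using the recorded uncompressed size to allocate the output buffer.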
16:51jekstrand: anholt: include_bytes sounds pretty nice.....
16:51jenatali: anholt: What language has that?
16:51anholt: jenatali: rust
16:51jekstrand: anholt: Does that do anything fancy like automatic compression?
16:51melissawen: danvet_, siqueira, I just sent an update of the vkms to-do; pls check if it's ok or if you guys have any suggestions to add
16:51anholt: jekstrand: nope
16:52jekstrand: anholt: Bummer. Though, given that it's built into the language, that sounds like it should be something they can change later.
16:52anholt: I think the usual answer is "compress your thing in build.rs, then include that"
16:52jekstrand: That's fair
16:52karolherbst: airlied, jekstrand: btw, I think my constant buffer clover MR is ready to get merged :)
16:52anholt: ahaha https://github.com/SOF3/include-flate
16:52jekstrand: karolherbst: Land it!
16:53jekstrand: anholt: Of course there's a crate for it!
16:53karolherbst: so, next: conversions :D (or images)
16:53anholt: that's pretty much how it goes.
16:53karolherbst: jekstrand: anything holding back the image support MR we have? at least I think I saw one
16:53jekstrand: karolherbst: You know how we've been talking about rewriting clover..... Maybe that'd be a good place to use rust?
16:53jekstrand: karolherbst: Not AFAIK
16:54karolherbst: no opinion on that :D I haven't found time to write rust code yet
16:54karolherbst: mhh 6th of november I could spend a full day of just doing rust stuff :D
16:54jekstrand: Rust is fun
16:54karolherbst: I can imagine
16:55jenatali: All the discussion on the mailing list made me finally actually read about it, but I haven't written any myself yet
16:55jenatali: I'll need to find some time and a good reason to do it...
16:55danvet_: melissawen, for the syzbot maybe link to the syzbot dashboard directly https://syzkaller.appspot.com/bug?extid=e7ad70d406e74d8fc9d0
16:55jekstrand: And I think it'd be easier to write an API front-end in it than a back-end compiler.
16:55karolherbst: ohh, probably
16:55jekstrand: And I think some of the safety features would be more natural there too.
16:55danvet_: in general for mailing list references please use lore.kernel.org, that's maintained by kernel admins, so higher chances it stays around
16:55danvet_: melissawen, otherwise lgtm, a-b: me
16:55karolherbst: just who wants to start that as I imagine this could take some time
16:57jekstrand: karolherbst: That's the other great thing. The OpenCL API is tiny so it's actually tractable.
16:57jekstrand: You'd have to wrap some gallium stuff, of course.
16:57jekstrand: This is way too tempting of a saturday project
16:57jenatali: jekstrand, dcbaker[m]: Found the patch with the Meson to create the .h for libclc: https://gitlab.freedesktop.org/kusma/mesa/-/commit/49f532c9cb8240cc90c1ffbebc147624a49933b9 - again, we'll include it in our upstreaming MR or feel free to pick pieces of it sooner
16:58anholt:started putting together ci image bits for rust the other day
16:58anholt: haven't tried actually putting any .rs in the build. I have a bunch of questions as to how that works in meson, and couldn't really find meson docs.
16:59karolherbst: jekstrand: sure, but I can do that on work time :p just the next possibility would be 6th of november, or I just do it anyway, but then I'd feel bad about other more important things I have to finish first :p
16:59jekstrand: karolherbst: Fair enough. :-)
16:59karolherbst: but if someone were to start I could pick it up then
16:59jekstrand: karolherbst: I can't make any guarantees
17:00jekstrand: It just sounds like a good bit of fun
17:00karolherbst: well, it's just a month away anyway
17:00karolherbst: ohh, sure
17:00karolherbst: maybe we then rewrite vaapi and vdpau next :p
17:00dcbaker[m]: jenatali: I've picked the first patches you'd mentioned.
17:00karolherbst: might want to include curro in the discussion though, as I suspect curro might have a strong opinion here :p
17:00karolherbst: or we do it anyway
17:01karolherbst: don't care
17:01dcbaker[m]: anholt: due to oddness in rust you have to create a static_library of just your rust files and link that into your final thing (if it includes C or C++)
17:01karolherbst: dcbaker[m]: well.. sounds like the thing which happens with c++ and C anyway :p
17:01anholt: that part sounds fine, it's stuff like "how do you control including std?" and "can it represent modules like cargo does" and "how do you do rust unit tests?"
17:01karolherbst: but I guess rustc has an included linker which just calls into lld?
17:02dcbaker[m]: anholt: that's all questions for nirbheek (on the meson channel) he's done the most work trying to get meson and cargo to play nice
17:03dcbaker[m]: oh god, no, rustc is terrible for linkers
17:03dcbaker[m]: it won't tell you what it's going to invoke
17:04dcbaker[m]: and on windows it invokes link.exe or lld-link.exe
17:04dcbaker[m]: on macos it invokes apple's clang as the linker
17:04dcbaker[m]: and on linux it invokes gcc as the linker
17:04dcbaker[m]:spent a lot of time trying to make that all work in meson
17:22karolherbst: btw.. is there anything better than grub2-reboot?
17:38jekstrand: Since when did glsl_type::is_scalar start returning true for images and samplers?!?!!?
17:41jekstrand: I guess it sort-of makes sense for GL-style bindless but it's 100% bogus 99% of the time.
17:42jekstrand: c618f31065d9e9d0e19afab7de8202ef609a731d apparently
17:47jekstrand: karolherbst: I'm running into issues with your runner where some tests write more stdout than it can handle.
17:48Venemo: jekstrand: is this a good time to bug you about a review?
17:48jekstrand: Venemo: No, I've got a thing in 10 minutes
17:49Venemo: allright, no rush
17:49jekstrand: Feel free to bug me again in an hour or so
17:50karolherbst: jekstrand: ehh.. really?
17:50jenatali: Which test?
17:50jekstrand: the async copy tests
17:50jenatali: Ah yeah, they spew a lot
17:51jekstrand: It would probably help if the right data got copied. 🤣
17:51karolherbst: I mean.. I would be happy to rework all of that and make a proper runner handling all of that perfectly :p
17:51jekstrand: It'd probably spew less then
17:52jenatali: Not by much :P
17:52karolherbst: that's the one with the huge matrices, no?
17:53jekstrand: Hrm... It's getting data. Just not the right data....
18:05melissawen: danvet_, oh ok.. thx!
19:03dcbaker[m]: jekstrand, jenatali, karolherbst: I have this //gitlab.freedesktop.org/dbaker/mesa/-/commits/wip/2020-10/generic-clc-enable
19:03dcbaker[m]: of course, I don't have a new enough clc, and I don't really look forward to rebuilding LLVM from source
19:03dcbaker[m]: so no idea if any of it actually works
19:05airlied: dcbaker[m]: you don't need to rebuild llvm
19:05airlied: you just need to rebuild libclc out of the llvm monorepo
19:05karolherbst: airlied: so I can just build libclc-master with my system llvm-10? neato
19:06jekstrand: dcbaker[m]: I gave you the 64-bit SPIR-V file. Just drop it in /usr/lib64/clc
19:06airlied: karolherbst: yes you just need slt build against llvm 10
19:07airlied: which for fedora is dnf installable
19:07karolherbst: ahh, cool
19:07dcbaker[m]: jekstrand: and i need a pkg-config file... I might as well just build clc out of the monorepo
19:07karolherbst: dcbaker[m]: just install libclc-devel :p
19:07karolherbst: and drop the file in
19:07jekstrand: That's what I did. It works.
19:07dcbaker[m]: karolherbst: gentoo...
19:07karolherbst: it should have a libclc package, no?
19:08jekstrand: karolherbst: Yeah, but it involves building all of LLVM. 'cause gentoo. 😥
19:08dcbaker[m]: yeah, but it wants to link against amd or nvidia, which I don't have configured...
19:08dcbaker[m]: which would mean rebuilding LLVM
19:08karolherbst: dcbaker[m]: ehhh
19:08dcbaker[m]: I'm gonna just build clc out of the repo
19:08karolherbst: libclc depends on the PTX or AMDGPU llvm target?
19:08jekstrand: sounds like a plan
19:08dcbaker[m]: I think it's just a portage issue
19:09karolherbst: yeah.. let me check
19:09dcbaker[m]: but fixing that sounds harder than coercing cmake to do something useful
19:10karolherbst: dcbaker[m]: ehhh...
19:10karolherbst: the ebuild is crap
19:10karolherbst: so you need to enable one target to get it built at all
19:10karolherbst: I guess we could convince it to always build the spirv one...
19:11jenatali: dcbaker[m]: Looks pretty cool :) though you did misspell spirv in a couple places
19:11dcbaker[m]: and fixing that sounds unpleasant and unrewarding
19:11karolherbst: dcbaker[m]: somebody has to do it sooner or later
19:11dcbaker[m]: yeah, since I don't have clc I can't convince vscode to autocomplete for me :)
19:12jekstrand: dcbaker[m]: Why use python rather than just depending on /usr/bin/zstd?
19:12karolherbst: mattst88: I am sure we can convince you to fix the libclc ebuild?
19:12dcbaker[m]: jekstrand: TIL there's a command line tool for zstd
19:13DrNick: it isn't a real compression algorithm unless it has a drop-in replacement for gzip and an additional letter for tar
19:13mattst88: karolherbst: I really wish someone else would. I don't use it and it's just a pain in my ass
19:14karolherbst: mattst88: I am sure there is a way we can make mesa the default OpenCL impls for your hw :p
19:14karolherbst: then you'd use it
19:14karolherbst: problem solved
19:16jenatali: dcbaker[m]: Once you have a libclc spirv blob, I'm curious what kind of compression you actually get out of it
19:16jenatali: I dunno how well SPIR-V compresses in general
19:16jekstrand: jenatali: 3.3M -> 761K
19:16jekstrand: jenatali: With zstd
19:16jenatali: Ooh, that's worth it
19:16mattst88: karolherbst: please do :_)
19:17karolherbst: mattst88: already working on it :p
19:17jenatali: Though for us we're bundling clang/llvm already, so that's a 3% savings :P
19:17jekstrand:hides his iris branch
19:17karolherbst: jekstrand: it's too late
19:22dcbaker[m]: is the spirv-llvm-translator the tool that libclc is looking for?
19:22jenatali: Yeah, that's needed to build the spir-v version of libclc
19:24Venemo: jekstrand: pingy ping. I added your r-b to the first two but there still are 3 NIR GS commits in MR 6964 that I'd like to ask you to please review.
19:26airlied: jekstrand, karolherbst : if you do a rust clover, just go all in cl 3.0
19:26karolherbst: airlied: sure
19:27airlied: karolherbst: btw I vote for images :-P
19:27airlied: conversion tests in piglit seem to pass on llvmpipe :-P
19:28jekstrand: airlied: Of course!
19:28airlied: though I think it might have blown the CI time out of the water a bit
19:30karolherbst: airlied: but you wired up the intrinsic, no?
19:30karolherbst: that just sounds like so little work that I might do the basic
19:31karolherbst: move over OpenGL and not care about unsupported combinations for now
19:31karolherbst: that should like handle 70% of the stuff
19:32airlied: karolherbst: nope I did nothing except notice some tests pass
19:32jenatali: It's probably the lowering pass then
19:32airlied: I meant to investigate further to make sure it was sane passing :-P
19:32karolherbst: jenatali: yeah.. I totally forgot about that
19:32karolherbst: but I also wasn't aware it got wired up :)
19:37jekstrand: Yeah, I made clover unconditionally call it IIRC
19:38karolherbst: oh well..then it should work already indeed
19:39karolherbst: images it is then
19:39karolherbst: jekstrand: any work from you I can continue from or should I just start from whatever we have today?
19:39jekstrand: karolherbst: wip/clover-images. Don't grab the iris patches
19:40karolherbst: I like how you always stress how we should ignore the iris patches, but then it's always you reminding us you have those patches :p
19:41airlied: just put at INTEL_NOT_THE_REAL_CL=1 env var in there
19:41jekstrand: honestly, it's tempting
19:42karolherbst: hey.. if you have 5 or 6 OpenCL runtimes running on Intel hardware really doesn't make that much of a difference :p
19:42airlied: like at some point myself or karol will just write them ourselves and send an MR :-P
19:42karolherbst: airlied: pssst... otherwise people will find it
19:43jekstrand: airlied: What do you mean "write". Don't you mean re-author?
19:43karolherbst: why re-author?
19:43karolherbst: that stuff is trivial to write :p
19:43airlied: jekstrand: rebrand :-P
19:44karolherbst: I am sure if the patch is called "WIP WIP" nobody will find it ;)
19:44airlied:gets non-strict vulkan lines tests to pass on lavapipe by always drawing start and ends
19:49Lyude: hey danvet_ do you know anything about the prime_nv_pcopy.c tests? it looks like they actually contain some code to do some tiling through the CPU, am I right in assuming that's all just for intel's tiling formats and not nvidia's?
19:53airlied: Lyude: just intel tiling
19:54Lyude: ah cool
19:54jekstrand: karolherbst: Do the async copy tests pass for you?
19:54airlied: though in theory there was a common tiling format
19:54airlied: not sure we ever found it
19:54karolherbst: jekstrand: yes
19:54jekstrand: karolherbst: Bummer
19:54jekstrand: Something's wrong on my end then
19:54karolherbst: jekstrand: all tests I include by default in my runner pass
19:54jekstrand: I have no idea what
19:54Lyude: i'd be a little surprised, I thought a lot of nvidia's tiling formats used compression (or pretty much all of them from what I can tell? maybe not the legacy ones but I'm not sure about that)
19:55airlied: Lyude: usually compression requires tiling, but not vice-versa
19:55Lyude: airlied: erm-yeah sorry, what you just said sounds right
19:55Lyude: i just know compression is specified in the fourcc modifiers
19:56jenatali: jekstrand: What's the failure look like?
19:56Lyude: (if I'm reading the docs on them right at least…)
19:56jekstrand: 0 -> [2d ] != [3c ]
19:56jekstrand: 1 -> [8b ] != [35 ]
19:56jekstrand: 2 -> [ 0 ] != [ 5 ]
19:56jekstrand: 3 -> [ d ] != [aa ]
19:56jekstrand: 4 -> [b0 ] != [eb ]
19:56jekstrand: 5 -> [2b ] != [17 ]
19:56jekstrand: It's getting data; just not the right data
19:56karolherbst: yeah, that looks busted :p
19:56jekstrand: karolherbst: Thanks. Very helpful.
19:57jenatali: jekstrand: Can you pastebin the full failure for a test case? I had some fun debugging those tests on our side a while ago
19:57jekstrand: jenatali: How do I make it dump the full thing?
19:57jenatali: Uh... I just meant all the text that it outputs
19:57jenatali: Though I think it might have a #define in the code that you can rebuild it to output more info
20:00jekstrand: jenatali: It becomes 0 at 393120
20:00jekstrand: Not sure if that's important
20:01jekstrand: that's for short
20:01jekstrand: 93183 for char
20:02jenatali: jekstrand: What do you mean "it becomes 0"?
20:02jekstrand: Here are two consecutive lines:
20:02jekstrand: 93183 -> [46 ] != [62 ]
20:02jekstrand: 116480 -> [15 ] != [ 0 ]
20:02jekstrand: After that, all the failed comparisons are != 0
20:02jenatali: Oh, interesting
20:02jekstrand: But it's not every one. It's like every 3rd that fails or so
20:03karolherbst: jekstrand: mhh
20:03karolherbst: jekstrand: maybe you run out of threads?
20:03jekstrand: karolherbst: It's totally possible
20:04karolherbst: try to reduce the amount of threads reported
20:04karolherbst: to the absolute lowest number the hw can always do
20:04karolherbst: clover doesn't really respect GPR constraints on launches
20:04karolherbst: so if your kernel uses all your registers it still launches with max threads
20:09jekstrand: What's MAX_COMPUTE_UNITS
20:12karolherbst: jekstrand: I think that's just a fyi
20:12karolherbst: some info the API gives to applications
20:12karolherbst: what you need to limit is PIPE_COMPUTE_CAP_MAX_THREADS_PER_BLOCK
20:13karolherbst: just return the value you can support given the worst case of everything
20:39jekstrand: dcbaker[m]: Thanks!
20:45jekstrand: Hrm... This might have something to do with it:
20:45jekstrand: shared-size: 0
20:45jenatali: Heh, that'd do it
20:47jekstrand:wonders what the wrapper does with local
20:47jekstrand: the kernel entrypoint wrapper
20:48jekstrand: For that matter, how do we handle shared at all?
20:48jekstrand: Is it derived from the kernel or does the API set it somewhere?
20:48jenatali: If it's declared as a local array inside the kernel, then it's derived from there
20:49jenatali: But you can also pass local args to the kernel, where the size of the array comes from the API
20:49karolherbst: jekstrand: sooo.. there are two places where we get the shared size in drivers
20:49karolherbst: in the nir_shader _and_ when launching the grid
20:49karolherbst: the driver has to add both values to get the final required size
20:49jekstrand: karolherbst: How do we handle making sure they don't overlap?
20:50jekstrand: karolherbst: I don't see anything in pipe_grid_info
20:50karolherbst: why would they overlap?
20:51karolherbst: the nir tells us how much it needs
20:51karolherbst: API provided buffers are added on top
20:51karolherbst: at least I think that's how it works :O
20:51jenatali: That's what we do, yeah
20:51jekstrand: So the API stuff starts at nir_shader::info::scratch_size?
20:51karolherbst: what scratch_size?
20:51jekstrand: shared_size, rather
20:51jekstrand: karolherbst: So where do I get the "dynamic" size?
20:52jekstrand: karolherbst: I don't see anything in grid_info. Is it "bound" somehow?
20:52karolherbst: mhh.. let me see
20:52karolherbst: ehh wait
20:52karolherbst: I think it's part of the CSO
20:52karolherbst: so you should get it through bind_compute_state
20:53jekstrand: Ah, req_local_mem
20:53jekstrand: I'll just add that to the NIR shader's requirement. :-)
20:53jekstrand: karolherbst: What about alignment?
20:53jekstrand: align to 8 before adding?
20:53karolherbst: I have this one patch
20:53jekstrand: That seems reasonable
20:54karolherbst: jekstrand: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/06c39e53dd60282faa795c0533b92e86be3b1b28
20:54karolherbst: I am still not sure if everything works out at runtime though
20:54jekstrand: karolherbst: Right
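The size calculation discussed above (the shared size declared by the NIR shader plus the API-provided size that arrives as req_local_mem, with an align-to-8 step before adding) could be sketched roughly as follows. This is only an illustration: the helper names are hypothetical and not Mesa functions, and it assumes that the shader's static portion is the one rounded up so that the API-provided buffers placed on top of it start 8-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next multiple of a; a must be a power of two. */
static uint32_t align_up_pot(uint32_t v, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper, not a Mesa API.
 * static_size:  shared memory the compiled shader declares for itself.
 * dynamic_size: shared memory requested through the API at bind/launch
 *               time (req_local_mem in the discussion above).
 * Assumption: the static part is aligned to 8 bytes and the dynamic part
 * is placed directly after it.
 */
static uint32_t total_shared_size(uint32_t static_size, uint32_t dynamic_size)
{
    return align_up_pot(static_size, 8) + dynamic_size;
}
```

For example, a shader that declares 13 bytes of shared memory and is launched with a 32-byte local argument would need total_shared_size(13, 32) bytes, i.e. 16 + 32 = 48 under these assumptions.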
20:56karolherbst: yeah... I think it's a bit busted once you use both
20:56karolherbst: but the CTS doesn't do that afaik
20:56jenatali: ... wow
20:56karolherbst: we need to prefill ctx.mem_local in core/kernel.cpp depending on what the kernel needs
20:56karolherbst: then it should work out
20:56jenatali: It's crazy to me that with how long the CTS takes to run, there's still gaps like that
20:57karolherbst: jenatali: well.. it takes long for stupid reasons
20:57karolherbst: remove 95% of the longest running tests and you probably only removed 5% of the features
20:58jenatali: Yeah, running everything wimpy doesn't take too long
20:58jenatali: Basic covers most of the features anyway
20:58jekstrand: Well, that doesn't fix it. :-/
20:58karolherbst: what test was it again?
21:01jekstrand: Ok, if I hack some things, I can get it passing
21:01jekstrand: Time to unhack
21:04jekstrand: Ok, here's an interesting thing: If I get over 4 workgroups, it starts failing
21:04jekstrand: 4 and below and it passes
21:13karolherbst: so yeah.. images are next :p
21:13karolherbst: jekstrand: ehh...
21:13jekstrand: Have fun. I think I did a bunch of the hard work for you
21:13jenatali: Same :P
21:18airlied: mareko: moving to llvm10 causes changes in radeonsi rendering traces
21:19airlied: piano seems most hit
21:20airlied: not that I can really see any diffs
21:21karolherbst: well, it does a lot of math
21:21karolherbst: ends up with a ton of fma stuff
21:21airlied:will likely just update the traces but maybe there's a bug
21:21karolherbst: it even has logarithmic and trigonometric stuff
21:33jekstrand: karolherbst: BTW: I was hoping you'd do whatever plumbing clover needs for the new clc loader assuming it looks functionally ok to you.
21:33jekstrand: karolherbst: I don't really grok how that's plumbed today
21:35karolherbst: yeah, I guess I could do that or airlied could :p
21:35jekstrand: I'd be happy for airlied to do it too. :P
21:35karolherbst: figuring out images kind of motivates me more though
21:35jekstrand: that's fair
21:36jekstrand: I might be able to get motivated to figure out clover well enough to do it. I'm just starting from near 0 on that one
21:36karolherbst: and I already had some code wiring it all up somewhat
21:36karolherbst: there are just the internals which are a bit broken
21:41airlied:adds to list
21:42airlied: bleh now vk multisample lines test to fix
21:43jekstrand: uh... what?
21:45airlied:has lavapipe fails in dEQP-VK.rasterization.interpolation_multisample_4_bit.lines,Fail
21:53jekstrand: grrrr Seems to work fine in simulation
22:32jenatali: jekstrand: What was it?
22:32jekstrand: Not setting my shared memory size right
22:32jekstrand: iris isn't really plumbed for this
22:33jenatali: Cool, glad you found it
22:35jekstrand: I still don't get why whacking the NIR didn't work....
22:35airlied:has an llvmpipe fix for the shared stuff as well
22:37jekstrand: Yet another place where clover is different for no good reason
22:37jenatali: I mean, there's a decent reason? The nir is baked after shader build, but the shared size can change based on kernel args?
22:38jekstrand: They could both be dynamic without hurting anything
22:39jekstrand: But maybe not for everyone, I guess.
22:39karolherbst: you want to be able to hardcode some offsets
22:39karolherbst: it has.. some benefits
22:39karolherbst: but anyway
22:39karolherbst: some shared accesses have an indirect base offset, some have hardcoded ones
22:39karolherbst: and the runtime needs to know how big all hardcoded ones are, etc...
22:40jekstrand: In any case, I sorted it out. It's just that every time I come across one of these things where the delta is some gallium difference, it's really painful.
22:43jekstrand: Pass 96 Fails 6 Crashes 4 Timeouts 0
22:45jenatali: jekstrand: What's left?
22:45jekstrand: basic async_strided_copy_global_to_local
22:45jekstrand: basic async_strided_copy_local_to_global
22:45jekstrand: basic bufferreadwriterect
22:45jekstrand: basic hiloeo
22:45jekstrand: basic kernel_call_kernel_function
22:45jekstrand: basic kernel_memory_alignment_local
22:45jekstrand: basic constant_source
22:45jekstrand: basic kernel_memory_alignment_constant
22:45jekstrand: basic kernel_preprocessor_macros
22:45jekstrand: basic work_item_functions
22:45jekstrand: The constant ones will get fixed once karolherbst's MR lands
22:45jekstrand: async_strided* is me not advertising big enough buffer sizes
22:46airlied: jekstrand: it landed I thought
22:47jekstrand: airlied: Looks like I need a quick rebase
22:48jekstrand: I assume kernel_memory_alignment_local will get fixed by karolherbst's alignment patch
22:49karolherbst: jekstrand: for memory limits I have something like this: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/005e19cc9796d6b23b85c993f54908e971748636
22:49karolherbst: this is really an annoying thing
22:49karolherbst: but you also can't just advertise whatever your max VRAM size can be
22:49karolherbst: as this will just fail
22:50jekstrand: karolherbst: I just set it to 1 GB
22:50jekstrand: It probably needs a bit of help there
22:50karolherbst: that's fine :p
22:51karolherbst: I think nvidia reports 1/4 of the VRAM size for max alloc or something
22:51karolherbst: some drivers are a bit careful here
22:51jekstrand: I assume you'll have more than 1GB of ram in a modern system
22:53karolherbst: like runtimes care
22:53karolherbst: they probably care about max alloc size and that's it
22:54jekstrand: Pass 99 Fails 5 Crashes 1 Timeouts 0
22:54jekstrand: load_work_dim isn't being lowered by clover
22:55karolherbst: we have that native in codegen
22:55jekstrand: Where do I get that from?
22:55jekstrand: And how is it different from num_work_groups?
22:56karolherbst: it has all the stuff
22:56jekstrand: Sure but what does it mean?
22:56karolherbst: how many dimensions you have
22:57jekstrand:didn't realize that was configurable
22:57karolherbst: you can launch a 10x1x1 grid
22:57karolherbst: but there is an explicit API setting for that anyway
22:57karolherbst: clEnqueueNDRangeKernel has a work_dim parameter
22:57karolherbst: and when it's 1, the 2nd and 3rd dims get set to 1
22:57karolherbst: so you have your nice 1D group
22:58karolherbst: and can optimize a few things
22:58karolherbst: I mean.. it doesn't really matter all that much hence is just a fake value
22:58karolherbst: but hey...
22:58jekstrand: but not if it's not a compile-time constant. :-P
22:58karolherbst: it's in the API
22:59karolherbst: clover actually has support to move it as an argument
22:59karolherbst: same as grid_offset
22:59karolherbst: I just didn't wire it up
22:59karolherbst: as codegen supports it on its own already
22:59karolherbst: the lowering should be pretty straightforward
22:59karolherbst: just do the same as for grid_offset
22:59karolherbst: module::argument::grid_dimension is the thing you need to use
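As a rough illustration of the work_dim behaviour described above: clEnqueueNDRangeKernel takes a work_dim of 1 to 3 plus an array of that many global sizes, and the unused dimensions behave as size 1. The helper below is a hypothetical sketch of that expansion, not clover's actual code; the work_dim value itself is what the kernel later reads back (via get_work_dim() in OpenCL C), which is why it has to be passed along separately.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not clover code: expand a work_dim-sized array of
 * global sizes into a full 3D grid, filling unused dimensions with 1.
 */
static void fill_grid_dims(unsigned work_dim, const size_t *global_size,
                           size_t out[3])
{
    assert(work_dim >= 1 && work_dim <= 3);
    for (unsigned i = 0; i < 3; i++)
        out[i] = (i < work_dim) ? global_size[i] : 1;
}
```

With work_dim = 1 and a global size of 10, this yields the 10x1x1 grid mentioned above. A 10x1x1 grid launched with work_dim = 3 has the same shape, so the dimension count cannot be recovered from the grid alone.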
23:03jekstrand: Yeah, that'd be nice.
23:04airlied: jekstrand: lols mmap failed when I used your branch :-P
23:04airlied: oh it didn't, weird
23:04airlied: open_clc_data seems to return false when it works
23:05jekstrand: airlied: Oops
23:05jekstrand: airlied: I warned people on IRC that I didn't test it. :-P
23:06airlied: clinfo works now :-P
23:07jekstrand: kernel_call_kernel_function fail seems to be a validation issue
23:07jekstrand: airlied: Feel free to fix my patch if you want. Or you can provide a FIXUP if you want me to look at the changes.
23:07jenatali: jekstrand: Could be https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/598
23:08airlied: jekstrand: I'll push the fix + clover patch on top
23:08jekstrand: jenatali: Looks about right.
23:08jekstrand: airlied: Thanks!
23:08jekstrand: In that case, I think iris is about on-par with nouveau now \o/
23:08jenatali: Oh, we weren't disabling optimization at this point, so we were able to pass basic, but started to fail kernels calling other kernels once they were in separate translation units
23:10jekstrand:will likely be flying X-Wings the rest of the evening.
23:27AndrewR: karolherbst, thanks, merged version of "clover/nir: support indirects on in kernel constants" compiles and works on my end!
23:31jekstrand: basic async_strided_copy_global_to_local
23:31jekstrand: basic async_strided_copy_local_to_global
23:31jekstrand: basic bufferreadwriterect
23:31jekstrand: basic hiloeo
23:31jekstrand: basic kernel_call_kernel_function
23:31jekstrand: basic kernel_memory_alignment_local
23:31jekstrand: Pass 100 Fails 6 Crashes 0 Timeouts 0
23:31karolherbst: why do those fail though
23:31jenatali: I'd expect hiloeo to work
23:31jekstrand: karolherbst: The async tests are failing because they can't dump to stderr anymore with your script
23:31jenatali: (Which is still my favorite test name)
23:31jekstrand: karolherbst: call_kernel_function is a validation error
23:31jekstrand: karolherbst: memory_alignment_local is your patch
23:31jekstrand: karolherbst: Not sure about the other two
23:32jenatali: jekstrand: hiloeo is just vector swizzling
23:32jekstrand: Given how many 3D shaders we run through our pipeline, I'd be surprised if we got swizzling wrong
23:32jekstrand: Still, it's 100% possible
23:32jenatali: Could be failing once it gets to the high vector sizes though
23:32karolherbst: well, all of those pass for me :p
23:32jenatali: It's hi, lo, even, odd
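For reference, the four swizzles the test name refers to correspond to the OpenCL C vector selectors .hi, .lo, .even and .odd. The plain C sketch below shows what they select for a vector with an even number of components; it is an illustration with hypothetical helper names, not the CTS test itself, and it ignores the special handling of 3-component vectors.

```c
#include <assert.h>
#include <stddef.h>

/* All helpers assume n is even and out has room for n / 2 elements. */

/* lo: the first half of the vector. */
static void vec_lo(const int *v, size_t n, int *out)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++)
        out[i] = v[i];
}

/* hi: the second half of the vector. */
static void vec_hi(const int *v, size_t n, int *out)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++)
        out[i] = v[n / 2 + i];
}

/* even: the components at indices 0, 2, 4, ... */
static void vec_even(const int *v, size_t n, int *out)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++)
        out[i] = v[2 * i];
}

/* odd: the components at indices 1, 3, 5, ... */
static void vec_odd(const int *v, size_t n, int *out)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++)
        out[i] = v[2 * i + 1];
}
```

For an 8-component vector holding 0 through 7, hi selects 4, 5, 6, 7 and odd selects 1, 3, 5, 7. The 8- and 16-component cases are the "high vector sizes" mentioned above.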
23:34airlied: jekstrand: not sure why but clover ran two passes before caching
23:41karolherbst: jekstrand: maybe something goes wrong for 8/16 components :p
23:49karolherbst: jekstrand: was there an image test you got working at all?
23:49karolherbst: it all looks pretty busted
23:52karolherbst: ahh right