00:24 krh:blinks
00:41 hanetzer: agd5f_: posted some more info. it seems the issue is specific to this gpu, or this kind of gpu at least.
00:41 hanetzer: the old gpu works with the old patch and the new patch perfectly fine.
02:05 jenatali: jekstrand: CL patch dump almost done, just a few more, and then I have to completely rework how we handle conversions based on what we talked about :)
02:44 jekstrand: jenatali: Cool. I'll try and read through them tomorrow
08:29 pq: bl4ckb0ne, bump the image name, or FDO_DISTRIBUTION_TAG if you're using ci-templates.
08:30 emersion: pq: do you remember the discussions about sharing EDID code between the kernel and user-space? do you remember the potential solutions we discussed?
08:31 pq: emersion, not very well.
08:34 pq: to the point that I'd probably just repeat the discussion
08:35 emersion: ok
08:36 emersion: i'm trying to find some examples of sharing code between the kernel and user-space in general
08:41 pq: DRM headers is the only one I know of that has an explicit process and a userspace project repository.
08:42 pq: I suppose the kernel has to be the primary source for the code, so that people notice the code must be good for the kernel and you can't simply malloc() stuff.
08:43 pq: OTOH, the kernel would not use most of the parsed things...
09:36 pinchartl: danvet: ping
09:38 danvet: pinchartl, right out the door :-/
09:39 pinchartl: danvet: no urgency
09:39 pinchartl: I had a question about atomic_check and display port
09:39 danvet: pinchartl, I'll be back later today before I disappear for 2 weeks
09:39 pinchartl: ok
09:39 danvet: oh maybe Lyude is better for that
09:39 pinchartl: I'll ask now and you can answer later :-)
09:39 pinchartl: or Lyude can
09:39 pinchartl: my understanding of atomic_check is that it shouldn't test the hardware
09:40 pinchartl: however, to validate modes, I've seen drivers performing link training in atomic check
09:40 danvet: it depends
09:40 pinchartl: to know what speed is available
09:40 danvet: yup
09:40 pinchartl: I was wondering if it was allowed
09:40 danvet: modern hw has made this complicated
09:40 danvet: oh, that's bonkers
09:40 danvet: no link training in atomic_check
09:41 danvet: if a driver does that we have a problem :-)
09:41 pinchartl: tomba: ^^
09:41 danvet: so the way this is supposed to work is you read the dpcd stuff in probe
09:41 danvet: and use that in atomic_check to see whether it should work
09:41 danvet: then if the link training fails, either at commit time
09:41 danvet: or later on when you get hpd pulse
09:42 danvet: and you can't make the current mode work anymore, or something else fell apart
09:42 danvet: you set link-status to BAD
09:42 danvet: and also in your probe code you need to take the now more limited link bw into account for validating modes
09:42 pinchartl: but what if a monitor has just been plugged and the link is down before the output is disabled ? the link hasn't been trained yet, so we don't know how fast it will go, and atomic_check can't filter modes out accordingly, right ?
09:42 danvet: userspace can then get the updated mode list and pick a new one
09:43 emersion: please keep atomic_check fast™
09:43 danvet: if you don't have dpcd you can just reject it
09:43 danvet: userspace doing something stupid
09:43 danvet: since there's no way you'll be able to light it up without something connected anyway
09:43 danvet: or you can just pick max link bw that your source supports
09:44 danvet: since you're supposed to light up the crtc and get the vblank going no matter what
09:44 pinchartl: I'll let tomba comment on why this is used in the cadence dp driver ("[PATCH v8 2/3] drm: bridge: Add support for Cadence MHDP DPI/DP bridge")
09:44 danvet: so even if link-status is BAD the crtc should keep going
09:44 danvet: i.e. you have to light up the link and keep the clocks going
09:44 danvet: since otherwise userspace might get stuck somewhere bad
09:45 danvet: maybe we should fix the docs to explain this all
09:45 danvet:really gtg now
09:45 tomba: well, we've been going through different ways to get things working in different scenarios. hasn't been easy with the LT.
09:46 pinchartl: thanks for the information
09:50 tomba: we have DP source that can go up to 8.1, but my monitor is only 1.62, and additionally the first LT with that monitor after the monitor has been powered up has trouble, causing only 2 lanes to be used. It's been a good test case to see how it works =)
09:52 emersion: what is LT?
09:52 tomba: link training
09:54 emersion: ah, i see
09:54 emersion: yeah i'd expect this to go through link-status
10:13 pq: danvet, happy holidays!
11:05 clever: mripard: ah, nvm thats in the source now: https://github.com/raspberrypi/linux/blob/rpi-5.4.y/drivers/gpu/drm/vc4/vc4_crtc.c#L1134
11:22 pmoreau: karolherbst: I’m working on implementing cl_khr_extended_versioning, and taking the opportunity to get rid of all the string comparisons for versions.
11:22 karolherbst: :) cool
12:01 danvet: pq, thx
12:09 danvet: agd5f, intel-gfx-ci approves of my quick patch, so looks like it's not entirely busted
13:33 bl4ckb0ne: pq: it works thanks
13:34 bl4ckb0ne: do you know if there's a way to gets packages from debian backports?
13:39 pmoreau: karolherbst: Looking again at the `clover::spirv::check_extensions()` function, I don’t see how the current version makes sense: it is currently checking against advertised platform and device extensions, but those are `CL_*` extensions whereas the function is looking at extensions declared with `SpvOpExtension` which will be `SPV_*` ones.
13:39 pmoreau: Or am I getting confused?
13:43 pmoreau: Also, what the heck was wrong with me at the time, fetching the list of platform and device extensions on every loop iteration rather than doing it once outside the loop? 🤦
13:47 tomeu: anholt: cannot find how do you deal in arm64_a630_vk with the absolute path in the icd file
13:52 karolherbst: pmoreau: dunno :D
13:52 karolherbst: pmoreau: but I am kind of inclined to accept all extensions or are we required to tell which we don't support? mhh
13:52 karolherbst: or maybe we indeed whitelist
13:52 karolherbst: we want to tell the translator anyway
13:53 jenatali: karolherbst: What's left before we merge the structurizer?
13:53 karolherbst: jenatali: I think nothing
13:53 pmoreau: We do want to tell the translator which ones it is allowed to use, true, to avoid getting unexpected ones.
13:55 pmoreau: So if we are already setting up a list of which ones we support, we might as well check all incoming modules and refuse those using unsupported ones, especially if spirv_to_nir doesn’t handle those in a graceful manner.
13:56 jenatali: Alternatively, make spirv_to_nir handle them in a graceful manner so you don't have to upfront validate them?
13:57 pq: bl4ckb0ne, isn't there a script that installs all packages in the image? Like this: https://gitlab.freedesktop.org/wayland/weston/-/blob/master/.gitlab-ci/debian-install.sh
13:59 pmoreau: The validation should not be that costly, is trivial to do, and we already need a list of supported SPIR-V extensions to give to the translator.
13:59 jenatali: Fair
14:00 karolherbst: yeah..
14:00 karolherbst: with the new interfaces all of that will be more sane
14:00 pmoreau: Though I would be more comfortable with that list coming from spirv_to_nir than something hardcoded in clover, tbf. :-D
14:00 karolherbst: and hopefully the translator doesn't push annoying extensions to us :D
14:01 karolherbst: pmoreau: I think it doesn't matter really
14:01 karolherbst: so I'd do this: have a list for the translator to pass into and _warn_ if the spirv contains an extension we don't know about
14:01 bl4ckb0ne: there isnt one in waffle, i was only looking in the ci-template repo for it
14:01 bl4ckb0ne: thanks
14:01 karolherbst: I am sure accepting would be safe
14:01 karolherbst: but we should still warn
14:02 karolherbst: most of the time vtn would either just ignore stuff or warn/error out itself
14:04 pmoreau: True… and currently we would even be rejecting modules for using debug extensions, even if those have no effect on the generated code in the end. 🙃
14:04 karolherbst: yeah
14:04 karolherbst: I think it does make sense to ack extensions we see being used and such
14:05 karolherbst: but I think it still would be safe to just accept them all. We just don't want the translator to add things we might not support
14:05 karolherbst: as this is actually preventable :)
14:05 pmoreau: Right
14:05 pmoreau: So we definitely want to tell the translator what it’s allowed to use
14:06 karolherbst: jenatali: I might want to review or ack jekstrand patches :D
14:07 pmoreau: If vtn is taking care of warning/error’ing on extensions it does not support, do we even need to do that in clover.
14:07 karolherbst: pmoreau: well.. it does not check extensions, but instructions, capabilities, etc...
14:08 karolherbst: and it it hits something it doens't know about it errors out afaik
14:09 karolherbst: pmoreau: but I think it writes into stderr :/
14:09 karolherbst: that's something we might want to check
14:11 jenatali: I believe you're right
14:17 karolherbst: I mean, I am sure it's done for a lot of things, I am just not sure if it's done for _all_ things ;)
14:26 jekstrand: WTH??? It seems like my laptop doesn't have enough RAM to link clang????
14:27 karolherbst: jekstrand: yes
14:27 karolherbst: jekstrand: disable linker threading
14:27 jekstrand: karolherbst: how?
14:28 jekstrand: karolherbst: I'm running with -j1
14:28 karolherbst: ehh.. there is some cmake option for that
14:28 karolherbst: or use make instead of ninja
14:28 jekstrand: -j1
14:28 jekstrand: At still gets a signa 15
14:28 karolherbst: let me see...
14:28 jekstrand: No error message, just signal 15
14:29 jekstrand: (SIGTERM)
14:29 karolherbst: mhh
14:29 jekstrand: Maybe ccache is a problem?
14:29 karolherbst: probably :p
14:29 karolherbst: my advice: don't build clang :D
14:29 jekstrand: karolherbst: Yeah... About that...
14:29 karolherbst: why would you need to build it anyway?
14:30 jekstrand: LLVM_CCACHE_BUILD=OFF
14:30 karolherbst: heh
14:30 jekstrand: Because the cl_khr_subgroup_extended extension isn't implemented in clang 10
14:31 jekstrand: So it's either that or hack stuff up to use the intel extension
14:31 jekstrand: But that's a lot of linker hackery
14:31 karolherbst: jekstrand: LLVM_PARALLEL_LINK_JOBS was what I was thinking about
14:31 jekstrand: Well, not a lot. Just a whitelist of mangled function names and annoying garbage in spirv_to_nir.c
14:32 karolherbst: jekstrand: there are usually llvm-git packages around for everybody
14:33 jekstrand: karolherbst: Not in fedora
14:33 jekstrand: karolherbst: Maybe if I install a copr
14:33 karolherbst: well, also for fedora
14:33 karolherbst: https://copr.fedorainfracloud.org/coprs/che/llvm/ + https://copr.fedorainfracloud.org/coprs/che/mesa/
14:33 karolherbst: it's just.. that that one breaks once in a while
14:33 karolherbst: well 4 months old.. could be good enough
14:34 karolherbst: anway, that's what I used to use in the past
14:35 jekstrand:tries with CCACHE_DISABLE=1
14:36 karolherbst: good luck with rebuilding 5 times as you get the options wrong each time :)
14:36 jekstrand: VIRT at 12g before it died
14:36 jekstrand: dang
14:37 jekstrand: I've got a machine with 32G of ram. Maybe that'll work?
14:37 karolherbst: I upgraded my machine to 32GB just because of building llvm :p
14:37 karolherbst: anyway
14:37 jekstrand: Can you do better by using a different linker?
14:37 karolherbst: you want to disable linker jobs
14:37 karolherbst: or use make
14:37 jekstrand: Maybe gold? Maybe lld?
14:37 karolherbst: with ninja the build system becomes smarter
14:37 karolherbst: and just threads ld
14:37 jekstrand: It's a single ld process that's burning 12G+
14:37 karolherbst: but.. dunno why, but you get cores * cores thrads
14:38 karolherbst: *threads
14:38 karolherbst: ohh, maybe LTO is enabled now?
14:38 jekstrand: off
14:39 karolherbst: are you using ninja or make?
14:39 karolherbst: anyway, you want LLVM_PARALLEL_LINK_JOBS to be off I think
14:39 karolherbst: otherwise it can really end up badly
14:39 jekstrand: debug ifno
14:40 jekstrand: CMAKE_BUILD_TYPE=Release should help. :)
14:41 karolherbst: you'd think
14:41 karolherbst: just makes you waste more time :p
14:41 jekstrand: Quite possibly!
14:42 jekstrand:starts drafting a new SPIR-V extension
14:43 karolherbst: building llvm always goes the same way: 1. you spend hours figuring out how to make it build 2. you rebuild with different options as linking libraries against it failed 3. you rebuild the whole thing with another set of options as runtime just crashed 4. you end up with binaries working two days later, but forgot some other option making you go through 1-3 again :p
14:47 karolherbst: jekstrand: btw, I am wondering if we should squash your changes into the original commit or not..
14:52 jekstrand: karolherbst: I'd kind-of like to leave them separate
14:53 jekstrand: karolherbst: As long as we don't use it until after the bugfix, there shouldn't be a problem
14:53 jekstrand: This way we keep the original in the history more-or-less as it was
14:53 karolherbst: yeah.. I guess that's fine .. I was mainly wondering about the whitespace one
14:53 jekstrand: If you want to squash that one, fine.
14:54 jekstrand: It doesn't make any substantive changes
14:55 karolherbst: I assume correct, that you ran the newest version through CI?
14:57 jekstrand: Vulkah CI, yes.
14:57 jekstrand: Vulkan, even
14:59 Lyude: pinchartl: a little late but I second danvet's take, and also we don't really expect our initial link training assumptions to always work the first time - there will be devices that report a specific link rate, which we then try and fail to use and have to set link-status appropriately to downgrade things for the very first LT
15:00 Lyude: (of course, you can also always add quirks for devices you know don't report capabilities correctly :)
15:03 pinchartl: Lyude: from a userspace point of view
15:03 pinchartl: let's assume the DP output is disabled
15:03 pinchartl: a DP monitor is plugged in
15:03 pinchartl: and userspace wants to turn it on
15:03 pinchartl: there will be an atomic check phase that will need to validate the requested mode
15:04 pinchartl: when does DP LT happen in that grand scheme of things ?
15:04 Lyude: pinchartl: not until the atomic commit
15:05 Lyude: as well I'm not actually sure if you can start DP link training in hw (outside of MST) earlier then that, not 100% on that one though
15:05 pinchartl: so that means the initial atomic check may accept a mode that isn't compatible with the link capabilities
15:06 pinchartl: atomic commit would fail, setting link-status
15:06 pinchartl: and userspace would then retry ?
15:06 ajax: iirc, pluggable dp ports are required to have working/honest dpcd before link training
15:07 Lyude: ajax: it's a bit more complicated then that sometimes unfortunately
15:07 ajax: (admittedly my recollection dates to dp 1.1a days)
15:08 Lyude: pinchartl: correct-userspace would retry, assuming a driver has fallback link retraining hooked up properly then after link status gets set to bad it would reread the new link caps of the sink, and validate new modes against that. So userspace would handle this by retrying the previous mode it had, if that fails the atomic check it needs to downgrade the mode until it fits and passes atomic
15:08 Lyude: check
15:08 pinchartl: ok
15:08 pinchartl: a bit convoluted, but there are not many other options
15:08 pinchartl: do we have any userspace that implements that ? :-)
15:08 Lyude: pinchartl: yeah - ideally you really shouldn't have fallback link retraining occur that often in the real world
15:09 Lyude: pinchartl: I think but I can't remember what specifically does
15:10 pinchartl: no, in the real world it shouldn't happen often, except when the monitor is initially plugged in, as the initial LT will happen after the first atomic check
15:11 Lyude: pinchartl: yeah-on most monitors the first LT shouldn't fail like that at all, DPCD caps usually are accurate. The situations where they might not be usually involve buggy sinks, or potentially a DP repeater that has it's own lower bw limits that it doesn't advertise properly
15:12 Lyude: for instance I think the only device I have that actually hits fallback link retraining is a very specific hw revision of a caldigit TS3 thunderbolt dock that they replaced with a newer revision that doesn't have those issues
15:12 emersion: sway should implement it correctly, i tested it ~1 year ago
15:12 emersion: sway/any wlroots-based compositor
15:12 pinchartl: Lyude: could it be caused by a faulty cable for instance ?
15:13 emersion: i don't know of any other user-space compositor implementing it sadly, because it's hard to test
15:13 Lyude: pinchartl: yeah, or really heavy ambient noise
15:14 Lyude: Note too that DP cables, at least the full sized and some of the mini ones, are a bit notorious for breaking easily
15:14 Lyude: I always bring up that one time I spent almost a week trying to solve LT issues on radeon a few years back, only to find out that both of the two cables I had been using to test (using a second because I assumed maybe the first was broken) were broken
15:16 emersion: you can now label them as "good for LT failure testing" cables :)
15:16 emersion: (honestly i'd like to have a broken cable to test things)
15:18 Lyude: I think I did with some of them actually, lol
15:18 ajax: i used to have a vga cable with the ddc pins ripped out
15:18 Lyude: Same
15:26 Lyude: (BRB, rebooting server)
15:42 jekstrand: karolherbst: Built! :-P
15:43 karolherbst: yeah.. that's step 1 :p
15:43 karolherbst: now try to build the translator based on that (or maybe you were up to something else)
15:54 jekstrand: karolherbst: Built
15:54 karolherbst: now try to run it :p
15:54 karolherbst: (or maybe you just secretly get everything right and I am more annoyed I always spend a day on that :P )
15:57 jekstrand: Yeah, that's the real problem.
15:57 jekstrand: I don't know how to tell meson where to look for LLVM
15:57 karolherbst: ahh
15:57 jekstrand: I'm sure there's a flag or something
15:58 karolherbst: llvm-config can be specified through cross files at least.. but you can also mess with PATH
15:59 karolherbst: ohh. seems like with 0.51.0 meson can use cmake as well. mhhh
15:59 jenatali: That's what we're using on Windows, yeah
16:03 jekstrand: dcbaker[m]: How do I tell meson which LLVM to use?
16:03 jekstrand: mareko: ^^
16:05 dcbaker[m]: jekstrand: you need a meson native file like
16:05 dcbaker[m]: [binaries]
16:05 dcbaker[m]: llvm-config = 'pathtomyllvmconfig'
16:05 dcbaker[m]: I seem to remember llvm-config doesn't work unless you install llvm
16:06 jekstrand: dcbaker[m]: And what do I do with that file?
16:06 dcbaker[m]: Pass it to meson with --native-file
16:06 dcbaker[m]: It's busy t like a cross file, but describes the build machine
16:07 jekstrand: dcbaker[m]: How do I ask meson what options I used to configure?
16:09 dcbaker[m]: meson configure builddir
16:09 jekstrand: thanks!
16:10 karolherbst: jekstrand: I think you want "bin/meson-cmd-extract.py build" :p
16:16 jekstrand: And.... now it's complaining that I don't have the right LLVM modules built
16:17 jenatali: Sounds about right
16:18 karolherbst: jekstrand: for a start you should probably just copy fedoras llvm options to not mess up that badly in case you didn't
16:28 daniels: jekstrand: you can look at .gitlab-ci/windows/mesa_deps.ps1 for how we build it
17:29 jenatali: karolherbst: Generalized the printf a bit more to use global mem. Still not too convinced on the value of parsing code, since it'd have to be either pseudocode or heavily parameterized
17:39 jekstrand: meson now detects llvm 12 \o/
17:57 dcbaker[m]: They've managed not to break llvm-config for a while now. If only they would give us proper pkg-config files...
18:01 karolherbst: jenatali: do you have the runtime code somewhere for the parsing?
18:05 karolherbst: jenatali: but yeah, the printf lowering itself looks good now and I think it's useful for clover in this state as well :) thanks
18:07 karolherbst: jenatali: but I think we should drop default_buffer_size as well and just assert on the buffer size to be set
18:07 karolherbst: and assert on options as well to make the code a bit cleaner
18:08 karolherbst: (will do a more thorough review next week)
18:24 jekstrand: karolherbst: I'm compiling shaders. :D
18:24 karolherbst: :D
18:25 jenatali: karolherbst: Mmkay, sounds fine
18:25 karolherbst: jekstrand: wait.. so you build llvm yourself, so you can mess it up to write vulkan shaders in C or what? :p
18:26 jekstrand: karolherbst: Something like that.
18:26 karolherbst: that would be fun though :p
18:27 karolherbst: you should write a C extension for all the shader bits and then we can create language bindings and whatnot :D
18:27 jenatali: karolherbst: I've got the CLOn12 parsing code that I can link to - it's similar to what's implemented in Clover for LLVM already, except that the nir version may have padding to align args to a struct base, where the LLVM version writes them packed
18:27 karolherbst: right...
18:27 jekstrand: VK_EXT_execute_C_code
18:27 karolherbst: and with clover we have those lowered strings
18:27 jekstrand: For all your parallel C/C++ needs!
18:27 karolherbst: sure :D
18:28 jenatali: karolherbst: Right - you could add a lowering pass between vtn and the one I wrote to gather the args and process it into the same metadata that LLVM emits
18:28 karolherbst: oh my.. and somebody writes VK_EXT_execute_C_SIMD to expose all of SSE/AVX :p
18:28 jekstrand: Naturally!
18:28 karolherbst: jenatali: mhh yeah.. I guess I could do that, need to figure out how much all of that differs
18:29 jekstrand: Just have to transpile x86 to SPIR-V
18:29 jenatali: karolherbst: They're pretty similar already, and I'd be open to more patches to converge them further
18:29 karolherbst: okay, cool
18:30 karolherbst: jekstrand: I have the idea: we do a single source extension where you can just write your vulkan shaders inside your c file or whatever and the magic is taken care of by the compiler :p
18:30 jenatali: I think the LLVM one doesn't use atomics, so it doesn't have a counter at the beginning of the buffer, it's just broken if you try to have multiple threads write at the same time (which is okay, CL says that's undefined)
18:30 karolherbst: jenatali: huh? but if that's undefined all of printf would be?
18:31 jenatali: > In the case that printf is executed from multiple work-items concurrently, there is no guarantee of ordering with respect to written data.
18:31 karolherbst: ordering, yes
18:31 jenatali: I guess it's a little bit stricter than just completely undefined
18:31 karolherbst: but I think each printf execution has to be correct
18:32 karolherbst: just the order of the result is undefined
18:33 karolherbst: because.. you know.. otherwise all of that would be quite broken, wouldn't it?
18:33 jenatali: Ah, nevermind, there is a header
18:33 karolherbst: ahh
18:33 karolherbst: okay
18:33 jenatali: Sorry, misread the printing logic - it reads back the data *after* the header into a std::vector, but I thought it read the whole printf buffer
18:33 karolherbst: jenatali: but I wouldn't mind having two parsing functions though... the amd one clover has is quite.. amd specific afaik
18:34 karolherbst: it's not llvm
18:34 jenatali: Ah, got it
18:34 karolherbst: or is there a llvm pass converting the stuff?
18:34 jenatali: They're similar, except for strings: the one in !6040 reads the actual string contents (not the format string, but %s) from the buffer, where I'm reading the pointer and looking up the string
18:34 karolherbst: I think in the end it matters if it's just AMD backend magic (in which case I am inclicned to just ignore it) or if it's core llvm stuff
18:34 karolherbst: ahh, okay
18:35 jenatali: I think it's libclc, but in an AMD-specific impl?
18:35 karolherbst: I honestly don't know
18:48 karolherbst: jenatali: mhhh.. libclc doesn't contain the string "printf" at least? or was it added recently.. mhh
18:49 jenatali: karolherbst: No, I think you're right, it's not libclc
18:50 karolherbst: mhh.. there is a amdhsa.printf thing
18:50 karolherbst: but I doubt it's the one which was used? mhh
18:51 karolherbst: "llvm.printf.fmts" is the metadata... let's see
18:51 karolherbst: yeah...
18:51 karolherbst: that's AMD stuff
18:51 karolherbst: not LLVM
18:51 jekstrand: Ugh... Now I'm getting an i128 in LLVM
18:51 karolherbst: jekstrand: :D
18:51 jekstrand: %8 = bitcast <4 x i32> %extractVec17 to i128
18:51 jekstrand: %9 = trunc i128 %8 to i96
18:51 jekstrand: %10 = bitcast i96 %9 to <12 x i8>
18:51 jenatali: jekstrand: Sounds like your target triple is wrong?
18:54 karolherbst: ugh...
18:54 karolherbst: is that always so annoying with llvm?
18:54 karolherbst: now that I start looking into the amd backend
18:54 karolherbst: they just add hidden parameters and stuff?
18:54 karolherbst: how annoying
18:55 karolherbst:kind of wants to move all AMD specific stuff inside clover to something behind AMD/ or radeon*/
18:58 jekstrand: Does OpenCL have a vec3?
18:58 jenatali: Yep
18:59 karolherbst: vec3 is super pointless...
18:59 karolherbst: it's vec4, just the programmer isn't allowed to touch .w :p
18:59 jenatali: Except for vload/vstore
18:59 jekstrand: Then I have no clue why LLVM thinks it needs to shuffle a vec3 into a vec4, cast to an i128, down-cast that to an i96, and then bitcast into a u8vec12.
19:00 karolherbst: because it's llvm
19:00 jekstrand: Half those things aren't legal OpenCL
19:00 jenatali: u8vec12? ...
19:00 jekstrand: No, actually i8vec12
19:00 jekstrand: Because signed makes it so much better
19:00 karolherbst: :D
19:00 jenatali: Well, u8 doesn't exist in llvm
19:00 jekstrand: wha?
19:01 jenatali: LLVM ints are sign-less
19:01 jekstrand: Oh, right
19:01 jekstrand: LLVM is half-typed
19:03 karolherbst: *sigh*...
19:03 karolherbst:had his fill of llvm for the entire month already
19:03 karolherbst: one would thing that a "git grep llvm.printf.fmts" would give you anything...
19:03 karolherbst: guess what
19:05 karolherbst: jenatali: right now I don't see any reason why I should deal with whatever format is used inside !6040
19:05 jenatali: karolherbst: Hooray
19:05 karolherbst: I'd use whatever llvm gives us and what the CL specification is more or less requiring us to do?
19:06 karolherbst: I assume the specification also tells us what the format will be?
19:06 jekstrand: Is there an LLVM flag I should be passing to get it to not give me strange types?
19:06 jenatali: They're really not that different
19:06 karolherbst: jekstrand: the tripple?
19:06 jenatali: jekstrand: What target triple are you using?
19:06 jekstrand: I'm sure it'll give me something even more horrible if I do
19:06 jekstrand: I don't know. Whatever clover uses
19:06 jenatali: I'd expect spir to give you something sane
19:06 karolherbst: what clover :p
19:06 jenatali: Heh good point
19:06 mareko: jekstrand: https://github.com/marekolsak/marek-build/blob/master/llvm_config_x86_64-linux-gnu.cfg
19:06 jenatali: LLVM clover would use AMD's triple
19:07 jekstrand: karolherbst: Upstream plus two patches neither of which changes the flag.
19:07 mareko: jekstrand: https://github.com/marekolsak/marek-build/blob/master/conf_mesa.sh
19:07 karolherbst: jekstrand: no.. I meant what tripple :D
19:07 karolherbst: clover uses multiple
19:07 jekstrand: mareko: Thanks! I got it to build already, though. Wasn't nearly as painful as karolherbst made it sound.
19:07 jekstrand: karolherbst: I don't know
19:07 karolherbst: mhh
19:07 jekstrand: karolherbst: I haven't touched that
19:07 anholt: anyone have any idea how to convince the kernel to use a specific linker? https://gitlab.freedesktop.org/anholt/mesa/-/jobs/4090629 (trying to use HOSTLDFLAGS to get back to bfd)
19:08 karolherbst: ehhh wait..
19:08 karolherbst: the translator chooses for us
19:08 jekstrand: Clearly, it's choosing the wrong one. :P
19:08 karolherbst: yeah.. let me check...
19:09 jenatali: jekstrand: If you're planning to send it to the converter, you want spir
19:09 jenatali: or spir64
19:09 karolherbst: mhh.. we use -spir-unknown-unknown or -spir64-unknown-unknown
19:09 karolherbst: I somehow thing that's also totally busted
19:10 jekstrand: It's entirely possible that my bleeding-edge LLVM is to blame. :)
19:10 jekstrand: But this is the closest I've gotten yet with this kernel
19:10 karolherbst: heh.. might be
19:12 karolherbst: jenatali: do you know where the printf stuff was specified?
19:12 jenatali: jekstrand: Just double-checked, SPIR's data layout doesn't have i128 in it, so if you're targeting that, you shouldn't get i128
19:12 jenatali: karolherbst: What do you mean? The CL C spec?
19:12 karolherbst: nope, the spirv bits
19:13 jenatali: https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html#_a_id_misc_a_misc_instructions
19:13 karolherbst: jenatali: btw.. https://github.com/intel/compute-runtime/blob/master/shared/source/program/print_formatter.cpp
19:13 jekstrand: Well, the LLVM that I get with CLOVER_DEBUG=llvm says "spir64-unknown-unknown"
19:13 jekstrand: So I'm pretty sure that's the one I'm passing. :)
19:13 karolherbst: yeah....
19:13 karolherbst: but those types are legal, no?
19:13 karolherbst: well
19:13 jekstrand: Not in SPIR-V
19:13 karolherbst: except the 12 component vector
19:13 jekstrand: LLLVM, I think, has arbitrary-size integers
19:13 jekstrand: But i96 isn't a thing on any hardware
19:13 karolherbst: ohh, that as well
19:14 jenatali: karolherbst: 3.2 of https://www.khronos.org/registry/SPIR/specs/spir_spec-2.0.pdf doesn't have an i128
19:14 karolherbst: jekstrand: but the spirv lowering passes are ran as well, no?
19:15 karolherbst: jekstrand: but if you got llvm ir which doesn't translate to valid spir-v via "llvm-spirv' then that's a translator bug
19:15 jenatali: karolherbst: It has to be valid spir before it'll translate, or so I thought
19:15 jenatali: Which if it has i128, it isn't
19:15 karolherbst: jenatali: right.. but where is the format specified?
19:16 jenatali: The printf format? https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#printf
19:16 jenatali: More or less exactly the C spec, except for the differences in Differences between OpenCL C and C99 printf
19:16 karolherbst: jenatali: well.. the spir-v specification doesn't say what "format" should look like, does it?
19:17 karolherbst: strictly speaking
19:17 jenatali: karolherbst: No, but since it's a CL-specific extension opcode, the CL spec would be the source of truth, no?
19:17 karolherbst: yeah.. I guess so
19:17 karolherbst: and we get the original C string?
19:17 jenatali: Define "we get the original C string" :)
19:17 karolherbst: the one from the clc source :p
19:17 jenatali: It's embedded in the SPIR-V, yes
19:18 karolherbst: okay..
19:18 jenatali: It's passed as the arg to the instruction
19:18 karolherbst: so for whatever reason, clover with AMD gets something else and at this point we don't know where this is coming from
19:18 karolherbst: and it feels like the neo formatter also is based on the CLC format
19:18 jenatali: Yeah, they write a "format string index", and then they have a metadata node which maps the index to the string
19:19 jenatali: More or less what CLOn12 is doing too, except I write the pointer to the string, and then the runtime can map that pointer back to the string
19:19 karolherbst: yeah.. I think that's fair as it saves space, right?
19:20 jenatali: You could write the pointer if you use the lowering as-is, or you could add a lowering pass to index the strings, and switch the intrinsic's source to write the index instead
19:20 karolherbst: huh
19:20 karolherbst: what do you mean by pointer?
19:20 pmoreau: Which branch should I use to avoid things like “Invalid back or cross-edge in the CFG” or vtn_create_variable trying to set `var->var->constant_initializer` even though `var->var` is `NULL` (`var->mode == vtn_variable_mode_cross_workgroup`)? (Preferably one that has been rebased not too long ago against master)
19:20 karolherbst: pmoreau: the newest?
19:20 jenatali: I mean the pointer - as if you had taken the __constant char* and cast it to an integer and wrote it out to global memory
19:20 pmoreau: Which newest?
19:20 karolherbst: pmoreau: master
19:21 karolherbst: everything is pretty much master based now
19:21 pmoreau: Oh, it’s in master? :o
19:21 karolherbst: well.. except the structurizer and some other features :p
19:21 karolherbst: but the core stuff? sure
19:21 jekstrand: karolherbst: Yeah, the next question is how do I fix it?
19:21 pmoreau: I rebased like 2–3 days ago already
19:21 karolherbst: pmoreau: mhhh...
19:22 karolherbst: pmoreau: ohh, constant memory?
19:22 karolherbst: ehhh.. mhhh
19:22 karolherbst: don't use constant memory yet :d
19:22 karolherbst: that's not done.. totally missed that
19:22 jenatali: Hm... I don't remember having to patch anything for constant memory...
19:22 jenatali: At least not in vtn
19:22 pmoreau: I think it might be constant mem, yes…
19:22 karolherbst: jenatali: the problem is with constant arrays in the source code
19:22 pmoreau: Just trying to run basic tests from the CTS
19:23 karolherbst: and some magic and then it blows up
19:23 jenatali: Hm...
19:23 karolherbst: jenatali: or did the constant memory rework land already? I thought we were going to rework most of that?
19:23 karolherbst: like with the scratch memory and whatnot
19:23 karolherbst: or whatever we planned to do
19:24 pmoreau: `storage_class == SpvStorageClassUniformConstant` sounds like constant indeed
19:24 jenatali: I've got a series I still have to port over to add more memory modes that we want, but it didn't have anything to do with constant memory
19:24 karolherbst: jenatali: the issue is if constants are promoted to variables
19:24 karolherbst: and have a constant initializer
19:24 karolherbst: like you have a constant array and now indirectly access it based on inputs
19:25 jenatali: Sure, doesn't sound like that's the problem here though?
19:25 karolherbst: ever tried that?
19:25 pmoreau: Let me dig up the source for the one failing…
19:25 jekstrand: karolherbst: Any idea where to even start telling LLVM to not give me i128 for spir64?
19:25 karolherbst: jekstrand: no idea
19:26 jenatali: Yeah - we've got a lowering pass we're running to attempt to promote __constant vars to a different memory mode so we can use DXIL's "immediate constant buffer" functionality
19:26 jenatali: jekstrand: What's the source doing that's producing an i128?
19:26 jenatali: If you can share just the tiny snippets that are triggering that, I'd be interested to play around with some LLVM opts
19:27 jekstrand: jenatali: I've not tried to narrow it down yet. That may be a necessary step anyway.
19:28 pmoreau: jenatali: This would be the SPIR-V being problematic: https://hastebin.com/lipezukipe.pl
19:28 pmoreau: And the OpenCL C: https://hastebin.com/opupeyuxem.cpp
19:29 karolherbst: pmoreau: what... that sounds wrong :D
19:29 jenatali: pmoreau: Is this CL1.2? Or 2.x?
19:29 karolherbst: but that's probably even valid
19:29 karolherbst: jenatali: I can write you a 1.0 version :p
19:30 pmoreau: I would assume CL1.2 at least since it’s not tagged with a 2.x version in the CTS
19:30 karolherbst: but yeah, global constants are valid in pre 2.0
19:30 karolherbst: well.. as long as it's __constant
19:31 karolherbst: pmoreau, jenatali: didn't test but that should blow up as well: https://gist.github.com/karolherbst/cb553eba21b12097d7e544b51857f620
19:32 pmoreau: Mmh, I don’t think I have seen that one before: `src/compiler/nir/nir_lower_io.c:852: nir_ssa_def *build_explicit_io_load(nir_builder *, nir_intrinsic_instr *, nir_ssa_def *, nir_address_format, unsigned int): Assertion `addr_format_is_offset(addr_format)' failed.
19:32 pmoreau: `
19:32 jenatali: Hm, let me run this through CLOn12 and see why it works
19:32 karolherbst: pmoreau: your tree looks busted somehow?
19:33 karolherbst: or old
19:33 karolherbst: pmoreau: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6059/diffs
19:33 karolherbst: or maybe something broke it again?
19:34 karolherbst: I think your rebasing just went wrong
19:34 jenatali: Ah, I think I know the difference - we do actually use nir_var_mem_ubo for constants
19:34 karolherbst: yep
19:34 karolherbst: might be this as well
19:34 jenatali: case vtn_variable_mode_cross_workgroup:
19:34 jenatali: /* These don't need actual variables. */
19:34 pmoreau: Let me rebase again
19:35 karolherbst: jenatali: does this work? https://gist.github.com/karolherbst/cb553eba21b12097d7e544b51857f620 :p
19:35 jenatali: karolherbst: What we do is let vtn output ubo variables, then run a pass on ubos with constant initializers to see if they're trivially accessed, and if so, we switch them to shader_temp (just something out of the way)
19:36 jenatali: Anything that wasn't converted is then switched to ssbo like global mem
19:36 karolherbst: ahh
19:36 karolherbst: yeah.. I might have to get ubos working with clover then
19:36 karolherbst: maybe the vtn code is there yet to handle all of those nasty cases
19:36 karolherbst: uff..
19:36 karolherbst: I declared a generic pointer there
19:36 jenatali: It'll work, we just have to have the runtime populate the ssbos that end up being used for inline constants
19:37 karolherbst: mhhh
19:38 pmoreau: Looks like I actually did not rebase a few days ago to avoid rebase commits while !5038 is being reviewed, and so my last rebase was more like 3 weeks ago than 3 days ago. 🙃
19:39 karolherbst: ahhh
19:43 karolherbst: ohhh wow.. 5.9 will be a "fun" release
19:44 jekstrand: Ok, time to make a little test kernel to break LLVM
19:44 karolherbst: :)
19:49 pmoreau: karolherbst: I’m on top of “acf756a64fe android: panfrost: Redirect cmdstream includes through GenXML (v2)” now, but still getting that IO assert
19:50 jenatali: pmoreau: What memory type?
19:51 pmoreau: Let’s see
19:51 pmoreau: `nir_var_mem_shared`
19:51 jenatali: What address mode are you using for shared?
19:52 karolherbst: ohh
19:53 karolherbst: pmoreau: yeah.. we might need to do soemthign similiar for shared as with shader_in now :)
19:53 karolherbst: just add mem_shared to the nir_address_format_32bit_offset call
19:53 karolherbst: and I guess the other one will be mem_global
19:53 pmoreau: `address_format` is `nir_address_format_32bit_global` in this case
19:54 karolherbst: pmoreau: wait.. let me write it quickly :D
19:54 jenatali: I suspect you don't actually want a global address for shared - unless you want to do what jekstrand was talking about and add a "base" intrinsic
19:56 karolherbst: pmoreau: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/7d6037c8a9cefc15b3167bfe6ad54cb7fb8ae5da
19:56 karolherbst: ehhh.. I messed up
19:57 karolherbst: pmoreau: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/e076c384c6380f43c02658239cf4b4d6a1dac2c7 :D
19:57 karolherbst: now it should work I hope
19:57 jekstrand: jenatali: https://godbolt.org/z/rrYv7T
19:58 jekstrand: jenatali: It looks like it's correct on LLVM 10 but busted on trunk
19:58 jenatali: D:
19:58 karolherbst: :D
19:58 jekstrand:is *not* going to bisect LLVM
19:58 pmoreau: Thanks Karol, let me try that
19:59 pmoreau: Bisecting LLVM sounds like so much fun though :-D
19:59 karolherbst: *ugh*
19:59 agd5f: which drm-misc branch should I use for fixes right now?
19:59 jekstrand: On LLVM 10 it generates a simple extractelement
19:59 jekstrand: On LLVM 12 it wants to cast to get an 8-bit vector first
20:00 karolherbst: I am sure there is a good reason for it :p
20:00 karolherbst: pmoreau: once you verify it works I will create an MR
20:01 jenatali: jekstrand: Looks fine on trunk with -O1
20:01 agd5f: mlankhorst, still drm-misc-next-fixes or drm-misc-fixes at this point?
20:01 jenatali: And -O2
20:01 jenatali: No, sorry, -O2 breaks it
20:02 pmoreau: karolherbst: Test fails but it no longer asserts. 👍️
20:02 jekstrand: jenatali: So we should just use -O1 then?
20:02 jekstrand: :-P
20:02 jenatali: That's what I'm hearing :P
20:02 jekstrand: *sigh* Where does one file LLVM bugs?
20:03 jekstrand: Looks like -O0 works too
20:04 karolherbst: pmoreau: mhh.. what test?
20:04 pmoreau: min_max_local_mem_size
20:04 pmoreau: in test_api
20:04 jekstrand: Wait, why aren't we using -O0 all the time?
20:04 kisak: https://llvm.org/docs/HowToSubmitABug.html
20:04 karolherbst: jekstrand: I don't think we specify anything
20:04 karolherbst: but pmoreau would know
20:04 jekstrand: karolherbst: Should we?
20:04 karolherbst: I think?
20:05 jekstrand:tries adding -O0
20:05 pmoreau: Good question regarding optimisation level… I do not think we specify anything special?
20:05 jenatali: I don't think we do either
20:06 jenatali: Except disabling builtin memset because the translator doesn't support memset with variable sizes
20:06 karolherbst: pmoreau: yeah.. fails here as well
20:06 jekstrand: Yeah, making clover pass -O0 fixes it
20:07 jekstrand: Is that something we want to do?
20:07 jekstrand: I'm a bit unsure
20:07 jenatali: I'm unsure as well
20:07 jekstrand: On the one hand, LLVM optimizing some stuff might be nice
20:07 jenatali: Seems like an LLVM bug
20:07 karolherbst: jekstrand: it can make things only faster :p
20:07 jekstrand: On the other hand, it seems to mess stuff up
20:07 jenatali: -O2 shouldn't cause it to use types that aren't in the target data layout
20:07 jekstrand: karolherbst: Yeah, I'm sure all those casts will help our runtime perf
20:08 karolherbst: jekstrand: I was refering more to compilation speed though :D
20:08 karolherbst: but yeah
20:08 karolherbst: I guess letting llvm optimize less let us know more
20:09 karolherbst: and llvm tends to be suboptimal for optimizations towards GPUs anyway
20:09 karolherbst: there was weird stuff I was seeing llvm doing from time to time
20:10 karolherbst: pmoreau: ehh.. "ld u32 $r4 s[$r0d+0x0]" I guess we mess up
20:10 jekstrand: Yeah, I've seen a lot of LLVM weirdness. Lots of crazy casts we don't want, for instance.
20:10 jekstrand: jenatali and I found a fun one
20:10 pmoreau: “FAILED 6 of 68 tests.” not too bad!
20:10 karolherbst: jenatali: it seems like nir is producing some wrong stuff
20:10 karolherbst: using 64 bit pointers still
20:11 jenatali: "some wrong stuff"?
20:11 jenatali: Want to elaborate?
20:11 karolherbst: let's see
20:11 pmoreau: karolherbst: Is that on Tesla or later gen?
20:11 karolherbst: I will print some nirs
20:11 karolherbst: pmoreau: pascal
20:12 jenatali: pmoreau: Out of curiosity, how did your results for the arg_info test look? (I think that's the name)
20:12 karolherbst: jenatali: before and after lowering mem_shared https://gist.githubusercontent.com/karolherbst/b5a9669bd3db23bbc808ceab0a4334ed/raw/a6606d6ca293ab5dfeb24520b66259dff26036b8/gistfile1.txt
20:12 karolherbst: jekstrand: ^^ that looks wrong, doesn't it?
20:13 jenatali: karolherbst: What's the C source?
20:13 karolherbst: and I am using nir_address_format_32bit_offset
20:13 karolherbst: jenatali: test_api min_max_local_mem_size
20:13 pmoreau: jenatali: “get_kernel_arg_info”? No idea: those didn’t run as those are 1.2 but I’m running 1.1.
20:13 jenatali: pmoreau: Ah, got it :)
20:14 jenatali: karolherbst: What do you think looks wrong?
20:14 karolherbst: 64 bit pointers
20:14 jenatali: Ahhhh
20:14 jenatali: Yeah
20:14 jenatali: nir_lower_explicit_io isn't smart enough to actually change the bit size
20:14 karolherbst: ahh, I see
20:15 jenatali: You'll need my address modes rework if you want to use 32bit offsets
20:15 pmoreau: jenatali: I’m working on annotating tests in the CTS for OpenCL 1.0 and 1.1, as currently the CTS assumes you are running at least 1.2 so tests requiring 1.2 (or 1.1) features aren’t version gated. WIP is at https://github.com/pierremoreau/OpenCL-CTS/commits/support_opencl_1.0_and_1.1
20:15 karolherbst: pmoreau: I am sure it fails due to some barrier stuff or so
20:15 pmoreau: Could be the barrier
20:15 jenatali: pmoreau: Ah cool :) we finally got access to the Khronos certification site, and they recommend using old versions of the CTS if you want to certify against 1.0 or 1.1
20:16 jenatali: Would be nice to have 1.0 and 1.1 supported in the main repo though
20:16 pmoreau: I had a look at the 1.0 and 1.1 CTS (I have the ZIP somewhere on a computer), but IIRC their build system was super flaky.
20:17 jenatali: karolherbst: I still need to port this MR over to the upstream repo: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/83
20:17 jenatali: This one's going to be hairy and maybe a little contentious - I doubt I'll be able to get to it today but hopefully early next week
20:17 karolherbst: pmoreau: did you check dmesg? :p
20:18 jenatali: Oh, actually maybe this won't be too bad, the stuff that touched core nir/vtn is smaller than I remember
20:18 pmoreau: Especially now that all version >=1.2 have been merged into a single branch, that it has a versioning system, better build setup, receiving a bunch of updates and fixes with incoming 3.0 support, it would be great to have 1.0 and 1.1 in it too rather than relying on the support old archives.
20:18 karolherbst: pmoreau: somehow we run out of bounds and I might know why
20:19 pmoreau: karolherbst: Lot’s of red text in there :-D
20:19 karolherbst: yeah.. I know what it is
20:19 karolherbst: pmoreau: SHADER_LOCAL_MEMORY_LOW_SIZE : 0x0 :p
20:20 karolherbst: anyway.. I will fix it
20:20 pmoreau: “gr: TRAP_PROP - TP 0 - CUDA_FAULT - Global write fault at address 00815659a0”
20:20 karolherbst: heh...
20:21 karolherbst: jenatali: where are you extracting from how much shared memory is required?
20:21 karolherbst: the nir shader_info should tell, no?
20:21 jenatali: pmoreau: The reason I asked about that test btw, is if/when you get around to trying it, you'll hit https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/599, assuming you try to get that data from SPIR-V
20:21 karolherbst: "nir->info.cs.shared_size"
20:21 jenatali: karolherbst: We use lower_vars_to_explicit_types, which updates the shader_info, yeah
20:21 karolherbst: mhhh
20:22 karolherbst: that is 0 for me
20:22 karolherbst: jenatali: "shared-size: 0"
20:22 pmoreau: jenatali: Ah, I see. Are you trying using EdB’s MR?
20:22 jenatali: Are you running lower_vars_to_explicit_types on shared?
20:22 karolherbst: uhh.. no :/
20:22 jekstrand: You should be
20:22 karolherbst: how should I call it?
20:22 jenatali: pmoreau: Hm? Which MR?
20:22 jekstrand:is now getting his kernel successfully into spirv_to_nir
20:22 pmoreau: jenatali: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4974
20:22 jenatali: jekstrand: Congrats!
20:23 jekstrand: Time to implement SpvStorageClassGeneric!
20:23 jenatali: pmoreau: No, CLOn12 is currently using a different (but very similar) CLC frontend, with a completely separate CL runtime implementation
20:23 karolherbst: jekstrand: have fun I guess
20:23 jenatali: karolherbst: https://gitlab.freedesktop.org/kusma/mesa/-/blob/msclc-d3d12/src/microsoft/clc/clc_compiler.c#L1249
20:23 pmoreau: Oh right, I always forget you are not going through clover
20:24 karolherbst: jenatali: thanks
20:24 jenatali: Yeah - pros and cons
20:24 karolherbst: jenatali: only on those modes?
20:24 karolherbst: not the others?
20:25 jenatali: karolherbst: Yeah, that pass doesn't support other modes
20:25 karolherbst: ahh
20:25 jenatali: And for function temp, it'll update your scratch space used
20:26 karolherbst: ahh, cool
20:26 karolherbst: ehh. no glsl_get_cl_type_size_align globally available :/
20:27 pmoreau: Let’s see how things go for nv50 on test_basic…
20:27 karolherbst: jenatali: didn't we add the helper for everybody in mesa or isn't it there yet?
20:28 jenatali: karolherbst: Probably yet another patch on my backlog to port over :D
20:28 karolherbst: yeah.. let me find it
20:28 jenatali: https://gitlab.freedesktop.org/kusma/mesa/-/commit/f55f13a01feb7216f45e47cb06337235f0dd247f in our tree
20:28 jenatali: If it's not in Mesa feel free to cherry-pick
20:29 karolherbst: c++ is anoying with enums
20:29 jekstrand: jenatali: What happens if someone sticks a bool in a struct that's shared with the CPU?
20:30 jekstrand: jenatali: If there's no defined bit size, that seems tricky
20:30 jenatali: jekstrand: Yeah, I think it's literally just undefined
20:30 karolherbst: why would it be undefined?
20:30 jenatali: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#built-in-scalar-data-types
20:30 jenatali: Type in OpenCL Language
20:30 jenatali: API type for application
20:30 jenatali: bool
20:30 jenatali: n/a
20:30 jenatali: char
20:30 jenatali: cl_char
20:31 karolherbst: ahh
20:32 pmoreau: “Invalid back or cross-edge in the CFG” I guess I need the structurizer for that?
20:32 karolherbst: jenatali: still 0 .. :/
20:32 jekstrand: pmoreau: Sounds like
20:32 jenatali: karolherbst: vtn should be emitting shared variables - are you removing them at some point?
20:32 karolherbst: ohh, I am stupid
20:32 karolherbst: jenatali: those are kernel args :D
20:33 jenatali: Ahhhh, yeah
20:33 karolherbst: yeah.. I think I am doing something dumb somehwere.. mhh
20:33 pmoreau: karolherbst, jekstrand Which branch should I use against master? (to get the structurizer)
20:33 jenatali: karolherbst: We manually update shared_size to account for those: https://gitlab.freedesktop.org/kusma/mesa/-/blob/msclc-d3d12/src/microsoft/clc/clc_compiler.c#L1326
20:33 jenatali: Since the size comes from the API
20:34 karolherbst: pmoreau: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2401
20:34 jenatali: karolherbst: One more reason to just go ahead and merge it ;)
20:34 karolherbst: yeah.. I think we can now?
20:34 karolherbst: jekstrand: mind if I tell marge?
20:34 jekstrand: karolherbst: For the structurizer? Do it!
20:35 karolherbst: somehow I am involved in too many MRs with way too many comments :D
20:35 pmoreau: karolherbst: Thanks!
20:36 karolherbst: that's like the 5th now above 50 comments
20:36 airlied: karolherbst: I want to move the amd clover to the same model as everyone else once we get a bit further
20:36 airlied: I wrote the initial patches for it before
20:37 karolherbst: cool
20:37 karolherbst: I'd love to just burn down the entire llvm path
20:37 karolherbst: I just fear that not everybody will see it this way
20:38 jenatali: karolherbst: I keep porting over patches that I think are going to be trivial/non-controversial, and then jekstrand keeps finding good reasons that they're more complicated than I thought
20:38 karolherbst: :)
20:39 jekstrand: jenatali: Hey, that's not quite fair. There's at least one patch in there that I told you you could cut down to 1/3 of the original size. :-P
20:39 jenatali: Heh, yeah, that's true :P
20:39 jekstrand: jenatali: As far as bools go, I think the easiest thing for now is to make them 32-bit
20:40 jekstrand: jenatali: Yeah, that's massively bloated
20:40 jenatali: jekstrand: Yeah, alright
20:40 jekstrand: But it's what we have to do for all the GL/Vulkan stuff
20:40 jekstrand: I'd like to trim them down to 8 for CL
20:40 jekstrand: it's just work
20:40 karolherbst: mhh.. aren't bools 8 bit in C?
20:40 jenatali: Yep, makes sense
20:40 karolherbst: or was that undefined?
20:40 jekstrand: karolherbst: Undefined, I think.
20:40 karolherbst: ugh..
20:41 airlied: jenatali:, jekstrand, karolherbst : the spirv translator says O0 only code if I remember
20:41 jekstrand: Woah, my shader made it all the way to nir_lower_explicit_io. Progress!
20:41 jenatali: \o/
20:42 jekstrand: airlied: Do you have a citation for that? I think I remember seeing it too; I just can't find it now
20:42 jenatali: jekstrand: The biggest problem with the long delay between making these patches and porting them over is I forget which tests hit which issues, so when I need to rework them, I don't know which tests to focus on
20:43 jekstrand: jenatali: :-(
20:43 jenatali: And the CTS is so huge that just re-running all of it isn't viable :)
20:46 airlied: I've got one commit with "llvm-spirv tool is supposed to be run with an non-optimized output of
20:46 airlied: clang, which contains a lot of constructions like:
20:47 jekstrand: airlied: I also found this https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/203
20:48 jenatali: Cool... guess we should add -O0 as well
20:49 jenatali: jekstrand: Are vectors of bool legal in ni?
20:49 jenatali: nir*
20:50 jekstrand: jenatali: Yup
20:50 jenatali: Cool - looks like they're not in CL
20:50 jenatali: Not that I think that really changes anything, I was just curious
20:52 jenatali: Btw, did you have thoughts about what to do with overlap between nir_lower_alu and nir_lower_int64 both having implementations of mul_high lowering?
20:52 jekstrand: jenatali: Use whichever one you want?
20:52 airlied: jekstrand: ah yes I think that issue was raised when I looked into it before
20:53 jenatali: The one in lower_alu doesn't support anything other than i32/u32 without https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6313/diffs?commit_id=d6fc4188bd9b9a0628389460e947085bd6dd26f5
20:53 jekstrand: jenatali: Right. My inclination is to do int64 lowering in lower_int64
20:53 jenatali: But if I didn't patch that, then running lower_alu before lower_int64 just explodes
20:53 jenatali: Guess I could just tweak it to leave it alone for 64bit
20:54 jekstrand: jenatali: But it's kind-of a weird case because you may have int64 hardware but not imul64_high, you might want it.
20:54 pmoreau: karolherbst: I’m getting some “[Error] At word No.80: "Block 17[%for_cond_cleanup] appears in the binary before its dominator 18[%for_body]” for basic/vload_private.
20:54 jekstrand: I don't know
20:54 jekstrand: It's weird
20:54 jenatali: Agreed
20:54 karolherbst: pmoreau: that's spir-val, right?
20:54 pmoreau: Given the formatting, probably
20:56 jenatali: jekstrand: Guess I'll let that stew for a bit - let me know if you end up forming a concrete opinion on a course of action, otherwise I'll probably just leave that commit, the churn isn't too bad
20:57 jekstrand: jenatali: I'm kind-of inclined to not add stuff to lower_alu for that
20:57 karolherbst: pmoreau: found the shared mem issue :)
20:57 pmoreau: 👍️
20:57 karolherbst: "PASSED test."
20:57 jekstrand: jenatali: Primarily because the mechanism in lower_int64 seems more robust in a sense.
20:58 jekstrand: jenatali: It basically generates a full arbitrary-size multiplication and then lets dead code clean up.
20:58 karolherbst: mhhh
20:58 jenatali: Fair enough, let me see if I can rework it so that lower_alu leaves it alone for 64bit rather than trying to generate mixed 32bit/64bit alus
20:59 jenatali: That might end up simplifying that series quite a bit
21:00 jekstrand: jenatali: One option would be to tweak lower_mul_high64 to a helper which takes two arrays of values and produces an array of values. It should already be general enough.
21:00 jekstrand: jenatali: And then we can call it from lower_alu
21:00 jekstrand: But maybe that's over-kill?
21:00 jenatali: Sounds like overkill to me
21:00 jekstrand: Ok
21:01 jekstrand: What you did is probably fine too TBH
21:01 pmoreau: “error: glsl_type_is_array(parent->type) || glsl_type_is_matrix(parent->type) (../../../src/mesa_clover_series/src/compiler/nir/nir_validate.c:465)” `parent->type` is `glsl_type::_u64vec2_type`
21:01 jekstrand: It doesn't look like you really changed it all that much.
21:01 jenatali: Nope, just made it bit-size-generic
21:01 karolherbst: pmoreau: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6325
21:02 jenatali: pmoreau: Is that array-indexing-of-vector?
21:02 jekstrand: jenatali: Then that one's probably fine. One day, it feels like we should clean it up but maybe not today.
21:02 jenatali: jekstrand: Mmkay, I'm good with not reworking/retesting :)
21:03 pmoreau: jenatali: nir_deref_type_array_wildcard of a vector, yes. and I see the code says “Most of NIR cannot handle array derefs on vectors”
21:03 pmoreau: karolherbst: Cool! Let me merge that in locally and try it out
21:05 jenatali: pmoreau: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/116/diffs?commit_id=07516dd5c86deb401f4e3a4221dae2f9e82d6f7b - we've had a bunch of discussions here on what the right way is to bring this upstream
21:05 pmoreau: I see, thanks. Will try it out locally!
21:06 jenatali: Yeah, that should unblock you, depending on what lowering/optimization passes you need to run :P
21:09 pmoreau: It did, though spirv-val is still complaining about a block appearing before its dominator.
21:09 jenatali: We don't run spirv-val so I can't help you there :D
21:10 karolherbst: yeah.. vtn is quite forgiving
21:10 pmoreau: I have plenty to keep me busy, looking at the different fails even when spirv-val is happy. :-)
21:11 karolherbst: wat
21:11 karolherbst: marge didn't complain this time
21:12 karolherbst: so I guess the structurizer is merged now
21:12 jenatali: :D
21:13 karolherbst: if we continue like this, 20.3 could be a release with actually useful OpenCL :p
21:14 jenatali: libclc's still a big ticket item, but I think we're on a path to converge there
21:14 jenatali: And then images, though that is technically optional
21:14 karolherbst: yeah.. but at least now a lot of kernels will compile and run :p
21:14 pmoreau: jenatali: Is a nir_deref_type_ptr_as_array deref inside `hash_deref()` something you encountered before? I don’t see any changes in Erik’s repo so I guess not.
21:14 jenatali: pmoreau: I'm not familiar with hash_deref - what pass?
21:15 pmoreau: Called from nir_lower_locals_to_regs, let me look up the pass
21:16 pmoreau: The pass is called nir_lower_locals_to_regs
21:16 jenatali: I don't think we use that one, we use temp to scratch instead
21:16 jenatali: nir_lower_explicit_io on function_temp creates load_scratch/store_scratch ops
21:17 karolherbst: yeah..
21:17 karolherbst: I still want to wire that up in nouveau first...
21:17 karolherbst: let me write it down so I won't forget
21:18 jenatali: Since you can take address of a temp, store it into a buffer, load it through a completely different access chain, cast it back to a private pointer, and deref it
21:18 jenatali: No way you're gonna get that with locals_to_regs
21:22 jekstrand: I really doubt the author of this kernel knows just how mean they're being to drivers...
21:22 jenatali: Is this a situation where you can tell him? Or do you just have to make it work?
21:22 jekstrand: I can tell them
21:22 jekstrand: Generic pointers everywhere
21:22 jenatali: Yayyy
21:22 airlied: the SYCL cdoe gen does that anyways
21:22 jekstrand: Which is fine if we can somehow figure out in the compiler that they're actually global
21:22 jekstrand: But we can't
21:23 airlied: address spaces are hard :-P
21:23 jekstrand: So every load/store is going to be an if-ladder
21:23 jekstrand: It's going to be awesome
21:23 jenatali: Perf? Who needs it
21:23 karolherbst: more if ladders
21:24 karolherbst: jekstrand: why can't we figure it out?
21:24 jekstrand: Because they load them from other data structures. :)
21:24 karolherbst: ehhhhhhh
21:24 karolherbst: "fun"
21:25 karolherbst: I think to remember the main reason they got added was to allow one function to work for all address spaces :p
21:25 jenatali: Can you have a generic pointer as a kernel input?
21:25 karolherbst: jenatali: if you know how, yes
21:25 jenatali:sighs
21:26 karolherbst: "struct { int* ptr; } arg;" :p
21:26 jenatali: Oh sure, but you can't pass a cl_mem to that at the API
21:26 karolherbst: I think technically the runtime is supposed to disallow that...
21:26 karolherbst: jenatali: that's a struct passed by value :p
21:26 karolherbst: but yeah
21:27 karolherbst: you can't have a generic pointer as an arg
21:27 jenatali: Well, I guess you can, it just wouldn't work - the cl_mem's pointer would show up in the `ptr` value
21:27 karolherbst: jenatali: with SVM that might even work if you know how it's encoded at runtime :p
21:27 karolherbst: I think I or pmoreau even filed some spec bugs
21:28 pmoreau: I think that issue is still open and hasn’t received any love
21:28 karolherbst: yeah well...
21:29 pmoreau: https://github.com/KhronosGroup/OpenCL-Docs/issues/10
21:29 pmoreau: https://github.com/KhronosGroup/OpenCL-Docs/issues/11
21:30 karolherbst:should stop filing bugs against OpenCL-Docs :p
21:31 karolherbst: or maybe more?
21:31 karolherbst: dunno
21:31 pmoreau: They get addressed… eventually
21:31 karolherbst: if those two were the only ones I've filed :D
21:31 jekstrand: stupid pointer arithmetic....
21:32 karolherbst: :D
21:32 pmoreau: 3 open, 5 closed for me ;-)
21:32 karolherbst: you mean C developers write CLC kernels like they write C? unbelievable!
21:33 karolherbst: jekstrand: the (char*) casting type of arithmetic?
21:33 karolherbst: or the container_of kind?
21:34 karolherbst:has some kernels using container_of and linked list for fun
21:34 karolherbst: you know, sharing list.h between the CPU and GPU will be the future
21:35 airlied: yeah once we have HMM, list.h all the way
21:35 airlied: what's a cache for, who knows
21:35 karolherbst: and set.h of course
21:37 jekstrand: Ugh.... clover lowers address spaces WAY too early
21:38 karolherbst: :/
21:38 jenatali: Hm?
21:38 jekstrand: It doesn't optimize basically at all
21:38 jekstrand: That's why I have all these generics
21:38 karolherbst: ahhh.. :/
21:38 karolherbst: and lowering address spaces is the last thing clover does :O
21:38 jekstrand: I mean, I'm mostly fine with that. I want to run the brw_nir optimizer on it
21:39 karolherbst: well.. there are a few more things, but that's all unrelated
21:39 karolherbst: mhhh
21:39 karolherbst: jekstrand: something special we sould run before lowering address spaces?
21:39 jekstrand: karolherbst: optimizations. :-P
21:39 karolherbst: I really tried to put the bare minimum so it doesn't suck as much
21:39 karolherbst: :D
21:40 karolherbst: right...
21:40 jekstrand: karolherbst: Really, just let the back-end lower
21:40 jekstrand: And run its own optimization loop
21:40 karolherbst: yeah.. probably
21:40 jekstrand: At least, that's what I'd want in iris
21:40 karolherbst: but gallium already does its own thing and most backends are not prepared for that I think
21:40 jekstrand: I also want to specify my own address formats
21:40 jekstrand: So I can get if-ladders for generic
21:40 karolherbst: yeah.. mhh
21:41 karolherbst: I don't have any good solution for the nir optimization problem :p
21:43 jekstrand: karolherbst: Do we call into the gallium driver to create the shader prior to dispatch?
21:44 karolherbst: jekstrand: yes
21:44 karolherbst: for stupid reasons
21:44 jekstrand: karolherbst: Hrm... I'm not seeing it call into iris. :-/
21:44 karolherbst: huh?
21:44 karolherbst: ohh you mean before actually dispatching?
21:44 karolherbst: nope, we don't
21:44 jekstrand: I'm calling CreateKernel but never clSetKErnelArg or clEnqueue*
21:44 karolherbst: so you need to enqueue
21:45 karolherbst: the reason for that are quite silly though
21:45 karolherbst:should fix it
21:46 jekstrand: karolherbst: Is there a quick hack I can do?
21:46 jekstrand:is all about quick hacks today
21:46 karolherbst: nope
21:46 karolherbst: kernels have no context related to them
21:46 jekstrand: I guess I need to hack up my test program to actually set kernel args
21:47 karolherbst: and we need a context to let the backend actually compile
21:47 karolherbst: there was some work to allow compilations on a screen directly.. but ufff
21:47 karolherbst: or sharing between contexts or whatever
21:47 karolherbst: it's messy
21:47 jekstrand: yeah
21:47 karolherbst: we need to fix it anyway
21:48 karolherbst: for clGetKernelWorkGroupInfo
21:48 karolherbst: right now clGetKernelWorkGroupInfo is just broken
21:48 karolherbst: or was it something else?
21:49 pmoreau: I think it might be it
21:50 pmoreau: You can’t adapt the number of threads or block size returned by the driver based on the resources taken by the kernel.
21:50 karolherbst: right...
21:50 jekstrand: How big is a long in CL?
21:50 karolherbst: so this affects CL_​KERNEL_​WORK_​GROUP_​SIZE
21:50 jekstrand: 64bit?
21:50 karolherbst: jekstrand: yes
21:51 karolherbst: pmoreau: but I think internally we also always assume the worst case, right?
21:51 karolherbst: so smaller kernels are actually running slower than they could
21:51 karolherbst: or well.. with less threads than they could
21:51 karolherbst: something was there we could improve
21:52 pmoreau: No, IIRC we always assume the best case and return the max.
21:52 karolherbst: eh?
21:52 pmoreau: Let me see…
21:52 karolherbst: maybe it was just r600 doing that?
21:53 jekstrand: Seriously? Clover is throwing an exception for CL_SUCCESS
21:53 jekstrand: Wow
21:53 karolherbst: jekstrand: sure
21:53 jekstrand: Just wow
21:53 karolherbst: it throws exceptions for everything
21:53 jenatali: ...
21:54 karolherbst: clover is the prime example for "control flow through exceptions" :p
21:54 karolherbst: also, everything can throw exceptions
21:54 karolherbst: just in case you try to step and always get thrown somewhere
21:54 kisak: exception-al coding
21:54 karolherbst: "catch throw" helps
21:55 karolherbst: jekstrand: I wouldn't mind a clover 2.0 being nir/spirv only :p
21:55 karolherbst: and not a toy project trying out C++11 features :p
21:56 pmoreau: If it is clover 2.0, it would have to try out all the new C++20 features. ;-D
21:56 karolherbst: heh..
21:57 pmoreau: 2.0 -> 20
21:57 karolherbst: but I am not even sure it would actually take soo much time, I just have things I'd prefer to do instead
21:57 karolherbst: but if somebody else would come around, I'd just ack it
21:57 pmoreau: nvc0_scren.c returns unconditionally 1024 for PIPE_COMPUTE_CAP_MAX_THREADS_PER_BLOCK.
21:59 karolherbst: pmoreau: right...
21:59 karolherbst: so I guess that would break for more complex kernels? mhhh
21:59 karolherbst: or is that actually fine on hw?
21:59 pmoreau: Which is correct if a) you don’t use more than 32 regs per thread, and b) less than 64 bytes? per thread.
21:59 pmoreau: 64 bytes of shared mem
21:59 karolherbst: ehhh...
21:59 karolherbst: dunno?
22:00 karolherbst: pmoreau: why per thread though?
22:00 karolherbst: shared mem is.. shared
22:00 karolherbst: doesn't make sense to make it per thread
22:01 karolherbst: but yeah...
22:01 karolherbst: in case I hit such complex kernels... well
22:02 karolherbst: pmoreau: actually... it's a clover bug
22:02 pmoreau: Well yeah, but depending on shared mem is being used, you might see it as a per-thread resource.
22:02 karolherbst: reporting 1k for _MAX_ is fine :p
22:02 karolherbst: clover just always uses that
22:02 jekstrand: karolherbst: Hrm... clover thinks "unsigned long" is 4 bytes
22:03 karolherbst: huh? where?
22:03 pmoreau: Is “unsigned long” dependent on the number of address bits?
22:03 karolherbst: no
22:03 karolherbst: long is 64 bit
22:03 karolherbst: that's at least something CLC fixed over C :p
22:04 pmoreau: clover needs to be updated, sure, but so does the driver and the infrastructure to allow for querying limits on a per-kernel basis.
22:04 karolherbst: sure
22:04 karolherbst: not saying it's not a lot of work :p
22:06 karolherbst: jekstrand: in which case was it messed up though? kernel arg sizes are derived from the spirv
22:07 jenatali: Yeah, I'd expect clang to have gotten that right
22:07 jekstrand: Hrm... I think I might see what's going sideways
22:07 jekstrand: karolherbst: Yeah, it's probably messed up somehow
22:07 jekstrand: That or the SPIR-V is wrfong
22:07 jekstrand: *wrong
22:08 karolherbst: CLOVER_DEBUG=spirv CLOVER_DEBUG_FILE=out and then you get it dumped into out.spvasm
22:08 jekstrand: karolherbst: They're coming in from LLVM right
22:08 jekstrand: karolherbst: But it's messing up kernel args elsewhere too
22:09 karolherbst: ehh..
22:09 jekstrand: karolherbst: I think the problem is that it's not handling multiple entrypoints properly
22:09 karolherbst: mhhh
22:09 jekstrand: It thinks I have 14 inputs when I actually have 6
22:09 karolherbst: that would be.. strange
22:09 jekstrand: But the *other* entrypoint has 14 :)
22:09 karolherbst: I thought I didn't mess that up
22:09 karolherbst: :D
22:09 karolherbst: maybe I did
22:09 jekstrand:is very surprised no one noticed this
22:09 karolherbst:is not
22:10 jekstrand: Or maybe I'm doing something wrong. 🤷
22:10 karolherbst: jekstrand: I guess you do specify the correct kernel when creating it, right?
22:10 jekstrand: karolherbst: Pretty sure....
22:10 karolherbst: mind checking how spirv_to_nir gets called?
22:11 jekstrand: Yeah, spirv_to_nir is getting called with the correct entrypoint name
22:11 karolherbst: but uff...
22:11 karolherbst: jekstrand: we call it for every entry_point...
22:11 karolherbst: mhhh
22:11 karolherbst: maybe the indexing is messed up.. let's see
22:11 karolherbst: there was something stupid
22:12 jekstrand: Maybe it's seeing 6 uint64_ts and assuming they're 12 uint32_ts?
22:12 jenatali: Shouldn't be
22:12 karolherbst: I doubt it
22:12 karolherbst: jekstrand: mind giving me enough to reproduce?
22:12 karolherbst: then I can check myself
22:12 jekstrand: karolherbst: I don't know how much is enough
22:12 karolherbst: the kernel probably
22:12 karolherbst: well...
22:12 karolherbst: without the code
22:12 karolherbst: I guess just the declarations alone would be
22:13 jekstrand: Let me see if I can figure it out
22:13 karolherbst: and maybe the c code you used to run it
22:14 karolherbst: jekstrand: which one is the first? the one you use or the other one?
22:17 jekstrand: the one I use
22:18 jekstrand: I think I know what's going on......
22:18 jekstrand: create_module_from_spirv builds a single list of arguments
22:18 jekstrand: hrm... no
22:19 jenatali: I doubt the issue's going to be in vtn
22:19 karolherbst: that's clover :p
22:19 karolherbst: jekstrand: I am sure I tested multiple entry points, I am just not sure I tested them properl
22:19 karolherbst: y
22:19 jenatali: I'm sure I'm passing the CTS tests for them ;)
22:22 karolherbst: jekstrand: see that "auto msym = find(name_equals(kern.name()), m.syms)" line in core/kernel.cpp ?
22:22 jekstrand: karolherbst: Yup
22:23 karolherbst: that is responsible for extracting the correct kernel
22:23 karolherbst: I think...
22:23 karolherbst: might be the one two lines later
22:23 karolherbst: anyway
22:23 karolherbst: that's stuff is happening there
22:23 karolherbst: or is the issue that clSetKernelArg fails?
22:24 karolherbst: how are you calling clSetKernelArg? :D
22:25 jekstrand: karolherbst: Hrm...
22:26 jekstrand: drp! I am using the wrong one.
22:27 jenatali: :)
22:28 jekstrand: Uh... was there a glsl_types patch for decode_type_from_blob I'm missing?
22:28 jenatali: For what type?
22:28 jekstrand: Ugh... glsl_type_singleton_init_or_ref() isn't being called somewhere. :-(
22:28 jekstrand:blames iris_screen.c
22:28 jenatali: Ah
22:28 karolherbst: jekstrand: yep.. the driver has to call it...
22:29 jenatali: Was going to say, this is the only one I know of: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5242/diffs?commit_id=8cedfe6d097618d98adbf010e2a4dae5c585e39a
22:29 karolherbst: and we all know why and this makes me sad
22:30 jekstrand: aha!
22:30 jekstrand: Just need to call more nir_lower_io
22:32 anholt: jekstrand: does it feel reasonable to have nir_lower_explicit_io() set a base and range on UBO accesses? (we want to upload ubos as push constants and do indirection on regs instead of external memory access)
22:32 jekstrand: anholt: Probably.
22:33 jekstrand: anholt: It's just annoying because you can't always get the base/range in general
22:33 jekstrand: So we need some way to represent "I don't know"
22:33 anholt: I as going with 0,~0 as "dunno"
22:33 jekstrand: Yeah, that probably works
22:33 anholt: but I'd like to have at least a 0,ubosize worst case
22:33 krh: worst case it's the base/range of the array, no?
22:33 anholt: (anv got a couple of 0,~0s in it)
22:33 jekstrand: Vulkan variable pointers :)
22:33 jekstrand: You may not know anything
22:34 jekstrand: But that's not the common case
22:35 karolherbst: why are we the only ones having no solution for reg indirection :p
22:36 anholt: looks like the deref no longer points to a var by the time build_explicit_io_load() gets to it.
22:36 jekstrand: Because most of the time it's not actually that useful
22:36 jekstrand: anholt: It can handle that case, yes. But often the variable is available.
22:36 karolherbst: anyway.. I need sleep now :p
22:37 karolherbst: I am sure next week will be... "fun"
22:37 anholt: in the build, nir_deref_instr_get_variable() is giving me all NULLs in GLES3.functional.ubo
22:37 jekstrand: Well, would you look at that:
22:37 jekstrand: SIMD8 shader: 507 instructions. 2 loops. 39541 cycles. 35:40 spills:fills, 2 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 8112 to 6320 bytes (22%)
22:37 karolherbst: :D
22:37 jekstrand:declares victory for the day \o/
22:37 jenatali: Nice!
22:38 jekstrand: anholt: Uh.... I would think you'd have them in GL.
22:39 anholt: looks like we get a cast in the way
22:40 jekstrand: But I've not looked at GL in a while. :)
22:40 jekstrand: anholt: Are you running the vectorizer?
22:40 anholt: This is just iris
22:40 karolherbst: jekstrand: I hope we didn't accidentally broke it?
22:40 jekstrand: anholt: Yeah, iris vectorizes IO
22:40 karolherbst: or I did?
22:41 karolherbst: ohh wait
22:41 karolherbst: nvm
22:41 jekstrand: My NIR patch for generic is surprisingly small so far....
22:44 karolherbst: jekstrand: I suspect the details will be annoying, like shaders doing weird casts or something...
22:44 karolherbst: anyway...
22:45 anholt: before explicit io I've got
22:45 anholt: vec2 32 ssa_21 = load_const (0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */)
22:45 anholt: vec2 32 ssa_22 = deref_cast (vec4 *)ssa_21 (ubo vec4) /* ptr_stride=0 */
22:45 anholt: vec4 32 ssa_6 = intrinsic load_deref (ssa_22) (0) /* access=0 */
22:46 anholt: on a plain vec4 ubo
22:47 anholt: gl_nir_lower_buffers creates the cast.
22:56 anholt: not really seeing how to plumb a range through this from the place where we might know it (get_block_index_offset)
22:59 jenatali: jekstrand: I think I'm going to close the bools MR. I was curious what clang/spirv did, and sure enough, they're explicit about marking them as uchar rather than a bool type
23:00 jenatali: Confirmed for both externally-visible (__global) and internal (__private), the spirv already has them as uchar
23:00 jenatali: And after I reverted that patch I was trying to upstream, the test that was broken without it still passed... probably because I added copy_prop as part of reconciling with one of your vtn reworks
23:06 jekstrand: jenatali: Cool. It still seems like a thing that could cause us grief....
23:11 jenatali: Potentially. But I'd like to wait until I see it cause us grief (again) before figuring out the right way to solve it
23:11 jekstrand: sure
23:11 jekstrand: If LLVM gives us uchar, then we probably don't have a problem
23:16 airlied: jekstrand: nice!
23:34 airlied: karolherbst, jekstrand : \o/ for structrizer
23:40 jekstrand: jenatali: I think I'm going to scrape some of the nir_lower_io patches from your branch
23:52 jekstrand: jenatali: Didn't you have some patches somewhere to add a new address mode? I can't find it in kusma's tree
23:53 jekstrand: Maybe I'm looking in the wrong tree?
23:54 jekstrand: This MR in particular: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/83