00:00 airlied: damn no I'm thinking about indefinite fences again. better go do something productive
00:00 jenatali: karolherbst: Programs belong to a context, and kernels as well, rather than being tied to a device
00:00 karolherbst: right.. but that's easy to fix
00:00 karolherbst: you can always compile per device
00:00 karolherbst: resource sharing is a problem
00:00 jenatali: Yeah
00:00 karolherbst: but.. a proper memory model can also deal with that
00:00 jenatali: If you were curious, here was my multi-device implementation: https://github.com/microsoft/OpenCLOn12/commit/97b635b1757ff7a19283b8deb7a2f226fe480771
00:00 jenatali: It's... ugly :P
00:00 karolherbst: just cross device atomics are "fun" and shit
00:01 karolherbst: atomics on SVM are even more fun
00:02 HdkR: Just need PCIe atomics :P
00:02 karolherbst: you make it sound so easy :D
00:02 jekstrand: SVM atomics are easy for us.
00:03 karolherbst: ...
00:03 jekstrand: It's shared atomics that are kind-of annoying
00:03 karolherbst: jekstrand: cross device SVM :D
00:03 karolherbst: you think your CPU and intel GPU are the only ones using the memory, but then I have this nice nvidia GPU here using it as well :p
00:04 jekstrand: karolherbst: Yeah, it gets massively harder with discrete
00:04 karolherbst: you don't even have the resource sharing issue as it "just works"
00:04 jekstrand: With integrated, they both talk to the same cache. Done.
00:04 karolherbst: jekstrand: you know what's more fun?
00:04 karolherbst: we mmap a file and pass it to the GPUs :p
00:05 jekstrand: karolherbst: Why not?
00:05 karolherbst: yeah, and do atomics
00:06 jekstrand: Once all this persistent ram stuff becomes a thing, I look forward to mmap and then texture from it.
00:06 karolherbst: ehhh
00:06 karolherbst: I mean.. now
00:06 karolherbst: with.. you know,your normal ssd and shit
00:06 karolherbst: I think you could even SVM device memory...
00:06 karolherbst: mhhh
00:07 jekstrand: Yeah.... that gets terrible
00:07 jekstrand: fast
00:07 karolherbst: reverse engineering devices with AI/ML+SVM?
00:07 karolherbst: :D
00:07 karolherbst: anyway.. SVM + files should already work these days I think?
00:08 karolherbst: never tried though
00:08 airlied: don't think it does yet
00:08 airlied: it's on glisse list
00:08 karolherbst: yeah.. maybe it's still WIP?
00:08 airlied: I think it's all he's been working on for 8 months now :-P
00:08 karolherbst: but I guess it does make sense for GPU accelerated databases and shit
00:08 karolherbst: I guess
00:09 jenatali: What a world we live in, where GPU-accelerated database can actually be a thing
00:09 airlied:hacks on a spirv-llvm-translator spec file for fedora
00:10 karolherbst: jenatali: "can"?
00:10 airlied: one more core for oracle to sell you a licenese for :-P
00:10 jenatali: :P
00:10 karolherbst: you make it sound like it doesn't exist yet :D
00:11 bnieuwenhuizen: airlied: or many cores depending on how you count ;)
00:11 karolherbst: well.. we should then start counting AVX widths as cores as well :p
00:12 bnieuwenhuizen: counting CU/SMs falls short of that
00:12 karolherbst: I am sure intel disagrees
00:27 Kayden: jekstrand: read back up a bit. one alternative to caps in a lot of situations is nir_shader_compiler_options. that tends to be a lot less boilerplate and rubbish.
00:28 Kayden: I have some patches kicking around to make iris use finalize_nir, and I should, but I ran into a bunch of issues with them
00:28 Kayden: it would be good if we did but also I kind of share karolherbst's "ugh" sentiment. :)
00:28 Kayden: (it's unfortunately a bit more tricky, with finalize happening in a couple places and variants and...stuff.)
00:28 Kayden: (it always is!)
00:31 Kayden: jekstrand: also, happy to review an MR, but didn't see which one it was
00:32 airlied:found I had to be careful with finalize nir and stuff getting called twice with llvmpipe
01:45 karolherbst: Kayden: clover can't call into finalize_nir
01:45 karolherbst: so...
01:45 karolherbst: it's all annoying anyway
04:56 airlied: MrCooper, anholt : fyi did a vallium deqp-vk CI run for the lulz, https://gitlab.freedesktop.org/airlied/mesa/-/jobs/4154282
04:57 airlied: it's probably going to just make it under the hour :-P
05:39 tomeu: airlied: guess then that CI_NODE_INDEX: 1 and CI_NODE_TOTAL: 10 would give some nice coverage?
06:20 airlied: tomeu: yes i might see how that goes tmrw
06:21 airlied: need to fix some crashes first
06:52 tomeu: agd5f_: was looking at upgrading the kernel we use in mesa ci for some devices including an amd chromebook, and amdgpu has stopped probing: https://gitlab.freedesktop.org/tomeu/mesa/-/jobs/4155886
06:53 tomeu: the problem seems to be related to the platform device acp_audio_dma.0.auto not having been registered
06:53 airlied: tomeu: turned on the acp config option?
06:54 tomeu: airlied: I have now CONFIG_SND_SOC_AMD_ACP, though it wasn't needed before
06:54 tomeu: but no luck
06:55 tomeu: hmm, maybe disabling CONFIG_DRM_AMD_ACP would avoid that code being run
06:55 tomeu: I don't expect to need sound to test mesa :p
06:56 pmoreau: airlied: Does Nouveau benefit from the new SPIR-V target in libclc? I’d assume for us that OpenCL functions gets either lowered in NIR or in the driver.
06:58 pq: emersion, did you notice that Sean sent "drm/trace: Mirror DRM debug logs to tracefs" again? That's the flight recorder.
07:24 MrCooper: airlied: cool; FWIW, I understand fetching from external Git repos is essentially "free" in CI, e.g. the Windows image build clones and builds LLVM
07:35 pmoreau: airlied: Nevermind, I finished reading the backlog and got my answers. :-)
07:35 airlied: pmoreau: yes its for all spirv consumers
07:35 airlied: pmoreau: ah cool
07:38 airlied: tzimmermann: did you knkw ast have an out of tree fork
07:38 airlied: just found it via a rh bug
07:39 tzimmermann: airlied, we do have a bugreport that indicates that apseed maintains their own driver of some kind
07:39 tzimmermann: that's all i known about it
07:39 airlied: trying to get ast to upatream it
07:39 airlied: its not a massive delta
07:40 tzimmermann: airlied, is your bug report about firmware updates?
07:41 airlied: tzimmermann: hehe yes
07:41 airlied: https://www.aspeedtech.com/support.php
07:41 airlied: is where the oackage is
07:41 airlied: i tried backporting some obviius bits but didbt help yet
07:42 tzimmermann: airlied, it's probably the same thing. i made a workaround for that bug. but then the display hangs during graphical login
07:47 mlankhorst: airlied: maintainer ack on merging https://patchwork.freedesktop.org/patch/362898/?series=76431&rev=1 through drm-misc?
07:49 tzimmermann: airlied, ast is now atomic. the aspeed driver is fairly outdated
07:52 emersion: pq: ah, no, thanks for the heads-up!
07:54 airlied: tzimmermann: yup its just the hw programming bitsof interwst
08:08 icecream95: io_uring makes it possible to run a web server completely in a compute shader. Writing CGI scripts in GLSL should be fun :P
08:11 MrCooper: just a matter of time until Linux runs on the GPU
08:14 mlankhorst: that sounds awful enough that it would be a fun project..
09:26 Venemo: freaking gcc is getting super annoying with -Wstringop-overflow
09:30 ccr: no, no .. you need to implement javascript or wasm on top of glsl first, or so. turtles all the way down.
10:36 karolherbst: jekstrand: ehh.. moved lowering of function_temp into nouveau and nir passes managed to optimize away all store_scratch calls, but left load_scratch :)
10:36 karolherbst: that can't be right :/
10:36 karolherbst: but I see the store_derefs before opts.. let's see
10:36 cwabbott: jekstrand: do you want to make any comments before I land !5719? It's already been reviewed
10:39 karolherbst: mhh.. nir_lower_vars_to_ssa mhh
10:40 karolherbst: but it only breaks if the vec type reaches 128 bytes...
10:40 karolherbst: works for uchar8, ushort4, uint3, etc...
12:10 karolherbst: ehhh...
12:10 karolherbst: I see what's going on :/
13:44 karolherbst: jenatali: got your offset stuff working :)
13:45 jenatali: karolherbst: Awesome :)
13:45 karolherbst: the patch is quite hacky though :D
13:46 karolherbst: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/4d529fdd6aa5647375d2d0a4e41eafd574c48649
13:46 karolherbst: jenatali: I don't know what to think of drivers having to handle nir_intrinsic_load_work_group_id_zero_base
13:47 karolherbst: also, has_base_work_group_id has to be true otherwise you end up with gloabl_invocation_id_zero_base
13:47 karolherbst: which.. is a bit annoying
13:47 jenatali: That might just be a bug
13:47 karolherbst: but constant folding 0 is also fine
13:47 karolherbst: clover just forces 32 bit offsets, so I need to get this worked out
13:48 karolherbst: and it treats the offset as 3 scalar args, not a vector
13:48 karolherbst: :D
13:48 karolherbst: which.. makes sense
13:48 jenatali: Yeah
13:48 karolherbst: because then you don't have to deal with this stupid alignment
13:48 jenatali: Let me know if there's changes you think I should make to the MR, or honestly feel free to push changes to it
13:49 karolherbst: I think has_base_work_group_id = false, shouldn't end up with gloabl_invocation_id_zero_base
13:50 karolherbst: let's see
13:50 jenatali: Why not? They're unrelated, aren't they?
13:50 jenatali: Did you mean work_group_id_zero_base?
13:50 karolherbst: drivers can't handle that :p
13:51 karolherbst: no, gloabl_invocation_id_zero_base
13:51 karolherbst: wait... I can show you the nir
13:52 jenatali: Sure - like I said, might just be a bug in the lowering. I'd only expect the global_invocation_id_zero_base to show up if you set has_base_global_invocation_id, and don't set lower_cs_global_id_from_local
13:53 karolherbst: .lower_cs_global_id_from_local is true
13:53 karolherbst: and has_base_global_invocation_id as well
13:53 karolherbst: but I still end up with vec3 64 ssa_35 = intrinsic load_global_invocation_id_zero_base () () :)
13:54 jenatali: I would've expected it to be lowered by https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/3ac0d496a688f54f06dcff11429f9c8fb73d3f68#99421c3761cfc8b192f107f54dbc376b2faf3e58_336_335
13:54 karolherbst: mhhh
13:55 karolherbst: let's see
13:57 karolherbst: ehh..
13:57 karolherbst: lower_cs_global_id_from_local is false... but I am sure nouveau sets it to true...
13:57 karolherbst: annoying
13:57 karolherbst: ohh wait...
13:58 karolherbst: jenatali: ahh, that's a new one and you forgot to set it in one file :)
13:59 jenatali: Ahhh my bad
14:00 karolherbst: jenatali: okay.. I will fix it locally and push to your branch with the clover changes once I am happy with them
14:00 jenatali: karolherbst: Sounds great, thanks!
14:14 karolherbst: mhhh
14:19 karolherbst: ahhhh, this is just so broken :/
14:20 jenatali: ?
14:20 karolherbst: clover I mean.. mhh this is a bit odd...
14:22 karolherbst: heh...
14:22 karolherbst: ahhh.. I see how it works now
14:22 karolherbst: mhhh
14:22 karolherbst: annoying
14:23 karolherbst: oh well..
14:23 karolherbst: jenatali: in CL all devices have to support three dimensions, no?
14:23 jenatali: karolherbst: Correct
14:23 karolherbst: mhhh
14:23 jenatali: Unless you're embedded
14:24 karolherbst: okay.. but if you have a device which only supports 2, but you specifyc a three dimensional offset
14:24 karolherbst: what should happen?
14:24 jenatali: Uhhh
14:25 jenatali: Set a max work group size of 1 in the third dimension?
14:25 karolherbst: well.. but you still specify your three dimensional global_work_offset
14:25 karolherbst: ohh wait..
14:25 karolherbst: mhhh
14:25 karolherbst: it is capped with work_dim
14:26 karolherbst: so yes.. even if you'd specify more dimensions, only up to work_dim are relevant
14:27 jenatali: Yeah
14:27 karolherbst: mhhh...
14:27 jenatali: You do have to support having a work_dim of 3 though
14:27 karolherbst: sure
14:27 karolherbst: but clover handles this case still
14:27 karolherbst: so it only adds two 32 bit values in case your device only supports two dimensions
14:28 karolherbst: which is fine, we would just have to expose such devices as embedded or whatever
14:28 jenatali: Yeah, I just double-checked, they have to be CUSTOM
14:28 jenatali: I.e. not a GPU
14:28 karolherbst: just expose your GPU as CUSTOM then :p
14:28 karolherbst: :D
14:29 karolherbst: anyway.. I try to consider this use case still
14:29 jenatali: karolherbst: What hardware do you care about that can only support 2?
14:29 karolherbst: nv50 only has 2
14:30 karolherbst: not quite sure why though
14:30 imirkin: because it was the first compute thing that existed
14:30 karolherbst: right.. but I mean we could support three dimensions?
14:30 imirkin: 3 dims is nuts anyways
14:30 karolherbst: but then clover would have to lower it
14:30 imirkin: yeah, the dims are completely fake afaik
14:30 karolherbst: well.. not directly
14:31 karolherbst: you still have the group ids
14:31 jenatali: Didn't realize people still cared about parts that old
14:31 karolherbst: and if you only have a 2d matrix of your grids?
14:31 karolherbst: instead of a 3d one
14:31 imirkin: you can make any 3d group into a 1d group and have a 1:1 mapping between the id's
14:31 imirkin: and no one would be the wiser
14:32 karolherbst: jenatali: well, the advantage of open source community stuff is, nobody forces you to remove support for older hardware :p
14:32 karolherbst: imirkin: right.. but you still run out of space
14:32 karolherbst: and with nv50 you run out of space fast
14:32 jenatali: karolherbst: Removing support is one thing - we don't do that either. Making new stuff work on old hardware is different :P
14:33 karolherbst: 65535 grids isn't _that_ much if you enqueue a 3d grid
14:33 imirkin: karolherbst: there's a max size on all these things
14:33 imirkin: karolherbst: and you can set a max on the product of the grid dims
14:33 karolherbst: mhh right
14:33 imirkin: so the max would be 64k
14:33 imirkin: :)
14:33 karolherbst: so you could do 65535x65535 grids, but you could do smart math and fake 3d :p
14:34 imirkin: i don't think it can do 64k x 64k
14:34 imirkin: i think it can only do 64k total
14:34 imirkin: so 64k x 1, 32k x 2, etc
14:34 karolherbst: we are a level higher
14:34 karolherbst: nv50 can only do 512 threads, yes
14:34 karolherbst: but once you are at the grid level it doesn't matter
14:34 karolherbst: it's just a counter
14:35 imirkin: ok. i think there's still a max on that, but maybe not
14:35 karolherbst: or was there an actual limit? . would be strange
14:35 ajax: jenatali: tbh the fact that we _do_ try to make things work even on absurdly outdated hardware is... i'ma go with "noble" maybe? right and proper?
14:35 karolherbst: yeah..
14:35 karolherbst: shouldn't amtter
14:35 karolherbst: you don't run grids concurrently, so..
14:35 jenatali: ajax: Yeah, "noble" is a good word for it
14:35 ajax: one, computers shouldn't be disposable
14:35 jenatali: I'm just a little surprised
14:36 karolherbst: jenatali: nv50 was the first real compute device though :p
14:36 imirkin: jenatali: we're not selling hardware
14:36 ajax: two, all the free-software zealotry talk about being in control of your own machine only matters if you can run it on the hardware you _have_
14:36 karolherbst: :D
14:36 ajax: and as a former broke high school student cobbling machines together from dumpster bins...
14:36 jenatali: Yeah, fair enough
14:36 imirkin: also nv50 does have OpenCL and CUDA support from NVIDIA
14:37 ccr: <firmware blobs, ME, PSP> .. sup guyse?
14:37 karolherbst: jenatali: most of the time people spend working on the hardware they have/care about and I don't own a nv50 ( well I do, but I didn't powered it on for a long time), but pmoreau cares so.. :p
14:37 imirkin: so we want to live the dream of feature parity
14:38 ajax: why else would anyone still spend any amount of timing caring about intel gen3, except that there's actual millions of them out there
14:39 jenatali: Yeah, I get it. Just a different mindset from what I'm used to, but it makes sense
14:41 ccr: it is admirable that you people (or some of you) bother. I am still running one g33 machine regularly, and my main desktop rig is a haswell. laptop has some ye ancient NVidia chip. and everything works fine. *cheers*
14:43 ajax: (sorry to get all preachy)
14:44 pmoreau: karolherbst: I can give you access to my testing box with a Tesla in it if you want
14:44 karolherbst: I can also just set up my tesla box :p
14:45 pmoreau: Or that :-D
14:46 jenatali: I do get keeping old stuff working though. We've got a GPU wall of history (https://devblogs.microsoft.com/directx/wall-of-gpu-history/) with a checkout system so we can pop in older cards and reproduce/fix problems with them
14:46 jenatali: Been a while since I had to grab a Tesla though
14:47 pmoreau: My laptop is from 2009, so always have a Tesla handy (two even!). :-)
14:48 ajax: jenatali: i drooled for a solid minute the first time i saw that wall (virtually, not had the privilege in person)
14:48 imirkin: jenatali: what happens to be in my comp atm: https://hastebin.com/elirerudal.makefile
14:49 pmoreau: Very nice hallway! 👍️
14:49 imirkin: nothing quite so extensive as that wall, of course...
14:49 imirkin: i have one drawer of gpu history :)
14:50 jenatali: :)
14:50 pmoreau: imirkin: Got the Riva TNT2 in case the GP108 lacks a bit of horsepower in some programs? :-)
14:50 imirkin: pmoreau: exactly
14:50 ajax: any chance there's a full inventory of that wall, and/or would you accept donations? ;)
14:50 imirkin: ajax: laying the grounds for an exchange program? :)
14:51 karolherbst: jenatali: done and pushed :)
14:51 ajax: i have a few weirdos in cabinets that ought to not disappear
14:51 jenatali: ajax: I think I can track down the inventory, yeah, and I'm sure we'd be willing to accept donations if you've got something we're missing
14:51 Venemo: jenatali: do you offer guided tours to that hallway?
14:52 jenatali: Venemo: We've had guests come to our building specifically to see that wall, yeah, but I'm not sure how broadly I can just say "yeah come visit"
14:52 jenatali: Especially during the pandemic :P
14:52 karolherbst: ehh..
14:53 ajax: tsk. pci.ids doesn't have an entry for the talisman. does have the pyramid3d though!
14:53 Venemo: sadly I'm on a different continent, so couldn't visit anyway
14:53 pmoreau: You haven’t set up virtual tours using the HoloLens? Sounds like a missed marketing opportunity to me. :p
14:53 jenatali: pmoreau: That'd be pretty awesome actually
14:53 pmoreau: Missing the headset, but I would be up for a virtual tour!
14:54 jenatali: ajax: Found the inventory - not sure I can directly share it though. Got something you're curious about?
14:58 ajax: hey, there's where i put that list
14:58 ajax: https://wiki.x.org/wiki/AdamJackson/
14:59 ccr: mmm. the infamous BitBoys Pyramid3D included.
14:59 ajax: artist graphics 3ga
15:00 ajax: ^ i have one, but never put it on that list for some reason
15:01 ajax: i'd love to lay hands on whichever motherboard it was that had an artx chip
15:01 ajax: ALi something i think?
15:02 ccr: I've read about it, probably because it was mentioned here (or #intel-gfx maybe)
15:02 jenatali: ajax: Looks like we've got a decent selection of the things on that list, but are definitely missing some
15:02 ajax: nv2 would be unobtanium, pretty sure i didn't even see one when i visited nvidia and saw their mini-museum. they did have an nv1 though! (and so do i!)
15:03 karolherbst: pmoreau: 97 test passing in test_basic now :)
15:04 pmoreau: Nice! How many tests are there? (Or what is the success rate?)
15:04 ajax: i don't remember which rendition verite i have, 2100 or 2200. definitely don't have the 1000 though.
15:04 karolherbst: Pass 97 Fails 1 Crashes 4 Timeouts 4
15:04 ccr: I fondly(?) remember the first time I tried to get some S3 Virge Diamond to "accelerate" under Utah-GLX. glquake mostly just rendered garbage and was slower than software, of course.
15:04 imirkin: ajax: what was that sega console, i forget the name...
15:04 karolherbst: crashes are due to constant
15:04 ajax: imirkin: dreamcast. powervr chip
15:05 karolherbst: fail is garbage.. it complains about a prerocessor macro set although we don't support images
15:05 karolherbst: and timeouts are async glocal<->local stuff
15:05 ajax: assuming you don't mean the saturn, that is. which... sure, we'll call that a gpu.
15:05 imirkin: ajax: i meant saturn :)
15:05 imirkin: it had an early nvidia gpu
15:06 pmoreau: karolherbst: I see; pretty good results!
15:06 ajax: imirkin: citation needed? nv1 was quad-based too but i've not heard the saturn be accused of being related
15:06 karolherbst: pmoreau: mind reviewing the last patch of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5891 ? :D
15:07 karolherbst: that gives us offsets
15:07 imirkin: ajax: i thought it was well-known
15:07 imirkin: i'll try to track something explicit down
15:07 imirkin: afaik it was an nv1 variant
15:07 pmoreau: karolherbst: Not right now: still working.
15:08 sravn: tzimmermann: Can I persuade you to take a look at the Keembay drm driver? At least the memory things that you know so well
15:08 imirkin: ajax: https://en.wikipedia.org/wiki/NV1 -- look at some of the saturn references there
15:08 sravn: tzimmermann: https://lore.kernel.org/dri-devel/1597096418-28937-1-git-send-email-anitha.chrisanthus@intel.com/
15:08 imirkin: ajax: apparently NV1's even shipped with saturn controllers
15:08 ajax: right, it shipped with a copy of virtua fighter, but i thought that was more about the ease of porting
15:09 imirkin: hrmph. i guess i don't see direct references to it
15:10 ccr: I think it's just that NV1 happened to be quad-based like Saturn's hardware was
15:10 imirkin: i definitely see references to them working together
15:10 imirkin: so i don't think it was random
15:10 imirkin: but it may have been a parallel development effort
15:10 ajax: https://segaretro.org/VDP1_(Saturn) makes a stronger, unsourced claim: "The Saturn's VDP1 was the basis for Nvidia's first graphics processor, the NV1, [...]"
15:14 ajax: ooh, it has the programming manual too
15:15 ajax: which... doesn't smell a lot like nvidia's object/channel thing, which even nv1 had
15:18 jenatali:checks to see if we can share our inventory
15:24 agd5f_: tomeu, there is also an ACP option on the amdgpu side as well since the ACP audio on CZ/ST was part of the GPU device
15:25 agd5f_: tomeu, CONFIG_DRM_AMD_ACP
15:25 tomeu: agd5f_: yeah, what I ended up doing was disabling that kconfig
15:26 tomeu: and moved on :(
15:26 tomeu: it used to work with 5.5, not sure what could have changed since
15:26 agd5f_: tomeu, don't know. that stuff hasn't changed in ages
15:27 tomeu: agd5f_: do you know what is supposed to register that device?
15:28 agd5f_: gpu driver
15:29 agd5f_: it detects if ACP is present and requests the ACP audio driver to load
15:29 agd5f_: anyway, only needed for audio, which probably isn't a big deal for a mesa CI
15:31 jekstrand: cwabbott: Nah. Land it. I think we probably want to switch ANV away from our magic intrinsic but that can be a task for another day.
15:35 karolherbst: some more reviews/acks for https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6377 ? I really don't want to land it with just one review, but maybe that's fine and nobody cares :p
15:36 karolherbst: ohh.. I should rename the lower_double_ops function.. mhh
15:38 karolherbst: heh...
15:38 karolherbst: ahh, no, __fne64 is unordered as well :)
16:11 anholt: airlied: looks like you could set CI_NODE_INDEX=1, CI_NODE_TOTAL=10 to the vars and get a 1/10th run that completes in 6ish minutes.
16:35 jekstrand: karolherbst: Can you pull my nir_var_mem_constant MR (6379) and give it a try on nouveau?
16:35 jekstrand: karolherbst: I think it's now in a state where it should be no worse than what we have today.
16:35 karolherbst: sure
16:36 jekstrand: karolherbst: You'll get nir_intrinsic_load_global for constants and die on an unsupported constant_base_ptr system value for anything which declares a constant variable in a shader.
16:36 jekstrand: karolherbst: Actually, wait one second.
16:37 jekstrand: karolherbst: Never mind. Try the version I pushed first
16:38 jekstrand: airlied: Assuming that branch works ok on nouveau, that should solve your clc problems.
16:39 karolherbst: now I have this branch locally which merges 5 or 6 MRs together :D
16:39 jekstrand: karolherbst: :-(
16:40 karolherbst: well. you can help by reviewing stuff :p
16:40 jekstrand: karolherbst: Yeah. I reviewed clc yesterday
16:40 karolherbst: anyway.. I removed the function_temp bits from !6367 as I ran into a few issues
16:40 karolherbst: I don't have clc locally :p
16:40 jekstrand: heh
16:40 karolherbst: anyway.. still 9 patches left which we should get merged
16:40 karolherbst: :D
16:41 karolherbst: the only patch I am unsure about is https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6367/diffs?commit_id=e9a0818c98e3d8c8c5d058019033169604e7de42
16:41 karolherbst: it's more or less up to curro_ to accept a suboptimal fix
16:42 jekstrand: karolherbst: I've reviewed the NIR bits of !6367
16:43 jekstrand: karolherbst: I don't know what's going on in that patch
16:44 karolherbst: yeah..., that's why it's up to curro_ :p
16:44 karolherbst: but thanks for the review :)
16:48 karolherbst: jekstrand: that MR breaks stuff, like kernel_memory_alignment_local
16:49 karolherbst: stores a 32 bit constant into a uint64_t* buffer
16:50 karolherbst: but not quite sure if that's local changes or mhh
16:50 craftyguy: ajax: would you mind taking a look at this issue, I think it's caused by your commit to use xcb-shm: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3444
16:50 karolherbst: I suspect you need my other MR to hit this
16:51 craftyguy: that issue is causing all kinds of havoc in the intel CI :(
16:51 ajax: craftyguy: wtf
16:52 craftyguy: ajax: I could be totally off base thinking it's your patch, but reverting that patch *seems* to help
16:52 karolherbst: jekstrand: ohh, I see what is happening
16:52 craftyguy: ajax: maybe I need some minimum version of xcb-shm installed when building/running mesa w/ that patch?
16:53 pcercuei: can commits be dropped from drm-misc-next? Otherwise I need to send a fix
16:53 ajax: craftyguy: i mean, you do need libxcb-shm installed, yes
16:54 craftyguy: ajax: yeah it's installed on these systems
16:54 ajax: to be precise: if the megadriver you're building includes swr / softpipe / llvmpipe, then the drisw code will now link against libxcb-shm, and you'll need to have that on the disk
16:55 ajax: but none of the drisw code should be getting run at all while loading _iris_
16:55 craftyguy: hmm
16:55 ajax: craftyguy: if i give you a debugging patch can you run it for me? that "failed to load driver" looks suspicious and i think we can be more informative
16:56 craftyguy: ajax: sure
16:56 ajax: word, give me a minute
16:56 craftyguy: ajax: I appreciate any help on this :)
16:56 karolherbst: annoying :/
16:57 jekstrand: karolherbst: Oh?
16:57 karolherbst: jekstrand: taking the address of a shared memory variable and storing into global memory
16:57 jekstrand: karolherbst: Why doesn't that work?
16:57 karolherbst: I suspect it's a bit related that we force 32 bit addressing even though pointers are 64 bit
16:58 jekstrand: karolherbst: That seems likely. :)
16:58 karolherbst: I am checking how that worked before
16:58 jekstrand: karolherbst: My MR just does what clover does today
16:58 jekstrand: Assuming 64-bit global pointers even for __constant
16:59 karolherbst: jekstrand: without your patches I get 64 bit constants :)
16:59 jekstrand: And with my patches you're not getting 64-bit constants?
16:59 karolherbst: correct
17:00 jekstrand: karolherbst: I'm very confused by that.
17:00 karolherbst: I can bisect if you want to :p
17:00 jekstrand: karolherbst: It's entirely possible I've botched something though
17:00 jekstrand: karolherbst: What are you getting with my patches?
17:00 jekstrand: karolherbst: Can you give me some bogus NIR?
17:01 jekstrand: Wow.... This is some truely awesome code: " auto w = v;
17:01 karolherbst: jekstrand: https://gist.github.com/karolherbst/f8a784f9c114484a817bae3ac3042ced
17:01 jekstrand: v is a member of the class not something that's declared locally. What is w? Who knows? All we know is it's a copy of v
17:01 karolherbst: :D
17:03 karolherbst: anyway, I'll bisect real quick, maybe it's easy to spot
17:03 imirkin: jekstrand: copy of v if you're lucky. copy constructor could do whatever :)
17:03 karolherbst: well
17:04 jekstrand: karolherbst: Ok, that's weird
17:06 jekstrand: karolherbst: That is very strange indeed
17:06 karolherbst: jekstrand: "nir/lower_io: Add a build_addr_for_var helper"
17:06 jekstrand: karolherbst: That's the bisect?
17:06 karolherbst: yep
17:06 karolherbst: I can verify though
17:08 karolherbst: yep
17:08 karolherbst: verified
17:08 jekstrand: karolherbst: What test is this?
17:08 karolherbst: test_basic kernel_memory_alignment_local
17:09 karolherbst: but I suspect your might need someting from my other MR
17:09 karolherbst: mhhh
17:09 jekstrand: That's ok. I just want to see the CLC
17:09 karolherbst: although I doubt it
17:10 jekstrand: karolherbst: Are you lowering function_temp in any of your patches?
17:10 karolherbst: yep
17:10 jekstrand: Ah, I see it.
17:10 jekstrand: It's because you're using nir_address_format_32bit_offset for shared
17:10 karolherbst: yeah
17:11 karolherbst: on purpose
17:11 jekstrand: It needs to be 32bit_offset_as_64bit
17:11 jenatali: ^^
17:11 karolherbst: for explicit_io_lowering?
17:11 jekstrand: The problem is that nir_lower_io and spirv_to_nir disagree on the bit size
17:12 karolherbst: mhhh
17:12 karolherbst: will that give me 64 bit pointer math though?
17:12 jekstrand: The reason why the patch you pointed to changes things is because it starts obeying the addr_format instead of the bit-size that was left-over from spirv_to_nir
17:12 jekstrand: karolherbst: No, that's why that mode exists.
17:12 karolherbst: ahh okay
17:12 jekstrand: karolherbst: It unpacks, adds just the bottom bits, and then packs a 0 into the top
17:12 karolherbst: I see
17:12 karolherbst: mhh
17:12 jenatali: Right, it'll only give you 64bit pointer math if the app cast it explicitly to an integer type to do math on it
17:12 karolherbst:goes reworking clover a bit then
17:12 jenatali: As long as it's address offsetting, it's 32bit math
17:13 jekstrand: karolherbst: I can add another patch to my MR to change clover to use that.
17:13 karolherbst: I can do it and add it to my other MR
17:13 jekstrand: karolherbst: Or, if you don't support generic pointers, you can tell spirv_to_nir to use 32bit_global
17:13 jekstrand: Either should work.
17:14 karolherbst: how can that work if the pointers are indeed 64 bit big ?
17:14 jenatali: jekstrand: That'll work as long as there's no ptr<->int casts
17:14 jekstrand: karolherbst: I guess it doesn't. Which is why you really want 32bit_as_64bit
17:14 jenatali: Though one of my patches to add the new address modes would blow up on it because of that
17:14 karolherbst: yeah.. it's easy to fix though..
17:15 jenatali: jekstrand: I'll see if I can rework CLOn12 to use your constant memory patch at some point, hopefully later today
17:15 jekstrand: jenatali: Cool
17:16 jenatali: Going to try to address all the libclc comments first :)
17:17 jekstrand: jenatali: Yeah.... sorry for leaving so many comments. :-P
17:17 jenatali: jekstrand: Don't apologize, they're good comments :P
17:18 jekstrand: karolherbst: Just force-pushed my MR. Now has things in a slightly different order and a patch to fix shared pointers.
17:27 karolherbst: jekstrand: mhhh. I can do such a cleanup in my MR actually.. so we should just set the address format stuff once and just reuse the global setting everywhere, right?
17:27 jekstrand: karolherbst: ?
17:28 karolherbst: I mean reuse the ones from spirv_options
17:28 jekstrand: karolherbst: Yeah, the first couple patches from my MR do just that.
17:29 karolherbst: mind if I pull the fixes into my MR so we can get those merged before the mem_constant stuff?
17:29 karolherbst: I'd like to make the overall stuff be less annoying in clover
17:29 jekstrand: karolherbst: If you want to rewrite them or move them into a separate MR or review them and ask me to land them now, those are all fine options.
17:29 karolherbst: okay
17:30 ajax: craftyguy: https://paste.centos.org/view/raw/b4c6bf62
17:30 jekstrand: karolherbst: How about I pull the top three into their own MR
17:30 karolherbst: jekstrand: then my MR conflicts on that
17:30 jekstrand: karolherbst: Then pull them into yours. That's fine too.
17:30 karolherbst: :)
17:31 jekstrand: karolherbst: Do you need any more reivews on !6367?
17:31 jekstrand: besides the clover patch
17:31 karolherbst: no, just the clover one
17:31 karolherbst: but I could remove that as well
17:31 jekstrand: Yeah, let's punt that one down the road so we can land !6367 if that's ok with you.
17:32 karolherbst: yeah, sounds good
17:33 jenatali: It's sounding like we're only a few weeks away from getting upstream nir/vtn into shape where we can trivially hoist CLOn12 on master :O
17:33 jenatali: I expected it to be months
17:33 jekstrand: jenatali: \o/
17:33 jenatali: Thanks for all the help btw :)
17:33 jekstrand: jenatali: It's a bit fortuitous that right about now is the time I started seriously caring about OpenCL kernels. :)
17:34 jenatali: Indeed, but I'm not complaining
17:34 jekstrand: Not that I didn't care before but I wasn't motivated such that I was doing much hacking on it myself.
17:34 Sachiel: where "seriously caring" means "oh damn, I need to get them to work"
17:35 ajax: craftyguy: i am _really_ curious what that patch uncovers, there's very few places in that init path that can fail...
17:39 craftyguy: ajax: thanks, I'll run it through the CI right now
17:41 karolherbst: jekstrand: pushed with that clover patch removed + your first three patches (reviewed them)
17:41 karolherbst: ehh..
17:42 karolherbst: pushed some local stuff.. now it's good :D
17:50 Lyude: not expecting a yes to this, but does anyone have an old DP converter that perhaps does _NOT_ support MST but does have multiple different types of ports it can output to? (like VGA, HDMI, DVI)
17:50 Lyude: basically i'm trying to find if anyone has access to any DP devices that actually specify more then one downstream port in the detailed downstream cap info
17:51 ajax: Lyude: never seen such a beast, myself
17:51 Lyude: yeah i'm not sure I have either
17:52 Lyude: I -have- seen some combo HDMI/DVI adapters, but i don't have access to those anymore and I think they might have been passive (with some special ddc interface) anyway
17:52 karolherbst: jekstrand: fun.. even without implementing the constant base ptr stuff, with your branch I pass more stuff :D
17:53 jenatali: :)
17:54 karolherbst: but now I am at 58 patches :D
17:54 ccr: hmm.
17:54 karolherbst: mhhh.. printf next?
17:56 jenatali: karolherbst: I'd save that one for last :P it's so un-useful and I only implemented for completion
17:56 ccr: I'm almost sure I've seen some DP <-> HDMI converter at work, basically just a dongle. no idea about whether it does MST or not. but not sure if I can find it.
17:56 karolherbst: I guess :p
17:57 karolherbst: jenatali: but the idea was to implement it for clover so we could get your MRs merged
17:57 jenatali: Ah, fair
17:57 jenatali: Guess I can't complain about that
17:57 jenatali: karolherbst: Personally I'd prefer if images landed first though
17:58 jenatali: Yeah it's technically optional and printf isn't, but it's so much more important in practice
18:01 karolherbst: yeah, I agree that images are more important
18:01 karolherbst: I just don't know how much work it requires for clover though
18:04 jenatali: Fair :)
18:06 craftyguy: ajax: https://bpa.st/X2PA
18:09 ajax: craftyguy: well that's certainly something
18:09 ajax: gonna have another patch for you in a sec
18:10 craftyguy: ajax: sounds good. it takes some time to test/reproduce it, since I haven't noticed any patterns leading up to it failing.. so I have to run on lots of systems and wait to see what happens :/
18:22 ajax: craftyguy: ... out of curiosity, what _was_ the machine it failed on? which gen?
18:23 craftyguy: that last one was a broxton, gen9 gpu, atom cpu.
18:23 craftyguy: but I've seen it on gen8, gen11 too
18:25 ajax: bah, i could have looked that up from the pci id that i totally didn't see in the debug output
18:26 jenatali: jekstrand: I think the libclc series is good for you to take another look
18:31 ajax: craftyguy: also... i change the error message strings that should have been emitted from libGL in that patch, but that log doesn't seem to reflect that
18:32 ajax: so, are you intentionally trying to use the system libGL and not the one from the current build?
18:33 ajax: at any rate, give https://paste.centos.org/view/raw/614852f0 a shot please
18:34 craftyguy: ajax: I think I made some progress... apparently I didn't have the m32 build of libxcb-shm0 installed on all the systems..
18:35 craftyguy: installing that on one system 'fixed' the issue, I'm going to try it on a few more
18:37 imirkin: ccr: most DP -> HDMI adapters are actually passive DP++ -> HDMI adapters -- i.e. the DP port also has hdmi pins, and this just hooks that up.
18:37 ccr: imirkin, ah, I see.
18:37 imirkin: there are also active ones, but less common
18:37 ccr: makes sense
18:37 imirkin: but those are the ones that enable you to get e.g. HDMI 2.0 from DP 1.2 when the native hw doesn't
18:56 karolherbst: sooo.. let's get constant stuff working, shell we...
18:56 karolherbst: jekstrand: I am wondering how to deal with the constant array now :/
18:57 karolherbst: should we let drivers handle that or should clover bind a buffer?
19:09 karolherbst: jekstrand: why nir_load_scratch_base_ptr though?
19:09 karolherbst: do we really need that one?
19:10 karolherbst: or uhhh.. did I mess up.. wait
19:11 karolherbst: yeah.. wait
19:11 karolherbst: why would we want to do global stuff for temp memory?
19:13 karolherbst: ehh
19:14 karolherbst: okay.. fixed it
19:15 karolherbst: mhhh
19:15 karolherbst: oh well
19:15 karolherbst: not that it matters much
19:16 jenatali: jekstrand: Ugh before I can take your new constant memory type I need to take your variable list rework - otherwise which list to constant vars go into :P
19:17 karolherbst: jenatali: why do you end up with a variable?
19:19 jenatali: Maybe it doesn't matter at runtime, but 9000cd48 adds it to the list of valid modes that can be used for variables
19:19 karolherbst: right... mhh
19:19 karolherbst: ohh right mhhh
19:19 jenatali: But even then I think it would, vtn creates variables before we lower them
19:20 karolherbst: yeah.. I am still not sure on how to treat constants...
19:21 karolherbst: is what I get now: https://gist.githubusercontent.com/karolherbst/7082f490da487c79df0b804540b318aa/raw/52970025c0bddcc54b90509760b0b90b1da6510c/gistfile1.txt
19:21 curro_: karolherbst: regarding the local memory alignment patch you linked earlier, doesn't seem too bad though it would be nice for the target to have control over the alignment
19:21 karolherbst: curro_: the target _can't_ have control over it :p
19:21 karolherbst: that's the point
19:21 curro_: why not?
19:22 karolherbst: spec demands alignment rules
19:22 karolherbst: so you have to follow normal CLC rules
19:22 karolherbst: of course we only need to align according to the biggest member if it's a struct and stuff
19:22 curro_: huh? but the offsets of things in local memory aren't visible at the API level AFAIA
19:23 karolherbst: but we can't go below
19:23 karolherbst: and going above wastes memory
19:23 karolherbst: curro_: you can take the address
19:23 karolherbst: the CTS is testing that
19:23 karolherbst: so it has tests where it checks if the alignment is correct for local memory pointers
19:24 curro_: hah, so that's the test which it's fixing?
19:24 karolherbst: eg: test_basic kernel_memory_alignment_local
19:25 karolherbst: but there are a few more which do crazy stuff, but then I am not sure if that's runtime failing or alignment being messed up :)
19:25 karolherbst: but that one checks the alignment
19:25 karolherbst: the current patch is wasteful for arrays and structs and I guess we could be better
19:26 karolherbst: but that also requires fixing the llvm and spirv target to set inner memory alignment on local argumebnts
19:26 karolherbst: *arguments
19:26 curro_: right, which boils down to giving the target control over the alignment :P
19:26 jekstrand: karolherbst: You may want 32bit_offset_as_64bit for temp
19:27 karolherbst: jekstrand: yeah.. I already have a ptch for that :)
19:27 karolherbst: curro_: well.. right. I thought you were more refering to "drivers" and being less strict on certain hw
19:27 jekstrand: karolherbst: Though, on Nvidia, it may be easier to make scratch_base_ptr compute your scratch address and have everything in one unified 64-bit address space.
19:27 karolherbst: but if you meant the IR parsing, then yes
19:28 karolherbst: jekstrand: yeah... you might think, but I think they removed that feature on later gens...
19:28 karolherbst: not quite sure yet
19:28 jekstrand: karolherbst: Ah
19:28 karolherbst: but I also don't know if local memory has advantages over global besides it's bound checked
19:28 curro_: karolherbst: i guess in some cases the target may have more strict alignment rules than the spec mandates
19:28 jekstrand: karolherbst: In that case, you'll probably want my 62bit address format
19:28 imirkin: jekstrand: we actually do have a weirdly unified address space
19:28 karolherbst: jekstrand: it also makes debugging easier :)
19:28 karolherbst: if you use local and go out of bounds the hw sends a trap
19:28 jekstrand: karolherbst: That does 32-bit arithmetic for scratch and shared when it knows which to do.
19:29 karolherbst: ahh yeah, and pointer arithmetic is cheaper :)
19:29 karolherbst: totally forgot about that
19:29 imirkin: there are configurably-located holes for shared and local memory
19:29 jekstrand: If it's a generic pointer, it obviously has to do 64-bit.
19:29 karolherbst: sure
19:29 imirkin: and then yuo can just use standard load/store ops to operate on it
19:29 karolherbst: but for that we either map everything into global or if ladder :)
19:29 karolherbst: I'd like to support both I think
19:29 jekstrand: karolherbst: You could do 32-bit arithmetic for shared and local with 64-bit too if you knew a priori that your holes were cut out in nice places.
19:29 karolherbst: curro_: mhh.. that would be a bit strange, but maybe?
19:30 karolherbst: but if you are fine with the current patch we can always improve it later
19:30 karolherbst: the patch at least fixes bugs according to spec
19:31 karolherbst: I can fix it up for spir-v but not sure I could for llvm
19:31 karolherbst: imirkin: yeah.. but I think it goes away
19:31 karolherbst: or already went away
19:31 curro_: karolherbst: yeah but i wonder if it can cause things to break too if it wastes so much memory aligning an array to the next power of two that it no longer fits in local memory
19:32 karolherbst: well, that's why I cap it to sizeof(long16)
19:32 karolherbst: so it's not _that_ terrible
19:32 karolherbst: but yeah..
19:32 karolherbst: I think the issue is that the spec doesn't really can do much anyway
19:32 karolherbst: internal things could take up some space as well
19:33 karolherbst: anyway, I am in favour of fixing it properly and I can certainly do so for spirv
19:33 jekstrand: Are the local memory size requirements spec'd assuming no alignments? That seems wrong.
19:33 curro_: karolherbst: main difficulty with giving the target control is that currently module::argument::target_align and friends refer to the alignment of the offset in the input space, but we could easily redefine them to mean the alignment in the local address space, and require input offset arguments to be tightly packed
19:33 karolherbst: yeah, we need another field
19:34 karolherbst: and please no packed
19:34 karolherbst: never ever
19:34 karolherbst: packed is the worst
19:34 karolherbst: jekstrand: clover doesn't align
19:34 karolherbst: but the spec demands alignment
19:35 curro_: karolherbst: i disagree. if you're going to align up to 64 bits for a 32bit offset that's no better than passing a tightly packed 64bit offset in the input buffer
19:35 karolherbst: packed is the worst thing :)
19:35 karolherbst: we can't load under alignement
19:35 karolherbst: never
19:36 karolherbst: so packing input buffer essenntially means 8 bit loads and a bunch of shifts and masks
19:36 karolherbst: or 32 bit loads and more masks and less ands
19:36 karolherbst: anyway
19:36 karolherbst: packed is no solution here
19:36 jekstrand: Why are we aligning up to 64 bits for a 32-bit thing?
19:36 curro_: karolherbst: oh, that's not what i meant... by tightly packed i mean aligned to its own size at least
19:36 karolherbst: curro_: ahh, right
19:37 karolherbst: yeah, that's fine
19:37 karolherbst: but that's why we need a new argument
19:37 karolherbst: which specifies "inner" alignment
19:37 karolherbst: so what we have right now is the alignment of the value (being it a pointer or values or whatever_
19:37 karolherbst: and the other alignment tells about the alignment if we follow the pointer
19:37 karolherbst: like for local memroy
19:37 anholt: eric_engestrom (or maybe dcbaker): want to help pythonize some bare-metal? I need to figure out what the components to glue together are to make a class that 1) makes a thread that reads from python-serial producing bytes 2) Buffers those bytes into something that can produce lines. 3) on each line, log it to stdout (with a prefix) and also return that line in a queue to the main thread. what do I glue together, particularly to get a buffer of lines
19:37 anholt: out of bytes that we periodically read?
19:37 karolherbst: ptr vs data alignment? mhhh
19:38 curro_: karolherbst: but if all targets are going to use the same offset format and it's always going to be size-aligned we could omit the extra alignment field possibly
19:38 karolherbst: well.. there are two things
19:38 karolherbst: alignment inside the kernel input buffer
19:38 karolherbst: alignment inside the memory the parameter points to
19:39 karolherbst: so you have your 64 bit local memory pointer (which is just 32 bit, but you know, it's a pointer and the spec says so) and then it points to an array of 16 chars
19:39 karolherbst: so the pointer is aligned with 8 bytes, the content with 1
19:39 karolherbst: this is all just for the calculation of the local memory buffer anyway
19:39 karolherbst: the second alignent I mean
19:39 curro_: karolherbst: yes i get that, but we could possibly hardcode the alignment of one of those two things if all targets are going to be okay with it
19:39 karolherbst: nothing else needs it
19:40 karolherbst: mhhh
19:40 karolherbst: well.. we could say "this is a pointer" and then align according to spec, sure
19:40 curro_: it would be nice to give them control of both yeah, just trying to make your life easier :P
19:40 karolherbst: well.. right now we get the alignment inside the kernel input buffer and that's fine
19:41 karolherbst: adding a second alignment for the internal buffer alignment is fine I guess
19:41 karolherbst: shouldn't add that much more work
19:41 jenatali: Giving up on taking the variable list patch, we're too far out-of-sync w.r.t other drivers and lowering passes... just throwing constant vars in the uniforms list for testing...
19:41 karolherbst: and it's more obvious on what the value means
19:41 karolherbst: I had a patch ones I think, but lost it :D
19:41 karolherbst: or maybe I find it...
19:42 karolherbst: nope...
19:43 curro_: main disadvantage is adding an extra field has a cost for each argument of each kernel even if they aren't in the local space, even though the value of one of them is fully predictable
19:43 karolherbst: I don't mind the extra work, just somebody would have to the llvm changes
19:43 karolherbst: mhhh wait
19:43 karolherbst: we can do that just for local args, no?
19:44 karolherbst: ohh wait
19:44 karolherbst: module::argument is just one class mhh
19:45 karolherbst: curro_: well.. we could probably say that local, constant, and global are all pointer aligned I guess?
19:45 karolherbst: and for those types target_align specifies the data alignment
19:45 karolherbst: but.. that's a bit annoying that this field gets two meanings
19:45 karolherbst: but pointer alignment is quite fixed anyway
19:46 karolherbst: mhhh
19:46 karolherbst: ehhh
19:46 karolherbst: curro_: acutally.. I think this makes sense
19:46 karolherbst: we just say target_align is _always_ the data alignment
19:46 karolherbst: and pointers just get aligned to size_t inside the kernel input buffer
19:47 karolherbst: which are just constant, global and local anyway
19:47 karolherbst: this way it's a little weird, but if that's documented it might not be too hard to follow
19:48 curro_: karolherbst: that sounds good to me
19:54 curro_: however it's really only necessary to make that change for local arguments, only because clover is doing sub-allocation of the local address space while executing a kernel, same doesn't apply for the other address space types
19:59 karolherbst: yeah.. I know, but the targets are setting pointer alignment for pointers anyway and I prefer that one field has exactly one meaning.
20:05 curro_: karolherbst: it's not going to have exactly one meaning anymore once you do that change, since for scalar arguments it's still the alignment of the input buffer bits, while for constant/global arguments target_align becomes basically meaningless since there's nothing kernel.cpp can do to enforce the alignment of a buffer which has already been allocated...
20:07 karolherbst: true, and the inner alignment is enforced by the kernel anyway
20:08 karolherbst: though for pointers we could assert at runtime if the handed in ptr is aligned properly
20:08 curro_: i don't mind the minor inconsistency if you make the change for the local space only. local arguments are special really :P
20:08 karolherbst: but yeah...
20:08 karolherbst: yeah, I can make it local only for now
20:20 karolherbst: curro_, pmoreau: I guess that's not too terrible: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/52a0427d687fbc66ff24012f9301a4a5c0ca629c
20:28 karolherbst: jekstrand: any idea on what to do with load_constant_base_ptr in clover?
20:28 curro_: karolherbst: looks good
20:29 karolherbst: curro_: do you know how llvm deals with indirect on in kernel constants?
20:29 jekstrand: karolherbst: Pass the constants in as an implicit param and then turn it into a load_kernel_input
20:30 jekstrand: karolherbst: More specifically, turn the nir_shader::constant_data into a constant buffer and make that an implicit param.
20:30 karolherbst: jekstrand: I am wondering if we can just pass by value *hides*
20:30 karolherbst: I know.. not enough space
20:30 karolherbst: but yeah...
20:30 karolherbst: jekstrand: what would I need to do to turn the variables into constant_data?
20:31 karolherbst: or don't we have that yet?
20:31 karolherbst: I guess opt_large_constants won't do it the way we want it
20:31 karolherbst: or did I miss something? mhh
20:31 jenatali: karolherbst: nir_lower_mem_constant_vars
20:31 karolherbst: ahhh
20:32 jenatali: I did something wrong cherry-picking this series over to our fork, libclc is failing to validate :(
20:32 karolherbst: mhhh
20:33 airlied: jenatali: is the problem that it wasn't validaed before?
20:33 airlied: validated
20:33 jenatali: airlied: Not the SPIR-V, the nir
20:33 jenatali: I'm trying to play around with jekstrand's new mem_constant series
20:34 airlied: ah
20:35 karolherbst: mhhh
20:35 jekstrand: karolherbst: My MR already turns variables into constant data. nir_lower_mem_constant_vars
20:35 karolherbst: does it handle dead constants?
20:35 jekstrand: karolherbst: Run nir_remove_dead_variables(nir_var_mem_constant) first
20:35 karolherbst: this removes all variables :)
20:36 karolherbst: I guess before lowering?
20:36 jekstrand: karolherbst: Yeah, before losering
20:36 jekstrand: *lowering
20:36 jekstrand:wants to know what the OpenCL people were smoking when they came up with OpImageQueryFormat and OpImageQueryOrder. It must have been something really good.
20:37 karolherbst: :D
20:37 jenatali: :)
20:37 karolherbst: welll...
20:37 karolherbst: that what you get with typeless images :p
20:37 jenatali: jekstrand: It's not the worst... there's still printf :P
20:38 jekstrand: jenatali: I guess. But that's about the only thing I've seen that's worse.
20:38 karolherbst: jekstrand: I think we still want to turn them into ubos if possible, but I guess for that we need a proper pass or something? dunno...
20:38 jekstrand: jenatali: How do you implement those? Pass something in side-band?
20:39 jenatali: jekstrand: Yeah, as part of the kernel inputs
20:39 jenatali: When the app calls clSetKernelArg for an image, we stuff the format info in the input buffer
20:39 jekstrand: jenatali: Makes sense[
20:39 jekstrand: karolherbst: I thought that's what what sticking it int an implicit constant buffer was going to do?
20:40 jenatali: And when they call it for a sampler... we set a bunch of state so that we can recompile the shader to deal with all the crazy sampler modes that D3D doesn't support :P
20:40 karolherbst: jekstrand: if we can follow a deref_load back to a mem_constant variable, we can stick that into a ubo. everything else has to be treated as an constant* input...
20:40 karolherbst: that's kind of the summary of the situation
20:40 imirkin: jekstrand: having trouble tracking down the meanings of those two ops
20:40 karolherbst: the issue is, that a load _can_ originate from in kernel constants or constant* arguments
20:40 imirkin: jekstrand: what do they do?
20:41 karolherbst: imirkin: OpImageQueryFormat format of the bound image :)
20:41 imirkin: like what kind of format? like the enum value?
20:41 karolherbst: in CL that's all typeless
20:41 jenatali: imirkin: Yes, the CL API enum value
20:41 karolherbst: yeah.. type format
20:41 imirkin: lol ok
20:41 jenatali: get_image_channel_data_type and get_image_channel_order
20:41 karolherbst: :p
20:41 karolherbst: you know.. so you can switch on that and do read_image32f read_image32i... you know
20:42 karolherbst: :D
20:42 jenatali: Yeah...
20:42 karolherbst: ehh wait
20:42 karolherbst: it's tiehr float or int
20:42 karolherbst: is there no short stuff?
20:42 jenatali: Hm?
20:43 karolherbst: like if I don't want to have a uin4 but uchar4 as the result
20:43 jenatali: Image reads always convert to 32bit types
20:43 karolherbst: ahh yeah..
20:43 karolherbst: annoying
20:43 karolherbst: somehow somebody didn't thought that through
20:43 eric_engestrom: anholt: happy to help, but that'll have to wait until tomorrow :)
20:43 karolherbst: jenatali: ehh.. there is read_imageh :D
20:43 karolherbst: for half
20:44 karolherbst: right...
20:44 eric_engestrom: also, I'm not sure I understand; can you point me to the existing code you're talking about?
20:44 jenatali: Right, if you have fp16 support you can get half types
20:44 karolherbst: because for int you always want to have 32 bit, but for float it toally makes sense to support 16 and 32 bit
20:44 jenatali: karolherbst: You can also load half images as fp32
20:44 karolherbst: but why would I want to do that? :D
20:44 jenatali:shrugs
20:45 jenatali: Lower bandwidth?
20:45 karolherbst: but then I want to operate on fp16 as well
20:45 jenatali: That was the same reason people here gave for why vload/vstore_half exist
20:45 karolherbst: I mean.. right..
20:45 karolherbst: but at runtime you might also want to have smaller types because registers and stuff
20:46 karolherbst: but maybe you can just be smart and do uchar4 = read_imageu?
20:46 ajax: craftyguy: any update?
20:46 karolherbst: mhh.. dunno
20:46 craftyguy: ajax: not yet.. having some unrelated difficulties rolling out that change to all ~200 systems
20:47 jenatali: Sure, but fp16 is an extension, not all hardware can do actual alus on it
20:47 jenatali: And unlike i8/i16, it's pretty hard to emulate ;)
20:47 karolherbst: well.. fp16 textures are a thing :p
20:47 karolherbst: even if you don't do fp16
20:48 jenatali: Right, but all you can do with them is convert to fp32
20:48 karolherbst: sure
20:48 karolherbst: but CL doesn't allow you to create fp16 images without fp16 support or did I miss anything?
20:49 jenatali: You can
20:49 karolherbst: ohh, you can..
20:49 karolherbst: interesting
20:49 jenatali: Might even be required if you support images
20:49 karolherbst: might be
20:49 karolherbst: I really should look into images and get that sorted out :D
20:50 jenatali: Ah, looks like images are required for CL2.x, and yes, HALF_FLOAT is a required format there
20:50 karolherbst: jekstrand: soo.. when I remove dead constants, lower them to constant_data via explicit_io, and then remove dead constants again, it is all gone :p
20:50 karolherbst: no idea if that's indended but..
20:51 karolherbst: ehh wait..
20:51 karolherbst: I messed up
20:51 karolherbst: ahh cool.. it works
20:52 jekstrand: jenatali: Dropped a pile of comments on your image MR. Over-all, I think it mostly looks ok. My biggest gripe is with setting a format on the image_read/write intrinsics instead of a type.
20:52 jenatali: Sure, seems reasonable I think
20:54 jekstrand: Now for the 16/64-bit MR
20:54 jenatali: :)
20:55 karolherbst: mhhh...
20:56 jekstrand:is somewhat tempted to wire up images for clover
20:56 karolherbst: yeah....
20:57 jekstrand: I'm not really sure what that would look like though.
20:57 karolherbst: the biggest issue is just checking if the argument stuff is fine
20:57 karolherbst: and the spirv/invocation.cpp bit needs to get tought about image stuff
20:58 karolherbst: it's... a bit of work
20:58 karolherbst: but maybe pmoreau wants to do it :D
20:58 jekstrand: Yeah, jenatali's spirv_to_nir patches make images somewhat magic but I think that's what we want.
20:58 karolherbst: yeah.. I think that's okay
20:58 jekstrand: Basically, you get variables just like you do in GL/Vulkan
20:58 jenatali: Yeah, at the end of it you get image variables as input variables, so we do a bunch of lowering to convert those to uniforms, but overall it works okay
20:59 karolherbst: jekstrand: that's... fine I guess
20:59 karolherbst: images are really opaque anyway
20:59 karolherbst: smaplers are not
20:59 karolherbst: *samplers
20:59 karolherbst: but CL samplers are stupid
20:59 jekstrand: jenatali: I didn't notice: Do you set a location on them? Or is the driver expected to search through the param list and use that order somehow?
20:59 jenatali: And we also do lowering based on the read/write ops to create multiple strongly-typed versions of the image, because DXIL doesn't have untyped images
21:00 jenatali: jekstrand: I believe spirv-to-nir doesn't currently set locations on any of the input variables it creates
21:00 karolherbst: jekstrand: I can't of feel that spirv image support in clover will take a while
21:00 jekstrand: jenatali: Can you make one of each of the three typs and then assign them to the same descriptor? That's what I'd do in Vulkan.
21:00 jekstrand: karolherbst: That's quite possible. I'm also happy to not work on it. :)
21:00 karolherbst: :)
21:00 jenatali: jekstrand: Yep, that's what the runtime does
21:00 karolherbst: I just prefer dealing with the easier to merge MRs first
21:00 jekstrand: karolherbst: That's fine
21:00 karolherbst: so rebasing will be easier for jenatali and co
21:01 jenatali: Much appreciated :)
21:01 karolherbst: ohhh, offsets are ready btw
21:01 jekstrand: karolherbst: I think jenatali's SPIR-V patches for images are in pretty good shape. I don't think it will take significant work for Clover to adapt to them.
21:01 karolherbst: just the clover bits need review
21:01 jekstrand: karolherbst: offsets?
21:01 jekstrand: karolherbst: MR link?
21:01 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5891
21:01 jenatali: Additional system values for adding work ID offsets
21:01 karolherbst: I wrote the clover patch today :)
21:04 karolherbst: I guess I should move the pass into its own file :D
21:04 karolherbst: I plan to handle the constant_base there as well
21:05 jenatali: jekstrand: Those comments on the image MR are not nearly as bad as I expected :)
21:05 jekstrand: cool
21:05 jenatali: I'm not 100% sure about using TYPE vs FORMAT though, just because of parity with the non-_deref intrinsics
21:05 jekstrand: jenatali: I think it's a separate thing entirely
21:05 jekstrand: jenatali: It's more like the dest_type for texture instructions
21:06 jekstrand: jenatali: And I think we want the non-_deref versions to have TYPE as well.
21:06 jenatali: Yeah ok, I can buy that argument
21:06 jekstrand: OpenCL never provides an actual format in the shader.
21:06 jenatali: Nope
21:13 Lyude: Hey - the other day I thought I saw someone trying to add a connector prop to DRM for indicating the downstream port type of a connector (for instance, a DP connector with a DP to HDMI adapter plugged in would be indicated to have a downstream port type of HDMI). does anyone know where this patch series is and if it made it in?
21:13 Lyude: couldn't find anything on patchwork and i'm going through the drm-tip source right now, but I could have sworn I remembered seeing something like this go by on the ML at some point
21:14 imirkin: Lyude: i saw those patches too
21:14 imirkin: but iirc something along those lines has already landed
21:14 imirkin: this was just some refinement
21:14 Lyude: i hope-oh
21:14 Lyude: well it might be wrong
21:14 Lyude: i squinted very hard at the dp spec and realized that you can have multiple downstream port types
21:14 imirkin: give me a few
21:15 Lyude: imirkin: thanks-i'll keep looking as well
21:15 karolherbst: jekstrand: lowering of constant_mem: https://gist.github.com/karolherbst/221e85c357c9b6590d37edfe69ae5b76 :)
21:15 Lyude: it might still be ok since it seems like there is a concept of a "primary" downstream port type
21:15 airlied: Lyude: it was using subconnector prpp
21:15 airlied: prop
21:15 karolherbst: ehh.. the constants are already gone :D
21:15 karolherbst: wait..
21:15 airlied: if that helps search for it
21:16 imirkin: [PATCH 3/5] drm/nouveau: utilize subconnector property for DP
21:16 imirkin: probably that series
21:16 imirkin: Fri April 24
21:16 Lyude: yeah i found it
21:16 imirkin: originally "Subconnector property for DP downstream type" on Aug 26, 2019
21:17 karolherbst: jekstrand: https://gist.github.com/karolherbst/9ecd8cf13c8b0a1c3bdc6acaf9ed3fa0
21:17 karolherbst: does this look fine to you?
21:17 Lyude: so like, it's not super important to me right now but since I need to make sure we handle this downstream stuff correctly, I guess we'd need to come up with another prop for this? :\
21:17 karolherbst: ehh.. I am getting tired
21:18 karolherbst: messing up
21:18 Lyude: oh, hm, it looks like it's sharing a property that only used to be used for DVI-I and TV-out, so I guess adding another DP specific one might not be out of the question
21:18 imirkin: Lyude: so one thing i pointed out in that original review
21:18 imirkin: was that it tied things to "DP" ports
21:18 imirkin: but on nouveau and amd, a DP port could have an HDMI thing plugged in
21:18 jekstrand: karolherbst: roughly? I'm not really sure what you're asking me to look for.
21:18 imirkin: and it didn't gel very well with the implementation, i think
21:18 karolherbst: yeah.. I forgot the later passes :)
21:18 imirkin: (not like an active DP -> HDMI, but a true HDMI thing)
21:19 karolherbst: it was more of a "does the end result look sane to you" but also adding the steps in between
21:19 karolherbst: but now it should contain everything
21:19 Lyude: imirkin: tbh though that's really got more to do with how we implement connectors on our end... I've looked and i'm not sure there's actually a requirement for us to expose connectors like that
21:19 Lyude: ffor instance - intel (used) to handle this by having multiple ports for stuff like DP++
21:19 karolherbst: now I just need to bind the buffer somehow..
21:20 imirkin: you can see the thread in https://lore.kernel.org/dri-devel/20190826132216.2823-6-oleg.vasilev@intel.com/
21:20 karolherbst: ahh ehh "ERROR: Unable to enqueue kernel! (CL_INVALID_KERNEL_ARGS from ../test_conformance/basic/test_constant_source.cpp:74)" :D
21:20 imirkin: Lyude: yeah, intel has 2x the connectors
21:20 karolherbst: mhhh
21:20 karolherbst: more fighting with clover
21:20 airlied: jekstrand, karolherbst : probably okay to land the nir spriv image support wothout clover initially
21:20 karolherbst: airlied: maybe
21:20 airlied: then start pulling out hair
21:20 karolherbst: if others are fine, then yeah
21:20 jekstrand: karolherbst: You need to run lower_mem_constant_vars before lower_io
21:20 jekstrand: airlied: That's my thought as well
21:20 karolherbst: ohh
21:21 airlied: i started on clover images at least twice and ran away, and that was just with llvm backend
21:21 Lyude: vsyrjala / imre : ^ any opinions on this dp stuff btw?
21:22 airlied: but i think radeonsi was just special there as there is no explicit binding table
21:22 karolherbst: okay.. now to deal with the buffer
21:22 airlied: so i really wantec to just pass in the descriptors rather than an index
21:22 imirkin: airlied: most hw can now support unformatted read requests
21:22 imirkin: dunno if that matters to what you were doing
21:22 jekstrand: karolherbst: lower_mem_constant_vars does the same location-assignment stuff that vars_to_explicit_types does so you need that before lower_io can get the correct base pointer for a variable.
21:23 airlied: imirkin: think it needs formatting
21:23 karolherbst: yeah.. that's fine
21:24 karolherbst: airlied: sounds annoying :/
21:24 imirkin: airlied: i mean the format is baked into the image descriptor
21:24 imirkin: rather than specified explicitly in the shader
21:24 imirkin: EXT_shader_image_load_unformatted or whatever
21:24 karolherbst: imirkin: not all CL image operations are GL image things though :)
21:24 imirkin: make that _formatted :)
21:24 karolherbst: the normal reads are just normal texture operations
21:25 karolherbst: not that it matters much...
21:25 imirkin: texture ops like they do linear interp and LOD?
21:25 karolherbst: yes
21:25 imirkin: fun
21:25 karolherbst: that's the sampler_t thing for
21:25 imirkin: ah right. but you can't write to that, can you?
21:26 karolherbst: I think it needs to be a constant in the kernel? dunno
21:26 karolherbst: but it can also be passed in as an argument
21:26 imirkin: i mean, you can't write to the image backing it?
21:26 karolherbst: sure you can
21:26 jenatali: jekstrand: Left one comment so far on your mem_constant series - was quite annoying to track down :(
21:26 imirkin: through the sampler handle?
21:26 karolherbst: imirkin: there is no sampler handle :)
21:26 imirkin: "sampler_t" thing
21:26 jenatali: imirkin: You don't use the sampler_t for writes
21:26 karolherbst: yep.. that's just a constant declaring filter...
21:27 karolherbst: but yeah
21:27 karolherbst: writes don't have a sampler
21:27 imirkin: right ok
21:27 imirkin: that makes more sense then
21:27 karolherbst: write_image do image operations
21:27 karolherbst: but read_image with a s ampler are texture ops
21:27 imirkin: sure.
21:27 karolherbst: read_image without a sampler are images as well :) but yeah...
21:27 karolherbst: it's still all without a format
21:28 jekstrand: jenatali: Yeah.... I'm pretty sure we've been bitten by that before. :-(
21:28 jenatali: jekstrand: Yeah, I've been bitten by it in serialize/deserialize already
21:28 jenatali: Clang should have a warning for that...
21:28 jekstrand: jenatali: Feel free to submit an MR to switch that to unsigned and slap my RB on it.
21:29 jenatali: Will do, thanks
21:31 jenatali: jekstrand: Should I go ahead and do all of the rest of the enums that are explicitly bit-sized in that struct?
21:34 jekstrand: jenatali: Yeah, we probably should. Otherwise, something's going to bite you in GLOn12
21:34 jenatali: I'm surprised we haven't been bitten so far, but maybe we just haven't used a value in the high end of the enum space yet
21:34 jekstrand: Could be
21:34 karolherbst: curro_: what's the best way to create a buffer inside _context::bind?
21:34 jenatali: That's what saved us for CL so far
21:35 karolherbst: *exec_context::bind
21:35 karolherbst: mhh.. I guess I could create it elsewhere and just bind it there
21:43 jenatali: jekstrand: Figures the C++ code in Mesa would blow up trying to convert unsigned => nir_variable_mode :(
21:43 jekstrand: jenatali: :-(
21:43 jekstrand: jenatali: Fortunately, there's not *that* much C++ that touches NIR
21:44 jenatali: True, I just don't currently have a build environment for all of it
21:44 jenatali: I guess I can throw it at CI over and over again :P
21:44 jekstrand: jenatali: The CI system does
21:44 jekstrand: jenatali: Also, do you have WSL on your machine?
21:44 jenatali: jekstrand: I do, that's a good point
21:44 jenatali: Haven't tried setting up a Mesa build environment there yet
21:45 jenatali: I definitely will be at some point in the future though :D
21:45 jekstrand: If you've got an ubuntu image, I think "apt-get build-dep mesa" will do most of the work for you.
21:45 jenatali: Ooh, cool
21:46 jekstrand: Debian and Ubuntu actually do a decent job of making their source packages have the right deps.
21:46 jekstrand: Fedora (which is what I use), not so much....
21:48 jenatali: jekstrand: In case you actually want to look before I slap your r-b on it: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6393
21:48 jenatali: Will spin it through CI
21:49 karolherbst: ehhh...
21:49 karolherbst: that constant buffer stuff is a bit annoying :D
21:49 jekstrand: jenatali: I'll give you a real RB once CI builds it. :)
22:00 jenatali: jekstrand: It builds \o/ tests are still running but I don't think it could break them
22:00 jekstrand: jenatali: Cool
22:02 karolherbst: jekstrand: what's the best way to clear the constant_data thing? just set to NULL and done?
22:02 jekstrand: karolherbst: Set it to NULL and set constant_data_size to 0
22:02 jekstrand: karolherbst: Why do you need to clear it?
22:02 karolherbst: because otherwise we would store it twice ....
22:03 karolherbst: it's quite the result of how clover is designed
22:03 jekstrand: karolherbst: Yeah......
22:04 karolherbst: at this point I am not quite sure if I am having fun or not: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/0e107e0773dc3a9a580db71ae119846f5820934f
22:05 karolherbst: but at least now I really just need to create the buffer and be done with it :)
22:07 karolherbst: oh wow.. that section stuff even works
22:07 jekstrand: karolherbst: section stuff?
22:08 karolherbst: the changes in core/kernel.cpp + inserting the section in nir/invocation.cpp
22:17 jenatali: Hm, I misunderstood what this series was doing at first. Switching to use it is actually a pretty significant rework
22:17 jekstrand: jenatali: Constants? :-/
22:17 jenatali: Yeah
22:18 jekstrand: jenatali: It should ultimately let you drop some code.
22:18 jekstrand: But it may be a decent bit of churn in the short term
22:18 jenatali: Yeah, I think it will
22:18 jenatali: The main problem is that we convert mem modes to ssbo *before* lower_explicit_io
22:19 jenatali: With this series we'd have to go ahead and run lower_explicit_io on mem_constant, then handle the new load_constant ops
22:19 jenatali: Not terrible, and definitely more efficient since it'll pack all non-inlined consts into the same ssbo, instead of giving each its own buffer like we do now...
22:20 jekstrand: jenatali: You give each constant var it's own buffer!?!?
22:20 jenatali: jekstrand: Each giant array that we can't embed in DXIL's ICB
22:21 jekstrand: Ah
22:21 jenatali: Not *that crazy
22:21 jekstrand: How big is DXIL's ICB?
22:21 jenatali: But yeah could be better
22:21 jenatali: 64KB I think
22:21 jekstrand: Most stuff should fit in that, I'd think.
22:21 jenatali: Though size isn't the problem, IIRC DXIL doesn't allow indirect references to it
22:21 jekstrand: Ah
22:22 jekstrand: Yeah, if it's a direct reference, NIR will pull it out of the constant_data and give you a normal load_const.
22:22 jenatali: Maybe I'm using the wrong terminology - when I say indirect, I mean casts
22:22 jekstrand: Assuming some amount of optimization between lowering and conversion to DXIL
22:22 jekstrand: Oh, ok.
22:22 jenatali: Array accesses using an indirect offset are fine
22:22 jekstrand: That makes a bit more sense, I think.
22:23 jekstrand: That actually doesn't map well to nir_constant_data at all. :-/
22:23 karolherbst: jekstrand: "PASSED test."
22:23 karolherbst: for test_basic constant_source
22:23 jekstrand: karolherbst: \o/
22:23 karolherbst: one line change was needed :)
22:23 karolherbst: but now I leak memory :D yay
22:23 karolherbst: but overall I don't think it's _that_ terrible: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/1fdd7f06b8ce54dbe80438f24bfd6de68eeb99d9
22:24 karolherbst: curro_: mind taking a look and give some overall suggestions on how to improve that?
22:25 jekstrand: karolherbst: I don't think you want to creat multiple inputs if load_constant_buffer is used more than once.
22:25 jenatali: jekstrand: I think I can come up with ways to make it work - what I really want is to have 2 nir_constant_datas, one that's reflected out to the runtime to bind in a buffer, and one that's fed to the shader converter to embed in the DXIL
22:25 jekstrand: karolherbst: Otherwise, yeah, that's not too bad.
22:25 karolherbst: jekstrand: ... ehh yeah.. you are right :D
22:25 jekstrand: jenatali: Yeah. I'm not sure exactly how we'd make that distinction, though. :-/
22:26 jekstrand: jenatali: It's really DXIL-specific.
22:26 jenatali: jekstrand: Yeah, that's fair
22:26 jenatali: Regardless I don't think your patch breaks us
22:26 jenatali: I think we'll just keep using the lowering passes we were using, just switch ubo to constant
22:26 jekstrand: jenatali: Yeah, worst-case, you can do what you're doing now and just s/mem_ubo/mem_constant/ and call it a day.
22:26 jenatali: And file an issue to use fewer SSBOs for inline constants :P
22:27 jekstrand: hehe
22:27 karolherbst: "Pass 98" :) only 8 tests to go :D
22:27 jekstrand: karolherbst: \o/
22:27 jenatali: karolherbst: Nice! What's left?
22:28 karolherbst: 4 are async
22:28 karolherbst: vload/vstore private It hink
22:28 jenatali: Ah right, that reminds me, I need to either add those to the libclc series, or create a followup MR
22:28 karolherbst: kernel_numeric_constants... for stupid reasons
22:28 karolherbst: ehh wait
22:28 karolherbst: private is fine
22:28 karolherbst: kernel_numeric_constants, kernel_preprocessor_macros, parameter_types and prefetch...
22:29 karolherbst: I think prefetch is a regression
22:29 jenatali: You're failing prefetch...?
22:29 karolherbst: I didn't before
22:29 jenatali: Vtn literally no-ops that one I thought
22:30 karolherbst: yeah.. something is up with prefetch.. mhh
22:30 HdkR: prefetch isn't real
22:31 karolherbst: let's check if that used to work
22:35 karolherbst: yeah.. jekstrand broke prefetch :p
22:35 karolherbst: or maybe it was me? mhhh I only have the cosntant mem stuff though.. let's see
22:36 jenatali: karolherbst: Broke how? Just wrong data?
22:36 karolherbst: well... not only that, I also get invalid pointers and stuff...
22:36 karolherbst:starts bisecting
22:37 curro_: karolherbst: seems pretty straightforward... in order to avoid the leak i guess you should store the pointer in exec_context and then delete it on unbind
22:37 karolherbst: yeah, although I think we need to wait until the kernel is done?
22:38 karolherbst: didnt really thought about on how to manage the buffer
22:38 curro_: not really, the driver should be refcounting
22:38 karolherbst: at this point it's also just about the rough idea :)
22:38 karolherbst: ahh, right
22:40 curro_: karolherbst: instead of adding a new functor you could simply modify the existing id_equals to take a section type in addition, since every user of id_equals is going to have to check the section type necessarily as soon as we have sections of different types with the same id
22:41 curro_: karolherbst: good that you found the data_constant section type, i defined it for cases like this ;)
22:41 karolherbst: :)
22:41 karolherbst: yeah.. I assumed that was the case
22:42 karolherbst: sadly nothing makes use of it yet
22:42 karolherbst: and I was wondering how that's dealt with in the llvm case
22:42 karolherbst: but I guess r600/radeonsi just deal with it themselves
22:43 curro_: i have no clue how radeons deal with this, maybe they emit an if ladder into the shader or copy the whole constants into a temporary array and then use indirection on that? :P
22:44 karolherbst: I think they just consume llvm directly and deal with it inide their binary parser
22:44 karolherbst: but yeah... no clue
22:44 karolherbst: maybe nobody got that far yet?
22:46 karolherbst: jenatali: ehh.. it's my local alignment patch
22:46 karolherbst: it sets 0x3 for... char3
22:46 jenatali: That's now a power of 2
22:46 jenatali: not*
22:46 karolherbst: well.. :p
22:46 karolherbst: yeah
22:48 curro_: isn't it? seems like 2^1.584... :P
22:48 jenatali: :P
22:50 karolherbst: ahh yeah.. spirv/invocation.cpp doesn't fix up vec3 ...
22:52 karolherbst: mhh.. still something wrong
22:52 karolherbst: but at least the alignment is correct now
22:58 karolherbst: okay.. that's a real regression
22:58 karolherbst: and I suspect it's that other issue I hit before
22:58 jenatali: What's going on?
22:58 karolherbst: mhh.. nir_lower_vars_to_ssa doing something incorrectly
23:00 karolherbst: jekstrand: mind grabbing my cl_wip branch and check it test_basic prefetch passes?
23:01 karolherbst: jenatali: what's weird is that it only fails for vectors of size 32 bytes or more
23:01 karolherbst: so char16 is fine
23:01 karolherbst: short8 is fine
23:01 jenatali: That is odd...
23:02 karolherbst: short16 is not :)
23:02 karolherbst: yeah..
23:02 karolherbst: something funky
23:02 jenatali: I've never seen anything like that
23:02 karolherbst: mhh... it bisect to my alignment patch again :/
23:03 karolherbst: mhhh
23:04 karolherbst: ohh ehhh
23:04 airlied: jekstrand: what dep is fedora missing , I know there was some libdrm breakage at one point, have to check if that got fixed
23:08 jekstrand: airlied: I don't know anymore. I've completely stopped trusting it. It's almost always missing X deps
23:08 jekstrand: I'm not sure what else.
23:08 jekstrand: airlied: Grab a vanilla Fedora install, run a build-dep and then try to build mesa
23:09 karolherbst: ehhhh
23:09 karolherbst: I found it :/
23:09 airlied: jekstrand: yeah I do it quite regularly
23:09 airlied: jekstrand: which is why I wondered :-)
23:09 jenatali: karolherbst: What is it?
23:09 airlied: but therer was some breakage where freedreno dep leaked onto x86
23:09 jekstrand: airlied: Ok, maybe it's been mostly fixed these days. I find I'm always missing something. But I just add that something to my "install mesa deps" script, move on with life, and forget about it.
23:09 karolherbst: jenatali: I applied alignment into the kernel input buffer...
23:09 karolherbst: the wrong one
23:10 jenatali: :P whoops
23:10 airlied: jekstrand: like it would be difficult for fedora to build mesa if it didn't have all the deps it needed
23:10 jekstrand: airlied: https://github.com/jekstrand/linux-setup-scripts/blob/master/fedora/mesa-devel.sh#L27
23:10 jekstrand: airlied: Likely some of that is out-of-date. However, I had to add new things when I did a fresh F32 install a couple months ago.
23:10 airlied: jekstrand: wierd they should all be there with mesa builddep
23:11 airlied: jekstrand: maybe because the mesa builddep line failed and didn't install anything? :-)
23:12 jekstrand: airlied: I don't think so
23:12 karolherbst: nice nice nice :)
23:13 karolherbst: I was never that close :)
23:13 airlied: https://paste.centos.org/view/5cca0470
23:13 airlied: seems to have averything on the list
23:14 airlied: or will drag em all in, like clang-devel should pull in llvm-devl
23:14 jekstrand: airlied: Maybe it's all good these days? Or maybe I'm always running a bleeding-edge and there's a new dep? I don't know.
23:15 karolherbst: "Pass 99 Fails 2 Crashes 1 Timeouts 4" :)
23:15 jenatali: jekstrand: Should I go ahead and Marge !6393? Or should I hold off to see if anybody else chimes in?
23:15 karolherbst: kernel_numeric_constants complains that __IMAGE_SUPPORT__ is defined but we don't support images...
23:15 karolherbst: kernel_preprocessor_macros: same
23:15 jenatali: karolherbst: Yeah, LLVM explicitly always defines that one
23:15 karolherbst: parameter_types: crashes inside nouveau (codegens faul)
23:15 jenatali: Guess you just have to support images ;)
23:15 jekstrand: jenatali: Maybe leave it for 24 hours
23:15 karolherbst: 4 timesouts are async :)
23:16 karolherbst: jenatali: yeah...
23:16 jenatali: jekstrand: Sounds good
23:16 karolherbst: my point was just that 6 of the fails I can't do nothing about (at this point) :D
23:16 karolherbst: but I guess I should fix the nouveau bug
23:16 karolherbst: jekstrand: mind giving my cl_wip branch a try and see if the results are similiar on iris?
23:16 karolherbst: uhm..
23:16 karolherbst: do I need patches for iris?
23:16 jenatali: karolherbst: You can pass a -U on the LLVM command line to undefine the __IMAGE_SUPPORT__ I think
23:17 karolherbst: or can I just enable CL for iris locally?
23:20 karolherbst: ahh.. I know :/
23:20 karolherbst: ehhh
23:25 karolherbst: airlied: what was the CL env var for llvmpipe again?
23:26 karolherbst: ahh.. cl
23:27 karolherbst: why does the test run faster on llvmpipe :D
23:27 karolherbst: unfair
23:31 jekstrand: karolherbst: That's sad
23:31 airlied: karolherbst: LP_DEBUG=cl
23:31 airlied: but it only works in debug builds
23:31 airlied: I might change that to be a separate env var
23:32 jekstrand: karolherbst: Uh... what am I trying out?
23:32 jekstrand: Yes, you likely do need iris patches
23:32 karolherbst: airlied: llvmpipe crashes quite a lot though
23:33 karolherbst: jekstrand: just clctsrunner.py -i basic
23:33 karolherbst: runs all the basic tests :)
23:34 jekstrand: karolherbst: Building now
23:34 karolherbst: airlied: mhh.. I know why llvmpipe is messed up .. *sigh*
23:35 karolherbst: !5891
23:36 jekstrand: karolherbst: Pass 0 Fails 106 Crashes 0 Timeouts 0
23:36 jenatali: :D
23:36 karolherbst: ehh...
23:36 jekstrand: karolherbst: I think something's wrong. :)
23:36 jenatali: Success!
23:36 karolherbst: you are doing it wrong :p
23:37 karolherbst: but yeah..
23:37 airlied: karolherbst: the basics should work, but I haven't pushed on it much lately
23:37 airlied: at least what's in master passed abunch of piglits yesterady
23:37 karolherbst: airlied: Pass 57 Fails 41 Crashes 3
23:37 airlied: yeah that seems about right
23:37 karolherbst: ehhh :D
23:37 karolherbst: okay
23:38 airlied: getting vulkan conformance is next on my plate, then CL :-P
23:38 jenatali: jekstrand: Fun fact - the CL extension for mipmaps doesn't add versions of the size queries that can actually pass an LOD value...
23:38 jekstrand: jenatali: It does for C++
23:38 jenatali: Ahhh
23:39 jekstrand: Because C++ on a GPU is such a good idea!
23:39 airlied: it's the only idea :-P
23:40 airlied: at least that's what SYCL is in the end, and cuda might as well be
23:40 airlied:thinks we need more fortran
23:40 karolherbst: jenatali: some of you should take a look at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5891
23:40 karolherbst: and check if the lowering is enabled for all drivers
23:40 jekstrand: karolherbst: Pass 60 Fails 22 Crashes 18 so far
23:41 karolherbst: jenatali: that lower_cs_global_id_from_local ting
23:41 karolherbst: I fixed it for nouveau and llvmpipe now :)
23:41 jenatali:sighs
23:41 jenatali: Yeah, probably needs a more thorough pass
23:41 karolherbst: jekstrand: not bad :)
23:42 jenatali: karolherbst: It did pass CI though, so I guess that means we don't have any compute that uses global IDs in CI on those drivers
23:42 karolherbst: yeah... I think it only matters once you enable the offsets
23:42 jenatali: Nah, you'll get the global_id intrinsic even without offsets if that bit isn't set
23:42 karolherbst: mhhh
23:42 jenatali: That's the whole reason for that bit, so we can get that for DXIL
23:42 karolherbst: I see
23:43 jenatali: I do wonder if it should be negated though so existing drivers get lowered by default
23:43 jenatali: It'd have to be some weird contorted name though...
23:43 karolherbst: airlied: ahh yeah.. llvm complains a lot :)
23:44 karolherbst: jekstrand: any idea what's the main reason for crashes/fails in iris?
23:44 jekstrand: karolherbst: I'm seeing lots of image tests in here that are, of course, crashing
23:45 karolherbst: ohh..
23:45 karolherbst: I guess images are enabled for you then
23:45 karolherbst: that should explain the difference then
23:45 karolherbst: I have them disabled :)
23:45 jekstrand: How do I disable them? PIPE_COMPUTE_CAP_IMAGES_SUPPORTED?
23:45 karolherbst: yeah.. I think
23:45 karolherbst: it's a clover only CAP :)
23:46 karolherbst: we have a bunch of them and drivers just enable random bits because it sounds right :D
23:46 karolherbst: anyway.. I need sleep now :)
23:46 jekstrand: karolherbst: 67% done already
23:47 jekstrand: Pass 88 Fails 8 Crashes 4 Timeouts 0 so far
23:47 karolherbst: I guess images is something we should really take care next
23:47 karolherbst: nice
23:47 karolherbst: you will get 4 timeouts for the async stuff
23:48 jekstrand: How long do they take to timeout?
23:48 karolherbst: uhm.. depends on what is set
23:48 karolherbst: but I think it's like 3 minutes or so
23:48 jekstrand: Final score (sans timeouts): Pass 89 Fails 9 Crashes 4 Timeouts 0
23:48 jekstrand: Looks like it's a tiny bit worse than nouveau
23:48 karolherbst: :)
23:48 karolherbst: not bad though
23:49 jekstrand: I'm getting fails for vload/vstore
23:49 karolherbst: given that I only tested on nouveau so far
23:49 karolherbst: mhh
23:49 karolherbst: which ones?
23:49 jekstrand: Here's my crash list:
23:49 jekstrand: basic hiloeo
23:49 jekstrand: basic vload_private
23:49 jekstrand: basic vstore_private
23:49 jekstrand: basic work_item_functions
23:49 karolherbst: do you have scratch memory implemented?
23:50 jekstrand: Yeah, but it's asserting on alignments
23:50 jekstrand: Drp... Stupid bug
23:51 karolherbst: anyway.. will go to bed now :) would be cool to get the offset, the spirv/clover cleanup merged
23:51 karolherbst: that would get rid of a couple of local patches
23:52 karolherbst: anyway, good night
23:52 jekstrand: g'night