01:07 jenatali: jekstrand: Go for it; that's on my list to get to at some point, but I wouldn't complain about you doing it for me :P
01:07 jenatali: pmoreau was looking for it earlier too, for 32bit offsets with 64bit pointers for shared memory
11:24 karolherbst: pmoreau: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6333
11:25 karolherbst: mind playing around with that and with your spirv rework stuff so we can start passing a list of extensions?
11:26 karolherbst: just want to see how well that works out
11:31 pmoreau: karolherbst: Sure! I added a similar patch to my branch yesterday as we talked about it, but haven’t pushed the updated branch yet. I didn’t run into any additional issues yesterday when running the CTS.
11:31 karolherbst: yeah.. the only thing annoying about it is that it hides it and we won't notice.. but maybe that's also fine :p
11:31 pmoreau: I even went ahead and removed the `dev` parameter of that function, as it is no longer used.
11:32 karolherbst: mhh.. local is still broken for some reasons
11:34 karolherbst: pmoreau: ahh.. right
11:34 karolherbst: I can do that as well :D
11:34 pmoreau: I updated my nv50_compute_support branch; it now has the shared mem patches from yesterday + a few others that jenatali pointed out, and a WIP for cl_khr_extended_versioning.
11:34 karolherbst: pmoreau: what I was mainly wondering about if the hash_set is good enough or if that's rather annoying
11:35 pmoreau: No idea
11:35 pmoreau: I need to dig into NIR at some point, but so far I have only looked a bit at the code around when it was crashing/asserting when running the CTS.
11:37 karolherbst: pmoreau: I meant more in regards to the translator API :p
11:38 pmoreau: Oh, I was thinking about the `hash_deref()` from NIR 🤦
11:39 karolherbst: :D
11:40 pmoreau: Yeah, I was wondering which data structure might be best.
11:40 karolherbst: mhhhh
11:40 karolherbst: "SHARED_MEMORY_SIZE : 0x800" mhhh
11:40 karolherbst: but still OOR_ADR
11:40 pmoreau: Sounds better than the 0x0 from yesterday :-)
11:40 karolherbst: I mean.. the other test worked
11:41 karolherbst: I bet something weird is going wrong
11:41 pmoreau: Do you know if the address it’s trying to access (and fail) is within those 0x800?
11:41 karolherbst: no clue
11:41 karolherbst: but I guess not
11:41 karolherbst: let me see
11:42 karolherbst: ehhh...
11:42 pmoreau: If only we had a trap handler, where one could look at the regs and figure out which address it is… O:-)
11:42 karolherbst: tmp_sum[tid + lsize - 1]
11:42 karolherbst: that's always.. well
11:42 pmoreau: What is that lsize doing there?
11:42 karolherbst: int lsize = get_local_size(0)
11:42 pmoreau: I’m assuming that’s the total size of allocated shared mem?
11:43 karolherbst: mhhh
11:43 karolherbst: it's guarded by "tid < lsize/2"
11:43 pmoreau: Which kernel is it again?
11:43 karolherbst: oohhh and there is a tid == 0...
11:43 karolherbst: why is CTS code so annoying :D
11:43 karolherbst: local_arg_def
11:44 pmoreau: Thanks
11:44 pmoreau:sent a long message: < https://matrix.org/_matrix/media/r0/download/matrix.org/BFOIemQyyZeLBKpVESaSMYic/message.txt >
11:45 karolherbst: so "test_api min_max_local_mem_size" still passes
11:46 pmoreau: Ah, tid is the ID within the block and not the grid, I see
11:46 karolherbst: lol..
11:47 karolherbst: ahh.. nvm
11:47 karolherbst: I thought "NV50_PROG_SCHED=0" fixed it, but that was still the other test
11:47 pmoreau: 🙃
11:48 karolherbst: pmoreau: let's see what happens in 32 bit mode :D
11:48 karolherbst: still fails, but now the kenrel is simplier
11:48 pmoreau: Ah, so if the `lsize` is odd, then the first thread in the block will add the last element of the block.
11:49 pmoreau: Okay, makes sense
11:50 karolherbst: I am wondering how the address can end up being messed up mhh
11:50 karolherbst: ehhh
11:50 karolherbst: NV50_PROG_OPTIMIZE=0 fixes it
11:50 pmoreau: :o
11:51 karolherbst: let's see...
11:51 pmoreau: Is there a way to run the VK-GL-CTS for OpenGL in headless mode?
11:51 karolherbst: =1 breaks it already
11:51 karolherbst: pmoreau: more or less
11:51 karolherbst: --deqp-visibility=hidden or something
11:51 karolherbst: but still requires some windowing system
11:51 karolherbst: it just makes it less annoying :D
11:52 pmoreau: :-/
11:53 karolherbst: ehh.. I bet something moves things around barriers or shit
11:54 karolherbst: fun.. enabling const folding made it break
11:54 karolherbst: that smells like something stupid
12:55 ccr: hmm. ../src/compiler/spirv/vtn_cfg.c:1335:10: error: implicit declaration of function ‘env_var_as_boolean’ [-Werror=implicit-function-declaration]
12:55 karolherbst: ehh
12:55 karolherbst: ccr: anything special about your env?
12:56 ccr: what counts as "special"? :) -D b_ndebug=true -D platforms=x11 -D dri-drivers=i965 -D gallium-drivers=zink,iris -D vulkan-drivers=intel -D tools=glsl
12:56 karolherbst: let's see...
12:58 karolherbst: jekstrand: I guess env_var_as_boolean is debug only :/
13:00 karolherbst: *sigh*
13:01 ccr: :/
13:02 karolherbst:likes the idea of not having to throw in pointless #ifndef NDEBUG and stuff...
13:03 pendingchaos: karolherbst, ccr: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6331
13:04 pendingchaos: debug.h declares env_var_as_boolean in debug and release builds?
13:04 karolherbst: ohh, good to know
13:05 pendingchaos: might make sense to remove the "#ifndef NDEBUG" around the debug.h include in nir.h
13:05 karolherbst: it might
13:06 karolherbst: orbea: mind allowing push access on that MR and stuff?
13:12 karolherbst: but I am also a fan of explicitly stating all header dependencies in each file.. so I think that MR is good enough
13:14 karolherbst: pmoreau: it's indeed constant folding...
13:14 karolherbst: shl (SUBOP:1) s32 %r141 %r139 %r133 (0) turning into mul (SUBOP:1) s32 %r141 %r137 %r193 (0)
13:14 karolherbst: ...
13:15 karolherbst: nasty bug
13:28 pmoreau: karolherbst: Mmh… weird!
13:28 pmoreau: Slowly getting to getting the OpenGL CTS running automatically on one of my computers, but still having issues with getting the CTS to not crash while detecting the displays. `glxinfo` and `glxgears` are able to connect to the Xorg server running, but it seems that the CTS is failing.
13:28 karolherbst: pmoreau: top two commits: https://gitlab.freedesktop.org/karolherbst/mesa/-/commits/cl_wip
13:31 karolherbst: actually.. I think the shl+shr could be a biter better
13:32 karolherbst: oh well...
15:11 Vanfanel: pq: hi! You and emersion helped me a lot the other day... So I come to you with new questions after trying to solve or understand these things myself. For example, if I am updating a PRIMARY plane and a CURSOR plane on the same atomic commit, do I need to set the OUT_FENCE_PTR and IN_FENCE_FD props of the CURSOR plane? I set them in the PRIMARY plane to sync KMS with GPU (in-fence) and to sync program
15:11 Vanfanel: with KMS (out-fence), but I fail to see if I need different fences for the CURSOR plane...
15:12 FireBurn|Home: ./mesa-9999/src/compiler/spirv/vtn_cfg.c:1335:10: error: implicit declaration of function 'env_var_as_boolean' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
15:15 kisak: FireBurn|Home: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6331
15:17 FireBurn|Home: Thanks
15:18 karolherbst: jenatali: I thought I put my cts runner in git: https://gitlab.freedesktop.org/karolherbst/opencl_cts_runner
15:18 karolherbst: should make it a bit less annoying to use, but at least tests like test_basic run way faster
15:19 karolherbst: should disable wimpy mode :D
15:31 Vanfanel: pq: My understanding is that IF the fences are set on the PRIMARY plane (for GPU <-> DISPLAY synchronization) then there's no need to set them on the CURSOR plane, because we are syncing on the PRIMARY plane, right?
15:42 jekstrand: karolherbst: Really?
15:43 jekstrand: karolherbst: Doesn't look debug-only to me
15:43 karolherbst: yeah.. my mistake, but the header include was guarded
15:43 karolherbst: and all other users
15:43 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6331
15:44 karolherbst: or we remove the guards or whatever
15:44 karolherbst: anyway, fails to build on release builds
15:44 jekstrand: karolherbst: rb
15:44 jekstrand: karolherbst: How did that pass CI? Do we not do release builds?
15:44 karolherbst: seems like we don't
15:44 jekstrand: *sigh*
15:45 karolherbst: yeah well.. feel free to add jobs testing release builds :p
15:53 karolherbst: ehh.. what can we do if we want to merge something, but marge can't? ... :/
15:58 pq: Vanfanel, setting OUT_FENCE_PTR or IN_FENCE_FD doesn't make sense if you don't know what to do with them. You need to know when you need them. That depends on how you get the buffers and what you do with them afterwards.
15:59 pq: Vanfanel, you're not syncing commits, you are syncing the use of the each buffer. If your next update does not touch the cursor plane, then you can't use the primary plane's out-fence to see when the old cursor buffer is free again.
16:00 pq: the fences are about the buffers on each place independently
16:01 pq: *plane
16:18 Vanfanel: pq: For the cursor plane, I simply set the CRTC_ID and FB_ID props. In FB_ID I simply pass the fb_id from a GBM BO containing the cursor, and I don't have any other buffer read by the cursor plane, just that single cursor plane.
16:19 Vanfanel: So I don't see why should I set OUT_FENCE_PTR and IN_FENCE_FD with the cursor plane, that's why I asked: because I don't see an use for them in that case
16:20 Vanfanel: This is the function I use to set the cursor on the cursor plane: https://github.com/vanfanel/SDL/blob/ea12c878dbb1b0a1824fcd75de68c2d9bd44e5b3/src/video/kmsdrm/SDL_kmsdrmvideo.c#L434
16:21 Vanfanel: pq: It USUALLY works, but when combined with this request set, the following atomic commit fails --> https://github.com/vanfanel/SDL/blob/ea12c878dbb1b0a1824fcd75de68c2d9bd44e5b3/src/video/kmsdrm/SDL_kmsdrmvideo.c#L388
16:23 Vanfanel: pq: I thought it could be because I was using the same props array for both planes, as you noticed the other day, but I have rewritten my add_plane_property() fn so that is not possible anymore, because I directly pass the plane whose props are to be modified: https://github.com/vanfanel/SDL/blob/ea12c878dbb1b0a1824fcd75de68c2d9bd44e5b3/src/video/kmsdrm/SDL_kmsdrmvideo.c#L195
16:32 marcodiego: I'm planning to compile mesa3d on my rockpi4 and want a good balance between stability and newer features/support. What is more recommended 2.0.8 or 20.1.5? Also, I know oibaf ppa, but want to compile it myself.
16:34 jenatali: jekstrand: Need anything else from me for !6330? I don't think an r-b makes sense for patches that I authored
16:35 kisak: marcodiego: it's easier to think of that in terms of support. Mesa 20.0 has reached its end of life and if you encounter any issues, you'll be asked to retest with a mesa 20.1 or git-master.
16:36 kisak: ^ a current 20.1 point release
16:37 kisak: and the same will be true with 20.1 in about a month ( https://docs.mesa3d.org/release-calendar.html )
16:37 marcodiego: kisak, hmmm... so, I'll try 20.1. Is it easy to compile on armbian buster? I mean, does it need any non packaged dependencies? Is it easy to generate a .deb package so I can install/uninstall it with dpkg?
16:39 kisak: sorry, I haven't played around with debian arm packages to know if there's quirks. You should be able to start with the debian folder at https://salsa.debian.org/xorg-team/lib/mesa/-/commits/debian-unstable and go from there
16:40 marcodiego: kisak, Thanks!
16:53 jenatali: karolherbst: Out of curiosity, does Clover support creating a context with multiple devices (from different Gallium drivers)?
16:54 marcodiego: I got meson to successfully "configure" the compilation, but is says: "Gallium drivers: kmsro v3d vc4 freedreno etnaviv nouveau tegra virgl lima panfrost swrast". How do I choose which Gallium drivers I don't want to compile?
17:25 pmoreau: jenatali: No idea if anyone ever tried that. I haven’t even tried a context with multiple devices from the same Gallium driver.
17:25 jenatali: :D
17:25 jenatali: pmoreau: I was asking because I noticed test_multiples was in the cts_runner karolherbst linked above
17:25 jenatali: And it tries to do that
17:26 pmoreau: Mmh, I will have to try it
17:26 pmoreau: I got two NVIDIAs in this laptop, so I could try
17:26 pmoreau: (Not know though)
17:27 pmoreau: marcodiego: `meson configure -Dgallium-drivers="r300,r600"` for example should only build those, IIRC how it works
17:29 marcodiego: pmoreau, Thanks! I'd already found it. Compilation is about 50% complete. Do you know how do I generate a debian package after it is compiled?
17:30 karolherbst: jenatali: I think so, yes
17:30 jenatali: Cool :)
17:30 pmoreau: Not working so hot… :-/ https://hastebin.com/manuvoyesi.coffeescript
17:30 pmoreau: But it did manage to create a context
17:32 karolherbst: ehh.. :/
17:32 jenatali: > assert(0); // XXX -- resource shared among dev and r.dev
17:33 karolherbst: pmoreau: mind reviewing https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6333 so we can merge it?
17:33 jenatali: No worries, was just wondering if that was hooked up - it was a pain to implement in CLOn12
17:33 karolherbst: should unblock a few CTS tests
17:34 pmoreau: karolherbst: Sure, will do a bit later
17:35 karolherbst: jenatali: I only have 10 fails btw in basic :)
17:36 jenatali: karolherbst: Nice!
17:36 karolherbst: 4 constant related, 6 private :)
17:36 jenatali: Do you implement async copy?
17:36 karolherbst: nope
17:36 jenatali: Is that 1.2?
17:36 pmoreau: karolherbst: In “Clover: rework handling of spirv extensions”, I would also remove the first parameter to `check_extensions()` as unused now.
17:36 karolherbst: no idea
17:36 karolherbst: pmoreau: k
17:37 jenatali: Ah, 1.2
17:37 jenatali: Er, 1.1
17:38 pmoreau: I think async_copy is 1.0 even
17:39 jenatali: Oh, you're right, the strided version is 1.1
17:39 pmoreau: event_t async_work_group_copy is 1.0
17:39 jenatali: karolherbst: There's async copy tests in test_basic, so not sure why you're not hitting those
17:40 karolherbst: ohh I hit those as well :D
17:40 karolherbst: https://gist.github.com/karolherbst/9ba5e89fde164f50f7c8a14db3994ea2
17:41 jenatali: Timeout?
17:41 karolherbst: yeah..... stupid reasons
17:41 karolherbst: seems like if I run tests concurrently the async ones hang somewhere for whatever reason
17:41 jenatali: Interesting. They won't pass vtn btw
17:41 karolherbst: I use my CTS runner which also timeouts tests and stuff
17:42 karolherbst: yeah.. they don't compile here either
17:42 jenatali: You'll need !6035 with all the upstream libclc MRs
17:42 karolherbst: I think I'll ignore async for now
17:42 karolherbst: I'd rather get private memory working :)
17:42 jenatali: Good idea :P you'll get it for free eventually
17:42 karolherbst: and then constant
17:42 onox: does mesa automatically provide VA-API support for any gallium driver or does the driver need to provide explicit support? (I'm missing iris_drv_video.so in mesa-va-drivers)
17:44 karolherbst: pmoreau: I think the translator needs a simple list or something, no?
17:45 karolherbst: ohh
17:45 karolherbst: it has a map
17:45 karolherbst: let's see
17:45 pmoreau: It’s not a list, it has a separate bool attribute for each extension AFAIR
17:45 pmoreau: I have a patch somewhere
17:46 karolherbst: ehhh
17:46 karolherbst: std::map ... *ugh*
17:48 karolherbst: but I think all STL containers are bad.. so whatever
17:48 pmoreau: karolherbst: https://gitlab.freedesktop.org/pmoreau/mesa/-/commit/5c65e79a8199af925871dc84f51a4aba9765acec
17:48 pmoreau: Probably can do a lot better
17:49 karolherbst: yes...
17:49 karolherbst: I really just want to pass in some list or something
17:49 karolherbst: but uff
17:49 karolherbst: the intel extensions are all annoying
17:50 karolherbst: pmoreau: honestly? I'd just ignore all of that until we actually support it
17:50 karolherbst: no point in adding the code
17:50 pmoreau: Sure
17:50 karolherbst: mhhh... so we have to pass in a map
17:52 karolherbst: pmoreau: okay.. let's do this: we also do the string -> bool map and fail on extensions we know a device doesn't support
17:52 karolherbst: mhhh
17:53 karolherbst: yeah...
17:53 karolherbst: I have an idea
17:58 karolherbst: pmoreau: mind if we get your spirv stuff in first? then I could write the stuff to declare extensions we support ...
17:58 pmoreau: Yes, that would be best
17:58 karolherbst: what's holding it up btw?
17:59 pmoreau: Review from curro_ I’m guessing?
18:02 karolherbst: I don't think that's required here
18:03 karolherbst: pmoreau: mind if we just expose spirv 1.0 up to 1.5?
18:04 karolherbst: or is there something we don't support yet?
18:07 karolherbst: ehh there are some spirv extensions mhhh
18:12 pmoreau: I think we should just keep it at 1.0 for now.
18:13 karolherbst: yeah.. I guess
18:21 karolherbst: pmoreau: I think we need a per device map :/
18:21 karolherbst: or something
18:21 karolherbst: or a list
18:21 karolherbst: but.. mhh
18:21 karolherbst: also a way to enable something for all devices...
18:25 karolherbst: pmoreau: std::unordered_set<std::string> supported_spirv_extensions on the device? and do cap checking whatever in there?
18:32 karolherbst: pmoreau: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6333
18:40 imirkin: is there a connection between "msc" values and current time?
18:40 imirkin: or is it all just relative to some arbitrary moment in time?
18:42 ajax: msc is one tick per channel sample (frame time, in video), but the 0 point is whenever
18:43 ajax: consistent within a session, is all it guarantees
18:44 imirkin: ajax: ok. what's the general usage pattern? i.e. do you query the current msc and then say you'd like something to be displayed at msc + fixed time offset?
18:45 imirkin: (i'm not trying to get fancy, just a very basic understanding, ahead of figuring out why events get stuck in nouveau ddx)
18:48 ajax: imirkin: SwapBuffersMSC has three parameters here, target divisor and remainder. if target > current msc, then that's when the swap will post; if target < current msc (typically 0), then the swap happens when msc % divisor == remainder
18:49 ajax: https://www.khronos.org/registry/OpenGL/extensions/OML/GLX_OML_sync_control.txt for the gories. egl and vk and present all have similar semantics
18:50 imirkin: ajax: more looking at the Xorg stuff - does that match up pretty directly?
18:50 ajax: yes
18:50 imirkin: i.e. DRI2 and PRESENT protocols
18:50 ajax: or, is certainly intended to
18:50 imirkin: ok cool
18:51 ajax: i think we gave present enough of an escape hatch to be able to do GLX_EXT_swap_control_tear, not that anyone's wired that up yet. but the basic functionality is straight outta OpenML
18:51 imirkin: ajax: nouveau ddx basically queues events onto a list, and then when they happen, deletes them from the list
18:51 imirkin: from what i can tell, it seems like that list just grows over time
18:51 imirkin: i was hoping to figure out, programmatically, which events are "old" and just dump them
18:51 imirkin: i guess i should have a per-crtc msc from the kernel?
18:53 ajax: it's the vblank interrupt, yes
18:53 karolherbst: imirkin: is the list sorted? so maybe all events up to the latest one could simply be removed?
18:53 karolherbst: well. up to the latest one which happened
19:14 karolherbst: ehhh.. I forgot the CTS to be broken :/
19:17 imirkin: karolherbst: all events are added sequentially (either to front or back, but consistently)
19:18 imirkin: karolherbst: but you could put in a request for a far-future event, so it's not ordered by msc
19:18 imirkin: hmmmm
19:18 imirkin: however i could store the msc at the event level and discard based on that or something?
19:18 karolherbst: I doubt that keeping them ordered would be too much work
19:18 imirkin: anyways ... more investigation required. the only obvious way i see for this to happen is if the nouveau kernel driver does something dodgy (like drop events)
19:19 karolherbst: imirkin: but I think if an event arrives, all requests to earlier points than the arrived one could all be dropped, or not?
19:19 imirkin: but i have no idea, nor do i have a consistent repro. so mostly just trying to understand wtf it's even trying to do in the first place.
19:19 imirkin: karolherbst: yeah, maybe
19:19 karolherbst: I am wondering what promises we get here
19:20 imirkin: it's _mostly_ working fine, but sometimes some application (chrome, looking at you here) does something which confuses everything
19:20 imirkin: and then the list jumps, and our strategy of finding events by just running linearly through the list becomes less algorithmically appealing :)
19:20 imirkin: works well when list size is 1 or 2. a lot worse when it's 1000.
19:21 imirkin: anyways, this is all good info. helps me organize it in my head a bit.
19:21 imirkin: thanks ajax
19:21 imirkin: i had hoped to live a vblank-knowledge-free life, but i guess it's not to be
19:23 karolherbst: jenatali: seems like the CTS is stupid and its return code means nothing
19:26 karolherbst: pmoreau: can we make clovers debuging stuff somehow so it isn't the worse kind of debugging?
19:29 jenatali: karolherbst: Yeah, the official CTS runner searches for "FAILED" in the output, it's awesome
19:29 karolherbst: nice...
19:29 karolherbst: jenatali: https://gist.github.com/karolherbst/9932fac844642fa70eb956ef0a44df77
19:29 karolherbst: :D
19:29 karolherbst: hope that helps
19:29 karolherbst: ...
19:35 karolherbst: ahhhh
19:35 karolherbst: why is stuff always so annoying
19:38 karolherbst: ohh, we could make use of LLVMSPIRVExtensions.inc mhhh
19:47 karolherbst: pmoreau: mind if I push to your branch?
19:51 karolherbst: pmoreau: https://gitlab.freedesktop.org/karolherbst/mesa/-/commits/clover_spirv_support_improvements
19:51 karolherbst: I think the last patch is the way it's the least amount of work
20:23 karolherbst: jenatali: do you have a fix for the alignment of vectors in local memory?
20:23 karolherbst: test_basic kernel_memory_alignment_local
20:29 jenatali: Uh... I don't think so
20:29 jenatali: I thought that just worked for us
20:29 jenatali: karolherbst: What's the failure look like?
20:32 karolherbst: "Vector size 2 failed: 0x1 is not properly aligned." and so on
20:32 karolherbst: but that could be a driver issue...
20:37 pmoreau: <karolherbst "Pierre Moreau: mind if I push to"> Nope, go for it!
20:38 karolherbst: cool
20:40 karolherbst: anyway, your patches worked the issue as well of course, but with my patches the translator uses the no_wrap extenions now as well :)
20:40 karolherbst: I kind of dislike the macro magic, but that's what you get if APIs use enums instead of strings :p
20:41 karolherbst: oh well...
20:56 karolherbst: jenatali: do you have a isnormal instruction in dxil?
20:57 jenatali: karolherbst: Yeah - adding that's on my to-do list
20:57 karolherbst: mhhh
20:57 karolherbst: how is that even implemented in hw.. :D
20:57 karolherbst: that's one of the more annoying ones I think
20:57 jenatali:shrugs
20:57 jenatali: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/166/commits
20:57 jenatali: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/167
20:58 jenatali: And isnormal is in this series: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/116/commits
20:59 karolherbst: jenatali: this is what nvidia generates: https://gist.github.com/karolherbst/a9d6a7bcb175bcb3b23e6222e6186930
20:59 karolherbst: ehh.. the xmads are confusing
20:59 karolherbst: those are 16x16+32 imads essentially
21:00 karolherbst: updated with volta ISA :D https://gist.github.com/karolherbst/a9d6a7bcb175bcb3b23e6222e6186930
21:00 karolherbst: no idea if it has to be this much
21:04 karolherbst: imirkin: can you think of a smarter way of checking if a float is normal on nv hardware?
21:18 jenatali: That seems like a lot... shouldn't it just be a matter of a couple bitmasks and comparisons?
21:18 karolherbst: no idea
21:19 karolherbst: ptx doesn't have isnormal so it could be that the clc to ptx bits are already messed up
21:20 karolherbst: ohhh
21:20 karolherbst: they have it
21:21 karolherbst: testp.f32.normal
21:21 karolherbst: ehh.. I guess their CL stack is just weird
21:21 karolherbst: or there are special precision requiernments
21:22 karolherbst: jenatali: ohhh.. I messed up
21:22 karolherbst: most of the code was global_linear_id shit
21:22 jenatali: :P Ok that makes more sense
21:23 karolherbst: https://gist.github.com/karolherbst/a9d6a7bcb175bcb3b23e6222e6186930
21:23 karolherbst: that looks way better :D
21:24 karolherbst: that I2F is a bit odd.. but oh well
21:24 karolherbst: checking ig testp.normal gives me something better
21:24 jenatali: Much better
21:24 jenatali: If you want the intrinsic and vtn hookup, feel free to pick those patches - or just rewrite them, they're trivial
21:25 karolherbst: yeah.. let's see
21:27 jenatali: You'll also need this one: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/163/diffs?commit_id=1a26837877052836782a021be5c2de0436c13cec
21:27 jenatali: (at some point)
21:31 karolherbst: seems like that testp.normal.f32 generates slightly different code
21:34 karolherbst: jenatali: ohhh
21:34 karolherbst: I see
21:34 karolherbst: ptx treats 0.0 and -0.0 as normal as well
21:34 karolherbst: and I think CL does not
21:35 karolherbst: or something weird
21:35 karolherbst: mhh, so the cl code does this
21:35 jenatali: Makes sense
21:39 karolherbst: yeah...
21:39 karolherbst: I think that's the difference
21:40 karolherbst: oh well..
21:40 imirkin: karolherbst: not sure what "normal" is
21:40 imirkin: karolherbst: there's a FCHK op on earlier hw that we never use
21:40 imirkin: perhaps it does something useful here
21:40 jenatali: Not subnormal, infinite, or other weird floats
21:40 karolherbst: "normal" floats :p but I already figured it out
21:41 karolherbst: CLs get_global_linear_id just ends up as a bunch of instructions
21:42 hanetzer: beh.
21:50 karolherbst: jenatali: mhhh, and kernel_memory_alignment_local passes for you?
21:52 karolherbst: ehhh...
21:52 karolherbst: something is funny with the runtime
21:53 karolherbst: the shader does the right thing...
21:53 karolherbst: well.. more or less
21:53 karolherbst: uhm.. actually
21:53 karolherbst: sould be fine
22:02 karolherbst: weird...
22:02 karolherbst: I read something else out than the kernel writes...
22:05 jenatali: karolherbst: Yeah, all the alignment ones just passed, except the dedicated vecalign test which uses packed structs
22:35 hanetzer: so. anyone else have any ideas on things I should try for https://gitlab.freedesktop.org/drm/amd/-/issues/1257 ?
22:56 karolherbst: kernels look soo much nicier with 32 bit addressing :(
23:02 karolherbst: jenatali: ohh.. I looked at the wrong kernel :D
23:03 karolherbst: ahh
23:03 karolherbst: mhhh
23:04 karolherbst: jenatali: seems like it's private memory
23:05 karolherbst: or rather.. taking the address of kernel inputs
23:07 karolherbst: jenatali: do you have any patches in regards to loading of kernel arguments?
23:08 jenatali: Don't think so
23:08 jenatali: What seems to be the problem?
23:08 karolherbst: soo we have a store_deref(deref, deref_cast(load_param)) thing initially
23:08 karolherbst: which looks fine
23:09 karolherbst: or maybe it's not?
23:09 karolherbst: mhhh
23:09 jenatali: Seems fine?
23:09 karolherbst: jenatali: it shouldn't load the arg
23:10 karolherbst: so what the kernel roughly does is dst = &arg
23:10 jenatali: Oh, hold on, there might be a patch from before I really started understanding nir
23:10 karolherbst: but I think the spirv is probably also a bit bonkers here
23:12 jenatali: Nope, nevermind, the patch I'm thinking of was actually yours :P I think it showed up in master after we forked originally
23:12 karolherbst: https://gist.github.com/karolherbst/fc66765e4a30cbd72d8b883302bf959b
23:12 jenatali: karolherbst: got the CLC source?
23:13 karolherbst: yep, reload
23:13 jenatali: karolherbst: &foo[0] == *foo, yeah?
23:13 karolherbst: yeah
23:13 jenatali: Oh, no, just == foo
23:13 karolherbst: I think the problem is OpFunctionParameter
23:14 karolherbst: soo. OpFunctionParameter more or less only declares the "pointer" to the parameter
23:14 karolherbst: but...
23:14 jenatali: karkolherbst: https://gitlab.freedesktop.org/kusma/mesa/-/blob/msclc-d3d12/src/microsoft/clc/clc_compiler.c#L1131
23:14 jenatali: Does Clover align the addresses of the inputs?
23:15 karolherbst: it does
23:15 jenatali: IIRC vtn doesn't even set driver_location
23:15 karolherbst: and this isn't the issue
23:15 karolherbst: the issue is, that the parameter gets loaded
23:15 karolherbst: because.. the spirv is just annoying here
23:15 karolherbst: see how the result of OpFunctionParameter is used in different ways?
23:15 jenatali: The function parameter is a char*, and that's what's being written, isn't it?
23:15 karolherbst: nope
23:16 jenatali: ?
23:16 karolherbst: or is it? wait...
23:16 karolherbst: ehh. actually
23:16 jenatali: char* mem0; &mem0[0] == mem0
23:16 karolherbst: ahh, it takes the address of the things inside local memory.. uff
23:16 karolherbst: I didn't see those [0]
23:17 jenatali: Ah, that's probably this then: https://gitlab.freedesktop.org/kusma/mesa/-/blob/msclc-d3d12/src/microsoft/clc/clc_compiler.c#L1328
23:17 karolherbst: might be? let's see
23:19 karolherbst: jenatali: I've added the nir after function inlining and I think that one looks fine actually
23:20 karolherbst: so the ptr get loaded and then cast to whatever
23:20 jenatali: Yeah, looks fine
23:20 karolherbst: although.. mhhh
23:21 jenatali: Not sure how Clover generates the addresses the get loaded, but I know we do align them
23:22 karolherbst: jenatali: ehh.. OpConvertPtrToU looks a bit odd
23:22 karolherbst: so it does only emit a u2u64
23:22 karolherbst: but maybe that's fine as well?
23:22 imirkin: mmmkay. problem identified. the events get stuck when screens got into dpms
23:22 jenatali: Ah right - we reflect out the offset that our compiler frontend calculates, and then the runtime stores that offset in the kernel args buffer
23:22 karolherbst: some time ago last time I looked at the nir here :D
23:22 jenatali: karolherbst: The spirv and nir both look fine to me
23:22 imirkin: ajax: any thoughts about DRI2 / PRESENT events in conjunction with dpms disabling?
23:22 jenatali: I suspect we have identical or nearly identical nir we're using, and that test is okay for us
23:22 imirkin: (besides "there lie dragons")
23:23 karolherbst: jenatali: I'd expect an additional deref_cast (uint64_t)ssa_15 instead of the u2u64 though
23:23 karolherbst: but .. I think that's fine :p
23:23 karolherbst: ohh actually
23:23 karolherbst: no...
23:23 karolherbst: I always get confused about deref_cast
23:24 jenatali: Yeah no it's fine
23:24 jenatali: lower_explicit_io will do the right thing
23:25 karolherbst: right.. so the kernel args should be the offsets into the shared memory buffer
23:25 karolherbst: and the kernel is just giving those back
23:26 jenatali: Yup
23:29 jenatali: karolherbst: Where do those addresses/offsets come from in Clover?
23:29 karolherbst: src/gallium/frontends/clover/nir/invocation.cpp
23:29 karolherbst: "// Calculate input offsets."
23:30 karolherbst: or do you mean the shared memory ones?
23:30 karolherbst: because I suspect the input buffer to be bonkers :D
23:32 karolherbst: "std::vector of length 28, capacity 32 = {0x0, 0xa0, 0x3d, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x7, 0x0, 0x0, 0x0, 0xb, 0x0, 0x0, 0x0, 0x13, 0x0, 0x0, 0x0}"
23:32 karolherbst: ahh yeah...
23:32 karolherbst: 0x1 is definetly wrong for the second local arg
23:32 karolherbst: should be 0x2
23:34 karolherbst: ehhhhhh
23:34 karolherbst: jenatali: found it
23:34 karolherbst: jenatali: kernel::local_argument::bind
23:35 karolherbst: so clover just continues with the size
23:35 karolherbst: and doesn't add padding _inside_ the local mem buffer
23:36 karolherbst: mhhh
23:36 karolherbst: annoying
23:37 jenatali: Yep, that's the bug then
23:37 karolherbst: yeah..
23:38 karolherbst: at least that could also explain other fails :)
23:38 jenatali: Cool, glad I could help
23:38 karolherbst:should remember to blame clover first
23:38 karolherbst: :D
23:38 karolherbst: I think I even hit and debuged this once... mhh
23:39 karolherbst: ahhh
23:39 karolherbst: glorious fix
23:39 karolherbst: jenatali: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/af19a08ab58925c5759da43078dde1cd27ffe1cb
23:39 jenatali: karolherbst: If you ask me and I don't know of a patch off the top of my head, blame clover first ;)
23:40 karolherbst: jenatali: I hope you handle that better than just to allign to 0x80? :D
23:41 jenatali: karolherbst: I sent you the code - we align up to the app-provided input size, up to a max of 128 byte
23:42 karolherbst: ahhh
23:42 karolherbst: a little better then
23:42 karolherbst: but I think we can do both better :p
23:42 jenatali: Yeah, not amazing, but good enough
23:42 jenatali: Honestly, I'm unconcerned until I see a kernel that has so many local args that the padding actually matters
23:43 karolherbst: right...
23:43 karolherbst: "PASSED test." :)
23:44 jenatali: Nice :)
23:44 karolherbst: but I think it's not hard to fix properly
23:44 karolherbst: we preparse the spirv anyway, just need to save the alignment I was too lazy for 2 years ago :D
23:44 karolherbst: 2 years....
23:44 karolherbst:has been workin on this for too long
23:45 imirkin: time flies when you're having fun
23:45 jenatali: :P
23:45 karolherbst: and I totally remeber writing that patch even...
23:45 jenatali: I actually have been having a lot of fun with this project
23:46 karolherbst: at least this project has a constant progress you actually can see :p
23:46 karolherbst: :D
23:46 karolherbst: that's worth a lot
23:47 jenatali: Yeah, absolutely, a steady march towards passing an already-existing test suite is awesome
23:47 jenatali: One of the reasons I really like working on mapping layers
23:48 jenatali:needs to stop writing code and write an XDC presentation
23:48 karolherbst: yeah.... I decided to don't have an XDC presentation, but this time the topic I'd talk about is covered already anyway :p
23:48 karolherbst: and I talked about nouveau at fosdem
23:49 jenatali: I'm part of 2 this year :D
23:49 jenatali: Instead of just lurking and hiding my face in the shadows like last year
23:49 karolherbst: ahh yeah...
23:50 karolherbst: giving talks are quite fun and usually I end up with three in a year
23:50 karolherbst: I think 4 was the most I did once?
23:50 karolherbst: dunno
23:51 karolherbst: at fosdem it's always the most fun though
23:52 karolherbst: but mostly because people actually do ask a ton of questions
23:53 jenatali: Yeah, I'm very interested to see how the remote format's gonna work this year
23:53 jenatali: Especially since it's going to be midnight - 8am in my time zone
23:53 karolherbst:doesn't like talking with nobody around
23:57 karolherbst: jenatali: actually.. just do MIN2(0x80, 1 << (ffs(size) - 1)) :p
23:58 jenatali: Isn't that what we're doing?
23:58 jenatali: Except with the MIN2 expanded
23:58 karolherbst: yeah, just easier to read :p
23:59 jenatali: Yeah but since MIN2 is a macro, it's mean the ffs needs to be evaluated twice - expanded is a bit cleaner IMO