00:22 fdobridge: <g​fxstrand> Okay, I've run 1k fuzzing tests through it. I think my layouts are good
00:42 fdobridge: <g​fxstrand> Now to figure out how to do 2D array views of 3D without smashing tiling.z_log2 to 0
00:43 fdobridge: <k​arolherbst🐧🦀> I'm not sure that's possible at all?
00:43 fdobridge: <g​fxstrand> It's got to be
00:43 fdobridge: <g​fxstrand> Sparse requires it
00:43 fdobridge: <k​arolherbst🐧🦀> why?
00:43 fdobridge: <k​arolherbst🐧🦀> uhhh
00:43 fdobridge: <k​arolherbst🐧🦀> and I thought vulkan had this bit to state that you are going to do this kind of stuff
00:44 fdobridge: <g​fxstrand> Yeah but Vulkan also has standard block sizes which have a different opinion
00:44 fdobridge: <k​arolherbst🐧🦀> but if sparse is special here.. then mhh..
00:44 fdobridge: <g​fxstrand> I think we can do it on Maxwell+
00:44 fdobridge: <k​arolherbst🐧🦀> maybe?
03:46 fdobridge: <g​fxstrand> Yeah, it's looking like we can't. Fortunately, there's a VU for it
03:47 fdobridge: <g​fxstrand> We do have to support 2D views of 3D, just not 2D array views.
03:49 fdobridge: <g​fxstrand> I think that's okay, though. That *should* just be a 3D view that we access with a 2D sampler. At least I'm hoping as much.
04:43 fdobridge: <g​fxstrand> That seems to work except now I have to figure out sliced_view_of_3d
04:43 fdobridge: <g​fxstrand> Annoyingly, that one doesn't have an image create bit
04:45 fdobridge: <g​fxstrand> But also that's part of D3D12 so I *know* the hardware can do it. I just need to figure out how.
04:45 fdobridge: <g​fxstrand> Hell.... there's an idea....
04:46 fdobridge: <g​fxstrand> Why didn't I think of this before...
04:46 fdobridge: <g​fxstrand> If I write a little bit of D3D code, I can get NVIDIA's blob driver to cough out fully baked texture headers...
04:47 fdobridge: <g​fxstrand> 😈
04:47 fdobridge: <S​id> 👀
04:49 fdobridge: <g​fxstrand> I need to chat with @themaister about this
05:13 fdobridge: <r​edsheep> Would that code need to be run on windows to go straight to the blob's native directx driver? Let me know if there's anything I can help capture, if you don't have a windows install
05:31 fdobridge: <a​irlied> @gfxstrand can't you just use a CTS test on the prop driver
05:34 fdobridge: <S​id> does anyone know how to build and boot a debug kernel on arch
05:34 fdobridge: <S​id> I need to run faddr2line
05:34 fdobridge: <S​id> but the final bzimage that's being built is always stripped of debug symbols
05:46 fdobridge: <g​fxstrand> I can but that involves fishing it out of memory. With D3D12, I can just give it a pointer and say "put the descriptor here, please"
07:16 fdobridge: <t​om3026> @redsheep are you on kde/kwin?
07:17 fdobridge: <r​edsheep> Yes
07:19 fdobridge: <t​om3026> @redsheep if on wayland https://invent.kde.org/plasma/kwin/-/commit/be9d2fd7d07b84f0a9be7d950947ea097011356f 6.0.1 kinda fixed https://invent.kde.org/plasma/kwin/-/commit/456ed767e6ce30e87829fa11e86cab6586bae7aa two major things
07:20 fdobridge: <t​om3026> so you dont blame nouveau for frameskips and stutters! 😄
07:22 fdobridge: <t​om3026> show me your pkgbuild
07:22 fdobridge: <t​om3026> most of them manually strip files in the package function
07:22 fdobridge: <r​edsheep> I've been testing on openbox as well to try to isolate my variables
07:22 fdobridge: <t​om3026> ah okay 👍
07:26 fdobridge: <S​id> I removed all of that
07:26 fdobridge: <S​id> and I got it sorted in the end
07:28 fdobridge: <t​om3026> okay heh
07:29 fdobridge: <t​om3026> i might not know gpu driver coding but ive been fiddling with half self built -git "Arch" from compiler to libc so got any questions ive probably been there at some point in the past 16 years. ask away! 😄
07:30 fdobridge: <r​edsheep> These fixes do seem like they might explain my higher performance in talos, I realized that was along with it switching me to the wayland session without saying anything
07:30 fdobridge: <r​edsheep> The x session is mostly the same performance. I feel like maybe kwin is responsible for the difference between installed vs not installed drivers
07:31 fdobridge: <r​edsheep> In terms of performance, that is. That stuff is behaving really weirdly right now.
07:33 fdobridge: <t​om3026> i mean at what stage are the drivers linked/loaded when the first xwayland process is running or when the game runs? because kwin kinda launches it directly you log in
07:33 fdobridge: <S​id> yeag plasma 6 wl does buffer latching correctly
07:33 fdobridge: <S​id> compared to 5
07:33 fdobridge: <t​om3026> meaning you pretty much dont load that local driver whatever you do
07:34 fdobridge: <t​om3026> since most games is in xwayland anyhow, unless you get wine 9.x and force it to the wayland driver i guess heh
07:37 fdobridge: <r​edsheep> Hmm testing proton games with it forced to wayland is an interesting idea. I don't think I will do too terribly much more performance testing until the modifiers situation is fixed though, I get the sense it's really messing things up not to have those.
07:39 fdobridge: <t​om3026> its quite easy once its compiled in, set the registry key so it "fallbacks" to wayland if $DISPLAY is unset, so anytime you wanna swap "DISPLAY= wine game.exe"
07:39 fdobridge: <t​om3026> not everything is implemented yet so there is some breakage but most dxvk/vkd3d games i tried work
14:18 fdobridge: <g​fxstrand> @tiredchiku did you have a fix for suspend/resume somewhere?
14:25 fdobridge: <S​id> here
14:25 fdobridge: <S​id> https://github.com/torvalds/linux/commit/f6ecfdad359a01c7fd8a3bcfde3ef0acdf107e6e
14:29 fdobridge: <g​fxstrand> Thanks! I've been fighting suspend problems for a while and they seemed to be nouveau related. Here's hoping that helps.
14:30 fdobridge: <S​id> 🤞
15:56 fdobridge: <b​abblebones> Laptop enjoyers gonna have their day with this one ❤️
15:56 fdobridge: <S​id> ?
15:57 fdobridge: <S​id> oh that patch
15:57 fdobridge: <S​id> yus, I ran into the issue on a laptop :p
16:16 fdobridge: <z​mike.> .
16:17 magic_rb: fdobridge: Interesting, because of s2idle being the only available suspend i havent ran into this
16:18 fdobridge: <S​id> magic_rb on laptops, gpus enter d3cold when there's no load
16:19 fdobridge: <S​id> which is why I ran into it, almost immediately after boot
16:19 fdobridge: <S​id> and the same suspend logic applies to desktop suspend as well
16:20 magic_rb: Im on a laptop, but my laptop cannot enter s3
16:20 magic_rb: Only s2idle with s0ix
16:20 magic_rb: I do also wonder about the power states the proprietary driver exposes, it uses P0-P8 iirc
16:37 fdobridge: <g​fxstrand> @karolherbst For SSY and SYNC on Maxwell, is there some sort of internal stack or something?
16:37 fdobridge: <k​arolherbst🐧🦀> correct, and it's fixed in size, and if you run out of stack you can allocate it as part of tls
16:37 fdobridge: <k​arolherbst🐧🦀> like there is a field in the SPH for it
16:37 fdobridge: <k​arolherbst🐧🦀> c/r stack is the term
16:37 fdobridge: <g​fxstrand> And SSY spills to that?
16:38 fdobridge: <k​arolherbst🐧🦀> any instruction branching does, it's like the barrier registers, just transparent to the shader
16:38 fdobridge: <k​arolherbst🐧🦀> ehh.. not branching
16:38 fdobridge: <k​arolherbst🐧🦀> like.. those who affect the thread masks in weird ways
16:39 fdobridge: <k​arolherbst🐧🦀> those loop CFG ones as well
16:39 fdobridge: <k​arolherbst🐧🦀> any one where you push to it implicitly, in codegen we call those prebreak/break, etc...
16:41 fdobridge: <g​fxstrand> Ugh... this feels annoyingly intelish. 😅
16:43 fdobridge: <k​arolherbst🐧🦀> the thing is, the stack isn't even big
16:43 fdobridge: <k​arolherbst🐧🦀> like 512 bytes or something?
16:43 fdobridge: <k​arolherbst🐧🦀> but we also don't really know how it works
16:44 fdobridge: <g​fxstrand> No, the max is 1 MB
16:44 fdobridge: <k​arolherbst🐧🦀> and you need to know ahead of time if you use the fixed sized one or the spilled one
16:44 fdobridge: <k​arolherbst🐧🦀> naaah
16:44 fdobridge: <k​arolherbst🐧🦀> if you set 0, you use a fast on chip one
16:44 fdobridge: <k​arolherbst🐧🦀> but you need to know ahead of time how much you need in the worst case
16:44 fdobridge: <k​arolherbst🐧🦀> it's all a bit painful
16:45 fdobridge: <k​arolherbst🐧🦀> I think you can configure the spilled one in steps of 512 bytes, but I think the non spilled one is also just 512 bytes in size.. maybe a bit bigger, never really could figure it out
16:46 fdobridge: <k​arolherbst🐧🦀> it's a bit annoying that it's an all or nothing thing
16:47 fdobridge: <g​fxstrand> It doesn't look like nouveau actually programs that header bit at all
16:47 fdobridge: <k​arolherbst🐧🦀> yeah... we only do it for compute
16:47 fdobridge: <k​arolherbst🐧🦀> we had _one_ bug where we ran out of space
16:48 fdobridge: <k​arolherbst🐧🦀> and the corruption was pretty unimportant for game play
16:48 fdobridge: <k​arolherbst🐧🦀> but for compute we do it unconditionally
16:49 fdobridge: <k​arolherbst🐧🦀> `SHADER_LOCAL_MEMORY_CRS_SIZE` in the QMD
16:49 fdobridge: <g​fxstrand> nve4_compute.c:171 would like to differ. 😛
16:50 fdobridge: <g​fxstrand> Oh, I see in the QMD (edited)
16:50 fdobridge: <g​fxstrand> Sure, just set it to a big fixed value and hope it's right. 😅
16:50 fdobridge: <k​arolherbst🐧🦀> 🙂
16:50 fdobridge: <k​arolherbst🐧🦀> as I said: we hit one bug on the 3D side and that's all
16:50 fdobridge: <k​arolherbst🐧🦀> 😄
16:51 fdobridge: <g​fxstrand> Yeah, that's not gonna work for vk...
16:51 fdobridge: <k​arolherbst🐧🦀> I know
16:52 fdobridge: <k​arolherbst🐧🦀> however...
16:52 fdobridge: <k​arolherbst🐧🦀> it's stored in the TLS buffer
16:52 fdobridge: <k​arolherbst🐧🦀> so we could actually reverse engineer whatever happens
16:52 fdobridge: <g​fxstrand> Yeah
16:53 fdobridge: <g​fxstrand> In any case, the first step is to add the instructions
16:53 fdobridge: <k​arolherbst🐧🦀> yeah
16:53 fdobridge: <g​fxstrand> And figuring out the stack depth shouldn't actually be that hard
16:53 fdobridge: <g​fxstrand> Thank you, NIR, for being structured. 😅
16:53 fdobridge: <k​arolherbst🐧🦀> yeah...
16:53 fdobridge: <k​arolherbst🐧🦀> when I was looking into the issue ......
16:54 fdobridge: <k​arolherbst🐧🦀> but the shader hitting this was a nested loop thing and nvidia just loop merged the whole thing...
16:55 fdobridge: <g​fxstrand> I guess the first question is if I should do it in NIR or directly in nak_from_nir
16:55 fdobridge: <p​ixelcluster> Quser Mode Driver? :P
16:57 fdobridge: <g​fxstrand> Now the real question: Do I want to make each one its own instruction with a mode or multiple?
16:57 fdobridge: <g​fxstrand> It's multiple in the HW
17:03 fdobridge: <z​mike.> @gfxstrand here you go, one item off your todo list https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28078
17:04 fdobridge: <z​mike.> and with that, I think my work here is done
17:05 fdobridge: <g​fxstrand> woo
17:24 fdobridge: <!​DodoNVK (she) 🇱🇹> Does SuperTuxKart work on zink + NVK now?
17:26 fdobridge: <m​henning> @gfxstrand It's also worth noting that SYNC acts as an actual control flow instruction, e.g. an if statement can be implemented as just an SSY beforehand + a conditional jump to the branches + a SYNC at the end of each of the branches (at least on kepler, where I was doing some RE a few years ago) (edited)
17:26 fdobridge: <z​mike.> https://www.supergoodcode.com/woof/
17:27 fdobridge: <m​henning> codegen doesn't use it this way - it acts like it needs to do the reconvergence separately
17:29 fdobridge: <m​henning> but anyway, there's a few annoying details in that a loop will take more control stack space depending on whether you need prebreak and precontinue entries
17:30 fdobridge: <m​henning> codegen always inserts them and then deletes the precontinue if it isn't needed, which makes it harder to find out the actual stack size than if it were done in from_nir
17:32 fdobridge: <g​fxstrand> Neat!
17:34 fdobridge: <r​edsheep> I've tested loads of games and haven't yet found one that isn't working. Sessions are still a little shaky though.
17:36 fdobridge: <r​edsheep> Anybody have a non Nvidia card where they care to test zink running discord? It's not clear if the corrupting server icons is an NVK or zink bug
17:37 fdobridge: <r​edsheep> Or are we at the point with zink+NVK where I should just open an issue for that?
17:38 fdobridge: <c​onan_kudo> Welp. https://www.tomshardware.com/pc-components/gpus/nvidia-gtx-branding-finally-reaches-the-end-of-the-line-after-19-years-the-last-gtx-16-series-chips-have-left-the-foundry
17:39 fdobridge: <r​edsheep> ... These were still being made huh? Weird.
17:39 fdobridge: <r​inlovesyou> i didn't even know it uses opengl
17:39 fdobridge: <!​DodoNVK (she) 🇱🇹> What's the corruption bug?
17:40 fdobridge: <r​inlovesyou> ```LIBGL_KOPPER_DRI2=1 __GLX_VENDOR_LIBRARY_NAME=mesa MESA_LOADER_DRIVER_OVERRIDE=zink GALLIUM_DRIVER=zink VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json discord```
17:40 fdobridge: <r​inlovesyou>
17:40 fdobridge: <r​inlovesyou> ran it with the proprietary driver, turned on hardware acceleration in the discord settings, and i'm not seeing any server icons corrupting
17:40 fdobridge: <r​edsheep> Hovering the server icons turns them cursed. Also screen share UI is similarly weird.
17:41 fdobridge: <r​inlovesyou> ```
17:41 fdobridge: <r​inlovesyou> [6979:0308/183854.948883:ERROR:gl_ozone_egl.cc(23)] GLDisplayEGL::Initialize failed.
17:41 fdobridge: <r​inlovesyou> [6979:0308/183854.950096:ERROR:viz_main_impl.cc(186)] Exiting GPU process due to errors during initialization
17:41 fdobridge: <r​inlovesyou> ```
17:41 fdobridge: <r​inlovesyou> oh well, it's certainly not working
17:41 fdobridge: <!​DodoNVK (she) 🇱🇹> Can't reproduce with RADV
17:42 fdobridge: <!​DodoNVK (she) 🇱🇹> What's up with this though?: `MESA: error: ZINK: failed to choose pdev`
17:42 fdobridge: <r​edsheep> Good to know, so probably an NVK issue. Dunno if we're doing issues for this kind of thing yet.
17:44 fdobridge: <z​mike.> stop doing `GALLIUM_DRIVER=zink`
17:44 fdobridge: <!​DodoNVK (she) 🇱🇹> I did `MESA_LOADER_DRIVER_OVERRIDE=zink`
17:45 fdobridge: <z​mike.> it was a general statement
17:45 magic_rb: Whats the correct way to make zink run?
17:45 fdobridge: <r​edsheep> I've been using NOUVEAU_USE_ZINK lately
17:45 fdobridge: <!​DodoNVK (she) 🇱🇹> I think that's NVK-specific though
17:46 magic_rb: Im considering setting it session global, it seems to be the path forward, so might as well test it out early
17:47 fdobridge: <r​edsheep> If you try Wayland and it fails x11 will probably be fine, of not ideal
17:47 fdobridge: <r​edsheep> if*
17:48 magic_rb: Im still on x11 because of xmonad and until recently being on proprietary nvidia
17:48 magic_rb: So wayland is not a concern i have yet
17:51 fdobridge: <r​edsheep> Oh also depending what you mean by session global it may or may not run the session itself. I've been putting NOUVEAU_USE_ZINK=1 in /etc/environment
17:52 magic_rb: Either right after xmonad starts or before it starts
17:53 fdobridge: <g​fxstrand> Okay, down to two sparse bugs:
17:53 fdobridge: <g​fxstrand> 1. The CTS gets confused that we advertise `R10X6` and `R10X6G10X6` formats. Arguably, this is a CTS bug but I don't know that I care that much.
17:53 fdobridge: <g​fxstrand> 2. We need to make the stencil_copy_temp part of the miptail somehow
17:54 fdobridge: <z​mike.> did you run any of the glcts sparse tests yet
17:55 fdobridge: <g​fxstrand> No
17:56 fdobridge: <z​mike.> you're in for a good time.
17:57 fdobridge: <m​henning> Okay, I found my notes from REing control flow stuff on kepler. Another detail: the case where a conditional branch is taken executes first. This means that else branches of ifs (at least in codegen's impl) take one stack entry more than the "then" branch. So, including the SSY, a divergent branch consumes two stack slots on the "else" branch and one stack slot on the "then" branch
17:57 fdobridge: <m​henning> I also wrote down "Allocate 512 + 512 * (stack_depth / 32) bytes (round up)" for the size of the control flow stack on kepler. Note that this is different from an expression I found elsewhere, so there's something I don't understand going on - the size of the on-chip stack might vary by generation
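A minimal sketch of the bookkeeping described in the two messages above, assuming structured control flow (as NIR provides). The slot counts per construct and the size formula are taken from the notes quoted above and were only observed on Kepler; the `If`/`Loop` node types, the `needs_precont` flag, and both function names are hypothetical and are not NAK or codegen API. The sketch also assumes every loop pushes a PREBREAK entry, that an `if` without an `else` is charged like one with an `else`, and that "round up" applies to the division by 32.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class If:
    then_body: List["Node"] = field(default_factory=list)
    else_body: List["Node"] = field(default_factory=list)


@dataclass
class Loop:
    body: List["Node"] = field(default_factory=list)
    needs_precont: bool = True  # hypothetical flag: loop contains a `continue`


Node = Union[If, Loop]


def max_stack_depth(nodes: List[Node]) -> int:
    """Worst-case number of live control-stack entries inside `nodes`."""
    deepest = 0
    for node in nodes:
        if isinstance(node, If):
            # Per the notes: the SSY costs one slot on the "then" path,
            # and the "else" path holds one slot more (two in total).
            then_depth = 1 + max_stack_depth(node.then_body)
            else_depth = 2 + max_stack_depth(node.else_body)
            deepest = max(deepest, then_depth, else_depth)
        else:
            # Assumption: PREBREAK always pushes one entry, PRECONT one more.
            entries = 1 + (1 if node.needs_precont else 0)
            deepest = max(deepest, entries + max_stack_depth(node.body))
    return deepest


def crs_bytes(stack_depth: int) -> int:
    """512 + 512 * ceil(stack_depth / 32), one reading of the Kepler note."""
    return 512 + 512 * ((stack_depth + 31) // 32)


# A loop containing an if whose else branch holds another if.
shader = [Loop(body=[If(else_body=[If()])])]
depth = max_stack_depth(shader)
size = crs_bytes(depth)
```

Because the on-chip stack size may vary by generation, the constants here should be treated as placeholders until they are checked against hardware.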
18:01 fdobridge: <w​aelunix> ok so after a painful weekend, i finally have MESA 24.1.0 dev working with NVK
18:02 fdobridge: <w​aelunix> zink on NVK seems to be working **iff** i have gallium nouveau as well (edited)
18:02 fdobridge: <w​aelunix> otherwise i can't run any wayland compositor due to missing DRI3
18:03 fdobridge: <z​mike.> `LIBGL_KOPPER_DRI2=1`
18:04 fdobridge: <!​DodoNVK (she) 🇱🇹> So does Kopper not support DRI3?
18:06 fdobridge: <w​aelunix> Would that work with `MESA_LOADER_DRIVER_OVERRIDE=zink NOUVEAU_USE_ZINK=yes`? cause it doesn't work when I have gallium nouveau
18:06 fdobridge: <w​aelunix> and i'm not looking forward to ~~compiling mesa again~~ switching back to an older generation
18:08 fdobridge: <r​edsheep> You don't need both of those variables to force on zink, and I don't think yes is valid input, I've been doing 1
18:09 fdobridge: <z​mike.> probably
18:09 fdobridge: <w​aelunix> So i tried running sway with the above incantation. It's complaining about `EGL_EXT_image_dma_buf` not being implemented and just dies
18:09 fdobridge: <z​mike.> and both of those is redundant
18:09 fdobridge: <z​mike.> nvk doesn't
18:09 fdobridge: <z​mike.> yes, that's dri3
18:10 fdobridge: <w​aelunix> Is there any wayland compositor that's DRI2 compatible 🥺
18:10 fdobridge: <!​DodoNVK (she) 🇱🇹> @ gfxstrand 👀
18:10 fdobridge: <z​mike.> you mean software
18:10 fdobridge: <z​mike.> and yes, weston
18:11 fdobridge: <z​mike.> you don't need to ping anyone about this, it's being discussed constantly here every day
18:12 fdobridge: <w​aelunix> Weston doesn't work and is complaining about DRM atomic modesetting not being implemented
18:12 fdobridge: <w​aelunix> I'm guessing even with `LIBGL_KOPPER_DRI2=1` i'm somehow not getting any DRI ? (edited)
18:12 fdobridge: <z​mike.> use uhh
18:13 fdobridge: <z​mike.> it's like `WESTON_NO_ATOMIC=1` or something
18:13 fdobridge: <z​mike.> WESTON_DISABLE_ATOMIC
18:15 fdobridge: <w​aelunix> Good news? It's not complaining about DRI anymore. Now i just have to deal with regular seat weirdness. _off to the arch wiki i go_ (edited)
18:25 fdobridge: <w​aelunix> ok so that didn't work. It "starts" but the fb still shows the PTY
18:28 fdobridge: <w​aelunix> Sorry for not checking more thoroughly beforehand orz
18:28 fdobridge: <w​aelunix> https://gitlab.freedesktop.org/mesa/mesa/-/issues/10477
19:05 magic_rb: running minecraft using this wrapper command "gamescope -w 2560 -h 1600 -r 60 -o 60 -f -- sh -c 'DRI_PRIME=1 NOUVEAU_USE_ZINK=1 "$@"' sh" creates a work of art and not a menu screen, without gamescope it works, i assume its because im not running gamescope on the same gpu as im running the game
19:08 fdobridge: <z​mike.> probably the same issue
19:41 Lyude: TimurTabi: btw, looks like 042b5f83841fbf frees a bunch of boot/init buffers - but unfortunately we still need those for runtime PM (I haven't checked, but it's probably also needed for S3/S4) so it seems that commit breaks runpm on this machine. gonna send a patch out in a moment
19:47 Lyude: oh, lol, seems like someone already fixed that
19:47 Lyude: do you need someone to push the patch?
19:48 fdobridge: <t​om3026> its landed, 6.7.9 or 6.8 :p
19:50 fdobridge: <S​id> yup
19:50 fdobridge: <S​id> 6.7.9 and 6.8-rc7
19:53 Lyude: oh huh, this was off a drm-next kernel
19:53 Lyude: maybe I switched to the wrong branch by mistake
19:53 Lyude: (unless you meant a different issue Sid?)
19:53 Sid127: nope, same issue
19:54 Lyude: gotcha
19:54 Lyude: well good to know that's handled :)
19:54 Sid127: <3
19:59 fdobridge: <g​fxstrand> Okay, now the next question: How do the different sync things nest?
20:00 fdobridge: <g​fxstrand> Does each thing pop one sync or does break/cont pop all the syncs?
20:02 fdobridge: <g​fxstrand> If it works anything like Intel, there's a precedence order: break > cont > sync
20:02 TimurTabi: Lyude: I think drm-next needs to be rebased.
20:03 fdobridge: <k​arolherbst🐧🦀> there is some of it, but I think sync is special... but also not sure
20:03 fdobridge: <g​fxstrand> I don't suppose you have docs on any of this?
20:03 fdobridge: <k​arolherbst🐧🦀> I don't 🙂
21:15 fdobridge: <g​fxstrand> @airlied Any progress on the IRQ issue? Thanks to the map fix, I can now complete a run about 50% of the time. The other 50% the card goes out to lunch.
21:23 fdobridge: <a​irlied> nah travel took this week out, will return to it next week
21:24 fdobridge: <g​fxstrand> Okay
21:24 fdobridge: <g​fxstrand> I kinda figured
22:08 fdobridge: <!​DodoNVK (she) 🇱🇹> Right now the outside looks like GTA 3 with NAK 🌫️
22:25 fdobridge: <m​henning> It can pop multiple off the stack. So, you can run cont from within an if and it will pop off the stack until it gets to the PRECONT entry
22:25 fdobridge: <m​henning> Your precedence sounds right, but I haven't checked that too carefully
22:26 fdobridge: <k​arolherbst🐧🦀> @gfxstrand ohh one thing.. the branch targets have no impact on the hw level afaik (same as on turing+)
22:26 fdobridge: <k​arolherbst🐧🦀> you should be able to put whatever, but it helps in debuggers/disassemblers
22:28 fdobridge: <k​arolherbst🐧🦀> also yes.. they have a second input pre
22:28 fdobridge: <k​arolherbst🐧🦀> *pred
22:28 fdobridge: <g​fxstrand> Well, I got to `Pass: 1423582, Fail: 4, Skip: 1925414, Duration: 1:13:04, Remaining: 8:50` before my kernel died. Do I call it a success? 😩
22:29 fdobridge: <k​arolherbst🐧🦀> and they just act as another execution predicate `and`ed with the normal one
22:30 fdobridge: <k​arolherbst🐧🦀> at least.. that's the way on turing, and I doubt it's different on older gens
22:31 fdobridge: <m​henning> Some of the old envytools docs are relevant, although they're for G80 so ignore any encoding notes https://envytools.readthedocs.io/en/latest/hw/graph/tesla/cuda/control.html
22:42 fdobridge: <g​fxstrand> That seems to contradict what @mhenning said about being able to use them in the place of jumps
22:45 fdobridge: <k​arolherbst🐧🦀> mhhh
22:45 fdobridge: <k​arolherbst🐧🦀> I think it depends on the architecture...
22:46 fdobridge: <k​arolherbst🐧🦀> though... mhh
22:46 fdobridge: <k​arolherbst🐧🦀> yeah I think for break/cont they should matter, but not necessarily for ssy
22:47 fdobridge: <k​arolherbst🐧🦀> ssy is super weird on some architectures
22:47 fdobridge: <k​arolherbst🐧🦀> I think on kepler it's a flag on each instruction and the sync happens on the next one
22:48 fdobridge: <m​henning> My understanding is the hardware branches to the address from the PRECONT/PREBREAK/SSY (that is, the stack entry) not the address from the CONT/BREAK/SYNC
22:49 fdobridge: <k​arolherbst🐧🦀> yeah...
22:49 fdobridge: <k​arolherbst🐧🦀> the pre one should be the defining one
22:50 fdobridge: <k​arolherbst🐧🦀> but not sure they had encoding space for the target?
22:50 fdobridge: <k​arolherbst🐧🦀> at least on turing the address is totally ignored
22:51 fdobridge: <k​arolherbst🐧🦀> and I was under the impression that's also true on at least some previous gens as well...
22:52 fdobridge: <k​arolherbst🐧🦀> but I might be wrong there
22:52 fdobridge: <g​fxstrand> The pre ones all have encoding for the target
22:52 fdobridge: <g​fxstrand> The pop ones don't
22:53 fdobridge: <m​henning> There might be encoding space for an address on cont/break/sync on some archs but the hardware doesn't care and codegen doesn't set it
22:53 fdobridge: <g​fxstrand> Which makes sense because a sync instruction basically says "pause this thread and go find something else to run until everyone in the sync is at the sync target."
22:54 fdobridge: <m​henning> Yeah, the encoding space is there just for debuggers on archs where it exists
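A toy model of the push/pop behaviour discussed above, assuming the pre instructions (SSY/PREBREAK/PRECONT) push an entry carrying the branch target and the pop instructions (SYNC/BREAK/CONT) discard entries until they reach the matching kind, then branch to the address stored in that entry. Thread masks are left out entirely, and whether the matching entry itself is consumed is an assumption; `pop_until` and the label strings are hypothetical names used only for illustration.

```python
def pop_until(stack, kind):
    """Pop entries until one of `kind` is removed; return its stored target."""
    while stack:
        entry_kind, target = stack.pop()
        if entry_kind == kind:
            return target
    raise RuntimeError(f"no {kind} entry on the control stack")


stack = []
stack.append(("PREBREAK", "after_loop"))   # pushed when entering the loop
stack.append(("PRECONT", "loop_header"))
stack.append(("SSY", "after_if"))          # pushed by an if inside the loop

# A CONT issued inside the if skips the SSY entry and lands on PRECONT.
target = pop_until(stack, "PRECONT")
```

In this model the target comes from the stack entry rather than from the CONT itself, which is consistent with the pop instructions not needing an address in their encoding.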
23:01 fdobridge: <k​arolherbst🐧🦀> the problem is, the meaning of what sync means changed quite a few times
23:01 fdobridge: <k​arolherbst🐧🦀> it was all a bit different on tesla, fermi-kepler1? or 2, and then maxwell+ or something
23:01 fdobridge: <k​arolherbst🐧🦀> and weird in like... subtle ways
23:02 fdobridge: <k​arolherbst🐧🦀> like on one arch you marked an instruction as sync and before that one can be executed, all threads need to arrive
23:02 fdobridge: <k​arolherbst🐧🦀> but I'd have to dig around to figure out which gen behaved how
23:49 fdobridge: <g​fxstrand> Yes, that's definitely a thing for subgroup ops
23:49 fdobridge: <g​fxstrand> PTX is a mess because of it (edited)
23:51 fdobridge: <m​henning> From envytools docs, it looks like G80 behaves like this for sync. kepler2 (which is where I did the RE work) does not work this way for sync - it jumps to the stack address
23:52 fdobridge: <g​fxstrand> I'm not too interested in Fermi at the moment
23:59 fdobridge: <m​henning> Yeah. I don't know which behavior Kepler1 has, although that's maybe not the highest priority either