00:35 karolherbst[d]: ohh.. I found that `bitfield_select` exists and I _think_ this can be expressed via plop3, no?
00:35 karolherbst[d]: there is a `has_bitfield_select` config for it
00:37 karolherbst[d]: should be... `0xca`?
00:45 karolherbst[d]: but the ops also look broken?
00:45 karolherbst[d]: ohh no.. it's fine
00:53 karolherbst[d]: `LOP3.LUT R11, R5, R2, R6, 0xe2, !PT ;` heh...
00:54 karolherbst[d]: why 0xe2...
00:55 karolherbst[d]: sometimes nvidia splits it into 2 lops...
00:56 karolherbst[d]: ohhh
00:56 karolherbst[d]: they reordered the args 🙃
00:57 karolherbst[d]: yeah.. with reordered args I get 0xe2
00:57 karolherbst[d]: okay
00:57 karolherbst[d]: that confirms it
00:58 karolherbst[d]: bitfield_select => `plop3 a, b, c, 0xca` or `plop3 b, a, c, 0xe2`
00:58 karolherbst[d]: though I guess nak already folds in those log ops?
01:22 mhenning[d]: yeah, I was wondering about adding a native version of bitfield_select
01:23 mhenning[d]: logic ops are supposed to fold it but I don't know if it actually succeeds or not
01:30 karolherbst[d]: funny...
01:31 karolherbst[d]: nvidia also changes the plot value to move a constant into the b source slot
01:31 karolherbst[d]: *plop
01:31 karolherbst[d]: anyway
01:31 karolherbst[d]: I got some subgroup lowering to hit the `bitfield_select` and `0xca` is indeed the correct plop3 LUT value to implement `bitfield_select`
01:34 mhenning[d]: karolherbst[d]: I forget if we already do that or not
01:34 karolherbst[d]: yeah.. not sure, the code is a bit hard to understand at 3am 🙃
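The two LUT values discussed above (`0xca` for `a, b, c` and `0xe2` for `b, a, c`) can be re-derived by hand. The sketch below is only an illustration, not NAK code: it assumes `bitfield_select(mask, insert, base)` means `(mask & insert) | (~mask & base)`, and it uses the usual LOP3 convention of evaluating the expression on the source patterns `0xF0`, `0xCC`, `0xAA`. The helper names `bitfield_select` and `lut` are made up for this example.

```python
# Truth-table patterns conventionally used to derive a LOP3/PLOP3 LUT:
# source 0 -> 0xF0, source 1 -> 0xCC, source 2 -> 0xAA.
SRC0, SRC1, SRC2 = 0xF0, 0xCC, 0xAA


def bitfield_select(mask, insert, base):
    # Assumed semantics: take bits from `insert` where `mask` is set,
    # and from `base` everywhere else.
    return (mask & insert) | (~mask & base)


def lut(f):
    # Evaluate a 3-input bitwise function on the canonical patterns and
    # keep the low 8 bits, which form the LUT immediate.
    return f(SRC0, SRC1, SRC2) & 0xFF


# Sources passed in order (a, b, c): a is the mask.
lut_abc = lut(lambda s0, s1, s2: bitfield_select(s0, s1, s2))

# Sources passed in order (b, a, c): the mask now sits in slot 1.
lut_bac = lut(lambda s0, s1, s2: bitfield_select(s1, s0, s2))

print(hex(lut_abc), hex(lut_bac))
```

Under those assumptions the two values come out as `0xca` and `0xe2`, matching what was observed in the nvidia output with reordered arguments.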
07:26 monkey: "Hi !"
12:14 karolherbst[d]: ohhh what nvidia uses for `f2f32`: `HADD2.F32 R12, -RZ, R8.H0_H0`
12:15 karolherbst[d]: I'm sure that's faster than f2f 😄
12:15 karolherbst[d]: like
12:15 karolherbst[d]: 100% sure
13:39 gfxstrand[d]: Yeah, probably
13:41 karolherbst[d]: f2f is decoupled where hadd2 is coupled, so you at least can always get rid of the latency required to set up the barrier
13:42 karolherbst[d]: same reason you use hadd2 for fabs and such
14:24 snowycoder[d]: karolherbst[d]: With coupled you mean fixed-latency?
14:25 karolherbst[d]: that's just the nvidia term for instructions having fixed or variable execution time
14:32 glehmann: kind of like amd, where v_fma_mix_f32/v_fma_mixlo_f16 are faster than v_cvt_f32_f16/v_cvt_f16_f32 in some cases
15:07 karolherbst[d]: on nvidia it's always the case but yeah 😄 but nvidia's cvt is really huge
15:07 karolherbst[d]: it does proper rounding (even integer rounding on floats) and is just massive
20:14 kicchou: hello, i'm trying to 'git fetch' the drm-misc tree but i'm getting connection refused errors
20:15 kicchou: fatal: unable to connect to anongit.freedesktop.org:
20:15 kicchou: anongit.freedesktop.org[0: 2610:10:20:722:a800:0:83fc:d2a1]: errno=Connection timed out
20:15 kicchou: anongit.freedesktop.org[1: 131.252.210.161]: errno=Connection refused
20:16 kicchou: i'm trying to resubmit a patch i sent to the mailing list a few weeks ago that got eaten by a spam filter (i wasn't subscribed)
20:18 jannau: kicchou: use gitlab https://gitlab.freedesktop.org/drm/misc/kernel/
20:19 jannau: change the URL to https://gitlab.freedesktop.org/drm/misc/kernel.git
20:21 jrayhawk: whoops, xinetd got oom-killed
20:21 jrayhawk: anongit should be better again
20:23 kicchou: thanks!