12:54 RSpliet: pmoreau: NVIDIA does floating point atomic add according to the CUDA docs. Any idea whether this is a real operation or a cmpxchg loop?
13:05 pmoreau: RSpliet: IIRC, it appears as a single insn in the disassembly
13:06 RSpliet: even for floating point?
13:06 pmoreau: let me check
13:06 RSpliet: OpenCL C doesn't seem to support FP atomic add... would make a nice extension
13:06 RSpliet: (because I can currently only express it with an atomic exchange loop, which is filthy!)
13:07 pmoreau: Yeah…
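[Editor's sketch] The "cmpxchg loop" RSpliet asks about in the first message can be illustrated roughly as follows. This is plain C11 rather than OpenCL C or CUDA, and it says nothing about what NVIDIA hardware actually does. The helper names `float_to_bits`, `bits_to_float` and `atomic_add_float` are hypothetical and exist only for this example. The float is stored as its 32-bit pattern so that the compare step is a bitwise comparison.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* Reinterpret the bits of a float as an integer (memcpy avoids aliasing issues). */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse of float_to_bits. */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical helper: atomically add `value` to the float whose bit pattern
 * is stored in *addr, and return the value that was there before the add. */
static float atomic_add_float(_Atomic uint32_t *addr, float value)
{
    uint32_t expected = atomic_load(addr);
    uint32_t desired;

    do {
        /* Compute the new value from the snapshot we currently hold. */
        desired = float_to_bits(bits_to_float(expected) + value);
        /* If another thread changed *addr in the meantime, the exchange fails,
         * `expected` is refreshed with the current contents, and we retry. */
    } while (!atomic_compare_exchange_weak(addr, &expected, desired));

    return bits_to_float(expected);
}
```

Comparing bit patterns rather than float values means the loop still terminates when the stored value is a NaN, which never compares equal to itself as a float.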
16:31 jamm: hakzsam, pmoreau: tested with hakzsam's changes on https://hastebin.com/jemuvuxoxo.bash (also added stalls of 0x1 to the last two movs): same issue (blocky shadows on GNOME windows). I went ahead and changed the second sched control codes from "(st 0xf wr 0x1) (st 0xf wr 0x0 wt 0x3) (st 0xf wr 0x1)" to "(st 0x0) (st 0x0) (st 0x0)" and the issue disappeared
16:31 jamm: something's definitely going on in that sched block...
16:31 jamm: i.e., line 15
16:31 pmoreau: Hum…
16:34 hakzsam: jamm: I will have a look later
16:40 hakzsam: jamm: tex nodep $r1 $r2 0x0 0x1 t2d 0x8
16:40 hakzsam: ipa $r3 a[0x84] $r0 0x0 0x1
16:40 hakzsam: WaR on $r3
16:40 hakzsam: you need a rd bar on tex
16:40 hakzsam: to prevent ipa from overwriting $r3
16:51 pmoreau: Ah, tex is reading $r2:$r3?
16:53 hakzsam: yeah
16:54 pmoreau: Ok, so a rd is indeed needed
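[Editor's sketch] The write-after-read hazard hakzsam describes (tex still reading $r2:$r3 while the following ipa writes $r3) can be expressed as a simple overlap check. This is a toy model and not nouveau's scheduler: `struct insn`, `reg_range` and `needs_read_barrier` are hypothetical names, registers are modelled as a 64-bit mask (bit n stands for $rn), and "variable latency" stands in for instructions such as tex that may read their sources late.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified instruction record. */
struct insn {
    uint64_t reads;          /* registers read (bit n = $rn) */
    uint64_t writes;         /* registers written */
    bool variable_latency;   /* sources may still be read after issue */
};

/* Mask covering `count` consecutive registers starting at $r<first>. */
static uint64_t reg_range(unsigned first, unsigned count)
{
    assert(count >= 1 && count < 64 && first + count <= 64);
    return ((UINT64_C(1) << count) - 1) << first;
}

/* True if `later` writes a register that `earlier` may still be reading,
 * i.e. a read barrier on `earlier` is needed before `later` can issue. */
static bool needs_read_barrier(const struct insn *earlier,
                               const struct insn *later)
{
    return earlier->variable_latency &&
           (earlier->reads & later->writes) != 0;
}
```

With the two instructions from the log, tex reads $r2:$r3 and writes $r1, while ipa reads $r0 and writes $r3, so the check reports that a read barrier is needed.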
18:28 lastm0nkey: ?
18:28 lastm0nkey: I don't get it
18:28 lastm0nkey: !paste
18:29 lastm0nkey: CCLD Xdmx
18:29 lastm0nkey: ../../render/.libs/librender.a(glyph.o): In function `HashGlyph':
18:29 lastm0nkey: glyph.c:(.text+0x285): undefined reference to `x_sha1_init'
18:29 lastm0nkey: this is the error I get when trying to makepkg the package
18:30 lastm0nkey: same happens with: x_sha1_update, x_sha1_final
18:30 lastm0nkey: shit, sorry, wrong channel
20:14 austriancoder: hakzsam: I have a short question about: https://www.x.org/wiki/Events/XDC2014/XDC2014PitoisetNouveau/talk-perf.pdf page 21 - do you have a repo where I can find the ringbuffer handling?
20:23 hakzsam: austriancoder: https://cgit.freedesktop.org/~hakzsam/mesa/commit/?h=nouveau_perfmon_v2&id=aa07387341f626ba88a946c5cd99569b84c1db77
20:23 hakzsam: look for NV50_HW_PM_RING_BUFFER_MAX_QUERIE
20:24 hakzsam: should be exactly what I say on slide 21 :)
20:26 austriancoder: hakzsam: thx
20:28 austriancoder: hakzsam: why could it result in stalling if the ring buffer is not used?
20:57 austriancoder: hakzsam: and you are using one bo for all queries - currently I am using one bo per hw query