11:24tavvva: Good morning/afternoon/evening
11:24tavvva: imirkin: u there?
11:32pmoreau: Good afternoon; he’ll probably be around in an hour or two, if he doesn’t have anything else going on.
11:33tavvva: pmoreau: thx :]
13:51tavvva: imirkin: I need to leave now, but let me share one more fish from a different HW ... https://termbin.com/lh7d
13:52tavvva: imirkin: this time the video caused a temporary UI freeze but after a minute it recovered
13:52tavvva: imirkin: it's GT 430
13:55tavvva: imirkin: I can reproduce it with one of the videos .... the Xorg process eats 100% of CPU during the frozen state
13:57tavvva: imirkin: I think it happened when I changed the movie position in both cases
13:57tavvva: imirkin: and it seems some frames are losing VSYNC
13:58tavvva: imirkin: it is not permanent .... just some frames are displayed with visible tearing
13:59tavvva: maybe caused by unexpected delay
14:01tavvva: the video is h264 - 1920x810
14:15pmoreau: I was confused why the NIR and binary were loading 3 inputs when the CL kernel only had 2, but that’s because we lower global_invocation_id_offsets to loading from the inputs and add the offset as an extra argument.
14:19pmoreau: So back to trying to figure out why `.hi` works for every type and vector config, except for long16, ulong16, and double16.
15:03pmoreau: > local: 140 (bytes per thread)
15:03pmoreau: The kernel only uses 8 bytes of local memory, l[0x80] and l[0x88]. I should write a compaction so that the kernel would only use l[0x0] and l[0x4] and therefore only require the 8 bytes of local that it actually uses.
16:10karolherbst: pmoreau: that probably makes sense , just that you have to be careful around indirects
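The compaction idea above can be sketched roughly as follows. This is only an illustration, not code from Mesa: `struct local_slot` and `compact_local_slots()` are hypothetical names, each slot is assumed to be a 4-byte direct access (as the l[0x80]/l[0x88] to l[0x0]/l[0x4] example suggests), and alignment as well as the indirect accesses karolherbst warns about are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one directly addressed chunk of local memory. */
struct local_slot {
    uint32_t old_offset; /* offset the kernel currently uses, e.g. 0x80 */
    uint32_t size;       /* bytes accessed at that offset */
    uint32_t new_offset; /* filled in by compact_local_slots() */
};

/*
 * Pack the slots back to back starting at offset 0, in the order given,
 * and return the total number of bytes the packed layout needs.
 * Assumption: no slot is reached through an indirect address, so every
 * access can simply be rewritten from old_offset to new_offset.
 */
static uint32_t compact_local_slots(struct local_slot *slots, size_t count)
{
    assert(slots != NULL || count == 0);

    uint32_t next = 0;
    for (size_t i = 0; i < count; i++) {
        slots[i].new_offset = next;
        next += slots[i].size;
    }
    return next;
}
```

Fed the two slots from the discussion (0x80 and 0x88, 4 bytes each), this assigns offsets 0x0 and 0x4 and reports 8 bytes in total. A real pass would also have to update whatever size the program reports for its local memory.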
16:14pmoreau: Oh, that 0x80 comes from the `prog->tlsSize`; the spill pass starts from how much local memory is already reported as being used by the program, which makes sense.
16:15karolherbst: pmoreau: so we DCE local mem, and forget to adjust the tlsSize thing?
16:15pmoreau: It’s already in the NIR
16:15karolherbst: ohh, I see
16:16karolherbst: not sure if we have to do something after running passes
16:16pmoreau: We DCE variables after we have lowered to explicit types.
16:16karolherbst: but I think there is a gather_info thing
16:16karolherbst: which recalculates that stuff
16:16karolherbst: mhhh but yeah
16:16karolherbst: soo. we have to be careful with that stuff
16:16pmoreau: From what jenatali/jekstrand were saying: we lower too early and should run the DCE first.
16:17karolherbst: we can only recalculate as long as we still have variables
16:17karolherbst: essentially this
16:24pmoreau: Added a `NIR_PASS_V(nir, nir_remove_dead_variables, nir_var_all, NULL);` after the `NIR_PASS_V(nir, nir_lower_vars_to_explicit_types, nir_var_mem_constant, ...);` and that seems to do the trick.
16:24pmoreau: Though it didn’t solve the bug sadly, which is not surprising but one could have hoped it would.
16:35pmoreau: Oh, I found the bug! The spill code spills the results of a 2x 32-bit values merge and one 32-bit value, but later on I end up with the low bits of the 64-bit value being stored to l[0x0] (all good) but the high bits end up at l[0x8], overwriting the other 32-bit value which got spilled (and is the global memory address where the results should be stored 😅).
17:02pmoreau: Something off with how the compMask is set, or the computation in `offsetSlot()`, but `ffs(lval->compMask) - 1` is returning 2 instead of 1, resulting in an offset of 0x8 for the second component of the merge instead of 0x4.
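The arithmetic behind that observation can be reproduced with a small sketch. It is not the actual `offsetSlot()` code: `lowest_set_bit()` and `component_slot_offset()` are hypothetical helpers, and the 4-byte component size is an assumption inferred from the 0x4 and 0x8 offsets quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit; equals `ffs(mask) - 1` for a non-zero mask. */
static int lowest_set_bit(uint32_t mask)
{
    assert(mask != 0);

    int index = 0;
    while (!(mask & 1u)) {
        mask >>= 1;
        index++;
    }
    return index;
}

/*
 * Hypothetical model of the spill slot offset for one component of a merge:
 * the component index taken from the mask, scaled by the component size.
 */
static uint32_t component_slot_offset(uint32_t base, uint32_t comp_mask,
                                      uint32_t comp_size)
{
    return base + (uint32_t)lowest_set_bit(comp_mask) * comp_size;
}
```

With a 4-byte component, a mask whose lowest set bit is bit 1 (for example 0x2) gives offset 0x4, while a mask whose lowest set bit is bit 2 (for example 0x4) gives offset 0x8, which is the slot the other spilled 32-bit value occupies.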
17:18tavvva: I'm back ...
17:18tavvva: imirkin: u there?
18:09imirkin: huh, this is new: https://pastebin.com/raw/7Yetn5XD
18:10imirkin: trying to run vdpauinfo via "prime", but this appears to be in the object init logic
18:10imirkin: gears/etc works fine on the board
18:22tavvva: imirkin: hello :]
18:22imirkin: tavvva: i don't have anything for you yet, sorry
18:22tavvva: imirkin: how's your day?
18:22imirkin: i don't want to send you untested stuff, trying to get my env back in working order
18:23imirkin: for some reason the video stuff is totally busted
18:23tavvva: imirkin: no problem, no rush
18:23tavvva: imirkin: take your time, I don't wanna make pressure ... just interested and looking forward :]
18:25tavvva: but ... maybe someone else could give me an answer related to GLXVblank .... it seems to me it works exactly the opposite way ... when set to "on" in xorg.conf, I'm getting 2500 FPS in glxgears and movies have terrible tearing
18:26imirkin: i don't have a good explanation for that.
18:26imirkin: pastebin the output of "xrandr --listproviders"
18:26tavvva: when unset, the framerate is automatically changed to the frequency of the display .... the freq is changed according to the position of the centre of the window
18:27tavvva: in dualhead setup
18:27tavvva: the same applies to movies
18:27tavvva: but .... the vsync is not accurate for some reason ... I experience 3 kinds of behavior
18:29imirkin: but vsync is hard
18:29imirkin: and details depend on about 50 things
18:29tavvva: 1. correct vsync, no tearing, 2. vsync with fixed delay .... constant tearing in fixed Y position, 3. random vsync failures ... random position
18:29tavvva: understand :]
19:37imirkin: tavvva, in case you read the logs, looks like you're using the 'modesetting' ddx. i know nothing about its handling of vblank, if any.
19:41imirkin: skeggsb: huh, that's weird. 0fcc8 should be the xtensa XT_REGION_LIMIT value. and the earlier writes didn't fail.
19:42imirkin: but cc8 and the later fd84 (INTR_EN) give a MMIO fault
19:42imirkin: i guess this points to some lack of, uh, perfection, in the xtensa unit bring-up?
19:42imirkin:is going to cry
19:47imirkin: so yeah, the fact that INTR_EN (not to mention XT_REGION_LIMIT) fail, that means the fifo is definitely not going to work
19:48RSpliet:offers imirkin a shoulder
20:07juri_:still owes RSpliet a beer.