01:40 AndrewR: so, speaking of the Cinelerra crash... apitrace produced a surprisingly small trace file, and dumping it with 'apitrace dump' showed only this: https://pastebin.com/BqAzNDRw
02:49 Hoolootwo: imirkin, I seem to be getting different results with different kernel configs
02:50 Hoolootwo: even with the working one, it's still intermittent, which leads me to suspect some form of race condition in init
03:11 imirkin: AndrewR: i thought i fixed that...
03:11 imirkin: AndrewR: can you check where it's dying?
03:11 AndrewR: imirkin, with gdb?
03:12 imirkin: ya
03:17 AndrewR: imirkin, https://pastebin.com/aMJYGU6m
03:17 imirkin: that's not even my fault! :)
03:26 airlied: imirkin: got the gl cts to run from the open tree, but most of the tests are unfortunately in the hidden code
03:27 imirkin: airlied: ok, i'm barely getting piglit to run the glcts stuff directly =/
03:28 airlied: they renamed GL45-CTS to KHR-GL45 I think
03:28 imirkin: nah, was fighting with piglit
03:28 imirkin: just made it go
03:29 imirkin: for some reason the thing crashes on me when trying to generate the caselist
03:29 imirkin: so i had to hack around it
03:29 imirkin: now there's a mustpass file, so that's convenient
03:36 imirkin: hm. everything's passing... something seems off.
03:36 airlied: getting weird random crashes running the glcts binary here
03:37 imirkin: aha, my hacking was insufficient
03:39 imirkin: there ya go!
03:39 imirkin: getting some nice failures now
03:43 airlied: imirkin: piglit hacking?
03:43 imirkin: a tiny bit
03:44 imirkin: airlied: https://hastebin.com/yipiwetoma.py
03:46 airlied: there is a cts-gl45.py profile as well
03:46 airlied: which is what people were using
03:49 imirkin: o
03:49 imirkin: looks basically the same
09:00 karolherbst: RSpliet: ohh right, there was something
10:29 RSpliet: also, I love this
10:29 RSpliet: "A key reason to merge the L1 data cache with shared memory in GV100 is to allow L1 cache operations to attain the benefits of shared memory performance."
10:39 RSpliet: not to mention the regfile didn't grow. Using those 4x4 matrix multiplications will require... 24 or 32 registers if only in FP16, up to 48
10:40 RSpliet: greatly reducing the # of warps in flight
10:41 RSpliet: does that mean you need to be able to reserve 8 or 16 consecutive registers in RA? Oh boi...
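[One possible accounting for RSpliet's 24/32/48 figures, assuming each thread holds full 4x4 tiles of A, B, C (and possibly a separate D) in its registers, with two f16 values packed per 32-bit register, so an f16 tile costs 8 registers and an f32 tile 16; this breakdown is an assumption, not something stated in the chat:]

    % hypothetical per-thread register budget for a 4x4 D = A*B + C
    \begin{align*}
      \text{f16 } A, B, C \text{ with } D \text{ aliasing } C &: 3 \times 8 = 24 \text{ regs} \\
      \text{f16 } A, B, C, D \text{ all separate}             &: 4 \times 8 = 32 \text{ regs} \\
      \text{f16 } A, B \text{ with f32 } C, D                 &: 2 \times 8 + 2 \times 16 = 48 \text{ regs}
    \end{align*}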
10:48 karolherbst_: RSpliet: maybe?
10:49 karolherbst_: but I don't think they do matrix multiplications? I thought you just have a 4x4 matrix to do vectorized mad operations so to speak
10:49 karolherbst_: also the input is f16
10:50 karolherbst_: and the add input and output can be f32
10:50 karolherbst_: so this is quite fixed in size already
10:53 RSpliet: karolherbst: no, it's a full matrix multiply
10:53 RSpliet: but "The threads within a warp provide a larger 16x16x16 matrix operation to be processed by the Tensor Cores. CUDA exposes these operations as Warp-Level Matrix Operations in the CUDA C++ API."
10:53 RSpliet: sounds like they start using "scalar opportunities" there
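[For reference, a minimal sketch of how the warp-level matrix operations quoted above ended up being exposed in CUDA's C++ API (the nvcuda::wmma interface from later CUDA releases, compiled for sm_70+); the 16x16x16 shape and the f16-input/f32-accumulator mix are from the quote, while the kernel name and matrix layouts here are illustrative:]

    #include <cuda_fp16.h>
    #include <mma.h>

    using namespace nvcuda;

    // One warp cooperatively computes a 16x16x16 tile: D = A*B + C.
    // Each fragment is spread across the 32 threads' registers, which
    // is where the per-thread register pressure discussed above comes from.
    __global__ void wmma_tile(const half *a, const half *b,
                              const float *c, float *d)
    {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

        wmma::load_matrix_sync(fa, a, 16);   // 16 = leading dimension
        wmma::load_matrix_sync(fb, b, 16);
        wmma::load_matrix_sync(fc, c, 16, wmma::mem_row_major);

        wmma::mma_sync(fc, fa, fb, fc);      // the warp-level matrix op

        wmma::store_matrix_sync(d, fc, 16, wmma::mem_row_major);
    }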
10:55 AndrewR: https://bugs.freedesktop.org/show_bug.cgi?id=101000 (so my new bug won't be lost if my current session crashes...)
10:56 RSpliet: nice bug ID
11:25 karolherbst: RSpliet: ahh, I see
21:39 Hoolootwo: ok so I'm going to double-check that this is reproducible
22:03 Hoolootwo: welp it doesn't happen on my other laptop, I give up
22:07 Hoolootwo: I'll try swapping hard drives to see if it's a config thing, or if it's some weird thing that happens only with this laptop
22:39 Hoolootwo: something is whining about thinkpad_acpi and backlights, which seems like a reasonable cause for suspicion
22:40 gnarface: not missing acpid on one, are you?
22:46 Hoolootwo: no
22:51 Hoolootwo: [ 42.073134] pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
22:51 Hoolootwo: I don't have optimus/gpu switching at all on this hardware...