01:40AndrewR: so, speaking about Cinelerra crash ..apitrace produced surprisingly small trace file, and dumping it with 'apitrace dump' showed only this: https://pastebin.com/BqAzNDRw
02:49Hoolootwo: imirkin, I seem to be getting different results with different kernel configs
02:50Hoolootwo: even with the working one, it's still intermittent, which leads me to suspect some form of race condition in init
03:11imirkin: AndrewR: i thought i fixed that...
03:11imirkin: AndrewR: can you check where it's dying?
03:11AndrewR: imirkin, with gdb?
03:17AndrewR: imirkin, https://pastebin.com/aMJYGU6m
03:17imirkin: that's not even my fault! :)
03:26airlied: imirkin: got the gl cts to run from the open tree, but most of the tests are unfortunately in the hidden code
03:27imirkin: airlied: ok, i'm barely getting piglit to run the glcts stuff directly =/
03:28airlied: they renamed GL45-CTS to KHR-GL45 I think
03:28imirkin: nah, was fighting with piglit
03:28imirkin: just made it go
03:29imirkin: for some reason the thing crashes on me when trying to generate the caselist
03:29imirkin: so i had to hack around it
03:29imirkin: now there's a mustpass file, so that's convenient
03:36imirkin: hm. everything's passing... something seems off.
03:36airlied: getting weird random crashes running the glcts binary here
03:37imirkin: aha, my hacking was insufficient
03:39imirkin: there ya go!
03:39imirkin: getting some nice failures now
03:43airlied: imirkin: piglit hacking?
03:43imirkin: a tiny bit
03:44imirkin: airlied: https://hastebin.com/yipiwetoma.py
03:46airlied: there is a cts-gl45.py profile as well
03:46airlied: which is what people were using
03:49imirkin: looks basically the same
09:00karolherbst: RSpliet: ohh right, there was something
10:29RSpliet: also, I love this
10:29RSpliet: "A key reason to merge the L1 data cache with shared memory in GV100 is to allow L1 cache operations to attain the benefits of shared memory performance."
10:39RSpliet: not to mention the regfile didn't grow. Using those 4x4 matrix multiplications will require... 24 or 32 registers if only in FP16, up to 48
10:40RSpliet: greatly reducing the # of warps in flight
10:41RSpliet: does that mean you need to be able to reserve 8 or 16 consecutive registers in RA? Oh boi...
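(The register counts RSpliet quotes can be reproduced with simple packing arithmetic. The sketch below is a back-of-envelope, not from any spec: it assumes FP16 values pack two per 32-bit GPR, and uses a generic 64K-register-per-SM file size as an illustrative figure for the occupancy impact.)

```python
# Back-of-envelope register cost of a per-thread 4x4 MMA (D = A*B + C),
# reproducing the 24 / 32 / 48 figures from the discussion above.
# Assumptions: 32-bit GPRs, two FP16 values packed per register,
# 64K-register file per SM (illustrative, not from this log).

REG_BITS = 32

def regs_for_matrix(rows, cols, elem_bits):
    """Registers needed to hold a rows x cols matrix of elem_bits values."""
    total_bits = rows * cols * elem_bits
    return (total_bits + REG_BITS - 1) // REG_BITS  # round up

# All-FP16: A, B, C each a 4x4 f16 matrix -> 8 regs apiece
all_f16 = 3 * regs_for_matrix(4, 4, 16)                               # 24
# FP16 inputs with an FP32 accumulator: A + B in f16, C in f32
f32_acc = 2 * regs_for_matrix(4, 4, 16) + regs_for_matrix(4, 4, 32)   # 32
# Separate FP32 accumulator C and FP32 result D
f32_acc_and_d = f32_acc + regs_for_matrix(4, 4, 32)                   # 48

print(all_f16, f32_acc, f32_acc_and_d)

# Fewer registers per thread means more warps can be resident at once,
# which is why the fixed regfile size hurts:
REGFILE = 65536          # 64K 32-bit registers per SM (assumed)
WARP_SIZE = 32
for per_thread in (all_f16, f32_acc, f32_acc_and_d):
    warps = REGFILE // (per_thread * WARP_SIZE)
    print(per_thread, "regs/thread ->", warps, "warps fit in the regfile")
```

At 48 registers per thread only ~42 warps' worth of state fits, versus ~85 at 24 registers, which is the "greatly reducing the # of warps in flight" point.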
10:48karolherbst_: RSpliet: maybe?
10:49karolherbst_: but I don't think they do matrix multiplications? I thought you just have a 4x4 matrix to do vectorized mad operations so to speak
10:49karolherbst_: also the input is f16
10:50karolherbst_: and the add input and output can be f32
10:50karolherbst_: so this is quite fixed in size already
10:53RSpliet: karolherbst: no it's a full matrix multiply
10:53RSpliet: but "The threads within a warp provide a larger 16x16x16 matrix operation to be processed by the Tensor Cores. CUDA exposes these operations as Warp-Level Matrix Operations in the CUDA C++ API."
10:53RSpliet: sounds like they start using "scalar opportunities" there
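(A plain-Python reference for the warp-level operation quoted above: a 16x16x16 multiply-accumulate, D = A*B + C, carried out cooperatively by a 32-lane warp. The 2x4-output-tile-per-lane split here is purely illustrative; the real hardware mapping of matrix fragments to lanes is not documented.)

```python
# Reference model of a warp-level 16x16x16 matrix multiply-accumulate:
# 32 "lanes" each compute a 2x4 tile of the 16x16 output (8x4 tiles = 32),
# each element taking the full k=16 reduction plus the C accumulator.
# The lane-to-tile assignment is a made-up illustration, not the HW layout.

N, LANES = 16, 32

def wmma_16x16x16(A, B, C):
    D = [[0.0] * N for _ in range(N)]
    for lane in range(LANES):
        r0 = (lane // 4) * 2          # 8 row-groups of 2 rows
        c0 = (lane % 4) * 4           # 4 col-groups of 4 cols
        for r in range(r0, r0 + 2):
            for c in range(c0, c0 + 4):
                acc = C[r][c]
                for k in range(N):    # full 16-deep dot product
                    acc += A[r][k] * B[k][c]
                D[r][c] = acc
    return D
```

Every output element still needs its own 16-element dot product; the "scalar opportunity" is that the warp's lanes share the A/B fragments instead of each thread holding all the operands.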
10:55AndrewR: https://bugs.freedesktop.org/show_bug.cgi?id=101000 (so, my new bug will be not lost if my current session crashes...)
10:56RSpliet: nice bug ID
11:25karolherbst: RSpliet: ahh, I see
21:39Hoolootwo: ok so I'm going to double-check that this is reproducible
22:03Hoolootwo: welp it doesn't happen on my other laptop, I give up
22:07Hoolootwo: I'll try swapping hard drives, see if it's a config thing, or if it's some weird thing that happens only with this laptop
22:39Hoolootwo: something is whining about thinkpad_acpi and backlights, which seems like a reasonable cause for suspicion
22:40gnarface: not missing acpid on one, are you?
22:51Hoolootwo: [ 42.073134] pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
22:51Hoolootwo: I don't have optimus/gpu switching at all on this hardware...