02:34karolherbst: mhh okay.. so suspending using the sw path takes like 220 seconds :/
02:35karolherbst: but it does seem to work reliably
02:39imirkin: woohoo, great success. by the time it's done suspending, you already want to resume ;)
02:39karolherbst: huh..
02:39karolherbst: what's up with nouveau_bo_move_prep?
02:39karolherbst: that function looks...... odd
02:42karolherbst: anyway.. doesn't matter
02:43karolherbst: imirkin: I kind of expect fencing to be broken... remember the issue I hit within mesa, but everything was fine regardless?
02:44karolherbst: I am actually wondering what would happen if I'd do the same in the kernel.. just assume everything is fine and move on
02:47karolherbst: ohhhhhhhhhhhhhhhhhhhhhh
02:49karolherbst: oh no..
02:49karolherbst: I hope my current theory is wrong
02:50imirkin: if your past theories are any prediction...
02:57karolherbst: fuck
02:57karolherbst: we don't stop the channel before evicting memory
02:57karolherbst: well
02:57karolherbst: wait for idle
02:57karolherbst: _but_ it seems like something pushes work to the gpu while we evict things
02:59karolherbst: imirkin: soo... would you expect that after calling into nouveau_fbcon_set_suspend and nouveau_display_suspend channel 1 (used by the kernel) is active and doing random stuff?
03:00karolherbst: mhhh
03:02karolherbst: maybe I misunderstand what ttm_resource_manager_evict_all is supposed to do
03:02karolherbst: but my understanding is kind of that we remove all memory out of VRAM with that
03:03karolherbst: well move, and it gets copied over to sys mem
08:14graphitemaster: imirkin, does NV hardware microarchitecturally implement SIMT as 8x SIMD with each SIMD being 4 lanes? Is this where that 4x8 configuration comes from, and why mapping thread indices to some texture space (as an example) does not end up within a single warp? It's actually kind of weird that a warp is not a multiple of gl_LocalInvocationIndex
08:15imirkin: it changes on volta+
08:15imirkin: on fermi/kepler/etc afaik it's just waves of 32 SIMD thingies
08:16imirkin: volta+ is magic
08:16imirkin: but i don't really know what the magic is
08:16graphitemaster: So are warps just partitioned into multiples of the thread index then?
08:21imirkin: well, multiples of 32
08:21imirkin: last one probably just has a bunch of lanes masked off
08:21imirkin: moral of the story: don't run compute shaders with local size 1
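[note] A minimal sketch of the arithmetic imirkin describes, assuming a fixed warp size of 32 and that invocations fill warps in order, with only the last warp partially populated. The function name warp_occupancy is hypothetical and only models the lane counts; it says nothing about how the hardware actually schedules lanes.

```python
WARP_SIZE = 32  # assumption: "waves of 32" as stated for fermi/kepler


def warp_occupancy(local_size, warp_size=WARP_SIZE):
    """Return (warp_count, active_lanes_in_last_warp) for one workgroup.

    local_size is the total number of invocations in the workgroup,
    i.e. local_size_x * local_size_y * local_size_z.
    """
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    # Round up: a partially filled warp still occupies a whole warp.
    warp_count = -(-local_size // warp_size)
    # Lanes that are not masked off in the final warp.
    active_last = local_size - (warp_count - 1) * warp_size
    return warp_count, active_last
```

Under these assumptions, a local size of 1 still occupies one full warp with a single active lane, which is the reason for the "don't run compute shaders with local size 1" advice.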
08:24graphitemaster: So gl_SubgroupInvocationID == gl_LocalInvocationIndex % gl_SubgroupInvocationSize
08:24graphitemaster: Or is that wrong
08:25graphitemaster: Looking at https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_shader_subgroup.txt
08:26graphitemaster: If the extension GL_KHR_shader_subgroup_basic is enabled, the variable
08:26graphitemaster: <gl_SubgroupInvocationID> is a built-in containing the index of an
08:26graphitemaster: invocation within a subgroup. The value of this variable is in the range
08:26graphitemaster: 0 to <gl_SubgroupSize>-1.
08:27graphitemaster: Er, typo'd there, gl_SubgroupInvocationID == gl_LocalInvocationIndex % gl_SubgroupSize ?
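[note] The relation graphitemaster is asking about can be written out as a small model. This is only a sketch of the proposed linear mapping, not a statement of what the extension guarantees: the quoted text only bounds gl_SubgroupInvocationID to the range 0 to gl_SubgroupSize-1, and the question goes unanswered in this log. The helper names below are hypothetical; local_invocation_index follows the usual row-major flattening of gl_LocalInvocationID.

```python
def local_invocation_index(x, y, z, size_x, size_y):
    """Row-major flattening of a 3D local invocation ID into one index."""
    return z * size_x * size_y + y * size_x + x


def linear_subgroup_mapping(index, subgroup_size=32):
    """Hypothetical linear mapping: (subgroup number, lane within subgroup).

    Assumes subgroups are filled in index order, which is the assumption
    behind the question above, not a documented guarantee.
    """
    return index // subgroup_size, index % subgroup_size
```

With this model and an 8x8 workgroup, invocation (x=0, y=4) has index 32 and lands in lane 0 of the second subgroup, so an 8x8 tile of texture space spans two subgroups rather than one.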
15:23karolherbst: imirkin: btw, I was hitting https://gitlab.freedesktop.org/drm/nouveau/-/issues/150 as well
18:06imirkin: karolherbst: heh. i guess people aren't totally crazy :)
18:06karolherbst: it's weird though
18:07karolherbst: I wouldn't be surprised if it's related to the suspend/resume issue
18:09karolherbst: imirkin: maybe we cut off the upper bits somewhere for stupid reasons?
18:09karolherbst: mhh
18:09karolherbst: something for next week