00:07 pmoreau: You’ll want CUDA 6.5 for Tesla
00:10 imirkin: ah no, more typing ahead
00:10 imirkin: apparently two wrongs make a right
00:10 imirkin: if you use the same wrong logic to load and store, then it all works out
02:05 imirkin: urgh. looks like i have to do something clever in the shader
02:05 imirkin: it's not just some simple math on x,y,z + constants
02:05 imirkin: i need to do math to determine the block "id"
02:05 imirkin: i think
03:36 imirkin: right... so obviously what i was doing can't work -- i was scaling the y coord by the z "tile height", and then adding the z coord. works great for z < tile height. not so much for the next tile.
03:40 imirkin: i think the next tile is supposed to go in the x coordinate. fun.
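A minimal Python sketch of the problem described at 03:36. The naive mapping y' = y * tile_depth + z keeps (y, z) pairs distinct only while z < tile_depth. `tiled_xy` is a hypothetical reading of the 03:40 remark that later z tiles move along x. It is not the layout nouveau actually ends up using, and `tile_depth`, `width` and all helper names are illustrative assumptions.

```python
from itertools import product


def naive_xy(x, y, z, tile_depth):
    # First attempt: scale y by the z "tile height", then add z.
    # Injective only while z < tile_depth.
    return x, y * tile_depth + z


def tiled_xy(x, y, z, width, tile_depth):
    # Hypothetical fix: slices inside one z tile are interleaved into y,
    # and each further z tile is shifted along x by one surface width.
    tile, slice_in_tile = divmod(z, tile_depth)
    return x + tile * width, y * tile_depth + slice_in_tile


def collisions(mapping, width, height, depth):
    """Return pairs of 3D coords that the mapping sends to the same 2D coord."""
    seen, clashes = {}, []
    for coord in product(range(width), range(height), range(depth)):
        key = mapping(*coord)
        if key in seen:
            clashes.append((seen[key], coord))
        else:
            seen[key] = coord
    return clashes


# 4x4x4 surface with z tiles two slices deep (illustrative sizes only).
print(collisions(lambda x, y, z: naive_xy(x, y, z, 2), 4, 4, 4)[:1])
# [((0, 0, 2), (0, 1, 0))]
print(collisions(lambda x, y, z: tiled_xy(x, y, z, 4, 2), 4, 4, 4))
# []
```

The first slice of the second z tile lands on the same row as y = 1 in the first tile, which is the "not so much for the next tile" failure.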
04:21 imirkin: mwk: i'm having trouble reproducing your results in the envytools docs... i took your pseudocode and it's producing different results: https://paste.debian.net/plain/1187256
04:21 imirkin: e.g. for the (0, 0, 2) case
04:22 imirkin: it's producing the same address for (0, 16, 0), so the script is obviously wrong
04:22 imirkin: but it's not immediately apparent to me where
04:31 mwk: block_address = floor(x_coord * element_size / bytes_per_block_x) * block_bytes + floor(y_coord / bytes_per_block_y) * block_bytes * blocks_per_surface_x + floor(z_coord / bytes_per_block_z) * block_bytes * blocks_per_surface_x * blocks_per_surface_z
04:31 mwk: the `z` at the very end should be `y`
04:32 imirkin: hah, i just figured it out independently too
04:32 imirkin: thanks
04:32 imirkin: there's another typo in there too
04:32 imirkin: bytes_per_gob = 1;
04:32 imirkin: should be bytes_per_gob_z
04:32 imirkin: (that one was more obvious)
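A sketch of mwk's block address formula from 04:31, with the fix from 04:31 applied (the last factor is `blocks_per_surface_y`, not `blocks_per_surface_z`). Only the block address is modelled, not the offset inside a block. The parameter values are made up for illustration and do not correspond to a real tile mode. Note that `bytes_per_block_y` and `bytes_per_block_z` are divided directly into the y and z coordinates in the formula, so here they act as rows and slices per block.

```python
# Illustrative parameters only (not taken from any real tile mode):
# 64-byte-wide, 16-row, 2-slice blocks of 4-byte elements,
# on a surface 4 blocks wide, 4 blocks tall and 1 block deep.
PARAMS = dict(element_size=4, bytes_per_block_x=64, bytes_per_block_y=16,
              bytes_per_block_z=2, block_bytes=64 * 16 * 2,
              blocks_per_surface_x=4, blocks_per_surface_y=4)
BLOCKS_PER_SURFACE_Z = 1


def block_index(x, y, z, p):
    # Which block (bx, by, bz) the element at (x, y, z) falls into.
    bx = (x * p["element_size"]) // p["bytes_per_block_x"]
    by = y // p["bytes_per_block_y"]
    bz = z // p["bytes_per_block_z"]
    return bx, by, bz


def block_address(x, y, z, p=PARAMS):
    # Corrected formula: the z term scales by blocks_x * blocks_y.
    bx, by, bz = block_index(x, y, z, p)
    b, nx, ny = p["block_bytes"], p["blocks_per_surface_x"], p["blocks_per_surface_y"]
    return bx * b + by * b * nx + bz * b * nx * ny


def buggy_block_address(x, y, z, p=PARAMS, nz=BLOCKS_PER_SURFACE_Z):
    # As originally written: the z term (wrongly) scales by blocks_per_surface_z.
    bx, by, bz = block_index(x, y, z, p)
    b, nx = p["block_bytes"], p["blocks_per_surface_x"]
    return bx * b + by * b * nx + bz * b * nx * nz


print(block_address(0, 16, 0), block_address(0, 0, 2))               # 8192 32768
print(buggy_block_address(0, 16, 0), buggy_block_address(0, 0, 2))  # 8192 8192
```

With these made-up parameters the uncorrected version maps (0, 0, 2) and (0, 16, 0) to the same block, the same kind of symptom reported at 04:22.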
04:32 mwk: can you make a PR?
04:33 imirkin: yea, i will
04:33 imirkin: i'm going to use this to try to figure out what coords to feed it given no z tiling
04:33 imirkin: so that it all works out
04:34 mwk: or, well, just commit it, it's obvious enough
04:34 imirkin: yeah... have to get my envytools repo in a commitable state
04:34 imirkin: i'll do it later
04:34 imirkin: getting late here ... and probably early where you are :)
04:34 mwk: 5:34
04:34 mwk: yeah, not a good hour
04:35 imirkin: thanks for the continued help!
15:46 AndrewR: Hi all :)
15:48 AndrewR: I can't see the list of users from Android, but I WANT to congratulate pmoreau, karolherbst and imirkin on the recent merges (opencl/nouveau) and MRs
15:57 pmoreau: Thank you AndrewR for all the testing you have been doing!
15:59 pmoreau: I was going to ping you, since you might be interested in Ilia’s latest MR (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9299), at least in testing it; he’s been working on getting images working on Tesla, plus quite a few fixes here and there.
15:59 karolherbst:needs to finish images
16:04 AndrewR: I am quite far from home right now. Will return in 20 days or so...
16:22 pmoreau: No worries :-)
16:51 imirkin: hopefully i'll get 3d images sorted today. and that should actually be applicable to fermi too
16:51 karolherbst: nice
16:52 imirkin: i think it has the same limitation
16:52 karolherbst: I should finally finish up the threading patches :D
16:52 karolherbst: I think I have actually fixed it for real now
16:52 karolherbst: the only bug I run into with the android emulator is an android emulator bug :(
16:52 imirkin: heh
16:52 karolherbst: yeah.. some FS foo
16:52 karolherbst: deadwait on an event never arriving
16:52 karolherbst: super stupid
16:53 karolherbst: can't restart the emulator, so I can't test in depth :(
16:55 karolherbst: fixing the nouveau_fence code was annoying :(
16:56 karolherbst: but I have to repeat how awesome simple_mtx_t is :)
16:56 karolherbst: finally a lock which doesn't suck perf wise
17:46 imirkin: success! i have a model for how to compute the (x,y) coords for a 3d surface. implementing in shader should be trivial
17:47 imirkin: slightly annoying since you basically have to retile everything by hand, but wtvr
18:12 karolherbst: imirkin: do we even have to advertise 3d groups?
18:12 karolherbst: blocks sure, but groups?
18:12 imirkin: karolherbst: for GL, yes
18:13 karolherbst: mhh
18:13 karolherbst: annoying
18:13 imirkin: | MAX_COMPUTE_WORK_GROUP_COUNT | 3 x Z+ | GetIntegeri_v | 65535 | Maximum number of workgroups that may be dispatched by a single | 5.5 |
18:13 imirkin: 64k in all directions
18:14 imirkin: let me double-check in es3.1
18:15 imirkin: yep, ES3.1 as well
18:16 karolherbst: annoying
18:16 imirkin: (see pdf page 421, printed page 404 in https://www.khronos.org/registry/OpenGL/specs/es/3.1/es_spec_3.1.pdf)
18:17 imirkin: but wtvr, it's pretty easy to support, so who cares
18:17 karolherbst: yep
18:17 karolherbst: for CL we also have to mess around with offsets in this area anyway
18:17 karolherbst: because... unlimited group size :)
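As read here, the point at 18:17 is that OpenCL lets the requested number of work-groups exceed what the hardware can launch at once, so a dispatch has to be broken into several launches, each with an offset. A minimal sketch of that splitting, assuming the 65535 per-dimension value quoted at 18:13 as the launch limit. `split_grid` is a hypothetical helper, not clover or nouveau code.

```python
from itertools import product

# Assumed per-dimension work-group limit (the 65535 from the GL/ES table).
MAX_GROUPS = (65535, 65535, 65535)


def split_grid(groups, max_groups=MAX_GROUPS):
    """Split a 3D work-group count into launches that each fit the limit.

    Returns a list of (offset, count) pairs; each offset would have to be
    passed to the kernel so it can rebuild its global group id.
    """
    per_dim = []
    for total, limit in zip(groups, max_groups):
        per_dim.append([(start, min(limit, total - start))
                        for start in range(0, total, limit)])
    launches = []
    for chunks in product(*per_dim):
        offset = tuple(start for start, _ in chunks)
        count = tuple(n for _, n in chunks)
        launches.append((offset, count))
    return launches


for offset, count in split_grid((70000, 2, 1)):
    print(offset, count)
# (0, 0, 0) (65535, 2, 1)
# (65535, 0, 0) (4465, 2, 1)
```

Splitting each dimension independently and taking the cartesian product keeps every launch rectangular, so the kernel only needs one offset vector per launch.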
18:18 imirkin: the 3d that i'm talking about now is images though, not group sizes
18:33 karolherbst: ohh
19:11 imirkin: karolherbst: the group sizes on nv50 i already handled, so this is about making 3d images work. i have to work around the tiling stuff a bit
23:40 imirkin: alrighty ... almost all 3d tests pass
23:41 imirkin: somehow CAS fails for 3d
23:42 imirkin: oh. 2d cas also fails. but 2d array works. hm. probably something i'm doing in the 3d lowering (which has to apply to 2d as well) is going wrong
23:44 imirkin: current list of fails in the whole category: https://paste.debian.net/plain/1187295