00:07pmoreau: You’ll want CUDA 6.5 for Tesla
00:10imirkin: ah no, more typing ahead
00:10imirkin: apparently two wrongs make a right
00:10imirkin: if you use the same wrong logic to load and store, then it all works out
02:05imirkin: urgh. looks like i have to do something clever in the shader
02:05imirkin: it's not just some simple math on x,y,z + constants
02:05imirkin: i need to do math to determine the block "id"
02:05imirkin: i think
03:36imirkin: right... so obviously what i was doing can't work -- i was scaling the y coord by the z "tile height", and then adding the z coord. works great for z < tile height. not so much for the next tile.
03:40imirkin: i think the next tile is supposed to go in the x coordinate. fun.
04:21imirkin: mwk: i'm having trouble reproducing your results in the envytools docs... i took your pseudocode and it's producing diff results: https://paste.debian.net/plain/1187256
04:21imirkin: e.g. for the (0, 0, 2) case
04:22imirkin: it's producing the same address for (0, 16, 0), so the script is obviously wrong
04:22imirkin: but it's not immediately apparent to me where
04:31mwk: block_address = floor(x_coord * element_size / bytes_per_block_x) * block_bytes + floor(y_coord / bytes_per_block_y) * block_bytes * blocks_per_surface_x + floor(z_coord / bytes_per_block_z) * block_bytes * blocks_per_surface_x * blocks_per_surface_z
04:31mwk: the `z` at the very end should be `y`
04:32imirkin: hah, i just figured it out independently too
04:32imirkin: there's another typo in there too
04:32imirkin: bytes_per_gob = 1;
04:32imirkin: should be bytes_per_gob_z
04:32imirkin: (that one was more obvious)
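For reference, the formula from 04:31 with the fix from the following line applied (the last factor becomes `blocks_per_surface_y` instead of `blocks_per_surface_z`) can be sketched in Python as below. Variable names follow mwk's message. The sample geometry in `geom` is made up purely for illustration and is not taken from the envytools docs or from any real surface layout.

```python
def block_address(x_coord, y_coord, z_coord, *, element_size,
                  bytes_per_block_x, bytes_per_block_y, bytes_per_block_z,
                  block_bytes, blocks_per_surface_x, blocks_per_surface_y):
    """Byte offset of the block containing (x, y, z), per the corrected formula."""
    # Integer floor division stands in for floor() on non-negative coords.
    block_x = (x_coord * element_size) // bytes_per_block_x
    block_y = y_coord // bytes_per_block_y
    block_z = z_coord // bytes_per_block_z
    return (block_x * block_bytes
            + block_y * block_bytes * blocks_per_surface_x
            # Corrected term: a z-step skips a full x*y plane of blocks.
            + block_z * block_bytes * blocks_per_surface_x * blocks_per_surface_y)


# Hypothetical geometry, chosen only so the numbers are easy to follow.
geom = dict(element_size=4,
            bytes_per_block_x=64, bytes_per_block_y=8, bytes_per_block_z=1,
            block_bytes=64 * 8 * 1,
            blocks_per_surface_x=4, blocks_per_surface_y=4)

print(block_address(0, 16, 0, **geom))  # two blocks down in y
print(block_address(0, 0, 2, **geom))   # two blocks deep in z
```

With this assumed geometry the (0, 16, 0) and (0, 0, 2) cases land in different blocks, which is the sanity check discussed at 04:21-04:22.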
04:32mwk: can you make a PR?
04:33imirkin: yea, i will
04:33imirkin: i'm going to use this to try to figure out what coords to feed it given no z tiling
04:33imirkin: so that it all works out
04:34mwk: or, well, just commit it, it's obvious enough
04:34imirkin: yeah... have to get my envytools repo in a committable state
04:34imirkin: i'll do it later
04:34imirkin: getting late here ... and probably early where you are :)
04:34mwk: yeah, not a good hour
04:35imirkin: thanks for the continued help!
15:46AndrewR: Hi All)
15:48AndrewR: I can't see list of users from Android but I WANT to congratulate pmoreau and karolherbst and imirkin on recent merges (opencl/nouveau) and MRs
15:57pmoreau: Thank you AndrewR for all the testing you have been doing!
15:59pmoreau: I was going to ping you, since you might be interested in Ilia’s latest MR (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9299), at least in testing it; he’s been working on getting images working on Tesla plus quite a few fixes here and there.
15:59karolherbst:needs to finish images
16:04AndrewR: I am quite far from home right now. Will return in 20 days or so...
16:22pmoreau: No worries :-)
16:51imirkin: hopefully i'll get 3d images sorted today. and that should actually be applicable to fermi too
16:52imirkin: i think it has the same limitation
16:52karolherbst: I should finally finish up the threading patches :D
16:52karolherbst: I think I have actually fixed it for real now
16:52karolherbst: the only bug I run into with the android emulator is an android emulator bug :(
16:52karolherbst: yeah.. some FS foo
16:52karolherbst: deadwait on an event never arriving
16:52karolherbst: super stupid
16:53karolherbst: can't restart the emulator, so I can't test in depth :(
16:55karolherbst: fixing the nouveau_fence code was annoying :(
16:56karolherbst: but I have to repeat how awesome simple_mtx_t is :)
16:56karolherbst: finally a lock which doesn't suck perf wise
17:46imirkin: success! i have a model for how to compute the (x,y) coords for a 3d surface. implementing in shader should be trivial
17:47imirkin: slightly annoying since you basically have to retile everything by hand, but wtvr
18:12karolherbst: imirkin: do we even have to advertise 3d groups?
18:12karolherbst: blocks sure, but groups?
18:12imirkin: karolherbst: for GL, yes
18:13imirkin: | MAX_COMPUTE_WORK_GROUP_COUNT | 3 x Z+ | GetIntegeri_v | 65535 | Maximum number of workgroups that may be dispatched by a single | 5.5 |
18:13imirkin: 64k in all directions
18:14imirkin: let me double-check in es3.1
18:15imirkin: yep, ES3.1 as well
18:16imirkin: (see pdf page 421, printed page 404 in https://www.khronos.org/registry/OpenGL/specs/es/3.1/es_spec_3.1.pdf)
18:17imirkin: but wtvr, it's pretty easy to support, so who cares
18:17karolherbst: for CL we also have to mess around with offsets in this area anyway
18:17karolherbst: because... unlimited group size :)
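One way to picture the "offsets" remark: if a requested grid exceeds the per-dimension group count limit (65535 in the table row quoted at 18:13), it can be cut into several dispatches, each carrying a base offset that the kernel adds to its group id. The Python sketch below only illustrates that idea for a single dimension. `split_dispatch` and `MAX_GROUPS_PER_DIM` are hypothetical names, and nothing here is taken from the actual nouveau or clover code.

```python
MAX_GROUPS_PER_DIM = 65535  # limit quoted in the log; assumed to apply per dimension


def split_dispatch(total_groups, limit=MAX_GROUPS_PER_DIM):
    """Yield (offset, count) chunks that together cover range(total_groups)."""
    offset = 0
    while offset < total_groups:
        count = min(limit, total_groups - offset)
        yield offset, count
        offset += count


# Hypothetical request of 150000 groups along one dimension.
for offset, count in split_dispatch(150000):
    print(offset, count)
```

Each chunk would become one dispatch, with `offset` passed to the kernel so that global ids stay contiguous across chunks.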
18:18imirkin: the 3d that i'm talking about now is images though, not group sizes
19:11imirkin: karolherbst: the group sizes on nv50 i already handled, so this is about making 3d images work. i have to work around the tiling stuff a bit
23:40imirkin: alrighty ... almost all 3d tests pass
23:41imirkin: somehow CAS fails for 3d
23:42imirkin: oh. 2d cas also fails. but 2d array works. hm. probably something i'm doing in the 3d lowering (which has to apply to 2d as well) is going wrong
23:44imirkin: current list of fails in the whole category: https://paste.debian.net/plain/1187295