09:52 karolherbst: pmoreau: do we actually need the code to invoke llvm-spirv ourself in clover, when we have a toolchain inside LLVM/clang doing the work for us?
09:57 pmoreau: karolherbst: I have no idea about that.
09:58 karolherbst: yeah, I think it is mainly for the clang CLI tool, otherwise it would need to generate temporary files/pipe it into the llvm-spirv process and fetch the result from there somehow
09:59 pmoreau: Yeah :-/
11:08 karolherbst: pmoreau: :D guess what
11:09 karolherbst: Physical32: %__spirv_BuiltInGlobalInvocationId = OpVariable %_ptr_Input_v3uint Input
11:09 karolherbst: Physical64: %__spirv_BuiltInGlobalInvocationId = OpVariable %_ptr_Input_v3ulong Input
11:10 karolherbst: ohhh
11:10 karolherbst: it makes sense
11:10 karolherbst: get_global_id returns size_t
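In OpenCL C the built-in is declared as `size_t get_global_id(uint dimindx)`. `size_t` is 32 or 64 bits depending on the addressing model, which is why the builtin variable's component type differs between the two dumps above. Here is a minimal sketch of that mapping. `enum addressing_model` and `size_t_bits` are hypothetical names, not clover or Mesa API:

```c
#include <assert.h>

/* Hypothetical enum mirroring the two SPIR-V addressing models used by
 * OpenCL kernels (OpMemoryModel Physical32 / Physical64). */
enum addressing_model {
   ADDRESSING_PHYSICAL32,
   ADDRESSING_PHYSICAL64,
};

/* Bit width of OpenCL C's size_t, and therefore of each component of the
 * GlobalInvocationId builtin, under a given addressing model. */
static unsigned
size_t_bits(enum addressing_model model)
{
   switch (model) {
   case ADDRESSING_PHYSICAL32:
      return 32; /* %_ptr_Input_v3uint  */
   case ADDRESSING_PHYSICAL64:
      return 64; /* %_ptr_Input_v3ulong */
   }
   assert(!"unknown addressing model");
   return 0;
}
```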
12:15 karolherbst: pendingchaos: after you are done with the conservative rasterization stuff, maybe you could look into GL_ARB_transform_feedback_overflow_query support for nouveau? The core and gallium support already landed for that, because it is already implemented for AMD. So only the reverse engineering bits and implementation for nouveau are left here.
12:15 karolherbst: and this extension is required for 4.6
12:16 karolherbst: imirkin_: or did you already look into that?
12:22 pmoreau: karolherbst: Ah right, I had forgotten that size_t has different sizes. :-D
12:30 RSpliet: pmoreau: the same is true for the pointer types (uintptr_t, intptr_t, ptrdiff_t)
12:30 pmoreau: Which makes sense
12:32 RSpliet: I've found OpenCL makes a surprisingly large amount of sense in general :-D
12:32 pmoreau: What a pity :-/
12:34 pmoreau: Depending on how Vulkan ray tracing turns out, I’ll most likely drop CUDA + OptiX + OpenGL for Vulkan.
12:45 RSpliet: "drop"?
12:46 pmoreau: In my research projects
12:49 pmoreau: RSpliet: I don’t have a secret implementation of CUDA and OptiX on top of Nouveau, if that is what you were wondering about. :-D
12:50 RSpliet: Hahaha, well, the secret CUDA implementation is not that secret really, that's just shinpei's hack :-P
12:51 pmoreau: I don’t remember much about that one.
13:05 RSpliet: pmoreau: https://github.com/shinpei0208/gdev
13:07 RSpliet: Beware, code is as stale as a US congressman
13:07 karolherbst: does it even work
13:07 pmoreau: Oh, the name gdev rings a bell
13:08 karolherbst: and it is proprietary
13:08 RSpliet: karolherbst: it used to work.
13:08 karolherbst: so -> no interest in gdev
13:08 karolherbst: yeah, well, I changed my mind about whether I care if it works or not, it is proprietary
13:08 RSpliet: Also, it's MIT licensed
13:09 karolherbst: the License states something else
13:09 karolherbst: at least in the readme
13:09 karolherbst: so what counts now?
13:09 RSpliet: https://github.com/shinpei0208/gdev/blob/master/LICENSE.txt ?
13:09 karolherbst: although it says "Lincese" in the readme
13:09 karolherbst: which is License
13:09 karolherbst: ...
13:10 karolherbst: RSpliet: nah.. I don't work with projects that don't have their license situation figured out
13:10 RSpliet: The readme says something else you mean, but I think it's pretty clear the intention is for it to be MIT licensed
13:10 karolherbst: stating two different license things is kind of a messy situation
13:10 RSpliet: shinpei's a friend of the community though, used to hang out here a lot. You might recognise some of the other contributors as well ;-)
13:10 karolherbst: RSpliet: so? it isn't legally clear what's valid here, which may or may not cause issues
13:11 imirkin: karolherbst: i already suggested he glance at the overflow query thing :)
13:11 karolherbst: imirkin: ahh, nice
13:11 karolherbst: RSpliet: did those work on gdev for real or are those just mentioned because they worked on nouveau?
13:12 RSpliet: https://github.com/shinpei0208/gdev/graphs/contributors
13:12 karolherbst: okay, mwk has indeed some commits there
13:14 RSpliet: Anyway, it's 3-year-old stuff, so I guess it could mainly serve as a reference in case anyone gets stuck when doing a clean implementation - I suspect it's not worth trying to forward-port this to modern nouveau
13:14 karolherbst: RSpliet: they use ocelot :(
13:14 karolherbst: for ptx2sass
13:14 karolherbst: although maybe they added a real open thing here
13:14 karolherbst: but I thought it was just wrapping closed tools
13:15 karolherbst: oh no, they have a real thing
13:15 karolherbst: nice
13:15 karolherbst: but mhh
13:15 karolherbst: I wish they had just done this work on top of mesa
13:16 pendingchaos: karolherbst: yeah, I'm probably going to try to get some traces today on that
13:16 RSpliet: I guess that was infeasible if they wanted the nvrm back-end to function too
13:18 RSpliet: Anyway, if there's anything useful in there and you remain worried about the licensing, I'm sure you can drop shinpei an e-mail. I'm sure he'll be happy to clear this up, although he's probably not interested in doing any further dev work on gdev
13:27 karolherbst: RSpliet: yeah, probably
13:32 moben: karolherbst: did you have a chance to look at my reclocking mmiotrace from a couple days ago?
13:32 karolherbst: moben: not yet. It has to wait until the weekend
14:13 karolherbst: imirkin: can we access global memory with a 32 bit value as well or is it strictly required to be a 64 bit one?
14:13 imirkin: technically yes
14:13 karolherbst: I could imagine it may be just a 32 bit address on Tesla, but I am not sure
14:13 imirkin: in practice, i don't know how it works
14:14 karolherbst: mhhhh
14:14 karolherbst: glisse: is HMM 64 bit only?
14:14 imirkin: on tesla, you have 16x offsets, and you can select which one to use for the 32-bit address in the shader
14:14 imirkin: on fermi+, there are LD and LD.E variants. the latter takes a 64-bit address. i've never seen the blob use the 32-bit variant, and i'm not 100% sure how it works
14:14 karolherbst: imirkin: ahh and the offsets are selected like we do for const buffers or is it set in the high bits of a 64 bit value?
14:15 imirkin: karolherbst: like we do for ssbo's. except the hw does it, rather than a lowering pass in the shader.
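Here is a simplified model of the Tesla scheme imirkin describes. The shader provides a 32-bit address and selects one of 16 offsets, and the hardware adds the selected base, much like the SSBO lowering pass does in software on newer chips. The table, its names, and the plain 64-bit addition are assumptions for illustration, not documented hardware behavior:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_GLOBAL_WINDOWS 16 /* "16x offsets" on Tesla */

/* Hypothetical per-launch table of 64-bit base addresses. */
struct global_windows {
   uint64_t base[NUM_GLOBAL_WINDOWS];
};

/* Effective address = selected base + 32-bit address from the shader. */
static uint64_t
tesla_global_address(const struct global_windows *w, unsigned index,
                     uint32_t addr32)
{
   assert(index < NUM_GLOBAL_WINDOWS);
   return w->base[index] + (uint64_t)addr32;
}
```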
14:15 karolherbst: okay
14:16 karolherbst: mhh, nobody cares about 32 bit kernel/userspace anymore, but I am now wondering how the HMM stuff works if we have 32 bit pointers in the application/host, but 64 bit ones in the GPU
14:16 karolherbst: I am sure this would already break on the CL API level, but what if we get past that
14:17 karolherbst: maybe we should just fill the upper 32 bits with the value of the high bit
14:17 karolherbst: although I don't really know how the heap/stack split works on a 32 bit kernel
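The two ways to widen a 32-bit host pointer that come up here differ only in what ends up in the upper half. Sign extension copies "the value of the high bit", and zero extension puts 0 there (the AMD approach glisse describes next). Which one is correct depends on how a 32-bit kernel lays out its address space, and that question is left open here. This sketch only shows the arithmetic, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Zero extension: the upper 32 bits are always 0. */
static uint64_t
ptr_zext(uint32_t ptr32)
{
   return (uint64_t)ptr32;
}

/* Sign extension: fill the upper 32 bits with bit 31. Done with explicit
 * masking so it does not rely on implementation-defined signed casts. */
static uint64_t
ptr_sext(uint32_t ptr32)
{
   uint64_t v = ptr32;
   if (ptr32 & 0x80000000u)
      v |= 0xffffffff00000000ull;
   return v;
}
```

The two only disagree for addresses at or above 2 GiB, which is exactly where a 32-bit kernel's split between user and kernel space matters.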
14:21 glisse: karolherbst: well technicaly no
14:22 glisse: but i don't feel an urge to make sure it works on 32 bits; it should, there is no reason it should not
14:22 glisse: but here 32bits means your process is 32bits
14:22 karolherbst: glisse: right, but on the GPU we still do 64 bit pointers
14:22 glisse: yup
14:23 glisse: AMD folks did optimize for that not long ago
14:23 karolherbst: and we would feed 32 bit pointers into the kernel
14:23 glisse: to only work on the lower 32 bits and use 0 for the upper 32 bits
14:23 glisse: Marek did that
14:23 karolherbst: yeah, I remember
14:23 karolherbst: but this opt makes no sense on nouveau really
14:23 karolherbst: not on that level at least
14:24 glisse: 32bits is for people who like archeology and Jurassic Park ;)
14:24 karolherbst: instead of doing ld b64 $r0d c0[0x0]; we would do ld b32 $r0 c0[0x0]; mov b32 $r1 0x0
14:24 karolherbst: but then again
14:25 karolherbst: this doesn't really work with spirv's physical memory model
14:25 imirkin: i suspect there are bigger fish to fry :)
14:25 glisse: yeah Marek did it to save registers for graphics, where he assumes the program won't need more than 4GB
14:25 karolherbst: so in spirv we still have 64 bit pointers as the kernel input
14:25 karolherbst: and then the only thing left is an info->input_ptrs_are_32bit flag so that we can optimize the high bits away in codegen
14:26 karolherbst: or maybe we could do physical32, but then we need to fix up g[$r0] accesses to be 64 bit
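Codegen keeps a 64-bit pointer in a register pair (`$r0d` covers `$r0` and `$r1`), so the two load sequences listed a few lines up differ only in where the high half comes from. Below is a sketch of that choice, driven by the `input_ptrs_are_32bit` flag idea. `struct reg_pair`, `struct kernel_info`, `load_ptr_arg`, and the byte buffer standing in for `c0[]` are hypothetical, not nouveau code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 64-bit register pair such as $r0d = { $r0, $r1 }. */
struct reg_pair {
   uint32_t lo; /* $r0 */
   uint32_t hi; /* $r1 */
};

/* Hypothetical compiler info holding only the flag discussed above. */
struct kernel_info {
   bool input_ptrs_are_32bit;
};

/* Load a pointer argument from a constant buffer modelled as bytes, with
 * the low word at `offset` and the high word at `offset + 4`. */
static struct reg_pair
load_ptr_arg(const struct kernel_info *info, const uint8_t *c0,
             size_t offset)
{
   struct reg_pair r;
   memcpy(&r.lo, c0 + offset, sizeof(r.lo));          /* ld b32 $r0 c0[off] */
   if (info->input_ptrs_are_32bit)
      r.hi = 0;                                       /* mov b32 $r1 0x0 */
   else
      memcpy(&r.hi, c0 + offset + 4, sizeof(r.hi));   /* rest of ld b64 $r0d */
   return r;
}
```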
14:26 karolherbst: or we figure out how the 32 bit mode works?
14:27 karolherbst: anyway, I agree with imirkin here, it sounds to me like a waste of time to think too much about that
14:29 pmoreau: I don’t think you can get OpenCL 2.0 on Tesla, and that is the only gen that can’t do 64-bit.
14:30 karolherbst: pmoreau: just because of SVM?
14:32 pmoreau: I doubt you can enqueue kernels from within a kernel for example
14:32 karolherbst: ahh
14:32 pmoreau: (which is also a 2.0 feature)
14:32 karolherbst: I thought this is a kepler2+ feature anyway
14:32 pmoreau: I think so too
14:33 karolherbst: so if this is a required 2.0 feature, we can get full 2.0 only on kepler2+?
14:33 pmoreau: Possibly
14:34 karolherbst: uhm..
14:34 pmoreau: Can all generations read from and write to the same image, or is that fermi+/kepler+?
14:34 karolherbst: enqueue_kernel(get_default_queue(), ndrange, ^{my_func_A(a, b, c);});
14:34 karolherbst: blocks.. nice
14:35 karolherbst: I see a lot of fun coming towards us if we really want to support 2.0
14:35 pmoreau: (I just checked, and in CUDA, dynamic parallelism is available on >=sm_35.)
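CUDA documents dynamic parallelism for compute capability 3.5 and newer (sm_35), which matches the "kepler2+" guess. Below is a minimal capability gate along those lines. The helper names are hypothetical, and whether OpenCL 2.0 device-side enqueue on nouveau would use the same cut-off is an assumption:

```c
#include <assert.h>
#include <stdbool.h>

/* Encode a CUDA compute capability "major.minor" as major * 10 + minor,
 * the same number that appears in sm_XY target names. */
static unsigned
sm_version(unsigned major, unsigned minor)
{
   return major * 10 + minor;
}

/* Hypothetical gate for device-side enqueue, copying CUDA's documented
 * requirement for dynamic parallelism (sm_35 or newer). */
static bool
supports_device_enqueue(unsigned sm)
{
   return sm >= 35;
}
```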
14:36 karolherbst: pmoreau: what is missing in clover to support 1.2 by the way?
14:36 pmoreau: I don’t know.
14:36 glisse: Tom would know
14:36 karolherbst: or does 1.2 simply require passing CTS?
14:36 glisse: Stellar
14:36 pmoreau: Well, that as well
14:37 karolherbst: okay
14:38 pmoreau: vedranm: Did you have time to compile a list of missing features in clover? We talked about it some time ago, but I haven’t looked into it.
14:42 pmoreau: Ah, found the bug report again: https://bugs.freedesktop.org/show_bug.cgi?id=104529
14:43 pmoreau: Hum, I’m thinking about putting that on my todo list, but I’m afraid it’s already too long. Maybe
16:08 vedranm: pmoreau: no, next month
16:09 vedranm: too busy with IRL work right now
16:10 pmoreau: vedranm: Understandable; I’m slowly accumulating too many projects IRL as well. If I start something, I’ll update the bug report in bugzilla and keep you informed.
16:11 vedranm: pmoreau: sure, no rush
16:12 karolherbst: pmoreau: ohh right, please tell me if you won't be able to take care of your clover patches as well
16:32 pmoreau: karolherbst: Will do. If I don’t get home too late tonight, I might send them.
16:33 karolherbst: pmoreau: ahh, nice
17:30 karolherbst: pmoreau: should OpMemoryModel be set according to the host or the device memory model?
17:30 karolherbst: pmoreau: think about structs with pointers in them
17:31 karolherbst: and SVM
17:31 karolherbst: and passing them from the host to the kernel
17:31 karolherbst: well "passing"
17:31 pmoreau: I would say device, as the SPIR-V contains the code you will be running on the device, not the host.
17:31 karolherbst: you can't translate those pointers from 32->64 or 64->32
17:32 karolherbst: pmoreau: yeah.. then it doesn't work with SVM
17:32 karolherbst: it can't
17:32 pmoreau: Why can’t you go from 32->64? Just 0 extend it, no?
17:32 karolherbst: well, where?
17:32 karolherbst: the GPU just reads the host memory directly
17:32 pmoreau: In clover, when you are binding the arguments, as it already does (maybe not for pointers though)
17:33 karolherbst: nono, SVM
17:33 karolherbst: no layer in between
17:33 karolherbst: direct memory access
17:33 pmoreau: Ah, for pointers in structs, not pointers passed directly to the kernel
17:33 karolherbst: pmoreau: yeah, for structs like this: https://raw.githubusercontent.com/karolherbst/HMM-examples/master/fma_struct_ptrs.h
17:33 karolherbst: the host is 32 bit, so those are 3x32 bit fields
17:33 karolherbst: but
17:33 karolherbst: if you use the 64 bit memory model in spirv, you compile it to 3x64 bit fields
17:34 pmoreau: You should be able to 0 extend when converting to NIR/NVIR
17:35 pmoreau: But it definitely won’t work for 64->32.
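With SVM there is no copy step where clover could rewrite the struct, so the kernel reads the host's bytes directly. A struct with three pointer members is 12 bytes on a 32-bit host, but code compiled for Physical64 expects 24 bytes. For example, member `b` is at byte 4 for the host while the kernel looks for it at byte 8. Widening a single value is easy, but narrowing only works if the upper half is empty. This sketch uses hypothetical member names and does not reproduce the header linked above. Fixed-width integers stand in for the pointers so that both layouts can be compared in one program:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as a 32-bit host writes it (Physical32). */
struct ptrs_host32 {
   uint32_t a, b, c;
};

/* Layout as a Physical64 SPIR-V module reads it. */
struct ptrs_device64 {
   uint64_t a, b, c;
};

/* A 64-bit device pointer can be narrowed to 32 bits without loss only
 * if its upper half is zero. */
static bool
ptr_fits_32bit(uint64_t p)
{
   return (p >> 32) == 0;
}
```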
17:35 karolherbst: ;)
17:35 pmoreau: Maybe the spec says something about those scenarios?
17:35 karolherbst: not sure
17:36 karolherbst: pmoreau: but the issue is, if you have a spirv 64 bit model module, but the host has 32 bit based memory
17:37 karolherbst: this kind of looks super ugly to deal with
17:37 pmoreau: But anyway, that is not a limitation of SPIR-V: you will have the same issue regardless of what IR you are using.
17:37 karolherbst: well, yeah
17:37 karolherbst: just something to keep in mind, because I think freedreno uses 32 bit addressing
17:38 karolherbst: no idea if there will be this HMM based SVM
17:38 karolherbst: but this might need some handling inside clover as well
17:39 pmoreau: We need to check what the spec says, and if it doesn’t say anything about it, I would ping the OpenCL WG about it for clarification.
17:40 karolherbst: yeah
17:41 karolherbst: I guess this makes sense
17:44 pmoreau: “However, with OpenCL 2.0 SVM allocations, it is guaranteed that a global address space pointer on the device matches the pointer representation on the host.” (from https://software.intel.com/en-us/articles/opencl-20-shared-virtual-memory-overview)
17:44 pmoreau: I’ll see if I can find the equivalent text in the spec.
17:45 karolherbst: uhh
17:45 karolherbst: k
17:45 karolherbst: so no SVM on 32 bit hosts?
17:47 pmoreau: Or you force the GPU to only use the low 32-bits, not sure.
17:48 karolherbst: well yeah, but we can't right now :)
17:49 karolherbst: this would add some ugliness into the compiler anyway I guess
17:49 karolherbst: except we could just switch it over on the GPU as well, yeah
17:50 pmoreau: “We require the SVM implementation to work with either 32- or 64- bit host applications subject to the following requirement: the address space size must be the same for the host and all OpenCL devices in the context.” from the OpenCL 2.0 spec, section 5.6.1
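The quoted rule reduces to a simple check over the context: SVM is only allowed if every device uses the same address space size as the host. The per-device value corresponds to what OpenCL reports as `CL_DEVICE_ADDRESS_BITS`. The helper itself is a hypothetical sketch, not clover code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* OpenCL 2.0, section 5.6.1: SVM requires the host and all devices in the
 * context to have the same address space size. */
static bool
svm_allowed(unsigned host_address_bits,
            const unsigned *device_address_bits, size_t num_devices)
{
   for (size_t i = 0; i < num_devices; i++) {
      if (device_address_bits[i] != host_address_bits)
         return false;
   }
   return true;
}
```

On the host side, `sizeof(void *) * CHAR_BIT` gives the first argument.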
17:50 karolherbst: nice...
17:50 karolherbst: this solves that problem
17:51 karolherbst: pmoreau: more CAPS to gallium then
17:52 pmoreau: That should be the only additional one needed, right?
17:53 pmoreau: Well, we also need two for OPENCL_VERSION and OPENCL_C_VERSION, as not all devices in clover will report the same.
17:54 karolherbst: no
17:54 karolherbst: this is clovers task
17:54 karolherbst: not the drivers one
17:54 karolherbst: we have caps for features
17:54 karolherbst: and clover decides what version to expose
17:55 karolherbst: and what CL features
17:55 karolherbst: based on the gallium caps
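The split described here is that the driver reports features and clover derives the version string from them. Here is a minimal sketch of that policy. `struct driver_features` and `choose_cl_version` are hypothetical, and only the two 2.0 features discussed earlier (SVM, device-side enqueue) are checked. A real policy would have to check much more:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of what a gallium driver reports through caps. */
struct driver_features {
   bool svm;            /* shared virtual memory */
   bool device_enqueue; /* enqueue_kernel() from inside a kernel */
};

/* clover-side decision: pick the highest version whose (partial) feature
 * list the driver satisfies. */
static const char *
choose_cl_version(const struct driver_features *f)
{
   if (f->svm && f->device_enqueue)
      return "OpenCL 2.0";
   return "OpenCL 1.1";
}
```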
17:55 pmoreau: Ah, then that needs to be implemented
17:56 karolherbst: yeah
17:56 karolherbst: and I think we also need to move the clang spirv triple thing into clover as well
17:56 karolherbst: because of the host/device messup
17:56 karolherbst: if a device can do both, fine, then clover can pass whatever it wants/needs
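clang's SPIR targets come in two pointer sizes: `spir` (32-bit) and `spir64` (64-bit). Moving the choice into clover means picking the triple per device. The function name and the preference order are assumptions, not clover's actual logic. The order prefers the host's size when the device supports it, because the spec text quoted earlier requires matching sizes for SVM:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static const char *
choose_spirv_triple(bool device_has_32bit, bool device_has_64bit,
                    unsigned host_address_bits)
{
   /* Prefer the host's pointer size so SVM pointers keep one representation. */
   if (host_address_bits == 64 && device_has_64bit)
      return "spir64-unknown-unknown";
   if (host_address_bits == 32 && device_has_32bit)
      return "spir-unknown-unknown";
   /* Otherwise fall back to whatever the device supports (no SVM then). */
   if (device_has_64bit)
      return "spir64-unknown-unknown";
   if (device_has_32bit)
      return "spir-unknown-unknown";
   return NULL;
}
```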
17:56 pmoreau: That’s done already, isn’t it?
17:56 karolherbst: not with the patch I have
17:57 karolherbst: but maybe there is an updated one
17:57 pmoreau: I changed that a week or two ago already
17:57 karolherbst: ahh okay
17:58 pmoreau: Hum, I don’t have it on my latest version, why
17:59 karolherbst: pmoreau: I was playing around with structs quite a lot today, and I think we are able to dereference any member from any struct correctly now :)
18:00 pmoreau: I was sure I had changed that back, but the current version of my branches certainly do not reflect that :-/
18:00 karolherbst: even with unions/arrays_of_arrays/packed/aligned stuff
18:00 pmoreau: Cool! \o/
18:00 karolherbst: didn't really test structs within structs though
18:00 karolherbst: but that should work as well
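The member offsets that struct dereferences have to get right are the ones the C layout rules produce. OpenCL C accepts the same `packed` and `aligned` type attributes that GCC and Clang use. A host compiler can show the expected offsets for each case mentioned. The type names below are hypothetical examples:

```c
#include <assert.h>
#include <stddef.h>

/* Natural layout: b is padded up to int's alignment. */
struct natural_layout {
   char a;
   int b;
};

/* packed: no padding between members. */
struct __attribute__((packed)) packed_layout {
   char a;
   int b;
};

/* aligned: b is forced onto a 16-byte boundary. */
struct aligned_layout {
   char a;
   int b __attribute__((aligned(16)));
};

/* Every union member starts at offset 0. */
union member_union {
   char c;
   int i;
   double d;
};
```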
21:23 pendingchaos: can someone with a GM20x GPU test the latest conservative rasterization patches?
21:23 pendingchaos: they can be found at https://github.com/pendingchaos/mesa/tree/nv-conservative-raster-v2 with the tests at https://github.com/pendingchaos/piglit/tree/nv_conservative_raster
21:25 lachs0r: I can do that tomorrow (gtx 960/gm206)
22:19 pmoreau: Phew, got Nouveau running well again on the computer.
23:58 pmoreau: karolherbst: I thought I had fixed the compilation issues on my branches, but apparently I did not. Didn’t you have it working on your end?
23:58 karolherbst: uhm... not all of it
23:58 pmoreau: Okay
23:59 pmoreau: Going to fix it right now then.