17:06imirkin: pmoreau: i think i got all the half-float ops. a few flags i'm not sure i 100% understand, but should be enough for a basic impl for someone that cared
17:07imirkin: i guess i should also work out dp2a/dp4a...
17:09pmoreau: What are those dp2a/dp4a? some derivatives thingy?
17:09imirkin: integer dot product
17:09imirkin: int8/int16 i gather
17:09pmoreau: And nice work getting all the half-float ops. I’m awaiting a Pascal card that should arrive during the week I think, so I’ll be able to try those ops.
17:09pmoreau: Ah, okay
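As an illustration of the "integer dot product" described above, here is a minimal reference model in plain C of a four-way int8 dot product with accumulate, which is the behaviour CUDA exposes through its `__dp4a` intrinsic. The helper names `pack_s8x4`, `dp4a_s8_ref`, and `dp4a_example` are made up for this sketch, and it assumes signed lanes with lane 0 in the low byte. The hardware opcode's signedness and selection flags are not covered in this log and are not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack four signed 8-bit lanes into one 32-bit word,
 * lane 0 in the low byte. */
uint32_t pack_s8x4(int8_t x0, int8_t x1, int8_t x2, int8_t x3)
{
    return (uint32_t)(uint8_t)x0
         | ((uint32_t)(uint8_t)x1 << 8)
         | ((uint32_t)(uint8_t)x2 << 16)
         | ((uint32_t)(uint8_t)x3 << 24);
}

/* Reference model (an assumption, not the hardware definition):
 * acc + sum over the four lanes of a[i] * b[i], all lanes treated as signed.
 * The uint8_t -> int8_t cast relies on the usual two's-complement behaviour. */
int32_t dp4a_s8_ref(uint32_t a, uint32_t b, int32_t acc)
{
    for (int lane = 0; lane < 4; lane++) {
        int8_t ai = (int8_t)(uint8_t)(a >> (8 * lane));
        int8_t bi = (int8_t)(uint8_t)(b >> (8 * lane));
        acc += (int32_t)ai * (int32_t)bi;
    }
    return acc;
}

/* Small usage example: 1*5 + (-2)*6 + 3*(-7) + (-4)*8 = -60, plus 100. */
void dp4a_example(void)
{
    uint32_t a = pack_s8x4(1, -2, 3, -4);
    uint32_t b = pack_s8x4(5, 6, -7, 8);
    assert(dp4a_s8_ref(a, b, 100) == 40);
}
```

A dp2a-style variant would presumably pair 16-bit lanes with 8-bit lanes in the same spirit, but the exact lane selection is not stated in this conversation, so it is left out.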
17:10imirkin: i might redo the variants slightly in envydis... i just did sm50 vs sm60
17:10imirkin: but i might need to be a bit more selective...
17:10imirkin: apparently dp2a/4a are only on GP102+
17:12imirkin: making proper use of these operations will be tricky
17:12pmoreau: Hm, interesting. I know they have int8 support on Volta with the tensor cores, but I didn’t remember there being any int8 on Pascal as well.
17:12imirkin: they've had int8 support since fermi
17:13imirkin: the "video" opcodes
17:13pmoreau: Ah, true, they had some vector things in the video opcodes
17:14imirkin: this is either marketing, or they sped them up, or they added extra things that are useful for the deep learning stuff
17:14pmoreau: I think their tensor core can do its stuff on int8 as well, not just fp16.
17:15imirkin: the video ops supported both int8 and int16...
17:15imirkin: (SIMD-style, much like the hadd2/etc stuff works)
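To make "SIMD-style" concrete: the idea is that one 32-bit register holds several narrow lanes, and the operation applies to each lane independently. The sketch below shows this in plain C for a wrapping add on four 8-bit lanes and on two 16-bit lanes. The function names are invented for illustration, and this is not the encoding or full semantics of the video opcodes (which have further options not discussed here), only the lane-wise principle.

```c
#include <assert.h>
#include <stdint.h>

/* Lane-wise add of four unsigned 8-bit lanes packed in a 32-bit word.
 * Each lane wraps on overflow; carries never cross into the next lane.
 * Trick: add the low 7 bits of every lane, then fix up the top bit with XOR. */
uint32_t add_u8x4_wrap(uint32_t a, uint32_t b)
{
    uint32_t low7 = (a & 0x7f7f7f7fu) + (b & 0x7f7f7f7fu);
    return low7 ^ ((a ^ b) & 0x80808080u);
}

/* Same idea for two unsigned 16-bit lanes. */
uint32_t add_u16x2_wrap(uint32_t a, uint32_t b)
{
    uint32_t low15 = (a & 0x7fff7fffu) + (b & 0x7fff7fffu);
    return low15 ^ ((a ^ b) & 0x80008000u);
}

/* Small usage example: the 0xFF lane wraps to 0x00 without touching
 * its neighbour, unlike a plain 32-bit add. */
void packed_add_example(void)
{
    assert(add_u8x4_wrap(0x01FF02FEu, 0x01010101u) == 0x020003FFu);
    assert(0x01FF02FEu + 0x01010101u == 0x030003FFu); /* carry leaks here */
}
```

The same register contents give different results depending on whether they are treated as one 32-bit value or as packed lanes, which is why the lane width has to be part of the opcode.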
18:01Michael__: Hi @mupuf, sorry about yesterday. The session was lost! Can you please tell me about the "fixing fbo-blending" task? I cannot figure out where to start coding from.
18:02Michael__: In this task, I understand the concept, but which file to code in or how to start coding is very much unclear. I need a little guidance as this is my first time with drivers. Can you please help?
21:47rhyskidd: pmoreau: which Pascal card are you getting?
21:50karolherbst: pmoreau: that OpenCL C++ API is super annoying
21:50karolherbst: please stop using it :D
21:51karolherbst: guess what, if we want to use just one or two 2.0 features, it switches over to using 2.0 stuff for everything where possible
21:51karolherbst: even when non 2.0 APIs would have been enough
21:51karolherbst: pmoreau: we need clCreateCommandQueueWithProperties for example
21:51karolherbst: although it only sets CL_QUEUE_PROPERTIES
21:52karolherbst: which means using clCreateCommandQueue would make no difference