12:00karolherbst: pmoreau: ever ran the CTS spirv tests? :D
12:01karolherbst: also, we might want to fix the CTS to accept the extension as well
12:05Forty-Bot: I would like to reclock my 960, but when I cat /sys/kernel/debug/dri/0/pstate I get "No such device"
12:05karolherbst: Forty-Bot: it's disabled on those as we can't change fan speeds
12:05karolherbst: well, we know how to do it
12:05karolherbst: the hw just doesn't allow us
12:06Forty-Bot: the nvidia driver has been crashing recently so I wanted to try nouveau
12:06karolherbst: you can of course compile the kernel yourself with 1 or 2 patches and force enable it, but we thought for the general user it might not be a good idea :)
12:06Forty-Bot: but I'm getting 50 fps in games where I used to get 250
12:06karolherbst: it's an annoying issue
12:07karolherbst: essentially what we require is redistributable signed PMU firmware from nvidia in order to do it
12:07karolherbst: (and support their protocol and what not)
12:07karolherbst: Forty-Bot: this is a desktop, no?
12:07karolherbst: yeah mhhh
12:08karolherbst: I have some patches for laptop users as fans are generally driven by the laptops firmware, but on desktop usually we don't want to allow higher clocks when the fans keep running at stock speeds :/
12:08karolherbst: sorry that we can't do anything about it really
12:08imirkin: one user went so far as to attach his own fans onto his GPU
12:08karolherbst: (well we could, but that would require extracting firmware from nvidias driver and users would have to do it themselves or run scripts)
12:08imirkin: so that they could be controlled separately
12:09Forty-Bot: ok so I should just use nvidia drivers for this gpu and try and find a fix for the crashes>
12:09karolherbst: anyway.. at the moment every solutions sucks
12:09karolherbst: if performance is important to you then probably yes
12:11imirkin: thank you for buying nvidia. we appreciate you have a choice of gpu vendors, and you appear to have made the wrong one.
12:12Forty-Bot: I wasn't really thinking about that at the time... it also worked very well for like 4 years and it's only recently staretd crashing
12:12imirkin: think about it next time you guy a gpu :)
12:14Forty-Bot: at the time I was just glad to be moving off of intel integrated graphics
12:14Forty-Bot: actually, I also played a game which has since dropped linux support which *only* worked on nvidia gpus for some reason
12:15karolherbst: this is how my day starts today: https://gist.githubusercontent.com/karolherbst/ad1a934595736f4d549b408aad69ae6d/raw/59339d88957abbdff3133c35561ce6a47dd8c7ad/gistfile1.txt
12:16imirkin: is that wrong?
12:16karolherbst: when I compare strlen of the c strings both are 39...
12:16imirkin: i don't think size() does what you think it does
12:17karolherbst: imirkin: one string is longer even though they are the same?
12:17karolherbst: c++ uses size when comparing strings
12:17imirkin: c++ strings are pascal-style
12:17imirkin: while c strings are ... c-style
12:17imirkin: i.e. c-strings are null-terminated
12:18imirkin: while c++ strings have an explicit length
12:18imirkin: and thus can contain nulls
12:18imirkin: now ... it's VERY RARE to actually want that
12:18karolherbst: thing is just.. it all originated from some C stuff
12:18imirkin: so most likely this points to a bug somewhere
12:19karolherbst: yeah.. mhhh
12:19karolherbst: found was extracted via istringstream and the >> operator
12:19imirkin: which is why it's correct...
12:19imirkin: it's not, i guess...
12:20karolherbst: yeah.. dunno
12:20imirkin: good luck with that ;)
12:20imirkin: one counts the null in the size, the other doesn'
12:20imirkin: good times.
12:20karolherbst: orignally the string got retrieved via the CL API which is C
12:20karolherbst: just splitted with istringstream
12:21karolherbst: the heck
12:21karolherbst: code is weird sometimes
12:21karolherbst: I think I see what's up
12:22karolherbst: I bet istreamstream splits until it finds a non "NULL" character or sometimg
12:22karolherbst: and the CTS retrieves stuff with an explicitly sized std::vector<char>, fills in via data() and then constructs the std::string through tierators
12:23RSpliet: sounds inefficient
12:23karolherbst: it has to fetch the size first to provide a buffer big enough :)
12:26karolherbst: so yeah.. this only happens with the last extensions advertised
12:26karolherbst: when returning from clover assert(extensions.size() == strlen(extensions.c_str())) this is fine
12:27karolherbst: so I don't think anything is broken on that level
12:28RSpliet: That sounds dodgy tbh
12:28RSpliet: size() returns "the number of actual bytes that conform the contents of the string, which is not necessarily equal to its storage capacity."
12:28RSpliet: c_str() should add a null-terminator to the string
12:29karolherbst: it does
12:29RSpliet: strlen() should count the null terminator ;-)
12:29karolherbst: p (int)strlen(found.c_str())
12:29karolherbst: $89 = 39
12:29karolherbst: (gdb) p found.size()
12:29karolherbst: $90 = 40
12:30karolherbst: you tell me what's wrong :p
12:30imirkin: like i said - size isn't what you think it is.
12:30imirkin: it is _not_ strlen
12:30karolherbst: it is used when using the == operator though
12:30imirkin: of course.
12:30imirkin: and those two strings ARE different
12:30imirkin: one is length 40, the other is 39
12:31karolherbst: I just don't know why they are different :)
12:31karolherbst: so it is a istringstream bug or what?
12:31RSpliet: manually counting cl_khr_spirv_no_integer_wrap_decoration gives me 39. so strlen(found.c_str()) should give 40
12:31karolherbst: strlen doens't count the null terminator
12:31imirkin: RSpliet: strlen doesn't count null
12:32RSpliet: oh my bad
12:32RSpliet: you're right
12:32RSpliet: Ok, that's one thing resolved for me
12:32imirkin: karolherbst: solution to your problem is "don't use =="
12:32karolherbst: only for the last string extracted, size != strlen
12:32karolherbst: heh ;D
12:33karolherbst: I guess I could propose a fix but that's the CTS so I am more annoyed by the fact I am supposed to do stuff privately?
12:33RSpliet: karolherbst: for (int i = 0; i < found.size(); i++) printf("%x\n", found[i]);
12:34RSpliet: Be nice to know exactly what those extra characters are
12:34karolherbst: it just has two null terminators
12:35karolherbst: I bet istringstream is wonky
12:35karolherbst: so it includes the 0 of the input string
12:35karolherbst: and adds another 0
12:37karolherbst: the entire extensions string is one element too long
12:37karolherbst: the plot thickens
12:39karolherbst: we return 294 for the extension strings size, but it's only 293 long
12:39karolherbst: guess it's our bug
12:45pmoreau: karolherbst: https://github.com/KhronosGroup/OpenCL-CTS/issues/790
12:45karolherbst: I think it's a CTS bug
12:46pmoreau: It is, there is no doubt about it
12:46karolherbst: I think the issue is how istreamstream is used
12:46karolherbst: or how the string is constructed...
12:47karolherbst: pmoreau: does the spec even requires null terminated strings? :D
12:48pmoreau: `std::string` are not-null terminated.
12:48pmoreau: Did you read what I wrote in that issue btw? :-)
12:48karolherbst: that's not my point
12:48karolherbst: I mean, whatever we report via the CL API
12:49pmoreau: C-strings should be null-terminated
12:49pmoreau: And what we report via the CL API is null-terminated
12:49karolherbst: then it's easy to fix
12:50pmoreau: The problem is that https://github.com/KhronosGroup/OpenCL-CTS/blob/09aa54246f607bc1770a7cc2dc076185487a00f7/test_common/harness/deviceInfo.cpp#L48 copies the '\0' into the C++ string rather than discarding it.
12:50karolherbst: pmoreau: return std::string(info.begin(), info.end()); -> return std::string(info.data())
12:51pmoreau: I feel like I saw that somewhere already… :-D
12:51karolherbst: anyway, that's what I would go for really
12:51karolherbst: the issue is rather if the spec expects a null terminator or not
12:52karolherbst: and if people throw in null terminators inside their strings they deserve to fail conformence
12:53karolherbst: pmoreau: "Returns a space separated list of extension names"
12:53pmoreau: I didn’t send a patch with that fix cause I didn’t have time nor wasn’t sure whether the API ever expects to use '\0' as a way to split a string into several elements, like "str0\0str1\0str2\0\0".
12:55karolherbst: now I need to figure out how to get the spirvs :D
12:55pmoreau: And to answer your earlier question, I think so cause I was running into the issue of BuiltIn variables not using the Input storage class
12:56pmoreau: Let me see if I have a script somewhere for it… it wasn’t too difficult, you just needed to pass a path to the folder with the binaries IIRC
12:56pmoreau: karolherbst: ./bin/test_spirv_new --spirv-binaries-path /home/pmoreau/Projects/OpenCL_for_Nouveau/src/OpenCL-CTS/test_conformance/spirv_new/spirv_bin
12:58karolherbst: sure.. but I don't have the binaries :p
12:58RSpliet: pmoreau: if the API expects \0 to be separators rather than terminators, it wouldn't leave the terminating \0 in place no?
12:58karolherbst: RSpliet: it's explicitly sized
12:59pmoreau: karolherbst: How come you don’t have them? :o What do you have in test_conformance/spirv_new/spirv_bin?
13:00karolherbst: I don't have KHRONOS_OFFLINE_COMPILER set anyway
13:00pmoreau: Me neither
13:00pmoreau: Are you sure you are on the correct branch?
13:00pmoreau: https://github.com/KhronosGroup/OpenCL-CTS/tree/master/test_conformance/spirv_new/spirv_bin There are available in master
13:01karolherbst: I checked in my build dir :D
13:02karolherbst: chdir mess
13:02karolherbst: why is software so annoying? :D
13:02pmoreau: RSpliet: Re “ if the API expects \0 to be separators rather than terminators, it wouldn't leave the terminating \0 in place no?” Well, if you do not have a size, the final one is still needed to know you reached the end.
13:02pmoreau: Why do you need to chdir?
13:03RSpliet: pmoreau: that would only work if there's no other \0s in the string
13:04pmoreau: Unless you consider that an empty string (i.e. two '\0' in a row) mark the end of your string array. But it’s not something I would ever recommend or do, but I have seen at least one API do that.
13:05karolherbst: pmoreau: for my CTS runner if I don't want to pass in the argument..
13:05karolherbst: but I guess I should
13:05pmoreau: Ah I see
13:05RSpliet: More generally, \0 should have a meaning. That meaning is either "separator between elements", "terminator of the string" or "undefined, should never appear". C++ doesn't mandate a meaning when handling string objects, so that's all up to the SPIR-V guys. All three meanings I've given it would imply that they shouldn't be retained when converting to a list of strings :-)
13:05karolherbst: ehh.. (seed = 0371120151) ERROR: unknown nir_op f2i8
13:05karolherbst: annoying :D
13:05RSpliet: or well, not the SPIR-V guys, but khronos...who happen to also be the SPIR-V guys :-P
13:22karolherbst: pmoreau: stuff looking good though
13:22karolherbst: Pass 155 Fails 38 Crashes 9
13:25karolherbst: some value mismatches but I think it's not that bad
14:50karolherbst: pmoreau: once you have time, mind taking a look at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6433 ?
15:09karolherbst: so the CTS is annoying
15:10karolherbst: if we advertise 4GB of memory being legal to allocate, it just goes ahead and allocates 4095MB :D
15:10karolherbst: which obviously fails
15:55karolherbst: pmoreau: nice.. I have three tests which have no subtest fails (with 1.1) now :)
15:55karolherbst: allocations, atomics and basic
16:09pmoreau: Are you running a particular branch to avoid running 1.2+ tests?
16:09karolherbst: that's not needed anymore
16:09karolherbst: I just run them as 1.1 or 1.2
16:09karolherbst: starting with 2.0 there are more fails
16:10pmoreau: Well, there are several 1.2 tests that are not marked as 1.2 and so will still run even at 1.0, which is a problem if it’s not implemented.
16:11karolherbst: but it doesn't matter :p
16:11karolherbst: I think 1.2 is a viable target
16:13pmoreau: It does matter more on nv50 I guess, since we won’t be able to advertise more than 1.0 IIRC.
16:13karolherbst: why not?
16:14pmoreau: 1.1 bumped the minimum amount of local memory available, and it’s higher than what you get on Tesla hw.
16:14karolherbst: so stupid :/
16:15pmoreau: “CL_DEVICE_LOCAL_MEM_SIZE from 16 KB to 32 KB.” and Tesla has 16 KB
16:16karolherbst: I don't think this is an issue though
16:17karolherbst: you have 48k per block
16:17pmoreau: Tesla has 16 KB per SM
16:17karolherbst: and I think nvidia advertises 1.1 even on tesla
16:17karolherbst: maybe not alll
16:17karolherbst: but some at least
16:17pmoreau: And a block lives on a single SM
16:18karolherbst: "Driver 340+ support OpenCL 1.1 for Tesla and Fermi."
16:19pmoreau: Mmh, I wonder what they do then
16:19karolherbst: pmoreau: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications
16:20karolherbst: uhh wait
16:20karolherbst: we need an older version
16:20pmoreau: Look at the one on Wikipedia: https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications
16:20karolherbst: yeah, it says 48k per block
16:21karolherbst: maybe you can create blocks across multiple SMs?
16:21karolherbst: that would be super strange
16:21karolherbst: well.. we have the QMD files, let's check there
16:22imirkin: qmd is for nve4+
16:22karolherbst: well,... class headers then
16:22pmoreau: AFAIK a block can not span multiple SMs, at least in recent architectures. But maybe Tesla was different? It’s the only architecture where the limit per thread block is higher than the limit per SM.
16:24karolherbst: als there are no max registers per thread block
16:24pmoreau: Let’s see if I can find anything in the early CUDA programming guides which only supported Tesla.
16:30pmoreau: “A block is processed by only one multiprocessor, so that the shared memory space resides in the on-chip shared memory leading to very fast memory accesses.” (from the Programming guide for CUDA 1.0) soooo, is this leaving the door opened to possibly using global memory instead of shared if you use too much shared?
16:32karolherbst: I think 1.0 is too old to use that one :p
16:32karolherbst: what's the latest one supported on tesla 6.5?
16:34pmoreau: I think 6.5 sounds about right
16:34karolherbst: ahh yeah
16:35pmoreau: I wanted to check in one which hadn’t been influenced by Fermi, since it seems the behaviour might have changed.
16:36karolherbst: "dnf install cuda-documentation-6-5" :D
16:38karolherbst: let's see
16:39pmoreau: In CUDA 2.0 and 3.0 (the first one with Fermi support), they changed the language a bit: “There is a limit to the number of threads per block, since all threads of a block are expected to reside on the same processor core and must share the limited memory resources of that core.”
16:39karolherbst: Maximum amount of shared memory per thread block: 16 kb
16:41karolherbst: pmoreau: I mean.. we can advertise 1.0 on tesla. I think I just prefer that we have a method checking all the requiernments inside clover
16:41karolherbst: so we don't have to constantly look it up