06:13karolherbst: pmoreau: nice
06:18karolherbst: pmoreau: you forgot to add clCreateProgramWithIL as well
06:18karolherbst: or at least to export it
06:20karolherbst: I doubt it makes sense to skip it before clover reaches 2.1 or something
07:04karolherbst: pmoreau: and integrated :) https://github.com/karolherbst/mesa/commits/nouveau_nir_spirv_opencl
07:04karolherbst: seems to work nicely if I ignore those nir fails
07:05karolherbst: we should think what to do with that TGSI stuff
07:05karolherbst: if nobody uses that I would say we should just remove support for it...
07:05karolherbst: and I am sure nobody is interested in a cl -> TGSI path anyway?
07:59tomeu: robclark: ended up hacking on compute stuff this weekend?
08:01pmoreau: karolherbst: Yeah, I would go with removing it as well, if nobody’s using it.
08:02pmoreau: I didn’t sent the series yesterday, cause 1) the mailing was down anyway, and 2) I was too tired to write a cover letter that would have made sense. :-D
08:02tomeu:has no idea of who could be interested on the tgsi path
08:02pmoreau: Me neither
08:03pmoreau: karolherbst: Ah, either I made a mistake, or you picked the wrong branch.
08:03pmoreau: (For clCreateProgramWithIL)
08:06pmoreau: You did make a mistake: the branch clover_spirv_only does have those OpenCL 2.1 functions. I guess you went with the clover_spirv_series_v2 one, which does not have it, cause I am not planning to send them before we reach 2.0.
08:09karolherbst: pmoreau: uhm...
08:09karolherbst: pmoreau: send it anyway
08:09karolherbst: there is no reason not to
08:11karolherbst: well the only annoying part is to add that ICD stuff
08:11pmoreau: Well, I would need to find a way to not expose those functions in cl.h, cause curro did not want to. :-) But I could add a temporary header in clover to define them.
08:11karolherbst: why not?
08:11karolherbst: we should just ship the most recent cl.h anyway
08:11karolherbst: nobody uses the mesa cl.h
08:12karolherbst: and nobody should use vendor cl.h in the first place
08:12pmoreau: True, but that does not mean we should export things we do not support.
08:12karolherbst: but we have clCreateProgramWithIL
08:12karolherbst: normally you compile against ocl-icd
08:12karolherbst: and run against whatever OpenCL impl
08:13karolherbst: maybe even run against rocm and have a nvidia gpu running with nouveau
08:13pmoreau: By the way, the Khronos ICD loader does have clCreateProgramWithIL, and even OpenCL 2.2 support.
08:13karolherbst: and most will compile against that
08:13karolherbst: and if something compiled against ocl-icd doesn't run with mesa
08:13karolherbst: it is a bug in mesa
08:14karolherbst: or mesa can't be used as a system opencl impl
08:14karolherbst: but has to be used through icd
08:14karolherbst: which means, we have to add func pointers in the dispatch.Xpp files
08:15karolherbst: you know at runtime what the OpenCL feature level of the devices are
08:15karolherbst: you have to check every device anyway
08:15karolherbst: otherwise your application is faulty
08:15karolherbst: never depend on information at compile time
08:16pmoreau: It seems weird that OpenCL implementations would need to fake they support all functionalities when they do not
08:16karolherbst: if you want to be a system OpenCL implementation, you kind of have to
08:16karolherbst: otherwise your library won't get used as a system OpenCL implementation
08:16karolherbst: it is really that simple
08:17karolherbst: I don't really care about Mesa as the system one
08:17karolherbst: the ICD stuff _has_ to work
08:17karolherbst: and we could expose null ptrs for all 1.2+ functions
08:18karolherbst: and just have clCreateProgramWithIL providing a real value
08:20karolherbst: pmoreau: I think the cl_khr_il_program extension doesn't get added to the extension string
08:21pmoreau: Why not? https://phabricator.pmoreau.org/rMESA7b10e14948147a9d270ef10cf82f704b74ac41a6
08:21karolherbst: mhh it doesn, but your util.cpp file in the example repository doesn't pick it up
08:21pmoreau: Do you have SPIRV as a supported IR? ;-)
08:21pmoreau: Cause my example code does pick it up on my computer.
08:21karolherbst: ohh wait
08:21karolherbst: it was my fault
08:21karolherbst: you have that opencl_version >= "1.2" check as well
08:22karolherbst: yeah, nvm
08:22karolherbst: it works with just that check removed as well
08:22pmoreau: I have a bump to 1.2 on the branch I told you to use :-p
08:23karolherbst: pmoreau: anyway, I think we can just add clCreateProgramWithIL through that ICD interface
08:23karolherbst: stick the correct amount of NULLs into the dispatch table
08:23karolherbst: I think we woulc need to add enough func pointers declerations as well
08:23karolherbst: but this could be a second series on top of the clCreateProgramWithILKHR one
08:25karolherbst: pmoreau: rocm exposes clCreateProgramWithILKHR just for OpenCL 2.0+ ...
08:25pmoreau: Weird that they don’t do it for 1.2
08:25karolherbst: it is actually a 2.0 extension
08:25karolherbst: I am sure it could be even used with a 1.0 feature set, right?
08:25pmoreau: It is a 1.2 extension as well
08:26karolherbst: or do you really want to have 1.2 for that?
08:26karolherbst: I am kind of strictly against pointless limitations...
08:26pmoreau: You want the split of clBuildProgram in clCompileProgram + clLinkProgram
08:27pmoreau: I can’t remember if that’s part of 1.1 or 1.2
08:27karolherbst: sounds like 1.2
08:27pmoreau: Yep, it’s 1.2
08:27karolherbst: yeah, makes sense then
08:28karolherbst: pmoreau: anyway, you have to keep in mind, that ocl-icd doesn't expose clCreateProgramWithILKHR, so there is nearly nobody being able to use that through mesa
08:29pmoreau: It’s not even in the official OpenCL headers
08:29karolherbst: that's why I think exposing clCreateProgramWithIL is the good alternative here
08:29pmoreau: I am still waiting on that pull request to get merged though
08:29karolherbst: I don't really care about exporting the symbol though
08:30karolherbst: but it should be accessable through the ICD interface
08:30pmoreau: I’ll need to make one for the ICD loader as well I guess
08:30karolherbst: not really
08:30karolherbst: it is the same
08:30karolherbst: the ICD is just a function pointer table
08:30pmoreau: pull request I mean
08:30karolherbst: the icd just contains clCreateProgramWithIL afaik
08:31pmoreau: Making one pull request for the OpenCL-Headers is not going to magically have it appear in the ICD loader
08:31karolherbst: that's why I talk about clCreateProgramWithIL
08:31karolherbst: not clCreateProgramWithILKHR
08:31karolherbst: clCreateProgramWithILKHR is an extension
08:31pmoreau: But that extension should still be in the ICD loader
08:31karolherbst: and you get that through that extension func pointer function
08:31pmoreau: Somehow you have extensions in the ICD loader as well
08:32karolherbst: clGetExtensionFunctionAddressForPlatform / clGetExtensionFunctionAddress
08:32karolherbst: through that
08:32karolherbst: those are part of the ICD
08:32karolherbst: your driver exposes those through ICD
08:32karolherbst: you get clCreateProgramWithILKHR as well
08:32karolherbst: there is no clCreateProgramWithILKHR ;)
08:33karolherbst: just clCreateProgramWithIL
08:33pmoreau: Why could one extension be there and not the other one?
08:33karolherbst: because extensions are retrieved by other means
08:33karolherbst: clGetExtensionFunctionAddressForPlatform / clGetExtensionFunctionAddress
08:33karolherbst: you get the driver version of those
08:33karolherbst: and the driver version can handle clCreateProgramWithILKHR
08:33karolherbst: so why would the ICD even care?
08:34pmoreau: cl_khr_egl_image is an extension as well, which you retrieve through those clGetExtensionFunctionAddress functions, but why is it in the ICD loader then?
08:34karolherbst: because it is part of the spec I suppose
08:34pmoreau: Then shouldn’t it drop it KHR suffix?
08:34karolherbst: I mean, the extension might be part of the ICD spec
08:34pmoreau: I become clCreateFromEGLImage, like clCreateProgramWithIL?
08:34karolherbst: there is a problem for the ICD
08:34karolherbst: you can't rename/remove shit
08:35karolherbst: if it is there
08:35karolherbst: it has to stay forever
08:35karolherbst: undere the same name
08:35pmoreau: And you can’t have both I guess?
08:36karolherbst: you can
08:36karolherbst: the ICD is there to provide a stable ABI
08:36karolherbst: so that you can link against any cl implementation
08:36karolherbst: and run against any cl implementation
08:37karolherbst: if clCreateProgramWithILKHR never gets exposed as a symbol, it is good, because nobody has to care about it later on
08:37karolherbst: and just get the func pointer through the extension functions
08:37karolherbst: the same situation we have in OpenGL where implementation started to export symbols they don't have to
08:37karolherbst: and every impl has to deal with that now
08:39pmoreau: I get that you have to get the function pointer through the extension functions, I just don’t see why you do that for some extensions, and not for others
08:39karolherbst: because vendors are crappy
08:39karolherbst: maybe khornos did it on purpose
08:39karolherbst: who knows
08:40karolherbst: we don't need an entry for clCreateProgramWithILKHR in the ICD, just for clCreateProgramWithIL :)
08:42pmoreau: That I haven’t done (neither for one nor the other).
08:42karolherbst: yeah. I think it would make sense to do that in a second series
08:43karolherbst: and just copy/paste everything up to 2.2 while at it
08:43pmoreau: I agree with the second series.
08:43karolherbst: for me it just makes sense to expose clCreateProgramWithIL already
08:43karolherbst: but yeah
08:43karolherbst: we can do that in a second series
08:43pmoreau: I’ll let you do that, since you seem to better understand how that ICD loader works. ;-)
08:44karolherbst: I'll write the patch for it today then
08:45pmoreau: Go for it! :-)
08:47pmoreau: I might look tonight into rewriting my patch for generating SPIR-V in clCreateProgramWithSource to use tomeu’s code. But I doubt I’ll have time; still need to send the series plus do a couple of pull requests for SPIRV-Tools.
08:57pmoreau: karolherbst: If you plan on sending the patch today for the dispatch table, I can rebase my series on top of yours to fill in the dispatch table for clCreateProgramWithIL.
09:00karolherbst: pmoreau: good idea actually
09:18karolherbst: pmoreau: uhh, hpp11 compile error when dispatch.hpp includes CL/cl_egl.h :(
09:19pmoreau: Which error do you get?
09:19karolherbst: ../src/gallium/state_trackers/clover/spirv/spirv.hpp11:159:5: error: expected identifier before numeric constant
09:19pmoreau: I guess it’s a define thing once more
09:19karolherbst: None = 0,
09:20karolherbst: clang for the rescue
09:20pmoreau: Ah, I have been having issues with None in a C++ program recently
09:20pmoreau: It seemed to be a C++ keyword now
09:21karolherbst: /usr/include/X11/X.h:115:30: note: expanded from macro 'None'
09:21karolherbst: #define None 0L /* universal null resource or null atom */
09:21karolherbst: mesa doesn't have CL/cl_egl.h
09:21karolherbst: so the system one gets incuded
09:22karolherbst: which includes system CL/cl.h
09:22pmoreau: I don’t think clover supports any of the egl extensions
09:22karolherbst: oh well, I will deal with that after I made it to the office
09:22karolherbst: pmoreau: right, but we need those updated headers for ICD support
09:22pmoreau: Sounds like a reasonable plan
09:23karolherbst: pmoreau: do you by any chance have already a commit to update the cl headers?
09:23karolherbst: I fear they are somehow modified compared to the khronos ones though
09:24karolherbst: I mean the mesa ones
09:24pmoreau: They should be the Khronos ones (well, of OpenCL 1.1, but still the Khronos ones).
09:24karolherbst: ohh, you have one for 2.1
09:25karolherbst: I have a better idea
09:25pmoreau: Well, I did edit it them to have a declaration of clCreateProgramWithIL and clCreateProgramWithILKHR.
09:25karolherbst: I just add untyped pointers until we add those headers
09:34karolherbst: pmoreau: https://github.com/karolherbst/mesa/commit/ff19ef12437904f07466ddd5ae7ab496c42fae04 :)
09:34karolherbst: I guess I will qualify all pointers for which we don't have to update the headers to make it easier
09:35karolherbst: anyaway, of to work
09:35pmoreau: Cool :-)
10:04karolherbst: pmoreau: nice, both versions are wokring through the ICD interface :)
10:05pmoreau: Both? clCreateProgramWithILKHR and clCreateProgramWithIL? Or do you mean two different code versions?
10:06karolherbst: both paths of your util.cpp stuff
10:06karolherbst: 1. clCreateProgramWithIL directly 2. clCreateProgramWithILKHR through fetching the func pointer
10:08pmoreau: Okay :-)
10:09pmoreau: I think they were both already working for me, but that might have been a bug
10:22karolherbst: pmoreau: clGetKernelSubGroupInfo has both the core and KHR version in ICD
10:23pmoreau: Given that cl_khr_il_program is not in the official headers, I would guess they just completely forgot to update both the headers and the ICD for it.
10:23pmoreau: Or it’s just a complete mess :-)
10:24pmoreau: Which could be plausible
10:37karolherbst: pmoreau: https://github.com/karolherbst/mesa/commit/e63f8c6302f4cb3e6353b036861b3703748b4d56 :)
10:37karolherbst: pmoreau: this version should be better for now: https://github.com/karolherbst/mesa/commit/95e6c02ea5563b7f9297b62f16a30871812cdd24
10:38karolherbst: it should work with your CreateProgramWithILKHR stuff already
10:39tomeu:starts seriously considering setting some personal time aside to hack on it, before others have all the fun
10:41karolherbst: too late
10:43karolherbst: tomeu: did you talk with somebody from khronos about your llvm-spirv work?
10:43karolherbst: I mean I don't care if they like or dislike this, just curious
10:43karolherbst: I would prefer your module approach regardless
10:44tomeu: not yet, but I did email the llvm ml about it
10:44karolherbst: I see
10:45tomeu: also, jakob is going to the f2f meeting in taipei this week
10:45tomeu: planning to talk about it with a few people, including codeplay's neil
10:45tomeu: and there's iwocl in march, i think
10:49karolherbst: I see
10:49karolherbst: well we can push that clCreateProgramWithSource back until this is figured out though
10:50karolherbst: for me getting any way working is higher priority then clCreateProgramWithSource support
10:51karolherbst: fun, the icd doesn't check the returned pointer from the driver
10:51karolherbst: like I care
10:55karolherbst: pmoreau: I am kind of wondering, how to pass something non SPIR-V to clCreateProgramWithIL?
10:55tomeu: well, I don't really see much alternative
10:55tomeu: I think we are stuck with llvm-spirv or something equivalent until it gets merged in llvm
10:55tomeu: which I don't expect to happen any time soon
10:56karolherbst: if it is a module it doesn't have to get merged, right?
10:56karolherbst: and I am fine with adding another dependency for that
10:56tomeu: no, but someone has to maintain it, and fix it every time API changes
10:57tomeu: I personally have no plans to work on upstreaming to LLVM, but have offered people willing to work on it to use llvm-spirv as a staging project
10:57tomeu: as long as they don't break mesa :)
10:58karolherbst: I fear I might end up having to maintain it then....
10:59karolherbst: tomeu: or do you know others you might actually be interested in doing that?
11:00tomeu: I think collabora could help with that if everything goes as expected on the commercial side of things
11:01tomeu: I think there's interest in opencl in #etnaviv, so maybe someone could end up lending a hand there
11:01karolherbst: my problem is, I have 0 LLVM knowledge and nearly 0 SPIR-V knowledge :)
11:01karolherbst: but maybe until then it will be better
11:02karolherbst: ohh right, let's ask them
11:02tomeu: me too, but I think that with a good test suite and with help from the community, things can turn out well
11:33pmoreau: Nice for the f2f meeting! And I hope that by IWOCL there will be a clearer plan regarding llvm-spirv, rather than having three separate projects trying to do the same thing.
11:36pmoreau: karolherbst: Anything non SPIR-V, for clCreateProgramWithIL is implementation specific. So you just pass in your IL as you would for SPIR-V, and the implementation will do its job.
11:38pmoreau: From the spec, for clCreateProgramWithIL: “`il` is a pointer to `length`-byte block of memory containing SPIR-V or an implementation-defined intermediate language.”
11:41karolherbst: pmoreau: well right, but what if an implementation supports two different ILs?
11:41karolherbst: and how does the application know about it?
11:42karolherbst: I mean sure, you can ask the platform of the devices you have avaiable and check against some kind of whitelist, but is this really how it should be?
11:42pmoreau: You can get the list of supported ILs by querying DEVICE_IL_VERSION:
11:42karolherbst: ahh, I see
11:43pmoreau: “The intermediate languages that can be supported by clCreateProgramWithIL for this device. Returns a space-separated list of IL version strings of the form <IL_Prefix>_ <Major_Version>.<Minor_Version>. For OpenCL 2.1, “SPIR-V” is a required IL prefix.”
11:43karolherbst: okay right
11:43karolherbst: but in the case you support two or more
11:43karolherbst: the implementation still needs to validate against all to figure out what it actually got, right?
11:44karolherbst: oh well
11:44karolherbst: I guess adding a string parameter to clCreateProgramWithIL was not a good idea for whatever reason
11:44pmoreau: On the other hand, are applications really going to use an implementation specific IL?
11:44karolherbst: who knows?
11:45karolherbst: I mean if there shouldn't be one in the first place, then they could just add clCreateProgramWithSPIRV
11:45pmoreau: So, applications will have a white list of drivers: for NVIDIA, let’s submit the kernels as PTX, for AMD, here’s some LLVM IR, Mesa? here is some NIR, etc.
11:45karolherbst: it sounds like this was the plan
11:46karolherbst: is there any other way on how to do that?
11:47pmoreau: I would see it differently: why have two entry points for submitting an IL, when you could have one? Most applications will only ever submit SPIR-V, but maybe some might have a special case, because there are only targeting one platform anyway, so they let the user submit other stuff than SPIR-V.
11:49karolherbst: pmoreau: that's why they should have added a string parameter
11:49karolherbst: parsing that string compared to parsing the IR is cheap
11:50pmoreau: Depends: for SPIR-V, your just look at the first 32-bit word.
11:50karolherbst: well right
11:50karolherbst: but you know
11:50karolherbst: such APIs should be more generic than this
11:50pmoreau: You don’t need to parse the whole IL to figure out what it is. If you serialise to some binary, you will most likely have a magic number to start with.
11:51karolherbst: of course you could say the first 32 bits are a magic number and so on
11:51karolherbst: but then you still have the possibility to run into conflicts
11:53pmoreau: It’s the implementer’s job to do it in a reasonable way. If they want to support 32 different input ILs, sure, go ahead. Though I guess an extra string would have been nice.
11:54pmoreau: I guess they went with the same interface as clCreateProgramWithBinary, which does not specify either what type of binary it is.
11:58karolherbst: pmoreau: did you actually send your series out? I doubt I got any mails
12:00pmoreau: I haven’t: as I told you, I didn’t do it yesterday, and I will do it today, but today means this evening, cause working on Nouveau is not part of my job, and I tend to work during the day, so evenings/nights/weekends are the only time I can work on Nouveau. :-)
12:00karolherbst: right, missed that
12:01tomeu: pmoreau: wouldn't you like to change that? :p
12:01karolherbst: tomeu: there has to be a company for that though
12:01karolherbst: uhm... what I actually tried to say, it won't most likley not work out where I am currently am
12:01pmoreau: tomeu: I’d like to finish my PhD first, and then the best job would be one with a mix of research and driver development. :-)
12:02tomeu: karolherbst: a company?
12:02tomeu: pmoreau: ack, I hate phds :)
12:03karolherbst: tomeu: yeah well, if you wants to make working on Nouveau his job, he kind of needs to find somebody willing to pay for that
12:03pmoreau: I think it is easier to work on Nouveau as a hobby, than to do research and submit to conferences/journals as a hobby.
12:03tomeu: pmoreau: makes sense
12:04pmoreau: Well, that as well :-D I’m not going to leave my job to work for free xD
12:05tomeu: I don't think it's difficult at all to find a company to hire someone with demonstrated FOSS experience
12:05karolherbst: tomeu: right, but I was talking about working on Nouveau on working time
12:05tomeu: it's more difficult to match very specific areas to work on, or the research stuff pmoreau mentioned
12:06tomeu: yeah, it's difficult to predict what socs most people will be using in a couple of years
12:07karolherbst: well right
12:07tomeu: but with opencl, nouveau on tegra should be very interesting for CV, machine learning, etc
12:07karolherbst: but for compute it is kind of obvious for the next few years
12:07tomeu: so who knows :)
12:07karolherbst: I doubt OpenCL will be a thing like ever
12:07karolherbst: I am sure if they do Vulkan Compute right it could take of a little
12:08tomeu: cannot say much, but opencl in ambedded has a bigger chance than in desktop/servers
12:08karolherbst: but I can't predict the future so it might happen with OpenCL
12:08tomeu: and vulkan compute will take some time
12:09karolherbst: well, you can use CUDA on the Tegra devices as well afaik
12:09pmoreau: You should be able to
12:10tomeu: with open drivers?
12:10pmoreau: They have their now sm_x.y version, which are supported by nvcc, so it should work. Haven’t tried it.
12:10karolherbst: tomeu: in the future? why not?
12:10pmoreau: Nouveau does not support CUDA
12:11karolherbst: tomeu: I think the biggest problem is that the most significant users of SOCs + compute don't really care about Open Source drivers anyway
12:11karolherbst: I never heard the opposite
12:11tomeu: right, so the problem I'm worried about is that we have nice GPUs with nice free drivers, but some people are forced to use binary blobs just because of the compute stuff
12:11karolherbst: so they will just evaluate Tegra+Cuda or anything else + OpenCL
12:11karolherbst: and choose CUDA
12:12tomeu: most of the industry doesn't care, but most of our customers do :)
12:12karolherbst: I mean I am not saying it can be different
12:12karolherbst: but I am just saying that 1. We don't know if OpenCL will be a proper thing 2. We don't know what will be the proper thing 3. We might end up with Open Source CUDA
12:13tomeu: I'm also worried by the absence of mobile compute braking devleopment on higher level APIs
12:13karolherbst: or maybe not
12:13karolherbst: who knows
12:13karolherbst: the only thing I know for sure is, that we need SPIR-V support in Nouveau :)
12:14karolherbst: and doing OpenCL SPIR-V to nvir is just on top of that for now
12:16pmoreau: When I started working on it, the goal was to get OpenCL support for Nouveau. Then the question was which IL to use, and since SPIR-V had just been announced, would be fed directly in OpenCL 2.1+, is used in Vulkan and can be used in OpenGL, it became the pick.
12:17karolherbst: and I think on top of that using a spir-v to nir to nvir path would share a lot of work between all drivers supporting nir
12:32vedranm: any chemists? anyone here knows/uses VMD?
12:35pmoreau: Nope, sorry
12:36pmoreau: Does it have an OpenCL backend and its not working/crashing on Radeon?
12:36vedranm: pmoreau: OK, I'll just do a bug report
12:36vedranm: pmoreau: actually, it has OpenGL backend that's crashing on nouveau
12:36pmoreau: Ah, okay
12:37pmoreau: What kind of error messages are emitted by Nouveau?
12:38vedranm: pmoreau: lemme quickly post apitrace/stderr/dmesg
12:46vedranm: pmoreau: https://bugs.freedesktop.org/show_bug.cgi?id=104728
12:47karolherbst: vedranm: did you trace with nvidia and run on Nouveau>
12:47karolherbst: that might fail for a lot of different reasons
12:47karolherbst: and nothing we can do about
12:48karolherbst: ohh wait, you didn't
12:48karolherbst: my mistake
12:48vedranm: karolherbst: no, nouveau all the way
12:48vedranm: nvidia driver is not an option on that particular machine
12:48karolherbst: I think the issue is within your application :p
12:48karolherbst: let me verify
12:49karolherbst: or your machine
12:49vedranm: dmesg is clean, btw, I would post it otherwsie
12:49karolherbst: it runs without crashes
12:49karolherbst: I think
12:49karolherbst: the call causing the crash isn't recorded
12:49karolherbst: if you replay the trace, does it crash?
12:50vedranm: on the same machine or?
12:50karolherbst: doesn't really matter
12:50karolherbst: should run nouveau
12:50vedranm: heh :) that's the only such machine I have
12:50vedranm: lemme check
12:50karolherbst: I think what happens is, that this VMD stuff calls a func pointer without checking if the function is available at all
12:51karolherbst: what might help is to check the backtrace
12:51karolherbst: and see what gl function was called
12:56pmoreau: It seems to replay fine on the blob. I’ll try to run the application on Nouveau tonight. Some steps on what to do to reproduce it would be appreciated, i.e. is it enough to just start the application, or do you need to click on things to crash it?
12:57karolherbst: it replays fine on Nouveau
12:57karolherbst: that's why I am asking
12:57karolherbst: apitrace + crashes is a big pile of fail if the fail occurs inside a gl call
12:58karolherbst: yeah.. I am not going to register to download that stuff
12:59karolherbst: vedranm: tell them it is stupid to require registering for downloading stuff
13:05vedranm: karolherbst: check your pm
13:23robclark: tomeu, so I built your llvm-spirv branch.. still trying to figure out how to make tools/llvm-spirv/llvm-spirv do something other than "Fails to open input file:"..
13:26karolherbst: robclark: it is like clang
13:26karolherbst: kind of
13:26tomeu: yeah, i had some trouble the first time
13:26karolherbst: try --help ;)
13:26karolherbst: tomeu: will it accept the same args as that clang++ from khorons thing?
13:26robclark: karolherbst, not that I know anything about clang :-P
13:26tomeu: karolherbst: yeah, I didn't change anything
13:27robclark: I did try --help.. although don't see anything obvious about spirv there..
13:27karolherbst: uhh wait
13:27karolherbst: or something like that
13:27robclark: -help-list shows more
13:29tomeu: see the README
13:29karolherbst: weird thing
13:29tomeu: build/tools/llvm-spirv/llvm-spirv ../clspv/test/int_sub.bc -o int_sub.spv
13:29karolherbst: so this is spir-v to llvm?
13:29karolherbst: yeah,... that is poinntless
13:29tomeu: llvm to spirv
13:30tomeu: but it can do the other way as well
13:30karolherbst: I see
13:30karolherbst: robclark: tools/llvm-spirv/llvm-spirv -r /home/kherbst/git/spir-v-testing/instruction_sets/OpenCL/smad24.spv
13:30tomeu: with clang you can pass it a cl c file, and get llvm bytecode from it
13:30karolherbst: but that .l file is kind of empty
13:31tomeu: it can also output a text representation of the spirv
13:31karolherbst: ohh wait
13:31karolherbst: it put it where the spv file was
13:31karolherbst: okay, so this kind of works
13:32tomeu: I was thinking that clover would use the passes from llvm-spirv directly
13:32karolherbst: how can I pretty print *.bc files?
13:32tomeu: don't know
13:32tomeu: guess clang can?
13:33robclark: hmm, what is .bc?
13:33karolherbst: llvm bytecode
13:33robclark: hmm, ok..
13:33robclark: so do I need to use something else to go from .cl to .bc? I was assuming I could just feed in .cl
13:33karolherbst: I think clang can be told to use llvm-spirv as a plugin
13:34karolherbst: and just compile anything to spir64-unknown-unknown
13:34karolherbst: robclark: well, you need a frontend, which is clang
13:34karolherbst: clang -S -emit-llvm
13:35karolherbst: llc X.ll
13:35karolherbst: llvm-spirv X.bc
13:35karolherbst: not llc
13:36karolherbst: but then it fails for other reasons
13:36karolherbst: llvm-spirv: ../lib/TransOCLMD.cpp:122: void SPIRV::TransOCLMD::visit(llvm::Module*): Assertion `(Arch == Triple::spir || Arch == Triple::spir64) && "Invalid triple"' failed.
13:37robclark: ok, yeah, seeing same thing..
13:38karolherbst: yeah, the ll contains the wrong trupe
13:39robclark: ahh, -march= I guess..
13:39karolherbst: or change that to spir64-unknown-unknown in the ll file :)
13:40robclark: hmm, but ofc clang doesn't like that triplet
13:40karolherbst: sed helps
13:41robclark: heh, yeah..
13:41karolherbst: otherwise I can just give you spv files
13:41robclark:has it working..
13:42robclark: I guess sed it will be, to wrap this up in a nice script so I don't have to remember all the magic incantations :-P
13:43karolherbst: I still have that clang++ from the khronos llvm-spirv which does all that in one command
13:44karolherbst: let me try something
13:48karolherbst: got an elf.o
13:49karolherbst: tomeu: any idea how to use your llvm-spirv thing as a llvm plugin?
13:51karolherbst: oh well
13:56karolherbst: robclark: I suppose you wanted to work on getting that function arguments loading working?
13:57robclark: karolherbst, mmm, maybe.. I was thinking to do the pointer thing first.. but first I just wanted to feed some spv to spirv_to_nir and see what happened..
13:58robclark: I think fxn args can be ssbo, but not sure how spirv_to_nir translates them now
13:58karolherbst: it doesn't do anything with those
13:58karolherbst: they stay func args
13:58karolherbst: and in the body you get load_var/store_var
13:58pmoreau: karolherbst: Something like this I guess? https://clang.llvm.org/docs/ClangPlugins.html
14:00robclark: karolherbst, hmm, ok.. then that probably makes it easier to have a separate nir_lower_fxn_args_to_ssbo type pass..
14:00karolherbst: I am not sure if using ssbos is really the best way to do that
14:00robclark: (ie. not have to bake it into spirv_to_nir)
14:00karolherbst: it is global memory
14:00karolherbst: on nouveau
14:00vedranm: karolherbst: indeed, apitrace replay doesn't crash here
14:01robclark: karolherbst, hmm, not sure I understand why it would be any different than an ssbo.. ssbo is is global memory for me..
14:01karolherbst: afaik those things are handling like ubo or uniforms by nvidia
14:02robclark: hmm, uniforms are different, but ubo and ssbo is all in global memory for me, ie. I don't copy it to on-chip memory..
14:02karolherbst: it isn't global mem for nouveau
14:03pmoreau: robclark: Arguments passed to kernels are copied (done by the hardware?) to on-chip memory for Tesla, and constant memory for Fermi+
14:03karolherbst: on nouveau ubo: c1+x uniform: c0
14:03robclark: pmoreau, even output args?
14:03karolherbst: robclark: g address is passed by value ;)
14:04karolherbst: so you get the address for the output to g from c
14:04pmoreau: There is no output argument per se to a kernel: just a pointer to some (global) memory you read after the kernel ran
14:05karolherbst: I think we should handle those like uniforms
14:06karolherbst: pmoreau: c0 is readonly g?
14:06karolherbst: or is there more magic to it?
14:07pmoreau: It is readonly g + it does/did go through a different cache and accesses can be done in a few cycles if the access is direct IIRC
14:07karolherbst: I see
14:07karolherbst: pmoreau: how is it currently done with your spir-v work?
14:08karolherbst: I only saw nvdia using c0 and c15 for that sort of stuff anyway
14:08robclark: karolherbst, well anyways, I guess that is more reason in favor of having separate nir pass to lower fxn args.. so nouveau could have a different lowering pass if it wanted to
14:09karolherbst: but in the end clover decides, right?
14:09pmoreau: karolherbst: I just pre-load the value from wherever it is smem/cmem before parsing any of the code in the function body.
14:09karolherbst: I mean the function args are somehow handled in clover and something called on the driver to store them somewhere
14:10karolherbst: pmoreau: well right, but does clover decides where to store them?
14:10karolherbst: how is the code path?
14:10pmoreau: It just passes a CPU buffer, that gets uploaded by Nouveau
14:10karolherbst: right, but what functions are called in gallium for this?
14:11pmoreau: info contains input, which is the input buffer we are talking about: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/clover/core/kernel.cpp#n85
14:11pmoreau: As for Nouveau, the uploading is done here: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c#n384
14:12pmoreau: which is called by nvc0_launch_grid()
14:13karolherbst: so pipe->launch_grid
14:13pmoreau: Corresponding code for nv50: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nv50/nv50_compute.c#n195
14:13karolherbst: I see
14:14robclark: karolherbst, well, sounded like right now clover didn't decide how to handle input/output (although my thinking was that it should be the one setting up ssbo state).. but if nouveau doesn't want ssbo then maybe that isn't hte best way.. idk
14:15karolherbst: maybe we should have intrinsics like load_func_param?
14:15karolherbst: then we have common code at nir level
14:15karolherbst: and still driver being able to do whatever
14:16karolherbst: I mean in the end the application calls clSetKernelArg, right?
14:16pmoreau: OpenCL does not seem to have a concept of arguments being “input” or “output” ones: they are just arguments to it.
14:16pmoreau: Yes, it calls clSetKernelArg to pass an argument to the kernel.
14:17karolherbst: mhh obj(d_kern).args().at(idx).set(size, value);
14:17karolherbst: that helps...
14:17pmoreau: If you want to pass an “input” array, you allocate a buffer first, and then you pass the handle of that buffer to clSetKernelArg
14:18pmoreau: obj() returns the clover object from the OpenCL handle d_kern
14:19robclark: karolherbst, hmm, perhaps.. for freedreno I'd still want to lower to ssbo I think, since for me they are basically the same thing.. but compared to tgsi the driver has more control over which nir passes to run
14:20karolherbst: in the end you can use the same code in the driver
14:20karolherbst: my idea was, that nir just tells the driver: this is an function argument, you know where to get the value from
14:20karolherbst: in the end it is handled just like compute shaders
14:21karolherbst: pmoreau: found the code I was looking for: PUSH_DATAp(push, info->input, cp->parm_size / 4);
14:22pmoreau: On line 404 O_O
14:22karolherbst: but seriously
14:22karolherbst: code like this: (0 << 8) | 1
14:22karolherbst: pmoreau: mhh, I thought something like that "const unsigned base = NVC0_CB_USR_INFO(5);" points to c5?
14:23karolherbst: I still don't know much about the GPU graph interface
14:24karolherbst: robclark: I mean in the end there are two things we need to know: where to fetch it from and offset. I don't think that having drivers to write their own nir passes to get the first part is the correct path
14:24karolherbst: especially because nir should make it easy for drivers
14:24pmoreau: Yeah, I would expect it to point to c5 as well
14:25karolherbst: ohh, that is nvc0
14:25karolherbst: I slowly think my nir pass is just broken on pre maxwell
14:25karolherbst: might work on kepler
14:26karolherbst: yeah, mhh
14:26robclark: karolherbst, I was thinking to put the nir pass in src/compiler/nir .. you can just choose not to call it if you don't want to ;-)
14:26karolherbst: robclark: well, then I end up with writing my own pass
14:28pmoreau: robclark: Is ssbo the best option even for read-only? (I haven’t played much, if at all with them, so I’m curious.)
14:29robclark: karolherbst, I guess you probably just need an option to not lower load_var for fxn params, or maybe an option to lower it differently.. I guess I'd be curious what (for ex) intel/amd do..
14:29karolherbst: robclark: I really don't think that having a collection of nir passes the driver has to call to support this is the correct way to go though. I really would just place the burden on all drivers to implement load_function_param which an offset in bits/bytes as a value
14:30robclark: pmoreau, it is kinda all the same thing, it is just a difference of how the hw calculates the address.. for ssbo case we have a ssbo state object which has base address.. if I use raw load/store instructions then I need to pass the base address to the shader via a uniform which is a bit more annoying
14:31pmoreau: Ah, okay. And performance-wise there is no difference, or regarding the caching of data loaded through ssbo vs ubo?
14:36robclark: not really
14:37robclark: they have a sort of unified cache, and it's all just system memory.. the only thing that is "special" are uniforms which get copied to on-chip memory (but that is too small to use for ubo's in general)
14:39pmoreau: However that might be enough for the arguments passed to a kernel? (I am not sure what the limit is for those.)
14:39karolherbst: pmoreau: is there actually a limit for OpenCl kernels?
14:39karolherbst: not that some decides to pass structs by value or that sort of thing
14:39pmoreau: I think so... let me check
14:40pmoreau: I definitely do pass structs by value, to avoid having 20 pointer arguments to my kernel, I group them in structs instead, and only have 3/4 arguments that way. :-D
14:40karolherbst: you can pass structs by pointer
14:41pmoreau: Why would I want to incur the overhead of reading from global mem, when I can read from cmem?
14:41karolherbst: but that might take up some space
14:41karolherbst: ahh there is a CL_DEVICE_MAX_PARAMETER_SIZE
14:42karolherbst: nouveau exposes 4kB
14:42pmoreau: As much as if I had given them explicitly to the kernel rather than grouping them in a struct. :-) Plus, it makes it more visible that alignment still plays some role.
14:42karolherbst: robclark: how much fits in the on-chip memory thing?
14:44robclark: maybe something like 256 dwords.. or it might end up working similar to shader instructions..
14:45robclark: ie. execute out of on-chip memory if they are not too big, but use on-chip as an instruction cache otherwise
14:45karolherbst: minimum for CL_DEVICE_MAX_PARAMETER_SIZE is 256 bytes
14:45karolherbst: ahh I see
14:46robclark: since might be bigger for later gen's, I'd need to go back and check..
14:46karolherbst: well I am more interested in the smalles size though
14:46robclark: anyways I don't think that is how I'd want to do things for function params on adreno
14:46karolherbst: well right, I was just curious
14:48karolherbst: I am just wondering on how this should be done on a NIR level, where we shouldn't assume anything about how the driver does things and try to focus on generic solutions to problems. That's all in the end. Well you can lower those func args to ssbo intrinsics if you want to, but I would really prefer something more generic and then generic -> to whatever the driver wants to have
14:49karolherbst: and then somebody could still write a pass to lower the generic intrinsics to ssbos or something
14:49karolherbst: or just imeplement the generic stuff
14:50karolherbst: robclark: I mean, you still need to know what offset the kernel args are written to inside that ssbo, right? I have no idea how the NIR code can get this information from
14:50robclark: karolherbst, being able to lower things differently depending on what driver needs is only of the good things about nir
14:50robclark: I don't think you should require that things get lowered the same way for all hw ;-)
14:50karolherbst: yeah, I know, and I know you don't want that either
14:51robclark: karolherbst, we can just do one ssbo per kernel arg.. that seems nice and simple
14:51karolherbst: I am wondering if a generic solution would give us less headaches overall
14:51karolherbst: robclark: 4KiB kernel arg size?
14:51karolherbst: imagine there are 30 args
14:51karolherbst: sure we could create 30 ssbos
14:51karolherbst: but then again
14:51robclark: each ssbo doesn't have to be page aligned
14:51karolherbst: how would nir even know?
14:52robclark: karolherbst, pipe_shader_buffer::buffer_offset
14:52karolherbst: can you actually use that inside a nir pass?
14:52karolherbst: or is it a good idea to do that?
14:53robclark: karolherbst, nir doesn't know/care about that
14:56karolherbst: robclark: the thing about having one ssbo per kernel arg is, that we then introduce a dependency between: how a driver uploads kernel args and how nir has to lower function args and I would rather not having such dependencies in the first place. I mean sure, if you write a nir_freedreno_lower_func_args pass and use it for freedreno that is all fine, but I don't see it makes sense to have it inside src/compiler/nir
15:01robclark: karolherbst, well, I'm assuming the ssbo approach would also make sense for some other drivers (and tbh, I guess it would work even for nouveau, even if it wasn't optimal approach)..
15:02karolherbst: well right
15:02robclark: and I was kinda thinking it would make sense for clover to setup the ssbo's, perhaps based on a cap
15:02robclark: but I haven't had a chance to think that far through it yet
15:02karolherbst: well currentl I think func args are just an array on host memory
15:02karolherbst: the driver uploads at some point somewhere on the GPU
15:03karolherbst: pmoreau: is this code also used for something compute shader related?
15:03glisse: isn't there a window for share memory amon thread configure inside the compute class
15:03glisse: iirc they are field for that
15:03glisse: so that needs to be accounted for too
15:03glisse: and iirc there is also a window for const buffer
15:03pmoreau: karolherbst: I don’t know, could have a quick look at the code
15:04glisse: maybe several window iirc they are 12 const buffer on nvidia
15:04karolherbst: glisse: yeah
15:04karolherbst: glisse: nvidia uses shared memo on tesla
15:04karolherbst: const mem on fermi+
15:04glisse: are those field 40bits or 64bits ? to know if they have the same issue as the pushbuffer :(
15:05karolherbst: no idea
15:05glisse: thought iirc they need a 8bit align address or something like that
15:05glisse: i can check i have the doc somewhere
15:05karolherbst: const bufs have a quite huge align requiernment
15:05glisse: yeah 8bit might be on AMD :)
15:06karolherbst: 8 bit sounds correct for nvidia as well though
15:06pmoreau: I think so too
15:06karolherbst: align(cp->parm_size, 0x100)
15:06karolherbst: for nvc0
15:07pmoreau: Yeah, and the alignment needs to be done on line 402 and 404 as well. I have a patch for that somewhere among the others.
15:08karolherbst: makes sense, right
15:09karolherbst: but that means that this code is really just used for stuff nobody uses nouveau with
15:10glisse: hhmm that's bad, those field are u32 in the compute class method
15:11pmoreau: I haven’t found references to the launch_grid outside of Nouveau for querying hw performance counters, and clover.
15:11glisse: so if they are not >> 8 then it means we need to do trick like for push buffer
15:11karolherbst: glisse: well, should be plenty, no?
15:11karolherbst: this is just for passing arguments
15:12glisse: issue is the address space is the same as the process
15:12glisse: so if it has to be in the first 4GB of virtual address space
15:12glisse: this is painfull
15:12glisse: as it means you have to mmap() with MAP_FIX or in a loop until mmap give you an address < 4GB
15:13karolherbst: uhm, we don't talk about pointers passing as arguments, right?
15:13karolherbst: or do you mean in general?
15:14glisse: i mean in the compute class you program through the pushbuffer
15:14pmoreau:hasn’t followed what was wrong with the current way of passing arguments (besides the missing alignment function calls).
15:15glisse: that's weird nouveau code push 2 32bits
15:16pmoreau: I’m not sure the compute code has been updated for Maxwell.
15:17pmoreau: I *think* hakzsam had a look at it, or was planning to look into it, but I’m not sure.
15:17karolherbst: yeah, we still use the kepler code paths for this I think
15:17karolherbst: or for most stuff
15:17karolherbst: there is a gm107_compute_validate_surfaces
15:18glisse: ok, so there is 2 thing in compute class, you have to set the address of the buffer ie memory backing the window
15:18glisse: and you have to pick a window address
15:18glisse: nouveau hardcode the window address
15:19pmoreau: and the obj_class has been updated for Maxwell and Pascal, but it doesn’t look like there is anything otherwise (besides the gp100_compute_dump_launch_desc)
15:19glisse: LOCAL_BASE in nouveau
15:20pmoreau: I like that comment: “Unified address space ? Who needs that ? Certainly not OpenCL.”
15:21glisse: hhmm actually no nouveau doesn't even seem to program the window register
15:22glisse: different name with nvidia make it hard to follow :)
15:24glisse: ok so the local memory window does not seem to be program but the shared memory window is hardcoded in nouveau to 0xfe << 24
15:24glisse: but the size for the window is set to 0
15:24glisse: so i am guessing it is disabled
15:25glisse: i will ask nvidia if the window need to be in the first 4GB or or 1 << 48, if the latter then it is all good
15:30karolherbst: I doubt the size is always set to 0?
15:30karolherbst: would be weird
15:31glisse: this is just details, technicaly we can reuse the pushbuffer address below 40bits as i doubt the share window or local memory need much more than few MB
15:31karolherbst: if you want big memory, you have to use the global memory
15:32glisse: yeah we shouldn't worry to much about that i will ask nvidia anyway see if we can get an answer before we have to actively figure this out
15:32karolherbst: I mean most of the code is already working for trivial OpenCL kernels
15:32glisse: SHARED_SIZE is program in launch_grid to non zero once we know the share memory size
15:33karolherbst: yeah, makes sense
15:33glisse: LOCAL_BASE is hardcoded but i don't see a size field so i am assuming the size is constant on each family
15:33glisse: probably something like 64K or smaller
16:07robclark: tomeu, I won't be stepping on any feet if I start implementing parts of OpenCL.std, I assume? (Or at least whichever instructions I run into..)
16:11karolherbst: robclark: are you also encountering the problem that those nir_variables have no location/offset information?
16:12karolherbst: robclark: nice, thanks for doing this :)
16:12robclark: for function args? I haven't gotten far enough to have that problem yet
16:12karolherbst: I see
16:12karolherbst: I just hacked two new intrinsics and it looks quite nice already, just I lack some information
16:13robclark: presumably you aren't trying to load the same param multiple times?
16:14robclark: what does it look like before you lower load_var to load_param?
16:15karolherbst: mhh and I need to lower that store_var differently as well
16:15karolherbst: needs to be a load_param + store_global
16:17robclark: what does your code to lower the load_var to load_param look like?
16:17karolherbst: it is just lower_io
16:18karolherbst: with a few changes
16:18karolherbst: I kind of hoped the nir_variable contains all the information required to handle that already
16:18robclark: presumably adding nir_var_param case..
16:19karolherbst: nir_lower_io is nice because it let you disable lowering of certain var types :)
16:20robclark: oh, yeah, actually maybe that is ok.. but I guess driver_loc isn't getting set anywhere for variables that are fxn params
16:20karolherbst: I am currently checking
16:20karolherbst: location is set
16:21karolherbst: but ufff
16:21karolherbst: this sounds like problems
16:21karolherbst: like if the args have different sizes and so on
16:23karolherbst: yep, location is just the index
16:23karolherbst: but mhh
16:23karolherbst: robclark: do you think it might be a smart idea to let the driver handle that?
16:24karolherbst: like I am sure they could just read out the params and create some map for the sizes + offsets
16:25karolherbst: well in the end a driver would need to iterate long enough until the entire param is read out
16:25karolherbst: pmoreau: would you mind cooking up an example with a struct?
16:25karolherbst: passed by value to a OpenCL kernel?
17:58robclark: tomeu, hmm, somehow I end up w/ .spv that doesn't have an OpEntryPoint.. is that even something that is valid but spirv_to_nir should handle?
19:34Lyude: nice, got a patch for the PTE issues on kepler
19:39pmoreau: karolherbst: There may be already some in memory_access. Look for the ones that include struct in their name ;-)
19:40pmoreau: Lyude: \o/
19:40karolherbst: I ended up writing it on my own :D
19:40pmoreau: If you want to play with all kind of different alignments, packed structures, I encourage you to run the memory_access tests.
19:41pmoreau: There are a few nasty things, that have been constantly annoying me, but I think I have them all sorted
19:42karolherbst: we can't toally not handle structs as params
19:46pmoreau: That double negation surprised me at first. :-D
19:49karolherbst: it is a big pile of fail
19:49pmoreau: Good luck with it! :-)
19:50karolherbst: I'll gues for you this is childs play
19:51pmoreau: Well, it’s a relatively easy case
19:51pmoreau: Does NIR parse the ByVal decoration?
19:51pmoreau: Rather, does the spirv2nir pass takes the ByVal into account? `OpDecorate %in FuncParamAttr ByVal`
19:52pmoreau: Cause that’s quite important
19:52karolherbst: no clue
19:52karolherbst: it only complains about SpvCapabilityAddresses and SpvCapabilityKernel
19:52pmoreau: Didn’t tomeu “fix” that?
19:52pmoreau: Or did you not take his changes into account?
19:53karolherbst: fun, everybody seems to say that
19:53pmoreau: Ah no, he didn’t :-D
19:53karolherbst: his one hacky mesa patch or is there emore to it?
19:53pmoreau: Not that I know of
19:54pmoreau: Interesting, how did he get spirv2nir to accept any OpenCL SPIR-V input without “accepting” those capabilities oO
19:54karolherbst: they are ignored
19:55pmoreau: Ah, so it complains as in emits a warning, not an error
19:56pmoreau: Where does it fail with that SPIR-V example?
19:56karolherbst: ptr->type is null
19:56karolherbst: for OpLoad
19:57karolherbst: yeah, weird
19:59pmoreau: What does it generate for the previous OpInBoundsPtrAccessCheck
20:00karolherbst: no clue
20:00karolherbst: it kind of crashes
20:02karolherbst: pmoreau: it is about the source type
20:02karolherbst: the res type is getting set at least
20:07karolherbst: I'll look into it tomorrow
20:19pmoreau: Yes, validation for OpenCL extended instruction set is coming to spirv-val! :-)
20:20robclark: pmoreau, karolherbst you can kinda ignore the capabilities.. although at some point spirv_to_nir will have to take into account the memory model..
20:21pmoreau: clover will be checking for capabilities support already
20:21karolherbst: we have other things to deal with for now anyway ;)
20:21pmoreau: (as well as extensions)
20:22pmoreau: The memory model might be a bit more complicated that just accepting it
20:22robclark: yeah, I have a skeleton/empty OpenCL.std.. was just planning on adding things as I encounter them
20:23pmoreau: I know someone who kinda did that as well :-) karolherbst
20:23karolherbst: I kind of ended up implementing a lot of OpenCL opcodes in the spir-v to nvir pass...
20:23pmoreau: You did
20:24karolherbst: things like sign are fun
20:24karolherbst: or smoothstep
20:24pmoreau: Well, I expect things like cos and sin to be even more fun
20:24karolherbst: those are easy
20:24robclark: karolherbst, ok, I guess I should rebase on your branch then before I actually start adding things
20:24karolherbst: use nir sin/cos
20:24karolherbst: to codegen
20:24karolherbst: not nir
20:25robclark: ok, well for spriv_to_nir a lot of the instructions are copies of glsl.std.450 opcodes
20:25robclark: (well, maybe they make more precision guarantees in some cases..)
20:25pmoreau: Is nir sin/cos IEEE compliant? Or is it just an abstraction of the hw approximation instruction using a LUT
20:26robclark: I think that depends on underlying hw
20:26karolherbst: robclark: check this function out for examples: https://phabricator.pmoreau.org/diffusion/MESA/browse/clover_spirv_support/src/gallium/drivers/nouveau/codegen/nvir_from_spirv.cpp;d3f8dc0b750c6bed0ae6fd39d32b2ab9fc3aef66$3472
20:26karolherbst: nextafter :3
20:27karolherbst: but Sign and rotate went out of hand
20:27robclark: seems like uint *ptr blows the mind of glsl_type and spriv_to_nir deref chain handling
20:28karolherbst: yeah, seemed like it
20:29karolherbst: pmoreau: I like how nextafter isn't even complete yet
20:30karolherbst: but I am also sure for most there are a lot more smarter ways on how to do those
20:32karolherbst: robclark: sign is different
20:32robclark: you mean compared to glsl?
20:32karolherbst: it is more picky about the 0.0 cases
20:33karolherbst: OpenCL: eturns 1.0 if x > 0, -0.0 if x = -0.0, +0.0 if x = +0.0, or -1.0 if x < 0. Returns 0.0 if x is a NaN.
20:33karolherbst: GLSL: sign returns -1.0 if x is less than 0.0, 0.0 if x is equal to 0.0, and +1.0 if x is greater than 0.0.
20:34robclark: I'm a bit undecided whether spriv_to_nir should handle that, vs just having separate pass for "lowering" stricter math rules..
20:34karolherbst: we have to handle those OpenCL instructions
20:34karolherbst: one way or another
20:34robclark: opencl had some compiler flag to let you be more sloppy about precision, not sure if that extends to things like sign
20:34karolherbst: sign is defined in the spec
20:35karolherbst: whatever the spec is saying
20:35robclark: anyways, I'm not thinking about trying to pass opencl cts yet ;-)
20:36karolherbst: it isn't that bad actually
20:37karolherbst: most of the time it is just about to get those builtin functions right
20:37karolherbst: at least the math part
20:37karolherbst: they test quite all corner cases though
20:37karolherbst: so it is actually a good idea to test against it
20:38robclark: sure, not disputing that yet, just that there are some other bigger things to fix first :-P
20:39karolherbst: well that sign part is actually in the spec
20:39karolherbst: CL spec: "Returns 1.0 if x > 0, -0.0 if x = -0.0, +0.0 if x = +0.0, or –1.0 if x < 0. Returns 0.0 if x is a NaN."
20:39robclark: I'm sure there are a lot of things in the spec ;-)
20:39karolherbst: glsl spec: "Returns 1.0 if x > 0, 0.0 if x = 0, or –1.0 if x < 0."
20:44pmoreau: robclark: Most of the OpenCL math flags can be transcribed as SPIR-V decorations, for example FPFastMathMode decorations https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html#_a_id_fp_fast_math_mode_a_fp_fast_math_mode
20:44pmoreau: I don’t know how spirv2nir handle those though.
20:49robclark: by printing a warning, from the looks of it ;-)
23:02Lyude: btw mupuf, just because I want to be extra careful: I double checked some register traces and noticed that nvidia writes the blcg register packs a second time later in the init sequence (the values appear to be identical though), think we should play it safe and make sure nouveau does the same as well before I send out a v2?
23:14pmoreau: Lyude: May I steal a bit for your time to quickly go through a cover letter draft, and tell me if there’s anything that isn’t clear and should be fixed?
23:15skeggsb_: Lyude: there's a few places nvidia do stuff like that, i'd not worry about it - i think it's a side-effect of how their driver programming sequence is structured more than anything else
23:17skeggsb_: ie. it might be like our "fini" hook, they're run for everything to try and get the gpu into a known state, and then done individually as-needed on a per-engine basis
23:18Lyude: makes sense
23:18skeggsb_: it's just a guess, but it seems likely
23:18Lyude: pmoreau: sure, I am ready to be stolen
23:19Lyude: take me away~
23:19skeggsb_: Lyude: what's the resize issue btw?
23:19skeggsb_: rather: what's the fix for it?
23:19Lyude: skeggsb_: sec, lemme bring up the pw link
23:19pmoreau: Lyude: :-D I’m ready to bring you closer to a world where Nouveau supports OpenCL oO \o/
23:19Lyude: oh nice!
23:20pmoreau: Lyude: https://hastebin.com/genenidalo.vbs
23:20skeggsb_: Lyude: thanks
23:22mupuf: Lyude: I would agree with Ben, I do not see the reason why these parameters would have to be re-set. The magic bits are in ptherm
23:23mupuf: this is why I am a little sceptical with the idea that there are multiple levels of clock gating. Sure, there is one in HW, but you don't get to control it from the software
23:23Lyude: cool, just double checking
23:24mupuf: but as I said, this is a minor thing :)
23:24mupuf: this is great work you have done!
23:25Lyude: thank you!
23:26Lyude: mupuf: tbh though: most other hw has multiple levels of powersaving like this, although you are right in that it's really just a difference of whether we change the blcg/slcg registers from their defaults
23:26mupuf: pmoreau: "you knew that all modules would be use the same IR" --> the "be" probably should not be there ;)
23:26Lyude: someone pointed out on the listt btw that slcg does stand for second level clockgating
23:26pmoreau: Oops O:-)
23:26Lyude: pmoreau: will take a look at that in a second
23:27mupuf: Lyude: yep, Mikko, I did not know he changed company!
23:27pmoreau: Lyude: No worries :-)
23:27pmoreau: mupuf: Wait, did he? :o
23:27Lyude: should we at least keep the ability to toggle between blcg and slcg as well?
23:27mupuf: pmoreau: well, it was not @nvidia.com :D
23:28mupuf: but maybe it is an ISP I do not know
23:28pmoreau: Indeed, I had missed it
23:28mupuf: or his own server
23:28mupuf: his linkedinn still says nvidia
23:28mupuf: so... false alarm!
23:28mupuf: good on him to share :)
23:29pmoreau: mupuf: https://www.kapsi.fi/english.html
23:30mupuf: pmoreau: cool, it's like tuxfamily then :)
23:30mupuf: thanks :)
23:31pmoreau: If he left NVIDIA, it could be dangerous for him to go around and say things like “oh btw, that stuff is called foo and does bar”. :-)
23:31mupuf: I think the if part does not apply
23:31mupuf: it is a little dangerous
23:31mupuf: but I guess he just asked permission
23:32mupuf: this is seriously nothing earth-shatering really
23:32mupuf: and it improves nvidia's image (yes, it is that bad right now...)
23:32pmoreau: Right, but there is a (quite) higher chance he asked permission before saying it if he’s still employed by NVIDIA.
23:33pmoreau: Anyway, yes, every little bit is a step in a better direction
23:34mupuf: pmoreau: your discussions section is indeed showing a possible nightmare :D
23:34pmoreau: And I completely forgot to thank Robert for sending an email about the v2 release of the EVO documentation for Pascal + Volta
23:34mupuf: maybe keep the original code, always?
23:35mupuf: I mean, who cares about the binary size anyway?
23:35pmoreau: For linking different IRs together?
23:36mupuf: yeah, you could try to recompile to another IR if necessary
23:36mupuf: so that does not introduce yet another IR
23:36pmoreau: I had absolutely not thought about that. I only realised when Karol had an issue when trying to run clover with Nouveau advertising NIR, while creating a program using clCreateProgramWithIL
23:37pmoreau: I am hoping we can get rid of the TGSI frontend, and only have the LLVM and the SPIR-V ones left.
23:38pmoreau: That way, if we could keep everything in SPIR-V, and then apply a SPIR-V -> LLVM IR or SPIR-V -> NIR pass if needed before handing it to the driver for emission.
23:40mupuf: pmoreau: sounds good :)
23:41Lyude: mupuf: er, so should I also remove the ability to only do blcg/slcg, or just remove the one for doing cg only?
23:41mupuf: the first. The only configuration would be "enable CG or not"
23:47robclark: pmoreau, I guess TGSI support in clover was just a hack to make it work kinda/sorta with non-llvm drivers.. I guess nothing but some test/example code uses it ;-)
23:49pmoreau: Ah, you might be right regarding some example code; I wonder if the n-body example code which sits somewhere in Mesa? isn’t using it for example.
23:49pmoreau: IIRC Hans was trying to run it, when he worked on LLVM IR -> TGSI.
23:50robclark: well, at any rate it seems like the odds that someone used it who wasn't working on mesa is low
23:54pmoreau: robclark: You’re also welcome to share any comments you might have on the draft: https://hastebin.com/genenidalo.vbs
23:57robclark: pmoreau, I guess I haven't thought at all about linking, although from a driver standpoint (or at least a driver that doesn't really support actual fxn calls, so maybe this doesn't apply to nouveau) as long as nir's function inlining pass is happy, I'm happy :-P
23:58pmoreau: I’ like to be as happy about it as you are :-)
23:59robclark: as far as when to go from spirv->nir.. I *guess* just an implementation detail about what is most convenient for clover.. I'm kinda assuming that each driver doesn't have to call spirv_to_nir itself, but from driver perspective I guess I don't care how that happens