00:03 BootI386: imirkin: Now I'm able to reproduce yesterday's crash with a tiny piece of code
00:06 imirkin_: yay :) (kind of)
00:08 BootI386: I was sure you would be interested :)
00:24 BootI386: imirkin: crash.c http://termbin.com/j63y
00:25 BootI386: crash.sh http://termbin.com/9i5x
00:25 imirkin_: yeah that seems simpler
00:26 imirkin_: i'm guessing that the depth test and srgb aren't critical to all this
00:26 imirkin_: what's the effect of the crash? you can still kill the app and go on your merry way, right?
00:27 BootI386: Often, yes
00:27 BootI386: Sometimes, Alt+PrntScr+E is needed
00:27 imirkin_: skeggsb: iirc you were looking for an app you could use to debug the X hanging stuff? --^
00:28 BootI386: It crashes a lot faster when in the lowest pstate
00:29 imirkin_: makes sense
00:30 skeggsb: imirkin_: i was? ;)
00:30 imirkin_: i thought you were.
00:31 imirkin_: you were asking me a while ago what was going on with X when app Y had a killed channel
00:31 skeggsb: yeah, you answered that though when you traced it - it managed to kill its channel somehow and spins forever waiting on buffers, yeah?
00:32 imirkin_: that was X though
00:32 imirkin_: i.e. that's what was going on with X when X's channel was killed :)
00:33 skeggsb: X's channel can end up stalled if it's tried to acquire a semaphore that a dead channel was supposed to release
00:33 * skeggsb is working on that problem atm
00:33 imirkin_: ok cool :)
00:34 BootI386: But my code shouldn't crash anything, right?
00:35 imirkin_: nah
03:20 mooch2: does anybody know which methods the GF100_2D objects take?
03:21 skeggsb: FermiTwoD is documented in rnndb in g80_2d.xml
03:21 skeggsb: it's basically the same as Nv50TwoD
03:22 mooch2: ah, okay
03:32 mooch2: https://github.com/yuzu-emu/yuzu/blob/master/src/video_core/command_processor.h#L13
03:33 mooch2: are there any more PFIFO submission modes known other than these on GM20B?
04:10 mooch: can someone please help me figure out why the win9x nvidia drivers freeze on boot?
04:15 mooch: in my emulator, that is
04:28 gnarface: (linux support channel, mooch)
04:29 mooch2: i know
04:30 gnarface: i think nvidia has a forum for windows driver support
04:30 mooch2: but this is also an nvidia dev channel
04:30 mooch2: nono
04:30 mooch2: i'm making an emulator
04:30 mooch2: and trying to run the win9x nvidia drivers in it
04:30 mooch2: i'm emulating the riva 128
04:30 mooch2: that's why i said "in my emulator"
04:31 gnarface: but "win9x" suggests you're running the emulator on windows. is that not true?
04:33 gnarface: well it doesn't matter, in the sense that i wouldn't have been able to help you either way
04:37 mooch2: yeah
04:37 mooch2: gnarface, but i'm running windows inside the emulator inside windows
04:37 mooch2: how are you not getting this?
04:37 mooch2: oh god, win 3.11 drivers don't work either
04:37 mooch2: they freeze the same way lmfao
04:48 imirkin: it's almost like the drivers expect your emulated hw to work in a particular way
04:51 mooch: yeah but i have no idea which way that IS
05:39 mooch2: imirkin, any idea why the driver would read PMC_ID, and then PFB_CONFIG_0 and then just stop?
05:40 mooch2: not even messing with ptimer?
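A driver stopping after a couple of probe reads usually points at the emulator's MMIO dispatch returning something the driver rejects. A minimal C sketch of such a dispatch follows; the offsets and the ID value are illustrative placeholders, not verified NV3/Riva 128 register facts.

```c
#include <stdint.h>

/* Placeholder offsets -- NOT verified NV3/Riva 128 register addresses. */
#define NV_PMC_ID        0x000000u  /* chip ID, read first by the driver */
#define NV_PFB_CONFIG_0  0x100200u  /* hypothetical PFB_CONFIG_0 offset */

static uint32_t mmio_read32(uint32_t addr)
{
    switch (addr) {
    case NV_PMC_ID:
        /* Placeholder ID value; an unexpected value here can make the
           driver silently give up during its probe sequence. */
        return 0x00030100u;
    case NV_PFB_CONFIG_0:
        return 0x00000000u;
    default:
        /* Unhandled reads that always return 0 (or all ones) are a
           classic way to hang a driver polling a status bit forever. */
        return 0xffffffffu;
    }
}
```

The default path is worth instrumenting: logging every unhandled read quickly shows which register the driver is stuck polling.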
10:47 karolherbst: pmoreau: nice, clCreateProgramWithSource works :)
11:13 tomeu: \o/
11:14 karolherbst: :)
11:14 karolherbst: now, upstreaming all that stuff :D
11:14 karolherbst: and those builtins
11:15 karolherbst: tomeu: there is some weirdo issues with the builtins: https://gist.github.com/karolherbst/2a93a998e559ce508ec99e2f2e6a96f3
11:16 karolherbst: tomeu: and we need to add support for entrypoint auto detection or so
11:16 karolherbst: in spirv_to_nir
11:17 karolherbst: allthough...
11:17 karolherbst: clover should give us the function name
11:20 karolherbst: kernel::_name
11:25 pmoreau: karolherbst: clover should already be storing the function name, IIRC
11:27 pmoreau: karolherbst: And it does: in the clover::module, each symbol has a name (and some other stuff), and all SPIR-V entry points should already be added as clover symbols by the spirv backend in clover.
11:28 pmoreau: So you should be able to loop over all declared symbols on the clover module, and call spirv_to_nir for each of them.
11:28 pmoreau: Otherwise, the symbol name is passed by clover to the driver when launching a grid.
11:30 tomeu: is the problem with builtins what is being discussed in the libclc ml?
11:31 pmoreau: Do you have a link to that discussion please?
11:33 tomeu: I thought you were involved in it :p
11:33 pmoreau: karolherbst: What’s the builtin issue you’re having? The SPIR-V binary looks fine to me. The `u64_vec3 = some_u32_constant` comes from spirv_to_nir, not NIR.
11:34 pmoreau: Hum, I started a thread on the Mesa ML, but I’m not even following the libclc ML.
11:34 tomeu: pmoreau: sorry, not in the libclc ml, I was thinking of "[Mesa-dev] Adding a SPIR-V target to libclc"
11:34 pmoreau: Ah, okay :-D
12:21 karolherbst: pmoreau: well, spir-v makes it a long
12:21 karolherbst: %__spirv_BuiltInGlobalInvocationId = OpVariable %_ptr_UniformConstant_v3ulong UniformConstant
12:22 karolherbst: thing is, we shouldn't end up with "vec3 64 ssa_3 = intrinsic load_var () (__spirv_BuiltInGlobalInvocationId) ()" in the first place
12:22 pmoreau: What would you want to end up with?
12:23 karolherbst: __spirv_BuiltInGlobalInvocationId needs to be converted to a nir intrinsic
12:23 karolherbst: the entire load_var is wrong
12:24 karolherbst: it should be a nir_intrinsic_load_invocation_id
12:24 pmoreau: Ah okay, so that’s something that’s missing in the current spirv_to_nir I’m assuming.
12:24 karolherbst: maybe
12:24 karolherbst: the intrinsic is there though
12:24 karolherbst: maybe vulkan doesn't know GlobalInvocationId?
12:25 karolherbst: no idea
12:25 pmoreau: GlobalInvocationId
12:25 pmoreau: Global invocation ID in GLCompute or Kernel Execution Models. See OpenCL, Vulkan, or OpenGL API specifications for more detail.
12:25 karolherbst: ohh
12:25 karolherbst: there is SpvBuiltInInvocationId:
12:25 karolherbst: SpvBuiltInInvocationId -> nir_intrinsic_load_invocation_id I guess
12:25 karolherbst: there is a SpvBuiltInGlobalInvocationId as well though
12:26 karolherbst: mapped to SYSTEM_VALUE_GLOBAL_INVOCATION_ID
12:26 pmoreau: InvocationId
12:26 pmoreau: Invocation ID, input to Geometry and TessellationControl Execution Model. See Vulkan or OpenGL API specifications for more detail.
12:26 pmoreau: That is something different
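The two similarly named builtins land on different NIR system values, which is easy to mix up. A small C sketch of the mapping being discussed: the SpvBuiltIn numeric values match the SPIR-V spec's BuiltIn enumeration, while the intrinsic names are written here as plain strings and should be double-checked against Mesa's nir_intrinsics.

```c
#include <string.h>
#include <stddef.h>

/* SpvBuiltIn enumerant values from the SPIR-V spec (section 3.21). */
enum {
    SpvBuiltInInvocationId       = 8,   /* Geometry / TessellationControl */
    SpvBuiltInGlobalInvocationId = 28,  /* GLCompute / Kernel */
};

/* Sketch: two similarly named builtins map to different NIR system
   values; confusing them hands a geometry builtin to a compute kernel. */
static const char *builtin_to_sysval(int builtin)
{
    switch (builtin) {
    case SpvBuiltInInvocationId:
        return "nir_intrinsic_load_invocation_id";
    case SpvBuiltInGlobalInvocationId:
        return "nir_intrinsic_load_global_invocation_id";
    default:
        return NULL;  /* not handled in this sketch */
    }
}
```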
12:30 pmoreau: And SPIR-V is not wrong about making those longs: from the OpenCL specification, “size_t get_global_id(uint dimindx)”
12:31 pmoreau: s/SPIR-V/llvm-spirv
12:31 karolherbst: right
12:31 karolherbst: I never said that :p
12:33 pmoreau: “13:21:07 karolherbs+│ pmoreau: well, spir-v makes it a long” that wasn’t a complaint? my bad then
12:34 karolherbst: pmoreau: "<pmoreau> karolherbst: What’s the builtin issue you’re having? The SPIR-V binary looks fine to me. The `u64_vec3 = some_u32_constant` comes from spirv_to_nir, not NIR."
12:35 pmoreau: Ah okay, but my point still stands: SPIR-V declares a u64 vec3, and for some obscure reason, spirv_to_nir decides to initialise it with a single 32-bit value. That initialisation does not come from the SPIR-V binary.
12:36 pmoreau: (Or maybe it is a 64-bit value, but only 32 bits gets displayed)
12:38 pmoreau: I just want to be clear about what to blame: if it’s llvm-spirv, an error in the SPIR-V spec, then I’ll look into it. If it’s an error in spirv_to_nir, I’ll leave it alone. :-D
12:39 karolherbst: pmoreau: I forgot to run two nir passes
12:40 pmoreau: Okay. Does it work better with those?
12:40 karolherbst: yeah
12:40 pmoreau: And it’s great that “clCreateProgramWithSource” works for you as well!
12:40 pmoreau: Cool!
12:40 karolherbst: I think I trigger that carry sched issue now!
12:40 karolherbst: which was my goal in the first place :D
12:41 pmoreau: I was going to say ”Less cool :-/”, but if that was your goal...
12:41 karolherbst: or maybe not
12:42 karolherbst: the CTS is always crappy when it comes to error output
12:43 karolherbst: close enough? LONG_ADD int test failed 2311e25a6cf33054 != 7ff97f8893afdb5f
12:43 karolherbst: ...
12:44 pmoreau: Yeah, almost there
12:46 karolherbst: mhh weird
12:46 karolherbst: int math add: https://gist.githubusercontent.com/karolherbst/a27af0062ae031c3a6a2142ea2416db4/raw/0193850317ed3d6a13b3fc9597b6f864bb625465/gistfile1.txt
12:46 karolherbst: this looks kind of correct?
12:46 karolherbst: uhh wait
12:47 karolherbst: I see
12:47 karolherbst: the first value is correct
12:47 karolherbst: but the second is not
12:47 karolherbst: because there is no hint of global invocation id in the shader anymore
12:48 karolherbst: ohhhhhh
12:48 karolherbst: the heck
12:48 karolherbst: global invocation id is set to 0
12:48 karolherbst: and opted away
12:49 karolherbst: pmoreau: you said there might be something wrong with the spirv?
12:50 pmoreau: Let’s see
12:50 karolherbst: wondering why it is set to 0
12:51 pmoreau: What’s the SPIR-V for that one?
12:51 karolherbst: basically the same as https://gist.github.com/karolherbst/2a93a998e559ce508ec99e2f2e6a96f3
12:52 pmoreau: Okay, and spirv_to_nir is still initialising that global_id to 0?
12:52 karolherbst: maybe that "OpDecorate %__spirv_BuiltInGlobalInvocationId Constant" confuses spirv_to_nir?
12:52 karolherbst: yeah
12:52 pmoreau: Hum
12:52 karolherbst: but mhh
12:53 karolherbst: this does nothing really
12:54 karolherbst: mhh wondering what generates this line "decl_var INTERP_MODE_NONE u64vec3 __spirv_BuiltInGlobalInvocationId = { 0x00000000 }"
12:54 pmoreau: Yeah, I’m trying to look for it as well
12:56 pmoreau: Yeah, maybe that Constant confuses it
12:56 pmoreau: https://cgit.freedesktop.org/mesa/mesa/tree/src/compiler/spirv/vtn_variables.c#n1325
12:56 karolherbst: mhhh
12:57 karolherbst: yeah
12:57 pmoreau: I’m assuming it creates the init somewhere for constants, and if there is nothing, initialises it to zero
12:57 pmoreau: That should be made conditional to not being a builtin, I would guess.
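The guessed fix can be sketched as follows; the struct and field names are hypothetical stand-ins, not the real vtn_variable layout in spirv_to_nir.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical variable shape -- illustrative only. */
struct var {
    bool is_constant;   /* carries OpDecorate ... Constant */
    bool is_builtin;    /* carries OpDecorate ... BuiltIn */
    const void *initializer;  /* explicit OpVariable initializer, if any */
};

/* Sketch of the suggested rule: only synthesize an implicit zero
   initializer for Constant-decorated variables that are NOT builtins;
   builtins must stay uninitialized so a later pass can lower them to
   system-value loads. */
static const void *pick_initializer(const struct var *v,
                                    const void *zero_const)
{
    if (v->initializer)
        return v->initializer;               /* explicit init wins */
    if (v->is_constant && !v->is_builtin)
        return zero_const;                   /* implicit zero */
    return NULL;                             /* builtin: leave alone */
}
```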
13:00 karolherbst: pmoreau: can I dump the C kernel with clover?
13:00 pmoreau: Yes, using `CLOVER_DEBUG=clc` IIRC
13:01 pmoreau: CLOVER_DEBUG can accept: clc for OpenCL C, llvm, spirv, and native. And you can specify multiple at once, by separating them with commas.
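Based on that description, a typical invocation might look like the following; `./my-cl-app` is a placeholder for any clover-backed OpenCL program.

```shell
# Dump the OpenCL C source and the SPIR-V module for every program
# clover builds while running the app (placeholder binary name).
CLOVER_DEBUG=clc,spirv ./my-cl-app
```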
13:01 karolherbst: yeah
13:02 karolherbst: pmoreau: any idea why that thing is a u64vec3 though?
13:03 karolherbst: because it makes like 0 sense
13:03 pmoreau: Well, for the u64 part, we already talked about it.
13:03 karolherbst: right
13:03 karolherbst: I am more curious about the vec3 part
13:03 pmoreau: For the vec3 part...
13:04 pmoreau: Because it’s a builtin variable, not a function like in OpenCL
13:04 pmoreau: So you either have one vec3, or three different builtins, one for each component
13:04 karolherbst: why though?
13:05 pmoreau: I *think* you can’t have builtin decorations on functions, but I’m going to check
13:05 karolherbst: that's not what I meant
13:05 karolherbst: why is it a vec3?
13:05 karolherbst: or has to be?
13:05 karolherbst: or are all builtins vec3s?
13:05 pmoreau: Because you can start a 3D grid, in which case the global ID is 3d
13:05 karolherbst: this would be stupid
13:05 karolherbst: mhh
13:05 karolherbst: yeah well, sure right, but still
13:06 karolherbst: I try to skip the Constant thing and see what happens
13:07 pmoreau: If the ID is 3D, why would you not have a vec3 to represent it? Would you prefer a custom structure with three members? Or three separate u64 variables?
13:07 karolherbst: I mean you could just declare it as 3D only if you actually have a 3D value here
13:09 pmoreau: So you would have a builtin variable that would change type depending on the context? Doesn’t sound great
13:11 karolherbst: dunno
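The u64vec3 shape follows from the two points above: get_global_id returns size_t (64-bit here), and a grid can have up to three dimensions. A minimal C sketch of how each component of the builtin would be computed, with plain structs standing in for the real launch state:

```c
#include <stdint.h>

/* One 64-bit component per grid dimension, mirroring the u64vec3
   builtin variable the SPIR-V declares. */
typedef struct { uint64_t x, y, z; } u64vec3;

/* global_id(dim) = group_id(dim) * local_size(dim) + local_id(dim),
   per the usual OpenCL work-item model. */
static u64vec3 global_invocation_id(u64vec3 group_id, u64vec3 local_size,
                                    u64vec3 local_id)
{
    u64vec3 gid = {
        group_id.x * local_size.x + local_id.x,
        group_id.y * local_size.y + local_id.y,
        group_id.z * local_size.z + local_id.z,
    };
    return gid;
}
```

A 1D launch simply leaves the y and z components at 0, which is why one fixed vec3 type covers every grid shape.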
13:11 pmoreau: Could you set a breakpoint at https://cgit.freedesktop.org/mesa/mesa/tree/src/compiler/spirv/vtn_variables.c#n1697 and check what is happening, what’s the value of initializer upon input?
13:12 karolherbst: initializer is NULL
13:13 pmoreau: So far so good
13:13 karolherbst: I am wondering though why it is a nir_var_uniform
13:13 pmoreau: If you go to line 1882, is it still NULL?
13:13 karolherbst: should be nir_var_system_value
13:14 pmoreau: Ah, that is true
13:14 karolherbst: so maybe nir_mode is messed up?
13:14 karolherbst: but it is a uniform in spir-v
13:14 karolherbst: yeah, in 1882 it is still null
13:14 pmoreau: And after that?
13:15 karolherbst: still
13:15 pmoreau: Hum, okay
13:15 karolherbst: I think it might be confused due to the uniform thing, maybe
13:16 pmoreau: Oh wait, could you check for constant_initializer instead
13:16 pmoreau: You’re probably right
13:16 karolherbst: mhh
13:17 karolherbst: if (glsl_type_is_vector(interface_type->type)) {
13:17 karolherbst: mode = vtn_variable_mode_vector;
13:17 karolherbst: nir_mode = nir_var_uniform;
13:17 pmoreau: :-/
13:17 pmoreau: Where is that piece of code located?
13:18 karolherbst: vtn_storage_class_to_mode
13:18 karolherbst: uhh wait
13:18 karolherbst: mhh
13:19 karolherbst: why is that code in my branch in the first place
13:19 karolherbst: and not in master
13:19 pmoreau: Yeah, I was looking at master and couldn’t find it
13:19 karolherbst: right, I put it there
13:19 karolherbst: needed it for .... other stuff
13:22 pmoreau: Can you add a check for a builtin decoration there?
13:22 karolherbst: doesn't help
13:23 karolherbst: I think the issue is somewhere hidden
13:23 pmoreau: Really? Even setting the mode to sys_val does not help? That’s unfortunate
13:24 pmoreau: Try checking again in vtn_create_variable, what’s the value of constant_initializer. If it’s NULL, try setting a watchpoint on the memory address, to see when it gets modified
13:24 karolherbst: yeah
14:13 Guest12: hey guys, thinking about diving deep into nouveau and trying to help, do I need recent hardware, or would be a mobile nvidia card sufficient?
14:14 karolherbst: Guest12: whatever, best is if you have issues you want to fix on your card
14:15 RSpliet: Guest12: I'm sure there are TODO items for each generation of GPUs, so I'm sure you're able to contribute even without further monetary investment
14:15 Guest12: sounds good, thx :D
14:16 imirkin__: but if it's a NV28M, might be time to upgrade :)
14:16 karolherbst: would be still nice to fix issues there though
14:16 Guest12: no, its a GF117 fermi
14:17 imirkin: cool
14:17 imirkin: i need someone to test a patch on fermi :)
14:17 karolherbst: imirkin: mhh weird, your color changed in my IRC client...
14:18 imirkin: i fixed my nick
14:18 karolherbst: yeah right, but fixed was green and _ was red/brownish
14:18 karolherbst: now you are violet
14:18 karolherbst: same color as you had with __
14:18 imirkin: reflects the color of my face :p
14:18 karolherbst: :D
14:19 karolherbst: I see
14:19 Guest12: haven't built nouveau yet, but I could try with your patch
14:20 imirkin: patch at: https://patchwork.freedesktop.org/patch/202553/
14:20 imirkin: the test is in VK-GL-CTS. when you get that far, ask in here how to build / run it
14:20 imirkin: it's not trivially obvious.
14:20 imirkin: (it uses cmake, so everything's difficult =/ )
14:23 Guest12: i will try it, when i am at home
14:23 Guest12: cool, first task :D
14:27 imirkin: you may also want to come up with a better handle than 'Guest12'
14:28 RSpliet: GF117 is the headless one, isn't it? If you have the official driver running, perhaps it could provide some useful information on changing clock speeds in the future. I'm not in a position to invest serious time in this at the moment, but if you stick around (with a slightly more permanent IRC nickname?) skeggsb might request your services in due time if needed
14:28 imirkin: RSpliet: yes, it's the display-less one
14:28 imirkin: (i.e. DISP is just plain fused off)
14:29 RSpliet: Sounds like a useful card to determine whether there's line-buffer registers residing in FB that need fiddling during a reclock by diffing trace with GF119 or sth
14:37 LukasH: RSpliet: sounds interesting
14:38 RSpliet: LukasH: I blame the abbreviations and tech lingo. Don't want to overload you with complex tasks from day one though, best find your own itch to scratch ;-)
14:39 imirkin: LukasH: ultimately tons of stuff that can be done, but what you're interested in will drive what you're motivated to actually work on :)
14:40 LukasH: will be back this evening, have to work ;), thx for introductions
15:04 Manoa: ilia, I've been thinking: you want to do OpenCL for nvidia cards, right? Do you think it's worth taking the opportunity, now that you're developing it, to also add CUDA to the mix? It may be easier to do CUDA than OpenCL, if it's possible at all; after all, CUDA works better than OpenCL on nvidia. Just an idea to think about :)
15:12 imirkin_: you're right. just doing opencl is too easy.
16:16 karolherbst: Manoa: we have to start somewhere and for OpenCL a lot of things are already there
16:16 karolherbst: having an open source CUDA stack would have some other implications I don't even want to think about right now
16:16 karolherbst: I mean I am sure Nvidia would like it very much to have CUDA running on AMD or intel hardware
16:17 karolherbst: not that I care
16:17 karolherbst: but I care about their lawyers might get upset about it
17:17 jvesely: karolherbst, I don't think their lawyers would care much, much of the CUDA language is available in clang. you only need to add the runtime library, and backend compiler.
17:18 karolherbst: jvesely: that's tooling
17:18 karolherbst: the most important bits _are_ the runtime library and the backend compiler
17:19 karolherbst: well we have a solid enough backend compiler, so that basically just leaves the runtime library
17:19 karolherbst: but every driver has its own backend compiler anyway
17:19 karolherbst: jvesely: imagine what would happen if you could run your CUDA stuff on AMD hardware, do you think Nvidia would just do nothing about that?
17:20 karolherbst: I am quite sure they will be very upset about that if it becomes competitive
17:20 karolherbst: I am not saying they will succeed with anything, just that they will most likely try _something_
17:21 orbea: what can they even do?
17:21 karolherbst: sue you?
17:21 karolherbst: they have money
17:21 karolherbst: you don't
17:22 orbea: on what grounds?
17:22 karolherbst: does it matter?
17:22 karolherbst: even if they lose the lawsuit
17:22 karolherbst: it costs you
17:22 orbea: if they wanted to sue nouveau they had lots of opportunity
17:22 karolherbst: I am sure some foundation might chip in and help you out
17:22 karolherbst: but...
17:22 karolherbst: orbea: why would they even care?
17:23 jvesely: karolherbst, not interesting for lawyers. reimplementing runtime API is fair game (see google vs. oracle). adding LLVM IR to nv ISA backend is hardly more interesting than reverse engineering the ISA itself
17:23 orbea: nvidia
17:23 karolherbst: I am talking about CUDA on all GPUs here
17:23 orbea: ?
17:23 karolherbst: this is a completly different level
17:23 karolherbst: jvesely: patents, also google has money
17:23 karolherbst: anyhow
17:23 orbea: just seems silly to lose progress because of some hypothetical scenario some lawyers might care
17:24 karolherbst: that's not something I want to work on before some lawyer tells me it is okay
17:24 karolherbst: well
17:24 karolherbst: I am sure nobody cares about the code
17:24 karolherbst: but shipping it might get problematic
17:24 jvesely: that's a question for those who ship it
17:25 karolherbst: right
17:25 jvesely: I don't think there's a patent that would cover LLVM IR to nv ISA that you're not infringing already in nouveau
17:26 karolherbst: that's not the point as I already said
17:26 karolherbst: CUDA on other hw is the point
17:26 karolherbst: if you buy nv hardware you also have the rights to run CUDA on it, so you are already covered for all the patents and so on
17:27 karolherbst: interesting is the CUDA on AMD or Intel hardware part
17:27 jvesely: I think it would be beneficial to nvidia, more CUDA proliferation + they can provide better CUDA support than a FOSS project can
17:27 karolherbst: CUDA on other hw -> less money
17:27 karolherbst: where is here the benefit for nvidia?
17:27 jvesely: I don't agree with that
17:27 karolherbst: you don't buy CUDA
17:27 karolherbst: you buy nvidia hardware
17:28 jvesely: you buy hw and sw support
17:28 karolherbst: right, and because it is open source now
17:28 karolherbst: you don't need to ask nvidia
17:28 karolherbst: you can ask anybody to give you support
17:28 jvesely: as long as they are better in CUDA than the others, they get the business
17:28 karolherbst: you can buy any hw to run CUDA
17:28 karolherbst: right
17:29 jvesely: plus they control the standard
17:29 karolherbst: which doesn't really bring them any money
17:29 jvesely: they can introduce features that others have to replicate and exploit first mover advantage
17:29 karolherbst: I am just pointing out why nvidia wouldn't be amused by an open source CUDA stack running on all hw
17:29 jvesely: CUDA proliferation is good for them
17:30 karolherbst: mhh
17:30 karolherbst: not if it is open source
17:30 karolherbst: because then it doesn't matter anymore
17:30 karolherbst: same as C
17:30 karolherbst: nobody wins if C is big or small
17:30 karolherbst: because it doesn't matter
17:30 karolherbst: CUDA is just a classic vendor lock-in
17:30 karolherbst: if you destroy vendor lock-ins, vendors aren't amused
17:31 jvesely: no, they still decide what is in the specs, what hw requirements there are. and can shape it so they always have an advantage over other implementations
17:31 jvesely: everybody uses CUDA since it runs everywhere, but for best results you buy nvidia
17:31 karolherbst: that's just a pseudo argument, in reality it doesn't matter
17:32 karolherbst: people could just add extensions to CUDA if they really care
17:32 jvesely: it does. the "limited features for free, for enterprise + support pay extra" model is quite widespread
17:32 karolherbst: and begin adding awesome stuff which doesn't perform well on nv hw
17:32 orbea: tbh, i dont care about cuda, just these leaps of logic bug me...my hypothesis is that nvidia will either not care or if they do they don't have a leg to stand on and why should nouveau developers care either way?
17:33 jvesely: they can't, an open source implementation does not mean they lose control of the standard
17:33 karolherbst: orbea: again, it doesn't matter for nouveau here
17:33 orbea: exactly
17:33 karolherbst: jvesely: right, but it doesn't matter on the hw level
17:34 karolherbst: or not on the customer level so to speak
17:34 jvesely: karolherbst, it does. both OpenCL and cuda have rather strict ULP precision requirements. if your hw can support ops on instruction level, while competitors need to run sw routine. you win on performance level, big time
17:34 karolherbst: nobody uses sw routines
17:35 karolherbst: "faking" support in sw is a hoax
17:35 jvesely: on the contrary, most of libclc is sw implementation of required ops that can't be done in hw
17:35 karolherbst: it is the normal stuff
17:35 karolherbst: some stuff the hardware can't do
17:35 karolherbst: but this is also valid for nvidia
17:35 karolherbst: nobody wins here
17:36 karolherbst: jvesely: not really
17:36 jvesely: different GPUs have different precision capabilities
17:36 karolherbst: it is just one "op" mapped to multiple hw ISA ops
17:36 karolherbst: it is still hw
17:36 jvesely: on the AMD side, VI+ can do log2/exp2 in hw with 1 ULP precision; older asics need to implement the ops in sw
17:36 karolherbst: you just need sw to do the compiling stuff
17:36 karolherbst: jvesely: they don't do that in sw.....
17:36 karolherbst: it is still run on the hardware
17:37 karolherbst: you just need more ops for doing the same thing
17:37 imirkin_: user can't have to care about all that
17:37 jvesely: they do, you can check the libclc source that was released. it is different if the op needs 1 GPU instruction or 200 GPU instructions
17:37 karolherbst: jvesely: same is true for nvidia....
17:37 jvesely: it impacts performance
17:37 karolherbst: not even all PTX instructions map to 1 GPU instruction
17:38 jvesely: yes, my point is that controlling the specs gives you power over precision requirements that can be made to fit your hw
17:38 karolherbst: jvesely: with your logic, sqrt is implemented in sw for nvidia as well
17:38 karolherbst: because nvidia has no sqrt hw op
17:39 karolherbst: so it is sw, right?
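For what it's worth, the usual way to build a sqrt out of other ops is exactly this pattern: a low-precision reciprocal-sqrt seed refined by Newton steps, with every instruction still running on the GPU. A hedged C sketch follows; the bit-trick seed is the well-known 0x5f3759df approximation, shown for illustration, not a claim about what any GPU's rsqrt unit actually does.

```c
#include <stdint.h>
#include <string.h>

/* Crude low-precision reciprocal-sqrt seed (classic bit trick),
   standing in for a hardware rsqrt approximation instruction. */
static float rsqrt_approx(float x)
{
    uint32_t i;
    memcpy(&i, &x, sizeof i);
    i = 0x5f3759df - (i >> 1);
    float r;
    memcpy(&r, &i, sizeof r);
    return r;
}

/* sqrt built from other ops: refine 1/sqrt(x) with Newton-Raphson
   steps (each roughly doubles the precision), then sqrt(x) = x * r. */
static float sqrt_refined(float x)
{
    float r = rsqrt_approx(x);
    for (int n = 0; n < 3; n++)
        r = r * (1.5f - 0.5f * x * r * r);
    return x * r;
}
```

The point of the sketch: whether this counts as "software" is mostly semantics, since the compiler emits a short fixed sequence of hardware instructions either way.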
17:39 karolherbst: imirkin_: true
17:39 karolherbst: jvesely: I am just saying that "emulating in sw" is a total hoax argument
17:39 karolherbst: it is super irrelevant
17:40 karolherbst: and totally wrong to even talk about
17:40 jvesely: karolherbst, it's not emulating in sw
17:40 karolherbst: jvesely: why not?
17:40 jvesely: it's running a high precision routine instead of low precision instruction
17:40 karolherbst: ?
17:40 karolherbst: there is _no_ sqrt instruction
17:40 karolherbst: not even low precision
17:40 jvesely: there is on AMD GPUs
17:40 karolherbst: ohh
17:41 karolherbst: so AMD would have a benefit for PTX sqrt here?
17:41 karolherbst: do you see my point now and how irrelevant that is?
17:41 jvesely: it depends on precision requirements of CUDA sqrt
17:42 karolherbst: well, sqrt is sqrt
17:42 jvesely: the amd instruction is good enough for OpenCL (3 ulp)
17:42 karolherbst: right, but in the end it doesn't matter
17:42 karolherbst: there is a imad instruction on nvidia hw
17:42 jvesely: it does, that's what the specs ask for
17:42 karolherbst: but
17:42 karolherbst: they don't use it on maxwell
17:43 karolherbst: because it is slower than converting that imad into some fancy 16 bit mad op and do a bunch of ops to implement 32/64 bit mad in 16 bit
17:43 jvesely: most of clover work has been adding sw routines for ops that cannot be mapped to hw
17:43 karolherbst: so 1 hw op can be slower than 10 hw ops....
17:43 karolherbst: ....
17:44 jvesely: I'm talking about 100s of instructions to implement one op
17:44 karolherbst: and?
17:44 karolherbst: there are PTX surface instructions which map to 100 hw instructions
17:44 karolherbst: on nvidia
17:44 karolherbst: some 64 bit ops have to be implemented like this
17:45 karolherbst: and I still wouldn't consider this "sw"
17:45 karolherbst: sure, a compiler has to do the work
17:45 karolherbst: but so it has to do that for 1 to 1 mappings as well
17:45 karolherbst: or optimizations
17:45 karolherbst: it is just part of the stack
17:45 karolherbst: as it is running something on the GPU cores
17:45 jvesely: I'm not sure what you're getting at. if vendor A implements a 3 ULP sqrt instruction, vendor B implements a 5 ulp sqrt, and the specs require 4 ulp, vendor A wins big time on performance
17:45 karolherbst: for some ops it makes sense to implement it in hw because it is fast
17:45 jvesely: programs running on GPUs are sw as well
17:45 karolherbst: some things just add costs without giving you any benefit in speed
17:46 karolherbst: jvesely: maybe the more precise one is super slow?
17:46 karolherbst: and slower than emulating it with other instructions?
17:46 jvesely: karolherbst, that is almost never the case
17:46 jvesely: gtg
17:47 karolherbst: look at div on intel cpus....
17:47 karolherbst: best not to use it, because if you can do something else, it is faster
20:41 pmoreau: karolherbst: Any updates regarding the Input stuff? If you can push your current patches somewhere, I’ll have a look at them and patch more things if needed.
20:43 karolherbst: pmoreau: not really
20:43 karolherbst: I will look into it tomorrow again
20:45 pmoreau: Okay
21:35 Manoa: karolherbst: CUDA running on AMD and intel = mega LOL :)
21:53 Manoa: you know that nvidia hates OpenCL, that's why they did CUDA. it's funny, but if you think about it: on Windows, how could AMD have much better OpenCL than nvidia? it doesn't make sense, right? I mean, nvidia has a bigger software department, bigger research, more money. it's because they cripple OpenCL on purpose to make CUDA look better. so I have no problem with making CUDA run on AMD and intel; nothing would be funnier than that :)
21:55 pmoreau: Manoa: OpenCL was created as an answer to CUDA, not the other way round.
21:55 Manoa: CUDA was the first to do computing on a video card?
21:56 pmoreau: It came with the first compute capable card from NVIDIA, the G80, back in 2007.
21:56 pmoreau: OpenCL first release was 2 years later
21:57 pmoreau: It’s possible that AMD hardware had some compute capabilities before that, but I think it would have been using another language than OpenCL.
21:59 Manoa: 2007 that's the time of 8800 GT and radeon 9600 XT right ?, at the time we didn't have strong enough hardware to run computing because we barely had enough hardware to do graphics :)
21:59 airlied: yeah AMD had CTM before CL
22:00 pmoreau: Didn’t prevent NVIDIA from releasing CUDA in 2007 for doing compute, and AMD from having CTM, apparently
22:01 Manoa: well now that there is a standard way to do computing on video cards, AMD removed CTM and moved to OpenCL, but nvidia didn't
22:03 Manoa: directcompute I think is also redundant; why do it if you already have so many ways to do it
22:03 pmoreau: True, they kept CUDA and have been pushing hard for it. Their OpenCL support is definitely not topnotch.
22:04 pmoreau: I’m not really familiar with DirectX & co, so is DirectCompute more like OpenCL, or is it just compute shaders in DirectX?
22:10 imirkin_: 8800 GTS was a pretty powerful card for the day
22:10 imirkin_: *way* ahead of anything else out there at the time
22:10 imirkin_: sporting an impressive 320MB of VRAM
22:10 Manoa: I remember, it was the better choice than 9600 XT
22:11 imirkin_: and DX10 support
22:11 Manoa: 9600 had the brilinear filtering, AMD had to cut some math corners to make it appear faster
22:12 imirkin_: (as the first chip in the series, it obviously had a bunch of warts, but nothing like the ATI R600 did)