00:00 robclark: yeah, there is plenty of stuff beyond making "__kernel void add(__global int * out, int arg0, int arg1) { out[0] = arg0 + arg1; }" work that I haven't thought thru yet :-P
00:02 pmoreau: No worries :-) I started that way as well, a bit more than two years ago. So I had some time to think about things. ;-)
00:04 robclark: fwiw, now that I can actually generate .spv binaries, I have a setup to feed those to spirv_to_nir so I've started digging my way through how that works and thinking about what we need to do for pointers and fxn params (and fxn params that are pointers, etc)..
00:05 pmoreau: Nice!
00:05 airlied: you might want to talk to jekstrand about pointers in nir
00:05 robclark: I kinda know how I want things to look in nir (some kinda small extension of how deref chains work to handle -> instead of . basically).. still learning my way around spirv_to_nir to figure out how to implement that (but I really only just got as far as starting to dig thru that code this afternoon)
00:05 airlied: he's probably at least considered how it might work
00:06 airlied: (or tell you how crazy it would be)
00:06 pmoreau: Getting a way to generate SPIR-V binaries helps quite a bit! I do not regret the time where I had to actually write by hand those binaries, cause LLVM-SPIRV hadn’t been open-sourced yet
00:06 robclark: airlied, I guess nir_deref_type_pointer is the obvious thing..
00:07 robclark: and nir_lower_io would break up offset calculations to do a load_var of pointer address and feed that back into remaining deref offset alu instructions.. plus I guess maybe we need to make get_io_offset() do 64b math
00:10 cwabbott: robclark: jekstrand and i have talked about that a little
00:11 cwabbott: i think we're going to need a new nir_deref_instr instruction type, and move everything over to it instead of deref chains piece-by-piece
00:12 cwabbott: i started doing that a while ago, but i got stuck somewhere (i think it was nir_lower_locals_to_regs iirc)
00:12 robclark: cwabbott, hmm, a full deref chain might contain multiple levels of pointer dereference..
00:12 cwabbott: that works just fine
00:13 Lyude: pmoreau: "you knew that all modules would be use the same IR"
00:13 cwabbott: a nir_deref_instr produces another pointer, so you can call it again
00:13 cwabbott: i.e. have a nir_deref_instr of a nir_deref_instr
00:13 pmoreau: Lyude: Fixed, mupuf already pointed that one out ;-)
00:14 robclark: but other than a way to load a 32b or 64b (depending on memory model) value from some offset in some data structure, and then use that as base for next thing, I think you are ok.. but note that I was thinking of load_ptr/store_ptr (and possibly associated atomics) intrinsics..
00:14 Lyude: ahh cool
00:14 robclark: which might be same as what you are thinking for nir_deref_instr I guess
00:14 cwabbott: well, i was thinking of doing something higher-level
00:14 robclark: cwabbott, basic idea is it is kinda like (for example) ssbo intrinsics but the offset param is a 32b/64b raw ptr addr
00:14 cwabbott: where we don't lower away the address calculations all at once
00:15 robclark: hmm
00:15 cwabbott: different targets will want to do different things
00:15 robclark: I think the pre-lowered situation could just be deref chain, no?
00:15 cwabbott: no
00:15 Lyude: pmoreau: "not currently supported IRs by clover, like NIR." s/\./?/
00:15 cwabbott: you need to handle stuff like &var->foo
00:16 cwabbott: say, passing that to a function argument
00:16 cwabbott: it's called GetElementPointer in SPIR-V and LLVM
00:16 Lyude: other then that, lgtm
00:16 cwabbott: what i wanted was a NIR equivalent of GetElementPointer
00:16 robclark: hmm, I haven't looked yet at how &foo in spir-v, but kinda assuming that just returns the result of the address calculation that get_io_offset() does
00:17 cwabbott: well, it's more complicated than that
00:17 pmoreau: Lyude: Ah, yes. And rephrasing it as “IRs not currently supported by clover” makes it easier to read as well, imo.
00:17 cwabbott: in CL pointers can point to various different things
00:18 pmoreau: Lyude: Thanks for the comments! :-)
00:18 cwabbott: you could have void add(int *x) { x++; } and then call that with a global variable, shared variable, local variable, etc.
00:18 cwabbott: you don't know, when you're compiling the function, which one you'll get
00:18 cwabbott: the pointer might be annotated with an address space, which does tell you, or you might not
00:18 cwabbott: *it might not
00:19 robclark: hmm, I guess that is getting back to a question I asked earlier (but I guess it was on rh irc, not freenode) about mixing opaque things and raw svm style pointer values..
00:19 robclark: (vs spriv memory model)
00:19 cwabbott: some targets will have to use "fat pointers" to compile that function (think far pointers in DOS) where you insert a big ol' switch statement to handle it
00:19 pmoreau: cwabbott: You would need to at least annotate it with __generic I think, which is OpenCL 2.0+ IIRC.
00:20 cwabbott: others have a unified address space where the same load/store instruction can handle everything
00:20 robclark: cwabbott, I guess at least pre lower_io you can tell what "address space" it is by looking at the nir_variable.. so I guess what you are thinking is you just can't loose that info in lower_io
00:21 pmoreau: But whether or not there is a __generic does not change the fact that you won’t know at compile time, that is true.
00:22 cwabbott: robclark: remember that you can also do stuff like foo = cond ? &thing1 : &thing2
00:22 cwabbott: i.e. you might not always know the nir_variable that a pointer comes from
00:22 robclark: where &think1 and &think2 are different class of memory?
00:23 cwabbott: apparently not until 2.0 according to pmoreau
00:23 robclark: s/k/g/g
00:23 cwabbott: but think about when thing1 and thing2 are both local variables
00:23 robclark: that implies that you kinda need to handle it at runtime..
00:23 pmoreau: 2.0 it is indeed: https://software.intel.com/en-us/articles/the-generic-address-space-in-opencl-20
00:23 cwabbott: we want to preserve type information, so we can promote locals to SSA values
00:24 airlied: well I'd assume we'd be designing for 2.0 :)
00:24 cwabbott: robclark: yeah, hence the fat pointers thing
00:24 robclark: cwabbott, local is easy since that is just an offset calculation..
00:24 cwabbott: not when you need to preserve the type information
00:24 cwabbott: say that we realize, thanks to inlining, etc. that it's always &thing1
00:25 robclark: offset_of_local1 + relative_offset_from_rest_of_deref_chain vs offset_of_local2 + relative_offset_from_rest_of_deref_chain?
00:25 cwabbott: we could've turned it into an SSA value, but because we lowered everything to offsets in spirv_to_nir we're screwed
00:26 robclark: to be clear, I think lowering things to offsets should happen after spirv_to_nir
00:26 cwabbott: same thing for inlining functions involving pointers to local variables
00:26 cwabbott: yeah, sure... i guess i was trying to say that
00:27 cwabbott: we need a nir_deref_instr
00:27 robclark: but yeah, having different address spaces and a language that lets you mix them like that is... fugly..
00:27 cwabbott: we could even use it today to make KHR_variable_pointers better
00:28 cwabbott: robclark: yeah...
00:28 cwabbott: hence why we've avoided OpenCL :)
00:29 pmoreau: Ah, apparently kernel arguments still require the decoration, so it’s only to be used for device function arguments, or kernel/device functions local variables.
00:38 cwabbott: pmoreau: i don't think that would even make sense for kernel arguments... the kernel arguments are what you pass into the GPU, so you need to know ahead of time what they are
00:40 pmoreau: I guess you could always retrieve that information when the arguments are set using clKernelSetArg, but your program is already compiled and linked from the API’s point of view.
00:41 robclark: cwabbott, hmm, do wonder about the use-case for pointers into anything other than global memory.. seems like pointers are more useful for svm which by defn would be global.. maybe the spec writers really just hated their compiler teams
00:49 pmoreau: I hope I’ll be able to send the remaining of the series tomorrow, cause right now, it doesn’t want to leave. And I’m not sure who I should blame for that between git or my email provider (or someone else).
00:49 cwabbott: robclark: i guess it's useful for being able to pass pointers around to functions
00:50 cwabbott: also, OpenCL is much more C and LLVM-centric, so they tend to try and make everything as similar to the CPU as possible
00:53 robclark: cwabbott, yeah, doesn't stop me from wishing someone said "that's a dumb idea, don't do that" :-P
00:53 cwabbott: yeah, well, the pressure is there
00:54 cwabbott: KHR_variable_pointers in Vulkan came from people wanting to do that
00:54 cwabbott: i think it's kinda inevitable eventually
00:55 robclark: since naive way is driver to lower to if (is_local(ptr)) { ... } else if (is_global(ptr)) { ... } ..., and something needing to figure out how to clean that mess up after inlining
00:55 cwabbott: it's like ye olde x86 segments, where people just worked around them
00:55 cwabbott: yeah
00:55 cwabbott: at least AMD has a universal address space, so they don't need to do that
00:56 cwabbott: you just setup a magic address range for LDS and scratch memory, and then the hw takes care of it
00:56 robclark: I guess either way for spirv_to_nir part, we want nir_deref_type_pointer.. after that a useful first step is to make that work with just global address space..
00:56 cwabbott: maybe newer qualcomm chips have that too
00:57 robclark: going past that without generating hideous code in backend seems painful
00:57 cwabbott: in the long run, i want to get rid of deref chains entirely
00:57 cwabbott: maybe nir_deref_type_pointer is a good way to start that process
00:58 robclark: fwiw, at least as of a3xx-a5xx there are different types of load instructions which do address calculating differently.. I think after a3xx (or possibly a4xx) local memory is no longer global memory with different address calculation.. I *think* something new is coming w/ a6xx but -ENOHARDWARE yet to get cmdstream traces..
00:59 robclark: a3xx you had to setup register w/ pointer to buffer to use as "local" memory but that doesnt' exist anymore by a5xx
01:00 robclark: (so I *guess* on a5xx it uses gmem for "local" memory)
01:00 cwabbott: btw, i think that the gunk to implement fat pointers can be mostly done in NIR
01:00 cwabbott: so the backend itself doesn't have to change much
01:01 robclark: yeah, it just means you need to be able to resolve what sort of ptr a fat ptr is..
02:56 orbea: I was watching a video with mpv and my xorg froze as I changed from windowed to full screen (without hwdec) and I was able to recover by killing mpv via ssh. dmesg printed these nouveau messages at the bottom, any ideas? http://dpaste.com/1XBB1YD This is with a 4.14.14 kernel
03:04 orbea: hmm, i wonder if its more than one mpv instance running at once...
07:06 karolherbst: pmoreau: you didn't send out all patches
08:15 tomeu: robclark, karolherbst, pmoreau: I don't have any code that I haven't released, and I haven't done anything relevant that I haven't described in the mls
08:15 tomeu: also have no immediate plans of hacking on anything, and if I find some time to do so, will first ask around to not step any toes
08:18 karolherbst: tomeu: sounds good
08:49 pmoreau: karolherbst: Yeah, cause 1) the ML server ran out of memory and couldn’t deliver patches 11, 12 and 13, and 2) either Git, my mail provider, or some greylisting somewhere as been preventing me from sending patches 16 through 22.
08:50 karolherbst: :(
08:53 pmoreau: Okay, looks like the sending issue is gone, patches 16 to 22 incoming.
08:54 pmoreau: Hopefully gabe got its memory back
08:55 pmoreau: Could, that worked out! I’ll resend 11, 12 and 13 to the ML.
08:55 pmoreau: s/Could/Cool
08:55 pmoreau:hasn’t completely woken up it seems --"
08:56 pmoreau: karolherbst: Thanks for the reviews! I’ll have a look at your other replies once I get to work.
08:58 karolherbst: :)
09:40 karolherbst: pmoreau: c5 gets lowered to c7+g?
09:59 karolherbst: but anyway, c0[] is used with your stuff
10:00 karolherbst: I never got around mapping those weird PUSH commands to the actualy const buff :(
13:41 karolherbst: pmoreau: you can't do 64bit loads from cX[]
13:42 karolherbst: they have to be all 32bits
16:48 pmoreau: karolherbst: I’d have to double check on that (64-bit loads from cX[]).
16:48 karolherbst: pmoreau: this is due to cX[] being used like immediates in instructions
16:48 karolherbst: pmoreau: and like 32 bit muls could just get them loaded in
16:48 karolherbst: so you end up with mul %r0 %r1 c0[0x40]
16:48 karolherbst: allthough c0[0x40] is a 64 bit value ;)
16:49 karolherbst: most of the time you are safe
16:49 karolherbst: but when you are not, you never find the issue ;)
16:52 pmoreau: I added a check for sub 32-bit load folding, cause that one was being annoying as well
16:53 pmoreau: https://phabricator.pmoreau.org/rMESAfdbf5da4c28a3185fb2bbcc872c95df382bfd934
16:54 pmoreau: As for 64-bit loads, well the code should properly split it when splitting the 64-bit op
18:48 Lyude: I think suspend/resume just got broken by the latest 4.15 rc
18:48 Lyude: erm, for kepler
18:49 Lyude: will try to bisect in a biit
18:49 Lyude: *bit
21:54 Lyude: ah, kepler suspend/resume got broke because someone broke the irq handler in the kernel
21:54 Lyude: nice
21:57 karolherbst: Lyude: sigh
22:02 Lyude: karolherbst: mhm, sent them a message (although I realize I probably should have cc'd the nouveau mailing list as well, oh well)
22:02 Lyude: skeggsb_: jfyi ^
22:02 karolherbst: :)
22:02 karolherbst: Lyude: do you have my IRQ patch though?
22:02 Lyude: Tracked it down to irq/matrix: Spread interrupts on allocation (a0c9259dc4e1923a98356967ce8b732da1979df8)
22:02 karolherbst: Lyude: https://github.com/skeggsb/nouveau/commit/90ace7d2acba46fb3ba02eb3a3880438703f5cac
22:03 Lyude: let me see what that does
22:03 karolherbst: it should be on bens tree already
22:03 karolherbst: but maybe you don't have it and it would improve the situation
22:03 karolherbst: Lyude: do you have the issue that on resume you don't get interrupts?
22:04 karolherbst: meaning secure boot and anything else really fails as well?
22:04 Lyude: karolherbst: actually that patch seems to already be in my branch
22:04 Lyude: this is kernel.org master btw
22:05 Lyude: [ 31.255563] do_IRQ: 1.35 No irq handler for vector
22:05 Lyude: Sad.
22:05 karolherbst: uhh
22:05 karolherbst: weird
22:05 Lyude: on the plus side
22:05 karolherbst: try config=NvMSI=0?
22:05 Lyude: nouveau's power consumption is a lot better with the gr hung
22:05 Lyude: k, sec
22:05 karolherbst: :D
22:07 Lyude: ugh, yes it does
22:07 Lyude: erm, it resumes with MSI disabled]
22:08 karolherbst: I think we have a bug regarding the non MSI IRQ if we do MSI
22:08 karolherbst: but I am not quite sure
22:09 Lyude: lemme see if I can figure this out
22:09 karolherbst: anyway, when I rmmod nouveau, sometimes the kernel gets upset and disables that non MSI IRQ
22:09 skeggsb_: that's just a case of us not disabling some gpu interrupt properly, and when we disable msi and it triggers..
22:09 skeggsb_: solution: find what we're not disabling fully and fix it
22:10 Lyude: alright
22:12 Lyude: skeggsb_: btw, it should probably also be noted that the gr also hangs when this happens
22:13 skeggsb_: Lyude: i'm talking about what karol mentioned
22:13 Lyude: oh, thought you were responding to my bug
22:21 karolherbst: Lyude: yeah, if you get no interrupts you also get nothing from the falcons
22:22 Lyude: is there a sysfs/procfs node somewhere that can be used for listing msi IRQs like we do with normal interrupts? (/proc/interrupts)
22:22 Lyude: and karolherbst mhm, that makes sense, although we are getting interrupts here
22:22 Lyude: the problem seems to be that the handler gets disassociated with the interrupt
22:22 Lyude: unless that's what you meant
22:22 karolherbst: uhh
22:22 karolherbst: that sounds more serious
22:22 Lyude: yes it does
22:23 Lyude: that's why I sent a message to the ML the moment I saw it :P
22:24 Lyude: the patxh that broke this was in the kernel's irq handler after all...
22:24 Lyude: *patch
22:32 Lyude: hm, but that irq doesn't seem to even be allocated by nouveau beforehand
22:46 airlied: /proc/interrupts also has msi I thought
22:50 Lyude: yes it does
22:50 Lyude: i realized that a little bit ago, it's just that the msi vector this is coming from isn't listed there
22:53 airlied: that sounds like a bug then
23:19 Lyude: weird, so even with that patch reverted and the IRQs working it doesn't seem to come up there