00:06 karolherbst: mhhh. when I add a "*vec4(1.0, 0.0, 0.0, 1.0)" I don't get any errors either...
00:10 karolherbst: imirkin_: where is the code to translate an if/else branch into a predicate?
00:11 imirkin_: post-ra
00:11 imirkin_: i forget the name...
00:11 karolherbst: flattening?
00:11 imirkin_: yes
00:11 karolherbst: ahh tryPredicateConditional I guess
00:12 imirkin_: the target specifies whether something is predicatable or not
00:12 karolherbst: I essentially just want to adjust the threshold
00:13 karolherbst: ohh, I think I found it
00:14 karolherbst: cool
00:14 karolherbst: imirkin_: okay, here is the thing: if I always convert that to predicates, it doesn't show the error
00:15 karolherbst: now... why is that
00:15 imirkin_: coz our stack isn't big enough for all the warps
00:15 imirkin_: or ... something
00:15 karolherbst: unlikely
00:15 karolherbst: it's not _that_ deep I guess
00:15 karolherbst: the execution
00:15 karolherbst: it's one loop
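(Conceptually, this is what converting a small if/else to predicates buys at the source level. A minimal, illustrative GLSL sketch with hypothetical useRed/texColor values; the actual nouveau pass works on the nvir, not on GLSL.)

    // Branchy form: emits a divergent bra/join around the two sides.
    vec4 color;
    if (useRed)
        color = vec4(1.0, 0.0, 0.0, 1.0);
    else
        color = texColor;

    // Predicated/flattened form: both operands exist unconditionally and the
    // condition only selects the result, so no control flow (and no control
    // stack traffic) is needed.
    vec4 color2 = mix(texColor, vec4(1.0, 0.0, 0.0, 1.0), float(useRed));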
00:16 prOMiNd: is anyone using windows currently with nvidia pascal inside?
00:16 karolherbst: imirkin_: or does a "bra" have any side effect like that?
00:17 karolherbst: ohhh wait....
00:17 karolherbst: mhhh
00:17 karolherbst: I have a super bad feeling about that
00:21 karolherbst: imirkin_: meh.... that's it
00:21 karolherbst: the more iterations we do, the more errors I get in dmesg
00:22 karolherbst: 0-7: no block trapped, 8-9: one block trapped
00:22 karolherbst: and so on
00:23 karolherbst: 150: half the blocks trapped :)
00:24 karolherbst: okay...
00:24 karolherbst: imirkin_: what should we do about that?
00:29 karolherbst: but then again, why would flattening help here? :/
00:34 karolherbst: mhhh, I think I see it now :/ annoying
00:36 karolherbst: imirkin_: I guess we have to sync on each loop iteration?
00:37 karolherbst: well, if that loop contains an if
00:38 HdkR: karolherbst: What if the loop contains an if, a break, and a continue?
00:38 HdkR: We just need goto added to glsl to really hate things
00:39 imirkin_: karolherbst: no
00:39 karolherbst: HdkR: it's a for() {... if() .. return; } thing
00:40 karolherbst: HdkR: but yeah :/ break and continue make it really hard
00:40 HdkR: karolherbst: What are you doing specifically? I'm out of the loop here
00:41 imirkin_: trying to make the gpu not produce errors
00:41 imirkin_: same as always :)
00:41 karolherbst: HdkR: apparently, if we do too many predicate bars, the gpu gets upset
00:41 HdkR: pfft, Nvidia hasn't even solved that issue yet
00:41 karolherbst: imirkin_: well, the rendering is incorrect
00:41 imirkin_: meh, that's secondary
00:41 karolherbst: mhh well, incorrect here means the output is black :D
00:41 HdkR: predicate bars or predicate bras?
00:41 karolherbst: HdkR: bras
00:42 HdkR: Try putting a block of nops after all of them
00:42 karolherbst: :(
00:42 HdkR: (If that fixes it then I hate everything)
00:42 imirkin_: i hate solutions like that.
00:43 imirkin_: they make me lose faith in logic
00:43 HdkR: It would give you an idea of what to investigate at least
00:43 karolherbst: HdkR: I'll show you both versions, then you decide for yourself :p
00:43 HdkR: Which GPU btw?
00:44 karolherbst: HdkR: https://gist.github.com/karolherbst/389a7cfd703a3419a641324e0dc266f5
00:44 karolherbst: HdkR: gm204
00:44 HdkR: ah. Try my nop thing :P
00:44 karolherbst: diff them
00:44 imirkin_: so says the oracle
00:45 karolherbst: HdkR: I like my workaround better than your nop thing :p
00:45 karolherbst: obviously
00:45 HdkR: lol
00:45 imirkin_: sounds like you've been working on adreno or something
00:45 HdkR: pfft
00:45 imirkin_: that's where throwing in nops solves everything
00:46 karolherbst: :D
00:46 HdkR: Who needs branching anyway
00:46 HdkR: Overrated
00:46 imirkin_: along with (ss)(sy) or something
00:46 HdkR: nop with side effects is great
00:46 karolherbst: that's not a nop
00:46 imirkin_: HdkR: that's how you do a plain join on nvidia pre-Maxwell
00:46 imirkin_: NOP.S :)
00:47 imirkin_: maxwell has its own "SYNC" now
00:47 karolherbst: ohh, right, that thing
00:47 HdkR: Tell that to IBM and MIPS who both have NOP instructions that have side effects...
00:47 karolherbst: mhh, still how do we solve this issue? can we add more stack or something?
00:47 karolherbst: HdkR: I hope those are undocumented
00:47 HdkR: they're documented
00:47 karolherbst: meh, no fun then
00:51 karolherbst: imirkin_: maybe we should always flatten like that inside a loop? as you basically have to take the entire loop into account as well, no?
00:52 karolherbst: mhhh
00:52 karolherbst: only if the bra is actually the tail of the loop
00:52 karolherbst: because that's where diverging branches really hurt
00:52 HdkR: Is it only incorrect rendering or a fault?
00:52 karolherbst: HdkR: fault
00:53 karolherbst: I guess too many threads diverged
00:53 HdkR: Gone past 32, can't go anymore :P
00:53 karolherbst: because you don't diverge on a simple if/else thing
00:53 karolherbst: but on the entire loop
00:53 karolherbst: so you diverge again, on the next iteration
00:53 karolherbst: and again on the next one ;)
00:54 karolherbst: HdkR: that would mean that 32 iterations is somehow the threshold
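(The shape karolherbst is describing, as a minimal GLSL sketch with made-up names rather than the actual shader. The point is that the divergence is not the single if: threads that take the early return sit idle while the rest keep iterating, so the warp stays split for every remaining iteration.)

    for (int i = 0; i < iterations; i++) {
        float d = stepFn(p);            // hypothetical helper
        if (d < 0.001) {
            fragColor = shade(p);       // some threads exit the loop here...
            return;
        }
        p += d * dir;                   // ...the rest keep looping, so the
    }                                   // split persists until the loop ends
    fragColor = vec4(0.0);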
00:58 karolherbst: annoying issue :/
00:58 karolherbst: HdkR: so, what's your suggestion here?
01:00 HdkR: I mean, 32 obviously isn't the limit otherwise the blob would break on super divergent cuda kernels constantly :P
01:01 karolherbst: obviously
01:01 karolherbst: or the compiler is good enough to not run into that issue
01:02 HdkR: I mean. That's impossible
01:02 HdkR: You can literally use subroutines to break out to fully divergent in one instruction :P
01:02 karolherbst: well, even 500 iterations are fine for the working shader
01:02 karolherbst: although there are conditional breaks ;)
01:04 HdkR: Oh. So if you lower the number of times the loop goes around then it doesn't fault?
01:04 karolherbst: obviously
01:04 HdkR: Didn't notice it was a long running loop
01:04 karolherbst: 7 I think is the highest we can go
01:05 HdkR: I see
01:06 HdkR: What sort of fault is it? Memory fault?
01:06 karolherbst: dunno
01:06 karolherbst: MULTIPLE_WARP_ERRORS is all I get
01:07 HdkR: lol, no idea what that even means
01:10 skeggsb: karolherbst: you don't get another (possibly untranslated) code to go with that?
01:16 imirkin_: usually there's multiple codes
01:16 imirkin_: when you have multiple errors
01:16 imirkin_: it'll return a first error
01:17 imirkin_: and then say "hey, ps, there were more"
01:17 karolherbst: skeggsb: the gist above
01:18 karolherbst: uhm...
01:18 karolherbst: or didn't I post it?
01:18 HdkR: Is it multiple warps hitting a memory error? :P
01:18 karolherbst: skeggsb: https://gist.github.com/karolherbst/d24a9a5ac387cd2bdd221efce407c143
01:18 karolherbst: skeggsb: the if inside the loop inside effect
01:19 skeggsb: karolherbst: i meant, in the trap errors in dmesg
01:19 skeggsb: you usually get something along with MULTIPLE_WARP_ERRORS
01:19 karolherbst: yeah... nothing useful
01:19 karolherbst: "gr: GPC0/TPC2/MP trap: global 00000000 [] warp 50016 []"
01:19 karolherbst: "gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 70016 []"
01:19 karolherbst: "gr: TRAP ch 13 [017dcf7000 X[28858]]"
01:19 karolherbst: that is essentially everything
02:04 karolherbst: skeggsb: any idea why there is no more specific error?
02:05 karolherbst: I mean, we kind of know what that's all about, but...
02:19 skeggsb: well, i actually can't see any evidence in any source i have that 0x16 should even exist :P
02:20 karolherbst: ohh, so that's actually an error code?
02:20 karolherbst: thought that was simply the warp id
02:20 skeggsb: yes
02:22 imirkin: skeggsb: you should argue with the GPU
02:22 imirkin: see who wins that argument...
02:22 skeggsb: haha
02:23 imirkin: i try to sync up our error codes with the public nvgpu ones every so often
02:23 imirkin: but it's not like nvgpu is complete
02:23 skeggsb: nvgpu was one of the places i looked
02:23 imirkin: and they change around from gen to gen, i think
02:24 imirkin: skeggsb: btw, do you have any specific criteria for doing a xf86-video-nouveau release?
02:24 skeggsb: that group is pretty consistent, at least, changes are in compatible ways
02:24 skeggsb: i don't really care about it at all anymore, so, as you will :P
02:24 imirkin: k, sounds good
02:25 imirkin: i want to figure out why X crashes in the teardown path (which affects gdm, or when the last X client leaves)
02:25 imirkin: and then i'll do a release
02:26 imirkin: (only happens with rotated screens, naturally)
06:41 imirkin: karolherbst: second attempt: https://github.com/imirkin/mesa/commit/ef5417ecf714a996bf937c057813f001fe572bc9.patch
06:41 imirkin: should compile now =]
13:50 karolherbst: imirkin: yeah, seems to work :) Will run it system wide a bit and see how well that works out
15:42 cosurgi: imirkin: running htop with a refresh rate of 10 times per second seriously hinders cursor movement in gvim, even with :set lazyredraw.
15:43 cosurgi: I will try compiling mesa later. Now I've got some work to do :)
15:44 glisse: imirkin: oh btw, just so you know the outcome: the dma map on the ppc was ok and i can read/write through 0x1700. the issue is kexec; the firmware on this thing is actually a linux kernel, and that linux kernel loads nouveau successfully, but then it kexecs the kernel of the installed OS and leaves the nvidia gpu in a bad state, which leads to the OS kernel's nouveau driver failing in weird ways
15:45 glisse: solution is proper GPU reset ...
15:45 glisse: thanks a lot for helping me
16:00 imirkin_: cosurgi: "don't do that"
16:01 imirkin_: glisse: awesome :) nouveau + kexec = often-fail, i believe
16:01 imirkin_: cosurgi: iirc last i looked into it, some stuff was not being accelerated by EXA
16:02 imirkin_: specifically related to things xterm was doing
16:32 cosurgi: imirkin_: you mean that mesa will not help xterm ?
16:37 cosurgi: btw, those llvmpipe processes are software rendering? I see plenty of those. My program is also spawning a lot of them. And it displays OpenGL.
16:41 imirkin_: cosurgi: it will not.
16:41 imirkin_: yes, llvmpipe is software rendering
16:41 imirkin_: the shader is converted into llvm ir, which is then compiled for the target architecture
16:41 imirkin_: (and the rast/etc is also done using llvm ir)
17:00 cosurgi: ok. I see. Then without mesa let's enjoy the improved stability :)
17:08 imirkin_: and woe be unto you if you're on s390, since there's no llvm backend for that
17:11 cosurgi: :)
18:38 karolherbst: imirkin_: mhh, did you by any chance come up with a random idea on how to solve that branching issue from yesterday? somehow I don't have anything better than preventing conditional branching within the loop if the paths don't converge later inside the loop iteration
18:38 karolherbst: somehow a conditional break is less bad than a branch inside a loop
18:38 karolherbst: or a break converges on the prebreak BB
18:39 karolherbst: dunno
18:39 imirkin_: i don't have a clean understanding of how branching works
18:39 imirkin_: i believe mwk has attempted to explain it to me a few times
18:39 karolherbst: yeah me neither though :/
18:39 imirkin_: with little effect
18:39 imirkin_: i believe RSpliet also has a clear understanding of it
18:39 imirkin_: not to mention HdkR ... who hoards knowledge
18:40 imirkin_: and doesn't share it :)
18:40 karolherbst: all I know is that the shader with 1 conditional branch, 1 branch, and 1 break doesn't really work out, whereas the shader with a conditional break and a branch kind of does
18:40 karolherbst: bad HdkR
18:40 karolherbst: come to the good side HdkR, all information must be free!
18:40 RSpliet: karolherbst: what's the problem?
18:41 imirkin_: the problem is that we don't understand how the gpu does branches
18:41 karolherbst: RSpliet: traps inside a nested loop. hold on a moment, I grab the shaders
18:41 imirkin_: which in turn leads to later confusion
18:41 RSpliet: imirkin_: there's a patent for that!
18:41 imirkin_: later confusion? i invented it first!
18:41 karolherbst: RSpliet: https://gist.github.com/karolherbst/389a7cfd703a3419a641324e0dc266f5
18:41 karolherbst: there are the shaders
18:41 karolherbst: the broken one causes traps
18:41 karolherbst: the working one doesn't
18:41 HdkR: karolherbst: I gave you an idea to try at least :P
18:41 karolherbst: can you explain to us why that is?
18:42 imirkin_: RSpliet: basically ... the call stack, etc
18:42 karolherbst: HdkR: I know, but you know, we have to fix our compiler so that we don't have to rely on optimizations
18:42 karolherbst: HdkR: also, we don't do nops
18:42 karolherbst: :p
18:42 HdkR: karolherbst: The idea was for testing. Not a final solution
18:42 karolherbst: and I am sure neither would nvidia with opts disabled
18:42 imirkin_: when it's touched, how the gpu handles divergence, etc
18:42 karolherbst: HdkR: right, but for testing I have my other solution, which seems to work
18:42 karolherbst: RSpliet: you can diff those easily
18:42 imirkin_: i only have a vague understanding, nothing concrete
18:43 RSpliet: karolherbst: do you have shaders that have branch targets filled in rather than organised per-BB? This is a bit tricky to read
18:43 karolherbst: RSpliet: diff them, that's fully enough I think
18:43 HdkR: Sure, but mine would let me laugh if it made the problem go away
18:43 karolherbst: RSpliet: that's the glsl code: https://gist.github.com/karolherbst/d24a9a5ac387cd2bdd221efce407c143
18:45 karolherbst: the trap we get is something like "gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 70016 []" but we have no idea what 0x16 means...
18:46 RSpliet: Ok, so first of all the fact that the working one works doesn't mean it's correct :-P But let me see...
18:47 karolherbst: of course not, but at least it doesn't cause a trap even for a lot of iterations
18:47 karolherbst: the broken one bails out after a few iterations within the nested loop
18:47 karolherbst: (2*7)^2 or something is what we can do safely
18:48 RSpliet: You... might be overflowing the control stack then
18:48 karolherbst: probably
18:48 RSpliet: the reason why I asked for a fully compiled shader is because I can't easily see branch targets in a linear fashion. Anyway
18:48 karolherbst: anyway, it's a for() { for() { if () break; } } thing
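(A rough source-level sketch of the two inner-loop shapes being compared, with illustrative names rather than the actual gist contents. Per the discussion below, the first shape is fine because every thread that leaves the loop does so through a brk; the second only works if both sides of the divergent if eventually hit a brk.)

    // "Working" shape: the break itself is the conditional instruction.
    for (int j = 0; j < INNER; j++) {
        if (cond(j))
            break;
        accum += contribution(j);
    }

    // "Broken" shape (roughly what the failing codegen amounts to): a
    // divergent bra around extra work, with the brk on only one path while
    // the other path keeps iterating.
    for (int j = 0; j < INNER; j++) {
        if (cond(j)) {
            accum += finalContribution(j);
            break;
        }
        accum += contribution(j);
    }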
18:48 RSpliet: The joinat BB:25 is not inside a loop presumably
18:49 karolherbst: right, should be before the loop
18:49 karolherbst: BB:10 is where the loop starts
18:49 RSpliet: So that pushes one entry onto the ctrl stack, with a "all threads" mask.
18:50 karolherbst: actually, BB:11 is where the loop starts
18:50 imirkin_: so what does one do in such a case? compile the shader differently? or larger stack? or?
18:50 karolherbst: RSpliet: let me find where I placed the shader and I could give you the full IR output
18:51 RSpliet: the not $p0 bra BB:35 at the end of BB:9 pushes the branch target BB:32 onto the control stack with some of the thread bits set, continues execution in BB:10 for the remaining threads
18:51 RSpliet: There's a prebrk in BB:10, pushing another entry on the stack. That's three.
18:53 RSpliet: karolherbst: Thanks, that's very helpful.
18:54 karolherbst: RSpliet: https://gist.github.com/karolherbst/d24a9a5ac387cd2bdd221efce407c143#file-z-ir
18:57 RSpliet: Oh hm, I was hoping for envytools disassembly, so with the BB targets replaced with addresses
18:57 RSpliet: The fact they're not linear messes with my head :-P
18:57 imirkin_: the non-sequential numbers getting confusing? :)
18:57 RSpliet: Yep
18:57 karolherbst: ohhh, I see
18:57 karolherbst: uhm.. I have the binary somewhere
18:58 RSpliet: Thanks
18:59 RSpliet: I don't think break in, like, the Kepler ISA is supposed to take an address... only prebrk.
19:00 karolherbst: RSpliet: https://gist.githubusercontent.com/karolherbst/d24a9a5ac387cd2bdd221efce407c143/raw/8ce582d34d2dd50dbb8b7773eb380b19842cd055/zsd.envy
19:00 imirkin_: correct.
19:00 RSpliet: karolherbst: thanks!
19:00 imirkin_: break just pops + jumps to the stack addr
19:00 karolherbst: RSpliet: I am not 100% sure it's the same one, but like 95% sure
19:01 RSpliet: Ah, so many sched codes, this must be maxwell+
19:01 karolherbst: imirkin_: ohh, yesterday, that reminded me of: https://github.com/envytools/envytools/pull/151
19:02 karolherbst: are you fine with the current version?
19:02 karolherbst: RSpliet: it is :p
19:05 RSpliet: karolherbst: your problem is probably 00000290: 0b88000f e2400000 not $p0 bra 0x350
19:05 karolherbst: yep
19:05 karolherbst: did you diff both IRs?
19:05 karolherbst: kind of makes it obvious it is the issue
19:05 karolherbst: the question is: why?
19:06 RSpliet: Because it allows some threads to not "break" out of the loop between 1f0 and 348
19:06 karolherbst: okay
19:06 RSpliet: As a result you don't pop the prebrk entry
19:06 karolherbst: so a conditional break is fine within a nested loop
19:06 karolherbst: but a conditional bra + break isn't?
19:06 RSpliet: A conditional branch is fine, but only if both branches end with a brk somehow
19:06 karolherbst: ohh
19:07 karolherbst: okay, that makes sense
19:07 karolherbst: which means we either have to converge both paths within the same iteration
19:07 karolherbst: meaning we move the break out of the one path and make it conditional after converging
19:07 RSpliet: It's a bit more tricky
19:08 RSpliet: because... the branch target has an unconditional branch to 1f0
19:08 karolherbst: yeah, but this we can't change
19:08 karolherbst: but I meant before
19:08 karolherbst: we have to converge before that bra
19:08 karolherbst: basically, move the brk out of the BB
19:08 karolherbst: move it before the bra 0x1f0
19:08 karolherbst: and execute it conditional based on $p0
19:09 karolherbst: the bra 0x1f0 is essentially the loop's continue
19:10 RSpliet: No, the thing is: the branch at 290 pushes the branch target 350 onto the stack and continues execution. When that hits 348, it will pop the top entry off the stack, because an unconditional brk disables everything that's still alive. That pops the branch target off together with the threads that followed that branch, and they continue execution at 350.
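(A sketch of the control-stack traffic RSpliet is describing, written as comments on the nested-loop shape. The mnemonics are the ones named in this discussion (ssy/pbk/bra/brk/sync); the labels, placement, and helper names are illustrative, not the real disassembly, and this reflects RSpliet's reading rather than documented behaviour.)

    // ssy JOIN                   push a sync entry (all-threads mask)
    for (int i = 0; i < OUTER; i++) {       // pbk AFTER_OUTER: push break entry
        for (int j = 0; j < INNER; j++) {   // pbk AFTER_INNER: push break entry
            if (cond(i, j)) {   // divergent bra: one set of threads is parked
                                //   on the stack with the branch target, the
                work(i, j);     //   other set continues here
                break;          // brk: disables the threads that reach it and
            }                   //   pops the *top* entry, i.e. the bra entry,
            more(i, j);         //   so the parked threads resume after the if;
        }                       //   the pbk entry only pops once every live
    }                           //   thread has executed a brk
    // sync JOIN                  pop back to the full mask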
19:12 karolherbst: okay... now I am wondering, why does removing that branch at 290 help if 298-348 are executed conditionally on $p0
19:12 karolherbst: what does that change that makes it possible to execute that shader?
19:13 karolherbst: or simply because nothing would get pushed onto the stack?
19:14 HdkR: Oh, something as boring as not pulling tokens back off the stack? I'm disappointed
19:14 HdkR: Logic error is sad D:
19:14 karolherbst: RSpliet: or let me rephrase the initial question: what do we have to do to make all that nice and legal
19:15 RSpliet: I'm trying to figure out exactly why this is illegal :-P
19:15 karolherbst: okay :D
19:18 RSpliet: Ok, so there's a different codepath that looks very suspicious
19:18 RSpliet: 158: conditional jump to 368. Jumps into the middle of the loop, but if taken consensually the prebrk hasn't been executed yet.
19:21 karolherbst: RSpliet: there are two loops
19:21 RSpliet: karolherbst: nested if I'm right
19:21 karolherbst: yeah
19:21 RSpliet: 158 is before the first prebrk (198), so before the outer loop
19:22 karolherbst: RSpliet: I can play around a little, like forcing the amount of iterations and stuff like that
19:22 RSpliet: Yet it jumps into the tail of the outer loop from the look of it
19:22 RSpliet: or that's what it looks like, given a few lines further there's a conditional brk without branching happening in between
19:22 karolherbst: RSpliet: well, function call ;)
19:22 RSpliet: followed by an unconditional 0x1c8
19:22 imirkin_: RSpliet: i think those are just weird BBs that get scheduled there
19:22 imirkin_: they don't jump to each other
19:22 imirkin_: they each jump somewhere else
19:23 karolherbst: RSpliet: 368 saves the return value
19:23 imirkin_: and then does a sync
19:23 imirkin_: which jumps to 3d8
19:23 imirkin_: which is the final bb
19:23 karolherbst: yep, 3d8 is after the func call
19:23 RSpliet: oh sync is joinat?
19:23 karolherbst: essentially
19:23 imirkin_: RSpliet: sync is join
19:23 RSpliet: pardon. "join"
19:23 RSpliet: yeah
19:23 imirkin_: ssy is joinat
19:24 RSpliet: I was under the impression sync was a barrier
19:24 imirkin_: that's bar
19:24 RSpliet: Could use one now
19:24 RSpliet: But in that case, false alarm
19:25 imirkin_: or put it another way, if sync is not "join", we're in deep trouble
19:27 karolherbst: imirkin_: btw, your new text patch seems to survive the CTS and to work with a few applications as well. Will do more testing though
19:28 karolherbst: crazy annoying bug btw, both actually
19:34 RSpliet: karolherbst: the higher level observation is that the GPU doesn't seem to like ending one side of that divergent branch with a "brk" rather than a sync.
19:36 imirkin_: iirc break pops the stack until it hits the right thing
19:36 imirkin_: maybe not
19:36 imirkin_: but if not, why have pbk/ssy?
19:36 imirkin_: i.e. why not just have one
19:37 RSpliet: imirkin_: there's three or four masks
19:37 RSpliet: a stack entry is annotated with the type "ctrl, break, ret" ... the fourth mask being exit - which you'll never find a stack entry for
19:37 imirkin_: famous last words
19:38 RSpliet: That way they make "ret" stick even if sandwiched between prebrk-brk
19:39 imirkin_: right
19:39 imirkin_: so ... i think it just pops until it finds the right type, no?
19:40 RSpliet: My understanding was that it just pops regardless. Restores whatever mask it popped, if now there's suddenly threads that can run, good, continue. Otherwise, pop again
19:40 imirkin_: ok, but if you hit a ret
19:41 imirkin_: and the thing on the stack is a join
19:41 imirkin_: it'll keep popping until it hits a preret, no?
19:42 RSpliet: I mean, I haven't experimented with this a lot, but I don't see why that should be necessary. Function call, two branches, both have a ret at the end. After hitting the first ret, you'd first want to execute the other branch before you sync them up and return control...
19:42 imirkin_: ohhhh
19:42 imirkin_: i see what you mean.
19:42 imirkin_: i was thinking about it from the single-thread perspective
19:42 imirkin_: but that's clearly a bogus way of looking at it
19:49 RSpliet: But if I'm honest with you, if that's the way it works, this code doesn't look wrong. It just implies that the brk implicitly does a join. But until every last thread has executed a brk, it will not pop the prebrk entry off.
19:55 RSpliet: Nor is there any risk of a stack overflow, unless the stack can only hold 4 entries...
19:58 HdkR: throw four prebrk at the start of a shader, see if it explodes? :)
19:59 imirkin_: 4 seems a wee bit low
19:59 HdkR: Maybe like 1000 and see if madness happens
19:59 imirkin_: even for my taste
19:59 RSpliet: We know how to set the stack size in the shader header right?
20:03 HdkR: `Minimum value 0, maximum 1 megabyte. Must be multiples of 512 bytes.` huh. 1MB maximum. TIL
20:04 RSpliet: Yeah, but there is (or used to be) a small hardware stack too
20:05 RSpliet: Can't recall how many entries it holds, but not many
20:05 HdkR: So 1MB plus a smallish amount
20:05 HdkR: :)
20:05 RSpliet: Well, we'd have to carve out that stack memory from what OpenCL calls local memory I think
20:05 RSpliet: E.g. make sure we don't use it for anything else.
20:06 RSpliet: Not sure if nouveau does any of that
20:08 pmoreau: OpenCL’s local memory being the slice of L1 that you have control over, right? (aka shared memory in CUDA)
20:12 karolherbst: pmoreau: yeah
20:12 karolherbst: please call it shared
20:15 pmoreau: Taking chunks of shared might be fine for graphics, but for compute that’s going to be more complicated as the user might use all of it.
20:18 RSpliet: pmoreau: that's the one
20:18 RSpliet: Prefer open standards and their terminology ;-)
20:20 pmoreau: RSpliet: So “shared”, like in OpenGL compute shaders? ;-)
20:20 RSpliet: pmoreau: thanks Khronos :')
20:21 RSpliet: "shared" is the one I had only heard before in a CUDA context, and it confuses the hell out of me. You share global memory right?
20:21 RSpliet: Anyway, tomato tomato
20:24 pmoreau: On the other hand “local” isn’t really local either, as it sits off next to the L1 rather than being the register file. But CUDA isn’t better on that either, as its “local” is the spill memory (in global).
20:25 RSpliet: Yeah, all terms suck in various degrees, but local/global speaks to my imagination, shared/.... doesn't :-P
20:26 pmoreau: I have the opposite problem, probably because I learned CUDA first.
20:27 RSpliet: Doesn't really matter what you learned first, I'm right and the whole world is wrong!
20:27 RSpliet: :-P
20:27 pmoreau: :-D
20:27 karolherbst: RSpliet: look from a workgroup point of view
20:28 karolherbst: then it makes all sense
20:28 pmoreau: RSpliet: You’re sounding like a grumpy old man, just saying! ;-p
20:28 karolherbst: shared: memory shared across all threads in a workgroup, global: accessible by everything, local: accessible from within a single thread ;)
20:28 karolherbst: that's how we name it inside codegen as well
20:29 karolherbst: and is essentially the only sane naming scheme :p
20:29 HdkR: shared lets you share your toys with your friends. You don't have many friends. Global lets you share everything to the world, like a facebook data leak
20:30 karolherbst: or we name it like spir-v
20:30 karolherbst: workgroup/crossworkgroup...
20:30 karolherbst: private
20:30 karolherbst: uhm
20:30 karolherbst: function I meant
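(The naming being juggled here, side by side in one minimal GLSL compute sketch; the buffer layout and sizes are illustrative.)

    #version 430
    layout(local_size_x = 64) in;

    // GLSL/SPIR-V "shared"/Workgroup == CUDA __shared__ == OpenCL "local":
    // one copy per workgroup, visible to every invocation in it.
    shared float tile[64];

    // OpenCL "global" / SPIR-V CrossWorkgroup: visible to everything.
    layout(std430, binding = 0) buffer Data { float data[]; };

    void main() {
        uint i = gl_LocalInvocationID.x;
        tile[i] = data[gl_GlobalInvocationID.x];   // global -> shared
        barrier();                                 // sync the workgroup
        float v = tile[63u - i];                   // v is thread-"local"/
        data[gl_GlobalInvocationID.x] = v;         // private/Function (a register)
    }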
22:38 RSpliet: pmoreau: well, you can't blame me for being grumpy, everything used to be better in my day...
23:01 imirkin: RSpliet: back when the kids didn't play on your lawn?
23:09 RSpliet: imirkin: back when I could afford a lawn on a normal salary
23:09 imirkin: ;)