02:23 pabs3: imirkin: Debian finally has Linux packages containing the 64-bit vblank fix \o/
02:23 imirkin: yay
02:23 imirkin: does it make your system work?
02:24 pabs3: I just rebooted into it
02:24 imirkin: the issues don't appear until you're a few weeks in though, right?
02:24 pabs3: well yesterday it took much less than that
02:25 pabs3: but yes, it takes longer than a few minutes
02:26 pabs3: yesterday I had to shut down for a power outage and then afterwards I hit the issue a few hours later
02:35 annadane: is this the same pabs as in debian channels?
02:35 pabs3: ya
05:14 pabs3: imirkin: hmm, now I am getting gnome-shell freezing while vlc is full-screen. the vblank issue does seem to be fixed though, since that doesn't break GL afterwards, and I can restart gnome-shell
09:46 martossa: https://devtalk.nvidia.com/default/topic/541356/opengl/slow-local-arrays-in-glsl/
09:47 martossa: so finally did you manage to understand a few, how glsl generates the arrays from high-level code during RA ?
09:49 martossa: this is why mentioned scheduling methods would have to be done either postRA, or you need to change the allocation scheme, if you want to do that in IR of some sort, it needs to have physical register notation/notion
09:56 martossa: the thing is when you want to trace some register in the handler, you need register and pointer addressspaces to be unified i.e colliding, but the register allocation don't do that by default in glsl
09:59 martossa: this is just unbelivable what kind of dorks you are!
10:13 martossa: hence only thing you need to be sure is to construct a large array of all the registers that the program uses ..upon initialization you do not read them but call it on the memory that does pagefault, otherwise it will take time to finish, say you have one warp running with max 256regs
10:14 martossa: you'd without the large array know what warp it is coming from, but would not know what the underlying reg was
10:19 martossa: what i am saying, world science and forums are full of this info, you act like arrogant conquerors but are complete douches, who block the ways, i no longer can not tolerate this!
10:20 martossa: i must face the facts by trying to help you i have constantly only wasted time!
10:21 martossa: cause you lack any sense and empathy whatsoever
10:27 martossa: why you want to construct a big arrays in machine code, is how pipelined processors like GPU inherently work, the hw only cares about what registers had changed values to schedule an execution of the instructions, pointer readbacks are done in private regs only, what will happen is that the instruction who reads some register in later stream will be hijacked in exec stage by the big array
10:27 martossa: so you can redirect pointers, change values or whatever, that is what arrays aka pointers are meant to do
10:33 martossa: both execute and readback pipeline stages are in-order queues, some might want to call them fifo's that is how verilog works inherently, they are evaluated with input feeds sequentially and executed in parallel in order, it only cares about writebacks in the order of issue or readback
10:33 martossa: and it will do the underlying functions aka exec in this case only when register in that warp had a value changed
10:34 martossa: similarly it will always try to readback unless scoreboard blocked the stuff
10:36 martossa: in other words, the inputs of the queue are processed in the order, the outputs can come out of order
10:36 martossa: it is because some instructions may skip cause of no change in private regs value, or scoreboard blocking the stuff
10:37 martossa: so to speak inorder readback out of order execution, out of order writebacks
10:38 martossa: execution even though it is in the order, cause some insts skip it appears as out of order
10:45 martossa: and why is an array even sequence of regs , cause in lsu stage pointer flops are broadcasted, and the hw allows whole-array operations on element0 of the pointer, in which case
10:50 martossa: or array, in which case, the array will run per all the marked registers per warp, and even though no writeback broadcasting is done
10:50 martossa: the addressspace captures the local writes and can do broadcasting manually
10:51 martossa: it is all documented, and of course it is the most flexible and sanest way to program the hw
11:04 martossa: manual broadcasting means that you have writeback reg in instruction cache stream, aka. accessing individual elements of the array
11:06 martossa: another example is, if you want to execute multiple pixels of 64fragments in parallel, there is a method to do that also, but this time it is no longer via writeback
11:06 martossa: broadcasting
11:22 mariossa: in this case the thing would be done by hijacking the warps private regs in execute stage, and regfile will flop the last value
11:22 mariossa: dont concentrate on this imirin troll, he is an handycap
11:24 mariossa: with born handycaps i.e was that breast milk deficiency or genetical disorder it's always annoying when such don't understand their issues and deny it
11:24 mariossa: all the real men arsenal constantly suffer about it
12:39 karolherbst: imirkin: currently fixing up my slct(a, b, set(c, 0)) patch, but the condcode handling for that is a bit annoying :(
12:41 karolherbst: especially things like S32 vs U32 on the slct sType
12:41 karolherbst: and int true returned by the set
12:43 imirkin: you could just not handle some cases
12:45 karolherbst: really painful is just slct with s32 sType, because u32 and f32 is basically the same if I just set sType to u32
12:45 karolherbst: maybe that will be enough
12:47 karolherbst: imirkin: uhm, random though, we have a slct.ge.u32(a, b, c) -> a opt?
12:47 karolherbst: *thought
12:47 karolherbst: or slct.lt.u32(a, b, c)
12:47 karolherbst: -> a
12:47 karolherbst: uhm
12:47 karolherbst: b
12:57 karolherbst: no difference yet
13:13 imirkin_: karolherbst: i doubt those ops can be generated
13:13 imirkin_: or ... i actually don't remember what SLCT does precisely
13:13 karolherbst: ?:
13:13 karolherbst: and compare against 0
13:13 imirkin_: right
13:14 imirkin_: i think there's only NE/EQ for the U32 variant
13:14 karolherbst: ahh
13:14 karolherbst: would make sense
13:14 imirkin_: :)
13:25 karolherbst: okay and my original patch was indeed wrong
13:26 karolherbst: maybe we also want to get rid of those "value %70 not uniquely defined" warnings
13:26 karolherbst: it should be caused inside from_tgsi somewhere
13:27 imirkin_: they're real issues.
13:27 imirkin_: the issue is that getUniqueInsn shouldn't be used at that point in time.
13:27 karolherbst: maybe
13:28 karolherbst: at least I never got those warnings with my nir stuff, but there I was careful about creating SSA values
13:28 imirkin_: :)
13:28 karolherbst: I think in from_tgsi something creates an SSA value and assigns it multiple times
13:28 karolherbst: should be fairly easy to fix
13:28 imirkin_: that's not how you trigger the problem
13:29 imirkin_: well - there are a few ways actually
13:29 imirkin_: perhaps this case is legit. would have to be investigated very carefully.
13:31 karolherbst: imirkin_: does this look correct to you? https://github.com/karolherbst/mesa/commit/ede4ecfd0a006f15df4dc1ba83e3bbe459ea4474
13:32 imirkin_: src2 is the thing that you're comparing against 0 right?
13:32 karolherbst: yes
13:32 karolherbst: I plan to check shaders which cases I should handle as well, but those were the obvious and less painful ones
13:33 imirkin_: sure. let me just read through it ...
13:33 imirkin_: i think you got it slightly backwards
13:34 imirkin_: + set->src(s ^ 1).getImmediate(imm0);
13:34 imirkin_: 
13:34 imirkin_: what's the point of all that?
13:34 imirkin_: you already loaded the immediate into imm0
13:34 karolherbst: if set->src(1).getImmediate(imm0) removes false, is imm0 still valid from the last call?
13:35 imirkin_: s/removes/returns/?
13:35 karolherbst: yes
13:35 imirkin_: if it returns false, you're going to return as per the "else" clause
13:36 karolherbst: actually yeah, the argument is only touched in the case there is an immediate
13:36 imirkin_: depends on the *set*'s *dType*
13:36 karolherbst: actually both
13:36 imirkin_: nope
13:36 imirkin_: only dtype
13:37 imirkin_: you can have ISET.BF -- stype is s32, but dtype is f32
13:37 karolherbst: why? slct.s32 and set.u32 vs slct.u32 and set.u32
13:37 karolherbst: although mhh
13:37 imirkin_: you're looking at what the set returns
13:38 imirkin_: the set returns based on its dtype, nothing else
13:38 karolherbst: okay sure, but the condition of the slct depends on its sType
13:38 karolherbst: although I hope I can get away by simple swapping sources
13:38 karolherbst: *simply
13:39 imirkin_: that is correct. but not what's written in the comment text.
13:39 karolherbst: right
13:39 imirkin_: anyways, the comment outlines all this stuff
13:39 imirkin_: but it seems completely tangential to what the code is doing
13:39 karolherbst: currently yes
13:40 imirkin_: i'm like 25% sure your logic is wrong, but i'd have to draw it out on paper to tell
13:40 karolherbst: I wrote this down: https://gist.githubusercontent.com/karolherbst/853b16390a345057e7066b3eeeb8ee52/raw/4dfc14e82e42cca7a158412a9946b87aa89dc0c2/gistfile1.txt
13:40 imirkin_: so instead i'm going to ask you to beef up your comment to have examples for each case
13:40 imirkin_: you might also do it as
13:40 imirkin_: if (set->cond == NE) {
13:40 imirkin_: if (slct == EQ || NE) { do stuff }
13:40 imirkin_: }
13:40 imirkin_: that way the cases are a bit simpler to read
13:41 imirkin_: your call though
13:43 karolherbst: if (slct->cond == EQ || NE) { if (set->cond == NE) {} else if (set->cond == EQ) {} }
13:43 imirkin_: however you want to group it.
13:43 imirkin_: the current cross-product is a bit hard to read
13:48 karolherbst: imirkin_: https://github.com/karolherbst/mesa/commit/c5dc65e09f3474c28bfbdbf56694a687641f3e8e
13:48 imirkin_: heh
13:49 imirkin_: yeah, that's not better at all.
13:49 imirkin_: sorry
13:49 imirkin_: undo.
13:49 imirkin_: how about this
13:49 imirkin_: /* ((a != 0) == 0 ? d : e) == (a == 0 ? d : e) */
13:49 imirkin_: if (the-cond) { stuff }
13:49 imirkin_: /* ((a != 0) != 0 ? d : e) == (a != 0 ? d : e) */
13:49 imirkin_: if (the-cond) { stuff }
13:49 imirkin_: a bit more verbose, but should be a whole lot clearer
13:49 karolherbst: true
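
The two identities in the comments above can be checked with a small model. This is only a sketch under assumptions: `setNe0`, `slctEq` and `slctNe` are hypothetical stand-ins (not nv50_ir functions), and the set is assumed to return all-ones for true and 0 for false, as an integer-typed set would.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of set.ne(a, 0) with an integer result:
// all-ones for true, 0 for false (assumed encoding).
static uint32_t setNe0(uint32_t a) { return a != 0 ? 0xffffffffu : 0u; }

// Hypothetical models of slct: pick d or e by comparing c against 0.
static uint32_t slctEq(uint32_t d, uint32_t e, uint32_t c) { return c == 0 ? d : e; }
static uint32_t slctNe(uint32_t d, uint32_t e, uint32_t c) { return c != 0 ? d : e; }

// True if folding the set into the slct keeps the result for this input.
static bool identitiesHold(uint32_t a, uint32_t d, uint32_t e) {
    // ((a != 0) == 0 ? d : e) == (a == 0 ? d : e)
    bool eqCase = slctEq(d, e, setNe0(a)) == slctEq(d, e, a);
    // ((a != 0) != 0 ? d : e) == (a != 0 ? d : e)
    bool neCase = slctNe(d, e, setNe0(a)) == slctNe(d, e, a);
    return eqCase && neCase;
}

// Small usage example covering zero, a positive value and the sign bit.
static void exampleIdentities() {
    assert(identitiesHold(0u, 1u, 2u));
    assert(identitiesHold(5u, 1u, 2u));
    assert(identitiesHold(0x80000000u, 1u, 2u));
}
```

Both identities only rely on "zero" versus "not zero", which is why the set can be dropped when the slct compares with EQ or NE.
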
13:53 karolherbst: imirkin_: https://github.com/karolherbst/mesa/commit/552e61decebdcbdbbdff9e343e5880b0eefc4dd7
13:57 imirkin_: oh yeah! much clearer!
13:57 imirkin_: + if (!imm0.isInteger(0))
13:57 imirkin_: shouldn't need that...
13:58 imirkin_: you already check for it above
13:59 imirkin_: ok, so now the only rub is
13:59 imirkin_: this won't work if the OP_SET has a 64-bit sType
13:59 imirkin_: and if the OP_SET has a sType of f32, i think you should change the sType of the slct to f32 as well
13:59 imirkin_: since 0x80000000 == 0x00000000 in f32-land but not u32-land
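
The point about 0x80000000 is that negative zero compares equal to zero as a float, while its raw bits are nonzero as an integer. A minimal illustration, with hypothetical helper names and assuming IEEE 754 single precision:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a 32-bit float as an unsigned integer.
static uint32_t bitsOf(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// "Is zero" as a float compare sees it.
static bool isZeroF32(float f) { return f == 0.0f; }

// "Is zero" as an integer compare on the raw bits sees it.
static bool isZeroU32(float f) { return bitsOf(f) == 0u; }

// Usage example: -0.0f is zero for the float test but not for the integer test.
static void exampleNegativeZero() {
    assert(bitsOf(-0.0f) == 0x80000000u);
    assert(isZeroF32(-0.0f));
    assert(!isZeroU32(-0.0f));
}
```

So a slct that inherits a compare-against-zero from an f32 set has to test with the f32 type too, otherwise -0.0 selects the wrong source.
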
14:09 karolherbst: imirkin_: can't I simply set the sType of the slct to the set's sType?
14:09 imirkin_: not if the set's stype is 64-bit
14:09 imirkin_: coz there is no 64-bit slct
14:09 karolherbst: uhh, I see
14:10 imirkin_: you should just check that typeSizeof(setInsn->sType) == 4
14:34 karolherbst: setting the stype further improved the code, nice
15:26 karolherbst: mhh, some mad max shaders have set.u32.lt(a, 0) + slct.u32.ne
15:27 karolherbst: uhm
15:27 karolherbst: that set's sType is f32 actually
15:34 imirkin_: so ... FSET.LT ?
15:52 karolherbst: imirkin_: yeah
15:52 karolherbst: but with int result
15:52 imirkin_: sure
15:52 imirkin_: that can become SLCT.F32.LT then
15:53 imirkin_: which is good
15:57 karolherbst: do I actually have to require set->CC->dType == slct->sType? like what happens for a f32 true put into a slct.u32? false?
15:59 karolherbst: or is it actually enough if any bit is set?
16:02 karolherbst: ohh wait, ... I am stupid :D
16:11 karolherbst: :/ https://gist.github.com/karolherbst/9bdec0e4efe7840d08cd090612b58c2f
16:19 imirkin_: doesn't each opt run to a fixed point?
16:19 imirkin_: or no, it runs twice or something
16:20 imirkin_: or are you forgetting to report progress?
16:20 imirkin_: i forget how it all works
16:20 imirkin_: or you can stick a little while loop there
16:20 karolherbst: I think there are different opts involved actually..
16:20 imirkin_: while insn->op == OP_SET :)
16:20 karolherbst: ohhh
16:20 karolherbst: maybe that helps indeed
16:21 imirkin_: there's a helper in the const folder
16:21 imirkin_: which finds the "ultimate" set or something
16:21 imirkin_: when it's set of set of set of set
16:21 imirkin_: you should be able to use that here too
16:22 imirkin_: ConstantFolding::findOriginForTestWithZero
16:22 imirkin_: this "sees" through a lot of stuff
16:23 imirkin_: including shit like AND(1.0, SET())
16:23 imirkin_: are you sure your thing belongs in AlgebraicOpt btw?
16:23 imirkin_: and not in ConstantFolding
16:27 karolherbst: mhh, maybe it would be better to put that in ConstantFolding
16:30 karolherbst: imirkin_: problem is, ConstantFolding kind of starts with the current instruction having an immediate
16:30 imirkin_: oh right
16:30 imirkin_: you don't have an immediate
16:30 imirkin_: yeah ok
16:30 imirkin_: you belong in Algebraic
16:30 karolherbst: but that set looping might be a good option in the end
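
The "set of set of set" looping can be sketched on a toy IR. This is not `ConstantFolding::findOriginForTestWithZero` and does not use the nv50_ir classes; `Insn`, `Origin` and `findOrigin` are hypothetical, and only compare-against-zero sets are modelled.

```cpp
#include <cassert>

// Hypothetical toy IR: a plain value, or a set comparing its source against 0.
enum Op { OP_VALUE, OP_SET_NE0, OP_SET_EQ0 };

struct Insn {
    Op op;
    const Insn *src; // nullptr for OP_VALUE
};

struct Origin {
    const Insn *value; // the value ultimately being tested against zero
    bool inverted;     // true if "x != 0" really means "value == 0"
};

// Walk through chained sets; every set.eq(x, 0) flips the polarity of the test.
static Origin findOrigin(const Insn *insn) {
    bool inverted = false;
    while (insn->op != OP_VALUE) {
        if (insn->op == OP_SET_EQ0)
            inverted = !inverted;
        insn = insn->src;
    }
    return { insn, inverted };
}

// Usage example: set.eq(set.eq(set.ne(a, 0), 0), 0) tests a with unchanged polarity.
static void exampleFindOrigin() {
    Insn a  = { OP_VALUE, nullptr };
    Insn s1 = { OP_SET_NE0, &a };
    Insn s2 = { OP_SET_EQ0, &s1 };
    Insn s3 = { OP_SET_EQ0, &s2 };
    Origin o = findOrigin(&s3);
    assert(o.value == &a && !o.inverted);
}
```
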
16:32 karolherbst: imirkin_: ... do you see anything wrong here? The improvements are way too good: https://github.com/karolherbst/mesa/commit/66e0fefc39d0ef2d1f379d2746b9eec711f92834
16:32 karolherbst: (this is another opt I am working on)
16:32 karolherbst: ohhhh
16:32 karolherbst: the second case is wrong :)
16:33 karolherbst: forgot to swap the immediates
16:33 karolherbst: but still... weird
16:41 imirkin_: karolherbst: that needs to be handled separately
16:41 imirkin_: that expr() is meant to compute an exact result
16:41 karolherbst: okay
16:42 imirkin_: the code may be good, it just has to live elsewhere i think
16:43 karolherbst: yeah, I guess it would make sense to add an OP_SLCT case to the other functions as well
16:43 karolherbst: anyway, that 0, -1 case is more difficult to handle anyway
17:36 karolherbst: imirkin_: every time I look at reverseCondCode I get more annoyed :(
17:36 imirkin_: it's correct.
17:37 karolherbst: I know
17:37 karolherbst: I just don't get why that stupid math is better than a table with all codes
17:38 imirkin_: :)
17:38 imirkin_: coz someone was feeling clever when they wrote it
17:38 karolherbst: :D
17:38 karolherbst: I don't really want to know how much time you have to spend to get the CondCode enum just right so that you can do clever stuff like that
17:38 imirkin_: not long
17:38 imirkin_: it falls out of boolean operations
17:39 imirkin_: you can encode boolean ops as bits
17:39 karolherbst: sure, but faster than just writing that stupid table?
17:39 imirkin_: and manipulating the bits has certain mathematical properties re those boolean ops
17:39 imirkin_: if you're not familiar with boolean math, then a stupid table is a lot faster
17:39 karolherbst: I guess so
17:39 imirkin_: if you are familiar with boolean math, then the stupid math is a lot more reliable than the table, which will invariably have errors
17:40 imirkin_: personally, i tend to be more of a lookup table kind of guy
17:40 karolherbst: yeah, it is easier to read
17:41 imirkin_: but the clever math is clever for a reason, so it's not totally random either. this stuff comes up.
17:42 karolherbst: I am just wondering if it's worth the effort
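
The "boolean ops as bits" idea can be shown with a made-up encoding: one bit per possible outcome of a compare. This is a sketch only, not nouveau's `reverseCondCode`; the enum values and both helpers are assumptions, and NaN (unordered) handling is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding: one bit per outcome of comparing a with b.
enum Cond : uint8_t {
    FL = 0,
    LT = 1,
    EQ = 2,
    LE = LT | EQ,
    GT = 4,
    NE = LT | GT,
    GE = GT | EQ,
    TR = LT | EQ | GT
};

// Logical negation: flip all three outcome bits.
static Cond inverse(Cond c) { return static_cast<Cond>(c ^ 7); }

// Swapped operands (a op b becomes b op' a): exchange LT and GT, keep EQ.
static Cond reverse(Cond c) {
    uint8_t lt = (c & LT) ? GT : 0;
    uint8_t gt = (c & GT) ? LT : 0;
    return static_cast<Cond>(lt | gt | (c & EQ));
}

// Usage example: the results a lookup table would have to spell out by hand.
static void exampleCondCodes() {
    assert(inverse(LT) == GE);
    assert(inverse(EQ) == NE);
    assert(reverse(LT) == GT);
    assert(reverse(LE) == GE);
}
```

With such an encoding the table entries fall out of the bit operations, which is the trade-off discussed above: harder to read at first, but no hand-written entries to get wrong.
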
17:50 karolherbst: imirkin_: what was neu vs ne again?
17:51 imirkin_: unordered vs ordered
17:51 imirkin_: nan
17:51 karolherbst: ohh
17:51 karolherbst: mhh
17:52 imirkin_: (only matters for float compares, naturally)
17:52 karolherbst: sure
17:52 karolherbst: it is just that it came up
17:52 imirkin_: ieee754 has all that specified btw
17:52 imirkin_: i think opencl exposes it? not sure.
17:53 imirkin_: ah not really. isunordered(), but you can't specify it for gt/lt/etc
17:57 karolherbst: that unordered stuff is a little weird, no?
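
The ne vs neu distinction only shows up when one operand is NaN. A minimal sketch using `std::isunordered` from `<cmath>`; the names `cmpNe` and `cmpNeu` are hypothetical and just mirror the condition codes mentioned above.

```cpp
#include <cassert>
#include <cmath>

// Ordered "not equal": false whenever either operand is NaN.
static bool cmpNe(float a, float b) {
    return !std::isunordered(a, b) && a != b;
}

// Unordered "not equal": true whenever either operand is NaN.
static bool cmpNeu(float a, float b) {
    return std::isunordered(a, b) || a != b;
}

// Usage example: the two variants agree on ordinary numbers and differ on NaN.
static void exampleUnordered() {
    assert(cmpNe(1.0f, 2.0f) && cmpNeu(1.0f, 2.0f));
    assert(!cmpNe(NAN, 1.0f));
    assert(cmpNeu(NAN, 1.0f));
}
```

Note that the plain C++ `!=` operator already behaves like the unordered variant, since `NaN != x` is true.
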
22:01 karolherbst: imirkin_: I guess I have to add ConstantFolding::opnd(Instruction *, ImmediateValue&, ImmediateValue&) for the slct thing then
22:01 imirkin_: something like that
22:02 imirkin_: opnd3() maybe
22:02 karolherbst: opnd3 already exists
22:02 karolherbst: it is if the third source is an immediate
22:02 imirkin_: i'm not too fussed on names...
22:02 karolherbst: mhh, opnd3 handles mad and shladd
22:03 karolherbst: I don't really see the point because it could be opnd(i, imm, 2)
22:03 karolherbst: though, might make that code more complex there
22:03 karolherbst: I'll add a variant which should fit
23:22 Simon--: imirkin_: lo! any idea where I should start on debugging that hang on possibly out of vram case? debug=something, or is there something I should haxor/run to see if vram is even relevant?
23:22 Simon--: I can separately console, etc
23:23 imirkin_: sorry, i have little experience debugging such matters
23:23 imirkin_: perhaps skeggsb can provide some advice
23:26 Simon--: ok, thx. I'll fiddle after I rebuild some stairs..
23:28 karolherbst: imirkin_: mhh in ConstantFolding we end up with movs having a src(2) :(
23:29 imirkin_: karolherbst: that's really bad. something's not cleaning up
23:29 karolherbst: yeah
23:32 karolherbst: imirkin_: ConstantFolding::expr(Instruction *i, ImmediateValue &imm0, ImmediateValue &imm1) for all ops with 3 sources :)
23:33 karolherbst: oh well, OP_MAD/OP_FMA have some special handling
23:36 karolherbst: imirkin_: maybe at some point we should add a Program::validate method which checks all that stuff, after each pass or something
23:37 imirkin_: yeah, that's one of the nice things about glsl ir (and nir) -- the validation
23:37 karolherbst: yeah
23:37 imirkin_: that said, it's a lot more of a pain to adjust when you want to do something funky
23:37 imirkin_: nvir is pretty fast and loose
23:37 karolherbst: well, depends
23:37 karolherbst: in the end it is still somehow well defined
23:37 karolherbst: at least the instructions
23:37 karolherbst: (maybe tex not so much, but tex is pain)
23:39 karolherbst: and we can also depend on the actual stage, like for those ops which sometimes have two args and sometimes one after legalization or something
23:39 karolherbst: but just adding it for all the non funky stuff would catch quite a lot already I think
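
A validation pass of the kind suggested here could start as a simple source-count check. The following is a sketch on a hypothetical, heavily reduced instruction model; `Insn`, `expectedSrcs` and `validate` are made up for illustration and are not nv50_ir APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reduced instruction model (not the nv50_ir classes).
enum Op { OP_MOV, OP_ADD, OP_MAD, OP_SLCT };

struct Insn {
    Op op;
    int srcCount; // number of sources currently attached
};

// Source count each op is expected to have in this toy model.
static int expectedSrcs(Op op) {
    switch (op) {
    case OP_MOV:  return 1;
    case OP_ADD:  return 2;
    case OP_MAD:  return 3;
    case OP_SLCT: return 3;
    }
    return -1;
}

// Return the indices of all instructions whose source count is off.
static std::vector<std::size_t> validate(const std::vector<Insn> &prog) {
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < prog.size(); ++i)
        if (prog[i].srcCount != expectedSrcs(prog[i].op))
            bad.push_back(i);
    return bad;
}

// Usage example: a mov that still carries three sources gets flagged.
static void exampleValidate() {
    std::vector<Insn> prog = { { OP_ADD, 2 }, { OP_MOV, 3 }, { OP_SLCT, 3 } };
    std::vector<std::size_t> bad = validate(prog);
    assert(bad.size() == 1 && bad[0] == 1);
}
```

Running such a check after each pass would catch a mov left with a src(2) right where it is produced. Stage-dependent rules, such as ops whose source count changes after legalization, would need an extra parameter that this sketch omits.
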