00:02 mwk: mooch2: ?
00:02 mwk: you're actually watching my repos?
00:03 mwk: it's for a university course... it's a piece of hardware I made up
00:03 mwk: nothing real
00:04 mwk: more or less a "hardware" implementation of the rendering primitives used by id software's Doom
00:04 mwk: students are supposed to make a driver for that
00:05 plutoo: anyone know what register 0x0118 / 0x011C are for MAXWELL_3D (0xB197) ?
00:06 plutoo: seems like it might be a way to write into CB ?
00:06 mooch2: oh
00:06 imirkin_: plutoo: how are you counting?
00:06 imirkin_: are you dividing by 4?
00:07 plutoo: if you divide by 4 they would be 0x46/0x47
00:07 mooch2: plutoo, protip, on nvidia hardware, those "registers" aren't really registers, but more like function calls. writing to a "register" can have all kinds of side-effects
00:08 imirkin_: plutoo: right ... but if you already divided by 4, then they'd be 0x460
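The offset question above comes down to whether a number is a byte offset or a method index. A minimal sketch of the conversion, assuming (as the "dividing by 4" remark implies) that each method occupies one 32-bit word; the helper names are made up for illustration and are not taken from envytools:

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration: byte offset == 4 * method index,
// because each method is one 32-bit word.
constexpr std::uint32_t methodIndexFromOffset(std::uint32_t byteOffset) {
    return byteOffset / 4;
}

constexpr std::uint32_t methodOffsetFromIndex(std::uint32_t index) {
    return index * 4;
}

// If 0x0118 / 0x011C are byte offsets, the method indices are 0x46 / 0x47.
static_assert(methodIndexFromOffset(0x0118) == 0x46, "byte offset -> index");
static_assert(methodIndexFromOffset(0x011C) == 0x47, "byte offset -> index");

// If 0x0118 was already an index, the byte offset would be 0x460.
static_assert(methodOffsetFromIndex(0x0118) == 0x460, "index -> byte offset");
```

Reading the same number both ways is what produces the 0x460 figure quoted above.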
00:08 karolherbst: std::list containers are kind of annoying to debug with :(
00:08 imirkin_: plutoo: anyways, we have this defined: <reg32 offset="0x0400" name="UNK0400" length="0xc0">
00:08 imirkin_: with a comment about "vertex submission of some sort", but all the immediate mode vertex submission is gone on maxwell
00:09 imirkin_: all the 0x100..0x200 stuff is actually a copy engine instance
00:09 plutoo: this is used during gpu initialization of an official nvidia driver
00:09 imirkin_: but those methods start at 0x180
00:09 imirkin_: 0x110 is the sync method iirc
00:09 plutoo: google turned up this: NV91C0_LOAD_MME_INSTRUCTION_RAM 0x0118
00:10 imirkin_: OH
00:10 imirkin_: right
00:10 imirkin_: the macro loading
00:10 plutoo: but that's for compute
00:10 imirkin_: mme = macros
00:10 plutoo: what's a macro
00:10 imirkin_: yeah, those are in the same places.
00:10 imirkin_: https://github.com/envytools/envytools/blob/master/envydis/macro.c
00:10 imirkin_: (didn't i explain this to someone the other day?)
00:10 imirkin_: (was it you? or someone else?)
00:11 plutoo: not me
00:11 imirkin_: k. it was in the past 2 weeks i think.
00:11 imirkin_: someone trying to do something switch-related
00:11 plutoo: i'm also doing switch stuff
00:12 imirkin_: i guess there's many of you running around :p
00:12 imirkin_: read the huge comment towards the top of the file i pointed you to
00:12 imirkin_: it explains what macros are and how they're loaded
00:12 imirkin_: demmt decodes them too, as well as has an interpreter for them.
00:13 plutoo: cool cheers
00:56 karolherbst: huh, what kind of instruction is that.. ne $r8 not s32 $r1 $r8
00:57 karolherbst: ohh
00:58 HdkR: The instruction that says ne
00:58 karolherbst: imirkin_: something turned a predicate into a GPR :O
00:59 karolherbst: 38: $p0 not s32 $r1 $r8 (8)
00:59 karolherbst: but the emitter wants to emit a gpr
01:02 imirkin: that's not a thing.
01:02 imirkin: someone fucked something up
01:03 imirkin: is this with your nir thing?
01:03 karolherbst: yeah
01:03 imirkin: pastebin codegen input
01:03 karolherbst: https://gist.githubusercontent.com/karolherbst/14fa209be7bab8307a545ba5c3f713a5/raw/001c52ff9203bcfba49a8bc75e27dfe7a8e9757f/gistfile1.txt
01:04 imirkin: wtf is this shit
01:04 imirkin: 84: eq %r90 bra BB:3 (0)
01:04 karolherbst: that's legal actually
01:05 karolherbst: we had this already
01:06 karolherbst: tgsi does the same
01:07 karolherbst: "61: eq %r62 bra BB:5 (0)"
01:09 imirkin: aha
01:09 imirkin: NVC0LoweringPass::visit()
01:09 imirkin: does if (i->cc != CC_ALWAYS)
01:09 imirkin: checkPredicate(i);
01:09 imirkin: which in turn sticks an extra SET in there
01:09 karolherbst: yeah
01:09 karolherbst: sounds about right
01:10 imirkin: can i see the ssa post-opt form?
01:11 karolherbst: https://gist.githubusercontent.com/karolherbst/00b9171db8da0354692e94dd337f5ff8/raw/1cd11757c977210bb061a34a458666587bce7390/gistfile1.txt
01:11 imirkin: and the post-RA thing it's trying to emit?
01:12 karolherbst: https://gist.githubusercontent.com/karolherbst/f10fab2cdd883d29f5f0dd162ed4a51f/raw/b598330b51ec05771903f76ec904b962c501d000/gistfile1.txt
01:12 stoatwblr_: hey imirkin, thanks for the advice. I'm now the proud owner of a used radeon W4100 card :) I'll have it in my grubby mitts next week.
01:12 imirkin: stoatwblr_: cool. hope it works out well for you.
01:13 stoatwblr_: $65, so not too much.
01:13 imirkin: karolherbst: erm... and where's the problem?
01:13 karolherbst: emitter asserts
01:13 stoatwblr_: so do I, gotta get some mini-DP to dvi adaptors, but they're cheap as hell from aliexpress.
01:13 imirkin: on what?
01:13 karolherbst: somehow an instruction is this: "ne $r8 not s32 $r1 $r8 (8)"
01:13 karolherbst: inside handleNOT
01:14 karolherbst: ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp:282 assert(i->getPredicate()->reg.file == FILE_PREDICATE);
01:14 karolherbst: I got that from gdb
01:14 imirkin: sounds like memory corruption
01:14 imirkin: that's never printed before...
01:15 karolherbst: yeah, seems likely
01:15 karolherbst: mhhh
01:15 karolherbst: or is it
01:15 karolherbst: imirkin: where is the predicate stored?
01:15 karolherbst: crap...
01:16 karolherbst: "i->setSrc(1, i->src(0));" in emitNOT
01:16 karolherbst: guess what happens if I disable that line
01:16 karolherbst: but...
01:16 karolherbst: I mean... we have to emit predicated NOTs before, no?
01:17 imirkin: hehehe
01:17 imirkin: apparently ... NOT
01:17 karolherbst: ....
01:17 karolherbst: that limm bug was super annoying already
01:17 karolherbst: I am sure I will hit a few more of those weirdo bugs
01:19 karolherbst: imirkin: Val *pred = getPredicate; setSrc(); setPredicate(pred) ?
01:19 karolherbst: or why is that src1 set anyway for a not?
01:22 imirkin: coz it's actually LOP.PASS_B
01:22 imirkin: with a "not" modifier on the second arg
01:22 imirkin: there is no straight-up not
01:22 karolherbst: ahh
01:23 imirkin: you need to use moveSources(1, 1) or something
01:24 karolherbst: seems to work
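The failure mode discussed above can be reproduced with a toy model. This is only a sketch under the assumption that the predicate lives in the same source array as the ordinary operands and is located through an index; `ToyInsn`, `expandNotNaive` and `expandNotSafe` are hypothetical names and not the real nv50_ir classes or the actual patch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class File { GPR, Predicate };
struct Value { File file; int id; };

// Toy instruction (hypothetical): the predicate shares the source array
// with the ordinary operands and is found through predSrc.
struct ToyInsn {
    std::vector<const Value*> srcs;
    int predSrc = -1;

    const Value* getPredicate() const {
        return predSrc >= 0 ? srcs[static_cast<std::size_t>(predSrc)] : nullptr;
    }
    void setSrc(std::size_t s, const Value* v) {
        if (s >= srcs.size()) srcs.resize(s + 1, nullptr);
        srcs[s] = v;
    }
    void setPredicate(const Value* p) {
        predSrc = static_cast<int>(srcs.size());
        srcs.push_back(p);
    }
    // Open `delta` empty slots at index s and keep predSrc pointing at the
    // same value (only positive delta is handled in this sketch).
    void moveSources(std::size_t s, std::size_t delta) {
        srcs.insert(srcs.begin() + static_cast<std::ptrdiff_t>(s), delta,
                    static_cast<const Value*>(nullptr));
        if (predSrc >= static_cast<int>(s)) predSrc += static_cast<int>(delta);
    }
};

// Naive expansion of "not a" into "lop pass_b a, ~a": with one operand and a
// predicate, slot 1 is the predicate slot, so this overwrites it.
inline void expandNotNaive(ToyInsn& i) { i.setSrc(1, i.srcs[0]); }

// Safer expansion: make room first, then duplicate the operand.
inline void expandNotSafe(ToyInsn& i) {
    i.moveSources(1, 1);
    i.setSrc(1, i.srcs[0]);
}
```

With one GPR source and a predicate attached, `expandNotNaive` leaves `getPredicate()` pointing at the GPR, which matches the assert seen in gdb; `expandNotSafe` keeps the predicate intact.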
01:27 imirkin: =]
01:27 imirkin: didn't write the nvc0 emitter
01:27 imirkin: actually i haven't written any emitters
01:27 imirkin: nvc0 / gk110 were calim, gm107 was skeggsb
01:28 imirkin: gk110 was a wee bit ... erm ... incomplete though
01:38 karolherbst: some shader-db stats: https://gist.githubusercontent.com/karolherbst/657e8b640a51d7be218635bca5747818/raw/9eeefdb4cfaa6dc606de49e6761e835484da62d8/a.bad_imms
01:41 imirkin: good imm locations are better than bad ones ;)
01:42 imirkin: do it for consts too, that could make up the difference
01:42 karolherbst: uhm with imms I meant the const thingies
01:42 imirkin: ok, well you should do it for both imms and consts
01:42 karolherbst: well there are no imms with nir
01:42 imirkin: how do you mean?
01:43 karolherbst: you always have ssa = load_const
01:43 imirkin: ok, so const == imm
01:43 karolherbst: yeah
01:43 imirkin: so what is a const in nir?
01:43 karolherbst: a constant value
01:43 imirkin: right. which is an imm
01:43 karolherbst: right
01:43 imirkin: so what does nir call a const?
01:43 karolherbst: nothing. There is just a load_const intrinsic to set a reg/ssa value to a constant value
01:44 imirkin: we're not communicating
01:44 imirkin: nv50_ir -> nir name mapping: "imm -> const", "const -> ?"
01:44 imirkin: what is the '?'
01:44 imirkin: i.e. what does nir call the thing that in nv50_ir is called a const
01:44 karolherbst: ahh you meant those "DCL CONST[1][0..174]" TGSI thingies?
01:45 imirkin: FILE_MEMORY_CONST
01:45 karolherbst: yeah
01:45 imirkin: what does nir call that?
01:45 karolherbst: it doesn't exist in that way
01:45 karolherbst: there is just load_uniform and load_ubo
01:45 imirkin: ok. so const -> uniform.
01:45 imirkin: ? = "uniform" :)
01:45 imirkin: you should move those as well as the imms.
01:46 karolherbst: ubo == c1+x[], uniform c0[]
01:46 imirkin: right. same diff.
01:46 karolherbst: yeah
01:46 imirkin: move those.
01:46 imirkin: you may get better results.
01:46 karolherbst: they are already near the usages afaik
01:46 imirkin: ah ok
01:47 karolherbst: I think nir just puts the load_const on top of the basic blocks or function for whatever reason
01:47 karolherbst: dunno
01:48 karolherbst: not all load_consts though, but most of them
01:49 imirkin: convenient.
01:50 karolherbst: mhh I think all constants from the glsl shaders are ending up at the top
01:51 karolherbst: the mid function consts are things like & 0x1 for i->b conversions or t/s references
01:52 karolherbst: yeah, dunno. I will look into the differences more deeply tomorrow
01:52 karolherbst: the reduction of local used is nice though
01:53 imirkin: yea
02:04 karolherbst: imirkin: interesting, for pixmark piano alone: -12.99% instructions, +22.81% gprs, +68 local (63 gprs -> spilling), but the benchmark still gives quite the same results
02:05 karolherbst: maybe we can have the best of both :)
02:05 imirkin: well instructions and GPR's tend to go in opposite directions
02:05 imirkin: either you have a ton of values at the same time
02:05 imirkin: which means fewer moves from reg to reg
02:09 karolherbst: right. I just hope there are some obvious opts we can add to codegen still
02:11 imirkin: yeah, we need a basic scheduler
02:12 imirkin: which can decide whether to start retiring values in a bb or not
09:37 pmoreau: karolherbst: https://github.com/pierremoreau/mesa/tree/clover_spirv_series_v6 I’ll send them after lunch; I rebased the other branch as well.
09:40 pendingchaos: lachs0r: oh and don't run with -fbo, that seems to mess some of them up for some reason
11:19 karolherbst: imirkin: yeah.. we need a basic scheduler
11:26 RSpliet: karolherbst: Do we have the ability to calculate live sets as we go? From the top of my head, but correct me if I'm wrong, liveness calculation currently is an all-or-nothing thing. If you want some insn scheduler that's significantly more advanced than the toy I built on top of dboyans code, we might want to recalculate liveness "locally" rather than program-wide.
11:27 karolherbst: well I was talking about a pre RA scheduler anyway here
11:27 RSpliet: I'm sure there's information in the program flow that we can use to be more work-preserving in liveness recalculation :-)
11:27 RSpliet: you (can) have live sets in SSA
11:27 RSpliet: In fact, you should, because they predict how many registers you need...
11:27 karolherbst: right, but there you can just collect values in a set and you know how many live values you have
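The "collect values in a set" idea can be sketched for a single basic block as a backward scan. This is a toy illustration only: `ToyOp` and `maxLive` are hypothetical, values are plain integers, and nothing here reflects how nv50_ir actually stores liveness:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy SSA instruction (hypothetical): one defined value, any number of uses.
struct ToyOp {
    int def;
    std::vector<int> uses;
};

// Walk one basic block backwards, starting from the values that are live at
// its end, and return the largest live set seen between two instructions.
inline std::size_t maxLive(const std::vector<ToyOp>& bb, std::set<int> live) {
    std::size_t peak = live.size();
    for (auto it = bb.rbegin(); it != bb.rend(); ++it) {
        live.erase(it->def);                            // not live above its def
        live.insert(it->uses.begin(), it->uses.end());  // operands are live above
        peak = std::max(peak, live.size());
    }
    return peak;
}
```

For four loads followed by a chain of three adds, the peak is 4 when all loads sit at the top of the block and 2 when each load is placed right before its first use, which is the kind of difference a pre-RA scheduler would want to predict.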
11:28 karolherbst: where is dboyans code?
11:28 karolherbst: or yours?
11:28 karolherbst: ohh, got it
11:28 RSpliet: Think dboyan's code is here https://github.com/dboyan/mesa/tree/wip/sched2
11:28 RSpliet: Mine https://github.com/RSpliet/mesa/tree/insn_sched
11:30 RSpliet: And yes, the set is called the "live set". There isn't much more to liveness analysis than determining these sets. Moving an instruction over others increases the size of their sets... lifting it over a control flow instruction or branch point could have a bigger impact on live sets
11:31 karolherbst: yeah, but we don't do that in pre SSA scheduling
11:31 RSpliet: So what I was trying to say is "it shouldn't be expensive to update the live sets of a few instructions after making a scheduling decision to better predict the impact on GPR. But with current code we either update all live sets in the program or none at all... or do we?"
11:31 karolherbst: we have to keep instructions inside their BB
11:32 karolherbst: it isn't up to the scheduler to move instructions into different BBs
11:32 karolherbst: think about loops and the scheduler moving instructions from the head into the body of a long running loop
11:32 RSpliet: Then I'm sure live sets can/should be updated just within the BB as well.
11:33 karolherbst: yeah
11:33 karolherbst: we should be able to eliminate useless BBs though
11:33 karolherbst: pre RA
11:33 karolherbst: but that can wait
11:37 RSpliet: Let me just put it like this then: if I had the time to invest in a serious mature (pre-RA) instruction scheduling, the first thing I would look at is LVA (live variable analysis, liveness analysis. Thought I'd drop the abbreviation for future terseness ;-))
11:38 karolherbst: maybe I could work on that... but I guess there are more important tasks right now
11:38 RSpliet: 1) Can we produce live sets in SSA form? 2) Can we update them on a per-BB basis (and what is the effect on the live-in and live-out sets of dependent BBs if any)?
11:39 karolherbst: we only run per BB
11:39 RSpliet: Yeah, there's a million useful things to do and I'm the last one to determine *your* agenda :-) I should perhaps drop this on Trello
11:39 karolherbst: never ever we will move instructions in different BBs in a pre RA scheduler
11:40 karolherbst: so in/out of BBs don't matter here, or do they still?
11:40 RSpliet: karolherbst: Well... I would like to believe this is true for the sake of program correctness. Can we come up with a scheduling decision that alters a live-out set?
11:41 RSpliet: Or would that be an illegal transformation? I honestly don't have the answer to that ready right now. :-)
11:41 karolherbst: it makes things difficult if you keep phi nodes in mind
11:42 karolherbst: so even without instructions a BB can have a in/out set, because on RA you move instructions into that BB to get the phi node resolved
11:43 karolherbst: imagine a case where you have BB:1 heavy calculations BB:2 empty BB:3 empty BB:4 phi with srcs from BB;1
11:43 karolherbst: now on RA you move things into BB:2 and BB:3
11:43 karolherbst: usually you still have bra ops there to reflect an if clause or something
11:44 RSpliet: Yeah its clear that you don't want to move insn past BB boundaries, at least not at first.
11:44 karolherbst: not ever
11:44 karolherbst: second thing is you screw up CFG optimizations
11:44 karolherbst: move instructions into loops
11:44 karolherbst: and what not
11:45 RSpliet: Well, redundancy elimination is a real thing, moving superfluous instructions out of loops
11:45 karolherbst: yeah, but that's not the task of the scheduler
11:45 karolherbst: we should have a different opt pass for that alone
11:45 karolherbst: to keep things sane
11:45 RSpliet: Agreed :-)
11:45 RSpliet: (for reference, in case I wasn't distracting you enough already: http://www.cl.cam.ac.uk/teaching/1617/OptComp/slides/lecture07.pdf )
11:46 karolherbst: that's still with CPUs in mind, no?
11:46 RSpliet: Takes a different kind of analysis than liveness... similar, but different
11:46 karolherbst: first slide already gives that away :p
11:47 karolherbst: on GPUs it might be a smart thing to recompute results to increase the amount of threads ran in parallel
11:47 RSpliet: Technically yes, but I'm not 100% sure why that would matter. Specifically loop-invariant code motion remains relevant
11:48 karolherbst: on CPUs all that stuff is fairly straightforward, because you just need to prevent spilling, although even on CPUs you may not even care about spilling that much
11:48 RSpliet: Along the lines of "only do code-motion optimisations if it doesn't push GPR usage beyond the magical boundary (of 32 for Kepler I think) of reduced parallel warps"?
11:49 karolherbst: *63
11:49 karolherbst: well
11:49 karolherbst: 255 on kepler2 even
11:49 karolherbst: ohhh
11:49 karolherbst: parallel warps
11:49 karolherbst: mhh those things are weird though
11:49 karolherbst: I am sure there is not the same level of parallelism between 32 and 63 regs
11:50 karolherbst: or was it below 32 you always get the same amount of threads?
11:51 RSpliet: Not really. I think there's like 256 GPRs per SMx on Kepler. Maximum of 8 warps in parallel (scheduler limit). If GPR usage is 33, you can't have more than 7, 37 -> 6, 43->5 (or something along those lines) etc. etc.
11:51 karolherbst: RSpliet: yeah, it is 32
11:51 karolherbst: uhm
11:51 RSpliet: Makes a huge difference in performance :-)
11:51 karolherbst: 65536 gprs / SM ;)
11:51 RSpliet: per SMx per thread, sorry
11:52 karolherbst: yeah, 64 or 256 per thread on kepler1/2
11:52 karolherbst: well 63/255 technically
11:52 RSpliet: Aren't those simply the ISA limits? Let me rephrase that
11:52 RSpliet: per SMx per SIMD lane
11:53 karolherbst: max regs per thread: 63/255
11:53 RSpliet: Yeah, those are threads in the software sense. Sorry for the confusion
11:54 karolherbst: you always have 65536 per SM and 1024 per thread block (whatever that is)
11:55 karolherbst: ahh
11:55 karolherbst: thread block: threads within a warp
11:55 karolherbst: so yeah, going above 32 might hurt perf
11:56 RSpliet: Thread block == just more confusing terminology. I did the quick maths a long time ago.
11:56 karolherbst: mhhh
11:56 karolherbst: but interesting
11:57 RSpliet: 32 is the magical number indeed. Although reducing warps/thread block (or whatever the term is) likely has a smaller penalty than spilling
11:57 karolherbst: you can only have 16 thread block per SM
11:57 karolherbst: but 64 warps
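The "32 is the magical number" arithmetic can be checked with a back-of-the-envelope sketch. It uses only the figures quoted in the chat (65536 registers per SM, 32 threads per warp, at most 64 resident warps) and deliberately ignores register allocation granularity, thread block limits and shared memory, so treat the outputs as rough upper bounds rather than hardware facts:

```cpp
#include <cassert>

// Figures as quoted in the discussion above; not verified against hardware docs.
constexpr unsigned kRegsPerSM      = 65536;
constexpr unsigned kThreadsPerWarp = 32;
constexpr unsigned kMaxWarpsPerSM  = 64;

// Upper bound on resident warps when every thread needs regsPerThread GPRs.
// Simplification (assumption): registers are handed out with no rounding.
constexpr unsigned residentWarps(unsigned regsPerThread) {
    return (regsPerThread == 0 ||
            kRegsPerSM / (regsPerThread * kThreadsPerWarp) > kMaxWarpsPerSM)
               ? kMaxWarpsPerSM
               : kRegsPerSM / (regsPerThread * kThreadsPerWarp);
}

static_assert(residentWarps(32) == 64, "32 GPRs still allow the full 64 warps");
static_assert(residentWarps(33) == 62, "one more GPR already costs warps");
static_assert(residentWarps(63) == 32, "63 GPRs halve the resident warps");
```

Under these assumptions anything up to 32 GPRs per thread keeps all 64 warps resident, and every register beyond that trades away parallelism.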
11:58 RSpliet: Isn't it 8 warp schedulers/SMx, max 8 warps/scheduler, 32-threads/warp (wavefront in OpenCL terms)?
11:59 karolherbst: this is on fermi
11:59 karolherbst: I think
11:59 karolherbst: mhh but mhh
11:59 karolherbst: ohh actually it might make sense
12:00 karolherbst: RSpliet: 4 warp scheduler on kepler
12:00 karolherbst: with two instruction dispatch units for dual issuing
12:02 RSpliet: Dual-issue doesn't change the numbers.
12:04 RSpliet: Meh, I find this hugely interesting to play with. Stupid PhD :-P
12:05 karolherbst: :D
12:20 pmoreau: RSpliet: Remind me, who really wanted to do a PhD? :-p
13:17 imirkin: karolherbst: want to mail your NOT patch?
13:19 karolherbst: ahh, makes sense
13:19 karolherbst: it is untested though :(
13:19 karolherbst: I basically just tested with shader-db, which means untested ;)
13:20 imirkin: testing is all relative.
13:20 imirkin: e.g. you compiled it :)
13:21 karolherbst: right
13:21 karolherbst: well the path where the changes trigger was broken before anyway
13:21 karolherbst: so it can't get worse...
13:23 imirkin: exactly.
13:23 imirkin: i think the deal is that a bare "not" is pretty rare to see
13:23 imirkin: so not + predication is ... even rarer
13:24 imirkin: usually it gets folded as a modifier
13:24 karolherbst: yeah
13:24 imirkin: coz who just does a bitwise not and then not use it in an "and" or "or"
13:24 karolherbst: imirkin: did you check the limms patches?
13:24 imirkin: fk
13:24 imirkin: probably not.
13:26 karolherbst: imirkin: mhh interesting
13:26 karolherbst: 38: $p0 not s32 $r1 $r7 (8) 39: $p0 and u32 $r1 $r1 0x3f800000 (8)
13:27 karolherbst: wondering why that doesn't get mod folded in
13:27 karolherbst: s32 vs u32?
13:27 imirkin: limm
13:28 imirkin: (maybe?)
13:28 karolherbst: uhh, I doubt it
13:28 karolherbst: tgsi: 39: $p0 and u32 $r0 not $r7 0x3f800000 (8)
13:28 karolherbst: but with tgsi the not was u32 :)
13:29 karolherbst: I am sure there is a dtypea == dtypeb with the assumption those biops are never signed
13:29 karolherbst: nir only has a inot opcode and no unot anyway
13:29 karolherbst: so maybe I just force it to be unsigned
13:29 karolherbst: shouldn't matter for a not, should it?
13:29 imirkin: should always be unsigned yea
13:30 karolherbst: I forced imul to be unsigned as well anyway
13:30 karolherbst: there was some 64 bit bug or something
13:31 karolherbst: imirkin: yeah, that fixed it
13:32 karolherbst: ahh yeah. If i->sType != mi->dType continue :)
13:34 karolherbst: imirkin: also I have an idea about a basic "we move instructions nearer to their defs"-pass: 1. move only if all defs inside same BB 2. only move if insn->bb->successors are not reachable by target BB?
13:37 karolherbst: the BB reachable check is there for not moving things deeper inside a loop
13:37 karolherbst: I don't know if we are able to get the "nearest" BB outside a loop to a given BB
13:47 karolherbst: "total instructions in shared programs : 5248707 -> 5248665 (-0.00%)" mhh
14:00 imirkin_: karolherbst: all bit ops are uint
14:00 imirkin_: by convention
14:00 imirkin_: signedness doesn't matter for them
14:01 karolherbst: yeah, I know
14:01 imirkin_: so like and, not, or, etc
14:01 karolherbst: random thought for an op: https://gist.githubusercontent.com/karolherbst/272b4a8055e64d69e7496a650764cc3d/raw/a0b55817981910308a062e770a1a331a225b371c/gistfile1.txt
14:01 imirkin_: it doesn't really matter *which* way it goes, as long as it all goes the same way
14:01 imirkin_: hehehe
14:01 imirkin_: do algebra. fancy.
14:01 imirkin_: this is aka tree balancing, i think
14:02 karolherbst: I wouldn't put it into algebraic opt anyway
14:02 karolherbst: because we have a different approach here
14:02 karolherbst: although...
14:02 karolherbst: this opt doesn't hurt if you have only one mul pair, right?
14:03 imirkin_: it's not a peephole opt
14:03 imirkin_: not that there's anything wrong with that
14:03 imirkin_: i think there's some amount of that in glsl, not sure
14:03 imirkin_: iirc mattst88 was playing with something of the sort at one point or another
14:03 karolherbst: why isn't it a peephole opt?
14:03 imirkin_: it relies on more global info
14:03 karolherbst: mhh
14:04 imirkin_: no law against sticking that into nv50_ir_peephole.cpp
14:04 imirkin_: but just sayin' :)
14:04 karolherbst: but I think this one is pretty simple
14:05 karolherbst: you search for patterns like mul(mul(a, b), c) => mul(a, mul(b, c)) .... uhhh yeah, I see the problem
14:06 karolherbst: so that opt only makes sense if b, c are more often used in the same mul patterns than a,b
14:06 karolherbst: something like that
14:08 karolherbst: imirkin_: but I think we really want to have an opt which moves things closer to the uses to reduce gpr usage
14:08 karolherbst: even if it's just 0 reg src instructions we move
14:08 karolherbst: like loads
14:09 karolherbst: without indirects
14:09 imirkin_: ok
14:09 imirkin_: so you want a sequence like
14:09 imirkin_: mov $r0 c0[0]
14:09 imirkin_: use($r0)
14:09 imirkin_: mov $r0 c0[0]
14:09 imirkin_: use($r0)
14:09 imirkin_: instead of
14:09 imirkin_: mov $r0 c0[0]
14:09 imirkin_: use($r0); use($r0) ?
14:09 karolherbst: no
14:10 imirkin_: [you see how it gets tricky though?]
14:10 karolherbst: if you ld something in a BB, but you consume the result in a different BB
14:10 imirkin_: yeah, it can still end up in stupidity
14:10 imirkin_: but we could play with copying loads to the relevant bb
14:10 imirkin_: and see if that improves things
14:10 imirkin_: my guess is that it won't on average
14:10 karolherbst: well one problem is
14:10 karolherbst: we have to be careful about loops
14:11 karolherbst: and always stick them to the top of the bb
14:11 karolherbst: not above the use
14:11 imirkin_: uh huh.
14:11 karolherbst: and later if we can merge/eliminate BBs + pre SSA scheduler this should give us some nice code
14:11 imirkin_: lots of ifs-and-buts
14:12 karolherbst: yeah, not saying it is easy
14:15 karolherbst: uhm this is kind of silly: https://gist.github.com/karolherbst/871bba2920c694ad82fe845f6e2948ff
14:15 karolherbst: but I guess this is just how spilling works?
14:17 karolherbst: imirkin_: mad(a, imm0, imm1) => mul(a + imm1 / imm0, imm0) ?
14:17 karolherbst: uhh wait
14:17 karolherbst: that doesn't help
14:17 imirkin_: which helps you ... how?
14:18 karolherbst: mhh
14:18 karolherbst: I have an example with mad(a, 2, -1) :(
14:19 karolherbst: I guess there is no subop magic we could use to turn a mov+mad into a single op?
14:25 karolherbst: 423: st u32 # l[0x24] $r14 (8) 424: ld u32 $r14 l[0x24] (8)
14:25 karolherbst: ...
14:25 karolherbst: imirkin_: where would you fix that issue?
14:25 karolherbst: while doing the spills or having some clean up running after it?
14:27 imirkin_: that's a result of spilling doing dumb shit
14:27 imirkin_: solution is to make spilling Not Do That (tm)
14:28 imirkin_: it already tries not to
14:28 imirkin_: but i think it can get tripped up sometimes?
14:28 karolherbst: I see this issue quite often though here in that shader
14:29 karolherbst: obviously the issue is when some of the uses are quite far away so you have to use lmem, but other uses are quite near and don't need it
14:36 karolherbst: imirkin_: uhh I have still this nice patch: https://github.com/karolherbst/mesa/commit/d445d48749613219035ecf3805545a812a9e4c3d
14:36 karolherbst: need to finalize it
14:38 imirkin_: erm
14:39 imirkin_: i don't get it.
14:39 imirkin_: also getImmediate won't work
14:39 karolherbst: mad post ra limm form
14:40 karolherbst: only works if dest == src2 reg
14:40 karolherbst: this code just enforces that in RA
14:40 imirkin_: ok, so
14:40 imirkin_: (a) only do this if it's a limm
14:40 imirkin_: (b) only add the cond for nvc0+
14:40 imirkin_: (c) getImmediate won't work
14:41 karolherbst: the cond is already there and works for pre nvc0
14:42 imirkin_: i meant the imm-based cond
14:42 imirkin_: since that's irrelevant for nv50
14:42 karolherbst: ahh
14:42 karolherbst: right
14:42 imirkin_: and instead of getImmediate() check if the src is a MOV with an immediate src.
14:42 imirkin_: if it's anything else, bail
14:43 imirkin_: er actually crap
14:43 imirkin_: even that won't work
14:43 karolherbst: yeah..
14:43 imirkin_: basically if there are multiple defs, bail
14:44 imirkin_: otoh
14:44 imirkin_: getting the wrong answer here
14:44 imirkin_: doesn't make anything horribly worse
14:44 karolherbst: hopefully, yeah
14:44 imirkin_: so maybe it's OK, if you add an appropriate comment
14:44 karolherbst: so just do it unconditionally and see how that works out?
14:44 karolherbst: it is only done for mads anyway
14:45 karolherbst: and because the def reg == src2 reg the value isn't used later anyway
14:45 karolherbst: so we don't change anything seriously
14:45 imirkin_: yeah, it's fine
14:45 imirkin_: even if getImmediate lies
14:45 imirkin_: but there should be a comment about it
14:45 karolherbst: in what cases will it lie?
14:45 imirkin_: since getImmediate will NOT necessarily provide the correct answer
14:45 imirkin_: well
14:45 imirkin_: if there's a merged node
14:45 imirkin_: like a phi
14:46 imirkin_: whose sources are immediates
14:46 karolherbst: mhh
14:46 imirkin_: when those values get merged
14:46 karolherbst: that means getImmediate goes through phis?
14:46 imirkin_: it throws the other things into the defs list
14:46 imirkin_: not normally
14:46 imirkin_: but post-RA it might
14:46 karolherbst: mhh
14:46 karolherbst: getImmediate only follows MOVs afaik
14:46 imirkin_: uh huh
14:47 karolherbst: ../src/gallium/drivers/nouveau/codegen/nv50_ir.cpp:103
14:47 imirkin_: you're looking at this like a logical person
14:47 karolherbst: if (mov) continue, else not
14:47 imirkin_: right.
14:47 imirkin_: my point is
14:47 imirkin_: it might be a mov :)
14:47 imirkin_: even if it's a phi
14:47 imirkin_: coz of the merging
14:47 karolherbst: ohhhhh
14:47 karolherbst: I see
14:47 imirkin_: the lval->defs contains the full list of definitions
14:48 karolherbst: but yeah, we only change the preference here, so it should be fine
14:48 imirkin_: there are other ways for things to get merged too btw
14:48 imirkin_: exactly
14:48 karolherbst: okay
14:48 karolherbst: changes with newest shaders: https://gist.githubusercontent.com/karolherbst/bc791f5ffb3ccb2fcd89ff45fe182b90/raw/38a6fb8b6dcb9fa60bf18a6cf2570afe1972bf2e/gistfile1.txt :)
14:48 imirkin_: coolio
14:49 karolherbst: it also affects perf in pixmark_piano quite a lot.. let me benchmark again
15:07 karolherbst: oh well it doesn't matter much there, because it is just 12 MOV instructions inside a 3700 instruction shader
15:14 karolherbst: imirkin_: ohh I might want to exclude 64 bit types
15:15 imirkin_: no limm form there? :)
15:15 karolherbst: at least I don't see any restriction inside the maxwell emitter
15:15 imirkin_: there's no limm form at all
15:15 imirkin_: can't encode a 64-bit immediate into a 64-bit instruction
15:16 karolherbst: right
15:16 karolherbst: makes sense :D
15:17 HdkR: Well if the constant fits within the encoding space you can still sort of cheese it :P
15:18 imirkin_: there are short immediates too
15:18 imirkin_: 12 or 20 bits, i forget which
15:18 HdkR: woo, shorter immediates
15:18 imirkin_: for floats, it's the high bits, for ints it's the low bits
15:18 HdkR: Makes sense
15:18 imirkin_: for the 0x3f800000's and 0x40000000's of the world
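The short-immediate rule described above (high bits for floats, low bits for ints) can be sketched as two predicates. The field width is left as a parameter because the chat does not settle whether it is 12 or 20 bits; treating the integer case as unsigned is an additional simplifying assumption, and the function names are made up:

```cpp
#include <cassert>
#include <cstdint>

// Float case: only the top `width` bits of the 32-bit pattern are encoded,
// so all remaining low bits must be zero. Valid for 1 <= width <= 31.
constexpr bool fitsShortFloatImm(std::uint32_t bits, unsigned width) {
    return (bits & (0xffffffffu >> width)) == 0;
}

// Integer case: only the low `width` bits are encoded.
// Assumption: the value is treated as unsigned (no sign extension).
constexpr bool fitsShortIntImm(std::uint32_t bits, unsigned width) {
    return (bits >> width) == 0;
}

// 1.0f and 2.0f have all-zero low mantissa bits, so they fit either width.
static_assert(fitsShortFloatImm(0x3f800000u, 12), "1.0f, 12-bit field");
static_assert(fitsShortFloatImm(0x40000000u, 20), "2.0f, 20-bit field");
// An approximation of pi uses the low mantissa bits and needs a full limm.
static_assert(!fitsShortFloatImm(0x40490fdbu, 20), "pi needs 32 bits");
```

This is why "round" float constants such as 0x3f800000 can be encoded without resorting to the long immediate form.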
15:20 HdkR: Does anyone really need any other encoded float values? ;)
15:20 karolherbst: imirkin_: do you know why we allow it for OP_SAD though?
15:20 karolherbst: we don't loadpropagate for OP_SAD
15:20 imirkin_: karolherbst: why we allow what?
15:20 karolherbst: that reg preference
15:20 imirkin_: the existing code is for nv50
15:20 imirkin_: has nothing to do with immediates.
15:20 karolherbst: ohh, nv50 always needs this?
15:21 imirkin_: i think for the short encoding
15:21 karolherbst: ahh
15:21 karolherbst: makes sense
15:22 imirkin_: i don't remember how nv50 encodings work tbh
15:22 imirkin_: might be dual-purpose
15:22 imirkin_: i.e. for short encodings, or if that's not possible, for imms
15:22 imirkin_: i added this like 3 years ago
15:22 imirkin_: my memory of it is slightly faded :)
15:23 imirkin_: RSpliet started something with a post-ra pass
15:23 imirkin_: i added something to make it happen more often...
15:24 imirkin_: right. so it's needed for immediate in mad's.
15:25 imirkin_: but also to have mad with short encoding.
15:25 imirkin_: so it's dual-purpose
15:25 karolherbst: imirkin_: what is this "insn->flagsDef" all about?
15:25 imirkin_: .CC
15:25 karolherbst: ahh
15:26 imirkin_: (there's 4 of them on nv50 though)
15:26 HdkR: ick, CC :P
15:29 karolherbst: imirkin_: can we actually load a full 32 bit immediate into a nv50 mad?
15:30 karolherbst: I don't know if the postraloadpropagation pass is correct for nv50
15:31 RSpliet: karolherbst: From the top of my head: yes. But I *think* the constraint is that src2 == dst1 , otherwise the opcode can't be encoded in 32 bits - a requirement for imms
15:31 karolherbst: yeah, I know
15:31 karolherbst: I wasn't sure about the 32 bit imm
15:32 karolherbst: okay, so that part is fine
15:33 RSpliet: Also, do double check the following: short opcodes have one bit fewer to encode register, so cannot be used for the high half. ... There's something in the back of my head like this, but I might well be confused with a completely different ISA :')
15:34 RSpliet: Should be easy enough to verify from nvadisasm
15:35 karolherbst: "karolherbst committed on Sep 4, 2016" :D
15:36 karolherbst: https://github.com/karolherbst/mesa/commit/1ec56c1e654244dd4eed74459a18f69fe2ca6b20
15:36 karolherbst: I think I did nothing wrong here
15:37 karolherbst: maybe I should add NVISA_GF100_CHIPSET 0xc0 define
15:40 imirkin_: karolherbst: i don't remember
15:40 imirkin_: there already is a NVISA_GF100_CHIPSET ;)
15:40 imirkin_: (or was)
15:41 karolherbst: I think there was one
17:10 karolherbst: imirkin_: send out the ra patch for mad
17:33 xexaxo1: pmoreau: clover is autotools/meson only - no scons/android
17:33 pmoreau: xexaxo1: Ah okay, cool! No need to learn yet another build system. :-)
17:35 mercury^: Hi. When gnome-greeting or epiphany play a video (I think they use gstreamer), my computer freezes after a few seconds. I am running wayland. I do not know for sure whether the issue is with nouveau, but that is my best guess. I do not know how to diagnose the issue further, as the system is completely locked up. The card is an 8800 GTS.
17:39 karolherbst: imirkin_: I found this golden commit: https://github.com/karolherbst/mesa/commit/11a60b242c0280ad8e1b1dd68ce340f267487758
17:39 pmoreau: mercury^: Which kernel version do you have, and is this new or not?
17:40 imirkin_: mercury^: is the 8800 GTS an actual G80 or is it a G92 variant?
17:40 imirkin_: (you can tell with lspci -nn -d 10de: )
17:40 mercury^: pmoreau: I just installed a system on this computer again after it was in storage for around 8 years. I did not use nouveau on it before.
17:40 imirkin_: karolherbst: cool. that seems generically reasonable.
17:41 karolherbst: yeah, but really only happens in feral ported games
17:41 karolherbst: I think
17:41 mercury^: Kernel version is 4.15.10
17:41 karolherbst: I will dig into it
17:41 karolherbst: that increase in local though
17:41 mercury^: imirkin_: it's a G80.
17:42 pmoreau: mercury^: Hum try .11 or .12, can’t remember when a fix went in. One sec
17:42 imirkin_: mercury^: ok. so the "funny" thing about G92 is that vdpau accel causes hangs
17:42 imirkin_: but G80 didn't have a vdpau-supported engine in the first place
17:42 mercury^: Oh, by the way, firefox plays videos fine (using ffmpeg I believe).
17:43 imirkin_: however something in that acceleration pipeline could still be doing something weird
17:43 imirkin_: do you see anything in dmesg when this happens?
17:43 imirkin_: (can you ssh in from another box?)
17:43 pmoreau: imirkin_: I’m wondering if it isn’t the bug that was fixed by the ALIGN_DOWN patch
17:43 imirkin_: could be.
17:43 pmoreau: I don’t think the fix is in .10
17:43 mercury^: imirkin_: I did not try it yet, basically just finished the installation.
17:43 imirkin_: but G80 is weird for lots of reasons
17:43 karolherbst: imirkin_: affected games: | grep -v -e f1_2015 -e hitmanpro -e dirt_rally -e tomb_raider -e mad_max
17:43 imirkin_: like ... only linear objects allowed in sysram
17:44 karolherbst: but I am sure the difference for those is insane
17:44 imirkin_: and i dunno if skeggsb tested his VM stuff on a G80
17:45 imirkin_: mercury^: if it's easy, try a 4.14 kernel too
17:45 pmoreau: mercury^: The fix I’m thinking of is in 4.15.12, so if you could try that or 4.16-rc6, and it still doesn’t work, that would rule that out. Or if 4.14 doesn’t work either, as imirkin_ suggests.
17:46 mercury^: It's an ostree system (my first), I have no idea yet if I even can replace the kernel. :s
17:52 mercury^: I will try to find the kernel version it had before I ran an update, maybe that is older than 4.14.
17:54 imirkin_: mercury^: one thing to note is that G80 has some extra-special oddness to it, and so it's less well supported than the rest of the family
17:54 imirkin_: that said it *should* work
17:56 karolherbst: imirkin_: massive: https://github.com/karolherbst/mesa/commit/ba8adc2210b53e5c3e6c581b0c3776febe7778f8
17:57 imirkin_: seems like 2 unrelated things
17:58 karolherbst: yeah
17:58 karolherbst: I am currently thinking about the first one. slct(-1, 0, a) -> set(a, 0)?
17:58 imirkin_: also i'd be lying if i said i understood the second chunk
17:58 imirkin_: i also don't remember how slct works =]
17:59 imirkin_: but might want to be careful with u32 vs f32 slct
17:59 karolherbst: c compare against 0 ? a : b
18:00 karolherbst: -1 was true?
18:00 karolherbst: or was it false?
18:00 karolherbst: integer version
18:00 karolherbst: I am sure I will never remember that ever
18:00 imirkin_: so you've got a set() ? a : b thing
18:01 imirkin_: and if the set is a comparison against 0
18:02 karolherbst: slct always compares against 0 though, so we can always do a slct -> set(a, 0) out of it
18:02 imirkin_: so set(c == 0) ? a : b -> c ? a : b
18:02 karolherbst: the result of the slcts matter here
18:02 imirkin_: right, makes sense.
18:02 karolherbst: yeah
18:02 karolherbst: but I never know if -1 int is false or true :)
18:02 karolherbst: it looks like -1 is true looking at the code
18:02 imirkin_: true.
18:03 imirkin_: remember it like this -- 0 is always false.
18:03 karolherbst: ahh
18:03 karolherbst: right
18:03 imirkin_: by process of elimination, the other must be true ;)
18:03 karolherbst: makes sense
18:03 karolherbst: :)
18:03 imirkin_: + Instruction *o = slct->getSrc(2)->getInsn();
18:03 imirkin_: careful with stuff like that
18:03 imirkin_: it might pan out in practice
18:04 imirkin_: but note that not all sources have insn's
18:04 karolherbst: ohh right
18:04 imirkin_: only LValue's
18:04 imirkin_: but not Symbol's
18:04 karolherbst: I want to concentrate on the first opt anyway
18:04 imirkin_: or ImmediateValue's
18:04 imirkin_: not that they should appear there
18:04 imirkin_: but ... yeah, dunno.
18:05 imirkin_: also, should probably check if EITHER of the set's srcs are 0
18:06 karolherbst: huh?
18:06 karolherbst: I have to check if it's true : false or false : true
18:06 karolherbst: and in the second case just inverse the condition
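A minimal sketch of the fold discussed above, modelled outside the real IR. It assumes slct means "compare c against 0, then pick a or b" and that an integer set yields -1/0, as stated in the chat. `Cond`, `cmpZero`, `invert`, `slct`, `setU32` and `foldsAgree` are hypothetical names for illustration, not Mesa's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the IR's condition codes (not the Mesa enum).
enum class Cond { LT, EQ, LE, GT, NE, GE };

// Integer comparison of c against 0, which is what slct is assumed to do.
inline bool cmpZero(Cond cc, int32_t c) {
    switch (cc) {
    case Cond::LT: return c < 0;
    case Cond::EQ: return c == 0;
    case Cond::LE: return c <= 0;
    case Cond::GT: return c > 0;
    case Cond::NE: return c != 0;
    case Cond::GE: return c >= 0;
    }
    return false;
}

// Logical negation of a condition (valid for integer compares).
inline Cond invert(Cond cc) {
    switch (cc) {
    case Cond::LT: return Cond::GE;
    case Cond::EQ: return Cond::NE;
    case Cond::LE: return Cond::GT;
    case Cond::GT: return Cond::LE;
    case Cond::NE: return Cond::EQ;
    case Cond::GE: return Cond::LT;
    }
    return cc;
}

// slct: (c cc 0) ? a : b
inline uint32_t slct(uint32_t a, uint32_t b, Cond cc, int32_t c) {
    return cmpZero(cc, c) ? a : b;
}

// Integer-typed set: all ones for true, zero for false.
inline uint32_t setU32(Cond cc, int32_t c) {
    return cmpZero(cc, c) ? 0xffffffffu : 0u;
}

// True if both rewrites hold for this (cc, c):
//   slct(-1, 0, c) -> set(cc)          (true : false)
//   slct(0, -1, c) -> set(invert(cc))  (false : true)
inline bool foldsAgree(Cond cc, int32_t c) {
    bool trueFalse = slct(0xffffffffu, 0u, cc, c) == setU32(cc, c);
    bool falseTrue = slct(0u, 0xffffffffu, cc, c) == setU32(invert(cc), c);
    return trueFalse && falseTrue;
}
```

The sketch only covers integer compares. With a float compare, negating the condition is not the same thing once NaN is involved, so the false:true case would need more care there.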
18:08 karolherbst: imirkin_: can we have a slct with a 64 bit c?
18:08 karolherbst: ohh
18:09 imirkin_: don't think so
18:09 karolherbst: c is always 32, but a and b can be 64 bit values
18:09 karolherbst: but
18:09 imirkin_: shouldn't be
18:09 karolherbst: this is lowered away..
18:09 imirkin_: but perhaps there's a lowering thing that fixes it up
18:09 imirkin_: + } else if (a->data.u32 == 0xffffffff && b->data.u32 == 0x0) {
18:09 imirkin_: 
18:09 imirkin_: should check it both ways
18:09 karolherbst: right
18:10 imirkin_: oh wait. no.
18:10 karolherbst: but the other way gets an inversed/the_other_word cc
18:10 karolherbst: ;)
18:10 imirkin_: right.
18:10 imirkin_: easy enough to flip though, no?
18:10 imirkin_: EQ <-> NE
18:10 imirkin_: iirc it's just an xor away
18:10 imirkin_: there's a invertSomething thing in there
18:10 karolherbst: I am sure you can also do lt and whatnot
18:10 karolherbst: yeah
18:10 karolherbst: have to use the correct function
18:10 imirkin_: yeah
18:11 imirkin_: there's 2 and they do similar-but-different things
18:11 karolherbst: :)
18:11 karolherbst: exactly
18:11 karolherbst: I think no shader was affected by the false:true case, that's why I left it out
18:11 imirkin_: think about the f32 compare with like le
18:11 imirkin_: that this would still work for
18:11 imirkin_: hehe
18:11 imirkin_: it's nice to be complete.
18:11 karolherbst: anyway, I've added a !isFloatType(i->dType) :)
18:12 imirkin_: mmmm
18:12 imirkin_: but it's fine
18:12 karolherbst: sure?
18:12 imirkin_: if you have a c < 0.0 ? 0xffffffff : 0x0
18:12 imirkin_: can still go into a SET
18:12 karolherbst: imagine somebody is crazy doing a float = a > 0 ? 0xffffffff : 0x0
18:12 imirkin_: that's different.
18:12 karolherbst: ohhhh
18:13 karolherbst: isImmediate checks the type, no?
18:13 karolherbst: ohh wait, I don't use it here
18:14 karolherbst: imirkin_: but don't I have to do a int set if the slct was a f32 one?
18:16 karolherbst: I mean it can be a slct f32 d 0xffffffff 0x0 u32 c for example
18:18 karolherbst: so mhh the dType of the slct isn't important, but I need to fix it up for the set
18:21 karolherbst: imirkin_: slct(0, -1, c) -> iset(0, c) ?
18:21 karolherbst: that should work out without correcting the condcode
18:24 karolherbst: except eq...
18:26 imirkin_: not necessarily iset
18:26 imirkin_: could be fset
18:26 karolherbst: how?
18:26 karolherbst: fset won't return 0xffffffff or would it?
18:27 imirkin_: it will
18:27 imirkin_: fset vs iset is the diff between whether the comparison is integer or float
18:27 karolherbst: okay right sure, but I talk about the dType here
18:28 karolherbst: I will keep the sType alone, because that's a thing I can't change here
18:28 karolherbst: but what if the slct has F32 for both?
18:28 imirkin_: slct can't have a dtype
18:28 karolherbst: wouldn't it be correct to change the dType to U32 for a set or doesn't it matter?
18:28 karolherbst: okay right, it is basically a b32
18:28 imirkin_: yes
18:28 imirkin_: but for a set it matters
18:29 karolherbst: right
18:29 imirkin_: u32 means -1/0
18:29 imirkin_: f32 means 1.0/0.0
18:29 karolherbst: I am not sure if "i->dType = TYPE_U32;" is 100% needed, but for being safe I add it
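A small sketch of why the set's dType matters, using the two encodings named above (u32 gives -1/0, f32 gives 1.0/0.0). `DType` and `setResultBits` are hypothetical names for illustration, and IEEE 754 binary32 floats are assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the destination type of a set instruction.
enum class DType { U32, F32 };

// Raw 32-bit value a set would write for a given comparison outcome.
inline uint32_t setResultBits(DType dType, bool cond) {
    if (dType == DType::U32)
        return cond ? 0xffffffffu : 0u;   // integer boolean: -1 / 0
    float f = cond ? 1.0f : 0.0f;         // float boolean: 1.0 / 0.0
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // reinterpret without UB
    return bits;
}
```

Since the "true" value differs between the two (all ones versus the bit pattern of 1.0f), a slct that selects between 0xffffffff and 0 can only be replaced by a set whose dType is U32.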
18:40 mercury^: pmoreau, imirkin_: the kernel version before the update, which also had the problem, is 4.13.9;
18:40 imirkin_: ok
18:41 mercury^: Does that rule out what you had suspected?
18:42 karolherbst: "set u32 $r44 lt f32 $r63 $r29" + "slct u32 $r44 eq $r63 0xffffffff $r44" right, that was the reason for that opt
18:42 mercury^: It is tagged -300.fc27 though, and the release was around the time that 4.14 happened. I am wondering if it included the relevant changes already.
18:43 imirkin_: so it's not the new vm stuff...
18:43 imirkin_: could be something else though :)
18:43 feaneron: what does "ctor" stand for in nouveau's code?
18:44 karolherbst: constructor
18:44 feaneron: aha, thanks
19:20 mercury^: pmoreau, imirkin_: here is a log https://pastebin.com/FrzDPK3j
19:31 karolherbst: imirkin_: what can I do when I don't want to add new MOVs or NOTs inside ConstantFolding?
19:31 karolherbst: wrong opt pass I guess
19:32 karolherbst: imirkin_: "set != 0 ? -1 : 0 -> set" is this more like an algebraic opt?
19:42 imirkin_: mercury^: right ... the 406040 issue. that one plagues all of tesla...
19:42 imirkin_: no clue what causes it. try to do less GL-intensive stuff :)
19:44 imirkin_: mercury^: you'll want to stay away from the KDE's and GNOME's of the world.
19:45 mercury^: :(
19:46 imirkin_: or use blob drivers
19:46 mercury^: Is there any specific thing the applications do that causes it?
19:47 mercury^: As it is triggered by video playback through gstreamer, but not ffmpeg.
19:47 imirkin_: they use OpenGL
19:47 imirkin_: for things that they have no business using OpenGL for
19:48 mercury^: Is it OpenGL specific then or would it also affect Vulkan?
19:48 imirkin_: no vulkan with nouveau
19:48 imirkin_: it's related to using hw acceleration
19:49 imirkin_: nouveau doesn't offer the level of reliability s.t. it can be used for acceleration willy nilly
19:49 imirkin_: so all these applications end up poking the proverbial bear
19:49 imirkin_: eventually you end up with some fail
19:52 mercury^: Is there some kernel option that makes nouveau stop advertising everything that it cannot reliably do, so that the applications will not try to?
19:52 imirkin_: sure. remove nouveau_dri.so -- that'll remove OpenGL support.
19:53 imirkin_: you should still get X-based acceleration, which is pretty bullet-proof
19:53 mercury^: Will that work with Wayland also?
19:53 imirkin_: yes and no
19:53 imirkin_: yes, it will cause wayland to stop using GL
19:53 karolherbst: imirkin_: we don't have a and(set.u32.X, 1.0) -> set.f32.X pass yet?
19:53 imirkin_: no, wayland has no other way of creating acceleration
19:54 imirkin_: karolherbst: wtf does SET.X do? (i think i knew once. but i forget.)
19:54 karolherbst: uhm
19:54 karolherbst: I meant set dtype = u32 stype = ?
19:54 imirkin_: or you mean SET_AND and so on?
19:54 imirkin_: oh wait. i see what you mean. we have that.
19:55 karolherbst: doesn't seem to trigger though
19:55 imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp#n1265
19:55 imirkin_: check the src->asCmp() case.
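A sketch of why and(set.u32, 1.0) can become set.f32, assuming IEEE 754 binary32 and the -1/0 versus 1.0/0.0 set results mentioned earlier in the chat. `floatBits` and `andWithOne` are hypothetical helpers, not code from nv50_ir_peephole.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bit pattern of a float, without violating aliasing rules.
inline uint32_t floatBits(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Models and(x, 1.0f) applied to the result of an integer-typed set:
//   0xffffffff & bits(1.0f) == bits(1.0f)
//   0x00000000 & bits(1.0f) == bits(0.0f)
// which is exactly what a float-typed set would have produced directly.
inline uint32_t andWithOne(uint32_t setU32Result) {
    return setU32Result & floatBits(1.0f);
}
```

The rewrite changes the dType of the comparison itself, which is why the refCount() > 1 case discussed next blocks it: other users still expect the -1/0 result.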
19:56 karolherbst: mhhh
19:57 imirkin_: mercury^: unfortunately there aren't any wayland compositors that can accelerate using anything but GL
19:57 imirkin_: (there also aren't really any APIs out there that would allow such compositors to be built...)
19:58 imirkin_: wayland is heavily tied into the dma-buf concept too
19:59 imirkin_: mercury^: i suspect things would be considerably more stable if *only* the compositor used GL
19:59 imirkin_: not sure if there's a way to teach a compositor to not "pass through" GL to the clients
20:00 mercury^: imirkin_: Yes, I have only had the problem with gstreamer so far, the compositor itself runs fine.
20:00 karolherbst: imirkin_: "cmp->getDef(0)->refCount() > 1" :(
20:00 imirkin_: karolherbst: if refCount > 1, can't flip the cmp's dType
20:00 karolherbst: right
20:00 karolherbst: but that was the reason something didn't work here
20:01 karolherbst: the other refs were dead code
20:01 imirkin_: yeah, that's unfortunate.
20:01 imirkin_: perhaps we should run dce more often.
20:01 karolherbst: well
20:01 karolherbst: the dead code was produced inside ConstantFolding :)
20:01 karolherbst: I think I should work on running opts inside a loop again and if we have that, we should cover most of those things already
20:02 imirkin_: boo.
20:11 pendingchaos:pokes lachs0r
20:11 pendingchaos: lachs0r: can you run the second revision of the tests (without -fbo)?: https://github.com/pendingchaos/piglit/tree/nv_conservative_raster_v2_rc1
20:13 karolherbst: imirkin_: that slct(set) magic thing gave me like -0.00% instructions. So I think I will drop that idea for now
20:13 imirkin_: pendingchaos: why does -fbo break it? should work no?
20:14 pendingchaos: mostly because -fbo creates a RGBA framebuffer, so the alpha mismatches on piglit_probe_pixel_rgba
20:14 pendingchaos: after fixing that, only one subtest fails
20:14 pendingchaos: I don't know why
20:17 lachs0r: pendingchaos: https://0x0.st/sBDX.txt
20:19 pendingchaos: looks good
20:19 pendingchaos: thanks
20:19 pendingchaos: oddly, the compile then execute display list one succeeded this time
20:23 lachs0r: fwiw, causing nouveau to freeze is easy, but it’s much harder to make it hardfreeze the entire system than with nvidia’s proprietary driver (it usually recovers if you kill the offending application)
20:23 lachs0r: been running this system on it for a few days
20:24 orbea: lachs0r: hardfreezing the system with nouveau is easy :P
20:24 lachs0r: yeah, just run piglit :P
20:24 orbea: heh
20:24 lachs0r: nvidia on the other hand does not require any specific action
20:24 lachs0r: it just happens sometimes
20:25 orbea: <-- should have bought amd
20:25 lachs0r: same
20:26 lachs0r: it’s quite ironic, I was one of those “AMD sucks on linux and will always suck on linux” folks back in the day
20:26 karolherbst: I will start an anti campaign to imirkin_'s "buy AMD" campaign: "buying AMD kills Nouveau" :O
20:28 lachs0r: nvidia likes to shit the bed some time after I toggle a fourth display (which happens to be a projector I use for watching movies. last thing I want to happen is the driver going apeshit while I’m doing that with friends)
20:29 imirkin_: karolherbst: good riddance :p
20:29 karolherbst: with some rewritten slct opts: "total instructions in shared programs : 5894114 -> 5879950 (-0.24%)" :)
20:32 lachs0r: nouveau doesn’t appear to have that problem, but performance on gtx 960 is too low for placebo-quality video playback with mpv :(
20:41 imirkin_: have you tried diff output options?
20:41 imirkin_: i.e. xf86-video-nouveau and xv output?
20:42 orbea: lachs0r: how about mplayer?
20:42 lachs0r: is this 2001
20:43 imirkin_: you want something that performs well, or something that has fancy hotness written all over it?
20:43 orbea: it works well most of the time :)
20:43 lachs0r: I can just use lower-quality scaling
20:43 lachs0r: nouveau just can’t handle fancy stuff like EWA lanczos yet
20:44 imirkin_: does that truly matter?
20:44 lachs0r: it does, actually, especially for large scale factors
20:44 imirkin_: when you say large, you mean ... ?
20:44 lachs0r: say SD (DVD) to 1080p
20:45 imirkin_: so like 2.5x?
20:45 lachs0r: yeah
20:45 imirkin_: anything over 2 is gonna suck probably, yea
20:46 imirkin_: i dunno. i watch dvd-sized content on my 1920x1200 monitor just fine.
20:46 imirkin_: and i tend to be picky about these things.
20:46 lachs0r: well once you go full placebo…
20:46 imirkin_: :)
20:47 lachs0r: that said, intel stuff can’t really handle EWA either
20:47 orbea: tbh, I've seen amd and intel users complain about mpv too
20:47 lachs0r: not enough bandwidth I assume
20:48 lachs0r: AMD should work unless it’s really old or involves hardware decoding :D
20:48 orbea: i suspect mpv makes code based on how they expect the world to work and not how it actually works...
20:48 orbea: and nvidia is permissive enough to not care :P
20:49 lachs0r: well the shaders are fine afaik. supports ANGLE and there’s renderer abstraction for d3d11 and vulkan, too. it’s known to work on raspberry pi garbage, too
20:50 lachs0r: I don’t think it’s an issue of nvidia being too permissive at least
20:51 lachs0r: macOS opengl stack works, too (just has unrelated issues with macOS being garbage and having performance regressions because apple stopped caring about opengl support)
20:51 orbea: i didn't say that, just that when you only test nvidia it's easy to miss things
20:52 lachs0r: true, but that’s not been happening
20:53 lachs0r: 90% of the time it’s an issue with hardware decoding
20:54 orbea:shrugs, mplayer never crashed my whole system
20:55 orbea: actually, it did once due to some kernel regression (now fixed)
20:55 lachs0r: mpv hasn’t crashed mine either
20:56 lachs0r: that was entirely xorg’s fault (memory leak in the intel ddx, but luckily that’s gone in favor of modesetting now)
20:57 lachs0r: pretty sure it’d still happen if I installed that ddx, just like libreoffice causing display corruption and xorg crashes
21:43 pendingchaos: imirkin_: I'm thinking of moving the conservative rasterization state setting into a macro
21:43 pendingchaos: currently, it expands nvc0_rasterizer_stateobj::state by 7+NVC0_MAX_VIEWPORTS entries (resulting in an increase of 23 entries)
21:43 pendingchaos: does the macro thing sound good?
21:43 imirkin_: not sure i understand what you mean by that
21:45 pendingchaos: macro = PGRAPH macro, entry (element would be a better word) = nvc0_rasterizer_stateobj::state[...]
21:45 pendingchaos: does that clear it up?
21:46 imirkin_: you mean the SB_* stuff?
21:46 pendingchaos: yeah
21:46 imirkin_: ok. coz pgraph supports other kinds of macros (check the 'mme' directory)
21:47 pendingchaos: I don't think com9097.mme compiles with current envytools btw
21:47 imirkin_: it doesn't
21:47 imirkin_: need to comment out a line.
21:47 imirkin_: [in macro.c]
21:50 pendingchaos: what's the meaning of "com9097" btw? is com9097.mme.h the file I should put the macro in?
21:50 imirkin_: 9097 is the gf100 3d class
21:50 imirkin_: com ... dunno. "it was like that when i got there"
21:51 imirkin_: anyways, yeah, macro is fine.
22:34 karolherbst: imirkin_: uhh, I sent out the wrong patch...
22:35 imirkin_: which one
22:36 karolherbst: "nv50/ir: optimise slct(t, f, set) to mov(set) or not(set)"
22:36 imirkin_: ah. oops
22:36 karolherbst: the new version looks much better :)
22:36 karolherbst: https://lists.freedesktop.org/archives/mesa-dev/2018-March/190317.html
22:37 karolherbst: still the local problem
22:37 karolherbst: but that's really just due to bugs inside the spillcodeinserter
22:39 imirkin_: inverseCondCode - is that the right one? i think so... what does it return for LE?
22:39 karolherbst: GT
22:39 karolherbst: cc ^ 7
22:39 imirkin_: ok cool
22:39 imirkin_: there's another one that would have returned GE
22:40 karolherbst: LE = 3 ^ 7 = 4 = GT :)
22:40 imirkin_: reverseCondCode maybe?
22:40 karolherbst: yeah
22:40 karolherbst: but that code is just terrible
22:40 imirkin_: ;)
22:40 karolherbst: comments would have been nice
22:40 karolherbst: :D
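A commented sketch of the two helpers being compared. The numbering is an assumption chosen to agree with "LE = 3 ^ 7 = 4 = GT" from the chat (one bit each for less, equal, greater); the function bodies are illustrative and not copied from Mesa.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding: bit 0 = less, bit 1 = equal, bit 2 = greater.
enum CondCode : uint8_t {
    CC_FL = 0, CC_LT = 1, CC_EQ = 2, CC_LE = 3,
    CC_GT = 4, CC_NE = 5, CC_GE = 6, CC_TR = 7
};

// Logical negation: !(a cc b). Flipping all three bits turns
// LE (less|equal) into GT (greater), EQ into NE, and so on.
inline CondCode inverseCC(CondCode cc) {
    return static_cast<CondCode>(cc ^ 7);
}

// Operand swap: (a cc b) == (b reverse(cc) a). Exchanges the
// "less" and "greater" bits and leaves "equal" alone, so LE -> GE.
inline CondCode reverseCC(CondCode cc) {
    unsigned less = cc & 1, equal = cc & 2, greater = cc & 4;
    return static_cast<CondCode>((less << 2) | equal | (greater >> 2));
}
```

For the false:true slct case the negating variant is the one wanted. The swapping variant would only be right if the operands of the compare were exchanged.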
22:41 karolherbst: but that increase in locals used still annoys me
22:42 karolherbst: but those kind of issues are fixed if we have a decent enough scheduler, because we should get a less random change in GPR usage
22:49 imirkin_: karolherbst: btw, how about a review (or T-b) for https://patchwork.freedesktop.org/patch/212782/
22:51 karolherbst: imirkin_: https://gist.github.com/karolherbst/68fc98244bb7399432ff18baa2619b4d here I got local +4 check out the diff
22:51 karolherbst: and this is the smallest shader with a change in locals
22:51 karolherbst: the other shaders are like 4k instruction compute shaders
22:52 karolherbst: imirkin_: yeah, a second
22:52 imirkin_: errr ... i don't get it. there are no locals in those shaders
22:54 karolherbst: this is pre RA
22:54 imirkin_: oh
22:54 karolherbst: just to show that those changes shouldn't increase local usage
22:54 karolherbst: no matter how you look at it
22:55 karolherbst: I guess it is just random noise
22:56 imirkin_: yeah
22:56 karolherbst: that shader needs 92 gprs on maxwell
22:57 karolherbst: 92 after opt
22:57 karolherbst: ...
22:57 karolherbst: maybe I should start posting those stats for multiple archs...
22:58 imirkin_: pendingchaos: iirc this is the line you have to comment out:
22:58 imirkin_: { 0x00000001, 0xfffff87f, N("parm"), REG1 }, // SC
22:58 imirkin_: but i don't remember exactly
22:58 imirkin_: one of the "SC" lines :)
22:58 imirkin_: { 0x00000001, 0xffffc007, T(dst), REG2 }, // SC
22:58 imirkin_: maybe that one
23:01 pendingchaos: imirkin_: thanks
23:17 pendingchaos: doesn't seem to fix the segmentation faults I'm getting? it's simple to work around them though
23:17 pendingchaos: (just change "send $rn" -> "send (or $rn 0)")
23:18 imirkin_: oh right. the send.
23:20 imirkin_: riiight. it gets confused between like
23:20 imirkin_: send reg1
23:20 imirkin_: and send reg2
23:20 imirkin_: { 0x00000040, 0x00000770, N("send") }, // SC
23:20 imirkin_: i think if you kill that one
23:21 imirkin_: then there's no ambiguity
23:21 imirkin_: otoh you may end up with send REG1 which might not be the one we want
23:22 pendingchaos: commenting that line seems to result in "No match"
23:22 imirkin_: ok, then the next line?
23:22 imirkin_: { 0x00000040, 0x00000070, N("send"), REG1 }, // send result and store it to REG1
23:22 imirkin_: coz it's that line and { 0x00000001, 0xffffc007, T(dst), REG2 }, // SC
23:22 imirkin_: which are fighting
23:22 imirkin_: (in the parser)
23:23 imirkin_: it has no way to distinguish those
23:23 imirkin_: and it expresses its displeasure by crashing
23:24 pendingchaos: seems to work perfectly
23:24 imirkin_: i dunno what send + store results to REG1 even means
23:24 imirkin_: i guess it's store the value to REG1
23:25 imirkin_: so really it's more like mov REG1 send REG2
23:25 imirkin_: mwk: what do you think? that ISA definition looks pretty confusing. both for parsing, and for understanding by people
23:31 pendingchaos: i've gotten a macro written and it seems to work fine
23:31 imirkin_: yay
23:31 pendingchaos: I'll probably be sending a new revision of the patches tomorrow
23:32 imirkin_: that macro language is a little crazy
23:32 imirkin_: but for simple things it's ok
23:33 pendingchaos: representing some numbers can be a little annoying
23:34 imirkin_: yeah, have to do that extrinsrt jazz
23:36 pendingchaos: it all fits in one SB_IMMED_3D, which is nice
23:36 imirkin_: =]
23:37 imirkin_: and i think we still have plenty of macro code space to go
23:37 imirkin_: i'm going to have to adjust the MACRO_QUERY_BUFFER_WRITE at some point to handle the stupid 64bit value + ANY_SAMPLES case
23:38 imirkin_: and i think i'm going to run out of regs
23:39 imirkin_: which means i'll have to use a scratch value. so sad.