01:27 karolherbst: huh
01:27 karolherbst: "gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 380009 [INVALID_OPCODE]"
01:27 karolherbst: but I like didn't notice anything
10:18 karolherbst: mhh now that I started to think while writing the SEL pass, it suddenly works and doesn't create visual artefacts...
10:20 karolherbst: total instructions in shared programs : 107871 -> 107752 (-0.11%) in SR3 :/
10:20 karolherbst: sadly it isn't much
10:33 mupuf: that is a lot
10:33 mupuf: you will often work for much less
10:33 mupuf: but this is what getting more perf is about :s
10:37 karolherbst: well
10:37 karolherbst: the good thing about the pass is that it helps overall: total instructions in shared programs : 2136222 -> 2135031 (-0.06%)
10:37 karolherbst: ohh wait
10:38 karolherbst: still mostly eon-based games
10:38 karolherbst: well, yeah
10:39 karolherbst: mupuf: to be honest, my pass isn't really generic yet anyway, so it doesn't find all cases yet (the intel SEL pass reduced instructions by over 1%)
10:39 karolherbst: but I think the effect is smaller with nouveau because we have predicated instructions
10:39 karolherbst: and having a slct instead of a branched if/else usually only cuts away one instruction
10:40 karolherbst: because there is no branching in the final code
10:41 karolherbst: usually I find something like this: set//$p0 op{mov,neg,abs} $r1 $r0, and optimize it into a slct
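A minimal sketch of that kind of fold, on a hypothetical toy IR (the `Insn` struct and the `selp` op name are illustrative, not nv50_ir's classes): two adjacent movs writing the same register under complementary predicates become one unguarded select on that predicate. Only plain movs are handled; the neg/abs variants would also need the source modifier carried onto the select.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR (not nv50_ir). An instruction may be guarded by a
// predicate register, optionally negated, e.g. "(!$p0) mov $r1 $r2".
struct Insn {
    std::string op;          // "mov", "neg", "abs", "selp", ...
    int dst = 0;             // destination register number
    std::vector<int> srcs;   // sources; for selp the last one is the predicate
    int pred = -1;           // guarding predicate register, -1 = none
    bool predNeg = false;    // true if the guard is "!$pN"
};

// Fold "($pN) mov rD, rA" and "(!$pN) mov rD, rB" (adjacent, either order)
// into one unguarded "selp rD, rA, rB, $pN", which yields rA when $pN is
// true and rB otherwise. Returns the number of pairs folded.
int foldPredicatedMovPairs(std::vector<Insn> &code) {
    int folded = 0;
    for (std::size_t i = 0; i + 1 < code.size(); ++i) {
        const Insn &a = code[i];
        const Insn &b = code[i + 1];
        bool bothMovs = a.op == "mov" && b.op == "mov" &&
                        a.srcs.size() == 1 && b.srcs.size() == 1;
        bool complementary = a.pred >= 0 && a.pred == b.pred &&
                             a.predNeg != b.predNeg;
        if (!bothMovs || !complementary || a.dst != b.dst)
            continue;
        const Insn &onTrue = a.predNeg ? b : a;   // guarded by "$pN"
        const Insn &onFalse = a.predNeg ? a : b;  // guarded by "!$pN"
        Insn sel;
        sel.op = "selp";
        sel.dst = a.dst;
        sel.srcs = {onTrue.srcs[0], onFalse.srcs[0], a.pred};
        code[i] = sel;  // sel is fully built before code[i] is overwritten
        code.erase(code.begin() + static_cast<std::ptrdiff_t>(i + 1));
        ++folded;
    }
    return folded;
}
```

As noted above, with predicated instructions available such a fold usually saves only one instruction, since there was no branch to remove in the first place.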
10:42 RSpliet: mupuf: well, no, getting perf still is more about reducing register pressure rather than instruction count :-P
10:42 RSpliet: (or well, on NVIDIA)
10:44 karolherbst: RSpliet: depends on the workload though, but in the usual case, yes
10:44 karolherbst: being able to run more in parallel should help a lot
10:45 karolherbst: but reducing GPR count isn't as easy as reducing instruction count
10:45 RSpliet: karolherbst: a shame isn't it :-)
10:46 karolherbst: :D
10:46 RSpliet:puts on his manager hat
10:46 RSpliet: don't try to see it as a problem, but as an opportunity. a challenge! ;-)
10:46 karolherbst: ohh before I forget. I have to fix the dual issuing situation where instructions 6 and 7 are dual issued in a block of 7 instructions
10:46 RSpliet:burns that hat
10:47 karolherbst: I think
10:47 karolherbst: not quite sure but I think nvidia never does this
10:47 karolherbst: ohh no
10:47 karolherbst: instruction 7 itself
10:48 karolherbst: sometimes nouveau generates sched opcodes where the last instruction is dual issued with the first instruction of the next block
10:48 karolherbst: no idea if we should do that or not
10:50 RSpliet: it's entirely possible that hardware just ignores it, but it sounds wrong
10:52 karolherbst: I think it doesn't matter that much in the end, because it would only make it impossible to dual issue the first instruction in the next block, so the dual issuing information in the scheds is still valid
10:52 karolherbst: also I wrote a patch once to fix that and didn't see any differences in perf
10:52 karolherbst: maybe the hw just defaults to the 32 cycles delay in this case?
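A sketch of the check being discussed, assuming a hypothetical representation where `dualIssue[i]` marks instruction i as paired with instruction i + 1, and instructions are grouped seven at a time under one sched word as described above (counting groups from the start of the given sequence is itself an assumption). It clears marks that would pair the last instruction of a group with the first instruction of the next group, or point past the end.

```cpp
#include <cstddef>
#include <vector>

// Instructions covered by one sched word, per the discussion above.
constexpr std::size_t kInsnsPerSchedGroup = 7;

// dualIssue[i] == true means "instruction i dual issues with i + 1"
// (hypothetical encoding). Clears marks that would cross into the next
// sched group or past the last instruction; returns how many were cleared.
std::size_t clearCrossGroupDualIssue(std::vector<bool> &dualIssue) {
    std::size_t cleared = 0;
    for (std::size_t i = 0; i < dualIssue.size(); ++i) {
        bool lastInGroup = i % kInsnsPerSchedGroup == kInsnsPerSchedGroup - 1;
        bool lastOverall = i + 1 == dualIssue.size();
        if (dualIssue[i] && (lastInGroup || lastOverall)) {
            dualIssue[i] = false;
            ++cleared;
        }
    }
    return cleared;
}
```

Per the messages around it, clearing these marks made no measurable performance difference, so this reads as a cleanup rather than an optimization.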
11:15 karolherbst: mhh my dual issue pass improves dual issuing in SR3 by 10%
11:17 mupuf: what about perf? :p
11:18 karolherbst: I can't benchmark that game reliably
11:18 karolherbst: but 10% more dual issuing usually means like 1% more perf
11:18 karolherbst: but because the game runs at like 22 fps this is hardly measurable
11:18 karolherbst: improving dual issuing is a good idea in any case and doesn't harm anything
11:19 mupuf: check ALU-bound benchmarks, like piano
11:20 karolherbst: yeah
11:20 karolherbst: there I get 2% more perf
11:20 karolherbst: but dual issuing is improved by 22%
11:25 karolherbst: mhh not dual issuing the last instruction doesn't seem to have any effect
11:47 karolherbst: huh
11:47 karolherbst: what is nop u32 %r542 ?
11:49 karolherbst: I have a nop here which is then one of the results of my selps
11:49 karolherbst: originally: nop u32 %r542 (0)// phi u32 %r544 %r1181 %r542
11:54 karolherbst: ahh
11:55 karolherbst: rdsv f32 %r0 sv[POSITION:3] => 4x nop + linterp pass f32 %r500 a[0x7c] (0)
11:55 karolherbst: mhh
11:55 karolherbst: what are those nops? :D
11:56 karolherbst: ohhh okay
14:18 Tom^: give me the patches and I could benchmark stuff :p
15:10 imirkin: karolherbst: nop is used to create values that are used before they're written
15:31 karolherbst: imirkin: mhh, but how does that make sense when the nop is used in a phi instruction?
15:32 karolherbst: in fact the nop Value doesn't ever seem to be written
15:33 karolherbst: just used
16:41 gouchi: hi
16:43 imirkin_: karolherbst: think about what happens if you have "int x; while (true) { if (foo) { x = 1; break; } }"
16:44 gouchi: imirkin_: https://github.com/libretro/RetroArch/commit/c9d3936ee53cc123bafd72b05e8ec70e12efbb10
16:44 imirkin_: gouchi: hm, cool. but that's not enough
16:44 gouchi: imirkin_: now OpenGL cores launch and work on the nv40 card family
16:44 imirkin_: gouchi: specifically, the code needs to call e.g. glBindFramebufferEXT() instead of glBindFramebuffer()
16:44 gouchi: imirkin_: it is working
16:45 imirkin_: i guess something else is taking care of that then
16:45 imirkin_: cool
16:45 imirkin_: i guess their no-FBO path was just highly untested
16:45 gouchi: hum
16:45 imirkin_: (as i might have expected... basically every gpu since ... a long time ago ... supports FBOs)
16:49 imirkin_: gouchi: do actual games run on nv40 though?
16:49 imirkin_: that driver is ... imperfect.
16:49 gouchi: imirkin_: yes, it's working
16:50 imirkin_: nice
16:50 gouchi: congratulations on your work!
16:50 imirkin_: mostly not my work, but thanks
16:55 imirkin_: gouchi: btw, which games are you trying on there? (i mean, which game systems)
16:55 gouchi: imirkin_: n64 and psp
16:55 imirkin_: ah cool. nv40 should be able to handle that easily.
16:55 gouchi: imirkin_: because they are using OpenGL
17:01 karolherbst: imirkin: yeah, I mean I understand that there is a reason for this in pre-SSA form, but in SSA form this shouldn't matter as long as there is no initial value, right?
17:02 karolherbst: or what happens when you read the value out from a noped reg?
17:02 imirkin_: you get a random value
17:02 karolherbst: okay
17:02 imirkin_: it's just a way to keep SSA happy
17:03 imirkin_: think of it another way
17:03 imirkin_: int x; if (foo) x = 5; if (bar) y = x;
17:03 karolherbst: mhh okay
17:04 karolherbst: now I am curious what nvidia gives you in those cases
17:04 imirkin_: there will be a phi node after that first if
17:04 imirkin_: well, nop is a fake instruction
17:04 imirkin_: it's never emitted
17:04 karolherbst: I know
17:04 karolherbst: but nvidia writes 0 into unused regs
17:04 imirkin_: (i mean, there is a real NOP instruction, but the nop in nv50 ir is fake)
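To illustrate imirkin_'s second example (`int x; if (foo) x = 5; if (bar) y = x;`), here is a hypothetical mini SSA builder, not nv50_ir's API. At the join after the first `if`, the phi for x needs one operand per predecessor, and the path that never wrote x gets a placeholder "nop" value that is never emitted. This mirrors the `phi u32 %r544 %r1181 %r542` with `nop u32 %r542` seen earlier, but the structures here are illustrative only.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical mini SSA builder (not nv50_ir). A value is a real
// definition ("def"), a "phi", or a "nop": a placeholder meaning "never
// written on this path" that keeps SSA well-formed and is never emitted.
struct SsaValue {
    int id;
    std::string kind;            // "def", "phi" or "nop"
    std::vector<int> phiArgs;    // one value id per predecessor (phis only)
};

struct SsaBuilder {
    std::vector<SsaValue> values;

    int make(const std::string &kind, std::vector<int> args = {}) {
        int id = static_cast<int>(values.size());
        values.push_back(SsaValue{id, kind, std::move(args)});
        return id;
    }

    // Value of a variable at a join block. defPerPred[i] points at the id
    // reaching from predecessor i, or is null if that path never wrote the
    // variable; such paths get a fresh nop so every phi operand exists.
    int readAtJoin(const std::vector<const int *> &defPerPred) {
        std::vector<int> args;
        for (const int *def : defPerPred)
            args.push_back(def ? *def : make("nop"));
        return make("phi", std::move(args));
    }
};
```

For the example: `int x1 = b.make("def");` models `x = 5`, and `b.readAtJoin({&x1, nullptr})` yields `phi x1, nop`. Reading the nop at run time just gives whatever the register happens to hold, as said above.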
17:05 karolherbst: yeah, I understood this now, I just didn't think hard enough
17:05 karolherbst: anyway, I am now curious how nvidia handles that, because I still don't know why nvidia writes 0 into unused regs in their binaries :/
17:05 karolherbst: most likely this is completely unrelated
17:05 imirkin_: gouchi: lol. https://github.com/libretro/RetroArch/commit/67e64f4ca6e33453daf6f6250e48c08edab0a425
17:06 karolherbst: :D
17:06 imirkin_: i guess some osx ppc driver has a broken glGenerateMipmapEXT impl. that's moderately common :)
17:06 karolherbst: but this is like 20 hours old
17:06 karolherbst: ohhh
17:06 karolherbst: they added ext support?
17:07 karolherbst: and now it got removed again
17:07 karolherbst: lol
17:07 karolherbst: how they like each other
17:07 karolherbst: this is an odd style they got there
17:08 imirkin_: although yeah, their checks are still wrong
17:08 imirkin_: they might work against mesa, but that's mostly coincidence
17:08 karolherbst: yeah
17:08 karolherbst: I think they do both because of reasons
17:08 karolherbst: if they check the extensions this is fine
17:08 imirkin_: no, i mean they only look for glBindFramebuffer from what i can tell
17:08 gouchi: karolherbst: not completly removed https://github.com/libretro/RetroArch/blob/master/gfx/drivers/gl.c#L234 ;-)
17:08 imirkin_: and not glBindFramebufferEXT
17:08 karolherbst: if they also check the function pointers it shouldn't matter
17:08 karolherbst: gouchi: yeah I know
17:09 imirkin_: and same for glGenerateMipmap vs glGenerateMipmapEXT
17:09 karolherbst: gouchi: but the two commits are only a few minutes apart
17:09 imirkin_: probably why it breaks on osx
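A sketch of resolving an entry point by its core name with an EXT fallback, as discussed for glBindFramebuffer/glBindFramebufferEXT and glGenerateMipmap/glGenerateMipmapEXT. `fakeGetProcAddress` is a hypothetical stand-in for a real loader such as glXGetProcAddress, simulating a driver that only exposes the EXT names. Real code should also check the GL version or GL_EXT_framebuffer_object in the extension string, since a non-null pointer from a GetProcAddress-style call is not by itself proof of support.

```cpp
#include <cstring>
#include <string>

// Generic function pointer type, as GetProcAddress-style loaders return.
using GenericProc = void (*)();

// Hypothetical stand-ins for driver entry points.
void fakeBindFramebufferEXT() {}
void fakeGenerateMipmapEXT() {}

// Hypothetical loader simulating a driver that only exports EXT names.
GenericProc fakeGetProcAddress(const char *name) {
    if (std::strcmp(name, "glBindFramebufferEXT") == 0) return fakeBindFramebufferEXT;
    if (std::strcmp(name, "glGenerateMipmapEXT") == 0) return fakeGenerateMipmapEXT;
    return nullptr;
}

// Try the core name first, then the EXT-suffixed one.
template <typename Loader>
GenericProc resolveWithExtFallback(Loader load, const std::string &coreName) {
    if (GenericProc p = load(coreName.c_str()))
        return p;
    return load((coreName + "EXT").c_str());
}
```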
17:09 gouchi: true, don't know how it happened
17:09 karolherbst: ahh
17:10 karolherbst: imirkin_: right
17:15 mwk: whee
17:15 mwk: I compiled my first falcon function
17:16 mwk: now let's do one that actually contains an instruction :)
17:16 mupuf: mwk: :o
17:16 mupuf: using fucc?
17:18 mwk: mupuf: nope, my all-new target
17:19 mupuf: cool, good luck with it!
17:20 mwk: LLVM ERROR: unable to write nop sequence of 0 bytes
17:20 mwk: eh...
17:21 mwk: I'm not sure what to do about that hook, in general
17:21 mwk: you can't really implement it for Falcon
17:21 mwk: there's no way to make a 1-byte nop
17:22 mwk: for that matter, I don't know how to implement a 2-byte one either, but I suppose there is some usable instruction for that
17:22 mwk: but 0 byte nop is easy to emit...
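LLVM's assembler backend has a `writeNopData` hook for exactly this (its signature has changed between LLVM versions, so it isn't reproduced here). A standalone sketch of the logic, assuming, as said above, that Falcon has no 1-byte NOP and no known 2-byte one: succeeding on the zero-byte request avoids the error quoted above, while any other size still reports failure. `writeFalconNops` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical standalone version of a "write count bytes of NOPs" hook.
// With no usable NOP encoding known, only the empty sequence can be
// produced; returning false tells the caller the request can't be met.
bool writeFalconNops(std::vector<std::uint8_t> &out, std::uint64_t count) {
    (void)out;          // nothing to append for the zero-byte case
    return count == 0;  // any non-zero size: no encoding available
}
```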
17:23 mwk: okay, I successfully emitted a 'ret' instruction
17:24 mwk: ... for assembly, now let's try with ELF output
17:27 karolherbst: mwk: so in the end we would load firmware like we would load nvidia firmware files and just upload them and drop that include stuff?
17:27 mwk: yep, ELF works as well
17:27 karolherbst: ahh
17:27 karolherbst: cool
17:27 karolherbst: any benefits of using ELF?
17:27 karolherbst: or is it just simpler to use ELF because of toolchain support?
17:27 mwk: karolherbst: uh, I was testing ELF output, not replying
17:27 karolherbst: ohh okay
17:27 mwk: anyhow
17:28 mwk: directly using the ELF output is a definite possibility
17:28 mwk: or, we could use objcopy to mangle the ELF output to simple code + data binaries
17:28 karolherbst: mhh right, in the end we could just cut the stuff out we need
17:28 mwk: either way, the toolchain is going to be using ELF because that's the simplest thing to do
17:29 karolherbst: mwk: yeah, maybe that would be simpler, because we also have to distribute those firmware files through linux-firmware
17:29 karolherbst: and objcopy would make sense, because we only have to do that once for everybody
17:31 mwk: so... what now
17:31 mwk:tries clang
17:31 karolherbst: :)
17:31 karolherbst: awesome stuff by the way, I am really looking forward to writing the PMU stuff in C :)
17:32 mwk: eh, clang needs to be explicitly told of a new target
17:32 mwk: let's do this later, when I can actually compile some interesting code
17:33 mupuf: mwk: so, you write LLVM-IR code by hand then?
17:33 mwk: alright, let's try implementing some ALU
17:33 mwk: mupuf: yep
17:33 mupuf: mwk: have fun :)
17:33 mwk: mupuf: sure thing
17:35 mwk: so hmm
17:36 mwk: let's try handling LowerFormalArguments...
17:38 karolherbst: mwk: do you know if it is possible to tell llvm, that specific sources have to be read out as fast as possible without putting instructions in between?
17:39 karolherbst: example: if I want to read out the PMU counters, all registers should be read out nearly at the same time, and some compiler optimisations might move stuff around so that those readouts end up further apart
17:43 mwk: karolherbst: inline asm, I suppose.
17:43 karolherbst: meh :/
17:43 mwk: or, if you really really need it, I suppose I can make an intrinsic for that
17:43 karolherbst: that would be really awesome
17:43 karolherbst: and I would always consider a non-inline-asm solution first for such things, because inline asm somehow disrupts things
17:46 karolherbst: but in the end something like this might be enough: __attribute(packed_readout) { v0 = read_gpu_reg(0x0); v1 = read_gpu_reg(0x1);... } or something like that. No idea whether compiler devs would want to add support for something like this
17:46 imirkin_: usually volatile + clobbers take care of things like that
17:49 karolherbst: mhh, I would really try to find a solution that doesn't need asm, but if there is nothing good, then we can't do much about it
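A sketch of what imirkin_'s "volatile + clobbers" suggestion looks like in GCC/Clang C++, assuming a plain pointer stands in for the mapped PMU counter registers (`regs`, `CounterSnapshot` and `readCountersBackToBack` are hypothetical). Volatile reads keep their program order and are never merged or dropped; the empty `asm volatile` with a "memory" clobber keeps other memory accesses from being moved across it. Neither guarantees the reads end up cycle-adjacent, which is where a dedicated intrinsic or real inline asm would come in.

```cpp
#include <cstdint>

struct CounterSnapshot {
    std::uint32_t v[4];
};

// GCC/Clang compiler barrier: emits no instruction, but the compiler may
// not move memory accesses across it. It does not order anything in the
// hardware itself.
inline void compilerBarrier() { asm volatile("" ::: "memory"); }

// Read four counters with nothing memory-related scheduled between them.
// Register-only arithmetic could still be placed in between, so this is
// best effort, not a cycle-exact guarantee.
CounterSnapshot readCountersBackToBack(const volatile std::uint32_t *regs) {
    CounterSnapshot s;
    compilerBarrier();
    s.v[0] = regs[0];
    s.v[1] = regs[1];
    s.v[2] = regs[2];
    s.v[3] = regs[3];
    compilerBarrier();
    return s;
}
```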
17:53 imirkin_: gnurou: ping on https://github.com/Gnurou/nouveau/issues/10
17:57 mwk: alright, my functions can have arguments now...
17:58 mwk: now I need to model the registers properly
17:59 mwk: I suppose we're going to want to use 8/16/32-bit subregisters
18:02 mupuf: mwk: are you pushing all the regs to the stack and popping them at the end of the function?
18:03 karolherbst: mupuf: I would expect that llvm is already smarter than this
18:04 karolherbst: mupuf: like only pushing a reg if it's used in a called function
18:04 mwk: mupuf: not yet, I haven't touched the frame lowering stuff
18:05 mwk: but yeah, that's the plan
18:06 mwk: the current plan is: 1) model the registers, 2) model the b8/b16/b32 multiclass, 3) implement ALU, 4) implement load/store, 5) implement calls, 6) deal with frame lowering
18:08 mupuf: sounds fun
21:22 karolherbst: imirkin_: I have an odd situation: on a mov instruction srcExists(0) returns true, but getSrc(0) segfaults :/
21:22 imirkin_: that is an odd situation :)
21:22 karolherbst: right
21:22 imirkin_: congrats
21:23 karolherbst: std::_Deque_iterator<nv50_ir::ValueDef, nv50_ir::ValueDef&, nv50_ir::ValueDef*>::operator+ (this=0x60, __n=0)
21:23 karolherbst: ...
21:23 karolherbst: it looks wrong, but I have no clue where it comes from
21:24 imirkin_: dunno. good luck.
21:24 imirkin_: my initial guess is memory corruption.
21:24 karolherbst: yeah, most likely
21:24 karolherbst: mov u32 %r1179 0x00000000 (0)
21:24 karolherbst: is the instruction, but ...
21:25 karolherbst: ohhh
21:25 karolherbst: odd
21:25 imirkin_: seems innocent enough
21:25 karolherbst: yeah, right
21:25 karolherbst: integers are in hex form
21:25 karolherbst: was confused at first
21:25 karolherbst: this->srcs.size() is 21.333333333333332
21:26 imirkin_: size() returns an integer.
21:26 karolherbst: right
21:26 karolherbst: that's why I think this is odd
21:26 karolherbst: mhh
21:26 imirkin_: the insn probably gets deleted and you're still holding onto it
21:26 karolherbst: defs.size() is also 21.333...
21:26 karolherbst: mhh
21:27 karolherbst: would be really odd
21:27 imirkin_: again... size() can only return an integer
21:27 imirkin_: it can't be 21.33 :)
21:27 karolherbst: well gdb shows me this value
21:27 imirkin_: gdb can't show you size
21:28 imirkin_: gdb can show you that __N_length member or whatever
21:28 imirkin_: i forget exactly
21:28 karolherbst: p this->defs.size()
21:28 imirkin_: dunno
21:28 imirkin_: anyways... good luck.
21:28 karolherbst: you can call functions while debugging ;)
21:28 imirkin_: some, but not all.
21:28 karolherbst: right
21:28 karolherbst: it can't handle inlines
21:28 imirkin_: ... like size.
21:29 karolherbst: usually it complains
21:30 karolherbst: seems like I have to try out this reverse debugging thing
21:32 karolherbst: Process record does not support instruction 0xc5 ...
21:44 karolherbst: he.... everything looks completely fine...
21:50 karolherbst: ... lol
21:51 karolherbst: $rsi changes through a call to this->operator+(0)
21:51 karolherbst: and this is saved there
21:53 karolherbst: ohh wait, rdi should be this
21:58 karolherbst: movabs $0xaaaaaaaaaaaaaaab,%rsi
21:58 karolherbst: and then rsi is 0x60?
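For what it's worth, 0xaaaaaaaaaaaaaaab is the multiplicative inverse of 3 modulo 2^64, which compilers commonly use to divide a value known to be an exact multiple of 3 with a single multiply, for example a pointer difference divided by a 24-byte element size. That this is what the code above is doing (e.g. std::deque iterator arithmetic) is an assumption. A sketch of the arithmetic, with `elementsBetween24` as a hypothetical helper:

```cpp
#include <cstdint>

// 3 * 0xaaaaaaaaaaaaaaab == 1 (mod 2^64), so multiplying by it divides
// any exact multiple of 3 by 3.
constexpr std::uint64_t kInverseOf3 = 0xaaaaaaaaaaaaaaabULL;
static_assert(3 * kInverseOf3 == 1, "inverse of 3 modulo 2^64");

// Element count for a byte difference and a 24-byte element type, computed
// the way an optimizing compiler might: shift out the factor 8, then
// multiply by the inverse of 3. Only meaningful when byteDiff is an exact
// multiple of 24; anything else produces a huge, nonsensical value.
std::uint64_t elementsBetween24(std::uint64_t byteDiff) {
    return (byteDiff >> 3) * kInverseOf3;
}
```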
22:45 karolherbst: imirkin_: okay, there is some evil stack corruption going on, because the stack points to a place in the code it shouldn't reach
22:47 karolherbst: ohhhh
22:48 karolherbst: now it makes sense
22:48 karolherbst: the heck
22:48 karolherbst: stupid gdb
22:49 karolherbst: imirkin_: I called "n = nonTaken->getSrc(0)->getUniqueInsn()->getDef(0);" on the mov with the immediate which is obviously wrong
22:49 karolherbst: imirkin_: but gdb pointed me to this line: "n = nonTaken->getDef(0);"
22:49 karolherbst: ....
22:49 karolherbst: this is all GDBs fault :D
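The actual bug was chasing `getSrc(0)->getUniqueInsn()` on a mov whose source is an immediate, which presumably has no defining instruction; a `this=0x60` in the backtrace is typical of using a null object pointer plus a member offset. A sketch of the guard, using hypothetical toy types rather than nv50_ir's `Value`/`Instruction`:

```cpp
// Hypothetical toy types (not nv50_ir). A source is either produced by an
// instruction or is an immediate, which has no defining instruction.
struct ToyInsn;

struct ToyValue {
    ToyInsn *defInsn = nullptr;  // defining instruction; null for immediates
};

struct ToyInsn {
    ToyValue *src0 = nullptr;    // first source
    ToyValue *def0 = nullptr;    // first definition
};

// Follow the first source back to the first def of the instruction that
// produced it. Returns null instead of dereferencing a missing defining
// instruction, e.g. for "mov u32 %r1179 0x00000000", whose source is an
// immediate.
ToyValue *defOfFirstSourceProducer(const ToyInsn &insn) {
    if (!insn.src0 || !insn.src0->defInsn)
        return nullptr;
    return insn.src0->defInsn->def0;
}
```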
22:55 mwk: ugh, the flags on Falcon are the worst
22:56 mwk: it seems they rolled a dice to determine which flags will be set by which instruction
22:56 mwk: *and* they rolled again when they designed v3
22:59 mwk: signed/unsigned comparisons set CZ; sane v3 comparisons set all 4; unary instructions set OSZ; mov sets OSZ on v0, nothing on v3; shifts set C on v0, all 4 on v3; add/sub set all 4; mul/div/mod set nothing; and/or/xor set nothing on v0, all 4 on v3; sext sets SZ on both; extr* set SZ ; ins nothing; xbit nothing on v0, SZ on v3
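The list above, transcribed into a lookup table as a sketch. The `OpClass` groupings are informal names, not envydis mnemonics; only v0 and v3 are covered, and the v3-only comparisons that set all four flags are left out since they don't exist on v0. A table like this is what lets a backend know which instructions clobber flags, e.g. that mov does on v0 but not on v3.

```cpp
#include <cstdint>

// Falcon condition flags, as named in the list above.
enum FalconFlag : std::uint8_t { FlagC = 1, FlagO = 2, FlagS = 4, FlagZ = 8 };
constexpr unsigned kAllFlags = FlagC | FlagO | FlagS | FlagZ;

// Informal instruction groupings from the list above.
enum class OpClass {
    CmpSignedUnsigned, Unary, Mov, Shift, AddSub, MulDivMod,
    Logic,  // and/or/xor
    Sext, Extr, Ins, Xbit
};

// Flags written by each class on Falcon v0 (isV3 == false) or v3 (true).
unsigned flagsSet(OpClass op, bool isV3) {
    switch (op) {
    case OpClass::CmpSignedUnsigned: return FlagC | FlagZ;
    case OpClass::Unary:             return FlagO | FlagS | FlagZ;
    case OpClass::Mov:               return isV3 ? 0u : unsigned(FlagO | FlagS | FlagZ);
    case OpClass::Shift:             return isV3 ? kAllFlags : unsigned(FlagC);
    case OpClass::AddSub:            return kAllFlags;
    case OpClass::MulDivMod:         return 0;
    case OpClass::Logic:             return isV3 ? kAllFlags : 0u;
    case OpClass::Sext:              return FlagS | FlagZ;
    case OpClass::Extr:              return FlagS | FlagZ;
    case OpClass::Ins:               return 0;
    case OpClass::Xbit:              return isV3 ? unsigned(FlagS | FlagZ) : 0u;
    }
    return 0;
}

bool clobbersFlags(OpClass op, bool isV3) { return flagsSet(op, isV3) != 0; }
```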
23:00 mwk: the fucking mov instruction sets flags... I'll have to use "and A, B, B" to actually implement a mov on v0
23:01 mwk: which also happens to only support b32... but maybe that's a smaller problem than destroying flags
23:01 karolherbst: mwk: well they didn't think anybody else would need a compiler for the falcons
23:01 karolherbst: so why bother bringing sanity in there?
23:02 mwk: or I suppose I could implement a mov by st+ld
23:02 mwk: but that's even worse
23:03 karolherbst: mwk: but do you know what? Imagine how we would _ever_ have been able to take care of that by writing assembly code, and then wondering why $flags are weird or something :D
23:03 karolherbst: mwk: the and solution looks okayish though
23:03 mwk: well
23:03 mwk: I'm not sure yet
23:03 mwk: maybe destroying flags is not a problem
23:03 karolherbst: mhhh
23:03 karolherbst: idea
23:04 mwk: but then movs can be emitted quite late in the process
23:04 karolherbst: mwk: can you tell llvm to put the producer and the consumer of the flags together?
23:04 mwk: so who knows
23:04 mwk: yeah, I can fuse instructions
23:04 karolherbst: so that there is never anything in between
23:04 mwk: basically I can make a Pseudo instruction with a custom emitter that will expand it to two instructions
23:04 karolherbst: mwk: or declare instructions "never put them between producer and consumer of flags"
23:05 mwk: well
23:05 mwk: normally that's taken care of in a simple way
23:05 mwk: you just mark which instructions destroy which flags
23:05 karolherbst: mhh okay
23:05 mwk: and LLVM knows that it needs to put the producer and consumer together, or risk an expensive spill&fill
23:05 mwk: (which sometimes isn't possible, eg. when you try to do two independent carry chains)
23:06 mwk: but mov is very special, as it can be emitted very late, post register allocation
23:06 karolherbst: right
23:06 mwk: so I have to be careful
23:06 karolherbst: but maybe llvm can even handle that?
23:06 mwk: I'm not sure
23:06 mwk: well, we'll see
23:07 mwk: I could also do crazier shit
23:07 karolherbst: yeah
23:07 mwk: make mov a Pseudo-insn that expands to "mov" if it's safe to destroy flags, and to "and" if it isn't
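A sketch of that pseudo-instruction expansion, based only on the facts stated in this discussion: on v0, mov sets O/S/Z, so when flags are live across the copy it becomes "and A, B, B" (and/or/xor set no flags on v0, but that form only exists at b32); on v3, mov doesn't touch flags. `expandCopy` and its parameters are hypothetical, and it returns an empty string for the case left open above (v0, live flags, narrower than 32 bits), where the st+ld route would be the fallback.

```cpp
#include <string>

// Hypothetical expansion of a reg-to-reg copy pseudo-instruction on Falcon.
// Returns the chosen instruction text, or "" when no single
// flag-preserving instruction is known.
std::string expandCopy(const std::string &dst, const std::string &src,
                       int widthBits, bool isV3, bool flagsLive) {
    if (isV3 || !flagsLive)
        return "mov " + dst + ", " + src;              // flags may be clobbered
    if (widthBits == 32)
        return "and " + dst + ", " + src + ", " + src; // x & x == x, no flags set on v0
    return "";                                         // needs another strategy
}
```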
23:07 karolherbst: sounds okay though
23:08 karolherbst: there is also movw and set I think?
23:08 mwk: these are effectively "load immediate"
23:08 mwk: what I need is a reg-to-reg copy
23:08 karolherbst: mhhh
23:08 karolherbst: pre v5
23:08 karolherbst: movw +sethi
23:08 karolherbst: v5+ mov
23:09 karolherbst: ohh wait
23:09 karolherbst: those are for constants into reg
23:09 mwk: v5 is a whole different thing
23:09 mwk: v3+ already has a sane mov
23:09 mwk: only v0 is a problem
23:09 karolherbst: mhh I see
23:09 imirkin_: v0 being... pcopy on GT215?
23:09 mupuf: mwk: do we care about v0? wasn't it only pcopy?
23:09 mwk: imirkin_: G98
23:09 mupuf: on nv98
23:09 imirkin_: er right
23:10 imirkin_: same diff :)
23:10 mwk: and it's not PCOPY, it's the VP3 trio
23:10 imirkin_: right
23:10 mupuf: v3 was the one used for pdaemon?
23:10 mwk: for the first daemon, yes
23:10 mwk: later ones are v4, v4.1, v5
23:10 mupuf: good to hear
23:10 mwk: (don't ask me what's new on v4.1, I still don't know)
23:11 mwk: so hmm
23:11 mwk: let's wire up the flags, I guess
23:12 karolherbst: we can just hope LLVM is smart about it
23:12 karolherbst: and if not, maybe we can get away with saying it is an LLVM bug
23:12 karolherbst: and they should fix it :p
23:12 mwk: right... except it doesn't work that way
23:15 mwk: ew
23:16 mwk: I have a grand total of 4 supported instructions, and the td file is already a mess
23:16 mupuf: mwk: so, if v0 is such a hassle, why even care about it?
23:16 mwk: but then... I haven't seen a target yet that didn't require horrible hacks
23:16 mupuf: you could reduce the number of regs there and keep one for spilling and reloading your flags
23:16 mwk: mupuf: that's horrible
23:17 karolherbst: like mov $0 $flags changes the flags again?
23:17 karolherbst: and what would mov $flags $0 do?
23:17 mwk: no, that'd work, it just happens to be horrible
23:17 mupuf: it is, but hey, who cares? There will be no users for it until ... someone writes code for VP3 ... which is unlikely to happen ever :D
23:17 mwk: mupuf: I do care about v0
23:17 karolherbst: mwk: oh so if you write the flags with the mov, the flags won't be set by the mov?
23:18 mwk: karolherbst: the mov to/from SR instruction is a different mov than reg-reg mov
23:18 karolherbst: ahhh I see
23:18 mwk: there are two approaches when it comes to naming instructions
23:19 mupuf: mwk: you have bigger plans than just writing the compiler then, looking forward to it :D
23:19 mwk: one is to give every instruction a separate name
23:19 mwk: ppc, s390 very much like that, and it works nice
23:19 mwk: but then you have a lot of names
23:19 mwk: eg. s390 has: AR - add reg to reg; A - add mem to reg ; AHI - add imm16 to reg; AFI - add imm32 to reg, and 10 more variants or so
23:20 mupuf: oh, fun and joy :D
23:20 mwk: the other approach is to give every instruction a simple name describing its function, and disambiguate similar instructions by operand types
23:20 mwk: which is the x86 approach
23:20 mupuf: anyway, bed time for me! Have fun again!
23:21 mwk: so you have 5 or so instructions called add: add r/m, r; add r, r/m; add r, imm; add r/m, imm
23:21 mwk: oh, and the immediates can be 8 or 16 bit
23:21 imirkin_: i like the x86 way :)
23:21 mwk: yeah, I've also used the x86 way for every arch I REd for envydis
23:21 mwk: and... tbh after working with both, I think I like ppc way more
23:22 imirkin_: how do you feel about the ppc altivec mnemonics? :)
23:22 imirkin_: a bit of a mouthful, imo
23:22 mwk: I've never seen short vector ISA mnemonics tbh
23:23 imirkin_: well, the intel ones have rhyme and reason to them... sort of
23:23 mwk: altivec is quite reasonable actually
23:23 imirkin_: the ppc ones are ... beyond hope
23:24 mwk: anyhow
23:24 mwk: the approach is kind of problematic on Falcon
23:24 mwk: because you have instructions that look very much the same, but have different behavior
23:25 mwk: there are 3 different mov's, 2 add's
23:26 mwk: oh, and there are short/long immediate variants, but that hasn't caused problems so far
23:29 mwk: at least for Falcon it's only these 2 instructions, but I very much regret doing it for G80 and all the following ones
23:29 mwk: float and integer additions should *not* be named the same
23:29 imirkin_: mwk: yeah... that i agree with
23:29 karolherbst: uhh there is floating point on the falcons?
23:30 mwk: nope, that's G80
23:30 karolherbst: ahh okay
23:30 mwk: I blame ptx for that craziness
23:31 imirkin_: mwk: well, i dunno about ptx, but nvdisasm definitely has it as IADD vs FADD
23:31 imirkin_: (as well as FADD32I and so on)
23:31 mwk: yeah
23:31 mwk:didn't have nvdisasm back then :(
23:31 imirkin_: ah =/ it's incredibly useful
23:31 imirkin_: albeit imperfect
23:45 karolherbst: mhhh, so producing selps like a stupid guy does have a negative impact
23:46 karolherbst: meh, well
23:47 imirkin_: optimizations are hard.
23:47 karolherbst: yeah, but I thought that at least adding some selps should come for free
23:49 karolherbst: huh
23:49 karolherbst: in this one shader I even added slcts
23:50 karolherbst: but also got 6 instructions more in total
23:51 karolherbst: huh
23:53 karolherbst: uhhh, that looks evil
23:54 karolherbst: imirkin_: can slct take immediates?
23:54 imirkin_: i assume so
23:54 imirkin_: but i don't know for sure
23:55 karolherbst: emitForm_A, checking
23:56 karolherbst: s == 1 || i->op == OP_MOV || i->op == OP_PRESIN || i->op == OP_PREEX2
23:56 karolherbst: so an immediate is allowed for source 1
23:57 karolherbst: at least something
23:57 imirkin_: well, not necessarily :)
23:57 imirkin_: check in nvdisasm to be sure
23:57 karolherbst: I never really get how I should read the tables in nvdisasm :/
23:58 imirkin_: perl -ane 'foreach (@F) { print pack "I", hex($_) }' > tt; nvdisasm -b SM30 tt
23:58 imirkin_: paste in the hex values (32 bits at a time), enjoy
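A rough C++ counterpart of the perl one-liner above, for preparing nvdisasm input without perl: it parses whitespace-separated 32-bit hex words and packs each as four bytes. perl's `pack "I"` uses native byte order, so little-endian (as on x86) is assumed here; `packHexWords` is a hypothetical name.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Parse whitespace-separated hex words ("0x" prefix optional) and append
// each as 4 little-endian bytes, like perl's pack "I" on an x86 host.
std::vector<std::uint8_t> packHexWords(const std::string &text) {
    std::vector<std::uint8_t> out;
    std::istringstream in(text);
    std::string tok;
    while (in >> tok) {
        auto word = static_cast<std::uint32_t>(std::stoul(tok, nullptr, 16));
        for (int i = 0; i < 4; ++i)
            out.push_back(static_cast<std::uint8_t>(word >> (8 * i)));
    }
    return out;
}
```

Write the returned bytes to a file opened in binary mode (e.g. `std::ofstream("tt", std::ios::binary)`) and run `nvdisasm -b SM30 tt` as above. `std::stoul` throws on tokens that aren't valid hex.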