02:45 pmoreau: Jayhost: The commands for interacting with the mmiotracer are not outdated.
03:05 karolherbst: add u32 $r6 $r18 0x00000010; add u32 $r6 $r6 0x00000080: would there be any problem optimizing this to add u32 $r6 $r18 0x00000090?
03:25 Tom^: karolherbst: hi, we might be able to play or test things later this week. my gpu fan died completely so I haven't been using my computer at all :P. ordered a new custom cooler for it
03:25 Tom^: was running like 70C at idle without it xD
03:31 karolherbst: ohhh
03:31 karolherbst: sad
03:40 Tom^: in the process I also found out why that guy in here with the massive custom gpu core cooler had his crashes and freezes
03:40 Tom^: the voltage regulator also needs to be cooled on like almost every card :P
03:43 Tom^: and most high-end cards have a backplate that cools it from behind too.
04:33 karolherbst: :D
04:34 karolherbst: how can I merge two immediates in nv50/ir?
04:38 karolherbst: ohh I see, it isn't implemented yet
04:41 imirkin: huh?
04:46 karolherbst: imirkin: like imm0 += imm1;
04:46 karolherbst: I see only the operator= implemented
04:49 imirkin: i don't think that sort of thing has ever come up
04:49 karolherbst: imirkin: well there are some really obvious cases where you can merge two adds: add(add(x, imm0), imm1)
04:49 imirkin: normally you want to create a new value
04:49 imirkin: sure, i just mean you never want to do imm0 += imm1
04:50 karolherbst: ohh okay
04:50 karolherbst: I figured creating a new value is a bit more complicated than just changing the value
04:50 karolherbst: especially because some operators are declared
04:50 karolherbst: +,-,* and /
04:50 karolherbst: but not implemented
04:51 imirkin: you have to be careful about how you interpret the bits in the value
04:51 karolherbst: because of the types?
04:51 karolherbst: and to correctly overflow the value
04:53 imirkin: yes, because of the types
04:54 karolherbst: imirkin: well at first I could only do that if the immediates have the same type
04:54 karolherbst: should cover most cases where this occurs already
04:54 imirkin: sure, but... float add != int add
04:54 imirkin: so you still have to be careful :)
04:55 karolherbst: ohh mhh
04:55 imirkin: not saying this is some impossible task, just... have to be careful
04:55 karolherbst: well I found it in a shader for u32
04:55 karolherbst: so I will concentrate on this and leave the other cases alone for now
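The following is a minimal C++ sketch of imirkin's point that "float add != int add" when merging immediates. The helper names (addU32, addAddU32, mergedU32, addAddF32, mergedF32) are hypothetical and are not part of the nv50 IR API.

- **u32:** rewriting add(add(x, i0), i1) as add(x, i0 + i1) is always exact, because both sides wrap modulo 2^32.
- **f32:** reassociating changes where rounding happens. The example uses 2^-24, which is half an ulp of 1.0f. Under round-to-nearest-even, each separate add rounds back to 1.0f, but 1 + 2^-23 is representable.

```cpp
#include <cstdint>

// Hypothetical helpers modelling instruction semantics; not nouveau's API.
// u32 add wraps modulo 2^32, so folding two immediates is always exact.
constexpr std::uint32_t addU32(std::uint32_t x, std::uint32_t imm) {
  return static_cast<std::uint32_t>(x + imm);
}

constexpr std::uint32_t addAddU32(std::uint32_t x, std::uint32_t i0, std::uint32_t i1) {
  return addU32(addU32(x, i0), i1);  // add(add(x, i0), i1)
}

constexpr std::uint32_t mergedU32(std::uint32_t x, std::uint32_t i0, std::uint32_t i1) {
  return addU32(x, addU32(i0, i1));  // add(x, i0 + i1)
}

// f32 add rounds after every operation, so the same rewrite can change the result.
constexpr float addAddF32(float x, float i0, float i1) { return (x + i0) + i1; }
constexpr float mergedF32(float x, float i0, float i1) { return x + (i0 + i1); }

constexpr float kHalfUlpOfOne = 1.0f / 16777216.0f;  // 2^-24

// The shader case: 0x10 then 0x80 folds to 0x90, even across wraparound.
static_assert(addAddU32(0xfffffff0u, 0x10u, 0x80u) == mergedU32(0xfffffff0u, 0x10u, 0x80u),
              "u32 merge is exact");
// (1 + 2^-24) rounds to 1.0f twice, while 1 + 2^-23 is representable.
static_assert(addAddF32(1.0f, kHalfUlpOfOne, kHalfUlpOfOne) !=
                  mergedF32(1.0f, kHalfUlpOfOne, kHalfUlpOfOne),
              "f32 merge is not");
```

This is why restricting the first version to u32 (as proposed above) is the easy case.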
04:57 karolherbst: I also found stuff like this: shl(shr(add(shl(a, 4), 0x80), 4), 4) where the outer shr/shl can be nooped away, but for that we need value range stuff :/
05:01 karolherbst: imirkin: does the type of the immediate value matter at all for adds or is the compiler just taking the reg.data.$type value?
05:03 imirkin: so the shl + shr thing, for u32, should be possible to undo with an and most of the time
05:03 imirkin: i've seen it a bunch too
05:03 imirkin: never got around to fixing it
05:03 imirkin: karolherbst: values don't have type, instructions have type.
05:04 karolherbst: okay
05:04 imirkin: karolherbst: i.e. i can take a value and use it in a u32 add, or a f32 add. the value doesn't care. it's just bits.
05:05 karolherbst: imirkin: okay so shr(shl(x, a)) == add(a, (0-1 << a) >> a))
05:05 karolherbst: ohh and
05:05 karolherbst: I meant
05:05 karolherbst: ohh that can be more generic
05:06 karolherbst: okay, good
05:07 imirkin: shr(shl(x, a), a) == x & ~((1 << a) - 1)
05:08 imirkin: er
05:08 imirkin: other way around
05:08 imirkin: shl(shr(x, a), a) == x & ~((1 << a) - 1)
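The corrected identity can be checked exhaustively over all shift amounts for a given u32 input. The sketch below is a minimal one, assuming unsigned 32-bit semantics and shift amounts in 0..31 (shifting a 32-bit value by 32 or more is undefined in C++). It needs C++14 or later for the loop inside a constexpr function. The function names are illustrative only.

```cpp
#include <cstdint>

// shl(shr(x, a), a) for u32; a must be in 0..31.
constexpr std::uint32_t shlShr(std::uint32_t x, unsigned a) {
  return static_cast<std::uint32_t>((x >> a) << a);
}

// x & ~((1 << a) - 1): clear the low a bits.
constexpr std::uint32_t clearLowBits(std::uint32_t x, unsigned a) {
  return x & ~static_cast<std::uint32_t>((std::uint32_t{1} << a) - 1u);
}

// Checks the identity for every legal shift amount.
constexpr bool identityHoldsForAllShifts(std::uint32_t x) {
  for (unsigned a = 0; a < 32; ++a) {
    if (shlShr(x, a) != clearLowBits(x, a)) return false;
  }
  return true;
}

static_assert(shlShr(0xdeadbeefu, 4) == 0xdeadbee0u, "low nibble cleared");
static_assert(identityHoldsForAllShifts(0xdeadbeefu), "");
```

For u32, the shr zero-fills and the shl then zero-fills the bottom. The pair therefore only ever clears the low a bits, which a single AND with an immediate mask can do.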
05:15 karolherbst: imirkin: is there a difference between fma and mad in the nvidia hardware by the way?
05:15 karolherbst: or is it the same op in the end
05:20 imirkin: FMA expresses the desire that the op shouldn't be broken up
05:20 imirkin: and probably shouldn't be optimized the same as MAD
05:21 imirkin: but i don't think we really respect that atm
05:21 karolherbst: yeah I know the theoretical difference between those, but I was wondering if that makes a difference on the hardware, because I see nvidia using mainly fma
05:21 imirkin: there's also various shit in GL around "precise" which we totally don't support (nowhere in mesa)
05:21 karolherbst: where nouveau generates mads
05:22 imirkin: no, we also generate fma's
05:22 imirkin: there's no mad
05:22 karolherbst: ohh okay
05:22 imirkin: run nouveau's output through decoding
05:22 karolherbst: k
05:22 imirkin: nv50 ir != what the actual bytecode is
05:22 karolherbst: yeah I know, that's why I asked
05:22 imirkin: it gives you the bytes... use 'em :p
05:24 karolherbst: imirkin: there is no isIntType :/
05:25 imirkin: there's !isFloatType()
05:25 karolherbst: ohh
05:25 karolherbst: makes somewhat sense
05:26 karolherbst: and I guess there are no B.. adds?
05:34 karolherbst: imirkin: I can't do i->getSrc(s)->reg.data.u8 += src->getSrc(ss)->reg.data.u8; ?
05:35 karolherbst: src is the second add
05:37 imirkin: never ever ever ever ever modify a value
05:37 imirkin: hopefully that's clear enough :p
05:37 karolherbst: :D
05:39 karolherbst: so what do I do instead?
05:39 imirkin: make a new value
05:40 imirkin: the problem is that a particular value might be used in multiple places
05:40 imirkin: and so by modifying its actual storage, you're modifying it *everywhere*
05:41 imirkin: this has been a source of many bugs for me, as i've tried to be "clever"
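The following toy model shows why a value must never be modified in place. `Value` and `Insn` are hypothetical structs, not the nv50 IR classes, and `cloneShallow()` is not modelled.

Two instructions share one immediate. Editing its storage for one instruction silently changes the other. Creating a new value leaves the other user intact. The constexpr bodies require C++14 or later.

```cpp
#include <cstdint>

// Hypothetical stand-ins, not nv50_ir::Value / nv50_ir::Instruction.
struct Value { std::uint32_t bits; };
struct Insn { Value* src; };  // one source operand pointing at a shared value

// "Clever" in-place edit meant only for instruction a.
constexpr std::uint32_t otherUserAfterInPlaceEdit() {
  Value imm{0x10u};
  Insn a{&imm};
  Insn b{&imm};          // b uses the very same value
  a.src->bits += 0x80u;  // modifies the shared storage...
  return b.src->bits;    // ...so b now sees 0x90 as well
}

// The safe way: give a a new value and leave the shared one alone.
constexpr std::uint32_t otherUserAfterNewValue() {
  Value imm{0x10u};
  Insn a{&imm};
  Insn b{&imm};
  Value merged{static_cast<std::uint32_t>(a.src->bits + 0x80u)};
  a.src = &merged;       // only a's operand changes
  return b.src->bits;    // b still sees 0x10
}

static_assert(otherUserAfterInPlaceEdit() == 0x90u, "in-place edit leaked into b");
static_assert(otherUserAfterNewValue() == 0x10u, "new value keeps b intact");
```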
05:41 imirkin: there's a cloneShallow() method that will do some stuff
05:41 imirkin: should you find yourself wanting to clone some values
05:41 karolherbst: ohhh
05:48 karolherbst: imirkin: so i->setSrc(s, cloneShallow(i->getSrc(s)) ?
05:48 imirkin: why not just do what all the other logic does?
05:49 imirkin: but wtvr, yes, that's fine
05:49 karolherbst: ohh right
05:49 imirkin: but try not to invent your own ways of doing things... just read a bit of the surrounding code
05:49 imirkin: what you're attempting to do can't be *that* unique
05:57 RSpliet: hah interesting... where on nouveau I get a pretty good measure of the context switching time by monitoring PGRAPH.CTXCTL.BAR - on (an old version of) the official driver it stays 0 always
05:57 RSpliet: even when firing up multiple instances of glxgears, or even X
05:59 imirkin: i think they manage to share a single context for everything
06:03 RSpliet: in other news, reducing the "hot loop" in mmctx_xfer by one instruction (9->8) doesn't seem to reduce ctx switching time at all; seems to be very much memory bound :-)
06:03 RSpliet: (or the falcon pipeline is more advanced than I'd expect)
06:06 mwk: nah, of course it's memory bound
06:08 RSpliet: I was expecting it to be, but enjoyed the fiddle :-P
06:28 zeq: karolherbst: as long as the gpu isn't in D3cold state, reclocking is working fine for me with your patches. I go from:
06:28 zeq: 3991 frames in 5.0 seconds = 798.119 FPS
06:28 zeq: to:
06:28 zeq: 20152 frames in 5.0 seconds = 4030.305 FPS
06:28 zeq: with glxgears obviously
06:28 karolherbst: zeq: yeah, because of pcie stuff
06:28 karolherbst: that has nothing to do with reclocking
06:28 karolherbst: well changing pstates also changes the pcie link status
06:28 zeq: I mean from boot speed to writing 0f to pstate
06:28 karolherbst: so you go from 2.5 GT/s to 8.0 GT/s
06:29 karolherbst: yeah, but that's unrelated to the gpu clock
06:29 zeq: oh ok, glxgears isn't bound by gpu clock?
06:29 karolherbst: mhhh maybe with higher pcie speed, but I highly doubt that
06:30 karolherbst: I am only at 6000 fps
06:30 karolherbst: well 7300
06:30 karolherbst: and my gpu is fully reclocked and most likely much faster than yours
06:30 zeq: fwiw, I get:
06:30 zeq: 39858 frames in 5.0 seconds = 7971.506 FPS
06:30 zeq: with my IVB i7-3840QM
06:30 karolherbst: yeah, because pcie is no problem here
06:31 karolherbst: zeq: with PRIME you have to copy the buffer from the nvidia gpu to the intel gpu
06:31 karolherbst: and that only works over the pcie bus
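Changing pstate also moves the link from 2.5 GT/s to 8.0 GT/s, which matters under PRIME because every frame crosses the PCIe bus. Below is a rough per-lane estimate. It assumes the standard line encodings (8b/10b at 2.5 GT/s, 128b/130b at 8 GT/s) and ignores packet and protocol overhead. The lane count of zeq's link is not stated, so the sketch stays per lane.

```cpp
// Usable bytes per second on one PCIe lane, from the raw transfer rate and the
// line code (payload bits per code bits). Ignores TLP/DLLP overhead.
constexpr double laneBytesPerSecond(double transfersPerSecond, double payloadBits,
                                    double codeBits) {
  return transfersPerSecond * payloadBits / codeBits / 8.0;
}

constexpr double kGen1Lane = laneBytesPerSecond(2.5e9, 8, 10);     // 250 MB/s
constexpr double kGen3Lane = laneBytesPerSecond(8.0e9, 128, 130);  // ~985 MB/s

static_assert(kGen1Lane == 250e6, "8b/10b at 2.5 GT/s");
static_assert(kGen3Lane / kGen1Lane > 3.9, "roughly 4x more copy bandwidth per lane");
```

A nearly 4x increase in per-lane bandwidth is consistent with glxgears, which does almost no GPU work, scaling with the link rather than the core clock.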
06:31 zeq: sure
06:31 zeq: glx gears isn't exactly much *work* for the gpu
06:31 karolherbst: if you want to check if reclocking the core really helps, run something like unigine heaven
06:31 zeq: makes sense
06:31 karolherbst: you will still notice some improvements regarding pcie there, but this is more like 5%
06:32 zeq: I'll do that now
06:34 zeq: (once the torrent is in)
06:39 imirkin: zeq: glxgears is a test of everything except 3d perf
06:39 karolherbst: imirkin: okay, I'm doing something wrong: after I clone the source and set the source on the instruction, the new source has no instruction bound to it and therefore it also doesn't appear in the bb (and obviously I can't add it)
06:40 imirkin: karolherbst: ok. well an ImmediateValue doesn't need a def...
06:41 karolherbst: well the original source had an instruction
06:41 imirkin: then why did you clone it?
06:41 imirkin: what were you going to modify about it?
06:41 karolherbst: the value
06:42 imirkin: how can you modify it if it has an instruction that defines it?
06:42 karolherbst: I have add(add(a, i), i2) and now I want to do add(a, i+i2)
06:42 imirkin: i and i2 are immediates yes?
06:42 karolherbst: yes
06:42 imirkin: then create a new immediate value that has the desired value...
06:43 imirkin: the instruction sources are actually pointing to mov's, not to immediate values
06:43 imirkin: this stuff ain't magic :)
06:43 karolherbst: :D
06:44 karolherbst: k, then I do something like new ImmediateValue(program, new_value) and put it into the source I want to replace
06:47 imirkin: use bld.loadImm() -- no guarantee that you can actually put that imm there
06:47 imirkin: bld.loadImm will create a mov
06:47 imirkin: (which can always load an imm)
06:52 karolherbst: k but at least it is working now, though with new ImmediateValue it got folded into the add as an immediate and with bld.loadImm it segfaults :/
06:52 karolherbst: guess I forgot something
06:52 karolherbst: ahh yes
06:53 imirkin: make sure you use bld.setPosition(i, false) before using the builder
06:53 karolherbst: do I have to restore setPosition?
06:54 imirkin: nope
06:54 karolherbst: k
06:54 karolherbst: nice
06:54 karolherbst: works now
06:54 karolherbst: great
06:56 imirkin: please try to make a pass that does this more generically though
06:56 imirkin: basically that takes sequences of commutative operations and tries to merge all the immediates
06:57 imirkin: there's a opInfo[].commutative or something
06:59 RSpliet: isn't associativity you need to worry about?
06:59 RSpliet: +it also
07:01 imirkin: RSpliet: give me an example of a commutative but non-associative operator
07:10 karolherbst: there are some though
07:10 glennk: imirkin, halving add
07:10 karolherbst: rock, paper, scissors is commutative but non-associative
07:10 karolherbst: :D
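glennk's halving add is a concrete commutative but non-associative operator. The sketch computes floor((a + b) / 2) with the overflow-free form (a & b) + ((a ^ b) >> 1). Swapping operands never matters, but regrouping does. A pass that merges immediates along a chain therefore needs associativity as well as the commutative flag in opInfo that imirkin mentions (exact field name not confirmed in the log).

```cpp
#include <cstdint>

// Halving add: floor((a + b) / 2), computed without overflowing 32 bits.
constexpr std::uint32_t halvingAdd(std::uint32_t a, std::uint32_t b) {
  return (a & b) + ((a ^ b) >> 1);
}

// Commutative: the operands enter symmetrically...
static_assert(halvingAdd(3, 8) == halvingAdd(8, 3), "");
// ...but not associative: grouping changes where the truncation happens.
static_assert(halvingAdd(halvingAdd(0, 0), 4) == 2, "");
static_assert(halvingAdd(0, halvingAdd(0, 4)) == 1, "");
```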
07:15 imirkin: alright wtvr
07:18 karolherbst: anyway, it seems to depend on the target
07:18 karolherbst: there is target->getOpInfo(op/i)
07:20 karolherbst: imirkin: how can I generalize this code then? imm0.reg.data.s8 + immSrc.reg.data.s8 is there something like doHostOperation(op, src0, src1, ...)?
07:20 imirkin: karolherbst: yeah. it's called ConstantFolding
07:21 karolherbst: ohh, I see
07:21 imirkin: karolherbst: you should make a pass that rejiggers expressions and then lets ConstantFolding do its work
07:21 karolherbst: ohh okay, now I got what you really meant then
07:22 karolherbst: it could also be add(mul(a, i0), i1) which would be optimized to add(a, i0*i1)
07:23 karolherbst: ohh wait
07:23 karolherbst: doesn't make sense
07:23 imirkin: :)
07:23 imirkin: i do rejigger shl(add) into add(shl) because that's often done on an address calculation
07:24 imirkin: (and i do this in the ConstantFolding pass, although that's not really the best place for this sort of thing)
07:24 karolherbst: mhh, I will gladly add this to my todo list and take care of that when I find a shader where this gives us a benefit :D
07:24 imirkin: [and if it's on an address calculation, the address load can do the + itself most of the time]
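The shl(add) into add(shl) rewrite is sound for u32 because shifting left by s is multiplication by 2^s modulo 2^32, and that distributes over addition. After the rewrite, c << s folds into a new immediate, which an address load can often absorb. The sketch below models only the arithmetic, not nouveau's ConstantFolding code. It assumes s is in 0..31 and that int is 32 bits wide, so uint32_t arithmetic stays unsigned.

```cpp
#include <cstdint>

// shl(add(x, c), s): the original address calculation.
constexpr std::uint32_t shlOfAdd(std::uint32_t x, std::uint32_t c, unsigned s) {
  return static_cast<std::uint32_t>(static_cast<std::uint32_t>(x + c) << s);
}

// add(shl(x, s), c << s): the rewritten form; (c << s) becomes a new immediate.
constexpr std::uint32_t addOfShl(std::uint32_t x, std::uint32_t c, unsigned s) {
  return static_cast<std::uint32_t>(static_cast<std::uint32_t>(x << s) +
                                    static_cast<std::uint32_t>(c << s));
}

static_assert(shlOfAdd(0x12345678u, 0x10u, 4) == 0x23456880u, "");
static_assert(addOfShl(0x12345678u, 0x10u, 4) == 0x23456880u, "");
// Still equal when x + c wraps around.
static_assert(shlOfAdd(0xf0000000u, 0x10000000u, 4) == addOfShl(0xf0000000u, 0x10000000u, 4), "");
```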
07:25 karolherbst: now the shl/shr thingies
07:27 wvuu: hello
07:27 wvuu: is nouveau friendly with the EFI framebuffer? how do I configure the kernel to have pretty graphics from the start?
07:28 imirkin: wvuu: in theory nouveau should take over from efifb
07:28 imirkin: karolherbst: please try to organize your optimizations in ways that optimize as much as possible, not just one little sub-case
07:30 wvuu: so just select efifb in the kernel?
07:35 imirkin: wvuu: i think so yea
07:35 wvuu: does it make sense though?
07:37 imirkin: if you want things to be displayed before nouveau is loaded, yes
07:37 imirkin: otherwise no :)
07:37 karolherbst: imirkin: yeah, I will first collect all and then tell you about those with the biggest impact
07:37 RSpliet: karolherbst: combinations of add(mul()) was indeed what I was getting at
07:37 wvuu: alright
07:38 karolherbst: imirkin: https://github.com/karolherbst/mesa/commit/757d91849c175955106de57a3d80f1fe8a83cc97 seems pretty good for those games
07:38 karolherbst: I don't think I saw a change in anything else though
07:40 karolherbst: anyway, after I get tired looking at shaders the entire time, I will clean that mess up
08:25 karolherbst: imirkin: shr(shl(a, c)) is and(a, 0xffffffff >> c) right? somehow the result of that optimization is kind of odd
08:25 imirkin: karolherbst: opposite
08:25 imirkin: shl(shr(a, c))
08:26 imirkin: oh wait
08:26 karolherbst: I did shl(shr( already
08:26 karolherbst: :D
08:26 imirkin: that won't work
08:26 imirkin: but you can use extbf
08:26 imirkin: the problem is if the shr is s32 and not u32
08:26 imirkin: it'll sign-extend
08:26 karolherbst: ohhh
08:26 karolherbst: imirkin: does this work? https://github.com/karolherbst/mesa/commit/986ff3b5a9e226de67539aa4674c92e4b7b5fc86
08:26 imirkin: but extbf also has a 32-bit version
08:26 imirkin: er
08:26 imirkin: has a signed version
08:27 karolherbst: because then I just leave those shl(shr(a,c)) for now
08:30 imirkin: that patch seems fine
08:30 imirkin: it's in the OP_SHL block right?
08:30 karolherbst: yes
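The signedness problem imirkin describes can be shown in a few lines.

- **u32 shr:** shr(shl(x, c), c) zero-fills, so it equals and(x, 0xffffffff >> c), the AND form karolherbst asked about.
- **s32 shr:** the shr sign-fills, so the same pair sign-extends bit (31 - c). That is a signed bitfield extract of the low 32 - c bits, which is why extbf's signed variant comes up.

The sketch models only the values to compute, not extbf's encoding. It assumes c is in 0..31, two's complement conversion, and arithmetic >> on negative ints (guaranteed since C++20, universal in practice before that).

```cpp
#include <cstdint>

// u32: shr zero-fills, so the pair just masks off the top c bits.
constexpr std::uint32_t shrShlU32(std::uint32_t x, unsigned c) {
  return static_cast<std::uint32_t>(x << c) >> c;
}
constexpr std::uint32_t andForm(std::uint32_t x, unsigned c) {
  return x & (0xffffffffu >> c);
}

// s32: shr sign-fills, so the pair sign-extends bit (31 - c).
constexpr std::int32_t shrShlS32(std::uint32_t x, unsigned c) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(x << c)) >> c;
}

// Signed extract of the low `width` bits (width in 1..32), via (v ^ sign) - sign.
constexpr std::int32_t signedExtractLow(std::uint32_t x, unsigned width) {
  const std::uint32_t mask = width == 32 ? 0xffffffffu : (std::uint32_t{1} << width) - 1u;
  const std::uint32_t sign = std::uint32_t{1} << (width - 1);
  return static_cast<std::int32_t>(((x & mask) ^ sign) - sign);
}

// Same bits, different answers: 0xf0 in the low byte is 240 unsigned, -16 signed.
static_assert(shrShlU32(0xf0u, 24) == andForm(0xf0u, 24), "u32: AND form is exact");
static_assert(shrShlS32(0xf0u, 24) == signedExtractLow(0xf0u, 8), "s32: signed extract");
static_assert(shrShlS32(0xf0u, 24) == -16, "");
```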
08:31 karolherbst: I tried to write a patch for shr(shl, but the result: -0.35% gpr, +0.24% instructions, but affected shaders either have hurt instruction count or helped gpr count ...
08:31 karolherbst: really odd
08:31 karolherbst: doesn't work at all
08:34 imirkin: heh
08:38 karolherbst: imirkin: https://gist.github.com/karolherbst/ec6f9bbef341c9d7ad30 :O
08:38 karolherbst: especially the end
08:41 imirkin: what seems to be the issue?
08:42 karolherbst: I just think that optimizations across BBs aren't that easy to do, or doesn't it matter generally?
08:42 karolherbst: and it is ugly to work with :D
08:42 imirkin: hmmmmmmm
08:43 imirkin: this is actually weird
08:43 imirkin: coz all those BB's are doing useless work
08:43 karolherbst: yes
08:43 imirkin: unfortunately you can't tell what's going on because you posted the post-RA thing
08:43 karolherbst: ohh okay, do you want the pre-ra thing?
08:43 imirkin: that's the only way to tell what's going on
08:44 karolherbst: there https://gist.github.com/karolherbst/ec6f9bbef341c9d7ad30
08:45 karolherbst: this tgsi....
08:45 karolherbst: ELSE :0 ENDIF...
08:46 karolherbst: one might think that this would be optimized away in the tgsi already
08:47 karolherbst: there are some "UIF TEMP[2].xxxx :0 ELSE :0 ENDIF" things as well
08:47 karolherbst: can't this be dropped altogether?
08:47 imirkin: ah sadness
08:48 imirkin: the result of those stupid things is used by branch decisions that are eliminated
08:48 imirkin: you could make a post-RA DCE that cleans this junk up
08:48 imirkin: take a look at my post_ra_dead function -- probably needs some improvement
08:48 karolherbst: imirkin: but shouldn't that be done in the TGSI already ?
08:48 karolherbst: at least some of this
08:48 imirkin: nothing should be done in TGSI
08:49 karolherbst: not even removing dead branches?
08:49 imirkin: nothing.
08:49 karolherbst: k
08:49 imirkin: you don't want 10 different optimizers all doing the same thing
08:49 imirkin: that just slows things down
08:50 imirkin: i have patches to remove a lot of the opt done before nouveau
08:50 karolherbst: k
08:50 imirkin: unfortunately that triggered some sad cases in nouveau and didn't speed anything up
08:50 imirkin: however i think i've mitigated some of that
08:50 imirkin: but haven't repeated the experiment
08:50 imirkin: the patches in question are on my master branch in github
08:50 karolherbst: that glsl looks funny though, no ifs or functions, just a bunch of ?: checks
08:52 karolherbst: imirkin: what would be the benefit of cleaning all those branches up? I thought it doesn't matter in the emitted code
08:52 imirkin: the branches HAVE been cleaned up
08:53 imirkin: the problem is that things were computed for the sole benefit of those branches
08:53 imirkin: and THOSE things weren't eliminated
08:53 imirkin: 622: add ftz f32 $r0 $r0 0.000001 (8)
08:53 imirkin: see how that's just overwritten by
08:53 imirkin: 623: mov u32 $r0 0x3f800008 (8)
08:56 karolherbst: ohh okay
08:57 karolherbst: that means there needs to be a post-ra pass that will remove all those useless BBs and then a post-ra dead-code-elimination pass
08:57 imirkin: the BB's are fine
08:58 imirkin: don't worry about the BB's
08:58 imirkin: touching the BB's will lead you into a world of pain that you want nothing to do with
08:58 imirkin: i'm talking about the instructions *in the bb's* which were removed as part of the post-RA code layout process
08:59 imirkin: and now there are a bunch of other instructions which are no longer used
08:59 imirkin: as a result of that
08:59 karolherbst: ohh okay
08:59 imirkin: so... a post-ra DCE pass
08:59 karolherbst: okay
08:59 karolherbst: like the NV50PostRaConstantFolding does it somewhat, just for the entire program
08:59 imirkin: all the " -> BB:123" stuff aren't actual branches
09:00 imirkin: that's just for amusement value
09:00 imirkin: it effectively prints out the CFG
09:00 karolherbst: k
09:00 imirkin: if it says "bra BB:123" then it's a branch
09:00 imirkin: and has a number associated with it
09:01 karolherbst: imirkin: so the post-ra DCE pass would remove instructions 620-622 (and a lot of others here)
09:03 imirkin: yes.
09:04 karolherbst: imirkin: and if a register is part of the result of the shader, it will have a def I assume?
09:07 imirkin: right.... so that's a little annoying =/
09:07 imirkin: there will be a mov with a subOp == 1
09:07 imirkin: never delete those
09:07 imirkin: but make sure to run your little DCE before the mov $ra $ra get removed as no-ops
09:07 imirkin: since that will often remove those subOp == 1 mov's
09:13 karolherbst: where is Instruction::isDead implemented? :D
09:14 karolherbst: ohh found it
09:19 imirkin: karolherbst: look at my post_ra_dead() thing
09:19 karolherbst: ohh
09:19 imirkin: i didn't write it just for the sheer joy of it
09:20 imirkin: although even that could be wrong... dunno
09:20 imirkin: i think we have idiotic situations where defExists(0) == false, but defExists(1) == true
09:21 imirkin: e.g. if you're doing an add and only want the carry
09:21 karolherbst: 60 instructions down in that shader..
09:21 imirkin: so you'd have to deal with that somehow
09:22 karolherbst: okay, yeah, there is something odd
09:22 karolherbst: got some nvc0_program_translate:572 - shader translation failed: -5
09:24 karolherbst: imirkin: some joinats and joins got eliminated, or does that happen when the branch gets emptied?
09:24 karolherbst: I mean the BB
09:25 imirkin: joins go around the branch stuff
09:25 imirkin: so yeah, they get removed as well
09:26 heiko: Hello
09:28 heiko: can I help the nouveau project? I have a notebook with a GT 540M and my desktop system has a GTX 670
09:28 heiko: The GT 540M on my notebook has nothing to do
09:29 karolherbst: imirkin: exit got removed
09:29 karolherbst: heiko: fermi reclocking
09:30 heiko: karolherbst:how can I help ?
09:31 karolherbst: no good idea except figuring out how it works :/ some work has been done in this direction but nothing really works yet
09:32 imirkin: karolherbst: don't touch insn->fixed stuff
09:32 imirkin: heiko: what kind of help did you have in mind?
09:33 karolherbst: imirkin: I guess I can add those checks into the post_ra_dead function, because that shouldn't be removed in either case
09:34 heiko: karolherbst: is there any program that I can run on the fermi chip to provide information about the chip? and yes, reclocking on fermi is interesting
09:34 heiko: karolherbst: my notebook has 2 graphics cards, one in the intel CPU (I think the CPU is sandy bridge) and the second is the GT 540M
09:35 imirkin: karolherbst: mmmm yeah. also btw make sure you do the DCE backwards. (not sure if the main pass does that. if it doesn't, it should too)
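The following is a minimal sketch of a backward post-RA DCE over one basic block. It uses a hypothetical `Insn` model, not `nv50_ir::Instruction` or the existing post_ra_dead(). Registers are bits in a 64-bit mask.

The `keep` flag stands in for everything the log says must never be deleted: fixed instructions, exit, and the output mov with subOp == 1.

The sketch makes four further choices:

- **Dead test:** an instruction counts as dead only if *none* of its defs is live. This covers the add where only the carry def is used.
- **Direction:** walking backwards means every later use has already been recorded when a def is visited, so one pass suffices.
- **Predication:** it assumes unpredicated defs. A predicated def should not clear liveness.
- **Requirements:** C++14 or later.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model, not nv50_ir::Instruction: register $rN is bit N of a mask.
struct Insn {
  std::uint64_t defs;  // registers written
  std::uint64_t uses;  // registers read
  bool keep;           // fixed, exit, store, output mov (subOp == 1): never delete
};

constexpr std::uint64_t reg(unsigned r) { return std::uint64_t{1} << r; }

// Backward DCE over one basic block (at most 64 instructions). liveOut holds the
// registers read after the block. Returns a bitmask of the surviving instructions.
template <std::size_t N>
constexpr std::uint64_t postRaDce(const Insn (&bb)[N], std::uint64_t liveOut) {
  static_assert(N <= 64, "one bit per instruction");
  std::uint64_t live = liveOut;
  std::uint64_t kept = 0;
  for (std::size_t k = N; k-- > 0;) {
    const Insn& in = bb[k];
    // Dead only if it has no side effects and none of its defs is live.
    if (!in.keep && (in.defs & live) == 0) continue;
    kept |= std::uint64_t{1} << k;
    live &= ~in.defs;  // assumes an unpredicated def fully overwrites the register
    live |= in.uses;
  }
  return kept;
}

// Instructions 622/623 from the log, followed by the output mov of $r0.
constexpr Insn kExample[] = {
    {reg(0), reg(0), false},  // 622: add ftz f32 $r0 $r0 0.000001
    {reg(0), 0, false},       // 623: mov u32 $r0 0x3f800008
    {0, reg(0), true},        // mov $r0 $r0, subOp == 1 (shader output)
};
static_assert(postRaDce(kExample, 0) == 0x6, "622 is dead; 623 and the output mov stay");
```

Across blocks, liveOut would have to come from the successors' live-in sets. This sketch does not attempt that.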
09:35 heiko: imirkin: is there any program that I can run on the fermi chip to provide information about the chip? and yes, reclocking on fermi is interesting
09:36 karolherbst: imirkin: cause it is faster I assume
09:36 heiko: imirkin: my notebook has 2 graphics cards, one in the intel CPU (I think the CPU is sandy bridge) and the second is the GT 540M
09:36 karolherbst: heiko: doesn't really matter, because this is a DDR3 fermi
09:37 karolherbst: heiko: reclocking isn't that easy to do though, otherwise somebody would have already finished it
09:37 imirkin: heiko: you could make a mmiotrace, but that's only useful if someone is going to analyze it. the biggest lack on the nouveau project is manpower, not hw access.
09:39 karolherbst: imirkin: total instructions in shared programs : 1211680 -> 1149149 (-5.16%) after filtering out EXIT and fixed ones, so in the end the benefit might be much smaller
09:39 imirkin: karolherbst: 5% is still pretty good ;)
09:39 heiko: karolherbst:don't know
09:40 karolherbst: imirkin: yeah and saints row is all black now :D
09:40 heiko: imirkin:ok.
09:42 imirkin: karolherbst: i bet you could have removed a TON more instructions and gotten that result too :)
09:42 karolherbst: :D
09:42 karolherbst: okay, now I have to check what went wrong
09:42 imirkin: karolherbst: i suspect that post_ra_dead thing needs some of the checks in isDead() as well
09:42 karolherbst: imirkin: I just copy all checks over I guess
09:43 imirkin: karolherbst: basically copy the isDead function but remove the "getDef(d)->reg.data.id >= 0" bit
09:43 karolherbst: okay
09:43 imirkin: that was the key badness
09:43 imirkin: or add a postRA=false by default param
09:46 Jayhost: Thanks pmoreau
09:46 karolherbst: imirkin: yay, it isn't all black anymore
09:46 Jayhost: MMIO hangs at nvidia enabling 2d accell on gm107
09:47 karolherbst: imirkin: only -0.75% though now
09:47 RSpliet: and partially black?
09:47 imirkin: karolherbst: heh, i guess some of those instructions were useful afterall
09:47 pmoreau: Jayhost: Do you have some logs from when it hangs?
09:48 karolherbst: imirkin: not partially black, but the game turned a bit more violet now
09:48 Jayhost: pmoreau the xorg log and 28 lines of MMIO?
09:49 imirkin: karolherbst: because you forgot to do the thing i said with running it before the mov a,a elimination
09:49 imirkin: karolherbst: and because you're not checking for mov + subop 1
09:49 pmoreau: Jayhost: More like the dmesg output. But, you have to know that under mmiotracing, it can take ~30sec to reach the NVIDIA logo when starting X.
09:50 karolherbst: imirkin: yeah I know
09:50 imirkin: karolherbst: what you can do is make the mov a,a pass not touch the subop == 1 mov's
09:50 imirkin: and then run your DCE
09:50 imirkin: and then remove any subop == 1 mov's that are a,a
09:52 karolherbst: imirkin: in my pass there are movs with subop == 1
10:01 karolherbst: imirkin: yay, it is working:) only -0.5% instructions though
10:02 imirkin: cool
10:04 karolherbst: imirkin: it helps none of the shaders in the default shader-db :D
10:11 karolherbst: imirkin: so this could be also optimized? https://gist.github.com/karolherbst/ab4f1fe997537836b12f
10:11 imirkin: karolherbst: please read carefully.
10:13 Jayhost_: pmoreau dmesg http://pastebin.com/JCucKhUn
10:13 karolherbst: imirkin: ohh that is the head of a loop?
10:13 imirkin: karolherbst: @P0 mov $r5 value; use $r5
10:14 imirkin: karolherbst: what kind of optimization did you have in mind?
10:14 karolherbst: ahh, I wanted to move the immediate into the rcp instruction
10:14 karolherbst: ohh no
10:14 imirkin: it's predicated
10:14 karolherbst: just move the result in instead of doing rcp
10:15 imirkin: you'd have to do that to all the other sources of the phi node
10:15 imirkin: nouveau has no such optimizations
10:15 karolherbst: ohh okay
10:17 pmoreau: Jayhost_: Did you run mmio from the dmesg you linked?
10:19 karolherbst: imirkin: is mul_x2^-1 (a*b*2)^-1 or something else?
10:20 imirkin: no
10:20 Jayhost_: pmoreau: rmmod nvidia; dmesg > tempdmeg; echo mmiotrace > ; cat /sys; modprobe nvidia; startx; black screen for over 20 minutes
10:20 imirkin: it's a * b * 2^-1
10:20 karolherbst: ohh so (a*b)/2
10:20 imirkin: sure
10:21 imirkin: the range on that is 3..-3 iirc
10:21 karolherbst: k
10:21 imirkin: so up to *8, and down to /8
10:21 pmoreau: Jayhost_: Ah! You generated the dmesg before running mmiotrace, I understand better now. :-)
10:21 karolherbst: imirkin: ahh, okay, got it
10:23 imirkin: karolherbst: so a stupefyingly-clever optimizer would do a*b*c*64 as t = a*b*8; x = t*c*8 :)
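The following sketch covers the bookkeeping behind spreading a power-of-two factor across a multiply chain. It assumes imirkin's hedged recollection ("iirc") that the mul post-scale modifier covers 2^-3 .. 2^3. That range is an assumption here, not a confirmed hardware limit. The function names are illustrative only.

The rewrite is value-preserving because scaling by a power of two is exact in the normal range, so round(x * 2^k) == round(x) * 2^k. It can only go wrong through overflow, underflow, or ftz flushing of denormals. The same reasoning covers the mul + mad with 0.5 case. The mad computes t * 0.5 + r0 from an exactly halved t, so mul x2^-1 followed by add rounds to the same result.

```cpp
// Assumption (from the log's "iirc"): the post-scale modifier covers 2^-3 .. 2^3.
constexpr int kMaxPostScaleLog2 = 3;

// How many multiplies must carry a factor of 2^log2Factor if each multiply
// can absorb at most 2^+-3 of it.
constexpr int mulsNeededForScale(int log2Factor) {
  const int magnitude = log2Factor < 0 ? -log2Factor : log2Factor;
  return (magnitude + kMaxPostScaleLog2 - 1) / kMaxPostScaleLog2;
}

constexpr bool chainCanAbsorb(int multipliesInChain, int log2Factor) {
  return mulsNeededForScale(log2Factor) <= multipliesInChain;
}

// a*b*c*64: two multiplies and 64 = 2^3 * 2^3, so t = a*b*8; x = t*c*8.
static_assert(chainCanAbsorb(2, 6), "");
// mul followed by mad with 0.5: a single 2^-1 fits on the mul.
static_assert(mulsNeededForScale(-1) == 1, "");
```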
10:23 karolherbst: :D
10:23 karolherbst: yeah, got it
10:23 pmoreau: Jayhost_: Could you `cat /dev/kmsg > some_file&` before starting the mmiotrace and link the resulting file?
10:23 karolherbst: I think I saw stuff like that already, but no idea where
10:25 karolherbst: imirkin: stuff like this? 121: mul ftz f32 $r12 $r11 c3[0xb0]; 124: mad ftz f32 $r12 $r12 0.500000 $r0 => mul x2^-1 ftz f32 $r12 $r11 c3[0xb0]; add ftz f32 $r12 $r12 $r0
10:25 imirkin: right
10:26 imirkin: that could execute faster
10:26 imirkin: but ... meh.
10:27 karolherbst: so you think it isn't worth it
10:28 imirkin: how often does that come up
10:28 imirkin: and how hard would it be to deal with this generically
10:28 karolherbst: well I fount that three times in a row in a saint rows shader :D
10:28 karolherbst: *found
10:29 karolherbst: imirkin: I think mul/add can be dual issued by the way
10:29 karolherbst: orr, wait
10:30 karolherbst: yeah, I think yes
10:30 imirkin: check the func in SchedDataCalculator?
10:30 imirkin: although that's for kepler
10:31 karolherbst: I think two mads can also be dual issued
10:31 karolherbst: same opclass and F32
10:31 karolherbst: well same opclass AND arithmetic and (F32 or ADD)
10:31 karolherbst: so it shouldn't matter in that case
10:35 Jayhost_: pmoreau http://pastebin.com/NMMTeLJs
10:36 pmoreau: Jayhost_: Thanks!
10:37 pmoreau: Unfortunately, you're hitting the same bug as us…
10:37 karolherbst: min ftz f32 $r0 $r0 1.000000; set ftz u8 $p0 eq f32 $r0 1.000000
10:37 karolherbst: that looks like something
10:37 imirkin: karolherbst: how so?
10:37 Jayhost_: pmoreau slightly reassuring. What's going on?
10:38 karolherbst: imirkin: because set could be changed to ge 1.000 and the min removed
10:39 pmoreau: Jayhost_: No idea… There was some discussion about the meaning of the "secondary hit for address", but I have forgotten what resulted.
10:41 imirkin: karolherbst: hmmmm
10:41 karolherbst: set(min(a, c) == c) => set(a >= c) looks about right
10:41 imirkin: karolherbst: does $r0 also not get used afterwards btw?
10:41 karolherbst: imirkin: set ftz u8 $p0 ge f32 $r63 $r0 ?
10:41 karolherbst: :D
10:41 karolherbst: ohh wait
10:42 karolherbst: that makes actually sense :D
10:42 karolherbst: but still, maybe some shader benefit from that
10:42 Jayhost_: pmoreau what can I do to help? Does this mean MMIO doesn't step properly?
10:43 imirkin: karolherbst: of course the second one could also use the original $r0 as well
10:43 imirkin: karolherbst: you could think about all the identities you can come up with re min/max and comparisons
10:43 karolherbst: imirkin: right, because it simply doesn't matter...
10:44 pmoreau: Jayhost_: To help solve that MMIO bug? Apart from trying to find where it comes from (new NVIDIA driver version? new kernel version?), I don't know.
10:44 karolherbst: not that mmio bug again :/
10:44 pmoreau: Yeah… :-(
10:45 karolherbst: I already spent hours on that and ... well I still have it
10:45 pmoreau: Jayhost_: If you are mainly interested in getting a MMIOtrace, you could try an iso from last Nov. or Dec., and it should probably work.
10:46 pmoreau: karolherbst: What have you tried so far? Older kernel/blob versions? Or try to hack around the mmiotrace code?
10:46 karolherbst: pmoreau: tried to find a setup where it worked and failed...
10:46 pmoreau: :-(
10:46 karolherbst: the next time it works I will backup my entire system or something :D
10:47 pmoreau: Could we use ezbench to bisect that? :-D
10:50 karolherbst: imirkin: so we can optimize every case where the immediate fulfills the comparison, like set.cmp(set.imm0, src.imm0) true, that means we can be smarter
10:51 Jayhost_: pmoreau: mainly interested in helping gm107 reclock however is best
10:52 pmoreau: Jayhost_: Helping by providing information, or even by playing with the code?
11:01 Jayhost_: Yes both
11:05 pmoreau: Nice! RSpliet and karolherbst (and mupuf) should be able to guide you on the reclocking path
11:06 pmoreau: As for the mmiotrace, maybe you could get one by trying a 2015 ISO.
11:06 Jayhost_: Debian Jessie?
11:09 pmoreau: Maybe (I have no idea when new Debian versions are released)
11:14 Jayhost_: Would it be better to downgrade packages or build mmiotrace from source?
11:14 Jayhost_: rather than full install
11:15 Jayhost_: I could boot into old kernel and try again
11:15 pmoreau: mmiotrace is a kernel module
11:16 pmoreau: You could try to boot an old kernel, but then you might run into trouble due to the nvidia module being built against a different version.
11:31 karolherbst: imirkin: what happens if there is something like min u8 $r0 $r5 -1?
11:33 imirkin: there's no u8 min variant
11:33 karolherbst: ohh
11:33 imirkin: i mean you can stick whatever types you want whereever you want
11:33 imirkin: but that doesn't necessarily mean it has to make sense
11:35 karolherbst: yeah, but if I do this set(min/max) pass, I have to compare those immediates somehow
12:11 imirkin: with enough hax: OpenGL core profile version string: 4.3 (Core Profile) Mesa 11.2.0-devel
12:11 imirkin: heh
12:11 pmoreau: \o/
12:12 vedranm: imirkin: excellent, which card is that?
12:12 imirkin: vedranm: i mean... you can get it to say that on any card you want
12:12 vedranm: imirkin: of course
12:13 imirkin: this is on fermi, but like half the features are actually broken
12:13 vedranm: aha, I see :D
12:13 vedranm: looks good regardless
12:13 imirkin: i'm a little afraid to try GRID Autosport... the only "advanced" game i have, with this
12:13 hakzsam: imirkin, MESA_GL_VERSION_OVERRIDE=4.3? :p
12:13 imirkin: hakzsam: no, check my gl43-integration branch
12:13 imirkin: the last commit is esp good
12:14 hakzsam: I was joking :)
12:14 imirkin: i never got around to implementing ARB_shader_image_size in the backend, image support is only about 50% working, and i'm just totally lying about AoA
12:15 imirkin: and there's a 50% chance i actually broke texture views while trying to fix images
12:17 karolherbst: imirkin: I think I will give up on this set, min/max thing because this is a monster for itself :/
12:18 imirkin: oh, and there's absolutely no robustness ext support at all
12:18 imirkin: karolherbst: yeah it's a pain
12:18 karolherbst: and I am getting kind of tired :D
12:20 karolherbst: imirkin: what I am always thinking about, is there some kind of SIMD mul instructions or something like that?
12:21 imirkin: karolherbst: yea
12:21 imirkin: there's all the v* ops
12:21 imirkin: check the ptx isa for details
12:21 imirkin: they're called "video opcodes"
12:21 karolherbst: :D
12:21 imirkin: i doubt you'll find them helpful though
12:22 karolherbst: "Scalar Video Instructions" ?
12:22 imirkin: they have non-scalar ones too
12:22 imirkin: and things like NV_gpu_shader5 have types like u8vec4
12:22 karolherbst: ahh found it
12:23 imirkin: but even with that, i never ever ever saw the blob use those ops
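To make "packed lanes" concrete (the u8vec4-style data that the video ops work on), here is a Python sketch of a lane-wise wrapping add of four u8 values packed into one 32-bit word, done with plain integer ops (a SWAR trick). This is only an illustration of the lane semantics; it is not how the hardware v* ops are implemented, and the function names are hypothetical.

```python
def add_u8x4(a: int, b: int) -> int:
    """Lane-wise wrapping add of four u8 lanes packed into a 32-bit word."""
    lo7 = 0x7F7F7F7F  # low 7 bits of every lane
    hi = 0x80808080   # top bit of every lane
    # Add the low 7 bits of each lane; each lane's sum is at most 0xFE,
    # so no carry can leak into the neighbouring lane.
    partial = (a & lo7) + (b & lo7)
    # Top bit of each lane = a7 XOR b7 XOR carry-in (the carry is already in partial).
    return (partial ^ ((a ^ b) & hi)) & 0xFFFFFFFF

def add_u8x4_reference(a: int, b: int) -> int:
    """Same result computed one lane at a time, for comparison."""
    out = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        out |= ((x + y) & 0xFF) << shift
    return out

print(hex(add_u8x4(0x01FF7F80, 0x01010180)))  # 0x2008000
```

Lane by lane (lowest first) that example computes 0x80+0x80=0x00, 0x7F+0x01=0x80, 0xFF+0x01=0x00 and 0x01+0x01=0x02, with no carries crossing lanes.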
12:23 jeremySal: is the firmware in the gk20a directory the same as the firmware in the gm206 directory?
12:24 karolherbst: imirkin: strange
12:24 imirkin: jeremySal: nope... what firmware in the gm206 directory btw?
12:24 jeremySal: imirkin: there is none, but my driver is looking for it there... I'm wondering if it's a bad idea to copy it over manually
12:25 imirkin: jeremySal: even if you had firmware, it wouldn't work - we don't currently have a loader for it
12:25 imirkin: jeremySal: GM20x doesn't, at present, support any acceleration
12:25 jeremySal: oh, so the kernel error about loading the firmware is to be expected?
12:26 imirkin: yep
12:26 jeremySal: imirkin: thanks!
12:27 jeremySal: another question, if I wanted to help nouveau and also learn something about driver development, is there a good place to start?
12:28 imirkin: step 1: get a gpu that nouveau runs ok on :)
12:28 imirkin: which is basically every nvidia gpu except yours
12:28 jeremySal: imirkin: haha, I have a GTX960. I think it's running fine
12:28 jeremySal: or at least, it's giving me video
12:28 imirkin: there's no gpu accel at all
12:28 imirkin: it's all cpu-rendered
12:29 jeremySal: oh damn
12:29 jeremySal: Is there something useful I could work on with my GPU?
12:29 imirkin: all the GTX 9xx's are probably GM20x's, so they're out, at least for now
12:30 imirkin: well, what sort of things are you interested in?
12:30 vedranm: hakzsam: ahahaha, muh future OpenGL core profile version string: 4.6 (Core Profile) Mesa 11.2.0-devel (git-e707b9d)
12:30 hakzsam: vedranm, :)
12:30 imirkin: you mentioned "driver development", but that's a pretty wide swath of stuff
12:30 jeremySal: yeah, I mean
12:31 imirkin: jeremySal: a bunch of GL features were added on GM20x, so you could trace them with the blob and get us closer to ready for when we get accel going on there
12:31 jeremySal: I'm relatively experienced with C
12:31 jeremySal: I've tried learning about how the graphics system works on linux before
12:31 jeremySal: But I couldn't find much between the high level overview
12:32 jeremySal: and just staring at code
12:32 jeremySal: Okay, you mean trace the nvidia driver?
12:32 imirkin: ya
12:32 jeremySal: Is there a guide?
12:33 imirkin: http://nouveau.freedesktop.org/wiki/Valgrind-mmt/
12:33 imirkin: the idea is that you'd write programs that traced the new functionality and analyzed the traces and added various docs
12:34 imirkin: bbiab
12:35 jeremySal: yeah
12:35 jeremySal: I think I could do that
12:38 jeremySal: First I need to get the nvidia driver working for me...
12:42 karolherbst: imirkin: ... I just ran a shader through nvidia and guess what, the mmt doesn't contain any binary, or I just can't find it
12:45 imirkin: jeremySal: yeah, definitely step 1 of that task :)
12:45 imirkin: karolherbst: dri3 messes it up
12:46 karolherbst: imirkin: means I have to start the other x server with dri3 disabled?
12:46 imirkin: karolherbst: how big is the resulting mmt?
12:46 imirkin: karolherbst: LIBGL_DRI3_DISABLE
12:46 karolherbst: imirkin: aroun 300K
12:47 imirkin: hmmm, that's reasonable
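In a shell, imirkin's suggestion amounts to prefixing the traced command with `LIBGL_DRI3_DISABLE=1`. For scripting repeated trace runs, a minimal Python sketch follows; the wrapper name is hypothetical, and the probe command is only a stand-in, since the real valgrind-mmt invocation should be taken from the Valgrind-mmt wiki page linked above.

```python
import os
import subprocess
import sys

def run_with_dri3_disabled(cmd):
    """Run `cmd` (a list of argv strings) with LIBGL_DRI3_DISABLE=1 set.

    The rest of the environment is inherited unchanged; stdout/stderr are
    captured as text and returned on the CompletedProcess object.
    """
    env = dict(os.environ)
    env["LIBGL_DRI3_DISABLE"] = "1"
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)

if __name__ == "__main__":
    # Stand-in child process that just echoes the variable back; replace it
    # with the actual valgrind-mmt command line from the wiki page.
    probe = [sys.executable, "-c", "import os; print(os.environ['LIBGL_DRI3_DISABLE'])"]
    print(run_with_dri3_disabled(probe).stdout.strip())  # 1
```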
12:47 karolherbst: was CODE the keyword for the binaries or something else?
12:47 imirkin: START_ID
12:48 karolherbst: k, there is no START_ID
12:48 imirkin: probably can't find the IB...
12:48 imirkin: can you search for "IB address"?
12:49 karolherbst: yeah, have some
12:49 imirkin: and there was a draw?
12:49 imirkin: search for BEGIN
12:49 karolherbst: GK104_3D.VERTEX_BEGIN_GL?
12:49 imirkin: yes
12:50 karolherbst: after w 9:0x001c, 0x00a9a001 IB: address: 0x100262980, size: 10856, buffer id: 8
12:50 karolherbst: makes somewhat sense
12:50 imirkin: and no START_ID??
12:50 karolherbst: no START_ID
12:50 karolherbst: should I downgrade the driver?
12:50 imirkin: inconceivable!
12:51 karolherbst: ohh wait, my mmt I sent you was with the same driver
12:51 imirkin: you sent me an mmt?
12:51 karolherbst: yeah, I think yesterday
12:52 imirkin: ah right
12:52 imirkin: yeah, and that one was fine
12:55 karolherbst: maybe just bad luck with that shader
12:56 karolherbst: ohh wait, there is some code
12:56 karolherbst: what is GK104_3D.GRAPH.MACRO_CODE_DATA?
12:56 imirkin: it's.... macro code data? :)
12:57 karolherbst: :D
12:57 imirkin: it's all the macro's
12:57 karolherbst: there is a lot of it there actually
12:58 imirkin: ya, they're uploaded on ctx init
12:58 karolherbst: ohhh okay
12:58 imirkin: i'm guessing that your draw didn't actually use the program in question
12:58 imirkin: although there should still be *some* program... weird
12:58 imirkin: search for 00001de7
12:59 karolherbst: not there
12:59 imirkin: yeah there's no code in that trace
12:59 imirkin: that's an exit instruction
12:59 karolherbst: mhh, but why isn't there any code...
13:00 imirkin: and it never sets the START_ID on vert/frag shaders, i got nothin'
13:03 karolherbst: imirkin: did you ever try to mmt those generated shader-db shaders?
13:04 imirkin: the shader-db shader_test files don't draw fyi
13:04 imirkin: so you won't see them in a mmt trace
13:04 karolherbst: ... k
13:04 pmoreau: What is a `Value->reg.data.id`? Getting a crash in RA cause `Value->reg.data.id == -1`: https://phabricator.pmoreau.org/P77
13:04 imirkin: it's the... id of the register
13:04 imirkin: -1 means it's a pre-RA value
13:05 karolherbst: pmoreau: wanna try out some patches of mine? I dug into that RA stuff some months ago and fixed maybe some minor memory corruptions
13:06 karolherbst: pmoreau: one of the things I added for myself: https://github.com/karolherbst/mesa/commit/902870a878943e5d1c174a7d95093e48776aa295
13:06 karolherbst: pmoreau: this patch helps when the RA stuff is run multiple times: https://github.com/karolherbst/mesa/commit/4538c5c1a59952e97b657784d0419f1cfea8b052
13:06 imirkin: pmoreau: the ArgumentMovesPass
13:06 imirkin: pmoreau: it has to do with your call node
13:07 pmoreau: imirkin: Ah, Value->id != Value->reg.data.id, ok
13:07 imirkin: pmoreau: you're probably not setting it up correctly
13:07 imirkin: pmoreau: see how other CALL's are set up and do the same thing
13:07 imirkin: there's a fixed calling convention
13:08 pmoreau: I mostly did the same thing as what i had in my previous version, which worked and was based on the tgsi conversion code.
13:09 pmoreau: But I'll have another look
13:10 glennk: imirkin, what does nouveau do with integer division/mod?
13:10 pmoreau: karolherbst: You haven't tried upstreaming those?
13:11 karolherbst: pmoreau: I should, right :/
13:11 imirkin: glennk: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/lib/gf100.asm#n2
13:11 karolherbst: imirkin: [test] draw arrays GL_TRIANGLE_FAN 0 1 works really nice :D
13:11 imirkin: glennk: there's also a code version of that on nv50 (since nv50 doesn't have mul hi)
13:12 glennk: tgsi_divmod in r600_shader.c for comparison
13:13 imirkin: glennk: there's also this one: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp#n480
13:14 imirkin: (and handleMOD below it which basically just calls handleDIV)
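For readers who don't want to open gf100.asm or handleDIV, here is a generic Python sketch of the usual idea behind lowering unsigned 32-bit division: multiply by a fixed-point reciprocal, take the high half of the product, then correct the quotient. It is not a transcription of nouveau's code. It assumes an exact reciprocal `floor(2**32 / b)` (hardware lowerings start from an approximate float reciprocal and need more refinement), uses Python big ints to get the mul-hi that nv50 lacks, and does not handle `b == 0`. As with handleMOD, the remainder falls out of the division.

```python
MASK32 = 0xFFFFFFFF

def mul_hi_u32(x: int, y: int) -> int:
    """High 32 bits of the 64-bit product of two u32 values."""
    return ((x & MASK32) * (y & MASK32)) >> 32

def udivmod_u32(a: int, b: int) -> tuple:
    """Unsigned 32-bit (a // b, a % b) via a reciprocal and one correction.

    recip = floor(2**32 / b), clamped to 32 bits, satisfies
    recip >= 2**32 / b - 1, so the first quotient estimate is either
    floor(a / b) or one less, and a single correction step is enough.
    """
    if b == 0:
        raise ZeroDivisionError("division by zero is not modelled here")
    recip = min((1 << 32) // b, MASK32)  # 32-bit reciprocal of b
    q = mul_hi_u32(a, recip)             # estimate: floor(a/b) or floor(a/b) - 1
    r = a - q * b                        # remainder, 0 <= r < 2*b
    if r >= b:                           # fix up the off-by-one case
        q += 1
        r -= b
    return q, r

print(udivmod_u32(100, 7))         # (14, 2)
print(udivmod_u32(0xFFFFFFFF, 1))  # (4294967295, 0)
```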
13:15 glennk: at some point i should look into whether r600's RECIP_[U]INT is useful, and indeed whether it also exists on cayman; the opcode table says it doesn't but the docs say it's there
13:15 glennk: but i think glsl might want to lower some of the simple ones, such as / 3 etc
13:17 imirkin: well you kinda want to do all that *after* all your opts run
13:18 glennk: right before emitting tgsi i mean
13:19 imirkin: right... so the driver wouldn't stand a chance
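The "simple ones, such as / 3" glennk mentions are usually lowered to a multiply by a precomputed constant plus a shift, with no division at all. A Python sketch for unsigned 32-bit `n / 3` follows (the function name is hypothetical). `0xAAAAAAAB` is `ceil(2**33 / 3)`, and the rounding error it introduces is always less than 1/6, too small to push the result past the next integer for any 32-bit `n`. Whether GLSL or the driver should do this, and when relative to other optimizations, is exactly the open question in the discussion above.

```python
def udiv3_u32(n: int) -> int:
    """floor(n / 3) for unsigned 32-bit n using one multiply and one shift.

    0xAAAAAAAB == ceil(2**33 / 3), so n * 0xAAAAAAAB / 2**33 equals
    n/3 + n/(3 * 2**33); the extra term is < 1/6, which never reaches the
    next integer because frac(n/3) <= 2/3.
    """
    assert 0 <= n <= 0xFFFFFFFF
    return (n * 0xAAAAAAAB) >> 33

def umod3_u32(n: int) -> int:
    """n % 3, derived from the quotient the same way a MOD lowering would."""
    return n - 3 * udiv3_u32(n)

print(udiv3_u32(100), umod3_u32(100))  # 33 1
```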
13:31 karolherbst: imirkin: so, 5 latest patches: https://github.com/karolherbst/mesa/commits/to_upstream
13:31 karolherbst: these are the good ones
13:31 karolherbst: well the swap doesn't do much though :/ but I thought it is good to have it
13:44 mooch: is it just me or are all the available bios dumps of nv20-era cards bad dumps?
13:45 mooch: i mean, i got one that was 61 kb, and one that was 47 kb
13:45 imirkin: mooch: those sizes seem perfectly reasonable
13:45 mooch: how do you make a bad dump of a vga card?
13:45 mooch: imirkin: they certainly weren't everything that was on the bios chip
13:45 imirkin: in fact a little larger than i'd expect
13:45 mooch: imirkin: the riva tnt bios is a perfect 64 kb so
13:45 imirkin: not everything that's on the bios chip is accessible directly
13:46 mooch: so you're saying they used something like flash or eeprom?
13:46 mooch: ugh
13:48 imirkin: mooch: i dunno. normally you just read out the option rom accessible over pci and that's it
13:48 mooch: i did find a geforce 3 bios that was 64 kb
13:48 mooch: but when i tried to run it with my emulation, it gave an svga size of 0x1
13:49 mooch: at some weird refresh rate above 100 hz
13:49 mooch: did the svga parts change between nv4 and nv20?
13:49 imirkin: i'm guessing a lot of it is just stuff captured from PRAMIN, which won't be the actual bios you want
13:49 imirkin: the display stuff all changed between nv4 and nv5, and between nv5 and nv10
13:49 mooch: yeah, but did the vga core?
13:49 imirkin: well, maybe not *all*, but a bunch
13:49 mooch: like, the stuff in the vga crtc regs?
13:50 mooch: also, it never even accessed any nvidia-specific registers
13:51 imirkin: ah yeah, dunno
13:52 mooch: so i'm thinking that was a bad dump too
13:52 mooch: also, i tried my nv4 with win2k
13:53 mooch: it still couldn't use the card
13:53 mooch: or rather, the drivers
14:00 karolherbst: imirkin: where should I send nouveau mesa patches? to the nouveau ML or mesa-dev?
14:01 imirkin: mesa is fine
14:06 jeremySal: imirkin: I got the mmt valgrind running
14:06 jeremySal: What should I trace?
14:06 imirkin: jeremySal: for starters, glxgears, to make sure that you got it set up fine
14:07 jeremySal: I just ran it on glxgears
14:07 imirkin: ok, now grab envytools
14:07 jeremySal: is there something I should do to inspect the log?
14:07 imirkin: there's a 'demmt' tool in there
14:08 imirkin: which will greatly assist in the decoding of those mmt logs
14:08 jeremySal: ok great
14:11 mooch: but how would i log windows's accesses to a real nv4?
14:12 mooch: would i have to build some sort of pci bus analyzer or something? o.o
14:12 imirkin: mooch: pci passthrough in a VM
14:12 mwk: if you can find a PCI NV4 and a motherboard that allows doing that for an actual PCI device...
14:13 imirkin: mwk: well, i have a PCI NV5
14:13 imirkin: mwk: don't you just need a recent mobo with VT-d or whatever?
14:13 imirkin: i.e. an iommu
14:14 mooch: well, imirkin: could you maybe do that for your nv5? that might help me with my nv4 emulator
14:14 imirkin: (and by recent i mean like 5 years by now)
14:14 imirkin: mooch: nowhere near enough time for it, sorry
14:14 mooch: shit
14:18 jeremySal: imirkin: Okay, so I was able to convert the trace to text
14:18 imirkin: cool
14:18 imirkin: now search for START_ID
14:18 jeremySal: It doesn't appear
14:18 jeremySal: I used mmt_bin2dedma
14:18 imirkin: grrrrrrrrrrrrr
14:19 imirkin: nonon
14:19 imirkin: demmt
14:19 jeremySal: ooh, okay
14:19 imirkin: you don't want nothin' to do with the dedma stuff
14:19 jeremySal: I see, so the demmt looks up known nvidia calls?
14:20 imirkin: it decodes the trace and looks things up in rnndb
14:20 imirkin: which is our database of... stuff
14:20 imirkin: search for START_ID now?
14:21 jeremySal: imirkin: there we go, it's showing many instances of it
14:22 imirkin: ok
14:22 imirkin: look at any random one
14:22 imirkin: except for one that says TCP
14:22 imirkin: and scroll down a bit
14:22 imirkin: you should see first a shader header
14:22 imirkin: followed by shader code
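The manual search above (find START_ID, skip the TCP ones, scroll down to the header and code) can be scripted once the trace is decoded with demmt. Below is a Python sketch under loose assumptions: it only relies on the decoded text containing the strings START_ID and TCP on the relevant lines, and it assumes the shader header and code show up within the next few lines, which may not hold for every trace. It also checks for `00001de7`, the exit-instruction word imirkin suggested searching for earlier. The function names and the `trace.txt` file name are hypothetical.

```python
def shader_uploads(lines, context=20):
    """Yield (line_number, following_lines) for each START_ID line.

    Lines that also mention TCP are skipped, as suggested above; the next
    `context` lines are returned so the shader header and code can be read.
    """
    for i, line in enumerate(lines):
        if "START_ID" in line and "TCP" not in line:
            yield i + 1, lines[i + 1 : i + 1 + context]

def has_exit_word(lines):
    """True if any line contains 00001de7 (case-insensitive)."""
    return any("00001de7" in line.lower() for line in lines)

if __name__ == "__main__":
    import sys
    path = sys.argv[1] if len(sys.argv) > 1 else "trace.txt"  # hypothetical name
    try:
        with open(path) as f:
            text_lines = f.read().splitlines()
    except FileNotFoundError:
        sys.exit(f"no decoded trace at {path}")
    for lineno, snippet in shader_uploads(text_lines, context=10):
        print(f"--- START_ID at line {lineno}")
        print("\n".join(snippet))
    print("exit word present:", has_exit_word(text_lines))
```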
14:23 mooch: imirkin: do you know anybody who knows anything about how early nvidia drivers start up?
14:23 imirkin: some people at nvidia?
14:23 * imirkin is not being helpful
14:24 mooch: imirkin: yeah, but they wouldn't tell me
14:25 jeremySal: imirkin: I'm looking at some shader code right now
14:25 imirkin: jeremySal: awesome
14:25 imirkin: ok, so that's the basics of it
14:25 imirkin: now
14:25 imirkin: getting back to my original suggestion
14:25 imirkin: GM20x introduced a bunch of new features
14:25 imirkin: one of them is ...
14:25 imirkin: GL_ARB_shader_viewport_layer_array
14:26 imirkin: the idea would be to basically write a quick piglit test (piglit is an opengl testing framework) and trace it
14:26 imirkin: and see what the blob does
14:26 imirkin: and then document it
14:26 jeremySal: I see
14:27 jeremySal: Is glxgears doing something relatively complicated?
14:27 imirkin: the piglit repo is here: http://cgit.freedesktop.org/piglit
14:27 imirkin: not really, no
14:27 imirkin: but also nothing particularly new
14:27 jeremySal: Like, if I wrote a minimum opengl program to use this functionality, would it still produce 90MB traces?
14:27 imirkin: definitely not
14:27 imirkin: glxgears does something simple A LOT OF TIMES
14:27 imirkin: heh
14:27 jeremySal: Ok, awesome, so it would be much easier to keep track of what's going on
14:27 jeremySal: I see
14:28 imirkin: each frame is probably 1-2K of data
14:28 imirkin: but there's a million frames in there :)
14:28 jeremySal: gotcha
14:28 jeremySal: Okay, I'll try it out
14:28 jeremySal: thanks for the tips!
14:29 imirkin: btw, i definitely know i haven't provided you sufficient info to do this yet
14:29 imirkin: so if you're confused, that's good - you should be
14:29 imirkin: keep asking questions
14:29 imirkin: :)
14:53 mooch: someone should really update the human readable docs more often
14:54 RSpliet: mooch: are you volunteering? ;-)
14:54 mooch: no
14:54 mooch: i don't know enough about the hardware
14:54 RSpliet: darnit... but you're right, good public docs are useful to get people involved, but in general the biggest problem with nouveau currently is the lack of manpower
14:54 mooch: plus, i'm mainly interested in the hardware from an emulation standpoint, not a driver standpoint
14:55 RSpliet: oh you must love the recent hwtest work then
14:55 mooch: eh, i'm trying to emulate older nvidia hardware
14:55 mooch: but what did these tests reveal?
14:55 mooch: and where can i find the tests?
14:55 effractur: does nouvea support the nividea vgpu stuff?
14:56 effractur: nouveau*
14:56 RSpliet: they document (in C code) the algorithms behind the shader ops
14:56 imirkin: effractur: vgpu?
14:56 imirkin: effractur: you mean the GRID stuff?
14:56 effractur: imirkin: y
14:57 imirkin: effractur: i know someone got nouveau loaded on amazon ec2's stuff
14:57 imirkin: i doubt it'd work well at the hypervisor level though
14:57 effractur: imirkin: more the client side
14:57 mooch: rspliet: link?
14:58 mooch: wait, i found it
14:58 mooch: holy shit, it's for tesla too
14:58 RSpliet: the envytools git history would point you in the right direction
14:59 mooch: i could write some more hwtests, but i would need access to a machine running an old card
14:59 mooch: say, an nv4
15:03 Jayhost: hey Rspliet I've been checking out your 240 reclock presentation. Looking to help do the same for gm107.
15:09 RSpliet: Jayhost: if I could, but no time and no hardware
15:24 mupuf: imirkin: hey, one pixmark case regressed by 12% after this commit: http://cgit.freedesktop.org/mesa/mesa/commit/?id=f97f755
15:24 mupuf: http://fs.mupuf.org/mupuf/nvidia/nvd9_xonotic.html <-- the full report. Will check what the heck is wrong with xonotic soon
15:25 Jayhost: Karolherbst: Did you say you tried multiple kernels and drivers with MMIO bug?
15:31 imirkin: mupuf: huh. surprising.
15:32 mupuf: surprising is indeed true
15:32 mupuf: no idea what happened with xonotic, but I suspect that some state got broken and the benchmark fails to run or something
15:33 mupuf: this is why I need to re-run on previous commits to confirm regressions
15:34 imirkin: mupuf: i don't understand how to read anything on that page =/
15:34 mupuf: I am not surprised...
15:34 mupuf: I really need to sit down and think about usability
15:35 mupuf: so, here is a short version
15:35 mupuf: first you have got the trends view
15:35 mupuf: each line represents a benchmark
15:35 mupuf: the x axis is commits, ordered by commit date
15:35 imirkin: i don't see that f97 commit anywhere
15:36 mupuf: ah, it did not fit in the X axis, so it is not shown
15:36 imirkin: mupuf: ohhhhh i see it now
15:36 imirkin: ok
15:36 imirkin: so
15:36 imirkin: here's the *BIG* problem
15:37 imirkin: multiple lines are overlayed with one another
15:37 imirkin: bad idea.
15:37 imirkin: offset them artificially. something.
15:37 mupuf: hmm, but then it would look weird :s
15:37 imirkin: also once you click, you can't make the popup disappear again
15:37 imirkin: better weird than invisible
15:37 mupuf: so, you can click on the benchmark name of interest
15:38 mupuf: and it will hide all the other lines
15:38 mupuf: you can select multiple benchmarks too
15:38 mupuf: as for the tooltip, I will add a "hide" link soon
15:38 imirkin: ah it's a toggle thing
15:38 mupuf: the google charts are not well prepared for such giant tooltips :s
15:38 mupuf: yep
15:38 imirkin: yeah. you have a *few* non-obvious bits in there
15:38 mupuf: as I said, bad usability
15:39 mupuf: the UI is mostly a prototype, someone will likely re-do it
15:40 imirkin: that is seriously weird though.
15:40 imirkin: i also wonder if fermi and kepler have diff perf characteristics
15:40 gnurou: jeremySal: if you want to work on improving support for your GTX960, NVIDIA will soon release the firmware files required for 3d to work and the kernel loader for them - so be sure to stay tuned
15:41 gnurou: (it has been "soon" for a while, but this time it's for real)
15:41 imirkin: gnurou: what's the over/under on 3 months?
15:42 gnurou: imirkin: well things tend to always get in the way at the last minute, but I think we can safely switch the unit from months to weeks
15:42 imirkin: so... what's the over/under on 15 weeks? :)
15:44 mupuf: imirkin: I guess it is easier to test that now!
15:44 mupuf: But yeah, no time to check that now, but it is definitely something I can easily check soon
15:44 gnurou: imirkin: I cannot give probabilities, even time estimates make me feel uneasy :P
15:45 gnurou: but it's working on my (debug) board, and the IP questions are cleared, so now I'm just waiting for a production signature
15:49 imirkin: gnurou: ok :)
15:51 * gnurou is afraid to check his email every morning and find another setback
15:56 imirkin: mwk: btw, anything you know about IMAD that would make it be slower than IMUL + IADD?
15:56 imirkin: mwk: on, at least, GF119
16:51 imirkin: w00t, fermi has a SUQ op!
16:51 imirkin: and it works too
17:16 glennk: imirkin, is that a bazaar instruction?
17:23 Jayhost: Internet says SUQ is for surface query
18:23 Jayhost: Anyone want to give their reasons for using or developing nouveau or just Foss in general?
18:26 mooch: jayhost: i work on emulators for preservation of the hardware
18:26 mooch: and that's about the only FOSS thing i work on
18:28 karlmag: Jayhost: what's the motivation for your question?
18:33 Jayhost: karl curiosity
18:34 karlmag: fair enough
18:37 karlmag: a bit complex question to answer.
18:40 Jayhost: mooch archeology! nice
18:41 karlmag: I guess "the right thing to do" sounds both a little naive and is far from the whole story.
18:43 karlmag: But yeah, it can be a good way of keeping old hardware going for a long time. Not like you usually have; "sorry, no more support" and you can toss the hardware, because you can't use it in practice.
18:43 karlmag: That is also, of course, a story not without variations, but still..
18:44 karlmag: You - to a lesser degree - trust your data to black boxes.
18:45 karlmag: eh.. that was badly written; should be "you don't trust your data to black boxes as much"
18:48 karlmag: Personally I think it's sad that nvidia have gone the route they have. Pragmatically I guess I can't blame them, but that doesn't mean I agree with it. It would be better to have full, open access to the hardware so one could have full, free use of it (hardware is good in itself). And who knows what nice tricks others would have come up with if they could concentrate on making the software and drivers better, instead of having to waste the tim
18:49 karlmag: (if that doesn't make sense I chalk that up as a tired rant :-P )
18:50 Jayhost: It does make sense.
18:52 karlmag: Jayhost: how about your reason(s) ?
18:54 Jayhost: I guess I'm curious because I'm trying to put the pieces together myself
18:56 Jayhost: I'm quite interested in the contrast between popular capitalists like gates,jobs vs stallman
18:57 karlmag: You are mentioning people with quite opposite ideals there..
18:58 karlmag: Stallman is so uncompromising that lots (most?) people have problems swallowing his ideas.
18:59 karlmag: I find that a bit unfortunate, but I'll give him credit for sticking with what he says (and been saying for years and years)
19:01 karlmag: the mantra of gates, and maybe jobs in particular, seems more to have been "make money no matter what".
19:02 karlmag: Though they may not have been (or be) the worst in their class either.
19:03 Jayhost: Yeah I'm fascinated by what people have to say about it
19:03 karlmag: Then you have people in general either not knowing what they want or maybe not even caring what they get, even if it means they're giving up their freedoms for it.
19:04 koz_: karlmag: Yeah, this is definitely true.
19:04 koz_: 99% of people don't seem to believe that software freedoms apply to them or give them any benefit.
19:04 koz_: s/apply/don't apply
19:05 Jayhost: Yeah it's quite strange how the education system bought all these closed systems and people I know personally don't have any idea what any of it means
19:05 karlmag: Nope.. so now there are a few huge companies that know more or less everything about everyone. And most people give them that info for free too. Knowingly or not (not caring for the most part, I'm guessing)
19:06 koz_: karlmag: Not caring most likely. I've had some people I *greatly* respect basically say stuff 'privacy's dead, so I might as well get some convenience out of it'.
19:06 koz_: Jayhost: They bought it because the corporate world won and nobody understands technology in the general case.
19:07 karlmag: I am wondering how it will turn out over time. Certainly will become worse before it gets better. Which way I don't know.
19:10 karlmag: and to try to get a bit more back into being on topic; earlier today I did look a bit at opencl and related stuff, concluding; it could have been easier. (Including "why having to use closed drivers for it", but not limited to that)
19:32 Jayhost: koz bummer
19:59 Jayhost: karlmag topic = technical discussion?
22:06 Jayhost: echo 64000 > /sys/kernel/debug/tracing/buffer_size_kb
22:06 Jayhost: Enable the mmio tracer, and start recording the log:
22:06 Jayhost: root@Ein:~# echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
22:07 Jayhost: wth
22:10 Jayhost: this computer is trippin. Pasting autonomously and switching locales
22:23 Jayhost: CONFIG_MMIOTRACE not set in the Debian Jessie 3.16 kernel.
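Whether the running kernel has mmiotrace can be checked before trying the tracing steps above. Debian installs each kernel's build config as `/boot/config-<release>`, and kconfig writes disabled options as `# CONFIG_FOO is not set`. A minimal Python sketch (function names hypothetical) that reads that file and reports the option's value:

```python
import platform
from pathlib import Path

def config_option(config_text, name):
    """Return the value of CONFIG_<name> ('y', 'm', ...) or None if it is unset."""
    key = f"CONFIG_{name}"
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith(key + "="):
            return line.split("=", 1)[1]
        # kconfig records disabled options as "# CONFIG_FOO is not set"
        if line == f"# {key} is not set":
            return None
    return None

def running_kernel_config():
    """Read /boot/config-<release> for the running kernel (Debian layout)."""
    return (Path("/boot") / f"config-{platform.release()}").read_text()

if __name__ == "__main__":
    try:
        print("CONFIG_MMIOTRACE =", config_option(running_kernel_config(), "MMIOTRACE"))
    except FileNotFoundError:
        print("no config file for this kernel under /boot")
```

A result of `None` means the stock kernel was built without it, so tracing would need a kernel rebuilt with that option enabled.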