00:00 karolherbst: "Sparse conditional constant propagation" also sounds like fun
00:20 karolherbst: well
00:20 karolherbst: 147: mul ftz f32 %r487 neg %r345 neg %r345
00:20 karolherbst: maybe we should remove those negs too
00:21 imirkin: we do
00:21 imirkin: at emit() time
00:21 karolherbst: ahh
00:22 imirkin: no real point before then
00:22 karolherbst: I can't think of a pass which may benefit from this anyway
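The `neg`/`neg` pair on the `mul` above cancels because (-a) * (-b) equals a * b. A minimal sketch of that folding follows; the function names are made up for illustration and this is not the nouveau emit code.

```python
def fold_mul_negs(neg0, neg1):
    """Return (neg0, neg1) with a cancelling pair of neg modifiers removed."""
    # Two negations on the sources of a multiply cancel each other out.
    if neg0 and neg1:
        return (False, False)
    return (neg0, neg1)


def mul(a, b, neg0=False, neg1=False):
    """Evaluate a multiply with optional neg modifiers on each source."""
    a = -a if neg0 else a
    b = -b if neg1 else b
    return a * b
```

With both modifiers set, `mul(x, x, True, True)` and `mul(x, x, *fold_mul_negs(True, True))` give the same value, which is why dropping the pair is safe for a plain multiply.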
00:22 karolherbst: added a new Pass now called "SmartCSE", but I am sure there is a much better name for that
00:23 karolherbst: wow
00:23 karolherbst: that was easy
00:23 karolherbst: https://gist.github.com/karolherbst/dd42e11f2aa99288faf863e0e5ac0305
00:23 karolherbst: gpr increase is to be expected I assume
00:24 imirkin: from lengthening live intervals? :)
00:24 imirkin: this is why all this stuff kinda has to be done together.
00:24 karolherbst: yeah well
00:24 imirkin: what we really need is GVN
00:24 karolherbst: thing is, the use of the things I found is like 4 BBs away...
00:25 karolherbst: yeah
00:25 imirkin: which would supersede such hacks
00:25 karolherbst: right
00:25 karolherbst: but GVN sounds hard
00:27 imirkin: yes. it is. but would be a great project
00:27 imirkin: both for nouveau, and as a learning experience for whoever did it.
00:27 karolherbst: yeah
00:28 karolherbst: I really should upstream all my patches at some point: https://gist.github.com/karolherbst/1bd8c53af272a7ada9e0be78b2bf0dce
00:28 karolherbst: :D
00:28 karolherbst: the gpr thing is just a bit annoying
00:34 karolherbst: now I was plain stupid
00:35 karolherbst: no, I can't do the same with mul I did with add
00:38 karolherbst: imirkin: is there a vfetch u96 or something like that?
00:39 karolherbst: "vfetch u64 { %r388 %r392 } a[0xa0] + vfetch u32 %r396 a[0xa8]" ==> vfetch u96 { %r388 %r392 %r396 } a[0xa0]
00:40 imirkin: karolherbst: insnCanLoad()
00:42 karolherbst: mhh?
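The two fetches quoted above touch adjacent bytes: a 64-bit fetch at `a[0xa0]` ends exactly where the 32-bit fetch at `a[0xa8]` begins. A rough sketch of that adjacency check follows, assuming loads are modelled as hypothetical `(offset_bytes, size_bytes)` pairs. Whether the target accepts a 12-byte fetch is a separate question that this sketch does not answer.

```python
def try_merge(first, second):
    """Merge two loads given as (offset_bytes, size_bytes) if they are adjacent.

    Returns the merged (offset, size) pair, or None when the second load
    does not start right after the first one ends.
    """
    off0, size0 = first
    off1, size1 = second
    if off0 + size0 != off1:
        return None
    return (off0, size0 + size1)
```

For the example from the log, `try_merge((0xa0, 8), (0xa8, 4))` yields a single 12-byte (96-bit) access at `0xa0`.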
00:44 karolherbst: anyway, this vertex shader is a big wtf in some places
00:44 karolherbst: https://gist.github.com/karolherbst/4bf07c3b7bd02834f55494c2adb506a5
00:49 imirkin: should probably see what that is...
00:49 imirkin: aha
00:49 imirkin: 0x7f7fffff
00:50 karolherbst: ... lol, another big thing
00:51 karolherbst: https://gist.github.com/karolherbst/876ec9af0f34349468f7ee2eb6d57614
00:51 karolherbst: but I thought I had something for that already...
00:52 karolherbst: nope
00:53 karolherbst: mhh what is true - false and false - true :/
05:27 mithro: karlmag: heyo, a while back you created me a special branch for my set up, it seems Chrome is now causing it to crash with the following output in dmesg -> http://paste.ubuntu.com/16452774/
08:11 orbea: I found myself needing to use the windows version of ppsspp to debug an issue with wine (Since their GE debugger is windows only so far) and found it froze everything whenever the GE debugger was started and there is a lot of this in dmesg http://dpaste.com/370XSDM Maybe similar to this? https://github.com/mpv-player/mpv/issues/2798
08:24 binarym: hi all ... i just got my brand new workstation with a Quadro K620. I followed the doc to extract firmware from nvidia proprietary driver but it looks like extract_firmware.py isn't aware of nv117_* firmware ... any idea ?
08:25 binarym: nouveau driver doesn't load without these: [ 5.253718] nouveau 0000:04:00.0: firmware: failed to load nouveau/nv117_fuc409c (-2)
08:38 karolherbst: binarym: you need the firmware only for vdpau
08:38 binarym: karolherbst: hmmm really ? cause at launch, Xorg doesn't find my adapter
08:39 karolherbst: binarym: then xorg log please
08:39 binarym: i was running debian stable. Update to testing on-going ...
08:39 karolherbst: binarym: uhhh
08:39 karolherbst: binarym: stable should be too old
08:39 binarym: yeah, i guess so
08:39 karolherbst: you want to have at least mesa 11.2
08:40 karolherbst: binarym: anyway, there is no vdpau support on these because there is no VP6 support so far
08:41 binarym: i'm not very aware of "vdpau" and "VP6" ... does this mean i won't be able to run nouveau driver for my Quadro K620 ?
08:41 karolherbst: vdpau is used for video playback
08:41 karolherbst: which can also be done on the CPU
08:41 karolherbst: gpus just tend to be faster here, although the difference doesn't matter with a normal CPU
08:41 binarym: hmmm ok ... so it's not a problem since my workstation runs with a Xeon
08:41 binarym: :)
08:42 karolherbst: shouldn't be
08:42 karolherbst: maybe some 4k videos could be a bit slow...
08:42 karolherbst: no idea though
08:42 karolherbst: I just know that on my haswell i7 fullhd videos are running just fine on the CPU
08:43 binarym: i have another question ... is nouveau xen compliant ?
08:44 karolherbst: binarym: regarding performance (although your quadro isn't really fast to begin with): I have some experimental patches for maxwell to enable memory reclocking there, might work, might not work
08:44 karolherbst: binarym: what do you mean?
08:44 binarym: the nvidia non free driver isn't compatible with Xen kernel
08:44 binarym: that's why i rollback to nouveau
08:44 binarym: i need Xen (to run that shitty Windows)
08:45 karolherbst: why don't you use qemu with kvm?
08:46 karolherbst: no idea how xen works if you want to offload graphics
08:46 karolherbst: because for opengl you usually needed like a running X server
08:46 karolherbst: maybe it is possible to get around with EGL, but... never even tried that and no idea if that would even work
08:47 karolherbst: binarym: do you know if you have Vt-d support on your board and CPU?
08:47 mgottschlag: shouldn't all linux drivers be trivially compatible with Xen Dom0? the physical memory mapping should be a 1:1 mapping (in case of paravirtualization)
08:47 mgottschlag: ah, wait, Windows. No paravirt then.
08:47 karolherbst: mgottschlag: exactly
08:48 karolherbst: binarym: with Vt-d you might get around using a linux driver at all and just use the windows nvidia driver in your vm
08:48 mgottschlag: although, Dom0 should use a 1:1 mapping anyways
08:48 karolherbst: mgottschlag: yeah, but that doesn't help you if you want to run windows :D
08:48 binarym: karolherbst: in fact, i wanted to run Xen so that the hardware is the same on the Windows VM as on the host (essentially to avoid a hardware mismatch that would break my windows installation)
08:49 karolherbst: right...
08:49 binarym: cause i can't reinstall my windows by myself (IT service, you know ... i'm not even admin on the windows :-( )
08:49 mgottschlag: Xen DomU usually uses a qemu device model, so it is nothing like the host
08:49 karolherbst: and graphics are an entirely different beast to begin with
08:50 binarym: hmmm
08:50 binarym: si
08:50 binarym: so it won't work, even in Xen ?
08:50 binarym: doooo
08:50 binarym: :(
08:50 karolherbst: does xen support Vt-d?
08:51 karolherbst: because with Vt-d you could load a bunch of pci-stubs for your pcie devices and pass them right into your windows vm (and use the windows driver for those devices)
08:51 binarym: looks so http://wiki.xen.org/wiki/VTd_HowTo
08:51 binarym: but i guess that if i dedicate those pci devices to my windows vm, they are not usable anymore on linux ?
08:51 mgottschlag: although, if the hardware is supposed to look identical, things like the hdd controller will be problematic
08:51 mgottschlag: exactly
08:52 karolherbst: yeah well, who cares about the hdd controller :D
08:52 karolherbst: that is something the guest shouldn't use for... _various_ reasons :D
08:53 binarym: ok ... btw, thanks a lot for all feedback. Debian updated ... time to reboot :p
09:00 jhogarth: I'm trying to determine the current state of NV117 (GM107) and it's a little confusing. My last understanding was mesa >= 11.2, kernel >= 4.6 and linux-firmware with the nvidia signed firmware should have nouveau working on it ... but on F24 beta with rawhide kernel and an updated linux-firmware the Xorg log reports "Unknown chipset: NV117"
09:01 jhogarth: looking through the freedesktop bugzilla and the nouveau git commit log I see a commit specifically blacklisting NV110 series and there was a reference to using the modesetting driver instead
09:02 jhogarth: it's a laptop with both intel skylake and nvidia 960m ... and when running on Xorg instead of wayland it appears that the intel chipset is being driven by modesetting (according to xrandr --listproviders) and glxinfo confirms intel mesa acceleration is in use
09:03 jhogarth: is this nvidia chip expected to be working yet on this configuration, and if so what am I missing to pushing it into working with PRIME ?
09:05 karolherbst: jhogarth: you get the error in xorg or dmesg?
09:08 karolherbst: ohh xorg log
09:08 karolherbst: jhogarth: well yeah, you get that error when the nouveau ddx gets loaded, but modesetting should pick up the gpu instead so it should work anyway
09:12 jhogarth: karolherbst, huh it doesn't appear to ... xrandr --listproviders only shows one provider and that's the intel chip, not the nvidia one
09:13 jhogarth: (and it shows it using modesetting)
09:13 karolherbst: I have never done offloading using modesetting on intel, might not work because both use modesetting
09:13 karolherbst: jhogarth: anyway, just run DRI_PRIME=1 glxinfo
09:14 karolherbst: I think the modesetting ddx might support dri3 offloading and then you won't have to deal with something stupid like setting up the offloading
09:17 jhogarth: karolherbst, huh ... I didn't try DRI_PRIME=0/1 with only the one provider listed ... but http://pastebin.com/aJn22pZX ... intriguing
09:18 karolherbst: :D
09:18 karolherbst: strange things are happening sometimes
09:19 jhogarth: karolherbst, and vgaswitcheroo shows it switching between DynPwr and DynOff correctly .... this makes me happy :P
09:20 jhogarth: tonight i'll give it some more painful testing (ie gaming) and see how nouveau stands up on it ;) bumblebee doesn't like F24 at present so I can't g NV driver as of yet ;)
09:20 karolherbst: jhogarth: well, there is no reclocking yet
09:20 karolherbst: jhogarth: so performance should be.. bad
09:21 jhogarth: karolherbst, ah well ... i'll keep an eye on commits and mailing list for an eye on when that lands ... at least i've got a good baseline testbed now :)
09:21 jhogarth: and given this is a work laptop I spend most of my time on the intel anyway ...
09:45 karolherbst: are those equal?
09:45 karolherbst: tex 2D $r13 $s0 f32 { %r2947 %r2948 %r2949 } %r2943 %r2945
09:45 karolherbst: tex 2D $r14 $s0 f32 { %r2957 %r2958 %r2959 } %r2943 %r2945
12:33 karolherbst: ehm ... phi u32 %r153 %r145 %r136 + phi u32 %r154 %r145 %r136
12:33 karolherbst: :D
12:57 mwk: whee, loads and stores are working
12:57 karolherbst: 55.3ms frame time in pixmark piano now :O
12:59 mwk: okay, let's try truncation+extension
12:59 mwk: I have a feeling this is going to be a mess
13:00 karolherbst: nvidia at 43.4ms
13:00 karolherbst: mwk: awesome :)
13:01 karolherbst: stock nouveau at 57.7ms
13:03 mupuf: karolherbst: what was the previous perf
13:03 mupuf: ?
13:03 karolherbst: ^
13:04 karolherbst: 57.7 -> 55.3
13:05 karolherbst: with my Smart CSE I also found some instructions I could cut away in pixmark piano
13:05 karolherbst: so I am at around 3650 instructions now
13:05 karolherbst: stock is at 3850? or something
13:08 karolherbst: mhh not that bad: 3771 -> 3654
13:10 karolherbst: I see a lot of shaders doing min and max on the same sources :/
13:34 karolherbst: huh, shouldn't local CSE pick that up? "mad f32 %r551 %r547 %r483 %r541 + mad f32 %r552 %r547 %r483 %r541" ?
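For reference, local CSE inside one basic block comes down to keying each instruction on its opcode, type and sources, and reusing the first result when the same key shows up again. The sketch below assumes instructions are hypothetical `(op, dtype, dst, srcs)` tuples in SSA form with no side effects; it is an illustration, not the pass that lives in nouveau's codegen.

```python
def local_cse(block):
    """Remove repeated pure instructions within one basic block.

    block is a list of (op, dtype, dst, srcs) tuples in SSA form.
    Returns (kept_instructions, replacements), where replacements maps a
    removed destination to the earlier value that computes the same thing.
    """
    seen = {}          # (op, dtype, srcs) -> first destination computing it
    replacements = {}  # removed destination -> surviving destination
    kept = []
    for op, dtype, dst, srcs in block:
        # Look through earlier replacements so chained duplicates also match.
        srcs = tuple(replacements.get(s, s) for s in srcs)
        key = (op, dtype, srcs)
        if key in seen:
            replacements[dst] = seen[key]
        else:
            seen[key] = dst
            kept.append((op, dtype, dst, srcs))
    return kept, replacements
```

Fed the two `mad` instructions quoted above, this keeps the one writing `%r551` and records `%r552 -> %r551`. A real pass would also have to respect instruction flags and modifiers, which are left out here.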
13:35 karolherbst: uhh, merge loads
13:35 karolherbst: sounds nice
13:43 mwk: hmm, can I actually tell LLVM to Promote my Constants?
13:43 mwk: let's see
13:44 mwk: apparently not...
13:54 mwk: truncation works
13:54 mwk: let's see about these sign/zero extensions
14:18 karolherbst: the hell
14:18 karolherbst: ...
14:18 karolherbst: phis are funny
14:24 mwk: hehe
14:26 mwk: I made a transform that changes i32 ZERO_EXTEND (i16 x) to i32 AND (i32 ANY_EXTEND (i16 x), 0xffff)
14:26 mwk: someone else made a DAG combiner that does the exact reverse operation
14:26 mwk: and the whole thing went into infinite loop
14:26 karolherbst: :D
14:27 mwk: alright, it works now
14:27 mwk: I can zero-extend stuff
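The rewrite mwk describes works because ANY_EXTEND only promises the low 16 bits, so masking with `0xffff` afterwards gives the zero-extended value whatever ends up in the upper half. A small model of that follows; the `junk` argument is an assumption standing in for the unspecified upper bits.

```python
def any_extend_i16_to_i32(x, junk):
    """Model ANY_EXTEND: low 16 bits are x, upper 16 bits are unspecified."""
    return ((junk & 0xffff) << 16) | (x & 0xffff)


def zero_extend_i16_to_i32(x, junk=0):
    """ZERO_EXTEND expressed as AND (ANY_EXTEND x), 0xffff."""
    return any_extend_i16_to_i32(x, junk) & 0xffff
```

Whatever value is passed as `junk`, `zero_extend_i16_to_i32(0x8001, junk)` comes out as `0x8001`.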
14:27 mwk: let's do sign ext
14:36 karolherbst: mupuf: 54.4 ms now :)
14:40 tajjada: yaaay I managed to change my physical output setup in a way that allows me to change input source on the monitor without Linux seeing the monitor as disconnecting
14:40 tajjada: no monitor hotplugging support needed for me to use sway now!
14:53 karolherbst: well. this we could do at compile time I think "mov u32 $r0 0xbf000000 + floor ftz f32 $r0 $r0"
14:54 karolherbst: ohh wait
14:54 karolherbst: I misread the SSA form
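karolherbst retracts the observation right away, so the following only illustrates what folding such a `mov` + `floor` pair would involve. `0xbf000000` is the bit pattern of -0.5 as a 32-bit float, and its floor is -1.0. The helper names are made up, and flush-to-zero handling of denormals is not modelled.

```python
import math
import struct


def bits_to_f32(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_to_bits(value):
    """Reinterpret a float (rounded to single precision) as a 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def fold_floor(bits):
    """Constant-fold floor() on an f32 immediate given as raw bits."""
    value = bits_to_f32(bits)
    # Leave NaN, infinities and signed zeros untouched.
    if math.isfinite(value) and value != 0.0:
        value = float(math.floor(value))
    return f32_to_bits(value)
```

Here `fold_floor(0xbf000000)` gives `0xbf800000`, the encoding of -1.0.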
15:31 karolherbst: I wonder what would happen if we just move like at least 7 instructions between source and use :/
15:52 imirkin: skeggsb: got a register on maxwell that looks good for TFB_UNFUCKUP_OFFSET_QUERIES ?
15:53 skeggsb: imirkin: we already should touch it actually...
15:53 imirkin: hakzsam: what kernel were you running on?
15:54 imirkin: skeggsb: to be clear, this is GM107, not GM20x
15:54 skeggsb: yep, doesn't matter, we bash it on both
15:55 imirkin: well, hakzsam ran deqp on the GM107 and all the TF tests fail :(
15:55 hakzsam: imirkin, Linux reator 4.5.0-rc7-NV+
15:55 imirkin: i haven't investigated further than that
15:55 imirkin: hakzsam: could you run any one of those TF tests with the deqp runner and provide the TestResults.qpa file?
15:56 hakzsam: imirkin, yeah, once the gf119 run is done
15:56 imirkin: hakzsam: but run it on the GM107, not GF119 - we know that GF119 needs the extra poke :)
15:56 hakzsam: sure
15:58 imirkin: [and feel free to poke it on reator so you get better pass rate]
15:59 hakzsam: I can try with and without the poke
16:13 karolherbst: RSpliet: your postRAconstantFolding pass is odd
16:13 karolherbst: RSpliet: I reordered some instructions and now it doesn't fold in the movs into the mads anymore :/
16:16 karolherbst: ohhh I get it now
16:16 karolherbst: I am weird
16:16 karolherbst: stupid constraints :D
17:50 mwk: ugh fuck
17:50 mwk: LLVM is not particularly happy about having more than one implicit result
17:51 mwk: which totally breaks my attempt to model each Falcon flag output separately
18:08 mooch: hey, is there any way to run nouveau drivers on dos?
18:08 mooch: i can't get nt4 or w9x drivers working
18:08 mooch: and i can't get a modern linux working either
18:09 mooch: for the record, i'm trying to emulate the riva tnt
18:09 imirkin_: i mean ... you could write an application that used the userspace library and have it run on dos (with a little difficulty)
18:10 imirkin_: you'd have to flip into 32-bit mode
18:10 mooch: well, i want an app that directly accesses the nv4 in dos
18:10 Calinou: https://imgs.xkcd.com/comics/supported_features.png
18:10 imirkin_: mooch: right, i realize that
18:10 imirkin_: there's a "userspace" version of nouveau
18:10 imirkin_: in skeggsb's nouveau tree
18:10 mooch: link?
18:11 imirkin_: which does everything the nouveau kernel module does, but in userspace
18:11 mwk: alright
18:11 imirkin_: mooch: https://github.com/skeggsb/nouveau/
18:11 imirkin_: mooch: look in "bin" for a few simpler programs
18:11 mooch: ah, okay
18:11 mwk: I've hammered LLVM's TableGen a bit and now it supports multiple implicit defs
18:11 imirkin_: the trick will be getting them to run on dos
18:11 mwk: I have a feeling it was way too simple for something marked with 3 FIXMEs in the source, I wonder what just blew up
18:12 imirkin_: you'd have to flip into 32-bit mode, and provide hooks to implement stuff like finding the PCI device and mapping the BARs
18:12 imirkin_: mooch: you'd have a much easier time feeding that PCI device to linux
18:13 mooch: yeah, but the problem is that linux relies on busmaster dmas involving atapi drives and those aren't implemented
18:13 imirkin_: uhhhh
18:13 imirkin_: well, you can always disable bmdma
18:14 mooch: how?
18:14 imirkin_: i don't remember
18:14 imirkin_: i do remember that a lot of earlier controllers had issues with it
18:14 mupuf: imirkin_: the userspace driver has no memory management
18:14 mupuf: and not sure we can send commands to it either
18:14 imirkin_: i don't think he's looking for memory management :)
18:15 mooch: *she
18:15 mupuf: mooch: what about you looking for then? :)
18:15 mupuf: what are you*
18:15 mooch: mupuf, i'm trying to emulate the nv4
18:15 mupuf: :o
18:15 imirkin_: mooch: drivers/ata/Kconfig:config ATA_BMDMA
18:16 imirkin_: you could disable it :)
18:16 mooch: and i can't get the nt4 drivers to stop hanging
18:16 mooch: imirkin, i only have live cds
18:16 mwk: huh, another nv emulation project
18:16 mwk: that makes three now, right?
18:17 mooch: mwk: and mine's the only one that can emulate the SVGA interface. :^)
18:17 mwk: the other ones were for nv2a and nv3 though
18:17 mooch: who was doing nv3?
18:17 mwk: hm
18:17 mwk: I should have that in logs...
18:17 mooch: i was, but then i switched to nv4
18:17 mooch: plus, there's ps3 emulation
18:18 mupuf: mooch: emulation ... in sw or in fpga?
18:18 mwk: ah right
18:18 mooch: sw
18:18 mwk: you had a longer nick back then, it confused me :)
18:18 mooch: eh
18:18 mupuf: mooch: I assume you have seen mwk's hw tests, right?
18:19 mooch: the ones for vp1
18:19 mupuf: they are some sort of emulator already
18:19 mwk: mupuf: exactly 0 of them apply to nv4 :)
18:19 mupuf: there are more tests, aren't there?
18:19 mooch: nt4 is hanging on pfifo stuff tho
18:19 mupuf: well, too bad
18:19 mwk: there are some for nv1 PGRAPH
18:19 mwk: some bits of these may apply for nv4
18:20 mwk: but... the whole PGRAPH interface is different, so nothing that could be used out of the box
18:20 mooch: well, the main part i need help with is pfifo
18:22 mwk: mupuf: so anyhow... you're not too used to envydis syntax, right?
18:22 mupuf: mooch: nope
18:22 mupuf: mwk: nope
18:22 mwk: I suppose I could replicate the syntax with LLVM if I tried hard enough
18:22 mupuf: did I introduce a bug when fiddling with fucv5?
18:22 mwk: but I'd rather not :p
18:23 mwk: yeah, there are problems in the v5 code
18:23 mwk: for one, you forgot to mark some instructions as v5-only
18:23 mwk: I'll fix it later, once we have Falcon hwtest
18:24 mooch: mwk: nt4 does this in a loop http://pastebin.com/uZDGF8jd
18:24 mooch: and it just hangs like that
18:25 mwk: mooch: could you also print what values it reads?
18:25 mooch: this isn't from hardware, this is from an emulator
18:25 mooch: so i have no clue what it's supposed to read
18:25 mwk: ok, but what does it read?
18:26 mooch: zeroes for unimplemented registers
18:26 mwk: ISTM it's trying to handle an interrupt and failing badly
18:26 mwk: PFIFO_INTR0 is unimplemented?
18:27 mwk: and so is PFIFO_CACHE1_STATUS, I assume?
18:27 mooch: PFIFO_INTR0 is not unimplemented
18:27 imirkin_: mwk: do note that there's a bunch of stuff written in the current envydis syntax
18:27 mooch: that's implemented too
18:27 mwk: imirkin_: yeah... the question is if we want to mix them
18:28 imirkin_: i think it's fine for each tool to have its own syntax
18:28 imirkin_: you can always move back and forth by decoding the binary
18:29 mooch: mwk: here, i'll just give you my code
18:29 mwk: I mean, if you want to link the current assembly and new C, either envydis needs to grow ELF support, or LLVM needs to support the old syntax
18:29 mooch: mwk: https://github.com/MoochMcGee/PCem-mooch/blob/master/src/vid_nv_rivatnt.c
18:31 mwk: mooch: you handle PFIFO_INTR writes wrong
18:31 mwk: PFIFO_INTR (and most other _INTR registers) has 1-to-reset write semantics
18:32 mwk: ie. if you write 0x1000 to it, it clears bit 12
18:32 mwk: and leaves other bits unchanged
18:32 mooch: oh, weird
18:33 mooch: so should i just or the written value with the stored value?
18:33 mwk: no
18:33 mwk: you need to do intr &= ~written_val;
18:33 mooch: ah, okay
18:33 mooch: thanks
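The write semantics mwk describes (writing a 1 clears that bit, zeros leave bits alone) can be modelled as below. The class and method names are hypothetical and the 32-bit width is an assumption; only the `intr &= ~written_val` step comes from the chat.

```python
class IntrRegister:
    """Interrupt status register where a written 1 clears the matching bit."""

    MASK = 0xffffffff  # assumed 32-bit register

    def __init__(self, value=0):
        self.value = value & self.MASK

    def raise_bits(self, bits):
        """Hardware side: mark interrupts as pending."""
        self.value = (self.value | bits) & self.MASK

    def write(self, written_val):
        """Driver side: clear every bit set in written_val, keep the rest."""
        self.value &= ~written_val & self.MASK
```

With bits 12 and 0 pending (`0x1001`), writing `0x1000` leaves `0x0001`, matching the "clears bit 12 and leaves other bits unchanged" description.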
18:34 mwk: *sigh*
18:34 mwk: I need a beefier machine
18:34 imirkin_: there are a few of those types of things all over too, not just intr. there are write bits, clear bits, etc
18:34 mwk:removed a debug printf from TableGen code, 1200 C++ files to recompile
18:34 mwk:goes to make a coffee
18:37 mooch: okay, that's been fixed, any other bugs?
18:37 karolherbst: mhh 54.7ms => 54.4 in pixmark_piano just by enabling the NV50PostRaConstantFolding pass :/
18:40 karolherbst: imirkin_: do you think it would be possible to teach RA about specific constraints? Like when having an mad instruction, choose the same reg for dest and src2
18:40 imirkin_: karolherbst: i've already done that
18:40 karolherbst: ohh
18:40 imirkin_: but only on nv50
18:41 karolherbst: I see
18:41 imirkin_: coz i didn't realize it was a restriction on nvc0 at the time
18:41 karolherbst: so on newer ISAs we would depend on good luck for now
18:41 imirkin_: karolherbst: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#n1470
18:42 imirkin_: karolherbst: just nuke the chipset restriction
18:42 karolherbst: I have no idea if we can do that on gk110+
18:43 Tom^: well i can test that no cant i
18:43 Tom^: :D
18:43 Tom^: *now
18:44 imirkin_: karolherbst: oh, but i think the restriction is a bit different there...
18:44 imirkin_: karolherbst: anyways... would have to figure it out
18:44 imirkin_: karolherbst: will also need some post-RA merging, since RA won't always decide to do what you wanted
18:44 karolherbst: imirkin_: yeah, we talked about that way back and we end up just enabling it for nvc0 isa
18:45 karolherbst: what do you mean?
18:45 imirkin_: no
18:45 imirkin_: there's a FMAD32I
18:45 imirkin_: which requires that dst == src2
18:45 imirkin_: so we never emit that
18:47 karolherbst: I was asking about the "post-RA merging" part though
18:51 karolherbst: imirkin_: enabling that RA thing for my GPU: total instructions in shared programs : 1824203 -> 1822604 (-0.09%) and total gprs used in shared programs : 218621 -> 218521 (-0.05%)
18:51 mooch: i just wish more people would contribute to my nv4 emulation project...
18:52 karolherbst: mooch: no idea how serious you are
18:52 imirkin_:wishes more people would contribute to nouveau
18:52 mooch: pretty serious, actually
18:52 mooch: i've got quake running in 1024x768
18:52 RSpliet: karolherbst: he's been around for quite a while now ;-)
18:52 karolherbst: yeah I know
18:52 mooch: *she
18:53 RSpliet: whoops, excuse me :-D
18:53 karolherbst: well
18:53 Calinou: if people stopped caring about legacy software, innovation would be like… twice as fast?
18:53 Calinou: :|
18:53 karolherbst: mooch: no offense meant, but nv4 is quite ancient and well, the practical use cases might be non existent. At least I don't see the point in doing this besides for educational reasons
18:54 Calinou: if you're doing educational things, try to make more useful things
18:54 Calinou: like, developing a game or game engine
18:54 mooch: karolherbst, i just want to preserve the card.
18:54 mooch: for history's sake!
18:54 karolherbst: well
18:54 imirkin_: Calinou: and you're the sole judge of useful?
18:54 Calinou: if you think DOS can be useful in 2016… :)
18:55 mooch: eventually, i'll try to at least get up to NV10 emulated
18:55 Calinou: there are plenty of missing bricks in free software
18:55 Calinou: it'd be nice if most of them were fixed before I die :D
18:55 karolherbst: mooch: well, my point is just, that time might be better spent, but if you like the work and continue with that, I don't want to stop you, I just dont quite get why you do this ;)
18:55 karolherbst: Calinou: :D
18:55 imirkin_: Calinou: i assume you're busy fixing some of those?
18:55 mooch: karolherbst, well, it's my passion
18:55 mooch: i just need some help is all. :/
18:55 RSpliet: karolherbst: I don't think there's a better way to learn about GPU hardware than to design it. Emulating comes pretty darn close ;-)
18:55 Calinou: imirkin_: yes, I work on open source games
18:55 Calinou: it takes time, but eventually it'll succeed
18:56 karolherbst: https://gist.github.com/karolherbst/1bd8c53af272a7ada9e0be78b2bf0dce :)
18:56 imirkin_: Calinou: so perhaps you're a bit biased in saying that a game engine is more useful than working on nv4 emu?
18:56 Calinou: surely
18:56 karolherbst: RSpliet: yeah, that's why I explicitly stated that for education this is indeed a valid task
18:57 karolherbst: mooch: well as long as you are having fun it is fine by me :)
18:57 mwk: I, for one, would love to preserve old hardware too
18:57 mooch: mwk: then why don't you help?
18:57 karolherbst: mooch: I just don't like seeing people wasting time with useless tasks, that's all
18:57 mooch: karolherbst, i don't consider preserving old hardware in software useless
18:58 RSpliet: mooch: I think mwk has helped more than you think, generating most of envytools' documentation ;-)
18:58 mooch: no i mean
18:58 karolherbst: mooch: I don't consider what you do useless because it has a valid purpose
18:58 mooch: specifically helping with the emulator
18:58 mooch: i mean, the docs i have can be pretty obtuse at times. :/
18:58 mooch: keep in mind, i'm just 17
18:58 karolherbst: I am sure mwk knowledge is better used elsewhere :D
18:59 mwk: karolherbst: maybe I could be the judge of that...
18:59 mwk: anyhow
18:59 karolherbst: anyway, he is busy writing llvm based stuff for falcons
18:59 karolherbst: which is pretty useful
18:59 karolherbst: :p
18:59 mwk: I've done my share of "emulating" nv cards
18:59 mooch: oh? how so?
18:59 mwk: haven't got very far with the nv1
18:59 mwk: basically, you never have enough time
18:59 mooch: mwk: maybe you'd get further with a more solid base. :^)
19:00 mwk: what solid base?
19:00 mooch: well, my implementation can at least do SVGA well
19:00 Calinou: mooch: wow, you're only 17 and doing all this? nice
19:00 mooch: except 1280x1024
19:00 Calinou: I'm 18 and started doing game development last year
19:00 mooch: but that's not a bug in my code
19:00 mooch: i checked
19:01 imirkin_: it does seem like there are a surprising number of youngin's who are interested in this ancient stuff
19:01 mwk: mooch: my problem is really that I want to do it bit-perfect
19:01 mwk: and there's never enough time for that
19:01 mooch: i'm fine with that!
19:01 mooch: BUT
19:01 mooch: there should be a fast path for those who need moar performance
19:02 mwk: I have the whole test harness that runs stuff on real hw and on software, looking for any difference
19:02 mwk: getting anything non-trivial right is rather involved
19:02 mooch: this is true
19:02 mwk: but I did manage to reconstruct the NV1 raster op pipeline and rasterization rules :)
19:03 mooch: that's why i can't even get win9x to use the tnt drivers
19:03 mwk: yup
19:03 mooch: some weird RMA bug, i suspect
19:06 mooch: weird, the emulator's gotten much more... crashy...
19:16 mwk: mooch: I don't know how far you're getting with pfifo, but... you also have ramht and ramfc addresses wrong
19:16 mooch: ah shit, really?
19:16 mwk: (val & 0xf0) for ramht should be (val & 0x1f0)
19:17 mwk: and it should be shifted by 8, not by 12 (since it's already at bit pos 4)
19:17 mwk: likewise for ramfc and ramro
19:23 mooch: mwk: what should ramfc and ramro be shifted by?
19:24 mwk: 4
19:25 mooch: k
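The RAMHT fix above can be written out as follows. The mask `0x1f0` and the shift of 8 are taken from mwk's messages; the generic helper and its names are assumptions for illustration. The chat gives a shift of 4 for RAMFC and RAMRO but does not state their masks, so those are left as arguments rather than guessed.

```python
def field_to_addr(val, mask, shift):
    """Mask a register field in place (without moving it down) and shift it."""
    return (val & mask) << shift


def ramht_addr(val):
    """RAMHT base: field is (val & 0x1f0), already at bit 4, shifted by 8."""
    return field_to_addr(val, 0x1f0, 8)
```

Because the field already sits at bit 4, shifting by 8 places its lowest bit at bit 12, so one step of the field moves the address by `0x1000`. Shifting by 12 instead would overshoot by a factor of 16.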
19:26 mwk: hmm
19:26 mwk: I really wonder if using the predicate registers on Falcon is a good idea
19:27 mooch: k mwk, that's fixed and pushed
19:27 mwk: I suppose I'll run away screaming real soon
19:27 mooch: oh? why?
19:27 imirkin_: coffee's too hot
19:27 mooch: lol
19:27 mwk: coffee's cold
19:27 mwk: that's a bit of a problem
19:28 mwk: maybe I should just make another one
19:28 mupuf: mwk: put it on your cpu, you must be using it extensively at the moment
19:32 karolherbst: does anybody have a very good idea how we could increase the achieved occupancy?
19:34 glennk: karolherbst, comparing against the blob?
19:34 karolherbst: already was there :/
19:34 karolherbst: it is a mess
19:34 karolherbst: because they schedule for real and sometimes there are like 20 instructions between use and source
19:34 karolherbst: mupuf: 54ms now :)
19:34 mupuf: karolherbst: and I guess it is hard to implement a real scheduler?
19:35 karolherbst: somewhat
19:35 glennk: scheduling tends to do that to code...
19:35 glennk: at least its not vliw!
19:35 mupuf: karolherbst: 3.7ms down, 10 more to go :D
19:35 karolherbst: yeah :D
19:35 karolherbst: though this is pretty good already
19:35 karolherbst: above 80% perf
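The numbers quoted here can be checked with a few lines of arithmetic. The frame times (57.7 ms stock nouveau, 54 ms current, 43.4 ms nvidia) are from the log; the function names are just for illustration.

```python
def saved_ms(before_ms, after_ms):
    """Frame time saved by the changes, in milliseconds."""
    return before_ms - after_ms


def relative_perf(reference_ms, ours_ms):
    """Our performance as a fraction of the reference (lower time is better)."""
    return reference_ms / ours_ms


stock, current, nvidia = 57.7, 54.0, 43.4
gained = saved_ms(stock, current)          # about 3.7 ms
remaining = saved_ms(current, nvidia)      # about 10.6 ms
fraction = relative_perf(nvidia, current)  # about 0.80
```

So "3.7ms down, 10 more to go" and "above 80% perf" both hold up: 43.4 / 54 is roughly 0.804, compared with roughly 0.752 for stock nouveau.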
19:36 glennk: mupuf, nuc's make good coffee warmers in my experience
19:36 karolherbst: now I turn off my scheduler and see what changes
19:36 mupuf: yeah, it is definitely getting there!
19:36 mupuf: glennk: really? My NUCs are pretty cold
19:36 mupuf: but I think I only got i3 nucs
19:36 glennk: including the power supply?
19:36 mupuf: hmm, good question
19:37 karolherbst: mhh like my scheduler didn't change a thing :/
19:37 glennk: the old celeron ones were particularly toasty
19:38 glennk: karolherbst, isn't there also the funky thread vote thing for somewhat divergent branches?
19:38 karolherbst: mupuf: well random scheduling should be funny again
19:38 karolherbst: glennk: no clue
19:38 imirkin_: glennk: yeah, we haven't implemented that.
19:38 karolherbst: now I am curious
19:38 mupuf: karolherbst: aren't there articles on how to do scheduling?
19:38 karolherbst: glennk: but that's not the issue
19:38 karolherbst: mupuf: well maybe? who knows if that's good for gpus?
19:39 mupuf: spacing the memory fetch from the user is typical
19:39 glennk: curious to know how you know thats not karolherbst ?
19:40 mupuf: and useful for cpus and gpus
19:40 karolherbst: glennk: ?
19:40 karolherbst: glennk: ohh
19:40 karolherbst: glennk: well, because branch efficiency is like 90%+
19:41 karolherbst: glennk: and achieved occupancy is like 20% when it comes to the worst in SR3
19:41 karolherbst: glennk: and there is a strong correlation between shitty perf and low occupancy
19:41 mupuf: ah ah!
19:41 mupuf: of course
19:41 mupuf: and what are the reasons for a low occupancy?
19:42 karolherbst: isn't this about latencies of instructions
19:42 karolherbst: and filling the waits with other stuff?
19:42 glennk: i'd probably look at memory bank conflicts
19:42 mupuf: there are also divergent branches
19:44 karolherbst: yeah, but that doesn't look like the problem
19:44 karolherbst: mupuf: as I said: branch efficiency is like 90%
19:45 mupuf: ok, then yeah, I would check information about caches and memory accesses
19:45 mupuf: and making sure you manage to have big memory bursts
19:45 glennk: well fwiw 90% hit ratio in a cpu branch predictor is considered "bad"
19:46 karolherbst: yeah well
19:46 karolherbst: but how much more perf would 95% give you?
19:46 glennk: depends on how costly a stall is
19:47 karolherbst: yeah well, but nvidia has not like 50% more perf, but more like 5x more perf
19:48 glennk: so start looking at the memory subsystem
19:55 karolherbst: glennk: any metrics you can suggest for that?
19:56 karolherbst: mhh
19:56 karolherbst: hakzsam: l1_global_load_miss is always 0?
19:57 karolherbst: and l1_global_load_hit is also always 0
19:57 glennk: good question, depends on what the shader is doing
19:58 karolherbst: glennk: well eon based game
19:58 karolherbst: they usually do a lot of computational stuff and texture things D:
19:58 karolherbst: :D
19:59 glennk: atomics, shared memory?
19:59 hakzsam: karolherbst, those have been buggy for a long time... they never even worked because a mux has to be enabled and the current interface doesn't allow that yet (part of my perf counters work)
20:00 hakzsam: so, expected results
20:00 karolherbst: hakzsam: ahh okay
20:00 karolherbst: "uncached_global_load_transaction" sounds interesting
20:00 karolherbst: everything should be cached, right? :D
20:01 karolherbst: I guess this also falls under the same category
20:02 karolherbst: hakzsam: maybe you have any ideas where I could check why occupancy is really really bad
20:03 hakzsam: karolherbst, well, having a look at the memory subsystem is probably a good start
20:03 hakzsam: and the number of GPRs might also be one reason of low occupancy
20:04 karolherbst: hakzsam: well, if it is something memory related, we should have a pretty decent memory load, right?
20:05 hakzsam: the thing is to increase the number of active warps, but a low occupancy doesn't always mean bad performance
20:05 hakzsam: I mean having 80% of occupancy is very nice
20:05 hakzsam: 100% is theory
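To make the link between GPR count and occupancy concrete, here is a rough sketch of the register-imposed upper bound on occupancy. The limits are the published Kepler per-SM figures (65536 registers, 64 resident warps, 32 threads per warp); the function name is made up for illustration, and allocation granularity, shared memory and block-size limits are ignored, so real numbers will be lower. Note that this is theoretical occupancy, not the achieved occupancy reported by the counters.

```python
# Simplified estimate: only register pressure is modelled.
REGS_PER_SM = 65536       # 32-bit registers per SM (Kepler)
MAX_WARPS_PER_SM = 64     # resident warp limit per SM (Kepler)
THREADS_PER_WARP = 32


def max_occupancy(gprs_per_thread):
    """Upper bound on occupancy imposed by register usage alone."""
    regs_per_warp = gprs_per_thread * THREADS_PER_WARP
    warps_by_regs = REGS_PER_SM // regs_per_warp
    return min(MAX_WARPS_PER_SM, warps_by_regs) / MAX_WARPS_PER_SM


if __name__ == "__main__":
    for gprs in (32, 48, 63):
        print(f"{gprs} GPRs/thread -> at most {max_occupancy(gprs):.0%} occupancy")
```

Under these assumptions a shader stays at full theoretical occupancy up to 32 GPRs per thread, and every register beyond that reduces the number of warps that can be resident.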
20:06 hakzsam: karolherbst, we should
20:06 karolherbst: hakzsam: well, in SR3 we usually have between 20% and 40%
20:06 hakzsam: but unfortunately, we still don't have all memory-related perf counters
20:06 hakzsam: yeah, it's quite bad
20:06 karolherbst: memory load is like 9% at most
20:07 hakzsam: do you have a trace somewhere btw ?
20:08 karolherbst: memory clock doesn't matter
20:08 karolherbst: just clocked from 4GHz to 1.6GHz
20:08 karolherbst: 18->17 fps
20:08 hakzsam: mmh
20:08 karolherbst: hakzsam: yeah, a 6.6GB one
20:08 karolherbst: hakzsam: and apitrace has 100% cpu load
20:08 karolherbst: :D
20:08 karolherbst: mhh
20:08 hakzsam: did you try to replay it with LGD?
20:08 karolherbst: achieved occupancy increases on downclock though
20:09 karolherbst: 40->57%
20:09 karolherbst: lgd?
20:09 hakzsam: the nvidia graphics debugger
20:09 hakzsam: or NGD or whatever :)
20:09 hakzsam: (to monitor perf counters on blob)
20:10 hakzsam: karolherbst, trace link please?
20:11 karolherbst: k
20:12 karolherbst: https://drive.google.com/open?id=0B78S7GSrzebId1pzOTQ4Y0FKRjg
20:12 karolherbst: I think this is the right one
20:13 hakzsam: karolherbst, well I think it should help you if I rebase my perf counters work even if it's not really stable (and sort of experimental)
20:13 mooch: imirkin, where are some linux distros that i can install to a hard drive that have bmdma turned off?
20:13 hakzsam: thanks
20:15 karolherbst: hakzsam: I think if we solve the perf issue with eon games, that would help quite a lot of people, :) hopefully
20:16 hakzsam: karolherbst, I'll try to find time this week :)
20:16 karolherbst: awesome :)
20:16 imirkin_: mooch: no clue. i haven't used a distro kernel in a decade
20:17 imirkin_: (actually much longer than that)
20:19 karolherbst: so in the end nobody thinks that this is a memory problem in this game or a really odd one
20:19 hakzsam: karolherbst, awesome, 2 fps on my gf119 (without any reclocking)
20:19 karolherbst: :D
20:19 karolherbst: wait a little
20:20 karolherbst: it gets worse
20:20 karolherbst: wait until he goes up the stairs
20:20 hakzsam: okay
20:20 karolherbst: it is a super serious game by the way, maybe you already noticed :D
20:21 karolherbst: hakzsam: where did you have 2 fps?
20:22 hakzsam: at beginning
20:22 hakzsam: 1 fps actually
20:22 karolherbst: uhh...
20:22 hakzsam: that's the bare minimum :)
20:22 karolherbst: fun time replaying the trace
20:22 karolherbst: maybe it is done tomorrow night
20:22 hakzsam: lol yeah
20:23 karolherbst: because at the beginning I have like 50fps
20:23 hakzsam: but you have a kepler with reclocking enabled?
20:23 karolherbst: yeah
20:23 hakzsam: I have a fermi without reclocking
20:23 hakzsam: so...
20:23 karolherbst: right
20:23 hakzsam: this was just for testing though
20:24 hakzsam: I could try at work, I have a gk106 there
20:29 mooch: imirkin, then how am i supposed to try the linux drivers?
20:31 kung: is there any list of "nicely" supported GPU's?
20:31 karolherbst: kung: tesla+kepler
20:31 karolherbst: kung: but what do you need? :D
20:32 kung: hm would be nice to have a gpu which is kinda silent, not so expensive and still quite fast
20:32 hakzsam: karolherbst, don't forget fermi ;)
20:32 imirkin_: mooch: huh? just boot a regular linux install in your vm...
20:32 karolherbst: hakzsam: well fermi misses reclocking
20:32 imirkin_: mooch: make sure you stick a kernel that disables bmdma on there...
20:33 mooch: ah, okay
20:33 karolherbst: kung: then a kepler would be good
20:33 hakzsam: karolherbst, yup, but except that it's good
20:33 karolherbst: kung: but usually with intel/amd you have better stability
20:33 karolherbst: just saying
20:33 kung: hm like crashing once an hour?
20:34 imirkin_: kung: if you're looking for serious gpu support on linux, you're going to be better off going with an AMD gpu
20:38 karolherbst: well or intel
20:39 imirkin_: "still quite fast" doesn't match intel hw...
20:48 vita_cell: the bad thing is that AMD gpus use blobs
20:48 imirkin_: blob has the implication that it's software that runs on the cpu. under that definition, there are no blobs.
20:50 vita_cell: but in the linux kernel you must use that non-free software for AMD gpus
20:50 vita_cell: AMD gpus almost don't work without that blob
20:50 imirkin_: nope. the linux kernel does not execute that "software". it's firmware, and it's executed by the gpu's command processor.
20:50 imirkin_: much as intel/amd cpu's don't work without their own microcode firmware.
20:51 vita_cell: just try deblobed kernel, and tell me if it runs fine
20:51 mupuf: vita_cell: you are missing imirkin_'s point entirely
20:51 mupuf: vita_cell: what's the difference between microcode and fixed logic?
20:52 imirkin_: vita_cell: just try to remove random lines of code in the kernel and see if it still works.
20:53 vita_cell: I hate non-free software, with Kepler I can run entirely free software OS, with deblobed kernel, but it is not possible with AMD gpu
20:53 imirkin_: you're just deluding yourself then
20:53 imirkin_: as long as you run it on a Pentium-IV or later (iirc), your CPU has microcode in it, which is highly closed.
20:53 vita_cell: Yes, I know that Intel/AMD CPUs have microcode, but this microcode is built-in, we cannot remove it
20:53 karolherbst: hate is unreasonable aversion
20:54 karolherbst: nothing good comes out of hate :p
20:54 chithead: non-free software is a tool for user subjugation, so it is perfectly fine to hate it
20:54 imirkin_: so you're making completely arbitrary, and imo nonsensical, distinctions between "types" of firmware.
20:54 karolherbst: chithead: nope, hate is bad in itself
20:54 imirkin_: karolherbst: would you say you hate hate? :)
20:54 karolherbst: :D
20:55 karolherbst: although it is reasonable thinking to say that hate doesn't lead to anything good
20:55 karolherbst: ;)
20:55 vita_cell: I don't like to use non-free software (excluding "hate")
20:56 karolherbst: well and then you also have reasons for that I assume
20:56 karolherbst: like you can't see what is going on there
20:56 vita_cell: right, yes
20:56 karolherbst: well, then using an intel CPU is your bigger problem anyway
20:56 vita_cell: yes
20:56 vita_cell: and AMD cpu too
20:57 karolherbst: well in theory nvidia could copy each image sent to your monitor and upload it somewhere
20:57 karolherbst: well the gpu
20:57 karolherbst: maybe
20:57 vita_cell: only Intel can mod or remove it, but with a gpu you have a choice,
20:57 karolherbst: maybe not
20:57 karolherbst: who knows
20:58 vita_cell: but Nvidia (excluding latest gens) does not need non-free software in the high level, so no non-free software in kernel or OS
20:58 karolherbst: ohh the piano got colors :O
20:58 karolherbst: what have I done
20:58 karolherbst: huh....
20:59 imirkin_: karolherbst: you've gone the max power way...
20:59 karolherbst: seriously...
20:59 karolherbst: that's just plain odd now
20:59 karolherbst: I only changed TargetNVC0::canDualIssue!
20:59 imirkin_: the wrong way, but faster!
20:59 karolherbst: and just returned false in a few cases
20:59 karolherbst: and now the colors are changing?
20:59 imirkin_: karolherbst: https://www.youtube.com/watch?v=7P0JM3h7IQk
21:00 mupuf: imirkin_: ahah
21:00 karolherbst: 52.5ms now
21:00 karolherbst: but what the heck
21:00 karolherbst: why should a change inside TargetNVC0::canDualIssue lead to like... oh wait
21:01 karolherbst: yeah, my PostRADualIssue pass did something wrong
21:01 karolherbst: odd
21:01 karolherbst: okay
21:01 karolherbst: we dual issue wrong
21:02 karolherbst: a bit
21:02 karolherbst: not much
21:02 karolherbst: we shouldn't dual issue instructions when the second one uses the result of the former one
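The rule stated here can be sketched as a simple register hazard check. This is not the actual TargetNVC0::canDualIssue logic; the Insn type and can_pair function below are hypothetical, each instruction is assumed to have exactly one destination register, and predicates, flags and memory effects are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Hypothetical post-RA instruction: one destination, some sources."""
    op: str
    dst: str
    srcs: tuple


def can_pair(first, second):
    """True if `second` neither reads nor overwrites the result of `first`."""
    reads_result = first.dst in second.srcs   # read-after-write hazard
    same_dst = first.dst == second.dst        # write-after-write hazard
    return not (reads_result or same_dst)


if __name__ == "__main__":
    mul = Insn("mul", "$r1", ("$r2", "$r3"))
    dependent = Insn("add", "$r4", ("$r1", "$r5"))
    unrelated = Insn("add", "$r6", ("$r2", "$r5"))
    print(can_pair(mul, dependent), can_pair(mul, unrelated))
```

A real pass would run a check like this in addition to the per-opcode pairing rules, not instead of them.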
21:02 karolherbst: still checking
21:03 Calinou: hi vita_cell
21:03 Calinou: did you fart on nvidia yet :D
21:04 Calinou: <karolherbst> well in theory nvidia could copy each image sent to your monitor and upload it somewhere
21:04 Calinou: that'd require atrocious amounts of bandwidth
21:05 karolherbst: Calinou: that's why I said in "theory" :p
21:05 vita_cell: hi Calinou
21:05 vita_cell: I think that Intel has more power with their "i" series backd00red CPUs
21:06 vita_cell: AMT, ME technologies
21:08 karolherbst: how are the warps managed in nouveau?
21:08 Calinou: there's no proof those backdoors have caused death, injury or significant financial loss though :/
21:08 vita_cell: yes, this is true
21:08 mupuf: Calinou: that would not be that bad
21:08 mupuf: especially since nvidia has this technology called nvenc :D
21:08 karolherbst: Calinou: ohh financial loss they did actually
21:09 vita_cell: but who knows, what they can do with that technology, in theory, they can block, power off, power on, any computer remotely
21:10 karolherbst: vita_cell: you lack the required amount of creativity here :D in theory you can even send data to another computer using only your sound card
21:10 karolherbst: well
21:10 karolherbst: not theory
21:10 karolherbst: because somebody already did this
21:10 Calinou: mupuf: what if a user has 50 KB/s of upload bandwidth?
21:10 vita_cell: lol
21:10 Calinou: it'd congest the user's bandwidth easily
21:11 vita_cell: the speaker windows virus used sound to transfer and infect other windows computers, this is not very new
21:12 karolherbst: vita_cell: nonono
21:12 karolherbst: vita_cell: I mean sound card
21:13 karolherbst: vita_cell: not through the speakers
21:13 karolherbst: you don't need speakers
21:13 vita_cell: ohhh, I don't know it
21:13 vita_cell: looks impossible
21:13 karolherbst: well you could also just record the noise of your CPU with a phone and collect data for crypto keys
21:14 Calinou: https://imgs.xkcd.com/comics/security.png
21:14 karolherbst: vita_cell: http://www.pcworld.com/article/2068525/researchers-create-malware-that-communicates-via-sound-no-network-needed.html
21:15 karolherbst: oh well, they still use the builtin speakers
21:16 karolherbst: but there was somebody who did this with the chip noise alone somewhere
21:17 vita_cell: wow, looks impossible
21:20 karolherbst: vita_cell: and this is just the boring stuff :D
21:21 karolherbst: imirkin_: is isCommutationLegal enough in RA or do I have to check something more?
21:22 karolherbst: *post RA
21:23 imirkin_: to determine what? whether commutation is legal?
21:23 karolherbst: well I need something like that:
21:23 karolherbst: a x z y b => a b x z y
21:23 karolherbst: and I need to know if I can move b there
21:24 karolherbst: currently I check if I can swap x/z, z/y, y/b and x/b but I think that's not enough
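One way to read the reordering "a x z y b => a b x z y": b has to be independent of every instruction it jumps over (y, z and x), while the x/z and z/y pairs do not matter because their relative order stays the same. A minimal sketch under that reading follows. The Insn tuple and both helpers are hypothetical and only track register reads and writes; flags, memory accesses, join and predication are ignored, and none of this is the real isCommutationLegal.

```python
from collections import namedtuple

# Hypothetical instruction model: one destination register, a tuple of sources.
Insn = namedtuple("Insn", ["name", "dst", "srcs"])


def independent(a, b):
    """No RAW, WAR or WAW register hazard between a and b."""
    return (a.dst not in b.srcs        # b does not read what a writes
            and b.dst not in a.srcs    # a does not read what b writes
            and a.dst != b.dst)        # they do not write the same register


def can_hoist(seq, idx, target):
    """Can seq[idx] be moved up to position `target` (target <= idx)?"""
    moved = seq[idx]
    return all(independent(other, moved) for other in seq[target:idx])


if __name__ == "__main__":
    a = Insn("a", "$r0", ("$r10",))
    x = Insn("x", "$r1", ("$r0",))
    z = Insn("z", "$r2", ("$r1",))
    y = Insn("y", "$r3", ("$r2",))
    b = Insn("b", "$r4", ("$r11",))
    print(can_hoist([a, x, z, y, b], 4, 1))
```

Checking each crossed instruction against b directly avoids relying on pairwise swaps composing correctly.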
21:24 karolherbst: ohh wait
21:49 imirkin_: skeggsb: is this the GM107 version of the tfb thing? nvkm_wr32(device, GPC_UNIT(0, 0x3018), 0x00000001);
21:58 karolherbst: imirkin_: how can I identify instructions like "join mad ftz f32 $r26 neg $r27 $r19 $r27" ?
21:58 imirkin_: can you make that a multiple choice question?
21:59 karolherbst: mhhh
21:59 karolherbst: I mean the join part
21:59 imirkin_: what do you mean by 'identify'
21:59 karolherbst: ahh
22:00 karolherbst: there is a join flag on the instruction object
22:00 imirkin_: yes.
22:00 karolherbst: didn't see it at first
22:01 karolherbst: yeah, I don't want to move them around
22:01 karolherbst: mupuf: 52.5 ms :D
22:02 karolherbst: so we don't want to dual issue if the result is used later
22:05 karolherbst: ohh
22:05 karolherbst: actually we can't
22:06 karolherbst: inst_issued1: 514M -> 493M inst_issued2: 240M -> 252M
22:08 karolherbst: metric-issue_slot_utilization: 158% -> 159%... oh well
22:09 mupuf: karolherbst: pretty nice :)
22:10 karolherbst: credit goes to glennk :D
22:10 karolherbst: he gave me the idea
22:11 karolherbst: nearly 83% perf now
22:13 karolherbst: mhh issue slot utilization is also like 27% in SR3
22:13 karolherbst: 15% even
22:13 karolherbst: mhh something is very wrong here
22:16 karolherbst: maybe we just misconfigure the GPU?
22:35 mupuf: karolherbst: I strongly suggest to follow the suggestion of hakzsam and use LGD to see what you can expect from this benchmark
22:35 mwk: there are no Falcons with >32kiB data memory, right?
22:35 mupuf: that will give you a target
22:35 mupuf: mwk: who better than you to know this?
22:36 karolherbst: mupuf: I don't get it to run on my system
22:36 mupuf: ...
22:36 karolherbst: it can't connect to my local ssh daemon
22:36 karolherbst: no idea hwy
22:36 karolherbst: *why
22:36 mupuf: hmm, well, you can use reator
22:37 karolherbst: mhh
22:37 mupuf: this is what hakzsam did for REing
22:37 karolherbst: actually a good idea
22:37 karolherbst: I hope there is space for a 6GB apitrace :O
22:37 mupuf: there should be in /home
22:37 karolherbst: right
22:38 karolherbst: well the issue is somewhat, that I also want to know the metrics with nouveau
22:38 mwk: right, there are none
22:38 mwk: so my attempt to support 24-bit addresses is a total overkill...
22:39 mupuf: lol
22:39 mupuf: we always used 16bit pointers, why change?
22:40 mupuf: well, unless we actually do what nvidia does ... which is to use the VRAM to store the program
22:40 mwk: mupuf: code addresses, annoyingly enough, are 17-bit on the biggest Falcon
22:40 mupuf: AHAHAH
22:40 mwk: which means I have to use 32-bit void *
22:41 mwk: even though 16-bit would entirely suffice for data addresses
22:41 imirkin_: or have a 17-bit intptr_t :)
22:41 mwk: anyhow
22:42 mwk: I'm writing GlobalAddress lowering
22:42 mwk: the address space is kind of important here
22:42 mwk: if things fit in 15 bits, I can just use a mov to load the damn address
22:43 mwk: if 16 bits are needed, I have to use mov + sethi 0
22:43 karolherbst: mupuf: "connection failed."
22:44 mupuf: karolherbst: what the heck
22:44 karolherbst: exactly
22:44 mwk: or, if I'm doing an addition anyway, use an add instruction with 16-bit immediate... because that one is unsigned
22:44 karolherbst: mupuf: I am sure they bundle some libraries which mess up stuff
22:44 mwk: for >16 bits, the next level is 24-bit - mov + 8-bit sethi
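The size tiers mwk lists can be summarised as a small selection function. This is only a sketch of the decision as described in the chat, not the actual lowering code: the function name and the returned strings are made up for illustration, the add-with-immediate shortcut is left out, and the 15-bit cutoff is assumed to come from mov sign-extending its immediate, which the log implies but does not state.

```python
def address_load_sequence(addr):
    """Pick the instruction sequence for materialising a data address.

    Tiers follow the chat: up to 15 bits -> mov; 16 bits -> mov + sethi 0;
    up to 24 bits -> mov + 8-bit sethi. Operand syntax is not modelled.
    """
    if addr < 0:
        raise ValueError("address must be non-negative")
    if addr < (1 << 15):
        return ["mov"]                    # fits without touching the high bits
    if addr < (1 << 16):
        return ["mov", "sethi 0"]         # clear the (assumed) sign extension
    if addr < (1 << 24):
        return ["mov", "sethi imm8"]      # low 16 bits, then 8 high bits
    raise ValueError("address does not fit in 24 bits")


if __name__ == "__main__":
    for addr in (0x12, 0x7FFF, 0x8000, 0x1FFFF):
        print(hex(addr), address_load_sequence(addr))
```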
22:44 mupuf: karolherbst: did you forget to set the port?
22:44 karolherbst: nope
22:44 mupuf: oh. you mean LGD
22:44 karolherbst: yeah
22:44 mupuf: well, did you create a tunnel?
22:45 karolherbst: tunnel?
22:45 mwk: ah screw that, I'll optimize that thing later
22:46 mwk: I'm thinking of making special "short data" sections for small variables, btw
22:46 mwk: linker would stuff these at the beginning of data segment, so that we can stuff their addresses straight into ld/st instructions
22:47 mwk: eg. clr %r0; ld.b %r1, 0x12(%r0)
22:48 mupuf: ahah
22:48 mwk: this requires stuffing 0 to some register, but it's still a shorter sequence than mov+ld, and the 0 can be reused for another small data access
22:50 mupuf: why are you optimizing so early in the development? :p
22:51 mwk: mupuf: this is one of the few optimizations here that requires ELF support, and I'd like to finish this part already
22:51 mupuf: ack :)
22:51 mwk: as in, I'd need 2 new relocations
23:00 karolherbst: uhh dri3 vdpau support
23:02 imirkin_: no prime though
23:02 karolherbst: meh :/
23:02 karolherbst: useless then :D
23:03 imirkin_: well, last time i was going to see about hooking up prime for vdpau, i realized it had 0 dri3 support
23:03 imirkin_: this time it might be a bit more fruitful
23:03 karolherbst: yeah, hopefully
23:11 orbea: What's the proper way of tracing a graphical issue (utterly broken colors) with a wine game?
23:11 imirkin_: apitrace wine
23:11 orbea: I tried following this guide, no go :/ https://github.com/apitrace/apitrace/wiki/WINE
23:11 imirkin_: do you know if it's only on nouveau, or is it a wine issue?
23:11 orbea: not sure
23:11 imirkin_: well, you don't want the wine apitrace thing
23:12 imirkin_: you want to apitrace wine :)
23:12 imirkin_: i.e. you want the GL cmdstream, not the D3D cmdstream
23:13 orbea: that is what I tried first, maybe I'm getting it wrong? http://dpaste.com/03RSEKN
23:14 imirkin_: yeah... you want the WINEPREFIX on the outside
23:15 imirkin_: since it's an env var setting, interpreted by bash, not by execve()
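The point about the shell versus execve() can be demonstrated without wine. The pasted command is not shown in the log, so the exact mistake is an assumption; the sketch below uses a made-up DEMO_PREFIX variable in place of WINEPREFIX and a Python child process in place of wine. When the caller sets the variable in the environment, the child sees it. When "NAME=value" is handed over as the first word of the command without a shell, it is simply treated as the name of a program to execute.

```python
import os
import subprocess
import sys

# Child process that reports whether it sees DEMO_PREFIX (a stand-in for
# WINEPREFIX; the name is invented for this example).
CHILD = [sys.executable, "-c",
         "import os; print(os.environ.get('DEMO_PREFIX', 'unset'))"]


def run_with_env():
    """The caller places the variable in the child's environment."""
    env = dict(os.environ, DEMO_PREFIX="/tmp/prefix")
    result = subprocess.run(CHILD, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def run_with_assignment_as_argv0():
    """Without a shell, 'NAME=value' is just a program name to look up."""
    try:
        subprocess.run(["DEMO_PREFIX=/tmp/prefix"] + CHILD, check=True)
    except FileNotFoundError:
        return "not found"
    return "ran"


if __name__ == "__main__":
    print(run_with_env())
    print(run_with_assignment_as_argv0())
```

In shell terms this corresponds to writing the assignment before the outermost command, so that the shell exports it to everything that command starts.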
23:15 orbea: oh...thanks...
23:24 mwk: perfect, GlobalAddress now lowers :)
23:24 orbea: so, any thoughts, nouveau, mesa or wine issue? http://ks392457.kimsufi.com/orbea/stuff/trace/xanadu-wine-preloader.trace.xz Wait till after the title screen, the name change and short intro text, it will be very obviously rainbow colored...
23:24 imirkin_: i'm on i965 right now, let's see what happens there...
23:26 imirkin_: orbea: looks pretty rainbow colored to me here on i965/SKL
23:26 orbea: so probably wine or mesa
23:26 orbea: thanks
23:27 imirkin_: orbea: http://i.imgur.com/S55tCh8.png
23:29 orbea: yep, looks same here
23:30 karolherbst: orbea: tried nine?
23:30 orbea: what do you mean specifically? I have it compiled into mesa?
23:31 imirkin_: you need a handful of patches on top of wine to use it
23:31 karolherbst: well you can use d3d9 somewhat natively
23:31 orbea: would it be included in winestaging?
23:31 karolherbst: but this issue should get fixed too anyway
23:31 karolherbst: orbea: nope
23:31 orbea: i'll look up setting up a install install with that, but I will submit it to wine too
23:32 orbea: *wine isntall
23:32 karolherbst: well I have a nine patch which kind of works on top of staging
23:32 karolherbst: because nine also uses some staging patches and it is a bit of a mess up :/