01:31 mwk: ugh, modeling the CCs separately was a horrible idea
01:31 mwk: it tries to spill the Z flag...
01:46 mooch: mwk: do you think you could try to make a PFIFO emulator
01:47 mooch: something that just emulates PFIFO circa NV4
01:47 mooch: if you could get something like that working, i could gather info from it and put that into my emulator
01:47 mooch: and then hopefully actually work on pgraph
01:48 imirkin_: pgraph is going to be a bit of a bother :)
01:49 mooch: same with pfifo, what's your point? :^)
01:50 imirkin_: no... pgraph bother == (pfifo bother)^5
01:53 mooch: oshit
01:53 mooch: what's so bad about pgraph tho
01:53 imirkin_: pfifo basically does nothing
01:53 imirkin_: it reads commands in
01:53 imirkin_: and sends them out to the relevant engines
01:54 imirkin_: pgraph has 2d and 3d pipelines
01:54 imirkin_: well, pipeline might be generous
01:54 imirkin_: but ... functionality :)
01:54 imirkin_: NV4 didn't do TNL, but it did rasterization
01:54 mooch: but pfifo has all these damn status flags that if you don't get EXACTLY RIGHT
01:54 mooch: NT4 hangs
01:55 imirkin_: i didn't say pfifo wasn't annoying
01:55 imirkin_: all i said was that pgraph is going to be WAY harder
01:55 mooch: also, the latest NV4 drivers for NT4 require service pack 4, which on my emulator, you can't login
01:55 mooch: fair enough
02:31 mwk: I compiled strlen :)
02:33 imirkin_: correctly? :)
02:33 mwk: seems so
02:40 mwk: alright, so the only critical feature left is... TargetFrameLowering
02:42 mwk: this is going to suck badly
02:44 mwk: I wonder if it's remotely possible to do something useful with the [$sp+$rX*4] addressing mode
02:46 mwk: maybe if I just smack every stack-relative reference with >> 2, LLVM will be able to make some sense of it
02:51 mooch: what are you trying to do exactly?
02:51 mooch: are you trying to compile x86 code?
02:54 imirkin_: fuc, not x86
02:55 mooch: oh
02:55 mooch: weird how it would have similar addressing modes
03:25 mwk: mooch: they're not similar at all
03:26 mwk: x86, with a few minor exceptions, has a single addressing mode: regbase + regidx * {1, 2, 4, 8} + offset
03:26 mwk: where you can skip both regbase and regidx
03:26 mwk: it allows you to address pretty much anything reasonable
03:27 mwk: Falcon has several modes: regbase ; regbase + regidx * {1, 2, 4} ; regbase + offset ; $sp + offset ; $sp + regidx * {1, 2, 4}
03:28 mwk: and $sp doesn't count as a proper register
03:29 mwk: so if you want to do something like "$sp + $r0 + 4", you're out of luck
03:29 mwk: you can fold out the "$r0 + 4" part to another register and transform it to "$sp + $r1", which is not bad
03:30 mwk: but if you're addressing a 32-bit word, you have to use the "4" multiplier
03:31 mwk: so either you hammer it in, and fold it like that: "$r1 = ($r0 + 4) >> 2"; "ld whatever [$sp + $r1*4]"
03:31 mwk: which is not exactly pretty, but should work (Falcon requires things to be aligned, so the >> 2 can't destroy anything)
03:32 mwk: ... or just give up, and read $sp to proper register and compute the address manually
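A minimal sketch of the folding mwk describes above, written in Python rather than Falcon assembly. The function names are hypothetical and nothing here comes from the actual backend; it only checks the arithmetic: for a 4-byte-aligned byte offset, shifting right by 2 and letting the `*4` scale undo it reproduces the original address.

```python
def fold_sp_index(r0: int, const: int = 4) -> int:
    """Compute the index register for a `[$sp + $rX*4]` access.

    Models "$r1 = ($r0 + 4) >> 2" from the discussion above.
    """
    byte_offset = r0 + const
    # Falcon requires aligned accesses, so the low two bits must be zero
    # for a 32-bit word; otherwise the shift would lose information.
    if byte_offset % 4 != 0:
        raise ValueError("unaligned 32-bit stack access")
    return byte_offset >> 2


def effective_address(sp: int, r1: int) -> int:
    """Address formed by the `$sp + regidx * 4` mode."""
    return sp + r1 * 4


# Illustrative values only.
sp, r0 = 0x1000, 0x20
r1 = fold_sp_index(r0)
assert effective_address(sp, r1) == sp + r0 + 4
```

The alignment check stands in for the hardware requirement mentioned at 03:31; a real backend would rely on that guarantee instead of testing it at run time.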
07:17 imirkin: hakzsam: looks like "ld lock" on nvc0 always uses s[0x0] instead of the right place in memory... oops? :)
07:18 imirkin: Instruction *ld =
07:18 imirkin: bld.mkLoad(TYPE_U32, atom->getDef(0),
07:18 imirkin: bld.mkSymbol(FILE_MEMORY_SHARED, 0, TYPE_U32, 0), NULL);
07:18 imirkin: heh
07:31 hakzsam: imirkin, fermi, kepler or both?
07:31 imirkin: both. check the patch
07:32 imirkin: i sent a fix, although it doesn't fix the application =/
07:32 hakzsam: which app?
07:33 imirkin: elemental with your images patch series, and overrides to get it to GL 4.3
07:33 imirkin: and a few additional fixes to mesa
07:38 hakzsam: okay
08:19 axl_gento: hello
08:38 axl_gento: could anyone give me some pointers? I have a GTX960 card and wanted to try the nouveau driver. I have latest fw, kernel 4.6, mesa 1.18.3 and nouveau 1.0.12. still when i try to start x, it complains (EE) Unknown chipset: NV126. the weird thing is that gdm starts with wayland. X works with fbdev. the module loads in the kernel (even though it complains DRM: unknown connector type 70). any ideas?
08:41 Calinou: it should work if you have 4.6 kernel since there is the firmware now
08:41 Calinou: but it's experimental
08:42 axl_gento: i thought so too.
08:42 axl_gento: 4.6.0-gentoo #1
08:43 axl_gento: nouveau shows up at lsmod. even sensors work
08:43 axl_gento: nouveau-pci-0600
08:43 axl_gento: Adapter: PCI adapter
08:43 axl_gento: GPU core: +0.99 V
08:43 axl_gento: power1: 16.14 W
08:43 axl_gento: could it be that my board is different ?
09:45 karolherbst: ahh okay, now I understand this better
09:46 karolherbst: better dual issuing decreases the issue slot utilization, because two instructions go into the same issue slot
09:54 karolherbst: Tom^: I might have patches to increase performance in unigine heaven
10:01 karolherbst: ohh wait
10:01 karolherbst: dual issuing doesn't change much in the slot utilization, my other opts just decreased it
10:01 karolherbst: mhhh
10:01 karolherbst: maybe this makes sense in the end
10:02 karolherbst: new theory: nouveau just doesn't utilize the issue slots enough and that causes perf to decrease
10:11 karolherbst: 267.198 ->273.404 score
10:14 karolherbst: and improved dual issuing doesn't change a thing in the performance
10:15 karolherbst: which makes sense cause our issue slot utilization is bad
10:15 karolherbst: and in pixmark piano it makes a huge difference because we have like 158% there
12:15 karolherbst: hakzsam: what is the correlation between active_warps and warps_launched?
12:16 karolherbst: I have 13.2G active_warps and 2.20M warps launched
12:23 karolherbst: hakzsam: by the way, I can use 5 counters in total usually :)
12:25 karolherbst: odd
12:25 karolherbst: active_cycles in SR3 is like 15% of pixmark_piano, that sounds wrong
12:37 mwk: ugh
12:37 mwk: why do "caller-saved" and "callee-saved" differ by only one letter...
12:38 karolherbst: mupuf: okay something goes wrong in the ssl handshake :/
12:39 karolherbst: lol
12:39 karolherbst: thos ciphers
12:39 karolherbst: *those
12:40 karolherbst: yeah, if the client only supports sha1 for hashing no wonder
12:40 karolherbst: I've disabled that like a long time ago in ssh...
12:44 pmoreau: karolherbst: That seems weird (for active and launched warps). The opposite would make more sense IMHO
12:44 karolherbst: pmoreau: I know
12:44 karolherbst: that's why I asked
12:45 pmoreau: From the NSight VS documentation: "Active Warps – A warp is active from the time it is scheduled on a multiprocessor until it completes the last instruction. The active warps counter increments by 0-48 per cycle. The maximum increment per cycle is defined by the theoretical occupancy."
12:46 pmoreau: and "Active Warps per SM # warps that were active on the SM per cycle"
12:46 pmoreau: Doesn’t seem to be exactly what you are getting though
12:49 karolherbst: maybe hakzsam just calculates the value wrong or something
12:50 pmoreau: Maybe it misses a division by the number of cycles? I don’t know if it’s supposed to be an average or min/max value
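A rough sketch of the normalization pmoreau suggests. If `active_warps` is an accumulating counter that grows by 0 to 48 every cycle (as the NSight text quoted above says), dividing by the cycle count would give an average per-cycle value. The cycle count below is a made-up number for illustration, not a measurement from this log, and whether the driver's counter is summed across SMs is not established here.

```python
MAX_WARPS_PER_CYCLE = 48  # upper bound quoted from the NSight documentation


def avg_active_warps(active_warps: int, cycles: int) -> float:
    """Average number of active warps per cycle from an accumulated counter."""
    if cycles <= 0:
        raise ValueError("cycle count must be positive")
    return active_warps / cycles


# 13.2G is the raw counter value reported above; the cycle count is hypothetical.
raw_counter = 13_200_000_000
hypothetical_cycles = 400_000_000
avg = avg_active_warps(raw_counter, hypothetical_cycles)
assert 0 <= avg <= MAX_WARPS_PER_CYCLE
```

Read this way, a raw counter far larger than `warps_launched` would not be surprising, since each launched warp contributes once per cycle for as long as it stays active.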
12:50 karolherbst: anybody any idea where the fail is here? https://gist.github.com/karolherbst/f489cbaf1568bb07c790d41f02a675e8
12:50 karolherbst: pmoreau: yeah, maybe
12:52 pmoreau: No sorry. I’m not that into debugging ssh.
12:52 karolherbst: me neither
12:53 karolherbst: that's what you get for statically linking in the ssh client library...
12:53 karolherbst: seriously, for the sake of security, all ssh libs should convert to LGPL or something
12:53 karolherbst: :D
12:53 Calinou: and enjoy dependency hell, Fedora's OpenSSL breaking things… :|
12:54 karolherbst: I don't care, they should fix that
12:54 karolherbst: bundling ssh libs is the worst you can do
12:55 karolherbst: especially if there are security issues and you _have_ to run sshd unsecure because your applications are shitty because they bundle ssh
12:55 karolherbst: ...
12:55 Calinou: the trend today tends to be, statically link the most things you can
12:55 Calinou: because dynamic linking is simply not possible due to distributions being half-assed
12:56 Calinou: (this is pretty much what AppImage does by the way)
12:56 Calinou: http://appimage.org
12:56 karolherbst: well the LGD also ships libcrypto and libssl ...
12:57 karolherbst: meh
12:57 karolherbst: you don't statically link crypto libs
12:57 karolherbst: and if you do, you shouldn't ship that or be able to ship an update in 24 hours
12:59 karolherbst: libssh2-1.4.3 is from 2012!
12:59 karolherbst: wut...
12:59 karolherbst: maybe I can do something with LD_PRELOAD magic
13:00 karolherbst: sadly I can't
13:01 karolherbst: am I wrong with the thinking that you shouldn't ship a product with a 3.5 years old crypto lib statically linked in?
13:12 pmoreau: Depends on your side: if I wanted to hack you, I would be (most likely) super happy to find such an old version! ;-)
13:13 karolherbst: :D
13:24 mwk: alright, here's an attempt at documentation for the compiler: http://0x04.net/~mwk/Falcon.html
13:24 mwk: well, more of a wishlist for now :)
13:26 mwk: I suppose I need a section on inline assembly and a big section on linker features... but linker will come later
13:26 RSpliet: mwk: I'm considering archiving more (G)DDR datasheets than you currently have
13:26 RSpliet: from different memory chips encountered in the wild - with slight differences in (E)MR values
13:27 RSpliet: I'm rather empathic about where and how, but since you currently have a selection on 0x04.net, any preferences?
13:28 mwk: *shrug* you can dump it on me if you want
13:28 mwk: or we could create another repo on mupuf's
13:29 RSpliet: hmm, I doubt it'll exceed 20 pdfs, maybe a full repo is a bit overkill
13:30 RSpliet: it might be nice though having a GPU -> datasheet mapping. Maybe I should stuff them inside the vbios repo (or link as an external git module), and use symlinks in the respective VBIOS folders for those cards
13:31 RSpliet: overengineering? maybe...
13:33 mwk: eh
13:34 mwk: writing all that compiler stuff broke my right shift key
13:35 * mwk expected some hardware would have to be sacrificed for nouveau development, but expected it to be a burnt gpU
13:36 karolherbst: mwk: how would lldb work later?
13:37 karolherbst: are there some debugger interfaces we could use?
13:37 mwk: karolherbst: yep
13:37 karolherbst: awesome :)
13:37 mwk: v4 has a nice debugging register set visible from host
13:37 karolherbst: yay
13:37 mwk: on v0 and v3, you'd need a debugging stub
13:37 karolherbst: this was one of the most annoying things while writing pmu code
13:38 karolherbst: ahh
13:38 karolherbst: yeah, well
13:38 mwk: though v3 has nice hardware breakpoints
13:38 karolherbst: I think it is plenty if we can debug on v4 already, because the code should be the same
13:38 karolherbst: sure, we will get some differences and odd issues
13:38 Weaselweb: mwk: do you have some source for falcon support?
13:38 mwk: Weaselweb: what source?
13:39 Weaselweb: mwk: Falcon LLVM backend http://0x04.net/~mwk/Falcon.html
13:39 karolherbst: mwk: we would have C++ support without the standard library, right?
13:39 karolherbst: and some fancy features
13:39 mwk: Weaselweb: yeah, I'm working on an LLvm backend
13:39 mwk: this is a wishlist
13:39 RSpliet: karolherbst: other options include implementing a functional model for falcon in SystemC, or a synthesisable model in *HDL for your FPGA to increase exposure :-P
13:40 mwk: right now I can compile some simple C code
13:40 mwk: I'll throw it on github once it's in reasonable shape
13:40 karolherbst: RSpliet: :D
13:40 Weaselweb: mwk: so no public repo yet? I'm playing with coldfire support on llvm, but don't exactly know how to write those .td
13:40 RSpliet: maybe tie it in with mooch's emulator to get meaningful response from register read/writes ;-)
13:41 mwk: Weaselweb: hah, you're in luck
13:41 mwk: Falcon is in many ways similar
13:43 mwk: I'll tell you... it's taken a while to hammer the whole thing to compile
13:43 mwk: I spent a day or so before I got llc to link
13:44 mwk: and I still have loads of places where I #if 0'd the code wholesale
13:44 mwk: Weaselweb: got to go to a lecture, we can talk in 3 hours or so if you're still there
13:45 mwk: Weaselweb: oh, and you might want to join #llvm on oftc
13:48 Weaselweb: mwk: I'll be on in about 3.5 hours, feel free to ping me
14:22 karolherbst: mupuf: maybe I should read out the speedo value somewhere else like in pfuse? This way it is a bit easier to adapt to chipset specific stuff
14:29 Tom^: karolherbst: cool
14:29 Tom^: karolherbst: but doesnt that mean a bit increased in general?
14:31 karolherbst: maybe
14:32 karolherbst: Tom^: https://github.com/karolherbst/mesa/tree/for_real_opts
14:32 hannu1: why does plasma freeze?
14:32 Tom^: karolherbst: 12 commits behind imirkin:master. is the TEX patch not merged yet?
14:34 karolherbst: no clue
14:34 karolherbst: rebase it on imirkin:master
14:34 Tom^: i guess i could just apply your commit ontop of my local branch
14:34 Tom^: oor rebase yes.
14:35 karolherbst: it is a bit experimental though
14:35 karolherbst: no clue what might break
14:35 Tom^: thats the way i like things.
14:35 Tom^: it's most fun when you are on the edge of a catastrophe.
14:36 karolherbst: :D
14:36 karolherbst: well
14:36 karolherbst: I have to look how the fmad thing is done on gk110+ GPUs
14:39 karolherbst: Tom^: I would say maybe you get like 2 or 3 % more perf in heaven
14:40 mlankhorst: mwk: is iowrs also exported?
14:40 mlankhorst: on the falcon llvm page
14:46 kyamashita: What skills would I need if I were to contribute to Nouveau?
14:49 kyamashita: Aside from reading the Introductory Course page on the Nouveau wiki
15:03 mupuf: karolherbst: Seems like GPU boost 3 is just a sw feature: http://arstechnica.com/gadgets/2016/05/nvidia-gtx-1080-review/
15:03 karolherbst: mupuf: like every other GPU boost version too
15:03 hannu1: hi
15:03 mupuf: ok, let me rephrase, it is a user interface
15:03 karolherbst: ahh
15:04 karolherbst: ohh
15:04 karolherbst: real time preemption
15:04 karolherbst: fun
15:04 mupuf: boost is mostly sw, but hey, they did validation to tell you which voltage should be used
15:04 mupuf: yeah, real time preemption is ... interesting!
15:04 mupuf: not sure how it works without a big performance penalty
15:04 mupuf: in any case, it is just the 3d pipeline state that they switch
15:04 karolherbst: mupuf: where is the user interface thing?
15:05 mupuf: "Overclocking: Yes, you can (and should) do it"
15:05 karolherbst: yeah, found it now...
15:06 karolherbst: ohh
15:06 karolherbst: well
15:06 mupuf: maybe there is hw support for reporting faults
15:06 karolherbst: 2,025MHz ...
15:06 karolherbst: the hell
15:06 mupuf: or more tables to do memory link training
15:06 mupuf: yeah, this is insane :o
15:06 karolherbst: well
15:07 karolherbst: to be fair I can run my gpu also with -0.1V
15:07 imirkin: axl_gento: did you figure it out? you need to use the "modesetting" ddx. you may also want to update to mesa 11.2 if you want accel. if you still have issues, pastebin dmesg and xorg logs
15:07 karolherbst: which means +135MHz is usually stable enough
15:07 karolherbst: (862 vs 997)
15:07 mupuf: :)
15:08 karolherbst: so nothing new
15:13 karolherbst: mupuf: but yeah, nvidia needs to know how stable the cores are with the applied voltage. sounds like fun
15:13 karolherbst: just troublesome if that would be a windows only feature
15:13 mupuf: it likely will be
15:13 mupuf: but hey, let's not care about overclocking
15:14 mupuf: we have bigger gains to get from just fixing our shit :D
15:14 mupuf: oh oh, bdw got ogl4.2!
15:14 mupuf: nice :)
15:16 karolherbst: yeah
15:16 karolherbst: well, it looks pretty decent now though
15:17 karolherbst: just that downclock on high temperature, which looks just messy, but not hard and then power budgets, which will be a little bit annoying
15:17 mupuf: yeah, and then we can move to dvfs
15:17 mupuf: and have a look at this linebuffer
15:18 karolherbst: and then we should check why some games run real bad
15:23 imirkin: karolherbst: can you see what happens when you run the UE4 elemental demo with https://github.com/imirkin/mesa/commits/gl43 ?
15:24 imirkin: it fails to render properly on nvc0, but i want to see if it's a similar fail on kepler
15:27 karolherbst: imirkin: I guess 4.3 is required for that?
15:28 imirkin: karolherbst: well, the branch should expose GL 4.3
15:28 karolherbst: right, just asking because I started it with stock and the window just stayed black
15:28 imirkin: oh, with stock it should work
15:28 imirkin: it takes a while to get going though
15:29 karolherbst: ahh okay
15:29 imirkin: the GL 3.2 renderer works fine with nouveau
15:29 karolherbst: maybe that was it
15:29 imirkin: make sure to wait a few minutes before giving up on it :)
15:29 karolherbst: :D
15:29 karolherbst: minutes?
15:29 karolherbst: I usually give up after 10 seconds
15:30 karolherbst: it looks like it adapts the quality on the fly?
15:30 imirkin: it definitely scales up as you increase the window size
15:30 imirkin: (or down as you reduce it)
15:30 imirkin: i normally run it with ResX=640 ResY=480
15:31 imirkin: (you can just pass those as args)
15:31 karolherbst: how small :D
15:31 imirkin: i have a GF108
15:31 imirkin: which is at a middle clock level
15:31 * mupuf usually runs it in 64x64 to be cpu limited :D
15:31 karolherbst: :D
15:31 imirkin: mupuf: but then i can't tell if it's rendering correctly
15:31 imirkin: 640x480 is a good compromise
15:32 mupuf: yep, different goal :p
15:32 karolherbst: on 640x480 I get like 25 fps :O
15:32 imirkin: iirc i decided 320x240 was too small too
15:32 karolherbst: ohh no
15:32 karolherbst: now I get over 45
15:32 imirkin: karolherbst: with GL 3.2 renderer or 4.3?
15:32 karolherbst: 3.2
15:32 imirkin: ah ok
15:32 karolherbst: well
15:32 karolherbst: it compiles with all cores in the background though
15:32 karolherbst: :D
15:33 imirkin: even on my GF108 i think i have like 5 fps
15:33 karolherbst: done compiling
15:33 imirkin: after hakzsam is done with his fermi images stuff, i'm going to swap the GK208 in and never look back
15:33 karolherbst: yay glsl430
15:33 karolherbst: ohh yeah
15:33 karolherbst: it looks like garbage
15:33 imirkin: mind taking a screenshot of the garbage?
15:34 imirkin: curious if it's the same garbage, or different garbage, than on nvc0
15:34 imirkin: also, do you see any errors in dmesg?
15:34 karolherbst: ohh wait
15:34 karolherbst: if you change the resolution it ungarbages it a bit
15:34 karolherbst: well it looks like a rendered image just gets stuck
15:34 karolherbst: but parts are updated
15:34 imirkin: yeah
15:34 karolherbst: layered wise
15:34 imirkin: same issue as on nvc0
15:34 karolherbst: k
15:35 imirkin: anything in dmesg?
15:35 karolherbst: yeah
15:35 imirkin: care to share?
15:35 karolherbst: MEM_OUT_OF_BOUNDS
15:35 imirkin: oooh
15:35 imirkin: i didn't get any of those
15:35 karolherbst: well, I have 3GB vram, just saying :D
15:36 imirkin: that just means it hit an unmapped page i think
15:36 imirkin: or went too far reading into a constbuf
15:37 karolherbst: imirkin: mhh
15:37 karolherbst: I only get the errors when I resize the window
15:37 karolherbst: imirkin: when I do nothing, I don't get any errors in dmesg
15:38 imirkin: ah ok
15:38 karolherbst: well
15:38 karolherbst: now I don't get those anymore
15:38 karolherbst: ahhh!
15:38 karolherbst: okay
15:38 karolherbst: here is the deal
15:39 karolherbst: parts of the benchmarks ran just fine
15:39 karolherbst: but
15:39 karolherbst: while the benchmarks runs fine, I get those errors in dmesg
15:46 mwk: mlankhorst: __falcon_iowrb
15:55 mwk: mupuf: any comments on the Falcon doc?
15:56 * mwk needs to add information about inline asm, overlays, linker usage, runtime lib
15:56 mwk: the overlays are going to be fun, for one
15:57 mwk: maybe that's what the "input file" specification in linker scripts is for :)
16:02 mupuf: it seemed fine for my usage
16:02 mupuf: mwk: ^^
16:04 mwk: no requirements about reading two IO ports at one time? boring :)
16:18 RSpliet: mupuf: in hard real-time systems, the penalty for full preemption is often justified in the light of reducing the worst-case response time of higher priority tasks
16:18 RSpliet: even if a context switch takes 10x as long as it takes now
16:19 RSpliet: furthermore, if you "drain" your cores from work rather than storing individual threads' state (complete register file), you can get by with a much smaller memory transfer while bounding preemption delay
16:49 mupuf: RSpliet: I fully agree, but we are talking about GPUs here, where throughput is really important
16:50 mupuf: I assume that it may only be triggered in rare cases, as you said, when the deadline approaches for doing physics or it really is taking too long
16:50 mupuf: I would love to know if there is any configuration needed
16:50 mupuf: pascal is going to be a big thing for us
16:53 imirkin_: neat. i965 showed the same corruption as nouveau with elemental. but then it crashed shortly thereafter.
16:54 Tom^: but didnt elemental work quite ok a few months ago?
16:54 imirkin_: with the GL 3.2 renderer
16:54 imirkin_: this is with the 4.3 renderer
16:54 Tom^: oh i see
16:54 imirkin_: still works with the 3.2 renderer :)
16:58 karolherbst: Tom^: and any changes in perf in heaven?
16:59 Tom^: ive not compiled
16:59 Tom^: too tired and lazy today :p
17:04 mlankhorst: mwk: ah that's what the barrier was
17:04 mlankhorst: too much kernel work for me lately, thought it was a compiler barrier
17:24 RSpliet: mupuf: we're also talking about Real-Time systems, where latency is really important :-D
17:27 RSpliet: I certainly hope full preemption is not a thing by default on desktop workloads
17:27 RSpliet: I expect it to require a separate FECS/GPC firmware
17:29 RSpliet: don't overestimate the cost though, including the register file we're talking something like 4MB per SM
17:29 RSpliet: while the bandwidth on GPUs appears to be balanced around 15GiB/s
17:29 RSpliet: per SM
17:32 RSpliet: wait, no that's not right, the reg file is much smaller
17:34 RSpliet: anyway, think in the order of < 0.1ms on kernels that can take up to 50-100ms. Analysis-wise, accounting for full preemption inflates the WCET of an individual task with < 1% while reducing its worst-case priority-inversion blocking with 50-100 ms ;-)
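The percentages above can be checked with a few lines of arithmetic. This sketch assumes a 0.1 ms preemption cost (the upper bound RSpliet gives) and treats the 50 ms and 100 ms kernel lengths as the task's WCET; the function name is hypothetical.

```python
def wcet_inflation(preempt_cost_ms: float, kernel_ms: float) -> float:
    """Relative WCET increase from charging one full preemption to a task."""
    return preempt_cost_ms / kernel_ms


for kernel_ms in (50.0, 100.0):
    inflation = wcet_inflation(0.1, kernel_ms)
    # Both cases stay below the "< 1%" figure quoted above.
    assert inflation < 0.01
    print(f"{kernel_ms:.0f} ms kernel: +{inflation:.2%} WCET")
```

The other side of the trade is the blocking term: without preemption, a higher-priority task may have to wait for a whole 50-100 ms kernel to finish, which is the delay the small overhead buys back.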
17:54 pascalgp104: http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_1080/images/features3.jpg Pascal GPU family supports HEVC Main10/Main12 hardware decoding & Main10 hardware encoding, very nice improvements compared to GM200/204 which didn't support any hardware decoding & GM206's Main10 hardware decoder
17:56 francua: pascalgp104: idk
18:40 karolherbst: oh well
18:40 karolherbst: no 1080 vbios then...
21:10 karolherbst: hakzsam: is there a counter to know how long the gpu is doing stuff?
21:10 karolherbst: like time based
21:10 karolherbst: something like: while calculating this frame, the gpu spend 80% doing nothing
21:11 imirkin_: there's GL_TIMESTAMP, but we do it wrong ;)
21:11 karolherbst: actually I like the idea that the eon games are bad because of the compilations in every frame
21:11 karolherbst: so that we: compile, draw, compile, draw, compile, draw ... and so on
21:11 imirkin_: it's a nice idea ;)
21:11 karolherbst: and while we compile, the gpu does nothing
21:12 karolherbst: => bad perf
21:12 imirkin_: well, the bigger thing is that you have to upload new code
21:12 karolherbst: this would also explain why it is worse in more complex scenarios
21:12 imirkin_: and flush
21:12 karolherbst: also true
21:13 karolherbst: imirkin_: what is the best way to see if the engine really compiles every frame?
21:13 imirkin_: add a driver counter that counts compilations
21:13 karolherbst: or printf whenever we compile?
21:13 imirkin_: sure, but then you can't put it up in the HUD
21:14 karolherbst: well, that doesn't matter for now
21:14 karolherbst: or maybe there is a hud entry for that already
21:15 karolherbst: nope
21:15 karolherbst: why not? :D
21:16 karolherbst: imirkin_: nv50_ir_generate_code? or is there some other point where I could count?
21:16 imirkin_: i forget... something like that
21:22 karolherbst: imirkin_: uhhh
21:22 karolherbst: they do
21:22 karolherbst: every frame
21:22 karolherbst: ohh it stopped in the main menu
21:24 karolherbst: well
21:24 karolherbst: CPU usage is also at 33%
21:28 hakzsam: karolherbst, there is not, actually it's not yet exposed, and I don't have time right now :/
21:29 karolherbst: no worries
21:31 karolherbst: what is "gred" by the way?
21:33 hakzsam: global reduction
21:33 Calinou: http://arstechnica.com/gadgets/2016/05/nvidia-gtx-1080-review
21:33 Calinou: there's DRM implemented to prevent usage of 3/4-way SLI by default now
21:34 Calinou: "What is surprising, however, is that by default, anything other than two-way SLI, or two-way SLI with an additional card for physics processing, is locked out by the hardware. DirectX 12 games that support "multi display adaptor," where any number of mixed GPUs are controlled by the game directly, will still work. Anything involving SLI with more than two cards under Nvidia's driver is locked out."
21:34 hakzsam: karolherbst, gred_count and atom_count will most likely return 0 if there is no atomic operations
21:34 Calinou: ethics.html Cannot GET (404 Not Found)
21:34 karolherbst: hakzsam: right
21:34 karolherbst: hakzsam: I just didn't find your link with the description of all metrics
21:34 karolherbst: but now I found it
21:35 karolherbst: ohh there is no gred count there
21:35 karolherbst: uhh right
21:35 karolherbst: because it is only for metrics
21:35 hakzsam: maybe I should add a description for each event in the code..
21:35 hakzsam: that might help
21:35 karolherbst: hakzsam: or add it to envytools
21:35 karolherbst: or the nouveau wiki
21:36 hakzsam: yeah
21:36 karolherbst: no clue what those prof_trigger_00 - 07 are too
21:36 hakzsam: sort of user events
21:36 hakzsam: don't look at them
21:36 karolherbst: k
21:37 hakzsam: for debugging purposes actually, with cuda essentially
21:37 karolherbst: mhh
21:37 hakzsam: so, there are not really useful :)
21:38 karolherbst: now I am running out of ideas slowly, even crazy ideas aren't helpful :/
21:38 karolherbst: sm_efficiency mhh
21:39 hakzsam: noy yet exposed...
21:39 hakzsam: *not
21:39 karolherbst: right, I just look at things which might be interesting to have
21:39 hakzsam: okay
21:39 karolherbst: tex_cache_hit_rate also sounds like something useful, though no idea how that would help us
21:40 karolherbst: uhh
21:41 karolherbst: *_utilizations
21:41 karolherbst: those sound useful
21:41 hakzsam: like what?
21:41 karolherbst: alu_fu_utilization
21:41 karolherbst: or dram_utilization
21:41 karolherbst: or tex_utilization
21:41 hakzsam: yeah, they are
21:41 karolherbst: maybe they could help us
21:42 karolherbst: basically if one of them is 100%, we have found our bottleneck, right?
21:42 karolherbst: or is it a bit more complex than this?
21:42 hakzsam: I think it's a bit more complicated than that, but if one engine is at 100%, yeah you have most likely a bottleneck
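A small sketch of the heuristic discussed above: look at the `*_utilization` metrics and flag the busiest unit if it is close to saturation. The metric names are the ones mentioned in the conversation; the threshold, the function name and the sample values are assumptions for illustration, and as hakzsam notes the real picture is likely more complicated.

```python
def likely_bottleneck(utilization: dict[str, float], threshold: float = 0.95):
    """Return the most utilized unit if it is at or above `threshold`, else None."""
    if not utilization:
        return None
    unit = max(utilization, key=utilization.get)
    return unit if utilization[unit] >= threshold else None


# Made-up sample values, not measurements.
sample = {
    "alu_fu_utilization": 0.42,
    "dram_utilization": 0.97,
    "tex_utilization": 0.61,
}
assert likely_bottleneck(sample) == "dram_utilization"
```

A `None` result only means no single unit is saturated; it does not rule out stalls caused by how the units interact.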
21:42 karolherbst: hakzsam: anyway, I tried to get the LGD working, but ssh connection fails for me and I have no clue why... any ideas?
21:43 hakzsam: mmh, I don't remember all the specifics, let me try again on fermi
21:43 karolherbst: flop_sp_efficiency uhhh
21:43 karolherbst: and eligible_warps_per_cycle
21:43 karolherbst: also nice
21:43 hakzsam: those are metrics, right?
21:43 karolherbst: yeah
21:44 hakzsam: which probably use both MP and global perf counters
21:44 hakzsam: some perf counters are just crazy btw
21:44 hakzsam: some of them use like multiple passes and a ton of hw counters
21:44 karolherbst: ugh :/
21:44 hakzsam: and multiple passes seem like crazy to implement
21:45 hakzsam: so they won't happen quickly
21:45 karolherbst: well at least those utilization things sounds really nice to have
21:45 hakzsam: yeah
21:45 hakzsam: so, where is LGD on my system? I don't even remember that :)
21:45 karolherbst: :D
21:45 karolherbst: mine is in usr/local
21:46 hakzsam: maybe I just removed it
21:47 hakzsam: I did
21:49 hakzsam: downloading
21:52 hakzsam: karolherbst, works fine here
21:52 karolherbst: mhh
21:52 karolherbst: what version of openssh do you have?
21:53 karolherbst: hakzsam: and with working you mean you can also connect to sshd I assume?
21:53 hakzsam: OpenSSH_7.2p2, OpenSSL 1.0.2h 3 May 2016
21:53 hakzsam: yep
21:54 karolherbst: odd
21:54 hakzsam: but I don't have blob, so I can't test but it can connect locally
21:54 karolherbst: yeah well, it won't connect for me
21:54 hakzsam: what's the error?
21:54 karolherbst: "connection failed"
21:55 karolherbst: sshd log: https://gist.github.com/karolherbst/f489cbaf1568bb07c790d41f02a675e8
21:56 hakzsam: weird
21:56 karolherbst: yep
21:56 karolherbst: uhh
21:56 karolherbst: it doesn't even try to authenticate
21:56 karolherbst: at least I am sure the password isn't sent over
21:57 hakzsam: I guess ssh username@localhost works?
21:58 karolherbst: yeah
21:58 karolherbst: it does
21:59 hakzsam: that's unexpected...
22:01 karolherbst: uhhh
22:01 karolherbst: I changed something in the sshd config
22:01 karolherbst: now it does more
22:01 hakzsam: what did you change?
22:02 karolherbst: more stuff at once...
22:02 karolherbst: but stuff like pma, strict mode...
22:02 hakzsam: maybe share your sshd_config?
22:04 karolherbst: ahh found it
22:04 karolherbst: "PasswordAuthentication no"
22:04 hakzsam: :)
22:04 karolherbst: "To disable tunneled clear text passwords, change to no here!"
22:04 karolherbst: ...
22:04 hakzsam: that should work now
22:05 karolherbst: well now it hangs on connect
22:06 karolherbst: mhh
22:06 karolherbst: sshd fork has 100% cpu
22:08 karolherbst: maybe my sshd is just messed up...
22:08 hakzsam: yeah
22:12 hakzsam: xexaxo, btw, is the schedule delayed by one week for mesa (to be sure I will have time to merge my stuff) ?
22:12 karolherbst: well with my distro default config it won't work :/
22:14 karolherbst: hakzsam: mind sending me your config and I check what is different?
22:19 hakzsam: karolherbst, http://hastebin.com/birobifoce
22:19 karolherbst: thanks
22:32 karolherbst: hakzsam: lol...
22:32 karolherbst: hakzsam: I disabled some compiler opts, guess what
22:39 karolherbst: uhh
22:40 karolherbst: well
22:40 karolherbst: having an optimus system makes it a bit complicated now
22:40 hakzsam: works now?
22:40 karolherbst: yeah well
22:40 karolherbst: somehow
22:40 karolherbst: it won't start debugging anything
22:40 karolherbst: because my system libgl.so isn't nvidia's :)
22:41 karolherbst: because I can fake it
22:43 karolherbst: oh well