02:14 karolherbst: imirkin_: how much perf do you think can be gained through this zcull thingy? and how much work would be required to figure out what to do?
02:16 karlmag_: karolherbst: do you ever sleep? I am under the impression that I see you here at all times of day (and night) :-P
02:16 karolherbst: yes, I do sleep :p
02:17 karlmag: good to know :-D
02:17 karolherbst: but maybe I have less fixed sleeping times than others
02:17 karlmag: ah, right
02:17 karlmag: If I could, I would have too
02:18 karolherbst: actually do you know the 28 hours day from xkcd? :D I tried that out once
02:19 karolherbst: I wouldn't try that, it is kind of heavy
02:20 karlmag: hehe...
02:20 karlmag: just looked it up
02:20 karlmag: so.. what's kind of heavy? The mom? :-P
02:20 karlmag:ducks
02:25 karolherbst: no, sleeping at day
02:25 karlmag: if exhausted enough I'd sleep at any time. :-P
02:27 karolherbst: :D
02:36 hakzsam: karolherbst, I don't work on PDAEMON perf counters (currently but that's going to change I think), so all stuff related to MP counters and PCOUNTER are in my area :)
02:47 karolherbst: hakzsam: okay, because I already figured out how to read the load out :)
02:47 karolherbst: it is a bash script though, but would that make sense to you? or didn't you dig into that yet? https://gist.github.com/karolherbst/538d5429587dada60845
02:48 karolherbst: it is a mix of what I found in the gk20a source and the reg values for me with the blob
02:49 karolherbst: I think 0x10a568 shows the video accel engine load though, without ever having tried it out
02:49 hakzsam: karolherbst, I'll have a look after lunch
02:51 karolherbst: okay, thanks
03:01 Caterpillar2: I am writing from a machine with a monitor that uses VGA cable. xrandr http://paste.fedoraproject.org/273064/44360566 shows only 1024x768 as max resolution and I would like to set 1680x1050. Is there a way to unlock the resolution choices without having to mess with xorg.conf?
03:04 pmoreau: Is Nouveau loaded? Could you paste your Xorg.0.log please?
03:06 Caterpillar2: pmoreau: http://paste.fedoraproject.org/273070/44360757
03:07 pmoreau: Nouveau seems to load correctly, but doesn't find a 1680x1050 mode
03:08 Caterpillar2: pmoreau: does VGA port support resolution information exchange between the computer and the monitor?
03:09 pmoreau: Nouveau does receive the EDID from the screen it seems
03:10 pmoreau: Could you paste the output of dmesg as well please?
03:12 Caterpillar2: pmoreau: http://paste.fedoraproject.org/273074/36079691
03:23 Caterpillar2: pmoreau: I think I should file a bug report
03:34 Caterpillar2: http://nouveau.freedesktop.org/wiki/DumpingVideoBios/ assuming that you have debugfs mounted on "/sys/kernel/debug"
03:34 Caterpillar2: how can I check that?
03:35 prg: is that dir empty?
03:38 Caterpillar2: prg: no http://paste.fedoraproject.org/273094/09496144
03:38 prg: then it's mounted
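
The same check can be made against the mount table instead of looking at whether the directory is empty. A minimal sketch, assuming the usual Linux `/proc/mounts` layout (device, mount point, filesystem type, ...); the function name `is_debugfs_mounted` is made up for illustration:

```python
def is_debugfs_mounted(mounts_text, mount_point="/sys/kernel/debug"):
    """Return True if a debugfs filesystem is listed at mount_point."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[1] == mount_point and fields[2] == "debugfs":
            return True
    return False


# Sample text shaped like /proc/mounts; on a real system, pass
# the contents of /proc/mounts instead.
sample = "debugfs /sys/kernel/debug debugfs rw,relatime 0 0\n"
print(is_debugfs_mounted(sample))
```
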
03:39 Caterpillar2: prg: thank you
03:39 prg: yw
03:45 Caterpillar2: filed bug report https://bugs.freedesktop.org/show_bug.cgi?id=92192
03:46 Caterpillar2: could you please check if dmesg is okay or if I should ask the system for deeper debug info?
03:46 Caterpillar2: (log_buf_len=1M )
03:46 Caterpillar2: I haven't set it yet because it is a server machine
04:00 pmoreau: Caterpillar2: Sorry, I had to go. Looking at the dmesg now
04:00 Caterpillar2: ;-)
04:01 pmoreau: Apart from "nouveau W[ DRM] unknown connector type 01", nothing special
04:02 Caterpillar2: pmoreau: so do you suggest I restart with log_buf_len=1M ?
04:03 pmoreau: Connector type 01 is DVI-A
04:03 pmoreau: You don't seem to be missing any information from your dmesg
04:03 pmoreau: So no need for that
04:03 Caterpillar2: pmoreau: yes it is a dvi
04:04 pmoreau: You should still be able to add a new mode using xrandr
04:05 pmoreau: imirkin_: -^ Any suggestions / ideas?
04:05 Caterpillar2: pmoreau: do you have the command? So that I can use the native resolution while the bugreport is being checked
04:05 Caterpillar2: thx
04:06 pmoreau: run man xrandr, and look towards the end of the output
04:06 pmoreau: You have an example on how to set a new mode: "Forces to use a 1024x768 mode on an output called VGA:"
04:08 pmoreau: bbl
04:08 Caterpillar2: pmoreau: there are 3 different commands there
04:08 Caterpillar2: xrandr --addmode VGA
04:08 Caterpillar2: xrandr --output VGA --mode
04:08 Caterpillar2: xrandr --newmode
04:08 Caterpillar2: mmh
04:08 pmoreau: One to create the new mode, one to add it to the output's mode, and one to switch to that mode
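
The three steps pmoreau describes (create the mode, add it to the output, switch to it) can be sketched as a small helper that only builds the command lines and does not run them. This is a sketch under assumptions: the output name `VGA-1` and the mode name are placeholders (check `xrandr` for the real output name), and the timing numbers are what `cvt 1680 1050 60` typically prints, so regenerate them locally rather than trusting them.

```python
def xrandr_mode_commands(output, mode_name, modeline):
    """Build the three xrandr invocations: create, attach, switch."""
    return [
        # 1. Create the new mode from its timing parameters.
        ["xrandr", "--newmode", mode_name] + modeline.split(),
        # 2. Add the mode to the list of modes of the given output.
        ["xrandr", "--addmode", output, mode_name],
        # 3. Switch the output to the new mode.
        ["xrandr", "--output", output, "--mode", mode_name],
    ]


# Assumed values, for illustration only.
MODELINE = "146.25 1680 1784 1960 2240 1050 1053 1059 1089 -hsync +vsync"

for cmd in xrandr_mode_commands("VGA-1", "1680x1050_60.00", MODELINE):
    print(" ".join(cmd))
```
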
04:10 Caterpillar2: ok
04:11 Caterpillar2: now I have to go, thank you for your help
04:11 pmoreau: You're welcome
04:47 tagr: imirkin_, rektide2_: X1 is going to be very similar to K1, it's mostly an incremental upgrade
04:47 tagr: there are a few bits related to multimedia that are new, but display and the GPU are roughly the same
04:48 tagr: so there's the same split between display and GPU and the plan is to make both work in much the same way as K1
04:48 tagr: once we solve the problem for how to seamlessly integrate both it'll get solved for both K1 and X1 at the same time
05:04 karolherbst: wuut, my laptop bios has override key combinations for controlling my fans :O
05:06 xexaxo: karolherbst: i2c fan ? I've seen some of the initscripts tinkering around with those.
05:07 karolherbst: no, EC controlled one
05:07 karolherbst: it may be i2c or pwm or anything, but it's attached to the motherboard
05:07 karolherbst: and the EC controls it depending on the temp the gpu exports
05:08 karolherbst: my fans are too loud in general though, so I'm searching for a solution
05:09 karolherbst: there is something really odd with my CPU C states
05:17 xexaxo: hmm I misread laptop bios as gpu bios :)
05:18 karolherbst: well, i would be more worried about why the heck the gpu knows what I do with my keyboard then ;)
05:20 karolherbst: mhhh, I don't know, but somehow intel_pstate does stupid things for me
06:07 RSpliet: let me just drop this here: http://fpaste.org/273132/43618463/
06:09 RSpliet: (G94, VirtualBox trying to boot a Windows 8.1 guest. Fedora 22, kernel 4.3rc1 plus my reclocking patches, libdrm-2.4.62-2.fc22.x86_64, mesa-libGL-10.6.3-3.20150729.fc22.x86_64, xorg-x11-drv-nouveau-1.0.11-2.fc22.x86_64)
06:17 RSpliet: sorry, this is GT200 rather, not G94
06:48 fling: Hello.
06:48 fling: Which device in /dev/ do I need for nouveau driver to work?
06:48 fling: I'm trying to run gui in an lxc container.
06:48 fling: https://www.stgraber.org/2014/02/09/lxc-1-0-gui-in-containers/
06:49 fling: nvidia.ko people are mounting /dev/nvidia* there but we don't have this.
06:50 pq: fling, /dev/dri/*
06:50 fling: pq: looks like I needed something like this -> lxc.cgroup.devices.allow = c 226:* rwm
06:50 fling: pq: thanks, will test it now.
06:52 fling: pq: did the trick! thanks
07:47 karolherbst: hakzsam: I guess there is no way to use native drivers inside a vm without a vt-d cpu?
07:48 hakzsam: nope
07:49 karolherbst: and you need a display port on your gpu? :/
07:49 karolherbst: didn't know that
07:50 hakzsam: heh? a display port to use vt-d?
07:50 karolherbst: I meant any port on the gpu
07:50 karolherbst: for a display
07:50 karolherbst: (the name display port is really badly chosen :D )
07:50 hakzsam: RSpliet, you got your article, nice job btw ;)
07:50 RSpliet: hakzsam: :-P
07:51 karolherbst: :D
07:51 karolherbst: the guy didn't do the gddr5 benchmark, now I am angry at him :p
07:51 RSpliet: aren't we all lazy deep inside?
07:51 hakzsam: karolherbst, when I tried vt-d on my main desktop, I used two monitors btw
07:52 karolherbst: hakzsam: yeah I read your stuff about that
07:52 karolherbst: RSpliet: hey gddr5 nouveau!
07:52 karolherbst: :p
07:52 kubast2: kubast2 here[gtx 650 ,GK107] ,Is there something I can test out ?
07:52 hakzsam: karolherbst, oh cool!
07:52 karolherbst: "Similar Kepler improvements will eventually happen inspired by these DDR3 changes. " lol :p
07:52 RSpliet: karolherbst: artistic freedom
07:52 karolherbst: :)
07:53 RSpliet: you have to admit the coverage is nice
07:53 karolherbst: kubast2: if my gddr5 patches work for you, I only have those pcie speed patches left
07:53 RSpliet: you'll have your own article when your clock patch is out on the ML :-P
07:53 karolherbst: I already had one
07:53 karolherbst: without even posting on the ML :p
07:54 karolherbst: he promised benchmarks though :(
07:54 kubast2: cloning master_karol
07:54 karolherbst: RSpliet: https://www.phoronix.com/scan.php?page=news_item&px=Nouveau-GDDR5-Branch
07:54 karolherbst: kubast2: nope
07:54 karolherbst: _stable
07:54 RSpliet: there you go!
07:54 kubast2: sudo git clone https://github.com/karolherbst/nouveau.git -b master_karol_stable
07:55 kubast2: doing it
07:55 karolherbst: I may have to rebase pcie_speed changes, but for kepler it doesn't matter
07:55 karolherbst: or wait
07:55 karolherbst: maybe I do it
07:55 karolherbst: kubast2: wait :D
07:55 karolherbst: kubast2: pushed the branch now, now you can clone
07:55 karolherbst: and you don't need sudo for git
07:56 karolherbst: you shouldn't do anything internet related with sudo if you can avoid it
07:56 karolherbst: RSpliet: I just hoped he will do some benchmarks :/
07:57 kubast2: to be honest I have a weird bug
07:57 karolherbst: would be nice to see how the performance goes in general
07:57 kubast2: that doesn't allow git to create nouveau folder
07:57 karolherbst: kubast2: then fix it ;)
07:57 kubast2: even though I deleted it with rm -rf
07:57 karolherbst: kubast2: mhhhh
07:57 kubast2: fatal: could not create work tree dir 'nouveau'.: Permission denied
07:57 kubast2: kubast@Kuba-Pc ~/nouveau $ ls
07:57 kubast2: envytools
07:57 karolherbst: it should work
07:57 kubast2: wait
07:57 hakzsam: karolherbst, be patient, he'll do it for sure :)
07:57 kubast2: got an idea
07:57 karolherbst: hakzsam: yeah, I know
07:58 karolherbst: the patch will be merged for 4.4 anyway, so
07:58 karolherbst: I just hoped that I get more testing before that, but I got that anyway, so I am fine
07:58 kubast2: works now
07:58 karolherbst: hakzsam: skeggsb thought earlier, that my patch may only work for mobile chips, because they don't drive a display
07:59 karolherbst: kubast2: nice
07:59 kubast2: forgot I was in # folder instead of ~\
07:59 karolherbst: hakzsam: and I wanted to know how "big" the improvement is in general, and how stable
07:59 hakzsam: karolherbst, yeah, I know
07:59 karolherbst: RSpliet: and what gddr5 thingy did you want to test?
08:00 hakzsam: karolherbst, what do you want me to look at in your pdaemon script?
08:00 karolherbst: hakzsam: if that makes sense to you
08:00 karolherbst: I want to have something like that in mainline nouveau in the end anyway
08:00 karolherbst: I feel that it may be useful information for finding bottlenecks
08:01 hakzsam: sure
08:02 RSpliet: karolherbst: someone needs to figure out what happens if DLL is enabled on GDDR5
08:02 karolherbst: RSpliet: I can do that
08:03 RSpliet: whether some MR trickery is required to reset the DLL after changing the clock when enabled
08:03 karolherbst: but what is the benefit of doing so?
08:03 RSpliet: like done on so many other cards
08:03 RSpliet: stability
08:03 RSpliet: reliability
08:03 karolherbst: ohhh
08:03 karolherbst: mhhh
08:03 karolherbst: then I can't test that, because my gpu doesn't crash anymore at 0f :/
08:03 karolherbst: so I can't tell you if anything is better
08:03 RSpliet: that's all right, it requires little more than a fake VBIOS that enables the DLL on a perflvl
08:04 hakzsam: karolherbst, on which chipset did you test your script?
08:04 RSpliet: and a bit of trace
08:04 karolherbst: hakzsam: gk106
08:04 karolherbst: RSpliet: mhh, I can't fake my vbios
08:04 hakzsam: karolherbst, because signals may be different between chipsets
08:04 karolherbst: hakzsam: I know that it might be
08:05 karolherbst: it looked a little bit different on the gk20a side
08:05 karolherbst: but the reset I took from there
08:05 RSpliet: karolherbst: and I don't have a GDDR5 card, so I can't do the dev work for this :-P
08:05 karolherbst: and it works, so I think resetting the regs is the same at least across all kepler
08:05 karolherbst: RSpliet: hehe
08:06 karolherbst: RSpliet: I just know that switching pstate 1000 times may crash my gpu, but this was before mupuf's pwm voltage patches
08:07 karolherbst: hakzsam: I have the feeling, that it doesn't matter what regs are used anyway
08:07 karolherbst: 0x4 to configure the output of 0x8
08:07 hakzsam: karolherbst, why don't you configure COUNTER_MASK[0x7]? because you read after...
08:07 karolherbst: 0x10a574 ?
08:08 hakzsam: yeah, you have N counters and you can do whatever you want
08:08 karolherbst: it is 0x0 with the blob,
08:08 hakzsam: ok
08:08 hakzsam: makes sense
08:08 kubast2:updates initramfs, forgets to copy compiled kernel module
08:08 karolherbst: also 0x3 is used for that one in c
08:08 karolherbst: and 0x2 for the others
08:10 karolherbst: and the 0x6X regs may be used for the video accel engine
08:10 karolherbst: can't test it though, because vdpau on a hybrid gpu setup is somehow painful
08:10 hakzsam: well okay, I think you should first store the previous state of PMU before doing your stuff and restore it at the end, but the most important thing is to *exactly* understand what you count with those signals
08:10 karolherbst: yeah I know
08:10 kubast2: "[ 2.430359] [drm] Initialized nouveau 1.3.0 20120801 for 0000:01:00.0 on minor 0" "[ 0.863063] nouveau 0000:01:00.0: pci: pcie max speed: 8.0GT/s" ""
08:11 karolherbst: but the regs were 0x0 with nouveau
08:11 karolherbst: kubast2: seems okay
08:11 karolherbst: kubast2: with the pstate file you can switch pstates
08:11 karolherbst: which will change pcie speed
08:11 hakzsam: karolherbst, 0x10a574 is probably used for counting the number of cycles
08:11 karolherbst: you can check with lspci -vv
08:11 karolherbst: hakzsam: yeah
08:11 karolherbst: that's at least the "total" value used in gk20a
08:12 kubast2: su root
08:12 hakzsam: so, the number of cycles ;)
08:12 karolherbst: hakzsam: https://github.com/karolherbst/nouveau/blob/master/drm/nouveau/nvkm/subdev/pmu/gk20a.c#L100
08:12 karolherbst: yeah
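
The idea discussed here, a busy counter read against a "total" cycle counter, boils down to a ratio. A minimal sketch, assuming both values have already been read from the counters (the register reads themselves are not shown, and `load_percent` is a name made up for this example):

```python
def load_percent(busy_count, total_count):
    """Busy cycles as a percentage of total cycles, clamped to 0..100."""
    if total_count <= 0:
        # Nothing sampled yet; avoid dividing by zero.
        return 0.0
    return max(0.0, min(100.0, 100.0 * busy_count / total_count))


# Example with made-up counter values: busy for 50 of 200 cycles.
print(load_percent(50, 200))
```

Clamping guards against a busy counter that was sampled slightly later than the total counter and therefore reads higher.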
08:12 karolherbst: the gk20a driver seems to read only the core load though
08:13 karolherbst: mhh maybe not
08:13 karolherbst: they configure the reg differently
08:13 karolherbst: I just took the stuff the blob uses for me and applied it to the code logic within gk20a
08:14 hakzsam: and did you validate the results ?
08:14 karolherbst: yes
08:14 karolherbst: nearly the same as with the blob
08:14 hakzsam: how?
08:14 karolherbst: gpu core heavy benchmarks have 100% core load
08:14 karolherbst: pcie based benchmarks have nearly 100% pcie load
08:14 karolherbst: ...
08:14 karolherbst: stuff like that
08:14 karolherbst: heaven even gave same core/mem load on nouveau and blob
08:15 hakzsam: always 100%? :)
08:15 karolherbst: pcie never reaches 100% though
08:15 karolherbst: it is around 90%
08:15 RSpliet: karolherbst: I'm just reading a paper about how the "sfu" in an NVA0 does FMA rather than the ALU
08:15 kubast2: [ 0.750789] nouveau: unknown parameter 'pstate' ignored
08:15 karolherbst: hakzsam: well when there is no load, the load is mostly 0 ;)
08:15 kubast2: [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.19.0-28-generic root=UUID=8825e5b5-9491-4ff1-b034-42e10cf89608 ro nouveau.pstate=1 quiet splash vt.handoff=7
08:15 karolherbst: kubast2: yeah
08:15 RSpliet: and even regular multiply... according to some Intel chaps
08:16 hakzsam: karolherbst, hehe
08:16 karolherbst: there is no parameter for that on my branch
08:16 karolherbst: kubast2: and pstate is in /sys/kernel/debug/dr...
08:16 RSpliet: have you observed a tendency in NV50 shaders to alternate ALU ops with MUL/FMAD?
08:16 karolherbst: hakzsam: ohh wait, we need those pmu values for auto reclocking?
08:17 karolherbst: or will that be done differently
08:17 hakzsam: karolherbst, did you check if the pcie speed has been changed by the blob when the pcie load is more than X% (I'm just curious)?
08:17 karolherbst: I think the blob always enables the core counters and the other two only when you check them in nvidia-settings
08:17 karolherbst: hakzsam: nope
08:17 karolherbst: hakzsam: I don't have any "pcie" only benchmarks though
08:17 hakzsam: karolherbst, yeah, we'll need those pdaemon perf counters for DVFS, like gk20a already does
08:18 karolherbst: only those where pcie load equal core load
08:18 kubast2: karolherbst: /sys/kernel/debug/drm ?
08:18 karolherbst: kubast2: yes
08:18 kubast2: because I don't have it tbh
08:18 karolherbst: then it is dri
08:18 karolherbst: :D
08:18 kubast2: 0 , 128 ,64
08:18 karolherbst: 0
08:19 kubast2: got 0f 0f: core 1058 MHz memory 5000 MHz AC DC *
08:19 karolherbst: hakzsam: so I should check how the reg are configured on other chips as well
08:19 karolherbst: kubast2: check lspci -vv -s 01:00.0
08:19 karolherbst: and dmesg
08:20 karolherbst: lspci should show you 8.0 lnkSta speed
08:20 karolherbst: and dmesg something about pcie
08:20 karolherbst: ohh wait, I don't print anything when nothing went wrong
08:20 karolherbst: you could reboot with nouveau.debug=debug
08:20 karolherbst: but if the speed is up in lspci, it should be fine
08:21 karolherbst: and it should be lower on the other pstates
08:21 karolherbst: 2.5 on 07
08:21 hakzsam: karolherbst, yeah and to figure out signals, but I would like to suggest you to focus on one card only at beginning
08:21 kubast2: LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
08:21 kubast2: lspci
08:21 karolherbst: :)
08:21 kubast2: it wasn't on dmesg
08:21 karolherbst: then go to 07 pstate
08:21 karolherbst: and it should become 2.5
08:22 kubast2: Speed 2.5GT/s
08:23 karolherbst: nice
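
The manual check above (switch pstate, then read the `LnkSta` speed from `lspci -vv`) can be scripted. A minimal sketch that only parses text already captured from `lspci -vv -s 01:00.0`; the helper name `link_speed_gts` is an assumption for illustration:

```python
import re

# Matches e.g. "LnkSta: Speed 8GT/s, Width x16" or "LnkSta: Speed 2.5GT/s".
LNKSTA_RE = re.compile(r"LnkSta:\s*Speed\s+([0-9.]+)GT/s")


def link_speed_gts(lspci_output):
    """Return the current PCIe link speed in GT/s, or None if not found."""
    match = LNKSTA_RE.search(lspci_output)
    return float(match.group(1)) if match else None


line = "LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-"
print(link_speed_gts(line))
```

Anchoring on `LnkSta:` matters because `LnkCap:` also reports a speed, which is the maximum the link supports rather than the current one.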
09:31 danboid: Does the X1 have an open source X driver like the K1 does?
09:31 danboid: Tegra, that is
09:33 danboid: Has anyone got normal Linux running on the shield TV?
09:44 imirkin: danboid: try #tegra ?
09:45 imirkin: danboid: the GPU (i.e. acceleration) aspect is definitely not presently supported on the X1 with an open-source stack
09:46 imirkin: danboid: but the modesetting might be. in fact, check the logs for this chan, i think tagr recently commented on this.
09:48 danboid: imirkin: I should've, yes. Seems phoronix (at least) have already got Ubuntu running on the Shield TV
09:49 danboid: I'm not saying Michael ported Ubuntu, but it seems it should be pretty hacker-friendly if that's the case
09:50 imirkin: i have no information on that
09:54 karolherbst: I need somebody with a nouveau driven desktop kepler + working vdpau setup
09:54 imirkin: for what
09:55 imirkin: ritual sacrifice?
09:55 karolherbst: reading out gpu load
09:55 karolherbst: especially for the video accel
09:55 karolherbst: I think I got the other stuff working
09:55 karolherbst: just want to be sure that one bit is for that
09:55 imirkin: prepare some instructions, i can try to do it later
09:56 karolherbst: imirkin: https://github.com/karolherbst/envytools/commit/a3f1fb7b8fd19b3db5c83f75ca3ce8adbcea18e1
09:56 imirkin: karolherbst: why don't you do it btw?
09:56 imirkin: you have a kepler...
09:56 karolherbst: I never ever even touched vdpau
09:56 imirkin: time to start!
09:56 karolherbst: :D
09:56 RSpliet: karolherbst: don't forget the closing > on line 2 in nvagpuload
09:56 imirkin: install nvidia-firmware package on gentoo
09:56 karolherbst: ohhh
09:56 karolherbst: mhh
09:57 imirkin: and DRI_PRIME=1 mplayer -vo vdpau -vc ffh264vdpau foo.m4v
09:57 karolherbst: ohh that easy
09:57 karolherbst: okay
09:57 karolherbst: I think I can do that
09:57 imirkin: or equivalent in some other player you use.
09:58 karolherbst: I disabled vdpau support for all
09:58 karolherbst: so I have to rebuild something anyway
09:58 imirkin: oh
09:58 imirkin: yeah. you need mesa +vdpau
09:59 imirkin: that should pull in enough deps
09:59 karolherbst: vdpau is already installed though
09:59 imirkin: libvdpau is just a passthrough library
09:59 karolherbst: because I use my intel card for that with a wrapper already, just I don't have any software using vdpau actually
09:59 imirkin: like libGL
09:59 imirkin: [well, libGL doesn't have to be a passthrough library, but in mesa it is]
10:02 karolherbst: okay, I think I am done with RE on the kepler side :D everything seems to be in envytools already, except video accel
10:04 karolherbst: imirkin: but if you want, you could check if that tool prints useable information for you as well
10:04 karolherbst: there is a check against gk106 though
10:05 karolherbst: this is what I get with glxgears: https://gist.github.com/karolherbst/72993778ea21dd65438b
10:05 karolherbst: it kind of makes sense
10:06 karolherbst: imirkin: your witcher patches are upstreamed?
10:07 imirkin: karolherbst: my resource lifetime patches are upstream
10:07 imirkin: not sure if those are the "witcher patches"
10:08 karolherbst: this buffer delete stuff
10:09 imirkin: yea
10:09 imirkin: that's upstream
10:09 karolherbst: okay, thanks
10:09 imirkin: and iirc joi said that it fixed all of those issues. but of course, there are more now... PBENTRY whatever that is
10:09 karolherbst: now I can remove the patches locally from portage :D
10:10 imirkin: errr... well note that they're not in any release
10:10 karolherbst: I use git
10:10 imirkin: ah ok
10:10 imirkin: all good then :)
10:10 karolherbst: got patch apply failures, that's why I asked
10:10 karolherbst: :D
10:11 karolherbst: mhhh
10:11 karolherbst: should I get a window with mplayer?
10:11 imirkin: that'd be ideal, yes
10:12 karolherbst: ohh my mistake
10:12 karolherbst: used wrong video
10:12 karolherbst: do I need the vc argument?
10:12 karolherbst: funny how vdpau works on intel without issues so far :)
10:13 imirkin: if you want hw decoding, you need the vc argument
10:14 imirkin: or you can look at the VideoAcceleration page
10:14 imirkin: which tells you how to make that sort of thing permanent in a non-annoying way
10:14 imirkin: but that i'm certainly not repeating here
10:22 karolherbst: imirkin: "[vdpau] Error when calling vdp_device_create_x11: 23"
10:23 imirkin: errr
10:23 imirkin: right
10:24 imirkin: VDPAU_DRIVER=nouveau
10:24 karolherbst: yeah
10:24 karolherbst: added that
10:24 imirkin: and DRI_PRIME=1 ?
10:24 karolherbst: yes
10:24 imirkin: are you foolishly adding LD_LIBRARY_PATH in the hopes that it does something?
10:24 karolherbst: mhh not before deeper debug
10:25 karolherbst: will try LD_DEBUG first
10:25 imirkin: (coz it doesn't)
10:25 imirkin: you need VDPAU_DRIVERS_PATH or something
10:25 karolherbst: looks okay though
10:25 karolherbst: the driver is loaded actually
10:25 karolherbst: otherwise I get stuff like that: "Failed to open VDPAU backend libvdpau_nouvea.so: cannot open shared object file: No such file or directory"
10:25 imirkin: ah ok
10:25 karolherbst: so it finds the driver alright
10:26 imirkin: start small -- what does vdpauinfo produce?
10:26 karolherbst: also so file seems to be loaded without problems
10:26 karolherbst: same error
10:26 imirkin: did you forget to install the firmware? anything in dmesg?
10:27 karolherbst: ohhhhh wait
10:27 karolherbst: I know I have the firmware for my gpu
10:27 karolherbst: and nothing in dmesg
10:27 karolherbst: I have these files: nve6_fuc409c nve6_fuc409d nve6_fuc41ac nve6_fuc41ad
10:27 imirkin: those are not the files
10:27 karolherbst: okay
10:28 imirkin: those are the ctxsw fw
10:28 karolherbst: okay
10:28 imirkin: install the nvidia-firmware package :)
10:28 karolherbst: then I will install the package
10:28 karolherbst: but shouldn't there be anything in dmesg?
10:28 imirkin: maybe, maybe not
10:28 karolherbst: :)
10:28 imirkin: we fail some of the vdpau stuff if you don't have the firmware
10:28 imirkin: although.... hrm
10:28 imirkin: for kepler it's kernel-only firmware, no userspace
10:29 imirkin: dunno
10:29 karolherbst: do I need some module para?
10:29 imirkin: no
10:29 imirkin: you need files like nve6_fuc084
10:29 imirkin: (and 085, 086)
10:30 imirkin: could well be something in the rewrite that killed it, dunno
10:30 imirkin: i don't think i've tested
10:31 karolherbst: okay, I have these now
10:31 karolherbst: but still, doesn't work
10:33 karolherbst: isn't there something like VDPAU_DEBUG or anything?
10:34 imirkin: sure, but that won't tell you anything interesting
10:34 karolherbst: :(
10:34 imirkin: (VDPAU_TRACE=1 iirc)
10:34 imirkin: bbl
10:34 karolherbst: ..., you are right, it's useless
10:35 karolherbst: vdp_imp_device_create_x11 returns 23
10:41 karolherbst: 23 is VDP_STATUS_RESOURCES?
10:43 karolherbst: oh well, debugging mesa then, fun
11:02 karolherbst: imirkin: this fails: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/vdpau/device.c#n67
11:03 karolherbst: so vl_screen_create fails for some reason
11:03 pmoreau: karolherbst: Don't forget to make me a script/README if you want me to test anything PCI related on my G8x, G9x.
11:04 karolherbst: pmoreau: mhh, just changing pstates
11:04 karolherbst: and check speed with lspci
11:04 karolherbst: this is all
11:04 pmoreau: Ok
11:04 karolherbst: it should work on the g9 ones
11:04 karolherbst: and not on the g8x ones
11:04 pmoreau: :-)
11:04 karolherbst: you can add debug messages though
11:04 karolherbst: I add some
11:04 karolherbst: *enable
11:04 imirkin_: karolherbst: makes sense that this is the thing that'd fail
11:05 imirkin_: karolherbst: question is... why
11:05 karolherbst: dri3 not supported?
11:05 imirkin_: let me try it here
11:05 pmoreau: RSpliet: Hurry up, I need reclocking on G8x to test PCI changes! :p
11:05 imirkin_: oooohhh... i wonder. maybe not.
11:05 karolherbst: I gdb through the source
11:05 karolherbst: and see some DRI2 stuff
11:05 imirkin_: let's see if i get the same fail here
11:05 imirkin_: give me a minute
11:05 karolherbst: yeah, the function returns NULL
11:06 imirkin_: karolherbst: i get the same error.
11:06 imirkin_: could be dri3
11:06 karolherbst: this fails? if (!scrn->base.pscreen)
11:06 imirkin_: unfortunately i'm entirely unfamiliar with this junk
11:06 karolherbst: in /usr/src/debug/media-libs/mesa-9999-r1/mesa-9999/src/gallium/auxiliary/vl/vl_winsys_dri.c:310
11:09 karolherbst: yeah
11:09 karolherbst: this check fails: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/auxiliary/vl/vl_winsys_dri.c#n397
11:09 karolherbst: so scrn->base.pscreen is null
11:10 karolherbst: which means, dd_create_screen failed
11:10 imirkin_: (gdb) p device_name
11:10 imirkin_: $7 = 0x619890 "/dev/dri/card0"
11:10 imirkin_: that's probably not great
11:10 karolherbst: wait
11:10 karolherbst: it reads DRI_PRIME for me though :/
11:10 imirkin_: yeah, it reads DRI_PRIME for me too :)
11:10 karolherbst: :D
11:11 imirkin_: but it does it the DRI2 way
11:11 imirkin_: which i guess doesn't work here
11:12 karolherbst: mhh
11:12 karolherbst: pity
11:15 karolherbst: so I need to use dri2 for that?
11:15 imirkin_: or fix it to know about dri3 ;)
11:16 karolherbst: mhh
11:17 karolherbst: shouldn't be that difficult
11:17 karolherbst: :D
11:17 karolherbst: okay, who has a desktop kepler with working vdpau? :D
11:17 imirkin_: are you going to give it a shot?
11:17 karolherbst: don't know
11:18 karolherbst: somehow it isn't worth the effort just to verify a pmu reg bit
11:18 karolherbst: I can always just start a second X server
11:18 karolherbst: ohh what an idea
11:18 imirkin_: inspired
11:19 karolherbst: mhhh
11:19 karolherbst: somehow I can't always start a second X
11:20 karolherbst: NOUVEAU(0): [drm] error opening the drm NOUVEAU(0): 892:
11:21 karolherbst: my ttys aren't working too :/
11:21 karolherbst: messy
11:22 imirkin_: how do you start it?
11:22 karolherbst: X :1 -config ..file...
11:22 imirkin_: try
11:22 imirkin_: -sharevts -novtswitch
11:23 karolherbst: -sharevts is enough
11:23 karolherbst: messy
11:24 karolherbst: mhhhh
11:24 karolherbst: mhhh
11:24 karolherbst: well
11:24 karolherbst: that doesn't look too good
11:24 karolherbst: https://gist.github.com/karolherbst/0ac17a9af4550f587887
11:25 karolherbst: no h264 support?
11:25 karolherbst: ohh up there
11:25 karolherbst: okay, seems fine
11:25 imirkin_: not the weird profiles
11:25 imirkin_: which for all i know are actually supported
11:25 imirkin_: just not reported as such
11:25 karolherbst: k
11:25 karolherbst: now trying crazy stuff
11:27 karolherbst: ey, that's too stupid now
11:27 karolherbst: ohh server exit
11:27 karolherbst: uhhh
11:27 karolherbst: it works
11:27 karolherbst: mhhh
11:28 karolherbst: mhhhh, so the one pmu is not the video accel load
11:31 karolherbst: this doesn't make sense though
11:31 karolherbst: not really
11:32 imirkin_: try a higher bitrate video :)
11:33 imirkin_: note that the video decoding stuff isn't in PGRAPH at all
11:34 pmoreau: I forgot... Is there function overloading in C?
11:34 imirkin_: pmoreau: no
11:35 pmoreau: :(
11:35 imirkin_: pmoreau: i mean, there are function pointers
11:35 imirkin_: pmoreau: but not by name
11:35 pmoreau: Oh well, I'll remove an underscore and be good
11:45 mupuf: karolherbst, hakzsam, imirkin_: I changed the switch that connects reator to the net, it finally fixed the issue we had when downloading fast (ssh would become unresponsive)
11:45 imirkin_: awesome :)
11:45 mupuf: hopefully it will solve the disconnections issues
11:45 karolherbst: nice
11:46 mupuf: I have never had any problems with switches before, but the previous one clearly has been a pain in the ass
11:46 karolherbst: yay
11:46 karolherbst: I get load which isn't core, mem or pcie
11:46 mupuf: avoid zyxel at all cost :(
11:47 mupuf: I had never heard of it before, I did not want to buy a cheaper dlink switch, and I should have done it
11:47 karolherbst: { GR_HUB | GR_GPC | GR_ROP | PVLD | PPDEC | PPPP | BFB_NISO | PMFB } what of that could it be for video accel?
11:47 karolherbst: PPDEC?
11:47 imirkin_: PVLD, PPDEC, PPPP
11:47 imirkin_: those are the 3 engines used for video decoding
11:47 karolherbst: okay
11:47 imirkin_: PVLD does the bit decoding into macroblocks
11:47 mupuf: like imirkin_ said. You either need to count the 3 of them at the same time or need to spread them on multiple contexts
11:47 imirkin_: DEC does the hard work
11:47 mupuf: err, counters
11:48 imirkin_: PPP does some post-processing junk
11:48 mupuf: nvidia puts the 3 in one
11:48 karolherbst: nice
11:48 mupuf: even if it has the right number of counters
11:48 karolherbst: it works
11:48 karolherbst: only { PVLD | PPDEC | PPPP } enabled and like 16% load
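
Counting the three video engines in a single counter amounts to OR-ing their signal bits into one mask. A minimal sketch of that idea: the signal names come from the list above, but the bit positions are placeholders chosen for illustration and do not reflect the real hardware layout.

```python
from enum import IntFlag


class Signal(IntFlag):
    # Bit positions are hypothetical, NOT the real register layout.
    GR_HUB = 1 << 0
    GR_GPC = 1 << 1
    GR_ROP = 1 << 2
    PVLD = 1 << 3
    PPDEC = 1 << 4
    PPPP = 1 << 5
    BFB_NISO = 1 << 6
    PMFB = 1 << 7


# The three engines used for video decoding, combined into one mask.
VIDEO_DECODE = Signal.PVLD | Signal.PPDEC | Signal.PPPP


def uses_only_video_engines(mask):
    """True if mask is non-empty and selects nothing but video engines."""
    return int(mask) != 0 and (int(mask) & ~int(VIDEO_DECODE)) == 0


print(uses_only_video_engines(VIDEO_DECODE))
print(uses_only_video_engines(Signal.PVLD | Signal.GR_GPC))
```
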
11:49 mupuf: karolherbst: http://fs.mupuf.org/mupuf/nvidia/graphs/thresholds_vdec.svg
11:50 karolherbst: new version: https://github.com/karolherbst/envytools/commit/b76a8fc41f30fb66886fd95ff06e9b51186a9d6a
11:51 karolherbst: mupuf: mhhh, first I need to understand :)
11:51 mupuf: understand what?
11:51 karolherbst: the graph
11:52 mupuf: if (nva_cards[cnum]->chipset.chipset != 0xe6) --> 0xe4+ would work for sure
11:52 mupuf: not sure when nvidia changed the layout of the signals
11:52 mupuf: either for fermi or kepler
11:52 mupuf: but supporting both formats would not take too long
11:52 mupuf: I think I REed both
11:53 karolherbst: mupuf: yeah, I just added that because I never actually tried that on another card
11:53 karolherbst: the blob uses 0x20000 though
11:53 karolherbst: and I don't see when this is not 0
11:55 karolherbst: mupuf: what is Perf_VDec in this graph?
11:55 karolherbst: ohh video decoding
11:55 imirkin_: fwiw the video engine changed on nvd9
11:56 imirkin_: to what we refer to as VP5
11:58 mupuf: karolherbst: any other question?
11:58 karolherbst: mupuf: not really
11:58 karolherbst: I just have to come up with nice thresholds
11:58 karolherbst: and then I write a bash daemon which will clock the card for me
11:58 mupuf: use nvidia's, as shown on the graph!
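
A threshold-driven reclocking loop of the kind being discussed could decide its next pstate as sketched below. This is only an illustration: the 80% and 30% thresholds are invented placeholders (the real values would come from the graph mupuf linked), and `next_pstate` is a hypothetical helper, not part of nouveau.

```python
def next_pstate(load_pct, current, pstates, up_threshold=80.0, down_threshold=30.0):
    """Pick the next pstate; pstates is ordered slowest to fastest.

    Two separate thresholds give hysteresis, so a load hovering around
    one value does not make the card flip between states.
    """
    i = pstates.index(current)
    if load_pct >= up_threshold and i < len(pstates) - 1:
        return pstates[i + 1]  # step up one level
    if load_pct <= down_threshold and i > 0:
        return pstates[i - 1]  # step down one level
    return current


# The two pstates mentioned in this log, slowest first.
PSTATES = ["07", "0f"]
print(next_pstate(95.0, "07", PSTATES))
```

A daemon would call this periodically with the measured load and write the result to the pstate file only when it differs from the current state.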
11:58 karolherbst: :D
11:58 karolherbst: well
11:59 karolherbst: I think it will be more difficult in the end. Anyway the blob is a pain in the ass for my gpu :/
12:00 hakzsam: mupuf, awesome, let me try that new switch :)
12:01 mupuf: hakzsam: it is up and running
12:01 hakzsam: cool
12:01 hakzsam: I'm going to fix MP counters on your ce then :)
12:02 hakzsam: mupuf, can I reboot reator?
12:02 mupuf: yes
12:04 karolherbst: mupuf: what are the impacts of having stuff running on the pmu slots?
12:04 imirkin_: skeggsb_: before all the nvgpu renames, PBENTRY used to be ILLEGAL_CMD
12:05 imirkin_: skeggsb_: which is an illegal pb command (not method)
12:05 imirkin_: skeggsb_: which means something messed up bigtime
12:05 imirkin_: joi: if you can get a mmt trace with a PBENTRY error, that'd be very useful
12:06 mupuf: karolherbst: absolutely none
12:07 karolherbst: mupuf: there is something odd with your graph though. Why doesn't Perf_VDec have any impact, or is it the plain event count?
12:08 karolherbst: no impact on perf by clocks I mean
12:08 mupuf: you likely read it wrong
12:08 mupuf: perf_vdec = value of the perf counter counting the video-dec-related events
12:08 mupuf: and it is the purple color
12:08 karolherbst: yeah I know
12:09 karolherbst: but it says "%" at the top ;)
12:09 mupuf: yes, and?
12:09 mupuf: there are two scales
12:09 mupuf: err, axis
12:09 karolherbst: I thought if it is the vdec_events/total_events ratio value, then this value should change depending on the clocks
12:09 karolherbst: higher clocks, lower ratio
12:10 mupuf: nope
12:10 karolherbst: at least it does for me here locally
12:10 mupuf: the polling frequency on the busy signals in the pmu counters is fixed
12:10 mupuf: and it is either 200 MHz or 330 MHz
12:10 karolherbst: mhh okay
12:10 karolherbst: and why does changing pstate change the ratios?
12:11 mupuf: ah, I was the one faking a load
12:11 mupuf: and I was measuring how the blob reacted to it
12:11 mupuf: does it make more sense?
12:12 karolherbst: maybe, I don't know how the counters change depending on pstate
12:12 karolherbst: I just display the ratio
12:12 karolherbst: which is the value I actually care about
12:12 karolherbst: should I also display the actual event counts in the tool?
12:13 mupuf: no, the ratio is all that matters
12:13 mupuf: you cannot predict how the counters will react to a change of pstate
12:13 mupuf: or it is SUPER hard
12:14 mupuf: hence why you need to react quickly to changes and go step by step
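A minimal sketch of the ratio being discussed, assuming the PMU counters are sampled at a fixed rate so that load is simply the share of samples in which a unit's busy signal was set. The function name and the sample numbers are illustrative assumptions, not taken from the tool or from the hardware.

```python
def load_percent(busy_samples: int, total_samples: int) -> float:
    """Share of samples in which the busy signal was set, in percent.

    Assumption: both values come from counters polled at the same fixed
    frequency, so the ratio does not depend on the current clocks.
    """
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    return 100.0 * busy_samples / total_samples


# Hypothetical reading: 16 busy samples out of 100 polled samples.
video_load = load_percent(16, 100)
```

Because only the ratio is used, the absolute counts never need to be displayed.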
12:16 hakzsam: mupuf, I still have the disconnection issues
12:16 mupuf: ok
12:16 mupuf: RRRrrrr
12:17 mupuf: if it does not happen on the blob's partition
12:17 mupuf: just diff the config file and try to fix it
12:17 mupuf: the blob's partition is sda2
12:17 hakzsam: okay, but not now because your internet connection is just very slow
12:17 karolherbst: okay, total event count doesn't change
12:17 karolherbst: which makes actually sense
12:18 karolherbst: okay, next step: writing a bash daemon to do the reclocking for me and trying to find useful ratios where reclocking makes sense
12:18 karolherbst: mupuf: should I always check all four ratios? gpu core, memory, pcie and video?
12:19 karolherbst: ohh well, display would also make sense
12:19 karolherbst: which would be the display part of the pmu counter bits?
12:20 mupuf: just read the code of gk20's pmu
12:20 mupuf: and implement the same one
12:20 karolherbst: from these: { GR | GR_HUB | GR_GPC | GR_ROP | PVLD | PPDEC | PPPP | BFB_PART0_REQ | BFB_NISO | PMFB | PCOPY0 | PCOPY1 | PCOPY2 | PCIE }
12:20 karolherbst: mupuf: the gk20a code doesn't do much
12:20 mupuf: you are confused about how counters work
12:20 mupuf: pmu's counters do not count events
12:20 karolherbst: ohh what do they count?
12:21 karolherbst: mupuf: gk20a code only checks GR and PCOPY2
12:21 karolherbst: nothing else
12:22 mupuf: ack
12:22 karolherbst: the idea I have is that when something is near 90% I should try to increase the perf of this "part", whether it is increasing pstates or cstates or whatever
12:22 mupuf: yes, but 90 is too high
12:22 joi: imirkin_: ok, I'll try, but it's hard to reproduce
12:23 imirkin_: joi: understood.
12:23 karolherbst: mupuf: gk20a uses 70 target and 90 max
12:23 mupuf: check the thresholds nvidia uses
12:23 mupuf: for GR
12:23 imirkin_: joi: my bet is that some idiot (e.g. me) tries to do IMMED_NVC0 with too large of an immediate
12:23 karolherbst: https://github.com/karolherbst/nouveau/blob/master/drm/nouveau/nvkm/subdev/pmu/gk20a.c#L79-L86
12:23 karolherbst: this part
12:24 karolherbst: I have to dig into that and understand what it does exactly
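A rough sketch of the step-by-step idea with the two thresholds mentioned above (70 as target, 90 as maximum). This is not the gk20a algorithm, which has not been read through yet at this point in the conversation; the function name, the one-step-per-sample policy and the level numbering are assumptions made for illustration.

```python
def next_level(level: int, load: float, n_levels: int,
               target: float = 70.0, load_max: float = 90.0) -> int:
    """Pick the next performance level, moving at most one step per sample.

    Hypothetical policy: step up when load exceeds load_max, step down
    when it falls below target, otherwise stay where we are.
    """
    if load > load_max:
        level += 1   # saturated: go one level up
    elif load < target:
        level -= 1   # comfortably below target: go one level down
    # Clamp to the valid range [0, n_levels - 1].
    return max(0, min(n_levels - 1, level))
```

Moving one level at a time matches the advice to react quickly and go step by step, since the load after a pstate change is hard to predict.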
12:30 imirkin_: gnurou: ended up sending the request since i'm at wit's end with this stuff
12:31 imirkin_: gnurou: but i don't have high hopes.
12:32 hakzsam: imirkin_, 0x0f seems to be definitely UNALIGNED_MEM_ACCESS because I have got this error while trying to monitor MP counters on a GF114 (error is now fixed by aligning access to global memory)
12:33 hakzsam: I don't know why this is not documented in rnndb
12:34 imirkin_: hakzsam: aligned to what exactly? 0x4?
12:36 hakzsam: imirkin_, 0x30
12:36 imirkin_: errrrr
12:36 imirkin_: why 0x30?
12:39 hakzsam: imirkin_, 0x30 is the offset, 128 bits is the answer
12:40 imirkin_: were you doing 128-bit loads?
12:40 imirkin_: and/or stores
12:40 hakzsam: imirkin_, http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_query.c#n943
12:40 imirkin_: right. so you were.
12:41 imirkin_: so that makes sense then
12:42 hakzsam: * mul $r8 u32 $r8 u32 36 --> this instruction causes unaligned mem access
12:42 hakzsam: well, maybe we should add UNALIGNED_MEM_ACCESS to rnndb?
12:43 imirkin_: errr
12:43 imirkin_: what?
12:44 imirkin_: how can that instruction cause an unaligned access?
12:44 hakzsam: I meant, this instruction will cause an unaligned access just after :)
12:44 imirkin_: oh
12:44 imirkin_: that makes more sense ;)
12:44 hakzsam: sure (sorry)
12:45 imirkin_: that prog needs to get rewritten for GK110/GK208 btw
12:46 hakzsam: mupuf, obviously, this unaligned memory access is also present on my gf108 because there we have 2 MPs!
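A small worked example of why a stride of 36 would break 128-bit accesses as soon as there is a second MP. It assumes that the multiply by 36 yields a per-MP byte offset and that 0x30 is a base offset added to it; both are readings of the conversation, not facts checked against the shader.

```python
ACCESS_BYTES = 128 // 8  # a 128-bit load/store touches 16 bytes


def is_aligned(offset: int, access_bytes: int = ACCESS_BYTES) -> bool:
    """True if offset is a multiple of the access size."""
    return offset % access_bytes == 0


BASE = 0x30   # assumed base offset (48, a multiple of 16)
STRIDE = 36   # assumed per-MP stride in bytes, from the mul by 36

# Offsets for MP 0 and MP 1: 48 and 84.
offsets = [BASE + mp * STRIDE for mp in range(2)]
aligned = [is_aligned(o) for o in offsets]  # MP 0 is fine, MP 1 is not
```

With a single MP only the first offset is ever used, which would explain why the fault needs at least two MPs to show up.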
12:46 hakzsam: imirkin_, as the compute support needs to be implemented
12:46 hakzsam: or maybe it already works, but I can't test
12:46 imirkin_: hakzsam: oh yeah. right, i disabled it.
12:46 hakzsam: I know
12:46 imirkin_: hakzsam: iirc we were missing something in our ctxsw fw
12:46 imirkin_: which was causing error spews
12:47 hakzsam: okay
12:48 hakzsam: imirkin_, compute support is required if you want to use MP counters on your GK208 :)
12:51 hakzsam: imirkin_, btw, I'm fixing MP counters on Fermi and the final goal is to get rid of that NVC0_COMPUTE envvar
12:53 imirkin_: hakzsam: i know
13:00 hakzsam: well, the blob is quite strange, sometimes it uses GPC[0], or [1] or [2] but I don't find any logic
13:02 mupuf: hakzsam: don't forget to turn off reator
13:02 hakzsam: yes
13:03 hakzsam: oh ooops
13:03 hakzsam: done
13:03 m3n3chm0:nasZ
13:03 hakzsam: I'll maybe try again later if your connection is more acceptable
13:32 pmoreau: Grrr... send-mail decided to give up on sending patches...
13:32 imirkin_: arch?
13:32 imirkin_: iirc arch has a broken git or perl or something by default
13:33 hakzsam: arch is fun :)
13:33 pmoreau: It did work one or two weeks ago
13:33 pmoreau: Just, not right now
13:33 hakzsam: what's the error?
13:34 pmoreau: fatal: 'send-email' appears to be a git command, but we were not able to execute it. Maybe git-send-email is broken?
13:34 imirkin_: hardcore
13:34 imirkin_: that's a diff error
13:34 imirkin_: the old one was a perl backtrace from ssl-ish thing
13:34 hakzsam: pmoreau, maybe perl needs to be updated?
13:35 pmoreau: I just ran pacman -Syu, and there was no perl
13:37 hakzsam: I meant, perl modules
13:37 hakzsam: but I don't remember how to update them
13:37 karolherbst: pmoreau: /usr/libexec/git-core/git-send-email
13:37 karolherbst: you could try to use this binary and check what actually happens
13:39 karolherbst: not giving the actual error is kind of cruel of git :D
13:39 pmoreau: Connection closed
13:39 karolherbst: mhh
13:39 karolherbst: does git use curl?
13:39 pmoreau: On line 1369
13:39 karolherbst: yay, it does
13:40 karolherbst: pmoreau: I may have a different version
13:40 karolherbst: and I am terrible with perl
13:41 karolherbst: pmoreau: is it this line for you too? "$smtp->datasend("$header\n") or die $smtp->message;"
13:41 pmoreau: Almost: $smtp->datasend("$header\n$message") or die $smtp->message;
13:42 karolherbst: ohhh
13:42 karolherbst: this doesn't look right
13:42 karolherbst: remove the $message part
13:43 karolherbst: I am on git 2.6 by the way
13:43 karolherbst: hja!
13:43 karolherbst: wait
13:43 karolherbst: pmoreau: you are on 2.6 too!
13:43 karolherbst: hehe
13:43 karolherbst: you need a patch
13:44 hakzsam: imirkin_, any ideas how to launch a kernel on a specific GPC (not necessarily GPC[0x0])?
13:44 karolherbst: pmoreau: https://gist.github.com/karolherbst/55b2e54f89efdfea3b91
13:44 imirkin_: hakzsam: none
13:44 pmoreau: yes, but I had the same problem before updating from 2.5.x iirc
13:44 karolherbst: this was added for portage to git-2.6.0
13:44 karolherbst: try this patch
13:44 hakzsam: imirkin_, I can use MP_LIMIT to limit to only one MP but it's always the first one
13:45 mupuf: hakzsam: just exit if you are running on the wrong one
13:45 karolherbst: pmoreau: I hope this patch works for you ;)
13:46 hakzsam: mupuf, that's a bit aggressive, no? :)
13:46 imirkin_: hakzsam: maybe by setting the grid id?
13:46 hakzsam: mmh I'll checl
13:46 hakzsam: *check
13:47 pmoreau: It was already broken on 2.5.1 apparently
13:47 karolherbst: :)
13:48 karolherbst: so the patch works?
13:48 pmoreau: I still had the archive for 2.5.1, so reverted back to that one and it still failed
13:48 pmoreau: Going to try the patch
13:52 pmoreau: karolherbst: Looks like it did! Thanks a lot for the patch!
14:03 hakzsam: mupuf, well, I can launch this kernel on the first GPC only, but I'm not sure if it works on your GF114 because I have seen strange stuff while monitoring on GPC[0]
14:04 RSpliet: pmoreau: are you sure your patch is 100% accurate?
14:05 RSpliet: because, for one, I don't see NVIDIA read 0x8807c in my NV94 trace
14:07 karolherbst: pmoreau: nice :)
14:13 karlmag:confuses himself on several nouveau wiki pages.. :-P
14:15 imirkin_: karlmag: improvements encouraged
14:15 imirkin_: karlmag: get a wiki account and edit away
14:16 karlmag: hehe... if I only understood what is described there..
14:16 imirkin_: what's the q
14:16 karlmag: no particular one...
14:17 imirkin_: what page is confusing
14:17 karlmag: I started with the feature matrix and went on from there
14:17 karlmag: It's probably not really that confusing..
14:17 imirkin_: there's not a ton of information and it's very poorly organized
14:17 karlmag: oh yeah... codenames is a bit confusing.. but I'm pretty sure it's not much to do about that really :-P
14:17 imirkin_: i spent some time improving it a couple years ago
14:17 imirkin_: but... it's a thankless task
14:18 karlmag: yeah.. documentation is never too fun. Very necessary sometimes though.
14:19 karlmag: It was just me kind of deciding on a graphics card to get, then trying to figure out what support it currently has, then trying to understand what might be missing. And then got sidetracked on other stuff with other cards too :-P
14:20 karlmag: Ended up on some firmware page and something about video decoding :-P
14:21 karlmag: and I lost track if it even was relevant for the particular card..
14:21 karlmag: oh well
14:21 karlmag: one day... maybe... one day.. :-)
14:22 hakzsam: mupuf, well, launching the kernel on GPC[0] works fine on your GF114
14:22 hakzsam: mupuf, the solution could be to always monitor on GPC[0]
14:23 karolherbst: div_u64 is just division?
14:23 karolherbst: or is it anything fancy I don't know
14:24 karolherbst: ohh okay, got it
15:06 karolherbst: ohh wow, glxspheres64 produces a big driver overhead :/
15:06 karolherbst: "only" 68% gpu core load
15:07 karolherbst: 98,5% at 07 pstate though
15:09 karolherbst: yeah, glxspheres64 is a good driver overhead benchmark actually
15:10 imirkin_: can you see where the overhead goes?
15:10 imirkin_: my bet is that it's in "nouveau_update_fence" or something
15:10 imirkin_: which is aka "busy-waiting for gpu"
15:11 karolherbst: imirkin_: yeah, but shouldn't the core be at 100% even when it does busy waiting? or close to it?
15:11 karolherbst: at 07 and 0a it is pretty close
15:11 imirkin_: karolherbst: maybe
15:11 karolherbst: around 98,5% for both
15:11 imirkin_: perhaps you're not measuring what you think you're measuring
15:11 imirkin_: either way, curious to hear where the overhead lies
15:11 karolherbst: but on 0f mem usage goes up from 15% to 18%
15:11 imirkin_: and if it's in nouveau_fence_update, break it down by caller
15:11 karolherbst: pcie around 7%
15:12 imirkin_: my guess is that we could improve matters by rotating our buffers more often
15:12 karolherbst: I do that without PRIME now, just that you know ;)
15:12 imirkin_: i.e. if a buffer is busy for read and it's small enough, copy it into a fresh buffer and do a direct update
15:12 imirkin_: instead of waiting for that one
15:13 karolherbst: do you need userspace or kernelspace?
15:13 imirkin_: user
15:13 karolherbst: k
15:14 karolherbst: ohh right, there was this bfd update in gentoo
15:14 karolherbst: have to rebuild like all tracing tools :D
15:16 imirkin_: just perf
15:16 imirkin_: pretty annoying that the rdep isn't there
15:17 imirkin_: mattst88: --^
15:17 imirkin_: (perf doesn't seem to depend on binutils or whatever)
15:17 imirkin_: (or at least emerge didn't pick up the fact that it had to be rebuilt, and the old libbfd wasn't saved)
15:18 karolherbst: yeah
15:18 karolherbst: oh wow, and binutils-lib fails to build
15:18 karolherbst: very nice
15:18 karolherbst: haha, multilib header differences
15:18 mattst88: imirkin_: binutils is in RDEPEND if USE=demangle
15:19 imirkin_: mattst88: well it's linked in either way :)
15:19 karolherbst: usr/include/bfd.h is different for both archs, nice
15:19 imirkin_: mattst88: errr... hm. i have +demangle
15:19 imirkin_: mattst88: and yet i have a non-working perf now (until i rebuild)
15:20 imirkin_: mattst88: http://hastebin.com/ezuziciyix.vhdl
15:20 karolherbst: imirkin_: this binutils split came in some days ago
15:21 karolherbst: I bet the dev doing this just uses amd64 and -demangle :D
15:21 karolherbst: what a mess
15:22 karolherbst: imirkin_: is perf record enough? Or should I add some args?
15:23 karolherbst: imirkin_: 10.67% nvc0_draw_vbo
15:23 karolherbst: 5.5% _mesa_load_state_parameters
15:24 karlmag: so... a gtx 750ti (nv117/gm107, maxwell v1, unless I'm mistaken) should work at least somewhat, right? (column NV110 in the feature matrix, isn't it?) I've *almost* decided on one..
15:24 imirkin_: karlmag: perf record -g i think
15:24 imirkin_: karlmag: i wouldn't recommend it.
15:24 imirkin_: karlmag: lots of problems
15:24 imirkin_: karolherbst: that perf record -g was meant for you :)
15:25 karolherbst: I know
15:25 mattst88: imirkin_: I expect things should be in much better shape after binutils-libs gets going, but if not please file a bug
15:25 imirkin_: i just type kar<tab> and that's apparently not good enough
15:25 karlmag: ah.. hehe.. I was kind of confused by that command :-P
15:25 imirkin_: mattst88: well, i know that re-emerging perf will fix things. is that what you mean?
15:25 karolherbst: mattst88: does it compile for you with multilib?
15:26 karolherbst: imirkin_: ohhh okay, looks better now
15:26 karolherbst: 45% inside glxspheres
15:26 karolherbst: maybe I should run it on the blob and see how the load is there
15:26 karlmag: in a way it's a bit stupid to go for an "older" card given the price difference is essentially negligible.
15:27 karlmag: on the other hand it's a bit nice to not have too many problems too..
15:27 imirkin_: karlmag: but in another way, it's not a great idea to go for a card that you know is poorly supported with no improvement in sight
15:27 karolherbst: imirkin_: 0.45 * 0.20 is nouveau_screen_get_name
15:27 karolherbst: this looks too much somehow :D
15:27 imirkin_: karolherbst: wtf? who's calling that??
15:27 karlmag: imirkin_: there is no improvement in sight for the nv117?
15:27 karolherbst: [unknown] :/
15:27 imirkin_: karlmag: well, i know i'm not working on it.
15:28 imirkin_: karlmag: i'm also unaware of anyone else working on it
15:28 karlmag: so you're it then..
15:28 karlmag: hmmm...
15:28 karolherbst: imirkin_: https://gist.github.com/karolherbst/150d1f9967b77eea0afc
15:28 imirkin_: maybe when/if gm20x works with nouveau i'll start caring about maxwell a little more
15:29 karlmag: maybe go for the 650ti after all then...
15:29 imirkin_: a bit difficult to care right now since it's such a small fraction of the gpu's out there, and i haven't the slightest clue what the issues are
15:29 imirkin_: i think it has to do with properly computing the sched codes
15:29 imirkin_: but... who knows
15:30 karolherbst: I am sure part of it are these stupid "bad" writes ;)
15:30 karolherbst: *reads
15:30 imirkin_: karolherbst: can you make a debug build of glxspheres
15:30 karolherbst: yes
15:30 mattst88: imirkin_: I think with binutils-libs and updated deps, when binutils-libs is upgraded perf will be rebuilt automatically
15:30 karolherbst: lol turbojpeg build issue
15:30 karolherbst: :/
15:31 imirkin_: mattst88: ah ok
15:31 karolherbst: mattst88: that must be new ;)
15:31 imirkin_: mattst88: anyways, sounds like this is a known/understood issue. i'll leave it alone.
15:32 imirkin_: karolherbst: i'm really curious what's calling nouveau_screen_get_name though
15:32 imirkin_: should be easy to cache it
15:32 karolherbst: yeah
15:32 karolherbst: wow, rebuilding libjpeg-turbo didn't help
15:32 karolherbst: what is going on
15:32 karlmag: uhm.. aren't those maxwell too,though other version?
15:32 imirkin_: glGetString(GL_SOMETHING) should be what triggers it, but... ugh
15:33 karlmag: they're all listed in the nv110 family
15:33 karolherbst: lol wut
15:33 imirkin_: karlmag: the feature matrix isn't perfect in every way
15:33 imirkin_: karlmag: if i say something that conflicts with what you see there, trust me, not it.
15:33 karolherbst: yeah, of course building 64bit binaries for 32bit builds doesn't work
15:33 karlmag: imirkin_: actually, this was on the codenames page, sorry
15:34 karlmag: but same question applies
15:34 karlmag: (and probably same answer too)
15:34 imirkin_: what's the question?
15:34 karlmag: you mentioned gm20x, then caring about maxwell (unless I misunderstood you).
15:34 imirkin_: gm20x is maxwell v2
15:35 karlmag: which is your priority now
15:35 imirkin_: a bunch of new features added there, as well as signed firmware
15:35 imirkin_: which means nouveau can't load
15:35 karlmag: if I got you correctly
15:35 karlmag: sounds like fun :-P
15:35 imirkin_: no, but there are a lot more GM20x's running around
15:35 karlmag: *nods*
15:35 imirkin_: once those are supported, there will be more of an argument for me to spend time on maxwell
15:35 karlmag: noticed those are top of the line.
15:36 imirkin_: they recently put out a GTX 950 which is lower end
15:36 karlmag: or as I call them "I wish" :-P
15:36 imirkin_: either way, i don't have the hw myself
15:36 imirkin_: and debugging things like that remotely is... not fun
15:36 karlmag: didn't I look at a gtx 950 too?
15:36 karlmag:checks
15:36 imirkin_: dunno what you looked at. gtx 950/960/970 are GM206's though.
15:36 karlmag: ah, no.. 960
15:37 karlmag: oh, ok..
15:37 imirkin_: GTX 980 is GM204
15:37 imirkin_: GTX TITAN X = GM200
15:37 imirkin_: i think
15:39 imirkin_: oh, i guess 970 is GM204. it's all on the codenames page
15:40 karlmag: bottom line is; those cards are likely to be well supported before the ones I mentioned before.
15:40 imirkin_: errr
15:40 imirkin_: no
15:40 imirkin_: none of the maxwells are likely to be well-supported in the near future
15:41 karlmag: uhm, ok..
15:41 imirkin_: once nouveau can do acceleration on the GM20x's, there will be more of an argument to look into why 3d accel is such a pile of fail
15:41 imirkin_: it is likely that the same issues plaguing GM107 also exist on GM20x
15:42 karolherbst: imirkin_: I do something wrong :/
15:42 karolherbst: I am sure glxspheres is built with debug symbols
15:42 karlmag: sorry to bother you with this. /me shows most of his cards to imirkin_ ; "n00b" "n00000b" "very novice" "n0000000b".. :-P
15:42 karolherbst: but I still don't get from where it was called
15:42 imirkin_: but until that happens, i'm insufficiently motivated to investigate
15:42 karolherbst: :D
15:43 karlmag: imirkin_: ah, ok..
15:43 karolherbst: I just gdb and check it myself
15:43 imirkin_: karlmag: that of course doesn't preclude someone else from investigating
15:43 imirkin_: karlmag: in fact i didn't even do the initial maxwell support -- skeggsb_ did.
15:43 karolherbst: OHHHHHHHH
15:43 karolherbst: imirkin_: I know
15:44 imirkin_: karolherbst: virtualgl?
15:44 karolherbst: imirkin_: check what glxspheres displays
15:44 karolherbst: ohh wait
15:44 karolherbst: it doesn't display the name
15:44 karolherbst: weird
15:44 karolherbst: ohhhh
15:44 karolherbst: mhhh
15:44 karolherbst: what is going on?
15:44 karolherbst: my breakpoint was just hit like once?
15:47 imirkin_: your backtraces seem bogus
15:47 imirkin_: -fomit-frame-pointer?
15:47 karlmag: oh jikes.... just came across a tesla k40
15:47 karolherbst: imirkin_: right
15:47 imirkin_: karlmag: pricey stuff :)
15:47 karolherbst: :D
15:47 karlmag: it costs about what I get paid in about 3 months... :-P
15:48 karlmag: the k20 about 2 months salaries..
15:48 karolherbst: should I rebuild mesa and virtualgl with -fno-omit-frame-pointer?
15:48 karlmag: i.e not *this* month,no..
15:48 imirkin_: karlmag: if you're looking to play games, go with amd or get ready to use the blob drivers
15:48 karlmag: games?
15:48 karlmag: heh..
15:48 imirkin_: video games
15:48 karolherbst: ohhh why do you need a fast gpu anyway?
15:48 karlmag: I know what you meant...
15:49 karlmag: karolherbst: who said I *needed* one? :-P
15:49 karolherbst: if you don't play games or do professional graphics stuff, use an intel gpu :p
15:49 karolherbst: seriously
15:49 karolherbst: less headache
15:49 karlmag: I intend to play a bit of games
15:49 karlmag: but not that much, really
15:49 karolherbst: what kind of games
15:49 karlmag: thing is.. I am getting this spiffy new cpu/mobo, and I kind of want a matching graphics card.
15:50 imirkin_: intel cpu?
15:50 imirkin_: if so, just use the onboard graphics and move on with life
15:50 karlmag: i7-5840k (IIRC)
15:50 karlmag: the mobo doesn't have graphics interface
15:51 imirkin_: get a diff mobo
15:51 imirkin_: oh wait, that cpu doesn't have graphics
15:51 imirkin_: at least i7-5820K doesn't
15:51 imirkin_: i don't see a 5840 on ark
15:51 imirkin_: why are you getting a haswell-e?
15:51 imirkin_: get a broadwell or skylake chip
15:51 karolherbst: isn't 5x broadwell?
15:51 karlmag: http://ark.intel.com/products/82932/Intel-Core-i7-5820K-Processor-15M-Cache-up-to-3_60-GHz
15:51 karlmag: that's the cpu
15:52 imirkin_: karolherbst: marketing is fun, right?
15:52 karlmag: can't get a different mobo either
15:52 karolherbst: :D
15:52 karlmag: it's a replacement I'm getting after a brand new, but faulty PSU blew up an old mobo
15:53 imirkin_: ah ok
15:53 karlmag: I.e I can't choose freely
15:53 imirkin_: well in that case, definitely go with amd -- their open drivers are well-supported
15:53 karlmag: Hmm...
15:54 karlmag: Interesting turn of events.
15:54 karlmag: Though I guess the amd drivers are much better now than when we had to ban them at work a few years ago.
15:54 imirkin_: ban the open drivers?
15:55 imirkin_: or ban catalyst?
15:55 imirkin_: catalyst is pretty horrible afaik
15:55 karlmag: basically we had to ban ati cards
15:55 imirkin_: in the past it's never really worked for me
15:55 imirkin_: but that past was quite some time ago
15:55 karlmag: everything just started falling apart at one point
15:55 imirkin_: if you were banning *ati* cards, it must have been some while back
15:55 karlmag: I guess this is around 4 years ago
15:56 imirkin_: it hasn't been ati for a while
15:56 karlmag: maybe it was longer
15:56 karlmag: can't remember for sure
15:56 karlmag: (and time flies)
15:57 karolherbst: mhh
15:57 karolherbst: banning a brand is kind of drastic
15:58 karolherbst: maybe the boss just disliked amd cards for no reasons
15:58 karolherbst: this is, as far as I know, the more usual case for "bans"
15:59 karlmag: karolherbst: no, the reason was that we were unable to give stable support on the cards anymore. All of a sudden some cards just stopped working. And sometimes in very weird ways too.
15:59 karlmag: And when you're trying to do support on several hundred computers that's not a thing you can have happen.
15:59 karolherbst: ohh strange
16:00 karlmag: We thought so. Not to mention frustrating.
16:00 imirkin_: karlmag: well, if you're looking for open source support, intel and amd are the way to go.
16:00 karlmag: Yeah, things have changed a bit in recent years I guess.
16:00 imirkin_: and since intel hasn't sold add-on gpu's since the i740, amd is the only way to go
16:01 karlmag: I still have a stack of older nvidias.
16:01 karlmag: true that
16:01 imirkin_: and nouveau works on those. but you have to realize who works on nouveau
16:01 imirkin_: and who works on amd/intel
16:02 imirkin_: people work on amd/intel because they're paid to work on those drivers
16:02 imirkin_: people work on nouveau because it's a fun pastime
16:02 karlmag: Oh, I am cheering on you guys. Make no mistake on that :-)
16:03 imirkin_: (to be fair, there are people who, as part of their employment, contribute to nouveau. but not a whole lot of such people.)
16:04 karolherbst: imirkin_: I bet this won't change for the next 5 years anyway ;)
16:05 imirkin_: which is why i recommend using intel / amd when possible
16:08 karolherbst: imirkin_: much better now
16:08 karlmag: hmm.. I know less about amd than nvidia...
16:08 karolherbst: the driver seems to be harder to understand though
16:08 karolherbst: imirkin_: https://gist.github.com/karolherbst/150d1f9967b77eea0afc
16:10 karlmag: looks like the nvidia 4 monitor support cards are cheaper than ati 3 monitor cards though...
16:13 karolherbst: imirkin_: seems like nvc0_draw_vbo is called pretty often
16:14 karolherbst: will do a trace on lower perf level
16:14 imirkin_: karolherbst: what in there takes time?
16:14 karolherbst: in a non -g trace it is still 10%
16:14 karolherbst: maybe it is just called very often
16:15 karolherbst: vbo_save_playback_vertex_list ?
16:15 karolherbst: mhh
16:15 imirkin_: well, if it's 10%
16:15 imirkin_: that 10% is still split up among a bunch of instructions
16:15 imirkin_: not like 1 instruction in there ;)
16:18 karolherbst: mhhh
16:18 karolherbst: right
16:18 karolherbst: on lower states it is still at the top
16:19 karolherbst: though ioread32 with 5.44% :D
16:19 karolherbst: but yes, nvc0_draw_vbo is somehow the big one
16:19 karolherbst: and _mesa_load_state_parameters with over 5%
16:19 karolherbst: and 7% mutex lock overhead
16:21 karolherbst: vbo_save_playback_vertex_list has 3% in total
16:21 glennk: by non -g i assume -O2 or higher with some -march=something thrown in?
16:22 imirkin_: ok, but it's not spending time in nvc0_draw_vbo
16:22 imirkin_: it's spending time in functions that nvc0_draw_vbo calls
16:22 karolherbst: yeah
16:22 imirkin_: nvc0_draw_vbo doesn't actually do anything on its own iirc
16:22 imirkin_: so... which functions does nvc0_draw_vbo call?
16:23 karolherbst: 98.70%-- st_draw_vbo
16:23 imirkin_: that's the caller of that function
16:23 karolherbst: ohh okay
16:23 imirkin_: not a callee
16:25 karolherbst: let's see how happy the blob is while I mess with the pmus :D
16:25 karolherbst: a bit unhappy
16:26 karolherbst: but cpu load is around 92% with blob
16:27 karolherbst: blob 4k fps, nouveau 1.5k fps :/
16:27 karolherbst: there is potential :D
16:28 karolherbst: I try to get a more detailed output somehow
16:29 karolherbst: ahhhh
16:29 karolherbst: much better
16:29 karolherbst: 81k lines though :/
16:30 karolherbst: imirkin_: perf report -G ;)
16:31 karolherbst: imirkin_: https://gist.githubusercontent.com/karolherbst/150d1f9967b77eea0afc/raw/a3bd378bf4da9f653f72eea4e721631ccc1d9dfc/gistfile1.txt is this better?
16:32 imirkin_: much
16:32 imirkin_: thanks
16:32 karolherbst: I think this is the biggest block
16:33 imirkin_: it's the block that matters
16:33 imirkin_: the other stuff it's difficult to do anything about
16:33 karolherbst: yeah I know
16:33 karolherbst: but it may be a block which is only called like 1% of the time ;)
16:33 karolherbst: was just checking that
16:33 karolherbst: it is around 10% for sure
16:33 imirkin_: hmmmmmm
16:33 imirkin_: ok, this is bad
16:33 karolherbst: I can give you the entire output though :p
16:34 imirkin_: why does __memcpy_avx_unaligned end up in ioread32
16:34 karolherbst: ask glibc
16:34 imirkin_: heh
16:34 karolherbst: I think I have the sources
16:34 karolherbst: wait a sec
16:34 imirkin_: this is a kernel func
16:34 karolherbst: I know, but glibc calls it?
16:35 imirkin_: wellllll
16:35 karolherbst: __memcpy_avx_unaligned is glibc for sure
16:35 imirkin_: glibc just does memory io
16:35 imirkin_: i bet the page faults
16:35 imirkin_: and nouveau does something dirty
16:35 RSpliet: is mmiotrace still on? :-P
16:35 imirkin_: lol
16:36 karolherbst: :p
16:36 karolherbst: no
16:36 karolherbst: :D
16:36 joi: btw, when perf gives you suspicious data, it's worth trying -F option
16:36 RSpliet:imagines filling up a meme with "FAULT ALL THE PAGES!!1"
16:36 karolherbst: I have full preempt kernel by the way
16:38 karlmag:imagines someone with a huge notebook, ripping off one page at a time, pointing at it, yelling "it's your fault", then crumpling it up and tossing it in a pile, going to the next page..
16:38 imirkin_: joi: what's -F?
16:38 joi: changes frequency
16:38 imirkin_: ah
16:39 imirkin_: ok. so this looks like these functions do something dodgy
16:42 imirkin_: hmmmm
16:42 imirkin_: nouveau_scratch_data makes a GART bo and maps it
16:42 imirkin_: that shouldn't have much overhead
16:42 imirkin_: i wonder why it does so many ioreads
16:42 imirkin_: oh hahaha
16:43 imirkin_: it makes 4k buffers
16:43 imirkin_: karolherbst: can you update nouveau_scratch_bo_alloc to make 128k buffers instead of 4k?
16:43 karolherbst: yes
16:43 karolherbst: I've updated the gist by the way: https://gist.githubusercontent.com/karolherbst/150d1f9967b77eea0afc/raw/170fc8cdae74bdebeafc4133fb577ac5a442dad2/gistfile1.txt
16:43 imirkin_: (why 128k you ask? because that's the size of an nvidia large page)
16:44 karolherbst: it should contain now over 80% of the entire cpu load
16:44 karolherbst: there is something even bigger than the last one we looked at
16:44 karolherbst: "67.20%-- st_draw_vbo" same parent
16:46 karolherbst: imirkin_: this thingy? return nouveau_bo_new(nv->screen->device, NOUVEAU_BO_GART | NOUVEAU_BO_MAP, 4096, size, NULL, pbo);
16:46 imirkin_: yes
16:47 imirkin_: just change 4096 to 128k
16:47 imirkin_: errr
16:47 imirkin_: oh i see
16:47 imirkin_: hrmph.
16:47 karolherbst: ?
16:47 imirkin_: that's the alignment
16:48 karolherbst: will something bad happen with 128k?
16:48 RSpliet: imirkin_: can you have 128K pages with non-128K (but 4K) alignment?
16:49 RSpliet: I reckon the pt will have fewer bits per address?
16:49 imirkin_: RSpliet: no
16:49 imirkin_: karolherbst: no, try 128k
16:49 karolherbst: k
16:50 RSpliet: imirkin_: is no an answer to my first or second question? :-P
16:50 imirkin_: RSpliet: sorry, first. 128k pages have to be 128k-aligned
16:50 imirkin_: [or are they 64k pages? don't remember]
16:50 imirkin_: [iirc there's a switch for whether large pages are 64k or 128k]
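[Note: the alignment point above (a large page has to start on a large-page boundary, which is a stricter requirement than 4k alignment) can be sketched as below. This is only an illustration: the helper names and the two size macros are hypothetical and are not part of Mesa or libdrm, and the 128k value is an assumption, since the discussion leaves open whether large pages are 64k or 128k.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for illustration only. */
#define SMALL_PAGE_SIZE (4u * 1024u)
#define LARGE_PAGE_SIZE (128u * 1024u) /* assumed; could be 64k instead */

/* True if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* True if addr is a multiple of align (align must be a power of two). */
static bool is_aligned(uint64_t addr, uint64_t align)
{
    assert(is_pow2(align));
    return (addr & (align - 1)) == 0;
}

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(is_pow2(align));
    return (addr + align - 1) & ~(align - 1);
}
```

[An address such as 0x21000 is 4k-aligned but not 128k-aligned, so under this assumption it could back a small page but not a large one.]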
16:56 imirkin_: the scratch data is allocated in 2M chunks
16:56 imirkin_: and doesn't use the transfer logic, so there's never any waiting
16:57 imirkin_: might be a tall order to ask for 128k-aligned worth of phys mem. dunno.
16:59 RSpliet: if 4K pages are allocated from the start, and 128K pages from the end... shouldn't be a problem right? :-P
16:59 imirkin_: RSpliet: they're allocated that way in vram
16:59 RSpliet: oh right
16:59 imirkin_: RSpliet: but this is for GART
16:59 imirkin_: aka system memory
16:59 RSpliet: I was about to amend yes
16:59 RSpliet: (fwiw, I have a habit of thinking remarkably loud)
16:59 imirkin_: no worries
17:01 RSpliet: we can always start sewing memory together using IOMMU if necessary :-P
17:03 karolherbst: imirkin_: it seems like nothing really changed
17:03 imirkin_: karolherbst: hm ok
17:03 karolherbst: maybe I get less perf now
17:03 imirkin_: karolherbst: another idea:
17:03 imirkin_: in nvc0_cb_bo_push, change nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
17:03 imirkin_: to - 2 instead of the -1
17:04 imirkin_: errr
17:04 imirkin_: -3
17:04 karolherbst: which file?
17:05 imirkin_: nvc0_transfer.c
17:06 karolherbst: got it
17:06 imirkin_: er no. i take it back. -2 is right.
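[Note: a rough sketch of what changing the "- 1" to "- 2" does to a clamped push loop of this shape. It is not the actual nvc0_cb_bo_push code: count_packets, max_packet_len and reserved are hypothetical names, and the real value of NV04_PFIFO_MAX_PACKET_LEN is deliberately not assumed here.]

```c
#include <assert.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/*
 * Count how many packets are needed to push `words` data words when each
 * packet may carry at most (max_packet_len - reserved) of them.
 * `reserved` stands in for the "- 1" / "- 2" being discussed.
 */
static unsigned count_packets(unsigned words, unsigned max_packet_len,
                              unsigned reserved)
{
    assert(reserved < max_packet_len);

    unsigned packets = 0;
    while (words) {
        /* Clamp this packet's payload, as the MIN2() in the log does. */
        unsigned nr = MIN2(words, max_packet_len - reserved);
        words -= nr;
        packets++;
    }
    return packets;
}
```

[Reserving one more word per packet only changes the packet count for uploads that are close to the limit, which fits the later remark that glxspheres probably does not have uniforms that large.]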
17:07 gnurou: imirkin_: I saw that a bug has been opened for your request
17:07 imirkin_: gnurou: awesome
17:07 gnurou: imirkin_: strangely I did not receive your original mail...
17:07 imirkin_: gnurou: i cc'd nouveau@
17:07 imirkin_: are you not on that list?
17:07 gnurou: seems like I am not
17:07 karolherbst: mhhh
17:07 karolherbst: okay
17:08 karolherbst: mhhh, doesn't change much either
17:09 RSpliet: imirkin_: is that the top function because it takes forever, or rather because it's just called so many times?
17:09 imirkin_: RSpliet: maybe both. but i'm fairly sure that -1 is wrong there.
17:11 karolherbst: anyway, neither -2 nor -3 are any better with glxspheres
17:11 imirkin_: yeah, it was an unlikely thing
17:11 imirkin_: i doubt they have such large uniforms there
17:11 RSpliet: gnurou: you're a Tegra man right? ... got Coresight to get some cycle-accurate trace from nvc0_draw_vbo ? :-P
17:13 gnurou: RSpliet: not yet, but probably doable... if I can setup a decent openocd environment :P
17:14 RSpliet: I'd bet on ARM DS-5 rather, there's no mature open source Coresight trace decoder last time I checked
17:18 RSpliet: anyway, that was only half serious; Coresight is a great tool for detecting hot-spots, but I doubt it'll hit __avx_memcpy_unaligned on ARM :-D
17:18 karolherbst: imirkin_: _mesa_load_state_parameters is pretty often called too :/
17:18 karolherbst: and it does some strange things inside the trace
17:19 imirkin_: karolherbst: yeah i dunno what that function does, but i can't imagine it's anything good
17:20 karolherbst: I see
17:20 karolherbst: update_vs_constants is parent
17:21 karolherbst: a lot of ioread32 in that
17:24 karolherbst: this looks strange somehow
17:24 karolherbst: mhhh
17:25 karolherbst: huh
17:26 karolherbst: imirkin_: okay: I have the same core speed at 0f and 0a. with 0a I got like 98% core load, with 0f only 66% core load, slightly higher mem load, but same fps
17:26 airlied: ioread32 is probably reading the timer for profiling
17:26 imirkin_: ohhhhhhh right
17:26 airlied: or something silly like that
17:27 RSpliet: karolherbst: yes
17:27 RSpliet: that's because your cores are stalled less
17:28 RSpliet: or are you referring to your CPU load as "core load"?
17:28 imirkin_: ok, so the problem is that glxspheres uses a client array, which is teh suck
17:28 karolherbst: RSpliet: because of gpu memory speed?
17:28 karolherbst: RSpliet: no, gpu core load
17:28 karolherbst: if I mean cpu, I say cpu ;)
17:29 RSpliet: good
17:29 RSpliet: yes, higher memory speed means load/stores are finished quicker, resulting in a lower CPI
17:29 RSpliet: since you're executing the same number of instructions, your load reduces
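[Note: the CPI argument above can be illustrated with a small calculation. This is a deliberately simplified model that assumes load is just busy cycles divided by available cycles; core_load and its parameters are hypothetical names, and any numbers fed into it are illustrative rather than measurements.]

```c
#include <assert.h>

/*
 * Simplified model: fraction of available core cycles spent busy.
 *   busy cycles      = instructions * cpi
 *   available cycles = clock_hz * seconds
 */
static double core_load(double instructions, double cpi,
                        double clock_hz, double seconds)
{
    assert(clock_hz > 0.0 && seconds > 0.0);
    return (instructions * cpi) / (clock_hz * seconds);
}
```

[With the same instruction count and core clock, dropping the average CPI from 0.98 to 0.66 in this model lowers the load from roughly 98% to roughly 66%, mirroring the 0a versus 0f figures quoted earlier, while the amount of work done per second stays the same.]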
17:29 karolherbst: okay
17:30 karolherbst: that makes sense
17:30 karolherbst: okay, then I assume the driver doesn't push new work fast enough to the gpu
17:30 karolherbst: because of high cpu load, which is very low on 07 pstate
17:31 karolherbst: 40% at 07, 94% at 0a, 96% at 0f
17:37 karolherbst: imirkin_: anything else I could try out?
17:38 glennk:suggests heaven basic at a low resolution
17:39 karolherbst: but heaven doesn't produce a cpu bottleneck I think :/
17:39 karolherbst: for heaven the gpu core is pretty much at full load all the time
17:40 karolherbst: so, I think the compiler only has to be improved to increase perf there
17:43 RSpliet: karolherbst: and a pile of other tiny params that could make a difference
17:43 karolherbst: yeah okay
17:44 RSpliet: (like: do we configure the LLC properly? :-D)
17:44 karolherbst: like zcull? :)
17:44 RSpliet: like zcull!
17:44 karolherbst: :p
17:44 karolherbst: what is this LLC thingy?
17:44 karolherbst: is this something I have to poke and with the right value I get x2 perf?
17:44 RSpliet: last-level cache
17:44 karolherbst: :D
17:44 karolherbst: ohhh
17:45 RSpliet: one hopes it configures itself, but you never know
17:45 karolherbst: I prefer easy tasks :p
17:45 RSpliet: like insn scheduling?
17:45 glennk: karolherbst, which is why i said low resolution, like 512x384...
17:45 karolherbst: RSpliet: right :D
17:46 karolherbst: glennk: I could try
17:47 karolherbst: glennk: 99.9% load at 640x360
17:47 karolherbst: 58% cpu
17:48 glennk: what gpu is this? a potato?
17:48 karolherbst: ohh 770m
17:48 karolherbst: ohh wait
17:48 karolherbst: 07 pstate
17:48 karolherbst: I already wondered why I only get like 65fps :D
17:49 karolherbst: 150 fps now
17:49 karolherbst: much more wow
17:49 karolherbst: okay , 83% gpu core load, 10.5% gpu mem load, 7.7% pcie load, 114% cpu load
17:49 glennk: that's more in line with what i'd expect, i get something like that on a 5850 at that res and it's around 70% gpu load
17:49 karolherbst: okay
17:50 karolherbst: should I trace heaven then?
17:52 glennk: it probably does a few more state changes and draw calls per frame than glxspheres
17:52 karolherbst: how can I launch it without that stupid launcher? :D
17:54 glennk: copy the exe line from htop and run it?
17:54 karolherbst: tried it already
17:54 karolherbst: black screen
17:54 glennk: +the env
17:54 karolherbst: now it gets messy
17:55 karolherbst: ohh
17:55 karolherbst: actually I can simply start the launcher under perf
17:55 karolherbst: it will pick up the child processes, too
17:55 glennk: can kill the launcher once the app has started
17:55 karolherbst: right
17:56 karolherbst: mhh this looks boring somehow
17:57 karolherbst: nearly nothing inside nouveau
17:57 karolherbst: 0.38% biggest entry
17:57 karolherbst: teximage
17:58 glennk: probably a bunch of waits
17:58 karolherbst: most of the stuff is inside the unigine engine
17:58 glennk: you can grab a trace and run that instead
17:58 karolherbst: still only 85% gpu core load
17:59 karolherbst: right
18:00 karolherbst: mhhh
18:01 karolherbst: bad luck
18:01 karolherbst: okay, apitrace can't handle that
18:02 glennk: oh right, need to run the binary directly
18:02 karolherbst: I have a trace
18:02 karolherbst: this ain't the problem
18:02 karolherbst: but apitrace can't replay it
18:02 karolherbst: the loading screen works fine though
18:03 karolherbst: I also get the fps counter
18:03 karolherbst: and the menu buttons at the top
18:03 karolherbst: but not the scene
18:03 glennk: oh yeah, its the mapped buffer extension, need to disable that or the vertexes get all messed up
18:03 karolherbst: bunch of those: 384686: warning: glGetError(glUniform4fv) = GL_INVALID_OPERATION
18:03 karolherbst: which is the ext string?
18:04 glennk: GL_ARB_buffer_storage i think?
18:05 glennk: GL_ARB_map_buffer_range sorry
18:05 glennk: buffer storage ext didn't exist when heaven was released
18:06 karolherbst: ohh wait, the menu is gone :D
18:06 karolherbst: could be my fault though
18:06 karolherbst: don't know
18:06 karolherbst: don't care yet
18:07 karolherbst: wow
18:07 karolherbst: there is some flickering now
18:08 karolherbst: okay, that didn't work :/
18:12 glennk: xonotic, tesseract.gg are a couple that do a lot of draw calls
18:13 karolherbst: what about pixmark_piano?
18:13 karolherbst: ohh no
18:13 karolherbst: that is mainly gpu core
18:13 karolherbst: maybe it does draw calls
18:14 karolherbst: *many
18:14 imirkin: doom3
18:15 imirkin: apparently does infinity draw calls
18:17 glennk: karolherbst, pixmark piano probably draws a single full screen quad, its all raymarching in the pixel shader
18:17 karolherbst: yeah think so too :/
18:18 karolherbst: ohh wait
18:18 karolherbst: I made the window as small as possible
18:18 karolherbst: no 99.99% gpu core load anymore :D
18:18 karolherbst: "only" 98%
18:18 karolherbst: and 105% cpu
18:18 glennk: don't make it too small or the gpu will stall itself horribly
18:19 karolherbst: ohh it can't stall there
18:19 karolherbst: how?
18:19 glennk: think about it for a bit
18:19 karolherbst: memory load is under 1%
18:19 karolherbst: so I don't think there is anything going on except in the gpu core
18:21 karolherbst: gputest triangle is kind of strange
18:21 karolherbst: 66% gpu core/ 8% gpu mem / 55% pcie / 63% cpu
18:50 karolherbst: :)
18:50 karolherbst: first userspace reclocking daemon done :)
18:50 karolherbst: and it "kind of" works
18:50 karolherbst: although it takes a while until the gpu clocks up
21:40 lindylex: I have a Mac PPC G4 with the following video card: VGA compatible controller [0300]: NVIDIA Corporation NV11 [GeForce2 MX/MX 400] [10de:0110] (rev a1). I am trying to install Debian Wheezy and it boots up to a black screen. This is my Xorg.0.log file contents. http://pastebin.com/xBS3Exvi