09:27 RSpliet: skeggsb: do you take github pull requests for kernel dev, or do you still prefer mailing list communication?
10:08 mupuf: RSpliet: please stick to the ML :)
10:08 RSpliet: mupuf: I'd love to, but my free e-mail provider is being a bit of a <insert profanity here> about me using their SMTP server to send patches
10:09 mupuf: ...
10:09 mupuf: really? :o
10:09 RSpliet: yeah... if you want something done right, better do it yourself
10:09 mupuf: ... or send emails through gmail :s
12:53 DebianLinuxero: Hello. Any dev here?
12:54 imirkin: just ask your question, someone will see it and reply in due time
12:55 DebianLinuxero: It's only to say I finally could do the requested test in this bug :
12:56 DebianLinuxero: https://bugs.freedesktop.org/show_bug.cgi?id=95044
12:59 felipebalbi: imirkin: okay, I'll try that
13:23 RSpliet: DebianLinuxero: activity on bugs is logged on the mailing list. Thanks, it's helpful
13:24 RSpliet: I'm a bit short on time, so I haven't been able to triple-check trace and generate a patch, but it's high on my todo list
13:25 RSpliet: (it's not really my day-time job ;-) )
13:27 DebianLinuxero: RSpliet: Thanks for the reply.
14:20 imirkin_: hakzsam: i think my patch is fine... the generated code you sent me looks ok
14:20 imirkin_: hakzsam: it's not ideal, since preferably that second mov wouldn't be there, but... meh
14:21 imirkin_: hakzsam: i think the source also has to be a merged thing, otherwise the RA doesn't guarantee sequential alloc or something, so it inserts constraint movs
14:21 imirkin_: hakzsam: but like i said... i wouldn't worry about it
14:47 hakzsam: imirkin_, right, after a second look, both your patch and the generated code are fine, but the latter looks ugly :)
14:47 hakzsam: anyway, that will fix the issue
14:50 imirkin_: hakzsam: i think it's fixable, but dunno that it's worth fixing
14:51 hakzsam: well, that doesn't matter for now :)
14:53 karolherbst: mind giving me some comments about my dual-issue pass? It doesn't seem to harm anything significantly and improves performance in some gputest benchmarks: https://github.com/karolherbst/mesa/commit/8e18f33893cde040776328732948355f8b6a067f
14:57 imirkin_: karolherbst: needs comments about wtf it's trying to do
14:57 imirkin_: i.e. what algorithm it's implementing
14:57 karolherbst: I see
14:58 imirkin_: like a nice paragraph in plain english
14:58 karolherbst: well if i and i->next can't be dual issued, find a later instruction which can be dual issued with i and move it next to i :)
14:59 karolherbst: (inside the same BB)
15:03 mupuf: karolherbst: make sure you do not increase the compilation time too much too
15:03 karolherbst: mupuf: yeah, I am aware of that
15:03 karolherbst: mupuf: I also have to limit this pass to archs with dual issuing, too
15:03 mupuf: ah ah, that would be good, yeah
15:03 karolherbst: but if you can't dual issue at all, the perf is bad anyway (or this isn't even possible anyway)
15:04 karolherbst: usually only 3 or 4 next instructions are checked
15:04 karolherbst: so the complexity is something around 4n (n: instruction count)
15:04 karolherbst: (on average)
15:05 karolherbst: mupuf: and if we have perfect dual issuing already, it is close to n
15:05 imirkin_: Horrorcat: what kernel are you on?
15:06 mupuf: karolherbst: does not sound too crazy
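The pass karolherbst describes can be sketched roughly as follows. This is only an illustration of the idea as stated in the chat, not the code from the linked commit: `can_dual_issue` and `can_move_up` are hypothetical predicates supplied by the caller, the `window` of 4 mirrors the "3 or 4 next instructions" remark, and skipping past a formed pair is an assumption.

```python
def pair_for_dual_issue(block, can_dual_issue, can_move_up, window=4):
    """Greedy pairing inside a single basic block (illustrative sketch).

    block: instructions in program order (any objects).
    can_dual_issue(a, b): hypothetical predicate, True if b may issue with a.
    can_move_up(insns, src, dst): hypothetical predicate, True if insns[src]
        may be hoisted to index dst without breaking a dependency.
    window: how many later instructions are inspected per candidate.
    """
    insns = list(block)
    i = 0
    while i + 1 < len(insns):
        if not can_dual_issue(insns[i], insns[i + 1]):
            # Search a bounded number of later instructions for a partner.
            last = min(len(insns), i + 2 + window)
            for j in range(i + 2, last):
                if can_dual_issue(insns[i], insns[j]) and can_move_up(insns, j, i + 1):
                    # Move the partner directly behind instruction i.
                    insns.insert(i + 1, insns.pop(j))
                    break
        # Assumption: once a pair exists, continue after both halves.
        i += 2 if can_dual_issue(insns[i], insns[i + 1]) else 1
    return insns
```

Because the search is bounded by `window`, the number of `can_dual_issue` checks grows linearly with the instruction count, which matches the "around 4n" estimate; a block that is already perfectly paired is walked in about n/2 steps.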
15:06 Horrorcat: 4.5.0-1-amd64 (debian testing)
15:06 imirkin_: Horrorcat: let me rephrase... try to grab kernel 4.5, and then separately grab this: https://github.com/skeggsb/nouveau/
15:06 imirkin_: then "cd drm; make -j8"
15:06 imirkin_: that should make you a new nouveau.ko which you can insmod and should have GM108 support
15:06 Horrorcat: thanks for the instructions, I’ll follow them tomorrow or something, I’m in a meeting right now
15:07 karolherbst: mupuf: improves performance in pixmark_piano by 0.86% :)
15:07 Horrorcat: that sounds excellent
15:07 karolherbst: mupuf: and volplosion by 0.25%
15:07 imirkin_: karolherbst: statistically significant?
15:07 mupuf: well, that is not nothing
15:07 karolherbst: imirkin_: well, pixmark_piano has no variance :D
15:08 karolherbst: n=6 +-0.00%
15:08 imirkin_: lol
15:08 karolherbst: same for volplosion
15:08 mupuf: frames take so long to render that you get discretisation
15:08 imirkin_: hopefully it's not using GL_TIMESTAMP stuff for anything
15:08 karolherbst: I even had n=20 for piano
15:08 karolherbst: and got +-0.00%
15:09 mupuf: imirkin_: nope, it counts the number of frames it rendered in $seconds
15:09 imirkin_: ok good
15:09 mupuf: this means that the score is linear with the execution time :D
15:10 mupuf: ezbench normalizes that to produce an average FPS reading
15:10 mupuf: because, that's what people expect anyway
15:10 mezo: Horrorcat: just a hint, because i had same problem on debian ;) instead of cd drm; make u have to do
15:10 mezo: cd /usr/src/linux-headers-4.5.0-1-amd64/
15:10 mezo: make M=/home/daniel/Downloads/nouveau/drm/nouveau KCPPFLAGS="" modules
15:11 mezo: for example
15:11 mezo: otherwise it doesnt work
15:11 mezo: took me hours to find out
15:11 imirkin_: hmmm... there should be an easier way
15:11 karolherbst: mupuf: mhh is there some inaccuracy in getting the fps value of pixmark_piano? cause it looks a bit odd
15:12 karolherbst: mupuf: 4.733, 4.666 and 4.633 are the values for all commits
15:12 mupuf: well, sure, the value is discrete
15:12 mupuf: the difference is likely that you computed one more frame
15:12 karolherbst: ahh right
15:12 mupuf: you need longer execution times if you want to see more finely
15:12 imirkin_: mezo: have a look at https://github.com/skeggsb/nouveau/blob/master/drm/Makefile
15:12 karolherbst: mupuf: yeah, makes sense
15:13 karolherbst: mupuf: or lower resolution
15:13 imirkin_: mezo: should be able to do "make LINUXDIR=/usr/src/linux-headers-4.5.0-1-amd64/" in the drm dir
15:13 mupuf: karolherbst: lower resolution would help, but you may not hit the same bottlenecks
15:13 Horrorcat: ah thanks mezo
15:13 mupuf: in this case, since there are no textures, that should be OK
15:13 karolherbst: mupuf: there is no other bottleneck than shader computation in it :D
15:14 mupuf: karolherbst: how long do you run it?
15:14 karolherbst: 30 seconds
15:14 Horrorcat: mezo: where do I get that directory from? I don’t have /usr/src/linux-headers-*
15:14 mezo: u need to install kernel headers
15:15 mupuf: karolherbst: well, all the results you show have one or two frames rendered difference
15:15 mezo: linux-headers-amd64 for example
15:15 mezo: imirkin_: ok, nice to know.
15:15 mupuf: so, I would suggest lowering the resolution and increasing the run time to 60 seconds
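The discretisation mupuf points out can be checked with a little arithmetic. The sketch below assumes the reported score is simply frames divided by run time, that the run lasted 30 seconds as karolherbst says, and that the printed FPS values were truncated (hence the rounding); the function names are made up for illustration.

```python
def frames_rendered(fps, run_seconds):
    """Recover the whole-frame count behind a reported FPS value."""
    return round(fps * run_seconds)


def relative_step(fps, run_seconds):
    """Relative change in the score caused by rendering one more frame."""
    return 1.0 / frames_rendered(fps, run_seconds)


if __name__ == "__main__":
    for fps in (4.733, 4.666, 4.633):
        # The three values quoted in the chat, for a 30 second run.
        print(fps, frames_rendered(fps, 30), f"{relative_step(fps, 30):.2%}")
    # Doubling the run time halves the smallest visible difference.
    print(f"{relative_step(4.666, 60):.2%}")
```

Under those assumptions the three scores correspond to 142, 140 and 139 frames, so a single frame is worth roughly 0.7% of the score at 30 seconds and about half that at 60 seconds.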
15:15 karolherbst: mezo: yeah, but each run at the same commit has the same number of frames rendered, and with lower resolution I had the same results
15:16 mupuf: same result as higher resolution? WTF?
15:16 mupuf: is it a vertex test or what?
15:17 karolherbst: mupuf: no, just a big fragment shader
15:17 karolherbst: one draw call
15:17 mupuf: then it makes no sense
15:17 karolherbst: well I meant I get the same improvements
15:17 karolherbst: at higher fps
15:18 karolherbst: well, I will make a new run with higher accuracy anyway later
15:18 mezo: "make LINUXDIR=/usr/src/linux-headers-4.5.0-1-amd64/" works. just tested.
15:21 imirkin_: mezo: that makefile has a few heuristics for determining the linux source location... this helps it out :)
15:22 mezo: ye, the problem was, it used -common instead of -amd64 if i remember right
15:22 imirkin_: the problem is that it expects your system to have semi-usual things configured, but instead you're using debian :p
15:22 imirkin_: normally /lib/modules/$version/source points to the tree
15:23 karolherbst: I really hoped that no distribution would try to be smart and just do it the easy way ...
15:26 mezo: source -> /usr/src/linux-headers-4.5.0-1-common
15:26 imirkin_: ok. so take it up with your package maintainers.
15:48 Horrorcat: imirkin_: /home/horazont/Builds/nouveau/drm/nouveau/nouveau_acpi.c:317:2: error: too many arguments to function ‘vga_switcheroo_register_handler’
15:48 Horrorcat: do you need a full paste?
15:52 imirkin_: hmmmm... i guess it depends on something introduced in 4.6 =/
15:57 mezo: what is the difference between karolherbst/nouveau and skeggsb/nouveau?
15:57 imirkin_: one is karol's tree, one is ben's?
15:57 imirkin_: ben's the nouveau maintainer, btw.
16:01 Horrorcat: imirkin_: :(
16:03 imirkin_: Horrorcat: in that repo, do "git revert 3b3ec4e10ade6"
16:06 Horrorcat: imirkin_: built, copied the nouveau.ko over the one shipped by debian, rmmod, modprobe
16:06 Horrorcat: how do I check that it works? ;-)
16:06 Horrorcat: it is not listed in xrandr --listproviders
16:06 imirkin_: Horrorcat: pastebin dmesg
16:10 Horrorcat: imirkin_: https://paste.fedoraproject.org/359549/60061114/
16:10 imirkin_: all's well.
16:10 imirkin_: DRI_PRIME=1 glxinfo
16:10 imirkin_: [do you have a dri3 ddx?]
16:11 Horrorcat: OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 530 (Skylake GT2)
16:12 imirkin_: i guess not :)
16:12 Horrorcat: :-)
16:12 imirkin_: i would highly recommend enabling DRI3 and making sure that the modesetting ddx does NOT add the nvidia gpu as a secondary
16:12 imirkin_: basically you want to add an Option "DRI" "3" to the device section
16:13 imirkin_: and Option "AutoAddGPU" "false" to the ... ServerFlags? something like that
16:13 Horrorcat: Thanks for your help so far, I gotta go now. I’ll come back tomorrow or something
16:23 urmet: GM108 and PRIME testing going on here? sweet
16:26 imirkin_: urmet: ben's tree has GM108 support now
16:27 imirkin_: he fixed up some of the breakage due to weird internal geometry
16:30 urmet: i'll try to try it out in a few hours. have a laptop with skylake+840M
17:05 Horrorcat: imirkin_: hm, can I somehow dump the current "autogenerated"(?) xorg config? I don’t have any persistent xorg config except the one I use to force my keyboard layout
17:06 Horrorcat: otherwise, I don’t know how to set these options
17:11 Horrorcat: ah, there we go
17:12 imirkin_: Horrorcat: you can just add little bits to it
17:12 Horrorcat: yes, but I have no idea how it’s supposed to look. Anyways, I found `X :2 -configure` online, which gave me something to start from
17:12 imirkin_: Horrorcat: like in /etc/X11/xorg.conf.d/10-device.conf - just stick a tiny device section in there
17:12 imirkin_: nah, you want nothing to do with the output of that
17:13 Horrorcat: so, like, just Section "Device"\n Option "DRI" "3"\nEndSection?
17:13 imirkin_: Horrorcat: http://hastebin.com/busohohiri.cmake - stick that in /etc/X11/xorg.conf.d/00-foo.conf
17:13 Horrorcat: hm, the X -configure gave me a Identifier "Card0" for the intel
17:13 Horrorcat: still use it?
17:14 Horrorcat: https://paste.fedoraproject.org/359591/60445914/ that’s what I have now
17:17 Horrorcat: now I get "OpenGL renderer string: Gallium 0.4 on NV118" from DRI_PRIME=1 glxgears
17:17 Horrorcat: s/glxgears/glxinfo/
17:17 Horrorcat: I’m a bit suspicious though as glxgears works, even though I don’t have any compositing
17:21 Horrorcat: hm, tried some wine game, performance is horrible with DRI_PRIME=1, but fine without.
17:24 imirkin_: Horrorcat: sounds like everything's working then :)
17:24 imirkin_: with dri3 you don't need compositing
17:24 Horrorcat: ah, nice
17:25 imirkin_: the gpu is in its lowest pstate, so you get horrid perf, as expected
17:25 Horrorcat: ah, nice
17:25 Horrorcat: I prefer lower pstate over not working power management ;-)
17:25 imirkin_: i think karolherbst might have some patches to enable super-duper-experimental reclocking on maxwell
17:25 Horrorcat: I’d prefer not to fry my hardware. but if the patches are reasonably safe, anyone can ping me for quick tests
17:26 imirkin_: highly unlikely to fry
17:28 Horrorcat: well, karolherbst, feel free to ping me if you need some testing for anything on that gpu
17:33 karolherbst: it isn't super-duper-experimental anymore :D
17:33 karolherbst: Horrorcat: mind telling me your kernel version, distribution and GPU?
17:33 karolherbst: ohh maxwell
17:33 imirkin_: but fairly likely to hang your gpu
17:33 karolherbst: yeah well
17:33 karolherbst: untested
17:33 Horrorcat: karolherbst: 02:00.0 3D controller [0302]: NVIDIA Corporation GM108M [GeForce 940MX] [10de:134d] (rev a2), 4.5.0-1-amd64, debian-testing
17:34 karolherbst: ohh DDR3 nice
17:34 karolherbst: never tested those I think
17:34 Horrorcat: well, it works
17:34 karolherbst: Horrorcat: if you are up to it, you can try reclocking it :)
17:34 karolherbst: Horrorcat: yeah, but no memory reclocking, right?
17:34 Horrorcat: not right now, but if you link me patches...
17:34 Horrorcat: it’s horribly slow if you mean that ;-)
17:34 Horrorcat: easily outperformed by the intel
17:35 karolherbst: right :)
17:35 karolherbst: Horrorcat: did you actually reclock through the pstate file?
17:35 Horrorcat: I built from https://github.com/skeggsb/nouveau/, with 3b3ec4e10ade69abcf0e10f46eb63293c7949698 reverted
17:35 Horrorcat: no, did nothing
17:35 karolherbst: well I think even against nvidia it is a close call
17:35 karolherbst: Horrorcat: yeah, let me update my maxwell branch
17:35 Horrorcat: how would I reclock manually?
17:36 karolherbst: Horrorcat: /sys/kernel/debug/dri/1/pstate
17:37 Horrorcat: so I would for example echo '0a' > /sys/kernel/debug/dri/1/pstate?
17:39 imirkin_: yes. except i'm pretty sure reclocking is disabled on maxwell.
17:39 Horrorcat: that makes sense, I did not see a difference :-)
17:43 karolherbst: Horrorcat: cat the file
17:43 karolherbst: Horrorcat: the last line should change
17:43 karolherbst: well
17:43 karolherbst: the core part
17:43 karolherbst: imirkin_: core reclocking is enabled afaik
17:43 Horrorcat: the line I selected has "AC DC *" suffixed afterwards
17:43 karolherbst: right
17:43 imirkin_: check the last line
17:44 Horrorcat: ak
17:44 Horrorcat: *ah
17:44 Horrorcat: yes, memory clock stays the same, core clock increased
17:44 imirkin_: ah cool
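The manual reclocking steps from this exchange (write a state id to the pstate file, then read it back and look at the last line) could be scripted along these lines. This is a sketch under assumptions: the path is the one quoted in the chat and may differ per machine, writing needs root, and the exact layout of the file is not specified here beyond "the selected line carries a `*`" and "the last line shows the current clocks".

```python
from pathlib import Path

# Path quoted in the chat; the DRI index may differ on other systems.
DEFAULT_PSTATE_FILE = "/sys/kernel/debug/dri/1/pstate"


def request_pstate(state, pstate_file=DEFAULT_PSTATE_FILE):
    """Write a state id such as '0a' or '0f', like `echo 0a > pstate`."""
    Path(pstate_file).write_text(state + "\n")


def _lines(pstate_file):
    text = Path(pstate_file).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def selected_line(pstate_file=DEFAULT_PSTATE_FILE):
    """Return the line marked with a trailing '*', or None if there is none."""
    for line in _lines(pstate_file):
        if line.endswith("*"):
            return line
    return None


def current_clocks(pstate_file=DEFAULT_PSTATE_FILE):
    """Return the last non-empty line, which per the chat shows current clocks."""
    lines = _lines(pstate_file)
    return lines[-1] if lines else None
```

Comparing `current_clocks()` before and after `request_pstate("0a")` is the scripted equivalent of what Horrorcat does by hand: the core value changes while the memory value stays put.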
17:44 karolherbst: Horrorcat: https://github.com/karolherbst/nouveau
17:44 karolherbst: Horrorcat: branch: maxwell_reclocking
17:45 karolherbst: I just hope regarding DDR3 nothing changed compared to kepler
17:45 karolherbst: I just checked GDDR5
17:45 karolherbst: well "nothing"
17:45 Horrorcat: I’m away for today, I just started the build
17:45 karolherbst: no problem
17:45 Horrorcat: maybe I’ll have a few seconds to test the module today, otherwise I’ll do it tomorrow :-)
17:46 karolherbst: just make sure to report back on how stable it is on 0f :D
17:46 karolherbst: but I am sure it will still be slower than intel
17:46 Horrorcat: will do :-)
17:46 karolherbst: Horrorcat: and don't check with glxgears :D
17:46 Horrorcat: I’ll try with X3TC.exe ;-)
17:46 karolherbst: ahh good
17:46 Horrorcat: (which is currently unusable with the nvidia)
17:46 karolherbst: well
17:47 karolherbst: at least you get a faster pcie link too currently
17:47 Horrorcat: (if it still is I’ll find something different, maybe W40k.exe or something)
17:47 Horrorcat: afk for real now
17:47 karolherbst: this gives like a solid 5% speed improvement for prime offloading
20:58 karolherbst: mupuf: k, now REing those tables a bit :)
21:11 karolherbst: mupuf: mhh I don't think those new tables have any effect :/
21:13 karolherbst: ohh wait
21:13 karolherbst: I forgot to configure the gpu for faking the vbios
21:13 karolherbst: silly me
21:15 karolherbst: okay, nvidia gets pissed after the table is messed up :)
21:15 karolherbst: yay
21:15 mupuf: ah ah
21:16 karolherbst: yay!
21:16 mupuf: well, that is promising :D
21:16 karolherbst: lowered the temps
21:16 karolherbst: now downclock at 79°C :)
21:16 karolherbst: I set 72 and 80 °C
21:17 karolherbst: let's see what nvidia does
21:17 karolherbst: ohh
21:17 karolherbst: downclocking starts at 74°C
21:17 karolherbst: mupuf: ok, so yeah, I think it is this table :)
21:18 karolherbst: upclocking starts at 67°C again
21:18 mupuf: +2/-2?
21:18 karolherbst: no, I don't think so
21:19 karolherbst: it downclocks faster at 76°C
21:23 mupuf: great, are there any tables left in bit P we do not know the meaning of?
21:26 karolherbst: yeah, like 5
21:26 karolherbst: or do you mean meaning as in "we have a clue"?
21:26 karolherbst: there is still the new FAN table
21:26 karolherbst: but besides that? there are some totally unknown
21:28 karolherbst: okay, at 74°C it was down in 0.24 seconds, at 75° it only took 0.13 seconds
21:28 karolherbst: but this may be due to inaccurate readings
21:28 karolherbst: at 76° it jumps instantly
21:28 karolherbst: k, let me mess with the table a bit
21:28 karolherbst: everything unknown, set to 0 :)
21:29 karolherbst: k, nvidia is pissed by this
21:31 mupuf: karolherbst: are you using nvaforcetemp?
21:31 karolherbst: it is just bad that this entry is like 0x1c bytes big
21:31 karolherbst: mupuf: yeah
21:31 mupuf: going to be a ton of fun to RE....
21:32 karolherbst: yep
21:35 karolherbst: fun
21:35 karolherbst: now nvidia doesn't want to start at specific temperatures
21:41 karolherbst: mhh funny, the second temperature doesn't seem to do much
21:41 karolherbst: the first one controls when to clock down
21:41 karolherbst: this I am sure of
21:45 karolherbst: mupuf: .... this entry looks a lot like those voltage map entries.... I fear the worst now
21:47 karolherbst: why u do dis nvidia...
21:47 karolherbst: byte 0 is again some mode flag
21:47 karolherbst: switching it to 0 changes the behaviour slightly
21:47 karolherbst: instead of the drop from cstate 16 => 0
21:47 karolherbst: I get stepped downclocks even between 16 and 0
21:49 karolherbst: and byte 1 is ff
21:50 karolherbst: and I am sure byte 2 and byte 6 are min/max values and the other stuff is something stupid
21:50 karolherbst: mhh
21:50 karolherbst: no that doesn't make that much sense
21:50 karolherbst: let me see
21:54 karolherbst: ha
21:54 karolherbst: now I managed to disable the temperature downclock even above the temperature :)
21:55 karolherbst: 0xc8
21:55 karolherbst: strange value
21:55 karolherbst: 200 as decimal
21:55 karolherbst: :)
21:59 karolherbst: okay
21:59 karolherbst: this reg somewhat controls the speed of the downclock
22:00 karolherbst: 0x8 and 0x9 as a 16bit value
22:01 karolherbst: :D
22:01 karolherbst: nice
22:01 karolherbst: one cstep a second now
22:04 karolherbst: it goes both ways though
22:09 mupuf: does not sound too insane so far
22:09 mupuf: they had something like that for the fan management too
22:09 mupuf: how often one should poll and how much should we increase?
22:11 karolherbst: mhh could be?
22:11 karolherbst: I just know that there might be a temp->duty->rpm mapping in the table
22:13 mupuf: I doubt there would be such a thing since a PID controller would achieve the same
22:13 karolherbst: maybe, but it looks like it and is consistent across all vbios I saw
22:16 karolherbst: mhh "speed 200" is default in my vbios
22:16 karolherbst: maybe one cstep per second?
22:17 karolherbst: yep
22:18 karolherbst: 200 means the interval in msec
22:18 karolherbst: and 1000 means one downclock each second
22:18 karolherbst: where 200 means one downclock each 200 msecs :)
22:18 karolherbst: and downclock doesn't mean one cstep though, it is just what nvidia thinks is smart I guess
22:19 RSpliet: hehe, run a training round on their "when and how much to download" neural network? Call a member of the helpdesk to ask for help? :-P
22:19 RSpliet: *downclock
22:20 karolherbst: :D
22:20 karolherbst: well
22:20 karolherbst: nvidia seems to skip some csteps
22:20 karolherbst: no idea why
22:21 RSpliet: yet ;-)
22:22 karolherbst: course... if the first byte is 0, the 0x8-0x9 bytes aren't the interval speed anymore...
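What has been worked out about this table entry so far can be written down as a small decoder. Everything here is a sketch of the findings in the chat, not a documented format: the entry is taken to be 0x1c bytes, byte 0 a mode flag, and bytes 0x8-0x9 a 16-bit interval in milliseconds only when the mode byte is non-zero. Little-endian byte order and the function name are assumptions.

```python
import struct

ENTRY_SIZE = 0x1C  # entry size mentioned in the chat


def downclock_interval_ms(entry):
    """Return the downclock interval in ms, or None if it is not applicable.

    Assumptions: offsets are relative to the start of the entry and the
    16-bit value at 0x8-0x9 is stored little-endian.
    """
    if len(entry) != ENTRY_SIZE:
        raise ValueError(f"expected {ENTRY_SIZE} bytes, got {len(entry)}")
    mode = entry[0]
    if mode == 0:
        # With mode 0 the chat reports bytes 0x8-0x9 mean something else.
        return None
    (interval,) = struct.unpack_from("<H", entry, 0x8)
    return interval


if __name__ == "__main__":
    entry = bytearray(ENTRY_SIZE)
    entry[0] = 0x01        # any non-zero mode, chosen for illustration
    entry[0x8] = 0xC8      # the default value seen in the chat
    print(downclock_interval_ms(bytes(entry)))  # 0xc8 -> 200
```

With the default 0xc8 this reads as one downclock step every 200 ms, and writing 1000 into the same field gives the one-step-per-second behaviour observed above.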
22:36 karolherbst: mupuf: super odd situation I have here
22:36 karolherbst: mupuf: at 95°C it clocks higher than at 1°C now
22:39 mupuf: ah ah ah
22:41 karolherbst: any idea what this might be?
22:41 karolherbst: I changed a 0x12 to 0x5
22:43 mupuf: hmm
22:43 karolherbst: RSpliet: you are right, now I got nvidia to use every cstep :D
22:43 karolherbst: this looks so nice :)
22:43 mupuf: karolherbst: I would say, collect more data for the stuff you do not know what it does
22:43 mupuf: and plot it
22:43 karolherbst: right
22:43 mupuf: this should be three dimensions
22:43 karolherbst: but first I try to find the obvious stuff
22:44 karolherbst: still like 15 bytes to go
22:44 karolherbst: https://gist.github.com/karolherbst/38a57114231f148a8beb11a8f2998073 :)
22:44 karolherbst: first is time in µs
22:44 mupuf: yeah, if you change one thing at a time, you may be able to fit it in 3 dimensions
22:44 karolherbst: second last cstate
22:45 mupuf: hehe
22:45 mupuf: time for me to sleep, sorry
22:45 karolherbst: mhhh I changed 0x30 to 0x20
22:45 karolherbst: maybe MHz down per check?
22:45 karolherbst: this would fit
22:45 mupuf: well, MHz would be a good idea, yeah
22:46 mupuf: have fun!
22:46 karolherbst: mhh
22:46 karolherbst: thanks
22:46 karolherbst: but it doesn't work
22:46 karolherbst: 0x10 has bigger steps now
22:46 karolherbst: this byte most likely contains some sort of flags
22:46 karolherbst: or something odd
23:15 karolherbst: shit
23:15 karolherbst: I think it is really a formula entry like in the vmap table
23:15 karolherbst: and the input is time since over temp threshold
23:17 karolherbst: and output might be something like clock down by ... MHz