09:27RSpliet: skeggsb: do you take github pull requests for kernel dev, or do you still prefer mailing list communication?
10:08mupuf: RSpliet: please stick to the ML :)
10:08RSpliet: mupuf: I'd love to, but my free e-mail provider is being a bit of a <insert profanity here> about me using their SMTP server to send patches
10:09mupuf: really? :o
10:09RSpliet: yeah... if you want something done right, better do it yourself
10:09mupuf: ... or send emails through gmail :s
12:53DebianLinuxero: Hello. Any dev here?
12:54imirkin: just ask your question, someone will see it and reply in due time
12:55DebianLinuxero: It's only to say I finally could do the requested test in this bug :
12:59felipebalbi: imirkin: okay, I'll try that
13:23RSpliet: DebianLinuxero: activities on bugs are logged in the mailing list. Thanks, it's helpful
13:24RSpliet: I'm a bit short on time, so I haven't been able to triple-check trace and generate a patch, but it's high on my todo list
13:25RSpliet: (it's not really my day-time job ;-) )
13:27DebianLinuxero: RSpliet: Thanks for the reply.
14:20imirkin_: hakzsam: i think my patch is fine... the generated code you sent me looks ok
14:20imirkin_: hakzsam: it's not ideal, since preferably that second mov wouldn't be there, but... meh
14:21imirkin_: hakzsam: i think the source also has to be a merged thing, otherwise the RA doesn't guarantee sequential alloc or something, so it inserts constraint movs
14:21imirkin_: hakzsam: but like i said... i wouldn't worry about it
14:47hakzsam: imirkin_, right, after a second look, both your patch and the generated code are fine, but the latter looks ugly :)
14:47hakzsam: anyway, that will fix the issue
14:50imirkin_: hakzsam: i think it's fixable, but dunno that it's worth fixing
14:51hakzsam: well, that doesn't matter for now :)
14:53karolherbst: mind giving me some comments about my dual issuing pass? It doesn't seem to harm anything significantly and improves performance in some gputest benchmarks: https://github.com/karolherbst/mesa/commit/8e18f33893cde040776328732948355f8b6a067f
14:57imirkin_: karolherbst: needs comments about wtf it's trying to do
14:57imirkin_: i.e. what algorithm it's implementing
14:57karolherbst: I see
14:58imirkin_: like a nice paragraph in plain english
14:58karolherbst: well if i and i->next can't be dual issued, find a later instruction which can be dual issued with i and move it next to i :)
14:59karolherbst: (inside the same BB)
15:03mupuf: karolherbst: make sure you do not increase the compilation time too much too
15:03karolherbst: mupuf: yeah, I am aware of that
15:03karolherbst: mupuf: I also have to limit this pass to archs with dual issuing, too
15:03mupuf: ah ah, that would be good, yeah
15:03karolherbst: but if you can't dual issue at all, the perf is bad anyway (or this isn't even possible anyway)
15:04karolherbst: usually only 3 or 4 next instructions are checked
15:04karolherbst: so the complexity is something around 4n (n: instruction count)
15:04karolherbst: (on average)
15:05karolherbst: mupuf: and if we have perfect dual issuing already, it is close to n
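[Editor's sketch] The pass karolherbst describes (if i and i->next can't be dual issued, look at a few later instructions in the same basic block and move a suitable one next to i) could look roughly like the Python below. This is not the code from the linked commit. The names `pair_for_dual_issue`, `can_dual_issue`, `can_move_before` and the default `window` of 4 are hypothetical, the dependency check is an assumption the chat does not spell out, and skipping past a freshly formed pair is also an assumption.

```python
from typing import Callable, List, Sequence, TypeVar

Insn = TypeVar("Insn")


def pair_for_dual_issue(
    block: Sequence[Insn],
    can_dual_issue: Callable[[Insn, Insn], bool],
    can_move_before: Callable[[Insn, Insn], bool],
    window: int = 4,
) -> List[Insn]:
    """Reorder one basic block so dual-issuable instructions become neighbours.

    can_dual_issue(a, b): hypothetical predicate, True if a and b can be issued together.
    can_move_before(c, o): hypothetical predicate, True if c may be hoisted above o
    (i.e. no dependency forbids it). Both are assumptions for illustration.
    """
    insns = list(block)
    i = 0
    while i + 1 < len(insns):
        paired = can_dual_issue(insns[i], insns[i + 1])
        if not paired:
            # Only inspect a handful of later instructions, which keeps the
            # cost roughly linear in the instruction count.
            limit = min(len(insns), i + 2 + window)
            for j in range(i + 2, limit):
                cand = insns[j]
                if not can_dual_issue(insns[i], cand):
                    continue
                # The candidate has to be hoisted over everything between i and j.
                if all(can_move_before(cand, insns[k]) for k in range(i + 1, j)):
                    insns.insert(i + 1, insns.pop(j))
                    paired = True
                    break
        # Assumption: once a pair is formed, continue after the pair.
        i += 2 if paired else 1
    return insns
```

With a small window, each instruction triggers at most `window` pairing checks, which matches the "around 4n" estimate; a block that is already perfectly paired is walked once.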
15:05imirkin_: Horrorcat: what kernel are you on?
15:06mupuf: karolherbst: does not sound too crazy
15:06Horrorcat: 4.5.0-1-amd64 (debian testing)
15:06imirkin_: Horrorcat: let me rephrase... try to grab kernel 4.5, and then separately grab this: https://github.com/skeggsb/nouveau/
15:06imirkin_: then "cd drm; make -j8"
15:06imirkin_: that should make you a new nouveau.ko which you can insmod and should have GM108 support
15:06Horrorcat: thanks for the instructions, I’ll follow them tomorrow or something, I’m in a meeting right now
15:07karolherbst: mupuf: improves performance in pixmark_piano by 0.86% :)
15:07Horrorcat: that sounds excellent
15:07karolherbst: mupuf: and volplosion by 0.25%
15:07imirkin_: karolherbst: statistically significant?
15:07mupuf: well, that is not nothing
15:07karolherbst: imirkin_: well, pixmark_piano has no variance :D
15:08karolherbst: n=6 +-0.00%
15:08karolherbst: same for volplosion
15:08mupuf: frames take so long to render that you get discretisation
15:08imirkin_: hopefully it's not using GL_TIMESTAMP stuff for anything
15:08karolherbst: I even had n=20 for piano
15:08karolherbst: and got +-0.00%
15:09mupuf: imirkin_: nope, it counts the number of frames it rendered in $seconds
15:09imirkin_: ok good
15:09mupuf: this means that the score is linear with the execution time :D
15:10mupuf: ezbench normalizes that to produce an average FPS reading
15:10mupuf: because, that's what people expect anyway
15:10mezo: Horrorcat: just a hint, because i had same problem on debian ;) instead of cd drm; make u have to do
15:10mezo: cd /usr/src/linux-headers-4.5.0-1-amd64/
15:10mezo: make M=/home/daniel/Downloads/nouveau/drm/nouveau KCPPFLAGS="" modules
15:11mezo: for example
15:11mezo: otherwise it doesnt work
15:11mezo: took me hours to find out
15:11imirkin_: hmmm... there should be an easier way
15:11karolherbst: mupuf: mhh is there some inaccuracy in getting the fps value of pixmark_piano? cause it looks a bit odd
15:12karolherbst: mupuf: 4.733, 4.666 and 4.633 are the values for all commits
15:12mupuf: well, sure, the value is discrete
15:12mupuf: the difference is likely that you computed one more frame
15:12karolherbst: ahh right
15:12mupuf: you need longer execution times if you want to see more finely
15:12imirkin_: mezo: have a look at https://github.com/skeggsb/nouveau/blob/master/drm/Makefile
15:12karolherbst: mupuf: yeah, makes sense
15:13karolherbst: mupuf: or lower resolution
15:13imirkin_: mezo: should be able to do "make LINUXDIR=/usr/src/linux-headers-4.5.0-1-amd64/" in the drm dir
15:13mupuf: karolherbst: lower resolution would help, but you may not hit the same bottlenecks
15:13Horrorcat: ah thanks mezo
15:13mupuf: in this case, since there are no textures, that should be OK
15:13karolherbst: mupuf: there is no other bottleneck than shader computation in it :D
15:14mupuf: karolherbst: how long do you run it?
15:14karolherbst: 30 seconds
15:14Horrorcat: mezo: where do I get that directory from? I don’t have /usr/src/linux-headers-*
15:14mezo: u need to install kernel headers
15:15mupuf: karolherbst: well, all the results you show have one or two frames rendered difference
15:15mezo: linux-headers-amd64 for example
15:15mezo: imirkin_: ok, nice to know.
15:15mupuf: so, I would suggest lowering the resolution and increasing the run time to 60 seconds
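[Editor's sketch] mupuf's point about discrete scores can be checked with a little arithmetic. Assuming the reported value is simply frames rendered divided by run time, and that the 30-second run length applies to the three values quoted earlier, the Python below recovers the frame counts and shows how the smallest visible FPS step shrinks with a longer run. The helper names are hypothetical.

```python
def frames_from_fps(fps: float, seconds: float) -> int:
    """Recover the whole number of frames behind a reported average FPS."""
    return round(fps * seconds)


def fps_resolution(seconds: float) -> float:
    """Smallest FPS difference that one extra frame can produce."""
    return 1.0 / seconds


reported = [4.733, 4.666, 4.633]  # values quoted in the log
frames = [frames_from_fps(f, 30) for f in reported]
print(frames)  # [142, 140, 139]

print(fps_resolution(30))  # step size for a 30 second run
print(fps_resolution(60))  # step size for a 60 second run, half as large
```

Under these assumptions the three scores are one and two frames apart, so a single frame is worth about 0.7% of the score at this frame rate.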
15:15karolherbst: mezo: yeah, but each run at the same commit has the same number of frames rendered, and with lower resolution I had the same results
15:16mupuf: same result as higher resolution? WTF?
15:16mupuf: is it a vertex test or what?
15:17karolherbst: mupuf: no, just a big fragment shader
15:17karolherbst: one draw call
15:17mupuf: then it makes no sense
15:17karolherbst: well I meant I get the same improvements
15:17karolherbst: at higher fps
15:18karolherbst: well, I will make a new run with higher accuracy anyway later
15:18mezo: "make LINUXDIR=/usr/src/linux-headers-4.5.0-1-amd64/" works. just tested.
15:21imirkin_: mezo: that makefile has a few heuristics for determining the linux source location... this helps it out :)
15:22mezo: ye, the problem was, it used -common instead of -amd64 if i remember right
15:22imirkin_: the problem is that it expects your system to have semi-usual things configured, but instead you're using debian :p
15:22imirkin_: normally /lib/modules/$version/source points to the tree
15:23karolherbst: I really hoped that no distribution would try to be smart and just do it the easy way ...
15:26mezo: source -> /usr/src/linux-headers-4.5.0-1-common
15:26imirkin_: ok. so take it up with your package maintainers.
15:48Horrorcat: imirkin_: /home/horazont/Builds/nouveau/drm/nouveau/nouveau_acpi.c:317:2: error: too many arguments to function ‘vga_switcheroo_register_handler’
15:48Horrorcat: do you need a full paste?
15:52imirkin_: hmmmm... i guess it depends on something introduced in 4.6 =/
15:57mezo: what is the difference between karolherbst/nouveau and skeggsb/nouveau?
15:57imirkin_: one is karol's tree, one is ben's?
15:57imirkin_: ben's the nouveau maintainer, btw.
16:01Horrorcat: imirkin_: :(
16:03imirkin_: Horrorcat: in that repo, do "git revert 3b3ec4e10ade6"
16:06Horrorcat: imirkin_: built, copied the nouveau.ko over the one shipped by debian, rmmod, modprobe
16:06Horrorcat: how do I check that it works? ;-)
16:06Horrorcat: it is not listed in xrandr --listproviders
16:06imirkin_: Horrorcat: pastebin dmesg
16:10Horrorcat: imirkin_: https://paste.fedoraproject.org/359549/60061114/
16:10imirkin_: all's well.
16:10imirkin_: DRI_PRIME=1 glxinfo
16:10imirkin_: [do you have a dri3 ddx?]
16:11Horrorcat: OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 530 (Skylake GT2)
16:12imirkin_: i guess not :)
16:12imirkin_: i would highly recommend enabling DRI3 and making sure that the modesetting ddx does NOT add the nvidia gpu as a secondary
16:12imirkin_: basically you want to add an Option "DRI" "3" to the device section
16:13imirkin_: and Option "AutoAddGPU" "false" to the ... ServerFlags? something like that
16:13Horrorcat: Thanks for your help so far, I gotta go now. I’ll come back tomorrow or something
16:23urmet: GM108 and PRIME testing going on here? sweet
16:26imirkin_: urmet: ben's tree has GM108 support now
16:27imirkin_: he fixed up some of the breakage due to weird internal geometry
16:30urmet: i'll try to try it out in a few hours. have a laptop with skylake+840M
17:05Horrorcat: imirkin_: hm, can I somehow dump the current "autogenerated"(?) xorg config? I don’t have any persistent xorg config except the one I use to force my keyboard layout
17:06Horrorcat: otherwise, I don’t know how to set these options
17:11Horrorcat: ah, there we go
17:12imirkin_: Horrorcat: you can just add little bits to it
17:12Horrorcat: yes, but I have no idea how it’s supposed to look. Anyways, I found `X :2 -configure` online, which gave me something to start from
17:12imirkin_: Horrorcat: like in /etc/X11/xorg.conf.d/10-device.conf - just stick a tiny device section in there
17:12imirkin_: nah, you want nothing to do with the output of that
17:13Horrorcat: so, like, just Section "Device"\n Option "DRI" "3"\nEndSection?
17:13imirkin_: Horrorcat: http://hastebin.com/busohohiri.cmake - stick that in /etc/X11/xorg.conf.d/00-foo.conf
17:13Horrorcat: hm, the X -configure gave me a Identifier "Card0" for the intel
17:13Horrorcat: still use it?
17:14Horrorcat: https://paste.fedoraproject.org/359591/60445914/ that’s what I have now
17:17Horrorcat: now I get "OpenGL renderer string: Gallium 0.4 on NV118" from DRI_PRIME=1 glxgears
17:17Horrorcat: I’m a bit suspicious though as glxgears works, even though I don’t have any compositing
17:21Horrorcat: hm, tried some wine game, performance is horrible with DRI_PRIME=1, but fine without.
17:24imirkin_: Horrorcat: sounds like everything's working then :)
17:24imirkin_: with dri3 you don't need compositing
17:24Horrorcat: ah, nice
17:25imirkin_: the gpu is in its lowest pstate, so you get horrid perf, as expected
17:25Horrorcat: ah, nice
17:25Horrorcat: I prefer lower pstate over not working power management ;-)
17:25imirkin_: i think karolherbst might have some patches to enable super-duper-experimental reclocking on maxwell
17:25Horrorcat: I’d prefer not to fry my hardware. but if the patches are reasonably safe, anyone can ping me for quick tests
17:26imirkin_: highly unlikely to fry
17:28Horrorcat: well, karolherbst, feel free to ping me if you need some testing for anything on that gpu
17:33karolherbst: it isn't super-duper-experimental anymore :D
17:33karolherbst: Horrorcat: mind telling me your kernel version, distribution and GPU?
17:33karolherbst: ohh maxwell
17:33imirkin_: but fairly likely to hang your gpu
17:33karolherbst: yeah well
17:33Horrorcat: karolherbst: 02:00.0 3D controller : NVIDIA Corporation GM108M [GeForce 940MX] [10de:134d] (rev a2), 4.5.0-1-amd64, debian-testing
17:34karolherbst: ohh DDR3 nice
17:34karolherbst: never tested those I think
17:34Horrorcat: well, it works
17:34karolherbst: Horrorcat: if you are up to it, you can try reclocking it :)
17:34karolherbst: Horrorcat: yeah, but no memory reclocking, right?
17:34Horrorcat: not right now, but if you link me patches...
17:34Horrorcat: it’s horribly slow if you mean that ;-)
17:34Horrorcat: easily outperformed by the intel
17:35karolherbst: right :)
17:35karolherbst: Horrorcat: did you actually reclock through the pstate file?
17:35Horrorcat: I built from https://github.com/skeggsb/nouveau/, with 3b3ec4e10ade69abcf0e10f46eb63293c7949698 reverted
17:35Horrorcat: no, did nothing
17:35karolherbst: well I think even against nvidia it is a close call
17:35karolherbst: Horrorcat: yeah, let me update my maxwell branch
17:35Horrorcat: how would I reclock manually?
17:36karolherbst: Horrorcat: /sys/kernel/debug/dri/1/pstate
17:37Horrorcat: so I would for example echo '0a' > /sys/kernel/debug/dri/1/pstate?
17:39imirkin_: yes. except i'm pretty sure reclocking is disabled on maxwell.
17:39Horrorcat: that makes sense, I did not see a difference :-)
17:43karolherbst: Horrorcat: cat the file
17:43karolherbst: Horrorcat: the last line should change
17:43karolherbst: the core part
17:43karolherbst: imirkin_: core reclocking is enabled afaik
17:43Horrorcat: the line I selected has "AC DC *" suffixed afterwards
17:43imirkin_: check the last line
17:44Horrorcat: yes, memory clock stays the same, core clock increased
17:44imirkin_: ah cool
17:44karolherbst: Horrorcat: https://github.com/karolherbst/nouveau
17:44karolherbst: Horrorcat: branch: maxwell_reclocking
17:45karolherbst: I just hope regarding DDR3 nothing changed compared to kepler
17:45karolherbst: I just checked GDDR5
17:45karolherbst: well "nothing"
17:45Horrorcat: I’m away for today, I just started the build
17:45karolherbst: no problem
17:45Horrorcat: maybe I’ll have a few seconds to test the module today, otherwise I’ll do it tomorrow :-)
17:46karolherbst: just make sure to report back on how stable it is on 0f :D
17:46karolherbst: but I am sure it will still be slower than intel
17:46Horrorcat: will do :-)
17:46karolherbst: Horrorcat: and don't check with glxgears :D
17:46Horrorcat: I’ll try with X3TC.exe ;-)
17:46karolherbst: ahh good
17:46Horrorcat: (which is currently unusable with the nvidia)
17:47karolherbst: at least you get a faster pcie link too currently
17:47Horrorcat: (if it still is I’ll find something different, maybe W40k.exe or something)
17:47Horrorcat: afk for real now
17:47karolherbst: this gives like a solid 5% speed improvement for prime offloading
20:08stopzion: you won't win
20:58karolherbst: mupuf: k, now REing those tables a bit :)
21:11karolherbst: mupuf: mhh I don't think those new tables have any effect :/
21:13karolherbst: ohh wait
21:13karolherbst: I forgot to configure the gpu for faking the vbios
21:13karolherbst: silly me
21:15karolherbst: okay, nvidia gets pissed after the table is messed up :)
21:15mupuf: ah ah
21:16mupuf: well, that is promising :D
21:16karolherbst: lowered the temps
21:16karolherbst: now downclock at 79°C :)
21:16karolherbst: I set 72 and 80 °C
21:17karolherbst: let's see what nvidia does
21:17karolherbst: downclocking starts at 74°C
21:17karolherbst: mupuf: ok, so yeah, I think it is this table :)
21:18karolherbst: upclocking starts at 67°C again
21:18karolherbst: no, I don't think so
21:19karolherbst: it downclocks faster at 76°C
21:23mupuf: great, are there any tables left in bit P we do not know the meaning of?
21:26karolherbst: yeah, like 5
21:26karolherbst: or do you mean meaning as in "we have a clue"?
21:26karolherbst: there is still the new FAN table
21:26karolherbst: but besides that? there are some totally unknown
21:28karolherbst: okay, at 74°C it was down in 0.24 seconds, at 75° it only took 0.13 seconds
21:28karolherbst: but this may be due to inaccurate readings
21:28karolherbst: at 76° it jumps instantly
21:28karolherbst: k, let me mess with the table a bit
21:28karolherbst: everything unknown, set to 0 :)
21:29karolherbst: k, nvidia is pissed by this
21:31mupuf: karolherbst: are you using nvaforcetemp?
21:31karolherbst: it is just bad that this entry is like 0x1c bytes big
21:31karolherbst: mupuf: yeah
21:31mupuf: going to be a ton of fun to RE....
21:35karolherbst: now nvidia doesn't want to start at specific temperatures
21:41karolherbst: mhh funny, the second temperature doesn't seem to do much
21:41karolherbst: the first one controls when to clock down
21:41karolherbst: this I am sure of
21:45karolherbst: mupuf: .... this entry looks a lot like those voltage map entries.... I fear the worst now
21:47karolherbst: why u do dis nvidia...
21:47karolherbst: byte 0 is again some mode flag
21:47karolherbst: switching it to 0 changes the behaviour slightly
21:47karolherbst: instead of the drop from cstate 16 => 0
21:47karolherbst: I get stepped downclocks even between 16 and 0
21:49karolherbst: and byte 1 is ff
21:50karolherbst: and I am sure byte 2 and byte 6 are min/max values and the other stuff is something stupid
21:50karolherbst: no that doesn't make that much sense
21:50karolherbst: let me see
21:54karolherbst: now I managed to disable the temperature downclock even above the temperature :)
21:55karolherbst: strange value
21:55karolherbst: 200 as decimal
21:59karolherbst: this reg somewhat controls the speed of the downclock
22:00karolherbst: 0x8 and 0x9 as a 16bit value
22:01karolherbst: one cstep a second now
22:04karolherbst: it goes both ways though
22:09mupuf: does not sound too insane so far
22:09mupuf: they had something like that for the fan management too
22:09mupuf: how often one should poll and how much should we increase?
22:11karolherbst: mhh could be?
22:11karolherbst: I just know that there might be a temp->duty->rpm mapping in the table
22:13mupuf: I doubt there would be such a thing since a PID controller would achieve the same
22:13karolherbst: maybe, but it looks like it and is consistent across all vbios I saw
22:16karolherbst: mhh "speed 200" is default in my vbios
22:16karolherbst: maybe one cstep per second?
22:18karolherbst: 200 means the interval in msec
22:18karolherbst: and 1000 means one downclock each second
22:18karolherbst: where 200 means one downclock each 200 msecs :)
22:18karolherbst: and downclock doesn't mean one cstep though, it is just what nvidia thinks is smart I guess
22:19RSpliet: hehe, run a training round on their "when and how much to downclock" neural network? Call a member of the helpdesk to ask for help? :-P
22:20karolherbst: nvidia seems to skip some csteps
22:20karolherbst: no idea why
22:21RSpliet: yet ;-)
22:22karolherbst: course... if the first byte is 0, the 0x8-0x9 bytes aren't the interval speed anymore...
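[Editor's sketch] The findings so far (a 0x1c-byte entry, bytes 0x8 and 0x9 forming a 16-bit downclock interval in milliseconds, and that reading no longer holding when byte 0 is 0) can be written down as a small decoder. The little-endian byte order is an assumption, the function and constant names are hypothetical, and the mode-flag value used in the usage lines is invented purely for illustration.

```python
import struct
from typing import Optional

ENTRY_SIZE = 0x1C       # entry size mentioned in the log
INTERVAL_OFFSET = 0x8   # bytes 0x8 and 0x9 read as one 16-bit value


def downclock_interval_ms(entry: bytes) -> Optional[int]:
    """Return the downclock interval in milliseconds, or None if it does not apply.

    Assumes little-endian layout; that is an editorial guess, not confirmed
    by the conversation.
    """
    if len(entry) != ENTRY_SIZE:
        raise ValueError("expected a 0x1c-byte entry")
    if entry[0] == 0:
        # With the mode byte set to 0, these bytes were observed to mean
        # something other than the interval.
        return None
    (interval,) = struct.unpack_from("<H", entry, INTERVAL_OFFSET)
    return interval


# Usage with a made-up entry: mode byte 1 is a placeholder, not a real value.
entry = bytearray(ENTRY_SIZE)
entry[0] = 1
struct.pack_into("<H", entry, INTERVAL_OFFSET, 200)
print(downclock_interval_ms(bytes(entry)))  # 200
```

Every other field in the entry is left untouched here because its meaning was still unknown at this point in the discussion.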
22:36karolherbst: mupuf: super odd situation I have here
22:36karolherbst: mupuf: at 95°C it clocks higher than at 1°C now
22:39mupuf: ah ah ah
22:41karolherbst: any idea what this might be?
22:41karolherbst: I changed a 0x12 to 0x5
22:43karolherbst: RSpliet: you are right, now I got nvidia to use every cstep :D
22:43karolherbst: this looks so nice :)
22:43mupuf: karolherbst: I would say, collect more data for the stuff you do not know what it does
22:43mupuf: and plot it
22:43mupuf: this should be three dimensions
22:43karolherbst: but first I try to find the obvious stuff
22:44karolherbst: still like 15 bytes to go
22:44karolherbst: https://gist.github.com/karolherbst/38a57114231f148a8beb11a8f2998073 :)
22:44karolherbst: first is time in µs
22:44mupuf: yeah, if you change one thing at a time, you may be able to fit it in 3 dimensions
22:44karolherbst: second last cstate
22:45mupuf: time for me to sleep, sorry
22:45karolherbst: mhhh I changed 0x30 to 0x20
22:45karolherbst: maybe MHz down per check?
22:45karolherbst: this would fit
22:45mupuf: well, MHz would be a good idea, yeah
22:46mupuf: have fun!
22:46karolherbst: but it doesn't work
22:46karolherbst: 0x10 has bigger steps now
22:46karolherbst: this byte most likely contains some sort of flags
22:46karolherbst: or something odd
23:15karolherbst: I think it is really a formula entry like in the vmap table
23:15karolherbst: and the input is time since over temp threshold
23:17karolherbst: and output might be something like clock down by ... MHz