04:03 karolherbst: mupuf_: any preference about the structure of the current_load file? csv?
04:03 karolherbst: I don't want to put something there which is hard to parse
04:03 mupuf_: fully agree
04:03 mupuf_: are you talking about metrics collection for ezbench?
04:03 mupuf_: everything in ezbench is CSV
04:04 karolherbst: for example, but I wasn't thinking of that
04:04 karolherbst: more like bash in general
04:04 karolherbst: currently I have a key: value format
04:05 karolherbst: this stuff always looks silly if the values don't start in the same column
04:05 karolherbst: currently I do this: https://github.com/karolherbst/nouveau/commit/963e93baf8f4b0547aa5501cef09bd5de79f80f0#diff-6fc696de511b0108a1a7b4f8a0776021R296
04:05 mupuf_: karolherbst: http://hastebin.com/ezolaniqod.js <-- the newest version of env_dump
04:06 karolherbst: every touched env variable?
04:06 karolherbst: so you intercept getenv :D
04:06 mupuf_: yes, I do
04:06 karolherbst: you may want to intercept setenv, too
04:06 mupuf_: but I also list all the variables
04:06 mupuf_: wait, no, I do not intercept getenv
04:06 mupuf_: only the ones changing the environment
04:07 karolherbst: mhh?
04:07 mupuf_: http://cgit.freedesktop.org/~mperes/ezbench/tree/utils/env_dump/posix_env.c
04:09 karolherbst: ohh, you use environ :D
04:09 mupuf_: ;)
04:09 karolherbst: mhhh
04:09 karolherbst: right
04:10 mupuf_: it was easier to just dump everything and then only show changes
04:10 mupuf_: getenv may be called pretty often
04:10 karolherbst: right
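The "dump everything, then only show changes" idea can be sketched in a few lines. This is a minimal illustration in Python, not the env_dump code itself (which is C and walks `environ`); the function names `snapshot_env` and `env_changes` are hypothetical.

```python
import os


def snapshot_env():
    """Return a copy of the whole environment, similar to walking `environ` in C."""
    return dict(os.environ)


def env_changes(before, after):
    """Map each added, removed, or modified variable to its (old, new) values."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            # None on either side means the variable was absent in that snapshot.
            changes[key] = (old, new)
    return changes
```

Taking one snapshot at startup and one at exit avoids hooking getenv, which may be called very often.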
04:10 mupuf_: http://cgit.freedesktop.org/~mperes/ezbench/tree/utils/env_dump/net.c <-- I love this
04:10 karolherbst: maybe ... no mhh
04:11 karolherbst: :D
04:11 karolherbst: you still preload, do you?
04:11 mupuf_: it tells you which server you are connected to and how it got launched
04:11 mupuf_: yes
04:11 karolherbst: yeah, preloading is a dangerous technique :D
04:11 mupuf_: it is not perfect
04:11 karolherbst: preload based adblocker :O
04:12 mupuf_: what do you mean by dangerous?
04:12 mupuf_: if you are talking about userspace keyloggers and others, agreed
04:13 karolherbst: yeah, you can do a lot of shady things there
04:13 mupuf_: right
04:13 mupuf_: as you can see, I cleaned up the code a lot
04:13 karolherbst: yeah, it's awesome
04:13 mupuf_: wrote a makefile and split the entire thing
04:14 karolherbst: yeah
04:14 mupuf_:wouldn't go this far
04:14 karolherbst: I just wanted to write that
04:14 karolherbst: it may make sense to make this an own project
04:15 karolherbst: first requested feature: mark "important" changed stuff compared to other run (old output file passed in through env variable)
04:15 mupuf_: agreed, but for legal matters, I won't be able to. there is a process to get stuff open sourced at intel and if I were to take it out of ezbench, I would have to ask for permission again
04:15 karolherbst: ohhh I see
04:15 mupuf_: yes, there is the diffing tool
04:15 karolherbst: ask before you go :p
04:16 mupuf_: ask before I go?
04:16 mupuf_: I do not need to ask as long as it is part of ezbench :D
04:16 karolherbst: :D
04:16 karolherbst: will ezbench be like public open source=
04:16 mupuf_:was careful in the wording of the project being a collection of tools developed for benchmarking
04:16 karolherbst: ?
04:16 mupuf_: it is already
04:16 mupuf_: I will move that to another repo next week
04:16 karolherbst: GPL?
04:16 mupuf_: it is MIT-licensed
04:17 karolherbst: ohh I see
04:17 karolherbst: I thought then you could move the stuff out without seeking permission?
04:17 mupuf_: Well, someone can do it, but I can't ... I guess
04:17 mupuf_: I am sure no one would care though
04:17 karolherbst: I .. see
04:18 mupuf_:may be too anal here
04:18 mupuf_: no idea
04:18 mupuf_: right now, it is a minor inconvenience I would say
04:18 karolherbst: no when somebody cares enough he will do it
04:18 mupuf_: and when I stop developing it, you can move it to its own repo :p
04:18 karolherbst: :D
04:18 karolherbst: right
04:19 mupuf_: anyway, first things first : asking the package manager the version of the package containing the .so referenced
04:19 karolherbst: mhhh
04:20 karolherbst: I won't add support for asking package managers
04:20 mupuf_: and for the binaries unknown to the package manager, we will have to ask the build db ... which we need to create
04:20 karolherbst: *wouldn't
04:20 mupuf_: why?
04:20 karolherbst: that's kind of messy stuff
04:20 karolherbst: maybe, packagekit can do this
04:20 mupuf_: oh, good idea
04:20 mupuf_: pacman -Qo /my/path/to/lib.so
04:20 karolherbst: like ask which package the .so file belongs to
04:20 mupuf_: that's not too messy :p
04:20 mupuf_: can you check it out?
04:21 karolherbst: and then get all files named like the old one but with . additions
04:21 karolherbst: mhh
04:21 karolherbst: why not just search for files?
04:21 karolherbst: you take the .so file and search for files beginning with the name
04:22 karolherbst: mupuf_: wait, you just want to have the version of the library, do you?
04:22 mupuf_: yeah, along with the name of the distro that provided it
04:22 mupuf_: debian-mylib-2.65.4-45
04:22 mupuf_: something like that
04:23 karolherbst: the documentation of packagekit is just not there :D
04:24 karolherbst: ohh I totally see why nobody wants to use it
04:24 karolherbst: there is a cli tool though
04:25 mupuf_: yes, there is
04:25 karolherbst: pkcon search file $value
04:25 mupuf_: seems to work!
04:25 karolherbst: :)
04:25 mupuf_: pkcon search file /usr/bin/ls --> Installed coreutils-8.24-1.x86_64 (installed) The basic file, shell and text manipulation utilities of the GNU operating system
04:26 karolherbst: packagekit is good enough to get support for like all distributions at once
04:26 karolherbst: I guess
04:26 karolherbst: but packagekit on gentoo is a bit messy :/
04:26 karolherbst: I had high cpu loads while having it installed
04:26 karolherbst: because its cron job always scanned all packages and built the database :/
04:26 mupuf_: pkcon backend-details --> we can use the backend name to prefix the package name
04:27 mupuf_: hmm
04:27 karolherbst: I would do it in a way where native package managers can be used
04:27 karolherbst: but packagekit as a fallback
04:27 mupuf_: agreed
04:27 mupuf_: gentoo can have its own, pkcon for everyone else
04:27 karolherbst: :D
04:27 karolherbst: this was like two years ago
04:28 karolherbst: maybe it is fixed, who knows
04:28 mupuf_: we can fix it when someone complains
04:28 mupuf_: no need to support the entire world at first
04:28 mupuf_: let's just make it extensible
04:29 karolherbst: gentoo: equery belongs /usr/bin/lsof --> * Searching for /usr/bin/lsof ... \n sys-process/lsof-4.89 (/usr/bin/lsof)
04:29 karolherbst: but this can take like several seconds
04:30 mupuf_: 32ms on archlinux
04:31 karolherbst: yeah it is faster everywhere else
04:32 karolherbst: it is more a design problem
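A rough sketch of the "native package manager first, pkcon as a fallback" idea, using only the three commands quoted in the log (`pacman -Qo`, `equery belongs`, `pkcon search file`). The table name `LOOKUPS`, the function names, and the probing order are assumptions made for illustration; parsing each tool's output is left out because the formats differ.

```python
import shutil
import subprocess

# Commands taken from the discussion; native tools are tried first and
# PackageKit's pkcon last. The ordering is an assumption.
LOOKUPS = [
    ("pacman", ["pacman", "-Qo"]),
    ("equery", ["equery", "belongs"]),
    ("pkcon", ["pkcon", "search", "file"]),
]


def owner_command(path, which=shutil.which):
    """Return the argv asking the first available tool which package owns `path`."""
    for tool, argv in LOOKUPS:
        if which(tool):
            return argv + [path]
    return None


def query_owner(path):
    """Run the lookup and return its raw output, or None if no tool is installed."""
    argv = owner_command(path)
    if argv is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout.strip()
```

Adding another distribution then only means adding one entry to the table, which keeps the lookup extensible.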
04:32 mupuf_: anyway, as much as I would like you to help me on this (beside giving excellent tips), shouldn't you work on the metrics collection?
04:32 mupuf_: you have all the necessary information now :D
04:32 karolherbst: you get paid, I don't :p
04:33 karolherbst: first I want to finish this current_load interface, because this might be important for this anyway
04:33 mupuf_: agreed, hence why we should work on what you wanted to do in the first place, metrics collection! :p
04:33 mupuf_: might? It is!
04:33 karolherbst: yeah, that's why I asked you about the layout of that file :p
04:33 mupuf_: oh, right
04:34 mupuf_:has a shallow stack some days .... most days .... always?
04:34 mupuf_: I think one entry per line is the easiest
04:34 mupuf_: core:val%
04:35 mupuf_: so as we can vary the number of entries as the hw changes
04:35 karolherbst: I really would like to stay consistent with the layout for this
04:35 mupuf_: and we do not need to keep the ordering as strict
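To illustrate why one `name:val%` entry per line is easy to consume, here is a small parser sketch. The entry names used in the usage note ("core", "memory") are placeholders; the actual contents of the current_load file are not specified in this discussion.

```python
def parse_current_load(text):
    """Parse 'name:value%' lines into a dict of integer percentages."""
    loads = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        name, _, value = line.partition(":")
        # Whitespace after the colon (aligned columns) is accepted as well.
        loads[name.strip()] = int(value.strip().rstrip("%"))
    return loads
```

Because the result is keyed by name, the number and order of entries can change between chipsets without breaking the reader, for example `parse_current_load("core:42%\nmemory: 7%")`.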
04:35 mupuf_: how are you planning on supporting the nvaX then
04:35 mupuf_: ?
04:35 mupuf_: they only have 4 counters?
04:36 mupuf_: you want to add data, but never take some out?
04:36 karolherbst: I don't care about the slots on the nouveau side
04:36 mupuf_: what if we discover that one counter is more important?
04:36 karolherbst: mhh
04:36 karolherbst: we should have a clear goal what we want to know through the counters
04:36 mupuf_: the slot allocation is the constraint
04:36 mupuf_: as far as I can tell, we will never have one
04:36 mupuf_: and it will change with hw
04:36 karolherbst: I know, so nvaX just collects 3 different kind of information we care about
04:36 mupuf_: yep
04:37 mupuf_: we can push it to 4
04:37 mupuf_: but that's it
04:37 karolherbst: there isn't much we can do though
04:37 karolherbst: we will stay with our cstate/pstate semantics
04:37 karolherbst: so
04:37 karolherbst: so one slot should take care of all information for a cstate change trigger
04:37 karolherbst: one for the pstate
04:37 karolherbst: then we can split that up as we want
04:37 karolherbst: or if something is not good enough
04:37 mupuf_:would argue that we will always want to expose all the counters for the metrics collection since they got polled anyway
04:38 mupuf_: now you don't make sense
04:38 karolherbst: I mean we can't get any counter anyway at once
04:38 karolherbst: so we have to already decide what kind of information we want to get
04:38 mupuf_: ?
04:38 mupuf_: definitely
04:39 mupuf_: but that may change in the future
04:39 karolherbst: right
04:39 karolherbst: so we need to be more abstract than that
04:39 mupuf_: how about the userspace should not care much about them and just expose them all in its report?
04:40 karolherbst: does it make sense to expose ROP, PCOPY0/1/2, ... loads separately?
04:40 mupuf_: oh, one missing feature: querying the gpu information out of the drm node
04:40 mupuf_: sure, if you can
04:40 karolherbst: well we can't
04:40 karolherbst: we have only 7 slots
04:40 mupuf_: 8 if we are smart
04:40 karolherbst: on fermi+ that is
04:40 karolherbst: :D
04:41 karolherbst: yeah okay, but I really don't want to remove that 8th one
04:41 mupuf_: some slots may be turned configurable, but that's another story
04:41 mupuf_: what I am saying is that we should expose to the userspace all the counters we are currently polling
04:41 mupuf_: that's it
04:41 karolherbst: on fermi+ I already use 5 slots
04:41 mupuf_: then there are 2 slots available for .. fun :D
04:41 karolherbst: 3
04:41 mupuf_: oh, right
04:42 mupuf_: no need to double add the cycles counter
04:42 karolherbst: https://github.com/karolherbst/nouveau/commit/a46efeccc74d481ca9552684f31b992d8451a661#diff-5e5cb4582f6faff078d1cad6144b248aR155
04:43 karolherbst: imagine we wanted to poll each of the counters separately
04:43 karolherbst: and not grouped
04:43 mupuf_: no, poll them all in one go
04:43 mupuf_: it is dumb to poll them individually
04:43 mupuf_: a trip through the pcie port is slow as heck
04:43 karolherbst: I mean the slot configurations
04:44 mupuf_: yes?
04:44 karolherbst: this is on the falcon
04:45 karolherbst: I read all the values from the last read out in one go from the host, but that's the boring part here
04:45 mupuf_: from the host?
04:45 karolherbst: I meant, how should we configure all those slots to get data we want
04:45 mupuf_: the readout is on the pmu
04:45 karolherbst: and I cache them on the pmu
04:46 karolherbst: I think we are lost and should reset :D
04:46 mupuf_: (falcon is an ISA btw, almost all the engines use this ISA so referring to pdaemon/pmu using falcon is not helpful :p)
04:46 karolherbst: I see
04:46 karolherbst: then pmu
04:46 mupuf_: falcon == fuc, also
04:46 mupuf_: falcon == nvidia's name
04:46 mupuf_: fuc is ours
04:47 mupuf_: the pmu should be responsible for configuring the counters, polling them periodically, making reclocking decisions and sending it to the host
04:48 mupuf_: the host may also request from the pmu to return the latest values polled
04:48 mupuf_: that's it!
04:48 mupuf_: and this is what my code was allowing, not sure where you are going but it sounds unclear
04:49 karolherbst: no, that's what I do
04:49 karolherbst: I was just talking about the general idea what purpose the counter configurations should follow
04:49 mupuf_: oh
04:49 mupuf_: the purpose is reclocking
04:50 mupuf_: and doing power management in general
04:50 mupuf_: the fact that we are going to poll on that from the userspace is not relevant
04:51 karolherbst: yeah
04:51 mupuf_: you really want to have different methods that can be called by the host by the way
04:51 karolherbst: but I meant it a bit more specific than that :)
04:51 karolherbst: like if we want to reclock, what do we want to know?
04:52 karolherbst: we want to know stuff like
04:52 karolherbst: is our current cstate high enough
04:52 karolherbst: and then, which configuration will allow us to know that
04:52 mupuf_: no, cstate and pstate should not exist at this level
04:52 karolherbst: mhh okay
04:52 karolherbst: then more like, is the memory clock fast enough
04:52 mupuf_: right
04:52 karolherbst: or is the PCOPY012 clock fast enough
04:52 mupuf_: this is it
04:53 karolherbst: maybe we find something between cstates and engines, which is generic enough as a guideline, but specific enough to follow this across all chipsets
04:53 mupuf_: if you want to upclock, ask for it and wait for the host to have confirmed that the change was done before reporting a second time that you need to increase the perf
04:53 mupuf_: but if you see that the perf is not needed anymore, just report that you can lower the clock
04:54 mupuf_: well, just asking for more perf with a urgency level so as we can scale more or less quickly based on the current load would be good
04:55 mupuf_: or we just implement a simple hysteresis in the pmu and be done with it ... but it is not super good
04:55 mupuf_: anyway, I cannot talk about that now
06:34 RSpliet: karolherbst: for GT21x, I recall something about the memory clock and core clock not being allowed to be too far apart
06:34 RSpliet: (which could have something to do with the design of the clock-crossing logic...)
06:34 RSpliet: which, in other words, means you change the entire pstate, or none at all
06:37 mupuf_: yes, hence why the logic in pmu should be stupid
06:37 mupuf_: just request more performance for domains that are limited
06:37 mupuf_: and let the kernel figure out what to do
07:41 karolherbst: mupuf_: what should we do when the kernel decides not to clock up?
07:42 mupuf_: you should not send another update until you get an ack from the kernel
07:42 karolherbst: yeah, but what if we never get one, because the load isn't high enough for the kernel
07:43 karolherbst: or when we already reached highest clocks
07:43 karolherbst: maybe we could send a nack + load values for which the pmu shall notify the next time?
07:44 mupuf_: no
07:44 mupuf_: well, not high-enough for the kernel is wrong
07:44 mupuf_: if pdaemon says upclock, the kernel should upclock unless it is impossible
07:45 mupuf_: and in this case, it should not ack any change was made
07:45 mupuf_: pdaemon may send another update, but only when we need to downclock
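The notification rule described here (no second upclock request until the host acks, but downclock updates stay allowed) can be modelled as a tiny state machine. This is a userspace sketch for reasoning about the protocol only; the class name `ReclockNotifier` is hypothetical, and treating a downclock report as cancelling a pending upclock is an assumption, not something stated in the log.

```python
class ReclockNotifier:
    """Tracks whether the PMU may send another request to the host."""

    def __init__(self):
        self.waiting_for_ack = False

    def request(self, direction):
        """Return True if a request for 'up' or 'down' should be sent now."""
        if direction == "down":
            # Assumption: a downclock report is always allowed and clears
            # any outstanding upclock request.
            self.waiting_for_ack = False
            return True
        if self.waiting_for_ack:
            return False  # an upclock is still unacknowledged, stay quiet
        self.waiting_for_ack = True
        return True

    def ack(self):
        """The host confirmed the clock change; upclock requests may resume."""
        self.waiting_for_ack = False
```

If the host cannot upclock, it never calls `ack()`, so the PMU stops repeating the request until the load drops again.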
07:45 karolherbst: mhh, this really restricts us in the algorithm we can use
07:46 mupuf_: as in?
07:46 mupuf_: the reclocking decision should be made by pdaemon, not the host
07:46 mupuf_: the host is here to execute
07:46 mupuf_: otherwise, you will enter some funny rules on when to send an IRQ or not
07:47 mupuf_: and it is going to be messy to write in asm
07:47 karolherbst: in my current algorithm I use information like pstate and cstate count
07:47 mupuf_: what for?
07:48 karolherbst: to calculate which cstate I clock to
07:48 karolherbst: and to have smoother clocking
07:48 karolherbst: the current nouveau gk20a code also uses this
07:51 karolherbst: the thing is, if the current load is like 85% and the target is 75%, shall we upclock or not? and if we upclock and we got a load of 50% after that, shall we downclock?
07:51 karolherbst: we might end up in an up/down clocking cycle if the pmu doesn't know what upclocking actually does
07:52 karolherbst: there are kepler cards with only 3 cstates, one for each pstate
07:53 mupuf_: how about a double hysteresis window?
07:53 mupuf_: or simply-said, 2 windows
07:53 mupuf_: you only upclock to the next pstate if you reach the upper threshold
07:54 mupuf_: and this is only if you already reached the last cstate of the pstate
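One possible reading of the two-window idea, modelled in userspace: a lower threshold moves between cstates, and a higher one is needed to leave the pstate, but only from its last cstate. All threshold values and the function name `decide` are made up for illustration; nothing here reflects actual nouveau or pdaemon code.

```python
def decide(load, cstate, last_cstate, cstate_up=75, pstate_up=90, down=40):
    """Pick an action from the load (percent) and the position inside the pstate.

    The thresholds are hypothetical defaults, not values from the discussion.
    """
    if load >= pstate_up and cstate == last_cstate:
        return "pstate_up"  # upper window reached at the last cstate
    if load >= cstate_up and cstate < last_cstate:
        return "cstate_up"  # inner window: move within the current pstate
    if load <= down:
        return "down"
    return "hold"
```

With these defaults a load of 80% at the last cstate results in "hold", which is what keeps the pstate from flapping.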
07:55 karolherbst: I think we really need to know how many steps the kernel has to clock, because that determines the size of each step
07:55 karolherbst: and also the threshold when we should up/down clock
07:55 mupuf_: what you want is a way to predict the performance based on the clock increase
07:55 mupuf_: you really think you can write this code in asm?
07:56 karolherbst: it's not that hard
07:56 karolherbst: you just factor in the step width
07:56 karolherbst: inside the cur_load, tar_load, max_load scale
07:57 mupuf_: what if the step width is not constant?
07:57 mupuf_: which is ... true
07:57 karolherbst: you somehow divide the difference between tar_load and max_load into max_cstate - cur_cstate parts
07:57 karolherbst: and just upclock the count of the parts you are above cur_load
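The step calculation sketched in these two lines could look roughly like this in a userspace model. It assumes a constant step width (which is questioned just above) and rounds up so that any load above the target requests at least one step; both choices are assumptions, and the function name `upclock_steps` is hypothetical.

```python
def upclock_steps(cur_load, tar_load, max_load, cur_cstate, max_cstate):
    """Number of cstates to go up, splitting [tar_load, max_load] into equal parts."""
    parts = max_cstate - cur_cstate  # cstates still available above us
    span = max_load - tar_load       # load range shared between those parts
    if parts <= 0 or span <= 0 or cur_load <= tar_load:
        return 0
    over = min(cur_load, max_load) - tar_load
    steps = -(-over * parts // span)  # ceiling division with integers only
    return min(steps, parts)
```

For example, with a target of 75%, a maximum of 100% and five cstates left, a load of 85% lands two parts above the target, so `upclock_steps(85, 75, 100, 2, 7)` gives 2.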
07:57 mupuf_: anyway, still can't talk, sorry
07:57 mupuf_: you can do that in the kernel
07:57 mupuf_: no need to do it on the pmu
07:58 mupuf_: anyway, how about having the testing rig ready before writing a ton of code?
07:58 karolherbst: yeah, but we still should be able to predict on the pmu if the kernel actually will reclock
07:58 mupuf_: we should test stuff in the userspace first
07:58 karolherbst: yeah, I need to modify my code for that a bit
07:58 mupuf_:disagrees, but time will tell
07:59 mupuf_: and I am happy to be *proven* wrong
07:59 mupuf_: we can model that in the userspace anyway
07:59 mupuf_: premature implementation in asm is only going to annoy you a lot and reduce the number of tests
08:01 karolherbst: I know
08:01 mupuf_: it's frustrating, right? :s
08:02 karolherbst: yeah
08:04 karolherbst: sadly my cstates table is too linear, so I couldn't test it with strange cstates
11:31 pmoreau: \o/ I like it! PGRAPH is talking to me: "TRAP_MP - TP0: GLOBAL_LIMIT_WRITE"! :-)
11:33 imirkin_: you need to set up a bunch of registers
11:34 imirkin_: like the base memory address
11:34 pmoreau: I guess curro's patches take care of most of them, maybe all :-)
11:34 pmoreau: I'll have to check
11:35 pmoreau: I was wondering whether it could be that I'm writing outside of the allocated memory, due to not reading the pointer from the correct memory area.
11:35 pmoreau: The different memory areas seem to start at an offset, and I don't respect it at all for now.
11:36 imirkin_: this is nv50 or nvc0?
11:36 pmoreau: nv50
11:36 imirkin_: can i see your code?
11:36 pmoreau: On my brave MBP laptop :D
11:36 imirkin_: i.e. the full NV50_PROG_DEBUG output
11:36 pmoreau: Sure! Just a sec
11:37 pmoreau: With debug level equals?
11:37 imirkin_: =1
11:37 pmoreau: Ok
11:38 pmoreau: imirkin_: https://phabricator.pmoreau.org/P51
11:39 imirkin_: nice, looks like emission is working
11:40 imirkin_: at least envydis agrees
11:40 pmoreau: \o/
11:40 imirkin_: that's something to look out for, esp on nv50
11:40 pmoreau: Ok
11:40 pmoreau: How do you use envydis?
11:40 imirkin_: envydis -m g80 -V g84 -O cp -w
11:40 pmoreau: O.O
11:40 pmoreau: :D
11:41 imirkin_: then paste the dwords in, and ^D
11:41 pmoreau: m is family, V version, O type of shader?
11:41 imirkin_: m is machine, v is variant, o is variant2
11:42 imirkin_: internally you can condition various things on the variant
11:42 imirkin_: that way you can have a single machine with slightly different behaviours depending on the variant
11:43 pmoreau: Hmmm, ok
11:45 pmoreau: Once I get hello_world to work, I'll still have to handle all the control flow commands and SSA, as well as system values… :/
11:45 pmoreau: Hopefully, by then we will have agreed on which path to follow for compute, and I won't be working on it alone :D
11:46 imirkin_: hopefully.
11:46 pmoreau: If I'm slow enough, I can only increase the chances!
11:46 pmoreau: s/I can/it can
17:13 wadadli: Hey I'm using fedora, with the nouveau drivers installed, I have two identical monitors yet one is being displayed at a lower resolution than the other, check my xrandr output http://paste.fedoraproject.org/283124/64508814/raw/
17:14 wadadli: My monitor can display its native resolution via both hdmi and dvi before anyone says it's a port limitation
17:14 wadadli: I have a NVIDIA Geforce GT 730
17:14 imirkin: wadadli: not a physical limitation
17:14 imirkin: but a nouveau one
17:14 imirkin: we max hdmi at 165MHz
17:14 imirkin: even though it can do more
17:15 imirkin: there are some patches you can apply if you're able to build your own kernel
17:17 wadadli: imirkin: would this information be sufficient to do so? https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel
17:17 imirkin: probably... you have to apply a couple patches too
17:17 imirkin: lemme find them... sec
17:18 imirkin: apply this patch: http://lists.freedesktop.org/archives/nouveau/2015-August/021841.html and this patch: http://lists.freedesktop.org/archives/nouveau/2015-August/021839.html
17:18 imirkin: in the second one you can probably replace 225000 with 297000
17:21 wadadli: I'm assuming I have to add these to the nouveau source code?
17:22 imirkin: they're kernel patches
17:23 wadadli:is trying to stay afloat
17:28 wadadli: hey what language is being used in these patches? imirkin
17:29 wadadli: Sorry disconnected there, said something imirkin?
17:29 imirkin: nope. the kernel is written in C
17:30 wadadli: imirkin: okay thank you for your help sir
17:32 imirkin: wadadli: btw, curious -- what monitor has a 2560x1080 native resolution?
17:32 imirkin: never seen a 2.5:1 monitor...
17:33 wadadli: imirkin: LG ultrawide monitors
17:33 imirkin: ahh
17:34 imirkin: i wonder if they rotate... 2 of those side-by-side would be pretty awesome
17:35 wadadli: not on the stock stand but everything is possible with a bit of imagination
17:36 imirkin: sure
17:36 imirkin: although i'd need 1200... 1080 isn't quite enough
17:36 imirkin: o well
17:36 wadadli: question do I apply these patches using the patch program?
17:36 imirkin: probably easiest to do so, yes
17:37 imirkin: from the top of the kernel tree, you can run 'patch -p1 < foo.patch'