04:03karolherbst: mupuf_: any preference about the structure of the current_load file? csv?
04:03karolherbst: I don't want to put something there which is hard to parse
04:03mupuf_: fully agree
04:03mupuf_: are you talking about metrics collection for ezbench?
04:03mupuf_: everything in ezbench is CSV
04:04karolherbst: for example, but I wasn't thinking of that
04:04karolherbst: more like bash in general
04:04karolherbst: currently I have a key: value format
04:05karolherbst: this stuff always looks silly if the values don't start in the same column
04:05karolherbst: currently I do this: https://github.com/karolherbst/nouveau/commit/963e93baf8f4b0547aa5501cef09bd5de79f80f0#diff-6fc696de511b0108a1a7b4f8a0776021R296
04:05mupuf_: karolherbst: http://hastebin.com/ezolaniqod.js <-- the newest version of env_dump
04:06karolherbst: every touched env variable?
04:06karolherbst: so you intercept getenv :D
04:06mupuf_: yes, I do
04:06karolherbst: you may want to intercept setenv, too
04:06mupuf_: but I also list all the variables
04:06mupuf_: wait, no, I do not intercept getenv
04:06mupuf_: only the ones changing the environment
04:09karolherbst: ohh, you use environ :D
04:10mupuf_: it was easier to just dump everything and then only show changes
04:10mupuf_: getenv may be called pretty often
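The "dump everything, then only show changes" approach can be illustrated with a minimal sketch. env_dump itself is a preloaded C library; the Python below is only an illustration, and the function names `snapshot_env` and `diff_env` are hypothetical.

```python
import os


def snapshot_env(environ=None):
    """Return a plain-dict copy of the environment (defaults to os.environ)."""
    return dict(os.environ if environ is None else environ)


def diff_env(before, after):
    """List (change, name, value) tuples describing how `after` differs from `before`."""
    changes = []
    for name in sorted(set(before) | set(after)):
        if name not in before:
            changes.append(("added", name, after[name]))
        elif name not in after:
            changes.append(("removed", name, before[name]))
        elif before[name] != after[name]:
            changes.append(("changed", name, after[name]))
    return changes
```

Taking one snapshot at startup and one at exit avoids hooking getenv, which may be called very often.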
04:10mupuf_: http://cgit.freedesktop.org/~mperes/ezbench/tree/utils/env_dump/net.c <-- I love this
04:10karolherbst: maybe ... no mhh
04:11karolherbst: you still preload, do you?
04:11mupuf_: it tells you which server you are connected to and how it got launched
04:11karolherbst: yeah, preloading is a dangerous technique :D
04:11mupuf_: it is not perfect
04:11karolherbst: preload based adblocker :O
04:12mupuf_: what do you mean by dangerous?
04:12mupuf_: if you are talking about userspace keyloggers and others, agreed
04:13karolherbst: yeah, you can do a lot of shady things there
04:13mupuf_: as you can see, I cleaned up the code a lot
04:13karolherbst: yeah, it's awesome
04:13mupuf_: wrote a makefile and split the entire thing
04:14mupuf_:wouldn't go this far
04:14karolherbst: I just wanted to write that
04:14karolherbst: it may make sense to make this an own project
04:15karolherbst: first requested feature: mark "important" changed stuff compared to another run (old output file passed in through an env variable)
04:15mupuf_: agreed, but for legal matters, I won't be able to. there is a process to get stuff open sourced at intel and if I were to take it out of ezbench, I would have to ask for permission again
04:15karolherbst: ohhh I see
04:15mupuf_: yes, there is the diffing tool
04:15karolherbst: ask before you go :p
04:16mupuf_: ask before I go?
04:16mupuf_: I do not need to ask as long as it is part of ezbench :D
04:16karolherbst: will ezbench be like public open source?
04:16mupuf_:was careful in the wording of the project being a collection of tools developed for benchmarking
04:16mupuf_: it is already
04:16mupuf_: I will move that to another repo next week
04:16mupuf_: it is MIT-licensed
04:17karolherbst: ohh I see
04:17karolherbst: I thought then you could move the stuff out without seeking permission?
04:17mupuf_: Well, someone can do it, but I can't ... I guess
04:17mupuf_: I am sure no one would care though
04:17karolherbst: I .. see
04:18mupuf_:may be too anal here
04:18mupuf_: no idea
04:18mupuf_: right now, it is a minor inconvenience I would say
04:18karolherbst: no when somebody cares enough he will do it
04:18mupuf_: and when I stop developing it, you can move it to its own repo :p
04:19mupuf_: anyway, first things first : asking the package manager the version of the package containing the .so referenced
04:20karolherbst: I won't add support for asking package managers
04:20mupuf_: and for the binaries unknown to the package manager, we will have to ask the build db ... which we need to create
04:20karolherbst: that's kind of messy stuff
04:20karolherbst: maybe, packagekit can do this
04:20mupuf_: oh, good idea
04:20mupuf_: pacman -Qo /my/path/to/lib.so
04:20karolherbst: like ask which package the .so file belongs to
04:20mupuf_: that's not too messy :p
04:20mupuf_: can you check it out?
04:21karolherbst: and then get all files name like the old but with . additions
04:21karolherbst: why not just search for files?
04:21karolherbst: you take the .so file and search for files beginning with the name
04:22karolherbst: mupuf_: wait, you just want to have the version of the library, do you?
04:22mupuf_: yeah, along with the name of the distro that provided it
04:22mupuf_: something like that
04:23karolherbst: the documentation of packagekit is just not there :D
04:24karolherbst: ohh I totally see why nobody wants to use it
04:24karolherbst: there is a cli tool though
04:25mupuf_: yes, there is
04:25karolherbst: pkcon search file $value
04:25mupuf_: seems to work!
04:25mupuf_: pkcon search file /usr/bin/ls --> Installed coreutils-8.24-1.x86_64 (installed) The basic file, shell and text manipulation utilities of the GNU operating system
04:26karolherbst: packagekit is good enough to get support for like all distributions at once
04:26karolherbst: I guess
04:26karolherbst: but packagekit on gentoo is a bit messy :/
04:26karolherbst: I had high cpu loads while having it installed
04:26karolherbst: because its cron job always scanned all packages and built the database :/
04:26mupuf_: pkcon backend-details --> we can use the backend name to prefix the package name
04:27karolherbst: I would do it in a way where native package managers can be used
04:27karolherbst: but packagekit as a fallback
04:27mupuf_: gentoo can have its own, pkcon for everyone else
04:27karolherbst: this was like two years ago
04:28karolherbst: maybe it is fixed, who knows
04:28mupuf_: we can fix it when someone complains
04:28mupuf_: no need to support the entire world at first
04:28mupuf_: let's just make it extensible
04:29karolherbst: gentoo: equery belongs /usr/bin/lsof --> * Searching for /usr/bin/lsof ... \n sys-process/lsof-4.89 (/usr/bin/lsof)
04:29karolherbst: but this can take like several seconds
04:30mupuf_: 32ms on archlinux
04:31karolherbst: yeah it is faster everywhere else
04:32karolherbst: it is more a design problem
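The "native package manager first, packagekit as a fallback" idea could look roughly like the sketch below. Only the three commands quoted in the discussion are used; the lookup order, the `OWNER_QUERIES` table and the `owner_query_command` helper are assumptions made for illustration.

```python
import shutil

# Commands quoted in the discussion; the ordering (native tools first,
# pkcon last) is an assumption about how the fallback could work.
OWNER_QUERIES = [
    ("pacman", ["pacman", "-Qo"]),
    ("equery", ["equery", "belongs"]),
    ("pkcon", ["pkcon", "search", "file"]),
]


def owner_query_command(path, which=shutil.which):
    """Return the argv asking which package owns `path`, or None if no tool is found."""
    for tool, argv in OWNER_QUERIES:
        if which(tool):
            return argv + [path]
    return None
```

The returned list can be handed to `subprocess.run(..., capture_output=True, text=True)`. Parsing the output is tool-specific and left out here; adding a distribution only means adding a table entry.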
04:32mupuf_: anyway, as much as I would like you to help me on this (besides giving excellent tips), shouldn't you work on the metrics collection?
04:32mupuf_: you have all the necessary information now :D
04:32karolherbst: you get paid, I don't :p
04:33karolherbst: first I want to finish this current_load interface, because this might be important for this anyway
04:33mupuf_: agreed, hence why we should work on what you wanted to do in the first place, metrics collection! :p
04:33mupuf_: might? It is!
04:33karolherbst: yeah, that's why I asked you about the layout of that file :p
04:33mupuf_: oh, right
04:34mupuf_:has a shallow stack some days .... most days .... always?
04:34mupuf_: I think one entry per line is the easiest
04:35mupuf_: so as we can vary the number of entries as the hw changes
04:35karolherbst: I really would like to stay consistent with the layout for this
04:35mupuf_: and we do not need to keep the ordering as strict
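A one-entry-per-line `key: value` file stays easy to parse even when the set of counters changes with the hardware. A minimal sketch, assuming hypothetical key names such as `core` and `memory` (the real current_load keys are not given in the discussion):

```python
def parse_load_file(text):
    """Parse 'key: value' lines into a dict, ignoring blank or malformed lines."""
    entries = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # no colon or empty key: skip the line
        entries[key.strip()] = value.strip()
    return entries
```

Because whitespace around the value is stripped, the values can be padded to start in the same column without affecting the parser, and the order of the entries does not matter.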
04:35mupuf_: how are you planning on supporting the nvaX then?
04:35mupuf_: they only have 4 counters?
04:36mupuf_: you want to add data, but never take some out?
04:36karolherbst: I don't care about the slots on the nouveau side
04:36mupuf_: what if we discover that one counter is more important?
04:36karolherbst: we should have a clear goal what we want to know through the counters
04:36mupuf_: the slot allocation is the constraint
04:36mupuf_: as far as I can tell, we will never have one
04:36mupuf_: and it will change with hw
04:36karolherbst: I know, so nvaX just collects 3 different kinds of information we care about
04:37mupuf_: we can push it to 4
04:37mupuf_: but that's it
04:37karolherbst: there isn't much we can do though
04:37karolherbst: we will stay with our cstate/pstate semantics
04:37karolherbst: so one slot should take care of all information for a cstate change trigger
04:37karolherbst: one for the pstate
04:37karolherbst: then we can split that up as we want
04:37karolherbst: or if something is not good enough
04:37mupuf_:would argue that we will always want to expose all the counters for the metrics collection since they got polled anyway
04:38mupuf_: now you don't make sense
04:38karolherbst: I mean we can't get every counter at once anyway
04:38karolherbst: so we have to already decide what kind of information we want to get
04:39mupuf_: but that may change in the future
04:39karolherbst: so we need to be more abstract than that
04:39mupuf_: how about the userspace should not care much about them and just expose them all in its report?
04:40karolherbst: does it make sense to expose ROP, PCOPY0/1/2, ... loads separately?
04:40mupuf_: oh, one missing feature: querying the gpu information out of the drm node
04:40mupuf_: sure, if you can
04:40karolherbst: well we can't
04:40karolherbst: we have only 7 slots
04:40mupuf_: 8 if we are smart
04:40karolherbst: on fermi+ that is
04:41karolherbst: yeah okay, but I really don't want to remove that 8th one
04:41mupuf_: some slots may be turned configurable, but that's another story
04:41mupuf_: what I am saying is that we should expose to the userspace all the counters we are currently polling
04:41mupuf_: that's it
04:41karolherbst: on fermi+ I already use 5 slots
04:41mupuf_: then there are 2 slots available for .. fun :D
04:41mupuf_: oh, right
04:42mupuf_: no need to double add the cycles counter
04:43karolherbst: imagine we wanted to poll each of the counters separately
04:43karolherbst: and not grouped
04:43mupuf_: no, poll them all in one go
04:43mupuf_: it is dumb to poll them individually
04:43mupuf_: a trip through the pcie port is slow as heck
04:43karolherbst: I mean the slot configurations
04:44karolherbst: this is on the falcon
04:45karolherbst: I read all the values from the last read out in one go from the host, but that's the boring part here
04:45mupuf_: from the host?
04:45karolherbst: I meant, how should we configure all those slots to get data we want
04:45mupuf_: the readout is on the pmu
04:45karolherbst: and I cache them on the pmu
04:46karolherbst: I think we are lost and should reset :D
04:46mupuf_: (falcon is an ISA btw, almost all the engines use this ISA so referring to pdaemon/pmu using falcon is not helpful :p)
04:46karolherbst: I see
04:46karolherbst: then pmu
04:46mupuf_: falcon == fuc, also
04:46mupuf_: falcon == nvidia's name
04:46mupuf_: fuc is ours
04:47mupuf_: the pmu should be responsible for configuring the counters, polling them periodically, making reclocking decisions and sending it to the host
04:48mupuf_: the host may also request from the pmu to return the latest values polled
04:48mupuf_: that's it!
04:48mupuf_: and this is what my code was allowing, not sure where you are going but it sounds unclear
04:49karolherbst: no, that's what I do
04:49karolherbst: I was just talking about the general idea what purpose the counter configurations should follow
04:49mupuf_: the purpose is reclocking
04:50mupuf_: and doing power management in general
04:50mupuf_: the fact that we are going to poll on that from the userspace is not relevant
04:51mupuf_: you really want to have different methods that can be called by the host by the way
04:51karolherbst: but I meant it a bit more specific than that :)
04:51karolherbst: like if we want to reclock, what do we want to know?
04:52karolherbst: we want to know stuff like
04:52karolherbst: is our current cstate high enough
04:52karolherbst: and then, which configuration will allow us to know that
04:52mupuf_: no, cstate and pstate should not exist at this level
04:52karolherbst: mhh okay
04:52karolherbst: then more like, is the memory clock fast enough
04:52karolherbst: or is the PCOPY012 clock fast enough
04:52mupuf_: this is it
04:53karolherbst: maybe we find something between cstates and engines, which is generic enough as a guideline, but specific enough to follow this across all chipsets
04:53mupuf_: if you want to upclock, ask for it and wait for the host to have confirmed that the change was done before reporting a second time that you need to increase the perf
04:53mupuf_: but if you see that the perf is not needed anymore, just report that you can lower the clock
04:54mupuf_: well, just asking for more perf with an urgency level so that we can scale more or less quickly based on the current load would be good
04:55mupuf_: or we just implement a simple hysteresis in the pmu and be done with it ... but it is not super good
04:55mupuf_: anyway, I cannot talk about that now
06:34RSpliet: karolherbst: for GT21x, I recall something about the memory clock and core clock not being allowed to be too far apart
06:34RSpliet: (which could have something to do with the design of the clock-crossing logic...)
06:34RSpliet: which, in other words, means you change the entire pstate, or none at all
06:37mupuf_: yes, hence why the logic in pmu should be stupid
06:37mupuf_: just request more performance for domains that are limited
06:37mupuf_: and let the kernel figure out what to do
07:41karolherbst: mupuf_: what should we do when the kernel decides not to clock up?
07:42mupuf_: you should not send another update until you get an ack from the kernel
07:42karolherbst: yeah, but what if we never get one, because the load isn't high enough for the kernel
07:43karolherbst: or when we already reached highest clocks
07:43karolherbst: maybe we could send a nack + load values for which the pmu shall notify the next time?
07:44mupuf_: well, not high-enough for the kernel is wrong
07:44mupuf_: if pdaemon says upclock, the kernel should upclock unless it is impossible
07:45mupuf_: and in this case, it should not ack any change was made
07:45mupuf_: pdaemon may send another update, but only when we need to downclock
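The request/ack rule described here (no second upclock request until the host has acked, downclock requests still allowed) can be modelled in userspace with a tiny state machine. This is a sketch only: the class name `ReclockNotifier` is hypothetical, and the choice that a downclock request cancels a pending upclock is an assumption, not something stated in the discussion.

```python
class ReclockNotifier:
    """Tracks whether the PMU may send another reclock request to the host."""

    def __init__(self):
        self.awaiting_ack = False  # an upclock request is still un-acked

    def request(self, direction):
        """Return True if a request in `direction` ('up' or 'down') should be sent."""
        if direction == "up":
            if self.awaiting_ack:
                return False  # host has not confirmed the last upclock yet
            self.awaiting_ack = True
            return True
        if direction == "down":
            # Assumption: a downclock request supersedes a pending upclock.
            self.awaiting_ack = False
            return True
        raise ValueError("direction must be 'up' or 'down'")

    def ack(self):
        """Host confirmed that it changed the clocks."""
        self.awaiting_ack = False
```

If the host cannot upclock (for example because the highest clocks are already reached), it simply never calls `ack()`, and the notifier stays quiet until a downclock is needed.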
07:45karolherbst: mhh, this really restricts us in the algorithm we can use
07:46mupuf_: as in?
07:46mupuf_: the reclocking decision should be made by pdaemon, not the host
07:46mupuf_: the host is here to execute
07:46mupuf_: otherwise, you will enter some funny rules on when to send an IRQ or not
07:47mupuf_: and it is going to be messy to write in asm
07:47karolherbst: in my current algorithm I use information like pstate and cstate count
07:47mupuf_: what for?
07:48karolherbst: to calculate which cstate I clock to
07:48karolherbst: and to have smoother clocking
07:48karolherbst: the current nouveau gk20a code also uses this
07:51karolherbst: the thing is, if the current load is like 85% and the target is 75%, shall we upclock or not? and if we upclock and we got a load of 50% after that, shall we downclock?
07:51karolherbst: we might end up in an up/down clocking cycle if the pmu doesn't know what upclocking actually does
07:52karolherbst: there are kepler cards with only 3 cstates, one for each pstate
07:53mupuf_: how about a double hysteresis window?
07:53mupuf_: or simply-said, 2 windows
07:53mupuf_: you only upclock to the next pstate if you reach the upper threshold
07:54mupuf_: and this is only if you already reached the last cstate of the pstate
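The two-window idea can be prototyped in userspace before anything is written in asm. In the sketch below all threshold values are made up, cstates are assumed to be indexed from 0 within each pstate, and the downclock rule is an added assumption since only the upclock side is described above.

```python
def next_state(load, pstate, cstate, last_pstate, last_cstate,
               inner_up=75, outer_up=90, down=40):
    """Return the (pstate, cstate) to move to for a load percentage."""
    if load >= inner_up and cstate < last_cstate:
        return pstate, cstate + 1      # inner window: climb within the pstate
    if load >= outer_up and cstate == last_cstate and pstate < last_pstate:
        return pstate + 1, 0           # outer window: only from the last cstate
    if load <= down and cstate > 0:
        return pstate, cstate - 1      # assumed downclock rule, not from the chat
    return pstate, cstate              # inside the window: hold
```

With these example thresholds, a load of 80% on the last cstate of a pstate holds the current clocks, and only a load of 90% or more moves to the next pstate.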
07:55karolherbst: I think we really need to know how many steps the kernel has to clock, because that determines the size of each step
07:55karolherbst: and also the threshold when we should up/down clock
07:55mupuf_: what you want is a way to predict the performance based on the clock increase
07:55mupuf_: you really think you can write this code in asm?
07:56karolherbst: it's not that hard
07:56karolherbst: you just factor in the step width
07:56karolherbst: inside the cur_load, tar_load, max_load scale
07:57mupuf_: what if the step width is not constant?
07:57mupuf_: which is ... true
07:57karolherbst: you somehow divide the difference between tar_load and max_load into max_cstate - cur_cstate parts
07:57karolherbst: and just upclock the count of the parts you are above cur_load
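One possible reading of this scheme, sketched with integer arithmetic only: the range between tar_load and max_load is split into as many equal parts as there are cstates left, and the number of parts the current load reaches into gives the number of steps. The function name `upclock_steps` is hypothetical, "above cur_load" is read here as "above tar_load", and equal part widths are assumed even though the real step width is not constant.

```python
def upclock_steps(cur_load, tar_load, max_load, cur_cstate, max_cstate):
    """Number of cstate steps to climb, using integer arithmetic only."""
    remaining = max_cstate - cur_cstate
    span = max_load - tar_load
    if remaining <= 0 or span <= 0 or cur_load <= tar_load:
        return 0
    over = min(cur_load, max_load) - tar_load
    # Ceiling division: every started part counts as one step.
    return (over * remaining + span - 1) // span
```

For example, with a target of 75%, a maximum of 100% and five cstates left, each part is 5% wide, so a load of 85% asks for two steps.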
07:57mupuf_: anyway, still can't talk, sorry
07:57mupuf_: you can do that in the kernel
07:57mupuf_: no need to do it on the pmu
07:58mupuf_: anyway, how about having the testing rig ready before writing a ton of code?
07:58karolherbst: yeah, but we still should be able to predict on the pmu if the kernel actually will reclock
07:58mupuf_: we should test stuff in the userspace first
07:58karolherbst: yeah, I need to modify my code for that a bit
07:58mupuf_:disagrees, but time will tell
07:59mupuf_: and I am happy to be *proven* wrong
07:59mupuf_: we can model that in the userspace anyway
07:59mupuf_: premature implementation in asm is only going to annoy you a lot and reduce the number of tests
08:01karolherbst: I know
08:01mupuf_: it's frustrating, right? :s
08:04karolherbst: sadly my cstates table is too linear, so I couldn't test it with strange cstates
11:31pmoreau: \o/ I like it! PGRAPH is talking to me: "TRAP_MP - TP0: GLOBAL_LIMIT_WRITE"! :-)
11:33imirkin_: you need to set up a bunch of registers
11:34imirkin_: like the base memory address
11:34pmoreau: I guess curro's patches take care of most of them, maybe all :-)
11:34pmoreau: I'll have to check
11:35pmoreau: I was wondering whether it could be that I'm writing outside of the allocated memory, due to not reading the pointer from the correct memory area.
11:35pmoreau: The different memory areas seem to start at an offset, and I don't respect it at all for now.
11:36imirkin_: this is nv50 or nvc0?
11:36imirkin_: can i see your code?
11:36pmoreau: On my brave MBP laptop :D
11:36imirkin_: i.e. the full NV50_PROG_DEBUG output
11:36pmoreau: Sure! Just a sec
11:37pmoreau: With debug level equals?
11:38pmoreau: imirkin_: https://phabricator.pmoreau.org/P51
11:39imirkin_: nice, looks like emission is working
11:40imirkin_: at least envydis agrees
11:40imirkin_: that's something to look out for, esp on nv50
11:40pmoreau: How do you use envydis?
11:40imirkin_: envydis -m g80 -V g84 -O cp -w
11:41imirkin_: then paste the dwords in, and ^D
11:41pmoreau: m is family, V version, O type of shader?
11:41imirkin_: m is machine, v is variant, o is variant2
11:42imirkin_: internally you can condition various things on the variant
11:42imirkin_: that way you can have a single machine with slightly different behaviours depending on the variant
11:43pmoreau: Hmmm, ok
11:45pmoreau: Once I get hello_world to work, I'll still have to handle all the control flow commands and SSA, as well as system values… :/
11:45pmoreau: Hopefully, by then we will have agreed on which path to follow for compute, and I won't be working on it alone :D
11:46pmoreau: If I'm slow enough, I can only increase the chances!
11:46pmoreau: s/I can/it can
17:13wadadli: Hey, I'm using fedora with the nouveau drivers installed. I have two identical monitors, yet one is being displayed at a lower resolution than the other. Check my xrandr output: http://paste.fedoraproject.org/283124/64508814/raw/
17:14wadadli: My monitor can display its native resolution via both hdmi and dvi, before anyone says it's a port limitation
17:14wadadli: I have a NVIDIA Geforce GT 730
17:14imirkin: wadadli: not a physical limitation
17:14imirkin: but a nouveau one
17:14imirkin: we max hdmi at 165MHz
17:14imirkin: even though it can do more
17:15imirkin: there are some patches you can apply if you're able to build your own kernel
17:17wadadli: imirkin: would this information be sufficient to do so? https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel
17:17imirkin: probably... you have to apply a couple patches too
17:17imirkin: lemme find them... sec
17:18imirkin: apply this patch: http://lists.freedesktop.org/archives/nouveau/2015-August/021841.html and this patch: http://lists.freedesktop.org/archives/nouveau/2015-August/021839.html
17:18imirkin: in the second one you can probably replace 225000 with 297000
17:21wadadli: I'm assuming I have to add these to the nouveau source code?
17:22imirkin: they're kernel patches
17:23wadadli:is trying to stay afloat
17:28wadadli: hey what language is being used in these patches? imirkin
17:29wadadli: Sorry disconnected there, said something imirkin?
17:29imirkin: nope. the kernel is written in C
17:30wadadli: imirkin: okay thank you for your help sir
17:32imirkin: wadadli: btw, curious -- what monitor has a 2560x1080 native resolution?
17:32imirkin: never seen a 2.5:1 monitor...
17:33wadadli: imirkin: LG ultrawide monitors
17:34imirkin: i wonder if they rotate... 2 of those side-by-side would be pretty awesome
17:35wadadli: not on the stock stand but everything is possible with a bit of imagination
17:36imirkin: although i'd need 1200... 1080 isn't quite enough
17:36imirkin: o well
17:36wadadli: question do I apply these patches using the patch program?
17:36imirkin: probably easiest to do so, yes
17:37imirkin: from the top of the kernel tree, you can run 'patch -p1 < foo.patch'