08:51 karolherbst: imirkin: what about the min/max commit?
08:51 karolherbst: ohh, should read mails first :D
10:02 mupuf: karolherbst: I do not think you will manage to do a comparison between nouveau and nvidia for sched codes
10:02 karolherbst: mupuf: why not?
10:02 karolherbst: mupuf: I was talking about compiled binaries extracted from mmts
10:02 mupuf: how about this instead: dumping into files all the generated shaders by nvidia and then ordering them by sched code
10:03 mupuf: that will give you all the most likely conditions
10:03 karolherbst: then we forget about branching
10:03 mupuf: why would we?
10:04 mupuf: oh wait, I got it, you mean writing an OoT sched code generator and comparing your output with nvidia
10:04 mupuf: that actually is a sweet idea
10:04 karolherbst: right
10:05 mupuf: you can't just pipe the code through nouveau's compiler and expect it to compute the same sched code, that was my point :)
10:05 karolherbst: right
10:05 karolherbst: I just wanted to reuse the sched generating code
10:05 karolherbst: but maybe this would be too messy to actually do
10:05 karolherbst: it
10:05 mupuf: Very probably :D
10:06 mupuf: is the code complex already?
10:06 karolherbst: we also have envydis which should help here
10:06 karolherbst: very
10:06 karolherbst: well
10:06 karolherbst: the main problem would be to translate the file into nv50 ir
10:06 karolherbst: we have no idea what actually matters
10:06 mupuf: oh well, you should RE the shit out of it first, and not try to make real code
10:06 karolherbst: maybe some mods also change the sched opcodes
10:07 mupuf: right, the code assumes nv50 IR, so that would likely be a lot of work
10:07 mupuf: just ignore the nouveau compiler and start from scratch
10:07 karolherbst: mupuf: well my idea was this: parse the generated binaries and push it through some sched opcode generating algorithm
10:07 mupuf: .... well .... actually ... you still need to parse the bloody demmio output
10:07 karolherbst: and see how much we differ with nvidia
10:07 karolherbst: mupuf: you mean demmt, right?
10:07 mupuf: yes, sorry
10:08 karolherbst: well
10:08 karolherbst: we have envydis for this
10:08 karolherbst: but mhh
10:08 mupuf: oh,true
10:08 mupuf: but it is not an IR
10:08 karolherbst: I still think it is a bit of work
10:08 karolherbst: doesn't matter
10:08 karolherbst: I don't need a real IR
10:08 karolherbst: what I need is where an instruction gets executed
10:08 karolherbst: not what the instruction really does
10:08 karolherbst: and a bit of source handling
10:09 mupuf: where?
10:09 mupuf: what you need is all the information possible
10:09 karolherbst: like sf
10:09 karolherbst: u
10:09 karolherbst: mhh, right
10:09 mupuf: there are many instructions, you need to know the registers, the type of operation, etc..
10:09 karolherbst: but I meant for using the current algorithm
10:10 karolherbst: those opclasses in nv50ir don't map 1:1 to the actual part used to execute those instructions
10:10 karolherbst: I think unit is the right word for this?
10:10 mupuf: no idea what you are talking about :s
10:11 karolherbst: I mean the instruction execution units
10:11 karolherbst: like the fpu on a cpu
10:11 karolherbst: this is really important to know for tesla (and maybe fermi as well, less on kepler)
10:11 RSpliet: yes, unit is correct, floating point unit, arithmetic logic unit...
10:13 mupuf: karolherbst: are you sure it is as simple as just knowing the unit?
10:13 mupuf: there are no dependencies for registers?
10:13 karolherbst: mupuf: no, but it is a part of it
10:13 mupuf: like RAW or WAR
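A minimal sketch of the register dependencies mupuf mentions, assuming each instruction is reduced to the registers it writes and reads. The `Insn` struct and the helper names are hypothetical and are not taken from nouveau or envydis. RAW (read after write) means a later instruction reads a register an earlier one wrote; WAR (write after read) means a later instruction overwrites a register an earlier one still reads.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical reduced instruction: only the register numbers matter here.
struct Insn {
    std::vector<int> defs; // registers written
    std::vector<int> srcs; // registers read
};

// True if the two register lists share at least one register.
static bool overlaps(const std::vector<int>& a, const std::vector<int>& b)
{
    return std::any_of(a.begin(), a.end(), [&](int r) {
        return std::find(b.begin(), b.end(), r) != b.end();
    });
}

// RAW: `second` reads something `first` writes.
bool hasRAW(const Insn& first, const Insn& second)
{
    return overlaps(first.defs, second.srcs);
}

// WAR: `second` writes something `first` reads.
bool hasWAR(const Insn& first, const Insn& second)
{
    return overlaps(first.srcs, second.defs);
}
```

A scheduling-code generator would need this kind of information in addition to the execution unit of each instruction, which is the point being made in the discussion.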
10:13 karolherbst: mupuf: gt200+ is weird here a little
10:13 karolherbst: mad can be executed on two different units
10:13 karolherbst: and then the type also matters
10:14 RSpliet: karolherbst: keep in mind, iirc, kepler has 6 FPUs for 4 concurrent warps
10:14 RSpliet: which is weird, but... double check those kind of details as well ;-)
10:15 mupuf: ah ah, sounds like there will be a lot of fun here :D
10:15 karolherbst: mupuf: yes
10:15 karolherbst: mupuf: The individual streaming processing cores
10:15 karolherbst: of GeForce GTX 200 GPUs can now perform near full-speed dual-issue of multiply-add operations (MADs) and MULs (3 flops/SP) by using the SP’s MAD unit to perform a MUL and ADD per clock, and using the SFU to perform another MUL in the same clock
10:16 karolherbst: from nvidia ocs
10:16 karolherbst: *docs
10:16 karolherbst: but hey, the sfu can be busy doing sin stuff or something
13:37 karolherbst: mupuf: anything comes to your mind which would help with reducing power consumption except clock/power gating?
13:38 kdvr: hi, i have gone through multiple documents / troubleshooting etc but I can't find anything that could help me yet. Whenever I disconnect my monitor or switch (through hdmi switch) I lose my mouse cursor and chrome for instance does not show anything anymore, the only way to resolve is by logging out and logging back in.
13:39 karolherbst: kdvr: did you try to restart your window manager/compositor?
13:40 kdvr: I am running xfce, is there an easy way to restart without losing the current open apps?
13:40 karolherbst: xfwm4 --replace
13:40 kdvr: I can try now
13:42 kdvr_: nope that did not work... Mouse is gone and I can't seem to type anything anymore in chrome (which I am using for freenode)
13:42 kdvr_: just logged out and logged back in
13:43 karolherbst: but does anything change at all?
13:43 kdvr_: no, it just flickers and that is it
13:43 karolherbst: what are your kernel/X versions?
13:45 kdvr_: kernel: 4.7.2-201.fc24.x86_64
13:46 kdvr_: let me see how I can get the X version on fedora
13:47 karolherbst: and maybe it also helps if you tell us what gpu you have
13:48 kdvr_: NVIDIA Corporation GM206 [GeForce GTX 960]
13:49 kdvr_: sorry I am not that fast hold on :P
13:51 kdvr_: I am running X.Org X Server 1.18.4
13:52 kdvr_: All on Fedora 24
13:52 kdvr_: I can provide any log needed :P just have to tell me hahaha, I am a n00b in debugging
13:53 karolherbst: maybe dmesg or the x log tell us something
13:54 kdvr_: alright, thanks btw for your time
13:55 kdvr_: Dmesg: http://pastebin.com/93aKEV8H
13:56 kdvr_: And the Xorg.0.log: http://pastebin.com/zrWe3kSt
14:00 karolherbst: mhh the .old x log please
14:02 kdvr_: Here you go Xorg.0.log.old http://pastebin.com/7wMGztLJ
14:02 karolherbst: odd
14:03 kdvr_: It all looks chinese to me :P
14:04 kdvr_: I was hoping for errors when I would disconnect or reconnect
14:05 karolherbst: me too
14:07 kdvr_: Is there a way to get more verbose information for you?
14:50 mupuf: karolherbst: DVFS, clock gating, power gating are the typical techniques
14:51 mupuf: keeping the gpu cool also helps
14:51 mupuf: but you need to factor in the cost of the fan
14:51 mupuf: after this ... improve performance to reduce power usage (when vsync is on)
14:51 karolherbst: right, but I meant stuff besides that
14:53 mupuf: so yeah, that's it
14:53 mupuf: after this, it really is tuning the DVFS algorithms
14:54 mupuf: try to keep the clocks as low as possible while still not impacting the performance of dynamic workload
14:55 karolherbst: right
14:55 karolherbst: okay, I just hoped there would be something left to do, but I guess REing power gating will be quite a thing
14:56 karolherbst: but even clock gating already helps a lot, even reducing power consumption under some workloads
14:56 karolherbst: I just asked cause I try to finish collecting all information I want to add to my talk, maybe I just forgot something
14:57 mupuf: sure sure
14:57 mupuf: well, clock gating is still left for REing and will be tough
14:57 mupuf: but bashing all the regs should help already
14:57 karolherbst: yes
14:57 karolherbst: it is enough for kepler+
14:57 karolherbst: nvidia seldom touches them
14:57 mupuf: nope, does not work on my e6
14:57 karolherbst: like I only saw it after suspend
14:58 karolherbst: mupuf: are you sure?
14:58 mupuf: yes, I am
14:58 karolherbst: what gpus are in reator currently?
14:58 mupuf: I tried for a long time to make them work
14:58 karolherbst: maybe I would take a look at some point
14:58 mupuf: not sure
14:58 mupuf: it is off right now
14:58 karolherbst: I will check
14:58 karolherbst: not anymore :D
14:59 mupuf: ok, have fun! I am with pmoreau in Utrecht, Roy's city of origin
14:59 mupuf: we are about to go for a walk
14:59 karolherbst: gk106 :)
14:59 mupuf: ok, great!
14:59 karolherbst: I am sure it will work :p
14:59 mupuf: try, I did not see any drop in power usage when enabling
14:59 mupuf: well, if it does, /me is happy
14:59 karolherbst: uhh, it booted into the blob again
15:00 karolherbst: mhh
15:00 mupuf: fuck, no idea what people did, but they really screwed up grub
15:00 karolherbst: well
15:00 mupuf: grub-reboot's state should always be resette
15:00 mupuf: d
15:00 mupuf: or someone changed grub's conf
15:00 mupuf: hakzsam: ?
15:00 karolherbst: well, it sometimes happens
15:01 karolherbst: what is the linux grub entry? 0 or 1?
15:01 mupuf: try either one or the other :p
15:01 karolherbst: :D
15:02 mupuf: both are linux though :D
15:02 mupuf: 0 is what I remember
15:02 karolherbst: right :D
15:02 karolherbst: grub from the nvidia partition does something bad I assume?
15:03 mupuf: well, it is not the same boot partition
15:03 mupuf: ah, it booted on nouveau now
15:03 karolherbst: sure, but to repair that I always chroot into the nouveau partition and run grub-reboot 0 there
15:04 mupuf: right
15:04 karolherbst: and this leads me to the nouveau partition on reboot
15:04 karolherbst: no idea if chroot is required here
15:04 mupuf: normally, you just need to reboot once and it always goes back to nouveau
15:05 karolherbst: regarding power consumption: on lowest clocks the effect is rather slim
15:06 karolherbst: k, it works
15:06 karolherbst: but well
15:06 karolherbst: the difference...
15:06 karolherbst: on 0f max boost
15:06 karolherbst: 43.77 W -> 40.60W
15:07 mupuf: ok, I guess I did not see any difference because I did reclock the gpu
15:07 karolherbst: on 07
15:07 karolherbst: 18.9W -> 18.4W
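To put the two measurements on the same scale as the percentages quoted for maxwell, the relative drop can be computed directly. This is only an arithmetic sketch; `percentDrop` is a hypothetical helper name. With the figures above it gives roughly 7.2% at the 0f state (43.77 W to 40.60 W) and roughly 2.6% at the 07 state (18.9 W to 18.4 W).

```cpp
#include <cassert>

// Relative power drop in percent when going from `before` to `after` watts.
double percentDrop(double before, double after)
{
    assert(before > 0.0); // a zero or negative baseline makes no sense here
    return (before - after) / before * 100.0;
}
```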
15:07 karolherbst: maybe you thought the 0.5W are usual noise or something
15:07 mupuf: well, the effect is supposed to be much larger
15:07 karolherbst: due to power sensors configured like shit
15:07 karolherbst: on kepler not so much
15:07 karolherbst: on maxwell, yes
15:07 mupuf: yeah, right, the sensor output used to be super noisy
15:07 karolherbst: on maxwell I get 30% drops on 0f
15:08 mupuf: ok, great
15:08 hakzsam: /root/reset_grub.sh works fine
15:08 hakzsam: on blob partition
15:08 mupuf: hakzsam: oh, right!
15:08 karolherbst: :O
15:08 hakzsam: and grub-reboot 2 from the nouveau one
15:08 mupuf: I had forgotten about this script
15:08 karolherbst: mupuf: even for some workloads at full load I got 10% drops on maxwell
15:09 mupuf: nice
15:09 karolherbst: yep
15:09 karolherbst: glxgears :D
15:09 mupuf: on tesla, I could get something similar when enabling clock gating
15:09 karolherbst: mhh
15:09 mupuf: err, clock throttling on idle
15:09 karolherbst: I see
15:10 karolherbst: I guess it is safe to implement it on nouveau then, because it kind of works on every kepler and maxwell gpu
15:11 mupuf: well, let's see!
15:11 karolherbst: mupuf: but we should also RE those bits to globally disable clock gating, just in case there are some gpus where it is indeed disabled
15:11 mupuf: want to write the patch?
15:12 mupuf: right, we'll need to find them
15:12 karolherbst: well, I planned to work on that with gnurou a bit, or he does it, or somebody else
15:12 mupuf: we can use ezbench to try an A/B with clock gating enabled or not
15:12 karolherbst: I really don't feel like implementing it, because there are more complex/difficult things to do
15:12 mupuf: we'll see the impact on perf and power :)
15:12 karolherbst: and any new dev could implement it instead
15:12 karolherbst: mupuf: no impact afaik
15:12 mupuf: ah, right, new comer
15:12 karolherbst: yes
15:13 mupuf: karolherbst: and you base this data on what?
15:13 karolherbst: would be a good task to get used to the code base
15:13 mupuf:requires runs of every games and benchmarks possible :p
15:13 karolherbst: mupuf: well, nothing :p I just have it enabled since always
15:13 mupuf: and since you never take it off, you don't see any perf impact
15:13 karolherbst: well I sometimes checked and it didn't make any difference
15:13 mupuf: so, the only way of testing is to do an A/B comparison
15:13 karolherbst: also nvidia doesn't touch those regs again
15:14 mupuf: sure sure, but still ;)
15:14 karolherbst: I know
15:14 karolherbst: but I assume there is no difference really, maybe in workloads with really spiky loads
15:14 karolherbst: and with spiky I mean several spikes within milliseconds
15:15 mupuf: yes
15:15 karolherbst: afaik it might happen that the clock signal gets disconnected with vsync a lot
15:15 karolherbst: but if we vsync we don't need more perf anyway
15:16 karolherbst: and I would assume that's the biggest tradeoff, that you have a slim latency on starting things
15:16 kdvr: karolherbst: sorry to be a bother :P but do you think I can enable more verbose logging for my monitor disconnecting /reconnecting issue?
15:16 karolherbst: kdvr: no idea if that helps, I am no expert in that X stuff
15:16 karolherbst: I kind of hoped somebody else would have any ideas
15:16 kdvr: ok cool :-) me too
15:16 kdvr: haha
15:16 karolherbst: mupuf: but I think we need some reference counting ok within the kernel for the engines
15:16 karolherbst: *code
15:17 mupuf: what for?
15:17 karolherbst: or at least I fear we need that for fermi
15:17 karolherbst: clock gating
15:17 karolherbst: we could also do manual clock gating on kepler as well
15:18 karolherbst: mhh reference counting is the wrong term here
15:18 karolherbst: more like usage counting
15:18 karolherbst: and if an engine isn't in use anymore, we could clock/power gate it
15:18 karolherbst: if automatic gating doesn't work
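The usage-counting idea can be sketched as a counter per engine that ungates on the first user and gates again when the last user goes away. This is a toy model under stated assumptions: `EngineGate`, `get()` and `put()` are hypothetical names, the `gated` flag stands in for whatever register writes would really be needed, and nothing here reflects the actual nouveau kernel interfaces.

```cpp
#include <cassert>

// Hypothetical per-engine usage counter; not nouveau's real data structure.
struct EngineGate {
    unsigned users = 0;  // number of current users of the engine
    bool gated = true;   // an idle engine starts out gated

    // Called when something starts using the engine.
    void get()
    {
        if (users++ == 0)
            gated = false; // first user: ungate (clock/power on)
    }

    // Called when a user is done with the engine.
    void put()
    {
        assert(users > 0); // unbalanced put() would be a caller bug
        if (--users == 0)
            gated = true;  // last user gone: safe to gate again
    }
};
```

A real implementation would also need locking and probably a delay before gating, so that short idle gaps do not cause constant toggling.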
15:51 mupuf: karolherbst: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/gpu/drm/nouveau?id=22b6c9e8fef4553017a92ed5e27451e0b2f9c5ce <-- this is definitely something we need to do
15:51 mupuf: karolherbst: right, this is only for power gating
15:51 mupuf: and yes, we will need to monitor the command buffer
15:52 mupuf: or a bind to an engine
15:52 karolherbst: mupuf: yes, but this is for maxwell I think
15:52 mupuf: yep
15:52 mupuf: will fix stability issues
15:53 karolherbst: on maxwell I assume
15:53 karolherbst: I think on kepler the voltages in the vbios are usually higher than they have to be
15:53 karolherbst: or the other way around: on maxwell vbios the voltage is tighter to the actual requirement
15:55 karolherbst: mupuf: in the future we should just nack changes from gnurou if they also affect desktop gpus, just so that he actually has to implement it for all chips :p I already told him I will nack any tegra only clock gating patches
15:55 karolherbst: well, but I think we can always merge the code together at some point
15:55 karolherbst: just a waste of time really
15:56 mupuf: yeah, I get your point, but I would say no
15:56 mupuf: let him do the work for tegra and let us mimic it for the rest
15:57 mupuf: and ask questions
15:57 mupuf: otherwise, he may just not have the time to do anything
15:57 karolherbst: yeah, I wouldn't want to be so strict about it too, because of other problems which might come up
15:57 karolherbst: but I would at least try to convince him to start with more general solutions first
16:52 karolherbst: mupuf: :O I just saw that a gm204 has a max voltage of 1.27V
16:53 karolherbst: seems normal for those gpus
20:47 karolherbst: imirkin: I think I will try to completely remove that OP_SUB thing, any reason this is a bad idea?
20:48 imirkin: you're in for a lot of pain
20:48 karolherbst: sure
20:48 karolherbst: I am just asking if that's a good idea :D
20:48 karolherbst: not that I do that and then later on it should stay
20:48 imirkin: static const uint32_t commutative[(OP_LAST + 31) / 32] =
20:48 imirkin: {
20:48 imirkin: // ADD,MAD,MUL,AND,OR,XOR,MAX,MIN
20:48 imirkin: 0x0670ca00, 0x0000003f, 0x00000000, 0x00000000
20:48 imirkin: see that bitmask?
20:48 imirkin: you're gonna have to adjust it.
20:48 imirkin: same for the nvc0 versions
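The reason the bitmask needs adjusting is that each bit position is an opcode's numeric value, so removing an entry from the opcode enum shifts every later opcode by one bit. A hedged sketch of an alternative that builds the mask from opcode names, so it follows the enum automatically. The `Op` enum below is a made-up, shortened list for illustration and does not match the real nv50 IR numbering; `makeMask` and `isSet` are hypothetical helpers, not existing functions in the compiler.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Made-up opcode list; the real enum is much longer and ordered differently.
enum Op : unsigned { OP_NOP, OP_MOV, OP_ADD, OP_SUB, OP_MUL, OP_MAD, OP_AND, OP_LAST };

constexpr unsigned kWords = (OP_LAST + 31) / 32;
using OpMask = std::array<uint32_t, kWords>;

// Set one bit per listed opcode, at the position given by its enum value.
inline OpMask makeMask(std::initializer_list<Op> ops)
{
    OpMask mask{};
    for (Op op : ops) {
        assert(op < OP_LAST);
        mask[op / 32] |= 1u << (op % 32);
    }
    return mask;
}

// Test whether the bit for `op` is set in `mask`.
inline bool isSet(const OpMask& mask, Op op)
{
    assert(op < OP_LAST);
    return (mask[op / 32] >> (op % 32)) & 1u;
}
```

With this toy enum, `makeMask({OP_ADD, OP_MUL, OP_MAD, OP_AND})` sets bits 2, 4, 5 and 6. If `OP_SUB` were deleted from the enum, the same call would still mark the same opcodes, whereas a hardcoded hex constant would silently point at the wrong ones.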
20:48 karolherbst: ugh
20:48 karolherbst: ...
20:48 karolherbst: why do you do stuff like that? :D
20:48 imirkin: so ... like i said, just leave the OP_SUB in
20:49 imirkin: i dunno. "it was like that when i got there"
20:49 imirkin: [my excuse for just about everything]
20:49 karolherbst: I see
20:50 karolherbst: k, then I just wipe out every occurrence of that and add a nasty assert into instruction or something
20:50 karolherbst: actually before starting new things, I should finish my old stuff first...
20:52 karolherbst: I think I will finish up that postRAConstantFolding thing
20:52 karolherbst: because this is quite useful
20:53 karolherbst: around -0.40% instruction count
20:53 karolherbst: even gpr usage drops
21:23 karolherbst: "Wait on fence 1 (ack = 1, next = 1) timed out !" I got this
21:23 karolherbst: I think I never got this before running shader-db
21:24 karolherbst: did anything change which could lead to this?
21:24 imirkin:blames hakzsam
21:24 imirkin: try reverting the patches he pushed recently.
21:25 karolherbst: it happens in nvc0_screen_destroy
21:26 imirkin:still blames hakzsam
21:29 karolherbst: mupuf, skeggsb, gnurou: do you think we should add a subdev for power/clock gating? then nvkm_engines could call nvkm_cpgateing_* where appropriate and we can move all the logic into that subdev
21:30 karolherbst: or would you rather see it in common engine code?
21:32 karolherbst: imirkin: I also see that my mem usage was pretty high recently, because the cache was wiped out....
21:33 karolherbst: reverting his stuff seems to help
21:35 karolherbst: imirkin: well, or not
21:35 karolherbst: still happens
21:47 mupuf: karolherbst: a subdev for bashing all regs?
21:47 mupuf: imirkin: always blame the absents ;)
21:47 mupuf: power gating should be a method of all engines
21:48 karolherbst: yeah sure, but how would you implement it chip specific?
21:48 mupuf: like any other subdev?
21:48 karolherbst: mhh yes, that's why I thought it is a good idea to have an actual subdev for this
21:49 karolherbst: which takes an enum in its functions
21:49 mupuf: what you are asking us is to tell you what architecture is the right one when we obviously don't know yet
21:49 karolherbst: to select the engine it gets applied to
21:49 mupuf: what the heck? No
21:49 mupuf: the subdev's job would be to just bash the values on init()
21:49 mupuf: and resume()
21:49 mupuf: that's it
21:49 mupuf: nothing more, nothing else
21:50 karolherbst: mhhh
21:50 mupuf: why do you want to expose everything?
21:50 mupuf: we won't care about this
21:50 mupuf: we just want to enable it and be done with it
21:51 karolherbst: okay, so we should keep it simple for now until we know more and have to extend things?
21:51 mupuf: the further away we are from nvidia, the worse it gets
21:51 mupuf: yes
21:51 karolherbst: k
21:51 mupuf: but we won't have to, for sure ;)
21:51 mupuf: AKA, if a platform is too complex --> Let's not care about it
21:51 karolherbst: well, how would you implement it then if we have to disable the clock gate for a specific engine on demand?
21:52 karolherbst: although stuff like that can be added later to that subdev anyway if we need it
21:52 mupuf: why would you want to do this?
21:52 mupuf: nope, if you need to do this, you need to move it to the engine
21:53 karolherbst: mhh
21:53 mupuf: AKA, new method
21:53 mupuf: but remember, clock gating can also apply to other non-engine blocks
21:53 mupuf: and they need to be covered
21:53 karolherbst: right
21:54 mupuf: so instead of having to create one subdev per block, there would be the goto subdev for it
21:54 karolherbst: that's why I was thinking to call stuff in their init functions to the new subdev
21:54 mupuf: seriously?
21:54 mupuf: just make helper functions if you want
21:54 mupuf: but seriously, you are overengineering the heck out of this thing
21:54 karolherbst: :D
21:54 mupuf: java, get out of this body!
21:54 karolherbst: yeah, I tend to do that
21:55 mupuf: it is good to understand concepts
21:55 karolherbst: they also say that at work to me sometimes :O
21:55 mupuf: but forcing a shit ton of them on your code makes it rigid
21:55 mupuf: well, then it must be true ;)
21:55 karolherbst: I know it is true :D
21:56 karolherbst: still I think it is a good idea to create one new subdev for the entire power/clock gating logic
21:56 karolherbst: and if we have to provide access to the functionality if really needed in certain cases
21:56 karolherbst: it just looks like it on fermi
21:57 karolherbst: it is a big mess, no idea why, but it looks like it
21:57 mupuf: karolherbst: power gating is a no-no
21:57 karolherbst: what do you mean?
21:57 mupuf: clock gating, when it is just bashing values, then it is OK
21:58 mupuf: otherwise, it HAS to be a method of the engine class
21:58 mupuf: just like power gating
21:58 karolherbst: and every engine would implement it then (
21:58 karolherbst: + help of helper functions)?
21:58 mupuf: yes, make helper functions if there is code that can be shared
21:59 mupuf: but don't make a different subdev, it is against the idea of subdev anyway
21:59 mupuf: the subdev is just there as a means to avoid creating one subdev per clock-gated block
21:59 mupuf: that's it
22:00 karolherbst: what is the idea of a subdev anyway?
22:02 mupuf: represent a sub-device? :D
22:02 mupuf: A block of hw that does not execute code (unlike engine)
22:02 karolherbst: ahh
22:02 mupuf: like gpio, ptherm, etc...
22:02 karolherbst: yeah, that makes sense
22:03 karolherbst: well, the clock gate bytes are in ptherm actually
22:03 karolherbst: but do we want to put that thing in there?
22:03 imirkin: an engine can be attached to a fifo
22:03 mupuf: imirkin: that too
22:03 imirkin: a subdev can't
22:04 imirkin: in general a subdev is a logical piece of code that manages a group of registers
22:04 mupuf: karolherbst: hmm, well, the helper function can be in the ptherm subdev
22:04 imirkin: usually a whole range at a time
22:05 mupuf: ok, going to bed guys
22:05 mupuf: see you!
22:05 karolherbst: bye
22:11 karolherbst: but seriously, what is nvidia doing here: https://gist.githubusercontent.com/karolherbst/c6004d342552d1cf96165905ceac5e69/raw/cc560cec5b841e696ea317b3481a6f7093a663e0/gistfile1.txt
22:12 imirkin: confusing you.
22:12 imirkin: successfully, it seems.
22:12 karolherbst: well
22:12 karolherbst: maybe we can simply ignore that and set the clock gate on fermi to auto too
22:12 karolherbst: and be done with it
22:13 karolherbst: without causing any harm
22:13 karolherbst: also nvidia does it only on the graph
22:13 karolherbst: and only for some fermis
22:16 karolherbst: I've added some context: https://gist.githubusercontent.com/karolherbst/3b3e7f03304dfeffc02db84a163b8ee1/raw/a49db20e78efe7abb7638de45958d24d9a8e52b1/gistfile1.txt
22:18 karolherbst: is there simply some really odd device initialization thing going on and nvidia indeed turns off/on those engines ?
22:21 imirkin: karolherbst: i think it might suspend the engine somehow and then power it back up
22:21 imirkin: there are various kernel module parameters you can pass in
22:21 imirkin: which turn that off
22:23 karolherbst: any reason why nvidia does that?
22:25 imirkin: dunno
22:25 imirkin: power saving
22:25 karolherbst: then I highly assume we can just enable it once for fermi and be done with it
22:25 karolherbst: and nvidia sets it to run just to disable that engine