08:51 karolherbst: imirkin: what about the min/max commit?
08:51 karolherbst: ohh, should read mails first :D
09:53 kloofy: it's a bit scary how many bugs you are noting here, and how few people try to solve/fix those bugs
09:55 kloofy: makes me wonder, if i have offered help, why you have been refusing additional workforce, but i guess it leads to my id problems, a need to sabotage whatever i do
10:02 mupuf: karolherbst: I do not think you will manage to do a comparison between nouveau and nvidia for sched codes
10:02 karolherbst: mupuf: why not?
10:02 karolherbst: mupuf: I was talking about compiled binaries extracted from mmts
10:02 mupuf: how about this instead: dumping into files all the generated shaders by nvidia and then ordering them by sched code
10:03 mupuf: that will give you all the most likely conditions
10:03 karolherbst: then we forget about branching
10:03 mupuf: why would we?
10:04 mupuf: oh wait, I got it, you mean writing an OoT sched code generator and comparing your output with nvidia
10:04 mupuf: that actually is a sweet idea
10:04 karolherbst: right
10:05 mupuf: you can't just pipe the code through nouveau's compiler and expect it to compute the same sched code, that was my point :)
10:05 karolherbst: right
10:05 karolherbst: I just wanted to reuse the sched generating code
10:05 karolherbst: but maybe this would be too messy to actually do it
10:05 mupuf: Very probably :D
10:06 mupuf: is the code complex already?
10:06 karolherbst: we also have envydis which should help here
10:06 karolherbst: very
10:06 karolherbst: well
10:06 karolherbst: the main problem would be to translate the file into nv50 ir
10:06 karolherbst: we have no idea what actually matters
10:06 mupuf: oh well, you should RE the shit out of it first, and not try to make real code
10:06 karolherbst: maybe some mods also change the sched opcodes
10:07 mupuf: right, the code assumes nv50 IR, so that would likely be a lot of work
10:07 mupuf: just ignore the nouveau compiler and start from scratch
10:07 karolherbst: mupuf: well my idea was this: parse the generated binaries and push it through some sched opcode generating algorithm
10:07 mupuf: .... well .... actually ... you still need to parse the bloody demmio output
10:07 karolherbst: and see how much we differ with nvidia
10:07 karolherbst: mupuf: you mean demmt, right?
10:07 mupuf: yes, sorry
10:08 karolherbst: well
10:08 karolherbst: we have envydis for this
10:08 karolherbst: but mhh
10:08 mupuf: oh,true
10:08 mupuf: but it is not an IR
10:08 karolherbst: I still think it is a bit of work
10:08 karolherbst: doesn't matter
10:08 karolherbst: I don't need a real IR
10:08 karolherbst: what I need is where an instruction gets executed
10:08 karolherbst: not what the instruction really does
10:08 karolherbst: and a bit of source handling
10:09 mupuf: where?
10:09 mupuf: what you need is all the information possible
10:09 karolherbst: like sfu
10:09 karolherbst: mhh, right
10:09 mupuf: there are many instructions, you need to know the registers, the type of operation, etc..
10:09 karolherbst: but I meant for using the current algorithm
10:10 karolherbst: those opclasses in nv50ir don't map 1:1 to the actual part used to execute those instructions
10:10 karolherbst: I think unit is the right word for this?
10:10 mupuf: no idea what you are talking about :s
10:11 karolherbst: I mean the instruction execution units
10:11 karolherbst: like the fpu on a cpu
10:11 karolherbst: this is really important to know for tesla (and maybe fermi as well, less on kepler)
10:11 RSpliet: yes, unit is correct, floating point unit, arithmetic logic unit...
10:13 mupuf: karolherbst: are you sure it is as simple as just knowing the unit?
10:13 mupuf: there are no dependencies for registers?
10:13 karolherbst: mupuf: no, but it is a part of it
10:13 mupuf: like RAW or WAR
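A minimal sketch of the register dependencies mupuf is asking about, assuming a hypothetical `Insn` struct that only records which register numbers an instruction reads and writes. It is not the nv50 IR and not envydis output; it only illustrates what RAW (read-after-write) and WAR (write-after-read) mean between two instructions in program order.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, minimal view of a decoded instruction: only the register
// numbers it reads and writes. Illustration only.
struct Insn {
    std::vector<int> reads;
    std::vector<int> writes;
};

// True if the two register lists share at least one register number.
static bool overlaps(const std::vector<int>& a, const std::vector<int>& b) {
    return std::any_of(a.begin(), a.end(), [&](int r) {
        return std::find(b.begin(), b.end(), r) != b.end();
    });
}

// Read-after-write: `later` reads a register that `earlier` writes.
bool hasRAW(const Insn& earlier, const Insn& later) {
    return overlaps(earlier.writes, later.reads);
}

// Write-after-read: `later` overwrites a register that `earlier` reads.
bool hasWAR(const Insn& earlier, const Insn& later) {
    return overlaps(earlier.reads, later.writes);
}
```

A scheduler that only knew the execution unit would miss both cases, which is why the unit is "a part of it" rather than the whole answer.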
10:13 karolherbst: mupuf: gt200+ is weird here a little
10:13 karolherbst: mad can be executed on two different units
10:13 karolherbst: and then the type also matters
10:14 RSpliet: karolherbst: keep in mind, iirc, kepler has 6 FPUs for 4 concurrent warps
10:14 RSpliet: which is weird, but... double check those kinds of details as well ;-)
10:15 mupuf: ah ah, sounds like there will be a lot of fun here :D
10:15 karolherbst: mupuf: yes
10:15 karolherbst: mupuf: The individual streaming processing cores of GeForce GTX 200 GPUs can now perform near full-speed dual-issue of multiply-add operations (MADs) and MULs (3 flops/SP) by using the SP’s MAD unit to perform a MUL and ADD per clock, and using the SFU to perform another MUL in the same clock
10:16 karolherbst: from nvidia docs
10:16 karolherbst: but hey, the sfu can be busy doing sin stuff or something
11:11 kloofy: mupuf: just couple of sentences, i posted yeah two of the solutions one for fermi kepler and one for maxwell how to plumb the correct sched codes into gallium , they were two links opennvidia and max-as
11:11 kloofy: i really have difficulty understanding the terminology you use, but just look at those projects to handle the codes
11:17 kloofy: and remember my notes, this is the battle that is easily winnable only if you want to do it, and i offered two ways to do the scheduling, but i am getting onto your nerves again, so i have enough things to do to keep me occupied and quiet here now
13:37 karolherbst: mupuf: anything comes to your mind which would help with reducing power consumption except clock/power gating?
13:38 kdvr: hi, i have gone through multiple documents / troubleshooting etc but I can't find anything that could help me yet. Whenever I disconnect my monitor or switch (through hdmi switch) I lose my mouse cursor and chrome for instance does not show anything anymore, the only way to resolve is by logging out and logging back in.
13:39 karolherbst: kdvr: did you try to restart your window manager/compositor?
13:40 kdvr: I am running xfce, is there an easy way to restart without losing the current open apps?
13:40 karolherbst: xfwm4 --replace
13:40 kdvr: I can try now
13:42 kdvr_: nope that did not work... Mouse is gone and I can't seem to type anything anymore in chrome (which I am using for freenode)
13:42 kdvr_: just logged out and logged back in
13:43 karolherbst: but does anything change at all?
13:43 kdvr_: no, it just flickers and that is it
13:43 karolherbst: what are your kernel/X versions?
13:45 kdvr_: kernel: 4.7.2-201.fc24.x86_64
13:46 kdvr_: let me see how I can get the X version on fedora
13:47 karolherbst: and maybe it also helps if you tell us what gpu you have
13:48 kdvr_: NVIDIA Corporation GM206 [GeForce GTX 960]
13:49 kdvr_: sorry I am not that fast hold on :P
13:51 kdvr_: I am running X.Org X Server 1.18.4
13:52 kdvr_: All on Fedora 24
13:52 kdvr_: I can provide any log needed :P just have to tell me hahaha, I am a n00b in debugging
13:53 karolherbst: maybe dmesg or the x log tells us something
13:54 kdvr_: alright, thanks btw for your time
13:55 kdvr_: Dmesg: http://pastebin.com/93aKEV8H
13:56 kdvr_: And the Xorg.0.log: http://pastebin.com/zrWe3kSt
14:00 karolherbst: mhh the .old x log please
14:02 kdvr_: Here you go Xorg.0.log.old http://pastebin.com/7wMGztLJ
14:02 karolherbst: odd
14:03 kdvr_: It all looks chinese to me :P
14:04 kdvr_: I was hoping for errors when I would disconnect or reconnect
14:05 karolherbst: me too
14:07 kdvr_: Is there a way to get more verbose information for you?
14:50 mupuf: karolherbst: DVFS, clock gating, power gating are the typical techniques
14:51 mupuf: keeping the gpu cool also helps
14:51 mupuf: but you need to factor in the cost of the fan
14:51 mupuf: after this ... improve performance to reduce power usage (when vsync is on)
14:51 karolherbst: right, but I meant stuff besides that
14:53 mupuf: so yeah, that's it
14:53 mupuf: after this, it really is tuning the DVFS algorithms
14:54 mupuf: try to keep the clocks as low as possible while still not impacting the performance of dynamic workload
14:55 karolherbst: right
14:55 karolherbst: okay, I just hoped there would be something left to do, but I guess REing power gating will be quite a thing
14:56 karolherbst: but even clock gating already helps a lot, even reducing power consumption under some workloads
14:56 karolherbst: I just asked cause I try to finish collecting all information I want to add to my talk, maybe I just forgot something
14:57 mupuf: sure sure
14:57 mupuf: well, clock gating is still left for REing and will be tough
14:57 mupuf: but bashing all the regs should help already
14:57 karolherbst: yes
14:57 karolherbst: it is enough for kepler+
14:57 karolherbst: nvidia seldom touches them
14:57 mupuf: nope, does not work on my e6
14:57 karolherbst: like I only saw it after suspend
14:58 karolherbst: mupuf: are you sure?
14:58 mupuf: yes, I am
14:58 karolherbst: what gpus are in reator currently?
14:58 mupuf: I tried for a long time to make them work
14:58 karolherbst: maybe I would take a look at some point
14:58 mupuf: not sure
14:58 mupuf: it is off right now
14:58 karolherbst: I will check
14:58 karolherbst: not anymore :D
14:59 mupuf: ok, have fun! I am with pmoreau in Utrecht, Roy's city of origin
14:59 mupuf: we are about to go for a walk
14:59 karolherbst: gk106 :)
14:59 mupuf: ok, great!
14:59 karolherbst: I am sure it will work :p
14:59 mupuf: try, I did not see any drop in power usage when enabling
14:59 mupuf: well, if it does, /me is happy
14:59 karolherbst: uhh, it booted into the blob again
15:00 karolherbst: mhh
15:00 mupuf: fuck, no idea what people did, but they really screwed up grub
15:00 karolherbst: well
15:00 mupuf: grub-reboot's state should always be reset
15:00 mupuf: or someone changed grub's conf
15:00 mupuf: hakzsam: ?
15:00 karolherbst: well, it sometimes happens
15:01 karolherbst: what is the linux grub entry? 0 or 1?
15:01 mupuf: try either one or the other :p
15:01 karolherbst: :D
15:02 mupuf: both are linux though :D
15:02 mupuf: 0 is what I remember
15:02 karolherbst: right :D
15:02 karolherbst: grub from the nvidia partition does something bad I assume?
15:03 mupuf: well, it is not the same boot partition
15:03 mupuf: ah, it booted on nouveau now
15:03 karolherbst: sure, but to repair that I always chroot into the nouveau partition and run grub-reboot 0 there
15:04 mupuf: right
15:04 karolherbst: and this leads me to the nouveau partition on reboot
15:04 karolherbst: no idea if chroot is required here
15:04 mupuf: normally, you just need to reboot once and it always goes back to nouveau
15:05 karolherbst: regarding power consumption: on lowest clocks the effect is rather slim
15:06 karolherbst: k, it works
15:06 karolherbst: but well
15:06 karolherbst: the difference...
15:06 karolherbst: on 0f max boost
15:06 karolherbst: 43.77 W -> 40.60W
15:07 mupuf: ok, I guess I did not see any difference because I did reclock the gpu
15:07 karolherbst: on 07
15:07 karolherbst: 18.9W -> 18.4W
15:07 karolherbst: maybe you thought the 0.5W are usual noise or something
15:07 mupuf: well, the effect is supposed to be much larger
15:07 karolherbst: due to power sensors configured like shit
15:07 karolherbst: on kepler not so much
15:07 karolherbst: on maxwell, yes
15:07 mupuf: yeah, right, the sensor output used to be super noisy
15:07 karolherbst: on maxwell I get 30% drops on 0f
15:08 mupuf: ok, great
15:08 hakzsam: /root/reset_grub.sh works fine
15:08 hakzsam: on blob partition
15:08 mupuf: hakzsam: oh, right!
15:08 karolherbst: :O
15:08 hakzsam: and grub-reboot 2 from the nouveau one
15:08 mupuf: I had forgotten about this script
15:08 karolherbst: mupuf: even for some workloads at full load I got 10% drops on maxwell
15:09 mupuf: nice
15:09 karolherbst: yep
15:09 karolherbst: glxgears :D
15:09 mupuf: on tesla, I could get something similar when enabling clock gating
15:09 karolherbst: mhh
15:09 mupuf: err, clock throttling on idle
15:09 karolherbst: I see
15:10 karolherbst: I guess it is safe to implement it on nouveau then, because it kind of works on every kepler and maxwell gpu
15:11 mupuf: well, let's see!
15:11 karolherbst: mupuf: but we should also RE those bits to globally disable clock gating, just in case there are some gpus where it is indeed disabled
15:11 mupuf: want to write the patch?
15:12 mupuf: right, we'll need to find them
15:12 karolherbst: well, I planned to work on that with gnurou a bit or he do it or somebody else
15:12 mupuf: we can use ezbench to try an A/B with clock gating enabled or not
15:12 karolherbst: I really don't feel like implementing it, because there are more complex/difficult things to do
15:12 mupuf: we'll see the impact on perf and power :)
15:12 karolherbst: and any new dev could implement it instead
15:12 karolherbst: mupuf: no impact afaik
15:12 mupuf: ah, right, newcomer
15:12 karolherbst: yes
15:13 mupuf: karolherbst: and you base this data on what?
15:13 karolherbst: would be a good task to get used to the code base
15:13 mupuf:requires runs of every games and benchmarks possible :p
15:13 karolherbst: mupuf: well, nothing :p I just have it enabled since always
15:13 mupuf: and since you never take it off, you don't see any perf impact
15:13 karolherbst: well I sometimes checked and it didn't make any difference
15:13 mupuf: so, the only way of testing is to do an A/B comparison
15:13 karolherbst: also nvidia doesn't touch those regs again
15:14 mupuf: sure sure, but still ;)
15:14 karolherbst: I know
15:14 karolherbst: but I assume there is no difference really, maybe in workloads with really spiky loads
15:14 karolherbst: and with spiky I mean several spikes within milliseconds
15:15 mupuf: yes
15:15 karolherbst: afaik it might happen that the clock signals get disconnected a lot with vsync
15:15 karolherbst: but if we vsync we don't need more perf anyway
15:16 karolherbst: and I would assume that's the biggest tradeoff, that you have a slight latency on starting things
15:16 kdvr: karolherbst: sorry to be a bother :P but do you think I can enable more verbose logging for my monitor disconnecting /reconnecting issue?
15:16 karolherbst: kdvr: no idea if that helps, I am no expert in that X stuff
15:16 karolherbst: I kind of hoped somebody else would have any ideas
15:16 kdvr: ok cool :-) me too
15:16 kdvr: haha
15:16 karolherbst: mupuf: but I think we need some reference counting code within the kernel for the engines
15:17 mupuf: what for?
15:17 karolherbst: or at least I fear we need that for fermi
15:17 karolherbst: clock gating
15:17 karolherbst: we could also do manual clock gating on kepler as well
15:18 karolherbst: mhh reference counting is the wrong term here
15:18 karolherbst: more like usage counting
15:18 karolherbst: and if an engine isn't in use anymore, we could clock/power gate it
15:18 karolherbst: if automatic gating doesn't work
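The usage-counting idea can be sketched as follows. This is a hypothetical illustration, not nouveau code: the class name `EngineUsage` and its `get()`/`put()` methods are invented, and the real gating would be a register write rather than a boolean flag.

```cpp
#include <cassert>

// Hypothetical per-engine usage counter. The engine is ungated while at
// least one user holds it and gated again when the last user lets go.
class EngineUsage {
public:
    // A user starts using the engine; the first user ungates it.
    void get() {
        if (users_++ == 0)
            gated_ = false;
    }

    // A user is done with the engine; the last one out gates it.
    void put() {
        assert(users_ > 0);
        if (--users_ == 0)
            gated_ = true;
    }

    bool gated() const { return gated_; }

private:
    unsigned users_ = 0;
    bool gated_ = true;  // an idle engine starts out gated
};
```

The difference from plain reference counting is only in intent: the count tracks who is currently using the engine, not who owns the object.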
15:51 mupuf: karolherbst: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/gpu/drm/nouveau?id=22b6c9e8fef4553017a92ed5e27451e0b2f9c5ce <-- this is definitely something we need to do
15:51 mupuf: karolherbst: right, this is only for power gating
15:51 mupuf: and yes, we will need to monitor the command buffer
15:52 mupuf: or a bind to an engine
15:52 karolherbst: mupuf: yes, but this is for maxwell I think
15:52 mupuf: yep
15:52 mupuf: will fix stability issues
15:53 karolherbst: on maxwell I assume
15:53 karolherbst: I think on kepler the voltages in the vbios are usually higher than they have to be
15:53 karolherbst: or the other way around: on maxwell vbios the voltage is tighter to the actual requirement
15:55 karolherbst: mupuf: in the future we should just nack changes from gnurou if they also affect desktop gpus, just so that he actually has to implement it for all chips :p I already told him I will nack any tegra only clock gating patches
15:55 karolherbst: well, but I think we can always merge the code together at some point
15:55 karolherbst: just a waste of time really
15:56 mupuf: yeah, I get your point, but I would say no
15:56 mupuf: let him do the work for tegra and let us mimic it for the rest
15:57 mupuf: and ask questions
15:57 mupuf: otherwise, he may just not have the time to do anything
15:57 karolherbst: yeah, I wouldn't want to be so strict about it too, because of other problems which might come up
15:57 karolherbst: but I would at least try to convince him to start with more general solutions first
16:52 karolherbst: mupuf: :O I just saw that a gm204 has a max voltage of 1.27V
16:53 karolherbst: seems normal for those gpus
17:29 kloofy: i have disorders my own from 2005 when things all collapsed for me, after getting over decade of conspiracy, my concentration travels around so bad, kinda troublesome to do stuff still , and pack things inside my brain not leaving enough of those chances for people to crack me up even further
17:30 kloofy: sort of can't resist at all it's dead easy to distract me with a nasty attitude those days
17:36 karolherbst: does anybody know of any nouveau kernel documentation we have?
17:38 imirkin: there is none.
17:38 kloofy: so for those firmware dudes, there is a method to clock circuits up even more intelligently on programmable devices, i know karolherbst knows how it's done, but normally pll generates the signal which can be multiplied and yet additionally phase shifted normally on mid-level devices in hw for 6-8 shifts
17:38 imirkin: skeggsb has variously promised to write some
17:39 imirkin: but afaik that's all theoretical
17:39 kloofy: and it's done by using so called multiple clock signals
17:39 karolherbst: I see
17:39 imirkin: at one point i wrote something, but i (a) lost it and (b) it'd be very outdated by now, as it was about 3 nouveau rewrites ago
17:39 kloofy: i.e multi clock domains
17:39 karolherbst: imirkin: k
17:39 imirkin: do you have concrete questions?
17:39 imirkin: or just want something to read?
17:39 karolherbst: no
17:39 karolherbst: neither :D
17:41 imirkin: trouble sleeping and want something less addictive than sleeping pills? :)
17:41 imirkin: if so, i can also recommend any text on general relativity.
17:41 karolherbst: nah, I know better things to do when I can't sleep
17:41 kloofy: so for the asics that specific information isn't that relevant, for designers in competition that can be replicated or formulated doing things from scratch
17:47 kloofy: as of generating the clock signals and timing is largely guided also by tools who help with their feedback to plant things right, the fairly crucial part happens somewhere in the kernel for cpu designs, and that part is how to schedule instructions in the firmware
17:47 kloofy: and in the kernel the stream to start with
17:49 kloofy: the static scheduling tdm time derived stuff, i haven't myself fully understood, how it works, but i am nonetheless going with dynamic scheduling my own anyways
17:53 kloofy: dynamic scheduling is a version where input output ports are sort of pollable and when they are not in busy/occupied states, the stream is scheduled for those units
17:54 kloofy: but static version is kinda something that compiler calculated the latency and they name it some distance, and probably they play with injection rates accordingly
17:56 kloofy: as said, i am working on understanding it better but, at the moment there are things which i do not fully understand in the static scheduling version, and it's quite probable that i won't use this so called more area efficient and cheaper method, cause i can get the dynamic one very cheap too
17:59 kloofy: as for the gpu's the scheduling is driver managed and improving it is lot simpler also for gpu rom for programming devices that would function the same, but doing that for the cpu is largely probably more complex, i am also doing the proper cpu firmware
17:59 kloofy: cause i tend to think that good price and mid-level hw cpu cores would end up being too slow
18:06 kloofy: i have all the proper verilog and tool based methods to make a very fast circuit for single core with several warps doing the alus and also to partition the cores for multicore options
18:06 kloofy: but generally i have not studied on cpu's extensively yet how that in kernel scheduler works
18:09 kloofy: but overall if things go well i'd after this work first distribute scrambled netlists and later open up the platform to be open source if i managed to earn a little money
18:10 kloofy: if vendors manage to keep the programmable hw cheap, we have a lot of computational power to play with
18:12 kloofy: haven't yet found from here persons to collaborate with, i know imirkin based of the times when he yet communicated with me which was some old times
18:12 kloofy: that he has such programmable experience on those devices
18:14 kloofy: though overall my statements are we do not have situations that hw does not provide chances for the interested programmers or that there is information missing on the net how to fab and program different chips
18:17 kloofy: back those days programmers capitalized things on very crappy hw, brilliant minds forming different corporations, today it's as hw production has been boosted so enormously it's easier to do software heroics and find the satisfaction cause of using really fluently running hw probably
18:17 kloofy: which is also designed to be power efficient, very small and powerful etc.
18:20 kloofy: i am one of those who likes small devices due to the life style and stuff happening around me, i was at times very unpleasantly disappointed that the strategy to produce software or choose the correct hw to make it happen was not going anywhere
18:22 kloofy: i concur with ajax there that it was enough of disappointment to deal with first atom processors who filled that area, which were so weak that they barely were capable to function for web browsing sake
18:23 kloofy: and the strategy was just the sales, that non expert persons likely think that it's 21st century and do that mistake of buying that hw until one gets more educated and knows that is crap
18:29 kloofy: *this is, so yeah i take that opportunity and try to design powerful mobile chips, i am willing to collaborate, but have not yet met a person who has interest here
18:34 kloofy: i think it's mature time in electronics when ambitious plans can be taken on without too much risk being involved
18:37 kloofy: i.e i think about very ambitious plans and it's questionable whether i have such ambitions i've yeah described it's the way i build my thoughts, rather than step by step with smaller missions to just tolerate the stress better, i.e it's a dilemma but i am using the first strategy
18:42 kloofy: though that strategy yeah is mostly referring to a sickness which also is the case with me, i before falling ill badly would have chosen the little step by step progressing and not building too many dreams, i mean just as Linus's opinion about it was
18:44 kloofy: imo it just depends how much stress human has, if i naturally i am stressed out and in a need to tolerate it anyways to a greater extent then it does not make a difference to take on large projects which are stressful for implementing too
18:49 kloofy: so i am off, it seems i normally scare people away with this, most people do not want to take any stress on, where as my past experience and opinion nowadays it's just stress has to be tolerated
19:04 kloofy: as all know body parts are in connection to work together and sometimes dependent on really small and at first missable details, most of time even if person looks very talented outside, one does not get along with small detail that does not function as it should be which was inherited
19:04 kloofy: that causes the heart and stuff to unexpectedly fibrillate and human freaks out in times when one has to perform on harder things
19:05 kloofy: and usually unfortunately the fightback is projecting all the shit to normal persons, which is mad, cause they are unhappy what parts god gave them subconsciously
19:07 kloofy: and it has to be regulated somehow that they can't punish innocent people in a way, but that is a dreamland cause there are more of those ill ones than the others
19:07 kloofy: what helps is that when a country gets some sort of wealthier social status
19:11 kloofy: it's very difficult for them who have things to lose cause of zombies projecting their faults to him daily basis, once the conspiracy is opened up and groups formulated to coin them, it could be the life time suffering road
19:17 kloofy: this is pointless situation, but that is how things go, people who suffered with getting the parts, make others who don't suffer in life a perfect balance, nasty and cruel view for the latter
19:19 kloofy: in other words, it's not that sick person who suffers in life, but people around one normally take a majorly bigger hit
19:19 kloofy: bye
20:47 karolherbst: imirkin: I think I will try to completely remove that OP_SUB thing, any reason this is a bad idea?
20:48 imirkin: you're in for a lot of pain
20:48 karolherbst: sure
20:48 karolherbst: I am just asking if that's a good idea :D
20:48 karolherbst: not that I do that and then later on it should stay
20:48 imirkin: static const uint32_t commutative[(OP_LAST + 31) / 32] =
20:48 imirkin: {
20:48 imirkin: // ADD,MAD,MUL,AND,OR,XOR,MAX,MIN
20:48 imirkin: 0x0670ca00, 0x0000003f, 0x00000000, 0x00000000
20:48 imirkin: see that bitmask?
20:48 imirkin: you're gonna have to adjust it.
20:48 imirkin: same for the nvc0 versions
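The pasted table packs one bit per opcode, 32 opcodes per word, which is why removing an opcode from the enum means every bit above it has to move. A minimal sketch of both the lookup and the adjustment follows. The function names are made up for illustration, and the position 10 used in the usage note is purely hypothetical: the log does not say where OP_SUB sits in the enum.

```cpp
#include <cstdint>

// Test the bit for opcode number `op` in a table laid out like the one
// pasted above: word index op / 32, bit index op % 32.
bool isCommutative(const uint32_t* mask, unsigned op) {
    return (mask[op / 32] >> (op % 32)) & 1u;
}

// Deleting an opcode renumbers every later opcode down by one, so each bit
// above the removed position has to shift down by one as well, carrying
// across word boundaries. Assumes removed < words * 32.
void removeOpcodeBit(uint32_t* mask, unsigned words, unsigned removed) {
    const unsigned bits = words * 32;
    for (unsigned i = removed; i + 1 < bits; ++i) {
        const uint32_t next = (mask[(i + 1) / 32] >> ((i + 1) % 32)) & 1u;
        mask[i / 32] = (mask[i / 32] & ~(1u << (i % 32))) | (next << (i % 32));
    }
    // The topmost bit no longer corresponds to any opcode.
    mask[(bits - 1) / 32] &= ~(1u << ((bits - 1) % 32));
}
```

For example, starting from `{0x0670ca00, 0x0000003f, 0, 0}` and removing a hypothetical opcode at position 10 would turn the first two words into `0x83386600` and `0x0000001f`. Doing that by hand for several hand-written tables is the pain imirkin is warning about.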
20:48 karolherbst: ugh
20:48 karolherbst: ...
20:48 karolherbst: why do you do stuff like that? :D
20:48 imirkin: so ... like i said, just leave the OP_SUB in
20:49 imirkin: i dunno. "it was like that when i got there"
20:49 imirkin: [my excuse for just about everything]
20:49 karolherbst: I see
20:50 karolherbst: k, then I just wipe out every occurrence of that and add a nasty assert into instruction or something
20:50 karolherbst: actually before starting new things, I should finish my old stuff first...
20:52 karolherbst: I think I will finish up that postRAConstantFolding thing
20:52 karolherbst: because this is quite useful
20:53 karolherbst: around -0.40% instruction count
20:53 karolherbst: even gpr usage drops
21:23 karolherbst: "Wait on fence 1 (ack = 1, next = 1) timed out !" I got this
21:23 karolherbst: I think I never got this before running shader-db
21:24 karolherbst: did anything change which could lead to this?
21:24 imirkin:blames hakzsam
21:24 imirkin: try reverting the patches he pushed recently.
21:25 karolherbst: it happens in nvc0_screen_destroy
21:26 imirkin:still blames hakzsam
21:29 karolherbst: mupuf, skeggsb, gnurou: do you think we should add a subdev for power/clock gating? then nvkm_engines could call nvkm_cpgateing_* where appropriate and we can move all the logic into that subdev
21:30 karolherbst: or would you rather see it in common engine code?
21:32 karolherbst: imirkin: I also see that my mem usage was pretty high recently, because the cache was wiped out....
21:33 karolherbst: reverting his stuff seems to help
21:35 karolherbst: imirkin: well, or not
21:35 karolherbst: still happens
21:47 mupuf: karolherbst: a subdev for bashing all regs?
21:47 mupuf: imirkin: always blame the absents ;)
21:47 mupuf: power gating should be a method of all engines
21:48 karolherbst: yeah sure, but how would you implement it chip specific?
21:48 mupuf: like any other subdev?
21:48 karolherbst: mhh yes, that's why I thought it is a good idea to have an actual subdev for this
21:49 karolherbst: which takes an enum in its functions
21:49 mupuf: what you are asking us is to tell you what architecture is the right one when we obviously don't know yet
21:49 karolherbst: to select the engine it gets applied to
21:49 mupuf: what the heck? No
21:49 mupuf: the subdev's job would be to just bash the values on init()
21:49 mupuf: and resume()
21:49 mupuf: that's it
21:49 mupuf: nothing more, nothing else
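What mupuf describes here, a subdev whose only job is to write a fixed set of values on init() and resume(), can be sketched as below. Everything in it is an assumption for illustration: `FakeMmio` stands in for real register access, `ClockGateSubdev` is not an existing nvkm type, and the addresses and values used with it are placeholders, not real clock gating registers.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One register write: placeholder address and value.
struct RegWrite {
    uint32_t addr;
    uint32_t value;
};

// Stand-in for MMIO access so the sketch can run without hardware.
class FakeMmio {
public:
    void wr32(uint32_t addr, uint32_t value) { regs_[addr] = value; }
    uint32_t rd32(uint32_t addr) const {
        auto it = regs_.find(addr);
        return it == regs_.end() ? 0 : it->second;
    }
    // Models the register state being lost across suspend.
    void reset() { regs_.clear(); }

private:
    std::map<uint32_t, uint32_t> regs_;
};

// Hypothetical subdev: no per-engine interface, it just replays its table.
class ClockGateSubdev {
public:
    ClockGateSubdev(FakeMmio& mmio, std::vector<RegWrite> table)
        : mmio_(mmio), table_(std::move(table)) {}

    void init() { bash(); }
    void resume() { bash(); }

private:
    void bash() {
        for (const RegWrite& w : table_)
            mmio_.wr32(w.addr, w.value);
    }

    FakeMmio& mmio_;
    std::vector<RegWrite> table_;
};
```

A chip-specific variant would then only differ in the table it passes in, which is the "like any other subdev" answer given a few lines earlier.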
21:50 karolherbst: mhhh
21:50 mupuf: why do you want to expose everything?
21:50 mupuf: we won't care about this
21:50 mupuf: we just want to enable it and be done with it
21:51 karolherbst: okay, so we should keep it simple for now until we know more and have to extend things?
21:51 mupuf: the further away we are from nvidia, the worse it gets
21:51 mupuf: yes
21:51 karolherbst: k
21:51 mupuf: but we won't have to, for sure ;)
21:51 mupuf: AKA, if a platform is too complex --> Let's not care about it
21:51 karolherbst: well, how would you implement it then if we have to disable the clock gate for a specific engine on demand?
21:52 karolherbst: although stuff like that can be added later to that subdev anyway if we need it
21:52 mupuf: why would you want to do this?
21:52 mupuf: nope, if you need to do this, you need to move it to the engine
21:53 karolherbst: mhh
21:53 mupuf: AKA, new method
21:53 mupuf: but remember, clock gating can also apply to other non-engine blocks
21:53 mupuf: and they need to be covered
21:53 karolherbst: right
21:54 mupuf: so instead of having to create one subdev per block, there would be the goto subdev for it
21:54 karolherbst: that's why I was thinking to call stuff in their init functions to the new subdev
21:54 mupuf: seriously?
21:54 mupuf: just make helper functions if you want
21:54 mupuf: but seriously, you are overengineering the heck out of this thing
21:54 karolherbst: :D
21:54 mupuf: java, get out of this body!
21:54 karolherbst: yeah, I tend to do that
21:55 mupuf: it is good to understand concepts
21:55 karolherbst: they also say that at work to me sometimes :O
21:55 mupuf: but forcing a shit ton of them on your code makes it rigid
21:55 mupuf: well, then it must be true ;)
21:55 karolherbst: I know it is true :D
21:56 karolherbst: still I think it is a good idea to create one new subdev for the entire power/clock gating logic
21:56 karolherbst: and if we have to provide access to the functionality if really needed in certain cases
21:56 karolherbst: it just looks like it on fermi
21:57 karolherbst: it is a big mess, no idea why, but it looks like it
21:57 mupuf: karolherbst: power gating is a no-no
21:57 karolherbst: what do you mean?
21:57 mupuf: clock gating, when it is just bashing values, then it is OK
21:58 mupuf: otherwise, it HAS to be a method of the engine class
21:58 mupuf: just like power gating
21:58 karolherbst: and every engine would implement it then (+ help of helper functions)?
21:58 mupuf: yes, make helper functions if there is code that can be shared
21:59 mupuf: but don't make a different subdev, it is against the idea of subdev anyway
21:59 mupuf: the subdev is just there as a means to avoid creating one subdev per clock-gated block
21:59 mupuf: that's it
22:00 karolherbst: what is the idea of a subdev anyway?
22:02 mupuf: represent a sub-device? :D
22:02 mupuf: A block of hw that does not execute code (unlike engine)
22:02 karolherbst: ahh
22:02 mupuf: like gpio, ptherm, etc...
22:02 karolherbst: yeah, that makes sense
22:03 karolherbst: well, the clock gate bytes are in ptherm actually
22:03 karolherbst: but do we want to put that thing in there?
22:03 imirkin: an engine can be attached to a fifo
22:03 mupuf: imirkin: that too
22:03 imirkin: a subdev can't
22:04 imirkin: in general a subdev is a logical piece of code that manages a group of registers
22:04 mupuf: karolherbst: hmm, well, the helper function can be in the ptherm subdev
22:04 imirkin: usually a whole range at a time
22:05 mupuf: ok, going to bed guys
22:05 mupuf: see you!
22:05 karolherbst: bye
22:11 karolherbst: but seriously, what is nvidia doing here: https://gist.githubusercontent.com/karolherbst/c6004d342552d1cf96165905ceac5e69/raw/cc560cec5b841e696ea317b3481a6f7093a663e0/gistfile1.txt
22:12 imirkin: confusing you.
22:12 imirkin: successfully, it seems.
22:12 karolherbst: well
22:12 karolherbst: maybe we can simply ignore that and set the clock gate on fermi to auto too
22:12 karolherbst: and be done with it
22:13 karolherbst: without causing any harm
22:13 karolherbst: also nvidia does it only on the graph
22:13 karolherbst: and only for some fermis
22:16 karolherbst: I've added some context: https://gist.githubusercontent.com/karolherbst/3b3e7f03304dfeffc02db84a163b8ee1/raw/a49db20e78efe7abb7638de45958d24d9a8e52b1/gistfile1.txt
22:18 karolherbst: is there simply some really odd device initialization thing going on and nvidia indeed turns off/on those engines ?
22:21 imirkin: karolherbst: i think it might suspend the engine somehow and then power it back up
22:21 imirkin: there are various kernel module parameters you can pass in
22:21 imirkin: which turn that off
22:23 karolherbst: any reason why nvidia does that?
22:25 imirkin: dunno
22:25 imirkin: power saving
22:25 karolherbst: then I highly assume we can just enable it once for fermi and be done with it
22:25 karolherbst: and nvidia sets it to run just to disable that engine