00:11marcheu: yeah these tiny 2D engines are usually used for things like compositing, rotation, YUV color conversion etc. They are super low power compared to the 3D engine so in a phone people care about them
00:12marcheu: it bothers me as well that they exist, but I can't argue with real-world power gains :p
00:12karolherbst: marcheu: I am mainly wondering about if it would be more power efficient to just design one GPU being able to do both or if that wouldn't make much of a difference
00:13marcheu: think about it, the amount of computation per pixel is much higher with a 3D engine. Think about stuff like shaders, rasterizer, etc.
00:14karolherbst: ohh, I wasn't talking about running it through the 3d pipeline
00:14karolherbst: even on today's nvidia gpus there are some 2d engines
00:14karolherbst: or well, you can do a lot with the 2d engine
00:14marcheu: then you'd have to detect 2D quads essentially, and when you see them send to a special "2D only" engine.... that's what we have today?
00:14karolherbst: let's put it this way
00:14karolherbst: marcheu: huh? it's not used through GL
00:15karolherbst: but only used inside the DDX right now.. well for nouveau that is
00:15marcheu: no they aren't
00:15marcheu: but I don't understand what you have in mind
00:15karolherbst: I am wondering if one GPU with a 3d + 2d engine + shared components would be more power efficient than 2 separate GPUs
00:16karolherbst: if you are able to completely power down the 3d engine or other power hungry parts if you don't need them
00:16marcheu: if it's completely standalone, why have the 2D engine integrated in the 3D one?
00:16karolherbst: well, the thing is, it's not
00:16karolherbst: not even today
00:16imirkin_: karolherbst: we use the 2d engine in GL ... for blits basically
00:16marcheu: in Chrome OS we avoid those 2D engines completely, and we are happy to not even pay the cost to integrate it in the SoC :)
00:16karolherbst: because for scan out you kind of have to blit to the other GPU
00:17imirkin_: although it's the same GPU engine which is servicing the requests
00:17imirkin_: dunno if there's separate dedicated hardware, or how it works deep down inside.
00:17airlied: karolherbst: with arm you never have to blit for scanout since it's all system ram :-P
00:17imirkin_: but it's all inside of "gr"
00:17marcheu: imirkin_: it used to be the same hw as the 3D engine, not sure these days
00:17karolherbst: imirkin_: sure.. but I am talking about a potential hardware design, not what we have today :p
00:17airlied: it's more getting the tiling formats to agree
00:17karolherbst: airlied: ohh, smart.. right
00:17imirkin_: but it's a _much_ more convenient interface if those are the operations you want
00:18imirkin_: marcheu: sure, same hw, but ... what are the hw primitives being used -- unknowable given our view of the world
00:18marcheu: anyway, there's a reason for these 2D engines today, I suspect as the android compositing becomes more fancy and less doable on a 2D engine, they will disappear
00:19karolherbst: isn't that the case already today?
00:19marcheu: imirkin_: they used to share PGRAPH state... :)
00:19karolherbst: never had the feeling the interface was particularly smooth except when you force the 3d engine
00:19imirkin_: marcheu: yes. and they do today. but PGRAPH is big and complicated. who's to say it doesn't have dedicated 2d functionality.
00:20marcheu: imirkin_: because if you mess with PGRAPH for one thing (let's say blend) it used to mess for both 2D & 3D
00:20imirkin_: but there can still be dedicated 2d hw
00:20imirkin_: instead of going through the full rast pipeline
00:21imirkin_: perhaps it's even tied to the same state bits
00:21marcheu: IIRC I have gotten 2 triangles to show out of the 2D engine as well
00:21marcheu: so that's that...
00:21imirkin_: that's just scary
00:22marcheu: anyway, I don't think nvidia & AMD care much about their power numbers like phone people do
00:22karolherbst: they do, but different
00:22marcheu: phone folks will cry over 100mW
00:23marcheu: on desktop it's "meh"
00:23karolherbst: well.. AMD more than nvidia anyway
00:23karolherbst: marcheu: I wouldn't say that
00:23karolherbst: on a laptop 100mW is a lot these days
00:23imirkin_: on desktop they don't notice until it's 100MW :)
00:23karolherbst: not a lot
00:23karolherbst: but significant
00:23karolherbst: my laptop idles around <6W
00:24karolherbst: and it's 15"
00:24marcheu: fair enough, cutoff is not 100mW but you get the idea :)
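To put the 100mW figure from the exchange above in perspective, a quick back-of-envelope sketch (the 50 Wh battery capacity and 6 W idle draw are illustrative assumptions, not numbers from the log):

```shell
#!/bin/sh
# battery_minutes CAPACITY_mWh DRAW_mW -> runtime in whole minutes
battery_minutes() {
    echo $(( $1 * 60 / $2 ))
}

battery_minutes 50000 6000   # 6.0 W idle on an assumed 50 Wh battery -> 500 min
battery_minutes 50000 6100   # +100 mW -> 491 min, roughly 9 minutes lost
```

On a phone, where the whole idle budget may be a few hundred mW, the same 100mW is a far larger slice, which is why "phone folks will cry over 100mW".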
00:28karolherbst: marcheu: but I don't think there's a big difference between laptops and mobiles in practice. They both want to reduce power consumption and do insane things to get there. It's just that laptops have bigger components with more performance, so the scale is higher
00:28karolherbst: but.. essentially it's the same, no?
00:29karolherbst: matters less for nvidia though as they went mainly with optimus
00:29karolherbst: but for AMD it does with their APUs and intel of course
00:30marcheu: my experience working with both laptop-ish (x86) and phone-ish (ARM) SoCs is that the ARM ones are always more power efficient, for everything from video to display to 3D. There are gaps at high usage but also maybe more importantly at full idle
00:30karolherbst: power efficient as in "less power" or "more perf/W"?
00:31marcheu: less power at idle for sure (which is 0 perf) but also better perf/W
00:31marcheu: it seems like when you design hw, you have to pick between power and performance when you optimize
00:32karolherbst: somehow I never really find good numbers for the latter, because for some reason nobody is really able to compare that correctly
00:32marcheu: you can look at Chromebook battery life measurements (which we market as "battery life"), they are based on the same test
00:33marcheu: of course ARM usually has smaller batteries, make sure to factor that in ;)
00:34marcheu: I agree though, the results depend a lot on how much time you are willing to spend on optimizing :)
00:34karolherbst: but even battery lifetime is a useless number when it comes to power efficiency if the two systems you compare aren't equal enough
00:34marcheu: well they're running the same sw stack doing the same thing
00:34karolherbst: anyway, I still have to see that valid benchmark comparing both archs showing which is more power efficient
00:34marcheu: there are chromebooks with same display etc. with ARM & x86 variants
00:34karolherbst: I think the most legit were highly parallelized applications benchmarked on servers
00:35marcheu: that's to a great extent much simpler
00:35marcheu: the CPUs are pretty good at power
00:35karolherbst: yeah, less things to get wrong
00:35marcheu: a lot of the power waste goes to peripherals
00:35marcheu: displays and wifi and side chips and such
00:35karolherbst: marcheu: I am sure for the chromebooks not much was actually the same. like the entire motherboard has to be different...
00:35karolherbst: or so I would assume
00:36marcheu: for sure
00:36karolherbst: but yeah.. right now I think displays are the thing consuming most power
00:36karolherbst: and I kind of see why idle power consumption _might_ be better with ARM, but I don't think the CPU is the issue here with intel
00:36karolherbst: just everything else sucks
00:36marcheu: I am sure your display is 3.5W or 4W out of your 6W budget :)
00:37karolherbst: sounds about right
00:37karolherbst: it's even a 4K one
00:38karolherbst: I have no idea how power efficient e.g. the wifi chips are...
00:38karolherbst: I kind of blame them and the bluetooth ones
00:39karolherbst: marcheu: do you know how much one can trust "turbostat" with the package power consumption?
00:39marcheu: I don't trust it. I have two ways to measure:
00:39marcheu: - through the battery discharge
00:39marcheu: - using hw probes
00:40marcheu: the 1st one gets you down to ~50mW accuracy, 2nd one I would say 5-10mW
00:40marcheu: if you measure through battery discharge make sure your battery is always at the same charge level, because these things aren't linear
00:41marcheu: but they aren't linear in a reproducible way, so good enough
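The battery-discharge method described above can be sketched like this (a hypothetical helper, not from the log; the `BAT0` name and the `energy_now` unit of microwatt-hours vary between machines):

```shell
#!/bin/sh
# Hedged sketch of the battery-discharge method: take two readings of
# /sys/class/power_supply/BAT0/energy_now (uWh) some time apart and
# convert the drop into an average power draw.
avg_power_mw() {
    # $1 = energy at start (uWh), $2 = energy at end (uWh), $3 = elapsed seconds
    # uWh difference * 3600 / seconds = uW; / 1000 -> mW
    echo $(( ($1 - $2) * 3600 / $3 / 1000 ))
}

# On a real laptop you would sample, e.g.:
#   e0=$(cat /sys/class/power_supply/BAT0/energy_now); sleep 60
#   e1=$(cat /sys/class/power_supply/BAT0/energy_now)
#   avg_power_mw "$e0" "$e1" 60
# Example with illustrative numbers: a drop of 100,000 uWh over 60 s
avg_power_mw 50000000 49900000 60   # -> 6000 (mW)
```

As marcheu notes, keeping the battery at the same charge level between runs matters, since the reported discharge isn't linear.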
00:41karolherbst: yeah... but I am mostly interested in what part is actually drawing the power
00:41karolherbst: but turbostat seems to be close enough
00:41karolherbst: usually reports around 2.5W idling
00:41karolherbst: ohh, that's on AC :)
00:42karolherbst: 0.7W on battery
00:42karolherbst: mhh 0.5W by the RAM alone
00:42karolherbst: that's interesting
00:49marcheu: IMO tools like turbostat and intel_gpu_top are good for understanding what's going on, but not for measurements. I.e. run the tool, get a feel for what's happening, then go measure the power for real :)
00:49marcheu: they serve different purposes
00:49karolherbst: marcheu: right.. but I would like to get the power consumption of the CPU alone
00:49marcheu: yeah if you have hw probes you can look at the different power rails...
00:49marcheu: one is CPU
00:50karolherbst: at least for the full package intel kind of needs to have a more or less reliable sensor, no?
00:50karolherbst: or maybe they don't and all their TDP stuff is just voodoo
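For reference, the package power that turbostat reports comes from Intel's RAPL energy counters, which Linux also exposes via the powercap sysfs interface. A hedged sketch of reading them directly (the `intel-rapl:0` path is the usual package-0 domain on Intel machines, but may differ):

```shell
#!/bin/sh
# Hedged sketch: average package power from two RAPL energy samples.
# The counter /sys/class/powercap/intel-rapl:0/energy_uj is a
# monotonically increasing energy total in microjoules.
rapl_power_mw() {
    # $1 = first energy_uj sample, $2 = second sample, $3 = elapsed seconds
    # uJ difference / seconds = uW; / 1000 -> mW
    echo $(( ($2 - $1) / $3 / 1000 ))
}

# On a real machine:
#   e0=$(cat /sys/class/powercap/intel-rapl:0/energy_uj); sleep 10
#   e1=$(cat /sys/class/powercap/intel-rapl:0/energy_uj)
#   rapl_power_mw "$e0" "$e1" 10
# Example with illustrative numbers: 25 J consumed over 10 s
rapl_power_mw 1000000000 1025000000 10   # -> 2500 (mW)
```

This measures only what RAPL models (package, cores, DRAM domains), which is why it can read ~2.5W while the whole laptop draws 6W; the display, wifi, and other peripherals are outside it.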
15:37kherbst: mupuf: maybe you could figure something out for me. You are aware of this runpm issue we have on many laptops, right? I think it's an Intel bug actually
15:38kherbst: I am convinced that the skylake and kabylake PCIe bridge controller has a hardware bug triggering it
15:38kherbst: and I was wondering if you might be able to help figuring out who to poke at intel to debug this
15:39diogenes_: kherbst, i used to have random lockups and even random sudden poweroffs with intel and these two helped to solve the issue: processor.max_cstate=0 intel_idle.max_cstate=2
15:40kherbst: it's not a CPU thing though
15:40kherbst: it's for powering down the nvidia GPU
15:40kherbst: but the PCIe controller is on the CPU die
16:33mupuf: kherbst: I might have access to some engineers who could help
16:33kherbst: that would be very helpful
16:34mupuf: please document everything in a bug, and I will see what I can do
16:34kherbst: mupuf: but.. I think there is some errata already, but it might be under NDA between Microsoft and Intel
16:34kherbst: or something
16:34kherbst: mupuf: yeah.. it's actually not that much. 2x nvapoke + 2x setpci are enough to trigger this bug :)
16:34kherbst: sadly the pci register I poke on the intel chip isn't documented
16:35kherbst: but it's the one used by the firmware
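The runpm mechanism under discussion is Linux's PCI runtime power management, which is driven through sysfs. A hedged sketch of inspecting it for a discrete GPU (the `0000:01:00.0` address is a placeholder; find yours with `lspci`, and note the actual bug reproducer here uses nvapoke/setpci on undocumented registers, which is not shown):

```shell
#!/bin/sh
# Hedged sketch: check the runtime PM state of a PCI device on Linux.
runpm_status() {
    dev=$1
    if [ -r "$dev/power/runtime_status" ]; then
        echo "runtime_status: $(cat "$dev/power/runtime_status")"
    else
        echo "runtime_status: unavailable"
    fi
}

runpm_status /sys/bus/pci/devices/0000:01:00.0
# To let the kernel autosuspend the device (what runpm relies on):
#   echo auto > /sys/bus/pci/devices/0000:01:00.0/power/control
```

When runtime PM works, `runtime_status` flips to "suspended" once the GPU is idle; the bug kherbst describes is the device or bridge failing to come back from that state.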