00:20karolherbst: imirkin: ohh, I found it
00:21karolherbst: imirkin: there is this result->pipeline_statistics.cs_invocations field we never wrote to
01:33imirkin: karolherbst: ah yeah... we don't support CS invocations
07:41karolherbst: imirkin: but hm, we could still fix the issue of reporting 0 if "nothing happens". It's better to report 0 than random numbers for compute shader invocations
09:49karolherbst: ohhhh, I think I slowly recognize a pattern in the 3d image tiling thing
13:28sdkj: imirkin: would you like to push those mesa fixes for nv40 upstream?
14:11imirkin: what fixes?
14:12imirkin: karolherbst: yeah, can always fill in 0
14:56karolherbst: imirkin: okay, and you prefer if the driver would do that?
14:57karolherbst: ohh, you replied
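The "fill in 0" idea above can be sketched as follows. This is a minimal illustration, not the actual driver code: the struct and function names are hypothetical, and only the `cs_invocations` field name comes from the discussion. The point is that zeroing the whole result before copying the supported counters means an unsupported counter reads as 0 rather than as whatever happened to be in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a driver's pipeline-statistics result.
 * Only cs_invocations is named in the chat; the other fields are
 * placeholders for counters the hardware does report. */
struct pipeline_stats_sketch {
    uint64_t vs_invocations;
    uint64_t ps_invocations;
    uint64_t cs_invocations;
};

/* Zero the whole result first, then copy in the supported counters.
 * Anything the driver never writes (here: cs_invocations) stays 0. */
static void fill_pipeline_stats(struct pipeline_stats_sketch *out,
                                uint64_t hw_vs, uint64_t hw_ps)
{
    assert(out != NULL);
    memset(out, 0, sizeof(*out));
    out->vs_invocations = hw_vs;
    out->ps_invocations = hw_ps;
    /* cs_invocations is intentionally left at 0: not supported. */
}
```

Whether the zeroing belongs in the driver or in the caller is exactly the question raised above; the sketch only shows the driver-side variant.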
15:02karolherbst: okay, and I have a pretty good understanding of what is going wrong with the load/store test now
15:03karolherbst: instead of putting the tiles together on the X axis, nouveau does it on the Z axis
15:04karolherbst: so it's even more broken than the warning in the code led us to assume
15:05chuckdaniels: hi there, is it possible to reduce voltage using nouveau on an old 9800GTX+?
15:05karolherbst: chuckdaniels: only if the vbios allows it
15:05karolherbst: chuckdaniels: or do you mean undervolting?
15:05chuckdaniels: i just want to reduce performance/consumption
15:06karolherbst: check the pstate file in /sys/kernel/debug/dri/0/
15:06RSpliet: chuckdaniels: there's two issues at hand
15:06karolherbst: but maybe there are still reclocking issues as RSpliet might let us assume
15:07RSpliet: first: voltage. For some reason I have in the back of my head that this generation of GPUs doesn't really vary voltage on desktop cards. Unfortunate, because that's where the biggest win in power consumption comes from, but there's nothing we can do
15:07RSpliet: secondly... mmm... let me look up the exact gen that 9800GTX comes from.
15:07karolherbst: G92 afaik
15:08RSpliet: G92... no sorry, at this moment we don't support changing the performance modes
15:08chuckdaniels: ok thanks guys
15:08RSpliet: There's incomplete code in nouveau for these (and older) cards, sorry
15:08chuckdaniels: at least with nouveau my temperatures are lower
15:08chuckdaniels: compared to nvidia blob
15:08RSpliet: It could well boot in a lower performance level already. you can read it out from the pstate file that karolherbst pointed at
15:09karolherbst: Tesla are frigging hot GPUs anyway
15:09chuckdaniels: 0f: core 740 MHz shader 1836 MHz memory 1100 MHz AC: core 399 MHz shader 810 MHz memory 399 MHz
15:09RSpliet: the last line will tell you the current clocks, the lines above contain the "performance levels" defined by the video bios
15:09imirkin: karolherbst: right, well nouveau enables tiling along the Z axis, so the layout is different than the images code expects, which expects it to not be tiled along Z
15:09RSpliet: yeah, so it's running roughly at 40-50% speed, hence it's cooler. Not much we can do to make it consume even less
15:10chuckdaniels: RSpliet: thanks!
15:10RSpliet: (although block level clock gating and engine level power gating could shave half a watt off or sth. Not a huge impact on cards like that :-P)
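The "roughly 40-50% speed" estimate can be checked against the two pstate lines quoted above. The sketch below assumes the line format exactly as pasted in the chat (`core N MHz shader N MHz memory N MHz` after a prefix); the struct and function names are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one line of the pstate file. */
struct clocks_sketch {
    int core;
    int shader;
    int memory;
};

/* Parse a line such as "0f: core 740 MHz shader 1836 MHz memory 1100 MHz".
 * Returns 1 on success, 0 if the line does not match that shape. */
static int parse_clocks(const char *line, struct clocks_sketch *out)
{
    const char *p = strstr(line, "core");
    if (p == NULL)
        return 0;
    return sscanf(p, "core %d MHz shader %d MHz memory %d MHz",
                  &out->core, &out->shader, &out->memory) == 3;
}

/* Current clock as a percentage of the top level, truncated to an int. */
static int percent_of(int current, int top)
{
    return top > 0 ? (current * 100) / top : 0;
}
```

With the values from the chat, core is at about 54% (399/740), shader at about 44% (810/1836) and memory at about 36% (399/1100) of the `0f` level.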
15:10imirkin: karolherbst: what you need to figure out is if the nvidia driver de-tiles the 3d image before running the shader, or if its shader knows how to compute the texel address with the z-tiling in place
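To make the Z-tiling mismatch concrete, here is a sketch of texel addressing for a generic tiled 3D layout. It is deliberately not the real hardware format: tile sizes, ordering and all names are assumptions chosen for illustration. It only shows that code assuming a tile depth of 1 computes a different address than code that accounts for tiling along Z.

```c
#include <stddef.h>

/* Generic tiled 3D surface (illustrative only, not the hardware layout).
 * The surface is split into tiles of tw x th x td texels. Tiles are
 * stored one after another (x fastest, then y, then z), and texels
 * inside a tile are stored row-major. td == 1 means no tiling along Z. */
struct tiling_sketch {
    size_t width, height, depth; /* surface size in texels */
    size_t tw, th, td;           /* tile size in texels */
};

/* Linear texel index of (x, y, z) under the layout described above. */
static size_t texel_index(const struct tiling_sketch *t,
                          size_t x, size_t y, size_t z)
{
    size_t tiles_x = (t->width + t->tw - 1) / t->tw;
    size_t tiles_y = (t->height + t->th - 1) / t->th;

    /* Which tile the texel falls into. */
    size_t tile = ((z / t->td) * tiles_y + (y / t->th)) * tiles_x
                  + (x / t->tw);
    /* Position of the texel inside that tile. */
    size_t within = ((z % t->td) * t->th + (y % t->th)) * t->tw
                    + (x % t->tw);

    return tile * (t->tw * t->th * t->td) + within;
}
```

For a 4x4x2 surface with 2x2 tiles, texel (0, 0, 1) lands at index 16 when td is 1 but at index 4 when td is 2, so a shader that ignores the Z tiling reads the wrong texel.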
15:12chuckdaniels: well nouveau 62°C > priv. nvidia 81°C
15:12chuckdaniels: so it is good enough anyways
15:12karolherbst: imirkin: currently I assume the latter, at least it uses also gl_FragCoord.zw which we don't
15:13RSpliet: chuckdaniels: if anything, future nouveau changes will unleash more perf rather than save power on that card :-P
15:13chuckdaniels: RSpliet: good to know ;)
15:13imirkin: karolherbst: gl_FragCoord is wholly unrelated to this...
15:13karolherbst: RSpliet: don't forget about Lyude's clock gating work ;)
15:13karolherbst: imirkin: sure? because it gets fed into the suclamp operation
15:14RSpliet: karolherbst: I literally mentioned that 8 lines above
15:14karolherbst: RSpliet: ohh
15:14RSpliet: lyude: how is progress on that area anyway? :-)
15:14imirkin: pretty sure...
15:15karolherbst: mhh, well the GLSL IR looks like this: (call __intrinsic_image_load (var_ref _ret_val) ((var_ref u_source_image) (swiz xy (expression ivec4 f2i (var_ref gl_FragCoord) ) )))
15:15imirkin: gl_FragCoord.z is the incoming frag depth. .w should be irrelevant...
15:15imirkin: yeah, that's the .xy
15:15karolherbst: right, but nvidia also uses zw or something totally unrelated I didn't see yet
15:15karolherbst: maybe the latter
15:15karolherbst: most likely
15:15chuckdaniels: thanks for the info guys, have a nice day!
15:16karolherbst: I should see what nvidia puts into c0[0xf00] to c0[0xf0c]
15:17karolherbst: uhm.... wait
15:18karolherbst: a[0x70] is gl_FragCoord? I seriously need to dig into all that before doing crazy assumptions
15:18karolherbst: silly me
15:20karolherbst: ahh okay, this makes sense. x = x * c0[0xf00] + c0[0xf08]; y = y * c0[0xf04] + c0[0xf0c]; is what nvidia does
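The scale-and-offset step described above can be written out as a small helper. This is a sketch under assumptions: the chat only gives the constant-buffer offsets and the two formulas, so the struct, the field names and the use of `float` are all hypothetical.

```c
/* Hypothetical view of the four constant-buffer values mentioned above.
 * The element type is an assumption; the chat does not say whether
 * these are floats or integers. */
struct coord_xform_sketch {
    float scale_x;  /* c0[0xf00] */
    float scale_y;  /* c0[0xf04] */
    float offset_x; /* c0[0xf08] */
    float offset_y; /* c0[0xf0c] */
};

/* x = x * c0[0xf00] + c0[0xf08]; y = y * c0[0xf04] + c0[0xf0c]; */
static void apply_coord_xform(const struct coord_xform_sketch *c,
                              float *x, float *y)
{
    *x = *x * c->scale_x + c->offset_x;
    *y = *y * c->scale_y + c->offset_y;
}
```

With a scale of 1 and an offset of 0 the coordinates pass through unchanged, which is a quick way to check what values end up in those slots.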
15:50sdkj: imirkin: this bugfix: https://bugs.freedesktop.org/show_bug.cgi?id=102349#c3
16:16anEpiov: how's opencl coming along?
16:27anEpiov: is it really bad 128bit bandwidth?
18:35karolherbst: anEpiov: there is no OpenCL yet, but there might be one in the future
18:37anEpiov: karolherbst: ha ha ha!!!
18:37anEpiov: that's the most re-assuring statement ever!
18:37karolherbst: I know
18:38karolherbst: but I think it could be done, there is already enough work done here to be pretty close, but it needs to be finished and the last bits are always the most annoying ones
18:41anEpiov: what percentage is the last bits? 10%? 5%?
18:41anEpiov: it would help having opencl on some places.
18:42anEpiov: speed up things.
18:42karolherbst: ask pmoreau
18:42anEpiov: opencl also is pervasive nowadays.
18:46imirkin: anEpiov: the last 20% takes 80% of the time
18:47imirkin: opencl is nowhere close for nouveau, in my estimation
18:48anEpiov: ooohh my gaawwd1!!
18:49anEpiov: why? the documentation is all open source, schematics, no shortage of examples etc...
18:49karolherbst: but not the hardware doc
18:50karolherbst: and it isn't open source afaik
18:50tobijk: and nobody seems to be interested in finishing opencl right now :>
18:52karolherbst: well anEpiov seems interested
18:52tobijk: patches are always welcome
18:53tobijk: so anEpiov, start hacking if you want opencl :)
18:53anEpiov: which language?
18:53anEpiov: pmoreau: can I help with opencl??
19:20imirkin: anEpiov: why aren't you sending patches to implement all this stuff?
19:21imirkin: sounds like it should be trivial, the way you're talking about it...
19:49karolherbst: imirkin: sometimes you just have to let people try something out ;)
19:54imirkin: perfectly happy to do so
19:56imirkin: i just hate the implication that all this stuff is soooo trivial
19:58krutonium: It never is :3
19:59karolherbst: ohh, we are already playing here on ultra hard, so the base line is already super high ;)
20:00imirkin: and if only all those ingrate people who volunteer their time to develop the software would just stop being so lazy, then the world could have all this awesome stuff.
20:02tobijk: we should add another layer: implement an ai which is aimed to re nvidia hardware, simple :)
20:03imirkin: the opencl problem is largely a software one though
20:04tobijk: imirkin: well as long as you don't want to have thread safeness :>
20:05tobijk: as calling opengl work from different threads, i imagine that a problem for opencl as well
20:06imirkin: right, but ... that's a software problem
20:06imirkin: not a hardware RE issue
20:06tobijk: yet a bigger issue to be fixed (first)
20:06tobijk: and don't get me wrong, i know it is not trivial
20:07imirkin: we've got our best man on it :)
20:07tobijk: the void? or are you working on it again?!
20:08imirkin: i'm hardly the best man
20:09imirkin: that's how i ended up with this...
21:06pmoreau: anEpiov: Sure! Can you ping me again tomorrow? I’ll have recovered from the travel, and started to settle back in my apartment after being away from it for several months.
21:09anEpiov: mm.. no wonder opencl isn't advancing :/
21:10pmoreau: I can’t take 2 days of OpenCL?
21:11anEpiov: 'travel... away ... several months'
21:11pmoreau: I have been working somewhere else for several months, doesn’t mean I didn’t work on OpenCL during that time.
21:13RSpliet: anEpiov: without trying to step in the middle, please understand that pmoreau's efforts are entirely voluntary, OpenCL support for nouveau is not his (or unfortunately anyone's) daytime job. Your helping hands are very welcome indeed! :-)