11:28karolherbst: Lekensteyn: heh... I can reproduce the runpm bug on a desktop now
18:32ReinUsesLisp: Hi, how does nouveau handle ARB_clip_control's flipped sampling on nvc0?
18:32ReinUsesLisp: I'm not talking about the flipped rendering, I've seen that it uses a positive viewport instead of OpenGL's negative
20:35imirkin_: ReinUsesLisp: nvc0 doesn't really care about any of that stuff
20:35imirkin_: ReinUsesLisp: st/mesa normalizes it all, the values Just Work (tm)
20:36imirkin_: the only bit of ARB_clip_control that requires driver support is the half-z stuff, which is a rasterizer setting (rast->clip_halfz)
20:36imirkin_: basically it controls whether depth is -1..1 or 0..1
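For reference, the two depth conventions mentioned above map normalized device z to window depth differently. A minimal Python sketch, assuming the standard OpenGL depth-range formulas; the function name `window_depth` and its parameters are made up for illustration and are not nouveau or Mesa code:

```python
def window_depth(z_ndc, near=0.0, far=1.0, clip_halfz=False):
    """Map normalized device z to window-space depth.

    clip_halfz=False: z_ndc is expected in [-1, 1] (classic OpenGL).
    clip_halfz=True:  z_ndc is expected in [0, 1] (ARB_clip_control's
                      zero-to-one depth mode).
    near/far are the depth range values (glDepthRange defaults: 0 and 1).
    """
    if clip_halfz:
        # Zero-to-one mode: a plain linear remap of [0, 1] onto [near, far].
        return near + (far - near) * z_ndc
    # Negative-one-to-one mode: remap [-1, 1] onto [near, far].
    return (far - near) / 2.0 * z_ndc + (near + far) / 2.0
```

With the default depth range, z = 0 lands in the middle of the depth buffer range in the -1..1 mode and at the near value in the 0..1 mode.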
20:37imirkin_: ReinUsesLisp: why do you ask?
20:37imirkin_: (the hardware *does* have some setting to flip y coords somewhere, but we never use it)
20:38imirkin_: (there's also a special reg where the setting of that value may be retrieved in a frag shader)
20:39ReinUsesLisp: while emulating nvc0 hardware I'm getting "flipping" issues that are different on OpenGL and on Vulkan, on OpenGL at the moment I'm flipping gl_Position.y at the end of the vertex shader while on Vulkan I use a negative viewport
20:39ReinUsesLisp: yes, Y_NEGATE flips the value on an S2R register
20:39ReinUsesLisp: it doesn't seem to affect rendering though
20:39ReinUsesLisp: what's the other register?
20:39imirkin_: the Y_NEGATE is the thing i'm talking about
20:39imirkin_: note that there's more to the coordinate flip
20:40imirkin_: you also need to know the window width/height
20:40imirkin_: since the ultimate coord is height - y
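The flip described here is usually taken against the framebuffer height, which is why the window size has to be known. A small Python sketch of that idea, assuming pixel-center coordinates like gl_FragCoord.y; `flip_frag_y` and `fb_height` are hypothetical names, not driver code:

```python
def flip_frag_y(y, fb_height, flipped):
    """Return the fragment y coordinate as the API should see it.

    y         : y coordinate in the hardware's native orientation
                (pixel centers, e.g. 0.5, 1.5, ...)
    fb_height : framebuffer height in pixels
    flipped   : True when the API origin is at the opposite edge
    """
    return fb_height - y if flipped else y
```

Applying the flip twice gives back the original coordinate, and the first row of pixel centers (0.5) maps to the last one (fb_height - 0.5).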
20:40ReinUsesLisp: does Y_NEGATE affect rendering or just S2R?
20:40imirkin_: i'm not 100% sure what the proper method of usage of this feature is
20:40ReinUsesLisp: oh, we are not using the window coordinates at all
20:40imirkin_: tbh, i am not 100% sure.
20:40imirkin_: i've never used it
20:40imirkin_: nor have i analyzed its usage
20:41ReinUsesLisp: the blob uses Y_NEGATE on OpenGL to implement dFdy
20:41imirkin_: i just know there's a flag for it on some method somewhere, and the special reg which retrieves its value.
20:41imirkin_: for a winsys fb, yeah
20:41ReinUsesLisp: I think it might also use it on NVN when it's using LOWER_LEFT
20:42ReinUsesLisp: but don't quote me on that one since shaders are precompiled there
20:42imirkin_: this is handled with uniforms in mesa
20:42imirkin_: which get set based on the current state of things
20:42imirkin_: since it's generic code, it can't rely on the Y_NEGATE stuff
20:42imirkin_: and it's not like this is perf-sensitive
20:43imirkin_: so we've never bothered to care.
20:43imirkin_: you also need this for interpolateAtOffset
20:43ReinUsesLisp: how's the window height related to the viewport height?
20:44imirkin_: i use them interchangeably
20:44imirkin_: er, i meant: https://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/state_tracker/st_glsl_to_tgsi.cpp#n6334
20:44imirkin_: which will internally affect gl_Position.xy
20:45imirkin_: (er, just .y obviously)
20:46ReinUsesLisp: nice, I'll investigate what the blob does on OpenGL and NVN with window coordinates
20:46imirkin_: and by gl_Position i of course mean gl_FragCoord
20:46ReinUsesLisp: hehe, I get you, it's the same abuf address
20:48ReinUsesLisp: does st/mesa flip in-shader texture coordinates depending on its ARB_clip_control state?
20:48imirkin_: it always multiplies by a uniform
20:49imirkin_: which will either contain 1.0 or -1.0
20:49imirkin_: depending on the flip state
20:49imirkin_: so it's very much like Y_NEGATE
20:49imirkin_: afaik, that reg will contain the float 1.0 or -1.0. not 100% sure though.
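To illustrate why multiplying by a 1.0 / -1.0 value is enough for something like dFdy: if the image is stored upside down, a vertical finite difference comes out with the opposite sign, and the multiplier undoes that. A minimal Python sketch under that assumption; `dfdy`, `values` and `stored` are illustrative names only, not Mesa code:

```python
def dfdy(column, row, y_sign=1.0):
    """Finite difference between row and row + 1 of one pixel column,
    scaled by a flip sign (1.0 for normal, -1.0 for flipped storage)."""
    return y_sign * (column[row + 1] - column[row])

# One pixel column in API row order, and the same column stored upside down.
values = [0.0, 1.0, 4.0, 9.0]
stored = list(reversed(values))
```

Here API rows 0 and 1 live in stored rows 3 and 2, so `dfdy(stored, 2, y_sign=-1.0)` yields the same result as `dfdy(values, 0)`.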
20:49ReinUsesLisp: but, does it do that in the shader?
20:49imirkin_: look at the first link for e.g. dFdy
20:49ReinUsesLisp: thanks, wanted to confirm
20:51imirkin_: it's also done for interpolateAtOffset
20:51imirkin_: and for adjusting gl_SamplePosition.y
20:51imirkin_: (i think)
21:11ReinUsesLisp: what does TRIANGLE_RAST_FLIP do? we are hacking it as a front face flip at the moment
21:14imirkin_: no, i think that's something else
21:14imirkin_: where is that?
21:15imirkin_: so, faceness CW vs CCW is handled elsewhere
21:15imirkin_: this, i can only imagine, means flipping the rasterization so that it goes bottom-up
21:16imirkin_: and then you don't have to mess around with the height thing for gl_FragCoord
21:17imirkin_: unfortunately the people who did the RE on all this are no longer around, and i don't quite know what it all means myself =/
21:17imirkin_: maybe mwk knows?
21:19imirkin_: ReinUsesLisp: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_state.c#n281
21:23imirkin_: ReinUsesLisp: btw, as you guys discover stuff, let us know too :)
21:23ReinUsesLisp: taking a look, the blob seems to use different VIEW_VOLUME_CLIP_CTRL's UNK11 values
21:24ReinUsesLisp: yes, sure :P
21:24imirkin_: the view volume thing is ... quite confusing
21:24imirkin_: there's a value for a hard 0..1 view volume (on depth) which probably makes sense for a winsys buffer.
21:24imirkin_: dunno what that unk11 thing does precisely...
21:25imirkin_: it's so hard to test these things.
21:27Lekensteyn: karolherbst: huh, how? the link speed issue? How can you turn off the power without help from ACPI?
21:27karolherbst: imirkin_: btw, I will buy a jetson nano
21:27karolherbst: Lekensteyn: because it's just a pci config register to turn off the link
21:27karolherbst: anyway, I wasn't able to reproduce the issue after all, my resume path was broken
21:27karolherbst: I am able to shut off the link
21:28karolherbst: Lekensteyn: ACPI just uses the bridge + 0x248 bit 0x80 to turn it off and 0x100 to turn it back on
21:28karolherbst: and you can do the same on a desktop system
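The bit manipulation described here can be sketched as a read-modify-write on the bridge's config space. The Python below only operates on an in-memory bytearray standing in for config space and never touches hardware. The offset and bit values come from the chat; everything else (the helper names, and the assumption that the bits are simply OR-ed into a 32-bit little-endian register) is a guess, since the register is undocumented:

```python
import struct

LINK_CTRL_OFFSET = 0x248  # offset on the bridge, as quoted in the chat
LINK_OFF_BIT = 0x80       # quoted as "turn it off"
LINK_ON_BIT = 0x100       # quoted as "turn it back on"

def read32(cfg, offset):
    """Read a little-endian 32-bit value from the config-space copy."""
    return struct.unpack_from("<I", cfg, offset)[0]

def write32(cfg, offset, value):
    """Write a little-endian 32-bit value into the config-space copy."""
    struct.pack_into("<I", cfg, offset, value & 0xFFFFFFFF)

def request_link_off(cfg):
    # Assumption: setting the bit with a read-modify-write triggers the action.
    write32(cfg, LINK_CTRL_OFFSET, read32(cfg, LINK_CTRL_OFFSET) | LINK_OFF_BIT)

def request_link_on(cfg):
    # Same assumption as above, for the "on" bit.
    write32(cfg, LINK_CTRL_OFFSET, read32(cfg, LINK_CTRL_OFFSET) | LINK_ON_BIT)
```

On real hardware the bits may well clear themselves or need polling afterwards; the sketch does not model any of that.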
21:28karolherbst: no idea if the power is cut as well though
21:28karolherbst: the GPU isn't accessible anymore at least
21:29karolherbst: that gave me an idea I want to test on my mac mini to just power down the nvidia GPU and see if that reduces heat generation :)
21:30Lekensteyn: hm, if only documentation was available for those registers...
21:30karolherbst: the ACPI code of the desktop contains references to the Q0L2 field as well
21:30karolherbst: actually most of the code is the same as on a laptop
21:30karolherbst: just the GPU power resource stuff is missing
21:31karolherbst: Lekensteyn: only problem is, the CPU is a coffee lake one
21:31karolherbst: so no idea if they just fixed the issue there
21:31karolherbst: or it doesn't happen on a desktop
21:31karolherbst: on cannon lake the issue is fixed for sure
21:31karolherbst: I already tested it on two laptops
21:32karolherbst: and runpm works on cannon lakes
21:32karolherbst: so right now I blame sky lake and kaby lake
21:33karolherbst: Lekensteyn: anyway, I will try and see if there are some errata under NDA or something available and maybe we can get something worked out here
21:34karolherbst: windows obviously doesn't show this issue
21:36karolherbst: imirkin_: and I would like to have a post merge mesa CI runner running tests on the nano
21:36karolherbst: probably only the CTS for now as this is much simpler to manage than piglit
21:37imirkin_: karolherbst: cool
21:37karolherbst: and if that works out, I might buy two or three more
21:38imirkin_: hopefully on RH's dime
21:38karolherbst: the nanos are like $100 + shipping/taxes
21:38karolherbst: I already got informal approval for the one
21:39karolherbst: and they don't have on-chip storage, so also a microSD card is needed, but those are quite cheap as well
21:39airlied:expects skeggsb has one sitting unused in a box :-P
21:39karolherbst: no idea, don't think so
21:39karolherbst: or maybe
21:39karolherbst: anyway, I would like to build some very basic CI for nouveau for that
21:39karolherbst: and if that works out, it should be easier to get funding for other chipsets
21:39karolherbst: just that the other chipsets require more money
21:40imirkin_: what's the nano? maxwell?
21:40karolherbst: 128 cores :D
21:40airlied: the biggest problem with CI is once you have it you need throughput
21:40airlied: at which point just getting an x86 machine and a GPU wins
21:40karolherbst: well depends on what your goal is
21:40airlied: esp if you are compiling on it
21:40karolherbst: at first post merge is totally fine
21:41karolherbst: ohh, I don't plan on compiling on that one
21:41karolherbst: my initial idea is to build some docker images on some machine and just push them to the nano
21:41karolherbst: and the nanos are just running those to run the tests
21:41airlied: but CTS does some stupidly cpu intensive crap as well
21:41karolherbst: that's why multiple nanos
21:41karolherbst: if the nano is just able to do one run per day, so be it
21:42karolherbst: it's better than what we have today
21:42karolherbst: and having a daily report on whether we regressed or not is good enough :)
21:42karolherbst: anyway, it's $150 for getting something working
21:42karolherbst: and if it doesn't work out, it's just $150
21:42karolherbst: and I can use the board for other testing