00:09 nyef: Feels like there's something not quite right with this theory.
00:12 nyef: Oh, and I should check on the default state of the panel power GPIO on this rig before I switch it back. Nuisance!
00:20 nyef: Is "INIT_GENERIC_CONDITION: unknown 0x07" important?
00:30 rhyskidd: nyef: perhaps see some context here https://github.com/envytools/envytools/pull/75
00:30 rhyskidd: if you find a definitive answer, would welcome knowing
00:31 nyef: Hrm. Thank you.
00:35 karolherbst: imirkin: I am already running the CTS over the patch
00:35 karolherbst: let's see how happy it will be
00:38 nyef: ... I'm looking at the wrong register for gpio on this card, aren't I?
00:38 karolherbst: but yeah, the patch looks suspiciously trivial... I never looked into it exactly because of that comment about manual handling
00:45 kubast2_: Btw, is there a way to check GPU usage on nouveau?
00:45 kubast2_: Out of the blue
00:52 karolherbst: kubast2_: I have some patches for that, but they need to be cleaned up
00:53 nyef: Hrm. Either 0xe104 is the right address, and the value I have is 0xbadf1000, or it's not the right address, and the value of 0xbadf1000 is bogus.
01:07 nyef: ... stupid question, but what's the easy way to grab the VBIOS from a running system?
01:10 rhyskidd: cat /sys/kernel/debug/dri/0/vbios.rom > vbios.rom
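A quick sanity check on the dumped file can save time before feeding it to other tools. The sketch below is a minimal example, assuming the dump starts with a legacy x86-style expansion ROM header (bytes `0x55 0xAA`, followed by a size byte counted in 512-byte blocks). The function name `describe_option_rom` is made up for illustration, and a missing signature at offset 0 is not proof of a bad dump, since some images may carry extra data in front of the header.

```python
def describe_option_rom(data: bytes) -> dict:
    """Report basic header facts for a dumped expansion ROM image.

    Assumes a legacy x86-style header: 0x55 0xAA at offset 0, and a
    size byte at offset 2 counted in 512-byte blocks.
    """
    if len(data) < 3:
        raise ValueError("image too short to contain a ROM header")
    return {
        # True when the image starts with the 0x55 0xAA signature.
        "has_signature": data[0] == 0x55 and data[1] == 0xAA,
        # Size the header claims for the first image, in bytes.
        "declared_size": data[2] * 512,
        # Size of what was actually read.
        "file_size": len(data),
    }


# Usage, assuming the dump from the command above was saved as vbios.rom:
#   with open("vbios.rom", "rb") as f:
#       print(describe_option_rom(f.read()))
```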
01:12 nyef: Thank you.
01:14 nyef: ... and it's GPIO line 11 on this system. Yeah, I was WAY out.
01:16 nyef: "0000d63c: 00006081", if I'm now actually on target.
01:17 nyef: And it remains so over a suspend cycle.
01:23 karolherbst: imirkin: all the ms image tests indeed pass with his patch
01:34 nyef: And with the eDP back in play, I get "0000d63c: 00007000". Within predictions.
01:41 nyef: And with it powered off (because suspend), I get "0000d63c: 00002000".
01:41 nyef: Hrm.
01:42 nyef: Okay, what do the lower bits mean, and does this imply that we have a way to detect that there actually IS an eDP panel, even when it's powered off?
01:46 * nyef sighs.
01:46 nyef: Apparently not quite.
01:47 nyef: 0x6081 is output enabled, output off, input on, mode special_sor, special_idx sor0_panel_power.
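The three readings above can be lined up with a small decoder. This is only a sketch under assumptions: the bit positions (0x4000 input state, 0x2000 output enable, 0x1000 output state, low byte for the function selection) are inferred from nyef's reading of 0x6081 together with the 0x7000 and 0x2000 values, and the register base is inferred from line 11 landing on 0xd63c with a 4-byte stride. The low byte is reported raw rather than split into mode and special_idx, because the chat does not give that split.

```python
GPIO_BASE = 0x00D610  # assumption: 0xd63c - 11 * 4, inferred from the chat
GPIO_STRIDE = 4       # assumption: one 32-bit register per GPIO line


def gpio_reg(line: int) -> int:
    """Register address for a GPIO line under the assumed layout."""
    return GPIO_BASE + line * GPIO_STRIDE


def decode_gpio(value: int) -> dict:
    """Split a GPIO register value into the fields discussed in the chat."""
    return {
        "input_state": bool(value & 0x4000),
        "output_enabled": bool(value & 0x2000),
        "output_state": bool(value & 0x1000),
        # Raw function/mode selection; 0x00 appears to mean plain GPIO.
        "function": value & 0xFF,
    }


for reading in (0x6081, 0x7000, 0x2000):
    print(f"{gpio_reg(11):08x}: {reading:08x} -> {decode_gpio(reading)}")
```

Under this reading, 0x7000 is a plain GPIO driven high with the input seeing it, and 0x2000 is the same pin driven low.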
13:18 nyef: G'morning all.
13:19 nyef: Is there a way to grab the VBIOS from a system running the nvidia blob without rebooting it?
13:51 karolherbst: imirkin: do you want to take a deeper look at the MS images for maxwell patch? Tests seems to pass and I doubt it will break anything.
13:55 imirkin: karolherbst: if tests pass, all good with me
13:55 imirkin: i recall the blob doing something different, but meh
13:55 karolherbst: I mean, the blob does odd things even without MS
13:55 imirkin: ;)
13:55 imirkin: well, i remember the blob doing address calculations for MS
13:55 karolherbst: anyway, the CTS is happy
13:55 karolherbst: mhh
13:55 imirkin: basically accessing it buffer-style
13:55 imirkin: instead of 2d-image-style
13:55 karolherbst: mhh
13:56 imirkin: but that could have been due to any number of reasons
13:56 karolherbst: maybe some weird reasons
13:56 imirkin: first off, this was 2-3y ago
13:56 imirkin: and my memory can quickly fade
13:56 karolherbst: maybe it is faster or something ....
13:56 imirkin: yeah. or maybe it doesn't work with some of the compression modes.
13:56 imirkin: ;)
13:56 karolherbst: yeah
13:56 imirkin: although, i dunno why buffer loads would work either
13:56 karolherbst: might be
13:57 karolherbst: maybe I do a piglit run and check if something comes up there
13:58 karolherbst: or well, selected tests
14:03 karolherbst: well, even piglit seems happy
14:04 imirkin: sounds all good to me then
14:05 imirkin: hmmmmmmm
14:05 imirkin: bumping up the width/height is a bit of a hack
14:05 imirkin: i guess it's fine though
14:05 imirkin: since the MS stuff will get ignored with images
14:05 imirkin: yeah ok
14:05 karolherbst: yeah
14:13 karolherbst: I was already wondering if we would do something similar for 3D images, like simply offsetting the y coordinate by the height of one layer, but I think 3d images don't really work that way? Or maybe they do, I have no idea how they are actually stored inside memory, because I didn't check yet
14:13 imirkin: 3d images can have 3d tiling
14:13 karolherbst: uhh
14:13 imirkin: so your offset idea won't work. that's why 2d arrays "work" but 3d doesn't -- for 2d arrays, we offset the tic's address
14:14 karolherbst: okay, I see
14:14 imirkin: for the 3d images, you have to store the layer offset separately
14:14 imirkin: and then stick it in as the z coordinate when it's being accessed as a 2d image
14:14 imirkin: of course you have no clue when it happens this way, so you basically have to retrieve it for any 2d image access
14:14 karolherbst: okay, so we simply load the offset from the driver constbuf, offset the real address and then we should be good?
14:15 imirkin: and keep the texview target as 3d
14:15 karolherbst: well, the nvidia shader looks way more insane than what we do anyway
14:15 imirkin: well, it's a 2d image
14:15 imirkin: so there's no "natural" z index
14:15 imirkin: so you just use that as the z index and move on -- no need to offset anything
14:15 karolherbst: ahh, right
14:16 imirkin: oh, but you might need a target flag on the suld? if so, you're in trouble.
14:16 karolherbst: nvidia ends up doing this: "suld d t2d b128 ign $r0 g[$r8] $r10"
14:17 karolherbst: what is the last argument by the way?
14:17 nyef: Hrm... GENERIC_CONDITION 0x00 is "only run on eDP"?
14:21 pendingchaos: karolherbst: IIRC r10 is the image, r8,r9,... is the coordinates and r0,r1,... is the destination
14:22 karolherbst: pendingchaos: yeah that would make sense. My hope is that envydis is just displaying stuff wrongly
14:22 nyef: And how do I find what R[0x4061c040] maps to?
14:22 karolherbst: or maybe that's actually fine
14:22 karolherbst: just using g[] looks a bit odd to me
14:25 nyef: Ah, it's an OR-specific offset.
14:25 nyef: That begins to make sense.
14:31 Subv: hey, in maxwell, what's the difference between the CULL_FACE and D3D_CULL_MODE methods?
14:35 imirkin: Subv: d3d and GL enable slightly different features i think
14:35 imirkin: the two are mutually exclusive -- one overwrites the other, i think
14:35 Subv: the last one called before the draw wins?
14:35 imirkin: Subv: specifically d3d9 has different stuff
14:36 imirkin: for d3d10+ and opengl, the CULL_FACE one is used
14:36 karolherbst: imirkin: converting to 3d indeed works
14:37 imirkin: karolherbst: i wouldn't super-trust the decoding of image methods in envydis btw
14:37 karolherbst: yeah, I noticed
14:37 imirkin: and they're confusing in nvdisasm as well, unfortunately
14:37 imirkin: lots of unexpected arguments
14:37 imirkin: used in unexpected ways
14:37 karolherbst: anyway, we only know from an API point of view, that we have a 3D image, which sucks a little :(
14:38 imirkin: karolherbst: that's my point on why this sucks
14:38 karolherbst: yeah
14:38 imirkin: if the shader has to know it's a 3d image
14:38 imirkin: then you have to have variants and whatnot
14:38 imirkin: ohhhhh
14:38 imirkin: but
14:38 imirkin: you can cheat
14:38 imirkin: which is my favorite thing to do anyways
14:38 imirkin: i created this concept of fixups
14:38 imirkin: which allow you to perform arbitrary manipulation on the output binary of an operation
14:38 karolherbst: uhm, how can I "insert" a src?
14:39 imirkin: so such a fixup could flip between 2d and 3d
14:39 karolherbst: I doubt that would really work here
14:39 imirkin: and the generating code could always generate the 3d thing
14:39 imirkin: slightly sucks here, but would be easy
14:39 karolherbst: because: we also have to fixup srcs
14:39 imirkin: it would pessimize the super-common case of "2d image"
14:39 imirkin: nah, the srcs could always be correct
14:39 imirkin: if you always load the offset index
14:39 karolherbst: ohh wait
14:39 imirkin: and always feed it 3 coord args
14:39 Subv: mm, i see, i'm seeing a game enable face culling (via CULL_FACE_ENABLE) but the triangles it submits seem to be in the opposite winding order that it configures so they all get culled, i wonder what's going on there
14:39 karolherbst: instead of reading a 96 bit reg, it just ends up reading 64
14:40 karolherbst: right
14:40 imirkin: Subv: perhaps it's on purpose?
14:40 karolherbst: imirkin: do you really think it makes any difference in the end though?
14:40 pendingchaos: wouldn't that fail with bindless?
14:40 karolherbst: like a 2d operation being cheaper or whatever than a 3d one?
14:40 imirkin: karolherbst: no. image loads/stores are super-slow compared to just about anything else
14:40 imirkin: pendingchaos: yes, it will
14:40 karolherbst: okay, so why other for now?
14:40 imirkin: pendingchaos: bindless has to be redone
14:40 karolherbst: *bother
14:41 imirkin: i realized that when i was 90% of the way through it
14:41 Subv: i don't think so, that would lead to a black screen instead of the loading screen it usually shows, this is just a texture shown onto a quad
14:41 imirkin: but decided i didn't care enough.
14:41 imirkin: Subv: there's also a winding
14:41 karolherbst: ahh, we have Instruction::moveSources
14:41 imirkin: Subv: i.e. you can say whether the front face is cw or ccw
14:41 Subv: i'm using all three methods, CULL_FACE_ENABLE, FRONT_FACE and CULL_FACE
14:42 Subv: the specific configuration set is {true, CCW, Back}
14:42 imirkin: but it feeds the quad in CW, so it gets culled?
14:43 imirkin: oh, there's an extra flip involved
14:43 imirkin: depending on where y=0 is
14:43 Subv: the submitted triangles are in CCW, but the vertex shader output seems to be CW
14:44 imirkin: Subv: remind me what you're doing?
14:44 Subv: Nintendo Switch emulator, GPU is GM20B
14:45 Subv: you can configure the origin?
14:48 karolherbst: imirkin: mhh, if I simply convert all 2D surface load and stores to a 3D with z being 0 some tests start to fail, so maybe we can't really do that for non 3D images in the end
14:49 karolherbst: but right, then we kind of have to fixup the binary
14:55 kubast2_: mfw nvidia driver doesn't start any steam games no more so using nouveau is actually a must 🤔
14:55 kubast2_: 🤔
14:56 imirkin: Subv: yeah. there's a method for this...
14:56 imirkin: Subv: SCREEN_Y_CONTROL
14:57 imirkin: all this stuff is extremely confusing... there's like 10 different ways to flip things
14:57 imirkin: that's for point sprite replacement
14:57 imirkin: which is something else
14:58 imirkin: (it has various uses in old-gl, but in new-land, it's basically for gl_PointCoord)
14:58 imirkin: or whatever that builtin is called
14:58 imirkin: Subv: https://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/state_tracker/st_atom_rasterizer.c#n77
14:59 imirkin: so there are not one but *two* things that can flip the front_ccw setting
14:59 imirkin: and nouveau never sets the SCREEN_Y_CONTROL thing, but the blob driver might
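Why a Y flip matters for culling can be shown with plain 2D geometry: mirroring the Y axis reverses the apparent winding of every triangle, so a CCW front face becomes CW unless the front-face setting is flipped too. The sketch below is generic math, not nouveau or Maxwell code; the helper names are made up for illustration, and treating zero-area triangles as culled is a simplification.

```python
def signed_area(tri):
    """Signed area of a 2D triangle; positive means counter-clockwise."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def winding(tri):
    area = signed_area(tri)
    if area == 0:
        return "degenerate"
    return "CCW" if area > 0 else "CW"


def flip_y(tri):
    """Mirror the triangle vertically, as a change of y=0 origin would."""
    return [(x, -y) for x, y in tri]


def is_culled(tri, front_face="CCW", cull="Back"):
    """Apply a {front_face, cull} configuration to one triangle."""
    w = winding(tri)
    if w == "degenerate":
        return True  # simplification: zero-area triangles draw nothing
    is_front = (w == front_face)
    return (cull == "Front" and is_front) or (cull == "Back" and not is_front)


tri = [(0, 0), (1, 0), (0, 1)]
print(winding(tri), is_culled(tri))                  # CCW False
print(winding(flip_y(tri)), is_culled(flip_y(tri)))  # CW True
```

With the configuration from the chat ({true, CCW, Back}), a triangle submitted CCW survives, but the same triangle after one unaccounted-for Y flip is treated as back-facing and culled.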
15:00 imirkin: Subv: are you trying to emulate the switch using GL, or by passing through to a maxwell GPU?
15:00 Subv: using opengl
15:00 Subv: i know how crazy that sounds
15:00 imirkin: kubast2_: i strongly suspect that someone somewhere has been able to start steam games on a blob driver, so it's probably an issue with your configuration.
15:00 imirkin: Subv: both are pretty crazy :) just wanted to make sure i knew which variant of crazy.
15:01 imirkin: Subv: as you well know, emulating all this stuff on GL is not straightforward - there's just no way to do certain things.
15:01 imirkin: hopefully games "don't do that"
15:01 Subv: "hopefully <insert smirky face from Nintendo>"
15:01 * nyef remembers reading about the dolphin-emu shader thing.
15:02 imirkin: nyef: dolphin generates GL shaders for fixed-function hardware, so it's a lot more reasonable
15:02 Subv: but yeah i'm sure we'll reach some impossible thing soon, maybe by that time Vulkan will be a better alternative, i do not know
15:02 imirkin: and that fixed function hw is a lot more limited in what it can do
15:02 imirkin: in general, if you can implement X on Y, you can't implement Y on X ;)
15:03 imirkin: Subv: well, it's not like you'll be able to do these things in vulkan either. the hw covers a much wider set of possible use-cases
15:03 nyef: In specific, if X and Y are turing machines, you can, but now you have no I/O.
15:03 imirkin: nyef: sometimes you can. but in the real world, Y tends to be more generic than X
15:04 imirkin: and there's the implicit "in a performant manner" in all of this, which theoretical turing machines aren't concerned by
15:04 Subv: fwiw, we already hit a roadblock somewhere wrt shader synchronization instruction
15:04 nyef: I know. I used to be heavily into writing emulators myself, and still do on occasion.
15:04 imirkin: Subv: which one?
15:04 Subv: shfl
15:04 imirkin: Subv: there's an ext for that
15:04 imirkin: it's not synchronization btw
15:04 Subv: doesn't that only work on nvidia?
15:04 imirkin: it's inter-late shuffle
15:05 imirkin: better than not working anywhere ;)
15:05 nyef: inter-lane?
15:05 imirkin: yes. inter-lane.
15:05 Subv: and it expects a specific number of threads in a warp, which may not hold for all hardware types
15:05 imirkin: sssorta
15:05 imirkin: the more common variant is likely SHFL.IDX
15:06 imirkin: which broadcasts one lane's value across all lanes
15:06 imirkin: and that index is not a fixed index but computed using various masks
15:06 imirkin: a lot of this is available in ARB_shader_group_*
15:06 imirkin: but not 100%
15:07 imirkin: GL_NV_shader_thread_shuffle -- that's the ext for the literal op
15:09 imirkin: GL_ARB_shader_ballot -- this covers the SHFL.IDX case, but not the others
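The "index computed using various masks" part can be simulated on the CPU. The sketch below models a 32-lane warp and follows the pseudocode published for PTX `shfl.idx`, where operand `c` packs a clamp value in bits 4:0 and a segment mask in bits 12:8. That the hardware SHFL.IDX behaves exactly like the PTX description is an assumption here, and `shfl_idx` is an illustrative name, not a driver function.

```python
WARP_SIZE = 32


def shfl_idx(values, b, c):
    """Simulate an indexed shuffle across one 32-lane warp.

    values: per-lane input values (length 32)
    b:      requested source lane index
    c:      packed operand, clamp in bits 4:0 and segment mask in bits 12:8
    """
    if len(values) != WARP_SIZE:
        raise ValueError("expected one value per lane")
    bval = b & 0x1F
    cval = c & 0x1F
    segmask = (c >> 8) & 0x1F
    out = []
    for lane in range(WARP_SIZE):
        min_lane = lane & segmask                       # start of this lane's segment
        max_lane = min_lane | (cval & ~segmask & 0x1F)  # last lane it may read
        src = min_lane | (bval & ~segmask & 0x1F)       # source lane inside the segment
        if src > max_lane:
            src = lane                                  # out of range: keep own value
        out.append(values[src])
    return out


values = list(range(100, 132))
# Whole-warp broadcast of lane 3: no segment mask, clamp at lane 31.
print(shfl_idx(values, 3, 0x1F))
# Four independent segments of 8 lanes, each broadcasting its own lane 3.
print(shfl_idx(values, 3, (24 << 8) | 0x1F))
```

The second call shows why the result depends on an assumed warp size: the segment mask only makes sense relative to 32 lanes.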
15:10 kubast2_: imirkin: ik it worked just fine just yesterday
15:10 kubast2_: I was testing fan control on stock kernel with proprietary and it worked
15:10 kubast2_: But day after any game doesn't run with nvidia gpu besides non steam ones
15:10 kubast2_: Haven't checked nouveau yet
15:11 kubast2_: Still I noticed a glitch here and there in dead island both in geometry render and loading menu
15:11 kubast2_: *slight glitch
15:12 kubast2_: Vs intel and vs nvidia driver
15:14 imirkin: kubast2_: maxwell, right?
15:14 kubast2_: Yup
15:15 imirkin: yeah, maxwell rendering has issues
15:15 imirkin: stuff randomly goes wrong sometimes
15:15 imirkin: no idea why
15:15 kubast2_: Yeh
15:15 imirkin: since nvidia isn't releasing firmware for maxwell and up, haven't thought it too important to investigate.
15:16 imirkin: kinda sucks for the GM10x people, for whom signed firmware isn't necessary
15:16 imirkin: if you're desperately looking to use nouveau, i strongly recommend kepler. otherwise i strongly recommend amd.
15:17 kubast2_: I see. I will be getting a Kepler test card, and yeah, that's my laptop; it's already a month old at least
15:17 kubast2_: Can't return it since I have replaced thermal paste in it
15:18 kubast2_: Even changing ram makes this sorta glue thing set off so the warranty might have been gone right after I added in ram
15:21 nyef: ... I don't know if I'm more horrified that you messed with a laptop that was still under warranty, or that you got a laptop that was still under warranty in the first place. d-:
15:21 kubast2_: Lol I mean 4GB of ram was a little low under windows :d
15:23 kubast2_: And I got IT technician title(people who manage to get it range from "I play games" to "I create cheats for games")
15:23 kubast2_: nyef I mess with my phones under warranty
15:23 kubast2_: Why not a laptop
15:24 nyef: Okay then. I usually just buy out-of-warranty hardware to begin with.
15:24 kubast2_: Not my cash/decision 🤷
15:25 imirkin: nyef: not everyone has your level of impulse control :)
15:26 nyef: Heh. Impulse control, he calls it.
15:26 kubast2_: My dad didn't question it while my mom was mad when she saw the heatsink
15:26 kubast2_: lol
15:26 imirkin: nyef: every so often, someone wants to buy a toaster oven from the store rather than off craigslist :p
15:26 kubast2_: Cause it's paid off only in half
15:27 nyef: ... If I buy a toaster oven from the store, it'll be because I'm planning to turn it into a reflow oven.
15:27 kubast2_: imirkin well I could use my old hp laptop until it would die and a desktop but they decided to buy me a new laptop so I sorta nodded in to that
15:28 imirkin: nyef: yeah, want to get high quality gear for that. don't want old breadcrumbs getting reflowed into the pcb ;)
15:28 nyef: Exactly!
15:29 nyef: Or bacon grease, for that matter.
15:29 kubast2_: How does voltage control work btw? Does gpu bios sets the upper limit ,firmware/driver or it's done in hardware?
15:29 kubast2_: As in the upper limits
15:29 imirkin: kubast2_: at the hw level? you set it to whatever you want
15:30 imirkin: the bios specifies what levels to set it to
15:30 imirkin: and it would behoove one to respect it if one doesn't want to fry the board
15:30 nyef: Hunh. The GENERIC_CONDITION 0x07 thing only kicks in when I have the LVDS panel attached.
15:31 imirkin: nyef: skeggsb looked into what that thing did, and decided it wasn't worth worrying about, i think
15:31 nyef: Yeah, in this case it seems like it affects something like 6-8 SOR register values.
15:32 imirkin: i just mean he looked into how the condition was computed
15:32 nyef: Ahh.
15:32 imirkin: basically there's a way to toggle execution
15:33 imirkin: and the *_CONDITION things set that toggle based on ... stuff. depending on the specific op.
15:33 nyef: So, not worth worrying about, but also not worth suppressing the "unknown" message? (-:
15:33 imirkin: i guess not. his call. doesn't bother me enough.
15:33 nyef: Yeah, I got the basics on how the script interpreter works over last night and this morning.
15:34 imirkin: ;)
15:34 imirkin: and then you can look at the vbios which has the x86 interpreter in it as well
15:34 imirkin: to figure out what to do with the thing
15:34 imirkin: but it's not always easy to parse through. it used to be this nice simple table dispatch thing.
15:34 imirkin: not so much anymore, i think =/
15:35 nyef: Is disassembling that fair game, or should it just be run under emulation with various data corruptions?
15:35 imirkin: but i'm sure ben's got it figured out
15:35 imirkin: IANAL :)
15:35 nyef: Heh.
15:36 nyef: More asking after nouveau policy than legal advice here. d-:
15:36 imirkin: we're just like the army
15:36 imirkin: don't ask, don't tell
15:36 nyef: Ah, the plausible deniability approach. That works, I guess.
15:40 imirkin: pendingchaos: btw, i dunno if you're interested in this sort of thing, but...
15:40 imirkin: i just got reminded of this by jason's recent patch for nir (which covers a somewhat different case)
15:40 imirkin: texture operations have a "live only" flag ("NDV" iirc?)
15:41 imirkin: basically this is the difference between the texture op retrieving results for each fragment in the quad, or for only the ones that are covered/non-killed
15:41 imirkin: the reason why you wouldn't always just set this is that it affects derivatives
15:41 imirkin: (the automatic kind)
15:42 imirkin: so we can flip on the liveonly flag if none of the texture's *results* are ever used in a value chain that leads to another texture's *coordinates* which doesn't have an explicit lod or explicit derivatives.
15:42 imirkin: also operations like ddx/ddy obviously also use other quads' info, so also need to be checked in this value chain thing
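The rule described above is a reachability check over the value graph. The sketch below runs it on a toy instruction list; it is not the nv50_ir pass, the `Insn` structure and op names are made up for illustration, and it assumes every source of a `tex` is a coordinate. A texture op may get the flag only if none of its results can reach an instruction that needs values from neighbouring fragments: `ddx`/`ddy`, or a `tex` without explicit LOD or derivatives.

```python
from dataclasses import dataclass, field


@dataclass
class Insn:
    name: str                                  # unique name of the produced value
    op: str                                    # "tex", "ddx", "ddy", "alu", ...
    srcs: list = field(default_factory=list)   # names of the values it reads
    explicit_lod: bool = False                 # tex only: explicit LOD or derivatives


def needs_neighbours(insn):
    """True if the op computes or relies on screen-space derivatives."""
    if insn.op in ("ddx", "ddy"):
        return True
    return insn.op == "tex" and not insn.explicit_lod


def can_set_liveonly(tex, insns):
    """Walk everything reachable from tex's results, conservatively."""
    users = {}
    for insn in insns:
        for src in insn.srcs:
            users.setdefault(src, []).append(insn)
    seen, work = set(), [tex.name]
    while work:
        name = work.pop()
        for user in users.get(name, []):
            if needs_neighbours(user):
                return False
            if user.name not in seen:
                seen.add(user.name)
                work.append(user.name)
    return True


prog = [
    Insn("coord", "input"),
    Insn("t0", "tex", ["coord"]),
    Insn("a", "alu", ["t0"]),
    Insn("t1", "tex", ["a"]),       # dependent lookup with implicit derivatives
    Insn("out", "export", ["t1"]),
]
print(can_set_liveonly(prog[1], prog))  # False: t0 feeds t1's coordinates
print(can_set_liveonly(prog[3], prog))  # True: t1 only feeds the export
```

The walk keeps going through instructions that are themselves safe, so a value chain that eventually reaches a `ddx` still blocks the flag.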
15:43 imirkin: i think there's a trello card about this already
15:44 nyef: Umm... Did the various multi-threading issues ever get sorted out, btw?
15:44 imirkin: this is true in the majority of texture usage, so can be a big boost esp in high geometry situations
15:44 imirkin: nyef: no
15:44 nyef: Ah. Damn.
15:45 pendingchaos: yeah, there is: https://trello.com/c/bW7SYHtW/56-add-pass-to-set-liveonly-on-tex-instructions-when-possible
15:46 nyef: Is the threading damage purely userland, or is there also a kernel component to it?
15:46 pendingchaos: apparently there's some hanging problems with Payday2 and with texlod
15:47 imirkin: my guess is there's something wrong with the patches
15:47 imirkin: iirc i wasn't a huge fan of Karol's approach, although this was a very long time ago
15:47 kubast2: So I have tried to makepkg -S https://github.com/hakzsam/archlive-nouveau/tree/master/pkg/xf86-video-nouveau-git this since the one in aur doesn't want to install alongside mesa-git install ,but doing makepkg -S has an empty src directory and 1071 bytes tar ball
15:48 imirkin: karolherbst committed on May 28, 2016
15:48 imirkin: so yeah. it's been a long time in the making :)
15:52 imirkin: that pass seems overly conservative, but does seem like it ought to work...
15:52 imirkin: probably missing something silly
15:56 kubast2: got it
15:56 kubast2: -s
16:12 karolherbst: imirkin: which pass?
16:12 karolherbst: ahh texonly live
16:12 karolherbst: yeah, it hangs with some games
16:12 nyef: Missing something silly, like a save vs. infinite loop?
16:13 karolherbst: and there isn't much of a benefit in games
16:13 karolherbst: I get a 15% perf increase in gputest_furmark though
16:14 karolherbst: imirkin: ohh I remember, there was something odd with txs or something, so I added that "insn->op != OP_TEX" check
16:15 karolherbst: ohh right, it was texlod :D
16:17 karolherbst: pendingchaos: if you want to dig into something _really_ making a difference, then you should take a look at zcull
16:17 karolherbst: this should give us around 15% more perf on average
16:20 karolherbst: imirkin: also if you find some time, mind reviewing the CTS patches? https://github.com/karolherbst/mesa/commits/cts_v2 everything after "nvc0: enable 4.6" basically
16:21 karolherbst: but except those fp64 patches, all can't be merged independently
16:21 karolherbst: well and that disable BGRA4 is a hack
16:33 nyef: Hunh. There's a STEREO_TOGGLE GPIO on this board.
16:49 CounterPillow: Any of the devs interested in a hardware donation of a Quadro FX 3500?
17:40 karolherbst: imirkin: any idea why one has to provide 4 coordinates for a 3d surface load inside PTX? Wondering what that's all about
17:40 karolherbst: ohh nvm
17:40 karolherbst: "and is a four-element vector for 3d surfaces, where the fourth element is ignored. "
17:43 kubast2: svn: E000104: Error running context: Connection reset by peer (connection closed by the other side) >???
17:44 kubast2: compiling llvm-svn for mesa-git
17:44 karolherbst: yeah well, that's hardly our bug :p
17:44 kubast2: so it's downloading from the repository I see
17:45 kubast2: I thought it was compiling at that moment 0.o
17:48 nyef: First clue that something is wrong: "svn". If "svn" is involved, something is almost certainly wrong.
17:49 nyef: But "at least" it's "just" something wrong with source control.
17:49 nyef: (Such as, using subversion for source control in the first place.)
18:25 nyef: Ugh. AFAICT, the GPIO gets converted to eDP operation when running the initial connector scripts, and converted back to LVDS operation once the decision is made to run an "IED script" (whatever that's supposed to mean) on the LVDS connector. But I don't know how the decision to do the IED script thing or to run with eDP is made.
18:26 kubast2: nyef, I think it just didn't like the fact I'm downloading drm-next kernel and lib32-llvm-svn alongside 64bit version of it
18:27 kubast2: well the 64 bit version doesn't compile
18:27 kubast2: tools/clang/include/clang/Basic/Version.inc.in does not exist. I will have to check what's that about
18:27 nyef: ... If the panel is powered up as if it were eDP (which should still work if it's an LVDS panel), does LVDS HPD kick in or something? Or... the EDID channel would be active, wouldn't it?
18:28 nyef: And because they share LCDID, we know that the outputs collide?
18:30 kubast2: yeh the download failed I will just remove everything and restart llvm-git install fresh
18:30 nyef: The LVDS panel would have to come first on the DCB list so that the LVDS I2C channel detect has a chance to run, because otherwise... something something eDP startup delay, maybe?
18:32 nyef: So, post-facto, the best bet for determining if the eDP is usable is to check the GPIO pin to see if it's tied to an SOR for its panel power or if it's in standalone mode.
18:33 * nyef sighs.
18:33 nyef: It all hangs together plausibly, but I have no idea if it's actually the case.
18:36 nyef: Actually, if the GPIO function is tied to the SOR, does it matter what we try to drive it as?
18:36 nyef: (That is, do we even NEED to have a conditional in play?)
18:39 nyef: We'd still need a conditional to keep delays down in the LVDS case, but the nature of the conditional would change.
18:44 nyef: Resume-from-suspend requires eDP retrain. And this fails given insufficient power-on delay from the panel. And... Hrm. fbcon is aggressive about modeset post-HPD.
18:45 nyef: If I use the repower-eDP-on-nvkm_conn_init() thing, but remove the delay, it will break resume when in X11, but maybe fbcon would be sane? And that would tie into the HDMI input thing.
18:52 nyef: Score!
18:53 nyef: No delay, just hit the power toggle and the HPD does the rest.
19:00 nyef: Still the question of if GPIO mode special_sor special_idx sor0_panel_power trumps GPIO out or not...
19:03 karolherbst: imirkin: okay, now I have a good enough understanding of that 3d image thing to be able to write proper code :) the painful part is knowing, if we go that 2d to 3d converting route, which surface operation to flip back to 2d again
19:25 Elv1313: karolherbst, Lyude: Hi again, we spoke about PM a couple days ago. I did my homework on this. I tried with Kernel 4.4 and git-master with my script and with laptop-mode-tools. Totally idle with backlight off, 4.4 takes 6.4 W and git-master+nouveau 12.0 W, almost twice as much. The test setup was running the kernel in single user mode, starting X using `X :0 & sleep 5 && DISPLAY=:0 xterm -c 'powertop'`. laptop-mode and my script have the same result because they do the same thing. I checked everything and the git-master kernel has all the PM features it can have, so the NVIDIA card is really the only possible cause for the doubled power usage
19:27 Elv1313: Any idea how to maybe use the ACPI table to really cut off power from nouveau or any other idea how to get those 5.6 watts back?
19:27 nyef: ... Using a bigger battery? (-:
19:27 Elv1313: nyef: It's a Lenovo W series, it's as big as they get
19:28 Elv1313: (W=workstation / desktop replacement)
19:48 imirkin: karolherbst: well the fixup has to receive data to know which way to set it
19:53 karolherbst: imirkin: sure, but we kind of have to coordinate between uniform locations and what texture we actually bind there, also indirects. Like would it be possible to have a texture2D[4] thing and bind 2 2D textures and layer 2/4 from a 3D texture?
19:55 karolherbst: (also we would need to invalidate the shaders after each glUniform* call as well, I guess)
19:55 nyef: About the only other thing that I can think of at this point is that we may not want to retrain eDP links immediately on resume if they're going to fail anyway, we should "just" wait for the HPD PLUG event.
19:55 karolherbst: or at least each glUniform on a sampler/image
19:57 Elv1313: Is it normal that I can't `modprobe -r nouveau` while using the Intel X11 stack?
20:08 imirkin: karolherbst: always load it
20:08 imirkin: sometimes it won't get used but who cares
20:08 imirkin: the fixup should only be for the texture type of the surface op
20:17 karolherbst: imirkin: no, I meant the fixup from 3d back to 2d
20:17 karolherbst: because if I load 0 and convert those to 3d a lot of things just fail
20:19 karolherbst: and I doubt it is trivial to do that, because of indirects and maybe other weird stuff
20:24 nyef: imirkin: Do I recall correctly that the multi-thread stuff is mostly the per-context pushbuf thing?
20:25 nyef: Or is that orthogonal, or does the issue go rather deeper than that?
21:16 nyef: Well, even if these fence and pushbuf patches _were_ sufficient to fix the issue, I wouldn't have applied them. /-:
21:17 nyef: Possibly a good starting point, at least?
22:07 nyef: ... If these ARE the right starting point, then I think I have hardware coverage, at least.
22:27 imirkin: nyef: it's related, but it goes much much deeper
22:28 imirkin: karolherbst: the fixup runs always no matter what. it's not so much about changing as it is about setting.
22:28 imirkin: each time it should set it based on whether it's 2d or 3d.
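The "setting, not changing" idea can be sketched as a patch step that always starts from the compiled words and writes the target bit from the current state. Everything below is hypothetical: the `Fixup` structure, the bit position `TARGET_3D_BIT` and the state key `image_is_3d` are invented for illustration and do not mirror the real nv50_ir fixup interface or the real instruction encoding.

```python
from dataclasses import dataclass
from typing import Callable, List

TARGET_3D_BIT = 1 << 5  # hypothetical bit selecting a 3D target


@dataclass
class Fixup:
    word_index: int                    # which instruction word to patch
    apply: Callable[[int, dict], int]  # (old word, state) -> new word


def set_surface_target(word: int, state: dict) -> int:
    """Force the target bit to match the bound image, whatever it was."""
    if state["image_is_3d"]:
        return word | TARGET_3D_BIT
    return word & ~TARGET_3D_BIT


def run_fixups(code: List[int], fixups: List[Fixup], state: dict) -> List[int]:
    """Return a patched copy of the shader binary for the given state."""
    patched = list(code)
    for fixup in fixups:
        patched[fixup.word_index] = fixup.apply(patched[fixup.word_index], state)
    return patched


# The generator always emits the 3D form (bit set, three coordinates).
code = [0x1000, 0x2020]
fixups = [Fixup(1, set_surface_target)]
print([hex(w) for w in run_fixups(code, fixups, {"image_is_3d": True})])
print([hex(w) for w in run_fixups(code, fixups, {"image_is_3d": False})])
```

Because the patch is recomputed from the original words each time, running it repeatedly with different bindings never accumulates stale state. It still needs one 2D-or-3D answer per instruction.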
22:28 imirkin: nyef: basically i think that all the buffer tracking, fencing, and command submission needs to be redone
22:29 nyef: Ouch!
22:29 nyef: Are the fence and pushbuf patches still useful in the face of that?
22:31 imirkin: what patches?
22:31 nyef: Commits linked from https://trello.com/c/9Q2WB0OP
22:54 imirkin: oh heh
22:55 imirkin: those are super-old
22:55 imirkin: i had an attempt which completed that approach
22:55 imirkin: i realized it was unworkable.
22:55 imirkin: and that's when i arrived at the conclusion that the whole thing had to be ripped out and redone from scratch
22:56 karolherbst: imirkin: okay sure, but how do we actually know if it is a 2d or a 3d one? Wouldn't we have to know what image is actually used with that operation and again, what about indirects?
22:56 imirkin: (those patches are gone, since some jokers decided it'd be hilarious to start shipping them as part of a distro)
22:56 imirkin: karolherbst: has to get passed in, same way as data for other fixups is passed in.
22:56 imirkin: karolherbst: indirects can use the base index
22:56 imirkin: oh ...
22:56 imirkin: hrm.
22:56 karolherbst: ;)
22:56 imirkin: you can have an array of image2D where some are one, some are the other.
22:56 imirkin: well THATS annoying
22:56 karolherbst: yes
22:57 imirkin: i disapprove of such uses.
22:57 imirkin: see what blob does? :)
22:57 karolherbst: 2d
22:57 imirkin: they have the same problem...
22:57 imirkin: ok
22:57 karolherbst: but they fail the piglit test
22:57 imirkin: so they make a shadow surface
22:57 imirkin: oh lol
22:57 imirkin: problem solved.
22:57 karolherbst: well
22:57 karolherbst: in piglit we have one image
22:58 karolherbst: and two bindings
22:58 karolherbst: so we bind layer 0 and layer 5 and write from 5 to the fb and write 0x33 into layer 0
22:58 karolherbst: in the CTS we have two images and copy each layer one by one through a 2d image inside the shader
23:00 karolherbst: the piglit test doesn't make much sense to be honest
23:00 karolherbst: but it works on intel...
23:00 nyef: So, neither necessary nor sufficient. Good to know.
23:01 karolherbst: ohh wait no, in piglit we write 0x33 into layer 5
23:02 karolherbst: so copy layer 5 into the FB and overwrite it with 0x33
23:03 karolherbst: imirkin: that is the FP shader inside the CTS: https://gist.githubusercontent.com/karolherbst/56f7e5269cd086ea5ed90ef9b684121e/raw/5d885ea67d0505845c3afdf63f987c628acc76a2/gistfile1.txt
23:03 karolherbst: (piglits is annoying, because there is some weirdo coordinate magic going on)
23:05 nyef: imirkin: I guess the next questions are basically scope of changes (anything outside of gallium/drivers/nouveau/?), and required approach?
23:09 nyef: (I wouldn't be surprised if there are further reasons why I might not want to run nouveau on my main system, but this is the remaining big one that I'm aware of.)
23:12 karolherbst: imirkin: in the worst case, we do a slct against Z > 0 :(
23:13 karolherbst: but uhm
23:14 karolherbst: I guess we can't because we really want to branch or predicate that stuff
23:24 Subv: mm, how are SSBOs uploaded to the GPU in maxwell? i imagined there was a sort of 'base address' method for them but that doesn't seem to be the case
23:25 HdkR: SSBOs are bound as a GPU pointer
23:26 HdkR: On an UMA system it just needs the mapping to be correct
23:27 HdkR: They don't go in to any binding table or anything if that is what you're expecting
23:27 Subv: the GPU still needs to know where the SSBO actually is, that's what i'm looking for
23:27 Subv: there's GLOBAL_BASE but it seems that went away with Kepler
23:28 HdkR: Right
23:28 Subv: there's also LOCAL_BASE but that sounds a little counterintuitive if i'm looking for the start of global memory
23:28 HdkR: I'm guessing that is the base of thread local memory
23:29 Subv: that's what nouveau sets it to
23:29 HdkR: SSBOs will literally end up being a pointer as a uniform
23:30 HdkR: Which will be used with the load/store instructions
23:30 HdkR: So you won't know it is a SSBO until you encounter an LD* instruction
23:31 Subv: i see
23:33 HdkR: The only reason why there is a limit on the number of "bound" SSBOs in GL is because in GL the pointers take up some space in the driver managed cbank.
23:33 HdkR: So you don't want too much space taken up with them