00:15 Lekensteyn: karolherbst: yes you can use the AML debugger to directly call methods, read and write values
00:16 Lekensteyn: karolherbst: Q0L2 seems to be a trigger to bring the link state to L2, Q0L0 should bring it to L0
00:19 Lekensteyn: CPEX is probably some global non-volatile storage, https://github.com/Lekensteyn/acpi-stuff/blob/master/Clevo-P651RA/SANV-0x5FF9BD98.txt#L100
00:19 Lekensteyn: see also the commit message and notes in https://github.com/Lekensteyn/acpi-stuff/commit/fa5d01937fb182689d2dc603cb5a0fdf90d78e8c#diff-d70aaaf0d7a968013a31d6a31abaa609R120
00:20 Lekensteyn: I can confirm in an earlier trace from win10 (bare metal with kernel debugger and AMLi debugger) that CPEX=0x506e3
00:21 Lekensteyn: if you change CPEX and thereby change the code path... that does not seem right. Can you confirm that you actually save power?
00:24 karolherbst: Lekensteyn: yeah, I do save power
00:24 karolherbst: the value changed instead is called P0L2 and P0L0
00:25 Lekensteyn: yeah, I am not sure when P0L0 and P0L2 are supposed to be used. CPEX is not modified anywhere in the firmware, it is only read by the ACPI table
00:26 Lekensteyn: and these ACPI tables are known to have a lot of boilerplate and unremoved code, integrators do not seem to bother cleaning it up
00:26 karolherbst: Lekensteyn: the savings also seem to be equally good
00:26 karolherbst: I reach below 7W either way
00:27 karolherbst: Lekensteyn: what I am wondering about is, if that value is never written anyway, where does the initial value come from
00:29 Lekensteyn: my guess is that the UEFI firmware initializes it
00:29 karolherbst: yeah... but where and how
00:30 Lekensteyn: megabytes of blob I am afraid :/
00:30 karolherbst: anyway, using P0L2 instead of Q0L2 makes it work
00:30 karolherbst: which is already strange enough
00:30 karolherbst: I don't see any difference in power savings
00:32 Lekensteyn: what if you boot with an acpi_osi (and maybe acpi_rev_override, spelling?) option that avoids this Q0L2 code path completely and falls back to writing the Link Disable bit?
00:34 karolherbst: yeah... dunno. This workaround is disabled in the stock kernel I have...
00:34 karolherbst: that acpi_rev_override thing
00:34 karolherbst: _but_
00:34 karolherbst: this is no solution
00:34 karolherbst: and I think it didn't work
00:34 karolherbst: but maybe I should retest it
00:37 Lekensteyn: I've been booting with acpi_osi="!Windows 2015" since forever to force the old code path and avoid the hang. The only time I disabled it was while trying to debug and find a solution to this issue (failed)
00:37 karolherbst: mhh
00:37 karolherbst: yeah, but again, this is no proper solution
00:37 karolherbst: how would we be able to fix the issue for users?
00:38 karolherbst: that the old path works tells us nothing in particular
00:40 karolherbst: Lekensteyn: anyway, the old path doesn't look like D3cold anyway
00:40 karolherbst: and I highly doubt there are big power savings
00:40 airlied: except that Windows behaviour/expectations changed
00:41 karolherbst: sure, just that we already knew that
00:42 karolherbst: mhh, the old path disables the link via LCTL
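For reference, the "Link Disable bit" that the old code path writes is bit 4 of the PCIe Link Control register (LCTL), which sits at offset 0x10 inside the PCI Express capability structure. Below is a minimal Python sketch of setting and clearing that bit on a raw 16-bit register value. The helper names are made up for illustration, and actually reading or writing the register on real hardware is deliberately not shown.

```python
# Constants follow the PCI Express Link Control register layout.
PCI_EXP_LNKCTL = 0x10        # offset of Link Control within the PCIe capability
PCI_EXP_LNKCTL_LD = 0x0010   # Link Disable (bit 4)


def link_disable(lnkctl: int) -> int:
    """Return the 16-bit LCTL value with the Link Disable bit set."""
    return (lnkctl | PCI_EXP_LNKCTL_LD) & 0xFFFF


def link_enable(lnkctl: int) -> int:
    """Return the 16-bit LCTL value with the Link Disable bit cleared."""
    return lnkctl & ~PCI_EXP_LNKCTL_LD & 0xFFFF


def is_link_disabled(lnkctl: int) -> bool:
    """True if the Link Disable bit is set in the given LCTL value."""
    return bool(lnkctl & PCI_EXP_LNKCTL_LD)
```

All other bits of the register are left untouched, which is what a read-modify-write of LCTL would need to preserve.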
00:44 Lekensteyn: setting acpi_osi or directly fiddling with the CPEX field in NVS is indeed not an option. To fix it for users, we'll have to dig deeper to understand the cause and workaround/fix it
00:46 airlied: I assume then Windows has some settings that it does and the BIOS expects that we don't match
00:46 airlied: can we trace Windows there?
00:47 karolherbst: we have some windows traces, but only with an emulated bridge controller :/
00:47 Lekensteyn: I have some ACPI traces from Windows on bare metal, but no PCI accesses
00:47 karolherbst: Lekensteyn: ohh, right, but that's only ACPI
00:47 karolherbst: that won't help us here I think
00:48 karolherbst: I'd expect some magic happening inside the pciport driver or however that one is called in windows
00:49 Lekensteyn: do you have any contacts that are familiar with windows driver development and know how to trace PCI config (and mmio?) accesses?
00:49 Lekensteyn: on bare metal
00:49 airlied: nope, wish we did :-P
00:50 karolherbst: maybe we should just hire windows kernel devs every 3 years so we can ask them such questions :p
00:55 karolherbst: Lekensteyn: anyway, I highly doubt we are in any way able to dig even deeper. I am not sure if nvidia could just tell us what's up here, even if they did know it. And this entire issue just sounds like something Intel would be responsible for
00:55 Lekensteyn: maybe ask your parent company, IBM :P
00:55 karolherbst: especially because the involved registers are not documented
00:55 karolherbst: Lekensteyn: not yet :p
00:56 karolherbst: but why would they know something?
00:56 Lekensteyn: maybe they have Windows devs?
00:56 karolherbst: for what hardware?
00:56 karolherbst: userspace stuff? for sure
00:56 karolherbst: but kernel drivers?
00:57 Lekensteyn: hm ok, maybe such a large org has former kernel devs
00:57 Lekensteyn: but yeah, does not help if Intel is to blame
00:57 karolherbst: probably
01:01 karolherbst: this issue is already a complete pain in the ass and it doesn't appear to be getting any less of a gigantic headache :/ I kind of hoped that digging deeper into all this would leave us with fewer unanswered questions, but it always seems like the opposite is the result of all this
01:01 karolherbst: at least the list of potential workarounds is increasing
01:02 airlied: Lekensteyn: IBM doesn't even have an x86 division anymore
01:02 airlied: they sold it years ago
01:03 airlied: just ask as many people as possible, intel, nvidia, upstream devs from either of the above
01:03 karolherbst: I already pinged bjorn and nvidia
01:03 karolherbst: no idea who to ask at intel
01:04 karolherbst: but I started to ask around a year ago or something
01:04 karolherbst: mhh, maybe more like half a year
01:05 karolherbst: heh.. first written question was actually august last year
01:05 karolherbst: and I am sure I've talked about it before already
01:05 airlied: ask the thunderbolt person
01:05 airlied: Mika?
01:07 karolherbst: Mika Westerberg?
01:07 karolherbst: maybe
01:58 airlied: karolherbst: yeah he'd be a good start
03:48 JayFoxRox: how does linear texture interpolation / addressing work on NV20? [xbox GPU in particular]
03:49 JayFoxRox: how many fractional bits are there, and how does it interpolate?
04:18 imirkin: JayFoxRox: not sure that's fully documented
04:36 JayFoxRox: do we know how they interpolate at all? it's not the algorithm from the GL 2.x spec I think - somehow when drawing a checkerboard, it seems to bias pixels near the edges in texture space
04:37 JayFoxRox: so a 16x16 checkerboard with 4x4 tiles, displayed at 640x480 will still render as a 16x16 checkerboard - just blurry
04:37 imirkin: certainly not from GL 2.0
04:37 imirkin: probably more like D3D8?
04:38 imirkin: what did you expect btw?
04:40 imirkin: i assume you're using GL_LINEAR interp?
04:41 JayFoxRox: all following screenshots on OpenGL 4.x on intel HD 4000 / MESA:
04:41 JayFoxRox: https://cdn.discordapp.com/attachments/428359408204906497/592936624153690132/2019-06-25-063753_648x506_scrot.png - 64x64 texture, NEAREST filter
04:41 JayFoxRox: https://cdn.discordapp.com/attachments/428359408204906497/592936628113375233/2019-06-25-063806_648x506_scrot.png same texture. but LINEAR filter
04:41 JayFoxRox: https://cdn.discordapp.com/attachments/428359408204906497/592936625978474496/2019-06-25-063759_648x506_scrot.png < mimicking what I see on Xbox, this is actually rather accurate, but I don't fully understand my own code + it won't work if the checkerboard is alternating every pixel
04:41 imirkin: it's probably a wider interpolation
04:42 imirkin: https://docs.microsoft.com/en-us/windows/desktop/direct3d9/d3dtexturefiltertype
04:42 imirkin: look at some of the ones further down
04:43 imirkin: otoh those aren't supported in hw...
04:50 imirkin: come to think of it, it COULD be highly inaccurate fractional coordinates...
04:50 JayFoxRox: no.. nevermind. I'm an idiot
04:50 JayFoxRox: it probably is a convolution filter that I accidentally enabled >.<
04:50 imirkin: lol, oops
04:51 JayFoxRox: I had been trying to reduce fraction precision and adding biases and working with LUTs and all sorts of stuff already..
04:51 JayFoxRox: when you said it might be a different filter I was going to show you my low-level code which sets up the texture unit.. and then I realized the mode is wrong >.<
04:52 JayFoxRox: so thanks :P
04:52 imirkin: hehe np
04:52 JayFoxRox: (I have been at this for 7 hours or so)
04:52 imirkin: i find that after about 24h, i start believing things like multiple inheritance are a good idea.
04:52 imirkin: so you still have a ways to go :)
06:16 JayFoxRox: their quincunx filter appears to be broken - also no idea how their mipmapping works. I'll probably document my rough findings on XboxDevWiki soon and then I'll probably add them to nouveau docs once I've figured out the details
06:18 imirkin: i dunno how you're testing this
06:19 imirkin: but if you're not supplying depth properly
06:19 imirkin: then anisotropic filtering will play a number on the results
06:20 JayFoxRox: I don't think anisotropic filtering is turned on - but I should double check tomorrow
06:21 JayFoxRox: I basically tried setting my 16x16 texture to be entirely black, with 3 white dots spaced out. I then rendered it at 640x480 and checked the pixel values to guess where the convolution filter sampled
06:21 imirkin: is it a mip-mapped texture?
06:22 imirkin: basically i'm concerned that you're picking the wrong mip to sample from
06:22 JayFoxRox: no mipmap supplied and max level 0
06:22 JayFoxRox: the gaussian 3x3 has the expected 1,2,1;2,4,2;1,2,1 - the quincunx has 1,0,1; 1,4,1; 0,0,0
06:22 JayFoxRox: the linear and nearest filters look as expected, too fyi
06:22 JayFoxRox: so only the quincunx is very strange, as it's asymmetric
06:23 JayFoxRox: I'd have expected a rotated gaussian 3x3 or something - but that doesn't seem to happen
06:23 JayFoxRox: (take the contribution factors in those kernels with a grain of salt btw - I'm basically guessing values by looking at my TV. I don't have a good method to dump framebuffers in my unit test)
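The experiment described here (a black 16x16 texture with white dots, filtered with a 3x3 kernel) can be sketched in plain Python. The kernel weights are the rough values quoted in the chat, not confirmed hardware behaviour. The clamp-to-edge addressing, the orientation of the kernel (row `j`, column `i` weights the texel at offset `(i - 1, j - 1)`), and all function names are assumptions made for illustration.

```python
# Rough kernel weights as reported in the chat (guessed by eye from a TV).
GAUSSIAN_3X3 = ((1, 2, 1), (2, 4, 2), (1, 2, 1))
QUINCUNX_OBSERVED = ((1, 0, 1), (1, 4, 1), (0, 0, 0))


def dot_texture(size=16, dots=((8, 8),)):
    """Build a size x size black texture with white texels at the given (x, y)."""
    texture = [[0.0] * size for _ in range(size)]
    for x, y in dots:
        texture[y][x] = 1.0
    return texture


def apply_kernel(texture, kernel):
    """Apply a normalized 3x3 kernel with clamp-to-edge addressing (assumed)."""
    height, width = len(texture), len(texture[0])
    total = sum(sum(row) for row in kernel)
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for j, row in enumerate(kernel):
                for i, weight in enumerate(row):
                    # Clamp the sample position to the texture bounds.
                    sy = min(max(y + j - 1, 0), height - 1)
                    sx = min(max(x + i - 1, 0), width - 1)
                    acc += weight * texture[sy][sx]
            out[y][x] = acc / total
    return out
```

With a single white dot, the output of `apply_kernel` is a (mirrored) copy of the kernel around the dot, so comparing opposite neighbours of the dot makes the asymmetry of the observed quincunx weights visible directly.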
06:25 JayFoxRox: my software renderer [where I emulate this] has these results:
06:25 JayFoxRox: https://cdn.discordapp.com/attachments/428359408204906497/592954076673736717/2019-06-25-074733_648x506_scrot.png GL_NEAREST
06:25 JayFoxRox: https://cdn.discordapp.com/attachments/428359408204906497/592954319435989012/2019-06-25-074840_648x506_scrot.png gaussian 3x3 convolution
06:25 JayFoxRox: https://cdn.discordapp.com/attachments/428359408204906497/592954078833803265/2019-06-25-074739_648x506_scrot.png quincunx convolution
06:25 JayFoxRox: this is using the rough kernel approximations I have seen on the TV - and it looks very very similar at least
06:26 JayFoxRox: the code for these experiments is here: <https://github.com/JayFoxRox/software-shader-texture/blob/master/main.c>
06:26 imirkin: mwk: in case this is interesting to you --^
06:27 JayFoxRox: [and the xbox code is at <https://github.com/JayFoxRox/nxdk/pull/13>]
06:27 JayFoxRox: [the xbox code is outdated - I'll push more to it tomorrow probably]
10:22 karolherbst: pmoreau: btw, I think I know a way how to make curro happy about the changes... might just need a little more work on the driver side :/
15:09 imirkin_: Lyude: were you able to find a nva0? or should we just revert that i2c change?
15:17 RSpliet: imirkin_: I believe I have an NVA0 at home. Not 100% sure it fits in my PC, but I should have one. Is there anything that needs testing?
15:17 imirkin_: if not, then i'm sure your PC fits inside it :)
15:18 imirkin_: there was a report of fan control fail with some recent i2c changes
15:18 RSpliet: It is that kind of a card IIRC. Anyway, you should have access to the VBIOS to see if my card has a similar i2c fan set-up
15:44 Lyude: imirkin_: I didn't get a chance to check, but feel free to revert it for now if you think that's a good idea
15:44 Lyude: I don't think it ended up fixing the original problem it was intended to fix anyway
15:44 imirkin_: well, i'm just looking at it as a regression
15:44 imirkin_: which is !good
15:45 Lyude: yeah-hence why I'm ok with just having it reverted for now :p
15:45 imirkin_: but i also know it's highly annoying to have one's code reverted, so i wanted to give you an opportunity to fix it before doing a revert
15:45 Lyude: ahh, alright
15:45 Lyude: btw: skeggsb, ^ if I don't have an nva0 do you think I could get ssh access to one of yours?
15:46 imirkin_: nva0's tended to have "interesting" sensor/etc setup
15:46 imirkin_: so it might even be something specific to that person's
17:30 karolherbst: imirkin_: wow.. I let that user test increasing the PCIe link speed on his nearly 4k screen, "20% faster" from 2.5 to 8.0...
17:31 karolherbst: something is terribly wrong
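For context, the raw per-lane numbers behind that link-speed change can be worked out from the line encodings: 2.5 GT/s links use 8b/10b encoding and 8.0 GT/s links use 128b/130b. The sketch below computes theoretical per-lane bandwidth only; it ignores protocol overhead and says nothing about what the user's workload actually needs. The function name is made up for illustration.

```python
def lane_bandwidth_mb_s(gt_per_s, payload_bits, encoded_bits):
    """Theoretical per-lane bandwidth in MB/s for a given rate and line encoding."""
    bits_per_s = gt_per_s * 1e9 * payload_bits / encoded_bits
    return bits_per_s / 8 / 1e6


gen1 = lane_bandwidth_mb_s(2.5, 8, 10)      # 2.5 GT/s, 8b/10b encoding
gen3 = lane_bandwidth_mb_s(8.0, 128, 130)   # 8.0 GT/s, 128b/130b encoding

print(f"2.5 GT/s: {gen1:.1f} MB/s per lane")
print(f"8.0 GT/s: {gen3:.1f} MB/s per lane")
print(f"ratio:    {gen3 / gen1:.2f}x")
```

The theoretical ratio is a little under 4x, which is the number the reported "20% faster" result has to be read against.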
17:35 Lyude: imirkin_: oh hey! Looks like I actually have a GTX 275, nva0
17:36 imirkin_: cool. it's likely similar. at least worth a test.
18:02 Lyude: oh-there it goes
18:03 Lyude: yes, I can definitely reproduce this issue :)
18:03 Lyude: which is kind of surprising, seeing as this thing doesn't seem to have a fan readout
18:05 imirkin_: surprised it doesn't have a tach
18:05 imirkin_: usually there are additional sensors
18:05 Lyude: mhm
18:05 imirkin_: it won't be on the "primary" thing
18:05 Lyude: [ 235.649236] nouveau 0000:22:00.0: DRM: GPU lockup - switching to software fbcon
18:05 Lyude: also happens
18:05 imirkin_: heh
18:05 imirkin_: that's just FANtastic
18:05 Lyude: i'm going to make sure that's related to the i2c changes (if so, I'm kinda impressed)
18:05 Lyude: hehehe
18:10 imirkin_: crazier things have happened
18:13 karolherbst: adding tons of new projects to the EVoC/GSoC idea page
18:13 Lyude: karolherbst: do tell!
18:13 karolherbst: Lyude: one of the most "fun" projects would be to make nouveau ready for eGPU
18:13 karolherbst: that's a super good task though
18:13 karolherbst: requires 0 understanding of GPUs
18:15 Lyude: interesting, the lockup is unrelated to the i2c changes
18:16 Lyude: fan speed sure is related though
18:16 imirkin_: ok. don't worry about the lockup (for now)
18:16 imirkin_: you can boot with nouveau.noaccel=1 nouveau.nofbaccel=1 to avoid the lockup
18:16 Lyude: imirkin_: yeah i wasn't planning on it :p
18:16 Lyude: looking to figure out what exactly is breaking here
18:17 imirkin_: although if you could, afterwards, spend a bit of time to see if the nva0 lockup is recent, that'd be appreciated.
18:17 Lyude: yeah I can do that
18:19 karolherbst: Lyude: https://cgit.freedesktop.org/wiki/xorg/commit/SummerOfCodeIdeas.mdwn?id=5d8b13f1fc794a455174d9bf9960d7a615d1933c
18:22 imirkin_: difficulty?
18:24 karolherbst: fun fact, with Tesla GPUs it seems to work, as the fini path isn't locking up
18:24 karolherbst: well, unplug with no users that is
18:30 Lyude: well that sure is interesting. there are absolutely no i2c accesses in nvkm/subdev/therm/nv50.c
18:39 Lyude: Hah
18:39 Lyude: I have a theory that, after looking at this for long enough, seems a little less crazy
18:40 Lyude: there are indeed i2c accesses in this file, but I think they are all being done using nvkm_wr32() on the address range where the i2c busses live
18:40 imirkin_: iirc there's some bit-banging going on
18:40 imirkin_: for PWM
18:41 imirkin_: maybe that's for like nv40
18:42 imirkin_: it definitely uses gpio's
18:43 imirkin_: might be in the same range
18:43 imirkin_: (also looking at i2c/auxg94.c)
18:43 Lyude: they look like they're in the same range, yeah
18:43 imirkin_: there's no fini on i2c though...
18:43 imirkin_: not sure what you're shutting down
18:44 Lyude: yeah, and the patch doesn't actually change anything on the hw
18:44 imirkin_: o i see... it just makes nvkm_i2c_bus_acquire fail
18:45 Lyude: mhm-which we don't actually call anywhere here
18:45 Lyude: in the therm code I mean
18:45 imirkin_: but you only do it for aux
18:46 Lyude: no, we do it for both aux and bus in the mainline patch
18:46 imirkin_: you don't do it for e.g. pad_init from what i can see
18:47 Lyude: I don't think we need to since we don't expose pads to userspace, and having the bus/aux disabled is enough
18:47 imirkin_: erm
18:47 imirkin_: you might not expose them to userspace
18:47 imirkin_: but they still have ->enabled =false
18:47 imirkin_: which means that the get will fail
18:48 Lyude: mhm, again though - we aren't actually doing any pad, aux, or i2c stuff from nv50.c (at least not with nvkm, the nvkm_rds/wrs in that file are weird)
18:48 imirkin_: oh i see. hrm.
18:48 imirkin_: well, that stuff is a bit elsewhere. there are various sensors
18:48 imirkin_: check like iccsens
18:48 imirkin_: and some others
18:49 Lyude: ahhh, I didn't notice that
18:50 imirkin_: can you throw a WARN() in there when you return -EIO?
18:51 imirkin_: i suspect that should help you track down when it's happening
18:53 Lyude: yeah I can, I think I already see where it's messing up though :)
18:53 Lyude: this shouldn't take long to fix
18:53 imirkin_: kk
19:54 Lyude: lord, that explains it. it isn't iccsense, it's bios
19:57 RSpliet: init scripts?
19:57 karolherbst: Lyude: why would it be iccsense?
19:57 Lyude: no, it's definitely our driver
19:57 Lyude: karolherbst: imirkin_ had suggested it
19:57 imirkin_: i was just pointing out things use i2c
19:57 karolherbst: mhh, iccsense is just reading from i2c
19:58 karolherbst: in case there is a therm sensor
19:58 karolherbst: which isn't the case for tesla
19:58 karolherbst: I think
19:58 karolherbst: fermi is the first one where it becomes relevant
19:58 imirkin_: nva0 has various fancy shit
19:58 karolherbst: I highly doubt they have power sensors though
19:58 karolherbst: s/therm/power/ (from my older replies)
19:58 Lyude: anyway-the fix for this should be simple then, enable i2c busses in preinit
20:01 Lyude: RSpliet: btw, you were probably right guessing it's init scripts
20:02 Lyude: interestingly enough, this means we've also always been using the i2c busses on tesla GPUs before they're setup when resuming the gpu
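The ordering problem discussed here can be illustrated with a toy model. This is not nouveau code: the class and method names are hypothetical, and the sequence "preinit, then VBIOS init scripts, then i2c init" is an assumption drawn from the conversation. The only behaviour taken from the chat is that acquiring a bus which is not yet enabled fails with -EIO, and that the proposed fix is to enable the busses in preinit.

```python
import errno


class ToyI2cBus:
    """Stand-in for an i2c bus that refuses access until it is enabled."""

    def __init__(self):
        self.enabled = False

    def acquire(self):
        # Mirrors the behaviour described in the chat: disabled bus -> -EIO.
        return 0 if self.enabled else -errno.EIO


class ToyDevice:
    """Hypothetical bring-up sequence: preinit, init scripts, then i2c init."""

    def __init__(self, enable_in_preinit):
        self.bus = ToyI2cBus()
        self.enable_in_preinit = enable_in_preinit
        self.log = []

    def preinit(self):
        if self.enable_in_preinit:
            self.bus.enabled = True

    def run_init_scripts(self):
        # Init scripts may touch i2c (e.g. fan control) before i2c init runs.
        self.log.append(("init_scripts", self.bus.acquire()))

    def init(self):
        self.bus.enabled = True

    def bring_up(self):
        self.preinit()
        self.run_init_scripts()
        self.init()
        return self.log
```

In this model, `ToyDevice(enable_in_preinit=False).bring_up()` records an -EIO for the init-script access, while `ToyDevice(enable_in_preinit=True).bring_up()` records success.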
20:04 pmoreau: karolherbst: What change would it be? (I really need to re-read the latest messages about the pc vs entry point name discussion.)
20:10 karolherbst: pmoreau: well, to implement the serialization of our shader binaries earlier
20:10 karolherbst: this will essentially remove the need for certain hacks
20:11 karolherbst: and we need this anyway
20:11 karolherbst: I am just not sure if we want to get rid of this entry_point field
20:12 pmoreau: Let’s see if I remember this right, the binary generation only happens right before the kernel is launched?
20:12 karolherbst: pmoreau: my idea was to not pass a full spirv and force the driver to compile against each entry point :/
20:12 karolherbst: pmoreau: yes
20:12 karolherbst: and we have to move it to an earlier point
20:12 karolherbst: but curro wants to do it at cl_program creation
20:12 karolherbst: and just compile for all entry points
20:12 karolherbst: but moving it up also means we have to serialize it
20:13 karolherbst: which CL requires anyway
20:13 pmoreau: Right
20:13 karolherbst: I am just not convinced that the driver has to produce a multi entry point capable binary
20:13 karolherbst: my idea is that clover just calls into the driver to compile for a specific entry point
20:13 pmoreau: I agree with you that both should be doable at least.
20:14 karolherbst: yeah
20:14 karolherbst: that's what the CAP is for
20:14 karolherbst: but curro was again hung up on technicalities :/
20:14 karolherbst: it's a bit annoying him being like this
20:14 pmoreau: Regarding the serialisation, do you remember what OpenCL requires?
20:14 karolherbst: yes
20:14 karolherbst: uploading a binary to launch a kernel
20:15 karolherbst: instead of source
20:15 karolherbst: and I think the spec even requires that if you get a binary from the same driver version, it has to work
20:15 karolherbst: although not 100% sure on that
20:16 pmoreau: Sure, I was more thinking as to whether we need to provide a binary/blob containing everything, or just whatever had been compiled.
20:16 karolherbst: only what has been compiled I think... let me check
20:16 karolherbst: uff
20:16 karolherbst: actually
20:16 karolherbst: pmoreau: it's called on the cl_program object
20:16 karolherbst: clGetProgramInfo with param_name=CL_PROGRAM_BINARIES
20:17 karolherbst: but anyway, we can handle that just fine inside clover
20:17 karolherbst: by just having an array of all binaries
20:17 karolherbst: and we could even compile lazily
20:18 pmoreau: “The decision on which information is returned in the binary is up to the OpenCL implementation” but since it’s on the program object, I doubt we can return the binaries for only some of the entry points.
20:18 karolherbst: yeah
20:18 karolherbst: but we don't have to compile them all when creating the cl_program object
20:18 karolherbst: just when clGetProgramInfo is called or something
20:18 karolherbst: but that's for later optimizing
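The idea floated here (keep an array of per-entry-point binaries on the program object and compile them lazily) could look roughly like the following. This is a sketch only, not clover's design: `ToyProgram`, `binary_for`, `all_binaries`, and the `compile_fn` callback are hypothetical names standing in for the program object and the per-entry-point driver compile call.

```python
class ToyProgram:
    """Program object holding one lazily compiled binary per entry point."""

    def __init__(self, entry_points, compile_fn):
        self._entry_points = list(entry_points)
        self._compile = compile_fn   # hypothetical driver hook: name -> bytes
        self._binaries = {}          # cache: entry point name -> binary

    def binary_for(self, entry_point):
        """Compile on first use (e.g. at kernel launch), then reuse the cache."""
        if entry_point not in self._entry_points:
            raise KeyError(entry_point)
        if entry_point not in self._binaries:
            self._binaries[entry_point] = self._compile(entry_point)
        return self._binaries[entry_point]

    def all_binaries(self):
        """Force compilation of everything, e.g. for a binaries query."""
        return [self.binary_for(ep) for ep in self._entry_points]
```

Launching a kernel would only touch `binary_for`, so unused entry points are never compiled unless something like a `CL_PROGRAM_BINARIES` query asks for the full set via `all_binaries`.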
20:19 pmoreau: Sure
20:19 karolherbst: I am just annoyed why curro thinks "pipe_grid_info::pc" is some first level general field everybody is using, and my entry point thing is some stupid spirv specific stuff, which is totally clover specific and ::pc is not
20:20 karolherbst: ::pc is as clover specific as is ::entry_point
20:21 pmoreau: How are subroutines handled? Do they use the ::pc member?
20:22 karolherbst: nothing uses ::pc
20:22 karolherbst: it's not even state in the mesa state tracker
20:36 karolherbst: ohh right, I wanted to add the tearing prevention stuff as well
20:40 karolherbst: https://cgit.freedesktop.org/wiki/xorg/commit/SummerOfCodeIdeas.mdwn?id=aa5eec6a9641990e10f4547f53b7e578dd46b30b
20:40 karolherbst: would be nice if we could extend that list a bit
21:02 Lyude: imirkin_: i2c stuff fixed, just waiting for a response on the bugzilla. Will start bisecting that GPU lockup issue now
21:02 imirkin_: thanks =]
21:02 Lyude: np
21:04 karolherbst: Lekensteyn: can we do ACPI calls on windows?
21:04 karolherbst: for eg retrieving the value of \CPEX?
21:05 karolherbst: I mean, I don't expect it to be different
21:05 karolherbst: but.. what if it is indeed different?
21:06 karolherbst: and it would also be interesting to know if we could reproduce the issue on windows as well