00:38 karolherbst: imirkin: "multiple instances of buffer 50 on validation list"
00:38 karolherbst: validate_init: -22
00:38 imirkin: sure sign of multithreading
00:39 imirkin: if you can make an apitrace that causes that, i can take a look. my guess is that you can't.
00:39 karolherbst: my guess is the same
00:39 karolherbst: it also didn't happen after a few minutes, but more like few hours
00:46 karolherbst: imirkin: but shouldn't it be quite enough to know there is an issue to check like all places where this might happen or would it be much too high effort?
00:46 karolherbst: (well if we ignore the fact that even if we find a few places, we still don't know that we found the correct one)
00:59 imirkin: not sure i understand the question.
07:21 tagr: imirkin: not much
07:21 tagr: imirkin: is this related to the C8 support that you've been trying to fix in Nouveau?
07:28 tagr: skeggsb: any ideas how I could debug the big page support for Tegra? I've got a couple of fixes that make things work again after the v4.15-rc1 changes had regressed, but the big page support still doesn't work
08:09 tagr: imirkin: it's possible that Tegra uses the same (or a similar) LUT implementation as the desktop GPUs, though historically the display engines have been quite different
08:09 tagr: imirkin: 1025 entries sounds familiar, though
08:09 tagr: let me read up a bit
08:55 nuovolnx: excuse me. Anyone know how to use nouveau open source drivers to update the firmware of the graphics card (nvidia gt 320 on acer x3900)?
08:57 karolherbst: nuovolnx: why would you want to update the firmware?
08:58 karolherbst: mhh, maybe they update the vbios though...
08:58 karolherbst: anyhow... it isn't really important to do that, except you have some severe issue nouveau is hitting due to that
08:59 karolherbst: nuovolnx: but you can also just pass the new vbios to nouveau and it would use that one
08:59 nuovolnx: karolherbst, I would use the open driver, but my problem is that the computer locks up with nouveau on google maps and in applications that use 3d
08:59 karolherbst: that might have other reasons than buggy vbios
09:02 nuovolnx: karolherbst, I now have legacy nvidia drivers via non-free repositories, but I do not get much.
09:05 nuovolnx: karolherbst, could you help me?
09:07 karolherbst: nuovolnx: no idea, I don't really know any details about your issues.
09:09 nuovolnx: my issue is with the nouveau driver: the computer locks up with google maps and with applications that use 3d
09:11 karolherbst: nuovolnx: well you could create a bug on bugzilla and provide all the information about your system, which might be relevant here (kernel version, mesa version, software being used, etc..)
09:17 karolherbst: mupuf: any smart ideas what I can do, when something works using mmiotrace under nouveau and not if I don't trace?....
09:46 mupuf: Try to add random delays in your code
10:12 karolherbst: how should we work with that kind of shit... seriously. Everything is fine until those falcons are started with the LS blobs, then they "magically" go into an error state and we have like no means to properly debug this mess
10:13 mupuf: ...
10:14 karolherbst: mupuf: https://gist.github.com/karolherbst/613ed83237d528587108b715e6691a68
10:14 mupuf: what is this reg?
10:15 karolherbst: SCRATCH0 on gr
10:15 karolherbst: ;)
10:15 karolherbst: well, rnndb calls it PGRAPH.CTXCTL.CC_SCRATCH[0]
10:16 mupuf: ok, and what do you store there?
10:16 mupuf: but yeah, you can't read the reg in HS mode?
10:16 karolherbst: okay here is how it works:
10:16 karolherbst: the LS firmware starts and writes into that reg when it is done "booting
10:17 karolherbst: "
10:17 karolherbst: it should write something, which results in true if & 0x1
10:17 karolherbst: we just wait on the host, until the falcons are ready
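
The wait described above (host polls the scratch register until bit 0x1 is set) can be sketched roughly as follows. This is only an illustration, not nouveau's code: `read_reg` is a hypothetical callable standing in for an MMIO read of the scratch register, and the timeout and poll interval are arbitrary assumptions.

```python
import time

READY_BIT = 0x1  # bit the LS firmware is expected to set once it is done "booting"


def wait_for_ready(read_reg, timeout_s=2.0, poll_interval_s=0.001):
    """Poll a register until (value & READY_BIT) is set or the timeout expires.

    read_reg is a hypothetical callable returning the current 32-bit value
    of the scratch register; it stands in for a real MMIO read.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        value = read_reg()
        if value & READY_BIT:
            return value
        if time.monotonic() >= deadline:
            # The firmware never signalled readiness; report the last value seen.
            raise TimeoutError(f"falcon not ready, last value 0x{value:08x}")
        time.sleep(poll_interval_s)
```

A timeout here only says that the bit never showed up; it gives no hint about why, which is the complaint in the lines that follow.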
10:17 mupuf: I see
10:19 karolherbst: I want full docs on the entire falcon security stuff, otherwise we can't debug shit... seriously, or they should provide a dev, we can annoy 24/7 with issues like this, I don't care.... I just want to have a way to deal with this kind of annoying super mess
10:20 karolherbst: and we already had buggy firmware on the gp107
10:21 mupuf: sounds lovely!
10:23 karolherbst: right... but seriously, we have to find a solution for this
10:24 karolherbst: I mean, it can't be that I spend hours on figuring out what Nouveau does wrong, if the firmware they provide does nothing except: 1. tell us everything is fine or 2. fuck up the system while providing no feedback at all
10:25 karolherbst: well, they do provide feedback: something is wrong
10:26 mupuf: karolherbst: can we reset the hw and try again if this fails?
10:26 karolherbst: no
10:26 karolherbst: or maybe?
10:26 karolherbst: I know that suspending kind of "resets" the GPU enough
10:27 karolherbst: but unloading nouveau and loading it again, breaks the machine even more
10:27 mupuf: don't go that far
10:27 mupuf: try resetting pdaemon
10:27 karolherbst: what has pdaemon to do with it?
10:28 mupuf: pdaemon + gr
10:28 karolherbst: again: what has pdaemon to do with any of it?
10:29 mupuf: pdaemon is responsible for loading the pgraph firmware, isn't it?
10:29 karolherbst: not on pascal
10:29 mupuf: oh
10:29 mupuf: that changed, ok
10:29 karolherbst: well, it runs the _unloading_ binaries though....
10:29 karolherbst: but they use that sec2 falcon for loading
10:30 karolherbst: pdaemon doesn't run shit until you unload stuff
10:30 karolherbst: but here is the funny thing: the entire GPU goes down
10:30 karolherbst: and while doing "rmmod nouveau", all the mmu operations timeout as well
10:31 karolherbst: and like all regs start to return bad*
10:31 karolherbst: except a few
10:31 karolherbst: like 0x0
10:31 karolherbst: even 0x88000 returns bad00100
10:31 karolherbst: and that's where the "normal" PCI stuff should be
10:32 mupuf: ...
10:32 karolherbst: 0x200 still returns stuff though
10:33 karolherbst: 101000 also returns bad, just in case you wondered if some other basic stuff still responds normally
10:34 karolherbst: PTIMER? also plain bad
10:34 karolherbst: so, good luck with resetting ;)
10:35 karolherbst: mhh, interesting
10:35 karolherbst: 9084 isn't bad: PTIMER.MMIO_FAULT_ADDRESS => { UNK0 | ACCESS = READ | ADDRESS = 0x80100c80 }
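
A rough way to sort a set of register reads into "bad" and "still responding", assuming that "bad*" means the top three hex digits of the 32-bit value are 0xbad (as in bad00100 above). The helper names are made up for illustration, and only the 0x88000 value in the usage below comes from the log.

```python
def looks_bad(value):
    """True if a 32-bit read has the 0xbadXXXXX pattern (assumed meaning of "bad*")."""
    return ((value >> 20) & 0xFFF) == 0xBAD


def split_reads(reads):
    """Split {address: value} into (bad_addresses, ok_addresses), both sorted."""
    bad = sorted(addr for addr, val in reads.items() if looks_bad(val))
    ok = sorted(addr for addr, val in reads.items() if not looks_bad(val))
    return bad, ok
```

For example, `split_reads({0x88000: 0xbad00100, 0x200: 0x00000001})` puts 0x88000 in the bad list and 0x200 in the ok list; the 0x200 value there is a placeholder, not a real readout.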
10:47 cyndis: fwiw, if for debugging purposes, on tegra it is possible to reset both igpu and dgpu at runtime
10:49 cyndis: (at least i'm pretty sure about dgpu)
10:51 karolherbst: well, currently going into suspend also does quite nicely, but in the end, it doesn't help. Even if we would have a way to reset it in 1ms, it doesn't help. Not at all
10:51 karolherbst: I simply want to know why the firmware thinks killing the GPU is a good idea
10:51 karolherbst: that's all I need
10:52 karolherbst: or maybe it is done by the falcon itself, because it thinks the system is compromised?
10:55 karolherbst: I wasn't joking about getting documentation about when and how the "security" part of the falcons thinks killing the GPU is a good idea part...
10:57 mupuf: karolherbst: so, this may be due to us not being able to handle faults
10:57 karolherbst: maybe? maybe not?
10:57 karolherbst: the point is: we don't know how to get that kind of information
10:57 mupuf: what is the loading failure rate?
10:58 karolherbst: 100%
10:58 karolherbst: basically
10:58 karolherbst: except I do an mmiotrace?
10:58 karolherbst: maybe it was random?
10:58 karolherbst: dunno
10:58 karolherbst: it pretty much feels more like 100% than it feels like 50%
10:58 mupuf: none is acceptable, but just for my information
10:58 karolherbst: well, it works when I load nvidia before nouveau
10:59 karolherbst: and then i works in 99% of the cases pretty much
10:59 karolherbst: *it
11:05 mupuf: I see
11:06 mupuf: yeah, go figure out what is wrong :s
11:14 mupuf: karolherbst: I would suggest diffing the reg space before loading the blob and after unloading it
11:14 mupuf: especially in the MMU area
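
A minimal sketch of the diffing idea, assuming the two register dumps have already been parsed into dicts of address to 32-bit value. The function names and the address range passed to `in_range` are placeholders; the actual MMU register range is not given in the log.

```python
def diff_regs(before, after):
    """Return {addr: (old, new)} for every register whose value changed.

    before/after map register address -> 32-bit value, e.g. one dump taken
    before loading the blob and one after unloading it. A register missing
    from one dump is reported with None on that side.
    """
    changed = {}
    for addr in sorted(set(before) | set(after)):
        old, new = before.get(addr), after.get(addr)
        if old != new:
            changed[addr] = (old, new)
    return changed


def in_range(diff, start, end):
    """Keep only the diff entries whose address lies in [start, end)."""
    return {addr: vals for addr, vals in diff.items() if start <= addr < end}
```

Narrowing the result with `in_range(diff_regs(before, after), start, end)` keeps the output readable when most of the register space differs anyway.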
11:53 karolherbst: mupuf: yeah, maybe this will help
11:53 mupuf: karolherbst: there are scripts for that in the vbios repo IIRC
11:55 karolherbst: mupuf: currently I try to figure out something from the falcon code, maybe there is something obvious in it
11:56 mupuf: good luck
11:57 karolherbst: well, I already found the place where they write 0x1 to 0x409800
11:59 karolherbst: interesting. a conditional jump
12:12 karolherbst: there are enough writes into the mmio space I might be able to read out and maybe I can pinpoint the exact location where it bails, dunno
12:34 mupuf: karolherbst: good luck!
12:35 karolherbst: ohhhhhhhhh wait.....
12:35 karolherbst: mhh, no the entire image should be secured
12:36 karolherbst: can a security lockdown kill the entire GPU?
12:39 karolherbst: okay fun
12:39 karolherbst: mupuf: if I don't start the falcons, the GPU stays sane. Concidence?
12:39 karolherbst: *coincidence
12:40 mupuf: ha ha ha ha
12:40 mupuf: well, it is possible indeed
12:40 karolherbst: well, now I am sure it is some kind of secboot issue
12:40 karolherbst: whatever the actual error is
12:41 karolherbst: might be Nouveau's as well. But I would also blame a compiler for giving me crappy error messages even if I wrote crappy code
12:41 karolherbst: same applies here really
12:44 karolherbst: LOL
12:44 karolherbst: the actual fuck
12:44 karolherbst: mupuf: now I tried to be smart: let's just start one of the two at once, see which one messes up. apparently I got timeouts if I have one disabled, doesn't matter which one. guess what happens after I start both now?
12:45 mupuf: it works? :D
12:46 karolherbst: everything else wouldn't make sense, right?
12:46 mupuf: so, by both you mean sec2 and pgraph?
12:46 karolherbst: nonono
12:46 karolherbst: "nvkm_falcon_start(gr->gpccs);" and "nvkm_falcon_start(gr->fecs);"
12:47 mupuf: I see
12:47 mupuf: well, if there are inter-dependencies, that's to be expected
12:48 karolherbst: maybe we have to wait for a second
12:48 karolherbst: ...
12:52 karolherbst: well at least suspending the laptop brings the GPU into the same state every time
13:04 imirkin: tagr: actually i was hoping to get info on how the LUT worked for 30-bit modes
13:05 imirkin: but i looked in drm/tegra, doesn't seem to be any gamma stuff in there at all
13:13 tagr: imirkin: no, we don't support gamma yet, not something I ever got around to
13:13 tagr: also, we don't really have support for 30-bit modes on anything prior to Tegra186 as far as I know
13:14 tagr: imirkin: I guess it also depends a lot on what LUT you're talking about, if you're looking at 30-bit modes I suspect you're looking at the CSC LUT
13:15 imirkin: well, looking at the official nvidia display class evo docs, i'm looking at the ... (sec)
13:15 imirkin: HEAD_SET_BASE_LUT_LO
13:16 imirkin: there's also an OUTPUT_LUT which we don't use
13:16 imirkin: i saw no mention of CSC things, but ... what do i know
13:16 imirkin: [this is for dGPUs, obviously]
13:17 imirkin: unfortunately the testing cycle on display things *stinks*, so i was hoping to get some advance intel before rebooting 100000 times
13:17 tagr: imirkin: yeah, like I said EVO and Tegra are completely different architectures, though occasionally things got integrated back into Tegra
13:18 tagr: especially encoder type stuff, the display controllers themselves are still fairly different from what I can tell
13:18 tagr: imirkin: let me go look at the docs myself
13:18 imirkin: well, if you could tell me the order in which colors in a 1025-sized LUT go, that'd be great
13:19 imirkin: i suspect that the low 2 bits get moved to the front of the index id
13:19 imirkin: also what that extra entry is for
13:25 tagr: imirkin: https://nv-tegra.nvidia.com/gitweb/?p=linux-t18x.git;a=blob;f=drivers/video/tegra/dc/nvdisp/nvdisp.c;h=ebbfaeec6c95c89f30227d6a87ceaca8d954ac62;hb=l4t/l4t-r28.1#l684
13:26 tagr: imirkin: not sure if that helps, but it's at least a 1025-sized LUT
13:27 tagr: oh... doesn't look like that helps with the ordering, though
13:27 tagr: since it uses the same value for each of RGB
13:28 tagr: I was reading that code as: r << 32 | g << 16 | b
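
The layout tagr describes (r << 32 | g << 16 | b, same value for each of RGB, 1025 entries) could look like the sketch below. It only follows that reading of the code and is not checked against nvdisp.c; the 16-bit channel width and the linear ramp are assumptions made for illustration.

```python
LUT_SIZE = 1025          # entry count mentioned in the discussion
CHANNEL_MAX = 0xFFFF     # assumed 16-bit channels; not confirmed by the log


def pack_entry(r, g, b):
    """Pack one LUT entry as r << 32 | g << 16 | b (tagr's reading of the code)."""
    return (r << 32) | (g << 16) | b


def unpack_entry(entry):
    """Inverse of pack_entry, returning (r, g, b)."""
    return (entry >> 32) & CHANNEL_MAX, (entry >> 16) & CHANNEL_MAX, entry & CHANNEL_MAX


def grey_ramp(size=LUT_SIZE):
    """Linear ramp using the same value for each of RGB, one entry per index."""
    return [pack_entry(v, v, v)
            for v in (i * CHANNEL_MAX // (size - 1) for i in range(size))]
```

Because every entry carries identical R, G and B values, a table built like this cannot reveal the channel order, which matches tagr's remark that it does not help with the ordering.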
13:31 tagr: imirkin: do you have a link to the evo docs? I can't seem to find them
13:34 pmoreau: tagr: Not having a link to the doc you post? :-p http://download.nvidia.com/open-gpu-doc/Display-Class-Methods/1/
13:35 pmoreau: (I mean, not necessarily *you*, but the company)
13:39 tagr: pmoreau: yeah, I rely too much on Google to find things for me...
13:39 tagr: pmoreau: thanks
13:40 pmoreau: We need to boost Google’s ranking of those pages :-D
13:42 tagr: pmoreau: actually if you include open-gpu-doc it will find it, but I didn't think of that
13:53 imirkin: tagr: well, it could also be that there's something else wrong, and the order is fine
13:53 imirkin: tagr: we're on the "HIRES" lut, which i believe is only 257 entries
13:53 imirkin: that said ... those LUTs aren't *exactly* easy to read (from the nvdisp.c code)
13:55 imirkin: btw - good job nvidia and amd -- named your display controllers "DC".
13:59 mupuf: imirkin: you mean tegra code for nvidia, right?
14:12 supermart: hi guys, well then look at some LLVM tests of radeon too please, they know more about reg_sequence and indirect addressing than me too probably, judging by the written MIR test https://git.llvm.org/klaus/llvm/blob/1bf162a64a244eca4a75d7079e51195b2169d4b0/test/CodeGen/AMDGPU/detect-dead-lanes.mir
14:13 supermart: the relevant parts from there with movrel which is probably the lifting of reg_sequence (not sure though) are the test1/2
14:17 supermart: the meaning behind that, which is what I hypothesize, is probably that both pointing and repointing the empty lanes takes around two movrel instructions
14:20 supermart: you see, i kinda figured, based on my sniffing on the web plus the AMD devs' reactions to my talks, that they are fully aware of how it is done and works, and know even better than me how hw behaves
14:23 supermart: after all, as when already working for such company does not require much intelligence, to gather how pointers behave in circuits, as in comparison today due to i have wasted time, i know round about, but details will be covered this month
14:24 supermart: however them refusing to push the code and trolling, they obviously have known it all the time how pointers/arrays function
14:57 karolherbst: mhhhhhh, messy
14:57 karolherbst: I think the issue is related to starting the falcons, but, I don't know why they get upset
14:58 karolherbst: and it seems to only happen when both the gpccs and the fecs are started
15:03 karolherbst: mupuf: set breakpoints from the host and debug from userspace :3 let's see if we can at least do that
15:25 karolherbst: mupuf: messy, I can't debug those falcons either :(
15:32 mupuf: not fun..
15:33 karolherbst: well
15:33 karolherbst: at least I can dump the loaded binary.....
15:35 karolherbst: I can't break either
20:38 imirkin_: anyone know if skeggsb is back?
20:38 skeggsb: imirkin_: i am, sort-of, i'm sick now, so somewhat in/out still
20:38 imirkin_: ah ok. sorry to hear that.
20:39 imirkin_: feel better :)
20:39 imirkin_: in the meanwhile... is CPU-side VA allocations a thing with your new VMM stuff?
20:39 skeggsb: all good, hopefully just a cold from the plane, and not something worse like malaria :P
20:39 imirkin_: where did you go?!
20:39 skeggsb: india
20:39 imirkin_: ebola apparently also has flu-like symptoms
20:39 skeggsb: it is, but not exposed to userspace yet
20:39 imirkin_: did you eat monkey brains?
20:40 skeggsb: haha no, but i got bitten by a *lot* of mosquitoes
20:40 imirkin_: you're probably fine then.
20:48 koz_: imirkin_: It seems the Minetest perf issues on nouveau are due to mipmapping.
20:48 koz_: More specifically, nouveau seems unable to mipmap 512 textures, while the blob has no issues.
20:48 koz_: (if I turn off mipmapping with the blob, I get the same result as I get with nouveau with mipmapping on)
20:48 koz_: (supposedly, anyway)
20:49 imirkin_: to mipmap or not to mipmap, that is the question
20:49 imirkin_:wonders what "mipmapping" does
20:49 koz_: imirkin_: https://en.wikipedia.org/wiki/Mipmap
20:50 koz_: I'll also let karol know when he appears.
20:50 imirkin_: i'm quite well aware of what a 'mipmap' is
20:50 imirkin_: however what i don't know is what enabling that option does in minetest.
20:50 koz_: Ah, I see.
20:51 koz_: What specific question could I ask the maintainers to help clear that up for you?
20:51 koz_: Well, question or questions.
20:51 imirkin_: e.g. does it constantly call glGenerateMipmap()?
20:51 koz_: I have an issue thread which has their attention if that helps?
20:51 imirkin_: that might be an unoptimized path in nouveau
20:52 imirkin_: or perhaps they create a ton of texture views with different base levels instead of adjusting samplers
20:52 imirkin_: or perhaps ... who knows.
20:52 koz_: imirkin_: https://github.com/minetest/minetest/issues/6682 <-- thread in question
20:53 koz_: Don't mind the closure - they're open to discussing it, it's just that they're not sure if this is a Minetest issue specifically.
20:53 koz_: I'd be happy to ask on your behalf, but I have zero clue about any of this.
20:53 koz_: One thing that I've noticed is that this only happens for 512 textures.
20:53 koz_: 256 and below are fine.
20:58 imirkin_: i've asked.
20:58 imirkin_: i think that i should be auto-cc'd on any replies
20:59 koz_: imirkin_: Thank you for your diligence. :)
21:03 koz_: imirkin_: If I understand correctly, I can side-step the whole issue by just having a card powerful enough not to need mipmapping?
21:03 imirkin_: i think you have an incomplete understanding of what mipmapping is
21:03 koz_: That is almost certain.
21:04 imirkin_: it's a way to make a texture look good even if it's being scaled down a lot
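
As background on the term only (it says nothing about what the Minetest option actually does): a full mip chain halves the texture size at each level until it reaches 1x1. A small sketch, with a made-up helper name:

```python
def mip_chain(width, height):
    """List the (width, height) of every level in a full mip chain.

    Each level halves both dimensions (rounding down, never below 1)
    until the 1x1 level is reached.
    """
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        levels.append((width, height))
    return levels
```

For a 512x512 texture this gives 10 levels (512, 256, ..., 1), and 9 levels for 256x256, so the two sizes koz_ compares differ by a single extra level.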
21:04 koz_: So then why does turning it off drop my framerate?
21:04 imirkin_: who knows. but it has nothing to do with the literal mip maps.
21:05 imirkin_: this is something that drives me mad...
21:05 imirkin_: people see some tick box in some config menu in a game
21:05 imirkin_: and assume that it maps 1:1 with some hw feature
21:05 imirkin_: and then it all of a sudden becomes "why is AA broken"
21:06 koz_: Sorry about my ignorance, then.
21:06 imirkin_: even though it may have nothing to do with $feature, just how $application makes use of $feature
21:06 imirkin_: well - it's understandable
21:06 imirkin_: just ... doesn't help me any.
21:06 koz_: Oh, of course.
21:07 koz_: It's really something I know nothing about.
23:49 imirkin_: tagr: whoa, wait. there *is* CSC stuff on the later gpu's