00:49imirkin: karolherbst: were you still planning on testing out my version of the limm patch?
00:49imirkin: or should i just push?
00:50karolherbst: imirkin: well I still plan to test them, but if you feel confident you can push it. Did you run piglit or something?
06:26nikk_: Hey Lyude you there? I was interested to get a better understanding of https://trello.com/c/4xhrtohN/201-write-a-proper-mmio-scanning-tool-for-searching-tracesavailable-source
07:30nikk_: Which of the following is better? Out of tree compilation or full kernel installation?
07:32pmoreau: You will need a full kernel installation if the out of tree compilation fails due to your system kernel being too old.
07:33nikk_: Oh okay👍🏻
07:33pmoreau: But then, once you have a kernel against which you can compile and run, the out of tree is the better option imho.
07:53nikk_: pmoreau: so I have Ubuntu 16.04 on my machine and a fermi gpu. So now i should follow this right? https://nouveau.freedesktop.org/wiki/UbuntuPackages/
07:54pmoreau: I didn’t even know we had those pages :o
07:56pmoreau: There might be some information in there that is still relevant, but most of the information seems to be from 2007-2009.
07:57nikk_: pmoreau: So what do you recommend? Where can I find the complete steps?
07:58pmoreau: I don’t know which kernel you are currently using on your 16.04 machine, but it will most likely be too old to use with the out of tree module.
07:59pmoreau: https://nouveau.freedesktop.org/wiki/InstallNouveau/ has most steps in it. Start with compiling a full kernel, check that it works fine, and then compile the out of tree against it.
07:59pmoreau: Depending on what you want to do, you might want to compile Mesa as well, and install it in a prefix as described by that same document.
08:47nikk_: pmoreau: for this task https://trello.com/c/4xhrtohN/201-write-a-proper-mmio-scanning-tool-for-searching-tracesavailable-source
08:47nikk_: Which tool do you suggest to get register names from physical address and data values?
08:48pmoreau: `lookup`, from envytools
08:49pmoreau: There is probably a more efficient way to get that information, and you might want to look at `demmio` to see how it does it.
08:51nikk_: Ya exactly I was searching for demmio but could not find it
08:51pmoreau: and for lookup: https://github.com/envytools/envytools/blob/master/rnn/lookup.c
12:11karolherbst: imirkin: I will run piglit with your limm patches today on a gk106 and gp107
13:21imirkin: karolherbst: thanks
14:37pendingchaos: imirkin: it seems pixld returns the d3d11 standard patterns except for with 8x msaa, which it seems to return an unknown pattern for
14:37pendingchaos: on the blob, both gl_SamplePosition and interpolateAtSample seem to read from a constbuf like with gl_SamplePosition on nvc0 (though gl_SamplePosition seems to return standard positions on the blob)
14:48imirkin_: pendingchaos: there are 2 8x "standards"
14:49imirkin_: pendingchaos: are you saying PIXLD doesn't respect the sample location settings?
14:49pendingchaos: I'm talking about the 8x pattern on https://msdn.microsoft.com/en-us/library/windows/desktop/ff476218(v=vs.85).aspx
14:49pendingchaos: it seems that way
14:50imirkin_: keep in mind that gl_SamplePosition gets inverted on a winsys fb ... i think? not actually 100% sure
14:51imirkin_: you can adjust regular fb's too btw -- ARB_clip_control has a thing
14:52pendingchaos: what was the other standard 8x pattern you mentioned?
14:52imirkin_: check the get_sample_position comments
14:56pendingchaos: it looks like ms8 reordered
14:57imirkin_: oh. your locations are just totally different?
14:57pendingchaos: I think so
14:57pendingchaos: in the 8x msaa case
14:58pendingchaos: there are a few locations that both the pixld and d3d11 patterns share though
14:59imirkin_: is it a mirror perhaps?
15:00pendingchaos: after comparing them while ignoring the signs, I don't think so
15:01pendingchaos: these are the positions I'm getting with pixld: https://hastebin.com/odamemuxes.txt
15:05pendingchaos: using interpolateAtSample to get the sample positions, I think it might somehow be how I'm using it to implement gl_SamplePosition
15:05pendingchaos: odd that it would work fine for 2x and 4x msaa
15:07pendingchaos: (fine being that it returns the standard positions like hack with interpolateAtSample)
15:15imirkin_: iirc you just divide by 16 (float), and then add -0.5
15:15imirkin_: [to go from what pixld returns to gl_SamplePosition]
15:16pendingchaos: turns out it's 0.12 fixed point
15:16imirkin_: but the whole point of this was to get the "proper" locations from the hw
15:16imirkin_: oh. hah. yeah, maybe.
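The two readings of the raw PIXLD value discussed above can be compared with a small Python sketch. It assumes the raw value is an unsigned integer with 12 fractional bits (the "0.12 fixed point" reading). The function names are made up for illustration, and whether the -0.5 offset still applies in the fixed-point case is not settled in the discussion.

```python
FRAC_BITS = 12  # "0.12 fixed point": no integer bits, 12 fractional bits


def fixed_0_12_to_float(raw: int) -> float:
    """Interpret raw as unsigned 0.12 fixed point, giving a value in [0, 1)."""
    if not 0 <= raw < (1 << FRAC_BITS):
        raise ValueError("raw value does not fit in 12 bits")
    return raw / float(1 << FRAC_BITS)


def sixteenths_to_offset(raw: int) -> float:
    """The earlier guess: divide by 16 (float), then add -0.5."""
    return raw / 16.0 - 0.5


# Illustrative inputs only, not values read from hardware.
print(fixed_0_12_to_float(0x800))  # 0.5
print(sixteenths_to_offset(8))     # 0.0
```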
15:49karolherbst: imirkin_: no regressions on nve6
15:49imirkin_: it should fix some tests i haven't pushed out yet
15:50karolherbst: ahh, nice
17:10Lyude: Hey uh; if nikk comes on could someone let them know that they might need to wait a while before signing off so I can respond to them? :(
17:10Lyude: would like to work with them but every time I come on to respond they have already left
17:13dutt004: I was trying to do mmiotrace on both fermi and kepler....my fermi trace is done fine...but when I am trying to do it on kepler it seems to get stuck ..Does someone have any clue why this can happen?
17:16Lyude: dutt004: could you give a bit more info? what exactly is getting stuck
17:19Lyude: but: is a command hanging somewhere or something like that?
17:21dutt004: Sure....so I followed this link to perform the tracing...https://www.kernel.org/doc/Documentation/trace/mmiotrace.txt .....So first I wrote a very simple CUDA program....and then carried out the instructions as mentioned in the link.....during the GPU buffer creation it gets stuck...I used the demmio tool from envytools to dump whatever the trace is and I saw that the trace does not progress after a point....I am using kepler K40C ....I don't know if
17:21dutt004: this is enough..
17:22Lyude: ahhhh; so the cuda application hangs when you're doing an mmiotrace but not otherwise?
17:23dutt004: ummm...I have not done it on X...I turned the X off while doing this
17:23Lyude: I mean-when running the CUDA application, it doesn't hang if you don't have mmiotrace running. But if you are doing a trace, the application and the mmiotrace hang?
17:24dutt004: and the trace works well on fermi
17:24Lyude: karolherbst: ^ any idea?
17:24dutt004: could it be because of the trace buffer size?
17:25Lyude: that's a possibility, might be worth tripling/quadrupling it
17:26dutt004: hmmm...okay I actually increased it to 128 MB buffer.....thought it would be enough...
17:31pmoreau: dutt004: I don’t know what you are trying to achieve, but I’m not sure there will be that much in the MMIO trace, if you are only tracing a CUDA application, once the card has been initialised. MMT would give you more data, like the binary of your CUDA kernel and various other configurations. Though MMT no longer works with new versions of the NVIDIA driver.
17:31pmoreau: Though that doesn’t explain why it hangs.
17:35dutt004: pmoreau: I was just trying to dump the MMIOs ...I am also going to do MMT....I am using 384.111 version currently....do you know up to which it works?
17:37dutt004: also does MMT depend on cuda version as well?
17:49nikk_: Hey lyude!
17:49Lyude: sorry I haven't responded previously; you always left before I got a chance to respond :P
17:50nikk_: It is happening because of different time lines..
17:51Lyude: https://trello.com/c/4xhrtohN/201-write-a-proper-mmio-scanning-tool-for-searching-tracesavailable-source so you said you were interested in this, right?
17:51nikk_: Yeah right!
17:51Lyude: so; I guess the first place to start would be asking if you know what an mmiotrace is? (it's ok if not!)
17:52nikk_: Read a little about it. But some more info about it will definitely be great!
17:55Lyude: sure-so; basically one of the methods you use for controlling a nvidia GPU is mmio, memory mapped io. It's basically a region of the memory that's been mapped to a set of device registers on the nvidia GPU. So like, it could (this is probably not a real address, but you get the idea) have the kernel have it mapped to 0x8000 or something like that, and if you wanted to write register 0x100 on the GPU, you'd
17:55Lyude: just write to 0x8100 in the memory from the kernel driver. so, some of the things this is used for is bringing up various hw engine blocks on the GPU, display hardware, etc.
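The address arithmetic in that explanation can be sketched in Python. This is only a model: the 0x8000 base is the made-up address from the explanation, and `FakeMmio` is a hypothetical stand-in backed by ordinary memory, not a real device mapping.

```python
MMIO_BASE = 0x8000  # made-up mapping address, as in the explanation above


def mmio_address(base: int, reg: int) -> int:
    """Address the driver touches in order to reach register `reg`."""
    return base + reg


class FakeMmio:
    """Stand-in for a mapped register window, backed by plain memory."""

    def __init__(self, base: int, size: int):
        self.base = base
        self.mem = bytearray(size)

    def _offset(self, addr: int) -> int:
        off = addr - self.base
        if not 0 <= off <= len(self.mem) - 4:
            raise ValueError("address outside the mapped window")
        return off

    def write32(self, addr: int, value: int) -> None:
        off = self._offset(addr)
        self.mem[off:off + 4] = value.to_bytes(4, "little")

    def read32(self, addr: int) -> int:
        off = self._offset(addr)
        return int.from_bytes(self.mem[off:off + 4], "little")


window = FakeMmio(MMIO_BASE, 0x1000)
addr = mmio_address(MMIO_BASE, 0x100)
window.write32(addr, 0xDEADBEEF)
print(hex(addr), hex(window.read32(addr)))  # 0x8100 0xdeadbeef
```

An mmiotrace records exactly these reads and writes, with the address and value of each access.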
17:56Lyude: as one might imagine; when reverse engineering the nvidia driver it's very useful for us to be able to record the mmio reads and writes being done by the nvidia driver since we can use them to figure out how we're supposed to handle something with the GPU like reclocking, or just reading a fan sensor. in addition to that; we've got a couple of collections of mmiotraces going across nearly all (but not all,
17:56Lyude: unfortunately!) of the nvidia GPUs made in the past few years
17:58Lyude: so: that brings me to where this tool would come in. when nvidia GPUs introduce new registers for various features, it's rather useful to know things like "which generation of hw introduced this register first", "what are the various values written to this register on each generation of hw", things like that. And we pretty much get a large chunk of this information from scanning through mmiotraces to see
17:59Lyude: which of them contain interactions with the register in question
18:00Lyude: the problem is we don't really have any way of doing this that isn't rather slow. scanning through mmiotraces means having to extract each one, scan for some text, go to the next one and repeat, etc. and with the size of mmiotraces this can take quite a while. My scripts for doing mmiotrace scans usually take at least 10-15 minutes to run on my laptop
18:00Lyude: (that's per-scan, by the way)
18:00Lyude: so, what I'd like this tool to do is basically go through all of the mmiotraces, and actually create a database of sorts from them that can be searched far more quickly
18:03Lyude: so like, ideally you'd be able to run something like `mmioscan 0x100`, then have it bring up the results of which generations read/wrote to that register (and which values they read/wrote), and hopefully in a much shorter timeframe than 10 minutes
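One way such a tool could be structured is sketched below in Python: index every trace once into a database, then answer register queries from the index. It assumes the text log format from the kernel's mmiotrace documentation, where R/W lines carry width, timestamp, map id, physical address, value, PC and PID. The function names, the table layout, the chip labels and the sample lines are all hypothetical, and a real tool would read the BAR0 base from each trace rather than take it as input.

```python
import sqlite3


def parse_event(line):
    """Return (op, phys, value) for R/W lines, or None for anything else."""
    parts = line.split()
    if len(parts) < 6 or parts[0] not in ("R", "W"):
        return None
    # Fields after the keyword: width, timestamp, map id, physical, value, ...
    return parts[0], int(parts[4], 16), int(parts[5], 16)


def build_index(db, traces):
    """traces maps a chip label to (bar0_base, iterable of trace lines)."""
    db.execute("CREATE TABLE IF NOT EXISTS access "
               "(chip TEXT, op TEXT, reg INTEGER, value INTEGER)")
    db.execute("CREATE INDEX IF NOT EXISTS access_reg ON access (reg)")
    for chip, (bar0_base, lines) in traces.items():
        rows = []
        for line in lines:
            event = parse_event(line)
            if event is not None:
                op, phys, value = event
                rows.append((chip, op, phys - bar0_base, value))
        db.executemany("INSERT INTO access VALUES (?, ?, ?, ?)", rows)
    db.commit()


def scan(db, reg):
    """Which chips read/wrote `reg`, and with which values."""
    cur = db.execute("SELECT DISTINCT chip, op, value FROM access "
                     "WHERE reg = ? ORDER BY chip, op, value", (reg,))
    return cur.fetchall()


# Made-up sample data, only to show the shape of a query.
sample = {
    "chipA": (0xF0000000, ["R 4 0.000100 1 0xf0000100 0x1 0x0 0",
                           "W 4 0.000200 1 0xf0000100 0x3 0x0 0"]),
    "chipB": (0xE0000000, ["W 4 0.000100 1 0xe0000200 0x7 0x0 0"]),
}
db = sqlite3.connect(":memory:")
build_index(db, sample)
print(scan(db, 0x100))  # [('chipA', 'R', 1), ('chipA', 'W', 3)]
```

The slow part (extracting and parsing each trace) happens once at indexing time; a query afterwards is a single indexed lookup.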
18:04Lyude: there are other projects smaller in scope than this by the way!
18:05Lyude: if you would feel more comfortable with trying on something small to familiarize yourself with nvidia hardware first, that would be fine as well :)
18:06cliff-hm: sounds like we all just got educated some :)
18:13nikk_: Okay so this gives me a much clearer idea of the concept!
18:14nikk_: And yes I feel starting with something smaller will be useful.. so what do you suggest?
18:14Lyude: nikk_: well; what nvidia hw do you currently have access to?
18:14nikk_: So I have a Linux based machine and fermi gpu in it
18:16adya: Hey Lyude, I am working along with nikk_
18:16adya: We now have a clearer idea about the task!
18:17Lyude: ooh, neat. is this for a school project? (also, I am currently looking to see if we've got any easy tasks open for fermi hw right now)
18:17adya: It's good to get started with this. The task is interesting :) So now if we start with something smaller what do you suggest, with the hw that we have
18:18dutt004: pmoreau: I did demmt and it failed....
18:18adya: This is for an undergrad college project
18:19Lyude: aaah! I figured it might be :)
18:20Lyude: so- a lot of the tasks we've got that would be good starter stuff are unfortunately on newer hardware (maxwell1/2 and forward) , I did find this however https://trello.com/c/YnhHNblO/100-nv50-nvc0-fix-fbo-blending-formats-glrgb10
18:21Lyude: this would be a rather small one; but fixing bugs is always a rather good way of familiarizing yourself with the stack
18:21pmoreau: dutt004: IIRC, up to 370.xx were working, but I’m not entirely sure.
18:22nikk_: Lyude yes we are simultaneously working with this task as well. But have not yet got a clear picture of this.
18:23adya: Thanks Lyude! Could you give pointers to start off with this task? We had checked this link for installation: https://nouveau.freedesktop.org/wiki/InstallNouveau/
18:23pmoreau: dutt004: And no, I doubt MMT depends on the CUDA version. MMT does not intercept CUDA calls or anything, only what the driver underneath submits to the GPU.
18:25Lyude: adya: hm, first thing would probably be getting piglit setup and finding out which piglit test is actually showing that failure
18:25Lyude: imirkin_: https://trello.com/c/YnhHNblO/100-nv50-nvc0-fix-fbo-blending-formats-glrgb10 it seems you filed this one; any idea which piglit test triggers this?
18:26dutt004: pmoreau: okay..thanks so much...let me downgrade the driver and see ...also I was wondering is there an assembler for falcon..I see that there is a disassembler in envytools repo...but just curious if there is an assembler
18:28adya: Lyude, I think we need to familiarize ourselves with the piglit tool, since the task requires us to contribute to mesa
18:29Lyude: mm, that would probably be a good idea since most of the easy tasks for later hw will need piglit/mesa contributions as well
18:30pmoreau: dutt004: I’m not sure. I think mwk (and maybe karolherbst?) were looking into that.
18:31adya: Ok! Sure! Thank you so much Lyude!
18:31Lyude: no problem :), let me know if there's anything else I can help out with
18:31nikk_: Thanks Lyude !!
18:32dutt004: pmoreau: Thanks so much for the updates...also I am going to poke a bit on the kepler mmiotrace issue ..I am not sure why is that happening....presently I am going to live with fermi :(
18:33pmoreau: So you do not have the MMIO trace issue on Fermi? Weird :-/
18:34nikk_: Which GPUs are required then for this task like to get mmio traces?
18:35pmoreau: Usually all should work
18:36dutt004: yeah...okay there is one thing...so whatever issue I mentioned happened with only one card in the system... But I tried to do the trace with 2 cards in the system ....and in the cuda program I mentioned which GPU to run on... then the trace hangs even for fermi ....
18:43nikk__: @lyude so for https://trello.com/c/YnhHNblO/100-nv50-nvc0-fix-fbo-blending-formats-glrgb10 , i found the piglit results : https://people.freedesktop.org/~imirkin/nvc0-comparison/problems.html
18:44nikk_: pmoreau: Lyude guide us getting a better picture of this task please
18:44Lyude: alright; i actually will probably be afk for a bit unfortunately as I've got a meeting to attend in 15
18:47nikk__: Oh no problem.. we can discuss this once you're free. Thanks
19:11imirkin_: Lyude: it says it in the title...
19:11imirkin_: too obvious?
19:11Lyude: imirkin_: i gave them a bit more of a pointer for the st mesa stuff
19:12imirkin_: this is not fixable in st/mesa
19:12imirkin_: needs to be done in the driver backend
19:12Lyude: yeah- that's what i mentioned
19:12Lyude: i need to go back to meeting
19:16adya: Hey imirkin_ can you please give us a flow structure for implementation?
19:17imirkin_: not really
19:17imirkin_: if you have concrete questions, happy to answer
19:17imirkin_: (no clue what a "flow structure" is, among other things)
19:22karolherbst: imirkin_: I think you know what it is, but you don't dare to think it is really the thing you think it is
19:23nikk_: Okay so actually we got a basic understanding of frame buffers and fbo.. completed the installations.. so now we are unable to figure out the exact file we have to go through or add patches to for this..
19:25adya: If you could guide us by providing some links for details of this task?
19:30imirkin_: karolherbst: i dunno, i've never heard that term before
19:30imirkin_: adya: there are no links. no docs. no nothing. dive into the code, ask questions.
19:31karolherbst: imirkin_: high level flow overview of the to be implemented whatever
19:32imirkin_: so ... i have to go figure out how to do it first?
19:32imirkin_: yeah ok. i won't be doing that.
19:32imirkin_: the whole point of other people doing things is that i don't have to.
19:32karolherbst: you know, like 50 years ago where smart people thought about _how_ to implement things and others did the implementing ;)
19:33imirkin_: yeah, that's a lot of work
19:33imirkin_: easier to just do it yourself
19:33imirkin_: either way, i have neither the time nor inclination to do it for nouveau
19:34imirkin_: and definitely not without some strong demonstration of interest and competence from the other side
19:47nlundsten: hello, im trying to investigate a gpu temperature issue with nouveau, could someone walk me through it? -- I have a 980Ti
19:47nlundsten: im seeing idle/mostly idle temps of > 80c
19:50nlundsten: there is no search tool or faq at the freedesktop link in the topic .. :(
19:50imirkin_: what's the question?
19:50imirkin_: note that on GM20x+, we have no ability to control the fan
19:50nlundsten: why would my temps be so insane?
19:50imirkin_: so it's whatever the firmware wants to do
19:50imirkin_: (the signed firmware)
19:51nlundsten: imirkin_: so youre saying my firmware is happy with 82c? (assuming there are no airflow/cooling issues) ?
19:51imirkin_: do the fans sound as if the jet is preparing for take-off?
19:51nlundsten: i get like 45c in windows.. assuming its firmware controlled as well, as i dont have any custom fan curves/software
19:51nlundsten: nah, its silent, and possibly not even spinning
19:52nlundsten: (i could check)
19:52imirkin_: then yes, the firmware's happy
19:52imirkin_: basically either it's unhappy, but is unable to reduce temp any further
19:52imirkin_: or it's happy
19:52nlundsten: hmm, that just seems excessive
19:52imirkin_: it's different firmware than on windows
19:52imirkin_: they release a special one just for nouveau
19:52imirkin_: with 99% of the functionality taken out
19:53nlundsten: we're talking *FIRM*ware ?
19:53nlundsten: separate ones for different os' ?
19:53imirkin_: i suspect their blob driver uses the same one as the windows driver
19:53imirkin_: it's uploaded by the operating system
19:54imirkin_: the one that they made available for redistribution in linux-firmware is different though
19:54imirkin_: GM20x+ requires signed firmware in order to change fan speeds
19:54imirkin_: so we're kinda stuck with what they hand out
19:54cliff-hm: maybe use term, microcode - flashable firmware updates.
19:54imirkin_: firmware != microcode?
19:54imirkin_: it's all the same to me
19:55imirkin_: a sequence of bits provided to a piece of hardware in order to assist in its operation
19:55nlundsten: so flashing a newer/different firmware MAY fix it?
19:55cliff-hm: depends on how old-school sysadmin you are - when flashing firmware meant booting from a floppy disk :)
19:55imirkin_: it's uploaded by the OS, not part of the vbios or anything else onboard
19:57nlundsten: oh, hm.
19:58imirkin_: i've started writing a program for extracting that stuff out of the blob driver
19:58cliff-hm:had plenty of fun with microcode updates with Intel earlier this year.
19:58imirkin_: but i'm at the point where i need feedback from ... people who know more than i do
19:59nlundsten: is there any other "microcode"/"firmware" i could use with nouveau/proprietary nvidia?
19:59imirkin_: the blob driver should work quite well
19:59nlundsten: im in the dark here, i just dont like looking at 82c on an (mostly) idle video card
20:00imirkin_: if you're looking for open-source support, i strongly recommend AMD for your next purchase.
20:00nlundsten: blob driver.. check.. not a clue what that is..
20:00nlundsten: imirkin_: lol
20:00imirkin_: nvidia-made driver.
20:00imirkin_: == blob
20:00nlundsten: imirkin_: oh, the proprietary/non open source.. whatever terminology
20:00imirkin_: since you're just downloading some codes off the interwebs, and sticking that into your kernel at ring 0
20:01imirkin_: and dynamically linking their libs into any graphical application
20:02imirkin_: some people are perfectly happy with this, others are uncomfortable
20:02imirkin_: most of the former group run windows
20:25nlundsten: imirkin_: just to clarify, i just need to find+follow an install guide for the nvidia proprietary driver for my flavor of linux, correct?
20:27cliff-hm: likely yes.
20:29cliff-hm: pretty much first link for 'nvidia linux download' - http://www.nvidia.com/object/unix.html - it is a shell script installer, typically wants kernel-devel packages, and maybe other deps.
20:35nlundsten: cliff-hm: thanks
20:35karolherbst: imirkin_: limm patches look good on gp107 as well. You can add a tested-by me
20:36imirkin_: karolherbst: awesome thanks
20:46imirkin_: (i'll push tonight)
23:06Thog: Hello, does someone know why a TSEC would not execute a single instruction on a t210 even with (in theory) a proper firmware setup? (DMA done, uc entry set and cpuctrl = 2)
23:12karolherbst: Thog: in what way? Usually if the falcons are in HS or LS mode you can't really know if they execute anything, or can you?
23:13imirkin_: Thog: lots of reasons ... is it enabled in PMC.ENABLE?
23:14imirkin_: did it throw an unacknowledged interrupt?
23:18Thog: I'm pretty new to this... how can I know that?
23:19imirkin_: i dunno. how are you operating these things?
23:20imirkin_: are you driving the mmio manually? using a driver?
23:24Thog: manually. I'm trying to make this code work https://github.com/Atmosphere-NX/Atmosphere/blob/master/fusee/fusee-primary/src/hwinit/tsec.c#L37 using the t210 bootrom 0day that was published recently
23:25imirkin_: ok. if you're doing stuff manually, there's like a bunch of stuff that needs to be done ;)
23:26imirkin_: anyways, you may want to ask the folks who put that out there
23:26imirkin_: rather than here
23:27karolherbst: Thog: you want to take the bootrom and just execute it on the TSEC?
23:29Thog: no, I have a firmware for the falcon and want to execute it. I have reversed how nintendo has done it and everything is similar to the code that I have linked
23:29karolherbst: uhm, well, I think not, but what kind of firmware do you plan to put there?
23:30karolherbst: sure, but is this firmware signed?
23:30karolherbst: or are you planning on running in non secure mode?
23:30karolherbst: you also have to kind of put the signature and everything into a pre bootloader for the TSEC so that it actually switches into a higher security mode and so on
23:31karolherbst: usually the secure falcon binaries have some weird header which is parsed by the driver and then a 0x100 big unsecure area + signed code following that
23:32karolherbst: and in the unsecure area you setup the signature and how big the secure code is, then jump into 0x100 to start the validation of that
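A rough Python sketch of the layout described above: a header, then a 0x100-byte unsecure area, then the signed code. The header length is not given in the discussion, so it is a parameter here, and the function name and the sample blob are hypothetical. The actual header format and signature fields are not modelled.

```python
UNSECURE_SIZE = 0x100  # size of the unsecure area, per the description above


def split_falcon_image(blob: bytes, header_size: int):
    """Split a blob into (header, unsecure area, signed code).

    header_size is an assumption supplied by the caller; the real header
    layout is not specified in the discussion.
    """
    if header_size < 0 or len(blob) < header_size + UNSECURE_SIZE:
        raise ValueError("image too small for header plus unsecure area")
    header = blob[:header_size]
    unsecure = blob[header_size:header_size + UNSECURE_SIZE]
    secure = blob[header_size + UNSECURE_SIZE:]
    return header, unsecure, secure


# Made-up image: 0x20-byte header, 0x100 unsecure bytes, 0x300 signed bytes.
fake = bytes(0x20) + bytes([0xAA]) * UNSECURE_SIZE + bytes([0xBB]) * 0x300
header, unsecure, secure = split_falcon_image(fake, header_size=0x20)
print(len(header), len(unsecure), len(secure))  # 32 256 768
```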
23:32karolherbst: if anything here screws up the falcon locks down
23:33karolherbst: Thog: you should first try to execute just some random code without trying to run some official or self written TSEC firmware, because if you don't know what you do, you screw up
23:33karolherbst: and no way to debug
23:42Thog: karolherbst, I have tried to code a little thing that just write a value to SCRATCH1 but nothing is executed...
23:43Thog: by the way there is some documentation about how it is loaded and executed here http://switchbrew.org/index.php?title=TSEC
23:44Thog: Sorry to bother you ^^'
23:45karolherbst: nah, my final goal is to be able to run unsigned firmware on those stupid falcons... and I get the feeling that if just enough people hack on the switch we get there at some point :p
23:46karolherbst: Thog: where does it fail/hang in the code you linked to?
23:48Thog: karolherbst, it exceeds the timeout imposed (line 85)
23:50karolherbst: Thog: at some point I started to write a debugger for those falcons, but it is in no way usable for real stuff
23:50karolherbst: the source might be useful to look at: https://github.com/karolherbst/nouveau_tools/blob/master/dbg_falcon.sh
23:51karolherbst: before you reach the jump into the signed code, you can actually kind of set break points and pause the execution
23:51karolherbst: I have some more local change with parsing the code out of the falcon and so on (and want to rewrite that in C or something)
23:52karolherbst: with that you might be able to check if you can single step or something
23:52karolherbst: or maybe nothing happens at all indeed
23:53Thog: I will take a look to it, thanks!
23:53karolherbst: otherwise most of the falcon stuff is here:
23:54karolherbst: "<reg32 offset="0x200" name="DEBUG_CMD" variants="GF119-">" is the debug reg
23:54karolherbst: and with my script you kind of see how to use it
23:58karolherbst: mhh, I totally forgot that you don't have direct hw access, right?
23:58karolherbst: no idea how much of that is actually exposed and what is not
23:58karolherbst: and if some regs are different
23:59karolherbst: 0x1044 is SCRATCH1
23:59Thog: karolherbst, well I can execute whatever I want at the context of the bootrom
23:59karolherbst: it is falcon_base+0x044 with direct access
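The register arithmetic from the last few lines, as a small Python sketch. The offsets 0x044 (SCRATCH1) and 0x200 (DEBUG_CMD) are the ones quoted above; the 0x1000 base is only inferred from 0x1044 - 0x044 and is relative to whatever register window is in use, so treat it as an assumption. `falcon_reg` is a hypothetical helper name.

```python
FALCON_SCRATCH1 = 0x044   # "falcon_base+0x044"
FALCON_DEBUG_CMD = 0x200  # offset from the DEBUG_CMD register entry quoted above


def falcon_reg(falcon_base: int, offset: int) -> int:
    """Address of a falcon register given the base of its register block."""
    return falcon_base + offset


# Assumed base: inferred from "0x1044 is SCRATCH1", not confirmed above.
ASSUMED_FALCON_BASE = 0x1000

print(hex(falcon_reg(ASSUMED_FALCON_BASE, FALCON_SCRATCH1)))   # 0x1044
print(hex(falcon_reg(ASSUMED_FALCON_BASE, FALCON_DEBUG_CMD)))  # 0x1200
```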