MmioTrace information for developers and trace analysts. Some of this information applies only to very old, unsupported mmiotrace versions.
How mmiotrace works inside
The kernel functions ioremap, ioremap_nocache and iounmap are replaced (for the driver module only) with wrappers that record MMIO areas. In ioremap the pages of the MMIO area are marked as not present, so any access to those addresses generates a page fault. The page fault handler detects the mmio-traced addresses and records the attempted action. The page is then marked present and the page-faulting code is single-stepped to execute the instruction doing MMIO. Afterwards the page is marked as not present again.
The recording works by calling pre and post functions in mmio.ko before and after the single-stepping. mmio.ko uses relayfs and debugfs to relay the data to user space.
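The trap, single-step and re-arm cycle described above can be modelled in user space. The following is only a toy sketch in Python: the class ToyMmioTracer, its methods and the 4096-byte page size are assumptions made for illustration, and none of it mirrors the actual code in mmio.ko.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for this toy model


@dataclass
class ToyMmioTracer:
    """User-space toy model of the trap, single-step, re-arm cycle."""

    memory: dict = field(default_factory=dict)   # address -> value
    present: dict = field(default_factory=dict)  # traced page -> present bit
    log: list = field(default_factory=list)      # recorded (kind, addr, value)

    @staticmethod
    def _pages(base, size):
        """Page numbers covered by the area [base, base + size)."""
        return range(base // PAGE_SIZE, (base + size - 1) // PAGE_SIZE + 1)

    def ioremap(self, base, size):
        """Wrapper: remember the MMIO area and mark its pages not present."""
        for page in self._pages(base, size):
            self.present[page] = False

    def iounmap(self, base, size):
        """Wrapper: stop tracing the area."""
        for page in self._pages(base, size):
            self.present.pop(page, None)

    def _execute(self, kind, addr, value):
        """Stand-in for the single-stepped instruction doing the MMIO."""
        if kind == "W":
            self.memory[addr] = value
            return value
        return self.memory.get(addr, 0)

    def _access(self, kind, addr, value=None):
        page = addr // PAGE_SIZE
        if self.present.get(page, True):
            # Untraced page: the access goes through without a fault.
            return self._execute(kind, addr, value)
        # "Page fault" on a traced page.
        self.present[page] = True                  # pre: let the access through
        result = self._execute(kind, addr, value)  # single-step the access
        self.log.append((kind, addr, result))      # post: record the event
        self.present[page] = False                 # re-arm the trap
        return result

    def read(self, addr):
        return self._access("R", addr)

    def write(self, addr, value):
        self._access("W", addr, value)
```

After tracer.ioremap(0x1000, 0x2000), every read or write inside that area ends up in tracer.log, while accesses to other addresses are executed without being recorded.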
Unfortunately the legacy ISA address range 0xa0000 - 0x100000 cannot be traced this way because marking those pages as not present crashes the kernel. There can also be machine instructions that are not decoded properly, but so far they have been rare enough.
An alternative idea
While discussing x86 instruction emulation, Avi Kivity proposed the following:
However there is a simpler (for you) solution: run the driver-to-be-reverse-engineered in a kvm guest, and modify kvm userspace to log accesses to mmio regions. This requires the not-yet-merged pci passthrough support. You can reverse engineer Windows drivers with this as well.
Reference: http://lkml.org/lkml/2008/4/5/13
Who would like to take that project?
Out-of-tree mmiotrace and analysis tools
The main out-of-kernel-tree source of mmio-trace is pq's (PekkaPaalanen) git tree: git://people.freedesktop.org/~pq/mmio-trace
There are two branches: master and binformat. Use branch master if you can, and binformat only if you absolutely have to. binformat is mostly for people wanting to reparse old binary mmio logs. Further description of the branches follows later.
The out-of-kernel-tree mmio-trace consists of:
- kernel module to process MMIO accesses, mmio.ko (deprecated)
- user space program to record the logs, mmio-trace (deprecated)
- user space program to interpret the logs, mmio-parse
- user space program to re-execute the logs, mmio-replay
- user space program to convert between some log formats, mmio-convert
- user space program to validate testmmiotrace.ko output, test-check
- kernel module to test mmiotrace, testmmiotrace.ko
In addition, some symbols that the inspected driver module calls from the kernel core have to be rerouted; there is a script to do this.
The CONFIG_DEBUG_FS and CONFIG_RELAY options must be enabled in your kernel.
Old discussion: RFC mmiotrace full patch, preview 1; RFC mmiotrace full patch, preview 2; RFC 1/3 mmiotrace full patch, preview 3; http://lkml.org/lkml/2008/3/22/70
1. pq's master branch
Compared to older versions, this branch has redesigned data formats for both the kernel-to-userspace communication via relayfs and the on-disk log. Both are completely incompatible with the old versions.
The log format is text based, so it is readable without any special tools, and also avoids incompatibilities between architectures. Benefits of the rewritten kernel data format are variable length messages from kernel to user space and logging correct physical addresses of memory operations.
Additional features:
- records the correct physical address of an event
- stores lspci output and /proc/bus/pci/devices information; the latter is used to convert physical addresses into offsets usable with Rules-ng
- mmio-parse on any arch can parse logs from any other arch
- mmio-replay can replay register writes from logs. Dangerous!
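The conversion from physical addresses to offsets mentioned in the list above amounts to finding the PCI BAR that contains the address and subtracting its base. A minimal sketch in Python, assuming the BAR base addresses and sizes have already been extracted from the stored /proc/bus/pci/devices information; the Bar type, the function name and the example values are hypothetical and are not taken from mmio-parse.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Bar(NamedTuple):
    """One PCI base address region (hypothetical helper type)."""
    index: int
    base: int
    size: int


def to_bar_offset(phys: int, bars: Sequence[Bar]) -> Optional[Tuple[int, int]]:
    """Return (BAR index, offset inside that BAR), or None if no BAR matches."""
    for bar in bars:
        if bar.size and bar.base <= phys < bar.base + bar.size:
            return bar.index, phys - bar.base
    return None


# Made-up example layout, for illustration only.
example_bars = [
    Bar(index=0, base=0xFD000000, size=0x01000000),
    Bar(index=1, base=0xE0000000, size=0x10000000),
]
```

With this layout, to_bar_offset(0xFD000200, example_bars) yields BAR 0 with offset 0x200, and an address outside every region yields None.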
This branch contains a version of mmio-convert written by jwstolk. This tool can convert dump files from one format to another, but the results are not always accurate. If you have an old dump that you cannot redo, take a look at mmio-convert.
2. pq's binformat branch
Here lives the older mmio-trace version which uses a binary on-disk log format. No new development happens here, but it is kept in somewhat working order for people who for one reason or another cannot use the master branch.
3. jrmuizel's git tree
The old source in jrmuizel's git tree, git://people.freedesktop.org/~jrmuizel/mmio-trace, is no longer actively developed. jrmuizel is the one who resurrected mmio-trace and brought it to Nouveau when the developers did not yet have a way to see what the nvidia kernel blob did. pq's git tree is a fork of jrmuizel's master branch as of Aug 24th, 2007.
Usage notes (out-of-kernel-tree)
- You can inject markers (text lines) into the trace log with echo 'X is running' > /proc/mmio-marker
- Only one active CPU is supported; do not use mmio-trace on a multiprocessor system. You can disable extra processors or cores at boot time with a kernel argument, or at run time via sysfs entries.
- Do not restart ./mmio-trace, because it will first read stale data left in the kernel buffers. To clear the kernel relay buffers, unload and reload mmio.ko before starting ./mmio-trace again.
- After tracing, check your kernel log for cpu 0 buffer full!!! errors. If you have any, some events were almost certainly lost and you should redo the trace with bigger relay buffers. The relay buffer size is controlled by the n_subbufs parameter of the mmio kernel module. It gives the number of 256 kB pieces reserved for the buffer; the default value of 128 means 32 MB. Increase the number until the buffer full message no longer appears.
- Low ISA range tracing (experimental, untested): apply 0001-ioremap-do-not-handle-the-low-ISA-range-specially.patch to your kernel, insmod mmio.ko ISA_trace=1, and let PekkaPaalanen know what happened. If you do not patch your kernel and the blob maps the ISA range, your machine will crash.
- Old Nvidia driver with 2.6.25: to build the nvidia driver with 2.6.25 you need the patches from nvnews, http://www.nvnews.net/vbulletin/showthread.php?t=110088.
- With 2.6.25 you will probably also see messages like NVRM: bad caching on address XXXXXXX: actual 0x173 != expected 0x17b in dmesg. This did not happen with 2.6.25-rc6 but does with 2.6.25-rc7, because of commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d546b67a940eb42a99f56b86c5cd8d47c8348c2a.
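The relay buffer sizing from the notes above is simple arithmetic, and the kernel log check is a text search. A small Python sketch, assuming that "256 kB" means 256 * 1024 bytes and that the warning can be recognised by the substring "buffer full"; the function names are made up for illustration.

```python
import math

SUBBUF_BYTES = 256 * 1024  # one sub-buffer; assumes "256 kB" is 256 * 1024 bytes


def relay_buffer_bytes(n_subbufs):
    """Total relay buffer size in bytes for a given n_subbufs value."""
    return n_subbufs * SUBBUF_BYTES


def n_subbufs_for(megabytes):
    """Smallest n_subbufs value that gives at least the requested size."""
    return math.ceil(megabytes * 1024 * 1024 / SUBBUF_BYTES)


def count_buffer_full(kernel_log_text):
    """Count kernel log lines that contain the 'buffer full' warning."""
    return sum("buffer full" in line for line in kernel_log_text.splitlines())
```

For the default, relay_buffer_bytes(128) is 33554432 bytes, which is the 32 MB quoted above. Doubling the buffer to 64 MB corresponds to n_subbufs=256.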
Rules-ng connection
Rules-ng (CVS) is a database definition describing hardware registers and a set of tools for using the database. One of the tools is staticdb. mmio-parse supports staticdb plugins.
Staticdb is a Rules-ng database converted into C code and compiled into a dynamically loaded library. It can convert raw addresses into symbolic register names, track indexed registers, and convert register values into a more human-readable format. To use it, get Rules-ng from CVS and run make in the staticdb/ directory to build the database library. The library is used with the command ./mmio-parse -m NN -s path/to/libnvidia-mmio.so where NN is your card type (a variant, in Rules-ng terms).
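The kind of translation staticdb performs can be pictured as a table lookup from register offset to a name and a value decoder. The Python sketch below only illustrates the idea: the register names, offsets and decoders are invented, and it does not reflect the real staticdb interface or the Rules-ng database contents.

```python
# Hypothetical register table: offset -> (name, value decoder).
EXAMPLE_REGS = {
    0x000000: ("EXAMPLE_ID", lambda v: f"rev {v & 0xFF}"),
    0x000100: ("EXAMPLE_STATUS", lambda v: "busy" if v & 1 else "idle"),
}


def annotate(offset, value, regs=EXAMPLE_REGS):
    """Render one register access with a symbolic name when one is known."""
    entry = regs.get(offset)
    if entry is None:
        # Unknown register: fall back to the raw numbers.
        return f"0x{offset:06x} = 0x{value:08x}"
    name, decode = entry
    return f"{name} = 0x{value:08x} ({decode(value)})"
```

An access to a known offset is shown with its name and decoded value, while an unknown offset falls back to raw hexadecimal.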
To do, suggestions and known issues
- kprobes has a generic instruction decoding facility; use that instead of the homebrewed one (or KVM), and use emulation instead of page faulting
- kmemcheck may grow per-cpu page table support thanks to the PaX team, copy that
- copy other useful tricks from kmemcheck, like the P4 REP issue fix. "< vegard> you need to toggle a bit in IA32_MISC_MSR or something like that" Test Vegard's patch: 0001-x86-fix-REP-handling-for-mmiotrace.patch
- support large pages
- complete instruction support: get rid of "unknown type"
- support tracing access from user space
- event filtering based on device and BAR, maybe address ranges
- Changes to the log format:
  - backtraces
  - cpu identifier?
- PPC support?
- think about how to trace ISA region; David suggests that commenting out the ISA region checks in arch/i386/mm/ioremap.c would be enough to enable ISA range tracing. Is it really so? Does it break something?
- improve mmio-parse
- (output fifo traces to replace the renouveau data collector (requires tracing user space access))
- make the mapping from PCI resources to staticdb plugins configurable (currently hardcoded for Nvidia BAR 0)