From 0ff15b41edf1dfd50776877edce7cae6e757574f Mon Sep 17 00:00:00 2001 From: Andy Bonventre Date: Tue, 22 Sep 2015 17:29:52 -0400 Subject: [Docs] add markdown docs (converted from Wiki) BUG=none R=mark CC=google-breakpad-dev@googlegroups.com Review URL: https://codereview.chromium.org/1357773004 . Patch from Andy Bonventre . --- docs/processor_design.md | 230 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 230 insertions(+) create mode 100644 docs/processor_design.md (limited to 'docs/processor_design.md') diff --git a/docs/processor_design.md b/docs/processor_design.md new file mode 100644 index 00000000..c2af41a1 --- /dev/null +++ b/docs/processor_design.md @@ -0,0 +1,230 @@ +# Breakpad Processor Library + +## Objective + +The Breakpad processor library is an open-source framework to access the +information contained within crash dumps for multiple platforms, and to use that +information to produce stack traces showing the call chain of each thread in a +process. After processing, this data is made available to users of the library. + +## Background + +The Breakpad processor is intended to sit at the core of a comprehensive +crash-reporting system that does not require debugging information to be +provided to those running applications being monitored. Some existing +crash-reporting systems, such as [GNOME](http://www.gnome.org/)’s Bug-Buddy and +[Apple](http://www.apple.com/)’s [CrashReporter] +(http://developer.apple.com/technotes/tn2004/tn2123.html), require symbolic +information to be present on the end user’s computer; in the case of +CrashReporter, the reports are transmitted only to Apple, not to third-party +developers. 
Other systems, such as [Microsoft](http://www.microsoft.com/)’s +[Windows Error Reporting](http://msdn.microsoft.com/isv/resources/wer/) and +SupportSoft’s Talkback, transmit only a snapshot of a crashed process’ state, +which can later be combined with symbolic debugging information without the need +for it to be present on end users’ computers. Because symbolic debugging +information consumes a large amount of space and is otherwise not needed during +the normal operation of software, and because some developers are reluctant to +release debugging symbols to their customers, Breakpad follows the latter +approach. + +We know of no currently-maintained crash-reporting systems that meet our +requirements, which are to: * allow for symbols to be separate from the +application, * handle crash reports from multiple platforms, * allow developers +to operate their own crash-reporting platform, and to * be open-source. Windows +Error Reporting only functions for Microsoft products, and requires the +involvement of Microsoft’s servers. Talkback, while cross-platform, has not been +maintained and at this point does not support Mac OS X on x86, which we consider +to be a significant platform. Talkback is also closed-source commercial +software, and has very specific requirements for its server platform. + +We are aware of Windows-only crash-reporting systems that leverage Microsoft’s +debugging interfaces. Such systems, even if extended to support dumps from other +platforms, are tied to using Windows for at least a portion of the processor +platform. + +## Overview + +The Breakpad processor itself is written in standard C++ and will work on a +variety of platforms. The dumps it accepts may also have been created on a +variety of systems. The library is able to combine dumps with symbolic debugging +information to create stack traces that include function signatures. 
The +processor library includes simple command-line tools to examine dumps and +process them, producing stack traces. It also exposes several layers of APIs +enabling crash-reporting systems to be built around the Breakpad processor. + +## Detailed Design + +### Dump Files + +In the processor, the dump data is of primary significance. Dumps typically +contain: + +* CPU context (register data) as it was at the time the crash occurred, and an + indication of which thread caused the crash. General-purpose registers are + included, as are special-purpose registers such as the instruction pointer + (program counter). +* Information about each thread of execution within a crashed process, + including: + * The memory region used for each thread’s stack. + * CPU context for each thread, which for various reasons is not the same + as the crash context in the case of the crashed thread. +* A list of loaded code segments (or modules), including: + * The name of the file (`.so`, `.exe`, `.dll`, etc.) which provides the + code. + * The boundaries of the memory region in which the code segment is visible + to the process. + * A reference to the debugging information for the code module, when such + information is available. + +Ordinarily, dumps are produced as a result of a crash, but other triggers may be +set to produce dumps at any time a developer deems appropriate. The Breakpad +processor can handle dumps in the minidump format, either generated by a +[Breakpad client “handler”](client_design.md) implementation, or by another +implementation that produces dumps in this format. The +[DbgHelp.dll!MiniDumpWriteDump] +(http://msdn2.microsoft.com/en-us/library/ms680360.aspx) function on Windows +produces dumps in this format, and is the basis for the Breakpad handler +implementation on that platform. + +The [minidump format] +(http://msdn.microsoft.com/en-us/library/ms679293%28VS.85%29.aspx) is +essentially a simple container format, organized as a series of streams. 
Each +stream contains some type of data relevant to the crash. A typical “normal” +minidump contains streams for the thread list, the module list, the CPU context +at the time of the crash, and various bits of additional system information. +Other types of minidump can be generated, such as a full-memory minidump, which +in addition to stack memory contains snapshots of all of a process’ mapped +memory regions. + +The minidump format was chosen as Breakpad’s dump format because it has an +established track record on Windows, and it can be adapted to meet the needs of +the other platforms that Breakpad supports. Most other operating systems use +“core” files as their native dump formats, but the capabilities of core files +vary across platforms, and because core files are usually presented in a +platform’s native executable format, there are complications involved in +accessing the data contained therein without the benefit of the header files +that define an executable format’s entire structure. Because minidumps are +leaner than a typical executable format, a redefinition of the format in a +cross-platform header file, `minidump_format.h`, was a straightforward task. +Similarly, the capabilities of the minidump format are understood, and because +it provides an extensible container, any of Breakpad’s needs that could not be +met directly by the standard minidump format could likely be met by extending it +as needed. Finally, using this format means that the dump file is compatible +with native debugging tools at least on Windows. A possible future avenue for +exploration is the conversion of minidumps to core files, to enable this same +benefit on other platforms. + +We have already provided an extension to the minidump format that allows it to +carry dumps generated on systems with PowerPC processors. 
The format already +allows for variable CPUs, so our work in this area was limited to defining a +context structure sufficient to represent the execution state of a PowerPC. We +have also defined an extension that allows minidumps to indicate which thread of +execution requested a dump be produced for non-crash dumps. + +Often, the information contained within a dump alone is sufficient to produce a +full stack backtrace for each thread. Certain optimizations that compilers +employ in producing code frustrate this process. Specifically, the “frame +pointer omission” optimization of x86 compilers can make it impossible to +produce useful stack traces given only a stack snapshot and CPU context. In +these cases, however, compiler-emitted debugging information can aid in +producing useful stack traces. The Breakpad processor is able to take advantage +of this debugging information as supplied by Microsoft’s C/C++ compiler, the +only compiler to apply such optimizations by default. As a result, the Breakpad +processor can produce useful stack traces even from code with frame pointer +omission optimizations as produced by this compiler. + +### Symbol Files + +The [symbol files](symbol_files.md) that the Breakpad processor accepts allow +for frame pointer omission data, but this is only one of their capabilities. +Each symbol file also includes information about the functions, source files, +and source code line numbers for a single module of code. A module is an +individually-loadable chunk of code: these can be executables containing a main +program (`.exe` files on Windows) or shared libraries (`.so` files on Linux, +`.dylib` files, frameworks, and bundles on Mac OS X, and `.dll` files on +Windows). 
Dumps contain information about which of these modules were loaded at +the time the dump was produced, and given this information, the Breakpad +processor attempts to locate debugging symbols for the module through a +user-supplied function embodied in a “symbol supplier.” Breakpad includes a +sample symbol supplier, called `SimpleSymbolSupplier`, that is used by its +command-line tools; this supplier locates symbol files by pathname. +`SimpleSymbolSupplier` is also available to other users of the Breakpad +processor library. This allows for the use of a simple reference implementation, +but preserves flexibility for users who may have more demanding symbol file +storage needs. + +Breakpad’s symbol file format is text-based, and was defined to be fairly +human-readable and to encompass the needs of multiple platforms. The Breakpad +processor itself does not operate directly with native symbol formats ([DWARF] +(http://dwarf.freestandards.org/) and [STABS] +(http://sourceware.org/gdb/current/onlinedocs/stabs.html) on most Unix-like +systems, [.pdb files] +(http://msdn2.microsoft.com/en-us/library/yd4f8bd1(VS.80).aspx) on Windows), +because of the complications in accessing potentially complex symbol formats +with slight variations between platforms, stored within different types of +binary formats. In the case of `.pdb` files, the debugging format is not even +documented. Instead, Breakpad’s symbol files are produced on each platform, +using specific debugging APIs where available, to convert native symbols to +Breakpad’s cross-platform format. + +### Processing + +Most commonly, a developer will enable an application to use Breakpad by +building it with a platform-specific [client “handler”](client_design.md) +library. 
After building the application, the developer will create symbol files +for Breakpad’s use using the included `dump_syms` or `symupload` tools, or +another suitable tool, and place the symbol files where the processor’s symbol +supplier will be able to locate them. + +When a dump file is given to the processor’s `MinidumpProcessor` class, it will +read it using its included minidump reader, contained in the `Minidump` family +of classes. It will collect information about the operating system and CPU that +produced the dump, and determine whether the dump was produced as a result of a +crash or at the direct request of the application itself. It then loops over all +of the threads in a process, attempting to walk the stack associated with each +thread. This process is achieved by the processor’s `Stackwalker` components, of +which there are slightly different implementations for each CPU type that the +processor is able to handle dumps from. Beginning with a thread’s context, and +possibly using debugging data, the stackwalker produces a list of stack frames, +containing each instruction executed in the chain. These instructions are +matched up with the modules that contributed them to a process, and the +`SymbolSupplier` is invoked to locate a symbol file. The symbol file is given to +a `SourceLineResolver`, which matches the instruction up with a specific +function name, source file, and line number, resulting in a representation of a +stack frame that can easily be used to identify which code was executing. + +The results of processing are made available in a `ProcessState` object, which +contains a vector of threads, each containing a vector of stack frames. + +For small-scale use of the Breakpad processor, and for testing and debugging, +the `minidump_stackwalk` tool is provided. 
It invokes the processor and displays +the full results of processing, optionally allowing symbols to be provided to +the processor by a pathname-based symbol supplier, `SimpleSymbolSupplier`. + +For lower-level testing and debugging, the processor library also includes a +`minidump_dump` tool, which walks through an entire minidump file and displays +its contents in somewhat readable form. + +### Platform Support + +The Breakpad processor library is able to process dumps produced on Mac OS X +systems running on x86, x86-64, and PowerPC processors, on Windows and Linux +systems running on x86 or x86-64 processors, and on Android systems running ARM +or x86 processors. The processor library itself is written in standard C++, and +should function properly in most Unix-like environments. It has been tested on +Linux and Mac OS X. + +## Future Plans + +There are currently no firm plans or timetables to implement any of these +features, although they are possible avenues for future exploration. + +The symbol file format can be extended to carry information about the locations +of parameters and local variables as stored in stack frames and registers, and +the processor can use this information to provide enhanced stack traces showing +function arguments and variable values. + +On Mac OS X and Linux, we can provide tools to convert files from the minidump +format into the native core format. This will enable developers to open dump +files in a native debugger, just as they are presently able to do with minidumps +on Windows. -- cgit v1.2.1
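
As a rough illustration of the "series of streams" container described under Dump Files, the sketch below reads a minidump header and lists its stream directory entries. It is not Breakpad's reader (that lives in the `Minidump` family of C++ classes). The field layout follows Microsoft's published `MINIDUMP_HEADER` and `MINIDUMP_DIRECTORY` structures, and the code assumes a little-endian dump. The names `list_streams` and `build_sample` are hypothetical helpers introduced only for this example, and the sample data is synthetic.

```python
import struct

# The bytes "MDMP" read as a little-endian 32-bit integer.
MINIDUMP_SIGNATURE = 0x504D444D

# Header fields: signature, version, stream_count, stream_directory_rva,
# checksum, time_date_stamp (32 bits each), then 64-bit flags.
HEADER = struct.Struct("<IIIIIIQ")

# Directory entry fields: stream_type, data_size, rva (offset from file start).
DIRECTORY_ENTRY = struct.Struct("<III")

# A few well-known stream type values from the minidump format.
STREAM_NAMES = {3: "ThreadList", 4: "ModuleList", 5: "MemoryList",
                6: "Exception", 7: "SystemInfo"}


def list_streams(data):
    """Return a (stream_type, data_size, rva) tuple per directory entry."""
    if len(data) < HEADER.size:
        raise ValueError("too short to be a minidump")
    signature, _version, stream_count, directory_rva, *_ = HEADER.unpack_from(data, 0)
    if signature != MINIDUMP_SIGNATURE:
        raise ValueError("missing MDMP signature")
    return [
        DIRECTORY_ENTRY.unpack_from(data, directory_rva + i * DIRECTORY_ENTRY.size)
        for i in range(stream_count)
    ]


def build_sample():
    """Build a tiny synthetic dump with two empty streams (illustration only)."""
    # The directory is placed immediately after the 32-byte header.
    header = HEADER.pack(MINIDUMP_SIGNATURE, 0xA793, 2, HEADER.size, 0, 0, 0)
    directory = DIRECTORY_ENTRY.pack(3, 0, 0) + DIRECTORY_ENTRY.pack(4, 0, 0)
    return header + directory


if __name__ == "__main__":
    for stream_type, size, rva in list_streams(build_sample()):
        name = STREAM_NAMES.get(stream_type, "Unknown")
        print(f"{name}: type={stream_type} size={size} rva={rva}")
```

A real reader would go on to interpret each stream's payload (for example, the thread list and module list), which is the part that differs between stream types.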