mac dump_syms: Support .dSYMs > 4GB (partially)

Even 64-bit Mach-O (MH_MAGIC_64 = 0xfeedfacf) is not a fully 64-bit file format. File offsets in sections are stored in 32-bit fields, with Mach-O writers typically truncating offsets too large to fit to just their low 32 bits. When a section begins at a file offset >= 4GB, dump_syms would produce an error such as:

    Google Chrome Framework.dSYM/Contents/Resources/DWARF/Google Chrome Framework: the section '__apple_names' in segment '__DWARF' claims its contents lie outside the segment's contents

As a workaround, this implements the strategy I first described in https://crbug.com/940823#c22. Segment file offsets are stored in 64-bit fields. Because segments contain sections and must load contiguously, it’s possible to infer a section’s actual offset by computing its load address relative to its containing segment’s load address, and treating this as an offset into the containing segment’s file offset. For safety, this is only done for 64-bit segments (LC_SEGMENT_64) where the 32-bit section offset stored in the Mach-O file is equal to the low (truncated) 32 bits of the section offset recomputed per the above strategy.

Beware that this does not provide full “large file” support for 64-bit Mach-O files. There are other file offsets within Mach-O files aside from section file offsets that are stored in 32-bit fields even in the 64-bit format, including offsets to symbol table data (LC_SYMTAB and LC_DYSYMTAB). No attempt is made to recover correct file offsets for such data because, at present, such data is always stored by dsymutil near the beginning of .dSYM files, within the first 4GB. If it becomes necessary to address these other offsets, it should be possible to recover these offsets by reference to the __LINKEDIT segment that normally contains them, provided that __LINKEDIT doesn’t span more than 4GB, according to the strategy discussed at the bottom of https://crbug.com/940823#c22.

Although this is sufficient to allow dump_syms to interpret Chromium .dSYM files that exceed 4GB, be warned that these Mach-O files are still technically malformed, and most other tools that consume Mach-O files will continue to have difficulties interpreting these large files.

As further warning, note that should any individual DWARF section exceed 4GB, internal section offsets will be truncated irrecoverably, unless and until the toolchain implements support for DWARF64. https://bugs.llvm.org/show_bug.cgi?id=14969

With this change, dump_syms is able to correctly recover file offsets from and continue processing a .dSYM file with length 4530593528 (4321MB), whose largest section (__DWARF,__debug_info = .debug_info) has size 0x8d64c0b8 (2262MB), and which contains four sections (starting with __DWARF,__apple_names) beginning at file offsets >= 4GB.

Bug: chromium:940823, chromium:946404
Change-Id: I23f5f3b07773fa2f010204d5bb53b6fb1d4926f7
Reviewed-on: https://chromium-review.googlesource.com/c/breakpad/breakpad/+/1541830
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
author: Mark Mentovai <mark@chromium.org> 2019-03-28 16:07:39 -0400
committer: Mark Mentovai <mark@chromium.org> 2019-03-28 20:43:54 +0000
commit: b4a0eb2d06fa69a29fc0c87758fe294ccf5a7d99 (patch)
tree: ee28a1fc9b43534693d47e037a3eef426d5997b4 /src/common
parent: Fix dump_syms unit tests on Windows. (diff)
download: breakpad-b4a0eb2d06fa69a29fc0c87758fe294ccf5a7d99.tar.xz
2 files changed, 34 insertions, 10 deletions
diff --git a/src/common/mac/macho_reader.cc b/src/common/mac/macho_reader.cc
index 52f3c411..b4e3192a 100644
--- a/src/common/mac/macho_reader.cc
+++ b/src/common/mac/macho_reader.cc
@@ -38,6 +38,8 @@
 #include <stdio.h>
 #include <stdlib.h>
 
+#include <limits>
+
 // Unfortunately, CPU_TYPE_ARM is not define for 10.4.
 #if !defined(CPU_TYPE_ARM)
 #define CPU_TYPE_ARM 12
@@ -344,8 +346,8 @@ bool Reader::WalkLoadCommands(Reader::LoadCommandHandler *handler) const {
         cursor
             .Read(word_size, false, &segment.vmaddr)
             .Read(word_size, false, &segment.vmsize)
-            .Read(word_size, false, &file_offset)
-            .Read(word_size, false, &file_size);
+            .Read(word_size, false, &segment.fileoff)
+            .Read(word_size, false, &segment.filesize);
         cursor >> segment.maxprot
                >> segment.initprot
                >> segment.nsects
@@ -354,8 +356,8 @@ bool Reader::WalkLoadCommands(Reader::LoadCommandHandler *handler) const {
           reporter_->LoadCommandTooShort(index, type);
           return false;
         }
-        if (file_offset > buffer_.Size() ||
-            file_size > buffer_.Size() - file_offset) {
+        if (segment.fileoff > buffer_.Size() ||
+            segment.filesize > buffer_.Size() - segment.fileoff) {
           reporter_->MisplacedSegmentData(segment.name);
           return false;
         }
@@ -363,11 +365,11 @@ bool Reader::WalkLoadCommands(Reader::LoadCommandHandler *handler) const {
         // segments removed, and their file offsets and file sizes zeroed
         // out. To help us handle this special case properly, give such
         // segments' contents NULL starting and ending pointers.
-        if (file_offset == 0 && file_size == 0) {
+        if (segment.fileoff == 0 && segment.filesize == 0) {
           segment.contents.start = segment.contents.end = NULL;
         } else {
-          segment.contents.start = buffer_.start + file_offset;
-          segment.contents.end = segment.contents.start + file_size;
+          segment.contents.start = buffer_.start + segment.fileoff;
+          segment.contents.end = segment.contents.start + segment.filesize;
         }
         // The section list occupies the remainder of this load command's space.
         segment.section_list.start = cursor.here();
@@ -461,14 +463,14 @@ bool Reader::WalkSegmentSections(const Segment &segment,
   for (size_t i = 0; i < segment.nsects; i++) {
     Section section;
     section.bits_64 = segment.bits_64;
-    uint64_t size;
-    uint32_t offset, dummy32;
+    uint64_t size, offset;
+    uint32_t dummy32;
     cursor
         .CString(&section.section_name, 16)
         .CString(&section.segment_name, 16)
         .Read(word_size, false, &section.address)
         .Read(word_size, false, &size)
-        >> offset
+        .Read(sizeof(uint32_t), false, &offset)  // clears high bits of |offset|
         >> section.align
         >> dummy32
         >> dummy32
@@ -481,6 +483,24 @@ bool Reader::WalkSegmentSections(const Segment &segment,
       reporter_->SectionsMissing(segment.name);
       return false;
     }
+
+    // Even 64-bit Mach-O isn’t a true 64-bit format in that it doesn’t handle
+    // 64-bit file offsets gracefully. Segment load commands do contain 64-bit
+    // file offsets, but sections within do not. Because segments load
+    // contiguously, recompute each section’s file offset on the basis of its
+    // containing segment’s file offset and the difference between the section’s
+    // and segment’s load addresses. If truncation is detected, honor the
+    // recomputed offset.
+    if (segment.bits_64 &&
+        segment.fileoff + segment.filesize >
+            std::numeric_limits<uint32_t>::max()) {
+      const uint64_t section_offset_recomputed =
+          segment.fileoff + section.address - segment.vmaddr;
+      if (offset == static_cast<uint32_t>(section_offset_recomputed)) {
+        offset = section_offset_recomputed;
+      }
+    }
+
     const uint32_t section_type = section.flags & SECTION_TYPE;
     if (section_type == S_ZEROFILL || section_type == S_THREAD_LOCAL_ZEROFILL ||
             section_type == S_GB_ZEROFILL) {
diff --git a/src/common/mac/macho_reader.h b/src/common/mac/macho_reader.h
index 30db742d..145d17d1 100644
--- a/src/common/mac/macho_reader.h
+++ b/src/common/mac/macho_reader.h
@@ -175,6 +175,10 @@ struct Segment {
   // of this value are valid.
   uint64_t vmsize;
 
+  // The file offset and size of the segment in the Mach-O image.
+  uint64_t fileoff;
+  uint64_t filesize;
+
   // The maximum and initial VM protection of this segment's contents.
   uint32_t maxprot;
   uint32_t initprot;
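
The offset recovery added to Reader::WalkSegmentSections can be tried in isolation. The following is a minimal sketch of that logic, not Breakpad code: `SegmentInfo` and `RecoverSectionOffset` are hypothetical names introduced only for illustration, and the struct carries just the fields the recomputation needs.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the Segment fields used by the patch; this is
// not Breakpad's Segment struct.
struct SegmentInfo {
  bool bits_64;       // true for LC_SEGMENT_64
  uint64_t vmaddr;    // segment load address
  uint64_t fileoff;   // segment file offset (64-bit in LC_SEGMENT_64)
  uint64_t filesize;  // segment size in the file
};

// Hypothetical helper mirroring the check in the diff. Returns the section's
// file offset, widened to 64 bits when the stored 32-bit value matches the
// low 32 bits of the recomputed offset; otherwise returns the stored value
// unchanged.
uint64_t RecoverSectionOffset(const SegmentInfo& segment,
                              uint64_t section_address,
                              uint32_t stored_offset) {
  uint64_t offset = stored_offset;
  // Only 64-bit segments that extend past the 4GB mark can hold a section
  // whose offset was truncated.
  if (segment.bits_64 &&
      segment.fileoff + segment.filesize >
          std::numeric_limits<uint32_t>::max()) {
    // Segments load contiguously, so the section's distance from the
    // segment's load address equals its distance from the segment's file
    // offset.
    const uint64_t recomputed =
        segment.fileoff + section_address - segment.vmaddr;
    // Honor the recomputed offset only if it agrees with the stored low
    // 32 bits.
    if (offset == static_cast<uint32_t>(recomputed)) {
      offset = recomputed;
    }
  }
  return offset;
}
```

For example, with an assumed segment at file offset 0x1000 of size 0x120000000, loaded at 0x100000000, a section loaded at 0x200000500 lies 0x100000500 bytes into the segment, so its true file offset is 0x100001500. A writer that truncates would store 0x1500, and the helper widens that back to 0x100001500. A stored offset that does not match the recomputed low 32 bits is left alone, which is the safety condition described in the commit message.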