Hasty Briefs

Reverse Engineering iWork

a day ago
  • #Protocol Buffers
  • #Swift
  • #iWork
  • The app in question ingests files but lacked a way to parse .key, .numbers, or .pages files without first exporting them to PDF or another format.
  • The author built a parser that reads iWork files natively, with no conversion step and no server-side processing. The idea came from an earlier project that ported Perl to WebAssembly to extract metadata client-side.
  • Apple switched the iWork document format from XML to a binary format based on Google’s Protocol Buffers in 2013, likely to optimize performance for early iPhones and iPads.
  • The parser relies on protobuf message descriptors recovered from Apple's Pages, Keynote, and Numbers executables; these descriptors define the structure of every message type in the documents.
  • The parsing process involves decompressing Snappy-compressed chunks, handling Apple's custom Snappy implementation, and processing protobuf messages with type IDs mapped to Swift classes.
  • Each document contains a main Index.zip (or an equivalent directory) holding the .iwa files, plus metadata and referenced media files; a two-pass loading system merges incremental updates.
  • The parser supports document elements such as images, media (audio, video, 3D models), equations, tables, shapes, and charts, each handled according to its own data structures and metadata.
  • The parser also handles style inheritance, resolving each style through its chain of parent styles, and spatial information, positioning elements on an infinite canvas.
  • A visitor protocol provides callbacks for document traversal, offering fully resolved styles and decoded content in the correct reading order.
  • The code is available as a Swift package on GitHub, with documentation covering the visitor protocol and common use cases, though some features, such as support for the legacy XML format, are not yet implemented.
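The brief says .iwa payloads are Protocol Buffers messages. Any parser for them therefore needs to read the standard protobuf wire format. The minimal Swift sketch below covers the core pieces: base-128 varints, and field keys packed as (fieldNumber << 3) | wireType. ProtoReader and WireError are hypothetical names for illustration, not code from the author's package.

```swift
/// Errors raised while reading the protobuf wire format.
enum WireError: Error {
    case truncated
    case varintTooLong
}

/// Minimal cursor over protobuf-encoded bytes (hypothetical helper, not the package's API).
struct ProtoReader {
    let bytes: [UInt8]
    private(set) var offset = 0

    init(_ bytes: [UInt8]) {
        self.bytes = bytes
    }

    var isAtEnd: Bool { offset >= bytes.count }

    /// Reads a base-128 varint: 7 payload bits per byte, least-significant group
    /// first, with the high bit set on every byte except the last.
    mutating func readVarint() throws -> UInt64 {
        var result: UInt64 = 0
        var shift: UInt64 = 0
        while true {
            guard offset < bytes.count else { throw WireError.truncated }
            guard shift < 64 else { throw WireError.varintTooLong }
            let byte = bytes[offset]
            offset += 1
            result |= UInt64(byte & 0x7F) << shift
            if byte & 0x80 == 0 { return result }
            shift += 7
        }
    }

    /// Reads a field key and splits it into (field number, wire type).
    mutating func readKey() throws -> (field: UInt64, wireType: UInt8) {
        let key = try readVarint()
        return (key >> 3, UInt8(key & 0x07))
    }

    /// Reads a length-delimited payload (wire type 2): a varint length, then that many bytes.
    mutating func readLengthDelimited() throws -> ArraySlice<UInt8> {
        let rawLength = try readVarint()
        guard rawLength <= UInt64(bytes.count - offset) else { throw WireError.truncated }
        let end = offset + Int(rawLength)
        let payload = bytes[offset..<end]
        offset = end
        return payload
    }
}
```

For instance, the bytes 08 96 01 decode as field 1, wire type 0 (varint), value 150. A production reader would also need to skip fixed32 and fixed64 fields, and any fields it does not recognize.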
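The brief does not say how the descriptors are extracted from the executables. One common heuristic is to scan the binary for embedded serialized FileDescriptorProto messages. Such a message starts with field 1 (name, a string), which is encoded as:

- the key byte 0x0A,
- a length byte,
- a file name ending in ".proto".

The sketch below only locates candidate start offsets. It assumes names shorter than 128 bytes, so the length fits in a single byte, and it is not presented as the author's method. findDescriptorCandidates is a hypothetical name.

```swift
/// Scans a binary for offsets where a serialized FileDescriptorProto appears to start.
/// Heuristic: field 1 (`name`, wire type 2) is encoded as key byte 0x0A, then a
/// one-byte length (names under 128 bytes assumed), then a name ending in ".proto".
func findDescriptorCandidates(in binary: [UInt8]) -> [(offset: Int, name: String)] {
    let suffix = Array(".proto".utf8)
    var results: [(offset: Int, name: String)] = []
    var i = 0
    while i + 2 < binary.count {
        let length = Int(binary[i + 1])
        let nameStart = i + 2
        let nameEnd = nameStart + length
        if binary[i] == 0x0A, length > suffix.count, length < 0x80, nameEnd <= binary.count,
           Array(binary[(nameEnd - suffix.count)..<nameEnd]) == suffix {
            let name = String(decoding: binary[nameStart..<nameEnd], as: UTF8.self)
            results.append((offset: i, name: name))
            // Skip past the name; a real extractor would parse the rest of the message here.
            i = nameEnd
        } else {
            i += 1
        }
    }
    return results
}
```

A full extractor would then parse forward from each offset to find where the descriptor ends, and decode it against Google's descriptor.proto schema.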
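The decompression step can be sketched in two parts:

- a raw Snappy block decoder, written from Google's published Snappy format description;
- a chunk splitter for the .iwa stream.

The chunk layout is an assumption taken from public reverse-engineering notes on iWork, not from this brief. Each chunk is assumed to start with a 4-byte header: a 0x00 type byte followed by a 24-bit little-endian length. Unlike Google's standard Snappy framing format, there is assumed to be no stream identifier and no CRC-32C checksum. Both function names are hypothetical.

```swift
enum SnappyError: Error {
    case malformed(String)
}

/// Decodes one raw Snappy block: a varint uncompressed length, then a sequence
/// of literal and back-reference ("copy") elements.
func snappyDecompressBlock(_ input: ArraySlice<UInt8>) throws -> [UInt8] {
    var pos = input.startIndex

    // Reads `count` bytes as a little-endian unsigned integer.
    func readLE(_ count: Int) throws -> Int {
        guard input.endIndex - pos >= count else { throw SnappyError.malformed("truncated") }
        var value = 0
        for i in 0..<count { value |= Int(input[pos + i]) << (8 * i) }
        pos += count
        return value
    }

    // Preamble: the uncompressed length as a varint (at most 5 bytes).
    var expected = 0
    var shift = 0
    while true {
        guard pos < input.endIndex, shift <= 28 else { throw SnappyError.malformed("bad preamble") }
        let byte = input[pos]
        pos += 1
        expected |= Int(byte & 0x7F) << shift
        if byte & 0x80 == 0 { break }
        shift += 7
    }

    var out: [UInt8] = []
    out.reserveCapacity(expected)
    while pos < input.endIndex {
        let tag = input[pos]
        pos += 1
        let length: Int
        let offset: Int
        switch tag & 0x03 {
        case 0x00:
            // Literal: (length - 1) in the upper 6 bits, or in 1-4 extra bytes when that field is 60-63.
            var literalLength = Int(tag >> 2)
            if literalLength >= 60 { literalLength = try readLE(literalLength - 59) }
            literalLength += 1
            guard input.endIndex - pos >= literalLength else { throw SnappyError.malformed("literal overrun") }
            out.append(contentsOf: input[pos..<pos + literalLength])
            pos += literalLength
            continue
        case 0x01:
            // Copy with 1-byte offset: length 4-11 in bits 2-4, offset high bits in bits 5-7.
            length = Int((tag >> 2) & 0x07) + 4
            let low = try readLE(1)
            offset = (Int(tag >> 5) << 8) | low
        case 0x02:
            // Copy with 2-byte little-endian offset: length 1-64 in the upper 6 bits.
            length = Int(tag >> 2) + 1
            offset = try readLE(2)
        default:
            // Copy with 4-byte little-endian offset.
            length = Int(tag >> 2) + 1
            offset = try readLE(4)
        }
        guard offset > 0, offset <= out.count else { throw SnappyError.malformed("bad offset") }
        // Back-references may overlap the bytes they produce, so copy one byte at a time.
        let start = out.count - offset
        for i in 0..<length { out.append(out[start + i]) }
    }
    guard out.count == expected else { throw SnappyError.malformed("length mismatch") }
    return out
}

/// Splits an .iwa byte stream into chunks and decompresses each one.
/// ASSUMPTION: every chunk starts with a 4-byte header (type byte 0x00, then a
/// 24-bit little-endian payload length) and carries no CRC or stream identifier.
func decompressIWA(_ data: [UInt8]) throws -> [UInt8] {
    var output: [UInt8] = []
    var pos = 0
    while pos < data.count {
        guard data.count - pos >= 4, data[pos] == 0x00 else {
            throw SnappyError.malformed("bad chunk header")
        }
        let length = Int(data[pos + 1]) | Int(data[pos + 2]) << 8 | Int(data[pos + 3]) << 16
        pos += 4
        guard data.count - pos >= length else { throw SnappyError.malformed("truncated chunk") }
        let chunk = try snappyDecompressBlock(data[pos..<pos + length])
        output.append(contentsOf: chunk)
        pos += length
    }
    return output
}
```

The byte-by-byte copy matters because a back-reference may overlap its own output. For example, an offset of 1 with a length of 10 repeats the previous byte ten times.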
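Mapping protobuf type IDs to Swift classes can be modeled as a registry from numeric IDs to metatypes. The sketch below makes several assumptions:

- ArchiveObject, TypeRegistry, and the example types are hypothetical.
- IDs 1 and 2 are placeholders, not real iWork type numbers.
- In the real format, each message's type ID would be read from the .iwa stream alongside the message.

The package's actual types will differ.

```swift
/// A decoded archive object (hypothetical protocol, not the package's API).
protocol ArchiveObject {
    static var typeID: UInt32 { get }
    init(payload: [UInt8]) throws
}

/// Illustrative message types; the IDs are placeholders, not real iWork type numbers.
struct ExampleTextArchive: ArchiveObject {
    static let typeID: UInt32 = 1
    let text: String
    init(payload: [UInt8]) throws {
        // A real type would decode protobuf fields here; this one treats the payload as UTF-8.
        text = String(decoding: payload, as: UTF8.self)
    }
}

struct ExampleBlobArchive: ArchiveObject {
    static let typeID: UInt32 = 2
    let byteCount: Int
    init(payload: [UInt8]) throws {
        byteCount = payload.count
    }
}

/// Maps numeric type IDs to the Swift types that decode them.
struct TypeRegistry {
    private var types: [UInt32: any ArchiveObject.Type] = [:]

    mutating func register(_ type: any ArchiveObject.Type) {
        types[type.typeID] = type
    }

    /// Returns nil for unregistered IDs so callers can skip message types they do not support.
    func decode(typeID: UInt32, payload: [UInt8]) throws -> (any ArchiveObject)? {
        guard let type = types[typeID] else { return nil }
        return try type.init(payload: payload)
    }
}
```

Returning nil for unknown IDs lets a parser skip message types it does not model yet, instead of failing on the whole document.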
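The brief mentions a two-pass loading system for merging incremental updates but does not describe it. One plausible shape, shown purely as an assumption, works like this:

- The first pass indexes every archived object by identifier. A record that reappears later overrides the earlier fields and reference list.
- The second pass resolves references only after all updates are applied, so an object can point at one defined in a later file.

ArchiveRecord, indexRecords, and resolveChildren are hypothetical names.

```swift
/// One archived object as read from an .iwa file (hypothetical model).
struct ArchiveRecord {
    let identifier: UInt64
    var fields: [String: String]
    var references: [UInt64]
}

/// Pass 1: index every record by identifier. A record that reappears in a later
/// file is treated as an incremental update: its fields override earlier values
/// and its reference list replaces the earlier one.
func indexRecords(_ files: [[ArchiveRecord]]) -> [UInt64: ArchiveRecord] {
    var index: [UInt64: ArchiveRecord] = [:]
    for file in files {
        for record in file {
            if var existing = index[record.identifier] {
                existing.fields.merge(record.fields) { _, newer in newer }
                existing.references = record.references
                index[record.identifier] = existing
            } else {
                index[record.identifier] = record
            }
        }
    }
    return index
}

/// Pass 2: resolve references only after every update is applied, so an object
/// may point at one defined in a later file. IDs with no record are dropped.
func resolveChildren(_ index: [UInt64: ArchiveRecord]) -> [UInt64: [ArchiveRecord]] {
    index.mapValues { record in record.references.compactMap { index[$0] } }
}
```

A single-pass loader would fail on the first file below: it references object 2 before object 2 has been read.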
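Resolving a style through its inheritance chain amounts to walking from the style toward its root and keeping the nearest definition of each property. The Style class below is a hypothetical model with string-valued properties, not the package's type.

```swift
/// A named style with its own properties and an optional parent (hypothetical model).
final class Style {
    let name: String
    let parent: Style?
    let properties: [String: String]

    init(name: String, parent: Style? = nil, properties: [String: String] = [:]) {
        self.name = name
        self.parent = parent
        self.properties = properties
    }

    /// Walks from this style up through its ancestors. The nearest definition of
    /// each property wins, so a child overrides its parent and grandparents.
    func resolved() -> [String: String] {
        var result: [String: String] = [:]
        var current: Style? = self
        while let style = current {
            // Keep values already set by a more specific style.
            result.merge(style.properties) { alreadySet, _ in alreadySet }
            current = style.parent
        }
        return result
    }
}
```

Because parent is an immutable reference set at initialization, this model cannot form a cycle. A parser that reads parent links from a file would need its own cycle check.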
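A visitor protocol like the one the brief describes typically exposes one callback per element kind, with default no-op implementations so that a client handles only what it needs. The element cases, callback names, and walk function below are hypothetical and only illustrate the pattern; the package's documentation describes its actual protocol.

```swift
/// Hypothetical decoded elements, already in reading order with styles resolved.
enum Element {
    case paragraph(text: String, style: [String: String])
    case image(name: String, width: Double, height: Double)
    case table(rows: [[String]])
}

/// Callbacks invoked once per element (hypothetical protocol).
protocol DocumentVisitor {
    mutating func visitParagraph(_ text: String, style: [String: String])
    mutating func visitImage(named name: String, width: Double, height: Double)
    mutating func visitTable(_ rows: [[String]])
}

extension DocumentVisitor {
    // Default no-op implementations so a visitor only implements what it needs.
    mutating func visitParagraph(_ text: String, style: [String: String]) {}
    mutating func visitImage(named name: String, width: Double, height: Double) {}
    mutating func visitTable(_ rows: [[String]]) {}
}

/// Dispatches each element, in order, to the matching callback.
func walk<V: DocumentVisitor>(_ elements: [Element], with visitor: inout V) {
    for element in elements {
        switch element {
        case let .paragraph(text, style):
            visitor.visitParagraph(text, style: style)
        case let .image(name, width, height):
            visitor.visitImage(named: name, width: width, height: height)
        case let .table(rows):
            visitor.visitTable(rows)
        }
    }
}

/// Example visitor: collects plain text and ignores images.
struct TextCollector: DocumentVisitor {
    var lines: [String] = []

    mutating func visitParagraph(_ text: String, style: [String: String]) {
        lines.append(text)
    }

    mutating func visitTable(_ rows: [[String]]) {
        // Flatten each row into one tab-separated line.
        lines.append(contentsOf: rows.map { $0.joined(separator: "\t") })
    }
}
```

TextCollector shows one possible use, such as feeding a search index. It overrides only the callbacks it cares about and relies on the defaults for the rest.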