A Fake Shell for Pangenomics

20 hours ago
  • #high-performance
  • #vectorized-interpreter
  • #pangenomics
  • FlatGFA is a zero-copy pangenomics toolkit whose in-memory and on-disk formats are identical, so files can be loaded with mmap; in some cases it runs thousands of times faster than odgi.
  • The two existing ways to compose workflows, CLI shell scripts and the Rust API, are both limited: the CLI restricts composition to files and pipes, which adds overhead, while the Rust API is idiosyncratic and complex for biologists.
  • Flash, a 'fake shell', is a vectorized interpreter that parses shell syntax, translates it to an IR, and opportunistically replaces CLI commands with internal Rust functions, avoiding I/O and enabling optimizations.
  • Flash can mix external resources (files and pipes) with internal data structures (e.g., GFA stores), so unmodified shell scripts run faster.
  • Its optimizations include avoiding round trips through files, replacing instructions with cheaper alternatives, deduplicating loads, and using pre-converted binary formats, yielding a 28× speedup over odgi in an example workflow.
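
The zero-copy idea in the first point can be illustrated with a small sketch. It is written in Python rather than Rust for brevity, and the record layout (two little-endian 32-bit integers per segment) is a made-up stand-in, not FlatGFA's actual format. The point it shows is only that when the on-disk bytes already have the layout the program wants, "loading" reduces to mapping the file and reading fields in place.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: (segment_id: u32, seq_len: u32), little-endian.
# This layout is an assumption for illustration, not FlatGFA's real format.
RECORD = struct.Struct("<II")


def write_store(path, records):
    """Write records in the same flat layout that will later be read in place."""
    with open(path, "wb") as f:
        for seg_id, seq_len in records:
            f.write(RECORD.pack(seg_id, seq_len))


def total_length(path):
    """Map the file and sum the seq_len fields directly from the mapped bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            count = len(m) // RECORD.size
            # unpack_from reads from the mapping at an offset; there is no
            # separate parse step that builds an intermediate copy of the file.
            return sum(
                RECORD.unpack_from(m, i * RECORD.size)[1] for i in range(count)
            )


def demo():
    """Round-trip three toy records through a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.store")
        write_store(path, [(1, 10), (2, 25), (3, 7)])
        return total_length(path)
```

Calling `demo()` writes three toy records and sums their lengths straight from the mapping. A text format such as GFA would instead need to be parsed line by line before any field could be used.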
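
The fake-shell approach described above can be sketched in miniature. Everything here is hypothetical: the `toy` command, its `upper` and `reverse` operations, and the IR shape are invented for illustration and are not Flash's actual commands or internals. The sketch parses shell-like lines, turns recognized commands into IR instructions backed by in-process functions, leaves everything else marked as external, and keeps values in memory so that repeated loads and intermediate files are skipped.

```python
import shlex
from dataclasses import dataclass

# Hypothetical built-ins standing in for internal (Rust) functions.
BUILTINS = {
    "upper": str.upper,
    "reverse": lambda s: s[::-1],
}


@dataclass
class Instr:
    op: str        # "builtin" or "external"
    name: str      # built-in name, or the raw command line for externals
    src: str = ""  # input path (built-ins only)
    dst: str = ""  # output path (built-ins only)


def parse(script):
    """Translate `toy <op> <in> <out>` lines into IR; keep other lines external."""
    ir = []
    for line in script.strip().splitlines():
        words = shlex.split(line)
        if len(words) == 4 and words[0] == "toy" and words[1] in BUILTINS:
            ir.append(Instr("builtin", words[1], words[2], words[3]))
        elif words:
            ir.append(Instr("external", line.strip()))
    return ir


def run(ir, files):
    """Run built-in instructions against `files`, a dict standing in for disk.

    Returns (values, loads, externals): every value now held in memory, the
    number of simulated disk loads, and the command lines a real
    implementation would still hand to a subprocess.
    """
    cache = {}      # path -> in-memory value
    loads = 0
    externals = []
    for ins in ir:
        if ins.op == "external":
            externals.append(ins.name)
            continue
        if ins.src not in cache:
            # First use of this path: load it once and remember it.
            cache[ins.src] = files[ins.src]
            loads += 1
        # The result stays in memory; a later instruction that names ins.dst
        # as its input reuses it instead of round-tripping through a file.
        cache[ins.dst] = BUILTINS[ins.name](cache[ins.src])
    return cache, loads, externals


SCRIPT = """
toy upper graph.txt a.txt
toy reverse a.txt b.txt
toy reverse graph.txt c.txt
sort b.txt
"""
```

Running `run(parse(SCRIPT), {"graph.txt": "acgt"})` reads `graph.txt` once even though two commands name it, and `a.txt` is never reloaded because its value is already in memory. The unrecognized `sort` line is only collected, not executed; a fuller version would have to write any cached value it depends on to disk before spawning it.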