Hasty Briefs (beta)

dgsh – Directed Graph Shell

6 hours ago
  • #data processing
  • #parallel computing
  • #Unix shell
  • dgsh is a Unix-style shell based on bash, designed for constructing sophisticated and efficient big data processing pipelines.
  • It lets you specify pipelines whose non-linear, non-uniform operations form a directed acyclic graph of processes.
  • dgsh extends inter-process communication beyond the single linear pipe: versions of commands such as comm, cat, and paste are adapted to read from multiple inputs or write to multiple outputs.
  • The shell supports multipipe blocks, whose commands execute asynchronously and can therefore run in parallel.
  • dgsh scripts follow bash syntax with added multipipe blocks, allowing recursive composition of commands.
  • Examples demonstrate dgsh's capabilities, including comparing compression utilities, processing Git history, and analyzing C source code metrics.
  • Because the adapted Unix tools accept multiple inputs and produce multiple outputs, they can serve as the points where a pipeline splits and rejoins.
  • Building and installing dgsh requires tools such as make, gcc, and git; running the tests requires additional packages such as wbritish and curl.
  • The suite has been tested on Debian, Ubuntu, FreeBSD, and Mac OS X, with a Cygwin port in progress.
  • Graphviz can visualize dgsh process graphs, which helps when designing and debugging pipelines.
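
To make the idea of a non-linear pipeline concrete, here is a minimal sketch of the compression-comparison example written in plain bash rather than dgsh. It is not dgsh syntax: it emulates by hand, with named pipes and background jobs, the kind of fan-out that the summary says multipipe blocks express directly. The function name `compare_sizes` and the FIFO and file names are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Plain-bash emulation of a small process graph: one input stream is
# copied to two branches that run in parallel, and their results are
# gathered at the end. This is NOT dgsh syntax; all names are illustrative.
set -euo pipefail

compare_sizes() {
    local tmp
    tmp=$(mktemp -d)
    mkfifo "$tmp/in.count" "$tmp/in.gzip"

    # Branch 1: count the bytes of the original input.
    wc -c < "$tmp/in.count" | tr -d ' ' > "$tmp/out.count" &
    # Branch 2: compress the input and count the compressed bytes.
    gzip -c < "$tmp/in.gzip" | wc -c | tr -d ' ' > "$tmp/out.gzip" &

    # Fan-out: tee copies standard input to both named pipes.
    tee "$tmp/in.count" > "$tmp/in.gzip"

    # Wait for both branches, then gather their results in a fixed order.
    wait
    printf 'original\t%s\n' "$(cat "$tmp/out.count")"
    printf 'gzip\t%s\n' "$(cat "$tmp/out.gzip")"
    rm -rf "$tmp"
}

# Demo: push some repetitive text through the graph.
printf 'dgsh %.0s' {1..1000} | compare_sizes
```

The bookkeeping here (creating FIFOs, starting each branch in the background, waiting, then collecting results) is the part a graph-aware shell is meant to take over. Adding a third branch, for example bzip2, means another FIFO, another background job, and another argument to tee.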