dgsh – Directed Graph Shell
7 hours ago
- #data processing
- #parallel computing
- #Unix shell
- dgsh is a Unix-style shell based on bash, designed for constructing sophisticated and efficient big data processing pipelines.
- It allows the specification of pipelines with non-linear, non-uniform operations forming directed acyclic process graphs.
- dgsh introduces new forms of inter-process communication: adapted versions of commands such as comm, cat, and paste accept multiple inputs and produce multiple outputs.
- Multipipe blocks run their commands asynchronously, so the branches of a process graph execute in parallel.
- dgsh scripts follow bash syntax with added multipipe blocks, allowing recursive composition of commands.
- Examples demonstrate dgsh's capabilities, including comparing compression utilities, processing Git history, and analyzing C source code metrics.
- Adapted Unix tools in dgsh support multiple inputs and outputs, enhancing their utility in pipeline construction.
- Building dgsh requires tools such as make, gcc, and git; running the tests additionally requires packages such as wbritish (a British English word list) and curl.
- The suite has been tested on Debian, Ubuntu, FreeBSD, and Mac OS X, with a Cygwin port in progress.
- GraphViz can visualize dgsh process graphs, aiding in pipeline design and debugging.
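
To make the idea of a non-linear process graph concrete, here is a minimal sketch in plain bash of the fan-out and fan-in shape behind the compression comparison mentioned above. It is not dgsh syntax: it emulates one input feeding two concurrent branches with named pipes, background jobs, and `wait`, which is roughly the plumbing a multipipe block is meant to spare you. The function name `compare_sizes` and the choice of gzip as the only compressor are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Plain-bash approximation of a small directed acyclic process graph:
#
#   input --tee--+--> wc -c -----------> original size
#                +--> gzip -c | wc -c -> compressed size
#
# NOTE: this is ordinary bash, not dgsh. compare_sizes is a hypothetical
# helper written for this sketch.

compare_sizes() {
    local input=$1 dir raw gz
    dir=$(mktemp -d)

    # One named pipe per outgoing edge of the fan-out node.
    mkfifo "$dir/raw.fifo" "$dir/gz.fifo"

    # Start both branches in the background so they run concurrently.
    wc -c < "$dir/raw.fifo" > "$dir/raw.out" &
    gzip -c < "$dir/gz.fifo" | wc -c > "$dir/gz.out" &

    # Fan out: tee copies the input to both branches.
    tee "$dir/raw.fifo" < "$input" > "$dir/gz.fifo"

    # Fan in: wait for both branches, then gather their results.
    wait
    read -r raw < "$dir/raw.out"   # read strips any padding wc adds
    read -r gz < "$dir/gz.out"
    printf 'original\t%s\n' "$raw"
    printf 'gzip\t%s\n' "$gz"

    rm -rf "$dir"
}

# Usage (hypothetical file name):
#   compare_sizes access.log
```

Every extra branch in this style needs another named pipe, another background job, and another result file, all managed by hand. That bookkeeping is what grows unwieldy as graphs get larger.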