GT: [experimental] multiplexing tensor framework
- #gpu
- #machine-learning
- #distributed-computing
- GT is an experimental multiplexing tensor framework for distributed GPU computing.
- It rejects the clunky lock-step paradigm used in ML research, embracing dynamic scheduling and asynchronous execution.
- GT consists of three components: clients (users), dispatcher (coordinator), and workers (one per GPU).
- Clients emit pure functional instructions, which the dispatcher rewrites to be GPU-aware and sends to workers.
- Workers asynchronously process instructions, optionally JIT compiling.
- Instruction streams carry annotations: sharding signals and hot-path markers that serve as JIT hints.
- YAML configs supplement these annotations for sharding and compilation; annotations are optional hints that the runtime can safely ignore.
- GT automatically spins up an asynchronous dispatching server and GPU worker in the background.
- Features include high-performance transport (ZeroMQ), autograd support, PyTorch-compatible API, and signal-based sharding.
- Additional features: real-time monitoring, instruction logging, AI-assisted development, and comprehensive documentation.
- Installation is via pip, and usage includes auto-server mode, tensor operations, autograd, and signal-based sharding.
- Examples demonstrate basic tensor operations, signal-based sharding, compilation directives, debug utilities, and visualization.
- GT is designed for simplicity, readability, and collaboration with AI coding assistants.
- Contributions are welcome, with detailed guidelines provided.
- Licensed under MIT.
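The client → dispatcher → worker pipeline described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not GT's actual wire protocol: the `Instruction` record, the `annotations` dict, and the round-robin "GPU-aware rewrite" are all hypothetical names invented for the sketch.

```python
import queue
import threading
from dataclasses import dataclass, field

# Hypothetical instruction record: pure functional (no in-place ops),
# carrying optional annotations the dispatcher may rewrite or ignore.
@dataclass
class Instruction:
    op: str                      # e.g. "add", "mul"
    args: tuple
    annotations: dict = field(default_factory=dict)

def dispatcher(instrs, n_workers):
    """Rewrite instructions to be GPU-aware (here: just tag a worker id)
    and route them to per-worker queues, round-robin."""
    queues = [queue.Queue() for _ in range(n_workers)]
    for i, ins in enumerate(instrs):
        ins.annotations["device"] = i % n_workers  # toy "GPU-aware" rewrite
        queues[i % n_workers].put(ins)
    for q in queues:
        q.put(None)  # sentinel: instruction stream is done
    return queues

def worker(q, results):
    """Asynchronously drain the queue and execute instructions."""
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    while (ins := q.get()) is not None:
        results.append((ins.annotations["device"], ops[ins.op](*ins.args)))

instrs = [Instruction("add", (1, 2)), Instruction("mul", (3, 4))]
queues = dispatcher(instrs, n_workers=2)
results = []
threads = [threading.Thread(target=worker, args=(q, results)) for q in queues]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [(0, 3), (1, 12)]
```

In GT itself the transport between these roles is ZeroMQ rather than in-process queues, and workers may additionally JIT-compile the instructions they receive.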
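Signal-based sharding, as described in the notes, can be illustrated with a toy reduction: a shard signal tells the dispatcher to split an input's leading dimension across workers, each worker computes a partial result, and the partials are combined. The function names and chunking scheme are assumptions for illustration; GT's real signal syntax is not shown in the note.

```python
def shard(data, n_workers):
    """Split data into n_workers contiguous chunks (last chunk may be short)."""
    k = -(-len(data) // n_workers)  # ceiling division
    return [data[i * k:(i + 1) * k] for i in range(n_workers)]

def sharded_sum(data, n_workers):
    """Each 'worker' reduces its shard; the partial results are combined."""
    partials = [sum(chunk) for chunk in shard(data, n_workers)]
    return sum(partials)

print(shard(list(range(10)), 3))        # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(sharded_sum(list(range(10)), 3))  # 45
```

Because the instructions are pure functional, a rewrite like this is safe for the dispatcher to apply (or skip) without changing the result, which is what makes the annotations ignorable hints rather than requirements.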
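The autograd support listed among the features can be sketched with a minimal reverse-mode autodiff class (micrograd-style). This is a generic illustration of the technique, not GT's implementation; the `Value` class and its methods are invented for the sketch.

```python
class Value:
    """Scalar node in a computation graph with reverse-mode gradients."""
    def __init__(self, data, parents=()):
        self.data, self.grad = data, 0.0
        self._parents, self._backward = parents, lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topologically order the graph, then propagate from the output.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x, y = Value(2.0), Value(3.0)
z = x * y + x            # z = x*y + x, so dz/dx = y + 1, dz/dy = x
z.backward()
print(x.grad, y.grad)    # 4.0 2.0
```

A PyTorch-compatible API would expose the same idea through tensor `.backward()` calls rather than a hand-rolled scalar graph.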