Bauplan – Git-for-data pipelines on object storage

a year ago
  • #python
  • #serverless
  • #data-platform
  • Bauplan is a Pythonic data platform for running large-scale data pipelines with Git-for-data versioning over S3 data lakes.
  • It allows running ML workflows, AI applications, and data transformation pipelines without managing infrastructure.
  • Built by ML and data engineers to simplify cloud infrastructure management.
  • Simple: Write pipelines as Python functions without containerization or Spark.
  • Robust: Features Git-for-data and Refs for versioning, reproducibility, and auditability.
  • Pythonic by design: No DSLs, YAML, or Spark required.
  • Work with tables in S3: Convert Parquet and CSV files into Apache Iceberg tables, which support ACID transactions.
  • Git-for-data: Create zero-copy branches for safe collaboration.
  • Serverless pipelines: Run stateless Python functions in the cloud.
  • SQL everywhere: Run SQL queries against any branch and any table in S3.
  • CI/CD for data: Automate testing and deployment of pipelines.
  • Version and reproduce with Refs: Track pipeline runs for reproducibility and audits.
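
The "zero-copy branch" and "Refs" ideas in the list above can be illustrated with a small toy model. This is a conceptual sketch only, not Bauplan's API or implementation: `ToyCatalog`, `Commit`, and every method name below are hypothetical, and the S3 paths are made up. The assumption is that a branch is just a named pointer to an immutable snapshot of table metadata, so creating a branch copies a pointer rather than any data files, and a ref identifies one snapshot that can be read again later.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Optional, Tuple
import itertools


@dataclass(frozen=True)
class Commit:
    """Immutable snapshot: table name -> tuple of data file paths."""
    ref: str
    tables: Mapping[str, Tuple[str, ...]]
    parent: Optional[str]


class ToyCatalog:
    """Hypothetical in-memory catalog; illustrates the idea, not a real system."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.commits: Dict[str, Commit] = {"c0": Commit("c0", {}, None)}
        self.branches: Dict[str, str] = {"main": "c0"}

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        # Zero-copy: only the pointer to the current snapshot is copied.
        if name in self.branches:
            raise ValueError(f"branch {name!r} already exists")
        self.branches[name] = self.branches[from_branch]

    def write_table(self, branch: str, table: str, files: Iterable[str]) -> str:
        # A write never mutates an old snapshot; it adds a new one
        # and moves the branch pointer to it.
        head = self.commits[self.branches[branch]]
        tables = dict(head.tables)
        tables[table] = tuple(files)
        ref = f"c{next(self._ids)}"
        self.commits[ref] = Commit(ref, tables, head.ref)
        self.branches[branch] = ref
        return ref

    def read_table(self, ref_or_branch: str, table: str) -> Tuple[str, ...]:
        # Accepts a branch name or a commit ref, so old states stay readable.
        ref = self.branches.get(ref_or_branch, ref_or_branch)
        return self.commits[ref].tables[table]

    def merge(self, source: str, into: str = "main") -> None:
        # Fast-forward only: the target head must be an ancestor of the source.
        target = self.branches[into]
        ref: Optional[str] = self.branches[source]
        while ref is not None and ref != target:
            ref = self.commits[ref].parent
        if ref is None:
            raise ValueError("branches have diverged; not a fast-forward")
        self.branches[into] = self.branches[source]


catalog = ToyCatalog()
first = catalog.write_table("main", "trips", ["s3://bucket/trips/part-0.parquet"])
catalog.create_branch("dev")  # no data files are copied
catalog.write_table(
    "dev",
    "trips",
    ["s3://bucket/trips/part-0.parquet", "s3://bucket/trips/part-1.parquet"],
)
print(catalog.read_table("main", "trips"))  # main is untouched by the dev write
catalog.merge("dev", into="main")           # publish the dev changes
print(catalog.read_table(first, "trips"))   # the earlier state is still readable
```

In this sketch, work on `dev` cannot affect readers of `main` until the merge moves the pointer, and the `first` ref keeps pointing at the pre-change snapshot, which is the property that makes runs reproducible and auditable. A real system would also need conflict handling, atomic pointer updates, and persistent storage, all of which are omitted here.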