Hasty Briefs (beta)

S3 Files and the changing face of S3

4 hours ago
Tags: #S3, #Cloud Storage, #Data Integration

  • Andy Warfield's experience at UBC with genomics researchers highlighted the data friction of copying large datasets back and forth, motivating a solution for seamless data access.
  • S3 Files integrates Amazon EFS into S3, allowing direct filesystem access to S3 data, addressing the divide between object storage and file-based tools.
  • The design uses a 'stage and commit' approach, separating file and object semantics to preserve the strengths of both, with an explicit boundary for synchronization.
  • S3 Files enables mounting S3 buckets or prefixes as filesystems on EC2, containers, or Lambda, with changes synced to S3 and lazy hydration for large datasets.
  • Challenges included consistency, differences in authorization models, namespace semantics, and performance, leading to trade-offs such as delayed commits and naming restrictions.
  • S3 Files supports high-throughput reads via 'read bypass' and scales to large datasets; its acknowledged limitations include expensive renames and object key compatibility issues.
  • S3's evolution with Tables, Vectors, and Files reflects a focus on diverse data access patterns, aiming to reduce storage friction and accelerate application development.
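
The 'stage and commit' idea and lazy hydration summarized above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not the S3 Files implementation or API: `ToyObjectStore`, `ToyFileView`, and their methods are hypothetical names invented for illustration. The explicit `commit()` call stands in for the synchronization boundary, which the summary describes only as changes being synced to S3 with delayed commits.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """Hypothetical stand-in for a bucket: whole-object get/put by key."""
    objects: dict[str, bytes] = field(default_factory=dict)

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


@dataclass
class ToyFileView:
    """Hypothetical file-side view over one prefix of the toy store."""
    store: ToyObjectStore
    prefix: str
    cache: dict[str, bytes] = field(default_factory=dict)  # hydrated or staged contents
    dirty: set[str] = field(default_factory=set)           # paths changed since last commit

    def read(self, path: str) -> bytes:
        # Lazy hydration: fetch from the object side only on first access.
        if path not in self.cache:
            self.cache[path] = self.store.get(self.prefix + path)
        return self.cache[path]

    def write(self, path: str, data: bytes) -> None:
        # Stage: the change is visible on the file side only.
        self.cache[path] = data
        self.dirty.add(path)

    def commit(self) -> list[str]:
        # Commit: cross the boundary, publishing each staged file as a whole object.
        committed = sorted(self.dirty)
        for path in committed:
            self.store.put(self.prefix + path, self.cache[path])
        self.dirty.clear()
        return committed


store = ToyObjectStore({"genomes/sample1.txt": b"ACGT"})
view = ToyFileView(store, prefix="genomes/")

first_read = view.read("sample1.txt")             # hydrated on first access
view.write("sample1.txt", b"ACGTTT")              # staged, not yet an object change
before_commit = store.get("genomes/sample1.txt")  # object side still holds the old bytes
committed = view.commit()                         # explicit synchronization point
after_commit = store.get("genomes/sample1.txt")   # object side now holds the new bytes
```

Keeping the two sides separate until `commit()` is the point of the sketch: file-style edits stay cheap and local, while readers of the object side only ever see complete objects.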