S3 Files and the changing face of S3


Andy Warfield's experience at UBC working with genomics researchers showed him the friction of repeatedly copying large datasets back and forth, which motivated the search for seamless data access.
S3 Files integrates Amazon EFS into S3 so that file-based tools can access S3 data directly through a filesystem, bridging the divide between object storage and file-based workflows.
The design uses a 'stage and commit' approach that keeps file and object semantics separate to preserve the strengths of each, with an explicit boundary at which changes are synchronized.
S3 Files lets you mount S3 buckets or prefixes as filesystems on EC2, in containers, or in Lambda. Changes are synced back to S3, and large datasets are hydrated lazily, so data is loaded on access rather than copied up front.
Challenges included consistency, differences in authorization models, namespace semantics, and performance, leading to trade-offs such as delayed commits and naming restrictions.
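The feature supports high-throughput reads via 'read bypass' and scales to large datasets, while acknowledging limitations such as expensive renames (S3 has no native rename, so renaming a directory means rewriting every object under its prefix) and object keys that do not map cleanly to file paths.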
S3's evolution through S3 Tables, S3 Vectors, and S3 Files reflects a focus on supporting diverse data access patterns, with the aim of reducing storage friction and speeding up application development.
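The stage-and-commit boundary and lazy hydration can be illustrated with a toy model. What follows is a minimal in-memory Python sketch, not the S3 Files implementation or any AWS API: `ObjectStore`, `StagedFilesystem`, `read`, `append`, and `commit` are hypothetical names, and a dict stands in for a bucket. It assumes that file writes stay local until an explicit commit, which then issues one whole-object write per changed file, and that an object is fetched only on its first read.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Hypothetical stand-in for a bucket: key -> bytes, whole-object writes only."""
    objects: dict[str, bytes] = field(default_factory=dict)
    gets: int = 0  # counts fetches so lazy hydration is observable
    puts: int = 0  # counts whole-object writes

    def get(self, key: str) -> bytes:
        self.gets += 1
        return self.objects[key]

    def put(self, key: str, data: bytes) -> None:
        self.puts += 1
        self.objects[key] = data


class StagedFilesystem:
    """Hypothetical file view over an ObjectStore using stage-and-commit.

    Reads hydrate lazily: an object is fetched only on first access.
    Writes accumulate in a local staging area and become objects
    only when commit() is called, which is the explicit sync boundary.
    """

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self.cache: dict[str, bytes] = {}   # hydrated file contents
        self.staged: dict[str, bytes] = {}  # uncommitted file writes

    def read(self, path: str) -> bytes:
        if path in self.staged:             # this client sees its own staged writes
            return self.staged[path]
        if path not in self.cache:          # lazy hydration on first read
            self.cache[path] = self.store.get(path)
        return self.cache[path]

    def append(self, path: str, data: bytes) -> None:
        # File-style small writes are applied locally, never to the store directly.
        if path not in self.staged:
            exists = path in self.cache or path in self.store.objects
            self.staged[path] = self.read(path) if exists else b""
        self.staged[path] += data

    def commit(self) -> list[str]:
        # Object-style publication: one whole-object PUT per changed file.
        committed = sorted(self.staged)
        for path in committed:
            self.store.put(path, self.staged[path])
            self.cache[path] = self.staged[path]
        self.staged.clear()
        return committed


store = ObjectStore({"genomes/sample1.fa": b">chr1\nACGT\n"})
fs = StagedFilesystem(store)
print(store.gets)                           # 0: nothing is copied at mount time
fs.read("genomes/sample1.fa")
fs.read("genomes/sample1.fa")
print(store.gets)                           # 1: fetched once, then served locally
fs.append("results/out.txt", b"line 1\n")
fs.append("results/out.txt", b"line 2\n")
print("results/out.txt" in store.objects)   # False: not visible before commit
fs.commit()
print(store.objects["results/out.txt"])     # b'line 1\nline 2\n'
print(store.puts)                           # 1: two appends became one PUT
```

Keeping the commit explicit means the object side only ever sees complete objects while the file side can absorb many small writes; the price in this model is the delayed visibility of changes in the store, which mirrors the delayed-commit trade-off mentioned above.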
