S3 Files and the changing face of S3
4 hours ago
- #S3
- #Cloud Storage
- #Data Integration
- Andy Warfield's work with genomics researchers at UBC exposed the friction of copying large datasets back and forth so that file-based tools could use them, which motivated a way to access the data in place.
- S3 Files integrates Amazon EFS into S3, allowing direct filesystem access to S3 data, addressing the divide between object storage and file-based tools.
- The design uses a 'stage and commit' approach, separating file and object semantics to preserve the strengths of both, with an explicit boundary for synchronization.
- S3 Files enables mounting S3 buckets or prefixes as filesystems on EC2, containers, or Lambda, with changes synced to S3 and lazy hydration for large datasets.
- Challenges included consistency, differences between the file and object authorization models, namespace semantics, and performance; the resulting trade-offs include delayed commits and naming restrictions.
- S3 Files supports high-throughput reads via 'read bypass' and scales to large datasets; acknowledged limitations include expensive renames and incomplete object key compatibility.
- S3's evolution with Tables, Vectors, and Files reflects a focus on diverse data access patterns, aiming to reduce storage friction and accelerate application development.
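
The 'stage and commit' and lazy hydration ideas summarized above can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not the S3 Files implementation or its API: `ObjectStore`, `StagedFileView`, and all of their methods are hypothetical names invented here, and an in-memory dict stands in for a bucket.

```python
class ObjectStore:
    """Hypothetical stand-in for a bucket: whole-object put and get only."""

    def __init__(self):
        self._objects = {}
        self.get_count = 0  # counts fetches, to make lazy hydration visible

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        self.get_count += 1
        return self._objects[key]

    def has(self, key):
        return key in self._objects


class StagedFileView:
    """Hypothetical file-style view over a prefix of an ObjectStore.

    Writes are staged locally and reach the store only on commit(),
    which is the explicit boundary between file and object semantics.
    """

    def __init__(self, store, prefix=""):
        self.store = store
        self.prefix = prefix
        self._cache = {}     # path -> bytearray, filled on first access
        self._dirty = set()  # paths with staged, uncommitted changes

    def _hydrate(self, path):
        # Lazy hydration: fetch an object only when its file is first touched.
        if path not in self._cache:
            key = self.prefix + path
            data = self.store.get(key) if self.store.has(key) else b""
            self._cache[path] = bytearray(data)
        return self._cache[path]

    def read(self, path):
        return bytes(self._hydrate(path))

    def write(self, path, offset, data):
        # In-place partial write: something a file allows and an object does not.
        buf = self._hydrate(path)
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data
        self._dirty.add(path)

    def commit(self):
        # Publish each dirty file as one whole-object put.
        committed = sorted(self._dirty)
        for path in committed:
            self.store.put(self.prefix + path, self._cache[path])
        self._dirty.clear()
        return committed


store = ObjectStore()
store.put("data/a.txt", b"hello world")

view = StagedFileView(store, prefix="data/")  # "mounting" fetches nothing yet
view.write("a.txt", 0, b"J")                  # staged locally, store unchanged
staged_only = store.get("data/a.txt")         # still the original object
view.commit()                                 # now visible through the store
after_commit = store.get("data/a.txt")
```

In this sketch the object in the store changes only at `commit()`, so object readers always see a whole, consistent object, while the file-style view permits partial in-place writes. A real system would also have to handle concurrent writers, authorization, and keys that do not map cleanly onto file names, all of which the sketch ignores.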