Buckets and objects are not enough

5 days ago

Amazon S3 has been a popular cloud storage service for 20 years, used by many companies for diverse data types.
S3 organizes data into buckets, but lacks a first-class way to group related objects into datasets, relying on naming conventions and prefixes instead.
Prefixes give humans a readable hierarchy, but S3 treats them as part of a flat key string rather than as folders or datasets, which makes related objects hard to manage together.
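To make the point about prefixes concrete, here is a minimal Python sketch of what prefix filtering amounts to: a plain string match on the object key. The keys and the `keys_with_prefix` helper are hypothetical, made up for illustration, and nothing here calls S3.

```python
# Hypothetical object keys in one bucket; each key is just a flat string.
keys = [
    "sales/2024/01/part-0000.parquet",
    "sales/2024/02/part-0000.parquet",
    "sales_backup/2023/dump.csv",
    "salesforce-export/accounts.json",
]


def keys_with_prefix(all_keys, prefix):
    """Mimic a prefix-filtered listing: keep keys that start with `prefix`."""
    return [k for k in all_keys if k.startswith(prefix)]


# Without the trailing slash, all four keys match, including two unrelated datasets.
loose = keys_with_prefix(keys, "sales")

# With the trailing slash, only keys under the intended "folder" match.
strict = keys_with_prefix(keys, "sales/")
```

The slash carries meaning only by convention, so whether `sales` and `sales_backup` are one dataset or two is something the naming scheme has to encode and every tool has to agree on.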
Because S3 has no dataset abstraction, it is difficult to list, measure, attribute cost to, archive, restore, or delete a group of related objects as a unit.
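As a rough illustration of the workaround this forces, the sketch below rolls up object counts and bytes per top-level prefix. It assumes the listing is already in memory as dicts with `Key` and `Size` fields (the shape of the `Contents` entries in a ListObjectsV2 response); the `rollup_by_prefix` function, the sample keys, and the fixed `depth` parameter are assumptions for illustration, not anything the article prescribes.

```python
from collections import defaultdict


def rollup_by_prefix(objects, depth=1):
    """Sum object count and bytes per prefix, truncated to `depth` folder segments."""
    totals = defaultdict(lambda: {"objects": 0, "bytes": 0})
    for obj in objects:
        segments = obj["Key"].split("/")
        # Drop the final segment (the object name), keep up to `depth` folders.
        dataset = "/".join(segments[:-1][:depth]) or "(bucket root)"
        totals[dataset]["objects"] += 1
        totals[dataset]["bytes"] += obj["Size"]
    return dict(totals)


# Hypothetical listing entries; sizes are in bytes.
listing = [
    {"Key": "sales/2024/01/part-0000.parquet", "Size": 1_000},
    {"Key": "sales/2024/02/part-0000.parquet", "Size": 2_500},
    {"Key": "clickstream/dt=2024-01-01/events.json", "Size": 700},
    {"Key": "README.txt", "Size": 10},
]

summary = rollup_by_prefix(listing)
```

The weak point is the hard-coded `depth`: it only works if every dataset in the bucket sits at the same level of the key hierarchy, which is rarely true in a bucket shared by several teams.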
External tools like catalogs and security solutions partially address the gap but often don’t manage storage directly or cover all datasets.
Large companies like Netflix and Pinterest build custom solutions, but most organizations lack the resources to do the same, which points to a structural gap in cloud storage platforms.
Cost overruns often stem from underlying governance issues, where unidentified or orphaned data accumulates due to inadequate tooling.
What is needed is a layer that discovers datasets within buckets, attaches metadata to them, and operates at the dataset level without requiring manual registration.
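One way such discovery could work is sketched below, assuming a simple heuristic: the dataset root is everything in the key before the first partition-like segment (Hive-style `key=value`, or a purely numeric part such as `2024`). This is not the author's method; the `Dataset` class, `split_key`, `discover`, and the sample keys are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass, field

# Segments treated as partitions rather than dataset names (an assumption):
# Hive-style "key=value" or purely numeric parts such as "2024" or "01".
PARTITION = re.compile(r"[^=]+=.*|\d+")


@dataclass
class Dataset:
    root: str
    objects: int = 0
    total_bytes: int = 0
    partitions: set = field(default_factory=set)


def split_key(key):
    """Split a key's folder path into (dataset root, partition path)."""
    folders = key.split("/")[:-1]  # drop the object name
    cut = next(
        (i for i, seg in enumerate(folders) if PARTITION.fullmatch(seg)),
        len(folders),
    )
    return "/".join(folders[:cut]), "/".join(folders[cut:])


def discover(objects):
    """Group (key, size) pairs into inferred datasets, keyed by root."""
    found = {}
    for key, size in objects:
        root, partition = split_key(key)
        ds = found.setdefault(root, Dataset(root))
        ds.objects += 1
        ds.total_bytes += size
        if partition:
            ds.partitions.add(partition)
    return found


# Hypothetical (key, size in bytes) pairs from a bucket listing.
found = discover([
    ("warehouse/sales/2024/01/part-0.parquet", 1_000),
    ("warehouse/sales/2024/02/part-0.parquet", 2_500),
    ("warehouse/clicks/dt=2024-01-01/events.json", 700),
    ("README.txt", 10),
])
```

Because the roots are inferred from the keys themselves, nothing has to be registered by hand; the cost is that the heuristic can guess wrong (for example, a dataset that is genuinely named `2024`), so a real tool would presumably need ways to confirm or override what it infers.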
The author is building a solution to address this problem and invites contact from those experiencing similar storage management issues.
