Towards Scalable Dataframe Systems
- #data-management
- #scalability
- #dataframes
- Dataframes are popular for data representation and analysis but face performance issues on large datasets.
- Dataframe semantics are significantly ambiguous, which highlights the need for clearer definitions and standardization.
- The paper proposes a simple data model and algebra for dataframes to establish a foundational framework for the field.
- Modin is presented as a scalable implementation of the pandas API, demonstrating potential for improved performance.
- An agenda of open research opportunities is outlined, focusing on features like flexible schemas, ordering, and fluid data/metadata.
- The paper emphasizes that dataframes pose unique challenges that call for advances in data management techniques.
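
To make the features named above more concrete (flexible schemas, ordering, and fluid data/metadata), here is a minimal pure-Python sketch. It is not the paper's formal model or Modin's implementation. The `Frame` class, its methods (`transpose`, `to_labels`, `infer_types`), and the sample data are all hypothetical names chosen for illustration, assuming only that a dataframe can be viewed as an ordered 2-D array of values with row and column labels.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Frame:
    """Toy dataframe (hypothetical): ordered values plus row and column labels."""
    values: List[List[Any]]   # row-major entries; row and column order is significant
    row_labels: List[Any]     # one label per row
    col_labels: List[Any]     # one label per column

    def transpose(self) -> "Frame":
        # Rows and columns are treated symmetrically: swap the axes and their labels.
        flipped = [list(col) for col in zip(*self.values)]
        return Frame(flipped, list(self.col_labels), list(self.row_labels))

    def to_labels(self, col: Any) -> "Frame":
        # Fluid data/metadata: move one column of data into the row labels.
        j = self.col_labels.index(col)
        new_values = [row[:j] + row[j + 1:] for row in self.values]
        new_rows = [row[j] for row in self.values]
        new_cols = self.col_labels[:j] + self.col_labels[j + 1:]
        return Frame(new_values, new_rows, new_cols)

    def infer_types(self) -> List[str]:
        # Flexible schema: types are not declared up front; they are inferred
        # from the values on demand. Mixed columns fall back to "object".
        types = []
        for j in range(len(self.col_labels)):
            seen = {type(row[j]).__name__ for row in self.values}
            types.append(seen.pop() if len(seen) == 1 else "object")
        return types


# Illustrative data only; the "n/a" entry makes the "year" column mixed-type.
df = Frame(
    values=[["nyc", 2019, 8.4], ["sf", 2019, 0.9], ["la", "n/a", 4.0]],
    row_labels=[0, 1, 2],
    col_labels=["city", "year", "pop_millions"],
)

by_city = df.to_labels("city")   # city values become row labels
wide = by_city.transpose()       # former row labels are now column labels
```

In this sketch, `to_labels` followed by `transpose` turns data (city names) into column metadata, and row order is preserved throughout. These are behaviors a relational table does not offer directly.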