Data Manipulation in Clojure Compared to R and Python
3 days ago
- #Clojure
- #data-science
- #comparison
- Comparison of data manipulation in Clojure (tablecloth), R (dplyr), and Python (Pandas, Polars).
- Reading data: tablecloth interprets 'NA' as missing by default; R's read_csv (readr) does the same; Pandas parses 'NA' into its built-in NaN on read; Polars takes a null_values argument naming the strings to treat as null.
- Basic exploration: Commands for viewing rows, column names, selecting columns/rows, and sorting are compared across libraries.
- Advanced filtering: Selecting columns except one, columns starting with a string, numeric columns, and filtering rows by range.
- Reshaping data: Pivoting to longer format in tablecloth, dplyr, Pandas, and Polars.
- Creating/renaming columns: Adding columns based on existing ones and renaming columns, with emphasis on immutability in Clojure.
- Grouping and aggregating: Summarizing counts and finding minimum values by group, with different approaches in each library.
- Conclusions: All four libraries are suitable for data manipulation but differ in philosophy (functional, immutable data vs. in-place mutation), which affects readability and maintainability.
- Versions: Lists of language and library versions used in the post.
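
As a rough illustration of the operations summarized above, here is a minimal pandas sketch. The inline CSV, its column names, and its values are made up for this example and are not taken from the post, whose own code may use different data and different calls.

```python
import io

import pandas as pd

# Hypothetical sample data (not from the post); "NA" marks a missing value.
CSV = """species,island,bill_length_mm,bill_depth_mm
Adelie,Torgersen,39.1,18.7
Adelie,Torgersen,NA,17.4
Gentoo,Biscoe,46.1,13.2
Gentoo,Biscoe,50.0,16.3
Chinstrap,Dream,46.5,17.9
"""

# Reading data: read_csv parses the string "NA" as NaN by default.
df = pd.read_csv(io.StringIO(CSV))

# Basic exploration: first rows, column names, sorting.
first_rows = df.head(3)
column_names = list(df.columns)
sorted_df = df.sort_values("bill_length_mm", ascending=False)

# Advanced filtering: all columns except one, columns starting with a
# string, numeric columns only, and rows within a (closed) range.
without_island = df.drop(columns=["island"])
bill_columns = df.loc[:, df.columns.str.startswith("bill")]
numeric_columns = df.select_dtypes(include="number")
in_range = df[df["bill_length_mm"].between(40, 48)]

# Reshaping: pivot the two measurement columns into a longer format.
longer = df.melt(
    id_vars=["species", "island"],
    var_name="measurement",
    value_name="value",
)

# Creating/renaming columns: assign and rename return new frames,
# so df itself is left unchanged.
with_ratio = df.assign(bill_ratio=df["bill_length_mm"] / df["bill_depth_mm"])
renamed = df.rename(columns={"bill_length_mm": "bill_length"})

# Grouping and aggregating: row counts and minimum value per group.
counts = df.groupby("species").size()
min_length = df.groupby("species")["bill_length_mm"].min()
```

Using `assign` and `rename` rather than assigning to `df["bill_ratio"]` directly keeps the original frame untouched, which is the closest pandas analogue to the immutable style the summary attributes to Clojure.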