Setting Up a Cluster of Tiny PCs for Parallel Computing
2 months ago
- #parallel computing
- #cluster setup
- #R programming
- Setting up a cluster of tiny PCs for parallel computing involves installing Ubuntu, configuring passwordless SSH, and automating package installations across nodes.
- The project aimed to distribute R simulations efficiently across nodes and to compare 5-fold (CV5) and 10-fold (CV10) cross-validation.
- Key steps included selecting affordable PCs like Lenovo M715q, installing Ubuntu Server, and configuring network settings for fixed IPs.
- Passwordless SSH and passwordless sudo were set up so that commands could run on every node without manual password entry.
- A template R script was created to automate simulations, leveraging multicore processing on each node to minimize network overhead.
- The setup demonstrated significant time savings, with some simulations running up to three times faster on three nodes compared to a single quad-core machine.
- Analysis showed that increasing CV folds from 5 to 10 reduced bias but slightly increased variance, with tuned xgboost + logistic regression performing best in terms of coverage and bias.
- Opportunities for improvement include developing a package for easier setup, implementing notifications for task completion, and learning OpenMPI for more advanced parallel computing.
- Lessons learned include the effectiveness of `future.seed` for reproducibility in parallel processing and the importance of asymmetrical coverage assessment in method evaluation.
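
The passwordless SSH and automated package installation steps summarized above can be sketched roughly as follows. This is a minimal illustration, not the exact script used in the project: the node addresses, the `ubuntu` login name, the `run_on_all` helper, and the `DRY_RUN` switch are all hypothetical placeholders. It assumes a key pair has already been created with `ssh-keygen` and copied to each node with `ssh-copy-id`, and that passwordless sudo is configured on the nodes.

```shell
# Hypothetical node addresses and login name; replace with your own fixed IPs.
NODES="192.168.1.101 192.168.1.102 192.168.1.103"
CLUSTER_USER="ubuntu"

# One-time manual setup (each ssh-copy-id call asks for the password once):
#   ssh-keygen -t ed25519
#   ssh-copy-id ubuntu@192.168.1.101   # repeat for every node

# Run the same command on every node.
# With DRY_RUN=1 (the default here) the ssh commands are only printed,
# so the loop can be checked before anything touches the cluster.
run_on_all() {
  for node in $NODES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ssh $CLUSTER_USER@$node $*"
    else
      ssh "$CLUSTER_USER@$node" "$@"
    fi
  done
}

# Preview installing R on all nodes (relies on passwordless sudo).
run_on_all sudo apt-get install -y r-base
```

Setting `DRY_RUN=0` would run the commands for real. Because the loop is sequential, a slow install on one node delays the rest, which is usually acceptable for a three-node cluster.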