You can't unit test for taste
- #Data Processing
- #AI Development
- #Geolocation Apps
- The author built *In the Long Run*, an app that lets runners use their Strava data to virtually traverse famous routes, and set out to add points of interest (POIs) to its maps.
- GeoNames, a Creative Commons-licensed geographical database, served as the data source. It was processed in a pipeline built with Python, Parquet files, and DuckDB, with Claude assisting.
- The initial POI filter excluded administrative divisions, kept only specific feature codes such as parks and historic sites, and applied population and elevation thresholds.
- Wikipedia links in GeoNames supplied a notability signal and short summaries, but they also introduced bias, because Wikipedia coverage reflects anglophone editing patterns.
- An LLM (Anthropic's Claude Haiku) was used to rate each POI's significance, but it hallucinated details, so the author relied on Wikipedia summaries for factual content.
- Per-route parameters were added to tune filtering, ranking, and geographic spread, because the mix of POI types differs from region to region.
- Evaluation was difficult because there is no objective metric for taste or POI relevance; the author relied on iterative tweaks and manual overrides instead.
- The author's view shifted from treating AI as a core feature to using it as a supplementary tool alongside traditional data processing.
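The initial filtering step (excluding administrative divisions, keeping feature codes such as parks and historic sites, and applying population and elevation filters) might look roughly like the sketch below. It is not the author's code: the records are assumed to be dicts using snake_case versions of the GeoNames dump columns, and the allowlist and thresholds are hypothetical values chosen for illustration. `PRK`, `HSTS`, `MT`, and `PPL` are GeoNames feature codes, and class `A` is GeoNames' administrative-boundary class.

```python
# Hypothetical POI filter over GeoNames-style records (not the author's code).
# Assumed record keys: feature_class, feature_code, population, elevation.

ALLOWED_CODES = {"PRK", "HSTS", "MT", "PPL"}  # park, historical site, mountain, populated place
MIN_POPULATION = 50_000  # hypothetical threshold, applied only to PPL
MIN_ELEVATION_M = 1_000  # hypothetical threshold, applied only to MT


def keep_poi(rec: dict) -> bool:
    """Return True if a GeoNames-style record passes this sketch filter."""
    if rec["feature_class"] == "A":
        # Class A covers administrative divisions (countries, states, ...).
        return False
    code = rec["feature_code"]
    if code not in ALLOWED_CODES:
        return False
    if code == "PPL":
        # Small settlements are dropped by a population threshold.
        return (rec.get("population") or 0) >= MIN_POPULATION
    if code == "MT":
        # Minor hills are dropped by an elevation threshold (metres).
        elevation = rec.get("elevation")
        return elevation is not None and elevation >= MIN_ELEVATION_M
    return True


sample = [
    {"name": "Hyde Park", "feature_class": "L", "feature_code": "PRK"},
    {"name": "Kent", "feature_class": "A", "feature_code": "ADM2"},
    {"name": "Smallville", "feature_class": "P", "feature_code": "PPL", "population": 1200},
]
print([r["name"] for r in sample if keep_poi(r)])  # ['Hyde Park']
```

In a real pipeline the same predicate would more likely be expressed as a DuckDB `WHERE` clause over the Parquet files; plain Python is used here only to keep the sketch self-contained.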
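The Wikipedia-link signal can be illustrated with a short sketch. It assumes the GeoNames alternate-names convention in which rows whose `isolanguage` is `link` hold website URLs (mostly Wikipedia), and it assumes snake_case dict keys for those rows. Counting linked language editions is a hypothetical scoring choice made for illustration, not something the article specifies.

```python
from urllib.parse import urlparse


def wikipedia_links(alt_names):
    """Return {language_code: url} for Wikipedia links among a place's alternate names.

    Assumes rows shaped like {"isolanguage": ..., "alternate_name": ...}.
    """
    links = {}
    for row in alt_names:
        if row["isolanguage"] != "link":
            continue
        url = row["alternate_name"]
        host = urlparse(url).netloc  # e.g. "en.wikipedia.org"
        if host.endswith(".wikipedia.org"):
            lang = host.split(".")[0]  # subdomain is the language edition
            links.setdefault(lang, url)
    return links


def notability(alt_names):
    """Crude, hypothetical notability score: number of linked Wikipedia editions."""
    return len(wikipedia_links(alt_names))


rows = [
    {"isolanguage": "link", "alternate_name": "https://en.wikipedia.org/wiki/Hyde_Park,_London"},
    {"isolanguage": "link", "alternate_name": "https://fr.wikipedia.org/wiki/Hyde_Park"},
    {"isolanguage": "link", "alternate_name": "https://example.com/hyde"},
    {"isolanguage": "en", "alternate_name": "Hyde Park"},
]
print(notability(rows))  # 2
```

A score like this does not remove the anglophone bias the article describes: a place covered only by an English-language article still outranks an equally notable place that has no Wikipedia article at all.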
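One way to limit the hallucination problem with the Haiku ratings is to ground the prompt in the Wikipedia summary and accept only a bare number back, so any descriptive text shown to runners comes from Wikipedia rather than from the model. The sketch below assumes this design. The prompt wording, the 1 to 5 scale, and both function names are hypothetical, and the actual API call is deliberately omitted.

```python
import re


def build_rating_prompt(name: str, summary: str) -> str:
    """Build a prompt that asks for a rating grounded only in the supplied summary."""
    return (
        "Rate how interesting this place would be to a runner passing by, "
        "on a scale of 1 (not at all) to 5 (must see).\n"
        "Use only the Wikipedia summary below; do not add facts.\n"
        "Reply with a single integer.\n\n"
        f"Place: {name}\n"
        f"Summary: {summary}\n"
    )


def parse_rating(reply: str):
    """Return the 1-5 rating from a model reply, or None if the reply is malformed."""
    match = re.fullmatch(r"\s*([1-5])\s*", reply)
    return int(match.group(1)) if match else None


prompt = build_rating_prompt("Hyde Park", "A large royal park in central London.")
print(parse_rating("4"))                         # 4
print(parse_rating("4, because it is famous"))   # None
```

Rejecting anything but a lone digit is strict, but it means the model has no channel through which to slip invented details into the app.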
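Per-route tuning of filtering, ranking, and geographic spread could be sketched as a small parameter object plus a greedy selector. This is a hypothetical design: `RouteParams`, its field names and defaults, and the assumption that each POI already has a distance along the route (`km`) are all illustrative, not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class RouteParams:
    """Hypothetical per-route knobs; names and defaults are illustrative."""
    min_score: float = 0.5   # filtering: drop POIs scored below this
    max_pois: int = 20       # ranking: keep at most this many
    min_gap_km: float = 5.0  # spread: minimum spacing between chosen POIs


def select_pois(pois, params: RouteParams):
    """Greedily pick the best-scoring POIs while enforcing spacing along the route.

    Each POI is a dict with "name", "score", and "km" (assumed distance along the route).
    """
    candidates = sorted(
        (p for p in pois if p["score"] >= params.min_score),
        key=lambda p: p["score"],
        reverse=True,
    )
    chosen = []
    for poi in candidates:
        if len(chosen) >= params.max_pois:
            break
        # Skip a POI that sits too close to one already chosen.
        if all(abs(poi["km"] - c["km"]) >= params.min_gap_km for c in chosen):
            chosen.append(poi)
    return sorted(chosen, key=lambda p: p["km"])  # return in route order


pois = [
    {"name": "A", "score": 0.9, "km": 1.0},
    {"name": "B", "score": 0.8, "km": 2.0},   # close to A
    {"name": "C", "score": 0.7, "km": 12.0},
    {"name": "D", "score": 0.3, "km": 30.0},  # below the default min_score
]
print([p["name"] for p in select_pois(pois, RouteParams())])                # ['A', 'C']
print([p["name"] for p in select_pois(pois, RouteParams(min_gap_km=0.5))])  # ['A', 'B', 'C']
```

A dense city route might use a small `min_gap_km`, while a sparse rural route might lower `min_score` so it is not left empty; keeping these as per-route data avoids hard-coding one compromise for every region.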
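Manual overrides can sit as a thin layer on top of the automatic selection, so that taste calls are recorded as data instead of being buried in thresholds. The sketch below is hypothetical: the function name, the include/exclude-by-name format, and the rule that an exclusion beats an inclusion are all assumptions made for illustration.

```python
def apply_overrides(selected, all_pois, include=(), exclude=()):
    """Apply hand-curated overrides on top of an automatic POI selection.

    selected: POIs chosen by the pipeline; all_pois: every candidate POI.
    include/exclude: POI names forced in or out by a human (hypothetical format).
    An exclusion always wins over an inclusion.
    """
    excluded = set(exclude)
    result = [p for p in selected if p["name"] not in excluded]
    present = {p["name"] for p in result}
    by_name = {p["name"]: p for p in all_pois}
    for name in include:
        # Ignore unknown names, duplicates, and anything also excluded.
        if name in by_name and name not in present and name not in excluded:
            result.append(by_name[name])
            present.add(name)
    return sorted(result, key=lambda p: p["km"])  # keep route order


pool = [{"name": "A", "km": 1.0}, {"name": "B", "km": 2.0}, {"name": "C", "km": 12.0}]
picked = [pool[0], pool[2]]
print([p["name"] for p in apply_overrides(picked, pool, include=["B"], exclude=["C"])])
# ['A', 'B']
```

Because the overrides are applied last, parameter tweaks upstream can be re-run freely without losing the hand-made corrections.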