Hasty Briefs (beta)

Classifying aviation-related posts on Hacker News with SLMs

a year ago
  • #machine-learning
  • #data-analysis
  • #aviation
  • Hacker News has a surprisingly high volume of aviation-related content.
  • The author used Small Language Models (SLMs) to classify 42 million Hacker News posts for aviation relevance.
  • Data was gathered via the Hacker News API, processed, and stored in a Cloudflare R2 bucket.
  • A preprocessing pipeline concatenated each post's title and text into a single model input.
  • Model selection and prompt prototyping were done on a 10,000-post subset to keep iteration fast.
  • The final analysis classified 0.62% of all posts and 1.13% of top stories as aviation-related.
  • Aviation-related posts have increased over time, with spikes during major aviation incidents.
  • The top 30 contributors to aviation content on Hacker News were acknowledged.
  • Future improvements include more rigorous evaluations and advanced modeling techniques.
  • The author highlights the effectiveness of small, pre-trained models for large-scale data analysis.
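The preprocessing step described above can be sketched in Python. This is a minimal illustration, not the author's code. It assumes each post is a dict shaped like a Hacker News API item, where the `text` field contains HTML markup and escaped entities. The helper names `clean_hn_text` and `build_model_input`, and the blank-line separator between title and text, are hypothetical.

```python
import html
import re

# Matches any HTML tag; HN API "text" fields contain markup such as <p>, <i>, <a>.
TAG_RE = re.compile(r"<[^>]+>")


def clean_hn_text(raw: str) -> str:
    """Strip HTML tags, then unescape entities, in an HN API text field."""
    # Remove tags first so escaped user text like "&lt;b&gt;" survives as "<b>".
    no_tags = TAG_RE.sub(" ", raw)
    unescaped = html.unescape(no_tags)
    # Collapse the runs of whitespace left behind by removed tags.
    return " ".join(unescaped.split())


def build_model_input(item: dict) -> str:
    """Concatenate a post's title and text into one string for the classifier.

    Hypothetical helper: the original pipeline's exact input format is not stated.
    Keys may be missing or None (comments have no title; many stories have no text).
    """
    title = (item.get("title") or "").strip()
    text = clean_hn_text(item.get("text") or "")
    # Assumed separator: a blank line between title and body; empty parts are dropped.
    return "\n\n".join(part for part in (title, text) if part)


if __name__ == "__main__":
    post = {"type": "story", "title": "Cessna 172 engine-out landing",
            "text": "Great <i>read</i>.<p>Pilot&#x27;s view"}
    print(build_model_input(post))
```

Dropping empty parts means a comment, which has no title, is passed to the model as just its cleaned text.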
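Prototyping on a small subset keeps prompt and model iteration cheap before committing to the full 42-million-post run. Below is a sketch of drawing a reproducible subset. It assumes a uniform random sample of item IDs, because the summary does not say how the 10,000 posts were chosen. The function `prototyping_sample` and its default seed are hypothetical.

```python
import random


def prototyping_sample(item_ids, k=10_000, seed=42):
    """Draw a reproducible random sample of post IDs for prompt prototyping.

    Assumption: uniform sampling without replacement; the source only states
    that 10,000 posts were used. A fixed seed makes prompt comparisons repeatable.
    """
    ids = list(item_ids)
    if k >= len(ids):
        # Fewer IDs than requested: use them all.
        return ids
    rng = random.Random(seed)  # private generator, leaves global random state alone
    return rng.sample(ids, k)


if __name__ == "__main__":
    subset = prototyping_sample(range(1, 100_001))
    print(len(subset), subset[:5])
```

Using a dedicated `random.Random` instance means every prompt variant is scored against the same 10,000 posts.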
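The headline percentages and the trend over time are shares of posts labeled aviation-related. The sketch below shows the per-year calculation. It assumes each classified post has been reduced to a pair: its Unix timestamp (the HN API `time` field) and the classifier's boolean label. The function name `aviation_share_by_year` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def aviation_share_by_year(rows):
    """Return {year: percent of posts labeled aviation-related}, years ascending.

    `rows` is an iterable of (unix_time, is_aviation) pairs. This pairing is an
    assumption about how the classifier output is stored.
    """
    totals = Counter()  # posts per year
    hits = Counter()    # aviation-labeled posts per year
    for unix_time, is_aviation in rows:
        year = datetime.fromtimestamp(unix_time, tz=timezone.utc).year
        totals[year] += 1
        if is_aviation:
            hits[year] += 1
    return {year: 100.0 * hits[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    sample = [(0, True), (0, False), (0, False), (0, False), (31_536_000, False)]
    print(aviation_share_by_year(sample))
```

Dropping the year grouping and dividing all aviation-labeled posts by all posts gives an overall share, which is the kind of figure reported above (0.62% of all posts). Plotting the per-year values is one way to see the spikes around major incidents.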