Hasty Briefs (beta)

Tagging Blog Posts with BERTopic and LLMs

a day ago
  • #LLMs
  • #BERTopic
  • #blog-tagging
  • The author added tags to their blog using BERTopic and LLMs, completing a project started in 2023.
  • LLMs are effective for labeling and compressing information, aiding in tasks like topic modeling and tagging.
  • Historically, tagging evolved with platforms like Delicious and Twitter, but its importance has declined with the rise of LLMs and algorithmic discovery.
  • Traditional topic modeling used methods like LDA, which had limitations such as treating documents as bags of words without context.
  • Advances in embeddings and attention mechanisms improved contextual understanding, leading to better topic modeling.
  • BERTopic clustered the blog posts by embedding them, reducing the dimensionality of the embeddings, and grouping them with HDBSCAN; a class-based TF-IDF step (c-TF-IDF) then extracted the terms that describe each topic.
  • The author refined generated tags with LLMs like Gemini and GPT-OSS, and used Claude Code for batch-tagging posts.
  • The author used Pi to improve the tag UI, adjusting the sidebar design so that tags are easier to discover.
  • The project took 6-10 hours over a month, highlighting the efficiency of blending LLMs with traditional ML approaches.
  • Tags are now implemented, though their utility for audience discovery is uncertain in the age of LLM-driven content access.
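
The topic-extraction step in the summary above can be illustrated with a small, dependency-free sketch. It assumes the posts have already been grouped into clusters (the role HDBSCAN plays in BERTopic) and scores each term with a simplified class-based TF-IDF: term frequency within the cluster, weighted by log(1 + A / f), where A is the average number of words per cluster and f is the term's frequency across all clusters. The function names, the whitespace tokenizer, and the sample clusters are hypothetical; this is not the author's code or BERTopic's implementation.

```python
import math
from collections import Counter


def c_tf_idf(clusters):
    """Score terms per cluster; `clusters` maps cluster id -> list of texts."""
    # Treat each cluster as one large document and count its words.
    class_counts = {
        cid: Counter(word for doc in docs for word in doc.lower().split())
        for cid, docs in clusters.items()
    }
    # Frequency of each term across all clusters.
    total = Counter()
    for counts in class_counts.values():
        total.update(counts)
    # Average number of words per cluster.
    avg_words = sum(total.values()) / len(class_counts)

    scores = {}
    for cid, counts in class_counts.items():
        n_words = sum(counts.values())
        scores[cid] = {
            term: (freq / n_words) * math.log(1 + avg_words / total[term])
            for term, freq in counts.items()
        }
    return scores


def top_terms(scores, k=3):
    """Return the k highest-scoring terms per cluster (ties broken by name)."""
    return {
        cid: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for cid, s in scores.items()
    }


# Hypothetical, pre-clustered post titles used only for illustration.
clusters = {
    0: ["spark jobs on the cluster", "tuning spark jobs"],
    1: ["reading novels on the train", "novels worth reading"],
}
scores = c_tf_idf(clusters)
print(top_terms(scores, k=2))
```

Words that appear in every cluster, such as "the" and "on", receive lower scores than words concentrated in a single cluster, which is why the top terms work as candidate tags that an LLM can then refine.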