Tagging Blog Posts with BERTopic and LLMs
a day ago
- #LLMs
- #BERTopic
- #blog-tagging
- The author added tags to their blog using BERTopic and LLMs, completing a project started in 2023.
- LLMs are effective for labeling and compressing information, aiding in tasks like topic modeling and tagging.
- Historically, tagging evolved with platforms like Delicious and Twitter, but its importance has declined with the rise of LLMs and algorithmic discovery.
- Traditional topic modeling relied on methods like LDA (Latent Dirichlet Allocation), which treat each document as a bag of words and therefore ignore word order and context.
- Advances in embeddings and attention mechanisms improved contextual understanding, leading to better topic modeling.
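- The author used BERTopic to cluster the blog posts. BERTopic embeds each post, reduces the dimensionality of the embeddings, clusters them with HDBSCAN, and then applies a class-based tf-idf to extract representative terms for each topic.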
- The author refined generated tags with LLMs like Gemini and GPT-OSS, and used Claude Code for batch-tagging posts.
- The tag UI was improved using Pi, with the sidebar design adjusted to make tags easier to discover.
- The project took 6-10 hours over a month, highlighting the efficiency of blending LLMs with traditional ML approaches.
- Tags are now live on the blog, though it is uncertain how much they will help readers discover content now that so much of it is accessed through LLMs.
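The bag-of-words limitation attributed to LDA is easy to demonstrate. In this minimal sketch, the `bag_of_words` helper is hypothetical and uses naive lowercase whitespace tokenization. It is not how any particular LDA library tokenizes text, but it shows what is lost once a document is reduced to word counts.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Reduce a document to unordered word counts (naive lowercase whitespace split)."""
    return Counter(text.lower().split())


a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")

# Word order is discarded, so two sentences with opposite meanings
# end up with identical representations.
print(a == b)  # True
```

A model that only sees these counts cannot tell the two sentences apart. Contextual embeddings address exactly this gap.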
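The final BERTopic step turns each cluster into a handful of descriptive terms. It can be sketched in pure Python as a simplified version of the class-based tf-idf (c-TF-IDF) described in the BERTopic documentation. The sketch assumes the embedding, dimensionality-reduction, and HDBSCAN steps have already assigned each post a cluster label, with -1 marking outliers as HDBSCAN does. It concatenates all documents in a cluster into one "class document" and weights each term by its normalized in-class frequency times log(1 + A / f_t). Here A is the average number of words per class and f_t is the term's total frequency across all classes. The function name `ctfidf_top_terms`, the whitespace tokenizer, and the toy posts are illustrative assumptions. The library itself (for example via `BERTopic().fit_transform(docs)`) handles tokenization, stop words, and normalization differently.

```python
import math
from collections import Counter, defaultdict


def ctfidf_top_terms(docs, labels, top_n=3):
    """Return the top_n terms per cluster using a simplified class-based tf-idf.

    docs   -- list of document strings
    labels -- cluster id per document (e.g. from HDBSCAN); -1 marks outliers
    """
    # 1. Merge all documents in a cluster into one bag of words per class.
    class_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        if label == -1:  # skip outliers, which belong to no topic
            continue
        class_counts[label].update(doc.lower().split())
    if not class_counts:
        return {}

    # 2. f_t: total frequency of each term across all classes.
    term_totals = Counter()
    for counts in class_counts.values():
        term_totals.update(counts)

    # 3. A: average number of words per class.
    avg_words = sum(term_totals.values()) / len(class_counts)

    top_terms = {}
    for label, counts in class_counts.items():
        class_size = sum(counts.values())
        scores = {
            # L1-normalized in-class frequency * log(1 + A / f_t)
            term: (count / class_size) * math.log(1 + avg_words / term_totals[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        top_terms[label] = ranked[:top_n]
    return top_terms


# Hypothetical toy posts with cluster labels as HDBSCAN might assign them.
posts = [
    "llm prompts llm evals",
    "llm agents evals",
    "embeddings cluster posts",
    "hdbscan cluster embeddings",
    "random aside",
]
labels = [0, 0, 1, 1, -1]
print(ctfidf_top_terms(posts, labels, top_n=2))
# {0: ['llm', 'evals'], 1: ['cluster', 'embeddings']}
```

The log(1 + A / f_t) factor down-weights terms that are common across every cluster. As a result, each topic's top terms tend to be the words that distinguish it from the others, and these make reasonable raw material for tag names before an LLM refines them.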