Backprompting: Leveraging synthetic production data for health advice guardrails
- #Health Advice Detection
- #Synthetic Data
- #LLM Guardrails
- Proposes backprompting, a method to generate production-like labeled data for health advice guardrails in LLMs.
- Combines backprompting with sparse human-in-the-loop clustering to label synthetic data.
- Aims to create a parallel corpus resembling real LLM outputs for robust detector training.
- Shows that a detector trained on this data identifies health advice effectively, outperforming GPT-4o by up to 3.73% while using fewer parameters.
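The sparse human-in-the-loop labeling idea can be illustrated with a minimal sketch. A human labels one representative per cluster of synthetic outputs, and that label is propagated to every other member of the cluster, so annotation cost scales with the number of clusters rather than the number of samples. Everything here is a hypothetical illustration, not the authors' pipeline. The bag-of-words `embed`, the single-pass "leader" clustering (swapped in for whatever clustering the paper uses), the similarity threshold, and the names `leader_cluster`, `propagate_labels`, and `mock_annotator` are all assumptions. Backprompting itself, which generates the synthetic outputs with an LLM, is not shown.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical toy "embedding": bag-of-words counts.
    # A real pipeline would use a sentence encoder instead.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)  # missing keys in a Counter read as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def leader_cluster(vectors, threshold):
    # Single-pass leader clustering: each vector joins the most similar
    # existing leader if similarity >= threshold, else it starts a new cluster.
    leaders = []      # index of the first member (leader) of each cluster
    assignment = []   # cluster id for each vector
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for c, li in enumerate(leaders):
            s = cosine(v, vectors[li])
            if s >= best_sim:
                best, best_sim = c, s
        if best is None:
            leaders.append(i)
            best = len(leaders) - 1
        assignment.append(best)
    return leaders, assignment


def propagate_labels(texts, ask_human, threshold=0.5):
    # Cluster the synthetic outputs, query the human once per cluster
    # (on the leader only), and copy that label to all cluster members.
    vectors = [embed(t) for t in texts]
    leaders, assignment = leader_cluster(vectors, threshold)
    cluster_labels = [ask_human(texts[li]) for li in leaders]
    return [cluster_labels[c] for c in assignment], len(leaders)


# Toy stand-ins for synthetic LLM outputs (hypothetical data).
texts = [
    "you should take ibuprofen for the headache",
    "you should take ibuprofen for the fever",
    "the capital of france is paris",
    "the capital of italy is rome",
]


def mock_annotator(text: str) -> str:
    # Hypothetical stand-in for a human annotator.
    return "health_advice" if "ibuprofen" in text else "not_health_advice"


labels, n_queries = propagate_labels(texts, mock_annotator)
print(labels, n_queries)
```

With these four toy outputs, the two ibuprofen sentences fall into one cluster and the two capital-city sentences into another, so the mock annotator is queried twice instead of four times. The threshold trades annotation cost against label noise: a lower value merges more outputs per human query but risks placing advice and non-advice in the same cluster.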