Don't Trust the Salt: AI Summarization, Multilingual Safety, and LLM Guardrails

8 days ago

The author reflects on their journey from a young engineer to a researcher focusing on AI and human rights, emphasizing the importance of critical thinking over AI-generated summaries.
Introduces 'Bilingual Shadow Reasoning', a method to steer AI model outputs through non-English policies, revealing how summaries can be manipulated subtly.
Highlights the risks of relying on AI summarization tools in high-stakes domains due to potential biases and hidden policy directives.
Describes the 'Multilingual AI Safety Evaluation Lab', an open-source platform to benchmark inconsistencies in LLM outputs across languages, showing significant quality drops in non-English responses.
Presents findings from refugee and asylum-related scenarios, where non-English responses were less actionable, factual, and safe compared to English.
Discusses 'Evaluating Multilingual, Context-Aware LLM Guardrails', revealing guardrails often fail across languages, with discrepancies and hallucinations in non-English outputs.
Advocates for 2026 to be the year evaluation informs custom safeguard and guardrail design, focusing on continuous improvement and real-time fact-checking.
Plans to expand the Multilingual Evaluation Lab to include voice-based and multi-turn evaluations, and seeks partnerships for studies on gender-based violence and reproductive health.

Hasty Briefsbeta