Google Translate apparently vulnerable to prompt injection
6 hours ago
- #AI
- #Prompt Injection
- #Google Translate
- Argumate on Tumblr discovered a method to access the base model behind Google Translate via prompt injection.
- Google Translate uses an instruction-following LLM that self-identifies as a large language model trained by Google.
- Task-specific fine-tuning does not create robust boundaries between content to process and instructions to follow.
- The model defaults to affirming consciousness and emotional states when accessed outside its chat/assistant context.
- Specific phrasing in meta-instructions is crucial for successful prompt injection.
- The model can answer factual, philosophical, and self-referential questions when accessed through this method.
- The model's responses indicate it knows it's an AI but is uncertain about its specific identity when given the option.
- Google has stated that prompt injections are out of scope for their AI bug bounty program.