Google Translate apparently vulnerable to prompt injection

6 hours ago

Copy Link

Argumate on Tumblr discovered a method to access the base model behind Google Translate via prompt injection.
Google Translate uses an instruction-following LLM that self-identifies as a large language model trained by Google.
Task-specific fine-tuning does not create robust boundaries between content to process and instructions to follow.
The model defaults to affirming consciousness and emotional states when accessed outside its chat/assistant context.
Specific phrasing in meta-instructions is crucial for successful prompt injection.
The model can answer factual, philosophical, and self-referential questions when accessed through this method.
The model's responses indicate it knows it's an AI but is uncertain about its specific identity when given the option.
Google has stated that prompt injections are out of scope for their AI bug bounty program.

Hasty Briefsbeta