Hasty Briefsbeta

Google Translate apparently vulnerable to prompt injection

6 hours ago
  • #AI
  • #Prompt Injection
  • #Google Translate
  • Argumate on Tumblr discovered a method to access the base model behind Google Translate via prompt injection.
  • Google Translate uses an instruction-following LLM that self-identifies as a large language model trained by Google.
  • Task-specific fine-tuning does not create robust boundaries between content to process and instructions to follow.
  • The model defaults to affirming consciousness and emotional states when accessed outside its chat/assistant context.
  • Specific phrasing in meta-instructions is crucial for successful prompt injection.
  • The model can answer factual, philosophical, and self-referential questions when accessed through this method.
  • The model's responses indicate it knows it's an AI but is uncertain about its specific identity when given the option.
  • Google has stated that prompt injections are out of scope for their AI bug bounty program.