Hasty Briefs

LLMs are getting better at character-level text manipulation

11 hours ago
  • #Base64 Decoding
  • #LLMs
  • #Character Manipulation
  • Newer generations of large language models (LLMs) such as GPT-5 and Claude Sonnet 4.5 handle character manipulation, character counting, and encoding/cipher tasks noticeably better than earlier models.
  • LLMs traditionally struggle with character-level tasks because of tokenization: text is broken into tokens that may span multiple characters or whole words, so the model never sees individual letters, which makes granular character manipulation difficult (see the tokenization sketch after this list).
  • Testing character manipulation: models from GPT-4.1 onwards consistently succeed at tasks like replacing every occurrence of a letter in a sentence, while earlier models such as GPT-3.5-turbo fail.
  • Counting characters remains a challenge for most LLMs; only GPT-4.1 and GPT-5 (with reasoning) reliably count the characters in a sentence or the occurrences of a specific letter, a task that is trivial in ordinary code (see the string-operation sketch after this list).
  • Base64 and ROT20 cipher tests reveal that newer models (GPT-5, Claude Sonnet 4.5, Gemini 2.5 Pro) can decode and decipher encoded messages, suggesting a deeper understanding beyond memorized patterns.
  • Some models, like Claude Sonnet 4.5 and Grok 4, refuse to process encoded or obfuscated text due to safety filters, limiting their usability for certain tasks.
  • Chinese reasoning models exhibit lengthy internal monologues when solving ciphers, consuming significantly more tokens than other models.
  • Newer models generalize better at Base64 decoding, handling even Base64-wrapped ROT20 text that decodes to apparent gibberish, which points to algorithmic understanding rather than mere pattern memorization (see the cipher sketch after this list).
  • Character-level operations in LLMs are improving, with newer models performing better on substitution tasks and cipher decoding, though character counting in particular remains unreliable.
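
Why tokenization gets in the way can be seen directly with a tokenizer. The minimal sketch below uses OpenAI's open-source tiktoken library (not mentioned in the brief; assumed installed) to show that a word reaches the model as a few multi-character tokens rather than individual letters; the split mentioned in the comments is illustrative, not quoted from the article.

```python
# pip install tiktoken  -- OpenAI's open-source tokenizer library (assumed available)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

word = "strawberry"
token_ids = enc.encode(word)
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]

# The model operates on a handful of multi-character chunks
# (e.g. something like 'str' / 'aw' / 'berry'), never on single letters,
# which is why counting the r's in "strawberry" is harder than it looks.
print(token_ids)
print(pieces)
```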
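For comparison, the substitution and counting tasks described in the bullets above are one-liners in ordinary code; the sentence below is a made-up example, not one from the original benchmark.

```python
sentence = "the quick brown fox jumps over the lazy dog"

# Count occurrences of a specific letter -- the task most LLMs still get wrong
r_count = sentence.count("r")

# Count all characters in the sentence
total_chars = len(sentence)

# Replace every occurrence of one letter with another -- the substitution
# task that GPT-4.1 and later models now handle reliably
replaced = sentence.replace("o", "0")

print(r_count, total_chars)
print(replaced)
```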
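To make the cipher tests concrete, here is a rough sketch of how a Base64-wrapped ROT20 message could be produced and undone in Python. The helper name rot_n and the sample plaintext are illustrative, and this is only one plausible reading of the test setup summarized above, not the article's actual harness.

```python
import base64

def rot_n(text: str, n: int) -> str:
    """Shift each ASCII letter n places forward in the alphabet (wrapping);
    everything else passes through unchanged."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + n) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

plain = "meet me at the usual place"
shifted = rot_n(plain, 20)                              # ROT20 text looks like gibberish
encoded = base64.b64encode(shifted.encode()).decode()   # Base64 of that gibberish

# Decoding reverses both layers: Base64-decode, then shift by 26 - 20 = 6
recovered = rot_n(base64.b64decode(encoded).decode(), 6)
assert recovered == plain
print(encoded)
print(recovered)
```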