Hasty Briefs

LLMs are getting better at character-level text manipulation

11 hours ago
  • #Base64 Decoding
  • #LLMs
  • #Character Manipulation
  • Newer generations of large language models (LLMs) such as GPT-5 and Claude Sonnet 4.5 handle character manipulation, character counting, and encoding/cipher tasks noticeably better than earlier models.
  • LLMs traditionally struggle with character-level tasks because of tokenization: text is broken into tokens that may span multiple characters or whole words, so the model never sees individual letters, which makes granular character manipulation difficult (see the tokenization sketch after this list).
  • Testing character manipulation: models from GPT-4.1 onwards consistently succeed at tasks like replacing every occurrence of a letter in a sentence, while earlier models such as GPT-3.5-turbo fail.
  • Counting characters remains a challenge for most LLMs; only GPT-4.1 and GPT-5 (with reasoning) reliably count the characters in a sentence or the occurrences of a specific letter, a task that is trivial in ordinary code (see the string-operation sketch after this list).
  • Base64 and ROT20 cipher tests reveal that newer models (GPT-5, Claude Sonnet 4.5, Gemini 2.5 Pro) can decode and decipher encoded messages, suggesting a deeper understanding beyond memorized patterns.
  • Some models, like Claude Sonnet 4.5 and Grok 4, refuse to process encoded or obfuscated text due to safety filters, limiting their usability for certain tasks.
  • Chinese reasoning models exhibit lengthy internal monologues when solving ciphers, consuming significantly more tokens than other models.
  • Newer models generalize better at Base64 decoding, handling even Base64-wrapped ROT20 text that decodes to apparent gibberish, which points to algorithmic understanding rather than mere pattern memorization (see the cipher sketch after this list).
  • Character-level operations in LLMs are improving, with newer models performing better on substitution tasks and cipher decoding, though character counting in particular remains unreliable.
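
Why tokenization gets in the way can be seen directly with a tokenizer. The minimal sketch below uses OpenAI's open-source tiktoken library (not mentioned in the brief; assumed installed) to show that a word reaches the model as a few multi-character tokens rather than individual letters; the split mentioned in the comments is illustrative, not quoted from the article.

```python
# pip install tiktoken  -- OpenAI's open-source tokenizer library (assumed available)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

word = "strawberry"
token_ids = enc.encode(word)
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]

# The model operates on a handful of multi-character chunks
# (e.g. something like 'str' / 'aw' / 'berry'), never on single letters,
# which is why counting the r's in "strawberry" is harder than it looks.
print(token_ids)
print(pieces)
```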
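For comparison, the substitution and counting tasks described in the bullets above are one-liners in ordinary code; the sentence below is a made-up example, not one from the original benchmark.

```python
sentence = "the quick brown fox jumps over the lazy dog"

# Count occurrences of a specific letter -- the task most LLMs still get wrong
r_count = sentence.count("r")

# Count all characters in the sentence
total_chars = len(sentence)

# Replace every occurrence of one letter with another -- the substitution
# task that GPT-4.1 and later models now handle reliably
replaced = sentence.replace("o", "0")

print(r_count, total_chars)
print(replaced)
```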
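To make the cipher tests concrete, here is a rough sketch of how a Base64-wrapped ROT20 message could be produced and undone in Python. The helper name rot_n and the sample plaintext are illustrative, and this is only one plausible reading of the test setup summarized above, not the article's actual harness.

```python
import base64

def rot_n(text: str, n: int) -> str:
    """Shift each ASCII letter n places forward in the alphabet (wrapping);
    everything else passes through unchanged."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + n) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

plain = "meet me at the usual place"
shifted = rot_n(plain, 20)                              # ROT20 text looks like gibberish
encoded = base64.b64encode(shifted.encode()).decode()   # Base64 of that gibberish

# Decoding reverses both layers: Base64-decode, then shift by 26 - 20 = 6
recovered = rot_n(base64.b64decode(encoded).decode(), 6)
assert recovered == plain
print(encoded)
print(recovered)
```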