Human typing habits and token counts
- #Typing Habits
- #Tokenization
- #AI Billing
- Human typing habits such as typos, shorthand, filler words, and pasted data increase token counts without changing a message's intent, which matters when usage is billed per token.
- Typos (e.g., swapped or dropped letters) and word variations (e.g., unusual suffixes) fall outside a tokenizer's learned vocabulary, so the text splits into more subword pieces and token counts rise.
- Conversational padding (e.g., fillers, hedges, expressive habits) adds tokens that set tone but rarely help complete the task, so it shows up directly in costs.
- Shorthand forms (e.g., 'pls' for 'please') can be less token-efficient than the standard word: common words are often a single token while abbreviations may split into several, defeating the keystroke-saving intent.
- Non-conversational elements like UUIDs, timestamps, and URLs are token-dense, significantly inflating counts in work contexts and adding billing overhead.
- Tokenizers differ: OpenAI's and Anthropic's (Claude) produce different token counts for the same text, and a string's count can also depend on the surrounding text.
- The core disconnect: humans type to save keystrokes, while tokenizers charge by learned subword patterns, so everyday typing habits carry real cost implications.
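The out-of-vocabulary effect behind the typo, shorthand, and UUID points above can be sketched with a toy greedy longest-match tokenizer. This is not how production tokenizers work (they use byte-pair encoding with merges learned from data, as in OpenAI's tiktoken), and the vocabulary below is purely hypothetical; it only illustrates why unfamiliar spellings split into more pieces:

```python
# Toy greedy longest-match tokenizer over a tiny, hypothetical vocabulary.
# Real tokenizers use learned BPE merges; this only shows the mechanism
# by which out-of-vocabulary text fragments into more tokens.
VOCAB = {"please", "review", "the", "log", "ple", "se"}

def tokenize(text, vocab):
    """At each position, take the longest substring found in the
    vocabulary; any single character matches as a fallback."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j - i == 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

for sample in ["please",   # in-vocabulary word: 1 token
               "pls",      # shorthand, no vocab match: 3 tokens
               "plese",    # typo, partial matches: 2 tokens
               "550e8400-e29b-41d4-a716-446655440000"]:  # UUID: per-character
    print(f"{len(tokenize(sample, VOCAB)):>2} tokens: {sample}")
```

Under this toy vocabulary, 'please' is one token, the shorthand 'pls' costs three, the typo 'plese' costs two, and a 36-character UUID costs 36, mirroring the cost asymmetries described above.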
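The same toy scheme, run with two different hypothetical vocabularies, shows why distinct tokenizers can disagree on the count for identical text:

```python
# Two hypothetical vocabularies tokenizing the same string differently.
# This mirrors, in miniature, how OpenAI's and Anthropic's tokenizers
# can report different counts for the same input.
def tokenize(text, vocab):
    """Greedy longest-match with single-character fallback."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j - i == 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

VOCAB_A = {"token", "izer", "s"}   # splits "tokenizers" into 3 pieces
VOCAB_B = {"tokenizer", "s"}       # splits the same word into 2 pieces

text = "tokenizers"
print(tokenize(text, VOCAB_A))  # ['token', 'izer', 's']
print(tokenize(text, VOCAB_B))  # ['tokenizer', 's']
```

Identical text, two vocabularies, two different bills: this is why token counts are always tokenizer-specific, not a property of the text alone.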