Human typing habits and token counts
- #Typing Habits
- #Tokenization
- #AI Billing
- Human typing habits such as typos, shorthand, filler words, and pasted data increase token counts without changing a message's intent, which matters when usage is billed per token.
- Typos (e.g., swapped or dropped letters) and word variations (e.g., unusual suffixes) fall outside a tokenizer's learned vocabulary, so the text splits into more subword pieces and token counts rise.
- Conversational padding (e.g., fillers, hedges, expressive habits) adds tokens that set tone but rarely help complete the task, so it shows up directly in costs.
- Shorthand forms (e.g., 'pls' for 'please') can be less token-efficient than the standard word: common words are often a single token while abbreviations may split into several, defeating the keystroke-saving intent.
- Non-conversational elements like UUIDs, timestamps, and URLs are token-dense, significantly inflating counts in work contexts and adding billing overhead.
- Tokenizers differ: OpenAI's and Anthropic's (Claude) produce different token counts for the same text, and a string's count can also depend on the surrounding text.
- The core disconnect: humans type to save keystrokes, while tokenizers charge by learned subword patterns, so everyday typing habits carry real cost implications.
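The out-of-vocabulary effect behind the typo, shorthand, and UUID points above can be sketched with a toy greedy longest-match tokenizer. This is not how production tokenizers work (they use byte-pair encoding with merges learned from data, as in OpenAI's tiktoken), and the vocabulary below is purely hypothetical; it only illustrates why unfamiliar spellings split into more pieces:

```python
# Toy greedy longest-match tokenizer over a tiny, hypothetical vocabulary.
# Real tokenizers use learned BPE merges; this only shows the mechanism
# by which out-of-vocabulary text fragments into more tokens.
VOCAB = {"please", "review", "the", "log", "ple", "se"}

def tokenize(text, vocab):
    """At each position, take the longest substring found in the
    vocabulary; any single character matches as a fallback."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j - i == 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

for sample in ["please",   # in-vocabulary word: 1 token
               "pls",      # shorthand, no vocab match: 3 tokens
               "plese",    # typo, partial matches: 2 tokens
               "550e8400-e29b-41d4-a716-446655440000"]:  # UUID: per-character
    print(f"{len(tokenize(sample, VOCAB)):>2} tokens: {sample}")
```

Under this toy vocabulary, 'please' is one token, the shorthand 'pls' costs three, the typo 'plese' costs two, and a 36-character UUID costs 36, mirroring the cost asymmetries described above.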
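The same toy scheme, run with two different hypothetical vocabularies, shows why distinct tokenizers can disagree on the count for identical text:

```python
# Two hypothetical vocabularies tokenizing the same string differently.
# This mirrors, in miniature, how OpenAI's and Anthropic's tokenizers
# can report different counts for the same input.
def tokenize(text, vocab):
    """Greedy longest-match with single-character fallback."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j - i == 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

VOCAB_A = {"token", "izer", "s"}   # splits "tokenizers" into 3 pieces
VOCAB_B = {"tokenizer", "s"}       # splits the same word into 2 pieces

text = "tokenizers"
print(tokenize(text, VOCAB_A))  # ['token', 'izer', 's']
print(tokenize(text, VOCAB_B))  # ['tokenizer', 's']
```

Identical text, two vocabularies, two different bills: this is why token counts are always tokenizer-specific, not a property of the text alone.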