Token Counter
Estimate token count in AI models with pricing for major LLM providers
Statistics
Estimated Cost
$2.50 / 1M input tokens
$10.00 / 1M output tokens
What is a Token?
The Token Counter estimates how AI models split text into processing units. A token is not always a word or a character; it may be a full word, part of a word, punctuation, whitespace behavior, or a single character in some languages. Token counts matter for context limits, cost estimation, prompt design, RAG chunking, chat history management, and deciding whether an input can fit into a model request. Different model families use different tokenizers, so the count is model-dependent rather than universal. This tool helps trim, split, or structure text before sending it, but the final limit should still be checked against the exact model being used.
How to Use
Basic Operations
- Enter or paste text in the input area
- Select target AI model (GPT-4, Claude, Gemini, etc.)
- View token count estimation on the right panel
- Set estimated output length to calculate API costs
Tokenization Rules
- GPT series: ~4 English chars = 1 token, ~1.5 Chinese chars = 1 token
- Claude series: Similar to GPT with slight differences
- DeepSeek series: Optimized for Chinese, ~2 chars = 1 token
- Special characters, punctuation, and line breaks also consume tokens
- Structured text like code and JSON typically has higher token density
Use Cases
Technical Principle
Modern LLM tokenizers use subword algorithms — primarily Byte Pair Encoding (BPE) and SentencePiece — rather than splitting on whitespace. BPE starts from individual bytes and iteratively merges the most frequent adjacent pair, producing a fixed vocabulary of typically 32k-200k symbols. Common words become single tokens, rare words break into several subwords, and arbitrary bytes (emoji, control characters) still encode safely because the alphabet covers all 256 bytes. SentencePiece (used by Llama, Mistral, Gemini variants) treats whitespace as a regular character via the `▁` marker, so leading spaces become part of the next token, which is why ` hello` and `hello` are usually different token IDs. OpenAI publishes three main BPE vocabularies via the `tiktoken` library: `p50k_base` (50,281 tokens, GPT-3 / Codex), `cl100k_base` (100,277 tokens, GPT-3.5 Turbo and GPT-4), and `o200k_base` (~200k tokens, GPT-4o and o1) which adds non-English coverage and shrinks Chinese/Japanese token counts by roughly 1.4-1.7×. Claude uses a related but proprietary tokenizer with a similar vocabulary scale. As rough working ratios, English text averages ~4 characters per token, Chinese ~1.5-2 characters per token on cl100k_base and ~2 on o200k_base, and a single emoji often consumes 2-5 tokens because it is encoded as multiple UTF-8 bytes. Token count drives both context window usage and cost. Current windows include GPT-4o 128k, Claude 3.5 Sonnet 200k, and Gemini 1.5 Pro 2M; cost is billed as `tokens × price_per_1M`, with input and output priced separately (e.g. GPT-4o at $2.50/$10.00 per 1M, Claude 3.5 Sonnet at $3.00/$15.00). This counter uses heuristic coefficients per family because shipping every tokenizer's vocabulary file would be megabytes of payload, so the result is a working estimate — the authoritative number is the `usage` field of the model's API response.
- BPE merges frequent byte pairs into a fixed vocabulary; OpenAI vocabularies are `cl100k_base` (GPT-4/3.5), `o200k_base` (GPT-4o/o1), `p50k_base` (Codex).
- SentencePiece encodes leading whitespace as `▁`, so ` world` and `world` map to different token IDs in Llama/Mistral/Gemini.
- English heuristic ≈ 4 chars/token; CJK ≈ 1.5-2 chars/token on cl100k_base, ≈ 2 on o200k_base; emoji typically 2-5 tokens each.
- Cost formula: `(input_tokens / 1_000_000) × input_price + (output_tokens / 1_000_000) × output_price`, with input and output priced separately.
- Context windows in 2025: GPT-4o 128k, GPT-4 Turbo 128k, Claude 3.5 Sonnet 200k, Gemini 1.5 Pro 2M, DeepSeek V3 128k.
- Same text gives different token counts across vendors: tokenizer vocabulary, byte-fallback rules, and whitespace handling all differ.
- Authoritative count is the API response `usage.prompt_tokens` / `usage.completion_tokens` (OpenAI) or `usage.input_tokens` / `usage.output_tokens` (Anthropic).
Examples
Short English phrase under GPT-4
Input: Hello, world!
Model: GPT-4 (cl100k_base)
Tokens: 4 -> ["Hello", ",", " world", "!"]
Chars: 13
Ratio: 3.25 chars/tokenChinese text uses more tokens per character
Input: 你好,世界! (Hello, world! in Chinese)
GPT-4: ~8 tokens (1.5 chars/token)
DeepSeek V3: ~4 tokens (2 chars/token, optimized for CJK)
Claude 3.5: ~7 tokensEstimate cost for a 1,000-word article
Input: 1,000 English words (~1,330 tokens)
Expected output: 500 tokens
Model: GPT-4o ($2.50 input / $10.00 output per 1M tokens)
Input cost: 1,330 / 1,000,000 * $2.50 = $0.00333
Output cost: 500 / 1,000,000 * $10.00 = $0.00500
Total: ~$0.0083 per requestRule of thumb: ~75 words = ~100 tokens (English)
Paragraph (75 words):
"The quick brown fox jumps over the lazy dog. Pack my box with five
dozen liquor jugs. How vexingly quick daft zebras jump! The five
boxing wizards jump quickly. Sphinx of black quartz, judge my vow."
GPT-4 tokens: ~100
Claude tokens: ~95Chunk size before embedding into a vector DB
Target chunk: 512 tokens (text-embedding-3-small limit: 8191)
English text: ~384 words per chunk
Chinese text: ~768 characters per chunk (GPT tokenizer)
Overlap: 50 tokens between chunks (preserves context)FAQ
Which tokenizer does the counter use?
Typically OpenAI's tiktoken (cl100k_base for GPT-4, GPT-3.5; o200k_base for GPT-4o), and sometimes anthropic's Claude tokenizer or Hugging Face tokenizers for open models. Different models split text differently, so the count varies between models.
Why is the count not the same as word count?
Tokens are sub-word units. 'Hello world' is 2 tokens; 'antidisestablishmentarianism' is 5-6 tokens. English averages ~0.75 words per token (so 1000 tokens ≈ 750 words). Other languages are denser - Chinese characters are often 1-2 tokens each, despite being one character.
Is my prompt uploaded?
No. The tokenizer runs in your browser - tiktoken has a JavaScript port that does the encoding locally. Your prompt does not cross the network.
How accurate is the cost estimate?
Token count is exact. The cost figure depends on the price-per-1K-tokens for the chosen model, which the page reads from a published price list. Provider price changes are reflected when the page is updated; verify against the latest pricing for budget-critical decisions.
Why do my counts differ slightly between this and OpenAI's playground?
Different tiktoken versions can have minor differences. Special tokens (chat messages have role/system framing tokens) add a few tokens per message that an unstructured counter may not include. For exact API-call billing, count what your code actually sends.
How does it handle code, JSON, and structured data?
Tokenizers split punctuation, brackets, and whitespace into many small tokens. JSON is dense - a small JSON object can use 50+ tokens. Code uses more tokens than equivalent prose. Plan for this when sending JSON or code to a model with a tight context limit.
Can I count tokens for a model not listed?
Only if its tokenizer is available in-browser. Common ones (GPT, Claude, Llama) have JS implementations. For obscure or proprietary models, use the model provider's official counter or estimate (4 chars ≈ 1 token for English).