ToolAct

Token Counter

Estimate token count in AI models with pricing for major LLM providers

What is a Token?

The Token Counter estimates how AI models split text into processing units. A token is not always a word or a character; it may be a full word, part of a word, a punctuation mark, a run of whitespace, or a single character in some languages. Token counts matter for context limits, cost estimation, prompt design, RAG chunking, chat history management, and deciding whether an input can fit into a model request. Different model families use different tokenizers, so the count is model-dependent rather than universal. This tool helps you trim, split, or structure text before sending it, but the final limit should still be checked against the exact model being used.

How to Use

Basic Operations

  1. Enter or paste text in the input area
  2. Select the target AI model (GPT-4, Claude, Gemini, etc.)
  3. View the estimated token count in the right panel
  4. Set the estimated output length to calculate API costs

Tokenization Rules

  • GPT series: ~4 English chars = 1 token, ~1.5 Chinese chars = 1 token
  • Claude series: Similar to GPT with slight differences
  • DeepSeek series: Optimized for Chinese, ~2 chars = 1 token
  • Special characters, punctuation, and line breaks also consume tokens
  • Structured text like code and JSON typically has higher token density
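The ratios above can be turned into a rough estimator. The following is a minimal sketch, not the tool's actual implementation: the `CHARS_PER_TOKEN` table, the family names, and the `estimate_tokens` function are hypothetical, and the coefficients are simply the approximate ratios listed in the rules. It treats only the main CJK Unified Ideographs block as Chinese and applies the English ratio to everything else, including punctuation and whitespace.

```python
import math

# Hypothetical coefficients (characters per token), copied from the rough
# ratios in the list above. They are assumptions, not published tool values.
CHARS_PER_TOKEN = {
    "gpt": {"cjk": 1.5, "other": 4.0},
    "deepseek": {"cjk": 2.0, "other": 4.0},
}


def is_cjk(ch: str) -> bool:
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def estimate_tokens(text: str, family: str = "gpt") -> int:
    """Estimate tokens by dividing character counts by per-family ratios."""
    ratios = CHARS_PER_TOKEN[family]
    cjk_chars = sum(1 for ch in text if is_cjk(ch))
    other_chars = len(text) - cjk_chars
    estimate = cjk_chars / ratios["cjk"] + other_chars / ratios["other"]
    # Round up so a non-empty text never estimates to zero tokens.
    return math.ceil(estimate)
```

For example, `estimate_tokens("Hello, world!")` divides 13 characters by 4 and rounds up to 4. A real tokenizer may give a different number, so treat the result as a planning figure only.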

Use Cases

Estimate prompt size across model families: Paste text and choose from OpenAI, Claude, Gemini, Llama, Mistral, or DeepSeek model presets. The estimator uses different heuristic coefficients for Chinese characters, English words, punctuation, and whitespace depending on the selected family. Switching presets quickly reveals how a prompt designed for one vendor will be billed on another, which is useful when negotiating provider contracts.
Forecast rough input and output cost: The panel combines estimated input tokens with a user-entered expected output token count and model-specific per-million-token pricing values. It shows input tokens, output tokens, and total estimated cost for quick budget checks. For a long-running batch job, multiply the single-request estimate by the planned request count to project monthly spend before committing to a model.
Understand multilingual text composition: Besides the token estimate, the tool reports total characters, Chinese character count, word count, and characters-per-token ratio. This is useful when trimming prompts, comparing Chinese and English drafts, or preparing content for model context limits. A high characters-per-token ratio means the tokenizer is packing more text into each token, which usually lowers cost per page.
Compare tokenizer estimates side by side: Switch the model preset between GPT, Claude, and Gemini on the same text to see how Chinese vs. English costs shift. This is useful when porting a prompt across providers or estimating chunk size for a RAG pipeline. Differences between tokenizer families become visible in the estimates: byte-level BPE vocabularies tend to break rare words into more subword tokens, while SentencePiece-based tokenizers (used by Llama 2, early Mistral models, and Gemini variants) handle whitespace differently and may segment Chinese text into different units.
Size chunks before embedding or retrieval: Aim each paragraph near the chosen chunk size (e.g., 512 or 1024 tokens), copy the boundary sentence into your splitter, and tag chunks with their token count for downstream retrieval indices. The cl100k_base vocabulary used by GPT-4 and GPT-3.5 Turbo, the o200k_base vocabulary used by GPT-4o, and Claude's proprietary tokenizer all produce different chunk boundaries on the same document.

Technical Principle

Modern LLM tokenizers use subword algorithms, primarily Byte Pair Encoding (BPE) and SentencePiece, rather than splitting on whitespace. BPE starts from individual bytes and iteratively merges the most frequent adjacent pair, producing a fixed vocabulary of typically 32k-200k symbols. Common words become single tokens, rare words break into several subwords, and arbitrary bytes (emoji, control characters) still encode safely because the alphabet covers all 256 bytes. SentencePiece (used by Llama, Mistral, Gemini variants) treats whitespace as a regular character via the `▁` marker, so leading spaces become part of the next token, which is why ` hello` and `hello` are usually different token IDs.

OpenAI publishes three main BPE vocabularies via the `tiktoken` library: `p50k_base` (50,281 tokens, GPT-3 / Codex), `cl100k_base` (100,277 tokens, GPT-3.5 Turbo and GPT-4), and `o200k_base` (~200k tokens, GPT-4o and o1), which adds non-English coverage and shrinks Chinese/Japanese token counts by roughly 1.4-1.7×. Claude uses a related but proprietary tokenizer with a similar vocabulary scale.

As rough working ratios, English text averages ~4 characters per token, Chinese ~1.5-2 characters per token on cl100k_base and ~2 on o200k_base, and a single emoji often consumes 2-5 tokens because it is encoded as multiple UTF-8 bytes.

Token count drives both context window usage and cost. Current windows include GPT-4o 128k, Claude 3.5 Sonnet 200k, and Gemini 1.5 Pro 2M; cost is billed as `tokens × price_per_1M`, with input and output priced separately (e.g. GPT-4o at $2.50/$10.00 per 1M, Claude 3.5 Sonnet at $3.00/$15.00). This counter uses heuristic coefficients per family because shipping every tokenizer's vocabulary file would be megabytes of payload, so the result is a working estimate; the authoritative number is the `usage` field of the model's API response.
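To make the merge loop concrete, here is a toy sketch of byte-level BPE on a single string. It is an illustration under simplifying assumptions, not how `tiktoken` or SentencePiece are implemented: production tokenizers train on large corpora, pre-split text with regular expressions, and store merges in a ranked vocabulary. The function names `most_frequent_pair`, `merge_pair`, and `train_bpe` are hypothetical.

```python
from collections import Counter
from typing import Optional


def most_frequent_pair(tokens: list[bytes]) -> Optional[tuple[bytes, bytes]]:
    """Return the most common adjacent pair, or None if there are no pairs."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return None
    return pairs.most_common(1)[0][0]


def merge_pair(tokens: list[bytes], pair: tuple[bytes, bytes]) -> list[bytes]:
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(text: str, num_merges: int) -> tuple[list[bytes], list[tuple[bytes, bytes]]]:
    """Start from single UTF-8 bytes and apply up to `num_merges` greedy merges."""
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:  # fewer than two tokens left, nothing to merge
            break
        tokens = merge_pair(tokens, pair)
        merges.append(pair)
    return tokens, merges
```

With zero merges, a single emoji such as `😀` stays as four one-byte tokens, which shows why text the vocabulary has not learned costs several tokens per character. Joining the returned tokens always reproduces the original UTF-8 bytes, so the encoding is lossless.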

  • BPE merges frequent byte pairs into a fixed vocabulary; OpenAI vocabularies are `cl100k_base` (GPT-4/3.5), `o200k_base` (GPT-4o/o1), `p50k_base` (Codex).
  • SentencePiece encodes leading whitespace as `▁`, so ` world` and `world` map to different token IDs in Llama/Mistral/Gemini.
  • English heuristic ≈ 4 chars/token; CJK ≈ 1.5-2 chars/token on cl100k_base, ≈ 2 on o200k_base; emoji typically 2-5 tokens each.
  • Cost formula: `(input_tokens / 1_000_000) × input_price + (output_tokens / 1_000_000) × output_price`, with input and output priced separately.
  • Context windows in 2025: GPT-4o 128k, GPT-4 Turbo 128k, Claude 3.5 Sonnet 200k, Gemini 1.5 Pro 2M, DeepSeek V3 128k.
  • Same text gives different token counts across vendors: tokenizer vocabulary, byte-fallback rules, and whitespace handling all differ.
  • Authoritative count is the API response `usage.prompt_tokens` / `usage.completion_tokens` (OpenAI) or `usage.input_tokens` / `usage.output_tokens` (Anthropic).
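The cost formula in the list above is simple enough to check by hand, but a small helper avoids unit slips between per-token and per-million prices. This is a sketch with hypothetical function names; the prices passed in are whatever your provider currently publishes.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Return the estimated cost of one request in dollars."""
    input_cost = input_tokens / 1_000_000 * input_price_per_1m
    output_cost = output_tokens / 1_000_000 * output_price_per_1m
    return input_cost + output_cost


def project_batch_cost(per_request_cost: float, request_count: int) -> float:
    """Scale a single-request estimate to a planned number of requests."""
    return per_request_cost * request_count
```

For instance, `estimate_cost(1330, 500, 2.50, 10.00)` evaluates to about 0.0083 dollars, and passing that value to `project_batch_cost` with a request count of 100,000 projects roughly 832 dollars.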

Examples

Short English phrase under GPT-4

Input:    Hello, world!
Model:    GPT-4 (cl100k_base)
Tokens:   4   ->  ["Hello", ",", " world", "!"]
Chars:    13
Ratio:    3.25 chars/token

Chinese text uses more tokens per character

Input:    你好,世界!  (Hello, world! in Chinese)
Characters:   6
GPT-4:        ~8 tokens (~0.75 chars/token)
DeepSeek V3:  ~4 tokens (~1.5 chars/token, optimized for CJK)
Claude 3.5:   ~7 tokens

Estimate cost for a 1,000-word article

Input:        1,000 English words (~1,330 tokens)
Expected output: 500 tokens
Model:        GPT-4o ($2.50 input / $10.00 output per 1M tokens)

Input cost:   1,330 / 1,000,000 * $2.50 = $0.00333
Output cost:  500   / 1,000,000 * $10.00 = $0.00500
Total:        ~$0.0083 per request

Rule of thumb: ~75 words = ~100 tokens (English)

Paragraph (36 words):
"The quick brown fox jumps over the lazy dog. Pack my box with five
dozen liquor jugs. How vexingly quick daft zebras jump! The five
boxing wizards jump quickly. Sphinx of black quartz, judge my vow."

Rule-of-thumb estimate: 36 / 0.75 ≈ 48 tokens

For a 75-word passage:
GPT-4 tokens: ~100
Claude tokens: ~95

Chunk size before embedding into a vector DB

Target chunk: 512 tokens (text-embedding-3-small limit: 8191)
English text:  ~384 words per chunk
Chinese text:  ~768 characters per chunk (GPT tokenizer)

Overlap:       50 tokens between chunks (preserves context)
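A sliding window with overlap can be sketched as follows. This is a minimal illustration, assuming words stand in for tokens at the ~0.75 words-per-token ratio used above, so a 512-token chunk with a 50-token overlap becomes roughly 384 words with a 38-word overlap. The function name `chunk_with_overlap` is hypothetical, and a production splitter would count real tokens and respect sentence boundaries.

```python
def chunk_with_overlap(items: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split `items` into windows of `chunk_size` that share `overlap` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(items), step):
        chunks.append(items[start:start + chunk_size])
        if start + chunk_size >= len(items):
            break  # this window already reached the end of the input
    return chunks


# Example: 1,000 words in ~384-word chunks with a ~38-word overlap.
words = [f"w{i}" for i in range(1000)]
chunks = chunk_with_overlap(words, chunk_size=384, overlap=38)
```

Each chunk after the first begins with the last 38 words of the previous one, which keeps a sentence that straddles a boundary retrievable from either side.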

FAQ

Which tokenizer does the counter use?

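This counter does not ship a full tokenizer. It applies heuristic coefficients per model family to approximate what the real tokenizers produce: OpenAI's tiktoken vocabularies (cl100k_base for GPT-4 and GPT-3.5; o200k_base for GPT-4o), Anthropic's Claude tokenizer, and Hugging Face tokenizers for open models. Different models split text differently, so the count varies between models.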

Why is the count not the same as word count?

Tokens are sub-word units. 'Hello world' is 2 tokens; 'antidisestablishmentarianism' is 5-6 tokens. English averages ~0.75 words per token (so 1000 tokens ≈ 750 words). Other languages can cost more tokens per character: a Chinese character is often 1-2 tokens on its own.

Is my prompt uploaded?

No. The estimate is computed in your browser, so your prompt does not cross the network. Exact in-browser counting is also possible in principle, since tiktoken has a JavaScript port that does the encoding locally.

How accurate is the cost estimate?

The token count is a heuristic estimate rather than an exact tokenizer count. The cost figure depends on the price per 1M tokens for the chosen model, which the page reads from a published price list. Provider price changes are reflected only when the page is updated; verify against the latest pricing for budget-critical decisions.

Why do my counts differ slightly between this and OpenAI's playground?

This counter produces a heuristic estimate, and even exact counters can differ slightly between tiktoken versions. Special tokens (chat messages have role/system framing tokens) add a few tokens per message that an unstructured counter may not include. For exact API-call billing, count what your code actually sends.
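The per-message framing overhead can be approximated with a small wrapper around any text counter. This is a sketch under stated assumptions: the defaults of 3 tokens per message and 3 tokens for reply priming follow OpenAI's published counting guidance for some GPT-3.5 and GPT-4 snapshots and may not hold for other models or vendors. The function name `estimate_chat_tokens` is hypothetical, and `count_tokens` is whatever counter you supply.

```python
from typing import Callable


def estimate_chat_tokens(messages: list[dict[str, str]],
                         count_tokens: Callable[[str], int],
                         tokens_per_message: int = 3,
                         reply_priming: int = 3) -> int:
    """Add assumed framing overhead to the token counts of role and content."""
    total = reply_priming  # assumed cost of priming the assistant's reply
    for message in messages:
        total += tokens_per_message  # assumed framing cost per message
        total += count_tokens(message["role"])
        total += count_tokens(message["content"])
    return total
```

Passing a plain-text counter for `count_tokens` shows how much of a short chat request is framing rather than content; for two brief messages the overhead can be a large share of the total.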

How does it handle code, JSON, and structured data?

Tokenizers split punctuation, brackets, and whitespace into many small tokens. JSON is dense - a small JSON object can use 50+ tokens. Code uses more tokens than equivalent prose. Plan for this when sending JSON or code to a model with a tight context limit.

Can I count tokens for a model not listed?

Exact counts require that model's own tokenizer. Common ones (GPT, Claude, Llama) have JavaScript implementations. For obscure or proprietary models, use the model provider's official counter, or estimate with 4 chars ≈ 1 token for English.