ToolAct

Base64 Encoder & Decoder

Fast Base64 encoding and decoding with UTF-8 support

What is Base64?

Base64 is a method of representing binary data using 64 printable characters: the uppercase and lowercase letters A-Z and a-z, the digits 0-9, and the + and / symbols. Because it uses only printable characters, Base64-encoded data can be transmitted safely in emails, web pages, JSON, and other text-only environments. The encoded data is approximately 33% larger than the original. The encoding first appeared in the 1987 PEM protocol for safely transmitting binary data in emails. Today it is an internet standard used everywhere from email attachments and embedded images to JWT tokens. Nearly every programming language has built-in Base64 encoding and decoding support.

How to Use

  1. Paste your text in the left input box
  2. Select 'Encode' or 'Decode' operation
  3. Results appear automatically on the right
  4. Click the copy button to save the result

Common Pitfalls

  • Base64 is encoding, not encryption; anyone with the text can decode it, so do not use it to protect secrets.
  • When decoding fails, check for missing padding, copied line breaks, URL-safe Base64 characters, or surrounding quotes.

Use Cases

Encode Unicode text for Base64-only fields
Paste text containing names, emoji, line breaks, or CJK characters and the page encodes it through TextEncoder before btoa, avoiding the classic browser failure in which btoa rejects or mis-encodes non-ASCII input. The input string never leaves the browser tab: encoding runs entirely in memory, and the encoded output stays in the local DOM until you copy it out yourself.

Decode copied payload fragments during debugging
Switch to decode mode for values found in logs, JSON fields, headers, config files, or support tickets. Invalid Base64 returns a clear error instead of producing a misleading partial result. The decode step also happens locally: atob plus TextDecoder reassemble the bytes into a string right inside the page, so the bytes stay on the same machine that produced them.

Verify a round trip before publishing examples
Use the swap button after encoding or decoding to confirm the sample returns to the original text. This is helpful for API docs, webhook examples, README snippets, and test fixtures where one wrong character breaks the example. Because everything runs client-side, the same sample can be round-tripped repeatedly without uploading the payload to a third-party decoder.

Inspect a JWT header or payload segment
Decode one of the dot-separated segments of a JWT to read the header or claims JSON during debugging, while leaving signature verification to the proper library. The page does not validate signatures, so do not use it as an authentication check or to trust the contents in production paths. Tokens decoded here never leave the browser, which matters when debugging production tokens that contain internal claims.

Reconstruct a small data: URI for inline assets
Encode a tiny SVG, favicon, or CSS snippet into a data: URI and paste it directly into a stylesheet, README, or email template. This is useful for inline previews where uploading the asset is not possible, but watch the 33% size overhead before embedding larger images. The original bytes are read from the input field and the encoded output stays local, so even unreleased icons can be tested in markup without uploading them anywhere.

Technical Principle

Base64 is one of several binary-to-text encodings specified in RFC 4648 (S. Josefsson, October 2006), along with Base16 (hex) and Base32. The alphabet was originally defined for Privacy-Enhanced Mail (RFC 989, 1987), which wrapped binary data inside a printable ASCII envelope so it could survive 7-bit-clean transports. The same alphabet later became the de facto standard for MIME (RFC 2045), data: URIs (RFC 2397), and SSH public key files (which Base64-encode the key blob format of RFC 4253 §6.6); a URL-safe variant is used by JWT (RFC 7519). Git, by contrast, does not use Base64: packfiles use delta encoding with zlib compression, Git object IDs are 40-character hex SHA-1 strings, and Git LFS pointer files record their SHA-256 object ID in hex.

The cost: every 3 input bytes become 4 output characters, so padded output is 4 × ceil(n / 3) characters for n input bytes, about 4/3 the size (33.3% overhead). For a 10 MB binary blob the encoded form is ~13.3 MB. The mechanism: split the input into 3-byte (24-bit) groups; each group is broken into four 6-bit values; each 6-bit value selects one character from a 64-character alphabet. The canonical alphabet is A-Z (indices 0-25), a-z (26-51), 0-9 (52-61), '+' (62), '/' (63), with '=' as the pad character. A widely used worked example: 'Man' (0x4d 0x61 0x6e) packs to the 24-bit value 0x4d616e; splitting it into 6-bit chunks gives 0x13 0x16 0x05 0x2e (indices 19, 22, 5, 46), which map to 'TWFu'. When the input length is not a multiple of 3, the trailing group is zero-padded on the right: 1 byte left → 2 significant 6-bit chunks + '==' (2 pad chars); 2 bytes left → 3 significant chunks + '=' (1 pad). The pad characters carry no data, but they keep the output length a multiple of 4, which lets decoders detect truncated input.

In the browser, Base64 has two notorious pitfalls. First, `btoa` and `atob` treat each code unit of a string as one Latin-1 byte; they do not operate on UTF-8 text. A string containing U+4E2D (中), or any other code point above U+00FF, makes `btoa` throw InvalidCharacterError, while U+00E9 (é) is silently encoded as the single Latin-1 byte 0xE9 rather than its two-byte UTF-8 form. The page works around this by piping through `new TextEncoder().encode(str)` (always UTF-8) before calling `btoa`, and `new TextDecoder().decode(bytes)` after `atob`. UTF-8 multi-byte characters expand: '你' is 3 bytes (0xe4 0xbd 0xa0) → 4 Base64 characters, so '你好' takes 8. Second, Base64URL (RFC 4648 §5) replaces '+' and '/' with '-' and '_' and usually omits padding, because '+' and '/' have reserved meanings in URLs and '=' separates keys from values in query strings. JWS (RFC 7515), and therefore JWT (RFC 7519), requires Base64URL without padding for this reason.

Base64 is encoding, not encryption: the encoded form has zero secrecy, because decoding requires no key and any observer can reverse it trivially. Mistaking Base64 for a security mechanism is a recurring vulnerability pattern: a system treats the encoding as if it adds a layer of secrecy it does not, and the result is exploitable. For real secrets use an authenticated cipher such as AES-GCM (NIST SP 800-38D) or ChaCha20-Poly1305 (RFC 8439). For compression followed by Base64 (for example `gzip -c file | base64`), the encoded form is roughly 1.33× the gzipped size, or about 1.37× with MIME line breaks. Base64 also provides no integrity protection: a changed character may still decode, just to different bytes. Computing HMAC-SHA256 over the bytes before encoding is the right approach.
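
The grouping and padding steps described above can be sketched as a small hand-written encoder. This is an illustration of the mechanism only, not the page's implementation; the names `BASE64_ALPHABET` and `encodeBase64` are made up for the example, and real code would normally use the platform's built-in encoder.

```javascript
// Illustrative only: walks the input three bytes at a time.
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function encodeBase64(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // bytes available for this group
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0; // zero-fill a short final group
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const group = (b0 << 16) | (b1 << 8) | b2; // 24-bit value

    // Four 6-bit indices; positions with no source bytes become '='.
    out += BASE64_ALPHABET[(group >> 18) & 0x3f];
    out += BASE64_ALPHABET[(group >> 12) & 0x3f];
    out += remaining > 1 ? BASE64_ALPHABET[(group >> 6) & 0x3f] : "=";
    out += remaining > 2 ? BASE64_ALPHABET[group & 0x3f] : "=";
  }
  return out;
}

console.log(encodeBase64(new TextEncoder().encode("Man"))); // TWFu
```

For 'Man' the single group is 0x4d616e, and the four shifts select indices 19, 22, 5, and 46, which are T, W, F, and u.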

  • RFC 4648 (October 2006) defines Base64, Base32, and Base16. The standard Base64 alphabet is A-Z, a-z, 0-9, +, / with '=' as the pad character. The MIME variant (RFC 2045) limits encoded lines to 76 characters for transport; the URL-safe variant (Base64URL, RFC 4648 §5) replaces + and / with - and _ and usually omits padding. Base64URL is used by JWT (RFC 7519), JWS (RFC 7515), and JWK (RFC 7517).
  • Mechanism: 3 input bytes (24 bits) → 4 output characters (each carrying 6 bits). Overhead is 33.3%: every 1 MB of binary input becomes about 1.33 MB of Base64. The effective ratio can be worse when the output's '+', '/', or '=' characters must be percent-encoded by a surrounding protocol such as a URL, where each becomes 3 characters.
  • Padding rule: input length mod 3 = 0 → no padding; mod 3 = 1 → '==' (two pad chars, one byte in the final group); mod 3 = 2 → '=' (one pad char, two bytes in the final group). '=' carries no data; it only tells the decoder how many bytes the final group holds.
  • UTF-8 + btoa pitfall: `btoa` treats each code unit of its input as one Latin-1 byte. `btoa('中')` throws InvalidCharacterError because the character is above U+00FF, while `btoa('é')` succeeds but encodes the Latin-1 byte 0xE9 instead of the UTF-8 bytes 0xC3 0xA9. The page works around this by encoding through `TextEncoder` (UTF-8) before btoa, and decoding through `TextDecoder` after atob.
  • Base64URL is required for JWT, JWS, and JWK (RFC 7519/7515/7517). It uses '-' and '_' instead of '+' and '/' (which have reserved meanings in URLs) and drops '=' padding (which would otherwise need percent-encoding in query strings). A strict standard Base64 decoder rejects or mis-decodes a JWT segment that contains '-' or '_' or lacks padding; the page does not auto-detect, so pick the right variant for the payload.
  • Performance: encoding runs at roughly 400-700 MB/s in V8 on a modern laptop (a tight loop doing table lookups and bit shifts), and decoding is similar. Treat these as rough figures that vary by engine and hardware. For large blobs (10+ MB) the bottleneck is allocation, not compute: the output buffer is 33% larger and `TextEncoder`/`TextDecoder` each make a copy.
  • Base64 is encoding, not encryption: anyone with the string can read it. Treating it as protection is a common first stepping stone in CTF writeups and in real incidents. For secrets use AES-GCM (NIST SP 800-38D) or ChaCha20-Poly1305 (RFC 8439). For data URIs, watch the encoded size: a 10 MB image becomes a 13.3 MB URL, which can exceed browser data-URL limits and strains typical email size limits.
  • Migration note: Base16 (hex) is preferred in low-level protocols and hash output because each byte maps to exactly 2 characters and the length is predictable (no padding math). Base32 is preferred for human transcription (it is case-insensitive and its alphabet omits the easily confused digits 0 and 1). Base64 remains the default for binary transport in text protocols, though binary-capable transports such as HTTP/2 and WebTransport can carry raw bytes where framing allows it.
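
The size arithmetic in the notes above can be checked with a few lines of JavaScript. This is a rough sketch that assumes padded RFC 4648 output with no line breaks; `encodedLengths` is a hypothetical helper name, not part of any library.

```javascript
// Output length in characters for n input bytes, per RFC 4648 with padding.
function encodedLengths(byteCount) {
  return {
    base16: byteCount * 2,                 // 1 byte  -> 2 hex characters
    base32: Math.ceil(byteCount / 5) * 8,  // 5 bytes -> 8 characters
    base64: Math.ceil(byteCount / 3) * 4,  // 3 bytes -> 4 characters
    base64Padding: (3 - (byteCount % 3)) % 3, // number of '=' in Base64 output
  };
}

console.log(encodedLengths(13));
// { base16: 26, base32: 24, base64: 20, base64Padding: 2 }
```

The 13-byte case matches 'Hello, World!', whose Base64 form is 20 characters long and ends in '=='.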

Examples

Encode ASCII text

Input:  Hello
Output: SGVsbG8=    (5 bytes -> 8 chars, 1 pad char)

Input:  Hello, World!
Output: SGVsbG8sIFdvcmxkIQ==    (13 bytes -> 20 chars, 2 pad chars)

Input:  Man
Output: TWFu    (3 bytes -> 4 chars, no padding)

The 'Man' example is a widely used illustration: bytes 0x4D 0x61 0x6E
pack into the 24-bit value 0x4D616E, split into 6-bit chunks 0x13 0x16
0x05 0x2E, and map to T W F u via the standard alphabet.

Encode UTF-8 text (Chinese)

Input:  你好   (2 CJK characters, 6 UTF-8 bytes)
Output: 5L2g5aW9    (8 chars)

Input:  你好世界   (4 CJK characters, 12 UTF-8 bytes)
Output: 5L2g5aW95LiW55WM    (16 chars)

Each CJK character expands to 3 UTF-8 bytes (E4 BD A0 etc.), and every
3 bytes become 4 Base64 characters, so the encoded output has 4
characters per CJK character. The page pipes the input through
new TextEncoder().encode(str) first to avoid the InvalidCharacterError
that btoa throws on characters above U+00FF.
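
A minimal sketch of the TextEncoder-then-btoa approach described above, assuming a runtime that provides the standard `btoa`, `atob`, `TextEncoder`, and `TextDecoder` globals. The function names are illustrative and are not taken from the page's source.

```javascript
// Text -> UTF-8 bytes -> "binary string" (one char per byte) -> Base64.
function encodeUtf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// Base64 -> binary string -> bytes -> text. `fatal: true` makes invalid
// UTF-8 throw instead of silently inserting replacement characters.
function decodeBase64ToUtf8(base64) {
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

console.log(encodeUtf8ToBase64("你好"));       // 5L2g5aW9
console.log(decodeBase64ToUtf8("5L2g5aW9"));   // 你好
```

Calling `btoa` directly on the same input would throw for '你好', and for 'é' it would return '6Q==' (the Latin-1 byte) where the UTF-8 path returns 'w6k='.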

Decode and round-trip a JWT segment

Encoded:  eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
Decoded:  {"alg":"HS256","typ":"JWT"}

Round-trip:
  encode('{"alg":"HS256","typ":"JWT"}') -> eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
  decode('eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9') -> {"alg":"HS256","typ":"JWT"}

JWT segments use Base64URL, so a decoder for them must accept '-' / '_'
in place of the standard '+' / '/' and tolerate missing padding. A
segment containing those characters fails in a strict standard Base64
decoder, so pick the right variant for the payload.
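
As a hedged sketch of decoding one JWT segment for inspection: the helper below maps the Base64URL characters back to the standard alphabet, restores padding, and parses the JSON. `decodeJwtSegment` is a hypothetical name, and the token is a made-up example with a placeholder signature. Nothing here verifies the signature.

```javascript
// Decode a single Base64URL-encoded JWT segment (header or payload) to an object.
function decodeJwtSegment(segment) {
  // Map the URL-safe characters back to the standard alphabet.
  const standard = segment.replace(/-/g, "+").replace(/_/g, "/");
  // Restore the '=' padding that Base64URL in JWTs leaves out.
  const padded = standard + "=".repeat((4 - (standard.length % 4)) % 4);
  const binary = atob(padded);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

// Hypothetical token: real header, empty payload "{}", placeholder signature.
const exampleToken = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.c2ln";
const [headerSegment, payloadSegment] = exampleToken.split(".");
console.log(decodeJwtSegment(headerSegment));  // { alg: 'HS256', typ: 'JWT' }
console.log(decodeJwtSegment(payloadSegment)); // {}
```

Reading the claims this way is only suitable for debugging; trusting them requires verifying the signature with a proper JWT library.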

Base64 vs Base64URL

Input:  Hello><h2>

Standard:    SGVsbG8+PGgyPg==    (uses + / and = padding; safe inside JSON / email)
URL-safe:    SGVsbG8-PGgyPg      (uses - _ and no padding; safe inside URL paths/queries)

Differences:
  '+' (62)  ->  '-'   and  '/' (63)  ->  '_'
  '=' padding stripped entirely in URL-safe form
Use Base64URL for JWT, JWS, JWK, and any token that travels in a
URL query string, because '+' and '/' have reserved meanings in URLs
and '=' separates keys from values in the query.
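
The two forms differ only at the string level, so converting between them needs no decoding. The sketch below assumes the unpadded Base64URL convention; the helper names `toBase64Url` and `fromBase64Url` are illustrative.

```javascript
// Standard Base64 -> Base64URL: swap the two symbols and strip trailing '='.
function toBase64Url(standard) {
  return standard.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Base64URL -> standard Base64: swap back and pad to a multiple of 4.
function fromBase64Url(urlSafe) {
  const standard = urlSafe.replace(/-/g, "+").replace(/_/g, "/");
  return standard + "=".repeat((4 - (standard.length % 4)) % 4);
}

console.log(toBase64Url("SGVsbG8+PGgyPg=="));  // SGVsbG8-PGgyPg
console.log(fromBase64Url("SGVsbG8-PGgyPg"));  // SGVsbG8+PGgyPg==
```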

FAQ

Is Base64 encryption?

No. Base64 is an encoding, not encryption. It only converts arbitrary bytes into 64 printable ASCII characters (A-Z, a-z, 0-9, +, /) so they survive systems that mangle binary data. Anyone with the encoded string can decode it back instantly - there is no secret involved.

Why does my encoded output end with one or two = signs?

Base64 emits 4 output characters per 3 input bytes. When the input length is not a multiple of 3, the encoder pads with = so the result length stays a multiple of 4. One leftover input byte → two ='s; two leftover bytes → one =; aligned input → none. Some implementations omit padding, which is legal but not interoperable everywhere.

What is URL-safe Base64?

Standard Base64 includes / and + which have special meaning in URLs and filenames. URL-safe Base64 (RFC 4648 §5) replaces them with _ and - and often drops the = padding. Use it for JWT tokens, URL parameters, and filenames; use standard Base64 everywhere else.

Why is the Base64 string ~33% longer than the original?

Each 6 bits of input becomes one 8-bit output character, so encoded size = ceil(input_length / 3) * 4. That is roughly 4/3 of the input (33% overhead). This is the cost of representing arbitrary bytes in printable ASCII.

What input formats can I paste here?

For encoding, paste plain text (UTF-8 encoded under the hood) or upload a file. For decoding, paste a Base64 string with or without whitespace - the decoder strips line breaks automatically. If decoding fails, check for stray characters or a missing padding =.
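
When a pasted value fails to decode, the usual culprits can be checked in code. The following is a sketch under stated assumptions, not the page's actual validation logic: `cleanBase64Input` and `isStandardBase64` are hypothetical helpers, and the pattern accepts only padded standard Base64.

```javascript
// Remove the debris that copy-paste tends to add around a Base64 value.
function cleanBase64Input(raw) {
  let text = raw.trim();
  // Drop one pair of matching surrounding quotes (from JSON or shell output).
  if (text.length >= 2 && /^["']/.test(text) && text[0] === text[text.length - 1]) {
    text = text.slice(1, -1);
  }
  return text.replace(/\s+/g, ""); // strip line breaks and spaces
}

// True for padded standard Base64: groups of 4 characters, with an optional
// final group ending in '=' or '=='.
function isStandardBase64(text) {
  return /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$/.test(text);
}

const cleaned = cleanBase64Input(' "SGVs\nbG8=" ');
console.log(cleaned, isStandardBase64(cleaned)); // SGVsbG8= true
console.log(isStandardBase64("SGVsbG8"));        // false (padding missing)
```

A value that fails this check but contains '-' or '_' is probably Base64URL and needs the URL-safe variant instead.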

Can Base64 carry binary file content?

Yes. That is its main use case - inline images in HTML/CSS (data: URLs), email attachments (MIME), and credentials in HTTP headers (Basic Auth) all use Base64 to put binary content into text-only channels. Be aware the resulting payload is 33% larger than the raw file.

Should I use Base64 to hide sensitive data?

Never. Base64 is fully reversible without a key - treating it as obfuscation is a common mistake that has leaked passwords and tokens in many real incidents. Use proper encryption or a secrets manager for anything sensitive.