Base64 Encoder & Decoder
Fast Base64 encoding and decoding with UTF-8 support
What is Base64?
Base64 is a method of representing binary data using 64 printable characters: uppercase letters A-Z, lowercase letters a-z, digits 0-9, and the + and / symbols. Since it only uses printable characters, Base64-encoded data can be safely transmitted in emails, web pages, JSON, and other text-only environments. The encoded data is approximately 33% larger than the original. Base64 first appeared in the 1987 PEM protocol for safely transmitting binary data in emails. Today it's an internet standard used everywhere from email attachments to JWTs, from embedded images to data transmission. Nearly every programming language has built-in Base64 encoding and decoding support.
How to Use
- Paste your text in the left input box
- Select 'Encode' or 'Decode' operation
- Results appear automatically on the right
- Click the copy button to save the result
Common Pitfalls
- Base64 is encoding, not encryption; anyone with the text can decode it, so do not use it to protect secrets.
- When decoding fails, check for missing padding, copied line breaks, URL-safe Base64 characters, or surrounding quotes.
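The checks listed above can be sketched as a small diagnostic helper. This is only an illustration: `diagnoseBase64` is a hypothetical name, not part of this page or of any library, and the order of the checks is an assumption.

```javascript
// Hypothetical helper: reports likely reasons a Base64 string fails to decode
// and returns a best-effort normalized string in the standard alphabet.
function diagnoseBase64(input) {
  const issues = [];
  let s = input.trim();

  // Surrounding quotes, often copied along from JSON or shell output.
  if (/^(["']).*\1$/s.test(s)) {
    issues.push("surrounding quotes");
    s = s.slice(1, -1);
  }
  // Line breaks or spaces inside the payload (e.g. MIME-wrapped text).
  if (/\s/.test(s)) {
    issues.push("embedded whitespace");
    s = s.replace(/\s+/g, "");
  }
  // '-' and '_' belong to the URL-safe alphabet, not the standard one.
  if (/[-_]/.test(s)) {
    issues.push("URL-safe characters");
    s = s.replace(/-/g, "+").replace(/_/g, "/");
  }
  // Anything else outside the alphabet cannot be repaired automatically.
  if (/[^A-Za-z0-9+/=]/.test(s)) {
    issues.push("invalid characters");
  }
  // Padded Base64 always has a length that is a multiple of 4.
  const remainder = s.length % 4;
  if (remainder === 1) {
    issues.push("truncated input");
  } else if (remainder !== 0) {
    issues.push("missing padding");
    s += "=".repeat(4 - remainder);
  }
  return { issues, normalized: s };
}
```

When `issues` contains only repairable problems, `normalized` can be passed to `atob`; when it lists invalid characters or truncated input, the string needs to be fixed at its source.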
Use Cases
Technical Principle
Base64 is one of several binary-to-text encodings specified in RFC 4648 (S. Josefsson, October 2006), along with Base16 (hex) and Base32. The encoding traces back to Privacy-Enhanced Mail (RFC 989, 1987), which wrapped binary key and message material inside a printable ASCII envelope so it could survive 7-bit transports. It was later adopted by MIME (RFC 2045), data: URIs (RFC 2397), SSH public key blobs (RFC 4253 §6.6), and, in its URL-safe form, JWT (RFC 7519). Git does not store data as Base64: packfiles use delta encoding with zlib compression, Git object IDs are 40-character hex SHA-1 strings, and Git LFS pointer files record their SHA-256 object IDs in hex as well.
The cost: every 3 input bytes become 4 output characters, so n input bytes produce ceil(n / 3) × 4 padded output characters, roughly 4/3 the size (about 33.3% overhead). For a 10 MB binary blob the encoded form is ~13.3 MB.
The mechanism: split the input into 3-byte (24-bit) groups; each group is broken into four 6-bit values; each 6-bit value selects one character from a 64-character alphabet. The canonical alphabet is A-Z (indices 0-25), a-z (26-51), 0-9 (52-61), '+' (62), '/' (63), with '=' as the pad character. A classic example: 'Man' (0x4d 0x61 0x6e) packs to the 24-bit value 0x4d616e; split into 6-bit chunks gives 0x13 0x16 0x05 0x2e (19, 22, 5, 46), mapped to 'TWFu'. When the input length is not a multiple of 3, the trailing group is zero-padded on the right: 1 byte left → 2 significant 6-bit chunks + '==' (2 pad chars); 2 bytes left → 3 significant chunks + '=' (1 pad). The pad chars carry no data, but they keep the encoded length a multiple of 4 and let decoders detect some truncated input.
In the browser, Base64 has two notorious pitfalls. First, `btoa` and `atob` operate on "binary strings" in which every code unit must be in the range U+0000..U+00FF, not on bytes or UTF-8. Passing a string containing U+4E2D (中) throws InvalidCharacterError, while U+00E9 (é) does not throw but is encoded as the single Latin-1 byte 0xE9 rather than its UTF-8 form 0xC3 0xA9.
The page works around this by converting the string with `new TextEncoder().encode(str)` (always UTF-8) before calling `btoa`, and with `new TextDecoder().decode(bytes)` after `atob`. UTF-8 multi-byte characters expand: '你' is 3 bytes (0xe4 0xbd 0xa0) → 4 Base64 chars (8 Base64 chars for '你好').
Second, Base64URL (RFC 4648 §5) replaces '+' and '/' with '-' and '_', and its padding may be omitted. In URLs, '+' can be decoded as a space, '/' is the path separator, and '=' separates keys from values in query strings, so the standard alphabet would need percent-encoding. JWT (RFC 7519) and JWS (RFC 7515) require Base64URL without padding for exactly this reason.
Base64 is encoding, not encryption: the encoded form has zero secrecy, because decoding needs no key and any observer can reverse it trivially. Mistaking Base64 for a security mechanism is a recurring vulnerability pattern; HTTP Basic Auth credentials, for example, are only Base64-encoded and are readable by anyone who sees the header. Note that CVE-2004-2761 (MD5 collisions in X.509 certificate signatures) and CVE-2005-4900 (SHA-1 collision weakness) are hash-function weaknesses, not Base64 issues. The pattern that recurs is simpler: a system treats the encoding as if it adds a layer of secrecy it does not, and the result is exploitable. For real secrets use AES-GCM (NIST SP 800-38D; RFC 5288 covers its TLS cipher suites) or ChaCha20-Poly1305 (RFC 8439).
For compression followed by Base64 (for example `gzip -c file | base64`), note that the encoded form is roughly 1.33× the gzipped size (about 1.35-1.37× once 76-column line breaks are added). Base64 also adds no integrity check: a corrupted character can still decode to different bytes without any error. If integrity matters, compute HMAC-SHA256 over the bytes before encoding.
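The 3-byte to 4-character mechanism and the padding rule described above can be written out by hand. The following is a minimal sketch for illustration only; `ALPHABET` and `encodeBytes` are names chosen here, and real code would normally rely on the platform's built-in encoder.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes a Uint8Array by hand: 3 bytes -> one 24-bit group -> four 6-bit indices.
function encodeBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = Math.min(3, bytes.length - i);
    // Missing trailing bytes are treated as zero bits (right zero-padding).
    const group =
      (bytes[i] << 16) | ((bytes[i + 1] ?? 0) << 8) | (bytes[i + 2] ?? 0);
    out += ALPHABET[(group >> 18) & 63];
    out += ALPHABET[(group >> 12) & 63];
    // With 1 byte left only two chunks are significant; with 2 bytes, three.
    out += remaining > 1 ? ALPHABET[(group >> 6) & 63] : "=";
    out += remaining > 2 ? ALPHABET[group & 63] : "=";
  }
  return out;
}

console.log(encodeBytes(new Uint8Array([0x4d, 0x61, 0x6e]))); // TWFu
```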
- RFC 4648 (October 2006) defines Base64, Base32, and Base16. The standard Base64 alphabet is A-Z, a-z, 0-9, +, /, with '=' as the pad character. The MIME variant (RFC 2045) limits encoded lines to 76 characters for transport; the URL-safe variant (Base64URL, RFC 4648 §5) replaces + and / with - and _, and JWT (RFC 7519), JWS (RFC 7515), and JWK (RFC 7517) use it with padding omitted.
- Mechanism: 3 input bytes (24 bits) → 4 output characters (6 bits each). Overhead is about 33.3%: every 1 MB of binary input becomes about 1.33 MB of Base64. The effective ratio can be worse when '+', '/', or '=' in the output must be escaped by a surrounding protocol, such as percent-encoding in URLs.
- Padding rule: input length mod 3 = 0 → no padding; mod 3 = 1 → '==' (two pad chars, one byte encoded); mod 3 = 2 → '=' (one pad char, two bytes encoded). '=' carries no data; it signals how many bytes the final group holds and keeps the output length a multiple of 4.
- UTF-8 + btoa pitfall: `btoa('中')` throws InvalidCharacterError because btoa accepts only code units in U+0000..U+00FF, and `btoa('é')` does not throw but encodes the Latin-1 byte 0xE9 ('6Q==') instead of the UTF-8 bytes 0xC3 0xA9 ('w6k='). The page works around this by encoding through `TextEncoder` (UTF-8) before btoa, and decoding through `TextDecoder` after atob.
- Base64URL is required for JWT, JWS, and JWK (RFC 7519/7515/7517). It uses '-' and '_' instead of '+' and '/' (in URLs '+' can be read as a space and '/' is the path separator) and drops '=' padding (which would otherwise need percent-encoding). Passing a JWT segment to a strict standard Base64 decoder can fail or return garbage when the segment contains '-' or '_' or lacks padding; the page does not auto-detect, so pick the right variant for the payload.
- Performance: encoding is roughly 400-700 MB/s in V8 on a modern laptop (a tight loop doing table lookups and bit shifts). Decoding is similar speed. For large blobs (10+ MB) the bottleneck is allocation, not compute — the output buffer is 33% larger and `TextEncoder/TextDecoder` makes a copy.
- Base64 is encoding, not encryption: anyone with the string can read it, and many CTF writeups use this misconception as the first stepping stone. CVE-2004-2761 (MD5 collisions in X.509 certificate signatures) is a hash-function weakness and is unrelated to Base64. For secrets use AES-GCM (NIST SP 800-38D; RFC 5288 covers its TLS cipher suites) or ChaCha20-Poly1305 (RFC 8439). For data URIs, watch the encoded size: a 10 MB image becomes a roughly 13.3 MB URL, which can exceed URL-length limits in some browsers and tools and counts against email size limits.
- Migration note: Base16 (hex) is preferred in low-level protocols and hash output because each byte maps to exactly 2 chars and the length is predictable (no padding math). Base32 is preferred for human transcription (no lookalike characters). Base64 is the universal default for binary transport in text protocols but is gradually being replaced by raw bytes over HTTP/2 and WebTransport where framing allows it.
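The TextEncoder/TextDecoder workaround mentioned in the list above can be sketched as follows. The function names `utf8ToBase64` and `base64ToUtf8` are illustrative, and this is one possible way to bridge bytes and `btoa`, not necessarily the code this page runs.

```javascript
// Encode a JavaScript string as Base64 of its UTF-8 bytes.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  // btoa expects a "binary string": one character per byte, each in 0..255.
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// Decode Base64 back to a string, interpreting the decoded bytes as UTF-8.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

console.log(utf8ToBase64("你好")); // 5L2g5aW9
console.log(base64ToUtf8("5L2g5aW95LiW55WM")); // 你好世界
```

Building the binary string one character at a time keeps the sketch simple; for multi-megabyte inputs a chunked approach would likely allocate less.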
Examples
Encode ASCII text
Input: Hello
Output: SGVsbG8= (5 bytes -> 8 chars, 1 pad char)
Input: Hello, World!
Output: SGVsbG8sIFdvcmxkIQ== (13 bytes -> 20 chars, 2 pad chars)
Input: Man
Output: TWFu (3 bytes -> 4 chars, no padding)
The 'Man' example is a widely used illustration: bytes 0x4D 0x61 0x6E
pack into the 24-bit value 0x4D616E, split into 6-bit chunks 0x13 0x16
0x05 0x2E (19, 22, 5, 46), and map to T W F u via the standard alphabet.
Encode UTF-8 text (Chinese)
Input: 你好 (2 CJK characters, 6 UTF-8 bytes)
Output: 5L2g5aW9 (8 chars)
Input: 你好世界 (4 CJK characters, 12 UTF-8 bytes)
Output: 5L2g5aW95LiW55WM (16 chars)
Each of these CJK characters is 3 UTF-8 bytes (E4 BD A0 etc.), and every
3 bytes become 4 Base64 characters, so the output is 4 Base64 characters
per CJK character. The page pipes the text through
new TextEncoder().encode(str) first to avoid the InvalidCharacterError
that btoa throws on characters above U+00FF.
Decode and round-trip a JWT segment
Encoded: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
Decoded: {"alg":"HS256","typ":"JWT"}
Round-trip:
encode('{"alg":"HS256","typ":"JWT"}') -> eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
decode('eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9') -> {"alg":"HS256","typ":"JWT"}
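JWT segments use Base64URL, so a decoder for them must accept '-' / '_'
in place of the standard '+' / '/' and tolerate missing padding. This
header is 36 characters and contains no '-' or '_', so it decodes either
way, but a segment that does contain them fails or returns garbage in a
strict standard Base64 decoder. Pick the right variant for the payload.
Base64 vs Base64URL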
Input: Hello><h2>
Standard: SGVsbG8+PGgyPg== (+ / = padding; safe inside JSON / email)
URL-safe: SGVsbG8-PGgyPg (- _ no padding; safe inside URL paths/queries)
Differences:
'+' (62) -> '-' and '/' (63) -> '_'
'=' padding stripped entirely in URL-safe form
Use Base64URL for JWT, JWS, JWK, and any token that travels in a
URL query string, because '+' can be decoded as a space, '/' is the
path separator, and '=' separates keys from values in a query.
FAQ
Is Base64 encryption?
No. Base64 is an encoding, not encryption. It only converts arbitrary bytes into 64 printable ASCII characters (A-Z, a-z, 0-9, +, /) so they survive systems that mangle binary data. Anyone with the encoded string can decode it back instantly - there is no secret involved.
Why does my encoded output end with one or two = signs?
Base64 emits 4 output characters per 3 input bytes. When the input length is not a multiple of 3, the encoder pads with = so the result length stays a multiple of 4. One leftover input byte → two ='s; two leftover bytes → one =; aligned input → none. Some implementations omit padding, which is legal but not interoperable everywhere.
What is URL-safe Base64?
Standard Base64 includes / and + which have special meaning in URLs and filenames. URL-safe Base64 (RFC 4648 §5) replaces them with _ and - and often drops the = padding. Use it for JWT tokens, URL parameters, and filenames; use standard Base64 everywhere else.
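Converting between the two alphabets is a pair of character substitutions plus padding handling. A minimal sketch, assuming the input is already valid in its own variant; `toBase64Url` and `fromBase64Url` are hypothetical helper names.

```javascript
// Standard Base64 -> Base64URL: swap the two special characters, drop padding.
function toBase64Url(b64) {
  return b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Base64URL -> standard Base64: swap back and restore '=' padding.
// Assumes valid input; a length of 1 mod 4 is never valid Base64.
function fromBase64Url(b64url) {
  const b64 = b64url.replace(/-/g, "+").replace(/_/g, "/");
  return b64 + "=".repeat((4 - (b64.length % 4)) % 4);
}

console.log(toBase64Url("SGVsbG8+PGgyPg==")); // SGVsbG8-PGgyPg
console.log(fromBase64Url("SGVsbG8-PGgyPg")); // SGVsbG8+PGgyPg==

// Decoding a JWT header segment with the standard decoder after conversion:
const header = JSON.parse(atob(fromBase64Url("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9")));
console.log(header.alg); // HS256
```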
Why is the Base64 string ~33% longer than the original?
Each 6 bits of input becomes one 8-bit output character, so encoded size = ceil(input_length / 3) * 4. That is roughly 4/3 of the input (33% overhead). This is the cost of representing arbitrary bytes in printable ASCII.
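The size formula and the padding rule are easy to check numerically. A small sketch, with `encodedLength` and `padCount` as illustrative names for padded standard Base64 without line breaks:

```javascript
// Number of Base64 characters (including '=' padding) for a given byte count.
function encodedLength(byteCount) {
  return Math.ceil(byteCount / 3) * 4;
}

// Number of '=' pad characters: 0 for aligned input, 2 for one leftover byte,
// 1 for two leftover bytes.
function padCount(byteCount) {
  return (3 - (byteCount % 3)) % 3;
}

console.log(encodedLength(5), padCount(5));   // 8 1   ("Hello")
console.log(encodedLength(13), padCount(13)); // 20 2  ("Hello, World!")
console.log(encodedLength(3), padCount(3));   // 4 0   ("Man")
```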
What input formats can I paste here?
For encoding, paste plain text (UTF-8 encoded under the hood) or upload a file. For decoding, paste a Base64 string with or without whitespace - the decoder strips line breaks automatically. If decoding fails, check for stray characters or a missing padding =.
Can Base64 carry binary file content?
Yes. That is its main use case - inline images in HTML/CSS (data: URLs), email attachments (MIME), and credentials in HTTP headers (Basic Auth) all use Base64 to put binary content into text-only channels. Be aware the resulting payload is 33% larger than the raw file.
Should I use Base64 to hide sensitive data?
Never. Base64 is fully reversible without a key - treating it as obfuscation is a common mistake that has leaked passwords and tokens in many real incidents. Use proper encryption or a secrets manager for anything sensitive.