HTML Entity Encoder

Encode and decode HTML entities online to display special characters safely and help prevent XSS attacks


What is HTML Entity Encoding?

HTML entity encoding converts special characters into HTML character references. Some characters have special meaning in HTML (such as <, >, and &), so to display them literally on a page you must write them as entities. There are two forms: named entities (like &lt;) and numeric entities (like &#60;). Named entities are easier to read, while numeric entities can represent any Unicode character.

Encoding matters whenever text is inserted into HTML and must not be interpreted as markup. Unescaped <, >, &, double quotes, and apostrophes can open tags, break out of attribute values, or start unintended entities. The tool is useful for code examples, templates, CMS content, and debugging XSS-related issues. Context still matters: HTML body text, attributes, URLs, JavaScript, and CSS each require different escaping rules, so encoded output must be used only in the context it was encoded for.
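A minimal TypeScript sketch of the five-character escape follows. It assumes a single regex pass over a lookup table. The names `escapeHtml` and `HTML_ESCAPES` are illustrative, not the tool's actual source. The apostrophe uses the numeric form &#39;, which both HTML 4 and HTML5 parsers understand.

```typescript
// Illustrative lookup table for the five markup-significant characters.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;", // numeric form: valid in HTML 4 as well as HTML5
};

// Hypothetical helper: one regex pass, each match looked up independently,
// so an & produced by one replacement is never escaped a second time.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml(`<a title="Tom & Jerry's">`));
// &lt;a title=&quot;Tom &amp; Jerry&#39;s&quot;&gt;
```

The escaped string contains no raw angle brackets or quotes, so it can go in element content or inside a quoted attribute value. It is not safe for JavaScript, CSS, or URL contexts.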

How to Use


Enter or paste text to convert in the left input box
Click a conversion button to choose the encoding or decoding method
The result appears automatically on the right
Click the "Copy" button to copy the result to clipboard

Conversion Methods

HTML Entity Encode: Convert < > & " ' to named entities; suitable for XSS prevention in HTML body and attribute contexts

HTML Entity Decode: Restore named entities to their original characters

Numeric Entity Encode: Convert special characters to numeric references (like &#60;)

Full Encode: Encode all non-ASCII characters as numeric references; useful for legacy or ASCII-only systems

Full Decode: Restore all forms of HTML entities, named and numeric
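Full Encode can be sketched as follows. The sketch assumes the mode escapes the five HTML characters and converts every code point above U+007F to a decimal reference. The names `fullEncode` and `naiveEncode` are hypothetical, and the tool may emit hexadecimal references instead. The comparison shows why the loop walks code points with codePointAt rather than UTF-16 code units with charCodeAt. A for...of loop over the string also iterates by code point when compiled for ES2015 or later.

```typescript
const BASIC = new Map<string, string>([
  ["&", "&amp;"], ["<", "&lt;"], [">", "&gt;"], ['"', "&quot;"], ["'", "&#39;"],
]);

// Hypothetical full encoder: walks the string one code point at a time.
function fullEncode(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; ) {
    const cp = text.codePointAt(i)!;
    // Astral characters (U+10000 and above) occupy two UTF-16 code units.
    i += cp > 0xffff ? 2 : 1;
    const ch = String.fromCodePoint(cp);
    const named = BASIC.get(ch);
    if (named !== undefined) {
      out += named;
    } else if (cp > 0x7f) {
      out += `&#${cp};`; // non-ASCII: decimal numeric reference
    } else {
      out += ch;
    }
  }
  return out;
}

// Naive encoder for comparison: charCodeAt returns UTF-16 code units,
// so an emoji becomes two references to (invalid) surrogate code points.
function naiveEncode(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out += unit > 0x7f ? `&#${unit};` : text[i];
  }
  return out;
}

console.log(fullEncode("中文 <b>")); // &#20013;&#25991; &lt;b&gt;
console.log(fullEncode("😀"));       // &#128512;
console.log(naiveEncode("😀"));      // &#55357;&#56832;
```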

Keyboard Shortcuts

Ctrl + E: HTML Entity Encode
Ctrl + D: HTML Entity Decode

Encoding Tips

Encode user-visible text before inserting it into HTML source, especially when the text may contain angle brackets, quotes, or ampersands.
HTML entity encoding helps prevent markup from being interpreted, but it is only one part of XSS defense and does not replace contextual output escaping.

Use Cases

Escape unsafe characters before putting text into HTML

Encode ampersands, angle brackets, quotes, and apostrophes as named entities so copied user text can appear in markup examples without becoming tags or attributes. The original string is never transmitted. Each substitution runs locally against the value you typed, so internal documentation, unreleased copy, or private snippets can be escaped without being uploaded to a remote service.

Decode entities found in copied page source

Use HTML Entity Decode for named references such as &lt;, or Full Decode for numeric forms such as &#60;, to turn encoded snippets back into readable text during debugging or content cleanup. Decoding uses the browser's own HTML parser on a detached node. The input never enters a network request, and the decoded string stays in the browser tab where it originated.

Choose between minimal and full encoding

Use entity or numeric encoding when only the critical HTML characters need escaping. Use Full Encode when non-ASCII characters also need numeric character references for a legacy system. Pick the mode that matches the entity set the destination expects. For example, use named entities for human-readable HTML, numeric entities for older CMS templates, and full encoding for ASCII-only transport channels.

Encode ampersands and angle brackets for safe CMS paste

Run a code snippet or template string through encode mode before pasting it into a rich-text editor, email template, or static site field that re-parses input as HTML. This turns &, <, >, and quotes into &amp;, &lt;, &gt;, and &quot;. Because encoding is a local regex pass, the original snippet stays on the page until you explicitly copy the output, which helps when the source contains confidential examples.

Decode JSON inside HTML attributes during inspection

Use Full Decode to turn &#123;, &#x7B;, &quot;, and similar sequences back into literal characters. This helps when reading encoded JSON, webhook payloads, or attribute strings pulled from saved page source. The decoded JSON stays in the output area on the right and nothing is sent to a parser endpoint, so payload fragments from production logs can be inspected without leaving the browser.
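Outside a browser there is no DOM parser to rely on. The sketch below decodes a limited subset by hand to show what the tokenizer does with decimal and hexadecimal references and the five basic names. The name `decodeEntities` is hypothetical. Unlike the browser parser, it ignores all other named references, semicolon-less legacy forms, and the spec's Windows-1252 remapping of &#128; through &#159;.

```typescript
// The five basic names only; the WHATWG table has about 2,231 entries.
const NAMED: Record<string, string> = {
  amp: "&", lt: "<", gt: ">", quot: '"', apos: "'",
};

// Hypothetical partial decoder. One regex pass, so "&amp;lt;" decodes
// one level to "&lt;" and is not decoded again.
function decodeEntities(text: string): string {
  return text.replace(
    /&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));/g,
    (match: string, dec?: string, hex?: string, name?: string): string => {
      if (name !== undefined) {
        // Unknown names are left as-is rather than guessed.
        return Object.prototype.hasOwnProperty.call(NAMED, name) ? NAMED[name] : match;
      }
      const cp = dec !== undefined ? parseInt(dec, 10) : parseInt(hex as string, 16);
      // NULL, surrogates, and values past U+10FFFF decode to U+FFFD, as in the HTML spec.
      if (cp === 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
        return "\uFFFD";
      }
      return String.fromCodePoint(cp);
    },
  );
}

console.log(decodeEntities("&lt;p&gt; &#60;&#x3C; &#128512; &copy;"));
// <p> << 😀 &copy;
console.log(decodeEntities("&amp;lt;")); // &lt;
```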

Technical Principle

HTML uses two kinds of character references, both defined by the WHATWG HTML Living Standard. Named character references begin with & and end with ;. They are drawn from the entities.json table maintained by WHATWG, which has about 2,231 entries. That count includes legacy aliases without a trailing semicolon, such as &amp written without the ;. Numeric character references give a Unicode code point in decimal (&#60;) or hexadecimal (&#x3C;) form. They can encode any code point up to U+10FFFF except U+0000 and the surrogate range U+D800-U+DFFF.

Five characters must be escaped to keep HTML syntactically safe: & (&amp;), < (&lt;), > (&gt;), " (&quot;), and ' (&apos;). Note that &apos; is defined in XML and HTML5 but not in HTML 4.01, so OWASP recommends the numeric form &#x27; when output may be read by legacy parsers.

When encoding is implemented as a chain of replacements, order matters. The & must be escaped first; otherwise the & prefixes inserted for < and > are themselves re-escaped and &lt; becomes &amp;lt;. A single pass that looks each character up in a table avoids the problem.

Decoding leverages the browser's HTML parser. The input is assigned to a detached element's innerHTML and read back through textContent. This runs the tokenizer state machine from the HTML spec (sections 13.2.5.72 Character reference state through 13.2.5.80). The tokenizer resolves named, decimal, and hexadecimal forms, including malformed input such as missing semicolons. A plain element created from the live document can still load images and fire handlers such as <img onerror> when innerHTML is set. A detached <textarea> or a document created with DOMParser is therefore the safer parsing target.

In full-encode mode, numeric encoding walks the string code point by code point with String.prototype.codePointAt. This handles astral characters, which occupy a UTF-16 surrogate pair: emoji U+1F600 becomes &#x1F600; (decimal &#128512;), not two surrogate references.

XSS prevention requires context-aware escaping, not just HTML-entity encoding. The OWASP Cross-Site Scripting Prevention Cheat Sheet distinguishes five contexts: HTML body, HTML attribute (quoted vs unquoted), JavaScript data (inside <script>), CSS, and URL. HTML-entity escaping covers only the first two.

JavaScript contexts need \xHH or \uHHHH escapes. JSON.stringify alone is not enough inside a <script> block, because it leaves < unescaped and a </script> sequence in the data still closes the element. URL parameter values need encodeURIComponent (RFC 3986 percent-encoding). Inline event handlers compound the rules because their values pass through both the HTML and JavaScript parsers.

A Content-Security-Policy header with script-src 'self' and without 'unsafe-inline' is the modern defense-in-depth layer that catches escaping mistakes. DOM sinks such as innerHTML, document.write, and setAttribute('on*', ...) should be replaced with textContent or with framework-managed bindings that escape by default (React's JSX, Vue's mustache syntax).
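To make the ordering point concrete, here are two hypothetical chained-replace encoders that differ only in where the & replacement sits. They cover only &, <, and > for brevity.

```typescript
// WRONG order: the & inside the freshly inserted "&lt;" is escaped again.
function escapeAmpLast(text: string): string {
  return text.replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/&/g, "&amp;");
}

// Correct order: & is handled before any new & characters are introduced.
function escapeAmpFirst(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

console.log(escapeAmpLast("<b>"));  // &amp;lt;b&amp;gt;
console.log(escapeAmpFirst("<b>")); // &lt;b&gt;
```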

Named references: about 2,231 entries in WHATWG entities.json; the five must-escape names are &amp; &lt; &gt; &quot; &apos; (&apos; is HTML5/XML-only, not HTML 4.01)
Numeric references: decimal &#DDDDD; and hexadecimal &#xHHHH; address code points up to U+10FFFF; U+0000 NULL, surrogates U+D800-U+DFFF, and values above U+10FFFF are parse errors that decode to U+FFFD
Escape order: with chained replacements, & must be replaced first, otherwise the & prefix inserted by later escapes is double-encoded; a single pass over a 5-entry lookup table is O(n) and has no ordering issue
Decoding via the browser parser: DOMParser or a detached element's innerHTML invokes the HTML spec tokenizer (Character reference state, sections 13.2.5.72-80), which handles legacy entities without trailing semicolons
Astral character handling: use String.prototype.codePointAt or for...of iteration so emoji and other characters at U+10000 and above (such as CJK Extension B) produce a single &#NNNNN; rather than two surrogate references
Context-aware escaping (OWASP XSS Prevention Cheat Sheet): HTML body, HTML attribute, JavaScript, CSS, and URL each need different escaping; HTML entities alone do not stop XSS in JavaScript or URL sinks
Defense in depth: Content-Security-Policy script-src 'self', DOMPurify allowlist sanitization for rich-text input, and textContent/innerText instead of innerHTML in vanilla DOM code
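For the JavaScript context, one common approach is to serialize with JSON.stringify and then replace <, >, and & with \u escapes. The output can then sit inside an inline <script> block without a </script> sequence closing it early. This is sketched below with a hypothetical `jsonForScript` helper; the tool does not provide it. The \u2028 and \u2029 replacements only matter for engines older than ES2019, where those characters were not allowed in string literals.

```typescript
// Hypothetical helper: JSON for embedding inside <script>...</script>.
function jsonForScript(value: unknown): string {
  // JSON.stringify returns undefined for undefined or functions; fall back to null.
  const json: string = JSON.stringify(value) ?? "null";
  return json
    .replace(/</g, "\\u003c") // hides "</script>" and "<!--" from the HTML parser
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026")
    .replace(/\u2028/g, "\\u2028") // line separator, illegal in pre-ES2019 string literals
    .replace(/\u2029/g, "\\u2029"); // paragraph separator, same reason
}

const payload = { msg: "</script><script>alert(1)</script>" };
console.log(jsonForScript(payload));
// {"msg":"\u003c/script\u003e\u003cscript\u003ealert(1)\u003c/script\u003e"}
```

These characters only appear inside JSON strings, where \u escapes are valid. JSON.parse, or the JavaScript engine reading the script, restores the original text.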

Examples

Basic element encoding

Input:  <script>alert(1)</script>
Output: &lt;script&gt;alert(1)&lt;/script&gt;
Use:    prevent the browser from interpreting the text as a real tag when rendering user-supplied content

Attribute value encoding

Input:  <div title="Hello & world">
Output: &lt;div title=&quot;Hello &amp; world&quot;&gt;
Note:   quotes and the ampersand inside the attribute are entity-encoded so the value cannot break out of the quotes

URL display in page

Input:  search?q=hello&lang=en
Output: search?q=hello&amp;lang=en
Use:    encode the & before inserting the URL into HTML; otherwise the parser may read the text after it as a character reference
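When the URL is generated rather than pasted, two layers apply in sequence. First, percent-encode each parameter for the URL. Then entity-encode the finished URL for the HTML attribute. A sketch with hypothetical helpers `buildSearchUrl` and `escapeAttr`:

```typescript
// Hypothetical: percent-encode each key and value (URL layer).
function buildSearchUrl(params: Record<string, string>): string {
  const query = Object.keys(params)
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(params[k])}`)
    .join("&");
  return `search?${query}`;
}

// Hypothetical: entity-encode for a quoted attribute value (HTML layer).
// & goes first so later replacements are not re-escaped.
function escapeAttr(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const url = buildSearchUrl({ q: "hello world", lang: "en" });
console.log(url); // search?q=hello%20world&lang=en
console.log(`<a href="${escapeAttr(url)}">Search</a>`);
// <a href="search?q=hello%20world&amp;lang=en">Search</a>
```

Applying the layers in the other order would percent-encode the entity itself, so a value a&b would end up as a%26amp%3Bb: a double-encoding bug.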

Non-ASCII characters (full encode)

Input:  中文
Output: &#20013;&#25991; (decimal numeric references; hex form: &#x4E2D;&#x6587;)
Use:    safe embedding of arbitrary Unicode into legacy HTML; modern pages usually serve UTF-8 instead

FAQ

Which characters does HTML encoding convert?

The five markup-significant characters: & → &amp;, < → &lt;, > → &gt;, " → &quot;, ' → &#39; (or &apos;). Optionally, non-ASCII characters can be converted to numeric references (&#xNN;) for legacy systems that don't handle UTF-8.

When do I need HTML encoding?

Any time user-supplied text is inserted into HTML. Missing or incorrect encoding is the most common root cause of XSS vulnerabilities. Escape user content for the context it lands in. HTML body, attribute values, JavaScript, CSS, and URLs each have different rules, and HTML entity encoding covers only the first two.

What's the difference between &apos; and &#39;?

Both produce a single quote. &apos; was added in HTML5 but is not defined in HTML 4 and is not recognized by some older email clients. If your output is read by old systems, use &#39;. The page emits &#39; by default for maximum compatibility.

Why does my output contain &amp;amp;?

If the input already contains an entity like &amp;, encoding it produces &amp;amp;. This is correct: the encoder treats the & in the input as a literal character, not as the start of an entity. Decode first if your source is already entity-encoded.

Are emojis converted?

Emojis are valid Unicode, and modern HTML handles them as ordinary characters. No encoding is needed unless your target system requires ASCII-only text. In that case, use Full Encode to convert them to numeric references such as &#128512; (decimal) or &#x1F600; (hex).

Is encoding the same as URL encoding?

No. URL encoding (percent-encoding) replaces unsafe characters with %NN sequences for use in URLs. HTML encoding replaces them with named or numeric entities for use in HTML. Use the right tool for the right context - mixing them creates double-encoding bugs.

Is the conversion done locally?

Yes. Encoding and decoding happen in your browser. Pasted text is not uploaded.
