XML Formatting Tool
What is XML Formatting?
The XML Formatter turns compact or messy XML into an indented, readable structure. XML appears in configuration files, SOAP messages, sitemaps, RSS feeds, office documents, build artifacts, and many older enterprise integrations, where one missing closing tag or incorrectly escaped character can break processing. The tool helps reveal nested elements, attributes, text nodes, namespaces, and likely error locations more quickly than reading a single compressed line. Common uses include debugging, documentation, comparison, and teaching, but schema validation is still required when a specific XSD, API contract, or partner format must be followed. Formatting improves readability; it does not automatically make invalid XML correct.
How to Use
- Paste or enter XML data in the left input box
- Select indent size (2 spaces, 4 spaces, or Tab)
- Click Format to beautify or Minify to remove whitespace
- Results display on the right with syntax highlighting
- Click Copy or Download to save the result
XML Notes
- Formatting changes whitespace, not document meaning, but whitespace can still matter inside text nodes and mixed-content documents.
- If parsing fails, check unclosed tags, mismatched nesting, duplicate attributes, and unescaped characters such as ampersands.
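As a rough illustration of locating a well-formedness error, the sketch below uses Python's standard `xml.etree.ElementTree` instead of the browser parser this page relies on. The helper name `check_well_formed` is hypothetical, and the line and column values are whatever the underlying expat parser reports.

```python
import xml.etree.ElementTree as ET

def check_well_formed(src):
    """Return None if src parses, otherwise (line, column, message)."""
    try:
        ET.fromstring(src)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple supplied by the parser.
        line, column = err.position
        return line, column, str(err)
    return None

# An unescaped ampersand and a mismatched closing tag both fail to parse.
for sample in ("<a><b>ok</b></a>", "<a>Tom & Jerry</a>", "<a>\n<b>\n</a>"):
    print(repr(sample), "->", check_well_formed(sample))
```

The reported position points at where the parser gave up, which is often slightly after the actual mistake (for example, at the closing tag that exposes an earlier unclosed element).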
Use Cases
The empty-element form `<br/>` (the empty-element tag of XML 1.0 §3.1) is preserved as written, so HTML void elements that travel inside XHTML or RSS keep their slash; expanding one into a `<br></br>` pair instead can silently change the rendered page in HTML5 parsers. CDATA sections, including the literal `]]>` terminator rule, are also kept intact rather than being rewritten as entity-escaped text such as `&amp;`.
Technical Principle
XML formatting is grounded in the W3C XML 1.0 (Fifth Edition) recommendation. Parsing happens through `new DOMParser().parseFromString(src, 'application/xml')` (or `text/xml`), which returns a Document built from seven DOM node types: Element, Attr, Text, CDATASection, Comment, ProcessingInstruction and DocumentType. Unlike HTML parsers, XML parsers are strict: a mismatched tag, an unescaped `&`, or a duplicate attribute aborts parsing. DOMParser does not throw in that case; it returns a Document containing a `<parsererror>` element whose text describes the error, usually with a line and column (the exact wording is browser-specific). Server-side equivalents include libxml2 (`xmllint --format`), Python's `xml.etree.ElementTree`, Java's StAX/SAX and .NET's `XmlReader`.
The printer serializes the DOM node by node: each Element opens on its own line at depth*indent spaces, children recurse with depth+1, the closing tag aligns with the opening tag, and empty elements collapse to the self-closing shorthand `<foo/>`, which §3.1 of the spec treats as equivalent to `<foo></foo>`. The five predefined entities are re-encoded in text and attribute values: `&amp;` `&lt;` `&gt;` `&quot;` `&apos;` (`&` and `<` must always be escaped; the other three matter mainly inside attribute values and in the sequence `]]>`). CDATA sections (`<![CDATA[ ... ]]>`) are preserved verbatim because they are the explicit escape hatch for content that would otherwise need entity encoding; the terminator `]]>` cannot appear inside a CDATA block per §2.7, which the parser enforces. Processing instructions like `<?xml-stylesheet?>` and the XML declaration `<?xml version="1.0" encoding="UTF-8"?>` are kept in the document prolog. The DOCTYPE declaration is round-tripped as a single string.
Two subtleties drive the bulk of formatter complexity. First, mixed content (an element containing both text and child elements, like `<p>Hello <b>world</b>!</p>`) cannot be re-indented without altering the document infoset, because every whitespace character in such a context belongs to a significant Text node. Formatters detect mixed content by checking whether an Element has any non-whitespace Text child and switch to a single-line serialization for that subtree. Second, namespace declarations (`xmlns`, `xmlns:prefix`) must remain on the element where they are first declared; moving them changes scope. Attribute order is not significant per the spec, so some formatters apply a stable sort (namespace declarations first, then alphabetical, similar to Canonical XML), while others, including this tool, keep the source order. Parsing and serialization are O(n) over document length; very large feeds typically stream via SAX rather than building a DOM in memory.
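The printing rules described above can be sketched in a few lines. This is a minimal illustration in Python's `xml.dom.minidom`, not the tool's actual browser code; the helper names `open_tag`, `format_node` and `pretty` are hypothetical. It assumes that any element with a non-whitespace Text or CDATA child is serialized inline, that `xml:space` is ignored, and that the XML declaration is dropped because minidom does not expose it as a node.

```python
from xml.dom import minidom
from xml.dom.minidom import Node
from xml.sax.saxutils import quoteattr

TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)

def open_tag(elem, self_closing=False):
    """Rebuild the start tag, keeping attributes in source order."""
    attrs = "".join(f" {name}={quoteattr(value)}"
                    for name, value in elem.attributes.items())
    return f"<{elem.tagName}{attrs}{'/' if self_closing else ''}>"

def format_node(node, indent="  ", depth=0):
    """Return the list of output lines for one DOM node."""
    pad = indent * depth
    if node.nodeType != Node.ELEMENT_NODE:
        # Comments, processing instructions, DOCTYPE: one line each.
        return [pad + node.toxml()]
    if any(c.nodeType in TEXT_TYPES and c.data.strip() for c in node.childNodes):
        # Significant text present: serialize the whole subtree inline.
        return [pad + node.toxml()]
    # Only whitespace text remains, so it is layout and can be dropped.
    children = [c for c in node.childNodes if c.nodeType not in TEXT_TYPES]
    if not children:
        return [pad + open_tag(node, self_closing=True)]
    lines = [pad + open_tag(node)]
    for child in children:
        lines.extend(format_node(child, indent, depth + 1))
    lines.append(f"{pad}</{node.tagName}>")
    return lines

def pretty(src, indent="  "):
    doc = minidom.parseString(src)
    lines = []
    for child in doc.childNodes:
        lines.extend(format_node(child, indent))
    return "\n".join(lines)

print(pretty("<book><title>XML Guide</title><note>Hello <b>world</b>!</note></book>"))
```

Treating every text-bearing element as inline is the conservative choice: it never adds whitespace to character data, at the cost of leaving long mixed-content paragraphs on one line.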
- Spec: W3C XML 1.0 Fifth Edition (REC-xml-20081126); the DOM is built via `DOMParser().parseFromString(src, 'application/xml')`, equivalent to libxml2 or `xmllint --format` server-side.
- Strict by design: mismatched tags, unescaped `&`/`<`, duplicate attributes or invalid characters trigger a `<parsererror>` element; HTML's forgiving recovery does not apply.
- Five predefined entities re-encoded in text and attribute values: `&amp;` `&lt;` `&gt;` `&quot;` `&apos;`. Numeric character references like `&#160;` are preserved as-is.
- Self-closing `<foo/>` and `<foo></foo>` are semantically identical per §3.1; CDATA sections are preserved verbatim and may not contain the terminator `]]>`.
- Mixed content (text + child elements) cannot be re-indented without changing the infoset - the printer detects significant whitespace and serializes those subtrees inline.
- Namespace declarations (`xmlns`, `xmlns:prefix`) stay on the element of first declaration; moving them changes scope. Attribute order is not significant, so some formatters apply a stable sort (xmlns first, then alphabetical), while this tool keeps the source order.
- Complexity: O(n) parse and O(n) serialize for DOM-based tools; large documents stream via SAX/StAX (xml.sax in Python, javax.xml.stream in Java) to avoid loading the full tree into memory.
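To illustrate the streaming approach mentioned in the last point, here is a small sketch using Python's `xml.sax`. It prints only an indented outline of element names as events arrive and never builds a tree. The class `OutlineHandler` and function `outline` are hypothetical names for this example, and a real streaming formatter would also have to emit attributes, text and closing tags.

```python
import io
import xml.sax

class OutlineHandler(xml.sax.ContentHandler):
    """Write one indented line per element as start events arrive."""

    def __init__(self, out, indent="  "):
        super().__init__()
        self.out = out
        self.indent = indent
        self.depth = 0

    def startElement(self, name, attrs):
        self.out.write(f"{self.indent * self.depth}{name}\n")
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1

def outline(source, indent="  "):
    """source is a file name or a binary file-like object."""
    out = io.StringIO()
    xml.sax.parse(source, OutlineHandler(out, indent))
    return out.getvalue()

print(outline(io.BytesIO(b"<a><b><c/></b><d/></a>")))
```

Memory use depends on nesting depth and not on document size, since the handler keeps only a depth counter between events.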
Examples
Basic Elements
<!-- Input (one line) -->
<book><title>XML Guide</title><author>Jane Doe</author><year>2024</year></book>
<!-- Output (2 spaces) -->
<book>
  <title>XML Guide</title>
  <author>Jane Doe</author>
  <year>2024</year>
</book>
With Attributes
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="b001" lang="en" available="true">
    <title>Effective XML</title>
    <price currency="USD">29.99</price>
  </book>
</catalog>
Nested Structure with Namespaces and CDATA
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ns:GetUser xmlns:ns="https://example.com/api">
      <ns:UserId>10086</ns:UserId>
      <ns:Script>
        <![CDATA[ if (a < b && b > 0) { return true; } ]]>
      </ns:Script>
    </ns:GetUser>
  </soap:Body>
</soap:Envelope>
FAQ
What does it do?
Pretty-prints XML by indenting nested elements, putting each element on its own line, and aligning closing tags. Useful for inspecting SOAP responses, RSS feeds, configuration files, and other XML payloads that arrived as one giant line.
Does it validate against a schema?
No. It formats whatever well-formed XML you paste. Schema (XSD, DTD, RELAX NG) validation needs a separate tool. Well-formedness errors (mismatched tags, missing close brackets) are reported but the page won't fix them.
Will it preserve attribute order?
Yes. XML attribute order is technically not significant per the spec, but the formatter keeps the original order to avoid surprising you. CDATA sections, comments, and processing instructions are also preserved.
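One way to confirm that two documents differ only in attribute order is to compare their canonical forms. The sketch below assumes Python 3.8 or later, where `xml.etree.ElementTree.canonicalize` implements C14N 2.0; the helper `same_after_c14n` is a hypothetical name and is not part of this tool.

```python
import xml.etree.ElementTree as ET

def same_after_c14n(a, b):
    """True if both XML strings have the same canonical (C14N 2.0) form."""
    return ET.canonicalize(a) == ET.canonicalize(b)

# Same attributes in a different order, and <x/> versus <x></x>.
print(same_after_c14n('<book id="b001" lang="en"/>',
                      '<book lang="en" id="b001"></book>'))
# A changed attribute value is a real difference.
print(same_after_c14n('<book id="b001"/>', '<book id="b002"/>'))
```

Canonicalization keeps whitespace in text by default, so two documents that differ in indentation will still compare as different unless whitespace is normalized first.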
How is whitespace handled?
Whitespace between elements is normalized (one element per line with indentation). Whitespace inside text content is preserved by default, because character data is significant. xml:space='preserve' attributes are honored where present.
Can I minify XML?
Yes. Minify mode strips inter-element whitespace. Be careful: any whitespace that is significant (text content, xml:space='preserve' regions) should be preserved. Test the round-trip if your XML has rich text content.
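A conservative minifier can be sketched as follows, again in Python's `xml.dom.minidom` as an illustration and not as this page's implementation. The names `strip_layout_whitespace` and `minify` are hypothetical. The sketch assumes whitespace-only text is removable only when the parent has no other text and no `xml:space='preserve'` is in scope, and it serializes only the root element.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)

def strip_layout_whitespace(elem, preserve=False):
    """Remove whitespace-only text nodes that are pure layout."""
    space = elem.getAttribute("xml:space")
    if space in ("preserve", "default"):
        preserve = space == "preserve"
    children = list(elem.childNodes)
    # Mixed content: keep every text node, even whitespace-only ones.
    mixed = any(c.nodeType in TEXT_TYPES and c.data.strip() for c in children)
    for child in children:
        if child.nodeType == Node.ELEMENT_NODE:
            strip_layout_whitespace(child, preserve)
        elif (child.nodeType == Node.TEXT_NODE and not child.data.strip()
              and not preserve and not mixed):
            elem.removeChild(child)

def minify(src):
    doc = minidom.parseString(src)
    strip_layout_whitespace(doc.documentElement)
    return doc.documentElement.toxml()

print(minify("<a>\n  <b>x</b>\n  <c/>\n</a>"))
print(minify("<p>Hello <b>world</b> <i>again</i>!</p>"))
```

The single space between `</b>` and `<i>` in the second call survives because its parent contains real text, which is the case where careless minification changes rendered output.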
Is the XML uploaded?
No. Parsing and pretty-printing run in your browser using DOMParser and XMLSerializer (built into the browser). Nothing is transmitted.
What about XML namespaces?
Namespace declarations (xmlns) and prefixed elements/attributes are formatted exactly as written. Re-declaring the same namespace in nested elements is preserved (some validators care, others don't).