XML Formatting Tool

What is XML Formatting?

The XML Formatter turns compact or messy XML into an indented, readable structure. XML appears in configuration files, SOAP messages, sitemaps, RSS feeds, office documents, build artifacts, and many older enterprise integrations, where one missing closing tag or incorrectly escaped character can break processing. The tool helps reveal nested elements, attributes, text nodes, namespaces, and likely error locations more quickly than reading a single compressed line. Common uses include debugging, documentation, comparison, and teaching, but schema validation is still required when a specific XSD, API contract, or partner format must be followed. Formatting improves readability; it does not automatically make invalid XML correct.

How to Use

  1. Paste or enter XML data in the left input box
  2. Select indent size (2 spaces, 4 spaces, or Tab)
  3. Click Format to beautify or Minify to remove whitespace
  4. Results display on the right with syntax highlighting
  5. Click Copy or Download to save the result
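The Format step with a selectable indent can be approximated outside the browser. Below is a minimal sketch in Python, assuming Python 3.9+ for `xml.etree.ElementTree.indent`. `format_xml` and the `INDENTS` table are hypothetical names, not the page's own code. ElementTree discards comments and processing instructions by default and renames namespace prefixes to `ns0`, `ns1`, ..., so it only approximates a DOM-based formatter.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the page's indent choices to indentation strings.
INDENTS = {"2 spaces": "  ", "4 spaces": "    ", "Tab": "\t"}


def format_xml(src: str, indent: str = "2 spaces") -> str:
    """Parse src and re-indent it; raises ET.ParseError if src is not well-formed."""
    root = ET.fromstring(src)
    # ET.indent (Python 3.9+) only overwrites text/tails that are empty or whitespace-only.
    ET.indent(root, space=INDENTS[indent])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    compact = "<book><title>XML Guide</title><author>Jane Doe</author><year>2024</year></book>"
    print(format_xml(compact))          # 2-space indentation
    print(format_xml(compact, "Tab"))   # tab indentation
```

Because parsing happens before any re-indenting, malformed input fails with `ET.ParseError` rather than producing half-formatted output.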

XML Notes

  • Formatting should change only whitespace, not document meaning. However, whitespace is itself significant inside text nodes and mixed-content elements, so re-indenting those regions can alter the content.
  • If parsing fails, check unclosed tags, mismatched nesting, duplicate attributes, and unescaped characters such as ampersands.
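To see where a parser stops on these errors, here is a small Python sketch using the standard library's expat-based `xml.etree.ElementTree`, not the page's browser parser. `parse_error_position` is a hypothetical helper. `ParseError.position` holds a (line, column) pair. Lines count from 1, and the column is an offset within that line.

```python
import xml.etree.ElementTree as ET


def parse_error_position(src: str):
    """Return (line, column) of the first well-formedness error, or None if src parses.

    Lines count from 1; the column is expat's offset from the start of that line.
    """
    try:
        ET.fromstring(src)
    except ET.ParseError as err:
        return err.position
    return None


if __name__ == "__main__":
    print(parse_error_position("<note>Tom & Jerry</note>"))      # unescaped ampersand
    print(parse_error_position("<note>Tom &amp; Jerry</note>"))  # escaped, so it parses
    print(parse_error_position("<a>\n  <b>\n</a>"))               # <b> never closed
```

In the last case the error is reported on line 3, at the `</a>` that fails to match. The unclosed `<b>` itself sits one line above, which is why it pays to look just before the reported position.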

Use Cases

Reformat compact XML after checking that it parses

Paste a minified feed, SOAP message, sitemap, SVG fragment, or device response, and the browser's XML parser checks it for well-formedness before any output is produced. Well-formed XML is then expanded with 2 spaces, 4 spaces, or tabs so that nested elements are much easier to inspect.

Keep XML declarations, comments, CDATA, and self-closing tags readable

The formatter handles processing instructions, comments, CDATA blocks, declarations, closing tags, text nodes, and self-closing tags separately instead of treating the document as plain text. Configuration files and integration samples therefore keep their special XML sections visible.

Create either readable XML or a compact payload for handoff

Use format mode when reviewing hierarchy and minify mode when preparing a smaller payload for examples, test fixtures, or transport. Parse errors are shown with whatever line and column information is available. Schema validation and business-rule checks remain the responsibility of the target XML system.

Locate a missing tag by reading the parser error position

When the page reports a parse failure, note the line and column, then look just above that point for an unclosed parent or a stray closing tag. The formatter only changes whitespace, so a structural fix still has to be made by hand in the source before re-formatting.

Self-closing shorthand (the empty-element tag defined in XML 1.0 §3.1) is preserved as such, so HTML void elements that travel inside XHTML or RSS keep their slash. Expanding one into a start-tag/end-tag pair can change the rendered page in an HTML5 parser, which treats a stray `</br>` as another `<br>`.

CDATA sections are also kept intact rather than rewritten as entity-escaped text. A CDATA section cannot contain the `]]>` sequence, because the first `]]>` ends it.

Tidy an SVG fragment before embedding it in markup

Paste a one-line SVG, expand it with 2-space indentation, and verify that path data, `viewBox`, and namespace declarations are intact. The reformatted SVG can then be inlined into HTML without breaking the shape, and the visible hierarchy makes transform and gradient attributes easier to debug later.

Attribute order is not significant under the W3C XML 1.0 spec, so a tool is free to write `id` and `class` in a different position than the source; this formatter keeps the original order. Namespace declarations must stay on the root `svg` element, whether the default `xmlns="http://www.w3.org/2000/svg"` or a prefixed `xmlns:svg`. Otherwise an XML parser will not place the child elements in the SVG namespace.

The XML declaration (`<?xml version="1.0"?>`) is not needed for inline SVG in HTML; the HTML parser treats it as a bogus comment. In standalone XML documents it is optional under XML 1.0, though it is the usual way to declare an encoding other than UTF-8 or UTF-16.
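Namespace handling is where offline tools most often damage SVG. As one illustration, Python's `xml.etree.ElementTree` writes namespaced elements as `ns0:svg`, `ns0:path`, ... unless a prefix is registered with `ET.register_namespace`, and that output no longer embeds cleanly in HTML. This is a minimal sketch assuming Python 3.9+. `tidy_svg` is a hypothetical helper, and the registration is global to the process.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def tidy_svg(src: str, indent: str = "  ") -> str:
    """Re-indent an SVG fragment while keeping SVG as the default namespace."""
    # Without this registration ElementTree would emit ns0:svg, ns0:path, ...
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(src)
    ET.indent(root, space=indent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    one_line = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10"><path d="M0 0L10 10"/></svg>'
    print(tidy_svg(one_line))
```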

Technical Principle

XML formatting is grounded in the W3C XML 1.0 (Fifth Edition) recommendation. Parsing happens through `new DOMParser().parseFromString(src, 'application/xml')` (or `text/xml`). This returns a Document built from DOM node types such as Element, Attr, Text, CDATASection, Comment, ProcessingInstruction and DocumentType.

Unlike HTML parsers, XML parsers are strict. A mismatched tag, an unescaped `&`, or a duplicate attribute is a fatal well-formedness error. DOMParser then returns a document containing a `<parsererror>` element whose message usually includes the line and column; the exact format varies by browser. Server-side equivalents include libxml2 (`xmllint --format`), Python's `xml.etree.ElementTree`, Java's StAX/SAX and .NET's `XmlReader`.

The printer serializes the DOM node by node:
  • Each Element opens on its own line at depth*indent.
  • Children recurse with depth+1.
  • The closing tag aligns with the opening tag.
  • Empty elements collapse to the self-closing shorthand `<foo/>`, which §3.1 of the spec treats as equivalent to `<foo></foo>`.

On output, characters that would otherwise be read as markup are re-encoded with predefined entities: at minimum `&` and `<`, plus `"` inside double-quoted attribute values. XML predefines five entities in total: `&amp;` `&lt;` `&gt;` `&quot;` `&apos;`.

CDATA sections (`<![CDATA[ ... ]]>`) are preserved verbatim because they are the explicit escape hatch for content that would otherwise need entity encoding. Per §2.7, the sequence `]]>` cannot appear inside a CDATA section, since its first occurrence ends the section. Processing instructions such as `<?xml-stylesheet?>` stay in the document prolog. So does the XML declaration `<?xml version="1.0" encoding="UTF-8"?>`, which looks like a processing instruction but is a separate construct. The DOCTYPE declaration is round-tripped as a single string.

Two subtleties drive most of the formatter's complexity:
  • Mixed content: an element containing both text and child elements, like `<p>Hello <b>world</b>!</p>`, cannot be re-indented without altering the document infoset. Every whitespace character in such a context belongs to a significant Text node. Formatters detect mixed content by checking whether an Element has any non-whitespace Text child, and switch to single-line serialization for that subtree.
  • Namespace scope: namespace declarations (`xmlns`, `xmlns:prefix`) must remain on the element where they are first declared, because moving them changes their scope.

Attribute order is not significant per the spec. Many formatters simply keep the source order, while Canonical XML (C14N) writes namespace declarations first and sorts the remaining attributes. Parsing and serialization are O(n) in document length; very large feeds are typically streamed with SAX rather than built into an in-memory DOM.
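The node-by-node printer described above can be sketched in Python, with `xml.dom.minidom` standing in for the browser's DOMParser. This is a minimal illustration, not the tool's implementation, and `pretty_print` and its helpers are hypothetical names. It makes these assumptions:
  • Whitespace-only text between elements is treated as insignificant.
  • `xml:space`, the DOCTYPE and the XML declaration are not handled.
  • Whether minidom reports CDATA as separate nodes depends on its parser options, so the CDATA branch only writes sections back if they are present.

```python
from xml.dom import Node, minidom


def _esc_text(s: str) -> str:
    # Escape characters that would otherwise be read as markup in text content.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def _esc_attr(s: str) -> str:
    # Values are written inside double quotes, so '"' is escaped as well.
    return _esc_text(s).replace('"', "&quot;")


def _open_tag(el) -> str:
    # Attributes (including xmlns declarations) in the order minidom stored them.
    attrs = "".join(f' {k}="{_esc_attr(v)}"' for k, v in el.attributes.items())
    return f"<{el.tagName}{attrs}"


def _inline(node) -> str:
    """Serialize a subtree without adding whitespace (used for text and mixed content)."""
    t = node.nodeType
    if t == Node.TEXT_NODE:
        return _esc_text(node.data)
    if t == Node.CDATA_SECTION_NODE:
        return f"<![CDATA[{node.data}]]>"
    if t == Node.COMMENT_NODE:
        return f"<!--{node.data}-->"
    if t == Node.PROCESSING_INSTRUCTION_NODE:
        return f"<?{node.target} {node.data}?>" if node.data else f"<?{node.target}?>"
    if t == Node.ELEMENT_NODE:
        if not node.childNodes:
            return _open_tag(node) + "/>"
        inner = "".join(_inline(c) for c in node.childNodes)
        return f"{_open_tag(node)}>{inner}</{node.tagName}>"
    return ""  # other node types are out of scope for this sketch


def _pretty(node, depth: int, indent: str, out: list) -> None:
    pad = indent * depth
    if node.nodeType != Node.ELEMENT_NODE:
        out.append(pad + _inline(node))  # comment, CDATA or PI on its own line
        return
    # Assumption: whitespace-only text between elements is insignificant.
    kids = [c for c in node.childNodes
            if not (c.nodeType == Node.TEXT_NODE and not c.data.strip())]
    if not kids:
        out.append(pad + _open_tag(node) + "/>")          # empty element -> <foo/>
    elif any(c.nodeType == Node.TEXT_NODE for c in kids):
        out.append(pad + _inline(node))                    # text or mixed content: one line
    else:
        out.append(pad + _open_tag(node) + ">")
        for c in kids:
            _pretty(c, depth + 1, indent, out)             # children one level deeper
        out.append(f"{pad}</{node.tagName}>")             # closing tag aligned with opening


def pretty_print(src: str, indent: str = "  ") -> str:
    doc = minidom.parseString(src)  # raises xml.parsers.expat.ExpatError if not well-formed
    out: list = []
    for child in doc.childNodes:     # prolog comments/PIs, then the root element
        if child.nodeType != Node.DOCUMENT_TYPE_NODE:
            _pretty(child, 0, indent, out)
    return "\n".join(out)
```

For example, `pretty_print("<doc><p>Hello <b>world</b>!</p><br></br></doc>")` keeps the mixed-content paragraph on one line and collapses the empty `br` to `<br/>`.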

  • Spec: W3C XML 1.0 Fifth Edition (REC-xml-20081126). The DOM is built via `new DOMParser().parseFromString(src, 'application/xml')`, comparable to libxml2 or `xmllint --format` on the server side.
  • Strict by design: mismatched tags, unescaped `&`/`<`, duplicate attributes or invalid characters trigger a `<parsererror>` element; HTML's forgiving recovery does not apply.
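  • Characters that would break the markup are re-encoded with the predefined entities (`&amp;` `&lt;` `&gt;` `&quot;` `&apos;`) in text and attribute values. Numeric character references like `&#10;` are resolved during parsing, so a DOM-based formatter emits the referenced character (or a re-escaped form of it) rather than the original reference text.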
  • Self-closing `<foo/>` and `<foo></foo>` are semantically identical per §3.1; CDATA sections are preserved verbatim and may not contain the terminator `]]>`.
  • Mixed content (text + child elements) cannot be re-indented without changing the infoset - the printer detects significant whitespace and serializes those subtrees inline.
  • Namespace declarations (`xmlns`, `xmlns:prefix`) stay on the element where they are first declared; moving them changes scope. Attribute order is not significant, so tools differ: many keep the source order, while Canonical XML puts namespace declarations first and sorts the other attributes.
  • Complexity: O(n) parse and O(n) serialize for DOM-based tools; large documents stream via SAX/StAX (xml.sax in Python, javax.xml.stream in Java) to avoid loading the full tree into memory.
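The streaming approach in the last bullet can be sketched with Python's `xml.sax`. The handler only sees start and end events, so memory use does not grow with document size. `DepthStats` and `scan` are hypothetical names for this illustration. A streaming pre-check like this could, for instance, report element count and nesting depth before deciding whether to build a full DOM.

```python
import io
import xml.sax


class DepthStats(xml.sax.ContentHandler):
    """Track element count and maximum nesting depth as SAX events stream past."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.elements = 0

    def startElement(self, name, attrs):
        self.elements += 1
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):
        self.depth -= 1


def scan(source) -> DepthStats:
    """source may be a file name or a binary file object; no tree is built in memory."""
    handler = DepthStats()
    xml.sax.parse(source, handler)
    return handler


if __name__ == "__main__":
    feed = b"<rss><channel>" + b"<item><title>t</title></item>" * 3 + b"</channel></rss>"
    stats = scan(io.BytesIO(feed))
    print(stats.elements, stats.max_depth)
```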

Examples

Basic Elements

<!-- Input (one line) -->
<book><title>XML Guide</title><author>Jane Doe</author><year>2024</year></book>

<!-- Output (2 spaces) -->
<book>
  <title>XML Guide</title>
  <author>Jane Doe</author>
  <year>2024</year>
</book>

With Attributes

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="b001" lang="en" available="true">
    <title>Effective XML</title>
    <price currency="USD">29.99</price>
  </book>
</catalog>

Nested Structure with Namespaces and CDATA

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ns:GetUser xmlns:ns="https://example.com/api">
      <ns:UserId>10086</ns:UserId>
      <ns:Script>
        <![CDATA[ if (a < b && b > 0) { return true; } ]]>
      </ns:Script>
    </ns:GetUser>
  </soap:Body>
</soap:Envelope>
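Not every XML library keeps CDATA sections the way the browser's DOM does. In Python's `xml.etree.ElementTree`, for instance, the CDATA content in the example above is merged into ordinary element text. Re-serializing it then produces entity escapes (`&lt;`, `&amp;&amp;`) instead of a CDATA section, and the `ns` prefix comes out as `ns0`. The content is equivalent; only the markup form changes. A minimal check:

```python
import xml.etree.ElementTree as ET

SOAP = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ns:GetUser xmlns:ns="https://example.com/api">
      <ns:UserId>10086</ns:UserId>
      <ns:Script>
        <![CDATA[ if (a < b && b > 0) { return true; } ]]>
      </ns:Script>
    </ns:GetUser>
  </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(SOAP)
script = root.find(".//{https://example.com/api}Script")

# The CDATA content arrives as plain text, merged with the surrounding whitespace...
print(repr(script.text.strip()))
# ...and is re-escaped on output instead of being written back as a CDATA section.
print(ET.tostring(script, encoding="unicode"))
```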

FAQ

What does it do?

Pretty-prints XML by indenting nested elements, putting each element on its own line, and aligning closing tags. Useful for inspecting SOAP responses, RSS feeds, configuration files, and other XML payloads that arrived as one giant line.

Does it validate against a schema?

No. It formats whatever well-formed XML you paste. Schema (XSD, DTD, RELAX NG) validation needs a separate tool. Well-formedness errors (mismatched tags, missing close brackets) are reported but the page won't fix them.

Will it preserve attribute order?

Yes. XML attribute order is technically not significant per the spec, but the formatter keeps the original order to avoid surprising you. CDATA sections, comments, and processing instructions are also preserved.
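Keeping source order is also the behavior of common serializers outside the browser, though it is not guaranteed everywhere. As an illustration rather than the page's code, Python's `ElementTree` and `minidom` both write attributes in document order from Python 3.8 on; earlier versions sorted them.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

SRC = '<book id="b001" lang="en" available="true"/>'

# Python 3.8+ serializers write attributes in the order they appear in the source.
et_out = ET.tostring(ET.fromstring(SRC), encoding="unicode")
dom_out = minidom.parseString(SRC).documentElement.toxml()

print(et_out)   # ElementTree writes empty elements with a space before "/>"
print(dom_out)  # minidom writes "/>" directly after the last attribute
```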

How is whitespace handled?

Whitespace between elements is normalized (one element per line with indentation). Whitespace inside text content is preserved by default, because text content is significant. xml:space='preserve' attributes are honored where present.

Can I minify XML?

Yes. Minify mode strips whitespace between elements. Be careful: any whitespace that is significant (text content, xml:space='preserve' regions) should be preserved. Test the round trip if your XML has rich text content.
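A minify pass along these lines can be sketched with Python's `xml.dom.minidom`. It drops text nodes that contain only whitespace, but leaves everything under `xml:space="preserve"` untouched. `minify` is a hypothetical helper, not the page's code. Treating all other whitespace-only text as insignificant is an assumption that breaks mixed content such as `<p><b>a</b> <i>b</i></p>`, where the space between the two elements is real text.

```python
from xml.dom import Node, minidom


def _strip_insignificant(node, preserve: bool) -> None:
    """Remove whitespace-only text nodes, except inside xml:space="preserve" regions."""
    if node.nodeType == Node.ELEMENT_NODE:
        space = node.getAttribute("xml:space")
        if space == "preserve":
            preserve = True
        elif space == "default":
            preserve = False
    for child in list(node.childNodes):  # copy, because children are removed in the loop
        if child.nodeType == Node.TEXT_NODE:
            if not preserve and not child.data.strip():
                node.removeChild(child)
        else:
            _strip_insignificant(child, preserve)


def minify(src: str) -> str:
    doc = minidom.parseString(src)
    _strip_insignificant(doc.documentElement, preserve=False)
    # Serialize only the root element; Document.toxml() would add an XML declaration.
    return doc.documentElement.toxml()
```

For example, `minify("<a>\n  <b>x y</b>\n  <c/>\n</a>")` returns `<a><b>x y</b><c/></a>`; the space inside `x y` is kept because it is not whitespace-only.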

Is the XML uploaded?

No. Parsing and pretty-printing run in your browser using DOMParser and XMLSerializer (built into the browser). Nothing is transmitted.

What about XML namespaces?

Namespace declarations (xmlns) and prefixed elements/attributes are formatted exactly as written. Re-declaring the same namespace in nested elements is preserved (some validators care, others don't).
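A DOM-style parser outside the browser behaves the same way. The sketch below uses Python's `xml.dom.minidom`, not the page's code. It shows that prefixes and `xmlns` declarations survive a parse and re-serialize round trip unchanged, with each declaration staying on the element where it was written.

```python
from xml.dom import minidom

SRC = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body><ns:GetUser xmlns:ns="https://example.com/api"/></soap:Body>'
    "</soap:Envelope>"
)

doc = minidom.parseString(SRC)
user = doc.getElementsByTagNameNS("https://example.com/api", "GetUser")[0]

# The qualified name keeps the prefix as written, and the URI is resolved from scope.
print(user.tagName, user.namespaceURI)
# Re-serializing the root reproduces the input, xmlns:ns still on GetUser.
print(doc.documentElement.toxml() == SRC)
```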