PDF to Word Converter
Upload a PDF file and convert it to an editable Word document in one click
What is PDF to Word conversion?
PDF to Word is an online document conversion tool that quickly converts PDF files into editable Word documents (.docx). While PDFs preserve formatting across platforms, their content can't be edited directly. Converting to Word lets you freely modify text, adjust formatting, and add content without starting from scratch.
The file is uploaded to ToolAct's conversion service, where the PDF text layer is parsed, images are preserved, and table structures are reconstructed before a .docx file is returned. Files are deleted from the server immediately after conversion.
Before publishing or submitting, open the output and check readability, cropping, resolution, ordering, and missing content.
How to Use
- Click the upload area or drag a PDF file directly into it
- Click "Convert to Word" to start the conversion; the result is a .docx file
- After conversion, click "Download Word" to save the file locally
- Need to convert more files? Click "Convert another file" to upload again
Conversion Expectations
- PDF to Word conversion may not perfectly preserve layout, fonts, tables, or scanned text.
- Review the DOCX before editing or sharing, especially for contracts, resumes, and forms.
Technical Principle
PDF (ISO 32000-1 for PDF 1.7 and ISO 32000-2 for PDF 2.0) is a fixed-layout format. Each page's content is a stream of drawing operators (`Tf` selects a font and size, `Td` moves the text position, `Tj` shows a glyph string, `Tm` sets the text matrix) rather than a flowing document model. There is no concept of paragraph, heading or table at the file level; those are visual effects produced by absolutely positioned glyph runs. Converting to DOCX (Office Open XML, ECMA-376 / ISO/IEC 29500), which is a flowing model with `<w:p>` paragraphs, `<w:tbl>` tables and run properties inside a ZIP container, is therefore a reconstruction problem rather than a translation.
Text extraction depends on the `ToUnicode` CMap inside each embedded font. If the CMap is missing, or maps glyph IDs to private-use Unicode code points (a common anti-copy pattern), the visible characters cannot be recovered without OCR, even though the page renders correctly.
The uploaded PDF is parsed by ToolAct's server-side conversion engine, which reads each page's content stream into positioned text runs and reconstructs document structure on top of them. Reconstructing paragraphs means clustering these runs by y-coordinate (within roughly one line-height), sorting each cluster by x-coordinate, detecting column boundaries from the histogram of x-start positions, and inferring line breaks from the gaps between runs.
Table reconstruction is harder. Bordered tables can be recovered by intersecting the page's path operators (`re` rectangles, `l` line segments, `S` strokes) into a grid and assigning text runs to cells; borderless tables require column-detection heuristics like those in Tabula or Camelot, and accuracy drops sharply with merged cells or multi-line rows. Images are pulled from the page's XObject resources and re-embedded in the DOCX `word/media/` folder. The DOCX output is assembled as a ZIP containing `[Content_Types].xml`, `word/document.xml`, `word/styles.xml` and any media.
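The operator-to-runs step and the y-clustering step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ToolAct's engine: it expects an already-decompressed content stream, handles only `BT`, `Tf`, `Td`, `Tm` and `Tj`, ignores the graphics matrix (`cm`), `TJ` arrays, hex strings, glyph widths and font encodings, and treats string characters as text. The names `tokenize`, `extract_runs`, `group_lines` and the half-font-size tolerance are hypothetical choices.

```python
import re
from dataclasses import dataclass

# Bare tokens: operators (BT, Tj, ...), numbers and /Names.
BARE_TOKEN = re.compile(r"[^\s()]+")


@dataclass
class TextRun:
    x: float      # text-space origin (the cm matrix is ignored)
    y: float
    size: float   # font size from the last Tf
    text: str


def tokenize(stream: str):
    """Yield ("str" | "num" | "name" | "op", value) tokens from a simple content stream."""
    i, n = 0, len(stream)
    while i < n:
        ch = stream[i]
        if ch.isspace():
            i += 1
        elif ch == "(":
            # Literal string: balanced parentheses; a backslash escapes the next char.
            depth, buf = 1, []
            i += 1
            while i < n:
                c = stream[i]
                if c == "\\" and i + 1 < n:
                    buf.append(stream[i + 1])  # simplified: \n, \ddd etc. are not decoded
                    i += 2
                    continue
                i += 1
                if c == "(":
                    depth += 1
                elif c == ")":
                    depth -= 1
                    if depth == 0:
                        break
                buf.append(c)
            yield ("str", "".join(buf))
        else:
            m = BARE_TOKEN.match(stream, i)
            if m is None:  # stray ")": skip it
                i += 1
                continue
            tok = m.group(0)
            i = m.end()
            if tok.startswith("/"):
                yield ("name", tok)
                continue
            try:
                num = float(tok)
            except ValueError:
                yield ("op", tok)
            else:
                yield ("num", num)


def extract_runs(stream: str) -> list[TextRun]:
    """Turn BT/Tf/Td/Tm/Tj operators into positioned runs."""
    runs: list[TextRun] = []
    operands: list = []
    tm = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # text matrix [a b c d e f]
    size = 0.0
    for kind, value in tokenize(stream):
        if kind != "op":
            operands.append(value)
            continue
        if value == "BT":
            tm = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
        elif value == "Tf":
            size = operands[-1]               # operands: /FontName size
        elif value == "Tm":
            tm = list(operands[-6:])          # a b c d e f
        elif value == "Td":
            tx, ty = operands[-2:]
            a, b, c, d, e, f = tm
            # Translate by (tx, ty) in text space. We never advance after Tj,
            # so the text matrix and the line matrix stay equal here.
            tm = [a, b, c, d, tx * a + ty * c + e, tx * b + ty * d + f]
        elif value == "Tj":
            runs.append(TextRun(tm[4], tm[5], size, operands[-1]))
        operands.clear()
    return runs


def group_lines(runs: list[TextRun], tolerance: float = 0.5) -> list[str]:
    """Cluster runs whose baselines differ by <= tolerance * font size, then sort by x."""
    lines: list[list[TextRun]] = []
    for run in sorted(runs, key=lambda r: -r.y):  # PDF y grows upward: top line first
        if lines and abs(lines[-1][0].y - run.y) <= tolerance * max(run.size, 1.0):
            lines[-1].append(run)
        else:
            lines.append([run])
    return [" ".join(r.text for r in sorted(line, key=lambda r: r.x)) for line in lines]


SAMPLE = """
BT /F1 12 Tf 72 700 Td (World) Tj ET
BT /F1 12 Tf 1 0 0 1 30 700.5 Tm (Hello) Tj 0 -14 Td (Second line) Tj ET
"""

if __name__ == "__main__":
    print(group_lines(extract_runs(SAMPLE)))  # ['Hello World', 'Second line']
```

The sample draws "World" before "Hello" in stream order; sorting by position rather than by drawing order is what puts them back into reading order.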
Conversion fidelity is bounded: text-born PDFs with single-column body text convert cleanly, while multi-column scientific layouts, tables without borders, mathematical typesetting, ligatures whose CMap entry is missing, and rotated text all degrade. Lossless conversion of an arbitrary PDF cannot be guaranteed, because a page of absolutely positioned glyphs and vector paths can express layouts that a flowing paragraph model has no direct equivalent for.
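The `ToUnicode` dependency, and why private-use mappings defeat extraction, can be illustrated with a small CMap reader. This is a simplified sketch, not a complete CMap parser: it reads only `bfchar` entries and the plain `<lo> <hi> <dst>` form of `bfrange` (not the array form), decodes `bfchar` destinations as UTF-16BE, assumes each `bfrange` destination is a single BMP code point, and replaces unmapped or private-use results with U+FFFD. `parse_tounicode`, `decode_glyphs` and the sample CMap are hypothetical.

```python
import re

PUA = range(0xE000, 0xF900)  # BMP Private Use Area, U+E000..U+F8FF
BFCHAR = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
BFRANGE = re.compile(r"beginbfrange(.*?)endbfrange", re.S)
HEX = re.compile(r"<([0-9A-Fa-f]+)>")


def parse_tounicode(cmap: str) -> dict[int, str]:
    """Map glyph codes to Unicode strings from bfchar and simple bfrange entries."""
    mapping: dict[int, str] = {}
    for block in BFCHAR.findall(cmap):
        codes = HEX.findall(block)
        for src, dst in zip(codes[0::2], codes[1::2]):
            # UTF-16BE destination: one glyph may map to several characters (ligatures).
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    for block in BFRANGE.findall(cmap):
        codes = HEX.findall(block)
        for lo, hi, dst in zip(codes[0::3], codes[1::3], codes[2::3]):
            first = int(dst, 16)  # assumed: a single BMP code point
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                mapping[code] = chr(first + offset)
    return mapping


def decode_glyphs(glyph_ids: list[int], mapping: dict[int, str]) -> tuple[str, int]:
    """Decode glyph IDs; unmapped or private-use results become U+FFFD and are counted."""
    out, bad = [], 0
    for gid in glyph_ids:
        text = mapping.get(gid)
        if text is None or any(ord(ch) in PUA for ch in text):
            out.append("\ufffd")
            bad += 1
        else:
            out.append(text)
    return "".join(out), bad


SAMPLE_CMAP = """
2 beginbfchar
<0003> <0020>
<00C0> <00660069>
endbfchar
1 beginbfrange
<0024> <003D> <0041>
endbfrange
1 beginbfchar
<0050> <E012>
endbfchar
"""

if __name__ == "__main__":
    mapping = parse_tounicode(SAMPLE_CMAP)
    # "HI fi" followed by two U+FFFD (one PUA glyph, one unmapped glyph); bad == 2
    print(decode_glyphs([0x2B, 0x2C, 0x03, 0xC0, 0x50, 0x99], mapping))
```

Glyph `<00C0>` shows the ligature case: one glyph maps to the two characters "fi", and if that entry were missing the ligature would come out as a replacement character even though the page displays it correctly.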
- PDF spec: ISO 32000-1 (PDF 1.7) / ISO 32000-2 (PDF 2.0). Page content is a stream of operators (`Tf`, `Td`, `Tj`, `Tm`) drawing positioned glyphs - no paragraph/heading/table at file level.
- DOCX spec: Office Open XML, ECMA-376 / ISO/IEC 29500. A ZIP of XML parts (`word/document.xml`, `word/styles.xml`, `[Content_Types].xml`) with `<w:p>` paragraphs and `<w:tbl>` tables - a flowing model.
- Text extraction depends on the font's ToUnicode CMap; PDFs with missing or PUA-mapped CMaps render correctly but extract as gibberish, forcing OCR fallback.
- Files are uploaded to ToolAct's server-side conversion engine and deleted immediately after the conversion completes.
- Paragraph reconstruction: cluster text runs by y-coordinate within ~1 line-height, sort by x, detect columns from the x-start histogram, infer line breaks from inter-run gaps.
- Table reconstruction: bordered tables come from intersecting `re`/`l`/`S` line operators into a grid; borderless tables need column-detection heuristics (Tabula/Camelot) and degrade on merged or multi-line cells.
- Lossless PDF->DOCX conversion cannot be guaranteed: the fixed-layout source can express arrangements that the flowing target has no direct equivalent for. Multi-column scientific layouts, tables without borders, mathematical typesetting, and rotated text degrade most.
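The DOCX side of the pipeline can be sketched with Python's standard `zipfile` module. This is a minimal package for illustration, not what the converter actually emits: it writes one `<w:p>` per reconstructed line and omits `word/styles.xml`, media and tables. A valid package also needs the `_rels/.rels` relationships part pointing at `word/document.xml`, even though the list above does not mention it. `write_docx` and `read_paragraphs` are hypothetical helpers.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

# Package-level relationship telling consumers where the main document part lives.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def document_xml(paragraphs: list[str]) -> str:
    """One <w:p> with a single run per paragraph; text is XML-escaped."""
    body = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
        for text in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )


def write_docx(target, paragraphs: list[str]) -> None:
    """Write a minimal .docx to a path or a binary file object."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", ROOT_RELS)
        zf.writestr("word/document.xml", document_xml(paragraphs))


def read_paragraphs(source) -> list[str]:
    """Read back the text of each <w:p>, as a quick round-trip check."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]


if __name__ == "__main__":
    buf = io.BytesIO()
    write_docx(buf, ["Hello World", "A & B < C"])
    print(read_paragraphs(buf))  # ['Hello World', 'A & B < C']
```

Escaping matters because extracted PDF text routinely contains `&` and `<`, which would otherwise produce malformed `document.xml`.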
Examples
Contract editing
Received a PDF contract and need to modify terms? Convert to Word and edit directly.
Report reuse
Convert a PDF report to Word to extract data and charts for a new document.
Paper citation
Need to quote a paragraph from a PDF paper? Convert to Word for easy copy-paste.
FAQ
Does my PDF stay on this device?
No. The PDF is uploaded to our conversion server, parsed there, and a Word file is sent back as a download. Avoid uploading PDFs containing personal IDs, signed contracts, or confidential reports - run a desktop converter locally for those.
Will scanned PDFs become editable text?
Only if the PDF already contains a text layer. Pure image scans without OCR come out as images embedded in the Word page; the words are not searchable or editable. Run OCR on the PDF before uploading if you need real text.
What output formats can I download?
The converter produces .docx (Word 2007+ XML format). Open the result in Microsoft Word, Google Docs, WPS, or LibreOffice. Other Word-compatible formats are not supported by this endpoint - re-save the .docx in your editor of choice if you need a different format.
Why does the layout differ from the original PDF?
PDF describes positioned glyphs on a page; Word describes flowing paragraphs. Multi-column layouts, sidebars, footnotes, and complex tables are reconstructed as best-effort and often need manual cleanup. Single-column body text usually transfers cleanly.
Are tables, lists, and formulas preserved?
Simple tables with visible borders convert reasonably well. Tables without borders, nested tables, merged cells, and bulleted lists generated by paragraph indents often come back as plain text or broken table fragments. Math formulas typeset by LaTeX/Word equation editors usually flatten into images.
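To see why bordered tables fare better, here is a sketch of the grid approach from the technical section: ruling positions become cell boundaries, and each text run lands in the cell that contains its origin. It assumes the rules have already been extracted from `re` operators as `(x, y, width, height)` rectangles and that every rule spans the full table, so merged cells (missing rule segments) are not modeled, which is exactly where this approach breaks down. `rules_from_rects`, `assign_cells` and the 1-point thickness threshold are hypothetical.

```python
from bisect import bisect_right


def rules_from_rects(rects, thickness: float = 1.0):
    """Collect sorted x positions of vertical rules and y positions of horizontal rules.

    rects are (x, y, width, height) tuples, as drawn by the `re` operator.
    """
    xs, ys = set(), set()
    for x, y, w, h in rects:
        if h <= thickness:            # thin horizontal bar
            ys.add(round(y, 1))
        elif w <= thickness:          # thin vertical bar
            xs.add(round(x, 1))
        else:                         # cell outline: contributes all four edges
            xs.update((round(x, 1), round(x + w, 1)))
            ys.update((round(y, 1), round(y + h, 1)))
    return sorted(xs), sorted(ys)


def assign_cells(xs, ys, runs):
    """Place (x, y, text) runs into the grid cell containing their origin.

    Assumes every rule spans the whole table (no merged cells). Rows are
    returned top to bottom, since PDF y grows upward.
    """
    n_cols, n_rows = len(xs) - 1, len(ys) - 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for x, y, text in runs:
        col = bisect_right(xs, x) - 1
        row_from_bottom = bisect_right(ys, y) - 1
        if 0 <= col < n_cols and 0 <= row_from_bottom < n_rows:
            row = n_rows - 1 - row_from_bottom
            grid[row][col] = f"{grid[row][col]} {text}".strip()  # join multi-run cells
    return grid


if __name__ == "__main__":
    cells = [(0, 0, 100, 20), (100, 0, 100, 20), (0, 20, 100, 20), (100, 20, 100, 20)]
    xs, ys = rules_from_rects(cells)
    runs = [(5, 25, "Name"), (105, 25, "Qty"), (5, 5, "Apple"), (105, 5, "3")]
    print(assign_cells(xs, ys, runs))  # [['Name', 'Qty'], ['Apple', '3']]
```

A borderless table gives this function no `xs` or `ys` to work with, which is why the converter has to fall back on column-detection heuristics there.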
Will embedded fonts and colors carry over?
Standard fonts (Times, Arial, Helvetica, common CJK families) carry over by name. If a PDF embeds a custom font as a subset, Word may substitute a similar fallback font, which shifts kerning and line breaks slightly.
Is there a page or size limit?
Very long PDFs (hundreds of pages) or files with thousands of high-resolution images may time out. If a conversion fails, try splitting the PDF into smaller chunks or compressing the embedded images first.