File Reconstruction

Rebuild Translated Files With Structured Output, Not a Text Dump

Translation is only half the job. AI-DocTranslate also rebuilds the translated content into the original file format: a reconstructed PDF or a translated XLIFF ZIP package. For XLIFF, the package structure is fully preserved. For PDF, the output closely follows the source layout, though exact reproduction cannot be guaranteed for every document.

Two reconstruction pipelines

PDF Reconstruction

  1. Structured extraction

    The source PDF is parsed into a content model that includes text blocks, headings, tables, and captions, and each item retains its position and style information.

  2. Translation with context

    Each content block is translated as its own unit, and its structural metadata is kept intact so the rebuild step knows where to place the translated text.

  3. PDF rebuild

    The translated content blocks are composed back into a PDF that closely follows the source layout. The pipeline aims to reproduce headings, paragraphs, and tables in their original positions, but exact layout fidelity depends on the complexity of the source document and cannot be guaranteed.
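
The three steps above can be pictured as a small data flow. The sketch below is illustrative only and is not AI-DocTranslate's actual content model: the `Block` class, its fields, and the `translate_blocks` and `fake_translate` helpers are hypothetical names chosen for this example. It shows the one property the pipeline description relies on: translation replaces a block's text while its position and style metadata pass through unchanged.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Block:
    """One extracted content item (hypothetical shape, for illustration)."""
    kind: str                                 # "heading", "paragraph", "table", "caption"
    text: str
    page: int
    bbox: Tuple[float, float, float, float]   # x0, y0, x1, y1 on the page
    font_size: float


def translate_blocks(blocks: List[Block], translate: Callable[[str], str]) -> List[Block]:
    # Only `text` changes; kind, page, bbox, and font_size are copied as-is,
    # which is what lets a rebuild step place the translated text later.
    return [replace(block, text=translate(block.text)) for block in blocks]


def fake_translate(text: str) -> str:
    # Stand-in for a real translation call (assumption: any str -> str function).
    return f"[fr] {text}"


source = [
    Block("heading", "Safety notes", 1, (72.0, 700.0, 540.0, 724.0), 18.0),
    Block("paragraph", "Read before use.", 1, (72.0, 660.0, 540.0, 690.0), 11.0),
]
translated = translate_blocks(source, fake_translate)

for before, after in zip(source, translated):
    print(after.kind, after.bbox == before.bbox, after.text)
```

A rebuild step would still have to handle translated text that is longer or shorter than the source, which is one reason exact layout fidelity cannot be promised.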

Output: A translated PDF file, a document model JSON, a QA report JSON, and a supporting artifacts ZIP.

XLIFF Package Reconstruction

  1. Segment parsing

    Each XLIFF file is parsed at the segment level. Source text, inline codes, placeholders, and target fields are read individually so no structural information is lost.

  2. Segment translation

    The AI translates each segment in its full document context, preserving inline tags, variable references, and cross-reference markers in the translated target text.

  3. Package rebuild

    The translated target content is written back into XLIFF files that follow the same version, namespace, and segment structure as the source. The files are packaged into a ZIP with the same folder hierarchy as the upload.
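
As a rough illustration of the segment and package steps above, the sketch below uses only the Python standard library. It is not AI-DocTranslate's implementation: it assumes XLIFF 1.2 input, and `fill_targets`, `rebuild_package`, and the sample data are hypothetical names for this example. Translating each text run separately is a simplification that a real system would likely avoid. The point is narrower: inline codes are copied from source to target untouched, and every ZIP entry keeps its original path.

```python
import copy
import io
import xml.etree.ElementTree as ET
import zipfile

NS = "urn:oasis:names:tc:xliff:document:1.2"   # XLIFF 1.2 namespace
ET.register_namespace("", NS)                   # keep it as the default namespace on output
NATIVE_CODE = {"ph", "bpt", "ept", "it"}        # inline elements whose text is code, not prose


def fill_targets(xliff_bytes, translate):
    """Write a <target> for every trans-unit, leaving inline codes as they are."""
    root = ET.fromstring(xliff_bytes)
    for unit in root.iter(f"{{{NS}}}trans-unit"):
        source = unit.find(f"{{{NS}}}source")
        if source is None:
            continue
        old = unit.find(f"{{{NS}}}target")
        if old is not None:
            unit.remove(old)
        target = copy.deepcopy(source)          # carries <x/>, <g>, <ph> over unchanged
        target.tag = f"{{{NS}}}target"
        for node in target.iter():
            local = node.tag.split("}")[-1]
            if local not in NATIVE_CODE and node.text and node.text.strip():
                node.text = translate(node.text)
            if node is not target and node.tail and node.tail.strip():
                node.tail = translate(node.tail)
        unit.insert(list(unit).index(source) + 1, target)
    return ET.tostring(root, encoding="utf-8")


def rebuild_package(zip_bytes, translate):
    """Return a new ZIP with the same entry paths and translated XLIFF files."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.lower().endswith((".xlf", ".xliff")):
                data = fill_targets(data, translate)
            dst.writestr(info.filename, data)   # same path, so the hierarchy is kept
    return out.getvalue()


SAMPLE = (
    f'<xliff version="1.2" xmlns="{NS}">'
    '<file original="ui.txt" source-language="en" target-language="fr" datatype="plaintext">'
    '<body><trans-unit id="greet"><source>Hello <x id="1"/>, welcome</source></trans-unit></body>'
    "</file></xliff>"
).encode("utf-8")

upload = io.BytesIO()
with zipfile.ZipFile(upload, "w") as z:
    z.writestr("fr/ui.xlf", SAMPLE)
    z.writestr("notes/readme.txt", b"keep me")

# str.upper stands in for a real translation function.
rebuilt = rebuild_package(upload.getvalue(), str.upper)
with zipfile.ZipFile(io.BytesIO(rebuilt)) as z:
    print(z.namelist())
    print(z.read("fr/ui.xlf").decode("utf-8"))
```

Copying the source element before translating is one simple way to keep placeholder IDs and nesting identical between source and target. A QA step could then compare the inline codes of each pair.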

Output: A translated XLIFF ZIP in the original package structure, a manifest JSON, and a QA report JSON.

Why reconstruction matters

Most AI translation tools return a block of translated text. The user must then reformat it, repaginate it, and import it back into the target system, which for complex documents can take as long as the translation itself.

AI-DocTranslate treats reconstruction as a core step in the pipeline. The translation model and the rebuild step share the same structured content model, so translated content is placed back into the document rather than returned as flat text. For XLIFF, this means the package structure and segment IDs are fully preserved. For PDF, the pipeline closely follows the source layout, but PDF reconstruction is a best-effort process and results vary with document complexity.

The result is a translated file that requires significantly less manual work than raw translated text, though PDF output should always be reviewed before distribution, especially for complex or heavily formatted documents.