AI provenance metadata in DOCX custom XML parts #1546

@HMAKT99

Context

AI-generated Word documents are everywhere: reports, proposals, and analyses created by LLMs. But the DOCX format has no standard field for AI generation, trust level, or source provenance.

The DOCX format supports custom XML parts (customXml), which seem like the natural place to store this. What is unclear to me is how much of that python-docx exposes.

The Question

Is there a recommended pattern for storing custom provenance metadata in DOCX files via python-docx? Specifically:

  1. Can customXml parts be read/written through the current API?
  2. Is there interest in a helper for AI provenance (e.g., document.core_properties.ai_generated = True)?
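For reference on question 1, here is a minimal sketch of reading custom XML parts at the application level, using only the Python standard library rather than any python-docx API. It assumes the data items follow the conventional `customXml/item<N>.xml` naming inside the package; the function name `read_custom_xml_parts` is hypothetical.

```python
import zipfile


def read_custom_xml_parts(docx_file):
    """Return {part_name: raw_xml_bytes} for each custom XML data item.

    `docx_file` may be a filesystem path or a binary file-like object.
    Assumption: data items use the conventional customXml/item<N>.xml names.
    """
    parts = {}
    with zipfile.ZipFile(docx_file) as package:
        for name in package.namelist():
            is_item = name.startswith("customXml/item") and name.endswith(".xml")
            # itemProps<N>.xml describes a data item; it is not the data itself.
            is_props = name.startswith("customXml/itemProps")
            if is_item and not is_props:
                parts[name] = package.read(name)
    return parts
```

This only covers reading. Writing a part also means adding a relationship and a content type entry to the package, which is where a supported python-docx API would help most.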

Why This Matters

Article 50 of the EU AI Act, which applies from August 2, 2026, requires AI-generated content to be marked as such in a machine-readable way. DOCX is a primary format for enterprise content, so a standard way to mark AI provenance in Word documents is becoming a compliance requirement.

Existing Approach

AKF embeds provenance into DOCX custom XML:

<akf:metadata>{"v":"1.0","claims":[{"c":"Q3 report","t":0.85,"src":"SEC 10-Q"}]}</akf:metadata>

But I'm curious whether python-docx has its own approach or whether this should be handled at the application level.
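To make the payload format concrete, here is a rough sketch of building and parsing an element like the one above with the standard library. The namespace URI `urn:example:akf` is a placeholder assumption, since the snippet shows the `akf` prefix but not the URI it is bound to, and the function names are hypothetical rather than part of AKF or python-docx.

```python
import json
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): the example shows only the prefix.
AKF_NS = "urn:example:akf"


def build_provenance_xml(claims, version="1.0"):
    """Serialize claims as compact JSON inside a namespaced metadata element."""
    ET.register_namespace("akf", AKF_NS)
    root = ET.Element(f"{{{AKF_NS}}}metadata")
    root.text = json.dumps({"v": version, "claims": claims}, separators=(",", ":"))
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def parse_provenance_xml(xml_bytes):
    """Return the decoded JSON payload, or None if the root is another element."""
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{{{AKF_NS}}}metadata":
        return None
    return json.loads(root.text or "{}")
```

Checking the root tag before decoding lets a reader skip custom XML parts written by other tools, since a DOCX package can carry several unrelated ones.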
