Description
Context
AI-generated Word documents are everywhere: reports, proposals, and analyses created by LLMs. But the DOCX format has no standard field for AI generation, trust level, or source provenance.
The underlying Open XML package format supports custom XML parts (customXml), which seem like the natural place to store this.
The Question
Is there a recommended pattern for storing custom provenance metadata in DOCX files via python-docx? Specifically:
- Can customXml parts be read/written through the current API?
- Is there interest in a helper for AI provenance (e.g., document.core_properties.ai_generated = True)?
Why This Matters
Article 50 of the EU AI Act, which applies from August 2, 2026, requires AI-generated content to be marked as such in a machine-readable format. DOCX is a primary format for enterprise content, so a standard way to mark AI provenance in Word documents is likely to matter for compliance.
Existing Approach
AKF embeds provenance into DOCX custom XML:
<akf:metadata>{"v":"1.0","claims":[{"c":"Q3 report","t":0.85,"src":"SEC 10-Q"}]}</akf:metadata>

But I'm curious whether python-docx has its own approach, or whether this should be handled at the application level.
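To make the application-level option concrete, here is a rough sketch of building and parsing a payload shaped like the AKF snippet above, using only the standard library. The namespace URI urn:example:akf is a placeholder I made up so the akf: prefix resolves (the snippet does not show the real one), and both function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real AKF namespace is not shown in the snippet.
AKF_NS = "urn:example:akf"


def build_metadata_xml(claims, version="1.0"):
    """Serialize claims as JSON text inside a single akf:metadata element."""
    ET.register_namespace("akf", AKF_NS)
    element = ET.Element(f"{{{AKF_NS}}}metadata")
    # Compact separators mirror the whitespace-free JSON in the snippet.
    element.text = json.dumps(
        {"v": version, "claims": claims}, separators=(",", ":")
    )
    return ET.tostring(element, encoding="utf-8", xml_declaration=True)


def parse_metadata_xml(blob):
    """Return the JSON payload stored in the element's text as a dict."""
    element = ET.fromstring(blob)
    return json.loads(element.text)
```

The resulting bytes are what would go into a custom XML part. Attaching that part to the document is the step I would rather not hand-roll, hence the question.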