2026-06-08 · 5 min read · sota.io Team

EU AI Act Annex IV Technical Documentation Automation: Generating Audit-Ready Records from Your CI/CD Pipeline

Post #1584 in the sota.io EU AI Act Compliance Automation Series (3/5)


There is a pattern that repeats across high-risk AI teams in the months before an enforcement deadline. The technical documentation is written once — usually by a compliance consultant, usually during the conformity assessment sprint — and then it immediately begins to diverge from reality. A new model version ships. A training dataset is updated. The risk management plan is amended. The documentation is not. By the time a national market surveillance authority requests your Annex IV package, the document on file describes a system that no longer exists.

This is not a compliance attitude problem. It is an architecture problem. Documentation that lives in a separate tool, maintained by a separate team, on a separate cadence from your deployments will always be stale. The only way to keep Annex IV documentation current is to make it a first-class output of your engineering pipeline, generated and updated automatically from the same sources that change your system.

This is the third post in the EU AI Act Compliance Automation series. The first post covered Art.72 post-market monitoring. The second post covered Art.73 incident detection. This post covers Annex IV: how to build a documentation pipeline that generates and maintains all eight required sections from your existing development artefacts.

What Art.11 and Annex IV Actually Require

Article 11 of Regulation (EU) 2024/1689 imposes two distinct obligations on providers of high-risk AI systems. First, you must draw up technical documentation in the form specified by Annex IV before the system is placed on the market or put into service. Second — and this is the part most teams miss — you must keep it up to date throughout the system's operational lifetime.

"Keep it up to date" is not a quarterly review obligation. Any substantial change to the system that could affect conformity with the requirements of Chapter III Section 2 triggers a documentation update obligation. In practice, for a machine learning system under active development, this means that every model version change, every training data update, and every modification to the risk management system is a potential documentation trigger.

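This post groups the technical documentation that Annex IV requires into eight working sections. The grouping and numbering are a convenience for the pipeline and do not map one-to-one onto the numbered points of the Annex itself: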

Section 1 — General description: the intended purpose, the version identification, the hardware and software dependencies, the deployment environments, and the instructions for use.

Section 2 — Detailed description of elements and development process: the system architecture, the design decisions, the monitoring and control mechanisms, the data flow diagrams, and the development methodology including the tools and frameworks used.

Section 3 — Information on the training methodology and training datasets: the provenance, the preparation and labelling procedures, the bias assessment, and the measures taken to prevent bias — plus validation and test dataset specifications including statistical characteristics.

Section 4 — Description of the risk management system in accordance with Art.9: the risk identification and analysis, the mitigation measures, the residual risks, the testing against those residual risks, and the post-market monitoring plan linked to the risk management lifecycle.

Section 5 — Human oversight mechanisms: the measures implemented in accordance with Art.14, the technical means for monitoring by natural persons, and the evidence that designated operators can actually override or interrupt the system.

Section 6 — Description of testing and validation procedures: the test protocols, the test results, and the measures taken in accordance with Art.15 concerning accuracy, robustness, and cybersecurity.

Section 7 — Standards applied: the harmonised or other standards applied, and where no harmonised standards exist, the description of other solutions adopted to meet the relevant requirements.

Section 8 — Post-market monitoring plan: the data collection mechanisms, the indicators, the thresholds that trigger the feedback mechanism, and the connection to the incident reporting obligations under Art.73.

Eight sections. For a system under active development, all eight need to stay current with every material change. Manual maintenance of this documentation is not a viable strategy past the first release.

Why Documentation Automation Is Not Optional

Consider what happens in a typical model update cycle. A new training dataset version is prepared. The model is retrained. Performance benchmarks are run. The model is validated and deployed. The entire cycle can complete in 72 hours for a well-automated ML platform.

At the end of that cycle, Section 3 (training data), Section 4 (risk management — because performance characteristics have changed), Section 6 (testing results), and Section 8 (post-market monitoring thresholds — potentially updated based on new performance baselines) are all stale. That is four of eight sections affected by a single routine update.

A team that manually updates documentation will fall behind within weeks of the first model update. The more mature the ML platform, the faster the documentation gap opens.

The automated approach treats documentation as a build artefact. Just as your CI/CD pipeline produces a container image as an output of the build stage, it produces documentation sections as an output of each pipeline stage. The documentation is assembled at the end from section artefacts, versioned alongside the model, and stored in a location where it can be retrieved on demand by an NCA inspector.

The Automation Architecture

The documentation pipeline consists of four layers:

Source layer: the artefacts that contain documentation-relevant information — code repositories, model registries, data lineage systems, test result stores, risk management tracking systems, and logging infrastructure.

Extraction layer: automated jobs that pull documentation-relevant information from each source and transform it into structured documentation content. This layer runs as part of your CI/CD pipeline at defined trigger points.

Assembly layer: a documentation generator that combines section artefacts into a complete, versioned Annex IV package. The generator handles formatting, cross-referencing, and version attribution.

Distribution layer: a compliant storage and access system that retains versions, provides retrieval by system version and date, and generates audit trails of access by NCA inspectors.
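
A minimal sketch of how the four layers can fit together, assuming each section extractor is a plain function that returns a dict. The names `EXTRACTORS`, `assemble`, and `InMemoryStore` are illustrative and the extractor values are made up; a real distribution layer would write to durable object storage rather than a Python dict.

```python
import hashlib
import json
from typing import Callable

# Extraction layer: one callable per section, each reading its own source.
# These lambdas are stand-ins for real extractors (hypothetical values).
EXTRACTORS: dict[str, Callable[[], dict]] = {
    "1": lambda: {"intended_purpose": "example purpose"},
    "3": lambda: {"dataset_version": "2024.1"},
}


def assemble(system_version: str) -> dict:
    """Assembly layer: combine section artefacts into one versioned package."""
    sections = {number: extract() for number, extract in EXTRACTORS.items()}
    digest = hashlib.sha256(
        json.dumps(sections, sort_keys=True).encode()
    ).hexdigest()
    return {
        "system_version": system_version,
        "sections": sections,
        "package_hash": digest,
    }


class InMemoryStore:
    """Distribution layer stand-in: keeps every package, keyed by version."""

    def __init__(self) -> None:
        self._packages: dict[str, dict] = {}

    def put(self, package: dict) -> None:
        self._packages[package["system_version"]] = package

    def get(self, system_version: str) -> dict:
        return self._packages[system_version]


store = InMemoryStore()
store.put(assemble("1.4.0"))
print(store.get("1.4.0")["package_hash"])
```

The source layer is implicit here: it is whatever each extractor reads. Keeping extractors independent of one another is what lets a single section be regenerated without rebuilding the rest.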

Section-by-Section Automation

Section 1: General Description (Automated from manifest files)

Section 1 is the most straightforward to automate because most of its required content already exists in machine-readable form in your repository.

# doc_generator/section1_general.py

import json
import subprocess
from pathlib import Path
from datetime import date

def generate_section1(repo_path: str, model_version: str) -> dict:
    """Extract Section 1 content from repository manifest files."""
    
    # Read system manifest (maintained by development team)
    manifest_path = Path(repo_path) / "compliance" / "system_manifest.json"
    with open(manifest_path) as f:
        manifest = json.load(f)
    
    # Extract git commit hash for version linking
    commit_hash = subprocess.check_output(
        ["git", "-C", repo_path, "rev-parse", "HEAD"]
    ).decode().strip()[:8]
    
    return {
        "section": "1",
        "generated_at": date.today().isoformat(),
        "model_version": model_version,
        "git_commit": commit_hash,
        "intended_purpose": manifest["intended_purpose"],
        "deployment_environments": manifest["deployment_environments"],
        "hardware_requirements": manifest["hardware_requirements"],
        "software_dependencies": extract_dependencies(repo_path),
        "instructions_for_use_path": manifest.get("instructions_for_use_path"),
        "system_description": manifest["system_description"],
    }

def extract_dependencies(repo_path: str) -> list[dict]:
    """Extract software dependencies from package files."""
    deps = []
    
    # Python requirements
    req_path = Path(repo_path) / "requirements.txt"
    if req_path.exists():
        with open(req_path) as f:
            for line in f:
                line = line.strip()
                # Skip blanks, comments, and pip options such as "-r other.txt"
                if line and not line.startswith(("#", "-")):
                    deps.append({"type": "python", "package": line})
    
    return deps

The system_manifest.json file is the one document your team maintains manually. It contains the human-authored descriptions that cannot be extracted from code — intended purpose, deployment environment descriptions, and instructions for use. Everything else in Section 1 is extracted programmatically.
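
Because the manifest is hand-maintained, it is worth failing the pipeline early when a field is missing. The following is a minimal sketch, assuming the manifest uses the keys that `generate_section1` reads; the function name `missing_manifest_keys` and the example values are hypothetical.

```python
# Keys that generate_section1 reads without a fallback.
REQUIRED_MANIFEST_KEYS = (
    "intended_purpose",
    "deployment_environments",
    "hardware_requirements",
    "system_description",
)


def missing_manifest_keys(manifest: dict) -> list[str]:
    """Return required keys that are absent or empty, in declaration order."""
    return [key for key in REQUIRED_MANIFEST_KEYS if not manifest.get(key)]


# Example manifest with illustrative values; the description was left blank.
example = {
    "intended_purpose": "Rank incoming loan applications for manual review",
    "deployment_environments": ["eu-central"],
    "hardware_requirements": {"gpu": False},
    "system_description": "",
}
print(missing_manifest_keys(example))
```

Running a check like this as the first pipeline step turns a silently incomplete Section 1 into a visible build failure.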

Section 2: Architecture and Development Process (Automated from code analysis)

Section 2 requires architecture documentation that can be partially automated from code structure analysis.

# doc_generator/section2_architecture.py

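from pathlib import Path

import mlflow

# extract_pipeline_documentation, extract_methodology, extract_monitoring_config
# and extract_tool_inventory are project-specific helpers that are not shown here.

def generate_section2(repo_path: str, mlflow_run_id: str) -> dict:
    """Generate Section 2 from code analysis and MLflow run metadata."""
    
    # Fetch the training run; the parameter names read below only exist
    # if your training code logs them under those names
    run = mlflow.get_run(mlflow_run_id)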
    
    architecture = {
        "model_type": run.data.params.get("model_type"),
        "framework": run.data.params.get("framework"),
        "layers_or_estimators": run.data.params.get("n_estimators") or run.data.params.get("num_layers"),
        "input_dimensionality": run.data.params.get("input_size"),
        "output_type": run.data.params.get("output_type"),
    }
    
    # Extract data flow from pipeline configuration
    pipeline_config_path = Path(repo_path) / "pipeline" / "config.yaml"
    pipeline_doc = extract_pipeline_documentation(pipeline_config_path)
    
    return {
        "section": "2",
        "model_architecture": architecture,
        "development_methodology": extract_methodology(repo_path),
        "monitoring_mechanisms": extract_monitoring_config(repo_path),
        "data_flow": pipeline_doc,
        "tools_and_frameworks": extract_tool_inventory(repo_path),
        "mlflow_run_id": mlflow_run_id,
        "mlflow_experiment": run.info.experiment_id,
    }

The MLflow integration is particularly important here. MLflow's run metadata contains the training configuration, parameter choices, and framework versions that Section 2 requires. If you are not using MLflow, Weights & Biases, or a comparable experiment tracker, Section 2 automation is significantly harder — and you should add one before your conformity assessment, not after.
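
The `extract_tool_inventory` helper referenced in the generator is not shown in this post. One possible shape, sketched here under the assumption that the generator runs in the same environment as training, is to ask the standard library which versions are actually installed. The function name `tool_inventory` and the package list are illustrative.

```python
from importlib import metadata


def tool_inventory(package_names: list[str]) -> list[dict]:
    """Report the installed version of each named distribution, or None if absent."""
    inventory = []
    for name in package_names:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap instead of failing, so the report shows what is missing
            version = None
        inventory.append({"package": name, "installed_version": version})
    return inventory


print(tool_inventory(["mlflow", "scikit-learn"]))
```

Recording installed versions rather than the version ranges in `requirements.txt` documents what was actually used for a given build.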

Section 3: Training Data Documentation (Automated from DVC and data lineage)

Section 3 is the most technically demanding to automate because it requires evidence about your training data that many teams do not currently collect in a structured form.

# doc_generator/section3_training_data.py

import json
from pathlib import Path

import yaml  # PyYAML

def generate_section3(repo_path: str, dataset_version: str) -> dict:
    """Generate Section 3 from DVC metadata and data quality reports."""
    
    # Read dvc.lock for training data provenance
    dvc_lock_path = Path(repo_path) / "dvc.lock"
    with open(dvc_lock_path) as f:
        dvc_lock = yaml.safe_load(f) or {}
    
    # Extract dataset hash for integrity verification.
    # "prepare_data" is the stage name this example assumes; an empty or
    # missing deps list falls back to "unknown" instead of raising IndexError.
    dataset_stage = dvc_lock.get("stages", {}).get("prepare_data", {})
    deps = dataset_stage.get("deps") or [{}]
    dataset_hash = deps[0].get("md5", "unknown")
    
    # Read bias assessment report (generated by fairness evaluation step)
    bias_report_path = Path(repo_path) / "reports" / "bias_assessment.json"
    bias_report = {}
    if bias_report_path.exists():
        with open(bias_report_path) as f:
            bias_report = json.load(f)
    
    # Read data statistics report
    stats_report_path = Path(repo_path) / "reports" / "data_statistics.json"
    with open(stats_report_path) as f:
        stats = json.load(f)
    
    return {
        "section": "3",
        "dataset_version": dataset_version,
        "dataset_hash": dataset_hash,
        "provenance": extract_data_provenance(repo_path, dvc_lock),
        "preparation_procedures": extract_preparation_steps(repo_path),
        "labelling_procedures": read_annotation_guidelines(repo_path),
        "bias_assessment": bias_report,
        "bias_mitigation_measures": extract_bias_mitigations(repo_path),
        "statistical_characteristics": stats,
        "validation_dataset_split": stats.get("validation_split"),
        "test_dataset_split": stats.get("test_split"),
    }

If your team does not currently run a fairness evaluation step as part of your training pipeline, this is the moment to add one. The Fairlearn library (Microsoft Research, MIT licence) and AIF360 (IBM, Apache licence) both produce structured reports suitable for Annex IV inclusion. Add the fairness evaluation to your training pipeline before your conformity assessment.
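
To illustrate what a structured bias report can contain, here is a dependency-free sketch that computes per-group selection rates and the gap between them (often called the demographic parity difference). It is a stand-in written with the standard library, not Fairlearn's or AIF360's API, and the report layout and toy data are assumptions.

```python
import json
from collections import defaultdict


def selection_rates(predictions: list[int], groups: list[str]) -> dict[str, float]:
    """Share of positive predictions (1) within each group."""
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for prediction, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += prediction
    return {group: positives[group] / totals[group] for group in totals}


def bias_report(predictions: list[int], groups: list[str]) -> dict:
    """Build a JSON-serialisable report with per-group rates and their spread."""
    rates = selection_rates(predictions, groups)
    return {
        "metric": "demographic_parity_difference",
        "selection_rate_by_group": rates,
        "value": max(rates.values()) - min(rates.values()),
    }


# Toy data for illustration only: four predictions for each of two groups.
report = bias_report(
    [1, 0, 1, 1, 0, 0, 1, 0],
    ["a", "a", "a", "a", "b", "b", "b", "b"],
)
print(json.dumps(report, indent=2))
```

Writing the result to `reports/bias_assessment.json` at the end of training is what lets `generate_section3` pick it up without manual steps.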

Section 4: Risk Management System (Linked to your risk register)

Section 4 documentation must reflect your Art.9 risk management system in real time. The automation here is primarily about keeping a structured risk register that can be exported rather than maintaining a separate document.

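# doc_generator/section4_risk_management.py

import json

def _version_key(version: str) -> tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))

def generate_section4(risk_register_path: str, current_version: str) -> dict:
    """Generate Section 4 from structured risk register."""
    
    with open(risk_register_path) as f:
        register = json.load(f)
    
    # Filter to risks applicable to this version.
    # Assumes dotted numeric versions (not commit hashes); a plain string
    # comparison would rank "10.0" below "9.0".
    current = _version_key(current_version)
    version_risks = [
        r for r in register["risks"]
        if _version_key(r.get("introduced_in_version", "0")) <= current
        and _version_key(r.get("resolved_in_version", "9999")) > current
    ]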
    
    return {
        "section": "4",
        "system_version": current_version,
        "risk_identification_methodology": register["methodology"],
        "risks": [
            {
                "id": risk["id"],
                "category": risk["category"],
                "description": risk["description"],
                "severity": risk["severity"],
                "likelihood": risk["likelihood"],
                "mitigation_measures": risk["mitigations"],
                "residual_risk": risk["residual_risk"],
                "testing_evidence_path": risk.get("test_evidence_path"),
                "last_reviewed": risk["last_reviewed"],
            }
            for risk in version_risks
        ],
        "overall_residual_risk_level": calculate_overall_risk(version_risks),
        "post_market_monitoring_link": "section_8",
    }

The risk register format matters enormously here. A spreadsheet is not a machine-readable risk register. Define a JSON schema for your risk register and enforce it from the beginning of your Art.9 process. The investment pays back every time you generate Section 4 documentation automatically rather than extracting it manually from a spreadsheet.
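
A lightweight way to enforce the register format is a validation step that runs on every change to the file. The sketch below checks the fields that `generate_section4` reads; the three-level scale in `ALLOWED_LEVELS` and the example entry are assumptions, and a full JSON Schema would be the more rigorous option.

```python
# Fields that generate_section4 reads from every risk entry.
REQUIRED_RISK_FIELDS = (
    "id", "category", "description", "severity",
    "likelihood", "mitigations", "residual_risk", "last_reviewed",
)
ALLOWED_LEVELS = {"low", "medium", "high"}  # assumed rating scale


def risk_entry_problems(risk: dict) -> list[str]:
    """Return human-readable problems; an empty list means the entry is usable."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_RISK_FIELDS if name not in risk
    ]
    for name in ("severity", "likelihood", "residual_risk"):
        if name in risk and risk[name] not in ALLOWED_LEVELS:
            problems.append(f"{name} must be one of {sorted(ALLOWED_LEVELS)}")
    return problems


# Hypothetical entry with two defects: no mitigations, off-scale severity.
entry = {
    "id": "R-001",
    "category": "bias",
    "description": "Lower recall for under-represented applicant groups",
    "severity": "critical",
    "likelihood": "medium",
    "residual_risk": "low",
    "last_reviewed": "2026-05-01",
}
for problem in risk_entry_problems(entry):
    print(problem)
```

Rejecting malformed entries at commit time keeps the generated Section 4 from silently omitting a risk.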

Sections 5 Through 8: Evidence Aggregation

Sections 5 through 8 are primarily evidence aggregation exercises. The evidence already exists in your infrastructure — it just needs to be collected, versioned, and linked to the specific system version under assessment.

Section 5 (Human oversight): The human oversight mechanisms are documented in your system architecture, but the evidence that they work — the test results showing that override mechanisms function — lives in your test suite. Add a dedicated override mechanism test suite that produces structured JSON output, and include those results in Section 5 automatically.

Section 6 (Testing and validation): Your CI/CD pipeline already runs tests and produces results. The automation here is purely about capturing those results in a format suitable for Annex IV inclusion. Add a test result exporter to your pipeline that formats results as structured documentation rather than (only) as CI/CD pass/fail signals.

Section 7 (Standards applied): This section benefits from a static manifest file that lists the standards your system is designed to conform to, along with the evidence artefacts that demonstrate conformity. Maintain this manifest in your repository; it changes infrequently and is not worth automating beyond structured export.

Section 8 (Post-market monitoring): This section should be populated from the same monitoring infrastructure described in the first post in this series. Your Art.72 monitoring pipeline already defines the indicators, thresholds, and data collection mechanisms. Export them directly into Section 8 rather than maintaining a separate description.
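
For Sections 5 and 6, the test result exporter can be as small as a converter from the JUnit XML that pytest writes with its `--junitxml` option into a JSON-friendly summary. This is a sketch under that assumption; the function name `summarise_junit`, the output layout, and the sample test names are illustrative.

```python
import xml.etree.ElementTree as ET


def summarise_junit(xml_text: str) -> dict:
    """Convert a JUnit XML report into per-test outcomes plus totals."""
    root = ET.fromstring(xml_text)
    cases = []
    for case in root.iter("testcase"):
        # A <failure> or <error> child marks a failed test, <skipped> a skipped one
        if case.find("failure") is not None or case.find("error") is not None:
            outcome = "failed"
        elif case.find("skipped") is not None:
            outcome = "skipped"
        else:
            outcome = "passed"
        cases.append({
            "name": f'{case.get("classname", "")}::{case.get("name", "")}',
            "outcome": outcome,
        })
    counts = {
        label: sum(case["outcome"] == label for case in cases)
        for label in ("passed", "failed", "skipped")
    }
    return {"counts": counts, "cases": cases}


# Hand-written sample report with one passing and one failing override test.
SAMPLE = """<testsuites><testsuite name="pytest" tests="2">
<testcase classname="tests.test_override" name="test_operator_can_interrupt"/>
<testcase classname="tests.test_override" name="test_override_latency">
<failure message="too slow"/></testcase>
</testsuite></testsuites>"""
print(summarise_junit(SAMPLE)["counts"])
```

Keeping the override tests in their own module makes it easy to split the same report into Section 5 evidence and general Section 6 results.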

The Documentation Assembly Pipeline

With section generators in place, the assembly step combines all sections into a complete, versioned package:

# doc_generator/assembler.py

from dataclasses import dataclass
from typing import Optional
import json
import hashlib
from pathlib import Path
from datetime import date

@dataclass
class AnnexIVPackage:
    system_id: str
    system_version: str
    generated_at: str
    sections: dict
    package_hash: str

def assemble_annex_iv_package(
    repo_path: str,
    model_version: str,
    mlflow_run_id: str,
    dataset_version: str,
) -> AnnexIVPackage:
    """Assemble complete Annex IV documentation package."""
    
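    # Package-qualified imports so the module also works when run as
    # `python -m doc_generator.assembler`. The Section 5-8 helpers called
    # below are not shown in this post and need to be imported the same way.
    from doc_generator.section1_general import generate_section1
    from doc_generator.section2_architecture import generate_section2
    from doc_generator.section3_training_data import generate_section3
    from doc_generator.section4_risk_management import generate_section4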
    
    sections = {
        "1": generate_section1(repo_path, model_version),
        "2": generate_section2(repo_path, mlflow_run_id),
        "3": generate_section3(repo_path, dataset_version),
        "4": generate_section4(f"{repo_path}/compliance/risk_register.json", model_version),
        "5": generate_section5_from_test_results(repo_path, model_version),
        "6": generate_section6_from_ci_results(repo_path, mlflow_run_id),
        "7": load_standards_manifest(repo_path),
        "8": generate_section8_from_monitoring_config(repo_path),
    }
    
    package_content = json.dumps(sections, sort_keys=True)
    package_hash = hashlib.sha256(package_content.encode()).hexdigest()
    
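    # generate_section1 does not return the raw manifest, so read the system ID
    # from the manifest file directly (assumes it defines a "system_id" field)
    manifest_path = Path(repo_path) / "compliance" / "system_manifest.json"
    with open(manifest_path) as f:
        system_id = json.load(f)["system_id"]
    
    package = AnnexIVPackage(
        system_id=system_id,
        system_version=model_version,
        generated_at=date.today().isoformat(),
        sections=sections,
        package_hash=package_hash,
    )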
    
    # Write versioned package to documentation store
    output_path = Path(repo_path) / "compliance" / "annex_iv" / f"v{model_version}.json"
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w") as f:
        json.dump(package.__dict__, f, indent=2, default=str)
    
    return package

def trigger_if_documentation_relevant_change(changed_files: list[str]) -> bool:
    """Determine whether changed files warrant a documentation update."""
    
    documentation_triggers = [
        "requirements.txt",
        "pipeline/config.yaml",
        "compliance/system_manifest.json",
        "compliance/risk_register.json",
        "models/",
        "data/",
        "training/",
    ]
    
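    # Match on repo-relative path prefixes: a substring test would also fire
    # for unrelated paths such as "docs/metadata/notes.md" (contains "data/")
    return any(
        changed_file.startswith(tuple(documentation_triggers))
        for changed_file in changed_files
    )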
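
The assembler module as shown defines functions but no command-line entry point, while the CI workflow in this post invokes it with `--repo-path`, `--model-version`, `--mlflow-run-id`, and `--dataset-version`. A minimal sketch of the missing wrapper, assuming those four flags are all required; `build_parser` and `main` are illustrative names, and the call to `assemble_annex_iv_package` is left as a comment so the snippet runs on its own.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags that the CI workflow passes to the assembler."""
    parser = argparse.ArgumentParser(prog="doc_generator.assembler")
    parser.add_argument("--repo-path", required=True)
    parser.add_argument("--model-version", required=True)
    parser.add_argument("--mlflow-run-id", required=True)
    parser.add_argument("--dataset-version", required=True)
    return parser


def main(argv: Optional[list[str]] = None) -> dict:
    """Parse arguments; argv=None makes argparse read sys.argv."""
    args = build_parser().parse_args(argv)
    # In the real module, call assemble_annex_iv_package(**vars(args)) here
    return vars(args)


print(main([
    "--repo-path", ".",
    "--model-version", "1.4.0",
    "--mlflow-run-id", "abc123",
    "--dataset-version", "2024.1",
]))
```

argparse converts the dashed flag names to underscored attribute names, so the parsed values line up with the keyword arguments of `assemble_annex_iv_package`.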

Integrate this into your GitHub Actions or GitLab CI pipeline:

# .github/workflows/annex_iv_documentation.yml
name: Generate Annex IV Documentation

on:
  push:
    branches: [main, release/*]
    paths:
      - 'models/**'
      - 'training/**'
      - 'data/**'
      - 'compliance/system_manifest.json'
      - 'compliance/risk_register.json'
      - 'requirements.txt'

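jobs:
  generate-documentation:
    # The env block below reads outputs from `train` and `prepare-data`.
    # Those jobs are assumed to be defined in this workflow and must be listed
    # here, otherwise the `needs.*.outputs` expressions resolve to empty strings.
    needs: [train, prepare-data]
    runs-on: ubuntu-latest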
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install documentation generator
        run: pip install -r doc_generator/requirements.txt

      - name: Generate Annex IV package
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
          MODEL_VERSION: ${{ github.sha }}
          MLFLOW_RUN_ID: ${{ needs.train.outputs.mlflow_run_id }}
          DATASET_VERSION: ${{ needs.prepare-data.outputs.dataset_version }}
        run: |
          python -m doc_generator.assembler \
            --repo-path . \
            --model-version "$MODEL_VERSION" \
            --mlflow-run-id "$MLFLOW_RUN_ID" \
            --dataset-version "$DATASET_VERSION"

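      - name: Upload documentation package
        uses: actions/upload-artifact@v4
        with:
          name: annex-iv-${{ github.sha }}
          path: compliance/annex_iv/
          # Short-lived convenience copy only: GitHub caps artifact retention
          # far below ten years (90 days by default), so long-term retention
          # relies on the compliance store step below
          retention-days: 90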

      - name: Store in compliance documentation store
        run: |
          python -m doc_generator.uploader \
            --package-path compliance/annex_iv/v${{ github.sha }}.json \
            --documentation-store ${{ secrets.COMPLIANCE_STORE_URL }}

Documentation Retention and NCA Inspection Readiness

The EU AI Act requires providers to maintain technical documentation and to make it available to national competent authorities on request. For high-risk AI systems, Article 18 sets the retention period: the provider must keep the technical documentation at the disposal of the authorities for 10 years after the system has been placed on the market or put into service.

Your documentation store must satisfy two operational requirements: it must index documentation by system version so you can retrieve the documentation that was current at any specific point in time, and it must support authenticated access so that NCA inspectors can be granted time-limited access to the documentation for a specific system version without accessing your broader engineering systems.

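# compliance/documentation_store.py

# S3CompatibleStorage, encrypt_package/decrypt_package, generate_time_limited_token,
# now_plus_hours, hash_token, DocumentationNotFoundError and the record_access_event
# method are placeholders for your own implementations and are not shown here.
from doc_generator.assembler import AnnexIVPackage

class ComplianceDocumentationStore: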
    """
    Documentation store with NCA-ready access controls.
    
    Stores versioned Annex IV packages and generates time-limited
    inspector access tokens for NCA audit requests.
    """
    
    def __init__(self, storage_backend: str, encryption_key: str):
        self.storage = S3CompatibleStorage(storage_backend)
        self.encryption_key = encryption_key
    
    def store_package(self, package: AnnexIVPackage) -> str:
        """Store documentation package and return storage reference."""
        
        # Encrypt package for storage
        encrypted = encrypt_package(package, self.encryption_key)
        
        # Store with version-indexed key
        storage_key = (
            f"annex_iv/"
            f"{package.system_id}/"
            f"{package.system_version}/"
            f"{package.generated_at}_"
            f"{package.package_hash[:8]}.json.enc"
        )
        
        self.storage.put(storage_key, encrypted)
        
        # Record in audit log
        self.record_access_event("STORE", package.system_id, package.system_version)
        
        return storage_key
    
    def generate_nca_access_token(
        self,
        system_id: str,
        system_version: str,
        valid_hours: int = 48,
        authority_name: str = "",
        authority_reference: str = "",
    ) -> str:
        """
        Generate time-limited NCA access token for specific system version.
        
        Records the access grant in the compliance audit log.
        """
        
        token = generate_time_limited_token(
            system_id=system_id,
            system_version=system_version,
            valid_until=now_plus_hours(valid_hours),
            scope="annex_iv_read",
        )
        
        # Mandatory audit logging of NCA access grant
        self.record_access_event(
            "NCA_ACCESS_GRANT",
            system_id,
            system_version,
            metadata={
                "authority_name": authority_name,
                "authority_reference": authority_reference,
                "valid_hours": valid_hours,
                "token_hash": hash_token(token),
            }
        )
        
        return token
    
    def retrieve_for_version(self, system_id: str, system_version: str) -> AnnexIVPackage:
        """Retrieve the most recent documentation package for a specific system version."""
        
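        prefix = f"annex_iv/{system_id}/{system_version}/"
        # Keys sort by generated_at, which is a date: if several packages can be
        # produced for one version on the same day, store a full timestamp instead
        keys = sorted(self.storage.list(prefix))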
        
        if not keys:
            raise DocumentationNotFoundError(system_id, system_version)
        
        latest_key = keys[-1]
        encrypted = self.storage.get(latest_key)
        package = decrypt_package(encrypted, self.encryption_key)
        
        self.record_access_event("RETRIEVE", system_id, system_version)
        
        return package
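
The store calls `generate_time_limited_token` without showing it. One possible shape, sketched with the standard library only, is an HMAC-signed payload that carries the scope and an expiry time. The function names `issue_token` and `verify_token` and the claim layout are assumptions; a production system might instead use an established token library or the pre-signed URL feature of its object storage.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def issue_token(secret: bytes, system_id: str, system_version: str,
                valid_hours: int, now: Optional[float] = None) -> str:
    """Create a signed token scoped to one system version with an expiry."""
    issued_at = time.time() if now is None else now
    claims = {
        "system_id": system_id,
        "system_version": system_version,
        "scope": "annex_iv_read",
        "exp": int(issued_at + valid_hours * 3600),
    }
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def verify_token(secret: bytes, token: str,
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and unexpired, else None."""
    payload, _, signature = token.partition(".")
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    return claims if current < claims["exp"] else None


token = issue_token(b"demo-secret", "credit-scoring", "1.4.0", valid_hours=48)
print(verify_token(b"demo-secret", token))
```

The `now` parameter exists so that expiry behaviour can be tested without waiting; callers in production would leave it unset.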

For EU-compliant storage, run this on infrastructure subject to EU jurisdiction. Hetzner Object Storage (Nuremberg/Falkenstein data centres), IONOS S3-compatible Object Storage, and OVHcloud Object Storage all provide S3-compatible APIs under EU-only legal jurisdiction without CLOUD Act exposure.

The Continuous Documentation Maintenance Model

The key operational shift that documentation automation enables is moving from a point-in-time documentation model to a continuous documentation model. Under the point-in-time model, documentation is produced during the conformity assessment and then manually updated — infrequently and incompletely — as the system evolves. Under the continuous model, documentation is produced automatically on every material change and is always current.

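The documentation pipeline trigger conditions should be tuned to your update cadence. At minimum, trigger on the changes the workflow above already watches: new or changed model artefacts, training code, and datasets; edits to the system manifest or the risk register; and dependency updates.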

The full assembly only runs on model deployment events. Individual section updates run on their respective triggers and are incorporated into the next full assembly.
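
Per-section triggering can be sketched as a lookup from changed paths to the sections they affect. The mapping for `data/` follows the model update example earlier in this post (Sections 3, 4, 6, and 8); every other row in `SECTION_TRIGGERS` is an assumption to adapt to your own repository layout.

```python
# Path prefix -> sections to regenerate. Only the "data/" row is taken from
# the update-cycle example in this post; the rest are illustrative guesses.
SECTION_TRIGGERS: dict[str, set[str]] = {
    "data/": {"3", "4", "6", "8"},
    "training/": {"2", "3", "6"},
    "models/": {"2", "4", "6", "8"},
    "requirements.txt": {"1", "2"},
    "pipeline/config.yaml": {"2"},
    "compliance/system_manifest.json": {"1"},
    "compliance/risk_register.json": {"4"},
}


def sections_to_regenerate(changed_files: list[str]) -> list[str]:
    """Return the sorted section numbers affected by a set of changed paths."""
    affected: set[str] = set()
    for path in changed_files:
        for prefix, sections in SECTION_TRIGGERS.items():
            if path.startswith(prefix):
                affected |= sections
    return sorted(affected)


print(sections_to_regenerate(["data/train.parquet", "README.md"]))
```

The section artefacts regenerated this way can be cached and picked up by the next full assembly on deployment.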

EU-Native Tool Stack

The following EU-native or EU-jurisdiction tools integrate well with the documentation pipeline described above:

| Component | EU-Native Option | Integration Point |
| --- | --- | --- |
| Experiment tracking | Determined AI (San Francisco, but EU-deployable) / MLflow self-hosted on Hetzner | Sections 2 and 6 |
| Data versioning | DVC (open source, self-hosted) | Section 3 |
| Risk register | Custom JSON schema in repository / Jira EU-managed instance | Section 4 |
| Test results | pytest with structured JSON output | Sections 5 and 6 |
| Compliance storage | Hetzner Object Storage / IONOS Object Storage / OVHcloud | Documentation store |
| CI/CD | GitLab (self-managed on EU infrastructure) / Forgejo self-hosted | Pipeline runner |
| Bias evaluation | Fairlearn (open source, self-hosted) | Section 3 |

The documentation generator itself is a Python service that you run in your own infrastructure. It does not transmit documentation content to any third-party service.

30-Item Checklist: Annex IV Documentation Automation

Foundation (complete before automation)

Section generators (implement and test)

Assembly and storage (wire the pipeline)

CI/CD integration (automation triggers)

NCA inspection readiness

Ongoing governance

What Comes Next in This Series

This post has covered the documentation automation layer. The next post in this series covers Art.50 GPAI watermarking pipelines — the automated content labelling infrastructure required for general-purpose AI model providers before the August 2026 enforcement deadline. The fifth and final post assembles all automation layers into a full compliance automation stack with a reference implementation.

The posts in this series build on each other. The Art.72 monitoring infrastructure from the first post feeds into Section 8 of your Annex IV documentation. The Art.73 incident detection pipeline from the second post triggers documentation update events when an incident affects system specifications. The documentation pipeline from this post provides the evidence store that your Art.73 incident reports reference.

Compliance automation is not a collection of independent tools. It is a single pipeline with the technical documentation package as its output and continuous monitoring as its operational mode.


This post is part of the sota.io EU AI Act Compliance Automation series. Previous posts: Art.72 Post-Market Monitoring · Art.73 Incident Detection. Next: Art.50 GPAI Watermarking Pipelines.
