2026-06-03·5 min read·sota.io Team

EU AI Act Art.15 Accuracy Testing: Automated Robustness & Cybersecurity CI/CD Gates for High-Risk AI 2026

Post #1468: EU AI Act CI/CD Compliance Testing Series #2/5


In Part 1 of this series, we built the scaffolding for an EU AI Act CI/CD compliance pipeline. Now we go deeper into the most technically demanding obligation: EU AI Act Art.15 — Accuracy, Robustness, and Cybersecurity.

Art.15 is not about paperwork. It requires high-risk AI systems to actually perform at their declared accuracy levels, remain resilient against errors and adversarial inputs, and resist cybersecurity threats throughout their lifecycle. This post shows how to codify all three dimensions as automated gates in your CI/CD pipeline.

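August 2, 2026 is 60 days away. That is when Art.15 and the other high-risk obligations start to apply to Annex III systems (high-risk AI embedded in Annex I regulated products follows in August 2027), so providers of Annex III systems need these gates deployed before then.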

What Art.15 Actually Requires

EU AI Act Art.15 imposes three distinct obligations on high-risk AI system providers:

1. Accuracy — Declared and Maintained

High-risk AI systems must achieve an appropriate level of accuracy for their intended purpose. Critically, Art.15(3) requires providers to declare the levels of accuracy and the relevant accuracy metrics in the instructions for use (linked to the EU AI database registration). This creates an auditable commitment: the accuracy you declare is the accuracy the national competent authority (NCA) can test.

What this means for CI/CD: your pipeline must verify that every model version meets the declared accuracy threshold before deployment. A model that drifts below its declared accuracy level is an Art.15 violation.

2. Robustness — Resilience Throughout the Lifecycle

Art.15(4) requires high-risk AI systems to be as resilient as possible regarding errors, faults, or inconsistencies that may occur within the system or the environment in which it operates. In practice, your system should produce consistent, reliable outputs even when encountering:

Corrupted or malformed input data
Unusual edge cases outside the training distribution
Hardware or infrastructure failures
Adversarial inputs designed to degrade performance
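Consistency is the most mechanical of these properties to test: the same input should produce the same output every time. A minimal sketch of a determinism check, assuming your model is exposed as a plain Python callable that maps a NumPy batch to predictions; `check_consistency` and its return fields are illustrative names, not a library API:

```python
import numpy as np
from typing import Callable


def check_consistency(model_fn: Callable[[np.ndarray], np.ndarray],
                      inputs: np.ndarray,
                      n_runs: int = 5) -> dict:
    """Call model_fn on the same inputs n_runs times and compare each later
    run against the first one. Hypothetical helper, not a library API."""
    reference = np.asarray(model_fn(inputs))
    mismatched_runs = 0
    for _ in range(n_runs - 1):
        # Exact comparison: any element that differs from the reference run counts
        if not np.array_equal(np.asarray(model_fn(inputs)), reference):
            mismatched_runs += 1
    return {
        "test": "consistency",
        "n_runs": n_runs,
        "mismatched_runs": mismatched_runs,
        "status": "PASSED" if mismatched_runs == 0 else "BLOCKED",
    }


if __name__ == "__main__":
    weights = np.array([0.5, -1.0])
    deterministic = lambda x: (x @ weights > 0).astype(int)
    print(check_consistency(deterministic, np.array([[1.0, 0.2], [0.1, 0.9]]))["status"])  # PASSED
```

Exact equality suits class labels; if your model returns float scores, swap `np.array_equal` for `np.allclose` with a tolerance you can justify.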

3. Cybersecurity — Protection Against Manipulation

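The cybersecurity dimension of Art.15 (paragraph 5) requires high-risk AI systems to be resilient against attempts by unauthorised third parties to alter their use, outputs, or performance by exploiting system vulnerabilities. The Act names AI-specific attacks explicitly: data poisoning, model poisoning, adversarial examples or model evasion, confidentiality attacks, and model flaws. This connects directly to your model supply chain security: the weights, training pipelines, and inference endpoints are all in scope.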

Translating Art.15 into CI/CD Gates

Here is the mapping from Art.15 obligations to concrete pipeline gates:

Art.15 Dimension	CI/CD Gate	Pass Condition
Accuracy	Performance regression check	Accuracy ≥ declared threshold
Accuracy	Metric drift detection	No statistically significant drop vs baseline
Robustness	Edge case / OOD tests	Graceful degradation, no crashes
Robustness	Adversarial perturbation test	Accuracy drop ≤ 0.10 (absolute) under perturbation
Robustness	Data corruption injection	System handles malformed inputs
Cybersecurity	Model integrity check	Hash matches registered artifact
Cybersecurity	Dependency vulnerability scan	No critical CVEs in model dependencies
Cybersecurity	Inference endpoint security test	Endpoint resists model inversion / extraction probes
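The "metric drift detection" row needs a concrete statistical test behind "no statistically significant drop". A minimal sketch, assuming the metric is a plain accuracy (a proportion of correct predictions) and that both evaluation sets are large enough for a normal approximation. It uses a one-sided two-proportion z-test; the function name, the default alpha of 0.05 and the return fields are illustrative choices, not a prescribed method:

```python
import math


def accuracy_drift_test(baseline_correct: int, baseline_total: int,
                        candidate_correct: int, candidate_total: int,
                        alpha: float = 0.05) -> dict:
    """One-sided two-proportion z-test: is the candidate's accuracy
    significantly LOWER than the baseline's? Hypothetical helper."""
    p_base = baseline_correct / baseline_total
    p_cand = candidate_correct / candidate_total
    # Pooled proportion under the null hypothesis of no difference
    pooled = (baseline_correct + candidate_correct) / (baseline_total + candidate_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / baseline_total + 1 / candidate_total))
    z = (p_base - p_cand) / se if se > 0 else 0.0
    # Upper-tail p-value P(Z >= z) from the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    drifted = p_value < alpha
    return {
        "gate": "art15_metric_drift",
        "baseline_accuracy": p_base,
        "candidate_accuracy": p_cand,
        "z_score": z,
        "p_value": p_value,
        "status": "BLOCKED" if drifted else "PASSED",
    }
```

For example, a baseline at 930/1000 and a candidate at 900/1000 give z ≈ 2.41 and a p-value below 0.05, so the gate blocks. If both models are scored on the same test set, a paired test such as McNemar's uses the data more efficiently; the unpaired version is a simpler, more conservative choice.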

Gate 1: Accuracy Regression Check

This is the foundation gate. Every deployment must verify the model meets its declared accuracy threshold.

# ci_gates/art15_accuracy_gate.py
import json
import sys
from pathlib import Path

def check_accuracy_gate(
    model_metrics_path: str,
    declared_threshold: float,
    metric_name: str = "accuracy"
) -> dict:
    """
    EU AI Act Art.15 accuracy gate.
    Verifies model meets its declared accuracy threshold before deployment.
    """
    with open(model_metrics_path) as f:
        metrics = json.load(f)
    
    current_accuracy = metrics.get(metric_name)
    if current_accuracy is None:
        return {
            "gate": "art15_accuracy",
            "status": "BLOCKED",
            "reason": f"Metric '{metric_name}' not found in evaluation output",
            "eu_ai_act_article": "Art.15"
        }
    
    passed = current_accuracy >= declared_threshold
    return {
        "gate": "art15_accuracy",
        "status": "PASSED" if passed else "BLOCKED",
        "metric": metric_name,
        "current_value": current_accuracy,
        "declared_threshold": declared_threshold,
        "delta": current_accuracy - declared_threshold,
        "eu_ai_act_article": "Art.15",
        "deployment_blocked": not passed,
        "remediation": None if passed else (
            f"Model accuracy {current_accuracy:.4f} is below declared threshold "
            f"{declared_threshold:.4f}. Retraining or threshold revision required."
        )
    }

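if __name__ == "__main__":
    # Load the declared threshold from the compliance config (single source of truth)
    config = json.loads(Path("compliance/art15_config.json").read_text())
    result = check_accuracy_gate(
        model_metrics_path="artifacts/evaluation_metrics.json",
        declared_threshold=config["declared_accuracy_threshold"],
        metric_name=config.get("primary_metric", "accuracy")
    )
    print(json.dumps(result, indent=2))
    # Persist the result so the upload step (artifacts/art15_*.json) keeps it as audit evidence
    Path("artifacts/art15_accuracy.json").write_text(json.dumps(result, indent=2))
    if result["status"] == "BLOCKED":
        sys.exit(1)  # Fail the CI build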

The compliance/art15_config.json file is the single source of truth linking your pipeline threshold to the accuracy level declared in your instructions for use and EU AI database registration:

{
  "system_name": "High-Risk Recruitment AI v3",
  "eu_ai_act_article": "Art.15",
  "declared_accuracy_threshold": 0.92,
  "primary_metric": "f1_score",
  "accuracy_metric_source": "EU_AI_Database_Registration_ID_DE-2026-HR-0047",
  "last_updated": "2026-05-15"
}
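The gate above checks a single metric, while high-stakes classifiers often declare more than one (for example F1 plus recall). A minimal sketch of a multi-metric variant, assuming a hypothetical `declared_metrics` map in the config (metric name to declared minimum) that is not part of the config shown above:

```python
import json


def check_declared_metrics(metrics: dict, declared_metrics: dict) -> dict:
    """Check every declared metric against its threshold (hypothetical helper).
    A metric missing from the evaluation output blocks, so a broken evaluation
    job cannot pass the gate by omission."""
    checks = []
    for name, threshold in declared_metrics.items():
        value = metrics.get(name)
        passed = value is not None and value >= threshold
        checks.append({"metric": name, "value": value,
                       "declared_threshold": threshold, "passed": passed})
    # An empty declaration blocks instead of passing vacuously
    all_passed = bool(checks) and all(c["passed"] for c in checks)
    return {
        "gate": "art15_accuracy",
        "eu_ai_act_article": "Art.15",
        "status": "PASSED" if all_passed else "BLOCKED",
        "checks": checks,
        "failed_metrics": [c["metric"] for c in checks if not c["passed"]],
    }


if __name__ == "__main__":
    # Hypothetical config fragment: {"declared_metrics": {"f1_score": 0.92, "recall": 0.90}}
    config = {"declared_metrics": {"f1_score": 0.92, "recall": 0.90}}
    metrics = {"f1_score": 0.93, "recall": 0.88}
    print(json.dumps(check_declared_metrics(metrics, config["declared_metrics"]), indent=2))
```

With these example values the gate blocks on `recall` even though the primary metric passes, which is the behavior you want when recall is part of your declaration.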

Gate 2: Robustness Test Suite

Robustness testing verifies that the system behaves correctly under stress. The key Art.15 insight: robustness is not about perfection; it is about graceful degradation and consistency.

# ci_gates/art15_robustness_gate.py
import json
import sys
from pathlib import Path
from typing import Callable

import numpy as np

class Art15RobustnessGate:
    """
    EU AI Act Art.15 robustness gate.
    Tests model resilience against perturbations and edge cases.
    """
    
    def __init__(self, model_fn: Callable, baseline_accuracy: float, seed: int = 42):
        self.model = model_fn
        self.baseline_accuracy = baseline_accuracy  # accuracy on the clean test set
        # Seeded generator: the same build always draws the same noise and masks,
        # so a gate result can be reproduced during an audit
        self.rng = np.random.default_rng(seed)
    
    def test_input_perturbation(
        self, 
        test_data, 
        labels,
        noise_std: float = 0.05,
        max_accuracy_drop: float = 0.10
    ) -> dict:
        """Test accuracy under random Gaussian noise.

        This is a random-corruption check, not an adversarial attack. noise_std
        assumes standardized features; max_accuracy_drop is absolute (0.10 = 10 points).
        """
        noisy_data = test_data + self.rng.normal(0, noise_std, test_data.shape)
        perturbed_accuracy = self._evaluate(noisy_data, labels)
        accuracy_drop = self.baseline_accuracy - perturbed_accuracy
        passed = accuracy_drop <= max_accuracy_drop
        return {
            "test": "input_perturbation",
            "noise_std": noise_std,
            "baseline_accuracy": self.baseline_accuracy,
            "perturbed_accuracy": perturbed_accuracy,
            "accuracy_drop": accuracy_drop,
            "max_allowed_drop": max_accuracy_drop,
            "status": "PASSED" if passed else "BLOCKED"
        }
    
    def test_missing_values(
        self,
        test_data,
        labels,
        missing_rate: float = 0.05,
        min_accuracy: float = 0.5
    ) -> dict:
        """Test behavior with missing/NaN inputs, a common real-world failure mode."""
        corrupted = test_data.astype(float)  # astype returns a copy by default
        mask = self.rng.random(corrupted.shape) < missing_rate
        corrupted[mask] = np.nan
        
        try:
            accuracy = self._evaluate(corrupted, labels)
            crashed = False
        except Exception:
            accuracy = 0.0
            crashed = True
        
        return {
            "test": "missing_values",
            "missing_rate": missing_rate,
            "accuracy_with_missing": accuracy,
            "min_accuracy": min_accuracy,
            "crashed": crashed,
            "status": "PASSED" if not crashed and accuracy >= min_accuracy else "BLOCKED"
        }
    
    def test_edge_cases(self, edge_case_fixtures: list) -> dict:
        """Run known edge-case fixtures (supports the Art.15(4) resilience obligation).

        Each fixture is a dict with "name", "input" and an "expected_behavior"
        predicate that receives the model's prediction.
        """
        failures = []
        for fixture in edge_case_fixtures:
            try:
                prediction = self.model(fixture["input"])
                if not fixture["expected_behavior"](prediction):
                    failures.append(fixture["name"])
            except Exception as e:
                failures.append(f"{fixture['name']} (crashed: {e})")
        
        return {
            "test": "edge_cases",
            "total_fixtures": len(edge_case_fixtures),
            "failures": failures,
            "failure_count": len(failures),
            "status": "PASSED" if len(failures) == 0 else "BLOCKED"
        }
    
    def _evaluate(self, data, labels) -> float:
        predictions = np.asarray(self.model(data))
        # Plain float so the result serializes cleanly to JSON
        return float(np.mean(predictions == np.asarray(labels)))
    
    def run_all(self, test_data, labels, edge_cases) -> dict:
        results = [
            self.test_input_perturbation(test_data, labels),
            self.test_missing_values(test_data, labels),
            self.test_edge_cases(edge_cases)
        ]
        all_passed = all(r["status"] == "PASSED" for r in results)
        return {
            "gate": "art15_robustness",
            "eu_ai_act_article": "Art.15",
            "overall_status": "PASSED" if all_passed else "BLOCKED",
            "tests": results,
            "blocked_tests": [r["test"] for r in results if r["status"] == "BLOCKED"]
        }


def write_and_exit(result: dict, out_path: str = "artifacts/art15_robustness.json") -> None:
    """Persist the gate result as a CI artifact and fail the job if any test is BLOCKED."""
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    Path(out_path).write_text(json.dumps(result, indent=2))
    print(json.dumps(result, indent=2))
    sys.exit(0 if result["overall_status"] == "PASSED" else 1)

# The CI workflow runs this file with --run-all. Its __main__ block must load your
# model, held-out test set and edge-case fixtures (project-specific, not shown) and
# call write_and_exit(gate.run_all(...)); without it the step exits 0 and never blocks.
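The perturbation test above uses Gaussian noise, which measures tolerance to random corruption, not to inputs crafted to fool the model; Art.15(5) names those adversarial examples explicitly. A minimal sketch of a genuinely adversarial check, the fast gradient sign method (FGSM), assuming a binary logistic-regression model with weights `w` and bias `b` so the input gradient has a closed form. For neural networks you would take the gradient from your framework or use a dedicated robustness library instead. The function names, epsilon and toy data are illustrative:

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_logistic(x: np.ndarray, y: np.ndarray, w: np.ndarray, b: float,
                  epsilon: float) -> np.ndarray:
    """FGSM for binary logistic regression.

    For the cross-entropy loss, d(loss)/dx = (sigmoid(w.x + b) - y) * w, so each
    input moves epsilon per feature in the sign direction that increases its loss.
    """
    grad = (sigmoid(x @ w + b) - y)[:, None] * w[None, :]
    return x + epsilon * np.sign(grad)


def adversarial_accuracy_drop(x, y, w, b, epsilon=0.1) -> dict:
    """Clean vs FGSM accuracy for the logistic model (hypothetical helper)."""
    predict = lambda data: (sigmoid(data @ w + b) >= 0.5).astype(int)
    clean_acc = float(np.mean(predict(x) == y))
    adv_acc = float(np.mean(predict(fgsm_logistic(x, y, w, b, epsilon)) == y))
    return {"test": "fgsm", "epsilon": epsilon, "clean_accuracy": clean_acc,
            "adversarial_accuracy": adv_acc, "accuracy_drop": clean_acc - adv_acc}
```

The resulting `accuracy_drop` can be compared against the same `max_accuracy_drop` budget the noise test uses. As with `noise_std`, a given epsilon only means something relative to your feature scaling.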

Gate 3: Cybersecurity Checks

The cybersecurity dimension of Art.15 requires protecting the AI system against manipulation. In a CI/CD context, two checks can run automatically on every build:

3a. Model Artifact Integrity

Every model artifact in your pipeline should be hash-verified. If the weights were tampered with between training and deployment, the hash won't match.

# ci_gates/art15_model_integrity.py
import hashlib
import json
from pathlib import Path

def verify_model_integrity(
    model_path: str,
    expected_hash_path: str = "compliance/model_hashes.json"
) -> dict:
    """Verify model file hasn't been tampered with since training."""
    # Hash in streamed chunks so multi-GB weight files are never fully loaded into memory
    with open(model_path, "rb") as f:
        actual_hash = hashlib.file_digest(f, "sha256").hexdigest()  # Python 3.11+
    
    hashes = json.loads(Path(expected_hash_path).read_text())
    model_name = Path(model_path).name
    expected_hash = hashes.get(model_name)
    
    if expected_hash is None:
        return {
            "gate": "art15_model_integrity",
            "eu_ai_act_article": "Art.15",
            "status": "BLOCKED",
            "reason": f"No registered hash for {model_name}",
            "action": "Run register_model_hash.py after training to register baseline"
        }
    
    passed = actual_hash == expected_hash
    return {
        "gate": "art15_model_integrity",
        "eu_ai_act_article": "Art.15",
        "model": model_name,
        "status": "PASSED" if passed else "BLOCKED",
        "hash_match": passed,
        "actual_sha256": actual_hash[:16] + "...",
        "remediation": None if passed else "Model file modified since training registration. Investigate supply chain."
    }
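The blocked branch above points to a `register_model_hash.py` step that this post does not show. A minimal sketch of what it could look like, assuming the same `compliance/model_hashes.json` layout (file name mapped to SHA-256 hex digest) that `verify_model_integrity` reads; the function name is hypothetical. Anyone who can push weights to the repository could also rewrite this registry, so keep it behind stricter access controls than the model artifacts, or sign it:

```python
import hashlib
import json
from pathlib import Path


def register_model_hash(model_path: str,
                        registry_path: str = "compliance/model_hashes.json") -> str:
    """Record the SHA-256 of a freshly trained model file in the hash registry.
    Hypothetical helper mirroring the layout verify_model_integrity expects."""
    digest = hashlib.sha256()
    with open(model_path, "rb") as f:
        # Read in 1 MiB chunks so large weight files are never fully in memory
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    actual_hash = digest.hexdigest()

    registry = Path(registry_path)
    hashes = json.loads(registry.read_text()) if registry.exists() else {}
    model_name = Path(model_path).name
    if model_name in hashes and hashes[model_name] != actual_hash:
        # Refuse silent overwrites: a new model version should get a new file name
        raise ValueError(f"{model_name} is already registered with a different hash")
    hashes[model_name] = actual_hash
    registry.parent.mkdir(parents=True, exist_ok=True)
    registry.write_text(json.dumps(hashes, indent=2, sort_keys=True))
    return actual_hash
```

Refusing to overwrite an existing entry keeps the registry append-only in practice, which makes a tampered registry easier to spot in code review.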

3b. Dependency Vulnerability Scan

Use EU-hosted or self-hosted SAST/SCA tools to scan model dependencies. Critical CVEs must block deployment.

# .github/workflows/art15-security-gate.yml
- name: EU AI Act Art.15 - Dependency Security Scan
  run: |
    pip install pip-audit
    mkdir -p artifacts
    # pip-audit exits non-zero when it finds vulnerabilities; "|| true" leaves
    # the blocking decision (and its log entry) to the gate script below
    pip-audit --require-hashes -r requirements.txt \
      --format json > artifacts/security_scan.json || true
    python ci_gates/art15_security_gate.py

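# ci_gates/art15_security_gate.py
import json
import sys
from pathlib import Path

def check_security_gate(scan_results_path: str) -> dict:
    """Block deployment on known vulnerabilities in pip-audit's JSON report.

    pip-audit lists findings per dependency under dependencies[].vulns[] and
    does not include a severity field, so this gate fails closed on every
    finding. Exclude risk-accepted IDs with pip-audit's --ignore-vuln flag.
    """
    results = json.loads(Path(scan_results_path).read_text())
    findings = [
        {
            "package": dep.get("name"),
            "version": dep.get("version"),
            "id": vuln.get("id"),
            "aliases": vuln.get("aliases", []),  # e.g. CVE or GHSA identifiers
            "fix_versions": vuln.get("fix_versions", []),
        }
        for dep in results.get("dependencies", [])
        for vuln in dep.get("vulns", [])
    ]
    passed = len(findings) == 0
    return {
        "gate": "art15_cybersecurity",
        "eu_ai_act_article": "Art.15",
        "status": "PASSED" if passed else "BLOCKED",
        "vulnerability_count": len(findings),
        "vulnerabilities": findings[:5],  # First 5 for log brevity; full list stays in the raw scan
        "remediation": None if passed else "Patch or pin affected packages before deployment"
    }

if __name__ == "__main__":
    result = check_security_gate("artifacts/security_scan.json")
    print(json.dumps(result, indent=2))
    Path("artifacts/art15_security.json").write_text(json.dumps(result, indent=2))
    if result["status"] == "BLOCKED":
        sys.exit(1)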

Wiring Everything into One Pipeline

Here is a complete GitHub Actions workflow that runs all three Art.15 gates in sequence, with a structured compliance report:

# .github/workflows/eu-ai-act-art15-compliance.yml
name: EU AI Act Art.15 Compliance Gates

on:
  push:
    branches: [main, staging]
  pull_request:

jobs:
  art15-compliance:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      
      - name: Install Dependencies
        run: |
          pip install -r requirements.txt pip-audit
          mkdir -p artifacts

      # Gate 1 reads artifacts/evaluation_metrics.json, which an evaluation
      # step (not shown) must write before this point
      - name: Art.15 Gate 1 — Accuracy Regression
        id: accuracy_gate
        run: python ci_gates/art15_accuracy_gate.py
        
      - name: Art.15 Gate 2 — Robustness Suite
        id: robustness_gate
        run: python ci_gates/art15_robustness_gate.py --run-all
        
      - name: Art.15 Gate 3 — Cybersecurity Scan
        id: security_gate
        run: |
          # pip-audit exits non-zero on findings; the gate script makes the blocking decision
          pip-audit -r requirements.txt --format json \
            > artifacts/security_scan.json || true
          python ci_gates/art15_security_gate.py

      - name: Generate Art.15 Compliance Report
        if: always()
        run: python ci_gates/generate_art15_report.py
        
      - name: Upload Compliance Artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: art15-compliance-${{ github.sha }}
          path: |
            artifacts/art15_*.json
            artifacts/security_scan.json
          # GitHub allows at most 90 days for public and 400 days for private
          # repositories; repository or organization settings can lower the limit
          retention-days: 365

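The retention-days: 365 is deliberate: your CI artifacts are part of the audit trail that demonstrates conformity. Treat one year as a floor. Art.12 governs the event logs your AI system records in operation; if these test reports also serve as the test logs and reports in your Annex IV technical documentation, Art.18 requires keeping that documentation for 10 years after the system is placed on the market, so archive them outside CI artifact storage as well.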
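The workflow's report step calls `ci_gates/generate_art15_report.py`, which this post does not show. A minimal sketch, assuming each gate writes its result to `artifacts/art15_<gate>.json` (as the upload glob implies) and that the registration reference comes from the `accuracy_metric_source` field of `compliance/art15_config.json`. The output file name and report fields are hypothetical; `GITHUB_SHA` is the commit variable GitHub Actions sets on every run:

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path

REPORT_NAME = "art15_report.json"


def generate_art15_report(artifacts_dir: str = "artifacts",
                          config_path: str = "compliance/art15_config.json",
                          commit_sha: str = "unknown") -> dict:
    """Merge per-gate results into one report per deployment attempt (hypothetical)."""
    artifacts = Path(artifacts_dir)
    gate_results = []
    for path in sorted(artifacts.glob("art15_*.json")):
        if path.name == REPORT_NAME:  # skip a report left over from an earlier run
            continue
        gate_results.append({"file": path.name, **json.loads(path.read_text())})
    # Gates report either "status" or "overall_status" (the robustness gate)
    statuses = [r.get("status", r.get("overall_status")) for r in gate_results]
    config = json.loads(Path(config_path).read_text())
    report = {
        "eu_ai_act_article": "Art.15",
        "system_name": config.get("system_name"),
        "registration_reference": config.get("accuracy_metric_source"),
        "commit_sha": commit_sha,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "gates_found": len(gate_results),
        # No gate results at all counts as BLOCKED, never as a pass
        "overall_status": "PASSED" if statuses and all(s == "PASSED" for s in statuses) else "BLOCKED",
        "gates": gate_results,
    }
    (artifacts / REPORT_NAME).write_text(json.dumps(report, indent=2))
    return report


if __name__ == "__main__":
    generate_art15_report(commit_sha=os.environ.get("GITHUB_SHA", "unknown"))
```

Because later gate steps are skipped once an earlier one fails, `gates_found` records how many gates actually ran for that commit.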

EU-Native Tools for Art.15 Gates

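Running your compliance gates on infrastructure operated by EU-based providers reduces CLOUD Act exposure for your audit artifacts. An EU region of a US provider does not give the same protection, because the CLOUD Act reaches data in a US provider's possession, custody, or control wherever it is stored: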

Gate	Self-hosted / EU option	Alternative
Accuracy tracking	MLflow (self-hosted Hetzner)	Weights & Biases EU region
Robustness testing	Evidently AI (self-hosted)	IBM OpenScale
Data quality	Great Expectations (local)	Soda Core (open source)
Security scanning	pip-audit (local)	Semgrep CE
CI/CD runner	Gitea Actions (self-hosted)	GitLab CE (EU)
Artifact storage	MinIO (Hetzner)	Scaleway Object Storage

Storing your compliance artifacts — accuracy reports, robustness test results, security scans — on EU infrastructure avoids the scenario where your NCA audit evidence is subject to US subpoena.

Connecting to Your RMS (Art.9)

Art.15 gates don't operate in isolation. They feed directly into your Art.9 Risk Management System:

Pre-deployment: Art.15 accuracy gate must pass before the Art.9 residual risk sign-off
Post-deployment: Accuracy monitoring triggers Art.9 re-evaluation if drift detected
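Incident response: An Art.15 accuracy failure during production is a potential Art.73 serious incident if it leads to harm within the Art.3(49) definition, such as serious harm to a person's health or an infringement of obligations protecting fundamental rights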

The CI/CD gate pattern means these connections are automated, not manual review steps that get skipped under deadline pressure.
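One way to make those connections automatic is to turn every blocked gate into a structured compliance event rather than just a red CI run. A minimal sketch, assuming your Art.9 RMS tooling can ingest JSON events; the event schema, the `stage` values and the action names are hypothetical, and whether a failure is actually a serious incident remains a human assessment:

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def gate_result_to_event(result: dict, model_version: str,
                         stage: str = "pre_deployment") -> Optional[dict]:
    """Map a BLOCKED gate result to a compliance event; other results yield None."""
    status = result.get("status", result.get("overall_status"))
    if status != "BLOCKED":
        return None
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source_gate": result.get("gate"),
        "eu_ai_act_article": result.get("eu_ai_act_article", "Art.15"),
        "model_version": model_version,
        "stage": stage,  # "pre_deployment" (CI) or "production" (monitoring)
        # Every blocked gate triggers an Art.9 risk review; only failures seen in
        # production are additionally queued for Art.73 serious-incident assessment
        "actions": ["art9_risk_review"] + (["art73_assessment"] if stage == "production" else []),
        "details": result,
    }
```

Keeping the full gate result under `details` means the event alone carries enough evidence for the risk review, without a reviewer digging through CI logs.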

The 25-Item Art.15 CI/CD Compliance Checklist

Before the August 2, 2026 deadline, verify:

Accuracy Gates

Declared accuracy threshold documented in EU AI database registration
art15_config.json links pipeline threshold to EU AI database entry
Accuracy gate blocks deployments that fall below declared threshold
Primary and secondary metrics both checked (e.g., F1 + recall for high-stakes classifiers)
Accuracy gate result logged with timestamp and model version

Robustness Gates

Input perturbation test implemented with defined max accuracy drop
Missing/corrupted input handler tested (no silent failures)
Edge case fixture library maintained and updated per release
Out-of-distribution (OOD) detection integrated into inference path
Consistency test: same input → same output across N runs (determinism check)

Cybersecurity Gates

Model artifact SHA256 hash registered at training time
Hash verification gate blocks deployment on mismatch
Dependency vulnerability scan runs on every PR
CRITICAL CVEs block deployment automatically
Model training pipeline access controls audited (who can push weights?)

Pipeline Integration

All gate results stored as JSON artifacts
Artifact retention set to ≥365 days (Art.12 record-keeping), with test reports kept as long as the technical documentation they belong to (Art.18)
Gate failures generate structured compliance event (not just CI failure)
Compliance report generated for each deployment attempt
Gate results linked to EU AI database registration ID

Cross-Article Integration

Art.15 accuracy gate output feeds into Art.9 RMS pre-deployment check
Post-deployment accuracy monitoring triggers Art.9 re-evaluation on drift
Art.15 cybersecurity gate integrated with Art.9 residual risk assessment
Gate failures logged to Art.12 audit trail
Art.15 violations classified for Art.73 incident assessment

What's Next in This Series

We've now covered the general pipeline architecture (Part 1) and accuracy/robustness/cybersecurity gates (Part 2). Coming up:

Part 3: Art.14 Human Oversight — Automated checkpoint verification for human-in-the-loop requirements
Part 4: Art.12 Record-Keeping — Automated audit trail generation and tamper-evident logging
Part 5: Finale — Complete Compliance-as-Code stack deployed on EU infrastructure before August 2, 2026

Each article in this series adds a new layer to the same pipeline. By Part 5, you'll have a fully automated EU AI Act compliance gate system that generates audit-ready evidence with every CI run.

Key Takeaways

Art.15 is a behavioral obligation, not a documentation requirement. The NCA can test whether your declared accuracy is real. Your CI/CD pipeline is how you prove it is — every sprint, not just before the audit.

Three gates cover the three Art.15 dimensions:

Accuracy gate — verifies declared threshold is met on every deployment
Robustness gate — verifies graceful degradation under perturbation, corruption, and edge cases
Cybersecurity gate — verifies model integrity and no critical vulnerabilities in supply chain

Run these gates on EU-hosted infrastructure. Store artifacts for at least 12 months. Link everything to your EU AI database registration.

The August 2, 2026 deadline means these gates need to be in production, not in planning.

sota.io is an EU-native managed PaaS — deploy your AI compliance pipeline on Hetzner Germany infrastructure, no CLOUD Act exposure, from €9/month. Your CI/CD audit artifacts stay in Europe.
