2026-06-03·5 min read·sota.io Team

EU AI Act Art.15 Accuracy Testing: Automated Robustness & Cybersecurity CI/CD Gates for High-Risk AI 2026

Post #1468 — EU AI Act CICD Compliance Testing Series #2/5


In Part 1 of this series, we built the scaffolding for an EU AI Act CI/CD compliance pipeline. Now we go deeper into the most technically demanding obligation: EU AI Act Art.15 — Accuracy, Robustness, and Cybersecurity.

Art.15 is not about paperwork. It mandates that high-risk AI systems must actually perform at declared accuracy levels, remain resilient against errors and adversarial inputs, and be protected against cybersecurity threats — throughout their lifecycle. This post shows you how to codify all three dimensions as automated gates in your CI/CD pipeline.

August 2, 2026 is 60 days away. Every high-risk AI provider needs these gates deployed before then.


What Art.15 Actually Requires

EU AI Act Art.15 imposes three distinct obligations on high-risk AI system providers:

1. Accuracy — Declared and Maintained

High-risk AI systems must achieve an appropriate level of accuracy for their intended purpose. Critically, providers must declare the accuracy levels and the relevant accuracy metrics in the instructions for use (linked to the EU AI database registration). This creates an auditable commitment: the accuracy you declare is the accuracy the national competent authority (NCA) can test.

What this means for CI/CD: your pipeline must verify that every model version meets the declared accuracy threshold before deployment. A model that drifts below its declared metric is an Art.15 violation.

2. Robustness — Resilience Throughout the Lifecycle

Art.15 requires high-risk AI systems to be resilient with regard to errors, faults, or inconsistencies, both in inputs and in the system itself. This means your system should produce consistent, reliable outputs even when it encounters out-of-distribution inputs, perturbed inputs, or corrupted and malformed data.

3. Cybersecurity — Protection Against Manipulation

The cybersecurity dimension of Art.15 requires protection against attempts by third parties to exploit system vulnerabilities to alter the AI system's use, outputs, or performance. This connects directly to your model supply chain security: the weights, training pipelines, and inference endpoints are all in scope.


Translating Art.15 into CI/CD Gates

Here is the mapping from Art.15 obligations to concrete pipeline gates:

| Art.15 Dimension | CI/CD Gate | Pass Condition |
|---|---|---|
| Accuracy | Performance regression check | Accuracy ≥ declared threshold |
| Accuracy | Metric drift detection | No statistically significant drop vs baseline |
| Robustness | Edge case / OOD tests | Graceful degradation, no crashes |
| Robustness | Adversarial perturbation test | Accuracy drop <10% under perturbation |
| Robustness | Data corruption injection | System handles malformed inputs |
| Cybersecurity | Model integrity check | Hash matches registered artifact |
| Cybersecurity | Dependency vulnerability scan | No critical CVEs in model dependencies |
| Cybersecurity | Inference endpoint security test | No model inversion / extraction attacks |

Gate 1: Accuracy Regression Check

This is the foundation gate. Every deployment must verify the model meets its declared accuracy threshold.

# ci_gates/art15_accuracy_gate.py
import json
import sys
from pathlib import Path

def check_accuracy_gate(
    model_metrics_path: str,
    declared_threshold: float,
    metric_name: str = "accuracy"
) -> dict:
    """
    EU AI Act Art.15 accuracy gate.
    Verifies model meets its declared accuracy threshold before deployment.
    """
    with open(model_metrics_path) as f:
        metrics = json.load(f)
    
    current_accuracy = metrics.get(metric_name)
    if current_accuracy is None:
        return {
            "gate": "art15_accuracy",
            "status": "BLOCKED",
            "reason": f"Metric '{metric_name}' not found in evaluation output",
            "eu_ai_act_article": "Art.15"
        }
    
    passed = current_accuracy >= declared_threshold
    return {
        "gate": "art15_accuracy",
        "status": "PASSED" if passed else "BLOCKED",
        "metric": metric_name,
        "current_value": current_accuracy,
        "declared_threshold": declared_threshold,
        "delta": current_accuracy - declared_threshold,
        "eu_ai_act_article": "Art.15",
        "deployment_blocked": not passed,
        "remediation": None if passed else (
            f"Model accuracy {current_accuracy:.4f} is below declared threshold "
            f"{declared_threshold:.4f}. Retraining or threshold revision required."
        )
    }

if __name__ == "__main__":
    # Load from compliance config
    config = json.loads(Path("compliance/art15_config.json").read_text())
    result = check_accuracy_gate(
        model_metrics_path="artifacts/evaluation_metrics.json",
        declared_threshold=config["declared_accuracy_threshold"],
        metric_name=config.get("primary_metric", "accuracy")
    )
    print(json.dumps(result, indent=2))
    if result["status"] == "BLOCKED":
        sys.exit(1)  # Fail the CI build

The compliance/art15_config.json is the source of truth linking your pipeline to your EU AI database declaration:

{
  "system_name": "High-Risk Recruitment AI v3",
  "eu_ai_act_article": "Art.15",
  "declared_accuracy_threshold": 0.92,
  "primary_metric": "f1_score",
  "accuracy_metric_source": "EU_AI_Database_Registration_ID_DE-2026-HR-0047",
  "last_updated": "2026-05-15"
}
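
The mapping table also lists a metric drift gate ("no statistically significant drop vs baseline"), which the threshold check above does not cover. One possible way to approximate it is a one-sided two-proportion z-test on correct-prediction counts. The sketch below is an illustration, not part of the original pipeline: the function names, the `art15_metric_drift` gate label, and the `alpha=0.05` default are all assumptions, and the test treats the baseline and candidate evaluations as independent samples. If both models are scored on the same test set, a paired test such as McNemar's would be a better fit.

```python
from statistics import NormalDist


def accuracy_drop_p_value(
    baseline_correct: int,
    baseline_total: int,
    candidate_correct: int,
    candidate_total: int,
) -> float:
    """One-sided two-proportion z-test: is candidate accuracy lower than baseline?"""
    p_base = baseline_correct / baseline_total
    p_cand = candidate_correct / candidate_total
    pooled = (baseline_correct + candidate_correct) / (baseline_total + candidate_total)
    std_err = (pooled * (1 - pooled) * (1 / baseline_total + 1 / candidate_total)) ** 0.5
    if std_err == 0:
        # Both runs are all-correct or all-wrong, so there is no evidence of a drop.
        return 1.0
    z = (p_base - p_cand) / std_err
    # Small p-values mean the observed drop is unlikely to be sampling noise.
    return 1.0 - NormalDist().cdf(z)


def check_drift_gate(
    baseline_correct: int,
    baseline_total: int,
    candidate_correct: int,
    candidate_total: int,
    alpha: float = 0.05,  # hypothetical significance level, tune per system
) -> dict:
    """Hypothetical drift gate in the same result shape as the accuracy gate."""
    p_value = accuracy_drop_p_value(
        baseline_correct, baseline_total, candidate_correct, candidate_total
    )
    passed = p_value >= alpha
    return {
        "gate": "art15_metric_drift",
        "status": "PASSED" if passed else "BLOCKED",
        "baseline_accuracy": baseline_correct / baseline_total,
        "candidate_accuracy": candidate_correct / candidate_total,
        "p_value": p_value,
        "alpha": alpha,
        "eu_ai_act_article": "Art.15",
    }
```

For example, `check_drift_gate(950, 1000, 900, 1000)` blocks (a five-point drop on 1,000 samples is far outside sampling noise), while `check_drift_gate(950, 1000, 948, 1000)` passes.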

Gate 2: Robustness Test Suite

Robustness testing verifies the system behaves correctly under stress conditions. The key Art.15 insight: robustness isn't about perfection, it's about graceful degradation and consistency.

# ci_gates/art15_robustness_gate.py
import numpy as np
from typing import Callable

class Art15RobustnessGate:
    """
    EU AI Act Art.15 robustness gate.
    Tests model resilience against perturbations and edge cases.
    """
    
    def __init__(self, model_fn: Callable, baseline_accuracy: float):
        self.model = model_fn
        self.baseline_accuracy = baseline_accuracy
        self.results = []
    
    def test_input_perturbation(
        self, 
        test_data, 
        labels,
        noise_std: float = 0.05,
        max_accuracy_drop: float = 0.10
    ) -> dict:
        """Test accuracy under Gaussian noise perturbation."""
        # A fixed seed keeps the gate deterministic across CI runs.
        rng = np.random.default_rng(0)
        noisy_data = test_data + rng.normal(0, noise_std, test_data.shape)
        perturbed_accuracy = self._evaluate(noisy_data, labels)
        accuracy_drop = self.baseline_accuracy - perturbed_accuracy
        passed = accuracy_drop <= max_accuracy_drop
        return {
            "test": "input_perturbation",
            "noise_std": noise_std,
            "baseline_accuracy": self.baseline_accuracy,
            "perturbed_accuracy": perturbed_accuracy,
            "accuracy_drop": accuracy_drop,
            "max_allowed_drop": max_accuracy_drop,
            "status": "PASSED" if passed else "BLOCKED"
        }
    
    def test_missing_values(self, test_data, labels) -> dict:
        """Test behavior with missing/NaN inputs — common real-world failure mode."""
        corrupted = test_data.copy().astype(float)
        # 5% missing, drawn with a fixed seed so CI runs are reproducible
        mask = np.random.default_rng(0).random(corrupted.shape) < 0.05
        corrupted[mask] = np.nan
        
        try:
            accuracy = self._evaluate(corrupted, labels)
            crashed = False
        except Exception:
            accuracy = 0.0
            crashed = True
        
        return {
            "test": "missing_values",
            "missing_rate": 0.05,
            "accuracy_with_missing": accuracy,
            "crashed": crashed,
            "status": "PASSED" if not crashed and accuracy > 0.5 else "BLOCKED"
        }
    
    def test_edge_cases(self, edge_case_fixtures: list) -> dict:
        """Test against known edge case fixtures (regulatory requirement for Annex III systems)."""
        failures = []
        for fixture in edge_case_fixtures:
            try:
                prediction = self.model(fixture["input"])
                if not fixture["expected_behavior"](prediction):
                    failures.append(fixture["name"])
            except Exception as e:
                failures.append(f"{fixture['name']} (crashed: {e})")
        
        return {
            "test": "edge_cases",
            "total_fixtures": len(edge_case_fixtures),
            "failures": failures,
            "failure_count": len(failures),
            "status": "PASSED" if len(failures) == 0 else "BLOCKED"
        }
    
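    def _evaluate(self, data, labels) -> float:
        predictions = np.asarray(self.model(data))
        # Cast to a plain float so the result serializes cleanly with json.dumps.
        return float(np.mean(predictions == np.asarray(labels)))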
    
    def run_all(self, test_data, labels, edge_cases) -> dict:
        results = [
            self.test_input_perturbation(test_data, labels),
            self.test_missing_values(test_data, labels),
            self.test_edge_cases(edge_cases)
        ]
        all_passed = all(r["status"] == "PASSED" for r in results)
        return {
            "gate": "art15_robustness",
            "eu_ai_act_article": "Art.15",
            "overall_status": "PASSED" if all_passed else "BLOCKED",
            "tests": results,
            "blocked_tests": [r["test"] for r in results if r["status"] == "BLOCKED"]
        }
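
The `test_edge_cases` method expects each fixture to be a dict with a `name`, an `input`, and an `expected_behavior` callable that returns True when the prediction is acceptable. The original post does not show what such fixtures look like, so the following is a minimal, standard-library-only sketch under assumed names: `toy_model`, `EDGE_CASE_FIXTURES`, and `run_fixtures` are hypothetical stand-ins, and the "abstain on unusable input" behavior is one possible definition of graceful degradation, not a requirement taken from Art.15.

```python
import math


def toy_model(features: list[float]) -> dict:
    """Hypothetical stand-in model: scores finite inputs, abstains otherwise."""
    finite = [x for x in features if math.isfinite(x)]
    if not finite:
        # Graceful degradation: refuse to predict instead of crashing.
        return {"label": None, "abstained": True}
    score = sum(finite) / len(finite)
    return {"label": int(score > 0), "abstained": False}


# Each fixture pairs an awkward input with a predicate on the prediction.
EDGE_CASE_FIXTURES = [
    {
        "name": "empty_input",
        "input": [],
        "expected_behavior": lambda p: p["abstained"],
    },
    {
        "name": "all_nan",
        "input": [float("nan")] * 3,
        "expected_behavior": lambda p: p["abstained"],
    },
    {
        "name": "extreme_magnitudes",
        "input": [1e12, -1e12, 3.0],
        "expected_behavior": lambda p: p["label"] in (0, 1),
    },
]


def run_fixtures(model, fixtures: list[dict]) -> list[str]:
    """Mirror of the test_edge_cases loop: collect names of failing fixtures."""
    failures = []
    for fixture in fixtures:
        try:
            if not fixture["expected_behavior"](model(fixture["input"])):
                failures.append(fixture["name"])
        except Exception as exc:
            failures.append(f"{fixture['name']} (crashed: {exc})")
    return failures
```

Keeping fixtures as data makes it straightforward to version them alongside the model and to add a new case whenever a production incident reveals one.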

Gate 3: Cybersecurity Checks

The cybersecurity dimension of Art.15 requires protecting the AI system against manipulation. In a CI/CD context, that starts with two checks: model artifact integrity and dependency vulnerability scanning.

3a. Model Artifact Integrity

Every model artifact in your pipeline should be hash-verified. If the weights were tampered with between training and deployment, the hash won't match.

# ci_gates/art15_model_integrity.py
import hashlib
import json
from pathlib import Path

def verify_model_integrity(
    model_path: str,
    expected_hash_path: str = "compliance/model_hashes.json"
) -> dict:
    """Verify model file hasn't been tampered with since training."""
    model_data = Path(model_path).read_bytes()
    actual_hash = hashlib.sha256(model_data).hexdigest()
    
    hashes = json.loads(Path(expected_hash_path).read_text())
    model_name = Path(model_path).name
    expected_hash = hashes.get(model_name)
    
    if expected_hash is None:
        return {
            "gate": "art15_model_integrity",
            "status": "BLOCKED",
            "reason": f"No registered hash for {model_name}",
            "action": "Run register_model_hash.py after training to register baseline"
        }
    
    passed = actual_hash == expected_hash
    return {
        "gate": "art15_model_integrity",
        "eu_ai_act_article": "Art.15",
        "model": model_name,
        "status": "PASSED" if passed else "BLOCKED",
        "hash_match": passed,
        "actual_sha256": actual_hash[:16] + "...",
        "remediation": None if passed else "Model file modified since training registration. Investigate supply chain."
    }
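
The integrity gate above points to a `register_model_hash.py` script that is not shown in this post. A minimal sketch of what such a registration step could look like follows. The function names and the chunked-read approach are assumptions, chosen so that large weight files are hashed without being loaded into memory at once. The registry layout (file name mapped to SHA-256 hex digest) matches what `verify_model_integrity` reads.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-gigabyte weights fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_model_hash(
    model_path: str,
    registry_path: str = "compliance/model_hashes.json",
) -> str:
    """Record the model's SHA-256 under its file name and return the digest."""
    model = Path(model_path)
    registry = Path(registry_path)
    # Keep hashes already registered for other model files.
    hashes = json.loads(registry.read_text()) if registry.exists() else {}
    hashes[model.name] = sha256_of_file(model)
    registry.parent.mkdir(parents=True, exist_ok=True)
    registry.write_text(json.dumps(hashes, indent=2, sort_keys=True))
    return hashes[model.name]
```

Running the registration in the training job, and committing or signing the resulting registry file, keeps the expected hash out of reach of anything that can only modify the deployment stage.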

3b. Dependency Vulnerability Scan

Use EU-hosted or self-hosted SAST/SCA tools to scan model dependencies. Critical CVEs must block deployment.

# .github/workflows/art15-security-gate.yml
- name: EU AI Act Art.15 - Dependency Security Scan
  run: |
    pip install pip-audit
    # pip-audit exits non-zero when it finds vulnerabilities; "|| true" lets
    # the gate script below make the pass/fail decision instead.
    pip-audit --require-hashes -r requirements.txt \
      --format=json > artifacts/security_scan.json || true
    python ci_gates/art15_security_gate.py
  env:
    BLOCK_ON_SEVERITY: CRITICAL

# ci_gates/art15_security_gate.py
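import json, sys

def check_security_gate(scan_results_path: str, block_severity: str = "CRITICAL") -> dict:
    with open(scan_results_path) as f:
        results = json.load(f)
    # pip-audit's JSON report nests findings under dependencies[].vulns and
    # carries no severity field. A finding without a severity is treated as
    # blocking; an enriched report can add "severity" to filter by level.
    deps = results.get("dependencies", []) if isinstance(results, dict) else results
    blocking_vulns = [
        {"package": dep.get("name"), "version": dep.get("version"), "id": v.get("id")}
        for dep in deps
        for v in dep.get("vulns", [])
        if v.get("severity", block_severity) == block_severity
    ]
    passed = len(blocking_vulns) == 0
    return {
        "gate": "art15_cybersecurity",
        "eu_ai_act_article": "Art.15",
        "status": "PASSED" if passed else "BLOCKED",
        "critical_vulnerability_count": len(blocking_vulns),
        "vulnerabilities": blocking_vulns[:5],  # Top 5 for log brevity
        "remediation": None if passed else "Patch or pin affected packages before deployment"
    }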

if __name__ == "__main__":
    result = check_security_gate("artifacts/security_scan.json")
    print(json.dumps(result, indent=2))
    if result["status"] == "BLOCKED":
        sys.exit(1)

Wiring Everything into One Pipeline

Here is a complete GitHub Actions workflow that runs all three Art.15 gates in sequence, with a structured compliance report:

# .github/workflows/eu-ai-act-art15-compliance.yml
name: EU AI Act Art.15 Compliance Gates

on:
  push:
    branches: [main, staging]
  pull_request:

jobs:
  art15-compliance:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      
      - name: Install Dependencies
        run: pip install -r requirements.txt pip-audit

      - name: Art.15 Gate 1 — Accuracy Regression
        id: accuracy_gate
        run: python ci_gates/art15_accuracy_gate.py
        
      - name: Art.15 Gate 2 — Robustness Suite
        id: robustness_gate
        # Assumes the script exposes a CLI entry point that loads the test set and calls run_all().
        run: python ci_gates/art15_robustness_gate.py --run-all
        
      - name: Art.15 Gate 3 — Cybersecurity Scan
        id: security_gate
        run: |
          pip-audit -r requirements.txt --format=json \
            > artifacts/security_scan.json || true
          python ci_gates/art15_security_gate.py

      - name: Generate Art.15 Compliance Report
        if: always()
        run: python ci_gates/generate_art15_report.py
        
      - name: Upload Compliance Artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: art15-compliance-${{ github.sha }}
          path: artifacts/art15_*.json
          retention-days: 365  # EU AI Act Art.12 record-keeping requirement

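The retention-days: 365 setting is deliberate: EU AI Act Art.12 requires providers to keep records that enable demonstration of conformity, and your CI artifacts are part of that audit trail. GitHub caps artifact retention at 90 days on public repositories, so a 365-day window needs a private repository or a copy in external storage.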
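
The workflow calls `ci_gates/generate_art15_report.py`, which this post does not show. As a rough sketch of what that step might do, the code below merges individual gate results into one summary file matching the `artifacts/art15_*.json` upload pattern. Everything here is an assumption: the function names, the report fields, and the choice to treat an empty result list as BLOCKED (no evidence is not the same as passing). It also accounts for the fact that the robustness gate reports `overall_status` while the other gates report `status`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def gate_status(result: dict) -> str:
    """Read a gate's verdict; a missing verdict counts as BLOCKED."""
    return result.get("status") or result.get("overall_status") or "BLOCKED"


def build_art15_report(gate_results: list[dict]) -> dict:
    """Combine per-gate result dicts into one audit-friendly summary."""
    blocked = [
        r.get("gate", "unknown") for r in gate_results if gate_status(r) != "PASSED"
    ]
    # No results at all means no evidence, so the report must not pass.
    overall = "PASSED" if gate_results and not blocked else "BLOCKED"
    return {
        "report": "art15_compliance",
        "eu_ai_act_article": "Art.15",
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "overall_status": overall,
        "blocked_gates": blocked,
        "gates": gate_results,
    }


def write_art15_report(
    gate_results: list[dict],
    out_path: str = "artifacts/art15_report.json",  # hypothetical file name
) -> Path:
    """Write the summary where the upload step's art15_*.json glob finds it."""
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(build_art15_report(gate_results), indent=2))
    return out
```

Note that the gate scripts above print their results to stdout. For this aggregation to work, each gate would also need to write its result dict to a file, or the workflow would need to redirect each gate's output into `artifacts/`.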


EU-Native Tools for Art.15 Gates

Running your compliance gates on EU-hosted infrastructure eliminates CLOUD Act exposure for your audit artifacts:

| Gate | EU-Native Option | Alternative |
|---|---|---|
| Accuracy tracking | MLflow (self-hosted Hetzner) | Weights & Biases EU region |
| Robustness testing | Evidently AI (self-hosted) | IBM OpenScale |
| Data quality | Great Expectations (local) | Soda Core (open source) |
| Security scanning | pip-audit (local) | Semgrep CE |
| CI/CD runner | Gitea Actions (self-hosted) | GitLab CE (EU) |
| Artifact storage | MinIO (Hetzner) | Scaleway Object Storage |

Storing your compliance artifacts — accuracy reports, robustness test results, security scans — on EU infrastructure avoids the scenario where your NCA audit evidence is subject to US subpoena.


Connecting to Your RMS (Art.9)

Art.15 gates don't operate in isolation. They feed directly into your Art.9 Risk Management System:

The CI/CD gate pattern means these connections are automated, not manual review steps that get skipped under deadline pressure.


The 25-Item Art.15 CI/CD Compliance Checklist

Before the August 2, 2026 deadline, verify:

Accuracy Gates

Robustness Gates

Cybersecurity Gates

Pipeline Integration

Cross-Article Integration


What's Next in This Series

We've now covered the general pipeline architecture (Part 1) and accuracy/robustness/cybersecurity gates (Part 2). Coming up:

Each article in this series adds a new layer to the same pipeline. By Part 5, you'll have a fully automated EU AI Act compliance gate system that generates audit-ready evidence with every CI run.


Key Takeaways

Art.15 is a behavioral obligation, not a documentation requirement. The NCA can test whether your declared accuracy is real. Your CI/CD pipeline is how you prove it is — every sprint, not just before the audit.

Three gates cover the three Art.15 dimensions:

  1. Accuracy gate — verifies declared threshold is met on every deployment
  2. Robustness gate — verifies graceful degradation under perturbation, corruption, and edge cases
  3. Cybersecurity gate — verifies model integrity and no critical vulnerabilities in supply chain

Run these gates on EU-hosted infrastructure. Store artifacts for 12 months. Link everything to your EU AI database registration.

The August 2, 2026 deadline means these gates need to be in production, not in planning.


sota.io is an EU-native managed PaaS — deploy your AI compliance pipeline on Hetzner Germany infrastructure, no CLOUD Act exposure, from €9/month. Your CI/CD audit artifacts stay in Europe.
