EU AI Act Art.15 Accuracy Testing: Automated Robustness & Cybersecurity CI/CD Gates for High-Risk AI 2026
Post #1468: EU AI Act CI/CD Compliance Testing Series #2/5
In Part 1 of this series, we built the scaffolding for an EU AI Act CI/CD compliance pipeline. Now we go deeper into the most technically demanding obligation: EU AI Act Art.15 — Accuracy, Robustness, and Cybersecurity.
Art.15 is not about paperwork. It mandates that high-risk AI systems must actually perform at declared accuracy levels, remain resilient against errors and adversarial inputs, and be protected against cybersecurity threats — throughout their lifecycle. This post shows you how to codify all three dimensions as automated gates in your CI/CD pipeline.
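At the time of writing, August 2, 2026 (the date from which the obligations for Annex III high-risk AI systems apply) is 61 days away. Every Annex III high-risk AI provider needs gates like these deployed before then.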
What Art.15 Actually Requires
EU AI Act Art.15 imposes three distinct obligations on high-risk AI system providers:
1. Accuracy — Declared and Maintained
High-risk AI systems must achieve an appropriate level of accuracy for their intended purpose. Critically, providers must declare the accuracy levels and the relevant accuracy metrics in the instructions for use (linked to the EU AI database registration). This creates an auditable commitment: the accuracy you declare is the accuracy the national competent authority (NCA) can test.
What this means for CI/CD: your pipeline must verify that every model version meets the declared accuracy threshold before deployment. A model that drifts below its declared metric is an Art.15 violation.
2. Robustness — Resilience Throughout the Lifecycle
Art.15 requires high-risk AI systems to be as resilient as possible regarding errors, faults, or inconsistencies that may occur within the system or in the environment in which it operates. In practice, your system should produce consistent, reliable outputs even when encountering:
- Corrupted or malformed input data
- Unusual edge cases outside the training distribution
- Hardware or infrastructure failures
- Adversarial inputs designed to degrade performance
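One simple way to catch inputs "outside the training distribution" is a guard in the inference path. Below is a minimal sketch, assuming tabular numeric features: it records per-feature mean and standard deviation at training time and flags any row with a missing value or a feature more than a chosen number of standard deviations from the mean. The class name `ZScoreOODGuard` and the 4-sigma default are illustrative assumptions, not something Art.15 prescribes; density- or embedding-based detectors are stronger options for high-dimensional inputs.

```python
import numpy as np


class ZScoreOODGuard:
    """Hypothetical OOD guard: flags rows whose features lie far outside the
    training distribution, measured in per-feature standard deviations."""

    def __init__(self, train_data: np.ndarray, max_abs_z: float = 4.0):
        self.mean = train_data.mean(axis=0)
        std = train_data.std(axis=0)
        # Zero-variance features fall back to std=1 to avoid division by zero
        self.std = np.where(std > 0, std, 1.0)
        self.max_abs_z = max_abs_z

    def is_out_of_distribution(self, batch: np.ndarray) -> np.ndarray:
        """Return one boolean per row: True if any feature is NaN or exceeds max_abs_z."""
        batch = np.asarray(batch, dtype=float)
        z = np.abs((batch - self.mean) / self.std)
        # NaN comparisons evaluate to False, so missing values are flagged explicitly
        return np.isnan(batch).any(axis=1) | (z > self.max_abs_z).any(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    guard = ZScoreOODGuard(rng.normal(0.0, 1.0, size=(1000, 3)))
    batch = np.array([
        [0.1, -0.2, 0.3],    # typical row
        [0.0, 25.0, 0.0],    # extreme value in the second feature
        [np.nan, 0.0, 0.0],  # missing value
    ])
    print(guard.is_out_of_distribution(batch))  # expected flags: False, True, True
```

Flagged rows can then be routed to a fallback, such as abstaining or human review, instead of being scored silently.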
3. Cybersecurity — Protection Against Manipulation
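The cybersecurity dimension of Art.15 requires high-risk AI systems to be resilient against attempts by unauthorized third parties to alter their use, outputs, or performance by exploiting system vulnerabilities. The Act explicitly lists AI-specific threats: data poisoning, model poisoning, adversarial examples (model evasion), confidentiality attacks, and model flaws. This connects directly to your model supply chain security: the weights, training pipelines, and inference endpoints are all in scope.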
Translating Art.15 into CI/CD Gates
Here is the mapping from Art.15 obligations to concrete pipeline gates:
| Art.15 Dimension | CI/CD Gate | Pass Condition |
|---|---|---|
| Accuracy | Performance regression check | Accuracy ≥ declared threshold |
| Accuracy | Metric drift detection | No statistically significant drop vs baseline |
| Robustness | Edge case / OOD tests | Graceful degradation, no crashes |
| Robustness | Adversarial perturbation test | Accuracy drop ≤ 10 percentage points under perturbation |
| Robustness | Data corruption injection | System handles malformed inputs |
| Cybersecurity | Model integrity check | Hash matches registered artifact |
| Cybersecurity | Dependency vulnerability scan | No critical CVEs in model dependencies |
| Cybersecurity | Inference endpoint security test | No successful model inversion / extraction attacks |
Gate 1: Accuracy Regression Check
This is the foundation gate. Every deployment must verify the model meets its declared accuracy threshold.
```python
# ci_gates/art15_accuracy_gate.py
import json
import sys
from pathlib import Path


def check_accuracy_gate(
    model_metrics_path: str,
    declared_threshold: float,
    metric_name: str = "accuracy"
) -> dict:
    """
    EU AI Act Art.15 accuracy gate.
    Verifies model meets its declared accuracy threshold before deployment.
    """
    with open(model_metrics_path) as f:
        metrics = json.load(f)

    current_accuracy = metrics.get(metric_name)
    if current_accuracy is None:
        return {
            "gate": "art15_accuracy",
            "status": "BLOCKED",
            "reason": f"Metric '{metric_name}' not found in evaluation output",
            "eu_ai_act_article": "Art.15"
        }

    passed = current_accuracy >= declared_threshold
    return {
        "gate": "art15_accuracy",
        "status": "PASSED" if passed else "BLOCKED",
        "metric": metric_name,
        "current_value": current_accuracy,
        "declared_threshold": declared_threshold,
        "delta": current_accuracy - declared_threshold,
        "eu_ai_act_article": "Art.15",
        "deployment_blocked": not passed,
        "remediation": None if passed else (
            f"Model accuracy {current_accuracy:.4f} is below declared threshold "
            f"{declared_threshold:.4f}. Retraining or threshold revision required."
        )
    }


if __name__ == "__main__":
    # Load from compliance config
    config = json.loads(Path("compliance/art15_config.json").read_text())
    result = check_accuracy_gate(
        model_metrics_path="artifacts/evaluation_metrics.json",
        declared_threshold=config["declared_accuracy_threshold"],
        metric_name=config.get("primary_metric", "accuracy")
    )
    print(json.dumps(result, indent=2))
    if result["status"] == "BLOCKED":
        sys.exit(1)  # Fail the CI build
```
The compliance/art15_config.json is the source of truth linking your pipeline to your EU AI database declaration:
```json
{
  "system_name": "High-Risk Recruitment AI v3",
  "eu_ai_act_article": "Art.15",
  "declared_accuracy_threshold": 0.92,
  "primary_metric": "f1_score",
  "accuracy_metric_source": "EU_AI_Database_Registration_ID_DE-2026-HR-0047",
  "last_updated": "2026-05-15"
}
```
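The gate table also lists metric drift detection (no statistically significant drop vs. baseline), which the threshold check alone does not cover: a release can clear the declared threshold while still being measurably worse than the previous one. A minimal sketch, assuming you have correct/total counts for the previous release and the candidate on comparable evaluation sets, is a one-sided two-proportion z-test. The function name and the α = 0.05 default are illustrative assumptions. The test fits proportion metrics like accuracy; a metric such as F1 would need a bootstrap instead, and paired predictions on the same examples are better served by McNemar's test.

```python
import math


def significant_accuracy_drop(
    baseline_correct: int, baseline_total: int,
    candidate_correct: int, candidate_total: int,
    alpha: float = 0.05,
) -> dict:
    """Hypothetical drift gate: one-sided two-proportion z-test asking whether
    the candidate's accuracy is significantly lower than the baseline's."""
    p_base = baseline_correct / baseline_total
    p_cand = candidate_correct / candidate_total
    # Pooled accuracy under the null hypothesis that both releases are equally accurate
    pooled = (baseline_correct + candidate_correct) / (baseline_total + candidate_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / baseline_total + 1 / candidate_total))
    z = 0.0 if se == 0 else (p_base - p_cand) / se
    # Upper-tail p-value of the standard normal: P(Z >= z)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    drifted = p_value < alpha
    return {
        "gate": "art15_accuracy_drift",
        "eu_ai_act_article": "Art.15",
        "baseline_accuracy": p_base,
        "candidate_accuracy": p_cand,
        "z_score": z,
        "p_value": p_value,
        "status": "BLOCKED" if drifted else "PASSED",
    }


if __name__ == "__main__":
    # Illustrative counts: 930/1000 correct on the last release vs 900/1000 now
    print(significant_accuracy_drop(930, 1000, 900, 1000)["status"])  # BLOCKED
```

With these illustrative counts, z ≈ 2.4 and p ≈ 0.008, so the candidate is blocked. Because the test is one-sided, an improvement never blocks.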
Gate 2: Robustness Test Suite
Robustness testing verifies that the system behaves correctly under stress conditions. The key Art.15 insight: robustness is not about perfection; it is about graceful degradation and consistency.
```python
# ci_gates/art15_robustness_gate.py
from typing import Callable

import numpy as np


class Art15RobustnessGate:
    """
    EU AI Act Art.15 robustness gate.
    Tests model resilience against perturbations and edge cases.
    """

    def __init__(self, model_fn: Callable, baseline_accuracy: float, seed: int = 0):
        self.model = model_fn
        self.baseline_accuracy = baseline_accuracy
        # Seeded RNG so a CI re-run reproduces the same perturbations
        self.rng = np.random.default_rng(seed)

    def test_input_perturbation(
        self,
        test_data,
        labels,
        noise_std: float = 0.05,
        max_accuracy_drop: float = 0.10
    ) -> dict:
        """Test accuracy under random Gaussian noise.

        Random noise is a smoke test, not a worst-case adversarial attack;
        adversarial robustness needs gradient-based attacks such as FGSM or PGD.
        """
        noisy_data = test_data + self.rng.normal(0, noise_std, test_data.shape)
        perturbed_accuracy = self._evaluate(noisy_data, labels)
        accuracy_drop = self.baseline_accuracy - perturbed_accuracy
        passed = accuracy_drop <= max_accuracy_drop
        return {
            "test": "input_perturbation",
            "noise_std": noise_std,
            "baseline_accuracy": self.baseline_accuracy,
            "perturbed_accuracy": perturbed_accuracy,
            "accuracy_drop": accuracy_drop,
            "max_allowed_drop": max_accuracy_drop,
            "status": "PASSED" if passed else "BLOCKED"
        }

    def test_missing_values(
        self, test_data, labels, missing_rate: float = 0.05, min_accuracy: float = 0.5
    ) -> dict:
        """Test behavior with missing/NaN inputs, a common real-world failure mode."""
        corrupted = test_data.astype(float)  # astype returns a copy
        mask = self.rng.random(corrupted.shape) < missing_rate
        corrupted[mask] = np.nan
        try:
            accuracy = self._evaluate(corrupted, labels)
            crashed = False
        except Exception:
            accuracy = 0.0
            crashed = True
        return {
            "test": "missing_values",
            "missing_rate": missing_rate,
            "accuracy_with_missing": accuracy,
            "crashed": crashed,
            "status": "PASSED" if not crashed and accuracy > min_accuracy else "BLOCKED"
        }

    def test_edge_cases(self, edge_case_fixtures: list) -> dict:
        """Test against a maintained library of known edge-case fixtures."""
        failures = []
        for fixture in edge_case_fixtures:
            try:
                prediction = self.model(fixture["input"])
                if not fixture["expected_behavior"](prediction):
                    failures.append(fixture["name"])
            except Exception as e:
                failures.append(f"{fixture['name']} (crashed: {e})")
        return {
            "test": "edge_cases",
            "total_fixtures": len(edge_case_fixtures),
            "failures": failures,
            "failure_count": len(failures),
            "status": "PASSED" if len(failures) == 0 else "BLOCKED"
        }

    def _evaluate(self, data, labels) -> float:
        predictions = np.asarray(self.model(data))
        # Cast to a plain float so results serialize cleanly to JSON
        return float(np.mean(predictions == np.asarray(labels)))

    def run_all(self, test_data, labels, edge_cases) -> dict:
        results = [
            self.test_input_perturbation(test_data, labels),
            self.test_missing_values(test_data, labels),
            self.test_edge_cases(edge_cases)
        ]
        all_passed = all(r["status"] == "PASSED" for r in results)
        return {
            "gate": "art15_robustness",
            "eu_ai_act_article": "Art.15",
            "overall_status": "PASSED" if all_passed else "BLOCKED",
            "tests": results,
            "blocked_tests": [r["test"] for r in results if r["status"] == "BLOCKED"]
        }
```
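The checklist later in this post also calls for a consistency test (same input, same output across N runs), which the class above does not implement. A minimal standalone sketch, assuming `model_fn` accepts a NumPy array and returns one prediction per row; `check_consistency` is a hypothetical name, and its result dict mirrors the other robustness tests so it could be added to `run_all`:

```python
from typing import Callable

import numpy as np


def check_consistency(model_fn: Callable, test_data: np.ndarray, n_runs: int = 5) -> dict:
    """Hypothetical determinism check: call the model n_runs times on the same
    inputs and block if any run's predictions differ from the first run."""
    reference = np.asarray(model_fn(test_data))
    inconsistent_runs = 0
    for _ in range(n_runs - 1):
        if not np.array_equal(np.asarray(model_fn(test_data)), reference):
            inconsistent_runs += 1
    return {
        "test": "consistency",
        "n_runs": n_runs,
        "inconsistent_runs": inconsistent_runs,
        "status": "PASSED" if inconsistent_runs == 0 else "BLOCKED",
    }


if __name__ == "__main__":
    data = np.arange(12, dtype=float).reshape(6, 2)
    rng = np.random.default_rng(1)

    def stable_model(x):
        return (x.sum(axis=1) > 10).astype(int)

    def flaky_model(x):
        # Simulates unseeded sampling at inference time
        return rng.integers(0, 2, size=len(x))

    print(check_consistency(stable_model, data)["status"])  # PASSED
    print(check_consistency(flaky_model, data)["status"])
```

Exact equality suits predicted labels. For probability scores, compare with a tolerance (for example `np.allclose`), since some GPU kernels are not bit-for-bit deterministic.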
Gate 3: Cybersecurity Checks
The cybersecurity dimension of Art.15 requires protecting the AI system against manipulation. In a CI/CD context, two of the three cybersecurity checks from the gate table run entirely inside the pipeline:
3a. Model Artifact Integrity
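Every model artifact in your pipeline should be hash-verified: if the weights are tampered with between training and deployment, the hash won't match. This only holds if the hash registry is protected too. An attacker who can rewrite both the weights and compliance/model_hashes.json defeats the check, so restrict who can change that file (for example with branch protection or a CODEOWNERS rule).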
```python
# ci_gates/art15_model_integrity.py
import hashlib
import json
import sys
from pathlib import Path


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_integrity(
    model_path: str,
    expected_hash_path: str = "compliance/model_hashes.json"
) -> dict:
    """Verify model file hasn't been tampered with since training."""
    actual_hash = sha256_file(model_path)
    hashes = json.loads(Path(expected_hash_path).read_text())
    model_name = Path(model_path).name
    expected_hash = hashes.get(model_name)

    if expected_hash is None:
        return {
            "gate": "art15_model_integrity",
            "eu_ai_act_article": "Art.15",
            "status": "BLOCKED",
            "reason": f"No registered hash for {model_name}",
            "action": "Run register_model_hash.py after training to register baseline"
        }

    passed = actual_hash == expected_hash
    return {
        "gate": "art15_model_integrity",
        "eu_ai_act_article": "Art.15",
        "model": model_name,
        "status": "PASSED" if passed else "BLOCKED",
        "hash_match": passed,
        "actual_sha256": actual_hash[:16] + "...",
        "remediation": None if passed else "Model file modified since training registration. Investigate supply chain."
    }


if __name__ == "__main__":
    # Usage: python ci_gates/art15_model_integrity.py <path-to-model-file>
    result = verify_model_integrity(sys.argv[1])
    print(json.dumps(result, indent=2))
    if result["status"] == "BLOCKED":
        sys.exit(1)
```
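The integrity gate's remediation message points to a `register_model_hash.py` step that the post does not show. A possible sketch, assuming the same `compliance/model_hashes.json` layout the gate reads (model file name mapped to a SHA-256 hex digest); the function name and the refuse-to-overwrite policy are illustrative choices. It hashes the file in chunks so multi-gigabyte weights do not have to fit in memory.

```python
# register_model_hash.py (hypothetical sketch)
import hashlib
import json
import sys
from pathlib import Path


def register_model_hash(
    model_path: str,
    registry_path: str = "compliance/model_hashes.json",
    chunk_size: int = 1 << 20,
) -> str:
    """Compute the model file's SHA-256 and record it under the file name."""
    digest = hashlib.sha256()
    with open(model_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    sha256 = digest.hexdigest()

    registry_file = Path(registry_path)
    registry = json.loads(registry_file.read_text()) if registry_file.exists() else {}
    name = Path(model_path).name
    existing = registry.get(name)
    if existing is not None and existing != sha256:
        # Refuse to silently overwrite: a new model version should get a new file name
        raise ValueError(f"{name} is already registered with a different hash")

    registry[name] = sha256
    registry_file.parent.mkdir(parents=True, exist_ok=True)
    registry_file.write_text(json.dumps(registry, indent=2, sort_keys=True) + "\n")
    return sha256


if __name__ == "__main__":
    # Usage: python register_model_hash.py <model-file> [<model-file> ...]
    for path in sys.argv[1:]:
        print(f"{Path(path).name}: {register_model_hash(path)}")
```

Refusing to overwrite an existing entry forces each new model version to get a new file name, so the registry doubles as a history of released weights. Committing registry changes through a reviewed pull request keeps the hash and the weights from being swapped in the same unreviewed push.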
3b. Dependency Vulnerability Scan
Use EU-hosted or self-hosted SAST/SCA tools to scan model dependencies. Critical CVEs must block deployment.
```yaml
# .github/workflows/art15-security-gate.yml
- name: EU AI Act Art.15 - Dependency Security Scan
  run: |
    pip install pip-audit
    mkdir -p artifacts
    # pip-audit exits non-zero when it finds vulnerabilities; let the gate script decide
    pip-audit --require-hashes -r requirements.txt \
      --format=json > artifacts/security_scan.json || true
    python ci_gates/art15_security_gate.py
  env:
    BLOCK_ON_SEVERITY: CRITICAL
```
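```python
# ci_gates/art15_security_gate.py
import json
import os
import sys


def check_security_gate(scan_results_path: str, block_severity: str = "CRITICAL") -> dict:
    """Art.15 dependency gate over a pip-audit JSON report.

    pip-audit nests findings under dependencies[].vulns without a severity
    rating, so findings lacking a "severity" field count as blocking (fail closed)."""
    with open(scan_results_path) as f:
        results = json.load(f)
    blocking = []
    for dep in results.get("dependencies", []):
        for vuln in dep.get("vulns", []):
            severity = vuln.get("severity")
            if severity is None or severity.upper() == block_severity.upper():
                blocking.append({"package": dep.get("name"), "version": dep.get("version"),
                                 "id": vuln.get("id"), "fix_versions": vuln.get("fix_versions", [])})
    passed = not blocking
    return {
        "gate": "art15_cybersecurity",
        "eu_ai_act_article": "Art.15",
        "status": "PASSED" if passed else "BLOCKED",
        "blocking_vulnerability_count": len(blocking),
        "vulnerabilities": blocking[:5],  # First 5 for log brevity
        "remediation": None if passed else "Patch or pin affected packages before deployment"
    }


if __name__ == "__main__":
    result = check_security_gate("artifacts/security_scan.json",
                                 os.environ.get("BLOCK_ON_SEVERITY", "CRITICAL"))
    print(json.dumps(result, indent=2))
    if result["status"] == "BLOCKED":
        sys.exit(1)
```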
Wiring Everything into One Pipeline
Here is a complete GitHub Actions workflow that runs all three Art.15 gates in sequence, with a structured compliance report:
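```yaml
# .github/workflows/eu-ai-act-art15-compliance.yml
name: EU AI Act Art.15 Compliance Gates

on:
  push:
    branches: [main, staging]
  pull_request:

jobs:
  art15-compliance:
    runs-on: ubuntu-22.04
    defaults:
      run:
        # An explicit "shell: bash" enables pipefail, so "| tee" cannot mask a failing gate
        shell: bash
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }

      - name: Install Dependencies
        run: |
          pip install -r requirements.txt pip-audit
          mkdir -p artifacts

      # Assumes an earlier evaluation step has written artifacts/evaluation_metrics.json
      - name: Art.15 Gate 1 - Accuracy Regression
        id: accuracy_gate
        run: python ci_gates/art15_accuracy_gate.py | tee artifacts/art15_accuracy.json

      # Assumes art15_robustness_gate.py has a CLI entry point that prints JSON (not shown above)
      - name: Art.15 Gate 2 - Robustness Suite
        id: robustness_gate
        run: python ci_gates/art15_robustness_gate.py --run-all | tee artifacts/art15_robustness.json

      - name: Art.15 Gate 3 - Cybersecurity Scan
        id: security_gate
        run: |
          # pip-audit exits non-zero when it finds vulnerabilities; the gate script decides
          pip-audit -r requirements.txt --format=json \
            > artifacts/security_scan.json || true
          python ci_gates/art15_security_gate.py | tee artifacts/art15_security.json

      - name: Generate Art.15 Compliance Report
        if: always()
        run: python ci_gates/generate_art15_report.py

      - name: Upload Compliance Artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: art15-compliance-${{ github.sha }}
          path: |
            artifacts/art15_*.json
            artifacts/security_scan.json
          # Capped by the repository's retention setting (max 90 days public, 400 private)
          retention-days: 365
```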
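The retention-days: 365 setting is deliberate: your CI artifacts are part of the evidence trail showing that every release passed its Art.15 gates. Two caveats apply. GitHub caps artifact retention at the limit configured for the repository (at most 90 days for public repositories and 400 days for private ones), so evidence you need long-term belongs in dedicated storage. And Art.12 itself covers the automatic event logs the AI system produces; test reports that feed the Annex IV technical documentation fall under Art.18, which requires providers to keep that documentation for 10 years after the system is placed on the market or put into service.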
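The workflow calls `ci_gates/generate_art15_report.py` without showing it. A minimal sketch, assuming each gate's JSON output was saved as `artifacts/art15_<gate>.json` (for example by piping it through `tee`) and carries a `status` or `overall_status` field, as the gates in this post do. The gate list, output file name, and report fields are illustrative assumptions.

```python
# ci_gates/generate_art15_report.py (hypothetical sketch)
import json
import os
from datetime import datetime, timezone
from pathlib import Path

EXPECTED_GATES = ["accuracy", "robustness", "security"]  # assumed file suffixes
REPORT_NAME = "art15_compliance_report.json"


def build_report(artifacts_dir: str = "artifacts") -> dict:
    gates = {}
    for gate in EXPECTED_GATES:
        path = Path(artifacts_dir) / f"art15_{gate}.json"
        if not path.exists():
            # Gate skipped, e.g. because an earlier gate failed the job
            gates[gate] = {"status": "NOT_RUN"}
            continue
        try:
            result = json.loads(path.read_text())
        except json.JSONDecodeError:
            gates[gate] = {"status": "INVALID_OUTPUT"}
            continue
        # The robustness gate reports overall_status; the others report status
        status = result.get("overall_status", result.get("status", "UNKNOWN"))
        gates[gate] = {"status": status, "result": result}

    all_passed = all(g["status"] == "PASSED" for g in gates.values())
    return {
        "eu_ai_act_article": "Art.15",
        "commit": os.environ.get("GITHUB_SHA", "unknown"),
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "overall_status": "PASSED" if all_passed else "BLOCKED",
        "gates": gates,
    }


if __name__ == "__main__":
    report = build_report()
    Path("artifacts").mkdir(exist_ok=True)
    (Path("artifacts") / REPORT_NAME).write_text(json.dumps(report, indent=2))
    print(json.dumps({k: v for k, v in report.items() if k != "gates"}, indent=2))
```

The script always exits 0: blocking stays with the individual gate steps, and gates skipped after an earlier failure show up as NOT_RUN rather than silently missing from the evidence.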
EU-Native Tools for Art.15 Gates
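Running your compliance gates on infrastructure operated by an EU provider keeps your audit artifacts out of CLOUD Act reach. An EU region of a US-headquartered cloud does not achieve this, because the CLOUD Act attaches to the provider, not to the data center's location: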
| Gate | EU-Native Option | Alternative |
|---|---|---|
| Accuracy tracking | MLflow (self-hosted Hetzner) | Weights & Biases EU region |
| Robustness testing | Evidently AI (self-hosted) | IBM OpenScale |
| Data quality | Great Expectations (local) | Soda Core (open source) |
| Security scanning | pip-audit (local) | Semgrep CE |
| CI/CD runner | Gitea Actions (self-hosted) | GitLab CE (EU) |
| Artifact storage | MinIO (Hetzner) | Scaleway Object Storage |
Storing your compliance artifacts — accuracy reports, robustness test results, security scans — on EU infrastructure avoids the scenario where your NCA audit evidence is subject to US subpoena.
Connecting to Your RMS (Art.9)
Art.15 gates don't operate in isolation. They feed directly into your Art.9 Risk Management System:
- Pre-deployment: Art.15 accuracy gate must pass before the Art.9 residual risk sign-off
- Post-deployment: Accuracy monitoring triggers Art.9 re-evaluation if drift detected
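- Incident response: An Art.15 accuracy failure in production is a potential Art.73 serious incident if it leads to one of the outcomes in the Art.3(49) definition, such as serious harm to a person's health or an infringement of Union-law obligations that protect fundamental rights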
The CI/CD gate pattern means these connections are automated, not manual review steps that get skipped under deadline pressure.
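Those connections only run automatically if a failing gate produces something the RMS can consume, not just a red CI job. A minimal sketch of that hand-off, assuming the gate result dicts used in this post; the event fields, the `environment` values, and the follow-up names are hypothetical, and whether a production failure actually meets the Art.73 serious-incident criteria still needs a human assessment.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def to_compliance_event(gate_result: dict, model_version: str, environment: str) -> Optional[dict]:
    """Hypothetical mapper: turn a non-passing gate result into a structured
    event for the Art.9 risk management system; passing gates yield None."""
    status = gate_result.get("overall_status", gate_result.get("status"))
    if status == "PASSED":
        return None  # nothing to escalate

    follow_ups = ["art9_risk_reevaluation"]
    if environment == "production":
        # A production failure needs triage against the Art.73 serious-incident criteria
        follow_ups.append("art73_incident_assessment")

    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "gate": gate_result.get("gate", "unknown"),
        "eu_ai_act_article": gate_result.get("eu_ai_act_article", "Art.15"),
        "status": status,
        "model_version": model_version,
        "environment": environment,
        "follow_ups": follow_ups,
        "details": gate_result,
    }


if __name__ == "__main__":
    failed = {"gate": "art15_accuracy", "status": "BLOCKED", "eu_ai_act_article": "Art.15"}
    event = to_compliance_event(failed, model_version="v3.2.0", environment="production")
    print(json.dumps(event, indent=2))
```

A missing status is treated as non-passing, so a malformed gate result still produces an event instead of disappearing.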
The 25-Item Art.15 CI/CD Compliance Checklist
Before the August 2, 2026 deadline, verify:
Accuracy Gates
- Declared accuracy threshold documented in EU AI database registration
- `art15_config.json` links pipeline threshold to EU AI database entry
- Accuracy gate blocks deployments that fall below declared threshold
- Primary and secondary metrics both checked (e.g., F1 + recall for high-stakes classifiers)
- Accuracy gate result logged with timestamp and model version
Robustness Gates
- Input perturbation test implemented with defined max accuracy drop
- Missing/corrupted input handler tested (no silent failures)
- Edge case fixture library maintained and updated per release
- Out-of-distribution (OOD) detection integrated into inference path
- Consistency test: same input → same output across N runs (determinism check)
Cybersecurity Gates
- Model artifact SHA256 hash registered at training time
- Hash verification gate blocks deployment on mismatch
- Dependency vulnerability scan runs on every PR
- CRITICAL CVEs block deployment automatically
- Model training pipeline access controls audited (who can push weights?)
Pipeline Integration
- All gate results stored as JSON artifacts
- CI artifact retention set to ≥365 days, with long-term copies kept for the Art.18 documentation-keeping period
- Gate failures generate structured compliance event (not just CI failure)
- Compliance report generated for each deployment attempt
- Gate results linked to EU AI database registration ID
Cross-Article Integration
- Art.15 accuracy gate output feeds into Art.9 RMS pre-deployment check
- Post-deployment accuracy monitoring triggers Art.9 re-evaluation on drift
- Art.15 cybersecurity gate integrated with Art.9 residual risk assessment
- Gate failures logged to Art.12 audit trail
- Art.15 violations classified for Art.73 incident assessment
What's Next in This Series
We've now covered the general pipeline architecture (Part 1) and accuracy/robustness/cybersecurity gates (Part 2). Coming up:
- Part 3: Art.14 Human Oversight — Automated checkpoint verification for human-in-the-loop requirements
- Part 4: Art.12 Record-Keeping — Automated audit trail generation and tamper-evident logging
- Part 5: Finale — Complete Compliance-as-Code stack deployed on EU infrastructure before August 2, 2026
Each article in this series adds a new layer to the same pipeline. By Part 5, you'll have a fully automated EU AI Act compliance gate system that generates audit-ready evidence with every CI run.
Key Takeaways
Art.15 is a behavioral obligation, not a documentation requirement. The NCA can test whether your declared accuracy is real. Your CI/CD pipeline is how you prove it is — every sprint, not just before the audit.
Three gates cover the three Art.15 dimensions:
- Accuracy gate — verifies declared threshold is met on every deployment
- Robustness gate — verifies graceful degradation under perturbation, corruption, and edge cases
- Cybersecurity gate — verifies model integrity and no critical vulnerabilities in supply chain
Run these gates on EU-hosted infrastructure. Store artifacts for at least 12 months. Link everything to your EU AI database registration.
The August 2, 2026 deadline means these gates need to be in production, not in planning.
sota.io is an EU-native managed PaaS — deploy your AI compliance pipeline on Hetzner Germany infrastructure, no CLOUD Act exposure, from €9/month. Your CI/CD audit artifacts stay in Europe.