EU AI Act Art.73 Incident Detection Automation: Building the Serious Incident Reporting Pipeline
Post #1583 in the sota.io EU AI Act Compliance Automation Series (2/5)
The previous post in this series covered Art.72 post-market monitoring automation — the continuous observability infrastructure that must run throughout a high-risk AI system's operational lifetime. Art.73 is the escalation arm of that obligation: when your monitoring stack detects something that crosses the threshold of a "serious incident," you are legally required to report it to your national market surveillance authority within a tight deadline.
The problem is that "serious incident" is not a simple threshold you can encode as a single metric alert. It requires contextual judgment across multiple signals: the nature of the harm, the population affected, whether the AI system was a contributing cause, and whether the incident was unexpected given the system's technical documentation. Teams that attempt to handle this through manual escalation pathways — where an engineer reviews monitoring alerts and decides whether to notify compliance — create a bottleneck that can easily miss the 2-day or 15-day window.
This post shows how to automate the detection, classification, and initial reporting of serious incidents under Art.73, integrated with the Art.72 observability stack described in the previous post.
What Art.73 Actually Requires
Article 73 of Regulation (EU) 2024/1689 imposes notification obligations on providers of high-risk AI systems when a serious incident occurs.
The term "serious incident" is defined in Art.3(49): an incident or malfunction of an AI system that directly or indirectly leads to any of the following:
- Death of a person, or serious harm to a person's health
- Serious and irreversible disruption of the management or operation of critical infrastructure
- Infringement of obligations under Union law intended to protect fundamental rights
- Serious harm to property or the environment
The notification deadlines depend on the type of incident:
- 2 calendar days from awareness: for a widespread infringement or a serious and irreversible disruption of critical infrastructure (Art.73(3)). This is the highest urgency tier, and the report is due immediately even if full details are unavailable
- 10 calendar days from awareness: where the incident involves the death of a person (Art.73(4)). The report is due as soon as a causal relationship with the AI system is established or suspected
- 15 calendar days from awareness: the general rule for all other serious incidents (Art.73(2)). The report is due immediately after a causal link, or the reasonable likelihood of one, is established, which makes this the standard window for most software-related serious incidents
These are calendar days, not working days. A serious incident discovered on a Friday requires a notification by Sunday if the 2-day tier applies.
The notification must include the nature of the incident, the affected AI system, the nature of the harm, corrective measures taken or planned, and deployer information where relevant. Art.73(5) allows an incomplete initial report followed by a complete one; this post plans for a final report with root cause analysis within 30 days of the initial notification.
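The calendar-day rule is easy to get wrong in scheduling code that skips weekends. The following is a minimal sketch of a deadline calculation; the function name `notification_deadline` is hypothetical, and it assumes the window is counted as N x 24 hours from the awareness timestamp, which is a conservative reading rather than a legal determination.

```python
from datetime import datetime, timedelta, timezone


def notification_deadline(aware_at: datetime, window_days: int) -> datetime:
    """Return the latest filing time for a window counted in calendar days.

    Assumption: the window runs for window_days * 24 hours from the moment
    of awareness. Weekends and public holidays are not skipped.
    """
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    return aware_at.astimezone(timezone.utc) + timedelta(days=window_days)


# Awareness on Friday 7 August 2026 at 10:00 UTC with a 2-day window.
aware = datetime(2026, 8, 7, 10, 0, tzinfo=timezone.utc)
deadline = notification_deadline(aware, window_days=2)
print(deadline.isoformat())  # 2026-08-09T10:00:00+00:00, a Sunday
```

Storing the awareness timestamp in UTC and deriving every deadline from it keeps the compliance calendar independent of server time zones.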
Why Manual Triage Fails
Most engineering teams build monitoring systems that produce operational alerts — CPU spikes, latency increases, error rate thresholds. These are designed for on-call engineers, not compliance officers. The gap between "an alert fired" and "this is a reportable Art.73 serious incident" involves several judgments that a generalist engineer is poorly positioned to make under time pressure:
Harm causation analysis. Was the AI system a proximate or contributing cause of the harm, or did the harm occur independently? Art.73 reporting is triggered when the AI system is in the causal chain, not merely when harm occurred to a user of a product that happens to contain an AI component.
Affected population scope. A single data quality error that produced incorrect outputs for a specific user segment may or may not be serious depending on who those users are and what decisions were made based on the AI output. An employment AI system (Annex III Point 4) affecting thousands of job applications is categorically different from the same error rate in a product-recommendation system.
Unexpectedness against technical documentation. Art.73 is triggered not just by harm, but by harm that was unexpected — meaning not covered by the known limitations documented in the technical documentation. If your conformity assessment identified a failure mode and documented it, an incident involving that failure mode may be reported differently than one involving an undocumented failure.
An automated pipeline cannot replace human judgment entirely, but it can ensure that the right signals are surfaced to the right people immediately, with enough contextual evidence pre-assembled to make the compliance decision rapidly and the notification process as fast as possible.
The Art.73 Automated Reporting Pipeline
An Art.73-compliant automated pipeline consists of five stages running continuously against the Art.72 monitoring data stream:
Stage 1: Harm Signal Detection
The pipeline begins with structured harm signal detection — pattern-matching against the Art.72 observability stream for events that could indicate statutory harm categories.
User-reported harm events. Route all user-reported errors, complaints, and outcomes through a classification step before they reach the support queue. Any report mentioning health, safety, legal decisions, employment, creditworthiness, or fundamental rights triggers Stage 2 classification rather than standard support triage.
Downstream decision tracking. For AI systems integrated into decision workflows, maintain event hooks for downstream actions: loan denials, employment rejections, medical triage classifications, credit score outputs. Each decision carries metadata about the AI confidence level and the recommendation type. Decisions outside the normal confidence distribution trigger automatic review.
Infrastructure-level harm signals. For AI systems classified under the critical infrastructure category (Annex III Point 2), correlate AI system outputs with infrastructure state changes: grid frequency deviations, network availability changes, or access control events that followed AI recommendations within a configurable time window.
class HarmSignalDetector:
    # Keyword lists per harm category. Matching is by substring, so the
    # lists are intentionally broad and will produce false positives.
    HARM_CATEGORIES = {
        "health_safety": ["death", "injury", "emergency", "hospital", "ambulance"],
        "fundamental_rights": ["discrimination", "bias", "unlawful", "denied", "refused"],
        "critical_infrastructure": ["outage", "disruption", "unavailable", "failure"],
        "employment": ["dismissed", "rejected", "terminated", "denied employment"],
        "credit": ["loan denied", "credit refused", "uninsurable", "risk score"],
    }

    def classify_event(self, event: dict) -> list[str]:
        """Return every harm category whose keywords appear in the event text."""
        triggered = []
        # "or ''" guards against keys that are present but set to None.
        text = ((event.get("message") or "") + " " + (event.get("user_input") or "")).lower()
        for category, keywords in self.HARM_CATEGORIES.items():
            if any(kw in text for kw in keywords):
                triggered.append(category)
        return triggered
This first-pass classification is deliberately over-inclusive. False negatives (missed serious incidents) carry legal risk. False positives (unnecessary compliance reviews) carry only operational cost.
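Keyword matching covers user reports, but the downstream decision tracking described above needs a numeric check: is this decision's confidence outside the normal distribution? One possible sketch uses a z-score against a baseline sample. The function name, the threshold of 3.0, and the choice of a z-score rather than some other outlier test are all assumptions for illustration.

```python
from statistics import mean, stdev


def is_confidence_outlier(
    confidence: float, baseline: list[float], z_threshold: float = 3.0
) -> bool:
    """Flag a decision whose confidence is far from the baseline distribution."""
    if len(baseline) < 2:
        # No usable baseline: send to review rather than silently skip.
        return True
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Degenerate baseline: any deviation at all counts as an outlier.
        return confidence != mu
    return abs(confidence - mu) / sigma > z_threshold


baseline = [0.90, 0.92, 0.91, 0.89, 0.93]
print(is_confidence_outlier(0.55, baseline))  # True
print(is_confidence_outlier(0.92, baseline))  # False
```

In keeping with the over-inclusive design, a missing baseline is treated as a reason to review, not a reason to pass.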
Stage 2: Seriousness Classification
Events that pass Stage 1 enter automated seriousness classification, which determines whether they meet the statutory definition of "serious incident" and which notification timeline applies.
Classification is a three-question chain:
Q1: Was the AI system in the causal chain? The system maintains a decision log linking every AI output to subsequent events within a configurable window. If the harm event occurred in a user session where an AI recommendation was made within the previous N hours, the system flags the AI system as a potential causal factor. Causation is then presumed: the compliance reviewer must rule it out rather than establish it.
Q2: Was the harm within the documented risk envelope? Compare the incident fingerprint against the risk documentation index, the structured extract of known failure modes from the technical documentation. Incidents that do not match known failure modes are flagged as "unexpected" and prioritised for review.
Q3: Which deadline applies? Map the harm category against the Art.73 deadlines:
- Widespread infringement, or serious and irreversible disruption of critical infrastructure → 2-day window, immediate notification trigger
- Death of a person → 10-day window, expedited notification trigger
- Any other serious incident, including cases where the scope is still uncertain → 15-day window, standard notification trigger
The classification output includes a severity tier, a confidence score, and a pre-populated evidence packet.
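The three-question chain can be sketched as a single function. Everything here is illustrative: `Stage2Result`, `classify_seriousness`, and the 24-hour default causal window are hypothetical, and the mapping from harm category to deadline is passed in by the caller so that it can follow your own legal reading of the tiers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Stage2Result:
    ai_in_causal_chain: bool
    unexpected: bool
    window_days: int


def classify_seriousness(
    harm_at: datetime,
    harm_category: str,
    fingerprint: str,
    ai_output_times: list[datetime],
    known_failure_modes: set[str],
    tier_windows: dict[str, int],
    causal_window: timedelta = timedelta(hours=24),
) -> Stage2Result:
    """Answer Q1-Q3 for one harm event. tier_windows must be non-empty."""
    # Q1: did any AI output precede the harm within the causal window?
    in_chain = any(
        timedelta(0) <= harm_at - t <= causal_window for t in ai_output_times
    )
    # Q2: an incident is "unexpected" if no documented failure mode matches.
    unexpected = fingerprint not in known_failure_modes
    # Q3: unknown categories fall back to the shortest configured window
    # (assumption: err on the side of urgency).
    window = tier_windows.get(harm_category, min(tier_windows.values()))
    return Stage2Result(in_chain, unexpected, window)
```

A real classifier would also attach the confidence score and evidence references; this sketch only shows how the three answers combine.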
Stage 3: Evidence Collection
Once Stage 2 assigns a severity tier, the pipeline immediately begins automated evidence collection — because the notification deadline starts running from the moment of awareness, not from the moment you feel ready to file.
The evidence collector assembles:
Decision trace. The complete inference trace for the AI outputs that may be implicated in the incident: input features, model version, confidence score, output value, and any post-processing steps. This trace must be cryptographically timestamped to establish that it represents the actual system state at the time of the incident.
Affected population count. How many users received the same or similar output during the incident window? For data quality incidents, this is the number of records that passed through the affected data pipeline during the affected period.
Deployer impact assessment. For provider AI systems used by deployers, automated queries to the deployer notification list with the incident description and affected version range. Art.73 requires providers to notify deployers; this step begins that process simultaneously with the internal classification.
Technical documentation snapshot. The current version of the technical documentation sections relevant to the incident type, including the risk classification, known limitations, and performance metrics. This is the reference document for determining whether the incident was "unexpected."
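from datetime import datetime, timezone


class EvidenceCollector:
    # Assumes the registries and stores used below (system_registry,
    # trace_store, usage_analytics, deployer_registry, doc_registry) are set
    # on the instance, and that IncidentClassification and EvidencePacket
    # are defined elsewhere in the pipeline.

    def collect(self, incident_id: str, classification: "IncidentClassification") -> "EvidencePacket":
        return EvidencePacket(
            incident_id=incident_id,
            ai_system_id=self.system_registry.get_current_version(),
            decision_traces=self.trace_store.get_traces(
                incident_id,
                window_hours=classification.causal_window_hours,
            ),
            affected_user_count=self.usage_analytics.count_affected(
                incident_id,
                output_type=classification.output_type,
            ),
            deployer_list=self.deployer_registry.get_notifiable(
                version_range=classification.version_range,
            ),
            technical_doc_snapshot=self.doc_registry.get_snapshot(
                section_ids=classification.relevant_sections,
            ),
            # Timezone-aware UTC timestamp; datetime.utcnow() returns a naive
            # value and is deprecated in recent Python versions.
            timestamp_utc=datetime.now(timezone.utc).isoformat(),
        )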
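The requirement that decision traces be cryptographically timestamped deserves its own sketch. One simple approach, assumed here rather than prescribed by the Act, is to seal each trace with a SHA-256 digest chained to the previous record. The names `seal_trace` and `verify_seal` are hypothetical. Note that a local hash chain only makes tampering detectable; independent proof of the capture time would need an external trusted timestamping service (for example one following RFC 3161).

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(previous_digest: str, sealed_at: str, trace: dict) -> str:
    # Canonical JSON so the same trace always hashes to the same bytes.
    payload = json.dumps(trace, sort_keys=True, separators=(",", ":"))
    data = (previous_digest + sealed_at + payload).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def seal_trace(trace: dict, previous_digest: str = "") -> dict:
    """Wrap a decision trace with a capture time and a chained digest."""
    sealed_at = datetime.now(timezone.utc).isoformat()
    return {
        "trace": trace,
        "sealed_at": sealed_at,
        "previous_digest": previous_digest,
        "digest": _digest(previous_digest, sealed_at, trace),
    }


def verify_seal(record: dict) -> bool:
    """Recompute the digest and compare it with the stored one."""
    expected = _digest(record["previous_digest"], record["sealed_at"], record["trace"])
    return expected == record["digest"]


first = seal_trace({"model_version": "1.4.2", "confidence": 0.61})
second = seal_trace({"model_version": "1.4.2", "confidence": 0.58}, first["digest"])
print(verify_seal(first), verify_seal(second))
```

Chaining each record to the previous digest means that altering or removing one trace invalidates every record sealed after it.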
Stage 4: Notification Orchestration
With the evidence packet assembled, Stage 4 begins the notification orchestration. This stage does not automatically file with the national authority — that step requires human review — but it ensures that the review is zero-friction and time-bounded.
Compliance officer alert. Immediate push notification to the designated compliance officer (or on-call compliance contact) with the severity tier, the 2-day/10-day/15-day deadline, and a link to the pre-populated notification draft.
Notification draft generation. The pipeline generates a structured first draft of the Art.73 notification form, populated with the evidence packet data. Most national market surveillance authorities are building standardised digital portals for Art.73 submissions; for those that have not yet launched, the draft is formatted as a structured document that can be copy-pasted.
Deadline tracking. The incident is entered into a compliance calendar with the applicable deadline and an escalation trigger. If the compliance review is not completed within half the deadline window, rounded down to whole days (1 day for the 2-day tier, 5 days for the 10-day tier, 7 days for the 15-day tier), the incident escalates automatically to senior management.
Deployer notification dispatch. Deployer notifications are sent automatically once the compliance officer approves the incident classification. These are separate from the notification to the market surveillance authority and have no statutory deadline, but prompt deployer notification reduces downstream harm and demonstrates good-faith compliance.
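The half-window escalation rule from the deadline tracking step can be expressed in a few lines. This is a sketch under the assumption that the checkpoint is the window in days, halved and rounded down; `review_status` and its three status strings are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def review_status(aware_at: datetime, window_days: int, now: datetime) -> str:
    """Return 'on_track', 'escalate' or 'overdue' for an open review."""
    deadline = aware_at + timedelta(days=window_days)
    # Integer division gives 1, 5 and 7 days for 2-, 10- and 15-day windows.
    escalate_at = aware_at + timedelta(days=window_days // 2)
    if now >= deadline:
        return "overdue"
    if now >= escalate_at:
        return "escalate"
    return "on_track"


aware = datetime(2026, 8, 7, 10, 0, tzinfo=timezone.utc)
print(review_status(aware, 15, aware + timedelta(days=6)))  # on_track
print(review_status(aware, 15, aware + timedelta(days=7)))  # escalate
```

Running this check from a scheduled job against every open incident removes the dependency on someone remembering to look at the calendar.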
Stage 5: Report Generation and Audit Trail
After the initial notification is filed, the pipeline transitions to the 30-day final report preparation phase.
Automated root cause log. Continuous aggregation of causal evidence from the investigation period, structured as a machine-readable timeline. Engineers document their investigation findings directly into the pipeline's structured logging interface, ensuring that the root cause analysis is assembled incrementally rather than written from scratch at day 29.
Remediation tracking. Every corrective action — model rollback, data pipeline fix, monitoring threshold adjustment — is logged against the incident with timestamps and the engineer responsible. This forms the corrective measures section of the final report.
Recurrence prevention log. Automatically capture whether the incident fingerprint matches any previous incident in the system's history. Repeated incidents of the same type are a significant compliance risk indicator and require specific documentation in the final report.
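Recurrence matching depends on how the incident fingerprint is built. A minimal sketch, assuming a fingerprint made of the system identifier, harm category, and failure mode (the model version is deliberately left out so that a repeat across versions still matches); both function names and the field choice are hypothetical.

```python
import hashlib


def incident_fingerprint(system_id: str, harm_category: str, failure_mode: str) -> str:
    """Hash the normalised fields into a short, stable fingerprint."""
    normalised = "|".join(
        part.strip().lower() for part in (system_id, harm_category, failure_mode)
    )
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:16]


def prior_occurrences(fingerprint: str, history: dict[str, str]) -> list[str]:
    """Return IDs of earlier incidents with the same fingerprint.

    history maps incident_id -> fingerprint.
    """
    return sorted(iid for iid, fp in history.items() if fp == fingerprint)


history = {
    "INC-001": incident_fingerprint("cv-screener", "employment", "label drift"),
    "INC-002": incident_fingerprint("cv-screener", "credit", "missing feature"),
}
current = incident_fingerprint("CV-Screener ", "Employment", "Label Drift")
print(prior_occurrences(current, history))  # ['INC-001']
```

Exact-match fingerprints will miss near-duplicates described in different words, so a production system would likely need a controlled vocabulary for failure modes.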
Integration with the Art.72 Monitoring Stack
The Art.73 pipeline is not a standalone system — it is the escalation layer on top of the Art.72 observability stack described in the previous post. The connection points are:
Drift detection as incident trigger. Significant accuracy drift events from the Art.72 stack should be fed into the Art.73 Stage 1 classifier. A model that has drifted far outside its validated performance envelope is producing outputs that may be contributing to downstream harm the system cannot yet observe directly.
Shared decision trace infrastructure. Both Art.72 monitoring and Art.73 evidence collection require access to the same decision trace store. Building this infrastructure once, with the dual purpose of operational monitoring and compliance evidence, avoids the divergence that occurs when teams build separate monitoring and compliance logging systems.
Unified audit log. The Art.72 post-market monitoring plan specifies the format and retention period for monitoring data. The Art.73 evidence packets should be stored in the same audit log system, ensuring that the data available for the final report is governed by the same retention controls as the operational monitoring data.
Threshold Configuration and the Overcapture Problem
A common failure mode in automated compliance pipelines is threshold calibration. Teams configure harm detection too narrowly — only flagging events with explicit harm keywords — and miss incidents that meet the statutory definition but use different language. Others configure too broadly and generate so many false positives that the compliance team becomes desensitised to notifications.
The recommended calibration approach is probabilistic rather than deterministic: assign a confidence score to each classification, route high-confidence cases directly to notification draft generation, and route medium-confidence cases to a human review queue with a 4-hour SLA. Low-confidence cases are logged for pattern analysis but do not trigger immediate action.
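As a sketch of the three-way routing just described, the function below maps a classification confidence to a route. The threshold values 0.8 and 0.4 are placeholders chosen for illustration, not recommendations; they would need calibrating against your own review outcomes.

```python
from enum import Enum


class Route(Enum):
    NOTIFICATION_DRAFT = "notification_draft"  # high confidence
    HUMAN_REVIEW = "human_review"              # medium confidence, 4-hour SLA
    PATTERN_LOG = "pattern_log"                # low confidence, logged only


def route_by_confidence(
    confidence: float, high: float = 0.8, medium: float = 0.4
) -> Route:
    """Pick a route for a classified event based on its confidence score."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= high:
        return Route.NOTIFICATION_DRAFT
    if confidence >= medium:
        return Route.HUMAN_REVIEW
    return Route.PATTERN_LOG


print(route_by_confidence(0.91))  # Route.NOTIFICATION_DRAFT
print(route_by_confidence(0.55))  # Route.HUMAN_REVIEW
print(route_by_confidence(0.10))  # Route.PATTERN_LOG
```

Lowering the `medium` threshold is the cheapest way to trade more review work for fewer missed incidents.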
The overcapture problem is preferable to undercapture. A false positive requires a compliance officer to spend 15 minutes reviewing and dismissing an event. A false negative — a missed serious incident — is a statutory violation with potential penalties under Art.99.
What This Means for the August 2026 Deadline
The EU AI Act's general obligations for high-risk AI systems, including Art.73, apply from August 2, 2026. Teams that are still building their observability stacks have two months to deploy this pipeline. The minimum viable Art.73 implementation for the August deadline is:
- A harm signal detector that routes relevant events to human review (Stage 1)
- A classification checklist that compliance officers complete for each flagged event (Stage 2 manual variant)
- An evidence collection script that assembles the decision trace and affected user count on demand (Stage 3 manual trigger)
- A notification draft template pre-populated with system metadata (Stage 4 manual trigger)
- A 30-day tracking ticket that is automatically created when a serious incident is classified (Stage 5 partial)
The full automated pipeline described above is the target state. The minimum viable variant above is what you need to demonstrate compliance on August 2 — with a documented roadmap to full automation.
The next post in this series covers automated Annex IV technical documentation generation: how to maintain the required technical documentation set in a living system where models are retrained, data pipelines evolve, and conformity assessments must be kept current.
This post is part of the sota.io EU AI Act Compliance Automation Series. Previous: EU AI Act Post-Market Monitoring Automation: Building the Art.72 ML Observability Stack.