EU AI Act Regulatory Sandbox: Data Access, Personal Data Processing, and IP Protection (2026)
Post #1600 in the sota.io EU Compliance Series — EU AI Act Regulatory Sandbox 2026 #4/5
You have your regulatory sandbox approval. You have a testing protocol. Now you face the problem that actually kills sandbox projects before they deliver value: you cannot access the data you need to test your AI system under realistic conditions.
The EU AI Act regulatory sandbox is designed to let developers test high-risk AI systems under realistic conditions, with certain constraints relaxed under regulatory supervision. But "relaxed constraints" does not mean you can process any personal data you want. Specific legal instruments, and specific safeguards, determine what data you can access, how you can use it, and what must happen to it when your sandbox period ends.
This post covers the four data challenges every developer encounters inside a live EU AI Act sandbox: getting access to personal data for training and testing, maintaining GDPR compliance during that access, protecting your own trade secrets during national competent authority (NCA) oversight, and handling data correctly at sandbox exit.
This is the fourth post in our five-part series. Part one covered Art.57 sandbox fundamentals. Part two covered the application and development plan. Part three covered testing protocol and evidence generation. Here we focus on data and IP. Part five will be the complete developer checklist for the full sandbox lifecycle.
The Data Problem in AI Regulatory Sandboxes
High-risk AI systems under the EU AI Act share a common trait: they require large volumes of personal data to train, validate, and test properly. This holds for the Annex III use cases (recruitment algorithms, creditworthiness assessments, remote biometric identification) and for medical diagnostic AI, which typically becomes high-risk through the Annex I product legislation route rather than Annex III.
Outside a sandbox, accessing this data for development purposes creates immediate GDPR friction. The data was collected for one purpose (loan applications, hiring, medical treatment). Using it to train an AI system is typically a different purpose, requiring a separate legal basis under GDPR Art.6 and, for special categories, an Art.9(2) exception such as explicit consent. Most organizations cannot provide that legal basis at the volume developers need.
The sandbox changes this calculus. Article 59 of the EU AI Act creates a specific legal basis, otherwise unavailable, for processing personal data inside a sandbox. Understanding exactly what Art.59 permits, and what it does not, determines whether your sandbox project can access the data it needs.
Article 59: Further Processing of Personal Data
Art.59 of the EU AI Act addresses a specific scenario: personal data that was lawfully collected for another purpose is repurposed for developing, training, or testing an AI system within a regulatory sandbox.
Under GDPR, this "further processing" requires either compatibility with the original purpose (GDPR Art.6(4)) or a fresh legal basis. For special categories under GDPR Art.9 (health data, biometric data) and for other highly sensitive data such as financial vulnerability indicators, the compatibility analysis almost never succeeds for AI training purposes.
Art.59 creates the legal instrument that resolves this for sandbox participants. The key provisions:
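What it permits: Processing of personal data lawfully collected for other purposes, for the specific purpose of developing, training, or testing an AI system within an approved regulatory sandbox, where the conditions in Art.59(1) are met. One of those conditions is that the AI system is developed to safeguard a substantial public interest in a listed area, such as public safety and public health, environmental protection, energy sustainability, transport and critical infrastructure, or public administration and public services. Confirm with your NCA that your use case falls within one of those areas before planning around Art.59.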
Who it applies to: Both the AI developer and the data controller providing access to data. Both parties must be operating within the sandbox framework with NCA oversight.
Temporal limit: The processing authorization is coextensive with the sandbox period. When the sandbox ends, the Art.59 basis for further processing ends with it.
Scope limit: The data processed under Art.59 may only be used for the sandbox testing purpose. Model weights or parameters derived from the sandbox data cannot be transferred to production systems without a separate legal basis under GDPR.
The Four Mandatory Safeguards
Art.59 processing authorization is conditional on implementing four categories of safeguard. Each has practical implementation requirements.
1. Functional separation
The sandbox data environment must be technically isolated from the organization's production systems. This means separate compute instances, no shared credentials with production databases, and explicit access control lists that include only personnel working on the sandbox project.
Practically: create a dedicated sandbox environment (a separate Kubernetes namespace, a separate cloud project with its own IAM policies, or a physically separate on-premises installation). Document this separation in your sandbox technical documentation.
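The separation is easier to document if you can check it mechanically. The following is a minimal sketch, not a prescribed method: `EnvConfig`, its fields, and `separation_violations` are hypothetical names invented for illustration, and in practice you would populate them from your own cloud or cluster inventory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical description of one environment; field names are illustrative."""
    name: str
    compute_instances: frozenset[str]
    credential_ids: frozenset[str]
    authorized_users: frozenset[str] = field(default_factory=frozenset)


def separation_violations(sandbox: EnvConfig, production: EnvConfig,
                          sandbox_team: frozenset[str]) -> list[str]:
    """Return human-readable findings; an empty list means no overlap was found."""
    findings = []
    # Separate compute: no instance may appear in both environments.
    shared_compute = sandbox.compute_instances & production.compute_instances
    if shared_compute:
        findings.append(f"shared compute: {sorted(shared_compute)}")
    # No shared credentials with production databases.
    shared_creds = sandbox.credential_ids & production.credential_ids
    if shared_creds:
        findings.append(f"shared credentials: {sorted(shared_creds)}")
    # Access lists may include only personnel on the sandbox project.
    outsiders = sandbox.authorized_users - sandbox_team
    if outsiders:
        findings.append(f"users outside sandbox team: {sorted(outsiders)}")
    return findings
```

Running a check like this on a schedule and storing the output alongside your technical documentation gives you dated evidence that the isolation held throughout the sandbox period.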
2. Pseudonymisation of personal data
Where technically feasible, personal data used in sandbox testing must be pseudonymised before developer access. Art.59 does not require full anonymisation — which would often destroy the statistical properties the AI needs to learn from — but it requires the removal of direct identifiers and substitution with reversible pseudonyms.
The pseudonymisation key must be held by the data controller, not the AI developer. The developer receives data where names, national identification numbers, and other direct identifiers have been replaced with opaque tokens. The data controller retains the ability to reverse pseudonymisation if legally required.
Practically: implement pseudonymisation at the data extraction layer before data enters the sandbox environment. Use consistent pseudonym tokens across sessions so the AI can learn correlational patterns without accessing real identifiers. Log all access to pseudonymisation keys.
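One common way to get consistent tokens is a keyed hash (HMAC) computed on the data controller's side. The sketch below assumes that approach; it is not mandated by Art.59, and `ControllerPseudonymiser`, `strip_direct_identifiers`, and the field names `name` and `national_id` are hypothetical. Because an HMAC cannot be inverted, reversal here relies on a lookup table that the controller keeps next to the key.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class ControllerPseudonymiser:
    """Runs on the data controller's side; key and lookup table never enter the sandbox."""

    def __init__(self, key: Optional[bytes] = None):
        self._key = key or secrets.token_bytes(32)
        self._reverse: dict[str, str] = {}    # token -> original identifier
        self.key_access_log: list[str] = []   # one entry per use of the key

    def pseudonymise(self, identifier: str) -> str:
        """Same identifier and key always yield the same opaque token."""
        self.key_access_log.append("pseudonymise")
        digest = hmac.new(self._key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
        token = f"p_{digest[:32]}"
        self._reverse[token] = identifier
        return token

    def reidentify(self, token: str) -> str:
        """Reversal is only possible for the party holding the lookup table."""
        self.key_access_log.append("reidentify")
        return self._reverse[token]


def strip_direct_identifiers(record: dict, pseudonymiser: ControllerPseudonymiser,
                             id_fields=("name", "national_id")) -> dict:
    """Return a copy of the record with direct identifiers replaced by tokens."""
    out = dict(record)
    for field_name in id_fields:
        if field_name in out:
            out[field_name] = pseudonymiser.pseudonymise(str(out[field_name]))
    return out
```

Only the output of `strip_direct_identifiers` would cross into the sandbox. Note that this handles direct identifiers only; quasi-identifiers such as rare combinations of attributes need a separate assessment.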
3. No transfer outside the sandbox boundary
Personal data accessed under Art.59 authorization cannot be transferred, copied, or extracted from the sandbox environment to external systems. This applies to:
- Raw data files
- Model weights trained on personal data (these are transfers of derived information)
- Evaluation results that could allow re-identification
- Logs that contain personal data
The sandbox boundary must be technically enforced, not just procedurally required. Implement data loss prevention controls on outbound connections from the sandbox environment.
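The policy behind such a control is usually default-deny: outbound traffic is blocked unless the destination is itself inside the boundary. The sketch below shows only that decision logic, under the assumption of a host allowlist; the hostnames and the `egress_allowed` helper are hypothetical, and real enforcement belongs in network policy or a proxy rather than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: destinations that are themselves inside the sandbox boundary.
SANDBOX_INTERNAL_HOSTS = frozenset({"storage.sandbox.internal", "audit.sandbox.internal"})


def egress_allowed(url: str, allowed_hosts: frozenset = SANDBOX_INTERNAL_HOSTS) -> bool:
    """Default-deny: permit a connection only if its host is explicitly allowlisted."""
    host = urlsplit(url).hostname  # None when the URL has no scheme or no host part
    return host is not None and host.lower() in allowed_hosts
```

An exact-match allowlist is deliberately strict: a lookalike host such as `storage.sandbox.internal.example.com` is rejected because it is not literally in the set.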
4. NCA supervisory access
The national competent authority supervising your sandbox must have the ability to inspect the data processing at any time. This does not mean the NCA has access to the personal data itself — that would create its own GDPR compliance issues. It means the NCA can access logs, pseudonymisation records, access control configurations, and technical documentation to verify compliance.
Maintain an audit trail of all data access within the sandbox that is accessible to the NCA on request.
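An audit trail is more convincing to a supervisor if later tampering is detectable. One way to get that property, sketched here as an assumption rather than a requirement of the Act, is to chain entries by hash so that each record commits to the one before it. The function names and entry fields are illustrative.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(body: dict) -> str:
    """Hash a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, user: str, action: str, resource: str) -> dict:
    """Append an access record that commits to the hash of the previous record."""
    body = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**body, "hash": _entry_hash(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev or _entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Log resources by pseudonymous dataset or table identifier rather than by data subject, so that the trail you expose to the NCA does not itself contain personal data.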
What Data Can You Actually Access?
The practical question developers ask most often: which data sources can I get authorization to access under Art.59?
The answer depends on three variables: (1) who holds the original data and under what original legal basis it was collected, (2) whether that data controller is willing to enter into the sandbox data-sharing arrangement, and (3) whether your NCA has established the required supervision framework.
Public sector data sets
National competent authorities can facilitate access to public sector data held by government entities (tax records, benefit claim data, public health surveillance data) for sandbox participants. This is the primary intended use case for Art.59. If your AI system needs labeled training data for creditworthiness assessment or employment screening, the NCA can coordinate access to aggregated public sector datasets under the sandbox framework.
The NCA acts as the intermediary. It negotiates the data sharing agreement with the public body, establishes the pseudonymisation protocol, and provides you with access to the data within the controlled sandbox environment.
Timeline: This process typically takes 4-8 weeks from sandbox approval to data access. Factor this into your sandbox development plan.
Private sector data sharing
Art.59 also covers personal data held by private sector entities — banks, hospitals, insurers — who agree to make it available for sandbox testing. This is voluntary on the data controller's part, but some controllers (particularly those in regulated industries with innovation mandates) will participate.
The data controller remains the GDPR data controller throughout. They determine pseudonymisation implementation, set boundaries on what attributes are available, and can terminate data access at any time. The sandbox framework gives them the legal basis (Art.59) to share — it does not compel them to share.
Synthetic data generation
For many high-risk AI domains, the practical path to sufficient training data is synthetic data generation from a real data seed. Art.59 permits this approach: access a pseudonymised real dataset under sandbox conditions, use it to train a generative model, generate a synthetic dataset that mirrors the statistical properties of the real data, then use the synthetic dataset for the bulk of your training.
The synthetic dataset has a different legal status than the real data. If it has been generated with sufficient differential privacy guarantees that re-identification of any real individual is not technically feasible, it may fall outside GDPR scope entirely. Your DPA (data protection authority) can provide guidance on the threshold — this analysis should involve your data protection officer.
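To make "differential privacy guarantees" concrete, here is the simplest building block, the Laplace mechanism applied to a histogram. This is an illustrative sketch only: it is not a synthetic data generator, the function names are hypothetical, and a production system should use a vetted differential privacy library rather than hand-rolled sampling. The noise scale is sensitivity divided by epsilon, and the sensitivity is 1 because each individual contributes to exactly one bin.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(values, bins, epsilon: float, rng: random.Random) -> dict:
    """Noisy count per bin; smaller epsilon means more noise and stronger privacy.

    `bins` must be fixed in advance, not derived from the data, otherwise the
    set of bins itself leaks information.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = Counter(values)
    return {b: counts.get(b, 0) + laplace_noise(1.0 / epsilon, rng) for b in bins}


# Example: release noisy counts of a categorical attribute.
rng = random.Random()
noisy = dp_histogram(["approved", "approved", "declined"],
                     bins=["approved", "declined", "referred"],
                     epsilon=1.0, rng=rng)
```

The epsilon value you can defend is exactly the kind of threshold question to raise with your DPA and data protection officer.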
Protecting Your Intellectual Property During NCA Oversight
Sandbox supervision creates a genuine IP risk that developers sometimes overlook: the national competent authority has access to your AI system's architecture, training approach, and technical documentation. In principle, NCA staff could observe novel techniques that constitute trade secrets.
The EU AI Act addresses this explicitly. Art.57 includes provisions requiring NCAs to protect the confidentiality of information shared by sandbox participants. Specifically:
- NCAs are bound by the confidentiality obligations set out in Art.78 of the EU AI Act, which apply to authorities and any other persons involved in applying the Regulation
- Technical documentation shared with the NCA during sandbox supervision cannot be disclosed to third parties or other sandbox participants
- NCAs are required to maintain secure handling procedures for confidential business information
Despite these protections, practical risk management within the sandbox is warranted.
What to share versus what to withhold
The sandbox requires enough technical transparency to allow the NCA to supervise compliance with AI Act requirements. It does not require disclosure of every proprietary aspect of your system.
Required disclosure: Model architecture at the level necessary to assess compliance with Art.9 (risk management system), Art.10 (training data governance), Art.13 (transparency), and Art.17 (quality management). The NCA needs to understand what your system does, how it makes decisions, and what safeguards exist — not the full model specification.
Not required: Source code beyond what is necessary to verify compliance; proprietary loss functions or training recipes; hyperparameter configurations beyond what affects the compliance analysis; third-party components covered by their own licenses.
Structuring your sandbox documentation for IP protection
Prepare two layers of technical documentation:
Layer 1 — Regulatory disclosure package: Contains everything the NCA needs to supervise compliance. Describes the system's function, risk management approach, training data sources, transparency mechanisms, and conformity assessment evidence. This is shared fully with the NCA.
Layer 2 — Technical implementation detail: Contains proprietary specifics that do not affect the compliance analysis. This layer is available to the NCA on specific request, not as part of standard supervision.
Document this two-layer approach in your sandbox agreement with the NCA upfront. Most NCAs will accept this structure because they are focused on compliance, not on understanding proprietary implementation details.
Watermarking and versioning
Before submitting any model artifacts to sandbox evaluation, implement internal watermarking or version fingerprinting. This serves two purposes: it allows you to detect if your model appears in unexpected contexts, and it provides evidence in any trade secret dispute about provenance.
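For the version fingerprinting half of this, a cryptographic hash of each artifact is usually enough. The sketch below assumes model artifacts are files on disk; `fingerprint_artifact` and `build_manifest` are hypothetical helper names. A file hash proves that a given copy is byte-identical to what you submitted, but it is not a watermark: it does not survive fine-tuning or re-serialization, so behavioural watermarking is a separate technique.

```python
import hashlib
from pathlib import Path


def fingerprint_artifact(path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large weight files fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths, version: str) -> dict:
    """Map each artifact path to its fingerprint under one version label."""
    return {
        "version": version,
        "artifacts": {str(p): fingerprint_artifact(p) for p in sorted(map(Path, paths))},
    }
```

Store the manifest with a trusted timestamp before the first NCA review so that it can serve as provenance evidence later.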
Data Handling at Sandbox Exit
When your sandbox period ends — whether through planned completion, early termination, or transition to full deployment — you face specific data handling obligations.
What must be deleted
All personal data accessed under Art.59 authorization must be deleted from the sandbox environment at exit. This includes:
- Raw and pseudonymised training data
- Evaluation datasets
- Intermediate checkpoints that encode personal data
- Logs that contain personal data
Document the deletion with certificates that include timestamps, data categories deleted, and confirmation from the data controller.
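A deletion certificate can be a small structured record. The sketch below covers the three elements named above (timestamp, data categories, controller confirmation); the `DeletionCertificate` type, its field names, and the JSON layout are assumptions for illustration, not a format prescribed by the Act or by any NCA.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DeletionCertificate:
    """Hypothetical record layout; adapt the fields to what your NCA expects."""
    sandbox_id: str
    data_categories: tuple
    deleted_at: str               # ISO 8601 timestamp with UTC offset
    controller_confirmed_by: str  # who at the data controller confirmed deletion


def issue_certificate(sandbox_id: str, data_categories, controller_confirmed_by: str,
                      deleted_at: Optional[datetime] = None) -> DeletionCertificate:
    """Validate the inputs and build an immutable certificate."""
    if not data_categories:
        raise ValueError("at least one data category is required")
    if not controller_confirmed_by:
        raise ValueError("controller confirmation is required")
    ts = deleted_at or datetime.now(timezone.utc)
    if ts.tzinfo is None:
        raise ValueError("deleted_at must be timezone-aware")
    return DeletionCertificate(sandbox_id, tuple(data_categories),
                               ts.isoformat(), controller_confirmed_by)


def to_json(cert: DeletionCertificate) -> str:
    """Serialize for archiving alongside the sandbox compliance documentation."""
    return json.dumps(asdict(cert), indent=2, sort_keys=True)
```

Issuing one certificate per data source, rather than one for the whole sandbox, makes it easier to match each certificate to the data sharing agreement it closes out.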
What model state carries legal risk
Model weights trained on personal data exist in a legal grey zone. Regulators and courts have not definitively resolved whether a trained model constitutes "personal data" under GDPR. The precautionary approach: if your model was trained primarily on Art.59-authorized personal data, and you cannot demonstrate that the model is not a means of re-identifying any data subject, treat the model weights as personal data for GDPR purposes.
The practical implication: if you intend to use models trained in the sandbox in production, you need a GDPR-compliant production training pipeline that does not rely on Art.59 authorization. The sandbox gives you a model architecture and validated approach; it does not give you a production model trained on personal data you would not otherwise be able to use.
Synthetic datasets as exit artifacts
Synthetic datasets generated from Art.59-authorized data may be retainable post-sandbox if they meet differential privacy standards sufficient to prevent re-identification. Obtain a written opinion from your DPA before treating any synthetic dataset as free from Art.59 restrictions.
The sandbox final report and data appendix
Art.57 provides for an exit report at sandbox completion. The data appendix you prepare for this report should document:
- All data sources accessed and under what authorization
- Pseudonymisation protocols implemented
- Data retention and deletion actions at exit
- Any synthetic datasets generated and their privacy analysis
- A statement of compliance with Art.59 requirements throughout the sandbox period
This documentation is your evidence of lawful processing during the sandbox. It also feeds directly into the Annex IV technical documentation package for your eventual conformity assessment.
Coordination With Your Data Protection Authority
Art.59 processing does not exempt you from DPA oversight. National competent authorities conducting AI sandbox supervision are required to cooperate with data protection authorities when sandbox activities involve personal data processing.
Practically: notify your national DPA of your sandbox participation and the data processing activities you intend to conduct. Many DPAs have issued guidance on Art.59 implementation that provides country-specific requirements beyond the baseline EU AI Act text.
In Germany (BfDI), France (CNIL), and the Netherlands (AP), DPAs have published sandbox-specific guidance that includes requirements for data protection impact assessments (DPIAs) conducted specifically for sandbox activities. Even if a DPIA is not strictly required under GDPR Art.35 for your project, conducting one for sandbox data processing demonstrates good faith and provides documentation that protects you during the NCA compliance review.
A Practical Data Access Checklist for Sandbox Participants
Work through this checklist at each phase of sandbox data access:
Pre-sandbox:
- Identify data sources needed for realistic testing
- Determine whether Art.59 applies (personal data collected for other purposes)
- Contact NCA about public sector data access coordination
- Identify private sector data controllers willing to participate
- Evaluate synthetic data generation as a complement to real data
- Consult DPA on country-specific Art.59 implementation requirements
- Draft data sharing agreements with participating data controllers
Technical setup:
- Deploy functionally separated sandbox environment with documented isolation
- Implement pseudonymisation pipeline with key held by data controller
- Configure DLP controls on sandbox boundary
- Establish NCA audit log access
- Implement internal model watermarking before any NCA review
During sandbox:
- Log all data access with timestamps and user identifiers
- Document pseudonymisation key management operations
- Maintain two-layer documentation (regulatory disclosure + proprietary detail)
- Track all data sources and volumes accessed under Art.59
At sandbox exit:
- Delete all Art.59-authorized personal data with deletion certificates
- Assess model weights for personal data re-identification risk
- Obtain DPA opinion on synthetic dataset portability
- Complete data appendix to final sandbox report
- Archive compliance documentation for post-sandbox conformity assessment
What This Means for Your Infrastructure Choice
The sandbox data requirements have a specific implication for where you run your sandbox environment: it needs to stay within EU jurisdiction.
Personal data accessed under Art.59 authorization from a national competent authority is subject to GDPR. Processing it on infrastructure subject to third-country government access orders, including US CLOUD Act jurisdiction, creates a conflict with GDPR's confidentiality requirements and its Chapter V rules on transfers and disclosures to third countries.
If your sandbox data includes health records, financial data, or biometric data coordinated by a German, French, or Dutch NCA, the expectation is that this data does not flow to infrastructure where a foreign government could access it via extraterritorial legal process.
This is where the infrastructure choices discussed in our Art.12/Art.19 record-keeping guide and our Art.9 risk management documentation guide become directly relevant to sandbox operations. EU-hosted infrastructure with no US-parent exposure is not just a preference for sandbox work — it may be the only technically compliant option when the NCA coordinates access to sensitive personal data categories.
Looking Ahead: The Sandbox Exit and Transition to Full Deployment
Post #5/5 in this series will cover the complete sandbox exit process: the Art.57 final report requirements, how to convert sandbox evidence into your full conformity assessment package, what post-sandbox monitoring obligations look like, and the complete developer checklist for the entire sandbox lifecycle from application to market access.
The data and IP framework covered in this post feeds directly into the exit report. If you have documented your Art.59 processing, maintained the required safeguards, and produced the data appendix described above, the exit documentation becomes a matter of assembly rather than reconstruction.
Part of the sota.io EU AI Act Regulatory Sandbox 2026 series: Developer Guide #1 · Application in Practice #2 · Testing Protocol #3 · Data Access & IP Protection #4 · Checklist Finale #5 (coming)