Runtime Evidence as the New Architecture for EU AI Act Compliance

This analysis examines how AI governance is evolving from static compliance materials, such as policies, principles, and internal guidance documents, toward evidence-based compliance that can be tested at runtime. The specific question is whether the chain intent → control → evidence → approval → baseline is a legally useful way to organize compliance under the EU AI Act.

The issue matters because modern AI systems are not governed only at the moment a policy is written. They are deployed, updated, monitored, reconfigured, and used in changing factual contexts. A legal compliance program therefore needs to show not only that an organization adopted the right rules on paper, but also that the AI system actually operated within those rules.

In practical terms, this means being able to answer questions such as: What was the system approved to do? What risks were identified? What controls were put in place? What logs or records show how the system behaved? Who reviewed or approved the system’s use? What version, dataset, model, or configuration was treated as the approved baseline?

Assumptions

This analysis assumes the relevant systems may be high-risk AI systems under Regulation (EU) 2024/1689. “Approval” is used broadly to include conformity assessment, EU declaration, registration, deployer impact assessment or notification, internal accountable sign-off, and human review where required; not every AI use case requires the same approval mechanism.

The EU AI Act applies on a risk-based basis and can reach actors inside and outside the EU where AI systems or general-purpose AI models are placed on the EU market, put into service, or used in the EU under the Act’s scope rules.

Summary

The proposed chain intent → control → evidence → approval → baseline is a strong compliance architecture for high-risk AI systems under the EU AI Act. The Act does not reduce compliance to a policy manual; it requires lifecycle risk management, technical documentation, logging, quality-management controls, human oversight, accuracy/robustness/cybersecurity measures, conformity assessment, and deployer monitoring.

Intent maps to the legally defined intended purpose, reasonably foreseeable misuse, deployment context, affected persons, and, where applicable, the fundamental-rights impact assessment (FRIA). Article 9 requires a risk management system across the lifecycle, including risks in intended use and reasonably foreseeable misuse; Article 27 requires certain deployers to assess use context, affected categories of natural persons, harm risks, oversight, and mitigation.

Control maps to Article 9 risk control measures, Article 14 human oversight, Article 15 accuracy/robustness/cybersecurity, and Article 17 quality-management system (QMS) procedures. The legally relevant control layer must be technical and organizational, not merely declaratory.

Evidence maps most directly to Article 11 technical documentation, Article 12 automatic logs, Article 18 documentation retention, Article 19 provider log retention, Article 26 deployer log retention and monitoring, and Article 17 QMS recordkeeping. Article 12 is the key traceability provision: high-risk systems must technically enable automatic event logging throughout their lifetime.

Approval maps to conformity assessment, EU declaration of conformity, CE marking/registration where applicable, notified-body review for certain systems, deployer FRIA and notification duties, and human oversight decisions. The correct approval point depends on whether the actor is a provider, deployer, product manufacturer, importer, distributor, public authority, or sector-regulated entity.

Baseline maps to the conformity-assessed system version, Article 13 instructions and declared performance characteristics, Article 15 accuracy/robustness/cybersecurity metrics, Article 17 QMS controls, and change-management evidence. A defensible baseline should include system version, model/data/software versioning, intended purpose, approved use conditions, test metrics, known limitations, oversight protocol, logging schema, and change thresholds.
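The chain can be treated as a per-system record in which every stage points to stored artifacts. The minimal Python sketch below assumes that representation. The class `ComplianceChain`, the stage labels, and the artifact names are hypothetical, and an empty stage only flags a gap in the evidence file, not a legal conclusion about compliance.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The five stages of the chain, in order.
STAGES = ("intent", "control", "evidence", "approval", "baseline")


@dataclass
class ComplianceChain:
    """Hypothetical per-system record mapping each stage to artifact references."""
    system_id: str
    artifacts: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, stage: str, artifact_ref: str) -> None:
        """Attach an artifact reference (document, log export, sign-off) to a stage."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.artifacts.setdefault(stage, []).append(artifact_ref)

    def gaps(self) -> List[str]:
        """Return stages with no supporting artifact, in chain order."""
        return [s for s in STAGES if not self.artifacts.get(s)]


chain = ComplianceChain("system-001")
chain.add("intent", "intended-purpose-statement.pdf")
chain.add("control", "human-oversight-procedure.md")
print(chain.gaps())  # ['evidence', 'approval', 'baseline']
```

Keeping the stages in a fixed order means a gap report reads the same way the chain is argued: a missing approval is visible even when evidence exists.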

Does the EU AI Act support the move from policy documents to runtime evidence?

Conclusion. Yes, for high-risk AI systems. The EU AI Act’s structure makes runtime evidence legally important because the core duties are lifecycle duties: risk management, logging, documentation, monitoring, human oversight, robustness, incident handling, and authority-facing demonstrability.

Rule. Article 9 requires a risk-management system that is established, implemented, documented, maintained, and run as a continuous lifecycle process. The required process includes identifying and analyzing known and reasonably foreseeable risks, evaluating risks based on post-market monitoring data, and adopting targeted risk-management measures.

Article 11 requires technical documentation to be drawn up before a high-risk AI system is placed on the market or put into service, kept up to date, and sufficient to demonstrate compliance with the relevant requirements. Article 12 then requires high-risk AI systems to technically allow the automatic recording of events (logs) over the system's lifetime, with logging capabilities enabling a level of traceability appropriate to the system's intended purpose.
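Article 12 is framed as a capability requirement rather than a file format. As one illustration, the sketch below assumes a JSON-lines record that carries the system version, so each event can later be compared with the approved baseline. The `EventLogRecord` schema, its field names, and the example values are hypothetical. Which events a given system must capture depends on its intended purpose.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class EventLogRecord:
    """Hypothetical schema for one automatically recorded event."""
    system_id: str
    system_version: str      # links the event to an approved baseline version
    event_type: str          # e.g. "inference", "human_override", "stop"
    timestamp_utc: str       # ISO 8601 timestamp in UTC
    details: Dict[str, Any]


def record_event(system_id: str, system_version: str, event_type: str,
                 details: Dict[str, Any]) -> str:
    """Build one event and serialize it as a single JSON line.

    Writing the line to durable, access-controlled storage is out of scope here.
    """
    rec = EventLogRecord(
        system_id=system_id,
        system_version=system_version,
        event_type=event_type,
        timestamp_utc=datetime.now(timezone.utc).isoformat(),
        details=details,
    )
    return json.dumps(asdict(rec), sort_keys=True)


line = record_event("system-001", "2.3.1", "inference", {"output_id": "abc123"})
print(line)
```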

Article 17 requires providers to maintain a documented quality-management system that includes regulatory compliance strategy, design/development/testing controls, data management, risk management, post-market monitoring, serious-incident procedures, recordkeeping, resource management, and accountability allocation. Articles 18 and 19 require preservation of documentation and provider-controlled logs, with Article 19 setting at least a six-month floor for logs under the provider’s control unless other applicable law requires otherwise.
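The retention floor lends itself to a simple date calculation. The sketch below rests on three assumptions. First, it counts six calendar months. Second, a period judged appropriate to the intended purpose can only lengthen that floor. Third, where other applicable Union or national law fixes a period, that period is supplied as an input and replaces the computed one; whether and how other law displaces the floor is a legal question the code does not answer. The function and parameter names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional

MIN_RETENTION_MONTHS = 6  # floor for provider-controlled logs under Article 19


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def earliest_deletion_date(
    logged_on: date,
    purpose_months: int = MIN_RETENTION_MONTHS,
    other_law_months: Optional[int] = None,
) -> date:
    """Earliest date a log may be deleted under the assumptions stated above.

    purpose_months: period judged appropriate to the intended purpose; it can
    only lengthen the six-month floor.
    other_law_months: hypothetical input for a period set by other applicable
    law; when supplied, it replaces the computed period.
    """
    if other_law_months is not None:
        months = other_law_months
    else:
        months = max(MIN_RETENTION_MONTHS, purpose_months)
    return add_months(logged_on, months)


print(earliest_deletion_date(date(2025, 8, 31)))  # 2026-02-28
```

Clamping to the month end avoids invalid dates such as 31 February while never shortening the period below the stated number of months.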

Application. A governance program consisting only of static AI policies would not, by itself, evidence whether a system operated within approved use conditions, whether human oversight occurred, whether logs captured material events, whether accuracy and robustness thresholds held after deployment, whether inputs remained relevant and representative, or whether risk controls were updated after incidents or drift. The Act’s evidentiary center of gravity is therefore not “we have a policy,” but “we can prove, for this system and this deployment, that the intended purpose, controls, runtime behavior, approvals, monitoring, and version baseline are aligned.”

How does the chain “intent → control → evidence → approval → baseline” map to the AI Act?

Provider and deployer obligations require different evidence chains

Conclusion. The chain should be role-specific. Providers and deployers have overlapping but distinct obligations, and the evidence architecture should identify which party owns each artifact.

Provider-side rule and application. Article 16 requires providers of high-risk AI systems to ensure compliance with the relevant high-risk requirements, maintain a QMS, keep documentation, keep automatically generated logs under their control, undergo conformity assessment, draw up the EU declaration of conformity, affix CE marking where applicable, comply with registration duties, take corrective action where necessary, and demonstrate conformity on request.

For providers, the chain should therefore operate as a pre-market and lifecycle technical file. Intent is the intended purpose and classification basis. Control is the design, data, testing, oversight, robustness, and cybersecurity control set. Evidence is the technical file, QMS records, test evidence, and logs. Approval is conformity assessment and EU declaration. Baseline is the conformity-assessed version and permitted change envelope.

Deployer-side rule and application. Article 26 requires deployers to use high-risk systems according to instructions, assign human oversight to competent and properly supported natural persons, ensure input data are relevant and sufficiently representative where the deployer controls input data, monitor operation, suspend use and notify where risk is detected, report serious incidents, and keep logs under their control for at least six months unless other applicable law requires otherwise.

For deployers, the chain should operate as a runtime governance record. Intent is the actual business or public-sector use case. Control is the deployment configuration, access rights, oversight process, input-data checks, and escalation process. Evidence is the runtime log, human review record, input-data audit, monitoring record, and incident report. Approval is the deployment approval and, where applicable, FRIA completion and notification. Baseline is the approved deployment configuration and operating conditions.
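The provider and deployer chains described above can be expressed as a lookup from role and stage to expected artifacts, which makes ownership gaps visible. The sketch assumes artifact lists distilled from those two chains. The `OWNERSHIP` table and the `missing_artifacts` helper are hypothetical, and real allocations will depend on contracts and on which party actually controls logs and input data.

```python
from typing import Dict, List

# Hypothetical artifact map distilled from the provider and deployer chains.
OWNERSHIP: Dict[str, Dict[str, List[str]]] = {
    "provider": {
        "intent": ["intended purpose", "classification basis"],
        "control": ["design/data/testing controls", "oversight design",
                    "robustness and cybersecurity measures"],
        "evidence": ["technical file", "QMS records", "test evidence",
                     "provider-controlled logs"],
        "approval": ["conformity assessment", "EU declaration of conformity"],
        "baseline": ["conformity-assessed version", "permitted change envelope"],
    },
    "deployer": {
        "intent": ["actual use case"],
        "control": ["deployment configuration", "access rights", "oversight process",
                    "input-data checks", "escalation process"],
        "evidence": ["runtime logs", "human review records", "input-data audit",
                     "monitoring records", "incident reports"],
        "approval": ["deployment approval", "FRIA and notification (where applicable)"],
        "baseline": ["approved deployment configuration", "operating conditions"],
    },
}


def missing_artifacts(role: str, held: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Per stage, list the role's expected artifacts that are not yet held."""
    result: Dict[str, List[str]] = {}
    for stage, expected in OWNERSHIP[role].items():
        have = set(held.get(stage, []))
        missing = [item for item in expected if item not in have]
        if missing:
            result[stage] = missing
    return result


held = {"approval": ["deployment approval", "FRIA and notification (where applicable)"]}
print(sorted(missing_artifacts("deployer", held)))  # ['baseline', 'control', 'evidence', 'intent']
```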

Human oversight and approval cannot be reduced to rubber-stamping

Conclusion. The “approval” stage must be substantive where human oversight is legally relevant. The Act requires that natural persons be able to understand, monitor, interpret, and intervene; mere formal approval without operational capability would be weak evidence.

Rule. Article 14 requires high-risk systems to be designed and developed with appropriate human-machine interface tools so they can be effectively overseen by natural persons. The oversight measures must allow assigned humans to understand capacities and limitations, monitor operation, detect anomalies, avoid automation bias, interpret outputs, decide not to use or to override outputs, and intervene or interrupt the system through a stop procedure where appropriate.

Article 26 separately requires deployers to assign human oversight to natural persons who have the necessary competence, training, authority, and support.
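Article 14 describes capabilities, not an implementation. A minimal sketch follows. It assumes that every output passes through a gate where an assigned reviewer accepts it, overrides it, or stops the system. `OversightGate`, `Decision`, and the audit tuple format are hypothetical. The audit list is the kind of record that could later evidence that oversight actually occurred.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Decision(Enum):
    ACCEPT = "accept"
    OVERRIDE = "override"
    STOP = "stop"


class OversightGate:
    """Hypothetical wrapper: no output takes effect until a named reviewer decides."""

    def __init__(self) -> None:
        self.stopped = False
        self.audit: List[Tuple[str, str, str, Optional[str]]] = []

    def review(self, reviewer: str, output: str, decision: Decision,
               replacement: Optional[str] = None) -> Optional[str]:
        """Record the reviewer's decision and return the output that takes effect."""
        if self.stopped:
            raise RuntimeError("system has been interrupted through the stop procedure")
        if decision is Decision.OVERRIDE and replacement is None:
            raise ValueError("an override must state the replacement outcome")
        self.audit.append((reviewer, output, decision.value, replacement))
        if decision is Decision.STOP:
            self.stopped = True   # later outputs are blocked until operation is restarted
            return None
        if decision is Decision.OVERRIDE:
            return replacement
        return output
```

For example, `OversightGate().review("reviewer-1", "reject application", Decision.OVERRIDE, "refer to caseworker")` returns the reviewer's outcome rather than the system's. Restarting after a stop is deliberately left out, because who may restart and on what basis is itself an oversight decision.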

Application. A compliant approval chain should not merely record that a manager clicked “approve.” It should show the basis for approval: the intended purpose, the applicable baseline, known limitations, human oversight instructions, risk controls, escalation paths, and evidence reviewed. For systems used in sensitive contexts such as employment, education, biometric identification, migration, law enforcement, or access to essential services, approval evidence should also show that oversight personnel had authority to override, stop, or escalate system outputs where required.
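One way to make substantive approval checkable is to flag any approval record that lacks the basis described above. The sketch assumes a hypothetical `ApprovalRecord` with a reference field for each element and a hypothetical set of sensitive-context labels. It tests only whether the basis was recorded, not whether the approver's judgment was sound.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical labels for the sensitive contexts named in the text.
SENSITIVE_CONTEXTS = {
    "employment", "education", "biometric identification",
    "migration", "law enforcement", "essential services",
}

# Reference fields that must be filled for the approval to have a recorded basis.
REQUIRED_REFS = ("intended_purpose_ref", "baseline_version",
                 "oversight_instructions_ref", "risk_controls_ref", "escalation_path_ref")


@dataclass
class ApprovalRecord:
    approver: str
    context: str
    intended_purpose_ref: str = ""
    baseline_version: str = ""
    oversight_instructions_ref: str = ""
    risk_controls_ref: str = ""
    escalation_path_ref: str = ""
    evidence_reviewed: List[str] = field(default_factory=list)
    known_limitations_acknowledged: bool = False
    override_authority_confirmed: bool = False


def approval_issues(rec: ApprovalRecord) -> List[str]:
    """Return reasons the record looks like a formal click rather than a reasoned decision."""
    issues = [f"missing {name}" for name in REQUIRED_REFS if not getattr(rec, name)]
    if not rec.evidence_reviewed:
        issues.append("no evidence recorded as reviewed")
    if not rec.known_limitations_acknowledged:
        issues.append("known limitations not acknowledged")
    if rec.context in SENSITIVE_CONTEXTS and not rec.override_authority_confirmed:
        issues.append("override/stop authority not confirmed for a sensitive context")
    return issues
```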

Robustness, cybersecurity, and baseline management require runtime comparison against an approved state

Conclusion. The “baseline” stage is legally important because Article 15 requires lifecycle performance, resilience, and cybersecurity; those requirements are difficult to prove without a versioned baseline and runtime comparison evidence.

Rule. Article 15 requires high-risk AI systems to be designed and developed so they achieve appropriate accuracy, robustness, and cybersecurity and perform consistently throughout their lifecycle. The instructions must declare relevant accuracy metrics. The system must be resilient to errors, faults, inconsistencies, and unauthorized attempts to alter use, outputs, or performance.

Article 13 also requires instructions for use to include characteristics, capabilities, limitations, risks, accuracy/robustness/cybersecurity information, input-data specifications, information needed to interpret outputs, human oversight measures, and mechanisms for collecting, storing, and interpreting logs.

Application. A legally useful baseline should include, at minimum, the approved model and software version, intended purpose, operating environment, permitted users, input-data assumptions, declared performance metrics, robustness/cybersecurity assumptions, human oversight configuration, logging specification, and change-control thresholds. Runtime evidence should then show whether actual operation remained within that baseline. Deviations should trigger review, corrective action, suspension, or new approval depending on materiality and risk.
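The comparison between runtime operation and the approved baseline can be sketched as a function that maps deviations to the responses listed above. The `Baseline` fields, the 0.02 review margin, and the ranking that treats suspension as more severe than a new approval are assumptions for illustration. Real materiality thresholds would come from the risk assessment and the change-control policy.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Assumed severity ranking of responses; higher means more serious.
SEVERITY = {"within_baseline": 0, "review": 1, "new_approval": 2, "suspend": 3}


@dataclass(frozen=True)
class Baseline:
    """Hypothetical approved state; metric floors stand in for declared accuracy metrics."""
    system_version: str
    metric_floors: Dict[str, float]          # e.g. {"accuracy": 0.90}
    permitted_user_groups: FrozenSet[str]
    review_margin: float = 0.02              # assumed threshold between minor and material shortfall


def assess_runtime(baseline: Baseline, version: str,
                   metrics: Dict[str, float], user_group: str) -> str:
    """Return the most severe response triggered by observed operation."""
    findings = ["within_baseline"]
    if version != baseline.system_version:
        findings.append("new_approval")      # running a version that was never approved
    if user_group not in baseline.permitted_user_groups:
        findings.append("suspend")           # use outside approved conditions
    for name, floor in baseline.metric_floors.items():
        observed = metrics.get(name)
        if observed is None:
            findings.append("review")        # missing evidence is itself a deviation
        elif floor - observed > baseline.review_margin:
            findings.append("suspend")       # material shortfall
        elif observed < floor:
            findings.append("review")        # minor shortfall
    return max(findings, key=SEVERITY.__getitem__)


b = Baseline("2.3.1", {"accuracy": 0.90}, frozenset({"trained_reviewers"}))
print(assess_runtime(b, "2.3.1", {"accuracy": 0.89}, "trained_reviewers"))  # review
```

Taking the most severe finding, rather than stopping at the first, ensures that a minor metric shortfall cannot mask use by an unapproved user group.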

Limits and counterarguments

The chain should not be overstated. First, the full set of Articles 9–15 high-risk obligations applies only where the AI system is high-risk or otherwise caught by the relevant provisions. Non-high-risk systems, general-purpose AI (GPAI) models, and transparency-only systems may be governed by different duties. Official Commission Q&A materials describe high-risk classification by reference to systems that are safety components of regulated products or fall into specified Annex III use areas.

Second, runtime evidence must be reconciled with data-protection and sectoral retention law. Articles 19 and 26 expressly preserve other applicable Union or national law, especially personal-data law, when setting log-retention periods. More logging is not automatically better if it creates unlawful or excessive personal-data processing.
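One data-minimization technique consistent with this point is to allowlist the fields a log may contain, so personal data not needed for traceability is dropped before storage. The field names below are hypothetical. Deciding which fields are necessary is a legal and technical judgment the code does not make, and an allowlist does not by itself establish a lawful basis for processing.

```python
from typing import Dict, Iterable

# Hypothetical allowlist: fields judged necessary for traceability of this system.
ALLOWED_FIELDS = frozenset({
    "system_id", "system_version", "event_type", "timestamp_utc", "output_id",
})


def minimize(record: Dict[str, object],
             allowed: Iterable[str] = ALLOWED_FIELDS) -> Dict[str, object]:
    """Keep only allowlisted fields; anything else is dropped before the record is stored."""
    allowed_set = set(allowed)
    return {key: value for key, value in record.items() if key in allowed_set}


raw = {"system_id": "system-001", "event_type": "inference",
       "applicant_name": "Jane Doe", "output_id": "abc123"}
print(minimize(raw))  # {'system_id': 'system-001', 'event_type': 'inference', 'output_id': 'abc123'}
```

An allowlist fails closed: a new field added upstream is not logged until someone decides it is needed, which is easier to defend than a blocklist that must anticipate every personal-data field.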

Third, implementation timing still matters. Commission materials state that the AI Act entered into force on 2024-08-01 and describe staged applicability, while 2026 Digital Omnibus materials record a political agreement changing the timing of certain high-risk obligations. The Parliament's release states that the provisional agreement needs formal adoption before becoming law, so its final status and publication in the Official Journal should be verified for any date-sensitive compliance plan.

