Aikido

AI Pentesting for Compliance

Written by
Jens Gellynck

Can autonomous pentests meet compliance requirements?

For two decades, “penetration testing” has meant the same thing: once a year, you hire a firm, a human tester spends a week or two on your systems, and you get a PDF. Most compliance frameworks were written around exactly that ritual: a slow, manual, point-in-time engagement.

Software doesn’t ship once a year anymore. It ships many times a day. The annual pentest was already a snapshot that went stale the moment it was signed; in a world of continuous deployment it is a snapshot of a system that no longer exists.

Aikido’s AI penetration testing is built for that reality: continuous, autonomous testing that runs against your applications and APIs as they evolve, and produces a report you can hand to an auditor. The question from any CISO or compliance lead is: will that report be accepted for compliance?

The short answer is “for most frameworks, yes, and for a few that require an accredited human, no.” This article walks through major compliance frameworks and gives a clear verdict.

How Aikido’s autonomous pentest works

Aikido runs a fleet of AI agents through the same phases a human pentester follows:

  1. Scope and ownership validation. Targets are defined up front and split into attackable and accessible. To avoid abuse, we verify you own the assets in scope (e.g. via DNS record verification) before any test traffic is sent.
  2. Enumeration and threat modeling. Agents map every feature, endpoint, and API in scope. In white-box engagements the codebase itself is the enumeration source and the agents build a recon report and an attack plan. In grey/black-box engagements they rely on crawling, fuzzing and wordlist generation.
  3. Vulnerability identification and exploitation. Agents actively test for and exploit weaknesses, but only as far as needed to prove impact. They do not attempt to gain persistent access to your systems or cause damage: no data destruction, no denial of service, no lateral movement into infrastructure, no backdoors.
  4. Validation. Dedicated validator agents reproduce each finding to confirm it is real and exploitable, which is what separates a penetration test from a vulnerability scanner. Once the pentest is finished, all data and artifacts generated during the assessment are removed from our systems.
  5. Reporting. You get an executive summary plus full technical detail: description, severity, impact assessment, reproduction steps and remediation advice for every confirmed finding.
  6. Remediation. Instead of just leaving you with a to-do list, Aikido’s AI AutoFix automatically generates pull requests or patches for confirmed vulnerabilities (where possible). This allows you to fix the findings instantly and immediately retest to confirm the risk is gone.
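
The ownership check in step 1 can be pictured as a challenge token published in DNS. The sketch below is an illustration only, not Aikido's implementation: the `pentest-verification=` record format, the `owns_domain` function and the sample records are all hypothetical, and the TXT records are passed in as plain strings rather than fetched from a resolver.

```python
import hmac

# Hypothetical record prefix; the real verification format is not described here.
RECORD_PREFIX = "pentest-verification="


def owns_domain(txt_records: list[str], expected_token: str) -> bool:
    """Return True if any TXT record carries the expected challenge token."""
    for record in txt_records:
        # TXT values are often returned wrapped in double quotes.
        value = record.strip().strip('"')
        if not value.startswith(RECORD_PREFIX):
            continue
        candidate = value[len(RECORD_PREFIX):]
        # Constant-time comparison avoids leaking the token through timing.
        if hmac.compare_digest(candidate.encode(), expected_token.encode()):
            return True
    return False


records = ['"v=spf1 include:_spf.example.com ~all"', '"pentest-verification=3f9a1c"']
print(owns_domain(records, "3f9a1c"))  # True
print(owns_domain(records, "wrong"))   # False
```

The point of the design is ordering: the check has to pass before any test traffic is generated, so a tester cannot be pointed at assets the customer does not control.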

What we test for. The methodology covers (but is not limited to) the OWASP Top 10, OWASP Top 10 for Agentic Applications, and the OWASP API Security Top 10, including classes that automated scanners tend to miss: broken access control and Broken Object-Level Authorization (BOLA), authentication flaws, injection, SSRF, security misconfiguration, and business-logic abuse. Because the agents reason about the application rather than match signatures, they chain steps and weigh context the way an attacker does, then stop at proof-of-impact.

Why the report holds up as audit evidence

An autonomous tool is only useful for compliance if you can trust and prove how it behaved. Aikido’s testing is built around a structured methodology and a set of technically-enforced guardrails to keep autonomous testing safe, bounded, and auditable.

In practice that translates to controls auditors care about:

  • Hard guardrails, not prompts. Scope is enforced at the network level, and anything out of scope is automatically blocked. Guardrails are technically enforced through a kernel-level sandbox the agent has no credentials to escape, never through “soft” prompt instructions.
  • Containment and safety. Agents run in isolated sandboxes, are throttled and resource-limited, and a list of prohibited actions (DoS tooling, data exfiltration, destructive operations, credential brute-forcing) is enforced by design.
  • A kill switch and immutable audit trail. Every action, command and request, along with the agent’s reasoning, is logged to a tamper-evident trail and retained for at least a year. Traffic carries a unique HTTP header and originates from dedicated static IPs, so it can be traced and whitelisted.
  • Accuracy. Findings are validated to avoid false positives and any AI model change is heavily benchmarked before it reaches production, ensuring a continuously improving system.
  • Complete transparency and verifiable coverage. We provide both customers and auditors with visibility into the entire testing process by delivering screenshots, a comprehensive coverage overview, and the explicit reasoning behind every agent action. Unlike traditional human pentests, which often only deliver final findings without providing evidence of what was actually executed, Aikido gives you full proof of work so you can verify exactly what was tested.
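
To make “tamper-evident” concrete, here is a minimal sketch of one common way to build such a trail: a hash chain, where each entry's hash covers the previous entry's hash. The article does not say how Aikido implements its audit trail, so the technique, the function names and the event fields below are assumptions for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps(event, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


def append_event(log: list[dict], event: dict) -> None:
    """Append an event, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": entry_hash(prev_hash, event)})


def verify_log(log: list[dict]) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append_event(log, {"action": "http_request", "target": "https://app.example.com/api/users/2"})
append_event(log, {"action": "finding", "type": "BOLA", "validated": True})
print(verify_log(log))  # True

log[0]["event"]["target"] = "https://other.example.com/"  # simulate tampering
print(verify_log(log))  # False
```

A chain like this detects edits but not truncation of the most recent entries, so the latest hash would also need to be stored somewhere the log writer cannot modify.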

This combination is the evidence most audits ask for: independent third-party testing, reproducible findings, a transparent audit trail, and a structured report with remediation guidance.

Does a rightsized pentest hold up as evidence?

A rightsized pentest is an AI-driven assessment that dynamically scales its depth and pricing to match your application's exact size and architectural complexity. Aikido analyzes your repos, endpoints, and roles, then sets the right scope automatically. Small app, small price. Complex platform, thorough coverage.

Fundamentally, a rightsized pentest isn’t that different from our ‘normal’ pentests. The core difference is that the scope is set automatically rather than manually. It remains the client’s responsibility to review the scope and confirm it is correct. As such, rightsized pentests still meet the requirements of compliance frameworks that permit autonomous penetration testing.

What auditors look for

It helps to know what auditors or standards actually look for in a penetration test:

  • A documented, repeatable methodology: not ad-hoc poking.
  • Independence: the tester is not the same team that builds or runs the system.
  • Real testing of effectiveness: going beyond automated vulnerability scanning.
  • Evidence: findings, severity, and proof.
  • Remediation and re-testing of findings.

Notice what is usually not specified: that a human must press the keys. Some frameworks do require human or manual testing, and those are the ones to watch out for.

Overview

Autonomous Penetration Testing: Compliance Framework Mapping

| Category | Framework | Exact requirement | Autonomous pentest? | How the report helps |
|---|---|---|---|---|
| Regulation | NIS2 | Art. 21(2)(e),(f); CIR 2024/2690 §6.10, §7.1 | Yes | Automated tests and penetration tests expressly contemplated |
| Regulation | GDPR | Art. 32(1)(d) | Yes | Demonstrates a regular testing process |
| Regulation | CRA | Annex I Part II(3); Part I | Yes | “Effective and regular” lifecycle testing |
| International | SOC 2 | TSC CC4.1, CC7.1 | Yes | Independent evidence for “ongoing evaluations” |
| International | ISO 27001 | Annex A 8.8, 8.25, 8.29 | Yes | “Planned, documented, repeatable” testing |
| Healthcare | HIPAA | 45 CFR § 164.308(a)(8); 2024 NPRM | Yes | Evidences periodic technical evaluation; ready for the proposed annual pentest |
| Healthcare | HITRUST | Control 06.h (Technical Compliance Checking); annual/ongoing | Yes | Autonomous test accepted; External Assessor validates the evidence |
| Healthcare | FDA | FD&C § 524B + premarket guidance | Yes | Supplies the required penetration test report for premarket submission |
| Healthcare | EU MDR | Annex I GSPR 17.2, 17.4; MDCG 2019-16 | Yes | Valid V&V evidence across the lifecycle |
| Healthcare | IEC 81001-5-1 | §5.7.4 (SVV-4), §5.7.5 | Yes | Third-party independence satisfies SVV-4 directly |
| Automotive | ISO/SAE 21434 | Clause 11 (RQ-11-01) | Yes | Valid validation evidence for the connected and backend attack surface |
| Government | ENS | Measure mp.s.3; Art. 31 audit | Yes | Satisfies mp.s.3; continuous cadence beats the minimums |
| Government | NIST 800-53 | CA-8, CA-8(1), CA-8(2) | Yes, with concurrence | Independent, “beyond scanning”; confirm with assessor |
| Government | EO 14028 | SP 800-218 PW.8 / PW.8.2 | Yes | Artifact behind the CISA self-attestation |
| Government | FedRAMP | CA-8 + Pen Test Guidance (3PAO) | No | The authorizing and annual pentests must be performed by an accredited 3PAO |
| Government | FISMA | 800-53 CA-8 via RMF | Yes, at agency discretion | Mirrors CA-8 |
| Finance | PCI DSS | Req 11.4.1 to 11.4.6 | No | The 11.4 pentest must be performed by a qualified human tester; an autonomous report is not accepted as that test |
| Finance | DORA | Art. 24-25 / Art. 26-27 (TLPT) | Yes for 24/25; No for TLPT | Meets the Art. 24/25 testing programme; Art. 26 TLPT requires external human red-teamers |
| Finance | FTC Safeguards | 16 CFR §314.4(d)(2) | Yes | Continuous monitoring is an explicit substitute for the annual pentest |
| Finance | NYDFS | 23 NYCRR §500.5 | Yes | Continuous monitoring, or annual pentest plus twice-yearly vulnerability assessments |
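
For teams that answer to several frameworks at once, the mapping above can be kept as data and queried. The following is a rough sketch that transcribes a subset of the verdicts; the `Verdict` categories and the `needs_separate_engagement` helper are hypothetical names introduced for this example, not part of any Aikido product.

```python
from enum import Enum


class Verdict(Enum):
    YES = "autonomous report accepted"
    CONDITIONAL = "accepted with assessor or agency concurrence"
    PARTIAL = "accepted for part of the requirement"
    NO = "separate human or accredited engagement required"


# Verdicts transcribed from a subset of the mapping above.
VERDICTS = {
    "SOC 2": Verdict.YES,
    "ISO 27001": Verdict.YES,
    "GDPR": Verdict.YES,
    "NIST 800-53": Verdict.CONDITIONAL,
    "FISMA": Verdict.CONDITIONAL,
    "DORA": Verdict.PARTIAL,  # yes for Art. 24/25, no for TLPT
    "PCI DSS": Verdict.NO,
    "FedRAMP": Verdict.NO,
}


def needs_separate_engagement(frameworks: list[str]) -> list[str]:
    """Return the frameworks for which an autonomous report alone is not enough."""
    flagged = []
    for name in frameworks:
        verdict = VERDICTS.get(name)
        if verdict is None:
            raise KeyError(f"no verdict recorded for {name!r}")
        if verdict in (Verdict.PARTIAL, Verdict.NO):
            flagged.append(name)
    return flagged


print(needs_separate_engagement(["SOC 2", "PCI DSS", "DORA"]))  # ['PCI DSS', 'DORA']
```

Unknown framework names raise an error rather than defaulting to “yes”, since silently assuming acceptance is the more expensive mistake.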

Conclusion: The standards are behind the technology

The standards that say “test the effectiveness of your controls, regularly” (GDPR, CRA, NIS2, SOC 2, ISO 27001, HIPAA, the FDA guidance, IEC 81001-5-1, the SSDF) were written around an outcome, and they accommodate continuous autonomous testing without strain. Several of them arguably favor it.

The standards that struggle are the ones that encoded the delivery model of 2010 into the control itself: an annual, manual, human-credentialed engagement (3PAO accreditation, TLPT red teams, PCI’s “manual techniques”). They are not wrong to value human expertise. They were just written at a moment when the only way to run penetration testing was to hire a person for a week, once a year. They were not prepared for agentic systems that test every build, prove every finding, log every action, then fix and re-test what they find.

Industry Specifics

Finance Industry

PCI DSS

The requirement: PCI DSS is the most prescriptive framework here. It requires a documented penetration-testing methodology, internal and external penetration tests at least once a year and after significant changes, re-testing of anything you fix, and separate testing of the controls that segment the cardholder data environment (more often for service providers). The test has to be run by a qualified tester who is organizationally independent from the systems under test.

Can autonomous testing satisfy it? No. PCI’s own Penetration Testing Guidance draws a line between a penetration test and a vulnerability scan: a scan is automated, while a penetration test is a manual process of exploitation that relies on the skill of a qualified, independent tester. Automated tools may assist, but the guidance treats the manual work as the test itself. An autonomous penetration test will not be accepted as the PCI pentest.

Aikido’s agents do exploit business logic, BOLA and chained flaws, so they are worth running as continuous security testing alongside the required engagement, including after significant changes. That is a security benefit and a source of remediation evidence, not a PCI sign-off. Plan the qualified human pentest separately.

Verdict: Does not meet the requirement. The penetration test must be performed by a qualified human tester. Autonomous testing will not be accepted as the PCI pentest.

Reference: PCI DSS v4.0.1 Requirement 11.4 (11.4.1 to 11.4.6); PCI SSC Penetration Testing Guidance.

DORA

The requirement: DORA has two tiers of testing. The general tier is a digital operational resilience testing program that every financial entity must establish, and penetration testing is one of the methods it must use. The advanced tier requires Threat-Led Penetration Testing (TLPT) at least once every three years for significant financial entities, with strict rules about who may run it.

Can autonomous testing satisfy it? Yes for the general program, no for TLPT. The general program is method-flexible and its listed methods include penetration testing, so continuous autonomous testing fits it and goes beyond the periodic minimum. TLPT is different. It is modeled on the ECB’s TIBER-EU framework and requires external, qualified red-teamers and an external threat-intelligence provider, with significant credit institutions required to use external testers exclusively. That is, by design, a human red-team engagement.

Verdict: Yes for the general testing program. It will not meet the TLPT requirement, which mandates external human red-teamers. Use Aikido to run and evidence the general program. The TLPT is a separate engagement that must be performed by humans.

Reference: DORA (Regulation (EU) 2022/2554) Articles 24 and 25 (testing program), Articles 26 and 27 (TLPT).

FTC Safeguards Rule

The requirement: The FTC Safeguards Rule governs how financial institutions protect customer information. It requires regular testing of how well your safeguards work, and gives you two ways to do it: continuous monitoring, or, in its absence, an annual penetration test plus vulnerability assessments at least every six months. Testing is also required after major changes to operations.

Can autonomous testing satisfy it? Yes. Continuous monitoring is written in as a direct alternative to the annual penetration test, which is what continuous autonomous testing provides. For an institution that prefers the periodic route, one autonomous program supplies both the annual test and the six-month assessments. The rule sets no human or accreditation requirement for the tester.

Verdict: Yes. Continuous monitoring is an explicit substitute for the annual pentest.

Reference: FTC Safeguards Rule, 16 CFR 314.4(d) and 314.4(d)(2).
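
For institutions on the periodic route, the cadence described above (an annual penetration test plus vulnerability assessments at least every six months) can be checked mechanically. This is a minimal sketch under stated assumptions: “annual” and “six months” are approximated as 365 and 183 days, the function names are hypothetical, and the rule's separate trigger for testing after major changes is not modeled.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: the rule's intervals are approximated as fixed day counts.
ANNUAL = timedelta(days=365)
SIX_MONTHS = timedelta(days=183)


def max_gap(dates: list[date], today: date) -> Optional[timedelta]:
    """Longest interval between consecutive dates, including the stretch up to today."""
    if not dates:
        return None
    points = sorted(dates) + [today]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def periodic_route_met(pentests: list[date], assessments: list[date], today: date) -> bool:
    """True if pentests are at most a year apart and assessments at most six months apart."""
    pentest_gap = max_gap(pentests, today)
    assessment_gap = max_gap(assessments, today)
    if pentest_gap is None or assessment_gap is None:
        return False
    return pentest_gap <= ANNUAL and assessment_gap <= SIX_MONTHS


today = date(2025, 6, 1)
pentests = [date(2024, 9, 15)]
assessments = [date(2024, 9, 15), date(2025, 3, 1)]
print(periodic_route_met(pentests, assessments, today))  # True
```

With continuous testing the question largely disappears, which is the practical appeal of the continuous-monitoring path.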

NYDFS Cybersecurity Regulation

The requirement: New York’s financial-services cybersecurity regulation applies to banks, insurers and other entities licensed in New York, and it is referenced beyond the state as a financial-sector baseline. Its penetration-testing section requires testing built around your risk assessment, structured as either continuous monitoring or an annual penetration test together with twice-yearly vulnerability assessments.

Can autonomous testing satisfy it? Yes. As with the FTC rule, the regulation treats continuous monitoring and the annual penetration test as alternatives. Continuous autonomous testing maps onto the continuous-monitoring path, and for entities that choose the periodic route it also produces the annual test and the twice-yearly assessments. The regulation sets no tester-accreditation requirement.

Verdict: Yes. Continuous monitoring satisfies the requirement on its own.

Reference: NYDFS Cybersecurity Regulation, 23 NYCRR 500.5 (Second Amendment, 2023).

Healthcare Industry

HIPAA

The requirement: HIPAA’s Security Rule does not explicitly name penetration testing. Its evaluation standard requires a periodic technical and non-technical evaluation of your safeguards, which is the hook pentests typically fall under. A proposed update from December 2024 would make it explicit, calling for vulnerability scanning at least every six months and penetration testing at least once a year. As of mid-2026 that update is not final, but the direction is clear.

Can autonomous testing satisfy it? Yes. Neither the current evaluation standard nor the proposed update mandates a human tester. An autonomous report is evidence of a periodic technical evaluation today, and it would satisfy the proposed annual-pentest obligation tomorrow, with continuous testing exceeding a once-a-year cadence.

Verdict: Yes, and ready for the proposed rule.

Reference: HIPAA Security Rule, 45 CFR 164.308(a)(8); 2024 NPRM (RIN 0945-AA22).

HITRUST CSF

The requirement: HITRUST CSF is a certifiable framework that US healthcare organizations and their vendors use to demonstrate protection of Protected Health Information (PHI). Penetration testing sits in its technical-compliance and security-assessment requirements. For the higher (r2) certification the test must fall within a rolling 12-month window and run as an ongoing program rather than a single yearly event, with findings tracked and retested.

Can autonomous testing satisfy it? Yes. HITRUST does not require a human or accredited tester for the penetration test, and its preference for an ongoing program over a once-a-year event lines up with continuous autonomous testing. An autonomous penetration testing report is valid evidence for the assessor.

Verdict: Yes for the penetration testing requirement. The External Assessor validation is a separate audit step.

Reference: HITRUST CSF control 06.h (Technical Compliance Checking).

FDA premarket cybersecurity guidance

The requirement: The FDA’s guidance Cybersecurity in Medical Devices: Quality System Considerations and Content of Premarket Submissions recommends a layered security-testing approach and, in practice, requires a penetration test report in premarket submissions.

Can autonomous testing satisfy it? Yes. The guidance is outcome-oriented: it wants evidence that testing was performed, by whom, with what scope, what was found and, most importantly, what you did about it. It ultimately comes down to ensuring the safety risks are under control. Aikido’s AI pentest report meets those expectations.

Verdict: Yes, but document scope, methods, and independence in the submission.

Reference: FD&C Act Section 524B; FDA premarket cybersecurity guidance (2025).

EU Medical Device Regulation (MDR)

The requirement: The MDR’s general safety and performance requirements expect medical-device software to be developed to the state of the art, with verification and validation and minimum IT-security measures across the product’s life. The EU’s medical-device cybersecurity guidance (MDCG 2019-16) names penetration testing as part of that verification and validation, alongside security-feature testing, fuzzing and vulnerability scanning.

Can autonomous testing satisfy it? Yes. The MDR and its guidance are method-neutral, and an autonomous penetration test is valid verification-and-validation evidence. Continuous testing also fits the life-cycle emphasis better than a one-off test. Similar to the FDA guidance, it comes down to proving the safety risks are under control.

Verdict: Yes.

Reference: EU MDR (Regulation (EU) 2017/745) Annex I, GSPR 17.2 and 17.4; guidance MDCG 2019-16.

IEC 81001-5-1

The requirement: This is the secure-software-lifecycle standard for health software, and will be harmonized for the MDR. Its software-system-testing activities include security-requirements testing, threat-mitigation testing, vulnerability testing and penetration testing. The standard asks for penetration testing to be carried out by a department or organization that is independent of the developers, and it has a separate provision on managing conflicts of interest between testers and developers.

Can autonomous testing satisfy it? Yes, and the independence requirement is a point in your favor. What the standard demands is organizational independence from the developers, not a human tester. As an external third party, Aikido satisfies that independence requirement, while the autonomous methodology and audit trail provide the documented, repeatable activity the standard expects.

Verdict: Yes, third-party independence satisfies the standard.

Reference: IEC 81001-5-1:2021 clause 5.7.4 (mapped to SVV-4) and 5.7.5.

Automotive

ISO/SAE 21434

The requirement: Automotive cybersecurity rests on ISO/SAE 21434, the engineering standard for vehicle cybersecurity, and on its risk methodology, Threat Analysis and Risk Assessment (“TARA”). The standard names penetration testing as a way to validate that the cybersecurity goals were met.

Can autonomous testing satisfy it? Yes, for the parts it can reach. The standard is outcome-based. Penetration testing is one validation method among several (e.g. fuzzing, SAST and DAST), and an autonomous pentest report is valid evidence, with continuous testing fitting the lifecycle emphasis as well. One caveat: vehicles are built on embedded components, and physical security testing at the hardware level cannot be done through autonomous methods.

Verdict: Yes for the connected and backend attack surface. The standard is method-flexible and accepts autonomous testing as validation evidence.

Reference: ISO/SAE 21434:2021 cybersecurity validation (Clause 11, RQ-11-01).

Government & public sector

ENS

The requirement: Spain’s Esquema Nacional de Seguridad lists penetration testing as an explicit security measure. It is mandatory for high-category systems and recommended for medium-category systems, and recent results feed the framework’s periodic audit.

Can autonomous testing satisfy it? Yes. ENS specifies that you test, not who keys it in. Autonomous pentest results satisfy the measure, and continuous testing beats the recommended annual (high) and biennial (medium) frequencies.

Verdict: Yes, autonomous pentests fulfil the requirement.

Reference: ENS (Real Decreto 311/2022) Annex II measure mp.s.3; periodic audit under Article 31.

NIST SP 800-53

The requirement: NIST 800-53 has a dedicated penetration-testing control. It calls for penetration testing at a frequency the organization sets, provides for an independent penetration agent or team, and adds red-team exercises as an enhancement. The control says explicitly that penetration testing goes beyond automated vulnerability scanning and is carried out by agents and teams with demonstrable skills.

Can autonomous testing satisfy it? Largely yes, more so than PCI. The control is framed around independence and going beyond scanning, both of which Aikido meets: it is an independent third party, and it exploits and validates rather than only scanning. The control even uses the word “agents.” The residual judgment, whether an autonomous agent shows the required “skills,” sits with the assessing authority, so confirm acceptance with your assessor.

Verdict: Yes, with assessor concurrence. Independence and “beyond scanning” are clearly met.

Reference: NIST SP 800-53 Rev. 5, control CA-8 (with CA-8(1) and CA-8(2)).

US EO 14028

The requirement: This US executive order drove the Secure Software Development Framework (SSDF). The framework has a practice for testing executable code to find vulnerabilities, which is where dynamic testing, fuzzing and penetration testing sit. Suppliers to US federal agencies self-attest to following the framework on a CISA attestation form.

Can autonomous testing satisfy it? Yes. The framework is technology-neutral. An autonomous penetration test is a legitimate way to meet the code-testing practice, and your report is the evidence behind the self-attestation.

Verdict: Yes, human or manual testing is not required.

Reference: EO 14028 Section 4(e); NIST SSDF (SP 800-218) practice PW.8 and PW.8.2; CISA Secure Software Development Attestation Form (OMB M-22-18).

FedRAMP

The requirement: FedRAMP requires an annual penetration test across its baselines, run to a mandatory set of attack vectors and performed by an accredited third-party assessment organization (a 3PAO) for Moderate and High systems.

Can autonomous testing satisfy it? No. The penetration test behind a FedRAMP authorization, and the annual pentest that maintains it, must be performed by an accredited 3PAO. An autonomous test by a non-3PAO will not be accepted in an authorization package or an annual assessment. It can run between those engagements as additional security testing, but that is not authorization evidence.

Verdict: Does not meet the requirement. The pentests must be performed by an accredited third-party organization.

Reference: FedRAMP Penetration Test Guidance; NIST SP 800-53 control CA-8; 3PAO accreditation by A2LA.

FISMA

The requirement: FISMA inherits its testing expectations from NIST 800-53, applied through NIST’s Risk Management Framework. The scope and rigor are set by the agency and the system’s categorization.

Can autonomous testing satisfy it? Generally yes. The same logic applies as for the NIST 800-53 control, subject to the agency’s assessment requirements. For systems that also pursue an external authorization with accredited-assessor rules (like FedRAMP), defer to that program’s constraints.

Verdict: Yes, but at agency discretion.

Reference: FISMA via NIST SP 800-53 (CA-8) and NIST SP 800-37 (Risk Management Framework).

International standards

SOC 2

The requirement: SOC 2 doesn’t explicitly require a pentest, but the Trust Services Criteria from the AICPA (the governing body behind SOC 2) point to one: the monitoring criterion mentions penetration testing as an acceptable evaluation method, and the criterion on detecting new vulnerabilities is supported by active testing. In practice, auditors expect pentest evidence that falls within the audit period, especially for a Type II report, along with remediation and re-test evidence.

Can autonomous testing satisfy it? Yes. An independent third-party test is stronger evidence than an internal test, and the “ongoing or separate evaluations” language points toward continuous testing.

Verdict: Yes. Continuous testing maps directly to “ongoing evaluations.”

Reference: AICPA Trust Services Criteria CC4.1 and CC7.1.

ISO/IEC 27001

The requirement: A few of ISO 27001’s Annex A controls are the hooks: one on managing technical vulnerabilities, which calls for “planned, documented and repeatable penetration tests or vulnerability assessments by competent and authorized persons”; one on security testing in development and acceptance; and one on the secure development lifecycle.

Can autonomous testing satisfy it? Yes. The language in the standard is almost a description of autonomous testing, and reproducibility is built into how the agents run. Auditors accept third-party automated and continuous testing as evidence here, and the report and audit trail provide the “documented” part.

Verdict: Yes. “Repeatable” testing is a natural fit.

Reference: ISO/IEC 27001:2022 Annex A; ISO/IEC 27002:2022 controls 8.8, 8.25 and 8.29.

European Regulations

NIS2 Directive

The requirement: NIS2 requires in-scope organizations to handle vulnerabilities and to have policies for assessing how well their security measures work. The implementing regulation spells this out with vulnerability management requirements and automated or manual security tests, penetration tests and vulnerability scans, carried out regularly and after significant changes.

Can autonomous testing satisfy it? Yes, explicitly. NIS2’s implementing regulation is one of the few instruments that names automated testing and penetration testing as acceptable methods. Continuous autonomous testing matches the “regular basis and after significant changes” language, and the report is evidence for both the vulnerability-management and effectiveness-assessment obligations.

Verdict: Yes. Automated and penetration testing are expressly contemplated.

Reference: NIS2 Directive (EU) 2022/2555 Article 21(2)(e) and (f); Implementing Regulation (EU) 2024/2690 Annex points 6.10 and 7.1.

GDPR

The requirement: GDPR requires a process for regularly testing, assessing and evaluating how well your technical and organizational security measures work.

Can autonomous testing satisfy it? Yes. GDPR does not prescribe a method and emphasizes regular testing, so continuous autonomous testing is a stronger demonstration of an ongoing process than an annual PDF. The report evidences both the testing and the remediation loop.

Verdict: Yes, it favors continuous testing.

Reference: GDPR (Regulation (EU) 2016/679) Article 32(1)(d).

Cyber Resilience Act

The requirement: The CRA requires products with digital elements to be placed on the market without known exploitable vulnerabilities and, as part of handling vulnerabilities over the product’s life, to apply effective and regular tests and reviews of the product’s security.

Can autonomous testing satisfy it? Yes. “Effective and regular” testing is what continuous autonomous pentesting delivers across the product lifecycle, and the report supports both the testing obligation and the “no known exploitable vulnerabilities” bar at release.

Verdict: Yes, “regular” testing favors autonomous penetration testing.

Reference: Cyber Resilience Act (Regulation (EU) 2024/2847) Annex I (Part I and Part II point 3); Article 13.

