Aikido

Balena uses AI pentesting to deeply validate complex APIs and eliminate friction from security testing

450,000+ API requests executed
350 targeted SAML requests

At a glance

  • Replaces frustrating manual pentests with context-aware AI testing
  • Uses white-box testing to validate complex OData-based APIs
  • Generates hundreds of thousands of valid API requests for deep coverage
  • Improves security confidence without vendor friction
  • Scales testing across features, regions, and deployments

Challenge

For Balena, security is not just about compliance. It is about proving that a highly complex IoT platform is secure by design. Balena provides IoT fleet management for embedded Linux devices, enabling customers to deploy and manage applications across fleets ranging from thousands to hundreds of thousands of devices. As their customer base grew, so did expectations around security.

The company achieved ISO 27001 certification in 2024 and is currently pursuing SOC 2 Type 2. As part of this effort, penetration testing became essential.

But manual pentesting created more friction than clarity.

Findings were often difficult to interpret and sometimes fundamentally misunderstood how Balena’s systems worked. For example, pentesters flagged the use of symmetric JWT signing as a vulnerability, despite it being a deliberate and valid design choice within Balena’s architecture.

“After every manual pentesting engagement, the conclusion was the same: next time we need to find someone else.”

Instead of building confidence, pentests became a recurring source of frustration.

Why Balena turned to AI pentesting

Balena discovered Aikido through security research and community exposure, including OWASP events and ongoing work in the Node.js ecosystem. At the same time, the team was becoming more comfortable with AI-assisted development tools, which made the idea of AI-driven pentesting a natural next step.

Initially, the decision to try AI pentesting was pragmatic.

“Aikido’s AI pentest was affordable compared to manual pentesting. What stood out immediately was the ability to provide context.”

Instead of relying on generic scanning techniques, Aikido could access Balena’s codebase and be guided using domain-specific knowledge. This shifted the question from finding a better vendor to understanding how to best use AI for security testing.

“Frustration led to automation, and automation led us to AI... finally breaking the cycle of constraining manual audits.”

Running the AI pentest

Getting started required minimal effort. Balena connected their repositories, configured the scope, and launched the test without legal or operational delays.

“It was fairly easy to just get a trial and click go. No major roadblocks.”

The team used a white-box approach, giving the AI access to their code and data model. Crucially, they instructed the AI to follow the OData specification, which defines how their API operates.

This made a significant difference. Previous pentesters had struggled to even construct valid OData requests. In contrast, the AI was able to interpret the specification, read the data model, and generate complex, valid queries. The result was a fundamentally different level of testing depth.
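To illustrate what "complex, valid queries" means in practice, here is a minimal sketch of how an OData-style request URL is assembled from the spec's system query options ($filter, $expand, $select, $top). The base URL, entity name, and field names are hypothetical placeholders, not Balena's actual data model.

```python
from urllib.parse import quote

def build_odata_url(base: str, entity: str, **options: str) -> str:
    """Assemble an OData query URL from system query options ($filter, $expand, ...)."""
    query = "&".join(f"${key}={quote(value)}" for key, value in options.items())
    return f"{base}/{entity}?{query}"

# A nested query combining filtering, expansion, and projection --
# the kind of structurally valid request a spec-aware tester must be able to produce.
url = build_odata_url(
    "https://api.example.com/v1",   # hypothetical endpoint
    "device",
    filter="belongs_to__application/app_name eq 'fleet-1'",
    expand="belongs_to__application($select=app_name)",
    select="uuid,device_name",
    top="10",
)
print(url)
```

A generic scanner tends to mutate URLs blindly; producing requests like this requires reading the spec and the data model, which is why the white-box setup mattered.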

What the AI pentest delivered

The AI pentest generated over 450,000 API requests within standard working hours, many of which were syntactically correct and returned valid responses.

This level of precision stood out immediately.

“We’ve never seen this depth of OData query usage from any human pentester.”

Rather than sending irrelevant or generic attack payloads, the AI focused on realistic interactions with the system. It also uncovered meaningful issues early, even during a simple trial run.

Beyond scale, the AI demonstrated a level of context-aware testing that was missing from previous engagements.

When testing a new SAML integration, the AI identified the relevant code across repositories and generated roughly 350 targeted requests against those endpoints. It actively tested tenant isolation and permissions by chaining together object, organization, and user IDs, validating that users could not access data outside their scope.
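The tenant-isolation property those tests assert can be sketched as follows: a request is authorized only when the user and the object resolve to the same organization. The in-memory data model below is a hypothetical stand-in for illustration, not Balena's actual permission system.

```python
# Hypothetical model: users and objects each belong to one organization (tenant).
USERS = {"alice": "org-a", "bob": "org-b"}
OBJECTS = {"device-1": "org-a", "device-2": "org-b"}

def can_access(user_id: str, object_id: str) -> bool:
    """Authorize only when user and object resolve to the same organization."""
    user_org = USERS.get(user_id)
    object_org = OBJECTS.get(object_id)
    return user_org is not None and user_org == object_org

# Chaining user, organization, and object IDs across tenant boundaries,
# as the isolation tests did, should always be denied:
assert can_access("alice", "device-1")        # same org: allowed
assert not can_access("alice", "device-2")    # cross-tenant: denied
assert not can_access("unknown", "device-1")  # unknown user: denied
```

The value of generating hundreds of such ID combinations automatically is coverage: every cross-tenant pairing is exercised, not just the few a human tester happens to try.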

Importantly, the value was not tied to finding a single critical vulnerability. Instead, it came from confidence in the testing process itself. The AI demonstrated that it understood the system and could explore it in ways that aligned with how the API actually works.

This eliminated a major source of friction that Balena experienced with manual pentesting.

“Now the question is not who we should hire next. It’s how we use the AI better and how much budget we want to allocate.”

Results

For Balena, the impact of AI pentesting is best understood as a shift from compliance-driven testing to precise, system-aware validation.

Instead of spending time correcting misunderstandings from external testers, the team can focus directly on improving security. Audit requirements can be met without the internal overhead of re-explaining architecture or validating incorrect findings.

At the same time, the ability to launch tests without legal or operational delays changes how security fits into product development. New features such as SAML integrations or new geographic regions can be tested immediately, providing fast and credible security validation.

Transparency also improves. Rather than relying on static reports, Balena can show exactly what was tested through detailed request logs and agent traces.

ROI vs manual pentesting

Compared to previous manual engagements, AI pentesting delivered higher quality results at a lower cost. The biggest difference was operational. Manual pentesting required onboarding cycles with introductory calls, briefings, and access provisioning. With AI, that overhead disappears entirely.

Engineering efficiency also improved. Instead of deciphering static PDF reports and reproducing findings manually, engineers can directly reuse the exact scripts generated by the AI to validate and fix issues.

The depth of coverage is also materially different. Hundreds of thousands of requests, including complex OData queries, were executed within standard working hours. This level of scale and precision had not been achieved with human pentesters.

“With AI whitebox testing, sharing findings has never been easier. This direct code-level mapping frees our engineers to analyze complex logic and implement actual fixes, rather than debating false positives.”

Future outlook

Balena sees AI pentesting as a capability that improves with iteration.

Today, a significant portion of the testing budget is spent on the AI learning the system before moving into deeper attack paths. Over time, the goal is to reduce this discovery phase so that more effort can be focused on high-impact analysis.

Another opportunity lies in reporting. While the raw logs and traces provide full transparency, their volume makes them difficult to consume. A concise summary of attack strategies, successful vectors, and dead ends would make results easier to communicate and act on.

Looking ahead, Balena is particularly interested in a model where testing builds on previous runs, allowing the AI to retain context and continue exploring the system rather than starting from scratch.

Working with Aikido

Beyond the technology, the collaboration itself stood out.

“Fast results, honest communication, and zero empty sales promises. They gave us the time we needed to finish our vendor assessment without any pressure. Having actionable results in days, plus direct access to their engineers to tune the AI, is exactly the kind of partnership we want.”

Curious to read more about Balena's experience using Aikido? Check out their blog post here.

Get secure now

Secure your code, cloud, and runtime in one central system.
Find and fix vulnerabilities fast, automatically.

No credit card required | Scan results in 32 seconds.