How These Engineering Teams Use AI to Ship Fast, Safely and Meaningfully

See how engineering teams at RapDev, Snyk, Gradient AI and WHOOP measure quality, protect uptime and ship software quickly and safely.

Written by Taylor Rose
Published on Feb. 17, 2026
A close-up of hands typing on a laptop with holographic images of code and AI icons floating in the foreground.
Image: Shutterstock
Summary: Senior engineers at RapDev, Snyk, Gradient AI and WHOOP describe how their teams measure software quality while using AI to increase development velocity. 

Engineers have an ongoing challenge: how to measure quality. 

Why? Because every release, update, sprint and launch can be measured in wildly different ways. 

So how do you measure the effectiveness of your team?  

RapDev senior engineering manager Kyle Brueckner has a formula.

“When the system is healthy, velocity rises without trading off safety,” Brueckner said.

Andrew Oates, distinguished engineer at Snyk, said his team watches frequency. 

“The key to fast, safe releases is frequency — which is also a useful KPI,” Oates said. “The more frequent your releases, the less risk there is in each one and you exercise your release process and infrastructure end-to-end each time as well.” 

Meanwhile, Software Engineering Director Joshua Reback said that Gradient AI’s measure of quality can’t be boiled down to one metric.

“We do not target a single rule to guarantee fast and safe software releases,” Reback said.  “Instead, reliability comes from a combination of proven engineering practices that reduce risk while preserving velocity.” 

Built In spoke with four engineering leaders about how each of their teams uses AI in production to ship fast, safely and meaningfully.

 

Kyle Brueckner
Sr. Engineering Manager

RapDev helps customers become leaders in the race to deploy code faster as they scale their operations.

 

What’s your rule for fast, safe releases — and what KPI proves it works?

My rule is: Make every release small, reversible and observable. If a change can’t be traced, tested and rolled back in minutes, it doesn’t ship. We lean on trunk-based development, CI test gates, automated change creation and approvals via ServiceNow DevOps and low-friction ChatOps approvals, then let Datadog telemetry decide whether we continue, canary or roll back. The KPIs are the DORA set (deployment frequency, lead time, change failure rate, mean time to recovery) and the percentage of changes auto-approved with a complete audit trail. When the system is healthy, velocity rises without trading off safety.
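
Brueckner’s KPI set maps onto the four DORA metrics. As a rough illustration only (not RapDev’s actual tooling; the record fields here are assumptions), here is how those numbers could be derived from a log of deployments:

# Hypothetical sketch of computing the DORA metrics from deployment records.
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional

@dataclass
class Deployment:
    committed_at: datetime                   # first commit in the change
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # caused a degradation or rollback?
    restored_at: Optional[datetime] = None   # when service recovered, if failed

def dora_metrics(deploys: list[Deployment], window_days: int = 30) -> dict:
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(lt.total_seconds() / 3600 for lt in lead_times) if lead_times else None,
        "change_failure_rate": len(failures) / len(deploys) if deploys else 0.0,
        "mttr_minutes": median(r.total_seconds() / 60 for r in restores) if restores else None,
    }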

 

What standard or metric defines “quality” in your stack?

Quality in my stack is defined by “change confidence,” which is the degree to which we can predict how a change will behave in production before it ever ships. Practically, that’s a blend of correctness, operability and maintainability enforced by the toolchain. The metric I anchor on is escaped defects per release, normalized by change size, paired with SLO impact (error budget burn attributable to deployments). If we’re building quality, both trend down even as deployment frequency rises. We drive it with explicit standards, like test coverage, contract tests for integrations, static analysis and security scanning as non-negotiable gates and progressive delivery (canaries and feature flags) with automated rollback tied to Datadog monitors. A release is “high quality” when it’s boring, meaning it passes gates, is observable by default and doesn’t move SLOs in the wrong direction.
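
To make “escaped defects per release, normalized by change size” concrete, here is one possible way to compute it. The release record shape is hypothetical and only meant to illustrate normalizing by how much code changed:

# Hypothetical sketch: escaped defects per 1,000 changed lines, per release.
def escaped_defect_density(releases: list[dict]) -> list[dict]:
    """Each release dict is assumed to carry 'name', 'escaped_defects'
    (bugs found in production after shipping) and 'lines_changed'."""
    out = []
    for r in releases:
        density = r["escaped_defects"] / max(r["lines_changed"], 1) * 1000
        out.append({"release": r["name"], "defects_per_kloc_changed": round(density, 2)})
    return out

# Example: a big release with two escaped defects can still be "higher quality"
# than a tiny release with one, once normalized by change size.
print(escaped_defect_density([
    {"name": "2024.06", "escaped_defects": 2, "lines_changed": 8000},  # 0.25
    {"name": "2024.07", "escaped_defects": 1, "lines_changed": 400},   # 2.5
]))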


“Quality in my stack is defined by ‘change confidence,’ which is the degree to which we can predict how a change will behave in production before it ever ships.”

Name one AI/automation that shipped recently and its impact on your team or the business.

We recently shipped Arlo for Remote Execution with a specific customer: an AI agent paired with a lightweight endpoint agent on end-user PCs. Arlo can now issue a command that the PC agent securely picks up via an outbound-only connection, then automatically runs a guarded runbook, collecting diagnostics, executing repair scripts and confirming the fix with structured results back to Arlo. We treated it like production automation from day one — allow-listed, signed scripts, least-privilege execution, rate limits, approval hooks for sensitive actions and a complete audit trail so every step is reviewable. This leads to faster triage and resolution for common endpoint issues, fewer escalations and a better employee support experience. In practice, “time to diagnose” moved from back-and-forth chats to near-instant signals, and a meaningful slice of tickets was resolved end to end without human intervention, freeing support and IT engineers to focus on high-leverage work while improving SLA adherence.
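
The guardrails Brueckner lists (allow-listed, signed scripts, approval hooks, audit trail) follow a common pattern for safe remote execution. The sketch below is a generic illustration of that pattern with hypothetical names and placeholder digests, not Arlo’s implementation:

# Generic guarded-execution sketch (hypothetical; not Arlo's actual code).
import hashlib
import json
import time
from typing import Optional

ALLOW_LIST = {
    # runbook name -> SHA-256 digest of the approved script contents
    "collect_diagnostics": "3f6b...",   # placeholder digest
    "clear_dns_cache": "9a1c...",       # placeholder digest
}
SENSITIVE = {"clear_dns_cache"}         # actions that require human approval

def run_guarded(runbook: str, script_bytes: bytes, approved_by: Optional[str]) -> dict:
    digest = hashlib.sha256(script_bytes).hexdigest()
    if ALLOW_LIST.get(runbook) != digest:
        return audit(runbook, "rejected: not on allow-list or tampered")
    if runbook in SENSITIVE and not approved_by:
        return audit(runbook, "rejected: approval required")
    # ... least-privilege execution and rate limiting would happen here ...
    return audit(runbook, "executed", approved_by=approved_by)

def audit(runbook: str, outcome: str, **extra) -> dict:
    entry = {"ts": time.time(), "runbook": runbook, "outcome": outcome, **extra}
    print(json.dumps(entry))            # stand-in for a real audit log sink
    return entry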

 

 

Andrew Oates
Distinguished Engineer

Snyk is a developer security platform that makes it easy for development teams to find, prioritize and fix security vulnerabilities in code, dependencies, containers and cloud infrastructure — and do it all right from the start.

 

What’s your rule for fast, safe releases and what KPI proves it works?

The key to fast, safe releases is frequency — which is also a useful KPI. The more frequent your releases, the less risk there is in each one and you exercise your release process and infrastructure end-to-end each time as well. This also improves your speed (latency to production).

I’ve found that infrequent releases, even with a highly disciplined engineering team, are very difficult to consistently land for systems of any real complexity.

Committing to frequent releases (for back-end systems, weekly is what I’ve found to be ideal) forces investment in all other parts of your release process to make them high-quality, automated and reliable. If you’re able to consistently stick to those releases, and aren’t frequently having incidents in production, that’s a strong signal that everything is working well. We care deeply about developer trust — a release process that causes stress isn’t sustainable.

In addition to release frequency, we also look at change failure rate and time to restore service, because they tell you whether you’re shipping fast and safely.


“We care deeply about developer trust — a release process that causes stress isn’t sustainable.” 

A key requirement for frequent, predictable releases is rollback safety; releases should always be rollback-safe, ideally for more than one version. This requires a modest amount of engineering discipline, but I’ve found that if you’re thinking about change management and APIs correctly, rollback safety becomes a natural outcome.
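
A common way to get the rollback safety Oates describes is the expand/contract (parallel change) pattern, where every step keeps both the previous and the next release working. The sketch below is a generic illustration of that pattern, not a description of Snyk’s pipeline:

# Expand/contract sketch: each step ships separately, and rolling back any
# single release never breaks the schema the previous version expects.

EXPAND = [
    # Release N: add the new column as nullable; the old code simply ignores it.
    "ALTER TABLE orders ADD COLUMN total_cents BIGINT NULL;",
]

MIGRATE = [
    # Release N+1: the application dual-writes both columns; backfill old rows.
    "UPDATE orders SET total_cents = CAST(total_dollars * 100 AS BIGINT) "
    "WHERE total_cents IS NULL;",
]

CONTRACT = [
    # Release N+2 or later: only once no running version still reads the old
    # column is it dropped, which is what keeps N and N+1 rollback-safe.
    "ALTER TABLE orders DROP COLUMN total_dollars;",
]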

Ultimately, it only works if it’s a team sport — product, platform, security and engineering all aligned around the same goal: shipping quickly while maintaining trust.


What standard or metric defines “quality” in your stack?

Quality, for me, boils down to three key metrics. We define quality through outcomes, not opinions.

Quality metrics on Snyk’s engineering team

  • Customer satisfaction: Are customers (internal or external) getting the results and experience that they want from the application? Is it useful, compelling and differentiated – and does it remove friction for the people using it?
  • Reliability: Does the application reliably deliver for clients?
  • Velocity: Is the team able to innovate, deliver and pivot quickly? Is our stack helping us deliver or slowing us down?

Any engineering project needs a balanced diet of these, though it will often be correct to trade off one for another based on business needs. For example, trading reliability for velocity on an early exploratory PoC, or the reverse on a core infrastructure component that is seldom changed and must be stable.

One of the best ways to drive quality is to have this conversation explicitly. Engineering is not about building the best thing — it’s about building the optimal thing, with constraints like cost and time-to-market as critical levers. Be explicit up front about your quality goals and what you’re optimizing for, and don’t be afraid to change those goals over time.

 

Name one AI/automation that shipped recently and its impact on your team or the business.

One of the most exciting parts of my job right now is seeing the experimentation with different workflows as the tools evolve day to day. AI tools are incredibly powerful, but we’re all still figuring out how to use them safely and effectively — and we think that’s exactly where the industry needs to be: moving fast, but responsibly.

We launched Snyk Studio last fall to address this gap in AI-driven development. Snyk Studio helps ensure code is secured the moment it’s generated, before a human even needs to review it, no matter what tool your developers use.

We’ve rolled out Snyk Studio to all our engineers internally and received great feedback from the team on where it has improved their workflows and how to integrate it seamlessly. Getting that early internal feedback was crucial to the launch, and it means our developers are actively contributing to Snyk’s AI security future each time they write code.

The goal isn’t AI for AI’s sake — it’s helping teams ship faster without compromising security, and making secure development feel natural in day-to-day work.

 

 

Joshua Reback
Software Engineering Director

Gradient AI is an AI and machine learning company that focuses on the insurance industry. 

 

What’s your rule for fast, safe releases — and what KPI proves it works?

We do not target a single rule to guarantee fast and safe software releases. Instead, reliability comes from a combination of proven engineering practices that reduce risk while preserving velocity.

We prioritize techniques such as feature flagging, schema-safe database changes and blue-green deployments with segmented rollouts. Together, these approaches allow teams to ship continuously, limit blast radius and recover quickly when issues arise.

The effectiveness of this model is ultimately measured through outcomes rather than intent. Sustained system uptime, low customer-visible incident rates and the ability to roll back or disable changes without disruption are the clearest indicators that fast, safe releases are working as designed.
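
Feature flags with segmented rollouts, one of the practices Reback mentions, typically hash a stable identifier into a bucket so the same customer always lands in the same segment. Here is a minimal, generic sketch; the flag names and percentages are illustrative only, not Gradient AI’s system:

# Minimal percentage-rollout sketch; flag names and segments are illustrative.
import hashlib

ROLLOUTS = {"new_pricing_model": 10}    # flag -> percent of traffic enabled

def is_enabled(flag: str, customer_id: str) -> bool:
    percent = ROLLOUTS.get(flag, 0)
    # Hash flag + customer so bucketing is stable across requests and deploys.
    bucket = int(hashlib.sha256(f"{flag}:{customer_id}".encode()).hexdigest(), 16) % 100
    return bucket < percent

# Rolling forward means raising the percentage; "rolling back" a bad change is
# just setting it to 0, with no redeploy and a very small blast radius.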

 

What standard or metric defines “quality” in your stack?

Quality in our stack is defined by predictable outcomes under change. Rather than treating quality as a single metric or a static bar, we evaluate it through a small set of operational signals that demonstrate the system can evolve safely and reliably.

At the platform level, sustained system uptime, low customer-impacting incident rates and fast mean time to recovery (MTTR) indicate that changes are well-designed and well-contained. At the delivery level, a low change failure rate and the ability to deploy frequently without service degradation show that quality is built into the pipeline, not inspected after the fact.

Finally, architectural standards, such as backward-compatible schema changes, feature-flagged rollouts and automated validation, act as guardrails that make these outcomes repeatable. In our view, quality is proven not by how rarely we change the system, but by how confidently and safely we can change it.


“In our view, quality is proven not by how rarely we change the system, but by how confidently and safely we can change it.” 

Name one AI/automation that shipped recently and its impact on your team or the business.

The utility of AI coding has reached an inflection point and allowed us to dramatically speed up many engineering activities.

We’ve recently been building out a new UI-heavy application that features multiple trends, charts and year-over-year views. It’s meant to enable one of Gradient’s client personas to gain and present insights to their clients on how to more profitably manage a book of insurance business. The ability to slice and dice the data using different categorical and date-driven filters, and to render the resulting data and trends in myriad ways, is therefore critical.

AI coding assistants allowed us to take a mockup of the UI prepared by a designer and generate nearly production-ready code in a little over a day of full-time engineering work. This sort of engineering output would have taken weeks prior to the advent of capable AI assistants. This has allowed us to spend less time turning the crank on writing code to mirror the desired UI and more time on product design, nailing down business logic, thinking through edge cases and the many other things that make or break the success of a product.

 

 

Viviano Cantu
Staff AI Engineer

WHOOP is a wearable healthtech company. 

 

What’s your rule for fast, safe releases — and what KPI proves it works?

I’ve found that speed comes from ownership. We decentralize so that engineers own domains, not tasks, and that ownership doesn’t end when code merges. You own the outcome, which means following through to make sure what you shipped actually works the way you intended.

We start projects with a tech plan to get alignment and catch bad ideas early. Then we break work into two-week chunks maximum. If it takes longer, you haven’t split it up enough. The KPI that matters? User metrics. Everything else (PRs shipped, A/B tests launched) is just a leading indicator.


“We decentralize so that engineers own domains, not tasks, and that ownership doesn’t end when code merges. You own the outcome, which means following through to make sure what you shipped actually works the way you intended.” 

What standard or metric defines “quality” in your stack?

With AI, errors don’t look like traditional errors. The failure modes are much more subtle and hard to track down. A bad response doesn’t throw an exception; it just gets ignored. Users quietly lose trust, and you can’t trace that in a log file. So we built a system where AI grades our AI.

We have a set of “golden questions” that cover the range of things members actually ask. Automated agents grade responses on personalization, refusal rate and correctness. We validated these agents against human reviewers to make sure their judgment holds up. We also run this on live production traffic, not just test environments, so we catch regressions before they compound.
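
In outline, the golden-questions loop Cantu describes resembles an LLM-as-judge evaluation: a fixed question set, automated graders scored along a rubric and spot checks against human reviewers. The sketch below is a generic outline with hypothetical function and rubric names, not WHOOP’s system:

# Generic golden-questions eval sketch (hypothetical names, not WHOOP's code).
from typing import Callable

RUBRIC = ("personalization", "correctness", "refusal")

GOLDEN_QUESTIONS = [
    "How did my sleep last night affect today's recovery?",
    "Should I train hard today given my strain this week?",
]

def run_eval(answer_fn: Callable[[str], str],
             judge_fn: Callable[[str, str, str], float]) -> dict[str, float]:
    # answer_fn is the assistant under test; judge_fn is a grader model that
    # scores a (question, response) pair on one rubric dimension from 0 to 1.
    scores: dict[str, list[float]] = {dim: [] for dim in RUBRIC}
    for question in GOLDEN_QUESTIONS:
        response = answer_fn(question)
        for dim in RUBRIC:
            scores[dim].append(judge_fn(question, response, dim))
    # Average per dimension; a drop against the previous baseline is a regression.
    return {dim: sum(vals) / len(vals) for dim, vals in scores.items()}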

 

Name one AI/automation that shipped recently and its impact on your team or the business.

I’m most proud of the Advanced Labs Uploads. Members can upload bloodwork from any lab as a PDF or screenshot, in any language, and we extract the biomarkers automatically. Before writing production code, we built an eval set from manually graded reports. We tested different orchestration approaches and iterated on prompts until accuracy was solid. That eval work upfront is what lets us ship without second-guessing ourselves.
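
The eval-first approach Cantu describes, building a graded set before writing production code, can be as simple as comparing extracted biomarkers against hand-labeled values. A hypothetical sketch, with illustrative field names only:

# Hypothetical sketch: score biomarker extraction against hand-graded reports.
def score_extraction(extracted: dict[str, float], golden: dict[str, float],
                     tolerance: float = 0.01) -> dict:
    """Both dicts map biomarker name -> value (e.g. {"ldl_mg_dl": 96.0})."""
    matched = {
        k for k in extracted
        if k in golden and abs(extracted[k] - golden[k]) <= tolerance * abs(golden[k])
    }
    precision = len(matched) / len(extracted) if extracted else 0.0
    recall = len(matched) / len(golden) if golden else 0.0
    return {"precision": precision, "recall": recall}

# Iterating on prompts and orchestration means re-running this over every graded
# report and only shipping once aggregate precision and recall are solid.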

The feature has been a great success. We’ve seen a ton of uploads since launch and members can now see their lab results alongside their WHOOP data and get coaching from WHOOP Coach on what it all means.

 

 

Responses have been edited for length and clarity. Images provided by Shutterstock or listed companies.
