Make SRE About More Than Reliability

What does site reliability engineering look like at LogRocket? The company’s VP of engineering shared insights on the team’s responsibilities, opportunities and successes.

Written by Kim Conway
Published on Feb. 08, 2022

Bugs? Bad code? It happens.

While having engineers who can efficiently resolve such issues is critical, what if the journey to those solutions offered more?

Dedicated as they are to providing product and engineering teams with front-end solutions, the team at LogRocket is no stranger to monitoring website performance and reliability. And Pascal Kriete knows firsthand that time is of the essence in maintaining a technological presence: “When something goes wrong, it goes wrong in a hurry.” That’s why the VP of engineering sought to extend his organization’s reach through the addition of a site reliability team. 

Responsible for “maintaining infrastructure and tooling,” LogRocket’s SRE team is helping the company stop relying solely on the expertise of on-call operations engineers. The added presence has allowed its product engineering team to grow in tandem with its deployment capabilities. But to Kriete, SRE is not just about fixes; it doubles as a source of learning opportunities for engineers.

Built In Boston spoke with Kriete about the decision to expand LogRocket’s engineering organization, the advantages of having engineers dedicated to site reliability and the skills that make for a good SRE.


 

Pascal Kriete
VP of Engineering • LogRocket

 

What prompted you to create a site reliability team within your engineering organization?

Most of our engineers are on full-stack product teams that provide end-to-end support for their code. We have always held the belief that these teams should be able to deploy autonomously and continuously — at this point, that means multiple releases per day into an infrastructure that processes billions of events per day. 

Our SRE team is responsible for maintaining infrastructure and tooling that will safely sequence releases, halt deployments of bad code, allow engineers to easily diagnose issues, and let anyone pull the plug and roll back. They also advise our engineering teams on architectural decisions that improve overall robustness.

As a unique additional challenge, we also offer a self-hosted product that runs the same codebase, but must be administered by users who have no prior knowledge of the system. This is often deployed on cloud environments different from our own, where we have more limited experience. That means tooling is key — both for our customers and the solutions architecture team that directly supports them.


What advantages or improvements have you seen since implementing site reliability engineering, and how do you deal with any potential drawbacks?

Over the last year, we’ve gone from an environment where only operations engineers could be on call to a setup where a motivated engineer can join the incident response rotation. We’ve more than doubled the product engineering team and have gone from one deployment per day to routinely shipping five or six times per day.

The biggest potential drawback for us is the risk of centralizing infrastructure knowledge. We take the strong view that SRE should teach more than it fixes. Our engineers are encouraged to treat SRE as a resource, not a decision-maker.

 

How is your site reliability team structured, and what skills do you look for in a good SRE?

Our SRE team works alongside our product and solutions architecture teams. We look for engineers who have strong infrastructure skills, are exceptional teachers and are not afraid of digging deep into complex production code. We expect SREs to be driven to automate mundane tasks and be cautious enough to plan and execute complex workflows with minimal error.

 

 

Responses have been edited for length and clarity. Images via LogRocket and Shutterstock.
