Virta Health

Senior Site Reliability Engineer

Posted 5 Hours Ago

Remote

Hiring Remotely in USA

167K-216K

Senior level

As a Senior Site Reliability Engineer at Virta Health, you'll build automation and tooling for reliability, enhance observability, and mentor engineering teams in best practices.

The summary above was generated by AI

Virta Health is on a mission to transform diabetes care and reverse the type 2 diabetes epidemic. Current treatment approaches aren’t working: over half of US adults have either type 2 diabetes or prediabetes. Virta is changing this by helping people reverse type 2 diabetes through innovations in technology, personalized nutrition, and virtual care delivery reinvented from the ground up. We have raised over $350 million from top-tier investors, and we partner with the largest health plans, employers, and government organizations to help their employees and members restore their health and live diabetes-free. Join us on our mission to reverse diabetes in 100M people.

As an SRE on the Infrastructure team at Virta, you will be building the foundation that will help our company move as fast as possible while meeting security and compliance requirements. Key projects for the team over the next two quarters include:

Implement an AI-driven observability and metrics platform that automatically detects anomalies and highlights SLO risks, enabling product teams to make data-driven decisions.
Enhance system observability, reliability, and efficiency using off-the-shelf technology combined with internal tools developed in Python and Go, increasing visibility into our systems and centralizing data.
Build more products for our Product Development teams, such as observability modules (SLOs, alerting, dashboards), so they can spin up an MVP out of the box.
Improve incident readiness with better tooling and sound hygiene practices such as game days.
Engage with feature development teams on toil reduction exercises, capacity planning, load testing, the SLO process, and other best practices, partnering with product teams to replace manual capacity planning with predictive/AI-driven scaling models and to codify self-healing runbooks that minimize toil.
Improve the velocity and quality of our developer platform and tooling.
General AI fluency desired: comfortable with concepts like prompt engineering, operational chatbots, and AI-assisted workflows to accelerate incident response and reliability improvements.

We are in the midst of redefining our incident response tooling and strategy, improving test tooling, and developing a strategy to ensure all applications are performant and available. Joining Virta would make you one of the key people defining and driving the future vision of what reliability and observability should look like.

Responsibilities

Ship automation and tooling that reduces toil, with high-quality, well-structured code.
Design and codify self-healing workflows and guardrails to minimize toil and improve reliability.
Steward SLO dashboards enhanced with AI/ML-assisted insights, leveraging AIOps-style observability to surface anomalies, predict error-budget burn, and improve signal quality across golden signals.
Integrate load-testing into reliability engineering efforts, ensuring outcomes directly inform SLOs, scaling strategies, and capacity planning.
Partner with product teams to replace manual capacity planning with predictive/AI-driven scaling models and implement burn-rate-based alerting.
Coach and mentor engineers; champion best practices and pragmatic reliability trade-offs.

90 Day Plan

Within your first 90 days at Virta, we expect you will do the following:

Teach and inspire other engineering team members through knowledge sharing, pair programming, and giving feedback during code reviews
Propose and implement one or more process improvements related to reliability and observability to make our engineering team even better
Deliver a proof-of-concept for an AIOps initiative, demonstrating how a manual reliability or observability process can be transformed into automation to reduce toil and improve insight

Must-Haves

Highly proficient in shipping backend code in high-quality production environments, with strong hands-on coding and automation expertise, and a deep understanding of reliability and production readiness practices
Hands-on expertise with automation and infrastructure-as-code (Terraform modules preferred), ideally with experience in observability
Experience designing and implementing highly observable, scalable systems — with a proven track record configuring AIOps / ML-based monitoring platforms — that support large numbers of users while reducing operational burden
Applied and general AI fluency: ability to leverage AI/ML-assisted observability (e.g., anomaly detection, error-budget burn prediction) while also being comfortable with concepts like prompt engineering, operational chatbots, and AI-assisted workflows to accelerate incident response and reliability improvements
Growth mindset and craftsmanship: ability to coach, mentor, and evangelize AI-first insights while continually improving engineering practices and following best practices

Values-driven culture

Virta’s company values drive our culture, so you’ll do well if:

You put people first and take care of yourself, your peers, and our patients equally
You have a strong sense of ownership and take initiative while empowering others to do the same
You prioritize positive impact over busy work
You have no ego and understand that everyone has something to bring to the table regardless of experience
You appreciate transparency and promote trust and empowerment through open access of information
You are evidence-based and prioritize data and science over seniority or dogma
You take risks and rapidly iterate

Is this role not quite what you're looking for? Join our Talent Community and follow us on LinkedIn to stay connected!

As part of your duties at Virta, you may come in contact with sensitive patient information that is governed by HIPAA. Throughout your career at Virta, you will be expected to follow Virta's security and privacy procedures to ensure our patients' information remains strictly confidential. Security and privacy training will be provided.

Virta has a location-based compensation structure. Starting pay will be based on a number of factors and commensurate with qualifications and experience. For this role, the compensation range is $167,249 - $216,000. Information about Virta’s benefits is on our Careers page at: https://www.virtahealth.com/careers.

As a remote-first company, our team is spread across various locations with office hubs in Denver and San Francisco.
Clinical roles: We currently do not hire in the following states: AK, HI, RI.
Corporate roles: We currently do not hire in the following states: AK, AR, DE, HI, ME, MS, NM, OK, SD, VT, WI.

Top Skills

AIOps

Python

Terraform
