Site Reliability Engineering Manager

RapidSOS

Posted 7 Days Ago

In-Office

Boston, MA, USA

185K-215K Annually

Senior level

Lead Site Reliability Engineering operations, ensuring system reliability and helping product teams manage their own services. Drive proactive engineering practices, develop the team's skills, and oversee cloud infrastructure management.

The summary above was generated by AI

In the time it takes you to read this job description, RapidSOS will have handled ~1,380 emergencies.

At RapidSOS, we are committed to using technology to build a safer, stronger future and working together to save lives. We’re in an exciting phase of growth, welcoming new members from across the globe to our mission-driven, ambitious, and inclusive team. Our work is founded on our values of elevating purpose, inventing tomorrow, delivering with urgency, serving with integrity, and winning together, all of which support a company culture where people can innovate, collaborate, grow, and, above all, make an impact.

RapidSOS is the leading public safety AI company that unlocks mission-critical intelligence for first responders and security teams – enabling faster, smarter and more accurate emergency response. Real-time data from the world’s largest safety network of 700M+ devices, 200+ global enterprises, and 23,000+ federal, state and local agencies fuels the RapidSOS HARMONY AI engine that delivers this intelligence to those who need it most. Learn more at www.RapidSOS.com.

What this role is about:
This is an engineering leadership role, not simply an on-call manager. The SRE Manager owns two things: keeping RapidSOS's cloud infrastructure running reliably, and helping product teams get to a place where they can run their own services without routing every operational issue through SRE. RapidSOS powers real-time emergency response by connecting life-critical data to first responders, so reliability here directly impacts outcomes in moments that matter.

You'll lead the SRE Operations team and report to the Director of SRE & Platform Engineering. The team has real roots in NOC-style operations, and the honest goal of this role is to move it toward something more engineering-focused and proactive: better tooling, better practices, more ownership at the service team level. That's a gradual transition, and you'll be the one shaping how it happens.

What you’ll do:

Own the reliability, scalability, and operational health of RapidSOS Kubernetes clusters, shared services, and core AWS infrastructure, including upgrades, capacity planning, node scaling, and testing that multi-region failover actually works
Drive the IaC foundation in Terraform/Atlantis and champion infrastructure-as-code as a core engineering standard
Partner with Engineering Managers to set SLOs for their services, establish error budgets, and help teams build the habits to operate what they ship; the goal is for product teams to own their services, not to have SRE own everything on their behalf
Maintain proactive reliability work: capacity planning, failure mode analysis, runbook quality, and chaos engineering exercises; run reliability reviews before major launches and organize failure mode exercises with product teams
Drive a blameless postmortem practice, ensuring every significant incident produces systemic improvements with clear ownership and closure
Run the Tier 1 on-call rotation: scheduling for primary and secondary engineers, coordination with the 3rd-party NOC, and keeping incident escalation processes smooth and manageable
Lead incident command on Sev-1s, escalate when needed, and keep engineering leadership informed throughout
Lead and grow a high-impact team by mentoring engineers, owning headcount, and thinking ahead about what the team needs as the function grows
Shape the team’s long-term AI strategy for infrastructure and operations by identifying opportunities for AI-driven automation and insight generation, evaluating tooling and workflows, and operationalizing best practices for scalable team-wide usage
Own the reserved instance strategy, the team's AWS cost footprint, and the error budgets and SLOs across production services, and communicate that picture clearly to engineering and product leadership
Work alongside Platform SRE on bigger infrastructure projects: Gateway API adoption, cross-region architecture, security changes

What we’re looking for in our ideal candidate:

7+ years in SRE, platform engineering, or DevOps, including at least two years managing a team
You’ve been directly responsible for Kubernetes and AWS infrastructure in production environments where uptime and resilience are critical
Experience moving a team from reactive ops toward engineering-first reliability practices
You’ve worked collaboratively with engineering teams to proactively improve reliability, scalability, and operational readiness before issues reach production
Ability to write and review production-quality Python scripts and tooling
You’ve applied SLOs, error budgets, and blameless postmortems in practice to improve reliability and drive better engineering decisions
Hands-on familiarity with: Terraform/Atlantis, Kubernetes/Helm/ArgoCD, Datadog, Concourse CI/GitHub Actions, RabbitMQ, and AWS (EKS, RDS/Aurora, ElastiCache, VPC networking, IAM, KMS, Route53)

What we offer:

The chance to work with a passionate team on solving one of the largest challenges globally
Competitive salary, benefits, and equity participation
A dynamic, flexible and fun start-up work environment with a highly talented team

If you're curious to learn more about RapidSOS, you can check out https://rapidsos.com/blog/

Starting pay for a successful applicant will depend on a variety of job-related factors, which may include experience, relevant skills, training, education, location, business needs, or market demands. The salary range for this role is $185,000 - $215,000. This role will also be eligible to receive equity options.

RapidSOS is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, or Veteran status.

Interested in the role but you don’t meet 100% of the requirements? We’d love to hear from you! We encourage you to apply; we’d be excited to see if your unique skill set and experience could be a match.
