Top Senior Site Reliability Engineer Jobs in Boston, MA

MongoDB

Site Reliability Engineer (Senior or Staff), Atlas

Reposted 4 Days Ago

Easy Apply

Remote or Hybrid

Boston, MA

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.

Top Skills: AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS

Zscaler

Staff Production Engineer (SRE) (Federal)

Reposted 5 Days Ago

Easy Apply

Remote or Hybrid

Boston, MA

119K-170K Annually

Senior level

Cloud • Information Technology • Security • Software • Cybersecurity

As a Staff Site Reliability Engineer, you'll oversee Zscaler production data center services, optimize code, and ensure cloud service availability and performance. You'll also collaborate with cross-functional teams to improve processes and resolve escalated issues.

Top Skills: Bash, DNS, Firewalls, Grafana, HTTP, ICMP, Load Balancing, Nagios, OSI Model, Prometheus, Python, TCP/IP

Cohere Health

Site Reliability Engineer II

Reposted 14 Hours Ago

Easy Apply

Remote

Boston, MA

100K-110K Annually

Mid level

Healthtech • Software

Operate and maintain AWS-hosted MERN applications and large-scale data workflows. Manage serverless and Spark-based pipelines, perform incident response and on-call duties, engineer automation to eliminate operational toil, ensure HIPAA/SOC2/HITRUST compliance, build observability and lead blameless post-mortems.

Top Skills: Amazon ECS, Amazon EKS, Amazon EMR, Athena, AWS Glue, AWS Lambda, AWS SNS, AWS SQS, CloudWatch, EC2, IAM, JavaScript, MERN, MySQL, Node.js, OpenTofu, PySpark, Python, RabbitMQ, Terraform, TypeScript, VPC

Domino Data Lab

Staff Site Reliability Engineer

Reposted 3 Days Ago

Easy Apply

Remote or Hybrid

Boston, MA

200K-230K Annually

Senior level

Artificial Intelligence • Machine Learning

Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.

Top Skills: Cloud Platforms, Go, Kubernetes, Linux, LLM/AI Tooling, Logs and Tracing, Observability Tooling, Python, SLO/SLI Frameworks

MongoDB

Site Reliability Engineer (Senior or Staff), Storage Layer Services (SLS)

Reposted 13 Days Ago

Easy Apply

Remote or Hybrid

Boston, MA

126K-248K Annually

Senior level

Big Data • Cloud • Software • Database

The Senior Site Reliability Engineer will develop and support distributed storage services, ensuring reliability and operational safety, with a focus on automation and efficiency.

Top Skills: AWS, Azure, DNS, Go, Google Cloud Platform, Kubernetes, Linux, Python, TCP/IP, TLS

Openly

DevOps/SRE II (Remote, US)

Reposted 5 Days Ago

Remote

Boston, MA

115K-173K Annually

Junior

Insurance

Build, automate, and maintain cloud infrastructure and CI/CD for Openly's insurance platform. Implement IaC, monitoring, and security best practices; lead incident response and postmortems; reduce operational toil through tooling and automation; influence architecture and deployment decisions.

Top Skills: Airflow, Aiven Debezium, ArcGIS, BigQuery, CircleCI, Cloud Functions, Cloud Run, Cloud SQL, Composer, Datadog, Donut, Fivetran, GCP, GCS, Git, Go, Jupyter Notebooks, Kafka, Kubernetes, Nuxt, Postgres, Pub/Sub, Python, R, Slack, SQL, Tailwind, Terraform, Vue.js, Webpack, Zoom

MongoDB

Site Reliability Engineer (Senior or Staff), Deployments

Reposted 19 Days Ago

Easy Apply

Remote or Hybrid

Boston, MA

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

Maintain and improve multi-cloud Kubernetes infrastructure, CI/CD (Argo Workflows/ArgoCD), observability, and networking. Build reliable continuous deployment tooling and onboarding flows, provide internal support, collaborate across Platform Engineering, contribute upstream (open-source/operators), and participate in a 24/7 on-call rotation to resolve deployment infrastructure issues.

Top Skills: Alerting, Argo Workflows, Argo CD, AWS, Azure, CI/CD, Containers, DNS, GCP, Go, Kubernetes, Linux, Load Balancer, Observability, Python, Service Mesh, TCP/IP, TLS

DraftKings

Senior Site Reliability Engineer

Reposted 3 Days Ago

Hybrid

Boston, MA

128K-160K Annually

Senior level

Digital Media • Gaming • Information Technology • Software • Sports • Esports • Big Data Analytics

Lead the design, automation, and scaling of global compute infrastructure across data centers, cloud, and on-prem. Operate GitOps with Rancher Fleet/Flux/Helm, build self-healing tooling, own cluster autoscaling and capacity strategy, define SLOs using Datadog, and participate in on-call rotation while mentoring peers.

Top Skills: AWS, containerd, Datadog, Docker, Flux, GCP, GitOps, Go, Helm, HPA, Infrastructure as Code (IaC), Karpenter, KEDA, Kubernetes, Linux, Nutanix, Python, Rancher Fleet, vSphere

Manifold

Staff Site Reliability Engineer

Reposted Yesterday

In-Office

Boston, MA

160K-205K Annually

Senior level

Software

Design, build, and operate multi-account cloud infrastructure using IaC. Automate customer deployments, manage CI/CD, troubleshoot production across infra/data/app layers, and handle networking, security, and compliance for regulated environments while collaborating with platform and professional services teams.

Top Skills: Airflow, Auth0, AWS, Azure, dbt, Docker, ECS, GCP, GitHub Actions, LLMs, Okta, Packer, Postgres, Snowflake, Tailscale, Terraform, WireGuard

MongoDB

Senior Site Reliability Engineer, Fleet Management

Reposted 7 Days Ago

Easy Apply

Remote or Hybrid

Boston, MA

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.

Top Skills: AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform

Runpod

Site Reliability Engineer

22 Days Ago

Remote

Boston, MA

150K-200K Annually

Senior level

Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)

Ensure stability and resilience of Runpod's distributed AI platform by defining SLIs/SLOs, leading incident response, building observability and reliability tooling, automating operational workflows, and partnering with engineering teams to reduce toil and improve production readiness.

Top Skills: Bash, CI/CD, Containerized Production Systems, Go, GPU Observability Tooling, Grafana, Infrastructure as Code, Linux, Prometheus, Python

Cisco

Site Reliability Engineer

Reposted 3 Days Ago

In-Office or Remote

Boston, MA

138K-256K Annually

Junior

Cloud • Information Technology • Internet of Things • Professional Services • Software

Design, build, and maintain automation and deployment tooling to improve reliability and scalability across global cloud environments. Troubleshoot distributed systems, develop CI/CD and testing frameworks, automate cluster and environment provisioning, and collaborate with engineering and product teams to reduce operational overhead and support large-scale platform growth.

Top Skills: Ansible, GitLab CI, Linux, RSpec, Ruby

Tulip

Senior Site Reliability Engineer

9 Days Ago

Easy Apply

Hybrid

Boston, MA

160K-200K Annually

Senior level

Enterprise Web • Hardware • Internet of Things • Software

Lead observability and reliability efforts: mentor teams on SLIs/SLOs, maintain triage/remediation workflows, perform incident response, debug production systems, and design core infrastructure and tooling for engineering teams.

Top Skills: Alloy, Claude Skills, Gemini Gems, Go, Grafana, Kubernetes, Loki, Mimir, MongoDB, OpenTelemetry, Postgres, Prometheus, PromQL, Tempo, TypeScript

Zscaler

Site Reliability Engineer-SkillBridge Intern

Reposted 22 Days Ago

Easy Apply

Remote or Hybrid

Boston, MA

Internship

Cloud • Information Technology • Security • Software • Cybersecurity

This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.

Top Skills: Ansible, AWS ECS, Kubernetes, Linux, Python, Terraform

GitLab

Site Reliability Engineer, Cloud Cost Utilization

Reposted 23 Days Ago

Easy Apply

Remote

Boston, MA

Mid level

Cloud • Security • Software • Cybersecurity • Automation

As a Cloud Cost Utilization SRE at GitLab, you'll manage cloud spending, improve tracking and optimization of cloud usage, and collaborate with finance and engineering teams to enhance cost efficiency across AWS and GCP.

Top Skills: Ansible, AWS, ELK, GCP, Grafana, Loki, Mimir, Prometheus, Tempo, Terraform

Cooley

Senior Technology Site Reliability Engineer

Reposted 5 Days Ago

In-Office or Remote

Boston, MA

140K-205K Annually

Senior level

Information Technology • Legal Tech

The Senior Technology Site Reliability Engineer is responsible for maintaining and optimizing infrastructure and applications, ensuring reliability and performance while automating processes and collaborating with teams.

Top Skills: AWS, Chef, Datadog, Go, Grafana, Java, Prometheus, Puppet, Python, Salt, Terraform

Watershed Informatics

Site Reliability Engineer

Reposted 5 Days Ago

In-Office

Boston, MA

Mid level

Cloud • Information Technology • Biotech

The Site Reliability Engineer will build and deploy Linux servers, research technologies, monitor system performance, and resolve technical incidents.

Top Skills: Infrastructure-as-Code, Linux, Networking, Virtualization

Akamai Technologies

Site Reliability Engineer II

Reposted 5 Days Ago

In-Office or Remote

Boston, MA

95K-171K Annually

Junior

Cloud • Security • Software • Cybersecurity

As a Site Reliability Engineer II, you'll automate tasks, monitor AI workloads, enhance dashboards, support CI/CD processes, and collaborate with engineering teams on complex issues while participating in on-call rotations.

Top Skills: Go, Grafana, Kubernetes, Linux, Prometheus, Python, SaltStack, Terraform

MongoDB

Site Reliability Engineer (Senior or Staff), Infrastructure Security

Reposted 25 Days Ago

Easy Apply

Remote or Hybrid

Boston, MA

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

The Senior Site Reliability Engineer will lead security design and implementation for cloud infrastructures, mentor teams, and automate security solutions.

Top Skills: Ansible, AWS, Azure, Cloud Security Tools, CloudFormation, GCP, Go, Terraform

QuEra Computing

Control System Engineer/Site Reliability Engineer (SRE)

Reposted 7 Days Ago

In-Office

Boston, MA

160K-225K Annually

Senior level

Hardware • Quantum Computing

Lead integration, maintenance, and automation of heterogeneous hardware and software control systems for quantum computers. Manage networked lab infrastructure, CI/CD pipelines, observability, and provisioning. Support incident response, testing, and orchestration, collaborating with software, hardware, and test teams to ensure reliability and operational readiness of development and production environments.

Top Skills: Ansible, Bash, CI/CD, Debian, DHCP, DNS, Docker, ELK, Git, GitLab CI, Go, Grafana, Hardware-in-the-Loop (HIL), Jenkins, Kubernetes, LAN, Logging Systems, Prometheus, Python, Rack-Mount Servers, Red Hat, Routers, Switches, TCP/IP, Terraform, Ubuntu, VLAN, WAN, Windows

Akamai Technologies

Site Reliability Engineer

Reposted 7 Days Ago

In-Office or Remote

Boston, MA

76K-136K Annually

Mid level

Cloud • Security • Software • Cybersecurity

Design, develop, test, and operate scalable infrastructure and services for Akamai Cloud. Implement and manage Infrastructure-as-Code (Terraform and similar tools), CI/CD, and observability. Automate reliability improvements, mentor engineers, collaborate on incident response and root-cause remediation, and participate in on-call rotations.

Top Skills: Ansible, Chef, CI/CD, Infrastructure as Code, Linux, Observability (Monitoring, Logging, Alerting), Puppet, SaltStack, Terraform

Onebrief

Senior Site Reliability Engineer (Arlington, VA) - Secret Clearance Required - Relocation Provided

4 Days Ago

Remote

Boston, MA

180K-220K Annually

Senior level

Software • Defense

Work as an SRE embedded with product teams to improve reliability by fixing application code (primarily TypeScript), building observability (Prometheus, Loki, Grafana, Alloy), defining SLIs/SLOs, leading incident response and postmortems, automating toil, and supporting deployments across on‑prem DoD and AWS environments.

Top Skills: Alloy, AWS, Bash, Containers, Docker, GitHub Actions, GitLab CI/CD, Go, Grafana, Jenkins, kubectl, Kubernetes, Loki, Node.js, Prometheus, Python, TypeScript

Cambridge Mobile Telematics

Principal Site Reliability Engineer, Machine Learning

8 Days Ago

In-Office

Boston, MA

142K-178K Annually

Senior level

Software

Lead SRE ownership of ML platform SLOs and operational health for Ray on EKS and Databricks on EC2. Maintain observability with CloudWatch/Datadog, tune autoscaling and GPU scheduling, manage Databricks workspaces and IAM, optimize cost/capacity, codify infrastructure with Terraform and CI/CD, lead incident response and postmortems, perform security/OS maintenance, and participate in on-call rotation.

Top Skills: Amazon Linux, AWS EC2, AWS EKS, CI/CD, CloudWatch, Databricks, Datadog, Docker, DynamoDB, EC2 Spot, ECS, GPU Scheduling, IAM, Kubernetes, Lambda, On-Demand Capacity Reservations, Python, Ray, RDS/Aurora, S3, SQS, Terraform, Ubuntu, Unity Catalog

Nebius

Site Reliability Engineer

Reposted 14 Hours Ago

Remote

Boston, MA

100K-140K Annually

Mid level

Artificial Intelligence • Information Technology • Consulting

The Linux Systems Administrator will maintain and troubleshoot Linux systems, support network services, and work on systems integration while collaborating with infrastructure teams.

Top Skills: DHCP, DNS, Linux, NTP, Python

Axon

Site Reliability Engineer II

10 Days Ago

In-Office

Boston, MA

116K-165K Annually

Senior level

Artificial Intelligence • Cloud • Social Impact • Software • Wearables

Build and maintain the Zero Touch Platform: develop Temporal workflows and Go services to automate cloud operations, own production systems (SLOs, on-call, incidents), debug cloud-native distributed systems, create CI/CD and IaC automation, and produce documentation to enable self-service across engineering teams.