Top Senior Site Reliability Engineer Jobs in Boston, MA
Big Data • Cloud • Software • Database
Lead a 6–8 person team managing the Kubernetes fleet and core runtime components (CoreDNS, cert-manager, Gatekeeper). Define technical vision and roadmap, guide migration from Terraform to Operator-driven lifecycle management, perform hands-on architectural reviews and PR reviews, resolve operational incidents, and collaborate with engineering leaders and stakeholders.
Top Skills:
Kubernetes, CoreDNS, cert-manager, Gatekeeper, Terraform, Crossplane, Operators, AWS, GCP, Azure, Service Mesh, Load Balancing, Observability, Alerting, Containerization
Reposted 2 Days Ago
Big Data • Cloud • Software • Database
This role involves building and maintaining observability services, ensuring service reliability, and collaborating with other teams on best practices.
Top Skills:
AWS, Fluent Bit, GCP, Jaeger, Kubernetes, Azure, Quickwit, Splunk, Vector, VictoriaMetrics
Enterprise Web • Hardware • Internet of Things • Software
The Senior Site Reliability Engineer will mentor teams on observability practices, architect systems for growth, automate developer tasks, and debug production issues.
Top Skills:
Go, Kubernetes, LGTM Stack, OpenTelemetry, Prometheus, TypeScript
Information Technology • Web3
The Site Reliability Engineer manages AWS Kubernetes infrastructure, ensuring operational excellence, security, and scalability, while implementing reliability improvements and best practices.
Top Skills:
Argo CD, AWS, Bash, Datadog, EKS, Go, Kafka, Kubernetes, Postgres, Python, Sysdig, Terraform
Marketing Tech • Social Media • Software • Analytics • Business Intelligence
As a Site Reliability Engineer, you'll design scalable systems, drive infrastructure initiatives, improve security, and collaborate across teams to enhance system resilience. You'll also investigate failures and contribute to security tooling while building your skills in a supportive environment.
Top Skills:
Ansible, AWS, Chef, GitHub Actions, GitLab, Go, Java, Jenkins, Linux, Python, Ruby, SaltStack, Terraform
Artificial Intelligence • Fintech • Hardware • Information Technology • Sales • Software • Transportation
Design, scale, and manage AWS services for IoT devices. Collaborate on infrastructure, optimize performance, and ensure high availability of services.
Top Skills:
AWS, Bash, Go, Helm, Kubernetes, Python, Ruby, Terraform
Big Data • Healthtech • HR Tech • Machine Learning • Software • Telehealth • Big Data Analytics
The Staff Site Reliability Engineer will architect, operate, and improve the platform while ensuring security compliance and enhancing development processes.
Top Skills:
AWS, Elasticsearch, Istio, Kubernetes, NATS, Node.js, Postgres, Python, React, Terraform, TypeScript
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The role involves improving software reliability, automating processes, collaborating with teams on system optimization, and mentoring engineers to establish reliability as a core value.
Top Skills:
AWS, Azure, Datadog, Docker, EC2, GCP, Go, Kibana, Kubernetes, Ruby, Terraform
Reposted 12 Days Ago
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will support, maintain and grow the Atlas platform, focusing on automating processes and running multi-cloud environments.
Top Skills:
AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS
eCommerce • Legal Tech • Professional Services • Software • Data Privacy
The Site Reliability Engineer will ensure systems run smoothly, work with automation tools, resolve issues, and drive operational improvements.
Top Skills:
AWS, Azure, CloudFormation, Docker, GCP, Grafana, Kubernetes, Memcached, New Relic, OpenTelemetry, Postgres, Prometheus, Pulumi, Redis, Sentry, Terraform
Reposted 9 Days Ago
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Site Reliability Engineer will enhance CI/CD frameworks, automate cloud infrastructure, manage Kubernetes and AWS services, and ensure operational excellence.
Top Skills:
Ansible, AWS, Bash, Chef, CI/CD, Docker, Git, Kubernetes, Puppet, Python, Ruby, Salt, Terraform
Artificial Intelligence • Other • Security • Software • Analytics • Big Data Analytics
The Lead Site Reliability Engineer will oversee the reliability and scalability of the infrastructure, lead a team in operational execution, ensure best practices in SRE, and mentor senior engineers.
Top Skills:
CI/CD, Docker, GitOps, Go, Kubernetes, Linux, Python, Terraform
Reposted 24 Days Ago
Big Data • Cloud • Software • Database
As a Staff Site Reliability Engineer, you will empower developers by optimizing MongoDB Atlas, ensuring seamless performance across multiple cloud platforms while fostering a supportive culture.
Top Skills:
AWS, GCP, Azure, MongoDB
Fintech • Software
The Principal Site Reliability Engineer is responsible for maintaining cloud infrastructure, ensuring application performance, and implementing automated solutions in a SaaS environment, while collaborating with security and software engineering teams.
Top Skills:
.NET, Ansible, AppDynamics, AWS, Azure, Azure DevOps, C#, Datadog, Dynatrace, Harness, Java, Jenkins, Kubernetes, New Relic, Terraform
Cloud • Information Technology • Internet of Things • Software • Consulting • Infrastructure as a Service (IaaS) • Automation
Design, automate, and support OpenShift-based platforms, ensuring reliability and security while onboarding new managed services and handling incident responses.
Top Skills:
Argo, Go, Grafana, Jenkins, Kubernetes, Linux, OpenShift, Prometheus, Python, Tekton
Information Technology • Software
Lead the design, build, and operation of an API management and cloud infrastructure platform (Azure/AWS). Provide hands-on technical leadership for SRE/DevOps, IaC, container orchestration, CI/CD, observability, and incident response while mentoring engineers and driving platform reliability and automation.
Top Skills:
API Management, Microsoft Azure, AWS, Kubernetes, Docker, Podman, Terraform, Azure RM (ARM), CloudFormation, Ansible, Linux, Unix, Windows, Python, Shell Scripting, Groovy, Golang, Jenkins, Azure DevOps, GitHub Actions, PagerDuty, Datadog, Splunk, ELK Stack, OpenTelemetry (OTel), Azure RBAC, AWS IAM
Fintech • Payments
Lead an SRE team to improve reliability, observability, and automation across Azure. Define SLOs/SLIs, manage error budgets and on-call, standardize IaC/GitOps, build self-healing systems, and collaborate on cost, governance, and operational strategy.
Top Skills:
Azure, Log Analytics, Azure Monitor, KQL, Terraform, Bicep, GitHub Actions, GitOps, Python, Go, Bash, PowerShell, Prometheus, Grafana, IaC, Azure Policies
Cloud • Information Technology • Security • Software • Cybersecurity
This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.
Top Skills:
Ansible, AWS ECS, Kubernetes, Linux, Python, Terraform
Insurance
The Lead SRE ensures the reliability and performance of Hiscox US tech platforms, leads a team, and implements operational excellence across environments.
Top Skills:
ARM Templates, AWS, Azure, Azure DevOps, Bash, Docker, ELK, Grafana, Jenkins, Kubernetes, New Relic, PowerShell, Prometheus, Python, Terraform
Insurance
The Site Reliability Engineer ensures the reliability and performance of technology platforms, leads a team, and implements automation and monitoring strategies.
Top Skills:
ARM Templates, AWS, Azure, Azure DevOps, Bash, Docker, ELK, Grafana, Jenkins, Kubernetes, New Relic, PowerShell, Prometheus, Python, Terraform
Big Data • Cloud • Healthtech • Software • Big Data Analytics
As a Senior Site Reliability Engineer, ensure system reliability and scalability, lead incident management, develop automation tools, and mentor team members.
Top Skills:
Ansible, AWS, Bash, Docker, Git, Go, Hibernate, Java, Kubernetes, Linux, Maven, MySQL, Python, Ruby, Shell, Solr, Spring, Tomcat, Vagrant
Big Data • Cloud • Healthtech • Software • Big Data Analytics
As a Senior Site Reliability Engineer at Veeva, you will enhance the reliability and scalability of applications, lead incident management, and mentor team members while working with modern technologies.
Top Skills:
Ansible, AWS, Bash, Docker, Git, Go, Hibernate, Java, Kubernetes, Linux, Maven, MySQL, Python, Ruby, Shell, Solr, Spring, Tomcat, Vagrant
3 Days Ago
Hardware • Quantum Computing
Maintain and integrate hardware and software systems for quantum controls, manage lab and test infrastructure (HIL, K8s, networking, rack servers), automate provisioning and CI/CD, implement monitoring/alerting and observability, support incident response and root-cause analysis, and define operational procedures to ensure reliability across development and production environments.
Top Skills:
Python, Bash, Go, Docker, Git, Kubernetes, Grafana, Prometheus, ELK Stack, GitLab CI, Jenkins, Ansible, Terraform, Ubuntu, Debian, Red Hat, Windows, DNS, DHCP, TCP/IP, VLAN, LAN, WAN, Routers, Switches, Rack Mount Servers, Hardware-in-the-Loop (HIL)
Information Technology • Internet of Things • Software • Virtual Reality
Lead reliability, availability, and resiliency strategies for large-scale systems, drive operational excellence, and provide technical mentorship across engineering teams.
Top Skills:
AWS, CI/CD, Java, MongoDB, RabbitMQ, ZooKeeper
Healthtech • Insurance
The Senior Software Engineer will lead technical projects, mentor engineers, and build resilient cloud infrastructures focusing on SRE best practices.
Top Skills:
AWS, CI/CD, GCP, GitHub Actions, Grafana, Kubernetes, Prometheus, Terraform