Get the job you really want.
Maximum of 25 job preferences reached.
Top Senior Site Reliability Engineer Jobs in Boston, MA
Security • Software • Analytics
Design, operate, and automate scalable, secure infrastructure for Axiom Cloud. Define SLOs, plan disaster recovery and capacity, tune performance, improve deployment practices, build reliability tooling, respond to incidents, and promote monitoring and observability across teams.
Top Skills:
Amazon EksAWSCircleCIDockerGithub ActionsGitlabGoKubernetesLinuxLlmsMonitoring And Observability ToolsPulumiTerraform
Blockchain • Fintech • Social Media • Cryptocurrency • NFT • Web3
Design, build, and operate scalable, highly available infrastructure and platform software for Zora's blockchain services (indexer, APIs, data pipelines). Automate workflows, maintain core systems, improve developer experience, participate in on-call rotation, and contribute strategic technical direction.
Top Skills:
AsyncioBaseBridgesCephCloudflare Pages FunctionsDatadogDockerEthereumGoIpfsKubernetesMongoDBOpentelemetryOptimismOptimistic RollupsPlasmaPolygonPostgresPythonRpc NodesSidechainsVercelZk-Rollups
Blockchain • Financial Services • Cryptocurrency • Web3
As a Senior Site Reliability Engineer, you will manage the reliability and efficiency of Kraken's Data platform, working with multiple teams to ensure high performance and scalability. Responsibilities include designing data governance mechanisms, managing CI/CD pipelines, implementing monitoring solutions, and collaborating on various data projects.
Top Skills:
Apache AirflowSparkAWSDebeziumDockerKafkaKubernetesPythonTerraform
Blockchain • Financial Services • Cryptocurrency • Web3
As a SRE/DevOps Engineer at Kraken, you will build infrastructure, support tools, drive standardization, and guide engineers in an efficient remote environment.
Top Skills:
BashContinuous IntegrationDockerGitGrafanaLinuxPrometheusPythonRustTerraform
Artificial Intelligence • Big Data • Computer Vision • Machine Learning • Natural Language Processing • Software • Cybersecurity
Maintain and improve the internal developer platform, observability stack, and AWS infrastructure (Terraform); manage Kubernetes at scale; troubleshoot distributed systems; drive security, reliability, cost and performance improvements; partner with product teams and participate in on-call support.
Top Skills:
AWSCkaContainersGoKubernetesLgtm StackLinuxOpensearchPythonServerlessTcp/IpTerraform
Information Technology • Software
The Site Reliability Engineer manages system reliability, performance, and scalability for end-user services, leading software deployments, incident management, and service quality improvements. Responsibilities include collaboration with teams, maintaining a product roadmap, and automation of processes.
Top Skills:
AgileAternityDevsecopsItilPowershellPython
Aerospace • Manufacturing
As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.
Top Skills:
ArgocdAWSElkGCPGoGrafanaIstioJaegerKubernetesLinkerdLokiOpentelemetryPrometheusPythonTempoTerraform
Security • Software
Maintain, automate, and improve operational tools and customer deployment processes; monitor and ensure service SLOs, backup/restore, alerting, and incident response; drive GitOps/IaC practices, cost tracking, and automation of repetitive tasks while supporting outages and upgrades.
Top Skills:
AnsibleAWSAzureBashGCPGitopsGrafanaHelmKubernetesPrometheusPythonTerraform
Fintech
The Staff Site Reliability Engineer role involves leading architecture, automating GCP environment, defining SLIs and SLOs, mentoring teammates, and enhancing system reliability and performance.
Top Skills:
ArgocdDatadogGCPGoHelmJavaScriptKubernetesPythonTerraformTypescript
Cloud • Software • Database
Lead design, build, and operate the YugabyteDB DBaaS infrastructure. Drive architecture, automate lifecycle and maintenance, manage incidents and on-call rotations, implement security/encryption processes, and optimize reliability using SRE principles and observability.
Top Skills:
AksAnsibleAWSAzureBashDockerEksGCPGitGithub ActionsGkeJavaKubernetesLinuxPostgresPrometheusPythonShellTerraform
Database
The Site Reliability Engineer will oversee the Digital Realty interconnection fabric network infrastructure, focusing on network operations, automation, and development. Responsibilities include maintaining global network infrastructure, responding to alerts, and working with various cloud platforms and automation tools.
Top Skills:
AnsibleAWSAzureGitGCPIbm CloudJenkinsLinuxOracle CloudPythonTerraform
Aerospace • Big Data • Greentech • Hardware • Social Impact
The Site Reliability Engineer will build, deploy, and operate computing services for satellite imaging, ensuring reliable and scalable infrastructure while collaborating with cross-functional teams.
Top Skills:
AlloyAnsibleBashCloud-Native InfrastructureGrafanaHelmK3SKubernetesKustomizeOpentelemetryPrometheusProxmoxPythonRke2TalosTerraform
New
Cut your apply time in half.
Use ourAI Assistantto automatically fill your job applications.
Use For Free
Artificial Intelligence • Cloud • Machine Learning • Software • Database • App development • Generative AI
As a Site Reliability Engineer at Replit, you'll enhance system reliability through observability, automation, incident management, and performance optimization, serving millions globally.
Top Skills:
AnsibleDatadogGoGoogle Cloud PlatformGrafanaKubernetesPrometheusPulumiPythonTerraform
Artificial Intelligence • Cloud • Machine Learning • Software • Database • App development • Generative AI
As a Staff Site Reliability Engineer at Replit, you will ensure infrastructure reliability, drive automation, lead incident management, and mentor the engineering team while enhancing system performance and observability.
Top Skills:
DatadogGoGoogle Cloud PlatformGrafanaKubernetesOpentelemetryPrometheusPythonTerraform
Fintech • Payments
Join WEX as a Site Reliability Engineer to enhance system reliability in Azure, automate tasks, and collaborate with teams to optimize performance and incident response.
Top Skills:
AnsibleAzureBashCloudFormationDockerElk StackGoGrafanaKubernetesPrometheusPythonSplunkTerraform
Logistics • Software • Transportation
Lead and mentor teams in DevOps and SRE, architect scalable Azure Cloud infrastructure, implement CI/CD and IaC, ensure database reliability, and drive cross-functional collaboration.
Top Skills:
Azure CloudAzure DevopsCi/CdCosmosdbDockerElkGrafanaKubernetesMySQLPostgresPrometheusRedisSQL ServerTerraform
Healthtech • Software
Maintain reliability, performance, and scalability of cloud-hosted services and databases. Implement SRE best practices, define SLIs/SLOs, respond to incidents, build monitoring and automation, perform DBA tasks (backups, restores, tuning), support CI/CD and DB migrations, and document runbooks and procedures.
Top Skills:
Amazon RdsAzure Sql DatabaseBashEcs FargateFlywayGitlabJenkinsKubernetesLiquibaseOctopus DeployOraclePostgresPowershellPythonRedisSolarwinds DpaSQL Server
Software
The role involves managing compute infrastructure for decentralized applications, requiring critical thinking, documentation skills, and experience in Kubernetes and blockchain management.
Top Skills:
BlockchainGitopsInfrastructure-As-CodeKubernetesProgramming Languages
Artificial Intelligence • eCommerce • Retail
Lead the SRE and DevOps team, ensure infrastructure reliability, oversee cloud operations, drive automation, and collaborate cross-functionally.
Top Skills:
AzureBashCi/CdDatadogDockerElk StackGoGrafanaKubernetesPowershellPrometheusPythonTerraform
Aerospace • Big Data • Greentech • Hardware • Social Impact
Design, deploy, and operate compute services for on-premises and cloud satellite imaging platforms. Build reproducible, scalable, highly available deployments, troubleshoot distributed systems, optimize constrained environments, document and automate operations, and participate in on-call rotations to ensure reliability for customer-facing and air-gapped deployments.
Top Skills:
AlloyAnsibleBashCudaGitopsGrafanaHelmJIRAK3SKubernetesKustomizeOpentelemetryPrometheusProxmoxPythonRke2TalosTerraform
Software
Join the SRE team to improve monitoring, alerting, observability, and reliability of Fireblocks' production systems. Triage incidents, run RCA, create runbooks and automation (Python, Lambda, shell, Ansible, ArgoCD), collaborate with R&D/support, and participate in on-call rotation.
Top Skills:
AnsibleArgocdAWSAws LambdaAzureBashBitbucketC++ChefCoralogixDatadogDockerGerritGitGitlabGCPHelmJavaScriptKubernetesLinuxMySQLNew RelicNginxNode.jsPhabricatorPrometheusPuppetPythonShellSplunk
Real Estate • Financial Services • PropTech
As a Site Reliability Engineer, you will support AWS Cloud products, optimize processes, enhance automation, and ensure system reliability and performance.
Top Skills:
ArgocdAWSAzure DevopsBashCi/CdCloudwatchDockerEksFluxcdGitKubernetesPowershellPythonSQLTerraform
Cloud • Software
In this role, you'll support large-scale applications, improve observability, mentor team members, and ensure reliability by collaborating on deployments and writing automation scripts while providing 24/7 support.
Top Skills:
AnsibleAWSBashConfluenceDockerElk StackGCPGitlab CicdGrafanaJenkinsJIRAKubernetesLinuxMongoDBMySQLNagiosOciPerlPostgresPrometheusPuppetPythonTerraform
Software
Lead SRE to define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.
Top Skills:
AWSAws MarketplaceAzureAzure MarketplaceGCPGoogle Cloud MarketplaceGrafanaKubernetesPrometheusTerraform
Fintech • Payments
As a Site Reliability Engineer, you will monitor Azure Cloud systems, automate processes, respond to incidents, and collaborate with development teams to enhance reliability and performance.
Top Skills:
AzureBashDockerElk StackGoGrafanaKubernetesPrometheusPythonSplunkTerraform
Top Boston Companies Hiring Senior Site Reliability Engineers
See AllPopular Job Searches
All Filters
Total selected ()
No Results
No Results































