Get the job you really want.
Maximum of 25 job preferences reached.
Top Remote Senior Site Reliability Engineer Jobs in Boston, MA
Information Technology • Software
The Site Reliability Engineer manages system reliability, performance, and scalability for end-user services, leading software deployments, incident management, and service quality improvements. Responsibilities include collaboration with teams, maintaining a product roadmap, and automation of processes.
Top Skills:
AgileAternityDevsecopsItilPowershellPython
Cloud • Security • Software • Cybersecurity
The Senior Lead Site Reliability Engineer will ensure performance and uptime of security products, develop automation pipelines, and improve monitoring systems, working closely with various teams.
Top Skills:
AzureDatabricksDockerGoJenkinsKubernetesPythonTerraform
Aerospace • Manufacturing
As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.
Top Skills:
ArgocdAWSElkGCPGoGrafanaIstioJaegerKubernetesLinkerdLokiOpentelemetryPrometheusPythonTempoTerraform
Security • Software
Maintain, automate, and improve operational tools and customer deployment processes; monitor and ensure service SLOs, backup/restore, alerting, and incident response; drive GitOps/IaC practices, cost tracking, and automation of repetitive tasks while supporting outages and upgrades.
Top Skills:
AnsibleAWSAzureBashGCPGitopsGrafanaHelmKubernetesPrometheusPythonTerraform
Fintech
The Staff Site Reliability Engineer role involves leading architecture, automating GCP environment, defining SLIs and SLOs, mentoring teammates, and enhancing system reliability and performance.
Top Skills:
ArgocdDatadogGCPGoHelmJavaScriptKubernetesPythonTerraformTypescript
Cloud • Software • Database
Lead design, build, and operate the YugabyteDB DBaaS infrastructure. Drive architecture, automate lifecycle and maintenance, manage incidents and on-call rotations, implement security/encryption processes, and optimize reliability using SRE principles and observability.
Top Skills:
AksAnsibleAWSAzureBashDockerEksGCPGitGithub ActionsGkeJavaKubernetesLinuxPostgresPrometheusPythonShellTerraform
Healthtech • Other • Software
Responsible for managing PostgreSQL reliability for a 24x7 SaaS platform, incident response, automation, and enhancing database observability with various tools.
Top Skills:
AnsibleAzureBashDatadogGrafanaKubernetesNew RelicPostgresPowershellPrometheusPythonTerraform
Database
The Site Reliability Engineer will oversee the Digital Realty interconnection fabric network infrastructure, focusing on network operations, automation, and development. Responsibilities include maintaining global network infrastructure, responding to alerts, and working with various cloud platforms and automation tools.
Top Skills:
AnsibleAWSAzureGitGCPIbm CloudJenkinsLinuxOracle CloudPythonTerraform
Information Technology • Internet of Things • Software • Virtual Reality
Lead reliability, availability, and resiliency strategies for large-scale systems, drive operational excellence, and provide technical mentorship across engineering teams.
Top Skills:
AWSCi/CdJavaMongoDBRabbitMQZookeeper
Aerospace • Big Data • Greentech • Hardware • Social Impact
The Site Reliability Engineer will build, deploy, and operate computing services for satellite imaging, ensuring reliable and scalable infrastructure while collaborating with cross-functional teams.
Top Skills:
AlloyAnsibleBashCloud-Native InfrastructureGrafanaHelmK3SKubernetesKustomizeOpentelemetryPrometheusProxmoxPythonRke2TalosTerraform
Artificial Intelligence • Cloud • Machine Learning • Software • Database • App development • Generative AI
As a Site Reliability Engineer at Replit, you'll enhance system reliability through observability, automation, incident management, and performance optimization, serving millions globally.
Top Skills:
AnsibleDatadogGoGoogle Cloud PlatformGrafanaKubernetesPrometheusPulumiPythonTerraform
Artificial Intelligence • Cloud • Machine Learning • Software • Database • App development • Generative AI
As a Staff Site Reliability Engineer at Replit, you will ensure infrastructure reliability, drive automation, lead incident management, and mentor the engineering team while enhancing system performance and observability.
Top Skills:
DatadogGoGoogle Cloud PlatformGrafanaKubernetesOpentelemetryPrometheusPythonTerraform
New
Cut your apply time in half.
Use ourAI Assistantto automatically fill your job applications.
Use For Free
Security • Software
The Site Reliability Engineer will enhance production services' reliability and security, automate tasks, monitor systems, and manage incidents.
Top Skills:
AnsibleAWSAzureBashCloudFormationGCPLinux/UnixPowershellPythonRubyTerraformWindows Os
Logistics • Software • Transportation
Lead and mentor teams in DevOps and SRE, architect scalable Azure Cloud infrastructure, implement CI/CD and IaC, ensure database reliability, and drive cross-functional collaboration.
Top Skills:
Azure CloudAzure DevopsCi/CdCosmosdbDockerElkGrafanaKubernetesMySQLPostgresPrometheusRedisSQL ServerTerraform
Healthtech • Software
Maintain reliability, performance, and scalability of cloud-hosted services and databases. Implement SRE best practices, define SLIs/SLOs, respond to incidents, build monitoring and automation, perform DBA tasks (backups, restores, tuning), support CI/CD and DB migrations, and document runbooks and procedures.
Top Skills:
Amazon RdsAzure Sql DatabaseBashEcs FargateFlywayGitlabJenkinsKubernetesLiquibaseOctopus DeployOraclePostgresPowershellPythonRedisSolarwinds DpaSQL Server
Software
The role involves managing compute infrastructure for decentralized applications, requiring critical thinking, documentation skills, and experience in Kubernetes and blockchain management.
Top Skills:
BlockchainGitopsInfrastructure-As-CodeKubernetesProgramming Languages
Reposted 21 Days AgoSaved
Easy Apply
Easy Apply
Big Data • Cloud • Software • Database
Manage continuous delivery infrastructure for reliable code deployment. Collaborate with teams to streamline onboarding, support deployment systems, and participate in on-call rotations.
Top Skills:
Argo WorkflowsArgocdAWSAzureGoGoogle Cloud PlatformKubernetesPython
Artificial Intelligence • eCommerce • Retail
Lead the SRE and DevOps team, ensure infrastructure reliability, oversee cloud operations, drive automation, and collaborate cross-functionally.
Top Skills:
AzureBashCi/CdDatadogDockerElk StackGoGrafanaKubernetesPowershellPrometheusPythonTerraform
Aerospace • Big Data • Greentech • Hardware • Social Impact
Design, deploy, and operate compute services for on-premises and cloud satellite imaging platforms. Build reproducible, scalable, highly available deployments, troubleshoot distributed systems, optimize constrained environments, document and automate operations, and participate in on-call rotations to ensure reliability for customer-facing and air-gapped deployments.
Top Skills:
AlloyAnsibleBashCudaGitopsGrafanaHelmJIRAK3SKubernetesKustomizeOpentelemetryPrometheusProxmoxPythonRke2TalosTerraform
Software
Join the SRE team to improve monitoring, alerting, observability, and reliability of Fireblocks' production systems. Triage incidents, run RCA, create runbooks and automation (Python, Lambda, shell, Ansible, ArgoCD), collaborate with R&D/support, and participate in on-call rotation.
Top Skills:
AnsibleArgocdAWSAws LambdaAzureBashBitbucketC++ChefCoralogixDatadogDockerGerritGitGitlabGCPHelmJavaScriptKubernetesLinuxMySQLNew RelicNginxNode.jsPhabricatorPrometheusPuppetPythonShellSplunk
Real Estate • Financial Services • PropTech
As a Site Reliability Engineer, you will support AWS Cloud products, optimize processes, enhance automation, and ensure system reliability and performance.
Top Skills:
ArgocdAWSAzure DevopsBashCi/CdCloudwatchDockerEksFluxcdGitKubernetesPowershellPythonSQLTerraform
Cloud • Software
In this role, you'll support large-scale applications, improve observability, mentor team members, and ensure reliability by collaborating on deployments and writing automation scripts while providing 24/7 support.
Top Skills:
AnsibleAWSBashConfluenceDockerElk StackGCPGitlab CicdGrafanaJenkinsJIRAKubernetesLinuxMongoDBMySQLNagiosOciPerlPostgresPrometheusPuppetPythonTerraform
Software
Lead SRE to define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.
Top Skills:
AWSAws MarketplaceAzureAzure MarketplaceGCPGoogle Cloud MarketplaceGrafanaKubernetesPrometheusTerraform
Computer Vision • Information Technology • Machine Learning • Natural Language Processing • Real Estate • Software
The SRE will maintain infrastructure for SaaS products on AWS, support developers, manage platform components, and handle IT tasks.
Top Skills:
AWSComputer VisionIacLarge Language ModelsNlpTerraform
Artificial Intelligence • Information Technology • Machine Learning • Software • Cybersecurity • Generative AI • Data Privacy
Lead global SRE and infrastructure teams to ensure reliability, scalability, and cost-efficiency of production and developer platforms. Define cloud and Kubernetes architecture, IaC, CI/CD, SLOs/SLIs, incident management, and cloud cost optimization while partnering with Security, Product, Finance, and Engineering.
Top Skills:
AIAutomationAWSCi/CdCloud-Native SystemsGCPInfrastructure As CodeKubernetesTerraform
Top Boston, MA Companies Hiring Remote Senior Site Reliability Engineers
See AllPopular Job Searches
All Filters
Total selected ()
No Results
No Results


































