Top Senior Site Reliability Engineer Jobs in Boston, MA
Other • Social Impact
Design and maintain ML infrastructure, ensure reliability and scalability, collaborate with teams, optimize system performance, and mentor others.
Top Skills:
Ansible, Argo CD, Docker, ELK Stack, GPU Acceleration, Grafana, Helm, Kubernetes, Prometheus, Python, PyTorch, Scikit-Learn, TensorFlow, Terraform
Other • Social Impact
As a Staff Site Reliability Engineer, you will design and maintain ML infrastructure, optimize performance, and guide teams in operational excellence.
Top Skills:
Ansible, Argo CD, Docker, ELK Stack, GPU Acceleration, Grafana, Helm, Kubernetes, Prometheus, PyTorch, Scikit-Learn, TensorFlow, Terraform
Other • Social Impact
Design and maintain machine learning infrastructure, ensuring reliability and scalability while mentoring team members and collaborating with various teams.
Top Skills:
Ansible, Argo CD, Docker, ELK Stack, GPU Acceleration, Grafana, Helm, Kubernetes, Machine Learning, Prometheus, PyTorch, Scikit-Learn, TensorFlow, Terraform
Other • Social Impact
The Staff SRE will design, maintain, and scale ML infrastructure, improve reliability, and collaborate with teams to optimize ML workflows.
Top Skills:
Ansible, Argo CD, Docker, ELK Stack, GPU Acceleration, Grafana, Helm, Kubernetes, Prometheus, Python, PyTorch, Scikit-Learn, TensorFlow, Terraform
Reposted 20 Days Ago
Artificial Intelligence • Marketing Tech • Sales • Software
The Site Reliability Engineer will enhance system performance, optimize data systems, manage infrastructure issues, and ensure efficient database operations.
Top Skills:
ClickHouse, Databases, Linux, Networking, SQL
Fintech • Financial Services
The Senior Site Reliability Engineer will ensure system reliability, automate deployments, govern monitoring infrastructure, and enhance software delivery with cross-functional collaboration.
Top Skills:
AWS, Azure, Bash, GCP, Groovy, Monitoring Tools, NoSQL, Observability Tools, PowerShell, SQL
Information Technology • Software
You will manage and improve the technology infrastructure, ensuring its efficiency and security, while mentoring junior team members and driving projects autonomously.
Top Skills:
Cloudflare, Cloudflare Workers, GCP, Go, Postgres, Redis
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
As a Senior Site Reliability Engineer, you will manage IAM systems, implement cloud-native applications, and enhance automation and security in operations, ensuring peak uptime and performance.
Top Skills:
Ansible, AWS, Azure, Azure AD, C#, Docker, Duo, GCP, Go, Google Workspace, Java, Kubernetes, Okta, Ping, Python, Ruby, Terraform
Cloud • Information Technology
Lead and expand a Production SRE team, enhance infrastructure reliability, implement network automation, and shape SRE practices within the organization.
Top Skills:
Ansible, Envoy, Express, Git, Go, HAProxy, JavaScript, Jenkins, Kafka, MySQL, NAPALM, Node.js, Postgres, Python, React, Redis, SaltStack
Cloud • Information Technology
As a Staff Site Reliability Engineer, you will manage core infrastructure, improve reliability, automate operations, and support engineering teams in a remote environment.
Top Skills:
ELK, Envoy, Go, Grafana, gRPC, HAProxy, HashiCorp Nomad, Honeycomb, Jenkins, Kafka, Linux, MySQL, Node.js, Postgres, Puppet, Redis
Real Estate • Software • PropTech
As a Site Reliability Engineer, you will ensure system reliability and stability, troubleshoot issues, and optimize operational processes within Qualia's technology systems, focusing on Resware applications.
Top Skills:
.NET, Azure, IIS, PowerShell, SQL Server, Terraform, Windows Server
Information Technology • Security • Cybersecurity
As a Staff Site Reliability Engineer, you will design and optimize Kubernetes infrastructure, maintain CI/CD pipelines, and mentor engineers, ensuring system reliability and automation practices.
Top Skills:
Argo CD, Bash, CI/CD, Datadog, GitHub Actions, GitLab CI, Go, Grafana, Helm, Jenkins, Kubernetes, OpenTelemetry, Prometheus, Python, Spinnaker, Terraform
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
Design and implement GPU compute clusters, optimize operations for efficiency, troubleshoot and maintain large-scale infrastructure, and enhance researcher productivity.
Top Skills:
Bash, Docker, Enroot, GPFS, Kubernetes, Lustre, MySQL, Python, PyTorch, Slurm, TensorFlow, Terraform
Financial Services
As a Senior Site Reliability Engineer, you will ensure system reliability and performance while collaborating on system design, coding, and incident response.
Top Skills:
AWS, Azure, Docker, GCP, Java, Kubernetes, OpenTelemetry, Prometheus, Spring Boot
Blockchain • Information Technology
Lead the design and management of infrastructure for reliability, security, and scalability. Build developer tools, automate deployments, and ensure system performance in a blockchain environment.
Top Skills:
Ansible, AWS, Azure, GCP, Go, Kubernetes, Python, Rust, Terraform
Database
Manage infrastructure for Postgres databases, improve system architecture, enhance observability, implement CI/CD, and resolve support issues.
Top Skills:
AWS, CDK, Go, Infrastructure as Code, Pulumi, Terraform, TypeScript
Artificial Intelligence • Automotive • Machine Learning • Transportation
The Senior Site Reliability Engineer will enhance system reliability and performance, lead incident response, mentor junior members, and manage cloud infrastructure costs.
Top Skills:
AWS, Bash, C++, CloudFormation, CloudWatch, Datadog, Docker, GitLab CI, Grafana, Java, Jenkins, Kubernetes, Prometheus, Python, Terraform
Cloud • Software
The Senior Site Reliability Engineer will deploy and maintain observability infrastructure, manage Kubernetes platforms, and enhance security for DoD networks.
Top Skills:
Argo CD, AWS, Azure, Bash, Flux, GCP, Go, Grafana, Helm, Istio, Keycloak, Kubernetes, Mimir, Prometheus, Pulumi, YAML
Cloud • Greentech • Other • Energy
In this role, you'll support virtualization and kernel performance, develop automation tools, optimize compute platforms for AI, and collaborate with hardware teams.
Top Skills:
C, Go, KVM, Linux, QEMU, Rust, SmartNICs
Artificial Intelligence • Healthtech • Machine Learning • Software • Biotech
Responsible for designing, building, and operating hybrid cloud and on-prem infrastructure, implementing SRE best practices, and automation.
Top Skills:
Ansible, AWS, CloudFormation, Datadog, EKS, Go, Grafana, kubeadm, KVM, Prometheus, Python, Terraform
Automotive • Software
The Senior Site Reliability Engineer will optimize platform reliability, manage Kubernetes infrastructure, deploy monitoring solutions, and collaborate on system performance.
Top Skills:
Android, Argo CD, AWS, CircleCI, Crossplane, Docker, GCP, Git, Go, Grafana, Kafka, Kubernetes, Loki, New Relic, Objective-C, OpenTelemetry, Postgres, Prometheus, Python, React, Redis, Redshift, Redux, Ruby on Rails, Sentry, Swift, Terraform, Thanos
Artificial Intelligence • Information Technology • Consulting
In this role, you'll ensure the reliability and performance of the AI Studio inference platform, involving extensive work with telemetry pipelines, Kubernetes, and resilience in infrastructure design.
Top Skills:
Bash, Grafana, Kubernetes, MLOps, Prometheus, Python, Terraform
Software
The Senior Site Reliability Engineer will enhance system reliability, automate deployments, and mentor teams while managing AWS infrastructure and incident responses.
Top Skills:
AWS, Buildkite, Cloudflare, Datadog, ECS, Fargate, Git, Kafka, MikroORM, MongoDB, NestJS, Node.js, Postgres, React, React Native, Terraform, TypeScript, Vue
Biotech
The Senior Site Reliability Engineer will architect and automate AWS and Kubernetes platforms, ensuring operational excellence for bioinformatics workflows.
Top Skills:
AWS, AWS CDK, AWS Lambda, AWS Secrets Manager, Bash, Docker, EKS, Grafana, Helm, Kubernetes, Prometheus, Python, Terraform
Big Data • Cloud • Marketing Tech • Social Impact • Software
The Senior SRE will manage global product deployments, provide engineering support, enhance CI/CD and monitoring, and maintain operational documentation.
Top Skills:
AWS, CircleCI, GCP, Go, Jenkins, Kubernetes, Python, Terraform