Top Hybrid DevOps & Platform Engineering Jobs in Boston, MA
Big Data • Cloud • Software • Database
Lead a 6–8 person team managing the Kubernetes fleet and core runtime components (CoreDNS, cert-manager, Gatekeeper). Define technical vision and roadmap, guide migration from Terraform to Operator-driven lifecycle management, perform hands-on architectural reviews and PR reviews, resolve operational incidents, and collaborate with engineering leaders and stakeholders.
Top Skills:
Kubernetes,Coredns,Cert-Manager,Gatekeeper,Terraform,Crossplane,Operators,Aws,Gcp,Azure,Service Mesh,Load Balancing,Observability,Alerting,Containerization
Fitness • Hardware • Healthtech • Sports • Wearables
Design, build, and maintain internal tools and integrations powering business operations. Partner with Finance, Operations, and other teams to implement APIs, data pipelines, and automation. Ensure reliability and performance of systems like NetSuite, Salesforce, Stripe, and Boomi. Deliver production-quality web applications and mentor other engineers.
Top Skills:
Python,Java,Javascript,Typescript,Netsuite,Salesforce,Stripe,Boomi,Apis,Data Pipelines,Relational Databases,Sql,Middleware
Enterprise Web • Hardware • Internet of Things • Software
The Senior Site Reliability Engineer will mentor teams on observability practices, architect systems for growth, automate developer tasks, and debug production issues.
Top Skills:
Go,Kubernetes,Lgtm Stack,Opentelemetry,Prometheus,Typescript
Fintech • Machine Learning • Payments • Software • Financial Services
Lead design, development, deployment, and support of foundational AI systems including foundation model training, LLM inference, similarity search, guardrails, evaluation, and observability. Optimize large-scale AI performance (cost, latency, throughput), partner cross-functionally, and contribute to technical vision and roadmap.
Top Skills:
Python,Go,Scala,Java,C++,C#,Pytorch,Huggingface,Vectordbs,Nemo Guardrails,Aws Ultraclusters,Similarity Search,Llm Inference
Information Technology • Web3
The Site Reliability Engineer manages AWS Kubernetes infrastructure, ensuring operational excellence, security, and scalability, while implementing reliability improvements and best practices.
Top Skills:
Argocd,AWS,Bash,Datadog,Eks,Go,Kafka,Kubernetes,Postgres,Python,Sysdig,Terraform
Consumer Web • eCommerce • Marketing Tech • Retail • Software • Analytics • Generative AI
Lead Production Infrastructure teams to enhance compute runtimes, ensure reliability, improve observability, and implement SRE practices while driving technical design and incident management.
Top Skills:
Ci/Cd,Gitops,Go,Helm,Java,Kubernetes,Opentelemetry,Python,Terraform
Consumer Web • eCommerce • Software
This role involves designing and operating Kubernetes-based developer platforms, enhancing system reliability, and leading upgrades and integrations on AWS, emphasizing developer experience and operational efficiency.
Top Skills:
Argocd,Aws Eks,Go,Helm,Kubernetes,Terraform
Artificial Intelligence • Cloud • Security • Software • Cybersecurity
Partner with Field and Product teams to design and implement LLM observability architectures, build proofs-of-concept, produce technical collateral, and advise customers to drive adoption and product feedback.
Top Skills:
Python,Javascript,Typescript,Datadog,Llm,Tracing,Metrics,Logging,Observability
Artificial Intelligence • Fintech • Insurance • Marketing Tech • Software • Analytics
Lead modernization of Liberty Mutual's global virtual app and desktop estate from Citrix to AVD, Windows 365 and RemoteApp. Build and coach an engineering team, define KPIs, automate pipelines, harden access, manage TCO, and improve reliability and patch compliance through SRE practices and Intune/Patch My PC Cloud.
Top Skills:
Citrix,Xenapp,Microsoft Remoteapp,Azure Virtual Desktop (Avd),Windows 365 Cloud Pc,Patch My Pc Cloud,Intune,Entra Id (Azure Ad),Zscaler,Fslogix,Azure,Sre Practices
Healthtech • Information Technology • Security • Software • Cybersecurity
Work on backend monitoring and operations tooling: build Java/JavaScript/TypeScript services, improve observability (metrics, logs, traces, dashboards, alerts), automate operational workflows, integrate monitoring/ticketing systems, contribute AI-enabled ops features, write tests, and participate in an agile team.
Top Skills:
Java,Javascript,Typescript,Restful Apis,Json,Git,Llms,Embeddings,Opentelemetry,Prometheus,Grafana,Elk,Opensearch,Datadog,Splunk,Docker,Kubernetes,Aws,Azure,Gcp,Ci/Cd