Top Hybrid DevOps & Platform Engineering Jobs in Boston, MA
Artificial Intelligence • Professional Services • Business Intelligence • Consulting • Cybersecurity • Generative AI
The role involves leading technology strategy, managing cross-functional teams, and driving cloud adoption and enterprise-wide transformations.
Top Skills:
Agile, Cloud Computing, Scrum
Big Data • Cloud • Software • Database
Lead a 6–8 person team managing the Kubernetes fleet and core runtime components (CoreDNS, cert-manager, Gatekeeper). Define technical vision and roadmap, guide migration from Terraform to Operator-driven lifecycle management, perform hands-on architectural reviews and PR reviews, resolve operational incidents, and collaborate with engineering leaders and stakeholders.
Top Skills:
Kubernetes, CoreDNS, cert-manager, Gatekeeper, Terraform, Crossplane, Operators, AWS, GCP, Azure, Service Mesh, Load Balancing, Observability, Alerting, Containerization
Fitness • Hardware • Healthtech • Sports • Wearables
Design, build, and maintain internal tools and integrations powering business operations. Partner with Finance, Operations, and other teams to implement APIs, data pipelines, and automation. Ensure reliability and performance of systems like NetSuite, Salesforce, Stripe, and Boomi. Deliver production-quality web applications and mentor other engineers.
Top Skills:
Python, Java, JavaScript, TypeScript, NetSuite, Salesforce, Stripe, Boomi, APIs, Data Pipelines, Relational Databases, SQL, Middleware
Enterprise Web • Hardware • Internet of Things • Software
The Senior Site Reliability Engineer will mentor teams on observability practices, architect systems for growth, automate developer tasks, and debug production issues.
Top Skills:
Go, Kubernetes, LGTM Stack, OpenTelemetry, Prometheus, TypeScript
Fintech • Machine Learning • Payments • Software • Financial Services
Lead design, development, deployment, and support of foundational AI systems including foundation model training, LLM inference, similarity search, guardrails, evaluation, and observability. Optimize large-scale AI performance (cost, latency, throughput), partner cross-functionally, and contribute to technical vision and roadmap.
Top Skills:
AWS UltraClusters, C#, C++, Go, Hugging Face, Java, LLM Inference, NeMo Guardrails, Python, PyTorch, Scala, Similarity Search, Vector DBs
Information Technology • Web3
The Site Reliability Engineer manages AWS Kubernetes infrastructure, ensuring operational excellence, security, and scalability, while implementing reliability improvements and best practices.
Top Skills:
Argo CD, AWS, Bash, Datadog, EKS, Go, Kafka, Kubernetes, Postgres, Python, Sysdig, Terraform
Consumer Web • eCommerce • Marketing Tech • Retail • Software • Analytics • Generative AI
Lead Production Infrastructure teams to enhance compute runtimes, ensure reliability, improve observability, and implement SRE practices while driving technical design and incident management.
Top Skills:
CI/CD, GitOps, Go, Helm, Java, Kubernetes, OpenTelemetry, Python, Terraform
Consumer Web • eCommerce • Software
This role involves designing and operating Kubernetes-based developer platforms, enhancing system reliability, and leading upgrades and integrations on AWS, emphasizing developer experience and operational efficiency.
Top Skills:
Argo CD, AWS EKS, Go, Helm, Kubernetes, Terraform
Artificial Intelligence • Cloud • Security • Software • Cybersecurity
Partner with Field and Product teams to design and implement LLM observability architectures, build proofs-of-concept, produce technical collateral, and advise customers to drive adoption and product feedback.
Top Skills:
Python, JavaScript, TypeScript, Datadog, LLM, Tracing, Metrics, Logging, Observability
Healthtech • Information Technology • Security • Software • Cybersecurity
Work on backend monitoring and operations tooling: build Java/JavaScript/TypeScript services, improve observability (metrics, logs, traces, dashboards, alerts), automate operational workflows, integrate monitoring/ticketing systems, contribute AI-enabled ops features, write tests, and participate in an agile team.
Top Skills:
Java, JavaScript, TypeScript, RESTful APIs, JSON, Git, LLMs, Embeddings, OpenTelemetry, Prometheus, Grafana, ELK, OpenSearch, Datadog, Splunk, Docker, Kubernetes, AWS, Azure, GCP, CI/CD