Top Hybrid DevOps & Platform Engineering Jobs in Boston, MA
Fitness • Hardware • Healthtech • Sports • Wearables
Design and build IT systems and processes, focusing on automation, security, and employee experience while collaborating with cross-functional teams.
Top Skills:
AI, Automation, AWS, GCP, macOS, Windows
Artificial Intelligence • Professional Services • Business Intelligence • Consulting • Cybersecurity • Generative AI
The role involves leading technology strategy, managing cross-functional teams, and driving cloud adoption and enterprise-wide transformations.
Top Skills:
Agile, Cloud Computing, Scrum
Big Data • Cloud • Software • Database
Lead a 6–8 person team managing the Kubernetes fleet and core runtime components (CoreDNS, cert-manager, Gatekeeper). Define technical vision and roadmap, guide migration from Terraform to Operator-driven lifecycle management, perform hands-on architectural reviews and PR reviews, resolve operational incidents, and collaborate with engineering leaders and stakeholders.
Top Skills:
Alerting, AWS, Azure, cert-manager, Containerization, CoreDNS, Crossplane, Gatekeeper, GCP, Kubernetes, Load Balancing, Observability, Operators, Service Mesh, Terraform
Reposted 2 Days Ago
Fintech • Machine Learning • Payments • Software • Financial Services
Lead design, development, deployment, and support of foundational AI systems including foundation model training, LLM inference, similarity search, guardrails, evaluation, and observability. Optimize large-scale AI performance (cost, latency, throughput), partner cross-functionally, and contribute to technical vision and roadmap.
Top Skills:
AWS UltraClusters, C#, C++, Go, Hugging Face, Java, LLM Inference, NeMo Guardrails, Python, PyTorch, Scala, Similarity Search, Vector DBs
Enterprise Web • Hardware • Internet of Things • Software
The Senior Site Reliability Engineer will mentor teams on observability practices, architect systems for growth, automate developer tasks, and debug production issues.
Top Skills:
Go, Kubernetes, LGTM Stack, OpenTelemetry, Prometheus, TypeScript
Information Technology • Web3
The Site Reliability Engineer manages AWS Kubernetes infrastructure, ensuring operational excellence, security, and scalability, while implementing reliability improvements and best practices.
Top Skills:
Argo CD, AWS, Bash, Datadog, EKS, Go, Kafka, Kubernetes, Postgres, Python, Sysdig, Terraform
eCommerce • Healthtech • Pet • Retail • Pharmaceutical
Design and develop observability solutions, enhance performance and reliability of applications, support CI/CD workflows, and troubleshoot issues in a public cloud environment.
Top Skills:
AWS, Datadog, Dynatrace, Fluent Bit, Fluentd, GCP, Java, Jenkins, Kubernetes, Next.js, Node.js, OpenTelemetry Collector, Prometheus, React, StatsD, Terraform, TypeScript
Reposted 5 Days Ago
Consumer Web • eCommerce • Marketing Tech • Retail • Software • Analytics • Generative AI
Lead the Developer Infrastructure team, enhance developer productivity, optimize environments, streamline CI/CD pipelines, and manage team operations.
Top Skills:
AWS, Buildkite, CI/CD, Django, FastAPI, Go, Kubernetes, Pants, Python, React, Terraform, TypeScript
Information Technology • Productivity • Professional Services • Software
The Cloud Engineer role involves developing configurations in YAML and JSON, building monitoring solutions, maintaining software applications, and writing custom scripts for various environments.
Top Skills:
Datadog, Go, JSON, Linux, PowerShell, Python, Shell, Unix, YAML
Artificial Intelligence • Cloud • Security • Software • Cybersecurity
Lead design and implementation of LLM observability features: prototype and scale product capabilities for tracing, evaluating, and debugging generative AI systems. Work cross-functionally to influence architecture, mentor engineers, prioritize customer pain points, and drive product and engineering decisions for reliable, high-performance AI observability.
Top Skills:
Distributed Systems, Generative AI, Inference Pipelines, Large Language Models (LLMs), Observability Tools/Platforms, Prompt Engineering, Scalable Backend Architectures