RAI Institute

Machine Learning Operations Manager

Posted 19 Days Ago
Cambridge, MA
Senior level
Lead a team to design and maintain ML infrastructure, manage projects, ensure system reliability, and integrate best practices in the ML lifecycle.
The summary above was generated by AI
Our mission is to solve the most important and fundamental challenges in AI and Robotics to enable future generations of intelligent machines that will help us all live better lives.

Who We Are Looking For
We are seeking a Machine Learning Operations (ML-Ops) Manager who is both technically adept and an effective leader. In this role, you will lead a small team of engineers while staying hands-on in designing, building, and maintaining the infrastructure that supports the entire lifecycle of Machine Learning (ML) projects. If you have a passion for building scalable ML infrastructure, mentoring engineers, and collaborating with world-class researchers, this is the role for you!

What You Will Do

  • Technical Leadership & Strategy: Drive the design, development, and maintenance of company-wide MLOps platforms and tools, leveraging Kubernetes infrastructure for ML and data processing applications.
  • Team Management & Mentorship: Manage and mentor a small team of engineers, providing technical guidance, setting priorities, and fostering a collaborative team culture.
  • Scalability & Performance: Enable self-service access to ML compute resources across on-prem and cloud environments, ensuring workload scalability, fault tolerance, and efficient job scheduling.
  • Monitoring & Observability: Enhance system observability through integrations with tools and services such as FluentD, Prometheus, Grafana, and DataDog to improve reliability and debugging.
  • Experiment & Model Lifecycle Management: Integrate ML applications with experiment tracking and model management services such as Weights and Biases.
  • Best Practices & Collaboration: Champion engineering best practices and drive improvements in CI/CD, infrastructure automation, and reproducibility. Work closely with ML Engineers, Data Engineers, DevOps teams, and researchers to accelerate research and deployment.

What You Will Bring

  • BS or MS in Computer Science, Engineering, or equivalent
  • 5+ years of experience in an ML-Ops, DevOps, ML Engineering, or software engineering role
  • 2+ years of experience managing engineers (can be formal management or technical leadership)
  • Strong, hands-on experience with Kubernetes for ML applications
  • Experience developing ML-Ops platforms (covering data/artifact management, reproducibility, fault tolerance, experiment tracking, and model serving)
  • Proficiency in Python, Docker, and environment management tools (pip, poetry, uv, or similar)
  • Familiarity with CI/CD tools (GitHub Actions, ArgoCD) and Infrastructure as Code (Terraform)

Skills We Value

  • Experience with job scheduling mechanisms like Kueue
  • Hands-on experience with workflow orchestration tools (Airflow, Metaflow, Argo Workflows)
  • Experience managing cloud infrastructure (GCP, AWS) and hybrid-cloud environments
  • Knowledge of scalable AI/ML platforms like Ray or PyTorch Lightning
  • Experience with logging & monitoring tools (FluentD, Prometheus, Grafana, DataDog, or similar)
  • Exposure to ML model serving frameworks (TorchServe, ONNX Runtime, or similar)
  • Previous experience collaborating with research teams in academic or industrial settings

We provide equal employment opportunities to all employees and applicants for employment and prohibit discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

Top Skills

Airflow
Argo Workflows
ArgoCD
AWS
CI/CD
DataDog
Docker
FluentD
GCP
GitHub Actions
Grafana
Kubernetes
Metaflow
ONNX Runtime
Prometheus
Python
PyTorch Lightning
Ray
Terraform
TorchServe
Weights and Biases
HQ

RAI Institute Cambridge, Massachusetts, USA Office

145 Broadway, Cambridge, MA, United States, 02142

What you need to know about the Boston Tech Scene

Boston is a powerhouse for technology innovation thanks to world-class research universities like MIT and Harvard and a robust pipeline of venture capital investment. Host to the first telephone call and one of the first general-purpose computers ever put into use, Boston is now a hub for biotechnology, robotics and artificial intelligence — though it’s also home to several B2B software giants. So it’s no surprise that the city consistently ranks among the greatest startup ecosystems in the world.

Key Facts About Boston Tech

  • Number of Tech Workers: 269,000; 9.4% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Thermo Fisher Scientific, Toast, Klaviyo, HubSpot, DraftKings
  • Key Industries: Artificial intelligence, biotechnology, robotics, software, aerospace
  • Funding Landscape: $15.7 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Summit Partners, Volition Capital, Bain Capital Ventures, MassVentures, Highland Capital Partners
  • Research Centers and Universities: MIT, Harvard University, Boston College, Tufts University, Boston University, Northeastern University, Smithsonian Astrophysical Observatory, National Bureau of Economic Research, Broad Institute, Lowell Center for Space Science & Technology, National Emerging Infectious Diseases Laboratories
