RAI Institute

Machine Learning Operations Engineer

Reposted 11 Days Ago
In-Office
Cambridge, MA
Mid level
Design and maintain ML-Ops platforms, enhancing observability and scalability of ML applications while ensuring high code quality and collaboration with engineering teams.
The summary above was generated by AI
Our Mission

Our mission is to solve the most important and fundamental challenges in AI and Robotics to enable future generations of intelligent machines that will help us all live better lives.

Machine Learning Operations (ML-Ops) Engineers build infrastructure that supports the entire lifecycle of Machine Learning (ML) projects, from development through scaling and deployment. If you have a passion for building the foundation that enables robotics research and engineering, you will want to join us!

What You Will Do

  • Design, develop, and maintain company-wide platforms and tooling that utilize Kubernetes infrastructure to enable machine learning and data processing applications
  • Enable self-service access to ML compute on our on-prem and cloud clusters, including support for job scheduling, workload scalability and fault tolerance
  • Enhance observability across ML applications through integrations with tools and services such as Fluentd, Prometheus, Grafana and Datadog
  • Integrate ML applications with experiment tracking and management services such as Weights & Biases
  • Elevate code quality and champion best practices in our engineering processes
  • Collaborate with Machine Learning Engineers, Data Engineers, DevOps Engineers and researchers to build scalable solutions that improve engineering and research velocity

What You Will Bring

  • BS or MS in Computer Science, Engineering, or equivalent
  • 3+ years of experience in an MLOps, DevOps, ML engineering or software engineering role
  • Strong hands-on experience deploying and managing applications running on Kubernetes
  • Experience developing MLOps platforms that manage the lifecycle of ML experiments, including one or more of: data and artifact management, reproducibility, fault tolerance, experiment tracking and model serving
  • Experience with Docker and Python environment management tools such as pip, Poetry, uv or similar
  • Proficiency in software practices such as version control (Git), CI/CD (GitHub Actions, Argo CD) and Infrastructure as Code (Terraform)

Extra Skills We Value

  • Experience with Kueue or similar job scheduling mechanisms
  • Experience with workflow orchestration tools such as Airflow, Metaflow, Argo Workflows or similar
  • Hands-on experience deploying and managing cloud infrastructure on platforms such as GCP and AWS
  • Experience with hybrid-cloud compute and data environments
  • Experience with Ray, PyTorch Lightning or similar scalable AI/ML platforms
  • Experience with application and system logging and observability using tools and services such as Fluentd, Prometheus, Grafana, Datadog or similar
  • Experience with the Bazel build tool or similar
  • Experience with ML model-serving frameworks such as TorchServe, ONNX Runtime or similar
  • Experience working with research teams in an academic or industrial environment.

We provide equal employment opportunities to all employees and applicants for employment and prohibit discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

Top Skills

Airflow
Argo CD
AWS
Bazel
CI/CD
Datadog
Docker
Fluentd
GCP
Git
Grafana
Kubernetes
Metaflow
ONNX Runtime
Prometheus
Python
PyTorch Lightning
Ray
Terraform
TorchServe
Weights & Biases

HQ

RAI Institute Cambridge, Massachusetts, USA Office

145 Broadway, Cambridge, MA, United States, 02142

Similar Jobs

11 Days Ago
In-Office
Boston, MA, USA
Senior level
Artificial Intelligence • Information Technology • Software • Generative AI
The Senior ML Ops Engineer will architect and implement software solutions, focusing on MLOps, system design, and robust AI applications.
Top Skills: Argo CD, Docker, GCP, Helm, Kubernetes, Python
10 Days Ago
In-Office
Boston, MA, USA
Senior level
Artificial Intelligence • Healthtech • Insurance • Software
As a Senior Software Engineer, you will develop backend infrastructure, build APIs, and improve machine learning deployment for a healthcare AI platform.
Top Skills: AWS, AWS ECS, Azure, CDK, Docker, GCP, Kubernetes, ML Ops, Node.js, NoSQL, Python, SQL, Terraform
24 Days Ago
In-Office
Marlborough, MA, USA
126K-126K
Mid level
Retail
The Lead MLOps Engineer will develop, deploy, and manage machine learning models, ensuring they are production-ready while mentoring junior staff. Responsibilities include building infrastructure, optimizing model performance, and collaborating across teams to integrate ML solutions.
Top Skills: AWS, CI/CD Tools, Confluence, Databricks, Git, JIRA, MLflow, PySpark, Python

What you need to know about the Boston Tech Scene

Boston is a powerhouse for technology innovation thanks to world-class research universities like MIT and Harvard and a robust pipeline of venture capital investment. Host to the first telephone call and one of the first general-purpose computers ever put into use, Boston is now a hub for biotechnology, robotics and artificial intelligence — though it’s also home to several B2B software giants. So it’s no surprise that the city consistently ranks among the greatest startup ecosystems in the world.

Key Facts About Boston Tech

  • Number of Tech Workers: 269,000; 9.4% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Thermo Fisher Scientific, Toast, Klaviyo, HubSpot, DraftKings
  • Key Industries: Artificial intelligence, biotechnology, robotics, software, aerospace
  • Funding Landscape: $15.7 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Summit Partners, Volition Capital, Bain Capital Ventures, MassVentures, Highland Capital Partners
  • Research Centers and Universities: MIT, Harvard University, Boston College, Tufts University, Boston University, Northeastern University, Smithsonian Astrophysical Observatory, National Bureau of Economic Research, Broad Institute, Lowell Center for Space Science & Technology, National Emerging Infectious Diseases Laboratories
