RAI Institute

Machine Learning Operations Engineer

Posted 22 Days Ago
Cambridge, MA
Mid level
As a Machine Learning Operations Engineer, you will design and maintain ML platforms, enhance observability, and collaborate with various engineering teams to enable machine learning applications.
The summary above was generated by AI

Our Mission


Our mission is to solve the most important and fundamental challenges in AI and Robotics to enable future generations of intelligent machines that will help us all live better lives.


Machine Learning Operations (ML-Ops) Engineers build infrastructure that supports the entire lifecycle of Machine Learning (ML) projects, from development through scaling to deployment. If you have a passion for building the foundation that enables robotics research and engineering, you will want to join us!

What You Will Do

  • Design, develop, and maintain company-wide platforms and tooling that utilize Kubernetes infrastructure to enable machine learning and data processing applications
  • Enable self-service access to ML compute on our on-prem and cloud clusters, including support for job scheduling, workload scalability and workload fault tolerance
  • Enhance observability across ML applications through integrations with tools and services such as Fluentd, Prometheus, Grafana and Datadog
  • Integrate ML applications with experiment tracking and management services like Weights & Biases
  • Elevate code quality and champion best practices in our engineering processes
  • Collaborate with Machine Learning Engineers, Data Engineers, DevOps Engineers and researchers to build scalable solutions that improve engineering and research velocity

What You Will Bring

  • BS or MS in Computer Science, Engineering, or equivalent
  • 3+ years of experience in an MLOps, DevOps, ML Engineering or software engineering role
  • Strong hands-on experience deploying and managing applications running on Kubernetes
  • Experience developing MLOps platforms to manage the lifecycle of ML experiments, including one or more of data and artifact management, reproducibility, fault tolerance, experiment tracking and model serving
  • Experience with Docker and Python environment management tools such as pip, poetry, uv or similar
  • Proficiency in software practices such as version control (Git), CI/CD (GitHub Actions, Argo CD) and Infrastructure as Code (Terraform)

Extra Skills We Value

  • Experience with Kueue, or similar job scheduling mechanisms
  • Experience with workflow orchestration tools such as Airflow, Metaflow, Argo Workflows or similar
  • Hands-on experience deploying and managing cloud infra on platforms like GCP and AWS
  • Experience with hybrid-cloud compute and data environments
  • Experience with Ray, PyTorch Lightning or similar scalable AI/ML platforms
  • Experience with application and system logging using tools and services like Fluentd, Prometheus, Grafana, Datadog or similar
  • Experience with Bazel build tool or similar
  • Experience with ML model serving frameworks such as TorchServe, ONNX Runtime or similar
  • Experience working with research teams in an academic or industrial environment

We provide equal employment opportunities to all employees and applicants for employment and prohibit discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

Top Skills

AWS
CI/CD
Datadog
Docker
Fluentd
GCP
Git
Grafana
Kubernetes
Prometheus
Python
Terraform

HQ

RAI Institute Cambridge, Massachusetts, USA Office

145 Broadway, Cambridge, MA, United States, 02142


