Cimulate AI

Site Reliability Engineer

Reposted 4 Days Ago
In-Office
Boston, MA
Mid level
The Site Reliability Engineer will ensure the reliability and performance of SaaS production systems, manage deployments and incident responses, and improve operational processes within a dynamic AI environment.
The summary above was generated by AI

Cimulate is an AI-native eCommerce search and discovery platform built on cutting-edge LLM technology. We help commerce brands deliver radically better shopping experiences (faster, more relevant, and easier to manage) by understanding context rather than relying on brittle rules-based personalization. Backed by top investors and founded by a team that has already built and sold another commerce startup (Celect, to Nike), we’re now building the future of contextual commerce, and we’re just getting started.

The Role
Cimulate is seeking a skilled Site Reliability Engineer to join our dynamic team as we revolutionize the future of commerce through intelligent, AI-driven systems. In this pivotal role, you’ll own the reliability, availability, and performance of our SaaS production environment—monitoring critical systems, managing deployments, and ensuring seamless operations for our customers. As a Site Reliability Engineer, you’ll manage production support processes, deployments (including model releases), and incident response, with an opportunity to grow the role into managing vendor partners for 24/7 follow-the-sun coverage. This position combines hands-on technical problem-solving with process ownership and operational leadership.

Your work will directly contribute to the stability and scalability of Cimulate’s AI platform, supporting our mission to help businesses operate and engage more intelligently.

Responsibilities

  • Ensure reliability, availability, and performance of SaaS production systems and AI pipelines.
  • Monitor production environments, deployed models, and data pipelines; respond rapidly to incidents and service disruptions.
  • Manage deployments, configuration changes, and release processes (e.g., model and service rollouts).
  • Maintain and enhance observability, monitoring, and alerting systems (e.g., Grafana, Prometheus, ELK).
  • Lead incident response, postmortems, and continuous improvement of operational processes and playbooks.
  • Partner with DevOps and engineering teams to improve scalability, fault tolerance, and automation.
  • Track and improve reliability metrics (SLAs, SLOs, SLIs).
  • Create and maintain clear technical documentation, including runbooks and escalation paths.
  • Participate in the on-call rotation and drive improvements in incident management and response.
  • Grow into managing vendor teams providing 24/7 L1 operational coverage.

Requirements

  • Proven experience in monitoring and supporting production systems, preferably in a SaaS or multi-tenant environment.
  • Strong knowledge of Linux systems and scripting (Python, Bash, or Go).
  • Hands-on experience with cloud platforms (GCP preferred; AWS/Azure also valuable) and container orchestration (Kubernetes, Docker).
  • Familiarity with Infrastructure-as-Code (IaC) tools such as Terraform or Pulumi.
  • Understanding of networking, databases, and performance tuning.
  • Experience with observability, monitoring, and logging tools (Grafana, Prometheus, ELK, etc.).
  • Proficiency with Git, version control workflows, and CI/CD pipelines.
  • Strong analytical, debugging, and problem-solving skills.
  • Excellent communication and collaboration abilities, including with non-technical stakeholders.
  • Calm under pressure and effective in incident management situations.
  • Growth mindset with the ambition to build and lead scalable 24/7 production operations.

Nice to Haves

  • Experience working with security, compliance, or audit frameworks.
  • Exposure to AI/ML pipelines or data-driven systems.
  • Prior experience managing offshore or vendor-based support teams.

Why Join Cimulate?

  • Work with a passionate and collaborative founding team.
  • Make a real impact at an early-stage startup with high-growth potential.
  • Help redefine the future of online shopping and personalization.
  • Competitive compensation, equity, and benefits.

Top Skills

AWS
Azure
Bash
Docker
ELK
GCP
Git
Go
Grafana
Kubernetes
Prometheus
Pulumi
Python
Terraform
HQ

Cimulate AI Boston, Massachusetts, USA Office

Boston, MA, United States, 02108

