Blitzy

Senior Site Reliability Engineer

Posted 10 Hours Ago
In-Office
Cambridge, MA, USA
160K-180K Annually
Senior level
Lead design, build, and operation of scalable, fault-tolerant cloud infrastructure. Define SLOs/SLAs, improve observability and incident response, own CI/CD and deployment automation, partner with engineering teams on reliability, capacity planning, performance benchmarking, cost optimization, and security for an AI platform.
The summary above was generated by AI

About Blitzy

Blitzy is a Cambridge, MA-based AI software development platform on a mission to revolutionize the software development life cycle by autonomously building custom software to unlock the next industrial revolution. We're transforming how enterprises build software, turning enterprise requirements into production-ready code with an agentic software development platform that can autonomously execute 80% of the software development workload. We're backed by multiple tier 1 investors, and our founders have a proven track record of success with previous start-ups.

Location: Cambridge, MA (In-Office)

Compensation: $160,000 to $180,000, plus equity eligibility based on performance

The Role

As a Senior Site Reliability Engineer at Blitzy's Cambridge headquarters, you will be the backbone of our platform's reliability, scalability, and operational excellence. You'll work at the intersection of software engineering and infrastructure, ensuring our AI-powered development platform remains highly available and performant as we scale rapidly. This is a high-impact, hands-on role for an engineer who thrives in a fast-moving environment and takes deep ownership of the systems they build.

What Success Looks Like

  • In 30 days: You have a deep understanding of Blitzy's infrastructure architecture, have identified key reliability risks, and are actively contributing to on-call rotations.

  • In 90 days: You have shipped meaningful improvements to observability, incident response workflows, and deployment pipelines that measurably reduce MTTR and increase system uptime.

  • In 6 months: You have driven at least one major reliability initiative from inception to production, established SLO/SLA frameworks for critical services, and are a trusted technical voice shaping our infrastructure roadmap.

Areas of Ownership

  • Design, build, and operate scalable, fault-tolerant infrastructure across cloud environments (AWS, GCP, or Azure).

  • Define and enforce SLOs, SLAs, and error budgets; lead blameless postmortems and drive systemic improvements.

  • Build and maintain robust CI/CD pipelines, release automation, and deployment infrastructure.

  • Own observability: design and maintain logging, metrics, tracing, and alerting stacks (e.g., Prometheus, Grafana, Datadog, OpenTelemetry).

  • Partner closely with software engineering teams to embed reliability practices into the development lifecycle.

  • Drive capacity planning, performance benchmarking, and cost optimization across our infrastructure.

  • Champion security best practices within the infrastructure and deployment layers.

Required Experience

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.

  • Strong proficiency in at least one major cloud platform (AWS preferred); experience with Kubernetes and container orchestration at scale.

  • Hands-on experience with infrastructure-as-code tools (Terraform, Pulumi, or equivalent).

  • Proven track record designing and maintaining high-availability, distributed systems.

  • Deep expertise in observability tooling, incident management, and on-call practices.

  • Strong scripting and automation skills (Python, Go, Bash, or similar).

  • Excellent communication skills with the ability to collaborate across engineering teams and present technical findings to leadership.

What Makes You Stand Out

  • Experience supporting AI/ML workloads or GPU-accelerated infrastructure.

  • Prior experience in a high-growth startup environment where you wore multiple hats.

  • Familiarity with eBPF, service mesh technologies (Istio, Linkerd), or advanced networking.

  • Contributions to open-source SRE/DevOps tooling or communities.

  • Experience building global, multi-region infrastructure with strict latency and availability requirements.

What Makes This Role Different

You won't be maintaining legacy systems or fighting fires in a sprawling monolith. At Blitzy, you're building reliability into a greenfield AI platform that is redefining how the world creates software. You'll have direct influence over architectural decisions, work side-by-side with world-class engineers, and see the tangible impact of your work as we scale to serve Fortune 500 customers. As a founding member of the Cambridge SRE team, you'll help shape the culture and technical standards of a team that will grow with the company.

Our Culture

Who we are:

Led by two pioneering co-founders, we are one of the fastest-growing companies in the U.S., creating our own category of enterprise autonomous software development. We automate thousands of hours of software development for a customer base with strong representation in the Fortune 500.

How we work:

We move Blitzy Fast: Time is both our company's and our clients' most precious asset. We move quickly and decisively to innovate internally and deliver exceptional software externally.

Championship Mindset: We operate like a professional sports team. We win as a team by holding ourselves and each other to high standards, collaborating in-person, and remaining focused on the mission.

Passion for Invention: We're pushing the frontier of what's possible, requiring constant innovation and iteration.

We Work for the Customer: We focus on delivering outsized value to the customers we work with and expanding those relationships into deep, meaningful partnerships.

We believe in being 'everyday athletes'—taking care of ourselves so we can bring our best minds to work. We promote great sleep, movement, and restorative activities for optimal mental performance. It makes for a happier and more productive team.

Blitzy is an equal opportunity employer committed to building a diverse and inclusive team. We believe different perspectives make us stronger.