Lead Site Reliability Engineer
ezCater is the #1 nationwide marketplace for business catering – a $21 billion market. We’re backed by Insight Venture Partners and Iconiq Capital, we’re on a path to $1B in 2019, and we’ll get there, even more surely if you come help us.
We’re looking for a top-notch, hands-on SRE to lead our small and talented infrastructure engineering team and help us elevate our game when it comes to designing, building and operating high-performance and highly available systems.
At ezCater, every engineer is responsible for the software they build, and SREs play a critical part in providing the tools, practices, and expertise to help them succeed.
Our production systems are hosted in AWS and run a large Ruby on Rails web application and a handful of smaller services in Ruby, Node.js, and Java. We currently deploy 3-5 times a day. Our systems are stable and fire drills are rare. Technologies we’re currently using include:
- Amazon Web Services (EC2, ELB, S3, RDS, ElastiCache) and Ubuntu Linux
- Postgres, Redis, Memcached, Elasticsearch
- Chef, ServerSpec, Terraform, New Relic, Datadog, Sumo Logic and Test Kitchen
In this mission-critical role, you would:
- Design, build, and maintain the core infrastructure for ezCater
- Actively manage the backlog for our infrastructure team and work closely with other SREs on the team to provide coaching and mentorship
- Help us increase developer productivity and get to true continuous delivery
- Develop operational and security standards and champion operational excellence and secure coding practices
- Partner closely with engineering teams to educate and consult
- Participate in solution design for new features, products, systems and tooling
- Debug complex problems across the whole stack
- Continually monitor application/system performance and costs, generate actionable insights and either implement or advocate for them
- Participate in on-call rotations, along with every member of the engineering team
- Ruthlessly eliminate repetitive manual tasks and recurring errors
- Ensure we are always employing best-of-breed tooling for all our infrastructure and automation needs
- Collaboratively plot the course for the maturation and growth of ezCater’s infrastructure
- Participate in (and sometimes run point on) handling production incidents
- Work closely with engineering teams to conduct root cause analysis for production incidents, and evolve infrastructure and tooling
This role might be that rare opportunity if you:
- Thrive in a highly collaborative, no red-tape, rapid-growth environment
- Love building tooling and infrastructure to help developers be more productive
- Love eliminating repetitive manual tasks through automation
- Have a healthy appreciation of what it means to work in production
- Have solid Unix command line and systems chops
- Have experience with substantial, distributed SaaS or eCommerce systems
- Can point to a solid track record of success leading small-to-medium infrastructure teams
- Have vision and well-informed opinions about how to build infrastructure for a high-growth, technology-driven company that’s headed towards the $1B mark
ezCater is an equal opportunity employer. We embrace humans of every background, appearance, race, religion, color, national origin, gender, gender identity, sexual orientation, age, marital status, veteran status, and disability status. At the same time, we do not employ jerks, even brilliant ones.