Engineering Manager, Site Reliability - Resilience
ezCater is the world’s largest online marketplace for catering – a $60+ billion market in the U.S. We make it superbly easy for businesspeople to find and order great food for meetings and events, and we help our catering partners grow their business. We’re backed by $320 million in venture funding and in early 2019 were valued at $1.25 billion. Our mission is to power the world’s catering, and we’ll make it happen – even more surely if you come help us.
At ezCater, every engineer is responsible for the software they build. We’re looking for a top-notch, hands-on SRE Manager to build a Resilience Engineering team focused on helping engineers improve the reliability of their microservices. We need you and your team to engineer solutions and instill practices that uncover the ways software systems can fail and prevent those failures. This is a high-impact role that will significantly improve the reliability of ezCater’s software across the business.
Our production systems are hosted in AWS, where multiple Rails and Node.js services run in Kubernetes. We employ continuous delivery to allow our developers to deploy as often as they need. Our systems are stable and fire drills are rare.
Technologies we’re currently using include:
- Amazon Web Services (EC2, S3, RDS, ElastiCache) and Ubuntu Linux
- Kubernetes, Postgres, Redis, Memcached, Elasticsearch
- Terraform, Chef, Fluentd, Test Kitchen, Datadog, Sumo Logic
In this mission-critical role, you will:
- Build, lead, and mentor a talented team of engineers to execute on projects, delight customers, and propel the business even higher into the stratosphere
- Level up the company’s ability to write microservices that are resistant and resilient to failure scenarios
- Design a framework and assessment system for measuring the production readiness of all microservices and infrastructure components deployed
- Create and operate new testing solutions for software running on our internal Kubernetes microservices hosting platform, such as Chaos Testing and Load Testing as a Service
- Ruthlessly eliminate repetitive manual tasks and recurring errors
- Develop operational standards and champion operational excellence and secure coding practices
- Continually monitor application/system performance and costs, generate actionable insights, and either implement or advocate for them
- Debug complex problems across the whole stack
- Participate (and sometimes run point) in handling production incidents
- Participate in solution design for new features, products, systems, and tooling
- Ensure we are always employing best-of-breed tooling for all our infrastructure and automation needs
- Collaboratively plot the course for the maturation and growth of ezCater’s infrastructure
- Work closely with engineering teams to conduct root cause analysis for production incidents, and evolve infrastructure and tooling
- Participate in on-call rotations, along with every member of the engineering team
This role might be that rare opportunity if you:
- Thrive in a highly collaborative, no red-tape, rapid-growth environment
- Love building tooling and infrastructure to help developers be more productive
- Love eliminating repetitive manual tasks through automation
- Have a healthy appreciation of what it means to work in production
- Have solid Unix command line and systems chops
- Have experience with substantial, distributed SaaS or eCommerce systems
- Have vision and well-informed opinions about how to build infrastructure for a high-growth, technology-driven company that’s headed towards the $2B mark
- Have a passion for building and growing high-performing teams. Prior management experience is not required, but a passion for inspiring and leading others is
What you’ll get from us:
Importantly, you’ll get a tremendous amount of authority and autonomy. You’ll own your outcomes and see measurable results for your efforts. With ezCater’s radical transparency and trust, you’ll have open access to the data that drives our decisions. ezUniversity sessions will provide plenty of opportunities to expand your mind.
At the same time, you’ll get sane working hours and a huge amount of flexibility around work/life balance. Have people in your life – of any age – who always, often, or sometimes need your help? We make room for that. Have a bad thing or a good thing happen to you? We make room for that, too.
Oh, and here’s what else you’ll get: Market salary, stock options you’ll help make worth a lot, the usual holidays, all-you-can-eat vacation, 401(k), health/dental/FSA, long-term disability insurance, subsidized T-passes, a great office in the heart of Boston, a tremendous amount of responsibility and autonomy, wicked awesome co-workers, cupcakes (and many more goodies), and knowing that you helped get this rocket ship to the moon.
ezCater is an equal opportunity employer. We embrace humans of every background, appearance, race, religion, color, national origin, gender, gender identity, sexual orientation, age, marital status, veteran status, and disability status. At the same time, we do not employ jerks, even brilliant ones.