Site Reliability Engineering
Meet CarGurus—the #1 visited online car shopping website in the US. At CarGurus, we’re building the world’s most trusted and transparent automotive marketplace where it’s easy to find great deals from top-rated dealers.
Founded in 2006 by Langley Steinert (co-founder of TripAdvisor), CarGurus is a technology company with a passion for data and its power to simplify every aspect of the car shopping experience. Using proprietary technology, search algorithms and innovative data analytics, we provide unbiased validation on pricing, dealer reputation and vehicle history.
What You'll Do
Our Site Reliability Engineering team applies industry best practices and principles to ensure service reliability and resiliency. We accomplish this by:
- Collaborating with Engineering and Product Managers to define SLOs and monitor well-designed SLIs
- Embedding with Engineering teams and completing architectural improvements, either independently or in collaboration
- Being the primary escalation for major incidents involving assigned services
- Participating in an on-call rotation
- Owning our Incident Response Process, including conducting blameless Postmortems
- Increasing robustness through workflow automation, process improvements, CI/CD pipelines, and the integration of modern toolsets
- Refusing to accept manual work as a solution to areas of weakness
- Partnering with Engineering teams to ensure new services are production ready
- Championing our organizational standards for designing, deploying, and scaling our products
- Making data-driven decisions to drive continuous improvement
- Evolving our tooling, logging, monitoring and alerting systems to increase observability and transparency
What You Bring to the Table
- A demonstrably strong background in software engineering with multiple languages and a firm belief in continuous testing and delivery, or significant relevant operational experience running services at scale
- A bias for action, balanced by the emotional intelligence to approach colleagues with positive regard and to understand their challenges and decisions
- Curiosity and the acceptance that there are always ways to learn and grow
- The desire to be an active contributor in a collaborative and fast-paced environment
- Excitement in solving puzzles and in discovering how a new service or tool works by identifying the individual components, libraries, and relationships it is built upon
- Understanding of technologies beyond coding such as Systems Engineering, Load Balancing, Configuration Management, Networking, Operating Systems, Troubleshooting, and Monitoring
- Comfort in dealing with Incidents and Availability Issues
- Familiarity with cloud and bare-metal infrastructure
- Exposure to industry-standard observability tools and services
Technologies We Use
Terraform, Honeycomb, AWS, Prometheus, Java, Go, Ansible, Chef, Grafana, Docker, Kubernetes, Kafka, Elasticsearch, Sentry, Bazel, Concourse, Artifactory
At the core of our company culture is a spirit of innovation, curiosity and collaboration. True to our start-up roots, we’re nimble, flexible and hardworking. We have a great respect for testing and learning and a healthy aversion to scheduling meetings to discuss meetings. Lunch is catered daily. Gym membership is free. Foosball and ping pong are played often. Now a publicly traded company, we’re as committed as ever to cultivating the culture that got us here.
In addition to the US, CarGurus operates sites in Canada, the UK and Germany with other markets on the horizon. Our offices are located in Cambridge, MA, Detroit, MI and Dublin, Ireland. If you’d like to learn more, please visit our careers page.