Site Reliability Engineer at Chewy
Chewy is looking to hire a Site Reliability Engineer at our Boston, MA location. Site Reliability Engineers are a cross between systems and software engineers and are responsible for all operational aspects of Chewy’s e-commerce platform. The team designs, builds, monitors, and maintains the infrastructure of our internet-facing and internal services. We're looking for engineers who want to develop and maintain infrastructure software and scale Chewy’s technology stack. Come help us build a bigger and better Chewy as a Site Reliability Engineer. You will be part of a small family within Chewy that has a huge impact on our incredible growth. Ideal candidates can discuss complex technical concepts with a diverse audience across all areas of the organization. They remain calm under pressure and always strive to add structure to high-pressure, fast-paced tasks and projects.
What You'll Do:
- Focus on service stability and reliability by working with application owners to set SLOs, error budgets, and backup and disaster recovery (DR) strategies
- Define application monitoring and alerting strategy
- Perform capacity planning and production readiness assessments
- Embed with product teams during the design and requirements phase of new product development through to initial production launch
- Identify requirements for other operational teams (release engineering, automation, etc.) during the application development phase
- Be a technology and DevOps evangelist for the rest of the company
- Participate in the on-call rotation for level 3 support escalations
What You'll Need:
- At least 5 years of experience working in an SRE role or similar
- Hands-on experience with orchestration and system configuration tools such as Ansible, Puppet, Chef, or Terraform
- Expertise in building and maintaining highly available applications, including redundancy, failover, scalability, monitoring, and performance
- Strong experience with virtualization, monitoring and automation
- Software development experience (both scripting and “programming” languages)
- Experience working with the open source community (troubleshooting, patch submission, etc.)
- 5+ years of demonstrated Linux system administration experience
- Experience with CI tools such as Bamboo, Jenkins, or Hudson
- Ability to organize, troubleshoot and continuously learn
- Previous experience working within compliance controls such as SOX or PCI
- This position may require travel
If you have a disability under the Americans with Disabilities Act or similar law, or you require a religious accommodation, and you wish to discuss potential accommodations related to applying for employment at our company, please contact [email protected]