Site Reliability Engineer
Our Opportunity:
Chewy is looking to hire Site Reliability Engineers at our Boston, MA location. Site Reliability Engineers are a cross between system and software engineers who are responsible for all operational aspects of Chewy’s e-commerce platform. The team is responsible for designing, building, monitoring, and maintaining the infrastructure of our internet-facing and internal services. We're looking for engineers who want to be a part of developing infrastructure software, maintaining it, and scaling Chewy’s technology stack. Come help us build a bigger and better Chewy as a Site Reliability Engineer. You will be part of a small family within Chewy that has a huge impact on our incredible growth. Ideal candidates will possess the ability to discuss complex technical concepts with a diverse audience across all areas of the organization. They will remain calm under pressure and always strive to add structure to high-pressure, fast-paced tasks or projects.
What You’ll Do:
- Focus on service stability and reliability by working with application owners to set SLOs, error budgets, and backup and disaster recovery (DR) strategies
- Apply a complete understanding of operational tools and concepts, such as alerting, monitoring, logging, and health checks
- Perform capacity planning and production readiness assessments
- Embed with product teams during the design and requirements phase of new product development through to initial production launch
- Identify requirements for other operational teams (release engineering, automation, etc.) during the application development phase
- Be a technology and DevOps evangelist for the rest of the company
- Participate in the on-call rotation for level 3 support escalations
What You’ll Need:
- At least 5 years of experience working in an SRE role or similar
- Hands-on experience with orchestration and system configuration tools such as Ansible, Puppet, Chef, or Terraform (preferred)
- At least 5 years of experience building and managing applications on public cloud platforms such as AWS (preferred), GCP, or Azure
- Expert in building and maintaining highly available applications, including redundancy, failover, scalability, monitoring, and performance
- Highly skilled in identifying performance bottlenecks, detecting anomalous system behavior, and resolving the root cause of service issues
- Solid understanding of and experience with web services, databases, and related infrastructure/architecture
- Experience working with the open source community (troubleshooting, patch submission, etc.)
- 5+ years of demonstrated Linux system administration experience
- Experience with CI tools such as Bamboo, Jenkins, CircleCI
- Ability to organize, troubleshoot and continuously learn
- Previous experience working within compliance controls such as SOX, PCI, etc.
- Bachelor’s degree in Computer Science or a related field, or equivalent experience
- Position may require travel
Bonus (if applicable):
- AWS Certified Solutions Architect
- Advanced Terraform knowledge and orchestration using Jenkins
- Datadog integration expertise for containers
If you have a disability under the Americans with Disabilities Act or similar law, or you require a religious accommodation, and you wish to discuss potential accommodations related to applying for employment at our company, please contact [email protected].
Chewy’s Privacy Policy describes the information we collect from job applicants and how we use it. To read it, see the Chewy Privacy Policy (https://www.chewy.com/app/content/privacy).