Site Reliability Engineer
Our Opportunity:
Site Reliability Engineers are hybrid system and software engineers who are responsible for all operational aspects of Chewy’s eCommerce platform. They are integrated within the infrastructure technology team, which is responsible for designing, building, monitoring, and maintaining the infrastructure of our internet-facing and internal services. We're looking for engineers who want to develop and maintain infrastructure software and scale Chewy’s technology stack.
Come help us build a bigger and better Chewy as a Site Reliability Engineer. You will be part of a small family within Chewy that has a huge impact on our incredible growth.
What you'll do:
- Automate all aspects of server provisioning and operations.
- Work with development and infrastructure teams to design hardware and software platforms to meet the needs of the business.
- Build tools and solutions for bridging software development teams with system infrastructure.
- Build systems to proactively monitor health, performance and security of our production and non-production virtualized infrastructure.
- Design and maintain CI/CD systems and processes.
- Plan automated backups, disaster recovery, and failover configurations.
What you'll need:
- At least 5 years of experience in a site reliability engineering or similar role.
- Hands-on experience with orchestration and system configuration tools such as Salt, Ansible, Fabric, Puppet, Chef, or Terraform.
- Expertise in building and maintaining highly available applications, including redundancy, failover, scalability, monitoring, and performance.
- Strong experience with virtualization, monitoring and automation.
- Software development experience (both scripting and programming languages).
- Experience working with the open source community (troubleshooting, patch submission, etc.).
- Ability to organize, troubleshoot and continuously learn.
- 5+ years of demonstrated Linux system administration experience.
- Experience with CI tools such as Bamboo, Jenkins, Hudson or CruiseControl.
- Previous experience working within compliance controls such as SOX and PCI.
- Position may require travel.