Principal Site Reliability Engineer
Our architecture group is looking for a Principal Site Reliability Engineer (a.k.a. Site Reliability Architect) to create frameworks that combine engineering and application development to drive operational stability. You will work with engineering and platform teams that develop and support some of the latest technologies focused on the public cloud and containers.
As an experienced SRE professional, you will work with the core teams, combining software practices and engineering to strengthen application and system reliability and operational support. Your hands-on knowledge of system design, application development, testing, and operational stability will help transform how the teams operate so that they deliver high-quality products. You enjoy being instrumental in establishing best practices and tooling that automate operational processes.
About the SRA role...
- This role can be filled in our Boston, MA or Portland, OR office
- Architect a new common framework to establish an SRE model across multiple teams
- Develop new processes that prevent problem recurrence and automate responses to all non-exceptional service conditions
- Enhance SLO trending and centralized reporting
- Identify opportunities to improve architecture/engineering practices
- Mentor staff to replace manual processes with automation
- Coach teams to improve incident response
- Collaborate across all levels of the organization to drive the SRE model
The ideal candidate has...
- Bachelor's degree in Management Information Systems, Computer Science, Software Engineering, Technology, or another related field of study
- 5+ years of experience as a Site Reliability Engineer
- Ability to take a systematic approach to solving problems, with a sense of ownership and focus
- Effective communication skills, including the ability to articulate technical details to diverse, sometimes non-technical, audiences
- Expertise in designing, analyzing and troubleshooting large-scale distributed systems
- Advanced experience supporting enterprise container-based platforms
- Experience architecting, developing, or maintaining solutions in public cloud environments (AWS/GCP)
- CI/CD deployment pipeline experience (Jenkins, Ansible)
- Familiarity with REST API design
- Experience with DevOps container and orchestration tools (Kubernetes, Docker, Puppet, etc.)
- Deep knowledge of AWS
- Good knowledge of Python, Go, or similar languages
- Experience with configuration management systems
- Knowledge of Unix/Linux-based systems, and experience troubleshooting applications running on these systems
- Experience with the full software development lifecycle, including design, implementation, and delivery
- Experience working in an Agile environment
Acquia is an equal employment opportunity (EEO) employer. We hire without regard to age, color, disability, gender (including gender identity), marital status, national origin, race, religion, sex, sexual orientation, veteran status, or any other status protected by applicable law.