Cloud Engineering – Site Reliability Engineer (SRE)

athenahealth

Sorry, this job was removed at 10:26 a.m. (EST) on Monday, March 26, 2018

The Opportunity: We are seeking highly motivated individuals to help us run modern infrastructure services in the cloud. As we build cloud native services, you will be responsible for owning and operating the services as they scale in the healthcare space.
Position Summary: Site Reliability Engineering (SRE) is designed to combine software and systems engineering to build and run modern infrastructure. The role is designed to increase the reliability of core services that are used both internally and externally. You will focus on automation and scripting to build rapid and repeatable processes and work with product owners to deliver feedback to influence our roadmap based on the customer experience.
Responsibilities may include, but are not limited to:

35% [Primary Function] Technical Execution

Take on responsibility for the end-to-end lifecycle of modern infrastructure services
Maintain and support infrastructure services across development, integration, and production environments
Review services before they go live in production
Enforce rigor on incident response and postmortems; build a culture of retrospection on both successes and failures
Design proactive monitoring and metrics for supported environments
Focus on automation to improve scale and reliability
Produce accurate, unambiguous technical design specifications to the appropriate detail
Deliver customer value in the form of high-quality hardware, software components, and services, in adherence with IaaS and Release Engineering policies on security, performance, longevity, and integration.
Identify and propose alternative technologies to create scalable implementations and achieve results.
Coordinate and troubleshoot complex technical issues until resolution.
Accurately estimate the effort of development tasks; guide the team and provide feedback to help it estimate more accurately.
Understand and follow engineering conventions, architectures, and best practices; implement new conventions where necessary, teaching those methodologies to more junior members of the team.
Provide high-level T-shirt sizing for the work required to build smaller software components and services.
Scale systems to meet business demand.
Deploy systems to meet availability targets (HA/DR).
Develop automated tests utilizing test infrastructure to validate code, when applicable.
Adhere to the DOD (story definition of done), including unit tests, functional testing, code reviews, no regressions, bug fixes, and documentation, and follow best coding practices.
Perform peer code reviews in order to ensure quality standards.
Identify and prioritize what technical debt will be eliminated.

30% Contributions to the Team

Act as the subject matter expert for area of assignment
Identify opportunities to influence the roadmap of infrastructure services.
Lead agile ceremonies to improve team performance.
Participate in the team member interview process as needed; influence final hiring decisions.
Act as a scrum master for agile scrum teams as needed.

20% Mentorship of Others

Advise and mentor more junior team members to maximize overall productivity and effectiveness of the team.

15% Cross-functional Coordination and Communication

Foster collaboration across the Technology and Product organizations.
Coordinate efforts within your own team and among immediate team members.
Cultivate strong relationships with business stakeholders.
Explain solutions in a way that both technical and product audiences can grasp; share insights with peers.
Share business and technical learnings with the broader dev and product organizations.
Collaborate with members of product and UX teams to design solutions, as appropriate.

Education, Experience, & Skills Required:

1-5 years of experience in an engineering role
Hands-on experience in the public cloud, specifically Amazon Web Services (AWS)
Experience in an Agile environment preferred
Bachelor’s Degree or equivalent
Significant software engineering skills and computer science experience
Knowledge of scripting in Python/Bash
Experience with container schedulers such as Kubernetes, Mesosphere, Docker Swarm or ECS
Experience with modern logging stacks such as Elasticsearch or Graylog
Understanding of metrics collectors such as Graphite or Prometheus
Experience with DevOps tooling

Behaviors & Abilities Required:

Ability to learn and adapt in a fast-paced environment, while producing quality code
Ability to work collaboratively on a cross-functional team with a wide range of experience levels
Ability to analyze existing services and identify technical debt to work toward increasing sustainability
Ability to find creative ways to execute even when there is no historical context or known path forward
Ability to design roadmaps and relevant solutions for end-users to access interfaces
Ability to assess the benefits, risks and success factors of potential applications
Strong mentoring and coaching skills that encourage growth for more junior members
