Site Reliability Engineer at Interactions
Interactions is seeking an enthusiastic Software Engineer to serve as a Site Reliability Engineer. Ideally, you have come up through the ranks as a Software Engineer and are now focused on making software delivery as reliable and repeatable as possible. You have a knack for solving deep technical issues with your troubleshooting skills, and you want to apply that knack to improving all aspects up and down the stack, from code improvements and blameless postmortems to automation and processes. If this sounds appealing to you, join a dynamic team of DevOps, Data and SRE engineers in an exciting company at the crossroads of Machine Learning and Artificial Intelligence.
You will create tools to proactively monitor and improve end-to-end system performance and to identify deficiencies and potential failures throughout our infrastructure. You will build deep, end-to-end knowledge of the complexity of our platform and continuously create improvements and automation to enhance the durability, performance and supportability of the platform.
Essential Job Functions:
- Lead development of the processes and software necessary to maintain services post-deployment through data collection and monitoring, ensuring the overall health of the services provided.
- Develop new metrics/monitoring dashboards as additional coverage events become necessary.
- Monitor and continuously improve the availability and performance of infrastructure, systems and applications.
- Create and maintain documentation for processes, supported infrastructure resources and services
- Drive supportability improvements by improving automation, automatic alerting, self-healing architectures, etc.
- Create new alerts, find anomalies, fix things, and ask why something broke
- Manage, monitor, and troubleshoot daily processes and make improvements to current processes related to production operations
- Capture and analyze data on Systems Availability, MTBF, and MTTR across all Digital channels; identify patterns and drive changes to both systems and processes to provide sustained improvements.
- As a technology subject matter expert (SME), you will mentor platform and application engineers to stretch their knowledge and perspective.
- Troubleshoot and debug software delivered by various development teams and ensure that more junior members of the team are capable of the same; coach team members in this practice.
- Document automation and the interaction of software and systems as necessary to enable others; ensure that other members of the team meet the same high standard of documentation.
Preparation, Knowledge, Skills and Abilities:
- BS Degree in Computer Science or equivalent experience / technical degree
- 5+ years of experience supporting a SaaS environment at scale
- 3+ years of experience in software development, preferably Java and Python
- Ability to triage multiple issues simultaneously and work well under pressure
- Solid understanding of, and experience with, networking, virtualization, storage, and monitoring
- Experience with Linux systems administration.
- Knowledge of and experience with the network stack, protocols, and network management and monitoring tools (Nagios, Check_MK, Splunk, Grafana)
- Solid project management and time management skills – ability to adjust to shifting priorities
- Driven to learn and try new things
- Experience supporting applications with a 24x7 SLA, and providing on-call support as part of a group rotation.
- Comfortable working in a NOC environment
- Excellent interpersonal and communication (oral, listening and writing) skills are required, especially when addressing non-technical and senior leadership audiences.
- Strong collaboration skills and ability to communicate all aspects of the requirements, including the creation of formal documentation.
- Ability to present to a variety of audiences
- Ability to demonstrate Interactions Values of:
  - Being passionate about customer service
  - Obsessing with our customer’s success
  - Respecting each other
  - Creating opportunity
  - Embracing disruption
  - Doing what we say we will do
- Experience working with various cloud providers (AWS/Azure)
- Knowledge of SIP / VoIP
- Hands-on experience in an agile environment using a methodology such as Scrum, Scrumban or XP.