Five9

Senior Site Reliability Engineer

Posted 2 Days Ago

Remote

Hiring Remotely in United States

92K-220K Annually

Senior level

The Senior Site Reliability Engineer will develop and maintain systems for reliability and scalability, focusing on software development, automation, incident response, and collaboration with teams.

Join us in bringing joy to customer experience. Five9 is a leading provider of cloud contact center software, bringing the power of cloud innovation to customers worldwide.

Living our values every day results in our team-first culture and enables us to innovate, grow, and thrive while enjoying the journey together. We celebrate diversity and foster an inclusive environment, empowering our employees to be their authentic selves.

We are seeking a Senior Site Reliability Engineer (SRE) to join our team and help build and maintain highly reliable, scalable systems. This role combines software engineering and operations expertise to ensure our services meet ambitious reliability targets while enabling rapid development and deployment.
This position requires approximately 50% software development and 50% operational work, focusing on automation, monitoring, and system reliability rather than manual operations. The team works collaboratively with our platform, application and database teams to provide a reliable and available service.

Key Responsibilities

Observability & Monitoring (25%)

Dashboards & Metrics: Design and implement comprehensive dashboards covering basic OS/platform-level monitoring and application-level monitoring, organized into primary (RED) and secondary (USE) indicators.
Availability & Reliability: Establish and maintain SLIs (Service Level Indicators), SLOs (Service Level Objectives), and error budgets for the service.
Performance Monitoring: Build alerting systems and performance monitoring to proactively identify and resolve issues before they impact users
Incident Response: Participate in on-call rotations and lead incident response efforts, including post-mortem analysis and remediation. Maintain the official on-call routing.

Infrastructure Automation & Deployment (25%)

CI/CD Pipeline Management: Maintain continuous integration and deployment pipelines, working with our cloud and on-premises deployment teams.
Infrastructure as Code: Develop and maintain infrastructure using tools like Terraform, Ansible, or similar
Configuration Management: Automate system configuration and ensure consistency across environments

Security & Compliance (15%)

Security Automation: Implement security scanning, vulnerability management, and compliance monitoring
Access Control: Maintain proper authentication, authorization, and audit logging systems
Compliance Reporting: Ensure systems meet regulatory requirements and industry standards
Security Incident Response: Participate in security incident response and remediation efforts

Cost Optimization (15%)

Resource Management: Monitor and optimize cloud resource usage and costs
Capacity Planning: Analyze usage patterns and plan for future capacity needs
Cost Analysis: Provide recommendations for cost-effective architecture and resource allocation
Right-sizing: Implement automated scaling and resource optimization strategies

Common Services & Platform Engineering (20%)

Shared Infrastructure: Build and maintain common services like notification systems, caching layers, and message queues or third-party software stacks.
Database Operations: Manage database reliability, performance, and scaling (where not handled by dedicated DB teams)
Service Mesh & Networking: Implement and maintain service discovery, load balancing, and network policies
Developer Tools: Create and maintain tools and platforms that improve developer productivity and system reliability

Required Qualifications

Technical Skills

Programming Languages: Proficiency in at least two of: Python, Shell, PHP, Java, or similar languages
Cloud Platforms: Experience with AWS, GCP, or Azure infrastructure and services
Containerization: Hands-on experience with Docker, Kubernetes, and container orchestration
Monitoring & Observability: Experience with Prometheus, Grafana, ELK stack, or similar tools
Infrastructure as Code: Proficiency with Terraform, CloudFormation, or similar tools
Version Control: Expert-level Git usage and collaborative development practices

Operational Experience

Production Systems: 3+ years managing large-scale production environments
On-call Experience: Comfortable with 24/7 on-call responsibilities and incident response
System Administration: Strong Linux/Unix system administration skills
Networking: Understanding of TCP/IP, DNS, load balancing, and network security
Database Systems: Experience with SQL and NoSQL databases in production environments

SRE-Specific Knowledge

SLI/SLO Management: Experience defining and maintaining service level objectives
Error Budget Policy: Understanding of error budget concepts and implementation
Toil Reduction: Track record of identifying and eliminating repetitive manual work
Capacity Planning: Experience with performance testing and capacity management

Preferred Qualifications

Bachelor's degree in Computer Science, Engineering, or equivalent experience
Experience with microservices architecture and distributed systems
Knowledge of security best practices and compliance frameworks
Experience with chaos engineering and reliability testing
Previous experience in an SRE or DevOps role at a technology company
Contributions to open-source projects or technical communities

Success Metrics

Reliability: Maintain or improve service availability and reliability metrics
Toil Reduction: Measurable reduction in manual operational work through automation
Incident Response: Effective participation in incident response with focus on prevention
Code Quality: High-quality, well-tested code contributions to infrastructure and tooling
Collaboration: Effective partnership with development teams to improve system reliability

Team Culture & Values

Blameless Post-mortems: Learn from failures without blame or punishment
Automation First: Prefer automated solutions over manual processes
Measuring Everything: Data-driven decision making and continuous improvement
Sharing Knowledge: Document and share expertise across the team
Work-Life Balance: Sustainable on-call practices and reasonable operational load

Growth Opportunities

Opportunity to work on cutting-edge infrastructure and reliability challenges
Exposure to large-scale distributed systems and modern cloud technologies
Professional development budget for conferences, training, and certifications
Career progression path toward senior SRE, staff engineer, or management roles
Collaboration with engineering teams across the organization

Work Location: This role is fully remote for candidates who reside outside a 50-mile radius of our San Ramon office. For candidates who reside within 50 miles of our San Ramon location, this role is hybrid and requires 3 days a week (M, W, Th) in our San Ramon office.

As part of our continued commitment to diversity, equity, and inclusion, Five9 supports pay transparency during the entire recruitment process. Actual compensation packages are based on several factors that are unique to each candidate including, but not limited to: skill set, depth of experience, certifications, and specific work location. The range displayed reflects the minimum and maximum target for new hire salaries for the job across the United States. Your recruiter can share more about the specific compensation package during your hiring process.

Additionally, the total compensation package for this position may also include an annual performance bonus, stock, and/or other applicable incentive compensation plans.

Our total reward package also includes:

Health, dental, and vision coverage, beginning on the first day of employment. Five9 covers 100% of the employee portion of the health, dental, and vision coverage and shares a high portion of the dependent cost. We also offer Short & Long-Term Disability, Basic Life Insurance, and a 401(k) savings plan with employer matching.
Access to an innovative mental health support platform that offers personalized care and resources in areas such as: therapy, coaching and self-guided mindfulness exercises for all covered employees and their covered dependents.
Generous employee stock purchase plan.
Paid time off, company-paid holidays, paid volunteer hours, and 12 weeks of paid parental leave.

All compensation and benefits are subject to the requirements and restrictions set forth in the applicable plan documents and any written agreements between the parties.

The US base salary range for this role is below.

$91,500—$219,700 USD

Five9 embraces diversity and is committed to building a team that represents a variety of backgrounds, perspectives, and skills. The more inclusive we are, the better we are. Five9 is an equal opportunity employer.

View our privacy policy, including our privacy notice to California residents here: https://www.five9.com/pt-pt/legal.

Note: Five9 will never request that an applicant send money as a prerequisite for commencing employment with Five9.

Top Skills

AWS

Azure

CloudFormation

Docker

ELK Stack

GCP

Git

Grafana

Java

Kubernetes

PHP

Prometheus

Python

Shell

Terraform
