Quality Coding Software Solutions LLC

Lead Site Reliability Engineer

Reposted 3 Days Ago
Remote
Hiring Remotely in United States
Senior level

Overview

QCSS Health is seeking a Lead Site Reliability Engineer (SRE) to drive performance, availability, and scalability of our cloud-based SaaS applications. You will work closely with Development and Product to design resilient systems and respond to operational incidents that impact performance or availability. QCSS Health offers an exciting opportunity to support Medicaid health plans and providers in improving health care outcomes for vulnerable populations. Our solutions positively impact over two hundred and fifty thousand Medicaid members.

Responsibilities

  • Design and implement site reliability engineering best practices across cloud infrastructure and application services hosted on AWS
  • Deploy and manage performance monitoring tools to track key application and infrastructure metrics
  • Define and manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) across platform services
  • Lead technical responses during critical incidents to restore service availability with minimal downtime
  • Perform root cause analysis and coordinate postmortems to ensure follow-up improvements are implemented
  • Identify reliability risks and propose infrastructure improvements to enhance system performance and fault tolerance
  • Collaborate with software engineering, DevOps, and database teams to embed reliability into the development lifecycle
  • Advise teams on system design for high availability, capacity planning, and disaster recovery

Qualifications and Experience

  • 7+ years in site reliability engineering, DevOps, or systems engineering roles
  • 3+ years in a lead technical role with significant experience implementing Application Performance Monitoring (APM) solutions for data-intensive applications
  • Experience at a scaling SaaS business (20–50 employees)
  • Experience with AWS services (EC2, CloudWatch, S3, etc.), Microsoft IIS, Windows Server environments, and SQL Server databases
  • Experience monitoring API and file transfer interfaces
  • Demonstrated success in implementing SLIs/SLOs and managing systems to defined SLAs
  • Demonstrated success in leading incident response and communication
  • Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field

Location/Travel

  • Position is flexible in terms of physical location and can be remote.
  • QCSS is headquartered in Cambridge, MA.
  • Must be willing to travel (10% or less).

Hiring Process

  • The initial screen will be a 30-minute call
  • The first interview will be a 60-minute video call
  • The final interview will be a 90-minute video call that includes presentation of an exercise
  • The final step of the process will be a reference check


Background Checks/Federal HealthCare Program Exclusion Lists Screening

Candidates for this position will be required to undergo a pre-employment background check. All QCSS employees are subject to annual screening to ensure that they have not been excluded from the Federal Healthcare Programs (using the OIG and GSA Exclusion Lists) or State Medicaid Programs. The Company’s pre-employment background check and OIG/GSA Exclusion Lists screening program is administered in compliance with all federal, state and local laws.

Equal Opportunity Employer

QCSS is an Equal Opportunity Employer and strongly supports diversity in the workforce.

About Quality Coding Software Solutions

At QCSS Health, we are laser-focused on simplifying the complexities of Managed Long Term Services and Supports (MLTSS) service delivery through innovative, data-driven solutions that result in greater cost-efficiencies, more equitable access to care, and improved health outcomes.

Our Mission:

To seamlessly integrate domain expertise with technology solutions to make Managed Long Term Services and Supports more successful. We are dedicated to enabling health plans and providers to improve the health outcomes of their vulnerable populations and thrive in a value-based healthcare system.

Our Core Values:

Simplifying Complexity - We know managing cost-effective MLTSS programs can be difficult, but it doesn’t have to be. Our guiding principle is to develop innovative, easy-to-use solutions, designed specifically to meet the unique challenges of running MLTSS programs.

Member-Centric Excellence - We empower MLTSS organizations to deliver exceptional care to their members, placing individual outcomes, health equity, and the quality of care delivered at the forefront of everything we do.

Trust is at our Core - We foster trust by leading with integrity, ethics, and the reliability of our technology. Transparency, genuine listening, and unwavering dedication to fulfilling our commitments are fundamental to how we cultivate strong and meaningful relationships.

Top Skills

AWS
CloudWatch
EC2
Microsoft IIS
S3
SQL Server
Windows Server

HQ

Quality Coding Software Solutions LLC Cambridge, Massachusetts, USA Office

Cambridge, MA, United States, 02139
