Mattermost

Lead Site Reliability Engineer (SRE)

Reposted 3 Days Ago

Remote

Hiring Remotely in United States

170K-200K Annually

Senior level

The Lead Site Reliability Engineer will oversee the architecture and operational excellence of Mattermost's infrastructure, mentoring teams and driving strategic initiatives for performance and reliability in regulated sectors.

The summary above was generated by AI

At Mattermost, we build the #1 collaborative workflow solution for defense, intelligence, security, and critical infrastructure organizations. Trusted by governments, financial institutions, and technology companies, our platform enables secure, efficient operations for the world’s most critical teams.

We’re dedicated to empowering organizations to operate with confidence, reducing risks, and accelerating productivity. Guided by our core values of Customer Obsession, Earn Trust, Self Awareness, Ownership and High Impact, we collaborate closely with our customers to deliver solutions that meet complex needs and drive success.

To learn more, visit www.mattermost.com

Mattermost is seeking an experienced and visionary Lead Site Reliability Engineer (SRE) to guide the architecture, reliability, and operational excellence of the infrastructure powering our secure, mission-critical collaboration platform.

In this role, you will provide technical leadership across our SRE function, driving strategic initiatives for scalability, observability, performance, and automation across cloud and hybrid environments. You will mentor engineers, establish best practices, and collaborate closely with development, security, and operations teams to ensure our customers in defense, government, and critical infrastructure sectors experience exceptional reliability and performance.

Responsibilities Include:

Define the strategy, architecture, and roadmap for Mattermost’s site reliability engineering function, aligning infrastructure initiatives with product and business goals.
Lead the design, deployment, and optimization of production-grade containerized workloads, infrastructure-as-code, and compliant cloud environments for regulated domains (e.g., FedRAMP, DoD).
Establish and evolve observability, monitoring, and alerting frameworks to ensure performance, reliability, and capacity planning at scale.
Drive incident management processes, including on-call rotations, root cause analysis, and systemic reliability improvements.
Partner with security and compliance teams to meet data sovereignty, security, and regulatory requirements.
Champion automation and operational excellence to improve efficiency, reduce risk, and scale operations.
Oversee cloud cost management and capacity planning to optimize infrastructure spending while meeting performance targets.
Build and maintain a developer platform that enables fast, secure software delivery and improves application stability in production.
Mentor and coach SRE team members, fostering a culture of learning, collaboration, and technical excellence.

Requirements:

BS in Computer Science, Cybersecurity, Software Engineering, or a related technical field, or equivalent experience, with 5+ years of relevant experience in site reliability engineering, DevOps, or cloud infrastructure roles.
Proven expertise in container orchestration platforms, ideally Kubernetes.
Extensive experience with infrastructure-as-code, ideally Terraform.
Strong background in cloud platforms, ideally AWS.
Demonstrated experience designing and implementing monitoring, alerting, and performance optimization strategies.
Exceptional troubleshooting and incident management skills for distributed systems.
Proficiency in at least one scripting or programming language for automation.
Excellent communication skills with a track record of influencing cross-functional teams.
Experience leading globally distributed teams in a remote-first environment.
For candidates residing in the U.S.: This role may require the ability to obtain and maintain a U.S. government security clearance in the future. As such, U.S. applicants must be U.S. citizens and eligible under applicable clearance requirements.
Applicants must meet eligibility requirements for access to export-controlled information as defined by U.S. export control laws, including EAR and ITAR.

Preferences:

Familiarity with observability stacks such as Grafana and Prometheus.
Experience designing high-availability, disaster recovery, and scaling architectures.
Exposure to GCP and Azure cloud environments.
Leadership experience in highly regulated industries such as defense, finance, or critical infrastructure.
Experience with U.S. federal compliance frameworks and authorization processes, including FedRAMP, DoD ATO, NIST 800-53, and related government standards.
Experience preparing, delivering, and maintaining software offerings through AWS Marketplace and other cloud provider marketplaces (e.g., Azure Marketplace, Google Cloud Marketplace), including packaging, compliance validation, and ongoing operational support.
Open-source contributions in reliability, DevOps, or infrastructure tooling.
Certifications in cloud infrastructure, reliability, or DevOps engineering (e.g., CKA, CKAD, AWS Certified Solutions Architect).

Mattermost takes a market-based approach to pay, and pay may vary depending on your location. The successful candidate’s starting pay will be determined based on job-related skills, experience, qualifications, work location, and market conditions. These ranges may be modified in the future.

Salary Range

$170,000-$200,000 USD

Mattermost is an EEO Employer and a remote-first, open-source company.

We are continually working to expand our hiring in more countries and regions, ensuring compliance with local laws and regulations, which takes time.

Mattermost values your unique perspective—we welcome all applicants. We encourage individuals from all backgrounds to apply and are committed to assessing candidates based on their skills and qualifications. We do not tolerate discrimination against staff or applicants based on race, religion, national origin, age, disability, pregnancy status, veteran status, or other personal characteristics.

If you require accommodations during the interview process, please let us know—we’re happy to assist.

Top Skills: AWS, Grafana, Kubernetes, Prometheus, Terraform
