Socure

Global Head of Site Reliability Engineering

Reposted 12 Days Ago
Remote
Hiring Remotely in USA
260K-285K Annually
Expert/Leader
Lead global reliability for Socure's identity verification platform, focusing on system architecture, incident management, and improving developer experience using AI and advanced engineering practices.
The summary above was generated by AI
Why Socure?

Socure is building the identity trust infrastructure for the digital economy — verifying 100% of good identities in real time and stopping fraud before it starts. The mission is big, the problems are complex, and the impact is felt by businesses, governments, and millions of people every day.

We hire people who want that level of responsibility. People who move fast, think critically, act like owners, and care deeply about solving customer problems with precision. If you want predictability or narrow scope, this won’t be your place. If you want to help build the future of identity with a team that holds a high bar for itself — keep reading.

Overview

Socure is the leader in digital identity verification and fraud prevention. We are hiring a bold, hands-on Global Head of Site Reliability Engineering (SRE) to own end-to-end reliability for the platform that powers identity, fraud, and compliance decisions for thousands of organizations across regulated industries. You will lead the global reliability charter for our mission‑critical services and data platform, including public sector programs where Socure is authorized at FedRAMP Moderate and operates in AWS GovCloud (US).

You will set the strategy and build the systems that keep Socure always‑on: multi‑region resilience, graceful degradation, disaster readiness, and real‑time observability across a fast‑evolving stack. You will also lead our red‑team quality assurance function to design and run chaos engineering experiments that harden our infrastructure, data, and application layers under real‑world failure conditions. You will own developer experience for reliability—leading company‑wide CI/CD pipelines, release engineering, and ephemeral environments for rapid, isolated testing—and drive an AI‑first SRE strategy, applying machine learning for anomaly detection, adaptive alerting, automated runbooks, incident summarization, and capacity forecasting.

Socure’s platform safeguards highly sensitive, confidential data at massive scale, with workloads that demand low‑latency decisioning and continuous availability. This role is for a systems builder and culture carrier who has operated at the frontier of scale, reliability, and safety.

Why this role is compelling
  • Own global reliability for a platform trusted by financial institutions, fintechs, marketplaces, telecom, healthcare, and public sector programs—where availability, integrity, and clear evidence are non‑negotiable.

  • Steer reliability strategy for RiskOS, our risk orchestration engine that unifies identity, fraud, and compliance decisions and integrates a broad partner ecosystem—so improvements compound across every product and integration.

  • Lead with real impact: institutionalize best practices from large‑scale cloud incidents into a next‑generation reliability program that measurably improves uptime, latency, and time‑to‑recovery.

  • Shape developer experience at scale: own our CI/CD ecosystem, ephemeral test environments, and change‑management controls that enable safer, faster delivery for all engineering teams.

  • Work at the platform frontier: a real‑time Identity Graph, a powerful orchestration engine with deep explainability, and a modernization program toward product‑aligned, multi‑account AWS architecture with parity across commercial and GovCloud environments.

What you’ll do
  • Define the global reliability strategy and roadmap across availability, latency, durability, data integrity, cost efficiency, and safety—mapped to clear business outcomes and service level objectives.

  • Architect multi‑region, multi‑zone resilience patterns with automated failover, graceful degradation, and progressive delivery; validate readiness through continuous game days and fault‑injection experiments.

  • Build and lead a world‑class red‑team QA and chaos engineering program across infrastructure, data pipelines, and applications; codify attack playbooks and steady‑state guardrails to improve detection and recovery.

  • Establish a unified observability practice: end‑to‑end tracing, high‑signal alerting, health and saturation indicators, user‑journey telemetry, and incident command protocols—standardized into a single, actionable operations view.

  • Drive rigorous incident management: real‑time incident command, rapid mitigation, blameless post‑incident reviews, durable corrective actions, and automated safeguards.

  • Ensure public sector readiness and continuous authorization: sustain FedRAMP Moderate posture, prove environmental parity between commercial and GovCloud, and strengthen controls for data residency, deletion, and audit evidence.

  • Partner with product engineering to make reliability a product feature: embed reliability patterns into RiskOS workflows and make Identity Graph‑based decisions observable, explainable, and resilient by default.

  • Lead developer tooling and release engineering: own CI/CD pipelines, test sandboxes and ephemeral environments, and the golden paths that make shipping changes safe, repeatable, and fast.

  • Advance an AI‑first SRE strategy: deploy ML for anomaly detection, incident prediction, adaptive alerting, automated runbooks, incident summarization, and capacity forecasts; measure impact via concrete reliability and efficiency wins.

  • Lead capacity planning and performance engineering across compute, storage, and networking—delivering consistently low‑latency decisions at peak volumes.

  • Attract, grow, and retain exceptional reliability engineers and leaders across regions; run a humane, effective, continuously improving on‑call program.

What you’ll bring
  • Deep experience leading reliability for large‑scale, always‑on platforms with highly sensitive data—owning availability, latency, durability, and security across multiple product lines and regions.

  • Mastery of modern cloud architecture (AWS), product‑aligned multi‑account patterns, real‑time observability, progressive delivery, and automated disaster recovery, with a track record of measurable reliability gains.

  • Experience building red‑team and chaos engineering programs that surface systemic weaknesses, improve mean time to mitigate, and harden systems over time.

  • Proven leadership of developer tooling at scale: CI/CD, release engineering, and ephemeral environment strategies that increase velocity while reducing risk.

  • Strong partnership with product, data, and security; fluency in data lifecycle, retention and deletion, privacy, and governance for regulated industries and public sector.

  • A people‑first leadership style: you raise the bar on hiring and mentoring, set crisp principles, and build an ownership culture grounded in curiosity, accountability, and continuous learning.

Socure is an equal opportunity employer that values diversity in all its forms within our company. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
If you need an accommodation during any stage of the application or hiring process—including interview or onboarding support—please reach out to your Socure recruiting partner directly.

Top Skills

AWS
CI/CD
Machine Learning
