CVS Health

Executive Director, AI Ops Engineering

Posted Yesterday
In-Office or Remote
Hiring Remotely in Home, TN, USA
175K-335K Annually
Expert/Leader
The Executive Director, AI Ops Engineering will lead a team ensuring continuous operation and optimization of CVS's AI environment, driving reliability, availability, and scalability in platform operations, while overseeing security, observability, and innovation initiatives.
The summary above was generated by AI

We’re building a world of health around every individual — shaping a more connected, convenient and compassionate health experience. At CVS Health®, you’ll be surrounded by passionate colleagues who care deeply, innovate with purpose, hold ourselves accountable and prioritize safety and quality in everything we do. Join us and be part of something bigger – helping to simplify health care one person, one family and one community at a time.

Executive Director, AI Platform SRE

About the Role

CVS Health is seeking an Executive Director, AI Ops Engineering to build and lead a team of professionals responsible for the continuous operation, monitoring, and optimization of CVS's Enterprise AI environment. This is first and foremost an engineering leadership role — your core accountability is ensuring the platform is always on, always performing, and always improving.

CVS Health's AI platform is a critical enterprise asset powering clinical, operational, and consumer capabilities at scale across one of the nation's largest healthcare organizations. Keeping it reliable, observable, and continuously improving is the mission. Reporting to the Global Head of Infrastructure/AI Operations and Service Delivery, you will establish and maintain operational baselines across the full infrastructure stack, ensure all changes are continuously monitored, observed, and adjusted, and drive the highest levels of availability, reliability, and scalability across every layer of the environment.

This is a greenfield organizational build — the person in this role will define the operating model, shape the team culture, and establish the engineering standards that will govern CVS's AI infrastructure for years ahead. If you thrive on building from the ground up, this role was designed for you.

Teams You Will Lead

You will build and lead a multi-disciplinary SRE organization structured across nine functional areas spanning core platform operations and innovation. The team is organized to ensure full-spectrum coverage of the AI environment — from hardware and network through platform reliability, security, observability, and 24/7 operations — while continuously developing advanced automation and self-healing capabilities.

Core operational teams cover the following domains:

  • Platform Reliability — SLO/SLI/error budget management, availability baseline enforcement, cluster administration, GPU quota governance, and infrastructure-as-code
  • Infrastructure — Compute, storage, and hardware lifecycle management, including compliance controls and data isolation
  • Network — High-performance GPU networking, fabric management, security segmentation, and continuous network baseline enforcement
  • Observability — End-to-end monitoring strategy, alerting pipelines, SLI/SLO dashboards, and the feedback loops that connect operational data to improvement
  • Security SRE — Security posture, access controls, audit logging, vulnerability management, and regulatory compliance (HIPAA, NIST AI RMF)
  • 24/7 Operations Center — Round-the-clock incident response, on-call protocols, escalation management, and shift-level change execution, structured for sustainable coverage with no mandatory overtime
  • Change & Release Management — Change lifecycle governance, ITIL process management, compliance frameworks, ModelOps boundary definition, and platform knowledge base
  • FinOps — GPU cost governance, utilization optimization, tenant quota enforcement, and chargeback models in partnership with Finance

In addition to core operations, you will oversee three Innovation PODs — focused on AI-driven automation, infrastructure-as-code and self-service capabilities, and chaos engineering and resilience testing — with the goal of continuously reducing manual toil and building a self-healing, self-optimizing platform over time.

What You'll Do

Leadership

  • Own the SRE vision, strategy, and long-range roadmap with availability (>99.99%), reliability, and scalability as the primary measures of success
  • Lead, develop, and integrate all functional teams into a cohesive, always-on operations organization — setting clear ownership, accountability, and performance expectations for each team and each engineer
  • Establish and enforce operational baselines across all platform components; ensure deviations are detected, escalated, and resolved within defined SLAs
  • Drive end-to-end observability with continuous feedback loops connecting monitoring data to incident response, change decisions, and improvement cycles
  • Oversee change management ensuring every modification is risk-assessed, monitored during rollout, and baseline-validated post-deployment
  • Ensure configuration consistency and drift detection across all platform components to prevent baseline degradation over time
  • Build and sustain a high-performing 24/7 operations model — zero mandatory overtime, zero burnout attrition, and measurable team health and retention
  • Empower the Security SRE Lead to implement and maintain a world-class security posture, minimizing risk and ensuring robust compliance with frameworks like HIPAA and NIST AI RMF
  • Direct Innovation POD strategy to develop self-healing and autonomous capabilities that proactively prevent degradation before it impacts availability
  • Lead GPU FinOps governance — utilization optimization, tenant quota enforcement, and cost reduction — in partnership with the Finance organization
  • Manage vendor relationships and performance accountability

Program Governance

  • Lead the structured transition of operational ownership from the incumbent managed services provider to CVS's internal SRE organization, governing phased handoffs, competency validation, and milestone sign-offs, ensuring a seamless transition with minimal disruption to platform availability and business operations
  • Establish and lead the long-term operating model by institutionalizing key technical, architectural, and delivery leadership capabilities into permanent CVS roles, ensuring the organization is fully self-sustaining at program close

What You'll Bring

  • 10+ years in SRE, platform operations, or DevOps engineering leadership with a demonstrated focus on availability and reliability outcomes
  • 5+ years leading multiple technical teams simultaneously, including 24/7 operations organizations — with measurable team health, retention, and performance outcomes
  • Proven success establishing and enforcing operational baselines, SLO/SLI/error budget frameworks, and observability-driven continuous improvement in complex environments
  • Deep expertise in Kubernetes/OpenShift, IaC, GPU computing, and AI/ML infrastructure
  • Experience managing large-scale MSP transitions or platform operational handoffs while ensuring business continuity and minimizing disruption
  • Demonstrated FinOps and GPU cost optimization experience in cloud or on-premises environments
  • Security framework implementation and compliance program management in regulated industries (HIPAA, NIST AI RMF)
  • Track record building sustainable 24/7 operations models with measurable retention and no burnout-related attrition
  • Executive stakeholder communication, vendor negotiation, and budget ownership
  • Background in innovation programs, POD structures, or centers of excellence
  • Willingness to travel and work off hours as required. Our 24/7 model is designed for sustainable, predictable coverage that eliminates mandatory overtime. As a leader, you will be an escalation point for critical incidents, but our goal is a resilient system and culture that protects our team's time

Preferred Qualifications

  • NVIDIA AI Enterprise, Run:AI, or GPU orchestration platform experience
  • Healthcare or regulated industry background
  • Certifications: ITIL Expert, PMP, AWS/Azure/GCP, CISSP
  • Familiarity with Cisco UCS, VAST storage, EVPN-VXLAN, and RDMA/RoCE protocols
  • Chaos engineering and AI-driven operations experience
  • Thought leadership: published work or speaking at industry conferences

Education

Required: Bachelor's in Computer Science, Engineering, or related field  |  Preferred: Master's degree

Pay Range

The typical pay range for this role is:

$175,100.00 - $334,750.00


This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors. This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above. This position also includes an award target in the company’s equity award program.

Our people fuel our future. Our teams reflect the customers, patients, members and communities we serve, and we are committed to fostering a workplace where every colleague feels valued and feels that they belong.

Great benefits for great people

We take pride in offering a comprehensive and competitive mix of pay and benefits that reflects our commitment to our colleagues and their families.

This full‑time position is eligible for a comprehensive benefits package designed to support the physical, emotional, and financial well‑being of colleagues and their families. The benefits for this position include medical, dental, and vision coverage, paid time off, retirement savings options, wellness programs, and other resources, based on eligibility.


Additional details about available benefits are provided during the application process and on Benefits Moments.

We anticipate the application window for this opening will close on: 04/30/2026

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state and local laws.

Top Skills

AI/ML Infrastructure
GPU Computing
Kubernetes
OpenShift

CVS Health Boston, Massachusetts, USA Office

Boston, Massachusetts, United States, 02114
