NVIDIA

Director, Software Engineering - DGX Cloud Infrastructure

Posted 15 Days Ago
In-Office or Remote
5 Locations
284K-426K
Expert/Leader
Lead the engineering organization focused on GPU-accelerated cloud infrastructure, ensuring automation and operational excellence while collaborating with internal and external partners.
The summary above was generated by AI

NVIDIA is seeking a strategic and technically grounded Director of Engineering to lead a high-impact organization at the intersection of core compute and cloud infrastructure for AI factories. This organization is a key pillar in NVIDIA’s DGX Cloud ecosystem, building shared automation and reliability tooling that enables a sizable portion of our GPU-accelerated compute fleet.

You will further develop and scale an organization of engineers focused on running production software for large-scale GPU-accelerated infrastructure. This organization partners closely with storage, networking, and several other teams across NVIDIA. You will be the engineering leader responsible for interfacing with some of our NVIDIA Cloud Partners to continuously meet our production excellence goals.

What You’ll Be Doing:

  • Build and grow a team of software engineers and leaders focused on automating day 0, 1, and 2 operations for large-scale GPU clusters running on bare metal and public clouds, with service levels of various kinds.

  • Lead the design and continuous delivery of shared automation frameworks aligned with SLOs and error budgets.

  • Liaise with some of our NVIDIA Cloud Partners to ensure aligned priorities and sustained production excellence.

  • Drive clarity and execution through high ambiguity, translating broad, ever-evolving objectives into iterative delivery milestones.

  • Enable internal teams by reducing operational friction and improving automation coverage across the stack.

What We Need To See:

  • Proven experience leading software engineering teams (including SRE and/or DevOps) responsible for infrastructure automation and distributed systems.

  • Demonstrated ability to build software engineering organizations, drive continuous incremental execution across teams, and operate effectively in highly ambiguous environments with ever-evolving objectives.

  • Hands-on experience designing, running, or automating cloud infrastructure atop bare metal platforms and/or VMs.

  • Experience deploying cloud-native services on public clouds.

  • Track record of representing your company or division in external partnerships with public clouds and infrastructure vendors, and to internal partner teams.

  • Strong foundation in incremental delivery and technical program execution.

  • Excellent written and verbal communication skills, with the ability to influence across levels and disciplines.

  • Bachelor of Science or Master of Science degree in Computer Science or a related field (or equivalent experience), with 10+ overall years of experience developing and leading cloud infrastructure teams and 5+ years of management experience.

Ways to stand out from the crowd:

  • Relevant experience developing organizations at public cloud companies. Background leading teams running large-scale GPU clusters. Familiarity with technologies like Linux, NVIDIA BCM, Slurm, InfiniBand, Kubernetes, distributed storage, or BlueField DPUs.

  • Experience developing both internal-facing platform teams and customer-facing infrastructure-as-a-service teams.

  • Track record of collaboration with security or compliance teams, including in regulated environments. Familiarity with AI/ML platform workloads and their reliability or performance characteristics.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative, hardworking and self-motivated, we want to hear from you! NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing (HPC) and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. DGX Cloud provides a serverless generative AI infrastructure to the world, enabling NVIDIA’s AI supercomputer technologies to be used by anyone.

The base salary range is 284,000 USD - 425,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Top Skills

BlueField DPUs
Distributed Storage
InfiniBand
Kubernetes
Linux
NVIDIA BCM
Slurm


