NVIDIA

Senior System Software Engineer - DevOps and Infrastructure Automation

Posted 4 Days Ago
In-Office or Remote
2 Locations
148K-288K
Mid level
The Senior DevOps Engineer will oversee the DevOps landscape, focusing on CI/CD pipelines, infrastructure automation, and collaboration with teams to enhance AI inference systems.
The summary above was generated by AI

We are now seeking a Senior DevOps Engineer for the NVIDIA AI Inference Operations Team. This is a unique opportunity to be the cornerstone of our DevOps practice, taking full ownership of the critical systems that power our engineering innovation. You will be responsible for the entire DevOps landscape, from our CI/CD pipelines to our kernel build systems, driving efficiency and reliability across the organization. You will work with autonomy to design and implement the best solutions and collaborate with external partners to achieve our goals. If you're passionate about infrastructure, Kubernetes, automation, and observability, we want you with us at one of the most innovative companies in the world.

What you'll be doing:
  • Building and maintaining infrastructure from first principles needed to deliver our growing family of AI Inferencing products including Dynamo and NIXL.
  • Maintaining CI/CD pipelines that automate the build, test, and deployment process, and improving them where bottlenecks appear. Managing tools and automating redundant manual workflows via GitHub Actions, GitLab, Terraform, etc.
  • Enabling security scans and the handling of CVEs for infrastructure components.
  • Collaborating extensively with cross-functional teams to integrate pipelines from deep learning frameworks and components, ensuring seamless deployment and inference of deep learning models on our platform.
What we need to see:
  • Master's degree or equivalent experience
  • 3+ years of experience in Computer Science, computer architecture, or a related field
  • Ability to work in a fast-paced, agile team environment
  • Excellent Bash, CI/CD, Python programming and software design skills, including debugging, performance analysis, and test design.
  • Experience in administering, monitoring, and deploying systems and services on GitHub and cloud platforms. Support other technical teams in monitoring operating efficiencies of the platform, and responding as needs arise.
  • Highly skilled in Kubernetes and Docker/containerd. Automation expert with hands-on skills in frameworks like Ansible & Terraform. Experience in AWS, Azure, or GCP.
  • Knowledge of distributed systems programming.
Ways to stand out from the crowd:
  • Experience contributing to a large open-source deep learning community: use of GitHub, bug tracking, branching and merging code, OSS licensing issues, handling patches, etc.
  • Experience in defining and leading the DevOps strategy (design patterns, reliability and scaling) for a team or organization.
  • Experience driving efficiencies in software architecture, creating metrics, implementing infrastructure as code and other automation improvements.
  • Deep understanding of test automation infrastructure, frameworks, and test analysis.
  • Excellent problem-solving abilities spanning multiple areas of software (storage systems, kernels, and containers), as well as experience collaborating within an agile team environment to prioritize deep learning-specific features and capabilities within Triton Inference Server, employing advanced troubleshooting and debugging techniques to resolve complex technical issues.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most experienced and hard-working people in the world working for us. Are you creative and autonomous? Do you love a challenge? If so, we want to hear from you. Come help us build the real-time, efficient computing platform driving our success in the dynamic and quickly growing field of Deep Learning and Artificial Intelligence.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until August 10, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Top Skills

Ansible
AWS
Azure
Bash
CI/CD
DevOps
Docker
GCP
Git
GitLab
Kubernetes
Python
Terraform
