NVIDIA

Senior System Software Engineer, Cloud Services

Posted 6 Days Ago
In-Office or Remote
2 Locations
184K-288K
Senior level
The role involves architecting and maintaining observability systems for cloud services, collaborating with teams for integration, driving automation, and addressing performance issues. It requires strong coding skills and experience with monitoring systems.
The summary above was generated by AI

Our team builds, operates, and maintains cloud-hosted services that provide user and service authentication/authorization across NVIDIA. Ensuring continuity of operations is critical to our mission.

We are looking for a highly proficient software engineer with extensive experience in AWS service development, deployment, and observability practices. In this role, you will be responsible for ensuring the reliability, performance, and scalability of our services, while providing the team with actionable insights for continuous improvement. You will build, implement, and coordinate observability infrastructure to proactively identify and resolve operational issues across our services.

What you’ll be doing:

  • Architect, implement, and maintain observability systems at scale to enable monitoring, alerting, logging, and tracing for our cloud-based services.

  • Define and refine service-level indicators (SLIs), service-level objectives (SLOs), and error budgets in partnership with service owners and product teams.

  • Design, build, and maintain actionable dashboards that display key metrics, SLIs/SLOs, and system health for distributed services.

  • Collaborate with software, platform, and networking teams to integrate observability at all stages of the application lifecycle, from development to incident response.

  • Drive automation efforts to reduce manual toil in monitoring, telemetry, and incident response workflows; build and maintain self-service observability tooling.

  • Address performance and reliability issues using root cause analysis, distributed tracing, and log correlation.

  • Participate in Pager Duty rotations and contribute to post-incident reviews, detailing findings and driving solutions that improve long-term system resilience and visibility.

  • Develop expertise in the functions and capabilities of our offerings, and assist in managing our support channels for other NVIDIA teams.

What we need to see:

  • Bachelor’s or master’s degree in computer science, engineering, or equivalent experience in the field.

  • 8+ years in large-scale systems engineering roles working on live services end-to-end, from development through deployment and observability, including on-call duty.

  • Hands-on experience with modern monitoring systems (Prometheus, Grafana, Loki, Tempo, Datadog, New Relic, OpenTelemetry, etc.) within a production environment.

  • Advanced coding skills in Python, Go, or similar languages for building automation and integrating observability solutions. Comfort with JavaScript frameworks such as React and Next.js.

  • Proficiency in cloud platforms (AWS, GCP, Azure) and containerized environments (Kubernetes, Docker); experience with configuration-as-code tools (Terraform, Helm, Ansible).

  • Strong communication and collaboration skills, with experience working in global, cross-disciplinary teams.

  • Detail-oriented, analytical problem-solving approach and high standards for operational excellence and customer happiness.

  • Experience with incident management and postmortem processes.

Ways to stand out from the crowd:

  • Familiarity with the Java Spring Boot framework and hands-on experience with Apache Cassandra and HashiCorp Vault would be very advantageous.

  • Besides our core duties, our team also manages multiple custom front-end services based on React for admin functions. Having relevant coding experience and being open to supporting development would be a huge plus.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until September 7, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Top Skills

Ansible
AWS
Azure
Datadog
Docker
GCP
Go
Grafana
Helm
JavaScript
Kubernetes
Loki
New Relic
Next.js
OpenTelemetry
Prometheus
Python
React
Tempo
Terraform
