ServiceNow

Senior Staff Machine Learning Engineer - DevOps/Site Reliability Engineer

Posted Yesterday
Remote
Hybrid
Hiring Remotely in Santa Clara, CA
Senior level
The Senior Staff Machine Learning Engineer will design and implement infrastructure for AI workloads, improve SRE practices, and mentor colleagues. They will work closely with AI engineers on efficient GPU cluster operation and ensure high-quality code delivery.
The summary above was generated by AI
Company Description
It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today - ServiceNow stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Our intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work. But this is just the beginning of our journey. Join us as we pursue our purpose to make the world work better for everyone.
Job Description
This position requires passing a ServiceNow background screening, USFedPASS (US Federal Personnel Authorization Screening Standards). This includes a credit check, criminal/misdemeanor check and taking a drug test. Any employment is contingent upon passing the screening. Due to Federal requirements, only US citizens, US naturalized citizens or US Permanent Residents, holding a green card, will be considered.
PLATO (Platform Engineering and AI Technology Organization) at ServiceNow is a customer-focused, innovative group building intelligent software using a variety of technology stacks to enable end-to-end, industry-leading work experiences for our customers. We are a group of people deeply invested in the success of our customers who happen to have expertise and knowledge in advanced technologies and software engineering best practices. We are data-driven, structured, and committed, and we enjoy what we are doing. We prioritize robustness, performance and user experience over the technology stack and tools.
We are a group of technology professionals and platform engineers with a dual mission. We build and evolve the AI platform, and partner with teams to build products and end-to-end AI-powered work experiences. In equal measure, we lay the foundations, research, experiment, and de-risk AI technologies that unlock new work experiences in the future.
As a Senior Staff Machine Learning Engineer - Site Reliability Engineer you will:
  • Contribute to the design, development and implementation of infrastructure, platform, deployment and observability features that power AI workloads.
  • Collaborate with researchers, AI engineers, and infrastructure teams to ensure our GPU clusters perform efficiently, scale well, and remain reliable.
  • Contribute to the continuous improvement of the SRE practice by turning operational use cases into requirements for software tooling.
  • Contribute to the execution of deployment and support activities for AI/ML developers.
  • Build high-quality, clean, scalable and reusable code by enforcing best practices around software engineering architecture and processes (code reviews, unit testing, etc.).
  • Work with the product owners to understand detailed requirements and own your code from design through implementation, test automation, and delivery of a high-quality product to our users.
  • Draw on experience operating LLMs on NVIDIA GPUs.
  • Be a mentor for colleagues and help promote knowledge-sharing.

Qualifications
To be successful in this role you have:
  • Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry.
  • 8+ years of experience with infrastructure and platform operations, deployments, SRE, and DevOps with a continued focus on improving platform health.
  • 6+ years of experience operating highly available distributed workloads on Kubernetes following a DevOps approach.
  • 6+ years of development experience with Python, Go, Java or similar languages.
  • Experience with DevOps tooling (e.g. Helm / Ansible / Kubernetes / Prometheus / Splunk / GitLab CI).
  • Strong working experience operating distributed systems built on Linux and J2EE.
  • Experience with software-defined networking, infrastructure as code and configuration management.
  • Experience building software for compliance and security in regulated environments.
  • Ability to drive outcomes in projects with material technical risk.

For positions in this location, we offer a base pay of >, plus equity (when applicable), variable/incentive compensation and benefits. Sales positions generally offer a competitive On Target Earnings (OTE) incentive compensation structure. Please note that the base pay shown is a guideline, and individual total compensation will vary based on factors such as qualifications, skill level, competencies, and work location. We also offer health plans, including flexible spending accounts, a 401(k) Plan with company match, ESPP, matching donations, a flexible time away plan and family leave programs.
Compensation is based on the geographic location in which the role is located and is subject to change based on work location.
Additional Information
Work Personas
We approach our distributed world of work with flexibility and trust. Work personas (flexible, remote, or required in office) are categories that are assigned to ServiceNow employees depending on the nature of their work. Learn more here.
Equal Opportunity Employer
ServiceNow is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, national origin or nationality, ancestry, age, disability, gender identity or expression, marital status, veteran status, or any other category protected by law. In addition, all qualified applicants with arrest or conviction records will be considered for employment in accordance with legal requirements.
Accommodations
We strive to create an accessible and inclusive experience for all candidates. If you require a reasonable accommodation to complete any part of the application process, or are unable to use this online application and need an alternative method to apply, please contact [email protected] for assistance.
Export Control Regulations
For positions requiring access to controlled technology subject to export control regulations, including the U.S. Export Administration Regulations (EAR), ServiceNow may be required to obtain export control approval from government authorities for certain individuals. All employment is contingent upon ServiceNow obtaining any export license or other approval that may be required by relevant export control authorities.
From Fortune. ©2024 Fortune Media IP Limited. All rights reserved. Used under license.

Top Skills

AI
Ansible
GitLab CI
Go
Helm
J2EE
Java
Kubernetes
Linux
NVIDIA GPUs
Prometheus
Python
Splunk

ServiceNow Waltham, Massachusetts, USA Office

275 Wyman Street, 2nd Floor, Waltham, MA, United States, 02451
