Cerence Inc.

Sr. Principal Software Engineer

Posted Yesterday
Remote
Hiring Remotely in USA
141K-226K Annually
Senior level
Lead development and optimization of high-performance LLM inference pipelines across data center, edge, and embedded platforms. Improve latency and throughput via custom CUDA kernels, quantization, KV-cache and batching strategies, and deploy efficient, production-ready runtimes without external vendor dependency.
The summary above was generated by AI
A Moving Experience.

Who is Cerence AI? 

Cerence AI is the global leader in AI for transportation, specializing in building AI and voice-powered companions for cars, two-wheelers, and more that enable people to focus on what matters most. With over 500 million cars shipped with Cerence AI's technology, we partner with leading automakers (such as Volkswagen, Mercedes, Audi, Toyota and many more), mobility providers, and technology companies to power intuitive, integrated experiences that create safer, more connected, and more enjoyable journeys for drivers and passengers alike.

 

Our Driving Force  

Our team is dedicated to pushing the boundaries of AI innovation, working around the globe with headquarters in Burlington, Massachusetts, USA and 16 other offices across Europe, Asia, and North America. We bring together diverse backgrounds and varied skill sets with the shared goal of advancing the next generation of transportation user experiences. Our culture is customer-centric, collaborative, fast-paced, and fun, with continuous opportunities for learning and development to support your career growth.

 

Interested in having a significant impact in a dynamic industry with a high-performing global team? We’re looking for an exceptional Senior Principal Software Engineer who is ready to drive the future of mobility with us! 

Job Description:

What You Will Work On 

  • Optimize and deploy high-performance LLM inference pipelines

  • Own inference runtimes across data center, edge, and embedded platforms 

  • Push model performance through quantization, kernel fusion, and cache optimization 

  • Drive latency and throughput improvements that directly impact production products 

  • Enable efficient, reliable deployment without external vendor dependency 

 

Core Responsibilities 

Inference Engines & Runtime 

  • Build deep expertise and ownership of: 

      • vLLM

      • TensorRT-LLM

      • llama.cpp

      • QAIRT

  • Extend and tune inference engines using custom CUDA kernels 

  • Adapt runtimes for constrained and embedded deployment environments 

 

Quantization & Numerical Optimization

  • Implement and evaluate quantization strategies:

      • INT8, INT4, FP4, FP8, mixed precision

      • AWQ

      • GPTQ

  • Balance accuracy, latency, memory footprint, and throughput 

KV Cache Optimization 

  • Optimize key–value cache performance through: 

      • Paging

      • Prefix caching

      • Cache-aware memory layout design

  • Reduce memory pressure while sustaining high throughput 

 

Latency & Throughput Optimization

  • Design and tune:

      • Batching strategies

      • Continuous batching

      • Speculative decoding

  • Optimize tail latency and tokens/sec under real production traffic patterns 

 

What Success Looks Like 

  • Models deploy efficiently on edge and embedded devices, not just servers 

  • Tokens/sec significantly outperform baseline implementations 

  • End-to-end latency is minimized and predictable

  • Inference cost per request is materially reduced 

  • The company is no longer dependent on partners for inference optimization 

 

Required Experience & Skills 

Strongly Required 

  • Proven experience optimizing ML inference performance in production 

  • Deep understanding of GPU architecture and memory hierarchies 

  • Hands-on experience with CUDA and low-level performance tuning

  • Experience deploying models beyond research environments 

Critical Technical Skills 

  • Inference engines: vLLM, TensorRT-LLM, llama.cpp, QAIRT

  • CUDA kernel development and profiling 

  • Quantization techniques: INT8/INT4/FP4/FP8, AWQ, GPTQ

  • KV cache optimization and memory layout design

  • Latency optimization: batching, speculative decoding, continuous batching

 

Common Problems You’ll Be Solving 

  • Deploy efficiently on edge or embedded targets 

  • Achieve competitive tokens/sec 

  • Reduce and stabilize inference latency 

You will be responsible for closing gaps in these areas, creating a major competitive advantage.

 

What we offer 

We offer a generous compensation and benefits package (in addition to the base salary), including: 

  • Salary range: $141,400 USD - $226,300 USD. It is not typical for offers to be made at or near the top of the range. The actual salary will be determined based on experience and other job-related factors.

  • Annual bonus opportunity 

  • Insurance coverage (medical, dental, vision, life, and disability) 

  • Paid time off 

  • Paid holidays 

  • Company contribution to the RRSP (Registered Retirement Savings Plan) 

  • Equity awards for certain positions and levels 

  • Remote and/or hybrid work available depending on the position 

All compensation and benefits are subject to the terms and conditions of the underlying plans or programs, as applicable, and may be amended, terminated, or replaced from time to time. 

Cerence Inc. (Nasdaq: CRNC and www.cerence.com) is the global industry leader in creating unique, moving experiences for the automotive world. Spun out from Nuance in October 2019, Cerence is a new, independent company that has quickly gained traction as a leader in the automotive voice assistant space, working with all of the world’s leading automakers – from Ford and Fiat Chrysler to Daimler, Audi and BMW to Geely and SAIC – to transform how a car feels, responds and learns. Its track record is built on more than 20 years of industry experience and leadership and more than 500 million cars on the road today across more than 70 languages.  

 

As Cerence looks to the future and continues an ambitious growth agenda, we need someone to join the team and help build the future of voice and AI in cars. This is an exciting opportunity to join Cerence’s passionate, dedicated, global team and be a part of meaningful innovation in a rapidly growing industry. 

EQUAL OPPORTUNITY EMPLOYER

Cerence is firmly committed to Equal Employment Opportunity (EEO) and to compliance with all federal, state and local laws that prohibit employment discrimination on the basis of age, race, color, gender, gender identity, gender expression, sex, sex stereotyping, pregnancy, national origin, ancestry, religion, physical or mental disability, medical condition, marital status, citizenship status, sexual orientation, protected military or veteran status, genetic information and other protected classifications. Cerence Equal Employment Opportunity Policy Statement.

All prospective and current Employees need to remain vigilant when it comes to executing security policies in the workplace. This includes:

- Following workplace security protocols and completing training programs to become familiar with ways to maintain a safe workplace.
- Following security procedures to report any suspicious activity.
- Respecting corporate security procedures so that those procedures can be effective.
- Adhering to the company's compliance requirements and regulations.
- Supporting a zero-tolerance policy for workplace violence.
- Maintaining basic knowledge of information security and data privacy requirements (e.g., how to protect and handle data).
- Demonstrating knowledge of information security through internal training programs.

HQ

Cerence Inc. Burlington, Massachusetts, USA Office

Burlington, MA, United States, 01803
