Sully.ai

Senior AI Systems Engineer (LLM Inference & Infra Optimization)

Posted 2 Days Ago
Remote
Hiring Remotely in US
Senior level
About Us

At Sully.ai, we’re building cutting-edge AI-native infrastructure to power real-time, intelligent healthcare applications. Our team operates at the intersection of high-performance computing, ML systems, and cloud infrastructure, optimizing inference pipelines to support next-generation multimodal AI agents. We’re looking for a deeply technical engineer who thrives at the systems level and loves building performant, scalable infrastructure.

The Role

We’re looking for a senior-level engineer to lead efforts in deploying and optimizing large language models on high-end GPU hardware and building the infrastructure that supports them. You'll work across the stack — from C++ and CUDA kernels to Python APIs — while also shaping our DevOps practices for scalable, multi-cloud deployments. This role blends systems performance, ML inference, and infrastructure-as-code to deliver low-latency, production-grade AI services.

What You’ll Do
  • LLM Inference Optimization: Develop and optimize inference pipelines using quantization, attention caching, speculative decoding, and memory-efficient serving.

  • Systems Programming: Build and maintain low-level modules in C++/CUDA/NCCL to squeeze the most out of GPUs and high-throughput architectures.

  • DevOps & Infrastructure Engineering: Stand up and manage multi-cloud environments using modern IaC frameworks such as Pulumi or Terraform. Automate infrastructure provisioning, deployment pipelines, and GPU fleet orchestration.

  • Real-Time Architectures: Design low-latency streaming and decision-support systems leveraging embedding models, VRAM token caches, and fast interconnects.

  • Developer Enablement: Build robust tooling, interfaces, and sandbox environments so that other engineers can contribute safely to the ML systems layer.

What We’re Looking For
  • Proficiency in C++, CUDA, and Python with experience in systems or ML infrastructure engineering.

  • Deep understanding of GPU architectures, inference optimization, and large model serving techniques.

  • Hands-on experience with multi-cloud environments (GCP, AWS, etc.) and infrastructure-as-code tools such as Pulumi, Terraform, or similar.

  • Familiarity with ML deployment frameworks (TensorRT, vLLM, DeepSpeed, Hugging Face Transformers, etc.).

  • Comfortable with DevOps workflows, containerization (Docker), CI/CD, and distributed system debugging.

  • (Bonus) Experience with streaming embeddings, semantic search, or hybrid retrieval architectures.

  • (Bonus) Interest in building tools that democratize high-performance systems for broader engineering teams.

Why Join Us
  • Collaborate with a highly technical team solving hard problems at the edge of AI and healthcare.

  • Work with bleeding-edge GPU infrastructure and build systems that push the limits of what's possible.

  • Be a foundational part of shaping AI-native infrastructure for real-time, mission-critical applications.

  • Help accelerate a meaningful product that improves how clinicians work and patients are cared for.

Sully.ai is an equal opportunity employer. In addition to EEO being the law, it is a policy that is fully consistent with our principles. All qualified applicants will receive consideration for employment without regard to status as a protected veteran or a qualified individual with a disability, or other protected status such as race, religion, color, national origin, sex, sexual orientation, gender identity, genetic information, pregnancy or age. Sully.ai prohibits any form of workplace harassment. 

Top Skills

C++
CUDA
DeepSpeed
Docker
Hugging Face Transformers
Pulumi
Python
TensorRT
Terraform
vLLM
