Lead Machine Learning Engineer, Inference & Performance

Egen

Posted Yesterday

Remote

Hiring Remotely in USA

159K-250K Annually

Senior level

Design, optimize, and operate production LLM inference and training pipelines. Improve latency, throughput, and GPU utilization using techniques like batching, quantization, FlashAttention, and kernel-level profiling. Deploy and autoscale multiple models on shared GKE GPU clusters, consult with clients on performance and cost requirements, and carry prototypes to robust, scalable production services.

The summary above was generated by AI

About Egen:

Egen is a fast-growing and entrepreneurial company with a data-first mindset. We bring together the best engineering talent working with the most advanced technology platforms, including Google Cloud and Salesforce, to help clients drive action and impact through data and insights. We are committed to being a place where the best people choose to work so they can apply their engineering and technology expertise to envision what is next for how data and platforms can change the world for the better. We are dedicated to learning, thrive on solving tough problems, and continually innovate to achieve fast, effective results. If this describes you, we want you on our team.

Want to learn more about life at Egen? Check out these resources in addition to the job description.

Meet Egen

Life at Egen

Culture and Values at Egen

Career Development at Egen

Benefits at Egen

About the opportunity:

As a Senior AI Engineer, you will be at the forefront of our Generative AI initiatives. We treat AI as a software engineering discipline. You will be responsible for the full lifecycle of our AI features—specifically document intelligence and RAG pipelines—taking them from initial prototype to robust, scalable production services. You will solve for real-world constraints like latency, error handling, and cost optimization.

You’ll collaborate with a diverse range of clients to translate business needs into high-performance AI architectures. This role requires a blend of deep technical expertise in LLMs and a disciplined Software Engineering approach to ensure our solutions are robust, ethical, and scalable.

What You Will Do:

Optimize Inference: Build and tune production LLM serving with vLLM and SGLang—maximizing throughput and minimizing latency through batching, paged attention, quantization, and KV-cache strategies

Profile & Accelerate Training: Instrument and profile training runs to find bottlenecks, then resolve them with the right attention implementations (e.g. FlashAttention) tuned to the underlying hardware (H200, GB200)

Engineer for the Hardware: Apply a working understanding of GPU architecture and attention internals to choose the right approach per accelerator, rather than relying on defaults

Serve at Scale: Deploy and operate multiple models within shared GPU clusters on GKE, with autoscaling, efficient bin-packing, and graceful handling of mixed workloads

Drive Efficiency: Own GPU utilization as a first-class metric—measure it, improve throughput-per-dollar, and continuously raise the ceiling on what our fleet can deliver

Collaborate & Consult: Work directly with clients to understand performance, latency, and cost requirements, and translate them into pragmatic serving and training architectures

Your Technical Toolkit:

Core Languages: Mastery of Python and shell scripting; comfort reading and reasoning about lower-level (CUDA-adjacent) performance code is a strong plus
Inference Frameworks: Hands-on experience with vLLM, SGLang, or comparable high-performance serving stacks
GPU & Model Internals: Solid grasp of GPU architecture, the fundamentals of LLM inference, and the attention mechanism—including where the bottlenecks live and how FlashAttention and similar techniques address them across hardware generations (H200, GB200)
Profiling: Fluency with profiling tools to diagnose training and inference bottlenecks (compute-bound vs. memory-bound, kernel-level analysis)
Infrastructure: Strong Kubernetes (GKE) experience—deploying and autoscaling multiple models on shared GPU clusters on Google Cloud
Mindset: A strong software engineering foundation—you write clean, maintainable code, measure before optimizing, and understand the full SDLC

Basic Qualifications:

Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field
5+ years of experience in ML/AI engineering, with a meaningful portion focused on performance, infrastructure, or systems
Proven track record of deploying and optimizing models in a production environment
Demonstrated experience profiling and improving GPU utilization for training and/or inference
Experience with Classic Machine Learning (neural nets, training, tuning) is a strong plus
Knowledge of Data Engineering and SQL

Personal Attributes:

Ownership: You take pride in your work and see optimizations through from profile to production
Curiosity: Hardware and serving frameworks change fast; you are a lifelong learner who stays ahead of the curve
Rigor: You measure before you optimize and let data, not intuition, guide where you spend effort
Consultative Spirit: You enjoy interacting with clients and can translate technical complexity into business value
Ethics: You prioritize responsible AI development and data privacy

Compensation & Benefits:

This role is eligible for our competitive salary and comprehensive benefits package to support your well-being:

- Comprehensive Health Insurance

- Paid Leave (Vacation/PTO)

- Paid Holidays

- Sick Leave

- Parental Leave

- Bereavement Leave

- 401(k) Employer Match

- Employee Referral Bonuses

Check out our complete list of benefits here: https://egen.ai/people/#benefits

Important: All roles are subject to standard hiring verification practices, which may include background checks, employment verification, and other relevant checks.

EEO and Accommodations:

Egen is an equal opportunity employer and is committed to inclusion, diversity, and equity in the workplace. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, or any other characteristic protected by federal, state, or local laws. Egen will also consider qualified applicants with criminal histories, consistent with legal requirements. Egen welcomes and encourages applications from individuals with disabilities. Reasonable accommodations are available for candidates during all aspects of the selection process. Please advise the talent acquisition team if you require accommodations during the interview process.
