
Tavus

Senior Data Engineer

Posted Yesterday
In-Office or Remote
2 Locations
150K-200K Annually
Senior level
As a Senior Data Engineer, you will own the data strategy, build and optimize data pipelines, and ensure high-quality datasets for AI models, collaborating closely with ML engineers.
The summary above was generated by AI
About Us

Tavus is a research lab pioneering human computing. We’re building AI Humans: a new interface that closes the gap between people and machines, free from the friction of today’s systems. Our real-time human simulation models let machines see, hear, respond, and even look real—enabling meaningful, face-to-face conversations. AI Humans combine the emotional intelligence of humans with the reach and reliability of machines, making them capable, trusted agents available 24/7, in every language, on our terms.

Imagine a therapist anyone can afford. A personal trainer that adapts to your schedule. A fleet of medical assistants that can give every patient the attention they need. With Tavus, individuals, enterprises, and developers can all build AI Humans to connect, understand, and act with empathy at scale.

We’re a Series A company backed by world-class investors including Sequoia Capital, Y Combinator, and Scale Venture Partners.

Be part of shaping a future where humans and machines truly understand each other.

The Role

Data is the foundation of everything we build. We’re looking for a Senior Data Engineer who goes beyond pipelines and cleaning datasets. You’ll own our entire data strategy, from sourcing and curating to structuring and optimizing, ensuring our models and products are powered by the highest-quality data possible. You’re a true master of your craft, including data sourcing, formatting, labeling, cleaning, and making use of our internal data.

Your Mission 🚀
  • Be a data guru – You anticipate data needs not just for today but for the future. You know how to curate diverse, high-quality datasets to ensure AI models reach their full potential.

  • Influence AI model training – Your data work will directly impact AI model performance, efficiency, and inference accuracy. You will collaborate closely with ML engineers to optimize datasets for maximum AI effectiveness.

  • Own, build, and scale the data pipeline – You will be heavily involved in data sourcing, and you will expand and own the curation, filtering, and preprocessing pipelines across a variety of data modalities.

  • Be a data hunter – Web scraping, third-party deals, unconventional sources—you’ll find, collect, and curate the best multimodal data (text, video, images) to power our models. Manage large-scale data procurement to ensure our models train on the highest quality information.

  • Be a video data craftsman – We’re building something truly unique based on a blend of video and audio data. Throwing data at the problem is not a solution here, but you should be up for the challenge of making it work! You will own this challenge and ensure that our video and audio datasets are structured for AI success. You will help us truly flesh out the capabilities of our SOTA models!

  • Optimize labeling & automation – You will own the data labeling process and build automated workflows to make cleaning, labeling, and structuring data as efficient as possible. Work closely with our data annotation teams to ensure high-quality labeled data for ML models.

  • Turn internal data into gold – Our own platform is a goldmine of insights—help us unlock and use it to drive smarter decisions and supercharge growth.

  • Speed + precision – Move fast, but don’t break data. Every pipeline, dataset, and workflow should be tight, efficient, and built to last.

What We’re Looking For 🔥
  • You don’t just maintain; you build. From zero to fully running pipelines, you make things happen. You can take charge of how we use internal data to make smarter decisions.

  • Extreme ownership – You own data strategy end-to-end, proactively determining what data we need, where to get it, and how to structure it for AI impact.

  • Strategic mindset – You think beyond pipelines—you anticipate data needs before they arise and help shape AI development at Tavus.

  • Previous work with LLMs and multimodal data is a big plus. You know how to source, structure, and optimize data for real AI impact.

  • Automation expert – You know how to automate data cleaning, structuring, and labeling workflows for efficiency and scale.

  • ML-first mindset – You understand that better data = better models and structure datasets to maximize AI model accuracy.

  • Fast, but flawless. Speed matters, but so does accuracy. You balance both.

  • You don’t follow best practices; you create them. A lot of what we’re doing is new, so you set the standard for how data should be done.

  • Technical expertise – You have strong experience with Python, SQL, and large-scale data processing tools.

Bonus Points if:
  • You have previous work with LLMs and multimodal data. You know how to source, structure, and optimize data for real AI impact.

  • You have experience with in-house video data collection and relevant studio setups. You know best practices for multimodal video and audio data collection.

 

Benefits & Culture

When you join Tavus, you’re joining a diverse and supportive team. Our work is driven by our people, and our success is shared by all. This position has a flexible work schedule, unlimited PTO, competitive healthcare, and gear stipends, as well as plenty of fun. At the end of the day, we want Tavus to be a place for you to learn, directly drive impact, and work with a team you love.

To learn more about our team culture and benefits, check out our hiring page.

Tavus is growing fast, and we’d like you to grow with us. If you’re excited to get your hands dirty and help make machines more human, drop your resume and we’ll be in touch.

We are not looking for cultural fits, we are looking for culture creators. Diversity is what drives our success – it’s at the core of how we hire, communicate, and work. We are inclusive to all and combine our diverse backgrounds, skill sets, and perspectives to build the best experiences for our clients.

Top Skills

Large-Scale Data Processing Tools
Python
SQL
