Flagship Pioneering

Senior Data Engineer, Applied AI

Posted 13 Days Ago
Cambridge, MA
Mid level
As a Data Engineer in the Applied AI group, you'll build data pipelines for AI-driven scientific platforms, collaborating with cross-functional teams to ensure effective data management and support machine learning workflows.
The summary above was generated by AI

🚀 About Lila Sciences

Lila Sciences is the world’s first scientific superintelligence platform and autonomous lab for life, chemistry, and materials science. We are pioneering a new age of boundless discovery by building the capabilities to apply AI to every aspect of the scientific method. We are introducing scientific superintelligence to solve humankind's greatest challenges, enabling scientists to bring forth solutions in human health, climate, and sustainability at a pace and scale never experienced before. Learn more about this mission at www.lila.ai.

At Lila, we are uniquely cross-functional and collaborative. We are actively reimagining the way teams work together and communicate. Therefore, we seek individuals with an inclusive mindset and a diversity of thought. Our teams thrive in unstructured and creative environments. All voices are heard because we know that experience comes in many forms, skills are transferable, and passion goes a long way.

If this sounds like an environment you’d love to work in, even if you only have some of the experience listed below, please apply.

🌟 Your Impact at Lila

We are seeking a Senior Data Engineer to join our Applied AI group and build the data pipelines and tools that power our AI-driven scientific platform. In this role, you will design, implement, and maintain large-scale data architectures, ensuring secure and efficient data flows for advanced machine learning and generative AI models. You will also collaborate closely with scientists, software engineers, and product managers to translate internal algorithms and diverse datasets into production-ready tools and knowledgebases. This is a unique opportunity to shape the foundation of our AI capabilities by leveraging distributed systems, AWS, and Kubernetes to deliver scalable, reliable solutions that drive our agentic AI systems.

🛠️ What You'll Be Building

  • Data Pipeline Development: Design and implement robust, scalable data pipelines to support machine learning and generative AI workflows, including Retrieval-Augmented Generation (RAG).
  • Distributed Systems: Architect and manage distributed data-processing systems that handle large volumes of structured and unstructured data in real time.
  • Cloud & Infrastructure: Leverage AWS services (e.g., S3, EC2, and Lambda) to build highly available, fault-tolerant data solutions; utilize Kubernetes for container orchestration and scalability.
  • Integration & Collaboration: Work cross-functionally with scientists, engineers, and product managers to define platform requirements, integrate new data sources, and ensure seamless data flow into AI/ML pipelines.
  • Data Governance & Quality: Establish best practices for data security, compliance, and quality assurance, ensuring the reliability and integrity of all datasets used in production.
  • Performance Optimization: Monitor and optimize data workflows for throughput, fault tolerance, and cost efficiency; implement robust logging, monitoring, and alerting for production readiness.

🧰 What You’ll Need to Succeed

  • Educational Background: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • Professional Experience: 5+ years of experience building and maintaining production-grade data pipelines or distributed systems.
  • Proficiency in Python: Strong Python skills with a solid grasp of object-oriented programming principles and common data engineering libraries/frameworks.
  • Relational Databases: Fluency in relational database usage (e.g., PostgreSQL) for schema design, query optimization, and data governance.
  • AWS Expertise: Hands-on experience with AWS cloud services for data ingestion, storage, and processing; comfortable designing and deploying infrastructure-as-code solutions.
  • Distributed Systems Knowledge: Demonstrated ability to implement and manage distributed data-processing systems (e.g., Spark, Kafka, or similar).
  • Communication & Collaboration: Exceptional communication skills with the ability to explain complex technical concepts to both technical and non-technical stakeholders.

✨ Bonus Points For

  • Experience with ML & Generative AI: Prior work on data pipelines specifically supporting ML or generative AI models; familiarity with the MLOps lifecycle.
  • Retrieval-Augmented Generation (RAG): Hands-on experience with RAG techniques and knowledgebases for AI systems.
  • Kubernetes Proficiency: Comfort with container orchestration and scaling using Kubernetes.
  • Agentic AI Systems: Exposure to or experience building agent-driven platforms where AI systems autonomously execute complex tasks.
  • Startup Environment: Experience adapting quickly and delivering results in a fast-paced, evolving environment.
  • Domain Background: Exposure to life sciences, materials science, or related fields.

🌈 We’re All In

Lila Sciences is committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status.

🤝 A Note to Agencies

Lila Sciences does not accept unsolicited resumes from any source other than candidates. The submission of unsolicited resumes by recruitment or staffing agencies to Lila Sciences or its employees is strictly prohibited unless contacted directly by Lila Sciences’ internal Talent Acquisition team. Any resume submitted by an agency in the absence of a signed agreement will automatically become the property of Lila Sciences, and Lila Sciences will not owe any referral or other fees with respect thereto.

Top Skills

AWS
Kafka
Kubernetes
Postgres
Python
Spark

HQ

Flagship Pioneering Cambridge, Massachusetts, USA Office

55 Cambridge Parkway, Suite 800E, Cambridge, MA, United States, 02142
