Flagship Pioneering

Principal Data Engineer, Platform

Posted 24 Days Ago

Cambridge, MA

100K-150K

Senior level

As Principal Data Engineer, lead the design and implementation of data storage solutions, ensuring efficient management and governance of scientific datasets across diverse platforms.

The summary above was generated by AI

🚀 About Lila Sciences

Lila Sciences is the world’s first scientific superintelligence platform and autonomous lab for life, chemistry, and materials science. We are pioneering a new age of boundless discovery by building the capabilities to apply AI to every aspect of the scientific method. We are introducing scientific superintelligence to solve humankind's greatest challenges, enabling scientists to bring forth solutions in human health, climate, and sustainability at a pace and scale never experienced before. Learn more about this mission at www.lila.ai

At Lila, we are uniquely cross-functional and collaborative. We are actively reimagining the way teams work together and communicate. Therefore, we seek individuals with an inclusive mindset and a diversity of thought. Our teams thrive in unstructured and creative environments. All voices are heard because we know that experience comes in many forms, skills are transferable, and passion goes a long way.

If this sounds like an environment you’d love to work in, even if you only have some of the experience listed below, please apply.

🌟 Your Impact at Lila

As the Principal Data Engineer, you will be the technical leader of our data storage strategy. You will work as a member of the platform team to design and implement a system that efficiently stores, organizes, and retrieves complex datasets from various sources (e.g., laboratory instruments, simulations, ML systems). You will also set the direction for data infrastructure best practices, ensuring security, compliance, and top-tier performance. Your work will enable engineers and scientists to leverage high-quality, curated data to drive scientific discoveries and real-world applications.

🛠️ What You'll Be Building

Design and implement a scalable data lake and/or data warehouse architecture optimized for large volumes of heterogeneous scientific data.
Drive optimizations for query performance and data retrieval, reducing time to insight for end-users and downstream systems.
Implement data governance processes, including data cataloging, lineage, and quality controls.
Participate in the software development life cycle and drive continuous improvement, focusing on designing, implementing, and maintaining software services.
Develop reusable code, libraries, APIs, and services to improve efficiency and scalability.
Align development with strategic goals, ensuring software supports broader organizational needs.
Manage git repositories and CI/CD pipelines, enforce best practices, and foster a collaborative development culture.
Support infrastructure as code and design efficient deployment strategies.
Utilize observability tooling to monitor and optimize software performance.
Write clear, concise documentation for both engineers and end users.

🧰 What You’ll Need to Succeed

Minimum of 5 years of experience managing data systems in a production setting.
Python coding experience in the data domain.
Acute listening skills and patience to deeply understand user challenges.
Experience implementing petabyte-scale data solutions.
Excellent problem-solving skills and team-first mentality.
Strong communication skills to effectively collaborate with team members and stakeholders across different data domains.
Energetic self-starter and independent thinker, with strong attention to detail.
Eager to work with highly skilled and dynamic teams in a fast-paced, entrepreneurial, and technical setting.

✨ Bonus Points For

Experience with workflow orchestration software (e.g., Temporal, Airflow, Dagster, Prefect).
Familiarity with data science and ML libraries (pandas, numpy, scipy).
Knowledge of modern developer tools (pydantic, pyright, uv, poetry).
Experience working in Kubernetes environments.
Familiarity with AWS services (e.g., IAM, RDS, S3, Redshift).

🌈 We’re All In

Lila Sciences is committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status.

🤝 A Note to Agencies

Lila Sciences does not accept unsolicited resumes from any source other than candidates. The submission of unsolicited resumes by recruitment or staffing agencies to Lila Sciences or its employees is strictly prohibited unless the agency has been contacted directly by Lila Sciences' internal Talent Acquisition team. Any resume submitted by an agency in the absence of a signed agreement will automatically become the property of Lila Sciences, and Lila Sciences will not owe any referral or other fees with respect thereto.

Top Skills

Airflow

AWS

Dagster

Kubernetes

Numpy

Pandas

Poetry

Prefect

Pydantic

Pyright

Python

Scipy

Temporal

55 Cambridge Parkway, Suite 800E, Cambridge, MA, United States, 02142

What you need to know about the Boston Tech Scene

Boston is a powerhouse for technology innovation thanks to world-class research universities like MIT and Harvard and a robust pipeline of venture capital investment. Host to the first telephone call and one of the first general-purpose computers ever put into use, Boston is now a hub for biotechnology, robotics and artificial intelligence — though it’s also home to several B2B software giants. So it’s no surprise that the city consistently ranks among the greatest startup ecosystems in the world.

Key Facts About Boston Tech

Number of Tech Workers: 269,000; 9.4% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Thermo Fisher Scientific, Toast, Klaviyo, HubSpot, DraftKings
Key Industries: Artificial intelligence, biotechnology, robotics, software, aerospace
Funding Landscape: $15.7 billion in venture capital funding in 2024 (Pitchbook)
Notable Investors: Summit Partners, Volition Capital, Bain Capital Ventures, MassVentures, Highland Capital Partners
Research Centers and Universities: MIT, Harvard University, Boston College, Tufts University, Boston University, Northeastern University, Smithsonian Astrophysical Observatory, National Bureau of Economic Research, Broad Institute, Lowell Center for Space Science & Technology, National Emerging Infectious Diseases Laboratories
