Rula

Staff Data Engineer - AI (Remote)

Posted 18 Hours Ago
In-Office or Remote
2 Locations
184K-217K
Senior level
The Staff Data Engineer will build and maintain data pipelines for ML models in a collaborative team, focusing on quality and scalability, within a remote work setting.
The summary above was generated by AI

We believe that mental health is just as important as physical health. We recognize that mental health issues can be complex and multifaceted, and we are dedicated to treating the whole person, not just the symptoms.

We aim to create a world where mental health is no longer stigmatized or marginalized, but rather is embraced as an integral part of one's overall well-being. 

We believe that by providing quality care that is both evidence-based and compassionate, we can empower individuals to take charge of their mental health and achieve their full potential. We are passionate about making a positive impact on the lives of those struggling with mental health issues and we strive to be a force for positive change in the field of mental healthcare.

About the Role

We’re shaping the future of mental health care with AI-enabled experiences that enhance, not replace, the human connection at the core of therapy. Our north star is clinically-grounded and responsible AI designed to bring greater transparency, personalization, and continuous support across the therapy journey. Our work transforms therapy into an experience that’s more connected and accessible. As we expand our portfolio of AI experiences, we’re scaling our team to drive innovation and set a new standard for mental health care.

As a Staff Data Engineer, you will help build and maintain the data pipelines that pull information from our central storage system to train machine learning models and AI tools, enabling a variety of use cases that support our providers and improve patient outcomes. You will be part of a collaborative group that values open discussions and quick adjustments to meet changing needs, working alongside data experts and other specialists to turn raw information into useful resources for our mission. This role sits within our data team, which is part of the overall engineering organization and partners closely with our ML team. Your daily work (designing reliable flows of information, testing for accuracy, and solving unexpected challenges) will directly support innovations that help more individuals get the mental health support they deserve. If you enjoy turning complex data into something that makes a real difference in people's lives, this is your chance to contribute to meaningful advancements in health care.

Required Qualifications

  • 8+ years of data pipeline development – specifically building and maintaining scalable ETL/ELT pipelines for ML/AI training workflows, using tools like AWS Glue, dbt, Dagster, Spark, or Ray for distributed processing of large-scale structured and unstructured data from data lakes. Strong proficiency in Spark, Python, and SQL for feature engineering, data transformation, and ensuring high-quality, versioned datasets suitable for model training and inference.

  • 8+ years of cloud infrastructure and data warehousing experience, including 4+ years focused on AWS. Proficiency in AWS services such as Redshift, S3, Glue, IAM, EMR, and SageMaker for supporting ML/AI pipelines. Candidates may bring additional experience from other cloud environments (e.g., GCP services like BigQuery, GCS, Dataflow, or AI Platform; Azure services like Synapse Analytics, Blob Storage, Databricks, or Machine Learning Studio) to complement their AWS expertise. Experience optimizing data warehouses (e.g., Redshift, Snowflake, BigQuery) and managing data lakes (e.g., S3, GCS, Azure Blob) for large-scale, versioned ML training datasets, with a focus on partitioning, access controls, and integration with distributed processing frameworks like Spark.

  • Experience implementing scalable data validation, quality checks, and error-handling mechanisms tailored for ML/AI pipelines, including bias detection, anomaly identification, and dataset integrity checks to ensure high-fidelity training data. Familiarity with data governance practices, such as metadata management, lineage tracking for reproducible models, and compliance with regulations like CPRA or HIPAA in data lake environments.

  • Experience optimizing data pipelines and queries and managing large datasets for efficiency and scalability. Knowledge of best practices for high-throughput systems.

  • Experience with data security measures (encryption, role-based access control, data masking). Understanding of compliance standards (e.g., HIPAA, SOC 2) and their application in data engineering.

  • Strong ability to work cross-functionally with data analysts, data scientists, and stakeholders. Effective communication skills to explain technical concepts to non-technical audiences. Adaptability to thrive in a fast-paced startup environment.

Preferred Qualifications

While having the preferred qualifications enhances your candidacy, having all of them is not mandatory. We encourage all interested applicants to apply, even those who may not meet every preferred requirement.

  • Hands-on experience with AWS tools like S3, Glue, EMR, SageMaker, and Lambda for building scalable ETL/ELT pipelines optimized for ML/LLM training, including feature engineering, data versioning, and handling large-scale unstructured data.

  • Proven track record of implementing robust data validation, bias detection, and lineage tracking in data lakes, with familiarity with compliance standards (e.g., HIPAA for health data) and tools like Delta Lake or Iceberg to ensure high-fidelity, reproducible datasets for model training.

  • Familiarity with infrastructure as code (IaC) tools like Terraform or CloudFormation for managing cloud resources.

  • Experience implementing and maintaining CI/CD pipelines for data workflows.

  • Experience in monitoring and reducing costs for large-scale ML/AI workflows in AWS, using techniques like spot instances for training jobs, auto-scaling EMR clusters, and efficient S3 storage tiers (e.g., Intelligent-Tiering) to minimize expenses while maintaining performance.

  • Strong ability to partner with data scientists and ML engineers to design efficient pipelines, using orchestration tools (e.g., Airflow, Dagster) for incremental loading and cost optimization, while monitoring performance metrics like latency and resource utilization in AWS environments.

We're serious about your well-being! As part of our team, full-time employees receive:

  • 100% remote work environment (US-based only): Working hours to support a healthy work-life balance, ensuring you can meet both professional and personal commitments

  • Attractive pay and benefits: Full transparency of pay ranges regardless of where you live in the United States

  • Comprehensive health benefits: Medical, dental, vision, life, disability, and FSA/HSA

  • 401(k) plan access: Start saving for your future

  • Generous time-off policies: Including 2 company-wide shutdown weeks each year for self-care (for most employees)

  • Paid parental leave: Available for all parents, including birthing, non-birthing, adopting, and fostering

  • Employee Assistance Program (EAP): Support for your mental and physical health

  • New hire home office stipend: Set up your workspace for success

  • Quarterly department stipend: Fund team-building activities or in-person gatherings

  • Wellness events and lunch & learns: Explore a variety of engaging topics

  • Community and employee resource groups: Participate in groups that celebrate employee identity and lived experiences, fostering a sense of community and belonging for all

Our team

We believe that diversity, equity, and inclusion are fundamental to our mission of making mental healthcare work for everyone. We are dedicated to having a culture of inclusion that will support our employees in feeling safe, seen, heard, and valued.

Compensation Range: $184.1K - $216.6K


Top Skills

AWS Glue
AWS Redshift
Azure Synapse Analytics
Dagster
Dataflow
dbt
EMR
GCP BigQuery
GCS
Python
Ray
S3
SageMaker
Spark
SQL
