
Novaprime

Staff Data Engineer

Posted 4 Days Ago
In-Office
12 Locations
170K-185K Annually
Senior level

***This role is fully remote within the U.S., with occasional travel to meet teams or partners. To reduce the influx of fraudulent applicants, we are not listing it as remote.***

About Us:

Novaprime is a mortgage technology company dedicated to reducing the cost of originating loans by leveraging emerging technologies, with a strong focus on AI and Distributed Ledger Technology (DLT). We accomplish our goals through data-driven innovation, working with some of the world's largest institutions, and delivering measurable outcomes. Novaprime is backed by key investors across the mortgage industry, venture capital, and financial services.

Job Description:

Novaprime is hiring a Staff Data Engineer to architect, build, and operate our Databricks-centric lakehouse on AWS. You will own the full data lifecycle (streaming and batch ingestion, modeling, governance, quality, observability, and cost and performance management) using Delta Lake, Delta Live Tables, and Databricks Workflows. This is a hands-on leadership role: you will set technical direction, deliver mission-critical pipelines, mentor engineers, and drive analytics directly by defining trusted metrics, instrumentation, and monitoring alongside product and ML teams. To succeed, you should enjoy thinking in systems and be committed to continuous learning.
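To give candidates a flavor of the quality gates this lifecycle implies, here is a minimal sketch of a DLT-expectation-style check written in plain Python for illustration (this is not the actual Delta Live Tables API; `apply_expectations` and the loan fields are hypothetical):

```python
# Hypothetical sketch of a data-quality expectation gate, in the spirit of
# DLT expectations but written as plain Python for illustration only.

def apply_expectations(records, expectations):
    """Split records into passing rows and quarantined rows with their failed checks."""
    passed, quarantined = [], []
    for rec in records:
        failures = [name for name, check in expectations.items() if not check(rec)]
        (quarantined if failures else passed).append((rec, failures))
    return [rec for rec, _ in passed], quarantined

loans = [
    {"loan_id": "L-1", "amount": 350_000, "state": "MA"},
    {"loan_id": "L-2", "amount": -10, "state": "MA"},     # fails the amount check
    {"loan_id": None, "amount": 200_000, "state": "NH"},  # fails the key check
]

expectations = {
    "valid_key": lambda r: r["loan_id"] is not None,
    "positive_amount": lambda r: r["amount"] > 0,
}

clean, bad = apply_expectations(loans, expectations)
print(len(clean), len(bad))  # 1 clean record, 2 quarantined with reasons attached
```

In production this shape maps onto declarative expectations with quarantine tables rather than hand-rolled loops; the sketch only shows the pass/quarantine split the bullet points below refer to.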

Responsibilities:

  • Implement new technologies that yield competitive advantages and are aligned with our business goals.

  • Drive development from concept to market by combining various technologies and collaborating with a cross-functional team.

  • Define the lakehouse architecture and standards on Databricks (Unity Catalog governance, Workflows, DLT, Delta Lake).

  • Build and operate high-reliability streaming and batch pipelines with Structured Streaming, Auto Loader, CDC patterns, and backfills.

  • Design medallion data models and canonical domains; implement SCDs, schema evolution, and versioned/time-travel datasets.

  • Establish data quality, SLAs/SLOs, lineage/traceability, and audit-ready documentation aligned to SOC 2.

  • Drive analytics: define and govern KPI/metric definitions, build metrics pipelines, enable semantic consistency, and implement monitoring/alerting for data and dashboards.

  • Optimize cost and performance on Databricks (cluster policies, sizing, Photon, AQE, partitioning, file sizing, skew mitigation, Z-ORDER/OPTIMIZE).

  • Enforce security and privacy (Unity Catalog permissions, row/column-level controls, PII masking/tokenization, secrets management).

  • Enable self-serve with standardized, well-documented datasets; collaborate with ML on feature pipelines and Feature Store.

  • Champion software excellence: Git-based workflows, code reviews, automated testing, CI/CD for data, and IaC.

  • Collaborate with product managers, designers, and other stakeholders to develop strategies and implement new products and features.

  • Stay current with the latest technologies to maintain competitiveness and technological leadership in the market.

  • Take on other engineering tasks that advance the organization’s mission.
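For candidates less familiar with the SCD patterns named above, a minimal sketch of a Type 2 upsert in plain Python (this illustrates the pattern only, not Delta Lake MERGE syntax; field names like `borrower_id` are hypothetical):

```python
# Illustrative SCD Type 2 upsert: expire the current version of a changed
# entity and append a new current version with validity dates.

def scd2_upsert(history, incoming, key, tracked, as_of):
    """Close out changed current rows and append new versions (Type 2 pattern)."""
    current = {row[key]: row for row in history if row["is_current"]}
    for new in incoming:
        old = current.get(new[key])
        if old and all(old[f] == new[f] for f in tracked):
            continue  # no tracked attribute changed; keep the existing current row
        if old:
            old["is_current"] = False  # expire the superseded version
            old["valid_to"] = as_of
        history.append({**new, "valid_from": as_of, "valid_to": None, "is_current": True})
    return history

history = [{"borrower_id": 7, "state": "MA", "valid_from": "2024-01-01",
            "valid_to": None, "is_current": True}]
incoming = [{"borrower_id": 7, "state": "NH"}]
history = scd2_upsert(history, incoming, "borrower_id", ["state"], "2025-06-01")
print([(r["state"], r["is_current"]) for r in history])
# [('MA', False), ('NH', True)]
```

On Databricks the same pattern is typically expressed as a `MERGE INTO` against a Delta table; the sketch just makes the expire-and-append mechanics explicit.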

Requirements:

  • B.S. in Computer Science or equivalent experience.

  • 8+ years building and operating production data platforms; 4+ years deep, hands-on Databricks/Spark (PySpark + SQL).

  • Proven ownership of a production lakehouse (S3 + Delta Lake) with strict SLAs and compliance requirements.

  • Expertise with Delta Lake (MERGE/CDC, schema evolution, time travel, OPTIMIZE/Z-ORDER, VACUUM) and DLT, Workflows, Auto Loader; Feature Store experience in production.

  • Strong data modeling (dimensional, canonical), SCD Types 1/2, and handling slowly changing entities and schema drift.

  • Track record delivering trustworthy datasets with monitoring, alerting, lineage, and clear documentation; able to define and maintain metric layers consumed by product and business.

  • Advanced Python and SQL; testing culture (pytest), CI/CD (GitHub Actions), and Terraform for Databricks; solid Git practices.

  • AWS foundations: S3, IAM, networking basics; event ingestion.

  • Excellent communication and leadership; able to drive design reviews, write clear technical docs, and mentor engineers in a remote, async environment.
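As one concrete example of the testing culture and PII handling mentioned in the requirements, here is a hypothetical sketch pairing a pure tokenization function with a pytest-style test (function and field names are illustrative, not from our codebase):

```python
# Hypothetical example: deterministic PII tokenization plus a pytest-style test.

import hashlib

def mask_pii(record, pii_fields=("ssn", "email")):
    """Return a copy of the record with PII fields replaced by stable tokens."""
    masked = dict(record)
    for field in pii_fields:
        if masked.get(field) is not None:
            # Truncated SHA-256 gives a stable, join-safe token for analytics use.
            masked[field] = hashlib.sha256(str(masked[field]).encode()).hexdigest()[:12]
    return masked

def test_mask_pii_masks_only_pii_fields():
    rec = {"loan_id": "L-9", "ssn": "123-45-6789", "email": "a@b.com"}
    out = mask_pii(rec)
    assert out["loan_id"] == "L-9"              # non-PII fields untouched
    assert out["ssn"] != rec["ssn"]             # PII replaced by a token
    assert out["ssn"] == mask_pii(rec)["ssn"]   # tokenization is deterministic

test_mask_pii_masks_only_pii_fields()  # also runnable without a pytest runner
```

Keeping transformations pure like this is what makes them cheap to unit-test in CI before they ever touch a cluster.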

Desired Experience:

  • Databricks SQL/Serverless, Unity Catalog lineage/system tables, and semantic layer experience.

  • Product analytics and observability: Mixpanel and New Relic.

  • Prior leadership of SOC 2 audits/readiness and data platform on-call rotations.

  • Previous startup experience.

Benefits:

  • Competitive salary, equity, and benefits.

  • Mostly remote work.

  • Opportunity to make a difference for millions and their ability to be homeowners.

We are an equal-opportunity employer and value diversity at our company. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Top Skills

AWS
Databricks
Delta Lake
Git
Python
SQL
Terraform


