We are looking for a Senior Data & AI Platform Engineer to build internal tools and services on top of our large-scale data infrastructure. Your primary focus will be developing systems that leverage vector embeddings, LLM APIs, and semantic search to unlock value from structured and unstructured data.
This is a hands-on engineering role for someone who enjoys building practical AI-powered tools — not just experiments — and shipping them into production in a fast-moving startup environment.
What You’ll Do
Design and build data-driven tools that operate on large datasets stored in S3 and Snowflake
Implement pipelines that:
Extract specific columns or datasets from Snowflake
Generate vector embeddings via APIs such as OpenAI
Store and manage embeddings in vector databases like Pinecone
Enable semantic search and similarity-based retrieval
Develop enrichment workflows that:
Query structured data
Use LLM APIs to generate new derived columns
Write enriched results back into Snowflake
Build reusable internal services and SDKs around embedding generation, prompt orchestration, and data augmentation
Optimize performance and cost across AWS infrastructure
Work closely with product and data teams to turn use cases into scalable engineering solutions
Ensure reliability, observability, and maintainability of AI-powered pipelines
Example Projects
Tool to extract a single Snowflake column, generate embeddings, push to Pinecone, and expose a semantic search API
Batch enrichment pipeline that queries records from Snowflake, calls OpenAI APIs for structured enrichment, and writes new columns back
Internal framework for LLM-based data transformation and validation
Query abstraction layer to make AI-enhanced analytics accessible to non-engineering teams
What We’re Looking For
5+ years of software engineering experience
Strong backend engineering skills (Python preferred; other modern languages acceptable)
Solid experience with:
AWS (IAM, Lambda, ECS/EKS, S3, networking, security best practices)
Data warehousing (Snowflake preferred)
API design and distributed systems
Hands-on experience working with LLM APIs (e.g., OpenAI) and embedding workflows
Experience with vector databases (Pinecone or similar)
Strong understanding of data modeling, ETL/ELT patterns, and performance optimization
Production experience in at least one startup environment
Ability to operate independently and ship high-impact systems end-to-end
Nice to Have
Experience building internal developer platforms or data tooling
Familiarity with prompt engineering and evaluation pipelines
Experience with orchestration frameworks (Airflow, Prefect, Dagster)
Exposure to retrieval-augmented generation (RAG) systems
Infrastructure-as-code experience (Terraform, CDK)
Experience managing large-scale embedding refresh and re-indexing workflows
What Success Looks Like
Engineers and analysts can easily leverage AI-powered data enrichment
Embedding-based search works reliably at scale
New AI use cases can be implemented quickly using shared internal tooling
Systems are robust, observable, and cost-efficient
Why Join Us
Work on practical, production-grade AI systems
Direct impact on how data is leveraged across the company
Startup speed with real ownership and autonomy
Opportunity to define the internal AI platform from the ground up
RevenueBase Boston, Massachusetts, USA Office
Boston, MA, United States, 02460