Chaos Labs builds financial AI products that power safer, more accessible markets. Our risk management systems, analytics, and AI platform serve hundreds of billions in value for leading protocols and exchanges, including Kraken, Aave, Ethena, and Pendle. Since our founding in 2021, we've set the industry standard for on-chain risk management.
The Role:
We’re looking for a Senior AI Data Scientist to lead the design, evaluation, and evolution of the agentic systems behind Chaos AI.
This role sits at the intersection of LLMs, financial data, and decision systems. You’ll own how models reason over data, how agents coordinate and make decisions, and how we rigorously measure quality, correctness, and risk in production AI workflows.
You’ll work closely with product, engineering, and research to move from experimentation to reliable, scalable AI systems that operate under real financial constraints.
Responsibilities:
- Design and own single- and multi-agent systems that reason, plan, and act over complex financial workflows
- Define agent behavior, memory, and tool-use strategies with a strong emphasis on correctness and controllability
- Develop and maintain LLM evaluation frameworks covering accuracy, faithfulness, latency, cost, regressions, and edge cases
- Design structured prompting, schemas, and tool-calling strategies for production LLM systems
- Build and operate MCP servers, including schema design, permissions, and safety boundaries
- Analyze model behavior and failure modes, turning qualitative issues into measurable signals
- Partner with engineering to productionize research insights, including observability, retries, state management, and reliability
- Optimize system performance and cost across models, workflows, and agent architectures
- Mentor engineers and data scientists, setting best practices for applied LLM and agentic systems
Qualifications:
- 6+ years of experience in data science, applied ML, or AI-focused software roles
- 3+ years building production AI / ML systems, with ownership beyond experimentation
- Deep hands-on experience with LLMs, agentic patterns, and tool-calling systems
- Strong Python skills and comfort working close to production systems and APIs
- Experience with RAG pipelines, embeddings, and vector databases
- Strong intuition for model behavior, trade-offs, and failure analysis
Preferred Qualifications:
- Experience applying ML or statistical methods to financial data (crypto/DeFi a plus)
- Strong intuition for model behavior in production and how it shifts with data, users, and constraints
- Track record of building reusable ML or data abstractions that meaningfully improved team velocity or decision-making
Our Perks:
- Paid Time Off - 21 vacation days + 7 sick days + 8 observed U.S. company holidays
- Health Coverage - 100% employer-paid options (medical, dental & vision) for you and your dependents
- FSA / HSA Options - depending on selected health insurance plan
- Parental Leave - thoughtful policy to support your family needs
- Wellness Programs - OneMedical, Teladoc, Talkspace, and EAP
- Career Growth - opportunities in a rapidly expanding, global technology company with personalized professional development
- Pre-tax Commuter Benefits
- Compensation & Equity - competitive package aligned with growth and merit
- Salary and seniority level commensurate with experience (Range: $130K–$150K)