hud

Open Role

Posted Yesterday
In-Office or Remote
20 Locations
Mid level
The role involves developing CUA evaluation frameworks, improving developer experience, conducting sales, and leading research teams in a fast-growing startup.
The summary above was generated by AI
About HUD

HUD (YC W25) is developing agentic evals for Computer Use Agents (CUAs) that browse the web. Our CUA Evals framework is the first comprehensive evaluation tool for CUAs.

Our Mission: People don't actually know if AI agents are working. To make AI agents work in the real world, we need detailed evals for a huge range of tasks.

We're backed by Y Combinator, and work closely with frontier AI labs to provide agent evaluation infrastructure at scale.

About the role

HUD is a fast-growing startup. If you can't find a role on our job board, feel free to suggest a new role, and we'll reach out if we find a good fit. :)

Things we might hire for:
  • Building new evaluations/eval environments for HUD's CUA evaluation framework

  • Building out our CUA evals framework

  • Conducting outbound sales, developing partnerships and improving developer experience for CUA developers

  • Leading and supporting teams of research engineers as they build out our evals

  • General startup operations as we scale

Experience

Strong candidates may have:

  • Engagement with AI Safety and AI alignment

  • Understanding of LLM evaluation frameworks, particularly multimodal and agentic evaluations

  • Familiarity with using and deploying the latest AI tools for operational efficiency

  • Experience in full-stack LLM deployment, particularly for multimodal and agentic AI evaluations

  • Prior experience in fast-growing startup teams

Team & Company Details
  • Team Size: ~15 people currently, mostly full-time in-person, but some remote.

  • Our team: Includes 4 international Olympiad medallists (IOI, ILO, IPhO), serial AI startup founders, and researchers with publications at ICLR, NeurIPS, etc.

  • Company stage: We have received $2 million in seed funding, plus very strong demand and revenue growth beyond that. We are scaling profitably and fast to meet demand.

Logistics
  • Employment: Full-time preferred, but we're willing to consider internships.

  • Location: Remote-friendly, but if you’re in the San Francisco Bay Area, we have an office where you can work with the team. We prioritise applicants who can attend meetings in Pacific Time (UTC-7:00/-8:00) or China/Singapore Time (UTC+8:00).

  • Visa Sponsorship: We provide support for relocation and visas for strong full-time candidates. For part-time/contract/internship arrangements, we'll work fully remote (which makes things simpler anyway).

  • Timeline: Applications are rolling. The process should involve 1-2 interviews and take less than a week.

We prioritize operational aptitude and cultural fit. Motivated candidates are encouraged to apply even if they don't meet all criteria.

Due to high volume, we may not actively respond to every application, but feel free to contact us at [email protected] or elsewhere if we missed your application!

Top Skills

AI
Evaluation Frameworks
LLMs
