Design, build, and scale AI infrastructure across cloud and bare metal environments. Collaborate with AI researchers and backend engineers, and automate infrastructure workflows for reliability and performance.
About the Role
We’re looking for a Senior Infrastructure Engineer to help us design, build, and scale the foundational architecture that powers our next-generation AI systems. This role is ideal for someone who thrives in a fast-paced, engineering-driven environment and finds joy in creating robust, elegant systems from scratch.
What You’ll Do
- Build and maintain stable, scalable, and highly available compute infrastructure, spanning cloud (AWS) and bare metal environments.
- Design and operate efficient storage solutions for large-scale AI training datasets and checkpoints.
- Develop high-performance online inference systems, optimizing for diverse GPU environments (e.g., H100, B200).
- Automate infra workflows to maximize reliability, observability, and performance across our platform.
- Collaborate closely with AI researchers and backend engineers to support evolving model deployment and experimentation needs.
- Lead and contribute to internal tooling, CI/CD pipelines (e.g., GitHub Actions), and monitoring infrastructure (e.g., Grafana, Prometheus, OpenTelemetry).
What We’re Looking For
- 3+ years of experience in DevOps, SRE, or infrastructure roles.
- Strong programming ability in Python or Golang (must be proficient in at least one).
- Production-level experience with Kubernetes in daily operations.
- Deep understanding of modern DevOps / SRE / Infra principles, especially around scalability, automation, and fault-tolerance.
- Hands-on experience with AWS services (e.g., EC2, S3, EKS, IAM, RDS, CloudFront).
- Ability to work independently, lead projects, and become a subject matter expert.
- Strong communication skills and a collaborative, self-motivated mindset.
Nice to Have
- A tinkerer’s spirit: you enjoy hacking, experimenting, and building things for fun or learning. Examples we value:
  - Running your own home lab or mini data center.
  - Building clever side projects or open-source tools.
  - Writing high-quality technical articles or contributing to public repos.
  - Developing lightweight, efficient tooling for infra monitoring or ops.
- Contributions to the open-source community or prior experience maintaining open/closed source systems.
- Familiarity with model training workflows (e.g., LLMs, GPUs, large data IO).
- Interest in working directly with AI researchers and understanding model performance trade-offs.
- Brain: We value intelligence and the pursuit of knowledge. Our team is composed of some of the brightest minds in the industry.
- Heart: We care deeply about our work, our users, and each other. Empathy and passion drive us forward.
- Gut: We trust our instincts and are not afraid to take bold risks. Innovation requires courage.
- Taste: We have a keen eye for quality and aesthetics. Our products are not just functional but also beautiful.
- Competitive salary, equity, and benefits package.
- Opportunity to work with a talented and passionate team at the forefront of AI and 3D technology.
- Flexible work environment, with options for remote and on-site work.
- Opportunities for fast professional growth and development.
- An inclusive culture that values creativity, innovation, and collaboration.
- Unlimited, flexible time off.
Benefits
- Competitive salary, benefits and stock options.
- 401(k) plan for employees.
- Comprehensive health, dental, and vision insurance.
- The latest and best office equipment.
Top Skills
AWS
CI/CD
GitHub Actions
Go
Grafana
Kubernetes
OpenTelemetry
Prometheus
Python