Principal Data Center Systems Engineer
PathAI's mission is to improve patient outcomes with AI-powered pathology. Our platform promises substantial improvements to the accuracy of diagnosis and the efficacy of treatment of diseases like cancer, leveraging modern approaches in machine learning. Our team, comprising diverse employees with a wide range of backgrounds and experiences, is passionate about solving challenging problems and making a huge impact.
We're looking for a skilled engineer to work in our budding Tech-Ops group, which is focused on designing, building, and operating our hybrid cloud/on-prem environment. This position will focus on our on-prem AI compute center, which will do the heavy lifting for our growing ML teams.
If you're the right candidate, you'll be exercising all the skills you have and building new ones along the way:
- Designing, building, and operating our data center for our growing Machine Learning team
- Leading a small team of 3 engineers covering the compute, storage, and networking needs of our Machine Learning team
- Integrating our data center with our existing cloud infrastructure to create a seamless hybrid cloud environment
- Using your knowledge of networking, storage, and Linux to create a robust and scalable environment for PathAI
- Improving the capacity of our infrastructure through capacity planning, budgeting and forecasting, and implementation
- Improving the reliability and resilience of our infrastructure through root-cause analysis and reviewing gaps in designs and implementations of our infrastructure
- Working closely with an SRE Platform team delivering Kubernetes on top of the data center systems
Requirements
Our employees' skills come in all shapes and sizes, but to be successful in this role with us, you'll at least need:
- Engineering skills. You’re a generalist in the tech-ops space with knowledge of enterprise-grade hardware such as routers, firewalls, switches, load balancers, and storage arrays, as well as Linux systems.
- 5+ years of relevant experience
- Automation. You work hard to eliminate toil by automating everything through scripting, configuration management tools (Puppet/Chef/Ansible), code, and proper tooling.
- Operations experience. You’ve managed critical production infrastructure and are familiar with incident response, scaling, and the challenges of rapid growth.
- You’ve written tooling for operational teams to use in their day-to-day work.
- Some experience and opinions on virtualization, containerization, or container orchestration platforms.
- A bachelor's degree in Computer Science or equivalent experience
- An insatiable intellectual curiosity and the ability to learn quickly in a complex space
Benefits
For the right candidate, we'll offer a competitive salary plus equity. Your compensation is rounded out by a strong benefits package:
- Flexible work hours, with work-from-home options available
- Three weeks of paid leave per year, an additional two weeks of sick time, plus extended holidays and team-approved leave
- Ten days of 100% subsidized childcare per year
- Healthcare, vision, and dental insurance plans (HMO or PPO), with voluntary add-ons available for dependent care, life, and accident coverage
- Commuter benefit available for public transit or parking
Most importantly, you'll be doing important work with a team of people you'll genuinely enjoy spending the day with.
PathAI is an equal opportunity employer, dedicated to creating a workplace that is free of harassment and discrimination. We base our employment decisions on business needs, job requirements, and qualifications — that's all. We do not discriminate based on race, gender, religion, health, personal beliefs, age, family or parental status, or any other status. We don't tolerate any kind of discrimination or bias, and we are looking for teammates who feel the same way.