Lead Platform Engineer
At DataRobot we’ve developed a powerful machine learning application, underpinned by a flexible platform that runs in many environments, from customer hardware to the cloud. We are seeking talented engineers with a strong background in Linux, programming, and distributed computing to build core services and components that make up the DataRobot stack and deliver them to the most challenging customer environments.
The Platform Engineering team has built a framework that makes it possible to add any service to the DataRobot application, build it, test it, configure it, and ship it to our cloud environment and our on-premise customers. We manage the entire dependency stack, optimizing our product for security, reliability, and performance, as well as extreme portability. We make it possible for DataRobot to run anywhere from single VMs in the cloud to large, high-performance computing clusters, conforming to stringent security protocols for sensitive data.
In particular, you will:
- Build and improve the DataRobot installation framework.
- Automate the creation of infrastructure for our development, test, and production environments.
- Integrate new services and architectural components into the DataRobot application stack.
- Improve core systems and services related to logging, monitoring, security, and administration.
After you develop mastery of the software stack and company structure, you will work with product owners, engineering leadership, and other development teams to design projects that exceed our customers' expectations. You will have significant freedom to define and grow your area of ownership and drive hiring, product management, people management, and development.
- Leadership skills: qualified candidates will have
- End-to-end project lifecycle experience: you have brought multiple projects through ideation, planning, implementation, and production maintenance
- Excellent communication skills
- Previous experience as a team lead or manager
- A strong focus on delivering value
- Programming skills: qualified candidates will have
- Experience developing applications or tools using Python, Java, or similar (Python strongly preferred).
- Comfort writing unit and functional tests of their code.
- Ability to create distributable packages using setuptools, Maven, or similar.
- Experience building CI/CD systems to test, deploy, and ship their code.
- Linux skills: qualified candidates will have
- Comfort with the Bash CLI and scripting.
- Familiarity with multiple Linux distributions.
- Strong troubleshooting skills: finding and parsing logs, inspecting system status, and working in distributed systems.
- Operational skills: excellent candidates will have
- Experience with cloud infrastructure providers (AWS strongly preferred).
- Experience building and maintaining production environments and services.
- Experience with the full lifecycle of software development, from development to productionization.
- Desire and ability to lead our Kubernetes deployments to SaaS and on-premise environments and enable teams to develop applications leveraging its API.
- Experience with continuous integration and continuous delivery: developing automation for build, test, deployment, and release processes.
- Experience with configuration management tools such as Ansible, Puppet, or SaltStack.
- Experience with container orchestration technologies such as Mesos, Kubernetes, or Docker Swarm.
- Knowledge of server provisioning systems such as Terraform, Packer, or Cobbler.
- Experience administering or using Hadoop.
- Experience delivering software to on-premise environments.
- Expertise in Linux packaging and build tools.
Individuals seeking employment at DataRobot are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation.