Site Reliability Engineer (SRE)
Company Overview
Withings revolutionized connected health by launching the world’s first Wi-Fi scale in 2009. Since then, we’ve become known for innovative devices that pair timeless design with advanced sensing capabilities. Our award-winning ecosystem includes the world’s first activity-tracking analog wristwatch, an advanced sleep-tracking mat, and medically accurate devices for precise and effortless blood pressure and body temperature monitoring. Our mission is to bring the power of health and activity data into your everyday life, so you can stick around longer for your loved ones.
Job Summary
We are seeking a well-qualified, highly motivated candidate to join our DevOps team as a Site Reliability Engineer (SRE). The DevOps team is responsible for ensuring that our platform is fast and stable for the millions of active devices it serves around the globe, while remaining agile and scalable enough to meet future demand. We accomplish this by adhering to the principles of observability and automation, and by choosing the right tool for each problem.
To optimize performance and efficiency, we use a hybrid bare-metal and cloud infrastructure, controlling as much of the stack as we reasonably can. We frequently adapt our platform and database architecture to support our growth.
Day-to-day responsibilities may include:
- Supporting the availability and speed of our production applications
- Resolving alerts and reducing manual tasks through automation
- Managing databases (debugging, upgrades)
- Improving continuous integration pipelines
- Troubleshooting web services and improving their performance
- Additional operational responsibilities
Requirements:
- Servers: Ubuntu (KVM, LXC and physical hosts)
- Cloud: AWS, GCP and OVH
- Databases: Cassandra/ScyllaDB, PostgreSQL, MySQL, Riak, Redis Cluster
- Configuration Management: Ansible, Terraform
- Languages: Python, PHP and Bash
- Bachelor’s Degree or higher in Computer Science (or equivalent experience)
- Must have a valid passport and be able to travel internationally up to 10% of the time
Leading candidates will understand and adhere to the principles of site reliability engineering (shared ownership, work reduction through automation, operations through software) and be ready to enthusiastically meet the challenges of supporting high-performance, high-availability applications in a 24/7, real-time, heavy-traffic environment. If that sounds like you, please get in touch!