Principal Software Engineer – Data Warehouse
TripAdvisor, the world’s largest travel site, operates at scale with over 700 million reviews, opinions, photos, and videos reaching over 390 million unique visitors each month. We are a data-driven company, and we have lots and lots of data! The Data Warehouse team is responsible for building and managing the infrastructure and tools that enable the rest of the company to interact with the petabytes of data in our data lake.
Our mission is to create a new world-class analytics infrastructure for TripAdvisor. We just finished a massive project to build a new ETL platform based on Kafka and Samza, which delivers site activity to our data warehouse with less than 7 minutes of latency. Internal adoption of this platform has drastically changed how front-end teams track and analyze site activity, and current forecasts have it handling tens of terabytes of data a day in the near future.
Our next epoch will focus on building better tools and infrastructure for analyzing and making use of this data, as well as a new data platform enabling real-time access to this data across TripAdvisor and its subsidiaries. This is largely a green-field effort, and we are looking for individuals who can help us design and build this new analytics infrastructure from the ground up.
This is a hands-on job for an engineer handling petabytes of data. The job requires both serious technical chops and effective communication skills. You will provide technical leadership in an environment that moves fast, encourages collaboration across teams and among team members, and expects you to own your projects.
The Data Warehouse team works closely with our Analytics team to manage all stages of the data pipeline. We use technologies like Spark, Hive, Presto, and Snowflake in our ETL pipelines, which expose the data to our analysts and machine learning applications. Some members of our team even do a bit of UI work for the internal tools we have built and maintain; these include tools that automatically detect data anomalies, enable analysts and engineers to easily build and manage their own ETLs, and manage the many A/B tests that TripAdvisor runs on a regular basis. Lastly, we use some of the latest machine learning libraries, including TensorFlow, in our tools and analysis.
In addition to building the software that is the backbone of how TripAdvisor processes data, the warehouse team has operational responsibility for ensuring that our infrastructure meets the agreed-upon SLAs.
We are looking for someone with a strong sense of responsibility: taking pride in your work, leveraging others, owning the problem.
- Bachelor of Science in Computer Science, Engineering or equivalent
- 8+ years of large-scale, full-life-cycle development experience
- In-depth technical experience with big data technologies such as Hadoop (HDFS, Hive, MapReduce), Spark, and Kafka/Samza
- Experience managing big data systems in a cloud setting is a plus
- General software engineering and programming experience; most of our larger projects are in Java, but we also have lots of scripts in Python and Bash
- Linux experience
- Ops / DevOps experience is a plus
- ETL and SQL expertise is a plus