Senior Data Engineer - Data Connectivity at Thrasio
Key responsibilities include:
- Design, develop, and operate highly scalable, high-performance, and low-cost data pipelines on distributed data processing platforms using AWS/cloud technologies.
- Collaborate with Architects, Engineers, Product Managers, Analysts, and Data Scientists across the organization to construct complex data sources for metrics, algorithms, and machine learning models.
- Own and interface with other technology teams to extract, transform, and load data from a wide variety of data sources using SQL and AWS big data technologies.
- Create and support batch and real-time data pipelines built on AWS technologies, including API Gateway, Glue, Kinesis, EMR, Lambda, and Athena.
- Develop tools, plug-ins, and solutions for CI/CD workflows using tools such as Jenkins, CloudFormation, Docker, GitHub, and related services.
- Continually research the latest big data and visualization technologies to provide new capabilities and increase efficiency.
- Drive and participate in architecture discussions, influence the product roadmap, and take ownership of and responsibility for new projects.
- Maintain and support existing platforms and evolve them to newer technology stacks and architectures.
- Collaborate with the Data team, product owners, business stakeholders, and the Scrum Master to refine and estimate stories and epics.
- Be an integral part of the Scrum team, delivering on commitments on time and with high quality.
- Review and ensure all code documentation is complete and updated periodically, and review the work of junior associates on the team.
- Analyze business problems and deliver optimal solutions using AWS technologies.
What you bring to the party:
- 5+ years of industry experience in software development, data engineering, business intelligence, or related fields
- Passion for building scalable solutions that extract and process big data sets to support accurate decision making
- Experience with software configuration management tools (Git) and with CI/CD pipelines and their enabling tools, such as Jenkins, Nexus, etc.
- Ability to write complex SQL statements.
- Experience working with cloud infrastructure services like Amazon Web Services and Google Cloud is preferred but not required.
- Proficiency in a programming language, especially for data manipulation, and the ability to build, maintain, and deploy sequences of automated processes with it. We use Python but are open to any object-oriented programming language such as Java, C++, etc.
- Knowledge of, and strong advocacy for, software engineering best practices across the development lifecycle, including agile methodologies, coding standards, code reviews, source management, build processes, testing, and operations.
- Ability to work with Product and non-technical teams to gather requirements and translate them into data engineering tasks.
- Excellent written and verbal communication skills
Nice to have, but not required:
- BS or MS in Computer Science, Engineering, Information Technology, Analytics, or an equivalent quantitative degree.
- Experience using data transformation tools such as pandas, PySpark, etc.
- Experience designing microservices-based applications and data pipelines, along with familiarity with ETL concepts.
- Related experience in an e-commerce organization with a recurring revenue model is a strong plus.
- Experience using business intelligence reporting tools (Power BI, Tableau, etc.)
- Experience with at least one cloud platform: Azure, Amazon, or Google. Thrasio uses Amazon, so experience there is a plus.
- Experience working with AWS big data technologies (S3, Athena, EMR, Glue, Lambda, Kinesis, API Gateway, RDS, DynamoDB, Step Functions, etc.).