Want to Become a Data Engineer? Here’s What You Should Know

Leaders from Abacus Insights and Panalgo describe the skills that define data engineers.

Written by Olivia McClure
Published on Feb. 23, 2022

Like many people with mathematical backgrounds, Abacus Insights Chief Architect Ali Benamara began his tech journey as a software engineer. Yet, he ultimately discovered his passion extended beyond coding. 

As a data engineering leader, Benamara builds solutions for clients while leveraging a variety of skills, such as cloud computing and automation scripting. Having the opportunity to spend his days collecting data from heterogeneous sources and analyzing them at scale fuels his enthusiasm for his career. 

“My passion for my work is driven by the large-scale, impactful problem-solving I do each day,” Benamara said.

At Abacus Insights, Benamara tackles challenges related to the organization and exchange of healthcare data. For him and his teammates, solving complex problems involves breaking down data silos and ultimately helping healthcare organizations uncover differentiated insights. 

While data collection and analysis are major components of data engineering, they’re not the only aspects of the job. According to Panalgo Data Engineering Manager Eric Girouard, scalability is central to the field, and details such as data size and key size matter when developing solutions for healthcare data analysis. 

Recently, his team automated the process used to determine which healthcare database features should be exposed to the company’s customers. While this project was challenging, Girouard said the final product saves engineers countless hours of work while reducing manual testing. 

For Benamara and Girouard, data engineering has unlocked opportunities to make an impact and stretch their skills. Built In Boston caught up with both professionals to get a glimpse of their day-to-day work. 

 

Ali Benamara
Chief Architect  • Abacus Insights

 

Abacus Insights’ platform enables healthcare organizations to organize and exchange data from disparate sources and formats. 

 

What led you to become a data engineer?

Having a background in mathematics led me to become a software engineer. Yet, as I grew professionally, I began to gravitate toward the complex challenges of big data. I enjoyed the process of collecting data from heterogeneous sources and analyzing them at scale, so I became a data engineer. 

 

How does data engineering differ from traditional software development, and what skills do you leverage the most in your daily work?

While software engineers write code that solves challenging problems, data engineers focus on ingesting, storing and analyzing data at scale. I leverage a wide range of skills, including Python coding, the extract, transform and load (ETL) process, cloud computing, cloud data storage, data security, automation scripting and machine learning algorithms. I also use a variety of big data tools, such as Apache Spark, Apache Airflow, Delta Lake and Apache Parquet.




Describe a project you’re currently working on. What have you enjoyed about it, and what kind of challenges has your team faced so far?

I’m currently building a solution for a healthcare company. Our goal with this project is to reduce the time it takes to fully ingest and process data. The client’s existing architecture takes around 14 hours to finish processing, so we’re redesigning the system to deliver processed data in real time. 

It’s been incredibly rewarding to build something from the ground up alongside committed team members, yet we’ve encountered obstacles along the way. For example, moving data processing from batches to real time proved challenging: we initially expected it to take one month, but it ultimately took three. Despite the roadblocks, I enjoy working with healthcare and technology experts who are supportive and constantly teach me new things.
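The shift Benamara describes, from batch processing to real-time delivery, can be illustrated in miniature with Python generators. This is a generic sketch, not the team’s design: process, batch_pipeline, streaming_pipeline and arrivals are hypothetical names, and a production system would typically rely on a stream processor, such as Spark Structured Streaming, rather than a single Python loop.

```python
import itertools
from typing import Iterable, Iterator


def process(record: dict) -> dict:
    """Hypothetical per-record transformation: normalize the status field."""
    return {**record, "status": record["status"].upper()}


def batch_pipeline(records: Iterable[dict]) -> list[dict]:
    """Batch style: no output is available until every input record is processed."""
    return [process(r) for r in records]


def streaming_pipeline(records: Iterable[dict]) -> Iterator[dict]:
    """Streaming style: each record is yielded as soon as it is processed."""
    for r in records:
        yield process(r)


def arrivals() -> Iterator[dict]:
    """Hypothetical unbounded source, e.g. records arriving from a live feed."""
    for i in itertools.count(1):
        yield {"id": i, "status": "received"}


if __name__ == "__main__":
    # Batch processing works on a finite input, but only returns at the end.
    print(batch_pipeline([{"id": 1, "status": "pending"}]))
    # Streaming delivers results from an endless source one at a time;
    # calling batch_pipeline(arrivals()) instead would never return.
    for result in itertools.islice(streaming_pipeline(arrivals()), 3):
        print(result)
```

The batch version returns nothing until the whole input has been consumed, which is why a large batch job can take hours before any results appear. The streaming version hands each record downstream as soon as it is transformed, so it keeps working even on a source that never ends.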

 

 

 

Eric Girouard
Data Engineering Manager • Panalgo

 

Panalgo’s platform is designed to streamline healthcare data analytics by removing the need for complex programming. 

 

What led you to become a data engineer?

In a sense, data engineering found me. I have a background in mathematics and computer science, so the engineering roles that stood out to me most were those that involved working with large amounts of data. 

 

How does data engineering differ from traditional software development, and what skills do you leverage the most in your daily work?

Data engineering always focuses on scalability. For instance, a solution that works fine for 10 GB of data may come crashing down when you run 10 TB through it. Even small details, such as key size, add up. For example, the key "user_birthday" repeated 100,000,000 times takes up 1.3 GB, while "bd" consumes only 200 MB.
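To make the key-size arithmetic concrete, here is a minimal Python sketch. It assumes each key name is stored in full once per record as plain UTF-8 bytes, as in row-oriented formats such as JSON Lines, and it ignores quotes, separators and compression; the function name repeated_key_bytes is illustrative.

```python
def repeated_key_bytes(key: str, repetitions: int) -> int:
    """Bytes used by a key name stored once per record (UTF-8, no other overhead)."""
    return len(key.encode("utf-8")) * repetitions


N = 100_000_000  # number of records, as in the example above

for key in ("user_birthday", "bd"):
    size = repeated_key_bytes(key, N)
    # Decimal units: 1 GB = 10**9 bytes.
    print(f"{key!r}: {size:,} bytes = {size / 1e9:.1f} GB")
```

The 13-character "user_birthday" costs 13 bytes per record versus 2 bytes for "bd", which is where 1.3 GB versus 200 MB comes from. Columnar formats such as Apache Parquet record column names once in the file’s schema rather than in every row, so this particular overhead matters most in row-oriented formats.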

Key data engineering skills include an understanding of distributed computing frameworks, such as Apache Spark, and mastery of data structures, runtime complexity and cloud storage solutions, such as those offered by Amazon Web Services.



Describe a project you recently worked on. What was its impact, and what kind of challenges did your team face?

In the healthcare analytics field, one of the most challenging and manual aspects of hosting diverse datasets and exposing them to customers is determining which features each database supports. Claims databases differ structurally from medical record databases, which in turn differ from hospital billing databases. 

Previously, we carefully examined the features and fields available for each database type and then manually wrote configuration code to expose them. I recently automated this process by scanning the databases themselves to determine which features should be exposed to customers. With more than 600 databases in flight, there was ample room for error, and the available features varied widely from one database to the next, so every case had to be considered. While this was certainly challenging, the final product saves countless engineering hours and requires less manual testing.
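The article does not describe how Panalgo’s scanner works, so the following is only a hypothetical sketch of the general idea: introspect each database’s schema and expose a feature only when every table and column it depends on is present. It uses SQLite’s built-in introspection (the sqlite_master table and PRAGMA table_info) for convenience; the feature names, table names and column names in FEATURE_REQUIREMENTS are invented for illustration.

```python
import sqlite3

# Hypothetical feature catalog: feature name -> (table, columns it requires).
FEATURE_REQUIREMENTS = {
    "diagnoses": ("claims", {"patient_id", "diagnosis_code"}),
    "prescriptions": ("pharmacy", {"patient_id", "ndc_code", "fill_date"}),
    "billing_totals": ("billing", {"patient_id", "amount"}),
}


def scan_schema(conn: sqlite3.Connection) -> dict[str, set[str]]:
    """Return {table_name: {column_names}} by introspecting the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    schema = {}
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = {col[1] for col in columns}
    return schema


def exposable_features(conn: sqlite3.Connection) -> list[str]:
    """List features whose required table and columns all exist in this database."""
    schema = scan_schema(conn)
    return sorted(
        feature
        for feature, (table, required) in FEATURE_REQUIREMENTS.items()
        if required <= schema.get(table, set())  # subset check on column names
    )


if __name__ == "__main__":
    # A toy claims-style database: claims data, but no pharmacy or billing tables.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (patient_id INTEGER, diagnosis_code TEXT)")
    print(exposable_features(conn))
```

Keeping the requirements declarative means a new database type doesn’t need hand-written configuration. A real system would likely also need to check data quality and value coverage, not just whether the columns exist.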

 

 

Responses have been edited for length and clarity. Images via listed companies and Shutterstock.
