How Data Scientists Are Turning New Trends Into Gold

In the fast-moving field of data science, practitioners are tracking emerging trends to stay on the cutting edge.

Written by Anderson Chen
Published on Jun. 30, 2022

In the 15 years since Google launched Street View, using eye-catching vehicles plastered with Google iconography and topped with scaffolded camera rigs, the company’s Maps platform has amassed more than 2 billion images covering 10 million miles in more than 100 countries and territories.

In that same time, Netflix has become a streaming giant, using its data trove to recommend shows and to inform executive decisions about which productions to greenlight; Amazon has leveraged purchase graph analytics, a complex web of interconnected behavioral data points, to capture 40 percent of the US ecommerce market; and Spotify and TikTok, with their respective data-backed algorithms, have carved out their own dominant, hyper-personalized niches.

Data is the new currency, and big tech companies are the new-age treasuries. For data scientists in this modern data race, keeping up with fast-moving trends in data tools and techniques matters more than ever in deciding who claims the spoils.

Aggregated data now governs many processes and products across the tech industry. From MLOps, which brings DevOps and software engineering practices to machine learning, to more efficient edge computing and A/B testing, advances in data technology tailored to a given field can improve workflows by turning intellectual investment into practical results.

Data scientists with a finger on the pulse of data innovation create ripple effects in both directions: new tools and processes make teams more efficient, and the resulting revenue gains and reduced costs reach the executive suite. According to McKinsey, leaving gaps in the production process that advanced data analytics or AI-powered automation could fill means leaving money on the table. Efficiency guided by data metrics frees employees to do what they do best: innovate, create and collaborate.

To get a closer look at how some of Boston’s tech companies are turning data into gold, Built In Boston sat down with three data scientists to get a better sense of what innovative data trends they’re keeping a close eye on — and the bounties they may bring. 

 

Thomas Littrell
Senior Manager, Data Science • Perch

 

Perch is a technology-driven ecommerce platform that helps microbrands scale. Focused on Amazon’s third-party marketplace, the company accelerates sellers’ products with its growth playbook, which spans everything from SEO practices and product roadmaps to marketing methods and pricing strategies. Because it is responsible for these brands, Perch has to stay on the cutting edge. Thomas Littrell, senior manager of data science, said the company is investing in specific tools to improve processes across teams.

 

What’s one data science trend you’re watching closely right now, and why?

I’m excited by the rise of MLOps. Historically, practitioners often worked as craftspeople, with messy code in Jupyter notebooks, little testing and loose feedback from production, if any. As a result, despite heavy investment, organizations struggled to derive value from data science. Models were often never deployed, were abandoned in production or, worst of all, went unmonitored and silently became value-destructive.

MLOps encompasses a wide range of technologies and ideas, but the core is bringing software engineering and DevOps best practices to machine learning. Data scientists now have the tools and organizational structures to test data quality; track experiments, version and test models; monitor deployed models; and continuously improve based on feedback at scale. The result is industrialized data science that realizes more value by developing and deploying models quickly and confidently.
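To make "test data quality" concrete, here is a minimal, standard-library-only sketch of a row-level schema check, the kind of validation that libraries such as pandera automate. It is not pandera's API, and the table columns (`sku`, `price`, `units_sold`) and rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    column: str
    check: Callable[[object], bool]
    description: str


# Hypothetical rules for a hypothetical sales table; adjust to your own schema
RULES = [
    Rule("sku", lambda v: isinstance(v, str) and v != "", "sku must be a non-empty string"),
    Rule("price", lambda v: isinstance(v, (int, float)) and v > 0, "price must be positive"),
    Rule("units_sold", lambda v: isinstance(v, int) and v >= 0,
         "units_sold must be a non-negative integer"),
]


def validate(rows: list[dict], rules: list[Rule] = RULES) -> list[str]:
    """Return human-readable failures; an empty list means the batch passed."""
    failures = []
    for i, row in enumerate(rows):
        for rule in rules:
            if rule.column not in row:
                failures.append(f"row {i}: missing column {rule.column!r}")
            elif not rule.check(row[rule.column]):
                failures.append(f"row {i}: {rule.description} (got {row[rule.column]!r})")
    return failures


batch = [
    {"sku": "A-1", "price": 19.99, "units_sold": 3},
    {"sku": "", "price": -5.0, "units_sold": 2},
]
for problem in validate(batch):
    print(problem)
```

In a pipeline, a non-empty result would typically fail the run before bad data reaches model training.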

 

What influence will this trend have on your work or your industry? 

While MLOps is changing most of the data science workflow, two perhaps less appreciated points are how it will affect the type of value we deliver and the skills data scientists need. MLOps allows data scientists to deliver value not just through ad hoc analyses but by deploying decision systems that handle routine tasks, in some cases better than people can, and free up human time for less routine work. At Perch, for example, we’re investing in automated pricing and demand forecasting so that brand managers can focus on larger questions like new products and channel strategy. On the talent side, as models become commoditized and MLOps tools proliferate, software engineering skills like version control, testing and working with cloud tools and environments are becoming more important.


 

How are you getting ahead of this trend now? 

While we’re still young, Perch is building a solid foundation in MLOps, both technically and organizationally. We have a modern data platform built on Snowflake, and the data science team uses Kedro and pandera to structure our pipelines, track experiments and validate data quality. As we deploy our first models, we’ve integrated CI/CD in GitLab and are investing in model monitoring capabilities to ensure that our models behave as expected in production. Ultimately, the point of these tools is to create more value by iterating quickly, which also requires the right organizational structure. As part of the growth team, we have clear stakeholders and priorities, and we get immediate feedback on our outputs.
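Model monitoring can take many forms. One simple, widely used signal is the Population Stability Index (PSI), which compares the distribution of a feature or model score in production against a reference sample. The sketch below is a hypothetical example with simulated data; the bin count and the 0.2 alert level are common rules of thumb, not Perch's settings.

```python
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a production
    sample, using equal-width bins over the reference sample's range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to width 1 if the reference is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    p_ref = proportions(reference)
    p_cur = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))


# Usage with simulated model scores (hypothetical data)
rng = random.Random(0)
reference = [rng.gauss(0, 1) for _ in range(5000)]
stable = [rng.gauss(0, 1) for _ in range(5000)]
shifted = [rng.gauss(1, 1) for _ in range(5000)]
for name, sample in [("stable", stable), ("shifted", shifted)]:
    score = psi(reference, sample)
    # 0.2 is a commonly cited rule-of-thumb alert level, not a universal standard
    print(f"{name}: PSI={score:.3f}", "ALERT" if score > 0.2 else "ok")
```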

 

 

Bryce Casavant
Senior Data Scientist • WHOOP

 

Whoop is a wearable health and fitness tracker and performance platform. The healthtech company recently rolled out a variety of monitoring features such as skin temperature, blood oxygen and heart metrics. Alongside its wearable hardware, Whoop uses its data platform to help customers optimize wellness as a whole package: fitness, sleep and behavior. For a senior data scientist like Bryce Casavant, exploring more efficient experimentation at such a data-driven company goes a long way toward improving his work.

 

What’s one data science trend you’re watching closely right now, and why?

One trend I’m following is Bayesian A/B testing. Traditionally, A/B tests use the frequentist approach, but Bayesian A/B testing is more powerful. One of its main advantages is the ability to call a test earlier, which lets stakeholders iterate faster instead of waiting for a large enough sample size.

Another benefit is that the Bayesian approach outputs a distribution for each variant’s outcome. These distributions can be compared directly to measure the probability that one variant is superior to another, which is a more intuitive way to understand their differences than the p-value returned by the frequentist approach. Finally, instead of relying on a chosen significance level, the Bayesian approach provides a way to manage risk directly, for example through the expected loss of picking the wrong variant.
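As a rough sketch of how this works for a conversion-rate test, assume each variant's conversions follow a Binomial model with a uniform Beta(1, 1) prior. Each posterior is then a Beta distribution, and sampling from both posteriors gives the probability that B beats A along with the expected loss of shipping each variant, one common way to quantify the risk described above. The model, prior and numbers are assumptions for illustration, not WHOOP's framework.

```python
import random


def compare_variants(conv_a, n_a, conv_b, n_b, draws=50_000, prior=(1.0, 1.0), seed=None):
    """Monte Carlo comparison of two conversion rates under a Beta-Binomial model.

    Returns (P(rate_B > rate_A), expected loss of shipping B, expected loss of shipping A).
    Expected loss is the conversion rate we would give up, on average, if the
    shipped variant turned out to be the worse one.
    """
    rng = random.Random(seed)
    alpha0, beta0 = prior
    b_wins = 0
    loss_if_b = 0.0
    loss_if_a = 0.0
    for _ in range(draws):
        # Each posterior is Beta(prior alpha + conversions, prior beta + non-conversions)
        pa = rng.betavariate(alpha0 + conv_a, beta0 + n_a - conv_a)
        pb = rng.betavariate(alpha0 + conv_b, beta0 + n_b - conv_b)
        if pb > pa:
            b_wins += 1
        loss_if_b += max(pa - pb, 0.0)
        loss_if_a += max(pb - pa, 0.0)
    return b_wins / draws, loss_if_b / draws, loss_if_a / draws


# Hypothetical results: A converted 120 of 1,000 visitors, B converted 150 of 1,000
p_b_better, risk_b, risk_a = compare_variants(120, 1000, 150, 1000, seed=42)
print(f"P(B > A) = {p_b_better:.3f}")
print(f"Expected loss if we ship B = {risk_b:.5f}, if we ship A = {risk_a:.5f}")
```

Reporting expected loss alongside P(B > A) shows not only how likely B is to be better, but also how much is at stake if that call is wrong.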


 

What influence will this trend have on your work or your industry? 

At Whoop, a Bayesian A/B testing framework is saving us time by allowing us to stop tests early when we see a clear winning variant, rather than waiting for the test to reach enough statistical power. We’ve started implementing Bayesian A/B testing using PyMC, a probabilistic programming library for Python.
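One common way to implement "stop when there is a clear winner" is an expected-loss rule: check after each batch of traffic and stop once the expected loss of shipping the current leader falls below a tolerance the team is willing to accept. The sketch below simulates that loop using closed-form Beta posteriors from Python's standard library rather than PyMC; the tolerance, batch size and true conversion rates are all hypothetical.

```python
import random


def expected_losses(conv_a, n_a, conv_b, n_b, draws, rng):
    """Monte Carlo expected loss of shipping A and of shipping B (Beta(1, 1) prior)."""
    loss_a = loss_b = 0.0
    for _ in range(draws):
        pa = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        pb = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        loss_a += max(pb - pa, 0.0)  # rate given up by shipping A when B is better
        loss_b += max(pa - pb, 0.0)  # rate given up by shipping B when A is better
    return loss_a / draws, loss_b / draws


def run_sequential_test(true_a, true_b, batch=500, max_batches=40,
                        tolerance=0.001, draws=4_000, seed=0):
    """Simulate a test that checks after every batch and stops once either
    variant's expected loss drops below `tolerance` (a hypothetical threshold)."""
    rng = random.Random(seed)
    conv_a = conv_b = n_a = n_b = 0
    for _ in range(max_batches):
        # Simulate one batch of visitors for each variant
        conv_a += sum(rng.random() < true_a for _ in range(batch))
        conv_b += sum(rng.random() < true_b for _ in range(batch))
        n_a += batch
        n_b += batch
        loss_a, loss_b = expected_losses(conv_a, n_a, conv_b, n_b, draws, rng)
        if min(loss_a, loss_b) < tolerance:
            return ("A" if loss_a < loss_b else "B"), n_a + n_b
    return None, n_a + n_b  # no decision within the traffic budget


winner, visitors = run_sequential_test(true_a=0.10, true_b=0.13)
print(f"Decision: {winner} after {visitors} visitors")
```

Choosing `tolerance` is the risk-management decision: a smaller value means longer tests but less conversion rate given up if the test picks the wrong variant. For metrics that are not simple conversion rates, a probabilistic programming library such as PyMC could supply the posterior samples instead.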

 

 


 

George Zavaliagkos
VP of Technology • Sense

 

Sense is a smart home data platform that aims to curb global carbon emissions. Using smart sensors, big data and its monitoring system, the company helps customers make greener decisions and save on both energy consumption and costs. When asked how the company is leveraging new trends in the data space, VP of Technology George Zavaliagkos said, “We envision a future when edge computing plays a more significant role.”

 

What’s one data science trend you’re watching closely right now, and why?

At Sense Labs we operate in a mixed cloud and edge environment: machine learning models are trained in the cloud and deployed to run in real time on the edge hardware installed in our customers’ homes.

The cloud is great for storing and processing lots of data and for training large, complex models. Real-time operation at the edge requires compact models with a low memory footprint and low latency. So we are watching with keen interest attempts to distill large ML models into compact representations, or to find subsets of large networks that provide most of the value at a fraction of the evaluation cost.
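Two standard techniques match this description: knowledge distillation, in which a small "student" model is trained to match the temperature-softened output distribution of a large "teacher", and magnitude pruning, which keeps only the largest-magnitude weights of a network. The framework-free sketch below shows the core arithmetic of each; the temperature, sparsity and toy numbers are illustrative assumptions, and nothing here describes which methods Sense actually uses.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T**2 so
    the loss magnitude stays comparable across temperatures."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


def magnitude_prune(weights, sparsity=0.8):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


teacher = [6.0, 2.0, -1.0]
print(distillation_loss([5.5, 2.2, -0.8], teacher))  # small: student tracks the teacher
print(distillation_loss([0.0, 0.0, 0.0], teacher))   # larger: student ignores the teacher
print(magnitude_prune([0.9, -0.05, 0.3, 0.01, -0.7], sparsity=0.4))
# → [0.9, 0.0, 0.3, 0.0, -0.7]
```

In a real training loop the distillation term would be minimized by gradient descent over the student's parameters, usually mixed with the ordinary loss on true labels; pruned networks are typically fine-tuned afterward to recover accuracy.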


 

What influence will this trend have on your work or your industry? 

On the cost side of the equation, as our customer base grows, so does our cloud bill. Computing capacity at the edge also grows with the customer base, but it is free to us, so if we can leverage it, we can implement powerful learning algorithms at a fraction of the cost. Besides making our models smaller, we are also looking closely at distributed training approaches, and at approaches that use on-device, unsupervised transfer learning to achieve high accuracy with a small model footprint.

We are deployed in 100,000 homes, and today we move data from each of those homes to the cloud in order to train our models. We envision a future when edge computing plays a more significant role — small independent agents will be trained at each home using local computers, and the cloud will serve as a synchronization mechanism to inspect and aggregate those agents across our customer base.
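The setup described here closely resembles federated learning. In federated averaging, its simplest form, each device starts from the current global model, trains on its own data, and the server averages the returned parameters, weighted by each device's number of examples. The sketch below uses a hypothetical two-parameter linear model and synthetic, noiseless data for three homes; it illustrates the aggregation pattern only and is not Sense's system.

```python
from dataclasses import dataclass


@dataclass
class LocalUpdate:
    weights: list[float]  # model parameters after local training
    num_examples: int     # how many local examples produced them


def local_train(weights, data, lr=0.1, epochs=20):
    """Hypothetical on-device step: full-batch gradient descent on the mean
    squared error of y ~ w0 + w1 * x, starting from the current global weights."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        # Gradient of 0.5 * mean((w0 + w1 * x - y) ** 2)
        g0 = sum((w0 + w1 * x) - y for x, y in data) / n
        g1 = sum(((w0 + w1 * x) - y) * x for x, y in data) / n
        w0, w1 = w0 - lr * g0, w1 - lr * g1
    return LocalUpdate([w0, w1], n)


def federated_average(updates):
    """Cloud-side aggregation: example-weighted mean of each parameter."""
    total = sum(u.num_examples for u in updates)
    dims = len(updates[0].weights)
    return [sum(u.weights[d] * u.num_examples for u in updates) / total for d in range(dims)]


# Synthetic data for three homes, all generated from y = 1 + 2x
homes = [
    [(x / 10, 1 + 2 * x / 10) for x in range(0, 10)],
    [(x / 10, 1 + 2 * x / 10) for x in range(5, 20)],
    [(x / 10, 1 + 2 * x / 10) for x in range(10, 14)],
]
global_weights = [0.0, 0.0]
for _ in range(100):  # communication rounds between homes and the cloud
    updates = [local_train(global_weights, data) for data in homes]
    global_weights = federated_average(updates)
print(global_weights)  # approaches [1.0, 2.0]
```

Only model parameters travel to the cloud in this pattern, not raw household data, which is what could reduce the data movement described above.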

 

 

Responses have been edited for length and clarity. Images via listed companies and Shutterstock.
