September 8, 2017
Photography via 3Play Media


MIT is no stranger to pumping out innovative startups, and Boston-based 3Play Media just adds another chapter to that storyline.

Founded in 2007 by four students at the MIT Sloan School of Management, the company offers a far more efficient system for captioning online video than traditional methods, which involve pausing videos to mark time and write text, taking up to 10 hours per video hour.

What makes 3Play Media’s technology unique is that it uses automated speech recognition software to produce transcripts that are easy to edit and can be created in a fraction of the time that manual transcription services require.

The students decided to try their blossoming service out at school, captioning videos for a handful of MIT OpenCourseWare classes.

After captioning over 100 hours of video, the students found that their new process of combining speech recognition tools with human editing far exceeded their expectations: they were able to process captions in a few hours per video hour.

They realized higher education was just the beginning.




“The reality is — online video is growing,” said Josh Miller, co-founder of 3Play Media. “We thought, if MIT needs this type of service, other people will, too. If all of TV has to be captioned, why wouldn’t all web video?”

Miller is currently the chief revenue officer of the startup, which he joined after completing the MBA program at MIT Sloan School of Management as a member of the Entrepreneurship & Innovation Program.

By 2008, Miller and the other founders were operating from an apartment in Somerville that boasted slanted floors and tight quarters, but that didn’t thwart business.

The company bootstrapped for the next couple of years because the venture capital community wanted to see traction before investing, Miller explained. This ended up working in 3Play Media’s favor.




“Because we were truly bootstrapped for the first couple of years, we always had the mentality of being relatively lean and growing with the market,” Miller said. “It was hard to hear when we first started that [customers] weren’t ready for us, but we’re at a point now where people truly understand what it means to caption video.”

By 2010, 3Play Media was backed by a handful of angel investors and was racking up big-name customers, eventually landing Netflix, Time Warner Cable and Viacom, in addition to government agencies and universities like Yale and Boston University, which were suddenly able to process hundreds of hours of video per day.

Today, the company has over 2,000 customers across virtually every industry that uses video, and it shows zero signs of slowing down.




Over the past few years, 3Play Media’s system has remained similar to how it worked at MIT but has grown in scale and product offerings.

Customers upload videos to 3Play’s website, where automatic speech-recognition software creates transcripts, subtitles and captions that are pushed to the cloud. Transcripts appear as documents with incorrect or inaudible words flagged for editing while video plays at the side.

Then, contracted editors can choose which transcripts they want to edit before pushing captions back into the cloud for customers to use.
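For readers curious how flagging uncertain words might work, here is a minimal sketch in Python. It is not 3Play Media’s implementation: the article does not say how words get flagged, so the per-word confidence score, the 0.8 threshold, and the names `Word`, `flag_for_review` and `render` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word (hypothetical structure, not 3Play Media's format)."""
    text: str
    start: float       # offset into the video, in seconds
    confidence: float  # assumed recognizer score between 0 and 1


def flag_for_review(words, threshold=0.8):
    """Pair each word with True if its score falls below the assumed threshold."""
    return [(w, w.confidence < threshold) for w in words]


def render(words, threshold=0.8):
    """Build a draft transcript, wrapping flagged words in [brackets?] for an editor."""
    return " ".join(
        f"[{w.text}?]" if flagged else w.text
        for w, flagged in flag_for_review(words, threshold)
    )


# Made-up recognizer output for a short phrase.
draft = [
    Word("online", 0.0, 0.97),
    Word("video", 0.4, 0.95),
    Word("is", 0.8, 0.99),
    Word("growling", 0.9, 0.42),  # low score, so an editor should check it
]

print(render(draft))
```

In a setup like this, an editor would only need to listen closely around the bracketed words, which is one plausible reason a hybrid process could take far less time than transcribing from scratch.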

Recently, 3Play Media launched an audio description tool, which describes what’s on the screen for someone who can’t see it.

While the audio description industry is relatively small and fragmented, Miller said he hopes 3Play Media will become a leader in the space, making content more accessible to the blind community.

While 3Play’s Boston team is relatively lean at 33, the company currently has open sales and engineering roles.

“One reason you’d want to work here is we’re relatively small in a market that is moving really quickly,” Miller said. “There’s huge growth potential ahead of us. And second, we’re doing something that is really helping people.”