Automated, Interactive Transcripts

by Jeffrey Way
Sep 06, 2023

As you can imagine, Laracasts frequently receives requests for new courses and site features. One feature request in particular that often pops up is companion written tutorials for every video. It’s not a bad idea! In a perfect world, this would be a natural and perfect addition to the site. But, as always, there’s one big roadblock: time.

To do it properly would require a significant amount of work. And this is especially true for the types of videos we create at Laracasts. They don’t always map so easily to a written format. Nonetheless, the appeal is still there. I’ve even toyed around with implementing it, on and off, for a number of years now. But every time I get close to saying yes, I’m forced to return to the reality that we’re a very small business with limited resources. What you may refer to as a helpful new feature, I might consider “yet another thing we have to organize and be responsible for.” It’s a difficult tightrope to walk sometimes. For every “wouldn’t it be cool if…”, somebody has to take ownership.

Middle Ground

I think I’ve landed on a reasonable middle ground, however. One that’s imperfect, but cheap and automated. My two favorite things!

We’re programmers. If it can be automated, then it should be automated.

You’ll notice that all recent and new videos at Laracasts include a “Toggle Transcript” button at the bottom of the episode list in the sidebar.

Video page with "Toggle Transcript" button.

Clicking this will reveal an interactive transcript for the video. Press play, and the transcript will neatly follow along with the video, highlighting each section as you come to it.

Laracasts Video With Transcript.

And if you’d rather advance to the portion of the video where we, say, discuss Laravel Sail, then you can search for “Laravel Sail” in the transcript and instantly play the video at that exact timestamp. Useful!

Click here to see an example of a video that includes a companion transcript.

But, it’s not perfect. Perhaps one day, automated transcriptions will be indistinguishable from those that are manually prepared by a human. But right now - and especially for programming videos - that’s not possible. So, we have two ways to deal with this:

  1. Allow it. These are automatically generated; people will understand that it’s a close approximation, but not perfect.
  2. Require a human to review and edit every new transcript.

For a period of time a number of months ago, I chose the second option. And I was that human editor. The problem, again, was that it required a significant amount of time; time that would be better spent elsewhere. While I might eventually hire a third-party service or contractor to take ownership of this, for now, I’m going to keep things automated and imperfect.

Planning the Feature

From the start, we knew that we wanted more than just a basic transcript below the video player. No, instead, we wanted it to be interactive. Consider these requirements that we came up with:

  • The transcript should be split into timestamped segments.
  • It should synchronize with the video player. Press play, and the transcript instantly transitions to the corresponding paragraph.
  • The user should be able to search the transcript for keywords, and tab between every occurrence.
  • Every segment of the transcript should be clickable. When clicked, the video immediately transitions to that exact timestamp.

I had no idea how to do any of these things. And, to be truthful, I couldn’t even tell you the definitions of, or the differences between, a transcript, a caption, and a subtitle.

But that’s my favorite aspect of programming. We’re constantly introduced to things we don’t know how to do, and told to “figure it out.” You better. Your job depends on it.

As a quick aside, I’ve found this to be excellent training for your life in general. Whatever it is you need to accomplish, don’t always throw your hands up. Instead… figure it out! If your toilet is broken and needs to be replaced, don’t hire a plumber. Figure it out yourself. It’s not that hard. If you bought a home that has a small pool in the back, don’t hire a pool company to maintain the chemicals. Figure it out.

Anyhow, let’s get back to the transcription project. I of course knew that there are a variety of services that will automatically generate transcripts for an uploaded video. In fact, platforms like YouTube and Vimeo do this automatically. If you inspect one of these generated files, you may find that it has a .vtt extension. That’s WebVTT, which stands for “Web Video Text Tracks.” Let’s have a look at one of these files.

WEBVTT - This file was automatically generated by VIMEO

00:00:04.400 --> 00:00:07.000
All right, welcome back. So in just a

00:00:07.200 --> 00:00:10.400
 little bit, maybe the next video, I will introduce you to a tool

00:00:10.400 --> 00:00:13.100
 called Pinia, but you know, what? Why don't

00:00:13.100 --> 00:00:16.300
 we take five minutes or so and just talk a little bit more

00:00:16.300 --> 00:00:19.400
 about how you would wire this up yourself? Okay.

Interesting! Notice how each section includes the corresponding timestamps. And you get that for free! Amazing.
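To make that structure concrete, here’s a rough JavaScript sketch of pulling the timestamps and text out of a file like the one above. The names parseVtt and toSeconds are made up for illustration, and it only handles the simple cue layout shown here, not the full WebVTT format.

```javascript
// Convert "HH:MM:SS.mmm" (or "MM:SS.mmm") into a number of seconds.
function toSeconds(timestamp) {
  return timestamp
    .split(':')
    .map(Number)
    .reduce((total, part) => total * 60 + part, 0);
}

// Parse WebVTT text into an array of { begin, end, body } objects.
// Hypothetical helper: covers plain cues only, not styling or regions.
function parseVtt(text) {
  const lines = [];
  const blocks = text.replace(/\r\n?/g, '\n').split(/\n\s*\n/);

  for (const block of blocks) {
    const rows = block.split('\n');
    const timingIndex = rows.findIndex((row) => row.includes('-->'));
    if (timingIndex === -1) continue; // Header or note block, no timing row.

    const [begin, rest] = rows[timingIndex].split('-->');
    const end = rest.trim().split(/\s+/)[0]; // Ignore any trailing cue settings.

    lines.push({
      begin: toSeconds(begin.trim()),
      end: toSeconds(end),
      body: rows.slice(timingIndex + 1).join(' ').trim(),
    });
  }

  return lines;
}
```

Storing the timestamps as plain seconds keeps later comparisons against the player’s current time simple.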

The next hurdle is to figure out how to synchronize the transcript with the video, as it plays. That part is a little trickier, but you can probably imagine what needs to happen behind the scenes. Perhaps every second or so, the video should dispatch a JavaScript event with the current timestamp of the video. Our transcript “service” could then listen for this event, and highlight or “activate” the portion of the transcript that matches this segment.
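As a loose illustration of that idea, the listener might look something like the sketch below. It assumes the lines are already available as objects with begin and end values in seconds; createTranscriptSync and activeIndex are hypothetical names, not part of any library or of the Laracasts codebase.

```javascript
// Return the index of the line whose time range contains `time`, or -1.
function activeIndex(lines, time) {
  return lines.findIndex((line) => time >= line.begin && time < line.end);
}

// Build a handler to call whenever the player reports its current time.
// `onChange` fires only when the active line actually changes, so frequent
// time updates don't cause needless work. Gaps between lines keep the
// previous line active.
function createTranscriptSync(lines, onChange) {
  let current = -1;

  return function handleTimeUpdate(time) {
    const next = activeIndex(lines, time);
    if (next !== -1 && next !== current) {
      current = next;
      onChange(lines[next], next);
    }
  };
}

// Possible wiring, assuming a plain <video> element (an assumption here):
// const sync = createTranscriptSync(lines, (line, i) => console.log(i, line.body));
// video.addEventListener('timeupdate', () => sync(video.currentTime));
```

A hosted player would expose its own time-update callback, but the handler itself wouldn’t need to change.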

To allow for this, I created a relatively simple Composer package that reads a VTT file, and converts it into a series of Line objects that encapsulate the line’s corresponding text and beginning / ending timestamps.

I could have reached for an existing package that supports a variety of text track file formats, but, again, I’m a big fan of figuring these things out yourself. Developers often say “Don’t reinvent the wheel.” I say reinvent to your heart’s content. (My personal workflow is to reach for existing packages for big or complex requirements. For smaller things, I almost always create an internal package myself.)

With this package, I can now load and parse a VTT file, like so:

use Laracasts\Transcriptions\Transcription;

$transcription = Transcription::load('path/to/file.vtt');

foreach ($transcription->lines() as $line) {
    // $line->body;
    // $line->toHtml();
    // $line->timestamp->begin();
    // $line->timestamp->end();
}

// Group lines into full sentences.
// $transcription->lines()->groupBySentence();
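The package’s actual grouping logic isn’t shown here, so the following is only a guess at the general idea, sketched in JavaScript rather than PHP. It assumes each line is an object with begin, end, and body, and it merges consecutive fragments until one ends with sentence punctuation. The groupBySentence function below is a hypothetical stand-in, not the package’s method.

```javascript
// Merge consecutive caption fragments until one of them ends a sentence.
// Assumption: a fragment ending in ".", "!" or "?" (optionally followed by
// a closing quote or parenthesis) marks the end of a group.
function groupBySentence(lines) {
  const sentences = [];
  let pending = null;

  for (const line of lines) {
    const body = line.body.trim();

    pending = pending
      ? { begin: pending.begin, end: line.end, body: `${pending.body} ${body}` }
      : { begin: line.begin, end: line.end, body };

    if (/[.!?]["'”’)]*$/.test(body)) {
      sentences.push(pending);
      pending = null;
    }
  }

  // Keep any trailing fragment that never reached a terminator.
  if (pending) sentences.push(pending);

  return sentences;
}
```

Because it only checks how each fragment ends, a sentence that finishes in the middle of a fragment stays attached to whatever follows it. That seems like a fair trade for an automated transcript.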

Now that we can easily split this VTT file into lines, the front-end only requires a basic foreach statement to loop over the lines and generate a nicely formatted transcript. Perhaps something like:

<div
   v-for="line in transcript"
   :data-time="line.timestamp.begin"
>
   <span>{{ line.timestamp.begin }}:</span>

   {{ line.body }}
</div>

Notice that data-time attribute on the div tag? We could pretty easily listen for a timestamp update event from the video player, and then write a query selector to find the closest matching transcript section.
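A minimal sketch of that lookup could look like the following. It assumes each data-time value holds the segment’s start time in seconds and that the segments appear in chronological order. The function names and the is-active class are placeholders.

```javascript
// Find the last [data-time] element whose timestamp is at or before `time`.
// Assumes data-time is in seconds and elements are in chronological order.
function findActiveSection(root, time) {
  let active = null;

  for (const el of root.querySelectorAll('[data-time]')) {
    if (Number(el.dataset.time) > time) break;
    active = el;
  }

  return active;
}

// Move the (hypothetical) "is-active" class to the matching section.
function highlight(root, time) {
  const next = findActiveSection(root, time);
  if (!next) return null;

  for (const el of root.querySelectorAll('.is-active')) {
    el.classList.remove('is-active');
  }
  next.classList.add('is-active');

  return next;
}

// In the browser, `root` would be the transcript's container element:
// highlight(document.querySelector('#transcript'), currentTime);
```

Going the other direction, a click handler on each section could read the same data-time value and ask the player to seek to it.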

And, really, that’s 90% of the work. As I worked on this new feature, I was repeatedly surprised by how simple it ended up being.
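Search, from the requirements list earlier, follows the same pattern. Here’s a hedged sketch, assuming the transcript is an array of objects with a body string. The names searchTranscript and createMatchCursor are invented for this example.

```javascript
// Return the indexes of every line whose body contains the keyword,
// ignoring case.
function searchTranscript(lines, keyword) {
  const needle = keyword.trim().toLowerCase();
  if (needle === '') return [];

  return lines.reduce((matches, line, index) => {
    if (line.body.toLowerCase().includes(needle)) matches.push(index);
    return matches;
  }, []);
}

// Step forwards or backwards through the matches, wrapping at either end.
function createMatchCursor(matches) {
  let position = -1;

  const step = (delta) => {
    if (matches.length === 0) return null;

    if (position === -1) {
      position = delta > 0 ? 0 : matches.length - 1;
    } else {
      position = (position + delta + matches.length) % matches.length;
    }

    return matches[position];
  };

  return { next: () => step(1), previous: () => step(-1) };
}
```

One caveat: a phrase that straddles two short caption fragments won’t match, so searching sentence-sized groups rather than raw lines would likely give better results.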

The Next Steps

We’re going to live with what we currently have on the site for a few months. Near the end of the year, perhaps we’ll consider taking another step. While there are dedicated “premium” transcription services, we might instead hire a dedicated monthly contractor who would be responsible for doing a “second pass” over each generated VTT file (among other things). But we’ll see!
