Chalk up another L to “the Churn”: it seems like this slightly hacky way to add native HTML5 captions and titles for audio content with WebVTT no longer works.

The hack is described in detail in that post, but basically:

use a <video> element instead of an <audio> element. make the <source> your audio (mp3 or whatever)
add a <track> element with your WebVTT subtitles
make sure to use a poster image so the video gets some space in the page

This has been broken for a few years, at least according to my notes, and I keep failing at finding a fix.

As of now:

Firefox no longer shows the “CC” control or options to enable captions
Chrome-based browsers show the captions, but you can’t see them because they are behind the media controls, which never seem to go away.

A screenshot of Chrome's media player controls where the timeline scrubber obscures the captions. We see one line reading "the past twelve weeks" peeking above the controls.

This post is me absolving myself of finding a fix for now! 🕊️

#audio #captioning #subtitles #WebVTT #podcasts

🔗 April 16, 2023 at 6:51PM EDT • by

Marty McGuire

Wow, my podcast listening habits have changed quite a bit under home quarantine. I periodically export my listening history to the /listens page on my site.

Last update: March 18th. Today’s update added: 0 listens. 😯

#quarantine #podcasts #listening #habits

🔗 April 15, 2020 at 6:39PM EDT • by

Marty McGuire

Native HTML5 captions and titles for audio content with WebVTT

This is a write-up of my Sunday hack day project from IndieWebCamp NYC 2017!
You can see my portion of the IWC NYC demos here.

Prelude: Videos for audio content

Feel free to skip this intro if you are just here for the HTML how-to!

I've been doing a short ~10 minute podcast about the IndieWeb community since February, an audio edition of the This Week in the IndieWeb weekly newsletter cleverly titled This Week in the IndieWeb Audio Edition.

After the 2017 IndieWeb Summit, each episode of the podcast also featured a brief ~1 minute interview with one of the participants there. As a way of highlighting these interviews outside the podcast itself, I became interested in the idea of "audiograms" – videos that are primarily audio content for sharing on platforms like Twitter and Facebook. I wrote up my first steps into audiograms using WNYC's audiogram generator.

While these audiograms were able to show visually interesting dynamic elements like waveforms or graphic equalizer data, I thought it would be more interesting to include subtitles from the interviews in the videos. I learned that Facebook supports captions in a common format called SRT. However, Twitter's video offerings have no support for captions.

Thankfully, I discovered the BBC's open source fork of audiogram, which supports subtitles and captioning, including the ability to "bake in" subtitles by encoding the words directly into the video frames. It also relies heavily on BBC web infrastructure, and required quite a bit of hacking up to work with what I had available.

In the end, my process looked like this:

Export the audio of the ~1 minute interview to an mp3.
Type up a text transcript of the audio. Using VLC's playback controls and turning the speed down to 0.33 made this pretty easy.
Use a "forced alignment" tool called gentle to create a big JSON data file containing all the utterances and their timestamps.
Use the jq command line tool to munge that JSON data into a format that my hacked-up version of the BBC audiogram generator can understand.
Use the BBC audiogram generator to edit the timings and word groupings for the subtitles and generate the final video.
Bonus: the BBC audiogram generator can output subtitles in SRT format - but if I've already "baked them in" this feels redundant.

You can see an early example here. I liked these posts and found them easy to post to my site as well as Facebook, Twitter, Mastodon, etc. Over time I evolved them a bit to include more info about the interviewee. Here's a later example.

One thing that has stuck with me is the idea that Facebook could be displaying these subtitles, if only I was exporting them in the SRT format. Additionally, I had done some research into subtitles for HTML5 video with WebVTT and the <track> element and wondered if it could work for audio content with some "tricks".

TL;DR - Browsers will show captions for audio if you pretend it is a video

Let's skip to the end and see what we're talking about. I wanted to make a version of my podcast where the entire ~10 minutes could be listened to along with timed subtitles, without creating a 10-minute long video. And I did!

Here is a sample from my example post of an audio track inside an HTML5 <video> element with a subtitle track. You will probably have to click the "CC" button to enable the captioning.

How does it work? Well, browsers aren't actually too picky about the data types of the <source> elements inside. You can absolutely give them an audio source.

Add in a poster attribute to the <video> element, and you can give the appearance of a "real" video.

And finally, add in the <track> element with your subtitle track and you are good to go.

The relevant source for my example post looks something like this:

<video controls poster="poster.png" crossorigin="anonymous" style="width: 100%" src="audio.mp3">
  <source class="u-audio" type="audio/mpeg" src="audio.mp3">
  <track label="English" kind="subtitles" srclang="en" src="https://media.martymcgui.re/.../subtitles.vtt">
</video>

So, basically:

Use a <video> element
Give it a poster attribute for a nice background
Use an audio file for the <source> inside
Use the <track> element for your captions/subtitles/etc.

But is that the whole story? Sadly, no.

Creating Subtitles/Captions in WebVTT Format

In some ways, This Week in the IndieWeb Audio Edition is perfectly suited for automated captioning. In order to keep it short, I spend a good amount of time summarizing the newsletter into a concise script, which I read almost verbatim. I typically end up including the transcript when I post the podcast, hidden inside a <details> element.

This script can be fed into gentle, along with the audio, to find all the alignments - but then I have a bunch of JSON data that is not particularly useful to the browser or even Facebook's player.

Thankfully, as I mentioned above, the BBC audiogram generator can output a Facebook-flavored SRT file, and that is pretty close.

After reading into the pretty expressive WebVTT spec, playing with an SRT to WebVTT converter tool, and finding an in-browser WebVTT validator, I found a pretty quick way of converting those in my favorite text editor which basically boils down to changing something like this:

00:00:02,24 --> 00:00:04,77
While at the 2017 IndieWeb Summit,

00:00:04,84 --> 00:00:07,07
I sat down with some of the
participants to ask:

Into this:

WEBVTT

00:00:02.240 --> 00:00:04.770
While at the 2017 IndieWeb Summit,

00:00:04.840 --> 00:00:07.070
I sat down with some of the
  participants to ask:

Yep. When stripped down to the minimum, the only real differences between these formats are the WEBVTT header line and the time format: periods (instead of commas) delimit subsecond time offsets, with three digits of precision instead of two. Ha!
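If you would rather not do this by hand in a text editor, the same find-and-replace can be sketched in a few lines of Python. This is only a rough sketch, not the tool I actually used: the function name `srt_to_webvtt` is made up for illustration, and it assumes the stripped-down SRT shown above, where a short fraction like `24` should be padded out to `240`.

```python
import re

# An SRT-style timestamp: HH:MM:SS, a comma, then 1-3 fractional digits.
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{1,3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert minimal SRT text to WebVTT by rewriting the timing lines.

    Hypothetical helper for illustration only.
    """

    def fix_time(match: re.Match) -> str:
        clock, fraction = match.groups()
        # Right-pad so "24" becomes "240", matching the example above.
        return f"{clock}.{fraction.ljust(3, '0')}"

    lines = []
    for line in srt_text.strip().splitlines():
        # Only touch timing lines, so commas in the caption text are left alone.
        if "-->" in line:
            line = SRT_TIME.sub(fix_time, line)
        lines.append(line)
    # A WebVTT file starts with the WEBVTT header followed by a blank line.
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"
```

Calling `srt_to_webvtt(open("subtitles.srt").read())` (the filename is a placeholder) returns the text to save as your `.vtt` file. It is still worth running the result through a WebVTT validator.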

The Future

If you've been following the podcast, you may have noticed that I have not started doing this for every episode.

The primary reason is that the BBC audiogram tool becomes verrrrry sluggish when working with a 10-minute long transcript. Editing the timings for my test post took the better part of an hour before I had an SRT file I was happy with. I think I could streamline the process by editing the existing text transcript into "caption-sized" chunks and writing a bit of code that uses the pre-chunked text file and the word timings from gentle to directly create SRT and WebVTT files.
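To make that idea a little more concrete, here is a rough Python sketch of what that "bit of code" might look like. None of this exists yet, and it leans on a few assumptions: that the word list is the `words` array from gentle's JSON output, that aligned words carry `start` and `end` times in seconds, and that counting words in each caption line with a simple regex lines up with gentle's own word splitting (which may not hold for things like hyphenated words). The names `format_time` and `chunks_to_webvtt` are invented for illustration.

```python
import re

# Rough word matcher; assumed (not verified) to agree with gentle's tokenizer.
WORD = re.compile(r"[\w']+")


def format_time(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def chunks_to_webvtt(caption_lines, words):
    """Build WebVTT text from pre-chunked caption lines and word timings.

    caption_lines: one caption-sized chunk of the transcript per entry.
    words: list of dicts in transcript order; aligned words are assumed
           to have "start" and "end" keys (seconds).
    """
    cues = []
    position = 0
    for line in caption_lines:
        count = len(WORD.findall(line))
        if count == 0:
            continue  # skip blank lines in the chunk file
        chunk = words[position:position + count]
        position += count
        # Words that could not be aligned have no timing; ignore them here.
        timed = [w for w in chunk if "start" in w and "end" in w]
        if not timed:
            continue  # nothing in this chunk was aligned, so no cue
        start = format_time(timed[0]["start"])
        end = format_time(timed[-1]["end"])
        cues.append(f"{start} --> {end}\n{line.strip()}")
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Each cue runs from the start of the first aligned word in a chunk to the end of its last aligned word. Swapping the period for a comma in `format_time` and dropping the header would get most of the way to the SRT flavor as well.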

Additionally, I'd like to make these tools more widely available to other folks. My current workflow to get gentle's output into the BBC audiogram tool is an ugly hack, but I believe I could make it as "easy" as making sure that gentle is running in the background when you run the audiogram generator.

Beyond the technical aspects, I am excited about this as a way to add extra visual interest to, and potentially increase listener comprehension for, these short audio posts. There are folks doing lots of interesting things with audio, such as the folks at Gretta, who are doing "live transcripts" with a sort of dual navigation mode where you can click on a paragraph to jump the audio around and click on the audio timeline and the transcript highlights the right spot. Here's an example of what I mean.

I don't know what I'll end up doing with this next, but I'm interested in feedback! Let me know what you think!

#site-update #IndieWeb #IWCNYC2017 #IWCNYC #IWC #audio #video #captioning #subtitles #audiogram #WebVTT #podcasts

🔗 October 17, 2017 at 8:57PM EDT • by

Marty McGuire

🎧 What Podcasts am I Listening To? May 2017 Edition

When gRegor became the third person in a week to ask me what podcasts I listen to, I realized that this is something that changes a lot, and might be worth taking the time to review periodically!

I tend to think of my podcasts by category, so I'll attempt to break them down below.

Obligatory (several people recommended them and/or they're advertised everywhere):

S-Town (yep, it is really very good. also, it is finite!)
99% Invisible (everything is designed by people and that is beautiful and troubling)

Comedy

My Favorite Murder (❤️ so funny. also, sometimes completely horrifying!)
My Brother, My Brother, and Me (gotta love those McElroys)
The Adventure Zone (the McElroys play D&D and are remarkably wonderful)
Harmontown (hilarious and off-putting)
Mouth Time with Reductress (Reductress has a podcast. YEP.)
CoolGames Inc (two silly boys make up video games)
ILLUSIONOID (hilarious improvised old-timey sci-fi radio dramas)
28 Plays Later (an American cartoonist and an Australian actor and game reviewer are good friends)
Scared Yet: The Podcast (the same American cartoonist from above and another American cartoonist talk about what kinds of horror scares people good)

Tech

Revision Path (weekly interviews showcasing Black designers and developers. Maurice is great and this podcast is great)
The Contrafabulists (taking apart mainstream tech stories to find the people and issues beneath)
This Week in the IndieWeb (disclaimer: I make this podcast)

Narrative

Welcome to Night Vale (weird and fun and funny)
Alice Isn't Dead (weird and fun and spooky)
The Orbiting Human Circus of the Air (adorable and artsy)
SAYER (scifi and horror. returning soon with Season 4!)
Thrilling Adventure Hour (live old-timey radio shows. so funny. still putting out content despite being "over"!)
Within the Wires (weird and spooky and sad)

Gaming (RPG and Otherwise)

The RPG Academy (disclaimer, I am part of one of the shows there)
She's a Super Geek (fun people playing interesting RPGs focusing on women as GMs)
Modifier (discussions about making games accessible and more!)
One Shot (hilarious people playing interesting RPGs)
Party of One Podcast (fun single person playing interesting RPGs with one GM)
Play Dead (serious conversations about death in games. reminds me of one of my favorite YouTube channels, Caitlin Doughty's Ask a Mortician)

Friends' Projects (disclaimer: I have been a guest on some of these)

Springtime for Springsteen (silly dudes ostensibly talking about Bruce Springsteen)
Hobo Radio (Joel and Lars give each other such a hard time about everything but mostly pop culture)
TheCurioso (Joe and Chris investigate the unusual and report back)
Baltimore Improv Group Podcast (sketches and interviews with local improv friends. disclaimer: I help produce this podcast)
The 4-Play Sex and Comedy Podcast (by lovely local folks and improv friends)
Monarch Broadcasting (my co-host Jonathan teaches his students to make their own podcasts!)
Uncanny Creativity (Brian is an improv buddy)
Expert of Nothing (local Baltimore comedy game show)
Friends with Black People (hasn't updated in a while but one of my favorite ideas for a podcast)
We Have to Ask (disclaimer: I co-host this podcast. Comic improvisation in alternate realities)

Politics and World Issues

Code Switch (super good)
Pod Save the People (❤️ @deray)
Baltimore: The Rise of Charm City (Stacia L. Brown is amazing)
Hope Chest: A Podcast (seriously, listen to Stacia L. Brown)
Sidedoor (from the Smithsonian. started listening because I read they hired Stacia L. Brown to help produce)
Seminars About Long-term Thinking (longer TED talks about deeper issues like whether human culture can survive 10,000 years)
Dan Carlin's Common Sense and Hardcore History (now extra upsetting in this political climate!)

Whew! This took a while. I have a frightening number of podcasts in my reader (I use AntennaPod).

What do you think? Any surprise overlaps? Anything you want to check out, or suggest that I check out?

#podcasts #listen

🔗 May 30, 2017 at 6:26PM EDT • by

Marty McGuire
