When planning IWC Baltimore, one of the things I wanted to prioritize was having high-quality discussion sessions on Saturday, to take on issues that attendees have on their minds, and to get their creative juices going for the Sunday project hack day.
Aaron, Tantek, and I sat down after IWC Austin to do some (re-)planning of the typical IWC Saturday schedule based on lessons learned in 2017. We had some things in mind:
People start to get tired and distracted after about 45 minutes.
Facilitators each had their own way of running a session. This sometimes meant discussion ended early, but more often meant running over time.
Attendees need time to move between sessions, grab water, use the restroom, etc.
We mapped out several combinations of time for lunch, session length, break length, expected end-of-day, and even decoy time ("we'll tell them they have 5 minutes when actually they have 10").
With the constraints that we wanted to break for lunch at noon, and end the day before 6pm, we settled on a schedule of 5 sessions:
1 hour 15 minutes for lunch, with sessions beginning at 1:15
Facilitators should show up 10 - 15 minutes early to be briefed on how to facilitate well
5 sessions at 45 minutes per session
10 minute break between sessions
A final "Intro to Day 2" session to prepare attendees for the Sunday hackday, which we could shorten if needed.
To encourage more consistent facilitation, I created a "Running an IndieWebCamp Session" card that we handed out to facilitators, encouraging them to assign at least one note-taker and a timekeeper, with time checks at 10, 5, and 1 minute remaining. The card also included tips on how to kick off the discussion.
Overall I think our planning really paid off. We were able to stick almost entirely to the schedule (very few sessions went over time, and none started late), the day ended on time, and informal feedback suggests that the discussions weren't hampered by the format.
Of course, there were some changes:
Our morning session went long, so there were 55 minutes for lunch rather than an hour and 15. Still, we started sessions on time.
Rather than pre-prepping facilitators, we gave them the facilitator cards and explained their contents in the time between sessions.
In addition to the per-session timekeepers, one of the organizers (me!) was responsible for loudly calling the end time for all sessions.
Having the extra time allowed for the unexpected, such as taking extra time to end the live stream for one session and begin it for the next.
After the 2017 IndieWeb Summit, each episode of the podcast also featured a brief ~1 minute interview with one of the participants there. As a way of highlighting these interviews outside the podcast itself, I became interested in the idea of "audiograms" – videos that are primarily audio content for sharing on platforms like Twitter and Facebook. I wrote up my first steps into audiograms using WNYC's audiogram generator.
While these audiograms were able to show visually interesting dynamic elements like waveforms or graphic equalizer data, I thought it would be more interesting to include subtitles from the interviews in the videos. I learned that Facebook supports captions in a common format called SRT. However, Twitter's video offerings have no support for captions.
Thankfully, I discovered the BBC's open source fork of audiogram, which supports subtitles and captioning, including the ability to "bake in" subtitles by encoding the words directly into the video frames. However, it also relies heavily on BBC web infrastructure, and required quite a bit of hacking to work with what I had available.
In the end, my process looked like this:
Export the audio of the ~1 minute interview to an mp3.
Type up a text transcript of the audio. Using VLC's playback controls and turning the speed down to 0.33 made this pretty easy.
Use a "forced alignment" tool called gentle to create a big JSON data file containing all the utterances and their timestamps.
Use the jq command line tool to munge that JSON data into a format that my hacked-up version of the BBC audiogram generator can understand.
Use the BBC audiogram generator to edit the timings and word groupings for the subtitles and generate the final video.
Bonus: the BBC audiogram generator can output subtitles in SRT format - but if I've already "baked them in" this feels redundant.
You can see an early example here. I liked these posts and found them easy to post to my site as well as Facebook, Twitter, Mastodon, etc. Over time I evolved them a bit to include more info about the interviewee. Here's a later example.
One thing that has stuck with me is the idea that Facebook could be displaying these subtitles, if only I were exporting them in the SRT format. Additionally, I had done some research into subtitles for HTML5 video with WebVTT and the <track> element and wondered if it could work for audio content with some "tricks".
TL;DR - Browsers will show captions for audio if you pretend it is a video
Let's skip to the end and see what we're talking about. I wanted to make a version of my podcast where the entire ~10 minutes could be listened to along with timed subtitles, without creating a 10-minute long video. And I did!
Use the <track> element for your captions/subtitles/etc.
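In minimal form, the trick is to serve the audio file through a video element, which (unlike audio) gets the browser's native caption machinery. A sketch, with placeholder filenames:

```html
<!-- A <video> element will happily play an audio-only file, and it
     supports <track> captions, which <audio> does not render. -->
<video controls src="this-week-in-the-indieweb.mp3">
  <track kind="captions" src="this-week-in-the-indieweb.vtt"
         srclang="en" label="English" default>
</video>
```

The default attribute asks the browser to turn the captions on without the listener having to find the captions menu.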
But is that the whole story? Sadly, no.
Creating Subtitles/Captions in WebVTT Format
In some ways, This Week in the IndieWeb Audio Edition is perfectly suited for automated captioning. In order to keep it short, I spend a good amount of time summarizing the newsletter into a concise script, which I read almost verbatim. I typically end up including the transcript when I post the podcast, hidden inside a <details> element.
This script can be fed into gentle, along with the audio, to find all the alignments - but then I have a bunch of JSON data that is not particularly useful to the browser or even Facebook's player.
Thankfully, as I mentioned above, the BBC audiogram generator can output a Facebook-flavored SRT file, and that is pretty close.
Facebook's SRT flavor:

00:00:02,24 --> 00:00:04,77
While at the 2017 IndieWeb Summit,

00:00:04,84 --> 00:00:07,07
I sat down with some of the
participants to ask:

And the same cues in WebVTT:

00:00:02.240 --> 00:00:04.770
While at the 2017 IndieWeb Summit,

00:00:04.840 --> 00:00:07.070
I sat down with some of the
participants to ask:
Yep. When stripped down to the minimum, the only real difference between these formats is the time format: WebVTT delimits subsecond offsets with a decimal point instead of a comma, and uses three digits of precision instead of two. (A complete WebVTT file also requires a WEBVTT header line at the top.) Ha!
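Since the differences are that small, converting Facebook-flavored SRT cues into WebVTT is a few lines of string munging. A sketch in Python, assuming the two-digit subsecond format shown above:

```python
import re

def fb_srt_to_webvtt(srt_text):
    """Convert Facebook-flavored SRT cues to WebVTT.

    Swaps the comma subsecond delimiter for a period, pads the
    fraction out to WebVTT's three digits, and prepends the
    required WEBVTT header.
    """
    def fix_time(match):
        return "%s.%s" % (match.group(1), match.group(2).ljust(3, "0"))
    converted = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{2,3})", fix_time, srt_text)
    return "WEBVTT\n\n" + converted
```

Going the other direction (WebVTT to SRT) is the same substitution in reverse, truncating back to two digits if Facebook insists on them.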
If you've been following the podcast, you may have noticed that I have not started doing this for every episode.
The primary reason is that the BBC audiogram tool becomes verrrrry sluggish when working with a 10-minute long transcript. Editing the timings for my test post took the better part of an hour before I had an SRT file I was happy with. I think I could streamline the process by editing the existing text transcript into "caption-sized" chunks, and writing a bit of code that uses the pre-chunked text file and the word timings from gentle to directly create SRT and WebVTT files.
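That chunk-merging code might look something like the following sketch. It assumes one caption per line of the pre-chunked transcript, a list of (word, start, end) timings from gentle in reading order, and that the chunks and timings cover the same words; the function names are mine:

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return "%02d:%02d:%02d,%03d" % (h, m, s, ms)

def chunks_to_srt(chunk_lines, word_timings):
    """Pair caption-sized transcript lines with gentle word timings.

    Each chunk's cue runs from its first word's start time to its
    last word's end time. Real transcripts would need handling for
    words gentle failed to align (they have no timestamps).
    """
    cues, i = [], 0
    for n, line in enumerate(chunk_lines, 1):
        count = len(line.split())
        words = word_timings[i:i + count]
        i += count
        start, end = words[0][1], words[-1][2]
        cues.append("%d\n%s --> %s\n%s" % (
            n, srt_timestamp(start), srt_timestamp(end), line))
    return "\n\n".join(cues) + "\n"
```

With the same timings, emitting WebVTT instead is just the header and the decimal-point time format.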
Additionally, I'd like to make these tools more widely available to other folks. My current workflow to get gentle's output into the BBC audiogram tool is an ugly hack, but I believe I could make it as "easy" as making sure that gentle is running in the background when you run the audiogram generator.
Beyond the technical aspects, I am excited about this as a way to add extra visual interest to these short audio posts, and potentially to increase listener comprehension. Lots of people are doing interesting things with audio, such as the folks at Gretta, who are building "live transcripts" with a sort of dual navigation mode: click a paragraph to jump the audio there, or click on the audio timeline and the transcript highlights the right spot. Here's an example of what I mean.
I don't know what I'll end up doing with this next, but I'm interested in feedback! Let me know what you think!
Hey Folks in/near NYC! @IndieWebCamp NYC is just 5 days away, 9/30 - 10/1
Last year’s IWC NYC was my first in-person IndieWeb experience, and I was completely drawn in by the thoughtful people working first-hand to build a more personal, more social web; a web where your content, identity, and interactions are yours, instead of food for surveillance-powered ad engines like Facebook.