After the 2017 IndieWeb Summit, each episode of the podcast also featured a brief ~1 minute interview with one of the participants there. As a way of highlighting these interviews outside the podcast itself, I became interested in the idea of "audiograms" – videos that are primarily audio content for sharing on platforms like Twitter and Facebook. I wrote up my first steps into audiograms using WNYC's audiogram generator.
While these audiograms were able to show visually interesting dynamic elements like waveforms or graphic equalizer data, I thought it would be more interesting to include subtitles from the interviews in the videos. I learned that Facebook supports captions in a common format called SRT. However, Twitter's video offerings have no support for captions.
Thankfully, I discovered the BBC's open source fork of audiogram, which supports subtitles and captioning, including the ability to "bake in" subtitles by encoding the words directly into the video frames. However, it also relies heavily on BBC web infrastructure, so it required quite a bit of hacking to work with what I had available.
In the end, my process looked like this:
Export the audio of the ~1 minute interview to an mp3.
Type up a text transcript of the audio. Using VLC's playback controls and turning the speed down to 0.33 made this pretty easy.
Use a "forced alignment" tool called gentle to create a big JSON data file containing all the utterances and their timestamps.
Use the jq command line tool to munge that JSON data into a format that my hacked-up version of the BBC audiogram generator can understand.
Use the BBC audiogram generator to edit the timings and word groupings for the subtitles and generate the final video.
Bonus: the BBC audiogram generator can output subtitles in SRT format - but if I've already "baked them in" this feels redundant.
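The jq step mostly boils down to pulling the successfully aligned words and their timestamps out of gentle's output. Here is a minimal Python sketch of that idea. It assumes gentle's JSON has a top-level "words" list whose entries carry "word", "case", "start", and "end" keys, and that words gentle could not place are marked with a "case" other than "success" and lack timestamps. The output shape is hypothetical; it is not the format my hacked-up generator expects.

```python
import json


def aligned_words(gentle_json: str) -> list[dict]:
    """Return [{"word", "start", "end"}] for every word gentle aligned.

    Assumes gentle's output format: a "words" list of dicts with
    "word", "case", "start", and "end" keys.
    """
    data = json.loads(gentle_json)
    words = []
    for entry in data.get("words", []):
        # Skip words gentle could not find in the audio (no timestamps).
        if entry.get("case") != "success":
            continue
        words.append({
            "word": entry["word"],
            "start": float(entry["start"]),
            "end": float(entry["end"]),
        })
    return words
```

Keeping only aligned words means later steps never have to special-case missing timestamps.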
You can see an early example here. I liked these posts and found them easy to post to my site as well as Facebook, Twitter, Mastodon, etc. Over time I evolved them a bit to include more info about the interviewee. Here's a later example.
One thing that has stuck with me is the idea that Facebook could be displaying these subtitles, if only I were exporting them in the SRT format. Additionally, I had done some research into subtitles for HTML5 video with WebVTT and the <track> element and wondered if it could work for audio content with some "tricks".
TL;DR - Browsers will show captions for audio if you pretend it is a video
Let's skip to the end and see what we're talking about. I wanted to make a version of my podcast where the entire ~10 minutes could be listened to along with timed subtitles, without creating a 10-minute long video. And I did!
Use the <track> element for your captions/subtitles/etc.
But is that the whole story? Sadly, no.
Creating Subtitles/Captions in WebVTT Format
In some ways, This Week in the IndieWeb Audio Edition is perfectly suited for automated captioning. In order to keep it short, I spend a good amount of time summarizing the newsletter into a concise script, which I read almost verbatim. I typically end up including the transcript when I post the podcast, hidden inside a <details> element.
This script can be fed into gentle, along with the audio, to find all the alignments - but then I have a bunch of JSON data that is not particularly useful to the browser or even Facebook's player.
Thankfully, as I mentioned above, the BBC audiogram generator can output a Facebook-flavored SRT file, and that is pretty close. Here is the start of one:
00:00:02,24 --> 00:00:04,77
While at the 2017 IndieWeb Summit,
00:00:04,84 --> 00:00:07,07
I sat down with some of the
participants to ask:
And here are the same captions in WebVTT:
00:00:02.240 --> 00:00:04.770
While at the 2017 IndieWeb Summit,
00:00:04.840 --> 00:00:07.070
I sat down with some of the
participants to ask:
Yep. Stripped down to the minimum, the only real difference between these formats (apart from the "WEBVTT" line that must open every WebVTT file) is the timestamp format: WebVTT separates the sub-second part with a period instead of a comma, and uses three digits of precision instead of two. Ha!
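Since the differences are that small, converting the BBC tool's SRT output to WebVTT is mostly a text substitution. Here is a minimal sketch. It assumes timestamps of the form HH:MM:SS,fraction with one to three fractional digits, treating the two-digit form above as hundredths of a second. It only rewrites timing lines (those containing "-->") and prepends the WEBVTT header line. Numeric cue numbers from standard SRT are left in place, since WebVTT accepts them as cue identifiers.

```python
import re

# SRT-style timestamps such as 00:00:02,24 or 00:00:02,240.
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{1,3})\b")


def _to_vtt_time(match: re.Match) -> str:
    hms, fraction = match.group(1), match.group(2)
    # Pad on the right: ",24" means 0.24 s, which is 240 ms.
    return f"{hms}.{fraction.ljust(3, '0')}"


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT caption text to WebVTT."""
    lines = []
    for line in srt_text.strip().splitlines():
        if "-->" in line:  # only touch cue timing lines, never caption text
            line = SRT_TIME.sub(_to_vtt_time, line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"
```

For example, the first cue above becomes `00:00:02.240 --> 00:00:04.770` followed by its caption text, under a leading `WEBVTT` line.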
If you've been following the podcast, you may have noticed that I have not started doing this for every episode.
The primary reason is that the BBC audiogram tool becomes verrrrry sluggish when working with a 10-minute long transcript. Editing the timings for my test post took the better part of an hour before I had an SRT file I was happy with. I think I could streamline the process by editing the existing text transcript into "caption-sized" chunks and writing a bit of code that uses the pre-chunked text file and the word timings from gentle to directly create SRT and WebVTT files.
Additionally, I'd like to make these tools more widely available to other folks. My current workflow to get gentle's output into the BBC audiogram tool is an ugly hack, but I believe I could make it as "easy" as making sure that gentle is running in the background when you run the audiogram generator.
Beyond the technical aspects, I am excited about this as a way to add extra visual interest to these short audio posts, and potentially to increase listener comprehension. Lots of people are doing interesting things with audio, such as the folks at Gretta, who are doing "live transcripts" with a sort of dual navigation mode: click on a paragraph and the audio jumps to that spot, or click on the audio timeline and the transcript highlights the matching passage. Here's an example of what I mean.
I don't know what I'll end up doing with this next, but I'm interested in feedback! Let me know what you think!
Jonathan Prozzi and I have challenged one another to make a post about improving our websites once a week. This is me getting back on the train!
In a previous site update I wrote about setting up a system to notify me whenever my site received webmentions. Essentially, this meant that I could now get notifications on my phone and desktop whenever somebody interacted with my site, such as replying to one of my posts on their own site, retweeting or favoriting one of my posts, or even RSVPing to my Facebook events.
One thing I didn't super like about this system is that it used the Pushbullet service which, while great, is not under my control.
I've been running a Matrix chat server at home for a while now. I primarily use it to chat with people in my household in IRC channels. I use a really nice client for Matrix called Riot, which runs in the browser, but is also available on Android and iOS, and is capable of sending notifications about chat events, which I have found really handy.
Recently, I've added a chatbot named Hubot to my Matrix server, thanks to the Hubot-Matrix adapter. Hubot is super neat because it is fairly easy to script up new behaviors, and it has nice built-in support for the web: it can make web requests, and it also runs a server for accepting them. Once I realized this, it occurred to me that I could replace my previous Pushbullet-based notification system with one that goes through Hubot.
First, a note on security. Exposing a chatbot's HTTP listener interface to the great wide internet comes at some risk! I made sure to do the following:
I run Hubot behind a firewall, so no plain HTTP traffic can come directly across the internet.
Using another home server, I set up nginx to act as a secure HTTPS proxy, using a certificate from Let's Encrypt to encrypt all traffic that goes over the internet.
I decided that any behaviors I write for Hubot that use the HTTP listener will use some kind of secret token to ensure that the request is valid. I don't want spammers blowing up my chatrooms!
I decided that the bot should:
Allow a user to request webmention.io notifications for a given site into any room.
Generate and store a "callback secret" to work with webmention.io's Web Hook system and tell the user the URL and callback secret to configure over on the Webmention.io Dashboard.
Accept HTTP requests from webmention.io at something like <HUBOT_HOST>/hubot/wmio/notify
Verify that the request contains the callback secret
Generate a nice text summary of the notification based on its contents
Send the notification to the room that the user was in when they made the follow request.
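The actual bot is a Hubot script, but the verify-then-summarize logic is language-agnostic. Here is a minimal Python sketch of it; it is not the hubot-webmentionio-notify code. It assumes webmention.io's Web Hook POSTs a JSON body with "secret", "source", "target", and a "post" object. That object is assumed to have a "wm-property" naming the kind of interaction and an "author" with a "name". Check those field names against the webmention.io docs. The secret format and the in-memory room bookkeeping are also hypothetical.

```python
from __future__ import annotations

import hmac
import json
import secrets

# Hypothetical in-memory store: callback secret -> chat room to notify.
SUBSCRIPTIONS: dict[str, str] = {}

# Assumed wm-property values mapped to human-friendly verbs.
VERBS = {
    "like-of": "liked",
    "repost-of": "reposted",
    "in-reply-to": "replied to",
    "bookmark-of": "bookmarked",
    "mention-of": "mentioned",
    "rsvp": "RSVPed to",
}


def follow(room: str) -> str:
    """Create a callback secret for a room and remember it."""
    secret = secrets.token_hex(10)  # 20 hex characters
    SUBSCRIPTIONS[secret] = room
    return secret


def handle_notification(body: str) -> tuple[str, str] | None:
    """Return (room, summary) for a request with a known secret, else None."""
    payload = json.loads(body)
    given = str(payload.get("secret", "")).encode("utf-8")
    for secret, room in SUBSCRIPTIONS.items():
        # Constant-time comparison so timing doesn't leak the secret.
        if hmac.compare_digest(secret.encode("utf-8"), given):
            break
    else:
        return None  # unknown secret: drop the request
    post = payload.get("post") or {}
    author = (post.get("author") or {}).get("name") or payload.get("source") or "someone"
    verb = VERBS.get(post.get("wm-property"), "mentioned")
    return room, f"{author} {verb} {payload.get('target') or 'your site'}"
```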
Once installed, you can start a conversation with your hubot and ask it to follow a site:
you> hubot wmio follow mycoolsite.biz
hubot> @you OK! Use this as your Web Hook: <HUBOT_URL>/hubot/wmio/notify
And use this as your callback secret: 1a2b3c4d5e6f7890000
The string "mycoolsite.biz" can actually be anything and should be something easy to remember in case you want to unfollow notifications later. Hubot doesn't check incoming mentions against it at the moment.
You can enter the URL and callback secret in the Webmention.io dashboard, and future webmentions will be sent to your Hubot and output into the room of your choice.
I don't know how useful hubot-webmentionio-notify will be for other folks at the moment, but I am excited to be getting these notifications via services that I control. I look forward to building more fun things with Hubot!
Jonathan Prozzi and I have challenged one another to make a post about improving our websites once a week. This one should have gone up last week!
A few weeks ago I posted some thoughts about my IndieWeb setup called "Easier POSSE with Micropub Edits?" in which I wished for a tool that would let me take a given post from my site, syndicate it to silos like Twitter and Facebook (tweaking the content if I want), and update the post on my site to show the links to those syndicated copies.
I failed to make at least one important thing clear in my original post – why do I care about syndication links? There are many reasons.
If I decide that a post should be syndicated to a silo, it's because I want it to reach the people who follow me there and, if that is true, I also want their interactions to come back to my site. So, in these ways, a post isn't "done" unless it is on my site, with syndicated copies on the silos I care about, and with syndication links for brid.gy to feed the interactions back.
When implementing a new feature, it always helps to have something to test against. So, I went looking for a Micropub client which supported queries and edits. The test suite for Micropub at micropub.rocks includes a lovely implementation report grid, showing which Micropub clients support what features of the spec.
Of the clients listed, two of them were web-based and Open Source. I had played with and liked Inkstone in the past, but its edit features are currently considered a work-in-progress. So, I tried out Micropublish.net, and it was exactly what I was looking for.
Micropublish has a feature to let you enter a URL for a post on your site to edit. It will use Micropub source content queries to get the source data for that post and let you edit the content and other properties of the post. It can then send a Micropub update to save the updated version of the post back to your site, if your server supports updates. It even has a great feature for developers - a "Preview" button will show you exactly what request will be sent to your server for the update.
Micropublish.net is a great tool for testing out Micropub query and update support, but my Micropub server is bespoke, hastily-written, hand-rolled Python. So, while it was easy enough to add query support, it took me a while to get my code structure cleaned up, write some tests, and actually implement updates.
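For anyone else hand-rolling a Micropub server, the core of update support is applying the request's "replace", "add", and "delete" operations to a post's properties. Here is a minimal sketch, following my reading of the Micropub spec's JSON update format. Properties are a dict of lists, and "replace" and "add" map property names to lists of values. "delete" is either a list of property names to remove entirely or a dict of specific values to remove. It is not the code running on my site; storage and authorization are left out.

```python
import copy


def apply_update(properties: dict, request: dict) -> dict:
    """Return a new properties dict with a Micropub update applied."""
    props = copy.deepcopy(properties)
    # replace: overwrite all values of each listed property.
    for name, values in request.get("replace", {}).items():
        props[name] = list(values)
    # add: append values, creating the property if it doesn't exist.
    for name, values in request.get("add", {}).items():
        props.setdefault(name, []).extend(values)
    # delete: whole properties (list form) or specific values (dict form).
    delete = request.get("delete", [])
    if isinstance(delete, list):
        for name in delete:
            props.pop(name, None)
    else:
        for name, values in delete.items():
            remaining = [v for v in props.get(name, []) if v not in values]
            if remaining:
                props[name] = remaining
            else:
                props.pop(name, None)
    return props
```

Returning a copy rather than mutating in place makes it easy to test the update logic separately from whatever writes posts to disk.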
A New Workflow
I am pleased to say that it works and, with the help of Micropublish.net, I now have a functioning workflow for publishing to my site and syndicating to silos like Twitter and Facebook, even from my phone, without having to open my laptop, edit YAML data, and push git repositories around. It looks like this.
Make a new post to my site with a micropub client like Quill.
Open the post for editing in micropublish.net (I use Url Forwarder for Android to make this super easy on my phone; a bookmarklet makes it easy on my laptop).
In a new tab, log in to Twitter and make a similar post, then copy the URL of the new tweet into the Syndication field on my post.
Repeat the steps to make posts on Facebook, Mastodon, etc., copying their URLs into the Syndication field.
Finally, hit "Update" in micropublish.net to update my post with the syndication links.
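Behind the scenes, that final step boils down to a single Micropub update request. Here is a sketch of what such a request could look like, built with Python's standard library but not sent. The endpoint, token, and URLs are placeholders. It reflects my reading of the spec's JSON update syntax rather than a capture of what micropublish.net sends; a client could just as well use "replace" instead of "add" for the syndication property.

```python
import json
import urllib.request


def syndication_update(endpoint: str, token: str, post_url: str,
                       syndicated: list[str]) -> urllib.request.Request:
    """Build (but do not send) a Micropub update adding syndication links."""
    body = {
        "action": "update",
        "url": post_url,
        "add": {"syndication": syndicated},
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`.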
This is still a very manual process, but it now makes it possible to finish a post in a way that I couldn't before. In the spirit of manual until it hurts, I will use this for a while and see what existing pain points remain, and what new ones appear, to help decide what comes next.
Thanks to Barry Frost for micropublish.net and to Tantek for the nudge to write an update!
Jonathan Prozzi and I have challenged one another to make a post about improving our websites once a week. Here’s mine!
In February 2011 I participated in the Thing-a-Day project on Posterous. It was the first time in a long time that I had published consistently, so when it was announced that Posterous was going away, I worked hard to grab my content and store it somewhere.
Eventually it was November 2013, Wordpress was "out", static site generators were "in", and I wanted to give Octopress a try. I used Octopress' tools to import all my Wordpress content into Octopress, forgot about adding back the Disqus comments, and posted it all back online. In February 2014, I decided to resurrect my Posterous content, so I created posts for it and got everything looking nice enough.
In 2015 I learned about the IndieWeb, and decided it was time for a new approach to my identity and content online. I set up a new site at https://martymcgui.re/ based on Jekyll (hey! static sites are still "in"!) and got to work adding IndieWeb features.
Well, today I decided to get some of that old content off my other domain and into my official one. Thankfully, with Octopress being based on Jekyll, it was mostly just a matter of copying over the files in the _posts/ folder. A few tweaks to a few posts to make up for newer parsing in Jekyll, my somewhat odd URL structure, etc., and I was good to go!
"Owning" My Disqus Comments
Though I had long ago considered them lost, I noticed that the Octopress importer had added a section to the metadata of some of my old posts from Wordpress: a "meta" property containing a "dsq_thread_id".
Disqus lets you request a compressed XML file containing all of your comment data. It is organized hierarchically into "category" (which I think can be configured per-site), "thread" (individual pages), and "post" (the actual comments), and it includes info such as author name and email, the date each comment was created, the comment message with some whitelisted HTML for formatting and links, whether the comment was identified as spam or has been deleted, etc.
The XML format was making me queasy, and Jekyll data files often come in YAML format for editability, so I did the laziest XML to YAML transform possible, thanks to some Ruby and this StackOverflow post.
I dropped this into my Jekyll site as _data/disqus.yml, and ... that's it! I could now access the content from my templates in site.data.disqus.
I wrote a short template snippet that checks whether the post has a "meta" property with a "dsq_thread_id". If it does, the snippet looks in site.data.disqus.disqus.post and collects all Disqus comments whose "thread.dsq:id" matches the post's "dsq_thread_id". If there are any matching comments, they're displayed in a "Comments" section on the page.
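Here is the same lookup expressed as a Python sketch, for readers who don't speak Liquid. It assumes the YAML has been loaded into a nested dict shaped like the XML, so site.data.disqus.disqus.post becomes data["disqus"]["post"]. It also assumes each comment's thread reference lives under "thread" then "dsq:id", and that the "isDeleted"/"isSpam" flags come through as the strings "true"/"false". Filtering on those flags is a choice made for this sketch, not necessarily something my template does.

```python
def comments_for(post: dict, data: dict) -> list[dict]:
    """Collect Disqus comments whose thread id matches the post's."""
    thread_id = (post.get("meta") or {}).get("dsq_thread_id")
    if thread_id is None:
        return []
    comments = (data.get("disqus") or {}).get("post", [])
    if isinstance(comments, dict):  # a lone comment may not be wrapped in a list
        comments = [comments]
    return [
        c for c in comments
        if str((c.get("thread") or {}).get("dsq:id")) == str(thread_id)
        and c.get("isDeleted") != "true"
        and c.get("isSpam") != "true"
    ]
```

Comparing the ids as strings sidesteps the YAML loader turning some of them into integers and others not.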
So now some of my oldest posts have some of their discussion back after more than 7 years!
I was (pleasantly) surprised to be able to recover and consolidate this older content. Thanks to past me for keeping good backups, and to Disqus for still being around and offering a comprehensive export.
As a bonus, since all of the comments include the commenter's email address, I could give them avatars with Gravatar, and (though they have no URL to link to) they would almost look right at home alongside the more modern mentions I display on my site.
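Gravatar looks avatars up by a hash of the email address: trim surrounding whitespace, lowercase it, take the MD5 hex digest, and append that to https://www.gravatar.com/avatar/. Here is a small sketch. The size and the "d=identicon" fallback (a generated image for commenters who never set up a Gravatar) are choices made for illustration.

```python
import hashlib
from urllib.parse import urlencode


def gravatar_url(email: str, size: int = 48) -> str:
    """Build a Gravatar image URL from a commenter's email address."""
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    # s = image size in pixels; d=identicon = generated fallback image.
    query = urlencode({"s": size, "d": "identicon"})
    return f"https://www.gravatar.com/avatar/{digest}?{query}"
```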
Jonathan Prozzi and I have challenged one another to make a post about improving our websites once a week. I'm late with this one!
Most of the features on my website are experiments in learning new things. Sometimes I learn a better way of doing something that I've already built into the site and it's time to migrate!
Moving Media files from Git LFS to a Media Endpoint
I build my site with Jekyll, and I store my site's configuration and text content via Git. One of the things that most folks avoid with Git is mixing text content (which fits into Git's model of efficiently storing differences over time) with large binary files like images, video, and audio (which Git cannot manage as efficiently).
When I first set up my site, I made use of Git LFS ("Large File Storage") for managing anything that wasn't text. Any images, video, or audio that I added to my site was stored in an _assets/ folder in a way that matched uploaded files to the posts they were a part of. Git LFS would transparently ship those files off to a secondary server rather than include their content in the Git repository itself. I had to jump through some hoops to set up my local GitLab server to support Git LFS, and to set up Git LFS on the server that handles receiving new posts via Micropub and compiling and deploying the site.
It turns out that there are many reasons that a site would want to handle media files separately from the text content that refers to them. In fact, it is a common enough pattern that the Micropub standard includes a definition for a separate "media endpoint" to handle file uploads. I shared a Micropub media endpoint implementation that I built called Spano a while back, and it has been working well with support from tools like Quill. So the text content of my site is served from https://martymcgui.re/, and my media files from https://media.martymcgui.re/. With a couple of changes in my code and my workflows, this has become the way I handle all media files for my site.
However, I still had a bunch of files in my site being handled by Git LFS, and some of my Jekyll code (plugins and templates) for showing embeds expected files to be on the local filesystem. This past week I took some time to write some scripts to find all references to those local files, migrate them to my media server, and update the outgoing links. I also updated my embed handling so it didn't rely on local files. This let me delete a lot of local metadata I was keeping but not using, like all the EXIF tags in uploaded photos. I am now Git LFS free and it feels like one less thing to worry about.
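The heart of a migration like this is a find-and-replace over post content. Here is a hypothetical sketch. It assumes posts reference local files with site-relative paths beginning with /assets/ (an illustrative prefix, not necessarily the one my site used). It also assumes the upload step has produced a mapping from each local path to its new media URL. Paths without an entry are left alone and reported so they can be handled by hand.

```python
import re

# Illustrative pattern for site-relative media paths like /assets/2017/photo.jpg.
# The lookbehind skips absolute URLs such as https://example.com/assets/...
LOCAL_REF = re.compile(r"(?<![\w.])/assets/[\w./-]+")


def rewrite_media_links(text: str, uploaded: dict[str, str]) -> tuple[str, list[str]]:
    """Swap local media paths for media-endpoint URLs; report unknown paths."""
    missing: list[str] = []

    def swap(match: re.Match) -> str:
        path = match.group(0)
        if path in uploaded:
            return uploaded[path]
        missing.append(path)
        return path  # leave unknown paths untouched

    return LOCAL_REF.sub(swap, text), missing
```

Reporting misses instead of failing lets a script run over every post in one pass and produce a to-do list of stragglers.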
Better Caching for Mentions from Webmention.io
When I finally started displaying webmentions, I had a very simple model for how to cache all the info from webmention.io. Basically: I stored all mentions in a big array and, when my site went to fetch new mentions, it would keep fetching until it saw the "last" mention again. This led to a bit of a bug where someone might send me a mention, update their page, and send the mention again. My site would not be able to recognize the "last" mention, so it would fetch all my mentions again, leading to everything appearing twice.
This past week I rewrote my mention handling to avoid this problem by replacing this array and storing mentions in a hash based on the source and target. The new code also checks to see if the verification date of the mention has changed (giving me a way to detect and notify about changed mentions in the future). I also reorganized my mention cache to include an index by the target URL on my site. This makes it a bit quicker to find mentions for a given page when rendering out the site.
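Here is a sketch of that cache shape in Python. My plugin is Ruby, so this illustrates the idea rather than reproducing the plugin's code. It assumes each mention from the API has "source", "target", and "verified_date" fields. The return value of `add` flags whether a mention is new, changed, or unchanged, which is the hook a future "changed mention" notification could use.

```python
from collections import defaultdict


class MentionCache:
    """Mentions keyed by (source, target), plus an index by target URL."""

    def __init__(self) -> None:
        self.mentions: dict[tuple[str, str], dict] = {}
        self.by_target: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, mention: dict) -> str:
        """Store a mention; return "new", "changed", or "unchanged"."""
        key = (mention["source"], mention["target"])
        old = self.mentions.get(key)
        self.mentions[key] = mention  # a re-sent mention overwrites, never duplicates
        self.by_target[mention["target"]].add(key)
        if old is None:
            return "new"
        if old.get("verified_date") != mention.get("verified_date"):
            return "changed"
        return "unchanged"

    def for_page(self, target: str) -> list[dict]:
        """All cached mentions of one page on my site."""
        return [self.mentions[k] for k in sorted(self.by_target.get(target, ()))]
```

Keying on (source, target) is what fixes the duplicate bug: sending the same mention twice just replaces the earlier entry.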
Neither of these changes are really visible to readers of my site, but they have been useful for cleaning things up. The webmention.io handling in particular has brought my plugin a lot closer to being something I could release for other people to use!
TL;DR, my site now attempts to recognize single-emoji comments and display them as a "Reaction".
Slightly longer version - my site uses webmention.io for handling webmentions, and I use brid.gy to backfeed interactions from Facebook to my own site. The way brid.gy handles Facebook reactions other than the standard "like" is a little quirky - they show up in webmention.io as a "reply" with a single emoji as the "content".
Using the Ruby twemoji library, my site checks the "content" of a reply against the emoji index and, if the content is a single emoji, pulls it out of the usual "reply" display and puts it in a facepile. The emoji itself is shown as an icon in the corner of the little face image.
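The check itself is simple once you have an emoji index to compare against. Here is a Python sketch of the idea. A small hand-written set stands in for twemoji's index; a real index would cover every emoji, including multi-codepoint sequences. It also assumes each reply's "content" is a plain string.

```python
# Stand-in for a real emoji index such as the one twemoji ships.
EMOJI_INDEX = {"\u2764\ufe0f", "\U0001F606", "\U0001F62E", "\U0001F622", "\U0001F621"}


def split_reactions(replies: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate single-emoji replies (reactions) from ordinary replies."""
    reactions, others = [], []
    for reply in replies:
        content = (reply.get("content") or "").strip()
        if content in EMOJI_INDEX:
            # Keep the emoji handy for the corner badge in the facepile.
            reactions.append({**reply, "emoji": content})
        else:
            others.append(reply)
    return reactions, others
```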
While I was at it, I cleaned up a lot of my webmention-handling template to make things much clearer. This will make things easier for folks that want to re-use this code when I (eventually) release this as a Jekyll plugin.
Jonathan Prozzi and I have challenged one another to make a post about improving our websites once a week. I'm a little late with this one!
I recently added support for displaying mentions, such as likes, reposts, comments, etc. from around the web that refer to the posts on my site. One thing the update didn't do is catch another type of mention, such as when someone mentions me in a tweet (example). These get fed to my website by brid.gy, but weren't displayed anywhere.
So, I created a /mentions page for displaying these mentions. In the future, when a post mentions my homepage, the result will show up on the mentions page.
My mentions still don't update in real time; they are compiled into my site whenever I make a new post. That's coming up in the future, but I have taken one more step towards real-time interactions with notifications!
Webmention.io, the service that I use for accepting and storing webmentions, has a WebHook option that can notify your site whenever a new webmention has been received. I wrote up a simple Python service using Flask that will listen for these messages from webmention.io and send them to me via PushBullet, a notification service that I've been using for a while for other projects.
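The forwarding half of that service is one HTTP request. Here is a minimal sketch that uses only the standard library and leaves out the Flask web server. It builds a "note" push for Pushbullet's v2 pushes endpoint, authenticated with an Access-Token header. The title and body wording are made up for illustration.

```python
import json
import urllib.request

PUSHES_URL = "https://api.pushbullet.com/v2/pushes"


def pushbullet_note(access_token: str, source: str, target: str) -> urllib.request.Request:
    """Build (but do not send) a Pushbullet note about a new webmention."""
    payload = {
        "type": "note",
        "title": "New webmention",
        "body": f"{source} mentioned {target}",
    }
    return urllib.request.Request(
        PUSHES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Access-Token": access_token, "Content-Type": "application/json"},
        method="POST",
    )
```

In a webhook handler, you would build this from the incoming mention's source and target and pass it to `urllib.request.urlopen`.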
Now, I'll see a notification on my phone and laptop when another site sends me a webmention!
Webmentions are one of the most interesting and powerful technologies floating around the IndieWeb. At their most basic, they allow sites on the web to interact by sending a notification when a page on one site links to a page on another. When combined with machine-readable metadata like microformats2, they enable really neat social interactions between sites, feeding back likes, comments, bookmarks, shares, event RSVPs, and plenty more.
A site doesn't have to do all its own Webmention handling, and there are a few services that will handle them for you. I set up my website with the Webmention.io service back in August 2016 (so long ago!) and it's been accepting mentions from other sites since then. And, while there aren't a lot of websites that send Webmentions natively, there are services like Bridgy, which uses Webmentions to backfeed social interactions to my site from sites like Facebook and Twitter. Pretty neat!
When I publish a post with a link to a site that supports Webmentions, I still need to actually send that notification. I haven't yet built a tool that does that for my own website, but I have been able to make use of Aaron Parecki's Telegraph, which will take in a link to one of my posts, parse it for outgoing links, find out if the targets of those links support Webmentions, and allow me to send them with the press of a button. It's ridiculously easy to use and has the added benefit of letting me pick and choose which links go out as Webmentions.
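Telegraph takes care of this, but the "do the targets support Webmentions?" step is defined in the Webmention spec and is small enough to sketch. This version only inspects the HTML; the spec also says to check the HTTP Link header first, which is omitted here. It returns the first link or a element with rel="webmention", resolved against the page URL.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class _EndpointFinder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.href: str | None = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching <link> or <a>, in document order, counts.
        if self.href is not None or tag not in ("link", "a"):
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "webmention" in rels and attrs.get("href") is not None:
            self.href = attrs["href"]


def discover_webmention_endpoint(page_url: str, html: str) -> str | None:
    """Return the absolute Webmention endpoint advertised in html, if any."""
    finder = _EndpointFinder()
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return None
    # Relative hrefs resolve against the page; an empty href means the page itself.
    return urljoin(page_url, finder.href)
```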
Webmention.io has been collecting mentions for my site for something like 6 months, but they don't just magically show up on my site! Webmention.io provides an API for fetching the mention data for individual pages, or all mentions for my domain.
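Fetching from that API starts with building the right URL. Here is a sketch that assumes the JF2-formatted endpoint at https://webmention.io/api/mentions.jf2. It uses a "target" parameter for a single page, or "domain" plus the account's API token for everything on the site. Double-check the parameter names against webmention.io's documentation.

```python
from __future__ import annotations

from urllib.parse import urlencode

API = "https://webmention.io/api/mentions.jf2"


def mentions_url(target: str | None = None, domain: str | None = None,
                 token: str | None = None) -> str:
    """Build a webmention.io API URL for one page or a whole domain."""
    if target:
        params = {"target": target}
    elif domain and token:
        params = {"domain": domain, "token": token}
    else:
        raise ValueError("need a target URL, or a domain plus API token")
    return f"{API}?{urlencode(params)}"
```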
What works? Let's see!
Here's an example post with some Likes and RSVPs (both "yes"es and "maybe"s):
Overall, I'm really excited to finally be showing these on my site! I think Webmention is a pretty critical part of bringing the "social web" into the IndieWeb and back out of the silos. I am grateful to all the folks that have made this possible with their work on standards and tools!
In general, this means that you should make posts on your own site, then copy the post to silos like Twitter, Facebook, etc., to reach the folks in those communities. To complete the process, include links on your site from the original post out to the syndicated copies.
One fun reason to do this is that tools like brid.gy use syndication links in order to backfeed comments and reactions from silos like Facebook and Twitter to your own site.
I'd been collecting these links for a while and displaying them in a "hidden" way - so tools like bridgy could see them, but a human reading the page would not.
Yesterday I added a "See also:" section that includes links out to any syndicated copies of my posts on other sites.
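Both the hidden and the visible versions hinge on the same microformats2 markup: links with class="u-syndication" inside the post's h-entry, which is what tools like bridgy look for. Here is a Python sketch that renders a visible "See also:" list. The surrounding HTML structure and the idea of labeling each link by hostname are illustrations, not my site's template.

```python
from html import escape
from urllib.parse import urlparse


def see_also(syndication_urls: list[str]) -> str:
    """Render syndication links as a visible, microformats2-marked list."""
    if not syndication_urls:
        return ""
    items = []
    for url in syndication_urls:
        host = urlparse(url).hostname or url  # e.g. "twitter.com"
        items.append(
            f'<li><a class="u-syndication" href="{escape(url)}">{escape(host)}</a></li>'
        )
    return "<p>See also:</p>\n<ul>\n" + "\n".join(items) + "\n</ul>"
```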
First up, I added support for "tag aggregations" - essentially, pages that list all posts with a certain tag. So, any future editions of this audio newsletter that I post can be tagged with "this-week-indieweb-podcast" and will then show up on the "This Week in the IndieWeb Podcast" page. It should soon be possible to feed that page to a tool like Granary to convert the feed on that page, with its audio entries, into an RSS feed suitable for subscribing with a podcast app.
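Conceptually, a tag aggregation page is just an index from each tag to its posts. Here is a tiny sketch with hypothetical post dicts, each with a "tags" list and a sortable "date", listing the newest posts first.

```python
from collections import defaultdict


def tag_pages(posts: list[dict]) -> dict[str, list[dict]]:
    """Group posts by tag, newest first within each tag."""
    pages: dict[str, list[dict]] = defaultdict(list)
    for post in posts:
        for tag in post.get("tags", []):
            pages[tag].append(post)
    return {tag: sorted(items, key=lambda p: p["date"], reverse=True)
            for tag, items in pages.items()}
```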
Next up, I added support for "Media Fragments", a W3C recommendation that allows linking to a specific timestamp to start (and even stop!) playback of video and audio. Aaron Parecki recently implemented this on his own site and was kind enough to share the implementation! Now, you can create links that jump to a specific time of any audio or video post on my site.
For example, if you want to quickly jump to the part of the This Week in the IndieWeb audio edition that contains info about the next upcoming Homebrew Website Club meetings, it looks like this: https://martymcgui.re/2017/02/18/151503/#t=54
Media fragments could enable some fun things, such as a list of links that index directly to particular sections of a long recording.
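The temporal part of the Media Fragments syntax is compact: t=start or t=start,end, in seconds, optionally prefixed with npt: and optionally written as mm:ss or hh:mm:ss. Here is a sketch of building and parsing such fragments. It covers only that normal-play-time subset; the spec's other time formats are ignored. A player would then seek to the start time. How my site actually wires this to its audio elements isn't shown.

```python
from __future__ import annotations

from urllib.parse import parse_qs


def _npt_seconds(value: str) -> float:
    """Convert '54', '1:30', or '0:01:30.5' to seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_fragment(fragment: str) -> tuple[float, float | None] | None:
    """Parse '#t=54' or 't=npt:10,20' into (start, end); None if absent."""
    params = parse_qs(fragment.lstrip("#"))
    if "t" not in params:
        return None
    value = params["t"][0]
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start, _, end = value.partition(",")
    return (_npt_seconds(start) if start else 0.0,
            _npt_seconds(end) if end else None)


def time_link(url: str, start: float, end: float | None = None) -> str:
    """Build a link that starts (and optionally stops) playback."""
    t = f"{start:g}" if end is None else f"{start:g},{end:g}"
    return f"{url}#t={t}"
```

A list of `time_link(episode_url, seconds)` calls, one per section, is all it would take to build the kind of index described above.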