Marty McGuire

Posts Tagged indieweb

2023
Wed Apr 26

Bad web dev ideas: emoji as IDs in URLs

It's another rambling web dev post. Sorry not sorry. Grab a beverage and let's go.

Begin with the Horrible Admission 🤫😳

Rather than make you wait, here it is:

I created and maintain An IndieWeb Webring and it uses emojis as IDs. When you add your site to the webring by signing in, an emoji ID is created for you, and it acts, in some ways, like your username or profile ID.

For example my webring profile is: 🕸️💍.ws/🚯

Hopefully you see the appeal. The webring is on an emoji domain, which I registered at IndieWeb Summit 2018 inspired by something Doug Beal presented. It was soon decided that I must in fact build a webring there and, leveraging his influence as the originator of the idea, Doug insisted that emojis be used as user IDs. Probably because: moar emojis!

Why even have IDs? 🛂🤷🏻‍♂️

This is a dang good question, and I could probably sleep better if I dropped them.

Ring-ness and identifying click sources

At the time, the idea was that a webring should be a ring, with adjacency. In other words, if I click the "next site" link on your website, it should consistently go to the same site. If I click the "previous site" link on their site, it should consistently go back to yours. In order to consistently know which site to send a visitor to next, the webring needs to know which site they're coming from.

Historically there has been a way to know where a visitor is coming from: the HTTP "Referer" [sic] header. However, at the time (and in the time since) there have been privacy concerns around this header: it was not uncommon to see sites sabotage this info with purposeful redirects, browser standards like Referrer-Policy were added to allow degrading or removing referrer info, and Chrome even implemented default policies that degrade it.
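
To give a sense of how easy it is to turn off, here's a minimal sketch of the Referrer-Policy mechanism as a site might apply it from PHP (the header values are real standard values; the choice of examples is mine):

```php
<?php
// Ask browsers to drop the Referer header on outbound clicks entirely:
header('Referrer-Policy: no-referrer');
// Or degrade it to just the origin for cross-site requests, which is
// what Chrome now applies by default when a site specifies nothing:
header('Referrer-Policy: strict-origin-when-cross-origin');
```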

So the more reliable thing would be to put some identifier in the URL and trust that webring members wouldn't, like, weirdly spoof one another's unique webring URLs.

Profile pages

Another at-the-time thought was that webring members might want to "prove" that they were on the webring. An active profile page with rel-me verification linking back to their homepage could allow other sites and tools to say "oh hey! you're a member of this webring". Maybe we could show some stats there! At any rate, none of those use cases ever came into existence as far as I know. (Besides, stats are gross.)

Why not use <xyz> as an ID? 🤔💡

Some things I considered and discarded:

  • Just use the URL. URLs-appended-to-URLs require escaping and sanitizing and this gets ugly and long.
  • Use a hash of the URL. I like this approach (spoiler alert), but even short hashes are long and ugly.
  • Use a numeric ID like a database row. This would probably be fine but numeric IDs can be easily crawled (fine for a webring I guess??) and people get weird about low ID numbers sometimes (warning: orange site link). Also they are boring.
  • Something more clever. Well, we probably could have but this was a weekend hack kind of project.

The original sin 🍎🐍

There is a fun little NodeJS library called hash-emoji that takes any ol' thing and gives you back a string of emojis of the length you ask for. Under the hood, it computes a strong SHA256 hash in hexadecimal, parses that hexadecimal number, then uses modulo arithmetic to keep peeling off "digits", just like converting a number to another base, only the "digits" come from a collection of emojis that were in wide use back in 2017 when the library was created.
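
To make that concrete, here's a minimal sketch of the idea in PHP (my own reimplementation, not the library's code; the tiny emoji set is a stand-in, and GMP is used because, as we'll see shortly, this math needs arbitrary precision):

```php
<?php
// Sketch of the hash-emoji approach: hash the input, treat the digest as
// one big integer, then peel off base-N "digits" where N is the emoji count.
function emojiHash(string $input, int $length, array $emojis): string
{
    // SHA256 gives a 64-character hex string; parse it with GMP so none
    // of its ~77 decimal digits of precision are lost.
    $n = gmp_init(hash('sha256', $input), 16);
    $base = count($emojis);
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= $emojis[gmp_intval(gmp_mod($n, $base))]; // next "digit"
        $n = gmp_div_q($n, $base);                       // shift right
    }
    return $out;
}

// A stand-in emoji set; the real library ships a list of ~1300.
echo emojiHash('https://example.com/', 3, ['🌵', '🎈', '🚲', '🪐', '🧀']), "\n";
```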

I alluded to this in my post about adding a directory to the webring in 2019, but slapping a random emoji or two or three on someone's profile is potentially problematic. When some folks pointed out it was odd that they were assigned a country flag for a place they had never been, I cheekily forked hash-emoji to make hash-emoji-without-borders, which is identical in all ways except I pulled out all flags. Now nobody would get a flag!

However, by removing the flags I had shrunk the key space for the emoji hash. It wasn't great to begin with - using one emoji from the initial set of ~1300 was bound to lead to collisions eventually - but by changing the set I felt like I was increasing the likelihood that some new signup would be assigned the same emoji ID as an already existing site in the ring.

So, I bumped the length of new IDs up to 2. And then, after an unrelated change where I started normalizing URLs differently before creating their hash ID, up to 3.

Let me tell you. 3 emoji almost always tell a story. You can't not see them. Sometimes those stories can be problematic! Or maybe it's nice but you don't find it relatable. Or maybe there is no story, per se, but you're assigned something that you object to being associated with, like weapons or kissing faces or drugs and alcohol or religious symbols, or ...

Some folks also ran into issues with the URLs on their web hosting. While I tend to think of the web as being universally UTF-8, that's not necessarily the case. Some hosts would mangle the Unicode URLs, resulting in the webring not being able to find them, resulting in sites being de-listed from the ring. To make it easy for the widest possible range of webring users to simply copy their webring links and paste them into their site, I changed it to use the %-encoded versions of the emojis. So now my beautiful 🕸️💍.ws/🚯 has become the horrific: https://xn--sr8hvo.ws/%F0%9F%9A%AF
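
For what it's worth, that encoding isn't anything exotic: it's just each UTF-8 byte of the emoji escaped, which PHP's rawurlencode() (to pick the language I'm porting to) reproduces exactly:

```php
<?php
// 🚯 is U+1F6AF, which is the four UTF-8 bytes F0 9F 9A AF.
echo rawurlencode('🚯'), "\n"; // %F0%9F%9A%AF
```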

Instead of worrying too much about any of this I just ... left it alone. A few times I received requests from folks who wanted something custom, and sometimes I obliged, as long as it was a request for a unique one- or two-emoji ID, so it couldn't collide with someone's future random ID.

Fooling around, finding out 🥫🪱

I've been low-key working on porting the webring from its old and crumbling NodeJS implementation to (hear me out) PHP (I SAID HEAR ME OUT). The basic idea is to reduce the number of emails I get from GitHub where Dependabot reports a vulnerability in this or that dependency-of-a-dependency, and more importantly to reduce the amount of time that updating those dependencies takes by reducing the amount of breakage that occurs. I guess this is me saying that churn among dependencies for NodeJS apps feels more disruptive to me than I expect churn to be for PHP. Please don't @ me. Anyway.

hash-emoji is broken

When porting hash-emoji from Javascript to PHP I had some issues where the modulo and division math wasn't working. Turns out SHA256 digests, which are hex-encoded strings 64 characters in length, make for very large numbers (around 77 decimal digits) when you parse them to do number math on them. With basic PHP numeric types this was turning up junk: zeroes for every modulo division.
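
Here's a small demonstration of the failure mode (assuming PHP 8; the exact float value will vary with the input):

```php
<?php
// hexdec() on a 64-character digest silently overflows PHP's integers and
// returns a float, which holds only ~17 significant digits, so any modulo
// math downstream is operating on garbage.
$hex = hash('sha256', 'https://example.com/');
var_dump(strlen($hex)); // int(64)
var_dump(hexdec($hex)); // float, roughly 1.0E+76: most digits already gone
```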

So I tried out both of the main PHP extensions for arbitrary precision math, GMP and BC Math, and got results that were at least functional. However, they weren't the same as the Javascript hash-emoji implementation.

At least, they weren't the same until I updated my copy of hash-emoji to use BigInt to make sure it was doing its arbitrary precision math properly. At that point, the Javascript hash-emoji began consistently outputting the same results as my new PHP implementation.

That means the original hash-emoji algorithm, due to some quirks of Javascript Number math for large numbers, gives results that are not consistent with the same algorithm when using arbitrary precision math types.

With my skills I cannot hope to make a PHP port of hash-emoji that produces identically quirky results to the Javascript version, so, it looks like emoji IDs will have to change again.

Considering the abyss ⚫👀

I thought that, if the emoji ID generation has to change, maybe I can change it for the better? I brainstormed some ideas in the IndieWeb chat. One nice change would be to bring the emoji set up to date with at least Unicode 14 (published 2021, implemented widely during 2022). One major unsolved challenge would be to come up with some "unproblematic" set of emojis. For example, modifiers for skin tone and gender are widely supported, even in complex combinations like people kissing.

Thanks to sknebel for some helpful suggestions, like generating IDs without skin tone modifiers and stripping out skin tone modifiers before looking up an ID. This would allow webring users to customize any emoji that has skin tone variants. Maybe that could be expanded to customizing more things when there are variants by gender, or complex combinations like a family with two adults and two children, if it should come up. A "customize your ID" tool begins to design itself (lol).
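
The stripping half of that suggestion is pleasantly small. A sketch in PHP (the function name is mine):

```php
<?php
// Remove the five Fitzpatrick skin tone modifiers, U+1F3FB through U+1F3FF,
// so any customized variant maps back to the same base emoji ID.
function stripSkinTones(string $emoji): string
{
    return preg_replace('/[\x{1F3FB}-\x{1F3FF}]/u', '', $emoji);
}

echo stripSkinTones('👋🏽'), "\n"; // 👋
```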

sknebel also pointed me to an excellent resource that the Unicode consortium calls the "best definition of the full set [of emoji]" - the emoji-test.txt file. Here's the Unicode 14 emoji-test.txt file. It encodes each emoji in a line-oriented format, organized helpfully into groups and sub-groups like the ones you see on your favorite emoji keyboard.

I could parse this file out into various datasets annotated with their groupings and sub-groupings. Then, I could use those definitions to pull together any combination of (sub-)groupings that I want into different hash-emoji datasets.
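
A sketch of what that parsing might look like, assuming the file's documented line format (code points, a semicolon, a qualification status, then a comment with the rendered emoji) and its "# group:" / "# subgroup:" section markers:

```php
<?php
// Parse emoji-test.txt into [group][subgroup] => list of emoji, keeping
// only "fully-qualified" sequences (the form emoji keyboards produce).
function parseEmojiTest(string $path): array
{
    $sets = [];
    $group = $subgroup = '';
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        if (preg_match('/^# group: (.+)/', $line, $m)) {
            $group = $m[1];
        } elseif (preg_match('/^# subgroup: (.+)/', $line, $m)) {
            $subgroup = $m[1];
        } elseif (preg_match('/^[0-9A-F ]+;\s*fully-qualified\s*#\s*(\S+)/u', $line, $m)) {
            $sets[$group][$subgroup][] = $m[1];
        }
    }
    return $sets;
}
```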

I could expand a "customize your ID" tool to allow folks who don't like their initial ID to opt out of any groups they don't want emojis from. I could treat those groupings as a flag set and map that flag set to an emoji to prepend to their ID, so each combination of groups becomes its own key space. Nice!
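
That flag set idea, sketched (everything here is hypothetical, including the prefix table):

```php
<?php
// Hypothetical: each opt-out-able group gets a bit, the combined bitmask
// picks a prefix emoji, and the prefix marks which key space an ID lives in.
$groupBit = ['Smileys & Emotion' => 1, 'Food & Drink' => 2, 'Symbols' => 4];
$prefix   = [0 => '', 1 => '🔵', 2 => '🟢', 3 => '🟡', 4 => '🟠', 5 => '🟣', 6 => '🟤', 7 => '⚪'];

function idPrefix(array $optedOut, array $groupBit, array $prefix): string
{
    $mask = 0;
    foreach ($optedOut as $group) {
        $mask |= $groupBit[$group];
    }
    return $prefix[$mask];
}

echo idPrefix(['Food & Drink', 'Symbols'], $groupBit, $prefix), "\n"; // 🟤
```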

Reader, let me tell you: I do not want to do all of that.

End with the other Horrible Admission 🤫😏

As implemented, the webring isn't a true ring. Whether a visitor clicks your custom emoji /next link or your custom emoji /previous link, the webring, in fact, sends them to another active site in the ring at random.

So what's next? I think the webring can function fine without these IDs. The copy-paste webring links can become identical for everyone, and the directory, sign-in, and dashboard pages don't make use of them at all. The one exception is individual profile page URLs for member pages, which I think I can safely drop.

What do you think? Are you horrified by any of this? Enraged? Got an idea I can try instead? Drop me a reply, I'd love to hear from you!

Sat Apr 22

This Week in the IndieWeb Audio Edition • April 15th - 21st, 2023


Unblocking your blogging, Twitter login troubles, and planning IndieWebCamp Nuremberg. It’s the audio edition for This Week in the IndieWeb for April 15th - 21st, 2023.

You can find all of my audio editions and subscribe with your favorite podcast app here: martymcgui.re/podcasts/indieweb/.

Music from Aaron Parecki’s 100DaysOfMusic project: Day 85 - Suit, Day 48 - Glitch, Day 49 - Floating, Day 9, and Day 11

Thanks to everyone in the IndieWeb chat for their feedback and suggestions. Please drop me a note if there are any changes you’d like to see for this audio edition!

Sun Apr 16

This Week in the IndieWeb Audio Edition • April 8th - 14th, 2023


IndiePass development resumes, public problems with Webmention, and NPR flies the nest. It’s the audio edition for This Week in the IndieWeb for April 8th - 14th, 2023.

You can find all of my audio editions and subscribe with your favorite podcast app here: martymcgui.re/podcasts/indieweb/.

Music from Aaron Parecki’s 100DaysOfMusic project: Day 85 - Suit, Day 48 - Glitch, Day 49 - Floating, Day 9, and Day 11

Thanks to everyone in the IndieWeb chat for their feedback and suggestions. Please drop me a note if there are any changes you’d like to see for this audio edition!

Wed Apr 12

IndieWeb dev note: Microsub isn't a general-purpose storage API

This is probably relevant only to very few people and likely only for myself the next time I think up an idea along these lines.

Obligatory background

I'm a heavy daily user of IndieWeb-style social readers. For my setup, I run my own copy of Aaron Parecki's Aperture, a backend service which manages my subscriptions, fetches feeds, and organizes everything into "channels". On my reading devices, I use an app like Aaron Parecki's Monocle, which talks to Aperture to fetch and display my channels and the entries for each, mark things as read, and more.

These tools speak a protocol called Microsub, which defines a simple HTTP protocol for all those things Monocle does. It specifies how a client can ask a server for channels, list entries in a channel, mark things as read, delete things, add new subscriptions, and so on.
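
For flavor, the read side looks like this: a sketch with placeholder endpoint and token, where the action names and the shape of the channels response come from the Microsub spec:

```php
<?php
// Microsub is just authenticated GETs (and a few POSTs) with an "action"
// parameter. List channels, then pull the first channel's timeline.
$endpoint = 'https://example.com/microsub'; // placeholder
$token = 'REPLACE_ME';                      // placeholder
$ctx = stream_context_create(['http' => [
    'header' => "Authorization: Bearer $token\r\n",
]]);

$channels = json_decode(file_get_contents("$endpoint?action=channels", false, $ctx), true);
$uid = $channels['channels'][0]['uid'];
$timeline = json_decode(file_get_contents("$endpoint?action=timeline&channel=" . urlencode($uid), false, $ctx), true);
```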

One bonus feature that Aperture has, but that is not part of the Microsub (with an "s") spec, is that in addition to subscribing to feeds, you can also push content into channels using a different protocol called Micropub. Though they are off by one letter, they do very different things! Micropub (with a "p") is a specification for authoring tools to help you make posts to your personal site, with extensions that also allow for searching posts, updating content, and much more. In Aperture's case, Micropub support is quite minimal - it can be used to push a new entry into a channel, and that's it. It's designed for systems that might not have a public feed, or that create events in real time.
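
Pushing an entry that way is a single form-encoded POST. A sketch, again with placeholder endpoint and token:

```php
<?php
// A minimal Micropub create request: h=entry plus some content, authorized
// with a bearer token (Aperture hands out a per-channel token for this).
$endpoint = 'https://example.com/micropub'; // placeholder
$token = 'REPLACE_ME';                      // placeholder
$ctx = stream_context_create(['http' => [
    'method'  => 'POST',
    'header'  => "Authorization: Bearer $token\r\n"
               . "Content-Type: application/x-www-form-urlencoded\r\n",
    'content' => http_build_query(['h' => 'entry', 'content' => 'Hello, channel!']),
]]);
file_get_contents($endpoint, false, $ctx);
```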

Okay but what's the problem?

I use Aperture to follow some YouTube channels, so I don't have to visit the distraction-heavy YouTube app to see when my favorite channels update. This is possible because every YouTube channel has an RSS feed! What matters is that a good feed reader can take the URL for a YouTube channel (like the one for IndieWebCamp on YouTube) and parse the page to find its feed (in this case, https://www.youtube.com/feeds/videos.xml?channel_id=UCco4TTt7ikz9xnB35HrD5gQ).
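
That discovery step is plain old HTML parsing. A sketch, assuming YouTube still advertises the feed with a <link rel="alternate"> tag in the channel page's head:

```php
<?php
// Fetch a channel page and look for its advertised feed URL.
$html = file_get_contents('https://www.youtube.com/channel/UCco4TTt7ikz9xnB35HrD5gQ');
if (preg_match('/<link[^>]+type="application\/rss\+xml"[^>]+href="([^"]+)"/', $html, $m)) {
    echo html_entity_decode($m[1]), "\n"; // .../feeds/videos.xml?channel_id=...
}
```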

YouTube also provides feeds for playlists, and maybe more! It's a fun way to pull content, and they even support push notifications for these feeds via a standard called WebSub.

But! (of course there's a but!) YouTube's feeds encode some useful information, like the URL for a video's thumbnail image, and the description for the video, using an extension of RSS called Media RSS. This isn't recognized by Aperture, and it also isn't recognized by my go-to feed munging service Granary. As a result, while I can see when a new video is posted by the channels I follow, they... don't look like much!

Screenshot of feed reader Monocle showing YouTube videos. Each entry includes the title and URL of the channel, the title of the video, and when it was posted. And that's it.

All I can see is that a given channel posted a video, and the title of the video.

Okay can we get to the point?

I'd like to fix this, and my first (wrong) thought was: since Aperture already has these not-very-good entries, maybe I can make an automated system that:

  • act as a Microsub client to fetch each entry from my YouTube Subscriptions channel
  • look at each to see if it's missing information like the thumbnail
  • for each entry with missing info, look up that info directly from YouTube, maybe via their API
  • somehow update the entry with this info.

Again, this is ... the wrong mental model. But why? The docs for Aperture, the Microsub backend, give us a hint where they cover how to write Microsub clients.

Aperture has implemented the following actions in the Microsub spec:

Nowhere in that list is the ability to update or even create entries. Those things are outside the scope of the spec. The spec is intentionally narrow in describing how clients can manage channels and subscriptions, and mark as read or delete entries pulled from those subscriptions. That's it! And that's good!

Remembering that the "write API" I was thinking of was actually Micropub (with a "p"), I took a look at the source for Aperture that handles Micropub requests and it does refreshingly few things. It allows creating new entries from a Micropub payload, and it supports uploading media that would go along with a payload. That's it. And that's good!

At this point, I thought I could still continue down my wrong-idea road. The automated system would:

  • act as a Microsub (with an "s") client to fetch each entry from my YouTube Subscriptions channel
  • look at each to see if it's missing information like the thumbnail
  • for each entry with missing info, look up that info directly from YouTube, maybe via their API
  • use Microsub to delete the original entry
  • use Micropub (with a "p") to create a new entry with all the new details

This approach... should work! However, it certainly felt like I was working against the grain.

I brought this up in the IndieWeb dev chat, where Aaron and Ryan cleared things up. Microsub is intentionally simple, and adding general operations to let clients treat the server like a general data store is way out of scope. Similarly, while Aperture supports some of Micropub, that's a choice specific to Aperture.

Have we learned anything?

The general consensus was that entries should get into a Microsub server like Aperture via feeds. And if the feeds I'm looking at don't have the content I want, I should make a feed that does! I should be able to make a proxy service that:

  • accepts a URL for a YouTube channel or playlist feed,
  • fetches the feed,
  • extracts everything I want from each entry, including thumbnails, and maybe even uses the YouTube API to get info like video length,
  • rewrites that in a feed format that Aperture likes better. Probably just HTML with microformats2 to make an h-feed (sketched below).
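
Here's a rough sketch of that proxy's core loop, under a couple of assumptions: YouTube's feed is Atom with a media:group per entry (that's the Media RSS extension mentioned earlier), and a bare-bones h-feed is enough for Aperture. Error handling and the YouTube API lookup are left out.

```php
<?php
// Turn a YouTube channel/playlist feed into a microformats2 h-feed,
// carrying over the thumbnail and description that Media RSS provides.
$feedUrl = 'https://www.youtube.com/feeds/videos.xml?channel_id=UCco4TTt7ikz9xnB35HrD5gQ';
$xml = simplexml_load_string(file_get_contents($feedUrl));

echo "<div class=\"h-feed\">\n";
foreach ($xml->entry as $entry) {
    $media = $entry->children('http://search.yahoo.com/mrss/')->group;
    printf(
        "<article class=\"h-entry\">\n" .
        "  <a class=\"u-url p-name\" href=\"%s\">%s</a>\n" .
        "  <img class=\"u-photo\" src=\"%s\" alt=\"\">\n" .
        "  <p class=\"p-summary\">%s</p>\n" .
        "  <time class=\"dt-published\" datetime=\"%s\">%s</time>\n" .
        "</article>\n",
        htmlspecialchars((string) $entry->link->attributes()['href']),
        htmlspecialchars((string) $entry->title),
        htmlspecialchars((string) $media->thumbnail->attributes()['url']),
        htmlspecialchars((string) $media->description),
        htmlspecialchars((string) $entry->published),
        htmlspecialchars((string) $entry->published)
    );
}
echo "</div>\n";
```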

For each of my YouTube subscriptions, I'll swap out the YouTube RSS for the new proxied feed - something that the Microsub API is intended to let me automate.

One thing I mentioned in the chat discussion I linked above: I default to thinking of feed services like this proxy as "public infrastructure" - something that has to be on the public web, with all the maintenance and security issues that go along with that.

However, as I self-host my own Aperture instance, I can set up this proxy on the same server and ensure that it only accepts local requests. Fewer public endpoints, fewer problems.

Anyway, maybe I'll get that done and posted about in the near future, but today I just wanted to get these thoughts out of my head and close some tabs!

Fri Apr 7

This Week in the IndieWeb Audio Edition • April 1st - 7th, 2023


Fun facts about Feedly, going indie with 11ty, and bye bye, Twitter API. It’s the audio edition for This Week in the IndieWeb for April 1st - 7th, 2023.

You can find all of my audio editions and subscribe with your favorite podcast app here: martymcgui.re/podcasts/indieweb/.

Music from Aaron Parecki’s 100DaysOfMusic project: Day 85 - Suit, Day 48 - Glitch, Day 49 - Floating, Day 9, and Day 11

Thanks to everyone in the IndieWeb chat for their feedback and suggestions. Please drop me a note if there are any changes you’d like to see for this audio edition!

Sat Apr 1

This Week in the IndieWeb Audio Edition • March 25th - 31st, 2023


Test suites are totally sweet, and a virtual face-to-face to forward the Fediverse. It’s the audio edition for This Week in the IndieWeb for March 25th - 31st, 2023.

You can find all of my audio editions and subscribe with your favorite podcast app here: martymcgui.re/podcasts/indieweb/.

Music from Aaron Parecki’s 100DaysOfMusic project: Day 85 - Suit, Day 48 - Glitch, Day 49 - Floating, Day 9, and Day 11

Thanks to everyone in the IndieWeb chat for their feedback and suggestions. Please drop me a note if there are any changes you’d like to see for this audio edition!

Sat Mar 25

This Week in the IndieWeb Audio Edition • March 18th - 24th, 2023


New community members, upcoming social web events, and digesting digests. It’s the audio edition for This Week in the IndieWeb for March 18th - 24th, 2023.

You can find all of my audio editions and subscribe with your favorite podcast app here: martymcgui.re/podcasts/indieweb/.

Music from Aaron Parecki’s 100DaysOfMusic project: Day 85 - Suit, Day 48 - Glitch, Day 49 - Floating, Day 9, and Day 11

Thanks to everyone in the IndieWeb chat for their feedback and suggestions. Please drop me a note if there are any changes you’d like to see for this audio edition!

Sat Mar 18

This Week in the IndieWeb Audio Edition • March 11th - 17th, 2023


IndieWebCamp Düsseldorf planning, A People’s History of Twitter, and an AI’s history of James. It’s the audio edition for This Week in the IndieWeb for March 11th - 17th, 2023.

You can find all of my audio editions and subscribe with your favorite podcast app here: martymcgui.re/podcasts/indieweb/.

Music from Aaron Parecki’s 100DaysOfMusic project: Day 85 - Suit, Day 48 - Glitch, Day 49 - Floating, Day 9, and Day 11

Thanks to everyone in the IndieWeb chat for their feedback and suggestions. Please drop me a note if there are any changes you’d like to see for this audio edition!

Sat Mar 11

This Week in the IndieWeb Audio Edition • March 4th - 10th, 2023


IndieWebCamp Düsseldorf planning, federated formatting, and BlueSky’s beta is brewing. It’s the audio edition for This Week in the IndieWeb for March 4th - 10th, 2023.

You can find all of my audio editions and subscribe with your favorite podcast app here: martymcgui.re/podcasts/indieweb/.

Music from Aaron Parecki’s 100DaysOfMusic project: Day 85 - Suit, Day 48 - Glitch, Day 49 - Floating, Day 9, and Day 11

Thanks to everyone in the IndieWeb chat for their feedback and suggestions. Please drop me a note if there are any changes you’d like to see for this audio edition!

Sat Mar 4

This Week in the IndieWeb Audio Edition • February 25th - March 3rd, 2023


Wiki spring cleaning, a false Fediverse, and how do you hashtag? It’s the audio edition for This Week in the IndieWeb for February 25th - March 3rd, 2023.

You can find all of my audio editions and subscribe with your favorite podcast app here: martymcgui.re/podcasts/indieweb/.

Music from Aaron Parecki’s 100DaysOfMusic project: Day 85 - Suit, Day 48 - Glitch, Day 49 - Floating, Day 9, and Day 11

Thanks to everyone in the IndieWeb chat for their feedback and suggestions. Please drop me a note if there are any changes you’d like to see for this audio edition!