Bad web dev ideas: emoji as IDs in URLs

It's another rambling web dev post sorry not sorry grab a beverage and let's go.

Begin with the Horrible Admission 🤫😳

Rather than make you wait, here it is:

I created and maintain An IndieWeb Webring and it uses emojis as IDs. When you add your site to the webring by signing in, an emoji ID is created for you, and it acts, in some ways, like your username or profile ID.

For example my webring profile is: 🕸️💍.ws/🚯

Hopefully you see the appeal. The webring is on an emoji domain, which I registered at IndieWeb Summit 2018 inspired by something Doug Beal presented. It was soon decided that I must in fact build a webring there and, leveraging his influence as the originator of the idea, Doug insisted that emojis be used as user IDs. Probably because: moar emojis!

Why even have IDs? 🛂🤷🏻‍♂️

This is a dang good question and I could probably sleep better if I drop them.

Ring-ness and identifying click sources

At the time, the idea was that a webring should be a ring, with adjacency. In other words, if I click the "next site" link on your website, it should consistently go to the same site. If I click the "previous site" link on their site, it should consistently go back to yours. In order to consistently know which site to send them to next, the webring needs to know where you're coming from.

Historically there has been a way to know where a visitor is coming from: the HTTP "Referer" [sic] header. However, at the time (and in the time since) there have been privacy issues around using this header, and it was not uncommon to see sites sabotage this info with purposeful redirects, browser standards were added to allow degrading or removing referrer info, and Chrome even implemented default policies to degrade it.

So the more reliable thing would be to put some identifier in the URL and trust that webring members wouldn't like weirdly spoof one anothers' unique webring URLs.

Profile pages

Another at the time thought was that webring members might want to "prove" that they were on the webring. An active profile page with rel-me verification linking back to their homepage could allow other sites and tools to say "oh hey! you're a member of this webring". Maybe we could show some stats there! At any rate, none of those use cases ever came into existence as far as I know. (Besides, stats are gross.)

Why not use <xyz> as an ID? 🤔💡

Some things I considered and discarded:

Just use the URL. URLs-appended-to-URLs require escaping and sanitizing and this gets ugly and long.
Use a hash of the URL. I like this approach (spoiler alert) but even when they are short hashes are also long and ugly.
Use a numeric ID like a database row. This would probably be fine but numeric IDs can be easily crawled (fine for a webring I guess??) and people get weird about low ID numbers sometimes (warning: orange site link). Also they are boring.
Something more clever. Well, we probably could have but this was a weekend hack kind of project.

The original sin 🍎🐍

There is a fun little NodeJS library called hash-emoji that takes any-ol-thing and gives you back a string of emojis of the length you ask. Under the hood, it uses a strong SHA256 hash in hexadecimal, parses that hexadecimal number, then uses modulo arithmetic to keep adding "digits" just like any hash algorithm, only the "digits" come from a collection of emojis that were in wide use back in 2017 when the library was created.

I alluded to this in my post about adding a directory to the webring in 2019, but slapping a random emoji or two or three on someone's profile is potentially problematic. When some folks pointed out it was odd that they were assigned a country flag for a place they had never been, I cheekily forked hash-emoji to make hash-emoji-without-borders, which is identical in all ways except I pulled out all flags. Now nobody would get a flag!

However, by removing the flags I had shrunk the key space for the emoji hash. It wasn't great to begin with - using one emoji from the initial set of ~1300 was bound to lead to collisions eventually - but by changing the set I felt like I was increasing the likelihood that some new signup would be assigned the same emoji ID as an already existing site in the ring.

So, I bumped the length of new IDs up to 2. And then, after an unrelated change where I started normalizing URLs differently before creating their hash ID, up to 3.

Let me tell you. 3 emoji almost always tell a story. You can't not see them. Sometimes those stories can be problematic! Or maybe it's nice but you don't find it relatable. Or maybe there is no story, per se, but you're assigned something that you object to being associated with, like weapons or kissing faces or drugs and alcohol or religious symbols, or ...

Some folks also ran into issues with the URLs on their web hosting. While I tend to think of the web as being universally UTF-8, that's not necessarily the case. Some hosts would mangle the Unicode URLs, resulting in the webring not being able to find them, resulting in sites being de-listed from the ring. To make it easy for the widest possible of webring users to simply copy their webring links and paste them into their site, I change it to use the %-encoded versions of the emojis. So now my beautiful 🕸️💍.ws/🚯 has become the horrific: https://xn--sr8hvo.ws/%F0%9F%9A%AF

Instead of worrying too much about any of this I just ... left it alone. A few times I received requests for folks who wanted something custom, and sometimes I obliged as long as it was a request for a unique one- or two-emoji ID, so it couldn't collide with someone's future random ID.

Fooling around, finding out 🥫🪱

I've been low-key working on porting the webring from its old and crumbling NodeJS implementation to (hear me out) PHP (I SAID HEAR ME OUT). The basic idea is to reduce the number of emails I get from GitHub where dependabot reports a vulnerability in this or that dependency-of-a-dependency, and more importantly to reduce the amount of time that updating those dependencies takes by reducing the amount of breakage that occurs. I guess this is me saying that churn among dependencies for NodeJS apps feels more disruptive to me than I expect churn to be for PHP. Please don't @ me. Any way.

hash-emoji is broken

When porting hash-emoji from Javascript to PHP I had some issues where the modulo and division math wasn't working. Turns out SHA256 digests, which are hex-encoded strings 64-characters in length, make for very large numbers when you represent them as numbers to do number math on them. With basic PHP numeric types this was turning up junk, zeroes for every modulo division.

So I tried out both of the main PHP extensions for arbitrary precision math, GMP and BC Math, and got results that were at least functional. However, they weren't the same as the Javascript hash-emoji implementation.

At least, they weren't the same until I updated my copy of hash-emoji to use BigInt to make sure it was doing its arbitrary precision math properly. It was at this point that Javascript hash-emoji now began consistently outputting the same results as my new PHP implementation.

That means the original hash-emoji algorithm, due to some quirks of Javascript Number math for large numbers, gives results that are not consistent with the same algorithm when using arbitrary precision math types.

With my skills I cannot hope to make a PHP port of hash-emoji that produces identically quirky results to the Javascript version, so, it looks like emoji IDs will have to change again.

Considering the abyss ⚫👀

I thought that, if the emoji ID generation has to change, maybe I can change it for the better? I brainstormed some ideas in the IndieWeb chat. One nice change would be to bring the emoji set up-to-date to at least Unicode 14 (published 2021, implemented widely during 2022). One major unsolved challenge would be to come up with some "unproblematic" set of emojis. For example, modifiers for skin tone and gender are widely supported, even in complex combinations like people kissing.

Thanks to sknebel some helpful suggestions, like generating IDs without skin tone modifiers and stripping out skin tone modifiers before looking up an ID. This would allow webring users to customize any emoji that have skin tone variants. Maybe that could be expanded to customizing more things when there are variants by gender, or complex combinations like family with two adults and two children, if it should come up. A "customize your ID" tool begins to design itself (lol).

sknebel also pointed me to an excellent resource that the Unicode consortium calls the "best definition of the full set [of emoji]" - the emoji-test.txt file. Here's the Unicode 14 emoji-test.txt file. It encodes each emoji in a line-oriented format, organized helpfully into groups and sub-groups like the ones you see on your favorite emoji keyboard.

I could parse this file out into various datasets annotated with their groupings and sub-groupings. Then, I could use those definitions to pull together any combination of (sub-)groupings that I want into different hash-emoji datasets.

I could expand a "customize your ID" tool to allow folks who don't like their initial ID to opt out of any groups they don't want emojis from. I could treat those groupings as a flag set and map that flag set to an emoji to prepend to their ID, so each combination of groups becomes its own key space. Nice!

Reader, let me tell you: I do not want to do all of that.

End with the other Horrible Admission 🤫😏

As implemented, the webring isn't a true ring. Whether a visitor clicks your custom emoji /next link or your custom emoji /previous link, the webring, in fact, sends you to another active site in the ring at random.

So what's next? I think the webring can function fine without these IDs. The copy-paste webring links can become identical for everyone, and the directory, sign-in, and dashboard pages don't make use of them at all. The one exception is individual profile page URLs for member pages, which I think I can safely drop.

What do you think? Are you horrified by any of this? Enraged? Got an ideas I can try instead? Drop me a reply, I'd love to hear from you!

#🕸️💍 #webring #IndieWeb #update #emoji

🔗 April 26, 2023 at 11:38PM EDT • by

Marty McGuire

Site updates: showing emoji reactions

Inspired by Eddie Hinkle's recent post about viewing webmentions, I decided to improve the way I display webmentions on my site.

TL;DR, my site now pulls attempts to recognize single-emoji comments and display them as a "Reaction".

Slightly longer version - my site uses webmention.io for handling webmentions, and I use brid.gy to backfeed interactions from Facebook to my own site. The way brid.gy handles Facebook reactions other than the standard "like" is a little quirky - they show up in webmention.io as a "reply" with a single emoji as the "content".

Using the Ruby twemoji library, my site checks the "content" of a reply against the emoji index and, if the content is a single emoji, pulls it out of the usual "reply" display and puts it in a facepile. The emoji itself is shown as an icon in the corner of the little face image.

Example of some ❤️ reactions from Facebook

While I was at it, I cleaned up a lot of my webmention-handling template to make things much clearer. This will make things easier for folks that want to re-use this code when I (eventually) release this as a Jekyll plugin.

Tonight is a Homebrew Website Club night, but Baltimore is not having another official meetup until April 19th. Still, I wanted to get something done to continue my deal with Jonathan to post something IndieWeb related at least once per week.

#site-update #IndieWeb #Webmention #emoji

🔗 April 5, 2017 at 4:25PM EDT • by