Marty McGuire

Posts Tagged matrix.org

2019
Fri May 10

Archiving rooms from a Matrix.org Homeserver (including end-to-end encrypted rooms)

I'm in the middle of a Forever Project, migrating stuff and services off of an old server in my closet at home onto a new (smaller, better, faster!) server in my closet at home.

One such service is a Matrix.org Synapse homeserver that was used as a private Slack-alternative chat for my household, as well as a bridge to some IRC channels. I set it up by hand in haste some years ago and made some not-super-sustainable choices about it, including leaving the database in SQLite (2.2GB and feelin' fine), not documenting my DNS and port-forwarding setup very well, and a few other "oopsies".

I had been keeping the code up to date via "pip install" and the latest "master" tarballs, but when the announcement came about needing valid TLS for federation starting in 0.99.X, I wasn't sure if I was good to upgrade. (I later found out that I was okay, ha!)

I found some docs on the most recent ways to set up Matrix on a new server, and even on how to migrate from SQLite to PostgreSQL. However, I don't know if I'll be able to set aside the time to do it all at once, or if it'll be easier just to set it up fresh, or even if I need a homeserver right now. So, I decided to figure out how to make archives of the rooms I cared about, which included household conversations, recipes, and photos from around the house and on travels.

Overview

The process turned out to be pretty involved, which is why it gets a blog post! It boils down to needing these three things:

  • osteele/matrix-archive - Export a Matrix room message archive and photos.
  • matrix-org/pantalaimon - A proxy to handle end-to-end encrypted (E2EE) room content for matrix-archive
  • matrix-org/Olm - C library to handle the actual E2EE processing. Pantalaimon relies on this library and it's Python extensions.

Getting all the tools built required a pretty recent system, which my old server ain't. I ended up building and running them on my personal laptop, running Ubuntu 19.04.

Since both matrix-archive and pantalaimon are Python-based, I created a Python 3.7 virtualenv to keep everything in, rather than installing everything system-wide.

Olm

The Olm docs recommend building with CMake, but as someone unfamiliar with CMake I could get it to build and run tests, but could not actually get it installed on my system.

I ended up installing the main lib with:

  make && sudo make install

The Python extensions were a challenge and I am not sure that I remember all the details to properly document them here. I spent a good amount of time trying to follow the Olm instructions to get them installed into my Python virtualenv.

In the end, the pantalaimon install built its own version of the Python Olm extensions, so I'm going to guess this was enough for now.

Pantalaimon

The pantalaimon README was pretty straightforward, once I installed Olm system-wide. I activated my virtualenv and ran:

  python setup.py install

That resulted in a "pantalaimon" script installed in my virtualenv's bin dir, so I could (in theory) run it on the command line, pointing it at my running Synapse server:

  pantalaimon https://matrix.example.com:8448

That started a service on http://127.0.0.1:8009/ which matrix-archive would connect over, with pantalaimon handling all the E2EE decryption transparently.

matrix-archive

The matrix-archive setup instructions suggest using a dependency manager called "Pipenv" that I was not familiar with. I installed it in my virtualenv, then ran it to setup and install matrix-archive:

  pip install pipenv
  pipenv install

Pipenv "noticed" it was running in a virtualenv, and said so. This didn't seem to be much of a problem, but any command I tried to run with "pipenv run" would fail. I worked around this by looking in the "Pipfile" to see what commands were actually being run, and it turns out it was just calling specific Python scripts in the matrix-archive directory. So, I resolved to run those by hand.

MongoDB

matrix-archive requires MongoDB. I don't use it for anything else, so I had to "sudo apt install mongodb-server".

Running the Import

First, I set the environment variables needed by matrix-archive:

  export MATRIX_USER=<my username>
  export MATRIX_PASSWORD=<my password>
  export MATRIX_HOST=http://127.0.0.1:8009

Then confirmed it was working by getting a list of rooms with IDs:

  python list_rooms.py

I set up the list of room IDs in an environment variable:

  export MATRIX_ROOM_IDS=!room@server,!room2@server,...

And slurped in all the messages with:

  python import_messages.py

At the end, it said it had a bunch of messages. Hooray!

Running the Export

This is where things kind of ran off the rails. In trying to export messages I kept seeing Python KeyErrors about a missing 'info' key. It seems like maybe the Matrix protocol was updated to make this an optional key, but the upshot was that matrix-archive seemed to assume that every message with an image attached would have an 'info' with info about a thumbnail for that image.

Additionally, the script to download images had some naive handling for turning attachment URLs like "mxc://example.com/..." into downloadable URLs. Matrix supports DNS-based delegation, so you can say "the Matrix server for example.com is matrix.example.com:8448, and this script didn't handle that.

I did some nasty hacks to only get full-sized images, and from the right host:

  • updated the schema to return the full image URL instead of digging in for a thumbnail
  • added handling to export_messages.py to handle missing 'info', which was used to guess image mimetypes
  • added some hardcoding to map converted "mxc://" URLs to the right host.

Afterwards I was able to do an export of alllllll the images to a "images/" folder:

  python download_images.py --no-thumbnails

And could then export a particular room's history with:

  python export_messages.py --room-id ROOM-NAME --local-images --filename ROOM-NAME.html

Note that the "--room-id" flag above actually wants the human-readable room name, unless it's actually a room on the main matrix.org server.

Afterwards, I could open room-name.html in my browser, and see the very important messages and images I worked so hard to archive.

a message exchange. marty asks maktrobot to 'pug me'. maktrobot responds with an image of a pug.
Screenshot from the (very minimal) HTML export, including an image (of a pug, sourced by a chat bot).

What's Next?

For now, I'll be putting these files and images in a safe backup and not worrying about them too much, because I have them. I've already stopped my old Synapse server, and can tackle setting up the new one at my leisure. We've moved our house chats to Signal, and I've moved my IRC usage over to bridged Slack channels.

Running a Matrix Synapse homeserver for the past couple of years has been quite interesting! I really appreciate the hard working community (especially in light of their recent infrastructure troubles), and I recognize that it's a ton of work to build a federating network of real-time, private communication. I enjoyed the freedom of having my own chat service to run bots, share images, and discuss private moments without worrying about who might be reading the messages now or down the road.

That said, there are still some major usability kinks to work out. The end-to-end encryption rollout across homeservers and clients hasn't been the smoothest, and it can be an issue juggling E2EE keys across devices. I look forward to seeing how the community addresses issues like these in the future!

TL;DR - saving an archive of a room's history and files should not be this hard.

2017
Fri May 26

Site Updates: Webmention Notifications in Matrix with Hubot

Jonathan Prozzi and I have challenged one another to make a post about improving our websites once a week. This is me getting back on the train!

In a previous site update I wrote about setting up a system to notify me whenever my site received webmentions. Essentially, this meant that I could now get notifications on my phone and desktop whenever somebody interacted with my site, such as: replying to one of my posts on their own site, retweeting or favoriting one of my posts, or even RSVPs to my Facebook events.

One thing I didn't super like about this system is that it used the Pushbullet service which, while great, is not under my control.

I've been running a Matrix chat server at home for a while now. I primarily use it to chat with people in my household in IRC channels. I use a really nice client for Matrix called Riot, which runs in the browser, but is also available on Android and iOS, and is capable of sending notifications about chat events, which I have found really handy.

Recently, I've added a chatbot to my Matrix server named Hubot, thanks to the Hubot-Matrix adapter. Hubot is super neat because it is fairly easy to script up new behaviors, and it has nice built-in support for the web - both for making web requests, but Hubot also runs a server for accepting web requests. Once I realized this, it occurred to me that I could replace my previous notification system that uses Pushbullet with one that goes through Hubot.

First, a note on security. Exposing a chatbot's HTTP listener interface to the great wide internet comes at some risk! I made sure to the following:

  • I run Hubot behind a firewall, so no plain HTTP traffic can come directly across the internet.
  • Using another home server, I set up nginx to act as a secure HTTPS proxy, using a certificate from Let's Encrypt to encrypt all traffic that goes over the internet.
  • I decided that any behaviors I write for Hubot that use the HTTP listener will use some kind of secret token to ensure that the request is valid. I don't want spammers blowing up my chatrooms!

I decided that the bot should:

  • Allow a user to request webmention.io notifications for a given site into any room.
  • Generate and store a "callback secret" to work with webmention.io's Web Hook system and tell the user the URL and callback secret to configure over on the Webmention.io Dashboard.
  • Accept HTTP requests from webmention.io at something like <HUBOT_HOST>/hubot/wmio/notify
  • Verify that the request contains the callback secret
  • Generate a nice text summary of the notification based on its contents
  • Send the notification to the room that the user was in when they made the follow request.

With that in mind, I began learning lots about testing Hubot scripts, refreshing myself on Coffeescript, and so on.

I am now happy to introduce this first (janky) release of my Hubot Script, hubot-webmentionio-notify!

Once installed, you can start a conversation with your hubot and ask it to follow a site:

  you> hubot wmio follow mycoolsite.biz
  
hubot> @you OK! Use this as your Web Hook: <HUBOT_URL>/hubot/wmio/notify And use this as your callback secret: 1a2b3c4d5e6f7890000

The string "mycoolsite.biz" can actually be anything and should be something easy to remember in case you want to unfollow notifications later. Hubot doesn't check incoming mentions against it at the moment.

You can enter the URL and callback secret in the Webmention.io dashboard, and future webmentions will be sent to your Hubot and output into the room of your choice.

Notification example - a user on Twitter mentioned my Twitter handle in a post there.

I don't know how useful hubot-webmentionio-notify will be for other folks at the moment, but I am excited be getting these notifications via services that I control. I look forward to building more fun things with Hubot!