Marty McGuire

Posts Tagged self-hosting

2019
Fri May 10

Archiving rooms from a Matrix.org Homeserver (including end-to-end encrypted rooms)

I'm in the middle of a Forever Project, migrating stuff and services off of an old server in my closet at home onto a new (smaller, better, faster!) server in my closet at home.

One such service is a Matrix.org Synapse homeserver that was used as a private Slack-alternative chat for my household, as well as a bridge to some IRC channels. I set it up by hand in haste some years ago and made some not-super-sustainable choices about it, including leaving the database in SQLite (2.2GB and feelin' fine), not documenting my DNS and port-forwarding setup very well, and a few other "oopsies".

I had been keeping the code up to date via "pip install" and the latest "master" tarballs, but when the announcement came about needing valid TLS for federation starting in 0.99.X, I wasn't sure if I was good to upgrade. (I later found out that I was okay, ha!)

I found some docs on the most recent ways to set up Matrix on a new server, and even on how to migrate from SQLite to PostgreSQL. However, I don't know if I'll be able to set aside the time to do it all at once, or if it'll be easier just to set it up fresh, or even if I need a homeserver right now. So, I decided to figure out how to make archives of the rooms I cared about, which included household conversations, recipes, and photos from around the house and on travels.

Overview

The process turned out to be pretty involved, which is why it gets a blog post! It boils down to needing these three things:

  • osteele/matrix-archive - Export a Matrix room message archive and photos.
  • matrix-org/pantalaimon - A proxy to handle end-to-end encrypted (E2EE) room content for matrix-archive
  • matrix-org/Olm - C library to handle the actual E2EE processing. Pantalaimon relies on this library and its Python extensions.

Getting all the tools built required a pretty recent system, which my old server ain't. I ended up building and running them on my personal laptop, running Ubuntu 19.04.

Since both matrix-archive and pantalaimon are Python-based, I created a Python 3.7 virtualenv to keep everything in, rather than installing everything system-wide.
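
Something like this, assuming Python 3.7 is available as python3.7 on your system (the virtualenv path is arbitrary):

  # create and activate a dedicated virtualenv for the archiving tools
  python3.7 -m venv ~/matrix-archive-venv
  source ~/matrix-archive-venv/bin/activate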

Olm

The Olm docs recommend building with CMake, but as someone unfamiliar with CMake, I could get it to build and run its tests but could not actually get it installed on my system.

I ended up installing the main lib with:

  make && sudo make install

The Python extensions were a challenge and I am not sure that I remember all the details to properly document them here. I spent a good amount of time trying to follow the Olm instructions to get them installed into my Python virtualenv.
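
For what it's worth, the Python bindings are also published on PyPI as python-olm, so something like the following inside the virtualenv might be enough, assuming libolm itself is installed system-wide as above:

  # possible shortcut: install the Olm Python bindings from PyPI
  pip install python-olm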

In the end, the pantalaimon install built its own version of the Python Olm extensions, so I'm going to guess this was enough for now.

Pantalaimon

The pantalaimon README was pretty straightforward, once I installed Olm system-wide. I activated my virtualenv and ran:

  python setup.py install

That resulted in a "pantalaimon" script installed in my virtualenv's bin dir, so I could (in theory) run it on the command line, pointing it at my running Synapse server:

  pantalaimon https://matrix.example.com:8448

That started a service on http://127.0.0.1:8009/ that matrix-archive could connect through, with pantalaimon transparently handling all the E2EE decryption.
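
A quick sanity check that the proxy is answering (this endpoint shouldn't require authentication, and pantalaimon should pass it through to the homeserver):

  # should return a JSON list of supported Matrix client API versions
  curl http://127.0.0.1:8009/_matrix/client/versions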

matrix-archive

The matrix-archive setup instructions suggest using a dependency manager called "Pipenv" that I was not familiar with. I installed it in my virtualenv, then ran it to set up and install matrix-archive:

  pip install pipenv
  pipenv install

Pipenv "noticed" it was running in a virtualenv, and said so. This didn't seem to be much of a problem, but any command I tried to run with "pipenv run" would fail. I worked around this by looking in the "Pipfile" to see what commands were actually being run, and it turns out it was just calling specific Python scripts in the matrix-archive directory. So, I resolved to run those by hand.

MongoDB

matrix-archive requires MongoDB. I don't use it for anything else, so I had to "sudo apt install mongodb-server".
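
The Ubuntu package should set up and start the service on its own; on a systemd-based system you can confirm with something like:

  # verify the MongoDB service is running (unit name may vary by release)
  sudo systemctl status mongodb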

Running the Import

First, I set the environment variables needed by matrix-archive:

  export MATRIX_USER=<my username>
  export MATRIX_PASSWORD=<my password>
  export MATRIX_HOST=http://127.0.0.1:8009

Then confirmed it was working by getting a list of rooms with IDs:

  python list_rooms.py

I set up the list of room IDs in an environment variable:

  export MATRIX_ROOM_IDS=!room:server,!room2:server,...

And slurped in all the messages with:

  python import_messages.py

At the end, it said it had a bunch of messages. Hooray!

Running the Export

This is where things kind of ran off the rails. In trying to export messages I kept seeing Python KeyErrors about a missing 'info' key. It seems like maybe the Matrix protocol was updated to make this an optional key, but the upshot was that matrix-archive seemed to assume that every message with an image attached would have an 'info' with info about a thumbnail for that image.

Additionally, the script to download images had some naive handling for turning attachment URLs like "mxc://example.com/..." into downloadable URLs. Matrix supports DNS-based delegation, so you can say "the Matrix server for example.com is matrix.example.com:8448", and this script didn't handle that.
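
To illustrate with a made-up media ID: an attachment URI like "mxc://example.com/abc123" has to be fetched from the delegated host via the Matrix media download endpoint, so the conversion needs to end up looking something like this:

  # example.com delegates Matrix traffic to matrix.example.com:8448, so the
  # mxc:// URI maps to a download URL on that host (media ID is hypothetical)
  curl -O https://matrix.example.com:8448/_matrix/media/r0/download/example.com/abc123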

I did some nasty hacks to only get full-sized images, and from the right host:

  • updated the schema to return the full image URL instead of digging in for a thumbnail
  • added handling to export_messages.py to handle missing 'info', which was used to guess image mimetypes
  • added some hardcoding to map converted "mxc://" URLs to the right host.

Afterwards I was able to do an export of alllllll the images to an "images/" folder:

  python download_images.py --no-thumbnails

And could then export a particular room's history with:

  python export_messages.py --room-id ROOM-NAME --local-images --filename ROOM-NAME.html

Note that the "--room-id" flag above actually wants the human-readable room name, unless it's actually a room on the main matrix.org server.

Afterwards, I could open room-name.html in my browser, and see the very important messages and images I worked so hard to archive.

[Image: a message exchange in which marty asks maktrobot to 'pug me', and maktrobot responds with an image of a pug.]
Screenshot from the (very minimal) HTML export, including an image (of a pug, sourced by a chat bot).

What's Next?

For now, I'll be putting these files and images in a safe backup and not worrying about them too much, because I have them. I've already stopped my old Synapse server, and can tackle setting up the new one at my leisure. We've moved our house chats to Signal, and I've moved my IRC usage over to bridged Slack channels.

Running a Matrix Synapse homeserver for the past couple of years has been quite interesting! I really appreciate the hard working community (especially in light of their recent infrastructure troubles), and I recognize that it's a ton of work to build a federating network of real-time, private communication. I enjoyed the freedom of having my own chat service to run bots, share images, and discuss private moments without worrying about who might be reading the messages now or down the road.

That said, there are still some major usability kinks to work out. The end-to-end encryption rollout across homeservers and clients hasn't been the smoothest, and it can be an issue juggling E2EE keys across devices. I look forward to seeing how the community addresses issues like these in the future!

TL;DR - saving an archive of a room's history and files should not be this hard.

2018
Sat May 12
🔖 Bookmarked Pulling My Thangs In Haus — jackyalciné https://jacky.wtf/weblog/pulling-thangs-in-haus/

“Dokku is really nifty in its singular goal - a plugin based platform as a service thingy”

2016
Wed Nov 30

Configuring Woodwind with imageproxy

A couple of days ago I wrote up a set of instructions for setting up a self-hosted copy of Woodwind with nginx and upstart. Since then I noticed that many images were broken on the feeds I was looking at - a common problem when a site that is served with HTTPS is displaying images and other content from another site that is served with HTTP.

Broken images are so 90s. But, like, sad 90s.

I noticed that the main site woodwind.xyz was serving images through special URLs like:

https://woodwind.xyz/imageproxy/?url=https%3A%2F%2Fmartymcgui.re%2Fimages%2Flogo.jpg&op=noop&sig=...

Looking in the source code, I found that Woodwind has support for image proxies, which are neat little services that can help serve remote HTTP content over HTTPS, resize images on the fly, and more.

I'd been meaning to set up one of these services for my own site, so this seemed like a good time to jump in!

Since my server already has Go, I chose Will Norris' imageproxy, which has a similar deployment setup to how I am already running Woodwind: Upstart manages a standalone process and nginx acts as a proxy to pass along requests.
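
For a sense of how it works: imageproxy takes requests in the form /{options}/{remote URL}, so fetching a (hypothetical) remote image resized to 200 pixels wide through a local instance looks roughly like this, using the port I configure below. Note that once a signature key is configured (also below), requests have to carry a valid signature as well.

# resize a remote image to 200px wide via a local imageproxy instance
curl -o photo-small.jpg "http://localhost:4593/200x/https://example.com/photo.jpg"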

Installation was fairly simple, once I had my GOPATH set up correctly:

go get willnorris.com/go/imageproxy/cmd/imageproxy

For running a persistent service, Will has an example Upstart configuration, which I modified a bit and placed in /etc/init/imageproxy.conf:

description "Image Proxy server"
start on (net-device-up)
stop on runlevel [!2345]

respawn
exec start-stop-daemon --start -c www-data --exec /home/imageproxy_user/go/bin/imageproxy -- \
    -addr localhost:4593 \
    -log_dir /var/log/imageproxy \
    -cache /var/cache/imageproxy \
    -signatureKey @/etc/imageproxy.key

Before starting up the service, there were a few extra steps:

Create /var/log/imageproxy and /var/cache/imageproxy and make sure they are owned by the www-data user.

Create the "signature key" in /etc/imageproxy.key. This is used to authorize each image request so that random folks can't proxy random stuff through your imageproxy. I used the command line openssl tool for this, with an extra pass through awk to remove the newline character that openssl spits out.

$ sudo openssl rand -base64 42 | awk 'BEGIN{ORS="";} {print}' > /etc/imageproxy.key

I also made sure that /etc/imageproxy.key was owned and readable by www-data and no other user.
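
Taken together, the directory and key-file setup amounts to something like:

# cache and log directories, writable by the user imageproxy runs as
sudo mkdir -p /var/log/imageproxy /var/cache/imageproxy
sudo chown www-data:www-data /var/log/imageproxy /var/cache/imageproxy

# the signature key should be readable by www-data and nobody else
sudo chown www-data:www-data /etc/imageproxy.key
sudo chmod 600 /etc/imageproxy.key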

I could then start up the server with:

sudo start imageproxy

Next it was time to configure nginx to send proxied image requests along to imageproxy. I opened up the nginx woodwind.conf file that I had created and added a new location block:

location ~ ^/imageproxy/ {
    # pattern match to capture the original URL to prevent URL
    # canonicalization, which would strip double slashes
    if ($request_uri ~ "/imageproxy/(.+)") {
        set $path $1;
        rewrite .* /$path break;
    }
    proxy_pass http://localhost:4593;
}

After restarting nginx, requests to https://woodwind.yourdomain.com/imageproxy/... would be forwarded to the imageproxy server.

Finally, it was time to configure Woodwind to use the proxy. I opened up woodwind.cfg and added two lines:

IMAGEPROXY_URL = '/imageproxy'
IMAGEPROXY_KEY = '...' # the contents of /etc/imageproxy.key

A quick restart of the Woodwind service, a browser refresh and I have images aplenty!

Thanks for reading! I hope this little HOWTO was useful. I look forward to more fun with imageproxy in my IndieWeb adventures. How might you be able to put imageproxy to use?

Mon Nov 28

Self-Hosting kylewm's Woodwind Indie Reader

One of my favorite aspects of the IndieWeb community is that when you get things "right" with your website, you often get a bunch of fun interoperability with other IndieWeb-compatible websites "for free". For example, the Micropub standard lets you use lots of different clients to post to your own site, and the Webmention standard lets sites notify one another of things like comments, event RSVPs, etc.

Fundamental to having these technologies work well together is microformats2 (mf2), a lightweight way of marking up "structural information" in HTML so that a machine can make (some) sense of the information, such as the name, url, and photo of the author, hints on the important pieces of content in a page, etc.

Getting these things "right" on my own website led me to look for a "Reader" that would make use of the mf2 data and attempt to display it in a meaningful way.

One of the popular readers I saw talked about in the #IndieWeb chat was Woodwind. It was easy to get started by logging in with my own website and then subscribing to my own site to get all my h-feeds, h-entrys, h-cards, etc. in a row. Recently, the hosted version of Woodwind at https://woodwind.xyz/ was down for a few days, so I set out to host my own.

Initial Setup

Thankfully, Woodwind is on GitHub and the Installation instructions are pretty good for getting started. Since I already had a server with the expected dependencies (Python3, PostgreSQL, and Redis), I was able to get a test site up and running in a few steps:

  1. clone the git repo
    git clone https://github.com/kylewm/woodwind.git
    cd woodwind
  2. create Python3 virtualenv and activate it
    virtualenv --python=/usr/bin/python3 venv
    source venv/bin/activate
  3. install the required Python libraries with pip
    sudo apt-get install python3-dev
    pip install -r requirements.txt
  4. as the postgres user, create the woodwind database and the database user that would access it

    $ sudo -u postgres createdb woodwind
    $ sudo -u postgres psql woodwind
    woodwind=# create user woodwind_user with password '...';
  5. copy woodwind.cfg.template to woodwind.cfg and edit it up

    SECRET_KEY = '...'
    SERVER_NAME = 'woodwind.yourdomain.com'
    SQLALCHEMY_DATABASE_URI = 'postgres://woodwind_user:DB_PASSWORD@localhost/woodwind'

  6. run the init_db.py script
    python init_db.py
    • at this point I discovered a typo in woodwind/views.py that was throwing errors - a missing parenthesis. once fixed, this ran fine.
    • I've created a pull request for this, so kylewm can merge it back in eventually.
  7. finally, use uwsgi to run the demo version
    uwsgi woodwind-dev.ini
  8. visit localhost:3000 in my browser and I could see that woodwind was running!

This had me off to a very good start, but I wanted to be able to visit my copy of Woodwind from anywhere using a public domain name, protect my activity from eavesdroppers on the network with HTTPS, and have Woodwind up and running reliably across server crashes, reboots, etc.

Setting up Woodwind with uwsgi, Upstart, and nginx

Woodwind is an application written in Python. uwsgi is an application server that can run that code on demand, efficiently. It is possible to run uwsgi by hand as we did above, but I wanted the service to be started and managed automatically by the operating system.

I run an Ubuntu server with the upstart process manager. So, I created an upstart configuration for Woodwind at /etc/init/woodwind.conf:

description "woodwind uwsgi instance"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
setuid woodwind_user
setgid woodwind_user
chdir /home/woodwind_user/woodwind
env LC_ALL=C.UTF-8
export LC_ALL
env LANG=C.UTF-8
export LANG
script
. venv/bin/activate
uwsgi --ini woodwind.ini
end script

With this, the uwsgi server should start up on boot to serve Woodwind, and I can now manage woodwind from the command line. For example:

$ sudo start woodwind
woodwind start/running, process 14104
$ status woodwind
woodwind start/running, process 14104
$ sudo restart woodwind
woodwind start/running, process 14246
$ sudo stop woodwind
woodwind stop/waiting

Since I wanted to use HTTPS to protect my activity on Woodwind from network eavesdropping, I used Let's Encrypt and their certbot tool to create an SSL certificate for my domain. The steps are:

  1. Create a DNS entry for woodwind.yourdomain.com to point to the public IP address of my server. This may take some time to propagate and certbot won't work until it has taken effect.
  2. Install certbot
  3. Stop nginx from running, temporarily.
  4. Use certbot to issue a certificate:
    ./certbot-auto certonly --standalone \
      --standalone-supported-challenges http-01 \
      -d woodwind.yourdomain.com

This resulted in an SSL certificate and key pair that I could use to encrypt traffic to this domain.

Next up, I needed something to actually handle the HTTPS requests and pass them along to uwsgi. I used nginx for this because I was already using it on this server. In my nginx config directory, I created a woodwind.conf file:

upstream woodwind {
    server unix:/tmp/woodwind.sock;
}
upstream woodwind_wss {
    server localhost:8077;
}
server {
    listen *:443;
    server_name woodwind.yourdomain.com;
    server_tokens off;
    ssl on;
    ssl_certificate /etc/letsencrypt/live/woodwind.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/woodwind.yourdomain.com/privkey.pem;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;
    ssl_session_cache shared:SSL:10m;
    ssl_ciphers "EECDH+AESGCM:EDH+AESGCM:ECDHE-RSA-AES128-GCM-SHA256:AES256+EECDH:DHE-RSA-AES128-GCM-SHA256:AES256+EDH:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:DES-CBC3-SHA:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4";
    ssl_dhparam /etc/ssl/certs/dhparam.pem;
    root /home/woodwind_user/woodwind;
    access_log /var/log/nginx/woodwind_access.log;
    error_log /var/log/nginx/woodwind_error.log;
    location /_updates {
        uwsgi_pass woodwind_wss;
    }
    location / {
        try_files /woodwind/static/$uri /frontend/$uri @woodwind;
    }
    location @woodwind {
        uwsgi_pass woodwind;
        include uwsgi_params;
        uwsgi_buffering off;
    }
}

This nginx configuration has some things worth noting:

  • In addition to running a process that answers regular HTTP requests on a unix socket at /tmp/woodwind.sock, Woodwind also runs a service that answers WebSocket traffic at localhost:8077 for nifty features like live updating the page in your browser when a feed is updated.
  • Woodwind serves some static files out of its /woodwind/static folder as well as the /frontend folder. I needed to install the dependencies in /frontend using npm:
    $ cd frontend
    $ npm install --nodev
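
One more note: the ssl_dhparam file referenced in the nginx config above isn't created by certbot or nginx. If you don't already have one on your server, it can be generated with openssl (this can take a few minutes):

# generate the Diffie-Hellman parameters referenced by ssl_dhparam
sudo openssl dhparam -out /etc/ssl/certs/dhparam.pem 2048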

After all this setup, I restarted nginx and was able to visit Woodwind in my browser!

I am happy with my setup so far. I am not quite sure yet if I did the WebSockets configuration correctly, but in general things seem to be working alright. I hope this information is useful to someone down the road, even if it is just future me.