Marty McGuire

Posts Tagged silos

2021
Fri May 14
🔖 Bookmarked Pluralistic: 08 May 2021 – Pluralistic: Daily links from Cory Doctorow https://pluralistic.net/2021/05/08/copyfraud/#beethoven-just-wrote-music

“The resulting mess firmly favors attackers (wage stealers, fraudsters, censors, bullies) over defenders (creators, critics). Attackers don’t need to waste their time making art, which leaves them with the surplus capacity to master the counterintuitive “legal” framework.”

2019
Thu Nov 21

Amnesty International (rightly) calls out the damage of surveillance capitalism by Google and Facebook but, ouch, they serve up behavioral trackers for ad networks on their own content. 😩

Wed Sep 4

Facebook coming through with some pretty gross new UI changes, now adding little badge icons next to some people’s names indicating that they are a “Conversation Starter”, “Visual Storyteller”, and possibly others.

Really raises some questions for me. Mainly, what are their goals for this “feature”?

  • To boost engagement of people who want to earn a particular little badge? (“Oh, I’ve got to post more photos!”)
  • To boost engagement of people who see the badges? (“Ooh, I wanna join in these conversations they’re starting!”)
  • To make people without badges feel uninteresting?

2018
Tue May 22
🔖 Bookmarked IASC: The Hedgehog Review - Volume 20, No. 1 (Spring 2018) - Tending the Digital Commons: A Small Ethics toward the Future - http://www.iasc-culture.org/THR/THR_article_2018_Spring_Jacobs.php

“It is common to refer to universally popular social media sites like Facebook, Instagram, Snapchat, and Pinterest as “walled gardens.” But they are not gardens; they are walled industrial sites, within which users, for no financial compensation, produce data which the owners of the factories sift and then sell. Some of these factories (Twitter, Tumblr, and more recently Instagram) have transparent walls, by which I mean that you need an account to post anything but can view what has been posted on the open Web; others (Facebook, Snapchat) keep their walls mostly or wholly opaque. But they all exercise the same disciplinary control over those who create or share content on their domain.”

Thu May 17

Venmo announces they are removing features from their website over the coming months. The new usage policy literally includes the phrase “We updated our User Agreement to reflect that the use of Venmo on the Venmo.com website may be limited.”

Feels like a standard “silo’s gonna silo” moment, locking people into a mobile app versus making data available on the web. I do wonder how, if at all, it might be related to the requirements of GDPR.

Fri May 4

Leaving Netflix (and taking my data with me)

Netflix has been a staple in my life for years, from the early days of mailing (and neglecting) mostly-unscratched DVDs through the first Netflix original series and films. With Netflix as my catalog, I felt free to rid myself of countless DVDs and series box sets. Through Netflix I caught up on "must-see" films and shows that I missed the first time around, and discovered unexpected (and wonderfully strange) things I would never have seen otherwise. Countless conversations have hinged on something that I've seen / am binging / must add to my list.

At times, this has been a problem. It's so easy to start a show on Netflix and simply let it run. At home we frequently spend a whole evening grinding through the show du jour. Sometimes whole days and weekends disappear. This can be true for more and more streaming services but, in my house, it is most true for Netflix. We want to better use our time, and avoid the temptation to put up our stocking'd feet, settle in, and drop out.

It's easy enough to cancel a subscription, and even easier to start up again later if we change our minds. However, Netflix has one, even bigger, hook into my life: my data. Literal years of viewing history, ratings, and the list of films and shows that I (probably) want to watch some day. I wanted to take that data with me, in case we don't come back and they delete it.

Netflix once had an API, but they shut it down in 2014, and it's so thoroughly dead that even the blog post announcing the shutdown is now gone from the internet.

Despite no longer having a formal API, Netflix is really into the single-page application style of development for their website. Typically this is a batch of HTML, CSS, and JavaScript that runs in your browser and uses internal APIs to fetch data from their service. Even better, they are fans of infinite scrolling, so you can open up a page like your "My List", and it loads more data as you scroll, until you've got it all loaded in your browser.

Once you've got all that information in your browser, you can script it right out of there!
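
If you'd rather not scroll by hand, a little console snippet can handle it. This is just a sketch: it assumes the page itself (document.body) is what grows as new rows load, and that a half-second pause between scrolls is enough time for each batch to arrive; both may need adjusting.

(function scrollToEnd(lastHeight){
  // Jump to the bottom, wait a beat, and repeat until the page stops getting taller.
  window.scrollTo(0, document.body.scrollHeight);
  setTimeout(function(){
    var newHeight = document.body.scrollHeight;
    if (newHeight > lastHeight) {
      scrollToEnd(newHeight); // more rows loaded, keep going
    } else {
      console.log('Looks like everything is loaded.');
    }
  }, 500);
}(0));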

After some brief Googling, I found a promising result about using the Developer Console in your browser to grab your My List data. That gave me the inspiration I needed to write some JavaScript snippets to grab the three main lists of data that I care about:

  • My watch list ("My List")
  • My ratings
  • My watch history

Each of these pages displays slightly different data, in HTML that can be extracted with very little JavaScript, and each loads more data as you scroll the page until it's all loaded. To extract it, I needed some code to walk the entries on the page, extract the info I want, store it in a list, turn the list into a JSON string, and then copy that JSON data to the clipboard. From there I can paste that JSON data into a data file to mess with later, if I want.

Extracting my Netflix Watch List

A screenshot of my watch list

The My List page has a handful of useful pieces of data that we can extract:

  • Name of the show / film
  • URL to that show on Netflix
  • A thumbnail of art for the show!

After eyeballing the HTML, I came up with this snippet of code to pull out the data and copy it to the clipboard:

(function(list){
  // Each title on the My List page is an anchor with the class "slider-refocus"
  document.querySelectorAll('a.slider-refocus')
    .forEach(item => {
      list.push({
        title: item.getAttribute('aria-label'),
        url: item.getAttribute('href'),
        // backgroundImage looks like url("https://..."); slice(5,-2) strips the wrapper to leave the bare URL
        art: item.querySelector('.video-artwork').style.backgroundImage.slice(5,-2)
      })
    });
  // copy() is a Developer Console helper that puts the string on the clipboard
  copy(JSON.stringify(list, null, 2));
}([]));

The resulting data in the clipboard is an array of JSON objects like:

[
  {
    "title": "The Magicians",
    "url": "/watch/80092801?tctx=1%2C1%2C%2C%2C",
    "art": "https://occ-0-2433-2430.1.nflxso.net/art/b1eff/9548aa8d5237b3977aba4bddba257e94ee7b1eff.webp"
  },
  ...
]

I like this very much! I probably won't end up using the Netflix URLs or the art, since those belong to Netflix and not me, but a list of show titles will make a nice TODO list, at least for the shows I want to watch that are not on Netflix.
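
For example, boiling that JSON down to a plain text TODO list takes only a couple of lines in the console. (Here I'm assuming you've pasted the exported array into a variable first.)

// Assumes myList holds the array exported above, pasted in by hand.
var myList = [ /* ...pasted JSON... */ ];
var titles = myList.map(function(item){ return item.title; });
console.log(titles.join('\n'));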

Extracting my Netflix Ratings

A screenshot of my ratings list

More important to me than my to-watch list was my literal years of rating data. This page is very different from the image-heavy watch list page, and is a little harder to find. It's under the account settings section on the website, and is a largely text-based page consisting of:

  • date of rating (as "month/day/year")
  • title
  • URL (on Netflix)
  • rating (as a row of star elements; the stars that are lit up indicate the rating I gave, and counting them yields the numerical star value)

The code I used to extract this info looks like this:

(function(list){
  document.querySelectorAll('li.retableRow')
    .forEach(function(item){
      list.push({
        date: item.querySelector('.date').innerText,
        title: item.querySelector('.title a').innerText,
        url: item.querySelector('.title a').getAttribute('href'),
        rating: item.querySelectorAll('.rating .star.personal').length
      });
    });
  copy(JSON.stringify(list, null, 2));
}([]));

The resulting data looks like:

[
  {
    "date": "9/27/14",
    "title": "Print the Legend",
    "url": "/title/80005444",
    "rating": 5
  },
  ...
]

While the URL probably isn't that useful, I find it super interesting to have the date that I actually rated each show!

One thing to note: although I wasn't affected by it, Netflix has replaced their 0-to-5 star rating system with a thumbs up / down system. You'd have to tweak the script a bit to extract those values.
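
I haven't looked at the thumbs-based markup myself, so the selectors below are purely hypothetical, but the tweak would be something along these lines:

// Hypothetical sketch only: '.thumb.selected' and 'thumb-up' are guesses,
// not Netflix's real class names. Check the page in the Developer Console first.
(function(list){
  document.querySelectorAll('li.retableRow')
    .forEach(function(item){
      var thumb = item.querySelector('.rating .thumb.selected');
      list.push({
        date: item.querySelector('.date').innerText,
        title: item.querySelector('.title a').innerText,
        url: item.querySelector('.title a').getAttribute('href'),
        rating: thumb ? (thumb.classList.contains('thumb-up') ? 'up' : 'down') : null
      });
    });
  copy(JSON.stringify(list, null, 2));
}([]));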

Extracting my Netflix Watch History

A screenshot of my watch history page

One type of data I was surprised and delighted to find available was my watch history. This was another text-heavy page, with:

  • date watched
  • name of show (including episode for many series)
  • URL (on Netflix)

The code I used to extract it looked like this:

(function(list){
  document.querySelectorAll('li.retableRow')
    .forEach(function(item){
      list.push({
        date: item.querySelector('.date').innerText,
        title: item.querySelector('.title a').innerText,
        url: item.querySelector('.title a').getAttribute('href')
      });
    });
  copy(JSON.stringify(list, null, 2));
}([]));

The resulting data looks like:

[
  {
    "date": "3/20/18",
    "title": "Marvel's Jessica Jones: Season 2: \"AKA The Octopus\"",
    "url": "/title/80130318"
  },
  ...
]

Moving forward

One thing I didn't grab was my Netflix Reviews, which are visible to other Netflix users. I never used this feature, so I didn't have anything to extract. If you are leaving and want that data, I hope that it's similarly easy to extract.

With all this data in hand, I felt much safer going through the steps to deactivate my account. Netflix makes this easy enough, though they also make it extremely tempting to reactivate. Not only do they let you keep watching until the end of the current paid period – they tell you clearly that if you reactivate within 10 months, your data won't be deleted.

That particular temptation won't be a problem for me.

2017
Wed Oct 18

“Boost your post.” “Publish a post.”

Facebook, you sound desperate and pushy.

Thu Aug 10

Syndicating Audio Posts with WNYC's Audiogram Generator

I publish a few different podcasts and often find myself advertising new episodes by syndicating new posts to various social media silos.

Sadly, few social media services consider audio to be "a thing", despite often having robust support for video.

I'm certainly not the first person to notice this, and the fine folks at WNYC have taken this audio sharing problem head-on.

Enter the Audiogram Generator, an open source project that runs on NodeJS and uses FFMPEG to take samples from your audio files and munge them into short videos for sharing on social networks.

Here's a quick rundown of how I got the Audiogram Generator running on my macOS laptop using Docker.

I use Homebrew, so first I installed docker and docker-machine and created a new default machine:

  brew install docker docker-machine
  docker-machine create -d virtualbox default

Once that finished, I set my environment variables so the docker command line utility can talk to this machine:

  eval $(docker-machine env)

Next, it was time to download the source for the audiogram generator from GitHub and build the Docker container for it:

  git clone https://github.com/nypublicradio/audiogram.git
  cd audiogram
  docker build -t audiogram .

Finally, I could run it:

  docker run -p 8888:8888 -t -i audiogram
  npm start

Once up, I pointed my browser at http://192.168.99.100:8888/ and I saw pretty much the interface that you see in the screenshot above.

The basic usage steps are:

  • Choose an audio file
  • Choose a template
    • Templates w/ images are hardcoded into the app, so if you want to use them with your own images you'll have to make changes to the source.
  • Choose a selection of the audio that is less than 300 seconds long
  • Add any text if the template requires it
  • Generate!
  • Download
  • Upload to silos!

I made a sample post to my own site using a selection of an interview and then syndicated that post by uploading the same video to Twitter, Facebook, and Mastodon.

I don't yet know exactly how I'll choose what portions to share on each silo, what text and links to accompany them to encourage folks to listen to the full episodes, and so on. There are also some quirks to learn. For example, Twitter has a maximum length of 2:20 for videos, and its cropping tool would glitch out and reset to defaults unless I stopped it "near" the end.

Thankfully, there is a very detailed Audiogram Generator usage doc with lots of examples and guidelines for making attention-getting posts.

For the near term I want to play with the tool to see what kinds of results I can make. Long-term I think this would be a really neat addition to my Screech tool, which is designed for posting audio to your own website.

How do you feel about audiograms? I'd love to hear other folks' thoughts!

Wed May 17
🔖 Bookmarked Notes From An Emergency http://idlewords.com/talks/notes_from_an_emergency.htm

“Silicon Valley brings us the worst of two economic systems: the inefficiency of a command economy coupled with the remorselessness of laissez-faire liberalism.”

Tue Apr 25

Site Updates: Importing Old Posts, Disqus Comments

Jonathan Prozzi and I have challenged one another to make a post about improving our websites once a week. Here's mine!

Back in 2008 I started a new blog on Wordpress. It seemed like a good idea! Maybe I would post some useful things and someone would offer me a job! I wanted to allow discussion without the dangers of letting strangers submit data directly to my server, so I set up the JavaScript-based Disqus comments service. I made a few posts per year and it eventually tapered off and I largely forgot about it.

In February 2011 I participated in the Thing-a-Day project on Posterous. It was the first time in a long time that I had published consistently, so when it was announced that Posterous was going away, I worked hard to grab my content and stored it somewhere.

Eventually it was November 2013, Wordpress was "out", static site generators were "in", and I wanted to give Octopress a try. I used Octopress' tools to import all my Wordpress content into Octopress, forgot about adding back the Disqus comments, and posted it all back online. In February 2014, I decided to resurrect my Posterous content, so I created posts for it and got everything looking nice enough.

In 2015 I learned about the IndieWeb, and decided it was time for a new approach to my identity and content online. I set up a new site at https://martymcgui.re/ based on Jekyll (hey! static sites are still "in"!) and got to work adding IndieWeb features.

Well, today I decided to get some of that old content off my other domain and into my official one. Thankfully, with Octopress being based on Jekyll, it was mostly just a matter of copying over the files in the _posts/ folder. A few tweaks to a few posts to make up for newer parsing in Jekyll, my somewhat odd URL structure, etc., and I was good to go!

"Owning" My Disqus Comments

Though I had long ago considered them lost, I noticed that some of my old posts had a section that the Octopress importer had added to the metadata of my posts from Wordpress:

meta:
  _edit_last: '1'
  _wp_old_slug: makerbot-cam-1-wiring
  dsq_thread_id: '604226727'

All of my Wordpress posts had this dsq_thread_id value, and that got me thinking. Could I export the old Disqus comment data and find a way to display it on my site? (Spoiler alert: yes I could).

Disqus actually has an export feature: https://disqus.com/admin/discussions/export/

You can request a compressed XML file containing all of your comment data, organized hierarchically into "category" (which I think can be configured per-site), "thread" (individual pages), and "post" (the actual comments), and including info such as author name and email, the date it was created, the comment message with some whitelisted HTML for formatting and links, whether the comment was identified as spam or has been deleted, and so on.

The XML format was making me queasy, and Jekyll data files often come in YAML format for editability, so I did the laziest XML to YAML transform possible, thanks to some Ruby and this StackOverflow post.

require 'active_support/core_ext/hash/conversions' # provides Hash.from_xml
require 'yaml'
file = File.open("disqus_export.xml", "r")
hash = Hash.from_xml(file.read) # parse the Disqus XML export into a nested Hash
yaml = hash.to_yaml             # dump that Hash straight out as YAML
File.open("disqus.yml", "w") { |file| file.write(yaml) }

This resulted in a YAML formatted file that looked like:

---
disqus:
  xmlns: http://disqus.com
  xmlns:dsq: http://disqus.com/disqus-internals
  xmlns:xsi: http://www.w3.org/2001/XMLSchema-instance
  xsi:schemaLocation: http://disqus.com/api/schemas/1.0/disqus.xsd http://disqus.com/api/schemas/1.0/disqus-internals.xsd
  category:
    dsq:id: ...
    forum: ...
    ...
  ...

I dropped this into my Jekyll site as _data/disqus.yml, and ... that's it! I could now access the content from my templates in site.data.disqus.

I wrote a short template snippet that, if the post has a "meta" property with a "dsq_thread_id", looks in site.data.disqus.disqus.post and collects all of the Disqus comments whose "thread.dsq:id" matches the "dsq_thread_id" for the post. If there are comments there, they're displayed in a "Comments" section on the page.

So now some of my oldest posts have some of their discussion back after more than 7 years!

Here's an example post: https://martymcgui.re/2010/02/16/000000/

Example of old Disqus comments on display.

I was (pleasantly) surprised to be able to recover and consolidate this older content. Thanks to past me for keeping good backups, and to Disqus for still being around and offering a comprehensive export.

As a bonus, since all of the comments include the commenter's email address, I could give them avatars with Gravatar, and (though they have no URL to link to) they would almost look right at home alongside the more modern mentions I display on my site.
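
For reference, a classic Gravatar URL is just an MD5 hash of the commenter's trimmed, lowercased email address appended to the avatar endpoint. A quick sketch in Node (the email here is made up):

// Gravatar's classic scheme: md5 of the trimmed, lowercased email address.
const crypto = require('crypto');
const email = 'commenter@example.com'; // hypothetical address
const hash = crypto.createHash('md5').update(email.trim().toLowerCase()).digest('hex');
console.log('https://www.gravatar.com/avatar/' + hash);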

Update: Yep, added Gravatars.

Old Disqus comments now with avatars by Gravatar