Leaving Netflix (and taking my data with me)

Netflix has been a staple in my life for years, from the early days of mailing (and neglecting) mostly-unscratched DVDs through the first Netflix original series and films. With Netflix as my catalog, I felt free to rid myself of countless DVDs and series box sets. Through Netflix I caught up on "must-see" films and shows that I missed the first time around, and discovered unexpected (and wonderfully strange) things I would never have seen otherwise. Countless conversations have hinged on something that I've seen / am binging / must add to my list.

At times, this has been a problem. It's so easy to start a show on Netflix and simply let it run. At home we frequently spend a whole evening grinding through the show du jour. Sometimes whole days and weekends disappear. This can be true for more and more streaming services but, in my house, it is most true for Netflix. We want to make better use of our time, and avoid the temptation to put up our stocking'd feet, settle in, and drop out.

It's easy enough to cancel a subscription, and even easier to start up again later if we change our minds. However, Netflix has one, even bigger, hook into my life: my data. Literal years of viewing history, ratings, and the list of films and shows that I (probably) want to watch some day. I wanted to take that data with me, in case we don't come back and they delete it.

Netflix had an API, but they shut it down in 2014. It's so thoroughly dead that even the blog post announcing the shutdown is now gone from the internet.

Despite no longer having a formal API, Netflix is really into the single-page application style of development for their website. Typically this is a batch of HTML, CSS, and JavaScript that runs in your browser and uses internal APIs to fetch data from their service. Even better, they are fans of infinite scrolling, so you can open up a page like your "My List", and it loads more data as you scroll, until you've got it all loaded in your browser.

Once you've got all that information in your browser, you can script it right out of there!
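
Scrolling by hand works fine, but if a list is long, the scrolling itself can be scripted too. Here is a minimal sketch of that idea, not something I needed myself: `scrollUntilStable` is a made-up helper name, and it assumes that "no new rows after a few scrolls" means everything is loaded. The row counter, the scroll action, and the wait are all passed in, so nothing in it depends on Netflix's actual markup.

```javascript
// Hypothetical helper: keep scrolling until the row count stops growing
// for `maxIdleChecks` checks in a row, then return the final row count.
async function scrollUntilStable({ countRows, scrollToBottom, wait, maxIdleChecks = 3 }) {
  let previous = countRows();
  let idleChecks = 0;
  while (idleChecks < maxIdleChecks) {
    scrollToBottom();
    await wait(); // give the page time to fetch and render more rows
    const current = countRows();
    if (current > previous) {
      previous = current;
      idleChecks = 0; // new rows arrived, so reset the idle counter
    } else {
      idleChecks += 1;
    }
  }
  return previous;
}

// Possible use in the Developer Console (the selector and the one-second
// delay are assumptions, adjust them to the page you are on):
// scrollUntilStable({
//   countRows: () => document.querySelectorAll('li.retableRow').length,
//   scrollToBottom: () => window.scrollTo(0, document.body.scrollHeight),
//   wait: () => new Promise(resolve => setTimeout(resolve, 1000)),
// }).then(total => console.log(total + ' rows loaded'));
```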

After some brief Googling, I found a promising result about using the Developer Console in your browser to grab your My List data. That gave me the inspiration I needed to write some JavaScript snippets to grab the three main lists of data that I care about:

  • My List (the films and shows I want to watch)
  • My ratings
  • My watch history

Each of these pages displays slightly different data, in HTML that can be extracted with very little JavaScript, and each loads more data as you scroll the page until it's all loaded. To extract it, I needed some code to walk the entries on the page, extract the info I want, store it in a list, turn the list into a JSON string, and then copy that JSON data to the clipboard. From there I can paste that JSON data into a data file to mess with later, if I want.
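
Those steps are the same on every page, so they could be pulled into one small helper. This is only a sketch of the pattern: `scrapeRows` is a hypothetical name, and the snippets in this post simply repeat the steps inline instead. It takes the root to search (normally `document`), a row selector, and a map of field names to functions that read a value from a row.

```javascript
// Hypothetical helper: walk the rows, extract the fields, collect them in a
// list, and return the list as a pretty-printed JSON string.
function scrapeRows(root, rowSelector, fields) {
  const list = [];
  root.querySelectorAll(rowSelector).forEach(row => {
    const entry = {};
    for (const [key, read] of Object.entries(fields)) {
      entry[key] = read(row); // each reader pulls one value out of the row
    }
    list.push(entry);
  });
  return JSON.stringify(list, null, 2);
}

// Possible use in the Developer Console, where copy() puts text on the clipboard:
// copy(scrapeRows(document, 'li.retableRow', {
//   date: row => row.querySelector('.date').innerText,
//   title: row => row.querySelector('.title a').innerText,
// }));
```

Passing the root in, rather than reaching for `document` directly, also makes it possible to try the helper against a stand-in object outside the browser.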

Extracting my Netflix Watch List

A screenshot of my watch list

The My List page has a handful of useful pieces of data that we can extract:

  • Name of the show / film
  • URL to that show on Netflix
  • A thumbnail of art for the show!

After eyeballing the HTML, I came up with this snippet of code to pull out the data and copy it to the clipboard:

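(function(list){
  // Each title on the My List page is a link with this class
  document.querySelectorAll('a.slider-refocus')
    .forEach(item => {
      list.push({
        title: item.getAttribute('aria-label'),
        url: item.getAttribute('href'),
        // backgroundImage looks like url("..."); slice(5,-2) strips the wrapper
        art: item.querySelector('.video-artwork').style.backgroundImage.slice(5,-2)
      })
    });
  // copy() is a Developer Console helper that puts a string on the clipboard
  copy(JSON.stringify(list, null, 2));
}([]));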

The resulting data in the clipboard is an array of JSON objects like:

[
  {
    "title": "The Magicians",
    "url": "/watch/80092801?tctx=1%2C1%2C%2C%2C",
    "art": "https://occ-0-2433-2430.1.nflxso.net/art/b1eff/9548aa8d5237b3977aba4bddba257e94ee7b1eff.webp"
  },
  ...
]
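
One fragile spot in that snippet is `slice(5,-2)`, which only works when the browser reports the background image as `url("...")` with quotes. A slightly more forgiving way to unwrap the value might look like the sketch below. `cssUrlToPlain` is a made-up name, and it assumes the style holds a single `url(...)` value and nothing else.

```javascript
// Hypothetical helper: pull the address out of a CSS url(...) value,
// whether it is wrapped in double quotes, single quotes, or no quotes.
// Returns null when the value is not a single url(...), e.g. "none" or "".
function cssUrlToPlain(backgroundImage) {
  const match = /^url\((['"]?)(.*?)\1\)$/.exec(backgroundImage.trim());
  return match ? match[2] : null;
}

// Possible use inside the snippet:
// art: cssUrlToPlain(item.querySelector('.video-artwork').style.backgroundImage)
```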

I like this very much! I probably won't end up using the Netflix URLs or the art, since they belong to Netflix and not me, but a list of show titles will make a nice TODO list, at least for the shows I want to watch that are not on Netflix.
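
Turning the saved JSON into that TODO list takes only a few lines. This is a rough sketch rather than anything I rely on: `toTodoList` is a hypothetical name, the `[ ]` checkbox style is just one choice, and it assumes every entry has a `title` field like the sample above.

```javascript
// Hypothetical helper: take the JSON text produced by the My List snippet and
// return a plain-text checklist of unique titles, sorted alphabetically.
function toTodoList(json) {
  const titles = JSON.parse(json).map(item => item.title);
  const unique = [...new Set(titles)].sort((a, b) => a.localeCompare(b));
  return unique.map(title => `[ ] ${title}`).join('\n');
}

// Example:
// toTodoList('[{"title": "The Magicians"}, {"title": "Print the Legend"}]')
// gives two lines, "[ ] Print the Legend" and "[ ] The Magicians".
```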

Extracting my Netflix Ratings

A screenshot of my ratings list

More important to me than my to-watch list was my literal years of rating data. This page is very different from the image-heavy watch list page, and is a little harder to find. It's under the account settings section on the website, and is a largely text-based page consisting of:

  • date of rating (as "month/day/year")
  • title
  • URL (on Netflix)
  • rating (shown as a row of star elements; the lit-up stars indicate the rating that I gave, so counting them gives a numerical value)

The code I used to extract this info looks like this:

(function(list){
  // Each rating is one row in the list
  document.querySelectorAll('li.retableRow')
    .forEach(function(item){
      list.push({
        date: item.querySelector('.date').innerText,
        title: item.querySelector('.title a').innerText,
        url: item.querySelector('.title a').getAttribute('href'),
        // Count the lit-up stars to get a number
        rating: item.querySelectorAll('.rating .star.personal').length
      });
    });
  // copy() is a Developer Console helper that puts a string on the clipboard
  copy(JSON.stringify(list, null, 2));
}([]));

The resulting data looks like:

[
  {
    "date": "9/27/14",
    "title": "Print the Legend",
    "url": "/title/80005444",
    "rating": 5
  },
  ...
]

While the URL probably isn't that useful, I find it super interesting to have the date that I actually rated each show!
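
Those short dates are awkward to sort, so it may be worth converting them once the data is saved. The sketch below assumes two things that hold for my samples but are not guaranteed: the order is month/day/year (as in "9/27/14"), and every two-digit year belongs to the 2000s. `toIsoDate` is a made-up helper name.

```javascript
// Hypothetical helper: convert "M/D/YY" into "YYYY-MM-DD".
// Assumes month/day/year order and that all years are 20YY.
// Returns null for anything that does not look like M/D/YY.
function toIsoDate(shortDate) {
  const match = /^(\d{1,2})\/(\d{1,2})\/(\d{2})$/.exec(shortDate);
  if (!match) return null;
  const [, month, day, year] = match;
  return `20${year}-${month.padStart(2, '0')}-${day.padStart(2, '0')}`;
}

// toIsoDate('9/27/14') gives '2014-09-27'
```

ISO-style dates sort correctly as plain strings, which keeps any later processing simple.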

One thing to note: although I wasn't affected by it, Netflix has replaced their 0-to-5 star rating system with a thumbs up / down system. You'd have to tweak the script a bit to extract those values.

Extracting my Netflix Watch History

A screenshot of my watch history page

One type of data I was surprised and delighted to find available was my watch history. This was another text-heavy page, with:

  • date watched
  • name of show (including episode for many series)
  • URL (on Netflix)

The code I used to extract it looked like this:

(function(list){
  // Each watched item is one row in the list
  document.querySelectorAll('li.retableRow')
    .forEach(function(item){
      list.push({
        date: item.querySelector('.date').innerText,
        title: item.querySelector('.title a').innerText,
        url: item.querySelector('.title a').getAttribute('href')
      });
    });
  // copy() is a Developer Console helper that puts a string on the clipboard
  copy(JSON.stringify(list, null, 2));
}([]));

The resulting data looks like:

[
  {
    "date": "3/20/18",
    "title": "Marvel's Jessica Jones: Season 2: \"AKA The Octopus\"",
    "url": "/title/80130318"
  },
  ...
]
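
Since episode entries pack the show, season, and episode name into one string, a small parser can split them apart later. This sketch assumes the `Show: Season: "Episode"` shape seen in the sample above; I have not checked every variation Netflix uses, and anything that does not match (a film, for instance) is returned as a bare show name. `splitEpisodeTitle` is a hypothetical name.

```javascript
// Hypothetical helper: split 'Show: Season 2: "Episode"' into its parts.
// Titles that do not match that shape come back with season and episode null.
function splitEpisodeTitle(title) {
  const match = /^(.+?): ([^:]+): "(.+)"$/.exec(title);
  if (!match) {
    return { show: title, season: null, episode: null };
  }
  const [, show, season, episode] = match;
  return { show, season, episode };
}

// splitEpisodeTitle('Marvel\'s Jessica Jones: Season 2: "AKA The Octopus"')
// gives show "Marvel's Jessica Jones", season "Season 2", episode "AKA The Octopus"
```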

Moving forward

One thing I didn't grab was my Netflix Reviews, which are visible to other Netflix users. I never used this feature, so I didn't have anything to extract. If you are leaving and want that data, I hope that it's similarly easy to extract.

With all this data in hand, I felt much safer going through the steps to deactivate my account. Netflix makes this easy enough, though they also make it extremely tempting to reactivate. Not only do they let you keep watching until the end of the current paid period, they also tell you clearly that if you reactivate within 10 months, your data won't be deleted.

That particular temptation won't be a problem for me.