Dan Connolly's tinkering lab notebook

Divesting from Flickr in the Annual File Purge

I spent much of this year's annual file purge recovering from Flickr going back on their 1TB storage offer.

While tinkering with Genode and catching up on Metamath (RIP, Norm Megill), I made a lot of use of github issues as my lab notebook; I can search copies of my comments in my email, since I'm a closet librarian and don't completely trust cloud services. One of those searches led me to the pile of monthly "account in violation of free account limits" nasty-grams that had been building up since May:

Back in 2015, I mostly knew better, but I did take them up on their terabyte storage offer:

My photostream on flickr goes back to Dec 2004 when it was big in the open web community. I could never bring myself to go premium, but in May 2013 when they announced the terabyte storage offer, I dusted it off. - MadMode: Syncing a 5 Year iPhoto Library with flickr

I have about 50GB of files from a Feb 17, 2019 Flickr data request on an external SSD. I didn't take the time to keep the private data separate from the code and other detailed notes, but briefly, I:

  1. verified access to 47GB of data from a March 2019 Flickr data request ( etc.) by copying it from an external SSD to an internal NVMe.
    • Why did that take 20 minutes? Oops... I used a USB2 cable and so lost USB3 "SuperSpeed"
  2. verified that I can recover a favorite album from iPhoto
    • dealt with the fact that the photo_NNN.json files don't actually contain the name of the abc_NNN_xyz.jpg media files
    • joined the flickr ids with iPhoto ids using records from the 2015 upload process
      • used nix to bring up a jupyter notebook environment with the relevant goodies: nix-shell -p "python3.withPackages(ps: with ps; [ipython jupyter numpy pandas pillow flickrapi progress crc32c])"
    • made a simple HTML list of links to photos
    • reverse-engineered the way Web-iPhoto would make an album of those photos from iPhoto files:
      • wrote out albums and photos JSON
        • discovered the README docs were incomplete and the code needs tree too.
  3. verified that I can recover a favorite keyword from Apple photos
    • reified the keyword as a directory with symlinks to the relevant photo files
    • toured simonwillison's dogsheep-photos work and osxphotos while decoding the Apple photos database.
    • evaluated photoprism, an "AI-Powered Photos App for the Decentralized Web" in hopes that open source AI would help me curate interesting photos the way Apple's applied computer-vision research did for Simon
      • wow! nicely packaged!
      • bulk import with move option for canonical naming: 2015/10/20150510_015146_88F59DFB.jpg
        • that hash is a Castagnoli CRC-32C, the variant with hardware support, not the CRC-32 in the python stdlib.
  4. deleted all 10,000+ non-private photos using the flickr API so I'll stop getting those monthly nasty-grams.
    • learned to use the progress package to see that it would take about an hour
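The id join in step 2 boils down to an inner merge on the flickr id. A minimal sketch with pandas, which was in that nix shell; the column names and the toy values here are my assumptions, not the actual 2015 upload records:

```python
import pandas as pd

# Hypothetical upload records: iPhoto id -> flickr id (from the 2015 upload process).
uploads = pd.DataFrame({
    'iphoto_id': [101, 102, 103],
    'flickr_id': ['9001', '9002', '9003'],
})

# Hypothetical export metadata: flickr id -> media file name on disk.
exported = pd.DataFrame({
    'flickr_id': ['9002', '9003'],
    'filename': ['abc_9002_xyz.jpg', 'abc_9003_xyz.jpg'],
})

# Inner merge keeps only photos present in both tables.
joined = uploads.merge(exported, on='flickr_id', how='inner')
print(joined[['iphoto_id', 'filename']])
```

An inner merge drops upload records with no matching export file, which is the conservative choice when building a list of recoverable photos.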
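Reifying a keyword as a directory of symlinks (step 3) can be sketched like this; the helper name and file names are made up for illustration, not the notebook's actual code:

```python
from pathlib import Path
import tempfile

def reify_keyword(keyword, photo_paths, out_root):
    """Create out_root/<keyword>/ as a directory of symlinks to the photo files.

    Hypothetical helper: returns the links it created (or found already present).
    """
    kw_dir = Path(out_root) / keyword
    kw_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for p in photo_paths:
        src = Path(p).resolve()
        link = kw_dir / src.name
        if not link.exists():
            link.symlink_to(src)
        links.append(link)
    return links

# Demo against throwaway files in a temp directory.
tmp = Path(tempfile.mkdtemp())
photos = [tmp / f"IMG_{i}.jpg" for i in range(3)]
for p in photos:
    p.write_bytes(b"jpeg bytes")
links = reify_keyword("hiking", photos, tmp / "keywords")
print([l.name for l in links])  # → ['IMG_0.jpg', 'IMG_1.jpg', 'IMG_2.jpg']
```

Symlinks keep the keyword view cheap: the canonical files stay in one place, and a photo can appear under any number of keywords without duplication.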
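On the two CRCs: a bitwise sketch of CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) next to the stdlib's zlib.crc32 shows they disagree even on the standard "123456789" check input. The function below is a slow pure-Python illustration; hardware SSE4.2 instructions and the crc32c package compute the same values much faster.

```python
import zlib

def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli) for illustration only."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

check = b"123456789"
print(hex(crc32c(check)))      # 0xe3069283 (CRC-32C check value)
print(hex(zlib.crc32(check)))  # 0xcbf43926 (stdlib CRC-32 check value)
```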

I made lots of diigo bookmarks and annotations too.

Footnote on Apple photos dates

Apple support discussion Apr 2015 gives us some clues about the database schema, which seems to be an Aperture database (apdb).

Aperture uses Core Data, which is a database-independent abstraction layer, and thus does not use the native SQLite encoding for dates (juliandate), but rather the NSDate format, which should be a double-precision number of seconds since the reference date (2001-01-01 00:00:00 GMT). -- majid 2011-05-03
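Decoding such a timestamp is a one-liner once you pin the epoch. A sketch in Python, assuming the quote above is right that the value is seconds since 2001-01-01 00:00:00 GMT; the 450000000 sample value is just an illustration:

```python
from datetime import datetime, timedelta, timezone

# NSDate's reference date, per the quote above.
NSDATE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

def from_nsdate(seconds: float) -> datetime:
    """Convert an NSDate / Core Data timestamp to a UTC datetime."""
    return NSDATE_EPOCH + timedelta(seconds=seconds)

print(from_nsdate(0.0))        # 2001-01-01 00:00:00+00:00
print(from_nsdate(450000000))  # a date in spring 2015
```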