MadMode

Dan Connolly's tinkering lab notebook

Ubuntu 5.10, archive.org, and .torrent files

I'm ready to say goodbye to my copy of Ubuntu 5.10 for i386 on CD, after nearly 2 decades of keeping it as a combination keepsake and software supply chain anchor. I donated it to archive.org:

Ubuntu 5.10

While brainstorming about Merkle trees for file access, I noticed that archive.org not only OCRs the PDF I gave them of the cover and supports browsing the contents of Ubuntu 5.10 i386.iso, but also provides ubuntu-5.10-pc_archive.torrent, which means I can have high-performance access to the full contents of the CDs for just 29k of storage. And Brave supports .torrent files natively with WebTorrent (the WebTorrent Tutorial looks pretty straightforward).

But what's in that .torrent file? Aha! bencode from BEP 3! I've heard of it in OCapN discussions but didn't realize it comes from BitTorrent. BitTorrent bencode format tools is really handy, including stopping in a debugger to see the details:

image
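To make the format concrete, here is a minimal sketch of a bencode decoder in Python, covering the four forms BEP 3 defines: integers (`i42e`), byte strings (`4:spam`), lists (`l...e`), and dictionaries (`d...e`). The names `bdecode` and `_decode_at` are mine, not from any library, and the sketch is lenient: it does not reject non-canonical input such as leading zeros or unsorted keys.

```python
def bdecode(data: bytes):
    """Decode exactly one bencoded value from data."""
    value, end = _decode_at(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode_at(data: bytes, i: int):
    """Decode the value starting at offset i; return (value, next_offset)."""
    lead = data[i:i + 1]
    if lead == b"i":
        # Integer: i<base-10 digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":
        # List: l<value>...e
        items = []
        i += 1
        while data[i:i + 1] != b"e":
            item, i = _decode_at(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":
        # Dictionary: d<key><value>...e, where keys are byte strings
        result = {}
        i += 1
        while data[i:i + 1] != b"e":
            key, i = _decode_at(data, i)
            if not isinstance(key, bytes):
                raise ValueError("dictionary key is not a byte string")
            result[key], i = _decode_at(data, i)
        return result, i + 1
    if lead.isdigit():
        # Byte string: <length>:<raw bytes>
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        if end > len(data):
            raise ValueError("truncated byte string")
        return data[start:end], end
    raise ValueError(f"unexpected byte at offset {i}")
```

Reading the .torrent file in binary mode and passing its bytes to `bdecode` should yield a dictionary whose keys are byte strings such as `b"info"`. Values stay as raw bytes because bencode strings are byte sequences, not text.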

BCode.hs from haskell-torrent has a crisp specification:

-- | BCode represents the structure of a bencoded file
data BCode = BInt Integer                       -- ^ An integer
           | BString B.ByteString               -- ^ A string of bytes
           | BArray [BCode]                     -- ^ An array
           | BDict (M.Map B.ByteString BCode)   -- ^ A key, value map
  deriving (Show, Eq)

...

-- | Return the hash of the info-dict in a torrent file
hashInfoDict :: BCode -> IO Digest
hashInfoDict bc =
  do ih <- case info bc of
              Nothing -> fail "Could not find infoHash"
              Just x  -> return x
     let encoded = encode ih
     digest $ L.fromChunks $ [encoded]

Playing with parse-torrent in a parse-torrent-ubuntu-5.10 project on StackBlitz is handy in that it shows the infoHash, b890d2e1174a809d1cd0437de30400c542e0a939, but its JSON output misled me about the real structure: there is no infoHash key in the file. Instead, there's an info dictionary, and the infoHash is the SHA-1 digest of that dictionary's bencoded form.
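The same idea as hashInfoDict can be sketched in Python: bencode the info dictionary, then take its SHA-1. The functions `bencode` and `info_hash` below are illustrative names of my own, and the input is assumed to be an already-decoded torrent represented as Python dicts, lists, ints, and bytes.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, bytes, lists, and dicts (with bytes keys) as bencode."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; bencode has no boolean type
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        if not all(isinstance(key, bytes) for key in value):
            raise TypeError("dictionary keys must be bytes")
        parts = [b"d"]
        # BEP 3 requires keys in sorted order, compared as raw bytes
        for key in sorted(value):
            parts.append(bencode(key))
            parts.append(bencode(value[key]))
        parts.append(b"e")
        return b"".join(parts)
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(torrent: dict) -> str:
    """Return the hex SHA-1 of the bencoded info dictionary."""
    info = torrent.get(b"info")
    if info is None:
        raise KeyError("torrent has no info dictionary")
    return hashlib.sha1(bencode(info)).hexdigest()
```

One caveat: re-encoding a decoded dictionary reproduces the original bytes only when the file was canonically encoded to begin with. A more careful tool would hash the exact byte span of the info value as it appears in the file.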

Say... Ubuntu offers BitTorrent as a download option; maybe they keep a 5.10 .torrent file around? I didn't find one from them, but I did find:

Note the Source; yes, Internet Archive ingests BitTorrents.

Somehow my Ubuntu 5.10 i386.iso is 632,262 KB, which is 300 KB larger than theirs (631,962 KB). Maybe gnome-disk-utility captured some unused space when I ripped the CD?
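One way to test the unused-space guess is to check whether the smaller image is a byte-for-byte prefix of the larger one and whether the extra tail is all zeros. The sketch below does that; `compare_images` and both file paths are hypothetical, and I have not run it against the actual images. If those sizes are in KiB, the difference is 307,200 bytes, or 150 sectors of 2,048 bytes, which happens to match the length of a standard 2-second gap on a CD (75 sectors per second). That is suggestive, not proof.

```python
import os


def compare_images(short_path, long_path, chunk=1 << 20):
    """Return (is_prefix, extra_bytes, tail_all_zero) for two image files.

    is_prefix: the shorter file matches the start of the longer one.
    extra_bytes: how many more bytes the longer file has.
    tail_all_zero: every byte past the shorter file's length is zero.
    """
    short_size = os.path.getsize(short_path)
    extra = os.path.getsize(long_path) - short_size
    if extra < 0:
        raise ValueError("first file is larger; swap the arguments")

    with open(short_path, "rb") as a, open(long_path, "rb") as b:
        # Compare the overlapping region chunk by chunk
        is_prefix = True
        remaining = short_size
        while remaining:
            n = min(chunk, remaining)
            if a.read(n) != b.read(n):
                is_prefix = False
                break
            remaining -= n

        # Scan whatever the longer file has beyond the shorter one
        b.seek(short_size)
        tail_all_zero = True
        while True:
            block = b.read(chunk)
            if not block:
                break
            if any(block):  # any nonzero byte
                tail_all_zero = False
                break

    return is_prefix, extra, tail_all_zero
```

A result of `(True, 307200, True)` would support the padding explanation; a False in the first position would mean the two images differ in content, not just length.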