Getting my Personal Finance data back with hCalendar and hCard
The Quicken Interchange Format (QIF) is notoriously inadequate for clean import/export. The instructions for migrating Quicken data across platforms say:
- From the old platform, dump it out as QIF
- On the new platform, read in the QIF data
- After importing the file, verify that account balances in your new Quicken for Mac 2004 data file are the same as those in Quicken for Windows. If they don't match, look for duplicate or missing transactions.
I have not migrated my data from Windows98 to OS X because of this mess. I use win4lin on my debian linux box as life-support for Quicken 2001.
Meanwhile, Quicken supports printing any report to a tab-separated file, and I found that an exhaustive transaction report represents transfers unambiguously. Since October 2000, when my testing showed that I could re-create various balances and reports from these tab-separated reports, I have been maintaining a CVS history of my exported Quicken data, splitting it every few years:
$ wc *qtrx.txt
4785 38141 276520 1990-1996qtrx.txt
6193 61973 432107 1997-1999qtrx.txt
4307 46419 335592 2000qtrx.txt
5063 54562 396610 2002qtrx.txt
5748 59941 437710 2004qtrx.txt
26096 261036 1878539 total
I started a little module on dev.w3.org... I call it Quacken currently, but I think I'm going to have to rename it for trademark reasons. I started with normalizeQData.py to load the data into postgress for use with saCASH, but then saCASH went Java/Struts and all way before debian supported Java well enough for me to follow along. Without a way to run them in parallel and sync back and forth, it was a losing proposition anyway.
Then I managed to export the data to the web by first converting it to RDF/XML:
qtrx93.rdf: $(TXTFILES) $(PYTHON) $(QUACKEN)/grokTrx.py $(TXTFILES) >$@
... and then using searchTrx.xsl (inside a trivial CGI script) that puts up a search form, looks for the relevant transactions, and returns them as XHTML. I have done a few other reports with XSLT; nothing remarkable, but enough that I'm pretty confident I could reproduce all the reports I use from Quicken. But the auto-fill feature is critical, and I didn't see a way to do that.
Then came google suggest and ajax. I'd really like to do an ajax version of Quicken.
I switched the data from CVS to mercurial a few months ago, carrying the history over. I seem to have 189 commits/changesets, of which 154 are on the qtrx files (others are on the makefile and related scripts). So that's about one commit every two weeks.
Mercurial makes it easy to keep the whole 10 year data set, with all the history, in sync on several different computers. So I had it all with me on the flight home from the W3C Tech Plenary in France, where we did a microformats panel. Say... transactions are events, right? And payee info is kinda like hCard...
So factored out the parts of grokTrx.py that do the TSV file handling (trxtsv.py) and wrote an hCalendar output module (trxht.py).
I also added some SPARQL-ish filtering, so you can do:
python trxht.py --account 'MIT 2000' --class 200009xml-ny 2000qtrx.txt
And get a little microformat expense report:
9/20/00 SEPTEMBERS STEAKHOUSE ELMSFORD NY MIT 2000 19:19 c [Citi Visa HI]/200009xml-ny 29.33 9/22/00 RAMADA INNS ELMSFORD GR ELMSFORD NY MIT 2000 3 nights c [Citi Visa HI]/200009xml-ny 603.96 9/24/00 AVIS RENT-A-CAR 1 WHITE PLAINS NY MIT 2000 c [Citi Visa HI]/200009xml-ny 334.45 1/16/01 MIT MIT 2000 MIT check # 20157686 dated 12/28/00 c [Intrust Checking]/200009xml-ny -967.74
Mercurial totally revolutionizes coding on a plane. There's no way I would have been as productive if I couldn't commit and diff and such right there on the plane. I'm back to using CVS for the project now, in order to share it over the net, since I don't have mercurial hosting figured out just yet. But here's the log of what I did on the plane:
changeset: 19:d1981dd8e140 user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 20:48:44 2006 -0600 summary: playing around with places changeset: 18:9d2f0073853b user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 18:21:35 2006 -0600 summary: fixed filter arg reporting changeset: 17:3993a333747b user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 18:10:10 2006 -0600 summary: more dict work; filters working changeset: 16:59234a4caeae user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 17:30:28 2006 -0600 summary: moved trx structure to dict changeset: 15:425aab9bcc52 user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 20:57:17 2006 +0100 summary: vcards for payess with phone numbers, states changeset: 14:cbd30e67647a user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 19:12:38 2006 +0100 summary: filter by trx acct changeset: 13:9a2b49bc3303 user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 18:45:06 2006 +0100 summary: explain the filter in the report changeset: 12:2ea13bafc379 user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 18:36:09 2006 +0100 summary: class filtering option changeset: 11:a8f550c8759b user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 18:24:45 2006 +0100 summary: filtering in eachFile; ClassFilter changeset: 10:acac37293fdd user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 17:53:18 2006 +0100 summary: moved trx/splits fixing into eachTrx in the course of documenting trxtsv.py changeset: 9:5226429e9ef6 user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 17:28:01 2006 +0100 summary: clarify eachTrx with another test changeset: 8:afd14f2aa895 user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 17:19:36 2006 +0100 summary: replaced fp style grokTransactions with iter style eachTrx changeset: 7:eb020cda1e67 user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 16:16:43 2006 +0100 summary: move isoDate down with field routines changeset: 6:123f66ac79ed user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 16:14:45 2006 +0100 summary: tweak docs; noodle on CVS/hg scm stuff changeset: 5:4f7ca3041f9a user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 16:04:07 2006 +0100 summary: split trxtsv and trxht out of grokTrx changeset: 4:95366c104b42 user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 14:48:04 2006 +0100 summary: idea dump changeset: 3:62057f582298 user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 09:55:48 2006 +0100 summary: handle S in num field changeset: 2:0c23921d0dd3 user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 09:38:54 2006 +0100 summary: keep tables bounded; even/odd days changeset: 1:031b9758304c user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 09:19:05 2006 +0100 summary: table formatting. time to land changeset: 0:2d515c48130b user: Dan Connolly <connolly@w3.org> date: Sat Mar 4 07:55:58 2006 +0100 summary: working on plane
I used doctest unit testing quite a bit, and rst for documentation:
Usage
Run a transaction report over all of your data in some date range and print it to a tab-separated file, say, 2004qtrx.txt. Then invoke a la:
$ python trxht.py 2004qtrx.txt >,x.html
$ xmlwf ,x.html
$ firefox ,x.htmlYou can give multiple files, as long as the ending balance of one matches the starting balance of the next:
$ python trxht.py 2002qtrx.txt 2004qtrx.txt >,x.htmlSupport for SPARQL-style filtering is in progress. Try:
$ python trxht.py --class myclass myqtrx.txt >myclass-transactions.htmlto simulate:
describe ?TRX where { ?TRX qt:split [ qs:class "9912mit-misc"] }.Future Work
- add hCards for payees (in progress)
- pick out phone numbers, city/state names
- support a form of payee smushing on label
- make URIs for accounts, categories, classses, payees
- support round-trip with QIF; sync back up with RDF export work in grokTrx.py
- move the quacken project to mercurial
- proxy via dig.csail.mit.edu or w3.org? both?
- run hg serve on homer? swada? login.csail?
- publish hg log stuff in a _scm/ subpath; serve the current version at the top