Speaking of Freebase, they've featured some work of mine that I never mentioned, so I suppose I should talk briefly about it.
Back at the end of 2008 I decided, after a year of casually following Freebase, that it was getting interesting enough to invest some time in learning it in a little more depth. Of course the only way to do that is hands-on, so I needed a project. I didn't want to start with an idea that had commercial potential (they're secret!) and I've got an interest in old places through my genealogy hobby, so I decided to load up the U.S. National Park Service's Register of Historic Places database. The source database is in dBase format, so I grabbed a Python module to read it and started playing around with loading it into Freebase. Reconciling data between two slightly crufty databases is a non-trivial problem, so I played around for quite a while on Freebase's sandbox before I was happy with the results and ready to load it into the production database.
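To give a flavor of what reconciliation involves, here's a minimal sketch. It is not the code from the actual load; it assumes both sides have already been read into lists of dicts, and the field names `name` and `state` as well as the helpers `normalize_name` and `reconcile` are hypothetical. The idea is to match on a normalized name plus state, and to set aside anything ambiguous for manual review rather than guess.

```python
import re
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def reconcile(source_records, target_topics):
    """Match source records to target topics on (normalized name, state).

    Returns three lists: (matches, ambiguous, unmatched).
    The 'name' and 'state' keys are assumptions for this sketch.
    """
    # Index the target side once so each lookup is a dict access.
    index = {}
    for topic in target_topics:
        key = (normalize_name(topic["name"]), topic["state"])
        index.setdefault(key, []).append(topic)

    matches, ambiguous, unmatched = [], [], []
    for record in source_records:
        key = (normalize_name(record["name"]), record["state"])
        candidates = index.get(key, [])
        if len(candidates) == 1:
            matches.append((record, candidates[0]))
        elif candidates:
            # More than one candidate: leave it for a human to decide.
            ambiguous.append((record, candidates))
        else:
            unmatched.append(record)
    return matches, ambiguous, unmatched
```

Exact matching on a normalized key is deliberately conservative. Real data usually needs fuzzier comparison on top of this, but the three-way split (match, ambiguous, unmatched) is a handy way to see how much cleanup is left.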
Of course, shortly after I got it all loaded, the NPS released a new version of the database, so now I need to go back and update everything. That's OK though, because the first time around I'd only used the data to add types and properties to existing topics in Freebase (still over 10,000 topics with 100K+ facts). I hadn't created any new topics from scratch. This will be a good opportunity to load the entire database, at least down to some level of significance (perhaps National and State, but not Local).
Another little project I did for Sunshine Week 2009 was adding the Congressional Biography IDs (aka Library of Congress THOMAS IDs) to all the U.S. politicians. This ID is used in the online versions of all the bills that go through Congress, so it is an important unique identifier.
Finally, another project, which was just mentioned in the Freebase blog, is my very first, very primitive Acre app, Untyped, which can be used to find topics that contain a specific keyword in their name but have no type assigned to them. Freebase is working hard to get as many topics as possible typed, so this tool can help with that. Most of my other Freebase work has been done in Python, but this uses their new hosted templating engine. Acre is still a little rough around the edges, but it has been improving a lot. Because it's hosted, you don't need to worry about running things on Google App Engine or another hosting service.
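The app itself is an Acre template, but the core filter is simple enough to sketch in Python. This is only an illustration, not the app's code: it assumes topics are already in memory as dicts with hypothetical `name` and `types` keys, and it assumes a topic counts as "untyped" when it carries nothing beyond a base type such as `/common/topic`.

```python
def find_untyped(topics, keyword, base_types=frozenset({"/common/topic"})):
    """Return topics whose name contains keyword and that have no real type.

    'name' and 'types' are assumed keys for this sketch; base_types lists
    the types that do not count as meaningful typing.
    """
    needle = keyword.lower()
    return [
        topic
        for topic in topics
        if needle in topic["name"].lower()
        and not (set(topic["types"]) - base_types)
    ]


# Hypothetical sample data for illustration only.
sample = [
    {"name": "Old Stone Bridge", "types": ["/common/topic"]},
    {"name": "Brooklyn Bridge", "types": ["/common/topic", "/transportation/bridge"]},
    {"name": "Bridge Street", "types": []},
]
untyped_bridges = find_untyped(sample, "bridge")
```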
More fun Freebase stuff in the pipe... Stay tuned!