17 August 2009

Breaking the 1 million barrier

I finished loading the updated National Register of Historic Places database into Freebase last week. Besides incorporating the latest data released by the National Park Service and the latest Wikipedia articles, this run created new topics wherever Freebase didn't already have one. You may remember that the initial run focused solely on reconciling against existing Freebase topics.

Freebase should now have a complete copy of all National Register of Historic Places entries which are of International, National, or State significance. The Local significance listings still used the old strategy of only reconciling existing topics.
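To make the two strategies concrete, here is a minimal sketch of the per-listing decision: reconcile with an existing topic if one matches, otherwise create a new topic only for International, National, or State listings. It is illustrative only. The `Listing` record, its field names, and matching solely on the NRHP reference number are hypothetical simplifications (real reconciliation would also weigh names, locations, and Wikipedia links), and the `new:` topic IDs are placeholders, not Freebase IDs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


# Hypothetical record shape; field names are illustrative, not the NPS or Freebase schema.
@dataclass
class Listing:
    refnum: str        # NRHP reference number
    name: str
    significance: str  # "International", "National", "State", or "Local"


# Significance levels for which this run may create brand-new topics.
CREATE_LEVELS = {"International", "National", "State"}


def process(listing: Listing, existing: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return (action, topic_id) for one listing.

    `existing` maps a reference number to an already-known topic ID
    (a hypothetical stand-in for a real reconciliation lookup).
    """
    topic = existing.get(listing.refnum)
    if topic is not None:
        return ("reconciled", topic)
    if listing.significance in CREATE_LEVELS:
        # Placeholder ID for a newly created topic.
        return ("created", f"new:{listing.refnum}")
    # Local listings with no existing match keep the old reconcile-only behavior.
    return ("skipped", None)
```

For example, with `existing = {"1": "/m/x"}`, an unmatched State listing is created, while an unmatched Local listing is skipped.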

Below is a summary of the before and after counts. We picked up 4,535 entries that had been added to Wikipedia, to the Register, or both. On top of that, we created another 20,553 entries, bringing the grand total to over 35,000 listings.

Significance     Starting count   Existing topics reconciled   New topics created   Ending count
International                 0                            1                   10             11
National                   2010                          699                 4386           7095
State                      2423                         1121                16065          19609
Local                      5978                         2627                   92           8690
TOTAL                     10434                         4535                20553          35518

Each topic contains a fair amount of information, so the entire load amounted to about 750,000 "facts" (or "triples" in RDF-speak), bringing the total number of facts I've written to Freebase to over 1.1M. Unfortunately, Freebase's "tallybot", which updates the totals nightly, has been broken for a while, so I'm only being credited with a paltry 300K.
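To show why a single topic accounts for many facts, here is a small sketch that flattens one listing into (subject, predicate, object) triples, emitting one triple per value so that multi-valued fields such as several architects each count separately. The topic ID, predicate names, and values are hypothetical examples, not actual Freebase properties or data.

```python
from typing import Dict, List, Tuple, Union

Triple = Tuple[str, str, str]


def listing_to_triples(topic_id: str, fields: Dict[str, Union[str, List[str]]]) -> List[Triple]:
    """Emit one (subject, predicate, object) triple per field value.

    Multi-valued fields (e.g. several architects) yield one triple per value.
    """
    triples: List[Triple] = []
    for predicate, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((topic_id, predicate, v))
    return triples


# Hypothetical listing; predicate names and values are illustrative only.
example = {
    "name": "Example House",
    "refnum": "00000000",
    "significance_level": "State",
    "architect": ["Architect A", "Architect B"],
}

triples = listing_to_triples("/m/example", example)
print(len(triples))  # 5
```

Summed over tens of thousands of topics, per-value counting like this is how a load reaches hundreds of thousands of facts.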

The one remaining loose end is to do a better job of reconciling the architects/builders and what the Park Service calls "significant people" associated with each listing. This requires human vetting of a queue of tasks, so some additional infrastructure needs to be put in place before I can set people loose on it.