26 March 2009

Google Summer of Code 2009 (GSoC2009)

If you know any students who are interested in open source software, the Google Summer of Code is a great opportunity. Encourage them to apply. The application period is open now and ends April 3.

One thousand students will be paid $4500 by Google for a summer of working on open source projects and will be mentored by experienced open source developers. To my mind, the experience and mentoring are almost more valuable than the cash (although obviously that varies greatly depending on the economic situation of the student).

If you look at the list of projects, you'll see that there's something for every taste. Projects range from low-level bit banging in C on bare iron to bioinformatics to games to all sorts of so-called "social" apps, written in a wide variety of programming languages. Students and mentors come from almost one hundred different countries, so there's an enormous amount of diversity on that front as well.

I've been a mentor for three of the four years the program has been in existence (2006, 2007, 2008) and last year had the satisfaction of seeing one of my original students become a mentor himself. It's a lot of work, but very satisfying. Unfortunately my project won't be participating this year due to a combination of cutbacks at Google (about 10%) and a desire to rotate in new organizations. The ArgoEclipse team would still love to mentor any new folks, students or otherwise, who are interested in getting their feet wet with open source development.

11 March 2009

Freebase, open government, and enumerations

I'm preparing a short series of articles about Freebase, but Raymond Yee had a question about something I was working on over the weekend, so here's a quick hint to help him along.

What he calls "keys" are called "enumerated properties" in the Freebase documentation and there's an article on how to set them up. Unfortunately, the schema editor was broken when I was working on the National Register of Historic Places database schema, so I had to resort to reverse engineering things from the Explore view (accessible by pressing F8 on any page and scrolling to the bottom of the page). I then modified the schema's property type by hand using MQL, the Freebase query language. You can see the end result in the schema, where item_number is typed as an enumeration.

There's also a good article on how to create a URL template that I used successfully to link to the original application submissions. For the Congressional Bioguide, it can be used to link back to the original biography.
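
For illustration only, here's a minimal Python sketch of what a URL template boils down to: a URL with a placeholder that gets replaced by each item's key. The `{key}` placeholder syntax, the `expand_url_template` helper, the Bioguide URL pattern, and the sample ID are all assumptions made for this example; they are not taken from the Freebase article, which defines the real syntax.

```python
from urllib.parse import quote

# Hypothetical placeholder syntax; the real Freebase template syntax may differ.
PLACEHOLDER = "{key}"

# Assumed URL pattern for a Bioguide biography page, used only as an example.
BIOGUIDE_TEMPLATE = (
    "http://bioguide.congress.gov/scripts/biodisplay.pl?index=" + PLACEHOLDER
)


def expand_url_template(template: str, key: str) -> str:
    """Substitute a percent-encoded key for the placeholder in the template."""
    if PLACEHOLDER not in template:
        raise ValueError("template has no %s placeholder" % PLACEHOLDER)
    # safe="" also encodes "/" so a key can never alter the URL's path.
    return template.replace(PLACEHOLDER, quote(key, safe=""))


if __name__ == "__main__":
    # "A000000" is a made-up ID in the general shape of a Bioguide ID.
    print(expand_url_template(BIOGUIDE_TEMPLATE, "A000000"))
```

The point of the template is that only the key needs to be stored with each topic; the link back to the source record is derived from it.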

Coincidentally, and independently of Raymond's project, I was working on loading up all the Congressional Bioguide IDs last weekend because they are used in the XML form of legislation on THOMAS, which is run by the Library of Congress. I decided to take a slight detour to write a little name parser and Freebase name query tool in Python, so I haven't actually gotten around to loading the IDs yet. One of the biggest problems in working with Freebase is reliably resolving personal names. Freebase topics typically have only the main name that was used as the Wikipedia article title. There's really no telling what name form the article's editors will have chosen, and even though the full name and some aliases are often identified in the opening sentence of the article, Freebase doesn't import this information from Wikipedia.
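
To give a feel for the problem, here's a rough sketch of the general approach: split a directory-style name into parts, then generate several plausible name forms to try against Freebase. This is not the actual parser mentioned above. It assumes input shaped like "FAMILY, Given Middle (Nickname), Suffix", and the names `ParsedName`, `parse_directory_name`, and `candidate_forms` are made up for the example. The Freebase lookup itself is left out.

```python
import re
from typing import List, NamedTuple

# Generational suffixes recognized by this sketch (compared without the period).
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


class ParsedName(NamedTuple):
    family: str
    given: str
    middle: str
    nickname: str
    suffix: str


def parse_directory_name(raw: str) -> ParsedName:
    """Parse an assumed 'FAMILY, Given Middle (Nickname), Suffix' string."""
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    if len(parts) < 2:
        raise ValueError("expected 'FAMILY, Given ...', got %r" % raw)
    # Title-case all-caps family names; mixed-case input is left untouched.
    family = parts[0].title() if parts[0].isupper() else parts[0]
    suffix = ""
    if len(parts) > 2 and parts[-1].rstrip(".").lower() in SUFFIXES:
        suffix = parts[-1]
    rest = parts[1]
    nickname = ""
    match = re.search(r"\(([^)]*)\)", rest)
    if match:
        # Pull the parenthesized nickname out of the given-name segment.
        nickname = match.group(1).strip()
        rest = (rest[:match.start()] + rest[match.end():]).strip()
    tokens = rest.split()
    given = tokens[0] if tokens else ""
    middle = " ".join(tokens[1:])
    return ParsedName(family, given, middle, nickname, suffix)


def candidate_forms(name: ParsedName) -> List[str]:
    """List name forms an article title might plausibly use, in order."""
    forms = []
    for first in (name.given, name.nickname):
        if not first:
            continue
        forms.append("%s %s" % (first, name.family))
        if name.middle:
            initials = " ".join(m[0] + "." for m in name.middle.split())
            forms.append("%s %s %s" % (first, name.middle, name.family))
            forms.append("%s %s %s" % (first, initials, name.family))
    if name.suffix:
        forms += ["%s, %s" % (form, name.suffix) for form in list(forms)]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(forms))


if __name__ == "__main__":
    for form in candidate_forms(parse_directory_name("KENNEDY, Edward Moore (Ted)")):
        print(form)
```

Each candidate would then be tried against Freebase in turn. Even this toy version shows why a single stored name isn't enough: "Ted Kennedy", "Edward Kennedy", and "Edward M. Kennedy" are all reasonable article titles for the same person.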