The goal of the Fifty State Project is to build scrapers and parsers in order to get as much state legislative data as possible in one place.
For the reasons behind the project and its goals, see the project announcement.
To stay up to date and communicate with other contributors, visit the Fifty State Project Google Group.
For an editable overview of each state's progress, visit the Sunlight Labs Wiki.
- Collect URLs of State Legislature and Legislative Information Pages [done]
- Grab legislators and legislation

  - Build scrapers and obtain data files for legislation in each of the fifty states
  - Create sponsor relationship between legislators and legislation

- Grab votes

  - Build scrapers and obtain data files for legislator votes on legislation
  - Create voting relationship between legislators and legislation

- Build tools on top of data
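
The sponsor and voting relationships in the roadmap above can be pictured as simple link records between legislators and legislation. The following is only a rough sketch: every type name, field name, and sample value is invented for illustration, and the actual storage layout is whatever the project's data documentation specifies.

```python
from collections import Counter, namedtuple

# Every type and field name below is a hypothetical illustration,
# not the project's real schema.
Legislator = namedtuple("Legislator", ["id", "name", "chamber"])
Sponsorship = namedtuple("Sponsorship", ["legislator_id", "bill_id"])
Vote = namedtuple("Vote", ["legislator_id", "bill_id", "choice"])


def sponsors_of(bill_id, sponsorships, legislators):
    """Return the names of the legislators who sponsor the given bill."""
    by_id = {leg.id: leg for leg in legislators}
    return [by_id[s.legislator_id].name
            for s in sponsorships if s.bill_id == bill_id]


def tally(bill_id, votes):
    """Count the votes cast on the given bill, keyed by choice."""
    return dict(Counter(v.choice for v in votes if v.bill_id == bill_id))


# Invented sample data, for demonstration only.
legislators = [
    Legislator("L1", "Legislator A", "upper"),
    Legislator("L2", "Legislator B", "lower"),
]
sponsorships = [Sponsorship("L1", "B1")]
votes = [Vote("L1", "B1", "yes"), Vote("L2", "B1", "no")]

print(sponsors_of("B1", sponsorships, legislators))
print(tally("B1", votes))
```

Keeping the relationships as separate link records, rather than nesting legislators inside each bill, is one possible design; it lets the same legislator record be reused across sponsorships and votes.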
To encourage as many contributions as possible, we are not mandating a language (we aren't saying "write in Python"), but we do need the code to follow a few guidelines.
For details on how scripts should be written and how they should run see :doc:`scripts/pyutils/README` (or :doc:`scripts/rbutils/README`). For details on how data should be stored see :doc:`data/README`.
- Valid options:

  --year   A year or years the parser should attempt
  --all    Attempt to parse years from 1969-2009
  --upper  Parse upper chamber
  --lower  Parse lower chamber
- The vision is that the flow will look something like this:
- $ ./scripts/nc/get_legislation --year=2009 --upper
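
For a Python script, the options listed above could be handled with the standard library's `argparse` module. This is a minimal sketch, not the project's actual utility code: the function names `build_parser` and `resolve` are hypothetical, and what a script should do when neither chamber flag is given is not specified here, so the sketch simply returns an empty chamber list in that case.

```python
import argparse

# Year range taken from the --all description ("1969-2009").
FIRST_YEAR, LAST_YEAR = 1969, 2009


def build_parser():
    """Build a parser for the four documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Sketch of a get_legislation front end")
    parser.add_argument("--year", action="append", type=int, default=[],
                        help="a year the parser should attempt (repeatable)")
    parser.add_argument("--all", action="store_true", dest="all_years",
                        help="attempt every year from 1969 to 2009")
    parser.add_argument("--upper", action="store_true", help="parse upper chamber")
    parser.add_argument("--lower", action="store_true", help="parse lower chamber")
    return parser


def resolve(argv):
    """Return (years, chambers) requested by the given argument list."""
    args = build_parser().parse_args(argv)
    if args.all_years:
        years = list(range(FIRST_YEAR, LAST_YEAR + 1))
    else:
        years = sorted(set(args.year))
    chambers = [name for name, wanted in (("upper", args.upper), ("lower", args.lower))
                if wanted]
    return years, chambers


# Mirrors the example invocation: --year=2009 --upper
print(resolve(["--year=2009", "--upper"]))
```

Passing `--year` more than once collects several years, which is one way to read "a year or years"; a comma-separated value would be an equally plausible interpretation.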
If you are interested in contributing, first check the Sunlight Labs Wiki and the repository to see where your state stands. Then announce your interest on the Fifty State Project Google Group, which is also the place to ask questions and make suggestions about the project.
Once you have claimed a state on the wiki and mailing list, you should maintain your own fork of the project on GitHub.
Please avoid making changes to files in other states/etc. on your state branch. Stick to editing files in the scripts/your_state directory and, where necessary, in any relevant utils directories.
When your state script works as it should, announce it on the mailing list and someone will merge your changes into the core.
As of June 15th, 2009, the Fifty State Project is licensed under the GPLv3.
See LICENSING for the full terms of the GPLv3.
For the sake of maintenance, Python is the preferred language for scripts; several Ruby scripts also exist if you are unfamiliar with Python.
If you are completely unfamiliar with Python or Ruby, writing a scraper in another language is preferred over not contributing at all, but given the number of scripts already written in Python, you are strongly encouraged to consider it first.
- BeautifulSoup
- html5lib
- simplejson if on Python 2.5
- (this list is out of date; refer to the specific scripts/state directories for dependencies)
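
The simplejson entry is conditional because the `json` module only joined the Python standard library in version 2.6. A common way to support both cases is an import fallback, sketched below; the sample record and its field names are illustrative assumptions, not the project's data format.

```python
try:
    import json  # standard library from Python 2.6 onward
except ImportError:
    import simplejson as json  # Python 2.5 fallback; installed separately

# Hypothetical record; the field names are for illustration only.
record = {"state": "nc", "year": 2009, "chamber": "upper"}

encoded = json.dumps(record, sort_keys=True)
decoded = json.loads(encoded)
print(encoded)
```

Because simplejson exposes the same `dumps` and `loads` functions, the rest of a script can use the name `json` without caring which module was imported.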
- hpricot (gem install hpricot)
- fastercsv (gem install fastercsv)
- mechanize (gem install mechanize)