ptwobrussell edited this page Mar 8, 2011 · 1 revision

1-0 Retweet visualization

2-2 Scraping XFN content from a web page

2-4 Using a breadth-first search to crawl XFN links

2-6 Extracting geo data from MapQuest Local

2-7 Parsing hRecipe data for a Pad Thai recipe

2-9 Parsing hReview data for a Pad Thai recipe

3-3 Converting an mbox to a more convenient JSON structure

3-5 A short script that demonstrates loading JSON data into CouchDB

3-6 A simple mapper that uses Python to map documents by their date/time stamps

3-7 Using a mapper and a reducer to count the number of messages written by date

3-8 Mapping and reducing by sender and recipient

3-9 Sorting documents by key by using a transpose mapper and exporting to another database

3-14 Creating discussion threads from mbox data via “jwz threading”

3-16 Using a thread pool to maximize read throughput from CouchDB

3-18 A robust approach for threading together discussion threads from mbox data

3-20 The data format expected by the SIMILE Timeline

4-2 Using OAuth to authenticate and grab some friend data

4-3 Example 4-2 refactored to use two common utilities for OAuth and making API requests

4-4 Harvesting, storing, and computing statistics about friends and followers

4-5 Resolving basic user information such as screen names from IDs

4-7 Finding common friends/followers for multiple Twitterers, with output that’s easier on the eyes

4-8 Crawling friends/followers connections

4-9 Calculating a Twitterer’s most popular followers

4-10 Exporting friend/follower data from Redis to NetworkX for easy graph analytics

4-11 Using NetworkX to find cliques in graphs

4-13 Grabbing data from the Infochimps Strong Links API

4-15 Visualizing graph data with Ubigraph

5-2 Extracting tweet entities with a little help from the twitter_text package

5-3 Harvesting tweets from a user or public timeline

5-4 Extracting entities from tweets and performing simple frequency analysis

5-5 Finding @mention tweet entities that are also friends

5-7 Using couchdb-lucene to query tweet data

5-9 Reconstructing tweet discussion threads

5-11 Counting the number of times Twitterers have been retweeted by someone

5-12 Finding the tweets that have been retweeted most often

5-13 Counting hashtag entities in tweets

5-14 Harvesting tweets for a given query

5-15 Computing the set intersection of lines in files

5-17 Generating the data for an interactive tag cloud using WP-Cumulus

6-1 Simple normalization of company suffixes from address book data

6-2 Standardizing common job titles and computing their frequencies

6-5 Using built-in distance metrics from NLTK to compare small sets of items

6-6 Clustering job titles using a greedy heuristic

6-11 A minor modification to Example 6-6 that uses cluster.HierarchicalClustering instead of a greedy heuristic

6-12 Harvesting extended profile information for your LinkedIn contacts

6-14 Geocoding the locations of your LinkedIn contacts and exporting them to KML

7-1 Harvesting Google Buzz data

7-4 Running TF-IDF on sample data

7-5 Querying Google Buzz data with TF-IDF

7-7 Finding similar documents using cosine similarity

7-9 Using NLTK to compute collocations in a similar manner to the nltk.Text.collocations demo functionality

7-11 A template for connecting to IMAP using OAuth

7-12 A simple workflow for extracting the bodies of Gmail messages returned from a search

8-1 Harvesting blog data by parsing feeds

8-2 Using NLTK’s NLP tools to parse blog data

8-3 A document summarization algorithm

8-4 Augmenting the output of Example 8-3 to produce HTML markup that lends itself to analyzing the summarization algorithm’s results

8-5 Extracting entities from a text with NLTK

8-7 Discovering interactions between entities

8-9 Modification of script from Example 8-7

9-1 Getting an OAuth 2.0 access token for a desktop app

9-5 Querying the Open Graph for “programming” groups

9-11 Encapsulating FQL queries with a small Python class abstraction

9-13 Harvesting and munging friends data for the JIT’s RGraph visualization

9-14 Harvesting and munging data for the JIT’s Sunburst visualization

9-15 Exporting data so that it can easily be loaded into a spreadsheet for analysis

9-16 Harvesting and munging data to visualize mutual friends with a particular group

9-18 Harvesting data and computing the target JSON as displayed in Example 9-17

9-19 Harvesting and munging data for visualization as a WP-Cumulus tag cloud
