Numbered examples
2-2 Scraping XFN content from a web page
2-4 Using a breadth-first search to crawl XFN links
2-6 Extracting geo data from MapQuest Local
2-7 Parsing hRecipe data for a Pad Thai recipe
2-9 Parsing hReview data for a Pad Thai recipe
3-3 Converting an mbox to a more convenient JSON structure
3-5 A short script that demonstrates loading JSON data into CouchDB
3-6 A simple mapper that uses Python to map documents by their date/time stamps
3-7 Using a mapper and a reducer to count the number of messages written by date
3-8 Mapping and reducing by sender and recipient
3-9 Sorting documents by key by using a transpose mapper and exporting to another database
3-14 Creating discussion threads from mbox data via “jwz threading”
3-16 Using a thread pool to maximize read throughput from CouchDB
3-18 A robust approach for threading together discussion threads from mbox data
3-20 The data format expected by the SIMILE Timeline
4-2 Using OAuth to authenticate and grab some friend data
4-3 Example 4-2 refactored to use two common utilities for OAuth and making API requests
4-4 Harvesting, storing, and computing statistics about friends and followers
4-5 Resolving basic user information such as screen names from IDs
4-7 Finding common friends/followers for multiple Twitterers, with output that’s easier on the eyes
4-8 Crawling friends/followers connections
4-9 Calculating a Twitterer’s most popular followers
4-10 Exporting friend/follower data from Redis to NetworkX for easy graph analytics
4-11 Using NetworkX to find cliques in graphs
4-13 Grabbing data from the Infochimps Strong Links API
4-15 Visualizing graph data with Ubigraph
5-2 Extracting tweet entities with a little help from the twitter_text package
5-3 Harvesting tweets from a user or public timeline
5-4 Extracting entities from tweets and performing simple frequency analysis
5-5 Finding @mention tweet entities that are also friends
5-7 Using couchdb-lucene to query tweet data
5-9 Reconstructing tweet discussion threads
5-11 Counting the number of times Twitterers have been retweeted by someone
5-12 Finding the tweets that have been retweeted most often
5-13 Counting hashtag entities in tweets
5-14 Harvesting tweets for a given query
5-15 Computing the set intersection of lines in files
5-17 Generating the data for an interactive tag cloud using WP-Cumulus
6-1 Simple normalization of company suffixes from address book data
6-2 Standardizing common job titles and computing their frequencies
6-5 Using built-in distance metrics from NLTK to compare small sets of items
6-6 Clustering job titles using a greedy heuristic
6-12 Harvesting extended profile information for your LinkedIn contacts
6-14 Geocoding the locations of your LinkedIn contacts and exporting them to KML
7-1 Harvesting Google Buzz data
7-4 Running TF-IDF on sample data
7-5 Querying Google Buzz data with TF-IDF
7-7 Finding similar documents using cosine similarity
7-11 A template for connecting to IMAP using OAuth
7-12 A simple workflow for extracting the bodies of Gmail messages returned from a search
8-1 Harvesting blog data by parsing feeds
8-2 Using NLTK’s NLP tools to parse blog data
8-3 A document summarization algorithm
8-5 Extracting entities from a text with NLTK
8-7 Discovering interactions between entities
8-9 Modification of script from Example 8-7
9-1 Getting an OAuth 2.0 access token for a desktop app
9-5 Querying the Open Graph for “programming” groups
9-11 Encapsulating FQL queries with a small Python class abstraction
9-13 Harvesting and munging friends data for the JIT’s RGraph visualization
9-14 Harvesting and munging data for the JIT’s Sunburst visualization
9-15 Exporting data so that it can easily be loaded into a spreadsheet for analysis
9-16 Harvesting and munging data to visualize mutual friends with a particular group
9-18 Harvesting data and computing the target JSON as displayed in Example 9-17
9-19 Harvesting and munging data for visualization as a WP-Cumulus tag cloud