
Welcome to the #blessed team's Charlotte Web Scraper Wiki!

Charlotte Web crawler

Our Charlotte Web crawler is a program made not just with programmers in mind, but also anyone who wants an easy way to automate their internet experience. Charlotte is an open-source project intended to stay small, with only a few, but massively useful, features.

Charlotte is composed of two main parts:

  1. A Chrome browser extension
  2. An outward-facing API (similar to IFTTT)

Chrome Extension

The Chrome browser extension:

  * Allows the user to specify some settings
  * Sets up fields (key-value pairs) to be returned by Charlotte
  * Exports the code as a file (or as a copy/paste)
  * Has an if-then architecture (maybe commands similar to SQL?): select XXX if YYY else ZZZ then ... (see the sketch below)

A number of libraries written in various languages (JS, Python, C, Java) share mostly the same API. Charlotte is essentially a simple virtual browser (not complex) that performs a series of steps (operations) in order to scrape information.
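To make the key-value fields and the if-then idea concrete, here is a minimal sketch of what the extension's exported code might look like. This is only an illustration: the property names (url, fields, rules), the CSS selectors, and the rule syntax are assumptions, not the actual export format.

```js
// A minimal sketch of an exported scrape, assuming a JSON-like format.
// All names below are illustrative guesses, not the real export format.
const exportedScrape = {
  url: "https://example.com/listings",
  // key-value fields set up in the extension UI, mapped to CSS selectors
  fields: {
    title: "h1.listing-title",
    price: "span.price"
  },
  // an if-then rule in the spirit of "select XXX if YYY else ZZZ"
  rules: [
    { select: "span.sale-price", if: ".on-sale", else: "span.price" }
  ]
};

console.log(JSON.stringify(exportedScrape, null, 2));
```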

API (all namespaced to Charlotte)

There are two main classes in Charlotte: a mother spider and a large number of tiny little Charlotte babies. The mother spider spins the web while the baby spiders walk on it, all the while sending useful information back via the vibrations on the web. I actually have no idea what I'm talking about (yet), so treat this part and anything after it as a rough sketch.

Whenever you navigate to a new URL, Charlotte sends a baby spider there and that baby spider communicates information back to Charlotte.

This is probably a good idea to implement because:

  * It's asynchronous (things take time)
  * It allows you to chain methods and overwrite the prototypes

Ex: Charlotte.from("google.com").bring({ data_item: location … }).if([ conditions ])
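To give a feel for how that asynchronous, chainable style could be wired up, here is a rough JavaScript sketch using promises in a browser context. Only from, bring, and if come from the example above; the BabySpider class and everything inside it are assumptions, not an existing implementation.

```js
// Rough sketch only: a chainable, promise-based take on the Charlotte API.
class BabySpider {
  constructor(promise) {
    // every step in the chain is an async operation that resolves later
    this.promise = promise;
  }
  bring(fields) {
    // chain a step that picks the requested key-value pairs off the page
    return new BabySpider(this.promise.then(page => {
      const result = {};
      for (const [name, selector] of Object.entries(fields)) {
        result[name] = page.querySelector(selector)?.textContent ?? null;
      }
      return result;
    }));
  }
  if(conditions) {
    // only pass the data along when every condition holds
    return new BabySpider(this.promise.then(data =>
      conditions.every(check => check(data)) ? data : null));
  }
  then(onFulfilled, onRejected) {
    // expose the underlying promise so the whole chain is awaitable
    return this.promise.then(onFulfilled, onRejected);
  }
}

const Charlotte = {
  from(url) {
    // navigating to a URL spawns a "baby spider" that reports back asynchronously
    const page = fetch(url)
      .then(res => res.text())
      .then(html => new DOMParser().parseFromString(html, "text/html"));
    return new BabySpider(page);
  }
};

// Usage, mirroring the example above:
// const data = await Charlotte.from("https://google.com")
//   .bring({ data_item: "title" })
//   .if([d => d.data_item !== null]);
```

Returning a new BabySpider from every step is what makes the chaining work, and exposing then keeps the whole chain awaitable like an ordinary promise.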
