ultraeric/scraper

vis

Environment Setup

  1. Clone the repository into a directory of your choice and navigate to the vis folder (hereafter referred to as <vis_home>).
  2. From .../<vis_home>/, run the command sudo sh setup_env.sh.
  3. On the command line, run python3 setup.py install.

Usage

To begin using this module, first run import scraper. This imports scraper.Session, which keeps track of the scraping session you are currently running and handles multi-threading internally.

The scraper module has two important classes: Scraper and Action. As a rule of thumb, a Scraper is the information source, while an Action is an operation that acts on the information held in a Scraper.

Scraper

This class is the source of truth for any Action that acts on the scraper. Note that the Scraper itself does not do anything; you submit Actions to the Scraper, initialize them, and then run the queue.

Action

Instances of this class act on a Scraper. To run an action immediately, use Action.execute(self). To queue an action so that it runs when resources are available, use Action.run(self); this attaches an action method to the Session queue, which runs it once a thread is available to handle it.
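The difference between immediate and queued execution can be illustrated with a minimal sketch. It does not use the scraper package: DemoScraper, DemoAction, drain, and the class-level action_queue are hypothetical stand-ins built on the standard library, and passing the scraper to the action's constructor is an assumption made only to keep the sketch short.

```python
import queue
import threading


class DemoScraper:
    """Hypothetical stand-in for a Scraper: it only holds information."""

    def __init__(self, site):
        self.site = site
        self.log = []


class DemoAction:
    """Hypothetical stand-in for an Action; not the library's real class."""

    # Stands in for the Session queue described above (an assumption).
    action_queue = queue.Queue()

    def __init__(self, scraper, label):
        self.scraper = scraper
        self.label = label

    def execute(self):
        # Immediate: the work happens in the calling thread, right now.
        self.scraper.log.append(self.label)

    def run(self):
        # Deferred: enqueue the bound method so a worker can call it later.
        DemoAction.action_queue.put(self.execute)


def drain(task_queue):
    """Worker loop: call queued tasks until the queue is empty."""
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            return
        task()
        task_queue.task_done()


scraper_demo = DemoScraper('https://www.google.com/')

DemoAction(scraper_demo, 'queued').run()
log_before = list(scraper_demo.log)  # nothing has run yet

DemoAction(scraper_demo, 'immediate').execute()

# A single worker thread stands in for the session's thread handling.
worker = threading.Thread(target=drain, args=(DemoAction.action_queue,))
worker.start()
worker.join()
```

In this sketch the queued action leaves the log untouched until the worker thread drains the queue, while the immediate action writes to the log as soon as execute is called.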

To create custom actions, extend the Action class and override the Action.get_act(self, scraper) method. This method should return a function that is run against the information stored in the Scraper. Actions can be chained: inside the returned function, create sub-actions and call Action.execute(scraper) on each to run it immediately, which strings their functionality together and consolidates it into a single Action.
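The override-and-chain pattern described above might look roughly like the following sketch. It is self-contained and does not import the scraper package: SketchScraper and SketchAction are hypothetical stand-ins that only mimic the get_act and execute interface named in this README, and RecordSite, RecordScheme, and RecordBoth are illustrative names.

```python
class SketchScraper:
    """Hypothetical stand-in for a Scraper: holds information only."""

    def __init__(self, site):
        self.site = site
        self.data = {}


class SketchAction:
    """Hypothetical base class mimicking the interface described above."""

    def get_act(self, scraper):
        # Subclasses return a zero-argument function that acts on scraper.
        raise NotImplementedError

    def execute(self, scraper):
        # Build the function for this scraper and call it immediately.
        return self.get_act(scraper)()


class RecordSite(SketchAction):
    def get_act(self, scraper):
        def act():
            scraper.data['site'] = scraper.site
        return act


class RecordScheme(SketchAction):
    def get_act(self, scraper):
        def act():
            # Everything before '://' is the URL scheme.
            scraper.data['scheme'] = scraper.site.split('://', 1)[0]
        return act


class RecordBoth(SketchAction):
    """Chains two sub-actions into a single action."""

    def get_act(self, scraper):
        def act():
            RecordSite().execute(scraper)
            RecordScheme().execute(scraper)
        return act


chain_scraper = SketchScraper('https://www.google.com/')
RecordBoth().execute(chain_scraper)
```

Because get_act closes over the scraper it receives, each sub-action writes to the same SketchScraper instance, which is what lets RecordBoth behave as one consolidated action.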

Examples

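import scraper

# The action queue that belongs to the current session.
queue = scraper.Session.action_queue

# Create an action and a Scraper for it to act on.
get_action = scraper.Scraper.Get_Action()
scraper1 = scraper.Scraper.Scraper(site='https://www.google.com/', actions=[get_action])

# Populate the queue, then run it.
queue.populate_queue()
queue.run()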
