Task Manager's task implementation

This page shows how to construct a task for the Task Manager to run.

Description

Typically, a runnable (task) is supported by three parts:

crawler - crawls data from the selected website
extractor - parses the downloaded data and pulls out the information you need
dumper - writes that information to the database

These parts can be changed if the task serves a different purpose, e.g. classification.
The task file itself usually contains only a run function, which the Task Manager can call.
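
As an illustration of the three-part layout, here is a minimal sketch of what a task file might look like. Every name in it (crawl, extract, dump, the items table, and the arguments of run) is a hypothetical placeholder, since this page does not specify the signature the Task Manager expects. The crawler returns canned text instead of fetching anything, and the dumper writes to an SQLite database purely for demonstration.

```python
import sqlite3

# All function names, the table name, and the run() signature are
# assumptions made for this sketch, not the Task Manager's real interface.


def crawl(source):
    """Crawler stand-in: returns canned text instead of downloading."""
    return "alice,3\nbob,5\n"


def extract(raw):
    """Extractor: turn raw 'name,count' lines into (name, count) tuples."""
    rows = []
    for line in raw.splitlines():
        name, count = line.split(",")
        rows.append((name, int(count)))
    return rows


def dump(conn, rows):
    """Dumper: write the extracted rows to the database."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, count INTEGER)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    conn.commit()


def run(conn, source="example-source"):
    """Single entry point, chaining crawler, extractor, and dumper."""
    raw = crawl(source)
    rows = extract(raw)
    dump(conn, rows)
    return len(rows)
```

Calling run(sqlite3.connect(":memory:")) exercises all three parts in order. A task with a different purpose would swap out the individual parts while keeping run as the entry point.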

Implementation

Normally, you run the pipeline file by file.
For each file you want from the website:

first, crawl the file (using wget or request)
then extract and dump it
finally, delete the file to save space
move on to the next file you want to get
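
The steps above can be sketched as a loop that handles one file at a time and always deletes the local copy before moving on. This is only a sketch under stated assumptions: process_files and its download, extract, and dump parameters are hypothetical callables you would supply yourself, and each URL is assumed to end in a file name.

```python
import os
import tempfile


def process_files(urls, download, extract, dump, workdir=None):
    """Crawl, extract, dump, and delete one file at a time.

    download(url, path), extract(path), and dump(records) are
    hypothetical callables provided by the task author.
    """
    workdir = workdir or tempfile.mkdtemp()
    processed = 0
    for url in urls:
        # Assumes the URL ends in a file name, e.g. ".../a.txt".
        path = os.path.join(workdir, os.path.basename(url))
        try:
            download(url, path)      # first: crawl the file to disk
            records = extract(path)  # then: extract ...
            dump(records)            # ... and dump it
            processed += 1
        finally:
            # finally: delete the file to save space, even after an error
            if os.path.exists(path):
                os.remove(path)
    return processed
```

Putting the deletion in a finally clause means a failed download or extraction does not leave files behind. The exception itself still propagates to the caller.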

Remember to add logging so that any exceptions raised in the task you are working on are caught and recorded.
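
One possible way to do this with Python's standard logging module is to wrap each step in a small helper. The helper name run_step and the logger name example_task are assumptions for this sketch. Whether a failed step should be swallowed, as here, or re-raised depends on how the Task Manager is expected to react to failures.

```python
import logging

logger = logging.getLogger("example_task")  # hypothetical logger name


def run_step(name, func, *args):
    """Run one pipeline step and log either success or the traceback."""
    try:
        result = func(*args)
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("step %r failed with args %r", name, args)
        return None
    logger.info("step %r finished", name)
    return result
```

For example, run_step("extract", int, "3") returns 3, while run_step("extract", int, "x") logs the ValueError traceback and returns None.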
