
Web scraping tool for data mining analysis. Contains a python script to perform the web scraping and the necessary instructions for GigaBlast Search Engine setup.


data-science-work/data_mining_project


Web Scraping Tool For Data Mining

What does it do?

The application searches the Internet through the Gigablast search engine, using terms that the user enters in a CSV file (query.csv) located at the root of the application. After the search completes, the application creates .txt files that can be imported into any platform for data mining analysis.
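The flow described above can be sketched roughly as follows. This is an illustrative outline, not the code in text_mining.py: the function names `read_terms` and `save_results`, the one-term-per-row CSV layout, and the file naming scheme are all assumptions, and the search step is passed in as a placeholder callable instead of a real Gigablast request.

```python
import csv
from pathlib import Path


def read_terms(csv_path):
    """Return the non-empty search terms found in the first column of the CSV."""
    terms = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if row and row[0].strip():
                terms.append(row[0].strip())
    return terms


def save_results(terms, search, out_dir="."):
    """Run `search` for each term and write the returned text to <term>.txt."""
    out_dir = Path(out_dir)
    written = []
    for term in terms:
        text = search(term)  # placeholder for the real search request
        # Replace characters that are awkward in file names (assumed naming scheme).
        safe_name = "".join(c if c.isalnum() else "_" for c in term)
        path = out_dir / f"{safe_name}.txt"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

Keeping the search step as a parameter makes the CSV and file handling easy to try out without spending any Gigablast queries.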

What do you need to run the app?

To run the app you need a Gigablast account, which you can create at Gigablast. There is a $5.00 fee, and Gigablast charges $0.99 per 1000 queries. After you create the account, set your userid and your code in the my_params variable inside text_mining.py.
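To get a feel for the query pricing, the arithmetic can be sketched as below. The helper name `estimate_query_cost` is hypothetical, and the sketch assumes the charge is pro-rated per query, which may not match how Gigablast actually bills. The $5.00 fee is not included.

```python
PRICE_PER_1000_QUERIES = 0.99  # USD, taken from the pricing stated above


def estimate_query_cost(num_queries):
    """Estimate the query charge in USD, assuming pro-rated billing."""
    if num_queries < 0:
        raise ValueError("num_queries must be non-negative")
    return round(num_queries / 1000 * PRICE_PER_1000_QUERIES, 2)
```

Under that assumption, `estimate_query_cost(2000)` evaluates to 1.98.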

UserId and Gigablast code
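As a rough illustration of what setting these values might look like, the sketch below defines a `my_params` dictionary and merges it into a search URL. Only the names `my_params`, `userid`, and `code` come from this README. The dictionary layout, the endpoint URL, the `q` and `format` parameters, and the helper `build_search_url` are assumptions, so check them against text_mining.py and your Gigablast account details.

```python
from urllib.parse import urlencode

# Placeholder credentials: replace with the values from your Gigablast account.
my_params = {
    "userid": "YOUR_USER_ID",
    "code": "YOUR_CODE",
}

# Assumed endpoint; verify it against your Gigablast account documentation.
SEARCH_ENDPOINT = "https://www.gigablast.com/search"


def build_search_url(term, params=my_params):
    """Combine a search term with the account parameters into a request URL."""
    query = {"q": term, "format": "json", **params}
    return f"{SEARCH_ENDPOINT}?{urlencode(query)}"
```

Avoid committing real credentials to a public repository; keep the placeholders in version control and fill them in locally.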

You will also need Python 3 installed on your computer. Visit the Python website for installation instructions.
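One way to confirm the installation is a short check like the one below. It is a minimal sketch with a hypothetical helper name, `python3_status`. It reports the version of the interpreter running it and whether `python3` or `python` launchers are found on your PATH.

```python
import shutil
import sys


def python3_status():
    """Report the running interpreter version and which launchers are on PATH."""
    return {
        "running_python3": sys.version_info >= (3,),
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # shutil.which returns the full path, or None if the command is not found.
        "python3_on_path": shutil.which("python3"),
        "python_on_path": shutil.which("python"),
    }
```

Printing the result of `python3_status()` shows which command name is available on your system.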

Instructions on how to set up the running environment

Once Python 3 is installed on your computer and available in your global environment (instructions on how to make Python 3 global in your environment here), you can clone the repo to your computer.

$ git clone https://github.com/diazgilberto/data_mining_project.git

After cloning the repo, open your terminal, cd to data_mining_project, and open query.csv. Modify the file with the terms you want to search, then save and close the file. In your terminal, run one of the following commands, depending on how you set up Python 3 in your global environment: python3 text_mining.py or python text_mining.py

Shortly after you start the app, you will see status messages in the terminal. You will also notice new .txt files being created in the root of data_mining_project. After the process finishes, your files are ready for data mining analysis.
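As a starting point for the analysis step, the generated files could be loaded along these lines. This is a sketch under assumptions: the helper names `load_documents` and `word_counts` are hypothetical, and it presumes the output files are plain UTF-8 text sitting in a single folder.

```python
from pathlib import Path


def load_documents(folder="."):
    """Map each .txt file name (without extension) to its text content."""
    documents = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps loading even if a scraped page has odd bytes.
        documents[path.stem] = path.read_text(encoding="utf-8", errors="replace")
    return documents


def word_counts(documents):
    """Return the number of whitespace-separated tokens in each document."""
    return {name: len(text.split()) for name, text in documents.items()}
```

Calling `word_counts(load_documents("data_mining_project"))` would give a quick size check of each file before importing the text into a larger analysis tool.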

HAPPY ANALYZING!!!!
