This repository has been archived by the owner on Oct 28, 2020. It is now read-only.

jfilter/scrape-gutenberg-de

Scrape Gutenberg DE

Scrape all books from Projekt Gutenberg-DE. Useful, e.g., if you need a large corpus of German text for language modeling.

Usage

git clone https://github.com/jfilter/scrape-gutenberg-de --depth 1
cd scrape-gutenberg-de
pipenv install
pipenv run scrapy runspider scrape.py -o data.json
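
Once the run finishes, data.json should hold a JSON array of scraped items, since that is what Scrapy's JSON feed export writes. The fields of each item are not documented here, so the following is only a minimal sketch for inspecting the output: the helper name summarize_items is hypothetical, and it assumes nothing about the items beyond their being JSON objects.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_items(path):
    """Return (item_count, Counter of field names) for a Scrapy JSON feed.

    Hypothetical helper, not part of this repository.
    """
    # Scrapy's JSON feed export writes a single JSON array of items.
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    field_counts = Counter()
    for item in items:
        # Assumption: each item is a JSON object; its field names are unknown.
        if isinstance(item, dict):
            field_counts.update(item.keys())
    return len(items), field_counts


if __name__ == "__main__":
    if Path("data.json").exists():
        count, fields = summarize_items("data.json")
        print(f"{count} items")
        for name, n in fields.most_common():
            print(f"  {name}: present in {n} items")
```

Note that -o appends to an existing file, so re-running the command against an existing data.json can leave it unparseable. Deleting the file before a new run avoids this.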

License

MIT.