Scrape Gutenberg DE

Scrape all books from Projekt Gutenberg-DE. Useful, e.g., if you need a large corpus of German text for serious language modeling.

Usage

git clone https://github.com/jfilter/scrape-gutenberg-de --depth 1
cd scrape-gutenberg-de
pipenv install
pipenv run scrapy runspider scrape.py -o data.json
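
Once the spider has finished, data.json should contain a JSON array with one entry per scraped item. The README does not document the item fields, so the following is only a minimal sketch for inspecting the output without assuming any field names. The helpers `load_items` and `field_counts` are hypothetical and not part of this repository.

```python
import json
from collections import Counter
from pathlib import Path


def load_items(path):
    """Load the JSON array written by `scrapy runspider ... -o data.json`."""
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of scraped items")
    return items


def field_counts(items):
    """Count how often each field name occurs across all dict items."""
    counts = Counter()
    for item in items:
        if isinstance(item, dict):
            counts.update(item.keys())
    return counts


if __name__ == "__main__":
    path = Path("data.json")  # output file name from the usage command
    if path.exists():
        items = load_items(path)
        print(f"{len(items)} items scraped")
        # Field names are whatever scrape.py emits; nothing is assumed here.
        for name, n in field_counts(items).most_common():
            print(f"{name}: present in {n} items")
```

Run it from the repository directory after scraping. Listing the field names first makes it easier to decide which field holds the text you want for your corpus.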

License

MIT.
