Bash scripts that automate extracting Wikipedia data dumps, parsing out the articles, having them read by Friend Computer via Pico2Wave/LAME, and moving those recordings into the Internet Archive's library.

tomhiggins/WikipediaRadio

WikipediaRadio

Bash script that automates extracting Wikipedia data dumps, parsing out the articles, having them read by Friend Computer via Pico2Wave/LAME, and moving those recordings into the Internet Archive's library.

You can listen to the results at https://archive.org/details/WikipediaRadio

While this was made for a particular project, I have kept the code as generic as possible so that anyone can use it to automate bulk rendering of text into mp3. The upload to the Internet Archive can easily be disabled if it is not needed.

Required

-Lame

-Pico2Wave

-WikiExtractor - https://github.com/attardi/wikiextractor

-InternetArchive Library Tools - https://internetarchive.readthedocs.io/en/latest/installation.html
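
Before running anything it can help to confirm the tools above are actually on the PATH. The following is a minimal sketch, not part of the WikipediaRadio script itself: `check_deps` is a hypothetical helper name, and the command names in the usage comment (`lame`, `pico2wave`, `ia`) are assumed from the packages listed above. WikiExtractor is left out because how it is invoked depends on how it was installed.

```bash
#!/usr/bin/env bash
# check_deps (hypothetical helper): print every command from the argument
# list that cannot be found on PATH, one per line.
# Returns 1 if anything is missing, 0 otherwise.
check_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (command names are assumptions based on the Required list):
#   check_deps lame pico2wave ia || echo "install the missing tools first" >&2
```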

What is WikipediaRadio?

A growing collection of Wikipedia Articles read by Friend Computer.

Use for streaming, podcasts, radio, reading of Wikipedia for blind or sight-impaired listeners, mixing, mashing and other unlisted activities. Continued exposure may cause increased knowing.

-All articles read by Friend Computer are extracted from the enwiki-20170320-pages-articles data dump.

-Data was extracted using WikiExtractor https://github.com/attardi/wikiextractor

-Friend Computer uses Pico2Wave to read the extracted articles.

-mp3 recordings are moved into the greater Internet Archive library using the InternetArchive Library tools https://internetarchive.readthedocs.io/en/latest/installation.html

-All of the above is managed by a bash script created for the purpose of aiding Friend Computer's mission of knowledge distribution https://github.com/tomhiggins/WikipediaRadio
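
The read-and-upload steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the repository: the function names, the `UPLOAD` switch, and the `en-US` voice are all hypothetical choices, and each article is assumed to already be a plain-text `.txt` file. Long articles may be more than pico2wave will accept as a single argument and would need splitting first, which this sketch does not attempt.

```bash
#!/usr/bin/env bash
# Derive the mp3 path for a text file: articles/Foo.txt + out -> out/Foo.mp3
mp3_path_for() {
  local txt="$1" outdir="$2" base
  base=$(basename "$txt" .txt)
  printf '%s/%s.mp3\n' "$outdir" "$base"
}

# Render one plain-text article to mp3 via an intermediate wav file.
# pico2wave writes the wav, lame encodes it, and the wav is then removed.
render_article() {
  local txt="$1" outdir="$2" mp3 wav
  mp3=$(mp3_path_for "$txt" "$outdir")
  wav="${mp3%.mp3}.wav"
  pico2wave --lang=en-US --wave="$wav" "$(cat "$txt")" &&
    lame --quiet "$wav" "$mp3" &&
    rm -f "$wav"
}

# Upload is off unless UPLOAD=1 (hypothetical switch), so the rendering
# can be used on its own without touching the Internet Archive.
UPLOAD=${UPLOAD:-0}
upload_article() {
  local mp3="$1" item="$2"
  [ "$UPLOAD" = "1" ] || return 0
  ia upload "$item" "$mp3" --metadata="mediatype:audio"
}

# Usage (paths and item identifier are placeholders):
#   render_article articles/Foo.txt out && upload_article out/Foo.mp3 my-item-id
```

Keeping the upload behind a single switch is one simple way to get the "easily disabled" behaviour without editing the rendering code.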

To Do

-Add code to fetch and install the required packages (sudo apt-get install lame libttspico-utils)

-Add code for downloading the Wikipedia data (torrent or wget) and extracting it with WikiExtractor
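
One possible shape for these To Do items is sketched below. It is a guess at how they might be done, not existing code: `dump_url` and `fetch_dump` are hypothetical names, the URL follows the usual dumps.wikimedia.org naming pattern, and the `pip` package names in the comments are assumptions. Older dumps such as 20170320 may no longer be hosted on the main server and might have to come from a mirror or torrent instead.

```bash
#!/usr/bin/env bash
# Build the download URL for an English Wikipedia pages-articles dump.
# $1 is the dump date as YYYYMMDD, e.g. 20170320.
dump_url() {
  local date="$1"
  printf 'https://dumps.wikimedia.org/enwiki/%s/enwiki-%s-pages-articles.xml.bz2\n' \
    "$date" "$date"
}

# Download the dump into a directory (default: current directory),
# resuming a partial download if one is already there.
fetch_dump() {
  local date="$1" dest="${2:-.}"
  wget --continue --directory-prefix="$dest" "$(dump_url "$date")"
}

# Nothing is installed or downloaded by sourcing this file. The manual
# steps would look something like this (pip package names are assumptions):
#   sudo apt-get install lame libttspico-utils
#   pip install internetarchive wikiextractor
#   fetch_dump 20170320 dumps
```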
