To get the latest version of the books, see the latest release.
They are also in the `goodies` directory. Each book under `geeksforgeeks-books` is generated from the articles under a tag/category on geeksforgeeks.org. The book under `leetcode-book` is generated from the articles on leetcode.com.
Here's what the books look like in the iBooks app and the Kindle app on my iPad. They haven't been tested on a Kindle device.
Book covers are word clouds generated from the book content using word_cloud.
If you want to generate the books yourself, here is an incomplete guide.
- Install Scrapy. It is used to download webpages from `geeksforgeeks` and `leetcode`: it follows the next-page link and downloads each page. Install it with `pip install scrapy`. I created two separate Scrapy projects, called `geeksforgeeks` and `leetcode`, to download webpages from the sites.
- lxml and Boilerpipy (or BeautifulSoup). After downloading the HTML files, you need to extract the articles from them. I'm using `Boilerpipy` because it can handle webpages with different layouts. But if you are only interested in the `geeksforgeeks` site, you can just use `lxml` to extract the articles; it will probably be faster too. `Boilerpipy` also sometimes removes the title of an article, so I had to do some post-processing with `lxml` afterwards to add the title back.
- Pandoc. It's used to convert HTML or Markdown files to EPUB, PDF and DOCX files. The LaTeX engine used by Pandoc can't handle GIF images, so only a few PDF books have been generated so far.
- kindlegen. It is needed to generate `mobi` files for reading on a Kindle or in the Kindle app.
- WordCloud. The book covers are generated with `wordcloud`, with a bit of meta in mind.
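The title post-processing mentioned for Boilerpipy above might look roughly like the sketch below. It is only an illustration, not the project's actual code: it uses Python's standard-library `html.parser` as a stand-in for `lxml`, the helper names `page_title` and `restore_title` are hypothetical, and it assumes that the page's `<title>` element holds the article title and that an extracted article already containing an `<h1>` needs no fix.

```python
import re
from html import escape
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(raw_html: str) -> str:
    """Return the page title with runs of whitespace collapsed."""
    parser = _TitleParser()
    parser.feed(raw_html)
    return " ".join(parser.title.split())


def restore_title(raw_html: str, article_html: str) -> str:
    """Prepend the page title as <h1> if the extracted article has no <h1>.

    raw_html is the downloaded page; article_html is what the extractor
    returned for it (assumption: a plain HTML fragment).
    """
    title = page_title(raw_html)
    if not title or re.search(r"<h1\b", article_html, re.IGNORECASE):
        return article_html
    return f"<h1>{escape(title)}</h1>\n{article_html}"
```

Any site name appended to the page title is left as it is here; trimming it would be site-specific.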
- Crawling with Scrapy. Go to the `geeksforgeeks` subdirectory and run a command like `scrapy crawl geeksforgeeks -a category=category -a name=name`. For example, running `scrapy crawl geeksforgeeks -a category=tag -a name=pattern-searching` will crawl from the page `http://www.geeksforgeeks.org/tag/pattern-searching/`. `category` and `name` are the two arguments the spider takes. On geeksforgeeks, things can be organized by `tag` or `category`. Specify the category/tag and the name, and Scrapy will do the rest for you.
- Generate a book. Now go into the `geeksforgeeks-books` subdirectory, where you should find a directory called `pattern-searching`. Run `python generate_book.py pattern-searching 1.0`. It will clean the HTML files, concatenate the cleaned files into one HTML file, then use `pandoc` to create EPUB and PDF files from it. Finally, a MOBI file is created using `kindlegen`.
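The two steps above can be sketched as a few small helpers. This is a rough outline under stated assumptions, not the contents of `generate_book.py`: the function names and output file names are hypothetical, the second command-line argument is assumed to be a version string, and the `category` form of the start URL is assumed to mirror the `tag` form shown in the example. The HTML cleaning step is left out.

```python
from pathlib import Path


def start_url(category: str, name: str) -> str:
    """Build the listing URL a crawl would start from.

    The tag form matches the example above; the category form is an
    assumption that it follows the same pattern.
    """
    if category not in ("tag", "category"):
        raise ValueError("category must be 'tag' or 'category'")
    return f"http://www.geeksforgeeks.org/{category}/{name}/"


def concatenate(html_files, combined: Path) -> Path:
    """Join already-cleaned HTML files, in sorted order, into one file."""
    parts = [Path(p).read_text(encoding="utf-8") for p in sorted(html_files)]
    combined.write_text("\n".join(parts), encoding="utf-8")
    return combined


def conversion_commands(combined: Path, book: str, version: str):
    """Return the external commands to run, in order.

    Output names such as '<book>-<version>.epub' are hypothetical.
    """
    stem = f"{book}-{version}"
    return [
        ["pandoc", str(combined), "-o", f"{stem}.epub"],
        ["pandoc", str(combined), "-o", f"{stem}.pdf"],
        # kindlegen writes the .mobi file next to the epub it is given
        ["kindlegen", f"{stem}.epub"],
    ]
```

Each command list can then be passed to `subprocess.run`; building the commands separately from running them makes the pipeline easy to inspect without Pandoc or kindlegen installed.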
Style the books better. The books are essentially styled via `css`, so styling `<pre>` and `<code>`, for instance, will style the code in the `epub` books.
Convert `gif` images to `png` and use them so `pandoc` can handle them.
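One possible way to approach the GIF conversion is sketched below. It is not part of the project: it assumes the Pillow library for the image conversion (imported lazily so the HTML helper works without it), keeps only the first frame of each GIF, and uses hypothetical helper names `point_images_at_png` and `convert_gifs`.

```python
import re
from pathlib import Path

# Matches the src attribute of an <img> tag when it ends in .gif.
_GIF_SRC = re.compile(
    r"""(<img\b[^>]*?\bsrc\s*=\s*)(["'])([^"']+?)\.gif\2""",
    re.IGNORECASE,
)


def point_images_at_png(html: str) -> str:
    """Rewrite <img src="....gif"> references to the matching .png name."""
    return _GIF_SRC.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{m.group(3)}.png{m.group(2)}",
        html,
    )


def convert_gifs(image_dir: Path) -> list:
    """Save the first frame of every .gif in image_dir as a .png beside it."""
    from PIL import Image  # Pillow, a third-party dependency (assumption)

    written = []
    for gif in sorted(image_dir.glob("*.gif")):
        png = gif.with_suffix(".png")
        with Image.open(gif) as im:
            im.convert("RGBA").save(png)
        written.append(png)
    return written
```

Running `convert_gifs` on the image directory and `point_images_at_png` on the combined HTML before calling `pandoc` would leave no GIF references for the LaTeX engine to trip over. Animated GIFs lose their animation this way.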
Every tag or category on `geeksforgeeks.org` can be turned into a book, so you are welcome to add or suggest more books.
The styles for generating `epub` books are under the `styles` subdirectory. `epub` books are styled via `css`. You are welcome to submit your stylesheets.
The content in the books doesn't belong to me. I created the books so that other people and I can read them offline on an iPad or Kindle, (hopefully) with a better reading experience.
The content on geeksforgeeks.org is licensed under Creative Commons Attribution-NonCommercial-NoDerivs 2.5 India. See the license here.
The content on `leetcode` belongs to the site.
The code in this project is licensed under Apache License, Version 2.0. See the license here.
If you are interested in reading some random posts from geeksforgeeks.org to have something to think about when feeling bored, head to gfgreader.info.
Jing Zhou, gnijuohz at gmail.com.