This is the work-in-progress version of the upcoming O'Reilly book, Big Data for Chimps: A Seriously Fun guide to Hadoop and Terabyte-scale data processing.
Our intent is to provide the best guide for exploratory data analytics using Hadoop -- for data science in practice. We use high-level languages (Pig and Ruby) that make Hadoop a tool, not a framework, allowing re-use and rapid development. We'll cover enough Hadoop internals to save you from diving into the source code, and enough tuning advice to let you know where to drill deep.
In all cases, the focus is on maximizing your time and creativity -- on helping you uncover what question to ask and the right way to ask it.
O'Reilly has courageously agreed to release the book under a http://creativecommons.org/licenses/by-nc-sa/3.0/[CC-BY-NC-SA] license. To buy a physical copy of the book, or a Kindle (.mobi) or iOS/Nook (.epub) edition, visit the early release http://shop.oreilly.com[O'Reilly bookstore] (TODO: link to early release page). Buy it now, and you'll get frequently updated access and the final version once it's available.
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Code is Apache licensed unless specifically labeled otherwise.