This repository is based on big-data-europe/docker-hadoop.
The default Hadoop version is 3.2.1. To select another version, specify it in docker-compose.yml.
To see all supported versions, review the branches in big-data-europe/docker-hadoop.
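Switching versions usually means updating the image tags in docker-compose.yml. The sketch below assumes the bde2020 tag format (e.g. 2.0.0-hadoop3.2.1-java8); the target version string is illustrative, so match it to an existing branch:

```bash
# Hypothetical: rewrite every Hadoop image tag in docker-compose.yml
# (tag format and target version are assumptions; check the available branches)
sed -i 's/hadoop3.2.1/hadoop3.1.3/g' docker-compose.yml
```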
- Latest Docker engine with docker-compose
- If you want to run Hadoop with multiple Datanodes, all Datanode clusters must be installed on Linux (see docker-compose-datanode-clusters.yml for the reason)
To deploy a basic HDFS cluster, run:
./start.sh
To stop and remove all HDFS containers and the network, run:
./stop.sh
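These scripts are presumably thin wrappers around docker-compose; a minimal sketch of what start.sh might contain (an assumption, check the actual script in this repository):

```bash
#!/bin/bash
# Minimal sketch (assumption): start the HDFS services defined in
# docker-compose.yml in detached mode
docker-compose up -d
```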
Access all dashboards:
- Namenode: http://localhost:9870
- History server: http://localhost:8188
- Datanode: http://localhost:9864
- Nodemanager: http://localhost:8042
- Resource manager: http://localhost:8088
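Once the containers are up, you can sanity-check the dashboards from a shell before opening a browser (ports taken from the list above):

```bash
# Report the HTTP status code of each dashboard port
for port in 9870 8188 9864 8042 8088; do
  echo -n "localhost:${port} -> "
  curl -s -o /dev/null -w "%{http_code}\n" "http://localhost:${port}"
done
```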
On the cluster machine, edit the datanode-cluster.env file by replacing 10.0.0.4 with the IP address of the host machine that runs the Namenode container.
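For example, assuming the Namenode host's IP is 192.168.1.20 (a placeholder, use your own), the replacement can be done in place:

```bash
# Point the Datanode cluster at the Namenode host (example IP, replace with yours)
sed -i 's/10.0.0.4/192.168.1.20/g' datanode-cluster.env
```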
Then deploy the cluster:
./start-datanode-cluster.sh
Stop and remove the Datanode cluster:
./stop-datanode-cluster.sh
After the cluster has deployed successfully, go to Namenode Dashboard > Datanodes. Make sure the new Datanode has been added, is bound to its host's IP address, and is balanced with the correct number of HDFS blocks.
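You can also verify this from the command line; hdfs dfsadmin -report lists every live Datanode with its address and block count:

```bash
# Run inside the namenode container: list live Datanodes and their block counts
docker exec -it namenode hdfs dfsadmin -report
```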
To test Hadoop, attach to the Namenode container:
docker exec -it namenode bash
Create a simple text file as the input:
echo "This is a simple test for Hadoop" > test.txt
Then create the corresponding input folder on HDFS:
hadoop fs -mkdir -p input
And copy our test file to HDFS:
hdfs dfs -put test.txt input/test.txt
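To confirm the file landed on HDFS, list the input directory:

```bash
# Verify that test.txt is now in the HDFS input directory
hdfs dfs -ls input
```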
After preparing the input file, we will download the WordCount program for Hadoop 3.2.1, which ships in the hadoop-mapreduce-examples executable jar file (if you use another Hadoop version, change the version in the path):
curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-examples/3.2.1/hadoop-mapreduce-examples-3.2.1.jar --output map_reduce.jar
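If your cluster runs a different Hadoop version, a small variable keeps the path consistent (a sketch; the version here must match your cluster's version):

```bash
# Parameterize the Hadoop version so the examples jar matches the cluster
HADOOP_VERSION=3.2.1
curl "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-examples/${HADOOP_VERSION}/hadoop-mapreduce-examples-${HADOOP_VERSION}.jar" --output map_reduce.jar
```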
Submit our WordCount job to Hadoop (the wordcount program can have a different name in each hadoop-mapreduce-examples version):
hadoop jar map_reduce.jar wordcount input output
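If you are unsure of the program name in your jar version, running the jar without arguments prints the list of valid example programs:

```bash
# Prints the valid example program names bundled in this jar (wordcount, grep, pi, ...)
hadoop jar map_reduce.jar
```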
If everything runs fine, we can see the output by requesting data from HDFS:
hdfs dfs -cat output/part-r-00000
Result:
Hadoop 1
This 1
a 1
for 1
is 1
simple 1
test 1
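Note that MapReduce refuses to start if the output directory already exists, so remove it before re-running the job:

```bash
# Remove the previous output directory so the job can be submitted again
hdfs dfs -rm -r output
```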
See more details in big-data-europe/docker-hadoop.
You can use my Docker Commands Toolkit to clean up your host machine.
- The Namenode binds namenode as its host address, so downloading a file via the Namenode File System Browser auto-redirects to http://namenode:9870/webhdfs/v1/..., which causes errors.
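A common workaround (my suggestion, not part of this repository) is to map the namenode hostname to the Docker host in your local hosts file so the redirect resolves:

```bash
# Workaround (assumption): make the container hostname resolve on the host
# so the http://namenode:9870/webhdfs/v1/... redirect works in your browser
echo "127.0.0.1 namenode" | sudo tee -a /etc/hosts
```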