Alluxio-SQL-Engines-Comparison

Step 0: Install the latest Ubuntu LTS on the working machine.

Update the system and install an editor:

sudo apt update; sudo apt -y upgrade; sudo apt install -y vim

Step 1: Install Spark on Ubuntu 20.04.2.0 LTS

Before downloading and setting up Spark, we need to install the necessary dependencies:

- JDK
- Scala
- Git

Install them from the terminal with:

sudo apt install default-jdk scala git -y

Verify installation with the following commands:

java -version; javac -version; scala -version; git --version

Spark Installation: Download Spark 3.1.1 prebuilt for Hadoop 2.7:

wget https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz

Extract the files:

tar xvf spark-*

Create a directory for Spark:

sudo mv spark-3.1.1-bin-hadoop2.7 /opt/spark

(the command prints nothing on success)
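A quick way to confirm the files landed where expected:

ls /opt/spark

This should list the usual distribution layout (bin, sbin, conf, jars, examples, ...).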

Configuring the Spark environment: add the Spark paths to ~/.profile. Note the single quotes, which keep $PATH and $SPARK_HOME from being expanded at write time, before SPARK_HOME exists:

echo 'export SPARK_HOME=/opt/spark' >> ~/.profile
echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> ~/.profile
echo 'export PYSPARK_PYTHON=/usr/bin/python3' >> ~/.profile

Alternatively, use vim and ~/.bashrc.

Open with:

vim ~/.bashrc

Insert the following:

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Activate the changes:

source ~/.bashrc
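Either route can be verified from the current shell:

echo $SPARK_HOME

This should print /opt/spark.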

Running Spark: start the master node:

start-master.sh

Check the Spark web UI at http://127.0.0.1:8080/ ; it shows the master URL. Create one worker node, pointing it at that URL:

start-worker.sh spark://<master-hostname>:7077

Mine is spark://sergei-VivoBook-ASUSLaptop-X421IA-M433IA:7077
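Both daemons run as plain JVM processes, so jps (bundled with the JDK) is a quick way to confirm they are up; a Master and a Worker entry should appear:

jps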

To run the Spark shell:

/opt/spark/bin/spark-shell
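To check that jobs actually run on the standalone cluster rather than in local mode, one of the bundled examples can be submitted to the master (the master URL below is a placeholder; substitute the one shown in the web UI):

/opt/spark/bin/spark-submit \
  --master spark://<master-hostname>:7077 \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark/examples/jars/spark-examples_*.jar 100

The run should then show up under "Completed Applications" in the master UI.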

To shut the nodes down:

stop-master.sh
stop-worker.sh

Setting up an HDFS database: getting SSH and PDSH. SSH is needed for Hadoop, and PDSH gives better ssh resource management.

   sudo apt-get install ssh pdsh
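Hadoop's control scripts log in to localhost over ssh, so passwordless ssh should be set up first; these commands mirror the standard Hadoop single-node guide:

   ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
   cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
   chmod 0600 ~/.ssh/authorized_keys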

Download Hadoop 2.7 from the Apache website: https://hadoop.apache.org/release/2.7.0.html. Unpack:

tar xvf hadoop-*

Create a directory hadoop in /opt and move the unpacked archive there, the same as with Spark. Edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest

To find JAVA_HOME:

dirname $(dirname $(readlink -f $(which javac)))

then set JAVA_HOME in etc/hadoop/hadoop-env.sh to the directory that command prints:

    export JAVA_HOME=<directory found above>
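Alternatively, the lookup can be inlined so hadoop-env.sh always resolves whatever JDK is installed (a small sketch; it assumes javac is on the PATH):

    export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which javac))))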

Go to the Hadoop folder /opt/hadoop/bin (where the archive was moved) and execute

./hadoop

The usage documentation will be shown.
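This section's goal is a running HDFS instance, so the remaining steps are sketched here following the standard Hadoop 2.7 pseudo-distributed guide from the Apache docs; the values are the upstream defaults, not anything specific to this repository. Put the following into /opt/hadoop/etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

and this into /opt/hadoop/etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Then format the filesystem and start the HDFS daemons:

/opt/hadoop/bin/hdfs namenode -format
/opt/hadoop/sbin/start-dfs.sh

The namenode web UI should be reachable at http://localhost:50070/ (the Hadoop 2.x default port).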

Installing curl also helps:

sudo apt install curl -y

Installing and using the Star Schema Benchmark (this assumes the ssb-dbgen sources have already been cloned into a ssb-dbgen directory):

   cd ssb-dbgen
   make

Use the following to generate the tbl files, one command per SSB table:

   ./dbgen -s 1000 -T l   # lineorder
   ./dbgen -s 1000 -T p   # part
   ./dbgen -s 1000 -T s   # supplier
   ./dbgen -s 1000 -T d   # date
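SSB also defines a customer table; if it is needed, ssb-dbgen generates it the same way (the c table flag, per the generator's usage output):

   ./dbgen -s 1000 -T c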

The scale factor (-s) can be anywhere from 0.01 to 1000.
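With HDFS from the previous section running, the generated .tbl files can be staged where the SQL engines under comparison can reach them (the /ssb path is an arbitrary choice):

   /opt/hadoop/bin/hdfs dfs -mkdir -p /ssb
   /opt/hadoop/bin/hdfs dfs -put ./*.tbl /ssb/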
