Spark SQL


Animesh Trivedi edited this page Jul 26, 2018 · 30 revisions

About

This page contains atr's notes on investigating Spark SQL performance.

Compiling Spark

set -e  # abort on the first failing command
# Export so the mvn child process actually sees these JVM options.
export MAVEN_OPTS="-XX:+TieredCompilation -XX:TieredStopAtLevel=1 -Xmx8g -XX:ReservedCodeCacheSize=2g"
branch=$(git branch | sed -n 's/^\* //p')  # name of the currently checked-out branch
echo "Building branch: $branch"
time mvn -T 2C -Phive -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests package
# To build a specific subsystem only:
# ./build/mvn -pl :spark-streaming_2.11 clean install

Themes

Upgrading-Albis-to-VectorAPI

Spark-context-cleaner

How-spark-reads-parquet

Parquet-io-performance

Iterator-overheads

Sorting-on-strings

Notes-on-the-Code

Notes

Adding-null-sink

Parquet-partition-calculation

New-spark-config

Spark-config-details

Locality-scheduling-notes

Sort-Merge-Join

Parquet-generator

Codegen-examples

UnsafeCrailSerializer

Code-commentary

Spark-config-parameters
