
Spark SQL

Animesh Trivedi edited this page Jul 26, 2018 · 30 revisions

About

This page contains atr's notes about Spark SQL performance investigation.

Compiling Spark

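set -e
# Export so mvn sees these JVM options even if MAVEN_OPTS is not already in the environment
export MAVEN_OPTS="-XX:+TieredCompilation -XX:TieredStopAtLevel=1 -Xmx8g -XX:ReservedCodeCacheSize=2g"
# Name of the currently checked-out branch (the line git marks with '*')
branch=$(git branch | sed -n 's/^\* //p')
echo "Building branch: $branch"
time mvn -T 2C -Phive -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests package
# To build only a specific subsystem instead:
#./build/mvn -pl :spark-streaming_2.11 clean install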

Themes

Spark-JIRAs

Upgrading-Albis-to-VectorAPI

Spark-session

Spark-context-cleaner

How-spark-reads-parquet

Parquet-io-performance

Iterator-overheads

Sorting-on-strings

Notes-on-the-Code

Notes

Howto-DataSets

Adding-null-sink

Parquet-partition-calculation

New-spark-config

Spark-config-details

Locality-scheduling-notes

Notes

Sort-Merge-Join

Call-stacks

Parquet-generator

Codegen-examples

Ideas

Logs

Netty

UnsafeCrailSerializer

Links

Performance

Code-commentary

Spark-config-parameters
