forked from alteryx/spark
Commit
Merge remote-tracking branch 'upstream/master' into mllib_pmml_model_export_SPARK-1406

Conflicts:
	mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala
Showing 1,425 changed files with 77,024 additions and 24,538 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
*.o | ||
*.so | ||
*.Rd | ||
lib | ||
pkg/man | ||
pkg/html |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# SparkR Documentation | ||
|
||
SparkR documentation is generated from in-source comments annotated using | ||
`roxygen2`. After making changes to the documentation, you can generate man pages | ||
by running the following from an R console in the SparkR home directory: | ||
|
||
library(devtools) | ||
devtools::document(pkg="./pkg", roclets=c("rd")) | ||
|
||
You can verify that your changes are good by running: | ||
|
||
R CMD check pkg/ |
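
Both commands in this README assume that R is available on `PATH`. As a minimal sketch (the helper name `require_cmds` is hypothetical and not part of SparkR), a pre-flight check could report missing tools before any documentation step runs:

```shell
# Hypothetical helper: report every required command that is missing from PATH.
# Returns 0 only if all of the named commands are found.
require_cmds() {
  _missing=0
  for _cmd in "$@"; do
    if ! command -v "$_cmd" >/dev/null 2>&1; then
      echo "missing required command: $_cmd" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Usage (assumes it is run from the SparkR home directory):
#   require_cmds R Rscript && R CMD check pkg/
```

Checking every name before returning, rather than stopping at the first failure, lets a contributor install all missing tools in one pass.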
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
# R on Spark | ||
|
||
SparkR is an R package that provides a light-weight frontend to use Spark from R. | ||
|
||
### SparkR development | ||
|
||
#### Build Spark | ||
|
||
Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example, to use the default Hadoop versions you can run | ||
``` | ||
build/mvn -DskipTests -Psparkr package | ||
``` | ||
|
||
#### Running sparkR | ||
|
||
You can start using SparkR by launching the SparkR shell with | ||
|
||
./bin/sparkR | ||
|
||
The `sparkR` script automatically creates a SparkContext, which by default runs Spark in | ||
local mode. To specify the Spark master for the automatically created | ||
SparkContext, you can run | ||
|
||
./bin/sparkR --master "local[2]" | ||
|
||
To set other options, such as driver memory or executor memory, you can pass [spark-submit](http://spark.apache.org/docs/latest/submitting-applications.html) arguments to `./bin/sparkR`. | ||
|
||
#### Using SparkR from RStudio | ||
|
||
If you wish to use SparkR from RStudio or other R frontends, you will need to set some environment variables that point SparkR to your Spark installation. For example: | ||
``` | ||
# Set this to where Spark is installed | ||
Sys.setenv(SPARK_HOME="/Users/shivaram/spark") | ||
# This line loads SparkR from the installed directory | ||
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths())) | ||
library(SparkR) | ||
sc <- sparkR.init(master="local") | ||
``` | ||
|
||
#### Making changes to SparkR | ||
|
||
The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR. | ||
If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes. | ||
Once you have made your changes, please include unit tests for them and run existing unit tests using the `run-tests.sh` script as described below. | ||
|
||
#### Generating documentation | ||
|
||
The SparkR documentation (Rd files and HTML files) is not a part of the source repository. To generate it you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, so both packages need to be installed on the machine before using the script. | ||
|
||
### Examples, Unit tests | ||
|
||
SparkR comes with several sample programs in the `examples/src/main/r` directory. | ||
To run one of them, use `./bin/sparkR <filename> <args>`. For example: | ||
|
||
./bin/sparkR examples/src/main/r/pi.R local[2] | ||
|
||
You can also run the unit tests for SparkR (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first): | ||
|
||
R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")' | ||
./R/run-tests.sh | ||
|
||
### Running on YARN | ||
Both `./bin/spark-submit` and `./bin/sparkR` can also be used to submit jobs to YARN clusters. You will need to set `YARN_CONF_DIR` before doing so. For example, on CDH you can run | ||
``` | ||
export YARN_CONF_DIR=/etc/hadoop/conf | ||
./bin/spark-submit --master yarn examples/src/main/r/pi.R 4 | ||
``` |
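
Since the YARN section depends on `YARN_CONF_DIR` being set, a small guard can catch a missing or mistyped directory before a job is submitted. The sketch below is only an illustration; the function name `check_yarn_conf_dir` is an assumption and does not exist in Spark:

```shell
# Hypothetical pre-flight check: succeed only if YARN_CONF_DIR is set
# and names an existing directory.
check_yarn_conf_dir() {
  if [ -z "${YARN_CONF_DIR:-}" ]; then
    echo "YARN_CONF_DIR is not set" >&2
    return 1
  fi
  if [ ! -d "$YARN_CONF_DIR" ]; then
    echo "YARN_CONF_DIR is not a directory: $YARN_CONF_DIR" >&2
    return 1
  fi
  return 0
}

# Usage, following the README example:
#   export YARN_CONF_DIR=/etc/hadoop/conf
#   check_yarn_conf_dir && ./bin/spark-submit --master yarn examples/src/main/r/pi.R 4
```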
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
## Building SparkR on Windows | ||
|
||
To build SparkR on Windows, the following steps are required: | ||
|
||
1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to | ||
include Rtools and R in `PATH`. | ||
2. Install | ||
[JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set | ||
`JAVA_HOME` in the system environment variables. | ||
3. Download and install [Maven](http://maven.apache.org/download.html). Also add Maven's `bin` | ||
directory to `PATH`. | ||
4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html). | ||
5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
#!/bin/bash | ||
|
||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one or more | ||
# contributor license agreements. See the NOTICE file distributed with | ||
# this work for additional information regarding copyright ownership. | ||
# The ASF licenses this file to You under the Apache License, Version 2.0 | ||
# (the "License"); you may not use this file except in compliance with | ||
# the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# | ||
|
||
# Script to create API docs for SparkR | ||
# This requires `devtools` and `knitr` to be installed on the machine. | ||
|
||
# After running this script the html docs can be found in | ||
# $SPARK_HOME/R/pkg/html | ||
|
||
# Figure out where the script is | ||
export FWDIR="$(cd "`dirname "$0"`"; pwd)" | ||
pushd "$FWDIR" | ||
|
||
# Generate Rd file | ||
Rscript -e 'library(devtools); devtools::document(pkg="./pkg", roclets=c("rd"))' | ||
|
||
# Install the package | ||
./install-dev.sh | ||
|
||
# Now create HTML files | ||
|
||
# knit_rd puts html in current working directory | ||
mkdir -p pkg/html | ||
pushd pkg/html | ||
|
||
Rscript -e 'library(SparkR, lib.loc="../../lib"); library(knitr); knit_rd("SparkR")' | ||
|
||
popd | ||
|
||
popd |
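
The script above changes into `pkg/html` because `knit_rd` writes its HTML into the current working directory, and it relies on matching `pushd`/`popd` pairs to get back. As a sketch of an alternative under the same assumption, a subshell keeps the directory change local to one command (the helper name `run_in_dir` is hypothetical):

```shell
# Hypothetical helper: run a command inside a directory without changing the
# caller's working directory. The subshell makes the cd temporary, so no
# popd is needed, even if the command fails.
run_in_dir() {
  ( cd "$1" && shift && "$@" )
}

# Usage sketch: print the directory the command actually ran in.
#   mkdir -p pkg/html
#   run_in_dir pkg/html pwd
```

If the target directory does not exist, `cd` fails and the command is never run, which avoids writing output into the wrong place.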
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
@echo off | ||
|
||
rem | ||
rem Licensed to the Apache Software Foundation (ASF) under one or more | ||
rem contributor license agreements. See the NOTICE file distributed with | ||
rem this work for additional information regarding copyright ownership. | ||
rem The ASF licenses this file to You under the Apache License, Version 2.0 | ||
rem (the "License"); you may not use this file except in compliance with | ||
rem the License. You may obtain a copy of the License at | ||
rem | ||
rem http://www.apache.org/licenses/LICENSE-2.0 | ||
rem | ||
rem Unless required by applicable law or agreed to in writing, software | ||
rem distributed under the License is distributed on an "AS IS" BASIS, | ||
rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
rem See the License for the specific language governing permissions and | ||
rem limitations under the License. | ||
rem | ||
|
||
rem Install development version of SparkR | ||
rem | ||
|
||
set SPARK_HOME=%~dp0.. | ||
|
||
MKDIR %SPARK_HOME%\R\lib | ||
|
||
R.exe CMD INSTALL --library="%SPARK_HOME%\R\lib" %SPARK_HOME%\R\pkg\ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
#!/bin/bash | ||
|
||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one or more | ||
# contributor license agreements. See the NOTICE file distributed with | ||
# this work for additional information regarding copyright ownership. | ||
# The ASF licenses this file to You under the Apache License, Version 2.0 | ||
# (the "License"); you may not use this file except in compliance with | ||
# the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# | ||
|
||
# This script packages the SparkR source files (R and C files) and | ||
# creates a package that can be loaded in R. By default the package is installed to | ||
# $FWDIR/lib and can be loaded by using the following command in R: | ||
# | ||
# library(SparkR, lib.loc="$FWDIR/lib") | ||
# | ||
# NOTE(shivaram): Right now we use $SPARK_HOME/R/lib to be the installation directory | ||
# to load the SparkR package on the worker nodes. | ||
|
||
|
||
FWDIR="$(cd "$(dirname "$0")"; pwd)" | ||
LIB_DIR="$FWDIR/lib" | ||
|
||
mkdir -p "$LIB_DIR" | ||
|
||
# Install SparkR into $LIB_DIR | ||
R CMD INSTALL --library="$LIB_DIR" "$FWDIR/pkg/" |
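
The install script derives every path from the directory that contains the script itself. A minimal sketch of that idiom with each expansion quoted, so that a checkout path containing spaces still resolves correctly (the helper name `script_dir` is hypothetical):

```shell
# Hypothetical helper: print the absolute directory that contains the given
# script path. Quoting each expansion keeps paths with spaces intact.
script_dir() {
  ( cd "$(dirname "$1")" && pwd )
}

# Usage inside a script:
#   FWDIR="$(script_dir "$0")"
#   LIB_DIR="$FWDIR/lib"
```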
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one or more | ||
# contributor license agreements. See the NOTICE file distributed with | ||
# this work for additional information regarding copyright ownership. | ||
# The ASF licenses this file to You under the Apache License, Version 2.0 | ||
# (the "License"); you may not use this file except in compliance with | ||
# the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# | ||
|
||
# Set everything to be logged to the file R-unit-tests.log | ||
log4j.rootCategory=INFO, file | ||
log4j.appender.file=org.apache.log4j.FileAppender | ||
log4j.appender.file.append=true | ||
log4j.appender.file.file=R-unit-tests.log | ||
log4j.appender.file.layout=org.apache.log4j.PatternLayout | ||
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n | ||
|
||
# Ignore messages below warning level from Jetty, because it's a bit verbose | ||
log4j.logger.org.eclipse.jetty=WARN | ||
org.eclipse.jetty.LEVEL=WARN |
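
With the conversion pattern above, each log entry starts with a date, a time, the thread name, the level and the short logger name followed by a colon. As a rough sketch for scanning the resulting log after a test run, the filter below keeps only entries at `WARN` or above. The function name `warn_or_worse` is hypothetical, and the regular expression assumes the level token is surrounded by single spaces as the pattern produces them:

```shell
# Hypothetical filter for logs written with the pattern
#   %d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n
# Prints only entries whose level is WARN, ERROR or FATAL.
# Continuation lines such as stack traces do not start with a timestamp
# and are therefore dropped.
warn_or_worse() {
  grep -E '^[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9:.]+ .* (WARN|ERROR|FATAL) [^ ]+: ' "$1"
}

# Usage:
#   warn_or_worse R-unit-tests.log
```

Thread names can contain spaces, which is why the sketch matches the level by its surrounding context instead of by field position.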
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
Package: SparkR | ||
Type: Package | ||
Title: R frontend for Spark | ||
Version: 1.4.0 | ||
Date: 2013-09-09 | ||
Author: The Apache Software Foundation | ||
Maintainer: Shivaram Venkataraman <shivaram@cs.berkeley.edu> | ||
Imports: | ||
methods | ||
Depends: | ||
R (>= 3.0), | ||
methods, | ||
Suggests: | ||
testthat | ||
Description: R frontend for Spark | ||
License: Apache License (== 2.0) | ||
Collate: | ||
'generics.R' | ||
'jobj.R' | ||
'RDD.R' | ||
'pairRDD.R' | ||
'schema.R' | ||
'column.R' | ||
'group.R' | ||
'DataFrame.R' | ||
'SQLContext.R' | ||
'backend.R' | ||
'broadcast.R' | ||
'client.R' | ||
'context.R' | ||
'deserialize.R' | ||
'serialize.R' | ||
'sparkR.R' | ||
'utils.R' | ||
'zzz.R' |
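
The `Collate` field fixes the order in which R loads the package's source files, so `generics.R` is evaluated before the files that define methods on those generics. As a sketch for inspecting that order from the command line, the helper below prints one file name per line. The name `collate_files` is hypothetical, and the code assumes the on-disk DESCRIPTION file indents the continuation lines of the field, as the DCF format requires, with `Collate:` on a line of its own:

```shell
# Hypothetical helper: list the files named in the Collate field of an R
# DESCRIPTION file, in load order, one per line.
# Assumes "Collate:" stands alone on its line and that the file names follow
# on indented continuation lines, each wrapped in single quotes.
collate_files() {
  awk '
    /^Collate:/            { in_collate = 1; next }
    in_collate && /^[ \t]/ { print; next }
                           { in_collate = 0 }
  ' "$1" | tr -d " \t'"
}

# Usage:
#   collate_files R/pkg/DESCRIPTION
```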