merge master #10

Merged
merged 150 commits into from
Dec 9, 2020
95b6dab
[SPARK-33287][SS][UI] Expose state custom metrics information on SS UI
gaborgsomogyi Nov 24, 2020
665817b
[SPARK-33457][PYTHON] Adjust mypy configuration
zero323 Nov 25, 2020
01321bc
[SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in…
zero323 Nov 25, 2020
d1b4f06
[SPARK-33494][SQL][AQE] Do not use local shuffle reader for repartition
cloud-fan Nov 25, 2020
b7f034d
[SPARK-33543][SQL] Migrate SHOW COLUMNS command to use UnresolvedTabl…
imback82 Nov 25, 2020
edab094
[SPARK-33224][SS][WEBUI] Add watermark gap information into SS UI page
HeartSaVioR Nov 25, 2020
c3ce970
[SPARK-33533][SQL] Fix the regression bug that ConnectionProviders do…
sarutak Nov 25, 2020
781e19c
[SPARK-33477][SQL] Hive Metastore support filter by date type
wangyum Nov 25, 2020
19f3b89
[SPARK-33549][SQL] Remove configuration spark.sql.legacy.allowCastNum…
gengliangwang Nov 25, 2020
2c5cc36
[SPARK-33509][SQL] List partition by names from a V2 table which supp…
MaxGekk Nov 25, 2020
7c59aee
[SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic parti…
WinkerDu Nov 25, 2020
6f68ccf
[SPARK-31257][SPARK-33561][SQL] Unify create table syntax
rdblue Nov 25, 2020
d691d85
[SPARK-33496][SQL] Improve error message of ANSI explicit cast
gengliangwang Nov 25, 2020
9643eab
[SPARK-33540][SQL] Subexpression elimination for interpreted predicate
viirya Nov 25, 2020
7cf6a6f
[SPARK-31257][SPARK-33561][SQL][FOLLOWUP] Fix Scala 2.13 compilation
dongjoon-hyun Nov 25, 2020
1de3fc4
[SPARK-33525][SQL] Update hive-service-rpc to 3.1.2
wangyum Nov 25, 2020
c529426
[SPARK-33565][BUILD][PYTHON] remove python3.8 and fix breakage
shaneknapp Nov 25, 2020
fb7b870
[SPARK-33523][SQL][TEST][FOLLOWUP] Fix benchmark case name in SubExpr…
viirya Nov 25, 2020
919ea45
[SPARK-33562][UI] Improve the style of the checkbox in executor page
gengliangwang Nov 26, 2020
ed9e6fc
[SPARK-33565][INFRA][FOLLOW-UP] Keep the test coverage with Python 3.…
HyukjinKwon Nov 26, 2020
dfa3978
[SPARK-33551][SQL] Do not use custom shuffle reader for repartition
maryannxue Nov 26, 2020
d082ad0
[SPARK-33563][PYTHON][R][SQL] Expose inverse hyperbolic trig function…
zero323 Nov 27, 2020
433ae90
[SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling opti…
LuciferYang Nov 27, 2020
8792280
[SPARK-33575][SQL] Fix misleading exception for "ANALYZE TABLE ... FO…
imback82 Nov 27, 2020
2c41d9d
[SPARK-33522][SQL] Improve exception messages while handling Unresolv…
imback82 Nov 27, 2020
e432550
[SPARK-28645][SQL] ParseException is thrown when the window is redefined
beliefer Nov 27, 2020
b9f2f78
[SPARK-33498][SQL] Datetime parsing should fail if the input string c…
leanken-zz Nov 27, 2020
35ded12
[SPARK-33141][SQL] Capture SQL configs when creating permanent views
luluorta Nov 27, 2020
13fd272
Spelling r common dev mlib external project streaming resource manage…
jsoref Nov 27, 2020
cf98a76
[SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin aut…
sarutak Nov 28, 2020
3650a6b
[SPARK-33580][CORE] resolveDependencyPaths should use classifier attr…
viirya Nov 28, 2020
bfe9380
[MINOR][SQL] Remove `getTables()` from `r.SQLUtils`
MaxGekk Nov 29, 2020
ba178f8
[SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite
wangyum Nov 29, 2020
b94ff1e
[SPARK-33590][DOCS][SQL] Add missing sub-bullets in Spark SQL Guide
kiszk Nov 29, 2020
c8286ec
[SPARK-33587][CORE] Kill the executor on nested fatal errors
zsxwing Nov 29, 2020
0054fc9
[SPARK-33588][SQL] Respect the `spark.sql.caseSensitive` config while…
MaxGekk Nov 29, 2020
a088a80
[SPARK-33585][SQL][DOCS] Fix the comment for `SQLContext.tables()` an…
MaxGekk Nov 29, 2020
3d54774
[SPARK-33517][SQL][DOCS] Fix the correct menu items and page links in…
liucht-inspur Nov 30, 2020
f93d439
[SPARK-33589][SQL] Close opened session if the initialization fails
wangyum Nov 30, 2020
a5e13ac
[SPARK-33582][SQL] Hive Metastore support filter by not-equals
wangyum Nov 30, 2020
feda729
[SPARK-33567][SQL] DSv2: Use callback instead of passing Spark sessio…
sunchao Nov 30, 2020
4851453
[MINOR] Spelling bin core docs external mllib repl
jsoref Nov 30, 2020
2da7259
[SPARK-32976][SQL] Support column list in INSERT statement
yaooqinn Nov 30, 2020
0fd9f57
[SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
imback82 Nov 30, 2020
225c2e2
[SPARK-33498][SQL][FOLLOW-UP] Deduplicate the unittest by using check…
leanken-zz Nov 30, 2020
b665d58
[SPARK-28646][SQL] Fix bug of Count so as consistent with mainstream …
beliefer Nov 30, 2020
5cfbddd
[SPARK-33480][SQL] Support char/varchar type
cloud-fan Nov 30, 2020
6e5446e
[SPARK-33579][UI] Fix executor blank page behind proxy
Nov 30, 2020
0a612b6
[SPARK-33452][SQL] Support v2 SHOW PARTITIONS
MaxGekk Nov 30, 2020
6fd148f
[SPARK-33569][SQL] Remove getting partitions by an identifier prefix
MaxGekk Nov 30, 2020
030b313
[SPARK-33569][SPARK-33452][SQL][FOLLOWUP] Fix a build error in `ShowP…
MaxGekk Nov 30, 2020
f3c2583
[SPARK-33185][YARN][FOLLOW-ON] Leverage RM's RPC API instead of REST …
xkrogen Nov 30, 2020
c699435
[SPARK-33545][CORE] Support Fallback Storage during Worker decommission
dongjoon-hyun Nov 30, 2020
f5d2165
[SPARK-33440][CORE] Use current timestamp with warning log in HadoopF…
HeartSaVioR Nov 30, 2020
596fbc1
[SPARK-33556][ML] Add array_to_vector function for dataframe column
WeichenXu123 Dec 1, 2020
aeb3649
[SPARK-33613][PYTHON][TESTS] Replace deprecated APIs in pyspark tests
BryanCutler Dec 1, 2020
8016123
[SPARK-33592] Fix: Pyspark ML Validator params in estimatorParamMaps …
WeichenXu123 Dec 1, 2020
c50fcac
[SPARK-33607][SS][WEBUI] Input Rate timeline/histogram aren't rendere…
sarutak Dec 1, 2020
2af2da5
[SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata lo…
HeartSaVioR Dec 1, 2020
1a042cc
[SPARK-33530][CORE] Support --archives and spark.archives option nati…
HyukjinKwon Dec 1, 2020
52e5cc4
[SPARK-27188][SS] FileStreamSink: provide a new option to have retent…
HeartSaVioR Dec 1, 2020
1034815
[SPARK-33572][SQL] Datetime building should fail if the year, month, …
waitinfuture Dec 1, 2020
e5bb293
[SPARK-32032][SS] Avoid infinite wait in driver because of KafkaConsu…
gaborgsomogyi Dec 1, 2020
d38883c
[SPARK-32405][SQL][FOLLOWUP] Throw Exception if provider is specified…
huaxingao Dec 1, 2020
9273d42
[SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and f…
beliefer Dec 1, 2020
cf4ad21
[SPARK-33503][SQL] Refactor SortOrder class to allow multiple childrens
prakharjain09 Dec 1, 2020
478fb7f
[SPARK-33608][SQL] Handle DELETE/UPDATE/MERGE in PullupCorrelatedPred…
aokolnychyi Dec 1, 2020
c24f2b2
[SPARK-33612][SQL] Add dataSourceRewriteRules batch to Optimizer
aokolnychyi Dec 1, 2020
5d0045e
[SPARK-33611][UI] Avoid encoding twice on the query parameter of rewr…
gengliangwang Dec 1, 2020
5a1c5ac
[SPARK-33622][R][ML] Add array_to_vector to SparkR
zero323 Dec 1, 2020
f71f345
[SPARK-33544][SQL] Optimize size of CreateArray/CreateMap to be the s…
tgravescs Dec 2, 2020
51ebcd9
[SPARK-32863][SS] Full outer stream-stream join
c21 Dec 2, 2020
a4788ee
[MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite
c21 Dec 2, 2020
290aa02
[SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to…
dongjoon-hyun Dec 2, 2020
084d38b
[SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STOR…
LuciferYang Dec 2, 2020
28dad1b
[SPARK-33504][CORE] The application log in the Spark history server c…
echohlne Dec 2, 2020
df8d3f1
[SPARK-33544][SQL][FOLLOW-UP] Rename NoSideEffect to NoThrow and clar…
HyukjinKwon Dec 2, 2020
58583f7
[SPARK-33619][SQL] Fix GetMapValueUtil code generation error
leanken-zz Dec 2, 2020
91182d6
[SPARK-33626][K8S][TEST] Allow k8s integration tests to assert both d…
ScrapCodes Dec 2, 2020
a082f46
[SPARK-33071][SPARK-33536][SQL] Avoid changing dataset_id of LogicalP…
Ngone51 Dec 2, 2020
b76c6b7
[SPARK-33627][SQL] Add new function UNIX_SECONDS, UNIX_MILLIS and UNI…
gengliangwang Dec 2, 2020
92bfbcb
[SPARK-33631][DOCS][TEST] Clean up spark.core.connection.ack.wait.tim…
LuciferYang Dec 2, 2020
f94cb53
[MINOR][INFRA] Use the latest image for GitHub Action jobs
dongjoon-hyun Dec 3, 2020
4f96670
[SPARK-31953][SS] Add Spark Structured Streaming History Server Support
uncleGen Dec 3, 2020
90d4d7d
[SPARK-33610][ML] Imputer transform skip duplicate head() job
zhengruifeng Dec 3, 2020
878cc0e
[SPARK-32896][SS][FOLLOW-UP] Rename the API to `toTable`
xuanyuanking Dec 3, 2020
0880989
[SPARK-22798][PYTHON][ML][FOLLOWUP] Add labelsArray to PySpark String…
viirya Dec 3, 2020
3b2ff16
[SPARK-33636][PYTHON][ML][FOLLOWUP] Update since tag of labelsArray i…
viirya Dec 3, 2020
ff13f57
[SPARK-20044][SQL] Add new function DATE_FROM_UNIX_DATE and UNIX_DATE
gengliangwang Dec 3, 2020
512fb32
[SPARK-26218][SQL][FOLLOW UP] Fix the corner case of codegen when cas…
luluorta Dec 3, 2020
0706e64
[SPARK-30098][SQL] Add a configuration to use default datasource as p…
cloud-fan Dec 3, 2020
bd71186
[SPARK-33629][PYTHON] Make spark.buffer.size configuration visible on…
gaborgsomogyi Dec 3, 2020
aa13e20
[SPARK-33623][SQL] Add canDeleteWhere to SupportsDelete
aokolnychyi Dec 3, 2020
63f9d47
[SPARK-33634][SQL][TESTS] Use Analyzer in PlanResolutionSuite
cloud-fan Dec 3, 2020
7e759b2
[SPARK-33520][ML][PYSPARK] make CrossValidator/TrainValidateSplit/One…
WeichenXu123 Dec 4, 2020
8594958
[SPARK-33650][SQL] Fix the error from ALTER TABLE .. ADD/DROP PARTITI…
MaxGekk Dec 4, 2020
29e415d
[SPARK-33649][SQL][DOC] Improve the doc of spark.sql.ansi.enabled
gengliangwang Dec 4, 2020
e22ddb6
[SPARK-32405][SQL][FOLLOWUP] Remove USING _ in CREATE TABLE in JDBCTa…
huaxingao Dec 4, 2020
e02324f
[SPARK-33142][SPARK-33647][SQL] Store SQL text for SQL temp view
linhongliu-db Dec 4, 2020
15579ba
[SPARK-33430][SQL] Support namespaces in JDBC v2 Table Catalog
huaxingao Dec 4, 2020
e838066
[SPARK-33658][SQL] Suggest using Datetime conversion functions for in…
gengliangwang Dec 4, 2020
94c144b
[SPARK-33571][SQL][DOCS] Add a ref to INT96 config from the doc for `…
MaxGekk Dec 4, 2020
325abf7
[SPARK-33577][SS] Add support for V1Table in stream writer table API …
xuanyuanking Dec 4, 2020
91baab7
[SPARK-33656][TESTS] Add option to keep container after tests finish …
sarutak Dec 4, 2020
976e897
[SPARK-33640][TESTS] Extend connection timeout to DB server for DB2In…
sarutak Dec 4, 2020
233a849
[SPARK-27237][SS] Introduce State schema validation among query restart
HeartSaVioR Dec 4, 2020
990bee9
[SPARK-33615][K8S] Make 'spark.archives' working in Kubernates
HyukjinKwon Dec 4, 2020
acc211d
[SPARK-33141][SQL][FOLLOW-UP] Store the max nested view depth in Anal…
cloud-fan Dec 4, 2020
d671e05
[SPARK-33660][DOCS][SS] Fix Kafka Headers Documentation
Gschiavon Dec 4, 2020
de9818f
[SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT
dongjoon-hyun Dec 4, 2020
b6b45bc
[SPARK-33141][SQL][FOLLOW-UP] Fix Scala 2.13 compilation
dongjoon-hyun Dec 4, 2020
960d6af
[SPARK-33472][SQL][FOLLOW-UP] Update RemoveRedundantSorts comment
allisonwang-db Dec 4, 2020
1b4e35d
[SPARK-33651][SQL] Allow CREATE EXTERNAL TABLE with LOCATION for data…
cloud-fan Dec 5, 2020
154f604
[MINOR] Fix string interpolation in CommandUtils.scala and KafkaDataC…
imback82 Dec 6, 2020
6317ba2
[SPARK-33668][K8S][TEST] Fix flaky test "Verify logging configuration…
ScrapCodes Dec 6, 2020
e857e06
[SPARK-33652][SQL] DSv2: DeleteFrom should refresh cache
sunchao Dec 6, 2020
5250841
[SPARK-33256][PYTHON][DOCS] Clarify PySpark follows NumPy documentati…
HyukjinKwon Dec 6, 2020
4829781
[SPARK-33667][SQL] Respect the `spark.sql.caseSensitive` config while…
MaxGekk Dec 6, 2020
b94ecf0
[SPARK-33674][TEST] Show Slowpoke notifications in SBT tests
gatorsmile Dec 6, 2020
119539f
[SPARK-33663][SQL] Uncaching should not be called on non-existing tem…
imback82 Dec 7, 2020
e32de29
[SPARK-33675][INFRA] Add GitHub Action job to publish snapshot
dongjoon-hyun Dec 7, 2020
29096a8
[SPARK-33670][SQL] Verify the partition provider is Hive in v1 SHOW T…
MaxGekk Dec 7, 2020
e88f0d4
[SPARK-33683][INFRA] Remove -Djava.version=11 from Scala 2.13 build i…
sarutak Dec 7, 2020
73412ff
[SPARK-33680][SQL][TESTS] Fix PrunePartitionSuiteBase/BucketedReadWit…
dongjoon-hyun Dec 7, 2020
d48ef34
[SPARK-33684][BUILD] Upgrade httpclient from 4.5.6 to 4.5.13
sarutak Dec 7, 2020
87c0560
[SPARK-33671][SQL] Remove VIEW checks from V1 table commands
MaxGekk Dec 7, 2020
26c0493
[SPARK-33676][SQL] Require exact matching of partition spec to the sc…
MaxGekk Dec 7, 2020
1e0c006
[SPARK-33617][SQL] Add default parallelism configuration for Spark SQ…
wangyum Dec 7, 2020
d730b6b
[SPARK-32680][SQL] Don't Preprocess V2 CTAS with Unresolved Query
linhongliu-db Dec 7, 2020
da72b87
[SPARK-33641][SQL] Invalidate new char/varchar types in public APIs t…
yaooqinn Dec 7, 2020
c62b84a
[MINOR] Spelling sql not core
jsoref Dec 7, 2020
6aff215
[SPARK-33693][SQL] deprecate spark.sql.hive.convertCTAS
cloud-fan Dec 7, 2020
c0874ba
[SPARK-33480][SQL][FOLLOWUP] do not expose user data in error message
cloud-fan Dec 7, 2020
02508b6
[SPARK-33621][SQL] Add a way to inject data source rewrite rules
aokolnychyi Dec 7, 2020
e4d1c10
[SPARK-32320][PYSPARK] Remove mutable default arguments
Fokko Dec 8, 2020
b2a7930
[SPARK-33680][SQL][TESTS][FOLLOWUP] Fix more test suites to have expl…
dongjoon-hyun Dec 8, 2020
ebd8b93
[SPARK-33609][ML] word2vec reduce broadcast size
zhengruifeng Dec 8, 2020
8bcebfa
[SPARK-33698][BUILD][TESTS] Fix the build error of OracleIntegrationS…
sarutak Dec 8, 2020
5aefc49
[SPARK-33664][SQL] Migrate ALTER TABLE ... RENAME TO to use Unresolve…
imback82 Dec 8, 2020
3a6546d
[MINOR][INFRA] Add -Pdocker-integration-tests to GitHub Action Scala …
dongjoon-hyun Dec 8, 2020
031c5ef
[SPARK-33679][SQL] Enable spark.sql.adaptive.enabled by default
dongjoon-hyun Dec 8, 2020
99613cd
[SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains a…
luluorta Dec 8, 2020
2b30dde
[SPARK-33688][SQL] Migrate SHOW TABLE EXTENDED to new resolution fram…
MaxGekk Dec 8, 2020
c05ee06
[SPARK-33685][SQL] Migrate DROP VIEW command to use UnresolvedView to…
imback82 Dec 8, 2020
a093d6f
[MINOR] Spelling sql/core
jsoref Dec 8, 2020
c001dd4
[SPARK-33675][INFRA][FOLLOWUP] Schedule branch-3.1 snapshot at master…
dongjoon-hyun Dec 8, 2020
6fd2345
[SPARK-32110][SQL] normalize special floating numbers in HyperLogLog++
cloud-fan Dec 8, 2020
3ac70f1
[SPARK-33695][BUILD] Upgrade to jackson to 2.10.5 and jackson-databin…
n-marion Dec 8, 2020
f021f6d
[MINOR][ML] Increase Bounded MLOR (without regularization) test error…
WeichenXu123 Dec 9, 2020
29fed23
[SPARK-33703][SQL] Migrate MSCK REPAIR TABLE to use UnresolvedTable t…
imback82 Dec 9, 2020
4 changes: 2 additions & 2 deletions .github/workflows/build_and_test.yml
@@ -153,7 +153,7 @@ jobs:
name: "Build modules: ${{ matrix.modules }}"
runs-on: ubuntu-20.04
container:
image: dongjoon/apache-spark-github-action-image:20201015
image: dongjoon/apache-spark-github-action-image:20201025
strategy:
fail-fast: false
matrix:
@@ -414,7 +414,7 @@ jobs:
- name: Build with SBT
run: |
./dev/change-scala-version.sh 2.13
./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Djava.version=11 -Pscala-2.13 compile test:compile
./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests -Pscala-2.13 compile test:compile

hadoop-2:
name: Hadoop 2 build with SBT
39 changes: 39 additions & 0 deletions .github/workflows/publish_snapshot.yml
@@ -0,0 +1,39 @@
name: Publish Snapshot

on:
schedule:
- cron: '0 0 * * *'

jobs:
publish-snapshot:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
branch:
- master
- branch-3.1
steps:
- name: Checkout Spark repository
uses: actions/checkout@master
with:
ref: ${{ matrix.branch }}
- name: Cache Maven local repository
uses: actions/cache@v2
with:
path: ~/.m2/repository
key: snapshot-maven-${{ hashFiles('**/pom.xml') }}
restore-keys: |
snapshot-maven-
- name: Install Java 8
uses: actions/setup-java@v1
with:
java-version: 8
- name: Publish snapshot
env:
ASF_USERNAME: ${{ secrets.NEXUS_USER }}
ASF_PASSWORD: ${{ secrets.NEXUS_PW }}
GPG_KEY: "not_used"
GPG_PASSPHRASE: "not_used"
GIT_REF: ${{ matrix.branch }}
run: ./dev/create-release/release-build.sh publish-snapshot
2 changes: 1 addition & 1 deletion R/CRAN_RELEASE.md
@@ -25,7 +25,7 @@ To release SparkR as a package to CRAN, we would use the `devtools` package. Ple

First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.

Note that while `run-tests.sh` runs `check-cran.sh` (which runs `R CMD check`), it is doing so with `--no-manual --no-vignettes`, which skips a few vignettes or PDF checks - therefore it will be preferred to run `R CMD check` on the source package built manually before uploading a release. Also note that for CRAN checks for pdf vignettes to success, `qpdf` tool must be there (to install it, eg. `yum -q -y install qpdf`).
Note that while `run-tests.sh` runs `check-cran.sh` (which runs `R CMD check`), it is doing so with `--no-manual --no-vignettes`, which skips a few vignettes or PDF checks - therefore it will be preferred to run `R CMD check` on the source package built manually before uploading a release. Also note that for CRAN checks for pdf vignettes to success, `qpdf` tool must be there (to install it, e.g. `yum -q -y install qpdf`).

To upload a release, we would need to update the `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script along with comments on status of all `WARNING` (should not be any) or `NOTE`. As a part of `check-cran.sh` and the release process, the vignettes is build - make sure `SPARK_HOME` is set and Spark jars are accessible.

2 changes: 1 addition & 1 deletion R/install-dev.bat
@@ -26,7 +26,7 @@ MKDIR %SPARK_HOME%\R\lib

rem When you pass the package path directly as an argument to R CMD INSTALL,
rem it takes the path as 'C:\projects\spark\R\..\R\pkg"' as an example at
rem R 4.0. To work around this, directly go to the directoy and install it.
rem R 4.0. To work around this, directly go to the directory and install it.
rem See also SPARK-32074
pushd %SPARK_HOME%\R\pkg\
R.exe CMD INSTALL --library="%SPARK_HOME%\R\lib" .
2 changes: 1 addition & 1 deletion R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
Package: SparkR
Type: Package
Version: 3.1.0
Version: 3.2.0
Title: R Front End for 'Apache Spark'
Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
4 changes: 4 additions & 0 deletions R/pkg/NAMESPACE
@@ -202,6 +202,7 @@ exportMethods("%<=>%",
"%in%",
"abs",
"acos",
"acosh",
"add_months",
"alias",
"approx_count_distinct",
@@ -222,6 +223,7 @@ exportMethods("%<=>%",
"array_remove",
"array_repeat",
"array_sort",
"array_to_vector",
"array_transform",
"arrays_overlap",
"array_union",
@@ -232,8 +234,10 @@ exportMethods("%<=>%",
"asc_nulls_last",
"ascii",
"asin",
"asinh",
"assert_true",
"atan",
"atanh",
"atan2",
"avg",
"base64",
6 changes: 3 additions & 3 deletions R/pkg/R/DataFrame.R
@@ -2772,7 +2772,7 @@ setMethod("merge",
#' Creates a list of columns by replacing the intersected ones with aliases
#'
#' Creates a list of columns by replacing the intersected ones with aliases.
#' The name of the alias column is formed by concatanating the original column name and a suffix.
#' The name of the alias column is formed by concatenating the original column name and a suffix.
#'
#' @param x a SparkDataFrame
#' @param intersectedColNames a list of intersected column names of the SparkDataFrame
@@ -3231,7 +3231,7 @@ setMethod("describe",
#' \item stddev
#' \item min
#' \item max
#' \item arbitrary approximate percentiles specified as a percentage (eg, "75\%")
#' \item arbitrary approximate percentiles specified as a percentage (e.g., "75\%")
#' }
#' If no statistics are given, this function computes count, mean, stddev, min,
#' approximate quartiles (percentiles at 25\%, 50\%, and 75\%), and max.
@@ -3743,7 +3743,7 @@ setMethod("histogram",
#'
#' @param x a SparkDataFrame.
#' @param url JDBC database url of the form \code{jdbc:subprotocol:subname}.
#' @param tableName yhe name of the table in the external database.
#' @param tableName the name of the table in the external database.
#' @param mode one of 'append', 'overwrite', 'error', 'errorifexists', 'ignore'
#' save mode (it is 'error' by default)
#' @param ... additional JDBC database connection properties.
4 changes: 2 additions & 2 deletions R/pkg/R/RDD.R
@@ -970,7 +970,7 @@ setMethod("takeSample", signature(x = "RDD", withReplacement = "logical",
MAXINT)))))
# If the first sample didn't turn out large enough, keep trying to
# take samples; this shouldn't happen often because we use a big
# multiplier for thei initial size
# multiplier for the initial size
while (length(samples) < total)
samples <- collectRDD(sampleRDD(x, withReplacement, fraction,
as.integer(ceiling(stats::runif(1,
@@ -1512,7 +1512,7 @@ setMethod("glom",
#'
#' @param x An RDD.
#' @param y An RDD.
#' @return a new RDD created by performing the simple union (witout removing
#' @return a new RDD created by performing the simple union (without removing
#' duplicates) of two input RDDs.
#' @examples
#'\dontrun{
2 changes: 1 addition & 1 deletion R/pkg/R/SQLContext.R
@@ -203,7 +203,7 @@ getSchema <- function(schema, firstRow = NULL, rdd = NULL) {
})
}

# SPAKR-SQL does not support '.' in column name, so replace it with '_'
# SPARK-SQL does not support '.' in column name, so replace it with '_'
# TODO(davies): remove this once SPARK-2775 is fixed
names <- lapply(names, function(n) {
nn <- gsub(".", "_", n, fixed = TRUE)
4 changes: 2 additions & 2 deletions R/pkg/R/WindowSpec.R
@@ -54,7 +54,7 @@ setMethod("show", "WindowSpec",
#' Defines the partitioning columns in a WindowSpec.
#'
#' @param x a WindowSpec.
#' @param col a column to partition on (desribed by the name or Column).
#' @param col a column to partition on (described by the name or Column).
#' @param ... additional column(s) to partition on.
#' @return A WindowSpec.
#' @rdname partitionBy
@@ -231,7 +231,7 @@ setMethod("rangeBetween",
#' @rdname over
#' @name over
#' @aliases over,Column,WindowSpec-method
#' @family colum_func
#' @family column_func
#' @examples
#' \dontrun{
#' df <- createDataFrame(mtcars)
16 changes: 8 additions & 8 deletions R/pkg/R/column.R
@@ -135,7 +135,7 @@ createMethods()
#' @rdname alias
#' @name alias
#' @aliases alias,Column-method
#' @family colum_func
#' @family column_func
#' @examples
#' \dontrun{
#' df <- createDataFrame(iris)
@@ -161,7 +161,7 @@ setMethod("alias",
#'
#' @rdname substr
#' @name substr
#' @family colum_func
#' @family column_func
#' @aliases substr,Column-method
#'
#' @param x a Column.
@@ -187,7 +187,7 @@ setMethod("substr", signature(x = "Column"),
#'
#' @rdname startsWith
#' @name startsWith
#' @family colum_func
#' @family column_func
#' @aliases startsWith,Column-method
#'
#' @param x vector of character string whose "starts" are considered
@@ -206,7 +206,7 @@ setMethod("startsWith", signature(x = "Column"),
#'
#' @rdname endsWith
#' @name endsWith
#' @family colum_func
#' @family column_func
#' @aliases endsWith,Column-method
#'
#' @param x vector of character string whose "ends" are considered
@@ -224,7 +224,7 @@ setMethod("endsWith", signature(x = "Column"),
#'
#' @rdname between
#' @name between
#' @family colum_func
#' @family column_func
#' @aliases between,Column-method
#'
#' @param x a Column
@@ -251,7 +251,7 @@ setMethod("between", signature(x = "Column"),
# nolint end
#' @rdname cast
#' @name cast
#' @family colum_func
#' @family column_func
#' @aliases cast,Column-method
#'
#' @examples
@@ -300,7 +300,7 @@ setMethod("%in%",
#' Can be a single value or a Column.
#' @rdname otherwise
#' @name otherwise
#' @family colum_func
#' @family column_func
#' @aliases otherwise,Column-method
#' @note otherwise since 1.5.0
setMethod("otherwise",
@@ -440,7 +440,7 @@ setMethod("withField",
#' )
#'
#' # However, if you are going to add/replace multiple nested fields,
#' # it is preffered to extract out the nested struct before
#' # it is preferred to extract out the nested struct before
#' # adding/replacing multiple fields e.g.
#' head(
#' withColumn(
4 changes: 2 additions & 2 deletions R/pkg/R/context.R
@@ -86,7 +86,7 @@ makeSplits <- function(numSerializedSlices, length) {
# For instance, for numSerializedSlices of 22, length of 50
# [1] 0 0 2 2 4 4 6 6 6 9 9 11 11 13 13 15 15 15 18 18 20 20 22 22 22
# [26] 25 25 27 27 29 29 31 31 31 34 34 36 36 38 38 40 40 40 43 43 45 45 47 47 47
# Notice the slice group with 3 slices (ie. 6, 15, 22) are roughly evenly spaced.
# Notice the slice group with 3 slices (i.e. 6, 15, 22) are roughly evenly spaced.
# We are trying to reimplement the calculation in the positions method in ParallelCollectionRDD
if (numSerializedSlices > 0) {
unlist(lapply(0: (numSerializedSlices - 1), function(x) {
@@ -116,7 +116,7 @@ makeSplits <- function(numSerializedSlices, length) {
#' This change affects both createDataFrame and spark.lapply.
#' In the specific one case that it is used to convert R native object into SparkDataFrame, it has
#' always been kept at the default of 1. In the case the object is large, we are explicitly setting
#' the parallism to numSlices (which is still 1).
#' the parallelism to numSlices (which is still 1).
#'
#' Specifically, we are changing to split positions to match the calculation in positions() of
#' ParallelCollectionRDD in Spark.
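
The comment in `makeSplits` above lists slice start positions for `numSerializedSlices` of 22 and `length` of 50. As a rough illustration only, the Python sketch below reproduces that arithmetic under the assumption that slice `i` starts at `floor(i * length / numSlices)` and ends at `floor((i + 1) * length / numSlices)`. The function names are hypothetical and this is not the SparkR implementation.

```python
def slice_starts(num_slices, length):
    """Start index of each slice, assuming start = floor(i * length / num_slices)."""
    return [(i * length) // num_slices for i in range(num_slices)]


def slice_sizes(num_slices, length):
    """Number of elements in each slice under the same assumed formula."""
    starts = slice_starts(num_slices, length)
    ends = [((i + 1) * length) // num_slices for i in range(num_slices)]
    return [end - start for start, end in zip(starts, ends)]


if __name__ == "__main__":
    starts = slice_starts(22, 50)
    sizes = slice_sizes(22, 50)
    # Slices holding 3 elements are the ones the comment calls out (6, 15, 22, ...).
    print([s for s, n in zip(starts, sizes) if n == 3])
```

Under this assumption the three-element slices are spread through the range rather than bunched at one end, which is the property the comment points out.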
2 changes: 1 addition & 1 deletion R/pkg/R/deserialize.R
@@ -250,7 +250,7 @@ readDeserializeWithKeysInArrow <- function(inputCon) {

keys <- readMultipleObjects(inputCon)

# Read keys to map with each groupped batch later.
# Read keys to map with each grouped batch later.
list(keys = keys, data = data)
}

69 changes: 66 additions & 3 deletions R/pkg/R/functions.R
@@ -144,7 +144,7 @@ NULL
#' @param y Column to compute on.
#' @param pos In \itemize{
#' \item \code{locate}: a start position of search.
#' \item \code{overlay}: a start postiton for replacement.
#' \item \code{overlay}: a start position for replacement.
#' }
#' @param len In \itemize{
#' \item \code{lpad} the maximum length of each output result.
@@ -357,7 +357,13 @@ NULL
#' @examples
#' \dontrun{
#' df <- read.df("data/mllib/sample_libsvm_data.txt", source = "libsvm")
#' head(select(df, vector_to_array(df$features)))
#' head(
#' withColumn(
#' withColumn(df, "array", vector_to_array(df$features)),
#' "vector",
#' array_to_vector(column("array"))
#' )
#' )
#' }
NULL

@@ -455,6 +461,19 @@ setMethod("acos",
column(jc)
})

#' @details
#' \code{acosh}: Computes inverse hyperbolic cosine of the input column.
#'
#' @rdname column_math_functions
#' @aliases acosh acosh,Column-method
#' @note acosh since 3.1.0
setMethod("acosh",
signature(x = "Column"),
function(x) {
jc <- callJStatic("org.apache.spark.sql.functions", "acosh", x@jc)
column(jc)
})

#' @details
#' \code{approx_count_distinct}: Returns the approximate number of distinct items in a group.
#'
@@ -522,6 +541,19 @@ setMethod("asin",
column(jc)
})

#' @details
#' \code{asinh}: Computes inverse hyperbolic sine of the input column.
#'
#' @rdname column_math_functions
#' @aliases asinh asinh,Column-method
#' @note asinh since 3.1.0
setMethod("asinh",
signature(x = "Column"),
function(x) {
jc <- callJStatic("org.apache.spark.sql.functions", "asinh", x@jc)
column(jc)
})

#' @details
#' \code{atan}: Returns the inverse tangent of the given value,
#' as if computed by \code{java.lang.Math.atan()}
@@ -536,6 +568,19 @@ setMethod("atan",
column(jc)
})

#' @details
#' \code{atanh}: Computes inverse hyperbolic tangent of the input column.
#'
#' @rdname column_math_functions
#' @aliases atanh atanh,Column-method
#' @note atanh since 3.1.0
setMethod("atanh",
signature(x = "Column"),
function(x) {
jc <- callJStatic("org.apache.spark.sql.functions", "atanh", x@jc)
column(jc)
})

#' avg
#'
#' Aggregate function: returns the average of the values in a group.
@@ -2879,7 +2924,7 @@ setMethod("shiftRight", signature(y = "Column", x = "numeric"),
})

#' @details
#' \code{shiftRightUnsigned}: (Unigned) shifts the given value numBits right. If the given value is
#' \code{shiftRightUnsigned}: (Unsigned) shifts the given value numBits right. If the given value is
#' a long value, it will return a long value else it will return an integer value.
#'
#' @rdname column_math_functions
@@ -4570,6 +4615,24 @@ setMethod("timestamp_seconds",
column(jc)
})

#' @details
#' \code{array_to_vector} Converts a column of array of numeric type into
#' a column of dense vectors in MLlib
#'
#' @rdname column_ml_functions
#' @aliases array_to_vector array_to_vector,Column-method
#' @note array_to_vector since 3.1.0
setMethod("array_to_vector",
signature(x = "Column"),
function(x) {
jc <- callJStatic(
"org.apache.spark.ml.functions",
"array_to_vector",
x@jc
)
column(jc)
})

#' @details
#' \code{vector_to_array} Converts a column of MLlib sparse/dense vectors into
#' a column of dense arrays.
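
This file's diff adds `acosh`, `asinh` and `atanh` wrappers that delegate to `org.apache.spark.sql.functions`. As a small aside, the Python sketch below states the standard logarithmic identities for the three inverse hyperbolic functions and compares them with the standard library. The helper names are hypothetical, and nothing here is meant to describe how Spark evaluates these functions internally.

```python
import math


def acosh_via_log(x):
    """Inverse hyperbolic cosine for x >= 1: ln(x + sqrt(x^2 - 1))."""
    return math.log(x + math.sqrt(x * x - 1.0))


def asinh_via_log(x):
    """Inverse hyperbolic sine for any real x: ln(x + sqrt(x^2 + 1))."""
    return math.log(x + math.sqrt(x * x + 1.0))


def atanh_via_log(x):
    """Inverse hyperbolic tangent for -1 < x < 1: 0.5 * ln((1 + x) / (1 - x))."""
    return 0.5 * math.log((1.0 + x) / (1.0 - x))


if __name__ == "__main__":
    # Compare each identity with the standard-library function at one sample point.
    print(math.isclose(acosh_via_log(2.0), math.acosh(2.0)))
    print(math.isclose(asinh_via_log(0.5), math.asinh(0.5)))
    print(math.isclose(atanh_via_log(0.5), math.atanh(0.5)))
```

The domains in the docstrings matter: `acosh` is only defined for inputs of at least 1, and `atanh` only strictly between -1 and 1.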