[SPARK-21042][SQL] Document Dataset.union is resolution by position
## What changes were proposed in this pull request?
Document that Dataset.union resolves columns by position, not by name, since this has been a point of confusion for many users.
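
To make the distinction concrete, here is a minimal plain-Python model of the two behaviours. It is not Spark code: the `(columns, rows)` pair and the helper names `union_by_position` and `union_by_name` are hypothetical, chosen only for illustration. The first helper mimics what the documented `union` does, the second shows what many users expect instead.

```python
# Plain-Python model of positional vs. by-name union (not Spark code).
# A "frame" here is a hypothetical (columns, rows) pair.

def union_by_position(left, right):
    """Mimic Dataset.union: keep the left column names, append rows as-is."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if len(left_cols) != len(right_cols):
        raise ValueError("union requires the same number of columns")
    # The right-hand column names are ignored entirely.
    return left_cols, left_rows + right_rows


def union_by_name(left, right):
    """What users often expect: reorder the right rows to match left names."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if sorted(left_cols) != sorted(right_cols):
        raise ValueError("column names differ")
    order = [right_cols.index(name) for name in left_cols]
    reordered = [tuple(row[i] for i in order) for row in right_rows]
    return left_cols, left_rows + reordered


people = (["name", "city"], [("Ann", "Oslo")])
swapped = (["city", "name"], [("Rome", "Bob")])

by_position = union_by_position(people, swapped)
by_name = union_by_name(people, swapped)
print(by_position)  # "Rome" ends up under the "name" column
print(by_name)      # "Bob" ends up under the "name" column
```

With positional resolution, the second frame's `city` value lands in the `name` column without any error, which is the surprise this doc change warns about.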

## How was this patch tested?
N/A - doc only change.

Author: Reynold Xin <rxin@databricks.com>

Closes #18256 from rxin/SPARK-21042.
rxin committed Jun 10, 2017
1 parent 5716354 commit b78e384
Showing 3 changed files with 18 additions and 10 deletions.
1 change: 1 addition & 0 deletions R/pkg/R/DataFrame.R
@@ -2646,6 +2646,7 @@ generateAliasesForIntersectedCols <- function (x, intersectedColNames, suffix) {
#' Input SparkDataFrames can have different schemas (names and data types).
#'
#' Note: This does not remove duplicate rows across the two SparkDataFrames.
#' Also as standard in SQL, this function resolves columns by position (not by name).
#'
#' @param x A SparkDataFrame
#' @param y A SparkDataFrame
13 changes: 9 additions & 4 deletions python/pyspark/sql/dataframe.py
@@ -1175,18 +1175,23 @@ def agg(self, *exprs):

@since(2.0)
def union(self, other):
""" Return a new :class:`DataFrame` containing union of rows in this
frame and another frame.
""" Return a new :class:`DataFrame` containing union of rows in this and another frame.
This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
(that does deduplication of elements), use this function followed by a distinct.
Also as standard in SQL, this function resolves columns by position (not by name).
"""
return DataFrame(self._jdf.union(other._jdf), self.sql_ctx)

@since(1.3)
def unionAll(self, other):
""" Return a new :class:`DataFrame` containing union of rows in this
frame and another frame.
""" Return a new :class:`DataFrame` containing union of rows in this and another frame.
This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
(that does deduplication of elements), use this function followed by a distinct.
Also as standard in SQL, this function resolves columns by position (not by name).
.. note:: Deprecated in 2.0, use union instead.
"""
14 changes: 8 additions & 6 deletions sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1734,10 +1734,11 @@ class Dataset[T] private[sql](

/**
* Returns a new Dataset containing union of rows in this Dataset and another Dataset.
* This is equivalent to `UNION ALL` in SQL.
*
* To do a SQL-style set union (that does deduplication of elements), use this function followed
* by a [[distinct]].
* This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does
* deduplication of elements), use this function followed by a [[distinct]].
*
* Also as standard in SQL, this function resolves columns by position (not by name).
*
* @group typedrel
* @since 2.0.0
@@ -1747,10 +1748,11 @@ class Dataset[T] private[sql](

/**
* Returns a new Dataset containing union of rows in this Dataset and another Dataset.
* This is equivalent to `UNION ALL` in SQL.
*
* To do a SQL-style set union (that does deduplication of elements), use this function followed
* by a [[distinct]].
* This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does
* deduplication of elements), use this function followed by a [[distinct]].
*
* Also as standard in SQL, this function resolves columns by position (not by name).
*
* @group typedrel
* @since 2.0.0
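
The docstrings above also state that `union` behaves like `UNION ALL` and that a SQL-style set union needs a following `distinct`. A minimal plain-Python model of that difference is sketched below. It is not Spark code: rows are modelled as tuples in a list, and the helpers `union_all` and `distinct` are hypothetical stand-ins. The model keeps first-seen order only as a convenience, which should not be read as a guarantee about Spark's row ordering.

```python
# Plain-Python model (not Spark code): rows are tuples in a list.

def union_all(left_rows, right_rows):
    """Mimic UNION ALL: concatenate rows and keep duplicates."""
    return left_rows + right_rows


def distinct(rows):
    """Drop duplicate rows, keeping the first occurrence of each."""
    return list(dict.fromkeys(rows))


first = [(1, "a"), (2, "b")]
second = [(2, "b"), (3, "c")]

all_rows = union_all(first, second)   # duplicate (2, "b") is kept
set_union = distinct(all_rows)        # duplicate (2, "b") is removed
print(all_rows)
print(set_union)
```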
