Commit

Closes #905. Added Note to ?":=" and added an exercise to reference semantics vignette.
arunsrinivasan committed Mar 9, 2016
1 parent 0931feb commit 9741af4
Showing 3 changed files with 19 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -212,6 +212,10 @@

20. `row.names` argument to `print.data.table` can now be changed by default via `options("datatable.print.rownames")` (`TRUE` by default, the inherited standard), [#1097](https://github.com/Rdatatable/data.table/issues/1097). Thanks to @smcinerney for the suggestion and @MichaelChirico for the PR.

21. Added a FAQ entry explaining why, after an update with `:=`, `DT` sometimes doesn't print the first time it is typed, [#939](https://github.com/Rdatatable/data.table/issues/939).

22. Added `Note` section and examples to `?":="` for [#905](https://github.com/Rdatatable/data.table/issues/905).

### Changes in v1.9.6 (on CRAN 19 Sep 2015)

#### NEW FEATURES
9 changes: 9 additions & 0 deletions man/assign.Rd
@@ -68,6 +68,11 @@ Unlike \code{<-} for \code{data.frame}, the (potentially large) LHS is not coerc

Since \code{[.data.table} incurs overhead to check the existence and type of arguments (for example), \code{set()} provides direct (but less flexible) assignment by reference with low overhead, appropriate for use inside a \code{for} loop. See examples. \code{:=} is more powerful and flexible than \code{set()} because \code{:=} is intended to be combined with \code{i} and \code{by} in single queries on large datasets.
}
\section{Note:}{
\code{X[a > 4, b := c]} is different from \code{X[a > 4][, b := c]}. The first expression updates (or adds) column \code{b} with the value \code{c} on those rows where \code{a > 4} evaluates to \code{TRUE}. \code{X} is updated \emph{by reference}, so no assignment is needed.

The second expression, on the other hand, updates a \emph{new} \code{data.table} returned by the subset operation, and \code{X} itself is left unchanged. Since the subsetted \code{data.table} is ephemeral (it is not assigned to a symbol), the result is lost unless it is assigned, for example: \code{ans <- X[a > 4][, b := c]}.
}
\value{
\code{DT} is modified by reference and returned invisibly. If you require a copy, take a \code{\link{copy}} first (using \code{DT2 = copy(DT)}).
}
@@ -83,6 +88,10 @@ DT # DT changed by reference
DT[2, d := 10L][] # shorthand for update and print
DT[b > 4, b := d * 2L] # subassign to b with d*2L on those rows where b > 4 is TRUE
DT[b > 4][, b := d * 2L] # different from above. [, := ] is performed on the subset
# which is a new (ephemeral) data.table. Result needs to be
# assigned to a variable (using `<-`).
DT[, e := mean(d), by = a] # add new column by group by reference
DT["A", b := 0L, on = "a"] # ad-hoc update of column b for group "A" using
# joins-as-subsets with binary search and 'on='
6 changes: 6 additions & 0 deletions vignettes/datatable-reference-semantics.Rmd
@@ -190,6 +190,12 @@ Let's look at all the `hours` to verify.
flights[, sort(unique(hour))]
```

#### Exercise: {.bs-callout .bs-callout-warning #update-by-reference-question}

What is the difference between `flights[hour == 24L, hour := 0L]` and `flights[hour == 24L][, hour := 0L]`? Hint: The latter needs an assignment (`<-`) if you want to use the result later.

If you can't figure it out, have a look at the `Note` section of `?":="`.

### c) Delete column by reference

#### -- Remove `delay` column

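A minimal sketch of the two forms contrasted in the new `Note` section of `?":="`, assuming the data.table package is installed. The table `X` and its values are hypothetical; literal integers stand in for the value `c` used in the Note.

```r
library(data.table)

# Hypothetical toy table with columns a and b, as in the ?":=" Note.
X <- data.table(a = 1:6, b = 0L)

# Form 1: subassign by reference. X itself changes on rows where a > 4.
X[a > 4, b := 1L]
print(X$b)  # [1] 0 0 0 0 1 1

# Form 2: X[a > 4] returns a new data.table; := updates that temporary
# table, which is then discarded. X keeps the values set by form 1.
X[a > 4][, b := 2L]
print(X$b)  # [1] 0 0 0 0 1 1

# Assigning the chained result keeps the updated subset.
ans <- X[a > 4][, b := 2L]
print(ans$b)  # [1] 2 2
```

Only the first form modifies `X`; the second is useful only when its result is assigned, which is what the hint in the vignette exercise points at.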