
Commit

Merge pull request #94 from hafen/master

Updates

hafen authored Oct 2, 2016
2 parents 5c979a8 + f572263 commit bb7beda
Showing 21 changed files with 54 additions and 51 deletions.
10 changes: 7 additions & 3 deletions .travis.yml
@@ -2,14 +2,18 @@ language: r
sudo: false
cache: packages

env:
global:
- _R_CHECK_FORCE_SUGGESTS_=FALSE
# env:
# global:
# - _R_CHECK_FORCE_SUGGESTS_=FALSE

r_github_packages:
- schloerke/RHIPE_dummy # fake the RHIPE requirement

branches:
only:
- master
- dev
- travis

notifications:
email:
14 changes: 7 additions & 7 deletions CONTRIBUTING.md
@@ -1,23 +1,23 @@
How to contribute to Tessera / datadr
How to contribute to DeltaRho / datadr
=====================================

Thank you for sharing your code with the Tessera project. We appreciate your contribution!
Thank you for sharing your code with the DeltaRho project. We appreciate your contribution!

## Join the developer mailing list

If you're not already on the Tessera developers list, take a minute to join. This is as easy as sending an email to tessera-dev+subscribe@googlegroups.com.
If you're not already on the DeltaRho developers list, take a minute to join. This is as easy as sending an email to tessera-dev+subscribe@googlegroups.com.
It would be great if you'd introduce yourself to the group but it's not required. You can just let your code do the talking for you if you like.

## Check the issue tracker

Before you write too much code, check the [open issues in the datadr issue tracker](https://github.com/tesseradata/datadr/issues?state=open)
to see if someone else has already filed an issue related to your work or is already working on it. If not, go ahead and
[open a new issue](https://github.com/tesseradata/datadr/issues/new).
Before you write too much code, check the [open issues in the datadr issue tracker](https://github.com/delta-rho/datadr/issues?state=open)
to see if someone else has already filed an issue related to your work or is already working on it. If not, go ahead and
[open a new issue](https://github.com/delta-rho/datadr/issues/new).

## Announce your work on the mailing list

Shoot us a quick email on the mailing list letting us know what you're working on. There
will likely be people on the list who can give you tips about where to find relevant
will likely be people on the list who can give you tips about where to find relevant
source or alert you to other planned changes that might affect your work.

If the work you're proposing makes substantive changes to datadr, you may be asked to attach a design document
6 changes: 3 additions & 3 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Package: datadr
Type: Package
Title: Divide and Recombine for Large, Complex Data
Version: 0.8.5
Date: 2016-03-15
Version: 0.8.6
Date: 2016-09-22
Authors@R: c(person("Ryan", "Hafen", email = "rhafen@gmail.com", role = c("aut", "cre")),
person("Landon", "Sego", role = "ctb"))
Maintainer: Ryan Hafen <rhafen@gmail.com>
@@ -12,7 +12,7 @@ Description: Methods for dividing data into subsets, applying analytical
on local disk, or on HDFS, in the latter case using the R and Hadoop
Integrated Programming Environment (RHIPE).
License: BSD_3_clause + file LICENSE
URL: http://tessera.io/docs-datadr
URL: http://deltarho.org/docs-datadr
LazyLoad: yes
LazyData: yes
NeedsCompilation: no
5 changes: 4 additions & 1 deletion NEWS.md
@@ -3,6 +3,9 @@ Version 0.8

FEATURES / CHANGES

- Update references to point to DeltaRho (0.8.6)
- Fix for compatibility with `data.table` (0.8.6)
- Add `control` option to `makeExtractable()` (0.8.5)
- Add several new documentation examples (0.8.4)
- Update `removeData()` method for local disk connections (0.8.4)
- Remove Spark back-end files (0.8.3)
@@ -100,7 +103,7 @@ FEATURES / CHANGES

- add `addTransform()` method to specify transformations to be applied to
ddo/ddf objects with deferred evaluation (see
https://github.com/tesseradata/datadr/issues/24 for more information)
https://github.com/delta-rho/datadr/issues/24 for more information)
- revamp `drGetGlobals()` to properly traverse environments of user-defined
transformation functions and find all global variables and all package
dependencies
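The deferred evaluation described in the `addTransform()` entry above can be illustrated with a small in-memory example. This is a minimal sketch using toy data (`mtcars`), not code from this commit; it assumes the usual datadr divide/recombine workflow with `combRbind` as the combiner.

```r
# A minimal sketch of a deferred transform on a local in-memory ddf (toy data)
library(datadr)

byCyl <- divide(ddf(mtcars), by = "cyl")

# the transform is only recorded here; it runs when subsets are computed
meanMpg <- addTransform(byCyl, function(x) mean(x$mpg))

# recombining triggers the deferred transform on each subset
recombine(meanMpg, combine = combRbind)
```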
2 changes: 1 addition & 1 deletion R/datadr-package.R
@@ -2,7 +2,7 @@
#'
#' datadr: Divide and Recombine for Large, Complex Data
#'
#' \url{http://tessera.io/docs-datadr/}
#' \url{http://deltarho.org/docs-datadr/}
#'
#' @name datadr-package
#'
3 changes: 1 addition & 2 deletions R/dataops_read.R
@@ -90,7 +90,7 @@ readTextFileByChunk <- function(input, output, overwrite = FALSE, linesPerBlock
#' Experimental HDFS text reader helper function
#'
#' Experimental helper function for reading text data on HDFS into a HDFS connection
#' @param input a RHIPE input text handle created with \code{rhfmt}
#' @param input a ddo / ddf connection to a text input directory on HDFS, created with \code{\link{hdfsConn}} - ensure the text files are within a directory and that type = "text" is specified
#' @param output an output connection such as those created with \code{\link{localDiskConn}}, and \code{\link{hdfsConn}}
#' @param overwrite logical; should existing output location be overwritten? (also can specify \code{overwrite = "backup"} to move the existing output to _bak)
#' @param fn function to be applied to each chunk of lines (input to function is a vector of strings)
@@ -143,7 +143,6 @@ readHDFStextFile <- function(input, output = NULL, overwrite = FALSE, fn = NULL,
suppressMessages(output <- output)

mrExec(input,
setup = setup,
map = map,
reduce = reduce,
output = output,
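Given the revised `@param input` documentation above, a usage sketch may help. The HDFS paths below are hypothetical, a working RHIPE/Hadoop back end is assumed, and the `fn` shown is just one way to handle a chunk of lines.

```r
# A minimal sketch (hypothetical HDFS paths; requires a configured RHIPE/Hadoop back end)
library(datadr)

# the input is now a connection to a directory of text files, with type = "text"
txt <- hdfsConn("/tmp/datadr_raw_text", type = "text")

# fn receives a character vector of lines from each chunk
res <- readHDFStextFile(
  input = txt,
  output = hdfsConn("/tmp/datadr_parsed", autoYes = TRUE),
  fn = function(lines) data.frame(line = lines, stringsAsFactors = FALSE)
)
```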
4 changes: 3 additions & 1 deletion R/ddo_ddf_kvHDFS.R
@@ -71,6 +71,7 @@ hasExtractableKV.kvHDFS <- function(x) {

#' Take a ddo/ddf HDFS data object and turn it into a mapfile
#' @param obj object of class 'ddo' or 'ddf' with an HDFS connection
#' @param control parameters specifying how the backend should handle things (most-likely parameters to \code{rhwatch} in RHIPE) - see \code{\link{rhipeControl}} and \code{\link{localDiskControl}}
#' @examples
#' \dontrun{
#' conn <- hdfsConn("/test/irisSplit")
@@ -89,7 +90,7 @@ hasExtractableKV.kvHDFS <- function(x) {
#' hdd[["3"]]
#' }
#' @export
makeExtractable <- function(obj) {
makeExtractable <- function(obj, control = NULL) {
if(!inherits(obj, "kvHDFS"))
stop("object must have an HDFS connection")

@@ -98,6 +99,7 @@ makeExtractable <- function(obj) {
# identity mr job
res <- mrExec(
obj,
control = control,
output = hdfsConn(mkd(file = "tmp_output"), type = "map", autoYes = TRUE, verbose = FALSE)
)

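To show how the new `control` argument threads through to the identity MapReduce job, here is a small sketch based on the roxygen example above. It assumes data have already been written to `/test/irisSplit` with key `"3"` (as in that example) and calls `rhipeControl()` with its defaults; any cluster-specific parameters are left out.

```r
# A minimal sketch (assumes data already written to /test/irisSplit, as in the example above)
hdd <- ddf(hdfsConn("/test/irisSplit"))

# backend parameters (e.g. for RHIPE's rhwatch) can now be passed via 'control'
hdd <- makeExtractable(hdd, control = rhipeControl())

# once the data are stored as a mapfile, subsets can be extracted by key
hdd[["3"]]
```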
2 changes: 1 addition & 1 deletion R/divSpec_condDiv.R
@@ -10,7 +10,7 @@
#'
#' @references
#' \itemize{
#' \item \url{http://tessera.io}
#' \item \url{http://deltarho.org}
#' \item \href{http://onlinelibrary.wiley.com/doi/10.1002/sta4.7/full}{Guha, S., Hafen, R., Rounds, J., Xia, J., Li, J., Xi, B., & Cleveland, W. S. (2012). Large complex data: divide and recombine (D&R) with RHIPE. \emph{Stat}, 1(1), 53-67.}
#' }
#'
2 changes: 1 addition & 1 deletion R/divSpec_rrDiv.R
@@ -11,7 +11,7 @@
#'
#' @references
#' \itemize{
#' \item \url{http://tessera.io}
#' \item \url{http://deltarho.org}
#' \item \href{http://onlinelibrary.wiley.com/doi/10.1002/sta4.7/full}{Guha, S., Hafen, R., Rounds, J., Xia, J., Li, J., Xi, B., & Cleveland, W. S. (2012). Large complex data: divide and recombine (D&R) with RHIPE. \emph{Stat}, 1(1), 53-67.}
#' }
#'
2 changes: 1 addition & 1 deletion R/divide.R
@@ -23,7 +23,7 @@
#'
#' @references
#' \itemize{
#' \item \url{http://tessera.io}
#' \item \url{http://deltarho.org}
#' \item \href{http://onlinelibrary.wiley.com/doi/10.1002/sta4.7/full}{Guha, S., Hafen, R., Rounds, J., Xia, J., Li, J., Xi, B., & Cleveland, W. S. (2012). Large complex data: divide and recombine (D&R) with RHIPE. \emph{Stat}, 1(1), 53-67.}
#' }
#'
3 changes: 2 additions & 1 deletion R/divide_df.R
@@ -11,7 +11,8 @@ getDivideDF <- function(data, by, postTransFn, bsvFn, update = FALSE) {
# d$i <- seq_len(nrow(b))
setkeyv(d, by$vars)

keyCols <- format(as.matrix(data.frame(unique(d))[,by$vars,drop = FALSE]), scientific = FALSE, trim = TRUE, justify = "none")
keyCols <- format(as.matrix(unique(d, by = key(d))[, by$vars, with = FALSE]),
scientific = FALSE, trim = TRUE, justify = "none")
keys <- apply(keyCols, 1, function(x) paste(paste(by$vars, "=", x, sep = ""), collapse = "|"))

res <- vector("list", length(keys))
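The replaced line above works around current `data.table` behavior, where `unique()` on a keyed table no longer deduplicates on the key by default and selecting columns by a character vector requires `with = FALSE`. A standalone sketch of that pattern, using toy data rather than datadr internals:

```r
# A standalone sketch of the data.table pattern above (toy data, not datadr internals)
library(data.table)

d <- data.table(cyl = c(4, 4, 6, 8, 8), mpg = c(30, 32, 21, 15, 16))
setkeyv(d, "cyl")
vars <- "cyl"

# deduplicate on the key explicitly and select columns by name with 'with = FALSE'
keyCols <- format(as.matrix(unique(d, by = key(d))[, vars, with = FALSE]),
  scientific = FALSE, trim = TRUE, justify = "none")

# build one "var=value" key string per unique key row
keys <- apply(keyCols, 1, function(x)
  paste(paste(vars, "=", x, sep = ""), collapse = "|"))
keys
# e.g. "cyl=4" "cyl=6" "cyl=8"
```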
2 changes: 1 addition & 1 deletion R/recombine.R
@@ -18,7 +18,7 @@
#'
#' @references
#' \itemize{
#' \item \url{http://tessera.io}
#' \item \url{http://deltarho.org}
#' \item \href{http://onlinelibrary.wiley.com/doi/10.1002/sta4.7/full}{Guha, S., Hafen, R., Rounds, J., Xia, J., Li, J., Xi, B., & Cleveland, W. S. (2012). Large complex data: divide and recombine (D&R) with RHIPE. \emph{Stat}, 1(1), 53-67.}
#' }
#'
19 changes: 8 additions & 11 deletions README.md
@@ -1,31 +1,28 @@
# datadr: Divide and Recombine in R

[![Build Status](https://travis-ci.org/tesseradata/datadr.svg?branch=master)](https://travis-ci.org/tesseradata/datadr)
[![CRAN](http://www.r-pkg.org/badges/version/datadr)](https://cran.r-project.org/web/packages/datadr/index.html)
[![Join the chat at https://gitter.im/delta-rho/users](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/delta-rho/users?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Build Status](https://travis-ci.org/delta-rho/datadr.svg?branch=master)](https://travis-ci.org/delta-rho/datadr)
[![CRAN](http://www.r-pkg.org/badges/version/datadr)](https://cran.r-project.org/package=datadr)

datadr is an R package that leverages [RHIPE](https://github.com/tesseradata/RHIPE) to provide a simple interface to division and recombination (D&R) methods for large complex data.
datadr is an R package that leverages [RHIPE](https://github.com/delta-rho/RHIPE) to provide a simple interface to division and recombination (D&R) methods for large complex data.

To get started, see the package documentation and function reference located [here](http://tesseradata.github.com/datadr).
To get started, see the package documentation and function reference located [here](http://deltarho.org/datadr).

Visualization tools based on D&R can be found [here](https://github.com/tesseradata/trelliscope).
Visualization tools based on D&R can be found [here](https://github.com/delta-rho/trelliscope).

## Installation

```r
# from CRAN:
install.packages("datadr")

# from packages.tessera.io:
options(repos = c(tessera = "http://packages.tessera.io", getOption("repos")))
install.packages("datadr")

# from github:
devtools::install_github("tesseradata/datadr")
devtools::install_github("delta-rho/datadr")
```

## License

This software is currently under the BSD license. Please read the [license](https://github.com/tesseradata/datadr/blob/master/LICENSE.md) document.
This software is currently under the BSD license. Please read the [license](https://github.com/delta-rho/datadr/blob/master/LICENSE.md) document.

## Acknowledgement

15 changes: 5 additions & 10 deletions cran-comments.md
@@ -1,12 +1,7 @@
## Resubmission

This is a resubmission. I was asked to fix an issue that arose when tested on r-solaris-sparc with the tolerance of a few of my unit tests being too strict. I built R with --disable-long-double and flags -ffloat-store -fexcess-precision=standard and empirically obtained robust tolerances for the unit tests. A formal tolerance analysis was not feasible given the nature of the routines, but the result of the empirical analysis was satisfactory.

## Test environments

* local OS X install, R 3.2.4
* ubuntu 12.04 (on travis-ci), R 3.2.3
* ubuntu 12.04 (VM), R 3.2.4 with --disable-long-double
* local OS X install, R 3.3.1
* ubuntu 12.04 (on travis-ci), R 3.3.1
* win-builder (devel and release)

## R CMD check results
@@ -18,8 +13,6 @@ There were 2 NOTEs:
* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Ryan Hafen <rhafen@gmail.com>'

Days since last update: 1

License components with restrictions and base license permitting such:
BSD_3_clause + file LICENSE
File 'LICENSE':
@@ -36,10 +29,12 @@ There were 2 NOTEs:

Suggests or Enhances not in mainstream repositories:
Rhipe

Availability using Additional_repositories specification:
Rhipe yes http://ml.stat.purdue.edu/packages

* checking package dependencies ... NOTE
Package suggested but not available for checking: 'Rhipe'

All words are spelled correctly.

The Rhipe R package is not required for any of the core functionality of datadr. It is an optional back end for datadr functions that ties R to Hadoop. Rhipe is a Linux-only R package available at http://ml.stat.purdue.edu/packages but since it has specific system-level dependencies (Hadoop, protocol buffers 2.5, etc.), we do not anticipate it being made easily available on CRAN. Since Rhipe only enhances functionality of datadr when used against a large Hadoop cluster, while all the same functionality runs fine without Rhipe on a local workstation, we believe it is safe to ignore this note.
2 changes: 1 addition & 1 deletion man/condDiv.Rd

2 changes: 1 addition & 1 deletion man/datadr-package.Rd

2 changes: 1 addition & 1 deletion man/divide.Rd

4 changes: 3 additions & 1 deletion man/makeExtractable.Rd

2 changes: 1 addition & 1 deletion man/readHDFStextFile.Rd

2 changes: 1 addition & 1 deletion man/recombine.Rd

2 changes: 1 addition & 1 deletion man/rrDiv.Rd