This repository has been archived by the owner on May 22, 2024. It is now read-only.

Development Guide

Hai Qian edited this page Sep 22, 2013 · 52 revisions

Here is a simple guide to developing PivotalR wrapper functions for MADlib.

  1. Useful Utility Functions in PivotalR

    schema.madlib, cbind, db.array, db.data.frame, as.db.data.frame, conn.id, names, delete, content, conn.eql, as.factor, eql, lookat, arraydb.to.arrayr

  2. Useful Internal Utility Functions in PivotalR

    Call a hidden function from the command line for testing

    .db.getQuery, .load.func, .suppress.warnings, .restore.warnings, .check.madlib.version, .get.params, .get.res, .is.conn.id.valid, .unique.string, .strip, .get.dbms.str, .madlib.version.number

  3. Useful R Code Snippets

    Raise an error, Output a string, Join multiple strings, Exception handling, Set the result class, Check whether an object belongs to a class, Get the command call as a language object, Get the command call as a string, Check whether an argument is missing, Regular expression and gsub, The for loop, Check NULL value

  4. Examples

    S3 example, S4 example

  5. After coding your madlib.newwrapper

Useful Utility Functions in PivotalR

The following functions are exported to users. For details, please refer to PivotalR's user manual.

schema.madlib

Returns MADlib schema name

cbind

Combine two db.obj objects

db.array

Combine multiple columns into a single array column

db.data.frame

Create an R wrapper object for a database table

as.db.data.frame

Create a copy of a table, data.frame, or file

conn.id

Returns the connection ID of an object

names

Returns all the column names

delete

Delete all tables related to an object (a db.data.frame, a table name, ...).

content

Returns the table name or SQL query of a db.obj object.

conn.eql

Check whether two connection IDs are equal.

as.factor

Convert a column into a categorical variable.

eql

Check whether two db.obj objects are equal.

lookat

Load part or all of a table into memory. Try lookat(table_name, "all", array = FALSE) and lookat(table_name, "all") to see the difference.

arraydb.to.arrayr

Convert the string representation of a database array into an R array. Use it together with lookat(table_name, "all", array = FALSE) to parse the results of an execution.

Useful Internal Utility Functions in PivotalR

Call an internal function from the command line for testing

PivotalR:::.unique.string()  # NOTE: three colons (:::) reach non-exported functions

The following functions are not exported, so they are "hidden" from the user: PivotalR's own function definitions can call them, but typing their plain names at R's command line does not work. When you are developing wrapper functions for MADlib, it is still helpful to play around with these functions interactively; use the ::: syntax above to call them.

.db.getQuery (query, conn.id)

Execute the query string on connection conn.id and return the result of the SQL query as a data.frame. Prefer .get.res, which adds exception handling on top of this function.

.load.func (funcname, conn.id)

Load a SQL function definition from inst/sql/

.suppress.warnings (conn.id)

Suppress all warnings and return the original warning levels

.restore.warnings (pre.warn)

Restore the original warning levels, as returned by .suppress.warnings
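The two functions are meant to be used as a pair. The sketch below shows the same save, silence, restore pattern in plain base R; it only touches R's own "warn" option, so it is an analogy rather than PivotalR's implementation, and the helper name with_warnings_suppressed is hypothetical. Inside a wrapper, one way to pair the real functions is pre.warn <- .suppress.warnings(conn.id(data)) followed by on.exit(.restore.warnings(pre.warn)), so the original levels come back even if the wrapper fails.

```r
# Hypothetical helper: a base-R analogue of .suppress.warnings()/.restore.warnings().
# It changes only R's own "warn" option, not any database-side setting.
with_warnings_suppressed <- function(expr) {
    pre.warn <- options(warn = -1)          # silence warnings, remember old value
    on.exit(options(pre.warn), add = TRUE)  # restore it, even if expr fails
    expr                                    # the promise is evaluated here, while silenced
}

before <- getOption("warn")
x <- with_warnings_suppressed(as.numeric("abc"))  # NA; the coercion warning is dropped
getOption("warn") == before                       # TRUE: the old level is back
```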

.check.madlib.version (data, allowed.version=0.6)

Raise an error when the MADlib version is lower than allowed.version

.get.params (formula, data)

Analyze a formula and extract the dependent, independent, and grouping variables. Pivot factor columns if any are specified, and create an intermediate table for db.Rquery and db.view objects.

.get.res (sql, tbl.output = NULL, conn.id)

Similar to .db.getQuery, but with exception handling. Returns the execution result.

.is.conn.id.valid (conn.id)

Check whether conn.id represents a valid existing connection

.unique.string ()

Generate a unique string.

.strip (str, rm = "\\s")

Remove matches of the regular expression rm (whitespace by default) from the beginning and end of the string str.
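To make the behavior concrete, here is a minimal stand-in written with gsub. It is a hypothetical re-implementation (named strip_sketch to avoid any confusion with the real .strip), assuming rm is a regular expression as the "\\s" default suggests; PivotalR's own code may differ.

```r
# Hypothetical re-implementation of .strip(), for illustration only.
# rm is treated as a regular expression, matching the "\\s" default.
strip_sketch <- function(str, rm = "\\s") {
    # drop one or more leading matches and one or more trailing matches of rm
    gsub(paste0("^(", rm, ")+|(", rm, ")+$"), "", str)
}

strip_sketch("  madlib.lm  ")      # "madlib.lm"
strip_sketch("__tmp__", rm = "_")  # "tmp"
```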

.get.dbms.str (conn.id)

Get the DBMS name (Greenplum, Postgres, or HAWQ).

.madlib.version.number(conn.id)

Get the MADlib version number of the connected database as a double.
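A double is convenient for checks such as allowed.version = 0.6, but a numeric comparison can mislead once a version component reaches two digits. Whether this affects PivotalR depends on how .madlib.version.number builds its value, which this page does not describe; the snippet below only illustrates the general pitfall, using base R's utils::compareVersion for the component-wise alternative.

```r
# Reading "1.10" as a number gives 1.1, which sorts before 1.9
as.numeric("1.10") < as.numeric("1.9")   # TRUE, although 1.10 is the later release

# utils::compareVersion() compares dot-separated components one at a time:
# it returns 1 if the first version is later, 0 if equal, -1 if earlier
utils::compareVersion("1.10", "1.9")     # 1
utils::compareVersion("0.6", "0.6")      # 0
```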

Useful R Code Snippets

Raise an error

stop("We have ",
     "an error at line ", 365, "!")  # arguments are joined with no separator: "We have an error at line 365!"

Output a string

cat("We have ",
    "a string here ", 365, sep = "")  # prints: We have a string here 365

Join multiple strings

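paste("We have", "something at", 365)             # "We have something at 365"
paste("We have ", "something at ", 365, sep="")   # same result
paste0("We have ", "something at ", 365)          # paste0() is paste() with sep = ""

a <- c(1,2,3)
paste(a, "is a", collapse=" + ", sep=" ")         # "1 is a + 2 is a + 3 is a"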
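In wrapper code these functions are mostly used to assemble SQL text. The sketch below nests the two modes: collapse joins a vector of column names into one string, and paste0 splices that string into a query. The column and table names are hypothetical; a string like this could then be handed to a query helper such as .get.res.

```r
# Hypothetical column and table names, for illustration only
indep <- c("x1", "x2", "x3")
tbl <- "my_input_table"

# inner paste(): join the column names; outer paste0(): splice them into SQL
array.expr <- paste0("array[", paste(indep, collapse = ", "), "]")
sql <- paste0("select ", array.expr, " as indep from ", tbl)
sql   # "select array[x1, x2, x3] as indep from my_input_table"
```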

Exception handling

res <- try(.db.getQuery(sql, conn.id(x)), silent = TRUE)
if (is(res, .err.class))
    stop("Could not do the summary!")

.get.res already has exception handling built in.
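The snippet above tests the result against PivotalR's internal .err.class. In base R, try() marks a failure by returning an object of class "try-error" that carries the original condition, and tryCatch() runs a handler instead. The following self-contained sketch shows both; risky_query is a hypothetical stand-in for a database call.

```r
# risky_query() is a hypothetical stand-in for a call such as .db.getQuery()
risky_query <- function(fail) {
    if (fail) stop("relation does not exist")
    data.frame(n = 1)
}

# try(): on failure the result has class "try-error" and keeps the condition
res <- try(risky_query(TRUE), silent = TRUE)
inherits(res, "try-error")                  # TRUE
conditionMessage(attr(res, "condition"))    # "relation does not exist"

# tryCatch(): run a handler instead, here returning NULL on error
out <- tryCatch(risky_query(TRUE), error = function(e) NULL)
is.null(out)                                # TRUE
ok <- tryCatch(risky_query(FALSE), error = function(e) NULL)
nrow(ok)                                    # 1
```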

Set the result class

class(rst) <- "arima.css.madlib"
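Setting the class is what lets S3 generics such as print() and summary() dispatch to methods written for the wrapper's result. A minimal sketch, assuming a hypothetical result list and a hypothetical print method:

```r
# A hypothetical result object; a real wrapper would fill it with model output
rst <- list(coef = c(ar1 = 0.5, ma1 = -0.3))
class(rst) <- "arima.css.madlib"

# print() now dispatches to the method defined for this class
print.arima.css.madlib <- function(x, ...) {
    cat("ARIMA result with", length(x$coef), "coefficients\n")
    invisible(x)
}

print(rst)   # ARIMA result with 2 coefficients
```

Inside the package, such a method is normally registered with an S3method(print, arima.css.madlib) line in NAMESPACE.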

Check whether an object belongs to a class

is(x, "db.Rquery")
is(res, .err.class)
is(res, "data.frame")

Get the command call as a language object

call <- match.call()  # only meaningful inside a function body

Get the command call as a string

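# cat(match.call()) fails: cat() cannot handle a call object, so deparse it first
call.str <- paste(deparse(match.call()), collapse = "")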
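Both forms are easiest to see inside a function. In this sketch madlib.newwrapper is the placeholder name used elsewhere on this page, and its signature is hypothetical. Note that match.call() fills in full argument names, and that the arguments are never evaluated, so my.table does not need to exist.

```r
# Hypothetical wrapper signature, for illustration only
madlib.newwrapper <- function(formula, data, ...) {
    call <- match.call()                             # language object
    call.str <- paste(deparse(call), collapse = "")  # single string
    list(call = call, call.str = call.str)
}

r <- madlib.newwrapper(y ~ x1 + x2, data = my.table)
r$call.str   # "madlib.newwrapper(formula = y ~ x1 + x2, data = my.table)"
```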

Check whether an argument is missing

if (missing(j)) {
    stop("Error")
}
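missing() is typically used to give an optional argument a default meaning, as in the x[i, j] convention where omitting j selects every column. A small sketch with a hypothetical helper, using the built-in mtcars data set:

```r
# Hypothetical helper mimicking the x[i, j] convention
pick.columns <- function(x, j) {
    if (missing(j)) return(names(x))   # no j supplied: all columns
    names(x)[j]
}

pick.columns(mtcars)        # all 11 column names
pick.columns(mtcars, 1:2)   # "mpg" "cyl"
```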

Regular expression and gsub

gsub(pattern, replacement, x)  # replace every match of the regular expression pattern in x

gsub("\\d+", "digits", "1233535 is the number") # returns "digits is the number"; note the double backslashes

In R string literals a backslash must itself be escaped, so the regular expression \d is written "\\d".
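A few more patterns that come up when manipulating names and SQL text. The inputs are illustrative only; the last line contrasts sub(), which replaces only the first match, with gsub().

```r
# Replace anything that is not a letter, digit or underscore (e.g. to build an identifier)
gsub("[^[:alnum:]_]", "_", "my col.name")            # "my_col_name"

# Back-references: \\1 and \\2 refer to the parenthesised groups
gsub("(\\w+)\\.(\\w+)", "\\2 in \\1", "madlib.lm")   # "lm in madlib"

# fixed = TRUE treats the pattern literally, so "." is not a wildcard
gsub(".", "_", "a.b.c", fixed = TRUE)                # "a_b_c"

# sub() replaces the first match only
sub("\\s+", " ", "a   b   c")                        # "a b   c"
```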

The for loop

for (i in seq_len(n)) print(i) # seq_len(0) is integer(0), loop is not executed
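The reason to prefer seq_len(n) over 1:n shows up when n is 0: 1:0 is the vector c(1, 0), so the loop body runs twice instead of not at all.

```r
n <- 0

iter.colon <- 0
for (i in 1:n) iter.colon <- iter.colon + 1     # 1:0 is c(1, 0): runs twice

iter.seq <- 0
for (i in seq_len(n)) iter.seq <- iter.seq + 1  # seq_len(0) is empty: never runs

c(iter.colon, iter.seq)   # 2 0
# seq_along(x) is the analogous safe choice when looping over a vector x
```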

Check NULL value

if (is.null(x))
    stop("Error: cannot be NULL!")
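Always use is.null() rather than comparing with ==: any comparison with NULL returns logical(0), and if (logical(0)) stops with "argument is of length zero". A common pattern is a NULL default for optional arguments, sketched here with a hypothetical function:

```r
x <- NULL
x == NULL      # logical(0), not TRUE
is.null(x)     # TRUE

# Hypothetical function with an optional argument that defaults to NULL
describe <- function(x = NULL) {
    if (is.null(x)) "no value supplied" else paste("value:", x)
}
describe()     # "no value supplied"
describe(3)    # "value: 3"
```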

Examples

S3 example

Linear Regression

S4 example

ARIMA

After coding your madlib.newwrapper

  1. Add export("madlib.newwrapper") to the NAMESPACE file to expose the function to users

  2. Double check that all new internal functions start with "."

  3. Add the user documentation to the folder man/, using the existing .Rd files as templates.

  4. From the parent directory of PivotalR/, run (for example, for version 0.1.100)

     $ R CMD build --resave-data PivotalR 2>/dev/null
     $ R CMD check --as-cran PivotalR_0.1.100.tar.gz
     $ R CMD INSTALL PivotalR_0.1.100.tar.gz
    

    Fix all errors, warnings, and notes reported by R CMD check (the second command).

  5. File the pull request.

    MADlib team members can instead push the code directly to a new branch.