Disclosive simulation
The code and examples simulate a client-server architecture. The server stores several datasets with some basic meta-data. Object-oriented programming offers valuable features, such as encapsulation, for presenting the concepts and ideas behind DataSHIELD.
We are going to represent data servers with R classes. We assume that a server holds several datasets, each of which has some data and a data dictionary. These classes will later be extended to demonstrate some of the programming techniques DataSHIELD programmers encounter. No encapsulation is applied yet; it will be added at a later stage.
Resources: TutorialDisclosive/R_Scripts_disclosive/disclosive_server.R
Readers can implement each class from the descriptions provided. Solutions are available in the aforementioned script. The simulation has been kept simple to lower the barrier to comprehension. However, readers can implement their own version with more error checking, should they wish to.
A class DataSet simulates how data is stored, with a basic data dictionary and the data itself.
Fields:
- meta.data (list) (public)
- data (data.frame) (public)
Methods:
- initialize(meta.data, data) (public)
The meta-data provides the names of the columns of the data frame. The data frame simulates a single-table database (flat database).
Possible solution:
library(R6)

DataSet <- R6Class("DataSet", list(
  meta.data = list(),
  data = data.frame(),
  initialize = function(meta.data, data)
  {
    # The data dictionary must supply exactly one name per column
    stopifnot(is.list(meta.data), is.data.frame(data))
    stopifnot(length(meta.data) == ncol(data))
    self$meta.data <- meta.data
    self$data <- data
    # unlist() turns the list of names into a character vector
    colnames(self$data) <- unlist(self$meta.data)
  }
))
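As a quick check of the class described above, the sketch below builds a DataSet from a small in-memory data frame. The two book rows and the placeholder column names V1 and V2 are made-up illustration values, not taken from the tutorial's CSV files, and the class is repeated only so that the snippet runs on its own.

```r
library(R6)

# Copy of the DataSet class described above, so the snippet is self-contained
DataSet <- R6Class("DataSet", list(
  meta.data = list(),
  data = data.frame(),
  initialize = function(meta.data, data) {
    stopifnot(is.list(meta.data), is.data.frame(data))
    stopifnot(length(meta.data) == ncol(data))
    self$meta.data <- meta.data
    self$data <- data
    colnames(self$data) <- unlist(self$meta.data)
  }
))

# Made-up rows with placeholder column names
raw <- data.frame(V1 = c("Book A", "Book B"), V2 = c(1900, 1950))
books <- DataSet$new(list("Title", "YearPub"), raw)

# The column names now come from the data dictionary
print(colnames(books$data))

# A dictionary whose length does not match the number of columns is rejected
result <- tryCatch(DataSet$new(list("Title"), raw),
                   error = function(e) "rejected")
print(result)
```

The second call illustrates why the length check matters: without it, a mismatched dictionary would only fail later, when the column names are assigned.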
A class Server simulates a server storing several datasets. It is a simplified version of a data server.
Fields:
- datasets (list) (public)
Methods:
- initialize() (public)
- upload(meta.data, data, name) (public)
The field referred to as datasets represents a collection of instances of the class DataSet. It simulates the fact that a data server can hold zero or more datasets. Each dataset is given a unique name, which serves as the key for retrieving it from the list.
Possible solution:
Server <- R6Class("Server", list(
  datasets = NULL,
  initialize = function()
  {
    # Start with an empty collection of datasets
    self$datasets <- list()
  },
  upload = function(meta.data, data, name)
  {
    # Store the new dataset under its unique name
    new.dataset <- DataSet$new(meta.data, data)
    self$datasets[[name]] <- new.dataset
  }
))
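A minimal sketch of how the Server class might be exercised follows. The DataSet copy is trimmed of its checks for brevity, and the dataset names "first" and "second" and the book rows are hypothetical illustration values.

```r
library(R6)

# Trimmed copy of DataSet (no input checks), so the snippet is self-contained
DataSet <- R6Class("DataSet", list(
  meta.data = NULL,
  data = NULL,
  initialize = function(meta.data, data) {
    self$meta.data <- meta.data
    self$data <- data
    colnames(self$data) <- unlist(meta.data)
  }
))

Server <- R6Class("Server", list(
  datasets = NULL,
  initialize = function() {
    self$datasets <- list()
  },
  upload = function(meta.data, data, name) {
    self$datasets[[name]] <- DataSet$new(meta.data, data)
  }
))

server <- Server$new()
server$upload(list("Title"), data.frame(x = c("Book A", "Book B")), "first")
server$upload(list("YearPub"), data.frame(x = c(1900, 1950)), "second")
print(names(server$datasets))

# Reusing a name replaces the earlier entry rather than adding a new one
server$upload(list("Title"), data.frame(x = "Book C"), "first")
print(length(server$datasets))
print(nrow(server$datasets[["first"]]$data))
```

Because the list is keyed by name, uploading under an existing name silently overwrites the previous dataset; a stricter version could refuse duplicate names.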
In a client-server architecture, the client and the server communicate by exchanging messages. These are referred to as requests and responses. These ideas can be simulated by using the memory address of an instance of the Server class as the unique address of the server. The requests are the calls made to access a field or invoke a method. The responses are (loosely) modelled by the values returned. If no value is returned, it is assumed that no response is required.
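The idea of using an object reference as a server address can be sketched as follows. This relies on R6 objects having reference semantics. The Server class here is a simplified stand-in that stores plain data frames, and the variable names and the book row are hypothetical.

```r
library(R6)

# Simplified stand-in for the Server class: datasets are plain data frames
Server <- R6Class("Server", list(
  datasets = NULL,
  initialize = function() {
    self$datasets <- list()
  },
  upload = function(data, name) {
    self$datasets[[name]] <- data
    invisible(NULL)  # nothing useful is returned: no response expected
  }
))

newcastle <- Server$new()   # the reference plays the role of the server address
same.server <- newcastle    # copying the reference does not copy the server

# A "request" without a response: a method call that returns nothing useful
same.server$upload(data.frame(Title = "Book A"), "classic")

# A "request" with a response: reading a field returns a value
response <- newcastle$datasets[["classic"]]
print(response)

# Both names point at the same server object
print(identical(newcastle, same.server))
```

The upload made through same.server is visible through newcastle, which is what allows a client to hold a reference and treat it as the address of a remote machine.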
Resources: TutorialDisclosive/R_Scripts_disclosive/disclosive_client.R
Readers can implement each class from the description given. A solution is provided in the aforementioned script. It is kept simple so that the simulation is easy to understand. However, readers can implement their own version with as much error checking as they wish.
A class named Connection holds the unique addresses of the servers. These are references to instances of the Server class.
Fields:
- servers (list) (public)
Methods:
- initialize() (public)
- start.server() (public)
- connect(server.name, server) (public)
- upload(server.name, path.to.data, meta.data, dataset.name) (public)
The servers field represents the existing connections to servers. The method start.server instantiates the class Server and returns a Server object. The method connect adds a new server to the servers list. Finally, the method upload provides a way to upload a dataset to an existing server.
Possible solution:
Connection <- R6Class("Connection", list(
  servers = NULL,
  initialize = function()
  {
    # Start with no connections
    self$servers <- list()
  },
  start.server = function()
  {
    return(Server$new())
  },
  connect = function(server.name, server)
  {
    stopifnot(is.character(server.name))
    stopifnot(is.R6(server))
    self$servers[[server.name]] <- server
  },
  upload = function(server.name, path.to.data,
                    meta.data, dataset.name)
  {
    stopifnot(is.character(server.name))
    stopifnot(file.exists(path.to.data))
    stopifnot(is.list(meta.data))
    # The server must already be connected
    stopifnot(server.name %in% names(self$servers))
    data <- readr::read_csv(path.to.data)  # requires the readr package
    self$servers[[server.name]]$upload(meta.data, data, dataset.name)
  }
))
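The sketch below exercises a Connection end to end without the tutorial's data files. It makes several assumptions: the CSV is a throwaway temporary file with made-up rows, base R's read.csv is swapped in for read_csv so that only R6 is needed, the classes are trimmed copies, and "Paris" is a hypothetical server name that was never connected.

```r
library(R6)

# Trimmed copies of the classes, so the snippet is self-contained
DataSet <- R6Class("DataSet", list(
  meta.data = NULL,
  data = NULL,
  initialize = function(meta.data, data) {
    self$meta.data <- meta.data
    self$data <- data
    colnames(self$data) <- unlist(meta.data)
  }
))

Server <- R6Class("Server", list(
  datasets = NULL,
  initialize = function() {
    self$datasets <- list()
  },
  upload = function(meta.data, data, name) {
    self$datasets[[name]] <- DataSet$new(meta.data, data)
  }
))

Connection <- R6Class("Connection", list(
  servers = NULL,
  initialize = function() {
    self$servers <- list()
  },
  start.server = function() {
    Server$new()
  },
  connect = function(server.name, server) {
    stopifnot(is.character(server.name), is.R6(server))
    self$servers[[server.name]] <- server
  },
  upload = function(server.name, path.to.data, meta.data, dataset.name) {
    stopifnot(server.name %in% names(self$servers))
    stopifnot(file.exists(path.to.data))
    data <- read.csv(path.to.data)  # base R stand-in for read_csv
    self$servers[[server.name]]$upload(meta.data, data, dataset.name)
  }
))

# Write a throwaway CSV with made-up rows
csv.path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = c("Book A", "Book B"), b = c(1900, 1950)),
          csv.path, row.names = FALSE)

connections <- Connection$new()
connections$connect("Newcastle", connections$start.server())
connections$upload("Newcastle", csv.path, list("Title", "YearPub"), "classic")
print(connections$servers[["Newcastle"]]$datasets[["classic"]]$data)

# Uploading to a server that was never connected fails the precondition
outcome <- tryCatch(
  connections$upload("Paris", csv.path, list("Title", "YearPub"), "classic"),
  error = function(e) "no such server"
)
print(outcome)
```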
In this example, an R script uses an instance of the class Connection to (1) start a server, (2) connect to the server, (3) upload data to the server, and (4) retrieve the data stored on a server.
Resources: TutorialDisclosive/R_Scripts_disclosive/main_disclosive.R
The code below shows how all the above operations are achieved.
source("R_Scripts_disclosive/disclosive_client.R")
print("-------------- Start servers and connect to servers -------------------")
connections <- Connection$new()
London <- connections$start.server()
Newcastle <- connections$start.server()
Edinburgh <- connections$start.server()
connections$connect("London",London)
connections$connect("Newcastle",Newcastle)
connections$connect("Edinburgh",Edinburgh)
print("-------------- upload the classic datasets -------------------")
print("Newcastle has uploaded the classic 1 dataset")
path.to.data <- "data/classic_1.csv"
meta.data <- list("Title", "Author","GreatReadScore","Words","YearPub")
connections$upload("Newcastle",path.to.data,meta.data,"classic")
print("London has uploaded the classic 2 dataset")
path.to.data <- "data/classic_2.csv"
meta.data <- list("Title", "Author","GreatReadScore","Words","YearPub")
connections$upload("London",path.to.data,meta.data,"classic")
print("Edinburgh has uploaded the classic 3 dataset")
path.to.data <- "data/classic_3.csv"
meta.data <- list("Title", "Author","GreatReadScore","Words","YearPub")
connections$upload("Edinburgh",path.to.data,meta.data,"classic")
print("---- retrieve some of the data from the servers and display them -----")
print(connections$servers[["Newcastle"]]$datasets[["classic"]]$data)
print(connections$servers[["London"]]$datasets[["classic"]]$data)
print(connections$servers[["Edinburgh"]]$datasets[["classic"]]$data)
You could attempt to show all the authors listed in each dataset. You could also statistically analyse the publication year and the number of words.
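One possible way to approach these exercises is sketched below. To keep it runnable without the CSV files, the three data frames are made-up stand-ins for what the script retrieves from each server. Only the column names Author, Words and YearPub come from the meta-data used above; every value is hypothetical.

```r
# Made-up stand-ins for the data frames retrieved from the three servers
retrieved <- list(
  Newcastle = data.frame(Author = c("Author A", "Author B"),
                         Words = c(100000, 80000),
                         YearPub = c(1850, 1900),
                         stringsAsFactors = FALSE),
  London = data.frame(Author = "Author B",
                      Words = 60000,
                      YearPub = 1920,
                      stringsAsFactors = FALSE),
  Edinburgh = data.frame(Author = "Author C",
                         Words = 120000,
                         YearPub = 1950,
                         stringsAsFactors = FALSE)
)

# Authors listed on each server
for (server.name in names(retrieved)) {
  print(paste(server.name, ":",
              paste(retrieved[[server.name]]$Author, collapse = ", ")))
}

# Distinct authors across all servers
authors <- sort(unique(unname(unlist(lapply(retrieved, function(d) d$Author)))))
print(authors)

# Pool the rows and summarise the year and the number of words
pooled <- do.call(rbind, retrieved)
year.range <- range(pooled$YearPub)
mean.words <- mean(pooled$Words)
print(year.range)
print(mean.words)
```

Because the simulation is disclosive, the client works directly on the row-level data held by each server, which is why pooling the rows is possible here.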
Patricia Ryser-Welch (DataSHIELD Team) DataSHIELD website