
Disclosive simulation

Patricia Ryser-Welch edited this page Jun 16, 2020 · 23 revisions

DataSHIELD and Newcastle university logo eRum 2020 logo

Useful readings

Purpose

The code and examples simulate a client-server architecture. The server stores some datasets with some basic meta-data. Object-oriented programming offers valuable features, such as encapsulation, for presenting the concepts and ideas behind DataSHIELD.

Server Simulation

We are going to represent data servers with R classes. We assume that a server holds several datasets, each of which has some data and a data dictionary. These classes will later be extended to demonstrate some of the programming techniques DataSHIELD programmers use. No encapsulation is applied yet; it will be added at a later stage.

Resources: TutorialDisclosive/R_Scripts_disclosive/disclosive_server.R

Practice

Readers can implement each class from the descriptions provided. Solutions are available in the aforementioned script. The simulation has been kept simple to lower the barrier to comprehension. However, readers can implement their own version with as much error checking as they wish.

DataSet class

A class DataSet simulates how data is stored, with a basic data dictionary and the data itself.

Fields:

  • meta.data (list) (public)
  • data (data.frame) (public)

Methods:

  • initialize(meta.data, data) (public)

The meta-data provides the names of the columns of the data frame. The data frame simulates a single-table (flat) database.

Possible solution:

library(R6)

DataSet <- R6Class("DataSet", list(
           meta.data = list(),
           data = data.frame(),
           initialize = function(meta.data, data)
           {
              # one dictionary entry is expected per column
              stopifnot(is.list(meta.data), is.data.frame(data))
              stopifnot(length(meta.data) == ncol(data))
              
              self$meta.data      <- meta.data
              self$data           <- data
              # the dictionary entries become the column names
              colnames(self$data) <- unlist(self$meta.data)
           }
 ))
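
The behaviour described above can be checked with a short sketch. The class is repeated in compact form so that the snippet runs on its own; the toy table and its values are placeholders invented for illustration, not taken from the tutorial data.

```r
library(R6)

# Compact copy of the DataSet class, repeated so the snippet is self-contained.
DataSet <- R6Class("DataSet", list(
  meta.data = list(),
  data = data.frame(),
  initialize = function(meta.data, data) {
    stopifnot(is.list(meta.data), is.data.frame(data))
    stopifnot(length(meta.data) == ncol(data))
    self$meta.data <- meta.data
    self$data <- data
    colnames(self$data) <- unlist(self$meta.data)
  }
))

# Toy table with placeholder values (an assumption for this example).
toy <- data.frame(V1 = c("a", "b"), V2 = c(1, 2))

# The data dictionary renames the columns of the stored data frame.
ds <- DataSet$new(list("Label", "Value"), toy)
print(colnames(ds$data))

# A dictionary with the wrong number of entries is rejected by stopifnot().
result <- tryCatch(
  DataSet$new(list("Label"), toy),
  error = function(e) "rejected"
)
print(result)
```

The second call fails because the dictionary has one entry while the table has two columns, which is the only consistency check this simple class performs.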

Server class

A class Server simulates a server storing several datasets. It is a simplified version of a data server.

Fields:

  • datasets (list) (public)

Methods:

  • initialize() (public)
  • upload(meta.data, data, name) (public)

The datasets field is a collection of instantiations of the class DataSet. It simulates the fact that a data server can hold zero or more datasets. Each dataset is given a unique name, which is then used to access the corresponding element of the list.
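
The idea of accessing each element by a unique name can be sketched with a plain named list, before any class is involved. The dataset name "modern" and the small data frames are hypothetical and serve only as stand-ins.

```r
# A plain named list standing in for the datasets field of a server.
datasets <- list()

# Each element is stored under a name chosen by the caller.
datasets[["classic"]] <- data.frame(x = 1:3)
datasets[["modern"]]  <- data.frame(x = 4:5)   # "modern" is a made-up name

print(names(datasets))
print(nrow(datasets[["classic"]]))

# Reusing a name replaces the stored element, so names stay unique.
datasets[["classic"]] <- data.frame(x = 10)
print(length(datasets))
print(nrow(datasets[["classic"]]))
```

One consequence of this design is that uploading under an existing name silently overwrites the previous dataset; a stricter version could refuse the upload instead.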

Possible solution:

Server  <- R6Class("Server", list(
           datasets = NULL, 
           initialize = function()
           {
             # a new server starts with no datasets
             self$datasets <- list()
           },
           upload = function(meta.data, data, name)
           {
             # store the new dataset under its unique name
             new.dataset <- DataSet$new(meta.data, data)
             self$datasets[[name]] <- new.dataset
           }
  
))

Connection simulation

In a client-server architecture, the client and the server communicate by exchanging messages. These are referred to as requests and responses. These ideas can be simulated by using the memory address of an instantiation of the Server class as the unique address of the server. The requests are the calls made to access a field or call a method. The responses are (loosely) modelled by the values returned. If no value is returned, then it is assumed that no response is required.
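
A minimal sketch of this analogy follows. EchoServer is a hypothetical class introduced only for this illustration; it is not part of the tutorial scripts. It relies on the fact that R6 objects have reference semantics, so assigning an object to a second variable copies the reference rather than the object.

```r
library(R6)

# Hypothetical minimal server, used only to illustrate requests and responses.
EchoServer <- R6Class("EchoServer", list(
  count = 0,
  ping = function() {
    self$count <- self$count + 1
    return(self$count)        # the returned value plays the role of the response
  }
))

server  <- EchoServer$new()
address <- server             # copies the reference, not the object

response <- address$ping()    # the "request" is a method call through the address
print(response)
print(server$count)           # the original object has seen the request

# A second instantiation is a different object, hence a different "address".
other <- EchoServer$new()
print(identical(server, address))
print(identical(server, other))
```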

Resources: TutorialDisclosive/R_Scripts_disclosive/disclosive_client.R

Practice

Readers can implement each class from the description given. A solution is provided in the aforementioned script. It is kept simple so that the simulation is easy to understand. However, readers can implement their own version with as much error checking as they wish.

Connection class

A class named Connection holds the unique addresses of the servers. These addresses are references to instantiations of the Server class.

Fields:

  • servers(list) (public)

Methods:

  • initialize() (public)
  • start.server() (public)
  • connect(server.name, server) (public)
  • upload(server.name, path.to.data, meta.data, dataset.name) (public)

The servers field represents the existing connections to some servers. The method start.server instantiates the class Server and returns a Server object. The method connect adds a new server to the servers list. Finally, the method upload provides a tool to upload a dataset to an existing server.

Possible solution:

library(R6)
library(readr)   # provides read_csv

# Assumes the Server class defined earlier has already been loaded.
Connection <- R6Class("Connection", list(
                      servers = NULL,
                      initialize = function()
                      {
                        # no connection exists when the object is created
                        self$servers <- list()
                      },
                      start.server = function()
                      {
                        return(Server$new())
                      },
                      connect = function(server.name, server)
                      {
                        stopifnot(is.character(server.name))
                        stopifnot(is.R6(server))
                        self$servers[[server.name]] <- server
                      },
                      upload = function(server.name, path.to.data, 
                                        meta.data, dataset.name)
                      {
                        stopifnot(is.character(server.name))
                        stopifnot(file.exists(path.to.data))
                        stopifnot(is.list(meta.data))
                        stopifnot(server.name %in% names(self$servers))
                        # read the file on the client side, then pass it to the server
                        data <- read_csv(path.to.data)
                        self$servers[[server.name]]$upload(meta.data, data, dataset.name)
                      }
))
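
The three classes can be exercised together without any data files, as in the sketch below. It makes several assumptions: the classes are repeated in compact form with most checks removed, base read.csv replaces readr's read_csv to avoid an extra dependency, and the server name "Demo", the dataset name "toy" and the temporary CSV contents are all invented for the example.

```r
library(R6)

# Compact copies of the three classes, repeated so the snippet runs on its own.
DataSet <- R6Class("DataSet", list(
  meta.data = NULL,
  data = NULL,
  initialize = function(meta.data, data) {
    stopifnot(length(meta.data) == ncol(data))
    self$meta.data <- meta.data
    self$data <- data
    colnames(self$data) <- unlist(meta.data)
  }
))
Server <- R6Class("Server", list(
  datasets = NULL,
  initialize = function() { self$datasets <- list() },
  upload = function(meta.data, data, name) {
    self$datasets[[name]] <- DataSet$new(meta.data, data)
  }
))
Connection <- R6Class("Connection", list(
  servers = NULL,
  initialize = function() { self$servers <- list() },
  start.server = function() Server$new(),
  connect = function(server.name, server) {
    self$servers[[server.name]] <- server
  },
  upload = function(server.name, path.to.data, meta.data, dataset.name) {
    stopifnot(file.exists(path.to.data))
    stopifnot(server.name %in% names(self$servers))
    data <- read.csv(path.to.data)   # base R stand-in for read_csv
    self$servers[[server.name]]$upload(meta.data, data, dataset.name)
  }
))

# Write a small placeholder CSV to a temporary file.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = c("x", "y"), b = c(1, 2)), path, row.names = FALSE)

# Start a server, connect to it, and upload the file under a dataset name.
connections <- Connection$new()
connections$connect("Demo", connections$start.server())
connections$upload("Demo", path, list("Label", "Value"), "toy")

stored <- connections$servers[["Demo"]]$datasets[["toy"]]$data
print(colnames(stored))
print(nrow(stored))
```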

Client script

In this example an R script uses an instantiation of the class Connection to (1) start a server, (2) connect to the server, (3) upload data to the server, and (4) retrieve the data stored on the server.

Resources: TutorialDisclosive/R_Scripts_disclosive/main_disclosive.R

The code below shows how all the above operations are achieved.

source("R_Scripts_disclosive/disclosive_client.R")

print("-------------- Start servers  and connect to servers -------------------")
connections <- Connection$new()
London <- connections$start.server()
Newcastle <- connections$start.server()
Edinburgh <- connections$start.server()

connections$connect("London",London)
connections$connect("Newcastle",Newcastle)
connections$connect("Edinburgh",Edinburgh)


print("-------------- upload the classic datasets -------------------")
print("Newcastle has uploaded the classic 1 dataset")
path.to.data <- "data/classic_1.csv"
meta.data <- list("Title", "Author","GreatReadScore","Words","YearPub")
connections$upload("Newcastle",path.to.data,meta.data,"classic")

print("London has uploaded the classic 2 dataset")
path.to.data <- "data/classic_2.csv"
meta.data <- list("Title", "Author","GreatReadScore","Words","YearPub")
connections$upload("London",path.to.data,meta.data,"classic")

print("Edinburgh has uploaded the classic 3 dataset")
path.to.data <- "data/classic_3.csv"
meta.data <- list("Title", "Author","GreatReadScore","Words","YearPub")
connections$upload("Edinburgh",path.to.data,meta.data,"classic")



print("----  retrieve some of the data from the servers and display them -----")
print(connections$servers[["Newcastle"]]$datasets[["classic"]]$data)
print(connections$servers[["London"]]$datasets[["classic"]]$data)
print(connections$servers[["Edinburgh"]]$datasets[["classic"]]$data)
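
Because every field is public, nothing prevents a client from reading or even changing individual rows, which is what makes this simulation disclosive. The sketch below illustrates the point with OpenDataSet, a hypothetical stripped-down class holding invented placeholder values; it is not part of the tutorial scripts.

```r
library(R6)

# Hypothetical stripped-down dataset class with a single public field.
OpenDataSet <- R6Class("OpenDataSet", list(
  data = NULL,
  initialize = function(data) {
    self$data <- data
  }
))

# Placeholder values invented for this example.
stored <- OpenDataSet$new(data.frame(id = 1:3, value = c(10, 20, 30)))

# Reading: the client can see an individual row.
leaked <- stored$data[2, ]
print(leaked$value)

# Writing: the client can also alter the stored data directly.
stored$data$value[2] <- 0
print(stored$data$value)
```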

Practice

You could try to list all the authors in each dataset. You could also compute summary statistics for the publication year and the number of words.
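
One possible starting point is sketched below, under the assumption that the data frames have already been retrieved from the servers. A named list of small data frames stands in for them here; the author names and all numbers are placeholders invented for illustration. The column names Author, Words and YearPub follow the data dictionary used in the client script.

```r
# Placeholder data frames standing in for the data retrieved from each server.
# All values are invented for this example.
classic <- list(
  Newcastle = data.frame(Author  = c("Author A", "Author B"),
                         Words   = c(1000, 3000),
                         YearPub = c(1850, 1900)),
  London    = data.frame(Author  = "Author C",
                         Words   = 2000,
                         YearPub = 1875)
)

# List the distinct authors held by each server.
authors <- lapply(classic, function(d) unique(d$Author))
print(authors)

# Pool the rows from all servers and compute simple summaries.
pooled <- do.call(rbind, classic)
print(mean(pooled$YearPub))
print(range(pooled$Words))
```

With the real simulation, each element of the list could be taken from connections$servers[[name]]$datasets[["classic"]]$data instead.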