alexwesterberg edited this page Jun 20, 2020 · 21 revisions

DataSHIELD and Newcastle university logo eRum 2020 logo

Non-disclosive Federated Analysis in R

The analysis of individual person-level data is often crucial in the biomedical sciences. But ethical, legal, and regulatory restrictions often provide significant, though understandable and socially responsible, impediments to the sharing of individual-level data, particularly when those data are sensitive, as is often the case with health data. This situation creates important challenges for the development of appropriate architectures for federated analysis systems, the associated R programming techniques, and the visualization of the data.

This workshop first introduces how an architectural approach can be used to build non-disclosive federated analysis systems. Secondly, we present some practical exercises to illustrate the concepts of non-disclosive programming techniques in R. Finally, we discuss and provide some concrete examples of non-disclosive visualization techniques.

Structure of workshop

Because the participants of our workshop bring a broad range of expertise, you will first explore:

  1. The differences between individual person-level data and other types of data.
  2. The technological framework in place to protect against the disclosure of individual person-level data.

The participants will then simulate, using an object-oriented programming paradigm:

  1. A disclosive client-server architecture
  2. Basic anonymisation and creation of synthetic data
  3. The use of a "server parser" to limit the access to the data
  4. Virtually-joined and server-level analysis
  5. The use of thresholds to prevent inferential reconstruction of datasets
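
As a rough illustration of items 3 and 5 above, the sketch below simulates a "server" that holds the data, accepts only a whitelist of function names (a very simple stand-in for a server parser), and refuses to summarise fewer than a minimum number of records. It is not the tutorials' code and not DataSHIELD's API: `make_server`, `request`, `allowed` and `min_count` are hypothetical names chosen for this example, and the threshold of 5 is an arbitrary assumption.

```r
# Hypothetical sketch: names and threshold are assumptions made for
# illustration, not taken from the tutorials or from DataSHIELD.
make_server <- function(data, allowed = c("mean", "length"), min_count = 5) {
  # `data` lives in this closure; the client is only handed `request`.
  request <- function(fn_name, column) {
    # Simple "server parser": only whitelisted function names are accepted.
    if (!(fn_name %in% allowed)) {
      stop("function not permitted: ", fn_name)
    }
    if (!(column %in% names(data))) {
      stop("unknown column: ", column)
    }
    values <- data[[column]]
    # Threshold: refuse to summarise fewer than `min_count` records.
    if (length(values) < min_count) {
      stop("too few records to return a result")
    }
    # Call the whitelisted function by name on the requested column.
    do.call(fn_name, list(values))
  }
  list(request = request)
}

# Client side: only aggregate requests go through.
server <- make_server(data.frame(age = c(34, 51, 47, 29, 62, 58)))
server$request("mean", "age")     # an aggregate is returned
# server$request("print", "age")  # would stop: function not permitted
```

Within a single R session the data remain reachable through the closure's environment, so this only simulates the idea; it is not a security boundary.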

The final part will:

  1. Introduce the DataSHIELD architecture
  2. Put the aforementioned principles into the context of a non-disclosive federated analysis system
  3. Demonstrate some DataSHIELD analysis of Covid-19 data
  4. Demonstrate some gene expression analysis using DataSHIELD

Let's explore

In the code section of this repository, you will find five R projects. Each of them can be downloaded and used alongside the wiki pages. The table below shows how each project relates to each tutorial in the wiki. You will need to clone or download the repository to have access to these projects. You may need to join GitHub to continue with the tutorial.

| R project | Wiki tutorial | Author |
|---|---|---|
| A. TutorialDisclosive | Disclosive simulation | P. Ryser-Welch |
| B. TutorialAnonymised | Anonymisation and synthetic data | P. Ryser-Welch |
| C. TutorialParser | Limiting access to the data | P. Ryser-Welch |
| D. TutorialClientFunction | Two types of analysis | P. Ryser-Welch |
| E. TutorialThreshold | Limiting inference | P. Ryser-Welch |
| F. Analysing Covid-19 data | Demonstration of a DataSHIELD analysis | A. Westerberg |
| G. Omics Data | Gene expression analysis | L. Abarrategui |
| H. Visualisation | Introduction to non-disclosive visualisation techniques | D. Avraam |
| None | An overview of DataSHIELD | P. Burton, P. Ryser-Welch, S. Wheater, M. Murtag, Yannick Macron |
| None | Installing DataSHIELD | A. Westerberg |

All the tutorials lettered A to E use three types of scripts:

  • main: The code that can be executed to demonstrate some analyses and how disclosive they are.
  • client: The code that simulates any client code.
  • server: The code that simulates the server code.

These simulations have been created to demonstrate certain issues with some cloud and federated analysis systems. The final part of the workshop will explain how these ideas and concepts apply to DataSHIELD.

DataSHIELD libraries and other elements can be installed from this page: Installing DataSHIELD.

Once you have installed DataSHIELD, an introductory tutorial can then be completed.

Contributors and presenters

This workshop would not have happened without the DataSHIELD team. In particular,

DataSHIELD team

Images of the DataSHIELD team