Home

eRum 2020 logo

Non-disclosive Federated Analysis in R

The analysis of individual person-level data is often crucial in the biomedical sciences. However, ethical, legal, and regulatory restrictions often impede the sharing of individual-level data, for understandable and socially responsible reasons, particularly when those data are sensitive, as is often the case with health data. This situation creates important challenges for the development of appropriate architectures for federated analysis systems, the associated R programming techniques, and the visualisation of the data.

This workshop first introduces how an architectural approach can be used to build non-disclosive federated analysis systems. Second, we present practical exercises that illustrate non-disclosive programming techniques in R. Finally, we discuss non-disclosive visualisation techniques and provide some concrete examples.

Structure of workshop

Because participants bring a wide breadth of expertise, the workshop begins by exploring:

The differences between individual person-level data and other types of data.
The technological framework in place to prevent the disclosure of individual person-level data.

Participants will then simulate the following, using an object-oriented programming paradigm:

A disclosive client-server architecture
Basic anonymisation and creation of synthetic data
The use of a "server parser" to limit the access to the data
Virtually joined and server-level analysis
The use of thresholds to prevent inferential reconstruction of datasets
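The "server parser" and threshold ideas in the list above can be sketched in a few lines of base R. This is a minimal illustration only, not the code used in the tutorials or in DataSHIELD: the class name `SimServer`, its fields, and the method `call_function` are all hypothetical, and the threshold value of 5 is an arbitrary assumption.

```r
library(methods)

# Hypothetical simulated server (names are assumptions, not the tutorial code).
# Clients are expected to reach the data only through call_function().
SimServer <- setRefClass(
  "SimServer",
  fields = list(
    records   = "data.frame",  # individual-level data held by the server
    allowed   = "character",   # whitelist checked by the "parser"
    threshold = "numeric"      # minimum number of records behind a result
  ),
  methods = list(
    call_function = function(fn_name, column) {
      # Parser step: reject any request that is not on the whitelist.
      if (!(fn_name %in% allowed)) {
        stop("function not permitted: ", fn_name)
      }
      if (!(column %in% names(records))) {
        stop("unknown column: ", column)
      }
      values <- records[[column]]
      values <- values[!is.na(values)]
      # Threshold step: withhold results based on too few records.
      if (length(values) < threshold) {
        stop("fewer than ", threshold, " records; result withheld")
      }
      # Only aggregates are ever returned to the caller.
      switch(fn_name,
             mean   = mean(values),
             length = length(values))
    }
  )
)

server <- SimServer$new(
  records   = data.frame(age = c(34, 51, 47, 29, 62, 45)),
  allowed   = c("mean", "length"),
  threshold = 5
)

# A permitted aggregate is returned.
mean_age <- server$call_function("mean", "age")

# A request outside the whitelist is refused with an error message.
refused <- tryCatch(
  server$call_function("print", "age"),
  error = function(e) conditionMessage(e)
)
```

In this single-process sketch nothing stops R code from reading `server$records` directly; in a real federated system the separation comes from the data living on a different machine, with the parser and threshold applied on that machine.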

The final part will:

Introduce the DataSHIELD architecture
Place the aforementioned principles in the context of a non-disclosive federated analysis system
Demonstrate some DataSHIELD analysis of Covid-19 data
Demonstrate some gene expression analysis using DataSHIELD

Let's explore

In the code section of this repository, you will find five R projects. Each of them can be downloaded and used alongside the wiki pages. The table below shows how each project relates to each tutorial in the wiki. You will need to clone or download the repository to have access to these projects. You may need to join GitHub to continue with the tutorial.

R project	Wiki tutorial	Author
A. TutorialDisclosive	Disclosive simulation	P. Ryser-Welch
B. TutorialAnonymised	Anonymisation and synthetic data	P. Ryser-Welch
C. TutorialParser	Limiting Access to the data	P. Ryser-Welch
D. TutorialClientFunction	Two types of analysis	P. Ryser-Welch
E. TutorialThreshold	Limiting inference	P. Ryser-Welch
F. Analysing Covid-19 data	Demonstration of a DataSHIELD analysis	A. Westerberg
G. Omics Data	Gene expression analysis	L. Abarrategui
H. Visualisation	Introduction to non-disclosive visualisation techniques	D. Avraam
None	An overview of DataSHIELD	P. Burton, P. Ryser-Welch, S. Wheater, M. Murtagh, Y. Marcon
None	Installing DataSHIELD	A. Westerberg

All tutorials labelled with a letter from A to E use three types of scripts:

main: The code that can be executed to demonstrate some analyses and how disclosive they are.
client: The code that simulates any client code.
server: The code that simulates the server code.
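The three roles can be sketched in a single file as follows. This is a hedged illustration under stated assumptions, not the code from the R projects: the helper names `make_server`, `server_level_means`, and `pooled_mean` are hypothetical, and the numbers are made up. It also contrasts a server-level analysis (one result per server) with a virtually joined analysis (one result across all servers, computed from aggregates only).

```r
# --- server: each simulated server returns only aggregates ----------------
make_server <- function(values) {
  # 'values' is captured by the closure; summary() hands back
  # only a sum and a count, never the individual records.
  list(
    summary = function() {
      list(sum = sum(values), n = as.numeric(length(values)))
    }
  )
}

# --- client: requests aggregates and combines them ------------------------
# Server-level analysis: one mean per server.
server_level_means <- function(servers) {
  vapply(servers, function(s) {
    agg <- s$summary()
    agg$sum / agg$n
  }, numeric(1))
}

# Virtually joined analysis: one mean over all servers, as if the
# datasets had been stacked, using only the sums and counts.
pooled_mean <- function(servers) {
  aggs <- lapply(servers, function(s) s$summary())
  total_sum <- sum(vapply(aggs, function(a) a$sum, numeric(1)))
  total_n   <- sum(vapply(aggs, function(a) a$n, numeric(1)))
  total_sum / total_n
}

# --- main: runs the demonstration ------------------------------------------
servers <- list(
  study1 = make_server(c(5.1, 6.3, 4.8, 5.9)),
  study2 = make_server(c(6.0, 6.4, 5.5, 6.1, 5.8, 6.2))
)

per_study <- server_level_means(servers)
pooled    <- pooled_mean(servers)

print(per_study)
print(pooled)
```

The pooled mean weights each server by its number of records, so it generally differs from the plain average of the per-server means. This sketch applies no minimum-count check, so a server holding very few records could still leak information through its sum and count.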

These simulations have been created to demonstrate certain issues with some cloud-based and federated analysis systems. The final part of the workshop explains how these ideas and concepts apply to DataSHIELD.

DataSHIELD libraries and other elements can be installed from this page: Installing DataSHIELD.

Once you have installed DataSHIELD, an introductory tutorial can then be completed.

Contributors and presenters

This workshop would not have happened without the DataSHIELD team. In particular,

DataSHIELD team

Images of the DataSHIELD team

Patricia Ryser-Welch (DataSHIELD Team) DataSHIELD website
