Home
The analysis of individual person-level data is often crucial in the biomedical and social sciences. However, ethical, legal, and regulatory restrictions often impede the sharing of individual-level data, particularly when those data are sensitive, as is often the case with health data. These impediments are significant, though understandable and socially responsible. This situation creates important challenges for the development of appropriate architectures for federated analysis systems, the associated R programming techniques, and the visualization of the data.
This workshop first shows how an architectural approach can be used to build non-disclosive federated analysis systems. Second, practical exercises illustrate the concepts behind non-disclosive programming techniques in R. Finally, we discuss non-disclosive visualization techniques and provide some concrete examples.
Because the participants of our workshop bring a broad range of expertise, you will first explore:
- The differences between individual person-level data and other types of data
- The technological framework in place to prevent the disclosure of individual person-level data
Participants will then simulate the following, using an object-oriented programming paradigm:
- A disclosive client-server architecture
- Basic anonymisation and creation of synthetic data
- The use of a "server parser" to limit access to the data
- Virtually-joined and server-level analysis
- The use of thresholds to prevent inferential reconstruction of datasets
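The threshold idea in the last bullet can be sketched in a few lines of R. This is only an illustration, not code from the tutorial projects: the `SimServer` class, its `safe_mean` method, the `min_value` filter, and the threshold of 5 are all hypothetical names and values chosen for the example.

```r
library(methods)

# Hypothetical simulated server: it holds person-level records and only
# releases an aggregate when enough records contribute to it.
SimServer <- setRefClass(
  "SimServer",
  fields = list(records = "data.frame", threshold = "numeric"),
  methods = list(
    safe_mean = function(column, min_value = -Inf) {
      values <- records[[column]]
      # Keep non-missing values at or above the requested lower bound.
      values <- values[!is.na(values) & values >= min_value]
      if (length(values) < threshold) {
        stop("result withheld: fewer than ", threshold, " records")
      }
      mean(values)
    }
  )
)

server <- SimServer$new(
  records = data.frame(age = c(34, 51, 47, 29, 62, 58)),
  threshold = 5
)

# Six records contribute, so the mean is released.
overall <- server$safe_mean("age")

# Only one record is above 60; answering would reveal that person's age,
# so the server refuses and the client only sees the error message.
single <- tryCatch(
  server$safe_mean("age", min_value = 60.5),
  error = function(e) conditionMessage(e)
)
```

Without the threshold, the second request would return the exact age of a single individual, which is the kind of inferential reconstruction the tutorial sets out to prevent.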
The final part will:
- Introduce the DataSHIELD architecture
- Place the aforementioned principles in the context of a non-disclosive federated analysis system
- Demonstrate some DataSHIELD analysis of Covid-19 data
- Demonstrate some gene expression analysis using DataSHIELD
In the code section of this repository, you will find five R projects. Each of them can be downloaded and used alongside the wiki pages. The table below shows how each project relates to each tutorial in the wiki. You will need to clone or download the repository to access these projects. You may need to join GitHub to continue with the tutorial.
R project | Wiki tutorial | Author |
---|---|---|
A. TutorialDisclosive | Disclosive simulation | P. Ryser-Welch |
B. TutorialAnonymised | Anonymisation and synthetic data | P. Ryser-Welch |
C. TutorialParser | Limiting Access to the data | P. Ryser-Welch |
D. TutorialClientFunction | Two types of analysis | P. Ryser-Welch |
E. TutorialThreshold | Limiting inference | P. Ryser-Welch |
F. Analysing Covid-19 data | Demonstration of a DataSHIELD analysis | A. Westerberg |
G. Omics Data | Gene expression analysis | L. Abarrategui |
H. Visualisation | Introduction to non-disclosive visualisation techniques | D. Avraam |
None | An overview of DataSHIELD | P. Burton, P. Ryser-Welch, S. Wheater, M. Murtag, Yannick Macron |
None | Installing DataSHIELD | A. Westerberg |
All the tutorials labelled with a letter from A to E use three types of scripts:
- main: The code that can be executed to demonstrate some analyses and their disclosivity.
- client: The code that simulates any client code.
- server: The code that simulates the server code.
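The split between the three script types might look like the sketch below, shown here in a single file for brevity. It is a simplified illustration rather than the code of the tutorial projects: `server_data`, `server_allowed`, `server_execute`, and `client_mean` are hypothetical names, and the parser rule (one whitelisted function applied to one known column) is an assumption made for the example.

```r
# --- server: owns the data and checks every request with a small parser ---
server_data <- data.frame(bmi = c(22.1, 27.4, 31.0, 24.8, 29.3))
server_allowed <- c("mean", "length")  # assumed whitelist of aggregates

server_execute <- function(call_text) {
  parsed <- parse(text = call_text)
  expr <- if (length(parsed) == 1) parsed[[1]] else NULL
  # Accept only: one call, to a whitelisted function, on one known column.
  ok <- is.call(expr) &&
    is.symbol(expr[[1]]) &&
    as.character(expr[[1]]) %in% server_allowed &&
    length(expr) == 2 &&
    is.symbol(expr[[2]]) &&
    as.character(expr[[2]]) %in% names(server_data)
  if (!ok) {
    stop("call rejected by server parser")
  }
  # Evaluate with the data as the only visible variables.
  eval(expr, envir = server_data, enclos = baseenv())
}

# --- client: builds requests but never touches server_data directly ---
client_mean <- function(column) {
  server_execute(paste0("mean(", column, ")"))
}

# --- main: runs an analysis and probes its disclosivity ---
result <- client_mean("bmi")

# Asking for the raw column is not an aggregate call, so it is refused.
blocked <- tryCatch(
  server_execute("bmi"),
  error = function(e) conditionMessage(e)
)
```

In this sketch the client only ever receives the aggregate or an error message, while the person-level values stay in the server section.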
These simulations have been created to demonstrate certain issues with some cloud and federated analysis systems; the final part of the workshop will explain how these ideas and concepts apply to DataSHIELD.
DataSHIELD libraries and other elements can be installed from this page: Installing DataSHIELD.
Once you have installed DataSHIELD, an introductory tutorial can then be completed.
This workshop would not have happened without the DataSHIELD team. In particular,
Patricia Ryser-Welch (DataSHIELD Team) DataSHIELD website