Commit 0bd2b2d

Merge pull request #94 from surveilr/ani/elt-dev-transform

feat: ELT Feature using CTR, DFA, DCLP1 Datasets #29

sijucj authored Oct 25, 2024
2 parents bf592be + ab8c42f commit 0bd2b2d

Showing 19 changed files with 5,778 additions and 176 deletions.
132 changes: 76 additions & 56 deletions lib/service/diabetes-research-hub/README.md
@@ -9,12 +9,13 @@
can view the results directly on your local system. The following steps will
guide you through converting your files, performing de-identification, V&V, and
verifying the data all within your own environment.


# Try Outside This Repo

### Requirements for Previewing the Edge UI:

1. **Surveilr Tool** (use the latest version of `surveilr`)
2. **Deno Runtime** (requires `deno` v2.0):
[Deno Installation Guide](https://docs.deno.com/runtime/manual/getting_started/installation/)

Installation steps may vary depending on your operating system.
@@ -43,27 +44,46 @@

### Step 4: Execute the commands below

1. **Run the ingestion**

```bash
# Use the command below if all the files in the study files folder are of type CSV
surveilr ingest files -r <study-files-folder-name>/ && surveilr orchestrate transform-csv
```

- Replace `<study-files-folder-name>` with the name of your folder containing
  all the CSV files to be converted.

**Example:**

```bash
surveilr ingest files -r ctr-study-files/ && surveilr orchestrate transform-csv
```
2. **Transformation and UI execution**

Example: the dataset pattern is UVA DCLP1.

Execute:

```bash
surveilr shell ./dataset-specific-package/dclp1-uva-study.sql.ts
```
3. **Server UI execution**

Execute:

```bash
surveilr web-ui --port 9000
```
- After the above command completes, launch your browser and go to
  [http://localhost:9000/drh/index.sql](http://localhost:9000/drh/index.sql).

This method provides a streamlined approach to complete the process and see
the results quickly.

Note: Depending on the dataset pattern, the folder name used in step 1 and the `.sql.ts` package file invoked in step 2 will change; for example, UVA DCLP1 uses `dataset-specific-package/dclp1-uva-study.sql.ts`, CTR Anderson (2016) uses `dataset-specific-package/anderson.sql.ts`, and Detrended Fluctuation Analysis (Colas 2019) uses `dataset-specific-package/detrended-analysis.sql.ts` (see the development section below).

### Step 5: Verify the Verification and Validation Results in the UI

- Check the section below in the UI and perform the steps shown in the second image.

@@ -80,84 +100,84 @@

<p align="center"><b>Image 2</b></p>

# Try It Out in This Repo (For Development Activities)

Each new dataset type requires manual review to assess the study files, determine the mode of ingestion through Surveilr, and prepare a transformation SQL for the DRH views. For every dataset, a new transform SQL for the study, a combined CGM tracing view generator, and a `<studyName>.sql.ts` package must be created and maintained.

The process isn't automated; the appropriate ingestion and transformation commands in Surveilr must be selected manually based on the file types in the dataset folder.
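
To make the shape of these per-dataset artifacts concrete, below is a minimal, hypothetical sketch of the kind of SQL a combined tracing view generator emits. The table and column names (`uniform_resource_case_1`, `date_time`, `cgm_value`) are illustrative assumptions rather than the actual DRH schema, and the real `<studyName>.sql.ts` packages under `dataset-specific-package/` do considerably more (transformation SQL generation, SQLPage setup).

```ts
// Hypothetical sketch only: emits a combined CGM tracing view as SQL text.
// Table and column names are illustrative assumptions, not the real DRH schema.
const participantTables = [
  "uniform_resource_case_1",
  "uniform_resource_case_2",
];

// UNION ALL each participant's CGM table into a single view, tagging rows
// with a participant_id so tracings can be compared across the study.
const combinedCgmTracingSql = `
CREATE VIEW IF NOT EXISTS combined_cgm_tracing AS
${participantTables
  .map((t, i) =>
    `SELECT '${i + 1}' AS participant_id, date_time, cgm_value FROM ${t}`
  )
  .join("\nUNION ALL\n")};
`;

// Print the generated SQL so it can be piped into a shell (for example,
// `surveilr shell` or sqlite3) and executed against the RSSD.
console.log(combinedCgmTracingSql);
```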

### Reference Sample Files

The following sample files are available in the repository:

- `/service/diabetes-research-hub/study-files.zip`
- `/service/diabetes-research-hub/ctr-study-files.zip`
- `/service/diabetes-research-hub/de-trended-analysis-files.zip`

Each of these archives contains a different dataset.

### Preparing the Directory

First, prepare the directory by copying or extracting the required sample files into the appropriate folder:

```bash
$ cd service/diabetes-research-hub
```
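
If you prefer to script the extraction, the following is a minimal sketch that shells out to the system `unzip` tool via `Deno.Command`; it assumes `unzip` is installed and that the archives have been copied into the current directory.

```ts
// Minimal sketch: extract each sample archive into a folder of the same name.
// Assumes the system `unzip` tool is installed and on the PATH.
const archives = [
  "study-files.zip",
  "ctr-study-files.zip",
  "de-trended-analysis-files.zip",
];

for (const archive of archives) {
  const target = archive.replace(".zip", "");
  // -o: overwrite existing files; -d: extract into the target directory
  const { code } = await new Deno.Command("unzip", {
    args: ["-o", archive, "-d", target],
  }).output();
  console.log(`${archive} -> ${target}/ (exit code ${code})`);
}
```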

Next, download and install [Surveilr](https://docs.opsfolio.com/surveilr/how-to/installation-guide/) into this directory, then ingest and transform the data.

### Removing the Old RSSD

When switching between different datasets, be sure to remove the old RSSD before ingesting the new dataset:

```bash
$ rm resource-surveillance.sqlite.db
```

### Ingesting and Transforming Data

Depending on the dataset you're working with, use the appropriate folder name in the `surveilr ingest` command. Below are examples for each dataset:
```bash
# Ingest and transform the CSV files in the "study-files/" directory, creating resource-surveillance.sqlite.db
$ surveilr ingest files -r study-files/ && surveilr orchestrate transform-csv
# Ingest and transform the CSV files in the "ctr-study-files/" directory
$ surveilr ingest files -r ctr-study-files/ && surveilr orchestrate transform-csv
# Ingest and transform the CSV files in the "de-trended-analysis-files/" directory
$ surveilr ingest files -r de-trended-analysis-files/ && surveilr orchestrate transform-csv
```
### Running the SQL Package and Web UI
For each dataset, a custom `<packagefilename>.sql.ts` is created that performs the dataset-specific transformation SQL generation and SQLPage setup:
```bash
# For the DCLP1 study dataset
$ surveilr shell ./dataset-specific-package/dclp1-uva-study.sql.ts
```
```bash
# For the CTR Anderson (2016) study dataset
$ surveilr shell ./dataset-specific-package/anderson.sql.ts
```
```bash
# For the Detrended Fluctuation Analysis (Colas 2019) study dataset
$ surveilr shell ./dataset-specific-package/detrended-analysis.sql.ts
```
### Starting the Server
```bash
$ surveilr web-ui --port 9000
```
You can now browse the Surveilr Web UI:
- **http://localhost:9000/**: Main Surveilr Web UI
- **http://localhost:9000/drh/index.sql**: DRH-specific UI
@@ -0,0 +1,26 @@
// Define the path for the output CSV file
const outputFilePath = './detrended-fluctation-analysis/S1/detrended-fluctation-analysis-csv-dataset/supporting-files/cgm_file_metadata.csv';

// Function to generate CGM metadata
const generateCgmMetadata = async (numRecords: number) => {
  const csvHeader = 'metadata_id,devicename,device_id,source_platform,patient_id,file_name,file_format,file_upload_date,data_start_date,data_end_date,study_id\n';
  let csvContent = csvHeader;

  for (let i = 1; i <= numRecords; i++) {
    const metadataId = `MD-${String(i).padStart(3, '0')}`; // Format as MD-001, MD-002, ...
    const devicename = 'Medtronic MiniMed';
    const patientId = `${i}`; // Patient ID as 1, 2, ...
    const fileName = `case ${i}`; // File name as case 1, case 2, ...

    // Create a CSV line with the specified structure
    const csvLine = `${metadataId},${devicename},,,"${patientId}",${fileName},csv,,,,DFA\n`;
    csvContent += csvLine;
  }

  // Write the CSV content to a file
  await Deno.writeTextFile(outputFilePath, csvContent.trim());
  console.log(`CSV file generated at: ${outputFilePath}`);
};

// Generate 209 records
await generateCgmMetadata(209);
@@ -0,0 +1,61 @@
import * as path from 'node:path';
import { copySync, ensureDirSync } from "https://deno.land/std@0.203.0/fs/mod.ts"; // Deno's fs utilities
import { parse } from "https://deno.land/std@0.203.0/csv/mod.ts"; // Deno CSV parser

// Function to clean a CSV file (removing any 'Unnamed' columns)
async function cleanCsv(filePath: string, outputFolder: string): Promise<void> {
  try {
    // Read and parse the CSV file, using the first row as the column headers
    const fileContent = await Deno.readTextFile(filePath);
    const rows = parse(fileContent, { skipFirstRow: true }) as Record<string, string>[];

    // Drop every column whose header starts with 'Unnamed'
    const cleanedRows = rows.map((row) =>
      Object.fromEntries(
        Object.entries(row).filter(([key]) => !key.startsWith('Unnamed'))
      )
    );

    // Rebuild the CSV: header line first, then the cleaned data rows
    const filename = path.basename(filePath);
    const newFilePath = path.join(outputFolder, filename);
    const headers = cleanedRows.length > 0 ? Object.keys(cleanedRows[0]) : [];
    const csvContent = [
      headers.join(','),
      ...cleanedRows.map((row) => Object.values(row).join(',')),
    ].join('\n');

    await Deno.writeTextFile(newFilePath, csvContent);
    console.log(`Successfully cleaned and moved '${filePath}' to '${newFilePath}'`);
  } catch (error) {
    console.error(`Error processing file '${filePath}': ${(error as Error).message}`);
  }
}

// Function to rename the file by replacing spaces with underscores
function renameFileWithNoSpaces(filePath: string, outputFolder: string): string {
  const filename = path.basename(filePath).replace(/ /g, '_');
  const newFilePath = path.join(outputFolder, filename);
  copySync(filePath, newFilePath);

  console.log(`Renamed and moved file from '${filePath}' to '${newFilePath}'`);
  return newFilePath;
}

// Main function to process files in the input folder
async function processFilesInFolder(inputFolder: string, outputSubfolderName: string = 'detrended-fluctation-analysis-csv-dataset') {
  const outputFolder = path.join(inputFolder, outputSubfolderName);
  ensureDirSync(outputFolder); // Ensure the output folder exists

  for (const entry of Deno.readDirSync(inputFolder)) {
    if (entry.isFile && entry.name.endsWith('.csv')) {
      const filePath = path.join(inputFolder, entry.name);

      // Rename and move the file to the new folder
      const newFilePath = renameFileWithNoSpaces(filePath, outputFolder);

      // Clean the CSV after renaming and moving
      await cleanCsv(newFilePath, outputFolder);
    }
  }
}

// Example usage
const inputFolder = './detrended-fluctation-analysis/S1'; // Replace with your folder path
await processFilesInFolder(inputFolder);
@@ -0,0 +1,40 @@
import { ensureDirSync } from "https://deno.land/std@0.203.0/fs/mod.ts";
import * as path from "https://deno.land/std@0.203.0/path/mod.ts";
import Papa from "https://esm.sh/papaparse";

const folderPath = './ctr-anderson';
const outputFolder = './ctr-anderson/ctr-anderson-with-comma';

// Ensure output folder exists
ensureDirSync(outputFolder);

// Function to convert pipe-delimited text files to comma-delimited CSV
async function convertTxtToCsvPipe(filePath: string, newFilePath: string) {
  // Read the content of the file
  const fileContent = await Deno.readTextFile(filePath);

  // Parse the text file with pipe delimiter
  const results = Papa.parse(fileContent, {
    delimiter: "|", // Specify pipe delimiter
    skipEmptyLines: true,
    header: false, // Change to true if the first line is a header
  });

  // Convert parsed rows back to CSV with comma delimiter (Papa handles quoting)
  const csvContent = Papa.unparse(results.data);

  // Write to a new CSV file
  await Deno.writeTextFile(newFilePath, csvContent);
  console.log(`Converted ${path.basename(filePath)} to ${path.basename(newFilePath)}`);
}

// Iterate through all files in the folder
for await (const entry of Deno.readDir(folderPath)) {
  if (entry.isFile && entry.name.endsWith('.txt')) {
    const filePath = path.join(folderPath, entry.name);
    const newFileName = entry.name.replace('.txt', '.csv');
    const newFilePath = path.join(outputFolder, newFileName);

    await convertTxtToCsvPipe(filePath, newFilePath);
  }
}
@@ -0,0 +1,47 @@
import { ensureDirSync } from "https://deno.land/std@0.203.0/fs/mod.ts";
import * as path from "https://deno.land/std@0.203.0/path/mod.ts";

// Define the input text file and output CSV file paths
const inputFilePath = './detrended-fluctation-analysis/clinical_data.txt'; // Input text file path
const outputFolder = './detrended-fluctation-analysis/S1/detrended-fluctation-analysis-csv-dataset'; // Output folder path
const outputFilePath = path.join(outputFolder, 'clinical_data.csv'); // Output CSV file path

// Ensure output directory exists
ensureDirSync(outputFolder);

// Function to convert whitespace-delimited text data to comma-delimited CSV
async function convertTextToCsvCommaDelimiter(inputFilePath: string) {
  // Read the input text file
  const text = await Deno.readTextFile(inputFilePath);

  // Split the input text into lines
  const lines = text.split('\n');

  // Prepare the CSV data array
  const csvData: string[] = [];

  // Set the header
  const headers = ['pid', 'gender', 'age', 'BMI', 'glycaemia', 'HbA1c', 'follow.up', 'T2DM'];
  csvData.push(headers.join(',')); // Join headers with comma

  // Process each line after the header
  for (const line of lines.slice(1)) {
    if (line.trim() === "") continue; // Skip empty lines

    // Strip double quotes and split on whitespace; the first column is already the pid
    const row = line.replace(/"/g, '').split(/\s+/);

    // Join the columns with commas and add to csvData
    csvData.push(row.join(','));
  }

  // Write the CSV data to a file
  await Deno.writeTextFile(outputFilePath, csvData.join('\n'));
  console.log(`Successfully converted to CSV: ${outputFilePath}`);
}

// Convert the input text file to CSV
await convertTextToCsvCommaDelimiter(inputFilePath);