Commit 0bd2b2d

Merge pull request #94 from surveilr/ani/elt-dev-transform

feat: ELT Feature using CTR, DFA, DCLP1 Datasets #29

sijucj authored Oct 25, 2024
2 parents bf592be + ab8c42f commit 0bd2b2d

Showing 19 changed files with 5,778 additions and 176 deletions.
132 changes: 76 additions & 56 deletions lib/service/diabetes-research-hub/README.md
@@ -9,12 +9,13 @@
can view the results directly on your local system. The following steps will
guide you through converting your files, performing de-identification, V&V, and
verifying the data all within your own environment.


# Try Outside This Repo

### Requirements for Previewing the Edge UI:

1. **Surveilr Tool** (use the latest version of `surveilr`)
2. **Deno Runtime** (requires `deno` v2.0):
[Deno Installation Guide](https://docs.deno.com/runtime/manual/getting_started/installation/)

Installation steps may vary depending on your operating system.
@@ -43,27 +44,46 @@

### Step 4: Execute the commands below

1. **Run the ingestion**

```bash
# Use the command below if all the files in the study files folder are of type CSV
surveilr ingest files -r <study-files-folder-name>/ && surveilr orchestrate transform-csv
```

- Replace `<study-files-folder-name>` with the name of your folder containing
  all the CSV files to be converted.

**Example:**

```bash
surveilr ingest files -r ctr-study-files/ && surveilr orchestrate transform-csv
```
2. **Transformation and UI execution**

Example: the dataset pattern is UVA DCLP1.

Execute:

```bash
surveilr shell ./dataset-specific-package/dclp1-uva-study.sql.ts
```
3. **Server UI execution**

Execute:

```bash
surveilr web-ui --port 9000
```
- After the above command completes, launch your browser and go to
  [http://localhost:9000/drh/index.sql](http://localhost:9000/drh/index.sql).

This method provides a streamlined approach to complete the process and see
the results quickly.

Note: Depending on the dataset pattern, the folder name used in step 1 and the `.sql.ts` package file invoked in step 2 will change; for example, UVA DCLP1 uses `dataset-specific-package/dclp1-uva-study.sql.ts`, CTR Anderson (2016) uses `dataset-specific-package/anderson.sql.ts`, and Detrended Fluctuation Analysis (Colas 2019) uses `dataset-specific-package/detrended-analysis.sql.ts` (see the development section below).

### Step 5: Verify the Verification and Validation Results in the UI

- Check the section below in the UI and perform the steps shown in the second image.

@@ -80,84 +100,84 @@

<p align="center"><b>Image 2</b></p>

# Try It Out in This Repo (For Development Activities)

Each new dataset type requires manual review to assess the study files, determine the mode of ingestion through Surveilr, and prepare a transformation SQL for the DRH views. For every dataset, a new transform SQL for the study, a combined CGM tracing view generator, and a `<studyName>.sql.ts` package must be created and maintained.

The process isn't automated; the appropriate ingestion and transformation commands in Surveilr must be selected manually based on the file types in the dataset folder.
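
To make the shape of these per-dataset artifacts concrete, below is a minimal, hypothetical sketch of the kind of SQL a combined tracing view generator emits. The table and column names (`uniform_resource_case_1`, `date_time`, `cgm_value`) are illustrative assumptions rather than the actual DRH schema, and the real `<studyName>.sql.ts` packages under `dataset-specific-package/` do considerably more (transformation SQL generation, SQLPage setup).

```ts
// Hypothetical sketch only: emits a combined CGM tracing view as SQL text.
// Table and column names are illustrative assumptions, not the real DRH schema.
const participantTables = [
  "uniform_resource_case_1",
  "uniform_resource_case_2",
];

// UNION ALL each participant's CGM table into a single view, tagging rows
// with a participant_id so tracings can be compared across the study.
const combinedCgmTracingSql = `
CREATE VIEW IF NOT EXISTS combined_cgm_tracing AS
${participantTables
  .map((t, i) =>
    `SELECT '${i + 1}' AS participant_id, date_time, cgm_value FROM ${t}`
  )
  .join("\nUNION ALL\n")};
`;

// Print the generated SQL so it can be piped into a shell (for example,
// `surveilr shell` or sqlite3) and executed against the RSSD.
console.log(combinedCgmTracingSql);
```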

### Reference Sample Files

The following sample files are available in the repository:

- `/service/diabetes-research-hub/study-files.zip`
- `/service/diabetes-research-hub/ctr-study-files.zip`
- `/service/diabetes-research-hub/de-trended-analysis-files.zip`

Each of these archives contains a different dataset.

### Preparing the Directory

First, prepare the directory by copying or extracting the required sample files into the appropriate folder:

```bash
$ cd service/diabetes-research-hub
```
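
If you prefer to script the extraction, the following is a minimal sketch that shells out to the system `unzip` tool via `Deno.Command`; it assumes `unzip` is installed and that the archives have been copied into the current directory.

```ts
// Minimal sketch: extract each sample archive into a folder of the same name.
// Assumes the system `unzip` tool is installed and on the PATH.
const archives = [
  "study-files.zip",
  "ctr-study-files.zip",
  "de-trended-analysis-files.zip",
];

for (const archive of archives) {
  const target = archive.replace(".zip", "");
  // -o: overwrite existing files; -d: extract into the target directory
  const { code } = await new Deno.Command("unzip", {
    args: ["-o", archive, "-d", target],
  }).output();
  console.log(`${archive} -> ${target}/ (exit code ${code})`);
}
```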

Next, download and install [Surveilr](https://docs.opsfolio.com/surveilr/how-to/installation-guide/) into this directory, then ingest and transform the data.

### Removing the Old RSSD

When switching between different datasets, be sure to remove the old RSSD before ingesting the new dataset:

```bash
$ rm resource-surveillance.sqlite.db
```

### Ingesting and Transforming Data

Depending on the dataset you're working with, use the appropriate folder name in the `surveilr ingest` command. Below are examples for each dataset:
```bash
# Ingest and transform the CSV files in the "study-files/" directory, creating resource-surveillance.sqlite.db
$ surveilr ingest files -r study-files/ && surveilr orchestrate transform-csv
# Ingest and transform the CSV files in the "ctr-study-files/" directory
$ surveilr ingest files -r ctr-study-files/ && surveilr orchestrate transform-csv
# Ingest and transform the CSV files in the "de-trended-analysis-files/" directory
$ surveilr ingest files -r de-trended-analysis-files/ && surveilr orchestrate transform-csv
```
### Running the SQL Package and Web UI
For each dataset, a custom `<packagefilename>.sql.ts` is created that performs the dataset-specific transformation SQL generation and SQLPage setup:
```bash
# For the DCLP1 study dataset
$ surveilr shell ./dataset-specific-package/dclp1-uva-study.sql.ts
```
```bash
# For the CTR Anderson (2016) study dataset
$ surveilr shell ./dataset-specific-package/anderson.sql.ts
```
```bash
# For the Detrended Fluctuation Analysis (Colas 2019) study dataset
$ surveilr shell ./dataset-specific-package/detrended-analysis.sql.ts
```
### Starting the Server
```bash
$ surveilr web-ui --port 9000
```
You can now browse the Surveilr Web UI:
- **http://localhost:9000/**: Main Surveilr Web UI
- **http://localhost:9000/drh/index.sql**: DRH-specific UI
@@ -0,0 +1,26 @@
// Define the path for the output CSV file
const outputFilePath = './detrended-fluctation-analysis/S1/detrended-fluctation-analysis-csv-dataset/supporting-files/cgm_file_metadata.csv';

// Function to generate CGM metadata
const generateCgmMetadata = async (numRecords: number) => {
  const csvHeader = 'metadata_id,devicename,device_id,source_platform,patient_id,file_name,file_format,file_upload_date,data_start_date,data_end_date,study_id\n';
  let csvContent = csvHeader;

  for (let i = 1; i <= numRecords; i++) {
    const metadataId = `MD-${String(i).padStart(3, '0')}`; // Format as MD-001, MD-002, ...
    const devicename = 'Medtronic MiniMed';
    const patientId = `${i}`; // Patient ID as 1, 2, ...
    const fileName = `case ${i}`; // File name as case 1, case 2, ...

    // Create a CSV line with the specified structure
    const csvLine = `${metadataId},${devicename},,,"${patientId}",${fileName},csv,,,,DFA\n`;
    csvContent += csvLine;
  }

  // Write the CSV content to a file
  await Deno.writeTextFile(outputFilePath, csvContent.trim());
  console.log(`CSV file generated at: ${outputFilePath}`);
};

// Generate 209 records
await generateCgmMetadata(209);
@@ -0,0 +1,61 @@
import * as path from 'node:path';
import { copySync, ensureDirSync } from "https://deno.land/std@0.203.0/fs/mod.ts"; // Deno's fs utilities
import { parse } from "https://deno.land/std@0.203.0/csv/mod.ts"; // Deno CSV parser

// Function to clean a CSV file (removing any 'Unnamed' columns)
async function cleanCsv(filePath: string, outputFolder: string): Promise<void> {
  try {
    // Read and parse the CSV file, using the first row as the column headers
    const fileContent = await Deno.readTextFile(filePath);
    const rows = parse(fileContent, { skipFirstRow: true }) as Record<string, string>[];

    // Drop every column whose header starts with 'Unnamed'
    const cleanedRows = rows.map((row) =>
      Object.fromEntries(
        Object.entries(row).filter(([key]) => !key.startsWith('Unnamed'))
      )
    );

    // Rebuild the CSV: header line first, then the cleaned data rows
    const filename = path.basename(filePath);
    const newFilePath = path.join(outputFolder, filename);
    const headers = cleanedRows.length > 0 ? Object.keys(cleanedRows[0]) : [];
    const csvContent = [
      headers.join(','),
      ...cleanedRows.map((row) => Object.values(row).join(',')),
    ].join('\n');

    await Deno.writeTextFile(newFilePath, csvContent);
    console.log(`Successfully cleaned and moved '${filePath}' to '${newFilePath}'`);
  } catch (error) {
    console.error(`Error processing file '${filePath}': ${(error as Error).message}`);
  }
}

// Function to rename the file by replacing spaces with underscores
function renameFileWithNoSpaces(filePath: string, outputFolder: string): string {
  const filename = path.basename(filePath).replace(/ /g, '_');
  const newFilePath = path.join(outputFolder, filename);
  copySync(filePath, newFilePath);

  console.log(`Renamed and moved file from '${filePath}' to '${newFilePath}'`);
  return newFilePath;
}

// Main function to process files in the input folder
async function processFilesInFolder(inputFolder: string, outputSubfolderName: string = 'detrended-fluctation-analysis-csv-dataset') {
  const outputFolder = path.join(inputFolder, outputSubfolderName);
  ensureDirSync(outputFolder); // Ensure the output folder exists

  for (const entry of Deno.readDirSync(inputFolder)) {
    if (entry.isFile && entry.name.endsWith('.csv')) {
      const filePath = path.join(inputFolder, entry.name);

      // Rename and move the file to the new folder
      const newFilePath = renameFileWithNoSpaces(filePath, outputFolder);

      // Clean the CSV after renaming and moving
      await cleanCsv(newFilePath, outputFolder);
    }
  }
}

// Example usage
const inputFolder = './detrended-fluctation-analysis/S1'; // Replace with your folder path
await processFilesInFolder(inputFolder);
@@ -0,0 +1,40 @@
import { ensureDirSync } from "https://deno.land/std@0.203.0/fs/mod.ts";
import * as path from "https://deno.land/std@0.203.0/path/mod.ts";
import Papa from "https://esm.sh/papaparse";

const folderPath = './ctr-anderson';
const outputFolder = './ctr-anderson/ctr-anderson-with-comma';

// Ensure output folder exists
ensureDirSync(outputFolder);

// Function to convert pipe-delimited text files to comma-delimited CSV
async function convertTxtToCsvPipe(filePath: string, newFilePath: string) {
  // Read the content of the file
  const fileContent = await Deno.readTextFile(filePath);

  // Parse the text file with pipe delimiter
  const results = Papa.parse(fileContent, {
    delimiter: "|", // Specify pipe delimiter
    skipEmptyLines: true,
    header: false, // Change to true if the first line is a header
  });

  // Convert parsed rows back to CSV with comma delimiter (Papa handles quoting)
  const csvContent = Papa.unparse(results.data);

  // Write to a new CSV file
  await Deno.writeTextFile(newFilePath, csvContent);
  console.log(`Converted ${path.basename(filePath)} to ${path.basename(newFilePath)}`);
}

// Iterate through all files in the folder
for await (const entry of Deno.readDir(folderPath)) {
  if (entry.isFile && entry.name.endsWith('.txt')) {
    const filePath = path.join(folderPath, entry.name);
    const newFileName = entry.name.replace('.txt', '.csv');
    const newFilePath = path.join(outputFolder, newFileName);

    await convertTxtToCsvPipe(filePath, newFilePath);
  }
}
@@ -0,0 +1,47 @@
import { ensureDirSync } from "https://deno.land/std@0.203.0/fs/mod.ts";
import * as path from "https://deno.land/std@0.203.0/path/mod.ts";

// Define the input text file and output CSV file paths
const inputFilePath = './detrended-fluctation-analysis/clinical_data.txt'; // Input text file path
const outputFolder = './detrended-fluctation-analysis/S1/detrended-fluctation-analysis-csv-dataset'; // Output folder path
const outputFilePath = path.join(outputFolder, 'clinical_data.csv'); // Output CSV file path

// Ensure output directory exists
ensureDirSync(outputFolder);

// Function to convert whitespace-delimited text data to comma-delimited CSV
async function convertTextToCsvCommaDelimiter(inputFilePath: string) {
  // Read the input text file
  const text = await Deno.readTextFile(inputFilePath);

  // Split the input text into lines
  const lines = text.split('\n');

  // Prepare the CSV data array
  const csvData: string[] = [];

  // Set the header
  const headers = ['pid', 'gender', 'age', 'BMI', 'glycaemia', 'HbA1c', 'follow.up', 'T2DM'];
  csvData.push(headers.join(',')); // Join headers with comma

  // Process each line after the header
  for (const line of lines.slice(1)) {
    if (line.trim() === "") continue; // Skip empty lines

    // Strip double quotes and split on whitespace; the first column is already the pid
    const row = line.replace(/"/g, '').split(/\s+/);

    // Join the columns with commas and add to csvData
    csvData.push(row.join(','));
  }

  // Write the CSV data to a file
  await Deno.writeTextFile(outputFilePath, csvData.join('\n'));
  console.log(`Successfully converted to CSV: ${outputFilePath}`);
}

// Convert the input text file to CSV
await convertTextToCsvCommaDelimiter(inputFilePath);