Data integration and visualization tool taking in chemical structures and bioassays for chemical risk assessment (CBRA) for publication
yenlow/chemBioViz
# README instructions for chemical-biological read across (CBRA)
#
# Also in supplemental material of publication:
# Integrative Chemical–Biological Read-Across Approach for Chemical Hazard Classification.
# Yen Low, Alexander Sedykh, Denis Fourches, Alexander Golbraikh, Maurice Whelan, Ivan Rusyn, and Alexander Tropsha.
# Chemical Research in Toxicology 2013 26 (8), 1199-1208.
# DOI: 10.1021/tx400110f http://pubs.acs.org/doi/suppl/10.1021/tx400110f
#
# Yen Low (yenlow@gmail.com)
# 05 June 2013 version 1.0
##########################################################

################ OBJECTIVES OF PROGRAM ###################
1. Builds 4 models for comparison
   a) dual-space kNN of chemical and biological neighbors
   b) single-space kNN of chemical neighbors
   c) single-space kNN of biological neighbors
   d) single-space kNN of hybrid neighbors (i.e. chemical+biological spaces)
2. Generates the radial plots shown in desired_output.pdf
   In each radial plot,
   - nodes represent compounds
   - the central node represents the target compound to be predicted
   - the nearest kbio biological neighbors are positioned left of the vertical axis
   - the nearest kchem chemical neighbors are positioned right of the vertical axis
   - edge length is proportional to the Jaccard distance between the target compound and its neighbor
   - the nearest neighbors are positioned closest to the 12-o'clock position
   - colors denote the observed class of the compound (black=nontoxic,-1; red=toxic,+1)

################ DATA FILES PROVIDED ###################
Data set: Rat acute toxicity (Oral LD50)
   a) ld50_drg_n.xa (chemical descriptors)
   b) ld50_atp_csp_n.xa (biological descriptors)

At least R 2.14 is recommended.
Tested on R 2.14 and 3.0, on Windows 7-8 and Ubuntu 12.10.

The following R packages are required (the script will automatically install them if necessary):
1. boot
2. caret
3. class
4. e1071
5. plotrix
6. ROCR
7. vegan

###### INSTRUCTIONS #############################
STEP 1: Unzip the .zip package. Check that the following files are in the same folder:
1. ld50_drg_n.xa (chemical descriptors)
2. ld50_atp_csp_n.xa (biological descriptors)
3. master_script.R
4. multispaceNNobj.R
5. multispace_functions_AD_scaled.R
6. readXAfile.R
7. sampling.R
8. validationstats_BIN.R
9. bootstrapSD_BIN.R
10. desired_output.pdf
11. README.txt

STEP 2: Run master_script.R in one of the following ways.

If running R in terminal mode (e.g. in Linux):
   At the command prompt, enter:
      Rscript master_script.R

OR

If running the R GUI (e.g. in Windows):
   Step 2.1: Start the R GUI.
   Step 2.2: Set the working directory to the folder where CBRA was unzipped.
             Within the R GUI, go to File -> Change dir... and enter the file path of the CBRA folder
             OR enter: setwd("[file path of CBRA]")
             - Use forward slash "/" instead of backslash "\"
   Step 2.3: Open master_script.R by File -> Open
   Step 2.4: Run master_script.R. Within the R Editor, go to Edit -> Run all

PROCESS TAKES 3 MINUTES ON A QUAD-CORE COMPUTER. PLEASE BE PATIENT.
(Each model consists of multiple models generated by 5-fold external cross-validation and 10-fold internal cross-validation)

###################################################
Output files generated:
1. .pdf files (figures showing dual-space kNN)
   a) singlecpd.pdf (Compound #20's nearest biological and chemical neighbors)
   b) 6cpds_2by3grid.pdf (6 compounds' nearest biological and chemical neighbors in a 2 by 3 grid)
   c) 4cpds_2by2grid.pdf (4 compounds' nearest biological and chemical neighbors in a 2 by 2 grid)
2. .pred files (tables containing observed and predicted values of each compound)
   a) chem.pred
   b) gene.pred
   c) hybrid.pred
   d) dual.pred
3. validationstats_xxx.txt files (prediction performance of models, e.g. specificity, sensitivity, AUC)
   a) validationstats_chem.txt
   b) validationstats_gene.txt
   c) validationstats_hybrid.txt
   d) validationstats_dual.txt
4. dualspace.RData (.RData object containing data, models, objects for radial plots)
5. shuffleID.RData (.RData object containing randomizer seed used for cross validation)

######## NOTES #########################
If dependencies cannot be loaded, manually install the required R packages listed above using:
   install.packages("[Rpackage]")
For example, to install the R package boot, enter:
   install.packages("boot")

Please contact Yen Low at yenlow@gmail.com to report bugs.
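
The file check in STEP 1 and the package check in the NOTES can also be done from within R before running master_script.R. The sketch below is only an illustration: check_cbra_setup is a hypothetical helper that is not part of the CBRA package, and it covers only the data and script files from the STEP 1 list, leaving out the PDF and README.

```r
# Hypothetical pre-flight check (NOT part of the CBRA package).
# File and package names are copied from the lists in this README.
required_files <- c(
  "ld50_drg_n.xa", "ld50_atp_csp_n.xa", "master_script.R",
  "multispaceNNobj.R", "multispace_functions_AD_scaled.R",
  "readXAfile.R", "sampling.R", "validationstats_BIN.R",
  "bootstrapSD_BIN.R"
)
required_pkgs <- c("boot", "caret", "class", "e1071", "plotrix", "ROCR", "vegan")

check_cbra_setup <- function(dir = ".") {
  # Files from the list that are not present in `dir`
  missing_files <- required_files[!file.exists(file.path(dir, required_files))]
  # Packages whose namespace cannot be loaded, i.e. are not installed
  installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
  list(missing_files = missing_files,
       missing_pkgs  = required_pkgs[!installed])
}

status <- check_cbra_setup(".")
print(status)

# To install whatever is missing (needs network access):
# install.packages(status$missing_pkgs)
```

If both elements of the returned list are empty, the folder and the R library should be ready for STEP 2.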
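
To make the radial plots easier to interpret, here is a minimal sketch of how a Jaccard distance and a nearest-neighbor lookup might be computed. It assumes binary fingerprints and uses made-up toy data; the function names jaccard_dist and nearest_neighbors are hypothetical, and the actual descriptors in the .xa files and the distance code in the CBRA scripts may differ (for example, by using a continuous generalization of the Jaccard coefficient).

```r
# Jaccard distance between two binary vectors:
# 1 - (features present in both) / (features present in either)
jaccard_dist <- function(a, b) {
  a <- as.logical(a)
  b <- as.logical(b)
  union_n <- sum(a | b)
  if (union_n == 0) return(0)  # two empty fingerprints: treat as identical
  1 - sum(a & b) / union_n
}

# Row indices of the k compounds closest to row `target` (assumption:
# one compound per row, one binary feature per column).
nearest_neighbors <- function(descriptors, target, k) {
  d <- apply(descriptors, 1, jaccard_dist, b = descriptors[target, ])
  d[target] <- Inf             # exclude the target compound itself
  order(d)[seq_len(k)]
}

# Toy fingerprints (made-up values, 4 compounds x 5 features)
toy <- rbind(
  cpd1 = c(1, 1, 0, 0, 1),
  cpd2 = c(1, 1, 0, 0, 0),
  cpd3 = c(0, 0, 1, 1, 0),
  cpd4 = c(1, 0, 0, 1, 1)
)

# The two nearest neighbors of cpd1, nearest first
nearest_neighbors(toy, target = 1, k = 2)
```

In a dual-space setting, the same lookup would be run once on the chemical descriptor matrix (with k = kchem) and once on the biological descriptor matrix (with k = kbio), giving the two sets of neighbors drawn on either side of the vertical axis.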
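
The validationstats_xxx.txt files report measures such as sensitivity and specificity. As a rough illustration of what those measures mean for the -1/+1 class coding used here, the sketch below computes them from toy observed and predicted labels. binary_stats is a hypothetical function, not the code in validationstats_BIN.R, and the layout of the .pred files is not assumed.

```r
# Sensitivity, specificity and balanced accuracy for classes coded
# as -1 (nontoxic) and +1 (toxic).
binary_stats <- function(observed, predicted) {
  tp <- sum(observed ==  1 & predicted ==  1)  # toxic, predicted toxic
  tn <- sum(observed == -1 & predicted == -1)  # nontoxic, predicted nontoxic
  fp <- sum(observed == -1 & predicted ==  1)  # nontoxic, predicted toxic
  fn <- sum(observed ==  1 & predicted == -1)  # toxic, predicted nontoxic
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  c(sensitivity = sens,
    specificity = spec,
    balanced_accuracy = (sens + spec) / 2)
}

# Toy labels (made-up values for 8 compounds)
obs  <- c(1,  1, 1, -1, -1, -1, -1, 1)
pred <- c(1, -1, 1, -1,  1, -1, -1, 1)

binary_stats(obs, pred)
```

With these toy labels, 3 of the 4 toxic and 3 of the 4 nontoxic compounds are predicted correctly, so all three measures equal 0.75.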