Data integration and visualization tool taking in chemical structures and bioassays for chemical risk assessment (CBRA) for publication

yenlow/chemBioViz

# README instructions for chemical-biological read across (CBRA)
#
# Also in supplemental material of publication:
#    Integrative Chemical–Biological Read-Across Approach for Chemical Hazard Classification.
#    Yen Low, Alexander Sedykh, Denis Fourches, Alexander Golbraikh, Maurice Whelan, Ivan Rusyn, and Alexander Tropsha. Chemical Research in Toxicology 2013 26 (8), 1199-1208.
#    DOI: 10.1021/tx400110f   http://pubs.acs.org/doi/suppl/10.1021/tx400110f
#
# Yen Low (yenlow@gmail.com)
# 05 June 2013 version 1.0
##########################################################

################ OBJECTIVES OF PROGRAM ###################
1. Builds 4 models for comparison
   a) dual-space kNN of chemical and biological neighbors
   b) single space kNN of chemical neighbors
   c) single space kNN of biological neighbors
   d) single space kNN of hybrid neighbors (i.e. chemical+biological spaces)
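To illustrate how model (a) differs from the single-space models, the sketch below pools a target compound's nearest neighbors from both spaces and takes a similarity-weighted average of their classes. The function name dual_knn_predict, the similarity weighting, the zero threshold and the toy values are assumptions made for illustration only; the scripts in this package may implement the dual-space kNN differently.

```r
# Hypothetical sketch: predict one target compound from its nearest
# chemical and biological neighbors (classes coded -1 = nontoxic, +1 = toxic).
# sim_chem, sim_bio: similarity of the target to every training compound
#                    (e.g. 1 - Jaccard distance), one value per compound.
# y:                 observed classes of the training compounds.
dual_knn_predict <- function(sim_chem, sim_bio, y, kchem = 3, kbio = 3) {
  stopifnot(length(sim_chem) == length(y), length(sim_bio) == length(y))
  # Indices of the k most similar compounds in each space
  nn_chem <- order(sim_chem, decreasing = TRUE)[seq_len(min(kchem, length(y)))]
  nn_bio  <- order(sim_bio,  decreasing = TRUE)[seq_len(min(kbio,  length(y)))]
  # Pool both neighbor sets; a compound may appear in both
  w <- c(sim_chem[nn_chem], sim_bio[nn_bio])
  a <- c(y[nn_chem], y[nn_bio])
  if (sum(w) == 0) return(list(score = NA_real_, class = NA))
  score <- sum(w * a) / sum(w)   # similarity-weighted mean class
  list(score = score, class = if (score >= 0) 1 else -1)
}

# Toy example with five training compounds (made-up values)
y        <- c(1, 1, -1, -1, 1)
sim_chem <- c(0.9, 0.2, 0.8, 0.1, 0.3)
sim_bio  <- c(0.1, 0.7, 0.2, 0.6, 0.95)
pred <- dual_knn_predict(sim_chem, sim_bio, y, kchem = 2, kbio = 2)
print(pred)
```

Setting kbio = 0 or kchem = 0 in this sketch reduces it to a single-space prediction, which is one way to think about models (b) and (c).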

2. Generates radial plots shown in desired_output.pdf
In each radial plot, 
- nodes represent compounds 
- central node represents the target compound to be predicted
- nearest kbio biological neighbors are positioned left of vertical axis 
- nearest kchem chemical neighbors are positioned right of vertical axis 
- edge length is proportional to the Jaccard distance between target compound and its neighbor
- nearest neighbors are positioned closest to the 12-o'clock position
- colors denote the observed class of the compound (black=nontoxic,-1; red=toxic,+1)
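The Jaccard distance that sets the edge lengths can be illustrated with a few lines of base R. The sketch assumes binary (0/1) descriptors and uses made-up compounds named cpdA, cpdB and cpdC; the descriptors in the .xa files may be continuous, in which case a quantitative variant (for example vegdist in the vegan package with method = "jaccard") would be needed instead.

```r
# Jaccard distance between two binary (0/1) descriptor vectors:
# 1 - (features present in both) / (features present in either)
jaccard_dist <- function(a, b) {
  a <- as.logical(a)
  b <- as.logical(b)
  union_n <- sum(a | b)
  if (union_n == 0) return(0)   # two empty vectors are treated as identical
  1 - sum(a & b) / union_n
}

# Toy data: one target and three candidate neighbors (hypothetical values)
target <- c(1, 1, 0, 1, 0)
neighbors <- rbind(cpdA = c(1, 1, 0, 1, 0),
                   cpdB = c(1, 0, 0, 1, 1),
                   cpdC = c(0, 0, 1, 0, 1))

# Distance from the target to each candidate neighbor
d <- apply(neighbors, 1, jaccard_dist, b = target)

# Sort so that the nearest neighbor comes first
# (the one that would be drawn closest to the 12-o'clock position)
d_sorted <- sort(d)
print(d_sorted)
```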


################ DATA FILES PROVIDED ###################
Data set: Rat acute toxicity (Oral LD50)
   a) ld50_drg_n.xa (chemical descriptors)
   b) ld50_atp_csp_n.xa (biological descriptors)


R 2.14 or later is recommended.
Tested on R 2.14 and 3.0, under Windows 7, Windows 8 and Ubuntu 12.10

The following R packages are required:
(script will automatically install them if necessary)
1. boot
2. caret
3. class
4. e1071
5. plotrix
6. ROCR
7. vegan
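
A check for missing packages might look like the following. This is a minimal sketch, not the code used in master_script.R; the helper name missing_packages is hypothetical, and the install call is left commented out because it needs network access.

```r
# Packages listed in this README
required <- c("boot", "caret", "class", "e1071", "plotrix", "ROCR", "vegan")

# Return the subset of pkgs that cannot be loaded
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install from CRAN (needs network access):
  # install.packages(to_install)
} else {
  message("All required packages are installed.")
}
```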


###### INSTRUCTIONS #############################
STEP 1: Unzip .zip package. Check that the following files are in the same folder:
1.  ld50_drg_n.xa      (chemical descriptors)
2.  ld50_atp_csp_n.xa  (biological descriptors)
3.  master_script.R
4.  multispaceNNobj.R
5.  multispace_functions_AD_scaled.R
6.  readXAfile.R
7.  sampling.R
8.  validationstats_BIN.R
9.  bootstrapSD_BIN.R
10. desired_output.pdf
11. README.txt
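
The file check in STEP 1 can also be done from within R. The sketch below is an optional convenience, not part of the package; the helper name missing_files is hypothetical, and it assumes the working directory is the unzipped CBRA folder.

```r
# Files that should sit together in the CBRA folder (from the list above)
expected_files <- c("ld50_drg_n.xa", "ld50_atp_csp_n.xa", "master_script.R",
                    "multispaceNNobj.R", "multispace_functions_AD_scaled.R",
                    "readXAfile.R", "sampling.R", "validationstats_BIN.R",
                    "bootstrapSD_BIN.R", "desired_output.pdf", "README.txt")

# Return the names of the files that are not found in dir
missing_files <- function(dir, files) {
  files[!file.exists(file.path(dir, files))]
}

absent <- missing_files(getwd(), expected_files)
if (length(absent) > 0) {
  message("Missing files: ", paste(absent, collapse = ", "))
} else {
  message("All files found.")
}
```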


STEP 2: Run master_script.R using one of the following methods
If running R in terminal mode (e.g. in Linux): 
	At the command prompt, from within the CBRA folder, enter: Rscript master_script.R
OR
If running R GUI (e.g. in Windows): 
	Step 2.1: Start R GUI.
	Step 2.2: Set working directory to where CBRA is unzipped to
	          Within R GUI, go to File -> Change dir... and enter the file path of the CBRA folder
	          OR enter: setwd("[file path of CBRA]")  - Use forward slash "/" instead of backslash "\"
	Step 2.3: Open master_script.R by File -> Open
	Step 2.4: Run master_script.R. 
	          Within R Editor, go to Edit -> Run all
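
The GUI steps can also be carried out from the R console. The helper below is a hypothetical convenience wrapper (run_cbra is not part of the package): it sets the working directory, runs the script with source(), and restores the previous working directory afterwards. The path in the usage comment is a placeholder.

```r
# Hypothetical helper: run master_script.R from the CBRA folder and
# restore the previous working directory when done.
run_cbra <- function(cbra_dir) {
  script <- file.path(cbra_dir, "master_script.R")
  if (!file.exists(script)) {
    stop("master_script.R not found in: ", cbra_dir)
  }
  old_wd <- setwd(cbra_dir)            # setwd() returns the previous directory
  on.exit(setwd(old_wd), add = TRUE)   # restore it even if the script fails
  source("master_script.R", echo = FALSE)
  invisible(TRUE)
}

# Usage (replace the placeholder with the folder where CBRA was unzipped,
# using forward slashes):
# run_cbra("[file path of CBRA]")
```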

PROCESS TAKES 3 MINUTES ON A QUAD-CORE COMPUTER. PLEASE BE PATIENT.
(Each model consists of multiple models generated by 5-fold external cross-validation and 10-fold internal cross-validation)
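
For readers unfamiliar with the setup, the sketch below shows one common way to assign compounds to 5 external folds reproducibly. It is an illustration under assumed names and values (make_folds, n = 23, seed = 42), not the procedure in sampling.R; an internal 10-fold split would repeat the same idea within each training set.

```r
# Hypothetical sketch: assign n compounds to k folds of near-equal size
make_folds <- function(n, k = 5, seed = 1) {
  set.seed(seed)                  # fixed seed makes the split repeatable
  shuffled <- sample(n)           # random permutation of compound indices
  fold_id <- integer(n)
  fold_id[shuffled] <- rep(seq_len(k), length.out = n)
  fold_id
}

folds <- make_folds(n = 23, k = 5, seed = 42)
print(table(folds))

# Each fold serves once as the external test set
for (f in seq_len(5)) {
  test_idx  <- which(folds == f)
  train_idx <- which(folds != f)
  # a model would be built on train_idx and evaluated on test_idx here
  stopifnot(length(intersect(test_idx, train_idx)) == 0)
}
```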


###################################################
Output files generated:
1. .pdf (figures showing dual-space kNN)
   a) singlecpd.pdf      (Compound #20's nearest biological and chemical neighbors)
   b) 6cpds_2by3grid.pdf (6 compounds' nearest biological and chemical neighbors in 2 by 3 grid)
   c) 4cpds_2by2grid.pdf (4 compounds' nearest biological and chemical neighbors in 2 by 2 grid)
2. .pred files (tables containing observed and predicted values of each compound)
   a) chem.pred
   b) gene.pred
   c) hybrid.pred
   d) dual.pred
3. validationstats_xxx.txt files (prediction performance of models, e.g. specificity, sensitivity, AUC)
   a) validationstats_chem.txt
   b) validationstats_gene.txt
   c) validationstats_hybrid.txt
   d) validationstats_dual.txt   
4. dualspace.RData (.RData object containing data, models, objects for radial plots)
5. shuffleID.RData (.RData object containing randomizer seed used for cross validation)
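
The contents of the .RData files are not itemized here, so a quick way to inspect them is to load each file into a separate environment and list what it holds. The helper name list_rdata is hypothetical and is not part of the package.

```r
# List the objects stored in an .RData file, with the first class of each,
# without adding them to the current workspace
list_rdata <- function(path) {
  env <- new.env()
  load(path, envir = env)
  sapply(ls(env), function(nm) class(get(nm, envir = env))[1])
}

# Usage after master_script.R has finished:
# list_rdata("dualspace.RData")
# list_rdata("shuffleID.RData")
```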


######## NOTES #########################
If dependencies cannot be loaded, manually install the following R packages using:
install.packages("[Rpackage]")
For example, to install the R package, boot, enter: install.packages("boot")
Required R packages:
1. boot
2. caret
3. class
4. e1071
5. plotrix
6. ROCR
7. vegan


Please contact Yen Low at yenlow@gmail.com to report bugs.
