asmariyaz23/dbCAFA


To use this CAFA benchmarking software, perform the following four steps:

  1. Generate Data (use the dataLoadingScripts directory):

usage: main.py [-h] [-o FILE] [-n FILE] [-g FILE] [-d DATAOUTPUT]

The purpose of this step is to generate all the SwissProt data required to load into the database for querying; an example invocation follows the argument list.

optional arguments:

  -h, --help            show this help message and exit
  -o FILE, --older FILE
                        Older SwissProt input file.
  -n FILE, --newer FILE
                        Newer SwissProt input file.
  -g FILE, --goterm FILE
                        GO term file, tab separated.
  -d DATAOUTPUT, --dataoutput DATAOUTPUT
                        Provide the path for redirecting your output.
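
For example, a hypothetical invocation (the file names are placeholders; substitute your own SwissProt releases, GO term file, and output directory):

    python main.py -o older_swissprot.txt -n newer_swissprot.txt -g goterms.tsv -d /path/to/output/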
  2. Create the database and load the schema (a worked example follows the sub-steps):

    1. Create the database at the MySQL command line:

    create database [dbname];

    exit

    2. mysql -u [username] -p [dbname] < cafa.sql

    (Note: cafa.sql is located in the schema directory.)
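
    For example, with "cafa" as the database name (the name is arbitrary; run the last command from the repository root so that schema/cafa.sql resolves):

      mysql -u [username] -p
      mysql> create database cafa;
      mysql> exit
      mysql -u [username] -p cafa < schema/cafa.sql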

  3. Load the data generated in Step 1 into the database created in Step 2:

    Edit /dbCAFA/dataLoaderScripts/cafaDBLoading.txt so that it reflects your database name and the file paths of all the data files generated in Step 1 (a hypothetical excerpt of the file appears at the end of this step).

    Run the following command in a shell:

    mysql -u [username] -p < /dbCAFA/dataLoaderScripts/cafaDBLoading.txt

    Note: Sometimes MySQL cannot find the data file path; the following commands (which put the AppArmor profile for mysqld into complain mode) solved the issue for me:

      sudo apt-get install apparmor-utils
    
      sudo aa-complain /usr/sbin/mysqld
    
      sudo /etc/init.d/apparmor reload
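
    For reference, cafaDBLoading.txt is a series of SQL statements fed to mysql; the TODO list below notes that protein data is loaded with an INFILE command. A hypothetical excerpt (the database name, table name, and tab-delimited format are assumptions; check the actual file and cafa.sql for the real layout):

      USE cafa;
      LOAD DATA LOCAL INFILE '/path/to/proteinData.txt'
      INTO TABLE Protein
      FIELDS TERMINATED BY '\t'
      LINES TERMINATED BY '\n';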
    
  4. Benchmark generation (use the benchmarkGeneration folder):

    To generate benchmarks, first modify the second line of cafaModel/modelCAFA.py to reflect your username, password, and database name.

    Next, run the script in the benchmarkGeneration folder (example invocations follow the argument list):

    usage: generateBenchmarks.py [-h] [--chooseType CHOOSETYPE]
                                 [--chooseOntology CHOOSEONTOLOGY]

    optional arguments:

      -h, --help            show this help message and exit
      --chooseType CHOOSETYPE, -t CHOOSETYPE
                            Either noKnowledge or partialKnowledge.
      --chooseOntology CHOOSEONTOLOGY, -o CHOOSEONTOLOGY
                            Only if partialKnowledge; choose molecular_function,
                            biological_process, or cellular_component. Skip this
                            option if noKnowledge.
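
    For example, two hypothetical invocations, one per benchmark type (option names and values are exactly those listed above):

      python generateBenchmarks.py -t noKnowledge
      python generateBenchmarks.py -t partialKnowledge -o molecular_function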
    

TODO:

  1. The data loading scripts for Protein, GO, Protein_GO, and Evidence need editing so that they write the AI (auto-increment) value of each table in the generated files. For consistency, the Protein file needs to include a header. The dates in prepareProtein.py should be parameterized.

  2. cafaDBLoading.txt needs the auto-increments reset before loading begins. The database name and file paths should be parameterized. The INFILE command that loads protein data should ignore the header once the point above is completed.

  3. Edit main.py in the dataLoaderScripts directory so that it can run the files from anywhere.
