Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
  • Loading branch information
memgonzales committed May 2, 2024
2 parents 8307c40 + 12be7d1 commit e9e3224
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ python3 phiembed.py --input <input_fasta> --model <model_joblib> --output <resul
```

- `input_fasta` is the path to the FASTA file containing the receptor-binding protein sequences. A sample FASTA file is provided [here](https://github.com/bioinfodlsu/phage-host-prediction/blob/main/sample.fasta).
- `model_joblib` is the path to the trained model (recognized format: joblib or compressed joblib, framework: scikit-learn). Download the trained model from this [link](https://drive.google.com/file/d/1bRloKMtPnp8QTOHx5IvSx_-8BspdVKNQ/view?usp=sharing). No need to uncompress, but doing so will speed up loading the model at the cost of additional storage requirements. Refer to this [guide](https://joblib.readthedocs.io/en/latest/generated/joblib.dump.html) for the list of accepted compressed formats.
- `model_joblib` is the path to the trained model (recognized format: joblib or compressed joblib, framework: scikit-learn). Download the trained model from this [link](https://drive.google.com/file/d/1bRloKMtPnp8QTOHx5IvSx_-8BspdVKNQ/view?usp=sharing). No need to uncompress, but doing so will speed up loading the model albeit at the cost of additional storage requirements. Refer to this [guide](https://joblib.readthedocs.io/en/latest/generated/joblib.dump.html) for the list of accepted compressed formats.
- `results_dir` is the path to the directory to which the results of running PHIEmbed will be written. The results of running PHIEmbed on the sample FASTA file are provided [here](https://github.com/bioinfodlsu/phage-host-prediction/tree/main/sample_results).

The results for each protein are written to a CSV file (without a header row). Each row contains two comma-separated values: a host genus and the corresponding prediction score (class probability). The rows are sorted in order of decreasing prediction score. Hence, the first row pertains to the top-ranked prediction.
Expand Down

0 comments on commit e9e3224

Please sign in to comment.