Update README.md

bioinfodlsu · Apr 23, 2024 · 6c2a934
1 parent 03232c2
Showing 1 changed file with 1 addition and 5 deletions.
diff --git a/README.md b/README.md
@@ -70,23 +70,19 @@ python3 -m pip install -r requirements.txt
 python3 phiembed.py --input <input_filename> --output <output_filename>
 ```
 
-Arguments:
-
 -   `input_filename` is the filename of the FASTA file containing the receptor-binding protein sequences.
 -   `output_filename` is the filename of the file to which the results of running PHIEmbed will be written
 
 Each row in the results file contains two comma-separated values: a host genus and the predicted class probability. The rows are sorted in order of decreasing class probability. Hence, the first row in the results file corresponds to the top-ranked prediction.
 
-Under the hood, this script first converts each sequence into a protein embedding using ProtT5 (the top-performing protein language model based on our experiments) and then passes the embedding to a random forest classifier trained on our _entire_ dataset. 
+Under the hood, this script first converts each sequence into a protein embedding using ProtT5 (the top-performing protein language model based on our experiments) and then passes the embedding to a random forest classifier trained on our _entire_ dataset.
 
 ### Training PHIEmbed
 
 ```
 python3 train.py --input <training_dataset>
 ```
 
-Argument:
-
 -   `training_dataset` is the filename of the training dataset
 
 The training dataset should be formatted as a CSV file (without a header row). Each row corresponds to a training sample. The first column is for the protein IDs, the second column is for the host genera, and the next 1,024 columns are for the components of the ProtT5 embeddings.
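
The two file formats described in this README hunk (the header-less training CSV and the ranked results file) can be checked with a short script. This is only a sketch and not part of the PHIEmbed repository: the function names `load_training_rows` and `top_prediction` are hypothetical, and it assumes the results file has no header row, which the README implies but does not state outright.

```python
import csv

# ProtT5 embedding size, taken from the README ("the next 1,024 columns").
EMBEDDING_DIM = 1024


def load_training_rows(path):
    """Read a header-less training CSV: protein ID, host genus, 1,024 floats.

    Hypothetical helper for illustration; train.py may parse the file differently.
    """
    ids, genera, embeddings = [], [], []
    with open(path, newline="") as handle:
        for line_no, row in enumerate(csv.reader(handle), start=1):
            if not row:
                continue  # skip blank lines
            if len(row) != 2 + EMBEDDING_DIM:
                raise ValueError(
                    f"row {line_no}: expected {2 + EMBEDDING_DIM} columns, "
                    f"got {len(row)}"
                )
            ids.append(row[0])
            genera.append(row[1])
            embeddings.append([float(value) for value in row[2:]])
    return ids, genera, embeddings


def top_prediction(path):
    """Return (host genus, probability) from the first row of a results file.

    Assumes no header row; rows are already sorted by decreasing probability,
    so the first row is the top-ranked prediction.
    """
    with open(path, newline="") as handle:
        genus, probability = next(csv.reader(handle))
    return genus, float(probability)
```

For example, `top_prediction("results.csv")` would return the top-ranked host genus and its class probability, where `results.csv` stands in for whatever was passed as `--output`.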