change VARIABLE names, improve failed genomes reporting
pchaumeil committed May 10, 2022
1 parent d90674d commit 196147a
Showing 12 changed files with 215 additions and 128 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -7,8 +7,8 @@
[![Docker Image Version (latest by date)](https://img.shields.io/docker/v/ecogenomic/gtdbtk?sort=date&color=299bec&label=docker)](https://hub.docker.com/r/ecogenomic/gtdbtk)
[![Docker Pulls](https://img.shields.io/docker/pulls/ecogenomic/gtdbtk?color=299bec&label=pulls)](https://hub.docker.com/r/ecogenomic/gtdbtk)

<b>[GTDB-Tk v2.0.0](https://ecogenomics.github.io/GTDBTk/announcements.html) was released on April 8, 2022 along with new reference data for [GTDB R07-RS207](https://gtdb.ecogenomic.org/). Upgrading is recommended.</b>
<b> Please note v2.0.0+ is not compatible with GTDB R06-RS202. </b>
<b>[GTDB-Tk v2.1.0](https://ecogenomics.github.io/GTDBTk/announcements.html) was released on May 10, 2022. Upgrading is recommended.</b>
<b> Please note v2.1.0+ is not compatible with GTDB-Tk package [R207_v1](https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_data.tar.gz). It is necessary to upgrade to GTDB-Tk package [R207_v2](https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_v2_data.tar.gz).</b>

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy ([GTDB](https://gtdb.ecogenomic.org/)). It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes. The GTDB-Tk is open source and released under the [GNU General Public License (Version 3)](https://www.gnu.org/licenses/gpl-3.0.en.html).

@@ -18,12 +18,12 @@ Please post questions and issues related to GTDB-Tk on the Issues section of the

## New Features

GTDB-Tk v2.0.0 includes the following new features:
- GTDB-Tk now uses a **divide-and-conquer** approach where the bacterial reference tree is split into multiple order-level subtrees. This reduces the memory requirements of GTDB-Tk from **320 GB** of RAM when using the full GTDB R07-RS207 reference tree to approximately **35 GB**. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree, use the `--full-tree` flag.
- Archaeal classification now uses a refined set of 53 archaeal-specific marker genes based on the recent publication by [Dombrowski et al., 2020](https://www.nature.com/articles/s41467-020-17408-w). This set of archaeal marker genes is used by GTDB for curating the archaeal taxonomy.
- All directories containing intermediate results are **now removed** by default at the end of the `classify_wf` and `de_novo_wf` pipelines. If you wish to retain these intermediate files, use the `--keep_intermediates` flag.
- All MSA files produced by the `align` step are now compressed with gzip.
- The classification summary and failed genomes files are now the only files linked in the root directory of `classify_wf`.
GTDB-Tk v2.1.0 includes the following new features:
- GTDB-Tk now uses a **divide-and-conquer** approach where the bacterial reference tree is split into multiple **class**-level subtrees. This reduces the memory requirements of GTDB-Tk from **320 GB** of RAM when using the full GTDB R07-RS207 reference tree to approximately **50 GB**. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree, use the `--full-tree` flag.
This is the main change from v2.0.0. The split-tree approach has been modified from order-level trees to class-level trees to resolve specific classification issues (see [#383](https://github.com/Ecogenomics/GTDBTk/issues/383)).
- Genomes that cannot be assigned to a domain (e.g. genomes with no bacterial or archaeal markers, or genomes with no genes called by Prodigal) are now reported in `gtdbtk.bac120.summary.tsv` as 'Unclassified'.
- Genomes filtered out during the alignment step are now reported in `gtdbtk.bac120.summary.tsv` as 'Unclassified Bacteria/Archaea'.
- The `--write_single_copy_genes` flag is now available in the `classify_wf` and `de_novo_wf` workflows.
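
Because failed genomes now appear in the summary file rather than only in a separate report, downstream scripts may want to separate them from classified genomes. The following is a minimal sketch of one way to do that. It assumes the summary is a tab-separated file with `user_genome` and `classification` columns and that failed genomes carry a classification starting with `Unclassified`; the helper name `unclassified_genomes` and the sample rows are hypothetical.

```python
import csv
import io


def unclassified_genomes(handle):
    """Return the IDs of genomes whose classification starts with 'Unclassified'.

    Assumes a tab-separated file with 'user_genome' and 'classification'
    columns (an assumption about the summary layout, not a documented API).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["user_genome"]
        for row in reader
        if row["classification"].startswith("Unclassified")
    ]


# In-memory stand-in for a summary file; the rows are illustrative only.
example = io.StringIO(
    "user_genome\tclassification\n"
    "genome_a\td__Bacteria;p__Example\n"
    "genome_b\tUnclassified\n"
    "genome_c\tUnclassified Bacteria\n"
)
flagged = unclassified_genomes(example)
print(flagged)
```

With a real run, the same function could be given an open file handle for the summary file instead of the in-memory example.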


## Documentation
147 changes: 73 additions & 74 deletions gtdbtk/classify.py

Large diffs are not rendered by default.

94 changes: 69 additions & 25 deletions gtdbtk/config/config.py
@@ -36,19 +36,19 @@
PFAM_HMM_DIR = os.path.join(MARKER_DIR, 'pfam/')

SPLIT_DIR = os.path.join(GENERIC_PATH, 'split')
HIGH_SPLIT_DIR = os.path.join(SPLIT_DIR, 'high')
LOW_SPLIT_DIR = os.path.join(SPLIT_DIR, 'low')
HIGH_PPLACER_DIR = os.path.join(HIGH_SPLIT_DIR, 'pplacer')
LOW_PPLACER_DIR = os.path.join(LOW_SPLIT_DIR, 'pplacer')
HIGH_RED_DIR = os.path.join(HIGH_SPLIT_DIR, 'red')
LOW_RED_DIR = os.path.join(LOW_SPLIT_DIR, 'red')
BACKBONE_SPLIT_DIR = os.path.join(SPLIT_DIR, 'backbone')
CLASS_LEVEL_SPLIT_DIR = os.path.join(SPLIT_DIR, 'class_level')
BACKBONE_PPLACER_DIR = os.path.join(BACKBONE_SPLIT_DIR, 'pplacer')
CLASS_LEVEL_PPLACER_DIR = os.path.join(CLASS_LEVEL_SPLIT_DIR, 'pplacer')
BACKBONE_RED_DIR = os.path.join(BACKBONE_SPLIT_DIR, 'red')
CLASS_LEVEL_RED_DIR = os.path.join(CLASS_LEVEL_SPLIT_DIR, 'red')

LOW_TREE_MAPPING_FILE = os.path.join(LOW_SPLIT_DIR, 'tree_mapping.tsv')
CLASS_LEVEL_TREE_MAPPING_FILE = os.path.join(CLASS_LEVEL_SPLIT_DIR, 'tree_mapping.tsv')

HIGH_PPLACER_REF_PKG = 'gtdbtk_package_backbone.refpkg'
HIGH_RED_FILE = os.path.join(HIGH_RED_DIR, 'high_red_value.tsv')
LOW_PPLACER_REF_PKG = os.path.join(LOW_PPLACER_DIR, 'gtdbtk.package.{iter}.refpkg')
LOW_RED_FILE = os.path.join(LOW_RED_DIR, 'red_value_{iter}.tsv')
BACKBONE_PPLACER_REF_PKG = 'gtdbtk_package_backbone.refpkg'
BACKBONE_RED_FILE = os.path.join(BACKBONE_RED_DIR, 'backbone_red_value.tsv')
CLASS_LEVEL_PPLACER_REF_PKG = os.path.join(CLASS_LEVEL_PPLACER_DIR, 'gtdbtk.package.{iter}.refpkg')
CLASS_LEVEL_RED_FILE = os.path.join(CLASS_LEVEL_RED_DIR, 'red_value_{iter}.tsv')

RED_DIST_BAC_DICT = ''
RED_DIST_ARC_DICT = ''
@@ -124,21 +124,65 @@
"TIGR03625.HMM", "TIGR03632.HMM", "TIGR03654.HMM",
"TIGR03723.HMM", "TIGR03725.HMM", "TIGR03953.HMM"]}

#
#New Version of AR53_MARKERS
AR53_MARKERS = {"PFAM": ["PF04919.13.hmm","PF07541.13.hmm","PF01000.27.hmm",
"PF00687.22.hmm","PF00466.21.hmm","PF00827.18.hmm","PF01280.21.hmm","PF01090.20.hmm",
"PF01200.19.hmm","PF01015.19.hmm","PF00900.21.hmm","PF00410.20.hmm"],
"TIGRFAM":["TIGR00037.HMM","TIGR00064.HMM","TIGR00111.HMM",
"TIGR00134.HMM","TIGR00279.HMM","TIGR00291.HMM","TIGR00323.HMM",
"TIGR00335.HMM","TIGR00373.HMM","TIGR00405.HMM","TIGR00448.HMM",
"TIGR00483.HMM","TIGR00491.HMM","TIGR00522.HMM","TIGR00967.HMM",
"TIGR00982.HMM","TIGR01008.HMM","TIGR01012.HMM","TIGR01018.HMM",
"TIGR01020.HMM","TIGR01028.HMM","TIGR01046.HMM","TIGR01052.HMM",
"TIGR01171.HMM","TIGR01213.HMM","TIGR01952.HMM","TIGR02236.HMM",
"TIGR02338.HMM","TIGR02389.HMM","TIGR02390.HMM","TIGR03626.HMM",
"TIGR03627.HMM","TIGR03628.HMM","TIGR03629.HMM","TIGR03670.HMM",
"TIGR03671.HMM","TIGR03672.HMM","TIGR03673.HMM","TIGR03674.HMM",
"TIGR03676.HMM","TIGR03680.HMM"]}
AR53_MARKERS = {"PFAM": ["PF01868.17.hmm", "PF01282.20.hmm", "PF01655.19.hmm",
"PF01092.20.hmm", "PF01000.27.hmm", "PF00368.19.hmm",
"PF00827.18.hmm", "PF01269.18.hmm", "PF00466.21.hmm",
"PF01015.19.hmm", "PF13685.7.hmm", "PF02978.20.hmm",
"PF04919.13.hmm", "PF01984.21.hmm", "PF04104.15.hmm",
"PF00410.20.hmm", "PF01798.19.hmm", "PF01864.18.hmm",
"PF01990.18.hmm", "PF07541.13.hmm", "PF04019.13.hmm",
"PF00900.21.hmm", "PF01090.20.hmm", "PF02006.17.hmm",
"PF01157.19.hmm", "PF01191.20.hmm", "PF01866.18.hmm",
"PF01198.20.hmm", "PF01496.20.hmm", "PF00687.22.hmm",
"PF03874.17.hmm", "PF01194.18.hmm", "PF01200.19.hmm",
"PF13656.7.hmm", "PF01280.21.hmm"],
"TIGRFAM": ["TIGR00468.HMM", "TIGR01060.HMM", "TIGR03627.HMM",
"TIGR01020.HMM", "TIGR02258.HMM", "TIGR00293.HMM",
"TIGR00389.HMM", "TIGR01012.HMM", "TIGR00490.HMM",
"TIGR03677.HMM", "TIGR03636.HMM", "TIGR03722.HMM",
"TIGR00458.HMM", "TIGR00291.HMM", "TIGR00670.HMM",
"TIGR00064.HMM", "TIGR03629.HMM", "TIGR00021.HMM",
"TIGR03672.HMM", "TIGR00111.HMM", "TIGR03684.HMM",
"TIGR01077.HMM", "TIGR01213.HMM", "TIGR01080.HMM",
"TIGR00501.HMM", "TIGR00729.HMM", "TIGR01038.HMM",
"TIGR00270.HMM", "TIGR03628.HMM", "TIGR01028.HMM",
"TIGR00521.HMM", "TIGR03671.HMM", "TIGR00240.HMM",
"TIGR02390.HMM", "TIGR02338.HMM", "TIGR00037.HMM",
"TIGR02076.HMM", "TIGR00335.HMM", "TIGR01025.HMM",
"TIGR00471.HMM", "TIGR00336.HMM", "TIGR00522.HMM",
"TIGR02153.HMM", "TIGR02651.HMM", "TIGR03674.HMM",
"TIGR00323.HMM", "TIGR00134.HMM", "TIGR02236.HMM",
"TIGR03683.HMM", "TIGR00491.HMM", "TIGR00658.HMM",
"TIGR03680.HMM", "TIGR00392.HMM", "TIGR00422.HMM",
"TIGR00279.HMM", "TIGR01052.HMM", "TIGR00442.HMM",
"TIGR00308.HMM", "TIGR00398.HMM", "TIGR00456.HMM",
"TIGR00549.HMM", "TIGR00408.HMM", "TIGR00432.HMM",
"TIGR00264.HMM", "TIGR00982.HMM", "TIGR00324.HMM",
"TIGR01952.HMM", "TIGR03626.HMM", "TIGR03670.HMM",
"TIGR00337.HMM", "TIGR01046.HMM", "TIGR01018.HMM",
"TIGR00936.HMM", "TIGR00463.HMM", "TIGR01309.HMM",
"TIGR03653.HMM", "TIGR00042.HMM", "TIGR02389.HMM",
"TIGR00307.HMM", "TIGR03673.HMM", "TIGR00373.HMM",
"TIGR01008.HMM", "TIGR00283.HMM", "TIGR00425.HMM",
"TIGR00405.HMM", "TIGR03665.HMM", "TIGR00448.HMM"]}

#New Version of AR53_MARKERS
# AR53_MARKERS = {"PFAM": ["PF04919.13.hmm","PF07541.13.hmm","PF01000.27.hmm",
# "PF00687.22.hmm","PF00466.21.hmm","PF00827.18.hmm","PF01280.21.hmm","PF01090.20.hmm",
# "PF01200.19.hmm","PF01015.19.hmm","PF00900.21.hmm","PF00410.20.hmm"],
# "TIGRFAM":["TIGR00037.HMM","TIGR00064.HMM","TIGR00111.HMM",
# "TIGR00134.HMM","TIGR00279.HMM","TIGR00291.HMM","TIGR00323.HMM",
# "TIGR00335.HMM","TIGR00373.HMM","TIGR00405.HMM","TIGR00448.HMM",
# "TIGR00483.HMM","TIGR00491.HMM","TIGR00522.HMM","TIGR00967.HMM",
# "TIGR00982.HMM","TIGR01008.HMM","TIGR01012.HMM","TIGR01018.HMM",
# "TIGR01020.HMM","TIGR01028.HMM","TIGR01046.HMM","TIGR01052.HMM",
# "TIGR01171.HMM","TIGR01213.HMM","TIGR01952.HMM","TIGR02236.HMM",
# "TIGR02338.HMM","TIGR02389.HMM","TIGR02390.HMM","TIGR03626.HMM",
# "TIGR03627.HMM","TIGR03628.HMM","TIGR03629.HMM","TIGR03670.HMM",
# "TIGR03671.HMM","TIGR03672.HMM","TIGR03673.HMM","TIGR03674.HMM",
# "TIGR03676.HMM","TIGR03680.HMM"]}



20 changes: 10 additions & 10 deletions gtdbtk/config/output.py
@@ -43,8 +43,8 @@
PATH_AR53_TREE_FILE = join(DIR_CLASSIFY, '{prefix}.ar53.classify.tree')
PATH_BAC120_SUMMARY_OUT = join(DIR_CLASSIFY, '{prefix}.bac120.summary.tsv')
PATH_AR53_SUMMARY_OUT = join(DIR_CLASSIFY, '{prefix}.ar53.summary.tsv')
PATH_HIGH_BAC120_TREE_FILE = join(DIR_CLASSIFY, '{prefix}.high.bac120.classify.tree')
PATH_LOW_BAC120_TREE_FILE = join(DIR_CLASSIFY, '{prefix}.bac120.classify.tree.{iter}.tree')
PATH_BACKBONE_BAC120_TREE_FILE = join(DIR_CLASSIFY, '{prefix}.backbone.bac120.classify.tree')
PATH_CLASS_LEVEL_BAC120_TREE_FILE = join(DIR_CLASSIFY, '{prefix}.bac120.classify.tree.{iter}.tree')
PATH_BAC120_CONFLICT = join(DIR_CLASSIFY, '{prefix}.bac120.conflict.tsv')
PATH_AR53_DISAPPEARING_GENOMES = join(DIR_CLASSIFY, '{prefix}.ar53.disappearing_genomes.tsv')
PATH_BAC120_DISAPPEARING_GENOMES = join(DIR_CLASSIFY, '{prefix}.bac120.disappearing_genomes.tsv')
@@ -55,8 +55,8 @@
PATH_AR53_RED_DICT = join(DIR_CLASSIFY_INTERMEDIATE, '{prefix}.ar53.red_dictionary.tsv')
PATH_BAC120_PPLACER_CLASS = join(DIR_CLASSIFY_INTERMEDIATE, '{prefix}.bac120.classification_pplacer.tsv')
PATH_AR53_PPLACER_CLASS = join(DIR_CLASSIFY_INTERMEDIATE, '{prefix}.ar53.classification_pplacer.tsv')
PATH_BAC120_HIGH_PPLACER_CLASS = join(DIR_CLASSIFY_INTERMEDIATE, '{prefix}.bac120.high.classification_pplacer.tsv')
PATH_BAC120_LOW_PPLACER_CLASS = join(DIR_CLASSIFY_INTERMEDIATE, '{prefix}.bac120.low.classification_pplacer_tree_{iter}.tsv')
PATH_BAC120_BACKBONE_PPLACER_CLASS = join(DIR_CLASSIFY_INTERMEDIATE, '{prefix}.bac120.backbone.classification_pplacer.tsv')
PATH_BAC120_CLASS_LEVEL_PPLACER_CLASS = join(DIR_CLASSIFY_INTERMEDIATE, '{prefix}.bac120.class_level.classification_pplacer_tree_{iter}.tsv')



@@ -68,12 +68,12 @@
PATH_AR53_PPLACER_JSON = join(DIR_PPLACER, 'pplacer.ar53.json')

# SPLIT TREE
PATH_HIGH_BAC120_PPLACER_OUT = join(DIR_PPLACER, 'pplacer.high.bac120.out')
PATH_HIGH_BAC120_PPLACER_JSON = join(DIR_PPLACER, 'pplacer.high.bac120.json')
DIR_LOW_PPLACER = join(DIR_PPLACER, 'tree_{iter}')
PATH_LOW_BAC120_SUBMSA = join(DIR_LOW_PPLACER, 'user_msa_file.fasta')
PATH_LOW_BAC120_PPLACER_OUT = join(DIR_LOW_PPLACER, 'pplacer.low.bac120.out')
PATH_LOW_BAC120_PPLACER_JSON = join(DIR_LOW_PPLACER, 'pplacer.low.bac120.json')
PATH_BACKBONE_BAC120_PPLACER_OUT = join(DIR_PPLACER, 'pplacer.backbone.bac120.out')
PATH_BACKBONE_BAC120_PPLACER_JSON = join(DIR_PPLACER, 'pplacer.backbone.bac120.json')
DIR_CLASS_LEVEL_PPLACER = join(DIR_PPLACER, 'tree_{iter}')
PATH_CLASS_LEVEL_BAC120_SUBMSA = join(DIR_CLASS_LEVEL_PPLACER, 'user_msa_file.fasta')
PATH_CLASS_LEVEL_BAC120_PPLACER_OUT = join(DIR_CLASS_LEVEL_PPLACER, 'pplacer.class_level.bac120.out')
PATH_CLASS_LEVEL_BAC120_PPLACER_JSON = join(DIR_CLASS_LEVEL_PPLACER, 'pplacer.class_level.bac120.json')

# Command: infer
DIR_INFER = 'infer'
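
The `{prefix}` and `{iter}` placeholders in these constants are resolved with `str.format` where the path is used, as `pplacer_classification.py` does for the pplacer classification paths. The sketch below shows how a placeholder in a directory constant carries through to the file paths built on top of it. The value given to `DIR_PPLACER` and the helper `class_level_pplacer_out` are assumptions for illustration only.

```python
from os.path import join

# Assumed value; the real DIR_PPLACER is defined elsewhere in output.py.
DIR_PPLACER = join("classify", "intermediate_results", "pplacer")

# Same shape as the renamed constants: the directory template holds {iter},
# so every path joined onto it inherits the placeholder.
DIR_CLASS_LEVEL_PPLACER = join(DIR_PPLACER, "tree_{iter}")
PATH_CLASS_LEVEL_BAC120_PPLACER_OUT = join(
    DIR_CLASS_LEVEL_PPLACER, "pplacer.class_level.bac120.out"
)


def class_level_pplacer_out(out_dir, tree_index):
    """Resolve the pplacer output path for one class-level tree (hypothetical helper)."""
    return join(out_dir, PATH_CLASS_LEVEL_BAC120_PPLACER_OUT.format(iter=tree_index))


resolved = class_level_pplacer_out("results", 3)
print(resolved)
```

Keeping the placeholder in the constant means the rename from `LOW` to `CLASS_LEVEL` only has to touch the constant names and file name fragments, not the call sites' formatting logic.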
2 changes: 1 addition & 1 deletion gtdbtk/external/fastani.py
@@ -274,7 +274,7 @@ def run_proc(self, q, r, ql, rl, output):
        if rl is not None:
            args.extend(['--rl', rl])
        args.extend(['-o', output])
        self.logger.debug(' '.join(args))
        #self.logger.debug(' '.join(args))
        proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE, encoding='utf-8')
        stdout, stderr = proc.communicate()
4 changes: 2 additions & 2 deletions gtdbtk/external/prodigal.py
@@ -248,12 +248,12 @@ def run(self, genomic_files, tln_tables):
            if len(lq_gids) > 10:
                for lq_gid in lq_gids:
                    self.warnings.info(lq_gid)
                    fails.write(f'{lq_gid}\tno genes were called by Prodigal\n')
                    fails.write(f'{lq_gid}\tNo genes were called by Prodigal\n')
            else:
                for lq_gid in lq_gids:
                    self.logger.warning(f'Skipping: {lq_gid}')
                    self.warnings.info(lq_gid)
                    fails.write(f'{lq_gid}\tno genes were called by Prodigal\n')
                    fails.write(f'{lq_gid}\tNo genes were called by Prodigal\n')

        fails.close()
        return result_dict
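
Both branches of this hunk write the same tab-separated `genome ID, reason` record, so the message has to be edited in two places. As a sketch only, and not the code in this commit, the write could be factored into one helper so the wording lives in a single spot. The names `write_failed_genomes` and `NO_GENES_REASON` are hypothetical.

```python
import io

# Hypothetical constant holding the message used in the hunk above.
NO_GENES_REASON = "No genes were called by Prodigal"


def write_failed_genomes(handle, gids, reason=NO_GENES_REASON):
    """Write one 'gid<TAB>reason' line per failed genome to an open handle."""
    for gid in gids:
        handle.write(f"{gid}\t{reason}\n")


# An in-memory buffer stands in for the failed-genomes file.
buffer = io.StringIO()
write_failed_genomes(buffer, ["genome_a", "genome_b"])
report = buffer.getvalue()
print(report, end="")
```

Passing an open handle keeps the helper independent of where the failed-genomes file lives, and a `with open(...)` block at the call site would close the file even if an exception is raised.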
5 changes: 5 additions & 0 deletions gtdbtk/io/classify_summary.py
@@ -97,6 +97,11 @@ def add_row(self, row: ClassifySummaryFileRow):
            raise GTDBTkExit(f'Attempting to add duplicate row: {row.gid}')
        self.rows[row.gid] = row

    def has_row(self):
        if self.rows.items():
            return True
        return False

    def write(self):
        """Writes the summary file to disk. None will be replaced with N/A"""
        with open(self.path, 'w') as fh:
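
The new `has_row` method relies on an empty `dict.items()` view being falsy. An equivalent and slightly shorter form tests the dictionary itself. The class below is a hypothetical stand-in for the summary-file container, written only to show that behaviour; it is not the class from `classify_summary.py`.

```python
class SummaryRows:
    """Hypothetical stand-in for a summary container keyed by genome ID."""

    def __init__(self):
        self.rows = {}

    def add_row(self, gid, row):
        # Mirrors the duplicate check shown in the hunk, using ValueError
        # instead of the project's own exception type.
        if gid in self.rows:
            raise ValueError(f"Attempting to add duplicate row: {gid}")
        self.rows[gid] = row

    def has_row(self):
        # An empty dict is falsy, so bool() gives the same answer as
        # checking self.rows.items().
        return bool(self.rows)


summary = SummaryRows()
empty_before = summary.has_row()
summary.add_row("genome_a", {"classification": "Unclassified"})
filled_after = summary.has_row()
print(empty_before, filled_after)
```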
8 changes: 4 additions & 4 deletions gtdbtk/io/pplacer_classification.py
@@ -19,8 +19,8 @@
from typing import Dict

from gtdbtk.biolib_lite.common import make_sure_path_exists
from gtdbtk.config.output import PATH_AR53_PPLACER_CLASS, PATH_BAC120_PPLACER_CLASS, PATH_BAC120_HIGH_PPLACER_CLASS, \
PATH_BAC120_LOW_PPLACER_CLASS
from gtdbtk.config.output import PATH_AR53_PPLACER_CLASS, PATH_BAC120_PPLACER_CLASS, PATH_BAC120_BACKBONE_PPLACER_CLASS, \
PATH_BAC120_CLASS_LEVEL_PPLACER_CLASS
from gtdbtk.exceptions import GTDBTkExit


@@ -66,7 +66,7 @@ class PplacerLowClassifyFileBAC120(PplacerClassifyFile):
    """Store the pplacer classifications for the BAC120 marker set."""

    def __init__(self, out_dir: str, prefix: str,iter:str):
        path = os.path.join(out_dir, PATH_BAC120_LOW_PPLACER_CLASS.format(prefix=prefix,iter=iter))
        path = os.path.join(out_dir, PATH_BAC120_CLASS_LEVEL_PPLACER_CLASS.format(prefix=prefix,iter=iter))
        super().__init__(path)


@@ -86,7 +86,7 @@ class PplacerHighClassifyFile(object):
    """Store the pplacer classifications."""

    def __init__(self,out_dir: str,prefix: str):
        self.path = os.path.join(out_dir, PATH_BAC120_HIGH_PPLACER_CLASS.format(prefix=prefix))
        self.path = os.path.join(out_dir, PATH_BAC120_BACKBONE_PPLACER_CLASS.format(prefix=prefix))
        self.rows = dict()  # keyed by user_genome
        self.none_value = 'N/A'

5 changes: 4 additions & 1 deletion gtdbtk/io/tree_mapping.py
@@ -29,6 +29,7 @@ def __init__(self):
        self.gid = None
        self.ani_classification = None
        self.mapped_tree = None
        self.rule = None

class GenomeMappingFile(object):
    """Store the GTDB-Tk classify summary output."""
@@ -47,7 +48,8 @@ def get_col_order(row: GenomeMappingFileRow = None):
            row = GenomeMappingFileRow()
        mapping = [('user_genome', row.gid),
                   ('is_ani_classification', row.ani_classification),
                   ('species_tree_mapped', row.mapped_tree)]
                   ('class_tree_mapped', row.mapped_tree),
                   ('classification_rule', row.rule)]
        cols, data = list(), list()
        for col_name, col_val in mapping:
            cols.append(col_name)
@@ -88,4 +90,5 @@ def read(self):
                row.gid = data[0]
                row.ani_classification = data[1]
                row.mapped_tree = data[2]
                row.rule = data[3]
                self.add_row(row)
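
The `read` method now indexes a fourth column, so a mapping file written without `classification_rule` would raise an `IndexError` at `data[3]`. Whether such files exist in practice is an assumption; if they do, reading by column name is one way to tolerate them. The sketch below uses the column names from `get_col_order`, while the `MappingRow` dataclass, the `read_mapping` helper and the sample values are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MappingRow:
    """Hypothetical counterpart of GenomeMappingFileRow."""
    gid: str
    ani_classification: str
    mapped_tree: str
    rule: Optional[str] = None


def read_mapping(handle) -> List[MappingRow]:
    """Read a tab-separated mapping file by header name rather than position."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        MappingRow(
            gid=rec["user_genome"],
            ani_classification=rec["is_ani_classification"],
            mapped_tree=rec["class_tree_mapped"],
            # .get() yields None when the column is missing from the header.
            rule=rec.get("classification_rule"),
        )
        for rec in reader
    ]


# Illustrative content only: one file with the new column, one without it.
with_rule = io.StringIO(
    "user_genome\tis_ani_classification\tclass_tree_mapped\tclassification_rule\n"
    "genome_a\tFalse\t5\texample rule\n"
)
without_rule = io.StringIO(
    "user_genome\tis_ani_classification\tclass_tree_mapped\n"
    "genome_b\tTrue\t2\n"
)
rows = read_mapping(with_rule)
legacy = read_mapping(without_rule)
print(rows[0].rule, legacy[0].rule)
```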
23 changes: 21 additions & 2 deletions gtdbtk/main.py
@@ -47,7 +47,7 @@
from gtdbtk.model.enum import Domain
from gtdbtk.pipeline.export_msa import export_msa
from gtdbtk.reroot_tree import RerootTree
from gtdbtk.tools import symlink_f, get_reference_ids
from gtdbtk.tools import symlink_f, get_reference_ids, confirm


class OptionsParser(object):
@@ -700,7 +700,14 @@ def parse_options(self, options):
                raise GTDBTkExit("When running de_novo_wf, The '--skip_gtdb_refs' flag requires"
                                 "'--custom_taxonomy_file' to be included to the command line.")

            #options.write_single_copy_genes = False
            if options.write_single_copy_genes and not options.keep_intermediates:
                self.logger.warning('--write_single_copy_genes flag is set to True,'
                                    ' but --keep_intermediates is set to False. '
                                    'The intermediate folder containing the single copy genes will be removed.')
                if not confirm('Do you want to proceed?'):
                    self.logger.info('Exiting workflow.')
                    sys.exit(0)

            self.identify(options)

            options.identify_dir = options.out_dir
@@ -794,7 +801,19 @@ def parse_options(self, options):
            check_dependencies(['prodigal', 'hmmalign', 'pplacer', 'guppy',
                                'fastANI'])



            if options.write_single_copy_genes and not options.keep_intermediates:
                self.logger.warning('--write_single_copy_genes flag is set to True,'
                                    ' but --keep_intermediates is set to False. '
                                    'The intermediate folder containing the single copy genes will be removed.')
                if not confirm('Do you want to proceed?'):
                    self.logger.info('Exiting workflow.')
                    sys.exit(0)


            #options.write_single_copy_genes = False

            self.identify(options)

            options.identify_dir = options.out_dir
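
The same warning-and-confirm block is added to both workflows, and the body of `confirm` in `gtdbtk.tools` is not part of the rendered diff. The sketch below is therefore an assumption about how such a check could be written once and shared, not the project's implementation. The `confirm` shown here, the `should_proceed` helper and the injectable `input_fn` parameter are all hypothetical.

```python
def confirm(prompt, input_fn=input):
    """Ask a yes/no question until a recognisable answer is given.

    Hypothetical stand-in for gtdbtk.tools.confirm; an empty answer counts as 'no'.
    """
    while True:
        answer = input_fn(f"{prompt} [y/N]: ").strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("", "n", "no"):
            return False


def should_proceed(write_single_copy_genes, keep_intermediates, input_fn=input):
    """Return False only when the user declines the conflicting flag combination."""
    if write_single_copy_genes and not keep_intermediates:
        return confirm("Do you want to proceed?", input_fn)
    return True


# Scripted answers replace interactive input so the sketch runs unattended.
answers = iter(["maybe", "y"])
accepted = should_proceed(True, False, input_fn=lambda _prompt: next(answers))
declined = should_proceed(True, False, input_fn=lambda _prompt: "n")
skipped = should_proceed(True, True, input_fn=lambda _prompt: "n")
print(accepted, declined, skipped)
```

One design point to weigh: an interactive prompt blocks when the workflow runs without a terminal, for example under a job scheduler, so a non-interactive fallback may be worth considering.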