OrthoFinder generates BLAST files with NUL characters and is unhappy #663

Closed
ltsypin opened this issue Jan 26, 2022 · 7 comments

@ltsypin

ltsypin commented Jan 26, 2022

Hello! I'm trying to run the tutorial on the example data set, and I'm getting the following error message:

ERROR: Blast0_0.txt is corrupted
Malformatted line in /Users/eukarya/Downloads/OrthoFinder/ExampleData/OrthoFinder/Results_Jan25/WorkingDirectory/Blast0_0.txt
Offending line was:

ERROR: Error processing files Blast0_*
Process Process-19:
Traceback (most recent call last):
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/__main__.py", line 529, in Worker_ProcessBlastHits
    WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/__main__.py", line 516, in ProcessBlastHits
    Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
    for row in blastreader:
_csv.Error: line contains NUL
ERROR: Blast1_0.txt is corrupted
Malformatted line in /Users/eukarya/Downloads/OrthoFinder/ExampleData/OrthoFinder/Results_Jan25/WorkingDirectory/Blast1_0.txt
Offending line was:

ERROR: Error processing files Blast1_*
Process Process-18:
Traceback (most recent call last):
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/__main__.py", line 529, in Worker_ProcessBlastHits
    WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/__main__.py", line 516, in ProcessBlastHits
    Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
    for row in blastreader:
_csv.Error: line contains NUL
ERROR: An error occurred, ***please review the error messages*** they may contain useful information about the problem.

Looking more closely at the BLAST files, I see that they have a lot of '\x00' NUL characters at their start. I tried to fix this by replacing the offending files with unzipped copies that I cleaned with a bash one-liner (for f in *.txt; do tr < $f -d '\x00' > ../$f; done). (Note: I ran this from a sub-directory of backup files, which is why the output goes to the directory above.) This didn't work either, giving me:

ERROR: Query or hit sequence ID in BLAST results file was missing or incorrectly formatted.
ERROR: Blast0_0.txt is corrupted
Malformatted line in /Users/eukarya/Downloads/OrthoFinder/ExampleData/OrthoFinder/Results_Jan25/WorkingDirectory/Blast0_0.txt
Offending line was:
0	1	342	1	342	1.3e-197	680.6
ERROR: Error processing files Blast0_*
Process Process-2:

ERROR: Query or hit sequence ID in BLAST results file was missing or incorrectly formatted.
ERROR: Blast1_0.txt is corrupted
Malformatted line in /Users/eukarya/Downloads/OrthoFinder/ExampleData/OrthoFinder/Results_Jan25/WorkingDirectory/Blast1_0.txt
Offending line was:
.1e-18	87.0
ERROR: Error processing files Blast1_*
Process Process-3:
Traceback (most recent call last):
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/__main__.py", line 529, in Worker_ProcessBlastHits
    WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/__main__.py", line 516, in ProcessBlastHits
    Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/blast_file_processor.py", line 68, in GetBLAST6Scores
    sequence1ID = int(row[iQ].split(sep, 2)[1])
IndexError: list index out of range
Traceback (most recent call last):
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/__main__.py", line 529, in Worker_ProcessBlastHits
    WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/__main__.py", line 516, in ProcessBlastHits
    Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/blast_file_processor.py", line 68, in GetBLAST6Scores
    sequence1ID = int(row[iQ].split(sep, 2)[1])
IndexError: list index out of range
ERROR: An error occurred, ***please review the error messages*** they may contain useful information about the problem.
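The NUL inspection and clean-up described above can also be sketched in Python. This is only a minimal illustration: the function names `nul_summary` and `strip_nuls` are hypothetical and are not part of OrthoFinder. It counts NUL bytes in a file and writes a copy without them.

```python
from pathlib import Path


def nul_summary(path):
    """Return (leading NUL bytes, total NUL bytes) for the file at `path`."""
    data = Path(path).read_bytes()
    leading = len(data) - len(data.lstrip(b"\x00"))
    return leading, data.count(b"\x00")


def strip_nuls(src, dst):
    """Write a copy of `src` to `dst` with every NUL byte removed."""
    data = Path(src).read_bytes()
    Path(dst).write_bytes(data.replace(b"\x00", b""))


if __name__ == "__main__":
    import sys

    # Usage: python check_nuls.py Blast0_0.txt [more files...]
    for name in sys.argv[1:]:
        leading, total = nul_summary(name)
        print(f"{name}: {leading} leading NUL bytes, {total} in total")
```

Removing the NUL bytes only makes the file readable again. As the second error output suggests (an offending line starting with `.1e-18`), it does not restore records that were truncated or overwritten, so regenerating the BLAST files is likely the safer route.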

I'm stumped and would appreciate any help. In case it's helpful, my anaconda environment has the following packages and versions:

anndata                   0.7.8                    pypi_0    pypi
anyio                     2.2.0            py39hecd8cb5_1  
appdirs                   1.4.4              pyhd3eb1b0_0  
appnope                   0.1.2           py39hecd8cb5_1001  
argon2-cffi               20.1.0           py39h9ed2024_1  
arpack                    3.7.0                hefb7bc6_2    conda-forge
arviz                     0.11.4                   pypi_0    pypi
async_generator           1.10               pyhd3eb1b0_0  
attrs                     21.4.0             pyhd3eb1b0_0  
babel                     2.9.1              pyhd3eb1b0_0  
backcall                  0.2.0              pyhd3eb1b0_0  
beautifulsoup4            4.10.0             pyh06a4308_0  
biopython                 1.79             py39h89e85a6_1    conda-forge
black                     21.12b0                  pypi_0    pypi
blackcellmagic            0.0.3                    pypi_0    pypi
blas                      1.0                         mkl  
blast                     2.12.0          pl5262h78c34c6_0    bioconda
bleach                    4.1.0              pyhd3eb1b0_0  
bokeh                     2.3.3            py39hecd8cb5_0  
bottleneck                1.3.2            py39he3068b8_1  
brotli                    1.0.9                hb1e8313_2  
brotlipy                  0.7.0           py39h9ed2024_1003  
bzip2                     1.0.8                h1de35cc_0  
c-ares                    1.18.1               hca72f7f_0  
ca-certificates           2021.10.26           hecd8cb5_2  
certifi                   2021.10.8        py39hecd8cb5_2  
cffi                      1.15.0           py39hc55c11b_1  
cftime                    1.5.1.1          py39h67323c0_0  
charset-normalizer        2.0.4              pyhd3eb1b0_0  
click                     8.0.3              pyhd3eb1b0_0  
cloudpickle               2.0.0              pyhd3eb1b0_0  
colorcet                  2.0.6              pyhd3eb1b0_0  
cryptography              36.0.0           py39hf6deb26_0  
curl                      7.80.0               hca72f7f_0  
cycler                    0.11.0             pyhd3eb1b0_0  
cython                    0.29.25          py39he9d5cce_0  
cytoolz                   0.11.0           py39h9ed2024_0  
dask                      2021.10.0          pyhd3eb1b0_0  
dask-core                 2021.10.0          pyhd3eb1b0_0  
datashader                0.13.0             pyhd3eb1b0_1  
datashape                 0.5.4            py39hecd8cb5_1  
dbus                      1.13.6               ha13b53f_2    conda-forge
debugpy                   1.5.1            py39he9d5cce_0  
decorator                 5.1.0              pyhd3eb1b0_0  
defusedxml                0.7.1              pyhd3eb1b0_0  
diamond                   0.9.19               hfd59bb5_5    bioconda
dill                      0.3.4                    pypi_0    pypi
distributed               2021.10.0        py39hecd8cb5_0  
entrez-direct             16.2                 h193322a_0    bioconda
entrypoints               0.3              py39hecd8cb5_0  
expat                     2.4.3                he49afe7_0    conda-forge
fastme                    2.1.6.1              hb4d813b_0    bioconda
fastparquet               0.5.0            py39he3068b8_1  
fasttree                  2.1.10               hb4d813b_5    bioconda
ffmpeg                    4.2.2                h97e5cf8_0  
fonttools                 4.25.0             pyhd3eb1b0_0  
freetype                  2.11.0               hd8bbffd_0  
fsspec                    2022.1.0           pyhd3eb1b0_0  
gawk                      5.1.0                h8a0c380_0  
gettext                   0.21.0               h7535e17_0  
giflib                    5.2.1                haf1e3a3_0  
glib                      2.68.4               he49afe7_0    conda-forge
glib-tools                2.68.4               he49afe7_0    conda-forge
glpk                      4.65              h0f52abe_1004    conda-forge
gmp                       6.2.1                h23ab428_2  
gnutls                    3.6.15               hed9c0bf_0  
h5py                      3.6.0                    pypi_0    pypi
hdf4                      4.2.13               h39711bb_2  
hdf5                      1.10.6          nompi_hc5d9132_1114    conda-forge
heapdict                  1.0.1              pyhd3eb1b0_0  
holoviews                 1.14.6             pyhd3eb1b0_1  
html5lib                  1.1                pyhd3eb1b0_0  
hvplot                    0.7.3              pyhd3eb1b0_1  
icu                       69.1                 he49afe7_0    conda-forge
idna                      3.3                pyhd3eb1b0_0  
igraph                    0.9.6                ha5be149_0    conda-forge
imageio                   2.9.0              pyhd3eb1b0_0  
importlib-metadata        4.8.2            py39hecd8cb5_0  
importlib_metadata        4.8.2                hd3eb1b0_0  
intel-openmp              2021.4.0          hecd8cb5_3538  
ipykernel                 6.4.1            py39hecd8cb5_1  
ipython                   7.29.0           py39h01d92e1_0  
ipython_genutils          0.2.0              pyhd3eb1b0_1  
ipywidgets                7.6.5              pyhd3eb1b0_1  
iqplot                    0.2.4                    pypi_0    pypi
iqtree                    2.1.4_beta           h4de6764_0    bioconda
jbig                      2.1               h0d85af4_2003    conda-forge
jedi                      0.18.0           py39hecd8cb5_1  
jinja2                    3.0.2              pyhd3eb1b0_0  
joblib                    1.1.0              pyhd3eb1b0_0  
jpeg                      9d                   h9ed2024_0  
json5                     0.9.6              pyhd3eb1b0_0  
jsonschema                3.2.0              pyhd3eb1b0_2  
jupyter                   1.0.0            py39hecd8cb5_7  
jupyter_bokeh             3.0.4                      py_0    bokeh
jupyter_client            7.1.0              pyhd3eb1b0_0  
jupyter_console           6.4.0              pyhd3eb1b0_0  
jupyter_core              4.9.1            py39hecd8cb5_0  
jupyter_server            1.4.1            py39hecd8cb5_0  
jupyterlab                3.2.1              pyhd3eb1b0_1  
jupyterlab-spellchecker   0.7.2                    pypi_0    pypi
jupyterlab_pygments       0.1.2                      py_0  
jupyterlab_server         2.10.2             pyhd3eb1b0_1  
jupyterlab_widgets        1.0.0              pyhd3eb1b0_1  
jupytext                  1.13.6                   pypi_0    pypi
kiwisolver                1.3.1            py39h23ab428_0  
krb5                      1.19.2               hcd88c3b_0  
lame                      3.100                h1de35cc_0  
lcms2                     2.12                 hf1fd2bf_0  
leidenalg                 0.8.8            py39h9fcab8e_1    conda-forge
lerc                      3.0                  he49afe7_0    conda-forge
libblas                   3.9.0              12_osx64_mkl    conda-forge
libcblas                  3.9.0              12_osx64_mkl    conda-forge
libclang                  13.0.0          default_he082bbe_0    conda-forge
libcurl                   7.80.0               h6dfd666_0  
libcxx                    12.0.1               habf9029_1    conda-forge
libdeflate                1.8                  h0d85af4_0    conda-forge
libedit                   3.1.20210910         hca72f7f_0  
libev                     4.33                 h9ed2024_1  
libffi                    3.3                  hb1e8313_2  
libgfortran               5.0.0           9_3_0_h6c81a4c_23    conda-forge
libgfortran5              9.3.0               h6c81a4c_23    conda-forge
libglib                   2.68.4               hd556434_0    conda-forge
libiconv                  1.16                 h1de35cc_0  
libidn2                   2.3.2                h9ed2024_0  
liblapack                 3.9.0              12_osx64_mkl    conda-forge
libllvm10                 10.0.1               h76017ad_5  
libllvm13                 13.0.0               hd011deb_0    conda-forge
libnetcdf                 4.6.1                hfd9a460_4  
libnghttp2                1.46.0               ha29bfda_0  
libopus                   1.3.1                h1de35cc_0  
libpng                    1.6.37               ha441bb4_0  
libpq                     14.1                 hea3049e_1    conda-forge
libsodium                 1.0.18               h1de35cc_0  
libssh2                   1.9.0                ha12b0ac_1  
libtasn1                  4.16.0               h9ed2024_0  
libtiff                   4.3.0                hd146c10_2    conda-forge
libunistring              0.9.10               h9ed2024_0  
libuv                     1.42.0               h0d85af4_0    conda-forge
libvpx                    1.7.0                h378b8a2_0  
libwebp                   1.2.0                hacca55c_0  
libwebp-base              1.2.0                h9ed2024_0  
libxml2                   2.9.12               h7e28ab6_1    conda-forge
libxslt                   1.1.33               h1acebb3_3    conda-forge
libzlib                   1.2.11            h9173be1_1013    conda-forge
llvm-openmp               12.0.0               h0dcd299_1  
llvmlite                  0.36.0           py39he4411ff_4  
locket                    0.2.1            py39hecd8cb5_1  
lxml                      4.7.1            py39hf41e7f8_0    conda-forge
lz4-c                     1.9.3                h23ab428_1  
mafft                     7.490                hb4d813b_0    bioconda
markdown                  3.3.4            py39hecd8cb5_0  
markdown-it-py            1.1.0                    pypi_0    pypi
markupsafe                2.0.1            py39h9ed2024_0  
matplotlib                3.5.0            py39hecd8cb5_0  
matplotlib-base           3.5.0            py39h4f681db_0  
matplotlib-inline         0.1.2              pyhd3eb1b0_2  
mcl                       14.137          pl5262hb4d813b_6    bioconda
mdit-py-plugins           0.3.0                    pypi_0    pypi
metis                     5.1.0             h2e338ed_1006    conda-forge
mistune                   0.8.4           py39h9ed2024_1000  
mkl                       2021.4.0           hecd8cb5_637  
mkl-service               2.4.0            py39h9ed2024_0  
mkl_fft                   1.3.1            py39h4ab4a9b_0  
mkl_random                1.2.2            py39hb2f4e1b_0  
mmseqs2                   13.45111             h14b862d_1    bioconda
mpfr                      4.1.0                h0f52abe_1    conda-forge
mpi                       1.0                     openmpi    conda-forge
msgpack-python            1.0.2            py39hf7b0b51_1  
multipledispatch          0.6.0            py39hecd8cb5_0  
multiprocess              0.70.12.2                pypi_0    pypi
munkres                   1.1.4                      py_0  
muscle                    3.8.1551             hb280591_6    bioconda
mypy_extensions           0.4.3            py39hecd8cb5_1  
mysql-common              8.0.28               h694c41f_0    conda-forge
mysql-libs                8.0.28               h115446f_0    conda-forge
natsort                   8.0.2                    pypi_0    pypi
nbclassic                 0.2.6              pyhd3eb1b0_0  
nbclient                  0.5.3              pyhd3eb1b0_0  
nbconvert                 6.3.0            py39hecd8cb5_0  
nbformat                  5.1.3              pyhd3eb1b0_0  
ncurses                   6.3                  hca72f7f_2  
nest-asyncio              1.5.1              pyhd3eb1b0_0  
netcdf4                   1.5.7            py39h93ad9c5_0  
nettle                    3.7.3                h230ac6f_1  
networkx                  2.6.3              pyhd3eb1b0_0  
nodejs                    17.1.0               h0da8292_2    conda-forge
notebook                  6.4.6            py39hecd8cb5_0  
nspr                      4.32                 hcd9eead_1    conda-forge
nss                       3.74                 h31e2bf1_0    conda-forge
numba                     0.53.0           py39he2616bd_0    conda-forge
numexpr                   2.8.1            py39h2e5f0a9_0  
numpy                     1.21.2           py39h4b4dc7a_0  
numpy-base                1.21.2           py39he0bd621_0  
olefile                   0.46               pyhd3eb1b0_0  
openh264                  2.1.1                h8346a28_0  
openmpi                   4.1.2                hd3cd54c_0    conda-forge
openssl                   1.1.1m               hca72f7f_0  
orthofinder               2.5.4                hdfd78af_0    bioconda
packaging                 21.3               pyhd3eb1b0_0  
pandas                    1.3.5            py39h743cdd8_0  
pandocfilters             1.4.3            py39hecd8cb5_1  
panel                     0.12.1             pyhd3eb1b0_0  
param                     1.12.0             pyhd3eb1b0_0  
parso                     0.8.3              pyhd3eb1b0_0  
partd                     1.2.0              pyhd3eb1b0_0  
pathspec                  0.9.0                    pypi_0    pypi
pcre                      8.45                 he49afe7_0    conda-forge
perl                      5.26.2               h4e221da_0  
perl-archive-tar          2.32                    pl526_0    bioconda
perl-carp                 1.38                    pl526_3    bioconda
perl-common-sense         3.74                    pl526_2    bioconda
perl-compress-raw-bzip2   2.087           pl526h6de7cb9_0    bioconda
perl-compress-raw-zlib    2.087           pl526h770b8ee_0    bioconda
perl-exporter             5.72                    pl526_1    bioconda
perl-exporter-tiny        1.002001                pl526_0    bioconda
perl-extutils-makemaker   7.36                    pl526_1    bioconda
perl-io-compress          2.087           pl526h6de7cb9_0    bioconda
perl-io-zlib              1.10                    pl526_2    bioconda
perl-json                 4.02                    pl526_0    bioconda
perl-json-xs              2.34            pl526h04f5b5a_3    bioconda
perl-list-moreutils       0.428                   pl526_1    bioconda
perl-list-moreutils-xs    0.428                   pl526_0    bioconda
perl-pathtools            3.75            pl526h1de35cc_1    bioconda
perl-scalar-list-utils    1.52            pl526h01d97ff_0    bioconda
perl-types-serialiser     1.0                     pl526_2    bioconda
perl-xsloader             0.24                    pl526_0    bioconda
pexpect                   4.8.0              pyhd3eb1b0_3  
pickleshare               0.7.5           pyhd3eb1b0_1003  
pillow                    8.4.0            py39h98e4679_0  
pip                       21.2.4           py39hecd8cb5_0  
platformdirs              2.4.1                    pypi_0    pypi
prometheus_client         0.12.0             pyhd3eb1b0_0  
prompt-toolkit            3.0.20             pyhd3eb1b0_0  
prompt_toolkit            3.0.20               hd3eb1b0_0  
psutil                    5.8.0            py39h9ed2024_1  
ptyprocess                0.7.0              pyhd3eb1b0_2  
pycparser                 2.21               pyhd3eb1b0_0  
pyct                      0.4.6            py39hecd8cb5_0  
pygments                  2.10.0             pyhd3eb1b0_0  
pymde                     0.1.13           py39h89e85a6_0    conda-forge
pynndescent               0.5.4              pyhd3eb1b0_0  
pyopenssl                 21.0.0             pyhd3eb1b0_1  
pyparsing                 3.0.4              pyhd3eb1b0_0  
pyqt                      5.12.3           py39h6e9494a_8    conda-forge
pyqt-impl                 5.12.3           py39he44290a_8    conda-forge
pyqt5-sip                 4.19.18          py39h15fb055_8    conda-forge
pyqtchart                 5.12             py39he44290a_8    conda-forge
pyqtwebengine             5.12.1           py39he44290a_8    conda-forge
pyrsistent                0.18.0           py39hca72f7f_0  
pysocks                   1.7.1            py39hecd8cb5_0  
python                    3.9.7                h88f2d9e_1  
python-dateutil           2.8.2              pyhd3eb1b0_0  
python-igraph             0.9.9            py39h8c2f370_0    conda-forge
python_abi                3.9                      2_cp39    conda-forge
pytorch                   1.10.1                  py3.9_0    pytorch
pytz                      2021.3             pyhd3eb1b0_0  
pyviz_comms               2.0.2              pyhd3eb1b0_0  
pywavelets                1.1.1            py39he3068b8_4  
pyyaml                    6.0              py39hca72f7f_1  
pyzmq                     22.3.0           py39he9d5cce_2  
qt                        5.12.9               h2a607e2_5    conda-forge
qtconsole                 5.1.1              pyhd3eb1b0_0  
qtpy                      1.10.0             pyhd3eb1b0_0  
raxml                     8.2.12               hb4d813b_3    bioconda
raxml-ng                  1.1.0                habac362_0    bioconda
readline                  8.1.2                hca72f7f_1  
regex                     2021.11.2        py39hca72f7f_0  
requests                  2.27.1             pyhd3eb1b0_0  
scikit-image              0.16.2           py39hb2f4e1b_0  
scikit-learn              1.0.2            py39hae1ba45_1  
scipy                     1.7.3            py39h056f1c0_0    conda-forge
seaborn                   0.11.2             pyhd3eb1b0_0  
selenium                  3.141.0         py39h9ed2024_1000  
send2trash                1.8.0              pyhd3eb1b0_1  
setuptools                58.0.4           py39hecd8cb5_0  
sip                       4.19.13          py39h23ab428_0  
six                       1.16.0             pyhd3eb1b0_0  
sniffio                   1.2.0            py39hecd8cb5_1  
sortedcontainers          2.4.0              pyhd3eb1b0_0  
soupsieve                 2.3.1              pyhd3eb1b0_0  
sqlite                    3.37.0               h707629a_0  
suitesparse               5.10.1               h7aff33d_1    conda-forge
tbb                       2021.5.0             haf03e11_0  
tblib                     1.7.0              pyhd3eb1b0_0  
terminado                 0.9.4            py39hecd8cb5_0  
testpath                  0.5.0              pyhd3eb1b0_0  
texttable                 1.6.4              pyhd8ed1ab_0    conda-forge
threadpoolctl             2.2.0              pyh0d69192_0  
thrift                    0.11.0           py39h23ab428_0  
tk                        8.6.11               h7bc2e8c_0  
toml                      0.10.2             pyhd3eb1b0_0  
tomli                     1.2.3                    pypi_0    pypi
toolz                     0.11.2             pyhd3eb1b0_0  
torchvision               0.11.2                 py39_cpu    pytorch
tornado                   6.1              py39h9ed2024_0  
tqdm                      4.62.3             pyhd3eb1b0_1  
traitlets                 5.1.1              pyhd3eb1b0_0  
typed-ast                 1.4.3            py39h9ed2024_1  
typing-extensions         3.10.0.2             hd3eb1b0_0  
typing_extensions         3.10.0.2           pyh06a4308_0  
tzdata                    2021e                hda174b7_0  
ujson                     4.2.0            py39he9d5cce_0  
umap-learn                0.5.2            py39h6e9494a_1    conda-forge
urllib3                   1.26.7             pyhd3eb1b0_0  
watermark                 2.3.0                    pypi_0    pypi
wcwidth                   0.2.5              pyhd3eb1b0_0  
webencodings              0.5.1            py39hecd8cb5_1  
wget                      1.20.1               h051b688_0  
wheel                     0.37.1             pyhd3eb1b0_0  
widgetsnbextension        3.5.1            py39hecd8cb5_0  
x264                      1!157.20191217       h1de35cc_0  
xarray                    0.20.1             pyhd3eb1b0_1  
xlrd                      1.2.0                    pypi_0    pypi
xz                        5.2.5                h1de35cc_0  
yaml                      0.2.5                haf1e3a3_0  
zeromq                    4.3.4                h23ab428_0  
zict                      2.0.0              pyhd3eb1b0_0  
zipp                      3.7.0              pyhd3eb1b0_0  
zlib                      1.2.11            h9173be1_1013    conda-forge
zstd                      1.5.2                h582d3a0_0    conda-forge
@davidemms
Owner

Hi

That is strange; even the corrected line looks too short. A normal line (from a different dataset) would look like this:

0_1     1_26207 70.1    147     43      1       6       152     8       153     2.6e-48 188.7
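A quick way to compare a suspect line against this shape is sketched below. It assumes the standard 12-column tabular BLAST layout and IDs of the form `<species>_<sequence>`, as in the example line. The helper `parse_blast_ids` and the `_` separator are assumptions made for illustration; this is not OrthoFinder's own parser.

```python
def parse_blast_ids(line, sep="_"):
    """Return ((species, seq), (species, seq)) for the query and hit IDs of one line.

    Raises ValueError if the line does not have 12 columns or if either ID
    is not of the form <species><sep><sequence> with integer parts.
    """
    # Real files are tab-separated; split() also copes with space-padded copies.
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, found {len(fields)}")
    ids = []
    for token in fields[:2]:
        species, _, seq = token.partition(sep)
        if not (species.isdigit() and seq.isdigit()):
            raise ValueError(f"ID {token!r} is not of the form <species>{sep}<sequence>")
        ids.append((int(species), int(seq)))
    return tuple(ids)
```

The cleaned line reported earlier (`0	1	342	1	342	1.3e-197	680.6`) has only seven columns and no `_` in its first field. If the separator is indeed `_`, then `"0".split("_", 2)[1]` raises `IndexError`, which would match the traceback from `GetBLAST6Scores`.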

From your conda environment, it's quite an old version of diamond that you are using, so the first thing to try would be to update it and see if that resolves the problem. Let me know.

Best wishes
David

@ltsypin
Author

ltsypin commented Feb 1, 2022

Thanks for your reply! Updating diamond helped, but now I'm getting another error:

Reconciling gene trees and species tree
---------------------------------------
Outgroup: Mycoplasma_hyopneumoniae
2022-02-01 15:22:03 : Starting Recon and orthologues
2022-02-01 15:22:03 : Starting OF Orthologues
Traceback (most recent call last):
  File "/Users/eukarya/miniconda3/envs/tgne/bin/orthofinder", line 7, in <module>
    main(args)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/__main__.py", line 1778, in main
    GetOrthologues(speciesInfoObj, options, prog_caller)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/__main__.py", line 1540, in GetOrthologues
    orthologues.OrthologuesWorkflow(speciesInfoObj.speciesToUse, 
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/orthologues.py", line 1090, in OrthologuesWorkflow
    ReconciliationAndOrthologues(recon_method, db.ogSet, nHighParallel, nLowParallel, i if qMultiple else None, stride_dups=stride_dups, q_split_para_clades=q_split_para_clades) 
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/orthologues.py", line 870, in ReconciliationAndOrthologues
    nOrthologues_SpPair = trees2ologs_of.DoOrthologuesForOrthoFinder(ogSet, species_tree_rooted_labelled, trees2ologs_of.GeneToSpecies_dash, 
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/trees2ologs_of.py", line 1123, in DoOrthologuesForOrthoFinder
    nOrthologues_SpPair = RunOrthologsParallel(ta, len(ogSet.speciesToUse), args_queue, n_parallel)
  File "/Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/trees2ologs_of.py", line 1276, in RunOrthologsParallel
    proc.start()
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/eukarya/miniconda3/envs/tgne/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object

I tried installing dill into the environment, in case something was up with pickle in Python 3.9, but that didn't change anything... What do you think is the next step?

@ltsypin
Author

ltsypin commented Feb 2, 2022

In case it helps, I'm trying to run this on Mac OS 11.6.2.

@davidemms
Owner

Hi

It sounds like this is to do with multiprocessing on Mac with python 3.9, and possibly 3.8 too. There's a similar issue reported here: GoogleCloudPlatform/gsutil#961

I think if you can install python 3.7 in your conda environment then that should resolve the issue for now. I will see what I can do to get a more permanent fix for Mac.
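For background, since Python 3.8 the default multiprocessing start method on macOS is "spawn", which pickles the Process object before handing it to the child. An open text file cannot be pickled, and that is the `TypeError` in the traceback above. The sketch below reproduces just that failure outside OrthoFinder. The `can_pickle` and `demo` helpers and the dictionary payloads are hypothetical stand-ins for whatever the worker actually receives.

```python
import os
import pickle


def can_pickle(obj):
    """Return True if `obj` survives pickle.dumps, False if pickling fails."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


def demo():
    """Compare a payload holding an open file handle with one holding a path."""
    with open(os.devnull, "w") as handle:
        # An open text file is an _io.TextIOWrapper, which pickle rejects.
        with_handle = can_pickle({"log": handle})
    # A plain path string pickles fine; the child can reopen the file itself.
    with_path = can_pickle({"log": os.devnull})
    return with_handle, with_path


if __name__ == "__main__":
    print(demo())
```

Under "fork", which was the macOS default before Python 3.8, the child inherits the parent's memory and nothing is pickled at start-up. That would be consistent with the suggestion that Python 3.7 avoids the error.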

Best wishes
David

@davidemms
Owner

Hi

It turned out this was a problem that had occurred before, but the fix was (accidentally) not being used. I've submitted a change to the master branch on GitHub, but in the meantime you can change line 42 of your file /Users/eukarya/miniconda3/envs/tgne/bin/scripts_of/__main__.py to if platform.system() == "Darwin": and it should work.
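The thread does not show what the guarded statement on line 42 does. A common pattern for this kind of macOS fix is to request the "fork" start method, and the sketch below illustrates that pattern only as an assumption. The helper names `preferred_start_method` and `get_mp_context` are hypothetical, and the actual OrthoFinder code may differ.

```python
import multiprocessing as mp
import platform


def preferred_start_method(system=None):
    """Return "fork" on macOS (Darwin), otherwise None for the platform default."""
    system = platform.system() if system is None else system
    return "fork" if system == "Darwin" else None


def get_mp_context(system=None):
    """Return a multiprocessing context using the preferred start method."""
    # get_context(None) returns the default context for the current platform.
    return mp.get_context(preferred_start_method(system))


if __name__ == "__main__":
    ctx = get_mp_context()
    print("start method:", ctx.get_start_method())
```

Using a context object avoids changing the global start method for other code in the same process. Note that the Python documentation describes "fork" on macOS as potentially unsafe, so this is a workaround rather than a general recommendation.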

Best wishes
David

@ltsypin
Author

ltsypin commented Feb 11, 2022

I'll give this a shot! Thanks a lot :-)

@aberaslop

Dear David,
I hope you are doing well. As always, thank you for this great software.
I am encountering a problem similar to ltsypin's. It is new to me; I have not seen it in my other runs. This time I am running OrthoFinder on sets of filtered proteins (so the FASTA files have been manipulated and are not the direct output of annotation software).
I am running OrthoFinder version 2.5.4 on a Linux server.

This is the error:
Running OrthoFinder algorithm

2023-02-19 20:29:28 : Initial processing of each species
ERROR: Blast0_0.txt is corrupted
Malformatted line in /nfs/wsi/tgm/projects/aileen/fusarium/rowena/orthoparser/CSEPS/OrthoFinder/Results_Feb19/WorkingDirectory/Blast0_0.txt
Offending line was:

ERROR: Error processing files Blast0_*
Process Process-50:
ERROR: Blast1_0.txt is corrupted
Malformatted line in /nfs/wsi/tgm/projects/aileen/fusarium/rowena/orthoparser/CSEPS/OrthoFinder/Results_Feb19/WorkingDirectory/Blast1_0.txt
Offending line was:

ERROR: Error processing files Blast1_*
Process Process-51:
Traceback (most recent call last):
File "/home/berasate/anaconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/berasate/anaconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/main.py", line 529, in Worker_ProcessBlastHits
WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/main.py", line 516, in ProcessBlastHits
Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
for row in blastreader:
_csv.Error: line contains NUL
Traceback (most recent call last):
File "/home/berasate/anaconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/berasate/anaconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/main.py", line 529, in Worker_ProcessBlastHits
WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/main.py", line 516, in ProcessBlastHits
Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
for row in blastreader:
_csv.Error: line contains NUL
ERROR: Blast2_0.txt is corrupted
Malformatted line in /nfs/wsi/tgm/projects/aileen/fusarium/rowena/orthoparser/CSEPS/OrthoFinder/Results_Feb19/WorkingDirectory/Blast2_0.txt
Offending line was:

ERROR: Error processing files Blast2_*
Process Process-52:
Traceback (most recent call last):
File "/home/berasate/anaconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/berasate/anaconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/main.py", line 529, in Worker_ProcessBlastHits
WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/main.py", line 516, in ProcessBlastHits
Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
for row in blastreader:
_csv.Error: line contains NUL
ERROR: Blast3_0.txt is corrupted
Malformatted line in /nfs/wsi/tgm/projects/aileen/fusarium/rowena/orthoparser/CSEPS/OrthoFinder/Results_Feb19/WorkingDirectory/Blast3_0.txt
Offending line was:

ERROR: Error processing files Blast3_*
Process Process-53:
ERROR: Blast4_0.txt is corrupted
Malformatted line in /nfs/wsi/tgm/projects/aileen/fusarium/rowena/orthoparser/CSEPS/OrthoFinder/Results_Feb19/WorkingDirectory/Blast4_0.txt
Offending line was:

ERROR: Error processing files Blast4_*
Process Process-54:
ERROR: Blast5_0.txt is corrupted
Malformatted line in /nfs/wsi/tgm/projects/aileen/fusarium/rowena/orthoparser/CSEPS/OrthoFinder/Results_Feb19/WorkingDirectory/Blast5_0.txt
Offending line was:

ERROR: Error processing files Blast5_*
Process Process-55:
Traceback (most recent call last):
File "/home/berasate/anaconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/berasate/anaconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/main.py", line 529, in Worker_ProcessBlastHits
WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/main.py", line 516, in ProcessBlastHits
Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
for row in blastreader:
_csv.Error: line contains NUL
Traceback (most recent call last):
File "/home/berasate/anaconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/berasate/anaconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/main.py", line 529, in Worker_ProcessBlastHits
WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/main.py", line 516, in ProcessBlastHits
Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
for row in blastreader:
_csv.Error: line contains NUL
Traceback (most recent call last):
File "/home/berasate/anaconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/berasate/anaconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/main.py", line 529, in Worker_ProcessBlastHits
WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/main.py", line 516, in ProcessBlastHits
Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
File "/home/berasate/anaconda3/envs/orthofinder/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
for row in blastreader:
_csv.Error: line contains NUL
ERROR: An error occurred, please review the error messages they may contain useful information about the problem.

I have inspected the BLAST files, and the first line does indeed appear corrupted, with lots of "^@^@^@^@^@^@^@^@^@^@^@^". I tried deleting the first line and restarting OrthoFinder, but then the error is that it cannot find "0_0". I also tried cleaning the files using the strategy above, "for f in *.txt; do tr < $f -d '@' > ../$f; done", but it does not work. When I do grep "@" it returns 0.

I have also checked the fix suggested above, changing line 42 of __main__.py to if platform.system() == "Darwin":, but in my file it already reads that way.

Any pointers on how to fix this would be greatly appreciated! All the best,

L.

Blast9_9.txt
Blast0_0.txt
example_input_file_cseps.faa.txt
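
A note on the cleanup attempt above: "^@" is how many editors and pagers display a NUL byte (0x00), so tr -d '@' deletes only literal at-signs and leaves the NULs in place, which is consistent with grep "@" finding nothing. The shell equivalent that targets NUL would be tr -d '\000'. Below is a minimal Python sketch of the same idea. strip_nul_bytes is a hypothetical helper, not part of OrthoFinder, and the thread does not establish whether a file cleaned this way is complete enough for OrthoFinder to process.

```python
from pathlib import Path


def strip_nul_bytes(src, dst):
    """Copy src to dst without NUL (0x00) bytes and return how many were removed."""
    # Read as bytes so no text decoding is applied to the corrupted content.
    data = Path(src).read_bytes()
    cleaned = data.replace(b"\x00", b"")
    Path(dst).write_bytes(cleaned)
    return len(data) - len(cleaned)
```

The function reads the whole file into memory, which may not suit very large BLAST outputs. Writing to a separate destination keeps the original file available for inspection. Removing the NULs does not restore any hits that were lost when the file was written, so rerunning the affected searches may still be necessary.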
