Mikado.test() fails: how to resolve the "sqlite3.OperationalError: disk I/O error" #172

Closed
zebrafish-507 opened this issue May 2, 2019 · 14 comments
@zebrafish-507

I installed Mikado and all its dependencies using Anaconda, then tested the installed modules with:

import Mikado
Mikado.test()
The error message I always get is:
>>> import Mikado
/newlustre/home/longyong/anaconda3/lib/python3.6/site-packages/Mikado/configuration/configurator.py:529: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  scoring = yaml.load(scoring_file)
>>> Mikado.test()
Running unit tests for Mikado
NumPy version 1.16.3
NumPy relaxed strides checking option: True
NumPy is installed in /newlustre/home/longyong/anaconda3/lib/python3.6/site-packages/numpy
Python version 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
nose version 1.3.7
...................................................................................................................TAA
....................E.EE...............................................['permissive', 'split']
..2019-05-02 16:33:46,029 - main - __init__.py:124 - ERROR - main - MainProcess - Mikado crashed, cause:
2019-05-02 16:33:46,029 - main - __init__.py:125 - ERROR - main - MainProcess - (sqlite3.OperationalError) disk I/O error
(Background on this error at: http://sqlalche.me/e/e3q8)
Traceback (most recent call last):
  File "/newlustre/home/longyong/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2262, in _wrap_pool_connect
    return fn()
  File "/newlustre/home/longyong/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 361, in connect
    return _ConnectionFairy._checkout(self, self._fairy)
  File "/newlustre/home/longyong/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 760, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/newlustre/home/longyong/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
    rec = pool._do_get()
  File "/newlustre/home/longyong/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 344, in _do_get
    c = self._create_connection()
  File "/newlustre/home/longyong/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection
    return _ConnectionRecord(self)
  File "/newlustre/home/longyong/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
    self.__connect(first_connect_check=True)
  File "/newlustre/home/longyong/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 651, in __connect
    pool.dispatch.connect(self.connection, self)
  File "/newlustre/home/longyong/anaconda3/lib/python3.6/site-packages/sqlalchemy/event/attr.py", line 259, in __call__
    fn(*args, **kw)
  File "/newlustre/home/longyong/anaconda3/lib/python3.6/site-packages/Mikado/utilities/dbutils.py", line 109, in set_sqlite_pragma
    cursor.execute("PRAGMA synchronous=OFF")
sqlite3.OperationalError: disk I/O error

The above exception was the direct cause of the following exception:
How can I resolve this error?

@lucventurini
Collaborator

Dear @zebrafish-507,
thank you for using our program and for reporting this bug. From a quick search, the problem that you are encountering seems to be something that could happen on NFS systems, although to be honest, I have not seen it happening on our cluster environments.
In any case, I will add a try-except clause to this to avoid the locking.
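
As an illustration of the try-except idea, here is a minimal sketch using only the standard-library sqlite3 module. It is not the actual patch to dbutils.py: the helper name apply_pragmas and its return value are hypothetical, and the two PRAGMA statements are simply the ones that set_sqlite_pragma is shown issuing in this thread.

```python
import sqlite3

# Hypothetical helper, not Mikado's real code: try each PRAGMA and
# collect the failures instead of letting the first one abort the connection.
DEFAULT_PRAGMAS = ("PRAGMA foreign_keys=ON", "PRAGMA synchronous=OFF")


def apply_pragmas(connection, pragmas=DEFAULT_PRAGMAS):
    """Run each statement; return a list of (statement, error text) for failures."""
    failed = []
    cursor = connection.cursor()
    try:
        for statement in pragmas:
            try:
                cursor.execute(statement)
            except sqlite3.OperationalError as exc:
                # e.g. "disk I/O error" on some network filesystems
                failed.append((statement, str(exc)))
    finally:
        cursor.close()
    return failed


conn = sqlite3.connect(":memory:")
for statement, error in apply_pragmas(conn):
    print("Could not apply", statement, "->", error)
```

With this shape, a failing PRAGMA only costs the optimisation it was meant to enable; whether the database is otherwise usable on that filesystem is a separate question.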

@lucventurini
Collaborator

Dear @zebrafish-507 , I have uploaded the potential fix in the latest commit in the development branch (6113045). If you could please test it, I would be able to confirm the fix.

lucventurini self-assigned this May 2, 2019
lucventurini added this to the 1.5 milestone May 2, 2019
@zebrafish-507
Author

Thanks, @luca!
I replaced the old compare.py and dbutils.py files with the new ones. Where should I put the setup.cfg file? There is no folder named "mikado" in the path.

@lucventurini
Collaborator

Dear @zebrafish-507,
the setup.cfg should not be necessary for your use case. Please keep the Anaconda version of compare.py: I have worked extensively on that part of the code since I uploaded the latest version to Anaconda, and the new one will most probably break all sorts of things! Apologies for having lumped that file into the commit as well.

I hope the new dbutils.py will solve your issue.

Kind regards

@zebrafish-507
Author

Dear @lucventurini,
I re-installed Mikado and ran the test. The I/O error still occurs. See below for the error message:

dbapi_connection = <sqlite3.Connection object at 0x7fd79b4bff10>
connection_record = <sqlalchemy.pool.base._ConnectionRecord object at 0x7fd789580dd8>

@event.listens_for(Engine, "connect")
def set_sqlite_pragma(dbapi_connection, connection_record):
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA foreign_keys=ON")
    cursor.execute("PRAGMA synchronous=OFF")

E sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) disk I/O error
E (Background on this error at: http://sqlalche.me/e/e3q8)

connection_record = <sqlalchemy.pool.base._ConnectionRecord object at 0x7fd789580dd8>
cursor = <sqlite3.Cursor object at 0x7fd78a5b4e30>
dbapi_connection = <sqlite3.Connection object at 0x7fd79b4bff10>

anaconda3/lib/python3.6/site-packages/Mikado-1.5-py3.6-linux-x86_64.egg/Mikado/utilities/dbutils.py:109: OperationalError

Best regards,

@lucventurini
Collaborator

lucventurini commented May 2, 2019

Dear @zebrafish-507 , would the error persist if you installed only the updated dbutils.py file from the previous commit? That file contains the patch.

I am asking because the snippet you posted seems to have the non-patched version.
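
One way to check which copy of the function the interpreter actually picks up is sketched below. The helper function_source is hypothetical (it is not part of Mikado); it relies only on the standard importlib and inspect modules, and it assumes the function is still called set_sqlite_pragma in Mikado.utilities.dbutils, as in the traceback above.

```python
import importlib
import inspect


def function_source(module_name, function_name):
    """Return (file path, source text) for a function, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module (or one of its parent packages) is not installed here.
        return None
    function = getattr(module, function_name, None)
    if function is None:
        return None
    return inspect.getsourcefile(function), inspect.getsource(function)


result = function_source("Mikado.utilities.dbutils", "set_sqlite_pragma")
if result is None:
    print("set_sqlite_pragma could not be imported from this interpreter")
else:
    path, source = result
    print("Loaded from:", path)
    print(source)
```

If the printed path points at a different installation (for example an .egg directory) than the file that was replaced, the patched dbutils.py is not the one being imported.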

Kind regards

@zebrafish-507
Author

zebrafish-507 commented May 3, 2019

Dear @lucventurini,
I re-installed Mikado both via git clone and by downloading the 1.3beta version, but I hit the I/O error every time. The output was:
..2019-05-03 10:54:50,548 - main - __init__.py:124 - ERROR - main - MainProcess - Mikado crashed, cause:
2019-05-03 10:54:50,548 - main - __init__.py:125 - ERROR - main - MainProcess - (sqlite3.OperationalError) disk I/O error
(Background on this error at: http://sqlalche.me/e/e3q8)
Can you figure out what is wrong?
Best,

@lucventurini
Collaborator

Dear @zebrafish-507, apologies for not being clear enough in the previous message.

The fix is present in the development branch of my personal fork (https://github.com/lucventurini/mikado), specifically this file:

https://raw.githubusercontent.com/lucventurini/mikado/development/Mikado/utilities/dbutils.py

There are two options:

  • Install the GitHub development version from the fork. I have made a lot of changes, though (please see the changelog). They mostly make the program better, and I have had everything fully tested, but it is still the development branch.
  • Patch the Mikado/utilities/dbutils.py file using the version linked above. Before yesterday I had not touched that section of the codebase in years, so dropping the new file into the existing installation should not introduce any new bugs.

Regarding why the bug appears in the first place, I unfortunately cannot diagnose it from here. It looks like a problem with your disk, potentially triggered by NFS. I hope that the patch will provide a workaround.
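
To narrow down whether the filesystem is involved, a small probe independent of Mikado can help. The sketch below is an assumption-laden diagnostic, not something from the Mikado codebase: the helper probe_sqlite is hypothetical, and it presumes that the system temporary directory sits on a local disk while the current working directory is on the shared (possibly NFS or Lustre) filesystem.

```python
import os
import sqlite3
import tempfile


def probe_sqlite(directory):
    """Create a throwaway SQLite file in `directory`.

    Returns None on success, or the error text if something failed.
    """
    try:
        handle, path = tempfile.mkstemp(suffix=".db", dir=directory)
    except OSError as exc:
        return "cannot create file: {}".format(exc)
    os.close(handle)
    try:
        connection = sqlite3.connect(path)
        try:
            # Same PRAGMA that fails in the traceback above.
            connection.execute("PRAGMA synchronous=OFF")
            connection.execute("CREATE TABLE probe (value INTEGER)")
            connection.execute("INSERT INTO probe VALUES (1)")
            connection.commit()
        finally:
            connection.close()
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        # Remove the database and any journal file left behind.
        for leftover in (path, path + "-journal"):
            if os.path.exists(leftover):
                os.remove(leftover)
    return None


for directory in (tempfile.gettempdir(), os.getcwd()):
    print(directory, "->", probe_sqlite(directory) or "OK")
```

If the probe reports "disk I/O error" in one directory and "OK" in the other, the problem most likely lies with that filesystem or its mount options rather than with Mikado.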

Thank you for your patience.

Kind regards

@zebrafish-507
Author

Dear @lucventurini,
I re-installed Mikado according to your instructions, but that did not resolve the problem either. It is possible that there is some problem with our disk. I'll ask our manager for help. Thanks a lot.
Best,

@lucventurini
Collaborator

Dear @zebrafish-507 ,
I have had a long look at unit-tests today. I am not sure it is your case, but the bug you triggered might have been a case of Mikado trying to overwrite some of the installation files when doing the tests. I should have now fixed the issue - I was testing for exactly such a scenario. Potentially this could solve your issue as well (although I caution that I was still not able to reproduce your exact bug).

If you could install from the https://github.com/lucventurini/mikado/ repository (development branch) we could verify whether this fixes the issue - unless you and your system manager found the issue to be originating in something specific of your system.

Kind regards

@lucventurini
Collaborator

Dear @zebrafish-507 ,
any news on this front?

@zebrafish-507
Author

Dear @lucventurini,
Sorry, I could not resolve this issue by reinstalling Mikado. Our system manager had no idea about this I/O error either. I am wondering whether Mikado can work well with real data despite this test error.

@lucventurini
Collaborator

Dear @zebrafish-507,
One way to test this is to install Mikado, then clone the repository from GitHub, go into sample_data, and launch snakemake. The mini test pipeline also needs diamond and prodigal to be installed.
If that works, the failure is probably some obscure error in the tests, which I am trying to catch and fix but which will not affect the program itself.
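
Before launching the pipeline, it may be worth confirming that the required executables are on the PATH. The following is a small sketch with a hypothetical helper, missing_tools; the executable names are assumed to match the tool names mentioned above.

```python
import shutil


def missing_tools(names):
    """Return the subset of `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


# Assumed executable names for the tools mentioned in this thread.
required = ["mikado", "snakemake", "diamond", "prodigal"]
missing = missing_tools(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found.")
```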

@lucventurini
Collaborator

Closing for the time being.

lucventurini added a commit that referenced this issue Jun 5, 2019
* Switched to PySam for loading and fetching from genome files. Also, improved massively the speed of tests.

* Fixed previous commit

* Fixed travis bug

* Refactoring of check_index for Mikado compare (#166) and fix for #172

* Now Mikado will merge touching (NOT overlapping) exons coming from BED12 files. This should fix an issue with halLiftover

* This commit should fix a bunch of tests for when Mikado is installed with SUDO privileges (#137) potentially also fixing #172.

* Corrected a bug in the printing of transcriptomic BED12 files, corrected a bug in the serialisation of ORFs

* Fixed previous breakage

* Moved the code for checking the index into gene_dict. Also, now GeneDict allows access to positions as well.

* Minor edit to assigner

* Fixing previously broken commit

* Solving a bug which rendered the exclude_utr/protein_coding flags of mikado compare useless.

* Adding the GZI index to the tests directory to avoid permission errors. Addressing #175

* Corrected some testing. Moreover, now Mikado supports the BED12+1 format (ie gffread --bed output)

* Adding a maximum intron length for the default scoring configuration files.

* BROKEN. Proceeding on #142. Now the padding algorithm is aware of where a transcript finishes (intron vs exon). Moreover, we need to change the data structure for padding to a *directional* graph and keep in mind the distance needed to pad a transcript, to solve ambiguous cases in a deterministic (rather than random) way.

* Issue #174: modification to the abstractlocus.py file, to try to solve the issue found by @cschuh.

* #174: this should provide a solution to the issue, which is however only temporary. To be tested.

* #174: making the implicit "for" cycle explicit. Hopefully this should help pinpoint the error better.

* #174: peppered the failing block with try-except statements.

* #174: this should solve it. Now missing external scores in the database will cause Mikado to explicitly fail.

* Fixed #176

* BROKEN. Progress on #142, the code runs, but the tests are broken. **This might be legitimate as we changed the behaviour of the code.**

* Closing #155.

* #174: Now Mikado pick will die informatively if the SQLite3 database has not been found.

* #166: fixed some issues with self-compare

* BROKEN. We have to verify that the padding functions also on the 5' end, but we need to make a new test for that. The test development is in progress.

* The padding now should be tested and correct.

* Fixed previous commit. This should fix #142.
lucventurini added a commit that referenced this issue Jun 18, 2019
* Solved a small bug in the Gene class

* This commit should fix some of the performance issues found in Mikado compare when testing in the all vs all (issue #166).

* Updated the CHANGELOG.

* Slight improvements to the generic GFLine class and to the to_gff wrapper

* Solved some assorted bugs, from stop_codon parsing in GTF2 (for Augustus) to avoiding a very costly pragma check on MIDX databases.

* Now Mikado util stats will only return one value for the mode, making the table parsable

* Solved some small bugs introduced by changing the mode for mikado util stats

* Dropping automated support for Python3.5. The conda environment cannot be created successfully, too many packages have not been updated in the original repositories.

* Updating the conda environment to reflect that only Python>=3.6 is now accepted

* Various fixes for managing correctly BED12 files.

* Fix for the previous commit breaking TRAVIS

* Development (#178)

* Update Singularity.centos.def

Changed python to python3 during %post, otherwise it will use the system python2.7...

* Fixed small bug in external metrics handling

* Update Singularity.centos.def

* Development (#184)

* This should address #173 (both configuration file and docs) and #158

* Fix #181 and small bug fix for parsing Mikado annotations.

* Progress for #142 - this should fix the wrong ORF calculation for cases when the CDS was open at the 5' end.

* Fixed previous commit (always for #142)

* #142: corrected and tested the issue with one-off exons, for padding.

* This should fix and test #142 for good.

* Removed spurious warning/error messages

* #142: solved a bug which caused truncated transcripts at the 5' end not to be padded.

* #142: solved a problem which caused a false abort for transcripts on the - strand with changed stop codon.

* #142: fixing previous commit

* Pushing the fix for #182 onto the development branch

* Fix #183

* Fix #183 and previous commit

* #183: now Mikado configure will set a seed when generating the configuration file. The seed will be explicitly mentioned in the log.

* #177: made ORF loading slightly faster with pysam. Also made XML serialisation much faster using SQL sessions and multiprocessing.Pool instead of queues.

* Solved annoying bug that caused Mikado to crash with TAIR GFF3s.
lucventurini added a commit that referenced this issue Jun 19, 2019
lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021
* Solved a small bug in the Gene class

* This commit should fix some of the performance issues found in Mikado compare when testing in the all vs all (issue EI-CoreBioinformatics#166).

* Updated the CHANGELOG.

* Slight improvements to the generic GFLine class and to the to_gff wrapper

* Solved some assorted bugs, from stop_codon parsing in GTF2 (for Augustus) to avoiding a very costly pragma check on MIDX databases.

* Now Mikado util stats will only return one value for the mode, making the table parsable

* Solved some small bugs introduced by changing the mode for mikado util stats

* Dropping automated support for Python3.5. The conda environment cannot be created successfully, too many packages have not been updated in the original repositories.

* Updating the conda environment to reflect that only Python>=3.6 is now accepted

* Various fixes for managing correctly BED12 files.

* Fix for the previous commit breaking TRAVIS

* Switched to PySam for loading and fetching from genome files. Also, improved massively the speed of tests.

* Fixed previous commit

* Fixed travis bug

* Refactoring of check_index for Mikado compare (EI-CoreBioinformatics#166) and fix for EI-CoreBioinformatics#172

* Now Mikado will merge touching (NOT overlapping) exons coming from BED12 files. This should fix an issue with halLiftover

* This commit should fix a bunch of tests for when Mikado is installed with SUDO privileges (EI-CoreBioinformatics#137) potentially also fixing EI-CoreBioinformatics#172.

* Corrected a bug in the printing of transcriptomic BED12 files, corrected a bug in the serialisation of ORFs

* Fixed previous breakage

* Development (EI-CoreBioinformatics#178)

* Switched to PySam for loading and fetching from genome files. Also massively improved the speed of the tests.

* Fixed previous commit

* Fixed travis bug

* Refactoring of check_index for Mikado compare (EI-CoreBioinformatics#166) and fix for EI-CoreBioinformatics#172

* Now Mikado will merge touching (NOT overlapping) exons coming from BED12 files. This should fix an issue with halLiftover

* This commit should fix a bunch of tests for when Mikado is installed with SUDO privileges (EI-CoreBioinformatics#137) potentially also fixing EI-CoreBioinformatics#172.

* Corrected a bug in the printing of transcriptomic BED12 files, corrected a bug in the serialisation of ORFs

* Fixed previous breakage

* Moved the code for checking the index into gene_dict. Also, now GeneDict allows access to positions as well.

* Minor edit to assigner

* Fixing previously broken commit

* Solving a bug which rendered the exclude_utr/protein_coding flags of mikado compare useless.

* Adding the GZI index to the tests directory to avoid permission errors. Addressing EI-CoreBioinformatics#175

* Corrected some testing. Moreover, now Mikado supports the BED12+1 format (i.e. gffread --bed output)

* Adding a maximum intron length for the default scoring configuration files.

* BROKEN. Proceeding on EI-CoreBioinformatics#142. Now the padding algorithm is aware of where a transcript finishes (intron vs exon). Moreover, we need to change the data structure for padding to a *directional* graph and keep in mind the distance needed to pad a transcript, to solve ambiguous cases in a deterministic (rather than random) way.

* Issue EI-CoreBioinformatics#174: modification to the abstractlocus.py file, to try to solve the issue found by @cschuh.

* EI-CoreBioinformatics#174: this should provide a solution to the issue, which is however only temporary. To be tested.

* EI-CoreBioinformatics#174: making the implicit "for" cycle explicit. Hopefully this should help pinpoint the error better.

* EI-CoreBioinformatics#174: peppered the failing block with try-except statements.

* EI-CoreBioinformatics#174: this should solve it. Now missing external scores in the database will cause Mikado to explicitly fail.

* Fixed EI-CoreBioinformatics#176

* BROKEN. Progress on EI-CoreBioinformatics#142, the code runs, but the tests are broken. **This might be legitimate as we changed the behaviour of the code.**

* Closing EI-CoreBioinformatics#155.

* EI-CoreBioinformatics#174: Now Mikado pick will die informatively if the SQLite3 database has not been found.

* EI-CoreBioinformatics#166: fixed some issues with self-compare

* BROKEN. We have to verify that the padding also works on the 5' end, but we need to make a new test for that. The test development is in progress.

* The padding now should be tested and correct.

* Fixed previous commit. This should fix EI-CoreBioinformatics#142.

* Update Singularity.centos.def

Changed python to python3 during %post, otherwise it will use the system python2.7...

* Fixed small bug in external metrics handling

* Update Singularity.centos.def

* Development (EI-CoreBioinformatics#184)

* This should address EI-CoreBioinformatics#173 (both configuration file and docs) and EI-CoreBioinformatics#158

* Fix EI-CoreBioinformatics#181 and small bug fix for parsing Mikado annotations.

* Progress for EI-CoreBioinformatics#142 - this should fix the wrong ORF calculation for cases when the CDS was open at the 5' end.

* Fixed previous commit (always for EI-CoreBioinformatics#142)

* EI-CoreBioinformatics#142: corrected and tested the issue with one-off exons, for padding.

* This should fix and test EI-CoreBioinformatics#142 for good.

* Removed spurious warning/error messages

* EI-CoreBioinformatics#142: solved a bug which caused truncated transcripts at the 5' end not to be padded.

* EI-CoreBioinformatics#142: solved a problem which caused a false abort for transcripts on the - strand with changed stop codon.

* EI-CoreBioinformatics#142: fixing previous commit

* Pushing the fix for EI-CoreBioinformatics#182 onto the development branch

* Fix EI-CoreBioinformatics#183

* Fix EI-CoreBioinformatics#183 and previous commit

* EI-CoreBioinformatics#183: now Mikado configure will set a seed when generating the configuration file. The seed will be explicitly mentioned in the log.

* EI-CoreBioinformatics#177: made ORF loading slightly faster with pysam. Also made XML serialisation much faster using SQL sessions and multiprocessing.Pool instead of queues.

* Solved annoying bug that caused Mikado to crash with TAIR GFF3s.
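
The "merge touching (NOT overlapping) exons" item in the list above can be illustrated with a minimal sketch. This is not Mikado's actual code: the function name `merge_touching_exons` and the use of 1-based, inclusive `(start, end)` tuples are assumptions made purely for illustration.

```python
def merge_touching_exons(exons):
    """Merge exons that are directly adjacent (previous end + 1 == next start).

    Hypothetical helper, not part of Mikado. Coordinates are assumed to be
    1-based and inclusive. Overlapping exons are deliberately left untouched,
    mirroring the "NOT overlapping" wording of the commit message.
    """
    merged = []
    for start, end in sorted(exons):
        if merged and merged[-1][1] + 1 == start:
            # The previous exon ends exactly one base before this one starts.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged
```

With these assumptions, `(1, 10)` and `(11, 20)` would be joined into a single exon, while `(1, 10)` and `(5, 20)` would be left as they are.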

lucventurini added a commit to lucventurini/mikado that referenced this issue on Feb 11, 2021

* Solved a small bug in the Gene class

* This commit should fix some of the performance issues found in Mikado compare when testing in the all vs all (issue EI-CoreBioinformatics#166).

* Updated the CHANGELOG.

* Slight improvements to the generic GFLine class and to the to_gff wrapper

* Solved some assorted bugs, from stop_codon parsing in GTF2 (for Augustus) to avoiding a very costly pragma check on MIDX databases.

* Now Mikado util stats will only return one value for the mode, making the table parsable

* Solved some small bugs introduced by changing the mode for mikado util stats

* Dropping automated support for Python3.5. The conda environment cannot be created successfully, as too many packages have not been updated in the original repositories.

* Updating the conda environment to reflect that only Python>=3.6 is now accepted

* Various fixes for correctly managing BED12 files.

* Fix for the previous commit breaking TRAVIS

* Development (EI-CoreBioinformatics#178)

* Switched to PySam for loading and fetching from genome files. Also massively improved the speed of the tests.

* Fixed previous commit

* Fixed travis bug

* Refactoring of check_index for Mikado compare (EI-CoreBioinformatics#166) and fix for EI-CoreBioinformatics#172

* Now Mikado will merge touching (NOT overlapping) exons coming from BED12 files. This should fix an issue with halLiftover

* This commit should fix a bunch of tests for when Mikado is installed with SUDO privileges (EI-CoreBioinformatics#137) potentially also fixing EI-CoreBioinformatics#172.

* Corrected a bug in the printing of transcriptomic BED12 files, corrected a bug in the serialisation of ORFs

* Fixed previous breakage

* Moved the code for checking the index into gene_dict. Also, now GeneDict allows access to positions as well.

* Minor edit to assigner

* Fixing previously broken commit

* Solving a bug which rendered the exclude_utr/protein_coding flags of mikado compare useless.

* Adding the GZI index to the tests directory to avoid permission errors. Addressing EI-CoreBioinformatics#175

* Corrected some testing. Moreover, now Mikado supports the BED12+1 format (i.e. gffread --bed output)

* Adding a maximum intron length for the default scoring configuration files.

* BROKEN. Proceeding on EI-CoreBioinformatics#142. Now the padding algorithm is aware of where a transcript finishes (intron vs exon). Moreover, we need to change the data structure for padding to a *directional* graph and keep in mind the distance needed to pad a transcript, to solve ambiguous cases in a deterministic (rather than random) way.

* Issue EI-CoreBioinformatics#174: modification to the abstractlocus.py file, to try to solve the issue found by @cschuh.

* EI-CoreBioinformatics#174: this should provide a solution to the issue, which is however only temporary. To be tested.

* EI-CoreBioinformatics#174: making the implicit "for" cycle explicit. Hopefully this should help pinpoint the error better.

* EI-CoreBioinformatics#174: peppered the failing block with try-except statements.

* EI-CoreBioinformatics#174: this should solve it. Now missing external scores in the database will cause Mikado to explicitly fail.

* Fixed EI-CoreBioinformatics#176

* BROKEN. Progress on EI-CoreBioinformatics#142, the code runs, but the tests are broken. **This might be legitimate as we changed the behaviour of the code.**

* Closing EI-CoreBioinformatics#155.

* EI-CoreBioinformatics#174: Now Mikado pick will die informatively if the SQLite3 database has not been found.

* EI-CoreBioinformatics#166: fixed some issues with self-compare

* BROKEN. We have to verify that the padding also works on the 5' end, but we need to make a new test for that. The test development is in progress.

* The padding now should be tested and correct.

* Fixed previous commit. This should fix EI-CoreBioinformatics#142.

* Update Singularity.centos.def

Changed python to python3 during %post, otherwise it will use the system python2.7...

* Fixed small bug in external metrics handling

* Update Singularity.centos.def

* Development (EI-CoreBioinformatics#184)

* This should address EI-CoreBioinformatics#173 (both configuration file and docs) and EI-CoreBioinformatics#158

* Fix EI-CoreBioinformatics#181 and small bug fix for parsing Mikado annotations.

* Progress for EI-CoreBioinformatics#142 - this should fix the wrong ORF calculation for cases when the CDS was open at the 5' end.

* Fixed previous commit (always for EI-CoreBioinformatics#142)

* EI-CoreBioinformatics#142: corrected and tested the issue with one-off exons, for padding.

* This should fix and test EI-CoreBioinformatics#142 for good.

* Removed spurious warning/error messages

* EI-CoreBioinformatics#142: solved a bug which caused truncated transcripts at the 5' end not to be padded.

* EI-CoreBioinformatics#142: solved a problem which caused a false abort for transcripts on the - strand with changed stop codon.

* EI-CoreBioinformatics#142: fixing previous commit

* Pushing the fix for EI-CoreBioinformatics#182 onto the development branch

* Fix EI-CoreBioinformatics#183

* Fix EI-CoreBioinformatics#183 and previous commit

* EI-CoreBioinformatics#183: now Mikado configure will set a seed when generating the configuration file. The seed will be explicitly mentioned in the log.

* EI-CoreBioinformatics#177: made ORF loading slightly faster with pysam. Also made XML serialisation much faster using SQL sessions and multiprocessing.Pool instead of queues.

* Solved annoying bug that caused Mikado to crash with TAIR GFF3s.
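
As a rough illustration of the "die informatively if the SQLite3 database has not been found" item above, a check of this kind could look like the sketch below. It is not taken from Mikado: the function name `connect_existing` is hypothetical, and the only behaviour relied on is that Python's `sqlite3.connect()` will create a new database file when the path does not exist.

```python
import os
import sqlite3


def connect_existing(db_path):
    """Open an SQLite3 database, refusing to create it if it is missing.

    Hypothetical helper, not Mikado's implementation.
    """
    if not os.path.isfile(db_path):
        # Without this check, sqlite3.connect() would create a new, empty
        # database and the failure would only surface later, less clearly.
        raise FileNotFoundError(
            "SQLite3 database not found: {}".format(db_path)
        )
    return sqlite3.connect(db_path)
```

Failing early like this keeps a missing file distinct from other problems, such as the disk I/O error reported in this issue.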