Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to fix #349. #365

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

christianbioinf
Copy link
Contributor

Hi,

I know that this may not be a suitable solution for the community of umi_tools but I still wanted to provide my progress on resolving #349.

I found out that it is caused by the use of unordered structures like dicts (for python < 3.5) and sets (for all version of python3) since this order sometimes influences the results of dedup operations (e.g. the order in which clusters are being built). This "random" behaviour can be suppressed by setting the environment variable PYTHONHASHSEED before starting the interpreter. However, since I am freezing the application with Pyinstaller, it is currently not possible to set this before the interpreter inside the bootloader starts up (see pyinstaller/pyinstaller#3665).

As I am only interested in reproducibility with python3.5+, I simply replaced all relevant occurences of sets with dicts inside network.py. Note that I also tried to fix the recursive version of the breadth-first search algorithm (if someone is interested in this). For completeness, I updated the test result files and nosetests is now running fine even without setting PYTHONHASHSEED.

As I already mentioned, I am aware of the limitations of my fix (in the supported python versions) and the dependency on the order inside dicts which is currently only considered an implementation detail and not part of the spec yet. Also, my use case is rather specific but since more and more people and researchers are concerned about reproducibility, freezing python application (especially those with lot of python package dependencies) is important and hence it may be also interesting for a wider audience.

I am happy to take feedback or suggestions how I could improve it in order for it to be merged into the master branch.

Best,
Christian

…ries (only true for python3.5+) in cases where the order of elements influences the results. Also updating test results to compensate for using dicts as ordered structures. Note that it is now not necessary to set the environment variable PYTHONHASHSEED anymore.
@IanSudbery
Copy link
Member

Hi @christianbioinf The latest release should finally bring reproducible behavoir to UMI-tools without the need for custom fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants