Hi,
I know that this may not be a suitable solution for the umi_tools community, but I still wanted to share my progress on resolving #349.
I found out that it is caused by the use of unordered structures like `dicts` (for python < 3.5) and `sets` (for all versions of python3), since their iteration order sometimes influences the results of dedup operations (e.g. the order in which clusters are being built). This "random" behaviour can be suppressed by setting the environment variable `PYTHONHASHSEED` before starting the interpreter. However, since I am freezing the application with `Pyinstaller`, it is currently not possible to set this before the interpreter inside the bootloader starts up (see pyinstaller/pyinstaller#3665).

As I am only interested in reproducibility with python3.5+, I simply replaced all relevant occurrences of `sets` with `dicts` inside `network.py`. Note that I also tried to fix the recursive version of the breadth-first search algorithm (in case someone is interested in this). For completeness, I updated the test result files, and `nosetests` now runs fine even without setting `PYTHONHASHSEED`.

As I already mentioned, I am aware of the limitations of my fix (in the supported python versions) and of its dependency on the order inside `dicts`, which is currently only considered an implementation detail and not part of the spec yet. Also, my use case is rather specific, but since more and more people and researchers are concerned about reproducibility, freezing python applications (especially those with a lot of python package dependencies) is important, and hence this may also be interesting for a wider audience.

I am happy to take feedback or suggestions on how I could improve it so that it can be merged into the master branch.
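
To illustrate the idea, here is a minimal sketch of using a `dict` as an insertion-ordered set in a breadth-first search. It is not the actual code in `network.py`: the function names, the adjacency list, and the example UMIs are made up for illustration. It also assumes an interpreter where `dicts` preserve insertion order (an implementation detail of CPython 3.6, guaranteed by the language from 3.7).

```python
from collections import deque


def ordered_set(items):
    """Return a dict whose keys act as an insertion-ordered set."""
    return dict.fromkeys(items)


def breadth_first_search(node, adj_list):
    """Iterative BFS returning reachable nodes in first-visit order."""
    found = ordered_set([node])
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in adj_list[current]:
            if neighbour not in found:
                found[neighbour] = None  # "add" to the ordered set
                queue.append(neighbour)
    return list(found)


def connected_components(nodes, adj_list):
    """Group nodes into components, in order of first appearance."""
    seen = ordered_set([])
    components = []
    for node in nodes:
        if node not in seen:
            component = breadth_first_search(node, adj_list)
            seen.update(ordered_set(component))
            components.append(component)
    return components


# Hypothetical UMIs; each list holds the neighbours of the key.
adj_list = {
    "ATAT": ["ATAA", "TTAT"],
    "ATAA": ["ATAT"],
    "TTAT": ["ATAT"],
    "GCGC": [],
}
components = connected_components(list(adj_list), adj_list)
print(components)
```

Because membership tests go through dict keys and the output order follows insertion order, the result no longer depends on string hashing, so it should be the same with or without `PYTHONHASHSEED` set.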
Best,
Christian