New facts/totaling script using dask for chunking does not work on Docker #343

Open
bobkopp opened this issue Sep 3, 2024 · 3 comments

Comments

@bobkopp
Collaborator

bobkopp commented Sep 3, 2024

Using dask for chunking in the new totaling script greatly improves speed on Amarel, but the script does not currently work on Docker.

https://github.com/radical-collaboration/facts/blob/development/modules/facts/total/total_workflow.py

@bobkopp bobkopp added the bug Something isn't working label Sep 3, 2024
@bobkopp bobkopp added this to the v1.2 milestone Sep 3, 2024
@bobkopp
Collaborator Author

bobkopp commented Sep 5, 2024

@AlexReedy has narrowed this down to something about the dask chunksize: it works with a chunksize of 3000 but not 50, at least in the setup he was running. @kemccusker, maybe this is enough of a clue to figure this out?

@AlexReedy
Collaborator

Iterating a bit on this: 3000 was just some ridiculous value to see if it worked. I tried 200 and that was a little slow; 500 seems to be working pretty quickly.

@AlexReedy
Collaborator

AlexReedy commented Sep 6, 2024

A default chunksize of 500 works on Amarel and on Docker (for default runs). I have pushed the changes into the development branch and left a note elaborating further.

1c4befc

We can either close this or leave it open. The two systems have drastically different memory capacities, so 500 may work as a default on Docker but probably won't for a full run (66,000 locations), though I don't think anyone is doing that on Docker right now. It's up to you guys.
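To make the memory trade-off concrete, here is a rough back-of-the-envelope sketch of how the chunksize along the location dimension changes the number of chunks and the approximate size of one chunk. Only the 66,000-location figure and the chunksizes come from this thread. The function name `chunk_stats`, the sample count, the year count, and the 8-byte float are placeholder assumptions, not values taken from FACTS, and a real run holds several such arrays at once.

```python
import math


def chunk_stats(n_locations, chunksize, n_samples, n_years, bytes_per_value=8):
    """Return (number of chunks, approximate MiB per chunk) when chunking along locations."""
    n_chunks = math.ceil(n_locations / chunksize)
    # A chunk can never hold more locations than exist in total.
    locations_per_chunk = min(chunksize, n_locations)
    chunk_bytes = locations_per_chunk * n_samples * n_years * bytes_per_value
    return n_chunks, chunk_bytes / 2**20


# 66,000 locations is the full-run size mentioned above.
# n_samples and n_years are assumed placeholder values, not taken from FACTS.
for chunksize in (50, 200, 500, 3000):
    n_chunks, mib = chunk_stats(66_000, chunksize, n_samples=2_000, n_years=20)
    print(f"chunksize={chunksize:>5}: {n_chunks:>5} chunks, ~{mib:,.1f} MiB per chunk")
```

Smaller chunks mean more pieces for dask to schedule, while larger chunks mean more memory per piece, so a value that suits Amarel will not necessarily suit a memory-limited Docker container.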

It may be better to leave this open so @kemccusker can confirm the changes work. Note for @kemccusker: last week we just swapped out your totaling script for the old old one. You can:

  1. Clone the new dev branch
  2. Copy (cp) the total_workflow.py script from the current dev branch into yours
  3. If you are actually still using the version of total_workflow.py that uses dask, all you have to do is change the chunksize (but I don't think you are)
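Since the right chunksize seems to depend on the machine, one option would be to expose it as a setting rather than editing the script each time. The following is only a hypothetical sketch using the standard-library `argparse`; the `--chunksize` flag and the `parse_args` helper are invented for illustration and are not the actual interface of total_workflow.py.

```python
import argparse

# 500 is the default reported above to work on both Amarel and Docker (default runs).
DEFAULT_CHUNKSIZE = 500


def parse_args(argv=None):
    """Parse a hypothetical --chunksize option (not the real total_workflow.py CLI)."""
    parser = argparse.ArgumentParser(description="Hypothetical totaling wrapper")
    parser.add_argument(
        "--chunksize",
        type=int,
        default=DEFAULT_CHUNKSIZE,
        help="number of locations per dask chunk",
    )
    args = parser.parse_args(argv)
    if args.chunksize < 1:
        # parser.error prints a usage message and exits with a non-zero status.
        parser.error("--chunksize must be a positive integer")
    return args


# Example: a memory-limited machine could pass a smaller value explicitly.
args = parse_args(["--chunksize", "200"])
print(f"Using chunksize={args.chunksize}")
```

With something like this, Docker users could keep the default while a full run on Amarel passes a larger value.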
