Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with large variables in global section #1225

Closed
BoPeng opened this issue Mar 1, 2019 · 11 comments
Closed

Problems with large variables in global section #1225

BoPeng opened this issue Mar 1, 2019 · 11 comments

Comments

@BoPeng
Copy link
Contributor

BoPeng commented Mar 1, 2019

With #1219 , we now pickle all the variables generated by the global section, and send it around to all steps, substeps, tasks, etc. However, with the following workflow,

[global]
HUGE_VAR=something

[a]
print(1)

we are sending a large variable to a unnecessarily , and in some cases would cause serious performance problem.

The fix seems to be easy because SoS parse sections and find variables that are used in a step. That is to say, we can handle step a in a way that only pass variables used by a to it. Some caution needs to be made to nested workflows, especially when the workflow is read from another file though.

@BoPeng
Copy link
Contributor Author

BoPeng commented Mar 1, 2019

The patch does not cover the case where variables are used by input statement but not substeps, namely

[global]
a=range(10)
b=range(11, 20)

[default]
input: for_each=dict(i=range(len(a)))

task: trunk_size=10

print(i)

@gaow
Copy link
Member

gaow commented Mar 1, 2019

Thanks for the fix! There is a related concern I raised earlier about unpickling behavior of substeps, that is, they each unpickle the same file, rather than copy from the step which unpickles the file just ones. Is that a relevant concern to this issue?

@BoPeng
Copy link
Contributor Author

BoPeng commented Mar 1, 2019

There is no "file" involved. Originally I "preserved" the global section by pickling a dictionary with all the variables, now I simple replace the original global_def (statements to be executed) to be

  1. global_def, which is some stripped and compiled version of statements with only import, def, and class
  2. global_vars, which is the pickleable stuff from the global section after it is executed (only once).

These two structures are pickled around with other information of the steps and substeps.

The patch just filter global_vars and keep only variables that are used in the step.

@gaow
Copy link
Member

gaow commented Mar 1, 2019

Oh I see, it is passed around as a variable. Unpickling from a variable should be as good as a deep copy particularly now after you've filtered it.

Should I give it a go, or wait for tests to pass?

@BoPeng
Copy link
Contributor Author

BoPeng commented Mar 1, 2019

One test failed. I just figured out why...

@BoPeng
Copy link
Contributor Author

BoPeng commented Mar 1, 2019

The tests should pass... we will see.

@gaow
Copy link
Member

gaow commented Mar 1, 2019

Test still building ... but I submitted the jobs anyways. It seems to be running well!

@gaow
Copy link
Member

gaow commented Mar 1, 2019

Looks like test failed, but it worked for my job -- no hanging or blocking behavior observed.

@BoPeng
Copy link
Contributor Author

BoPeng commented Mar 1, 2019

Yes, it is more troublesome to find all variables that will be used in a step. The last one I fixed was

a = [1,2]

[10]
input: for_each='a'

because a is a string, not a variable. For this exact reason I proposed to get rid of this syntax during our redesign of input but we kept it.

Let me check what remains.

@BoPeng
Copy link
Contributor Author

BoPeng commented Mar 1, 2019

We have 300+ test cases now and the effort was paying off. The failing case is

[global]
parameter: par=5 
def a():
  print(par)

[default]
task:
a()

So basically the default step does not mention par at all, but uses it as a global function.

@BoPeng
Copy link
Contributor Author

BoPeng commented Mar 1, 2019

Travis passed!

I have a feeling that this will fail in more complicated cases though. Let us see.

@BoPeng BoPeng closed this as completed Mar 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants