
Prune non-consumed EDProducers that are only in Tasks #26438

Closed
makortel opened this issue Apr 11, 2019 · 16 comments · Fixed by #29553
Comments

@makortel
Contributor

makortel commented Apr 11, 2019

If a configuration contains an EDProducer in a Task (and only in Tasks) that is not consumed by the chain of consumes starting from the output modules, the produce() of such an EDProducer is not run, but all other transition functions are. This behavior may be confusing.
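
To make the situation concrete, here is a minimal configuration sketch with illustrative module class names (UsedProducer, UnusedProducer): both producers live only in a Task, but only `used` is kept by the output module, so the produce() of `unused` is never scheduled while its other transitions still run.

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("DEMO")
process.source = cms.Source("EmptySource")

# Illustrative module class names
process.used   = cms.EDProducer("UsedProducer")
process.unused = cms.EDProducer("UnusedProducer")

process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("out.root"),
    # only the product of 'used' is reachable from the output module
    outputCommands = cms.untracked.vstring("drop *", "keep *_used_*_*")
)

process.task = cms.Task(process.used, process.unused)
process.ep = cms.EndPath(process.out, process.task)
```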

The behavior becomes even more confusing with SwitchProducer, which intentionally injects all the case-producers into the configuration (as if they were in an anonymous Task), even though only one of them is intended to be run. This means that all non-event transitions are called for all the case-producers, which may be surprising.

It follows that e.g. if CUDA producers (either as a case of a SwitchProducer, or consumed by a case-producer) want to call the CUDA API from functions other than produce()/acquire(), the availability of CUDA devices has to be checked explicitly.
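
A minimal sketch of this situation, assuming the SwitchProducerCUDA helper from HeterogeneousCore.CUDACore and illustrative module class names: both case producers end up in the configuration, so a CUDA-calling case would see the non-event transitions even on a machine without a GPU.

```python
import FWCore.ParameterSet.Config as cms
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA

process = cms.Process("DEMO")

# Both case producers are part of the configuration; only the chosen case
# runs produce(), but without pruning both see the non-event transitions.
process.vertices = SwitchProducerCUDA(
    cpu  = cms.EDProducer("VertexProducerCPU"),   # illustrative class names
    cuda = cms.EDProducer("VertexProducerCUDA"),
)
```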

If we pruned non-consumed EDProducers, they would be ignored for all transitions, and no explicit checks for CUDA (or anything similar) would be needed. The exceptions would be the constructor and destructor, which would still always be called for all EDProducers.

@cmsbuild
Contributor

A new Issue was created by @makortel Matti Kortelainen.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@Dr15Jones
Contributor

assign core

@cmsbuild
Contributor

New categories assigned: core

@Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@Dr15Jones
Contributor

Producers explicitly on a Path/EndPath should not be pruned. Only Producers on Tasks could be potentially pruned.
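
A short sketch of the distinction (illustrative class name): onPath is explicitly scheduled and must never be pruned, while inTask is reachable only through the Task and is a candidate for pruning if nothing consumes its output.

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("DEMO")
process.onPath = cms.EDProducer("SomeProducer")   # illustrative class name
process.inTask = cms.EDProducer("SomeProducer")

process.task = cms.Task(process.inTask)
# 'onPath' is scheduled explicitly and always runs; 'inTask' is only in the
# associated Task and could be pruned if nothing consumes its output.
process.p = cms.Path(process.onPath, process.task)
```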

@makortel makortel changed the title from "Prune non-consumed EDProducers" to "Prune non-consumed EDProducers that are only in Tasks" on Apr 11, 2019
@makortel
Contributor Author

Producers explicitly on a Path/EndPath should not be pruned. Only Producers on Tasks could be potentially pruned.

Good point, I edited the issue description and the title accordingly.

@makortel
Contributor Author

Regarding the SwitchProducer use-case, we could likely communicate the non-chosen cases from Python to C++ via an untracked parameter, so the framework would know for which modules the constructor call can be avoided.

A further complication comes from the configuration validation, though (done as part of module construction). This step calls the module's static fillDescriptions() and adds parameters that were missing from the input ParameterSet. Notably, this step can add tracked parameters, and can thus modify the hash of the full job configuration; it therefore has to be run for all cases of the SwitchProducer.

@Dr15Jones had another idea that would allow omitting the unneeded module construction altogether (and thus loading unneeded GPU/accelerator runtime libraries). We could omit the PSet validation if we knew that the module PSet is "complete" (i.e. PSet validation would not add any new parameters). We could define module PSets generated by fillDescriptions() as "complete" (via some "hidden" property of the module PSet), disallow addition and removal of parameters, and then make SwitchProducer accept only "complete" module PSets.

This behavior would, however, essentially forbid the use of helper plugins in modules used by SwitchProducer. A possible solution could be to implement and utilize something along the lines of #25270 (comment) (i.e. store the possible values of plugin PSets within the module Python object) and allow replacing the plugin PSet with one of the allowed ones.

After writing up these notes from a discussion between me and @Dr15Jones, I realized though that dealing with SwitchProducer alone is not sufficient. We can have GPU/accelerator modules also in Tasks outside of a SwitchProducer; in fact that is the currently recommended pattern: put all you can in the Tasks, put only what is needed in the SwitchProducer, and let the prefetching take care of running the proper modules.
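
A sketch of that recommended pattern, again assuming the SwitchProducerCUDA helper and illustrative module class names: the GPU chain lives entirely in a Task, only the switch point is a SwitchProducer, and prefetching through the consumes chain decides which chain actually runs.

```python
import FWCore.ParameterSet.Config as cms
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA

process = cms.Process("DEMO")

# GPU chain kept entirely in a Task (illustrative class names)
process.hitsCUDA     = cms.EDProducer("HitProducerCUDA")
process.clustersCUDA = cms.EDProducer("ClusterProducerCUDA",
                                      src = cms.InputTag("hitsCUDA"))
process.gpuTask = cms.Task(process.hitsCUDA, process.clustersCUDA)

# Only the switch point is a SwitchProducer: its cuda case pulls in the GPU
# chain via consumes (prefetching), the cpu case runs standalone.
process.clusters = SwitchProducerCUDA(
    cpu  = cms.EDProducer("ClusterProducerCPU"),
    cuda = cms.EDProducer("ClustersFromCUDAConverter",
                          src = cms.InputTag("clustersCUDA")),
)
process.recoPath = cms.Path(process.clusters, process.gpuTask)
```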

So we need to think about this further.

@makortel
Contributor Author

We can have GPU/accelerator modules also in Tasks outside of a SwitchProducer; in fact that is the currently recommended pattern: put all you can in the Tasks, put only what is needed in the SwitchProducer, and let the prefetching take care of running the proper modules.

A possibility @Dr15Jones suggested quickly the other day: we could introduce a SwitchTask that would provide a similar switching mechanism for Tasks as SwitchProducer provides for producers and aliases.

One further complication came to my mind: ESProducers. In principle SwitchTask could be used for ESProducers as well. After reading the documentation again, this approach should actually work pretty easily: only the accelerator-specific ESProducers need to be put in Tasks to avoid loading their DSOs, and we don't have any of them yet (except in the Patatrack prototype), so it should be easy to make a policy that all such ESProducers etc. should be put in Tasks.
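
For illustration (the class name is a placeholder), putting an accelerator-specific ESProducer into a Task would look like this:

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("DEMO")
# Illustrative class name for an accelerator-specific ESProducer
process.gpuCalib  = cms.ESProducer("CalibrationESProducerCUDA")
process.calibTask = cms.Task(process.gpuCalib)
# The Task is then associated with a Path/EndPath (or the schedule) together
# with the modules that use the calibration.
```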

@makortel
Contributor Author

makortel commented Dec 6, 2019

#28576 raises a somewhat more general issue: what if we cannot even build the EDProducers on some architectures?

@fwyzard
Contributor

fwyzard commented Dec 10, 2019

In the present system, what happens if the two alternatives in a SwitchProducer implement e.g. edm::BeginLuminosityBlockProducer or edm::EndLuminosityBlockProducer?

Do e.g. the globalBeginLuminosityBlockProduce and globalEndLuminosityBlockProduce methods get called?

@Dr15Jones
Contributor

They both get called.

@makortel
Contributor Author

makortel commented Mar 29, 2020

Summarizing from #28576 (comment): we plan to

  • After constructing all modules, i.e. figuring out the data dependence DAG, immediately delete all unscheduled (=not in any Path or EndPath) modules whose output is not consumed
    • Architecture-dependent initialization should be then done in beginJob() or beginStream()
      that is not called if the module is deleted because no-one consumed its output
  • Add globalBeginJob() to stream modules with GlobalCache (all other cases already have beginJob() or beginStream())
    • run after constructor, but before beginStream()
    • one call (global) per module (no stream replication)
    • not called if module is deleted
    • e.g. for ML inference (TensorFlow etc), initialize global cache here

@hqucms
Contributor

hqucms commented Apr 8, 2020

Summarizing from #28576 (comment): we plan to

  • After constructing all modules, i.e. figuring out the data dependence DAG, immediately delete all unscheduled (=not in any Path or EndPath) modules whose output is not consumed

    • Architecture-dependent initialization should be then done in beginJob() or beginStream()
      that is not called if the module is deleted because no-one consumed its output
  • Add globalBeginJob() to stream modules with GlobalCache (all other cases already have beginJob() or beginStream())

    • run after constructor, but before beginStream()
    • one call (global) per module (no stream replication)
    • not called if module is deleted
    • e.g. for ML inference (TensorFlow etc), initialize global cache here

@makortel To follow up on this -- have these already been implemented?

@makortel
Contributor Author

makortel commented Apr 8, 2020

Summarizing from #28576 (comment): we plan to

  • After constructing all modules, i.e. figuring out the data dependence DAG, immediately delete all unscheduled (=not in any Path or EndPath) modules whose output is not consumed

    • Architecture-dependent initialization should be then done in beginJob() or beginStream()
      that is not called if the module is deleted because no-one consumed its output
  • Add globalBeginJob() to stream modules with GlobalCache (all other cases already have beginJob() or beginStream())

    • run after constructor, but before beginStream()
    • one call (global) per module (no stream replication)
    • not called if module is deleted
    • e.g. for ML inference (TensorFlow etc), initialize global cache here

To follow up on this -- have these already been implemented?

@hqucms Not yet. I'm working on the first one (it turned out to be a bit trickier than I hoped); the second one will follow (it should be easier). Or would it actually make the work in #29172 easier if globalBeginJob() became available sooner?

@hqucms
Contributor

hqucms commented Apr 8, 2020

Or would it actually make the work in #29172 easier if the globalBeginJob() would become available sooner?

@makortel That would be nice, but it's not urgent. Just let me know when it becomes available :)

@makortel
Contributor Author

After constructing all modules, i.e. figuring out the data dependence DAG, immediately delete all unscheduled (=not in any Path or EndPath) modules whose output is not consumed

  • Architecture-dependent initialization should be then done in beginJob() or beginStream()
    that is not called if the module is deleted because no-one consumed its output

This part is done in #29553.

@makortel
Contributor Author

Add globalBeginJob() to stream modules with GlobalCache (all other cases already have beginJob() or beginStream())

  • run after constructor, but before beginStream()
  • one call (global) per module (no stream replication)
  • not called if module is deleted
  • e.g. for ML inference (TensorFlow etc), initialize global cache here

This part is done in #29585.
