Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[review] Dask performance tuning + Dask plugin + Smart Cache handling + Composite Node performance tuning + Template Node #123

Merged
merged 41 commits into from
May 5, 2021

Conversation

yidong72
Copy link
Collaborator

@yidong72 yidong72 commented Mar 4, 2021

In this PR, I made following changes:

  1. By default, the Dask dataframe meta data is using the meta_setup.
  2. Added the necessary persist node. The changes of 1-2 make existing dask workflow 2 times faster.
  3. Use the ml.xgboos for XGBoost strategies.
  4. Added the SimpleParallelNode to make parallel computation easy.
  5. Added a notebook as an example use of SimpleParallelNode
  6. Improves the DaskComputeNode to handle multiple ports simultaneously.
  7. Move the DaskComputeNode, SimpleParallelNode and PersistNode into a separate Greenflow plugin
  8. Change the core plots to matplotlib as bqplot is not stable enough.

I tested all the notebooks.

@yidong72 yidong72 requested a review from avolkov1 March 4, 2021 21:22
@yidong72
Copy link
Collaborator Author

yidong72 commented Mar 8, 2021

I the latest change, I added SimpleNodeMixinNode to simplify the node code and added caching mechanism to improve the performance. I tested all the notebooks. It is working and ready for review!

@yidong72
Copy link
Collaborator Author

This should be my last round of change.

The composite node only build the graph once in the update function so ports_setup, meta_setup, conf_schema can be called cheaply.

GridRandomSearchNode is also fixed to removed the strong requirement that all upstream graphs need to be serializable.

Copy link
Contributor

@avolkov1 avolkov1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still reviewing and testing. Left two comments so far.

Copy link
Contributor

@avolkov1 avolkov1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments. The logic in the SimpleNodeMixin seems to me like it really should be part of NodeTaskGraphMixin. Let me know what you think.

Can you explain a bit more regarding the SimpleNodeMixin, especially its update, ports_setup and meta_setup methods, if you disagree with me regarding that belonging in the NodeTaskGraphMixin? I think all nodes should have this logic injected when they are within a task graph. It shouldn't be necessary to inherit explicitly from any mixin for a Node's implementation.

greenflow/greenflow/dataframe_flow/simpleNodeMixin.py Outdated Show resolved Hide resolved
greenflow/greenflow/dataframe_flow/simpleNodeMixin.py Outdated Show resolved Hide resolved
greenflow/greenflow/dataframe_flow/simpleNodeMixin.py Outdated Show resolved Hide resolved
greenflow/greenflow/dataframe_flow/simpleNodeMixin.py Outdated Show resolved Hide resolved
greenflow/greenflow/dataframe_flow/simpleNodeMixin.py Outdated Show resolved Hide resolved
@yidong72
Copy link
Collaborator Author

I made a few changes to make the template state private. I checked all the notebooks and unit tests.

@yidong72 yidong72 changed the title [review] Dask performance tuning + Dask plugin [review] Dask performance tuning + Dask plugin + Smart Cache handling + Composite Node performance tuning + Template Node Apr 28, 2021
Copy link
Contributor

@avolkov1 avolkov1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Everything works. I'll merge this.

@avolkov1 avolkov1 merged commit 1d8ec9d into NVIDIA:develop May 5, 2021
@yidong72 yidong72 deleted the dask-performance branch May 10, 2021 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants