
Diffusion Transformer Training Pipeline #10843

Merged 6 commits on Oct 13, 2024
Conversation

@zpx01 (Contributor) commented Oct 10, 2024

What does this PR do?

Implements end-to-end diffusion transformer (DiT) pretraining / fine-tuning.

Collection: diffusion

Changelog

  • Adds a DiT model implementation with cross-attention support.
  • Adds an EDM diffusion sampler for higher-order sampling (see the sketch after this list).
  • Adds a training script for training DiT models on text/image datasets.
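
For context on "higher-order sampling": the EDM sampler of Karras et al. (2022) integrates the probability-flow ODE with Heun's second-order method. Below is a minimal sketch of one such step, assuming a trained denoiser denoise(x, sigma); the names here are placeholders, not the API added by this PR.

    def edm_heun_step(denoise, x, sigma, sigma_next):
        # x is the current noisy sample (a torch.Tensor); sigma > sigma_next >= 0.
        # Slope of the probability-flow ODE at the current noise level.
        d = (x - denoise(x, sigma)) / sigma
        # First-order (Euler) proposal.
        x_euler = x + (sigma_next - sigma) * d
        if sigma_next == 0:
            # The final step down to sigma = 0 stays first-order.
            return x_euler
        # Heun correction: average the slopes at sigma and sigma_next.
        d_next = (x_euler - denoise(x_euler, sigma_next)) / sigma_next
        return x + (sigma_next - sigma) * 0.5 * (d + d_next)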

Usage

The README contains instructions on how to launch training.

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre-checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries? (A sketch of such a guard follows this list.)
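
A minimal sketch of what such an import guard typically looks like (the names are illustrative, not NeMo's actual helpers):

    try:
        import apex  # optional dependency; may be absent

        HAVE_APEX = True
    except ImportError:
        HAVE_APEX = False

    def run_fused_op(x):
        # Fail early with a clear message instead of a NameError deep in the call stack.
        if not HAVE_APEX:
            raise ImportError("This feature requires Apex; install it or use a supported fallback.")
        ...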

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

# limitations under the License.

import math
from typing import Dict, Literal, Optional

import numpy as np
import torch
import torch.nn.functional as F
from diffusers.models.embeddings import TimestepEmbedding, get_3d_sincos_pos_embed
from einops import rearrange
from einops.layers.torch import Rearrange

Check notice: Code scanning / CodeQL: Unused import (Note)
Import of 'math' is not used.
Import of 'Dict' is not used.
Import of 'Literal' is not used.
Import of 'Optional' is not used.
Import of 'np' is not used.
Import of 'F' is not used.
Import of 'Rearrange' is not used.

nemo/collections/diffusion/models/dit/dit_layer_spec.py (alert dismissed)

import torch
import torch.nn as nn
import torch.nn.functional as F

Check notice: Code scanning / CodeQL: Unused import (Note)
Import of 'F' is not used.

import importlib
import warnings
from dataclasses import dataclass, field

Check notice: Code scanning / CodeQL: Unused import (Note)
Import of 'field' is not used.
import numpy as np
import torch
import torch.distributed
from einops import rearrange

Check notice: Code scanning / CodeQL: Unused import (Note)
Import of 'rearrange' is not used.

Comment on lines +165 to +167:

def training_step(
    self, data_batch: dict[str, torch.Tensor], iteration: int
) -> tuple[dict[str, torch.Tensor], torch.Tensor]:

Check notice: Code scanning / CodeQL: Returning tuples with varying lengths (Note)
EDMPipeline.training_step returns a tuple of size 2 and a tuple of size 3.
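
One conventional way to resolve this notice (not necessarily how this PR does; TrainStepOutput and extras are hypothetical names) is to return a fixed-shape structure:

    from typing import NamedTuple, Optional

    import torch

    class TrainStepOutput(NamedTuple):
        outputs: dict                   # e.g. dict[str, torch.Tensor]
        loss: torch.Tensor
        extras: Optional[dict] = None   # always present, may be None

    # Every return site then has the same length:
    #     return TrainStepOutput(output_batch, loss)
    #     return TrainStepOutput(output_batch, loss, extras)
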
@ethanhe42 (Member) commented:

Can you fix "Isort and Black Formatting / reformat_with_isort_and_black (pull_request_target)"?

@ko3n1g added and removed the Run CICD label on Oct 11, 2024
Signed-off-by: Zeeshan Patel <zeeshanp@berkeley.edu>

@zpx01 (Contributor, Author) commented Oct 11, 2024:

> Can you fix "Isort and Black Formatting / reformat_with_isort_and_black (pull_request_target)"?

@ethanhe42 this is fixed now; it should be ready to merge.

from einops import rearrange
from einops.layers.torch import Rearrange
from megatron.core import parallel_state
from megatron.core.models.common.embeddings.rotary_pos_embedding import get_pos_emb_on_this_cp_rank
from megatron.core.transformer.module import MegatronModule
from torch import nn

Check notice: Code scanning / CodeQL: Unused import (Note)
Import of 'get_pos_emb_on_this_cp_rank' is not used.
Import of 'nn' is not used.

nemo/collections/diffusion/models/dit/dit_model.py (alert dismissed)

@ethanhe42 (Member) commented:

Seems that one is still failing: "Code scanning results / CodeQL".

@ethanhe42 enabled auto-merge (squash) October 11, 2024 20:52
@ethanhe42 enabled auto-merge (squash) October 13, 2024 07:02
@ethanhe42 merged commit ce21ffb into NVIDIA:main Oct 13, 2024
165 of 169 checks passed