
Fix PGAN issues #308

Open
hvgazula opened this issue Mar 23, 2024 · 7 comments


hvgazula commented Mar 23, 2024

From @satra:

  • There is an issue with loading checkpoints (in the multi-GPU case).

Notes (03/22/2024)

  • Could not run the example on dgx100, so moving it to CPU for testing.
  • 4 CPUs, 96 GB: was able to write shards for the different resolutions (a generic sketch of this step follows).
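For concreteness, a generic sketch of writing one TFRecord shard per resolution with plain TensorFlow. This is not the nobrainer API; the file naming, shard count, and volume feature layout are assumptions for illustration only.

import numpy as np
import tensorflow as tf

def volume_example(volume: np.ndarray) -> tf.train.Example:
    # Serialize one volume as a single bytes feature (assumed record layout).
    serialized = tf.io.serialize_tensor(tf.constant(volume, tf.float32)).numpy()
    feature = {"volume": tf.train.Feature(bytes_list=tf.train.BytesList(value=[serialized]))}
    return tf.train.Example(features=tf.train.Features(feature=feature))

# One shard per resolution, echoing the note above.
for resolution in (8, 16, 32, 64):
    path = f"pgan-res{resolution:03d}-shard-000.tfrec"  # hypothetical naming scheme
    with tf.io.TFRecordWriter(path) as writer:
        for _ in range(10):  # toy volumes standing in for real data
            volume = np.random.rand(resolution, resolution, resolution)
            writer.write(volume_example(volume).SerializeToString())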

from ..dataset import get_dataset

This import is a relic of a previous iteration: get_dataset no longer exists in dataset.py. That means replacing the following snippet as well:

dataset = get_dataset(
    file_pattern=info.get("file_pattern"),
    batch_size=batch_size,
    num_parallel_calls=num_parallel_calls,
    volume_shape=(resolution, resolution, resolution),
    n_classes=1,
    scalar_label=True,
    normalizer=info.get("normalizer") or normalizer,
)
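For orientation, a minimal sketch of the intended replacement. The names mirror satra's patch further down this thread; treat it as a sketch rather than the final code.

from ..dataset import Dataset

dataset = Dataset.from_tfrecords(
    file_pattern=info.get("file_pattern"),
    num_parallel_calls=num_parallel_calls,
    volume_shape=(resolution, resolution, resolution),
    n_classes=1,
    scalar_labels=True,
)
# batching, normalization, and repetition move to chained calls on the wrapper
dataset.batch(batch_size).normalize(info.get("normalizer") or normalizer)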

hvgazula self-assigned this Mar 23, 2024
hvgazula added a commit that referenced this issue Mar 23, 2024
hvgazula (author) commented:

Running the code in the PGAN notebook with the aforementioned fix throws the following error:

[screenshot of the error traceback omitted]

hvgazula changed the title from "PGAN: get_dataset doesn't exist in dataset.py" to "Fix PGAN issues" Mar 23, 2024
hvgazula (author) commented:

Note: running brain_extraction.ipynb turns up DatasetAdapter at https://github.com/tensorflow/tensorflow/blob/51871ec0c5d2925cbbf7aa539087ac51ea27892e/tensorflow/python/keras/engine/data_adapter.py#L987, and type(x) returns tensorflow.python.data.ops.dataset_ops.BatchDataset.

Adding dataset.batch(1) before the call to fit in generation.py still returns the same error.
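A plausible diagnosis at this point (borne out by the fix below): fit is being handed the nobrainer Dataset wrapper itself rather than the tf.data.Dataset it wraps, and Keras's data adapters only match known input types. A quick check, assuming the wrapper exposes the underlying pipeline as a .dataset attribute:

import tensorflow as tf

# `dataset` is the nobrainer wrapper object here (assumption)
print(isinstance(dataset, tf.data.Dataset))          # False: no Keras adapter matches it
print(isinstance(dataset.dataset, tf.data.Dataset))  # True: this is what fit expects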


satra commented Mar 23, 2024

This should get you past the dataset issues, and the notebook appears to train:

diff --git a/nobrainer/processing/generation.py b/nobrainer/processing/generation.py
--- a/nobrainer/processing/generation.py
+++ b/nobrainer/processing/generation.py
@@ -5,7 +5,7 @@ import tensorflow as tf
 
 from .base import BaseEstimator
 from .. import losses
-from ..dataset import get_dataset
+from ..dataset import Dataset
 
 
 class ProgressiveGeneration(BaseEstimator):
@@ -147,15 +147,17 @@ class ProgressiveGeneration(BaseEstimator):
             if batch_size % self.strategy.num_replicas_in_sync:
                 raise ValueError("batch size must be a multiple of the number of GPUs")
 
-            dataset = get_dataset(
+            dataset = Dataset.from_tfrecords(
                 file_pattern=info.get("file_pattern"),
-                batch_size=batch_size,
                 num_parallel_calls=num_parallel_calls,
                 volume_shape=(resolution, resolution, resolution),
                 n_classes=1,
-                scalar_label=True,
-                normalizer=info.get("normalizer") or normalizer,
+                scalar_labels=True
             )
+            n_epochs = info.get("epochs") or epochs
+            dataset.batch(batch_size) \
+            .normalize(info.get("normalizer") or normalizer) \
+            .repeat(n_epochs)
 
             with self.strategy.scope():
                 # grow the networks by one (2^x) resolution
@@ -164,9 +166,7 @@ class ProgressiveGeneration(BaseEstimator):
                     self.model_.discriminator.add_resolution()
                 _compile()
 
-                steps_per_epoch = (info.get("epochs") or epochs) // info.get(
-                    "batch_size"
-                )
+                steps_per_epoch = n_epochs // info.get("batch_size")
 
                 # save_best_only is set to False as it is an adversarial loss
                 model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
@@ -182,7 +182,7 @@ class ProgressiveGeneration(BaseEstimator):
 
             print("Transition phase")
             self.model_.fit(
-                dataset,
+                dataset.dataset,
                 phase="transition",
                 resolution=resolution,
                 steps_per_epoch=steps_per_epoch,  # necessary for repeat dataset
@@ -191,7 +191,7 @@ class ProgressiveGeneration(BaseEstimator):
 
             print("Resolution phase")
             self.model_.fit(
-                dataset,
+                dataset.dataset,
                 phase="resolution",
                 resolution=resolution,
                 steps_per_epoch=steps_per_epoch,
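Two details are worth noting in the patch above. First, the chained batch/normalize/repeat calls discard their return value, which only works if those methods mutate the wrapper's internal pipeline in place (or return self). Second, fit now receives dataset.dataset, the underlying tf.data.Dataset, which resolves the adapter error shown earlier. A minimal sketch of a wrapper with that contract (hypothetical, for illustration; not the actual nobrainer source):

import tensorflow as tf

class Dataset:
    """Thin wrapper whose chainable methods mutate self.dataset and return self."""

    def __init__(self, dataset: tf.data.Dataset):
        self.dataset = dataset

    def batch(self, batch_size: int) -> "Dataset":
        self.dataset = self.dataset.batch(batch_size)
        return self

    def normalize(self, normalizer) -> "Dataset":
        # normalizer maps a volume tensor to a normalized tensor (assumption)
        self.dataset = self.dataset.map(normalizer)
        return self

    def repeat(self, n_epochs: int) -> "Dataset":
        self.dataset = self.dataset.repeat(n_epochs)
        return self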


satra commented Mar 23, 2024

Added a few updates above, and the generation notebook completes.

hvgazula added this to the 1.2.1 milestone Mar 23, 2024
hvgazula (author) commented:

Thank you very much @satra. dataset.dataset it is.

hvgazula (author) commented:

steps_per_epoch = (info.get("epochs") or epochs) // info.get(
    "batch_size"
)

steps_per_epoch should be training_size // batch_size, not epochs // batch_size as computed above. Also, there is no need to calculate this explicitly: the fit function will take care of it in the default case. See this
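A minimal sketch of both options, using hypothetical sizes that are not from the thread:

n_train_volumes, batch_size = 800, 8              # hypothetical sizes
steps_per_epoch = n_train_volumes // batch_size   # 100 batches cover the data once

# In the default case (a finite, non-repeated tf.data.Dataset), Keras infers the
# step count itself, so steps_per_epoch can simply be omitted from the fit call:
# model.fit(dataset.dataset, epochs=n_epochs)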

hvgazula added a commit that referenced this issue Mar 23, 2024
hvgazula (author) commented:

TODO: warm start

satra mentioned this issue Apr 3, 2024