StreamingDataLoader returns empty state dict #19320

Closed
awaelchli opened this issue Jan 20, 2024 · 0 comments · Fixed by #19326
Labels: bug (Something isn't working), data (external) (litdata package), ver: 2.2.x

Bug description

The StreamingDataLoader returns an empty state dict even after it has fetched samples from the dataset.

What version are you seeing the problem on?

master

How to reproduce the bug

import lightning as L
import torch


def run():
    fabric = L.Fabric(devices=4)
    fabric.launch()

    train_dataloader = create_dataloader()

    state = {"train_dataloader": train_dataloader}

    # Fetch three batches so the dataloader has progress to record
    train_iterator = iter(train_dataloader)
    next(train_iterator)
    next(train_iterator)
    next(train_iterator)

    fabric.print("train_dataloader:", train_dataloader.state_dict())  # Why is it empty?

    fabric.save("my-checkpoint.pth", state)
    if fabric.global_rank == 0:
        state = torch.load("my-checkpoint.pth")
        print("saved train_dataloader:", state["train_dataloader"])  # Why is it empty?

    fabric.barrier()


def create_dataloader():
    from lightning.data import StreamingDataset, CombinedStreamingDataset, StreamingDataLoader
    from lightning.data.streaming.item_loader import TokensLoader

    train_datasets = [
        StreamingDataset(
            input_dir="data/slimpajama/train",
            item_loader=TokensLoader(block_size=128),
        ),
        StreamingDataset(
            input_dir="data/starcoder",
            item_loader=TokensLoader(block_size=128),
        ),
    ]
    combined_dataset = CombinedStreamingDataset(datasets=train_datasets)
    train_dataloader = StreamingDataLoader(combined_dataset, batch_size=4, num_workers=8)
    return train_dataloader


if __name__ == "__main__":
    run()

Error messages and logs

Both the live state dict and the one loaded back from the checkpoint are printed as empty:

Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/4
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/4
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/4
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/4
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 4 processes
----------------------------------------------------------------------------------------------------

train_dataloader: {}
saved train_dataloader: {}
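
For comparison, here is a rough sketch of the behavior I would expect, using a hypothetical `CountingLoader` stand-in written in plain Python. It is not the litdata implementation, and the `samples_yielded` and `batch_size` keys are made up for illustration. The point is only that the state dict should be non-empty after a few batches and should be enough to resume from.

```python
class CountingLoader:
    """Hypothetical stand-in for a stateful dataloader (NOT the litdata implementation)."""

    def __init__(self, data, batch_size):
        self.data = list(data)
        self.batch_size = batch_size
        self._samples_yielded = 0  # progress counter captured by state_dict()

    def __iter__(self):
        # Start from the recorded position so a restored loader resumes mid-epoch
        start = self._samples_yielded
        for i in range(start, len(self.data), self.batch_size):
            batch = self.data[i : i + self.batch_size]
            self._samples_yielded += len(batch)
            yield batch

    def state_dict(self):
        # Key names are assumptions chosen for this sketch
        return {"samples_yielded": self._samples_yielded, "batch_size": self.batch_size}

    def load_state_dict(self, state):
        self._samples_yielded = state["samples_yielded"]


loader = CountingLoader(range(100), batch_size=4)
iterator = iter(loader)
for _ in range(3):
    next(iterator)

state = loader.state_dict()
print("state:", state)  # non-empty after three batches

# A fresh loader restored from that state continues where the first one stopped
resumed = CountingLoader(range(100), batch_size=4)
resumed.load_state_dict(state)
first_resumed_batch = next(iter(resumed))
print("first batch after resume:", first_resumed_batch)
```

In the repro above, the equivalent of `state` is `{}` on every rank, so there is nothing to resume from.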

Environment

Current environment
  • CUDA:
    - GPU:
    - NVIDIA A10G
    - NVIDIA A10G
    - NVIDIA A10G
    - NVIDIA A10G
    - available: True
    - version: 12.1
  • Lightning:
    - lightning: 2.2.0.dev0
    - lightning-cloud: 0.5.61
    - lightning-utilities: 0.10.0
    - pytorch-lightning: 2.1.2
    - pytorch-triton: 2.2.0+e28a256d71
    - torch: 2.3.0.dev20240110+cu121
    - torch-tb-profiler: 0.4.3
    - torchmetrics: 1.2.0
  • Packages:
    - absl-py: 2.0.0
    - accelerate: 0.24.1
    - aiofiles: 22.1.0
    - aiohttp: 3.9.0
    - aiosignal: 1.3.1
    - aiosqlite: 0.19.0
    - annotated-types: 0.6.0
    - antlr4-python3-runtime: 4.9.3
    - anyio: 3.7.1
    - appdirs: 1.4.4
    - argon2-cffi: 23.1.0
    - argon2-cffi-bindings: 21.2.0
    - arrow: 1.3.0
    - asttokens: 2.4.1
    - async-timeout: 4.0.3
    - attrs: 23.1.0
    - babel: 2.13.1
    - beautifulsoup4: 4.12.2
    - bitsandbytes: 0.41.0
    - black: 23.12.0
    - bleach: 6.1.0
    - boto3: 1.29.4
    - botocore: 1.32.4
    - cachetools: 5.3.2
    - certifi: 2023.11.17
    - cffi: 1.16.0
    - chardet: 5.2.0
    - charset-normalizer: 3.3.2
    - click: 8.1.7
    - colorama: 0.4.6
    - comm: 0.2.0
    - dataproperty: 1.0.1
    - datasets: 2.15.0
    - debugpy: 1.8.0
    - decorator: 5.1.1
    - defusedxml: 0.7.1
    - dill: 0.3.7
    - distro: 1.8.0
    - docker-pycreds: 0.4.0
    - docstring-parser: 0.15
    - einops: 0.7.0
    - entrypoints: 0.4
    - exceptiongroup: 1.2.0
    - executing: 2.0.1
    - fastapi: 0.104.1
    - fastjsonschema: 2.19.0
    - filelock: 3.13.1
    - fqdn: 1.5.1
    - frozenlist: 1.4.0
    - fsspec: 2023.10.0
    - gitdb: 4.0.11
    - gitpython: 3.1.40
    - google-auth: 2.23.4
    - google-auth-oauthlib: 1.1.0
    - grpcio: 1.59.3
    - gviz-api: 1.10.0
    - h11: 0.14.0
    - httpcore: 1.0.2
    - httpx: 0.25.2
    - huggingface-hub: 0.19.4
    - idna: 3.4
    - importlib-resources: 6.1.1
    - iniconfig: 2.0.0
    - ipykernel: 6.26.0
    - ipython: 8.17.2
    - ipython-genutils: 0.2.0
    - ipywidgets: 8.1.1
    - isoduration: 20.11.0
    - isort: 5.13.2
    - jedi: 0.19.1
    - jinja2: 3.1.2
    - jmespath: 1.0.1
    - joblib: 1.3.2
    - json5: 0.9.14
    - jsonargparse: 4.27.1
    - jsonlines: 4.0.0
    - jsonpointer: 2.4
    - jsonschema: 4.20.0
    - jsonschema-specifications: 2023.11.1
    - jupyter-client: 7.4.9
    - jupyter-core: 5.5.0
    - jupyter-events: 0.9.0
    - jupyter-server: 2.10.1
    - jupyter-server-fileid: 0.9.0
    - jupyter-server-terminals: 0.4.4
    - jupyter-server-ydoc: 0.6.1
    - jupyter-ydoc: 0.2.5
    - jupyterlab: 3.6.1
    - jupyterlab-pygments: 0.2.2
    - jupyterlab-server: 2.25.2
    - jupyterlab-widgets: 3.0.9
    - lightning: 2.2.0.dev0
    - lightning-cloud: 0.5.61
    - lightning-utilities: 0.10.0
    - lm-eval: 0.3.0
    - markdown: 3.5.1
    - markdown-it-py: 3.0.0
    - markupsafe: 2.1.3
    - matplotlib-inline: 0.1.6
    - mbstrdecoder: 1.1.3
    - mdurl: 0.1.2
    - mistune: 3.0.2
    - mpmath: 1.3.0
    - multidict: 6.0.4
    - multiprocess: 0.70.15
    - mypy-extensions: 1.0.0
    - nbclassic: 1.0.0
    - nbclient: 0.9.0
    - nbconvert: 7.11.0
    - nbformat: 5.9.2
    - nest-asyncio: 1.5.8
    - networkx: 3.2.1
    - nltk: 3.8.1
    - notebook: 6.5.6
    - notebook-shim: 0.2.3
    - numexpr: 2.8.7
    - numpy: 1.26.2
    - nvidia-cublas-cu12: 12.1.3.1
    - nvidia-cuda-cupti-cu12: 12.1.105
    - nvidia-cuda-nvrtc-cu12: 12.1.105
    - nvidia-cuda-runtime-cu12: 12.1.105
    - nvidia-cudnn-cu12: 8.9.2.26
    - nvidia-cufft-cu12: 11.0.2.54
    - nvidia-curand-cu12: 10.3.2.106
    - nvidia-cusolver-cu12: 11.4.5.107
    - nvidia-cusparse-cu12: 12.1.0.106
    - nvidia-nccl-cu12: 2.19.3
    - nvidia-nvjitlink-cu12: 12.3.101
    - nvidia-nvtx-cu12: 12.1.105
    - oauthlib: 3.2.2
    - omegaconf: 2.3.0
    - openai: 1.3.6
    - overrides: 7.4.0
    - packaging: 23.2
    - pandas: 2.1.3
    - pandocfilters: 1.5.0
    - parso: 0.8.3
    - pathspec: 0.12.1
    - pathvalidate: 3.2.0
    - peft: 0.6.2
    - pexpect: 4.8.0
    - pillow: 10.1.0
    - pip: 23.3
    - platformdirs: 4.0.0
    - pluggy: 1.3.0
    - portalocker: 2.8.2
    - prometheus-client: 0.19.0
    - prompt-toolkit: 3.0.41
    - protobuf: 4.23.4
    - psutil: 5.9.6
    - ptyprocess: 0.7.0
    - pure-eval: 0.2.2
    - pyarrow: 14.0.1
    - pyarrow-hotfix: 0.6
    - pyasn1: 0.5.1
    - pyasn1-modules: 0.3.0
    - pybind11: 2.11.1
    - pycountry: 22.3.5
    - pycparser: 2.21
    - pydantic: 2.5.1
    - pydantic-core: 2.14.3
    - pygments: 2.17.1
    - pyjwt: 2.8.0
    - pytablewriter: 1.2.0
    - pytest: 7.4.3
    - python-dateutil: 2.8.2
    - python-json-logger: 2.0.7
    - python-multipart: 0.0.6
    - pytorch-lightning: 2.1.2
    - pytorch-triton: 2.2.0+e28a256d71
    - pytz: 2023.3.post1
    - pyyaml: 6.0.1
    - pyzmq: 24.0.1
    - referencing: 0.31.0
    - regex: 2023.10.3
    - requests: 2.31.0
    - requests-oauthlib: 1.3.1
    - rfc3339-validator: 0.1.4
    - rfc3986-validator: 0.1.1
    - rich: 13.7.0
    - rouge-score: 0.1.2
    - rpds-py: 0.13.1
    - rsa: 4.9
    - s3transfer: 0.7.0
    - sacrebleu: 1.5.0
    - safetensors: 0.4.1
    - scikit-learn: 1.3.2
    - scipy: 1.11.4
    - send2trash: 1.8.2
    - sentencepiece: 0.1.99
    - sentry-sdk: 1.38.0
    - setproctitle: 1.3.3
    - setuptools: 68.0.0
    - six: 1.16.0
    - smmap: 5.0.1
    - sniffio: 1.3.0
    - soupsieve: 2.5
    - sqlitedict: 2.1.0
    - stack-data: 0.6.3
    - starlette: 0.27.0
    - sympy: 1.12
    - tabledata: 1.3.3
    - tcolorpy: 0.1.4
    - tensorboard: 2.15.1
    - tensorboard-data-server: 0.7.2
    - tensorboard-plugin-profile: 2.14.0
    - terminado: 0.18.0
    - threadpoolctl: 3.2.0
    - tinycss2: 1.2.1
    - tokenizers: 0.15.0
    - tomli: 2.0.1
    - torch: 2.3.0.dev20240110+cu121
    - torch-tb-profiler: 0.4.3
    - torchmetrics: 1.2.0
    - tornado: 6.3.3
    - tqdm: 4.66.1
    - tqdm-multiprocess: 0.0.11
    - traitlets: 5.13.0
    - triton: 2.1.0
    - typepy: 1.3.2
    - types-python-dateutil: 2.8.19.14
    - typeshed-client: 2.4.0
    - typing-extensions: 4.8.0
    - tzdata: 2023.3
    - uri-template: 1.3.0
    - urllib3: 2.0.7
    - uvicorn: 0.24.0.post1
    - wandb: 0.16.0
    - wcwidth: 0.2.11
    - webcolors: 1.13
    - webencodings: 0.5.1
    - websocket-client: 1.6.4
    - werkzeug: 3.0.1
    - wheel: 0.41.2
    - widgetsnbextension: 4.0.9
    - xgboost: 2.0.2
    - xxhash: 3.4.1
    - y-py: 0.6.2
    - yarl: 1.9.3
    - ypy-websocket: 0.8.4
    - zstandard: 0.22.0
  • System:
    - OS: Linux
    - architecture:
    - 64bit
    - ELF
    - processor: x86_64
    - python: 3.10.10
    - release: 5.15.0-1051-aws
    - version: #56~20.04.1-Ubuntu SMP Tue Nov 28 15:43:31 UTC 2023

More info

No response
