This repository has been archived by the owner on Sep 19, 2022. It is now read-only.

Mnist dataset server is down #325

Open
Jeffwan opened this issue Mar 17, 2021 · 5 comments

@Jeffwan
Member

Jeffwan commented Mar 17, 2021

The E2E test is failing. The reason is straightforward: the server returns HTTP 503. I did some checking and noticed this is already being tracked in the PyTorch community.

The patch is only available on master, and there's no way to specify the download path. I can either disable that single test case and wait for a stable release, or build a nightly image, which takes extra effort.
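One way to avoid depending on a single host is to try several download URLs in order and move on only when a server-side (5xx) error such as the 503 below occurs. The sketch uses only the standard library; `download_with_fallback` and `fetch_url` are hypothetical helpers, not part of torchvision, and the list of mirror URLs is an assumption you would supply yourself.

```python
import urllib.error
import urllib.request
from typing import Callable, Sequence


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL with the standard library and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_with_fallback(
    urls: Sequence[str], fetch: Callable[[str], bytes] = fetch_url
) -> bytes:
    """Try each URL in order, falling back only on server-side or network errors."""
    errors = []
    for url in urls:
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code < 500:
                raise  # client errors such as 404 are not an outage; surface them
            errors.append(f"{url}: HTTP {err.code}")
        except urllib.error.URLError as err:
            # Connection-level failures (DNS, refused, timeout) also try the next URL.
            errors.append(f"{url}: {err.reason}")
    raise RuntimeError("all mirrors failed: " + "; ".join(errors))
```

For example, `download_with_fallback([original_url, mirror_url])` would return the bytes from `mirror_url` when `original_url` answers with 503. The `fetch` parameter is injectable so the fallback logic can be tested without network access.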

Using distributed PyTorch with gloo backend
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Traceback (most recent call last):
  File "/var/mnist.py", line 150, in <module>
    main()
  File "/var/mnist.py", line 123, in main
    transforms.Normalize((0.1307,), (0.3081,))
  File "/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/datasets/mnist.py", line 46, in __init__
    epoch, batch_idx * len(data), len(train_loader.dataset),
  File "/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/datasets/mnist.py", line 114, in download
    if should_distribute():
  File "/opt/conda/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/conda/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/opt/conda/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/opt/conda/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/opt/conda/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/opt/conda/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable

Confirmed that this is a server-side issue.

https://discuss.pytorch.org/t/mnist-server-down/114433
pytorch/vision#3554

@andreyvelich
Member

@Jeffwan We faced the same problem in Katib.
We are currently using FashionMNIST instead of MNIST: https://github.com/kubeflow/katib/blob/master/examples/v1beta1/pytorch-mnist/mnist.py#L137.
I believe it is hosted on the PyTorch servers.

@yanniszark
Contributor

@andreyvelich this sounds like a good solution.
Another way would be to pre-download the dataset into the image.
The problem is how to build a new image for the example: the current one comes from the GCP registry, which is no longer available.
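Pre-downloading into the image could look roughly like the script below, run once at build time (for example from a Dockerfile `RUN` step) so the training job never needs the network. This is a minimal sketch under assumptions: `prefetch` is a hypothetical helper, the file-name-to-URL mapping is supplied by you, and torchvision's dataset classes expect their own directory layout under `root`, so `dest_dir` and the file names would have to match what the installed version looks for.

```python
import os
from typing import Callable, Dict, List


def prefetch(files: Dict[str, str], dest_dir: str,
             fetch: Callable[[str], bytes]) -> List[str]:
    """Download each file into dest_dir unless it is already present.

    `files` maps a local file name to the URL it should come from.
    Returns the names that were actually downloaded.
    """
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    for name, url in files.items():
        path = os.path.join(dest_dir, name)
        if os.path.exists(path):
            continue  # already baked into the image; no network needed
        data = fetch(url)
        tmp_path = path + ".part"
        with open(tmp_path, "wb") as f:
            f.write(data)
        # Rename only after the write completes, so a partial file never looks finished.
        os.replace(tmp_path, path)
        downloaded.append(name)
    return downloaded
```

Running it a second time downloads nothing, which is the property that matters at training time: the files come from the image layer rather than from the dataset server.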

@Jeffwan
Member Author

Jeffwan commented Mar 18, 2021


Sounds good. Let me double-check whether the code is compatible with the FashionMNIST dataset. If it is, and the data server is reliable, we can quickly switch to it.
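Fashion-MNIST was designed as a drop-in replacement for MNIST: it uses the same gzipped IDX file format, 28x28 grayscale images, 10 classes, and 60,000/10,000 train/test splits. A compatibility check could therefore read the IDX headers and confirm the shapes. Separately, the `transforms.Normalize((0.1307,), (0.3081,))` constants in the traceback above are MNIST training-set statistics, so after a switch they may be worth recomputing. The helpers below are hypothetical, standard-library-only sketches; the pure-Python statistics loop is slow on the full dataset and is meant only to show the calculation.

```python
import gzip
import struct
from typing import Tuple


def read_idx_header(gz_bytes: bytes) -> Tuple[int, ...]:
    """Return the dimensions stored in a gzipped IDX file (the MNIST file format)."""
    raw = gzip.decompress(gz_bytes)
    # Magic number: two zero bytes, a type code (0x08 = unsigned byte), then ndim.
    zero1, zero2, dtype_code, ndim = struct.unpack(">BBBB", raw[:4])
    if zero1 != 0 or zero2 != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # Each dimension is a big-endian 32-bit unsigned integer.
    return struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])


def is_mnist_compatible(images_gz: bytes, labels_gz: bytes) -> bool:
    """True if images are N x 28 x 28 and there is exactly one label per image."""
    image_dims = read_idx_header(images_gz)
    label_dims = read_idx_header(labels_gz)
    return (len(image_dims) == 3 and image_dims[1:] == (28, 28)
            and label_dims == (image_dims[0],))


def pixel_mean_std(images_gz: bytes) -> Tuple[float, float]:
    """Mean and population std of pixel values scaled to [0, 1]."""
    raw = gzip.decompress(images_gz)
    ndim = raw[3]
    pixels = raw[4 + 4 * ndim:]  # skip the magic number and dimension sizes
    n = len(pixels)
    mean = sum(pixels) / n / 255.0
    var = sum((p / 255.0 - mean) ** 2 for p in pixels) / n
    return mean, var ** 0.5
```

Applied to the real `train-images-idx3-ubyte.gz` and `train-labels-idx1-ubyte.gz` files, `is_mnist_compatible` should hold for both datasets, and `pixel_mean_std` gives the values to plug into `Normalize`.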

@Jeffwan
Member Author

Jeffwan commented Mar 25, 2021

The code has been changed in #327.
We need a better way to publish images. This can be done after the 1.3 release.

@umka1332


Hi @Jeffwan, Kubeflow 1.3 has already been released. Is there any progress on this?


4 participants