Where I can download a small/tiny dataset to test your repo? #4779

Closed
zydjohnHotmail opened this issue Sep 13, 2021 · 15 comments · Fixed by #4919
Labels
question Further information is requested

Comments

@zydjohnHotmail

Hello:
I am very new to YOLOv5 and want to follow your tutorial, but I have found it nearly impossible with the example datasets in the /data folder. Judging from the .yaml files, the smallest dataset is more than 2 GB, which makes it impractical for me to download any of them and run the Python code.
Do you have a small, tiny dataset for testing? By small I mean less than 100 KB including the images, with only one label or at most three. For example, could I have three images with one label, so I can test and get to know the process?
I found some small datasets, but they were created before YOLOv5 and I don't know how to use them.
I also made a small dataset with Microsoft VoTT, with 20 images and some JSON files, but I don't know how to convert them to YOLOv5 format.
Please advise! I am using Python 3.9 on Windows 10.
Thanks,
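
For the conversion question above, here is a minimal sketch rather than an official converter. YOLOv5 expects one .txt file per image, with one line per object in the form `class x_center y_center width height`, all normalized to the 0-1 range. The sketch assumes each annotation is available as a pixel box given by its left edge, top edge, width and height, plus the image size; how those values are named in your own JSON export is something to check yourself. The function name `to_yolo_line` is hypothetical.

```python
def to_yolo_line(class_id, left, top, width, height, img_w, img_h):
    """Convert a pixel box (top-left corner plus size) to one YOLO label line."""
    # YOLO stores the box center, not the top-left corner, normalized by image size.
    x_center = (left + width / 2) / img_w
    y_center = (top + height / 2) / img_h
    w_norm = width / img_w
    h_norm = height / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {w_norm:.6f} {h_norm:.6f}"


# Example: a 200x100 px box whose top-left corner is at (100, 50) in a 640x480 image.
print(to_yolo_line(0, 100, 50, 200, 100, 640, 480))
# 0 0.312500 0.208333 0.312500 0.208333
```

Each image's lines would then go into a .txt file with the same base name as the image.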

@zydjohnHotmail zydjohnHotmail added the question Further information is requested label Sep 13, 2021
@github-actions
Contributor

github-actions bot commented Sep 13, 2021

👋 Hello @zydjohnHotmail, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@zydjohnHotmail YOLOv5 default dataset is COCO128. It's only 6MB. It downloads automatically, no action is required on your part:

python train.py --data coco128.yaml

@zydjohnHotmail
Author

zydjohnHotmail commented Sep 14, 2021

Hello:
Look at the output:
E:\Videos\yolov5>python train.py --data coco128.yaml
train: weights=yolov5s.pt, cfg=, data=coco128.yaml, hyp=data/hyps/hyp.scratch.yaml, epochs=300, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, entity=None, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias=latest, local_rank=-1, freeze=0, patience=100
github: up to date with https://github.com/ultralytics/yolov5
YOLOv5 v5.0-430-gaa18599 torch 1.9.0+cpu CPU

hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 runs (RECOMMENDED)
TensorBoard: Start with 'tensorboard --logdir runs\train', view at http://localhost:6006/

WARNING: Dataset not found, nonexistent paths: ['E:\Videos\datasets\coco128\images\train2017']
Downloading https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip ...
100%|█████████████████████████████████████████████████████████████████████████████| 6.66M/6.66M [00:00<00:00, 18.9MB/s]
'unzip' is not recognized as an internal or external command,
operable program or batch file.
Dataset autodownload failure

                 from  n    params  module                                  arguments
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2                -1  1     18816  models.common.C3                        [64, 64, 1]
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4                -1  3    156928  models.common.C3                        [128, 128, 3]
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6                -1  3    625152  models.common.C3                        [256, 256, 3]
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]
  9                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 283 layers, 7276605 parameters, 7276605 gradients, 17.1 GFLOPs

Transferred 362/362 items from yolov5s.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 59 weight, 62 weight (no decay), 62 bias
Traceback (most recent call last):
  File "E:\Videos\yolov5\utils\datasets.py", line 395, in __init__
    raise Exception(f'{prefix}{p} does not exist')
Exception: train: ..\datasets\coco128\images\train2017 does not exist

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Videos\yolov5\train.py", line 611, in <module>
    main(opt)
  File "E:\Videos\yolov5\train.py", line 509, in main
    train(opt.hyp, opt, device, callbacks)
  File "E:\Videos\yolov5\train.py", line 207, in train
    train_loader, dataset = create_dataloader(train_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
  File "E:\Videos\yolov5\utils\datasets.py", line 98, in create_dataloader
    dataset = LoadImagesAndLabels(path, imgsz, batch_size,
  File "E:\Videos\yolov5\utils\datasets.py", line 400, in __init__
    raise Exception(f'{prefix}Error loading data from {path}: {e}\nSee {HELP_URL}')
Exception: train: Error loading data from ..\datasets\coco128\images\train2017: train: ..\datasets\coco128\images\train2017 does not exist
See https://docs.ultralytics.com/yolov5/tutorials/train_custom_data

E:\Videos\yolov5>
Do I have to download something else?

@zydjohnHotmail
Author

Some additional information: OS: Windows 10; Python: 3.9.7 (x64); I already installed all necessary python modules.

@glenn-jocher
Member

glenn-jocher commented Sep 14, 2021

@zydjohnHotmail it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@zydjohnHotmail
Author

Hello:
I fixed the issue already. The main problem is that Windows 10 has no built-in unzip command.
I can use 7-Zip to extract the downloaded zip file.
But your repo assumes every environment is UNIX/Linux, where some kind of unzip is always available.
However, not everyone uses Linux/UNIX, even though more and more people do now.
Please consider detecting the environment: on Windows, the "unzip" command is usually not available.
Thanks,
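
A minimal sketch of the kind of check suggested above, not the repo's actual code. It uses the standard library's `shutil.which` to test whether an `unzip` executable is on the PATH, instead of guessing from the OS name. The function name `choose_extractor` and its return values are hypothetical.

```python
import platform
import shutil


def choose_extractor():
    """Return 'unzip' if the command-line tool is on PATH, else 'zipfile'."""
    # shutil.which returns the executable's full path, or None if it is not found.
    if shutil.which("unzip") is not None:
        return "unzip"
    # Fall back to Python's built-in zipfile module, which needs no external tool.
    return "zipfile"


print(f"{platform.system()}: extracting with {choose_extractor()}")
```

Checking for the tool itself covers both cases raised in this thread: a Windows machine that happens to have unzip installed, and one that does not.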

@glenn-jocher
Member

glenn-jocher commented Sep 14, 2021

@zydjohnHotmail YOLOv5 operates on all 3 operating systems including Windows, and makes no assumption about a user's OS. Windows CI checks are run every 24 hours and on every commit, and are currently passing. The last windows-latest check was 13 hours ago here:
https://github.com/ultralytics/yolov5/runs/3593290876?check_suite_focus=true

@glenn-jocher
Member

@zydjohnHotmail the windows-latest autodownload of our tests using COCO128 is here, everything works correctly in this Windows environment. Do you know what the difference is between your version of windows and github's windows-latest (https://github.com/actions/virtual-environments) runner?
https://github.com/ultralytics/yolov5/runs/3593290876?check_suite_focus=true

Screen Shot 2021-09-14 at 2 45 09 PM

@zydjohnHotmail
Author

Hi:
I don't know how you test on Windows 10. But in my experience there is no built-in "unzip" command; to unzip a zipped folder, you have to click on the folder manually and select unzip.
I think this is the main reason the command "python train.py --data coco128.yaml" failed.
You can see this error:
'unzip' is not recognized as an internal or external command,
If you have a PC running Windows 10, you can test this and let me know where you are able to run the "unzip" command.

@glenn-jocher
Member

@zydjohnHotmail ok thanks for the feedback! Yes, I see that the missing unzip command is the cause of the error. The GitHub windows-latest runners are Windows Server 2019 instances; maybe these versions of Windows come with additional tools like unzip.

@zydjohnHotmail
Author

Hello:
As I have no experience with Windows Server 2019, I can't say whether it has unzip. But how many people use Windows Server 2019? I did have experience with Windows Server 2012, and it didn't have unzip. You should consider that 99.99% of people may be using Windows 10, and fewer than 0.0001% ever get the chance to touch any version of Windows Server.

@glenn-jocher
Member

glenn-jocher commented Sep 14, 2021

@zydjohnHotmail yes I understand. These are simply the available environments for GitHub CI checks. When we can get access to a normal Windows installation (Windows Home?) we'll try to reproduce.

@zydjohnHotmail
Author

Hi,
I think you could write a simple unzip routine in Python, download the dataset first, then run that routine for any dataset that comes in .zip format. Either way, you need to change the process so that at least people on Windows 10 can run a training without too much trouble. Linux users you can ignore, as they are not the majority!

@Matias2379

Hi:
I don't know how you check for Windows 10. But from my experience, there is no built-in command "unzip", to unzip one zipped folder, you have to manually click on the folder and select unzip.
I think it is main reason the command "python train.py --data coco128.yaml" failed.
You see this error:
'unzip' is not recognized as an internal or external command,
If you have one PC running Windows 10, you can test and let me know where you can run the command "unzip".

I had the same issue on Windows 10. My quick fix was to replace line 360 in general.py:

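                # r = os.system(f'unzip -q {f} -d {root} && rm {f}')  # original command, relies on unzip and rm being available
                r = os.system(f'powershell -command Expand-Archive -Path {f} -DestinationPath {root}')  # extract; r holds the extraction exit code
                os.system(f'powershell -command Remove-Item {f}')  # delete the zip without overwriting r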

I had to remove the zip file on a separate line because I couldn't get command chaining in PowerShell to work.
Hope it helps.

@glenn-jocher
Member

@zydjohnHotmail @Matias2379 perhaps Python's zipfile library would be a good OS-agnostic alternative? ZipFile.extractall() seems like it might be able to replace the current code:
https://docs.python.org/3/library/zipfile.html
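
A minimal sketch of what that replacement could look like, under the assumption that the goal is the same as the original shell command: extract the archive into a target directory, then delete the archive. This is not the code that was eventually merged; the function name `unzip_and_remove` and the file names in the demo are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_and_remove(zip_path, dest_dir):
    """Extract zip_path into dest_dir, then delete the archive (any OS)."""
    zip_path, dest_dir = Path(zip_path), Path(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(path=dest_dir)  # creates dest_dir if it does not exist
    zip_path.unlink()  # plays the role of the 'rm' in the shell command
    return dest_dir


if __name__ == "__main__":
    # Self-contained demo: build a tiny archive in a temp folder and extract it.
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "tiny.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("tiny/labels/img0.txt", "0 0.5 0.5 1.0 1.0\n")
        out = unzip_and_remove(archive, Path(tmp) / "datasets")
        print(sorted(p.name for p in out.rglob("*.txt")))
```

Because it relies only on the standard library, the same call works on Windows, macOS and Linux without an external unzip tool.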
