Translate tutorial 2_new_dataset #799

Merged
merged 3 commits into from
Aug 6, 2021
2 changes: 1 addition & 1 deletion docs/tutorials/2_new_dataset.md
@@ -58,7 +58,7 @@ The annotation json files in COCO format has the following necessary keys:

There are three necessary keys in the json file:

- `images`: contains a list of images with theire informations like `file_name`, `height`, `width`, and `id`.
- `images`: contains a list of images with their information like `file_name`, `height`, `width`, and `id`.
- `annotations`: contains the list of instance annotations.
- `categories`: contains the category name ('person') and its ID (1).

93 changes: 92 additions & 1 deletion docs_zh-CN/tutorials/2_new_dataset.md
@@ -1,3 +1,94 @@
# Tutorial 2: Adding New Datasets

Content under construction...
## Customize datasets by reorganizing data

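### Reorganize datasets into an existing format

The simplest way to use a custom dataset is to convert it to the existing COCO dataset format.

The annotation json files in COCO format have the following keys: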

```python
'images': [
    {
        'file_name': '000000001268.jpg',
        'height': 427,
        'width': 640,
        'id': 1268
    },
    ...
],
'annotations': [
    {
        'segmentation': [[426.36,
                          ...
                          424.34,
                          223.3]],
        'keypoints': [0,0,0,
                      0,0,0,
                      0,0,0,
                      427,220,2,
                      443,222,2,
                      414,228,2,
                      449,232,2,
                      408,248,1,
                      454,261,2,
                      0,0,0,
                      0,0,0,
                      411,287,2,
                      431,287,2,
                      0,0,0,
                      458,265,2,
                      0,0,0,
                      466,300,1],
        'num_keypoints': 10,
        'area': 3894.5826,
        'iscrowd': 0,
        'image_id': 1268,
        'bbox': [402.34, 205.02, 65.26, 88.45],
        'category_id': 1,
        'id': 215218
    },
    ...
],
'categories': [
    {'id': 1, 'name': 'person'},
]
```
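The `keypoints` list in the annotation above is flat: every three numbers form one `(x, y, v)` triplet, where `v` is the COCO visibility flag (0 = not labeled, 1 = labeled but not visible, 2 = labeled and visible). The minimal sketch below, assuming the 17-keypoint COCO person layout used in the example, regroups the list and recomputes `num_keypoints` as the number of labeled keypoints (`v > 0`). The helper names `to_triplets` and `count_labeled` are illustrative only.

```python
from typing import List, Tuple

# Flat keypoint list copied from the example annotation above:
# 17 keypoints, each stored as an (x, y, v) triplet.
KEYPOINTS = [0, 0, 0,
             0, 0, 0,
             0, 0, 0,
             427, 220, 2,
             443, 222, 2,
             414, 228, 2,
             449, 232, 2,
             408, 248, 1,
             454, 261, 2,
             0, 0, 0,
             0, 0, 0,
             411, 287, 2,
             431, 287, 2,
             0, 0, 0,
             458, 265, 2,
             0, 0, 0,
             466, 300, 1]


def to_triplets(flat: List[float]) -> List[Tuple[float, float, int]]:
    """Group a flat COCO keypoint list into (x, y, v) triplets."""
    if len(flat) % 3 != 0:
        raise ValueError('keypoint list length must be a multiple of 3')
    return [(flat[i], flat[i + 1], int(flat[i + 2]))
            for i in range(0, len(flat), 3)]


def count_labeled(flat: List[float]) -> int:
    """Count keypoints whose visibility flag v is greater than 0."""
    return sum(1 for _, _, v in to_triplets(flat) if v > 0)


triplets = to_triplets(KEYPOINTS)
print(len(triplets))             # 17
print(count_labeled(KEYPOINTS))  # 10, matching 'num_keypoints' above
```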

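There are three necessary keys in the json file:

- `images`: a list of image entries, giving information such as the `file_name`, `height`, `width`, and `id` of each image.
- `annotations`: contains the list of instance annotations.
- `categories`: contains the names of the categories in the dataset ('person') and their IDs (1).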
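As an illustration of the conversion step, the sketch below turns a hypothetical custom annotation format (a list of per-image records, each holding its own person instances) into a dictionary with the three required keys. The input layout, its field names, and the `convert_to_coco` helper are all assumptions made for this example. The bounding box follows COCO's `[x, y, width, height]` convention; because the hypothetical input has no segmentation, `area` is approximated here by the box area, and the keypoint list is shortened to three keypoints for brevity (a COCO person annotation has 17).

```python
import json
from typing import Dict, List


def convert_to_coco(records: List[Dict]) -> Dict:
    """Convert hypothetical per-image records into a COCO-style dict.

    Each record is assumed to look like:
        {'file_name': str, 'height': int, 'width': int,
         'people': [{'bbox': [x, y, w, h], 'keypoints': [x1, y1, v1, ...]}]}
    """
    images, annotations = [], []
    ann_id = 1
    for img_id, rec in enumerate(records, start=1):
        images.append({
            'file_name': rec['file_name'],
            'height': rec['height'],
            'width': rec['width'],
            'id': img_id,
        })
        for person in rec['people']:
            kpts = person['keypoints']
            x, y, w, h = person['bbox']
            annotations.append({
                'keypoints': kpts,
                # Number of labeled keypoints (visibility flag v > 0).
                'num_keypoints': sum(1 for v in kpts[2::3] if v > 0),
                # Assumption: no segmentation available, so use the box area.
                'area': w * h,
                'iscrowd': 0,
                'image_id': img_id,
                'bbox': [x, y, w, h],
                'category_id': 1,
                'id': ann_id,
            })
            ann_id += 1
    return {
        'images': images,
        'annotations': annotations,
        'categories': [{'id': 1, 'name': 'person'}],
    }


# Hypothetical input with one image and one person.
records = [{
    'file_name': 'my_image_0001.jpg',
    'height': 480,
    'width': 640,
    'people': [{'bbox': [100.0, 50.0, 80.0, 200.0],
                'keypoints': [120, 60, 2, 0, 0, 0, 130, 70, 1]}],
}]
coco = convert_to_coco(records)
text = json.dumps(coco, indent=2)  # write this to the file used as ann_file
print(sorted(coco))  # ['annotations', 'categories', 'images']
```

Image and annotation IDs are assigned sequentially from 1 so that every `image_id` points at an existing entry in `images`.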

After the data pre-processing is done, users need to modify the config file to use the dataset.

In `configs/my_custom_config.py`, the following modifications are needed:

```python
...
# dataset settings
dataset_type = 'MyCustomDataset'
classes = ('a', 'b', 'c', 'd', 'e')
...
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file='path/to/your/train/json',
        img_prefix='path/to/your/train/img',
        ...),
    val=dict(
        type=dataset_type,
        ann_file='path/to/your/val/json',
        img_prefix='path/to/your/val/img',
        ...),
    test=dict(
        type=dataset_type,
        ann_file='path/to/your/test/json',
        img_prefix='path/to/your/test/img',
        ...))
...
```
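The `type` field of each split is a string, so the class it names must be registered before the config can be turned into a dataset. MMCV-based projects do this lookup through a registry; the plain-Python toy below only imitates the idea (copy the config, pop `type`, look the name up, call the class with the remaining keys) and is not MMCV's implementation. `DATASET_REGISTRY`, `register`, `build`, and this `MyCustomDataset` class are hypothetical stand-ins.

```python
from typing import Dict

# Toy registry mapping a class name (string) to the class itself.
DATASET_REGISTRY: Dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator that records cls under its own name."""
    DATASET_REGISTRY[cls.__name__] = cls
    return cls


def build(cfg: dict):
    """Instantiate the class named by cfg['type'] with the other keys."""
    args = dict(cfg)  # copy so the caller's config is not modified
    name = args.pop('type')
    if name not in DATASET_REGISTRY:
        raise KeyError(f'{name} is not registered')
    return DATASET_REGISTRY[name](**args)


@register
class MyCustomDataset:
    """Hypothetical dataset that only stores its file locations."""

    def __init__(self, ann_file: str, img_prefix: str):
        self.ann_file = ann_file
        self.img_prefix = img_prefix


train_cfg = dict(
    type='MyCustomDataset',
    ann_file='path/to/your/train/json',
    img_prefix='path/to/your/train/img')
train_set = build(train_cfg)
print(type(train_set).__name__, train_set.ann_file)
```

Because `build` copies the config before popping `type`, the same dict can be reused for repeated builds, and an unregistered name fails loudly instead of silently.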