本仓库对 Models 与 basecls 中的模型进行基本测速, 输出信息包括:每个 iter 运行时间(单位: ms),最大显存占用(单位: MiB),平均 cpu 占用率(top loadavg 的显示值)。
# 运行单卡 benchmark
./run_benchmark.sh
# 运行单卡+八卡 benchmark
./run_benchmark.sh -d
# 运行单卡+八卡+pytorch benchmark
./run_benchmark.sh -d -t
运行指定 benchmark 只能获得每个 iter 的运行时间这一项数据,无法自动从外部监控显存/cpu等信息。
-
基本参数
指定模型和卡数:
run.py -b <benchmark_name> -n <ngpus>
-
可选参数
- 指定 batch_size:
run.py -b <benchmark_name> -n <ngpus> --batch-size <bs>
- 指定训练步数:
run.py -b <benchmark_name> -n <ngpus> --steps <n_steps>
- 使用混合精度训练:
run.py -b <benchmark_name> -n <ngpus> -m mp
- 使用 trace:
run.py -b <benchmark_name> -n <ngpus> -t
- trace 指定使用 symbolic=True 模式:
run.py -b <benchmark_name> -n <ngpus> -t --symbolic
- 使用 dataloader:
run.py -b <benchmark_name> -n <ngpus> --loader
- 使用 preloader(需要先打开 dataloader):
run.py -b <benchmark_name> -n <ngpus> --loader --preload
- 指定 batch_size:
-
示例运行
> ./run.py -b faster_rcnn -n 1 --steps 20
command: source prepare.sh && python3 ./detection/train_random.py -a faster_rcnn -n 1 -s 20
13 14:12:40 load_serialized_obj_from_url: download to or using cached /home/chenyuanzhao/.cache/megengine/serialized/928d77_resnet50_fbaug_76254_4e14b7d1.pkl
Step 0, Loss (5.559 0.688 0.478 4.394 0.000 ), Time (tot:0.529, data:0.000)
Step 1, Loss (5.552 0.688 0.478 4.386 0.000 ), Time (tot:0.526, data:0.000)
Step 2, Loss (5.542 0.687 0.477 4.377 0.000 ), Time (tot:0.519, data:0.000)
Step 3, Loss (5.532 0.687 0.477 4.368 0.000 ), Time (tot:0.524, data:0.000)
Step 4, Loss (5.522 0.687 0.477 4.358 0.000 ), Time (tot:0.525, data:0.000)
Step 5, Loss (5.513 0.687 0.477 4.349 0.000 ), Time (tot:0.527, data:0.000)
Step 6, Loss (5.503 0.688 0.477 4.338 0.000 ), Time (tot:0.527, data:0.000)
Step 7, Loss (5.492 0.687 0.476 4.328 0.000 ), Time (tot:0.530, data:0.000)
Step 8, Loss (5.484 0.688 0.476 4.319 0.000 ), Time (tot:0.527, data:0.000)
Step 9, Loss (5.470 0.685 0.476 4.308 0.000 ), Time (tot:0.527, data:0.000)
Step 10, Loss (5.462 0.688 0.476 4.298 0.000 ), Time (tot:0.526, data:0.000)
Step 11, Loss (5.448 0.685 0.476 4.288 0.000 ), Time (tot:0.526, data:0.000)
Step 12, Loss (5.438 0.686 0.475 4.276 0.000 ), Time (tot:0.524, data:0.000)
Step 13, Loss (5.428 0.685 0.475 4.267 0.000 ), Time (tot:0.523, data:0.000)
Step 14, Loss (5.417 0.685 0.475 4.257 0.000 ), Time (tot:0.522, data:0.000)
Step 15, Loss (5.403 0.682 0.475 4.246 0.000 ), Time (tot:0.521, data:0.000)
Step 16, Loss (5.393 0.684 0.475 4.234 0.000 ), Time (tot:0.521, data:0.000)
Step 17, Loss (5.385 0.685 0.475 4.225 0.000 ), Time (tot:0.521, data:0.000)
Step 18, Loss (5.374 0.684 0.474 4.213 0.003 ), Time (tot:0.521, data:0.000)
Step 19, Loss (5.361 0.683 0.474 4.200 0.003 ), Time (tot:0.521, data:0.000)
==================== summary ====================
benchmark: detection
mode: imperative
loader:
arch: faster_rcnn
train_mode: normal
batchsize: 2
#GPU: 1
avg time: 0.521 seconds
会发现 run.py 只是对模型做了分发,仍然是进入到特定的模型子目录中去运行对应的 train_random.py。
- shufflenet
- resnet
- faster_rcnn
- atss
- retinanet
- vision_transformer
- basecls related models
- torch_shufflenet
- torch_resnet
- torch_vision_transformer
- timm related models