Unfreezing the vision encoder during pretraining gives very poor results #430

liuheng0111 · 2024-03-04T07:29:55Z

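I am using the LLaVA-1.5 architecture. In the first pretraining stage I trained on a large amount of caption data with both the vision encoder and the mlp_2x projector unfrozen, using a global batch size of 192. After many steps, testing showed that the model had no captioning ability at all, and the loss was much higher than when only mlp_2x is unfrozen. Ablations showed that once the vision encoder is unfrozen, its ability to extract image information degrades badly. If the vision encoder is unfrozen, how should it be trained? Yi-VL stage 1 pretrains on LAION-400M data with both the ViT and mlp_2x unfrozen; does the vision encoder still extract image information well after that?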
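A common general technique when unfreezing a pretrained vision encoder is to give it a much smaller learning rate than the freshly initialized projector; the usual motivation is that large early updates driven by a random projector can damage the pretrained features. This is not necessarily what Yi-VL did. Below is a minimal sketch that builds PyTorch-style optimizer parameter groups; the function `build_param_groups`, the module-name substring `vision_tower`, and the 0.1 learning-rate scale are hypothetical placeholders to adapt to your own model.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    base_lr: float = 1e-4,
    vision_key: str = "vision_tower",  # hypothetical: substring identifying ViT params
    vision_lr_scale: float = 0.1,      # assumption: ViT trains 10x slower than the projector
    weight_decay: float = 0.0,
) -> List[Dict[str, Any]]:
    """Split trainable parameters into a projector/other group and a vision-encoder group."""
    vision, other = [], []
    for name, param in named_params:
        # Frozen parameters (e.g. a frozen LLM) are left out of the optimizer entirely.
        if not getattr(param, "requires_grad", True):
            continue
        (vision if vision_key in name else other).append(param)

    groups: List[Dict[str, Any]] = []
    if other:
        groups.append({"params": other, "lr": base_lr, "weight_decay": weight_decay})
    if vision:
        # Smaller learning rate so the pretrained encoder drifts slowly.
        groups.append(
            {"params": vision, "lr": base_lr * vision_lr_scale, "weight_decay": weight_decay}
        )
    return groups
```

With a real model the result can be passed straight to an optimizer, e.g. `torch.optim.AdamW(build_param_groups(model.named_parameters()))`, since PyTorch optimizers accept a list of dicts with per-group `lr`.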

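For reference, the Yi-VL training settings:

| Stage | Global batch size | Learning rate | Gradient clip | Epochs |
|---|---|---|---|---|
| Stage 1, 2 | 4096 | 1e-4 | 0.5 | 1 |
| Stage 3 | 256 | 2e-5 | 1.0 | 2 |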

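My learning rate is also 1e-4. My mlp_2x does not use layer normalization, I train in bf16 without gradient clipping, and the batch size is 192. Did you train in bfloat16? How do you set gradient clipping with bfloat16 training? Does DeepSpeed's bfloat16 mode not support this parameter?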
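On the gradient-clipping question: to our understanding, DeepSpeed reads the top-level `gradient_clipping` key independently of the precision setting, so it can be combined with `"bf16": {"enabled": true}`; please verify this against the DeepSpeed configuration docs for your version. A minimal sketch of such a config as a Python dict follows. The helper `make_ds_config`, the GPU count, micro-batch size, accumulation steps, and ZeRO stage are hypothetical values, chosen only so the global batch size comes out to 192.

```python
import json


def make_ds_config(
    world_size: int,
    micro_batch_per_gpu: int,
    grad_accum_steps: int,
    clip_norm: float = 0.5,  # mirrors the Yi-VL stage 1/2 gradient clip in the table
    zero_stage: int = 2,     # assumption: ZeRO-2, adjust to your setup
) -> dict:
    """Build a DeepSpeed-style config dict with bf16 and gradient-norm clipping enabled."""
    return {
        # DeepSpeed expects train_batch_size == micro batch * accumulation steps * world size.
        "train_batch_size": world_size * micro_batch_per_gpu * grad_accum_steps,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "gradient_clipping": clip_norm,
        "bf16": {"enabled": True},
        "zero_optimization": {"stage": zero_stage},
    }


if __name__ == "__main__":
    # Hypothetical setup: 8 GPUs x 12 samples x 2 accumulation steps = 192.
    cfg = make_ds_config(world_size=8, micro_batch_per_gpu=12, grad_accum_steps=2)
    print(json.dumps(cfg, indent=2))
```

If training is launched through the Hugging Face `Trainer`, the JSON file can instead contain `"gradient_clipping": "auto"`, in which case the value is taken from `--max_grad_norm`.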
