[BUG] Some tests and questions about fine-tuning for voice cloning #404

Closed
EnochYe opened this issue Jul 20, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@EnochYe

EnochYe commented Jul 20, 2024

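Hello, I am an AI enthusiast. First of all, thank you very much for building such an excellent open-source project as fish-speech! While trying to fine-tune the fish-speech model for voice cloning, I ran into some problems and would like to ask about the details of fine-tuning. I hope you can help.

To start with, I have 30 min of speech data from a single speaker, on which I ran the following fine-tuning runs and tests: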

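  1. Test 1
  • Training data: 30 min of audio, automatically sliced, loudness-matched, and labeled with the audio-preprocess library.

  • Hyperparameters: trained for 1000 epochs

  • Cloning results:

    • Test text: '''它的宽大的叶子也是片片向上。就像这白杨树一样傲然挺立的守卫他们家乡的哨兵。难道你又不更远一点。想到这样枝枝叶叶靠紧团结。而那个叫布鲁诺的小伙子却仍在原地踏步。老板一边耐心地听着他的抱怨。'''

    • Test results: a 10 s reference audio clip was used as fake.npy; see "测试1-1000epoch" (Test 1, 1000 epochs) and "测试1-600epoch" (Test 1, 600 epochs) in the attachments.

    • Problems: the generated audio contains slurred and incorrect passages, and the speaking rate and rhythm are odd.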

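  2. Test 2
  • Training data: 5 min of audio, automatically sliced, loudness-matched, and labeled with the audio-preprocess library.

  • Hyperparameters: trained for 1000 epochs

  • Cloning results:

    • Test text: '''它的宽大的叶子也是片片向上。就像这白杨树一样傲然挺立的守卫他们家乡的哨兵。难道你又不更远一点。想到这样枝枝叶叶靠紧团结。而那个叫布鲁诺的小伙子却仍在原地踏步。老板一边耐心地听着他的抱怨。'''

    • Test results: a 10 s reference audio clip was used as fake.npy; see "测试2-1000epoch" (Test 2, 1000 epochs) in the attachments.

    • Problems: the generated audio contains slurred passages, incorrect passages, and very long meaningless stretches.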

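  3. Test 3
  • Training data: 5 min of audio, automatically sliced, loudness-matched, and labeled with the audio-preprocess library.

  • Hyperparameters: trained for 100 epochs

  • Cloning results:

    • Test text: '''它的宽大的叶子也是片片向上。就像这白杨树一样傲然挺立的守卫他们家乡的哨兵。难道你又不更远一点。想到这样枝枝叶叶靠紧团结。而那个叫布鲁诺的小伙子却仍在原地踏步。老板一边耐心地听着他的抱怨。'''

    • Test results: a 10 s reference audio clip was used as fake.npy; see "测试3" (Test 3) in the attachments.

    • Problems: the generated audio contains extra sentences.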

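In the experiments above I tried both longer and shorter training audio and both more and fewer epochs, but I never reached a satisfactory result. Do you have any suggestions for these problems? Or is there a stable, reliable setting that you use when fine-tuning for voice cloning?

Thanks again for this excellent project; I look forward to your reply.
(GitHub issues cannot take audio files as attachments, so the attachments mentioned above are in this OneDrive folder: https://1drv.ms/f/s!Anj5aIRFC0FNhFPcqUbNXm1EDv8m?e=RdLLjQ)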


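2024-07-20 10:19:33 Update
Looking through the repository's issues, I saw the author say that fine-tuning data should be 30 min to 1 h, so I ran another fine-tuning test with 45 min of audio from a different speaker.
I tried both the 500-epoch and the 1000-epoch weights, and at inference time I used a fake.npy made from roughly 5 s of audio as the reference in both cases.
The problem of extra sentences appearing in the generated audio is still there.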

@EnochYe EnochYe added the bug Something isn't working label Jul 20, 2024
@AnyaCoder
Collaborator

audio (22).zip
image
You can try these parameters.

@leng-yue
Member

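Training for 100 epochs carries some risk of overfitting. For extra sentences at the end of the output, we recommend using the webui's automatic re-sampling. We are still working on optimizations at the model level.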
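As a rough illustration of the re-sampling idea (not the webui's actual implementation, which is not shown in this thread), the sketch below re-draws a sentence until the number of generated tokens looks plausible for the length of the input text. The names `resample_until_plausible` and `generate_once`, and the threshold of 12 tokens per character, are hypothetical and would need tuning on real data.

```python
from typing import Callable, List, Optional, Sequence


def resample_until_plausible(
    generate_once: Callable[[str], Sequence[int]],
    text: str,
    max_tokens_per_char: float = 12.0,  # hypothetical threshold, not a fish-speech default
    max_attempts: int = 8,
) -> List[int]:
    """Re-draw a sample until its token count fits a length budget.

    `generate_once` stands in for one sampling call of the model and is
    assumed to return the generated token ids for `text`.
    If no attempt fits the budget, the shortest attempt is returned.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    budget = max_tokens_per_char * max(len(text), 1)
    shortest: Optional[List[int]] = None

    for _ in range(max_attempts):
        tokens = list(generate_once(text))
        if len(tokens) <= budget:
            return tokens  # plausible length: accept this draw
        if shortest is None or len(tokens) < len(shortest):
            shortest = tokens  # remember the least-bad fallback

    assert shortest is not None
    return shortest


if __name__ == "__main__":
    # Stand-in generator that yields outputs of decreasing length.
    lengths = iter([851, 400, 120])

    def fake_generate(_text: str) -> List[int]:
        return [0] * next(lengths)

    accepted = resample_until_plausible(fake_generate, "它的宽大的叶子也是片片向上。")
    print(len(accepted))
```

A length check only catches outputs that run long; it cannot detect slurred or mispronounced audio, so it complements listening tests rather than replacing them.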

@EnochYe
Author

EnochYe commented Jul 21, 2024

audio (22).zip image You can try these parameters.

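When I run inference with my fine-tuned model and pass "--max-new-tokens 1024", generation always fails with an error. With the original fish-speech model, generation completes normally, but the output quality does not improve.

Perhaps I have misunderstood the "每批最大令牌数" (maximum tokens per batch) parameter.

My generation command, test results, and error details are attached below: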

  • Command: '''python tools/llama/generate.py \
    --text "它的宽大的叶子也是片片向上。就像这白杨树一样傲然挺立的守卫他们家乡的哨兵。难道你又不更远一点。想到这样枝枝叶叶靠紧团结。而那个叫布鲁诺的小伙子却 仍在原地踏步。老板一边耐心地听着他的抱怨。" \
    --prompt-text "杭州体育与你同行大家好,欢迎收看今天的杭州体育家,我是马谦。" \
    --prompt-tokens ./idr_test_data/MaQian/fake.npy \
    --checkpoint-path ./checkpoints/maqian_100 \
    --num-samples 8 \
    --compile \
    --max-new-tokens 1024'''
  • Test results: https://1drv.ms/f/s!Anj5aIRFC0FNhFhO32JqvTV87CGu?e=loGut9
  • Error details:
    > python tools/llama/generate.py \
        --text "它的宽大的叶子也是片片向上。就像这白杨树一样傲然挺立的守卫他们家乡的哨兵。难道你又不更远一点。想到这样枝枝叶叶靠紧团结。而那个叫布鲁诺的小伙子却 仍在原地踏步。老板一边耐心地听着他的抱怨。" \
        --prompt-text "杭州体育与你同行大家好,欢迎收看今天的杭州体育家,我是马谦。" \
        --prompt-tokens ./idr_test_data/MaQian/fake.npy \
        --checkpoint-path ./checkpoints/maqian_100 \
        --num-samples 8 \
        --compile \
        --max-new-tokens 1024
    2024-07-21 13:04:47.783 | INFO | __main__:main:639 - Loading model ...
    2024-07-21 13:04:53.314 | INFO | __main__:load_model:347 - Restored model from checkpoint
    2024-07-21 13:04:53.314 | INFO | __main__:load_model:351 - Using DualARTransformer
    2024-07-21 13:04:53.314 | INFO | __main__:load_model:357 - Compiling function...
    2024-07-21 13:04:53.319 | INFO | __main__:main:648 - Time to load model: 5.54 seconds
    2024-07-21 13:04:53.336 | INFO | __main__:generate_long:432 - Encoded text: 它的宽大的叶子也是片片向上.
    2024-07-21 13:04:53.337 | INFO | __main__:generate_long:432 - Encoded text: 就像这白杨树一样傲然挺立的守卫他们家乡的哨兵.难道你又不更远一点.
    2024-07-21 13:04:53.337 | INFO | __main__:generate_long:432 - Encoded text: 想到这样枝枝叶叶靠紧团结.而那个叫布鲁诺的小伙子却.仍在原地踏步.
    2024-07-21 13:04:53.337 | INFO | __main__:generate_long:432 - Encoded text: 老板一边耐心地听着他的抱怨.
    2024-07-21 13:04:53.338 | INFO | __main__:generate_long:450 - Generating sentence 1/4 of sample 1/8
    0%| | 0/1023 [00:00<?, ?it/s]/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
      warnings.warn(
    0%| | 1/1023 [00:17<5:02:04, 17.73s/it]/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
      warnings.warn(
    0%|▏ | 2/1023 [00:17<2:06:02, 7.41s/it]/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
      warnings.warn(
    83%|██████████████████████████████████████████████████████████████████████████▋ | 849/1023 [00:20<00:04, 41.76it/s]
    2024-07-21 13:05:14.000 | INFO | __main__:generate_long:496 - Compilation time: 20.66 seconds
    2024-07-21 13:05:14.000 | INFO | __main__:generate_long:505 - Generated 851 tokens in 20.66 seconds, 41.19 tokens/sec
    2024-07-21 13:05:14.000 | INFO | __main__:generate_long:508 - Bandwidth achieved: 20.19 GB/s
    2024-07-21 13:05:14.001 | INFO | __main__:generate_long:513 - GPU Memory used: 1.10 GB
    2024-07-21 13:05:14.001 | INFO | __main__:main:681 - Sampled text: 它的宽大的叶子也是片片向上.
    2024-07-21 13:05:14.001 | INFO | __main__:generate_long:450 - Generating sentence 2/4 of sample 1/8
    11%|█████████▍ | 108/1023 [00:00<00:02, 351.97it/s]
    <frozen importlib._bootstrap_external>:883: _call_with_frames_removed: block: [0,0,0], thread: [0,0,0] Assertion `index out of bounds: 0 <= tmp4 < 1344` failed.
    [the same assertion is repeated for thread [1,0,0] through thread [127,0,0]]
    14%|████████████ | 139/1023 [00:00<00:02, 337.01it/s]
    Traceback (most recent call last):
      File "/data-disk/users/users/repository/fish-speech/tools/llama/generate.py", line 694, in <module>
        main()
      File "/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
        return self.main(*args, **kwargs)
      File "/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/click/core.py", line 1078, in main
        rv = self.invoke(ctx)
      File "/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/click/core.py", line 783, in invoke
        return __callback(*args, **kwargs)
      File "/data-disk/users/users/repository/fish-speech/tools/llama/generate.py", line 678, in main
        for response in generator:
      File "/data-disk/users/users/repository/fish-speech/tools/llama/generate.py", line 484, in generate_long
        y = generate(
      File "/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/data-disk/users/users/repository/fish-speech/tools/llama/generate.py", line 261, in generate
        x = decode_n_tokens(
      File "/data-disk/users/users/repository/fish-speech/tools/llama/generate.py", line 201, in decode_n_tokens
        if cur_token[0, 0, -1] == im_end_id:
    RuntimeError: CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Have you run into anything similar, and do you have any suggestions?
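The error message itself suggests re-running with CUDA_LAUNCH_BLOCKING=1 so that the stack trace points at the kernel that actually failed. A minimal sketch of preparing such a debugging run is shown here. The helper `build_debug_rerun` is hypothetical, and dropping `--compile` is only an assumption for narrowing down whether the compiled kernel is involved, not a confirmed fix.

```python
import os
from typing import Dict, List, Sequence, Tuple


def build_debug_rerun(
    argv: Sequence[str],
    flags_to_drop: Sequence[str] = ("--compile",),
) -> Tuple[Dict[str, str], List[str]]:
    """Return (env, argv) for a debugging re-run of the same command.

    - CUDA_LAUNCH_BLOCKING=1 makes CUDA kernel launches synchronous, so the
      Python traceback lines up with the failing kernel.
    - Value-less flags listed in `flags_to_drop` are removed from the
      argument list (assumption: `--compile` takes no value, as in the
      command from this thread).
    """
    env = dict(os.environ)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    debug_argv = [arg for arg in argv if arg not in flags_to_drop]
    return env, debug_argv


if __name__ == "__main__":
    # Arguments copied from the command in this thread (test text shortened
    # to its first sentence for readability).
    original = [
        "python", "tools/llama/generate.py",
        "--text", "它的宽大的叶子也是片片向上。",
        "--prompt-text", "杭州体育与你同行大家好,欢迎收看今天的杭州体育家,我是马谦。",
        "--prompt-tokens", "./idr_test_data/MaQian/fake.npy",
        "--checkpoint-path", "./checkpoints/maqian_100",
        "--num-samples", "8",
        "--compile",
        "--max-new-tokens", "1024",
    ]
    env, debug_argv = build_debug_rerun(original)
    print(" ".join(debug_argv))
    # To actually launch it from the repository root on a GPU machine:
    #   import subprocess
    #   subprocess.run(debug_argv, env=env, check=True)
```

If the run without `--compile` still fails, the synchronous traceback should at least show which indexing operation goes out of bounds.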

@EnochYe
Author

EnochYe commented Jul 21, 2024

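Training for 100 epochs carries some risk of overfitting. For extra sentences at the end of the output, we recommend using the webui's automatic re-sampling. We are still working on optimizations at the model level.

Yes, I tried fine-tuning for 100 epochs and for 1000 epochs, and the model trained for fewer epochs does indeed perform better.

However, the problem of extra speech being generated for short sentences is still fairly serious: out of roughly 8 generations, only about 2 turn out well. Do you have any other tricks or suggestions?

My test results are attached: https://1drv.ms/f/s!Anj5aIRFC0FNhFhO32JqvTV87CGu?e=loGut9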
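To put the "about 2 good results out of 8" figure in perspective, here is a small worked calculation of how many re-draws a re-sampling strategy would need. It assumes each generation is an independent draw with the same success probability, which is a simplification; the function names are illustrative only.

```python
def prob_at_least_one_good(p_good: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent draws is good."""
    return 1.0 - (1.0 - p_good) ** attempts


def attempts_needed(p_good: float, target: float) -> int:
    """Smallest number of draws whose success probability reaches `target`."""
    if not 0.0 < p_good <= 1.0:
        raise ValueError("p_good must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    attempts = 1
    while prob_at_least_one_good(p_good, attempts) < target:
        attempts += 1
    return attempts


if __name__ == "__main__":
    p = 2 / 8  # success rate reported in this thread
    print(f"P(at least one good in 8 draws) = {prob_at_least_one_good(p, 8):.3f}")
    print(f"Draws needed for 95% confidence = {attempts_needed(p, 0.95)}")
```

Under these assumptions, 8 draws give roughly a 90% chance of at least one good sample, and 11 draws are needed to reach 95%, so re-sampling works but costs several generations per sentence at this success rate.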

@AnyaCoder
Collaborator

Please describe your hardware setup.

@EnochYe
Author

EnochYe commented Jul 21, 2024

Please describe your hardware setup.

2024-07-21 at 15 48 17@2x

@EnochYe EnochYe closed this as completed Jul 24, 2024