Can't load tokenizer for 'gpt2'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'gpt2' is the correct path to a directory containing all relevant files for a GPT2TokenizerFast tokenizer. #1261

ycoe · 2023-09-28T10:00:08Z

Dify version

0.3.23

Cloud or Self Hosted

Self Hosted

Steps to reproduce

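Configure a new embedding model, e.g. MiniMax (other models reproduce the same error too)
Set the embedding model to MiniMax
Create an empty dataset and set its embedding model to MiniMax (this is already the default)
Upload a PDF and click Next through every step
An error is raised during indexing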

✔️ Expected Behavior

I expected indexing to use the embedding model configured for the dataset.

❌ Actual Behavior

The indexing task failed:
Can't load tokenizer for 'gpt2'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'gpt2' is the correct path to a directory containing all relevant files for a GPT2TokenizerFast tokenizer.

I don't know why it is using gpt2; I never configured gpt2.

crazywoola · 2023-09-29T11:18:44Z

You need to set up a proxy for the api and worker services so they can download this model and run.

Adding two environment variables, http_proxy and https_proxy, to docker-compose.yml should fix this.
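A minimal sketch for confirming that the proxy variables actually reached the process, assuming you run it with Python inside the api and worker containers after editing docker-compose.yml. It only inspects the standard `http_proxy`/`https_proxy` variables (and their uppercase forms), which HTTP clients such as `requests` read by default; it does not test connectivity. The function name is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def configured_proxies(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the http/https proxy URLs set in the environment.

    A non-empty lowercase name (http_proxy) wins over the uppercase one
    (HTTP_PROXY), which matches how urllib.request.getproxies() resolves them.
    """
    env = os.environ if environ is None else environ
    proxies: Dict[str, str] = {}
    for scheme in ("http", "https"):
        url = env.get(f"{scheme}_proxy") or env.get(f"{scheme.upper()}_PROXY")
        if url:
            proxies[scheme] = url
    return proxies


if __name__ == "__main__":
    found = configured_proxies()
    if not found:
        print("No http_proxy/https_proxy set: huggingface.co will be contacted directly.")
    for scheme, url in found.items():
        print(f"{scheme} -> {url}")
```

If it prints nothing but the "No ... set" line inside a container, the variables were not passed to that service, and the gpt2 tokenizer download will go out without a proxy.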

ycoe · 2023-10-08T09:12:28Z

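I don't think this is a proxy issue. I'm using embedding models from Chinese providers and tried two of them:
the MiniMax and Zhipu AI embedding models both raise this error.
Both are hosted in China, so in principle no proxy should be needed.

I confirmed with their official staff that they are hosted in China.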

ycoe · 2023-10-08T09:27:41Z

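I noticed something:
On my existing self-hosted Dify deployment, which had already been used to create datasets with the GPT-3.5 embedding model, switching to the MiniMax embedding model triggers this error.
On a brand-new local Dify deployment, however, datasets created with the MiniMax or Zhipu AI embedding model index successfully, even when switching between the two.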

ycoe · 2023-10-08T09:51:53Z

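I found this commit:
e55dd13

The Docker image bundles gpt2 so it can be used directly, but I'm not sure why the MiniMax / Zhipu AI embedding models still depend on gpt2.
Copying my local cache onto the server solved the problem.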
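Before or after copying a cache to the server, it can help to check whether the gpt2 tokenizer is actually present there. The sketch below assumes the default cache layout of recent `huggingface_hub` versions (`<cache>/models--gpt2/snapshots/<revision>/...`, located via `HF_HUB_CACHE`, `HF_HOME` or `~/.cache/huggingface/hub`). Older `transformers` releases, and possibly the image built from e55dd13, may store the files elsewhere, so treat the path logic as an assumption; both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def default_hub_cache() -> Path:
    """Resolve the hub cache dir the way recent huggingface_hub does (assumed)."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    if os.environ.get("HF_HOME"):
        return Path(os.environ["HF_HOME"]).expanduser() / "hub"
    xdg = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(xdg).expanduser() / "huggingface" / "hub"


def has_cached_gpt2_tokenizer(cache_dir: Optional[Path] = None) -> bool:
    """True if some cached snapshot of the 'gpt2' repo holds fast-tokenizer files."""
    root = Path(cache_dir) if cache_dir is not None else default_hub_cache()
    snapshots = root / "models--gpt2" / "snapshots"
    if not snapshots.is_dir():
        return False
    for snap in snapshots.iterdir():
        # tokenizer.json alone is enough; vocab.json + merges.txt also work
        if (snap / "tokenizer.json").exists():
            return True
        if (snap / "vocab.json").exists() and (snap / "merges.txt").exists():
            return True
    return False


if __name__ == "__main__":
    cache = default_hub_cache()
    status = "cached" if has_cached_gpt2_tokenizer(cache) else "missing"
    print(f"{cache}: gpt2 tokenizer {status}")
```

Snapshot files in this layout are normally symlinks into the repo's `blobs/` directory, so copy the whole `models--gpt2` directory with symlinks preserved (for example with `cp -a`); otherwise neither this check nor the tokenizer will find the files.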

crazywoola · 2023-10-10T01:14:45Z

That's because the token-counting step in between uses a model from Hugging Face (gpt2).
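This explains why gpt2 appears even when the embedding model is MiniMax or Zhipu AI: counting tokens is a separate step from computing embeddings. The sketch below is not Dify's code; it is a minimal illustration, assuming `transformers` may or may not be installed, of counting tokens with the gpt2 tokenizer while refusing to contact huggingface.co (`local_files_only=True`). When the tokenizer cannot be loaded, which is where transformers raises the "Can't load tokenizer for 'gpt2'" `OSError`, it falls back to a crude, hypothetical 4-characters-per-token estimate instead of failing the whole task.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def _load_gpt2_tokenizer():
    """Load the gpt2 tokenizer from the local cache only (once); None if unavailable."""
    try:
        from transformers import GPT2TokenizerFast

        # local_files_only=True: never go to huggingface.co, fail fast instead
        return GPT2TokenizerFast.from_pretrained("gpt2", local_files_only=True)
    except (ImportError, OSError):
        # OSError is what transformers raises for "Can't load tokenizer for 'gpt2'"
        return None


def count_tokens(text: str) -> int:
    """Count tokens with gpt2 if it is cached, else use a rough estimate."""
    tokenizer = _load_gpt2_tokenizer()
    if tokenizer is not None:
        return len(tokenizer.encode(text))
    # Hypothetical fallback: about 4 characters per token, at least 1 for non-empty text
    return 0 if not text else max(1, len(text) // 4)
```

The fallback keeps indexing alive at the cost of inaccurate counts (it is especially rough for Chinese text), so it only makes sense where counts are used for limits or statistics rather than billing.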

crazywoola · 2024-06-06T01:25:06Z

This problem was fixed long ago.

zhmyahg · 2024-06-06T01:46:28Z

Did you manage to solve this, @ycoe? I hit the same error using a Qianfan model with LangChain's ConversationSummaryBufferMemory: after creating a memory, manually adding content with memory.save_context raises it.

ycoe · 2024-06-06T02:58:43Z

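What I ran into was caused by a model that had not been downloaded locally; downloading it manually fixed it.
However, Dify is very slow when calling Chinese embedding models (it seems to also call some overseas service; I didn't trace it in detail).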
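For the manual download, a minimal sketch using `huggingface_hub.snapshot_download`, assuming the `huggingface_hub` package is installed and the machine can reach huggingface.co (directly or through the proxy variables mentioned earlier in the thread). It fetches only the tokenizer files of the `gpt2` repository; the file list is an assumption about what `GPT2TokenizerFast` needs, and `allow_patterns` simply skips names that are not in the repo. The constant and function names are hypothetical.

```python
from typing import List, Optional

# Assumed set of files GPT2TokenizerFast may read from the "gpt2" repo;
# patterns that match nothing are ignored by snapshot_download.
GPT2_TOKENIZER_PATTERNS: List[str] = [
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
    "config.json",
]


def download_gpt2_tokenizer(cache_dir: Optional[str] = None) -> str:
    """Fetch only the gpt2 tokenizer files into the Hugging Face cache.

    Returns the local snapshot directory. Needs network access; nothing is
    downloaded until this function is called.
    """
    from huggingface_hub import snapshot_download  # requires huggingface_hub

    return snapshot_download(
        repo_id="gpt2",
        allow_patterns=GPT2_TOKENIZER_PATTERNS,
        cache_dir=cache_dir,
    )
```

Call `download_gpt2_tokenizer()` once on a machine with network access, then copy the resulting cache to the server, as described earlier in the thread.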

363843342 · 2024-06-19T09:04:09Z

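I looked into it: Qianfan can't use this "gpt2", which is OpenAI's segmenting/summarization model for English-only text. LangChain doesn't seem to document how to switch this model to Baidu's API: before the line llm = QianfanChatEndpoint() you would have to specify a model that supports summarizing Chinese text, and I don't know how to do that.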
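In LangChain, gpt2 is not what writes the summary; the Qianfan model does that. `ConversationSummaryBufferMemory` only needs a token count (via `llm.get_num_tokens_from_messages`), and the default token counting in `langchain_core` loads the GPT-2 tokenizer from Hugging Face, which is what fails here. So the thing to replace is the token counter, not the summarization model. A minimal sketch, assuming a recent `langchain_core` where chat models accept a `custom_get_token_ids` callable, plus `langchain` and `langchain_community` with Qianfan credentials configured. The regex counter is a crude, hypothetical heuristic, not a real Chinese tokenizer.

```python
import re
from typing import List

# One piece per CJK ideograph, per run of ASCII letters/digits, or per other
# non-space symbol: a rough, hypothetical stand-in for a real tokenizer.
_PIECE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+|[^\sA-Za-z0-9\u4e00-\u9fff]")


def rough_token_ids(text: str) -> List[int]:
    """Return placeholder 'token ids'; for token counting only their number matters."""
    return [ord(piece[0]) for piece in _PIECE.findall(text)]


def build_summary_memory(max_token_limit: int = 1000):
    """Create a summary memory whose LLM never loads the gpt2 tokenizer.

    Imports happen inside the function, so defining it needs no LangChain install.
    """
    from langchain.memory import ConversationSummaryBufferMemory
    from langchain_community.chat_models import QianfanChatEndpoint

    llm = QianfanChatEndpoint(custom_get_token_ids=rough_token_ids)
    return ConversationSummaryBufferMemory(llm=llm, max_token_limit=max_token_limit)
```

With this in place, `memory = build_summary_memory()` followed by `memory.save_context(...)` counts tokens with `rough_token_ids` instead of downloading gpt2. A counter that matches Qianfan's own tokenization would give more accurate limits.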

ycoe added the 🐞 bug Something isn't working label Sep 28, 2023

ycoe closed this as completed Oct 8, 2023

crazywoola mentioned this issue Oct 10, 2023 in Can't load tokenizer for 'gpt2'. #1295 (Closed)

dosubot bot mentioned this issue Sep 10, 2024 in Docker startup error. Some models cannot be used. #8204 (Closed)
