This issue was moved to a discussion.

Add a rare-character model #10390

Closed
shiyutang opened this issue Jul 14, 2023 · 5 comments

@shiyutang
Collaborator

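Background

Following the requirements survey (#10334) and the discussion at the weekly technical seminar (#10223), we have decided to take on the task of adding a rare-character model.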

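Steps

  1. Replace the existing dictionary txt with one extended to cover the 《通用规范汉字表》 (Table of General Standard Chinese Characters).
  2. On the existing datasets, balance the corpus through data synthesis, copy-paste, and similar techniques, then retrain the PPOCRV3 detection and recognition models.
  3. Compare the retrained models' detection and recognition accuracy on common text and on rare characters against the best PPOCRV3 models; the goal is accuracy on common characters that stays the same or improves, together with a further gain on rare characters.
  4. Submit a PR to ppocr that replaces the best model.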
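
Steps 1 and 2 could be sketched roughly as follows. This is only an illustration, not the project's actual tooling: it assumes the dictionary is a plain list with one character per entry, and that the recognition labels are available as plain strings. The function names `merge_dictionaries` and `underrepresented` and the sample data are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def merge_dictionaries(existing: Iterable[str], extra: Iterable[str]) -> list[str]:
    """Append characters from `extra` that `existing` lacks, keeping the original order.

    Hypothetical helper: assumes one character per entry, as in a
    one-character-per-line dictionary txt.
    """
    merged: list[str] = []
    seen: set[str] = set()
    for entry in list(existing) + list(extra):
        ch = entry.rstrip("\r\n")  # drop line endings but keep a literal space entry
        if ch and ch not in seen:
            seen.add(ch)
            merged.append(ch)
    return merged


def underrepresented(labels: Iterable[str], dictionary: Iterable[str], min_count: int) -> list[str]:
    """Return dictionary characters that occur fewer than `min_count` times in the labels.

    These are the characters that synthesis or copy-paste would need to cover.
    """
    counts = Counter(ch for text in labels for ch in text)
    return [ch for ch in dictionary if counts[ch] < min_count]


if __name__ == "__main__":
    # Toy data standing in for the real dictionary and label files (assumption).
    existing = ["的", "一", "是"]
    extra = ["一", "龘", "𪚥"]
    merged = merge_dictionaries(existing, extra)
    print(merged)

    labels = ["一是一", "的的"]
    print(underrepresented(labels, merged, min_count=2))
```

Appending new characters instead of re-sorting keeps the indices of the existing characters stable, which may help if pretrained recognition weights are reused; whether that matters here depends on the training setup.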

@hademen

hademen commented Jul 18, 2023

Where can the rare-character model be downloaded?

@shiyutang
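Collaborator Author

This is an assigned task: you can sign up to take part and train with aistudio resources. Once training is finished, the model will be published in the related PR.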

@zhoutianrui-tongji

Is a rare-character model available yet?

@zhuxiaobin

@shiyutang How is the rare-character model coming along?

@dc6273632

Any news on the rare-character model?

@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Jun 5, 2024
@SWHL SWHL converted this issue into discussion #12738 Jun 5, 2024


7 participants