
Data question #11

Open
hongxiaDu opened this issue Jun 3, 2022 · 2 comments

Comments

@hongxiaDu

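Thank you for your work. I have a question: MOOCCubeX provides nearly 4,000 courses and more than 630,000 concepts, but the concept-course relations cover only 200,000+ concepts and 800+ courses. Is this part of the data incomplete, so that we need to run the provided concept-extraction pipeline to get concepts for the remaining courses? And why are there 630,000+ concepts in total when only 200,000+ of them are linked to courses?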

@yuq-1s yuq-1s added the bug Something isn't working label Jun 6, 2022
@yuq-1s
Collaborator

yuq-1s commented Jun 6, 2022

Hi, only 800+ courses have concepts because only those courses have video subtitles, and we extract concepts from the video subtitles.

@yuq-1s yuq-1s removed the bug Something isn't working label Jun 16, 2022
@yuq-1s
Collaborator

yuq-1s commented Jun 16, 2022

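200,000+ concepts have concept-course links because these concepts were extracted from the subtitles; the other concepts were obtained from other sources. Our algorithm can actually extract far more than 200,000 concepts, but to ensure data quality we released only these concepts in MOOCCubeX. If you want to extract more concepts yourself and get more concept-course pairs, you can try our concept-extraction tool; see issue #9.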
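To check these coverage numbers locally (how many concepts and courses actually appear in the concept-course relations versus the full concept and course lists), a minimal sketch like the one below can help. It assumes a hypothetical two-column, tab-separated relation file (`concept_id<TAB>course_id`); the real MOOCCubeX file layout and field names may differ, so the parsing in `read_pairs` would need to be adapted. The function names are illustrative, not part of the MOOCCubeX tooling.

```python
import csv
import io
from typing import Iterable, Iterator


def read_pairs(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (concept_id, course_id) pairs from tab-separated lines.

    The two-column TSV layout is a hypothetical format used for
    illustration; adapt it to the actual relation file.
    """
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) >= 2:  # skip blank or malformed rows
            yield row[0], row[1]


def coverage_stats(pairs, all_concepts, all_courses) -> dict:
    """Count how many concepts/courses appear in at least one relation."""
    linked_concepts: set[str] = set()
    linked_courses: set[str] = set()
    for concept, course in pairs:
        linked_concepts.add(concept)
        linked_courses.add(course)
    return {
        "linked_concepts": len(linked_concepts),
        "total_concepts": len(all_concepts),
        "linked_courses": len(linked_courses),
        "total_courses": len(all_courses),
        # Courses with no extracted concepts (e.g. no video subtitles).
        "unlinked_courses": sorted(set(all_courses) - linked_courses),
    }


# Toy data standing in for the real files.
sample = io.StringIO("c1\tC100\nc2\tC100\nc2\tC200\n")
stats = coverage_stats(
    read_pairs(sample),
    all_concepts={"c1", "c2", "c3", "c4"},
    all_courses={"C100", "C200", "C300"},
)
print(stats["linked_concepts"], stats["linked_courses"], stats["unlinked_courses"])
# prints: 2 2 ['C300']
```

On the real data, `unlinked_courses` would be expected to list the courses without video subtitles, which is consistent with the explanation above.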

2 participants