NebulaGraph-DataX:ReaderAndWriter Development #1
Merged
Please resubmit the PR at https://github.com/nebula-contrib/DataX
czpmango
approved these changes
Oct 17, 2022
czpmango
approved these changes
Oct 24, 2022
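Solution description:
The architecture of this project largely follows the official DataX code repository, and the work is developed and tested in two parts: a Writer and a Reader. Because the official DataX repository has not yet integrated any other graph database plugin, the development also takes into account how a graph database differs from the other databases DataX supports.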
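How the project works in principle:
Take synchronizing data from the relational database MySQL into NebulaGraph as an example. Table data in a relational database can be synchronized into NebulaGraph because Nebula has two table-like concepts, Tag and Edge Type. A Tag is a label type that defines the properties of one class of vertices, so it can be treated as roughly equivalent to a table schema in a relational database. An Edge Type, likewise, can be treated as roughly equivalent to the schema of a relationship table. When we synchronize entity tables from a relational database, each table therefore corresponds to a Tag in Nebula. A Tag has many Vertex instances, which correspond to the rows of the table, and an Edge relates to its Edge Type in the same way that a Vertex relates to its Tag. As a result, whether Nebula acts as the Reader or the Writer, its structure can be matched to the table structure of a relational database.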
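Implementation approach for the Reader plugin:
The Reader uses the nGQL match query statement to fetch all Vertex nodes under a Tag, assembles the returned nodes into a collection of records, and sends them through DataX to the relational database. Note that the name of the relational Table must be exactly the same as the name of the Tag; otherwise the match fails. Edges are handled the same way: a match query fetches all edges of a specified edge_type, and the relational database's insert statements then write them into the target database. Also note that before using a match statement, indexes must already exist on the Tag and the Edge Type.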
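The query construction described above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the class name `NgqlMatchSketch`, its method names, and the identifier check are all assumptions made for this example. It only builds the statement text; executing it against NebulaGraph, and creating the indexes the statements rely on, are out of scope.

```java
// Hypothetical helper, not taken from the PR: builds the nGQL MATCH
// statements a Reader could issue for one Tag or one Edge Type.
final class NgqlMatchSketch {
    private NgqlMatchSketch() {}

    // Returns a statement that fetches every vertex carrying the given Tag.
    static String matchVertices(String tagName) {
        return "MATCH (v:" + checked(tagName) + ") RETURN v";
    }

    // Returns a statement that fetches every edge of the given Edge Type.
    static String matchEdges(String edgeType) {
        return "MATCH ()-[e:" + checked(edgeType) + "]->() RETURN e";
    }

    // Accepts only plain identifiers so a name cannot alter the statement.
    // The allowed pattern is an assumption chosen for this sketch.
    private static String checked(String identifier) {
        if (identifier == null || !identifier.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unsupported identifier: " + identifier);
        }
        return identifier;
    }

    public static void main(String[] args) {
        System.out.println(matchVertices("player"));
        System.out.println(matchEdges("follow"));
    }
}
```

Because the Table name and the Tag name are required to be identical, the same string could serve both as the `tagName` argument here and as the target table name on the relational side.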
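Implementation approach for the Writer plugin:
The Writer uses the column field in the configuration file, together with metadata, to determine which fields belong to which Tags and Edge Types, and then matches them. With that information, it takes the collection of records received from the Reader plugin and assembles insert statements using Java 8 stream operations such as filter and map. Because we require that relational table names be identical to the graph database's Tag and Edge Type names, fields can be matched directly by tag_name and edgeType_name, and the data is inserted into Nebula with insert vertex <tag_name> and insert edge <edge_type> statements.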
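A rough sketch of assembling the insert statements with filter and map is shown below. It is an illustration under stated assumptions, not the plugin's implementation: the class `NgqlInsertSketch`, the row layout (a plain `List<Object>` standing in for a DataX record, with the vertex ID first, or the source and destination IDs first for edges), and the literal formatting are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not taken from the PR: turns rows into nGQL
// INSERT VERTEX / INSERT EDGE statements using Java 8 streams.
final class NgqlInsertSketch {
    private NgqlInsertSketch() {}

    // Renders a value as a literal: strings are quoted and escaped,
    // null becomes NULL, everything else uses its string form.
    static String literal(Object value) {
        if (value == null) {
            return "NULL";
        }
        if (value instanceof String) {
            String s = (String) value;
            return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        return String.valueOf(value);
    }

    // Assumed row layout: [vid, prop1, prop2, ...] in the order of props.
    static String insertVertices(String tagName, List<String> props, List<List<Object>> rows) {
        String values = rows.stream()
                .filter(row -> row.size() == props.size() + 1) // skip malformed rows
                .map(row -> literal(row.get(0)) + ":(" + joinFrom(row, 1) + ")")
                .collect(Collectors.joining(", "));
        return statement("INSERT VERTEX", tagName, props, values);
    }

    // Assumed row layout: [srcVid, dstVid, prop1, prop2, ...].
    static String insertEdges(String edgeType, List<String> props, List<List<Object>> rows) {
        String values = rows.stream()
                .filter(row -> row.size() == props.size() + 2) // skip malformed rows
                .map(row -> literal(row.get(0)) + "->" + literal(row.get(1))
                        + ":(" + joinFrom(row, 2) + ")")
                .collect(Collectors.joining(", "));
        return statement("INSERT EDGE", edgeType, props, values);
    }

    // Joins the literals of a row starting at the given position.
    private static String joinFrom(List<Object> row, int start) {
        return row.stream().skip(start).map(NgqlInsertSketch::literal)
                .collect(Collectors.joining(", "));
    }

    private static String statement(String verb, String name, List<String> props, String values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("No well-formed rows to insert");
        }
        return verb + " " + name + "(" + String.join(", ", props) + ") VALUES " + values;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("player100", "Tim Duncan", 42));
        System.out.println(insertVertices("player", Arrays.asList("name", "age"), rows));
    }
}
```

Batching all rows of one Tag into a single statement is a choice made for this sketch; a real Writer would likely cap the batch size and decide explicitly how to report rows that fail the filter instead of silently dropping them.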