
Could you provide a jar for the 3.0-SNAPSHOT version, along with a more detailed usage tutorial? Thanks a lot #50

Closed
liangye20 opened this issue Jun 18, 2022 · 16 comments

Comments

@liangye20

No description provided.

@wey-gu
Contributor

wey-gu commented Jun 18, 2022

https://www.siwei.io/spark-on-nebula-graph/

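This post includes a containerized environment in which the project builds without trouble, plus a working example jar and a Nebula cluster to connect it to. It may be a useful reference.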

@Reid00
Contributor

Reid00 commented Aug 18, 2022

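@wey-gu Hi Siwei, I'm trying to write data with the spark-connector from PySpark, but when I try to connect and read data I run into this error. Could you help me troubleshoot?
[error screenshot]

What I've done so far:

  1. The machine I'm on can ping the server hosting Nebula.
  2. I put nebula-spark-connector-3.0.0.jar into the jars folder of Spark 2.4.

How should I go about troubleshooting this?
BTW: the blog post https://www.siwei.io/spark-on-nebula-graph/ is well written, but it mainly targets a one-click Docker deployment for trying things out.
Is there a complete Python sample showing how to read and write data with spark-connector?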

@wey-gu
Contributor

wey-gu commented Aug 18, 2022


Do your versions match? The error looks like it points to a version mismatch.

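Also, reads in nebula-spark-c go through storaged, so there are network requirements too: the Spark side must be able to reach storaged, not just metad (see https://www.siwei.io/nebula-algo-spark-k8s/).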
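Since ping only exercises ICMP, it does not show that TCP connections to the Nebula services will succeed. A minimal sketch for checking that, assuming Nebula's default ports (metad 9559, graphd 9669, storaged 9779); the function names are hypothetical, and, as we understand it, the storaged addresses worth probing are the ones registered in metad (for example those listed by SHOW HOSTS):

```python
import socket
from typing import Dict, Optional

# Nebula's default service ports (assumed unchanged in your deployment).
DEFAULT_PORTS: Dict[str, int] = {"metad": 9559, "graphd": 9669, "storaged": 9779}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures all land here.
        return False


def check_nebula_host(host: str, ports: Optional[Dict[str, int]] = None) -> Dict[str, bool]:
    """Probe each service port on one host; a successful ping does not imply these are open."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling check_nebula_host("10.0.7.89") returns one boolean per service; metad reachable but storaged unreachable would match the situation described above.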

@Reid00
Contributor

Reid00 commented Aug 18, 2022

  • Spark version 2.4.0
  • Jar: nebula-spark-connector-3.0.0.jar
  • Nebula version v3.2.0

When I connect directly from Python code, rather than through the pyspark shell, I get this error:
[error screenshot]

@wey-gu
Contributor

wey-gu commented Aug 18, 2022


/spark/bin/pyspark --driver-class-path nebula-spark-connector-3.0.0.jar --jars nebula-spark-connector-3.0.0.jar

You did pass the class path, right?

@Reid00
Contributor

Reid00 commented Aug 18, 2022

[root@qcc-hd-test-3 spark240]# ./bin/pyspark --driver-class-path ./jars/nebula-spark-connector-3.0.0.jar --jars ./jars/nebula-spark-connector-3.0.0.jar
Python 2.7.5 (default, Aug  7 2019, 00:51:29) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark240/jars/nebula-spark-connector-3.0.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark240/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".

This time I added --driver-class-path nebula-spark-connector-3.0.0.jar --jars nebula-spark-connector-3.0.0.jar, but I still get the same error in the pyspark shell.
I also don't know how to add this jar from Python code.
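Besides setting spark.jars on the session builder, a common way to pass the same flags when running a plain Python 3 script is the PYSPARK_SUBMIT_ARGS environment variable, which PySpark reads when it launches the JVM; its value must end with the token pyspark-shell and it has to be set before the first SparkSession is created. A minimal sketch, where the jar path is the one from this thread and the helper names are hypothetical:

```python
import os
import shlex

# Path taken from this thread; point it at wherever your connector jar lives.
CONNECTOR_JAR = "/opt/spark240/jars/nebula-spark-connector-3.0.0.jar"


def pyspark_submit_args(jar: str) -> str:
    """Build PYSPARK_SUBMIT_ARGS mirroring the pyspark shell flags used above."""
    quoted = shlex.quote(jar)  # PySpark splits this value with shlex
    # The trailing "pyspark-shell" token is required when launching from Python.
    return f"--driver-class-path {quoted} --jars {quoted} pyspark-shell"


def configure_env(jar: str = CONNECTOR_JAR) -> None:
    """Must run before the first SparkSession/SparkContext is created."""
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args(jar)
```

Call configure_env() at the top of the script, then build the SparkSession as usual.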

@Nicole00
Contributor


Take a look at the error log: the Nebula you are connecting to must be a 2.6.x version, not 3.2.0. The client version is incompatible with the server.
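The diagnosis above comes down to comparing the connector version with the server version. A minimal sketch of that comparison, assuming, as a simplification, that the connector and server must share the same major version (the connector's own compatibility table is authoritative); the helper names are hypothetical:

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn strings like 'v3.2.0' or '3.0-SNAPSHOT' into integer tuples."""
    core = text.strip().lstrip("vV").split("-")[0]  # drop suffixes such as '-SNAPSHOT'
    return tuple(int(part) for part in core.split("."))


def likely_compatible(connector: str, server: str) -> bool:
    """Simplified rule (assumption): same major version means likely compatible."""
    return parse_version(connector)[0] == parse_version(server)[0]
```

With the versions in this thread, likely_compatible("3.0.0", "2.6.1") is False, which is the mismatch the error log revealed.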

@Reid00
Contributor

Reid00 commented Aug 18, 2022

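It turned out I had the wrong IP; after fixing it, everything works in the shell.
But how should I write this in a Python file? The code below fails when I debug it locally: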

from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName("nebula-connector").getOrCreate()

    df = spark.read.format(
        "com.vesoft.nebula.connector.NebulaDataSource").option(
        "type", "vertex").option(
        "spaceName", "Relation").option(
        "label", "Company").option(
        "returnCols", "keyno,name").option(
        "metaAddress", "10.0.7.00:9559").option(
        "partitionNumber", 1).load()

    df.show(7, truncate=False)

@Reid00
Contributor

Reid00 commented Aug 18, 2022

You need to add the specific jar to the SparkSession:

from pyspark.sql import SparkSession

if __name__ == "__main__":

    spark = SparkSession.builder\
            .config("spark.jars", "/opt/spark240/jars/nebula-spark-connector-3.0.0.jar")\
            .appName("nebula-connector").getOrCreate()

    df = spark.read.format(
        "com.vesoft.nebula.connector.NebulaDataSource").option(
        "type", "vertex").option(
        "spaceName", "Relation").option(
        "label", "Company").option(
        "returnCols", "keyno,name").option(
        "metaAddress", "10.0.7.89:9559").option(
        "partitionNumber", 1).load()

    df.show(7, truncate=False)
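The chained .option() calls can also be gathered into a plain dict and applied with DataFrameReader.options(**opts), which keeps the connection settings in one place and makes them easy to reuse across tags. A minimal sketch that uses only the option keys from the example above; the helper names are hypothetical:

```python
from typing import Dict, Iterable

NEBULA_SOURCE = "com.vesoft.nebula.connector.NebulaDataSource"


def vertex_read_options(space: str, tag: str, cols: Iterable[str],
                        meta_address: str, partitions: int = 1) -> Dict[str, str]:
    """Hypothetical helper: the same settings as the chained .option() calls above."""
    return {
        "type": "vertex",
        "spaceName": space,
        "label": tag,
        "returnCols": ",".join(cols),
        "metaAddress": meta_address,
        # Spark options are strings; .option() converts the int the same way.
        "partitionNumber": str(partitions),
    }


def read_vertices(spark, **kwargs):
    """Load one tag as a DataFrame; `spark` is an existing SparkSession."""
    opts = vertex_read_options(**kwargs)
    return spark.read.format(NEBULA_SOURCE).options(**opts).load()
```

For example: df = read_vertices(spark, space="Relation", tag="Company", cols=["keyno", "name"], meta_address="10.0.7.89:9559").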

@Reid00
Contributor

Reid00 commented Aug 18, 2022

Another question: how do I set up this connector config for writing in Python? Or can't the Python version write at all?

In Scala, the Nebula write config is set up like this:

import com.vesoft.nebula.connector.{
  NebulaConnectionConfig,
  WriteMode,
  WriteNebulaEdgeConfig,
  WriteNebulaVertexConfig
}
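The Scala config classes live in the JVM, so from PySpark the practical route is to pass the equivalent settings as string options on df.write, which is the approach the PySpark write example referenced later in this thread (#55) takes. A minimal sketch for writing vertices; the option keys type, spaceName, label, vertexField, metaAddress, graphAddress, user, passwd and batch are our assumption and should be checked against the connector README for your version, and the helper names, addresses and root/nebula credentials are placeholders:

```python
from typing import Dict

NEBULA_SOURCE = "com.vesoft.nebula.connector.NebulaDataSource"


def vertex_write_options(space: str, tag: str, vid_column: str,
                         meta_address: str, graph_address: str,
                         user: str = "root", passwd: str = "nebula",
                         batch: int = 100) -> Dict[str, str]:
    """Hypothetical helper; the option key names are assumptions to verify."""
    return {
        "type": "vertex",
        "spaceName": space,
        "label": tag,
        "vertexField": vid_column,      # DataFrame column holding the vertex ID
        "metaAddress": meta_address,
        "graphAddress": graph_address,  # assumed: writes are sent to graphd
        "user": user,
        "passwd": passwd,
        "batch": str(batch),            # rows per insert; 100 chosen arbitrarily
    }


def write_vertices(df, **kwargs) -> None:
    """Write every row of `df` as a vertex of the given tag."""
    df.write.format(NEBULA_SOURCE).options(**vertex_write_options(**kwargs)).save()
```

For example: write_vertices(df, space="Relation", tag="Company", vid_column="keyno", meta_address="10.0.7.89:9559", graph_address="10.0.7.89:9669").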

@wey-gu
Contributor

wey-gu commented Aug 18, 2022


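👍 Ha, you found how to load the jar into the Spark session outside the pyspark shell.

I got reading working earlier starting from zero knowledge, and I haven't tried writing yet. In the English version of that post, though, I described how I found the way to call the reader and map the corresponding read config to option strings; switch the blog's language to English to take a look.

Replying from my phone is awkward, and I'm traveling for work these days 🤦‍♂️.

If that doesn't work out, I'll find time to look into it later.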

@Nicole00
Contributor


For writing with PySpark, see the README in Siwei's repo: https://github.com/wey-gu/nebula-up. It's a great repo with usage examples.

@Reid00
Contributor

Reid00 commented Aug 19, 2022

@Nicole00 Thanks for the reply. I just went through the repo but couldn't find how to call NebulaConnectionConfig (from com.vesoft.nebula.connector. in the jar) from a Python file 😥

wey-gu added a commit that referenced this issue Aug 23, 2022
@wey-gu
Contributor

wey-gu commented Aug 23, 2022

@Nicole00 的帮助下,例子可以跑通了,参考

#55

Nicole00 pushed a commit that referenced this issue Aug 24, 2022
* doc: pyspark write example

* Added pyshell calling lines and python file header

discussed in #50
Thanks to @Reid00

* Update README.md

wording

* Update README_CN.md

* Update README.md

* Update README_CN.md

* Update README.md

* Update README_CN.md
Nicole00 added a commit that referenced this issue Aug 31, 2022
* pyspark example added (#51)

* pyspark example added

* Update README_CN.md

* support delete related edges when delete vertex (#53)

* support delete related edges when delete vertex

* add test

* add example for delete vertex with edge (#54)

* doc: pyspark write example (#55)

* doc: pyspark write example

* Added pyshell calling lines and python file header

discussed in #50
Thanks to @Reid00

* Update README.md

wording

* Update README_CN.md

* Update README.md

* Update README_CN.md

* Update README.md

* Update README_CN.md

* spark2.2 reader initial commit

* spark2.2 reader initial commit

* extract common config for multi spark version

* delete common config files

* extract common config and utils

* remove common test

* spark connector reader for spark 2.2

* spark connector writer for spark 2.2

* revert example

* refactor spark version & close metaProvider after finish writing

* refactor common package name

* fix scan part

* refactor spark version for spark2.2

* connector writer for spark2.2

Co-authored-by: Wey Gu <weyl.gu@gmail.com>
@Nicole00
Contributor

@Reid00 Hi, has your problem been solved? This issue is about to be closed.

@Reid00
Contributor

Reid00 commented Nov 14, 2022

yes, thank you for your help @wey-gu @Nicole00 😀


4 participants