Can we have Node2Vec example #49

Closed
porscheme opened this issue May 24, 2022 · 13 comments · Fixed by #58
Labels
type/question Type: question about the product

Comments

@porscheme

As the subject says, can we have a Node2Vec example?

@wey-gu
Contributor

wey-gu commented May 25, 2022

I will look into this and come back with an example ;)

@wey-gu
Contributor

wey-gu commented May 29, 2022

Dear @porscheme

Today I found the bandwidth to run node2vec for you. Here is an example: https://gist.github.com/wey-gu/53e35bc2da571a919f4f0c248c5dd9fc

@sunkararp

Which Spark/Scala version should I use to run nebula-algorithm?

Can you provide a Spark 3.0.0 compatible version?

@wey-gu
Contributor

wey-gu commented Jun 16, 2022

> Which Spark/Scala version should I use to run nebula-algorithm?
>
> Can you provide a Spark 3.0.0 compatible version?

For now, only Spark 2.4.x is supported, as documented. Could you use 2.4.x first?

I noticed that nebula-exchange added Spark 3.0.0 support in vesoft-inc/nebula-exchange#41. The equivalent work is not yet planned for nebula-algorithm, but I have just created an issue for it.

@sunkararp

sunkararp commented Jun 16, 2022

We normally use Spark 3.2.1, but we downgraded to Spark 2.4.6 and Scala 2.11.12 for Nebula.
We are getting the error below:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
        at java.base/java.lang.ClassLoader.defineClass1(Native Method)
        at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
        at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
        at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
        at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
         $NebulaDataFrameReader.loadEdgesToDF(package.scala:146)
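
A NoClassDefFoundError means the JVM could not load the named class at runtime. org.apache.spark.sql.sources.v2.ReadSupport belongs to Spark 2.4's DataSource V2 API and, to my knowledge, no longer exists under that name in Spark 3.x, so one thing worth checking is whether a Spark 2.4.x jar that ships this class is actually on the runtime classpath. The following is a minimal diagnostic sketch in Python, not part of nebula-algorithm; the function name jars_containing and the directory you pass to it are illustrative assumptions.

```python
import zipfile
from pathlib import Path

# Class named in the NoClassDefFoundError above, written as a jar entry path.
MISSING_CLASS = "org/apache/spark/sql/sources/v2/ReadSupport.class"


def jars_containing(class_entry, jar_dir):
    """Return the sorted file names of jars under jar_dir that contain class_entry."""
    hits = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip truncated or non-zip files instead of aborting the scan.
            continue
    return sorted(hits)
```

Calling jars_containing(MISSING_CLASS, "/path/to/spark/jars") with the jar directory of the Spark installation in use (the path here is a placeholder) lists the jars that provide the class. An empty list suggests the jars in that directory do not match the Spark version the connector was built against.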

@sunkararp

sunkararp commented Jun 16, 2022

This is blocking us from making any progress; can you expedite Spark 3 support?

@wey-gu
Contributor

wey-gu commented Jun 16, 2022

> This is blocking us from making any progress; can you expedite Spark 3 support?

@Nicole00, could you help explain why java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport is raised with Spark 2.4.6 and Scala 2.11.12, please?

@wey-gu
Contributor

wey-gu commented Jun 16, 2022

@sunkararp, until @Nicole00 can look into it, maybe you could compare your setup with my nebula-up playground environment?

https://github.com/wey-gu/nebula-up/

On a machine with Docker, run:

curl -fsSL nebula-up.siwei.io/all-in-one.sh | bash -s -- v3 spark

This gives you NebulaGraph plus Spark 2.4. Then ~/.nebula-up/nebula-algo-pagerank-example.sh runs PageRank in the Spark container. You can enter that container with

docker exec -it spark_master_1 bash

and check how it differs from your 2.4.6 setup.

@sunkararp

sunkararp commented Jun 16, 2022

  • I was using Spark 2.4.6 and Scala 2.11.12.
  • I'm having trouble building the Scala project with sbt, as I couldn't find sbt support for Scala 2.11.12. This could be the issue, but I'm not sure.
  • It could be some version incompatibility; I'm not sure.

@sunkararp

sunkararp commented Jun 16, 2022

I'm finally able to run on Spark 2.4.6, Scala 2.11.12, and OpenJDK 64-Bit 1.8.0_252.

But now I'm getting a java.lang.NullPointerException.

Can you please look into this ASAP?

Below is my spark-submit command:

spark-submit \
  --master "spark://10.155.48.35:7077" \
  --conf spark.driver.extraClassPath=/home/jovyan/* \
  --conf spark.executor.extraClassPath=/home/jovyan/* \
  --conf spark.executor.instances=3 \
  --conf spark.executor.memory=16G \
  --conf spark.driver.maxResultSize=10G \
  --conf spark.driver.host=10.155.50.21 \
  --class com.vesoft.nebula.algorithm.Main \
  --packages "com.vesoft:nebula-spark-connector:3.0.0,org.apache.spark:spark-core_2.11:2.4.4,org.apache.spark:spark-sql_2.11:2.4.4,com.github.scopt:scopt_2.11:3.7.1,com.typesafe:config:1.4.0,org.apache.spark:spark-mllib_2.11:2.4.4" \
  --deploy-mode client \
  nebula-algorithm-3.0.0.jar -p dev.algorithm.conf

Below is my conf file:

{
  spark: {
    app: {
        name: My Graph Algorithm 1.0
        partitionNum:100
    }
    master:local
  }

  data: {
    source: nebula
    sink: nebula
    hasWeight: false
  }

  nebula: {
    read: {
        graphAddress: "10.0.195.64:9669"
        metaAddress: "10.0.213.158:9559"
        space: StudentCentral
        user:root
        pswd:nebula        
        labels: ["STUDENT_HAS_CLASS_TCODE"]
    }

    write:{
        graphAddress: "10.0.195.64:9669"
        metaAddress: "10.0.213.158:9559"
        user:root
        pswd:nebula
        space:StudentCentral
        tag:Student
        type:update
    }
  }  


  algorithm: {
    executeAlgo: node2vec
    node2vec: {
        maxIter: 10,
        lr: 0.025,
        dataNumPartition: 10,
        modelNumPartition: 10,
        dim: 10,
        window: 3,
        walkLength: 1,
        numWalks: 3,
        p: 1.0,
        q: 1.0,
        directed: true,
        degree: 30,
        embSeparate: ",",
        modelPath: "hdfs://namenode:9000/model"
    }
  }
}
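
For readers unfamiliar with the walk parameters in the node2vec block above, here is a small pure-Python sketch of the biased random walk from the original node2vec algorithm. It is not nebula-algorithm's code: the function names, the adjacency-dict input, and the convention that walk_length counts nodes (as in the reference implementation) are all assumptions made for illustration. The return parameter p scales the chance of stepping back to the previous node by 1/p, and the in-out parameter q scales the chance of moving to a node that is not adjacent to the previous node by 1/q.

```python
import random


def node2vec_walk(adj, start, walk_length, p, q, rng):
    """Return one biased walk of at most walk_length nodes, beginning at start."""
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbors = sorted(adj.get(cur, ()))
        if not neighbors:
            break  # dead end: no outgoing edges from the current node
        if len(walk) == 1:
            # First step has no previous node, so pick a neighbor uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj.get(prev, ()):
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, walk_length, p=1.0, q=1.0, seed=0):
    """Run num_walks passes over all nodes, starting one walk from each node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for node in sorted(adj):
            walks.append(node2vec_walk(adj, node, walk_length, p, q, rng))
    return walks
```

Under this node-counting convention, walk_length = 1 produces walks that contain only the start node, which leaves nothing for a context window to pair it with. Whether nebula-algorithm counts walkLength the same way is not stated in this thread, so treat that observation as something to verify rather than a diagnosis.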

@sunkararp

  • I was finally able to fix the java.lang.NullPointerException; the fix is explained here.
  • It works fine for smaller datasets.
  • The implementation wasn't using the Spark worker nodes. Is this a known issue?
  • For large data, we get a java.lang.OutOfMemoryError: GC overhead limit exceeded exception.
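
A possible explanation for the idle worker nodes, not confirmed anywhere in this thread: the conf file above sets master:local, while spark-submit passes --master "spark://10.155.48.35:7077". Per the Spark configuration documentation, properties set directly on the SparkConf take precedence over spark-submit flags, so if nebula-algorithm applies the conf value when it builds its session (an assumption), the job would run in local mode. The sketch below is a hypothetical pre-flight check in Python; conf_master and master_mismatch are illustrative names, and the regex only handles the simple `master: value` form used in the conf file above.

```python
import re


def conf_master(conf_text):
    """Return the value of the first `master` key in the conf text, or None."""
    match = re.search(
        r'^\s*master\s*[:=]\s*"?([^"\s,}]+)"?',
        conf_text,
        flags=re.MULTILINE,
    )
    return match.group(1) if match else None


def master_mismatch(conf_text, submit_master):
    """True if the conf file names a master that differs from the spark-submit one."""
    value = conf_master(conf_text)
    return value is not None and value != submit_master
```

If master_mismatch returns True for your conf text and the --master URL, aligning the two values is a cheap experiment before looking into memory tuning.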

@wey-gu
Contributor

wey-gu commented Jun 20, 2022

Great to see your exploration and results 👍🏻. Sorry I couldn't help you with them.

@sunkararp

  • This implementation works for small datasets.

  • For large datasets, it needs a huge amount of memory. Also, it doesn't use Spark's distributed capabilities.

  • Do you have any modifications for huge datasets? Any Pregel-based solutions?
