Link Prediction in Citation Networks

A set of methods and model evaluation metrics for predicting links in an academic citation network using Apache Spark and Scala.

Description

In this experimental study we develop and evaluate models for predicting links in an academic citation network, considering two different aspects:

1. Having partial knowledge of the existing network and some of its links, and trying to restore a portion of it that has been deliberately removed.
2. Having no information about the existing network and relying only on the information in the scientific papers to predict the structure of the whole network.
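
The first aspect can be illustrated with a minimal sketch in plain Scala (no Spark): hide a random fraction of the citation edges, then score a set of predicted edges against the hidden ones. The names `Edge`, `split`, `hiddenFraction` and `f1` are hypothetical and are not taken from the repository; the scoring shown is the F1 of the positive (edge exists) class only, which may differ from how the study computes it.

```scala
import scala.util.Random

object EdgeHoldout {
  // A directed citation edge: paper `from` cites paper `to` (integer IDs are an assumption).
  final case class Edge(from: Int, to: Int)

  /** Randomly splits the edges into (kept, hidden). `hiddenFraction` is an assumed parameter. */
  def split(edges: Seq[Edge], hiddenFraction: Double, seed: Long): (Seq[Edge], Seq[Edge]) = {
    require(hiddenFraction >= 0.0 && hiddenFraction <= 1.0, "fraction must be in [0, 1]")
    val shuffled = new Random(seed).shuffle(edges)
    val nHidden = math.round(edges.size * hiddenFraction).toInt
    val (hidden, kept) = shuffled.splitAt(nHidden)
    (kept, hidden)
  }

  /** F1 score of the predicted edges against the hidden (ground-truth) edges. */
  def f1(predicted: Set[Edge], hidden: Set[Edge]): Double = {
    val truePositives = (predicted intersect hidden).size.toDouble
    if (truePositives == 0.0) 0.0
    else {
      val precision = truePositives / predicted.size
      val recall = truePositives / hidden.size
      2 * precision * recall / (precision + recall)
    }
  }
}
```

A model would be trained on the kept edges only, and its predictions compared with the hidden ones, so the score reflects how well the removed portion is restored.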

For the first aspect we used supervised binary classification, specifically Logistic Regression, which performed well, with an F1 score close to 86% on the test set. For the second aspect we relied mainly on the Jaccard similarity computed via MinHash LSH over each paper’s abstract, which had been vectorized using TF-IDF.
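
To make the similarity measure concrete, here is a minimal plain-Scala sketch of exact Jaccard similarity and its MinHash estimate over token sets. It is not the repository's Spark pipeline: the tokenizer, the seeds and all function names are assumptions for illustration. MinHash treats each abstract as a set of terms, so only which terms are present matters here, not their TF-IDF weights.

```scala
import scala.util.hashing.MurmurHash3

object MinHashSketch {
  /** Naive tokenizer, standing in for a real text-processing pipeline. */
  def tokenize(text: String): Set[String] =
    text.toLowerCase.split("\\W+").filter(_.nonEmpty).toSet

  /** Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two token sets. */
  def jaccard(a: Set[String], b: Set[String]): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else (a intersect b).size.toDouble / (a union b).size

  /** MinHash signature: for each seed, the minimum seeded hash over the set's tokens. */
  def signature(tokens: Set[String], seeds: Seq[Int]): Vector[Int] = {
    require(tokens.nonEmpty, "MinHash is undefined for an empty set")
    seeds.map(seed => tokens.iterator.map(t => MurmurHash3.stringHash(t, seed)).min).toVector
  }

  /** Estimated Jaccard similarity: the fraction of signature positions that agree. */
  def estimate(sigA: Vector[Int], sigB: Vector[Int]): Double = {
    require(sigA.length == sigB.length, "signatures must have equal length")
    sigA.zip(sigB).count { case (x, y) => x == y }.toDouble / sigA.length
  }
}
```

With more seeds the estimate tends to move closer to the exact Jaccard value, at the cost of longer signatures. Pairs of papers whose estimated similarity exceeds some chosen threshold could then be proposed as candidate links.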

For more details, see the draft paper (Draft Paper - Barzokas - Link Prediction.pdf).

Prerequisites

Dataset

Our dataset contains 27,770 academic papers, each associated with the following information:

1. unique ID
2. publication year (between 1993 and 2003)
3. title
4. authors
5. name of journal
6. abstract

The dataset is located under src/main/resources.
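
As a rough sketch, one record with the six fields listed above could be modelled as a Scala case class. The field names and types are assumptions (the actual file layout and column types are not described here); `abstract` is a reserved word in Scala, hence `paperAbstract`.

```scala
object PaperRecord {
  /** One paper record; field names and types are illustrative, not taken from the repository. */
  final case class Paper(
      id: Int,                // unique ID (integer type is an assumption)
      year: Int,              // publication year, stated to lie between 1993 and 2003
      title: String,
      authors: Seq[String],   // assumes the author field has already been split into names
      journal: String,
      paperAbstract: String
  ) {
    require(year >= 1993 && year <= 2003, s"year $year is outside 1993-2003")
  }
}
```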

vbarzokas/apache-spark-link-prediction
