ruig2 edited this page Feb 12, 2019 · 28 revisions

Paper presentations

References to big data visualization

Massive data visualization analysis - analysis of current visualization techniques and main challenges for the future

Download link: https://ieeexplore.ieee.org/abstract/document/7975704

This paper lists a few big data viz tools including D3, Tableau and so on. Those tools introduced in the paper that we are not that familiar with are:

Though the above tools are claimed to be for big data, the data they handle is in fact still static and small compared to what we want to do.

For example, the above tools focus on the rendering part. The input could be the population of each country stored in an Excel file, from which a heat map or count map is generated. Our target, in contrast, is to analyze dynamic data: people are born and die every day, and we want to know the number of people born in each town each week.
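To make the "births per town per week" target concrete, here is a minimal sketch of the aggregation we have in mind. The town names, the dates and the `weekly_birth_counts` helper are all hypothetical; a real system would consume a stream of events rather than a small in-memory list.

```python
from collections import Counter
from datetime import date

# Hypothetical birth events: (town, date of birth).
births = [
    ("Springfield", date(2019, 2, 4)),
    ("Springfield", date(2019, 2, 6)),
    ("Shelbyville", date(2019, 2, 5)),
    ("Springfield", date(2019, 2, 12)),
]

def weekly_birth_counts(events):
    """Count births per (town, ISO year, ISO week)."""
    counts = Counter()
    for town, day in events:
        iso_year, iso_week, _ = day.isocalendar()
        counts[(town, iso_year, iso_week)] += 1
    return counts

counts = weekly_birth_counts(births)
```

Each new event only increments one counter, so the same idea could be applied incrementally as data arrives.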

Large-Scale Graph Visualization and Analytics

Ma, Kwan-Liu, and Chris W. Muelder. "Large-scale graph visualization and analytics." Computer 46.7 (2013): 39-46. Link: https://ieeexplore.ieee.org/abstract/document/6576786

This paper introduces a few key topics in the area of big graph visualization. People are working on new algorithms and developing new tools. However, we didn't find an integrated tool (ideally with a web UI) that renders big, dynamic graphs in real time.

Visual Analysis of Large Heterogeneous Social Networks by Semantic and Structural Abstraction

Summarized by Rui Guo.

Link: https://ieeexplore.ieee.org/abstract/document/1703364

This is the paper that proposes the OntoVis system. It focuses on heterogeneous networks, in which a node can be either a user or an organization. By drawing users and organizations in the same graph, we can understand the relationships between users, between organizations, and between users and organizations. A few layout algorithms for geometric data (i.e. nodes without a fixed location on the graph) are discussed in the paper.

We are not sure if we'd like to build such a heterogeneous system. The trade-off is that the more types of nodes a graph has, the more easily users get lost.
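As a rough illustration of what "heterogeneous" means for the data model, the sketch below tags each node with a type and counts edges by the pair of endpoint types. The node names, the two types and the `edge_type_counts` helper are hypothetical and are not taken from OntoVis.

```python
from collections import Counter

# Hypothetical node table: node id -> node type.
node_type = {
    "alice": "user",
    "bob": "user",
    "acme": "organization",
    "globex": "organization",
}

# Hypothetical undirected edges between node ids.
edges = [
    ("alice", "bob"),
    ("alice", "acme"),
    ("bob", "acme"),
    ("acme", "globex"),
]

def edge_type_counts(node_type, edges):
    """Count edges by the (sorted) pair of endpoint types."""
    counts = Counter()
    for u, v in edges:
        pair = tuple(sorted((node_type[u], node_type[v])))
        counts[pair] += 1
    return counts

relation_counts = edge_type_counts(node_type, edges)
```

The three keys of `relation_counts` correspond to the three kinds of relationship mentioned above: user-user, organization-organization and user-organization.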

Gephi : An Open Source Software for Exploring and Manipulating Networks

Link: https://gephi.org/publications/gephi-bastian-feb09.pdf Summarized by Rui Guo.

Gephi is an open-source project for visualizing graph data. This is the best graph visualization system I have found so far. Its introduction/demo video can be found at https://gephi.org/features/ .

Pros of this project are:

  1. It has useful features. As the video shows, users can merge or split nodes, run queries, do clustering and so on. Almost all the ideas I have come up with for graph visualization (except visualizing in 3D) can be found here.
  2. It's maintained by a startup company, and it's more stable than the research projects developed in universities.

Cons of this project are:

  1. It's client-based. You need to download and install it, which can be inconvenient and hard to scale because big data can be distributed across different machines.
  2. It still targets small data. The above paper was published in 2009, when a large network meant 20,000 nodes.
  3. It has not been very active recently. People are still working on it, but commits are not as frequent as in 2016 or before.

Interactive Visualizations demo from Oxford Internet Institute, University of Oxford

Link to the demo: http://oxfordinternetinstitute.github.io/InteractiveVis/network/# Summarized by Rui Guo.

This is a beautiful visualization demo. It was built in 2012 and uses sigma.js.

Pros:

  • Beautiful UI: 1) the colors of nodes and edges are soft yet easy to tell apart; 2) curved lines are used instead of straight lines.
  • It is interactive, and the latency is very low.
  • It uses sigma.js, which means it has the potential to use the GPU.

Cons:

  • The data is small and static. It would be great if we could show dynamic data and run queries on the graph (e.g. who followed me yesterday and who unfollowed me).
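The "who followed and who unfollowed me yesterday" query can be sketched as a difference between two snapshots of the follower set. The snapshot contents and the `follower_changes` helper are hypothetical; this is only meant to show the shape of the query, not how a real system would store the data.

```python
def follower_changes(before, after):
    """Return (new followers, lost followers) between two snapshots."""
    before, after = set(before), set(after)
    return after - before, before - after

# Hypothetical follower snapshots of one account on two consecutive days.
followers_day1 = {"ann", "bo", "cy"}
followers_day2 = {"bo", "cy", "di"}

followed, unfollowed = follower_changes(followers_day1, followers_day2)
```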

graphVizdb: A Scalable Platform for Interactive Large Graph Visualization

Summarized by Rui Guo on 2019-02-07.

One-sentence summary: This paper proposes a web-based tool to visualize big geometric data (100M nodes and 100M edges).

Pros:

  • Supports large data sets (100M nodes and 100M edges) through offline pre-processing, during which:
  1. layout algorithms assign canvas locations to nodes (e.g. a greedy algorithm puts the part of the graph with the largest number of edges in the center of the canvas);
  2. a few abstraction layers are computed (e.g. by compressing a part of the graph into a single node); the layers resemble Google Maps zoom levels: as you zoom in, you see the provinces of a country, then cities, towns and so on;
  3. B+ trees and R-trees index nodes and their locations on the canvas. In fact, the experimental results show that the database is not a bottleneck at all.
  • A few operators are supported: navigating the graph vertically (zooming in) and horizontally (panning) with the R-tree index, and searching the tags of nodes and edges with the B+ tree index.
  • Low latency when browsing the graph in the web client. The pre-processing may take an hour, but the online demo responds within seconds.
  • It shows a linear growth in the overall running time when querying a larger part of the same graph.
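To illustrate how a spatial index serves zooming and panning, the sketch below answers a viewport query over pre-computed node positions. Note that it uses a simple uniform grid as a stand-in for the R-tree described in the paper; the `GridIndex` class, its parameters and the sample coordinates are hypothetical.

```python
from collections import defaultdict

class GridIndex:
    """Uniform-grid spatial index over node positions (a stand-in for an R-tree)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> [(node_id, x, y), ...]

    def _cell(self, x, y):
        return int(x // self.cell_size), int(y // self.cell_size)

    def insert(self, node_id, x, y):
        self.cells[self._cell(x, y)].append((node_id, x, y))

    def window_query(self, x_min, y_min, x_max, y_max):
        """Return ids of nodes whose position lies inside the viewport rectangle."""
        col_min, row_min = self._cell(x_min, y_min)
        col_max, row_max = self._cell(x_max, y_max)
        hits = []
        for col in range(col_min, col_max + 1):
            for row in range(row_min, row_max + 1):
                for node_id, x, y in self.cells.get((col, row), []):
                    if x_min <= x <= x_max and y_min <= y <= y_max:
                        hits.append(node_id)
        return hits

# Positions would come from the offline layout step; these are made up.
index = GridIndex(cell_size=100)
index.insert("a", 10, 10)
index.insert("b", 150, 40)
index.insert("c", 900, 900)
visible = index.window_query(0, 0, 200, 200)
```

Only the cells overlapping the viewport are scanned, so the cost depends on the size of the visible region rather than on the whole graph.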

Cons:

  • The web client is the bottleneck. It seems to use only the CPU rather than the GPU. As the measurements in the paper show, rendering and data delivery account for most of the time. Once a powerful web front end is developed, we can handle 1) streaming data (e.g. new nodes and edges arrive every day, and we may want to do the pre-processing incrementally) and 2) advanced graph queries (e.g. users who followed and unfollowed me yesterday), which would tell a better story about the system architecture.
  • More advanced interactive operators could be supported. For example, users may want to manually merge a few nodes into one, as in the Gephi demo.
  • Every node has the same size and color. Although nodes and edges carry different text tags, they are not distinguishable when browsing the graph. How to render a single node also remains an open problem, because one node may have many fields.
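A manual "merge nodes" operator could look roughly like the sketch below, which contracts a group of nodes into one. The `merge_nodes` helper and the sample edge list are hypothetical, and edges are treated as ordered pairs; this is not how graphVizdb or Gephi implement it.

```python
def merge_nodes(edges, group, merged_id):
    """Contract every node in `group` into `merged_id`.

    Edges inside the group are dropped; duplicate edges are collapsed.
    """
    group = set(group)
    merged = set()
    for u, v in edges:
        u = merged_id if u in group else u
        v = merged_id if v in group else v
        if u != v:
            merged.add((u, v))
    return merged

# Hypothetical edge list; "a" and "b" are merged into a new node "ab".
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
coarse = merge_nodes(edges, group={"a", "b"}, merged_id="ab")
```

The same contraction, applied automatically to whole clusters, is one plausible way to build coarser abstraction layers of a graph.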

Visualizing large knowledge graphs: A performance analysis

Summarized by Rui Guo on Feb 11, 2019

This paper focuses on the data processing part of the graph visualization pipeline, and analyzes the performance of processing the data with Spark.

What we can learn from the paper:

  • We can use Spark and other big data frameworks to pre-process graph data, and the source code of the experiments in this paper is available.
  • A big graph can be handled by a few machines in a short time (the paper says "12 million vertices and 172 million edges can be executed in <25s").
  • The denser a graph is (the more edges it has), the slower the processing is. Luckily, real-world graphs are sparse (e.g. a friendship network on Facebook).
  • It collects references and tools for visualizing big data. This paper is a good starting point for getting to know the visualization field as a whole.
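To put "sparse" into numbers, the sketch below computes the density of a graph with the sizes quoted above. It assumes a directed graph without self-loops, which the quote does not state; `directed_density` is a hypothetical helper.

```python
def directed_density(num_vertices, num_edges):
    """Fraction of possible directed edges (no self-loops) that are present."""
    return num_edges / (num_vertices * (num_vertices - 1))

# Sizes quoted from the paper: 12 million vertices, 172 million edges.
density = directed_density(12_000_000, 172_000_000)
edges_per_vertex = 172_000_000 / 12_000_000
```

Under that assumption the density is on the order of one in a million, with roughly 14 edges per vertex on average.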

What we can do differently from this paper:

  • This paper works on the pre-processing part (e.g. PageRank and layout algorithms) for big graphs. We'd like to build an interactive and dynamic system, which means we need to focus on the rendering part that comes after the pre-processing stage.
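For reference, the kind of pre-processing mentioned above can be sketched with a small power-iteration PageRank. This is plain single-machine Python rather than the Spark code used in the paper; the `pagerank` function, its default parameters and the tiny example graph are assumptions for illustration only.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict: node -> list of nodes it links to."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical three-node graph.
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

Scores like these could drive node size or color in the rendering stage, which is the part we want to focus on.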