Related Work
- Nov 3, 2017 Rapid Sampling for Visualizations with Ordering Guarantees, VLDB 2015, http://www.vldb.org/pvldb/vol8/p521-kim.pdf
- Nov 3, 2017 A Fast Graphical JavaScript Library: deck.gl https://uber.github.io/deck.gl/#/ by Qiushi
- Oct 18, 2017 Introduce incremental visualization algorithms (I’ve Seen “Enough”: Incrementally Improving Visualizations to Support Rapid Decision Making) by Hao
- Oct 11, 2017 Introduce Hashedcubes paper (Hashedcubes: Simple, Low Memory, Real-Time Visual Exploration of Big Data) by Te-Yu
- Oct 6, 2017 Introduce Tableau paper (On Improving User Response Times in Tableau, SIGMOD 2015) by Taewoo
- Sep 29, 2017 Introduce Kylin by Jianfeng
Massive data visualization analysis - analysis of current visualization techniques and main challenges for the future
Download link: https://ieeexplore.ieee.org/abstract/document/7975704
This paper lists a few big data viz tools, including D3 and Tableau. The tools introduced in the paper that we are less familiar with are:
- Google Fusion Tables. A good introductory video: https://www.youtube.com/watch?v=5l7IyS3u4w8 . It will be discontinued soon (https://support.google.com/fusiontables/answer/9185417?hl=en).
- Quadrigram. A Spain-based data viz company, somewhat similar to Tableau. Its visualizations are interactive. Demos are at http://www.quadrigram.com/gallery/
- Datawrapper. Similar to Tableau and Quadrigram. Demos are at https://www.datawrapper.de/
Although the above tools are claimed to be for big data, the data they handle is still static and small compared to what we want to work with.
For example, these tools focus on the rendering part: the input could be the number of people in each country stored in an Excel file, from which a heat map or count map is generated. Our target, in contrast, is dynamic data: people are born and die every day, and we want to know the number of people born in each town each week.
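To make the difference concrete, here is a minimal sketch of the dynamic case: instead of rendering a precomputed table, counts are updated incrementally as birth events arrive. The class name, the (town, date) event shape, and bucketing by ISO week are all our own assumptions for illustration.

```python
from collections import Counter
from datetime import date


class WeeklyBirthCounter:
    """Hypothetical incremental counter of births per (town, ISO week).

    Each new birth event updates exactly one counter, so the weekly totals
    stay current without rescanning historical data.
    """

    def __init__(self) -> None:
        self.counts = Counter()

    def add_birth(self, town: str, day: date) -> None:
        iso_year, iso_week, _ = day.isocalendar()
        self.counts[(town, iso_year, iso_week)] += 1

    def births(self, town: str, iso_year: int, iso_week: int) -> int:
        # Counter returns 0 for a missing key without inserting it.
        return self.counts[(town, iso_year, iso_week)]


counter = WeeklyBirthCounter()
counter.add_birth("town_a", date(2019, 2, 4))   # ISO week 6 of 2019
counter.add_birth("town_a", date(2019, 2, 6))   # ISO week 6 of 2019
counter.add_birth("town_a", date(2019, 2, 11))  # ISO week 7 of 2019
print(counter.births("town_a", 2019, 6))  # 2
```

A real system would also have to handle deaths, late-arriving events, and storage across machines; this only shows the incremental-update idea.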
Ma, Kwan-Liu, and Chris W. Muelder. "Large-scale graph visualization and analytics." Computer 46.7 (2013): 39-46. Link: https://ieeexplore.ieee.org/abstract/document/6576786
This paper introduces a few key topics in the area of big graph visualization. People are working on new algorithms and developing new tools; however, we didn't find an integrated tool (ideally with a web UI) that renders real-time, dynamic big graphs.
Summarized by Rui Guo.
Link: https://ieeexplore.ieee.org/abstract/document/1703364
This paper proposes the OntoVis system. It focuses on heterogeneous networks, i.e. networks with more than one type of node; for example, a node can be either a user or an organization. By drawing users and organizations in the same graph, we can understand the relationships between users, between organizations, and between users and organizations. The paper also discusses a few layout algorithms for geometric data (i.e. nodes without a fixed location on the graph).
We are not sure whether we'd like to build such a heterogeneous system. One trade-off is that the more node types a graph has, the more easily users get lost.
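A heterogeneous network can be represented as a plain edge list plus a type for every node. The sketch below (toy data and the function name are hypothetical, not from OntoVis) counts edges by the pair of endpoint types, which separates user-user, organization-organization, and user-organization relationships.

```python
from collections import Counter

# Hypothetical toy heterogeneous network: every node has a type.
node_type = {
    "alice": "user",
    "bob": "user",
    "org_a": "organization",
    "org_b": "organization",
}

edges = [
    ("alice", "bob"),    # user-user
    ("alice", "org_a"),  # user-organization
    ("bob", "org_b"),    # user-organization
    ("org_a", "org_b"),  # organization-organization
]


def edge_type_counts(node_type, edges):
    """Count undirected edges by the (sorted) pair of endpoint types."""
    counts = Counter()
    for u, v in edges:
        pair = tuple(sorted((node_type[u], node_type[v])))
        counts[pair] += 1
    return counts


print(edge_type_counts(node_type, edges))
```

Each extra node type multiplies the number of possible type pairs, which is one way to see why more node types make a graph harder to read.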
Link: https://gephi.org/publications/gephi-bastian-feb09.pdf Summarized by Rui Guo.
Gephi is an open-source project for visualizing graph data. It is the best graph visualization system I have found so far. An introduction/demo video is available at https://gephi.org/features/ .
Pros of this project are:
- Useful features. As the video shows, users can merge or split nodes, run queries, do clustering, and so on. Almost every graph visualization idea I have come up with (except 3D visualization) can be found here.
- It is maintained by a startup company and is more stable than research projects developed at universities.
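Gephi's node merging is interactive, but the operation underneath can be sketched as node contraction on an undirected edge list. This is our own minimal version, not Gephi's implementation: `merge_nodes` is a hypothetical helper, and we assume every original edge has weight 1 and that parallel edges created by the merge are summed into a weight.

```python
from collections import Counter


def merge_nodes(edges, group, new_node):
    """Contract every node in `group` into the single node `new_node`.

    `edges` is an undirected edge list. Edges inside the group disappear
    (no self-loops), and parallel edges created by the merge are collapsed
    into one edge whose weight counts how many original edges it replaces.
    """
    weights = Counter()
    for u, v in edges:
        u2 = new_node if u in group else u
        v2 = new_node if v in group else v
        if u2 != v2:
            # Store undirected edges in a canonical (sorted) order.
            weights[tuple(sorted((u2, v2)))] += 1
    return dict(weights)


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
print(merge_nodes(edges, {"a", "b"}, "ab"))  # {('ab', 'c'): 2, ('c', 'd'): 1}
```

Splitting is the inverse and needs the original edges to be kept around, which is one reason an interactive tool would store the merge history rather than only the merged graph.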
Cons of this project are:
- It's a desktop client. You need to download and install it, which is inconvenient and hard to scale, because big data can be distributed across different machines.
- It still targets small data. The paper above was published in 2009, when a large network meant 20,000 nodes.
- It hasn't been very active recently. People are still working on it, but commits are less frequent than in 2016 or before.
Link to the demo: http://oxfordinternetinstitute.github.io/InteractiveVis/network/# Summarized by Rui Guo.
This is a beautiful visualization demo. It was built in 2012 and uses sigma.js.
Pros:
- Beautiful UI: 1) the colors of nodes and edges are soft and easy to tell apart; 2) edges are drawn as curved lines instead of straight lines.
- Interactive, with very low latency.
- It uses sigma.js, which can render with WebGL, so it can potentially use the GPU.
Cons:
- The data is small and static. It would be great if we could show dynamic data and run queries on the graph (e.g. who followed me yesterday and who unfollowed me).
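A query like "who followed me yesterday and who unfollowed" needs timestamped events rather than a static snapshot. Below is a minimal sketch over an in-memory event log; the `FollowEvent` schema, the "follow"/"unfollow" action strings, and the function name are hypothetical, and a real system would run this against an indexed store.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class FollowEvent:
    follower: str
    followee: str
    action: str  # "follow" or "unfollow" (hypothetical schema)
    at: datetime


def follow_changes(events, me, day):
    """Return (followed, unfollowed) sets of users for `me` on `day`.

    If a user both followed and unfollowed on the same day, their last
    action that day decides which set they end up in.
    """
    last_action = {}
    for e in sorted(events, key=lambda e: e.at):
        if e.followee == me and e.at.date() == day:
            last_action[e.follower] = e.action
    followed = {u for u, a in last_action.items() if a == "follow"}
    unfollowed = {u for u, a in last_action.items() if a == "unfollow"}
    return followed, unfollowed


events = [
    FollowEvent("bob", "me", "follow", datetime(2019, 2, 6, 9, 0)),
    FollowEvent("frank", "me", "follow", datetime(2019, 2, 6, 9, 30)),
    FollowEvent("carol", "me", "unfollow", datetime(2019, 2, 6, 12, 0)),
    FollowEvent("frank", "me", "unfollow", datetime(2019, 2, 6, 18, 0)),
    FollowEvent("dave", "me", "follow", datetime(2019, 2, 5, 23, 0)),     # the day before
    FollowEvent("erin", "alice", "follow", datetime(2019, 2, 6, 10, 0)),  # not about "me"
]
followed, unfollowed = follow_changes(events, "me", date(2019, 2, 6))
print(sorted(followed), sorted(unfollowed))  # ['bob'] ['carol', 'frank']
```

The "last action wins" rule is one design choice; another would compare the follower set at the start and end of the day, which would leave frank in neither set.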
Summarized by Rui Guo on 2019-02-07.
- Paper link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7498340
- Demo video: https://vimeo.com/117547871
- Online demo: http://graphvizdb.com/ . Access from the US is not very stable, maybe because the server is located in Europe.
One-sentence summary: This paper proposes a web-based tool to visualize big geometric data (100M nodes and 100M edges).
Pros:
- Supports large data sets (100M nodes and 100M edges) through offline preprocessing, during which:
  - locations on the canvas are assigned to nodes by layout algorithms (e.g. a greedy algorithm that puts the part of the graph with the largest number of edges in the center of the canvas);
  - a few abstraction layers are computed (e.g. by compressing a part of the graph into a single node). The layers are similar to Google Maps layers: as you zoom in, you see a country's provinces, then its towns and cities, and so on.
- B+ trees and R-trees are used to index nodes and their locations on the canvas. The experimental results show that the database is not a bottleneck at all.
- A few operators are supported: exploring the graph vertically (zooming in) and horizontally (moving the graph around) using the R-tree index, and searching node and edge tags using the B+ tree index.
- Low latency when browsing the graph on the web client. Preprocessing may take an hour, but the online demo responds in seconds.
- The overall running time grows linearly when querying a larger part of the same graph.
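The "move the graph around" operator boils down to a window query: return the nodes whose canvas positions fall inside the current viewport. The paper uses an R-tree for this; to keep the sketch short we swap in a much simpler uniform grid index, and we approximate one coarser abstraction layer by collapsing each grid cell into a single super-node at its centroid. `GridIndex`, `coarse_layer`, and the cell sizes are all hypothetical, not the paper's design.

```python
from collections import defaultdict
from math import floor


class GridIndex:
    """Uniform-grid spatial index, used here as a simple stand-in for an R-tree.

    Nodes are bucketed by the grid cell containing their (x, y) position,
    so a window query only scans the cells that overlap the viewport.
    """

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x: float, y: float):
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, node_id, x: float, y: float) -> None:
        self.cells[self._cell(x, y)].append((node_id, x, y))

    def window(self, x_min, y_min, x_max, y_max):
        """Return ids of nodes inside the viewport (bounds inclusive)."""
        cx0, cy0 = self._cell(x_min, y_min)
        cx1, cy1 = self._cell(x_max, y_max)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get avoids creating empty buckets in the defaultdict.
                for node_id, x, y in self.cells.get((cx, cy), ()):
                    if x_min <= x <= x_max and y_min <= y <= y_max:
                        hits.append(node_id)
        return hits


def coarse_layer(positions, cell_size):
    """Collapse all nodes in a grid cell into one super-node at their centroid.

    Returns {super_node_id: (x, y, member_count)}.
    """
    groups = defaultdict(list)
    for x, y in positions.values():
        groups[(floor(x / cell_size), floor(y / cell_size))].append((x, y))
    layer = {}
    for (cx, cy), pts in groups.items():
        n = len(pts)
        layer[f"cell_{cx}_{cy}"] = (
            sum(p[0] for p in pts) / n,
            sum(p[1] for p in pts) / n,
            n,
        )
    return layer


positions = {"a": (0.5, 0.5), "b": (1.5, 0.5), "c": (9.0, 9.0)}
index = GridIndex(cell_size=1.0)
for node_id, (x, y) in positions.items():
    index.insert(node_id, x, y)
print(index.window(0, 0, 2, 1))        # ['a', 'b']
print(coarse_layer(positions, 5.0))    # zoomed-out view: two super-nodes
```

A grid works well when nodes are spread evenly; an R-tree adapts to uneven density, which matters for real layouts where dense clusters sit next to empty space.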
Cons:
- The web client is the bottleneck. It seems to render with the CPU only rather than the GPU, and the reported statistics show that rendering and delivery time account for most of the total. Once a powerful web front end is developed, we can deal with 1) streaming data (e.g. new nodes and edges arriving every day, which we may want to preprocess progressively) and 2) advanced graph queries (e.g. users who followed and unfollowed me yesterday), and tell a better story about the system architecture.
- More advanced interactive operators could be supported. For example, users may want to merge a few nodes into one manually, as in the Gephi demo.
- All nodes have the same size and color. Although nodes and edges are marked with different text tags, they are hard to distinguish when browsing the graph. How to render a single node remains an open problem, because one node may have many fields.
Summarized by Rui Guo on Feb 11, 2019
This paper focuses on the data processing part of the graph visualization pipeline and analyzes the performance of processing the data with Spark.
What we can learn from the paper:
- We can use Spark and other big data frameworks to preprocess graph data, and the source code of the experiments in this paper is available.
- Big graphs can be processed by a few machines in a short time (the paper says "12 million vertices and 172 million edges can be executed in <25s").
- The denser a graph is (i.e. the more edges it has), the slower the processing. Luckily, real-world graphs tend to be sparse (e.g. a friendship network on Facebook).
- References and tools to visualize big data. This paper is a good starting point for getting to know the visualization field as a whole.
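To put "sparse" into numbers, take the paper's example of 12 million vertices and 172 million edges and assume (our assumption, the paper's graph may be directed) that it is a simple undirected graph. Its density, the fraction of all possible edges that actually exist, and its average degree are:

$$
\text{density} = \frac{2|E|}{|V|\,(|V|-1)} = \frac{2 \times 1.72\times10^{8}}{1.2\times10^{7}\,(1.2\times10^{7}-1)} \approx 2.4\times10^{-6},
\qquad
\bar{d} = \frac{2|E|}{|V|} = \frac{3.44\times10^{8}}{1.2\times10^{7}} \approx 28.7
$$

So each vertex touches about 29 others on average, while only about 2 in a million possible edges exist.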
What we can do differently from this paper:
- This paper works on the preprocessing part (e.g. PageRank and layout algorithms) for big graphs. We'd like to build an interactive and dynamic system, which means we need to focus on the rendering part after the preprocessing stage.
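For reference, PageRank, one of the preprocessing steps mentioned above, can be sketched on a single machine by power iteration. This is not the paper's Spark code; the function name, the damping factor 0.85, the fixed iteration count, and spreading the rank of dangling nodes (no out-edges) uniformly over all nodes are our own assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed edge list of (source, target).

    Dangling nodes spread their rank uniformly, so ranks always sum to 1.
    """
    nodes = sorted({n for e in edges for n in e})
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not out[u])
        # Teleport term plus the redistributed rank of dangling nodes.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
        rank = new
    return rank


print(pagerank([("b", "a"), ("c", "a"), ("a", "b")]))
```

In a distributed setting like Spark, each iteration becomes a join of ranks with the edge list followed by a reduce by target node; the per-iteration logic stays the same.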