Related Work
- Nov 3, 2017 Rapid Sampling for Visualizations with Ordering Guarantees, VLDB 2015, http://www.vldb.org/pvldb/vol8/p521-kim.pdf
- Nov 3, 2017 A fast graphical JavaScript library: deck.gl https://uber.github.io/deck.gl/#/ by Qiushi
- Oct 18, 2017 Introduce incremental visualization algorithms (I’ve Seen “Enough”: Incrementally Improving Visualizations to Support Rapid Decision Making) by Hao
- Oct 11, 2017 Introduce Hashedcubes paper (Hashedcubes: Simple, Low Memory, Real-Time Visual Exploration of Big Data) by Te-Yu
- Oct 6, 2017 Introduce Tableau paper (On Improving User Response Times in Tableau, SIGMOD 2015) by Taewoo
- Sep 29, 2017 Introduce Kylin by Jianfeng
Massive data visualization analysis - analysis of current visualization techniques and main challenges for the future
Download link: https://ieeexplore.ieee.org/abstract/document/7975704
This paper lists a few big data viz tools, including D3 and Tableau. The tools in the paper that we were not familiar with are:
- Google Fusion Tables. This is a good introduction video about it: https://www.youtube.com/watch?v=5l7IyS3u4w8 . It will be discontinued soon (https://support.google.com/fusiontables/answer/9185417?hl=en).
- Quadrigram. A Spain-based data viz company, somewhat similar to Tableau. Its visualizations are interactive. Demos are at http://www.quadrigram.com/gallery/
- Datawrapper. Similar to Tableau and Quadrigram. Demos are at https://www.datawrapper.de/
Although the above tools are claimed to be for big data, the data they handle is still static and small compared to what we want to do.
For example, these tools focus on the rendering part. The input could be the number of people in each country, stored in an Excel file, from which a heat map or count map is generated. Our target, in contrast, is to analyze dynamic data: people are born and die every day, and we want to know the number of people born in each town each week.
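A minimal sketch of the kind of dynamic aggregation described above, assuming a hypothetical event format of (town, birth date) and weeks defined as ISO calendar weeks. Each new event updates a running count in constant time instead of regenerating the whole map from a static file. The `BirthCounter` class and the town names are illustrative only, not part of any tool mentioned here.

```python
from collections import Counter
from datetime import date


def week_key(d: date) -> tuple[int, int]:
    """Return (ISO year, ISO week number) for a date."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


class BirthCounter:
    """Hypothetical running count of births per (town, ISO year, ISO week)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def add_birth(self, town: str, d: date) -> None:
        # O(1) update per event: historical data never needs to be rescanned
        self.counts[(town, *week_key(d))] += 1

    def births(self, town: str, iso_year: int, iso_week: int) -> int:
        # Counter returns 0 for keys that were never seen
        return self.counts[(town, iso_year, iso_week)]
```

In a real streaming setting the events would arrive from a queue and deaths would be handled the same way; the point is only that the per-week, per-town view is maintained incrementally.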
Ma, Kwan-Liu, and Chris W. Muelder. "Large-scale graph visualization and analytics." Computer 46.7 (2013): 39-46. Link: https://ieeexplore.ieee.org/abstract/document/6576786
This paper introduces a few key topics in big graph visualization. People are working on new algorithms and developing new tools; however, we didn't find an integrated tool (ideally with a web UI) that renders real-time, dynamic big graphs.
Summarized by Rui Guo.
Link: https://ieeexplore.ieee.org/abstract/document/1703364
This is the paper that proposes the OntoVis system. It focuses on heterogeneous networks, where a node can be either a user or an organization. By drawing users and organizations in the same graph, we can understand the relationships between users, between organizations, and between users and organizations. The paper also discusses a few layout algorithms for geometric data (i.e. nodes without a fixed location on the graph).
We are not sure whether we'd like to build such a heterogeneous system. The trade-off is that the more node types a graph has, the more easily users get lost.
Link: https://gephi.org/publications/gephi-bastian-feb09.pdf Summarized by Rui Guo.
Gephi is an open-source project for visualizing graph data. It is the best graph visualization system I have found so far. Its introduction/demo video can be found at https://gephi.org/features/ .
Pros of this project are:
- Useful features. As the video shows, users can merge or split nodes, run queries, do clustering, and so on. Almost every idea I have come up with for graph visualization (except 3D visualization) can be found here.
- It's maintained by a startup company and is more stable than research projects developed at universities.
Cons of this project are:
- It's client-based. You need to download and install it, which is inconvenient and hard to scale, because big data can be distributed across different machines.
- It still handles small data. The above paper was published in 2009, when a large network meant 20,000 nodes.
- It hasn't been very active recently. People are still working on it, but commits are not as frequent as in 2016 or before.
Link to the demo: http://oxfordinternetinstitute.github.io/InteractiveVis/network/# Summarized by Rui Guo.
This is a beautiful visualization demo. It was built in 2012 and uses sigma.js.
Pros:
- Beautiful UI: 1) the colors of nodes and edges are gentle and distinguishable; 2) edges are drawn as curves instead of straight lines.
- Interactive, with very low latency.
- It uses sigma.js, which means it has the potential to use the GPU.
Cons:
- It's small, static data. It would be great if we could show dynamic data and run queries on the graph (e.g. who followed me yesterday and who unfollowed).
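The "who followed / unfollowed me yesterday" query mentioned above can be sketched over a timestamped event log. This is a minimal illustration assuming a hypothetical log of `(timestamp, follower, action)` tuples, where `action` is `"follow"` or `"unfollow"` and every event targets the same user; the function name `follow_changes` is ours, not from the demo.

```python
from datetime import datetime


def follow_changes(events, start, end):
    """Return (followed, unfollowed) sets for the window [start, end).

    Each follower's state just before the window is compared with their
    state after their last event inside the window.
    """
    state_before = {}  # follower -> following? as of `start`
    state_after = {}   # follower -> following? as of `end`
    for ts, who, action in sorted(events):
        following = action == "follow"
        if ts < start:
            state_before[who] = following
            state_after[who] = following
        elif ts < end:
            state_after[who] = following
        # events at or after `end` are ignored
    followed = {w for w, f in state_after.items() if f and not state_before.get(w, False)}
    unfollowed = {w for w, f in state_after.items() if not f and state_before.get(w, False)}
    return followed, unfollowed
```

Someone who follows and then unfollows within the same day shows up in neither set, which is one possible definition; a UI might want to show such churn separately.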
Summarized by Rui Guo on 2019-02-07.
- Paper link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7498340
- Demo video: https://vimeo.com/117547871
- Online demo: http://graphvizdb.com/ . Not very stable to access from the US, maybe because the server is located in Europe.
One sentence summary: This paper proposes a web-based tool to visualize big geometric data (100M nodes and 100M edges).
Pros:
- Supports large data sets (100M nodes and 100M edges) through offline preprocessing. During preprocessing:
- locations on the canvas are assigned to nodes by layout algorithms (e.g. a greedy algorithm that puts the part of the graph with the largest number of edges in the center of the canvas);
- a few abstraction layers are computed (e.g. by compressing part of the graph into a single node). The layers are similar to Google Maps layers: as you zoom in, you see the provinces of a country, then towns, cities, and so on.
- B+ trees and R-trees are used to index nodes and their locations on the canvas. In fact, the experimental results show that the DB is not a bottleneck at all.
- A few operators are supported: exploring the graph vertically (zooming in) and horizontally (panning around) using the R-tree index, and searching the tags of nodes and edges using the B+ tree index.
- Low latency when exploring the graph in the web client. Preprocessing may take an hour, but the online demo responds in seconds.
- The overall running time grows linearly when querying a larger part of the same graph.
Cons:
- The web client is the bottleneck. It seems to use only the CPU rather than the GPU, and according to the reported statistics, rendering and delivery take most of the time. Once a powerful web frontend is developed, we can deal with 1) streaming data (e.g. new nodes and edges arriving every day, which we may want to preprocess progressively) and 2) advanced graph queries (e.g. users who followed and unfollowed me yesterday), which would tell a better story about the system architecture.
- More advanced interactive operators could be supported. For example, users may want to manually merge a few nodes into one, as in the Gephi demo.
- Every node has the same size and color. Although nodes and edges are labeled with different text tags, they are not visually distinguishable when exploring the graph. How to render a single node remains an open problem, because one node may have many fields.
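The pan/zoom operators above boil down to a window query: "which nodes fall inside this viewport rectangle?" graphvizdb answers it with an R-tree; the sketch below swaps in a simpler uniform grid index (not what the paper uses) to show the idea of avoiding a full scan. The class name `GridIndex` and the cell size are illustrative assumptions. Panning shifts the rectangle; zooming in shrinks it.

```python
import math
from collections import defaultdict


class GridIndex:
    """Uniform-grid spatial index over 2D node positions (stand-in for an R-tree)."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (cx, cy) -> [(node_id, x, y), ...]

    def _cell(self, x: float, y: float) -> tuple[int, int]:
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, node_id, x: float, y: float) -> None:
        self.cells[self._cell(x, y)].append((node_id, x, y))

    def window(self, xmin: float, ymin: float, xmax: float, ymax: float):
        """Return ids of nodes with xmin <= x <= xmax and ymin <= y <= ymax."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        # Only the cells overlapping the viewport are visited
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for node_id, x, y in self.cells.get((cx, cy), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(node_id)
        return hits
```

A grid works well when nodes are spread fairly evenly; an R-tree adapts to skewed layouts (e.g. a dense center produced by the greedy layout), which is presumably one reason graphvizdb uses it.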
Link of the paper: https://www.sciencedirect.com/science/article/pii/S0167739X17323610
Summarized by Rui Guo on Feb 11, 2019
This paper focuses on the data processing part of the graph visualization pipeline and analyzes its performance when the data is processed with Spark.
What we can learn from the paper:
- We can use Spark and other big data frameworks to preprocess graph data, and the source code of the paper's experiments is available.
- Big graphs can be handled by a few machines in a short time (the paper says "12 million vertices and 172 million edges can be executed in <25s").
- The denser a graph is (the more edges it has), the slower the processing. Luckily, real-world graphs are usually sparse (e.g. a friendship network on Facebook).
- References and tools for visualizing big data. This paper is a good starting point for getting to know the whole visualization world.
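To make "sparse" concrete, here is a worked calculation on the graph size quoted above, assuming it is treated as a simple directed graph with $|V| = 1.2 \times 10^{7}$ and $|E| = 1.72 \times 10^{8}$ (for an undirected graph the density formula has an extra factor of 2 in the numerator):

```latex
D = \frac{|E|}{|V|\,(|V|-1)}
  \approx \frac{1.72 \times 10^{8}}{(1.2 \times 10^{7})^{2}}
  = \frac{1.72 \times 10^{8}}{1.44 \times 10^{14}}
  \approx 1.2 \times 10^{-6},
\qquad
\frac{|E|}{|V|} = \frac{172}{12} \approx 14.3
```

So each vertex links to about 14 others on average, out of 12 million possible targets.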
What we can do differently from this paper:
- This paper works on the preprocessing part (e.g. PageRank and layout algorithms) for big graphs. We'd like to build an interactive and dynamic system, which means we need to focus on the rendering part after the preprocessing stage. Moreover, the input data may be dynamic rather than static (i.e. streaming data arriving every second), and the queries/interactions can be dynamic too (e.g. querying a sub-graph, selecting different labels).
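PageRank is one of the preprocessing steps named above. The paper runs it on Spark; the single-machine power-iteration sketch below is only meant to show what each iteration computes (a Spark version would express the same per-iteration step as joins and per-key sums over the edge list). The damping factor 0.85 and the fixed iteration count are conventional defaults, assumed here rather than taken from the paper.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list; returns {node: rank}.

    Rank held by dangling nodes (no out-edges) is spread uniformly over all
    nodes, so the ranks always sum to 1.
    """
    nodes = sorted({n for e in edges for n in e})
    n = len(nodes)
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out[v])
        # Teleport term plus the redistributed dangling mass
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each node splits its damped rank evenly across its out-edges
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for w in out[v]:
                    new[w] += share
        rank = new
    return rank
```

For a dynamic graph, one option is to warm-start from the previous ranks instead of the uniform vector, so a few iterations suffice after a small batch of new edges.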
Download link: https://link.springer.com/chapter/10.1007/978-3-030-00374-6_33 Summarized by Rui Guo on Feb 11, 2019
Similar to the above one (Visualizing large knowledge graphs: A performance analysis), though this one is shorter and doesn't include a performance analysis.
Download link: http://www.semantic-web-journal.net/system/files/swj1227.pdf Summarized by Rui Guo on Feb 14, 2019
It is from the same group as the graphvizdb project. This paper focuses on range queries, e.g. how many users are between 20 and 50 years old. This is a different way to visualize a graph: with charts and histograms rather than by drawing the graph itself on the screen. The paper defines a new tree data structure (maybe similar to a B+ tree?) to speed up these queries.
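The range query above ("how many users are between 20 and 50?") and the histograms built from it can be sketched with a sorted array and binary search. This is a swapped-in, simpler technique, not the paper's tree structure: it answers the same queries in O(log n) on static data, whereas a tree like the paper's (or a B+ tree) is what you would want for updates and disk-resident data. The class name `AgeIndex` is illustrative.

```python
from bisect import bisect_left, bisect_right


class AgeIndex:
    """Static sorted index for range counts and histograms over one attribute."""

    def __init__(self, values):
        self.values = sorted(values)  # built once, O(n log n)

    def count_between(self, lo, hi):
        """Number of values v with lo <= v <= hi, via two binary searches."""
        return bisect_right(self.values, hi) - bisect_left(self.values, lo)

    def histogram(self, edges):
        """Counts per half-open bin [edges[i], edges[i+1])."""
        return [bisect_left(self.values, b) - bisect_left(self.values, a)
                for a, b in zip(edges, edges[1:])]
```

Each histogram bar costs two binary searches, so redrawing a chart with different bin widths does not require rescanning the users.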
Download link: http://graphvis.com/pubs/ahmed-et-al-icwsm15.pdf Website: http://graphvis.com/ Summarized by Rui Guo on Feb 20, 2019
This is a web-based graph data visualization tool developed at Purdue University. However, there is no related video, and I need to apply for access to the online demo. I'll share more details if I am given access.
Pros:
- Web-based client, and interactive.
Cons:
- Deals with static data (?);
- The paper lists lots of fancy features, but it is easy to get lost among them. It might be better to demo a few key features on a specific dataset and show that it is a killer application rather than a huge combination of visualization algorithms.
Paper link: https://www.liebertpub.com/doi/pdf/10.1089/big.2015.0056
Demo link: https://imperialcollegelondon.app.box.com/v/bitcoinVis
Summarized by Rui Guo on Feb 21, 2019
This paper focuses on Bitcoin transaction visualization. Each transaction is represented by a few nodes (input accounts, output accounts) and edges (the transaction). They use SigmaJS for the web-based UI and ForceAtlas2 (reference: ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software) as the layout algorithm. Per our previous discussion, we would like to adopt the same technical tools to develop our system.
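Since we plan to adopt ForceAtlas2, here is a simplified sketch of its two core forces as described in the ForceAtlas2 paper: degree-weighted repulsion between every pair of nodes, k_r·(deg(a)+1)·(deg(b)+1)/d, and attraction along each edge proportional to the distance d. It deliberately omits the parts that make the real algorithm fast and stable (adaptive speed, gravity, Barnes-Hut approximation) and uses a fixed step with a per-node move cap instead; the function name and all parameter defaults are our own assumptions. In practice we would use an existing implementation such as Gephi's.

```python
import math
import random


def forceatlas2_like(edges, iterations=300, k_r=1.0, step=0.01, max_move=0.1, seed=0):
    """Simplified ForceAtlas2-style layout; returns {node: (x, y)}."""
    rng = random.Random(seed)
    nodes = sorted({n for e in edges for n in e})
    deg = {v: 0 for v in nodes}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    pos = {v: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for v in nodes}

    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # O(n^2) pairwise repulsion, pushing a and b apart along their connecting line
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = k_r * (deg[a] + 1) * (deg[b] + 1) / dist
                ux, uy = dx / dist, dy / dist
                force[a][0] += f * ux
                force[a][1] += f * uy
                force[b][0] -= f * ux
                force[b][1] -= f * uy
        # Linear attraction: magnitude equals distance, so the force vector is (dx, dy)
        for a, b in edges:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            force[a][0] += dx
            force[a][1] += dy
            force[b][0] -= dx
            force[b][1] -= dy
        # Fixed step size, with each node's move capped to keep the layout stable
        for v in nodes:
            fx, fy = force[v]
            norm = math.hypot(fx, fy)
            scale = step if norm * step <= max_move else max_move / norm
            pos[v][0] += scale * fx
            pos[v][1] += scale * fy
    return {v: (p[0], p[1]) for v, p in pos.items()}
```

For two connected nodes (degree 1 each), repulsion 4/d balances attraction d at d = 2, which is a quick sanity check of the forces. The O(n²) repulsion loop is exactly what Barnes-Hut replaces for large graphs.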
Pros of this paper:
- Very straightforward visualization, with pretty convincing examples that show the effectiveness of the system (e.g. a transaction rate attack that sends small amounts of money between a few accounts over and over again);
- Well-organized: it introduces how Bitcoin works and how the visualization system is built, and shares a few examples; it then tries to persuade the reader of the system's effectiveness with feedback from visitors to their group (unlike traditional database work, we cannot compare two visualization systems with benchmarks and experiments).
Future works of the paper:
- It focuses on a single block of transactions; it could be extended across multiple blocks to track the transactions of the same account.
Paper link: https://www.hindawi.com/journals/abi/2017/1278932/
Summarized by Rui Guo on Feb 23, 2019
This paper compares 4 popular graph viz tools. According to the paper, Gephi is the best for general visualization purposes, and Pajek-XXL is the best for scalability (it supports > 10 billion nodes).
I googled a little, and the homepages and video demos for those tools are:
Tulip (http://tulip.labri.fr/TulipDrupal/)
- geometric data: https://www.youtube.com/watch?v=Gs8FeatDccI
- Running on Google Maps: https://www.youtube.com/watch?v=PWBpmLtTFH8
Cytoscape (https://cytoscape.org/what_is_cytoscape.html):
- https://www.youtube.com/watch?v=iGpxX0Kd4Z0
- Web version: Cytoscape.js http://js.cytoscape.org/
Pajek (http://mrvar.fdv.uni-lj.si/pajek/):
Paper Link: https://arxiv.org/pdf/1601.08059.pdf
Summarized by Rui Guo
- 144 references, and lots of tools mentioned in the paper.
Challenges in big graph viz:
- Queries -> user-specific
- Sampling and filtering
- Aggregation (e.g. clustering)
- Offline preprocessing, done incrementally/progressively as new, dynamic data arrives
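The aggregation challenge above can be illustrated by collapsing each cluster into one super-node, in the same spirit as graphvizdb's abstraction layers. This sketch assumes the cluster assignment is already given (e.g. produced by some community detection algorithm, which is not shown) and treats edges as undirected; the function name `aggregate` is illustrative.

```python
from collections import Counter


def aggregate(edges, cluster_of):
    """Collapse each cluster into one super-node.

    Returns (sizes, super_edges): sizes maps cluster -> number of member nodes;
    super_edges maps a sorted (cluster_a, cluster_b) pair -> number of original
    edges between the two clusters. Edges inside a cluster are dropped.
    """
    sizes = Counter(cluster_of.values())
    super_edges = Counter()
    for a, b in edges:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca != cb:
            # Sort so that a-b and b-a land on the same undirected key
            super_edges[tuple(sorted((ca, cb)))] += 1
    return dict(sizes), dict(super_edges)
```

The sizes and edge counts can drive the rendering of the coarse layer (node radius, edge width), and expanding a super-node on zoom-in brings back its members.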
Book link: https://neo4j.com/graph-databases-book/
Summarized by Rui Guo on March 23, 2019
Chapter 6 of the book covers the internal design of Neo4j. Basically, instead of relying on a B+ tree, Neo4j stores relationships in linked lists. For example, all of Alice's friends are stored in a list, and once we have the head of the friend list, there is no need to query a B+ tree anymore. To speed up traversal of the list, every record within a given Neo4j store file has the same fixed length, so given a record id, the record's location can be calculated immediately by multiplying the record id by the fixed record length.
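A toy version of the two ideas above: fixed-length records whose byte offset is computed as `id * record_size`, and a per-node linked list of relationship records. The record layout here is a hypothetical illustration, NOT Neo4j's actual on-disk format: real Neo4j relationship records are larger and form doubly linked lists for both endpoints, while this sketch keeps a singly linked list for the source node only. A plain dict stands in for the node store's "first relationship" pointer.

```python
import struct

# Hypothetical record: (from_node, to_node, next_rel_id_of_from_node) as
# three little-endian signed 32-bit ints; -1 means "end of the chain".
REL_FORMAT = "<iii"
REL_SIZE = struct.calcsize(REL_FORMAT)  # 12 bytes, fixed for every record
NO_NEXT = -1


class RelationshipStore:
    def __init__(self) -> None:
        self.data = bytearray()
        self.first_rel = {}  # node id -> id of the head record of its chain

    def add(self, src: int, dst: int) -> int:
        rel_id = len(self.data) // REL_SIZE
        # Prepend to src's chain: the new record points at the old head
        nxt = self.first_rel.get(src, NO_NEXT)
        self.data += struct.pack(REL_FORMAT, src, dst, nxt)
        self.first_rel[src] = rel_id
        return rel_id

    def read(self, rel_id: int):
        # Fixed record length: the byte offset is computed, not looked up in an index
        return struct.unpack_from(REL_FORMAT, self.data, rel_id * REL_SIZE)

    def neighbours(self, node: int):
        # Follow the chain from its head; no B+ tree lookup is needed per hop
        rel_id = self.first_rel.get(node, NO_NEXT)
        while rel_id != NO_NEXT:
            _, dst, rel_id = self.read(rel_id)
            yield dst
```

For example, with Alice as node 0, calling `add(0, 1)`, `add(0, 2)` and `add(0, 3)` and then listing `neighbours(0)` walks three records, each found by a single multiplication.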
Currently, the bottleneck of our visualization system is the frontend and the network. In the near future we might run into backend performance issues, and Neo4j could be one of the solutions here.