In a previous post, Visualize DAG with Graphviz, I talked about how to use Graphviz to display a DAG. Graphviz is great, but it only generates static pictures such as SVG. For nodes with a large amount of data, it is hard to show all the data in a single picture.
With today’s JavaScript, it would be nice to show and hide node information dynamically. Rowland has created an internal repo called gexplorer, which runs on a Flask server with vis.js on the front end. The problem with this implementation is that it is hard to share the data: other people either need access to your server, have to spin up their own server, or can only receive a screenshot of the picture. I think it would be nice to have this kind of dynamic implementation in JupyterLab.
Last Friday (03/24/2023), a colleague, Tzintzuni Garcia from the bio team of GDC, gave a presentation about his attempt. He used Bokeh to build the visualization. Networks appear in Bokeh's example pictures, but I could not find a good tutorial on its official website about how to build a network with Bokeh. I think networks are not one of their main focuses. And in the presentation, the library did not handle the hierarchy of the DAG very well.
Recently I found a Python library called pyvis, which also uses vis.js. vis.js describes itself as:
A dynamic, browser based visualization library.
The library is designed to be easy to use, to handle large amounts of dynamic data, and to enable manipulation of and interaction with the data.
The library consists of the components DataSet, Timeline, Network, Graph2d and Graph3d.
It has a rich implementation for networks. Pyvis is mainly a Python wrapper for the network part of vis.js.
The GDC samples from the previous post look like this:
Code
Here is the code for this visualization with pyvis:
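A minimal sketch of what such a script might look like is below. It is not the original code: the node labels and edges are hypothetical sample data, and label_color is only a stand-in for Rowland's color algorithm, which is not reproduced here. It assumes pyvis is installed and uses the Network, add_node, add_edge and write_html calls from pyvis.

```python
import hashlib

# Hypothetical sample data: (node_id, label) pairs and (source, target) edges.
NODES = [(1, "case"), (2, "sample"), (3, "portion"), (4, "analyte"), (5, "aliquot")]
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5)]


def label_color(label: str) -> str:
    """Map a label to a stable hex color (a stand-in, not Rowland's algorithm)."""
    return "#" + hashlib.sha256(label.encode("utf-8")).hexdigest()[:6]


def build_network(nodes, edges, html_path="graph.html"):
    """Write a directed, hierarchical pyvis network to html_path."""
    # Imported here so label_color stays usable without pyvis installed.
    from pyvis.network import Network

    # layout=True and directed=True are not the defaults.
    net = Network(directed=True, layout=True)
    for node_id, label in nodes:
        # Nodes with the same label share a color.
        net.add_node(node_id, label=label, color=label_color(label))
    for source, target in edges:
        net.add_edge(source, target)
    net.write_html(html_path)
    return net
```

Calling build_network(NODES, EDGES) writes graph.html, which can then be displayed in the notebook.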
A few things to mention about this code:
As you might notice, I used the same color algorithm written by Rowland.
layout=True and directed=True are needed for a good hierarchical layout. They are not set by default.
display(HTML("graph.html")) is needed for Google Colab to display the chart. It is not necessary if you are using JupyterLab etc.
options = options.replace("hubsize", "directed") changes the sortMethod of the hierarchical layout from hubsize to directed, which works better for our DAG. I think hubsize might be more suitable for an undirected graph.
The layout algorithm in Graphviz is better than the one in vis.js. Our DAG in vis.js sometimes has edges crossing each other. Then you have to manually drag the nodes around to separate them.
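To make the hubsize to directed replacement concrete, here is a small sketch on a standalone JSON string. The options string below is a hypothetical example shaped like the layout section vis.js expects. In pyvis, as far as I know, the real string would come from net.options.to_json() and be handed back with net.set_options(...). The helper set_sort_method is my own name, not part of pyvis.

```python
import json

# Hypothetical options JSON with the default hierarchical sort method.
options = json.dumps(
    {"layout": {"hierarchical": {"enabled": True, "sortMethod": "hubsize"}}}
)

# The approach from this post: a plain string replacement.
replaced = options.replace("hubsize", "directed")


def set_sort_method(options_json: str, method: str = "directed") -> str:
    """Parse the options, set only layout.hierarchical.sortMethod, re-serialize."""
    parsed = json.loads(options_json)
    parsed["layout"]["hierarchical"]["sortMethod"] = method
    return json.dumps(parsed)


# A more targeted variant that cannot touch other occurrences of "hubsize".
targeted = set_sort_method(options)
```

Both variants end up with the same options here. The targeted one is just a bit safer if the string "hubsize" ever shows up somewhere else, for example in a node label.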
Examples for our live data
Example
The code runs on our JupyterLab server and fetches the data from our Postgres database. Due to the size of the data, I have limited max_depth and max_width to 10.
Code
The following is the code for the live data:
The thing to notice about this code is that, to get a good hierarchical layout, I rewrote the bfs (breadth-first search) method in our psqlgraph repo and set the level property for each node. This was before I figured out the hierarchical sortMethod property. But I kept this implementation so I can set the max_width for my graph.
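The idea of assigning a level per node while capping depth and width can be sketched as follows. This is not the psqlgraph implementation: it walks a plain dict of children (a hypothetical stand-in for edges fetched from the database), and the function name bfs_levels and the sample data are made up for illustration. Note that with BFS a node's level is its shortest distance from the root.

```python
from collections import deque


def bfs_levels(children, root, max_depth=10, max_width=10):
    """Return {node: level} for nodes reachable from root.

    At most max_width nodes are kept on each level below the root,
    and no node deeper than max_depth is visited.
    """
    levels = {root: 0}
    width = {0: 1}  # number of nodes kept on each level
    queue = deque([root])
    while queue:
        node = queue.popleft()
        depth = levels[node] + 1
        if depth > max_depth:
            continue
        for child in children.get(node, []):
            # Skip nodes already placed and levels that are already full.
            if child in levels or width.get(depth, 0) >= max_width:
                continue
            levels[child] = depth
            width[depth] = width.get(depth, 0) + 1
            queue.append(child)
    return levels


# Hypothetical sample data.
children = {
    "case": ["sample_1", "sample_2", "sample_3"],
    "sample_1": ["portion_1"],
    "portion_1": ["analyte_1"],
}
levels = bfs_levels(children, "case", max_depth=2, max_width=2)
```

Each level can then be passed along when adding the node, e.g. net.add_node(node, level=levels[node]), so that vis.js places the node on that row of the hierarchical layout.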