One of the most powerful ways we convey the results of data science is visualization, from simple Excel graphs to advanced displays like network diagrams and bespoke visuals. What most people outside the data science community don’t realize is just how much artistry goes into creating some of those visualizations, from the impact of color schemes on perception in geographic mapping to the layout algorithms and data filtering used in network visualizations. Given the rising use of networks to understand everything from social media to semantic graphs, just how much of an impact do our layout algorithms and filtering decisions have on the final images we see?
Network visualizations are at once beautiful and informative, helping us make sense of the macro through micro patterns in the vast connected ecosystems that define the world around us. Yet, like any form of data visualization, network visualization does not capture the sum total reality of our data so much as it constructs one possible reality.
When we think of scientific visualization, we tend to assume that the images we see present the one single “truth” of a dataset, without realizing that any given dataset can tell many different stories depending on the questions we ask of it and the filters we apply to answer those questions.
The myriad possible filters we apply to a graph to reduce its size and complexity, the layout algorithm that places the nodes in space, the clustering approaches like modularity optimization that group nodes by “similarity,” the definition of “similarity” that we hand to those clustering algorithms, the node sizing metrics like PageRank, the color scheme, and the random seeds used by many algorithms, which mean that each run can yield a very different image: all of these conspire to ensure that a single dataset can yield a nearly infinite number of possible visualizations.
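To make the role of the random seed concrete, here is a minimal sketch of a force-directed layout in plain Python. Everything in it is an assumption made for illustration: the six-node toy graph, the function name `spring_layout`, and the simplified Fruchterman-Reingold-style forces are not taken from any particular visualization tool. It simply shows that the same edge list, laid out with two different seeds, ends up with different coordinates.

```python
import math
import random

# Hypothetical toy graph: two triangles joined by a single bridge edge.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]


def spring_layout(edges, seed, iterations=200):
    """Tiny force-directed layout (simplified Fruchterman-Reingold style)."""
    rng = random.Random(seed)  # the seed fixes the random starting positions
    nodes = sorted({n for edge in edges for n in edge})
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # rough "ideal" edge length

    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes pushes each other apart.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force

        # Connected nodes pull each other together.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Move each node, capped by a "temperature" that cools over time.
        temp = 0.1 * (1 - step / iterations)
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            move = min(length, temp)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move

    return {n: (round(x, 3), round(y, 3)) for n, (x, y) in pos.items()}


layout_a = spring_layout(EDGES, seed=1)
layout_b = spring_layout(EDGES, seed=2)
print(layout_a != layout_b)  # same graph, different picture
```

Rerunning with the same seed reproduces the same coordinates, which is why recording the seed alongside the filters and layout parameters matters if an image needs to be regenerated later.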
How does this process play out in a real-world visualization task?
In April 2016 my open data GDELT Project began recording the list of hyperlinks found in the body of each worldwide online news article it monitors. Not all news articles contain links, but many link to external websites such as the homepages of organizations being mentioned in the article or other news outlets from which specific story elements were sourced. These external sources of information provide powerful insights into which websites each news outlet considers worthy of mentioning, in much the same way that the references in an academic paper offer insights into the works believed most relevant and reputable by each field.
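As a rough sketch of how per-article link lists like these can become a network, the snippet below collapses each article and each outbound link to its domain and counts how often one site links to another. The records, the `.example` domains, and the helper name `outlet_link_edges` are all hypothetical and stand in for real data; this is not the GDELT file format or its processing code.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical records: (article URL, outbound links found in its body).
ARTICLES = [
    ("https://news-a.example/story1",
     ["https://agency.example/report", "https://news-b.example/item"]),
    ("https://news-a.example/story2",
     ["https://agency.example/data"]),
    ("https://news-b.example/item", []),  # not every article contains links
]


def outlet_link_edges(articles):
    """Count outlet-to-site links, keyed by (source domain, target domain)."""
    edges = Counter()
    for article_url, links in articles:
        source = urlparse(article_url).netloc.lower()
        for link in links:
            target = urlparse(link).netloc.lower()
            # Skip malformed links and links back to the outlet itself.
            if target and target != source:
                edges[(source, target)] += 1
    return edges


EDGE_WEIGHTS = outlet_link_edges(ARTICLES)
for (source, target), weight in sorted(EDGE_WEIGHTS.items()):
    print(source, "->", target, weight)
```

Even this small step embeds choices that shape the eventual image: whether to collapse URLs to domains, whether to drop self-links, and whether to weight edges by raw link counts.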