Visualizing multidimensional data: a brief historical overview

The results of a MOEA search are presented as a set of multidimensional data points. In order to form useful conclusions from our results, we must have the ability to comprehend the multidimensional differences between results and effectively analyze and communicate them to decision makers.

Navigating through multiple dimensions is an inherently complex task for the human mind. We perceive the world in three dimensions, and thinking in higher dimensional space can be heavily taxing.  The difficulty of comprehending multidimensional data is compounded when one must display the data on a two dimensional surface such as a sheet of paper or a computer screen. The challenge of “flattening” data has persisted for centuries, and has plagued not only those who were concerned with gleaning scientific insights from data, but also artists and those seeking to accurately portray the physical world as perceived by the human eye.

For much of human history, even the greatest artists were unable to accurately express the three dimensional world in a two dimensional plane. Nicolo da Bologna’s 14th century work, The Marriage, fails to convey any sense of three dimensional space, giving the viewer the impression that the figures painted have been pressed against a pane of glass.

drums

Nicolo da Bologna’s The Marriage (1350s) is unable to convey any sense of depth to the viewer.

During the Italian Renaissance, artists rediscovered the mathematics of perspective, allowing them to break free of the constraints of their two dimensional canvas and convey realistic images that gave the illusion of a third dimension. Raphael’s The school of Athens masterfully uses perspective to imbue his painting with a sense of depth. Through clever exploitation of Euclidean geometry and the mechanics of the human eye, Raphael is able to use the same medium (paint on a two dimensional surface) to convey a much richer representation of his subjects than his Bolognese predecessor.

Sanzio_01

Raphael’s The School of Athens (1509-1511) is an example of a masterful use of perspective. The painting vividly depicts a three dimensional space.

In the twentieth century, artists began attempting to covey more than three dimensions in two dimensional paintings. Cubists such as Picasso, attempted to portray multiple viewpoints of the same image simultaneously, and futurists such as Umberto Boccioni attempted to depict motion and “Dynamism” in their paintings to convey time as a fourth dimension.

picasso_portrait_of_dora_marr_1938

Pablo Picasso’s Portrait of Dora Maar (1938), depicts a woman’s face from multiple viewpoints simultaneously

Dinamismo di un ciclista GM5

Umberto Boccioni’s Dynamism of a Cyclist (1913) attempts to portray a fourth dimension, time, through a sense of motion and energy in the painting. Can you tell this is supposed to be a cyclist, or did I go too far out there for a water programming blog?

Regardless of your views on the validity of modern art, as engineers and scientists we have to admit that in this area we share similar goals and challenges with these artists: to effectively convey multidimensional data in a two dimensional space. Unlike artists, whose objectives are to convey emotions, beauty or abstract ideas through their work, we in the STEM fields seek to gain specific insights from multidimensional data that will guide our actions or investigations.

A notable historical example of the effective use of clever visualization was English physician John Snow’s map of the London Cholera epidemic of 1854. Snow combined data of cholera mortality with patient home addresses  to map the locations of cholera deaths within the city.

John Snow's cholera map of Soho

John Snow’s map of the 1854 London Cholera Epidemic. Each black bar is proportional to the number of cholera deaths at a given residence. Pumps are depicted using black circles. One can clearly see that the cholera deaths are clustered around the pump on Broad Street (which I’ve circled in red).

The results of his analysis led Snow to conclude that a contaminated well was the likely source of the outbreak, a pioneering feat in the field of public health. Snow’s effective visualization not only provided him with insights into the nature of the problem, but also allowed him to effectively communicate his results to a general public who had previously been resistant to the idea of water borne disease.

In his insightful book Visual Explanations: Images and Quantities, Evidence and Narrative, Edward Tufte points to three strengths within John Snow’s use of data visualization in his analysis of the epidemic. First, Snow provided the appropriate context for his data. Rather than simply plotting a time series of cholera deaths, Snow placed those deaths within a new context, geographic location, which allowed him to make the connection to the contaminated pump. Second, Snow made quantitative comparisons within his data. As Tufte points out, a fundamental question when dealing with statistical analysis is “Compared with what?” It’s not sufficient to simply provide data about those who were struck with the disease, but also to explain why certain populations were not effected. By complimenting his data collection with extensive interviews of the local population, Snow was able to show that there were indeed people who escaped disease within the area of interest, but these people all got their water from other sources, which strengthened his argument that the pump was the source of the epidemic. Finally, Tufte insists that one must always consider alternative explanations than the one that seems apparent from the data visualization before drawing final conclusions. It is easy for one to make a slick but misleading visualization, and in order to maintain credibility as an analyst, one must always keep an open mind to alternative explanations. Snow took the utmost care in crafting and verifying his conclusion, and as a result his work stands as a shining example of the use of visualization to explore multidimensional data.

While Snow’s methodology is impressive, and Tufte’s observations about his work helpful, we cannot directly use his methodology to future evaluation of multidimensional data, because his map is only useful when evaluating data from the epidemic of 1854. There is a need for general tools that can be applied to multidimensional data to provide insights through visualizations. Enter the field of visual analytics. As defined by Daniel Keim “Visual analytics combines automated-analysis  techniques with interactive visualization for an effective understanding, reasoning and decision making on the basis of very large and complex data sets”. The field of visual analytics combines the disciplines of data analysis, data management, geo-spacial and temporal processes, spacial decision support, human-computer interaction and statistics. The goal of the field is to create flexible tools for visual analysis and data mining. Noted visualization expert Alfred Inselberg proposed  six criterion that successful visualization tools should have:

  1. Low representational complexity.
  2. Works for any number of dimensions
  3. Every variable is treated uniformly
  4. the displayed object can be recognized under progressive transformations (ie. rotation, translation, scaling, perspective)
  5. The display easily/intuitively conveys information on the properties of the N-dimensional object it represents
  6. The methodology is based on rigorous mathematical and algorithmic results

Using the above criteria, Inselberg developed the Parallel Coordinate plot. Parallel Coordinate plots transform multidimensional relationships into two dimensional patterns which are well suited for visual data mining.

Step 1

An example of a five dimensional data set plotted on a Parallel Coordinate plot. Each line represents a data point, while each axis represents the point’s value in each dimension.

As water resources analysts dealing with multiobjective problems, it is critical that we have the ability to comprehend and communicate the complexities of multidimensional data. By learning from historical data visualization examples and making use of cutting edge visual analytics, we can make this task much more manageable. Parallel coordinate plots are just one example of the many visualization tools that have been created in recent decades by the ever evolving field of visual analytics.  As computing power continues its rapid advancement, it is important that we as analysts continue to ask ourselves whether we can improve our ability to visualize and gain insights from complex multidimensional data sets.

 

Advertisements

3 thoughts on “Visualizing multidimensional data: a brief historical overview

  1. Pingback: Water Programming Blog Guide (3) – Water Programming: A Collaborative Research Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s