14 Mar 2016

Visualization Theory Tools and Resources

As a programmer, I was much more familiar with the RGB Color Model before writing this post. However, many color principles are more easily expressed through the HSV Color Model, so if you’re not familiar with the latter, take a minute to learn it. (It’s not difficult.)

Zooming in on Color

As we saw in my last post, color was one of the least accurate visual encodings that Cleveland and McGill measured. That means that viewers have the most difficulty gauging the exact values represented by changes in color saturation and hue. However, I’m sure you’ve seen a visualization that used color to great effect. Here’s an example that I really like from the New York Times.

New York Times Political Views Visualization

The use of color in this visualization is very effective at highlighting one year’s population’s political leanings over time by only giving one year’s line any color saturation (all others are a light gray). Additionally, the polical leanings of that highlighted line are double encoded by color hue and y position, which is very effective since in U.S. politics, it’s a strong convention to represent republicans with red and democrats with blue.

So, color is the least accurate encoding, yet we can find plenty of examples that use color to great effect. So which is it? Is color important or not? The answer, of course, is that it’s very important, but that it’s also very easy to misuse since it has certain limitations as a visual encoding.

Rules for Encoding Data with Color

Since it’s easy to misuse color as a visual encoding, it’d be helpful to have some guidelines to go by when making visualizations. Stephen Few does just that in his article Practical Rules for Using Color in Charts.

  • Rule 1: If you want different objects of the same color in a table or graph to look the same, make sure that the background—the color that surrounds them—is consistent.
  • Rule 2: If you want objects in a table or graph to be easily seen, use a background color that contrasts sufficiently with the object.
  • Rule 3: Use color only when needed to serve a particular communication goal.
  • Rule 4: Use different colors only when they correspond to differences of meaning in the data.
  • Rule 5: Use soft, natural colors to display most information and bright and/or dark colors to highlight information that requires greater attention.
  • Rule 6: When using color to encode a sequential range of quantitative values, stick with a single hue (or a small set of closely related hues) and vary intensity from pale colors for low values to increasingly darker and brighter colors for high values.
  • Rule 7: Non-data components of tables and graphs should be displayed just visibly enough to perform their role, but no more so, for excessive salience could cause them to distract attention from the data.
  • Rule 8: To guarantee that most people who are colorblind can distinguish groups of data that are color coded, avoid using a combination of red and green in the same display.
  • Rule 9: Avoid using visual effects in graphs.

not all information is created equal

Stephen Few also includes some other experts’ opinions in his paper, including Maureen Stone, who pointed out that “regarding color, people tend to think that more and brighter is better….[but] experts realize that color should be used meaningfully, not arbitrarily or gratuitously.” This sentiment is very much in line with Few’s rules number 3 and 4. Stephen Few goes on to say that “not all information is created equal. Sometimes information must be included that plays only a minor role in the message you’re trying to communicate.” So it’s important to use color carefully as a highlighting tool, and its overuse undermines its effectiveness.

When working with categorical data, color hues are useful encodings, but you should make sure the perceived brightness of all of your colors is the same. Otherwise, certain categories will pop out at the viewer. Keeping color palettes that you can always go back to is a great idea. Here are three palettes suggested by Few (including a high intensity, medium intensity, and low intensity palette).

Three Color Palette's from Stephen Few

Higher intensity colors can be used for highlighting data, and the pale palette can be used to de-emphasize data when necessary. Additionally, when using color on a small graphical component (like points or even lines), using more saturated colors makes them more easily distinguishable.

Maureen Stone goes more in depth to color theory here, boiling these rules down to 3 guidelines:

  • Assign color according to function
  • Use contrast to highlight, analogy to group
  • Control value contrast for legibility

Applying The Color Rules

In searching the internet for data visualization, I’ve come across my fair share of bad visualizations. The following is an example that needs a little touching up regarding its treatment of color. Before we get into that though, I want to emphasize that this visualization is far from the worst I’ve seen. It’s just that it breaks some of the above rules and can be improved with very minor color changes.

A table which uses color in an inefficient way

Applying some of Few’s rules to this table, we can make it a much more effective visualization. Applying Rule 1 and Rule 9, we can determine that we should remove the textured background. Actually, this background image is a heavily blurred picture of power balls. It doesn’t help the viewer contextualize the information any more than the title already does, and this unnecessary “ink” on the graphic is known as chartjunk and should be removed. Next, falling in line with Rule 2, Rule 3, and Rule 4, we don’t need to change the color hue of each alternating line in the table, and instead use color brightness (value) to distinguish alternating rows, which is a subtler way to go about it. Finally, following Rule 5 and Rule 7, the intensity of the header row’s color can be lowered while still maintaining its distinction. All of these changes result in the following table, which is much more pleasant to read even though the changes we made were very minor.

A reimagination of the previous table with its color usage corrected

Color Antipattern: Rainbow Color Scale

The color usage rules above include warning against two antipatters: using a red/green color schemes and using visual effects. There’s another one that’s important to bring up due to its prevalence, known as the rainbow color scale. This scale refers to using a range of hues to represent quantitative data, which is in direct conflict with Rule 6 above. It can be tempting to use a rainbow color scale to add flair to a visualization, but it’s actually harmful to the visualization to do so.

This blog post from goes into the dangers of a rainbow color scale with a heatmap example from a paper published in the Journal of the American Water Resources Association.

Rainbow Color Scale Heatmap of U.S. Evapotranspiration

In the blog post, the author points out that there appears to be a very large change in values right down the middle of the map (from orange-yellow to blue-green). However, if you look at the legend values, you will realize that these values are not that different, changing from the range 0.8-0.89 to 0.7-0.79. The reason for this apparent discrepancy, though, is the fact that the luminance changes drastically for those two adjacent value ranges, which can be observed by removing all color saturation from the legend.

Luminance Values of Rainbow Color Scale Heatmap Legend

You might think that this can be remedied by choosing colors of similar intensity, but even that’s not enough. In the color palettes we saw above from Stephen Few, each palette seemed to have the same brightness, but removing the saturation from those palettes, we can see that the luminance of these color swatches also varies within each palette. This is because human perception is not perfectly consistent across color hues, so luminance must be adjusted to make the viewer perceive the different hues as similar.

Luminance Values of Stephen Few's Color Palettes

The real problem is the usage of such a wide range of hues. In the words of Maureen Stone, “hue contrast is easy to overuse to the point of visual clutter.” This effect was measured by David Borland and Russel M. Taylor II in their paper, Rainbow Color Map (Still) Considered Harmful. They found the same problems with the rainbow color scale, saying “the rainbow color map confuses viewers through its lack of perceptual ordering, obscures data through its uncontrolled luminance variation, and actively misleads interpretation through the introduction of non-data-dependent gradients.”

hue contrast is easy to overuse to the point of visual clutter

Borland and Taylor compare the rainbow color scale to three other gradient scales: grayscale, black-body radiation (which mimics the color of a cannonball being heated), and an isoluminant gradient (meaning luminance is constant). While the latter three color scales have different strenghts and weaknesses that can get very complicated depending on other aspects of your data and the nature of your visualization, the rainbow color scale is less useful across the board for encoding quantitative data on a gradient scale and should always be avoided.

Rainbow Color Scale Comparison with Other Gradient Color Scales

Color Brewer

With so many color considerations, it can be an overwhelming to even know where to start. Luckily, there is a great online tool by Cynthia Brewer that many data visualizers use called ColorBrewer. All of the color palettes generated by this tool are very aesthetically appealing, and you can select from several options to get the best palette for your needs.

There are qualitative palettes for categorical data, sequential palettes for quantified data, and diverging palettes for quantified data that extends in two directions from some sensible origin point. Other options include palettes that are colorblind save and printer friendly. The example visualizations are all maps, but these palettes are useful for any type of visualization. If it’s absolutely necessary, you can achieve a continuous gradient for quantified data by interpolating the colors between the colors of any given palette, but always remember that viewer accuracy for differences on a color gradient is lowest of all of the visual encodings.

comments powered by Disqus