Charting a Course
October 25th, 2007
(Today’s post is about charts. Mainly, I’m going to talk about the production of charts, and it’ll benefit you the most if you’re looking to make some yourself. They might seem simple, but I think this post will increase your appreciation of the humble chart, too. Even if you’re not much into making them, this post should allow you to pick out some frequent flaws in charts you may see, an always-popular topic at dinner parties.)
Charts are one of the most important parts of data analysis. They form one half of the “dynamic duo” of reporting tools at your disposal, along with tables. Tables are good for specific facts and figures, while charts are more important for visualizing general trends and relationships.
As I’m sure most of you already know (and may be thinking to yourselves already) charts can often be pretty boring, or worse, downright useless. Yet, like data analysis itself, they don’t HAVE to be that way. As Edward Tufte (the preeminent chart-maker and yet another person listed in my sidebar) says, if your data or chart is boring, you have the wrong one/s. (And if this blog is boring, I must say that your reading resolve is admirable.)
So how do you make sure you have the right chart? As one might expect, Tufte’s books are an excellent place to start. He has put out four books so far, and they’re all excellent reads, both beautiful and informative. If you’ve never had the occasion to see a beautiful chart (yes, they do exist), Tufte’s books can be a real eye-opener. Mostly, they are full of vibrant color examples of charts (and other statistical graphics) either done right or done spectacularly wrong. Virtually all the charting books I’ve read take his work as a starting point, and if you read Tufte’s work you can probably see why. Collectively, his books are the closest thing the statistical visualization field has to a “Bible”, if you will.
Tufte is all about maximising what he calls the “data-ink ratio”: in other words, getting the most data across in the least amount of ink. (Thank goodness this blog doesn’t use ink. And those of you whispering that pixels count too can be quiet.) You want your charts to be as “high resolution” as possible, with a minimum of extraneous labels and other distractions, something Tufte bluntly refers to as “chartjunk”. Sadly, Tufte’s loathed chartjunk is everywhere. Despite the natural human tendency to add this stuff, you have to constantly be on guard against these bits of meaningless flash and fluff. If you get rid of the chartjunk, you should be communicating the maximal amount of information as quickly and succinctly as possible, which is the whole goal of a chart to begin with.
The strict limitation of color is an outgrowth of this idea, too, and is another one of Tufte’s core principles. He suggests that color should be left out of a chart by default and only included as necessary. The main thing is that color should do no harm, and if it does cause harm, it should be left out or muted, even at the expense of other concerns.
Another outgrowth of the chartjunk idea is the elimination of gridlines. Put simply, most gridlines, labels, and legends are superfluous. (Unless you’re a mapmaker. And not a pirate.) They should be ruthlessly deleted or retooled whenever possible. It’s a lot like the editing process for writing, actually. Get rid of all the stuff you don’t need. (Ask my brother Ed about how much fluff he cuts out of my posts!) (Ed. note: Would you call that “blogjunk”? I can think of a good number of blogs that seem to be made up of it entirely…)
Ideally then, when you get rid of all the chartjunk, the flashy colors, the gridlines, and the labels there should be one thing left, for the most part: data. Thus, you have Tufte’s simplest and most important principle: “Above all, show the data.” Hard to argue with that. (I smell a new catchphrase for this blog, in fact: “SHOW ME THE DATA!”) Tufte talks about other aspects of charting, but there’s no way I can write about the content of four entire books here. (Information-wise, anyway, so you can quit snickering.) If you’re interested in Tufte’s work, his best (and formative) work is “The Visual Display of Quantitative Information”. Despite the book’s enormously exciting title, (I definitely got a few comments while reading it) I encourage you to take a look for yourself and actually see what a vibrant, practical, and aesthetically pleasing book it is. In fact, I would even say that I consider it and his other three books works of art in their own right.
However, one important criticism some people have of Tufte’s work is that he doesn’t give a lot of tips on how to implement his suggestions. And it’s true, I have to admit. So to modify the religious metaphor a bit from earlier, I guess you could say I view his books as sort of a “Tao of Charting” instead, with less one-hand clapping. (And by the way, anyone who says you can’t do that is wrong. If you slap your fingers against your palm (of the same hand) quickly, it really does sorta sound like a clap. Yes, I do have too much free time on my hands. Or hand.)
By contrast, a great book to look at for concrete implementation is “Show Me the Numbers”, by Stephen Few. He did kind of rip off my catchphrase, but I won’t hold that against him - it’s the best handbook on charting I’ve seen, and I’ve read my share. (OK, Tom Cruise had the saying first, I admit it.) In fact, I’d recommend it to anyone who makes charts regularly. Few’s book is refreshingly specific, and I’ll try to give you a few of the basic points here, in which you will surely notice the “Tuftian” principles throughout:
First are the kinds of charts you should avoid, like 3D charts; data series frequently obscure each other in 3D charts and cross-comparisons are especially difficult. Not to mention that they are often visually distracting and needlessly flashy for the information they are attempting to convey. Despite this, you will often see 3D charts, as they are easy to create in spreadsheet programs (so you can blame Microsoft again here if you want). But you should avoid them - keep them 2D.
Pie charts are just as bad. Pie charts measure quantities by area, but people don’t seem to understand that if you double the diameter of a circle, you quadruple the area (A = πr²). That’s probably why pizzas are often sold as “small, medium, and large” instead of by diameter. Try it for yourself if you’re curious. If you try and guess the area of a random circle, you’ll probably overestimate the area for small circles and underestimate the area for large circles, making them poor intuitive measures for many graphs. (I guess this means that if you want a nice surprise you should order large pizzas. Unless you’re on a diet.)
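If you’d rather check the diameter-versus-area arithmetic than take my word for it, here’s a tiny Python sketch. The pizza sizes are made up purely for illustration; the only real content is the area formula.

```python
import math


def circle_area(diameter):
    """Area of a circle from its diameter: A = pi * (d / 2) ** 2."""
    return math.pi * (diameter / 2) ** 2


# Hypothetical pizza diameters in inches, chosen only for illustration.
small, large = 8, 16

area_ratio = circle_area(large) / circle_area(small)
print(f"Diameter ratio: {large / small:.1f}x")  # Diameter ratio: 2.0x
print(f"Area ratio: {area_ratio:.1f}x")         # Area ratio: 4.0x
```

Twice as wide, four times the pizza, which is exactly the jump our eyes tend to misjudge.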
So, seeing magnitude and cross comparisons on pie charts is hard, if not impossible. You’re better off using a standard bar chart - or, rarely, a stacked bar chart - instead. Again, you’ll see plenty of pie charts around too (being just as easy to create), so resist the temptation. (There even seems to be a fair amount of 3D pie charts. You can guess what I think about those.)
What should you do, then? Your main tools are horizontal/vertical bar charts (bars running across or down the page), line charts (dots connected in a data series), and x-y scatter plots (both the x and y axes are quantitative scales, with only dots on the graph). Almost any chart you need to make should be covered by those basic three. (Lots of examples of the different chart types are here, for reference.)
As for charting elements, make your charts in greyscale unless you can’t distinguish the data series. In that case, try to choose colors that are muted but high contrast, and clearly distinct from each other (not an easy balancing act, to be sure, but worth doing). A color wheel is your friend here. Try to choose colors that are on opposite ends of the color wheel, or at least far enough apart to be distinct. In general, these colors fit the bill: red, green, yellow, blue, black, white, pink, cyan, gray, orange, brown, and purple. (I’m sure that the similarity to karate belt colors is no accident.) Try to save white as your background contrast color, and black, pink, red, and yellow for your highlighting colors. As Tufte suggests, lighter, paler versions of these colors are soothing and easier to look at, so keep them muted if possible.
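For the programmatically inclined, the color wheel idea can be roughed out in a few lines of Python using the standard `colorsys` module. This is only a sketch under my own assumptions: the function name `muted_palette` and the default saturation and brightness values are hypothetical picks, not numbers from Tufte or Few, and you’d still want to eyeball the results for contrast.

```python
import colorsys


def muted_palette(n, saturation=0.45, value=0.80):
    """Return n hex colors spaced evenly around the hue wheel.

    Even hue spacing keeps the colors as far apart on the wheel as
    possible; the lowered saturation keeps them muted. The defaults
    are assumptions for illustration, not recommended values.
    """
    colors = []
    for i in range(n):
        hue = i / n  # fraction of the way around the wheel
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(
                round(r * 255), round(g * 255), round(b * 255)
            )
        )
    return colors


# Four series: hues a quarter-turn apart on the wheel.
print(muted_palette(4))
```

Raising `saturation` gives you louder colors for highlighting; lowering it pushes everything toward grey.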
If you use Excel, you’ll have to do a lot of tweaking to get colors that meet these criteria. My Excel suggestions are: aqua, brown, grey-25%, lavender, light orange, and lime. (The grey-25% is a custom color. Go to the tools –> options –> color –> modify and select the specific grey to use it. You can also get an array of other colors this way.) If you need more colors, try using custom colors with the help of the color wheel. Few made all the charts in his book with Excel, so you should be able to work out something.
Finally, eliminate most gridlines and borders (except to distinguish charts from other text on the page), mute axes, and eliminate most legends and titles. (Let your inner pirate cartographer out.) Add in skinny, light grey (25%-40%) gridlines if you really need them, and only use them in one direction (horizontal or vertical) if you can. Make most axes light grey (25%-40%), with a minimum of axis labels and tick marks, which should face outward. Directly label and annotate the data when you can, instead of having a legend. If you must have a legend, keep it as near to the data as possible, usually centered across the top. Finally, make the chart title part of the graphing area when you can. All of this should make your chart easier to look at and reduce searching around the page.
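To make that checklist a little more concrete, here’s a rough Python sketch that builds a bare-bones bar chart as SVG markup by hand, using only the standard library. Everything specific in it is my own assumption for illustration: the function name `bar_chart_svg`, the hex values standing in for “light grey”, the layout numbers, and the sales figures. It isn’t how Few (or Excel) does it, just one way to see the principles in one place: greyscale bars, thin horizontal gridlines only, direct labels instead of a legend, and the title inside the chart area.

```python
from xml.sax.saxutils import escape

LIGHT_GREY = "#bfbfbf"  # assumed stand-in for a roughly 25% grey
BAR_GREY = "#666666"    # assumed darker grey for the data itself


def bar_chart_svg(title, data, width=360, height=200):
    """Return SVG markup for a plain bar chart.

    data is a list of (label, value) pairs with positive values.
    """
    margin, top, bottom = 20, 34, 24
    plot_h = height - top - bottom
    base_y = top + plot_h
    peak = max(value for _, value in data)
    slot = (width - 2 * margin) / len(data)

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" font-family="sans-serif" font-size="11">',
        # The title lives inside the chart area; there is no border.
        f'<text x="{margin}" y="14" font-size="13">{escape(title)}</text>',
    ]
    # Thin, light gridlines in one direction only (horizontal).
    for frac in (0.25, 0.5, 0.75, 1.0):
        y = base_y - plot_h * frac
        parts.append(
            f'<line x1="{margin}" x2="{width - margin}" y1="{y:.1f}" '
            f'y2="{y:.1f}" stroke="{LIGHT_GREY}" stroke-width="0.5"/>'
        )
    for i, (label, value) in enumerate(data):
        bar_h = plot_h * value / peak
        x = margin + i * slot + slot * 0.2
        mid = x + slot * 0.3  # horizontal center of the bar
        parts.append(
            f'<rect x="{x:.1f}" y="{base_y - bar_h:.1f}" '
            f'width="{slot * 0.6:.1f}" height="{bar_h:.1f}" '
            f'fill="{BAR_GREY}"/>'
        )
        # Label the data directly instead of relying on a legend.
        parts.append(
            f'<text x="{mid:.1f}" y="{base_y - bar_h - 4:.1f}" '
            f'text-anchor="middle">{value}</text>'
        )
        parts.append(
            f'<text x="{mid:.1f}" y="{base_y + 14}" '
            f'text-anchor="middle">{escape(label)}</text>'
        )
    # One muted baseline stands in for the axis.
    parts.append(
        f'<line x1="{margin}" x2="{width - margin}" y1="{base_y}" '
        f'y2="{base_y}" stroke="{LIGHT_GREY}"/>'
    )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical figures, for illustration only.
svg = bar_chart_svg(
    "Units sold", [("North", 42), ("South", 30), ("East & West", 55)]
)
print(svg)
```

Save the printed output to a file ending in .svg and open it in a browser to see the result. Notice how little is left once the legend, border, and vertical gridlines are gone.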
All this is a bit much “Tell” without enough “Show”, though. (And I’m sure you remember how lame that was in elementary school.) Regular reader(s) (hi Mom!) may have noticed that there haven’t been any charts on this blog so far. That’s because I think charts are best used sparingly. Still, I wouldn’t want to leave you hanging without any examples, so here are a couple of before-and-after pictures of charts I redid for work:
1st Chart:
2nd Chart:
And here’s a good recent charting example from The Economist. It combines a statistical graphic (the map) with a chart.
From the general to the specific, that’s my quick (no snickering!) primer on charting. Note that I left out a lot of charting concerns and finer details in my post; to get the full story, feel free to read the two books I mentioned.
In summary, if you make charts, I hope you enjoyed this post. If you read and use charts, I hope you found it interesting. (And if you’re still bored, then I think you probably deserve a medal at this point.)
October 27th, 2007 at 8:57 am
Good post, good references. Tufte is always a sure bet. I disagree with your view on pie charts. There are some cases where a pie chart is the best chart, no matter what Tufte says (I just discussed this in my blog).
I also don’t agree with that “important criticism” regarding lack of implementation tips by Tufte. We need general, theoretical principles that show us the way, not another “tips and tricks”.
Jorge, I think your commentary is insightful and interesting. I just read your blog post and some of your references. In light of your post, I sat here for a minute and tried to think of places where pie charts would be useful. Perhaps in a situation where you had a very specific audience you knew well, with say six or fewer items on your chart, it would be a good place to use a pie chart, as long as you didn’t need anything more specific than a percentage and exact cross comparisons were not important. Maybe you could use a pie chart as a very quick “get to know your data” kind of thing.
Still, that’s my own limited imagination. In what situations would you use a pie chart? And why?
As for Tufte, I’d say “criticism” is a bit of a misnomer anyway. All I meant is that Tufte is great for developing your intuitions and for rare or unusual situations. If you need a concrete guide on how to start charting, though, you may need to look for other supplements to his ideas. That’s what I like about Few’s book - if you’re a charting novice, his exercises and specific advice can be extremely helpful. I’d say both Tufte and Few are useful in their own ways. I try and incorporate them both in a “yin and yang” kind of relationship.
I am not immune to concerns about beauty and emotional intelligence, as you mention in your blog. People have to both understand something (IQ) and be motivated to understand it (EQ, or emotional intelligence) to read it. My concern is that, unless you know your audience well, adding EQ touches to your charts may help some and turn off others. I would suggest making the site itself, reports that accompany the chart, or other aspects more appealing rather than making the chart itself more appealing.
Thanks again for your comment Jorge. I think your ideas are an excellent counterpoint to people like Tufte and Few (as well as people like William Cleveland). I downloaded your dashboard, and I was going to include it in one of my posts, but I don’t think I’ve found a way to work it in yet. It’s pretty cool though. It really is amazing what you can do in Excel.
- Dave
November 2nd, 2007 at 12:29 am
[…] surely getting used to my tendency towards that by now) but I got to use a lot of what I know about charts and tables to make the spreadsheet and it was a good opportunity to […]
March 21st, 2008 at 12:31 am
[…] see the basis for my data visualization comments below, check out my previous post on the subject here (which draws largely on the work of Edward Tufte and Steven […]