‘What is to be sought in designs for the display of information is the clear portrayal of complexity. Not the complication of the simple’ – Edward Tufte
It turns out that we feel dissatisfied looking at one PowerPoint chart after another because the charts are invariably bad. Occasionally we see something good – a clean, simple graphic that lets the data speak for itself. Edward Tufte wrote The Visual Display of Quantitative Information to explain why and show us how.
The book itself is a work of art: hardcover, beautifully typeset, and full of examples that illustrate and inspire. This summary is no substitute for the real thing. Get the book. Also, this post won’t make much sense unless you have read the book first.
Tufte begins with two chapters that address the principles of graphical excellence and graphical integrity, using excellent examples to make his points.
‘Data graphics visually display measured quantities by means of the combined use of point, line, a coordinate system, numbers, symbols, words, shading, and color.’ ‘At their best, graphics are instruments for reasoning about quantitative information.’
Principles of Graphical Excellence
- Graphical excellence is the well-designed presentation of interesting data – a matter of substance, of statistics, and of design.
- Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency.
- Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
- Graphical excellence is nearly always multivariate.
- And graphical excellence requires telling the truth about the data.
Principles of Graphical Integrity
- The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.
- Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.
- Show data variation, not design variation.
- In time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units.
- The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
- Graphics must not quote data out of context.
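The principle about deflated monetary units can be made concrete with a small calculation. The sketch below assumes a price index with a base value of 100; the function name `deflate` and all of the figures are hypothetical, chosen only to show how nominal growth can vanish once inflation is removed.

```python
def deflate(nominal, price_index, base_index=100.0):
    """Convert nominal amounts into constant (base-period) monetary units."""
    return [amount * base_index / index
            for amount, index in zip(nominal, price_index)]


# Hypothetical yearly spending and a hypothetical price index (base = 100).
nominal_spend = [100.0, 110.0, 121.0]
price_index = [100.0, 110.0, 121.0]

real_spend = deflate(nominal_spend, price_index)
print(real_spend)  # [100.0, 100.0, 100.0]
```

Plotted in nominal units this series appears to grow 10% a year; in deflated units it is flat.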
‘Tables usually outperform graphics in reporting on small data sets of 20 numbers or less. The special power of graphics comes in the display of large data sets’. p56
‘Lie Factor (LF) = size of effect shown in graphic / size of effect in data.’ p57
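As a rough illustration of the Lie Factor, the sketch below assumes that "size of effect" is measured as the relative change between two values. The function names and the numbers are hypothetical and are not taken from the book.

```python
def effect_size(first, second):
    """Relative change from the first value to the second (assumed measure)."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return (effect_size(graphic_first, graphic_second)
            / effect_size(data_first, data_second))


# Hypothetical chart: the data rises from 100 to 150 (a 50% increase),
# but the bars are drawn 1.0 cm and 3.0 cm tall (a 200% increase).
print(lie_factor(1.0, 3.0, 100.0, 150.0))  # 4.0
```

A value of 1.0 means the graphic is proportional to the data; the further the value drifts from 1.0, the greater the distortion.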
Tufte moves on to a theory of data graphics.
Theory of Data Graphics
Tufte outlines five principles in the theory of data graphics that ‘produce substantial changes in graphical design’:
- Above all else show the data.
- Maximize the data-ink ratio.
- Erase non-data-ink.
- Erase redundant data-ink.
- Revise and edit.
Data-ink ratio (p93) =
- data-ink / total ink used to print the graphic
- proportion of a graphic’s ink devoted to the non-redundant display of data-information
- 1.0 – proportion of a graphic that can be erased without loss of data-information
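The three definitions above describe the same quantity, which a short sketch can show. It assumes ink can be measured in some consistent unit, such as printed pixels; the function names and the pixel counts are hypothetical.

```python
def data_ink_ratio(data_ink, total_ink):
    """Share of a graphic's ink that displays non-redundant data."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


def erasable_proportion(data_ink, total_ink):
    """Share of the graphic that could be erased without losing data."""
    return 1.0 - data_ink_ratio(data_ink, total_ink)


# Hypothetical chart: 300 of 1,200 inked pixels carry data.
print(data_ink_ratio(300, 1200))       # 0.25
print(erasable_proportion(300, 1200))  # 0.75
```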
‘Redundant data-ink depicts the same number over and over. The labeled, shaded bar of the bar chart, for example, unambiguously locates the altitude in six separate ways (any five of the six can be erased and the sixth will still indicate the height);’ p96
Tufte implores us to ‘forgo chartjunk’, including:
- moire vibration,
- the grid and
- the duck.
‘Contemporary optical art relies on moire effects, in which the design interacts with the physiological tremor of the eye to produce the distracting appearance of vibration and movement.’ p107
‘This moire vibration, probably the most common form of graphical clutter, is inevitably bad art and bad data graphics.’ p108
‘Moire effects have proliferated with computer graphics (in programs such as Excel).’ p111
‘… moire vibration is an undisciplined ambiguity, with an illusive, eye-straining quality that contaminates the entire graphic. It has no place in data graphical design.’ p112
‘When a graphic serves as a look-up table, then a grid may help in reading and interpolating. But even in this case the grids should be muted relative to the data.’ p116
‘When a graphic is taken over by decorative forms of computer debris, when the data measure and structures become Design Elements, when the overall design purveys Graphical Style rather than quantitative information, then that graphic may be called a duck in honor of the duck-form store,’ p116
‘The addition of a fake perspective to the data structure clutters many graphics.’ p118
Data-Ink Maximization and Graphical Design
‘In this chapter the principles are applied to many graphical designs, basic and advanced, including box plots, bar charts, histograms, and scatterplots. New designs result.’ p123
- The box plot becomes a quartile plot.
- The line begins at the minimum and ends at the maximum.
- The middle half is offset to show the quartile range.
- Alternatively the middle half can be indicated by the absence of a line.
- The median is revealed by a gap in the line.
- Alternatively the median can be revealed with a dot.
- The quartile plot can be used to frame a scatterplot.
- These techniques can be applied to other designs such as a parallel schematic plot.
- For the bar chart, erase the box, retaining a thin baseline.
- Use a white grid instead of ticks on the vertical axis.
- For the scatterplot, replace the frame lines with a range-frame.
- Make each range-frame a quartile plot.
- Frame the bivariate scatter with the marginal distribution of each variable.
- ‘The dot-dash-plot combines the two fundamental graphical designs used in statistical analysis, the marginal frequency distribution and the bivariate distribution.’ p133
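The quartile plot described in the list above needs only five numbers. The sketch below computes the pieces for the variant in which the middle half is left blank and the median is a dot. The function name and return format are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, since the summary does not specify one.

```python
from statistics import quantiles


def quartile_plot_segments(values):
    """Return the two line segments and the median dot of a quartile plot."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "lower_line": (data[0], q1),   # minimum up to the first quartile
        "median_dot": q2,              # drawn as a dot in the blank middle half
        "upper_line": (q3, data[-1]),  # third quartile up to the maximum
    }


print(quartile_plot_segments([9, 1, 5, 3, 7, 2, 8, 4, 6]))
# {'lower_line': (1, 3.0), 'median_dot': 5.0, 'upper_line': (7.0, 9)}
```

Drawing the two segments along an axis in place of the frame line gives a range-frame that is also a quartile plot.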
Multi Functioning Graphical Elements
‘The principle, then, is: Mobilize every graphical element, perhaps several times over, to show the data.’ p139
Data-built data measures p139
‘The graphical element that actually locates or plots the data is the data measure.’
‘The ink of the data measure can itself carry data;’
Stem-and-leaf plot p140
- ‘The stem-and-leaf plot constructs the distribution of a variable with numbers themselves:’
- ‘The simplest – and most useful – meaningful marker is a digit.’
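A stem-and-leaf plot is easy to build in plain text, which shows how the digits themselves form the distribution. This is a minimal sketch that assumes non-negative integers, with tens as stems and units as leaves; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return text rows: the stem is the tens part, each leaf is a units digit."""
    if not values:
        return []
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    rows = []
    # Include empty stems so that gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(digit) for digit in stems.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}".rstrip())
    return rows


for row in stem_and_leaf([12, 15, 21, 23, 23, 34, 50]):
    print(row)
#  1 | 25
#  2 | 133
#  3 | 4
#  4 |
#  5 | 0
```

Each row's length is the frequency for that stem, so the ink that records the values also draws the histogram.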
Data-based grids p145
- ‘Very occasionally the grid can report directly on the data’.
- ‘, the vertical grid lines in the published version are irregularly spaced, keyed to significant events’. p148
Double-functioning labels p149
- Range frames show the actual minimum and maximum realized in the data:
- ‘, the range-frame with range-labels is superior to the range-frame with round number labels.’
- ‘Numbers also double-function when used both to name things (like an identification number) and to reflect an ordering.’
- Co-ordinate labels can be turned into data measures, plotting the line on the chart directly. p151
- ‘The Y-scale now resembles the dashes of the dot-dash-plot, with the vertical column of data-positioned numbers serving as the dashes to indicate the marginal distribution.’
- ‘The method of data-based markers for the marginal distributions suggests a further enhancement of the dot-dash-plot:’ p152
- ‘This graphical arrangement performs better for smaller data sets (say 30 observations or less) and when a fine level of detail is required.’
‘Color often generates graphical puzzles.’ p154
‘, varying shades of gray show varying quantities better than color.’
‘Multiple layers of information are created by multiple viewing depths and multiple viewing angles.’ p154
- ‘what is seen from a distance, an overall structure usually aggregated from an underlying microstructure.’
- ‘what is seen up close and in detail, the fine structure of the data;’
- ‘what is seen implicitly, underlying the graphic – that which is behind the graphic.’
Table graphic example p158
- Read vertically the data is ranked, with names spaced in proportion to the percentages.
- Across the columns the data is paired, to show how the values changed over time.
- The slopes are also compared and unusual slopes stand out.
High-Resolution Data Graphics
‘The principle: Maximize data density and the size of the data matrix, within reason (but at the same time exploiting the maximum resolution of the available data-display technology).’ p166
- ‘Data graphics should often be based on large rather than small data matrices and have a high rather than low data density.’
- ‘The simple things belong in tables or in the text; graphics can give a sense of large and complex data sets that cannot be managed in any other way.’
- ‘a variety of data-reduction techniques – averaging, clustering, smoothing – can thin the numbers out before plotting.’
‘The Shrink Principle: Graphics can be shrunk way down.’ p167
- ‘Many data graphics can be reduced in area by more than half their currently produced size with virtually no loss in legibility.’
High-density data tables, such as those used in sports statistics, ‘provide an excellent model for all tables, even those in corporate presentations.’ p160
Data density of a display = number of entries in data matrix / area of data display
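The data density formula is simple enough to compute directly. In the sketch below the function name, the matrix size, and the display area are hypothetical, and area can be in any unit as long as it is used consistently when comparing graphics.

```python
def data_density(rows, columns, display_area):
    """Entries in the data matrix per unit of display area."""
    if display_area <= 0:
        raise ValueError("display_area must be positive")
    return rows * columns / display_area


# Hypothetical graphic: a 12 x 5 data matrix shown in 30 square inches.
print(data_density(12, 5, 30.0))  # 2.0
```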
‘Data-thin displays move viewers towards ignorance and passivity, and at times diminish the credibility of the source.’ p161
‘Very few statistical graphics achieve the information display rates found in maps.’ p166
Small multiples p168
- ‘Small multiples resemble the frames of a movie: a series of graphics, showing the same combination of variables, indexed by changes in another variable.’
- ‘Small multiples are inherently multivariate, like nearly all interesting problems and solutions in data analysis.’
- ‘Small multiples are an excellent architecture for showing large quantities of multivariate data.’
- ‘Sparklines are small, high-resolution graphics usually embedded in a full context of words, numbers, images.’
- ‘Sparklines are datawords: data-intense, design-simple, word-sized graphics.’
- ‘By showing recent changes in relation to many past changes, sparklines provide a context for nuanced analysis.’
- ‘Sparklines reduce recency bias.’
- Sparklines show overall trend along with local detail.
- ‘Colors help link sparklines with numbers.’
- ‘Sparklines efficiently display and narrate binary data.’
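The word-sized idea behind sparklines can be roughly approximated in plain text with Unicode block characters. This is only a low-resolution sketch, not the high-resolution graphic the quotes describe; the function name and the eight-level scale are assumptions.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest


def sparkline(values):
    """Map each value to one of eight block characters, scaled to the data range."""
    if not values:
        return ""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # A flat series has no range to scale; draw it at mid-height.
        return BARS[len(BARS) // 2] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[round((v - low) / span * top)] for v in values)


print(sparkline([1, 2, 3, 4, 5, 6, 7, 8]))  # ▁▂▃▄▅▆▇█
```

Because the result is an ordinary string, it can sit inside a sentence or a table cell next to the number it summarizes.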
Aesthetics and Technique in Data Graphical Design
‘Good design has two key elements: Graphical elegance is often found in simplicity of design and complexity of data.’ p177
Attractive displays of statistical information:
- have a properly chosen format and design
- use words, numbers, and drawing together
- reflect a balance, a proportion, a sense of relevant scale
- display an accessible complexity of detail
- often have a narrative quality, a story to tell about the data
- are drawn in a professional manner, with the technical details of production done with care
- avoid content-free decoration, including chartjunk.
The basic structures for showing data are the sentence, the table, and the graphic:
- The conventional sentence is a poor way to show more than two numbers because it prevents comparisons within the data.
- Tables are preferable to graphics for many small data sets.
- A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them.
- Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used.
- Tables also work well when the data presentation requires many localized comparisons.
- One supertable is far better than a hundred little bar charts.
- For sets of highly labeled numbers, a wordy data graphic – coming close to straight text – works well. p180
- The principle of data/text integration is: Data graphics are paragraphs about data and should be treated as such.
- Write little messages on the plotting field, label outliers and interesting data points, integrate the caption and legend.
- Use the same typeface for text and graphic.
- Avoid ruled lines that separate different types of information.
Friendly vs unfriendly graphics
- See table on p183
- Lines in data should be thin.
- An effective aesthetic device is the orthogonal intersection of lines of different weights.
- E.g. time-series with heavier horizontal as the data measure.
- Graphics should tend toward the horizontal, greater in length than height.
- Assists left-to-right labeling.
- Assists emphasis on causal influence.
- Use the golden rectangle or other well-known proportions.
- Roughly 50% wider than tall.