by Juan C. Dürsteler
[message nº 74]
The creation of quantitative graphics has well-defined components and a well-defined grammar. Knowing them helps to distinguish the different parts of a graphic and to choose its representation properly.
The specification of a graphic representation, however complex, can be expressed by means of language rules. Leland Wilkinson, professor of statistics at Northwestern University and author of the statistical program SYSTAT, which gave rise to SYSTAT, Inc., the company he founded, has recently written a book titled “The Grammar of Graphics”.
We won’t go into the complexities of grammar building here. Those interested in the topic can consult the book itself or the book “Visual Language Theory”, edited by Marriott and Meyer.
What I want to share here is Wilkinson’s decomposition of a graphic into its components. He divides a specification into seven components, which he calls:
- DATA: a set of operations that create variables from a dataset. A dataset differs from raw data in that it has a structure, for example a table or a database. A variable can be a subset of the dataset (e.g. a column within the table).
- TRANS: transformations that are applied to the variables (e.g. sorting, averaging…).
- FRAME: a set of variables, related by operators, that define a space. It is the particular selection of variables that we choose to represent.
- SCALE: geometric transformations that define the scale in which we represent (linear, logarithmic, probabilistic…).
- COORD: the coordinate system to be used. Among them you find Cartesian, polar, cylindrical, etc.
- GRAPH: the elements to be drawn (e.g. points, rectangles or other shapes) and their aesthetic attributes (colour, filling pattern…).
- GUIDE: elements that put the graphs into context, like axes, legends, grids…
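As a rough illustration, the seven components in the list above can be thought of as the fields of a single specification object. The sketch below is only one possible way to write that down in Python: the class name `GraphicSpec`, the field types and the two-point sample values are all made up for this example and do not come from Wilkinson's book.

```python
from dataclasses import dataclass, field, fields


@dataclass
class GraphicSpec:
    """Illustrative container (hypothetical): one field per component."""
    data: dict    # DATA: variable name -> list of values
    trans: list   # TRANS: transformations applied to the variables
    frame: tuple  # FRAME: variables that define the space to represent
    scale: dict   # SCALE: variable name -> kind of scale
    coord: str    # COORD: coordinate system
    graph: dict   # GRAPH: element to draw plus its aesthetic attributes
    guide: list = field(default_factory=list)  # GUIDE: axes, legends, grids


# A made-up two-point specification, only to show how the fields are filled in.
spec = GraphicSpec(
    data={"x": [1, 2], "y": [10, 20]},
    trans=["sort by x"],
    frame=("x", "y"),
    scale={"x": "linear", "y": "logarithmic"},
    coord="cartesian",
    graph={"element": "point", "colour": "black"},
    guide=["x axis", "y axis"],
)

# The field order mirrors the order of the list above.
component_names = [f.name.upper() for f in fields(spec)]
print(component_names)
```

Writing the specification this way makes it plain that changing one component, say the coordinate system, leaves the other six untouched.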
You can see an example of this decomposition in the following diagram.
Components of a graphic.
Note that some of them are visible (Graph, Guide), while others, like Trans, Scale or Data, don't appear physically in the graphic although they are indispensable for producing it.
| Country | Male | Female |
|---------|------|--------|
| Russia  | 62   | 75     |
| Finland | 67   | 76     |
| Germany | 68   | 75     |
| Austria | 68   | 76     |
| Canada  | 69   | 77     |
| U.S.A   | 69   | 77     |
| France  | 69   | 77     |
| U.K.    | 69   | 76     |
| Japan   | 71   | 77     |
| Sweden  | 72   | 78     |
- DATA: The raw data are the values for each country. Since we have structured them into a table, they become a dataset. Each column can be associated with a variable.
- FRAME: The space defined by the combination of variables to be represented, in this case Country x Male and Female.
- TRANS: The dataset has been sorted by increasing value of the "Male" variable.
- SCALE: We use a linear scale with a minimum of 60 and a maximum of 80. The vertical scale is quantitative. The horizontal scale is categorical.
- GRAPH: Rectangles with shaded colours.
- GUIDE: A legend containing the coding that relates colours to variables, a vertical axis depicting the quantitative scale, and a horizontal axis with the names associated with the categorical scale.
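The decomposition of this example can also be followed step by step in code. The sketch below is a minimal illustration in plain Python, not a real plotting routine: the function `linear_scale`, the dictionary layout of `bars` and `guides`, and the 20-character text rendering are assumptions made for this example. The coordinate system is assumed to be Cartesian, which the example does not state explicitly.

```python
# DATA: the table from the example, one variable per column.
countries = ["Russia", "Finland", "Germany", "Austria", "Canada",
             "U.S.A", "France", "U.K.", "Japan", "Sweden"]
male = [62, 67, 68, 68, 69, 69, 69, 69, 71, 72]
female = [75, 76, 75, 76, 77, 77, 77, 76, 77, 78]

# TRANS: sort the rows by increasing value of the "Male" variable.
rows = sorted(zip(countries, male, female), key=lambda row: row[1])

# SCALE: linear scale from 60 to 80, mapped onto the interval [0, 1].
SCALE_MIN, SCALE_MAX = 60, 80


def linear_scale(value, lo=SCALE_MIN, hi=SCALE_MAX):
    """Map a value in [lo, hi] to a position in [0, 1]."""
    return (value - lo) / (hi - lo)


# FRAME + GRAPH: Country x (Male, Female); one rectangle per pair,
# described by its category, its variable and its scaled height.
# COORD is assumed Cartesian, so the height is simply a vertical extent.
bars = [
    {"country": c, "variable": name, "height": linear_scale(v)}
    for c, m, f in rows
    for name, v in (("Male", m), ("Female", f))
]

# GUIDE: labels for the two axes and the legend entries.
guides = {
    "vertical_axis": [SCALE_MIN, SCALE_MAX],
    "horizontal_axis": [c for c, _, _ in rows],
    "legend": ["Male", "Female"],
}

# Crude text rendering: 20 characters span the full scale.
for bar in bars:
    print(f'{bar["country"]:<8} {bar["variable"]:<6} ' + "#" * round(bar["height"] * 20))
```

Only the last loop produces anything visible. Everything before it corresponds to components that never appear in the chart itself, yet the chart cannot be drawn without them.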
OK, we know that, but how does this help us when we already have a charting program (Excel, for example) and face the task of creating a chart for the next meeting?
The scheme above is a powerful aid in preparing graphics because it abstracts from particular details and focuses on the elements that make up any statistical graphic, however different it is from what we are used to.
So we can consider each of them separately and apply to it the procedure that we discussed last week (see the previous issue). To avoid boring the reader, I will concentrate on two aspects that are often abused.
Figure: some of the guides used in statistical graphics, according to L. Wilkinson (columns: Scale, Annotation).
Guides: they give us a way to interpret the graphic.
If we assign different colours to different variables, we need a legend in order to know which colour corresponds to which variable. Axes provide a way for us to know the range of values represented. A grid can help to identify the specific value of a particular graph (a point for example).
Nevertheless, what often stands out most in a chart is a thick grid and heavy axes with big numbers in bold type.
Sometimes we forget that the goal of a graphic is to show the data. For this reason Edward Tufte proposes the so-called data-ink ratio: the ratio of the ink used to depict the data to the total ink used in the graphic.
What Tufte proposes is not, of course, to measure the area covered by ink, but to consider how much non-data ink can be erased without losing information. Many grids resembling graph paper would disappear… Guides are not the graphic; they are only auxiliary elements that should complement the data, not compete with it.
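Tufte's ratio can be written compactly as follows. The layout as a formula is added here for convenience; the two forms are equivalent ways of stating the same definition.

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic's ink that can be erased without loss of data-information}
\]
```

A ratio close to 1 means that almost all the ink carries data, while a heavy grid or bold axes push it towards 0.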
Aesthetic attributes of the graph.
Figure: some of the aesthetic attributes of graphs (the graphical representation of data), according to L. Wilkinson (columns: Form, Surface, Movement, Sound, Text).
Colours, patterns and 3D effects are also among the most visible things in a graphic, and they are frequently overused.
Strongly saturated colours placed side by side in area graphics, for example, distract attention from what is really important: the variables.
The same can be said of textures with highly contrasted patterns, which produce particularly annoying vibration effects and, again, focus our attention on what is accessory rather than on what interests us.
Good design is design that is there but goes unnoticed. A good graphic is one that helps you gain insight, one that makes you say “Aha!”, even though its structure, colours, etc. are not perceived consciously.
Links of this issue: