InfoVis.net Magazine, message nº 6. Published 2000-08-14.
Also available in Spanish.
The digital magazine of InfoVis.net
John W. Tukey's best known contribution is perhaps the Fast Fourier Transform (FFT). However, Tukey contributed to modern statistics in many other ways, and Exploratory Data Analysis (EDA) is among the most outstanding of them.
His book Exploratory Data Analysis (1977) is the classic reference on the topic. EDA is a philosophy of statistical data exploration that is essentially graphical in nature. For this reason it is sometimes confused with statistical graphics, although EDA goes far beyond them.
For our purposes, the interest of EDA lies in the power that graphics add to statistical tools: graphics are a great help in understanding what the data mean.
It is worth taking some time to look at some of the plots Tukey invented, such as the Box-and-Whisker Plot and the Stem-and-Leaf Diagram, among others.
In the Stem & Leaf diagram, each data item records its own value and, at the same time, occupies one unit of space, so that we simultaneously obtain the profile of a univariate distribution and a listing of the data themselves. Moreover, repeated information (the leading digits shared by several values) is reduced to a minimum.
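A minimal sketch of how such a display can be built, assuming non-negative integer data where the stem is the tens part and the leaf is the units digit. Other splits are possible; this choice, the function name `stem_and_leaf` and the sample data are illustrative only.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers.

    Stem = value // 10 (tens part), leaf = value % 10 (units digit).
    Every stem between the minimum and maximum is listed, even when empty,
    so that the row lengths trace the shape of the distribution.
    """
    if not values:
        return []
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lo, hi = min(values) // 10, max(values) // 10
    rows = []
    for stem in range(lo, hi + 1):
        digits = "".join(str(d) for d in leaves[stem])
        rows.append(f"{stem:>2} | {digits}")
    return rows


if __name__ == "__main__":
    data = [12, 15, 21, 23, 23, 28, 34, 41, 47, 49]  # hypothetical sample
    for row in stem_and_leaf(data):
        print(row)
```

Each value appears exactly once as a leaf, so the rows are both a histogram turned on its side and a complete listing of the data.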
As an example, I have rearranged a train timetable taken from a leaflet for the Castelldefels-Barcelona (Sants) line, picked up at a Renfe railway station.
In its original form, the timetable is a table of 10 rows by 9 columns, plus an extra "widow" column holding only the 22:38 train: a total of 91 fields in hh.mm format, or 455 characters.
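The counts follow directly from the layout. A quick check, assuming each hh.mm field takes exactly 5 characters (two digits, a dot, two digits); the helper name `table_size` is hypothetical.

```python
def table_size(rows, cols, extra_fields, field_width):
    """Count the fields and characters in a fixed-width timetable grid."""
    fields = rows * cols + extra_fields
    return fields, fields * field_width


# 10 x 9 grid plus the single "widow" field; "22.38" is 5 characters wide.
fields, chars = table_size(rows=10, cols=9, extra_fields=1, field_width=len("22.38"))
print(fields, chars)  # 91 455
```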
In the Stem & Leaf diagram, the hours go to the left of the separator bar (|) and the minutes of each departure within that hour go to the right. The length of each row shows at a glance how frequent the trains are in that hour, and the pattern of departures is very easy to identify.
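The same idea applied to a timetable can be sketched as follows, assuming departure times are given as "hh.mm" strings: the hour becomes the stem and the minutes become the leaves. The function names and the departures in the example are invented for illustration; they are not the actual Castelldefels-Barcelona (Sants) timetable.

```python
from collections import defaultdict


def timetable_stem_leaf(departures):
    """Group 'hh.mm' departure times into {hour: sorted list of minute strings}."""
    by_hour = defaultdict(list)
    for t in departures:
        hh, mm = t.split(".")
        by_hour[int(hh)].append(mm)
    # Two-digit minute strings sort correctly as text.
    return {hour: sorted(mins) for hour, mins in sorted(by_hour.items())}


def render(rows):
    """Format each hour as 'hh | mm mm ...'; row length reflects train frequency."""
    return [f"{hour:02d} | {' '.join(mins)}" for hour, mins in rows.items()]


# Hypothetical departures, for illustration only.
times = ["06.08", "06.38", "07.08", "07.23", "07.38", "07.53", "08.08", "08.38"]
for line in render(timetable_stem_leaf(times)):
    print(line)
```

Here the hour 07 row is visibly longer than its neighbours, which is exactly the frequency cue described above.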
Furthermore, since at several consecutive hours the trains leave at exactly the same minutes (for instance, between 13 and 20), we can shrink the timetable even further without losing any information, while making it clearer and easier to use (see the reduced Stem & Leaf diagram to the right).
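Merging consecutive hours that share the same minutes is a simple run-length step. This is a minimal sketch, assuming the timetable has already been grouped as a mapping from hour to its list of departure minutes; `collapse_hours`, `render` and the sample data are hypothetical.

```python
def collapse_hours(rows):
    """Merge runs of consecutive hours that share exactly the same minutes.

    rows: dict mapping hour (int) -> list of minute strings, e.g. {13: ["08", "38"]}.
    Returns a list of (first_hour, last_hour, minutes) tuples.
    """
    merged = []
    for hour, mins in sorted(rows.items()):
        if merged and merged[-1][1] == hour - 1 and merged[-1][2] == mins:
            first = merged[-1][0]
            merged[-1] = (first, hour, mins)   # extend the current run
        else:
            merged.append((hour, hour, mins))  # start a new run
    return merged


def render(merged):
    """Format runs as 'hh-hh | mm mm' (or 'hh | mm mm' for a single hour)."""
    lines = []
    for first, last, mins in merged:
        stem = f"{first:02d}" if first == last else f"{first:02d}-{last:02d}"
        lines.append(f"{stem} | {' '.join(mins)}")
    return lines


# Hypothetical timetable, for illustration only.
rows = {12: ["08", "23", "38"], 13: ["08", "38"], 14: ["08", "38"],
        15: ["08", "38"], 16: ["23"]}
for line in render(collapse_hours(rows)):
    print(line)
```

Only hours that are both consecutive and identical are merged, so every departure can still be read back from the compressed rows.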
In the end we have 59 two-digit fields, adding up to 118 characters plus the separator bars: almost four times fewer characters than the 455 of the original timetable. Less space and more clarity.
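The size comparison is plain arithmetic. A quick check using only the counts given in the text (91 fields of 5 characters in the original, 59 two-digit fields in the reduced diagram), with the separator bars left out of the count:

```python
original_chars = 91 * len("hh.mm")   # 91 fields of 5 characters each
reduced_chars = 59 * 2               # 59 two-digit fields, separator bars excluded

ratio = original_chars / reduced_chars
print(original_chars, reduced_chars, round(ratio, 2))  # 455 118 3.86
```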
This shows that an appropriate arrangement of data can be doubly informative, and that graphical representation can contribute enormously to perceiving patterns and understanding the nature of phenomena.
Can anyone imagine studying the evolution of sales without plotting it? Who hasn't seen the evolution of the stock market drawn as a wavy line? Or a histogram of the frequency of an illness by age group?
Surely many of us use EDA on a daily basis without even knowing it.