Inf@Vis!

The digital magazine of InfoVis.net

Where do my visitors go?
by Juan C. Dürsteler [message nº 65]

In order to know how web visitors behave, it's necessary to gather data, process it and interpret it. We are going to explore the associated problems as the basis for the next issue, on the visualisation of web user behaviour.

What the visitors of a web site do while browsing is vital information for making fundamental decisions: what to modify in the site's information architecture, where to place the most profitable banners, and which strategy drives more people to the elements that yield the most economic profit.

Nevertheless, knowing our users' behaviour is not an easy task. As in any other branch of knowledge, in order to understand what's happening we need:

  1. To gather reliable and relevant data about the phenomenon.

  2. To process the data (classify, sort, perform statistical calculations, etc.)

  3. To interpret the data and draw conclusions (possibly by means of visualisation techniques)

Although this analysis is extremely basic, too many times I've seen decisions being made without the appropriate channels to capture data having been established, and with a less than rigorous processing of it. This typically leads to erroneous interpretations. Making decisions with ambiguous and fragmentary information is often an unavoidable art. Doing it without gathering and interpreting the data properly is reckless.

Typically, data gathering relies on the logfile generated by the server.

The logfile stores the IP addresses our visitors are using, along with the pages downloaded and other information about our web server's activity. Each server has its own format, but there is a Common Log Format that most of them can produce and most analysers can understand.
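
As an illustration, the following minimal Python sketch (not part of any of the tools mentioned in this article) reads one line in that common format; the sample line follows the usual Apache documentation example, and real logfiles may add extra fields:

  import re

  # The fields of the Common Log Format:
  # host identity user [date] "request" status bytes
  LOG_LINE = re.compile(
      r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<date>[^\]]+)\] '
      r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)')

  sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
            '"GET /apache_pb.gif HTTP/1.0" 200 2326')

  fields = LOG_LINE.match(sample).groupdict()
  print(fields['host'])     # visitor's IP address (or resolved hostname)
  print(fields['date'])     # time of the request
  print(fields['request'])  # the page that was downloaded
  print(fields['status'])   # HTTP status code returned by the server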

Processing this data is the job of specialised analysis software, like the many packages listed on the web-log analysis page of the Open Directory Project.

A very popular example is Analog, a free program that produces a large number of statistics and is available in more than 28 languages. Its graphical capabilities are limited to elementary charts.
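
Analog's own reports are far richer, but the kind of elementary tallies such analysers produce can be sketched in a few lines of Python, assuming each log record has already been reduced to a (date, page, status) tuple as in the previous sketch:

  from collections import Counter

  # A rough sketch of the elementary counts a log analyser reports.
  def summarise(records):
      requests_per_page = Counter()
      requests_per_day = Counter()
      for date, page, status in records:
          if status == '200':                   # count only successfully served pages
              requests_per_page[page] += 1
              requests_per_day[date[:11]] += 1  # the '10/Oct/2000' part of the date field
      return requests_per_page.most_common(10), requests_per_day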

Other providers, like Nedstat, embed a small piece of code in the web pages themselves to gather the data; the statistics are then computed on their own servers (unlike Analog). The graphical output is also quite elementary.
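
This remote-collection approach can be sketched too. The following is not Nedstat's actual mechanism, just an illustrative Python fragment: the pages to be measured would embed a tiny image whose URL points at a counter server, and a minimal version of that counter could record each hit before returning the image:

  from http.server import BaseHTTPRequestHandler, HTTPServer
  import time

  # Bytes of a minimal 1x1 transparent GIF, returned after the hit is recorded.
  PIXEL = (b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff'
           b'!\xf9\x04\x01\x00\x00\x00\x00'
           b',\x00\x00\x00\x00\x01\x00\x01\x00\x00'
           b'\x02\x02D\x01\x00;')

  class HitCounter(BaseHTTPRequestHandler):
      def do_GET(self):
          # Append one record per hit: time, visitor address, embedding page, browser.
          with open('hits.log', 'a') as log:
              log.write('\t'.join([
                  time.strftime('%Y-%m-%d %H:%M:%S'),
                  self.client_address[0],
                  self.headers.get('Referer', '-'),
                  self.headers.get('User-Agent', '-')]) + '\n')
          self.send_response(200)
          self.send_header('Content-Type', 'image/gif')
          self.send_header('Content-Length', str(len(PIXEL)))
          self.end_headers()
          self.wfile.write(PIXEL)

  if __name__ == '__main__':
      HTTPServer(('', 8000), HitCounter).serve_forever()

One advantage of collecting data this way is that hits can be recorded even when the page itself is served from a cache rather than from the original server.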

The analysis of search engine logfiles is one of the aspects with the least literature and software available, perhaps because it is the realm of the large portals. Any news about the topic will be welcome.

Regarding interpretation, if we base our research solely on logfile analysis, there is a series of important facts that can't be known for sure. Among these are

  • the number of visitors, 

  • the time spent inside the web site or 

  • the true browser they are using. 

These facts can only be estimated, not truly known. Surprised? See the thorough description of the interpretation problems provided by Stephen Turner, the creator of Analog.
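
For instance, the number of visitors can only be approximated by grouping hits into visits with a heuristic. The following Python sketch is one such assumption-laden heuristic (one visitor per IP address and browser pair, a 30-minute inactivity timeout); proxies, caches and dynamic IP addresses make the true figure unknowable from the logfile alone:

  from datetime import timedelta

  SESSION_TIMEOUT = timedelta(minutes=30)

  def estimate_visits(hits):
      """hits: (timestamp, ip, user_agent) tuples; timestamp is a datetime object."""
      last_seen = {}
      visits = 0
      for timestamp, ip, user_agent in sorted(hits):
          key = (ip, user_agent)
          previous = last_seen.get(key)
          if previous is None or timestamp - previous > SESSION_TIMEOUT:
              visits += 1      # a new visit starts after 30 minutes of silence
          last_seen[key] = timestamp
      return visits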

So logfile analysis has been the starting point for learning what happens inside our websites. But on its own it is insufficient. Although it's important to know how many pages we serve, which ones are consulted the most, and so on, it doesn't tell us the most important things about usability: what users like and dislike, which pages are most attractive, how visitors move around the website…

Pure logfile analysis falls short. Next week we'll see how information visualisation can help us in this arena.

Links of this issue:

http://dmoz.org/Computers/Software/Internet/Site_Management/Log_Analysis/  
http://www.statslab.cam.ac.uk/~sret1/analog/  
http://www.nedstat.com/  
http://www.statslab.cam.ac.uk/~sret1/analog/docs/webworks.html  
© Copyright InfoVis.net 2000-2014