Inf@Vis!

The digital magazine of InfoVis.net

Finding needles in the haystack.
by Juan C. Dürsteler [message nº 22]

Finding relevant information in the ocean of data that is the Internet may require completely new approaches to the problem.

Searching for information on the Internet is hard work that almost always requires substantial effort from the user. Even with systems like Google, which sometimes surprises me with very accurate results, you usually have to struggle through a sea of data to find the information of interest. This is especially true the less common or standard the information you are looking for is.

It is essential to distinguish between data retrieval and information retrieval systems, bearing in mind the whole spectrum of intermediate cases.

The easiest to use and most predictable systems are data retrieval systems. Their task is to return the documents in a collection that contain certain keywords or whose fields satisfy certain clearly defined logical conditions.

This is the case of a company database, where some knowledge of the query system and (obviously) of the business lets you extract the data needed for daily operation. The typical visualisations in this case are bar graphs, pie charts and the other graphs commonly used in companies.
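To make the idea concrete, here is a minimal Python sketch of a data retrieval query over a tiny in-memory collection. The `Document` fields, the sample records and the `data_retrieval` function are all hypothetical, invented for illustration; a real company database would use its own schema and query language.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical fields, chosen only for this illustration.
    doc_id: int
    department: str
    year: int
    text: str


# A made-up collection standing in for a company database.
COLLECTION = [
    Document(1, "sales", 2000, "quarterly sales report for the northern region"),
    Document(2, "sales", 1999, "annual sales summary"),
    Document(3, "finance", 2000, "budget report and forecast"),
]


def contains_all(doc, keywords):
    """True if every keyword appears as a whole word in the document text."""
    words = set(doc.text.lower().split())
    return all(keyword.lower() in words for keyword in keywords)


def data_retrieval(collection, keywords=(), **field_conditions):
    """Return documents that contain all keywords AND match every field condition."""
    results = []
    for doc in collection:
        if not contains_all(doc, keywords):
            continue
        if all(getattr(doc, field) == value for field, value in field_conditions.items()):
            results.append(doc)
    return results


hits = data_retrieval(COLLECTION, keywords=["report"], year=2000)
print([doc.doc_id for doc in hits])  # prints [1, 3]
```

The answer is fully determined by the conditions: a document either satisfies them or it does not, which is what makes this kind of system predictable.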

At the other end of the spectrum are information retrieval systems, where the data can be generic in nature, not necessarily well structured, and semantically ambiguous.

This potential semantic ambiguity is at the heart of the fundamental problem of these systems: relevance. Relevance separates what is of interest to us from what is not, and it differs for every user and even from one moment to the next.

The objective of an information retrieval system is to provide the user with all the relevant documents and as few irrelevant ones as possible. In other words, to return the maximum signal with the minimum noise.

For example, the other day, while looking for information on Mosaic, the pioneering web browser, I found a mountain of information on ceramics, glazed tiles and even some history of ancient Rome but, frankly, not much on what I was really looking for.
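The balance between signal and noise is usually quantified in the information retrieval literature with two measures: precision (the fraction of returned documents that are relevant) and recall (the fraction of relevant documents that were returned). The sketch below computes both for a Mosaic-style query. The document identifiers and the relevance judgements are invented for illustration, not data from an actual search.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved ids and a set of relevant ids."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    # Guard against empty sets to avoid dividing by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical results for the query "mosaic".
retrieved = ["ceramics", "glazed-tiles", "roman-history", "ncsa-mosaic", "tile-patterns"]

# Hypothetical set of documents this particular user would judge relevant.
relevant = {"ncsa-mosaic", "mosaic-release-notes", "browser-history"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# prints: precision = 0.20, recall = 0.33
```

Note that the `relevant` set encodes one user's judgement at one moment. Someone researching Roman floors would supply a different set and obtain very different figures for the same results.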

Resolving semantic ambiguity isn't an easy task, and today it still requires substantial collaboration on our part. Current systems are still far from reliably interpreting the semantics of what we ask them. Moreover, in many cases we have only a fuzzy idea of how to express what we are looking for.

For this reason, work is under way on new ways to visualise information and interact with the user, with the aim of making it easier to highlight and separate the relevant information.

In this regard, I highly recommend the excellent book Modern Information Retrieval, particularly chapter 10, which is devoted to these topics and can be consulted online.

To better find the needles we are all looking for in the Internet's haystack, we will need new information retrieval systems coupled with visualisation and interaction systems that let us visually identify the interesting information.

Links in this issue:

http://www.google.com  
http://www.infovis.net/printRec.php?rec=glosario&lang=2#Relevancia  
http://www.ncsa.uiuc.edu  
http://www.infovis.net/printRec.php?rec=llibre&lang=2#ModernInfoRet  
http://sunsite.dcc.uchile.cl/irbook/chapters/chap10.html  
© Copyright InfoVis.net 2000-2014