Inf@Vis!

The digital magazine of InfoVis.net

Multimodal Systems
by Juan C. Dürsteler [message nº 139]

The interface between humans and computers still suffers from many deficiencies. Multimodal systems using multibiometric elements, multimodal interfaces and multisensor systems are beginning to alleviate many of them.
 
[Figure] Multimodal systems are supported by three legs: multisensor systems, multibiometric systems and multimodal interfaces.
Source: Diagram by the author
Communication between humans and machines has always been a difficult and complex matter, probably because the limited sophistication of the machines' internal behaviour has forced designers to make the user adapt to the machine's way of working instead of the other way round.

For example, it’s important that my computer knows who I am in order to identify my user profile, customise my desktop and prevent other users from accidentally (or maliciously) erasing my work. Nevertheless, my computer is not smart enough to recognise me: it only recognises what I remember (my user id and password) or, at most, what I own (a magnetic card). For any of us humans, knowing who our interlocutor is, is a trivial matter of glancing at him or her, or simply hearing his or her voice.

Another example: the hospital where my wife works, like many others around the world, is trying to get rid of paper by managing medical histories, prescriptions and other elements of medical practice by computer. In general this is causing big usability problems, sometimes meeting opposition so bitter that the Cedars-Sinai hospital in Los Angeles was obliged to dismantle the system it tried to implement.

Traditional computers simply aren’t appropriate for some tasks for which very efficient procedures, often supported by paper, have already evolved: face-to-face emergency consultation with a patient, planning a route over a map, or even air traffic control using paper strips. Many of these activities are collaborative, which makes their digital counterparts very difficult to implement by traditional means.

For this reason a new generation of multimodal systems is appearing that tries to alleviate many of these problems in a flexible, adaptable, robust and fault-tolerant way, exploiting the synergy among several unimodal techniques that can complement one another.

According to Sharon Oviatt, Trevor Darrell and Myron Flickner (“Multimodal Interfaces that Flex, Adapt and Persist”, Communications of the ACM, Vol. 47, No. 1), this new generation rests on at least three foundations:

  • Multibiometric systems, which combine several biometric techniques that reduce error by complementing one another.

  • Multimodal interfaces.

  • Multisensor systems.

Multibiometric systems try to palliate the problems of false rejection (rejecting a valid user) and false acceptance (accepting an invalid user), which range from 0.2% for the best fingerprint recognition algorithms to 10-20% false rejection (2-5% false acceptance) for speaker recognition.

By combining several techniques, such as voice recognition, face recognition, fingerprint or palm print recognition, hand geometry, and iris or retina recognition, it’s possible to mutually compensate for the deficiencies of each particular technique. This should go a long way, in the medium term, towards solving the problem we talked about above: computers will finally be able to recognise us (maybe).
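The idea of mutual compensation can be sketched as score-level fusion: each modality produces a match score, and a weighted combination decides acceptance. This is a minimal illustrative sketch; the scores, weights and threshold below are hypothetical, not taken from the article or from any real system.

```python
# Hypothetical sketch of score-level fusion in a multibiometric system.
# Scores are match confidences in [0, 1]; weights reflect how much we
# trust each modality. All numbers here are illustrative assumptions.

def fuse_scores(scores, weights, threshold=0.5):
    """Combine per-modality match scores into one accept/reject decision."""
    assert len(scores) == len(weights)
    fused = sum(w * s for s, w in zip(scores, weights)) / sum(weights)
    return fused >= threshold, fused

# A mediocre fingerprint match alone falls below the threshold...
accept_one, score_one = fuse_scores([0.45], [1.0])

# ...but combined with a strong voice and iris match, the same user
# is accepted: the other modalities compensate for the weak one.
accept_many, score_many = fuse_scores([0.45, 0.80, 0.90],
                                      [0.5, 0.3, 0.2])
```

The weighting is the crux of a real system: modalities with lower error rates (fingerprint, iris) would typically receive more weight than error-prone ones (speaker recognition).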

[Figure] Anoto pen: diagram describing the elements that make up this digital pen.
Source: Anoto's website

Multimodal interfaces, on the other hand, try to solve the problem of adapting the computer to the user instead of the other way round. They do this by combining several information input and output techniques, together with advances in tangible interfaces (which we discussed briefly in issue 135), whose aim is to turn the objects in our environment into elements of digital interaction.

For example, NISChart, developed by Natural Interaction Systems, is a system aimed at clinicians that combines voice recognition with conventional writing on good old paper forms, using an Anoto digital pen. Besides writing normally like an ordinary pen, it can detect the movements you make on the paper, thanks to a pattern of marks printed on it, and send them to the computer.

The system allows the practitioner to enter handwriting, annotations, check marks and, in general, anything that is normally entered on a standard form, besides storing the clinician's spoken comments. The information gathered through handwriting, symbol and voice recognition is transferred to an application that uses semantic and contextual analysis to fuse and disambiguate the contents, finally populating a database.
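The fuse-and-disambiguate step described above can be sketched in miniature: events from different modalities refer to the same form field, and a simple rule (here, recogniser confidence) resolves conflicts. This is a hypothetical illustration, not NISChart's actual architecture or API; all field names and values are invented.

```python
# Illustrative sketch of fusing pen and voice input for a form.
# Not the real NISChart pipeline; names and values are assumptions.

from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str      # "pen" or "voice"
    field: str         # form field the event refers to
    value: str         # recognised content
    confidence: float  # recogniser confidence in [0, 1]
    timestamp: float   # seconds since the session started

def fuse(events):
    """Per form field, keep the reading with the highest confidence,
    processing events in chronological order."""
    record = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        best = record.get(ev.field)
        if best is None or ev.confidence > best.confidence:
            record[ev.field] = ev
    return {field: ev.value for field, ev in record.items()}

events = [
    InputEvent("pen",   "diagnosis", "otitis media", 0.70, 10.1),
    InputEvent("voice", "diagnosis", "otitis media, left ear", 0.85, 10.8),
    InputEvent("pen",   "dosage",    "250 mg", 0.95, 15.2),
]
result = fuse(events)
# → {'diagnosis': 'otitis media, left ear', 'dosage': '250 mg'}
```

A real system would use far richer disambiguation (language models, temporal windows, domain ontologies), but the principle is the same: redundant modalities referring to the same slot reinforce or correct each other before anything reaches the database.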

This way the clinician works in the usual way, computers don't interfere in the relationship with the patient (or at least no more than pen and paper do), and you get all the advantages that databases and information technology offer.

Multisensor systems underpin this whole network of synergic multibiometric and multimodal combinations.

Nowadays most applications are bimodal rather than truly multimodal, due to the complexity of integrating so many different techniques. In the next few years we'll see the blossoming of many genuinely multimodal experiences that seek to bring the expressive richness of human interaction to the digital world.


See also:

Links of this issue:

http://bias.csr.unibo.it/fvc2002/results/resultsAvg.asp   Test of fingerprint recognition algorithms
http://www.nist.gov/speech/tests/spk/index.htm   Test of speaker recognition algorithms
http://www.anoto.com/   Anoto, digital pen
http://www.infovis.net/printMag.php?num=135&lang=2   num 135 Ambient Devices
http://www.naturalinteraction.com/nischart.html   NISChart
http://www.naturalinteraction.com/   Natural Interaction Systems
http://www.infovis.net/printMag.php?num=118&lang=2   num 118 Attentive User Interfaces
http://www.trustedreviews.com/article.aspx?head=45&page=473   Comparative review of digital pens
© Copyright InfoVis.net 2000-2014