Museums and the Web 2005


1-800-FOR-TOUR: Delivering Automated Audio Information through Patron's Cell Phones

Matthew Nickerson, Southern Utah University, USA


Many museums around the world rent audio players to their visitors to provide automated tours delivering pre-recorded information about their exhibits. Though generally pleased with their patrons' responses to automated audio tours, museum administrators find that hosting them can be expensive, time consuming, and frustrating. Ongoing advances in mobile wireless technology provide an alternative to the current cumbersome method of renting sound players to museum visitors. Many patrons bring their own 'digital sound player' with them in the guise of their personal cell phone. As cell phones proliferate and the price of calling plans falls, cell phones may present a viable alternative to current audio tour systems. History Calls was a recent experiment using VXML technology to deliver an automated audio museum tour directly to patrons' cell phones.

Keywords: cultural heritage, automated tours, voiceXML (VXML), wireless, cell phones


Automated audio tours are a popular resource at many cultural heritage sites around the world. These tours were first introduced more than two decades ago using personal audio cassette players. Many patrons enjoyed them because they could experience sites more privately, without a human guide or docent, and move along at their own pace. Typically a patron would rent a player for a fee and then follow a prescribed route through the exhibit, accessing the audio information periodically at predetermined locations. These audio tours were very successful, and through the years they became regular features at museums all over the world.

In the 1990s, the technology improved as museums began to employ digital files and players. Automated tours incorporating digital sound were no longer held captive by the sequential nature of analog tape. Patrons could wander through exhibits accessing information in any order they pleased. With digital players, automated museum tours became far more private and personal, and the popularity of this service continues to rise (Martin, 2000).

Though popular with patrons, managing these types of tours is a major undertaking for the cultural heritage institutions that choose to employ them. The digital players for audio tours require constant maintenance for repairs, recharging, updates, and replacement. Whether the service is provided solely by the institution or is shared/sub-contracted with a private provider, the upkeep of automated audio tour machines is time-consuming and expensive. Respondents to a 2001 survey of museums indicated the largest disadvantages of automated tours were installation costs and the challenges of equipment malfunction (Smith, 2001).

This query, posted by a museum professional to a museum e-list hosted by The British Interactive Group, reflects some of the most common fears and frustrations of museum professionals when considering an audio tour:

I've heard of some bad experiences from users of this sort of kit - hard to use/control from visitors point of view, easily broken/lost from the organisation's point of view. I would be really grateful to hear from anyone with good experience of a particular system. Or bad experience/systems to avoid. Also management of them - a deposit system? Charging?

(British Interactive Group, 1999)


The main source of cost and frustration with maintaining a museum audio tour system is the rental and continuous upkeep of the audio player hardware. Large exhibits can require hundreds of these machines. Thousands of patrons of all ages, interests, and technological know-how rent and use the players, carrying them throughout the museum or gallery every day. Commercial vendors have developed heavy-duty players, but accidents, careless handling, and absent-minded patrons still result in damage and loss. Exhibit information is stored directly on each machine, so updates must be made to every player individually. Sophisticated storage racks can facilitate updates and recharging en masse, but maintenance is still a costly and time-consuming operation.

In reviewing this important and popular museum service, it seems one obvious way to alleviate much of the time, cost, and frustration associated with hosting audio tours would be to get museums out of the player rental business and distribute the tour through the patron's own machine. Technological advances over the past few years now make this a viable option. Hand-held wireless devices of all shapes and sizes are already accompanying museum patrons on their visits. Why not capitalize on the most ubiquitous of these machines by delivering automated audio tours directly to visitors' own hardware: their personal cell phones?


The growth and popularity of cell phone technology are astounding. There are an estimated 1.5 billion mobile subscribers worldwide, with 2004 sales estimated at 648 million units (Moskalyuk, 2005). With the widespread availability and popularity of wireless telephones, this technology seems a logical option for testing a new method for audio tour delivery.

This case study describes a proof-of-concept project for delivering automated audio museum tours via visitors' cell phones. Based on its success in commercial applications, we selected VoiceXML (VXML) as the central development tool for the project. VXML is an XML-based markup language designed specifically to implement interactive voice dialogs. Its main function is to describe the user interface; that is, the exchanges of requests and information between the caller and the application. These exchanges, or dialogs, facilitate communication between two very different world-wide networks: the telephone system and the Internet. VXML dialogs feature several inputs/outputs, including synthesized speech, digitized audio, voice recognition, and DTMF (touch tones). VXML dialogs are described by documents (programs) that reside on a Web server and work in concert with a voice server. The voice server receives/translates voice input and also creates computer-synthesized voice messages (Larson, 2003).


  Fig 1: Interaction of Web server, voice server, and telephone

The gateway on the voice server allows these inputs/outputs to be processed through standard telephony technology. The gateway is the bridge between the phone system and the Web-based VXML dialogs. The voice server is the second key component in a VXML system. Though VXML is an open language, building a voice server usually requires licensing proprietary components such as voice recognition software and voice synthesis engines. A simpler way to acquire this type of voice capability is to purchase a turnkey system, several of which are currently on the market. There are also commercial service providers that host VXML applications, either in full or linked to a private Web server. As the use of and need for VXML grow, more commercial options will become available to developers.

The voice recognition process within the voice server is simplified through the creation of VXML grammars. A VXML document defines a grammar for each point in the dialog, enabling the speech recognizer to work very fast and efficiently. The grammar defines a finite vocabulary specific to the VXML document and its application, preparing the voice server to respond to only a few words rather than thousands.
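The idea of a small, dialog-specific vocabulary can be sketched in a few lines. The Python below is only an illustration, not the History Calls code: it generates a VoiceXML 2.0 menu in which each choice carries one spoken word and one touch-tone key, so the recognizer has to listen for just those few words. The function name, the prompt wording, and the "#photoN" targets are assumptions made for this example, and the dialogs those targets would point to are omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical spoken forms for the first three photo numbers; a real tour
# would list every photograph in the exhibit.
SPOKEN = {1: "one", 2: "two", 3: "three"}


def build_menu(spoken):
    """Return a VoiceXML 2.0 document whose menu accepts only the listed choices.

    The text of each <choice> is the phrase the recognizer listens for, and
    its dtmf attribute is the equivalent touch-tone key.
    """
    vxml = ET.Element(
        "vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"}
    )
    menu = ET.SubElement(vxml, "menu", {"id": "main"})
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = "Say or press the number of a photograph."  # assumed wording
    for number, word in spoken.items():
        # The "#photoN" targets are placeholders for dialogs not shown here.
        choice = ET.SubElement(
            menu, "choice", {"dtmf": str(number), "next": f"#photo{number}"}
        )
        choice.text = word
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_menu(SPOKEN))
```

Because the menu lists only three phrases, the active vocabulary at this point in the dialog is three words plus three keys, which is the kind of restriction that keeps recognition fast.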

History Calls

A VXML system, as described above, was used to create an automated audio tour for an exhibit of historic photographs in the gallery space at a university library. The twenty photographs in the exhibit were hung in traditional museum fashion. Creation of the automated audio tour followed a four-step process: 1) research, 2) preparing the audio, 3) creating the VXML document, and 4) linking to the voice server.


Substantial research was conducted to learn about the people and events depicted in the twenty photographs in the exhibit. The photographs covered 100 years of local performing arts history dating from 1901 to 2001. Significant information was found at the public and university libraries, and because of the local focus of the photo exhibit, community members with first-hand knowledge of the people and events were also consulted.

The information gathered through the research process was used to create description placards for each photograph, a three-fold exhibit brochure, and the script for the audio tour. The tour script consisted of one or two paragraphs per photograph, each translating into a 30- to 60-second recording when narrated.

Research also discovered important primary documents within the library's extensive oral history collection. A review of the transcripts uncovered several interviews directly relating to the people and events illustrated in the photo exhibit. In addition, several individuals depicted in the photos were still living in the area; they were contacted and interviewed regarding their memories of the pictured events. These interviews were recorded, and the digital sound files, along with the existing historical recordings, became an integral part of the final product.

Preparing the Audio

Because sound was the crux of the project, special care was taken to collect and capture the best possible audio in terms of both sound quality and content. Three types of audio were designed into the tour and delivered via the VXML document and voice server. First, the prepared script for each photograph described above was read and recorded as an individual sound file. To create sound of the highest quality, a professional actor served as the narrator for the texts and the recordings were made in a professional studio. In addition to the narration for each photo, the script also included introductory remarks and brief instructions on how to control and navigate through the automated tour.

The second type of sound file consisted of excerpts from oral interview recordings. Short, 30-60-second clips directly related to individual photos in the exhibit were cut from the longer interviews and saved as individual sound files. The new interviews, conducted specifically for this project, were recorded in the studio to ensure high sound quality.

The older recordings, some dating back as early as the 1940s, were also edited to provide short clips relating to the exhibit. The age and recording medium of these older interviews often made the sound quality unacceptable. In these instances, digital sound editing software was used to clean and improve the selected cuts for use in the tour. It is the inclusion of these first person narratives in the phone tour that gave the project its name, History Calls.

The third audio type used in the tour was not pre-recorded; rather, the VXML document and voice server produced the sound as a computer synthesized voice. The text for these computer generated comments was written into the VXML code and then translated into audio by the voice server.

Creating the VXML Document

The VXML document controlling the tour was created and stored on a local Web server along with the audio tour sound files. Before beginning to write the VXML code, it proved very helpful and instructive to diagram the options and paths that would be available to the user. This flow chart of the tour incorporated not only the introductory text and the information on each photograph but also the pathways to, and uses of, several support sound files, including error/repeat warnings, redundant help messages, a looping sound file to indicate a waiting mode, and the introduction and instructions for the on-line user survey.

The help and instructional messages were included as text messages within the VXML document and were rendered by the voice server in computer synthesized speech. This convention helped distinguish aurally between the mechanics of the tour and the narration for the exhibit. The synthesized voice immediately indicated that the listener had exited the tour narration and was now interacting with the navigational/help module.

Individually numbered sound files reflecting the numbered photos in the exhibit, together with the accompanying navigational text messages, served as the foundation for constructing the flow chart. Once the main trunk was complete, the ancillary branches were added. These branches included connections to help information, error warnings, prompt messages, and the on-line survey.

Like a movie director's story board, the flow chart served as a step-by-step outline of the entire program. The programmer's job was to create in VXML the paths and links indicated by the flow chart. As with most projects, not every eventuality was covered in the initial flow chart, and the designer and programmer worked together to iron out details as the project came together. Because VXML is designed for voice applications, the tags facilitate the creation of voice menu systems and their inherent support structure. This project utilized many of the useful elements VXML makes possible; for example,

  • The system signaled the user that s/he was connected by playing exhibit-appropriate music. Main Menu choices or the help option could be selected over the music at any time.
  • Menu and other navigational decisions could be input either through voice or DTMF (phone touch pad).
  • If the user did not provide input within a predetermined waiting period, the program would prompt for it. After three different prompts, each more pointed than the last, the program returned the user to the Main Menu.
  • Voice inputs that were unrecognizable prompted another set of nested help messages, each asking the user to repeat the choice, and concluding with the suggestion that the user now try the phone's touch pad.
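The timeout and unrecognized-input behaviour described in the last two items can be modelled as a small counter-driven state machine. In VoiceXML itself this kind of escalation is usually written with noinput and nomatch handlers that carry a count attribute; the Python below is only a sketch of the logic, not the project's code. The prompt wording and the class and method names are assumptions, and so is the reading that the return to the Main Menu happens on the timeout that follows the third prompt.

```python
# Assumed prompt wording; the original prompts are not given in the paper.
NOINPUT_PROMPTS = [
    "Please say or press the number of a photograph.",
    "I did not hear anything. Say a photo number, or press it on your keypad.",
    "Still nothing heard. Choose a photo number now, or you will return to the main menu.",
]
NOMATCH_PROMPTS = [
    "Sorry, I did not catch that. Please repeat your choice.",
    "I still did not understand. Please say the number again.",
    "Please try entering the number on your phone's touch pad.",
]


class MenuState:
    """Tracks how many times the caller has timed out or been misunderstood."""

    def __init__(self):
        self.noinput = 0
        self.nomatch = 0

    def on_noinput(self):
        """Escalate through the prompts, then send the caller to the main menu."""
        self.noinput += 1
        if self.noinput > len(NOINPUT_PROMPTS):
            self.noinput = self.nomatch = 0  # start fresh at the main menu
            return ("goto", "main_menu")
        return ("prompt", NOINPUT_PROMPTS[self.noinput - 1])

    def on_nomatch(self):
        """Ask the caller to repeat, ending with the touch pad suggestion."""
        self.nomatch += 1
        index = min(self.nomatch, len(NOMATCH_PROMPTS)) - 1
        return ("prompt", NOMATCH_PROMPTS[index])


if __name__ == "__main__":
    state = MenuState()
    for _ in range(4):
        print(state.on_noinput())
```

Keeping separate counters for silence and for unrecognized speech lets each kind of problem escalate on its own track, which matches the two distinct prompt sequences described above.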

Linking to the Voice Server

The voice server for this project was provided by BeVocal, a leading VXML hosting service. BeVocal, like other commercial VXML providers, offers various plans according to the size and complexity of the project. In comparison to commercial applications, History Calls was a rather small operation and only required a minimal hosting package. BeVocal provided 24/7 access to a voice server via a single toll-free number with a limit of 5 simultaneous callers. They also provided limited tech support as well as several on-line tools for testing and debugging the History Calls VXML document.

Once the final VXML document was finished and saved on the local Web server, the BeVocal account was created and linked to the local document. Once this link was established, the BeVocal gateway was immediately available to accept incoming calls from patrons at the photograph exhibit. A standard transaction incorporated the following steps: A patron call to the toll-free number was accepted by the gateway, translated, and forwarded to the voice server. The voice server interpreted the voice/DTMF input from the gateway and forwarded this information to the VXML document on the local Web server. The VXML code determined the appropriate response which was sent back through the voice server and translated by the gateway to be received by the patron. Initial tests of the system revealed a few minor glitches that were easily corrected. Changes to the VXML document were effective immediately, as were corrections and updates to audio files.
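The round trip in a standard transaction can be traced with a toy model. The Python below is a simplified sketch under stated assumptions, not a description of BeVocal's service: each function stands in for one component named above, and the file name, the help text, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    kind: str     # "audio" for a pre-recorded file, "tts" for synthesized speech
    payload: str  # sound file name, or the text to be synthesized


# Hypothetical stand-in for the VXML document held on the local Web server.
DIALOG = {
    "1": Response("audio", "photo01.wav"),
    "help": Response("tts", "Say or press the number posted beside a photograph."),
}


def gateway_receive(signal: str) -> str:
    """Gateway: pass caller input from the phone network to the voice server."""
    return signal.strip().lower()


def voice_server_interpret(signal: str) -> str:
    """Voice server: turn recognized speech or a DTMF key into a dialog token."""
    spoken_digits = {"one": "1"}  # assumed, minimal vocabulary
    return spoken_digits.get(signal, signal)


def vxml_document_respond(token: str) -> Response:
    """VXML document: choose the response for the interpreted input."""
    return DIALOG.get(token, Response("tts", "Sorry, I did not understand."))


def handle_call_input(signal: str) -> Response:
    """Trace one input from the caller through all three components."""
    token = voice_server_interpret(gateway_receive(signal))
    return vxml_document_respond(token)


if __name__ == "__main__":
    print(handle_call_input("One"))
    print(handle_call_input("help"))
```

In this model the dialog table is the only part the museum maintains, which mirrors the division of labour in the project: the document and sound files sat on the local Web server, while the telephony and speech components were hosted by the provider.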


Both a print and an on-line survey were made available to all patrons at the conclusion of their visit. Like the tour, the on-line survey was an automated audio experience administered through the user's cell phone. The audio tour dialog contained several prompts inviting patrons to respond to the audio survey as part of their tour experience. These prompts appeared at the beginning of, during, and at the conclusion of the audio tour. Evaluation questions were delivered aurally, and patron responses were received through either voice or DTMF input. The survey responses were automatically entered into a database for evaluation. A unique feature of the on-line survey was an opportunity for the patron to offer an oral comment on the exhibit. This voice message was recorded and saved to the database as a digital sound file.

A print survey was also made available to all visitors to the exhibit. An advantage of the print form was that it reached patrons whether or not they took advantage of the automated phone tour. In this way, general data concerning patron reaction to the exhibit was received as well as important data relating to why they did or did not choose to try the automated audio tour.

Evaluation of the History Calls project revealed several strengths and weaknesses. As predicted, the upkeep of the tour was simple and very low maintenance. Changes could be made to the server quickly and efficiently, and these changes were immediately implemented on the user side.

The first person narratives within the audio tour were a big hit with patrons and received a great deal of positive feedback on the exhibit surveys. Though not often found in audio tours, programs that use a variety of voices and perspectives can add interest and warmth to the audio experience (Schwarzer, 2001). Admittedly, this type of audio could be used in any audio tour system, not just those delivered through cell phones, but hearing the first person narration through their own telephones added to the reality and immediacy of the patrons' experience in a way that would not be possible using the digital players currently in use.

Both voice and DTMF input were tested by patrons. In the 'quiet' environment of the exhibit space, most patrons were reluctant to use the voice input option and relied primarily on their phone key pad. Of those patrons who used the phone tour and responded to the survey, one-half relied solely on DTMF to interact with the system and never used the voice interface. Though shunned by these patrons in the library gallery, voice interaction would likely prove more popular in outdoor venues (zoos, historic sites). Voice interaction is also a powerful accessibility tool and an important benefit to patrons with limited mobility.

Though not encountered in this test, some structures are not conducive to cell phone reception. The difficulty of using cell phones inside some buildings is another factor in favoring outdoor venues for cell phone tours.

A valid concern understood from the outset of the project was the additional cost some patrons might incur by using their cell phone as the audio tour playback device. Only one respondent to the survey indicated that concern about cost prevented him/her from using the phone tour. Three other respondents indicated that this concern limited their use of the tour. Patrons seemed to have widely different service plans, and overall, concern over cost and roaming was limited. At present, the expanding market and strong competition among cell phone providers are leading toward lower costs and more included minutes. It is not unreasonable to expect that concerns over the cost of these types of phone services will diminish in the future as service plans become even more generous.

Evaluation of the VXML technology and phone tours specifically was very positive. All of the respondents to the automated survey indicated that the tour had improved their experience at the gallery and that they would take advantage of similar cell phone tours if offered at other cultural heritage institutions.  


Cell phones, PDAs, and other wireless hand held devices, as well as new hybrids combining these technologies, are radically changing the way many people communicate, interact, and participate in the world around them. This rapid development and innovation in wireless technologies portend a rising need and opportunity to take advantage of these advances to explore new ways of distributing educational and cultural information. One area of focus, explored here, is adapting these technologies to deliver high quality, accurate, and engaging information directly to visitors' cell phones at museums, galleries, zoos, aquaria, botanical gardens, historic sites, and other cultural heritage venues. History Calls is one such project.

The conversational mode employed by patrons as they interact with a VXML system is an engaging and powerful tool for making cultural heritage come alive and for bringing history into the present. The voice input made possible by VXML allows patrons not only to view and learn about cultural artifacts but to talk to them as well. The simple change from digital player to cell phone immediately alters the relationship between the patron and the audio information, and between the patron and the artifact.

Letting the museum/site create and host the audio content while patrons supervise the care and upkeep of the players has many benefits. Beyond the specifics of the History Calls project, this initial study suggests several general benefits from employing a VXML-based cell phone tour system, including:

  • Reducing the expense, time, and frustration associated with maintaining audio players and their support systems
  • Eliminating concerns over player damage, loss or theft
  • Reducing the anxiety some patrons feel when using rented or unfamiliar hardware
  • Simplifying daily updates and other changes in recorded information
  • Expanding the use of audio tours into large outdoor venues where renting machines might be problematic
  • Offering voice interactivity to patrons with disabilities.

Though this experiment was conducted in a museum/gallery setting, the VXML system used in the project could easily be adapted to a host of other cultural heritage sites and applications. The primary goal of the project, to use patron hardware in delivering an automated audio tour, proved successful and invites more research and trials of the use of VXML and cell phone-based systems.


British Interactive Group (1999). "Audio Guides". Anonymous highlight from electronic discussion list. Consulted October 17, 2002. Link no longer active as of March 19, 2005.

Moskalyuk, Alex (2005). Mobile usage. ITFacts, January 18, 2005. Consulted March 19, 2005.

Larson, J.A. (2003). VoiceXML: Introduction to developing speech applications. Prentice Hall, New York.

Martin, D. (2000). Audio guides. Museum Practice, 5.1, 71-81.

Schwarzer, M. (2001). Art and gadgetry: The future of the museum visit. Museum News, July/August, 2001.

Smith, M. (2001). The survey highlights. In M. Schwarzer, Art and Gadgetry. Museum News, July/August, 2001, 9. Consulted March 19, 2005.

Cite as:

Nickerson, M., 1-800-FOR-TOUR: Delivering Automated Audio Information through Patron's
Cell Phones, in J. Trant and D. Bearman (eds.). Museums and the Web 2005: Proceedings, Toronto: Archives & Museum Informatics, published March 31, 2005 at