Entering Through the Side Door - a Usage Analysis of a Web Presentation
Information Retrieval Patterns from Hypermedia Presentations
Public and private organizations are increasingly using Web (World Wide Web) presentations for dissemination of information about their organization and services. Web presentations, structured as a set of inter-linked pages, are supplemented with orientation aids such as indexes, navigation bars, and/or context maps. Guidelines for Web site design recommend both hierarchic structures and the use of page descriptors. A hierarchic structure facilitates theme development through initial presentation of thematic context followed by details placed on pages within the structure. Page descriptors enhance access to the information when using Web search engines. It is believed that the use of hypermedia technology will facilitate information gathering (Bush, 1945; Nelson, 1967; Shneiderman, 1992).
For the general public, gathering information entails retrieval of interesting document sets. Dierking & Falk (1998) and Futers (1997) report that a primary motive for both traditional and virtual museum visits is a personal interest in a topic the visitor expects to find. Dierking & Falk note further that users of technological exhibits differed from the general museum population only in a somewhat more specific topic interest. From these studies, one would anticipate that Web users would search museum sites for specific information, rather than use the site for general browsing.
Providing effective support for information gathering from hypermedia presentations requires an understanding of how the intended public retrieves information. However, little is known about whether the information in Web presentations reaches its intended audience or whether the recipients receive the information that they need and/or have requested (Day, 1995; Futers, 1997). Researchers anticipate a number of problems. As the number of inter-linked documents and path selections increases, user disorientation and cognitive overload may hinder information gathering (Conklin, 1987; Preece, 1994). Link structures may actually hinder location of specific information (MacKenzie, 1996). Hierarchic structures with detail pages, with or without descriptors, can invite search engine entry through the 'side door', i.e. directly to information within the presentation structure. A virtual visitor can quickly get lost, lose interest, and go away without discovering the context of the retrieved information.
In particular, little is known about how Web users retrieve hypermedia information. How is the information selected? How much is viewed? How long does the user spend with an information presentation? Answers to these questions will enable information providers to improve presentations and tailor them to different recipient groups.
Our study is based on the page selection log of sessions for a hypermedia exhibit of social science topics. The data were collected over a 3½-year period in which the exhibit was located at 3 successive sites: as an off-line exhibit in a natural science museum, as an information kiosk in the information area of the School of Social Science, and on the Web. The goal of the study has been to gather information about user behavior at hypermedia exhibits in order to develop a framework for the design of hypermedia presentations. We observed that Web users are significantly more thematically focused than users of the off-line systems. Most Web sessions, 85%, were started from a search engine request using a keyword search. Most of these, 76%, started at a detail page. Sessions starting at a detail page were significantly (p≈0.00000003) shorter than sessions begun at the exhibit's start pages.
A Study of Hypermedia Usage
We have studied information retrieval patterns for users of a small hypermedia exhibit, first implemented as an off-line exhibit and placed in the Museum of Natural Science in August 1996. The exhibit was later moved to the School of Social Science to be used as an information kiosk and, after translation to English, was placed on the Web at http://nordbotten.com/museum.
Since the goal of this project was to study the information retrieval patterns of the general public, no recruitment of subjects was performed. The exhibit has been available to anyone who visited the museum during 1996-1997, the School of Social Science in 1997-1998, or searched the Web from January 1999. Information searchers and browsers determined whether they would activate the exhibit, which topics they would see, the number of exhibit pages they would retrieve, and the length of time they would spend. Three samples of user populations, summarized in Table 1, were studied.
Visitors to the natural science museum that housed the exhibits for the university anniversary celebration were the subjects of the 1st study population. We have assumed that the exhibit users had the same age distribution as the general museum population, i.e. that youths dominated during the school year and that adults and tourists dominated during the summer months.
Users of the information kiosk in the School of Social Science were assumed to be seekers of specific information. We anticipated that this could be observed in topic selection sequences and the use of links to specific detail information. We further anticipated that the social science visitors would find particular interest in an exhibit developed by researchers in the social sciences and thus reflect those Web users seeking specific information.
Only 5 of the Web users, 4%, submitted the exit survey. Of these, 60% were women; 60% were over 40 and 40% were under 20, with none in the young adult group, 20 to 40 years old.
Our anticipation that the information retrieval patterns of Web users would be similar to those of museum visitors was based on the expectation that both groups contain casual browsers, looking for something interesting, as well as goal-oriented seekers of specific information.
The exhibit consists of six topic presentations developed by researchers in the social sciences. Each topic presentation is formed in a hierarchic structure, introduced with a general description page containing embedded text and image links to a set of inter-linked detail pages describing particular aspects of the topic. Most topic pages include 2 images with accompanying text, as shown in the example in Figure 1. Each page is thematically self-contained, and no scrolling was required on the implementation machine, which was operated by a touch screen. Topic presentations ranged from 2 to 8 pages.
All topic pages contain a navigation bar with 5 buttons for <exhibit index>, <topic start>, <next topic page>, <previous topic page>, and <exit>. The <exit> button calls a questionnaire, requesting the following data from the viewer: gender, age group (<20, 20-40, and >40), whether the exhibit was difficult to navigate, and whether the exhibit was interesting.
In addition to the topic presentations, the exhibit contains a cover page, a 2-level hierarchical index, and an overview index. The latter, shown in Figure 2, is accessible via the top button on the navigation bar on each topic page. A time-out, set at 45 seconds, ensured that the off-line exhibits were restarted at the cover page if a user left the exhibit without using the <exit> button.
The <exit> questionnaire had to be discontinued during the museum exhibit because of usage problems. It was reinstalled for the move to the social science information center. However, though the <exit> questionnaire was selected in 53% of these sessions, no questionnaires were submitted. As noted above, only 14 of 180 Web site visitors, 7%, submitted the questionnaire.
The exhibit was initially implemented on an off-line PC with touch screen input using a WebSite™ server with a Netscape™ browser. User sessions were initiated by touching the cover page, causing a transition to the theme index, identical to the left 2 columns of Figure 2. Thereafter, pages were selected by touching an active image, text, or button link. The 2-level index structure required 2 index page selections to reach the 1st topic selection, giving a minimum path length of 3 in a topic session. Sessions were completed on return to the cover page.
Changes made for the Web presentation include: content translation to English, elimination of the time-out feature, and direct access to the overview index from the cover page rather than the original route through the theme index. This last change reduced the path length from the cover to a project presentation from 3 to 2 pages. The WebSite™ server administers the Web site. Access to the Web exhibit is open in the sense that, in addition to access through the cover page, users can access detail pages directly by using Internet search engines.
Placement of the museum exhibit as a Web site has allowed us to test our underlying hypothesis that users of off-line and Web presentations would use similar information retrieval patterns. The Web location was announced through research workshops and conferences, research papers, and as links from the authors' home pages. In addition, 7 Internet crawlers visited the site over 500 times during the study period.
Data Analysis Procedures
Log data were collected for 4 periods: 2 from the museum exhibit, in fall 1996 and summer 1997; from the social science information kiosk during fall 1998; and from the Web exhibit during the whole of 1999. The browser cache for the off-line exhibits was set to null, ensuring that each page was logged. The Web exhibit log contains only initial page selections, i.e. pages revisited through the user's local cache are not recorded in this log. The log data used for this study include: the name of the requested page, the date and time of its selection, the name of the calling machine, the previous exhibit page, and, for the Web exhibit, the initial search string.
Data preparation for the analysis of information retrieval patterns included separation of sessions, generation of a page transition matrix, and calculation of session length in time and number of pages. Development and maintenance sessions were excluded. Session times were dropped from the off-line to Web comparisons due to the characteristics of the Internet that cause long page construction times compared to the off-line presentations.
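The preparation steps named above can be illustrated with a minimal Python sketch. It assumes that sessions have already been separated and reduced to ordered lists of page names; the page names and the function names `transition_counts` and `session_length` are hypothetical and are not taken from the study's own tooling.

```python
from collections import Counter
from typing import Dict, List, Tuple


def transition_counts(sessions: List[List[str]]) -> Dict[Tuple[str, str], int]:
    """Count page-to-page transitions over all sessions.

    The result is a sparse page transition matrix keyed by
    (from_page, to_page).
    """
    counts: Counter = Counter()
    for pages in sessions:
        # Pair each page with the page selected immediately after it.
        for src, dst in zip(pages, pages[1:]):
            counts[(src, dst)] += 1
    return dict(counts)


def session_length(pages: List[str]) -> Tuple[int, int]:
    """Return (pages selected, unique pages selected) for one session."""
    return len(pages), len(set(pages))


# Hypothetical sessions, for illustration only.
example_sessions = [
    ["cover", "index", "topic_a", "topic_a_2"],
    ["topic_a_2", "index"],
    ["cover", "index", "topic_b"],
]
example_matrix = transition_counts(example_sessions)
```

A sparse dictionary is used here instead of a full matrix because most page pairs never occur as a transition in a small exhibit log.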
Table 2 gives the central definitions used for the analyses. Note that off-line sessions were initiated by a transition from the cover page since all pages were called from the host machine. Web sessions were identified by a change in the calling machine and/or a change in date.
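The Web session rule stated above (a new session on a change in the calling machine and/or a change in date) could be sketched as follows. The `LogEntry` record and its field names are assumptions made for illustration; the actual log format of the WebSite™ server is not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class LogEntry:
    """One logged page request (field names are illustrative assumptions)."""
    page: str
    timestamp: datetime
    host: str  # name of the calling machine


def split_web_sessions(entries: List[LogEntry]) -> List[List[LogEntry]]:
    """Start a new session when the calling machine or the date changes."""
    sessions: List[List[LogEntry]] = []
    previous: Optional[LogEntry] = None
    for entry in sorted(entries, key=lambda e: e.timestamp):
        starts_new_session = (
            previous is None
            or entry.host != previous.host
            or entry.timestamp.date() != previous.timestamp.date()
        )
        if starts_new_session:
            sessions.append([])
        sessions[-1].append(entry)
        previous = entry
    return sessions
```

Applied literally, this rule splits a visit whenever requests from two machines interleave in the log, so the sketch is best read as an illustration of the definition rather than a robust session detector.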
Table 3 gives a summary of the session characteristics for the 3 study populations.
Table 3. Session Profiles
About 35% of the 1207 sessions contained no topic selections or, for Web visitors, only the initial page retrieved by the Web search engine. It was assumed this indicated that the exhibit was considered uninteresting. These sessions were eliminated from further session analysis, leaving 765 sessions in which at least one topic was selected.
Fewer than 50% of these sessions contain more than 1 topic. On average, fewer than 4 topic pages were selected during a session and fewer than 3 pages were chosen per topic. Topic interest, measured in number of detail pages selected, fell from 70% to 50%, 38%, and 31% for the 2nd, 3rd, and 4th pages, respectively (Nordbotten & Nordbotten, 1999).
For visitors to the museum exhibit, topic selection was strongly correlated (0.9) with topic placement in the indexes. That is, topic selection was top-down in the index, most frequently choosing the 1st topic in the 1st theme, followed by the 1st topic in the 2nd theme, and so on. 30% of the sessions contained more than one topic. Once a topic was selected, 80% of the detail page selections were made using the <next> button, indicating that the presentation was read in a serial manner.
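A correlation between index placement and topic selection can be computed along the following lines. The text does not state which coefficient was used, so this sketch assumes a Pearson correlation between each topic's index position and its popularity rank (1 = most frequently selected); the selection counts in the example are invented for illustration and are not the study's data.

```python
import math
from typing import List


def popularity_ranks(selection_counts: List[int]) -> List[int]:
    """Rank topics by selection count, 1 = most selected.

    Ties are broken by index order (an assumption of this sketch).
    """
    order = sorted(range(len(selection_counts)),
                   key=lambda i: -selection_counts[i])
    ranks = [0] * len(selection_counts)
    for rank, topic in enumerate(order, start=1):
        ranks[topic] = rank
    return ranks


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical selection counts for six topics, listed in index order.
index_positions = [1, 2, 3, 4, 5, 6]
hypothetical_counts = [50, 40, 30, 20, 10, 5]
r = pearson(index_positions, popularity_ranks(hypothetical_counts))
```

With strictly top-down selection, as in the invented counts above, popularity rank follows index position exactly and the coefficient reaches its maximum; weaker dependence on index placement lowers it.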
Visitors to the information kiosk showed more interest in the exhibit content than the museum visitors. Most sessions, 78%, were topic sessions. Initial topic selection was less dependent on index placement (correlation 0.78). 40% of the sessions contained more than one topic. 45% of the detail page selections used the embedded links. The length of the sessions and the number of pages viewed per topic also increased. One significant difference from the museum visitors (p=0.000009) is the use of the embedded links for detail page selection.
Web site users
Most Web sessions, 85%, began from a keyword search using a search engine; the rest started from direct input, bookmarks, or links from other pages. Most of the search engine starts, 76%, began at a topic page. The percentage of Web visitors who selected topic sessions was lower than for off-line users, particularly when compared to the kiosk users. This may indicate that Web visitors were looking for specific information and were able to immediately judge the relevance of the retrieved exhibit page.
The start page of a topic session significantly influenced the session profile. Only 13% began at the exhibit cover page. These sessions were rich in content and significantly (p≈0.00000003) longer than sessions that began at a detail page. They averaged 5.0 unique topic pages and 1.9 topic selections. 30% contained more than 5 (up to 13) topic pages with up to 5 topic selections.
There were 90 topic sessions that began at some detail page, thereby entering the exhibit 'through the side door'. Of these, 40% began within a topic presentation hierarchy, thus missing the topic context given in the presentation page. Half of these selected an index, presumably to gain a context for the original page. Topic sessions that began at the presentation page and those that started within the presentation both averaged 2.9 topic pages. Only 8 sessions contained more than 1 topic selection and only 6 contained more than 5 topic pages, ranging up to 8 pages.
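The distinction between front-door and side-door entry can be expressed as a simple classification of each session's first page. The sketch below is illustrative only: the page names and the two page sets are hypothetical placeholders for the exhibit's actual cover, index, and topic presentation pages, and treating index pages as front-door entries is an assumption.

```python
from collections import Counter
from typing import List, Set

# Hypothetical page names; the exhibit's real file names are not given here.
FRONT_DOOR_PAGES: Set[str] = {"cover", "theme_index", "overview_index"}
PRESENTATION_PAGES: Set[str] = {"topic_a", "topic_b"}


def classify_entry(first_page: str) -> str:
    """Label a session by the kind of page on which it started."""
    if first_page in FRONT_DOOR_PAGES:
        return "front door"
    if first_page in PRESENTATION_PAGES:
        return "side door: presentation page"
    # Any other page is taken to lie within a topic presentation hierarchy.
    return "side door: within topic"


def entry_profile(sessions: List[List[str]]) -> Counter:
    """Tally entry types over all non-empty sessions."""
    return Counter(classify_entry(pages[0]) for pages in sessions if pages)
```

Comparing session lengths across the resulting groups would then reproduce the kind of front-door versus side-door comparison reported in this section.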
Only 14, 12%, of the Web sessions exited using the <exit> button that presented the viewer with a simple demographic survey. Only 5 survey responses were submitted, which is insufficient to give any picture of the Web users.
From the above, we can identify two Web user groups: those who search for information about a specific topic and those who browse within a general exhibit. For each group, we sought information about how topics were selected, how topics were navigated, and the length of a retrieval session. A summary of these characteristics is given in Table 4 below.
Table 4. Information Retrieval Characteristics
In this study, we have had an opportunity to study the information retrieval characteristics of 3 potentially different user groups at 3 different locations: in a museum setting, at an information kiosk, and on the Web. Our goal has been to identify characteristics that can help support the design of effective information presentations. In particular, we have been interested in the possibility of using off-line designs as tests for Web site designs.
Observations and Proposals
The similarities in information retrieval behavior between off-line and Web users, including:
indicate that hypermedia topics should be relatively short and built in a 'normal' reading style. Further, off-line users select topics from the top of the index, which indicates that particular attention to topic sequence is important.
While Web users tend to read selected topics in a serial manner, they start their sessions following a search engine selection of some topic page, thus avoiding the index structure of the exhibit. These sessions are very short, indicating that many viewers may not find information without its context. It appears that designers of hypermedia Web presentations should focus on short, self-contained page sets, where any page could be an entry point to the presentation. Indexes are not necessary as entry guides, but can be useful for the interested viewer to gain an overview of the presentation content.
Similar Studies and Further Research
Our study supports earlier observations of museum users (Futers, 1997; Dierking & Falk, 1998), in that Web users tended to select information on only 1 topic from our museum site, rather than browse generally.
Clearly, any study of user activity needs to be supplemented with demographic and intention data. Unfortunately, our attempt to solicit demographic information from users of the Web exhibit was unsuccessful. More work in this area is needed.
Other structures for presentations can be explored. It is possible to include all topic information in a single file with embedded links to detail sections. Two drawbacks to this structure must be considered: 1) the large page will bring unnecessary information to many viewers and will increase transport and set-up time, thus markedly increasing response times for the user, and 2) it will become more difficult to monitor topic interest and thus adjust topic presentation to the intended user groups.
In conclusion, it appears that studies of off-line exhibits, where user demographics can be obtained, can inform the design of Web presentations, particularly for those visitors who enter through the front door, at the top of the exhibit structure.
Design of exhibits for side door visitors remains a challenge.
This project was begun as part of the 50- and 25-year anniversary celebration activities for the University of Bergen and the School of Social Sciences, respectively. Thanks are extended to staff, faculty, and students of the Bergen Museum, School of Social Science, and the Department of Information Science for their help in the construction and testing of the electronic exhibits. Special thanks are extended to Professor Svein Nordbotten for all project support.
Bush, V. (1945). As we may think. Atlantic Monthly, July, 176 (1), 101-108.
Conklin, J. (1987). Hypertext: An introduction and survey. IEEE Computer, 20 (1), 17-41.
Day, G. (Ed.) (1995). Discussion. Proceedings. Museum Collections and the Information Superhighway. Science Museum, London. http://www.nmsi.ac.uk/infosh/discuss.htm.
Dierking, L.D. & Falk, J.H. (1998). Understanding Free-Choice Learning: A Review of the Research and its Application to Museum Web Sites. In D. Bearman & J. Trant (Eds.) Museum and the Web 97-99: Special Edition Proceedings. CD ROM. Archives & Museums Informatics. 1999. Also http://www.archimuse.com/mw98/papers/dierking/dierking_paper.html
Futers, K. (1997). Tell Me What You Want, What You Really, Really Want: a look at Internet user needs. Mda. http://www.open.gov.uk/mdocassn/eva_kf.htm.
MacKenzie, D. (1996). Beyond Hypertext: Adaptive Interfaces for Virtual Museums. TAMH Project Report, Tayside, Scotland. http://www.dmcsoft.com/tamh/papers/evaf.php3
Nelson, T.H. (1967). Getting it out of our system. In G. Schechter (Ed.) Information Retrieval: A Critical Review. (pp. 191-210). Thompson Books.
Nordbotten, J. & Nordbotten, S. (1999). Search Patterns in Hypertext Exhibits. Proceedings of HICSS 32, Maui, HI, USA, Jan. 4-8, CD, IEEE ISBN 0-7695-0001-3.
Preece, J. et al. (1994). Human-Computer Interaction. Addison-Wesley.
Shneiderman, B. (1992). Designing the User Interface - Strategies for effective Human-Computer Interaction (2nd ed). Addison-Wesley.