Published: March 1999.


A Model to Support Literary Research Collections on the World Wide Web

Colleen Phelan, City University, United Kingdom and Micheline Beaulieu, University of Sheffield, United Kingdom


Within ever-expanding Web venues for digitized museum objects - and aside from a few major projects similar to those being developed at Cornell (Hickerson, 1997) and the Library of Congress Manuscript Reading Room - there is little information delivery to researchers who wish to access museum library collections in all the richness that can be achieved through multimedia capabilities. In addition, the perhaps 'less exciting' objects that lack visual cachet such as those that comprise literary collections - the subject of this paper - are noticeably neglected. Whilst some literary libraries like the Jane Austen Collection present quite extensive guides and catalogue holdings on the web, full digitized representations of original materials remain sporadic. In addition to the cultural property considerations which prevail, this situation may be due to the daunting complexity of the materials and little knowledge about the researchers themselves.

This paper will discuss the diverse materials that comprise some literary research libraries with specific examples given from The Pickwick Papers collection held in the Dickens House Library, London. A sample group of researchers will then be discussed in terms of their diverse backgrounds, expertise levels, tasks and their expectations of a multimedia interface. Next, using the results of a user-evaluation of an experimental hypermedia interface to The Pickwick Papers, suggestions for a variety of website designs are presented here as initial contributions to the development of an overall model for the presentation of literary research collections. Concluding remarks will discuss design problems and suggestions for further research.

The diversity of literary research collections

Investigations of three research collections of 19th C. literature in the UK - The Pickwick Papers (The Dickens House, London), Jane Eyre (The Bronte Parsonage, Haworth) and The Idylls of the King (The Tennyson Collection, Lincoln) - reveal similarities in types of materials overall but with prevailing diversities in content and physical formats as indicated in the following list:
Primary materials:
  • Autograph mss.
  • Serialized issues e.g. monthly parts
  • First editions or rare copies (as bound volumes)
  • Illustrations
  • Plagiarisms and piracies
  • Adaptations
  • 19th C. reviews, critiques and essays in e.g. Blackwood's Magazine, The Athenaeum, etc.
  • Advertisements
  • Autograph letters
  • Paintings, drawings e.g. portraits of author and contemporaries; topography, etc.
  • Photographs (as original media)
  • Collected Editions (sets of bound volumes)
Secondary materials:
  • Special editions e.g. student editions (with scholarly introduction and annotated text)
  • Journal articles e.g. reviews, essays, etc.
  • Thematic treatments e.g. religious, psychological, topographical, etc.
  • Background information on the 19th Century i.e. political, social, cultural and artistic
  • Biographies
  • Genealogies i.e. family trees, histories, etc.
  • Indexes, encyclopedias, glossaries
  • Adaptations (radio, TV, film) including scripts; audio and video recordings
  • Bibliographies
  • Library catalogues of related collections e.g. The British Museum
  • Exhibition catalogues
These materials bear complexities that are trademarks of the collections; individual items cannot stand alone and each comprises a multi-facetedness that creates an interconnecting web of relationships. For instance, an autograph letter can carry biographical and professional associations as well as temporal, topographical, historical and cultural attributes; relationships therefore can emanate from the correspondence collection to biographical publications, thematic treatments, broader background materials and to the picture collection. Additionally, dissimilarities prevail amongst sets of like materials; early editions of a work produced by the same publisher can vary in physical construction, format of text, illustrative plates, and in the wording of the text itself. A grey area that is a continual challenge to cataloguers, and an example of collection discrepancies, is archival material found inside books, either loose or pasted down; this can include letters, sketches, photographs, newspaper clippings, personal notes, etc. An ever-present dilemma, with policies varying from library to library, is whether to extract and catalogue these items separately or maintain the material as an integral element of the book (Sproat, 1998).

Finally, missing materials can affect a collection's comprehensiveness and consistency, though adding rarity value to extant items. For example, autograph manuscripts, sometimes surviving as mere fragments, can be scattered throughout other institutions creating access problems for researchers.

Literary researchers, levels of subject expertise, and tasks

Researchers are an unknown quantity in some respects; aside from being dismissed as 'scholarly' there is little information about them and the tasks they perform, although some findings in the archival world (Pugh, 1982; Spindler & Pearce-Moses, 1993) and on the Web (Daniels, 1997) help to refute this. Our sample of literary researchers taking part in the study, though limited in numbers (fifteen) and in representativeness for a potential web audience (they have access to the Dickens House Library), reveals diversity in who they are, their subject expertise and their tasks - negating the myth that users of such collections are academics only. They are in several respects 'the general public' and the following list is a sample of the range of backgrounds represented:
  • Museum staff
  • Instructors including university, college and A-level/high-school
  • Students including university, college and A-level/high-school
  • Journalists, publishers and producers from the fields of newspaper and book publishing, theatre, TV, film, and radio
  • Performing artists including theatre, radio, TV, and film
  • Designers
  • Historians
  • Hobbyists and enthusiasts
Researchers' subject expertise falls into three main divisions - any one of which can include members from the above groups. At one extreme is the novice learner - perhaps a journalist - who is approaching the collection for the first time with little or no subject knowledge whilst at the other end is the expert - the curator - with in-depth mastery. A third and motley crew spans a murky mid-range with an expertise that appears to straddle the two opposite poles in various degrees. They may have a well-developed interest or expertise in one area whilst lacking knowledge in others - a screenwriter, for instance, who knows main characters and dialogue but is deficient in background information which could aid his interpretation.

The types of research tasks conducted are mainly interpretive; that is, researchers either gather and interpret the content for themselves as a learning exercise or for an audience, such as a readership. For instance, two of our researchers are 'absolute beginners' with learning goals, whilst several others are knowledgeable authors. Less frequently, tasks are conducted "to order" - for example, the simplest form of picture research where a specific illustration is accessed, perhaps by title, with little personal initiative required.

Within this framework, researchers transfer their expectations of a traditional collection's functionality to that of an IT interface; they utilize the system in three basic ways, as follows:

  • A tutorial system to aid and enhance users' learning
  • A 'traditional' library system to provide expected materials to users familiar with the content and new resources to enhance novices' learning.
  • A benchmark system which can provide authoritative examples of objects.
Researchers' tasks can range from one- to multi-dimensional, exhibiting a mixed agenda of usage, and moving amongst the three functionalities. For instance, whilst a casual learner happily moves from topic to topic and is content with the system as an entertaining learning aid, an A-level English teacher, herself a subject novice, uses the system for two different purposes - first, to teach herself about The Pickwick Papers, exhibiting a careful and deliberate investigation, and secondly and simultaneously, as a library from which to choose materials for her students. Researchers who know the collection and seek specific items have high expectations and wish to find materials quickly, as they would in the library proper. For example, the exhibition designer expects an image index at the outset in order to make appropriate selections of objects from the picture collection. A subject expert preparing a paper on topographical aspects demands not only bibliographical references but the full-text of materials he has used previously.

As a benchmark system, the interface has the potential to support synchronous in-house tasks where the authenticity, identification and provenance of recent rare acquisitions must be established, such as autograph letters, presumed first editions, or photographs. Instead of removing old and fragile materials from library shelves, digitized surrogates can act as touchstones for comparison, aiding conservation by reducing the need to handle original materials. For example, the museum librarian avoids possible damage to the rare Cheap Edition of Pickwick by accessing images of the text as an aid to cataloguing a recent acquisition.

19th Century Literature: suggested structures for web designs

The results from our user evaluation disclose a definite schism between information needs of subject novices and more knowledgeable users - not only a matter of depth and breadth but of type and quality of content. In addition, presenting a 19th C. text and attendant materials in a multimedia format for research-level tasks creates unique design dilemmas emanating from their constructions and often idiosyncratic, voluminous and variable content. The following sections provide suggestions for discrete designs that can support different user levels and tasks in conjunction with presentations of complex literary materials.

Terminology of index terms and menu design

Access to system content is through menus of index terms, or access points, acting as hypertext links to further information; choice of terminology and its arrangement within menus go hand-in-hand and their user-appropriateness can affect search progress, particularly at the outset (Norman, 1990). This is crucial in CONTENTS, which is generally a user's first point of contact. The challenge here is to provide terms that are understandable yet not patronizing for subject novices whilst remaining loyal to the expectations of more knowledgeable researchers. Where the latter approach the system with subject-specific frames of reference for broad terms such as 'topography' or the more dialectic 'Extra Illustrations', the former are left guessing, not knowing where these links will take them. Rather than implementing alternative terms, a frequently suggested solution is an optional pop-up note with a brief one-line definition as an aid to decision-making.

In terms of arrangements, learners prefer the semantic groupings of the terms in CONTENTS where access points for similar subjects are gathered together. However, they require additional authority headings as the meanings behind such arrangements are not immediately apparent (even to the experts!). Arrangement of access points in an index should provide introductory information at the top as learners tend to investigate each term from top to bottom, at least in their first encounters with the system, aiding their learning of the basics before proceeding further. For example, the key elements of The Pickwick Papers are presented as the first grouping. However, which term should be presented first in this group in order to garner learners' attention? In this instance, "Illustrations" is appropriate and logical as it suggests visually interesting material and is integral to Pickwick and others of Dickens's works. The approach of other users to a CONTENTS index arrangement is one of expectancy rather than discovery, where they prefer to locate specific known information quickly via a scrollable alphabetical index.

In menu implementations, an important consideration is the growth of the system. Whereas a limited selection is presented in our prototype, the all-inclusive Pickwick collection would be extensive and its index entries would proliferate throughout. Given the varieties in users and tasks, the optimal provision is sets of individual indexes suited to specific approaches. As an attempt, in the current system the photograph of a pub can be selected by the name of the establishment or its neighbourhood. Even considering the small number of index terms, however, our results reveal that trying to provide this variable access in a single browsable menu is just a recipe for clutter and disorientation. As one subject expert states: "It's confusing. You have to think about it."
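The idea of one index per approach, instead of a single mixed menu, can be sketched as follows. This is a minimal illustration in Python and not the prototype's implementation; the record fields (`title`, `establishment`, `neighbourhood`) and all sample values are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical catalogue records; field names and values are placeholders.
PHOTOGRAPHS = [
    {"title": "Photograph 1", "establishment": "Example Inn", "neighbourhood": "District B"},
    {"title": "Photograph 2", "establishment": "Sample Tavern", "neighbourhood": "District A"},
    {"title": "Photograph 3", "establishment": "Another Inn", "neighbourhood": "District B"},
]


def build_index(records, field):
    """Return one alphabetized index for `field`: term -> sorted record titles."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record["title"])
    return {term: sorted(titles) for term, titles in sorted(index.items())}


# One index per approach, each offered as its own menu rather than merged.
by_establishment = build_index(PHOTOGRAPHS, "establishment")
by_neighbourhood = build_index(PHOTOGRAPHS, "neighbourhood")
```

Keeping the indexes separate means each menu contains only one kind of term, which is one possible way to avoid the clutter reported for the single browsable menu.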

Introductory or background information

Novices expect a topical level of introductory information, i.e. without any scholarly references in the text, perhaps similar to that found in student study guides which provide clear outlines and easy-to-read text for the basics of a work. A more scholarly background may be too comprehensive and at times erudite for users who wish to conduct brief information gathering whilst first learning about a subject. Learners do want emphasis on salient elements; for example, where two different artists may have depicted the same scene, the Illustrations background should include brief biographies of the artists and critical remarks on their creations.

All users expect an indication of the parameters of the system content. Novices, in addition, require a glossary of dialectical terms associated with the subject area. For example, the autograph letters background should include definitions and illustrative samples of autograph, transcription and annotation so that these researchers can avoid confusion when they need to select between 'transcriptions index' and 'autographs index'. Our more expert researchers expect a self-contained reference library as introductory information. Aside from an authoritative scholarly overview as mentioned above, they need subject-specific dictionaries, glossaries, encyclopedic entries (illustrated if appropriate) and bibliographies. So perhaps rather than a 'background', each subject area could have its own 'reference section' instead.

Reproduction of full-text publications

Full text publications vary in content, style and intended audience. In a literary research collection these can range from interpretive treatments aimed at a diverse readership to scholarly critiques with voluminous footnotes and obscure Latin abbreviations. Whilst all of our users find the former acceptable, learners dismiss the latter as confusing and thus meaningless. Meanwhile, frequent library users familiar with the collection expect the complete range of secondary sources to be available in an IT system.

The importance of presenting contemporary secondary materials (e.g. 19th C.) cannot be overemphasized, as they provide exciting and enriching contextual perspectives that appeal to all levels of researcher. Although implemented for subject experts, our A-level teacher selects early journalistic reviews of The Pickwick Papers as an aid to her students' understanding of the writing styles, the social mores of the period and the public reception of the novel; in addition she believes they would give an exciting 'immediacy' to the subject that cannot be captured in later publications.

Temporal or chronological information

Chronologies are commonplace in historically-laden subject areas with presentations found in texts and multimedia. Our integrated and comprehensive implementations are useful to all of our researchers depending on their subject knowledge and the appropriateness of the topic. For example, The Pickwick Papers has biographical elements associated with Dickens's life and concurrent with his writing and other activities. Whilst subject experts expect this narrative-related information to be presented in a biographical chronology, learners require only his life and writing career. Depending on their content, some topics such as publication history can be decomposed and arranged in a unique array of voluminous temporal information, creating, as one researcher comments: "...the most comprehensive set of dates on Pickwick production yet." The current successful designs are arranged in traditional vertical columns of dates and events spread width-wise across the screen to illustrate parallel activities; more expansive designs require a horizontal scroll-bar as well as the standard vertical.
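The column layout described above, with dates running down the screen and parallel activities spread across it, can be sketched as a small data transformation. This is only an illustrative sketch in Python; the function name `parallel_chronology`, the strand names and the placeholder events are assumptions and do not come from the prototype.

```python
def parallel_chronology(strands):
    """Arrange dated events into rows with one column per strand.

    `strands` maps a column name to a list of (date, event) pairs, where
    dates are sortable strings such as "1836-04". Returns the column names
    and a list of rows, each row being (date, [one cell per column]).
    """
    columns = list(strands)
    dates = sorted({date for events in strands.values() for date, _ in events})
    rows = []
    for date in dates:
        cells = []
        for column in columns:
            matching = [event for d, event in strands[column] if d == date]
            cells.append("; ".join(matching))  # empty string if nothing happened
        rows.append((date, cells))
    return columns, rows


# Placeholder strands; the dates and events are hypothetical.
SAMPLE = {
    "Life": [("1836-03", "life event A"), ("1836-05", "life event B")],
    "Publication": [("1836-05", "publication event A"), ("1836-04", "publication event B")],
}
columns, rows = parallel_chronology(SAMPLE)
```

A learner-level view could pass only the life and writing strands, while an expert view adds further columns, at the cost of the horizontal scrolling noted above.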

Access to related materials

All researchers demonstrate a desire to explore an object's associated information, both within the specific subject area and across the system. For example, the image of a Dickens autograph letter, as well as its transcription and an annotated version, have an interlocking web of links and, where handwriting is illegible, museum staff find these handy sources for deciphering correspondence, dating, etc. As an example of cross-system associations, a photograph of a street scene such as Whitechapel High Street can link to a London map, relevant characters, the original parts, journal articles and novel excerpts. This is particularly helpful information to an author who is writing a book on Dickens's early career, enabling him to trace connections between Dickens's life and the biographical intimations in the novel. In terms of design, orientation must be maintained through spatial context, i.e. keeping the parent object on screen, either at its current size or as a 'thumbnail'.
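The linking behaviour described above, following an association while keeping the parent object visible, can be sketched as a link table plus a display state. This is a hedged sketch in Python; the identifiers in `LINKS` and the function `open_related` are hypothetical names chosen for illustration.

```python
# Hypothetical object identifiers mirroring the examples in the text.
LINKS = {
    "letter-image": ["letter-transcription", "letter-annotated"],
    "street-photograph": [
        "london-map",
        "characters",
        "monthly-parts",
        "journal-articles",
        "novel-excerpts",
    ],
}


def open_related(parent, target, links=LINKS):
    """Return a display state that keeps the parent on screen as a thumbnail."""
    if target not in links.get(parent, []):
        raise KeyError(f"{target!r} is not linked from {parent!r}")
    return {"main": target, "thumbnail": parent}
```

For example, `open_related("street-photograph", "london-map")` yields a state in which the map is the main object and the photograph remains as the thumbnail, so the researcher can see where the link started.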

Displaying components of a novel: what to present and how much

One of the beauties of multimedia is the ability to extract information from its parent source via digital scanning and to present it as a separate entity for easier access and examination. The 19th C. novel lends itself to such presentations due to its numerous and various attributes; these include the binding and pages, inserted ephemera, illustrations and individual parts and chapters, all of which can be voluminous. As an example, a bound first edition of The Pickwick Papers is over 800 pages long and is divided into 20 parts and 57 chapters; it includes 46 illustrations by three different artists, as well as a frontispiece and a vignette; inserted and bound-up with the volume is a cover from a first edition of the monthly parts. The story itself includes over 100 characters and at least 60 topographical sites as settings for its many episodic situations.

The main problem is to determine how much material to present online and to whom. One could argue that learners should be given a few well-chosen samples rather than full sets; this appears to be a matter of content. All expertise levels accept large numbers of visually-captivating illustrations; they either browse through or choose specifically from the index. However, learners prefer only two novel excerpts at the most in any one file, indicating that too much text, particularly several entries, can be overwhelming for those approaching a subject for the first time.

Comparative material is another matter. Some illustrations lend themselves to comparison and contrast, perhaps due to similar subject matter; whilst novices expect no more than two of the most prominent of these to be accessible and shown together, other users may expect all possible examples. For instance, an author of an article on a Pickwickian character, Joe the Fat Boy, requests examples of illustrations of him from the past 160 years - a task which is possible within the Dickens House library but which may seem insurmountable via digitized delivery. An equally acute problem is the presentation of novel excerpts where the same researcher demands examples of descriptions of the character from the very first monthly parts issues and numerous subsequent editions.

Navigation: supporting browsing and specific information needs

The physical structure of the novel, and sets of objects, lend themselves to a simulation of 'paging through' or laterally navigating in a digitized format. For example, we presented elements of several editions of Pickwick in this manner, as well as illustrations, letters, manuscript fragments, photographs, map segments and novel excerpts. Our attempts were not as successful as expected, however, and this may be due to our own poor design or the confinement of the physical display (Marchionini, 1995); it may also be due to what appears to be an intuitive preference to search vertically through large sets in one file. A particularly successful navigation design is a scrollable 'side-bar' index which is presented simultaneously with a scrollable images index - a boon to learners when browsing through unfamiliar titles as the ability to view the text and picture together 'makes more sense' contextually. For other researchers, however, additional and more efficient finding aids are required for searching sets; suggestions are an alphabetized index and a search engine.

This keys into the provision for different approaches to material and a means to determine a researcher's needs at the outset. For example, a museum staff member may wish to select an illustration by title or theme and expect the groupings of images to be arranged accordingly. However, one of the idiosyncrasies of works like Pickwick is that their illustrations are conjoined with specific chapters; subject experts expect to search through the set of images or its index in this chronological order, which may be at odds with other required arrangements. Novices accept this series as well, as they have no preconceived approaches to the sequence of illustrations in the story; maintaining this design for them may provide instruction about the publication's production in an interesting visual manner.

Digitized images of museum objects

The advantages of digitization of museum collections on the web are by now well known - conservation of fragile and rare materials coupled with information delivery of those same objects to the widest group of users yet. However, can the expectations of all researchers be met in the degree of quality and access provided via digitized surrogates rather than by the 'real thing'? Researchers who have access to a library's original materials have high expectations for image quality. In fact, at least one frequent library user states that a screen version is 'too flat' and could never deliver the depth and detail he is used to. All but our two absolute beginners immediately detect imperfections in the illustrations from Pickwick, stating that they prefer to use the original - and knowing that they could. This is all well and good for those residing in Central London but Internet users in Washington state do not have that option. The optimum is to achieve and maintain loyalty to the original, including any imperfections.

In terms of examining objects, all researchers immediately search for ways in which to enlarge images; this is due in great part to reliance on that old favourite - the magnifying glass - and to previous multimedia experience where 'zoom' and 'thumbnail' are commonplace. However, researchers who are familiar with particular objects, e.g. the bindings of bound volumes, express a desire to view them in their entirety, as they would in the library, instead of as individual elements alone; in fact, the latter display is considered a secondary preference.

Image documentation

The suggested types of image descriptions are factual and contextual and the depth and range of interpretive information within these categories is variable, again depending on the expectations of the researchers and their tasks. The documentation can be akin to exhibition labelling as well as to record-keeping.

a. Factual information

Factual information consists of source, production, description and directional documentation. 'Source' or provenance documentation is standard in museum cataloguing and is required information by most in-house staff but extraneous to other researchers. In terms of 'production', the basics expected by all user levels consist of title/name of object, creator's name and date of production. However, subject experts desire more obscure production information, revealing a sophisticated multi-faceted approach. For example, text for a monthly parts cover should emphasize the biographical and chronological relationship between its publication and that of the actual writing of the manuscript by Dickens. Likewise, an autograph manuscript should bear indications of its place in the day-to-day writing of the novel in order to offer a more rounded view of the author's writing activities. A dilemma occurs for a novice/mid-range user whose needs are variable, wavering between the obvious and the obscure. For example, a student of book-binding requires basic information connected with the materiality of the book alone, some of which may be extraneous to casual learners, such as 'name of binder'. Yet the breadth and depth of information for the same image in the main interface, with its associated relationships, may be excessive and irrelevant to this same user.

'Descriptive' documentation is required by researchers handling objects or studying them in-depth. For example, a rare books buyer for the museum needs extensive descriptions of material i.e. cloth or leather, decoration and imperfections for comparisons with potential purchases; the same information is useful to a scholar discussing comparisons between rare early copies for an introduction to a new edition of the work. In addition to this information, museum staff need clear indications of an object's condition and fragility as these affect handling and conservation decisions and procedures.

'Directional' information is expected by staff members as well, as it can guide them to the originals in the library. Aside from standard classification numbers, they require negative numbers for quick picture retrieval; these users also suggest exact locations in the library, i.e. rooms, drawers, cabinets and shelves. Additionally, objects deposited in other institutions should be indicated. For example, all but one of the few remaining Pickwick manuscript fragments are scattered throughout USA collections. Providing such information, inclusive of catalogue numbers, phone, email and addresses, can be helpful to researchers who have the opportunity to visit.

b. Contextual Information

Contextual information, akin to the interpretive labelling that often accompanies museum exhibits, aims to extract the image from its abstract factual state and present associations with other themes, thus providing information that can extend beyond the editorial. Balanced presentations remain a design challenge. The documentation required is variable depending on the content of the image itself and the user's task. Overall, learners require no more than an unembellished description of theme or subject. A single Pickwick illustration acquires different contextual attributes, however, for other users. For instance, an actor expects descriptions of characters in terms of their physical characteristics i.e. style of dress, stance, speech and peculiarities such as Sam Weller's cockney speech. In addition, he requires novel excerpts, i.e. Dickens's own words, to describe the character in order to aid his stage presentation. In contrast, for a staff member conducting basic picture research, captions should merely be concise and indicate a character's name, personality type, setting and situation in the novel. A scholar's approach may be entirely different to these, expecting documentation that describes relationships between characters which are not obvious to the untutored eye: "You need to describe how they look at each other; what lines are drawn between them."
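The split between factual and contextual documentation, and the way the displayed subset varies with the researcher, can be sketched as a record type with an audience filter. This is a minimal sketch in Python under stated assumptions: the class `ImageDocumentation`, its field names, the audience labels and the field groupings are hypothetical and simplify the categories discussed above.

```python
from dataclasses import dataclass, field


@dataclass
class ImageDocumentation:
    # Factual documentation: production basics expected by all user levels.
    title: str
    creator: str
    date: str
    # Factual documentation mainly required by in-house staff.
    source: str = ""        # provenance
    description: str = ""   # material, decoration, imperfections, condition
    location: str = ""      # directional: classification number, shelf, etc.
    # Contextual documentation.
    theme: str = ""         # unembellished description of theme or subject
    context_notes: dict = field(default_factory=dict)  # audience -> note


BASIC_FIELDS = ("title", "creator", "date", "theme")
STAFF_FIELDS = BASIC_FIELDS + ("source", "description", "location")


def fields_for(doc, audience):
    """Return the documentation shown to a given (hypothetical) audience label."""
    names = STAFF_FIELDS if audience == "staff" else BASIC_FIELDS
    shown = {name: getattr(doc, name) for name in names}
    note = doc.context_notes.get(audience)
    if note:
        shown["context"] = note  # audience-specific interpretive note, if any
    return shown
```

A real system would need finer distinctions than a single audience label, particularly for the mid-range users whose needs waver between the basic and the specialized.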

Concluding Remarks

The design issues discussed in this paper are obvious basic first steps towards a hoped-for, more sophisticated, model that can accommodate diverse literary research collections; from within the vast universe of the web audience, users will approach such presentations with information needs that will outstrip those of our own small sample, making continual and necessary demands on designers.

Rising to the top of an agenda for further research, firstly, is the need to determine ways of supporting user diversity. Whilst novice- and expert-level tasks and needs are somewhat clear, at least for this user sample, mid-range users remain difficult to assess; one could say their requirements are more fluid and at this stage it is difficult to determine which types, levels and volume of content are appropriate. Is the introductory level of content in the novice interface too patronizing? Is information in the main interface too erudite and scholarly? Is there too little variety in the former and too much to sift through in the latter? Also, a consequence of separate user-interfaces is the need to determine when and how researchers can move between the two without disturbing the flow of search progress. Integral to this situation is the provision of user-appropriate search terms and indexes that can deliver terminology and arrangements that suit particular tasks. Development of an intelligent and sophisticated real-time user logging facility to query users, as and when they desire, could help in determining a user's needs and link him/her to a suitable search engine (Marchionini, 1995); in addition, the user could offer feedback on inappropriate or missing materials, providing designers with the information to modify the system.

Secondly, and synchronous with this issue, is the challenge in delivering a functionality that can equal that of a traditional library. On-line access tools such as catalogues and glossaries have more than proven their worth, but will we ever be capable of providing the pleasure, excitement and the necessity of browsing through original and rare texts? Indeed there is now an element of 'the more you give them, the more they want'. As the curator of the Dickens House Library states during his Pickwick investigations: "I've got a full reference. Now I want to read the text!" Major efforts such as The British Library's Treasures Digitisation Project and user-controlled robotics such as Mechanical Gaze at UC-Berkeley (URL: ek/MechanicalGaze) may offer solutions, giving researchers the ability to examine a text in its entirety, including page-turning and the option to select and enlarge elements at will.

A final issue that is difficult to overcome but ultimately affects the quality of the functionality of web delivery, is the ubiquitous diversity across research collections generally. A trademark lack of conformity in their arrangements is due to their idiosyncratic cataloguing schemes and limitations imposed by cultural property laws; additionally, some materials may be too fragile for handling in the course of digital scanning thus limiting access to the full complement of materials. Researchers may expect dependable and consistent presentation styles yet translating these diverse collections to a general model for web access would be an enormous task. Cooperative efforts, perhaps in the form of revised cataloguing policies with an eye to democratized digitized access (Hickerson, 1997), are thus necessary to overcome these limitations in order to cope with the growing and diverse demands of the web audience, better known as 'the general public'.


This research was made possible through the generosity of the staff of The Dickens House Museum, London and funding provided by a City University (London) School of Informatics Studentship.