Unifying our cultural memory: Could electronic environments bridge the historical accidents that fragment cultural collections?

in Information Landscapes for a Learning Society, Networking and the Future of Libraries 3, 1998, and presentation at UK Office of Library Networking Conference, July 1998.

David Bearman and Jennifer Trant, Partners, Archives & Museum Informatics, USA

(Section 8)

Using Information

Collation

Collation organizes information from various sources into a uniform conceptual order or schema maintained by the researcher. It requires knowledge of the structures of retrieved information resources in order to integrate them with the researcher's prior knowledge-base. At its simplest, this involves mapping data content structures (fields in the retrieved resource) to those in the research database, for example integrating a citation into a bibliography. More often, for research within a single discipline, this involves additional sorting of search results according to the values in a particular field, for example ordering botanical illustrations according to their genus and species. For less structured data, such as text files, it may require mapping of genre-typed headings or SGML DTDs. In an interdisciplinary environment, collation frequently requires the establishment of conceptual equivalencies between and among differing knowledge representation structures, such as is required to reassemble data about cultural objects from early nineteenth-century France from sources which associate them with the "Napoleonic Era" or the "NeoClassical Style". For any of these tasks, metadata which documents the structure and syntax of the resource and its schemas, in considerably greater detail than is required for discovery, becomes important. For example, participants at the 1998 Museums and the Web Conference were invited to populate a virtual model of Berlin in various time periods with representations of objects from their museum collections appropriate to the time and place. Needless to say, significant structural metadata and interoperability are required to make this possible, especially in a distributed environment.
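The two simplest forms of collation described above, mapping fields onto the researcher's schema and then ordering results by genus and species, can be sketched in a few lines. This is only an illustration under stated assumptions: the field names, the `FIELD_MAP` crosswalk and the sample records are hypothetical and are not drawn from any actual collection system.

```python
# Hypothetical crosswalk: field name in the retrieved resource -> field name
# in the researcher's own schema. Real crosswalks depend on published metadata.
FIELD_MAP = {"taxon_genus": "genus", "taxon_species": "species", "caption": "title"}


def collate(records, field_map):
    """Rename known source fields to the local schema.

    Fields with no known mapping are kept under 'unmapped' rather than
    silently dropped, since the researcher may need them later.
    """
    collated = []
    for record in records:
        local, unmapped = {}, {}
        for key, value in record.items():
            if key in field_map:
                local[field_map[key]] = value
            else:
                unmapped[key] = value
        local["unmapped"] = unmapped
        collated.append(local)
    return collated


def sort_by_taxonomy(records):
    """Order collated records by genus, then by species."""
    return sorted(records, key=lambda r: (r.get("genus", ""), r.get("species", "")))


# Made-up sample records standing in for retrieved botanical illustrations.
retrieved = [
    {"taxon_genus": "Rosa", "taxon_species": "gallica", "caption": "Plate 12"},
    {"taxon_genus": "Iris", "taxon_species": "pallida", "caption": "Plate 3",
     "engraver": "unknown"},
    {"taxon_genus": "Rosa", "taxon_species": "canina", "caption": "Plate 7"},
]

ordered = sort_by_taxonomy(collate(retrieved, FIELD_MAP))
for item in ordered:
    print(item["genus"], item["species"], item["title"])
```

The sketch only works because the crosswalk is known in advance; without structural metadata from the provider, the researcher would have to guess which source field corresponds to which local field.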

Since the information resources being provided in our model are digital data, the retrieval process needs to have returned interoperable files and definitive metadata, or the user will not be able to exploit the retrieved resources. In some cases, collation may necessitate file format conversion, character set mapping, code translation or term explosion. Only if the retrieved files are accompanied by metadata declarations will they carry with them the information the user needs to really use them. Without knowledge of the schemes employed in the intellectual construction of the retrieved resources and the data structures used to represent their content, the researcher will be unable to correlate them with other data to evaluate their information content.
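As a rough illustration of why the accompanying declaration matters, the following sketch performs character set mapping and code translation on a retrieved payload. The shape of the declaration (a `charset` entry and a `code_table` entry), the semicolon-delimited payload and the sample values are all assumptions made for this example, not a description of any real metadata format.

```python
def interpret(payload: bytes, declaration: dict) -> dict:
    """Decode a retrieved payload and expand its codes using its declaration.

    'declaration' is a hypothetical metadata packet. Without a declared
    character set the bytes cannot be decoded reliably, so we refuse to guess.
    """
    encoding = declaration.get("charset")
    if encoding is None:
        raise ValueError("no charset declared; payload cannot be decoded reliably")
    text = payload.decode(encoding)  # character set mapping
    code_table = declaration.get("code_table", {})
    # Code translation: replace provider-specific codes with readable terms,
    # leaving any value that is not a known code unchanged.
    terms = [code_table.get(token, token) for token in text.split(";")]
    return {"terms": terms}


# Made-up payload: one provider code and one accented word, in Latin-1 bytes.
payload = "PTG;caf\u00e9".encode("latin-1")
declaration = {"charset": "latin-1", "code_table": {"PTG": "painting"}}

result = interpret(payload, declaration)
print(result["terms"])
```

The same bytes interpreted without the declaration would leave the user with an opaque code and, quite possibly, a wrongly decoded accented character.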

This metadata makes it possible for the user to bring the new resource into the context of resources already available in the user space and already organized, indexed, or understood with respect to local schemas. Such "collation", which can involve various degrees of integration, is at least minimally necessary if the user is to be able to correlate the new information with existing knowledge in a usable way.

Analysis

Once coherent research knowledge-bases are constructed through collation of disparate resources, discipline-specific methodologies are employed in the extraction of the meaning inherent in the content of resources. The methods by which the underlying data were gathered, prepared and previously analyzed become critical issues in determining the validity of subsequent analytical methods, and metadata documenting these processes will be required if analysis is not to introduce artifacts. For example, the gamma of the capture of a color image is critical to its interpretation or comparison with other images. The methods of interpolation of census data are crucial to their combination with other datasets in a statistical test.²⁸
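The point about capture gamma can be made concrete with a small calculation. The sketch below assumes a simple power-law model in which the stored value equals the linear intensity raised to 1/gamma; real capture pipelines may use more elaborate transfer curves, and the gamma values 1.8 and 2.2 are chosen purely for illustration.

```python
def linearize(value: int, gamma: float) -> float:
    """Convert an 8-bit stored pixel value to linear intensity in [0, 1].

    Assumes the simple model stored = linear ** (1 / gamma), so that
    linear = stored ** gamma. This is an illustrative simplification.
    """
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit value between 0 and 255")
    return (value / 255.0) ** gamma


# The same stored value, captured under two different (assumed) gammas,
# corresponds to two different physical intensities.
same_stored_value = 128
intensity_a = linearize(same_stored_value, gamma=1.8)
intensity_b = linearize(same_stored_value, gamma=2.2)

print(intensity_a, intensity_b)
print("comparable without gamma metadata?", intensity_a == intensity_b)
```

Two images that store identical pixel values may therefore record different intensities, which is why comparison across sources requires the capture gamma to be documented.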

The knowledge representation processes that were applied by the creators or custodians of the information resource are crucial to the subsequent use of that resource in another context. If this metadata was not delivered at the time of the initial retrieval, the user will need to go back to the provider to obtain additional meta-information. Just as the analysis process is iterative, the user may need to return to the provider many times to obtain different or more detailed metadata to support subsequent analysis and assessment. Having access to detailed documentation about the manner in which a source was represented may be critical to assessing its authenticity or its utility as evidence in a particular argument. As we are at a stage where capture and representation of analog objects in digital form is far from standardized, we can assume there will often need to be significant dialogue between the source system and the users' system about the data capture, knowledge representation, and documentation methods. We can expect there to be disconnects between the perspectives of a resource's creator and the disciplinary schema of a subsequent user. Bridging these will require an explicit declaration of the nature of the source, and a self-awareness on the part of the researcher of his or her own process or methods.

Re-Presentation

The goal of the research process is to create new information - to provide an answer to the research question. Of course, in the process of analysis the user hopes to add new knowledge, establish a synthesis for himself, or create a revised schema that will bring new meaning to the previously collated information. If the researcher is successful, a new information resource will be created. The researcher will want to communicate her new idea, re-use the information obtained from the provider, and cite or quote or adapt some or all of the acquired resources. Additional metadata may be required, particularly metadata about rights and permissions and documentation of the analysis methods employed.

The subsequent re-presentation of the information will be a new information package, and as such the user will need to create new metadata for it as well. If the communication involves publication of the new ideas, the published resource will, eventually, become part of the universe of information resources provided by others and subject to the same discovery, retrieval, collation and re-use cycle by other users.

Some genres of scholarly communication are discipline specific. Others are specific to the type of research process being reported, and all involve formalisms in the representation of knowledge. Much of the metadata sought by future researchers in the discovery process will be created at the re-presentation, or in the formal publication of results. While scholarly traditions have long demanded that some "metadata" be reported in footnotes (e.g. stanzas or line breaks, the edition used in word-frequency or textual analysis, the length of the light waves in medical diagnostic procedures, such as MRI or the density of laser beams), future researchers will need guidelines and methods to report this information in ways that can be systematically used by others. The significance of this becomes particularly obvious when we consider types of distributed publication in which the new knowledge is in fact only the imposition of a new method, as defined by its metadata, on an existing information resource or data set. Such "publications" are being created today in electronic "collaboratories" and remote medical consultation.

Scholarship is a communal process of discussion and debate, involving links between resources and the addition of value by one work to those which preceded it. The linking mechanisms, such as citation, acknowledgement and quotation, use metadata and author-stated relationships at the level of the information resource itself. Its additive mechanisms, such as annotation, criticism, summarization or indexing, often refer to specific content or structures within the individual information resource. These will become important in information discovery. Consider the simple example of a resource used in a particular context, such as a photograph that appeared on the front page of the New York Times on a particular day. Post-hoc metadata packets, created by others but linked to the photograph, reflect subsequent re-use and provide further knowledge essential to an appreciation of its social significance.


