Metadata Requirements for Evidence
David Bearman, Archives & Museum Informatics
Ken Sochats, University of Pittsburgh, School of Information Science
Introduction: Towards A Reference Model for Business Acceptable Communications
Managers in application domains from commerce to health care, and from research and development to manufacturing, are
seeking to define standards for data interchange adequate for their business purposes. The literature is replete with discussions
of how to enable end-to-end electronic business interaction, how to support the requirements of electronic patient records or
electronic laboratory notebooks, and how to implement the documentation demanded by CALS or ISO-9000.(1)
At the same time, managers of existing information networks and technical personnel charged with planning the National
Information Infrastructure of the future are encountering the requirements to identify, control access, manage software
dependencies, represent the business meaning, and document the use of data in these vast, distributed, heterogeneous
computing environments.(2) Many observers feel that unless we can satisfy requirements for "integrity",
"authenticity", "reliability" and "archiving" of digital information, the National and Global Information Infrastructures will never
be able to support serious work.(3)
The professions traditionally concerned with evidence and records have not ignored these emerging requirements.(4) At the
University of Pittsburgh School of Library and Information Science, faculty and students engaged in a research project funded
by the National Historical Publications and Records Commission have been examining the "Functional Requirements for
Recordkeeping" as defined in a broad range of sources from law, regulation and best practices. From this "literary warrant"
they have derived a specification of the attributes of "recordness" or evidentiality.(5)
The specification defines thirteen properties which are identified in law, regulation and best practices throughout the society as
the fundamental properties of records.(6) These characteristics can be formally expressed as "production rules" or logical
statements of simple observable attributes.(7) One problem associated with deriving a set of requirements from the written prose
of the literature is that the specifications are often ambiguous, imprecise and subject to a high degree of interpretation. The
research group elected to represent the specification of each of the requirements formally as a set of production rules. The
production rule formalism was used during the process of developing a requirement in addition to being used as a
representation mechanism for the requirement. This helped to guarantee that the requirement specifications were as explicit
and unambiguous as possible. It also allowed the specifications to be logically refined such that the component statements
of the specifications were observable states or properties. These observables provided the foundation for the identification
of a specific set of metadata which, when present, satisfies the informational needs of the specification. If this metadata is
inextricably linked to, and retained with, the data associated with each business transaction, it guarantees that the data object
will be usable over time, be accessible only under the terms and conditions established by its creator, and have properties
required to be fully trustworthy for purposes of executing business.(8) Additional metadata retained in system and organizational
accountability documentation assures full evidentiality.
In order to facilitate implementation of environments in which electronic evidence can be created, the project has taken its
findings one step further and proposed a "Reference Model" for "Business Acceptable Communications (BAC)". The metadata
requirements for evidentiality or "recordness" are necessary components of the reference model. One could imagine this as a
scheme for addressing electronic envelopes containing business communications that would ensure that the envelope could
be opened by different computers and software in the future and its contents would still be completely understandable. Because
it has been empirically found to correspond closely with the metadata specified in or required by strategies adopted by a range of
discrete standardization efforts underway in a variety of application niches, the reference model appears to have relevance to,
and value for, the process of defining standards for any type of BAC.(9)
The need for such a standard is widespread. Not only would it make communications received over networks trustworthy for
the purposes of conducting business, and help to ensure accountability and protect organizations against the risks of loss of
proof of their past behavior, but it would also greatly simplify:
the management of huge volumes of communications from heterogeneous hosts,
the proper retention and disposition of records,
auditing the use of records for business, and
the appropriate management of private, secure, proprietary and business confidential data.
A side effect of such a recordkeeping standard is that it will enhance the business value of the data that it preserves. These
business benefits include:
providing data for market and other research.
documenting decisions, policies, events, etc.
documenting R&D and other business related processes.
To understand what data is necessary for such communications, we must begin by examining the nature of electronic evidence
(or the essential properties of records).
Records are at one and the same time the carriers, products and documentation of business transactions. Transactions
(trans-actions) by definition are actions communicated from one person to another, from a person to a store of information
(such as a filing cabinet or computer database) and thereby available to another person at a later time, or communications
from a store of information to a person or another computer.(10) Because such trans-actions must leave the mind, computer
memory, or software process in which they are created (or must be used, "over-the-shoulder" as it were, by a person with
access to the same computer memory), the transaction record must be conveyed across a software layer, and typically across
a number of hardware devices.
Not all data that is communicated across software and hardware layers is a record. In fact, most information created
by and managed in information systems is not a record because it lacks the properties of evidence. Records-oriented
professionals within organizations, such as senior management, legal counsel, auditors, Freedom of Information and Privacy
officials, and archivists all require records, and not just information, but creators of the records frequently only need continued
access to the information to support their work. Therefore, application environments that support the ongoing work of the
organization frequently, or even usually, do not satisfy the requirements for creating evidence. In this paper we subsequently
distinguish between the terms "records" and "data", using records exclusively when we mean information that provides
evidence of a transaction.
The Functional Requirements for Evidence within Recordkeeping
Any organization that wants to use electronic documentation as evidence in the future will need to satisfy the requirements of
evidence in the normal course of conducting its business. It has been difficult to do so in the computer-based communications
environments we have implemented in the past because applications software sold by third parties has not met these
requirements. Information systems are generally designed to hold timely, non-redundant and manipulable information, while
recordkeeping systems store time-bound, inviolable and redundant records. Few, if any, in-house information managers have
been able to devote the energy to rigorous definition of the distinct requirements for recordkeeping or, had they done so, would
have been able to envision how to satisfy these throughout all systems. Without such explicit and testable specifications, computing application
and electronic communications systems have failed to satisfy the requirements for recordkeeping and are, therefore, a growing
liability to companies even while they are contributing directly to day-to-day corporate effectiveness.
The University of Pittsburgh research project identified hundreds of sources in law, regulation, best practice guidelines, and
general societal discourse which relate to the properties of evidence. From these it is clear that if records that are critical to the
organization for long term accountability and to protect its rights are not created by transactions, they cannot be created after
the fact from data in information systems. Information captured in the process of communication will only be evidence if the
content, structure and context metadata required to satisfy the functional requirements for recordkeeping is captured, maintained
and usable. The requirements of recordkeeping are corporate requirements, not those of any given business function or
application, and are therefore present for any communications. They are the foundation of good business practices and are
essential elements in reducing the risks of increased liabilities and decreased opportunities that accompany poor recordkeeping.
The functional requirements in table 1 below are derived from the many sources we consulted which defined what constitutes
evidence. In addition to interviewing experts, we have systematically reviewed hundreds of sources considered authoritative by
lawyers, auditors, information technology specialists and archivists and records managers. In these sources we have identified
statements that pertain explicitly to the characteristics or attributes of evidence or records. Analysis of these authoritative sources
revealed twenty functional requirements for evidence which fell into three broad categories. In retrospect the small number of
requirements should not have surprised us, since they reflect a relatively tight social consensus about what it means for written
testimony about an act in the past to be considered trustworthy in the future.
Table 1: FUNCTIONAL REQUIREMENTS FOR EVIDENCE IN RECORDKEEPING
Accountable Recordkeeping System
The full requirement and specification is reproduced in Appendix 1.
Over the course of the past two years, this prose requirements statement has been subjected to rigorous analysis as we
expressed it in a formal representation. This version, which we call the "Production Rules Representation of the Functional
Requirements for Evidence" has forced us to operationalize a number of concepts that are not very precise in the literary warrant
and were not specific enough in the prose specification to ensure that a computer system would be able to validate their
presence or absence. Care has been taken in the development of the specification of the requirements to include only those
elements that are required to delineate the requirement. It is very easy to fall into the trap of overspecification and include
statements that would pre-define some level of implementation. For this reason, some of the requirements appear to be very
abstract. These higher level specifications need to be further defined by the implementer to identify specific system design
artifacts. We have made an effort to ensure that only observable data or calculations from observables reside at the leaf nodes
of the production rules. The observables consist of metadata, and a very limited predicate vocabulary has been used to simplify
system requirements for auditing the production rules. The production rules representation is reproduced in Appendix 2.
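To make the idea of observables at the leaf nodes concrete, the following is a minimal sketch of how a production rule might be evaluated against a record's metadata. The four-word predicate vocabulary (`present`, `equals`, `all_of`, `any_of`), the metadata item names and the sample rule are hypothetical illustrations; they are not the rules reproduced in Appendix 2.

```python
from typing import Any, Mapping


def evaluate(rule: tuple, metadata: Mapping[str, Any]) -> bool:
    """Recursively evaluate a production rule against observable metadata.

    Leaf predicates test a single metadata item; connectives combine
    sub-rules. This small vocabulary is a hypothetical illustration.
    """
    op, *args = rule
    if op == "present":      # leaf: item is not missing, None, "" or []
        (name,) = args
        return metadata.get(name) not in (None, "", [])
    if op == "equals":       # leaf: item has a specific value
        name, value = args
        return metadata.get(name) == value
    if op == "all_of":       # every sub-rule must hold
        return all(evaluate(sub, metadata) for sub in args)
    if op == "any_of":       # at least one sub-rule must hold
        return any(evaluate(sub, metadata) for sub in args)
    raise ValueError(f"unknown predicate: {op!r}")


# Hypothetical rule: a record is identifiable if it carries an identifier
# and either a creation timestamp or a transaction number.
IDENTIFIABLE = (
    "all_of",
    ("present", "record_id"),
    ("any_of", ("present", "created_at"), ("present", "transaction_no")),
)

print(evaluate(IDENTIFIABLE, {"record_id": "R-001", "created_at": "1995-06-01"}))  # True
print(evaluate(IDENTIFIABLE, {"created_at": "1995-06-01"}))                        # False
```

Keeping the vocabulary this small means an auditing program only has to implement a handful of predicates, while the rules themselves can grow as the specification is refined.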
Metadata Specifications for Evidence
Ideally a metadata specification for evidence could be completely deducible from the Production Rules version of the Functional
Requirements for Evidence. We believe we have achieved such a specification and that it serves to identify the data required
for such purposes as are proposed in the draft NIST standard for a "Record Description Record" or by the Research Libraries
Group/Commission on Preservation and Access Task Force on Archiving of Digital Information. We also believe this specification
satisfies the needs for entries in electronic laboratory notebooks, electronic patient records and multivalent electronic
The functional requirements for recordkeeping dictate the creation of records that are comprehensive, identifiable (bounded),
complete (containing content, structure and context), and authorized. These four properties are defined by the requirements in
sufficient detail to permit us to specify what metadata items would need to describe them in order to audit these properties. This
descriptive metadata cannot be separated from the record it describes or changed after the record has been created. Several additional
requirements define how the data must be maintained and ultimately how it and other metadata can be used when the record is
accessed in the future. The metadata created with the record must allow the record to be preserved over time and ensure that it
will continue to be usable long after the individuals, computer systems and even information standards under which it was created
have ceased to be. The metadata required to ensure that functional requirements are satisfied must be captured by the overall
system through which business is conducted, which includes personnel, policy, hardware and software.
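One way, among several, to make descriptive metadata inseparable from a record and to detect any change after creation is to compute a cryptographic digest over the content and the metadata together at the moment of capture. The sketch below assumes SHA-256 over a canonical JSON serialization of the metadata; the field names are hypothetical, and digital signatures or write-once media would be alternative techniques.

```python
import hashlib
import json


def seal(content: bytes, metadata: dict) -> dict:
    """Bind metadata to content by hashing both together at creation time."""
    # Canonical JSON (sorted keys, fixed separators) so identical metadata
    # always yields identical bytes. json.dumps escapes control characters,
    # so the NUL separator cannot occur inside the serialized metadata.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(canonical + b"\x00" + content).hexdigest()
    return {"content": content, "metadata": metadata, "seal": digest}


def verify(record: dict) -> bool:
    """Recompute the digest; any change to content or metadata breaks the seal."""
    return seal(record["content"], record["metadata"])["seal"] == record["seal"]


record = seal(b"Purchase order 42", {"record_id": "R-1", "creator": "J. Smith"})
print(verify(record))                      # True
record["metadata"]["creator"] = "someone else"
print(verify(record))                      # False
```

A digest only detects alteration; preventing it still depends on how the recordkeeping system stores the sealed object.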
We envisage transactions taking place as metadata encapsulated objects, although records might not be physically stored in
this manner. When transmitted, the contents of the transaction would be preceded by information identifying the record, the terms
for access, the way to open and read it, and the business meaning of the communication much as a train of baggage cars is
preceded by an engine. Metadata encapsulated objects may contain other metadata encapsulated objects, because records
frequently consist of other records brought together under a new "cover", as when correspondence, reports and results of
database projections are forwarded to a management committee for decision. They may also contain the information content of
previous records which have been copied into an information system where the creator of this transaction has had the
opportunity to modify them; in this case they may contain a citation to a previous record but would not contain the encapsulated
version of the previous record.
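The envelope described above can be sketched as a recursive data structure: a header (identification, terms of access, how to open and read the contents, business meaning) followed by contents that may be raw data, fully encapsulated prior records, or mere citations of prior records whose content was copied and possibly modified. The class and field names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Citation:
    """Reference to a prior record whose content was copied (and possibly
    modified); the prior record itself is not carried along."""
    record_id: str


@dataclass
class EncapsulatedObject:
    # Header: the "engine" that precedes the contents.
    record_id: str
    terms_of_access: str
    rendering: str            # how to open and read the contents, e.g. a MIME type
    business_meaning: str
    # Contents: the "baggage cars", which may themselves be encapsulated records.
    contents: List[Union[bytes, "EncapsulatedObject", Citation]] = field(default_factory=list)

    def embedded_ids(self) -> List[str]:
        """IDs of all records encapsulated, at any depth, within this one.
        Citations are skipped because the cited record is not carried."""
        ids: List[str] = []
        for part in self.contents:
            if isinstance(part, EncapsulatedObject):
                ids.append(part.record_id)
                ids.extend(part.embedded_ids())
        return ids


memo = EncapsulatedObject("R-10", "internal", "text/plain", "staff memo", [b"memo text"])
report = EncapsulatedObject("R-11", "internal", "text/plain", "status report", [b"report text"])
submission = EncapsulatedObject(
    "R-12", "management committee only", "multipart/mixed",
    "submission for decision", [memo, report, Citation("R-07")],
)
print(submission.embedded_ids())  # ['R-10', 'R-11']
```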
Ideally the contents of all data objects that we want to communicate would be "interoperable" and encoded in standard formats
to give them a degree of software independence (the actual degree depends on how long any given "standard" can be expected
to remain a standard, which in archival terms is not very long). In any event, many data objects we create today will not be
standard and the metadata with which we label them must flag the dependencies of the data (including their dependency on
standards) so that a future review of record headers can locate sources of brittleness and segregate records requiring migration
to new software formats before they become unreadable.(11)
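A review of record headers for sources of brittleness might look like the following sketch. It assumes, hypothetically, that each header lists its software and standards dependencies as simple strings and that the organization maintains a set of dependencies it considers at risk; "WordProcessorX 2.0" is an invented example.

```python
from typing import Dict, Iterable, List, Set


def records_needing_migration(headers: Iterable[Dict], at_risk: Set[str]) -> List[str]:
    """Return IDs of records whose declared dependencies include an at-risk
    format, program or standard, so they can be migrated before becoming
    unreadable. Header layout is a hypothetical assumption."""
    flagged = []
    for header in headers:
        deps = set(header.get("dependencies", []))
        if deps & at_risk:                 # any overlap with the at-risk set
            flagged.append(header["record_id"])
    return flagged


headers = [
    {"record_id": "R-1", "dependencies": ["ASCII", "SGML"]},
    {"record_id": "R-2", "dependencies": ["WordProcessorX 2.0"]},
    {"record_id": "R-3", "dependencies": []},
]
print(records_needing_migration(headers, {"WordProcessorX 2.0"}))  # ['R-2']
```

The scan works only on headers, so it never needs to open the records themselves, which is the point of flagging dependencies in the metadata.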
Our concept of evidence makes it important to know when records were used and how, in what ways they were filed, classified
and restricted in the past, and, if they have been destroyed under proper disposition authority, when and by whom that act took
place. It is also important to us to know what redacted versions of records were released over time. Transactional data reflecting
the history of their use (events in their life subsequent to creation) provides the documentation traditionally associated with archival
description, but instead of such data residing only at aggregate levels, it is possible to define electronic records metadata
structures that enable us to search for specific records based on information about the instance or concrete business transaction
which generated them.
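Because use-history data can be kept for each record rather than only at aggregate levels, it can be queried at the level of individual records. The sketch below assumes a hypothetical append-only log whose event types include `filed`, `accessed`, `released_redacted` and `destroyed`; the field names and the disposition authority text are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UseEvent:
    record_id: str
    event: str            # e.g. "filed", "accessed", "released_redacted", "destroyed"
    actor: str
    date: str             # ISO 8601 date, so string order is chronological
    authority: str = ""   # e.g. the disposition authority for a destruction


class UseHistory:
    """Append-only history of events in records' lives after creation."""

    def __init__(self) -> None:
        self._events: List[UseEvent] = []

    def append(self, event: UseEvent) -> None:
        self._events.append(event)     # no method edits or removes events

    def for_record(self, record_id: str) -> List[UseEvent]:
        """All events for one record, in chronological order."""
        return sorted((e for e in self._events if e.record_id == record_id),
                      key=lambda e: e.date)

    def find(self, event: str) -> List[UseEvent]:
        """All events of one type, e.g. every destruction and its authority."""
        return [e for e in self._events if e.event == event]
```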
In addition to ensuring that the data we capture is a record, and can serve as evidence, metadata should be defined so that it
makes data objects communicated across software and hardware layers (and therefore any communications over a network):
These properties, while important for simplifying the management of records (especially in an inter-networked environment in
which hundreds of millions of records are created daily), can be made to be direct consequences of keeping records if attention
is paid to the structuring of the metadata that makes records evidence.
Furthermore, a system for metadata management which has appropriate modularity and content standardization can support
formally auditing the business system which generated the information object transactions and the software, hardware,
procedures and policies surrounding a system to determine where they contribute, or fail to contribute, to the creation,
maintenance and use of evidence. While no system of management can be self-auditing, a communications system built to
ensure that appropriate metadata is captured for evidence can support a level of management accountability that it was never
previously possible to implement or enforce.
We recognize, however, that a specification based solely on necessary and sufficient conditions for recordness does not address
certain other desirable functionalities of a business communications environment because evidentiality is not the only requirement
for such a system. Among the other requirements we have seen being addressed in the effort to develop widely applicable
models for network metadata management are:
support for a system of access and use rights management
support for networked information discovery and retrieval
support for registration of intellectual property
Therefore, we have proposed a draft Reference Model for Business Acceptable Communications that attempts to specifically
address these additional requirements as part of a dialog that must take place between advocates of mechanisms to support
these different fundamental purposes through an overall structure for metadata encapsulated objects.(12)
The Proposed Reference Model
The metadata elements needed to execute the production rules expression of the University of Pittsburgh Functional
Requirements for Evidence possess no intrinsic order. Criteria for ordering these elements must be derived from scenarios of
their anticipated use within an overall system of recordkeeping.
The initial clustering of these data elements to achieve functional modularity led researchers to organize them in six layers
which they labelled:
- Registration
- Terms & Conditions
- Structural
- Contextual
- Content
- History of Use
The addition of the requirements noted earlier from other object standardization efforts designed to provide support for a system
of access and use rights management, for networked information discovery and retrieval and for registration of intellectual
property suggests a need to add substantially to the properties identified as necessary for assurance of evidence, in layers
devoted to identification and terms and conditions. In particular it suggests a need for "resolving" agents or services for dealing
with terms and conditions of access or use and managing information discovery and retrieval for the aggregate resource of which
a given record forms a part. This required us to introduce a resource descriptor element (not present in the December 1994 draft)
that points to a compilation of which the record might be a part and through which it would be accessed.
It was evident in the discussion of NIDR at the Spring 1995 CNI meeting (13) that the kinds of relevance
ranking and intellectual content representation for information retrieval functions being considered essential to the networked
information discovery and retrieval requirement operate at a level of compilations, repositories or services for records of business
transactions and that these publications, services, or repositories will have quite different descriptive data associated with them.
Indeed this is consistent with the focus of the Library of Congress Electronic cataloging meeting in October 1994 and the recent
announcement by OCLC of its intention to catalog Internet resources.
It was also clear from the discussion of network management issues associated with identification of intellectual property (14)
that much more attention needs to be given to naming of objects in this domain than has been necessary for the more limited
purposes of unique identification of evidence. The simplifying reality that no change can take place in a record and that any
interaction with a record, even looking at it or forwarding it, creates a new record, ignores the social dependence of the concept
of original creation at the foundation of intellectual property.
This (summer 1995) draft of the Reference Model, therefore, attempts to place the specific and limited requirements for metadata
of evidence in the context of the other tasks that have been imputed to such descriptors. It does so by renaming the layer
previously labeled "Registration" as "Handle", indicating both the requirement for more robust methods of
identification than are necessary for evidence and the need for documentation of the contents, or pointers to documentation of
the contents, to facilitate discovery and retrieval. No effort is made here to elaborate on how the information needed to satisfy
these further requirements could or should be represented, since this will best be done by the communities
most concerned with that functionality. The clusters were described as "metadata/properties", reflecting the distaste expressed
for the term metadata by spokesmen for these communities at the recent CNI meeting.
Rather than pursue these matters any further, this paper explores the metadata content required by the Functional Requirements
for Recordkeeping, which dictate mandatory and optional data elements within defined data clusters at each of the six layers of
the metadata model. In certain areas, particularly regarding structural dependencies of data objects representing non-textual
content, we have specified a potentially extensible set of modality specific data elements. This reflects the recognition that we
can never completely specify the data that will be required to document the structural dependencies of future data types.
The clusters are part of the reference model and must always occur, but the optional metadata elements may or may not be
present based on characteristics of the application. The metadata content directly related to satisfying requirements for evidence
is mandatory. Hence evidence, required for the conduct of business and for accountability, is ensured by a "Metadata
Encapsulated Object" conforming to the reference model for "Business Acceptable Communications". The metadata content
which contributes to recordkeeping, or management of records, but is not essential to evidence, is optional. Metadata content
useful for specific domains or business functions may be defined by those domains as mandatory for business in that domain or
optional within the domain. All such metadata would be optional for anyone outside the domain. The layers and clusters of the
reference model are shown in table 2, below.
Table 2: Outline of the Reference Model for Business Acceptable Communications, showing layers and data clusters
- Handle Layer
  - Registration Metadata/Properties
  - Record Identifier
  - Information Discovery and Retrieval
- Terms & Conditions Layer
  - Rights Status Metadata
  - Access Metadata
  - Use Metadata
  - Retention Metadata
- Structural Layer
  - File Identification
  - File Encoding Metadata
  - File Rendering Metadata
  - Record Rendering Metadata
  - Content Structure Metadata
  - Source Metadata
- Contextual Layer
  - Transaction Context
  - Business Function
- Content Layer
- Use History Layer
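The conformance rule stated before Table 2 (every cluster must occur, metadata needed for evidence is mandatory, other metadata is optional, and a domain may make some of its own elements mandatory within that domain) can be checked mechanically. The sketch below uses the layer and cluster names of Table 2; which elements count as evidence-related, and the element names themselves, are hypothetical placeholders rather than the project's specification.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]                       # (layer, cluster)
Obj = Dict[str, Dict[str, Dict[str, object]]]

# Layers and clusters as outlined in Table 2.
REFERENCE_MODEL: Dict[str, List[str]] = {
    "Handle": ["Registration Metadata/Properties", "Record Identifier",
               "Information Discovery and Retrieval"],
    "Terms & Conditions": ["Rights Status Metadata", "Access Metadata",
                           "Use Metadata", "Retention Metadata"],
    "Structural": ["File Identification", "File Encoding Metadata",
                   "File Rendering Metadata", "Record Rendering Metadata",
                   "Content Structure Metadata", "Source Metadata"],
    "Contextual": ["Transaction Context", "Business Function"],
    "Content": [],
    "Use History": [],
}

# Hypothetical: elements required for evidence, per cluster.
EVIDENCE_ELEMENTS: Dict[Key, List[str]] = {
    ("Handle", "Record Identifier"): ["identifier"],
    ("Contextual", "Transaction Context"): ["sender", "recipient", "date"],
}


def empty_object() -> Obj:
    """Every layer and cluster present, but no elements filled in."""
    return {layer: {c: {} for c in clusters} for layer, clusters in REFERENCE_MODEL.items()}


def validate(obj: Obj, domain_mandatory: Optional[Dict[Key, List[str]]] = None) -> List[str]:
    """Return a list of problems; an empty list means the object conforms."""
    domain_mandatory = domain_mandatory or {}
    problems: List[str] = []
    for layer, clusters in REFERENCE_MODEL.items():
        if layer not in obj:
            problems.append(f"missing layer: {layer}")
            continue
        for cluster in clusters:
            if cluster not in obj[layer]:              # clusters must always occur
                problems.append(f"missing cluster: {layer}/{cluster}")
                continue
            required = (EVIDENCE_ELEMENTS.get((layer, cluster), [])
                        + domain_mandatory.get((layer, cluster), []))
            problems.extend(f"missing mandatory element: {layer}/{cluster}/{e}"
                            for e in required if e not in obj[layer][cluster])
    return problems
```

Optional elements are simply never checked, which mirrors the distinction between evidence (mandatory) and recordkeeping conveniences (optional).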
The purpose of specifying metadata as part of this model is to ensure recordness. When the metadata needed by a specialized
domain has an essential application related purpose but is not required for recordness, it is preferable to satisfy this application
purpose by definition of a standard interchange format. The interchange standard can be cited in the metadata for Business
Acceptable Communications and the data content can then be opened by knowledge of the requirements and structures of the
standard without further elaboration. This has the dual advantage of efficiency of definition and ease of migration, since all records
corresponding to a specified protocol can be re-presented in a new standard if the old format is superseded.
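Citing an interchange standard in the metadata lets software choose a reader from the citation alone. The sketch below assumes a hypothetical registry that maps standard identifiers to decoding functions, with CSV (RFC 4180) and JSON (RFC 8259) standing in for whatever standards a domain actually uses; the `migrated_from` field is likewise an invented illustration of noting a re-presentation.

```python
import csv
import io
import json
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical registry: cited interchange standard -> function that opens content.
READERS: Dict[str, Callable[[bytes], Any]] = {
    "RFC 4180 (CSV)": lambda data: list(csv.reader(io.StringIO(data.decode("utf-8")))),
    "RFC 8259 (JSON)": lambda data: json.loads(data),
}


def open_content(record: Record) -> Any:
    """Open a record's content using only the standard cited in its metadata."""
    standard = record["metadata"]["interchange_standard"]
    if standard not in READERS:
        raise LookupError(f"no reader registered for {standard!r}")
    return READERS[standard](record["content"])


def re_present(records: List[Record], old: str, new: str,
               convert: Callable[[bytes], bytes]) -> List[Record]:
    """Return a new list in which records citing `old` are replaced by copies
    re-encoded in `new`; records citing other standards pass through."""
    result = []
    for r in records:
        if r["metadata"]["interchange_standard"] == old:
            meta = dict(r["metadata"], interchange_standard=new, migrated_from=old)
            result.append({"content": convert(r["content"]), "metadata": meta})
        else:
            result.append(r)
    return result
```

Because every record declares its standard, a superseded format is handled once, by registering a converter, rather than record by record.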
How can the Reference Model be implemented?
We imagine two possible scenarios for the long term: in the first, each organization implements its own methods for capturing
metadata and encapsulating objects, while in the second the requirement is imposed on software developers and networks as a
consequence of standards adopted by their clients. In either case, any implementation must acknowledge that the level of
analysis and documentation at which computer mediated communications are evidence is that of the individual business
transaction. This means that we are not concerned with capturing metadata at higher levels, such as that of the resource, or
lower levels, such as that of a single document, data item/file, or computer system transaction. It also means that the data and
metadata we capture must always be related back to a specific transaction whether corporate or personal. It also requires that
we be able to incorporate multiple physical files within a single record, and any number of prior records within a new record.
Practically, one might view the implementation of metadata recording and management as a continuum. At one end, all of the
metadata is encapsulated, stored and transported with the record. The record in this case is physically self-explanatory. The
problem with this is that a high overhead is associated with every action taken with a record.
The opposite approach would be to store none of the metadata with the record. The metadata would be stored in a data base on
a kind of archival server and each record would contain a pointer to its metadata. While this approach avoids the overhead
associated with communicating records, it requires more sophisticated management and is susceptible to problems associated
with corrupt pointers decoupling a record from its metadata.
The actual implementation adopted by an organization will probably lie somewhere between these two extremes. In the
inter-organizational exchange of records, the first model must be used. For intra-organizational use it is
likely that an intermediate approach will be adopted. Some metadata will be encapsulated and carried with the record, particularly
metadata that will be used by subsequent processes or procedures. Other metadata such as citation of business rules, standards
and legal authority will be stored in a data base and referenced by pointers encapsulated with the record.
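The intermediate approach might be sketched as follows: metadata needed by subsequent processes travels inside the record, while shared metadata (business rules, standards, legal authority) sits in a store on an archival server and is referenced by pointer. The store, the pointer keys and the split between carried and referenced items are hypothetical. Resolving pointers explicitly turns the failure mode of the pointer-only extreme, a corrupt pointer decoupling a record from its metadata, into a detectable error rather than a silent loss.

```python
from typing import Any, Dict


class DanglingPointer(Exception):
    """Raised when a record points at metadata the server does not hold."""


class MetadataServer:
    """Stand-in for a data base of shared metadata on an archival server."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, metadata: Dict[str, Any]) -> str:
        self._store[key] = metadata
        return key                      # the key serves as the pointer

    def get(self, key: str) -> Dict[str, Any]:
        if key not in self._store:
            raise DanglingPointer(key)
        return self._store[key]


def full_metadata(record: Dict[str, Any], server: MetadataServer) -> Dict[str, Any]:
    """Combine referenced metadata with metadata carried in the record.
    Values carried with the record take precedence over referenced ones."""
    merged: Dict[str, Any] = {}
    for pointer in record.get("pointers", []):
        merged.update(server.get(pointer))
    merged.update(record["carried"])
    return merged
```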
Very few existing information systems are designed to execute business rules or document business processes. Therefore they
typically create a new computer record for a software transaction that involves no business transaction but
changes the data in the system (such as background saving of a file on which a user is working), yet create no record of common
business transactions that do not change data in the system but nevertheless require evidence (such as querying a database
for decision support). Implementers will need to impose the concept of business transactions, rather than that of systems
transactions, on their environments. In addition, they will need to interface with business process models to capture appropriate
It would be possible to design application software that did recognize business transaction boundaries, but the differences
between organizations would likely make implementing such software complex and maintaining its knowledge of local business
processes costly. Nevertheless, certain parameterized features of application systems can already be employed to ensure the
satisfaction of some of the functional requirements for recordkeeping. For example, word processing systems can support
corporate record creating requirements if the users of such systems exclusively employ style sheets defined in such a way as to
distinguish between transactions based on their process location and business purposes. Geographic information systems often
have reporting features that allow the user to create output files of all the relevant layers of data incorporated into a query.
At least in the short term, however, the need to create electronic records with metadata conforming to the reference model will
require systems implementers (possibly in cooperation with users) to construct traps outside of applications software in which they
can capture the metadata required for evidential transactions. Assuming no changes were made in applications systems,
implementers could capture some of the requisite data items in the user interface layer, for example by enhancing the information
captured when users sign on to the system so that authorizations, logical business location, and types of transactions allowed to
an individual are brought into the system memory for assignment to transactions as needed. They could then provide icons
representing the business tasks in which a user may engage (based on process data models and business rules of the
organization) rather than icons representing software applications.
Indeed the user interface could easily be designed so that users never open software applications directly, but rather they
engage special 'clients' which open and configure the application software in a way that utilizes its style sheets, macros and
self-documenting features for the particular business function in which the user is engaged. This way only clients representing
specific business processes can be admitted and the rules governing such uses can be imposed on the system. These clients
could then provide the necessary metadata to identify the business transaction when a record of it is created.
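The user interface strategy can be sketched as a session populated at sign-on, plus business-task clients that stamp each transaction with that context instead of letting users open applications directly. The directory, profile fields and task names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Session:
    """Context brought into memory at sign-on and held for the session."""
    user: str
    business_location: str            # logical place in the organization
    allowed_tasks: FrozenSet[str]     # business tasks this user may perform


# Hypothetical directory consulted at sign-on.
DIRECTORY: Dict[str, Tuple[str, FrozenSet[str]]] = {
    "jsmith": ("Purchasing/Contracts",
               frozenset({"issue_purchase_order", "query_vendor_history"})),
}


def sign_on(user: str) -> Session:
    location, tasks = DIRECTORY[user]
    return Session(user, location, tasks)


def run_task(session: Session, task: str, content: str) -> Dict[str, str]:
    """A business-task client: refuse tasks the user is not authorized for,
    and attach identifying metadata to the record of the transaction."""
    if task not in session.allowed_tasks:
        raise PermissionError(f"{session.user} may not perform {task}")
    return {"content": content, "creator": session.user,
            "business_location": session.business_location,
            "business_task": task}
```

Note that a query such as `query_vendor_history` yields a record here even though it changes no data, which is the kind of business transaction the earlier paragraphs observe most systems fail to capture.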
Alternatively, or in conjunction with user interface interventions, implementers could develop an "evidence" service in the
Application Platform Interface to capture transactions addressed in particular ways and assign them metadata attributes required
to ensure their authenticity and survival. Such a service could be customized with the rules of a particular business so as to
identify transactions of specific types and adhere to the appropriate retention periods, access and use rights, and filing rules.
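An "evidence" service customized with an organization's rules might classify each captured transaction and assign retention, access and filing attributes. In this sketch the rule table, transaction types and retention periods are invented for illustration; real values would come from the organization's retention schedules and policies.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Rule:
    retention_years: int
    access: str
    file_under: str


# Hypothetical business rules keyed by transaction type.
RULES: Dict[str, Rule] = {
    "purchase_order": Rule(7, "purchasing staff", "Purchasing/Orders"),
    "personnel_action": Rule(50, "HR only", "Personnel/Actions"),
}


def capture(transaction: Dict[str, str]) -> Optional[Dict[str, object]]:
    """Assign evidence metadata to transactions of known business types.

    Returns None for types the rules do not cover (for example a background
    autosave). A real service might route unknown types for review instead.
    """
    rule = RULES.get(transaction.get("type", ""))
    if rule is None:
        return None
    return dict(transaction, retention_years=rule.retention_years,
                access=rule.access, file_under=rule.file_under)
```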
Finally, information systems staff could identify components in the systems architecture, from storage devices serving as corporate
file rooms to telecommunication switches linking to other LANs, WANs or systems, which would assign yet other metadata attributes
to records when they were communicated. Thus records filed in certain places and under particular headings would be given
metadata attributes upon arrival at the filing server application. Records deemed to be lacking appropriate metadata to leave an
organization's boundaries, or even to pass outside the LAN serving one work group, could be assigned those attributes or
returned to sender to provide the necessary descriptors. In conjunction with corporate policy and procedure, individuals could
participate in completing document routing and description templates for all transactions, or be required to default to pre-set
templates for a series of identical transactions.
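The architectural checkpoints described above can be sketched as two functions: a filing server that adds attributes implied by where a record is filed, and a boundary gateway that returns to sender any record lacking the descriptors required to cross a given boundary. The location defaults and required-descriptor sets are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical attributes assigned on arrival at particular filing locations.
LOCATION_DEFAULTS: Dict[str, Dict[str, str]] = {
    "Contracts": {"retention": "10 years", "access": "legal and purchasing"},
}

# Hypothetical descriptors required before a record may cross each boundary.
REQUIRED_TO_LEAVE: Dict[str, List[str]] = {
    "workgroup": ["record_id", "creator"],
    "organization": ["record_id", "creator", "access", "authorized_by"],
}


def file_record(record: Dict[str, str], location: str) -> Dict[str, str]:
    """Filing server: add location-based attributes without overwriting
    attributes the record already carries."""
    return {**LOCATION_DEFAULTS.get(location, {}), **record}


def cross_boundary(record: Dict[str, str], boundary: str) -> Tuple[str, List[str]]:
    """Gateway: forward the record, or return it to sender with the list of
    missing descriptors."""
    missing = [k for k in REQUIRED_TO_LEAVE[boundary] if not record.get(k)]
    return ("returned_to_sender", missing) if missing else ("forwarded", [])
```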
One of the questions that research must answer is whether some metadata elements are easier to capture in certain
layers of the architecture than in others. We believe that certain of the metadata required for recordness, specifically that pertaining
to compliant organizations and accountable systems, will best be documented by means other than transaction-level metadata
capture, because capturing such systemic proofs at the transaction level is inherently inefficient and because it is difficult for a
system to ascertain the state of organizational compliance or its own logical correctness.
In sum, it appears that through a combination of policy, implementation and design, and standards, an organization can ensure
that only "business acceptable communications" are generated from its information systems, maintained in its recordkeeping
systems, and made available through its information retrieval systems.
Implementers will recognize that when a user requests a record, a copy of that record is passed to the information retrieval
sub-system, but if the user opens the record contents under the control of another application, the contents are incorporated
within the application in which he or she is working and will become part of the contents of a new transaction. If the user intends
only to append or forward a record, this does not involve opening the record and may, in some environments, be accomplished
by pointing to it, while in others it will involve incorporating an encapsulated version of the record within the current transaction.
When users generate a communication in this environment, a "Business Acceptable Communication", encapsulated by metadata
necessary to ensure its integrity and longevity, would presumably be split off from the information system stream and sent to a
recordkeeping system or warehouse where it would be kept intact. Another version of the transaction would normally remain
within the application environment where it would be available for further manipulation, update and editing, or would do the jobs
of updating databases, launching procedures or generating reports in that environment. From the perspective of the business,
all data in information systems can be treated as a convenience copy, to be kept as long as required for on-going business
purposes and to be altered as desired to increase efficiency.
When needed, records from recordkeeping systems may be copied to information systems which require their content,
but the record itself will never be deleted from, or changed within, the recordkeeping system except with specific records
disposition authority. Recordkeeping systems will store and provide access to metadata encapsulated objects. Sometimes the
purposes of such access will be to make use of the data content of records in subsequent business transactions which create
their own records. These transactions will take place through application systems, which like most information systems, are not
designed to make or keep records.
Sometimes the purpose of access is simply to view the records outside of the business purposes of the creating organization.
Traditionally, such reference uses of archives have not created new records, although logically they are the record of the use of
the archives, which is itself a function of the organization. In an evidential environment, viewing a record in conjunction with a
business transaction creates a new record for the recordkeeping system and leaves a transaction trail in the original record.
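The behaviour described in the last three paragraphs (records kept intact, users handed copies, deletion only under disposition authority, and use within a business transaction captured as a new record with a trail on the original) can be sketched as a small in-memory store. This is a hypothetical illustration, not a design from the Reference Model: the class name, the `rec-N` identifiers and the choice to keep the trail beside the encapsulated record, rather than inside it, are all assumptions.

```python
import itertools


class RecordkeepingStore:
    """Hypothetical sketch of a store that keeps captured records intact.

    Retrieval hands out copies; use within a business transaction is
    itself captured as a new record and noted in a trail kept beside the
    original; deletion requires a records disposition authority.
    """

    def __init__(self):
        self._records = {}    # record id -> (content, metadata)
        self._trails = {}     # record id -> list of (user, transaction)
        self._serial = itertools.count(1)

    def __len__(self):
        return len(self._records)

    def capture(self, content, metadata):
        rid = f"rec-{next(self._serial)}"
        self._records[rid] = (bytes(content), dict(metadata))
        self._trails[rid] = []
        return rid

    def retrieve(self, rid, user, business_transaction=None):
        content, metadata = self._records[rid]
        if business_transaction is not None:
            # Business use: note it on the original's trail and capture
            # a new record documenting the use itself.
            self._trails[rid].append((user, business_transaction))
            self.capture(b"", {"type": "use", "of": rid, "by": user,
                               "transaction": business_transaction})
        # Callers get a convenience copy; editing it leaves the record unchanged.
        return content, dict(metadata)

    def trail(self, rid):
        return list(self._trails[rid])

    def dispose(self, rid, authority=None):
        if not authority:
            raise PermissionError("disposition requires a records disposition authority")
        del self._records[rid]
```

Plain reference viewing (no `business_transaction`) leaves no trail here, mirroring the traditional practice the text describes; an implementation that chose to record reference uses as well would simply drop that condition.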
There is no specific computing model that must be employed in the maintenance of recordkeeping systems, although the
discussion of communicated transactions to this point may seem to have used the terminology of object orientation. Once these
transactions are communicated (typically by a serial process, but always in such a way as to produce a serial record on the
receiving end, and hence an encapsulated object), they can be treated as if the metadata were structured database information
in a standard relational, hierarchical or flat-file database management system. A simple method for doing this securely would be to
store a hash of the contents in the metadata record and a hash of the metadata record with the contents.
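The cross-hashing idea in the last sentence can be sketched as follows. This is a minimal sketch under assumptions not in the paper: SHA-256 from Python's standard `hashlib` stands in for "a hash", metadata is serialized as key-sorted JSON so the same metadata always hashes identically, and the function names `bind` and `verify` are hypothetical.

```python
import hashlib
import json


def _sha256(data):
    return hashlib.sha256(data).hexdigest()


def _canonical(metadata):
    # Sorted keys and fixed separators give one byte string per metadata dict.
    return json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")


def bind(content, metadata):
    """Return (metadata_row, content_row) cross-linked by digests.

    The metadata row gains a digest of the contents; the content row is
    stored with a digest of the completed metadata row.
    """
    meta_row = dict(metadata, content_sha256=_sha256(content))
    content_row = {"content": content, "metadata_sha256": _sha256(_canonical(meta_row))}
    return meta_row, content_row


def verify(meta_row, content_row):
    """True only if neither row has changed since bind() produced them."""
    return (meta_row.get("content_sha256") == _sha256(content_row["content"])
            and content_row.get("metadata_sha256") == _sha256(_canonical(meta_row)))
```

The two rows can then live in ordinary database tables. A bare digest detects alteration of either row, but anyone able to rewrite both rows could also recompute both digests; resisting deliberate tampering would call for a keyed hash such as HMAC, or a digital signature, over the same bytes.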
Logically, metadata content must either follow an external standard or contain its own declarations (i.e., meta-metadata). It
would be far more efficient for society at large if, instead of requiring individual organizations to implement systems in ways
that supported the requirements for evidence, a standard for communications could be adopted that placed the burden for
creating metadata encapsulated objects on the application software and network software developers.
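The either/or rule in the first sentence above (metadata cites an external standard, or carries its own declarations) can be illustrated with a small lookup. This is a hypothetical sketch: the `KNOWN_STANDARDS` registry, the standard name `example-records-std`, and the `standard`/`declarations`/`values` keys are invented for illustration and do not correspond to any real metadata standard.

```python
# Hypothetical registry of external standards: element name -> datatype name.
KNOWN_STANDARDS = {
    "example-records-std": {"record_id": "string", "created": "date"},
}


def declarations_for(metadata):
    """Return the element declarations that govern a metadata record."""
    if "standard" in metadata:
        try:
            return KNOWN_STANDARDS[metadata["standard"]]
        except KeyError:
            raise ValueError(f"unknown standard {metadata['standard']!r}") from None
    if "declarations" in metadata:
        # Self-describing case: the record carries its own meta-metadata.
        return metadata["declarations"]
    raise ValueError("metadata neither cites a standard nor carries declarations")


def undeclared_elements(metadata):
    """List elements in use that no declaration accounts for."""
    decls = declarations_for(metadata)
    return sorted(set(metadata.get("values", {})) - set(decls))
```

A record that fits neither case is rejected outright, since without a standard or its own declarations a receiving system has no basis for interpreting its elements.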
The definition of a standard for Business Acceptable Communications presumes the existence of software and services that can
use the metadata which must be associated with an object. Specific types of services, for example, are envisioned to follow up
on address information contained in the Terms and Conditions metadata layer and translate it into concrete prices, permissions, and
data views. The presumption is that Terms and Conditions metadata will be expressed in abstract categorical terms, not in
concrete terms, so that it can be processed correctly as the situational variables (inflation, changes in permissions based on the time
elapsed since the event, re-engineered business processes, etc.) change. The model for implementing control based on Terms
and Conditions is that a "resolver" will be put in place by the owner/creator of a record that is designed to operate against the
categorically expressed terms and conditions data in a dynamic manner. This allows the terms and conditions to be calculated
for the moment, based on the user, and sensitive to the conditions of use. It is envisioned that these applications will be
maintained by those interested in restricting rights, and their functionality can be ensured in part by establishing a mechanism that
allows users access to the records if no restrictive permissions manager is operating.
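A resolver of the kind described above might look like the following. This is a minimal sketch, not the paper's design: the categorical vocabulary (`access_class`, `restriction_years`, `fee_class`), the `OwnerResolver` class and the `request_access` function are hypothetical. It shows the two points the paragraph makes: concrete prices and permissions are computed at request time from categorical terms, and access defaults to open when no permissions manager is operating.

```python
from datetime import date

# Categorical terms as they might sit in a Terms and Conditions layer
# (hypothetical vocabulary).
EXAMPLE_TERMS = {"access_class": "restricted", "restriction_years": 30, "fee_class": "B"}


class OwnerResolver:
    """Hypothetical resolver run by a record's owner/creator.

    It turns categorical terms into a concrete decision at request time,
    so changes in prices or in elapsed time need no change to the record.
    """

    def __init__(self, fee_schedule, privileged_roles):
        self.fee_schedule = fee_schedule          # fee class -> current price
        self.privileged_roles = set(privileged_roles)

    def resolve(self, terms, user_role, created, today):
        # The restriction lapses on the anniversary `restriction_years` after creation.
        lapse = (created.year + terms["restriction_years"], created.month, created.day)
        lapsed = (today.year, today.month, today.day) >= lapse
        access = (terms["access_class"] == "open" or lapsed
                  or user_role in self.privileged_roles)
        price = self.fee_schedule[terms["fee_class"]] if access else None
        return {"access": access, "price": price}


def request_access(terms, user_role, created, today, resolver=None):
    """Consult the owner's resolver; with none operating, default to access."""
    if resolver is None:
        return {"access": True, "price": 0}
    return resolver.resolve(terms, user_role, created, today)
```

Because the record stores only `fee_class: "B"`, an owner who raises the price for class B changes what every future request is charged without rewriting any record, which is the advantage of categorical over concrete terms.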
The Reference Model for Business Acceptable Communications discussed here and proposed in the accompanying formal
presentation builds upon and extends work underway in standards committees in many areas. It attempts to provide a generic
structure and theoretical grounding for work items proposing metadata encapsulated objects as a tactic for interchange. While
it will doubtless be refined before a fully acceptable reference model is adopted, it is our hope that the formulation of this model
will place the functional requirements for evidence at the heart of any discussion of what makes a business communication acceptable.
The Reference Model acknowledges, but does not solve, some fundamental problems in the distributed network environment.
For example, a major concern is how an identifier uniquely assigned by one domain can be guaranteed to be unique when the object
is incorporated into a universe in which identifiers assigned by other domains are present. Obviously uniqueness can be ensured
by combining a unique identifier within a domain with a unique identifier for the domain. The problematic aspect of this is that
domain identifiers need to be truly unique to a person or organization, but we want to define a system in which the domain
identifier does not have to carry too much intelligence and yet can be meaningfully related to its successor and precursor
identifiers. Also, it must be possible to issue domain identifiers without serious overhead. Billions of unique business transactions
will flow daily through worldwide communications systems, within and between organizations and between individuals and/or
computers. It must be possible to uniquely identify all of these. Mechanisms for unique identification of computing systems and sources of
communications are being worked out for such open domains as the Internet (by the IETF), but they also need to be developed within
specific corporate communications contexts.
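The compound scheme mentioned above (a unique domain identifier joined to an identifier unique within that domain) can be sketched as follows. This is a hypothetical illustration: the `Domain` class, the `:` separator, and the registry of predecessor links are assumptions. Keeping the domain identifier opaque and recording succession in a registry, rather than in the identifier itself, is one possible way to relate successor and precursor identifiers without loading them with intelligence; it does not address how domain identifiers are allocated.

```python
import itertools

SEPARATOR = ":"


class Domain:
    """Hypothetical issuing domain: opaque domain id plus a local serial."""

    def __init__(self, domain_id, predecessor=None):
        if SEPARATOR in domain_id:
            raise ValueError("domain id may not contain the separator")
        self.domain_id = domain_id
        self.predecessor = predecessor   # id of the domain this one succeeds
        self._serial = itertools.count(1)

    def issue(self):
        # Unique across all domains as long as domain ids are themselves unique.
        return f"{self.domain_id}{SEPARATOR}{next(self._serial)}"


def split_identifier(identifier):
    """Recover (domain id, local serial) from a compound identifier."""
    domain_id, local = identifier.rsplit(SEPARATOR, 1)
    return domain_id, int(local)


def lineage(registry, domain_id):
    """Follow predecessor links from a domain back to its earliest precursor."""
    chain = [domain_id]
    while registry[chain[-1]].predecessor is not None:
        chain.append(registry[chain[-1]].predecessor)
    return chain
```

Issuing within a domain costs only a counter increment, which speaks to the overhead concern; the hard part the text identifies, guaranteeing that domain identifiers themselves are unique, is assumed rather than solved here.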
It may also be necessary to search for records that satisfy criteria based on their content, even though this is not essentially an
archival requirement. The Reference Model is designed to hold metadata that can satisfy such requirements, but it is not currently
populated by metadata designed to support Networked Information Discovery and Retrieval (NIDR). Recent work in this area by the
Coalition for Networked Information and by the U.S. library community may define structures within this cluster, although the problem
of defining what records are "about", rather than what they are "of", has been a vexing one since the advent of archives. The volume
of records created has always defied the cataloging of individual records, and the content description of records, which are not
created to be about their content but rather as a consequence of business transactions, tends in any case to be either misleading or
inadequate. Substantial practical research will continue to be required to determine how best to provide access to records of specific
kinds or to records documenting particular types of transactions.
Another concern is how, as a practical matter, best to monitor metadata values in order to make the necessary software
migrations at appropriate times in the life of records. Not only do we need to make sure to migrate the records to new structures
before the old ones are no longer supported, we need to make good decisions about logical mappings in order not to introduce
too much noise with every migration and ultimately lose the message in digital copying as surely as we did with multi-generational
copying of analog messages. Needless to say, some people also worry that these software migrations, if they continue to be needed
as often as once a decade or more often, will become too costly to support, and that as a consequence some records of
value will be abandoned. Within the environment in which recordkeeping takes place, stringent approaches to configuration
management will be essential to ensure that record documentation retains critical usable metadata.
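Monitoring metadata values to time migrations, as raised above, amounts to comparing each record's software-dependency metadata against known end-of-support dates. The sketch below is hypothetical throughout: the product names, versions, dates, record fields and the one-year lead time are invented for illustration. It treats an unrecognized dependency as due for review, on the view that a record whose dependency nobody can date is already at risk.

```python
from datetime import date, timedelta

# Hypothetical end-of-support dates for software that records depend on.
SUPPORT_ENDS = {
    ("WordProcessorX", "5.1"): date(1998, 12, 31),
    ("SpreadsheetY", "3.0"): date(2002, 6, 30),
}


def due_for_migration(records, today, lead_time=timedelta(days=365)):
    """Return sorted ids of records whose dependency loses support within lead_time.

    Records naming a dependency absent from SUPPORT_ENDS are also returned,
    since their migration deadline cannot be determined.
    """
    due = []
    for rec in records:
        end = SUPPORT_ENDS.get((rec["software"], rec["version"]))
        if end is None or end - today <= lead_time:
            due.append(rec["id"])
    return sorted(due)
```

Run against a collection on 1 March 1998, a record dependent on `WordProcessorX 5.1` (support ending within the year) and one with an unknown dependency would be flagged, while one dependent on `SpreadsheetY 3.0` would not. The lead time is the parameter through which configuration management would express how far ahead migrations must be planned.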
At the same time, it is noteworthy that the proposed approach to archiving and to maintenance of business acceptable
communications does not require us to include information about physical formats and media within the record metadata.
Rather, the environment in which records are kept will need to be one in which managers move data from one medium to
another as required to assure backup and preservation of the data. It is presumed that currently supported media will
always be used and that data will be transferred to current media in the normal course of operations. Documentation of the
data processing center's backup and recovery functions is not part of the model, because the model presumes that day-to-day
data management will be carried out responsibly.
1) The Functional Requirements for Recordkeeping project has compiled a database of "warrant" for the
requirements we have defined. The most up-to-date version of the requirements, specifications, production rules, metadata
standards, literary warrant and research papers on the variables in electronic recordkeeping in organizations are maintained on
the project WWW server at:
Examples of the kinds of sources from which "literary warrant" has been drawn include:
- Code of Federal Regulations, 36 CFR PART 1234 -- Electronic Records Management. Subpart C --
Standards for the Creation, Use, Preservation, and Disposition of Electronic Records
- Electronic Industry Data Exchange. ASC 12 Convention : Version 3 : Electronic Industry Data
Guidelines. Washington Publishing Co., 1994
- Federal Rules of Evidence. 1990
- Guttman, B. Computer security considerations in Federal procurements. National Institute of
Standards and Technology, NIST Pub 800-4
- Institute of Internal Auditors Research Foundation. Systems Auditability and Control Report.
Researched by Price Waterhouse, 1991
- ISO 9001: Quality systems - Model for quality assurance in design/development, production,
installation and servicing, 1987
- FIRMR : federal information resources management regulation. Washington, DC : U.S. General
Services Administration, Office of Information Resources Management, 1990
- McCormick on Evidence. 4th ed. by John William Strong, general editor. (St. Paul, Minn.: West
Pub. Co., 1992)
- Miller, Larry P., GAAS guide: a comprehensive restatement of all current promulgated generally
accepted accounting principles. San Diego : Harcourt Brace Professional Pub.; 1994
2) For an example, see: IEEE Mass Storage Systems Standards Technical Committee Metadata Project,
Second Meeting on Metadata for the Administration and Access of Stored Information, Austin, Texas, February 17-18, 1994.
Documents discussed at this meeting included:
- "The Intelligent Archive" (UCRL-TB-115079-6 Lawrence Livermore Laboratory, Carol Hunter Project Manager)
- "Whitepaper on Data Management", Robyne Sumpter, Lawrence Livermore Laboratory, Feb. 10, 1994
- "A Metadata Capability Supporting the Hierarchical Storage and Access of Large Abstract Data Entities", J. C. Almond and
Rekha Singhal, University of Texas CHPC
3) For example, see:
4) New York State Archives & Records Administration, Guidelines for the Legal Acceptance of Public
Records in an Emerging Electronic Environment (Albany, Dept. of Education, 1994) 35pp.
- Clifford Lynch, "The Integrity of Digital Information: Mechanics and Definitional Issues", Journal of the American Society
for Information Science, vol. 45, no. 10, December 1994, pp. 737-744;
- Peter Graham, "Intellectual Preservation in the Electronic Environment", Proceedings, Library Collections and Technical
Services 1992, pp. 18-32 (Chicago, ALA, 1992);
- Henry Perritt, "Public Information in the National Information Infrastructure", Report to the Regulatory Information Service
Center, General Services Administration and to the Administrator, Office of Information and Regulatory Affairs, Office of Management and Budget, 5/20/94
- Other activities, currently underway, to which the reference model seems relevant are the Research Libraries Group and
Commission on Preservation and Access Joint Task Force on Archiving Digital Documents, the Coalition for Networked
Information sponsored working group on Networked Information Discovery and Retrieval, and the National Institute of Standards
proposed Federal Information Processing Standard for "Record Description Records".
5) David Bearman, Electronic Evidence: Strategies for Managing Records in Contemporary Organizations
(Pittsburgh, Archives & Museum Informatics, 1994)
6) NHPRC grant #93-030, "Variables in the Satisfaction of Requirements for Electronic Records"
7) David Bearman and Ken Sochats, "Formalizing Functional Requirements for Recordkeeping"
unpublished draft paper included in University of Pittsburgh Recordkeeping Functional Requirements Project: Reports and
Working Papers (LIS055/LS94001) September 1994
8) David Bearman, Functional Requirements for Recordkeeping: Metadata Specification
(Unpublished Draft, 2/21/94)
9) References to:
- Datastream for Folder Interchange (ISO 161/17-WG6 NWI)
- Electronic Document Interchange (EDI) standards, including EDIFACT
- ATM protocols
- Spatial Data Interchange Format (SDIF) and DIGEST
10) David Bearman, "Electronic Records Management Guidelines: A Manual for Development and
Implementation" in United Nations, Administrative Coordinating Committee for Information Systems, Management of Electronic
Records: Issues and Guidelines (New York, UN, 1990), reprinted in Electronic Evidence, op. cit., fn. 5
11) There is a consensus that "preservation" in the electronic environment means refreshing. For an early,
but still sound, articulation of the reasons, see: Margaret Hedstrom, "Optical Disks: Are Archivists Repeating the Mistakes of the
Past?", Archives & Museum Informatics Newsletter, vol. 2 (1988), p. 52; also her "Electronic Archives: Integrity and Access in the
Networked Environment" in Stephanie Kenna and Seamus Ross, eds., Networking in the Humanities (London, Bowker/Saur,
12) We believe this model takes into account requirements such as those implied by the plans of the
German Government for its move from Bonn to Berlin over the next decade. In that planning process it has become obvious
that much of the communication between governmental departments will take place electronically between individuals with little
if any face to face contact who will require secure and authenticated communications and the ability to make and keep records.
In defining an architecture to support these requirements, the PoliTeam, established for this purpose, defined an architecture that
could take advantage of the functional requirements for recordkeeping, but they did not identify those requirements. In reforms in
the Dutch civil service over the past several years, earlier opening of government records was one objective, and the studies
undertaken to support this goal revealed a need to begin to plan for electronic communications systems. In their reforms, the
Dutch government has begun to take advantage of the functional requirements for recordkeeping and is encountering many of
the same issues of metadata management being addressed by this paper. The Canadian government has been defining
"Guidelines on the Management of Electronic Records in the Electronic Work Environment" as a component of the "Electronic
Work Environment (EWE) Vision" being promulgated by the Canadian Treasury Board. Popularizations of the implications of
these activities have been published recently by Terry Cook in "It's 10 O'Clock, Do You Know Where Your Data Are",
Technology Review, January 1995; also his "Electronic Records, Paper Minds: The Revolution in Information Management and
Archives in the Post-Custodial and Post-Modernist Era", Archives & Manuscripts, vol. 22, no. 2, pp. 300-328
13) Clifford Lynch presentation (unpublished) at the Coalition for Networked Information meeting, Spring
14) Bill Arms presentation (unpublished) at the Coalition for Networked Information meeting, Spring 1995
APPENDIX 1. FUNCTIONAL REQUIREMENTS FOR EVIDENCE IN RECORDKEEPING
APPENDIX 2. PRODUCTION RULE REPRESENTATION OF REQUIREMENTS FOR EVIDENCE
APPENDIX 3. REFERENCE MODEL FOR BUSINESS ACCEPTABLE COMMUNICATIONS