Digital Preservation Basics: Concepts
What is Digital Preservation?
Digital Preservation refers to a broad range of managed activities designed to preserve the accessibility of digital materials and protect them from loss and obsolescence for as long as necessary.
Digital materials consist of:
1) Digital representations of physical or analog materials that have been digitized
2) Digitally created materials referred to as "born-digital" (no original hardcopy)
Whether materials are born-digital or obtained through digitization, the data are managed identically in Digital Preservation.
Digital Preservation Strategies:
Refreshing is the transfer of data from one storage medium to another of the same type (e.g. from old CD to newer CD) in order to avoid the degradation or alteration of data (e.g. bit rot, media damage).
Replication is the creation of duplicate copies of data (not co-located) in order to protect against accidental loss or alteration of data; having multiple copies of a file is safer than keeping only one.
Migration is the conversion of data from an outdated format or operating system to a newer one in order to avoid obsolescence; the newer version is compatible with current software and applications.
Emulation is the replication of an obsolete system's functionality on a completely different and modern system so that the original software and data can still be run or rendered (e.g. playing an Atari 2600 game on Windows through an emulator).
Addition of metadata is the strategy of attaching a file containing metadata (i.e. data about the data) in order to provide preservation information such as technical specifications, provenance, structural hierarchies, searchable descriptions, and rights management and restrictions.
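Two of the strategies above, replication and the addition of metadata, can be illustrated with a short Python sketch. It is a minimal example under stated assumptions rather than a recommended workflow: the function names (replicate, write_sidecar, demo_replication), the ".metadata.json" sidecar naming, and the metadata fields are all hypothetical, and the "site_a" and "site_b" directories merely stand in for storage locations that would not be co-located in practice.

```python
import filecmp
import json
import shutil
import tempfile
from pathlib import Path


def replicate(source: Path, replica_dirs: list[Path]) -> list[Path]:
    """Copy source into each directory and confirm each copy matches byte for byte."""
    copies = []
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = Path(shutil.copy2(source, directory))
        # shallow=False forces a comparison of file contents, not just size and timestamps.
        if not filecmp.cmp(source, copy_path, shallow=False):
            raise IOError(f"Replica differs from source: {copy_path}")
        copies.append(copy_path)
    return copies


def write_sidecar(data_file: Path, metadata: dict) -> Path:
    """Write metadata to a JSON file stored next to the data file (hypothetical naming)."""
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


def demo_replication() -> tuple[list[str], dict]:
    """Replicate one file to two stand-in locations and attach illustrative metadata."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "report.txt"
        source.write_text("example content", encoding="utf-8")
        copies = replicate(source, [root / "site_a", root / "site_b"])
        sidecar = write_sidecar(
            source, {"title": "Example report", "rights": "unknown"}
        )
        loaded = json.loads(sidecar.read_text(encoding="utf-8"))
        return [copy.parent.name for copy in copies], loaded
```

Calling demo_replication() returns the names of the two replica locations and the metadata read back from the sidecar file. Comparing the copies against the source at creation time catches a failed copy immediately, but it does not replace periodic checks on the stored replicas.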
Preserve... as long as necessary
In order to maintain future access to digital files, a number of things must be preserved:
Bits/bitstream (the ones and zeroes)
File formats (how data are packaged)
Metadata (information about the data)
Applications (read/display the data)
Media/hardware (store the data)
Digital materials are generally less stable and have a shorter life span than physical or analog materials. Although there are advantages to the digital form, such as the ease and convenience of reproduction, transmission, and access, digital materials create serious challenges for those who use and manage them:
- Access to digital materials is dependent upon specific software and hardware
- Advances in technology are rapidly making today's materials/systems obsolete
- Media that store digital materials are fragile and quickly degrade/deteriorate
- Digital materials can easily be altered, corrupted, stolen, lost, or deleted
- Maintenance, security, and storage of digital materials can become costly
Consequently, the preservation and management of digital materials require long-term strategies and activities that differ significantly from those of physical and analog materials.
(See Preservation Strategies)
Digital Preservation Key Terms
Checksum is a value computed from a file's digital data by an algorithm, producing a hash (a fixed-length alphanumeric code that is effectively unique to that data) for the purpose of detecting possible errors or changes to a file during its storage or transmission. If a newly computed hash is identical to one computed previously, the file can be considered unaltered. This comparison is known as a "Fixity Check."
IE = Intellectual Entity is a set of content considered a single intellectual unit for the purpose of management and description; examples might include a single photograph, a 10-page report, a 400-page book, a 2-hour movie, a short audio recording, a webpage, etc. It may have multiple digital representations (e.g. the TIF, JPG, and PDF versions of the same image are one IE).
Master file is an "archival/preservation" or "production" version of a digital file used as a data source to produce derivative files for access, delivery, or viewing.
Metadata is information about an item (analog or digital) at any level (component, object, or collection). Metadata describing digital items can be embedded in a file, packaged with a group of files, or accessed externally. There are various categories of metadata. Administrative Metadata is data used to manage the digital content, such as rights and permissions; it includes Preservation Metadata, which supports the retention of content by detailing provenance and authenticity, and Technical Metadata, which documents the technical characteristics of the digital content and the processes used to create it. Descriptive Metadata is bibliographic information used to describe the intellectual content for the purposes of identification and discovery. Structural Metadata is data that establishes the relationships among the components of a complex digital item, for example the chapters and pages of a book.
OAIS stands for Open Archival Information System and is an effective reference model for digital archives. It provides a foundational framework for institutions to follow in establishing digital-preservation operations and architectures. In addition, it serves as a standard for terminology, techniques, and strategies among those organizations participating in the preservation of digital content.
SIP = Submission Information Package is an OAIS term that refers to a packaged set of files and metadata that is delivered to the digital storage system for ingest into the repository.
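The fixity check described under Checksum can be sketched in Python using the standard hashlib module. This is a minimal illustration, not a prescribed procedure: SHA-256 is chosen here as an assumption (the text does not name an algorithm), and the function names (sha256_of, build_manifest, fixity_check, demo_fixity) and the idea of recording checksums in a per-package manifest are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the package) to its checksum."""
    return {
        path.relative_to(package_dir).as_posix(): sha256_of(path)
        for path in sorted(package_dir.rglob("*"))
        if path.is_file()
    }


def fixity_check(package_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the files that are missing or whose checksum no longer matches."""
    failures = []
    for rel_path, expected in manifest.items():
        target = package_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures


def demo_fixity() -> tuple[list[str], list[str]]:
    """Record checksums, alter one file, and run the fixity check before and after."""
    with tempfile.TemporaryDirectory() as tmp:
        package = Path(tmp)
        (package / "image.tif").write_bytes(b"original image bytes")
        (package / "notes.txt").write_text("descriptive notes", encoding="utf-8")
        manifest = build_manifest(package)
        before = fixity_check(package, manifest)
        (package / "notes.txt").write_text("descriptive notes, altered", encoding="utf-8")
        after = fixity_check(package, manifest)
        return before, after
```

In demo_fixity() the first check reports no failures, and the second reports only the file that was changed after its checksum was recorded. A repository would typically store the recorded checksums with the package's metadata and repeat the check on a schedule.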