This content standard was adapted from "Metadata for Administration and Preservation of Digital Images (MAPDI)" prepared by Dr. Helen R. Tibbo and Claire Eager with assistance from a preservation metadata working group at the School of Information and Library Science for the State Library of North Carolina, Department of Cultural Resources, NC Exploring Cultural Heritage Online (ECHO) program through an LSTA Grant in September 2002. For more information on that project, please see http://www.archimuse.com/mw2003/papers/tibbo/tibbo.html
Additional thanks go to Michael Adamo at the Digital Production Center, Perkins Library, Duke University for his assistance on Checksum utilities and directions.
INTRODUCTION
Preservation metadata can be elusive because it is so often embedded in descriptive structures, in such categories as condition or conservation history. The advent of digital surrogates created through digitization projects, though, puts preservation metadata on the front burner. These digital surrogates require a host of preservation information about their creation in order to build in long-term sustainability. With the increasing pace of technological innovation, hardware and software obsolescence worries, and storage medium changes, it is essential to maintain accurate and complete data regarding your digital production. This will ensure effective migration and thus underwrite your sustainability efforts. In addition, it ensures that NC ECHO's "Scan Once Methodology" can be used to full effect, given the ability to preserve not only your original materials but also the digital surrogates that you create (see NC ECHO's Guidelines for Digitization: 4. Digital Production).
What is Preservation Metadata?
Preservation metadata for digital objects is just one aspect of your metadata program. This content standard only addresses the metadata needs for the preservation of your digital objects. Descriptive, analytical, administrative, and structural metadata are handled through other metadata standards. This should not be considered a viable replacement of such standards as Dublin Core, EAD, or other metadata structures supported by NC ECHO's Metadata Initiatives.
Where Do I Keep My Preservation Metadata?
Preservation metadata is internal information retained by an institution responsible for the maintenance of digital surrogates. This metadata format is not intended to be shared in consortial efforts but to assist institutions in the long-term sustainability of their digital objects. It will also ensure that when a consortial digital repository is created for NC ECHO, each institution will have maintained the appropriate metadata to deposit its digital objects for long-term preservation. Because of this, Preservation Metadata for Digital Objects is system-neutral. In other words, preservation metadata may be recorded in a variety of formats, such as an Excel spreadsheet, an Access database, or other similar tools. This standard has been created to ensure that the preservation metadata maintained by an institution will provide information that is essential or useful in the long-term sustainability of its digital objects.
Institutions currently using a collection management system or digital asset management system for digital objects may very well have addressed the issue of preservation metadata within their existing system. If you are using such a system, we recommend that you review the content standard that follows to ensure that your system is recording at least the required data elements specified in the standard.
For those institutions that do not already maintain a system that accounts for preservation metadata, an Access database tool is provided by NC ECHO for the maintenance of the preservation metadata. Adoption of this particular tool is not required by partner institutions. It is provided for those institutions that do not already have a system in place. For information about this tool, including implementation guidelines, please see PMDO Access Tool (http://www.ncecho.org/dig/pmdo.shtml). Note that NC ECHO does not assume responsibility for the maintenance of individual instantiations of this tool. It is being provided for the convenience of institutions but individual implementations are not supported by NC ECHO. While we will provide what guidance we can, institutions will be responsible for maintaining their own databases.
This tool and content standard were initially developed through an NC ECHO grant by Dr. Helen R. Tibbo at the School of Information and Library Science at the University of North Carolina at Chapel Hill. While adjustments have been made to each, the foundation created by Tibbo, graduate assistant Claire Eager, and her advisory team provides the backbone of this standard.
PURPOSE OF DIGITAL OBJECTS
It is expensive for institutions to go back and re-digitize their holdings. Few ever do so. In addition, many originals could suffer from the handling and exposure to bright light required by digitization. Therefore, it is best to simply "scan once," create a master image, and make any future duplicates from it. While using the term "image" here, this would also refer to other kinds of digital objects, such as sound and video files. As with the "scan once" terminology, this theory has grown from the initial attention to the creation of digital images to the creation of digital objects.
The Master image: The highest quality copy of a digital image, often called the master image, is expected to be a quality surrogate of the original. As such, it should represent the un-manipulated original and be created at a high resolution and stored in an uncompressed format (usually TIFF). High resolution equals large amounts of information captured, and large amounts of information captured usually equal a higher quality digital image. Higher quality digital images have a longer life and are more versatile. It is the master image that holds the promise of versatility and longevity. From it, high quality prints or publications might be made as well as derivatives for a variety of uses.
Derivative images: Access images are lower-resolution copies derived from the master by using a "save as" function and changing the storage format and resolution. They may be of varying quality and are generally manipulated for better display on the screen or page (cropping, resizing, etc.). Additional images, such as "thumbnails" (even lower-resolution copies), may also be created from the master or access image. These thumbnails allow for even quicker downloads of pages and faster retrieval of large numbers of images.
| LEVELS OF SCAN | FILE FORMAT | USED FOR | ALTER? |
| Master image | TIFF | Long-term storage or print | Do not alter, or resize, or compress |
| Access image | TIFF or JPEG | Screen display or print | Taken from the master, it is altered for presentation over the Web or other uses. |
| Thumbnail | JPEG or GIF | Screen display | Taken from access, reduced size but not altered otherwise. |
Master images must be of the highest quality. Web images do not require such stringent quality controls. But before compromising on image quality, consider the cost of migrating the image. Because migration is costly, it is far sounder to migrate a high quality (master) image than one of lesser quality. All digital objects will have to be migrated if kept long enough. While the primary use of images in North Carolina ECHO is focused on Web access, repositories need to be mindful of future use, remembering the fragile nature of the originals. For more information about the creation and storage of these images, please see the NC ECHO Guidelines for Digitization (http://www.ncecho.org/dig/digguidelines.shtml).
The importance of this tripartite approach to digital objects cannot be overstated. In creating preservation metadata for your digital objects, keep in mind that you should retain information about each version of a digital image: Master, Access, and Thumbnail. The Purpose element is thus required so that you may distinguish between the different versions of the same image.
PMDO CONTENT STANDARD: ELEMENTS AT A GLANCE
* Master images have special requirements: They should not be compressed and should not contain a watermark. Therefore the Compression and Watermark fields would have a value of NO for master images, and Compression Type and Compression Degree would not apply to master images. For more information regarding this, please see Master Images in the NC ECHO Guidelines for Digitization.
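The master-image requirements above can be checked mechanically once the metadata is in a record. The following is a minimal sketch only: the dictionary keys (`purpose`, `compression`, `watermark`, `compression_type`, `compression_degree`) and the function name are hypothetical illustrations, not field names taken from the PMDO Access tool.

```python
def master_rule_violations(record):
    """Return a list of problems if a Master record breaks the rules above.

    `record` is a plain dict; its key names are assumptions for this sketch.
    A missing Compression or Watermark value is treated as NO.
    """
    problems = []
    if record.get("purpose") != "Master":
        return problems  # the rules only constrain master images
    if record.get("compression", "NO").upper() != "NO":
        problems.append("Compression must be NO for a master image")
    if record.get("watermark", "NO").upper() != "NO":
        problems.append("Watermark must be NO for a master image")
    for name in ("compression_type", "compression_degree"):
        if record.get(name):
            problems.append(f"{name} does not apply to a master image")
    return problems
```

An empty list means the record is consistent with the master-image rules; Access and Thumbnail records are never flagged by this check.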
PMDO CONTENT STANDARD ELEMENTS
IDENTIFYING THE DIGITAL OBJECT |
| Digital Object ID (Required) |
| Definition: A unique identifier for the digital object generated by the repository. This identifier is the filename without extension of the digital object. |
Input guidelines:
- Enter filename of the digital object without file extension: "object1" NOT "object1.jpg"
- Should be an alphanumeric structure.
- Do not use special characters such as >, <, &, #, ?, =, +, etc. or white spaces.
|
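As an illustration of the input guidelines for Digital Object ID, the sketch below derives an ID from a filename and rejects white space and special characters. It assumes that "alphanumeric structure" permits letters, digits, hyphens, and underscores; that reading, the pattern, and the function name are assumptions for this example, so adjust them to local practice.

```python
import os
import re

# Assumption: letters, digits, hyphens, and underscores are acceptable.
_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")

def digital_object_id(filename):
    """Derive a Digital Object ID from a filename by dropping the extension."""
    stem, _extension = os.path.splitext(os.path.basename(filename))
    if not _ID_PATTERN.match(stem):
        raise ValueError(f"ID contains spaces or special characters: {stem!r}")
    return stem
```

For example, `digital_object_id("object1.jpg")` returns `"object1"`, while a name such as `"my file.tif"` raises `ValueError`.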
| Title of Original (Required) |
| Definition: Natural language title of the source object being digitized, independent of the number of digital images required to make a digital surrogate. Do not confuse with Title of Digital, which represents the title of the item being represented by a single digital object. Note that if an item can be represented by a single digital image, Title of Original and Title of Digital (below) may be the same value. |
| Note that this field is included in most descriptive metadata systems and that the value entered here should be identical to your descriptive system. |
Input guidelines:
- Enter the natural language title either supplied by the original or created by the institution.
- Drop initial articles such as "A", "An", or "The"
|
| Title of Digital (Required) |
| Definition: Natural language title of the item represented by a single digital object. Do not confuse with Title of Original which represents the title of the original item represented by one or many digital objects. Note that if an item can be represented by a single digital object, Title of Original (above) and Title of Digital will be the same value. |
Input guidelines:
- Enter the natural language title either supplied by the original or created by the institution.
- Do not include initial articles such as "A", "An", or "The"
|
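The guideline on initial articles for both title elements can be applied consistently with a small helper. This is a sketch under the assumption that only the English articles named above need handling; the function name is hypothetical.

```python
_ARTICLES = ("a", "an", "the")

def drop_initial_article(title):
    """Remove a leading "A", "An", or "The" from a title, if present."""
    words = title.strip().split(None, 1)
    # Only drop the article when something follows it.
    if len(words) == 2 and words[0].lower() in _ARTICLES:
        return words[1]
    return title.strip()
```

Because the check compares whole words, a title that merely begins with the same letters (for example, one starting with "Theodore") is left unchanged.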
| Local Repository ID (Strongly Recommended, if applicable) |
| Definition: Local identification for institutions that have digital objects generated or held by subdivisions of the institution. Use of this element allows institutions to differentiate digital objects between divisions within their institution. |
Input guidelines:
- Use consistent coding for this established by your institution.
- Should be an alphanumeric structure.
- Do not use special characters such as >, <, &, #, ?, =, +, etc. or white spaces.
|
| Collection Source (Strongly Recommended, if applicable) |
| Definition: Title of the collection from which the originals were derived to produce the digital objects. This element allows you to track which objects from a collection have been digitized. |
Input guidelines:
- Enter the natural language title either supplied by the original or created by the institution.
- Do not include initial articles such as "A", "An", or "The"
|
| Project ID (Strongly Recommended, if applicable) |
| Definition: Local identification for institutions that have digital objects generated for specific projects. This element allows more consistent management of your digital assets. |
Input guidelines:
- Use consistent coding for this established by your institution.
- Should be an alphanumeric structure.
- Do not use special characters such as >, <, &, #, ?, =, +, etc. or white spaces.
|
CREATING THE DIGITAL OBJECT |
| Digital Creation Date (Required) |
| Definition: Date of creation for the digital object. This should be expressed in ISO 8601 date language (YYYYMMDD) and can be automatically generated by many systems. |
Input Guidelines:
- Use ISO 8601 format for recording date information (YYYYMMDD)
- Record year, month and day for more flexible management.
|
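Where dates are generated or checked by a script rather than typed by hand, the YYYYMMDD form can be produced and validated as sketched below. The function names are hypothetical; only the date format itself comes from this standard.

```python
from datetime import date, datetime

def to_pmdo_date(d):
    """Format a date as the ISO 8601 basic form YYYYMMDD."""
    return d.strftime("%Y%m%d")

def parse_pmdo_date(text):
    """Parse a YYYYMMDD string, raising ValueError if it is not a real date."""
    if len(text) != 8 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"expected 8 digits (YYYYMMDD), got {text!r}")
    return datetime.strptime(text, "%Y%m%d").date()
```

Parsing rejects both wrongly punctuated values such as `"2002-09-05"` and impossible dates such as `"20020230"`, which helps keep the field consistent across records.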
| Digital Creator (Required) |
| Definition: Creator (individual) of the digital object. |
Input Guidelines:
- Enter Last name, First name in a consistent manner: "Doe, Jane" or "Smith, John." Do not enter "Smith, John" on one record and "Smith, Johnny" on another.
- This can be set as a default in many systems if only one person is generating digital objects.
|
| Capture Hardware (Strongly recommended) |
| Definition: The hardware used to capture the digital object. Can include a scanner or digital camera. It is recommended to provide make and model. This can be considered relatively static information and defaults can be set with many systems. |
Input guidelines:
- Enter make and model in free text form.
|
| Capture Hardware Accessories (Strongly recommended) |
| Definition: Any hardware accessories, such as a special digital camera lens or lights used. |
Input Guidelines:
- Free text description; can include make and model if appropriate.
|
| Capture Software (Strongly recommended) |
| Definition: The name and version of the software used to capture the digital object. Do not confuse with Manipulation Software. Can be set as default once software is determined. |
Input Guidelines:
- Name and Version of capture software in free text form.
|
| Capture Software Settings (Strongly recommended) |
| Definition: Any settings used in the creation of the object, such as exposure, color balance, or resizing. This information will be software-specific in terminology, etc. and is therefore connected to the software recorded in Capture Software element. |
Input Guidelines:
- Enter settings in free text form using the vocabulary specific to your Capture Software.
|
| Manipulation Software (Strongly recommended) |
| Definition: The name and version of the software used to manipulate the digital object after capture. Do not confuse with Capture Software which accounts for the software used to capture the digital object. Some software will perform both capture and manipulation functions. This should be listed in each of the Software fields. |
Input Guidelines:
- Name and Version of Image Manipulation Software in free text form.
|
| Manipulation Software Settings (Strongly recommended) |
| Definition: Any settings used in the manipulation of the image, such as exposure, color balance, or resizing. This information will be software-specific in terminology, etc. and is therefore connected to the software recorded in Manipulation Software element. |
Input Guidelines:
- Enter settings in free text form using the vocabulary specific to your Manipulation Software.
|
| Resolution (Required) |
| Definition: Resolution of the digital object in dots per inch for images; kilohertz for sound files. |
Input Guidelines:
- For images, enter the resolution value in dots per inch (dpi)
- For sound files, enter the resolution value in kilohertz (kHz)
|
| Compression (Strongly recommended) |
| Definition: Specification of whether or not the image of record has been compressed. Not applicable to Master Images, which should not be compressed. |
Input guidelines:
- Enter yes or no
|
| Compression Type (Strongly recommended) |
| Definition: For those images that have been compressed, the type of compression performed. Not applicable to Master Images which should not be compressed. |
Input guidelines:
- Values include JPEG, LZW, PNG, etc.
|
| Compression Degree (Strongly recommended) |
| Definition: For those images that have been compressed, the level of compression that was done. Not applicable to Master Images which should not be compressed. |
Input guidelines:
- Enter the level of compression that was done to the derivative image using degree values.
|
| Dimensions (Strongly recommended) |
| Definition: Indicates the size of the digital object relative to display settings. This information is useful in detecting corruption of the digital object. |
Input guidelines:
- Record Height x Width in pixels.
- This information can be located within your digital capture or manipulation software.
|
| Bit Depth (Strongly recommended) |
| Definition: The bit depth of the digital image. |
Input guidelines:
- Standard values include:
| TYPES OF "SCAN" | PREFERRED BIT-DEPTH | THIS MEANS |
| Images | | |
| bi-tonal | 1 bit | each pixel is either black or white |
| grayscale | 8 bit | each pixel can be 1 of 256 shades of gray |
| color | 8 bit or 24 bit | 8 bit: each pixel can be 1 of 256 shades of color; 24 bit: each pixel can be 1 of 16.8 million color possibilities |
| Audio files | | |
| WAV file | 24 bit | Stereo if the original is in stereo; mono if the original is in mono |
| Mp3 | 16 bit | Stereo if the original is in stereo; mono if the original is in mono |
See the NC ECHO Guidelines for Digitization for more information about bit depth
|
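The figures in the "this means" column for images follow from simple arithmetic: a pixel stored in n bits can take 2^n distinct values. A short sketch (the function name is hypothetical):

```python
def values_per_pixel(bit_depth):
    """Number of distinct values one pixel can take at a given bit depth."""
    return 2 ** bit_depth

for bits in (1, 8, 24):
    print(f"{bits}-bit: {values_per_pixel(bits):,} possible values per pixel")
# 1-bit: 2 possible values per pixel
# 8-bit: 256 possible values per pixel
# 24-bit: 16,777,216 possible values per pixel
```

The 24-bit result, 16,777,216, is the "16.8 million" cited for color images.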
| Color Space (Optional) |
| Definition: Color space refers to the base palette of the image. Most images made for use in digital displays are in RGB. Images that are made for use in printing (brochures, ads, etc.) are usually in CMYK. RGB should be the default. |
Input guidelines:
- Enter either RGB or CMYK
|
| Watermark (Strongly recommended) |
| Definition: For derivative images, a yes/no field indicating the use of a watermark in the digital object. Master images should NEVER contain a watermark. |
Input guidelines:
- Enter yes or no
|
| File Format (Required) |
| Definition: The file format of the digital object. While the file format often can be derived from the file name extension, providing it as a separate field allows for much faster searching and indexing with the database. |
Input guidelines:
- Standard formats include JPG, GIF, and TIFF for images
- Standard formats include WAV and Mp3 for audio files
|
| Purpose (Required) |
| Definition: Indicates the purpose of the digital object. |
Input guidelines:
- Enter Master, Access or Thumbnail
Master: The highest resolution used for preservation and creation of digital surrogates. Should not be manipulated or compressed.
Access: Derivative object saved from the master object at a lower resolution for publishing online.
Thumbnail: Derivative object saved from the master or access object, typically small in dimensions.
|
| Checksum (Required) |
| Definition: A form of redundancy check, the checksum can be used to detect errors unseen by the human eye. It does this by adding up the bits and storing the resulting value. The checksum value is a string of alphanumeric characters. |
Input guidelines:
- Generate a checksum on the original digital surrogate using a checksum utility, following the first five steps in Appendix A
- Record the alphanumerical value in the field as it is generated by the utility.
- When migrating the files, follow the remaining steps in Appendix A to ensure that no data has been changed or lost. Directions in Appendix A include comparison steps using Microsoft Excel for expedited checking.
|
| The freeware checksum utility that Appendix A was created for can be found at: http://www.freewarefiles.com/program_9_223_19077.html. |
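Institutions that prefer a script to a desktop utility can generate a comparable value with a standard library. The sketch below is not the utility referenced above and will not necessarily produce the same values: this standard does not name a checksum algorithm, so SHA-256 is used here purely as an assumption, and the function name is hypothetical. Whatever algorithm is chosen, the same one must be used before and after migration.

```python
import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to limit memory use.

    The default algorithm is an assumption for this sketch, not a PMDO requirement.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is the alphanumeric value to record in the Checksum field.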
REVISION OF THE DIGITAL OBJECT |
| Revision Date (Required, if applicable) |
| Definition: Repeatable field for recording the date of any changes to the digital object after its creation. This allows institutions to track changes to a single image over a long period of time by date. This could provide valuable information about migrations to other file formats, size changes, exposure changes, etc. It is important to enter the date information consistently using the ISO 8601 standard for date format. |
Input guidelines:
- YYYYMMDD (use ISO 8601 standard for date format)
|
| Revision History (Required, if applicable) |
| Definition: Repeatable field for documenting any changes to the digital object after its creation. This allows institutions to track changes to a single image over a long period of time. This could provide valuable information about migrations to other file formats, size changes, exposure changes, etc. |
Input guidelines:
- Create free text statement.
|
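Because Revision Date and Revision History are repeatable, each digital object carries a growing list of dated entries rather than a single value. One way to model this is sketched below; the class and attribute names are hypothetical, and the sample entries are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Revision:
    date: str     # Revision Date, YYYYMMDD
    history: str  # Revision History, free text

@dataclass
class ObjectRecord:
    digital_object_id: str
    revisions: List[Revision] = field(default_factory=list)

    def add_revision(self, date: str, history: str) -> None:
        """Append one dated change; earlier entries are never overwritten."""
        if len(date) != 8 or not date.isdigit():
            raise ValueError(f"Revision Date must be YYYYMMDD, got {date!r}")
        self.revisions.append(Revision(date, history))

# Illustrative entries only.
record = ObjectRecord("object1")
record.add_revision("20050301", "Migrated to new storage medium")
record.add_revision("20090615", "Migrated to new file format")
```

In a spreadsheet or database, the same idea corresponds to a separate revisions table keyed on Digital Object ID.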
RIGHTS MANAGEMENT |
| Copyright (Strongly recommended) |
| Definition: Field indicating the status of copyright to the content of the digital object. The creator of the digital image automatically owns copyright to that image, but may not own copyright to the source of the image. This field will assist in the management of rights for digital surrogates. Note that copyright for digital originals as well as copyright for digital images not created within an institution present other intellectual property rights problems. For more information about copyright issues, please see the NC ECHO Guidelines for Digitization: 3. Legal Considerations. |
Input guidelines:
- Enter Yes, No, Public
Yes = institution owns copyright to the source
No = institution does not own copyright to the source
Public = Source is in the public domain
|
| Creation Date of Source (Strongly recommended) |
| Definition: The creation or publication date of the source object. This assists institutions in monitoring when the content of the digital object enters the public domain. |
Input guidelines:
- Use ISO 8601 format for recording date information (YYYY)
- Year of creation only is required.
|
APPENDIX A: USING CHECKSUMS TO VERIFY DATA TRANSFERS
1. Launch the checksum program by double-clicking the icon. A Windows Explorer-type window will appear.
2. Navigate to the folder that needs checksums and click the Checksum button.
3. Another window will open showing the contents of the folder you have selected. Click Select All.
4. If the folder you selected has folders within it and you would like a checksum produced for each individual file, click the Add recursively button. If you do not click this button, the program will generate one checksum value for each folder in the directory.
5. Click OK and the program will produce checksums for the selected files.
6. After the program has generated the checksums, a Save As dialogue box will appear.
7. Navigate to where you would like to save the file and give the file a distinct name, such as the folder's path. Change the File type to Text File and click Save.
8. You can now transfer your files to their new location.
9. Once the files are in their new location, repeat steps 2-7.
10. Once you have checksum files for both directories, you can compare them to ensure that there is no data loss.
11. Open the checksum text file and copy the contents of the file. Paste this information into an Excel spreadsheet.
12. Delete any rows in the spreadsheet that do not have checksums.
13. Select column A and click Data > Text to Columns.
14. In the dialogue box that opens, click the "Delimited" radio button and then NEXT.
15. Place a check in the "Space" check box, click NEXT, and then click Finish.
16. Label the columns with the location associated with those checksums.
17. Repeat steps 11-16 for the second checksum file. In step 13, select column C instead of column A.
18. You should now have four columns of information in your spreadsheet, with the checksums for the two locations in columns A and C.
19. Select the cell in column E, row 2, and type "=EXACT(A2,C2)", then tab out of the field. This compares the two cells and returns TRUE or FALSE.
20. The value should say TRUE. If it does not, the transferred file is corrupt and must be retransferred.
21. Select this cell again; a small black square will appear in the lower right-hand corner of the cell.
22. Click and drag this square down the entire E column so that all the checksum values can be compared.
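The comparison described in this appendix can also be scripted. The sketch below is an alternative to the utility-and-Excel procedure, not a description of it: it computes a checksum for every file under the original and transferred folders and reports files that are missing or whose checksums differ. The function names are hypothetical, and SHA-256 is an assumed algorithm, since the standard does not specify one.

```python
import hashlib
import os

def checksum_tree(root, algorithm="sha256"):
    """Map each file's path (relative to root) to its hex checksum."""
    sums = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            digest = hashlib.new(algorithm)
            with open(full, "rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            sums[os.path.relpath(full, root)] = digest.hexdigest()
    return sums

def compare_trees(source_root, copy_root):
    """Return (missing, changed): files absent from the copy, and files whose checksum differs."""
    source = checksum_tree(source_root)
    copy = checksum_tree(copy_root)
    missing = sorted(p for p in source if p not in copy)
    changed = sorted(p for p in source if p in copy and source[p] != copy[p])
    return missing, changed
```

Two empty lists correspond to every row returning TRUE in the spreadsheet comparison; any file listed as changed should be retransferred.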
|