Using XML for Long-term Preservation: Experiences from the DiVA Project
2003 (English) Conference paper (Other academic)
One of the objectives of the DiVA project is to explore the possibility of using XML as a format for long-term preservation. For this reason, the practical use of XML in different parts of the system was evaluated before deciding on the design.
The DiVA Document Format - defined by an XML schema - has been developed to describe the inter-relationships amongst the various data elements and processes, and to support long-term preservation of the actual documents.
XML Schema provides a means for defining the structure, content and semantics of XML documents. It is an XML-based alternative to the XML Document Type Definition (DTD). Because one of the primary reasons for using XML was to support long-term preservation, the most popular DTDs for documents, DocBook and TEI, were evaluated. Both were found to have limitations regarding metadata descriptions, so the decision was made to develop a new structure for DiVA using XML Schema. This schema combines the DocBook Schema (derived from the DocBook DTD) for the textual parts of the document with the internal schema for all metadata (bibliographic and administrative data).
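To illustrate the idea of one document that carries both a metadata part and DocBook-style textual parts, here is a minimal Python sketch. It is not the actual DiVA Document Format: the element names, the `build_document` helper and the `urn:example:diva-metadata` namespace are hypothetical, and the DocBook namespace URI is the one used by later DocBook releases, assumed here purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces for illustration; the real DiVA schema may differ.
META_NS = "urn:example:diva-metadata"
DOCBOOK_NS = "http://docbook.org/ns/docbook"


def build_document(title, author, year, paragraphs):
    """Return an XML string with a metadata part and a DocBook-style text part."""
    ET.register_namespace("meta", META_NS)
    ET.register_namespace("db", DOCBOOK_NS)

    root = ET.Element(f"{{{META_NS}}}document")

    # Bibliographic metadata lives in its own namespace.
    metadata = ET.SubElement(root, f"{{{META_NS}}}metadata")
    ET.SubElement(metadata, f"{{{META_NS}}}title").text = title
    ET.SubElement(metadata, f"{{{META_NS}}}author").text = author
    ET.SubElement(metadata, f"{{{META_NS}}}year").text = str(year)

    # Textual content reuses DocBook-style element names.
    article = ET.SubElement(root, f"{{{DOCBOOK_NS}}}article")
    for text in paragraphs:
        ET.SubElement(article, f"{{{DOCBOOK_NS}}}para").text = text

    return ET.tostring(root, encoding="unicode")
```

Keeping the two vocabularies in separate namespaces means that, in a layout like this, the metadata part can evolve without touching the borrowed text markup.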
Using the DiVA Document Format for content management and inter-process communication, several applications were developed. Some of their purposes are essential for long-term preservation:
Make persistent National Bibliographic Numbers (NBN) available for the URN resolution service at the Royal Library in Stockholm.
Send MARC21 records in MARC-XML to the National Library.
Create archival file packages for long-term preservation, checksum them, store them in the DiVA Archive and send a copy of them to the Swedish Royal Library.
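The packaging and checksumming step in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the DiVA implementation: the ZIP container, the SHA-256 algorithm, the manifest file name and the helper names `sha256_of` and `create_package` are all choices made for this example, since the source does not specify them.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_package(files, package_path):
    """Zip the files with a per-file checksum manifest; return the package checksum."""
    manifest_lines = []
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_path in map(Path, files):
            archive.write(file_path, arcname=file_path.name)
            manifest_lines.append(f"{sha256_of(file_path)}  {file_path.name}")
        # Hypothetical manifest name; one "<digest>  <file name>" line per file.
        archive.writestr("manifest-sha256.txt", "\n".join(manifest_lines) + "\n")
    # The archive is closed here, so the checksum covers the finished package.
    return sha256_of(package_path)
```

The returned package checksum could be stored alongside the archive copy and recomputed after transfer to confirm that the copy sent to another institution is identical.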
Currently, the file archives for long-term preservation contain the original full-text file in various formats and the DiVA Document Format file, which contains all the metadata about the document. Furthermore, the DiVA Document Format file contains all parts of the full-text file that can be converted into XML. In the future it might be possible to convert the whole full text into XML, in which case the file archives would contain only DiVA Document Format files.
Place, publisher, year, edition, pages
Uppsala: Electronic Publishing Centre, 2003.
long-term preservation, XML, XML Schema, DiVA, DiVA Document Format, DiVA Archive, URN, URN:NBN
Other Social Sciences not elsewhere specified
Identifiers
URN: urn:nbn:se:uu:diva-87162
OAI: oai:DiVA.org:uu-87162
DiVA: diva2:79