The NCBI Historical Book Tag Set (hereafter NCBI Historical Tag Set) is a layer on top of the NCBI Book Tag Set, specifically
designed to describe historical materials. The NCBI Book Tag Set defines elements and attributes to model a wide variety of
books (such as pamphlets and monographs), describing both the metadata for a book and the content of the book. The Tag Set
can also be used to describe only the metadata for a book, if full digitized textual content is not available. The NCBI Historical
Tag Set adds, to the structures of the NCBI Book Tag Set, information relevant to the digitization of historical works, for
example metadata concerning the digital edition.
The NCBI Historical Tag Set makes several levels of changes to the NCBI Book Tag Set. Some new elements and attributes have
been added, the models of existing elements have been broadened to include the new material, new attribute values have been
added to existing attributes, and the way in which the <annotation> element is used has been changed. Elements unique to the Historical Book DTD include:
- Metadata elements for both books and individual contributors;
- Metadata concerning the electronic edition of the work that was scanned and digitized;
- Elements to record annotations or additions to the text; and
- Elements to record physical aspects of the volumes such as the page breaks.
Annotations of historical material are considered to be of two types: 1) those which add words to the text (for example, a
penciled marginal note) and 2) those that merely decorate words already in the text (for example, a phrase underlined in pencil).
Text-bearing annotations (i.e., those with new content) use the inline <alt-term> element or the <annotation> element. Decorations use the <named-content> element to surround words in the original text, with the attribute
content-type taking values like “pencil underline”
and “yellow highlight”. The <annotation> and <named-content> elements are not unique to the Historical Book Tag Set; they were already defined in the NCBI Book Tag Set, but they have
been put to new uses and may be used in new places. For example, in the NCBI Book Tag Set, the <annotation> element is used only within citations. In contrast, in the NCBI Historical Tag Set, this element is both a block-level element
at the same level as a paragraph and an inline element inside textual passages. New attributes
were added to <annotation> to describe some
of these new roles and purposes.
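As a minimal illustration of the two annotation types, the fragment below marks a decoration with <named-content> and records a text-bearing note as a block-level <annotation>. The sample sentences are invented, the <p> child of <annotation> is an assumption carried over from the base Tag Set, and the new <annotation> attributes are left out because this section does not name them.

```xml
<!-- Sketch only: sample text is invented; attributes on <annotation> are omitted -->
<p>He was seized with <named-content content-type="pencil underline">a violent
fever</named-content> on the third day.</p>

<!-- Block-level annotation, a sibling of the paragraph (assumed to contain <p>) -->
<annotation>
  <p>Compare the account given in the second edition.</p>
</annotation>
```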
The following structures were added in the NCBI Historical Tag Set:
- Information in addition to the text of the work:
- Alternate Term — The <alt-term> element is used for placing a second version
of a word or phrase in the text, for example,
to add a more modern version of an out-dated
historical term. The inserted word is then present for searching and may also be
displayed next to the term, possibly pointed
out by a mechanism such as square brackets,
as an aid to the reader. For example, to
show the modern spelling of a disease name
the word “smallpox” could be added to the historical term “small-pox” to ensure that searches
for the modern
term find the page.
- Page Start — The <page-start> element is a milestone element that marks the beginning of a
physical page in the printed edition of a
historical work. For display on the web,
a typical behavior for this element would be
to cause the display of a horizontal rule and
display the page number, so that print page
breaks may be seen in flowing webpages. The page number type is a hook for
book-specific processing, for example, to allow one
page number to be marked as primary; to
distinguish between printed numbers provided
by the publisher and penciled numbers added
by a library; or to make other pagination distinctions.
- Page Number — The <page-num> element records one (of potentially many) page numbers associated with a page. This is part of the page-start information.
- Running Head — The <running-head> element records one (of potentially many) running heads printed on the page. For historical works these frequently cannot
be algorithmically determined from the metadata.
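The structures described above might be tagged along the following lines. This is a sketch under stated assumptions, not the Tag Set's definitive usage: placing <alt-term> directly after the historical term, and nesting <page-num> and <running-head> inside <page-start>, are guesses based on the descriptions here. The sample text is invented, and the page number type attribute is omitted because its name is not given in this section.

```xml
<!-- Sketch only: element placement and sample text are assumptions -->
<p>The small-pox<alt-term>smallpox</alt-term> raged in the town
throughout the winter
<!-- Milestone marking where a new printed page begins -->
<page-start>
  <page-num>17</page-num>
  <running-head>OF THE SMALL-POX</running-head>
</page-start>
and did not abate until the spring.</p>
```

With tagging of this kind, a search for "smallpox" can match the page even though the printed text reads "small-pox", and a display stylesheet can draw a rule and show "17" at the page boundary.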
- Additional metadata elements were also added:
- Contributor Date — The <contrib-date> element records birth dates, death dates, and
flourished dates for a contributor such as an author.
- Printer — The elements <printer>, <printer-name>, and <printer-loc> were added to record the name and location (usually city) of the book’s printer.
- Digital Edition Metadata — The <digital-edition-meta> element holds the metadata that
describes the digitization of the document, that is, metadata that
pertains to the electronic transcription, not to the work itself.
Many of the elements in the digital edition
metadata are identical to elements in the
book metadata, but their content will refer
to the electronic XML copy of the book,
not to the work as a whole or to the specific
For example, the <contributor> element in
the book metadata will name the book's author,
editor, illustrator, or other person in a
primary creative role. The contributor in the
digital edition metadata began his or her
effort with a particular print volume
and has played a role in the transcription,
annotation, emendation, translation, etc. of
that printed book for the electronic archives.
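The added metadata elements could look roughly like the fragments below. Treat this as a hedged sketch: the nesting of <printer-name> and <printer-loc> inside <printer>, the bare year as the content of <contrib-date>, and all sample values are assumptions. How a birth, death, or flourished date is distinguished is not shown, since this section does not name the mechanism.

```xml
<!-- Sketch only: nesting, date format, and sample values are assumptions -->
<contrib-date>1749</contrib-date>

<printer>
  <printer-name>Example Printing House</printer-name>
  <printer-loc>London</printer-loc>
</printer>
```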