General Introduction

Book and Collection Tag Sets

The NCBI Book Tag Sets were written using the Publishing Tag Set as a base, with book-specific elements added. The Book Tag Set and the Collection Tag Set define elements and attributes that describe, respectively, the content of books (such as pamphlets and monographs) and book collections.

Unlike the Journal Tag Sets, which were written generically to be of wide-ranging applicability beyond NLM, the Book Tag Sets were written with a more modest purpose, to describe volumes for the NCBI online libraries.

The book modules have been arranged to form two Tag Sets: the NCBI Book Tag Set and the NCBI Collection Tag Set.

The NCBI Book Tag Set, while written to describe both the metadata for a book and the content of the book, can also be used to describe only the metadata for both books and book parts, such as “Chapters”. The NCBI Collection Tag Set describes a document that contains metadata for a grouping or “collection” of books, potentially followed by textual information about the books in the collection, and a listing of the books; the content of the books is not included in the Collection Tag Set.

This Tag Library describes the Book Tag Set, the Book Collection Tag Set, and the Archiving and Interchange Suite through the sections listed under “Structure of This Tag Library”.

Book Tag Set

The Book Tag Set provides the model for a single book, pamphlet, monograph, etc. It is possible to model an entire book, including both its bibliographic metadata and its content, or to model only the book metadata, so that books whose content is not in XML can still be described. A complete book typically contains metadata about itself; some textual front matter (such as a preface or a list of contributors); the book’s content (the text and graphical material that make up the body of the work); and possibly some ancillary back matter, such as appendixes. Book components must appear in the following order:

  • Book Metadata (required). The metadata contains the bibliographic information for the book, for example, the book title, the date of publication, a copyright statement, a list of keywords, etc.
  • Textual Front Matter (optional). The book front matter may contain textual material such as a preface, introduction, or biographical information about the book’s author.
  • Body of the Book (optional). The body of the book is the main textual and graphic content of the book. This usually consists of one or more structured book components, typically designated as “Parts”, “Chapters”, “Modules”, “Sections”, or similar, all of which are tagged as <book-part> elements. These large structural units may themselves contain paragraphs or sections, tables, sidebars (boxed text), etc. The body of a book is optional so that XML metadata can describe a work whose content is in another format, such as PDF.
  • Back Matter for the Book (optional). If present, the back matter contains information that is ancillary to the main text, such as a glossary, appendix, or list of cited references. (Note that back matter may appear at the end of <book-part>s as well as at the end of the book; that is, individual Parts or Chapters of a book may contain their own lists of references, glossaries, appendices, etc.)
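The ordering rule above can be sketched as a small Python check built on the standard library’s xml.etree.ElementTree. This is an illustration under stated assumptions and not a replacement for validating against the DTD. Only <book-front> is named in this Tag Library, so the tag names <book-meta>, <book-body>, and <book-back> are assumptions, and the function check_book_order is hypothetical.

```python
import xml.etree.ElementTree as ET

# Top-level components of <book>, in required order, with a flag for
# whether each is mandatory. Only <book-front> is named in this Tag
# Library; the other three tag names are assumptions.
BOOK_COMPONENTS = [
    ("book-meta", True),    # Book Metadata (required)
    ("book-front", False),  # Textual Front Matter (optional)
    ("book-body", False),   # Body of the Book (optional)
    ("book-back", False),   # Back Matter for the Book (optional)
]


def check_book_order(book: ET.Element) -> list[str]:
    """Return problems found in the order of <book>'s child elements."""
    problems = []
    names = [child.tag for child in book]
    pos = 0  # index of the next child we expect to consume
    for tag, required in BOOK_COMPONENTS:
        if pos < len(names) and names[pos] == tag:
            pos += 1  # component present where expected
        elif required:
            problems.append(f"missing required <{tag}>")
    # Anything not consumed is repeated, unknown, or out of order.
    for extra in names[pos:]:
        problems.append(f"unexpected or out-of-order <{extra}>")
    return problems


# A metadata-only book is allowed because the body is optional.
print(check_book_order(ET.fromstring("<book><book-meta/></book>")))  # []

# Metadata placed after the body breaks the required order.
bad = ET.fromstring("<book><book-body/><book-meta/></book>")
print(check_book_order(bad))
# ['missing required <book-meta>', 'unexpected or out-of-order <book-meta>']
```

The single greedy pass assumes that each component appears at most once and in a fixed sequence, which is how the list above describes the model.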

Collection Tag Set

The Collection Tag Set describes a group of books; the intent is to provide information about the collection, not to provide the contents of the books themselves. Accordingly, the top-level element, <collection>, serves as a container element for metadata about the collection, any text discussing the collection as a whole, and a list of the books. The <collection> element contains the following elements, in order:

  • Collection Metadata (required). This component contains the bibliographic information for a collection of books, for example, a name for the collection, a publication date, a list of the books in the collection, etc.
  • Collection Description Textual Front Matter (optional). If present, this is textual material describing the collection or the books within it.
  • Collection Description Body (optional). If present, this component includes any textual descriptions of the collection as a whole or its collection members.
  • Collection Description Back Matter (optional). If present, the Collection back matter contains information that is ancillary to the main text of the collection description.

Accessibility

Because access for a wide range of output devices, as well as for the visually impaired, is becoming more and more important in the STM journal community, the modules in the Archiving and Interchange Suite were designed to follow, as much as possible, the W3C Web Content Accessibility Guidelines 2.0 working draft (22 August 2002). This was the latest accessibility specification available when the Suite was initially constructed. That specification sets out accessibility guidelines at many levels, from design through application, and the guidelines that pertain to the modeling of materials were followed to at least Level-2 compliance. For example, a Long Description <long-desc> element was defined as part of many other elements, such as Figure <fig>. As a result, it can be added to figures and other graphical objects, and also to other parts of the text (for example, a Boxed Text <boxed-text>), to provide an accessible description of the object. The @xml:lang attribute was added to all section-level elements and many paragraph-level elements to permit explicit indication of the language of the content, as these guidelines require. The Abbreviation or Acronym <abbrev> element was added to meet Checkpoint 4.3.
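To illustrate roughly how these accessibility features look in a tagged document, the following Python snippet builds a small fragment with the standard library’s xml.etree.ElementTree. The <fig>, <long-desc>, <abbrev>, and <def> elements and the @xml:lang attribute come from this Tag Library. The <sec> and <p> wrappers and all text content are assumptions made for illustration, and the fragment is not claimed to be valid against the DTD.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the reserved xml: prefix as its namespace URI and
# serializes it back as "xml:lang" without declaring a namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Section-level element carrying an explicit language (assumed <sec>).
sec = ET.Element("sec", {XML_LANG: "en"})

# A figure with a Long Description for readers who cannot see the image.
fig = ET.SubElement(sec, "fig", {"id": "fig1"})
ET.SubElement(fig, "long-desc").text = "Placeholder text describing the figure."

# An abbreviation whose expansion is given in a nested definition.
p = ET.SubElement(sec, "p")
p.text = "Books in this collection are hosted by "
abbrev = ET.SubElement(p, "abbrev")
abbrev.text = "NCBI"
definition = ET.SubElement(abbrev, "def")
ET.SubElement(definition, "p").text = "National Center for Biotechnology Information"
abbrev.tail = "."

print(ET.tostring(sec, encoding="unicode"))
```

Reading the language back uses the same namespaced key, for example sec.get(XML_LANG).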

How to Read This Tag Library

Terms and Definitions


Elements are nouns, like “speech” and “speaker”, that represent components of books, the book itself, and accompanying metadata.


Attributes hold facts about an element, such as which type of list (e.g., numbered, bulleted, or plain) is being requested when using the List <list> tag, or the name of a pointer to an external file that contains an image. Each attribute has both a name (e.g., @list-type) and a value (e.g., “bullet”).
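The following minimal Python sketch uses xml.etree.ElementTree to make the name/value distinction concrete by reading the @list-type attribute from a List <list> element. The <list-item> and <p> children are assumed names that only fill out the example.

```python
import xml.etree.ElementTree as ET

# A bulleted list: the element is <list>; its attribute has the
# name "list-type" and the value "bullet".
lst = ET.fromstring(
    '<list list-type="bullet">'
    "<list-item><p>First point</p></list-item>"
    "<list-item><p>Second point</p></list-item>"
    "</list>"
)

# ElementTree exposes an element's attributes as a name -> value dict.
for name, value in lst.attrib.items():
    print(f"@{name} = {value!r}")  # prints: @list-type = 'bullet'

# get() returns None for an absent attribute instead of raising.
print(lst.get("list-type"))          # prints: bullet
print(lst.get("no-such-attribute"))  # prints: None
```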


Metadata is data about data, for example, bibliographic information. The distinction is between metadata elements, which describe a book (such as the name of the book’s publisher), and elements that contain the textual and graphical content of the book.

How To Start Using This Tag Library

The Book and Book Collection Tag Sets are available on the Archiving and Interchange Tag Suite Website. The materials there include the Tag Set models expressed as DTDs, one or more tagged sample documents, and this Tag Library documentation (a set of linked HTML files). The DTD file delivery contains the following: the Book and Book Collection DTDs; the Tag Set customization files that each DTD needs to override declarations in the modular Suite; any new modules that define Tag-Set-specific elements; and the modules that make up the full Suite.

How you use the documentation will depend on what you need to learn about the modules and the Tag Set.

Learn the Tag Set

If you want to learn about the elements and the attributes in this Tag Set so you can tag documents or learn how the book (or book collection) model is constructed, here is a good way to start.

  • Read the Tag Library General Introduction, taking particular note of the section that describes the parts of the Tag Library so you will know what resources are available.
  • Next, if you do not know the symbols used in the Document Hierarchy diagrams, read the “Key to the Near & Far® Diagrams”.
  • Scan the Document Hierarchy diagrams to get a good sense of the top-level elements and their contents. (Find what is inside a <book>, then what is inside each of the four large pieces of a book, and keep working your way down.)
  • Pick an element from one of the diagrams. Look up the element in the Elements Section to find the full name of the element, its definition, usage notes, content allowed, and any attributes. Look up one of the attributes to find its full name, usage notes, and potential values.

Finally, if you are interested in conversion from a particular source:

  • Look at a printed or online book, or look at the DTD/schema used for the book in its source format.
    • Can all the information you want to store from a book fit into the models shown in the diagrams?
    • Do you have, or know how to get, all the information the models require? Will that information always be available for documents that are complete and correct?
    • How difficult will it be to identify the parts of the information using the elements and attributes described in these models? Would changes to one or more models make this easier?

Structure of This Tag Library

This Tag Library contains the following sections:

How To Use (Read Me First)

How to make best use of this Tag Library to reference XML tags, become familiar with the Book and Book Collection Tag Sets as a whole, or see examples of recommended usage.

General Introduction

An introduction to the contents of this Tag Library, to the design philosophy and intended usage of the Archiving and Interchange Suite, and to the Book and Collection Tag Sets.

Elements Section

Descriptions of the elements used in the Book and Collection Tag Sets and the parts of the Archiving and Interchange Suite used in these Tag Sets. The element descriptions are listed in alphabetical order by tag name.

[Note: Each element has two names: a “tag name” (formally called an element-type name) that is used in tagged documents, in the DTDs/schemas, and by XML software; and an “element name” (usually longer) that provides a fuller, more descriptive name for the benefit of human readers. For example, a tag name might be <disp-quote> with the corresponding element name Quote, Displayed, or a tag name might be <verse-group> with the corresponding element name Verse Form for Poetry.]

Attributes Section

Descriptions of the attributes in the Book and Collection Tag Sets. Like elements, attributes have two names: a shorter machine-readable one and a (usually longer) human-readable one. Attributes are listed in order by their shorter, machine-readable names. For example, an attribute is listed as @list-type rather than by its more informal, easier-to-read name, Type of List.

Parameter Entity Section

Names (with occasional descriptions) and contents of the parameter entities in the Tag Sets modules.

Document Hierarchy Diagrams

Tree-like graphical representations of the content of many elements. This can be a fast, visual way to determine the structure of a book (or book collection) or of any element within a book.

“Full” Book Sample

A representative book sample is provided both in PDF and in XML tagged according to this Tag Set, to help users understand the relationship between the book as displayed and the XML version of the book. The XML sample provides information about the book as a whole, one of the book’s more complicated chapters, and some of the book’s back matter. The PDF shows only the chapter content.

Common Tagging Practice

Tips, tricks, hints, and examples of how (and why) to tag certain structures using these Tag Sets.

Implementing These Tag Sets

Implementor’s instructions for using the Tag Sets, customizing a Tag Set, or making derivative tag sets based on one.

Change Report: Version 3.0

Detailed information about the changes between versions 2.3 and 3.0, for both the base Suite and the Book and Book Collection Tag Sets. The report has two sections: the first identifies how each element changed between the two versions; the second provides an implementor’s view of the changes in parameter entities, content models, attribute lists and/or values, etc.


The Book and Book Collection Tag Sets are available as an XML Document Type Definition (DTD) in two forms: a zipped file containing a downloadable version of the DTD (in multiple files), and a readable/browsable version in which the internal markup has been escaped.

Context Table

A listing of where each element may be used. All elements in the Tag Set are given in a single alphabetical list.

The Context Table is formatted in two columns. The first column (“This Element”) names an element, shown in pointy brackets. The second column (“May Be Contained In”) gives an alphabetical list of all the elements in which the first-column element may occur. For example, suppose the first column contains the element <book-front> and the second column contains only the <book> element. In that case, <book-front> may be used only directly inside a <book>. Most elements may be used inside more than one other element; for example, the element <def> (a definition) may be used inside both the <abbrev> and the <def-item> elements.

The Context Table contains the same information that is found on each element page under the heading “May Be Contained In”.
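The Context Table is, in effect, a mapping from each element to the set of elements that may contain it. Below is a minimal Python sketch of that idea that uses only the two entries cited above. The function context_errors is hypothetical, and a full table would cover every element in the Tag Set. In practice, a DTD validator performs this check automatically.

```python
import xml.etree.ElementTree as ET

# "This Element" -> set of elements listed under "May Be Contained In".
# Only the two examples from the text are filled in here.
CONTEXT_TABLE = {
    "book-front": {"book"},
    "def": {"abbrev", "def-item"},
}


def context_errors(root: ET.Element) -> list[str]:
    """Report children that sit inside a parent the table does not allow.

    Elements with no entry in CONTEXT_TABLE are not checked, because
    this sample table is deliberately incomplete.
    """
    errors = []
    for parent in root.iter():  # every element, root included
        for child in parent:    # its direct children
            allowed = CONTEXT_TABLE.get(child.tag)
            if allowed is not None and parent.tag not in allowed:
                errors.append(f"<{child.tag}> may not occur in <{parent.tag}>")
    return errors


# <book-body> is an assumed tag name used only for this demonstration.
good = ET.fromstring("<book><book-front/></book>")
bad = ET.fromstring("<book><book-body><book-front/></book-body></book>")
print(context_errors(good))  # []
print(context_errors(bad))   # ['<book-front> may not occur in <book-body>']
```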

Index

Where to find elements, tags, and terms used in this Tag Library. The index includes synonyms (terms not used in this Tag Set) that direct the reader to the elements used in this Tag Library; for example, “author” is paired with Contributor <contrib>.

Tag Library Typographic Conventions

  • <alt-text>: the tag name of an element (written in lower case, with the entire name surrounded by “< >”)
  • Alternate Title Text (For a Figure, Etc.): the element name (the long descriptive name of an element) or the descriptive name of an attribute (written in title case, with important words capitalized and the words separated by spaces)
  • must not: emphasis, used to stress a point

Changes in Version 3.0

This version of the Tag Suite and the derivative Tag Sets marks a significant departure from all previous versions. Until now, all changes to the Suite had been fully backward compatible: while the Suite changed and grew, no document tagged according to a particular Tag Set was made invalid by any new or changed model. For the first time, this is not the case. Version 3.0 is not backward compatible, meaning that documents that were valid against a previous version may not be valid against this version. Some users will choose to keep their existing documents as they are, valid according to an older version of the Tag Set, and to use the new Tag Set for future documents. Some users will convert their existing documents to the new Tag Set, a conversion that we believe can be largely automated. Some users may choose to continue to use an older version of the Tag Set.

The rationales behind the changes vary, but all were made with the intent of making it easier to move forward. This Tag Set has now been used for nearly six years on books throughout the range of the NCBI Bookshelf. Many of the changes were of the “if we had known then what we know now” variety. Others had been recognized as obvious improvements for four or five years. However, they could not be made within the scope of normal maintenance because they would not have been backward compatible.

Version 3.0 includes the regular scope changes and additions that have been the subject of previous version releases. For example, a new funding model (<funding-group>) has been added to reflect the changing needs of the community. In addition to changes of this kind, version 3.0 includes the non-backward-compatible changes described above.

A report listing all changes is available in the Change Report section.


The NCBI Handbook provided the majority of sample material used in this Tag Library.