General Introduction


NCBI Book and Collection Tag Sets

The NCBI Book Tag Set and NCBI Book Collection Tag Set (hereafter NCBI Collection Tag Set) define elements and attributes that describe the content of books (such as pamphlets and monographs) and book collections, respectively. The NCBI Book Tag Set describes both the metadata for a book and the content of the book, but it can also describe only the metadata for both books and book parts, such as “Chapters”. The NCBI Collection Tag Set describes a document that contains metadata for a grouping or “collection” of books, potentially followed by textual information about the books in the collection, and a listing of the books. The content of the books is not included in the Collection Tag Set. Although designed for biomedical books, the NCBI Book Tag Set should be sufficiently general to describe not only STM books but also technical books in any field.

The NCBI Book Document Type Definition (DTD) and the NCBI Collection DTD define the foundational constraint language for the tag set. The DTDs comprise a few DTD-specific modules and use (by reference) the base modules of the Journal Archiving and Interchange Suite. The modules of that Suite were developed as part of an effort to create XML applications through which materials on health-related disciplines could be shared and reused electronically. The Suite has been used to construct many tag sets in addition to this one. Although the full Suite was developed to support electronic journal production, the structures should be adequate to support book material and limited print production as well as electronic production.

This Tag Library describes the NCBI Book Tag Set, NCBI Book Collection Tag Set, and the Journal Archiving and Interchange Suite, by providing:


NCBI Book Tag Set

The NCBI Book Tag Set provides the model for a single book, pamphlet, monograph, etc. It is possible to model an entire book, including its bibliographic metadata as well as its content, or to model only the book metadata, so that books whose content is not in XML can still be described. A complete book typically contains metadata about itself; some textual front matter (such as a book preface or list of contributors); the book’s content (the text and graphical material that make up the body of the work); and possibly some ancillary back matter such as appendixes. Book components must appear in the following order:

  • Book Metadata (required). The metadata contains the bibliographic information for the book, for example, the book title, the date of publication, a copyright statement, a list of keywords, etc.
  • Textual Front Matter (optional). The book front matter may contain textual material such as a preface, introduction, or biographical information about the book’s author.
  • Body of the Book (optional). The body of the book is the main textual and graphic content of the book. This usually consists of one or more structured book components, typically designated as “Parts”, “Chapters”, “Modules”, “Sections”, or similar (all of which will be tagged as <book-part>s). These large structural units may themselves contain paragraphs or sections, tables, sidebars (boxed text), etc. The body of a book is optional to allow the possibility of XML metadata for a work in another format such as PDF.
  • Back Matter for the Book (optional). If present, the back matter contains information that is ancillary to the main text, such as a glossary, appendix, or list of cited references. (Note that a book may include such back matter at the end of <book-part>s as well as at the end of the book, i.e., individual Parts or Chapters of a book may contain their own lists of references, glossaries, appendices, etc.)
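
The ordering rule above can be checked mechanically. The following is a minimal Python sketch, not part of the Tag Set or its tooling. Only <book> and <book-front> are named in this introduction; the tag names <book-meta>, <body>, and <back> used here are assumptions made for illustration, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Required order of the top-level book components described above.
# ASSUMPTION: <book-meta>, <body>, and <back> are illustrative tag names;
# only <book> and <book-front> are named in this introduction.
COMPONENT_ORDER = ["book-meta", "book-front", "body", "back"]
REQUIRED = ["book-meta"]


def check_book_components(xml_text):
    """Return a list of problems found in the top-level structure of a <book>."""
    root = ET.fromstring(xml_text)
    if root.tag != "book":
        return [f"document element is <{root.tag}>, expected <book>"]

    problems = []
    tags = [child.tag for child in root]

    # Report anything that is not one of the four components.
    for tag in tags:
        if tag not in COMPONENT_ORDER:
            problems.append(f"unexpected component <{tag}>")

    # The known components must appear in the documented order, once each.
    known = [tag for tag in tags if tag in COMPONENT_ORDER]
    positions = [COMPONENT_ORDER.index(tag) for tag in known]
    if positions != sorted(positions):
        problems.append(f"components are out of order: {known}")
    if len(set(known)) != len(known):
        problems.append("a component appears more than once")

    # Only the metadata is required; a metadata-only book is acceptable.
    for tag in REQUIRED:
        if tag not in tags:
            problems.append(f"missing required component <{tag}>")
    return problems


METADATA_ONLY = "<book><book-meta/></book>"
OUT_OF_ORDER = "<book><book-meta/><back/><body/></book>"

print(check_book_components(METADATA_ONLY))
print(check_book_components(OUT_OF_ORDER))
```

A validating XML parser working from the DTD enforces the same rule, along with everything else in the content models; the sketch only makes the ordering and the "metadata only" case concrete.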

NCBI Collection Tag Set

The NCBI Collection Tag Set describes a group of books; the intent is to provide information about the collection, not to provide the contents of the books themselves. Accordingly, the top-level element, <collection>, serves as a wrapper element for metadata about the collection, any text discussing the collection as a whole, and a list of the books. The <collection> element contains the following elements, in order:

  • Collection Metadata (required). This component contains the bibliographic information for a collection of books (such as a historical collection), for example, a name for the Collection, publication date, a list of the books in the collection, etc.
  • Collection Description Textual Front Matter (optional). If present, this is textual material describing the collection or the books within it.
  • Collection Description Body (optional). If present, this component includes any textual descriptions of the collection as a whole or its collection members.
  • Collection Description Back Matter (optional). If present, the Collection back matter contains information that is ancillary to the main text of the collection description.


Tag Set Version 2.3

Version 2.3 was an enhancement release that made changes requested by the Working Group based on operational concerns. All changes to content models and attribute lists have been made backward compatible. As an example, a new XHTML style attribute was added to all the elements inside <table> and the content-type attribute was distributed widely to add or preserve semantic information. The Book, Historical, and Collection 2.3 Change Report may be accessed through the following page: http://dtd.nlm.nih.gov/book/2.3/index.html.


Modular DTD Design

The NCBI Book DTD, the NCBI Collection DTD, and the full NLM Archiving and Interchange Suite have been written as a set of XML DTD “modules”, each of which is a separate physical file. No module is an entire DTD by itself, but these modules can be combined into a number of different DTDs. Modules are primarily intended for maintenance and creation of new DTDs; all the elements of the same “type” (class) are stored together.

The NCBI Book DTD and NCBI Collection DTD have been built from these modules, and both of these DTDs and individual modules from the Archiving and Interchange Suite will be described in this Tag Library.

The major disadvantage of a modular system is the longer learning curve, since it may not be immediately obvious where within the system to find a particular element or attribute cluster. To help with this, the description of each element in the Element Section of this documentation names the module in which that element is defined.

There are many advantages to such a modular approach. The smaller units are written once, maintained in one place, and used in many different DTDs. This makes it much easier to keep lower level structures consistent across document types, while allowing for any real differences that analysis identifies. A DTD for a new function (such as an authoring DTD) or a new publication type (such as a book) can be built quickly, since most of the necessary components will already be defined in the Suite. Editorial and production personnel can bring the experience gained on one tagging project directly to the next with very little loss or retraining. Customized software (including authoring, typesetting, and electronic display tools) can be written once, shared among projects, and modified only for real distinctions.


How to Read This Tag Library


Terms and Definitions

Element

Elements are nouns (such as “Speech”, “Book”, and “Speaker”) that represent the components of books, the book itself, and accompanying metadata about the book.

Attribute

Attributes hold facts about an element, such as which type of list (e.g., numbered, bulleted, or plain) is being requested when using the List <list> tag, or the name of a pointer to an external file that contains an image. Each attribute has both a name (e.g., list-type) and a value (e.g., “bulleted”).

Metadata

Data about the data, for example, bibliographic information. The distinction is between metadata elements, which describe a book (such as the name of the book’s publisher), and elements that contain the textual and graphical content of the book.


Structure of This Tag Library

This Tag Library contains the following sections:

Introduction

This introduction to the contents of this Tag Library, to the design philosophy and intended usage of the Journal Archiving and Interchange Suite, and to the NCBI Book and NCBI Collection Tag Sets.

Elements Section

Descriptions of the elements used in the NCBI Book Tag Set, the NCBI Collection Tag Set, and most of the base Suite. The element descriptions are listed in alphabetical order by tag name.

(Note: Each element has two names: a “tag name” (formally called an element-type name) that is used in tagged documents, in the DTDs/schemas, and by XML software, and an “element name” (usually longer) that provides a fuller, more descriptive name for the benefit of human readers. For example, a tag name might be <disp-quote> with the corresponding element name Quote, Displayed, or a tag name might be <verse-group> with the corresponding element name Verse Form for Poetry.)

Attributes Section

Descriptions of the attributes used in these Tag Sets. Like elements, attributes also have two names: the shorter machine-readable one and a (usually longer) human-readable one. Attributes are listed in order by the shorter machine-readable names, for example, the attribute short name “fig-type” instead of the more human-friendly (or at least more obvious) name “Type of Figure”.

Parameter Entity Section (Implementors Only)

Names, descriptions, and contents of the Parameter Entities in these DTD modules.

Context Table

Listings of where each element may be used. All elements in the Tag Set are given in a simple alphabetical list.

The Context Table is formatted in two columns. The first column lists an element’s tag name as well as its descriptive name, and the second column lists the tag and name pairs of all the elements in which the listed element may occur. For example, if the first column contains the book front matter element <book-front> and the second column contains only the element <book>, this means that the <book-front> (Book Front Matter) element may only be used inside a <book> (Book) element.

Most elements may be used inside more than one other element. For example, the element <access-date> (Access Date for Cited Work) may be used inside the <citation>, <nlm-citation>, and <related-article> elements.

Note: These Context Table listings (which list where an element may be used) are the inverse of the content definition that is given as a part of each element description, which lists what can be inside the named element.
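
To make the “inverse” relationship concrete, here is a small Python sketch that derives Context Table rows from content definitions. The four content definitions are a deliberately incomplete excerpt built from the two examples above (the real content models allow much more), and the function name is hypothetical.

```python
from collections import defaultdict

# Content definitions: element -> elements allowed inside it.
# ASSUMPTION: this is an incomplete excerpt limited to the two examples
# given in the text above; the real content models are far larger.
CONTENT = {
    "book": ["book-front"],
    "citation": ["access-date"],
    "nlm-citation": ["access-date"],
    "related-article": ["access-date"],
}


def build_context_table(content):
    """Invert 'what may appear inside X' into 'where X may appear'."""
    contexts = defaultdict(list)
    for parent, children in content.items():
        for child in children:
            contexts[child].append(parent)
    # Alphabetical by tag name, as in the Context Table itself.
    return {child: sorted(parents) for child, parents in sorted(contexts.items())}


for tag, parents in build_context_table(CONTENT).items():
    inside = ", ".join(f"<{parent}>" for parent in parents)
    print(f"<{tag}> may be used inside: {inside}")
```

Each row of the resulting table answers “where may this element be used?”, whereas each entry in the input answers “what may this element contain?”.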

Book Hierarchy Diagrams

Tree-like graphical representations of the content of many elements. This can be a fast visual way to determine the structure of a book (or book collection) or of any element within a book (or book collection).

NCBI Book and Suite Naming Rules

How to build element, attribute, Parameter Entity and file names when adding to the Suite

“Full” Book Sample

A representative book sample is provided in both PDF format and in XML according to this tag set. These samples are provided to help users understand the relationship between the book as displayed and the XML version of the book. The XML sample provides information about the book as a whole, one of the book’s more complicated chapters, and some of the book’s back matter. The PDF shows only the chapter content.

DTD Section

Reference copies of the NCBI Book DTD, its customization modules, its book-specific element modules, the NCBI Collection DTD, and the modules of the Archiving and Interchange Suite used by these two DTDs.

Index By Element Name

Index of element descriptions, alphabetically by element name (the longer, more descriptive name)

Index By Tag Name

Index of element descriptions, alphabetically by tag name (element-type name)


Tag Library Typographic Conventions

  • <alt-text>: The tag name of an element (written in lower case monofont with the entire name surrounded by “< >”)
  • Alternate Title Text (For a Figure, Etc.): The element name (long descriptive name of an element) or the descriptive name of an attribute (written in title case, that is, with important words capitalized, and the words separated by spaces)
  • must not: Emphasis to stress a point

How To Start Using This Tag Library

The NCBI Book, Historical Book, and NCBI Book Collection Tag Sets are available on the NLM Journal Archiving and Interchange Tag Suite website, including the Tag Set models expressed as DTDs, XSD schemas, and RELAX NG schemas; one or more tagged sample documents; and this Tag Library documentation (a set of linked HTML files). The DTD file delivery includes the base DTDs for Book, Collection, and Historical; the DTD customization files needed for each DTD to over-ride the declarations in the NLM modular Suite; any new modules that define DTD-specific elements; and the modules that comprise the full Suite.

How you use the documentation will depend on what you need to learn about the modules and the Tag Set.


Explore the Tag Set

If you want to learn about the elements and the attributes in the Suite, so that you can tag documents or read tagged documents, here is a good way to start.

  • Read the Tag Library General Introduction, taking particular note of the next section which describes the parts of the Tag Library, so you will know what resources are available. Stop just before the section “How To Make New DTDs From These Modules”.
  • Next, if you do not know the symbols used in the Document Hierarchy diagrams, read the “Key to the Near & Far® Diagrams”.
  • Scan the Document Hierarchy diagrams to get a good sense of the top-level elements and their contents. (If working with the NCBI Book Tag Set, find what is inside a <book>, then find what is inside each of the large pieces of a book and keep working your way down.)
  • Pick an element from one of the diagrams. (Look up the element in the Elements Section to find the full name of the element, its definition, usage notes, content allowed, and any attributes. Look up one of the attributes to find its full name, usage notes, and potential values.)


Learn the DTD Structure

If you want to learn about the Tag Set in order to write a new DTD or to modify either of these DTDs:

  • Skim the Tag Library General Introduction.
  • If you do not know the symbols used in the Document Hierarchy diagrams, then read the “Key to the Near & Far® Diagrams”.
  • Use the Document Hierarchy diagrams to give you a good sense of the top-level elements and their contents.
  • Pick an element from one of the diagrams. (Look up the element in the Elements Section to find the full name of the element, the definition, usage notes, content allowed, and attributes list. Look up one of the attributes to find its full name, usage notes, and potential values.)
  • Read the DTD Modules, given at the end of this documentation.

New DTDs are created by writing, at a minimum, a new DTD module and new customization modules, so you might want to read (in order):


Modules in These DTDs
(Implementors Mainly)

The NCBI Book Tag Set and NCBI Collection Tag Set were written as customizations of the NLM Archiving and Interchange Suite. The basic Suite has modules for defining tables, paragraphs, etc. The NCBI Book Tag Set (book.dtd) and its customization modules define the elements for a monograph or book and also call in the Suite modules. In contrast, the NCBI Collection Tag Set (bookcollection.dtd) and its customization modules define a group of books, where the intent is to give information about the collection and list its members, but not to provide the actual content of any member book.

The NCBI Book Tag Set expressed as a DTD is comprised of the Book DTD module itself (book.dtd), the four book customization modules that are used to over-ride element and attribute declarations in the Suite (%bookcustom-modules.ent;, %bookcustom-classes.ent;, %bookcustom-mixes.ent;, %bookcustom-models.ent;), and the five modules that add new elements and attributes over and above what the Suite provides [%bookmeta.ent;, %bookpart.ent;, %bookimagemap.ent;, %bookmultilink.ent;, and %bookrelated-object.ent; (used solely as part of the Book Collection Tag Set)].

NCBI Book DTD

(File name book.dtd) The top-level NCBI Book DTD Module that declares the document element (<book>) and the other top-level elements that define the primary components of a book (book metadata, book front matter, body, and back matter). The DTD invokes all the modules it uses, by reference, as external Parameter Entities: first the NCBI Book DTD Module of Modules is called to name all Book-specific customized modules, then the Suite Module of Modules is called to name all the potential modules from the Suite, then customized and default modules are called (for Parameter Entities naming element classes, mixes, and models), then the Common Module for shared elements and attribute lists is called, and finally all the other Suite element modules are called as needed, and the four new Book-specific element modules, %bookmeta.ent;, %bookpart.ent;, %bookimagemap.ent;, and %bookmultilink.ent;. (The module %bookrelated-object.ent; is used solely as part of the Book Collection Tag Set.)


Book Over-ride Modules

The NCBI Book DTD customization modules over-ride the definitions in the modules of the Suite:

Book-Specific Module to Name Modules

(Parameter Entity %bookcustom-modules.ent;) Defines all the external modules that are specific to the NCBI Collection DTD or NCBI Book DTD (except itself, which must be both named and called inside a DTD). A DTD can select from these modules by referencing the module names through external Parameter Entities. The entities are declared in this module, but referenced (actually called in) in the DTD proper. To include a set of elements (such as all the lists or all the MathML elements), a DTD references the external Parameter Entity of the module (defined in this module) that contains these declarations.

Note: The NCBI Book DTD Module of Modules and the Suite Module of Modules need to be the first two external modules called by either the NCBI Collection DTD or the NCBI Book DTD. Customization modules for classes, mixes, and models will typically be called following the NCBI Book DTD Module of Modules and the Suite Module of Modules.

Book-Specific Class Over-rides Module

(Parameter Entity %bookcustom-classes.ent;) Sets up Parameter Entities that will be used to override the Suite’s default classes (those that are described in the %default-classes.ent; module)

Note: This module must be called before the %default-classes.ent; module (which this module overrides) and the %bookcustom-mixes.ent; and %bookcustom-models.ent; modules (which may build on classes defined in this module).

Book-Specific Mix Over-rides Module

(Parameter Entity %bookcustom-mixes.ent;) Sets up Parameter Entities that will be used to override default mixes (groupings made of “classes”) prescribed by the %default-mixes.ent; module

Note: This module must be called after the Book Customize Classes Module (%bookcustom-classes.ent;) and the Default Classes Module (%default-classes.ent;) but before the %default-mixes.ent; module (which this module overrides) and the %bookcustom-models.ent; module (which may build on mixes defined in this module).

Book-Specific Models/Attributes Over-rides Module

(Parameter Entity %bookcustom-models.ent;) Sets up Parameter Entities that will be used to override default content model Parameter Entities set elsewhere in the Suite. Also defines customizable attribute Declared Values and attribute lists for the DTD being defined.

Note: This module must be called after the NCBI Book DTD Customize Mixes Module (%bookcustom-mixes.ent;) and Default Mixes Module (%default-mixes.ent;) but before any “base” modules of the Suite.


Book Element Modules

The new elements added just for the NCBI Book Tag Set are defined in the following modules:

Book DTD Metadata Module

(Parameter Entity %bookmeta.ent;) Describes book-specific metadata elements that are not defined in the Suite metadata module %articlemeta.ent;

Book DTD Book Part Module

(Parameter Entity %bookpart.ent;) Declares book-component-level metadata, such as chapter-specific or part-specific metadata elements

Book DTD Image Map Module

(Parameter Entity %bookimagemap.ent;) Declares the elements used to create client-side image maps, which make hot spots on graphics

Book DTD Multilink Module

(Parameter Entity %bookmultilink.ent;) Defines links to external resources (Note: The external and multiple external links defined in this module are used in the NCBI Book Tag Set for external links instead of the XLink mechanism. The XLink mechanism, although deprecated, is still specified in all of the Suite modules.)


Suite Setup Modules

The basic Suite modules over-ridden by the Book-specific customization modules just named include the following:

Suite Module to Name the Modules

(Parameter Entity %modules.ent;) Defines all the external modules that are part of the modular Archiving and Interchange Suite (except itself, which must be both named and called inside a DTD). A DTD selects from these modules by referencing the module names through external Parameter Entities. The entities are declared in the Suite Module of Modules (%modules.ent;), but referenced (or not) in the DTD proper. To include a set of elements (such as all the book metadata or all the display elements) a DTD references the external Parameter Entity of the module that contains these declarations.

Note: The NCBI Book DTD Module of Modules and the Suite Module of Modules need to be the first two external modules called by either the NCBI Collection DTD or NCBI Book DTD. Customization modules for classes, mixes, and models will typically be called following the NCBI Book DTD Module of Modules and the Suite Module of Modules.

Suite Default Element Classes Module

(Parameter Entity %default-classes.ent;) Sets up the Parameter Entities that name the element members of each class that will be used to establish the content models

Note: This module must be called before the Book Customize Mixes Module (%bookcustom-mixes.ent;) and the Default Element Mixes Module (%default-mixes.ent;), as well as the Book Customize Models Module, %bookcustom-models.ent; (which builds on those modules).

Suite Default Element Mixes Module

(Parameter Entity %default-mixes.ent;) Sets up the Parameter Entities that name mixes (groupings made of “classes”) that will be used to establish the content models

Note: This module must be called before the Book Customize Models Module (%bookcustom-models.ent;) or any “base” module of the Interchange Suite.
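
The calling-order notes for the setup and over-ride modules can be collected into one set of “must be called before” pairs. The Python sketch below is illustrative only and is not part of the DTD delivery. Module names are the Parameter Entity names with the “%” and “;” removed, common.ent stands in for the first “base” module called, and the function name is hypothetical.

```python
# (earlier, later) pairs collected from the notes in this section.
# common.ent stands in for the first "base" Suite module that is called.
BEFORE = [
    ("bookcustom-modules.ent", "modules.ent"),
    ("modules.ent", "bookcustom-classes.ent"),
    ("bookcustom-classes.ent", "default-classes.ent"),
    ("default-classes.ent", "bookcustom-mixes.ent"),
    ("bookcustom-mixes.ent", "default-mixes.ent"),
    ("default-mixes.ent", "bookcustom-models.ent"),
    ("bookcustom-models.ent", "common.ent"),
]


def order_violations(call_order):
    """Return the (earlier, later) pairs that a given calling order breaks."""
    position = {name: index for index, name in enumerate(call_order)}
    violations = []
    for earlier, later in BEFORE:
        # Only check pairs where both modules are actually called.
        if earlier in position and later in position:
            if position[earlier] > position[later]:
                violations.append((earlier, later))
    return violations


GOOD_ORDER = [
    "bookcustom-modules.ent", "modules.ent",
    "bookcustom-classes.ent", "default-classes.ent",
    "bookcustom-mixes.ent", "default-mixes.ent",
    "bookcustom-models.ent", "common.ent",
]
# Same modules, but the Suite default classes are called before the over-rides.
BAD_ORDER = [
    "bookcustom-modules.ent", "modules.ent",
    "default-classes.ent", "bookcustom-classes.ent",
    "bookcustom-mixes.ent", "default-mixes.ent",
    "bookcustom-models.ent", "common.ent",
]

print(order_violations(GOOD_ORDER))
print(order_violations(BAD_ORDER))
```

The pattern is the same at each level (classes, mixes, models): the Book-specific over-ride module comes first, and the Suite default it over-rides follows.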


NCBI Collection Tag Set

The NCBI Collection Tag Set, which models a collection list and description, is a small Tag Set that defines a few collection elements and uses all of the book and Suite modules.

NCBI Collection DTD

(File name bookcollection.dtd) The top-level NCBI Collection DTD Module that declares the document element (<collection>) and the other top-level elements that define a grouping of related books (collection metadata, book front matter, body, and back matter). All elements but these few — and the elements needed to flesh out a collection’s metadata such as <collection-list> — are declared in the modules of the Suite. The DTD invokes all the other modules it uses, by reference, as external Parameter Entities: first the NCBI Book DTD Module of Modules is called to name all Book-specific customized modules, then the Suite Module of Modules is called to name all the potential modules from the Suite, then customized and default modules are called (for Parameter Entities naming element classes, mixes, and models), then the Common Module for shared elements and attribute lists is called, and finally all the other modules are called as needed, including the DTD-specific element modules, %bookmeta.ent;, %bookpart.ent;, %bookimagemap.ent;, %bookmultilink.ent;, and %bookrelated-object.ent; (used solely as part of the Book Collection Tag Set).

Book Related Object Module

(Parameter Entity %bookrelated-object.ent;) Defines the wrapper element <related-object>, used as a container for text links to a related object, possibly accompanied by a very brief description of the object (typically a related book). (This is a temporary module until the full Suite is updated to include this element, probably as part of the %common.ent; module in release 3.0.)


Basic Suite Modules

The modules comprising the rest of the Suite that are used to build both the NCBI Book DTD and the NCBI Collection DTD are the following:

Common (Shared) Elements Module

(Parameter Entity %common.ent;) Declarations for elements, attributes, entities, and notations that are shared by more than one class module

Note: This module must be called before any of the modules comprising the Interchange Suite.

Article Metadata Elements Module

(Parameter Entity %articlemeta.ent;) Declares the metadata elements (issue elements and article header elements) used to describe a journal article. This module has been incorporated in the NCBI Book DTD and NCBI Collection DTD to include metadata elements that, although previously declared to model journal articles, are also used in the metadata of a book or book component such as a chapter.

Back Matter Elements Module

(Parameter Entity %backmatter.ent;) Declares elements that are not part of the main textual flow of a work, but are considered to be ancillary material such as appendices, glossaries, and bibliographic reference lists

Display Class Elements Module

(Parameter Entity %display.ent;) Declares the display-related elements, such as figures, graphics, math, chemical expressions and structures, tables, etc.

Format Class Elements Module

(Parameter Entity %format.ent;) Declares elements concerned with rendition of output, for example, printing on a page or display on a screen. This module includes the elements in the Appearance Class, the Break Class, and the Emphasis Class.

Link Class Elements Module

(Parameter Entity %link.ent;) Declares elements that are links (internal or external) by definition, such as URLs (<uri>) and internal cross references (<xref>)

List Class Elements Module

(Parameter Entity %list.ent;) Declares the elements in the List Class; these are all lists except the lists of bibliographic references (citations). Lists are considered to be composed of items.

Math Class Elements Module

(Parameter Entity %math.ent;) Declares the elements in the math classes such as display equations

Paragraph-Like Elements Module

(Parameter Entity %para.ent;) Declares structural, non-display elements that may appear in the same places as a paragraph. These elements are named in the various paragraph class Parameter Entities.

Subject Phrase Class Elements Module

(Parameter Entity %phrase.ent;) Declares the Phrase Class elements, that is, names the inline, subject-specific elements.

Bibliographic Reference (Citation) Class Elements Module

(Parameter Entity %references.ent;) Declares the bibliographic reference elements

Section Class Elements Module

(Parameter Entity %section.ent;) Declares the elements of the Section Class, that is, declares all section-level elements in the NCBI Book Tag Set (or NCBI Collection Tag Set).

MathML Setup Module

(Parameter Entity %mathmlsetup.ent;) Invokes the MathML modules

DTD Creation Note: To include the MathML elements, a DTD must reference this module. This module sets up all Parameter Entities needed to use the MathML tag set and references (invokes) the MathML 2.0 Tag Set Module, which, in turn, invokes all the other MathML modules.

MathML 2.0 Tag Set Module

(Parameter Entity %mathml.dtd;) Mathematical Markup Language (MathML) 2.0, an XML application for describing mathematical notation and capturing both its structure and content

MathML 2.0 Qualified Names 1.0

(Parameter Entity %mathml-qname.mod;) Declares Parameter Entities to support namespace-qualified names, namespace declarations, and name prefixing for MathML, as well as declares the Parameter Entities used to provide namespace-qualified names for all MathML element types

Extra Entities for MathML 2.0

(Parameter Entity %ent-mmlextra;) Used for MathML processing

Aliases for MathML 2.0

(Parameter Entity %ent-mmlalias;) Used for MathML processing

XHTML Table Setup Module

(Parameter Entity %XHTMLtablesetup.ent;) Sets all Parameter Entities needed by the XHTML table model, and then invokes the module containing that model

DTD Creation Note: To include the XHTML table model in a tag set, a DTD must reference this module. This module sets up all Parameter Entities needed to use the XHTML table model and references (invokes) the XHTML Table Model Module. (See next item.)

XHTML Table Model Module

(Parameter Entity %xhtml-table-1.mod;) The public XML DTD version of the XHTML table model. This module is invoked from the module %XHTMLtablesetup.ent;. (See previous item.) This is the default table model for this tag set.

XHTML Table Style Module

(Parameter Entity %xhtml-inlstyle-1.mod;) Declares the style attribute, which supports inline style markup for elements such as <td> and <tr> within XHTML tables.

OASIS XML Table Setup Module

(Parameter Entity %oasis-tablesetup.ent;) Note: Not used in the current NLM Book DTD. Sets all Parameter Entities needed by the OASIS (CALS) Exchange table model, and then invokes the module containing that model

DTD Creation Note: To include the OASIS table model in a DTD, the DTD must reference this module. This module sets up all Parameter Entities needed to use the OASIS table model and references (invokes) the OASIS XML Exchange Table Model Module. This module has been modified to use a namespace prefix of “oasis” for all OASIS table elements, to disambiguate these elements and thus permit both the CALS and XHTML table models to be used in one tag set, should the developer choose to do this. There is a separate Tag Library describing the OASIS elements, attributes, and Parameter Entities: http://dtd.nlm.nih.gov/options/OASIS/tag-library/19990315/index.html

OASIS XML Exchange Table Model Module

(Parameter Entity %oasis-exchange.ent;) Note: Not used in the current Archiving DTD. The OASIS (CALS) Exchange table model. This module is invoked in %oasis-tablesetup.ent;.

XML Special Characters Module

(Parameter Entity %xmlspecchars.ent;) Standard ISO XML special character entities used in this tag set

Custom Special Characters Module

(Parameter Entity %chars.ent;) Custom special character entities created specifically for use in this tag set

Notation Declarations Module

(Parameter Entity %notat.ent;) Container module for the Notation Declarations to be used with this Suite. These notations have been placed in their own module for easy expansion or replacement.


Making DTDs from Suite Modules
(For Implementors Only)


Modular DTD Design

This Suite has been written as a series of XML DTD modules that can be combined into a number of different DTDs. The modules are separate physical files that, taken together, define all element structures (such as tables, math, chemistry, paragraphs, sections, figures, footnotes, and reference elements), as well as attributes and entities in the Suite.

Modules in the Suite are primarily intended to group elements for maintenance. There are different kinds of modules. A module may either:

  • Be a building block for a base DTD (such as the %modules.ent; module)
  • Define the elements inside a particular structure. For example, the %references.ent; module names all the potential components of bibliographic reference lists.
  • Name the members of a “class” of elements, where class is a named grouping of elements that share a similar usage or potential location. For example, the %phrase.ent; module defines small floating elements that may occur within text (for example, inside a paragraph or a title) or that describe textual content (for example, a disease name, drug name, or the name of a discipline).
  • Be a module of “editorial convenience”. For example, the %common.ent; module holds elements and attributes used in the content models of the various class elements.


Parameter Entities Modification

Parameter Entities are the major mechanism for customizing a DTD or creating a new DTD from the modules in the Suite. Individual DTDs will be constructed by 1) establishing element and attribute combinations and content models using Parameter Entities in one of the DTD-specific customizing modules and 2) choosing appropriate modules from the Suite that declare the elements needed. For example, if the base DTD contained 6 kinds of lists and 2 table models, a more specific DTD, such as an authoring DTD, might use a Customize Classes Module to redefine the List Class to name only 3 lists and redefine the Display Class to allow only one table model.
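
The list example above can be sketched as a pair of declarations. This is a minimal illustration, not the Suite's actual code: the list element names other than def-list and list are hypothetical, as is the premise of a base class with six lists. It relies on the XML rule that when the same entity is declared more than once, the first declaration encountered is binding, which is why the customizing module must be read before the module holding the default.

```dtd
<!-- In a DTD-specific Customize Classes Module, read BEFORE the
     default classes. Element names other than def-list and list
     are hypothetical. -->
<!ENTITY % list.class "def-list | list | simple-list">

<!-- In the base module, read later. The entity is already declared,
     so an XML parser ignores this second declaration. -->
<!ENTITY % list.class
   "def-list | list | simple-list | gloss-list | step-list | check-list">
```

Every content model that references %list.class; then allows only the three lists named in the first declaration, without any change to the base module.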

The standard modules to create a customized DTD are: the DTD itself, a module to name its components, and as many over-ride modules and new elements modules as necessary. Typical modules for a new DTD are:

  • DTD — The DTD module (.dtd) for the new DTD (At a minimum, this module declares the top-level element (such as article, book, book-collection, or report) and any other structural elements unique to the new document type.);
  • DTD-specific Module of Modules — The module to name all the new modules created expressly for the new DTD;
  • Class Over-rides — DTD-specific over-rides of the Suite default element classes;
  • Mix Over-rides — DTD-specific over-rides of the Suite default class mixes;
  • Model Over-rides — DTD-specific content model over-rides for the content models in the modules of the suite (using “-elements” and “-model” Parameter Entities); and any
  • New Models — DTD-specific new elements. (For example, the NCBI Book DTD added new book-specific metadata elements.)


Element Classes Concept

Many of the elements in the NCBI Book DTD have been grouped into loose element classes. There is no hard and fast rule for what constitutes a class; each one is a design decision, a matter of judgment. These classes are designed to ease customization to meet the particular needs of new DTDs. Base classes for the Archiving and Interchange Suite are defined in a separate Default Element Classes Module (%default-classes.ent;).

Content models are built using sequences of elements, and OR groups that are classes (typically) or mixes. As an example, the content model for a Paragraph element is declared to be an OR group (that is, a choice) of data characters and any of the elements named in the Paragraph Elements mix. The mix %p-elements; is declared to be a large OR group of many other element-defining classes: the Block Display Class Elements, the Mathematical Expressions Class Elements, the List Class Elements, the Citation Class Elements, et al.
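
The Paragraph example can be sketched as follows. This is an abbreviated fragment, not the Suite's full declaration: the real mix names more classes, the class names %math.class; and %citation.class; are assumptions, and all four classes are assumed to have been declared earlier (for instance, in the default classes module).

```dtd
<!-- Abbreviated mix: an OR group of classes. The leading OR bar lets
     the mix follow #PCDATA directly in the content model. -->
<!ENTITY % p-elements
   "| %block-display.class; | %math.class; | %list.class; | %citation.class;" >

<!-- Paragraph: a choice of data characters and any element in the mix -->
<!ELEMENT p (#PCDATA %p-elements;)* >
```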

These element classes can be viewed as building blocks that will be used to build larger Parameter Entities for element mixes. (Note: A mix describes a usage circumstance for a group of elements, such as all the paragraph-level elements, all the elements allowed inside a table cell, all the elements inside a paragraph, or all the inline elements.) For example, to add another block display item to the Block Display Class Elements, you would write a revised %block-display.class; entity declaration that names the new element and place that declaration in the DTD-specific class over-ride module, thus establishing your definition ahead of the Suite’s block display class. Then create a new module that defines the new element and call that module from the DTD.
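
A minimal sketch of that procedure, under stated assumptions: the element <sidebar-note> and its content model are hypothetical, and the class membership shown is abbreviated rather than the Suite's actual list. Because an over-ride replaces the default declaration instead of appending to it, the revised class must repeat every existing member that should remain available.

```dtd
<!-- In the DTD-specific class over-ride module, read before
     %default-classes.ent;. Membership is abbreviated here. -->
<!ENTITY % block-display.class
   "boxed-text | fig | table-wrap | sidebar-note">

<!-- In a new DTD-specific module that is called from the DTD.
     The element and its content model are hypothetical. -->
<!ELEMENT sidebar-note (title?, p+) >
```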


How To Build a New Custom DTD


Overview

The basic idea for a new DTD is that all lower-level elements (paragraphs, lists, figures, etc.) will be defined in modules, either in the modules of the base Suite or in new DTD-specific modules, rather than in the DTD itself. The new DTD will be fairly short and will include only definitions of the topmost elements: at least the document element and perhaps its children.

Modules are defined (declared) using External Parameter Entities in the Suite’s Module to Name the Modules or in the DTD-specific Module of Modules. Modules are called (referenced) in the DTD proper, in the order needed to define the Parameter Entities in sequence.
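
The distinction between declaring and calling a module can be sketched as below. The entity name, public identifier, and file name are placeholders invented for illustration and do not correspond to any module in the Suite.

```dtd
<!-- In a Module of Modules: declare (name) the module.
     Public identifier and file name are placeholders. -->
<!ENTITY % example-elements.ent
   PUBLIC "-//EXAMPLE//DTD Example New Elements Module v1.0//EN"
          "example-elements.ent" >

<!-- In the DTD proper: reference (call) the module at the point
     where its declarations are needed -->
%example-elements.ent;
```

Declaring an entity costs nothing if it is never referenced, which is why a Module of Modules can name every potential module while each DTD calls only the ones it uses.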

Version 2.3 of the NCBI Book DTD was written as an example of the new best-practice customization technique. A new variant DTD that follows this plan will probably consist of the following modules:

  • A DTD module to define the top-level elements (for example, book.dtd);
  • A DTD-specific Module of Modules to name new non-Suite modules in the DTD (for example, %bookcustom-modules.ent;);
  • A DTD-specific definition of element classes to add new classes and over-ride the Suite’s default classes (for example, %bookcustom-classes.ent;);
  • A DTD-specific definition of element mixes to add new mixes and over-ride the Suite’s default mixes (for example, %bookcustom-mixes.ent;);
  • A DTD-specific module of content model over-rides (for example, %bookcustom-models.ent;);
  • DTD-specific modules to hold new element declarations (for example, %bookmeta.ent; and %bookpart.ent;); and
  • All or most of the modules in the base Suite, to handle ordinary lists, paragraphs, tables, etc.


Making a Variant DTD

To illustrate the process just described, here is a series of specific instructions for making a new DTD, illustrated by showing how the NCBI Book DTD was created from the modules of the whole Suite.

  1. Modules — Write a new DTD-specific Module of Modules, which will define all the new customization modules the DTD needs. As an example, the NCBI Book DTD created the module %bookcustom-modules.ent;, which contains the definitions of
    • the class-over-ride module %bookcustom-classes.ent;,
    • the mix-over-ride module %bookcustom-mixes.ent;,
    • the models-over-ride module %bookcustom-models.ent;, as well as
    • the modules that define new elements
      • %bookmeta.ent;,
      • %bookpart.ent;,
      • %bookmultilink.ent;,
      • %bookimagemap.ent;, and
      • %bookrelated-object.ent; (used solely as part of the Book Collection Tag Set).
  2. Class Over-rides — Write a DTD-specific class-over-ride module, defining any over-rides to the Suite classes. These classes are defined in the default classes module, %default-classes.ent;. (As an example, the NCBI Book DTD created the module %bookcustom-classes.ent;, in which several new classes, including %book-part.class; and %custom-meta.class;, were declared.)
  3. Mix Over-rides — Write a DTD-specific mix-over-ride module, defining any over-rides to the Suite mixes. These mixes are defined in the default mixes module, %default-mixes.ent;. (As an example, the NCBI Book DTD created the module %bookcustom-mixes.ent;, in which mixes such as %book-part-level; and %emphasized-text; were declared.)
  4. Model Over-rides — Create a DTD-specific content-model-over-ride module, defining any over-rides to the content models and attribute lists for the Suite. (As an example, the NCBI Book DTD created the module %bookcustom-models.ent;, in which element groupings (suffixed “-elements”) that will be mixed with #PCDATA were redefined, full content model over-rides (suffixed “-model”) were defined, and some new attributes and attribute lists were added.)
  5. New Elements — Write any new element modules needed. These will define any new block-level or phrase-level elements. (As an example, the NCBI Book DTD created several new modules, including the module %bookimagemap.ent;, in which the elements <map-group>, <map>, and <area> were declared.)
  6. DTD Module — With those modules in place, construct a new DTD module. Within that module:
    • Use an External Parameter Entity Declaration to name and then call the DTD-specific Module of Modules. (For the NCBI Book DTD, the module %bookcustom-modules.ent;)
    • Use an External Parameter Entity Declaration to name and then call the Suite Module of Modules, which names all the potential modules. (For the NCBI Book DTD, the module %modules.ent;)
    • Use an External Parameter Entity reference to call the DTD-specific class over-rides. (For the NCBI Book DTD, the module %bookcustom-classes.ent;)
    • Use an External Parameter Entity reference to call the Suite default classes. (For the NCBI Book DTD, the module %default-classes.ent;)
    • Use an External Parameter Entity reference to call the DTD-specific mix over-rides. (For the NCBI Book DTD, the module %bookcustom-mixes.ent;)
    • Use an External Parameter Entity reference to call the Suite default mixes. (For the NCBI Book DTD, the module %default-mixes.ent;)
    • Use an External Parameter Entity reference to call the DTD-specific content models and attribute list over-rides. (For the NCBI Book DTD, the module %bookcustom-models.ent;)
    • Use an External Parameter Entity reference to call in the standard Common Module (%common.ent;) that defines elements and attributes so common they are used by many modules.
    • Use an External Parameter Entity reference to call any new DTD-specific module defining new block-level or phrase-level elements. [For the NCBI Book DTD, the following modules were called: %bookmeta.ent;, %bookpart.ent;, %bookmultilink.ent;, and %bookimagemap.ent;. (The %bookrelated-object.ent; module was declared solely for use in the Book Collection DTD.)]
    • Select, from the Suite Module of Modules, those modules which contain the elements needed for the DTD (for instance, selecting lists and not selecting math elements) and call in each of the modules needed. (The NCBI Book DTD calls these in alphabetical order, since the order does not matter.)
    • Define the document element and any other unique elements and entities needed for this DTD. (For example, the NCBI Book DTD declares only five elements — <book> [the top-level element] and its components: <book-meta>, <book-front>, <body>, and <back>.)
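
The order of operations in step 6 can be sketched as a skeleton DTD module. This is an outline under stated assumptions, not the text of the NCBI Book DTD: the SYSTEM identifiers are assumed file names (the delivered DTD identifies its modules by formal public identifier), only two Suite modules are shown as being selected, and the occurrence indicators in the <book> content model are guesses made for illustration.

```dtd
<!-- Name and call the DTD-specific Module of Modules
     (file name is an assumption) -->
<!ENTITY % bookcustom-modules.ent SYSTEM "bookcustom-modules.ent" >
%bookcustom-modules.ent;

<!-- Name and call the Suite Module of Modules
     (file name is an assumption) -->
<!ENTITY % modules.ent SYSTEM "modules.ent" >
%modules.ent;

<!-- Class over-rides first, then the Suite default classes -->
%bookcustom-classes.ent;
%default-classes.ent;

<!-- Mix over-rides first, then the Suite default mixes -->
%bookcustom-mixes.ent;
%default-mixes.ent;

<!-- Content model and attribute list over-rides -->
%bookcustom-models.ent;

<!-- Common Module: elements and attributes used by many modules -->
%common.ent;

<!-- New DTD-specific element modules -->
%bookmeta.ent;
%bookpart.ent;
%bookmultilink.ent;
%bookimagemap.ent;

<!-- Selected Suite modules, in alphabetical order (only two shown) -->
%phrase.ent;
%references.ent;

<!-- Document element; occurrence indicators are assumptions.
     The other four elements would be declared in the same way. -->
<!ELEMENT book (book-meta?, book-front?, body, back?) >
```

Each over-ride module is called immediately before the Suite default it modifies, so that the DTD-specific declarations are the first ones the parser encounters and therefore the ones that take effect.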


Naming Conventions
(Implementors Only)


Element and Attribute Naming

  • CASE — Element, attribute, and entity names that originate with this tag set or with the Suite are in all lower case. Element and attribute names taken from PUBLIC modules (e.g., MathML and various table modules) incorporated into these Tag Sets are in the case in which they were found in the original module.
  • TWO-WORD NAMES — Elements named with two words are separated by a hyphen, for example, <def-list> and <term-head>.
  • WORD STANDARDIZATION — Abbreviations are standardized so that, for example, “figure” is always used as “fig” (as in the element <fig-group>) and “group” is never abbreviated (as in the elements <fig-group>, <kwd-group>, and <fn-group>).

Parameter Entity Names for Classes and Mixes

PARAMETER ENTITY: SAME FUNCTION, SAME NAME — The Suite modules and initial DTDs have used a series of Parameter Entity naming conventions consistently. While parsing software cannot enforce these Parameter Entity naming or usage conventions, these conventions can make it much easier for a person to know how the content models work and what must be modified to make a DTD change.

CLASSES — Classes are functional groupings of elements used together in an OR group. Each class is named with a Parameter Entity, and all class Parameter Entity names end in the suffix “.class”:

 <!ENTITY % list.class "def-list | list">

A class, by definition, should never be made “empty”. Instead, remove the class from every content model in which its elements should not be allowed.

MIXES — Mixes are functional OR groups of classes; mixes should never contain element names directly. All mixes must be declared after all classes, since mixes are composed of classes. Mix names have no set suffix; for example, they may end in “-mix” or “-elements”. Content models and content model over-rides use mixes and classes for all OR groups. Only content model sequences are made up of element names directly.
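
A minimal sketch of these conventions follows. The mix name %example-block-mix; and the element <example-container> are hypothetical, and the membership of the block display class is abbreviated. The ordering matters because a Parameter Entity reference inside an entity value is expanded when the declaration is read, so each class must already exist when a mix that names it is declared.

```dtd
<!-- Classes: OR groups of element names -->
<!ENTITY % list.class          "def-list | list" >
<!ENTITY % block-display.class "fig | table-wrap" >  <!-- abbreviated -->

<!-- Mix: an OR group of classes only, declared after the classes.
     The mix name is hypothetical. -->
<!ENTITY % example-block-mix "%block-display.class; | %list.class;" >

<!-- A content model uses the mix for its OR group.
     The element is hypothetical. -->
<!ELEMENT example-container (%example-block-mix;)+ >
```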

MODEL OVER-RIDES — Parameter Entity mixes for over-riding a content model are of two styles: 1) inline mixes and 2) full content model replacements. These two groupings have been defined and named separately to preserve the mixed-content or element-content nature of the models in DTDs derived from the Suite.

The inline Parameter Entities to be intermingled with character data (#PCDATA) in a mixed content model are named with a suffix “-elements”. For example, “%institution-elements;” would be used in the content model for the element <institution>:

 <!ENTITY % institution-elements "| %break.class; | %emphasis.class; | %subsup.class;" >
 <!ELEMENT  institution (#PCDATA %institution-elements;)* >

All inline mixes begin with an OR bar, so that the mix can be removed leaving just character data (#PCDATA):

 <!ENTITY % rendition-plus "| %emphasis.class;  | %subsup.class;" >
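
To illustrate why the leading OR bar matters, here is a sketch of removing an inline mix altogether. Whether any real DTD restricts <institution> this way is an assumption made only for the example. The over-ride would sit in a DTD-specific model over-ride module that is read before the Suite declaration.

```dtd
<!-- Over-ride read ahead of the Suite declaration: an empty mix -->
<!ENTITY % institution-elements "" >

<!-- The Suite declaration, unchanged, now resolves to (#PCDATA)*
     and so allows character data only -->
<!ELEMENT institution (#PCDATA %institution-elements;)* >
```

Had the OR bar been written in the element declaration rather than in the mix, emptying the mix would leave a dangling bar and an invalid content model.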

The over-ride of a complete content model will be named with a suffix “-model” and should include the entire content model, including the enclosing parentheses:

 <!ENTITY % kwd-group-model "(title?, (%kwd.class;)+ )" >
 <!ELEMENT  kwd-group %kwd-group-model; >
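
A full content model over-ride works the same way. The sketch below makes the title required; that change is hypothetical and is shown only to illustrate the mechanism. The over-ride would be placed in a DTD-specific model over-ride module that is read after the classes are declared (so that %kwd.class; can be resolved) and before the Suite module that declares <kwd-group>.

```dtd
<!-- DTD-specific over-ride: title is now required (hypothetical) -->
<!ENTITY % kwd-group-model "(title, (%kwd.class;)+ )" >

<!-- The Suite declaration, unchanged, picks up the over-ride -->
<!ELEMENT kwd-group %kwd-group-model; >
```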
 

File Naming Conventions

DTD — This Tag Library describes the components for the NCBI Book, NCBI Book Collection, and parts of the NCBI Historical Book Tag Sets. Each of these Tag Sets is described in a DTD, an XSD schema, and a RELAX NG schema version. For the DTDs, the base DTD module (delivered as the file journalpublishing.dtd) calls in all the other DTD fragment modules as External Parameter Entities. Each module specific to this DTD (therefore, not part of the Suite) takes the prefix “journalpubcustom-”. The same prefix is used in the schemas written in the other two constraint languages.

Each DTD and DTD fragment module has been assigned a unique formal public identifier (FPI). File names are never referenced directly in the comments in the DTD; the file is referred to by the name of the external Parameter Entity, which names the FPI and a system name for the file. The system name of each external Parameter Entity has been set to the initial delivery filename.

The Publishing DTD, the individual DTD-fragment modules of the DTD and the Suite, the XSD schema modules, and the RELAX NG schema modules have been given DOS/Windows 3-character suffixes indicating their type:

*.dtd

A module that can be used as the top level of an XML hierarchy. Used for the Journal Publishing DTD top level, journalpublishing.dtd, but also taken unchanged for public DTD modules that have been included in this DTD such as the MathML DTD and the XHTML table model.

*.ent

A DTD fragment for incorporation into a full DTD. May contain element declarations, entity declarations, etc., for example, articlemeta.ent.

*.mod

A DTD fragment for incorporation into a full DTD. May contain element declarations, entity declarations, etc. This extension has the same meaning as *.ent and is only used to maintain the extension names dictated by the inclusion of PUBLIC DTD and/or schema fragments, for example, mathml2-qname-1.mod.

*.xsd

A W3C XML Schema (XSD) schema or schema module, for example, journalpublishing.xsd.

*.rng

A RELAX NG schema module, for example, journalpublishing.rng and articlemeta.ent.rng.

While the tag set cannot dictate graphic file names, the comments do suggest that best practice for naming graphic files in documents tagged according to this Suite would be to limit the names and path names to these characters: letters (both upper and lower case), numbers, underscore, hyphen, and period. All such names will be assumed to be case sensitive. DOS-style file extensions may be used.


Acknowledgments

The NCBI Handbook provided the majority of sample material used in this tag library.