Book DTD Version 2.2


Version 2.2

Version 2.2 was released on November 2, 2006. Following is a list of updates.

The files for Version 2.2 are available:

The Tag Library for version 2.2 is here: http://dtd.nlm.nih.gov/book/tag-library/2.2/

Summary of Version 2.2 Changes

In Version 2.2, the Book DTD, Book Collection DTD, and the Archiving and Interchange DTD Suite modules were modified to:

  • Use the latest (Version 2.2) NLM DTD Suite; and

  • Make minor changes requested by NLM (such as adding permission and keyword elements to the Section Metadata element).

How to Build A New Custom DTD

The Concept

The basic idea for a new DTD is that all lower-level elements (paragraphs, lists, figures, etc.) will be defined in modules, either in the modules of the Suite or in new DTD-specific modules, not in the DTD itself. The DTD will be fairly short and include only definitions of the topmost element(s): at least the document element and perhaps its children.

Modules are declared as External Parameter Entities, either in the Suite Module of Modules or in the DTD-specific Module of Modules. The DTD then calls (references) the modules in the order needed to define the Parameter Entities in sequence.
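As a minimal sketch of this mechanism, assume a hypothetical module file yournew-classes.ent and a made-up public identifier (neither comes from the Suite). A Module of Modules only declares each module as an External Parameter Entity; the DTD pulls the module in later by referencing that entity:

```dtd
<!-- In the hypothetical yournew-modules.ent: declare (name) the module.
     The public identifier and the file name are placeholders. -->
<!ENTITY % yournew-classes.ent
  PUBLIC "-//EXAMPLE//DTD YourNew Class Over-rides v1.0//EN"
         "yournew-classes.ent">

<!-- In the DTD: call (reference) the module at the point where its
     declarations are needed. -->
%yournew-classes.ent;
```

Separating the declaration from the reference is what lets a DTD name every potential module in one place and then call only the ones it needs.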

Version 2.0 of the Journal Archiving and Interchange DTD was written as an example of the new best-practice customization technique. A new DTD that follows this plan will probably consist of the following modules:

  • A DTD module to define the top-level elements (e.g., yournew.dtd);

  • A DTD-specific definition of element classes to add new classes and over-ride the default classes (for example, %yournew-classes.ent;);

  • A DTD-specific definition of element mixes to add new mixes and over-ride the default mixes (for example, %yournew-mixes.ent;);

  • A DTD-specific module of content model over-rides (for example, %yournew-models.ent;);

  • A DTD-specific Module of Modules to name the non-Suite modules in the DTD (for example, %yournew-modules.ent;);

  • DTD-specific modules to hold new types of element declarations (e.g., %taxonomic-key.ent; or %help-topic-meta.ent;); and

  • All or most of the modules in the base Suite.

Example: Making a New DTD Using the Suite

To show the process, here is a series of instructions for making a new DTD, illustrated by showing how the Journal Archiving and Interchange DTD was created from the modules of the whole Suite.

  1. Modules: Write a new DTD-specific Module of Modules, which defines all new customization modules the DTD needs. (As an example, the Archiving DTD created the module %archivecustom-modules.ent;, which contains the definitions of the class-over-ride module %archivecustom-classes.ent;, the mix-over-ride module %archivecustom-mixes.ent;, and the models-over-ride module %archivecustom-models.ent;.)

  2. Class Over-rides: Write a DTD-specific class-over-ride module, defining any over-rides to the Suite classes. These classes were defined in the default classes module. (As an example, the Archiving DTD created the module %archivecustom-classes.ent;, in which a new model for %contrib-info.class; was declared and an entirely new class %x.class; was added.)

  3. Mix Over-rides: Write a DTD-specific mix-over-ride module, defining any over-rides to the Suite mixes. These mixes were defined in the default mixes module. (As an example, the Archiving DTD created the module %archivecustom-mixes.ent;, in which a new mix %all-phrase; was declared and then used in many existing mixes, such as %simple-phrase;.)

  4. Model Over-rides: Create a DTD-specific content-model-over-ride module, defining any over-rides to the content models and attribute lists for the DTD Suite. (As an example, the Archiving DTD created the module %archivecustom-models.ent;, in which element collections (suffixed “-elements”) that will be mixed with #PCDATA were redefined, full content model over-rides (suffixed “-model”) were redefined, and some new attributes and attribute lists were added.)

  5. New Elements: Write any new element modules needed. These will define any new block-level or phrase-level elements. (As an example, the Archiving DTD did not need any new elements not in the Suite, but the new NLM Book DTD added modules for book metadata and book component parts.)

  6. DTD Module: Use the modules just described in the construction of a new DTD module. Within that DTD module:

    • Use an External Parameter Entity Declaration to name and then call the DTD-specific Module of Modules. (For the Archiving DTD, the module %archivecustom-modules.ent;.)

    • Use an External Parameter Entity Declaration to name and then call the DTD Suite Module of Modules, which names all the potential modules. (For the Archiving DTD, the module %modules.ent;.)

    • Use an External Parameter Entity reference to call the DTD-specific class over-rides. (For the Archiving DTD, the module %archivecustom-classes.ent;.)

    • Use an External Parameter Entity reference to call the DTD Suite default classes. (For the Archiving DTD, the module %default-classes.ent;.)

    • Use an External Parameter Entity reference to call the DTD-specific mix over-rides. (For the Archiving DTD, the module %archivecustom-mixes.ent;.)

    • Use an External Parameter Entity reference to call the DTD Suite default mixes. (For the Archiving DTD, the module %default-mixes.ent;.)

    • Use an External Parameter Entity reference to call the DTD-specific content model and attribute list over-rides. (For the Archiving DTD, the module %archivecustom-models.ent;.)

    • Use an External Parameter Entity reference to call in the standard Common Module (%common.ent;), which defines elements and attributes so common they are used by many modules.

    • Select, from the Module to Name the Modules, those modules that contain the elements needed for the DTD (for instance, selecting lists and not selecting math elements) and call in each of the modules needed. (The Archiving DTD calls these in alphabetical order, since the order does not matter.)

    • Define the document element and any other unique elements and entities needed for this DTD. (For example, the Archiving DTD declares only six elements — <article> [the top-level element] and its components: <front>, <body>, <back>, <sub-article>, and <response>.)
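The calling order in step 6 can be sketched as a skeleton DTD module. This is an illustration under stated assumptions, not the Archiving DTD itself: the "yournew" names, the public identifiers, the system identifiers, and the document element are all hypothetical, and only two Suite modules are selected.

```dtd
<!-- yournew.dtd: hypothetical DTD module. All identifiers are assumed. -->

<!-- Name and call the DTD-specific Module of Modules -->
<!ENTITY % yournew-modules.ent
  PUBLIC "-//EXAMPLE//DTD YourNew Module of Modules v1.0//EN"
         "yournew-modules.ent">
%yournew-modules.ent;

<!-- Name and call the Suite Module of Modules (placeholder identifier) -->
<!ENTITY % modules.ent
  PUBLIC "-//EXAMPLE//DTD Suite Module of Modules placeholder//EN"
         "modules.ent">
%modules.ent;

<!-- Over-rides are called before the defaults: when a parameter entity
     is declared more than once, the first declaration is binding. -->
%yournew-classes.ent;
%default-classes.ent;
%yournew-mixes.ent;
%default-mixes.ent;
%yournew-models.ent;

<!-- Common module, then only the Suite modules this DTD needs -->
%common.ent;
%list.ent;
%references.ent;

<!-- Document element; title and sec are assumed to be declared by the
     modules called above. -->
<!ELEMENT yournew-doc (title, sec+)>
```

The ordering matters only for the over-ride and default modules; as the text notes, the element modules selected afterwards may be called in any order.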

Changes to the Base Suite

The section below lists the changes to the base Suite, which is used to build the Archiving, Publishing, and Authoring DTDs. Most of the old class modules (%list.ent;, %references.ent;, etc.) look much the way they used to, although the default classes and mixes have been moved out of the individual modules into the new class-specific and mix-specific modules. A few elements have moved from one module to another, particularly to the common module, as their usage increased. There are more Parameter Entities, to make it possible to over-ride even more content models and attribute lists.

Global changes include:

  • Changed the version number of every module in both the base Suite and the Archiving DTD to reflect a Version of 2.0 and a date of 08/30/2004;

  • Changed the formal public identifier of each module to that version and date; and

  • Changed the “dtd-version” attribute in all the DTDs to a #FIXED value of “2.0”.

Rationale and Philosophy for Changes

Regularizing Versus Strict Preservation

Two base DTDs were originally constructed from the Journal Archiving and Interchange DTD Suite: the Journal Archiving and Interchange DTD (nicknamed Green) and the Journal Publishing DTD (nicknamed Blue). The Publishing Blue DTD was to be prescriptive, to facilitate authoring. The Archiving Green DTD was to be moderately loose, to serve as a basis for interchange and repositories.

An understated, but very real, goal of the Archiving DTD was regularization of the archive itself. At the time of conversion from a publisher’s original DTD to the Journal Archiving DTD, a number of changes would be made to make all articles more alike. An alternate goal for the DTD might have been (but was not) preservation of a publisher’s content as exactly as possible. (As a small illustration of the difference such a design makes in a DTD, consider the “article-type” attribute values for the <article> element. An archive-regularizing DTD would make this a closed list and would change the “research article” of one publisher, the “research paper” of another, and the “letter” of a journal such as Nature into a single value during conversion. In contrast, a preservationist approach might make this attribute a CDATA open list, to capture whatever the publisher had called the article.)
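The two approaches to “article-type” might be declared as follows. This is only a sketch: the enumerated values are invented for illustration and are not the values used by any of the DTDs, and a real DTD would contain only one of the two declarations.

```dtd
<!-- Alternative A, regularizing: a closed list. The values are
     hypothetical; conversion maps each publisher's term onto one. -->
<!ATTLIST article
  article-type (research-article | letter | review) #IMPLIED>

<!-- Alternative B, preservationist: an open CDATA value that records
     whatever the publisher called the article. -->
<!ATTLIST article
  article-type CDATA #IMPLIED>
```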

Three DTDs From Two

Many requests from the AIT Working Group were from preservationists asking that the Archiving DTD loosen up to allow representing a wider range of publishers’ input. As the Archiving DTD loosened, archives wanting to regularize the archive migrated to the Publishing Blue DTD. They found it a little too tight for their needs and requested loosening changes. At the same time, vendors wanting to establish an authoring environment felt that Blue needed tightening, to make it easier for authors. The solution has been to create three base DTDs from the Suite:

Journal Archiving (Green) DTD

a preservationist archival DTD (the current Archiving DTD, made even more flexible and non-enforcing)

Journal Publishing (Blue) DTD

an archive regularization and interchange DTD (the current Publishing DTD loosened as necessary)

Authoring DTD

a tight, small subset that concentrates on best practice as an aid to authoring

Remodularization

New Customization Requirements

When the Archiving and Interchange DTD Suite was written, it was assumed that the major use of the Suite would be to make entirely new and distinct DTDs, so the modularization was done to make that convenient. All customization was concentrated in one module, and Parameter Entities were defined just before they were used. But very few original DTDs have been developed in the last year and a half. New DTDs are being created from the Suite not by DTD development but by DTD modification. Most organizations seem to want most of their models to be the same as the Suite, plus a few changes. People modifying the DTD have also requested more guidance in how to modify it properly. In light of actual Suite usage, we have observed the following DTD Suite customization requirements.

  • The published modules should act as a more complete model of how to build a modified DTD from the Suite. The modularization of the base Archiving and Publishing DTDs should provide a sample of best practice for making and modularizing new DTDs.

  • Customization modules should be small and list only what has changed from the Suite defaults.

  • It should be relatively easy to compare new DTDs developed from the Suite.

  • Parameter Entities that are not over-ridden by a new DTD should not need to be declared by it (so that everything in the customization module is a change from the base).

  • It is still useful to developers to have all the element classes listed in one place, grouped functionally (in functions that match the Suite modules) so that it is easy to see what in the Suite is designed to be changed easily and how to change it.
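A small customization module that meets these requirements might look like the sketch below. The file name and the narrowed class are hypothetical, and the value shown for %kwd.class; is illustrative only; %list.class; with the content "def-list | list" is the example used elsewhere in this document.

```dtd
<!-- yournew-classes.ent (hypothetical): lists only what changes.
     Here the DTD narrows the list class to plain lists. -->
<!ENTITY % list.class "list">

<!-- default-classes.ent is called afterwards. Its declaration of
     list.class is ignored, because the first declaration of a
     parameter entity is the binding one. -->
<!ENTITY % list.class "def-list | list">

<!-- Not over-ridden above, so the default applies without the
     customization module having to repeat it (illustrative value). -->
<!ENTITY % kwd.class  "kwd">
```

Because the customization module contains nothing but changes, comparing two DTDs built from the Suite reduces to comparing their customization modules.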

New Class and Mix Modules

Best practice for how to make a new DTD from the Suite has changed, and the Suite has been remodularized for the new style. Two Suite modules have been created to hold the declarations that were formerly part of each DTD’s customization. In place of the former single customization module, there are smaller function-specific modules:

  • Default definition of element classes (the module %default-classes.ent;); and

  • Default definition of element mixes (the module %default-mixes.ent;).

All the default class Parameter Entities have been removed from the individual modules and placed into the new classing module.

Modeling Conventions and PE Naming

These DTDs and Suite modules have used a series of design and naming conventions consistently. While parsing software cannot enforce these Parameter Entity usage or naming conventions, the conventions make it much easier for a person to understand how the content models work. Version 2.0 of the Journal Archiving DTD (and the entire Version 2.0 Archiving and Interchange DTD Suite) uses the following usage and naming conventions.

  • Classes: Classes are functional OR-groups of elements. All class names end in the suffix “.class”. For example: <!ENTITY % list.class "def-list | list"> Classes cannot be made empty; instead, remove the class from every model in which its elements are not wanted.

  • Mixes: Mixes are OR-groups of classes. All mixes must be declared after all classes, since mixes are composed of classes. (Mixes should never contain element names directly.) Mix names have no set suffix. Some mixes are inline, to be intermingled with #PCDATA, and some mixes are groupings of block-level elements. All inline mixes begin with an OR bar. For example: <!ENTITY % rendition-plus "| %emphasis.class; | %subsup.class;">

  • Content: Content models and content model over-rides use mixes and classes for all OR groups. Only sequences are made up of element names directly. Content model over-rides are of two types, defined separately to preserve the mixed-content or element-content nature of the models as an aid to interchange.

    • -model: The over-ride of a complete content model is named with the suffix “-model”. The over-ride includes the entire content model, including the enclosing parentheses. For example: <!ENTITY % kwd-group-model "(title?, (%kwd.class; | %x.class;)+ )">

    • -elements: A grouping of elements to be mixed with #PCDATA inside a content model is named with the suffix “-elements”. For example, “access-date-elements” is used in the model for the element <access-date>. All “-elements” over-rides begin with an OR bar, so that a model may exclude all elements and be reduced to #PCDATA. For example: <!ENTITY % access-date-elements "| %date-parts.class; | %x.class;"> could be replaced by <!ENTITY % access-date-elements "">

  • Attribute lists: Attribute lists for a particular element are named with the name of the element followed by the suffix “-atts”; for example, the attributes for the abstract element are named “abstract-atts”. To provide maximum flexibility, such lists are reused less often than they might be in many DTDs; attribute lists for different elements were rarely tied together. Each Parameter Entity contains at least one complete line of an attribute list, not including the ATTLIST declaration itself.
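As a sketch of the “-atts” convention, the fragment below declares an attribute-list Parameter Entity and uses it. The attribute names, their types, and the content model for <abstract> are illustrative assumptions rather than the Suite's actual declarations.

```dtd
<!-- Hypothetical attribute-list Parameter Entity for <abstract>.
     It holds complete attribute definitions, but not the ATTLIST
     declaration that wraps them. -->
<!ENTITY % abstract-atts
  "abstract-type CDATA   #IMPLIED
   xml:lang      NMTOKEN #IMPLIED">

<!-- Illustrative content model; title and p are assumed to be
     declared elsewhere. -->
<!ELEMENT abstract (title?, p+)>

<!ATTLIST abstract
  %abstract-atts;>
```

A customization can then change the attributes of <abstract> by declaring its own abstract-atts earlier, without touching the ATTLIST declaration. Parameter entity references inside declarations like these are permitted only in external DTD modules, which is where the Suite keeps them.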

New Classes

The ideal situation in a DTD is that mix OR-groups and OR-groups within content models do not name elements; they name classes. This makes DTD-customization easier and makes maintenance over time significantly easier. A few new classes were created to facilitate this:

  • %app.class;

  • %back.class;

  • %caption.class;

  • %corresp.class;

  • %date.class;

  • %date-parts.class;

  • %def.class; (used in both <def-item> and <abbrev>)

  • %degree.class;

  • %fig-display.class;

  • %fn-link.class;

  • %front.class;

  • %front-back.class;

  • %id.class;

  • %just-base-display.class;

  • %just-para.class; (used in, for example, <author-comment>, <bio>, <def>, <caption>, <statement>, <fig>)

  • %just-table.class;

  • %kwd.class;

  • %name.class;

  • %ref-list.class;

  • %sec-back.class;

  • %table-foot.class; and

  • %tbody.class;.

Content models were rewritten to use the newly created classes. This rewriting did not lead to DTD changes, except in the following elements:

  • Modified %doc-back-matter-mix; (formerly named %doc-back-matter-elements;) to correct the historical error that had this Parameter Entity calling a mix (%sec-level;) and not a class (%sec.class;). Since there was nothing in %sec-level; but <sec>, this has no effect on the Archiving DTD as delivered, but it may change existing customizations.

  • %front-matter-model; rewritten to use new class Parameter Entity %front-back.class; and to use %list.class; rather than just <def-list>. This widens the model of <front-matter> by adding <list>.

  • Paragraph-related changes:

    • %inside-para; was renamed %p-elements;

    • Deleted %para.class;. In the definition of the Paragraph <p> element, its place is taken by %p-elements;. In other mixes, such as %para-level;, %para.class; was replaced by the combination of %just-para.class; and %rest-of-para.class;. (No DTD Change)

    • Inside the content model for Paragraph <p>, %rest-of-para.class; was renamed %p-elements;. The content model for <named-content> and the model for %p-elements; itself still use %rest-of-para.class;.

  • In %named-content-elements;, replaced the Parameter Entity %emphasized-text; with its constituent classes

  • In %copyright-statement-elements;, replaced the mix %rendition-plus; with its constituent classes

  • In %citation-elements;, replaced the mix %simple-text; with its constituent classes. This causes the apparent, but not real, deletion of %address-link.class;. (These links are also in %references.class;, so there was no DTD change.)

Link Classes

The link classes were reorganized to make future modification easier. Three classes were deleted, new link classes were added for a total of four, and every use of the link classes was modified as follows:

  • All occurrences of %ext-links.class; were replaced with the new class %address-link.class;.

  • All occurrences of %link.class; were replaced with some combination of the new link classes named below.

These were not usually DTD changes, just parameterization changes.

The following link-related classes were deleted:

  • %link.class;

  • %inpara-address;

  • %ext-links.class;

The new link classes are:

  • %address-link.class; (external links used in addresses)

  • %fn-link.class; (footnote alone)

  • %simple-link.class; (the internal links, just as it used to be)

  • %article-link.class; (links for journal articles)

Specific changes this encompassed include:

  • Replaced the link PEs in %emphasized-text;, %inside-cell;, %p-elements;, %product-elements;, and %simple-phrase;

  • In %aff-elements;, replaced %link.class; with %address-link.class;, %simple-link.class;, %article-link.class; (directly in the Suite base module and via the use of %all-phrase; in the Archiving customization) (No DTD Change)

  • In <author-notes>, “(corresp | fn)+” was replaced with the %fn-link.class; and the new class %corresp.class;

  • In %collab-elements;, replaced %ext-links.class; with %address-link.class; and deleted %inpara-address; (directly in the Suite base module and via the use of %all-phrase; in the Archiving customization) (No DTD Change)

  • In %copyright-statement-elements;, replaced %inpara-address; with %address-link.class; (directly in the Suite base module and via the use of %all-phrase; in the Archiving customization)

  • <email> is considered to be just another type of external link, as <ext-link> is, so it was added to: %collab-elements; and %copyright-statement-elements;.

  • In %inside-para; (which had been modified and renamed %p-elements;), deleted the PE %inpara-address;. (No DTD Change, since %address-link.class; covers it.)

  • In %named-content-elements;, replaced %link.class; with %address-link.class;, %article-link.class;, and %simple-link.class; (directly in the Suite base module and via the use of %all-phrase; in the Archiving customization) (No DTD Change)

  • In <related-article>, deleted %ext-links.class; because %references.class; has the needed links. (No DTD Change)

Rename Existing Parameter Entities

In order to make customization and maintenance easier, the names of several existing Parameter Entities were changed to bring them in line with the naming practices. This entailed changing the PE declaration and every mix or content model that used the PE.

  • %address-elements; ==> %address.class;

  • %author-notes-elements; ==> %author-notes-model;

  • %block-math; ==> %block-math.class;

  • %citation-model; ==> %citation-elements;

  • %contrib-info; ==> %contrib-info.class;

  • %copyright-statement-model; ==> %copyright-statement-elements;

  • %def-item-elements; ==> %def-item-model;

  • %display-back-matter; ==> %display-back-matter.class;

  • %doc-back-matter-elements; ==> %doc-back-matter-mix;

  • %inline-math; ==> %inline-math.class;

  • %list-item-elements; ==> %list-item-model;

  • %related-article-model; ==> %related-article-elements;

  • %sec-back-matter-elements; ==> %sec-back-matter-mix;
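One practical consequence of these renames, sketched below under assumptions, is that an existing customization which over-rides a PE under its old name no longer has any effect: the old name is still a legal declaration, but nothing in the Suite references it. The replacement text shown is illustrative, not the Suite's actual class content.

```dtd
<!-- Customization written against the old name. After the rename
     this still parses, but no Suite model refers to it, so the
     over-ride is silently unused. -->
<!ENTITY % inline-math       "inline-formula">

<!-- The same over-ride under the new name, which the Suite's mixes
     and models now reference. -->
<!ENTITY % inline-math.class "inline-formula">
```

Because a parser does not report an unused parameter entity as an error, it may be worth searching existing customization modules for each old name in the list above.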

Inline Mix OR-Bars

All inline mixes begin with an OR bar. While not strictly conformant, all modern XML parsers tested allow this variant. This technique allows the PE to be set to the null string, cancelling out any element inclusions and leaving a model of #PCDATA. This could also have been accomplished by over-riding the entire content model with a PE. The disadvantage of that method is that it makes it very easy to change mixed-content models to block-level element-content models. Since that is a major barrier to interchange, keeping the level the same is the one area where this DTD attempts to enforce consistency.

The following inline-mix Parameter Entities were changed to use the OR-bar-first mechanism. This required not only adding the OR bar to the Parameter Entity, but also removing the OR bar from every content model that uses the entity: %all-phrase;, %emphasized-text;, %inside-para; (now renamed %p-elements; and used only inside the Paragraph element <p>), %just-rendition;, %preformat-elements;, %related-article-elements;, %rendition-plus;, %simple-phrase;, %simple-text;, and all the Parameter Entities with the suffix “-elements”, if they did not already start that way.
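The OR-bar-first mechanism described above can be sketched with the “access-date-elements” entity. The class contents are illustrative assumptions, and the element declaration is a simplified stand-in for the Suite's own.

```dtd
<!-- Illustrative class contents; the real Suite classes may differ. -->
<!ENTITY % date-parts.class "day | month | season | year">
<!ENTITY % x.class          "x">

<!-- The inline mix carries its own leading OR bar. -->
<!ENTITY % access-date-elements "| %date-parts.class; | %x.class;">

<!-- The content model therefore writes no OR bar after #PCDATA. -->
<!ELEMENT access-date (#PCDATA %access-date-elements;)*>
```

If a customization module called earlier declares access-date-elements as the empty string, the same element declaration expands to (#PCDATA )*, a text-only model. The model stays mixed content in both cases, which is the consistency the mechanism is meant to protect.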

Model Over-rides Permitted

To make the DTDs more flexible and allow additional over-riding, the following new PEs were added:

  • %access-date-elements;

  • %chem-struct-elements;

  • %copyright-statement-elements;

  • %degrees-elements;

  • %display-formula-elements; and %display-formula-model;

  • %edition-elements;

  • %etal-elements;

  • %ext-link-elements;

  • %fax-elements;

  • %font-elements;

  • %given-names-elements;

  • %gov-elements;

  • %history-model;

  • %issn-elements;

  • %issue-elements;

  • %issue-title-elements;

  • %just-para.class;

  • %kwd-group-model;

  • %label-elements;

  • %long-desc-elements;

  • %on-behalf-of-elements;

  • %p-elements;

  • %patent-elements;

  • %phone-elements;

  • %prefix-elements;

  • %publisher-name-elements;

  • %publisher-loc-elements;

  • %role-elements;

  • %self-uri-elements;

  • %series-text-elements;

  • %series-title-elements;

  • %std-elements;

  • %string-date-elements;

  • %string-name-elements;

  • %suffix-elements;

  • %surname-elements;

  • %time-stamp-elements;

  • %uri-elements;

  • %verse-line-elements;

  • %volume-elements; and

  • %volume-id-elements;.

New attribute Parameter Entities were added as well:

  • %article-id-atts; for <article-id>

  • %date-atts; for <date>

  • %object-id-atts; and “object-id-type”

  • %pub-date-atts; for <pub-date>

  • %pub-id-atts; for <pub-id>

  • %sub-article-atts; for <sub-article>

  • %volume-id-atts; for <volume-id>

