General Introduction

The Journal Publishing DTD defines a document type for journal articles and some non-article journal material such as product and book reviews, editorials, and letters to the editor. The DTD was written to describe both the metadata for a journal article and the content of the article, but it can also describe just the article header metadata. This is a prescriptive DTD, optimized for the authoring and initial XML tagging of journal material. Although designed for biomedical journals, this DTD should be sufficiently general to describe not only STM journals but technical journals in any field.

The DTD was constructed using the modules of the Archiving and Interchange DTD Suite and has been modeled along the same philosophical lines as the Journal Archiving and Interchange DTD, which is a DTD for interchange and storage of journal material. However, because this is a publishing DTD optimized for the creation of new material, it is far smaller (fewer elements, and fewer choices in many contexts) than the full Journal Archiving and Interchange DTD. Where the interchange DTD may offer several ways to express the same information, this publishing DTD provides only one. The intention was not to limit the expressive power of this DTD but to remove the choices that carry no additional meaning, which a full interchange DTD needs in order to make conversion from a wide variety of formats as easy as possible. The philosophy of the interchange DTD was to accept as many varied forms of as many structures as possible. The philosophy of this DTD is to prefer a single structural form, or at least a single style of tagging.

The only structure in the Publishing DTD that is not in the Journal Archiving and Interchange DTD is the NLM Citation model. Although loose enough to accommodate the full range of citation types in the NLM Guidelines, this citation model is far more prescriptive than the Citation model. The model and the extensive examples of tagged citations provided are intended to encourage the creation of citations according to NLM's guidelines.

This Tag Library describes the Journal Publishing DTD as well as elements from the Archiving and Interchange DTD Suite. The resources it provides are described in the section “Structure of This Tag Library”.

DTD Design Principles

Introduction to the Journal Publishing DTD

The Journal Publishing DTD defines a document that is a component of a journal such as an article, a book or product review, a letter to the editor, etc. Each such document may have up to four components, which must appear in the following order:

  • Front matter (required). The article front matter contains the metadata (header information), such as the article title, the journal in which it appears, and the date and issue of publication.
  • Body of the article (optional). The body of the article is the main textual and graphic content of the article. This usually consists of paragraphs and sections, which may themselves contain figures, tables, sidebars (boxed text), etc. An article is not required to have a body, so that this DTD may be used to record article header metadata when the body of the article is not being tagged.
  • Back matter for the article (optional). If present, back matter contains information that is ancillary to the main text, such as a glossary, appendix, or list of references.
  • Optionally, either one or more responses, or one or more subordinate articles:
    • Response (rarely used). A response is a commentary on the article itself, for example, an opinion from an editor on the importance of the article or a reply from the original author to a letter concerning this article.
    • Sub-Article (rarely used). A sub-article is a small article that is completely contained inside another article.
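Because the order of these components is fixed, it can be checked mechanically. The Python sketch below is illustrative only: it reduces the rule to the four components listed above (ignoring anything else a full DTD may allow at that level), uses the standard-library ElementTree parser, and its function names are hypothetical. Validation against the DTD itself remains the authoritative check.

```python
import re
import xml.etree.ElementTree as ET

# Simplified ordering rule for the children of <article>:
# front, body?, back?, then either response+ or sub-article+ (never both).
# Each child tag name is followed by one space in the joined string.
ARTICLE_ORDER = re.compile(r"front (body )?(back )?((response )+|(sub-article )+)?")


def child_sequence(article_xml: str) -> str:
    """Return the article's child element names, each followed by a space."""
    root = ET.fromstring(article_xml)
    return "".join(child.tag + " " for child in root)


def has_valid_order(article_xml: str) -> bool:
    """True if the top-level children follow the simplified ordering rule."""
    return ARTICLE_ORDER.fullmatch(child_sequence(article_xml)) is not None


if __name__ == "__main__":
    # Header-only article: front matter alone is allowed.
    print(has_valid_order("<article><front/></article>"))  # prints True
    # Body before front matter violates the required order.
    print(has_valid_order("<article><body/><front/></article>"))  # prints False
```

A regular expression over child names fits here because a DTD element content model is itself a regular expression over element names; the final alternation is what keeps responses and sub-articles from being mixed.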

Modular DTD Design

The Archiving and Interchange DTD Suite was written as a set of XML DTD “modules”, each of which is a separate physical file. No module is an entire DTD by itself, but these modules can be combined into a number of different DTDs, for example, a journal repository and interchange DTD, a journal publishing (authoring) DTD, and a book archiving DTD. Modules are primarily intended for maintenance; all the elements of the same “type” (class) are stored together. The DTD itself is small and calls in many other modules to define the lower-level structural elements.

The major disadvantage of a modular system is the longer learning curve, because it may not be immediately obvious where within the system to find a particular element or attribute cluster. To help with this, the description of each element in the Element Section of this documentation names the module in which that element is defined.

There are many advantages to such a modular approach. The smaller units are written once, maintained in one place, and used in many different DTDs. This makes it much easier to keep lower-level structures consistent across document types, while allowing for any real differences that analysis identifies. A DTD for a new function (such as an authoring DTD) or a new publication type can be built quickly, because most of the necessary components will already be defined in the DTD Suite. Editorial and production personnel can bring the experience gained on one tagging project directly to the next with very little loss or retraining. Customized software (including authoring, typesetting, and electronic display tools) can be written once, shared among projects, and modified only for real distinctions.

Accessibility

Because access for a wide range of output devices and for the visually impaired is becoming increasingly important in the STM journal community, the modules in this Suite were designed to follow, as much as possible, the W3C Web Content Accessibility Guidelines 2.0 working draft, which was the latest accessibility specification available when the Suite was constructed. That specification addresses accessibility at many levels, from design through application. The guidelines that pertain to the modeling of materials were followed to at least Level-2 compliance. For example, a Long Description <long-desc> element was defined as part of many other elements such as <fig>, so it can be added not only to figures and other graphical objects but also to other parts of the text (for example, a sidebar <boxed-text>) to provide an accessible description of the object. The xml:lang attribute was added to all section-level elements and many paragraph-level elements to permit explicit indication of the language of the content, as required by these guidelines.

How to Read This Tag Library

Terms and Definitions

Element
Elements are nouns, such as “Speech” and “Speaker”, that represent components of journal articles, the articles themselves, and accompanying metadata.

Attribute
Attributes hold facts about an element, such as which type of list (e.g., numbered, bulleted, or plain) is being requested when using the List <list> tag, or the name of a pointer to an external file that contains an image. Each attribute has both a name (such as list-type) and a value (such as “bulleted”).

Metadata
Data about the data, for example, bibliographic information. The distinction is between metadata elements that describe an article (such as the name of the journal in which an article was published) versus elements that contain the textual and graphical content of the article.

How to Start Using This Tag Library

A full DTD Suite delivery package includes the Journal Publishing DTD, the Customization Module for the DTD, the files that make up the Archiving and Interchange DTD Suite modules, one or more tagged sample documents, and this documentation as a set of linked HTML files. How you use the documentation will depend on what you need to learn about the Publishing DTD and the modules of the larger DTD Suite.

If you want to learn how a journal article is constructed or to get an overview of the elements and the attributes in this DTD, here is a good way to start.

  • Read the Tag Library General Introduction, taking particular note of the next section that describes the parts of the Tag Library, so you will know what resources are available. Stop just before the section “Technical Details for Implementors”.
  • Next, if you do not know the symbols used in the Document Hierarchy diagrams, read the “Key to the Near and Far Diagrams”.
  • Scan the Document Hierarchy diagrams to get a good sense of the top-level elements and their contents. (Find what is inside an <article>, then what is inside each of the four large pieces of an article; keep working your way down.)
  • Pick an element from one of the diagrams. (Look up the element in the Elements Section to find the full name of the element, its definition, usage notes, content allowed, and any attributes. Look up one of the attributes to find its full name, usage notes, and potential values.)
  • Finally, if you are interested in tagging for a particular source journal:
    • Look at an article in a printed or online journal. Can all of the information you want to store from an article fit into the models shown in the diagrams? Do you have, or know how to get, all of the information the models require? Will that information always be available for documents that are complete and correct? How difficult will it be to identify the parts of the information using the elements and attributes described in these models?
    • Now look at some non-article content, such as a news column, a book review, or some letters to the editor. Can the tags handle all of these article types and all of their components?

Structure of This Tag Library

This Tag Library contains the following sections:

General Introduction

An introduction to the contents of this Tag Library and to the design philosophy and intended usage of the Journal Publishing DTD.

Elements Section

Descriptions of the elements used in the Journal Publishing DTD. The element descriptions are listed in alphabetical order by tag name. (Note: Each element has two names: a “tag name” (formally called an element-type name) that is used in tagged documents, the DTDs, and by the software; and an “element name” (usually longer) that provides a fuller, more descriptive name for the benefit of human readers. For example, a tag name might be <disp-quote> with the corresponding element name Quote, Displayed, or a tag name might be <verse-group> with the corresponding element name Verse Form for Poetry.)

Attributes Section

Descriptions of the attributes in the DTD modules. Like elements, attributes have two names: a shorter machine-readable one and a (usually longer) human-readable one. Attributes are listed alphabetically by their shorter machine-readable names, for example, by the short name list-type rather than by the more informal, easier-to-read Type of List.

Parameter Entity Section (For Implementors Only)

Names (with occasional descriptions) and contents of the Parameter Entities in the DTD modules.

Context Table

Listings of where each element may be used. All elements are given in a simple alphabetical list. There is a single table for the elements from all the Suite modules that are called from the DTD.

The Context Table is formatted in two columns. The first column lists an element’s tag name, and the second column lists the tag names of all the elements in which the first element may occur. For example, if the first column contains the front matter element <front> and the second column contains only the article element <article>, this means that the <front> (Front Matter) element may only be used inside an <article> (Article) element.

Most elements may be used inside more than one other element. For example, the attribution element <attrib> (Attribution) may be used inside both block quote <disp-quote> and poem <verse-group> elements.

Note: These Context Table listings (which list where an element may be used) are the inverse of the content description that is given as a part of each element in the element section, which lists what can be inside the named element.
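The inversion described in this note can be computed directly from the content descriptions. In this Python sketch, the content map is a hypothetical, heavily abbreviated stand-in for the element descriptions (real content models allow many more children), and the function name is illustrative.

```python
from collections import defaultdict

# Hypothetical, abbreviated content descriptions: each element maps to
# some of the elements allowed directly inside it (not the full models).
CONTENTS = {
    "article": ["front", "body", "back", "response", "sub-article"],
    "disp-quote": ["p", "attrib"],
    "verse-group": ["verse-line", "attrib"],
}


def build_context_table(contents):
    """Invert "what may occur inside X" into "where X may occur"."""
    contexts = defaultdict(set)
    for parent, children in contents.items():
        for child in children:
            contexts[child].add(parent)
    # Alphabetize both columns, as in the Context Table.
    return {tag: sorted(parents) for tag, parents in sorted(contexts.items())}


if __name__ == "__main__":
    for tag, parents in build_context_table(CONTENTS).items():
        print(f"{tag:<12} {', '.join(parents)}")
```

With this sample data, <front> is listed only under <article>, and <attrib> is listed under both <disp-quote> and <verse-group>, matching the examples given above.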

Document Hierarchy Diagrams

Tree-like graphical representations of the content of many elements. This can be a fast visual way to determine the structure of an article or of any element within an article.

Full Article Samples

Two full articles are provided both in PDF form and in XML tagged according to this DTD. They are provided to help users understand the relationship between the article as displayed and its XML version.

Index by Tag Name

Index of element descriptions, alphabetically by tag name (element-type name).

Index by Element Name

Index of element descriptions, alphabetically by element name (the longer, more descriptive name).

DTD Section

Copies of the Journal Publishing DTD and its Customization Module (the two files that make up the DTD proper), and of the full Archiving and Interchange DTD Suite of XML DTD modules described in this Tag Library.

Tag Library Typographic Conventions

  • <alt-text>: The tag name of an element (written in lowercase, with the entire name surrounded by “< >”).
  • Alternate Title Text (For a Figure, Etc.): The element name (long descriptive name of an element) or the descriptive name of an attribute (written in title case, that is, with important words capitalized and the words separated by spaces).
  • must not: Emphasis to stress a point.

Technical Details for Implementors

Modules Make Up the DTD Library

The Archiving and Interchange DTD Suite has been written as a series of XML DTD modules that can be combined into a number of different DTDs. The modules are separate physical files that, taken together, define all element structures (such as tables, math, chemistry, paragraphs, sections, figures, footnotes, and reference elements) as well as attributes and entities in the Suite.

The three base modules for any DTD are a) the DTD itself, b) the Customization Module it uses to define the Parameter Entities for use in the other modules, and c) the Module to Name the Modules, which names all the other modules.

The remaining modules are primarily intended to group elements for maintenance. There are different kinds of modules. A module may either:

  • be a building block for a base DTD (such as the Module to Name the Modules: %modules.ent;)
  • define the elements inside a particular structure, for example, the Reference Elements Module names all the potential components of bibliographic reference lists
  • name the members of a “class” of elements, where class is a named grouping of elements that share a similar usage or potential location. For example, the Phrase Class Module defines small floating elements that may occur within text, such as inside a paragraph or a title, or that describe textual content, for example, a disease name, drug name, or the name of a discipline
  • be a module of “editorial convenience”, for example, the common module that holds elements and attributes used in the content models of the class elements

Modules in the Journal Publishing DTD

The Journal Publishing DTD defines a single document type for a journal article or other journal component. The DTD draws most of its elements from the Archiving and Interchange DTD Suite. The DTD proper consists of a DTD file and a publishing-specific Customization Module (%journalpubcustomize.ent;) that sets up the differences between the publishing DTD and the Suite modules.

The following files are critical for the customization process that creates this DTD from the modular Suite:

Journal Publishing DTD

(File name journalpublishing.dtd) The top-level Journal Publishing DTD Module, which declares the document element (Article) and the other top-level elements that define a journal article for publishing: article, front matter, back matter, and sub-articles or responses. All elements but these few are declared in the modules of the Suite. The DTD invokes all the other modules it uses, by reference, as external Parameter Entities: first the Module to Name the Modules is called to name all the potential modules, then the Customization Module is called to set up the Parameter Entities that change the content models and attribute lists of elements in the Suite, then all the other modules needed.

Journal Publishing DTD Customize Classes Module

(Parameter Entity %journalpubcustomize.ent;) This module sets up the differences between the regular Archiving and Interchange DTD Suite and the requirements of the Journal Publishing DTD. Parameter Entities name the element members of each class that will be used to establish the publishing content models. This module also defines publishing-specific attribute values and attribute lists. (Note: This module is called directly after the Module to Name the Modules (%modules.ent;) but before any other module.)
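The calling order matters because of a rule in the XML 1.0 specification: if the same entity is declared more than once, the first declaration encountered is binding. Parameter Entities declared in the Customization Module therefore take precedence over any later declaration of the same name. The Python sketch below only simulates that rule; the entity values, including the extra Phrase Class members, are hypothetical and are not taken from the actual modules.

```python
# Each module is modeled as an ordered list of (entity name, replacement text),
# standing in for declarations such as <!ENTITY % phrase.class "named-content">.
# All values here are hypothetical examples, not the real module contents.
CUSTOMIZATION_MODULE = [
    ("phrase.class", "named-content"),
]
SUITE_DEFAULTS = [
    ("phrase.class", "named-content | gene | drug"),  # hypothetical default
    ("list.class", "def-list | list"),
]


def resolve_entities(*modules):
    """Simulate XML 1.0: the first declaration of an entity is binding."""
    bindings = {}
    for module in modules:
        for name, value in module:
            # setdefault keeps an existing binding, so later declarations are ignored.
            bindings.setdefault(name, value)
    return bindings


if __name__ == "__main__":
    # Customization read first: its value wins.
    print(resolve_entities(CUSTOMIZATION_MODULE, SUITE_DEFAULTS)["phrase.class"])
    # Suite read first: the customization would have no effect.
    print(resolve_entities(SUITE_DEFAULTS, CUSTOMIZATION_MODULE)["phrase.class"])
```

Entities the Customization Module does not declare (here, list.class) still take whatever value is declared later, so a customization only needs to redeclare what it changes.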

Element Classes

Many of the elements in the Suite have been grouped into loose element classes. Most classes are defined in separate modules that bear their names, although a few are defined in the Common Module because their elements are used in defining many other modules. Thus, the Link Class is defined in the Link Module, the List Class is defined in the List Module, etc. For the class modules, comments at the top of the module name the Parameter Entity used to invoke the class and define the default class membership. (The real class membership is always defined in the DTD-specific Customization Module, which for this DTD is %journalpubcustomize.ent;.)

These element classes can be viewed as building blocks for larger Parameter Entities called mixes. A mix describes a usage circumstance that all of its elements share (such as all the paragraph-level elements, all the elements allowed inside a table cell, all the elements inside a paragraph, or all of the inline elements). Content models are built from these mixes. For example, the content model for a Paragraph <p> is declared to be an OR group (that is, a choice) of data characters and any of the elements named in the mix %inside-para;, which in turn is declared to be a large OR group of many element classes: the Block Display Class, the Math Class, the List Class, the Link Class, etc.
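To make the layering concrete, the following Python sketch expands a mix into a Paragraph-style mixed content model. The class memberships are hypothetical examples chosen from elements mentioned in this Tag Library; the real memberships and the real %inside-para; mix are set in the DTD's Customization Module and may differ. The function names are illustrative.

```python
# Hypothetical class memberships; real ones are set in the Customization Module.
CLASSES = {
    "block-display.class": ["fig", "table-wrap", "boxed-text"],
    "math.class": ["inline-formula", "disp-formula"],
    "list.class": ["def-list", "list"],
    "link.class": ["xref", "uri"],
}

# A mix is an OR group built from whole classes.
INSIDE_PARA_MIX = ["block-display.class", "math.class", "list.class", "link.class"]


def expand_mix(mix, classes):
    """Replace each class reference in a mix with its member element names."""
    return [element for class_name in mix for element in classes[class_name]]


def paragraph_model(mix, classes):
    """Build a DTD-style mixed content model: data characters OR any mix element."""
    return "(#PCDATA | " + " | ".join(expand_mix(mix, classes)) + ")*"


if __name__ == "__main__":
    print(paragraph_model(INSIDE_PARA_MIX, CLASSES))
```

Because the content model refers to the mix and the mix refers to classes, changing a single class membership changes every content model built from that class, which is why class membership is defined in one place.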

There are also a few groupings that are not pure classes but just groupings of convenience. For example, there is no “Address Class”; there is a Parameter Entity called %address-elements; that holds a few of the elements, such as country, email, and fax number, that are the contents of an address element <address> and are defined in the Common Module. There is no hard and fast rule for what constitutes a class; each one is a design decision, a matter of judgment.

Using the Element Classes in the Suite

The classes described below are defined in the Journal Publishing DTD Customize Classes Module. Descriptions of the classes and their current default element contents are listed in the Parameter Entity Section toward the end of this Tag Library. In the Parameter Entity Section, the names of the elements in a group or class are listed within quotation marks, separated by vertical bars. For example, the Phrase Class will be listed as “%phrase.class;” and shown to contain:

“named-content”

which means that the element <named-content> is defined as a Phrase Class element.

Accessibility Class

(%access.class;) Elements added to make it easier to process journal articles in ways that are more accessible to people and devices with special needs, for example, the visually impaired. This class includes the element <alt-text>, which is a short name or description of an object, usually a graphical object, that can be used “behind the picture” on a website or read aloud by a talking system [defined in the Common Module].

Break Class

(%break.class;) Formatting element (usage discouraged) used to force a line break, primarily in tables and titles [defined in the Format Module].

Citation Class

(%citation.class;) Reference to an external document (a citation) as used within, for example, the text of a paragraph [defined in the Common Module].

Conference Class

(%conference.class;) Metadata elements that may be used to describe a conference, for example, the conference name, theme, and sponsoring organization [defined in the Common Module].

Display Class

(Several Parameter Entities: %block-display.class;, %inline-display.class;, %simple-display.class;) Graphical or other display-related elements, including figures, chemical formulas, and images [defined in the Display Class Module].

Emphasis Class

(%emphasis.class;) Used to produce rendering/typographical distinctions such as superscript, subscript, or bold text [defined in the Format Module].

Label Class

(%label.class;) The label element used to hold the number, prefix character, or prefix word or phrase of a labeled object, such as a table, figure, or footnote [defined in the Common Module].

Link Class

(%link.class;, %simple-link.class;, %ext-links.class;) Elements that associate one location with another, including cross references and URIs for links to the World Wide Web [defined in the Link Module].

List Class

(%list.class;) The types of lists used in text, including numbered lists and bulleted lists [defined in the List Module].

Math Class

(%math.class;) The mathematical elements (such as Formula, Inline <inline-formula> and Formula, Display <disp-formula>) and elements that can contain the MathML tags [defined in the Math Module].

Paragraph Class

(%para.class;, %rest-of-para.class;, %intable-para.class;) Information for the reader that is at the same structural level as a paragraph, including both regular paragraphs and specially named paragraphs that may have distinctive uses or different displays, such as dialogs and formal statements [defined in the Common Module and the Paragraph Module].

Personal Name Class

(%person-name.class;) The element components of a person’s name (such as <surname>) that can be used, for example, inside the name of a contributor [defined in the Common Module].

Phrase Class

(%phrase.class;) Inline elements that surround a word or phrase in the text because the subject (content) should be identified to support some kind of display, searching, or processing. For example, a <named-content> element could be used to identify a drug name, genus/species, product, etc. [defined in the Phrase Module].

Reference Class

(%references.class;) The elements that may be included inside a Citation (bibliographic reference) [defined in the Reference Module].

Section Class

(%sec.class;) The elements that are at the same hierarchical level as a section [defined in the Section Module].

Table Class

(%table.class;) Elements that contain the rows and columns inside the Table Wrapper <table-wrap> element. The following elements can be set up for inclusion: Table (XHTML table model) <table>

Modules in the Archiving and Interchange DTD Suite

The modules composing the full DTD Suite are:

Module to Name the Modules

(Parameter Entity %modules.ent;) This is one of the modules in the Archiving and Interchange DTD Suite that is used unchanged by this Journal Publishing DTD. This module defines all the external modules that are part of the modular Archiving and Interchange DTD Suite (except itself and the Customization Module, which are both named and called inside the Journal Publishing DTD). All possible external modules are declared as external entities in the Module to Name the Modules (%modules.ent;). The Journal Publishing DTD selects the modules it chooses to use by referencing their external Parameter Entities.

Common (Shared) Elements Module

(Parameter Entity %common.ent;) Declarations for elements, attributes, entities, and notations that are shared by more than one class module. (Note: This module must be called before any of the class or element grouping modules.)

Article Metadata Elements Module

(Parameter Entity %articlemeta.ent;) Declares the metadata elements (issue elements and article header elements) used to describe a journal article. (Note: Metadata elements that describe the journal are in the Journal Metadata Module, %journalmeta.ent;.)

Back Matter Elements Module

(Parameter Entity %backmatter.ent;) Declares elements that are not part of the main textual flow of a work but are considered to be ancillary material, such as appendices, glossaries, and bibliographic reference lists.

Display Class Elements Module

(Parameter Entity %display.ent;) Declares the display-related elements such as figures, graphics, math, chemical expressions and structures, tables, etc.

Format Class Elements Module

(Parameter Entity %format.ent;) Declares elements concerned with rendition of output, for example, printing on a page or display on a screen. This module includes the elements in the Appearance Class, the Break Class, and the Emphasis Class.

Journal Metadata Elements Module

(Parameter Entity %journalmeta.ent;) Declares the elements used to describe the journal in which a journal article is published. (Note: The issue metadata and article metadata are defined in the Article Metadata Module, %articlemeta.ent;.)

Link Class Elements Module

(Parameter Entity %link.ent;) Declares the elements in the Link Class; these are elements that are links (internal or external) by definition, such as URLs <uri> and internal cross references <xref>.

List Class Elements Module

(Parameter Entity %list.ent;) Declares the elements in the List Class; these are all lists except the lists of bibliographic references (citations). Lists are considered to be composed of items.

Math Class Elements Module

(Parameter Entity %math.ent;) Declares the elements in the math classes, such as display equations.

Paragraph-Like Elements Module

(Parameter Entity %para.ent;) Declares structural, non-display elements that may appear in the same places as a paragraph. These elements are named in the various paragraph-class Parameter Entities.

Subject Phrase Class Elements Module

(Parameter Entity %phrase.ent;) Declares the Phrase Class elements, that is, the inline, subject-specific elements. At the time of DTD creation there was only one such element, <named-content>, which has an attribute to name the type of content. If more specific subject elements (such as “gene”) are added to a later version of this DTD, they would be added to the %phrase.class; entity and defined in this module or in %common.ent;.

Bibliographic Reference (Citation) Class Elements Module

(Parameter Entity %references.ent;) Declares the bibliographic reference elements.

Section Class Elements Module

(Parameter Entity %section.ent;) Declares the elements of the Section Class, that is, declares all section-level elements in the Journal Publishing DTD. At the time of this initial DTD creation, there is only one such element, Section <sec> itself, but future expansion to named sections (such as <methodology> or <materials>) or any new section-level structures would be added here.

MathML Setup Module

(Parameter Entity %mathmlsetup.ent;) Invokes the MathML modules. (DTD Creation Note: To include the MathML elements, a DTD must reference this module. This module sets up all Parameter Entities needed to use the MathML tagset and references (invokes) the MathML 2.0 DTD Module, which, in turn, invokes all of the other MathML modules.)

MathML 2.0 DTD Module

(Parameter Entity %mathml.dtd;) Mathematical Markup Language (MathML) 2.0, an XML application for describing mathematical notation and capturing both its structure and content.

MathML 2.0 Qualified Names 1.0

(Parameter Entity %mathml-qname.mod;) Declares Parameter Entities to support namespace-qualified names, namespace declarations, and name prefixing for MathML, and declares the Parameter Entities used to provide namespace-qualified names for all MathML element types.

Extra Entities for MathML 2.0

(Parameter Entity %ent-mmlextra;) Used for MathML processing.

Aliases for MathML 2.0

(Parameter Entity %ent-mmlalias;) Used for MathML processing.

XHTML Table Setup Module

(Parameter Entity %XHTMLtablesetup.ent;) Sets all Parameter Entities needed by the HTML 4.0 (XHTML) table model and then invokes the module containing that model. (DTD Creation Note: To include the XHTML Table Model, reference this module from the DTD. This module sets up all Parameter Entities needed to use the XHTML Table Model and references (invokes) the XHTML Table Model Module.)

XHTML Table Model Module

(Parameter Entity %htmltable.dtd;) The public XML version of the HTML 4.0 (XHTML) table model. This module is invoked in %XHTMLtablesetup.ent;.

OASIS XML Table Setup Module

(Parameter Entity %oasis-tablesetup.ent;) Sets all Parameter Entities needed by the OASIS (CALS) Exchange table model and then invokes the module containing that model. (DTD Creation Note: To include the OASIS Table Model, reference this module from the DTD. This module sets up all Parameter Entities needed to use the OASIS Table Model and references (invokes) the OASIS XML Exchange Table Model Module.)

OASIS XML Exchange Table Model Module

(Parameter Entity %oasis-exchange.ent;) The OASIS (CALS) Exchange table model. This module is invoked in %oasis-tablesetup.ent;.

XML Special Characters Module

(Parameter Entity %xmlspecchars.ent;) Standard ISO XML special character entities used in this DTD.

Custom Special Characters Module

(Parameter Entity %chars.ent;) Custom special character entities created specifically for use in this DTD.

Notation Declarations Module

(Parameter Entity %notat.ent;) Container module for the Notation Declarations to be used with this DTD Suite. These notations have been placed in their own module for easy expansion or replacement.

Archiving and Interchange Suite Naming Conventions

XML Component Naming Conventions

Element and attribute names that originate with the Journal Publishing DTD and the Archiving and Interchange DTD Suite are in all lowercase. Element and attribute names taken from PUBLIC modules incorporated into these DTDs keep the case in which they appear in the original module (e.g., MathML and the various table modules). In names made up of two words, the words are separated by a hyphen, for example, <def-list> and <term-head>.

Classes are functional groupings of elements, defined and used together. Each class is named with a Parameter Entity, and all class Parameter Entity names end in the suffix “.class”.
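These conventions are regular enough to check automatically. The Python sketch below encodes them as two regular expressions. It is meant only for names that originate in the Suite (names from PUBLIC modules such as MathML keep their original case and are out of scope); allowing digits within lowercase names is an assumption, and the function names are illustrative.

```python
import re

# Suite-originated element and attribute names: lowercase words joined by
# single hyphens, e.g. "def-list" or "term-head". Digits are an assumption.
SUITE_NAME = re.compile(r"[a-z][a-z0-9]*(?:-[a-z0-9]+)*")

# Class Parameter Entity names end in ".class", e.g. "phrase.class".
CLASS_ENTITY = re.compile(r"[a-z][a-z0-9]*(?:-[a-z0-9]+)*\.class")


def follows_suite_naming(name: str) -> bool:
    """True if an element or attribute name is all lowercase and hyphen-separated."""
    return SUITE_NAME.fullmatch(name) is not None


def is_class_entity(name: str) -> bool:
    """True if a Parameter Entity name follows the ".class" suffix convention."""
    return CLASS_ENTITY.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["def-list", "term-head", "DefList", "def_list"]:
        print(candidate, follows_suite_naming(candidate))
```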

File Naming Conventions

This Tag Library describes the components of the Journal Publishing DTD, which uses modules from the Archiving and Interchange DTD Suite. The publishing DTD consists of a base DTD module (delivered as the file journalpublishing.dtd), which references the other DTD modules.

The individual modules (as delivered) have been given DOS/Windows-style three-letter file extensions indicating their type:

*.dtd
A module that can be used as the top level of an XML hierarchy. Used for the top-level journal Article module, journalpublishing.dtd, and also retained unchanged for public DTD modules that have been included in this DTD, such as the MathML DTD and the XHTML table model.

*.ent
A DTD fragment for incorporation into a full DTD. May contain element declarations, entity declarations, etc.

*.mod
A DTD fragment for incorporation into a full DTD. May contain element declarations, entity declarations, etc. This extension has the same meaning as *.ent and is used only to keep the extension names dictated by the inclusion of PUBLIC DTD fragments, for example, mathml2-qname-1.mod.

Each DTD and module has been assigned a unique formal public identifier (FPI). File names are never referenced directly in the comments in the DTD; instead, each file is referred to by the name of its external Parameter Entity, which gives the FPI and a system identifier for the file. The system identifier in each external Parameter Entity has been set to the file name as initially delivered.

Although the DTD cannot dictate graphic file names, the comments suggest that the best practice for graphic file names in documents tagged according to this DTD Suite is to limit file names and path names to these characters: letters (both upper- and lowercase), digits, underscore, hyphen, and period. All such names are assumed to be case sensitive. DOS-style file extensions may be used.
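As an illustration of this suggested practice, the following Python sketch checks graphic path names against the recommended character set and flags names that differ only in case: such names are distinct under the case-sensitive assumption but can collide on case-insensitive file systems. Treating “/” as the only path separator, and therefore rejecting absolute paths, are assumptions of the sketch; the function names and sample file names are hypothetical.

```python
import re
from collections import defaultdict

# Suggested characters: letters, digits, underscore, hyphen, and period.
SAFE_SEGMENT = re.compile(r"[A-Za-z0-9_.-]+")


def is_recommended_graphic_path(path: str) -> bool:
    """True if every "/"-separated segment uses only the suggested characters.

    Assumption: "/" is the path separator; an empty segment (for example,
    from a leading "/") is rejected.
    """
    return all(SAFE_SEGMENT.fullmatch(segment) for segment in path.split("/"))


def case_collisions(names):
    """Group names that are distinct case-sensitively but equal ignoring case."""
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(is_recommended_graphic_path("graphics/fig1.tif"))
    print(is_recommended_graphic_path("graphics/fig 1.tif"))
    print(case_collisions(["Fig1.tif", "fig1.tif", "fig2.tif"]))
```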


We thank Molecular Biology of the Cell and The Proceedings of the National Academy of Sciences of the U.S.A. for providing the sample articles used in this Tag Library.