Article Authoring 2.1

Documentation for Implementors

This Authoring DTD comprises a handful of DTD-specific modules that set up Parameter Entity over-rides and uses (by reference) the base modules of the full Journal Archiving and Interchange DTD Suite. The modules of that Suite were developed as part of an effort to create XML applications through which materials on health-related disciplines could be shared and reused electronically. Although the full Suite was developed to support electronic production, the structures should be adequate to support some print production as well. The Suite has been used to construct many DTDs in addition to this one.

Because this is an authoring DTD, optimized for creating new content, it is far smaller (fewer elements, and fewer choices in many contexts) than either the Archiving or the Publishing DTD. Where the Archiving DTD may offer several ways to express the same information, the goal here was to allow only one. The intention was not to limit the expressive power licensed by this DTD, but rather to remove the meaningless choices that a full interchange DTD must offer to accommodate conversion from as wide a variety of formats as possible. The philosophy for the Archiving DTD was to accept as many varied forms of many structures as possible unchanged. The philosophy for the Publishing DTD was to accept a wide variety of structures and to regularize those that matter to the archive. The philosophy of this Authoring DTD is to prefer a single structural form, or at least a single style of tagging, whenever possible. Similarly, the Archiving and Publishing DTDs allow formatting such as list numbering and citation references to be preserved, whereas this DTD assumes that such objects will be generated as part of production.

This DTD Suite has been written as a series of XML DTD modules that can be combined into a number of different DTDs. The modules are separate physical files that, taken together, define all element structures (such as tables, math, chemistry, paragraphs, sections, figures, footnotes, and reference elements), as well as attributes and entities in the Suite.

Modules in the Suite are primarily intended to group elements for maintenance. There are different kinds of modules. A module may either:

Be a building block for a base DTD (such as the Module to Name the Modules module)
Define the elements inside a particular structure. For example, the Bibliography References (Citation) Elements Module module names all the potential components of bibliographic reference lists.
Name the members of a “class” of elements, where class is a named grouping of elements that share a similar usage or potential location. For example, the Phrase-Level Content Elements Module module defines small floating elements that may occur within text, such as inside a paragraph or a title, or that describe textual content, for example, a disease name, drug name, or the name of a discipline.
Be a module of “editorial convenience”. For example, the Common (Shared) Element Declarations Module module holds elements and attributes used in the content models of the various class elements.

There are many advantages to such a modular approach. The smaller units are written once, maintained in one place, and used in many different DTDs. This makes it much easier to keep lower level structures consistent across document types, while allowing for any real differences that analysis identifies. A DTD for a new function (such as a repository DTD) or a new publication type can be built quickly, since most of the necessary components will already be defined in the DTD Suite. Editorial and production personnel can bring the experience gained on one tagging project directly to the next with very little loss or retraining. Customized software (including authoring, typesetting, and electronic display tools) can be written once, shared among projects, and modified only for real distinctions.

A number of DTDs have already been developed from the modules in the full Archiving and Interchange DTD Suite and more are being developed. The DTDs produced for the National Library of Medicine include:

Archiving and Interchange DTD — (Journal Archiving and Interchange DTD, nicknamed “Green”) A DTD into which publishers’ original XML (or SGML) content can be converted, providing a common format for storing this material and for transferring content to any one of a number of archives. This DTD has been designed to be very open and inclusive, so that journal articles can be translated from a wide variety of proprietary journal article DTDs. It is permissive by design, intended to capture whatever warts and wrinkles exist in a publisher’s content.
Publishing DTD — (Journal Publishing DTD, nicknamed “Blue”) A DTD for use as a repository DTD by archives which desire consistent processing and searching capability. The Journal Publishing DTD is a more prescriptive and restrictive subset of the Journal Archiving and Interchange DTD Suite than the Archiving DTD (above). This DTD may also be used to transfer content between archives.
Authoring DTD — (Authoring DTD, nicknamed “Pumpkin”) A DTD intended for writing and editing new journal articles. The Authoring DTD is designed to enable “good” journal coding and to provide assistance to authors and editors through more restricted models than the interchange and repository DTDs allow.
Book DTD — (nicknamed “Purple”) A DTD intended for preserving book and monograph content for use in electronic websites.

There is one and only one element in the authoring DTD that is not in the Archiving DTD (although it is in the Publishing DTD): the <nlm-citation> element. This citation model, although loose enough to accommodate the full range of citation types in the NLM Guidelines, is far more prescriptive than the <citation> model of the base Suite. The NLM citation model and the extensive examples of tagged citations provided are intended to encourage the creation of citations according to NLM’s guidelines.

If you want to learn about the DTD Suite in order to write a new DTD or to modify this one:

Skim the Tag Library General Introduction.
- Start really reading at the section “Documentation for Implementors”;
- Read the Parameter Entities that name the classes; and
If you do not already know the symbols used in the Document Hierarchy diagrams, then read the “Key to the Near & Far® Diagrams”.
Use the Document Hierarchy diagrams to give you a good sense of the top-level elements and their contents.
Pick an element from one of the diagrams. Look up the element in the Elements Section to find the full name of the element, the definition, usage notes, content allowed, and attributes list. Then look up or link to one of the attributes to find its full name, usage notes, and potential values.
Scan the DTD Modules, given at the end of this documentation.
New DTDs are created by writing a new DTD module and new customization modules, so you might want to read (in order):
- The DTD module (articleauthoring.dtd);
- The module that names all the other modules (%modules.ent;);
- The customization module that names all DTD-specific changes to the Suite modules (%articleauthcustom-modules.ent;); and
- The customization modules themselves (%articleauthcustom-classes.ent;, %articleauthcustom-mixes.ent;, and %articleauthcustom-models.ent;), as well as any modules being added by this DTD (%nlmcitation.ent;).
  (You might also wish to familiarize yourself with the relationship between the “customization” modules and the “default” modules for classes, mixes, and models.) Those can be followed by any one of the class modules (although the DTD Suite has been designed to have the %common.ent; module precede any other class module).

Parameter Entities are the major mechanism for customizing a DTD or creating a new DTD from the modules in the full Suite. Individual DTDs will be constructed by 1) establishing element and attribute combinations and content models using Parameter Entities in one of the DTD-specific customizing modules and 2) choosing appropriate modules from the Suite that declare the elements needed. For example, if the base DTD contained 6 kinds of lists and 2 table models, a more specific DTD might use a Customize Classes Module to redefine the List Class to name only 3 lists and redefine the Display Class to allow only one table model.
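
The over-ride mechanism can be sketched as follows. This is a minimal illustration, not text copied from the Suite: it relies on the XML rule that the first declaration of an entity is binding, reuses the `list.class` entity shown later in this section, and introduces a hypothetical `example-model` entity purely to show the effect.

```dtd
<!-- In a DTD-specific class over-ride module, which is read FIRST.
     This narrower definition is the one the parser keeps. -->
<!ENTITY % list.class "list">

<!-- In the Suite default classes module, which is read SECOND.
     The entity is already declared, so this declaration is ignored. -->
<!ENTITY % list.class "def-list | list">

<!-- Hypothetical content model entity that uses the class.
     It now expands to (title?, (list)+). -->
<!ENTITY % example-model "(title?, (%list.class;)+)">
```

Because the narrower declaration is read first, no Suite module has to be edited to restrict the class.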

The standard modules to create a customized DTD are: 1) the DTD itself, 2) a module to name its component modules, and 3) as many over-ride modules (class, mix, and/or model) and new elements modules as necessary. Thus, typical modules for a new DTD are:

DTD — The base DTD module for the new DTD (for the Authoring DTD, articleauthoring.dtd). At a minimum, this module declares the top-level element (such as article, book, help-topic, or report) and any other structural elements unique to the new document type;
DTD-specific Module of Modules — The DTD-specific module of modules, to name all the new modules created expressly for the new DTD;
Class Over-rides — DTD-specific over-rides of the Suite default element classes;
Mix Over-rides — DTD-specific over-rides of the Suite default class mixes;
Model Over-rides — DTD-specific content model over-rides for the content models in the modules of the suite (using “-elements” and “-model” Parameter Entities); and
New Model Modules — DTD-specific new elements (for example, a new Book DTD might add book-specific metadata elements or a DTD for historical material might add a module to define annotations.)

The basic idea for a new DTD is that all lower-level elements (paragraphs, lists, figures, etc.) will be defined in modules — either the modules of the base DTD Suite or in new DTD-specific modules rather than in the DTD itself. The new DTD will be fairly short and include only definitions of the topmost elements, at least the document element and maybe its children.

Modules are defined (declared) using External Parameter Entities in the Suite’s Module to Name the Modules or in the DTD-specific Module of Modules. Modules are called (referenced) in the DTD proper, in the order needed to define the Parameter Entities in sequence.
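
As a minimal sketch of the two steps, the fragment below declares a module and then calls it. The entity name `mycustom-classes.ent`, its public identifier, and its file name are all hypothetical placeholders, not identifiers delivered with the Suite.

```dtd
<!-- Declaration (normally placed in a Module of Modules): names the
     module by a public identifier and a system identifier.
     Both identifiers below are placeholders. -->
<!ENTITY % mycustom-classes.ent
    PUBLIC "-//EXAMPLE//DTD Hypothetical Custom Classes v1.0//EN"
           "mycustom-classes.ent">

<!-- Reference (placed in the DTD proper): the declarations in the
     module are read at exactly this point. -->
%mycustom-classes.ent;
```

Keeping the declaration and the reference separate lets a DTD name every available module while calling in only the ones it needs.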

This Authoring DTD was written as an example of the new best-practice customization technique. A new variant DTD that follows this plan will probably consist of the following modules:

A DTD module to define the top-level elements (for example, articleauthoring.dtd);
A DTD-specific Module of Modules to name new non-Suite modules in the DTD (for example, %articleauthcustom-modules.ent;);
A DTD-specific definition of element classes to add new classes and over-ride the default classes (for example, %articleauthcustom-classes.ent;);
A DTD-specific definition of element mixes to add new mixes and over-ride the default mixes (for example, %articleauthcustom-mixes.ent;);
A DTD-specific module of content model over-rides (for example, %articleauthcustom-models.ent;);
DTD-specific modules to hold new element declarations; and
All or most of the modules in the Suite.

To show the process, here is a series of instructions for making a new DTD, illustrated by showing how the Authoring DTD was created from the modules of the whole Suite.

Modules — Write a new DTD-specific Module of Modules, which defines all new customization modules the DTD needs. (As an example, the Authoring DTD created the module %articleauthcustom-modules.ent;, which contains the definitions of the class-over-ride module %articleauthcustom-classes.ent;, the mix-over-ride module %articleauthcustom-mixes.ent;, and the models-over-ride module %articleauthcustom-models.ent;.)
Class Over-rides — Write a DTD-specific class-over-ride module, defining any over-rides to the Suite classes. These classes are defined in the default classes module, %default-classes.ent;. (As an example, the Authoring DTD created the module %articleauthcustom-classes.ent;, in which several new classes, including %rest-of-para.class; and %name.class;, were declared.)
Mix Over-rides — Write a DTD-specific mix-over-ride module, defining any over-rides to the Suite mixes. These mixes are defined in the default mixes module, %default-mixes.ent;. (As an example, the Authoring DTD created the module %articleauthcustom-mixes.ent;, in which mixes such as %emphasized-text; and %simple-phrase; were declared.)
Model Over-rides — Create a DTD-specific content-model-over-ride module, defining any over-rides to the content models and attribute lists for the DTD Suite. (As an example, the Authoring DTD created the module %articleauthcustom-models.ent;, in which element collections (suffixed “-elements”) that will be mixed with #PCDATA were redefined, full content models over-rides (suffixed “-model”) were redefined, and some new attributes and attribute lists were added.)
New Elements — Write any new element modules needed. These will define any new block-level or phrase-level elements. (As an example, the Authoring DTD includes the module %nlmcitation.ent;, in which a more prescriptive citation element, <nlm-citation>, is declared. The module %nlmcitation.ent; was initially written as part of the Journal Publishing DTD and is used without change by this DTD.)
DTD Module — With those modules in place, construct a new DTD module. Within that module:
- Use an External Parameter Entity Declaration to name and then call the DTD-specific modules of modules. (For the Authoring DTD, the module %articleauthcustom-modules.ent;)
- Use an External Parameter Entity Declaration to name and then call the DTD Suite Modules of Modules, which names all the potential modules. (For the Authoring DTD, the module %modules.ent;)
- Use an External Parameter Entity reference to call the DTD-specific class over-rides. (For the Authoring DTD, the module %articleauthcustom-classes.ent;)
- Use an External Parameter Entity reference to call the DTD Suite default classes. (For the Authoring DTD, the module %default-classes.ent;)
- Use an External Parameter Entity reference to call the DTD-specific mix over-rides. (For the Authoring DTD, the module %articleauthcustom-mixes.ent;)
- Use an External Parameter Entity reference to call the DTD Suite default mixes. (For the Authoring DTD, the module %default-mixes.ent;)
- Use an External Parameter Entity reference to call the DTD-specific content models and attribute list over-rides. (For the Authoring DTD, the module %articleauthcustom-models.ent;)
- Use an External Parameter Entity reference to call in the standard Common Module (%common.ent;) that defines elements and attributes so common they are used by many modules.
- Use an External Parameter Entity reference to call any DTD-specific module defining block-level or phrase-level elements. (For the Authoring DTD, the module %nlmcitation.ent; in which a more prescriptive citation element, <nlm-citation>, is declared)
- Select, from the Module of Modules, those modules which contain the elements needed for the DTD (for instance, selecting lists and not selecting math elements) and call in each of the modules needed. (The Authoring DTD calls these in alphabetical order, since the order does not matter.)
- Define the document element and any other unique elements and entities needed for this DTD. (For example, the Authoring DTD declares only four elements — <article> [the top-level element] and its components: <front>, <body>, and <back>.)
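
The calling order described in the steps above might look roughly like the skeleton below. It is a sketch under stated assumptions, not the delivered articleauthoring.dtd: the public identifiers are placeholders, the file names are assumed to match the entity names, the remaining module entities are assumed to be declared by the two Modules of Modules, and the content model for `<article>` is illustrative only.

```dtd
<!-- 1. Name and call the DTD-specific Module of Modules.
        (Placeholder public identifier.) -->
<!ENTITY % articleauthcustom-modules.ent
    PUBLIC "-//EXAMPLE//DTD Placeholder Custom Module of Modules//EN"
           "articleauthcustom-modules.ent">
%articleauthcustom-modules.ent;

<!-- 2. Name and call the Suite Module of Modules.
        (Placeholder public identifier.) -->
<!ENTITY % modules.ent
    PUBLIC "-//EXAMPLE//DTD Placeholder Suite Module of Modules//EN"
           "modules.ent">
%modules.ent;

<!-- 3. Each over-ride module is called before the default it replaces. -->
%articleauthcustom-classes.ent;
%default-classes.ent;
%articleauthcustom-mixes.ent;
%default-mixes.ent;
%articleauthcustom-models.ent;

<!-- 4. The Common module, then any DTD-specific element modules. -->
%common.ent;
%nlmcitation.ent;

<!-- 5. The selected Suite modules would be called here,
        one parameter entity reference per module. -->

<!-- 6. The document element. Illustrative content model only. -->
<!ELEMENT article (front, body, back?)>
```

The over-ride modules are called ahead of the defaults because an XML parser keeps the first declaration it meets for any entity.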

CASE — Element, attribute, and entity names that originate with this DTD or with the Suite are in all lower case. Element and attribute names taken from PUBLIC modules (e.g., MathML and various table modules) incorporated into these DTDs are in the case in which they were found in the original module.
TWO-WORD NAMES — Elements named with two words are separated by a hyphen, for example, <def-list> and <term-head>.
WORD STANDARDIZATION — Abbreviations are standardized so that, for example, “keyword” is always used as “kwd” (as in the element <kwd-group>) and group is not abbreviated (as in the element <kwd-group>). The naming rules are described in the Authoring DTD and Suite Naming Rules section of this Tag Library.

PARAMETER ENTITY: SAME FUNCTION, SAME NAME — The Suite modules and initial DTDs have used a series of Parameter Entity naming conventions consistently. While parsing software cannot enforce these Parameter Entity naming or usage conventions, these conventions can make it much easier for a person to know how the content models work and what must be modified to make a DTD change.

CLASSES — Classes are functional groupings of elements used together in an OR group. Each class is named with a Parameter Entity, and all class Parameter Entity names end in the suffix “.class”:

 <!ENTITY % list.class "def-list | list">

A class, by definition, should never be made “empty”; the class should be removed from all models where you do not want the class elements included.

MIXES — Mixes are functional OR groups of classes; mixes should never contain element names directly. All mixes must be declared after all classes, since mixes are composed of classes. Mix names have no set suffix; for example, they may end in “-mix” or “-elements”. Content models and content model over-rides use mixes and classes for all OR groups. Only content model sequences are made up of element names directly.
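
A minimal sketch of the class and mix layering follows. Only `list.class` appears elsewhere in this section; `para.class`, `block-mix`, and the element `example-block` are hypothetical names used for illustration.

```dtd
<!-- Classes: OR groups of element names. -->
<!ENTITY % list.class "def-list | list">
<!ENTITY % para.class "p">            <!-- hypothetical class -->

<!-- Mix: an OR group of classes only, declared after the classes
     it is composed of. No element names appear here directly. -->
<!ENTITY % block-mix "%list.class; | %para.class;">

<!-- A content model uses the mix for its OR group.
     Hypothetical element; expands to (def-list | list | p)+. -->
<!ELEMENT example-block (%block-mix;)+>
```

Changing the membership of a class then changes every mix and content model built on it, without touching those declarations.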

MODEL OVER-RIDES — Parameter Entity mixes for over-riding a content model are of two styles: 1) inline mixes and 2) full content model replacements. These two groupings have been defined and named separately to preserve the mixed-content or element-content nature of the models in DTDs derived from the Suite.

The inline Parameter Entities to be intermingled with character data (#PCDATA) in a mixed content model are named with a suffix “-elements”. For example, “%institution-elements;” would be used in the content model for the element <institution>:

 <!ENTITY % institution-elements "| %break.class; | %emphasis.class; | %subsup.class;" >
 <!ELEMENT  institution (#PCDATA %institution-elements;)* >

All inline mixes begin with an OR bar, so that the mix can be removed leaving just character data (#PCDATA):

 <!ENTITY % rendition-plus "| %emphasis.class;  | %subsup.class;" >
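
To illustrate why the leading OR bar matters, the sketch below shows an over-ride that removes the inline mix from `<institution>` altogether. The empty over-ride is an illustrative assumption, not a declaration taken from the Authoring DTD.

```dtd
<!-- Over-ride, read before the default declaration: empty the mix. -->
<!ENTITY % institution-elements "">

<!-- Unchanged element declaration from the Suite. With the empty mix
     it expands to (#PCDATA)*, which is still a valid mixed-content
     model, so <institution> may contain character data only. -->
<!ELEMENT institution (#PCDATA %institution-elements;)*>
```

If the OR bar were written in the element declaration instead of in the entity, emptying the entity would leave a dangling `|` and the content model would no longer parse.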

The over-ride of a complete content model will be named with a suffix “-model” and should include the entire content model, including the enclosing parentheses:

 <!ENTITY % kwd-group-model "(title?, (%kwd.class;)+ )" >
 <!ELEMENT  kwd-group %kwd-group-model; >

DTD — This Tag Library describes the components for the Authoring DTD. This DTD consists of a base DTD module (delivered as the file articleauthoring.dtd) which calls in all the other modules as External Parameter Entities. Each module specific to this DTD (therefore, not part of the Suite) takes the prefix “articleauthcustom-”.

Each DTD and module has been assigned a unique formal public identifier (fpi). File names are never referenced directly in the comments in the DTD; the file is referred to by the name of the external Parameter Entity, which names the fpi and a system name for the file. The external Parameter Entity has been set to the initial delivery filename.

The individual modules of both the Suite and the DTD (as delivered) have been given DOS/Windows three-character suffixes indicating their type:

`*.dtd`	A module that can be used as the top level of an XML hierarchy. Used for the Authoring DTD top level, `articleauthoring.dtd`, but also taken unchanged for public DTD modules that have been included in this DTD such as the MathML DTD and the XHTML table model.
`*.ent`	A DTD fragment for incorporation into a full DTD. May contain element declarations, entity declarations, etc.
`*.mod`	A DTD fragment for incorporation into a full DTD. May contain element declarations, entity declarations, etc. This extension has the same meaning as `*.ent` and is only used to maintain the extension names dictated by the inclusion of PUBLIC DTD fragments, for example, `mathml2-qname-1.mod`.

While the DTD cannot dictate graphic file names, the comments do suggest that best practice for naming graphic files in documents tagged according to this DTD Suite would be to limit the names and path names to these characters: letters (both upper and lower case), numbers, underscore, hyphen, and period. All such names will be assumed to be case sensitive. DOS-style file extensions may be used.

Modeling several structures and functions that might appropriately be part of a DTD using this Suite has been delayed until a later version of the Suite. Such components include:

Questions and Answers (except as they can be modeled with the current DTD by using paragraphs and lists);
Proper systematic identification keys (except as they can be tagged using regular list structures);
Continuing Medical Education material;
Forms and fill-in-the-blank areas;
Conflict of Interest statements and Financial Disclosures (except as they can be modeled using paragraphs and footnotes);
Electronic and Digital Rights Management material;
Advertising included in a journal (for example, employment listings, classified advertising, and display advertising);
Calendars, meeting schedules, and announcements (except as these can be handled as ordinary articles or sections within articles); and
Material specific to an individual journal such as Author Guidelines, Policy and Scope statements, Editorial or advisory boards, detailed indicia, etc.

Modular DTD Design

DTDs from the Base Suite

Learning the DTD

Subsidiary section:

Structure of the Authoring DTD

How To Make New DTDs

Parameter Entities Modules to Customize and Change

How To Build a New Custom DTD

The Concept

Making a Variant DTD

DTD and Suite Naming Conventions

Element and Attribute Naming Rules

Parameter Entity Names for Classes and Mixes

File Naming Conventions

Phase II DTD Work

Version of 20050930

Digital Archive of Journal Articles
National Center for Biotechnology Information (NCBI)
National Library of Medicine (NLM)