[Updated versions of the Tag Suite have been released. Current version information is available here.]
Version 2.0 was released on December 30, 2004. Following is a list of updates.
The files for Version 2.0 are available:
The Tag Library for version 2.0 is here: dtd.nlm.nih.gov/tag-library/2.0/
There are extensive changes between Version 2.0 of the Journal Archiving and Interchange Tag Set (hereafter Archiving Tag Set) and earlier Tag Set versions 1.0 and 1.1. For this reason, all modules in both the Archiving Tag Set and the overarching Tag Suite advance to Version 2.0 dated 08/30/2004.
Although the changes are fully backwards compatible for XML documents (document instances), the new Archiving Tag Set and the full Suite may not be backwards compatible for all previous customizations.
The major changes to the content models and attribute lists in version 2.0 include:
Incorporation of AIT Working Group suggestions from the June 2004 meeting and June/July 2004 follow-up list discussions; and
General loosening of the Archiving Tag Set to further the goals of preservationist archives.
Major changes have also been made in the Parameter Entities and modularization. Archiving Version 2.0 permits more modifications than previous versions and changes the way in which Tag Set customizations are made. The changes to the modularization include:
Division into three base Sets instead of two: a preservation-oriented Archiving Tag Set, a regularizing Publishing Tag Set, and a smaller strict Authoring Tag Set; and
Remodularization of the Tag Suite, and thus of this Journal Archiving and Interchange Tag Set, to meet new (and newly articulated) customization requirements and to make it more obvious how to make a new Set from the Suite. As part of this modularization, all the default classes have been moved out of the individual element-definition modules, new Suite default class and default mix modules have been set up, and single-module-customization is no longer considered best practice.
The Archiving and Interchange Tag Set Working Group (AIT WG) reviewed changes requested by tag set users over the past year and recommended the following changes. Changes are listed in alphabetical order by element name.
Allowed all date components (day | month | season | year) inside
Added the following elements:
Added to the following elements: <array>, <boxed-text>, <chem-struct-wrapper>, <disp-quote>, <fig>, <graphic>, <preformat>, <table-wrap>, <table-wrap-foot>, <verse-group>
Added new attributes
New element, added to %date-parts.class; with year, day, etc.
<term> optional and repeatable inside <def-item>
New element Object Identifier <object-id> can be used to capture any publisher’s or archive’s ID number. The Object Identifier was modeled as an element rather than as an attribute to allow for multiple IDs.
Used in the following elements: <citation>, <product>, <abstract>, <boxed-text>, <chem-struct-wrapper>, <fig>, <supplementary-material>, <graphic>, <media>, <preformat>, and <table-wrap>.
New sequence attribute “seq” for recording publication sequence
<label> and <caption> both allowed to repeat, to allow for alternate placements.
Added optional <label> to the beginning of the model
New element to hold an identifier, such as a DOI, that is associated with an issue of a journal (as opposed to the existing element <issue>, which will henceforth be named and defined as the Issue Number). Added to <article-meta>, <citation>, <product>, and <related-article>
Added new element to hold a special issue or theme title. Added to <article-meta>, <citation>, <product>, and <related-article>.
Added LaTeX as a notation for display and inline formulas. Added “LaTeX” as a potential value for the “notation” attribute, inside %tex-math-atts;
Added new attribute “initials” to the personal name components <surname> and <given-names>.
Added reference elements to content
Added new values to %pub-id-types;, which is used on <article> and other elements. New values: “pmcid” and “art-access-id”. These values are not used in the Archiving DTD, which sets the “pub-id-type” attribute to CDATA.
Added <issue-id>, <issue-title>, <page-range>, <role>, <string-name>, and <volume-id>
Added reference elements to content. Also added the following attributes:
Created new element and allowed it to be used anywhere <name> is used
Added new element to hold an identifier, such as a DOI, that is associated with a volume of a journal (as opposed to the existing element <volume>, which will henceforth be named and defined as the Volume Number). Added new element to <article-meta>, <citation>, <product>, and <related-article>.
Generated Text and Punctuation <x>: Added a container element to hold punctuation or other generated text, typically when 1) an archive decides not to have any text generated and thus to pre-generate such things as commas or semicolons between keywords or 2) when an archive receives text with <x> tags embedded and wishes to retain them. The <x> element is allowed in: <aff>, <article-meta>, <citation>, <collab>, <contrib>, <contributor-group>, <corresp>, <def-list>, <def-item>, <kwd-group>, everywhere the %references.class; was used (though NOT by adding it to the class itself), <person-group>, <product>, <publisher-loc>, <related-article>, <string-date>, and <string-name>.
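As a rough illustration of two of the additions listed above, the fragment below shows <x> carrying pre-generated punctuation between keywords, and two <object-id> elements on a single figure. The keyword text, the identifier values, and the placement of <object-id> at the start of <fig> are assumptions made for this sketch; the Tag Library gives the actual content models.

```xml
<!-- Sketch only: all text values are invented for illustration. -->
<kwd-group>
  <!-- The archive stores the separator instead of generating it. -->
  <kwd>archiving</kwd><x>; </x><kwd>interchange</kwd>
</kwd-group>

<fig>
  <!-- Two identifiers on one object, which a single attribute
       could not hold. Placement inside <fig> is assumed. -->
  <object-id>publisher-fig-001</object-id>
  <object-id>archive-000123</object-id>
  <label>Figure 1</label>
  <caption><p>Sample caption.</p></caption>
</fig>
```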
The intent of the Journal Archiving Tag Set is to be maximally enabling, allowing each archive to describe as much as possible of what a diverse group of publishers can produce. There is almost no enforcement in this Set, beyond an attempt to keep block-level elements at the block level and mixed content elements at the data character level (to facilitate interchange). Most of the user requests boiled down to asking for a loosening of the Archiving Tag Set models. As part of the Version 2.0 split into three Tag Sets (and in recognition of the fact that a large percentage of the requested Suite changes over the last year have asked for expanding where a particular inline/phrase-level element might be used), Archiving was loosened as follows:
Distinctions concerning where phrase-level elements may be used were removed. Now, if one phrase-level element is possible in a mixed content model, all phrase-level elements are possible.
All attribute values where a list could be used were changed to CDATA, to allow for any potential value. (Mime types and subtypes were not so changed, but this change is requested for the next release.)
Nearly all models containing data characters have been parameterized to have the potential to be mixed content. Only IDs remain as #PCDATA.
All redefinitions of content models replace the full model, so they can be redefined as any legal content model including ANY and EMPTY.
To remove the distinctions between where phrase-level elements may be used, a new Parameter Entity %all-phrase; was created, which contains almost all inline elements (not <font>, <hr>, <break>, etc.). The named inline elements are used:
Inside the general mixes: %emphasized-text;, %inside-cell;, %just-rendition;, %rendition-plus;, %simple-phrase;, and %simple-text;
Inside all Parameter Entities named “nnn-elements” that can be mixed with #PCDATA in the content model of the nnn element; and
In the following content model over-rides: %chem-struct-model;, %copyright-statement-elements; (formerly named %copyright-statement-model;), %history-model;.
Caution: There is one mild oddity, for which it did not seem worthwhile to break the system: Phrase-level elements include the address elements, which include <addr-line>, so an address line can be used in far more contexts than seems reasonable.
Attribute Declared Value Lists — To better preserve all the values a publisher might have used, all explicit value lists were changed to CDATA. Attributes changed: article-type, date-type, pub-id-type, seq, ref-type
Removed the PE %article-types; (which is not used in this Set, though it is used in Publishing Tag Set [Blue])
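The difference between a closed value list and a CDATA declaration can be sketched as follows. The two declarations are alternatives, not meant to appear together; the values in the closed list and the #IMPLIED default are assumptions chosen only for illustration.

```xml
<!-- Closed list, as a regularizing set might declare it.
     The two values are illustrative, not the real list. -->
<!ATTLIST article
          article-type  (research-article | letter)  #IMPLIED >

<!-- Open declaration, as described for Archiving 2.0:
     any string the publisher used is valid. -->
<!ATTLIST article
          article-type  CDATA                        #IMPLIED >
```

With the second form, a value such as article-type="research paper" validates as written, so the conversion does not have to map it onto a fixed vocabulary.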
Changed nearly all models containing only data characters (i.e., #PCDATA models) to be over-ridden by Parameter Entities suffixed “-elements”, so that there is the potential for mixed content. The only purely #PCDATA elements left are:
Various identifiers (such as <article-id>, <pub-id>, and <isbn>);
Date elements (such as <day>, <month>, <copyright-year>, and <year>);
Page number information (such as <fpage>, <lpage>, and <page-range>); and
The text alternative element <alt-text>.
Made the following new Parameter Entities to allow the models to be over-ridden: %bio-model;, %contrib-model;, %def-model;, %disp-formula-model;, %fn-group-model;, %fn-model;, %history-model;, %inline-formula-model;, %kwd-group-model;, %person-group-model;, %preformat-model;, and %verse-group-model;.
Complete Models. All model over-rides include the enclosing parentheses, so that a model can be replaced by EMPTY or ANY. Thus, parentheses were added to the Parameter Entity and removed from the Element Declaration for the following: %abstract-model;, %ack-model;, %app-model;, %app-group-model;, %array-model;, %article-full-model;, %article-meta-model;, %article-short-model;, %author-notes-model;, %author-comment-model;, %back-model;, %body-model;, %boxed-text-model;, %chem-struct-wrapper-model;, %contrib-group-model;, %contrib-model;, %counts-model;, %date-model;, %def-item-model;, %def-list-model;, %disp-formula-model;, %disp-quote-model;, %fig-group-model;, %fig-model;, %fn-group-model;, %fn-model;, %front-model;, %gloss-group-model;, %glossary-model;, %inline-formula-model;, %journal-meta-model;, %kwd-group-model;, %list-model;, %list-item-model;, %note-model;, %preformat-model;, %ref-list-model;, %ref-model;, %sec-model;, %sec-opt-title-model;, %statement-model;, %table-wrap-model;, %table-wrap-group-model;, %table-wrap-foot-model;, %title-group-model;, and %trans-abstract-model;.
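A minimal sketch of why the parentheses moved, assuming a customization module that is read before the Suite modules (in XML, the first declaration of an entity is the binding one). The default shown for %kwd-group-model; is the one quoted in this document; the over-rides are illustrative alternatives, not recommended settings.

```xml
<!-- Customization module (read first). Use ONE over-ride. -->
<!ENTITY % kwd-group-model "ANY" >
<!-- Alternative: <!ENTITY % kwd-group-model "EMPTY" > -->

<!-- Suite module (read later). This default is ignored when an
     over-ride has already been declared. -->
<!ENTITY % kwd-group-model "(title?, (%kwd.class; | %x.class;)+ )" >

<!-- The Element Declaration adds no parentheses of its own, so any
     legal content model, including ANY or EMPTY, can be dropped in. -->
<!ELEMENT kwd-group %kwd-group-model; >
```

Had the parentheses stayed in the Element Declaration, the first over-ride would expand to (ANY), which a parser reads as a model requiring a child element named ANY rather than as the keyword.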
Added a new Parameter Entity %caption-body-parts; to the content model for <caption> to allow section-level or additional block elements but to keep the prohibition on a #PCDATA model.
from <ack>, <address>, <date>, <notes>, and <person-group>
Removed <email> and <uri>
The section below lists the changes to the base Suite which is used to build the Archiving, Publishing, and Authoring Tag Sets. Most of the old class modules (%list.ent;, %references.ent;, etc.) look much the way they used to, although the default classes and mixes have been moved out of the individual modules into the new class-specific and mix-specific modules. A few elements have moved from one module to another, particularly to the common module, as their usage increased. There are more Parameter Entities to make it possible to over-ride even more content models and attribute lists.
Global changes include:
Changed the version number of every module in both the base Suite and the Archiving Tag Set to reflect a Version of 2.0 and a date of 08/30/2004;
Changed the formal public identifier of each module to that version and date; and
Changed the #FIXED “dtd-version” attribute in all the Sets to “2.0”.
Two base Tag Sets were originally constructed from the Journal Archiving and Interchange Tag Suite: the Journal Archiving and Interchange Tag Set (nicknamed Green) and the Journal Publishing Tag Set (nicknamed Blue). The Publishing Blue Tag Set was to be prescriptive, to facilitate authoring. The Archiving Green Tag Set was to be moderately loose, to serve as a basis for interchange and repositories.
An understated, but very real, goal of the Archiving Tag Set was regularization of the archive itself. At the time of conversion from a publisher’s original DTD to the Journal Archiving Tag Set, a number of changes would be made to make all articles more alike. An alternate goal for the Set might have been (but was not) preservation of a publisher’s content as exactly as possible. (As a small illustration of the difference such a design makes in a tag set, consider the “article-type” attribute values for the <article> element. An archive-regularizing set would make this a closed list and would change the “research article” of one publisher, the “research paper” of another, and the “letter” of a journal such as Nature into a single value during conversion. In contrast, a preservationist approach might make this attribute a CDATA open list, to capture whatever the publisher had called the article.)
Many requests from the AIT Working Group were from preservationists asking that the Archiving Set loosen up to allow representing a wider range of publishers’ input. As the Archiving Set loosened, archives wanting to regularize the archive migrated to the Publishing Set. They found it a little too tight for their needs and requested loosening changes. At the same time, vendors wanting to establish an authoring environment felt that Publishing needed tightening, to make it easier for authors. The solution has been to create three base Tag Sets from the Suite:
Journal Archiving (Green) Tag Set: a preservationist archival Tag Set (the current Archiving Tag Set, made even more flexible and non-enforcing)
Journal Publishing (Blue) Tag Set: an archive regularization and interchange Tag Set (the current Publishing Tag Set, loosened as necessary)
Authoring (Pumpkin) Tag Set: a tight, small subset that concentrates on best practice as an aid to authoring
When the Archiving and Interchange Tag Suite was written, it was assumed that the major use of the Suite would be to make entirely new and distinct Tag Sets, so the modularization was done to make that convenient. All customization was concentrated in one module, and Parameter Entities were defined just before they were used. But very few original sets have been developed in the last year and a half. New sets are being created from the Suite not by development but by modification. Most organizations seem to want most of their models to be the same as the Suite, plus a few changes. People modifying the Tag Set have also requested more guidance in how to modify it properly. In light of actual Suite usage, we have observed the following Tag Suite customization requirements.
The published modules should act as a more complete model of how to build a modified tag set from the Suite. The modularization of the base Archiving and Publishing Sets should provide a sample of best practice for making and modularizing new sets.
Customization modules should be small and list only what has changed from the Suite defaults.
It should be relatively easy to compare new sets developed from the Suite.
Parameter Entities that are not over-ridden by a new set should not need to be declared by it (so that everything in the customization module is a change from the base).
It is still useful to developers to have all the element classes listed in one place, grouped functionally (in functions that match the Suite modules) so that it is easy to see what in the Suite is designed to be changed easily and how to change it.
Best practice for how to make a new set from the Suite has changed, and the Suite has been remodularized for the new style. Two Suite modules have been created to hold the declarations that were formerly part of each Tag Set's customization. In place of the former single customization module, there are smaller function-specific modules:
Default definition of element classes (the module %default-classes.ent;); and
Default definition of element mixes (the module %default-mixes.ent;).
All the default class Parameter Entities have been removed from the individual modules and placed into the new classing module.
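One way to picture the new arrangement is a driver file that reads a small set-specific over-ride module ahead of each Suite default module. This is only a sketch under stated assumptions: the "my-" module names and all system identifiers (file names) are hypothetical, and the real modules are referenced by formal public identifiers that are not reproduced here.

```xml
<!-- Hypothetical driver for a customized set. The "my-" names and
     all file names are assumptions made for this sketch. -->

<!-- 1. Set-specific class over-rides: only what differs from the Suite -->
<!ENTITY % my-classes.ent SYSTEM "my-classes.ent" >
%my-classes.ent;

<!-- 2. Suite default classes; a class already declared above wins,
        because the first declaration of an entity is binding -->
<!ENTITY % default-classes.ent SYSTEM "default-classes.ent" >
%default-classes.ent;

<!-- 3. Mixes are built from classes, so they come after all classes:
        set-specific mix over-rides first, then the Suite defaults -->
<!ENTITY % my-mixes.ent SYSTEM "my-mixes.ent" >
%my-mixes.ent;
<!ENTITY % default-mixes.ent SYSTEM "default-mixes.ent" >
%default-mixes.ent;

<!-- 4. The element-definition modules follow and use the classes
        and mixes established above -->
```

Because the over-ride modules hold only declarations that differ from the defaults, comparing two sets derived from the Suite comes down to comparing these small files.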
These Tag Sets and Suite modules have used a series of design and naming conventions consistently. While parsing software cannot enforce these Parameter Entity usage or naming conventions, the conventions can make it much easier for a person to know how the content models work. Version 2.0 of the Journal Archiving Tag Set (and the entire Version 2.0 Archiving and Interchange Tag Suite) uses the following usage and naming conventions.
Classes: Classes are functional OR-groups of elements. All class names end in the suffix “.class”. For example: <!ENTITY % list.class "def-list | list"> Classes cannot be made empty; instead, remove the class from every model where its elements are not wanted.
Mixes: Mixes are OR-groups of classes. All mixes must be declared after all classes, since mixes are composed of classes. (Mixes should never contain element names directly.) Mix names have no set suffix. Some mixes are inline, to be intermingled with #PCDATA, and some mixes are groupings of block-level elements. All inline mixes begin with an OR bar. For example: <!ENTITY % rendition-plus "| %emphasis.class; | %subsup.class;" >
Content: Content models and content model over-rides use mixes and classes for all OR groups. Only sequences are made up of element names directly. Content model over-rides are of two types, defined separately to preserve the mixed-content or element-content nature of the models as an aid to interchange.
-models: The over-ride of a complete content model will be named with a suffix “-model”. The over-ride includes the entire content model, including the enclosing parentheses, for example: <!ENTITY % kwd-group-model "(title?, (%kwd.class; | %x.class;)+ )" >
-elements: A grouping of elements to be mixed with #PCDATA inside a content model will be named with a suffix “-elements”. For example, “access-date-elements” would be used in the model for the element <access-date>. All “-elements” over-rides begin with an OR bar, so that a model may exclude all elements and be reduced to #PCDATA. For example: <!ENTITY % access-date-elements "| %date-parts.class; | %x.class;" > could be replaced by <!ENTITY % access-date-elements "" >
Attribute lists: Attribute lists for a particular element are named with the name of the element followed by the suffix “-atts”, so, for example, the attributes for the abstract element would be named “abstract-atts”. Such lists are not reused as frequently as they might be in many DTDs, to provide maximum flexibility. Attribute lists for different elements were rarely tied together. The Parameter Entities contain at least one complete line of an attribute list, not including the ATTLIST Declaration.
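The conventions above describe how the Parameter Entities are named; the fragment below sketches how two of them might be consumed in the declarations themselves. It is a fragment, not a complete module: %date-parts.class; and %x.class; are assumed to be declared earlier, and the single attribute placed in %abstract-atts; is an assumption chosen for illustration.

```xml
<!-- "-elements" over-ride: begins with an OR bar ... -->
<!ENTITY % access-date-elements
           "| %date-parts.class; | %x.class;" >

<!-- ... so the Element Declaration supplies only #PCDATA. If the
     entity is over-ridden with "", this reduces to (#PCDATA)*. -->
<!ELEMENT access-date (#PCDATA %access-date-elements;)* >

<!-- "-atts" list: complete attribute lines, without the ATTLIST
     keyword. The attribute shown is assumed, not the real list. -->
<!ENTITY % abstract-atts
           "abstract-type  CDATA  #IMPLIED" >

<!-- The ATTLIST Declaration itself stays in the element module. -->
<!ATTLIST abstract %abstract-atts; >
```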
The ideal situation in a tag set is that mix OR-groups and OR-groups within content models do not name elements; they name classes. This makes customization easier and makes maintenance over time significantly easier. A few new classes were created to facilitate this:
%def.class; (used in both <def-item> and <abbrev>); and
%just-para.class; (used in, for example, <author-comment>, <bio>, <def>, <caption>, <statement>, <fig>).
Content models were rewritten to use the newly created classes. This rewriting did not lead to Tag Set changes, except in the following elements:
Modified %doc-back-matter-mix; (formerly named %doc-back-matter-elements;) to correct the historical error that had this Parameter Entity calling a mix (%sec-level;) and not a class (%sec.class;). Since there was nothing in %sec-level; but <sec>, this has no effect on the Archiving Tag Set as delivered, but it may change existing customizations.
%front-matter-model; rewritten to use new class Parameter Entity %front-back.class; and to use %list.class; rather than just <def-list>. This widens the model of <front-matter> by adding <list>.
%inside-para; was renamed %p-elements;
Deleted %para.class;. In the definition of the Paragraph <p> element, its place is taken by %p-elements;. In other mixes, such as %para-level;, %para.class; was replaced by the combination of %just-para.class; and %rest-of-para.class;. (No Tag Set Changes)
Inside the content model for Paragraph <p>, %rest-of-para.class; was renamed %p-elements;. The content model for <named-content> and the model for %p-elements; itself still use %rest-of-para.class;.
In %named-content-elements;, replaced the Parameter Entity %emphasized-text; with its constituent classes
In %copyright-statement-elements;, replaced the mix %rendition-plus; with its constituent classes
In %citation-elements;, replaced the mix %simple-text; with its constituent classes. This causes the apparent, but not real, deletion of %address-link.class;. (These links are also in %references.class;, so there was no Tag Set change.)
The link classes were reorganized to make future modification easier. Three classes were deleted; new link classes were added for a total of four, and everywhere the link classes were used was modified as follows:
All occurrences of %ext-links.class; were replaced with the new class %address-link.class;.
All occurrences of %link.class; were replaced with some combination of the new link classes named below.
These were not usually Tag Set changes, just parameterization changes.
The following link-related classes were deleted:
The new link classes are:
%address-link.class; (external links used in addresses)
%fn-link.class; (footnote alone)
%simple-link.class; (the internal links, just as it used to be)
%article-link.class; (links for journal articles)
Specific changes this encompassed include:
Replaced the link PEs in %emphasized-text;, %inside-cell;, %p-elements;, %product-elements;, and %simple-phrase;
In %aff-elements;, replaced %link.class; with %address-link.class;, %simple-link.class;, %article-link.class; (directly in the Suite base module and via the use of %all-phrase; in the Archiving customization) (No Tag Set Change)
In <author-notes>, “(corresp | fn)+” was replaced with the %fn-link.class; and the new class %corresp.class;
In %collab-elements;, replaced %ext-links.class; with %address-link.class; and deleted %inpara-address; (directly in the Suite base module and via the use of %all-phrase; in the Archiving customization) (No Tag Set Change)
In %copyright-statement-elements;, replaced %inpara-address; with %address-link.class; (directly in the Suite base module and via the use of %all-phrase; in the Archiving customization)
<email> is considered to be just another type of external link, as <ext-link> is, so it was added to: %collab-elements; and %copyright-statement-elements;.
In %inside-para; (which had been modified and renamed %p-elements;), deleted the PE %inpara-address;. (No Tag Set change, since %address-link.class; covers it.)
In %named-content-elements;, replaced %link.class; with %address-link.class;, %article-link.class; , and %simple-link.class; (directly in the Suite base module and via the use of %all-phrase; in the Archiving customization) (No Tag Set Change)
In <related-article>, deleted %ext-links.class; because %references.class; has the needed links. (No Tag Set Change)
In order to make customization and maintenance easier, the names of several existing Parameter Entities were changed to bring them in line with the naming practices. This entailed changing the PE declaration and every mix or content model that used the PE.
%address-elements; ==> %address.class;
%author-notes-elements; ==> %author-notes-model;
%block-math; ==> %block-math.class;
%citation-model; ==> %citation-elements;
%contrib-info; ==> %contrib-info.class;
%copyright-statement-model; ==> %copyright-statement-elements;
%def-item-elements; ==> %def-item-model;
%display-back-matter; ==> %display-back-matter.class;
%doc-back-matter-elements; ==> %doc-back-matter-mix;
%inline-math; ==> %inline-math.class;
%list-item-elements; ==> %list-item-model;
%related-article-model; ==> %related-article-elements;
%sec-back-matter-elements; ==> %sec-back-matter-mix;
INLINE MIX OR-BAR — All inline mixes begin with an OR bar. While not strictly conformant, all modern XML parsers tested allow this variant. This technique allows the PE to be set to the null string, cancelling out any element inclusions and leaving a model of #PCDATA. This could also have been accomplished by over-riding the entire content model with a PE. The disadvantage of that method is that it makes it very easy to change mixed-content models to block-level element-content models. Since that is a major barrier to interchange, keeping the level the same is the one area where this DTD attempts to enforce consistency.
Changed the following inline-mix Parameter Entities to use the OR-bar-first mechanism. This requires changing not only the Parameter Entity to add the OR-bar, but changing all content models that use the entity to remove the OR bar: %all-phrase;, %emphasized-text;, %inside-para; (now renamed %p-elements; and used only inside the Paragraph element <p>), %just-rendition;, %preformat-elements;, %related-article-elements;, %rendition-plus;, %simple-phrase;, %simple-text;, and all the Parameter Entities with the suffix “-elements”, if they did not already start that way.
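The effect of moving the OR bar can be sketched as a before-and-after pair. The element names example-a and example-b are hypothetical, the "earlier style" is inferred from the description above rather than copied from a Version 1.x module, and the three groups are alternatives rather than one continuous module.

```xml
<!-- Earlier style (inferred): the content model supplies the OR bar,
     so setting the PE to "" would leave "(#PCDATA | )*", which is
     not a legal model. -->
<!ENTITY % rendition-plus "%emphasis.class; | %subsup.class;" >
<!ELEMENT example-a (#PCDATA | %rendition-plus;)* >

<!-- Version 2.0 style: the OR bar travels with the PE, and the
     content model no longer contains one. -->
<!ENTITY % rendition-plus "| %emphasis.class; | %subsup.class;" >
<!ELEMENT example-b (#PCDATA %rendition-plus;)* >

<!-- A customization read earlier can now null the mix, leaving
     example-b with a model of (#PCDATA)*. -->
<!ENTITY % rendition-plus "" >
```

The model stays mixed content in every case, which is the property the OR-bar convention is meant to protect.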
To make the DTDs more flexible and allow additional over-riding, the following new PEs were added:
%display-formula-elements; and %display-formula-model;.
New attribute Parameter Entities were added as well:
%article-id-atts; for <article-id>
%date-atts; for <date>
%object-id-atts; and “object-id-type”
%pub-date-atts; for <pub-date>
%pub-id-atts; for <pub-id>
%sub-article-atts; for <sub-article>
%volume-id-atts; for <volume-id>
A Frequently Asked Questions page is available.
Last updated: September 17, 2012