Version 2.0

[Updated versions of the Tag Suite have been released. Current version information is available here.]

Version 2.0 was released on December 30, 2004. Following is a list of updates.

The files for Version 2.0 are available:

Documentation

The Tag Library for version 2.0 is here: dtd.nlm.nih.gov/tag-library/2.0/

Change Report

There are extensive changes between Version 2.0 of the Journal Archiving and Interchange Tag Set (hereafter Archiving Tag Set) and earlier Tag Set versions 1.0 and 1.1. For this reason, all modules in both the Archiving Tag Set and the overarching Tag Suite advance to Version 2.0 dated 08/30/2004.

Although the changes are fully backwards compatible for XML documents (document instances), the new Archiving Tag Set and the full Suite may not be backwards compatible for all previous customizations.

The major changes to the content models and attribute lists in version 2.0 include:

Major changes have also been made in the Parameter Entities and modularization. Archiving Version 2.0 permits more modifications than previous versions and changes the way in which Tag Set customizations are made. The changes to the modularization include:

Changes to Content Models
User Requested Changes

The Archiving and Interchange Tag Set Working Group (AIT WG) reviewed changes requested by tag set users over the past year and recommended the following changes. Changes are listed in alphabetical order by element name.

<access.date>

Allowed all date components (day | month | season | year) inside this element.

<article-meta-model>

Added the following elements:

  • <custom-meta-wrap> (tag set-specific custom metadata)

  • <email>

  • <issue-id> (existing <issue> means issue number)

  • <issue-title>

  • <license>

  • <page-range> (for discontinuous page ranges; supplements but does not replace first page and last page)

  • <volume-id> (existing <volume> means volume number)

<attrib>

Added to the following elements: <array>, <boxed-text>, <chem-struct-wrapper>, <disp-quote>, <fig>, <graphic>, <preformat>, <table-wrap>, <table-wrap-foot>, <verse-group>

<contract-name>/ <contract-sponsor>

Added new attributes

  • id (ID)

  • rid (IDREFS)

  • %might-link-atts;

<string-date>

New element, added to %date-parts.class; with year, day, etc.

<def-list>

<term> optional and repeatable inside <def-item>

<disp-formula>

Added %inline-math;

DOI/Object Identifier

New element Object Identifier <object-id> can be used to capture any publisher’s or archive’s ID number. The Object Identifier was modeled as an element rather than as an attribute to allow for multiple IDs.

Used in the following elements: <citation>, <product>, <abstract>, <boxed-text>, <chem-struct-wrapper>, <fig>, <supplementary-material>, <graphic>, <media>, <preformat>, and <table-wrap>.
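
The reason for modeling the identifier as an element can be seen in a small sketch. An XML attribute name may occur only once per element, whereas child elements may repeat. The sample below is made up for illustration: the identifier values are placeholders, and the position of <object-id> inside <fig> is an assumption rather than something taken from the Tag Library.

```python
import xml.etree.ElementTree as ET

# Made-up sample: the identifier values are placeholders, not real IDs.
SAMPLE = """
<fig id="f1">
  <object-id>publisher-fig-0001</object-id>
  <object-id>archive-000123-f1</object-id>
  <label>Figure 1</label>
</fig>
"""

fig = ET.fromstring(SAMPLE)

# Child elements may repeat, so both the publisher's ID and the
# archive's ID are kept side by side.
object_ids = [oid.text for oid in fig.findall("object-id")]
print(object_ids)
```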

<elocation-id>

New sequence attribute “seq” for recording publication sequence

<fig>

<label> and <caption> are both allowed to repeat, to allow for alternate placements.

<fn>

Added optional <label> to the beginning of the model

<inline-formula>

  • Added everywhere %inline-display; was already allowed

  • Added <label>, <named-content> to its model

<issue-id>

New element to hold an identifier, such as a DOI, that is associated with an issue of a journal (as opposed to the existing element <issue>, which will henceforth be named and defined as the Issue Number). Added to <article-meta>, <citation>, <product>, and <related-article>.

<issue-title>

Added new element to hold a special issue or theme title. Added to <article-meta>, <citation>, <product>, and <related-article>.

<journal-meta>

Added <custom-meta-wrap>

LaTeX

Added LaTeX as a notation for display and inline formulas. Added “LaTeX” as a potential value for the “notation” attribute, inside %tex-math-atts;

<name>

Added new attribute “initials” to the personal name components <surname> and <given-names>.

<product>

Added reference elements to content

“pub-id-type” attribute

Added new values to %pub-id-types;, which is used on <article> and other elements. New values: “pmcid” and “art-access-id”. These values are not used in the Archiving DTD, which sets the “pub-id-type” attribute to CDATA.

references.class

Added <issue-id>, <issue-title>, <page-range>, <role>, <string-name>, and <volume-id>

<related-article>

Added reference elements to content. Also added the following attributes:

  • “id ID #IMPLIED” attribute, so the related-article can be referenced;

  • “ext-link-type”, which indicates the type of link used to point to the related article. The attribute is used with exactly the same content (CDATA) and suggested values as when used with the element <ext-link>;

  • “issue”, which provides metadata concerning the related article (together with the attributes “vol”, “page”, and “journal-id”);

  • “journal-id”, which provides metadata concerning the related article (together with the attributes “vol”, “page”, and “issue”);

  • “journal-id-type”, which performs the same function that this attribute performs for the element <journal-id>. The “journal-id” values are the same as those for existing journal identifiers plus “issn”;

  • “alternate-form-of”, which works (similarly to the same attribute when used on a <graphic> element) to point to another <related-article> element within the same document that provides an alternative form of the related article.
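
As a rough illustration of how these attributes might be read together, the sketch below parses two made-up <related-article> elements and follows the “alternate-form-of” pointer from one to the other. All attribute values are placeholders. The sample assumes “alternate-form-of” holds the “id” of the other element, and it leaves out any other attributes the element may carry.

```python
import xml.etree.ElementTree as ET

# Made-up sample; every attribute value here is a placeholder.
SAMPLE = """
<article-meta>
  <related-article id="ra1" ext-link-type="doi" journal-id="0000-0000"
                   journal-id-type="issn" vol="12" issue="3" page="45"/>
  <related-article id="ra2" ext-link-type="uri" alternate-form-of="ra1"/>
</article-meta>
"""

meta = ET.fromstring(SAMPLE)
related = {ra.get("id"): ra for ra in meta.iter("related-article")}

# Follow each "alternate-form-of" pointer to the element it names
# (assumed here to be the "id" of another <related-article>).
alternates = {
    rid: ra.get("alternate-form-of")
    for rid, ra in related.items()
    if ra.get("alternate-form-of") in related
}

# The metadata attributes travel together on one element.
first = related["ra1"]
citation = (first.get("journal-id"), first.get("vol"),
            first.get("issue"), first.get("page"))
print(alternates, citation)
```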

<string-name>

Created new element and allowed it to be used anywhere <name> is used

<volume-id>

Added new element to hold an identifier, such as a DOI, that is associated with a volume of a journal (as opposed to the existing element <volume>, which will henceforth be named and defined as the Volume Number). Added to <article-meta>, <citation>, <product>, and <related-article>.

<x>

Generated Text and Punctuation: Added a container element to hold punctuation or other generated text, typically when 1) an archive decides not to have any text generated and thus to pre-generate such things as commas or semicolons between keywords or 2) an archive receives text with <x> tags embedded and wishes to retain them. The <x> element is allowed in: <aff>, <article-meta>, <citation>, <collab>, <contrib>, <contributor-group>, <corresp>, <def-list>, <def-item>, <kwd-group>, everywhere the %references.class; was used (though NOT by adding it to the class itself), <person-group>, <product>, <publisher-loc>, <related-article>, <string-date>, and <string-name>.
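
A minimal sketch of the first case, using a made-up keyword group in which the separators have been pre-generated inside <x>. The keyword values are invented for the example. Reading all text gives the string as it would be displayed, while reading only the <kwd> children gives the bare keywords.

```python
import xml.etree.ElementTree as ET

# Made-up sample: separators between keywords are pre-generated in <x>.
SAMPLE = (
    "<kwd-group>"
    "<kwd>archiving</kwd><x>, </x>"
    "<kwd>interchange</kwd><x>; </x>"
    "<kwd>XML</kwd>"
    "</kwd-group>"
)

group = ET.fromstring(SAMPLE)

# All text in document order, including the punctuation held in <x>.
as_tagged = "".join(group.itertext())

# Only the keywords, ignoring the generated punctuation.
keywords = [kwd.text for kwd in group.findall("kwd")]

print(as_tagged)
print(keywords)
```

Because the punctuation sits in its own element, a consumer that prefers to generate separators itself can drop the <x> content without touching the keywords.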

General Loosening of the Tag Set

The intent of the Journal Archiving Tag Set is to be maximally enabling, allowing each archive to describe as much as possible of what a diverse group of publishers can produce. There is almost no enforcement in this Set, beyond an attempt to keep block-level elements at the block level and mixed content elements at the data character level (to facilitate interchange). Most of the user requests boiled down to asking for a loosening of the Archiving Tag Set models. As part of the Version 2.0 split into three Tag Sets (and in recognition of the fact that a large percentage of the requested Suite changes over the last year have asked for expanding where a particular inline/phrase-level element might be used), Archiving was loosened as follows:

  1. Distinctions concerning where phrase-level elements may be used were removed. Now, if one phrase-level element is possible in a mixed content model, all phrase-level elements are possible.

  2. All attribute values where a list could be used were changed to CDATA, to allow for any potential value. (Mime types and subtypes were not so changed, but this change is requested for the next release.)

  3. Nearly all models containing data characters have been parameterized to have the potential to be mixed content. Only IDs remain as #PCDATA.

  4. All redefinitions of content models replace the full model, so they can be redefined as any legal content model including ANY and EMPTY.
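
The fourth point can be sketched with a toy model of Parameter Entity handling. In XML, when the same entity is declared more than once, the first declaration is binding, so a customization read before the Suite modules replaces the whole model. The code below is a simplified simulation, not a DTD parser. %contrib-model; is named in this report, but the default replacement text shown for it is a placeholder invented for the example.

```python
def build_entity_table(declarations):
    """Return {name: text}; the first declaration of each name is binding."""
    table = {}
    for name, text in declarations:
        table.setdefault(name, text)
    return table


def expand(declaration, table):
    """Substitute every %name; reference that appears in the table."""
    for name, text in table.items():
        declaration = declaration.replace(f"%{name};", text)
    return declaration


# Placeholder default; the real model in the Suite is more detailed.
SUITE_DEFAULT = [("contrib-model", "((name | string-name), degrees*)")]
# A customization may replace the full model, even with ANY or EMPTY.
CUSTOM_OVERRIDE = [("contrib-model", "ANY")]
ELEMENT_DECL = "<!ELEMENT contrib %contrib-model;>"

default_decl = expand(ELEMENT_DECL, build_entity_table(SUITE_DEFAULT))
# The customization is read first, so its declaration wins.
custom_decl = expand(
    ELEMENT_DECL, build_entity_table(CUSTOM_OVERRIDE + SUITE_DEFAULT)
)

print(default_decl)
print(custom_decl)
```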

Loosening Phrase-level Usage

To remove the distinctions between where phrase-level elements may be used, a new Parameter Entity %all-phrase; was created, which contains almost all inline elements (not <font>, <hr>, <break>, etc.). The named inline elements are used:

Caution: There is one mild oddity, for which it did not seem worthwhile to break the system: Phrase-level elements include the address elements, which include <addr-line>, so address line can be used in far more contexts than seems reasonable.

Attribute Loosening
Mixed Content (-elements)

Changed nearly all models containing only data characters (i.e., #PCDATA models) to be over-ridden by Parameter Entities suffixed “-elements” so that there is the potential for mixed content. The only purely #PCDATA elements left are

Complete Content Models (-model)
Other Content Model Redefinition

<caption>

Added a new Parameter Entity %caption-body-parts; to the content model for <caption> to allow section-level or additional block elements but to keep the prohibition on a #PCDATA model.

Removed Parenthesis

from <ack>, <address>, <date>, <notes>, and <person-group>

access.class

  • Removed <ext-link>.

  • Added %address-link.class; everywhere %access.class; was used.

address-elements.class

Removed <email> and <uri>

<contrib>

  • Made the content model into Parameter Entity %contrib-model;

  • Added <etal> to %contrib-info.class;, so a strict sequence after the contributor name is no longer enforced

  • Added <uri> and <fn> to %contrib-info.class;

  • Allowed <string-name> as an alternative to <name>

  • Allowed <degrees> to follow either <name> or <string-name>

Changes to the Base Suite

The section below lists the changes to the base Suite, which is used to build the Archiving, Publishing, and Authoring Tag Sets. Most of the old class modules (%list.ent;, %references.ent;, etc.) look much the way they used to, although the default classes and mixes have been moved out of the individual modules into the new class-specific and mix-specific modules. A few elements have moved from one module to another, particularly to the common module, as their usage increased. There are more Parameter Entities to make it possible to over-ride even more content models and attribute lists.

Global changes include:

Rationale and Philosophy for Changes
Regularizing Versus Strict Preservation

Two base Tag Sets were originally constructed from the Journal Archiving and Interchange Tag Suite: the Journal Archiving and Interchange Tag Set (nicknamed Green) and the Journal Publishing Tag Set (nicknamed Blue). The Publishing Blue Tag Set was to be prescriptive, to facilitate authoring. The Archiving Green Tag Set was to be moderately loose, to serve as a basis for interchange and repositories.

An understated, but very real, goal of the Archiving Tag Set was regularization of the archive itself. At the time of conversion from a publisher’s original DTD to the Journal Archiving Tag Set, a number of changes would be made to make all articles more alike. An alternate goal for the Set might have been (but was not) preservation of a publisher’s content as exactly as possible. (As a small illustration of the difference such a design makes in a tag set, consider the “article-type” attribute values for the <article> element. An archive-regularizing set would make this a closed list and would change the “research article” of one publisher, the “research paper” of another, and the “letter” of a journal such as Nature into a single value during conversion. In contrast, a preservationist approach might make this attribute a CDATA open list, to capture whatever the publisher had called the article.)
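
The “article-type” illustration can be put in code form. This is only a sketch of the two conversion policies: the single target value “research-article” is a hypothetical choice made for the example, not a value taken from any Tag Set.

```python
# Hypothetical closed list; "research-article" is an invented target value.
REGULARIZED_TYPES = {
    "research article": "research-article",
    "research paper": "research-article",
    "letter": "research-article",
}


def regularize(publisher_value):
    """Closed-list policy: map known publisher terms onto one archive value."""
    key = publisher_value.strip().lower()
    if key not in REGULARIZED_TYPES:
        raise ValueError(f"no archive value defined for {publisher_value!r}")
    return REGULARIZED_TYPES[key]


def preserve(publisher_value):
    """Open (CDATA) policy: keep whatever the publisher called the article."""
    return publisher_value


print(regularize("Research Paper"))
print(preserve("Research Paper"))
```

The closed-list function has to reject, or be extended for, every term it has not seen, which is the cost of regularization. The open policy never fails but leaves the archive with as many values as there are publishers.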

Three Tag Sets From Two

Many requests from the AIT Working Group were from preservationists asking that the Archiving Set loosen up to allow representing a wider range of publishers’ input. As the Archiving Set loosened, archives wanting to regularize the archive migrated to the Publishing Set. They found it a little too tight for their needs and requested loosening changes. At the same time, vendors wanting to establish an authoring environment felt that Publishing needed tightening, to make it easier for authors. The solution has been to create three base Tag Sets from the Suite:

Journal Archiving (Green) Tag Set

a preservationist archival Tag Set (the current Archiving Tag Set, made even more flexible and non-enforcing)

Journal Publishing (Blue) Tag Set

an archive regularization and interchange Tag Set (the current Publishing Tag Set loosened as necessary)

Authoring (Pumpkin) Tag Set

a tight, small subset that concentrates on best practice as an aid to authoring

Remodularization
New Customization Requirements

When the Archiving and Interchange Tag Suite was written, it was assumed that the major use of the Suite would be to make entirely new and distinct Tag Sets, so the modularization was done to make that convenient. All customization was concentrated in one module, and Parameter Entities were defined just before they were used. But very few original sets have been developed in the last year and a half. New sets are being created from the Suite not by development but by modification. Most organizations seem to want most of their models to be the same as the Suite, plus a few changes. People modifying the Tag Set have also requested more guidance in how to modify it properly. In light of actual Suite usage, we have observed the following Tag Suite customization requirements.

New Class and Mix Modules

Best practice for how to make a new set from the Suite has changed, and the Suite has been remodularized for the new style. Two Suite modules have been created to hold the declarations that were formerly part of each Tag Set's customization. In place of the former single customization module, there are smaller function-specific modules:

All the default class Parameter Entities have been removed from the individual modules and placed into the new classing module.

Modeling Conventions and PE Naming

These Tag Sets and Suite modules have used a series of design and naming conventions consistently. While parsing software cannot enforce these Parameter Entity usage or naming conventions, these conventions can make it much easier for a person to know how the content models work. Version 2.0 of the Journal Archiving Tag Set (and the entire Version 2.0 Archiving and Interchange Tag Suite) uses the following usage and naming conventions.

New Classes

The ideal situation in a tag set is that mix OR-groups and OR-groups within content models do not name elements; they name classes. This makes customization easier and makes maintenance over time significantly easier. A few new classes were created to facilitate this:

Content models were rewritten to use the newly created classes. This rewriting did not lead to Tag Set changes, except in the following elements:

Link Classes

The link classes were reorganized to make future modification easier. Three classes were deleted; new link classes were added for a total of four, and everywhere the link classes were used was modified as follows:

These were not usually Tag Set changes, just parameterization changes.

The following link-related classes were deleted:

The new link classes are:

Specific changes this encompassed include:

Rename Existing Parameter Entities

In order to make customization and maintenance easier, the names of several existing Parameter Entities were changed to bring them in line with the naming practices. This entailed changing the PE declaration and every mix or content model that used the PE.

Inline Mix OR-Bars

All inline mixes begin with an OR bar. While this is not strictly conformant, all modern XML parsers tested allow this variant. This technique allows the PE to be set to the null string, cancelling out any element inclusions and leaving a model of #PCDATA. This could also have been accomplished by over-riding the entire content model with a PE. The disadvantage of that method is that it makes it very easy to change mixed-content models to block-level element-content models. Since that is a major barrier to interchange, keeping the level the same is the one area where this DTD attempts to enforce consistency.

Changed the following inline-mix Parameter Entities to use the OR-bar-first mechanism. This requires changing not only the Parameter Entity to add the OR-bar, but also all content models that use the entity to remove the OR bar: %all-phrase;, %emphasized-text;, %inside-para; (now renamed %p-elements; and used only inside the Paragraph element <p>), %just-rendition;, %preformat-elements;, %related-article-elements;, %rendition-plus;, %simple-phrase;, %simple-text;, and all the Parameter Entities with the suffix “-elements”, if they did not already start that way.
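
The effect of the OR-bar-first convention can be shown with a small simulation of Parameter Entity expansion. This is a simplified sketch, not a DTD parser. %simple-phrase; is one of the entities named above, but the elements placed inside it and the models that use it are invented for the example.

```python
import re

PE_REF = re.compile(r"%([A-Za-z][\w.-]*);")


def expand(model, entities):
    """Replace each %name; reference with its text and tidy the spacing."""
    previous = None
    while previous != model:
        previous = model
        model = PE_REF.sub(lambda m: entities[m.group(1)], model)
    model = " ".join(model.split())
    return model.replace("( ", "(").replace(" )", ")")


# OR-bar-first style: the bar lives inside the entity, not in the model.
NEW_MODEL = "(#PCDATA %simple-phrase;)*"
# Earlier style: the bar is written in every model that uses the entity.
OLD_MODEL = "(#PCDATA | %simple-phrase;)*"

# Illustrative contents only; not the Suite's actual mix.
with_elements = expand(NEW_MODEL, {"simple-phrase": "| bold | italic"})
# Setting the entity to the null string leaves a plain #PCDATA model.
nulled_new = expand(NEW_MODEL, {"simple-phrase": ""})
# The same override in the earlier style leaves a dangling OR bar.
nulled_old = expand(OLD_MODEL, {"simple-phrase": ""})

print(with_elements)
print(nulled_new)
print(nulled_old)
```

With the bar inside the entity, an empty override still yields a usable model. In the earlier style the same override strands the bar in the content model.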

Model Over-rides Permitted

To make the DTDs more flexible and allow additional over-riding, the following new PEs were added:

New attribute Parameter Entities were added as well:

Frequently Asked Questions

A Frequently Asked Questions page is available.




National Center for Biotechnology Information
U.S. National Library of Medicine
8600 Rockville Pike, Bethesda, MD 20894


Last updated: September 17, 2012