NLM Journal Archiving and Interchange Tag Suite


Version 2.0

[Updated versions of the Tag Suite have been released. Current version information is available here.]

There are extensive changes between Version 2.0 of the NLM Journal Archiving and Interchange Tag Suite (hereafter Tag Suite) and earlier Tag Suite versions 1.0 and 1.1. For this reason, all modules in the Tag Suite advance to Version 2.0 dated 08/30/2004.

Although the changes are fully backwards compatible for XML documents (document instances), the new Tag Suite may not be backwards compatible for all previous customizations.

The major changes to the content models and attribute lists in version 2.0 include:

Major changes have also been made in the Parameter Entities and modularization. The Tag Suite Version 2.0 permits more modifications than previous versions and changes the way in which customizations are made. The changes to the modularization include:

Changes to the Base Suite

The section below lists the changes to the base Suite, which is used to build the Archiving, Publishing, and Authoring Tag Sets. Most of the old class modules (%list.ent;, %references.ent;, etc.) look much the way they used to, although the default classes and mixes have been moved out of the individual modules into the new class-specific and mix-specific modules. A few elements have moved from one module to another, particularly to the common module, as their usage increased. More Parameter Entities have been added so that even more content models and attribute lists can be over-ridden.

Global changes include:

Rationale and Philosophy for Changes

Regularizing Versus Strict Preservation

Two base Tag Sets were originally constructed from the NLM Journal Archiving and Interchange Tag Suite: the Journal Archiving and Interchange Tag Set (nicknamed Green) and the Journal Publishing Tag Set (nicknamed Blue). The Publishing Tag Set was to be prescriptive, to facilitate authoring. The Archiving Tag Set was to be moderately loose, to serve as a basis for interchange and repositories.

An understated, but very real, goal of the Archiving Tag Set was regularization of the archive itself. At the time of conversion from a publisher’s original DTD to the Archiving Tag Set, a number of changes would be made to make all articles more alike. An alternate goal for the Tag Set might have been (but was not) preservation of a publisher’s content as exactly as possible. (As a small illustration of the difference such a design makes in a Tag Set, consider the “article-type” attribute values for the <article> element. An archive-regularizing Tag Set would make this a closed list and would change the “research article” of one publisher, the “research paper” of another, and the “letter” of a journal such as Nature into a single value during conversion. In contrast, a preservationist approach might make this attribute a CDATA open list, to capture whatever the publisher had called the article.)
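
The two designs described in the parenthetical above can be sketched as alternative attribute declarations. This is an illustrative fragment only: the enumerated value names are assumptions made up for the example, not the Suite's actual list, and the fragment does not show which form any Tag Set adopted.

```dtd
<!-- Alternative A, regularizing design: a closed list of values.
     The value names are hypothetical. During conversion, each
     publisher's own label would be mapped onto one of them. -->
<!ATTLIST article
  article-type  (research-article | letter | other)  #IMPLIED >

<!-- Alternative B, preservationist design: an open CDATA value
     that keeps whatever the publisher called the article.
     A real DTD would contain A or B, not both; if both were
     present, only the first declaration would take effect. -->
<!ATTLIST article
  article-type  CDATA  #IMPLIED >
```

With alternative A, a validating parser rejects any value outside the list, which forces regularization at conversion time. With alternative B, any string is accepted and consistency has to be checked by other means.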

Three DTDs From Two

Many requests from the AIT Working Group were from preservationists asking that the Archiving Tag Set loosen up to allow representing a wider range of publishers’ input. As the Archiving Tag Set loosened, archives wanting to regularize the archive migrated to the Publishing Tag Set. They found it a little too tight for their needs and requested loosening changes. At the same time, vendors wanting to establish an authoring environment felt that Blue needed tightening, to make it easier for authors. The solution has been to create three base Tag Sets from the Suite:

Journal Archiving (Green) Tag Set

a preservationist archival Tag Set (the current Archiving Tag Set, made even more flexible and non-enforcing)

Journal Publishing (Blue) Tag Set

an archive regularization and interchange Tag Set (the current Publishing Tag Set loosened as necessary)

Authoring Tag Set (Not available in 2.0)

a tight, small subset that concentrates on best practice as an aid to authoring

Remodularization

New Customization Requirements

When the Archiving and Interchange Tag Suite was written, it was assumed that the major use of the Suite would be to make entirely new and distinct Tag Sets, so the modularization was done to make that convenient. All customization was concentrated in one module, and Parameter Entities were defined just before they were used. But very few original Tag Sets have been developed in the last year and a half. New Tag Sets are being created from the Suite not by development but by modification. Most organizations seem to want most of their models to be the same as the Suite, plus a few changes. People modifying the Tag Suite have also requested more guidance in how to modify it properly. In light of actual Suite usage, we have observed the following Tag Suite customization requirements.

New Class and Mix Modules

Best practice for how to make a new Tag Set from the Suite has changed, and the Suite has been remodularized for the new style. Two Suite modules have been created to hold the declarations that were formerly part of each Tag Set’s customization. In place of the former single customization module, there are smaller function-specific modules:

Modeling Conventions and PE Naming

These Tag Set and Suite modules have used a series of design and naming conventions consistently. While parsing software cannot enforce these Parameter Entity usage or naming conventions, the conventions can make it much easier for a person to understand how the content models work. Version 2.0 of the Archiving Tag Set (and the entire Version 2.0 Archiving and Interchange Tag Suite) uses the following usage and naming conventions.

New Classes

The ideal situation in a Tag Set is that mix OR-groups and OR-groups within content models do not name elements; they name classes. This makes Tag Set-customization easier and makes maintenance over time significantly easier. A few new classes were created to facilitate this:

Content models were rewritten to use the newly created classes. This rewriting did not lead to Tag Set changes, except in the following elements:
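
To illustrate the convention that OR-groups name classes rather than elements, here is a minimal sketch. All Parameter Entity names and the element "example-block" are hypothetical, chosen only for the example; they are not the classes that were added in Version 2.0. The fragment is meant for an external DTD module, since Parameter Entity references inside declarations are not allowed in an internal subset.

```dtd
<!-- Two classes: each is a named OR-group of elements that play a
     similar structural role. Names and members are hypothetical. -->
<!ENTITY % example-list.class     "def-list | list" >
<!ENTITY % example-display.class  "fig | table-wrap" >

<!-- A mix: an OR-group built only from classes, never from
     individual element names. -->
<!ENTITY % example-block.mix
  "%example-list.class; | %example-display.class;" >

<!-- A content model that names only the mix. Adding a new kind of
     list means editing example-list.class in one place; this model
     and every other model that uses the mix pick up the change. -->
<!ELEMENT example-block  (%example-block.mix;)+ >
```

The design choice is indirection: content models depend on mixes, mixes depend on classes, and only classes name elements, so a customization usually needs to redefine a single class.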

Link Classes

The link classes were reorganized to make future modification easier. Three classes were deleted, new link classes were added for a total of four, and every place where the link classes were used was modified as follows:

The following link-related classes were deleted:

The new link classes are:

Specific changes this encompassed include:

Rename Existing Parameter Entities

To make customization and maintenance easier, the names of several existing Parameter Entities were changed to bring them in line with the naming practices. This entailed changing the PE declaration and every mix or content model that used the PE.

Inline Mix OR-Bars

All inline mixes begin with an OR bar. While not strictly conformant, all modern XML parsers tested allow this variant. This technique allows the PE to be set to the null string, cancelling out any element inclusions and leaving a model of #PCDATA. The same result could have been achieved by over-riding the entire content model with a PE, but that method makes it very easy to change mixed-content models into block-level element-content models. Since such a change is a major barrier to interchange, keeping the level the same is the one area where this Tag Suite attempts to enforce consistency.

The following inline-mix Parameter Entities were changed to use the OR-bar-first mechanism. This required changing not only the Parameter Entity, to add the OR bar, but also every content model that uses the entity, to remove the OR bar: %all-phrase;, %emphasized-text;, %inside-para; (now renamed %p-elements; and used only inside the Paragraph element <p>), %just-rendition;, %preformat-elements;, %related-article-elements;, %rendition-plus;, %simple-phrase;, %simple-text;, and all the Parameter Entities with the suffix “-elements”, if they did not already start that way.
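
The OR-bar-first mechanism can be sketched as follows. The members given for %emphasized-text; and its use in the <label> model are assumptions for illustration, not the Suite's actual declarations. The fragment relies on the XML rule that when an entity is declared more than once, the first declaration is the binding one, which is why the over-ride must be read before the default.

```dtd
<!-- 1. Customization module, read first. Declaring the PE as the
        empty string removes every inline element from the mix. -->
<!ENTITY % emphasized-text  "" >

<!-- 2. Suite-style default, read later. It begins with an OR bar.
        When the over-ride above is present this declaration is
        ignored. The members listed are illustrative. -->
<!ENTITY % emphasized-text  "| bold | italic | sub | sup" >

<!-- 3. The content model supplies #PCDATA and no OR bar of its own.
        With the default it expands to
          (#PCDATA | bold | italic | sub | sup)*
        and with the empty over-ride it reduces to
          (#PCDATA)*
        Either way the model remains mixed content. -->
<!ELEMENT label  (#PCDATA %emphasized-text;)* >
```

Because the customizer can change only what follows #PCDATA, the element cannot be turned into an element-content model by accident, which is the consistency the paragraph above describes.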

Model Over-rides Permitted

To make the Tag Sets more flexible and allow additional over-riding, the following new PEs were added:

New attribute Parameter Entities were added as well:
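
As a sketch of how a content-model over-ride and an attribute-list over-ride work together, consider the fragment below. The element "example-note" and both Parameter Entity names are hypothetical and are not among the PEs added in Version 2.0. The elements <title> and <p> are assumed to be declared elsewhere.

```dtd
<!-- Customization module, read first: tighten the model and
     extend the attribute list. All names are hypothetical. -->
<!ENTITY % example-note-model  "(title, p+)" >
<!ENTITY % example-note-atts
  "id    ID     #REQUIRED
   kind  CDATA  #IMPLIED" >

<!-- Suite-style defaults, read later. Each is ignored wherever an
     earlier declaration of the same PE exists, because the first
     declaration of an entity is the binding one. -->
<!ENTITY % example-note-model  "(title?, p*)" >
<!ENTITY % example-note-atts
  "id    ID     #IMPLIED" >

<!-- The element and attribute-list declarations never change;
     only the Parameter Entities they reference do. -->
<!ELEMENT example-note  %example-note-model; >
<!ATTLIST example-note  %example-note-atts; >
```

A customization that leaves either PE undeclared simply inherits the default for it, so each over-ride can be made independently.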

Tag Sets

These Tag Sets are available in version 2.0:






Last updated: September 14, 2012