Tag Library Introduction

Journal Publishing Tag Set Tag Library version 3.0

The intent of the NLM Journal Archiving and Interchange Tag Suite is to provide a common format in which publishers and archives can exchange journal content. The Suite provides a set of XML schema modules that define elements and attributes for describing the textual and graphical content of journal articles as well as some non-article material such as letters, editorials, and book and product reviews.

The Journal Publishing Tag Set defines elements and attributes that describe the content and metadata of journal articles, including research and non-research articles, letters, editorials, and book and product reviews. The Tag Set allows for descriptions of the full article content or just the article header metadata.

Publishing is a moderately prescriptive Tag Set, optimized for archives that wish to regularize and control their content rather than accept the sequence and arrangement presented to them by any particular publisher. The Tag Set is also intended for use by publishers for the initial XML tagging of journal material, usually as converted from an authoring form such as Microsoft Word.

Because Publishing is optimized for regularizing an archive or establishing a sequence of elements to aid print and web production, the Tag Set is smaller than the Archiving Tag Set. There are fewer elements, fewer choices in many contexts, and a particular element sequence is imposed more often.

The philosophy of this Publishing Tag Set is to prefer a single structural form whenever possible. Elements and tagging choices are limited to produce consistent data structures that enable output products and provide a single location for each piece of information when searching.

By design, this is a model for journal articles, such as the typical research article found in an STM journal, and not a model for complete journals. This Tag Set does not include an overarching model for a collection of articles. In addition, the following journal material is not described by this Tag Set:

Company, product, or service display advertising;
Job search or classified advertising;
Calendars, meeting schedules, and conference announcements (except as these can be tagged as ordinary articles, sub-articles, or sections within articles); and
Material specific to an individual journal, such as Author Guidelines, Policy and Scope statements, editorial or advisory boards, detailed indicia, etc.

The Journal Publishing Tag Set defines a document that is a top-level component of a journal such as an article, a book or product review, or a letter to the editor. Each such document is composed of one or more parts; if there is more than one part, they must appear in the following order:

Front matter (required). The article front matter contains the metadata for the article (also called article header information), for example, the article title, the journal in which it appears, the date and issue of publication, a copyright statement, etc. This is not textual front matter of the kind found in books; rather, it is bibliographic information about the article and the journal in which it was published.
Body of the article (optional). The body of the article is the main textual and graphic content of the article. This usually consists of paragraphs and sections, which may themselves contain figures, tables, sidebars (boxed text), etc. The body of the article is optional to accommodate those repositories that just keep article header information and do not tag the textual content.
Back matter for the article (optional). If present, the article back matter contains information that is ancillary to the main text, such as a glossary, appendix, or list of cited references.
Floating Material (optional). A publisher may choose to place all the floating objects in an article and its back matter (such as tables, figures, boxed text sidebars, etc.) into a separate container element outside the narrative flow for convenience of processing.
Following the front, body, back, and floating material, there may be either one or more responses to the article or one or more subordinate articles:
- Response. A response is a commentary on the article itself, for example, an opinion from an editor on the importance of the article or a reply from the original author to a letter concerning his article.
- Sub-article. A sub-article is a small article that is completely contained inside another article.
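The ordering rules above can be checked mechanically. The following Python sketch uses only the standard library's `xml.etree.ElementTree` and checks nothing but the top-level order of an `<article>`; it is no substitute for validating against the DTD or schema. The name `<floats-group>` for the floating-material container is an assumption here, since the text above does not name it, and `check_article_order` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed version 3.0 names: <floats-group> holds floating material;
# <response> and <sub-article> are the optional trailing parts.
ORDERED_PARTS = ["front", "body", "back", "floats-group"]
TRAILING_PARTS = ("response", "sub-article")


def check_article_order(xml_text: str) -> list[str]:
    """Return human-readable problems with the top-level order of an <article>."""
    root = ET.fromstring(xml_text)
    if root.tag != "article":
        return [f"root is <{root.tag}>, expected <article>"]

    children = [child.tag for child in root]
    problems = []
    if not children or children[0] != "front":
        problems.append("<front> is required and must come first")

    last_index = -1       # position in ORDERED_PARTS of the furthest part seen
    trailing_kind = None  # first trailing part seen: "response" or "sub-article"
    for name in children:
        if name in ORDERED_PARTS:
            index = ORDERED_PARTS.index(name)
            if trailing_kind is not None:
                problems.append(f"<{name}> appears after <{trailing_kind}>")
            elif index <= last_index:
                problems.append(f"<{name}> is repeated or out of order")
            last_index = max(last_index, index)
        elif name in TRAILING_PARTS:
            if trailing_kind is None:
                trailing_kind = name
            elif name != trailing_kind:
                problems.append("responses and sub-articles cannot be mixed")
        else:
            problems.append(f"unexpected top-level element <{name}>")
    return problems


if __name__ == "__main__":
    print(check_article_order("<article><front/><body/><back/></article>"))  # prints []
```

Keeping the check to top-level children mirrors the list above; everything inside `<front>`, `<body>`, and the other parts is left to the schema.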

This Tag Set is one of several created from the Suite. Information about these Tag Sets may be found at the following site: http://dtd.nlm.nih.gov.

Because access for a wide range of output devices, as well as for the visually impaired, is becoming more and more important in the STM journal community, the modules in the Archiving and Interchange Suite were designed to follow, as much as possible, the W3C Web Content Accessibility Guidelines 2.0 working draft (22 August 2002), which was the latest accessibility specification available when the Suite was initially constructed. That specification sets out accessibility guidelines at many levels, from design through application. The guidelines that pertain to the modeling of materials were followed to at least Level-2 compliance. For example, a Long Description <long-desc> element was defined as part of many other elements, such as Figure <fig>, so it can be added not only to all figures and other graphical objects but also to any section of the text (for example, to a Boxed Text <boxed-text> element) to provide an accessible description of the object. The @xml:lang attribute was added to all section-level elements and many paragraph-level elements to permit explicit indication of the language of the content, as required by these guidelines. The Abbreviation or Acronym <abbrev> element (used for both abbreviations and acronyms) was added to meet Checkpoint 4.3.
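As a rough illustration of how these accessibility features might be audited in tagged content, the Python sketch below lists figures that carry no `<long-desc>` and reports which elements declare `@xml:lang`. Treating a missing `<long-desc>` as a problem is an assumed local policy, not a requirement stated here, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the XML namespace's expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def figures_missing_long_desc(xml_text: str) -> list[str]:
    """Return the id (or a positional placeholder) of each <fig> with no <long-desc> inside it."""
    root = ET.fromstring(xml_text)
    missing = []
    for position, fig in enumerate(root.iter("fig"), start=1):
        if fig.find(".//long-desc") is None:
            missing.append(fig.get("id", f"fig #{position} (no id)"))
    return missing


def declared_languages(xml_text: str) -> list[tuple[str, str]]:
    """Return (tag name, language) for every element that carries @xml:lang."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.get(XML_LANG)) for el in root.iter() if XML_LANG in el.attrib]
```

The check looks anywhere inside each `<fig>`, so a `<long-desc>` placed on a contained graphic also counts; a stricter policy would need a narrower search path.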

Element	Elements are nouns, like “speech” and “speaker”, that represent components of journal articles, the articles themselves, and accompanying metadata.
Attribute	Attributes hold facts about an element, such as which type of list (e.g., numbered, bulleted, or plain) is being requested when using the List <list> tag, or the name of a pointer to an external file that contains an image. Each attribute has both a name (e.g., @list-type) and a value (e.g., “`bullet`”).
Metadata	Data about the data, for example, bibliographic information. The distinction is between metadata elements, which describe an article (such as the name of the journal in which an article was published), and elements that contain the textual and graphical content of the article.
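To make these three terms concrete, the Python sketch below reads a trimmed, hypothetical fragment (not a complete, valid article). The tag names are elements, `list-type="bullet"` is an attribute name and value, the `<article-title>` inside `<front>` is metadata, and the paragraphs in `<body>` are content. The helper names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed fragment for illustration; not valid against the schema.
SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group><article-title>An Example Title</article-title></title-group>
    </article-meta>
  </front>
  <body>
    <p>Some text.</p>
    <list list-type="bullet"><list-item><p>First point</p></list-item></list>
  </body>
</article>"""


def split_metadata_and_content(xml_text: str) -> tuple:
    """Return (article title from <front>, paragraph texts from <body>)."""
    root = ET.fromstring(xml_text)
    front, body = root.find("front"), root.find("body")
    title = front.findtext(".//article-title") if front is not None else None
    paragraphs = ["".join(p.itertext()).strip() for p in body.iter("p")] if body is not None else []
    return title, paragraphs


def list_types(xml_text: str) -> list:
    """Return the @list-type value of every <list> element."""
    return [lst.get("list-type") for lst in ET.fromstring(xml_text).iter("list")]


if __name__ == "__main__":
    print(split_metadata_and_content(SAMPLE))  # ('An Example Title', ['Some text.', 'First point'])
    print(list_types(SAMPLE))                  # ['bullet']
```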

The Journal Publishing Tag Set is available on the NLM Journal Archiving and Interchange Tag Suite Website, including: the Tag Set models expressed as DTDs, XSD schemas, and RELAX NG schemas; one or more tagged sample documents; and this Tag Library documentation (a set of linked HTML files). The DTD file delivery includes: the Journal Publishing Tag Set; the Tag Set customization files needed for the Journal Publishing Tag Set to override the declarations in the modular Suite; any new modules that define Tag-Set-specific elements; and the modules that comprise the full Suite.

How you use the documentation will depend on what you need to learn about the modules and the Tag Set.

If you want to learn about the elements and the attributes in this Tag Set so you can tag documents or learn how the journal article model is constructed, here is a good way to start.

Read the Tag Library General Introduction, taking particular note of the next section that describes the parts of the Tag Library so you will know what resources are available.
Next, if you do not know the symbols used in the Document Hierarchy diagrams, read the “Key to the Near & Far® Diagrams”.
Scan the Document Hierarchy diagrams to get a good sense of the top-level elements and their contents. (Find what is inside an <article>, then what is inside each of the four large pieces of an article, and keep working your way down.)
Pick an element from one of the diagrams. Look up the element in the Elements Section to find the full name of the element, its definition, usage notes, content allowed, and any attributes. Look up one of the attributes to find its full name, usage notes, and potential values.
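If you have a tagged sample article at hand, a quick outline of it can complement the Document Hierarchy diagrams. The sketch below is a hypothetical helper, not part of the Tag Library. Note that an instance shows only the elements one document happens to use, whereas the diagrams show everything the model allows.

```python
import xml.etree.ElementTree as ET
from typing import Iterator


def outline(element: ET.Element, max_depth: int = 2, depth: int = 0) -> Iterator[str]:
    """Yield one indented line per element, not descending below max_depth."""
    yield "  " * depth + f"<{element.tag}>"
    if depth < max_depth:
        for child in element:
            yield from outline(child, max_depth, depth + 1)


if __name__ == "__main__":
    # Inline sample; in practice you would parse a real file with ET.parse(...).getroot().
    sample = "<article><front><article-meta/></front><body><sec><p/></sec></body></article>"
    for line in outline(ET.fromstring(sample)):
        print(line)
```

Raising `max_depth` one level at a time follows the same "keep working your way down" approach as the steps above.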

Finally, if you are interested in conversion from a particular source:

Look at an article in a printed or online journal, or look at the DTD/schema used by the journal you are converting from.
- Can all the information you want to store from an article fit into the models shown in the diagrams?
- Do you have, or know how to get, all the information the models require? Will that information always be available for documents that are complete and correct?
- How difficult will it be to identify the parts of the information using the elements and attributes described in these models? Would changes to one or more models make this easier?
Now look at some non-article content, such as a news column, a book review, or some letters to the editor. Are there tags to handle all these article types and all their components?

This Tag Library contains the following sections:

How To Use (Read Me First)	How to make best use of this Tag Library to reference XML tags, become familiar with the Publishing Tag Set as a whole, or see examples of recommended usage.
Introduction	This introduction to the contents of this Tag Library, to the design philosophy and intended usage of the Archiving and Interchange Suite, and to the Journal Publishing Tag Set.
Elements Section	Descriptions of the elements used in the Journal Publishing Tag Set and the parts of the Archiving and Interchange Suite used in this Tag Set. The element descriptions are listed in alphabetical order by tag name. [Note: Each element has two names: a “tag name” (formally called an element-type name) that is used in tagged documents, in the DTDs/schemas, and by XML software; and an “element name” (usually longer) that provides a fuller, more descriptive name for the benefit of human readers. For example, a tag name might be <disp-quote> with the corresponding element name Quote, Displayed, or a tag name might be <verse-group> with the corresponding element name Verse Form for Poetry.]
Attributes Section	Descriptions of the attributes in the Journal Publishing Tag Set. Like elements, attributes have two names: a shorter, machine-readable one and a (usually longer) human-readable one. Attributes are listed in order by their shorter, machine-readable names; for example, an attribute is listed as @list-type rather than by its more informal, easier-to-read name, Type of List.
Parameter Entity Section	Names (with occasional descriptions) and contents of the parameter entities in the Tag Set modules.
Document Hierarchy Diagrams	Tree-like graphical representations of the content of many elements. This can be a fast, visual way to determine the structure of an article or of any element within an article.
Full Article Samples	Two full articles are provided in both PDF format and in XML according to this Tag Set. These are provided to help users understand the relationship between the article as displayed and the XML version of the article.
Common Tagging Practice	Tips, tricks, hints, and examples of how (and why) to tag certain structures using this Tag Set.
Implementing This Tag Set	Implementor’s instructions for using the Tag Set, customizing this Tag Set, or making derivative tag sets based on this one.
Change Report: Version 3.0	Detailed information about the changes between versions 2.3 and 3.0 for both the base Suite and the Journal Publishing Tag Set. The report has two sections: the first identifies how each element changed between the two versions; the second provides an implementor’s view of the changes in parameter entities, content models, attribute lists and/or values, etc.
DTD, XSD, and RNG	The Journal Publishing Tag Set is available in three forms: an XML Document Type Definition (DTD); a W3C XML Schema (XSD); and a RELAX NG Schema (RNG). Each of these formats is available in two forms: a zipped file containing a downloadable version of the schema (often in multiple files), and a readable/browsable version in which the internal markup has been escaped.
Context Table	A listing of where each element may be used. All elements in the Tag Set are given in a single alphabetical list. The Context Table is formatted in two columns. The first column (“This Element”) names an element, with the name shown in pointy brackets. In the second column (“May Be Contained In”) for each element is an alphabetical list of all the elements in which the first column element may occur. For example, if the first column contains the element <front> and the second column contains only the <article> element, this means that the <front> element may only be used directly inside an <article>. Most elements may be used inside more than one other element. For example, the element <def> (a definition) may be used inside the <abbrev> and the <def-item> elements. The Context Table contains the same information that is found on each element page under the heading “May Be Contained In”.
Index	Where to find elements, tags, and terms used in this Tag Library. Includes synonyms (terms not used in this Tag Set) that direct the reader to elements used in this Tag Library, for example, “author” is paired with Contributor <contrib>.
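The Context Table is derived from the Tag Set's content models. To compare it with the contexts your own documents actually use, a sketch like the following collects the parent elements observed in sample files and prints them in the same two-column layout. The helper names are hypothetical, and the sketch cannot reveal contexts that the model allows but your samples never use.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def observed_contexts(xml_texts: list[str]) -> dict[str, list[str]]:
    """Map each tag name to the sorted list of parent tags it was seen inside."""
    parents = defaultdict(set)
    for text in xml_texts:
        root = ET.fromstring(text)
        for parent in root.iter():
            for child in parent:
                parents[child.tag].add(parent.tag)
    return {tag: sorted(found) for tag, found in sorted(parents.items())}


def print_context_table(contexts: dict[str, list[str]]) -> None:
    """Print tab-separated columns in the style of the Context Table."""
    print("This Element\tMay Be Contained In")
    for tag, found in contexts.items():
        print(f"<{tag}>\t" + ", ".join(f"<{p}>" for p in found))
```

The root element never appears in the first column, because it has no parent within the document.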

<alt-text>	The tag name of an element (written in lower case with the entire name surrounded by “< >”)
Alternate Title Text (For a Figure, Etc.)	The element name (long descriptive name of an element) or the descriptive name of an attribute (written in title case, with important words capitalized, and the words separated by spaces)
must not	Emphasis to stress a point

This version of the Tag Suite and the derivative Tag Sets marks a significant departure from all previous versions. Before now, all changes to the Suite had been fully backward compatible. That meant that while the Suite changed and grew, no document tagged according to a particular Tag Set was invalid because of any new or changed models. For the first time, this is not the case. Version 3.0 is non-backward compatible, meaning documents which were valid against any previous version may not be valid against this version. Some users will choose to keep their existing documents as they are, valid according to an older version of the Tag Set, and to use the new Tag Set for future documents. Some users will convert their existing documents to the new Tag Set, a conversion that we believe can be largely automated. It is possible that some users will choose to continue to use an older version of the Tag Set.

The rationales behind the changes vary, but all were made with the intent of making it easier to move forward. This Tag Set has now been used for nearly six years on journal articles in fields as varied as medicine, biotechnology, and physics. Many of the changes were of the “if we had known then what we know now” variety. Others were obvious improvements for four or five years but could not be made within the scope of normal maintenance because they would not have been backward compatible.

Version 3.0 includes the regular scope changes and additions that have been the subject of previous version releases. For example, a new funding model (<funding-group>) has been added to reflect the changing needs of the community. In addition to this kind of change, version 3.0 changes also include:

Elements renamed to make the naming more consistent throughout the Tag Suite. For example, the element <custom-meta-wrap> was changed to <custom-meta-group> because it was learned over the years that “wraps” represent a single element and all its component parts while “groups” are collections of similar objects.
Confusing or ambiguous aspects of the models were simplified. For example, the “citation-type” attribute on bibliographic reference citations used to be a text attribute that gave the “type” of the citation. This attribute was used to specify that this was a journal, not a book; or that this was a print, not a web publication; or that this was published by the government or by a standards body. Use of this “type” attribute got very confusing when a user wanted to specify several things about a document (a report published online by the government). This single, over-used attribute was replaced with three new, more specific attributes: @publication-type (to hold journal or book), @publication-format (to hold print versus online), and @publisher-type (to hold standards body versus government).
Wrapper elements were added to keep related material together. For example, translated titles and subtitles are now tied together in a <trans-title-group> element, which holds the @xml:lang attribute previously held by the individual titles and subtitles.
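Much of the conversion from an earlier version can be scripted. The Python sketch below applies two of the changes described above: the `<custom-meta-wrap>` to `<custom-meta-group>` rename and the split of @citation-type into the new attributes. The value mapping for @citation-type is hypothetical and illustrative only; the Change Report is the authority for real conversions, and this sketch does not attempt the `<trans-title-group>` wrapping.

```python
import xml.etree.ElementTree as ET

# Element renames; only the example named in the text above is included.
RENAMES = {"custom-meta-wrap": "custom-meta-group"}

# HYPOTHETICAL mapping from old @citation-type values to the new attributes.
# Illustrative only: confirm real values against the Change Report before use.
CITATION_TYPE_MAP = {
    "journal": {"publication-type": "journal"},
    "book": {"publication-type": "book"},
    "gov": {"publisher-type": "gov"},
}


def upgrade_to_3_0(xml_text: str) -> str:
    """Rename elements and split mapped @citation-type values; leave unmapped values for review."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = RENAMES.get(element.tag, element.tag)
        old_type = element.get("citation-type")
        if old_type in CITATION_TYPE_MAP:
            del element.attrib["citation-type"]
            element.attrib.update(CITATION_TYPE_MAP[old_type])
    return ET.tostring(root, encoding="unicode")
```

Leaving unmapped @citation-type values in place makes them easy to find afterwards, since they will fail validation against the 3.0 schema and can then be reviewed by hand.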

A report listing all changes is available in the Change Report section.

We thank bmj.com, Molecular Biology of the Cell, and The Proceedings of the National Academy of Sciences of the U.S.A. for providing the sample articles used in this Tag Library.

Version of November 2008

Digital Archive of Journal Articles
National Center for Biotechnology Information (NCBI)
National Library of Medicine (NLM)
