General Introduction

The Article Authoring Tag Set (hereafter Authoring Tag Set) can be used to create new journal articles tagged with XML elements. The Tag Set defines the elements (and their attributes) that describe the content of most journal articles, as well as the content of some non-article journal material, such as editorials and book reviews. A person using this Tag Set is typically one of the article’s authors or someone (such as a copyeditor) to whom the author has sent the XML files. The Tag Set is intended for use with XML-aware software or text editors that guide the writer in choosing tags correctly. This Tag Set is a simplified version of the Journal Publishing (Blue) Tag Set, which describes the biological and medical journal articles archived in PubMed Central.

This Tag Library provides the documentation for the Authoring Tag Set, including examples of element and attribute usage. The documentation is divided into two levels: 1) author documentation, for the person writing or editing textual material who wants to use the tags, and 2) implementor documentation, which is more detailed and is intended for people who need to modify or maintain the DTDs or schemas. The author-level reference material includes:

Implementor-level material is also included:

Tag Set Design Principles

Purpose and Characteristics

This Authoring Tag Set is an XML vocabulary designed to describe new journal articles. These tags can be used by authors to submit publications to journals and to archives such as PubMed Central. The focus on authoring means that this is a smaller, tighter, less inclusive tag set than a journal archiving tag set would need to be. An archiving tag set needs to accommodate a wide variety of structures and offers very flexible content models for elements, whereas this Authoring Tag Set is more prescriptive than descriptive and includes many elements whose content must occur in a specified order. This is a tag set optimized for authorship of new journal articles, where regularization and control of content are important, and where it is useful rather than harmful to have only one way to tag a structure.

Since no assumptions can be made concerning the processing software and editorial situation that will receive an article authored in this Tag Set, tagging that forces specific formatting has been avoided. There is no way for authors to number their lists explicitly, for example, or to manually number the cited references, since many journals have their own citation policies. Numbers for the cited references must be generated by software to match editorial policy. For proofing purposes, the stylesheets that may be used to produce PDF from these tagged articles number the references with counting numbers, and the stylesheets that produce HTML for proofing display the identifier of the <ref> as the number.
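As an illustration of this principle, the fragment below sketches an ordered list and a cited reference tagged without any author-supplied numbers. It is a minimal sketch and is not taken from the Tag Library: the id value and all text content are hypothetical placeholders, and the use of list-type="order", <list-item>, <xref>, <ref-list>, and <citation> is an assumption that should be checked against the element pages of this Tag Set.

```xml
<!-- Illustrative fragment only; not validated against the Authoring DTD.
     The list and paragraph would sit in the article body, and the
     reference list in the back matter. -->
<list list-type="order">
  <!-- The author types no "1." or "2."; numbering is left to software. -->
  <list-item><p>Placeholder first step.</p></list-item>
  <list-item><p>Placeholder second step.</p></list-item>
</list>

<!-- The citation points at the reference by identifier, not by number. -->
<p>Placeholder sentence that cites earlier work <xref ref-type="bibr" rid="B1"/>.</p>

<ref-list>
  <title>References</title>
  <ref id="B1">
    <citation>Placeholder Author. Placeholder article title. Placeholder Journal.</citation>
  </ref>
</ref-list>
```

Under the proofing behavior described above, a PDF stylesheet would be expected to show this reference with a counting number, while an HTML proofing stylesheet would be expected to show the identifier of the <ref> (here the hypothetical value B1).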

Scope: Journal Articles

  • Articles. This is not a Tag Set for complete journals. The Authoring Tag Set models journal articles, where such an article is defined as the typical research article found in an STM journal. By design, the definition of an article is broad enough to include other article-like material found in journals, for example, editorials, short news pieces, obituaries, meeting reports, and book or product reviews.
  • Header and Body. This Tag Set describes both the metadata for a journal article and the content of the article. Both are required to make a complete article.
  • Multidisciplinary. Although designed for biomedical journals, this Tag Set should be sufficiently general to describe not only STM journals but technical journals in any field.
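The header-and-body split described in the list above can be sketched as a bare article skeleton. This is only a rough outline under stated assumptions: the element names <front>, <article-meta>, <title-group>, <article-title>, <body>, and <sec> are assumed to be the relevant ones in this Tag Set, other metadata that the DTD may require is omitted, and all text content is placeholder.

```xml
<!-- Illustrative skeleton only; metadata the DTD may require is omitted,
     and the fragment has not been validated against the Authoring DTD. -->
<article>
  <front>
    <!-- Header: metadata describing the article -->
    <article-meta>
      <title-group>
        <article-title>Placeholder Article Title</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <!-- Body: the textual content of the article -->
    <sec>
      <title>Placeholder Section Title</title>
      <p>Placeholder paragraph text.</p>
    </sec>
  </body>
</article>
```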

Changes in Version 2.3

Version 2.3 was an enhancement release that made changes requested by the Working Group based on operational concerns. All changes to content models and attribute lists have been made backward compatible. As an example, a new XHTML “style” attribute was added to all the elements inside <table>, and the “content-type” attribute was distributed widely to add or preserve semantic information. The Authoring 2.3 Change Report may be accessed through the following page:


Because access for a wide range of output devices, as well as for the visually impaired, is becoming more and more important in the STM journal community, the modules in the Archiving and Interchange Suite were designed to follow, as much as possible, the W3C Web Content Accessibility Guidelines 2.0 working draft. This Tag Set is based on the Archiving and Interchange Suite, which used the August 2002 version of that specification, the latest accessibility specification available when the Suite was initially constructed. The specification gives accessibility guidelines on many levels, from design through application. The guidelines that pertain to the modeling of materials were followed to at least Level-2 compliance. For example, a Long Description <long-desc> element was defined as part of many other elements, such as Figure <fig>, so that it can be added not only to all figures and other graphical objects but also to any section of the text (for example, to a Boxed Text <boxed-text>) to provide an accessible description of the object. The xml:lang attribute was added to all section-level elements and many paragraph-level elements to permit explicit indication of the language of the content, as required by these guidelines. The Abbreviation or Acronym <abbrev> element was added to meet Checkpoint 4.3.
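The accessibility features named above might be combined as in the following fragment. It is a sketch under assumptions rather than an excerpt from the Tag Library: the id value, file name, and text are hypothetical placeholders, and the ordering of <caption>, <long-desc>, and <graphic> inside <fig> is assumed and should be confirmed against the <fig> content model.

```xml
<!-- Illustrative fragment only; not validated against the Authoring DTD. -->
<!-- xml:lang states the language of the section content explicitly. -->
<sec xml:lang="en">
  <title>Placeholder Section Title</title>
  <!-- <abbrev> marks an abbreviation or acronym in running text. -->
  <p>Placeholder sentence that mentions a <abbrev>DTD</abbrev>.</p>
  <fig id="F1">
    <caption><p>Placeholder caption.</p></caption>
    <!-- <long-desc> carries an accessible description of the object. -->
    <long-desc>Placeholder description of what the image shows, written for readers who cannot see it.</long-desc>
    <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="placeholder-figure.tif"/>
  </fig>
</sec>
```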

Subsidiary sections:

  • Documentation for Authors
  • Documentation for Implementors


We thank, Molecular Biology of the Cell, and The Proceedings of the National Academy of Sciences of the U.S.A. for providing the sample articles used in this Tag Library.