The National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM) created the Journal Archiving and Interchange Tag Suite with the intent of providing a common format in which publishers and archives can exchange journal content. The Suite provides a set of XML schema modules that define elements and attributes for describing the textual and graphical content of journal articles as well as some non-article material such as letters, editorials, and book and product reviews.
The intent of this Tag Suite is to preserve the intellectual content of journals independent of the form in which that content was originally delivered. The Suite has been written as a set of XML schema modules, each of which is a separate physical file. No module is an entire schema by itself, but these modules can be combined into a number of different schemas.
The Suite can be used to construct schemas for authoring and archiving journal articles as well as transferring journal articles from publishers to archives and between archives. Details on creating schemas from the Suite are available in the Tag Libraries. Although the full Suite was developed to support electronic production, the structures should be adequate to support some print production as well.
NCBI/NLM has created several distinct Tag Sets from the Suite of Modules, each with its own purpose. A brief overview of each Tag Set is provided below. The full description of each Tag Set is available in its documentation.
|Archiving and Interchange Tag Set||Created to enable an archive to capture as many of the structural and semantic components of existing printed and tagged journal material as conveniently as possible, with no effort made to model any particular sequence or textual format|
|Journal Publishing Tag Set||Optimized for archives that wish to regularize and control their content rather than accept the sequence and arrangement presented to them by any particular publisher|
|Article Authoring Tag Set||Designed for authoring new journal articles, where regularization and control of content is important|
|NCBI Book Tag Set||Written specifically to describe volumes for the NCBI online libraries|
Each one of the Tag Sets is delivered as an XML DTD, W3C XML schema, and RELAX NG schema, but only the XML DTD is intended for maintenance. While the structural constraints on document tagging expressed by the W3C XML schema and the RELAX NG schema are identical to those of the DTD, neither reflects the DTD's modular structure. For the specific impact this has on customizations, please consult the individual Tag Set documentation.
[Note: NCBI/NLM also created a DTD for the submission of citations and abstracts for MEDLINE/PubMed that predated this Suite. If you want to submit citations and abstracts to NLM for inclusion in PubMed/MEDLINE, use the PubMed Journal Article DTD. Detailed information is available from the PubMed web site: Information for Publishers re: XML Tagged Data.]
The Suite and all Tag Sets are in the public domain. An organization that wants to create its own schema from the Suite may do so without permission from NLM.
The Suite has been set up to be extended using a new schema file and a new schema-specific customization module to redefine the many parameter entities. Do not modify the Suite directly or redistribute modified versions of the Suite.
In the interest of maintaining consistency and clarity for potential users, NLM requests:
If you create a schema from the Archiving and Interchange Tag Suite and intend it to stay compatible with the Suite, please include the following statement as a comment in all of your modules:
Created from, and fully compatible with, the NLM Journal Archiving and Interchange Tag Suite.
If you alter one or more modules of the suite, please rename your version and all its modules to avoid any confusion with the original Suite. Also, please include the following statement as a comment in all your modules:
Based in part on, but not fully compatible with, the NLM Journal Archiving and Interchange Tag Suite.
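As a minimal sketch, the requested statement could sit in a comment at the top of each module. The file name below is hypothetical; only the statement text comes from the request above. A schema that alters Suite modules would carry the "Based in part on" statement instead.

```xml
<!-- yournew-classes.ent (hypothetical module name) -->
<!-- Created from, and fully compatible with, the NLM Journal Archiving
     and Interchange Tag Suite. -->
```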
The schemas and tools are all available by anonymous FTP: ftp://ftp.ncbi.nih.gov/pub/archive_dtd
Please see jats.nlm.nih.gov for the current Tag Suite version information.
|Version Number||Release Date|
|3.0||November 21, 2008|
|2.3||March 28, 2007|
|2.2||June 8, 2006|
|2.1||November 14, 2005|
|2.0||December 30, 2004|
|1.1||November 5, 2003|
|1.0||March 31, 2003|
How to Build a New Custom DTD
The basic idea for a new DTD is that all lower-level elements (paragraphs, lists, figures, etc.) will be defined in modules, either the modules of the Suite or in new DTD-specific modules, not in the DTD itself. The DTD will be fairly short and include only definitions of the topmost element(s), at least the document element and maybe its children.
Modules are declared as External Parameter Entities, either in the Suite Module of Modules or in the DTD-specific Module of Modules. The DTD then calls (references) the modules in the order needed to define the Parameter Entities in sequence.
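A minimal sketch of the split between declaring a module and calling it follows. The file names are hypothetical (they reuse the yournew naming used for examples on this page), two files are shown in one listing for brevity, and SYSTEM identifiers stand in for whatever public identifiers a real DTD would use.

```xml
<!-- File 1: yournew-modules.ent, a hypothetical DTD-specific Module of
     Modules. It only declares (names) the modules; it does not call them. -->
<!ENTITY % yournew-classes.ent SYSTEM "yournew-classes.ent">
<!ENTITY % yournew-mixes.ent   SYSTEM "yournew-mixes.ent">
<!ENTITY % yournew-models.ent  SYSTEM "yournew-models.ent">

<!-- File 2: yournew.dtd. The DTD declares the Module of Modules and calls
     it, which makes the three declarations above available. -->
<!ENTITY % yournew-modules.ent SYSTEM "yournew-modules.ent">
%yournew-modules.ent;

<!-- A module is then called by reference at the point where its
     Parameter Entities are needed. -->
%yournew-classes.ent;
```

Keeping the declarations in one place means the DTD itself stays short: it decides only which modules to call and in what order.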
Version 2.0 of the Journal Archiving and Interchange DTD was written as an example of the new best-practice customization technique. A new DTD that follows this plan will probably consist of the following modules:
- A DTD module to define the top-level elements (e.g., yournew.dtd);
- A DTD-specific definition of element classes to add new classes and over-ride the default classes (for example, %yournew-classes.ent;);
- A DTD-specific definition of element mixes to add new mixes and over-ride the default mixes (for example, %yournew-mixes.ent;);
- A DTD-specific module of content model over-rides (for example, %yournew-models.ent;);
- A DTD-specific Module of Modules to name the non-Suite modules in the DTD (for example, %yournew-modules.ent;);
- DTD-specific modules to hold new types of element declarations (e.g., %taxonomic-key.ent; or %help-topic-meta.ent;); and
- All or most of the modules in the base Suite.
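The over-ride modules in this list depend on a rule of XML itself: when the same entity is declared more than once, the first declaration encountered is binding. A DTD-specific over-ride therefore takes effect only if it is read before the Suite default. The sketch below illustrates the rule with a hypothetical class name and hypothetical content; it is not copied from any Suite module.

```xml
<!-- From a hypothetical yournew-classes.ent, read first.
     This declaration is the one the parser keeps. -->
<!ENTITY % example.class "list | def-list | checklist">

<!-- From a stand-in for the default classes module, read second.
     The parser ignores it because example.class is already declared. -->
<!ENTITY % example.class "list | def-list">

<!-- Any content model that uses the class now gets the over-ride. -->
<!ELEMENT example-container (%example.class;)+>
```

Because the default module is left untouched, a later release of the Suite can replace it without disturbing the customization.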
Example: Making a New DTD Using the Suite
To show the process, here is a series of instructions for making a new DTD, illustrated by showing how the Journal Archiving and Interchange DTD was created from the modules of the Suite.
- Modules: Write a new DTD-specific Module of Modules, which defines all new customization modules the DTD needs. (As an example, the Archiving DTD created the module %archivecustom-modules.ent;, which contains the definitions of the class-over-ride module %archivecustom-classes.ent;, the mix-over-ride module %archivecustom-mixes.ent;, and the models-over-ride module %archivecustom-models.ent;.)
- Class Over-rides: Write a DTD-specific class-over-ride module, defining any over-rides to the Suite classes. These classes were defined in the default classes module. (As an example, the Archiving DTD created the module %archivecustom-classes.ent;, in which a new model for %contrib-info.class; was declared and an entirely new class %x.class; was added.)
- Mix Over-rides: Write a DTD-specific mix-over-ride module, defining any over-rides to the Suite mixes. These mixes were defined in the default mixes module. (As an example, the Archiving DTD created the module %archivecustom-mixes.ent;, in which a new mix %all-phrase; was declared and then used in many existing mixes, such as %simple-phrase;.)
- Model Over-rides: Create a DTD-specific content-model-over-ride module, defining any over-rides to the content models and attribute lists for the DTD Suite. (As an example, the Archiving DTD created the module %archivecustom-models.ent;, in which element collections (suffixed “-element”) that will be mixed with #PCDATA were redefined, full content model over-rides (suffixed “-model”) were defined, and some new attributes and attribute lists were added.)
- New Elements: Write any new element modules needed. These will define any new block-level or phrase-level elements. (As an example, the Archiving DTD did not need any new elements not in the Suite, but the new NLM Book DTD added modules for book metadata and book component parts.)
- DTD Module: Use the modules just described in the construction of a new DTD module. Within that DTD module:
  - Use an External Parameter Entity Declaration to name and then call the DTD-specific Module of Modules (for the Archiving DTD, the module %archivecustom-modules.ent;);
  - Use an External Parameter Entity Declaration to name and then call the DTD Suite Module of Modules, which names all the potential modules (for the Archiving DTD, the module %modules.ent;);
  - Use an External Parameter Entity reference to call the DTD-specific class over-rides (for the Archiving DTD, the module %archivecustom-classes.ent;);
  - Use an External Parameter Entity reference to call the DTD Suite default classes (for the Archiving DTD, the module %default-classes.ent;);
  - Use an External Parameter Entity reference to call the DTD-specific mix over-rides (for the Archiving DTD, the module %archivecustom-mixes.ent;);
  - Use an External Parameter Entity reference to call the DTD Suite default mixes (for the Archiving DTD, the module %default-mixes.ent;);
  - Use an External Parameter Entity reference to call the DTD-specific content model and attribute list over-rides (for the Archiving DTD, the module %archivecustom-models.ent;);
  - Use an External Parameter Entity reference to call in the standard Common Module (%common.ent;), which defines elements and attributes so common that they are used by many modules;
  - Select, from the Module to Name the Modules, those modules that contain the elements needed for the DTD (for instance, selecting lists and not selecting math elements), and call in each of the modules needed (the Archiving DTD calls these in alphabetical order, since the order does not matter);
  - Define the document element and any other unique elements and entities needed for this DTD. (For example, the Archiving DTD declares only six elements: <article>, the top-level element, and its components <front>, <body>, <back>, <sub-article>, and <response>.)
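The steps for the DTD module can be sketched as a skeleton that follows the calling order described above. This is an illustration under stated assumptions, not the text of the Archiving DTD: the yournew file names, the SYSTEM identifiers, the two element modules called near the end, and the document element are all hypothetical. The names %modules.ent;, %default-classes.ent;, %default-mixes.ent;, and %common.ent; are taken from the steps themselves.

```xml
<!-- yournew.dtd: hypothetical top-level DTD module -->

<!-- Name and call the DTD-specific Module of Modules -->
<!ENTITY % yournew-modules.ent SYSTEM "yournew-modules.ent">
%yournew-modules.ent;

<!-- Name and call the Suite Module of Modules -->
<!ENTITY % modules.ent SYSTEM "modules.ent">
%modules.ent;

<!-- Class over-rides come before the Suite default classes -->
%yournew-classes.ent;
%default-classes.ent;

<!-- Mix over-rides come before the Suite default mixes -->
%yournew-mixes.ent;
%default-mixes.ent;

<!-- Content model and attribute list over-rides -->
%yournew-models.ent;

<!-- Common Module shared by many element modules -->
%common.ent;

<!-- Selected Suite element modules. These two names are assumed for
     illustration and must be declared in the Suite Module of Modules. -->
%list.ent;
%para.ent;

<!-- Document element (hypothetical). Declarations for front, body, and
     back are omitted from this sketch. -->
<!ELEMENT yourdoc (front, body?, back?)>
```

Each over-ride module is called immediately before the Suite default it shadows, which is what lets the DTD-specific declarations take precedence over the defaults.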
Links to general information on XML, XSLT, Unicode™, and XLink are available on the XML Resources page.
NLM thanks Mulberry Technologies, Inc. and Inera, Inc. for their expert advice and the intense document analysis that was required to create this library of schema modules for archiving and content interchange.
NLM also thanks the Harvard University Libraries, both for proposing that a draft archiving NLM DTD for life sciences journals be extended to accommodate journals in all disciplines and for sponsoring Inera's collaboration with other DTD authors in completing Version 1.0. The Andrew W. Mellon Foundation provided support for these important contributions.