Tagging Keywords

Keywords are words and phrases used to name a book’s or book part’s key concepts for search and retrieval purposes. Typically an author or a publisher will assign a small number of key terms to expand lookup beyond full text or to highlight the most important topics described in a book or book part. Indexers assigning keywords can make sure that someone searching for “this topic” will find this book or this chapter, even if the exact words are not present in the text of the book.

In this Tag Set, keywords come in sets (<kwd-group>), each of which may come from a particular source or ontology (such as “author-created” or the “MeSH Subject Headings”). The source is named using the Keyword Authority attribute (@kwd-group-type). Here are some sample tagged keywords that a contributor chose as best describing a book:

<kwd-group kwd-group-type="author-created">
   <kwd>acid precipitation</kwd>
   <kwd>acid rainfall</kwd>    
   <kwd>smelting region</kwd>
   <kwd>Aluminum residues</kwd>
   <kwd>Sulphur dioxide</kwd>
   <kwd>Copper-nickel smelters</kwd>
</kwd-group>
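
Because each group names its own source, a single book can carry several keyword groups side by side. The following is only an illustrative sketch: the “MeSH” value for @kwd-group-type and the terms placed in that group are assumptions made for this example, not values prescribed by this Tag Set.

```xml
<kwd-group kwd-group-type="author-created">
   <kwd>acid precipitation</kwd>
   <kwd>Sulphur dioxide</kwd>
</kwd-group>
<!-- Assumed for illustration: a second group drawn from a controlled vocabulary -->
<kwd-group kwd-group-type="MeSH">
   <kwd>Acid Rain</kwd>
   <kwd>Sulfur Dioxide</kwd>
</kwd-group>
```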

Versions of this Tag Set prior to 3.0 allowed multiple sets of keywords, but the individual keywords in a set had no structure; they were just text (words and phrases), possibly with face markup, superscripts, and subscripts (such as “<kwd>XML</kwd>”, “<kwd>H<sub>2</sub>O</kwd>”, and “<kwd>blood-brain barrier</kwd>”). Version 3.0 now accommodates more elaborate keyword structures.

Tagging Complex/Compound/Nested Keywords

Some keywords possess an internal structure of their own; for example, a keyword may include both a textual phrase and its corresponding code (“863 Icelandic sagas”). Many styles of such compound keywords can be handled in this Tag Set with the <compound-kwd> element, which is modeled as a series of repeatable parts (<compound-kwd-part>). These parts can differentiate a text/code pair, divide a coded keyword into multiple code segments, describe a hierarchy, and name a variety of other compound structures. The @content-type attribute on the <compound-kwd-part> element is used to name each part, describe the role it plays, or otherwise define how each part functions within the keyword as a whole.
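
As a rough illustration of the “multiple code segments” case, a coded keyword might be divided as sketched here. The classification scheme, the @kwd-group-type value, the @content-type values, and the code itself are all hypothetical, invented only to show the shape of the tagging:

```xml
<!-- Hypothetical scheme: group type, part names, and code are invented for illustration -->
<kwd-group kwd-group-type="example-classification">
   <compound-kwd>
      <compound-kwd-part content-type="class">B</compound-kwd-part>
      <compound-kwd-part content-type="division">12</compound-kwd-part>
      <compound-kwd-part content-type="section">4</compound-kwd-part>
   </compound-kwd>
</kwd-group>
```

Naming each segment with @content-type lets a processing system reassemble the full code or search on a single segment.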

Keywords with Codes

The simplest case of a compound keyword is a keyword that includes both a textual phrase and its corresponding code, for example, “863 Icelandic sagas”. Both the code and the text can be tagged as keyword parts (<compound-kwd-part>) inside the element <compound-kwd>, with the @content-type attribute used to name the role or type of each part:

 <kwd-group kwd-group-type="ISO-463">
   <compound-kwd>
     <compound-kwd-part 
        content-type="ISO-463-code">863</compound-kwd-part>
     <compound-kwd-part
        content-type="ISO-463-text">Icelandic sagas</compound-kwd-part>
   </compound-kwd>
   ...
 </kwd-group>

Abbreviation and Expansion Keywords

Compound keywords can also be used to handle keywords that hold an abbreviation and its expansion. Both the abbreviation and the expansion are tagged as <compound-kwd-part> in a single <compound-kwd>. The @kwd-group-type attribute on <kwd-group>, which is sometimes used to name the source or the descriptor for the keywords, can be used instead to name the type of information, such as “abbreviations”. For example:

 <kwd-group kwd-group-type="abbreviations">
    <compound-kwd>
      <compound-kwd-part content-type="abbrev">WT</compound-kwd-part>
      <compound-kwd-part content-type="expansion">WildType</compound-kwd-part>
    </compound-kwd>
    <compound-kwd>
      <compound-kwd-part content-type="abbrev">CFU</compound-kwd-part>
      <compound-kwd-part content-type="expansion">Colony-forming unit</compound-kwd-part>
    </compound-kwd>
 </kwd-group>

Keywords vs. Subjects

Some publishers assign hierarchical topics to book sections such as chapters or articles (<book-part>). For example, a publisher might tag selected topics (“Blood–brain barrier”), nested inside themes (“Cellular and Molecular Biology”), grouped into larger units like “Neuroscience”, and grouped into still larger units such as “Biological Sciences”, forming the following hierarchy describing a chapter:

Biological Sciences
   Neuroscience
      Cellular and Molecular Biology
         Blood–brain barrier

This kind of structure places the part of the book in context or divides chapters into categories, as commonly seen in Tables of Contents, where all the Neuroscience chapters are grouped together, all the Biochemistry chapters are grouped together, and so on. Since keywords are intended for searching, exact lookup, and retrieval of a part of a book rather than for establishing that part’s context, best practice is to tag this topic structure as subject groups within the book part categories:

<book-part-categories>
  <subj-group subj-group-type="keywords">
    <subject>Biological Sciences</subject>
    <subj-group subj-group-type="keywords">
      <subject>Neuroscience</subject>
      <subj-group subj-group-type="keywords">
        <subject>Cellular and Molecular Biology</subject>
        <subj-group subj-group-type="keywords">
          <subject>Blood–brain barrier</subject>
        </subj-group>
      </subj-group>
    </subj-group>
  </subj-group>
</book-part-categories>

If an archive wishes to tag these hierarchical structures as keywords instead, the following tagging is also acceptable in this Tag Set. Although <compound-kwd-part> is not recursive, the hierarchy may be preserved by setting the @kwd-group-type attribute to “hierarchical”, “topical”, or “subject hierarchy”, and using the @content-type attribute on each part to indicate its hierarchical level:

<kwd-group kwd-group-type="hierarchical">   
  <compound-kwd>
    <compound-kwd-part content-type="level1">Biological Sciences</compound-kwd-part>
    <compound-kwd-part content-type="level2">Neuroscience</compound-kwd-part>
    <compound-kwd-part content-type="level3">Cellular and Molecular Biology</compound-kwd-part> 
    <compound-kwd-part content-type="level4">Blood–brain barrier</compound-kwd-part>
  </compound-kwd>
</kwd-group>
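
A book part that falls under more than one branch of the hierarchy could, as a sketch, repeat <compound-kwd> once per path within the same group. The second path here (ending at “Biochemistry”) is invented for illustration and reuses the “levelN” naming from the example above; paths need not be the same depth:

```xml
<kwd-group kwd-group-type="hierarchical">
  <compound-kwd>
    <compound-kwd-part content-type="level1">Biological Sciences</compound-kwd-part>
    <compound-kwd-part content-type="level2">Neuroscience</compound-kwd-part>
    <compound-kwd-part content-type="level3">Cellular and Molecular Biology</compound-kwd-part>
    <compound-kwd-part content-type="level4">Blood–brain barrier</compound-kwd-part>
  </compound-kwd>
  <!-- Assumed second path, shown only to illustrate repeating compound-kwd -->
  <compound-kwd>
    <compound-kwd-part content-type="level1">Biological Sciences</compound-kwd-part>
    <compound-kwd-part content-type="level2">Biochemistry</compound-kwd-part>
  </compound-kwd>
</kwd-group>
```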