PubMed contains citations and abstracts of biomedical literature from several NLM literature resources, including MEDLINE—the largest component of the PubMed database. For an overview of PubMed and its contents, visit the About PubMed page: https://pubmed.ncbi.nlm.nih.gov/about/.

Downloading PubMed Data

PubMed data are available via our FTP servers and via the E-utilities API.

FTP download

Once a year, NLM releases a complete (baseline) set of PubMed citation records in XML format for download. Incremental update files are then released daily and include new, revised, and deleted citations. The PubMed DTD documents any changes to the record structure and allowed elements from year to year. (Note: Binary mode must be used when downloading data from our FTP servers.)
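As a rough illustration of how a baseline or update file might be processed, the Python sketch below streams one file and separates new or revised citations from deleted ones. It assumes that the files are gzip-compressed XML, that each citation is wrapped in <PubmedArticle>, and that deletions are listed as <PMID> children of <DeleteCitation>; check these assumptions against the current DTD. The function names and the sample data are invented for the example.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Invented sample data shaped like a tiny update file (not real citations).
SAMPLE_UPDATE = b"""<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM"><PMID Version="1">111</PMID></MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="In-Process" Owner="NLM"><PMID Version="1">222</PMID></MedlineCitation>
  </PubmedArticle>
  <DeleteCitation><PMID Version="1">333</PMID></DeleteCitation>
</PubmedArticleSet>"""


def summarize_pubmed_file(stream):
    """Return (citation_pmids, deleted_pmids) for one PubMed XML stream."""
    citation_pmids = []
    deleted_pmids = []
    # iterparse reads the file incrementally, so large files are not
    # loaded into memory all at once.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "PubmedArticle":
            # Only the direct PMID child of MedlineCitation identifies the record.
            pmid = elem.findtext("MedlineCitation/PMID")
            if pmid is not None:
                citation_pmids.append(pmid)
            elem.clear()  # release the parsed subtree
        elif elem.tag == "DeleteCitation":
            deleted_pmids.extend(p.text for p in elem.findall("PMID"))
            elem.clear()
    return citation_pmids, deleted_pmids


def summarize_gzipped_file(path):
    """Open a downloaded .xml.gz file in binary mode and summarize it."""
    with gzip.open(path, "rb") as stream:
        return summarize_pubmed_file(stream)


citations, deleted = summarize_pubmed_file(io.BytesIO(SAMPLE_UPDATE))
```

When applying update files to a local copy of the baseline, processing them in release order keeps the latest revision of each citation.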

Note: Book citations are not included in the FTP files; however, they can be retrieved from the web interface or with the E-utilities API.

NCBI E-utilities API

The Entrez Programming Utilities (E-utilities) consist of eight server-side programs that provide a stable interface into the Entrez query and database system at the National Center for Biotechnology Information (NCBI).
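A minimal Python sketch of retrieving PubMed records in XML through the E-utilities follows. It assumes the commonly documented EFetch endpoint under https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ and the db, id, retmode, and api_key parameters; consult the E-utilities documentation for current parameters and usage limits. The function names and the placeholder PMIDs are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL for the E-utilities; verify against the NCBI documentation.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(pmids, api_key=None):
    """Build an EFetch URL requesting PubMed records as XML."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(pmid) for pmid in pmids),
        "retmode": "xml",
    }
    if api_key:
        # An API key is optional; it is passed through unchanged.
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def fetch_records(pmids, api_key=None, timeout=30):
    """Download the XML for the given PMIDs and return it as bytes."""
    with urlopen(efetch_url(pmids, api_key), timeout=timeout) as response:
        return response.read()


# Placeholder PMIDs, used here only to show the resulting URL shape.
example_url = efetch_url([111, 222])
```

Building the URL separately from the network call makes the request easy to inspect before any data is fetched.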

For more information, visit the PubMed User Guide’s Downloading PubMed section: https://pubmed.ncbi.nlm.nih.gov/help/#download-pubmed-data

Some notes about PubMed data

The use of "Medline" in an element name does not mean the record represents a citation from a MEDLINE-selected journal. When the NLM DTDs and XML elements were first created, MEDLINE records were the only data exported. Now NLM exports citations other than MEDLINE records. To minimize unnecessary disruption to users of the data, NLM has retained the original element names (e.g., MedlineCitation, MedlineJournalInfo, MedlineTA).
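Because the element name alone does not indicate MEDLINE status, a program needs another signal. The Python sketch below reads the Status attribute of <MedlineCitation>, on the assumption that values such as "MEDLINE" and "PubMed-not-MEDLINE" are the appropriate indicator; see the Status attribute page for the full list of allowed values. The function name and the sample record are invented.

```python
import xml.etree.ElementTree as ET

# Invented record: the element is named MedlineCitation even though
# its Status says the citation is not a MEDLINE record.
SAMPLE_RECORD = """<PubmedArticle>
  <MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
    <PMID Version="1">444</PMID>
  </MedlineCitation>
</PubmedArticle>"""


def citation_status(record_xml):
    """Return the Status attribute of the record's MedlineCitation, or None."""
    root = ET.fromstring(record_xml)
    if root.tag == "MedlineCitation":
        citation = root
    else:
        citation = root.find(".//MedlineCitation")
    if citation is None:
        return None
    return citation.get("Status")


status = citation_status(SAMPLE_RECORD)
is_medline = status == "MEDLINE"
```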

Policies affecting data creation have evolved over the years. Some PubMed records are added or revised well after the cited article was first published. In these cases, a record may occasionally contain an element that had not yet been created when the article was published. For example, the Abstract element was not created until 1975, yet some records for articles published before 1975 that were added to PubMed after 1975 contain <Abstract>. An element may also be treated differently from the way it would have been treated had the record been created or maintained near the time the article was published. For example, the number of <Author> occurrences can diverge from the policies stated in the NLM author indexing policy (https://pubmed.ncbi.nlm.nih.gov/help/#author-indexing-policy). Lastly, as of October 2016, the publisher of the original article can edit the PubMed record’s citation data, with the exception of MeSH data, using the PubMed Data Management system. Records for older citations may therefore contain data for elements that did not exist when the citation was created.
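One practical consequence is that code should test for the presence of an element rather than infer it from the publication date. The Python sketch below flags records that carry <Abstract> despite a publication year before 1975. It assumes the year is found at JournalIssue/PubDate/Year and simply skips records that give the date in another form (for example, as free text); the function name and the sample record are invented.

```python
import xml.etree.ElementTree as ET

# Invented record: published in 1970 but carrying an Abstract element.
SAMPLE_OLD_RECORD = """<PubmedArticle>
  <MedlineCitation Status="MEDLINE" Owner="NLM">
    <PMID Version="1">555</PMID>
    <Article PubModel="Print">
      <Journal>
        <JournalIssue CitedMedium="Print">
          <PubDate><Year>1970</Year></PubDate>
        </JournalIssue>
      </Journal>
      <ArticleTitle>Example title.</ArticleTitle>
      <Abstract><AbstractText>Example abstract.</AbstractText></Abstract>
    </Article>
  </MedlineCitation>
</PubmedArticle>"""


def has_abstract_before(record_xml, year=1975):
    """Return True if the record has an Abstract and a PubDate Year before `year`."""
    root = ET.fromstring(record_xml)
    year_text = root.findtext(".//JournalIssue/PubDate/Year")
    if year_text is None or not year_text.strip().isdigit():
        # No structured year available; this sketch does not parse other date forms.
        return False
    has_abstract = root.find(".//Article/Abstract") is not None
    return has_abstract and int(year_text) < year


flagged = has_abstract_before(SAMPLE_OLD_RECORD)
```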

Using this documentation

This site provides annotations and examples for all elements and attributes defined in the current PubMed DTD.

Element pages include:

  • A description or other notes regarding the data included in the element
  • A Content Model describing the expected contents, including syntax and any parent or child elements
  • Valid attributes
  • Sample XML

Attribute pages include:

  • Associated elements
  • Allowed values, where specified
  • A description or other notes regarding the attribute

Note: MathML and General Entities do not include additional descriptive text or examples.

The menu in the left sidebar expands to show a list of all Elements and Attributes in alphabetical order. On each page, the names of Elements and Attributes are hyperlinked for easy navigation within this tool.

If you have any questions, please contact:

  • National Center for Biotechnology Information
  • info@ncbi.nlm.nih.gov