Reading and writing bibliographic metadata
Metadata in scientific documents is, unfortunately, rarely appreciated and not widely used. For bibliographic metadata the situation is even more disappointing: there is no accepted format specification, and the reliability of publishers' metadata, when present at all, is questionable in many cases.
The cb2Bib reads all XMP packets found in the document (XMP is an XML-based
standard devised for metadata storage). It then parses the XML strings, looking
for nodes and attributes whose key names are meaningful to bibliographic
references. If a given bibliographic field is found in multiple packets, the
cb2Bib takes the last one, which most often, and according to the PDF specs, is
the most up-to-date one.
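The reading step can be illustrated with a minimal Python sketch. It is not the cb2Bib's actual implementation: the set of recognized keys (BIB_KEYS) and the function names are hypothetical, and the sketch assumes the packets are stored uncompressed in the file and that field values appear as direct element text or as attributes.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical set of key names treated as bibliographic fields.
BIB_KEYS = {"author", "title", "journal", "year", "volume", "pages", "doi"}

# An XMP packet is delimited by <?xpacket begin=...?> and <?xpacket end=...?>.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def local_name(name):
    """Drop the '{namespace}' part that ElementTree prepends to names."""
    return name.rsplit("}", 1)[-1].lower()


def read_bibliographic_metadata(data):
    """Collect bibliographic fields from every XMP packet in raw bytes."""
    fields = {}
    for match in PACKET_RE.finditer(data):
        try:
            root = ET.fromstring(match.group(1).strip())
        except ET.ParseError:
            continue  # skip malformed packets
        for node in root.iter():
            # Look at attributes first, then at the element text.
            for name, value in node.attrib.items():
                key = local_name(name)
                if key in BIB_KEYS and value.strip():
                    fields[key] = value.strip()
            key = local_name(node.tag)
            text = (node.text or "").strip()
            if key in BIB_KEYS and text:
                # Packets are visited in file order, so the last one wins.
                fields[key] = text
    return fields
```

Calling read_bibliographic_metadata(open(path, "rb").read()) returns a plain dictionary; because later packets overwrite earlier ones, a field updated by an incremental save replaces the older value.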
The metadata is then summarized in the cb2Bib clipboard panel as, for instance
[Bibliographic Metadata <title>arXiv:0705.0751v1 [cs.IR] 5 May 2007</title> /Bibliographic Metadata]
This data, whenever the user considers it correct, can be easily imported
by the built-in 'Heuristic Guess' capability. On the other hand, if keys are found
with the prefix
Once an extracted reference is saved and there is a document attached to it, the cb2Bib will optionally insert the bibliographic metadata into the document itself. The cb2Bib writes an XMP packet as, for instance,
<bibtex:author>P. Constans</bibtex:author>
<bibtex:journal>arXiv 0705.0751</bibtex:journal>
<bibtex:title>Approximate textual retrieval</bibtex:title>
<bibtex:type>article</bibtex:type>
<bibtex:year>2007</bibtex:year>
which is similar to JabRef, but differs in that the cb2Bib adheres strictly to BibTeX and avoids (perhaps unnecessary) syntax specialization in author strings.
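As an illustration of how such a packet could be assembled from a saved reference, here is a minimal Python sketch. It is not the cb2Bib's own code: the function name build_xmp_packet is hypothetical, and the namespace URI bound to the bibtex prefix is a placeholder, since the actual URI is not given here. The surrounding xpacket, x:xmpmeta and rdf:RDF wrappers follow the usual XMP packet layout.

```python
from xml.sax.saxutils import escape

# Placeholder namespace URI: an assumption made for illustration only.
BIBTEX_NS = "http://example.org/bibtex/"


def build_xmp_packet(reference, ns_uri=BIBTEX_NS):
    """Serialize a dict of BibTeX fields as a minimal XMP packet string."""
    lines = [
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>',
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">',
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">',
        f'<rdf:Description rdf:about="" xmlns:bibtex="{ns_uri}">',
    ]
    # One plain element per field; values are XML-escaped but otherwise
    # written exactly as they appear in the BibTeX entry.
    for field in sorted(reference):
        value = escape(str(reference[field]))
        lines.append(f"<bibtex:{field}>{value}</bibtex:{field}>")
    lines += [
        "</rdf:Description>",
        "</rdf:RDF>",
        "</x:xmpmeta>",
        '<?xpacket end="w"?>',
    ]
    return "\n".join(lines)
```

Author strings are emitted unchanged, in keeping with the point made above about avoiding special author syntax; only characters such as & and < are escaped so that the packet stays well-formed XML.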
The BibTeX fields
The actual writing of the packet into the document is performed by ExifTool, an excellent Perl program written by Phil Harvey. See http://www.sno.phy.queensu.ca/~phil/exiftool/. ExifTool supports several document formats for writing. The most relevant here are PostScript and PDF. For PDF documents, metadata is written as an incremental update of the document. This exactly preserves the binary structure of the original document, and changes can be easily reversed or modified if so desired. Whenever ExifTool is unable to insert metadata, e.g., because the document format is not supported or the document has structural errors, the cb2Bib will issue an informational message, and the document will remain untouched.
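The hand-off to ExifTool can be sketched in Python as follows. How the cb2Bib actually invokes ExifTool is not described here, so the command line is an assumption: the sketch uses ExifTool's -TAG<=DATFILE syntax to set the XMP block from a file holding the packet. The function names are hypothetical.

```python
import shutil
import subprocess


def build_exiftool_command(packet_path, document_path, exiftool="exiftool"):
    """Return the argument list asking ExifTool to embed an XMP file."""
    # "-XMP<=FILE" sets the XMP block from the contents of FILE.
    return [exiftool, f"-XMP<={packet_path}", document_path]


def insert_metadata(packet_path, document_path, exiftool="exiftool"):
    """Run ExifTool; return (ok, message) instead of raising on failure."""
    if shutil.which(exiftool) is None:
        return False, f"{exiftool} not found"
    result = subprocess.run(
        build_exiftool_command(packet_path, document_path, exiftool),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Unsupported format, structural errors, etc.: report, do not raise.
        message = result.stderr.strip() or "ExifTool could not write metadata"
        return False, message
    return True, result.stdout.strip()
```

Returning a status and a message, rather than raising, mirrors the behavior described above: a failed insertion produces a message for the user and nothing else. Note that, by default, ExifTool also keeps a backup copy of the file it modifies, with _original appended to the file name.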
Last modified: December 2014
Copyright © 2001-2014 MOLspaces.com
All Rights Reserved