Search BibTeX files for references
Description
- Search pattern
Patterns and composite patterns can be approximate strings,
strings, regular expressions, or wildcard filters. Patterns may contain Unicode
characters. The scope of each pattern can be the reference as a whole or
a particular reference field. The fields year,
file, and journal receive special treatment. The field
year has the qualifiers Exact, Newer, and
Older. The field file can optionally refer to either
the filename or the contents of that file. Finally, for journal,
the input pattern is also expanded to the full journal name, if available, and
both forms are checked against the actual contents of the journal field and, if
available, its expanded contents. For example, typing 'ijqc' retrieves all
references whose journal is 'Int. J. Quantum Chem.', and typing
'chemistry' retrieves any of 'J. Math. Chem.', 'J. Phys. Chem.', etc. This
expansion is not performed when the pattern scope is set to
all.
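The journal expansion described above can be illustrated with a minimal Python sketch. It is not the cb2Bib implementation: the JOURNALS table, the function names, and the case-insensitive substring test are assumptions made only for this example.

```python
# Hypothetical abbreviation table: code -> (abbreviated name, full name).
# The real cb2Bib journal list and its format are not reproduced here.
JOURNALS = {
    "ijqc": ("Int. J. Quantum Chem.", "International Journal of Quantum Chemistry"),
    "jmc": ("J. Math. Chem.", "Journal of Mathematical Chemistry"),
    "jpc": ("J. Phys. Chem.", "Journal of Physical Chemistry"),
}


def full_name(text):
    """Return the full journal name for a code or abbreviation, or None."""
    key = text.strip().lower()
    for code, (abbreviated, full) in JOURNALS.items():
        if key in (code, abbreviated.lower()):
            return full
    return None


def journal_matches(pattern, journal_field):
    """Check the pattern and its expansion against the field and its expansion."""
    patterns = [pattern]
    expanded_pattern = full_name(pattern)
    if expanded_pattern:
        patterns.append(expanded_pattern)

    targets = [journal_field]
    expanded_field = full_name(journal_field)
    if expanded_field:
        targets.append(expanded_field)

    # Assumption: a plain case-insensitive substring test stands in for
    # whichever pattern type the user selected.
    return any(p.lower() in t.lower() for p in patterns for t in targets)
```

With this table, journal_matches('ijqc', 'Int. J. Quantum Chem.') succeeds through the expanded pattern, and journal_matches('chemistry', 'J. Math. Chem.') succeeds through the expanded field contents.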
- Search scope
By default, searches are performed on the current BibTeX output file. If Scan
all BibTeX files is checked, the search extends to all BibTeX files
(extension .bib) present in the current directory. It might therefore be
convenient to group all reference files in one common directory, or to have them
linked to that directory. When Scan linked documents is checked, and the scope of one
or more patterns is all or file, the contents of
the file referenced in the file field are converted to text and scanned for the given
pattern. See the Configuring Utilities section to
configure the external to-text converter.
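As a rough illustration of the two search scopes, the following Python sketch selects the files to scan. The function name and the reading of "current directory" as the directory containing the current BibTeX file are assumptions made for this example, not cb2Bib behavior taken from its source.

```python
from pathlib import Path


def files_to_search(current_file, scan_all=False):
    """Return the BibTeX files a search would visit.

    Assumption: "current directory" means the directory that holds
    the current BibTeX output file.
    """
    current = Path(current_file)
    if not scan_all:
        # Default scope: only the current BibTeX output file.
        return [current]
    # Extended scope: every *.bib entry in that directory. Files linked
    # into the directory are matched by name like any other entry.
    return sorted(current.parent.glob("*.bib"))
```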
- Search modifier
The cb2Bib converts TeX encoded characters to Unicode when parsing the
references. This allows, for instance, the pattern 'Møller' to
retrieve either 'Møller' or 'M{\o}ller', regardless of how the BibTeX
reference is written. When Simplify source is checked, the reference and the
converted PDF files are simplified to plain ASCII. Thus, the pattern '\bMoller\b'
will hit any of 'Møller', 'M{\o}ller', or 'Moller'. Additionally, all
non-word characters are removed, preserving only the ASCII word structure of the
source. Note that source simplification is only performed for patterns whose
scope is all or file contents, and that, so far, the cb2Bib
implements only a subset of such conversions. The implemented TeX to Unicode conversions can
be easily checked by entering a reference. The Unicode to ASCII letter-only
conversion, on the other hand, is the one that the cb2Bib also uses to write the
reference IDs and, hence, to rename dropped files. The cb2Bib can also
understand minor sub and superscript formatting. For instance, the pattern 'H2O'
will retrieve 'H2O' from a BibTeX string 'H$_{2}$O'.
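The conversions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the cb2Bib code: the two conversion tables are hypothetical and deliberately tiny, and non-word characters are replaced by a single space here so that word boundaries survive.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny tables; the actual cb2Bib tables differ.
TEX_TO_UNICODE = {r"{\o}": "ø", r"{\O}": "Ø", r"{\ae}": "æ"}
# Letters that Unicode normalization does not decompose into ASCII + accent.
NO_DECOMPOSITION = {"ø": "o", "Ø": "O", "æ": "ae", "ł": "l", "ß": "ss"}


def tex_to_unicode(text):
    """Convert a few TeX-encoded characters and flatten sub/superscripts."""
    for tex, char in TEX_TO_UNICODE.items():
        text = text.replace(tex, char)
    # Minor sub and superscript formatting: H$_{2}$O -> H2O
    return re.sub(r"\$[_^]\{([^{}$]*)\}\$", r"\1", text)


def simplify(text):
    """Reduce text to plain ASCII words (the 'Simplify source' idea)."""
    text = tex_to_unicode(text)
    for char, ascii_char in NO_DECOMPOSITION.items():
        text = text.replace(char, ascii_char)
    # Split accented letters into base letter + combining mark, then
    # drop everything that is not ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Assumption: runs of non-word characters collapse to one space.
    return re.sub(r"\W+", " ", ascii_text).strip()
```

For example, re.search(r"\bMoller\b", simplify("Møller-Plesset theory")) finds a match, and tex_to_unicode("H$_{2}$O") gives 'H2O'.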
Notes
- The cb2Bib uses an internal cache to speed up the search of linked files. By
default, data is stored as
current_file.bib.c2b. It might be more
convenient, however, to set up a temporary directory outside the user data backup
directories. See Search In Files Cache Directory in Configuring Files. When a linked file is
processed for the first time, the cb2Bib performs several string manipulations, such
as removing end-of-line hyphenations. This process is time-consuming for very
large files.
- The approximate string search is described in http://arxiv.org/abs/0705.0751v1. It reduces the chance of missing a
hit due to transcription and decoding errors in the document files. Approximate
string search is also a form of serendipitous information retrieval.
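To illustrate why the first pass over a linked file is worth caching, here is a minimal Python sketch of end-of-line hyphenation removal combined with a cache. The in-memory dictionary and the function names are hypothetical; the on-disk format of the .c2b cache file is not reproduced.

```python
import re

# Hypothetical in-memory cache: file name -> cleaned text.
_cache = {}


def join_hyphenated(text):
    """Rejoin words that were split by a hyphen at the end of a line."""
    return re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)


def cached_text(name, load):
    """Clean a document once and reuse the result on later searches.

    `load` is any callable returning the converted text of `name`.
    """
    if name not in _cache:
        _cache[name] = join_hyphenated(load(name))
    return _cache[name]
```

Note that this simple rule also joins genuine compounds that happen to break at a hyphen, so 'time-' followed by 'consuming' on the next line becomes 'timeconsuming'.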