cb2Bib: Search BibTeX files for references
- Search pattern
Patterns and composite patterns can be approximate strings,
strings, regular expressions, or wildcard filters. Patterns admit Unicode
characters. The scope of each pattern can be the reference as a whole or
a particular reference field. The fields year, file,
and journal are treated specially. The field year has
the qualifiers Exact, Newer, and Older. The
field file can optionally refer to either the filename or the contents
of that file. Finally, for journal, the input pattern is also expanded
to the journal full name, if available, and both forms are checked against the
actual contents of the journal field and, if available, its expanded
contents. For example, typing 'ijqc' retrieves all references whose
journal is 'Int. J. Quantum Chem.', and typing 'chemistry'
retrieves any of 'J. Math. Chem.', 'J. Phys. Chem.', etc. This expansion is not
performed when the pattern scope is set to all.
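
The journal expansion described above can be pictured with a minimal Python sketch. It is not the cb2Bib implementation: the table JOURNALS, the codes 'jmc' and 'jpc', and the function names are hypothetical, and a plain case-insensitive substring test stands in for whichever pattern type is selected.

```python
# Hypothetical journal table: code -> (abbreviated name, full name).
# Only the code 'ijqc' appears in the text above; the rest is illustrative.
JOURNALS = {
    "ijqc": ("Int. J. Quantum Chem.", "International Journal of Quantum Chemistry"),
    "jmc": ("J. Math. Chem.", "Journal of Mathematical Chemistry"),
    "jpc": ("J. Phys. Chem.", "Journal of Physical Chemistry"),
}


def full_name(text):
    """Return the full journal name for a known code or name, else None."""
    key = text.strip().lower()
    for code, (abbreviated, full) in JOURNALS.items():
        if key in (code, abbreviated.lower(), full.lower()):
            return full
    return None


def journal_matches(pattern, journal_field):
    """Check the pattern and its expansion against the field and its expansion."""
    patterns = [pattern]
    if full_name(pattern):
        patterns.append(full_name(pattern))
    targets = [journal_field]
    if full_name(journal_field):
        targets.append(full_name(journal_field))
    return any(p.lower() in t.lower() for p in patterns for t in targets)
```

Under these assumptions, journal_matches('ijqc', 'Int. J. Quantum Chem.') succeeds through the expanded pattern, while journal_matches('chemistry', 'J. Math. Chem.') succeeds through the expanded field contents.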
- Search scope
By default, searches are performed on the current BibTeX output file. If Scan
all BibTeX files is checked, the search extends to all BibTeX files,
extension .bib, present in the current directory. It may therefore be convenient
to group all reference files in one common directory, or have them linked to that
directory. When Scan linked documents is checked, and the scope of one or more
patterns is all or file, the contents of the file referenced in the
file field are converted to text and scanned for the given pattern. See the
Configuring conversion
to plain text section to configure the external text converter.
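
As a rough illustration of the Scan all BibTeX files behaviour, the following Python sketch collects every .bib file in one directory and counts regular expression hits per file. The function names are hypothetical and the sketch searches raw file text, whereas the cb2Bib parses references first.

```python
import re
from pathlib import Path


def bib_files(directory):
    """List the .bib files in a directory; symbolic links to files are followed."""
    return sorted(
        p for p in Path(directory).iterdir() if p.is_file() and p.suffix == ".bib"
    )


def scan_directory(directory, pattern):
    """Map each .bib file name to its number of hits, skipping files with none."""
    regex = re.compile(pattern)
    hits = {}
    for path in bib_files(directory):
        text = path.read_text(encoding="utf-8", errors="replace")
        count = len(regex.findall(text))
        if count:
            hits[path.name] = count
    return hits
```

Because is_file() follows symbolic links, files that are only linked into the common directory are picked up as well.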
- Search modifier
The cb2Bib converts TeX encoded characters to Unicode when parsing the references.
This allows, for instance, the pattern 'Møller' to retrieve either
'Møller' or 'M{\o}ller', regardless of how the BibTeX reference is
written. By checking Simplify source, the reference and the converted PDF
files are simplified to plain ASCII. Thus, the pattern '\bMoller\b' will hit any of
'Møller', 'M{\o}ller', or 'Moller'. Additionally, all non-word characters
are removed, preserving only the ASCII word structure of the source. Note that
source simplification is only performed for the patterns whose scope is
all or file contents, and that, so far, the cb2Bib implements only a
subset of such conversions. Implemented TeX to Unicode conversions can be easily
checked by entering a reference. The Unicode to ASCII letter-only conversion, on
the other hand, is the one that the cb2Bib also uses to write the reference IDs
and, hence, to rename dropped files.
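
The two-stage simplification can be sketched in Python as follows. This is only an illustration under stated assumptions: the conversion tables are tiny hypothetical subsets, and replacing each run of non-word characters with a single space is one guess at how the word structure is preserved.

```python
import re
import unicodedata

# Hypothetical, deliberately small TeX to Unicode table.
TEX_TO_UNICODE = {r"{\o}": "ø", r"{\O}": "Ø", r'{\"o}': "ö"}

# Letters without a Unicode decomposition need an explicit ASCII replacement.
NO_DECOMPOSITION = {"ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE", "ß": "ss"}


def tex_to_unicode(text):
    """Replace the TeX sequences listed in the table with Unicode characters."""
    for tex, char in TEX_TO_UNICODE.items():
        text = text.replace(tex, char)
    return text


def simplify(text):
    """Reduce text to ASCII words separated by single spaces."""
    text = tex_to_unicode(text)
    for char, replacement in NO_DECOMPOSITION.items():
        text = text.replace(char, replacement)
    # Split accented letters into base letter plus combining mark, then drop
    # everything that is not ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Assumption: runs of non-word characters collapse to one space.
    return re.sub(r"[^A-Za-z0-9]+", " ", text).strip()
```

With these tables, re.search(r'\bMoller\b', simplify(s)) finds a hit for s equal to 'Møller', 'M{\o}ller', or 'Moller'.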
- The cb2Bib uses an internal cache to speed up the search of linked files. By
default, data is stored as current_file.bib.c2b. It might be more
convenient, however, to set up a temporary directory outside the user data backup
directories. See Search In Files Cache Directory in Configuring Files. When a linked file is
processed for the first time, the cb2Bib does several string manipulations, such as
removing end-of-line hyphenations. This process is time consuming for very large
files.
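
One of the string manipulations mentioned above, removing end-of-line hyphenations, might look like the following Python sketch. The function name is hypothetical and the rule is a simplification, not the cb2Bib code: it joins any word split by a hyphen at a line end, so a genuinely hyphenated compound that happens to break there loses its hyphen.

```python
import re


def join_hyphenated_lines(text):
    """Rejoin words split by a hyphen at a line end, then unwrap the lines."""
    # 'con-' at the end of one line and 'suming' on the next become 'consuming'.
    text = re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)
    # The remaining line breaks become single spaces.
    return re.sub(r"\s*\n\s*", " ", text).strip()
```

Doing this once and caching the result avoids repeating the work on every search, which is the motivation the paragraph above gives for the cache.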
- The approximate string search is described in reference http://arxiv.org/abs/0705.0751v1. It reduces the chance of missing a
hit due to transcription and decoding errors in the article files.
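
To show why approximate matching helps with transcription errors, here is a small Python sketch. It does not implement the method of the reference above: it swaps in the standard library difflib.SequenceMatcher similarity over a sliding window, and the function name and the 0.8 threshold are arbitrary assumptions.

```python
from difflib import SequenceMatcher


def approx_find(pattern, text, threshold=0.8):
    """Return True if some window of the text is similar enough to the pattern."""
    pattern = pattern.lower()
    text = text.lower()
    width = len(pattern)
    best = 0.0
    # Slide a pattern-sized window over the text and keep the best similarity.
    for start in range(max(1, len(text) - width + 1)):
        window = text[start:start + width]
        best = max(best, SequenceMatcher(None, pattern, window).ratio())
    return best >= threshold
```

For example, approx_find('perturbation theory', 'Møller-Plesset perturbaton theory') still reports a hit although the text misspells 'perturbation', whereas an exact string search would miss it.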