Search BibTeX files for references


 

Description

  • Search pattern
    Patterns and composite patterns can be approximate strings, plain strings, regular expressions, or wildcard filters. Patterns admit Unicode characters. The scope of each pattern can be the reference as a whole or a particular reference field. The fields year, file, and journal are treated specially. The field year has the qualifiers Exact, Newer, and Older. The field file can optionally refer to either the filename or the contents of that file. Finally, for journal, the input pattern is also expanded to the journal full name, if available, and both are checked against the actual journal field contents and, if available, its expanded contents. For example, typing 'ijqc' retrieves all references whose journal is 'Int. J. Quantum Chem.', and typing 'chemistry' retrieves any of 'J. Math. Chem.', 'J. Phys. Chem.', etc. This expansion is not performed when the pattern scope is set to all.
  • Search scope
    By default, searches are performed on the current BibTeX output file. If Scan all BibTeX files is checked, the search extends to all BibTeX files (extension .bib) present in the current directory. It may therefore be convenient to group all reference files in one common directory, or to have them linked to that directory. When Scan linked documents is checked, and the scope of one or more patterns is all or file, the contents of the file named in the file field are converted to text and scanned for that pattern. See the Configuring Utilities section to configure the external to-text converter.
  • Search modifier
    The cb2Bib converts TeX-encoded characters to Unicode when parsing the references. This permits, for instance, the pattern 'Møller' to retrieve either 'Møller' or 'M{\o}ller', regardless of how the BibTeX reference is written. By checking Simplify source, the reference and the converted PDF files are simplified to plain ASCII. Thus, the pattern '\bMoller\b' will hit any of 'Møller', 'M{\o}ller', or 'Moller'. Additionally, all non-word characters are removed, preserving only the ASCII word structure of the source. Note that source simplification is only performed for patterns whose scope is all or file contents, and that, so far, the cb2Bib implements only a subset of such conversions. The implemented TeX to Unicode conversions can be easily checked by entering a reference. The Unicode to ASCII letter-only conversion, on the other hand, is the one that the cb2Bib also uses to write the reference IDs and, hence, to rename dropped files. The cb2Bib also understands minor subscript and superscript formatting. For instance, the pattern 'H2O' will retrieve 'H2O' from a BibTeX string 'H$_{2}$O'.
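
The Simplify source behaviour described under Search modifier can be illustrated with a minimal Python sketch. This is not the cb2Bib's implementation: the conversion tables, the function names tex_to_unicode and simplify, and the choice to replace non-word characters by single spaces are all assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny conversion tables (not the cb2Bib's own).
TEX_TO_UNICODE = {r"{\o}": "ø", r"{\O}": "Ø", r"{\ae}": "æ"}
UNICODE_TO_ASCII = {"ø": "o", "Ø": "O", "æ": "ae"}


def tex_to_unicode(text):
    """Convert a few TeX-encoded characters and simple sub/superscripts."""
    for tex, char in TEX_TO_UNICODE.items():
        text = text.replace(tex, char)
    # Minor sub/superscript formatting: H$_{2}$O becomes H2O.
    return re.sub(r"\$[_^]\{([^{}]*)\}\$", r"\1", text)


def simplify(text):
    """Reduce text to plain ASCII words separated by single spaces."""
    text = tex_to_unicode(text)
    # Letters such as ø have no Unicode decomposition, so map them explicitly.
    for char, plain in UNICODE_TO_ASCII.items():
        text = text.replace(char, plain)
    # Strip accents: decompose, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    # Drop anything still outside ASCII.
    text = text.encode("ascii", "ignore").decode("ascii")
    # Assumption: non-word characters collapse to one space, keeping word breaks.
    return re.sub(r"\W+", " ", text).strip()


pattern = re.compile(r"\bMoller\b")
for source in ("Møller", r"M{\o}ller", "Moller"):
    print(source, bool(pattern.search(simplify(source))))
```

Under these assumptions, all three spellings reduce to the same ASCII word, so a single word-bounded pattern hits each of them.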

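The special handling of the journal and year fields described under Search pattern can be sketched as follows. The abbreviation table, the function names, and the strict reading of Newer and Older are hypothetical; the sketch only illustrates the idea of checking both the pattern and its expansion against both the field and its expansion.

```python
# Hypothetical abbreviation table: code -> (abbreviated name, full name).
JOURNALS = {
    "ijqc": ("Int. J. Quantum Chem.",
             "International Journal of Quantum Chemistry"),
    "jmc": ("J. Math. Chem.", "Journal of Mathematical Chemistry"),
    "jpc": ("J. Phys. Chem.", "Journal of Physical Chemistry"),
}


def journal_matches(pattern, field):
    """True if the pattern, or its expansion, hits the field or its expansion."""
    # Expand the input pattern to the abbreviated and full names, if known.
    patterns = [pattern]
    if pattern.lower() in JOURNALS:
        patterns.extend(JOURNALS[pattern.lower()])
    # Expand the journal field contents to the full name, if known.
    targets = [field]
    for abbreviated, full in JOURNALS.values():
        if field == abbreviated:
            targets.append(full)
    return any(p.lower() in t.lower() for p in patterns for t in targets)


def year_matches(year, pattern_year, qualifier="Exact"):
    """Apply the Exact, Newer, or Older qualifier to a year field."""
    if qualifier == "Exact":
        return year == pattern_year
    if qualifier == "Newer":
        return year > pattern_year  # assumption: strictly newer
    if qualifier == "Older":
        return year < pattern_year  # assumption: strictly older
    raise ValueError(f"unknown qualifier: {qualifier}")


print(journal_matches("ijqc", "Int. J. Quantum Chem."))
print(journal_matches("chemistry", "J. Math. Chem."))
print(year_matches(2010, 2007, "Newer"))
```

In this sketch, 'chemistry' matches 'J. Math. Chem.' only because the field is expanded to its full name before the comparison.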
Notes

  • The cb2Bib uses an internal cache to speed up the search of linked files. By default, data is stored as current_file.bib.c2b. It might be more convenient, however, to set up a temporary directory outside the user data backup directories. See Search In Files Cache Directory in Configuring Files. When a linked file is processed for the first time, the cb2Bib performs several string manipulations, such as removing end-of-line hyphenations. This process is time-consuming for very large files.
  • The approximate string search is described in reference http://arxiv.org/abs/0705.0751v1. It reduces the chance of missing a hit due to transcription and decoding errors in the document files. Approximate string search is also a form of serendipitous information retrieval.
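
To give a feel for how an approximate search tolerates transcription and decoding errors, here is a minimal Python sketch. It uses a standard edit-distance substring match (the Sellers dynamic-programming scheme), which is a stand-in technique and not the method of the reference cited above; the function names and the max_errors parameter are hypothetical.

```python
def best_match_distance(pattern, text):
    """Smallest edit distance between pattern and any substring of text."""
    # previous[j] is the best distance between the pattern prefix processed
    # so far and some substring of text ending at position j. The empty
    # prefix matches anywhere at zero cost, so a match may start anywhere.
    previous = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        current = [i]  # pattern[:i] against the empty text costs i edits
        for j, t in enumerate(text, start=1):
            current.append(min(
                previous[j] + 1,             # pattern character left unmatched
                current[j - 1] + 1,          # extra character in the text
                previous[j - 1] + (p != t),  # match or substitution
            ))
        previous = current
    return min(previous)


def approx_find(pattern, text, max_errors=1):
    """True if pattern occurs in text with at most max_errors edits."""
    return best_match_distance(pattern.lower(), text.lower()) <= max_errors


print(approx_find("Moller", "C. Mo1ler and M. S. Plesset"))
print(approx_find("hyphenation", "end of line hyphen ation"))
print(approx_find("quantum", "classical mechanics"))
```

A single mis-decoded character or a stray space costs one edit, so both of the first two searches still hit with max_errors=1, while the unrelated third one does not.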