cb2Bib Search BibTeX and PDF Document Files


  • Search pattern
    Patterns and composite patterns can be approximate strings, strings, contexts, regular expressions, or wildcard filters. Patterns admit Unicode characters. The scope of each pattern can be the reference as a whole or a particular reference field. The fields year, file, and journal are treated specially. The field year has the qualifiers Exact, Newer, and Older. The field file can refer to either the filename or the contents of that file. Finally, for journal, the input pattern is also expanded to the journal full name, if available, and both forms are checked against the actual contents of the journal field and, if available, its expanded contents. For example, typing ‘ijqc’ retrieves all references whose journal is ‘Int. J. Quantum Chem.’. Likewise, typing ‘chemistry’ retrieves any of ‘J. Math. Chem.’, ‘J. Phys. Chem.’, etc. This expansion is not performed when the pattern scope is set to all.
  • Search scope
    By default, searches are performed on the current BibTeX output file. If Scan all BibTeX files is checked, the search extends to all BibTeX files (extension .bib) present in the current directory. It may therefore be convenient to group all reference files in one common directory, or to have them linked to that directory. When Scan linked documents is checked, and the scope of one or more patterns is all or file, the contents of the document named in the file field are converted to text and scanned for the given pattern. See the Configuring Utilities section to configure the external to-text converter.
  • Search modifier
    cb2Bib converts TeX encoded characters to Unicode when parsing the references. This allows, for instance, the pattern ‘Møller’ to retrieve either ‘Møller’ or ‘M{\o}ller’, regardless of how the BibTeX reference is written. By checking Simplify source, the reference and the converted PDF files are simplified to plain ASCII. In this way, the pattern ‘\bMoller\b’ will hit any of ‘Møller’, ‘M{\o}ller’, or ‘Moller’. Additionally, all non-word characters are removed, preserving only the ASCII word structure of the source. Note that source simplification is only performed for patterns whose scope is all or file contents, and that, so far, cb2Bib implements only a subset of such conversions. The implemented TeX to Unicode conversions can be easily checked by entering a reference. The Unicode to ASCII letter-only conversion, on the other hand, is the one that cb2Bib also uses to write the reference IDs and, hence, to rename dropped files. cb2Bib can also understand minor subscript and superscript formatting. For instance, the pattern ‘H2O’ will retrieve ‘H2O’ from a BibTeX string H$_{2}$O.
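The effect of Simplify source can be pictured with a small Python sketch. This is only an illustration of the idea, not cb2Bib's own code: the function names, the two tiny conversion tables, and the regular expressions are all assumptions made for the example, and cb2Bib's actual conversion set differs.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny conversion tables (assumptions for this sketch).
TEX_TO_UNICODE = {r"{\o}": "ø", r"{\'e}": "é"}
# Letters without a canonical decomposition need an explicit ASCII mapping.
EXTRA_ASCII = {"ø": "o", "Ø": "O", "æ": "ae", "ß": "ss"}


def tex_to_unicode(text):
    """Replace the TeX escapes listed in TEX_TO_UNICODE by Unicode letters."""
    for tex, uni in TEX_TO_UNICODE.items():
        text = text.replace(tex, uni)
    return text


def flatten_scripts(text):
    """Turn simple sub/superscripts such as H$_{2}$O into H2O."""
    return re.sub(r"\$[_^]\{([^}]*)\}\$", r"\1", text)


def simplify(text):
    """Reduce text to plain ASCII words separated by single spaces."""
    text = tex_to_unicode(text)
    text = "".join(EXTRA_ASCII.get(ch, ch) for ch in text)
    # Split accented letters into base letter plus combining mark,
    # then drop everything that is not ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-word characters into one space.
    return re.sub(r"\W+", " ", text).strip()


for source in ("Møller", r"M{\o}ller", "Moller"):
    print(source, "->", simplify(source), bool(re.search(r"\bMoller\b", simplify(source))))
print(flatten_scripts("H$_{2}$O"))
```

In this sketch all three spellings reduce to the same ASCII word, which is why a single pattern such as ‘\bMoller\b’ can hit each of them.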

Contextual Search

A convenient way to retrieve documents is by matching a set of keywords that appear in close proximity, while disregarding the order in which the words might have been written. cb2Bib considers two types of contextual searches. The first relaxes phrase matching only at the level of the constituent words. It is accessed by selecting Fixed string: Context in the pattern type box. The second, in addition, stems the supplied keywords. It is accessed by selecting Context. By way of stemming, the keyword analyze, for example, will also match analyse, and aluminum will match aluminium too.

The syntax for Context type patterns is summarized in the following table:

Operator   Example                          Expansion

space      contextual search                contextual AND search

|          contextual search|matching       contextual AND (search|match)

+          contextual search|+matching      contextual AND (search|\bmatching\b)

_          contextual_search                contextual.{0,25}search

-          non-parametric                   non.{0,1}parametr

Diacritics and Greek letters:

           naïve search                     (naïve|naive) AND search

           kendall tau                      kendall AND (tau|τ)

In the above examples, the space operator (AND) means that words are matched in any order. Operator _ preserves word order, and operator + prevents stemming and forces an exact word match. Operator - covers words that might have been written joined, hyphenated, or space separated. Diacritics are expanded only if the diacritic mark is specified. That is, naive will not match naïve. On the other hand, Greek letters are expanded only when typed by name.
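The expansions in the table can be mimicked with a short Python sketch. It is a rough illustration under stated assumptions, not cb2Bib's implementation: crude_stem is a hypothetical suffix-stripping stand-in for real stemming (it will not, for example, relate analyze to analyse), all function names are invented here, and the diacritic and Greek letter expansions are left out.

```python
import re


def crude_stem(word):
    """Hypothetical stand-in for stemming: strip a few common suffixes."""
    for suffix in ("ing", "ic", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def alternative_to_regex(alt):
    """Expand one alternative, handling the +, _ and - operators."""
    if alt.startswith("+"):
        # + prevents stemming and forces an exact word match.
        return r"\b" + re.escape(alt[1:]) + r"\b"
    ordered = []
    for part in alt.split("_"):
        # - allows joined, hyphenated, or space separated spellings.
        pieces = [re.escape(crude_stem(p)) for p in part.split("-")]
        ordered.append(".{0,1}".join(pieces))
    # _ keeps word order, allowing up to 25 characters in between.
    return ".{0,25}".join(ordered)


def context_to_regexes(pattern):
    """Return one regular expression per space separated term."""
    regexes = []
    for term in pattern.split():
        alts = [alternative_to_regex(a) for a in term.split("|")]
        regexes.append(alts[0] if len(alts) == 1 else "(?:" + "|".join(alts) + ")")
    return regexes


def context_match(pattern, text):
    """Space means AND: every term must occur somewhere, in any order."""
    flags = re.IGNORECASE | re.DOTALL
    return all(re.search(rx, text, flags) for rx in context_to_regexes(pattern))


print(context_to_regexes("contextual search|+matching"))
print(context_match("contextual search", "A search that is contextual"))
print(context_match("contextual_search", "search, then contextual"))
```

Each term becomes an independent regular expression, so the AND of the space operator is simply a check that all of them are found. The sketch ignores how close the AND-ed terms are to each other, which the actual proximity matching takes into account.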


  • cb2Bib uses an internal cache to speed up the search of linked files. By default, data is stored as current_file.bib.c2b. It might be more convenient, however, to set up a temporary directory outside the user's data backup directories. See Search In Files Cache Directory in Configuring Files. When a linked file is processed for the first time, cb2Bib performs several string manipulations, such as removing end-of-line hyphenations. This process is time consuming for very large files.
  • The approximate string search is described in reference https://arxiv.org/abs/0705.0751. It reduces the chance of missing a hit due to transcription and decoding errors in the document files. Approximate string search is also a form of serendipitous information retrieval.
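
To give a feel for why approximate matching tolerates transcription errors, the following Python sketch uses a generic edit-distance substring search (the classic dynamic programming scheme in which a match may start anywhere in the text). It is not the method of the reference above and is not taken from cb2Bib; the function names and the error threshold are assumptions for the example.

```python
def best_substring_distance(pattern, text):
    """Smallest edit distance between pattern and any substring of text."""
    # Row 0 is all zeros: a match may start at any position of the text.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        curr = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cost = 0 if pc == tc else 1
            curr[j] = min(
                prev[j] + 1,         # pattern character missing from text
                curr[j - 1] + 1,     # extra character in text
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    # The match may also end at any position of the text.
    return min(prev)


def approx_contains(pattern, text, max_errors):
    """True if pattern occurs in text with at most max_errors edits."""
    return best_substring_distance(pattern, text) <= max_errors


# 'in' decoded as 'm' is a typical conversion error in document files.
print(best_substring_distance("Schrodinger", "the Schrodmger equation"))
print(approx_contains("Schrodinger", "the Schrodmger equation", max_errors=2))
```

An exact string search would miss the damaged word in this example, while a search that allows two edits still finds it.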