cb2Bib: PDF Reference Import


 

Articles in PDF, or in any other format that can be converted to plain text, can be processed and indexed by the cb2Bib. Files can be selected using the Select Files button, or by dragging them from the desktop or the file manager onto the PDFImport dialog panel. Files are converted to plain text by an external conversion tool or script. This tool, and optionally its parameters, are set in the cb2Bib configure dialog. See the Configuring conversion to plain text section for details.
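As an illustration of the conversion step, the following Python sketch runs an external converter on one file. It is not the cb2Bib's own code: the function name `convert_to_text` and the calling convention (tool, then parameters, then input file, then output file) are assumptions made for this example.

```python
import subprocess
from pathlib import Path


def convert_to_text(tool, parameters, source, output):
    """Run an external converter and return the resulting plain text.

    Assumed calling convention (hypothetical): tool [parameters...] source output
    """
    command = [tool, *parameters, str(source), str(output)]
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(command, check=True)
    # Converted text may contain badly encoded characters; replace them
    # instead of failing, so they can be corrected later by the user.
    return Path(output).read_text(encoding="utf-8", errors="replace")
```

With Poppler's `pdftotext` installed, a call such as `convert_to_text("pdftotext", ["-f", "1", "-l", "1"], "article.pdf", "article.txt")` would convert only the first page, which is usually where the reference data is found.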

Once the file is converted, the text is sent to the cb2Bib for reference recognition. This is the usual two-step process. First, the text is optionally preprocessed, using a simple set of rules and/or any external script or tool. See Configuring Input. Second, the text is processed for reference extraction. The cb2Bib currently uses two methods. The first considers the text as a full pattern, which is checked against the user's set of regular expressions. The better designed these rules are, the better and more reliable the extraction will be. The second method, used when no regular expression matches the text, considers instead a set of predefined subpatterns. See Field Recognition Rules.
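The two extraction methods can be pictured with the short Python sketch below. It only illustrates the idea of "full pattern first, subpatterns as fallback". The patterns, field names, and the function `extract_reference` are hypothetical, and the cb2Bib's actual pattern syntax and predefined subpatterns differ.

```python
import re

# Hypothetical full-text pattern: author line, title line, then
# "Journal Volume, Pages (Year)". Real patterns are defined by the user.
FULL_PATTERNS = [
    re.compile(
        r"^(?P<author>.+)\n(?P<title>.+)\n"
        r"(?P<journal>.+?) (?P<volume>\d+), (?P<pages>\d+-\d+) \((?P<year>\d{4})\)$"
    ),
]

# Hypothetical subpatterns, each recognizing a single field.
SUBPATTERNS = {
    "doi": re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)"),
    "year": re.compile(r"\b((?:19|20)\d{2})\b"),
}


def extract_reference(text):
    """Return a dict of fields, trying full patterns before subpatterns."""
    text = text.strip()
    # Method 1: the whole text must match one full pattern.
    for pattern in FULL_PATTERNS:
        match = pattern.match(text)
        if match:
            return match.groupdict()
    # Method 2: no full pattern matched, so pick up isolated fields.
    fields = {}
    for name, pattern in SUBPATTERNS.items():
        found = pattern.search(text)
        if found:
            fields[name] = found.group(1)
    return fields
```

In this sketch a well-formed text yields all six fields at once, while a poorly converted text may yield only a DOI or a year, which is the situation where a later network query becomes useful.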

At this point users can interact with and supervise their references, right before saving them. Allowing user intervention is, and has always been, a design goal of the cb2Bib. Thus, the cb2Bib invites users to check their references. Poorly translated characters, accented letters, 'forgotten' words, or some minor formatting in the titles might be worth considering. In addition, if too few fields were extracted, one might perform a network query. For instance, if only the DOI was caught, there is a good chance that such a query will fill in the remaining fields.

The references are saved from the cb2Bib main panel. Once Save is pressed, and depending on the configuration (see Configuring Files), the article file will be either renamed, copied, moved, or simply linked to the file field of the reference.
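The four possible outcomes for the article file can be sketched in Python as follows. This is an illustrative guess at the semantics, not the cb2Bib's implementation: the function `store_article`, the action names, and the choice of naming the file after the cite key are all assumptions. In particular, it assumes that renaming keeps the file in its own directory, while copying and moving place it in a documents directory.

```python
import shutil
from pathlib import Path


def store_article(source, documents_dir, cite_key, action):
    """Apply the configured action; return the path for the file field."""
    source = Path(source)
    if action == "link":
        return source  # the file stays where it is and is only linked
    new_name = cite_key + source.suffix
    if action == "rename":
        target = source.with_name(new_name)  # assumed: same directory
    elif action in ("copy", "move"):
        target = Path(documents_dir) / new_name
    else:
        raise ValueError(f"unknown action: {action}")
    if target.exists():
        # Never overwrite an existing article file.
        raise FileExistsError(f"refusing to overwrite {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    if action == "copy":
        shutil.copy2(source, target)
    else:
        shutil.move(str(source), str(target))
    return target
```

Refusing to overwrite an existing target mirrors the concern, relevant when many files are processed, that a repeated key must not destroy a previously stored file.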

When several files are going to be indexed, the sequence can be as follows:

  • Process next after saving
    Once files are loaded and Process is pressed, the PDFImport dialog can be minimized (but not closed). All required operations at this point are accessible from the main panel. The link in the file field remains in place, regardless of which operations (e.g. clipboard copying) are needed, until the reference is saved. Then, the next file will be automatically processed. The source file can be opened at any time by right-clicking the file line edit.
  • Unsupervised processing
    In this operation mode, all files are processed sequentially, following the chosen steps and rules. If the process is successful, the reference is automatically saved, and the next file is processed. If it is not, the file is skipped and no reference is saved. While processing, the clipboard is disabled for safety. Once finished, the Unsupervised processing box is unchecked, to avoid accidentally saving a void reference. Network queries that require intervention, i.e., whose result is launching a given page, are disabled. The process continues until all files are processed. However, it will stop to avoid a file being overwritten as a result of a repeated key. In this case, it will resume after manual renaming and saving.
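The control flow of unsupervised processing can be summarized with a small Python sketch. The names `process_unsupervised`, `recognize`, and `saved_keys` are hypothetical. `recognize` stands for the whole conversion and recognition chain and is assumed to return a dictionary with a `"key"` entry on success, or `None` on failure.

```python
def process_unsupervised(files, recognize, saved_keys):
    """Process files in order; return (saved, skipped, stopped_at).

    stopped_at is the file whose cite key was already in use, or None
    if every file was processed.
    """
    saved, skipped = [], []
    for path in files:
        reference = recognize(path)
        if not reference:
            skipped.append(path)  # recognition failed: nothing is saved
            continue
        key = reference["key"]
        if key in saved_keys:
            # A repeated key could overwrite a stored file: stop here and
            # leave the decision (manual renaming and saving) to the user.
            return saved, skipped, path
        saved_keys.add(key)
        saved.append((path, reference))
    return saved, skipped, None
```

A caller would resume by passing the remaining files once the conflicting reference has been renamed and saved by hand.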