The cb2Bib sources have been ported to Qt5. To highlight this major update in library requirements, the version number is set to 1.9.0. Later, once the code has stabilized and new functionality related to Qt5 enhancements has been added, the version number will be set to 2.
At this point the cb2Bib has exactly the same functionality as its preceding version 1.5.0. To build the program, however, only qmake and its related config procedure are available. The cmake scripts have not yet been ported.
Qt5 brings important enhancements related to regular expressions and string processing. Some careful updates to the cb2Bib sources are needed to fully benefit from them. They will be implemented through the 1.9.x series. We expect by then a performance boost on full-text, regular-expression-based searches.
Included in version 1.5.0 sources there is a patch for XPDF 3.0.4, the default tool to convert PDF documents to plain text. The modified code separates superscripts to avoid words being joined to reference numbers and author names joined to affiliations' glyphs. Interested users will need to download the package, apply the patch, and compile it.
Additionally, this version improves converted text postprocessing. This step normalizes character codes, reverts ligatures, restores when possible orphan diacritics and broken words, and undoes text hyphenation.
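As a rough illustration of two of these steps (reverting ligatures and undoing hyphenation), here is a minimal Python sketch. It is not the cb2Bib code: the function name `postprocess` is hypothetical, and Unicode NFKC normalization plus a simple regular expression are assumed stand-ins for whatever the program actually does.

```python
import re
import unicodedata


def postprocess(text: str) -> str:
    """Illustrative cleanup of text converted from PDF (not the cb2Bib code)."""
    # NFKC maps compatibility characters such as the ligature U+FB01 to 'fi'.
    text = unicodedata.normalize("NFKC", text)
    # Undo hyphenation: 'hyphen-' at a line end followed by 'ation' is rejoined.
    # This naive rule also joins genuine hyphenated compounds split at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    return text
```

A real postprocessor would need a word list or similar evidence to decide whether a line-end hyphen belongs to the word.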
Conversion to text and postprocessing are important for reference extraction and for document indexing and searching. It is therefore recommended to delete cached document-to-text data to benefit from the present improvements. The cb2Bib stores cached texts in *c2b files in a user-specified directory. After deletion, performing a search or initiating indexing will create an updated cache.
Approximate and context searches effectively locate the references of interest. As collections grow in size, and as low-performance devices such as netbooks and tablets come into use, complete document searches become demanding. Besides, it is often not clear what to query for, and then a glossary of terms provides guidance. Often, too, the interest lies in finding the subset of documents similar to a given one.
Version 1.4.7 adds a pragmatic term or keyword extraction from the document
contents. Accepted keywords are set as the substrings appearing at least twice in
one document, appearing at least in three documents, and conforming to predefined
part-of-speech (POS) sequences. Keyword extraction is performed by either clicking
Retrieving is accomplished through pre-sorted views of the references and through filtering. Both views and filtering scale to (tens of) thousands of references. Usually, we recall a work from its publication year, a few words from its title, or (some of the letters of) one of its authors' names. Often, what we remember is when a reference was included in our collection. Therefore, having such a chronological view was desirable.
The implementation of this sorted-by-inclusion-date view was not done during the 1.3.x series, but postponed to version 1.4.0, in part to indicate that some sort of 'proprietary' BibTeX tag might be required to specify inclusion timestamps. Throughout the cb2Bib's life span I have been reluctant to introduce 'cb2Bib-only' tags in the BibTeX outputs. I believe that there is little gain, and that the cost is, possibly, broken interoperability.
In the end, the choice was to not write any 'timestamp' tag in references.
See also The cb2Bib Citer.
When version 0.2.7 came out, it was mentioned in Release Note cb2Bib 0.2.7 that the cb2Bib 'doesn't have the means to automatically discern an author name from a department or street name'. I forgot to mention that I did not expect the cb2Bib would ever have such a feature. Since the last Release Note cb2Bib 1.1.0, the cb2Bib internals have changed significantly. Some changes, such as heuristic recognition of interlaced authors and affiliations, are easily noticed. Other changes, however, are not, and need additional explanation.
From version 1.2.3, the switches
Lists of references are now sorted in a case- and diacritic-insensitive manner. For some languages such a choice is not the expected one, and some operating systems offer locale-aware collation. Because references are often inconsistent and inaccurate, this decision was taken to group together 'Density Matrix' with 'Density-matrix', and Møller with Moller, which, in a personal collection, most probably refer to the same concept and to the same person. Additionally, strings converted from document to text are now free of extraneous, non-textual symbols. Therefore, recreating cache files is recommended.
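The grouping described above can be sketched with a sort key that removes case, diacritics, and hyphens before comparison. This is only an illustration in Python, not the cb2Bib implementation. The name `sort_key` and the small table of letters that do not decompose under Unicode normalization (such as 'ø') are assumptions made for the example.

```python
import unicodedata

# Letters without a Unicode decomposition need an explicit mapping
# (assumed, partial list for illustration).
_EXTRA = str.maketrans({"ø": "o", "Ø": "O", "ł": "l", "Ł": "L"})


def sort_key(s: str) -> str:
    """Case-, diacritic-, and hyphen-insensitive key (illustrative only)."""
    s = s.translate(_EXTRA)
    decomposed = unicodedata.normalize("NFKD", s)
    # Drop combining marks left over from decomposition, e.g. the accent in 'é'.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Treat hyphens as spaces and ignore case.
    return stripped.replace("-", " ").casefold()


titles = ["Møller", "Density-matrix", "Moller", "Density Matrix"]
ordered = sorted(titles, key=sort_key)
```

With this key, 'Møller' and 'Moller' compare equal, as do 'Density-matrix' and 'Density Matrix', so a stable sort places each pair side by side.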
Finally, this release introduces a new module, named
A frequent request from cb2Bib users has been to expand the command line functionality. So far, little progress has been made in this regard. First, the addition of in-document searches and reading/inserting metadata were priorities. Second, the cb2Bib is not a tool to interconvert among bibliographic formats. And third, the cb2Bib is designed to involve the user in the search process and in the archiving and validation of the discovered works and references.
For the latter reason, and for not knowing a priori how such a tool would be
designed, the cb2Bib internals had been interlaced with its graphical interface. At
the time of version 0.7.0, when the graphical libraries changed, and a major
refactoring was required, the code started moving toward a better modularization
and structure. The current release pushes code organization further. As a result,
it adds two new command line switches:
The new cb2Bib module is named after the BibTeX key 'annote'. Annote is not for a 'one reference annotation' though. Instead, Annote is for short notes that interrelate several references. Annote takes a plain text note, with minimal or no markup, inserts the bibliographic citations, and converts it to an HTML page with links to the referenced documents.
From within the cb2Bib, to write your notes, type Alt+A, enter a filename, either new or existing, and once in Annote, type E to launch your default text editor. For help, type F1. Each time you save the document the viewer will be updated. To display mathematical notations, install jsMath locally. And, remember, code refactoring introduces bugs.
Approximately four years ago the first cb2Bib was released. It included the possibility of easily linking a document to its bibliographic reference, in a handy way, by dragging the file to the main (at that time, single) panel. Now, in version 1.0.0, when a file is dropped, the cb2Bib scans the document for metadata packets, and checks, in a rather experimental way, whether or not they contain relevant bibliographic information.
Publishers' metadata might or might not be accurate. Some, for instance, assign
the DOI to the key Title. The cb2Bib extracts possibly relevant key-value pairs
and adds them to the clipboard panel. Whenever key-value pairs are found accurate,
just pressing Alt+G imports them to the line edits. If keys with the prefix
The preparsed metadata that is added to the clipboard panel begins with
The previous cb2Bib release added the command line option
This release addresses these two points. Now, when the cb2Bib is launched as
The Windows un/installer cleans/sets configuration data in the registry. With
this in mind, it might be better not to install the program directly
to the USB drive. Just copy the cb2Bib base directory from a home/own computer to
the removable drive, and then run it on the host computer as
The cb2Bib accepts several arguments on its command line to access specific
functionality. So far, the command
This release adds the command line option
So far, however, this feature should be regarded as experimental. The Qt library to which the cb2Bib is linked performs read/write access to system settings in a few places (specifically, in file and color dialogs). On Unix and Mac OS systems this access can be modified by setting the environment variable XDG_CONFIG_HOME. No such workaround is presently available on Windows.
See The cb2Bib Command Line for a detailed syntax description.
Several changes in this release affect installation and deployment. First, the
cb2Bib internals for settings management have been reorganized. Version 0.8.1 will
not read previous settings, such as user colors, file locations, etc. On Unix, settings
are stored at
Second, cb2Bib tags are not shown by default. Instead, plain, raw clipboard data is shown, as it is easier to identify with the original source. To write a regular expression, right-click, check 'View Tagged Clipboard Data' on the menu, and perform the extraction from this view.
And finally, the cb2Bib adds the tag <<excerpt>> for network queries. It takes a simplified version of the clipboard contents and sends it to, e.g., Google Scholar. From there, one can easily import BibTeX references related to those contents. Therefore, in most cases one should uncheck the 'Perform Network Queries after automatic reference extractions' box.
The cb2Bib reads the clipboard contents, processes them, and places them in the main cb2Bib panel. If the clipboard contents can be recognized as a reference, it writes the corresponding BibTeX entry. If not, the user can interact from the cb2Bib panel and complete or correct the reference. Additionally, this process permits writing down a regular expression matching the reference's pattern.
To ease pattern writing, the cb2Bib preprocesses the raw input data. This can involve format conversion by external tools and general substitutions, in addition to the insertion of some special tags. The resulting preprocessed data is usually less readable. A particularly illustrative case is when input data comes from a PDF article.
The cb2Bib now optionally presents input data as raw, unprocessed data. This preserves the block text format of the source, so identifying the relevant bibliographic fields by visual inspection is more straightforward. In this raw-mode view panel, interaction works in a similar manner, except that no conversions or substitutions are seen there, and no regular expression tags are written.
This release moves the cb2Bib base requirement forward to Qt 4.2.0. Compilation errors related to rehighlight() library calls, kindly reported by Bongard, Seemann, and Luisser, should no longer appear. File/URL opening is now carried out by this library, in a desktop-integrated manner. Additionally, Gnome users will enjoy better integration, as the Cleanlooks widget style is available.
All known regressions in the 0.6.9x series have been fixed. Also, a few minor improvements have been included. In particular, file selection dialogs display navigation history, and the BibTeX output file can be conveniently selected from the list of '*.bib' files in the current directory. Such a feature will be especially useful to users who sort references into thematic files located in a given directory.
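The recognize-then-write idea can be sketched in Python as follows. Everything here is hypothetical: the pattern, the input layout it expects, and the function `to_bibtex` are invented for illustration and do not reproduce the cb2Bib regular expression format or its tags.

```python
import re

# Hypothetical pattern for clipboard text shaped like:
#   "A. Author and B. Writer, J. Example 12, 345 (2001)"
PATTERN = re.compile(
    r"^(?P<author>.+?), (?P<journal>.+?) (?P<volume>\d+), "
    r"(?P<pages>\d+) \((?P<year>\d{4})\)$"
)


def to_bibtex(clipboard: str, key: str = "ref"):
    """Return a BibTeX entry if the text matches PATTERN, else None."""
    m = PATTERN.match(clipboard.strip())
    if m is None:
        return None  # Not recognized: the user would complete it by hand.
    fields = ",\n".join(
        f"  {name} = {{{value}}}" for name, value in m.groupdict().items()
    )
    return f"@article{{{key},\n{fields}\n}}"


entry = to_bibtex("A. Author and B. Writer, J. Example 12, 345 (2001)")
```

When the pattern does not match, the function returns None, which corresponds to the case where the user completes or corrects the reference manually.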
This release fixes a regression in the cb2Bib network capabilities. Networking,
and hence querying, was erratic, both for the internal HTTP routines and for
external clients. In addition to this fix, the
The cb2Bib has been ported from Qt3 to Qt4, a migration of its underlying system library. Qt experienced many changes and improvements in this major release upgrade. Relevant to the cb2Bib, these changes will provide better file management, word completion, faster searches, and better desktop integration.
Upgrading to Qt4 is not a "plug and recompile" game. Thorough refactoring and rewriting were required. The resulting cb2Bib code is cleaner and more suitable for further development. As one might expect, major upgrades introduce new bugs that must be fixed. The cb2Bib 0.6.90 is actually a preview version. It has approximately the same functionality as its predecessor, so no additions were considered at this point. Its use, bug reporting, and feedback are encouraged. This will help reach a stable cb2Bib 0.7 sooner.
To compile it, type
The cb2Bib uses the internal tags
So far, the cb2Bib identified new lines by checking for '\n' codes. I was
unaware that this was a platform-dependent, as well as a not completely accurate,
way of detecting new lines. McKay Euan reported that
This release addresses this issue. The cb2Bib regular expressions are now expected to be more transferable among the different platforms. Extraction from plain text sources is expected to be completely platform independent. Extraction from web pages will still remain browser dependent. In fact, each browser adds its own interpretation of a given HTML source. For example, in Wiley webpages we see the sectioning header 'Abstract' in its source and in several browsers, but we see, and get, 'ABSTRACT' if using Konqueror.
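One common way to make line detection platform independent is to normalize every line-ending convention to a single code before matching. The Python sketch below illustrates that general technique only. It is an assumption that a similar normalization is appropriate here, and the function name `normalize_newlines` is hypothetical.

```python
import re


def normalize_newlines(text: str) -> str:
    """Map Windows ('\\r\\n') and old Mac ('\\r') line ends to '\\n'."""
    # '\r\n' is listed first so that it is replaced as one unit,
    # not as two separate line ends.
    return re.sub(r"\r\n|\r", "\n", text)
```

After such a step, a regular expression written against '\n' sees the same input regardless of the platform the text was copied on.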
The price of this more uniform approach is, however, a break in
compatibility with previous versions of cb2Bib. Unix/Linux users should not
expect many differences, though. Only one of the nine regular expressions in the
examples needed to be modified, and the two contributed regular expressions work
perfectly without any change. Windows users will not see a duplication of
Finally, I should mention that I do not have a MacOSX machine on which to test any of the cb2Bib releases. I am therefore assuming that these changes will fix the problem at hand. If they do not, please let me know. Also, let me know if release 0.6.0 'breaks' your own expressions. I consider this release a sort of experimental or beta version, and the previous version, 0.5.3, will still be available during this testing period.
Two issues have appeared regarding cb2Bib installation and deployment on MacOSX platforms.
First, if you encounter a 'nothing to install'-error during installation on
MacOSX 10.4.x using the cb2Bib binary installer available at naranja.umh.es/~atg/,
please delete the cb2bib-receipts from
Second, and also applicable to other cb2Bib platform versions, if PDFImport
issues the error message 'Failed to call some_format_to_text' tool, make
sure such a tool is installed and available. Go to Configure->PDFImport, click
on the 'Select External Convert Tool' button, and navigate to set its full path.
Since version 0.5.0 the default full path for the MacOSX is already set, and
Qt/KDE applications emit notifications whenever they change the clipboard
contents. The cb2Bib uses these notifications to automatically start its
'clipboard to BibTeX' processing. Other applications, however, do not send
such notifications. Starting with version 0.2.1 (see Release Note cb2Bib 0.2.1), the cb2Bib
checked the clipboard periodically. This checking was later disabled by
default, needing a few lines of code to be uncommented to activate it. Without such
checking, the cb2Bib appears unresponsive when selecting/copying from, e.g.,
acroread or Mozilla. This release includes the class
Releases 0.3.3 and 0.3.4 brought querying functionality to cb2Bib. In essence, cb2Bib was rearranged to accommodate copying and opening of network files. Queries were then implemented as user customizable HTML posts to journal databases. In addition, these arrangements permitted defining convenience, dynamic bookmarks that were placed at the cb2Bib's 'About' panel.
cb2Bib contains three viewing panels: 'About', 'Clipboard' and 'View BibTeX', with the 'Clipboard' panel being the main working area. To keep cb2Bib simple, only two buttons, 'About' and 'View BibTeX', are set to navigate through the panels. The 'About' and 'View BibTeX' buttons are toggle buttons for momentarily displaying their corresponding panels. Guidance was so far provided by enabling/disabling the buttons.
Since the introduction of bookmarks, the 'About' panel has become much more useful. Button functionality has now been slightly redesigned to avoid as many keystrokes and mouse clicks as possible. The buttons remain switchable, but they no longer disable the other buttons. The user is guided by icon changes instead. Hopefully these changes will not be confusing or counterintuitive.
Bookmarks and querying functionality are customizable through the
Users should order the
So far, this querying functionality is still tagged as experimental.
Both the querying itself and its syntax seem quite successful. However,
downloading of PDF files, on Windows with a T1 network, was found to freeze
once progress reaches 30-50%. Any feedback on this issue will be greatly
appreciated. Also, information on
cb2Bib considers the whole set of authors as an author-string pattern. This string is later postprocessed, without requirements on the actual number of authors it may contain, or on how the names are written. Once considered author-string patterns, the extraction of bibliographic references by means of regular expressions becomes relatively simple.
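To illustrate why treating the authors as one string simplifies matters, here is a minimal Python sketch of a postprocessing step that turns a captured author string into the BibTeX 'and'-separated form, whatever the number of authors. It is not the cb2Bib postprocessor. The function `authors_to_bibtex` and its splitting rule are assumptions, and the rule would mishandle names written as 'Last, First'.

```python
import re


def authors_to_bibtex(author_string: str) -> str:
    """Join any number of authors with ' and ' (illustrative only)."""
    # Split on commas, semicolons, '&', or the standalone word 'and'.
    parts = re.split(r"\s*(?:,|;|&|\band\b)\s*", author_string)
    # Discard empty pieces, e.g. the one between ',' and 'and' in ', and'.
    names = [p.strip() for p in parts if p.strip()]
    return " and ".join(names)


joined = authors_to_bibtex("F. N. First, F. N. Second, and F. N. Third")
```

The regular expression that captures the reference only has to delimit the author string as a whole, and the number of names is resolved afterwards.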
There are situations, however, where several author-strings are required. The
following box shows one of these cases. Authors are grouped according to their
affiliations. Selecting from 'F. N. First' to 'F. N. Fifth' would include 'First
Affiliation' within the author string. Cleaning up whatever wording 'First
Affiliation' may contain is a rather ill-posed problem. Instead, cb2Bib includes
At this point in the manual extraction, the user was faced with a red
So far the
The cb2Bib 0.3.0 manual extraction works as usual. By clicking
In automatic mode, cb2Bib now adds all
The cb2Bib 0.2.7 release introduces multiple retrieving from PDF files. PDF documents are becoming more and more widely used, not only for transferring and printing articles, but also as electronic replacements for personal paper files and classifiers.
cb2Bib is intended to help update personal databases of papers. It is a tool focused on what is left behind in database retrieving. Email alerts, references passed between colleagues, and PDF sharing are example situations. Though in an electronic format, such sources are not standardized, or not widely enough used, to permit the habitual import filters of reference managers. cb2Bib is designed around direct user intervention, either by creating one's own filters or by simple copy-paste assistance when typing by hand.
Hopefully someday cb2Bib will be able to take that old directory, with perhaps a few hundred papers, and automatically index the references and rename the files by author, in a consistent manner. The required mechanism is already there, in this version. But I guess that this new feature will expose some present limitations in cb2Bib. For instance, most printed and PDF papers interlace author names and affiliations. cb2Bib doesn't have the means to automatically discern an author name from a department or street name. So far one needs to manually use the 'Add to Authors' feature to deal with these situations. Also, the management of regular expressions needs developing, especially considering the wide variety of design patterns in publications.
In summary, the current version is already useful for classifying and extracting the references of that couple of papers that someone sends right before submitting a work. A completely unsupervised extraction is still far away, however.
The cb2Bib 'select-and-catch' mechanism failed in some cases. Acrobat and Mozilla selections were not always notified to the cb2Bib. Indeed, this 'window manager - application' connection seems to be broken on a KDE 3.3.0 + Qt 3.3.3 system.
The cb2Bib 0.2.1 continues to listen to system clipboard change notifications, whenever they are received and whenever cb2Bib is in connected mode. Additionally, the cb2Bib 0.2.1 periodically checks for changes in the system clipboard. Checks are performed every second, approximately. This permits cb2Bib to work as usual, although one could experience delays of 1-2 seconds on systems where the automatic notification is broken.
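The periodic check can be pictured as a simple polling loop. The Python sketch below is only an illustration of that idea and not the cb2Bib code, which is built on Qt. The function `watch_clipboard`, its parameters, and the injected `read` callable (standing in for a real clipboard read) are all hypothetical.

```python
import time
from typing import Callable, Optional


def watch_clipboard(
    read: Callable[[], str],
    on_change: Callable[[str], None],
    interval: float = 1.0,
    max_checks: Optional[int] = None,
) -> None:
    """Poll `read` every `interval` seconds and report changed contents."""
    last = read()
    checks = 0
    while max_checks is None or checks < max_checks:
        time.sleep(interval)
        current = read()
        if current != last:
            # Contents changed since the previous check: start processing.
            last = current
            on_change(current)
        checks += 1


# Usage with a fake clipboard that changes once.
_values = iter(["a", "a", "b", "b"])
seen = []
watch_clipboard(lambda: next(_values), seen.append, interval=0, max_checks=3)
```

With a one-second interval, a change is noticed at most about one second after it happens, which is of the same order as the delay mentioned above.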
If the 'select-and-catch' functionality appears 'sticky', which may happen
when text is selected from non-KDE applications, check the source
Last modified: March 2017
Copyright © 2001-2017 MOLspaces.com
All Rights Reserved