cb2Bib Release Notes

Release Note cb2Bib 2.0.1

To optimize searches on PDF contents, cb2Bib keeps a cache of the extracted text streams, which are compressed to reduce disk space and reading overhead. Nowadays, compressors with extremely high decompression speed are available. Two of them are LZSSE, for SSE4 capable architectures, and LZ4, for a broader range of CPUs. These two compressors can now be used by cb2Bib, with the latter set as the default compression library in cb2Bib builds. When upgrading to version 2.0.1, the first search on the document collection will recreate the cache, and this step will be noticeably slow.
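As a rough illustration of the idea only, the following Python sketch keeps extracted text compressed in memory and decompresses it on request. It is not cb2Bib's code: the class name TextCache and its methods are hypothetical, and zlib from the standard library stands in for LZ4 or LZSSE, which are not part of the Python standard library.

```python
import zlib


class TextCache:
    """Hypothetical in-memory cache of compressed document text.

    zlib is used here only as a stand-in for a fast decompressor
    such as LZ4; the storage layout is an assumption.
    """

    def __init__(self):
        self._store = {}

    def put(self, doc_path, text):
        # Compress once, when the text is extracted from the document.
        self._store[doc_path] = zlib.compress(text.encode("utf-8"))

    def get(self, doc_path):
        # Decompress on every search; this is the step that must be fast.
        blob = self._store.get(doc_path)
        if blob is None:
            return None
        return zlib.decompress(blob).decode("utf-8")

    def stored_size(self, doc_path):
        # Size in bytes of the compressed entry.
        return len(self._store[doc_path])
```

Because every search reads the whole cache but writes it only once, decompression speed matters far more than compression speed in this access pattern.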

Additionally, cb2Bib 2.0.1 includes original, optimized text matching code for AVX2 capable architectures that is used for search matching and BibTeX parsing. This code is not set in default builds and needs to be explicitly enabled at compilation time.

Finally, it is worth mentioning the inclusion in version 2.0.1 of stemmed context search (see Contextual Search for details), and of contributed feedback on handling citations and extending cite commands to markdown syntax (see Predefined Placeholders).


Release Note cb2Bib 2.0.0

Throughout the 1.9.x series, the cb2Bib sources were updated to the improved string processing capabilities of Qt5 and PCRE libraries. This update has brought a remarkable speedup for in-document searches and full search indexing.

Alternate normalization of journal titles and abbreviations, upgrading jsMath to MathJax, extending network queries syntax, and a PDF user manual are the additional enhancements in cb2Bib 2.0.0.

Back in version 0.3.3, cb2Bib introduced network queries to obtain the data for a citation. While convenient, queries to publishers’ websites were difficult to set up and fragile. Nowadays, fortunately, arXiv, PubMed and Crossref offer structured APIs. These interfaces give the end user an easy setup for completing bibliographic citations.


Release Note cb2Bib 1.9.0

The cb2Bib sources have been ported to Qt5. To highlight this major update in library requirements, the version number is set to 1.9.0. Later, once the port has stabilized and new functionality related to Qt5 enhancements has been added, the version number will be set to 2.

At this point cb2Bib has exactly the same functionality as its preceding version 1.5.0. To build the program, however, only qmake and its related config procedure are available. The cmake scripts have not yet been ported.

Qt5 brings important enhancements related to regular expressions and string processing. Some careful updates to the cb2Bib sources are needed to fully benefit from them. They will be implemented through the 1.9.x series. We expect by then a performance boost on full-text, regular-expression-based searches.


Release Note cb2Bib 1.5.0

The version 1.5.0 sources include a patch for XPDF 3.0.4, the default tool to convert PDF documents to plain text. The modified code separates superscripts to avoid words being joined to reference numbers and author names being joined to affiliations’ glyphs. Interested users will need to download the package, apply the patch, and compile it.

Additionally, this version improves converted text postprocessing. This step normalizes character codes, reverts ligatures, restores when possible orphan diacritics and broken words, and undoes text hyphenation.
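Two of these steps can be sketched in a few lines of Python. This is an illustration under simple assumptions, not the cb2Bib implementation: the function name postprocess is hypothetical, Unicode NFKC normalization stands in for ligature reversal and character code normalization, and a single regular expression undoes hyphenation at line breaks.

```python
import re
import unicodedata


def postprocess(text):
    """Illustrative cleanup of PDF-extracted text (hypothetical helper)."""
    # NFKC folds compatibility ligatures such as U+FB01 into plain letters
    # and composes a base letter with a following combining diacritic.
    text = unicodedata.normalize("NFKC", text)
    # Rejoin words hyphenated across a line break: "extrac-\ntion".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    return text
```

Note that this naive hyphenation rule also joins genuine compounds that happen to break at their hyphen, so a real postprocessor would need a dictionary or similar check.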

Conversion to text and postprocessing are important for reference extraction, and for document indexing and searching. It is therefore recommended to delete cached document-to-text data to benefit from the present improvements. cb2Bib stores cached texts in *c2b files in a user-specified directory. After that, performing a search or initiating indexing will create an updated cache.


Release Note cb2Bib 1.4.7

Approximate and context searches effectively locate our references of interest. As collections grow in size, and low-performance devices such as netbooks and tablets come into use, complete document searches become demanding. Besides, it is often not clear what to query for, and then a glossary of terms provides guidance. Often, too, the interest lies in selecting the subset of documents that are similar to a given one.

Version 1.4.7 adds a pragmatic term or keyword extraction from the document contents. Accepted keywords are the substrings that appear at least twice in one document, appear in at least three documents, and conform to predefined part-of-speech (POS) sequences. Keyword extraction is performed either by clicking on Index Documents at the c2bciter desktop tray menu, or by typing cb2bib --index [bibdirname] in a shell. During extraction, the Part Of Speech (POS) Lexicon distribution file must be available and readable. On termination, indexing files are saved in the Search In Files Cache Directory. Simply copying this directory will synchronize keyword indexing to a second computer.
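The two frequency criteria can be sketched as follows. This is a simplified illustration, not the cb2Bib extractor: the function name candidate_keywords and its parameters are hypothetical, terms are taken as plain word n-grams, and the part-of-speech filter is omitted altogether.

```python
from collections import Counter


def candidate_keywords(docs, min_in_doc=2, min_docs=3, max_words=2):
    """Return terms repeated in some document and present in enough documents.

    Assumption: a term is a run of 1..max_words lowercase words.
    The POS-sequence filter described in the text is not applied here.
    """
    docs_containing = Counter()   # term -> number of documents containing it
    repeated_somewhere = set()    # terms seen >= min_in_doc times in one document
    for text in docs:
        words = text.lower().split()
        counts = Counter(
            " ".join(words[i:i + n])
            for n in range(1, max_words + 1)
            for i in range(len(words) - n + 1)
        )
        for term, count in counts.items():
            docs_containing[term] += 1
            if count >= min_in_doc:
                repeated_somewhere.add(term)
    return sorted(t for t in repeated_somewhere if docs_containing[t] >= min_docs)
```

Without the POS filter, function words such as ‘the’ would pass both thresholds in any real collection, which is presumably why the lexicon file is required during extraction.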

After refreshing the c2bciter module, pressing the G key displays the glossary of terms. On a reference, pressing K displays its list of keywords. Pressing R on a keyword lists the references related to that keyword. Pressing R on a reference lists similarly related references. Similarity is assessed based on keyword occurrences. The Left and Right keys provide previous and next navigation. Pressing V on either a reference keyword, or a keyword reference, visualizes the keyword excerpts from the reference’s document. To close the excerpt dialog, press the Esc or Left key.

See also cb2Bib Citer, Configuring Files, and cb2Bib Command Line.


Release Note cb2Bib 1.4.0

The c2bciter module was introduced in version 1.3.0. Its name, as it was described, states its purpose of being “aimed to ease inserting citation IDs into documents”. In fact, it does have such functionality. It also has another, equally important one: it provides a very fast way to retrieve a given work from our personal collections.

Retrieving is accomplished through pre-sorted views of the references and filtering. Both views and filtering scale to (tens of) thousands of references. Usually, we recall a work from its publication year, a few words from its title, or (some of the letters of) one of its authors’ names. Often, what we remember is when a reference was included in our collection. Therefore, having such a chronological view was desirable.

The implementation of this sorted-by-inclusion-date view was not done during the 1.3.x series, but postponed to version 1.4.0, somehow to indicate that some sort of ‘proprietary’ BibTeX tag might be required to specify inclusion timestamps. I have been reluctant throughout cb2Bib’s life span to introduce ‘cb2Bib-only’ tags in the BibTeX outputs. I believe that there is little gain, and that it possibly comes at the cost of breaking interoperability.

In the end, the choice was to not write any ‘timestamp’ tag in references. Instead, c2bciter checks for the last modified date of the linked documents to build an approximated chronological view. The advantage is that all, not just ‘version 1.4.0 or later’, references are sorted. Furthermore, if a reference is later corrected, and the document metadata is updated too, the modification date is reflected in the view. The obvious inconvenience is that no such sorting can be done for references without an attached document.
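The approach can be sketched in Python as below. This is only an illustration: the function name chronological_view and the dictionary keys ‘id’ and ‘file’ are hypothetical, and listing the most recently modified document first is an assumption about the ordering.

```python
import os


def chronological_view(references):
    """Order reference IDs by the modification time of their linked document.

    Assumption: each reference is a dict with an 'id' key and an optional
    'file' key holding the path of the attached document.
    """
    dated = []
    for ref in references:
        path = ref.get("file")
        # References without an existing attached document cannot be dated.
        if path and os.path.isfile(path):
            dated.append((os.path.getmtime(path), ref["id"]))
    # Most recently modified document first.
    return [ref_id for _, ref_id in sorted(dated, reverse=True)]
```

No extra field is written to the BibTeX file, at the price of silently leaving out references that have no attached document.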

See also cb2Bib Citer.


Release Note cb2Bib 1.3.0

When version 0.2.7 came up, it was mentioned in Release Note cb2Bib 0.2.7 that cb2Bib ‘doesn’t have the means to automatically discern an author name from a department or street name’. I forgot to mention that I did not expect cb2Bib would ever have such a feature. Since the last Release Note cb2Bib 1.1.0, the cb2Bib internals have changed significantly. Some changes, such as heuristic recognition for interlaced authors and affiliations, get easily noticed. Other changes, however, do not, and need additional explanation.

From version 1.2.3, the switches --txt2bib and --doc2bib set cb2Bib to work in console mode. The non-exact nature of the involved extractions makes logging necessary. On Windows, graphic or console modes must be decided not at run time, but when the application is built. So far, logging and globbing were missing. This release adds the convenience wrapper c2bconsole. Typing c2bconsole --txt2bib i*.txt out.bib, for instance, will work as it does on the other platforms.

Lists of references are now sorted in a case- and diacritic-insensitive manner. For some languages such a choice is not the expected one, and some operating systems offer locale-aware collation. Due to usual inconsistencies and inaccuracies in references, this decision was taken to group together ‘Density Matrix’ with ‘Density-matrix’, and Møller with Moller, which, in a personal collection, most probably refer to the same concept and to the same person. Additionally, document-to-text converted strings are now cleaned of extraneous, non-textual symbols. Therefore, recreating cache files is recommended.
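A sort key with this grouping behavior can be sketched in Python as follows. It is an illustration, not the cb2Bib collation: the name sort_key and the small table of extra letters are assumptions, and dropping non-alphanumeric characters is one possible way to make ‘Density Matrix’ and ‘Density-matrix’ compare equal.

```python
import unicodedata

# Letters such as 'ø' have no Unicode decomposition, so they are mapped
# explicitly. This short table is illustrative, not exhaustive.
_EXTRA = str.maketrans({"ø": "o", "Ø": "O", "ł": "l", "Ł": "L", "đ": "d", "Đ": "D"})


def sort_key(s):
    """Case- and diacritic-insensitive key that also ignores punctuation."""
    s = s.translate(_EXTRA)
    # Decompose accented letters, then drop the combining marks.
    s = unicodedata.normalize("NFKD", s)
    s = "".join(c for c in s if not unicodedata.combining(c))
    # Fold case and keep only letters and digits.
    return "".join(c for c in s.casefold() if c.isalnum())
```

For instance, sorted(titles, key=sort_key) places ‘Density Matrix’ and ‘Density-matrix’ next to each other regardless of case or hyphenation.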

Finally, this release introduces a new module, named c2bciter, and aimed to ease inserting citation IDs into documents. The module should ideally stay idle at the system tray, and be recalled as needed by pressing a global, desktop shortcut. This functionality, while desirable, and usual in dictionaries, is platform and desktop dependent. On KDE there are currently known issues when switching among virtual desktops.

See also cb2Bib Citer, and cb2Bib Command Line.


Release Note cb2Bib 1.1.0

A frequent request from cb2Bib users has been to expand the command line functionality. So far little progress has been seen in this regard. First, the addition of in-document searches and reading/inserting metadata were priorities. Second, cb2Bib is not the tool to interconvert among bibliographic formats. And third, cb2Bib is designed to involve the user in the search process, in the archiving and validation of the discovered works and references.

For the latter reason, and for not knowing a priori how such a tool would be designed, the cb2Bib internals had been interlaced with its graphical interface. At the time of version 0.7.0, when the graphical libraries changed, and a major refactoring was required, the code started moving toward a better modularization and structure. The current release pushes code organization further. As a result, it adds two new command line switches: --html-annote and --view-annote.

The new cb2Bib module is named after the BibTeX key ‘annote’. Annote is not for a ‘one reference annotation’ though. Instead, Annote is for short notes that interrelate several references. Annote takes a plain text note, with minimal or no markup, inserts the bibliographic citations, and converts it to an HTML page with links to the referenced documents.

From within cb2Bib, to write your notes, type Alt+A, enter a filename, either new or existing, and once in Annote, type E to launch your default text editor. For help, type F1. Each time you save the document the viewer will be updated. To display mathematical notations, install jsMath locally. And, remember, code refactoring introduces bugs.

See also cb2Bib Annote and cb2Bib Command Line.


Release Note cb2Bib 1.0.0

Approximately four years ago the first cb2Bib was released. It included the possibility of easily linking a document to its bibliographic reference, in a handy way, by dragging the file to the main (at that time, single) panel. Now, in version 1.0.0, when a file is dropped, cb2Bib scans the document for metadata packets, and checks, in a rather experimental way, whether or not they contain relevant bibliographic information.

Publishers’ metadata might or might not be accurate. Some, for instance, assign the DOI to the key Title. cb2Bib extracts possibly relevant key-value pairs and adds them to the clipboard panel. Whenever key-value pairs are found to be accurate, just pressing Alt+G imports them to the line edits. If keys with the prefix bibtex are found, their values are automatically imported.

The preparsed metadata that is added to the clipboard panel begins with [Bibliographic Metadata and ends with /Bibliographic Metadata]. Therefore, if you are using PDFImport together with a set of regular expressions that contain the begin (^) or end ($) anchors, you can safely replace the anchors by the above tags. In this manner, existing regular expressions remain useful with this minor change, with the added advantage that, if recognition fails for a given document, metadata might give the hardest fields to extract from a PDF article, which are author and title.

See also Reading and Writing Bibliographic Metadata.


Release Note cb2Bib 0.8.4

The previous cb2Bib release added the command line option --conf [full_path]cb2bib.conf to specify the settings location. This feature was intended, mainly, as a clean way to run the program on a host computer from a removable drive. The work done focused on arranging the command line and settings related code. It was left for a later release to solve some requirements regarding the managing of file pathnames and temporary files.

This release addresses these two points. Now, when cb2Bib is launched as cb2bib --conf (without a configuration filename), it treats filenames as being relative to the actual cb2Bib location. Temporary files, if needed, will be placed at this location as well. Therefore, no data is written on the host, and cb2Bib works independently of the actual address that the host assigns to the removable drive.

The Windows un/installer cleans/sets configuration data on the registry. With this in mind, it might be better not to install the program directly to the USB drive. Just copy the cb2Bib base directory from a home/own computer to the removable drive, and then run it on the host computer as cb2bib --conf.


Release Note cb2Bib 0.8.3

cb2Bib accepts several arguments on its command line to access specific functionality. So far, the command cb2bib tmp_ref permits importing references from the browser, whenever a download to reference manager choice is available. In addition, the command cb2bib --bibedit ref.bib directly launches the BibTeX editor for file browsing and editing.

This release adds the command line option --conf [full_path]cb2bib.conf to specifically set a file where all internal settings are retrieved and stored. This has two interesting applications. On one hand, it easily permits switching among several sets of extraction rules, since the locations of the files abbreviations.txt, regexps.txt, and netqinf.txt are all stored in the cb2Bib settings. And, on the other hand, it allows installing the program on a USB flash drive, and cleanly running it on any (e.g., library) computer. Settings can be stored and kept on the external device, and therefore, no data will be written on the registry or settings directory of the host computer.

So far, however, this feature should be regarded as experimental. The Qt library to which cb2Bib is linked performs read/write access to system settings in a few places (concretely, in file and color dialogs). On Unix and Mac OS systems this access can be modified by setting the environment variable XDG_CONFIG_HOME. No such workaround is presently available in Windows.

See cb2Bib Command Line for a detailed syntax description.


Release Note cb2Bib 0.8.1

Several changes in this release affect installation and deployment. First, the cb2Bib internals for settings management have been reorganized. Version 0.8.1 will not read previous settings, such as user colors, file locations, etc. On Unix, settings are stored at ~/.config/MOLspaces/cb2Bib.conf. This file can be removed or renamed. On Windows, it is recommended to uninstall previous versions before upgrading.

Second, cb2Bib tags are not shown by default. Instead, plain, raw clipboard data is shown, as it is easier to identify with the original source. To write a regular expression, right click, check ‘View Tagged Clipboard Data’ on the menu, and perform the extraction from this view.

And finally, cb2Bib adds the tag <<excerpt>> for network queries. It takes a simplified version of the clipboard contents and sends it to, e.g., Google Scholar. From there, one can easily import BibTeX references related to those contents. Therefore, in most cases one should uncheck the ‘Perform Network Queries after automatic reference extractions’ box.


Release Note cb2Bib 0.7.2

cb2Bib reads the clipboard contents, processes them, and places them in the main cb2Bib panel. If the clipboard contents can be recognized as a reference, it writes the corresponding BibTeX entry. If not, the user can interact from the cb2Bib panel and complete or correct the reference. Additionally, this process permits writing down a regular expression matching the reference’s pattern.

To ease pattern writing, cb2Bib preprocesses the raw input data. This can involve format conversion by external tools and general substitutions, in addition to including some special tags. The resulting preprocessed data is usually less readable. A particularly illustrative case is when input data comes from a PDF article.

cb2Bib now optionally presents input data as raw, unprocessed data. This preserves the block text format of the source, and thus identifying the relevant bibliographic fields by visual inspection is more straightforward. In this raw mode view panel, interaction works in a similar manner, except that no conversions or substitutions are seen there, and no regular expression tags are written.


Release Note cb2Bib 0.7.0

This release raises the cb2Bib base requirement to Qt 4.2.0. Compilation errors related to rehighlight() library calls, kindly reported by Bongard, Seemann, and Luisser, should not appear anymore. File/URL opening is now carried out by this library, in a desktop integrated manner. Additionally, Gnome users will enjoy better integration, as the Cleanlooks widget style is available.

All known regressions in the 0.6.9x series have been fixed. Also, a few minor improvements have been included. In particular, file selection dialogs display navigation history, and the BibTeX output file can be conveniently selected from the list of ‘*.bib’ files in the current directory. Such a feature will be especially useful to users who sort references in thematic files located in a given directory.


Release Note cb2Bib 0.6.91

This release fixes a regression in the cb2Bib network capabilities. Networking, and hence querying, was erratic, both for the internal HTTP routines and for external clients. In addition to this fix, the netqinf.txt file has been updated. PubMed is working again. Queries are also extended to include DOIs. A possible application is indexing a set of PDF articles with PDFImport. If the article contains its DOI number, and ‘Perform Network Queries after automatic reference extractions’ is checked, chances are that automatic extractions will work smoothly.


Release Note cb2Bib 0.6.90

cb2Bib has been ported from Qt3 to Qt4, a migration in its underlying system library. Qt experienced many changes and improvements in this major release upgrade. Relevant to cb2Bib, these changes will provide better file management, word completion, faster searches, and better desktop integration.

Upgrading to Qt4 is not a “plug and recompile” game. Thorough refactoring and rewriting was required. The resulting cb2Bib code is cleaner and more suitable for further development. As one might expect, major upgrades introduce new bugs that must be fixed. The cb2Bib 0.6.90 is actually a preview version. It has approximately the same functionality as its predecessor. So, no additions were considered at this point. Its use, bug reporting, and feedback are encouraged. This will help to get a stable cb2Bib 0.7 sooner.

To compile it, type ./configure as usual. The configure script calls the qmake tool to generate an appropriate Makefile. To make sure the right, Qt4 qmake is invoked, you can set the QTDIR environment variable prior to running ./configure. The configure call statement will then be '$QTDIR/bin/qmake'. E.g., type 'setenv QTDIR /usr' if qmake happens to be in the directory /usr/bin.


Release Note cb2Bib 0.6.0

cb2Bib uses the internal tags <<NewLine_n>> and <<Tab_n>> to ease the creation of regular expressions for reference extraction. New line and tab codes from the input stream are substituted by these numbered tags. Numbering new lines and tabs gives extra safety when writing down a regular expression. E.g., suppose field title is ‘anything’ between ‘<<NewLine1>> and <<NewLine2>>’. We can then easily write ‘anything’ as ‘.+’ without the risk of overextending the caption to several ‘\n’ codes. On the other hand, one can still use <<NewLine\d>> if not interested in a specific numbering. All these internal tags are later removed, once cb2Bib postprocesses the entry fields.

So far, cb2Bib identified new lines by checking for ‘\n’ codes. I was unaware that this was a platform dependent, as well as a not completely accurate, way of detecting new lines. McKay Euan reported that <<NewLine_n>> tags were not appearing as expected in the MacOSX version. I later learned that MacOSX uses ‘\r’ codes, and that Windows uses ‘\r\n’, instead of ‘\n’ for new line encoding.

This release addresses this issue. The cb2Bib regular expressions are now expected to be more transferable among the different platforms. Extraction from plain text sources is expected to be completely platform independent. Extraction from web pages will still remain browser dependent. In fact, each browser adds its peculiar interpretation of a given HTML source. For example, in Wiley webpages we see the sectioning header ‘Abstract’ in its source and in several browsers, but we see, and get, ‘ABSTRACT’ if using Konqueror.
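The numbered tags and a platform-independent new line detection can be sketched in Python as below. This is an illustration of the idea, not the cb2Bib code: the function name tag_newlines is hypothetical, and treating ‘\r\n’, ‘\r’ and ‘\n’ as equivalent line endings is an assumption based on the encodings mentioned above.

```python
import re


def tag_newlines(text):
    """Replace each line ending with a numbered <<NewLineN>> tag."""
    counter = 0

    def numbered_tag(_match):
        nonlocal counter
        counter += 1
        return f"<<NewLine{counter}>>"

    # '\r\n' is listed first so a Windows line ending yields a single tag.
    return re.sub(r"\r\n|\r|\n", numbered_tag, text)
```

With the tags in place, a title lying between the first and second line breaks can be captured by the pattern <<NewLine1>>(.+)<<NewLine2>>, and the same pattern applies whichever line ending the source platform used.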

What we pay for this more uniform approach is, however, a break in compatibility with previous versions of cb2Bib. Unix/Linux users should not expect many differences, though. Only one of the nine regular expressions in the examples needed to be modified, and the two contributed regular expressions work perfectly without any change. Windows users will not see a duplication of <<NewLine_n>> tags. To update previous expressions it should be enough just to shift the <<NewLine_n>> numbering. And, of course, any working regular expression that does not use <<NewLine_n>> tags will still work in this new version.

Finally, I should mention that I do not have a MacOSX machine to test any of the cb2Bib releases on this particular platform. I am therefore assuming that these changes will fix the problem at hand. If otherwise, please let me know. Also, let me know if release 0.6.0 ‘breaks’ your own expressions. I consider this release a sort of experimental or beta version, and the previous version, 0.5.3, will still be available during this testing period.


Release Note cb2Bib 0.5.0

Two issues have appeared regarding cb2Bib installation and deployment on MacOSX platforms.

First, if you encounter a ‘nothing to install’-error during installation on MacOSX 10.4.x using the cb2Bib binary installer available at naranja.umh.es/~atg/, please delete the cb2bib-receipts from /Library/Receipts and then rerun the installer. See also M. Bongard’s clarifying note ‘MACOSX 10.4.X “NOTHING TO INSTALL”-ERROR’ for details.

Second, and also applicable to other cb2Bib platform versions, if PDFImport issues the error message ‘Failed to call some_format_to_text’ tool, make sure such a tool is installed and available. Go to Configure->PDFImport, click on the ‘Select External Convert Tool’ button, and navigate to set its full path. Since version 0.5.0 the default full path for MacOSX is already set, pointing to /usr/local/bin/pdftotext.


Release Note cb2Bib 0.4.1

Qt/KDE applications emit notifications whenever they change the clipboard contents. cb2Bib uses these notifications to automatically start its ‘clipboard to BibTeX’ processing. Other applications, however, do not notify about such changes. Since version 0.2.1, see Release Note cb2Bib 0.2.1, cb2Bib started checking the clipboard periodically. This checking was later disabled by default, needing a few lines of code to be uncommented to activate it. Without such checking, cb2Bib appears unresponsive when selecting/copying from, e.g., acroread or Mozilla. This release includes the class clipboardpoll written by L. Lunak for KDE’s Klipper. Checking is performed in a very optimized way, and is enabled by default. If you experience problems with this feature, or if the required X11 headers aren’t available, consider disabling it by typing ./configure --disable-cbpoll prior to compilation. This will disable checking completely. If the naive, old checking is preferred, uncomment the four usual lines, run ./configure --disable-cbpoll, and compile.


Release Note cb2Bib 0.3.5

Releases 0.3.3 and 0.3.4 brought querying functionality to cb2Bib. In essence, cb2Bib was rearranged to accommodate copying and opening of network files. Queries were then implemented as user customizable HTML posts to journal databases. In addition, these arrangements permitted defining convenience, dynamic bookmarks that were placed at the cb2Bib’s ‘About’ panel.

cb2Bib contains three viewing panels: ‘About’, ‘Clipboard’ and ‘View BibTeX’, with the ‘Clipboard’ panel being the main working area. To keep cb2Bib simple, only two buttons, ‘About’ and ‘View BibTeX’, are set to navigate through the panels. The ‘About’ and ‘View BibTeX’ buttons are toggle buttons for momentarily displaying their corresponding panels. Guidance was so far provided by enabling/disabling the buttons.

After the introduction of bookmarks, the ‘About’ panel has greatly increased its usefulness. Button functionality has now been slightly redesigned to avoid as many keystrokes and mouse clicks as possible. The buttons remain switchable, but they no longer disable the other buttons. The user is guided by icon changes instead. Hopefully these changes will not be confusing or counterintuitive.

Bookmarks and querying functionality are customizable through the netqinf.txt file, which is editable by pressing the Alt+B keys. Supported queries are of the form ‘Journal-Volume-First Page’. cb2Bib parses netqinf.txt each time a query is performed. It looks for journal=Full_Name|[code] to obtain the required information for a specific journal. Empty ‘journal=’ entries have the meaning of ‘any journal’. New in this release, cb2Bib will test all possible queries for a given journal instead of giving up at the first ‘No article found’ message. The query process stops at the first successful hit or, otherwise, once netqinf.txt is parsed completely (in the same way as the automatic pattern recognition works). This permits querying multiple (and incomplete) journal databases.
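The ‘try every applicable entry until the first hit’ logic can be sketched in Python as follows. This is only an illustration: the function names and the try_query callback are hypothetical, and only the journal=Full_Name|[code] line form quoted above is parsed; the rest of the actual netqinf.txt syntax is not modeled.

```python
def matching_entries(lines, journal):
    """Yield (name, code) for 'journal=Full_Name|code' lines that apply.

    Assumption: an empty name means 'any journal', and the code part
    after '|' may be empty.
    """
    for line in lines:
        if not line.startswith("journal="):
            continue
        name, _, code = line[len("journal="):].strip().partition("|")
        if name == "" or name == journal:
            yield name, code


def run_queries(lines, journal, try_query):
    """Try entries in file order and stop at the first successful hit.

    try_query is a hypothetical callback returning None for 'no article found'.
    """
    for name, code in matching_entries(lines, journal):
        hit = try_query(name, code)
        if hit is not None:
            return hit
    return None
```

Since entries are tried in file order, the order of the file decides which database answers first.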

Users should order the netqinf.txt file in the way that is most convenient for them. E.g., put PubMed in front of JACS if an automatic extraction is desired. Or put JACS in front of PubMed and extract from the journal web page, if accented characters in author names are wanted.

So far, this querying functionality is still tagged as experimental. Both the querying itself and its syntax seem quite successful. However, downloading of PDF files, on Windows OS + T1 network, was found to freeze once progress reaches 30-50%. Any feedback on this issue will be greatly appreciated. Also, information on kfmclient equivalent tools for non KDE desktops would be worth including in the cb2Bib documentation.


Release Note cb2Bib 0.3.0

cb2Bib considers the whole set of authors as an author-string pattern. This string is later postprocessed, without requirements on the actual number of authors it may contain, or on how the names are written. Once considered author-string patterns, the extraction of bibliographic references by means of regular expressions becomes relatively simple.

There are situations, however, where several author-strings are required. The following box shows one of these cases. Authors are grouped according to their affiliations. Selecting from ‘F. N. First’ to ‘F. N. Fifth’ would include ‘First Affiliation’ within the author string. Cleaning up whatever wording ‘First Affiliation’ may contain is a rather ill-posed problem. Instead, cb2Bib includes an Add Authors option. The way of operation is then to select ‘F. N. First, F. N. Second, F. N. Third’ and choose Authors and, right after, select ‘F. N. Fourth and F. N. Fifth’ and choose Add Authors.

                                             Journal Name, 10, 1100-1105, 2004


                     F. N. First, F. N. Second, F. N. Third
                                First Affiliation

                           F. N. Fourth and F. N. Fifth
                                Second Affiliation

  Abstract: Select from "Journal Name ..." to "... second author set.". The 'F.
  N. First, F. N. Second, F. N. Third' author string is automatically processed
  as one author set, while 'F. N. Fourth and F. N. Fifth' is processed as
  another, second author set.

At this point in the manual extraction, the user was faced with a red <<moreauthors>> tag in the cb2Bib clipboard panel. The <<moreauthors>> tag was intended to warn the user about the fact that cb2Bib would not be able to consider the resulting extraction pattern as a valid, general regular expression. Usual regular expressions are built up from an a priori known level of nesting. In these cases, however, the level of nesting is variable. It depends on the number of different affiliations occurring in a particular reference.

So far the <<moreauthors>> tag has become a true FAQ about cb2Bib and a source of much confusion. There is no real need, however, for such a user warning. The <<moreauthors>> tag has therefore been removed and cb2Bib has taken a step further, to its 0.3.0 version.

The cb2Bib 0.3.0 manual extraction works as usual. By clicking Authors, the Authors edit line is reset and the selection contents are moved there. Alternatively, if Add Authors is clicked, the selection contents are added to the author field. In this version, however, both operations are tagged as <<author>> (singular form, as it is the BibTeX keyword for Authors). The generated extraction pattern can now contain any number of <<author>> fields.

In automatic mode, cb2Bib now adds all author captions to Authors. In this way, cb2Bib can treat interlaced author-affiliation cases. Obviously, users needing such extractions will have to write particular regular expressions for cases with one set of authors, for two sets, and so on. Even though it is not rare for a work to have a hundred authors, it would be quite improbable that they were working at so many different institutions. Therefore, few regular expressions should actually be required in practice. Although not elegant, this breaks what was a cb2Bib limitation and broadens its use when extracting from PDF sources. Remember here to sort these regular expressions in decreasing order, since at present, cb2Bib stops at the first hit. Also, consider Any Pattern to get rid of the actual affiliation contents, as you might not want to extract authors’ addresses.
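Why the ordering matters can be seen in a small Python sketch. It is an illustration with plain regular expressions, not cb2Bib's pattern syntax: the two patterns and the function extract_authors are hypothetical, and each author set is assumed to occupy one line followed by one affiliation line.

```python
import re

# Patterns are listed in decreasing order of the number of author sets,
# because matching stops at the first hit.
PATTERNS = [
    # Two author sets, each followed by an affiliation line.
    re.compile(r"(.+)\n.+\n\n(.+)\n.+"),
    # One author set followed by an affiliation line.
    re.compile(r"(.+)\n.+"),
]


def extract_authors(text):
    """Return the author strings captured by the first matching pattern."""
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return list(match.groups())
    return []
```

If the one-set pattern came first, it would also match the beginning of a two-set reference, and the second author set would be silently lost.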


Release Note cb2Bib 0.2.7

The cb2Bib 0.2.7 release introduces multiple retrieving from PDF files. PDF documents are becoming more and more widely used, not only for transferring and printing articles, but also as electronic replacements for personal paper files and classifiers.

cb2Bib is intended to help in updating personal databases of papers. It is a tool focused on what is left behind in database retrieving. Cases such as email alerts, or references and PDFs shared among colleagues, are example situations. Though in an electronic format, such sources are not standardized, or not used widely enough, to permit using the habitual import filters in reference managers. cb2Bib is designed to involve direct user intervention, either by creating the user’s own useful filters or by simple copy-paste assistance when typing by hand.

Hopefully someday cb2Bib will be able to take that old directory, with perhaps a few hundred papers, and automatically index the references and rename the files by author, in a consistent manner. The required mechanism is already there, in this version. But I guess that this new feature will expose some present limitations in cb2Bib. For instance, most printed and PDF papers interlace author names and affiliations. cb2Bib doesn’t have the means to automatically discern an author name from a department or street name. So far one needs to manually use the ‘Add to Authors’ feature to deal with these situations. Also, the managing of regular expressions needs developing, especially considering the wide variety of design patterns in publications.

In summary, this current version is already useful in classifying and extracting the reference of that couple of papers that someone sends right before submitting a work. A completely unsupervised extraction is still far away, however.


Release Note cb2Bib 0.2.1

The cb2Bib mechanism ‘select-and-catch’ failed in some cases. Acrobat and Mozilla selections were not always notified to cb2Bib. Indeed, this ‘window manager - application’ connection seems to be broken on a KDE 3.3.0 Qt 3.3.3 system.

The cb2Bib 0.2.1 continues to listen to system clipboard change notifications, whenever they are received and whenever cb2Bib is in connected mode. Additionally, the cb2Bib 0.2.1 periodically checks for changes in the system clipboard. Checks are performed every second, approximately. This permits cb2Bib to work as usual, although one could experience delays of 1-2 seconds in systems where the automatic notification is broken.

If the ‘select-and-catch’ functionality appears ‘sticky’, possibly happening while using non KDE applications from where text is selected, check the source file c2bclipboard.cpp, look for 'Setting timer', and set variable interval to 1000. This is the interval of time in ms that cb2Bib will use to check for clipboard changes.
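The periodic check can be pictured with the following Python sketch. It is not the code in c2bclipboard.cpp: the function poll_changes and its parameters are hypothetical, the clipboard is abstracted as a callable, and the interval is given in seconds rather than milliseconds.

```python
import time


def poll_changes(read_clipboard, on_change, interval=1.0, max_checks=None):
    """Call on_change(contents) whenever the polled contents differ.

    read_clipboard is a hypothetical callable returning the current
    clipboard text; max_checks bounds the loop (None means run forever).
    """
    last = read_clipboard()
    checks = 0
    while max_checks is None or checks < max_checks:
        time.sleep(interval)
        current = read_clipboard()
        if current != last:
            # Contents changed since the previous check: notify once.
            last = current
            on_change(current)
        checks += 1
```

A change is noticed at the next check at the latest, which is consistent with delays of the order of the polling interval.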