cb2Bib Processing of Author Names

cb2Bib processes the author names string automatically, using a set of heuristic rules. First, it identifies the separator between authors. Second, it decides whether the names are in natural order, in reverse order, or in the mixed ‘Abcd, E., F. Ghij, …’ order.

Cleanup author string:

  • Escape BibTeX to Unicode
  • Remove digits from authors string
  • Remove any character except -',;&\.\s\w
  • Simplify white spaces
  • Consider composing prefixes (da|de|dal|del|der|di|do|du|dos|el|la|le|lo|van|vande|von|zur)
  • Consider composing suffixes (II|III|IV|Jr)
  • Some publishers use superscripts to mark multiple author affiliations, and copying text through the clipboard loses the superscript formatting. A preprocessing step therefore cleans ‘orphan’ single lowercase letters from the author string: every isolated letter matching the pattern [a-z] is removed. Fortunately, abbreviated initials are normally written as uppercase letters, which allows a correct superscript cleanup.
    Caution: Single lowercase letters, a to z, are removed from the author string.
    Caution: Superscripts are appended to the author's last name if no separation is provided. Users should watch for these cases and correct them.
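
The cleanup steps above can be sketched roughly as follows. This is a minimal Python illustration, not cb2Bib's own code: the function name clean_author_string and the exact regular expressions are assumptions, the BibTeX-to-Unicode conversion and the prefix/suffix handling are left out, and the final tidying of spaces before separators is an extra step added only to keep the output readable.

```python
import re

def clean_author_string(raw: str) -> str:
    """Hypothetical helper approximating the cleanup steps listed above."""
    s = re.sub(r"\d", "", raw)            # remove digits
    s = re.sub(r"[^-',;&.\s\w]", "", s)   # keep only - ' , ; & . whitespace and word characters
    # Drop 'orphan' single lowercase letters (lost superscripts).
    # Assumption: a letter is an orphan when it is not attached to a word,
    # an apostrophe, or a hyphen on either side.
    s = re.sub(r"(?<![\w'-])[a-z](?![\w'-])", "", s)
    s = re.sub(r"\s+([,;])", r"\1", s)    # extra tidy-up: no space before a separator
    s = re.sub(r"\s+", " ", s).strip()    # simplify white space
    return s

print(clean_author_string("John Smith1 a, Mary Jones2 b*"))  # John Smith, Mary Jones
print(clean_author_string("J. Smitha, M. Jones"))            # J. Smitha, M. Jones
```

The second call illustrates the caution above: a superscript glued to the last name ('Smitha') cannot be detected and is left for the user to correct.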

Rules to identify separators:

  • Contains comma and semicolon -> ‘;’
  • Contains pattern '^Abcd, E.-F.,' -> ‘,’
  • Contains pattern '^Abcd,' -> ‘and’
  • Contains comma -> ‘,’
  • Contains semicolon -> ‘;’
  • Any other -> ‘and’
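
As an illustration, the separator rules can be read as an ordered cascade in which the first matching rule wins. The Python sketch below assumes that reading. The function name identify_separator and the regular expressions standing in for the schematic patterns '^Abcd, E.-F.,' and '^Abcd,' are assumptions, not the expressions cb2Bib actually uses.

```python
import re

# Assumed stand-ins for the schematic patterns in the rule list.
LEADING_SURNAME_INITIALS = re.compile(r"^[\w'-]+,\s*(?:\w\.[-\s]?)+,")  # '^Abcd, E.-F.,'
LEADING_SURNAME = re.compile(r"^[\w'-]+,")                              # '^Abcd,'

def identify_separator(authors: str) -> str:
    """Hypothetical helper: return ';', ',' or 'and', testing the rules in order."""
    has_comma = "," in authors
    has_semicolon = ";" in authors
    if has_comma and has_semicolon:
        return ";"
    if LEADING_SURNAME_INITIALS.search(authors):
        return ","
    if LEADING_SURNAME.search(authors):
        return "and"
    if has_comma:
        return ","
    if has_semicolon:
        return ";"
    return "and"

print(identify_separator("Smith, J.; Jones, M."))         # ;
print(identify_separator("Smith, J.-P., Jones, M."))      # ,
print(identify_separator("Smith, John and Jones, Mary"))  # and
print(identify_separator("John Smith, Mary Jones"))       # ,
```

The order matters: a string such as 'Smith, John and Jones, Mary' contains commas, but the leading 'Abcd,' rule fires first, so the commas are treated as part of the reversed names rather than as separators.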

Rules to identify ordering:

  • Contains comma and semicolon -> Reverse
  • Pattern '^Abcd,' -> Reverse
  • Pattern '^Abcd EF Ghi' -> Natural
  • Pattern '^Abcd EF' -> Reverse
  • Pattern '^Abcd E.F.' -> Reverse
  • Any other pattern -> Natural
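
The ordering rules can be sketched the same way, again assuming they are tested in the order listed. The function name identify_ordering and the regular expressions are assumptions made for illustration. In particular, the name patterns are ASCII-only simplifications ('Abcd' is taken as one capital followed by lowercase letters), so accented or compound surnames would need broader expressions.

```python
import re

# Assumed stand-ins for the schematic patterns in the rule list.
SURNAME_COMMA = re.compile(r"^[\w'-]+,")                        # '^Abcd,'
NAME_CAPS_NAME = re.compile(r"^[A-Z][a-z]+ [A-Z]+ [A-Z][a-z]+")  # '^Abcd EF Ghi'
NAME_CAPS = re.compile(r"^[A-Z][a-z]+ [A-Z]+\b")                # '^Abcd EF'
NAME_DOTTED = re.compile(r"^[A-Z][a-z]+ (?:[A-Z]\.)+")          # '^Abcd E.F.'

def identify_ordering(authors: str) -> str:
    """Hypothetical helper: return 'reverse' or 'natural', testing the rules in order."""
    if "," in authors and ";" in authors:
        return "reverse"
    if SURNAME_COMMA.search(authors):
        return "reverse"
    if NAME_CAPS_NAME.search(authors):
        return "natural"
    if NAME_CAPS.search(authors):
        return "reverse"
    if NAME_DOTTED.search(authors):
        return "reverse"
    return "natural"

print(identify_ordering("Smith, J.; Jones, M."))          # reverse
print(identify_ordering("John FK Smith and Mary Jones"))  # natural
print(identify_ordering("Smith JK, Jones M"))             # reverse
print(identify_ordering("John Smith and Mary Jones"))     # natural
```

Testing '^Abcd EF Ghi' before '^Abcd EF' is what keeps a natural-order name with unpunctuated middle initials ('John FK Smith') from being mistaken for a reversed 'Surname Initials' form.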