The cb2Bib processes the author names string automatically, using a set of
heuristic rules. First, the separator between authors is identified. Second, it
is decided whether the author names are in natural order, in reverse order, or
in the mixed order 'Abcd, E., F. Ghij, ...'.
Cleanup of the author string:
- Remove digits from the author string
- Remove any character except -',;&\.\s\w
- Consider composing prefixes (da|de|dal|del|der|di|do|du|dos|la|le|van|vande|von)
- Consider composing suffixes (II|III|IV|Jr)
- Some publishers use superscripts to refer to multiple author affiliations.
Copying text through the clipboard loses superscript formatting. In a
preprocessing step, author strings are cleaned of 'orphan' lowercase single
letters: anything matching the pattern [a-z] is removed. Fortunately,
abbreviated initials are normally entered as uppercase letters, which permits a
correct superscript cleanup.
Caution: Lowercase single letters, a to z, are removed from the author
string.
Caution: Superscripts will be appended to the author's last name if no
separation is provided. Users should watch for these cases and correct them.
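As an illustration, the cleanup steps above could be sketched in Python as
follows. This is not the cb2Bib code: the function name clean_author_string,
the order of the steps, the final spacing tidy-up, and the reading of an
'orphan' letter as a lowercase a-z letter that is not attached to any other
word character are all assumptions made for the example. Composing prefixes and
suffixes are not handled here.

```python
import re


def clean_author_string(raw):
    """Hypothetical sketch of the cleanup rules; not the cb2Bib implementation."""
    # Remove digits from the author string.
    text = re.sub(r"\d", "", raw)
    # Assumption: an 'orphan' is a lone a-z letter with no word character,
    # apostrophe, hyphen or period attached to it (a lost superscript).
    text = re.sub(r"(?<![\w'-])[a-z](?![\w'.-])", " ", text)
    # Remove any character except -',;&. whitespace and word characters.
    text = re.sub(r"[^-',;&.\s\w]", "", text)
    # Tidy-up added for this sketch only: collapse runs of whitespace and
    # drop the space left in front of a comma or semicolon.
    text = re.sub(r"\s+", " ", text)
    text = re.sub(r"\s+([,;])", r"\1", text)
    return text.strip()


print(clean_author_string("John Smith a; Mary K. O'Neil b*"))
print(clean_author_string("John Smitha"))
```

Under these assumptions the first call yields "John Smith; Mary K. O'Neil",
while the second returns "John Smitha" unchanged: a superscript glued to the
last name cannot be told apart from the name itself, which is the case the
second caution warns about.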
Rules to identify separators:
- Contains comma and semicolon -> ';'
- Contains pattern '^Abcd, E.-F.,' -> ','
- Contains pattern '^Abcd,' -> 'and'
- Contains comma -> ','
- Contains semicolon -> ';'
- Any other -> 'and'
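The separator rules are applied in the order listed, and the first one that
matches wins. A minimal Python sketch of that cascade is given below. The
function name guess_separator and both regular expressions are assumptions:
they are one possible reading of the shorthand patterns 'Abcd, E.-F.,' and
'Abcd,' (a single word, a comma, then dotted uppercase initials), not the
expressions cb2Bib itself uses.

```python
import re

# Assumed reading of '^Abcd, E.-F.,': one word, a comma, one or more dotted
# uppercase initials (optionally joined by a hyphen or a space), then a comma.
LAST_INITIALS_COMMA = re.compile(r"^[^\W\d_][\w'-]*,\s*(?:[A-Z]\.[\s-]?)+,")
# Assumed reading of '^Abcd,': one word immediately followed by a comma.
LAST_COMMA = re.compile(r"^[^\W\d_][\w'-]*,")


def guess_separator(authors):
    """Hypothetical sketch: return ';', ',' or 'and' following the listed rules."""
    has_comma = "," in authors
    has_semicolon = ";" in authors
    if has_comma and has_semicolon:
        return ";"
    if LAST_INITIALS_COMMA.search(authors):
        return ","
    if LAST_COMMA.search(authors):
        return "and"
    if has_comma:
        return ","
    if has_semicolon:
        return ";"
    return "and"


for sample in ("Smith, J.; Jones, M.",
               "Smith, J.-K., Jones, M.",
               "Smith, John and Jones, Mary",
               "John Smith, Mary Jones"):
    print(repr(sample), "->", repr(guess_separator(sample)))
```

The order matters: 'Smith, John and Jones, Mary' contains a comma, but the
'^Abcd,' rule is tested before the plain comma rule, so the comma is read as
part of a reversed name and 'and' is chosen as the separator.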
Rules to identify ordering:
- Contains comma and semicolon -> Reverse
- Pattern '^Abcd,' -> Reverse
- Pattern '^Abcd EF Ghi' -> Natural
- Pattern '^Abcd EF' -> Reverse
- Pattern '^Abcd E.F.' -> Reverse
- Any other pattern -> Natural
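The ordering rules can be sketched in the same first-match-wins style. The
Python below is only an illustration: the name guess_order, the WORD
expression, and the way each shorthand pattern is turned into a regular
expression (for instance, how many initials 'EF' or 'E.F.' may contain) are
assumptions, and the mixed order 'Abcd, E., F. Ghij, ...' is not covered.

```python
import re

# Assumption: 'Abcd' and 'Ghi' stand for a capitalized word.
WORD = r"[A-Z][a-z][\w'-]*"

# Rules in the listed order; the first matching pattern decides.
ORDER_RULES = [
    (re.compile(rf"^{WORD},"), "reverse"),                       # '^Abcd,'
    (re.compile(rf"^{WORD}\s+[A-Z]+\s+{WORD}"), "natural"),      # '^Abcd EF Ghi'
    (re.compile(rf"^{WORD}\s+[A-Z]+(?![\w.])"), "reverse"),      # '^Abcd EF'
    (re.compile(rf"^{WORD}\s+(?:[A-Z]\.[\s-]?)+"), "reverse"),   # '^Abcd E.F.'
]


def guess_order(authors):
    """Hypothetical sketch: return 'reverse' or 'natural' following the listed rules."""
    if "," in authors and ";" in authors:
        return "reverse"
    for pattern, order in ORDER_RULES:
        if pattern.search(authors):
            return order
    return "natural"


for sample in ("Smith, J.; Jones, M.",
               "John FK Smith and Mary Jones",
               "Smith JK, Jones M",
               "John Smith and Mary Jones"):
    print(repr(sample), "->", guess_order(sample))
```

Testing '^Abcd EF Ghi' before '^Abcd EF' is what separates a natural-order
name with undotted middle initials from a reversed 'Last EF' entry; with the
two rules swapped, both would be read as reversed.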