The ERG internally (still) analyzes most punctuation marks as pseudo-affixes (rather than as separate tokens, as in the PTB). To accommodate any discrepancies, the grammar includes token mapping rules to adjust (i.e. correct) externally supplied tokenization (see the ChartMapping page for general background); specifically, punctuation marks will be re-combined with preceding or following tokens, reflecting standard orthographic convention.
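To make the described re-combination concrete, here is a toy Python sketch. It is my own simplification and not the ERG's actual token mapping rules, which are written in the grammar's chart-mapping formalism. The names `PREFIX_PUNCT`, `SUFFIX_PUNCT` and `recombine` are hypothetical. As an assumption, opening brackets and opening quotes attach to the following token, while closing brackets, closing quotes and sentence punctuation attach to the preceding token. The input uses PTB-style quote tokens (`` and '').

```python
# Toy illustration only: NOT the ERG's token mapping rules.
# Assumed policy: prefix punctuation attaches to the next token,
# suffix punctuation attaches to the previous token.
PREFIX_PUNCT = {"(", "[", "{", "``", "\u201c"}
SUFFIX_PUNCT = {")", "]", "}", "''", "\u201d", ",", ".", ";", ":", "!", "?"}


def recombine(tokens: list[str]) -> list[str]:
    """Glue PTB-style punctuation tokens onto their neighbouring tokens."""
    result: list[str] = []
    pending_prefix = ""  # opening punctuation waiting for the next word
    for tok in tokens:
        if tok in PREFIX_PUNCT:
            pending_prefix += tok
        elif tok in SUFFIX_PUNCT and result and not pending_prefix:
            # Attach to the preceding (possibly already extended) token.
            result[-1] += tok
        else:
            result.append(pending_prefix + tok)
            pending_prefix = ""
    if pending_prefix:  # dangling opening punctuation at the end
        result.append(pending_prefix)
    return result


print(recombine(["He", "said", "``", "hello", ",", "''", "and", "left", "."]))
print(recombine(["``", "yes", "''", ",", "she", "said", "."]))
```

Under this assumed policy, a closing quote and a following comma both end up as suffixes of the preceding word (e.g. `yes'',`). Whether the ERG does the same is exactly the question asked below.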
I am trying to find the arguments in favor of this approach. I remember having read them somewhere. Can anyone point me to the reference?
BTW, what happens with commas or periods that follow quotation marks? Are they treated jointly as suffixes of the preceding word?