Case-sensitive input for LKB?

Hi all,

I have students using a transliteration system for Telugu this quarter that relies on case distinctions. Does the LKB processor still necessarily normalize case for ASCII input, or is there a way to turn this off?

Thanks!

Emily & Ling 567

This answer relates only to the LKB; ACE might be different. I think the following is correct, but please correct me if not (I’ve never actually used morphology in the LKB – just programmed around that area).

LKB morphological processing is case-insensitive: both the morphological specifications and the input tokens are case-folded (to uppercase) before processing. If you have token mapping rules, they can get hold of the original form of the token, as it was before case-folding, via the feature +FORM, and if you want you can then copy this to ORTH.FORM before lexical and phrasal processing.

In LKB-FOS I can think of a way to escape from case-folding in morphological processing: since all characters are Unicode (not ASCII), you could substitute each lowercase character with a counterpart that has no Unicode case attribute, such as the equivalent small-caps character. E.g. “a” would be represented as U+1D00 Latin Letter Small Capital A (ᴀ). This should work, but I haven’t tested it. It has the disadvantage that you can’t directly enter such characters on the keyboard, so you’d have to paste them or use some kind of input method. (However, with a modest amount of programming, it would be possible to do this mapping between lowercase and small-caps automatically on input and output.)
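For anyone who wants to try the automatic mapping, here is a minimal Python sketch of what it might look like as a pre- and post-processing step outside the LKB. The function names are hypothetical, and Python's `str.upper()` / `str.lower()` are only a stand-in for the LKB's own case-folding, which I have not checked this against. The table is built and verified rather than assumed, because not every letter has a usable counterpart: there is no small-capital X in Unicode, and small-capital I (U+026A) has had an uppercase mapping since Unicode 9.0. Letters without a safe counterpart are left unchanged and would still be folded.

```python
import unicodedata


def build_small_caps_table():
    """Map ASCII lowercase letters to caseless small-capital counterparts.

    Letters with no such character, or whose small capital is altered by
    case conversion, are left out of the table.
    """
    table = {}
    for letter in "abcdefghijklmnopqrstuvwxyz":
        name = f"LATIN LETTER SMALL CAPITAL {letter.upper()}"
        try:
            sub = unicodedata.lookup(name)
        except KeyError:
            continue  # no character of that name in this Unicode database
        # Keep only substitutes that case conversion leaves untouched.
        if sub.upper() == sub and sub.lower() == sub:
            table[letter] = sub
    return table


TABLE = build_small_caps_table()
REVERSE = {sub: letter for letter, sub in TABLE.items()}


def protect_lowercase(text):
    """Replace lowercase letters with small capitals before parsing."""
    return "".join(TABLE.get(ch, ch) for ch in text)


def restore_lowercase(text):
    """Undo protect_lowercase on generated output."""
    return "".join(REVERSE.get(ch, ch) for ch in text)
```

With this, a made-up transliterated form such as `taTa` would become `ᴛᴀTᴀ` on input, which uppercasing leaves distinct from `TATA`, and `restore_lowercase` turns it back into `taTa` on output. Printing `sorted(set("abcdefghijklmnopqrstuvwxyz") - set(TABLE))` shows which letters would still need some other representation.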

Or maybe someone has a better idea?

I’ve had a further thought: use token mapping rules to do the character substitution on input, and post-generation mapping rules to substitute back when generating.

Thank you, @johnca! Since the students are working with a transliteration anyway, I’ll just advise them to choose a different representation.