CNgram has been integrated into a medium-scale Lucene installation, where it determines the language of several tens of thousands of documents every night in a few minutes of CPU time. There is an
online demo of the Phoner
servlet from (byte) NGramJ. This application tries to find a word that matches
a given phone number. It does not use a dictionary of words;
instead, it uses an n-gram profile and tries to generate the most plausible
n-grams possible.
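The dictionary-free approach described above can be illustrated with a small sketch. This is not the Phoner servlet or the NGramJ API. It is a hypothetical stand-alone class (`PhonerSketch`, with made-up methods `buildProfile`, `score` and `bestWord`) that assumes the standard telephone keypad letter mapping and a simple character-trigram profile. Each candidate letter string is scored by summing `log(1 + count)` over its trigrams, and the highest-scoring candidate wins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of NGramJ.
public class PhonerSketch {

    // Standard telephone keypad letters; digits 0 and 1 carry no letters.
    private static final String[] KEYPAD = {
        "", "", "abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"
    };

    // Build a hypothetical trigram profile (trigram -> count) from sample text.
    // Runs of non-letters become '_' so word boundaries appear in the trigrams.
    static Map<String, Integer> buildProfile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String padded = "_" + text.toLowerCase().replaceAll("[^a-z]+", "_") + "_";
        for (int i = 0; i + 3 <= padded.length(); i++) {
            counts.merge(padded.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    // Plausibility of a candidate: sum of log(1 + count) over its padded trigrams.
    static double score(String word, Map<String, Integer> profile) {
        String padded = "_" + word + "_";
        double s = 0.0;
        for (int i = 0; i + 3 <= padded.length(); i++) {
            s += Math.log(1 + profile.getOrDefault(padded.substring(i, i + 3), 0));
        }
        return s;
    }

    // Enumerate every letter string the digits can spell and keep the best one.
    // Returns null if a digit (0 or 1) has no letters.
    static String bestWord(String digits, Map<String, Integer> profile) {
        List<String> candidates = new ArrayList<>();
        candidates.add("");
        for (char d : digits.toCharArray()) {
            if (d < '0' || d > '9') {
                throw new IllegalArgumentException("not a digit: " + d);
            }
            String letters = KEYPAD[d - '0'];
            if (letters.isEmpty()) {
                return null;
            }
            List<String> next = new ArrayList<>();
            for (String prefix : candidates) {
                for (char c : letters.toCharArray()) {
                    next.add(prefix + c);
                }
            }
            candidates = next;
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String c : candidates) {
            double s = score(c, profile);
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> profile =
            buildProfile("the cat sat on the mat and the hat was on the cat");
        System.out.println(bestWord("228", profile)); // prints "cat"
    }
}
```

Exhaustive enumeration grows as 3 to 4 candidates per digit, so it is only practical for short numbers. A version meant for real phone numbers would more likely prune with a beam search or dynamic programming over n-gram states. The real Phoner servlet may work differently.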
(Byte) NGramJ was evaluated for the AutoFocus product.