Run either byte-based NGramJ or character-based CNgram
Here are some common cases.
- You have files with text of unknown encoding.
Use byte NGramJ to determine both encoding and language.
- You have files with text of known encoding.
Use CNgram to determine the language of documents, including mixed-language documents.
- You don't have files, but Strings within your application.
Use CNgram to determine the language of Strings, including mixed-language Strings.
- You have structured files in XML/HTML.
Encoding is usually not the problem here, but you need to strip the markup with a parser first, then use CNgram to determine the language of documents, including mixed-language documents. Note: the parser has to be started separately.
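
For the XML/HTML case, the markup-stripping step could look roughly like the sketch below. It uses only the XML parser that ships with the JDK, so it assumes well-formed XML or XHTML; real-world HTML usually needs a more tolerant HTML parser. The class name MarkupStripper and the method extractText are hypothetical names chosen for this illustration and are not part of NGramJ. The CNgram call itself is not shown; the returned String is what you would hand to it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of NGramJ): reduces well-formed XML/XHTML to plain text.
public class MarkupStripper {

    /** Parses the markup and returns its text nodes joined by single spaces. */
    public static String extractText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations so that no external entities are resolved.
        // Consequence: input that starts with a DOCTYPE is refused by this sketch.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        StringBuilder out = new StringBuilder();
        collect(doc.getDocumentElement(), out);
        // Collapse the whitespace runs left behind where tags used to be.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    // Depth-first walk that keeps text and CDATA nodes and skips everything else.
    private static void collect(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // The trailing space keeps text from neighbouring elements apart.
            out.append(node.getNodeValue()).append(' ');
            return;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><h1>Bonjour</h1>"
                + "<p>Ceci est un <b>petit</b> exemple.</p></body></html>";
        // The printed text is what would be passed on for language detection.
        System.out.println(extractText(page));
    }
}
```

Inserting a space after every text node keeps words from adjacent elements (such as a heading followed by a paragraph) from running together, at the cost of splitting a word that has inline markup in the middle of it. For n-gram statistics that trade-off is usually acceptable, but it is a design choice of this sketch rather than a requirement of CNgram.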