Run either byte NGramJ or character CNgram
Here are some common cases.
- You have files with text of unknown encoding.
  Use byte NGramJ to determine both the encoding and the language.
- You have files with text of known encoding.
  Use CNgram to determine the language of single-language or mixed-language documents.
- You don't have files, but Strings within your application.
  Use CNgram to determine the language of single-language or mixed-language Strings.
- You have structured files in XML/HTML.
  Usually encoding is not the problem, but you need to get rid of the markup by running a parser first, then use CNgram to determine the language of single-language or mixed-language documents. Note: the parser has to be started separately, by some other means.
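
For the XML/HTML case, the markup-stripping step might look roughly like the sketch below. It uses only the XML parser that ships with the JDK and does not call NGramJ at all. The class name MarkupStripper and the method extractText are hypothetical names chosen for this illustration. The sketch assumes well-formed XML or XHTML input without a DOCTYPE; real-world HTML is often not well-formed and would need a dedicated HTML parser instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of NGramJ: removes markup from well-formed XML/XHTML.
class MarkupStripper {

    /** Returns the text content of the given XML, with runs of whitespace collapsed. */
    static String extractText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            collectText(doc.getDocumentElement(), out);
            return out.toString().replaceAll("\\s+", " ").trim();
        } catch (Exception e) {
            // Malformed input or parser configuration problems end up here.
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    /** Walks the tree depth-first and appends every text or CDATA node, skipping comments. */
    private static void collectText(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // A space after each text node keeps words from adjacent elements apart.
            // Side effect: a tag in the middle of a word splits that word.
            out.append(node.getNodeValue()).append(' ');
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
    }

    public static void main(String[] args) {
        String xml = "<html><body><h1>Guten Tag</h1>"
                   + "<p>Dies ist ein <b>kurzer</b> Text.</p></body></html>";
        // The resulting plain String is what would then be handed to CNgram.
        System.out.println(extractText(xml));
    }
}
```

The output of extractText is an ordinary String, so from that point on the situation is the same as the "Strings within your application" case above.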