2007-03-19 09:47:52 v1.0
NGramJ, smart scanning for document properties.
What is NGramJ?
Ngrams are a classical instrument in Natural Language Processing (NLP) applications. NGramJ is a Java-based library containing two types of ngram-based applications. Its major focus is to provide robust, state-of-the-art language recognition (or language guessing, as some more correctly call it). Both types are meant to be embedded into larger applications. Language recognition is not the only NLP application of ngrams, and NGramJ can be used as a building block in many different kinds of applications. However, language recognition was my major application, and NGramJ is therefore somewhat streamlined for it.
Once you are inside a program and working with Strings and other kinds of character sequences, CNgram is the only reasonable way to go. The CNgram library has been developed with multithreading and performance requirements in mind. CNgram also has a language recognition mechanism which, to some extent, successfully recognizes mixed-language documents. Caution: for historical reasons, NGramJ sometimes refers only to the (older) byte-based ngrams, excluding the newer addition of CNgram. I'm sorry about the confusion. There are alternative Java implementations of n-grams.
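To make the notion of a character ngram concrete, here is a minimal sketch in plain Java. It does not use or reproduce the NGramJ or CNgram API. The class name `CharNgramSketch` and the method `countNgrams` are hypothetical and exist only for illustration. The sketch counts every contiguous run of `n` characters in a character sequence, which is roughly the kind of raw statistic an ngram-based language guesser could start from.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is NOT part of NGramJ.
public class CharNgramSketch {

    // Count each contiguous character sequence of length n in the text.
    // A TreeMap keeps the keys sorted so the printed result is stable.
    public static Map<String, Integer> countNgrams(CharSequence text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            String gram = text.subSequence(i, i + n).toString();
            counts.merge(gram, 1, Integer::sum); // add 1, or start at 1
        }
        return counts;
    }

    public static void main(String[] args) {
        // Bigrams of "banana": ba, an, na, an, na
        System.out.println(countNgrams("banana", 2)); // prints {an=2, ba=1, na=2}
    }
}
```

A real language recognizer would go further, for example by comparing such counts against per-language profiles. That step is deliberately left out here because the details of how NGramJ does it are not described in this section.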
A Spieleck Project