Eugene Charniak breaks new ground in artificial intelligence research by presenting statistical language processing from an artificial intelligence point of view, in a text for researchers and scientists with a traditional computer science background. New, exacting empirical methods are needed to break the deadlock in such areas of artificial intelligence as robotics, knowledge representation, machine learning, machine translation, and natural language processing (NLP). It is time, Charniak observes, to switch paradigms. This text introduces statistical language processing techniques (word tagging, parsing with probabilistic context-free grammars, grammar induction, syntactic disambiguation, semantic word classes, and word-sense disambiguation), along with the underlying mathematics and chapter exercises.

Charniak points out that, as a method of attacking NLP problems, the statistical approach has several advantages. It is grounded in real text and therefore promises to produce usable results, and it offers an obvious way to proceed: "one simply gathers statistics."

Language, Speech, and Communication
There are more complete textbooks now that cover everything in this book in greater detail, but this was one of the first entries in the field, and it gives concise explanations of basic statistical language modeling, introducing you to n-grams, Hidden Markov Models, Probabilistic Context-Free Grammars, and so on. It also has the advantage of being a very small paperback, if you want a light reference to carry around with you.