Byte Pair Encoding is originally a compression algorithm that was adapted for NLP usage.
One of the important steps of NLP is determining the vocabulary.
There are different ways to model the vocabularly such as using an N-gram model, a closed vocabularly, bag of words, and etc. However, these methods are either very computationally memory heavy (usually paired with a large vocabularly size), do not handle OOV (out-of-vocabulary) words very well, or run into issues with words that rarely appear....
Published on September 05, 2020 18:14