Byte Pair Encoding for Natural Language Processing (NLP)

Byte Pair Encoding (BPE) is originally a data compression algorithm that was later adapted for use in NLP.
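
The adapted idea is to repeatedly merge the most frequent pair of adjacent symbols into a new symbol, building a subword vocabulary from the bottom up. Below is a minimal, hedged sketch of that merge loop; the toy corpus, the word frequencies, and the number of merges are assumptions made up purely for illustration, not anything from the original post.

    import re
    from collections import Counter

    def get_pair_counts(vocab):
        """Count each adjacent symbol pair, weighted by word frequency."""
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        return pairs

    def merge_pair(pair, vocab):
        """Merge every occurrence of `pair` into one new symbol across the vocabulary."""
        pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
        return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

    # Toy corpus (assumption): each word is a space-separated sequence of characters
    # plus an end-of-word marker, mapped to a made-up frequency.
    vocab = {
        "l o w </w>": 5,
        "l o w e r </w>": 2,
        "n e w e s t </w>": 6,
        "w i d e s t </w>": 3,
    }

    num_merges = 10  # assumption: a handful of merges just for illustration
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        print("merged:", best)

Each merge adds one new subword symbol (for example "es", then "est"), so the size of the learned vocabulary is controlled directly by the number of merges.
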


One of the important steps in NLP is determining the vocabulary.


There are different ways to model the vocabulary, such as using an N-gram model, a closed vocabulary, or a bag of words. However, these methods are either very memory heavy (usually paired with a large vocabulary size), do not handle OOV (out-of-vocabulary) words very well, or run into issues with words that rarely appear....
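
To make the OOV point concrete, here is a small hedged sketch: with a closed vocabulary, any word not in the fixed word list collapses into a single <unk> token, losing whatever information the rare or unseen word carried. The tiny vocabulary and sentence below are made-up assumptions for illustration.

    # Closed vocabulary: unseen words all map to the same <unk> placeholder.
    closed_vocab = {"the", "cat", "sat"}  # assumption: tiny illustrative word list

    def encode(tokens, vocab):
        return [tok if tok in vocab else "<unk>" for tok in tokens]

    print(encode("the cat sat on the mat".split(), closed_vocab))
    # ['the', 'cat', 'sat', '<unk>', 'the', '<unk>']

BPE sidesteps this by falling back to smaller subword pieces (ultimately single characters), so no word is ever entirely out of vocabulary.
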
