This repository contains a GPT-2 model that tokenizes text sentence by sentence. I've also made sure that words which cannot be mapped to the GPT-2 vocabulary are kept as they are and not split into fragments.
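The behaviour described above can be illustrated with a minimal sketch. It is not the repository's actual code: the helper names `split_sentences` and `tokenize_sentence` are hypothetical, the sentence splitter is a naive regular expression, and the small `vocab` dictionary is a toy stand-in for the real GPT-2 vocabulary. The point is only to show a word that is missing from the vocabulary being kept whole (with no id) instead of being broken into sub-word pieces.

```python
import re


def split_sentences(text):
    # Naive sentence splitter (an assumption for this sketch):
    # break on whitespace that follows '.', '!' or '?'.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize_sentence(sentence, vocab):
    # Split into words and single punctuation marks.
    words = re.findall(r"\w+|[^\w\s]", sentence)
    tokens = []
    for word in words:
        # vocab.get returns None for words that are not in the vocabulary;
        # such words are kept whole rather than split into fragments.
        tokens.append((word, vocab.get(word)))
    return tokens


# Toy vocabulary standing in for the GPT-2 vocabulary (hypothetical ids).
vocab = {"Hello": 0, "world": 1, ".": 2, "!": 3}

text = "Hello world. Foo zzzqx!"
for sentence in split_sentences(text):
    print(tokenize_sentence(sentence, vocab))
```

Each sentence is processed separately, and every token is returned as a `(word, id)` pair, so downstream code can tell mapped words from unmapped ones by checking whether the id is `None`.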
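Install the dependencies:
pip install transformers
pip install torch

Optionally, to use the latest development version of transformers, install it from source:
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install .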