Extractive Text Summarization

This is a PyTorch implementation of the paper SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents

Demo

Clone the project using the command line

git clone https://github.com/KristinaRay/text-summarization.git

cd text-summarization

To start Flask server locally run

python demo/app.py --model_dir <model_path> \
                   --tokenizer_dir <BPE_model_path> \
                   --host <host> \
                   --port <port>

Dependencies

Install dependencies by running

pip install -r requirements.txt

Dataset

CNN/DailyMail non-anonymized summarization dataset is used for training the extractive summarization model.

In order to get the dataset, run

python fetch_dataset.py

For more information about CNN/DailyMail dataset, please follow the link

Usage

To train the extractive summarization model, run

python main.py --num_samples <number of dataset samples> \
                --batch_size <batch size> \
                --num_epochs <number of epochs> \
                --clip <gradient clip value> \
                --seed <a seed value for random number generator>

Generate the extractive summary of a text file

python inference.py --model_path <path to the trained model> \
                --tokenizer_path <path to the trained BPE tokenizer> \
                --input_file_path <path to a text file>

Output

The summary of a text file is saved in txt file ./output.txt

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
demo		demo
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
dataloader.py		dataloader.py
fetch_dataset.py		fetch_dataset.py
inference.py		inference.py
main.py		main.py
models.py		models.py
requirements.txt		requirements.txt
secret.py		secret.py
train_model.py		train_model.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Extractive Text Summarization

Demo

Dependencies

Dataset

Usage

Output

About

Releases

Packages

Languages

License

KristinaRay/text-summarization

Folders and files

Latest commit

History

Repository files navigation

Extractive Text Summarization

Demo

Dependencies

Dataset

Usage

Output

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages