summa - textrank

TextRank implementation for text summarization and keyword extraction in Python. An online version can be tested here.

Features

Text summarization
Keyword extraction
Text modeling with graph and gexf exportation

Examples

Text summarization:

>>> text = """CMC InfoSec cho hay, ngày hôm qua, 10/10/2017, Microsoft đã phát hành Patch Tuesday tháng 10 của mình, vá 62 lỗi bảo mật trong đó có 1 lỗi Zero-day nghiêm trọng (CVE-2017-11826).

Lỗ hổng CVE-2017-11826 gây ra memory corruption (xung đột vùng nhớ) từ đó cho phép thực thi mã lệnh từ xa, tồn tại trong tất cả các bản Microsoft Office 2017, Word Automation Services và Microsoft Office Web Apps Server. Điều đáng lo ngại là lỗ hổng này đã bị khai thác.

“Kẻ tấn công có thể thực hiện lây nhiễm đến các máy nạn nhân bằng phương thức Phishing (tấn công lừa đảo - PV), phát tán các file .docx và RTF có chứa mã độc. Khi đã lây nhiễm thành công vào máy nạn nhân, chúng đợi nạn nhân đăng nhập vào hệ thống để chiếm toàn bộ các quyền admin gồm cài đặt chương trình; xem, thay đổi, xóa dữ liệu hoặc tạo ra các tài khoản admin với các quyền kiểm soát hệ thống”, theo thông báo của Microsoft.

Lỗ hổng CVE -2017 - 11826 đã được công ty bảo mật Qihoo 360 Core Security phát hiện và cảnh báo đến Microsoft vào khoảng trung tuần tháng 9/2017. Theo Qihoo 360 Core Security, lỗ hổng CVE - 2017 - 11826 đã bị tin tặc khai thác và thực hiện tấn công lên một nhóm người dùng nhất định.

Qua phân tích đảo chiều của mẫu máy chủ điều khiển, hãng Qihoo 360 Core Security cũng cho biết cuộc tấn công đã được chuẩn bị từ trước tháng 8/2017 và khởi động vào đầu tháng 9 vừa qua."""

>>> from summa import summarizer
>>> print summarizer.summarize(text)
'Lỗ hổng CVE -2017 - 11826 đã được công ty bảo mật Qihoo 360 Core Security phát hiện và cảnh báo đến Microsoft vào khoảng trung tuần tháng 9/2017.

Theo Qihoo 360 Core Security, lỗ hổng CVE - 2017 - 11826 đã bị tin tặc khai thác và thực hiện tấn công lên một nhóm người dùng nhất định.'

Keyword extraction:

>>> from summa import keywords
>>> print keywords.keywords(text)
document
automatic summarization
technologies
technology

Installation

This software depends on NumPy and Scipy, two Python packages for scientific computing. You must have them installed prior to installing summa:

pip install summa

If you are going to use the export function, you also need NetworkX. For a better performance of keyword extraction, install Pattern

This version has been tested under Python 2.7

More examples

Command-line usage:

cd path/to/folder/summa/
python textrank.py -t FILE

Export:

>>> from summa.export import gexf_export
>>> gexf_export(text, path="graph.gexf")

Define length of the summary as a proportion of the text (also available in keywords):
```
>>> from summa.summarizer import summarize
>>> summarize(text, ratio=0.2)
```
Define length of the summary by aproximate number of words (also available in keywords):
```
>>> summarize(text, words=50)
```
Define input text language (also available in keywords):
```
>>> summarize(text, language='spanish')
```

The available languages are "danish", "dutch", "english", "finnish", "french", "german", "hungarian", "italian", "norwegian", "porter", "portuguese", "romanian", "russian", "spanish", "swedish"

Get results as a list (also available in keywords):

>>> summarize(text, split=True)
['Automatic summarization is the process of reducing a text document with a
computer program in order to create a summary that retains the most important
points of the original document.']

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.vscode		.vscode
summa		summa
test		test
.gitignore		.gitignore
LICENSE		LICENSE
MANIFEST		MANIFEST
README		README
README.rst		README.rst
__init__.py		__init__.py
document.txt		document.txt
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

summa - textrank

Features

Examples

Installation

More examples

About

Uh oh!

Releases

Packages

Languages

License

hung-ai-dev/VNMDocSummarize

Folders and files

Latest commit

History

Repository files navigation

summa - textrank

Features

Examples

Installation

More examples

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages