docxpy

This project is forked from ankushshah89/python-docx2txt. It adds one new feature: extracting hyperlinks together with their corresponding text.

It is a pure Python utility for extracting text from docx files. The code is taken and adapted from python-docx. In addition to body text, it can extract text from headers, footers, and hyperlinks, and it can also extract images.
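
To give a rough idea of why headers and footers need separate handling: a docx file is a zip archive, and header and footer text is stored in XML parts separate from the main body. The following is a minimal standard-library sketch of that idea, not docxpy's actual implementation. The function names (`part_text`, `extract_text`, `make_sample_with_header`) are hypothetical, and the sample archive is a stripped-down stand-in for a real docx file.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Header and footer parts are typically named word/header1.xml, word/footer1.xml, ...
PART = re.compile(r"word/(header|footer)\d*\.xml$")


def part_text(xml_bytes):
    """Join the text runs of each paragraph (w:p), one line per paragraph."""
    root = ET.fromstring(xml_bytes)
    lines = []
    for para in root.iter("{%s}p" % W):
        lines.append("".join(t.text or "" for t in para.iter("{%s}t" % W)))
    return "\n".join(lines)


def extract_text(docx):
    """Collect text from headers, the main body, and footers, in that order."""
    with zipfile.ZipFile(docx) as zf:
        names = zf.namelist()
        headers = sorted(n for n in names if PART.match(n) and "header" in n)
        footers = sorted(n for n in names if PART.match(n) and "footer" in n)
        parts = headers + ["word/document.xml"] + footers
        return "\n".join(part_text(zf.read(n)) for n in parts)


def make_sample_with_header():
    """Build a tiny in-memory archive shaped like a docx (demo only)."""
    def wrap(tag, text):
        return ('<w:%s xmlns:w="%s"><w:p><w:r><w:t>%s</w:t></w:r></w:p></w:%s>'
                % (tag, W, text, tag))

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/header1.xml", wrap("hdr", "Header"))
        zf.writestr("word/document.xml", wrap("document", "Body"))
        zf.writestr("word/footer1.xml", wrap("ftr", "Footer"))
    buf.seek(0)
    return buf


print(extract_text(make_sample_with_header()))
```

For real documents, prefer `docxpy.process`; the sketch ignores tables, text boxes, and other structures that a complete extractor has to deal with.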

How to install?

pip install docxpy

How to run?

From command line:

# extract text
docx2txt file.docx
# extract text and images
docx2txt -i /tmp/img_dir file.docx

From python:

import docxpy

file = 'file.docx'

# extract text
text = docxpy.process(file)

# extract text and write images in /tmp/img_dir
text = docxpy.process(file, "/tmp/img_dir")


# if you want the hyperlinks
doc = docxpy.DOCReader(file)
doc.process()  # process file
hyperlinks = doc.data['links']
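
For readers curious about what hyperlink extraction involves, here is a minimal standard-library sketch. It is an illustration under stated assumptions, not docxpy's own code, and it does not claim to reproduce the exact shape of `doc.data['links']`. In a docx archive, each `w:hyperlink` element in `word/document.xml` refers to a relationship id, and `word/_rels/document.xml.rels` maps that id to the target URL. The names `extract_hyperlinks` and `make_sample_with_link` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces for WordprocessingML (w:), relationship attributes (r:),
# and the package relationships file.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006/relationships"


def extract_hyperlinks(docx):
    """Return (text, url) pairs for hyperlinks in the main document body."""
    with zipfile.ZipFile(docx) as zf:
        body = ET.fromstring(zf.read("word/document.xml"))
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))

    # Map relationship ids (e.g. "rId1") to their targets.
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter("{%s}Relationship" % PKG)
    }

    links = []
    for link in body.iter("{%s}hyperlink" % W):
        rid = link.get("{%s}id" % R)
        if rid not in targets:
            continue  # e.g. internal anchors, which carry no relationship id
        text = "".join(t.text or "" for t in link.iter("{%s}t" % W))
        links.append((text, targets[rid]))
    return links


def make_sample_with_link():
    """Build a tiny in-memory archive containing one hyperlink (demo only)."""
    document = (
        '<w:document xmlns:w="%s" xmlns:r="%s"><w:body><w:p>'
        '<w:hyperlink r:id="rId1"><w:r><w:t>Example</w:t></w:r></w:hyperlink>'
        '</w:p></w:body></w:document>' % (W, R)
    )
    rels = (
        '<Relationships xmlns="%s">'
        '<Relationship Id="rId1" Target="https://example.com" TargetMode="External"/>'
        '</Relationships>' % PKG
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", document)
        zf.writestr("word/_rels/document.xml.rels", rels)
    buf.seek(0)
    return buf


print(extract_hyperlinks(make_sample_with_link()))
```

The sketch assumes the relationships part exists and only looks at the main body; hyperlinks in headers or footers live in their own parts with their own relationships files.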
