Documents to HTML converter
Extension | Text | Styles extraction | Images extraction |
---|---|---|---|
HTML/XHTML | Yes | Yes | Yes |
XML | Yes | Not applicable | Not applicable |
DOCX | Yes | Yes | Yes |
DOC | Yes | No | No |
RTF | Yes | Yes | Yes |
ODT | Yes | Yes | Yes |
XLSX | Yes | Yes | Yes |
XLS | Yes | Yes | No |
CSV | Yes | Not applicable | Not applicable |
TXT/MD | Yes | Yes | Yes |
JSON | Yes | Not applicable | Not applicable |
EPUB | Yes | Yes | Yes |
PDF | Yes | No | Yes |
PPT | Yes | No | No |
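For batch scripts, the table above can be encoded as a small lookup so that `-s` (styles) and `-i` (images) are passed only for formats listed as supporting them. This is a minimal sketch: `flags_for` is a hypothetical helper, not part of document2html, and it covers only the rows whose extension is named in the table. It is not stated here whether the tool tolerates flags a format does not support, so the sketch simply avoids passing them.

```shell
#!/bin/sh
# flags_for FILE: print the extraction flags the support table lists for
# FILE's extension. Hypothetical helper, for illustration only.
flags_for() {
    # Take the text after the last dot and lower-case it.
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        html|xhtml|docx|rtf|odt|xlsx|txt|md|epub)
            echo "-si" ;;   # styles and images
        xls)
            echo "-s" ;;    # styles only
        *)
            echo "" ;;      # text only, or extension not covered
    esac
}
```

A call such as `document2html -f report.xls -o out $(flags_for report.xls)` would then request style extraction only.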
cURL for downloading images:
apt-get install libcurl4-openssl-dev
or
brew install curl
iconv for encoding conversion
sudo apt-get install libc6
or
brew install libiconv
Tidy for cleaning and repairing HTML
sudo apt-get install libtidy-dev
or
brew install tidy-html5
file for determining file extension
- getoptpp - Command line options parser
- lodepng - PNG encoder and decoder
- miniz - Data compression library
- json - JSON parser
- pugixml - XML parser
Make sure the Qt (>= 5.6) development libraries are installed:
- In Ubuntu/Debian:
apt-get install qt5-default qttools5-dev-tools zlib1g-dev
- In Fedora:
sudo dnf install qt5-qtbase-devel qt5-qttools-devel zlib-devel
- In Arch Linux:
pacman -S qt
- In Mac OS X with Homebrew:
brew install qt5
brew link qt5 --force
- Or you can download Qt from: https://www.qt.io/download-open-source/
Now you can compile by running:
qmake (or qmake-qt5 on some systems)
make
To do a shadow build, run qmake from a different directory and point it at document2html.pro, for example:
mkdir build
cd build
qmake ../src/document2html.pro
make
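Because the qmake binary is called `qmake` on some systems and `qmake-qt5` on others, a build script may want to detect which one is present. The following is a sketch under that assumption; `find_qmake` and `shadow_build` are hypothetical helper names, and the `.pro` path is the one used in the shadow build above.

```shell
#!/bin/sh
# find_qmake: print the first qmake binary found on PATH, or fail.
find_qmake() {
    for candidate in qmake qmake-qt5; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# shadow_build: run the shadow build with whichever qmake was found.
# Must be called from the repository root.
shadow_build() {
    qmake_bin=$(find_qmake) || { echo "qmake not found" >&2; return 1; }
    mkdir -p build && cd build || return 1
    "$qmake_bin" ../src/document2html.pro && make
}
```

Calling `shadow_build` from the repository root performs the same steps as the manual commands and stops early if no qmake is installed.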
If you have ideas on how to build the project with CMake instead of Qt's qmake, please contact me.
Usage:
document2html -f|-d <input file|dir> -o <output dir> [-si]
document2html -h
document2html -v
Options:
Short Flag | Long Flag | Description |
---|---|---|
-f | --file | Input file |
-d | --dir | Input directory |
-o | --out | Output directory |
-s | --style | Extract styles |
-i | --image | Extract images |
-h | --help | Display help message |
-v | --version | Display package version |
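As an illustration of the synopsis and options above, the sketch below wraps the two input modes in one function. The helper names (`input_flag`, `convert`) and the file and directory names in the comments are placeholders, not part of document2html; only the flags are taken from the options table.

```shell
#!/bin/sh
# input_flag PATH: choose -d for a directory, -f for anything else.
input_flag() {
    if [ -d "$1" ]; then
        echo "-d"
    else
        echo "-f"
    fi
}

# convert INPUT OUTDIR: convert a file or directory to HTML,
# extracting styles (-s) and images (-i) as well.
convert() {
    document2html "$(input_flag "$1")" "$1" -o "$2" -si
}

# Example calls (placeholder paths):
#   convert report.docx ./html
#   convert ./documents ./html
```

Dropping `-si` from the `convert` function gives a text-only conversion.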
- rembish - DOC, PPT and PDF converter (PHP)
- PolicyStat - DOCX converter (Python)
- python-excel - XLSX and XLS converter (Python)
- lvu - RTF converter (C++)
- adhocore - TXT/MD converter (PHP)
- ahupp - libmagic wrapper (Python)
If you have questions about the libraries, please open an issue on GitHub. Describe your request, problem, or question in as much detail as possible, and mention the version of the libraries you are using as well as the versions of your compiler and operating system. Opening an issue on GitHub allows other users and contributors to these libraries to collaborate.
You're welcome! :)