PDFXtract

A cross-platform utility that converts PDF documents into Markdown-formatted text.

Screenshots: macOS, Windows

Description

PDFXtract is a lightweight utility that converts PDF files into plain text with Markdown formatting. It is meant for anyone who needs to pull text out of a PDF document, whether an academic researcher, a data analyst, or an occasional user.

Features

PDF to Markdown: Extracts text from a PDF and formats it in Markdown for easy reading and further editing.
Multi-Page Support: Handles multi-page PDFs with ease.
Clipboard Integration: Copy the entire text or a specific page directly to your clipboard.
Markdown Preview: Preview how the Markdown-formatted text will look in a web browser.
Cross-Platform: Available for both Windows and macOS.
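The README does not say which PDF library `src.py` uses for extraction, so the sketch below starts one step later. Assuming the text of each page has already been extracted into a list of strings, this is one way the multi-page result could be assembled into a single Markdown document. The helper name `pages_to_markdown` and the `## Page N` heading convention are hypothetical illustrations, not taken from `src.py`.

```python
from typing import List


def pages_to_markdown(pages: List[str]) -> str:
    """Join per-page text into one Markdown document.

    Hypothetical helper (not from src.py): each page gets a level-2
    heading so page boundaries stay visible in the output.
    """
    sections = []
    for number, text in enumerate(pages, start=1):
        # Trim trailing spaces on every line; PDF extraction often leaves them behind.
        body = "\n".join(line.rstrip() for line in text.strip().splitlines())
        sections.append(f"## Page {number}\n\n{body}")
    # Separate pages with a Markdown horizontal rule.
    return "\n\n---\n\n".join(sections) + "\n"


if __name__ == "__main__":
    sample = ["Introduction  \nFirst line of page one.", "Results\nSecond page text.  "]
    print(pages_to_markdown(sample))
```

Keeping one heading per page makes it easy to locate a page later in the Markdown preview, at the cost of inserting headings that were not in the original PDF.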

Installation

For Windows Users

Download the .exe file from the Releases page.
Double-click the .exe file to install PDFXtract.
Follow the on-screen instructions to complete the installation.
Alternatively, you can click here to download the installer directly.

For macOS Users

Download the .dmg file from the Releases page.
Double-click the .dmg file to mount it.
Drag the PDFXtract.app into your Applications folder.
Alternatively, you can click here to download the disk image directly.

Using `src.py` directly

Download the `src.py` and `requirements.txt` files.
Install the required packages with `pip install -r requirements.txt`.
Run the script with `python src.py` (or `python3 src.py`, depending on your system).

How to Use

Open PDF File: Click the "Open PDF File" button to load your PDF.
Convert: Click the "Convert" button to start the extraction process.
Copy to Clipboard: If you want to copy the text to your clipboard, click the "Copy to Clipboard" button.
Copy Specific Page: To copy a specific page, enter the page number and click the "Copy Specific Page" button.
Preview Markdown: Click the "Preview Markdown" button to see how the Markdown-formatted text will appear in a web browser.
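The page number typed for "Copy Specific Page" is presumably 1-based, while Python lists are 0-based. A minimal sketch of that lookup, assuming the extracted pages are held as a list of strings, validates the input before indexing. The helper name `get_page_text` is hypothetical and does not come from `src.py`.

```python
from typing import List


def get_page_text(pages: List[str], page_number: str) -> str:
    """Return the text of a 1-based page number entered by the user.

    Hypothetical helper (not from src.py): raises ValueError for
    non-numeric or out-of-range input instead of guessing.
    """
    try:
        number = int(page_number.strip())
    except ValueError:
        raise ValueError(f"Not a page number: {page_number!r}") from None
    if not 1 <= number <= len(pages):
        raise ValueError(f"Page {number} is out of range (1-{len(pages)})")
    # Convert the 1-based page number to a 0-based list index.
    return pages[number - 1]


if __name__ == "__main__":
    print(get_page_text(["first", "second", "third"], "2"))
```

The explicit range check matters for input `0`: without it, `pages[0 - 1]` would silently return the last page instead of reporting an error.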

Contributing

I welcome contributions from the community. Feel free to submit issues or create pull requests.

License

This project is licensed under the MIT License.

Author

@wjgoarxiv
