A cross-platform utility that converts PDF documents into Markdown-formatted text.
PDFXtract is a simple yet powerful utility for converting PDF files to plain text with Markdown formatting. Whether you are an academic researcher, a data analyst, or just someone who needs to extract text from a PDF document, PDFXtract has got you covered.
- PDF to Markdown: Extracts text from a PDF and formats it in Markdown for easy reading and further editing.
- Multi-Page Support: Handles multi-page PDFs with ease.
- Clipboard Integration: Copy the entire text or a specific page directly to your clipboard.
- Markdown Preview: Preview how the Markdown-formatted text will look in a web browser.
- Cross-Platform: Available for both Windows and macOS.
- Download the `.exe` file from the Releases page.
- Double-click the `.exe` file to install PDFXtract.
- Follow the on-screen instructions to complete the installation.
- You can press here, too.
- Download the `.dmg` file from the Releases page.
- Double-click the `.dmg` file to mount it.
- Drag the `PDFXtract.app` into your Applications folder.
- You can press here, too.
- Download the `src.py` and `requirements.txt` files.
- Install the required packages using `pip install -r requirements.txt`.
- Run the `src.py` file using `python src.py` or `python3 src.py`.
- Open PDF File: Click the "Open PDF File" button to load your PDF.
- Convert: Click the "Convert" button to start the extraction process.
- Copy to Clipboard: If you want to copy the text to your clipboard, click the "Copy to Clipboard" button.
- Copy Specific Page: To copy a specific page, enter the page number and click the "Copy Specific Page" button.
- Preview Markdown: Click the "Preview Markdown" button to see how the Markdown-formatted text will appear in a web browser.
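
For readers running from source, the logic behind the convert, copy-page, and preview steps above might look roughly like the sketch below. This is not PDFXtract's actual code: the function names (`pages_to_markdown`, `get_page`, `preview_html`) are hypothetical, the one-heading-per-page layout is an assumption, and the input is assumed to be a list of already-extracted page strings rather than a PDF file. It uses only the Python standard library.

```python
import html


def pages_to_markdown(pages):
    """Join per-page text into one Markdown document, one heading per page.

    `pages` is assumed to be a list of strings, one per PDF page.
    """
    sections = []
    for number, text in enumerate(pages, start=1):
        sections.append(f"## Page {number}\n\n{text.strip()}")
    return "\n\n".join(sections)


def get_page(pages, page_number):
    """Return the text of a 1-based page number, or raise ValueError."""
    if not 1 <= page_number <= len(pages):
        raise ValueError(
            f"Page {page_number} is out of range (1-{len(pages)})"
        )
    return pages[page_number - 1].strip()


def preview_html(markdown_text):
    """Wrap the Markdown source in a minimal HTML page.

    The text is escaped and shown as-is; rendering it as formatted
    Markdown would need a third-party converter, which is left out here.
    """
    escaped = html.escape(markdown_text)
    return f"<html><body><pre>{escaped}</pre></body></html>"


if __name__ == "__main__":
    sample_pages = ["First page text. ", "Second page text."]
    print(pages_to_markdown(sample_pages))
    print(get_page(sample_pages, 2))
```

Page numbers are treated as 1-based to match what a user would type into the page field. An out-of-range number raises an error instead of silently returning nothing.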
I welcome contributions from the community. Feel free to submit issues or create pull requests.
This project is licensed under the MIT License.