[Programming question]: docx2md #106
Comments
This message was generated by git-bob (version: 0.14.1dev, model: claude-3-5-sonnet-20241022, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.
Hi! I'm git-bob, happy to help with document conversion. Here's a notebook showing how to convert between docx and markdown:
import jupytext
# create notebook: parse the py:percent script text into a notebook object
# (the outer string uses ''' so the inner docstrings can keep their triple double quotes)
notebook = jupytext.reads('''
# Document conversion functions
import pypandoc
from docx import Document
import os

def docx_to_markdown(docx_path, output_path):
    """Convert docx to markdown"""
    pypandoc.convert_file(docx_path, 'md', outputfile=output_path)

def markdown_to_docx(markdown_path, output_path):
    """Convert markdown to docx"""
    pypandoc.convert_file(markdown_path, 'docx', outputfile=output_path)

# Example usage:
if __name__ == "__main__":
    # Convert docx to markdown
    docx_to_markdown("example.docx", "output.md")
    # Convert markdown to docx
    markdown_to_docx("example.md", "output.docx")
''', fmt='py:percent')
# serialize the notebook as .ipynb JSON and print it
print(jupytext.writes(notebook, fmt='ipynb'))
This requires installing:
You'll also need to have pandoc installed on your system. On Ubuntu/Debian:
Let me know if you need any adjustments!
git-bob implement this as an actual Jupyter notebook, without the jupytext stuff around it. Use the file "test.docx" as example input and save the result as "test.md".
For the record: I implemented this without pandoc and without reusing pre-existing libraries, to have a permissively licensed, Python-only implementation for simple docx files: https://github.com/haesleinhuepf/docx2markdown/
I need a Python function that converts a docx document to markdown and a Python function that converts a markdown file to docx.
git-bob think about this