Python scraper based on AI
Updated Jul 3, 2025 - Python
Python package and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 Markdown file for each page, designed for LLM retrieval-augmented generation (RAG).
Export Atlassian Confluence pages as markdown files.
Multimodal document parser for high-quality data understanding and extraction.
URL to Markdown API is a service that converts web content into clean, structured Markdown through a simple HTTP GET request. It is built with FastAPI and the MarkItDown library and offers a straightforward way to convert various content types (web pages, YouTube videos, PDFs, documents) into Markdown optimized for large language models.
✅ Parse your browser's exported HTML bookmark file to Markdown.
Python, JavaScript, and Rust libraries for the Spider Cloud API.
Turns supported file types (e.g. .docx) into Markdown-structured text files. Optionally defangs indicators and extracts text from images. Built for threat-intel use cases.
ScrapeGraphAI is a Python-based web-scraping framework that pairs large-language-model reasoning with a graph-style pipeline engine to turn websites (or local XML/HTML/JSON/Markdown files) into structured data with just a handful of lines of code.
Python script to convert Google Keep HTML note exports into Markdown (.md) files suitable for importing into Joplin.
Tooling to extract content from the old Geotribu website (web scraping, conversion to Markdown...)
A CLI tool to fetch a web page's main content and print it as Markdown.
A simplified online encyclopedia with Markdown-formatted entries. Powered by Django.
Website text scraper with conversion to Markdown (.md) files and directory structuring.
Convert HTML to Discord's Markdown-formatted text.
Leverage Reader-LM's capabilities using LitServe.
HTML to Markdown Converter: convert your HTML files to Markdown in just a few steps.
Web scraping for Codewars that fetches all solution code, along with the corresponding READMEs, in one go.
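The projects listed here differ in scope, but the conversion step most of them share amounts to walking the HTML and emitting the matching Markdown syntax for each element. The sketch below illustrates that idea with the Python standard-library html.parser module. It is not the implementation used by any project on this page, and the names MarkdownConverter and html_to_markdown are hypothetical. It handles only headings, paragraphs, line breaks, bold/italic, inline code, links, and list items, and it drops script and style content.

```python
import re
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Hypothetical minimal HTML-to-Markdown converter (illustrative sketch only)."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.parts = []       # Markdown fragments in document order
        self.href_stack = []  # link targets of currently open <a> tags
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.parts.append("\n\n" + "#" * self.HEADINGS[tag] + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "li":
            # Every list item becomes a "- " bullet (ordered lists are not numbered).
            self.parts.append("\n- ")
        elif tag == "br":
            self.parts.append("\n")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "code":
            self.parts.append("`")
        elif tag == "a":
            self.href_stack.append(dict(attrs).get("href") or "")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.HEADINGS or tag in ("p", "ul", "ol"):
            self.parts.append("\n")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "code":
            self.parts.append("`")
        elif tag == "a" and self.href_stack:
            self.parts.append("](" + self.href_stack.pop() + ")")

    def handle_data(self, data):
        if self.skip_depth:
            return  # drop script and style bodies
        # Collapse whitespace runs to one space, roughly as a browser renders text.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    text = "".join(converter.parts)
    # Trim each line, then squeeze runs of blank lines down to a single blank line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    return re.sub(r"\n{3,}", "\n\n", text).strip()


if __name__ == "__main__":
    sample = (
        '<h1>Title</h1><p>Some <strong>bold</strong> and '
        '<a href="https://example.com">a link</a>.</p>'
        "<ul><li>one</li><li>two</li></ul><script>var x = 1;</script>"
    )
    print(html_to_markdown(sample))
```

Running the script prints a level-one heading, a paragraph with bold text and an inline link, and a two-item bullet list, with the script body discarded. Full converters also handle tables, nested and numbered lists, images, and escaping of Markdown special characters inside text; this sketch does none of that.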