Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Updated May 30, 2025 - Python
Parse a Markdown article, download its images, and replace the image URLs with local paths
Reddit bot to preview and post hyperlinks as comments
NLP Web Service
Extract an article or news story from a URL or HTML, parse the title and content, and output the result in Markdown format.
A client-side post search tool for DCInside (디시인사이드).
📚 A collection of useful Natural Language Processing utilities: detecting the language of a text, splitting text into sentences, and extracting the main content from an HTML document
Scrapes the content of web articles, given either a single URL or a set of URLs listed in a text file. This project uses the newspaper3k and python-docx libraries. The output is a neatly formatted Word document in '.docx' format containing the contents of the article.
Nebula Expired Article Hunter is a marketing tool for retrieving expired content from www.archive.org (the Wayback Machine). You can use this content to grow your blog with evergreen information, improve your marketing campaigns without investing in writing services, or for any other purpose you find useful.
Simple HTTP API endpoint that takes the URL of any article and returns a JSON object containing information about the article.
A Python script to scrape articles from Prothom Alo, capturing the Headline, Category, URL, and Summary
Transform messy HTML from Google Docs into well-structured HTML!
The main goal of an AI-Powered News Summarizer is to help users quickly understand the main points and essential information in a large volume of news articles or other textual content. By automatically summarizing news articles, it saves time and effort, giving users a brief overview so they do not have to read the entire article.
Cortex AI: Multi-Model Insights Hub is an advanced platform that leverages cutting-edge AI to empower your research, analysis, and data exploration. By integrating multiple Large Language Models (LLMs) with a sophisticated Retrieve-and-Generate (RAG) system
Scrape Yılmaz Özdil articles and build a Markov model to generate newspaper articles in the style of Yılmaz Özdil. Turkish text dataset creator for data science and NLP projects.
A simple article picker: it scrapes the website http://mawdoo3.com and picks a random article to show you
Scraping tool designed to cleanly extract the content of online articles (blogs, news, publications). It automates the collection of text data, cleans the content (removing tags, ads, etc.), and supports structured export for later analysis (NLP, summarization, monitoring, etc.).
Arachnio client library for Python 3.10+
Universal scraping platform for news and comments