Telegram Article Scraper Bot

Code that lets a Telegram bot scrape article links from a website using BeautifulSoup4 and send them to your chat.

Features:

🎨 Customizable - You can try almost any publication site (though some block web scraping).
⚡ Quick & efficient - The scraper pulls article links quickly and sends them directly to your chat.
📰 Article previews - Telegram displays a preview of each article link in the chat.

Instructions

Use a device that can run 24/7, such as a Raspberry Pi.
Install Python and the python-telegram-bot library on that device, along with requests and beautifulsoup4. The async examples below assume python-telegram-bot version 20 or later.
Use Telegram's BotFather to create a new bot, then put the API token it gives you in your Python file.
Find the publication site(s) you want to pull articles from and identify the class name of the element that holds each article link (the examples below match on the class of the <a> tag). A browser's inspect feature makes this easy.
Edit the Python file so the scraping function uses your website URL, the article class name, and the href attribute to pull the article links.
Run the bot and check that it sends the article URLs to your chat. (Use the 'test.py' file if you want to test the web scraper on its own before adding anything to the Telegram bot code.)
📚 👓 Enjoy!
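The scraping step can be tried on its own before any bot code is involved. The sketch below is not the repository's 'test.py'; it is a minimal illustration that assumes the links you want are <a> tags sharing one class. The helper name `extract_links` and the sample HTML are made up for the example.

```python
from bs4 import BeautifulSoup


def extract_links(html: str, link_class: str, limit: int = 10) -> list[str]:
    """Return the href of at most `limit` <a> tags that carry `link_class`."""
    soup = BeautifulSoup(html, "html.parser")
    # limit= makes BeautifulSoup stop after that many matching tags.
    anchors = soup.find_all("a", class_=link_class, limit=limit)
    # Skip any matching anchor that has no href instead of raising KeyError.
    return [a["href"] for a in anchors if a.has_attr("href")]


# Hypothetical page fragment standing in for a real publication site.
SAMPLE_HTML = """
<div class="listing">
  <a class="article-link" href="https://example.com/news/one">One</a>
  <a class="nav-link" href="https://example.com/about">About</a>
  <a class="article-link" href="https://example.com/news/two">Two</a>
</div>
"""

if __name__ == "__main__":
    for url in extract_links(SAMPLE_HTML, "article-link"):
        print(url)
```

To try it against a live site, pass the text of a `requests.get(...)` response as `html`. If the list comes back empty, the class name is probably wrong or the site is blocking the request.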

Examples:

PC Gamer

import requests
from bs4 import BeautifulSoup
from telegram import Update
from telegram.ext import ContextTypes

async def pcgamer(update: Update, context: ContextTypes.DEFAULT_TYPE):
    await context.bot.send_message(chat_id=update.effective_chat.id, text="Let's go!")

    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36'}
    # A timeout keeps the bot from hanging if the site does not respond
    web = requests.get("https://www.pcgamer.com/news/", headers=headers, timeout=10)

    soup = BeautifulSoup(web.text, "html.parser")

    # limit=10 pulls only the first 10 articles
    for link in soup.find_all('a', class_="article-link", limit=10):
        await context.bot.send_message(chat_id=update.effective_chat.id, text=link['href'])  # send the href to the chat
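A handler like the one above does nothing until it is registered with a running bot. The following is a rough sketch of that wiring with python-telegram-bot version 20 or later. Reading the token from an environment variable named `TELEGRAM_BOT_TOKEN`, and the helper names `load_token` and `run_bot`, are assumptions made for this example; pasting the token into the file works as well.

```python
import os


def load_token(env_var: str = "TELEGRAM_BOT_TOKEN") -> str:
    """Read the BotFather token from an environment variable (name is an assumption)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to the token BotFather gave you")
    return token


def run_bot(handlers: dict) -> None:
    """Register one chat command per entry in `handlers`, then poll for updates."""
    # Imported here so load_token() can be used without the bot library installed.
    from telegram.ext import ApplicationBuilder, CommandHandler

    application = ApplicationBuilder().token(load_token()).build()
    for command, callback in handlers.items():
        # CommandHandler("pcgamer", ...) reacts to /pcgamer in the chat.
        application.add_handler(CommandHandler(command, callback))
    application.run_polling()  # blocks until the process is stopped
```

Calling `run_bot({"pcgamer": pcgamer})` at the bottom of the bot file would make the bot respond to /pcgamer. More commands can be added to the dictionary in the same way.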

Reuters World News

async def reuters(update: Update, context: ContextTypes.DEFAULT_TYPE):
    await context.bot.send_message(chat_id=update.effective_chat.id, text="Here are the current events")

    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36'}
    web = requests.get("https://www.reuters.com/world/", headers=headers, timeout=10)

    soup = BeautifulSoup(web.text, "html.parser")
    site = "https://www.reuters.com"

    # Generated class names like these may change; re-check them in the browser if no links come back
    article_class = "text__text__1FZLe text__dark-grey__3Ml43 text__inherit-font__1Y8w3 text__inherit-size__1DZJi link__underline_on_hover__2zGL4 media-story-card__heading__eqhp9"

    # limit=10 pulls only the first 10 articles
    for link in soup.find_all('a', class_=article_class, limit=10):
        await context.bot.send_message(chat_id=update.effective_chat.id, text=site + link['href'])  # the href is relative, so prepend the site address
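The Reuters example builds each URL by joining the site address and a relative href as strings. A slightly more forgiving option, sketched here as a suggestion and not as the repository's method, is `urllib.parse.urljoin` from the standard library, which also leaves already-absolute links untouched. The helper name `absolute_url` and the sample paths are made up for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.reuters.com/world/"


def absolute_url(href: str, base: str = BASE_URL) -> str:
    """Resolve `href` against `base`; absolute hrefs are returned unchanged."""
    return urljoin(base, href)


if __name__ == "__main__":
    # Hypothetical hrefs of the two kinds a page might contain.
    print(absolute_url("/world/europe/sample-story/"))
    print(absolute_url("https://example.com/already-absolute"))
```

With this helper, the send call could pass `absolute_url(link['href'])` as the message text, and the same function would work for sites that mix relative and absolute links.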

Questions?

Feel free to post in the Discussions or Issues tabs.


brihuang99/telegram-article-scraper-bot
