This Python script uses the BeautifulSoup library to scrape the text of a Wikipedia article and then generates a summary of that text.
- Install Required Libraries: Make sure you have the necessary libraries installed. If not, you can install them using pip:
pip install beautifulsoup4
pip install lxml
pip install nltk
- Run the Script: Run the Python script. It will prompt you to enter the topic you want to search on Wikipedia.
python main.py
- View the Summary: The script will then scrape the Wikipedia page related to the entered topic, summarize the content, and print the summary.
- Python 3
- BeautifulSoup (`bs4`)
- lxml
- NLTK (Natural Language Toolkit)
- Scraping Wikipedia: The script takes user input to search for a topic on Wikipedia. It then fetches the content of the Wikipedia page for that topic using the `urllib` library.
- Parsing and Formatting: BeautifulSoup is used to parse the HTML content of the Wikipedia page. The script extracts the text from paragraph tags (`<p>`) and removes any unwanted characters and numbers.
- Summarization: The script tokenizes the article text into sentences and calculates the frequency of each word. It assigns a score to each sentence based on the frequency of the words it contains. The sentences with the highest scores are selected to form the summary.
- Displaying Summary: The summary, consisting of the 7 highest-scoring sentences, is printed to the console.
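The summarization step described above can be sketched roughly as follows. This is a minimal illustration, not the code in `main.py`: to stay dependency-free it swaps NLTK's tokenizers for simple regular expressions, and the `STOPWORDS` set and the names `split_sentences` and `summarize` are hypothetical. Ignoring stopwords is also an assumption; the description above only says that words are counted.

```python
import heapq
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; NLTK ships a fuller one.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to"}

WORD_RE = re.compile(r"[a-z']+")


def split_sentences(text):
    """Naive regex stand-in for a sentence tokenizer: split after . ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=7):
    """Return the highest-scoring sentences, joined in descending score order."""
    sentences = split_sentences(text)
    # Count how often each non-stopword appears in the whole text.
    freq = Counter(w for w in WORD_RE.findall(text.lower()) if w not in STOPWORDS)
    if not freq:
        return ""
    # A sentence's score is the sum of the frequencies of its words.
    scores = {
        i: sum(freq.get(w, 0) for w in WORD_RE.findall(sentence.lower()))
        for i, sentence in enumerate(sentences)
    }
    best = heapq.nlargest(max_sentences, scores, key=scores.get)
    return " ".join(sentences[i] for i in best)


if __name__ == "__main__":
    sample = "Cats purr. Cats and dogs play. Birds sing."
    print(summarize(sample, max_sentences=2))
```

Because scores are plain sums, longer sentences tend to score higher; a variant could divide by sentence length or skip very long sentences, but that would be a design change rather than something the description above states.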
If you were to search for the topic "Artificial Intelligence", the script would fetch the Wikipedia page for that topic, summarize its content, and print the summary.
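As a rough sketch of the fetching and cleaning steps for a topic such as this one, the helpers below build an article URL, download the page with `urllib`, and extract paragraph text with BeautifulSoup. The function names, the `User-Agent` string, and the exact cleaning rules (dropping bracketed citation numbers and collapsing whitespace) are assumptions for illustration; the actual script may differ.

```python
import re
import urllib.parse
import urllib.request


def build_url(topic):
    """Map a topic to an English Wikipedia article URL (spaces become underscores)."""
    title = topic.strip().replace(" ", "_")
    return "https://en.wikipedia.org/wiki/" + urllib.parse.quote(title)


def fetch_html(url):
    """Download the page; the User-Agent value here is an arbitrary placeholder."""
    request = urllib.request.Request(url, headers={"User-Agent": "wiki-summary-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return response.read()


def extract_paragraph_text(html, parser="lxml"):
    """Concatenate the text of every <p> tag in the page."""
    # Imported here so the other helpers work even without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, parser)
    return " ".join(p.get_text() for p in soup.find_all("p"))


def clean_text(text):
    """Remove citation markers such as [12] and collapse runs of whitespace."""
    text = re.sub(r"\[\d+\]", "", text)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    url = build_url("Artificial Intelligence")
    article = clean_text(extract_paragraph_text(fetch_html(url)))
    print(article[:300])
```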