Python XPath and CSS Selector Tutorial

XPath is a query language used for selecting nodes in an XML or HTML document, while CSS selectors are used for similar purposes within HTML documents. This tutorial covers how to use both XPath and CSS selectors in Python using lxml for XPath and BeautifulSoup for CSS selectors.

Prerequisites

Python 3.x
lxml library for XPath (install via pip: pip install lxml)
BeautifulSoup library for CSS selectors (install via pip: pip install beautifulsoup4)

Setup and Installation

Ensure Python and pip are installed on your system. Install the required libraries using pip:

pip install lxml beautifulsoup4

Usage

Using CSS Selectors with BeautifulSoup

Import the BeautifulSoup library and parse an HTML document:

from bs4 import BeautifulSoup

# Open and parse the HTML file
with open('index.html', 'r') as file:
    soup = BeautifulSoup(file, 'html.parser')

Use the select method to find elements using CSS selector expressions:

# Select all elements with the class "header"
headers = soup.select(".header")

# Select the first element with the id "title"
title = soup.select_one("#title")

# Select all paragraphs inside elements with the class "main"
paragraphs = soup.select(".main > p")

Print out the selected elements:

# Print the text of each header element
for header in headers:
    print(header.text)

# Print the text of the title element
print(title.text)

# Print the text of each paragraph
for paragraph in paragraphs:
    print(paragraph.text)

Using XPath with lxml

Import the lxml library and parse an HTML document:

from lxml import etree

# Parse the HTML file
tree = etree.parse('index.html')

Use XPath expressions to find elements:

# Select all elements with the class "header"
headers = tree.xpath('//*[contains(@class, "header")]')

# Select the first element with the id "title"
title = tree.xpath('//*[@id="title"][1]')

# Select all paragraphs inside elements with the class "main"
paragraphs = tree.xpath('//div[contains(@class, "main")]//p')

Print out the selected elements:

for header in headers:
    print(header.text)

for paragraph in paragraphs:
    print(paragraph.text)

Conclusion

XPath and CSS selectors are powerful tools for navigating and processing HTML and XML documents in Python. With the help of lxml and BeautifulSoup, you can easily select and manipulate elements based on their attributes and structure in the document. Contributing

Feel free to contribute to this tutorial by providing additional examples, corrections, or enhancements.

Thank you for your support!

If you appreciate my work, please consider becoming a 'Sponsor', giving a ⭐ to my projects, or following me.

Credits

License

This project is licensed under the MIT - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.github		.github
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Repository files navigation

Python XPath and CSS Selector Tutorial

Prerequisites

Setup and Installation

Usage

Using XPath with lxml

Conclusion

Thank you for your support!

Credits

License

About

Uh oh!

Sponsor this project

Uh oh!

Uh oh!

License

VolkanSah/Python-XPath-Tutorial

Folders and files

Latest commit

History

Repository files navigation

Python XPath and CSS Selector Tutorial

Prerequisites

Setup and Installation

Usage

Using XPath with lxml

Conclusion

Thank you for your support!

Credits

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Sponsor this project

Uh oh!