Piculet is a module for extracting data from XML or HTML documents using XPath queries. It consists of a single source file with no dependencies other than the standard library. If available, it will make use of the lxml package for improved performance and better XPath support.
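The optional-dependency behaviour described above can be illustrated with a short sketch. This is not Piculet's own code: it only shows the common pattern of preferring lxml when it is installed and falling back to the standard library otherwise. The names `ET` and `BACKEND` and the sample document are assumptions made for this example.

```python
# Prefer lxml when it is installed; otherwise use the standard library.
# Illustrative pattern only, not taken from Piculet's source.
try:
    from lxml import etree as ET  # optional third-party package
    BACKEND = "lxml"
except ImportError:
    import xml.etree.ElementTree as ET
    BACKEND = "xml.etree.ElementTree"

# Made-up sample document for the example.
document = "<movie><title>The Shining</title><year>1980</year></movie>"

root = ET.fromstring(document)

# Both backends support simple path queries through findtext().
title = root.findtext("title")
year = root.findtext("year")

print(BACKEND, title, year)
```

The standard library implements only a subset of XPath, which is why a full XPath engine such as the one in lxml gives better query support.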
Piculet is used for the parsers of the Cinemagoer project.
Piculet works with Python 3.8 and later versions.
You can install it using pip:
pip install piculet
Installing Piculet creates a script named piculet which can be used to invoke the command line interface:
$ piculet -h
usage: piculet [-h] [--version] [--html] -s SPEC [document]
For example, say you want to extract some data from the file shining.html. An example specification is given in movie.json. Download both of these files and run the command:
$ piculet -s movie.json shining.html
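To give a rough idea of what specification-driven extraction means, here is a minimal sketch that uses only the standard library. The specification shape (a flat mapping from output keys to XPath queries), the `extract` helper, and the sample document are all hypothetical and do not reproduce the format of movie.json or Piculet's API; see the documentation for the real format.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical specification: output key -> XPath query.
# This is NOT the actual format used by movie.json.
SPEC_JSON = """
{
  "title": ".//h1",
  "director": ".//p[@class='director']/a"
}
"""

# Made-up, well-formed sample document for the example.
DOCUMENT = """
<html>
  <body>
    <h1>The Shining</h1>
    <p class="director"><a href="/people/1">Stanley Kubrick</a></p>
  </body>
</html>
"""


def extract(root, spec):
    """Run each query against the tree and keep the first match's text."""
    result = {}
    for key, query in spec.items():
        text = root.findtext(query)
        if text is not None:  # skip keys whose query matched nothing
            result[key] = text.strip()
    return result


spec = json.loads(SPEC_JSON)
root = ET.fromstring(DOCUMENT)
data = extract(root, spec)

print(json.dumps(data, indent=2))
```

Keeping the queries in a JSON file separate from the code means the extraction rules can change without touching the program, which is the idea behind passing a specification with the -s option.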
The documentation is available on: https://piculet.readthedocs.io/
The source code can be obtained from: https://github.com/uyar/piculet
Copyright (C) 2014-2023 H. Turgut Uyar <uyar@tekir.org>
Piculet is released under the LGPL license, version 3 or later. Read the included LICENSE.txt file for details.