Web scraping/data collection
Collect bio page URLs from the UK Parliament website for all House of Commons MPs and House of Lords Members.
This script was developed to help our performance analyst run SEO and performance checks with Lighthouse, in batches, on all bio pages (1,427 URLs). Links to the bio pages are collected in a .csv file.
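A minimal sketch of the link-collection step follows. It assumes that bio links on the listing pages have hrefs containing `/biographies/commons/...` or `/biographies/lords/...`. That URL pattern is an assumption about the old site's markup and does not come from the script. The real script fetches pages with `requests_html`. To keep the sketch runnable offline against an HTML string, it uses only `re` and `urllib.parse`, both from the module list below. The names `BIO_LINK` and `extract_bio_links` and the placeholder member are hypothetical.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: <a> tags whose href points at a Commons or Lords bio page.
# Captures the href, the house segment of the path, and the link text (member name).
BIO_LINK = re.compile(
    r'<a[^>]+href="(?P<href>[^"]*/biographies/(?P<house>commons|lords)/[^"]+)"[^>]*>'
    r'\s*(?P<name>[^<]+?)\s*</a>',
    re.IGNORECASE,
)


def extract_bio_links(html, base_url="https://www.parliament.uk"):
    """Return (house, name, absolute_url) tuples for each distinct bio link in `html`."""
    links = []
    seen = set()
    for match in BIO_LINK.finditer(html):
        url = urljoin(base_url, match.group("href"))  # make relative hrefs absolute
        if url in seen:  # skip repeated links to the same bio page
            continue
        seen.add(url)
        house = match.group("house").capitalize()  # "commons" -> "Commons"
        links.append((house, match.group("name"), url))
    return links


if __name__ == "__main__":
    sample = '<li><a href="/biographies/commons/jane-example/1234">Jane Example</a></li>'
    print(extract_bio_links(sample))
    # [('Commons', 'Jane Example', 'https://www.parliament.uk/biographies/commons/jane-example/1234')]
```

Regular expressions are brittle against markup changes, which is another reason the scraper will need rework when the site is redesigned. An HTML parser such as `requests_html` tolerates small markup differences better.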
Pages parsed for URLs
NOTE: UK Parliament is getting a new website. The new page structure means that this scraper will break and will need to be modified in the future.
The .csv file contains:
- House name (Commons/Lords)
- Name of MP/Lords Member
- Link to Bio Page
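The sketch below writes the three columns listed above with the standard `csv` module. The header labels `House`, `Name` and `Bio URL` are illustrative assumptions, because the script's actual column names are not documented here. The function name `write_bio_csv` and the sample rows are also hypothetical.

```python
import csv
import io


def write_bio_csv(rows, fileobj):
    """Write (house, name, url) rows to `fileobj` as CSV, preceded by a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["House", "Name", "Bio URL"])  # hypothetical header labels
    writer.writerows(rows)


if __name__ == "__main__":
    rows = [
        ("Commons", "Jane Example", "https://www.parliament.uk/biographies/commons/jane-example/1234"),
        ("Lords", "Lord Example", "https://www.parliament.uk/biographies/lords/lord-example/99"),
    ]
    # For a real file, open it with newline="" so the csv module controls line endings:
    # with open("bio_pages.csv", "w", newline="", encoding="utf-8") as f:
    #     write_bio_csv(rows, f)
    buffer = io.StringIO()
    write_bio_csv(rows, buffer)
    print(buffer.getvalue())
```

The `csv` module quotes any field that contains a comma, such as a name written as "Surname, Forename". This keeps the file safe to load into Lighthouse batch tooling or a spreadsheet.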
Built with Python 3.6.4 and the following modules:
- requests_html
- urllib
- re
- time
- datetime
- csv
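In a scraper like this, `time` and `datetime` typically handle two chores: pausing between requests and stamping the output file with the run date. The following is a minimal sketch of both. It assumes a one-second delay and a `bio_pages_YYYY-MM-DD.csv` naming scheme, and both choices are hypothetical because the script's actual values are not documented. The helper names `output_filename` and `fetch_politely` are also hypothetical. The `fetch` argument stands in for whatever function downloads a page.

```python
import time
from datetime import date


def output_filename(prefix="bio_pages", today=None):
    """Build a dated CSV filename, e.g. bio_pages_2018-05-01.csv (hypothetical scheme)."""
    today = today or date.today()
    return "{}_{}.csv".format(prefix, today.isoformat())


def fetch_politely(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL in order, sleeping between calls to limit server load."""
    results = []
    for i, url in enumerate(urls):
        if i:  # no need to wait before the first request
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

With a real download function, a call would look like `fetch_politely(listing_urls, fetch=download_page)`. Both names in that call are hypothetical. In tests, pass `delay_seconds=0` to skip the pauses.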
Kostas Koutoupis (@kkoutoup) for the Web and Publications Unit (WPU) of the Chambers and Committee Office (CCT), House of Commons