A Python scraper that collects URLs from Parliament's website for MPs and Lords Members

kkoutoup/All-MPs-and-Lords-Members-Bio-Page-Links

All-MPs-and-Lords-Members-Bio-Page-Links

Category

Web scraping/data collection

Purpose

Collect URLs from the UK Parliament website for all House of Commons MPs and House of Lords Members.

User needs

This script was developed to help our performance analyst run SEO and performance checks with Lighthouse, in batches, across all bio pages (1427 URLs). Links to the bio pages are collected in a .csv file.
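The batching step itself is not part of this repository. As a rough sketch of how the collected .csv could feed Lighthouse in batches, the example below reads the links, splits them into fixed-size groups, and builds one Lighthouse CLI command per URL. The column name `bio_url`, the batch size, the report file names, and the helper functions are all assumptions made for illustration.

```python
import csv
import io

def read_bio_urls(csv_file, column="bio_url"):
    """Return the non-empty values of one column (column name is an assumption)."""
    return [row[column] for row in csv.DictReader(csv_file) if row.get(column)]

def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def lighthouse_command(url, report_path):
    """Build (but do not run) a Lighthouse CLI call that writes a JSON report."""
    return ["lighthouse", url, "--output=json",
            f"--output-path={report_path}", "--quiet"]

# Placeholder data standing in for the real .csv file.
sample_csv = io.StringIO(
    "house,name,bio_url\n"
    "Commons,Example MP,https://example.org/mps/1\n"
    "Lords,Example Peer,https://example.org/lords/2\n"
    "Commons,Another MP,https://example.org/mps/3\n"
)

urls = read_bio_urls(sample_csv)
batches = chunk(urls, 2)  # batch size of 2 is arbitrary
commands = [
    [lighthouse_command(url, f"report-{b}-{i}.json") for i, url in enumerate(batch)]
    for b, batch in enumerate(batches)
]
```

If the Lighthouse CLI is installed, each command list could be passed to `subprocess.run`, one batch at a time.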

Data collected

Pages parsed for URLs

NOTE: UK Parliament is getting a new website. The new page structure means that this scraper will break and will need to be modified in the future.

.csv file contains

  • House name (Commons/Lords)
  • Name of MP/Lords Member
  • Link to Bio Page
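A minimal sketch of writing those three fields with the `csv` module follows. The column names (`house`, `name`, `bio_url`) and the sample records are placeholders chosen for illustration. The README does not state the real header row.

```python
import csv
import io

# Column names are assumptions; the README only names the three fields.
FIELDNAMES = ["house", "name", "bio_url"]

def write_rows(rows, out):
    """Write dicts with house, name and bio_url keys to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)

# Placeholder records, not real data.
sample = [
    {"house": "Commons", "name": "Example MP",
     "bio_url": "https://example.org/mps/example-mp"},
    {"house": "Lords", "name": "Example Peer",
     "bio_url": "https://example.org/lords/example-peer"},
]

buffer = io.StringIO()  # stands in for open("links.csv", "w", newline="")
write_rows(sample, buffer)
csv_text = buffer.getvalue()
```

When writing to a real file, opening it with `newline=""` avoids blank lines between rows on Windows.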

Dependencies

Built with Python 3.6.4 and the following modules (requests_html is a third-party package; the rest are part of the standard library)

  • requests_html
  • urllib
  • re
  • time
  • datetime
  • csv
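This README does not show how the modules are combined. As an offline illustration only, the sketch below uses `re` and `urllib.parse` to pull link text and absolute URLs out of an HTML fragment. The markup, the regular expression, and the `extract_links` helper are hypothetical. The real scraper fetches live pages with `requests_html`, and Parliament's actual page structure is not reproduced here.

```python
import re
from urllib.parse import urljoin

# Matches <a ... href="...">label</a>; a simplification that assumes
# double-quoted href attributes and no nested anchors.
HREF_PATTERN = re.compile(
    r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)

def extract_links(html, base_url):
    """Return (label, absolute_url) pairs for every anchor in the HTML."""
    links = []
    for href, label in HREF_PATTERN.findall(html):
        name = re.sub(r"<[^>]+>", "", label).strip()  # drop inner tags
        links.append((name, urljoin(base_url, href)))
    return links

# Hypothetical markup, not the real page structure.
sample_html = """
<ul>
  <li><a href="/members/example-mp">Example MP</a></li>
  <li><a class="x" href="https://example.org/members/example-peer"><span>Example Peer</span></a></li>
</ul>
"""

links = extract_links(sample_html, "https://example.org/list")
```

Regular expressions are fragile against real-world HTML, so a proper parser (such as the one `requests_html` provides) is the safer choice for live pages.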

Developed by

Kostas Koutoupis (@kkoutoup) for the Web and Publications Unit (WPU) of the Chambers and Committee Office (CCT), House of Commons
