Skip to content

A scraper for the Freedom of Information portal of the Philippine government.

License

Notifications You must be signed in to change notification settings

pmagtulis/foi-ph-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

foi-ph-scraper

An automatic scraper for the Freedom of Information portal of the Philippine government.

Recent updates

column name definition
Sept 23 now scrapes every 3 days
Sept 12 updated the GitHub actions to fix Chromium problems

How to use this?

Follow the steps below. Each run of the autoscraper saves a new CSV file that contains 2,000 new entries from the FOI website. Since the website arranges requests by latest date, it essentially scrapes new requests everytime it runs, with some allowance for older ones for merging purposes.

Download each CSV file from the output directory, and combine them with the larger dataset found at foi-analysis repository for processing.

What is this?

This code scrapes requests data from the Philippines' Freedom of Information website. We use a Python code as well to automatically scrape the website for new information every three days.

The goal is to create a single database of these requests in a data frame and make some analysis out of them such as:

Which agency received the most number of requests?

How many requests had been denied/approved?

What type of requests are most common?

FOI background

Launched in 2016, the FOI website is in compliance with Executive Order No. 2, Series of 2016 by President Rodrigo Duterte that institutionalized freedom of information in the Executive branch of government.

The portal is meant to be a one-stop shop where filers can file a request to all National Government agencies, government-owned and -controlled corporations, as well state schools and colleges. The website allows the user to track developments of their request, chat with an FOI officer who handles the request, and get feedback from them.

The legislative and judiciary branches of government, as well as the central bank, do not participate on the FOI portal, and have their own mechanisms for public disclosures.

The process

  1. Using Selenium, we scrape 3,000 requests from the FOI website which contains a live list of requests. We specifically scrape the contents of the tab labeled "ALL REQUESTS." Below is a screenshot of that page and an example of how each requests data is structured:

Screen Shot 2022-01-15 at 6 31 50 PM

  1. We put all scraped information in a single data frame for processing through pandas.

  2. Raw scraped files are then saved into CSV format with a unique file name every Sunday.

  3. The .yml file contains an automated push of the scraped file to the foi-analysis repository under the output directory. For that part of the code to work, you would need to create a similar repository and directory, or you can just simply delete that part and just clone my repository analyzing the FOI requests data. Everything should be in there.

Definition of terms

The following information were scraped from the website:

column name definition
agency the name of the government agency where the request was submitted
date date when request was made through the FOI portal
status shows at which stage of the FOI process is the file request in. Examples are "SUCCESSFUL", "PARTIALLY SUCCESSFUL", "DENIED", etc.
date date when request was made through the FOI portal
period_covered a required information when filing a request meant to serve as filter for the extent of period covered by each request
purpose the purpose why the request is being made, typically indicating how the data will be used. This is required when filing an FOI request
link hyperlink to each FOI request, containing details and direct messages between the filer and agency concerned. Only available for data from December 7, 2021
reasons_denial a brief reason cited for a denial of request. Only available for data from 2016-December 31, 2021

Next steps

  1. After running this code and getting the latest requests information from the FOI website, they are stored in a single file, CSV format.

  2. Proceed to the next notebook titled "foi-analysis" for next steps: including merging and cleaning the data with older ones from another CSV file, courtesy of the PCOO, which manages the website. The PCOO said they sometimes older remove requests data from the FOI website at the request of the filer. It was not clear however if the agency also conducts regular cleaning of the portal to remove aging files.

Contact

Prinz Magtulis, ppm2130@columbia.edu

Comments and suggestions are always welcome! All rights reserved.

About

A scraper for the Freedom of Information portal of the Philippine government.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages