Reducing Redundancy in Coastal Management Using Natural Language Processing

A Python natural language processing program for identifying key words and phrases in conservation management plans.

The Driving Questions

Can we identify common themes / conservation measures among management plans?
Can we capture values and interests of plans’ authors?

Approach

The PdfScrape program converts PDFs hosted online into several analysis-ready data formats and produces initial, exploratory visualizations of common words and their connections.

Project Status

Initial development focused on coastal management plans for the state of Washington.

Contained within this repository are:

A compiled list of URLs for the management plans.
Several analysis-ready versions of the Fish & Wildlife Species Recovery Plan PDFs.
Exploratory visualizations of common words and their connections (see below).
The PdfScrape program, with instructions for running it on your own list of PDF URLs.

Example visualizations

Frequency plots of most common verbs and nouns
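
The counting behind a plot like this can be sketched as follows. This is a minimal illustration, not PdfScrape's actual code: it assumes the text has already been extracted and part-of-speech tagged, and the `(word, tag)` pair format, the `top_words_by_tag` helper, and the `NOUN`/`VERB` labels are all hypothetical.

```python
from collections import Counter


def top_words_by_tag(tagged_tokens, tag, n=10):
    """Return the n most frequent words that carry the given POS tag.

    tagged_tokens: iterable of (word, tag) pairs (assumed input format).
    """
    counts = Counter(
        word.lower() for word, token_tag in tagged_tokens if token_tag == tag
    )
    return counts.most_common(n)


# Hypothetical tagged tokens, shaped like the output of a POS tagger.
sample = [
    ("Restore", "VERB"), ("habitat", "NOUN"), ("and", "CCONJ"),
    ("monitor", "VERB"), ("salmon", "NOUN"), ("habitat", "NOUN"),
    ("restore", "VERB"), ("estuary", "NOUN"), ("habitat", "NOUN"),
]

top_nouns = top_words_by_tag(sample, "NOUN", n=2)
top_verbs = top_words_by_tag(sample, "VERB", n=1)
```

The resulting `(word, count)` pairs can be passed to any bar-chart routine to draw the frequency plot.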

Pseudo-clustering

For user-specified key words, the pseudo-clustering plot shows the relationship between the word count for each key word (KeyCount) and the word count for the words surrounding or co-occurring with that key word (WordCount). Further details are provided in the PdfScrape README.
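
One way to compute counts of this kind is sketched below. The exact definition PdfScrape uses is in its own README; here "surrounding" is assumed to mean a fixed window of tokens on either side of the key word, and the `cooccurrence_counts` function and its `window` parameter are hypothetical names chosen for illustration.

```python
import re
from collections import Counter


def cooccurrence_counts(text, key_words, window=5):
    """Count each key word and the words that appear within `window` tokens of it.

    Returns (key_counts, neighbor_counts), where neighbor_counts maps each
    key word to a Counter of its surrounding words. The fixed-window notion
    of "surrounding" is an assumption of this sketch.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    keys = {k.lower() for k in key_words}
    key_counts = Counter()
    neighbor_counts = {k: Counter() for k in keys}
    for i, tok in enumerate(tokens):
        if tok in keys:
            key_counts[tok] += 1
            start = max(0, i - window)
            # Tokens before and after the key word, excluding the key word itself.
            neighbors = tokens[start:i] + tokens[i + 1:i + 1 + window]
            neighbor_counts[tok].update(neighbors)
    return key_counts, neighbor_counts


text = "Protect salmon habitat. Restore salmon habitat in the estuary."
key_counts, neighbor_counts = cooccurrence_counts(text, ["salmon"], window=1)
```

In this toy input, `key_counts["salmon"]` plays the role of KeyCount and each entry of `neighbor_counts["salmon"]` plays the role of a WordCount; plotting one against the other gives a scatter in the spirit of the pseudo-clustering figure.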

TSNE (t-distributed stochastic neighbor embedding)

Visually analyze text clustering patterns from the input PDFs
