Skip to content

Latest commit

 

History

History
37 lines (24 loc) · 870 Bytes

README.md

File metadata and controls

37 lines (24 loc) · 870 Bytes

semanticClimate

Conversion of IPCC documents into semantic form

goals

  • to convert the IPCC documents from PDF into (a) HTML (b) XML
  • extract terms and exploire their use and meaning
  • link terms to Wikidata and create AMI-dictionaries
  • create new structiures for navigation, search, display

Content

Initially we will start with AR6 WGIII but move onto other WG's and perhaps look backwards as well.

Strategy

Download components, using a hierarchical naming scheme, and convert to text (pdf2txt)

semanticClimate pm286$ cd ipcc/ar6/wg3/
$ ls
Chapter01.pdf
$ mkdir Chapter01
$ cp Chapter01.pdf Chapter01/fulltext.pdf
$ cd Chapter01
$ pdf2txt.py -o fulltext.txt fulltext.pdf 
$ ls
fulltext.pdf	fulltext.txt