Improve the readability of generated paraphrases by using Simple PPDB
This project uses SyntaxNet's pretrained model, Parsey McParseface, to generate POS tags. The program then replaces target words that appear in Simple PPDB under the corresponding POS tag.
Finally, the program prints the average readability scores for both the original and the replaced sentences.
The readability metrics used are automatic_readability_index and flesch_kincaid_readability.
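As a rough illustration of the two metrics, here is a minimal sketch that assumes the standard published formulas for the Automated Readability Index and the Flesch-Kincaid grade level. The tokenization and the vowel-group syllable counter are simplifying assumptions made for this example; the project's own readability_cal.py may compute these differently.

```python
import re


def count_syllables(word):
    # Rough heuristic (an assumption for this sketch): one syllable per
    # group of consecutive vowels, with a minimum of one per word.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def split_text(text):
    # Naive sentence and word splitting; a real tokenizer would be more careful.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return sentences, words


def automatic_readability_index(text):
    sentences, words = split_text(text)
    # Count only letters and digits as characters.
    chars = sum(c.isalnum() for w in words for c in w)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def flesch_kincaid_readability(text):
    sentences, words = split_text(text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
```

For both functions a higher value indicates a harder text, so a sentence with longer words scores higher than a short, simple one.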
The readability of generated paraphrases can be improved by replacing difficult words.
However, identifying the difficult words and replacing them with words in the correct form is a problem.
.
├── ...
├── readability_cal.py # main program
├── uni_penny_mapping.txt # mapping file for converting between the Penn Treebank tagging scheme and Universal Dependencies tags
├── repl_loader.py # utility to preload the replacement dict
├── helper.py # utilities
├── replaced.txt # simplified replacement dict
├── beam_size_4_residual.txt # generated paraphrases
└── ...
-
First, download Simple PPDB and put it in the project folder.
-
Then run
python repl_loader.py
to read Simple PPDB and save it to a pickle file.
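The preloading step could look roughly like the sketch below. It assumes each Simple PPDB line has five tab-separated fields (paraphrase score, simplification score, syntactic category, input phrase, output phrase) and keeps the output with the highest simplification score for each (input, category) pair. The function names are hypothetical and are not taken from repl_loader.py.

```python
import pickle


def build_replacement_dict(lines):
    # Map (input phrase, syntactic category) to the simpler output phrase
    # with the highest simplification score. Field order is an assumption.
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip malformed lines
        _paraphrase_score, simplification_score, category, source, target = fields
        try:
            score = float(simplification_score)
        except ValueError:
            continue  # skip lines with a non-numeric score
        key = (source, category)
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return {key: target for key, (target, _score) in best.items()}


def save_dict(repl, path):
    # Store the dict as a pickle so later runs can load it quickly.
    with open(path, "wb") as f:
        pickle.dump(repl, f)


def load_dict(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```

Keying the dict on both the phrase and its category is what lets the later replacement step pick a substitute in the correct form for the POS tag of the target word.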
-
python readability_cal.py
The program replaces words in
beam_size_4_residual.txt
according to the POS tags produced by Parsey McParseface, then parses the replaced sentences again to get a new parse result.
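The replacement step can be sketched as a lookup keyed on both the word and its POS tag. This is a minimal illustration, assuming the sentence is already tagged as (word, tag) pairs and the replacement dict maps (word, tag) to a simpler word. The name simplify_tokens is hypothetical, and the actual logic in readability_cal.py may differ.

```python
def simplify_tokens(tagged_tokens, repl):
    # Replace each (word, pos) pair that has an entry in repl.
    # Words without a matching entry are kept unchanged.
    simplified = []
    for word, pos in tagged_tokens:
        replacement = repl.get((word.lower(), pos))
        simplified.append(replacement if replacement is not None else word)
    return simplified


# Hypothetical tagged sentence and replacement dict for illustration.
tagged = [("She", "PRP"), ("purchased", "VBD"), ("the", "DT"), ("automobile", "NN")]
repl = {("purchased", "VBD"): "bought", ("automobile", "NN"): "car"}
print(" ".join(simplify_tokens(tagged, repl)))
```

Because the POS tag is part of the key, a word is only replaced when the dict holds an entry for that exact tag, which avoids substituting a word in the wrong form.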
automatic_readability_index:
* original: 10.299129999999925
* replaced: 10.192199999999964
flesch_kincaid_readability:
* original: 4.996375000000048
* replaced: 4.9426620000000465
From the results above, we can see that readability improves slightly after replacement: both scores decrease, and a higher score means the text is more difficult.