Python code for ranking the daily reputation of public figures
Run the scripts below; the final result will be in output.csv:
python OccurenceSeeker.py > result.csv
python csv_tranpose.py result.csv
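This section does not include the source of csv_tranpose.py. Going only by its name and usage, it transposes result.csv so that rows become columns. Below is a minimal sketch of such a transpose using Python's standard csv module. `transpose_csv` and `write_csv` are hypothetical helpers, and the layout of the sample data is invented for illustration.

```python
from __future__ import annotations

import csv
import io


def transpose_csv(text: str) -> list[list[str]]:
    """Return the rows of a CSV string transposed (rows become columns).

    Hypothetical helper: shorter rows are padded with empty strings so
    that every output row has the same length.
    """
    rows = list(csv.reader(io.StringIO(text)))
    width = max((len(row) for row in rows), default=0)
    padded = [row + [""] * (width - len(row)) for row in rows]
    return [list(column) for column in zip(*padded)]


def write_csv(rows: list[list[str]]) -> str:
    """Serialise rows back to CSV text with Unix line endings."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented sample: one row per figure, one column per day.
    sample = "name,day0,day1\nAlice,0.5,0.7\nBob,0.2,0.4\n"
    print(write_csv(transpose_csv(sample)), end="")
```

Writing the returned text to output.csv would reproduce the second step of the two-step pipeline above.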
Add public figures' names to any file in items_list, one name per line. Note that each name serves as both the public figure's ID and the search keyword (the "seeker"). As a result, the current script cannot distinguish between two people who share the same name, and it cannot map name aliases.
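The loading step might look like the sketch below. It assumes items_list is a directory of UTF-8 text files with one name per line; the source says only "any file in items_list". `load_names` and `find_mentions` are hypothetical helpers, not functions from OccurenceSeeker.py. A name is its own ID and matching is an exact substring search, so duplicates across files collapse into one entry and aliases are never matched. This reproduces the limitation described above.

```python
from __future__ import annotations

import os
import tempfile


def load_names(directory: str) -> list[str]:
    """Collect one name per non-empty line from every file in `directory`.

    Duplicates are merged because the name itself is the ID, so two
    different people with the same name become a single entry.
    """
    seen: dict[str, None] = {}  # insertion-ordered set
    for filename in sorted(os.listdir(directory)):
        path = os.path.join(directory, filename)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                name = line.strip()
                if name:
                    seen.setdefault(name, None)
    return list(seen)


def find_mentions(names: list[str], text: str) -> dict[str, int]:
    """Map each name found in `text` to the index of its first occurrence.

    Exact substring search: an alias of a name is not recognised.
    """
    hits = {}
    for name in names:
        pos = text.find(name)
        if pos >= 0:
            hits[name] = pos
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "people.txt"), "w", encoding="utf-8") as fh:
            fh.write("李國章\n梁振英\n梁智鴻\n")
        names = load_names(tmp)
    print(find_mentions(names, "特首梁振英最終委任李國章"))
```

The first-occurrence offset returned by `find_mentions` is one possible input for the "earlier in the article is more important" assumption below.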
If you would like to run the program on your own news data, prepare a TSV (tab-separated) file in the following format, where the first field is the number of days elapsed since 2015/01/01:
<Days_diff_from_2015/01/01> <NEWS_TITLE> <NEWS_CONTENT>
e.g.,
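363 【李國章入主】李國章任校委會主席拆局 燙手山芋無人願接 港大校委會近期風波不絕,主席一職更是自梁智鴻卸任後,已懸空兩個月之久。特首梁振英最終⋯⋯
(Translation: 363 [Arthur Li takes the helm] Analysis of Arthur Li's appointment as Council chairman: a hot potato no one wanted. The HKU Council has faced one controversy after another, and the chairmanship had been vacant for two months since Leong Che-hung stepped down. Chief Executive Leung Chun-ying eventually ...)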
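A reader for this format might look like the sketch below. It assumes there is no header row, the text is UTF-8, and the first three fields are separated by single tabs. `NewsItem`, `parse_tsv_line` and `read_news` are hypothetical names and are not part of the repository.

```python
from __future__ import annotations

import datetime
from typing import Iterator, NamedTuple

# Day 0 of the <Days_diff_from_2015/01/01> column.
EPOCH = datetime.date(2015, 1, 1)


class NewsItem(NamedTuple):
    day: int
    title: str
    content: str

    @property
    def date(self) -> datetime.date:
        """Calendar date recovered from the day offset."""
        return EPOCH + datetime.timedelta(days=self.day)


def parse_tsv_line(line: str) -> NewsItem:
    """Split one record into (day, title, content).

    maxsplit=2 keeps any further tabs inside the article body.
    """
    day, title, content = line.rstrip("\n").split("\t", 2)
    return NewsItem(int(day), title, content)


def read_news(path: str) -> Iterator[NewsItem]:
    """Yield one NewsItem per non-empty line of a UTF-8 TSV file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield parse_tsv_line(line)


if __name__ == "__main__":
    item = parse_tsv_line("363\t【李國章入主】\t港大校委會近期風波不絕\n")
    print(item.day, item.date, item.title)
```

With this epoch, day 363 in the example falls on 2015-12-30.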
The algorithm models reputation based on the following assumptions:
- Reputation decays every day that there is no news about the public figure (forgetting curve).
- A public figure reported recently is more reputable than one reported in the past.
- A public figure with more news has a higher reputation.
- A public figure with frequent news has a more stable reputation.
- A public figure mentioned at the beginning of an article is more important than one mentioned at the end.
- A public figure is associated with (remembers) the other public figures that appear in the same article.
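This section does not reproduce the exact edge-weight formula used by q-importance. The sketch below is therefore only one way to encode the first five assumptions. It assumes an Ebbinghaus-style exponential forgetting curve R = exp(-t/S), a linear weight by position in the article, and a stability S that grows with each additional report. `retention`, `position_weight`, `reputation` and the default stability of 7 days are all hypothetical. The last assumption, co-occurrence, is handled by the graph construction.

```python
from __future__ import annotations

import math


def retention(days_elapsed: float, stability: float) -> float:
    """Ebbinghaus-style forgetting curve R = exp(-t / S) (assumed form).

    R is 1.0 on the day of the news and decays towards 0 afterwards;
    a larger stability S means slower forgetting.
    """
    if days_elapsed < 0:
        raise ValueError("mention lies in the future")
    return math.exp(-days_elapsed / stability)


def position_weight(offset: int, length: int) -> float:
    """1.0 for a name at the start of an article, falling linearly towards 0 at the end."""
    return 1.0 - offset / max(length, 1)


def reputation(mentions: list[tuple[int, int, int]], today: int,
               base_stability: float = 7.0) -> float:
    """Score one figure from (day, offset, article_length) mentions.

    Mentions are processed in date order, and the k-th one uses stability
    k * base_stability. Each extra report therefore makes later ones fade
    more slowly, an assumed stand-in for "frequent news is more stable".
    """
    score = 0.0
    for k, (day, offset, length) in enumerate(sorted(mentions), start=1):
        stability = k * base_stability
        score += position_weight(offset, length) * retention(today - day, stability)
    return score


if __name__ == "__main__":
    today = 30
    print(reputation([(29, 0, 500)], today))    # recent, at the start of the article
    print(reputation([(2, 0, 500)], today))     # old
    print(reputation([(29, 400, 500)], today))  # recent, near the end of the article
```

Each of the first four assumptions appears as a monotonic property of `reputation`: a more recent mention, an additional mention, or an earlier position in the article always raises the score.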
The algorithm builds an undirected graph in which each vertex represents a public figure. Each edge indicates that two public figures have appeared in the same news article. The edge weight (distance) is calculated according to the forgetting curve (see the reference for details).
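The graph construction can be sketched as an adjacency map. The precise distance formula is not given here, so this sketch assumes one. Each shared article adds an exponentially decayed amount exp(-(today - day)/S) to the tie strength of every pair of names in it. The edge distance is then the reciprocal of that strength, so frequent and recent co-occurrences produce short edges. `build_graph` and the stability S are hypothetical.

```python
from __future__ import annotations

import math
from collections import defaultdict
from itertools import combinations


def build_graph(articles: list[tuple[int, list[str]]], today: int,
                stability: float = 7.0) -> dict[str, dict[str, float]]:
    """Build an undirected co-occurrence graph as {name: {neighbour: distance}}.

    articles: (day, names mentioned in the article) pairs.
    Assumed rule: every shared article adds exp(-(today - day) / stability)
    to a pair's tie strength, and the edge distance is 1 / strength.
    """
    strength: dict[frozenset[str], float] = defaultdict(float)
    for day, names in articles:
        decay = math.exp(-(today - day) / stability)
        for a, b in combinations(sorted(set(names)), 2):
            strength[frozenset((a, b))] += decay

    # Every mentioned figure becomes a vertex, even without co-occurrences.
    graph: dict[str, dict[str, float]] = {}
    for _, names in articles:
        for name in names:
            graph.setdefault(name, {})
    for pair, s in strength.items():
        a, b = tuple(pair)
        graph[a][b] = 1.0 / s
        graph[b][a] = 1.0 / s
    return graph


if __name__ == "__main__":
    news = [(10, ["A", "B"]), (10, ["A", "B", "C"]), (3, ["D"])]
    print(build_graph(news, today=10))
```

In this sample, A and B share two same-day articles, so their edge (0.5) is shorter than A-C or B-C (1.0). D has no co-authors in any article and stays an isolated vertex.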
Based on this graph, q-importance computes the closeness centrality of each vertex and uses it as that public figure's reputation value. Users can rank these values to compare the relative reputations of public figures.
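This section does not show q-importance's own implementation. As a self-contained illustration, closeness centrality on a weighted graph can be computed by running Dijkstra's algorithm from every vertex. The sketch uses a dict-of-dicts adjacency map. For disconnected graphs it applies the Wasserman and Faust scaling, which networkx's `closeness_centrality` also applies by default (`wf_improved=True`). `shortest_distances` and `closeness_centrality` here are hypothetical standalone functions, not the networkx API.

```python
from __future__ import annotations

import heapq
import math


def shortest_distances(graph: dict[str, dict[str, float]], source: str) -> dict[str, float]:
    """Dijkstra's algorithm: distance from `source` to every reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbour, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbour, math.inf):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


def closeness_centrality(graph: dict[str, dict[str, float]]) -> dict[str, float]:
    """Closeness of each vertex; edge distances are assumed positive.

    (r / total) * (r / (n - 1)), where r is the number of other vertices
    reachable from v: Wasserman and Faust scaling for disconnected graphs.
    """
    n = len(graph)
    scores = {}
    for vertex in graph:
        dist = shortest_distances(graph, vertex)
        reachable = len(dist) - 1
        if reachable == 0:
            scores[vertex] = 0.0  # isolated vertex
        else:
            total = sum(dist.values())
            scores[vertex] = (reachable / total) * (reachable / (n - 1))
    return scores


if __name__ == "__main__":
    sample = {
        "A": {"B": 0.5, "C": 1.0},
        "B": {"A": 0.5, "C": 1.0},
        "C": {"A": 1.0, "B": 1.0},
        "D": {},
    }
    scores = closeness_centrality(sample)
    for name in sorted(scores, key=scores.get, reverse=True):
        print(name, round(scores[name], 3))
```

Sorting the scores in descending order gives the reputation ranking. In this sample, A and B tie ahead of C, and the isolated D scores 0.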
Edge weight is calculated as: