
FENESTRA-Fake-News-Structure-and-Threat-Assessment

This repository contains the source code of our system, FENESTRA, an automated pipeline for the discovery and description of the narrative frameworks of conspiracy theories that circulate on social media and other forums.


Data

Bridgegate and Pizzagate data can be accessed from this link.

Dependencies

This work consists of two main components:

  1. Relation Extraction pipeline: uses dependency trees and semantic role labeling techniques to achieve high recall of Open Information Extraction tuples (see the illustration after this list). This component runs on Python 2.7.
  2. Relationship Aggregation pipeline: uses contextualized embedding techniques to group entities/relationships. This component runs on Python 3.
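
To make the extraction target concrete: an Open Information Extraction tuple is a (subject, relation, object) triple pulled from a sentence. The example below is invented for illustration and is not output of the pipeline.

# Illustration only: the (subject, relation, object) tuple format that
# the relation extraction component produces. Invented example sentence.
sentence = "The mayor ordered the lane closures."
extraction = ("The mayor", "ordered", "the lane closures")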

Note: We are in the process of porting all implementations to Python 3.

Installation -- First Component: Relation Extraction

Required Python version: Python 2.7

Step-1: Install required packages:

pip install -r requirements_python2.txt

Alternatively, install the following packages individually:

pip install networkx pandas matplotlib nxpd nltk pycorenlp enum34 tqdm

Step-1.1: Download the practNLPTools package and run "sudo python setup.py install" in the downloaded folder.

Step-1.2: Use the NLTK downloader to obtain the punkt resource, which is used for sentence tokenization:

python -m nltk.downloader punkt
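
Once punkt is downloaded, you can verify that sentence tokenization works (a quick check, not part of the pipeline):

# Verify the punkt model is available for sentence tokenization.
import nltk
print(nltk.sent_tokenize("First sentence. Second sentence."))
# Expected: ['First sentence.', 'Second sentence.']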

Step-2: Download the Stanford CoreNLP package, unzip it, and run the following command inside the unzipped folder:

java -mx8g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
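
You can confirm the server is reachable with a quick pycorenlp call before running the pipeline (a sanity check we suggest, not part of the pipeline; pycorenlp was installed in Step-1):

# Sanity check (Python 2.7): confirm the CoreNLP server answers on port 9000.
from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')
output = nlp.annotate(
    'FENESTRA extracts relationships from social media posts.',
    properties={'annotators': 'tokenize,ssplit,pos,depparse',
                'outputFormat': 'json'})
print(output['sentences'][0]['basicDependencies'])

If this prints dependency edges instead of raising a connection error, the server is up.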

Step-3: Modify the parameters in "relEx_parse_tree.py" according to your input data; feel free to look at the settings we used for the Bridgegate and Pizzagate experiments. For example, set the "INPUT_DELIMITER" parameter to "," when feeding a csv file, and to "\n" when feeding a txt file in which inputs are separated by newlines.
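
As a sketch, the delimiter setting inside "relEx_parse_tree.py" would look like the following (only INPUT_DELIMITER is documented in this README; other parameter names may differ):

# Choose the delimiter that matches your input file format.
INPUT_DELIMITER = ","     # for a .csv input file
# INPUT_DELIMITER = "\n"  # for a .txt file with one input per line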

Step-4: Run the following command to extract relationships:

python2.7 relEx_parse_tree.py "path_to_input_file" "output_directory"

Outputs

  1. A csv file with sentences, relationships, relationship types, and additional metadata (this is the main file used by the next component; see the pandas sketch after this list) -- Output Name: INPUT_NAME_relations_MAX_ITERATION.csv
  2. A txt file with a ranking of the relationships (based on simple exact matching) -- Output Name: INPUT_NAME_top_rels_MAX_ITERATION.txt
  3. A csv file with sentences and their annotations -- Output Name: INPUT_NAME_sentences_and_annotations_MAX_ITERATION.csv
  4. A txt file with pairwise relationships of selected entities (you can hand-pick entities of interest by populating the "entity_versions" dictionary in "utility_functions.py") -- Output Name: INPUT_NAME_pairwise_rels_selected_MAX_ITERATION_DATASET.txt
  5. A json file with pairwise relationships formatted for visualization with the d3.js library -- Output Name: INPUT_NAME_g_arg_selected_MAX_ITERATION_EndTimeTimestamp.json
  6. A gexf file with pairwise relationships formatted for visualization in Gephi -- Output Name: INPUT_NAME_g_arg_selected_MAX_ITERATION_EndTimestamp.gexf
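
The relations csv (output 1) is the input to the second component. A minimal sketch of inspecting it with pandas follows; the path is a placeholder, and the column names are not specified in this README, so print the header to discover them.

# Inspect the main relations csv produced by the first component.
import pandas as pd

df = pd.read_csv("output_directory/INPUT_NAME_relations_MAX_ITERATION.csv")
print(df.columns.tolist())  # discover the actual column names
print(df.head())            # sentences, relationships, types, and metadata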

Installation -- Second Component: Entity Extraction

Required Python version: Python 3.6+

Step-1: Install required packages:

pip3 install -r requirements_python3.txt

Step-2: Modify the parameters in "main.py". Information about the parameters is provided in comments in the code.

Step-3: Run the following command:

python3.6 main.py 

Outputs

  1. A ranking of named entities with their types, averaged over the confidence scores of their mentions
  2. A ranking of the head words in the arguments
  3. A final ranking that combines the two rankings above
  4. A contextualized embedding (the average over the BERT embeddings of their mentions) assigned to the top N actants in the final entity ranking (see the sketch after this list)
  5. A 2D plot of the top-ranked entities, projected with PCA
  6. Embeddings (and their metadata) stored in csv files that follow TensorBoard's input format
  7. A time plot visualizing how many entities are introduced to the story each day -- here we use each entity's full name (formatted as entity_first_name + space + entity_last_name), because first and last names can be shared by multiple people, whereas full names are most likely unique
  8. A supplementary csv file for the time plot listing the entity names introduced each day, their associated post, and the sentence in which their first mention was detected
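
As a sketch of outputs 4 and 5, the snippet below averages BERT embeddings over each entity's mentions and projects the per-entity vectors to 2D with PCA. It assumes the Hugging Face transformers and scikit-learn packages and uses invented mention data; the pipeline's actual BERT implementation and grouping logic live in "main.py".

# Sketch (not the repository's code): average BERT embeddings over each
# entity's mentions, then project the entity vectors to 2D with PCA.
import numpy as np
import torch
from transformers import BertModel, BertTokenizer
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def mention_embedding(sentence):
    # Mean-pool the last hidden layer over all tokens of the sentence.
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Invented mentions grouped by entity, for illustration only.
mentions = {
    "entity_a": ["A sentence mentioning entity A.", "Another mention of A."],
    "entity_b": ["A sentence about entity B."],
    "entity_c": ["Entity C appears here."],
}
entity_vecs = {name: np.mean([mention_embedding(s) for s in sents], axis=0)
               for name, sents in mentions.items()}

# Project the per-entity embeddings to 2D and plot with labels.
names = list(entity_vecs)
points = PCA(n_components=2).fit_transform(
    np.stack([entity_vecs[n] for n in names]))
for (x, y), name in zip(points, names):
    plt.scatter(x, y)
    plt.annotate(name, (x, y))
plt.show()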

Reference

Tangherlini, Timothy; Roychowdhury, Vwani; Shahsavari, Shadi; Shahbazi, Behnam; Ebrahimzadeh, Ehsan (2019), An Automated Pipeline for the Discovery of Conspiracy and Conspiracy Theory Narrative Frameworks -- processed data, v2, UC Los Angeles Dash, Dataset, https://doi.org/10.5068/D1V665

Notes

Note-1

  • If you run the relation extraction pipeline with the pronoun resolution task enabled and the output is empty with no errors, make sure the StanfordCoreNLP server is running on port 9000.
