forked from Mini-Conf/Mini-Conf
Create paper projections for paper similarity graph (#63)
* Add papers.csv creator and UMAP projection in scripts/reduce.py
* Reformat code
* Add similar papers in poster page
* Add guide to produce similar paper recommendations
* Reformat main.py
* make image_path configurable
* format create_papers_csv
* Update with latest papers.csv and simplify code
* Remove unused imports
* Ignore typecheck for openreview and umap-learn
* Modify poster.html to get correct id field
* refactor templates/poster.html
* Update README.md
* Update README.recommendations.md

Co-authored-by: Hao Fang <haofang1990@gmail.com>
Showing 7 changed files with 139 additions and 2 deletions.
@@ -0,0 +1,35 @@
# How to get similar paper recommendations

This guide shows how to get paper recommendations using abstract embeddings and the pretrained model provided
by the [ICLR webpage](https://github.com/ICLR/iclr.github.io/tree/master/recommendations).

## Create a visualization based on BERT embeddings

1. Grab the ACL2020
   [papers.csv](https://github.com/acl-org/acl-2020-virtual-conference-sitedata/blob/add_acl2020_accepted_papers_tsv/papers.csv)
   from this branch, or a more recent version, and copy it to `sitedata_acl2020`.
2. Run `python scripts/embeddings.py sitedata_acl2020/papers.csv` to produce the BERT embeddings
   for the paper abstracts.
3. Run `python scripts/reduce.py --projection-method [tsne|umap] sitedata_acl2020/papers.csv embeddings.torch > sitedata_acl2020/papers_projection.json`
   to produce a 2D projection of the BERT embeddings for visualization. `--projection-method`
   selects which dimensionality reduction technique to use.
4. Rerun `make run` and go to the paper visualization page.
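
To make the projection step more concrete, here is a minimal sketch of what "reduce embeddings to 2D and emit JSON" can look like. It is not the code in `scripts/reduce.py`: it swaps t-SNE/UMAP for a plain PCA computed by power iteration so that it runs with the standard library only, the toy embeddings are made up, and the `id`/`pos` keys of the output records are an assumption about the JSON layout rather than something stated in this guide.

```python
import json
import math
import random


def top_component(rows, iters=200, seed=0):
    """Leading principal direction of mean-centred rows, by power iteration."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        # w = X^T (X v): one multiplication by the unnormalised covariance.
        scores = [sum(r[j] * v[j] for j in range(dim)) for r in rows]
        w = [sum(scores[i] * rows[i][j] for i in range(len(rows))) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left along any direction
        v = [x / norm for x in w]
    return v


def project_2d(embeddings):
    """Project vectors onto their first two principal directions (plain PCA)."""
    n, dim = len(embeddings), len(embeddings[0])
    mean = [sum(e[j] for e in embeddings) / n for j in range(dim)]
    rows = [[e[j] - mean[j] for j in range(dim)] for e in embeddings]
    points = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        v = top_component(rows, seed=axis)
        for i, r in enumerate(rows):
            score = sum(r[j] * v[j] for j in range(dim))
            points[i][axis] = score
            # Deflate: remove this direction so the next pass finds the second one.
            rows[i] = [r[j] - score * v[j] for j in range(dim)]
    return points


# Hypothetical paper IDs and 3-dimensional toy "embeddings".
paper_ids = ["p1", "p2", "p3", "p4"]
embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.0, 0.9, 0.3]]

points = project_2d(embeddings)
# Assumed record layout: one object per paper with its 2D position.
projection = [{"id": pid, "pos": pos} for pid, pos in zip(paper_ids, points)]
print(json.dumps(projection, indent=2))
```

Papers with similar embeddings (`p1` and `p2` here) end up close together in the 2D output, which is the property the visualization page relies on.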

## Produce similar paper recommendations

1. Run `python scripts/create_recommendations_pickle.py --inp sitedata_acl2020/papers.csv --out cached_or.pkl` to produce `cached_or.pkl`.
   This file is compatible with the inference scripts provided in [https://github.com/ICLR/iclr.github.io/tree/master/recommendations](https://github.com/ICLR/iclr.github.io/tree/master/recommendations).
2. Clone [https://github.com/ICLR/iclr.github.io](https://github.com/ICLR/iclr.github.io). You will
   need `git-lfs` installed.
3. `cp cached_or.pkl iclr.github.io && cd iclr.github.io/recommendations`
4. Install any missing requirements.
5. `python recs.py`. This runs inference using a pretrained similarity model and produces the
   `rec.pkl` file that contains the paper similarities.
6. Use the `iclr.github.io/data/pkl_to_json.py` script to produce the `paper_recs.json`
   file, which contains the similar paper recommendations that can be displayed on the website. Make
   sure to modify the file paths to point to the correct `cached_or.pkl` and `rec.pkl`.
7. Grab the produced `paper_recs.json` file and copy it to `sitedata_acl2020`. A version of this file
   produced using this method is [here](https://github.com/acl-org/acl-2020-virtual-conference-sitedata/blob/add_acl2020_accepted_papers_tsv/paper_recs.json).
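
The last steps turn pairwise paper similarities into per-paper recommendation lists. The sketch below illustrates that idea only: the similarity matrix is invented, `top_k_similar` is a hypothetical helper, and the resulting JSON shape (paper ID mapped to a list of IDs) is an assumption, not the documented format of `rec.pkl` or `paper_recs.json`.

```python
import json


def top_k_similar(paper_ids, similarity, k=2):
    """Map each paper ID to its k most similar other papers, best first."""
    recs = {}
    for i, pid in enumerate(paper_ids):
        # Rank every other paper by its similarity to paper i, highest first.
        ranked = sorted(
            (j for j in range(len(paper_ids)) if j != i),
            key=lambda j: similarity[i][j],
            reverse=True,
        )
        recs[pid] = [paper_ids[j] for j in ranked[:k]]
    return recs


# Hypothetical IDs and a made-up symmetric similarity matrix.
paper_ids = ["main.1", "main.2", "main.3"]
similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]

recs = top_k_similar(paper_ids, similarity, k=2)
print(json.dumps(recs, indent=2))
```

Excluding the paper itself from its own ranking matters, because a paper is always most similar to itself and would otherwise occupy the first slot of every list.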
@@ -0,0 +1,50 @@
import argparse
import csv
import pickle

# The openreview-py package ships no type hints, so mypy is told to ignore it.
import openreview  # type: ignore


def read_entries(papers_csv):
    """Return the data rows of papers.csv, without the header row."""
    with open(papers_csv, "r", newline="") as fd:
        entries = list(csv.reader(fd, skipinitialspace=True))
    return entries[1:]  # skip header


def dump_cached_or(entries, out_pickle):
    """Pickle a {paper id: openreview.Note} mapping built from the CSV rows."""
    cached_or = {}
    for entry in entries:
        # Hack: the ICLR recommender script accepts OpenReview notes, so each
        # paper is wrapped in a Note. Column 0 is the paper id, column 1 the
        # title, and column 3 the abstract.
        cached_or[entry[0]] = openreview.Note(
            "", [], [], [], {"abstract": entry[3], "title": entry[1]}
        )

    with open(out_pickle, "wb") as fd:
        pickle.dump(cached_or, fd)


def parse_args():
    parser = argparse.ArgumentParser(
        description="Dump papers.csv entries into a pickle compatible with "
        "the ICLR recommendation engine"
    )
    parser.add_argument("--inp", type=str, required=True, help="papers.csv")
    parser.add_argument(
        "--out",
        type=str,
        required=True,
        help="Output pickle compatible with the ICLR recommendation engine",
    )
    return parser.parse_args()


def main():
    args = parse_args()
    entries = read_entries(args.inp)
    dump_cached_or(entries, args.out)


if __name__ == "__main__":
    main()
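
The script above depends on fixed column positions in `papers.csv` (0 for the id, 1 for the title, 3 for the abstract). As a rough illustration of the data it builds, the sketch below reads the same three fields by header name and round-trips the result through `pickle`. The header names (`UID`, `title`, `authors`, `abstract`) and the sample rows are assumptions, and a plain dict stands in for `openreview.Note` so the example runs without `openreview-py`; the output is therefore not a drop-in input for `recs.py`.

```python
import csv
import io
import pickle

# Hypothetical two-row papers.csv; the real header names may differ.
SAMPLE_CSV = """UID,title,authors,abstract
main.1,A Toy Title,Ann Author,A short abstract.
main.2,Another Title,Bob Writer,"An abstract, with a comma."
"""


def read_entries_by_name(fd):
    """Yield (id, title, abstract) using header names rather than positions."""
    for row in csv.DictReader(fd, skipinitialspace=True):
        yield row["UID"], row["title"], row["abstract"]


def build_cache(entries):
    """Plain-dict stand-in for the {paper id: openreview.Note} mapping."""
    return {
        uid: {"content": {"title": title, "abstract": abstract}}
        for uid, title, abstract in entries
    }


cache = build_cache(read_entries_by_name(io.StringIO(SAMPLE_CSV)))

# Round-trip through pickle, as the script does via a file on disk.
restored = pickle.loads(pickle.dumps(cache))
print(restored["main.2"]["content"]["abstract"])
```

Reading by header name fails loudly with a `KeyError` if the CSV layout changes, whereas positional indexing would silently pick up the wrong column.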
@@ -1,4 +1,6 @@
transformers
sklearn
umap-learn
openreview-py
torch==1.4.0
ics