Sparse Structure Learning via Graph Neural Networks for inductive document classification

Figure 1. The architecture of TextSSL.

About data

We use the same benchmark datasets as Yao, Mao, and Luo (2019). For the MR, Ohsumed, and 20NG datasets, we follow the same train/test splits and data preprocessing as Kim (2014) and Yao, Mao, and Luo (2019). We thank the authors for their work.

The R8 and R52 datasets are only available in a preprocessed version that lacks punctuation and explicit sample names. Because we construct graphs from documents with sentence segmentation information, we re-extract these datasets from the original Reuters-21578 corpus.
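A deliberately naive splitter shows why punctuation matters. The sketch below is not the segmentation code used in this repo. `split_sentences` is a hypothetical helper that assumes sentences end with `.`, `!`, or `?` followed by whitespace.

```python
import re

# Hypothetical helper: split on terminal punctuation (., !, ?) followed by
# whitespace. NOT the segmenter used in this repo; it only illustrates why
# punctuation has to survive preprocessing.
_SENT_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split raw text into sentences on terminal punctuation."""
    return [s.strip() for s in _SENT_END.split(text.strip()) if s.strip()]


raw = "Oil prices rose sharply. Traders cited supply worries! Will it last?"
stripped = "oil prices rose sharply traders cited supply worries will it last"

print(split_sentences(raw))       # three separate sentences
print(split_sentences(stripped))  # the whole text as one "sentence"
```

On punctuation-free text the whole document comes back as a single sentence, so no sentence-level structure is recoverable. Real Reuters text also contains abbreviations such as "U.S.", which a regex like this mis-splits; a trained segmenter such as NLTK's Punkt tokenizer handles those cases better.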

You can download the dataset here:

Re-extract the R8 and R52 datasets:
```
python re-extract_data/mk_R8_R52.py --name R8
```

Remove words:
```
python remove_words.py --name R8
```
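The README does not say which words `remove_words.py` drops. As an illustration only, the sketch below assumes a common recipe for these benchmarks: removing stop words and words rarer than a corpus-wide frequency threshold. `STOP_WORDS`, `min_freq`, and the `remove_words` function itself are hypothetical, not taken from the script.

```python
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a fuller list
# (for example, NLTK's English stop words).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it"}


def remove_words(docs: list[list[str]], min_freq: int = 2) -> list[list[str]]:
    """Drop stop words and words seen fewer than min_freq times in the corpus.

    docs is a list of tokenized documents (or sentences); token order is kept.
    """
    # Corpus-wide token frequencies.
    freq = Counter(tok for doc in docs for tok in doc)
    return [
        [tok for tok in doc if tok not in STOP_WORDS and freq[tok] >= min_freq]
        for doc in docs
    ]


docs = [
    ["the", "oil", "price", "rose"],
    ["oil", "price", "fell", "in", "march"],
]
print(remove_words(docs))
```

Applying the same filter per sentence rather than per document keeps the sentence segmentation intact, which matters for the graph construction later on.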

About path

To run the code, change Your_path=/data/project/yinhuapark/ssl/ to your own path.
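If you would rather not edit the path by hand, one option is to resolve it once from an environment variable. This is a sketch under assumptions: `TEXTSSL_ROOT` is a hypothetical variable name that you would have to wire into the scripts yourself, and the `data/<name>` layout in `dataset_dir` is illustrative only.

```python
import os
from pathlib import Path

# Default taken from this README; TEXTSSL_ROOT is a hypothetical override.
DEFAULT_ROOT = "/data/project/yinhuapark/ssl/"


def project_root() -> Path:
    """Return the project root, preferring the TEXTSSL_ROOT env variable."""
    return Path(os.environ.get("TEXTSSL_ROOT", DEFAULT_ROOT))


def dataset_dir(name: str) -> Path:
    # Illustrative layout <root>/data/<name>; the repo's real layout may differ.
    return project_root() / "data" / name
```

For example, `TEXTSSL_ROOT=$HOME/ssl python ...` would then point every lookup at your own checkout without touching the source.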

Make graph dataset

Create co-occurrence pairs for each document:
```
python ssl_make_graphs/create_cooc_document.py --name R8
```
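As a rough picture of what a co-occurrence pair is, the sketch below counts unordered word pairs that fall inside a sliding window. In line with the sentence-segmentation requirement described under About data, it assumes that windows do not cross sentence boundaries. The function name and the `window` size are hypothetical, not the script's actual parameters.

```python
from collections import Counter


def cooccurrence_pairs(sentences: list[list[str]], window: int = 3) -> Counter:
    """Count unordered word pairs that co-occur inside a sliding window.

    sentences is one document given as a list of tokenized sentences.
    Windows never cross sentence boundaries.
    """
    pairs: Counter = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            # Pair w with the next (window - 1) tokens of the same sentence.
            for j in range(i + 1, min(i + window, len(sent))):
                u = sent[j]
                if u != w:
                    # Sort so (a, b) and (b, a) count as the same pair.
                    pairs[tuple(sorted((w, u)))] += 1
    return pairs


doc = [["oil", "price", "rose"], ["price", "fell"]]
print(cooccurrence_pairs(doc, window=2))
```

With `window=2` only adjacent words are paired, and "rose" is never paired with "price" from the second sentence, because the window stops at the sentence boundary.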

Construct a graph for each document and store the graphs in an InMemoryDataset:
```
python ssl_make_graphs/PygDocsGraphDataset.py --name R8
```
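PyG stores graph connectivity in coordinate (COO) format: an `edge_index` with two rows, source nodes and target nodes. The dependency-free sketch below shows how weighted word pairs could be mapped to that layout. `build_graph` is a hypothetical helper, and the actual `PygDocsGraphDataset.py` may build nodes, features, and edges differently.

```python
def build_graph(
    pairs: dict[tuple[str, str], int],
) -> tuple[dict[str, int], list[list[int]], list[float]]:
    """Turn co-occurrence counts into a node index and a COO edge list.

    Returns (vocab, edge_index, edge_weight). edge_index has two rows,
    [sources, targets]; each undirected pair is stored in both directions.
    """
    # Assign node ids in a deterministic (sorted) order.
    vocab: dict[str, int] = {}
    for u, v in sorted(pairs):
        for w in (u, v):
            vocab.setdefault(w, len(vocab))

    src: list[int] = []
    dst: list[int] = []
    weight: list[float] = []
    for (u, v), count in sorted(pairs.items()):
        a, b = vocab[u], vocab[v]
        src += [a, b]
        dst += [b, a]
        weight += [float(count), float(count)]
    return vocab, [src, dst], weight


vocab, edge_index, edge_weight = build_graph({("oil", "price"): 1, ("price", "rose"): 2})
print(vocab, edge_index, edge_weight)
```

In PyG, the result would be wrapped as `Data(x=..., edge_index=torch.tensor(edge_index, dtype=torch.long), edge_attr=...)`, and an `InMemoryDataset` subclass collates the per-document `Data` objects in its `process()` method.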

Train

```
python ssl_graphmodels/pyg_models/train_docs.py --name R8
```
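The steps in this README have to run in order. A small driver can chain them. This is a sketch, not part of the repo: it assumes it is launched from the repository root, uses `sys.executable` in place of `python`, and reuses only the `--name` flag shown above. The README gives these commands for R8 only, so whether every script accepts other dataset names is an assumption to verify.

```python
import subprocess
import sys

# Scripts from this README, in the order they must run.
STEPS = [
    "re-extract_data/mk_R8_R52.py",
    "remove_words.py",
    "ssl_make_graphs/create_cooc_document.py",
    "ssl_make_graphs/PygDocsGraphDataset.py",
    "ssl_graphmodels/pyg_models/train_docs.py",
]


def build_commands(name: str) -> list[list[str]]:
    """Return one argv list per step for the given dataset name."""
    return [[sys.executable, script, "--name", name] for script in STEPS]


def run_pipeline(name: str, dry_run: bool = True) -> None:
    """Print each command; execute it too unless dry_run is True."""
    for cmd in build_commands(name):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline("R8")` only prints the commands; `run_pipeline("R8", dry_run=False)` runs them and aborts on the first failing step.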

Reference

If you find our paper and repo useful, please cite our paper:

```
@inproceedings{piao2022sparse,
  title={Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification},
  author={Piao, Yinhua and Lee, Sangseon and Lee, Dohoon and Kim, Sun},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={10},
  pages={11165--11173},
  year={2022}
}
```

This README is inspired by GSAT.

Files

README.md

Latest commit

History

README.md

File metadata and controls

Sparse Structure Learning via Graph Neural Networks for inductive document classification

About data

About path

Make graph dataset

Train

Reference

The readme is inspired by GSAT.