Please download the datasets from DatasetsForTHLM and put them into ./Data
Example of training THLM on Patents dataset:
python main.py --dataset_name Patents
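If you prefer to launch training from a script, the following is a minimal sketch that wraps the command above and first checks that ./Data has been populated. The helper names (`data_dir_ready`, `build_train_command`, `train`) are hypothetical and not part of this repository; only `main.py`, `--dataset_name` and the `Patents` value come from the example above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

DATA_DIR = Path("./Data")  # directory named in the download step above


def data_dir_ready(data_dir: Path = DATA_DIR) -> bool:
    """Return True if the data directory exists and holds at least one entry."""
    return data_dir.is_dir() and any(data_dir.iterdir())


def build_train_command(dataset_name: str) -> List[str]:
    """Build the training command shown above, using the current interpreter."""
    return [sys.executable, "main.py", "--dataset_name", dataset_name]


def train(dataset_name: str) -> None:
    """Run training, failing early if the datasets have not been downloaded."""
    if not data_dir_ready():
        raise FileNotFoundError(
            f"{DATA_DIR} is missing or empty; download the datasets first"
        )
    # check=True raises CalledProcessError if main.py exits with an error.
    subprocess.run(build_train_command(dataset_name), check=True)
```

Calling `train("Patents")` from the repository root then mirrors the command above. Whether other dataset names are accepted by `--dataset_name` is not confirmed here, so check `main.py` before passing anything else.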
Obtain node embeddings for Patents, GoodReads and OAG_Venue using the scripts in ./Downstream/preprocess_data
Example of obtaining node embeddings for Patents:
python Patent_features.py
- Link Prediction for OAG_Venue:
  ./Downstream/Link-Train-OAG
- Link Prediction for Patents/GoodReads:
  ./Downstream/Link-Train-Patent
- Node Classification for OAG_Venue:
  ./Downstream/train-OAG
- Node Classification for Patents/GoodReads:
  ./Downstream/train-Patent
We also provide the pre-trained language models on these three datasets at HuggingFace.
- PyTorch 2.0.0
- transformers 4.23.1
- dgl 0.9.1
- tqdm
- numpy
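
To confirm that an environment matches the versions listed above, a small check along the following lines may help. This is a sketch, not part of the repository: `check_environment` and `installed_version` are hypothetical helpers, and the distribution names are assumptions (PyTorch is assumed to be installed as `torch`; CUDA builds of dgl may be installed under a different distribution name, in which case the check will report dgl as missing).

```python
from importlib import metadata

# Versions from the list above. tqdm and numpy are unpinned there,
# so None means "any installed version is accepted".
# Assumption: these are the installed distribution names.
REQUIRED = {
    "torch": "2.0.0",
    "transformers": "4.23.1",
    "dgl": "0.9.1",
    "tqdm": None,
    "numpy": None,
}


def installed_version(name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(required=REQUIRED, lookup=installed_version):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    for name, wanted in required.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed")
        # Prefix match, so local build tags such as "2.0.0+cu117" pass.
        elif wanted is not None and not found.startswith(wanted):
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the listed versions"]:
        print(line)
```

Other versions may well work; the check only reports differences from the list above.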