Merge branch 'v2' into dev
# Conflicts:
#	README.MD
#	examples-v2/aspect_polarity_classification/ensemble_inference.py
#	examples-v2/aspect_polarity_classification/train_apc.py
#	examples-v2/aspect_term_extraction/train_atepc.py
#	examples-v2/text_classification/train_text_classification.py
#	pyabsa/__init__.py
#	pyabsa/tasks/RNARegression/dataset_utils/__plm__/data_utils_for_inference.py
#	pyabsa/tasks/RNARegression/dataset_utils/__plm__/data_utils_for_training.py
#	pyabsa/tasks/_Archive/RNAClassification/dataset_utils/data_utils_for_training.py
yangheng95 committed Dec 5, 2023
2 parents f5f8d05 + 5b5282b commit 0f6cca3
Showing 101 changed files with 1,063 additions and 1,025 deletions.
44 changes: 24 additions & 20 deletions README.MD
@@ -1,45 +1,46 @@
-# [PyABSA - Open Framework for Aspect-based Sentiment Analysis](https://arxiv.org/pdf/2208.01368)
+# PyABSA - Open Framework for Aspect-based Sentiment Analysis ([paper](https://dl.acm.org/doi/abs/10.1145/3583780.3614752))

-![PyPI - Python Version](https://img.shields.io/badge/python-3.8-gold.svg)
+![PyPI - Python Version](https://img.shields.io/badge/python-3.6-blue.svg)
[![PyPI](https://img.shields.io/pypi/v/pyabsa)](https://pypi.org/project/pyabsa/)
-[![Downloads](https://pepy.tech/badge/pyabsa)](https://pepy.tech/project/pyabsa)
+[![Downloads](https://pepy.tech/badge/pyabsa/month)](https://pepy.tech/project/pyabsa)
![License](https://img.shields.io/pypi/l/pyabsa?logo=PyABSA)
-[![Documentation Status](https://readthedocs.org/projects/pyabsa/badge/?version=v2)](https://pyabsa.readthedocs.io/en/latest/)
+[![Documentation Status](https://readthedocs.org/projects/pyabsa/badge/?version=v2)](https://pyabsa.readthedocs.io/en/v2/?badge=v2)

[![total views](https://raw.githubusercontent.com/yangheng95/PyABSA/traffic/total_views.svg)](https://github.com/yangheng95/PyABSA/tree/traffic#-total-traffic-data-badge)
[![total views per week](https://raw.githubusercontent.com/yangheng95/PyABSA/traffic/total_views_per_week.svg)](https://github.com/yangheng95/PyABSA/tree/traffic#-total-traffic-data-badge)
[![total clones](https://raw.githubusercontent.com/yangheng95/PyABSA/traffic/total_clones.svg)](https://github.com/yangheng95/PyABSA/tree/traffic#-total-traffic-data-badge)
[![total clones per week](https://raw.githubusercontent.com/yangheng95/PyABSA/traffic/total_clones_per_week.svg)](https://github.com/yangheng95/PyABSA/tree/traffic#-total-traffic-data-badge)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/back-to-reality-leveraging-pattern-driven/aspect-based-sentiment-analysis-on-semeval)](https://paperswithcode.com/sota/aspect-based-sentiment-analysis-on-semeval?p=back-to-reality-leveraging-pattern-driven)

**Hi, there!** Please star this repo if it helps you; every star helps PyABSA go further, many thanks. PyABSA is a free
and open-source tool for everyone, but please do not forget to attach the (informal or formal) author
information and project address in your works, products, publications, etc.

## Try our demos on Huggingface Space

-Apart from the [paper](https://arxiv.org/pdf/2208.01368), there are two new features in PyABSA: Aspect sentiment triplet extraction and Aspect quadruple extraction.
+Apart from the [paper](https://arxiv.org/pdf/2208.01368), there are two new features in PyABSA: Aspect sentiment triplet
+extraction and Aspect quadruple extraction.
We have deployed the demos on Huggingface Space; you can try them online.

- **[Aspect sentiment quadruple extraction](https://huggingface.co/spaces/yangheng/PyABSA) (English) New feature**
- **[Aspect sentiment triplet extraction](https://huggingface.co/spaces/yangheng/PyABSA) (English) New feature**
- [(Gradio) Aspect term extraction & sentiment classification](https://huggingface.co/spaces/Gradio-Blocks/Multilingual-Aspect-Based-Sentiment-Analysis) (
English, Chinese, Arabic, Dutch, French, Russian, Spanish, Turkish, etc.)
- [(Prototype) Aspect term extraction & sentiment classification](https://huggingface.co/spaces/yangheng/PyABSA-ATEPC) (
English,
Chinese, Arabic, Dutch, French, Russian, Spanish, Turkish, etc.)
- [Aspect term extraction & sentiment classification (Chinese demo)](https://huggingface.co/spaces/yangheng/PyABSA-ATEPC-Chinese) (Chinese, etc.)
- [Aspect-based sentiment classification (Multilingual)](https://huggingface.co/spaces/yangheng/PyABSA-APC) (English,
Chinese, etc.)

## Usage Examples

We have prepared many examples for different tasks. Please refer to [Examples](./examples-v2) for more usage examples.

## Installation

### Install Anaconda (Optional)
Install [Anaconda](https://www.anaconda.com/products/individual) if you do not have it.

### Create a new environment (Optional)
```bash
conda create -n pyabsa python=3.8
conda activate pyabsa
```

### Install PyTorch
Install PyTorch following the [official instructions](https://pytorch.org/get-started/locally/).
Select your environment, operating system, package manager, Python version, and CUDA version.
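For example, the selector produces commands of this shape (the CUDA 12.1 index URL below is illustrative; use the command generated for your setup):
```bash
# Example from the PyTorch selector (CUDA 12.1 build; adjust to your OS/CUDA version)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```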

### Install via pip

To use PyABSA, install the latest version from pip or source code:
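For example (the `-U` flag upgrades an existing install; the git URL variant is one common way to install from source):
```bash
pip install -U pyabsa
# or, from source:
pip install git+https://github.com/yangheng95/PyABSA
```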
@@ -89,6 +90,7 @@ print(atepc_result)


```
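For reference, a minimal sketch of the v2 aspect term extraction call this example builds up to, assuming the released "multilingual" checkpoint (the input sentence and keyword arguments are illustrative):

```python3
from pyabsa import AspectTermExtraction as ATEPC

# Downloads the checkpoint on first use
aspect_extractor = ATEPC.AspectExtractor("multilingual")
atepc_result = aspect_extractor.predict(
    "The food was great but the service was terrible.",
    print_result=True,
)
print(atepc_result)
```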

### Aspect-based sentiment analysis

```python3
@@ -119,7 +121,9 @@ apc_result = classifier.batch_predict(target_file=inference_source, #
print(apc_result)

```
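Single-text inference follows the same pattern; a minimal sketch assuming the "multilingual" APC checkpoint, with the aspect delimited by the [B-ASP]/[E-ASP] tags PyABSA expects:

```python3
from pyabsa import AspectPolarityClassification as APC

classifier = APC.SentimentClassifier("multilingual")
# The aspect under evaluation is wrapped in [B-ASP]...[E-ASP]
apc_result = classifier.predict(
    "The [B-ASP]battery life[E-ASP] is excellent but the screen is dim."
)
print(apc_result)
```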

## Dataset Annotation and Model Training

Please refer to the documentation: [PyABSA Documentation](https://pyabsa.readthedocs.io/en/v2/).
If you have any questions about the docs, feel free to raise an issue; you are also welcome to join in and improve them.
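For orientation, a minimal training run in v2 follows the same pattern as the train_atepc.py diff further down; the config values here are illustrative, not recommendations:

```python3
from pyabsa import AspectTermExtraction as ATEPC

config = ATEPC.ATEPCConfigManager.get_atepc_config_english()
config.pretrained_bert = "yangheng/deberta-v3-base-absa"
config.num_epoch = 20
config.log_step = -1  # -1 evaluates once per epoch

trainer = ATEPC.ATEPCTrainer(
    config=config,
    dataset=ATEPC.ATEPCDatasetList.Laptop14,
    checkpoint_save_mode=1,  # save the fine-tuned model state
    auto_device=True,
)
```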

@@ -157,11 +161,12 @@ If you are looking for the original proposal of local context focus, here are so
[here](https://github.com/yangheng95/PyABSA/tree/release/demos/documents).

## Citation

```bibtex
@article{YangL22,
author = {Heng Yang and
Ke Li},
-title = {A Modularized Framework for Reproducible Aspect-based Sentiment Analysis},
+title = {PyABSA: Open Framework for Aspect-based Sentiment Analysis},
journal = {CoRR},
volume = {abs/2208.01368},
year = {2022},
@@ -174,6 +179,7 @@ If you are looking for the original proposal of local context focus, here are so
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```

## Contribution

This repository is developed and maintained by HENG YANG ([yangheng95@GitHub](https://github.com/yangheng95)),
@@ -193,8 +199,6 @@ many ways, including:
hyper-parameters)
- Star this repository to keep it active



## License

PyABSA is released under the MIT licence; please cite this repo (or our papers) or attach the author information in your work
@@ -1 +0,0 @@
{"2.0.0": {"APC": {"multilingual": {"id": "", "Training Model": "FAST-LSA-T-V2-Deberta", "Training Dataset": "APCDatasetList.Multilingual", "Language": "Multilingual", "Description": "Trained on RTX3090", "Available Version": "1.10.5+", "Checkpoint File": "fast_lcf_bert_Multilingual_acc_87.18_f1_83.11.zip", "Author": "H, Yang (hy345@exeter.ac.uk)"}, "multilingual2": {"id": "", "Training Model": "FAST-LSA-T-V2-Deberta", "Training Dataset": "APCDatasetList.Multilingual", "Language": "Multilingual", "Description": "Trained on RTX3090", "Available Version": "1.10.5+", "Checkpoint File": "fast_lcf_bert_Multilingual_acc_82.66_f1_82.06.zip", "Author": "H, Yang (hy345@exeter.ac.uk)"}, "english": {"id": "", "Training Model": "FAST-LSA-T-V2-Deberta", "Training Dataset": "APCDatasetList.English", "Language": "English", "Description": "Trained on RTX3090", "Available Version": "1.10.5+", "Checkpoint File": "fast_lsa_t_v2_English_acc_82.21_f1_81.81.zip", "Author": "H, Yang (hy345@exeter.ac.uk)"}, "chinese": {"id": "", "Training Model": "FAST-LSA-T-V2-Deberta", "Training Dataset": "APCDatasetList.Chinese", "Language": "Chinese", "Description": "Trained on RTX3090", "Available Version": "1.10.5+", "Checkpoint File": "fast_lsa_t_v2_Chinese_acc_96.0_f1_95.1.zip", "Author": "H, Yang (hy345@exeter.ac.uk)"}}, "ATEPC": {"multilingual": {"id": "", "Training Model": "FAST-LCF-ATEPC", "Training Dataset": "ABSADatasets.Multilingual", "Language": "Multilingual", "Description": "Trained on RTX3090", "Available Version": "1.16.0+", "Checkpoint File": "fast_lcf_atepc_Multilingual_cdw_apcacc_80.81_apcf1_73.75_atef1_76.01.zip", "Author": "H, Yang (hy345@exeter.ac.uk)"}, "multilingual2": {"id": "", "Training Model": "FAST-LCF-ATEPC", "Training Dataset": "ABSADatasets.Multilingual", "Language": "Multilingual", "Description": "Trained on RTX3090", "Available Version": "1.16.0+", "Checkpoint File": "fast_lcf_atepc_Multilingual_cdw_apcacc_78.08_apcf1_77.81_atef1_75.41.zip", "Author": "H, Yang (hy345@exeter.ac.uk)"}, "english": {"id": "", "Training Model": "FAST-LCF-ATEPC", "Training Dataset": "ATEPCDatasetList.English", "Language": "English", "Description": "Trained on RTX3090", "Available Version": "1.10.5+", "Checkpoint File": "fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43.zip", "Author": "H, Yang (hy345@exeter.ac.uk)"}, "chinese": {"id": "", "Training Model": "FAST-LCF-ATEPC", "Training Dataset": "ATEPCDatasetList.Chinese", "Language": "Chinese", "Description": "Trained on RTX3090", "Available Version": "1.10.5+", "Checkpoint File": "fast_lcf_atepc_Chinese_cdw_apcacc_96.22_apcf1_95.32_atef1_78.73.zip", "Author": "H, Yang (hy345@exeter.ac.uk)"}}, "RNAC": {"degrad_lstm": {"id": "", "Training Model": "LSTM", "Training Dataset": "ABSADatasets.Multilingual", "Language": "RNA", "Description": "Trained on RTX3090", "Available Version": "1.16.0+", "Checkpoint File": "lstm_degrad_acc_85.26_f1_84.62.zip", "Author": "H, Yang (hy345@exeter.ac.uk)"}, "degrad_bert": {"id": "", "Training Model": "MLP", "Training Dataset": "Degrad", "Language": "RNA", "Description": "Trained on RTX3090", "Available Version": "1.16.0+", "Checkpoint File": "bert_mlp_degrad_acc_87.44_f1_86.99.zip", "Author": "H, Yang (hy345@exeter.ac.uk)"}}, "TAD": {"tad-sst2": {"id": "", "Training Model": "TAD", "Training Dataset": "SST2", "Language": "English", "Description": "Trained on RTX3090", "Available Version": "1.15+", "Checkpoint File": "TAD-SST2.zip", "Author": "H, Yang (yangheng@m.scnu.edu.cn)"}, "tad-agnews10k": {"id": "", "Training Model": 
"TAD", "Training Dataset": "AGNews", "Language": "English", "Description": "Trained on RTX3090", "Available Version": "1.15+", "Checkpoint File": "TAD-AGNews10K.zip", "Author": "H, Yang (yangheng@m.scnu.edu.cn)"}, "tad-amazon": {"id": "", "Training Model": "TAD", "Training Dataset": "AGNews", "Language": "English", "Description": "Trained on RTX3090", "Available Version": "1.15+", "Checkpoint File": "TAD-Amazon.zip", "Author": "H, Yang (yangheng@m.scnu.edu.cn)"}}, "CDD": {"promise": {"id": "", "Training Model": "CodeT5-small", "Training Dataset": "Promise", "Language": "Code", "Description": "Trained on RTX3090", "Available Version": "1.16.0+", "Checkpoint File": "bert_mlp_all_cpdp_acc_75.33_f1_73.52.zip", "Author": "H, Yang (hy345@exeter.ac.uk)"}}, "UPPERTASKCODE": {"promise": {"id": "", "Training Model": "CodeT5-small", "Training Dataset": "DatasetName", "Language": "", "Description": "Trained on RTX3090", "Available Version": "1.16.0+", "Checkpoint File": "lstm_degrad_acc_85.26_f1_84.62.zip", "Author": "H, Yang (hy345@exeter.ac.uk)"}}}}
examples-v2/aspect_polarity_classification/ensemble_inference.py
@@ -39,7 +39,13 @@ def ensemble_performance(dataset, print_result=False):
if __name__ == "__main__":
# Training the models before ensemble inference, take Laptop14 as an example

-    for dataset in [Laptop14, Restaurant14, Restaurant15, Restaurant16, MAMS]:
+    for dataset in [
+        APC.APCDatasetList.Laptop14,
+        APC.APCDatasetList.Restaurant14,
+        APC.APCDatasetList.Restaurant15,
+        APC.APCDatasetList.Restaurant16,
+        APC.APCDatasetList.MAMS
+    ]:
# Training
pass
# Ensemble inference
10 changes: 5 additions & 5 deletions examples-v2/aspect_polarity_classification/train_apc.py
@@ -31,14 +31,14 @@
APC.APCDatasetList.MAMS,
]:
for model in [
-        APC.APCModelList.FAST_LSA_T_V2,
-        APC.APCModelList.FAST_LSA_S_V2,
+        # APC.APCModelList.FAST_LSA_T_V2,
+        # APC.APCModelList.FAST_LSA_S_V2,
APC.APCModelList.BERT_SPC_V2,
# APC.APCModelList.BERT_SPC
]:
for pretrained_bert in [
"microsoft/deberta-v3-base",
# "bert-base-uncased",
# "microsoft/deberta-v3-base",
"bert-base-uncased",
# 'roberta-base',
# 'microsoft/deberta-v3-large',
]:
@@ -47,7 +47,7 @@
config.pretrained_bert = pretrained_bert
# config.pretrained_bert = 'roberta-base'
config.evaluate_begin = 0
-        config.max_seq_len = 512
+        config.max_seq_len = 80
config.num_epoch = 30
# config.log_step = 5
config.log_step = -1
10 changes: 5 additions & 5 deletions examples-v2/aspect_term_extraction/train_atepc.py
@@ -17,8 +17,8 @@
config.evaluate_begin = 0
config.max_seq_len = 128
config.batch_size = 16
-# config.pretrained_bert = 'yangheng/deberta-v3-base-absa'
-config.pretrained_bert = "microsoft/mdeberta-v3-base"
+config.pretrained_bert = 'yangheng/deberta-v3-base-absa'
+# config.pretrained_bert = "microsoft/mdeberta-v3-base"
config.log_step = -1
config.l2reg = 1e-8
config.num_epoch = 20
@@ -28,12 +28,12 @@
config.cache_dataset = True
config.cross_validate_fold = -1

-# chinese_sets = ATEPC.ATEPCDatasetList.Chinese_Zhang
-chinese_sets = ATEPC.ATEPCDatasetList.Multilingual
+chinese_sets = ATEPC.ATEPCDatasetList.Chinese_Zhang
+# chinese_sets = ATEPC.ATEPCDatasetList.Multilingual

aspect_extractor = ATEPC.ATEPCTrainer(
config=config,
-    # from_checkpoint="", # not necessary for most situations
+    from_checkpoint="english", # not necessary for most situations
dataset=chinese_sets,
checkpoint_save_mode=1,
auto_device=True,
4 changes: 2 additions & 2 deletions examples-v2/text_classification/train_text_classification.py
@@ -13,7 +13,7 @@
classification_config_english.num_epoch = 20
classification_config_english.batch_size = 16
classification_config_english.evaluate_begin = 0
-classification_config_english.max_seq_len = 100
+classification_config_english.max_seq_len = 512
classification_config_english.learning_rate = 5e-5
classification_config_english.dropout = 0
classification_config_english.seed = {42, 14, 5324}
@@ -22,7 +22,7 @@
# classification_config_english.use_amp = True
classification_config_english.cache_dataset = False

SST2 = "evoprompt"
SST2 = "sst2"
sent_classifier = TC.TCTrainer(
config=classification_config_english,
dataset=SST2,
2 changes: 1 addition & 1 deletion pyabsa/__init__.py
@@ -7,7 +7,7 @@
# Copyright (C) 2021. All Rights Reserved.

__name__ = "pyabsa"
__version__ = "2.3.4rc0"
__version__ = "2.3.4"


from pyabsa.utils.notification_utils.notification_utils import (
12 changes: 6 additions & 6 deletions pyabsa/tasks/ABSAInstruction/data_utils.py
@@ -22,12 +22,12 @@

class InstructDatasetLoader:
def __init__(
-            self,
-            train_df_id,
-            test_df_id,
-            train_df_ood=None,
-            test_df_ood=None,
-            sample_size=1,
+        self,
+        train_df_id,
+        test_df_id,
+        train_df_ood=None,
+        test_df_ood=None,
+        sample_size=1,
):
self.train_df_id = train_df_id.sample(frac=sample_size, random_state=1999)
self.test_df_id = test_df_id
24 changes: 12 additions & 12 deletions pyabsa/tasks/ABSAInstruction/instruction.py
@@ -85,10 +85,10 @@ def __init__(self, bos_instruction=None, eos_instruction=None):

def prepare_input(self, input_text, aspects):
return (
-                self.bos_instruction
-                + input_text
-                + f"The aspects are: {aspects}"
-                + self.eos_instruction
+            self.bos_instruction
+            + input_text
+            + f"The aspects are: {aspects}"
+            + self.eos_instruction
)


@@ -123,10 +123,10 @@ def __init__(self, bos_instruction=None, eos_instruction=None):

def prepare_input(self, input_text, aspects):
return (
-                self.bos_instruction
-                + input_text
-                + f"The aspects are: {aspects}"
-                + self.eos_instruction
+            self.bos_instruction
+            + input_text
+            + f"The aspects are: {aspects}"
+            + self.eos_instruction
)


@@ -161,8 +161,8 @@ def __init__(self, bos_instruction=None, eos_instruction=None):

def prepare_input(self, input_text, aspects):
return (
-                self.bos_instruction
-                + input_text
-                + f"The aspects are: {aspects}"
-                + self.eos_instruction
+            self.bos_instruction
+            + input_text
+            + f"The aspects are: {aspects}"
+            + self.eos_instruction
)
14 changes: 7 additions & 7 deletions pyabsa/tasks/ABSAInstruction/model.py
@@ -154,12 +154,12 @@ def predict(self, text, **kwargs):
return ensemble_result

def get_labels(
-            self,
-            tokenized_dataset,
-            trained_model_path=None,
-            predictor=None,
-            batch_size=4,
-            sample_set="train",
+        self,
+        tokenized_dataset,
+        trained_model_path=None,
+        predictor=None,
+        batch_size=4,
+        sample_set="train",
):
"""
Get the predictions from the trained model.
@@ -315,7 +315,7 @@ def train(self, tokenized_datasets, **kwargs):
return trainer

def get_labels(
-            self, tokenized_dataset, predictor=None, batch_size=4, sample_set="train"
+        self, tokenized_dataset, predictor=None, batch_size=4, sample_set="train"
):
"""
Get the predictions from the trained model.
@@ -53,7 +53,7 @@ def syntax_distance_alignment(tokens, dist, max_seq_len, tokenizer):

text = text[1:]
dep_dist = dep_dist[1:]
-            bert_tokens = bert_tokens[len(tmp_tokens) :]
+            bert_tokens = bert_tokens[len(tmp_tokens):]
else:
text = text[1:]
bert_tokens = bert_tokens[1:]
@@ -109,7 +109,7 @@ def prepare_input_for_apc(config, tokenizer, text_left, text_right, aspect):

text_raw = text_left + " " + aspect + " " + text_right
text_spc = (
bos_token + " " + text_raw + " " + eos_token + " " + aspect + " " + eos_token
bos_token + " " + text_raw + " " + eos_token + " " + aspect + " " + eos_token
)
text_indices = text_to_sequence(tokenizer, text_spc, config.max_seq_len)
text_raw_bert_indices = text_to_sequence(
@@ -186,7 +186,7 @@ def get_syntax_distance(text_raw, aspect, tokenizer, config):


def get_lca_ids_and_cdm_vec(
-        config, bert_spc_indices, aspect_indices, aspect_begin, syntactical_dist=None
+    config, bert_spc_indices, aspect_indices, aspect_begin, syntactical_dist=None
):
SRD = config.SRD
cdm_vec = np.zeros((config.max_seq_len), dtype=np.int64)
@@ -206,7 +206,7 @@


def get_cdw_vec(
-        config, bert_spc_indices, aspect_indices, aspect_begin, syntactical_dist=None
+    config, bert_spc_indices, aspect_indices, aspect_begin, syntactical_dist=None
):
SRD = config.SRD
cdw_vec = np.zeros((config.max_seq_len), dtype=np.float32)
@@ -246,15 +246,15 @@ def build_spc_mask_vec(config, text_ids):


def build_sentiment_window(
-        examples, tokenizer, similarity_threshold, input_demands=None
+    examples, tokenizer, similarity_threshold, input_demands=None
):
copy_side_aspect("left", examples[0], examples[0], examples, input_demands)
for idx in range(1, len(examples)):
if is_similar(
-                examples[idx - 1]["text_indices"],
-                examples[idx]["text_indices"],
-                tokenizer=None,
-                similarity_threshold=similarity_threshold,
+            examples[idx - 1]["text_indices"],
+            examples[idx]["text_indices"],
+            tokenizer=None,
+            similarity_threshold=similarity_threshold,
):
copy_side_aspect(
"right", examples[idx - 1], examples[idx], examples, input_demands