PyABSA is a personal project that has received many contributions from its contributors. Please feel free to help it develop, with thanks to everyone who contributes to PyABSA. If PyABSA helps you, please star this repo; each star helps PyABSA go further, many thanks.
The repo ABSADatasets provides an open-source dataset annotation tool, so you can easily annotate your dataset before using PyABSA.
- First, refer to ABSADatasets to prepare your dataset in an acceptable format.
- You can open a PR to contribute your dataset and then use it like `ABSADatasets.your_dataset`.
(All the datasets are for research only and shall not endanger your data copyright.)
If you do not have enough data to train your model, here is what you can do:
- Combine multiple datasets with your own dataset to train your model (see the sketch below)
- Resume training from shared checkpoints, see train_based_on_checkpoint.py and train_atepc_based_on_checkpoint.py
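For example, here is a minimal sketch of combining datasets, assuming the `DatasetItem` helper from `pyabsa.functional.dataset`; the dataset choices and the local path `./my_dataset` are illustrative:

```python3
from pyabsa.functional import APCConfigManager, ABSADatasetList, Trainer
from pyabsa.functional.dataset import DatasetItem

# bundle public datasets with your own annotated data under one name;
# './my_dataset' is a hypothetical path to your custom dataset
dataset = DatasetItem('combined', [ABSADatasetList.Laptop14,
                                   ABSADatasetList.Restaurant14,
                                   './my_dataset'])

Trainer(config=APCConfigManager.get_apc_config_english(),
        dataset=dataset,
        auto_device=True)
```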
PyABSA uses FindFile to locate the target file(s), so you can specify a dataset/checkpoint path by keywords instead of an absolute path, e.g.,
```python3
dataset = './laptop'  # relative path
dataset = 'ABSOLUTE_PATH/laptop/'  # absolute path
dataset = 'laptop'  # dataset name, case-insensitive
dataset = 'lapto'  # fuzzy search: matches any path containing the string 'lapto' or 'aptop'
checkpoint = 'lcfs'  # checkpoint paths are assigned in the same ways as above
```
PyABSA uses AutoCUDA to automatically select a free CUDA device for training & inference, but you can still set a preferred device:
```python3
auto_device = True  # auto-assign a CUDA device for training / inference
auto_device = False  # use the CPU
auto_device = 'cuda:1'  # specify a preferred device
auto_device = 'cpu'  # specify a preferred device
auto_device = 'allcuda'  # use all CUDA devices to train
```
PyABSA encourages you to use string labels instead of numbers, e.g., sentiment labels = {negative, positive, neutral, unknown}.
- Whatever labels you use in the dataset are the labels that will be output at inference time (see the sketch after this list)
- You can train a model using multiple datasets that share the same sentiment labels, and you can even contribute and define a combination of datasets here!
- The PyABSA version information is also available in the output when loading a checkpoint's training args.
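For illustration, here is a sketch of string-labeled samples, assuming the standard three-line ABSADatasets APC format (sentence with a `$T$` aspect placeholder, then the aspect term, then the label); the example sentences are hypothetical:

```
the $T$ is clear and bright
screen
positive
the $T$ drains too fast
battery
negative
```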
If you need to visualize differences between metrics, you can use MetricVisualizer. Here is an example of using MetricVisualizer to visualize the FAST_LCF_BERT metrics under different max_seq_lens.
```python3
import random
import warnings

import autocuda
from metric_visualizer import MetricVisualizer

from pyabsa.functional import Trainer
from pyabsa.functional import APCConfigManager
from pyabsa.functional import ABSADatasetList
from pyabsa.functional import APCModelList

from pyabsa import __version__ as pyabsa_version
from metric_visualizer import __version__ as mv_version

assert pyabsa_version >= '1.8.20'
assert mv_version >= '0.4.0'

device = autocuda.auto_cuda()
warnings.filterwarnings('ignore')

seeds = [random.randint(0, 10000) for _ in range(3)]
max_seq_lens = [60, 70, 80, 90, 100]

apc_config_english = APCConfigManager.get_apc_config_english()
apc_config_english.model = APCModelList.FAST_LCF_BERT
apc_config_english.lcf = 'cdw'
apc_config_english.cache_dataset = False
apc_config_english.patience = 10
apc_config_english.seed = seeds

MV = MetricVisualizer()
apc_config_english.MV = MV

for max_seq_len in max_seq_lens:
    apc_config_english.max_seq_len = max_seq_len  # the hyperparameter under comparison
    dataset = ABSADatasetList.Laptop14
    Trainer(config=apc_config_english,
            dataset=dataset,  # train set and test set will be automatically detected
            checkpoint_save_mode=0,  # 0: do not save the model
            auto_device=device  # automatically choose CUDA or CPU
            )
    apc_config_english.MV.next_trial()

apc_config_english.MV.summary(save_path=None, xticks=max_seq_lens)
apc_config_english.MV.traj_plot_by_trial(save_path=None, xticks=max_seq_lens)
apc_config_english.MV.violin_plot_by_trial(save_path=None, xticks=max_seq_lens)
apc_config_english.MV.box_plot_by_trial(save_path=None, xticks=max_seq_lens)

save_path = '{}_{}'.format(apc_config_english.model_name, apc_config_english.dataset_name)
apc_config_english.MV.summary(save_path=save_path)
apc_config_english.MV.traj_plot_by_metric(save_path=save_path, xticks=max_seq_lens, xlabel=r'max_seq_len')
apc_config_english.MV.violin_plot_by_metric(save_path=save_path, xticks=max_seq_lens, xlabel=r'max_seq_len')
apc_config_english.MV.box_plot_by_metric(save_path=save_path, xticks=max_seq_lens, xlabel=r'max_seq_len')
```
The default spaCy English model is en_core_web_sm; if you haven't installed it, PyABSA will download and install it automatically.
If you would like to change the English model (or other predefined options), you can get/set it as follows:
```python3
from pyabsa.functional.config.apc_config_manager import APCConfigManager
from pyabsa.functional.config.atepc_config_manager import ATEPCConfigManager
from pyabsa.functional.config.classification_config_manager import ClassificationConfigManager

# Set
APCConfigManager.set_apc_config_english({'spacy_model': 'en_core_web_lg'})
ATEPCConfigManager.set_atepc_config_english({'spacy_model': 'en_core_web_lg'})
ClassificationConfigManager.set_classification_config_english({'spacy_model': 'en_core_web_lg'})

# Get
APCConfigManager.get_apc_config_english()
ATEPCConfigManager.get_atepc_config_english()
ClassificationConfigManager.get_classification_config_english()

# Manually set the spaCy nlp Language object
from pyabsa.core.apc.dataset_utils.apc_utils import configure_spacy_model

nlp = configure_spacy_model(APCConfigManager.get_apc_config_english())
```
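Assuming `configure_spacy_model` returns a standard spaCy `Language` object (as the assignment above suggests), it can then be used like any spaCy pipeline:

```python3
# the configured nlp object is a regular spaCy pipeline and can be called on raw text
doc = nlp('The battery life of this laptop is great.')
print([token.text for token in doc])
```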
| Package | Description |
| --- | --- |
| pyabsa | package root (including all interfaces) |
| pyabsa.functional | recommended interface entry |
| pyabsa.functional.checkpoint | checkpoint manager entry, inference model entry |
| pyabsa.functional.dataset | datasets entry |
| pyabsa.functional.config | predefined config managers |
| pyabsa.functional.trainer | training modules; every trainer returns an inference model |
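As the table above suggests, the commonly used interfaces can all be imported from the recommended `pyabsa.functional` entry, e.g.:

```python3
# the recommended entry point re-exports the common interfaces
from pyabsa.functional import (
    Trainer,           # pyabsa.functional.trainer
    APCConfigManager,  # pyabsa.functional.config
    ABSADatasetList,   # pyabsa.functional.dataset
    APCModelList,
)
```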
Please do not install a version without a corresponding release note, to avoid installing a test version.
To use PyABSA, install the latest version from pip or from the source code:

```bash
pip install -U pyabsa
```

```bash
git clone https://github.com/yangheng95/PyABSA --depth=1
cd PyABSA
python setup.py install
```
- Create a new Python environment (recommended) and install the latest PyABSA
- Find a suitable demo script (ATEPC, APC, Text Classification) to prepare your training script (welcome to share your demo script)
- Format or annotate your dataset referring to ABSADatasets, or use a public dataset from ABSADatasets
- Init your config to specify the model, dataset, and hyper-parameters
- Train your model and get checkpoints (see the sketch after this list)
- Share your checkpoint and dataset
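Here is a minimal end-to-end training sketch following these steps, using the APC task as an example; the epoch count and dataset choice are illustrative:

```python3
from pyabsa.functional import APCConfigManager, APCModelList, ABSADatasetList, Trainer

# init the config with a model and hyper-parameters
config = APCConfigManager.get_apc_config_english()
config.model = APCModelList.FAST_LCF_BERT
config.num_epoch = 5  # illustrative value

# train and obtain an inference model from the trainer
sent_classifier = Trainer(config=config,
                          dataset=ABSADatasetList.Laptop14,
                          checkpoint_save_mode=1,  # save the fine-tuned model
                          auto_device=True
                          ).load_trained_model()
```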
PyABSA checks for the latest available checkpoints and loads the latest checkpoint from Google Drive. To view the available checkpoints, you can use the following code and then load a checkpoint by name:
```python3
from pyabsa import available_checkpoints

checkpoint_map = available_checkpoints()  # show the available checkpoints for the current PyABSA version
```
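For example, here is a sketch of loading an APC checkpoint by name for inference, assuming the `APCCheckpointManager` interface and the `'multilingual'` checkpoint name (pick any name shown by `available_checkpoints()`):

```python3
from pyabsa import APCCheckpointManager

# download (if necessary) and load a named checkpoint as an inference model
sent_classifier = APCCheckpointManager.get_sentiment_classifier(checkpoint='multilingual',
                                                                auto_device=True)
sent_classifier.infer('The food is tasty but the service is terrible!', print_result=True)
```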
If you cannot access Google Drive, you can download our pretrained checkpoints from here (extraction code: ABSA) and load the unzipped checkpoint manually. (This repo is a personal spare-time project and I have no capacity to maintain the Baidu Cloud mirror; if you can help manage the storage and download of checkpoints within China, please contact me.)
More datasets are available at ABSADatasets.
- Laptop14
- Restaurant14
- Restaurant15
- Restaurant16
- Phone
- Car
- Camera
- Notebook
- MAMS
- TShirt
- Television
- MOOC
- Shampoo
- Multilingual (the combination of all the datasets above)
You don't have to download the datasets manually, as they will be downloaded automatically.
In addition to the following models, we provide a template model involving the LCF vector, so you can develop your own model based on the LCF-APC model template or the LCF-ATEPC model template.
- LCF-ATEPC
- LCF-ATEPC-LARGE (Dual BERT)
- FAST-LCF-ATEPC
- LCFS-ATEPC
- LCFS-ATEPC-LARGE (Dual BERT)
- FAST-LCFS-ATEPC
- BERT-BASE
- SLIDE-LCF-BERT (Faster & Performs Better than LCF/LCFS-BERT)
- SLIDE-LCFS-BERT (Faster & Performs Better than LCF/LCFS-BERT)
- LCF-BERT (Reimplemented & Enhanced)
- LCFS-BERT (Reimplemented & Enhanced)
- FAST-LCF-BERT (Faster with a slight performance loss)
- FAST-LCFS-BERT (Faster with a slight performance loss)
- LCF-DUAL-BERT (Dual BERT)
- LCFS-DUAL-BERT (Dual BERT)
- BERT-BASE
- BERT-SPC
- LCA-Net
- DLCF-DCA-BERT *
- AOA_BERT
- ASGCN_BERT
- ATAE_LSTM_BERT
- Cabasc_BERT
- IAN_BERT
- LSTM_BERT
- MemNet_BERT
- MGAN_BERT
- RAM_BERT
- TD_LSTM_BERT
- TC_LSTM_BERT
- TNet_LF_BERT
We hope that you can help us improve this project, and your contributions are welcome. You can make a contribution in many ways, including:
- Share your custom dataset in PyABSA and ABSADatasets
- Integrate your models into PyABSA (you can share your models whether or not they are based on PyABSA; if you are interested, we will help you)
- Raise a bug report while using PyABSA or reviewing the code (PyABSA is an individual project driven by enthusiasm, so your help is needed)
- Give us advice about feature design/refactoring (you can suggest how to improve some features)
- Correct or rewrite error messages or code comments (the comments were not written by a native English speaker; you can help us improve the documentation)
- Create an example script for a particular situation (such as specifying a spaCy model, a pretrained BERT type, or some hyper-parameters)
- Star this repository to keep it active
The LCF is a simple and adaptive mechanism proposed for ABSA. Many models based on LCF have been proposed and have achieved SOTA performance. Developing your models based on LCF will significantly improve your ABSA models. If you are looking for the original proposal of local context focus, please refer to the introduction of LCF. If you are looking for the original code of the LCF-related papers, please refer to LC-ABSA / LCF-ABSA or LCF-ATEPC.
This work builds on LC-ABSA/LCF-ABSA and LCF-ATEPC, as well as other impressive works such as PyTorch-ABSA and LCFS-BERT.