Repository for the paper: Log Parsing with Prompt-based Few-shot Learning
Abstract: Logs generated by large-scale software systems provide crucial information for engineers to understand the system status and diagnose problems of the systems. Log parsing, which converts raw log messages into structured data, is the first step to enabling automated log analytics. Existing log parsers extract the common part as log templates using statistical features. However, these log parsers often fail to identify the correct templates and parameters because: 1) they often overlook the semantic meaning of log messages, and 2) they require domain-specific knowledge for different log datasets. To address the limitations of existing methods, in this paper, we propose LogPPT to capture the patterns of templates using prompt-based few-shot learning. LogPPT utilises a novel prompt tuning method to recognise keywords and parameters based on a few labelled log data. In addition, an adaptive random sampling algorithm is designed to select a small yet diverse training set. We have conducted extensive experiments on 16 public log datasets. The experimental results show that LogPPT is effective and efficient for log parsing.
LogPPT consists of the following components:
- Few-shot Data Sampling: An adaptive random sampling algorithm that selects a small yet diverse set of K labelled logs for training (K is small).
- Prompt-based Parsing: A module that tunes a pre-trained language model for log parsing using prompt tuning.
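To illustrate the idea behind the few-shot data sampling component, here is a minimal, self-contained sketch of greedy diversity-based sampling (the function and helper names are hypothetical; the repository's actual implementation lives in `fewshot-sampling.py` and may differ): each new sample is the log message with the largest minimum token-set (Jaccard) distance to the logs already selected.

```python
import random

def token_set(log: str) -> set:
    """Tokenise a log message into a set of lowercase whitespace-split tokens."""
    return set(log.lower().split())

def adaptive_random_sample(logs, k, seed=0):
    """Greedily pick k diverse logs: start from a random log, then repeatedly
    add the log whose minimum Jaccard distance to the selected set is largest."""
    rng = random.Random(seed)
    selected = [rng.choice(logs)]
    while len(selected) < k:
        def min_dist(log):
            ts = token_set(log)
            return min(
                1 - len(ts & token_set(s)) / max(len(ts | token_set(s)), 1)
                for s in selected
            )
        selected.append(max(logs, key=min_dist))
    return selected
```

This greedy farthest-point strategy favours structurally different log messages, which is the intuition behind selecting a "small yet diverse" training set.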
- Python 3.8
- torch
- transformers
- ...
To install all required libraries:
$ pip install -r requirements.txt
To download the pre-trained language model:
$ cd pretrained_models/roberta-base
$ bash download.sh
To run the few-shot data sampling:
$ python fewshot-sampling.py
To train LogPPT and parse logs (e.g., the Apache dataset with 32 labelled samples):
dataset=Apache
shot=32
trf="datasets/${dataset}/${shot}shot/1.json"
tef="datasets/${dataset}/test.json"
python train.py --mode prompt-tuning --train_file ${trf} \
--validation_file ${tef} \
--model_name_or_path "./pretrained_models/roberta-base" \
--per_device_train_batch_size 8 \
--learning_rate 5e-5 \
--lr_scheduler_type polynomial \
--task_name log-parsing \
--num_warmup_steps 20 \
--max_train_steps 200 \
--log_file datasets/${dataset}/${dataset}_2k.log_structured.csv \
--shot $shot \
--dataset_name ${dataset} \
--task_output_dir "outputs"
The parsed logs (parsing results) are saved in the outputs folder.
For the descriptions of all parameters, please use:
python train.py --help
To calculate the evaluation metrics on the parsing results:
$ python benchmark.py
Tools | References |
---|---|
AEL | [QSIC'08] Abstracting Execution Logs to Execution Events for Enterprise Applications, by Zhen Ming Jiang, Ahmed E. Hassan, Parminder Flora, Gilbert Hamann. [JSME'08] An Automated Approach for Abstracting Execution Logs to Execution Events, by Zhen Ming Jiang, Ahmed E. Hassan, Gilbert Hamann, Parminder Flora. |
LenMa | [CNSM'15] Length Matters: Clustering System Log Messages using Length of Words, by Keiichi Shima. |
Spell | [ICDM'16] Spell: Streaming Parsing of System Event Logs, by Min Du, Feifei Li. |
Drain | [ICWS'17] Drain: An Online Log Parsing Approach with Fixed Depth Tree, by Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu. |
Logram | [TSE'20] Logram: Efficient Log Parsing Using n-Gram Dictionaries, by Hetong Dai, Heng Li, Che-Shao Chen, Weiyi Shang, and Tse-Hsun Chen. |
\* Implementations of the baselines are adopted from *Tools and Benchmarks for Automated Log Parsing* and *Guidelines for Assessing the Accuracy of Log Message Template Identification Techniques*.
- Accuracy:
- Robustness:
  - Robustness across different log data types
  - Robustness across different numbers of training data
- Accuracy on Unseen Logs:
- Efficiency:
  - Running time of different log parsers under different log volumes
- We exclude the Virtual Label Token Generation module and let the pre-trained model automatically assign the embedding for the virtual label token “I-PAR”. To measure the contribution of the Adaptive Random Sampling module, we remove it from our model and randomly sample the log messages for labelling.
- We vary the number of label words from 1 to 16 used in the Virtual Label Token Generation module.
Results with different numbers of label words
We compare LogPPT with fine-tuning, hard-prompt, and soft-prompt.
- Effectiveness:
Accuracy across different tuning methods
- Efficiency:
Parsing time across different tuning methods
Additional results with PTA and RTA metrics
- PTA: The ratio of correctly identified templates over the total number of identified templates.
- RTA: The ratio of correctly identified templates over the total number of oracle templates.
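The two metrics above can be computed as in the following simplified sketch (illustrative only, not the repository's evaluation code), where a template counts as correctly identified when its string exactly matches an oracle template:

```python
def pta_rta(identified, oracle):
    """Simplified PTA/RTA: a template is correct iff it exactly
    matches an oracle (ground-truth) template string."""
    identified, oracle = set(identified), set(oracle)
    correct = identified & oracle
    pta = len(correct) / len(identified)  # precision over identified templates
    rta = len(correct) / len(oracle)      # recall over oracle templates
    return pta, rta
```

PTA thus penalises spurious templates, while RTA penalises missed oracle templates; together they behave like precision and recall at the template level.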
Raw logs | Events |
---|---|
TEST 9/13884 [2/2 concurrent test workers running] | TEST <*> [<*> concurrent test workers running] |
(1.039 s) Test touch() function : basic functionality [ext/standard/tests/file/touch_basic.phpt] | <*> Test touch() function : basic functionality <*> |
(120.099 s) Bug #60120 (proc_open hangs when data in stdin/out/err is getting larger or equal to 2048) [ext/standard/tests/file/bug60120.phpt] | <*> Bug <*> (proc_open hangs when data in <*> is getting larger or equal to <*>) <*> |
SKIP Bug #54977 UTF-8 files and folder are not shown [ext/standard/tests/file/windows_mb_path/bug54977.phpt] reason: windows only test | SKIP Bug <*> UTF-8 files and folder are not shown <*> reason: windows only test |
Exts skipped : 17 | Exts skipped : <*> |
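In the events above, each variable part of a raw log is replaced by the placeholder `<*>`. As a small illustration (not part of LogPPT's code; the function name is hypothetical), such a template can be turned into a regular expression to check whether a raw log message matches it:

```python
import re

def template_to_regex(template: str) -> "re.Pattern":
    """Escape the literal parts of a template and let each <*>
    match an arbitrary (non-greedy) run of characters."""
    parts = [re.escape(p) for p in template.split("<*>")]
    return re.compile("^" + ".*?".join(parts) + "$")

# A template from the table matches its corresponding raw log:
pattern = template_to_regex("Exts skipped : <*>")
assert pattern.match("Exts skipped : 17")
```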