HPC is an open dataset of logs collected from System 20 of the high-performance computing (HPC) cluster at Los Alamos National Laboratory. However, the original download link (http://institutes.lanl.gov/data/fdata/) is no longer available. The logs have been used to benchmark log parsing methods in the following papers, which provide more details on how the dataset is used.
The raw logs are available for download at https://github.com/logpai/loghub.
If you use this dataset from loghub in your research, please cite the following papers.
- Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios. Clustering Event Logs Using Iterative Partitioning, in Proc. of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2009.
- Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. An Evaluation Study on Log Parsing and Its Use in Log Mining, in Proc. of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2016.
- Jieming Zhu, Shilin He, Pinjia He, Jinyang Liu, Michael R. Lyu. Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics, in Proc. of IEEE International Symposium on Software Reliability Engineering (ISSRE), 2023.