Semantic-aware intelligent log analytics

Le, Van-Hoang

Title: Semantic-aware intelligent log analytics
Creator: Le, Van-Hoang
Relation: University of Newcastle Research Higher Degree Thesis
Resource Type: thesis
Date: 2024
Description: Research Doctorate - Doctor of Philosophy (PhD)
Description: Large-scale software-intensive systems often produce a large volume of logs to record runtime status and events for troubleshooting purposes. Logs play an important role in the maintenance and operation of software systems, allowing engineers to better understand a system's behaviour and diagnose problems. The rich information included in log data enables a variety of software reliability management tasks, such as anomaly detection, root cause analysis, and failure prediction. As the scale and complexity of software systems increase, traditional log analytics approaches become time-consuming and error-prone due to the rapid growth of log data volume and the complexity of log data semantics. In this thesis, we propose intelligent approaches for semantic-aware log analytics to effectively utilize log data in software reliability management. Firstly, we conduct an empirical study on log-based anomaly detection with deep learning. Log-based anomaly detection plays a vital role in software reliability management. Recently, many deep learning models have been proposed to automatically detect system anomalies based on log data, and they achieve high detection accuracy. To understand how far we are from solving the problem of log-based anomaly detection, we conduct an in-depth analysis of five state-of-the-art deep learning-based models for detecting system anomalies. We obtain five insightful findings and make these methods open-source for easy reuse and further study. Secondly, we propose a novel deep learning-based approach that detects system anomalies from raw log messages. Existing anomaly detection approaches require converting raw log messages into structured data, which can be error-prone due to semantic misunderstanding of log data. To tackle this challenge, we propose NeuralLog to extract the semantic meaning of raw log messages and represent them as semantic vectors.
These representation vectors are then used to detect anomalies through a Transformer-based classification model, which can capture the contextual information from log sequences. Experimental results on four real-world datasets confirm the effectiveness of our proposed method. Thirdly, we propose a semantic-aware log parsing method powered by prompt-based few-shot learning. Log parsing, which extracts log templates associated with dynamic parameters, is considered the first step of many log-based reliability management methods. Existing log parsing methods use statistical features to extract the common part of log messages as templates. They often fail to identify the correct templates and parameters because they overlook the semantic meaning of log messages and require domain-specific knowledge for different log datasets. To address the limitations of existing methods, we propose LogPPT to capture the semantic information of log messages and identify log events and parameters based on a few labelled log data. Experimental results on 16 real-world datasets show that LogPPT is effective and efficient for log parsing. Fourthly, we propose to pre-train a language model with semantic awareness using heterogeneous log data to unify many log analytics tasks into a single framework. Existing approaches for intelligent log analytics are specifically designed for a certain type of task and cannot generalise to other tasks. Therefore, we propose PreLog, a pre-trained model with contrastive learning. PreLog is pre-trained on a large amount of log data with two log-specific objectives and is generalised to downstream tasks. Extensive experimental results show that PreLog achieves better or comparable results in comparison with state-of-the-art, task-specific methods. Finally, we explore the application of ChatGPT, the current cutting-edge large language model (LLM), to perform log parsing without model training.
Experimental results show that ChatGPT can achieve good results for log parsing with appropriate prompts, especially with few-shot prompting. Our findings indicate that applying LLMs to log analytics is a promising direction. We outline several challenges and opportunities for LLM-based log analytics and discuss potential future work. In summary, this thesis targets the design of semantic-aware approaches toward intelligent log analytics. Comprehensive experiments on public datasets demonstrate the effectiveness of our proposed methods.
Subject: intelligent log analytics; semantic-aware; software systems; log data
Identifier: http://hdl.handle.net/1959.13/1512957
Identifier: uon:56685
Language: eng
Full Text

File	Description	Size	Format
ATTACHMENT01	Thesis	5 MB	Adobe Acrobat PDF
ATTACHMENT02	Abstract	144 KB	Adobe Acrobat PDF