Anomaly detection

Different subscriber behaviours, hardware setups, and software configurations in an Internet service product generate a wide variety of patterns. A development team validates the product's functionality through a strict evaluation process based on test pass rates, trouble reports, and key performance indicators. However, absence of evidence is not evidence of absence. It is hard to detect abnormal behaviour when previously unseen cases deviate from normal behaviour. Such cases may be caused by unseen user behaviours, hardware malfunctions, or software defects. We call such a deviation an anomaly.

Anomaly detection refers to the problem of finding patterns in data that do not conform to the behaviour expected of the product. For example, we can identify unusual peaks and drops in traffic that differ from the normal pattern. Since it is hard to enumerate every possible abnormal behaviour in a rule-based system, companies commonly apply machine learning techniques to detect anomalies. From the perspective of computer system performance monitoring, it is important to detect anomalies quickly and automatically.

Note that anomaly detection is different from intrusion detection. While both aim to detect significant changes, intrusion detection aims to detect policy violations rather than possible product defects.

Contextual anomaly

Potential anomalies in a continuous sequence can be divided into three types.

  • A point anomaly, also known as an outlier, occurs when a single data point is anomalous
  • A collective anomaly occurs when a group of data points is anomalous even though each data point may not be an anomaly by itself
  • A contextual point anomaly or contextual collective anomaly occurs when a single data point or a group of data points is anomalous with regard to its context
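
The difference between a point anomaly and a contextual anomaly can be illustrated with a small sketch. The traffic readings, the day/night context, and the z-score threshold below are all made up for illustration; a z-score is used here only as a simple stand-in for a real detector.

```python
from statistics import mean, stdev

# Hypothetical traffic readings as (context, value) pairs.
history = [
    ("day", 100), ("day", 104), ("day", 98), ("day", 102), ("day", 96),
    ("night", 20), ("night", 22), ("night", 18), ("night", 21), ("night", 19),
]

def z_score(value, reference):
    """Distance of value from the reference mean, in standard deviations."""
    return (value - mean(reference)) / stdev(reference)

def global_z(value):
    # Ignores context: compares against every reading.
    return z_score(value, [v for _, v in history])

def contextual_z(context, value):
    # Uses context: compares only against readings from the same context.
    return z_score(value, [v for c, v in history if c == context])

THRESHOLD = 3.0  # assumed cut-off, chosen only for this example

new_context, new_value = "night", 60
is_point_anomaly = abs(global_z(new_value)) > THRESHOLD
is_contextual_anomaly = abs(contextual_z(new_context, new_value)) > THRESHOLD

print(is_point_anomaly, is_contextual_anomaly)
```

A reading of 60 sits exactly at the overall mean, so it does not stand out as a point anomaly, yet it is far outside what has been observed at night.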

While it is relatively easy to discover a point anomaly, it is much harder to detect a contextual anomaly, since we need to take the context into account in order to explain whether a behaviour is normal or abnormal. We therefore cannot simply judge a time-series plot by its shape when more than one contextual attribute is needed to make the judgement. A contextual anomaly detection algorithm must consider all the important variables that may explain suspicious behaviours.

Here are the more detailed reasons why detecting a contextual anomaly is hard:

  • It is hard to build a generic algorithm for contextual anomalies. Since the definition of abnormal behaviour differs from company to company, many anomaly detection algorithms have been developed for specific applications to meet each company’s requirements.
  • The training data for an anomaly detection system are highly imbalanced, since only a few, diverse anomalies are observed against a much larger set of normal cases. This makes it hard to detect an error when an unseen behaviour presents itself.
  • A metric that is too sensitive will yield many false alarms. Similarly, a metric that is not sensitive enough will yield many missed detections.
  • A behaviour that is considered normal in one situation can be abnormal in another. Also, there may be more than one normal behaviour depending on the domain, and each behaviour can be quite different depending on the context.
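
The sensitivity trade-off in the list above can be made concrete with a minimal sketch. The anomaly scores and their labels are hypothetical; the point is only that moving a threshold trades false alarms for missed detections.

```python
# Hypothetical (score, is_real_anomaly) pairs produced by some detector.
scored = [
    (0.5, False), (1.2, False), (2.1, False), (2.8, True),
    (3.5, False), (4.0, True), (5.2, True),
]

def count_errors(threshold):
    """Return (false_alarms, missed_detections) for a given threshold."""
    false_alarms = sum(1 for s, real in scored if s > threshold and not real)
    missed = sum(1 for s, real in scored if s <= threshold and real)
    return false_alarms, missed

# A low threshold is very sensitive, a high threshold is not.
for threshold in (1.0, 3.0, 4.5):
    false_alarms, missed = count_errors(threshold)
    print(f"threshold={threshold}: {false_alarms} false alarms, {missed} missed")
```

With these made-up scores no threshold removes both kinds of error at once, which is why the choice of sensitivity has to be tuned per application.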

Bayesian approach

The general idea in Bayesian anomaly detection is to build a probabilistic model over normal cases and to compare new samples with the trained model as they arrive. Samples that have a small probability of being generated by the model are considered anomalies; that is, they are very unlikely to belong to the set of normal cases. If the number of potential indicators becomes large, it becomes hard to identify the best set of indicators. In the worst case, the number of regressors can be larger than the number of training samples because of data sparsity.
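
As a rough illustration of this idea, the sketch below fits a single Gaussian to a handful of normal cases and flags new samples whose two-sided tail probability under that model is small. This is a deliberate simplification (a plug-in Gaussian rather than a full posterior predictive distribution), and the data and the `alpha` cut-off are assumptions made for the example.

```python
from statistics import NormalDist

# Hypothetical measurements collected while the system behaved normally.
normal_cases = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.0]

# Fit a Gaussian to the normal cases (sample mean and sample std. deviation).
model = NormalDist.from_samples(normal_cases)

def tail_probability(x):
    """Two-sided probability of a value at least as extreme as x under the model."""
    cdf = model.cdf(x)
    return 2 * min(cdf, 1 - cdf)

def is_anomaly(x, alpha=0.01):
    # alpha is an assumed significance level, not a recommended default.
    return tail_probability(x) < alpha

print(is_anomaly(10.1), is_anomaly(12.0))
```

A full Bayesian treatment would also carry the uncertainty about the model parameters into the prediction, which matters most when there are few normal cases.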

cbar creates a probabilistic model that finds the best indicators to define the context and identifies potential anomalies based on that context. The library depends on bsts and Boom, both developed by Steven L. Scott.

  • It utilises the Bayesian approach to automatically find the best indicators among all candidates, which reduces the dimensionality of the raw data.
  • It learns a posterior predictive distribution from the indicators, which shows the uncertainty of a point estimate of normality, in order to find anomalies with regard to their context.
  • As an approximate solution, the proposed algorithm selects the best \(k\) indicators, ranked by inclusion probability, and uses them to form a posterior predictive distribution.
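
The selection step in the last bullet can be sketched as follows. This is not cbar's implementation; it only illustrates the general notion of an inclusion probability as the fraction of posterior draws in which an indicator's coefficient is non-zero. The draws, the indicator names, and the helper functions are all hypothetical.

```python
# Hypothetical posterior draws of regression coefficients.
# A coefficient of exactly 0.0 means the indicator was excluded in that draw.
draws = [
    {"users": 0.8, "cpu": 0.0, "temp": 0.0},
    {"users": 0.9, "cpu": 0.1, "temp": 0.0},
    {"users": 0.7, "cpu": 0.0, "temp": 0.0},
    {"users": 0.8, "cpu": 0.2, "temp": 0.1},
]

def inclusion_probability(draws):
    """Fraction of draws in which each indicator has a non-zero coefficient."""
    names = draws[0].keys()
    return {n: sum(d[n] != 0.0 for d in draws) / len(draws) for n in names}

def top_indicators(draws, k):
    """Names of the k indicators with the highest inclusion probability."""
    probs = inclusion_probability(draws)
    return sorted(probs, key=probs.get, reverse=True)[:k]

print(inclusion_probability(draws))
print(top_indicators(draws, k=2))
```

Keeping only the highest-ranked indicators is what reduces the dimensionality before the predictive distribution is formed.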