|
Modeling Features
Modeling is the process of condensing raw data into a concise model. SensorMINER uses several algorithms to do this, including rule induction, path modeling, and box modeling. Each method has its own advantages and disadvantages and produces its own kind of output. Regardless of the method used, the common goal is to build the files required by the anomaly detection engine.
Gecko Modeling
This type of modeling is based on the observation that the normal operation of a device can be characterized by a series of distinct temporal states. Gecko uses a clustering algorithm that can determine a reasonable number of clusters. A rule induction algorithm then describes these states as logical rules. Finally, transitional logic between the states is added to create a finite state automaton. Our empirical results, on data obtained from the NASA shuttle program, indicate that the Gecko clustering algorithm is comparable to a human expert in identifying states and that the overall system can track normal behavior and detect anomalies.
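The final step, tracking a device through a finite state automaton built from per-state rules, can be illustrated with a minimal sketch. It assumes the states occur in a fixed order and that each state is described by a single rule; the function name track and the (name, rule) representation are hypothetical and are not taken from SensorMINER or Gecko.

```python
def track(states, samples):
    """Follow samples through an ordered list of states.

    states  -- list of (name, rule) pairs in expected temporal order,
               where rule(sample) returns True if the sample fits the state
               (hypothetical representation, for illustration only)
    samples -- sequence of observations
    Returns the indices of samples that fit neither the current state
    nor the next one.
    """
    current = 0
    anomalies = []
    for i, sample in enumerate(samples):
        if states[current][1](sample):
            continue  # still in the current state
        nxt = current + 1
        if nxt < len(states) and states[nxt][1](sample):
            current = nxt  # legal transition to the following state
        else:
            anomalies.append(i)  # no rule explains this sample
    return anomalies
```

For example, with three states defined by value thresholds ("idle" below 1.0, "ramp" from 1.0 up to 5.0, "steady" at 5.0 and above), a trace that jumps straight from idle to a steady-level value is flagged at the jump, because no transition from idle to steady is allowed in this sketch.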
Path Modeling
In path modeling, a series is modeled by the path traced in 3-D space by the data and either its first two derivatives or two low-pass filtered copies of the data, scaled to fit a unit cube. Each point on a test trace has an error defined as the square of the Euclidean distance to the nearest point on the modeled path. Path models can be parameterized (as a set of points defining a piecewise linear or spline approximation) and extend straightforwardly to multiple training cycles and multidimensional data.
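The derivative-based variant can be sketched as follows. This is a simplified illustration, not SensorMINER's implementation: derivatives are approximated by finite differences, the modeled path is represented only by its sampled training points (no piecewise linear or spline fit), and the function names embed, fit_scaler, scale, and point_errors are hypothetical.

```python
def embed(series):
    """Map a 1-D series to 3-D points (value, first difference, second difference)."""
    points = []
    for i in range(2, len(series)):
        x = series[i]
        dx = series[i] - series[i - 1]
        ddx = series[i] - 2 * series[i - 1] + series[i - 2]
        points.append((x, dx, ddx))
    return points


def fit_scaler(points):
    """Return per-axis minima and maxima of the training points."""
    lo = [min(p[k] for p in points) for k in range(3)]
    hi = [max(p[k] for p in points) for k in range(3)]
    return lo, hi


def scale(points, lo, hi):
    """Scale points so that the training data fits the unit cube."""
    return [
        tuple((p[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
              for k in range(3))
        for p in points
    ]


def point_errors(train, test):
    """Squared Euclidean distance from each test point to the nearest model point."""
    raw_model = embed(train)
    lo, hi = fit_scaler(raw_model)
    model = scale(raw_model, lo, hi)
    trace = scale(embed(test), lo, hi)  # test data uses the training scale
    return [
        min(sum((a - b) ** 2 for a, b in zip(t, m)) for m in model)
        for t in trace
    ]
```

A test trace identical to the training series gets an error of zero at every point, while a spike in the test trace moves the corresponding points away from the path and yields a positive error. The brute-force nearest-point search is quadratic; it is used here only to keep the sketch short.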
Box Modeling
A multidimensional time series is modeled by approximating its path through state (phase) space with a sequence of boxes. A test point is anomalous if it lies outside every box, with a score that depends on its distance from the nearest box. The method may be generalized to multiple training series by merging paths, with no additional cost in testing time.
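A minimal sketch of the idea for a single training series is shown below. How the boxes are actually chosen is not described here, so the sketch assumes the simplest option: the trajectory is cut into consecutive segments of roughly equal length, and each segment is enclosed in an axis-aligned bounding box. The function names build_boxes, distance_to_box, and anomaly_score are hypothetical.

```python
import math


def build_boxes(points, n_boxes):
    """Cover a trajectory with a sequence of axis-aligned bounding boxes.

    Consecutive segments share one point so that neighboring boxes touch
    (an assumption of this sketch).
    """
    size = math.ceil(len(points) / n_boxes)
    dims = range(len(points[0]))
    boxes = []
    for start in range(0, len(points), size):
        chunk = points[start:start + size + 1]
        lo = tuple(min(p[d] for p in chunk) for d in dims)
        hi = tuple(max(p[d] for p in chunk) for d in dims)
        boxes.append((lo, hi))
    return boxes


def distance_to_box(point, box):
    """Euclidean distance from a point to a box (0.0 if the point is inside)."""
    lo, hi = box
    return math.sqrt(sum(
        max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(point, lo, hi)
    ))


def anomaly_score(point, boxes):
    """Distance to the nearest box; a score of 0.0 means the point is not anomalous."""
    return min(distance_to_box(point, b) for b in boxes)
```

With the training points (0, 0), (1, 1), (2, 2), (3, 3) and two boxes, a test point such as (0.5, 0.5) falls inside the first box and scores 0.0, while (5, 3) lies 2.0 away from the nearest box. Testing time depends only on the number of boxes, not on the number of training points.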
|