Anomaly Detection Techniques

With all these use cases in mind, one main question remains: how do we actually detect the anomalies?

We could detect anomalies manually, since our brains are remarkably good at spotting them, but that is rarely the best approach: we cannot keep eyeballs on the charts all the time.

Automated alerting reduces that human dependency and notifies users when anomalies occur, but it comes with drawbacks of its own: false negatives and false positives.

This brings us to a considerably more robust approach to anomaly detection: machine learning.

Simply rebalancing the imbalanced data will not always serve the purpose of anomaly detection, so we need dedicated methods.

I am choosing five algorithms from multiple categories (Linear, Proximity-Based, Probabilistic, and Outlier Ensembles) to explain how they perform anomaly detection.

The Angle-Based Outlier Detection (ABOD) algorithm is based on the variance of the angles between the difference vectors of data objects in the dataset. As a consequence, the effects of the 'curse of dimensionality' are mitigated compared to purely distance-based methods.

In the diagram above, the angle between the vectors PX and PY at the outlying point P is noticeably smaller than the corresponding angles at points Q and R, which sit inside the cluster. More generally, the angles that a far-away point forms with pairs of other data points vary less than the angles formed at a point surrounded by neighbors, so a data point with low angle variance is treated as an outlier. Angles are also more stable than distances in high-dimensional spaces. In practice, each angle is weighted by the distance to the points involved, so distance still contributes to the score.
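To make this concrete, here is a minimal sketch using the ABOD implementation from PyOD (the library introduced later in this post) on hypothetical synthetic data; the parameter values are illustrative assumptions, not the post's original code:

```python
import numpy as np
from pyod.models.abod import ABOD

# Hypothetical 2-D data: a dense cluster plus five obvious outliers.
rng = np.random.RandomState(42)
X = np.vstack([rng.randn(200, 2),             # inliers around the origin
               rng.uniform(6, 8, (5, 2))])    # far-away points

# 'fast' ABOD approximates the angle variance using each point's
# n_neighbors nearest neighbors instead of all pairs of points.
clf = ABOD(contamination=0.024, n_neighbors=10, method='fast')
clf.fit(X)

print(clf.labels_[-5:])           # 1 = outlier; the injected points should be flagged
print(clf.decision_scores_[-5:])  # PyOD convention: higher score = more abnormal
```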

KNN is a supervised ML algorithm often used in data science for classification problems (and sometimes for regression). It is one of the most basic and widely used algorithms, with strong use cases.

The underlying principle is that similar observations lie close to one another, while outliers are usually solitary observations sitting farther away from the cluster of similar observations.

Although KNN is a supervised ML algorithm, when it comes to anomaly detection it takes a completely unsupervised approach, based purely on a distance threshold.
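A minimal sketch of KNN-based detection with PyOD, on assumed synthetic data (the two injected points and all parameter choices are hypothetical):

```python
import numpy as np
from pyod.models.knn import KNN

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(300, 2),
               [[8.0, 8.0], [-7.0, 9.0]]])    # two injected outliers

# method='largest' scores each point by the distance to its k-th nearest
# neighbor; 'mean' and 'median' aggregate over all k neighbors instead.
clf = KNN(n_neighbors=5, method='largest', contamination=0.01)
clf.fit(X)

print(np.where(clf.labels_ == 1)[0])  # flagged indices; should include 300 and 301
```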

Isolation Forests are, like every other tree ensemble method, built from decision trees. Partitions are generated in these trees by first choosing a feature at random and then selecting a random split value between the minimum and maximum value of the chosen feature.

To construct a branch of the tree, a random feature is chosen first. Next, a random split value (between the feature's minimum and maximum) is chosen. If a sample's value for that feature is lower than the split value, it follows the left branch; otherwise it follows the right branch. This process continues until every point is isolated or a defined maximum depth is reached.
Outliers are essentially less frequent and more distinct in their values than normal observations (they lie farther away from the regular observations in feature space).
This is why such random partitioning places them closer to the root of the tree (a shorter average path length).
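Here is a rough sketch of this idea with PyOD's IForest (a wrapper around scikit-learn's Isolation Forest); the data and parameters are assumptions for illustration:

```python
import numpy as np
from pyod.models.iforest import IForest

rng = np.random.RandomState(1)
X = np.vstack([rng.randn(500, 2), [[10.0, 10.0]]])  # one isolated point

# Each of the 100 trees partitions the data with random feature/split
# choices; points with a short average path length score as outliers.
clf = IForest(n_estimators=100, contamination=0.002, random_state=1)
clf.fit(X)

print(clf.labels_[-1])           # expect 1: the isolated point is flagged
print(clf.decision_scores_[-1])  # higher score = shorter average path
```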

The Histogram-Based Outlier Score (HBOS) assumes feature independence and measures the degree of deviation by constructing histograms. For multivariate anomaly detection, a histogram is computed for each feature individually, scored separately, and the scores are combined at the end.
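A short sketch with PyOD's HBOS on hypothetical data (the bin count and contamination value are illustrative):

```python
import numpy as np
from pyod.models.hbos import HBOS

rng = np.random.RandomState(7)
X = np.vstack([rng.randn(400, 3), [[6.0, -6.0, 6.0]]])  # one rare point

# One histogram per feature (n_bins bins each); a point landing in
# low-density bins across features receives a high outlier score.
clf = HBOS(n_bins=10, contamination=0.0025)
clf.fit(X)

print(clf.labels_[-1])  # expect 1 for the injected point
```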

A One-Class Support Vector Machine is an unsupervised learning algorithm trained only on 'normal' data, the negative examples in our case. It learns the boundary of these points and is therefore able to classify points beyond that boundary as, you guessed it, outliers.
Training an unsupervised learning algorithm is difficult, and the One-Class SVM is no exception. The nu parameter approximates the proportion of outliers you expect to see, while the gamma parameter controls how smoothly the decision boundary's contour lines follow the data.
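A sketch of the One-Class SVM via PyOD's OCSVM wrapper, showing nu and gamma in action; the data and parameter values are assumptions:

```python
import numpy as np
from pyod.models.ocsvm import OCSVM

rng = np.random.RandomState(3)
X_train = rng.randn(300, 2)                   # mostly 'normal' points
X_test = np.array([[0.1, -0.2], [5.0, 5.0]])  # one normal point, one outlier

# nu approximates the expected fraction of outliers; a larger gamma makes
# the RBF boundary hug the training points more tightly.
clf = OCSVM(kernel='rbf', nu=0.05, gamma='auto')
clf.fit(X_train)

print(clf.predict(X_test))  # expect [0 1]: the second point lies outside the boundary
```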

I am using the amazing PyOD library for anomaly detection. It is a comprehensive and scalable Python toolkit for outlier detection in multivariate data. All credit goes to Zhao, Y., Nasrullah, Z., and Li, Z. (2019). PyOD: A Python Toolbox for Scalable Outlier Detection. Journal of Machine Learning Research (JMLR), 20(96), pp. 1–7.

After installing and updating PyOD, it's time to import the required packages and modules.
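The exact setup isn't shown here, but it looks roughly like the following (the imports match the five detectors discussed above):

```python
# Install or upgrade PyOD first:
#   pip install --upgrade pyod

import numpy as np
from pyod.models.abod import ABOD
from pyod.models.knn import KNN
from pyod.models.iforest import IForest
from pyod.models.hbos import HBOS
from pyod.models.ocsvm import OCSVM
```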

Next, I set up the five chosen algorithms with some default parameters.
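The original configuration isn't reproduced here; a common arrangement, following PyOD's benchmark examples, keeps the detectors in a dictionary. The contamination value below is an assumption:

```python
contamination = 0.05  # assumed fraction of outliers; tune for your data

classifiers = {
    'Angle-based Outlier Detector (ABOD)': ABOD(contamination=contamination),
    'K Nearest Neighbors (KNN)': KNN(contamination=contamination),
    'Isolation Forest': IForest(contamination=contamination),
    'Histogram-based Outlier Score (HBOS)': HBOS(contamination=contamination),
    'One-Class SVM (OCSVM)': OCSVM(contamination=contamination),
}
```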

Now moving on to the model implementation.
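As a stand-in for the original implementation, a minimal fit-and-score loop over those detectors might look like this (X is a hypothetical feature matrix):

```python
# Continuing from the imports and `classifiers` dict above,
# with a toy dataset standing in for the post's actual data.
rng = np.random.RandomState(42)
X = np.vstack([rng.randn(500, 2), rng.uniform(5, 7, (25, 2))])

for name, clf in classifiers.items():
    clf.fit(X)
    n_outliers = int(clf.labels_.sum())  # labels_: 0 = inlier, 1 = outlier
    print(f'{name}: flagged {n_outliers} of {len(X)} points')
```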

Ohhhh!

readers are like 👆

I will consider it in my future work. Thank you for sticking around.

