Ask Implementing Anomaly Detection with PyOD

Etsytut04 · Nov 25, 2023

## Implementing Anomaly Detection with PyOD

Anomaly detection is the task of identifying data points that differ significantly from the rest of the data. It is useful for finding outliers, fraud, and other unusual events. PyOD is a Python library for outlier detection, aimed mainly at multivariate data. It provides a wide range of algorithms, most of them unsupervised, behind a common `fit`/`predict` interface.

In this tutorial, we will show you how to use PyOD to implement anomaly detection on a real-world dataset. We will use the [KDD Cup 99 dataset](https://www.kdd.org/kdd-cup/1999/), which contains network connection records labelled as normal traffic or as one of several attack types. We will train a PyOD model on a training split of the data and then use it to detect anomalies in the held-out test split.

### 1. Installing PyOD

The first step is to install PyOD. You can do this using pip:

```
pip install pyod
```

### 2. Loading the data

The next step is to load the data. The code below assumes the [KDD Cup 99 dataset](https://www.kdd.org/kdd-cup/1999/) has been saved locally as `kddcup99.csv`. We load it into a pandas DataFrame:

```
import pandas as pd

df = pd.read_csv("kddcup99.csv")
```
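KDD Cup 99 mixes numeric features with categorical ones such as `protocol_type`, `service`, and `flag`, and each record carries a label. Before modelling, it helps to check which columns are numeric. The sketch below uses a tiny hand-made frame with a few KDD-style column names as a stand-in; the values are invented for illustration, and the `label` column name is an assumption about how your CSV is laid out.

```python
import pandas as pd

# Hypothetical stand-in for a few KDD Cup 99 rows (values are made up).
sample = pd.DataFrame({
    "duration": [0, 0, 12],
    "protocol_type": ["tcp", "udp", "tcp"],
    "service": ["http", "domain_u", "ftp"],
    "src_bytes": [181, 45, 3000],
    "dst_bytes": [5450, 44, 0],
    "label": ["normal.", "normal.", "neptune."],  # assumed column name
})

# Numeric columns can go straight into a detector; text columns cannot.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("non-numeric:", text_cols)
```

Running the same two `select_dtypes` calls on `df` shows which columns would need encoding or dropping before a detector can use them.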

### 3. Training the model

The next step is to train a PyOD model. We will use the [One-Class SVM](https://scikit-learn.org/stable/modules/svm.html#one-class-svm) algorithm, which learns a boundary around the bulk of the training data and treats points outside it as anomalies. PyOD wraps the scikit-learn implementation as the `OCSVM` class in `pyod.models.ocsvm`.

To train the model, we first need to split the data into a training set and a test set. `train_test_split` shuffles the rows by default, so we get a random 80% of the data as the training set and the remaining 20% as the test set.

```
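from sklearn.model_selection import train_test_split

# One-Class SVM needs numeric input, so keep only the numeric columns
X = df.select_dtypes(include="number")
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)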
```
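One-Class SVM with an RBF kernel is distance-based, so features on very different scales (byte counts versus rates between 0 and 1, for example) can dominate the result. A common remedy, which is not part of the original post, is to standardize the features using statistics from the training split only. This sketch uses synthetic data in place of the KDD columns:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: one large-scale feature and one small-scale feature.
X = np.column_stack([
    rng.normal(5000, 2000, size=500),   # e.g. a byte count
    rng.uniform(0, 1, size=500),        # e.g. a rate
])

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(3), X_train_scaled.std(axis=0).round(3))
```

Fitting the scaler on the training split alone keeps information from the test split out of the model.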

We can now train the model on the training set:

```
from pyod.models.ocsvm import OCSVM

# PyOD wraps scikit-learn's One-Class SVM as OCSVM
model = OCSVM()
model.fit(X_train)
```

### 4. Detecting anomalies

The next step is to detect anomalies in the test data. We can do this using the `predict()` method of the model:

```
y_pred = model.predict(X_test)
```

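The `y_pred` array contains the predicted labels for the test data: 0 for normal data and 1 for anomalous data. The cut-off between the two is governed by the detector's `contamination` parameter, the expected fraction of outliers, which defaults to 0.1 in PyOD.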

### 5. Visualizing the results

We can visualize the results using a [scatter plot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html). In the scatter plot, the x-axis represents the first feature of the data and the y-axis represents the second feature. The points are colored red if they are anomalous and blue if they are normal.

```
import matplotlib.pyplot as plt

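# X_test is a DataFrame, so select columns by position with .iloc
# coolwarm maps label 0 (normal) to blue and label 1 (anomalous) to red
plt.scatter(X_test.iloc[:, 0], X_test.iloc[:, 1], c=y_pred, cmap="coolwarm")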
plt.show()
```

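If the flagged points sit apart from the bulk of the data in this plot, the model is at least picking up points that are unusual in these two features. A two-feature view cannot confirm detection quality on its own, though. For that, compare the predictions against the labels that come with the dataset.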

### 6. Conclusion

In this tutorial, we showed you how to use PyOD to implement anomaly detection on a real-world dataset. We loaded the [KDD Cup 99 dataset](https://www.kdd.org/kdd-cup/1999/), trained a One-Class SVM model on a training split, and used the model to flag anomalies in the test split. The scatter plot gives a first impression of what the model flags; checking the predictions against the dataset's labels is the way to confirm how well it detects real attacks.


Tiktok911fFamily · Jun 30, 2024

Given a dataset, how can I implement anomaly detection with PyOD?
