
[ML] k-NEAREST NEIGHBORS (k-NN Algorithm)

by 유일리 2022. 11. 15.
What is the k-nearest neighbors algorithm?

 

In pattern recognition, the k-nearest neighbors algorithm (k-NN for short) is a non-parametric method used for classification and regression. As in the figure below, if k is 3, the purple diamond (the new data point) looks at its 3 nearest neighbors (1 star, 2 circles) and is classified as class B. If k were 7, the neighbors would be 4 stars and 3 circles, so it would be classified as class A.
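To make the voting idea concrete, below is a minimal from-scratch sketch of the majority vote (the toy points and labels are made up for illustration, not taken from the figure); with this data, k=3 and k=7 give different answers, just like the example above.

import numpy as np
from collections import Counter

def knn_predict(X, y, query, k):
    # Euclidean distance from the query point to every training point.
    distances = np.linalg.norm(X - query, axis=1)
    # Take the labels of the k closest points and return the majority vote.
    nearest_labels = y[np.argsort(distances)[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy 2-D data: class 'A' (stars) and class 'B' (circles).
X = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [7, 7], [3, 3], [3, 4]])
y = np.array(['A', 'A', 'A', 'A', 'B', 'B', 'B'])
query = np.array([3.5, 3.5])

print(knn_predict(X, y, query, k=3))  # 'B' -- 2 of the 3 nearest are class B
print(knn_predict(X, y, query, k=7))  # 'A' -- 4 of the 7 points are class A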

k-Nearest Neighbors Example

We will practice using k-nearest neighbors to predict the outcome of a user clicking on an online advertisement based on the class of nearby data points.

Dataset: advertising.csv (0.10 MB)

1-2. Import Libraries / Dataset

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

df = pd.read_csv('/content/advertising.csv')

3. Remove Variables

  • We remove the discrete and non-numeric variables from the dataframe, including Ad Topic Line, Timestamp, Male, Country, and City.
  • k-NN generally works best with continuous variables such as Age and Area Income.
del df['Ad Topic Line']
del df['Timestamp'] 
del df['Male'] 
del df['Country'] 
del df['City']

df.head()

4-5. Scale Data / Set X and y Values

  • We use StandardScaler() from Scikit-learn to standardize the independent variables to zero mean and unit variance (while dropping the dependent variable Clicked on Ad).
  • This transformation helps prevent one or more variables with a large range from unfairly dominating the model.
scaler = StandardScaler()
scaler.fit(df.drop('Clicked on Ad',axis=1))
scaled_features = scaler.transform(df.drop('Clicked on Ad',axis=1))

X = scaled_features
y = df['Clicked on Ad']
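As a quick sanity check (this snippet is an addition, not part of the original notebook), we can confirm that each scaled column now has a mean of roughly 0 and a standard deviation of roughly 1:

import numpy as np

# Each standardized column should have mean ~0 and std ~1.
print(np.round(scaled_features.mean(axis=0), 3))
print(np.round(scaled_features.std(axis=0), 3))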

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10, shuffle=True)

6. Set algorithm

  • We assign KNeighborsClassifier as the algorithm, with n_neighbors (k) set to 5, and fit it to the training data. Note that setting k to an odd number helps to eliminate the possibility of a prediction stalemate in the case of a binary prediction, as illustrated below.
model = KNeighborsClassifier(n_neighbors=5)

model.fit(X_train, y_train)
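To see why an even k can stall, here is a tiny illustration (the neighbor labels are hypothetical) of a 2-2 vote that produces no majority:

from collections import Counter

# Hypothetical labels of the 4 nearest neighbors: a 2-2 split with no winner.
neighbor_labels = [0, 0, 1, 1]
print(Counter(neighbor_labels).most_common())  # [(0, 2), (1, 2)] -> tied vote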
 
7. Evaluate

  • We predict the test data and evaluate the results with a confusion matrix and a classification report.
model_predict = model.predict(X_test)

print(confusion_matrix(y_test, model_predict))
print(classification_report(y_test, model_predict))


8. Optimize

  • We can experiment with the number of neighbors chosen in step 6 and attempt to reduce the number of incorrectly predicted outcomes, as sketched after this list.
  • Based on manual trial and error, we can improve the model by opting for 3 neighbors. (An odd k is often a good choice.)
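Rather than relying purely on manual trial and error, we can loop over candidate values of k and compare test-set error rates (this loop is an addition, reusing the train/test split from above):

import numpy as np

# Try odd values of k and record the test-set error rate for each.
for k in range(1, 20, 2):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    error_rate = np.mean(knn.predict(X_test) != y_test)
    print(f'k={k}: error rate={error_rate:.3f}')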

9. Predict

  • We can deploy our optimized model (n_neighbors=3) on the first 10 rows of the scaled_features array to predict the likely outcomes. Since the model above was fit with 5 neighbors, we refit it with 3 first.
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
model.predict(scaled_features)[0:10]

https://github.com/erica00j/machinelearning/blob/main/KNN_advertising.ipynb

 
