[ML] k-Means Clustering / Scree plot

2022.10.13 - [AI/Machine Learning] - [ML] Learning Methods in Machine Learning (Supervised Learning, Unsupervised Learning, Reinforcement Learning)


As covered in the previous post, unsupervised algorithms can be used to clean or reshape the data, for example:

Dimension Reduction Techniques
k-means Clustering

What is k-means clustering?

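k-means clustering is an algorithm that groups the given data into k clusters. It works by minimizing the within-cluster variance, that is, the sum of squared distances between each data point and the center of the cluster it belongs to. It is a form of unsupervised learning: it discovers patterns and structure in data that has no labels. K is the number of clusters (groups) you expect to find in the data set. Means refers to the center of each cluster, which is computed as the mean of the data points assigned to it. (Making each point as close as possible to its cluster center is the goal of the algorithm.)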
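The objective described above can be written out as follows. The symbols are notation assumed here, not taken from the original post: C_j is the set of points assigned to cluster j, \mu_j is that cluster's centroid, and x is a data point.

```latex
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x
```

The second expression is why the method is called k-"means": each centroid is simply the mean of the points currently in its cluster.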

Each cluster starts with a randomly chosen centroid, a point that serves as the center of that cluster. Once data points have been assigned, the centroid coordinates are updated to the mean of the points in the cluster. Examples of potential groupings include animal species, customers with similar features, and housing market segments.

K-Means Clustering Algorithm

Examine the unclustered data and select an initial centroid for each cluster (manually or at random).
In other words, place k arbitrary centroids.
Clusters are formed by calculating the Euclidean distance (the straight-line distance, i.e. the square root of the sum of squared coordinate differences) from each remaining data point to every centroid.
For each data object in the set D, compute its distance to each of the k cluster centroids, find the centroid it is closest (most similar) to, and assign the data object to that centroid.
The centroid coordinates for each cluster are updated to reflect the cluster’s mean value.
Compute the mean position of the data points in each cluster and set that mean as the cluster's new centroid.
The previous centroids stay in their original positions and the new centroids are added to the scatterplot. If a data point then switches clusters (in this post's example, one point moves from the right cluster to the left cluster), the centroids of both affected clusters need to be updated again.
Repeat the steps above until the centroids are no longer updated.
Final clusters are produced based on the updated centroids for each cluster.
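The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `k_means`, its parameters, and the sample data are all assumptions made for this example, and the initial centroids are drawn at random from the data points.

```python
import numpy as np

def k_means(points, k, max_iters=100, seed=0):
    """Plain k-means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: Euclidean distance from every point to every centroid,
        # then assign each point to its nearest centroid.
        diffs = points[:, None, :] - centroids[None, :, :]
        distances = np.linalg.norm(diffs, axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical sample data: two well-separated groups of three points.
data = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                 [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
centroids, labels = k_means(data, k=2)
print(labels)
print(centroids)
```

Which group gets label 0 depends on the random initialization, so only the grouping itself is meaningful, not the label numbers.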

Scree plot

2022.10.14 - [AI/Machine Learning] - [ML] PCA / Correlation vs. Covariance


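A scree plot is a chart of the proportion of each eigenvalue in principal component analysis (PCA). It is used to decide how many principal components to keep. The point where the drop in eigenvalues levels off (the "elbow") indicates the number of components needed, so it is advisable to use only the components that come before the curve flattens out.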
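As a rough sketch of where the values on a scree plot come from, the snippet below computes the eigenvalue proportions with NumPy. The function names `explained_variance_ratio` and `plot_scree` and the random sample data are assumptions made for this example. Plotting is assumed to use matplotlib, which is imported only if `plot_scree` is called.

```python
import numpy as np

def explained_variance_ratio(data):
    """Proportion of total variance carried by each principal component."""
    centered = data - data.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    # eigvalsh suits the symmetric covariance matrix and returns
    # eigenvalues in ascending order, so reverse them.
    eigenvalues = np.linalg.eigvalsh(covariance)[::-1]
    return eigenvalues / eigenvalues.sum()

def plot_scree(ratios):
    """Draw the scree plot (needs matplotlib, imported only when called)."""
    import matplotlib.pyplot as plt
    components = np.arange(1, len(ratios) + 1)
    plt.plot(components, ratios, marker="o")
    plt.xlabel("Principal component")
    plt.ylabel("Proportion of variance (eigenvalue ratio)")
    plt.title("Scree plot")
    plt.show()

# Hypothetical data: 200 samples, 4 features with very different spreads.
rng = np.random.default_rng(0)
sample = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 0.5, 0.1])
ratios = explained_variance_ratio(sample)
print(ratios)
```

Calling `plot_scree(ratios)` draws the curve. With data like this, the first one or two components should carry most of the variance and the curve flattens afterwards, which is the elbow to look for.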
