KNN (K-Nearest Neighbors) 알고리즘

KNN (K-Nearest Neighbors) 알고리즘

2024. 12. 20. 12:54ㆍdata 공부/데이터전처리

KNN(K-Nearest Neighbors, 최근접 이웃)은 지도학습(supervised learning)에서 사용되는 분류(classification) 및 회귀(regression) 알고리즘

점 사이의 유클리드 거리를 기반으로 검색

Usage :

K value: which is used to select the number of neighbors(nearest data points)

small K value : reflects the traits of neighbors well, but include high level of noise and null data
large K value : includes large number of neighbors, which minimize the impact of noise and null data, but difficult to reflect the detailed trait of neighbors

pd.set_option('display.max_columns', 30)

컬럼이 너무 많아서 안보일 때

꼭 데이터 확인을 통해 total charges 같은 수의 요소가 float, object, 등 맞는지 확인해야 함

숫자 형태로 바꾸어줘야함

gender의 상하위관계를 만들지 않기 위해 column을 분리시킴

gender는 하나만 보아도 여자인지 남자인지 알 수 있어서 그를 위한 분리가 필요함

pd.get_dummies(data, columns=['gender'], drop_first = True)

scaling

training code

최적의 K값 구하기

accuracy score 가 가장 높은 index를 찾고, 그에 해당하는 K 값을 넣어 다시 돌려주기

53이 가장 높았어서, 54를 넣었더니 accuracy score 가 더 높게 나오는 것을 알 수 있음

이러저러한 것들을 열심히 파보는 블로그