Linear Regression and Prediction

2024. 12. 19. 12:29ㆍdata study / data preprocessing

data.drop(['sex','email'], axis=1)   # returns a new DataFrame without these columns

import statsmodels.api as sa
model = sa.OLS(y_train, X_train)   # this module takes y first, then X

model = model.fit()   # train (fit) the model

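Linear regression model

R-squared : the larger the better the model (the closer to 1)

Adjusted R-squared : R-squared corrected for the number of variables, so adding a variable that does not improve the fit lowers the value

coefficient : the influence of a variable (strength and direction)

- You have to look at the scale of the data to know whether a coefficient really has impact. The number alone does not tell you whether it is large or small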
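To illustrate the point about scale, here is a small sketch with made-up height and weight numbers (the data and variable names are hypothetical). The same relationship gives a coefficient 100 times smaller when height is in centimeters instead of meters, which is why the raw number cannot be read as "big" or "small".

```python
import numpy as np

# Hypothetical data: height in meters, weight in kg
height_m = np.array([1.50, 1.60, 1.70, 1.80, 1.90])
weight_kg = np.array([50.0, 58.0, 65.0, 74.0, 80.0])

# np.polyfit with degree 1 returns [slope, intercept]
slope_m, _ = np.polyfit(height_m, weight_kg, 1)          # kg per meter
slope_cm, _ = np.polyfit(height_m * 100, weight_kg, 1)   # kg per centimeter

# Standardizing the feature puts the coefficient in "kg per standard deviation"
z = (height_m - height_m.mean()) / height_m.std()
slope_z, _ = np.polyfit(z, weight_kg, 1)

print(slope_m, slope_cm, slope_z)
```

Standardizing every feature before fitting is one common way to make coefficients comparable with each other.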

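R2 = the share of the baseline's total error (the error from always predicting the mean) that the linear regression line removes, expressed as a percentage: R2 = SSR / SST = 1 - SSE / SST
SST : total error (sum of squared differences between the actual values and the mean)
- - SSR : how much the model improves on the mean (sum of squared differences between the predictions and the mean)
- - SSE : the difference between the actual values and the predictions (the remaining, unavoidable error)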
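The decomposition above can be checked by hand. This is a minimal sketch on made-up numbers (the data is hypothetical), using np.polyfit as the least-squares line. It also computes Adjusted R-squared with the usual formula 1 - (1 - R2)(n - 1)/(n - p - 1), where p is the number of predictors.

```python
import numpy as np

# Hypothetical data, close to y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

slope, intercept = np.polyfit(x, y, 1)   # least-squares line
pred = slope * x + intercept

sst = np.sum((y - y.mean()) ** 2)      # total error around the mean
ssr = np.sum((pred - y.mean()) ** 2)   # part explained by the line
sse = np.sum((y - pred) ** 2)          # part left over (residuals)

r2 = 1 - sse / sst                     # same as ssr / sst for a line with intercept

n, p = len(y), 1                       # 6 rows, 1 predictor
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(sst, ssr + sse, r2, adj_r2)
```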

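Predicting from the X values

Used more often than MSE:

RMSE (root mean squared error) : the square root of MSE. Squaring inflates the scale of the error, so taking the root brings it back to the same units as y and narrows that gap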

np.sqrt(mean_squared_error(y_test, pred))   # mean_squared_error is from sklearn.metrics
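To see what that line computes, here is the same calculation spelled out with NumPy only, on made-up values (y_test and pred here are hypothetical arrays, not the ones from the model above).

```python
import numpy as np

# Hypothetical actual values and predictions
y_test = np.array([3.0, 5.0, 7.0, 9.0])
pred = np.array([2.0, 5.0, 8.0, 11.0])

errors = y_test - pred           # 1, 0, -1, -2
mse = np.mean(errors ** 2)       # (1 + 0 + 1 + 4) / 4 = 1.5, in squared units
rmse = np.sqrt(mse)              # back in the same units as y

print(mse, rmse)
```

Because RMSE is in the units of y, it can be read directly as "the predictions are off by about this much".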

In a DataFrame, axis=0 => rows, axis=1 => columns

To drop a column, pass axis=1:

data.drop('Yearly Amount Spend', axis=1)

This is needed because drop defaults to axis=0, so without axis=1 it looks for a row labeled 'Yearly Amount Spend'.

To drop two columns, pass them together as one list:

data.drop(['year','month'], axis=1)

But when we display data again, the columns are still there, because drop returns a new DataFrame and leaves the original unchanged. To change data itself, assign the result back or pass inplace=True:

data.drop(['year','month'], axis=1, inplace=True)   # modify data in place instead of returning a copy
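A small sketch of the difference, on a made-up DataFrame (the column names follow the example above, the values are hypothetical):

```python
import pandas as pd

# Hypothetical data with the columns used in the notes
data = pd.DataFrame({
    "year": [2023, 2024],
    "month": [1, 2],
    "sales": [10.0, 12.5],
})

# Plain drop returns a new DataFrame; data itself is untouched
dropped = data.drop(["year", "month"], axis=1)
cols_after_plain_drop = list(data.columns)   # still all three columns

# columns= is an equivalent spelling that avoids remembering the axis number
same = data.drop(columns=["year", "month"])

# inplace=True changes the object itself (done on a copy here to keep data intact)
data2 = data.copy()
data2.drop(["year", "month"], axis=1, inplace=True)

print(cols_after_plain_drop, list(dropped.columns), list(data2.columns))
```

Assigning the result back (data = data.drop(...)) has the same effect as inplace=True and is often easier to follow.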

linear regression : finding the line with the smallest RSS (residual sum of squares)

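The learning algorithm looks at the slope and y-intercept and works out in which direction, and by how much, each should move for the error to go down. It then updates them using that direction and strength.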
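The idea above is gradient descent. Below is a minimal sketch on made-up, noise-free data (the data, learning rate, and iteration count are assumptions). Note that statsmodels' OLS solves the least-squares problem directly rather than iterating like this, so the loop is only an illustration of "direction and strength".

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2x (no noise)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

slope, intercept = 0.0, 0.0   # start from an arbitrary line
lr = 0.05                     # learning rate: how big each step is

for _ in range(5000):
    residual = (slope * x + intercept) - y
    # Gradient of the mean squared error with respect to each parameter:
    # the sign gives the direction, the size gives the strength
    grad_slope = 2 * np.mean(residual * x)
    grad_intercept = 2 * np.mean(residual)
    # Step against the gradient so the error decreases
    slope -= lr * grad_slope
    intercept -= lr * grad_intercept

print(slope, intercept)   # approaches 2 and 1
```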
