Random Forest (Hyperparameters)

seoa__ 2025. 1. 27. 21:23

Key hyperparameters of a random forest

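n_estimators : number of trees
- Too few trees gives unstable, weaker predictions; too many mostly lengthens training time, with diminishing accuracy gains
- Default : 100
max_depth : maximum depth of each tree
- Too deep tends to overfit; too shallow tends to underfit
- Default : None (no limit)
max_features : maximum number of features considered at each split
- Limits how many features each split may choose from
- For classification the default is "sqrt" (the square root of the total number of features)
- Smaller values make the trees less correlated with one another, which matters when the features themselves are strongly correlated
min_samples_split : minimum number of samples required to split a node
- Default : 2
min_samples_leaf : minimum number of samples required at a leaf node
- Default : 1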
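
As a rough sketch of how the hyperparameters above map onto scikit-learn's RandomForestClassifier, the block below passes each one explicitly, set to its default for classification. The digits dataset and the random_state value are assumptions chosen only for illustration; they are not taken from the project.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

# Assumption: the bundled digits dataset stands in for real project data.
X, y = load_digits(return_X_y=True)

# Every hyperparameter from the list above, written out at its default value.
model = RandomForestClassifier(
    n_estimators=100,       # number of trees
    max_depth=None,         # no depth limit
    max_features="sqrt",    # features considered at each split
    min_samples_split=2,    # minimum samples needed to split a node
    min_samples_leaf=1,     # minimum samples allowed in a leaf
    random_state=42,        # assumption: fixed seed for repeatable runs
)
model.fit(X, y)

# Inspect what was actually built.
print("trees:", len(model.estimators_))
print("deepest tree:", max(tree.get_depth() for tree in model.estimators_))
```

Writing the defaults out like this makes it easier to change one value at a time and see its effect during tuning.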

https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE10565794&utm_source=chatgpt.com&language=ko_KR&hasTopBanner=true

Data splitting

Reference paper

https://sejong.dcollection.net/public_resource/pdf/200000630803_20250126185853.pdf

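# Example code

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load the handwritten digits dataset (1797 samples, 64 pixel features)
digits = load_digits()

# Pixel values range from 0 to 16, so dividing by 16 scales them to [0, 1]
digits_data = digits.data / 16
X_train, X_test, y_train, y_test = train_test_split(digits_data, digits.target, test_size=0.1)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(1617, 64) (180, 64) (1617,) (180,)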
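
To connect the split with the n_estimators note earlier, here is a small sketch that trains a forest on the same kind of digits split with few and with many trees, then compares test accuracy. The tree counts (5 and 100) and the random_state values are assumptions for illustration, and the exact scores will depend on the split.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data / 16, digits.target, test_size=0.1, random_state=0
)

# Assumption: 5 and 100 trees are arbitrary values chosen for comparison.
scores = {}
for n_trees in (5, 100):
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    model.fit(X_train, y_train)
    # score() returns mean accuracy on the held-out test set
    scores[n_trees] = model.score(X_test, y_test)

print(scores)
```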

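from sklearn.model_selection import train_test_split
# train_test_split splits the data into a training set and a test set

# Separate the target variable from the input variables
# (data_new is assumed to be the DataFrame prepared earlier in the project)
X = data_new.drop(columns=['허위매물여부', 'ID', '게재일'])  # input variables: drop the target, the ID, and the posting date
y = data_new['허위매물여부']  # target variable (fake listing flag); y is what the model predicts

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(2206, 14) (246, 14) (2206,) (246,)
# X_train : training data (2206 samples, 14 features)
# X_test : test data (246 samples, 14 features)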
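
The split above is purely random. If the target classes turn out to be imbalanced, one optional variation is to pass stratify=y so that both sets keep roughly the same class ratio. The sketch below uses made-up data with the same shape as the project data (2452 rows, 14 features); the 12% positive rate is a hypothetical value, not a figure from the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Hypothetical stand-in data: 2452 rows, 14 features, about 12% positive labels.
X = rng.normal(size=(2452, 14))
y = (rng.random(2452) < 0.12).astype(int)

# stratify=y keeps the class proportions similar in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
print("positive rate (all, train, test):", y.mean(), y_train.mean(), y_test.mean())
```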
