Dataset
https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset
Package loading
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# from keras.models import Sequential  # older import path; use tensorflow.keras instead
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, roc_auc_score, precision_score
Hyperparameter settings
INPUT_DIM = 13   # number of input features (14 columns minus the target)
MY_EPOCH = 100   # maximum training epochs
MY_BATCH = 32    # mini-batch size
MY_SPLIT = 0.4   # fraction held out for validation + test
Loading the data
data = pd.read_excel('./dataset/heart.xls')
print(data.shape)
display(data.head())
data.describe()
data.info()
print(data.isna().sum())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 303 non-null int64
1 sex 303 non-null int64
2 cp 303 non-null int64
3 trestbps 303 non-null int64
4 chol 303 non-null int64
5 fbs 303 non-null int64
6 restecg 303 non-null int64
7 thalach 303 non-null int64
8 exang 303 non-null int64
9 oldpeak 303 non-null float64
10 slope 303 non-null int64
11 ca 303 non-null int64
12 thal 303 non-null int64
13 target 303 non-null int64
dtypes: float64(1), int64(13)
memory usage: 33.3 KB
Data scaling: standardization
from sklearn.preprocessing import StandardScaler
X = data.drop('target', axis = 1)
y = data['target']
scaler = StandardScaler()
scaled_data = scaler.fit_transform(X)
scaled_data = pd.DataFrame(scaled_data, columns= X.columns)
print(scaled_data.describe())
boxplot = scaled_data.boxplot(figsize = (10,7), showmeans = True)
plt.show()
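As a quick sanity check, standardization should leave every column with mean ≈ 0 and standard deviation ≈ 1. A minimal sketch using a toy matrix (standing in for the heart-disease features, which are assumed here):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix standing in for the heart-disease features
X_toy = np.array([[50., 120.],
                  [60., 140.],
                  [70., 160.]])

scaler = StandardScaler()
scaled = scaler.fit_transform(X_toy)  # (x - column mean) / column std

print(scaled.mean(axis=0))  # each column's mean is ~0
print(scaled.std(axis=0))   # each column's std is ~1
```

StandardScaler uses the population standard deviation (ddof=0), which is why `np.std` with its default matches exactly.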
Train/validation/test split
The data was split twice: first into a training set and a holdout set, then the holdout set 50/50 into validation and test sets.
X_train, X_test, y_train, y_test = train_test_split(scaled_data, y, test_size = MY_SPLIT, random_state = 10)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
X_val, X_test, y_val, y_test = train_test_split(X_test,y_test, test_size = 0.5, random_state = 0)
X_val.shape, X_test.shape, y_val.shape, y_test.shape
((61, 13), (61, 13), (61,), (61,))
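Splitting twice with test_size=0.4 and then 0.5 gives roughly a 60/20/20 split. A sketch of the arithmetic behind the shapes above (assuming sklearn's documented behavior of rounding the test portion up):

```python
import math

n = 303          # rows in the dataset
test_size = 0.4  # MY_SPLIT

# First split: hold out 40% of the rows
n_holdout = math.ceil(n * test_size)  # sklearn rounds the test split up
n_train = n - n_holdout

# Second split: divide the holdout 50/50 into validation and test
n_test = math.ceil(n_holdout * 0.5)
n_val = n_holdout - n_test

print(n_train, n_val, n_test)  # → 181 61 61
```

This matches the printed shapes: 181 training rows and 61 rows each for validation and test.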
Model creation
# Output Shape (None, 1000): None means the batch size is not fixed when the model is built
from keras.layers import Dropout
from keras import regularizers
model = Sequential()
model.add(Dense(1000, activation= 'tanh', input_dim = INPUT_DIM, kernel_regularizer= regularizers.l2(0.02)))
model.add(Dense(1000, activation= 'tanh', kernel_regularizer= regularizers.l2(0.1)))
model.add(Dropout(rate= 0.5))
model.add(Dense(1,activation= 'sigmoid'))
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 1000) 14000
dense_1 (Dense) (None, 1000) 1001000
dropout (Dropout) (None, 1000) 0
dense_2 (Dense) (None, 1) 1001
=================================================================
Total params: 1016001 (3.88 MB)
Trainable params: 1016001 (3.88 MB)
Non-trainable params: 0 (0.00 Byte)
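The parameter counts in the summary can be verified by hand: a Dense layer has inputs × units weights plus one bias per unit, and Dropout adds no parameters. A quick check:

```python
def dense_params(n_in, n_units):
    # weight matrix (n_in x n_units) plus one bias per unit
    return n_in * n_units + n_units

p1 = dense_params(13, 1000)    # first hidden layer: 14000
p2 = dense_params(1000, 1000)  # second hidden layer: 1001000
p3 = dense_params(1000, 1)     # output layer: 1001 (Dropout adds nothing)

print(p1, p2, p3, p1 + p2 + p3)  # → 14000 1001000 1001 1016001
```

The total, 1,016,001, matches `Total params` in the summary.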
Model compilation and training
from tensorflow.keras.callbacks import TensorBoard
import datetime
# log_dir: directory where TensorBoard logs are written (the path must not contain Korean characters)
log_dir = 'c:\\Logs\\' + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard = TensorBoard(log_dir=log_dir, histogram_freq=1)
early_stop = EarlyStopping(monitor='val_loss', mode='min', patience=3)
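EarlyStopping with monitor='val_loss', mode='min', patience=3 stops training once val_loss has gone 3 consecutive epochs without improving. A minimal sketch of that logic in plain Python (an illustration of the idea, not Keras internals):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training stops, or None if it runs to the end."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:           # improvement: remember it and reset patience
            best = loss
            wait = 0
        else:
            wait += 1             # no improvement this epoch
            if wait >= patience:  # patience exhausted -> stop
                return epoch
    return None

# val_loss improves twice, then stalls for three epochs -> stop at epoch 5
print(early_stop_epoch([0.9, 0.7, 0.8, 0.75, 0.71]))  # → 5
```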
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size= MY_BATCH, epochs= MY_EPOCH, validation_data=(X_val,y_val), verbose = 1, callbacks = [tensorboard, early_stop])
model.save('heart-disease.h5')
Epoch 1/100
6/6 [==============================] - 0s 35ms/step - loss: 75.1489 - accuracy: 0.7403 - val_loss: 51.0454 - val_accuracy: 0.7869
Epoch 2/100
6/6 [==============================] - 0s 20ms/step - loss: 42.6342 - accuracy: 0.8398 - val_loss: 31.9832 - val_accuracy: 0.7869
(output truncated)
Training ran 20 of the 100 epochs before early stopping triggered, because val_loss failed to improve for 3 consecutive epochs.
Prediction and model evaluation
y_pred_prob = model.predict(X_test)  # sigmoid outputs in [0, 1]
y_pred = (y_pred_prob > 0.5)         # threshold at 0.5 to get class labels
print('\n == CONFUSION MATRIX ==')
print(confusion_matrix(y_test,y_pred))
score = model.evaluate(X_test, y_test, verbose= 1) #, callbacks = [tensorboard])
print('Loss: ', score[0])
print('Accuracy: ', score[1])
print('Precision: ', precision_score(y_test,y_pred))
print('AUC: ', roc_auc_score(y_test,y_pred_prob))
2/2 [==============================] - 0s 1ms/step
== CONFUSION MATRIX ==
[[18 13]
[ 1 29]]
2/2 [==============================] - 0s 2ms/step - loss: 0.5433 - accuracy: 0.7705
Loss: 0.5433093309402466
Accuracy: 0.7704917788505554
Precision: 0.6904761904761905
AUC: 0.9193548387096774
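The accuracy and precision printed above can be recomputed directly from the confusion matrix [[18, 13], [1, 29]] (sklearn lays it out as rows = true class, columns = predicted class):

```python
# [[TN, FP], [FN, TP]] as printed by sklearn's confusion_matrix
tn, fp, fn, tp = 18, 13, 1, 29

accuracy = (tn + tp) / (tn + fp + fn + tp)  # correct predictions / all predictions
precision = tp / (tp + fp)                  # of predicted positives, how many are true

print(round(accuracy, 4))   # → 0.7705
print(round(precision, 4))  # → 0.6905
```

Both values match `model.evaluate` and `precision_score` above, confirming the matrix and the metrics agree.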