#!/usr/bin/env python
# coding: utf-8
# In[1]:
# Author : Amir Shokri
# github link : https://github.com/amirshnll/COVID-19-Surveillance
# dataset link : http://archive.ics.uci.edu/ml/datasets/COVID-19+Surveillance
# email : amirsh.nll@gmail.com
# ### <p style=color:blue>Compare different models for predicting whether a couple will get divorced </p>
# <b>-Decision Tree<br>-Logistic Regression<br>-Naive Bayes<br>-KNN<br>-MLP
# #### The Dataset
# The dataset is from the UCI Machine Learning Repository and provides all the relevant information needed to predict divorce. It contains 54 features, and on the basis of these features we have to predict whether a couple is divorced or not. A value of 1 represents divorced and a value of 0 represents not divorced. The features are as follows:
# 1. If one of us apologizes when our discussion deteriorates, the discussion ends.
# 2. I know we can ignore our differences, even if things get hard sometimes.
# 3. When we need it, we can take our discussions with my spouse from the beginning and correct them.
# 4. When I discuss with my spouse, to contact him will eventually work.
# 5. The time I spent with my wife is special for us.
# 6. We don't have time at home as partners.
# 7. We are like two strangers who share the same environment at home rather than family.
# 8. I enjoy our holidays with my wife.
# 9. I enjoy traveling with my wife.
# 10. Most of our goals are common to my spouse.
# 11. I think that one day in the future, when I look back, I see that my spouse and I have been in harmony with each other.
# 12. My spouse and I have similar values in terms of personal freedom.
# 13. My spouse and I have similar sense of entertainment.
# 14. Most of our goals for people (children, friends, etc.) are the same.
# 15. Our dreams with my spouse are similar and harmonious.
# 16. We're compatible with my spouse about what love should be.
# 17. We share the same views about being happy in our life with my spouse.
# 18. My spouse and I have similar ideas about how marriage should be.
# 19. My spouse and I have similar ideas about how roles should be in marriage.
# 20. My spouse and I have similar values in trust.
# 21. I know exactly what my wife likes.
# 22. I know how my spouse wants to be taken care of when she/he is sick.
# 23. I know my spouse's favorite food.
# 24. I can tell you what kind of stress my spouse is facing in her/his life.
# 25. I have knowledge of my spouse's inner world.
# 26. I know my spouse's basic anxieties.
# 27. I know what my spouse's current sources of stress are.
# 28. I know my spouse's hopes and wishes.
# 29. I know my spouse very well.
# 30. I know my spouse's friends and their social relationships.
# 31. I feel aggressive when I argue with my spouse.
# 32. When discussing with my spouse, I usually use expressions such as ‘you always’ or ‘you never’ .
# 33. I can use negative statements about my spouse's personality during our discussions.
# 34. I can use offensive expressions during our discussions.
# 35. I can insult my spouse during our discussions.
# 36. I can be humiliating when we have discussions.
# 37. My discussion with my spouse is not calm.
# 38. I hate my spouse's way of opening a subject.
# 39. Our discussions often occur suddenly.
# 40. We're just starting a discussion before I know what's going on.
# 41. When I talk to my spouse about something, my calm suddenly breaks.
# 42. When I argue with my spouse, I only go out and I don't say a word.
# 43. I mostly stay silent to calm the environment a little bit.
# 44. Sometimes I think it's good for me to leave home for a while.
# 45. I'd rather stay silent than discuss with my spouse.
# 46. Even if I'm right in the discussion, I stay silent to hurt my spouse.
# 47. When I discuss with my spouse, I stay silent because I am afraid of not being able to control my anger.
# 48. I feel right in our discussions.
# 49. I have nothing to do with what I've been accused of.
# 50. I'm not actually the one who's guilty about what I'm accused of.
# 51. I'm not the one who's wrong about problems at home.
# 52. I wouldn't hesitate to tell my spouse about her/his inadequacy.
# 53. When I discuss, I remind my spouse of her/his inadequacy.
# 54. I'm not afraid to tell my spouse about her/his incompetence.
# Generally, a classification workflow in Python has a straightforward and user-friendly implementation. It usually consists of these steps:<br>
# 1. Import packages, functions, and classes<br>
# 2. Get data to work with and, if appropriate, transform it<br>
# 3. Create a classification model and train (or fit) it with existing data<br>
# 4. Evaluate your model to see if its performance is satisfactory<br>
# 5. Apply your model to make predictions<br>
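# As a compact preview of those five steps, here is a minimal sketch on
# scikit-learn's built-in breast-cancer data (an illustrative stand-in, not
# the divorce dataset loaded below; the rest of this script walks through
# the same steps on the divorce data):
# In[ ]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
X_demo, y_demo = load_breast_cancer(return_X_y=True)          # step 2: get data
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, random_state=0)
demo_model = LogisticRegression(max_iter=5000).fit(Xtr, ytr)  # step 3: train
print("demo accuracy:", demo_model.score(Xte, yte))           # step 4: evaluate
print("demo predictions:", demo_model.predict(Xte[:5]))       # step 5: predict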
# ### Import packages, functions, and classes
# In[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix as cm
from sklearn import metrics
from sklearn import preprocessing
from sklearn.metrics import accuracy_score
from sklearn import tree
# ### Get data to work with and, if appropriate, transform it
# In[2]:
df = pd.read_csv('divorce.csv',sep=';')
df.head()
# In[3]:
df.info()
# In[4]:
y=df.Class
x_data=df.drop(columns=['Class'])
# print(x_data)
# ### Data description
# In[5]:
sns.countplot(x='Class',data=df,palette='hls')
plt.show()
count_no_divorce = len(df[df['Class'] == 0])
count_divorce = len(df[df['Class'] == 1])
pct_no_divorce = count_no_divorce / (count_no_divorce + count_divorce)
print("percentage of no divorce is", pct_no_divorce * 100)
pct_divorce = count_divorce / (count_no_divorce + count_divorce)
print("percentage of divorce is", pct_divorce * 100)
# ### Normalize data
# In[6]:
x = (x_data - x_data.min()) / (x_data.max() - x_data.min())
x.head()
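# The same min-max scaling can also be done with scikit-learn's MinMaxScaler
# (a sketch; the `preprocessing` module is already imported above, and the
# result matches the manual formula):
# In[ ]:
scaler = preprocessing.MinMaxScaler()
x_scaled = pd.DataFrame(scaler.fit_transform(x_data), columns=x_data.columns)
x_scaled.head()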
# In[7]:
plt.figure(figsize=(10,8))
sns.heatmap(df.corr(), cmap='viridis');
# ### Split dataset to data train & data test
# In[8]:
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.4,random_state=400)
print("x_train: ",x_train.shape)
print("x_test: ",x_test.shape)
print("y_train: ",y_train.shape)
print("y_test: ",y_test.shape)
# ### Train & Score
# Step 1. Import the model you want to use<br>
# Step 2. Make an instance of the Model<br>
# Step 3. Train the model on the data, storing the information learned from the data<br>
# Step 4. Predict labels for new data <br>
# ### Decision Tree Classifier
# In[9]:
clft = DecisionTreeClassifier()
clft = clft.fit(x_train,y_train)
y_predt = clft.predict(x_test)# step 4
print(classification_report(y_test, clft.predict(x_test)))
print('Accuracy of Decision Tree classifier on test set: {:.2f}'.format(clft.score(x_test, y_test)))
plt.figure(figsize=(10,10))
tree.plot_tree(clft, fontsize=12)  # plot the already-trained tree; refitting on all data here would silently change the model
plt.show()
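# A quick look at which of the 54 questions the tree actually splits on
# (a sketch; feature_importances_ is a standard DecisionTreeClassifier attribute):
# In[ ]:
importances = pd.Series(clft.feature_importances_, index=x.columns)
print(importances[importances > 0].sort_values(ascending=False))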
# ### Logistic Regression Classifier
# In[10]:
clfr = LogisticRegression(solver='lbfgs')# step 2
clfr.fit(x_train, y_train.ravel())# step 3
y_predr = clfr.predict(x_test)# step 4
# model = LogisticRegression(solver='liblinear', random_state=0).fit(x_train, y_train.ravel())
# In[11]:
print(classification_report(y_test, clfr.predict(x_test)))
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(clfr.score(x_test, y_test)))
# In[12]:
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
logit_roc_auc = roc_auc_score(y_test, clfr.predict_proba(x_test)[:, 1])  # use probabilities for a proper AUC, matching the curve below
fpr, tpr, thresholds = roc_curve(y_test, clfr.predict_proba(x_test)[:,1])
plt.figure()
plt.plot(fpr, tpr, label='Logistic Regression (area = %0.2f)' % logit_roc_auc)
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (divorce)')
plt.legend(loc="lower right")
plt.show()
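# The fitted coefficients indicate how strongly each question pushes the
# prediction toward class 1 (a sketch; coef_ is a standard
# LogisticRegression attribute):
# In[ ]:
coefs = pd.Series(clfr.coef_[0], index=x.columns)
print(coefs.sort_values(ascending=False).head(10))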
# ### Naive Bayes Classifier
# In[13]:
clfb = GaussianNB()
clfb.fit(x_train, y_train.ravel())
y_predb = clfb.predict(x_test)# step 4
print(classification_report(y_test, clfb.predict(x_test)))
print("Naive Bayes test accuracy: ", clfb.score(x_test, y_test))
# ### KNN Classifier
# In[14]:
K = 5
clfk = KNeighborsClassifier(n_neighbors=K)
clfk.fit(x_train, y_train.ravel())
y_predk=clfk.predict(x_test)
print("When K = {} neighnors , KNN test accuracy: {}".format(K, clfk.score(x_test, y_test)))
print("When K = {} neighnors , KNN train accuracy: {}".format(K, clfk.score(x_train, y_train)))
print(classification_report(y_test, clfk.predict(x_test)))
print("Knn(k=5) test accuracy: ", clfk.score(x_test, y_test))
ran = np.arange(1,30)
train_list = []
test_list = []
for k in ran:
    # use a loop-local name so clfk keeps the k=5 model fitted above
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train, y_train.ravel())
    test_list.append(knn.score(x_test, y_test))
    train_list.append(knn.score(x_train, y_train))
print("Best test score is {}, K = {}".format(np.max(test_list), test_list.index(np.max(test_list)) + 1))
print("Best train score is {}, K = {}".format(np.max(train_list), train_list.index(np.max(train_list)) + 1))
# In[15]:
plt.figure(figsize=[15,10])
plt.plot(ran,test_list,label='Test Score')
plt.plot(ran,train_list,label = 'Train Score')
plt.xlabel('Number of Neighbors')
plt.ylabel('Accuracy')
plt.xticks(ran)
plt.legend()
print("Best test score is {} , K = {}".format(np.max(test_list), test_list.index(np.max(test_list))+1))
print("Best train score is {} , K = {}".format(np.max(train_list), train_list.index(np.max(train_list))+1))
# ### MLP Classifier
# In[16]:
clfm = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000)
clfm.fit(x_train, y_train.ravel())
y_predm = clfm.predict(x_test)
print("Accuracy:",metrics.accuracy_score(y_test, y_predm))
print(classification_report(y_test, clfm.predict(x_test)))
print("MLP test accuracy: ", clfm.score(x_test, y_test))
# ### Compare Confusion Matrix
# In[17]:
def confusionMatrix(y_pred, title, n):
    c = cm(y_test, y_pred)  # confusion matrix for this model
    # left panel: proportions
    plt.subplot(5, 5, n)
    ax = sns.heatmap(c / c.sum(), annot=True,
                     cmap='RdBu_r', vmin=0, vmax=0.52, cbar=False, linewidths=.5)
    plt.title(title)
    plt.ylabel('Actual outputs')
    plt.xlabel('Prediction')
    b, t = ax.get_ylim()
    ax.set_ylim(b + .5, t - .5)  # work around matplotlib/seaborn cropping the top and bottom rows
    # right panel: raw counts
    plt.subplot(5, 5, n + 1)
    axx = sns.heatmap(c, annot=True,
                      cmap='plasma', vmin=0, vmax=40, cbar=False, linewidths=.5)
    b, t = axx.get_ylim()
    axx.set_ylim(b + .5, t - .5)
plt.figure(figsize=(12,12))
# figure, axes = plt.subplots(nrows=1, ncols=1)
confusionMatrix(y_predt,'Decision Tree',1)
confusionMatrix(y_predr,'Logistic Regression',4)
confusionMatrix(y_predb,'Naive Bayes',11)
confusionMatrix(y_predk,'KNN',14)
confusionMatrix(y_predm,'MLP',21)
# plt.subplots_adjust(bottom=0.25, top=0.75)
# figure.tight_layout()
plt.savefig('Compare Confusion Matrix')
plt.show()
# #### Result:
# We have successfully trained several models on the divorce dataset, compared their predictions of whether a couple will get divorced, and reported the accuracy and confusion matrix for each model.
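# A compact side-by-side summary of the test accuracies (a sketch added for
# convenience; it reuses the classifiers fitted above):
# In[ ]:
results = {
    'Decision Tree': clft.score(x_test, y_test),
    'Logistic Regression': clfr.score(x_test, y_test),
    'Naive Bayes': clfb.score(x_test, y_test),
    'KNN (k=5)': clfk.score(x_test, y_test),
    'MLP': clfm.score(x_test, y_test),
}
for name, acc in results.items():
    print("{:<20s} test accuracy: {:.2f}".format(name, acc))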