Finding actionable genetic mutations that cause cancer using Machine Learning
Algorithm | Features | Percent of misclassified points | Dimension | Test log loss |
---|---|---|---|---|
SGDClassifier with log loss | One-hot encoded Gene, Variation + 1-gram TF-IDF + 1,2-gram CountVectorizer of text | NA | 785767 | 1.1492 |
Fine-tuned Logistic Regression | Same as model 1 | 35.187% | 785767 | 1.0912 |
Fine-tuned Logistic Regression | One-hot encoding of Gene, Variation + 1,2,3-gram TF-IDF of 20000 features | 30.075% | 22182 | 0.8864 |
Fine-tuned Logistic Regression + fine-tuned RandomForest | Manual average ensemble of model 2 + fine-tuned RandomForest on response coding of Gene, Variation, text + binary CountVectorizer of text | 29.924% | NA | 0.8736 |
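
The last row of the table combines two models by manually averaging their predicted class probabilities. The sketch below shows one way such an average, the test log loss, and the percent of misclassified points could be computed. It is an illustration under stated assumptions, not the project's actual code: the function names (`average_ensemble`, `multiclass_log_loss`, `misclassified_percent`), the equal weighting of the two models, and the three-class toy probabilities are all hypothetical.

```python
import math


def average_ensemble(probs_a, probs_b):
    """Element-wise mean of two lists of per-class probability rows.

    Assumes equal weights for both models (an assumption, not confirmed
    by the table above).
    """
    return [
        [(pa + pb) / 2.0 for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(probs_a, probs_b)
    ]


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0) when a model is fully confident and wrong.
        p = min(max(row[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


def misclassified_percent(y_true, probs):
    """Percent of points whose highest-probability class is not the label."""
    wrong = 0
    for label, row in zip(y_true, probs):
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted != label:
            wrong += 1
    return 100.0 * wrong / len(y_true)


# Hypothetical toy data: 4 test points, 3 classes (labels are 0-indexed).
y_test = [0, 2, 1, 2]
lr_probs = [[0.7, 0.2, 0.1], [0.2, 0.3, 0.5], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]
rf_probs = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.2, 0.6, 0.2], [0.3, 0.4, 0.3]]

ensemble_probs = average_ensemble(lr_probs, rf_probs)

for name, probs in [("LR", lr_probs), ("RF", rf_probs), ("Ensemble", ensemble_probs)]:
    print(name,
          "log loss:", round(multiclass_log_loss(y_test, probs), 4),
          "misclassified %:", misclassified_percent(y_test, probs))
```

In this toy data each model misclassifies a different point, so the averaged probabilities can recover both. With real models, the same functions would take the output of each classifier's `predict_proba` on the test set, provided both use the same class ordering.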