git clone https://github.com/asmyoo/MSAP.git
pip install -e ./MSAP
git clone https://github.com/asmyoo/kneed.git
pip install -e ./kneed
git clone https://github.com/asmyoo/kneebow.git
pip install -e ./kneebow
pip install -r requirements.txt
- For additional info, you may need some files in the old_files folder
- hpc was used to produce all results except those requiring .ipynb notebooks
- local was used to produce the .ipynb results
python reformat_ml.py
Within src/preprocess, make the 12to18 data if you want the label to be 1 whenever it is 1 at any age, even if data are missing at other ages
python make_12to18.py
- Change the model_selecting.py config to use the new dataset preprocessed_data_without_temporal_12to18.csv
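The "1 at any age" rule can be sketched as below. This is a minimal illustration, not the code in make_12to18.py. It assumes each row is a dict keyed by the four per-age labels used later in this README, with None for missing values, and it assumes that a row with no 1s but some missing ages stays missing rather than becoming 0.

```python
from typing import Optional

# Per-age depression labels used elsewhere in this README.
# Assumption: only these four ages feed the combined 12to18 label.
AGE_LABELS = [
    "y12CH_Dep_YN_144m",
    "y16CH_Dep_YN_192m",
    "y17CH_Dep_YN_204m",
    "y18CH_Dep_YN_216m",
]


def label_12to18(row: dict) -> Optional[int]:
    """Return 1 if any age is labeled 1, even when other ages are missing.

    Assumption: return 0 only when every age is observed as 0, else None.
    """
    values = [row.get(col) for col in AGE_LABELS]
    if any(v == 1 for v in values):
        return 1
    if all(v == 0 for v in values):
        return 0
    return None


rows = [
    {"y12CH_Dep_YN_144m": None, "y16CH_Dep_YN_192m": 1},
    {col: 0 for col in AGE_LABELS},
    {"y12CH_Dep_YN_144m": 0},
]
print([label_12to18(r) for r in rows])  # [1, 0, None]
```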
python make_12to18ave.py
- Change the model_selecting.py config to use the new dataset preprocessed_data_without_temporal_12to18ave.csv
- In cleaning.py, set the % missing-value threshold for imputation and make sure columns_ignored contains the child ID variable name
- In model_selecting.py, set age_cutoff and column_dependent
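One common reading of a "% missing-value" setting is a column filter: columns missing more than the threshold are dropped, and the remaining gaps are imputed. The sketch below shows only the filter, under that assumption; it is not the code in cleaning.py. It also assumes that columns listed in columns_ignored (such as the child ID, here the hypothetical name child_id) are passed through untouched.

```python
def columns_to_keep(rows, threshold, columns_ignored):
    """Keep columns whose missing fraction is at most `threshold`.

    Assumption: columns in `columns_ignored` are always kept, never filtered.
    """
    columns = list(rows[0].keys())
    kept = []
    for col in columns:
        if col in columns_ignored:
            kept.append(col)
            continue
        missing = sum(1 for r in rows if r.get(col) in (None, ""))
        if missing / len(rows) <= threshold:
            kept.append(col)
    return kept


# "child_id", "a" and "b" are hypothetical column names.
rows = [
    {"child_id": 1, "a": 1, "b": None},
    {"child_id": 2, "a": None, "b": None},
    {"child_id": 3, "a": 3, "b": 2},
]
print(columns_to_keep(rows, 0.5, {"child_id"}))  # ['child_id', 'a']
```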
python get_config_info.py
- Make sure the default preprocessed data filename in get_config_info.py is correct
- The prediction label is 0/1, so it does not need to be marked as categorical unless it was encoded incorrectly
- If needed, change the categorical variables in the preprocessing.py config (probably not necessary)
- Optionally, add the mental health variables to columns_ignored in cleaning.py (skip this for now, because our models seem to rely heavily on these variables for prediction)
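A quick way to confirm that the label really is 0/1 before deciding whether to mark it as categorical is a check like the one below. It is a hypothetical helper, not part of the project, and assumes missing labels are represented as None.

```python
def is_binary_label(values):
    """Return True if every observed (non-None) value is 0 or 1."""
    observed = [v for v in values if v is not None]
    return bool(observed) and all(v in (0, 1) for v in observed)


print(is_binary_label([0, 1, None, 1]))  # True
print(is_binary_label([0, 2]))           # False
```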
- Create a new conda environment for the depression-predictor requirements
git clone https://github.com/asmyoo/depression-predictor.git
cd depression-predictor
pip install -r requirements.txt
cd ..
python -u -m depression-predictor.depp.run_eda
- Copy the Variables Excel file and the preprocessed data into the depression-predictor data folder
- Check the data filename in depression-predictor's utils/dataset.py
- Takes approximately 1 hour
- Copy vars_sorted.csv to DepressionProject/output
- Then run the Jupyter notebook feature_analysis_correlations_iterativeimpute.ipynb
python -u -m DepressionProject.run_cleaner
- Make sure not to overwrite the PNGs from feature_analysis_correlations_iterativeimpute.ipynb, the missing_value PNGs, or the data_cleaned.csv files
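One way to protect those files is to copy them aside before rerunning run_cleaner. The helper below is a hypothetical sketch, not part of the project: the file patterns are assumptions based on the note above, and the backup folder name is made up. It only looks at the top level of the output folder.

```python
import shutil
import time
from pathlib import Path

# Hypothetical patterns for files that must survive a rerun of run_cleaner.
PROTECTED_PATTERNS = ["*.png", "data_cleaned*.csv"]


def backup_outputs(output_dir, patterns=PROTECTED_PATTERNS):
    """Copy matching top-level files into a timestamped backup subfolder."""
    output_dir = Path(output_dir)
    backup_dir = output_dir / f"backup_{time.strftime('%Y%m%d_%H%M%S')}"
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        # Non-recursive glob, so earlier backup folders are not re-copied.
        for src in sorted(output_dir.glob(pattern)):
            dest = backup_dir / src.name
            shutil.copy2(src, dest)  # copy2 keeps timestamps
            copied.append(dest)
    return copied
```

Usage would be something like backup_outputs("DepressionProject/output") right before python -u -m DepressionProject.run_cleaner.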
python -m DepressionProject.run_encode DepressionProject/output/data_cleaned.csv DepressionProject/output/data_cleaned_encoded.csv
- Move the output files into the output folder, separated by age (include the PNGs, etc.)
- Use script
python -u -m DepressionProject.run_univariate \
./DepressionProject/output/pval_filter_60_MVI/output_12to18_yesmental/results.pkl \
./DepressionProject/output/pval_filter_60_MVI/output_12to18_yesmental/preprocessed \
./DepressionProject/output/pval_filter_60_MVI/output_12to18_yesmental/data_cleaned_encoded.csv \
./DepressionProject/output/pval_filter_60_MVI/output_12to18_yesmental/balanced_accuracy \
y12to18_Dep_YN_216m \
--use-balanced-accuracy
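The --use-balanced-accuracy flag presumably switches the scoring metric to balanced accuracy (the flag name comes from the command above; its exact effect inside run_univariate is an assumption). For reference, balanced accuracy is the mean of per-class recall, which for 0/1 labels is (sensitivity + specificity) / 2. It matters here because depression labels are imbalanced: in the toy example below, plain accuracy is 7/8 = 0.875, while balanced accuracy is 0.75.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (standard, non-adjusted definition)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.75
```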
python -u -m DepressionProject.fix_embed_colors \
./DepressionProject/output/pval_filter_60_MVI/output_12_yesmental/results.pkl \
./DepressionProject/output/pval_filter_60_MVI/output_12_yesmental/preprocessed \
./DepressionProject/output/pval_filter_60_MVI/output_12_yesmental/data_cleaned_encoded.csv \
./DepressionProject/output/pval_filter_60_MVI/output_12_yesmental/ \
y12CH_Dep_YN_144m
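Running the same command for every age folder by hand is error-prone. The sketch below builds the fix_embed_colors argument list per age, following the argument order of the command above. It assumes that pval_filter_60_MVI contains output_16_yesmental, output_17_yesmental, and output_18_yesmental folders laid out like output_12_yesmental; the age-to-label mapping is taken from the run_tsne_use_rfe_results_all command further down. By default it only prints the commands.

```python
import shlex

ROOT = "./DepressionProject/output/pval_filter_60_MVI"
# Age -> outcome label, as used elsewhere in this README.
LABELS = {
    12: "y12CH_Dep_YN_144m",
    16: "y16CH_Dep_YN_192m",
    17: "y17CH_Dep_YN_204m",
    18: "y18CH_Dep_YN_216m",
}


def fix_embed_colors_cmd(age):
    """Build the fix_embed_colors command for one (assumed) age folder."""
    folder = f"{ROOT}/output_{age}_yesmental"
    return [
        "python", "-u", "-m", "DepressionProject.fix_embed_colors",
        f"{folder}/results.pkl",
        f"{folder}/preprocessed",
        f"{folder}/data_cleaned_encoded.csv",
        f"{folder}/",
        LABELS[age],
    ]


for age in LABELS:
    cmd = fix_embed_colors_cmd(age)
    print(shlex.join(cmd))  # dry run: show the command only
    # subprocess.run(cmd, check=True)  # uncomment (and import subprocess) to run
```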
Run make_readable_all_var_sorted.py to make the description column of every vars_sorted_dir_ranked_rounded.csv more readable
python -u -m DepressionProject.make_readable_all_var_sorted ./DepressionProject/output/pval_filter_60_MVI
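The "all" in the script name suggests it processes every age subfolder under the directory it is given. Assuming that, the files it would touch can be listed with a recursive search like the hypothetical helper below, which is handy for checking what will be modified before running it.

```python
from pathlib import Path


def find_var_sorted_files(root):
    """Return every vars_sorted_dir_ranked_rounded.csv below root, sorted."""
    return sorted(Path(root).rglob("vars_sorted_dir_ranked_rounded.csv"))


for path in find_var_sorted_files("./DepressionProject/output/pval_filter_60_MVI"):
    print(path)
```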
After inspecting the Pearson heatmap, run make_readable_heatmapcsv.py if you have a pearson.csv of x and y variables that are highly correlated or anticorrelated
python -u -m DepressionProject.make_readable_heatmapcsv ./DepressionProject/output/rfe_pearson_spearman/output_12_yesmental
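Picking out the highly correlated or anticorrelated pairs from a Pearson matrix amounts to scanning the upper triangle for large absolute values. The sketch below does that with plain lists; the 0.8 cutoff and the variable names are hypothetical, and this is not how make_readable_heatmapcsv.py is implemented.

```python
def strongly_correlated_pairs(names, matrix, threshold=0.8):
    """List (x, y, r) from the upper triangle where |r| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # skip the diagonal and duplicates
            r = matrix[i][j]
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], r))
    # Strongest relationships first, regardless of sign.
    return sorted(pairs, key=lambda p: -abs(p[2]))


names = ["a", "b", "c"]
matrix = [
    [1.0, 0.9, -0.85],
    [0.9, 1.0, 0.1],
    [-0.85, 0.1, 1.0],
]
print(strongly_correlated_pairs(names, matrix))
# [('a', 'b', 0.9), ('a', 'c', -0.85)]
```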
python -u -m DepressionProject.get_unique_fts ./DepressionProject/output/pval_filter_60_MVI
python -u -m DepressionProject.rank_pearson_rfe ./DepressionProject/output/pval_filter_60_MVI
python -u -m DepressionProject.run_tsne_cluster \
./DepressionProject/output/pval_filter_60_MVI/output_12to18_yesmental/results.pkl \
./DepressionProject/output/pval_filter_60_MVI/output_12to18_yesmental/preprocessed \
./DepressionProject/output/pval_filter_60_MVI/output_12to18_yesmental/data_cleaned_encoded.csv \
./DepressionProject/output/pval_filter_60_MVI/output_12to18_yesmental/f1 \
y12to18_Dep_YN_216m
python -u -m DepressionProject.plot_rfe_jaccard \
./DepressionProject/output/pval_filter_60_MVI/Supplementary\ Spreadsheet\ 3.xlsx \
./DepressionProject/output/pval_filter_60_MVI/rfe_jaccard.svg
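The Jaccard index compares two feature sets as |A ∩ B| / |A ∪ B|, so 1.0 means identical RFE selections and 0.0 means no overlap. A minimal sketch of the pairwise computation is below; the set names and feature names are hypothetical, and reading the sets from the spreadsheet is left out.

```python
from itertools import combinations


def jaccard(a, b):
    """Intersection size over union size; 1.0 when both sets are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_jaccard(feature_sets):
    """Map each pair of set names to the Jaccard index of their features."""
    return {
        (x, y): jaccard(feature_sets[x], feature_sets[y])
        for x, y in combinations(sorted(feature_sets), 2)
    }


sets = {"age12": {"f1", "f2", "f3"}, "age16": {"f2", "f3", "f4"}}
print(pairwise_jaccard(sets))  # {('age12', 'age16'): 0.5}
```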
python -u -m DepressionProject.get_top_10_rfe \
./DepressionProject/output/pval_filter_60_MVI/Supplementary\ Spreadsheet\ 3.xlsx \
./DepressionProject/output/pval_filter_60_MVI/rfe_jaccard.svg
Run print_num_fts_missvalratio.py to get the number of features and the missing-value ratio before cleaning
python -u -m DepressionProject.print_num_fts_missvalratio
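For a quick cross-check of those two numbers, the sketch below counts feature columns and the fraction of empty feature cells in a CSV using only the standard library. It assumes missing values are empty cells (the real script may also treat codes such as NaN as missing) and that ID columns, here the hypothetical child_id, are excluded from the feature count.

```python
import csv
import io


def feature_count_and_missing_ratio(csv_file, id_columns=()):
    """Return (n_features, fraction of empty feature cells) for a CSV file."""
    reader = csv.DictReader(csv_file)
    features = [c for c in reader.fieldnames if c not in id_columns]
    total = missing = 0
    for row in reader:
        for col in features:
            total += 1
            # DictReader yields "" for empty cells and None for absent ones.
            if row[col] in ("", None):
                missing += 1
    return len(features), (missing / total if total else 0.0)


sample = io.StringIO("child_id,a,b\n1,1,\n2,,\n3,3,2\n")
print(feature_count_and_missing_ratio(sample, id_columns={"child_id"}))  # (2, 0.5)
```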
python reformat_ml_checkdups.py
python clean_dups.py
python check_dups.py
python -u -m DepressionProject.print_num_fts_missvalratio --path_data ./DepressionProject/output/preprocessed_data_without_temporal_checkdup_cleaned_no_info.csv
Within src/preprocess, run make_readable_pcc_sc_kendall.py and make_readable_list.py after pasting in the best RFE list and the lists from run_univariate's output
Make sure to fill in the hardcoded variables for the RFE results
python -u -m DepressionProject.run_tsne_use_rfe_results_all \
./DepressionProject/output/10MVIout/output_12_yesmental \
./DepressionProject/output/10MVIout/output_16_yesmental \
./DepressionProject/output/10MVIout/output_17_yesmental \
./DepressionProject/output/10MVIout/output_18_yesmental \
y12CH_Dep_YN_144m \
y16CH_Dep_YN_192m \
y17CH_Dep_YN_204m \
y18CH_Dep_YN_216m
python -u -m DepressionProject.run_f1_calcs_baseline_all \
./DepressionProject/output/10MVIout/output_12_yesmental \
./DepressionProject/output/10MVIout/output_16_yesmental \
./DepressionProject/output/10MVIout/output_17_yesmental \
./DepressionProject/output/10MVIout/output_18_yesmental \
y12CH_Dep_YN_144m \
y16CH_Dep_YN_192m \
y17CH_Dep_YN_204m \
y18CH_Dep_YN_216m
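One common F1 baseline, and a plausible reading of "baseline" here (an assumption the source does not confirm), is a classifier that predicts every child as depressed. Its precision equals the prevalence p and its recall is 1, so F1 = 2p / (1 + p). With a prevalence of 20%, that gives 0.4 / 1.2 ≈ 0.333, which a trained model's F1 should clearly exceed.

```python
def all_positive_f1(y_true):
    """F1 of predicting 1 for every sample: 2p / (1 + p), p = prevalence."""
    p = sum(y_true) / len(y_true)
    # With no positives, precision and recall are both 0, so F1 is 0.
    return 2 * p / (1 + p) if p > 0 else 0.0


print(round(all_positive_f1([1, 0, 0, 0, 0]), 4))  # 0.3333
```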
python -u -m DepressionProject.plot_f1_overall \
./DepressionProject/output/10MVIout/f1s.png