
NLP Transformer Model for Depression Detection #6

Draft: wants to merge 43 commits into main
Conversation

@jamespeilunli (Owner) commented Jul 19, 2024

#1

2024-07-18:

We began experimenting with different NLP methods and datasets for depression detection.

We found some promising datasets to experiment with.

Experiments:

  • Kyle: LSTM on a Kaggle Twitter dataset; proj.ipynb and proj.py
  • Daniel: Transformer on a translation dataset, with code repurposed from a Medium article; Its_transforming_time_.ipynb
  • James: BERT on both datasets, with code repurposed from a Kaggle notebook; bert_test.ipynb and bert_test2.ipynb

2024-07-20

Experimented with larger datasets

  • Daniel: used James's BERT model on the datasets made in the Datasets branch; achieved better accuracy at 96.8%. bert_test3.ipynb

2024-07-22

#11

The next few days are about improving accuracy on the big dataset.

2024-07-23

  • Daniel: modified the model to output a single scalar in [0, 1] (via a sigmoid activation) as a depression probability, instead of the previous two-element tensor. This gives one depression score per post; see the sketch after this list. Sigma Bert.ipynb
  • Kyle: tried training and running the model per user instead of per post on the big dataset. We did this by concatenating each user's posts into one string with a special separator token before tokenizing the result. In the same file, Kyle also tried averaging the per-post outputs of the earlier model. averageUserTweet.ipynb
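A minimal sketch of both of these ideas, assuming a HuggingFace bert-base-uncased backbone; DepressionScorer and concat_user_posts are illustrative names, not the notebooks' actual code:

```python
# Single-scalar sigmoid head over BERT, plus per-user concatenation
# with a separator token before tokenization.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class DepressionScorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # One output unit instead of two; sigmoid maps it into [0, 1].
        self.head = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] embedding
        return torch.sigmoid(self.head(cls)).squeeze(-1)  # one score per item

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def concat_user_posts(posts, max_len=512):
    # Join one user's posts with BERT's separator token, then tokenize once.
    text = f" {tokenizer.sep_token} ".join(posts)
    return tokenizer(text, truncation=True, max_length=max_len,
                     padding="max_length", return_tensors="pt")
```

With a single sigmoid output, training against 0/1 labels would use nn.BCELoss rather than cross-entropy over two logits.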

2024-07-24

  • Daniel: building on the per-user input idea, we will try combining a user's posts in a different way: run a separate BERT pass on each post, then feed the per-post embeddings into an LSTM that produces one final depression score. The code is finished and awaiting testing. (This was my magnum opus; it took waaaayyyyy too long, but it works.)
  • James: experimented with layer freezing (layer_freezing_test.ipynb) and more regularization, with no significant results (layer freezing trained faster, but its accuracy dropped by ~1-2%). Also expanded MAX_LEN, the number of tokens kept per string, in a successful attempt to combat overfitting by giving the model more data (BERT has many parameters and overfits easily when starved of input). This gave ~5-6% more accuracy, for a final test accuracy of 79% in bigger_data.ipynb; a freezing sketch follows this list. Tomorrow we plan to increase MAX_LEN even further, including testing on one of the older BERTs with no combining of text from the same user.
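For reference, a minimal sketch of layer freezing on the same backbone; freezing the embeddings plus the first eight of bert-base's twelve encoder layers is an illustrative choice, not necessarily what layer_freezing_test.ipynb did:

```python
# Freeze BERT's embeddings and lower encoder layers; only the upper layers
# (and any task head) keep receiving gradient updates.
import torch
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")

for param in bert.embeddings.parameters():
    param.requires_grad = False
for layer in bert.encoder.layer[:8]:  # first 8 of bert-base's 12 layers
    for param in layer.parameters():
        param.requires_grad = False

# Hand only the still-trainable parameters to the optimizer: training gets
# faster, though in our runs it cost ~1-2% accuracy.
optimizer = torch.optim.AdamW(
    (p for p in bert.parameters() if p.requires_grad), lr=2e-5)
```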

2024-07-25

After significant sacrifices in mental health, Daniel, Kyle, and James worked on debugging the LSTM + BERT. This included fixes in the encoder, optimizations in the forward function, and improving the training loop with gradient accumulation. This cut iteration time dramatically, from roughly 50 seconds per batch yesterday to about 3 seconds this evening. Work still needs to be done to verify accuracy and to increase the amount of data we can use.
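For context, a minimal sketch of gradient accumulation in a training loop, assuming the DepressionScorer sketch above; accum_steps, loader, and optimizer are stand-ins for the real training setup:

```python
# Gradient accumulation: average the loss over several small batches and
# step once, simulating a larger effective batch size in the same memory.
import torch.nn as nn

criterion = nn.BCELoss()
accum_steps = 4  # effective batch = loader batch size * accum_steps

model.train()
optimizer.zero_grad()
for step, (input_ids, attention_mask, labels) in enumerate(loader):
    scores = model(input_ids, attention_mask)
    loss = criterion(scores, labels.float()) / accum_steps
    loss.backward()  # gradients from successive batches add up
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```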

2024-07-26

Kyle + James: the error with the model was solved; it came down to unpacking the model's output tuples in the wrong order (... help me). We finally achieved consistent learning and accuracy, reaching 80%. improved.ipynb
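The notebook isn't quoted here, so as a plausible illustration only: with return_dict=False (or older transformers versions), BertModel returns a plain tuple, and unpacking it in the wrong order misroutes the tensors downstream.

```python
# BertModel with return_dict=False returns (last_hidden_state, pooler_output).
outputs = bert(input_ids, attention_mask=attention_mask, return_dict=False)

# pooler_output, last_hidden_state = outputs   # wrong order: shapes no longer
#                                              # match what the LSTM expects
last_hidden_state, pooler_output = outputs     # correct order

# last_hidden_state: (batch, seq_len, hidden); pooler_output: (batch, hidden)
```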

Then Daniel and Tad worked on a new variation of the LSTM + BERT nicknamed "tolBERT". Instead of one large LSTM with a thousand layers, it splits the outputs of the BERT models into groupings, runs a separate LSTM on each group, and combines the LSTM outputs in a single MLP. This has the advantages of minimizing vanishing gradients and offering more parameters to tune. It achieved an astonishing 82.6% testing accuracy on the MDDL dataset, approaching our target accuracy of 85%. Everyone working on the model believes that, with proper optimization, both tolBERT and the LSTM + BERT can reach the target accuracy. There is even discussion of a future ensemble model combining both. This is a huge step in our research, and we are very happy with the current results.
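A minimal sketch of the tolBERT idea as described: per-post BERT embeddings are split into groups, each group feeds its own small LSTM, and an MLP fuses the results. TolBert and all the sizes here are illustrative guesses, not the actual implementation.

```python
import torch
import torch.nn as nn

class TolBert(nn.Module):
    """Split per-post BERT embeddings into groups, run one small LSTM per
    group, and fuse the final hidden states with an MLP."""
    def __init__(self, hidden=768, num_groups=6, lstm_hidden=128, lstm_layers=5):
        super().__init__()
        self.num_groups = num_groups
        self.lstms = nn.ModuleList(
            nn.LSTM(hidden, lstm_hidden, num_layers=lstm_layers, batch_first=True)
            for _ in range(num_groups)
        )
        self.mlp = nn.Sequential(
            nn.Linear(num_groups * lstm_hidden, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, post_embs):
        # post_embs: (batch, num_posts, hidden), one BERT vector per post;
        # assumes num_posts is divisible by num_groups.
        groups = torch.chunk(post_embs, self.num_groups, dim=1)
        feats = []
        for lstm, group in zip(self.lstms, groups):
            _, (h_n, _) = lstm(group)   # h_n: (lstm_layers, batch, lstm_hidden)
            feats.append(h_n[-1])       # top layer's final hidden state
        return torch.sigmoid(self.mlp(torch.cat(feats, dim=-1))).squeeze(-1)
```

Note the nn.ModuleList holding the LSTMs; keeping them in a plain Python list is exactly the save/load pitfall described in the 2024-07-30 entry below.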

2024-07-27

Everyone did more research on optimizing model accuracy.

2024-07-28

  • Daniel: improved tolBERT to 84.4% by increasing the number of LSTM layers to 5
  • James: improved tolBERT to 84.8% with 20 epochs, 6 LSTMs, and 6 LSTM layers

2024-07-29

We hope to finalize the models today and get the files in this branch organized and ready to merge #23

  • Daniel: making a .py file for the backend to load our model and run it
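A sketch of what such a backend entry point could look like; the module name, checkpoint path, and importable DepressionScorer are all hypothetical:

```python
# predict.py (hypothetical): load the trained weights once, expose a
# function the backend can call on raw text.
import torch
from transformers import BertTokenizer

from model import DepressionScorer  # the sketch class above, hypothetically packaged

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = DepressionScorer()
model.load_state_dict(torch.load("tolbert.pt", map_location="cpu"))
model.eval()

def depression_score(text: str) -> float:
    enc = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        return model(enc["input_ids"], enc["attention_mask"]).item()
```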

2024-07-30

Fixing an issue where the saved torch model was giving us junk accuracy. We suspect the list of LSTMs was not being saved properly: submodules stored in a regular Python list are never registered with PyTorch, so they don't make it into the state_dict. Trying to fix this by storing them in torch.nn.ModuleList instead; see the sketch below.
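A minimal sketch of the suspected pitfall and the fix; Broken and Fixed are illustrative names:

```python
import torch
import torch.nn as nn

class Broken(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python list: PyTorch never registers these submodules, so
        # they are missing from state_dict() and from .parameters().
        self.lstms = [nn.LSTM(768, 128) for _ in range(6)]

class Fixed(nn.Module):
    def __init__(self):
        super().__init__()
        # ModuleList registers each LSTM; saving/loading now round-trips.
        self.lstms = nn.ModuleList(nn.LSTM(768, 128) for _ in range(6))

print(len(Broken().state_dict()))  # 0  - LSTM weights silently dropped
print(len(Fixed().state_dict()))   # 24 - 4 tensors per LSTM x 6 LSTMs
```

torch.save(model.state_dict(), ...) on the Broken variant silently omits every LSTM weight, which would match the junk accuracy seen after loading.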

@jamespeilunli (Owner) commented:

ok everyone, I just reorganized the repository so that our changes are separated into different branches and so main isn't flooded with random experiment files.

proj.py and proj.ipynb are in this branch, but they aren't showing up in the diff for some reason.


@kylebliu (Collaborator) commented Jul 21, 2024 via email

@jamespeilunli marked this pull request as ready for review July 22, 2024 16:44
@jamespeilunli marked this pull request as draft July 22, 2024 17:00
JadedTester746 and others added 2 commits July 29, 2024 14:47
Note this uses the "WEBEATTHEPAPER" model so it might have to be changed if needed
@jamespeilunli (Owner) commented:

I can't even check out this branch on the CLI because the LFS limit is full...

@jamespeilunli (Owner) commented:

Observation: the number of posts significantly impacts the depression score. I noticed this when fixing the bug where Mastodon would only give me a maximum of 40 posts instead of 60.
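The fix itself isn't shown in the thread; for context, Mastodon's API caps each statuses request at 40, so getting 60 requires pagination. A sketch using the Mastodon.py client (the instance URL, token, and helper name are assumptions):

```python
# Mastodon's API returns at most 40 statuses per request; paginate for more.
from mastodon import Mastodon

mastodon = Mastodon(api_base_url="https://mastodon.social",
                    access_token="TOKEN")  # hypothetical credentials

def fetch_posts(account_id, want=60):
    page = mastodon.account_statuses(account_id, limit=40)
    posts = list(page)
    while page is not None and len(posts) < want:
        page = mastodon.fetch_next(page)  # follows the pagination links
        if page:
            posts.extend(page)
    return posts[:want]
```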
