Request info on experiments run #7

Open
robmarkcole opened this issue Jan 8, 2024 · 2 comments

@robmarkcole

From the following config I assume a fair bit of experimentation went into arriving at these parameters. Are you able to shed light on the experiments that were run? I am seeking to compare Prithvi/Unet both when typical defaults are used and when optimised.
Many thanks

use_skipAtt: false
train_dropout_rate: 0.15

optimizer: sam
LR: 0.011
LR_policy: PolynomialLR
criterion:
    name: TverskyFocalLoss
    weight:
    - 0.0182553
    - 0.03123664
    - 0.02590038
    - 0.03026126
    - 0.04142966
    - 0.04371284
    - 0.15352935
    - 0.07286951
    - 0.10277024
    - 0.10736637
    - 0.1447082
    - 0.17132445
    - 0.0566358
    ignore_index: 0
    gamma: 0.9
@samKhallaghi
Member

As the model is big, I decided to use dropout to counter overfitting. I checked 0.1, 0.15, 0.20 and 0.25 and decided to use 0.15. If you want to optimize this, it is better to use Dropout2D instead. Conventional dropout will randomly mask individual pixels in the feature maps according to the drop rate, which might break the spatial relationships, whereas Dropout2D will mask entire feature maps (channels), which is better suited to CNN models where the spatial relationships are crucial.
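To make the difference concrete, here is a minimal sketch in plain Python (no framework, so it is not the repo's code). The function names `element_dropout` and `channel_dropout` are hypothetical, and a feature map is assumed to be a nested list of shape channels x height x width.

```python
import random

def element_dropout(fmap, p, rng):
    """Conventional dropout: zero each value independently with probability p."""
    scale = 1.0 / (1.0 - p)  # inverted-dropout rescaling of the surviving values
    return [[[v * scale if rng.random() >= p else 0.0 for v in row]
             for row in channel] for channel in fmap]

def channel_dropout(fmap, p, rng):
    """Dropout2D idea: zero an entire channel with probability p."""
    scale = 1.0 / (1.0 - p)
    out = []
    for channel in fmap:
        keep = rng.random() >= p  # a single keep/drop decision per channel
        out.append([[v * scale if keep else 0.0 for v in row] for row in channel])
    return out

rng = random.Random(0)
fmap = [[[1.0] * 4 for _ in range(4)] for _ in range(8)]  # 8 channels of 4x4 ones
pixelwise = element_dropout(fmap, 0.15, rng)    # holes scattered inside channels
channelwise = channel_dropout(fmap, 0.15, rng)  # channels are all-kept or all-zero
```

With the channel-wise variant every surviving channel keeps its full spatial layout, which is the property described above.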

The Sharpness-Aware Minimization (SAM) optimizer helped a lot in stabilizing the loss curve. An alternative that I have not tested might be AdamW, but it needs tuning of the weight decay parameter.
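For readers unfamiliar with SAM, the sketch below shows its two-step update on a toy quadratic loss: first move the weights a distance `rho` along the normalized gradient, then apply the gradient measured at that perturbed point to the original weights. This is plain Python rather than the optimizer used in the repo; `sam_step`, `toy_loss`, `toy_grad`, and the `lr` and `rho` values are all assumptions for illustration.

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update with plain gradient descent as the base optimizer."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid division by zero
    # Step 1: climb to the approximate worst-case point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: descend from the ORIGINAL weights using the gradient at w_adv.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

def toy_loss(w):
    return sum(wi * wi for wi in w)

def toy_grad(w):
    return [2.0 * wi for wi in w]

w = [1.0, -2.0]
for _ in range(50):
    w = sam_step(w, toy_grad)
```

Each step costs two gradient evaluations, which is the usual price of SAM compared with a single-pass optimizer such as AdamW.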

For the learning rate policy, I tested "StepLR" and "MultiStepLR" (both with the default settings in the repo) besides "PolynomialLR", and preferred PolynomialLR. Unfortunately, I did not get a chance to try more policies or to experiment extensively with the configuration of the ones that I tested.
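As a rough illustration of what a polynomial policy does, the sketch below decays the learning rate from its initial value to zero over a fixed number of steps. The starting value 0.011 comes from the config above; `polynomial_lr`, `total_steps` and `power` are assumptions, and the repo's own PolynomialLR may differ (for example by using a minimum learning rate).

```python
def polynomial_lr(base_lr, step, total_steps, power=1.0):
    """Learning rate after `step` steps of polynomial decay towards zero."""
    step = min(step, total_steps)  # hold the final value once the schedule ends
    return base_lr * (1.0 - step / total_steps) ** power

# Sample the schedule every 25 steps; power=0.9 is a hypothetical choice.
schedule = [polynomial_lr(0.011, s, total_steps=100, power=0.9)
            for s in range(0, 101, 25)]
```

Unlike StepLR and MultiStepLR, which drop the rate at discrete milestones, this schedule changes smoothly at every step.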

For the loss function, I only tested "CrossEntropy" besides "TverskyFocalLoss", and the latter performed much better. TFL has two important parameters: alpha, which controls the trade-off between precision and recall (as the value gets higher, the loss weighs false positives more heavily), and gamma, which down-weights easy-to-classify samples and focuses more on harder ones.
The chosen value of 0.9 was based on previous experience with land cover mapping with Landsat, and in this context it was only tested against the default value of 1.33.
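The roles of alpha and gamma can be seen in a minimal single-class sketch, written in plain Python and not taken from the repo. It assumes the false-negative weight is `1 - alpha` and that the focal term is `(1 - TI) ** (1 / gamma)`, which is one common formulation; some implementations use `gamma` directly as the exponent, and class weights and `ignore_index` are left out. The function name, the `alpha` default and `smooth` are hypothetical.

```python
def tversky_focal_loss(probs, targets, alpha=0.5, gamma=0.9, smooth=1e-6):
    """Focal Tversky loss for one class over flat lists of probabilities and 0/1 labels."""
    beta = 1.0 - alpha  # assumed weight on false negatives
    tp = sum(p * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    tversky = (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)
    # Clamp at zero so rounding error can never produce a negative base.
    return max(0.0, 1.0 - tversky) ** (1.0 / gamma)

# One true positive and one false positive: a higher alpha penalizes this more.
probs, targets = [1.0, 1.0, 0.0, 0.0], [1, 0, 0, 0]
loss_low_alpha = tversky_focal_loss(probs, targets, alpha=0.3)
loss_high_alpha = tversky_focal_loss(probs, targets, alpha=0.7)
```

Under these assumptions, raising alpha increases the loss for predictions dominated by false positives and lowers it for predictions dominated by false negatives.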

There is also the possibility of including a simple additive co-attention on the skip connections of the Unet model. I tested it only once, during the early stages of the training experiments when we were testing a 17-class segmentation, and decided not to use it as the model was not performing well on some of the classes. After changing the dataset to the current version I did not get a chance to try it again, as it requires optimizing the other hyper-parameters as well.

@robmarkcole
Author

@samKhallaghi many thanks for the details - lots of avenues to explore
