New initial assumption #1350

Open
wants to merge 43 commits into
base: master

dmitryglhf
Collaborator

@dmitryglhf dmitryglhf commented Dec 2, 2024

This is a 🔨 code refactoring.

Summary

This PR introduces the following key updates:

  • New Initial Assumptions: Updates initial assumptions by adding boosting-based solutions.

    Comparison table between old and new assumptions (validated on automlbenchmark):

       Metric (mean)      main      gbm
    0  auc                0.869263  0.870476
    1  acc                0.84667   0.839198
    2  balacc             0.805336  0.816215
    3  logloss            0.449189  0.407661
    4  training_duration  256.164   1013.27
    Full table and Details
    # task auc_main auc_gbm acc_main acc_gbm balacc_main balacc_gbm logloss_main logloss_gbm training_duration_main training_duration_gbm
    0 Australian 0.948217 0.929117 0.898551 0.869565 0.898981 0.872666 0.321673 0.377569 5.7 7.6
    1 Australian 0.918081 0.914686 0.855072 0.855072 0.847623 0.853565 0.374966 0.39668 5 6.8
    2 Australian 0.904924 0.914686 0.869565 0.855072 0.866723 0.853565 0.372005 0.397505 4.9 7.6
    3 Australian 0.937606 0.924873 0.84058 0.869565 0.843379 0.872666 0.346305 0.380288 4.8 7.5
    4 Australian 0.966893 0.970713 0.927536 0.956522 0.925297 0.954584 0.284554 0.208328 4.8 7.8
    5 Australian 0.966044 0.955857 0.898551 0.884058 0.893039 0.885823 0.303967 0.312517 5 7.4
    6 Australian 0.911715 0.888795 0.84058 0.797101 0.840407 0.803905 0.375949 0.539201 5.2 7.9
    7 Australian 0.94188 0.932479 0.884058 0.869565 0.878205 0.869231 0.350762 0.353845 5.2 6.5
    8 Australian 0.967949 0.887179 0.855072 0.869565 0.841026 0.857692 0.315267 0.370713 5.1 6.1
    9 Australian 0.923077 0.906838 0.84058 0.855072 0.835897 0.85641 0.35447 0.370971 5.2 6.7
    10 blood-transfusion 0.720273 0.711014 0.76 0.733333 0.595029 0.55848 0.59012 0.535162 4.9 4.8
    11 blood-transfusion 0.738304 0.813353 0.72 0.8 0.568713 0.678363 0.533596 0.441131 4.8 4.6
    12 blood-transfusion 0.64425 0.648148 0.76 0.76 0.576023 0.576023 1.03492 0.543418 5.1 4.6
    13 blood-transfusion 0.604776 0.699318 0.706667 0.773333 0.597953 0.641813 1.56405 0.535749 4.9 5.3
    14 blood-transfusion 0.793372 0.85039 0.8 0.813333 0.640351 0.649123 0.438392 0.416505 4.6 5
    15 blood-transfusion 0.653509 0.644737 0.706667 0.786667 0.540936 0.612573 1.01007 0.558623 4.7 4.8
    16 blood-transfusion 0.674464 0.739766 0.76 0.8 0.614035 0.697368 1.4769 0.486198 4.6 4.6
    17 blood-transfusion 0.606725 0.660331 0.733333 0.733333 0.596491 0.596491 0.647909 0.557038 4.5 5.1
    18 blood-transfusion 0.724458 0.771414 0.77027 0.756757 0.665119 0.656347 1.46817 0.474645 4.7 4.6
    19 blood-transfusion 0.663571 0.696078 0.743243 0.756757 0.585655 0.635707 1.48337 0.521154 4.7 4.7
    20 car nan nan 0.722543 0.757225 0.60015 0.738428 0.98549 0.463133 4.9 12.3
    21 car nan nan 0.739884 0.774566 0.704054 0.739779 0.565073 0.452854 4.9 12.4
    22 car nan nan 0.739884 0.751445 0.569484 0.69837 0.528267 0.464546 5.3 13.1
    23 car nan nan 0.739884 0.734104 0.692652 0.633586 0.503071 0.487186 4.6 10.6
    24 car nan nan 0.682081 0.716763 0.493273 0.523721 0.584056 0.509841 4.7 13.6
    25 car nan nan 0.757225 0.734104 0.652667 0.480675 0.582897 0.503798 5.2 12.1
    26 car nan nan 0.601156 0.65896 0.358891 0.514144 0.739626 0.616918 5 13.3
    27 car nan nan 0.710983 0.722543 0.685383 0.664893 0.646046 0.570362 5.3 14.4
    28 car nan nan 0.69186 0.726744 0.658255 0.637003 0.587998 0.51466 5 11.1
    29 car nan nan 0.767442 0.767442 0.721217 0.731682 0.586961 0.448114 4.9 9.5
    30 christine 0.80515 0.757547 0.717712 0.723247 0.717712 0.723247 0.547611 0.584064 454.2 1491.3
    31 christine 0.807499 0.812503 0.745387 0.739852 0.745387 0.739852 0.547159 0.542757 399.7 1482.5
    32 christine 0.804367 0.752808 0.732472 0.706642 0.732472 0.706642 0.552923 0.591433 421 1526.4
    33 christine 0.78062 0.791656 0.706642 0.715867 0.706642 0.715867 0.567128 0.57349 443.6 1624.4
    34 christine 0.80033 0.797988 0.708487 0.715867 0.708487 0.715867 0.555783 0.556033 432.5 1556.9
    35 christine 0.794815 0.798941 0.728782 0.738007 0.728782 0.738007 0.561063 0.553597 405.7 1496.1
    36 christine 0.795605 0.801841 0.732472 0.747232 0.732472 0.747232 0.561305 0.552886 428.3 1569.8
    37 christine 0.802923 0.779619 0.725092 0.732472 0.725092 0.732472 0.548977 0.559119 434.9 1546.2
    38 christine 0.800506 0.764794 0.722736 0.722736 0.722632 0.722707 0.558004 0.5741 428 1568.8
    39 christine 0.748114 0.715068 0.695009 0.663586 0.695141 0.663701 0.593076 0.65323 420.1 1462.1
    40 cnae-9 nan nan 0.962963 0.925926 0.962963 0.925926 0.293401 0.237987 27.7 3140.7
    41 cnae-9 nan nan 0.861111 0.833333 0.861111 0.833333 0.414649 0.533415 26 3074.4
    42 cnae-9 nan nan 0.935185 0.944444 0.935185 0.944444 0.326 0.199649 26.8 3091.4
    43 cnae-9 nan nan 0.851852 0.833333 0.851852 0.833333 0.443444 0.514255 26.2 3072.1
    44 cnae-9 nan nan 0.916667 0.935185 0.916667 0.935185 0.386968 0.263697 26.9 2624.8
    45 cnae-9 nan nan 0.916667 0.925926 0.916667 0.925926 0.317146 0.308698 26.1 3186.2
    46 cnae-9 nan nan 0.898148 0.898148 0.898148 0.898148 0.348513 0.383307 26.6 3636.4
    47 cnae-9 nan nan 0.888889 0.888889 0.888889 0.888889 0.365092 0.358372 29.3 3336.5
    48 cnae-9 nan nan 0.916667 0.898148 0.916667 0.898148 0.340371 0.281255 26.6 3331.1
    49 cnae-9 nan nan 0.907407 0.953704 0.907407 0.953704 0.330015 0.173771 26.1 3293.6
    50 credit-g 0.797143 0.841905 0.77 0.79 0.67381 0.745238 0.486819 0.490703 5.2 10.2
    51 credit-g 0.73 0.695476 0.73 0.73 0.607143 0.635714 0.525489 0.560589 5.2 9.6
    52 credit-g 0.737381 0.685238 0.75 0.73 0.640476 0.654762 0.531201 0.548282 5 12
    53 credit-g 0.745476 0.774762 0.78 0.71 0.671429 0.697619 0.53252 0.545789 4.9 9.2
    54 credit-g 0.754762 0.755238 0.71 0.74 0.57381 0.642857 0.527182 0.565986 6.1 10.7
    55 credit-g 0.814524 0.775 0.74 0.78 0.614286 0.690476 0.478908 0.51608 4.6 9.6
    56 credit-g 0.799286 0.76881 0.76 0.76 0.657143 0.704762 0.486235 0.516084 4.8 11
    57 credit-g 0.788571 0.765476 0.76 0.73 0.609524 0.654762 0.498715 0.526922 5.6 9.8
    58 credit-g 0.786429 0.737143 0.78 0.74 0.661905 0.661905 0.509098 0.55564 4.5 9.3
    59 credit-g 0.752381 0.707143 0.75 0.66 0.621429 0.633333 0.526 0.588445 4.6 10.7
    60 dilbert nan nan 0.959 0.927 0.95927 0.927283 0.37313 0.243046 3071.8 5732.7
    61 dilbert nan nan 0.969 0.925 0.969026 0.924601 0.355981 0.217827 3088.2 5736.9
    62 dilbert nan nan 0.958 0.913 0.957988 0.913234 0.368037 0.235823 3087.7 5631.3
    63 dilbert nan nan 0.968 0.899 0.96811 0.898918 0.364057 0.279577 3176.9 5684.8
    64 dilbert nan nan 0.975 0.907 0.975163 0.907303 0.353609 0.272681 3019.5 5705.7
    65 dilbert nan nan 0.968 0.906 0.967995 0.905929 0.35037 0.258641 3039.6 5649.5
    66 dilbert nan nan 0.967 0.904 0.967146 0.904641 0.361488 0.302438 3071.4 5829.5
    67 dilbert nan nan 0.961 0.929 0.961348 0.92896 0.371518 0.208034 3047.6 5758.3
    68 dilbert nan nan 0.975 0.924 0.975014 0.924035 0.341111 0.204101 3033.5 5761.9
    69 dilbert nan nan 0.974 0.917 0.974349 0.917206 0.353983 0.276392 3062.3 5759.6
    70 fabert nan nan 0.694175 0.706311 0.655998 0.678044 0.919977 0.962891 493.9 5205.6
    71 fabert nan nan 0.705097 0.707524 0.673712 0.688732 0.919193 0.977101 490.7 5374
    72 fabert nan nan 0.696602 0.679612 0.663541 0.658196 0.880917 1.05394 500.4 5247.5
    73 fabert nan nan 0.696602 0.690534 0.670755 0.675512 0.885773 1.02187 497.9 4908.3
    74 fabert nan nan 0.707524 0.703883 0.671923 0.690287 1.10883 0.982583 513.9 4936.2
    75 fabert nan nan 0.701456 0.724515 0.664636 0.703494 0.943119 0.913164 485.9 4827.9
    76 fabert nan nan 0.691748 0.705097 0.657243 0.687148 0.991882 1.0006 504.1 5273.9
    77 fabert nan nan 0.686513 0.687728 0.652982 0.66752 0.972647 1.0143 492.7 5151.2
    78 fabert nan nan 0.693803 0.698663 0.664057 0.684369 1.05133 1.00813 485.4 5118.8
    79 fabert nan nan 0.679222 0.682868 0.64159 0.656184 1.03787 1.00714 508.7 4880.7
    80 jasmine 0.865996 0.847047 0.822742 0.772575 0.822282 0.77255 0.407403 0.493839 11.1 19.8
    81 jasmine 0.882617 0.860694 0.792642 0.762542 0.792237 0.762506 0.510891 0.477033 10.8 19.1
    82 jasmine 0.911051 0.862461 0.859532 0.809365 0.859955 0.809553 0.358449 0.458768 11.3 21.3
    83 jasmine 0.864228 0.841924 0.822742 0.785953 0.823177 0.786219 0.402974 0.484679 11.9 17.5
    84 jasmine 0.891446 0.85111 0.791946 0.775168 0.791946 0.775168 0.522546 0.496324 11.5 19.3
    85 jasmine 0.874465 0.870974 0.788591 0.802013 0.788591 0.802013 0.414509 0.46007 10.7 18.1
    86 jasmine 0.875681 0.875208 0.818792 0.815436 0.818792 0.815436 0.407578 0.432216 11.1 18.4
    87 jasmine 0.851583 0.850998 0.778523 0.785235 0.778523 0.785235 0.548317 0.470177 10.5 18.7
    88 jasmine 0.873789 0.891852 0.842282 0.812081 0.842282 0.812081 0.406978 0.43201 10.5 18.3
    89 jasmine 0.88451 0.877438 0.808725 0.798658 0.808725 0.798658 0.389047 0.439971 10.8 18
    90 kc1 0.829172 nan 0.881517 nan 0.699197 nan 0.622305 nan 5.4 9.5
    91 kc1 0.797137 nan 0.876777 nan 0.657909 nan 0.788903 nan 5.3 8.5
    92 kc1 0.865398 nan 0.843602 nan 0.574197 nan 0.303443 nan 5.9 9.3
    93 kc1 0.841164 nan 0.838863 nan 0.583589 nan 0.472707 nan 5.6 7.7
    94 kc1 0.765918 nan 0.805687 nan 0.539241 nan 0.704489 nan 5.6 7.8
    95 kc1 0.794774 nan 0.834123 nan 0.58078 nan 0.65995 nan 6 7.6
    96 kc1 0.835972 nan 0.867299 nan 0.63747 nan 0.466631 nan 5.5 8.1
    97 kc1 0.830524 nan 0.872038 nan 0.677307 nan 0.3388 nan 5.4 7.7
    98 kc1 0.870531 nan 0.881517 nan 0.682925 nan 0.2923 nan 6 7.7
    99 kc1 0.787746 nan 0.828571 nan 0.56566 nan 0.671742 nan 5.5 8.6
    100 kr-vs-kp 0.999393 0.999922 0.9875 0.990625 0.987202 0.990196 0.0778635 0.0180881 6.9 17.3
    101 kr-vs-kp 0.992681 0.995304 0.98125 0.978125 0.982036 0.978494 0.0963221 0.105302 8 17.1
    102 kr-vs-kp 0.999256 0.999256 0.99375 0.990625 0.993464 0.990196 0.0805282 0.0390096 7.1 17
    103 kr-vs-kp 0.999883 0.999824 0.990625 0.990625 0.990744 0.991018 0.0756029 0.0281091 6.8 16.8
    104 kr-vs-kp 1 1 1 1 1 1 0.0625037 0.00889091 7.3 17.4
    105 kr-vs-kp 0.999941 1 0.990625 1 0.991018 1 0.0646797 0.00874697 6.5 16.5
    106 kr-vs-kp 0.99561 0.997795 0.987461 0.99373 0.987184 0.99372 0.0862045 0.0350546 6.8 14.9
    107 kr-vs-kp 0.994938 0.998838 0.984326 0.987461 0.983553 0.987138 0.0964548 0.0609861 7.4 16.3
    108 kr-vs-kp 0.999114 0.999724 0.984326 0.99373 0.984144 0.993717 0.0797152 0.0251634 6.9 15.3
    109 kr-vs-kp 0.999921 0.999803 0.996865 0.99373 0.997006 0.994012 0.0666536 0.0271778 6.7 13.6
    110 mfeat-factors nan nan 0.965 0.91 0.965 0.91 0.258424 0.292598 11 390.9
    111 mfeat-factors nan nan 0.965 0.91 0.965 0.91 0.256913 0.2551 11 488.7
    112 mfeat-factors nan nan 0.98 0.92 0.98 0.92 0.226164 0.301555 11 490.8
    113 mfeat-factors nan nan 0.96 0.91 0.96 0.91 0.265162 0.274168 11.3 487.4
    114 mfeat-factors nan nan 0.945 0.87 0.945 0.87 0.464267 0.411958 10.9 488
    115 mfeat-factors nan nan 0.965 0.91 0.965 0.91 0.253705 0.331723 11 485
    116 mfeat-factors nan nan 0.96 0.91 0.96 0.91 0.249729 0.289065 11.7 491
    117 mfeat-factors nan nan 0.955 0.92 0.955 0.92 0.221254 0.271884 11 480.5
    118 mfeat-factors nan nan 0.965 0.91 0.965 0.91 0.251814 0.283761 11.1 481.2
    119 mfeat-factors nan nan 0.975 0.93 0.975 0.93 0.258493 0.288244 11.7 482.8
    120 phoneme 0.960972 0.956905 0.907579 0.885397 0.888661 0.878462 0.290749 0.295581 5.7 9.2
    121 phoneme 0.970118 0.961449 0.924214 0.913124 0.907784 0.901767 0.221067 0.254949 5.7 8.6
    122 phoneme 0.953168 0.929476 0.890943 0.866913 0.858523 0.834165 0.253817 0.346479 9 8.7
    123 phoneme 0.953991 0.941478 0.892791 0.859519 0.869011 0.834436 0.30981 0.320529 5.9 8.3
    124 phoneme 0.941091 0.927182 0.887037 0.866667 0.862632 0.840811 0.39088 0.346186 5.9 9.6
    125 phoneme 0.964983 0.952482 0.918519 0.888889 0.90344 0.878786 0.232069 0.279442 6.1 10.3
    126 phoneme 0.978229 0.960766 0.931481 0.9 0.912602 0.87365 0.196456 0.248106 5.7 9.3
    127 phoneme 0.962887 0.954926 0.914815 0.9 0.891544 0.87365 0.232282 0.263625 5.9 10
    128 phoneme 0.952649 0.935951 0.9 0.862963 0.868667 0.833259 0.312718 0.341778 6.2 8.7
    129 phoneme 0.960968 0.948381 0.907407 0.907407 0.870252 0.873917 0.23324 0.262606 5.7 9.7
    130 segment nan nan 0.982684 0.978355 0.982684 0.978355 0.248126 0.109847 5.4 112
    131 segment nan nan 0.965368 0.969697 0.965368 0.969697 0.133645 0.15302 5.9 162.6
    132 segment nan nan 0.974026 0.961039 0.974026 0.961039 0.0903909 0.0976559 5.5 117.4
    133 segment nan nan 0.974026 0.965368 0.974026 0.965368 0.109198 0.150524 5.3 90
    134 segment nan nan 0.991342 0.987013 0.991342 0.987013 0.0774718 0.0525901 6 110.4
    135 segment nan nan 0.95671 0.969697 0.95671 0.969697 0.102213 0.122378 5.5 92.3
    136 segment nan nan 0.974026 0.978355 0.974026 0.978355 0.105654 0.0977534 5.1 91.6
    137 segment nan nan 0.95671 0.952381 0.95671 0.952381 0.291009 0.139675 5.9 77.6
    138 segment nan nan 0.961039 0.969697 0.961039 0.969697 0.135033 0.115094 5.2 72.5
    139 segment nan nan 0.969697 0.978355 0.969697 0.978355 0.117298 0.0837299 5.4 67.8
    140 sylvine 0.973082 0.974678 0.920078 0.931774 0.920013 0.931717 0.214351 0.210881 6.7 11.2
    141 sylvine 0.985682 0.986412 0.94152 0.947368 0.941444 0.947326 0.165776 0.164922 6.9 10.9
    142 sylvine 0.982627 0.980408 0.939571 0.945419 0.939613 0.945472 0.176468 0.182448 6.5 10.6
    143 sylvine 0.983828 0.969518 0.94152 0.94347 0.941551 0.943504 0.172237 0.192741 6.3 12.8
    144 sylvine 0.980194 0.980942 0.931641 0.943359 0.931641 0.943359 0.183483 0.174067 7.1 10.7
    145 sylvine 0.974785 0.975784 0.925781 0.941406 0.925781 0.941406 0.196787 0.19208 6.6 11.9
    146 sylvine 0.976357 0.982239 0.931641 0.945312 0.931641 0.945312 0.197862 0.172008 6.5 12.3
    147 sylvine 0.982697 0.980759 0.927734 0.951172 0.927734 0.951172 0.185382 0.167868 6.9 11.3
    148 sylvine 0.981453 0.988815 0.943359 0.947266 0.943359 0.947266 0.179784 0.153231 6.8 11.6
    149 sylvine 0.983498 0.990608 0.933594 0.949219 0.933594 0.949219 0.180598 0.15897 6.6 10.6
    150 vehicle nan nan 0.776471 0.835294 0.77868 0.834361 0.507338 0.593526 5.3 9.3
    151 vehicle nan nan 0.705882 0.658824 0.70882 0.661688 0.54274 0.798171 4.5 9.9
    152 vehicle nan nan 0.694118 0.705882 0.69697 0.710498 0.582163 0.750613 4.8 9.4
    153 vehicle nan nan 0.823529 0.788235 0.82408 0.788907 0.465199 0.562478 5.1 9
    154 vehicle nan nan 0.717647 0.741176 0.719643 0.741126 0.52135 0.645659 4.3 9.6
    155 vehicle nan nan 0.694118 0.752941 0.698052 0.757035 0.529287 0.640406 4.5 10
    156 vehicle nan nan 0.702381 0.761905 0.715368 0.768598 0.540311 0.601441 4.3 9.1
    157 vehicle nan nan 0.833333 0.797619 0.837121 0.800812 0.472727 0.571386 5 9.4
    158 vehicle nan nan 0.75 0.714286 0.75 0.714827 0.492527 0.73362 4.6 8.9
    159 vehicle nan nan 0.714286 0.690476 0.71369 0.6875 0.575806 0.740775 4.4 8.7

    On kc1 (openml.org/t/3917), the benchmark run hit an issue:

    image
    However, it runs successfully in a custom launch:

    Accuracy  AUC       Log_loss  Balacc
    0.829023  0.716955  0.426545  0.600246
  • Bug Fix: Resolved issues in the convert_to_dataframe method for both XGBoost and LightGBM models:

    ValueError: Length of values (100) does not match length of index (50)
    

    A reproduction of the bug is available here:
    https://colab.research.google.com/drive/1r09xVZeVYSaTcQmG8-0r0RJyPfGesqq9?usp=sharing

    This happens specifically with XGBoost and LightGBM because of the convert_to_dataframe method, when a container is passed as the target, as in model.fit(features=X_train, target=y_train), instead of model.fit(features=train, target='target').
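The error text quoted above is the message pandas raises when a column of one length is assigned to a frame whose index has a different length. The sketch below only reproduces that message in isolation. It is not FEDOT's convert_to_dataframe, and the frame shape, the column names, and the 100-element target are assumptions chosen to match the numbers in the traceback.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 50 feature rows, but a target container with 100 values.
features = pd.DataFrame(np.zeros((50, 2)), columns=["f0", "f1"])
target = np.zeros(100)

try:
    # Attaching a target whose length differs from the frame index fails.
    features["target"] = target
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

A length check on the target before building the frame would surface this kind of mismatch earlier, although whether that matches the actual fix in this PR is not shown here.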

Context

Closes #1341

@dmitryglhf dmitryglhf requested a review from nicl-nno December 2, 2024 15:54
pep8speaks commented Dec 2, 2024

Hello @dmitryglhf! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2025-01-07 15:38:49 UTC

RIDGE = 'ridge'

# Parameters of models
models_params = {
Collaborator

Are these hyperparameters known to be effective?

Collaborator Author

@dmitryglhf dmitryglhf Dec 3, 2024

When I tested pipelines on Kaggle with these hyperparameters, the metric improved slightly. I stopped tuning once the speed dropped significantly or the quality declined. For CatBoost and the linear models, the parameters were left at their defaults: they already showed good results, and tuning either made the metric worse or increased the running time (for CatBoost).

.add_branch((CATBOOSTREG, models_params[CATBOOSTREG]),
            (XGBOOSTREG, models_params[XGBOOSTREG]),
            (LGBMREG, models_params[LGBMREG])) \
.join_branches(CATBOOSTREG, models_params[CATBOOSTREG])
Collaborator

Why are the models joined by CATBOOSTREG rather than by a linear model?

Collaborator Author

I tried joining with Random Forest and with linear models. In both cases the metric got slightly worse. The result was similar when these models were placed before the branching. This testing was done on Kaggle, not on the full benchmark. I think I should run more tests here with a linear model as the joining node.

Collaborator

Well, the presentation praised the linear model specifically. Of course, both variants could be added to the population; with caching, that is not very expensive computationally.

Collaborator Author

OK, I will test this.

Collaborator

nicl-nno commented Dec 2, 2024

image

Which model was here before, given that the quality dropped this much?

Collaborator Author

dmitryglhf commented Dec 3, 2024

image

> Which model was here before, given that the quality dropped this much?

Previously it was Scaling + RandomForest.

Collaborator

nicl-nno commented Dec 3, 2024

> Previously it was Scaling + RandomForest.

It seems to still be in the initial population. Is it not used at all, or does it simply lose out during selection?

@dmitryglhf
Collaborator Author

> > Previously it was Scaling + RandomForest.

> It seems to still be in the initial population. Is it not used at all, or does it simply lose out during selection?

It is there now, but it was not used: only the 'gbm' assumption was tested in the benchmark.

Collaborator Author

dmitryglhf commented Dec 11, 2024

   Metric             main      gbm_linear  linear_gbm_linear  linear_gbm_catboost
0  auc                0.869263  0.879204    0.853338           0.848935
1  acc                0.84667   0.851591    0.826598           0.821353
2  balacc             0.805336  0.823641    0.79243            0.793106
3  logloss            0.449189  0.381866    0.504397           0.492602
4  training_duration  242.554   1007.77     86.6531            99.8162
Full table and Details

Pipelines (ridge replaces logit in regression tasks):
'main' assumption:
image

'gbm_linear' assumption:
image

'linear_gbm_linear' assumption:
image

'linear_gbm_catboost' assumption:
image

main_vs_gl.csv
main_vs_lgl.csv
main_vs_lgc.csv

I chose gbm_linear as the resulting assumption.
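One way to read the comparison table in this comment is to pick the best assumption per metric. The sketch below does that with the mean values copied from the table. The direction of each metric (higher is better for auc, acc, and balacc; lower is better for logloss and training_duration) and the helper name best_per_metric are assumptions made for illustration, not part of the PR.

```python
# Mean metrics copied from the comparison table; keys are assumption names.
results = {
    "main": {"auc": 0.869263, "acc": 0.84667, "balacc": 0.805336,
             "logloss": 0.449189, "training_duration": 242.554},
    "gbm_linear": {"auc": 0.879204, "acc": 0.851591, "balacc": 0.823641,
                   "logloss": 0.381866, "training_duration": 1007.77},
    "linear_gbm_linear": {"auc": 0.853338, "acc": 0.826598, "balacc": 0.79243,
                          "logloss": 0.504397, "training_duration": 86.6531},
    "linear_gbm_catboost": {"auc": 0.848935, "acc": 0.821353, "balacc": 0.793106,
                            "logloss": 0.492602, "training_duration": 99.8162},
}

# Assumption: these metrics should be maximised, all others minimised.
HIGHER_IS_BETTER = {"auc", "acc", "balacc"}


def best_per_metric(table):
    """Return {metric: name of the assumption with the best mean value}."""
    metrics = next(iter(table.values())).keys()
    winners = {}
    for metric in metrics:
        pick = max if metric in HIGHER_IS_BETTER else min
        winners[metric] = pick(table, key=lambda name: table[name][metric])
    return winners


winners = best_per_metric(results)
for metric, name in winners.items():
    print(f"{metric}: {name}")
```

With these numbers, gbm_linear comes out ahead on all four quality metrics, while linear_gbm_linear has the shortest training duration.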

@dmitryglhf
Collaborator Author

/fix-pep8

@aimclub aimclub deleted a comment from codecov bot Dec 11, 2024

codecov bot commented Dec 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.34%. Comparing base (b0618df) to head (50064a3).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1350   +/-   ##
=======================================
  Coverage   80.33%   80.34%           
=======================================
  Files         146      146           
  Lines       10464    10469    +5     
=======================================
+ Hits         8406     8411    +5     
  Misses       2058     2058           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dmitryglhf
Collaborator Author

Node fit times in gbm_linear for Kaggle_s4e6, n_jobs=16 (-1):

Node: scaling, fit time: 1.89 seconds
Node: catboost, fit time: 82.914 seconds
Node: lgbm, fit time: 7.416 seconds
Node: xgboost, fit time: 2.812 seconds
Node: logit, fit time: 0.551 seconds
2024-12-11 18:14:40,666 - ApiComposer - Initial pipeline was fitted in 106.3 sec.
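To see which node dominates, the log lines above can be parsed and summed. This is a minimal sketch that assumes the log format stays exactly as printed here; the regular expression and variable names are illustrative.

```python
import re

# Node timings copied from the log above.
log = """\
Node: scaling, fit time: 1.89 seconds
Node: catboost, fit time: 82.914 seconds
Node: lgbm, fit time: 7.416 seconds
Node: xgboost, fit time: 2.812 seconds
Node: logit, fit time: 0.551 seconds
"""

pattern = re.compile(r"Node: (\w+), fit time: ([\d.]+) seconds")
fit_times = {name: float(sec) for name, sec in pattern.findall(log)}

total = sum(fit_times.values())
shares = {name: sec / total for name, sec in fit_times.items()}
slowest = max(fit_times, key=fit_times.get)

print(f"sum of node fit times: {total:.3f} s")
print(f"slowest node: {slowest} ({shares[slowest]:.1%} of the summed time)")
```

The summed node time is lower than the 106.3 sec reported for the whole pipeline, so the per-node figures evidently do not cover everything the composer measures.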

@dmitryglhf
Collaborator Author

   Metric (mean)      main_scal_rf  gbm_linear  gbm_linear_catboost_with_rsm (new)  catboost_only_without_rsm (new)  rf_gbm_linear (new)
0  auc                0.869263      0.879204    0.877655                            0.874172                         0.872379
1  acc                0.84667       0.851591    0.852249                            0.84465                          0.839088
2  balacc             0.805336      0.823641    0.824312                            0.815915                         0.80716
3  logloss            0.449189      0.381866    0.383262                            0.36923                          0.647716
4  training_duration  242.554       1007.77     338.644                             883.467                          93.0425

Training duration dropped from 1007.77 (gbm_linear) to 338.644 (gbm_linear_catboost_with_rsm) after adding "rsm": 0.1 as a default parameter for CatBoost (see "Speeding up CatBoost").
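For a sense of scale, the trade-off can be computed directly from the table in this comment. This is a small arithmetic sketch using the mean values for gbm_linear and gbm_linear_catboost_with_rsm; the variable names are illustrative.

```python
# Mean values copied from the table above.
without_rsm = {"auc": 0.879204, "logloss": 0.381866, "training_duration": 1007.77}
with_rsm = {"auc": 0.877655, "logloss": 0.383262, "training_duration": 338.644}

# How many times shorter the mean training duration became.
speedup = without_rsm["training_duration"] / with_rsm["training_duration"]

# Quality cost of the speed-up (negative AUC change and positive
# logloss change both mean slightly worse quality).
auc_change = with_rsm["auc"] - without_rsm["auc"]
logloss_change = with_rsm["logloss"] - without_rsm["logloss"]

print(f"speedup: {speedup:.2f}x")
print(f"auc change: {auc_change:+.6f}")
print(f"logloss change: {logloss_change:+.6f}")
```

On these means, training is roughly three times shorter, while AUC and logloss move only in the third decimal place.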

Full table and Details

catboost_only.csv
gbm_linear_ctbst_rsm.csv
rf_gbm_linear.csv

catboost_only_without_rsm:
image

rf_gbm_linear:
image

Collaborator Author

dmitryglhf commented Dec 24, 2024

   Metric (mean)      main_scal_rf  gbm_linear  gbm_catboost_new_params  rf3_gbm_linear  xgb_lgbm_linear
0  auc                0.869263      0.879204    0.879746                 0.86393         0.877597
1  acc                0.84667       0.851591    0.852339                 0.837973        0.848727
2  balacc             0.805336      0.823641    0.822745                 0.805042        0.818386
3  logloss            0.449189      0.381866    0.377827                 0.625611        0.392734
4  training_duration  242.554       1007.77     251.445                  104.559         213.057
Full table and Details

catboost_new_params.csv
rf3_gbm_linear.csv
xgb_lgbm_linear.csv

gbm_catboost_new_params was tested with the following new default parameters for CatBoost:

  "catboost": {
    "n_jobs": -1,
    "num_trees": 3000,
    "learning_rate": 0.03,
    "l2_leaf_reg": 1e-2,
    "bootstrap_type": "Bernoulli",
    "grow_policy": "SymmetricTree",
    "max_depth": 5,
    "min_data_in_leaf": 1,
    "one_hot_max_size": 10,
    "fold_permutation_block": 1,
    "boosting_type": "Plain",
    "od_type": "Iter",
    "od_wait": 100,
    "max_bin": 32,
    "feature_border_type": "GreedyLogSum",
    "nan_mode": "Min",
    "verbose": false,
    "allow_writing_files": false,
    "use_eval_set": true,
    "use_best_model": true,
    "enable_categorical": true
  },
  "catboostreg": {
    "n_jobs": -1,
    "num_trees": 3000,
    "learning_rate": 0.03,
    "l2_leaf_reg": 1e-2,
    "bootstrap_type": "Bernoulli",
    "grow_policy": "SymmetricTree",
    "max_depth": 5,
    "min_data_in_leaf": 1,
    "one_hot_max_size": 10,
    "fold_permutation_block": 1,
    "boosting_type": "Plain",
    "od_type": "Iter",
    "od_wait": 100,
    "max_bin": 32,
    "feature_border_type": "GreedyLogSum",
    "nan_mode": "Min",
    "verbose": false,
    "allow_writing_files": false,
    "use_eval_set": true,
    "use_best_model": true,
    "enable_categorical": true,
    "loss_function": "MultiRMSE"
  },
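The two parameter blocks above differ only in the extra "loss_function" key for "catboostreg". The sketch below shows one way to derive the regression defaults from the classification ones so the shared values live in one place. This is an illustration of that observation, not how the PR stores its defaults, and the variable names are made up.

```python
import json

# Classification defaults, copied from the "catboost" block above.
catboost_params = json.loads("""
{
  "n_jobs": -1,
  "num_trees": 3000,
  "learning_rate": 0.03,
  "l2_leaf_reg": 1e-2,
  "bootstrap_type": "Bernoulli",
  "grow_policy": "SymmetricTree",
  "max_depth": 5,
  "min_data_in_leaf": 1,
  "one_hot_max_size": 10,
  "fold_permutation_block": 1,
  "boosting_type": "Plain",
  "od_type": "Iter",
  "od_wait": 100,
  "max_bin": 32,
  "feature_border_type": "GreedyLogSum",
  "nan_mode": "Min",
  "verbose": false,
  "allow_writing_files": false,
  "use_eval_set": true,
  "use_best_model": true,
  "enable_categorical": true
}
""")

# Regression defaults: the same values plus the loss function from the
# "catboostreg" block. The original dict is left unchanged.
catboostreg_params = {**catboost_params, "loss_function": "MultiRMSE"}

print(json.dumps(catboostreg_params, indent=2))
```

Keeping one dictionary as the source avoids the two blocks drifting apart when a shared value such as "learning_rate" is retuned.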

rf3_gbm_linear:
image

xgb_lgbm_linear:
image

Development

Successfully merging this pull request may close these issues.

enh: Design effective initial assumption
3 participants