# Kaggle RentHop Solution, Ranking 1st

## Score: *Public*: 0.50379, *Private*: 0.50500

After studying the solutions others shared, especially Little Boat's, I found them simple but effective. Amazing!

I spent a lot of time on feature engineering, and submitted more times than Faron, Little Boat, and others.

I want to thank KazAnova, who made the leak public; otherwise I couldn't have done this well. You are my idol.

My English is poor; if anything I wrote is confusing, please tell me. Many thanks!

## Feature Engineering

I split the base features into two classes:

1. manager: created, description, price, etc.

2. building: bathrooms, bedrooms, latitude, longitude, display_address, features, photos, etc.

Then I link and compare them one by one.
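As an illustration, the manager-side grouping can be sketched with pandas. This is a minimal example of the idea, not code from the repo; the helper name and derived columns are my own shorthand for the competition data:

```python
import pandas as pd

def manager_price_features(df):
    """Add manager-level aggregates so each listing can be
    compared against its manager's typical pricing."""
    df = df.copy()
    grouped = df.groupby("manager_id")["price"]
    df["manager_price_mean"] = grouped.transform("mean")
    # Ratio of a listing's price to its manager's average price.
    df["price_vs_manager"] = df["price"] / df["manager_price_mean"]
    return df

listings = pd.DataFrame({
    "manager_id": ["a", "a", "b"],
    "price": [1000.0, 3000.0, 2000.0],
})
out = manager_price_features(listings)
print(out["price_vs_manager"].tolist())  # [0.5, 1.5, 1.0]
```

The same groupby-and-compare pattern extends to the building-side columns.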

My best single model is at: https://github.com/plantsgo/Rental-Listing-Inquiries

## Files

You should add train.json and test.json to the folder.

* *jpgs.json* contains the shape of each photo.

* *listing_image_time.csv* is the leak that @KazAnova described.
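As a sketch only, the leak file can be joined onto the listings by listing id; the column names below are assumptions for illustration, not the file's exact schema:

```python
import pandas as pd

# Hypothetical column names: the real listing_image_time.csv may differ.
leak = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "image_time": [1490000000, 1490000500, 1490001000],
})
train = pd.DataFrame({
    "listing_id": [2, 1],
    "price": [2500, 1800],
})
# Left-join so every listing keeps its row even without a leak entry.
train = train.merge(leak, on="listing_id", how="left")
print(train["image_time"].tolist())  # [1490000500, 1490000000]
```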

## How to Use

1. Run *sigma.py* to create the csv file.

2. Run *script.py* to create the features that @gdy5 showed.

3. Run *feature_tt.py* to create the base features.

4. Run *feature_tt_long.py* to create four features that take about four hours; I have already generated them as *timeout.csv*, so you can skip this step.

5. Run *xgb.py* to create the final result.
## Transform

1. X

2. log10(X+1)
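The two target-feature transforms above can be sketched as:

```python
import numpy as np

X = np.array([0.0, 9.0, 99.0, 999.0])

# Transform 1 is the identity (use X as-is); transform 2 is
# log10(X + 1), where the +1 keeps zero-valued features finite.
X_log = np.log10(X + 1)
print(X_log)  # [0. 1. 2. 3.]
```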
## Ensemble

My best nn model uses log10(X+1); it scored *LB: 0.535* before adding the magic feature.

### Level 2:

I have 4 datasets:

1. My best single model.

2. Some features which did not improve my best model, but did improve a model built on only the base features.

3. @gdy5's kernel with some of my features.

4. @Branden Murray's kernel with some of my features added.

On these datasets I built five stacking runs:

1. Each dataset with [xgb, nn, gb, rf, et, lr, xgb_reg, lgb_reg, nn_reg], cv fold=5. The regression models have a good importance in my model.

2. Merge the high and medium levels, then [lgb, nn, lgb_reg, nn_reg, et, rf] on my best dataset, cv fold=5.

3. [nn, nn_reg, xgb, gb, rf, et, lr, xgb_reg] on the last three datasets, cv fold=5.

4. [nn, nn_reg, xgb, gb, rf, et, lr, xgb_reg] with the magic feature added, on the last three datasets, cv fold=5.

5. [nn, nn_reg, xgb, knn, gb, rf, et, lr, ada_reg, rf_reg, gb_reg, et_reg, xgb_reg] on the last three datasets, cv fold=10.
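The per-dataset stacking step can be sketched as out-of-fold prediction with 5-fold CV. Here scikit-learn's logistic regression stands in for the [xgb, nn, ...] model list, and the data is synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.arange(100) % 3  # stand-in for low / medium / high interest

# Out-of-fold metafeatures: every row is predicted by a model that
# never saw that row during training, so the next level can stack
# them without leaking the target.
meta = np.zeros((len(X), 3))
for tr, va in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    meta[va] = model.predict_proba(X[va])

print(meta.shape)  # (100, 3)
```

Repeating this for each model and dataset yields the metafeature matrices consumed at level 3.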

### Level 3:

1. Use runs 1, 2, 3, 4 as metafeatures with xgb, nn, et, together with a feature derived from the description that classifies its source by pattern: whether it begins with " ", contains "CooperCooper.com", contains a "p a website_redacted" pattern, or contains "li /ul /p" tags. This feature improved the public score but hurt the private score; maybe it can be removed.

   `pre = ((xgb^0.65) * (nn^0.35)) * 0.85 + et * 0.15`

   Then I used @weiwei's prior correction, but it only improved the score by 0.00001-0.00002.
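The description-source feature can be sketched as simple substring checks. The function name and return codes are mine; the patterns are the ones listed above:

```python
def description_source(desc: str) -> int:
    """Tag the likely source of a listing description by the
    markup patterns it contains (0 = no known pattern)."""
    if desc.startswith(" "):
        return 1
    if "CooperCooper.com" in desc:
        return 2
    if "website_redacted" in desc:
        return 3
    if "li /ul /p" in desc:
        return 4
    return 0

print(description_source(" Spacious 2BR near the park"))  # 1
print(description_source("Call now! website_redacted"))   # 3
print(description_source("Plain text, no markup"))        # 0
```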

2. Use runs 1, 2, 5 as metafeatures with xgb, nn, et.

   `pre = ((xgb^0.65) * (nn^0.35)) * 0.85 + et * 0.15`
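The blend formula above, applied to one row of class probabilities (the numbers here are made up for illustration):

```python
import numpy as np

# Geometric blend of xgb and nn, then a weighted average with et:
# pre = ((xgb^0.65) * (nn^0.35)) * 0.85 + et * 0.15
xgb = np.array([0.20, 0.50, 0.30])
nn = np.array([0.25, 0.45, 0.30])
et = np.array([0.30, 0.40, 0.30])

pre = (xgb ** 0.65) * (nn ** 0.35) * 0.85 + et * 0.15
pre /= pre.sum()  # renormalise so the class probabilities sum to 1
print(pre.round(3))
```

The geometric part rewards models that agree; the small et term keeps some diversity in the blend.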

### Level 4:

50/50 average of the two level 3 models.

**Last, thanks for all the shares. I learned a lot from the kernels and discussions.**


