Data preprocessing #95

Open
amira-yahlali opened this issue Mar 2, 2023 · 7 comments

Comments

@amira-yahlali

I'm trying to clean my data and do some preprocessing, but I don't understand the columns well enough to tell whether the zeros in them are normal values or missing values. I'm using the cic-collection dataset on Kaggle. If any expert could help, I'd be very thankful.

@algopy

algopy commented Mar 2, 2023 via email

@amira-yahlali
Author

OK, what's your objective?

I just need to understand what the columns represent, and whether a null value in each column is a normal value or a missing value. I'm trying to preprocess my data and reduce its size.

@algopy

algopy commented Mar 2, 2023 via email

@amira-yahlali
Author

My data is the cic-ids-collection dataset on Kaggle. I'm using the class label as the target and dropping the label column; the rest are features. I'd love to send you my notebook directly to make it easier for you.

@AnmolArora15

Hi,
Is this issue still open?
I am looking forward to working on it.
Thanks,
Anmol Arora

@HeerakKashyap

If you want to remove columns in which all the values are null/missing, you can use data.drop(columns=[' ', ' '], inplace=True), with the column names filled in, to remove those columns.
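A minimal sketch of that cleanup, assuming pandas and a small made-up DataFrame (the column names here are hypothetical, not the real dataset's). Instead of listing the columns by hand, dropna(axis=1, how="all") finds the all-missing columns for you; dropping single-value columns as well is an optional extra step, not something the dataset requires.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
data = pd.DataFrame({
    "flow_duration": [10, 0, 35, 7],
    "empty_col": [np.nan, np.nan, np.nan, np.nan],
    "constant_col": [0, 0, 0, 0],
    "label": ["benign", "attack", "benign", "attack"],
})

# Drop every column in which all values are missing.
cleaned = data.dropna(axis=1, how="all")

# Optional: also drop columns that hold one repeated value,
# since they carry no information for a model.
constant_cols = [c for c in cleaned.columns
                 if cleaned[c].nunique(dropna=False) == 1]
cleaned = cleaned.drop(columns=constant_cols)

print(list(cleaned.columns))
```

Returning a new frame rather than using inplace=True keeps the original data around for comparison.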

If you want to check the number of non-null values in each column, you can use data.info() to get a precise picture of the data.
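Since the original question was about zeros versus missing values, it may help to count both per column. This is only a sketch on a made-up frame with hypothetical column names; the counts cannot tell you by themselves whether a zero is a real measurement, but a column that is almost entirely zero is worth a closer look.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
data = pd.DataFrame({
    "fwd_packets": [3, 0, 5, 0],
    "flow_rate": [1.5, np.nan, 0.0, 2.0],
    "label": ["benign", "attack", "benign", "attack"],
})

# Zeros only make sense for numeric columns.
numeric = data.select_dtypes(include="number")

# One row per column: how many values are missing, how many are exactly zero.
# n_zero is NaN for non-numeric columns such as the label.
summary = pd.DataFrame({
    "n_null": data.isna().sum(),
    "n_zero": (numeric == 0).sum(),
})

print(summary)
```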

To check for outliers in the data, you can use the seaborn library and its pairplot function, i.e. seaborn.pairplot, to get a graph showing the outliers.
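As a numeric complement to the plot, here is a rough sketch that counts outliers per column with the common 1.5 x IQR rule. This is a different technique from pairplot, swapped in because it needs only pandas; the data and column names are made up, and the 1.5 factor is a convention, not something specific to this dataset.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
data = pd.DataFrame({
    "flow_duration": [10, 12, 11, 13, 12, 500],
    "fwd_packets": [3, 4, 3, 5, 4, 4],
})

# Interquartile range per column.
q1 = data.quantile(0.25)
q3 = data.quantile(0.75)
iqr = q3 - q1

# Flag values lying more than 1.5 * IQR outside the quartiles.
outside = (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)
outlier_counts = outside.sum()

print(outlier_counts)
```

For the visual check, passing only a handful of columns to seaborn.pairplot keeps the grid readable when the dataset has many features.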

Regards

@amira-yahlali
Author

amira-yahlali commented Aug 13, 2024 via email


5 participants