Dennis Batiste- 3rd Assignment #420

dennibat · 2019-09-07T22:05:59Z

No description provided.

llpk79 · 2019-09-08T07:02:14Z

It's good to see you here, Mr. Bamboo!

dennibat · 2019-09-08T23:13:10Z

LS-DS_112

llpk79 · 2019-09-13T04:37:27Z

Okay, so Making Data Backed Assertions.

A little rough here, but it sounds like the idea of binning these values before plotting them makes a lot of sense. I think you'll do fine with the rest after getting that sorted out.

Try making cross tabs with the binned features and explore heatmaps and other plots to help understand better.
We are trying to tell a story with the data. These are tools to help us understand the story, and to tell it when we figure it out.
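
As a rough sketch of the binning and cross tab idea, here is a small made-up frame rather than the assignment data. The column names `age` and `survived` and the bin edges are assumptions chosen only for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; not the assignment data.
df = pd.DataFrame({
    "age": [34, 41, 47, 52, 58, 63, 66, 72],
    "survived": [1, 1, 0, 1, 0, 1, 0, 0],
})

# Bin the continuous feature into labeled intervals (edges are arbitrary).
# With the default right=True, each bin includes its upper edge.
df["age_bin"] = pd.cut(
    df["age"],
    bins=[30, 45, 60, 75],
    labels=["30-45", "46-60", "61-75"],
)

# Count how many rows fall into each (survived, age_bin) pair.
table = pd.crosstab(df["survived"], df["age_bin"])
print(table)
```

A heatmap of `table` (for example with seaborn's `heatmap`) can make the same counts easier to scan.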

Keep it up, Dennis!! Please reach out to me, or any of the team, with anything at all. I'm usually up late!!

llpk79 · 2019-09-17T01:36:37Z

Sprint challenge code review:

Part 1 - Load and validate the data

Load the data as a pandas data frame.
- Complete.
Validate that it has the appropriate number of observations (you can check the raw file, and also read the dataset description from UCI).
- Incomplete.
- Build your headers list before running pd.read_csv(file_or_url, names=headers)
- Or do as you did, but pass header=None
Validate that you have no missing values.
- Incomplete.
- Run df.isna().sum() on the last line of a cell, or use print()
- How do you know how many rows there should be?
Add informative names to the features.
- Complete.
The survival variable is encoded as 1 for surviving >5 years and 2 for not - change this to be 0 for not surviving and 1 for surviving >5 years (0/1 is a more traditional encoding of binary variables)
- Complete.
- Try to think of a more explicit way to do this. What is the value of headers[3]?
At the end, print the first five rows of the dataset to demonstrate the above.
- Incomplete.
- df.head() will do it.
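
One possible way to put the Part 1 steps together is sketched below. The three rows are made up and stand in for the raw, header-less file, and the column names in `headers` are hypothetical, so treat this as an illustration under those assumptions rather than the expected solution.

```python
import io

import pandas as pd

# Three made-up rows standing in for the raw, header-less file.
raw = io.StringIO("30,64,1,1\n62,59,3,2\n45,66,0,1\n")

# Hypothetical column names; build the list before reading the file.
headers = ["age", "year_of_op", "pos_nodes", "survived"]
df = pd.read_csv(raw, header=None, names=headers)

# Validate the row count. For the real file, take the expected count
# from the UCI dataset description instead of this toy value.
expected_rows = 3
assert df.shape[0] == expected_rows

# Validate that no column has missing values.
missing = df.isna().sum()
print(missing)

# Recode explicitly by column name: 1 stays 1 (survived), 2 becomes 0.
df["survived"] = df["survived"].map({1: 1, 2: 0})

# Show the first five rows to demonstrate the result.
print(df.head())
```

Reading with `header=None` keeps the first data row from being consumed as column names, which is what makes the row count check meaningful.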

Part 2 - Examine the distribution and relationships of the features

Explore the data - create at least 2 tables (can be summary statistics or crosstabulations) and 2 plots illustrating the nature of the data.
- Complete.
- Take careful note of your variable names and the arguments to your functions. Did you intend to do pd.crosstab(df['survived'], *year_of_op_bin*, normalize='columns')? It's a good idea, so make sure you actually execute it!
- What do these plots tell us?
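
A small sketch of what the normalized cross tab reports, using made-up rows. Here `year_of_op_bin` is assumed to be a column of already-binned labels, and the labels themselves are invented for the example.

```python
import pandas as pd

# Made-up rows; `survived` and `year_of_op_bin` are assumed column names.
df = pd.DataFrame({
    "survived": [1, 0, 1, 1, 0, 1],
    "year_of_op_bin": ["58-61", "58-61", "62-65", "62-65", "66-69", "66-69"],
})

# normalize='columns' converts counts into proportions within each bin,
# so every column sums to 1.
rates = pd.crosstab(df["survived"], df["year_of_op_bin"], normalize="columns")
print(rates)
```

Reading down a column gives the share of each outcome within that bin, which is easier to compare across bins than raw counts when the bins differ in size.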

Part 3 - DataFrame Filtering

Use DataFrame filtering to subset the data into two smaller dataframes. You should make one dataframe for individuals who survived >5 years and a second dataframe for individuals who did not.
- Incomplete.
- DataFrame filtering looks like: new_df = df[df[column] == value]
- You correctly encoded the survival column above. Here, we are making use of that.
- Check your style when defining functions. By convention (PEP 8), there should be no space before the ()
Create a graph with each of the dataframes (can be the same graph type) to show the differences in Age and Number of Positive Axillary Nodes Detected between the two groups.
- Incomplete.
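
A minimal sketch of the filtering step, assuming the survival column has already been recoded to 0/1. The rows are made up and the column names `age`, `pos_nodes`, and `survived` are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only; not the assignment data.
df = pd.DataFrame({
    "age": [30, 62, 45, 51, 70],
    "pos_nodes": [1, 3, 0, 12, 8],
    "survived": [1, 0, 1, 0, 0],
})

# A boolean mask selects the rows where the condition holds.
survived_df = df[df["survived"] == 1]
not_survived_df = df[df["survived"] == 0]

# Compare the two groups on the features of interest.
print(survived_df[["age", "pos_nodes"]].mean())
print(not_survived_df[["age", "pos_nodes"]].mean())
```

Each frame can then be graphed separately, for instance with `survived_df.plot.scatter(x="age", y="pos_nodes")`, which needs matplotlib installed.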

Part 4 - Analysis and Interpretation

Answer these as if you were at a job interview speaking with a hiring manager.

llpk79 · 2019-09-17T18:05:47Z

Retry - Sprint challenge code review:

Part 1 - Load and validate the data

Load the data as a pandas data frame.
- Complete.
Validate that it has the appropriate number of observations (you can check the raw file, and also read the dataset description from UCI).
- Complete.
Validate that you have no missing values.
- Complete.
Add informative names to the features.
- Complete.
The survival variable is encoded as 1 for surviving >5 years and 2 for not - change this to be 0 for not surviving and 1 for surviving >5 years (0/1 is a more traditional encoding of binary variables)
- Incomplete.
- There are no NaN values or strings to replace in this data.
At the end, print the first five rows of the dataset to demonstrate the above.
- Complete.

Part 2 - Examine the distribution and relationships of the features

Explore the data - create at least 2 tables (can be summary statistics or crosstabulations) and 2 plots illustrating the nature of the data.
- Complete.
- What do these plots tell us?

Part 3 - DataFrame Filtering

Use DataFrame filtering to subset the data into two smaller dataframes. You should make one dataframe for individuals who survived >5 years and a second dataframe for individuals who did not.
- Incomplete.
Create a graph with each of the dataframes (can be the same graph type) to show the differences in Age and Number of Positive Axillary Nodes Detected between the two groups.
- Incomplete.

Part 4 - Analysis and Interpretation

While perhaps true, these conclusions are not supported by the data examined in this sprint.

dennibat added 3 commits September 4, 2019 20:01

Created using Colaboratory

28d58a9

Delete Copy_of_LS_DS_DSPT3_111_A_First_Look_at_Data.ipynb

02f9762

Created using Colaboratory

bdd7f11

Created using Colaboratory

a67a4c7

dennibat closed this Sep 8, 2019

dennibat reopened this Sep 8, 2019

dennibat changed the title ~~Dennis Batiste- First Assignment (First Look at Data)~~ Dennis Batiste- 2nd Assignment Sep 8, 2019

dennibat added 2 commits September 9, 2019 19:26

Created using Colaboratory

904791c

Created using Colaboratory

e8f6ae9

dennibat changed the title ~~Dennis Batiste- 2nd Assignment~~ Dennis Batiste- 3rd Assignment Sep 11, 2019

Created using Colaboratory

85bb799

Created using Colaboratory

7545898

dennibat added 2 commits September 18, 2019 00:29

Created using Colaboratory

62c302f

Add files via upload

b241915
