Data type issue when loading data. #118

Open
biodfrl89 opened this issue Nov 4, 2020 · 1 comment

Comments

@biodfrl89

I noticed in Lesson 5 "Explorando data frames" that, when loading the gapminder dataset:

gapminder <- read.csv("data/gapminder-FiveYearData.csv")

the parameter stringsAsFactors is never used. That is fine for the main lesson, because when str(gapminder) is called, the output shows that country and continent are character vectors. But the Solution to Challenge 4, where the student must analyze the output of str(gapminder), states that country and continent are factors. They are not; they are character vectors.

To fix this, either call read.csv() with stringsAsFactors = TRUE, or change the Solution to Challenge 4 to say that country and continent are character vectors.
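The difference between the two options can be sketched as follows. This is only an illustration: it uses a small made-up CSV written to a temporary file (the name csv_path and the three rows are invented here), not the lesson's data/gapminder-FiveYearData.csv, and it sets stringsAsFactors explicitly in both calls so the result does not depend on the R version.

```r
# Toy stand-in for the gapminder file; the rows are illustrative only.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "country,continent,year",
  "Mexico,Americas,2007",
  "Spain,Europe,2007",
  "Kenya,Africa,2007"
), csv_path)

# Explicit FALSE: text columns are kept as character vectors.
gap_chr <- read.csv(csv_path, stringsAsFactors = FALSE)

# Explicit TRUE: text columns are converted to factors.
gap_fct <- read.csv(csv_path, stringsAsFactors = TRUE)

class(gap_chr$country)  # "character"
class(gap_fct$country)  # "factor"

# str() reports the column types the same way the Challenge 4 solution reads them.
str(gap_chr)
str(gap_fct)
```

Setting the argument explicitly in the lesson would keep the str() output and the Challenge 4 solution consistent regardless of which R version the student runs.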

Something similar happens in Lesson 13 "Manipulación de data frames con dplyr". The gapminder dataset, used in previous lessons, is processed using:

gdp_bycontinents <- gapminder %>% group_by(continent) %>% summarize(mean_gdpPercap=mean(gdpPercap))

However, when gdp_bycontinents is printed, the output says that continent is <fctr>, but it should say <chr>. Again, it is not clear whether the original dataset was loaded via read.csv() with stringsAsFactors = TRUE or not.

Curiously, in Lesson 14 "Manipulación de data frames usando tidyr", a wide version of the gapminder dataset is loaded, and only in this lesson is the parameter stringsAsFactors = FALSE used explicitly, with an explanation of why.

gap_wide <- read.csv("data/gapminder_wide.csv", stringsAsFactors = FALSE)

In general, these situations suggest to me that read.csv() was updated and the default value of stringsAsFactors changed from TRUE to FALSE, which makes specifying it in Lesson 14 unnecessary but also alters some outputs in Lessons 5 and 13.
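One way to check this on a given installation is sketched below. As far as I know, the default changed in R 4.0.0, so the results should differ between older and newer versions. The object name toy and its two rows are made up for the check.

```r
# Report the running R version.
R.version.string

# read.csv() forwards stringsAsFactors to read.table(), whose formal
# argument holds the default value.
formals(read.table)$stringsAsFactors

# Empirical check: read a tiny in-memory CSV without setting the argument.
toy <- read.csv(text = "continent,year\nAfrica,2007\nAsia,2007")
class(toy$continent)

# TRUE when the running R is at least 4.0.0.
getRversion() >= "4.0.0"
```

If class(toy$continent) is "character", the lesson outputs that show factors were presumably generated with an older R, or with stringsAsFactors = TRUE set somewhere.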

@zkamvar
Contributor

zkamvar commented Nov 5, 2020


2 participants