All dog owners residing in NYC are required by law to license their dogs. The data is sourced from the DOHMH Dog Licensing System (https://a816-healthpsi.nyc.gov/DogLicense), where owners can apply for and renew dog licenses. Each record represents a unique dog license that was active during the year, but not necessarily a unique record per dog, since a license that is renewed during the year results in a separate record of an active license period. Each record stands as a unique license period for the dog over the course of the yearlong time frame.
Output from the most-popular-names exercise below (records per AnimalName, most frequent first):
## # A tibble: 16,800 x 2
## # Groups:   AnimalName [16,800]
##    AnimalName            n
##    <chr>             <int>
##  1 UNKNOWN            2489
##  2 NAME NOT PROVIDED  1764
##  3 BELLA              1360
##  4 MAX                1287
##  5 CHARLIE             984
##  6 COCO                943
##  7 ROCKY               880
##  8 LOLA                876
##  9 LUCY                767
## 10 BUDDY               747
## # … with 16,790 more rows
A slightly lighthearted data option for the novice courses.
Source: https://data.cityofnewyork.us/Health/NYC-Dog-Licensing-Dataset/nu7n-tubp
Some inspiration for questions: https://www.nytimes.com/interactive/2018/02/08/realestate/dogs-of-new-york.html?module=inline
Some example exercises from my quick exploration follow the Pros and Cons.
Pros
Accessible content area - most people have at least a basic familiarity with dog breeds and know NYC is a big city.
Fertile for data manipulation questions.
Pretty tidy.
Room for joining with census tract data.
Cons
Is this being updated? Docs say 2016, but it actually now includes 2017 registrations.
There is probably no similar data for other geographic areas.
Requires some work with datetimes, although we could do that ourselves and make a cleaner version available.
Example exercises
library(tidyverse)
Read in "NYC_Dog_Licensing_Dataset.csv" and take a look with glimpse(). You'll need to make sure values that are "NULL" in the CSV file are interpreted as missing values.
glimpse(dogs)
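A minimal sketch of the read step, assuming readr 2.0 or later (for literal data via I()). The inline CSV is made up so the example runs without the file, and the column names other than AnimalName are assumptions.

```r
library(tidyverse)

# Made-up stand-in for the real CSV; columns other than AnimalName are assumed.
csv_text <- "AnimalName,AnimalGender,BreedName
BELLA,F,Beagle
NULL,M,Unknown
MAX,M,NULL
"

# na = ... lists the strings that read_csv() should treat as missing.
# With the real file this would be:
#   read_csv("NYC_Dog_Licensing_Dataset.csv", na = c("", "NA", "NULL"))
dogs <- read_csv(I(csv_text), na = c("", "NA", "NULL"), show_col_types = FALSE)

glimpse(dogs)
```

Keeping "" and "NA" in the na vector preserves readr's defaults while adding "NULL".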
What are the most popular dog names?
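One way to get at this, sketched on a handful of made-up records rather than the real data:

```r
library(tidyverse)

# Toy records (made up), standing in for the licensing data.
dogs <- tibble(
  AnimalName = c("BELLA", "MAX", "UNKNOWN", "BELLA", "UNKNOWN", "UNKNOWN")
)

# count() tallies rows per name; sort = TRUE puts the most frequent first.
name_counts <- dogs %>%
  count(AnimalName, sort = TRUE)

name_counts
```

Placeholder values such as UNKNOWN float to the top here, which is why they need handling before the ranking means much.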
Make sure the values UNKNOWN and NAME NOT PROVIDED in the AnimalName column are interpreted as missing values, then find the most popular male dog names.
What are some of the longest dog names that have been registered?
The first one looks like the entire record has been truncated in the name field - copy and paste entry error?
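A rough sketch covering the last two exercises: recode the placeholder names as missing, count male names, then rank names by length. The data is made up, and the AnimalGender column with "M"/"F" values is an assumption about the real file.

```r
library(tidyverse)

# Toy records (made up); AnimalGender and its coding are assumptions.
dogs <- tibble(
  AnimalName   = c("MAX", "UNKNOWN", "MAX", "BELLA",
                   "NAME NOT PROVIDED", "SIR BARKS A LOT", "ROCKY"),
  AnimalGender = c("M", "M", "M", "F", "M", "M", "M")
)

placeholders <- c("UNKNOWN", "NAME NOT PROVIDED")

# Replace placeholder strings with NA so they drop out of name summaries.
dogs_clean <- dogs %>%
  mutate(AnimalName = if_else(AnimalName %in% placeholders,
                              NA_character_, AnimalName))

# Most popular male names, ignoring missing names.
male_names <- dogs_clean %>%
  filter(AnimalGender == "M", !is.na(AnimalName)) %>%
  count(AnimalName, sort = TRUE)

# Longest distinct names, longest first.
longest_names <- dogs_clean %>%
  filter(!is.na(AnimalName)) %>%
  distinct(AnimalName) %>%
  mutate(name_length = str_length(AnimalName)) %>%
  arrange(desc(name_length))
```

With the real file, the placeholders could instead be added to the na argument of read_csv() so they are missing from the start.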
What breeds are most common?
How does the number of registrations change over time?
Peaks in summer? Generally increasing trend - are there more dogs or just more registrations?
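A sketch of how monthly registration counts might be built with lubridate. The column name LicenseIssuedDate and its M/D/Y text format are assumptions, and the dates are made up.

```r
library(tidyverse)
library(lubridate)

# Toy issue dates (made up); the column name and format are assumptions.
dogs <- tibble(
  LicenseIssuedDate = c("01/15/2016", "01/20/2016", "06/03/2016",
                        "06/17/2016", "06/30/2016", "02/01/2017")
)

monthly <- dogs %>%
  mutate(
    issued = mdy(LicenseIssuedDate),                  # parse M/D/Y text to Date
    issued_month = floor_date(issued, unit = "month") # collapse to first of month
  ) %>%
  count(issued_month)

# A line chart makes seasonal peaks and the overall trend easier to see.
registrations_plot <- ggplot(monthly, aes(x = issued_month, y = n)) +
  geom_line()
```

Counts like these cannot by themselves separate "more dogs" from "more registrations", so that question stays open for discussion.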
When are dogs born?
Complicated by the fact that animal birth month is entered as a full datetime in M/D/Y format: month and year are meaningful, day and time are not.
Is Jan the default value?
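A sketch of pulling month and year out of the birth datetime and discarding the rest. The column name AnimalBirthMonth and the exact text format (for example "01/01/2014 12:00:00 AM") are assumptions, and the values are made up.

```r
library(tidyverse)
library(lubridate)

# Toy birth values (made up); column name and exact format are assumptions.
dogs <- tibble(
  AnimalBirthMonth = c("01/01/2014 12:00:00 AM", "01/01/2010 12:00:00 AM",
                       "07/01/2012 12:00:00 AM", "01/01/2015 12:00:00 AM")
)

births <- dogs %>%
  mutate(
    birth = mdy_hms(AnimalBirthMonth), # full datetime, M/D/Y order
    birth_month = month(birth),        # keep only the meaningful parts
    birth_year = year(birth)
  )

# Tally by month; a large January spike would hint at a default value.
births_by_month <- births %>%
  count(birth_month, sort = TRUE)
```

Passing label = TRUE to month() would give month names instead of numbers for plotting.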
Other ideas:
What does it mean that there might be more than one record per unique dog? Why can't we identify individual dogs?
Combine with census tract data: plot number of registrations against demographics, or registrations of a particular breed against demographics.
Are there breeds that are increasing or decreasing in popularity (hard with only 2 years of registrations)?
Compare lengths of names between dogs with "Unknown" breed to those with a stated breed. E.g. do mutts get shorter names?
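The name-length comparison in the last idea could look something like this. The BreedName column and the "Unknown" label follow the wording above but are otherwise assumptions, and the records are made up.

```r
library(tidyverse)

# Toy records (made up); BreedName and its "Unknown" label are assumptions.
dogs <- tibble(
  AnimalName = c("MAX", "BELLA", "SIR BARKS A LOT", "BO", NA),
  BreedName  = c("Unknown", "Beagle", "Poodle", "Unknown", "Unknown")
)

name_lengths <- dogs %>%
  filter(!is.na(AnimalName)) %>%            # missing names have no length
  mutate(
    breed_known = BreedName != "Unknown",   # TRUE when a breed is stated
    name_length = str_length(AnimalName)
  ) %>%
  group_by(breed_known) %>%
  summarise(
    n_dogs = n(),
    mean_length = mean(name_length),
    median_length = median(name_length)
  )

name_lengths
```

A boxplot of name_length by breed_known would be a natural follow-up for learners.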