# Supplementary Homicide Reports (SHR) {#shr}
```{r, echo=FALSE}
knitr::opts_chunk$set(
echo = FALSE,
warning = FALSE,
error = FALSE,
message = FALSE
)
```
The Supplementary Homicide Reports dataset - often abbreviated to SHR - is the most detailed of the SRS datasets and provides information about the circumstances and participants (victim and offender demographics and relationship status) for homicides. For each homicide incident it tells you the age, sex, race, and ethnicity of each victim and offender as well as the relationship between the first victim and each of the offenders (but not the other victims in cases where there are multiple victims). It also tells you the weapon used by each offender and the circumstance of the killing, such as a "lovers triangle" or a gang-related murder. As with other SRS data, it also tells you the agency it occurred in and the month and year when the crime happened.
One important point of clarification: this data is not only a count of murders, though it does track that. This data also includes the number of homicides that are manslaughter by negligence (e.g. children playing with a gun, hunting accident) and justifiable homicides (i.e. not criminal). So be careful when speaking about this data. It is murders but not only murders so you want to speak precisely.
```{r}
shr <- readRDS("data/supplementary_homicide_reports_shr_1976_2023.rds")
offenses_known_yearly <- readRDS("data/offenses_known_yearly_1960_2023.rds")
```
## Agencies reporting
This data only has a report when the agency has a homicide that year and since homicides are relatively rare it is difficult to measure underreporting. One way we can look at reporting is to compare homicides in the SHR data with those in other datasets. We will look at two of them: the Offenses Known and Clearances by Arrest data, which is covered in detail in Chapter \@ref(offensesKnown), and the Centers for Disease Control and Prevention (CDC) data on national deaths from homicide.^[CDC WONDER data is available here: https://wonder.cdc.gov/] Both this dataset and the Offenses Known and Clearances by Arrest data are SRS datasets so you may think that the number of homicides from each dataset should be the same. That is a perfectly reasonable assumption, but since this is SRS data we are talking about, you would be wrong. Police agencies are free to report to either, both, or neither dataset so while the number of homicides is close for each dataset, they are never equal. CDC WONDER data aggregates mortality data (among other data) from state death certificates which reduces the issue of voluntary reporting that we have in SRS data.
Figure \@ref(fig:shrVsOffenses) shows the annual number of homicide victims (including murders and manslaughters) from each of these datasets starting in 1976 for the SHR data and in 1999 for the CDC data.^[1975 is actually the first year that the Supplementary Homicide Reports data is available but that dataset only has info for a single victim and offender - all later years have info for up to 11 victims and offenders - so 1976 is often used as the first year of data.]
For the two SRS datasets, in every year the numbers are fairly similar and the trends are the same over time, but the number of homicides is never equal. The numbers have actually gotten worse over time with the difference between the datasets increasing and the Offenses Known data having consistently more murders reported than the SHR data since the late 1990s. Compared to the CDC data, however, both SRS datasets - and in particular the SHR data - undercount the number of homicides. While trends are the same, SHR data reports thousands fewer murders per year than the CDC data, indicating how much of an issue underreporting is in this data.
```{r shrVsOffenses, fig.cap = "The annual number of murders and nonnegligent manslaughters from the Supplementary Homicide Reports and the Offenses Known and Clearances by Arrest dataset, and homicides from the Centers for Disease Control and Prevention (CDC). Numbers differ because agencies voluntarily report and may not report to both datasets."}
cdc_homicides <- read_table2("data/cdc_wonder_homicide.txt") %>%
rename_all(janitor::make_clean_names)
cdc_homicides <- cdc_homicides[1:22, ]
cdc_homicides <- cdc_homicides %>%
select(year,
year_2) %>%
rename(homicides = year_2) %>%
mutate(year = readr::parse_number(year),
homicides = readr::parse_number(homicides))
shr_murders <- shr %>%
mutate(victim_count = additional_victim_count + 1) %>%
group_by(year) %>%
summarize(shr_murders = sum(victim_count))
offenses_known_murders <- offenses_known_yearly %>%
filter(year >= 1976) %>%
mutate(homicides = actual_murder + actual_manslaughter) %>%
group_by(year) %>%
summarize(offenses_known_murders = sum(homicides))
shr_offenses_known_murders <-
shr_murders %>%
left_join(offenses_known_murders)
ggplot(shr_offenses_known_murders, aes(x = year)) +
geom_line(aes(y = shr_murders, color = "SHR"), size = 1.02) +
geom_line(aes(y = offenses_known_murders, color = "Offenses Known"), size = 1.02) +
geom_line(data = cdc_homicides, aes(x = year, y = homicides, color = "CDC"), size = 1.02) +
xlab("Year") +
ylab("Murders") +
theme_crim() +
scale_color_manual(values = c("SHR" = "#1b9e77",
"Offenses Known" = "#d95f02",
"CDC" = "#7570b3")) +
scale_y_continuous(labels = scales::comma) +
labs(color = "") +
expand_limits(y = 0)
```
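The SHR victim count in the figure above comes from adding one to `additional_victim_count` for each incident and summing by year. Here is a minimal sketch of that calculation, and of the gap between two sources, using small made-up data frames. The numbers and the `toy_` object names are invented for illustration and are not real SHR or Offenses Known values.

```r
# Toy incident-level data: one row per homicide incident (values are made up).
toy_shr <- data.frame(
  year = c(2020, 2020, 2020, 2021, 2021),
  additional_victim_count = c(0, 1, 0, 2, 0)
)

# Toy yearly counts standing in for the Offenses Known data (made up).
toy_offenses_known <- data.frame(
  year = c(2020, 2021),
  offenses_known_murders = c(5, 4)
)

# Each incident has one victim plus any additional victims.
toy_shr$victim_count <- toy_shr$additional_victim_count + 1

# Sum victims by year.
shr_by_year <- aggregate(victim_count ~ year, data = toy_shr, FUN = sum)
names(shr_by_year)[2] <- "shr_murders"

# Join the two sources on year and compute the difference between them.
compare <- merge(shr_by_year, toy_offenses_known, by = "year")
compare$difference <- compare$offenses_known_murders - compare$shr_murders
compare
```

A positive `difference` means the second source reports more victims than the SHR data for that year.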
Let us look at Chicago for another example of the differences in reporting from the SHR and the Offenses Known data. Figure \@ref(fig:chicagoSHRvsOffensesKnown) shows the annual number of homicide victims from both datasets. In most years they are pretty similar, excluding a few really odd years in the 1980s and in 1990. But what is also strange is that most years have more SHR victims than Offenses Known victims. So nationally SHR has fewer homicides than Offenses Known but that pattern is reversed in Chicago? This is one of the many quirks of SHR data, and it is a warning against treating national trends as local trends; what is true nationally is not always true in your community. So when you use this data, check everything closely. And once you have done that, check it again.
```{r chicagoSHRvsOffensesKnown, fig.cap = "The annual number of homicide victims in Chicago, Supplementary Homicide Reports and Offenses Known and Clearances by Arrest, 1976-2023."}
chicago_homicides <- shr %>%
filter(ori %in% "ILCPD00") %>%
mutate(victim_count = additional_victim_count + 1) %>%
group_by(year) %>%
summarize(shr_homicides = sum(victim_count))
chicago_offenses_known_homicides <- offenses_known_yearly %>%
filter(ori %in% "ILCPD00") %>%
mutate(homicides = actual_murder + actual_manslaughter) %>%
filter(year >= 1976) %>%
group_by(year) %>%
summarize(offenses_known_homicides = sum(homicides))
chicago_offenses_known_homicides <-
chicago_homicides %>%
left_join(chicago_offenses_known_homicides)
ggplot(chicago_offenses_known_homicides, aes(x = year)) +
geom_line(aes(y = shr_homicides, color = "SHR"), size = 1.02) +
geom_line(aes(y = offenses_known_homicides, color = "Offenses Known"), size = 1.02) +
xlab("Year") +
ylab("# of Homicide Victims") +
theme_crim() +
scale_color_manual(values = c("SHR" = "#1b9e77",
"Offenses Known" = "#d95f02")) +
scale_y_continuous(labels = scales::comma) +
labs(color = "") +
expand_limits(y = 0)
```
Figures \@ref(fig:shrTopAgenciesCount) and \@ref(fig:shrTopAgenciesCountPercent) look at how concentrated homicide incidents are among the agencies that report the most of them, showing the number and percent of all homicide incidents in each year that come from the top 100, 50, and 10 agencies (ranked by number of homicide incidents). These agencies are massively disproportionate in how many homicides they represent - though they are also generally the largest cities in the country, so they are a small number of agencies but a large share of this nation's population. On average, the 10 agencies with the most homicide incidents each year - which may change every year - have over 4,000 homicide incidents and make up about 1/4 of all homicide incidents reported nationally. The top 50 have about 7,500 incidents a year, accounting for 46% of incidents. The top 100 agencies have a bit under 10,000 incidents a year and make up over 55% of all homicide incidents in the United States. So excluding the largest agencies in the country would certainly undercount homicides.
```{r, shrTopAgenciesCount, fig.cap = "The annual number of homicide incidents, showing all agencies, the top 100 agencies (by number of homicide incidents), top 50, and top 10 agencies, 1976-2023."}
shr_annual_top_agencies <- data.frame(year = 1976:2022,
total_incidents = NA,
top_100_incidents = NA,
top_50_incidents = NA,
top_10_incidents = NA)
for (year_temp in 1976:2022) {
temp <-
shr %>%
filter(year %in% year_temp) %>%
count(ori) %>%
arrange(desc(n))
shr_annual_top_agencies$total_incidents[shr_annual_top_agencies$year %in% year_temp] <-
sum(temp$n)
shr_annual_top_agencies$top_100_incidents[shr_annual_top_agencies$year %in% year_temp] <-
sum(temp$n[1:100])
shr_annual_top_agencies$top_50_incidents[shr_annual_top_agencies$year %in% year_temp] <-
sum(temp$n[1:50])
shr_annual_top_agencies$top_10_incidents[shr_annual_top_agencies$year %in% year_temp] <-
sum(temp$n[1:10])
}
shr_annual_top_agencies$top_100_percent <-
shr_annual_top_agencies$top_100_incidents / shr_annual_top_agencies$total_incidents
shr_annual_top_agencies$top_50_percent <-
shr_annual_top_agencies$top_50_incidents / shr_annual_top_agencies$total_incidents
shr_annual_top_agencies$top_10_percent <-
shr_annual_top_agencies$top_10_incidents / shr_annual_top_agencies$total_incidents
ggplot(shr_annual_top_agencies, aes(x = year)) +
geom_line(aes(y = total_incidents, color = "All Agencies"), size = 1.02) +
geom_line(aes(y = top_100_incidents, color = "Top 100"), size = 1.02) +
geom_line(aes(y = top_50_incidents, color = "Top 50"), size = 1.02) +
geom_line(aes(y = top_10_incidents, color = "Top 10"), size = 1.02) +
xlab("Year") +
ylab("# of Homicide Incidents") +
theme_crim() +
scale_color_manual(values = c("All Agencies" = "black",
"Top 100" = "#1b9e77",
"Top 50" = "#d95f02",
"Top 10" = "#7570b3")) +
scale_y_continuous(labels = scales::comma) +
labs(color = "") +
expand_limits(y = 0)
```
```{r, shrTopAgenciesCountPercent, fig.cap = "The annual percent of homicide incidents by the top 100 agencies (by number of homicide incidents), top 50, and top 10 agencies, 1976-2023."}
ggplot(shr_annual_top_agencies, aes(x = year)) +
geom_line(aes(y = top_100_percent, color = "Top 100"), size = 1.02) +
geom_line(aes(y = top_50_percent, color = "Top 50"), size = 1.02) +
geom_line(aes(y = top_10_percent, color = "Top 10"), size = 1.02) +
xlab("Year") +
ylab("% of Homicide Incidents") +
theme_crim() +
scale_color_manual(values = c("Top 100" = "#1b9e77",
"Top 50" = "#d95f02",
"Top 10" = "#7570b3")) +
scale_y_continuous(labels = scales::percent) +
labs(color = "") +
expand_limits(y = 0)
```
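The top-agency shares shown above boil down to counting incidents per agency, sorting, and dividing the top group's total by the overall total. Here is a minimal sketch for a single year, using made-up agency codes. The `top_n_share()` helper is a hypothetical name introduced for this example, not a function from the chapter's code.

```r
# Toy data: one row per incident, `ori` identifies the agency (made-up codes).
toy_incidents <- data.frame(
  ori = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D")
)

# Count incidents per agency and sort from most to fewest.
agency_counts <- sort(table(toy_incidents$ori), decreasing = TRUE)

# Share of all incidents that come from the top `n` agencies.
# top_n_share() is a hypothetical helper for illustration only.
top_n_share <- function(counts, n) {
  n <- min(n, length(counts))  # guard against asking for more agencies than exist
  sum(counts[seq_len(n)]) / sum(counts)
}

top_n_share(agency_counts, 1)  # top agency alone
top_n_share(agency_counts, 2)  # top two agencies
```

The `min()` guard matters because indexing past the end of a vector in R (for example `x[1:100]` when only 60 agencies reported) returns `NA` entries, which would make the sum `NA`.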
## Important variables
The data has demographic information for up to 11 victims and 11 offenders, as well as the information on the weapon used by each offender, the relationship between the first victim and each offender, and the circumstance of the homicide. The data also has the traditional SRS set of variables about the agency: their ORI code, population, state, region and the month and year of this data. One key variable that is missing is the outcome of the homicide: there is no information on whether any of the offenders were arrested.
While there is information on up to 11 victims and offenders, in most cases, there is only a single victim and a single offender in each incident. We can use the additional_victim_count and additional_offender_count columns to see how many additional victims/offenders there are. An additional victim/offender means in addition to the first one. Even though we have columns for up to 11 victims and offenders, in very rare instances the additional_[victim/offender]_count columns may say there are more than 11 victims/offenders.
To see how the breakdown for the number of victims in each incident looks, Figure \@ref(fig:numberSHRVictims) shows the percent of incidents with each possible number of victims.^[There are five incidents where there are more than 11 victims. For simplicity of the graph, these incidents are excluded.] In nearly all incidents - 96.0% - there was only a single victim. This drops to 3.3% of incidents for two victims, 0.5% for three victims, and only about 0.2% of incidents have four or more victims.
```{r numberSHRVictims, fig.cap = "The percent of incidents that have 1-11 victims."}
shr$number_victims <- shr$additional_victim_count + 1
shr$number_offenders <- shr$additional_offender_count + 1
shr %>%
filter(number_victims < 12) %>%
crimeutils::make_stat_count_plots("number_victims", count = "FALSE",
xlab = "# of Victims") +
scale_x_continuous(breaks = 1:11)
```
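The percentages in this figure can be reproduced in a few lines of base R: add one to the additional victim count, drop the rare incidents with more than 11 victims, and tabulate. The sketch below uses made-up data, so its percentages will not match the real figure.

```r
# Toy data (made up): additional victims beyond the first in each incident.
toy_shr <- data.frame(
  additional_victim_count = c(0, 0, 0, 0, 0, 0, 0, 1, 2, 11)
)

# Total victims is the first victim plus any additional victims.
toy_shr$number_victims <- toy_shr$additional_victim_count + 1

# Keep incidents with at most 11 victims, matching the figure's filter.
kept <- toy_shr[toy_shr$number_victims < 12, ]

# Percent of the remaining incidents with each number of victims.
victim_percent <- round(100 * prop.table(table(kept$number_victims)), 1)
victim_percent
```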
Figure \@ref(fig:numberSHROffenders) shows the breakdown of the number of offenders per homicide incident.^[There are seven incidents with more than 11 offenders. For simplicity of the graph, these incidents are excluded.] It is a little less concentrated than with victims but the vast majority of homicides are committed by one offender - or at least the police only report one offender. About 87.6% of homicides have only one offender, 8.4% have two, 2.5% have three, and 1.5% have four. Fewer than 0.5% of homicides have more than four offenders. However, this is all a bit misleading. In cases where there is no information about the offender, including how many offenders there are, the data simply says that there is a single offender. So the number of homicides with a single offender is an overcount while the number with more offenders is an undercount.
```{r numberSHROffenders, fig.cap = "The percent of incidents that have 1-11 offenders."}
shr %>%
filter(number_offenders < 12) %>%
crimeutils::make_stat_count_plots("number_offenders", count = "FALSE",
xlab = "# of Offenders") +
scale_x_continuous(breaks = 1:11)
```
The variable "situation" says what type of victim-offender number combination the incident is - e.g. "multiple victims/single offender", "single victim/multiple offenders", etc. - and does indicate if the number of offenders is unknown (though curiously there are over 4,000 instances where the number of offenders is unknown but they still say there are two offenders), so you can use this variable to determine if the police do not know how many offenders there are. You're still limited, of course, in that the number of offenders is always what the police think there are, and they may be wrong. So use this variable - and anything that comes from it like the percent of offenders of a certain race - with caution.
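One way to act on this is to set the offender count to missing whenever the situation says the offenders are unknown, rather than accepting the default of one offender. Here is a rough sketch with made-up data. The category labels containing "unknown" are assumptions for illustration, so check the real labels with `unique(shr$situation)` before relying on a pattern like this.

```r
# Toy data (made up). The labels containing "unknown" are assumed for
# illustration; the real data may word these categories differently.
toy_shr <- data.frame(
  situation = c("single victim/single offender",
                "single victim/unknown offender(s)",
                "multiple victims/single offender",
                "single victim/multiple offenders",
                "multiple victims/unknown offender(s)"),
  additional_offender_count = c(0, 0, 0, 2, 1)
)

# Flag incidents where the situation label says the offenders are unknown.
toy_shr$offenders_unknown <- grepl("unknown", toy_shr$situation, fixed = TRUE)

# Number of offenders, set to NA when the police do not know how many there are.
toy_shr$number_offenders <- toy_shr$additional_offender_count + 1
toy_shr$number_offenders[toy_shr$offenders_unknown] <- NA

toy_shr[, c("situation", "number_offenders")]
```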
We will now look at a number of important variables individually. Since the data can potentially have 11 victims and 11 offenders - but in practice has only one each in the vast majority of cases - we will only look at the first victim/offender for each of these variables. Therefore, the results will not be entirely accurate, but will still give you a good overview of the data. The figures below will use data for all homicides from 1976 to 2022 so will cover all currently available years of data. Keep in mind that national trends are not the same as local trends so what is shown in these figures will probably not be the same as what is happening in your community. Also keep in mind that looking at all homicides means we are including murders, manslaughters, and justifiable homicides.
### Demographics
There are two broad categories of variables that we will cover: demographics of the victim and offenders, and characteristics of the case. We start with demographics.
#### Age
This data includes the age (in years) for each victim and each offender. For those under one year old, it also breaks this down into those from birth to six days old "including abandoned infant" and those seven days old to 364 days old. So there is a bit more info on homicides of babies. It also maxes out the age at 99 so for victims or offenders older than that we do not get their exact age, just text that says "99 years or older" (which I turn to the number 99 in the figures below).
Figure \@ref(fig:shrOffenderAge) shows the percent of homicides where the first offender in the case is of each age from 0-99. Offenders with unknown ages are excluded from this graph and make up about 27% of cases. The average (mean) age is 31.1 years old (shown in orange), which is pulled above the median age of 28 by a long right tail. If you look closely at the left side of the graph you can see that there are some very young offenders, with at least one offender for each year of age from 0 to 10 included in the data. It is not clear from this alone that these ages are a data entry error. While a two-year-old certainly could not kill someone, the data does include deaths caused by "children playing with gun" (homicide circumstances will be discussed in Section \@ref(circumstance)) so these ages could potentially be correct.
If you are familiar with the age-crime curve in criminology - which basically says crime peaks in the late teen years then falls dramatically - this shows that exact curve, though the peak is older and the decline with age is slower than what we see with less serious crimes.
```{r shrOffenderAge, fig.cap = "The age of homicide offenders, based on the first offender in any homicide incident. Offenders under age 1 (classified as 'birth to 6 days old, including abandoned infant' and '7 days to 364 days old') are considered 0 years old. Offenders reported as '99 years or older' are considered 99 years old."}
temp <- shr
temp$offender_1_age[temp$offender_1_age %in% c("bb", "nb")] <- 0
temp$offender_1_age[temp$offender_1_age %in% "99 years or older"] <- 99
temp$offender_1_age[temp$offender_1_age %in% "unknown"] <- NA
temp$offender_1_age <- readr::parse_number(temp$offender_1_age)
crimeutils::make_stat_count_plots(temp,
column = "offender_1_age",
ylab = "# of Offenders",
xlab = "Age",
count = FALSE) +
geom_vline(aes(xintercept = mean(offender_1_age, na.rm = TRUE)), color = "#d95f02", size = 1.07)
```
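Because the age column mixes numbers with text categories, it has to be recoded before you can compute a mean or median. Here is a minimal sketch of that recoding on a small made-up vector. The text labels follow the categories described above, though the real data may store some of them differently.

```r
# Toy age values as text (made up), mimicking the categories described above.
toy_age <- c("17", "25", "99 years or older", "unknown",
             "birth to 6 days, including abandoned infant",
             "7 days to 364 days", "31")

# Recode the text categories, then convert everything to numbers.
infant_labels <- c("birth to 6 days, including abandoned infant",
                   "7 days to 364 days")
toy_age[toy_age %in% infant_labels] <- "0"
toy_age[toy_age %in% "99 years or older"] <- "99"
toy_age[toy_age %in% "unknown"] <- NA
toy_age <- as.numeric(toy_age)

# With a long right tail the mean sits above the median.
mean(toy_age, na.rm = TRUE)
median(toy_age, na.rm = TRUE)
```

Unknown ages become `NA` and are dropped by `na.rm = TRUE`, so the summary statistics describe only people with a reported age.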
Figure \@ref(fig:shrVictimAge) repeats Figure \@ref(fig:shrOffenderAge) but with victim age rather than offender age. The mean victim age (shown in orange) is 33 and the median age is 30. Though the average victim age is a bit older than the average offender age, trends are relatively similar for teenagers and older, where deaths spike in the late teen years and then decline steadily. The major difference is the U-shape for younger victims - for victims under age 15, homicides peak at age 0 (i.e. younger than their first birthday) with ~1.4% of all homicides being this age. They then decline until plateauing at around age 6 before increasing again in the early teen years.
```{r shrVictimAge, fig.cap = "The age of homicide victims, based on the first victim in any homicide incident. Victims under age 1 (classified as 'birth to 6 days old, including abandoned infant' and '7 days to 364 days old') are considered 0 years old. Victims reported as '99 years or older' are considered 99 years old."}
temp <- shr
temp$victim_1_age[temp$victim_1_age %in% c("7 days to 364 days", "birth to 6 days, including abandoned infant", "nn")] <- 0
temp$victim_1_age[temp$victim_1_age %in% "99 years or older"] <- 99
temp$victim_1_age[temp$victim_1_age %in% "unknown"] <- NA
temp$victim_1_age <- readr::parse_number(temp$victim_1_age)
crimeutils::make_stat_count_plots(temp,
column = "victim_1_age",
count = FALSE,
ylab = "# of Victims",
xlab = "Age") +
geom_vline(aes(xintercept = mean(victim_1_age, na.rm = TRUE) ),
color = "#d95f02",
size = 1.07)
```
#### Sex
We will next look at victim and offender sex, a simple variable since only male and female are included. About 62.2% of offenders, as seen in Figure \@ref(fig:shrOffenderSex), are male and about 8.2% are female, indicating a large disparity in the sex of homicide offenders. The remaining 29.6% of offenders do not have sex data available because the police do not know the sex of this individual. For offenders who are not arrested, this variable may be inaccurate since it is the perceived sex of the offender.^[If we ignore unknown sex, essentially saying that the unknown people will have their sex distributed exactly as the known sex people, 88% are male and 12% are female. However, this assumption is probably wrong since the unknown people may be materially different than the known people, as evidenced by them likely not being arrested and committing the crime in a way where even their sex cannot be identified.]
```{r shrOffenderSex, fig.cap = "The sex of offender \\#1, 1976-2023."}
shr$offender_1_sex[is.na(shr$offender_1_sex)] <- "unknown"
shr$offender_1_sex <- tolower(shr$offender_1_sex)
shr %>%
mutate(offender_1_sex = capitalize_words(offender_1_sex)) %>%
crimeutils::make_barplots("offender_1_sex", count = "FALSE") +
ggplot2::scale_y_continuous(labels = scales::percent)
```
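The 88% and 12% figures in the footnote above come from dropping the unknown group and rescaling the known groups so they sum to 100. The arithmetic, using the percentages reported in the text:

```r
# Percent of first offenders by sex, as reported in the text above.
sex_percent <- c(male = 62.2, female = 8.2, unknown = 29.6)

# Drop the unknown group and rescale so the known groups sum to 100.
known <- sex_percent[c("male", "female")]
known_share <- round(100 * known / sum(known))
known_share
```

This rescaling is only valid under the assumption, questioned in the footnote, that offenders of unknown sex are distributed the same way as offenders of known sex.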
There is far less uncertainty for victim sex, with under 0.17% of victims having an unknown sex. Here again there is a large disparity between male and female with about 78.2% of victims being male and 21.6% being female.
```{r shrVictimSex, fig.cap = "The sex of victim \\#1, 1976-2023."}
shr %>%
mutate(victim_1_sex = capitalize_words(victim_1_sex)) %>%
crimeutils::make_barplots("victim_1_sex", count = "FALSE") +
ggplot2::scale_y_continuous(labels = scales::percent)
```
#### Race
This data also includes the race of the victims and offenders. This includes the following races: American Indian or Alaskan Native, Asian, Black, Native Hawaiian or Other Pacific Islander, and White. These are the only races included in the data; Hispanic is considered an ethnicity and is available as a separate, though flawed, variable. There is no category for bi- or multi-racial. As with other demographics info for offenders, in cases where no arrest is made (and we do not know in this data if one is made), there is no way to confirm the person's race so these results may not be entirely accurate.
Figure \@ref(fig:shrOffenderRace) shows the percent of homicides in the data by the race of offender #1. Black and White offenders make up similar percentages, at 34.3% and 33.6% of offenders, respectively. The next most common group is Unknown at about 30.6% of offenders. Given that so many offenders have an unknown race, the reliability of race measures is limited. The remaining races are Asian at 0.9% of offenders, American Indian or Alaskan Native at 0.6%, and Native Hawaiian or Other Pacific Islander at 0.02%.
```{r shrOffenderRace, fig.cap = "The race of offender \\#1, 1976-2023."}
shr$offender_1_race[shr$offender_1_race %in% c("p",
"P",
"native hawaiian or other pacific islander")] <- "Native Hawaiian/Pacific Islander"
shr$offender_1_race[shr$offender_1_race %in% "american indian or alaskan native"] <- "American Indian/Alaskan Native"
shr$offender_1_race[is.na(shr$offender_1_race)] <- "unknown"
shr %>%
mutate(offender_1_race = capitalize_words(offender_1_race)) %>%
crimeutils::make_barplots("offender_1_race", count = "FALSE") +
ggplot2::scale_y_continuous(labels = scales::percent)
```
For victim race, seen in Figure \@ref(fig:shrVictimRace), only about 1% of victim #1 races are unknown. This means we can be a lot more confident in the race of the victims than in the race of the offender. Similar to offenders, White and Black victims are the two most common races, with 48.4% and 48.1% of victims, respectively. There is a greater share of Asian victims than Asian offenders at 1.5% of victims. American Indian or Alaskan Natives make up 0.8% of victims while Native Hawaiian or Pacific Islanders make up 0.02% of victims.
```{r shrVictimRace, fig.cap = "The race of victim \\#1, 1976-2023"}
shr$victim_1_race[shr$victim_1_race %in% c("P", "p",
"native hawaiian or other pacific islander")] <- "Native Hawaiian/Pacific Islander"
shr$victim_1_race[shr$victim_1_race %in% "american indian or alaskan native"] <- "American Indian/Alaskan Native"
shr %>%
mutate(victim_1_race = capitalize_words(victim_1_race)) %>%
crimeutils::make_barplots("victim_1_race", count = "FALSE") +
ggplot2::scale_y_continuous(labels = scales::percent)
```
#### Ethnicity
The final demographic variable is ethnicity, which is whether the victim or offender is Hispanic or not Hispanic. The SHR data has a weird relationship with this variable (which is also in the Arrests by Age, Sex, and Race dataset, discussed in Chapter \@ref(arrests)) where ethnicity is technically a variable in the data but very rarely collected. As such, this is an unreliable variable. If you really want to use it, you need to check carefully that it is being reported consistently by the agencies that you are looking at.
The vast majority - 69.7% - of offenders have an unknown ethnicity while 23.4% are not Hispanic and 7.1% are Hispanic.
```{r shrOffenderEthnicity, fig.cap = "The ethnicity of offender \\#1, 1976-2023."}
shr$offender_1_ethnic_origin[is.na(shr$offender_1_ethnic_origin)] <- "unknown"
shr %>%
mutate(offender_1_ethnic_origin = capitalize_words(offender_1_ethnic_origin)) %>%
crimeutils::make_barplots("offender_1_ethnic_origin", count = "FALSE") +
ggplot2::scale_y_continuous(labels = scales::percent)
```
Unlike the other demographic variables, there is still a huge amount of underreporting when it comes to victim ethnicity, though still less than for offender ethnicity. 55.6% of victims have an unknown ethnicity. Approximately 33.2% of victim #1 are reported as not Hispanic while 11.1% are reported as Hispanic.
```{r shrVictimEthnicity, fig.cap = "The ethnicity of victim \\#1, 1976-2023."}
shr$victim_1_ethnic_origin[is.na(shr$victim_1_ethnic_origin)] <- "unknown"
shr %>%
mutate(victim_1_ethnic_origin = capitalize_words(victim_1_ethnic_origin)) %>%
crimeutils::make_barplots("victim_1_ethnic_origin", count = "FALSE") +
ggplot2::scale_y_continuous(labels = scales::percent)
```
As an example of agencies under-reporting this variable, let us look at the number of offender #1s in Albuquerque, New Mexico, a city which the [US Census](https://www.census.gov/quickfacts/fact/table/albuquerquecitynewmexico,US/PST045222) says is about 50% Hispanic. Yet the Albuquerque police reported no ethnicity information for almost three decades of data.
```{r ABQ, fig.cap = "Annual number of offender \\#1 who is Hispanic in Albuquerque, New Mexico, 1976-2023."}
shr$hispanic_offender <- 0
shr$hispanic_offender[shr$offender_1_ethnic_origin %in% "hispanic"] <- 1
shr$unknown_hispanic_offender <- 0
shr$unknown_hispanic_offender[shr$offender_1_ethnic_origin %in% "unknown"] <- 1
shr$not_hispanic_offender <- 0
shr$not_hispanic_offender[shr$offender_1_ethnic_origin %in% "not hispanic"] <- 1
shr %>%
filter(ori %in% "NM00101") %>%
group_by(year) %>%
summarize(hispanic_offender = sum(hispanic_offender),
unknown_hispanic_offender = sum(unknown_hispanic_offender),
not_hispanic_offender = sum(not_hispanic_offender)) %>%
data.frame() %>%
ggplot(aes(x = year, y = hispanic_offender)) +
geom_line(aes(color = "Hispanic"), linewidth = 1.05) +
geom_line(aes(y = not_hispanic_offender, color = "Not Hispanic"), linewidth = 1.05) +
geom_line(aes(y = unknown_hispanic_offender, color = "Unknown Ethnicity"), linewidth = 1.05) +
xlab("Year") +
ylab("Hispanic Offenders") +
theme_crim() +
scale_color_manual(values = c("Hispanic" = "#1b9e77",
"Not Hispanic" = "#d95f02",
"Unknown Ethnicity" = "#7570b3")) +
scale_y_continuous(labels = scales::comma) +
labs(color = "")
```
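A quick way to check whether an agency reports ethnicity consistently is to cross-tabulate year by ethnicity and look at the share that is unknown in each year. Here is a minimal sketch with made-up data for a single hypothetical agency:

```r
# Toy data (made up) for a single agency.
toy_agency <- data.frame(
  year = c(1990, 1990, 1990, 2015, 2015, 2015),
  offender_1_ethnic_origin = c("unknown", "unknown", NA,
                               "hispanic", "not hispanic", "unknown"),
  stringsAsFactors = FALSE
)

# Treat missing values as unknown.
missing_rows <- is.na(toy_agency$offender_1_ethnic_origin)
toy_agency$offender_1_ethnic_origin[missing_rows] <- "unknown"

# Cross-tabulate year by ethnicity to count each category per year.
ethnicity_by_year <- table(toy_agency$year, toy_agency$offender_1_ethnic_origin)

# Share of each year's first offenders with unknown ethnicity.
unknown_share <- ethnicity_by_year[, "unknown"] / rowSums(ethnicity_by_year)
unknown_share
```

Years where the unknown share is at or near 1 are years where the agency effectively did not report ethnicity, so they should not be read as having no Hispanic offenders.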
### Case characteristics
Now we will move to facts about each case such as what weapon was used, how people involved knew each other, and what was the (rough) cause of the homicide.
#### Weapon used
The first variable we will look at is the weapon used by each offender. Table \@ref(tab:shrWeapon) shows the weapon used by the first offender in every incident from 1976 to 2022. Each offender can only be reported as having a single weapon, so this table essentially shows the number (and percent) of homicides caused by this weapon. This is not entirely true since in reality an offender could use multiple weapons and there can be multiple offenders. In these cases the police include what they believe is the "primary" weapon used by this offender.
The most commonly used weapon is a handgun, which is used in nearly half of homicides. This is followed by a knife or other sharp weapon used to cut at almost 15% of homicides, and then by "firearm, type not stated", which is just a firearm where we do not know the exact type (it can include handguns), at 8.9% of homicides. The fourth most common weapon is "personal weapons" at nearly 6% of homicides. "Personal weapons" is a weird term to mean that there was no weapon - the "weapon" was the offender who beat the victim to death. Shotguns are involved in almost 5% of homicides and all other weapons are involved in fewer than 5% of cases. In total there are 19 different weapons included though most are very uncommon.
```{r }
temp <- make_frequency_table(shr %>% filter(offender_1_weapon != 35), "offender_1_weapon",
c("Weapon", "# of Incidents", "% of Incidents"))
kableExtra::kbl(temp,
# format = "html",
digits = 2,
align = c("l", "l", "r", "r"),
#booktabs = TRUE,
longtable = TRUE,
label = "shrWeapon",
caption = "The weapon used in a homicide incident, 1976-2023. In cases where there are multiple offenders, shows only the primary weapon for the first offender.",
escape = TRUE
) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))
```
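A frequency table like the one above can be approximated in base R by counting each weapon, sorting, and converting to percents. The sketch below uses a made-up vector of weapons. The weapon labels are illustrative and may not match the exact spelling used in the real data.

```r
# Toy data (made up): weapon used by the first offender in each incident.
toy_weapon <- c("handgun", "handgun", "handgun", "knife or cutting instrument",
                "handgun", "rifle", "personal weapons", "handgun",
                "knife or cutting instrument", "shotgun")

# Count incidents per weapon, most common first.
counts <- sort(table(toy_weapon), decreasing = TRUE)

# Build a table of counts and percents.
weapon_table <- data.frame(
  weapon = names(counts),
  incidents = as.vector(counts),
  percent = round(100 * as.vector(counts) / sum(counts), 1)
)
weapon_table
```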
You may have noticed from the table that the AR-15 is not included. While the AR-15 is commonly discussed in the media and policy circles in debates about controlling gun violence, it is not in a category by itself. Instead it is combined with other rifles in the "rifle" weapon group, which makes up about 3.6% of the weapons used by offender #1 in the data.
Let us check if AR-15s, through our rough proxy of the "rifle" weapon group, is getting more common over time. Figure \@ref(fig:shrRifle) shows the number of homicide incidents (including manslaughters, so not necessarily all murders) where offender #1 used a rifle. Figure \@ref(fig:shrRiflePercent) shows the percent of all homicide incidents where the the weapon was a rifle. Using both of these measures we can see the rifles are getting less common, declining substantially since 1980 though increasing again starting in the mid-2010s.
```{r, shrRifle, fig.cap = "The annual number of homicide incidents where offender \\#1's weapon was a rifle, 1976-2023."}
shr$rifle <- 0
shr$rifle[shr$offender_1_weapon %in% "rifle"] <- 1
shr %>%
group_by(year) %>%
summarize(rifle = sum(rifle)) %>%
data.frame() %>%
ggplot(aes(x = year, y = rifle)) +
geom_line(linewidth = 1.05) +
xlab("Year") +
ylab("# of Homicide Incidents") +
theme_crim() +
crimeutils::scale_color_crim() +
scale_y_continuous(labels = scales::comma) +
labs(color = "") +
expand_limits(y = 0)
```
```{r shrRiflePercent, fig.cap = "The annual share of homicide incidents where offender \\#1's weapon was a rifle, 1976-2023."}
shr %>%
mutate(dummy = 1) %>%
group_by(year) %>%
summarize(rifle = sum(rifle),
dummy = sum(dummy)) %>%
ungroup() %>%
mutate(rifle_percent = rifle / dummy) %>%
data.frame() %>%
ggplot(aes(x = year, y = rifle_percent)) +
geom_line(linewidth = 1.05) +
xlab("Year") +
ylab("% of Homicide Incidents") +
theme_crim() +
crimeutils::scale_color_crim() +
scale_y_continuous(labels = scales::percent) +
labs(color = "") +
expand_limits(y = 0)
```
Now, maybe this weapon is more commonly used in some types of crimes, such as school shootings. You could get at that question using this data by checking whether victims or offenders are younger when a rifle is used, or whether the circumstance suggests a school shooting. Unfortunately there is no offense location variable here, though there is one in NIBRS, and we can largely recreate this data through NIBRS. And of course you cannot tell if the weapon is actually an AR-15, only that it is a rifle.
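As a minimal sketch of the age comparison described above, the following uses base R on made-up data. The column name `victim_1_age` is an assumption for illustration (check the actual name in your copy of the data), and the toy values are invented.

```r
# Toy data; values are made up and victim_1_age is an assumed column name.
toy <- data.frame(
  offender_1_weapon = c("rifle", "handgun", "rifle", "shotgun", "handgun", "rifle"),
  victim_1_age      = c(16, 34, 19, 45, 28, NA),  # NA = victim age unknown
  stringsAsFactors  = FALSE
)

# Same 0/1 indicator as in the figures above.
toy$rifle <- as.integer(toy$offender_1_weapon %in% "rifle")

# Mean victim age by rifle use. The formula method of aggregate() drops
# rows with a missing age by default (na.action = na.omit).
age_by_rifle <- aggregate(victim_1_age ~ rifle, data = toy, FUN = mean)
age_by_rifle
```

A gap in average age would only be suggestive. It cannot show that the incidents are school shootings, and it still covers all rifles rather than AR-15s specifically.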
#### Relationship between first victim and offenders
An interesting and highly useful variable is the relationship between the first victim and each offender. To be clear, this is only for the first victim; we do not have the relationship between other victims and offenders. However, as seen earlier, this is not *too much* of an issue since nearly all incidents only have a single victim. There are 29 possible relationship types (including "unknown" relationship) which are broken into three categories: legal family members, people known to the victim but who are not family, and people not known to the victim. Table \@ref(tab:shrRelationship) shows these relationships and the number and percent of homicides with these relationships.
The most common relationship, with about 28% of homicides, is that the police do not know the relationship. So there is a good deal of uncertainty in the relationship between victims and offenders. Next is that the victim is the offender's acquaintance at 19.7% or is a stranger at 15.3%. After that is "other - known to victim," which is similar to being an acquaintance, at almost 5% of homicides. This is followed by the victim being the friend of the murderer at 3.6%. The sixth most common relationship, also at 3.6%, is that the victim is the wife of the offender, so she is murdered by her husband; this is the first familial relationship on this list. The remaining relationships each make up less than 3% of all homicides.
```{r }
temp <- make_frequency_table(shr, "victim_1_relation_to_offender_1",
c("Relationship", "# of Incidents", "% of Incidents"))
temp$Category <- ""
temp$Category[tolower(temp$Relationship) %in% c("brother",
"daughter",
"sister",
"in-law",
"other family",
"son",
"father",
"wife",
"stepdaughter",
"stepson",
"common-law husband",
"common-law wife",
"mother",
"husband",
"stepmother",
"stepfather")] <- "Family"
temp$Category[tolower(temp$Relationship) %in% c("other - known to victim",
"acquaintance",
"friend",
"neighbor",
"employee",
"girlfriend",
"boyfriend",
"ex-wife",
"employer",
"homosexual relationship",
"ex-husband")] <- "Not family (but known)"
temp$Category[tolower(temp$Relationship) %in% c("stranger")] <- "Not known"
temp <-
temp %>%
select(Relationship,
Category,
everything())
kableExtra::kbl(temp,
# format = "html",
digits = 2,
align = c("l", "l", "r", "r"),
#booktabs = TRUE,
longtable = TRUE,
label = "shrRelationship",
caption = "The relationship between the first victim and the first offender in a homicide incident, 1976-2023.",
escape = TRUE
)%>%
kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))
```
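If you want the share of homicides in each broad category rather than each detailed relationship, one option is to map relationships to categories with a named lookup vector and then tabulate. The sketch below uses base R on made-up data and lists only a handful of the 29 relationship values. Labeling unmatched values as "Unknown" is a choice made here for illustration, not something the data does for you.

```r
# Toy data; values are made up for illustration.
toy <- data.frame(
  victim_1_relation_to_offender_1 = c("wife", "stranger", "acquaintance", "unknown",
                                      "friend", "son", "stranger", "acquaintance"),
  stringsAsFactors = FALSE
)

# Named lookup vector: relationship -> category (only a few values shown).
category_lookup <- c(
  wife         = "Family",
  son          = "Family",
  acquaintance = "Not family (but known)",
  friend       = "Not family (but known)",
  stranger     = "Not known"
)

# Indexing by name returns NA for relationships missing from the lookup.
toy$category <- unname(category_lookup[toy$victim_1_relation_to_offender_1])
toy$category[is.na(toy$category)] <- "Unknown"

# Percent of incidents in each category.
shares <- round(100 * prop.table(table(toy$category)), 1)
shares
```

Keeping the mapping in one lookup vector makes it easy to audit which relationships ended up in which category.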
#### Homicide circumstance {#circumstance}
We also have information on the type of the homicide, which this data calls the "circumstance." This comes as relatively broad categories that leave a lot to be desired in our understanding of what led to the homicide. Table \@ref(tab:shrCircumstance) shows the number and percent of each circumstance for the first victim of each homicide from 1976 to 2022. This data has 33 possible circumstances which it groups into four main categories: murders that coincide with committing another crime ("felony type" murders), murders that do not coincide with another crime ("non-felony type" murders), justifiable homicides, and negligent manslaughter.
The felony type murders are simply ones where another crime occurred during the homicide. While this is called "felony type," it does include other crimes such as theft and gambling (which are not always felonies), so the name is a bit of a misnomer. The "non-felony type" are murders that happen without another crime. This includes gang killings (where, supposedly, only the murder occurred), children killed by babysitters, fights among people intoxicated by alcohol or drugs, and "lover's triangle" killings. Justifiable homicides are when a person (civilian or police officer) kills a person who is committing a crime. Negligent manslaughter includes accidental shootings, such as when children find and shoot a gun, but excludes deaths from traffic accidents.
The most common circumstances, accounting for 27.4%, 26.9%, and 12.5%, respectively, are "Unknown", "Other Arguments", and "Other Non-Felony Type - Not Specified." Since the data includes "Argument Over Money Or Property" as one category, "Other Arguments" means that the argument was over something other than money or property. "Other Non-Felony Type" means that the murder did not occur alongside another crime, but also does not fall into any of the non-felony categories included. Robbery is the only remaining circumstance with more than 5% of murders, at 7.4%.
```{r }
# Recode the stray value "30" and "circumstances undetermined" to "unknown"
shr$offender_1_circumstance[shr$offender_1_circumstance %in% c("30", "circumstances undetermined")] <- "unknown"
temp <- make_frequency_table(shr, "offender_1_circumstance",
c("Circumstance", "# of Incidents", "% of Incidents"))
temp$Category <- ""
temp$Category[tolower(temp$Circumstance) %in% c("abortion",
"all suspected felony type",
"arson",
"burglary",
"gambling",
"larceny",
"motor vehicle theft",
"narcotic drug laws",
"other - not specified",
"other felony type - not specified",
"other sex offenses",
"prostitution and commercialized vice",
"rape",
"robbery")] <- "Felony Type"
temp$Category[tolower(temp$Circumstance) %in% c("other arguments",
"sniper attack",
"gangland killings",
"institutional killings",
"juvenile gang killings",
"argument over money or property",
"brawl due to influence of alcohol",
"brawl due to influence of narcotics",
"child killed by babysitter",
"lovers triangle",
"other non-felony type - not specified")] <- "Non-Felony Type"
temp$Category[tolower(temp$Circumstance) %in% c("felon killed by police",
"felon killed by private citizen")] <- "Justifiable Homicide"
temp$Category[tolower(temp$Circumstance) %in% c("all other manslaughter by negligence except traffic deaths",
"children playing with gun",
"gun cleaning death - other than self-inflicted",
"other negligent handling of gun which resulted in death of another",
"victim shot in hunting accident")] <- "Negligent Manslaughter"
temp <-
temp %>%
select(Circumstance,
Category,
everything())
kableExtra::kbl(temp,
#format = "html",
digits = 2,
align = c("l", "l", "r", "r"),
#booktabs = TRUE,
longtable = TRUE,
escape = TRUE,
label = "shrCircumstance",
caption = "The circumstance of the homicide for the first offender in a homicide incident.") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))
```
#### Homicide subcircumstance
The "subcircumstance" just tells you more information about justifiable homicides. It describes the circumstance leading up to the killing of the "felon," which is how the person killed is described, though technically they do not need to have committed a felony. It includes whether this person attacked an officer (the one who killed them), attacked a different officer, attacked a civilian, or was committing or fleeing a crime.
This dataset is one source of information on how many people police kill each year. However, it is a large undercount compared to other sources such as the Washington Post collection, so it is not a very useful source of information on this topic.
```{r }
temp <- make_frequency_table(shr, "offender_1_subcircumstance",
c("Subcircumstance", "# of Incidents", "% of Incidents"))
kableExtra::kbl(temp,
# format = "html",
digits = 2,
align = c("l", "l", "r", "r"),
#booktabs = TRUE,
longtable = TRUE,
label = "shrSubCircumstance",
caption = "The circumstance for the first offender in a homicide incident in cases where the offender is killed. This includes incidents where the only person who dies is the offender.",
escape = TRUE
) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))
```
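To see what this data reports for police killings in each year, you could count the incidents whose circumstance is "felon killed by police." The sketch below does this in base R on made-up data. It counts incidents, not people, and looks only at the first offender's circumstance, so treat it as a rough illustration.

```r
# Toy data; values are made up for illustration.
toy <- data.frame(
  year = c(2020, 2020, 2020, 2021, 2021, 2021, 2021),
  offender_1_circumstance = c("felon killed by police", "robbery",
                              "felon killed by private citizen",
                              "felon killed by police", "felon killed by police",
                              "other arguments", "unknown"),
  stringsAsFactors = FALSE
)

# Keep only incidents recorded as a felon killed by police.
police <- toy[toy$offender_1_circumstance %in% "felon killed by police", ]

# Number of such incidents per year.
police_by_year <- table(police$year)
police_by_year
```

Comparing these annual counts against an outside source is one way to gauge how large the undercount is.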