
# Offense Segment {#offenseSegment}

```{r, echo=FALSE}
knitr::opts_chunk$set(
  echo    = FALSE,
  warning = FALSE,
  error   = FALSE,
  message = FALSE
)
```

```{r,  results='hide'}
offense <- readRDS("data/nibrs_offense_segment_2023.rds")
nibrs_offense_summary_stats <- readRDS("data/nibrs_summary_stats/nibrs_offense_summary_stats.rds")
nibrs_offense_first_year <- readRDS("data/nibrs_summary_stats/nibrs_offense_first_year.rds")
nibrs_location_first_year <- readRDS("data/nibrs_summary_stats/nibrs_location_first_year.rds")
bias_motivation_first_year <- readRDS("data/nibrs_summary_stats/bias_motivation_first_year.rds")
gc()
```

This dataset provides information about the offense that occurred, and each incident can have multiple offenses. Each row is an incident-offense, so incidents with multiple offenses have multiple rows. For a subset of offenses it also provides a more detailed subcategory of offense, allowing a deeper dive into what exactly happened. For example, for animal abuse there are four subcategories of offenses: simple/gross neglect of an animal, intentional abuse or torture, animal sexual abuse (bestiality), and organized fighting of animals such as dog or cock fights. 

There is also information on the date the crime occurred; where the crime occurred, in categories such as residence or sidewalk rather than an address; whether the offender is suspected of using drugs, alcohol, or "computer equipment" (which includes cell phones) during the crime; and which weapon was involved. In cases where the weapon was a firearm, it says whether that weapon was fully automatic. It also indicates whether the crime was a hate crime by including a variable on the bias motivation (if any) of the offender. This is based on evidence that the crime was motivated, at least in part, by the victim's group (e.g., race, sexuality, religion). There are 34 possible bias motivations, and while a hate crime could be motivated by bias against multiple groups, this data only allows for a single bias motivation.

As you look through this data yourself you may be surprised that some common crimes, such as DUIs and disorderly conduct, are missing. That is because some crimes, which the FBI calls "Group B" crimes, are reported only when an arrest is made and only as part of the "Group B Arrest Report" segment. Therefore, none of these offenses will be reported in the Offense Segment. We'll discuss these Group B offenses when we discuss arrestees in Chapter \@ref(arrestee). 

## Crime category

The most important variable in the Offense Segment is the one that tells you exactly what offense was committed. There can be multiple crimes in a single incident, and this variable describes each offense that happened. To figure out which offenses belong together, look at the incident number, year, and ORI. Within ORI and year, each incident number uniquely identifies an incident. Be careful to use all three of these variables: the same incident number may appear in different agencies, or in the same agency in different years, and these refer to different incidents.
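
A minimal base R sketch of building that identifier is below. It assumes the three pieces sit in columns named `ori`, `year`, and `incident_number`; those column names and the toy rows are hypothetical, so check them against your own copy of the data. Note that the incident number "A1" appears in every row, yet the rows belong to three different incidents.

```r
# Toy incident-offense rows; column names and values are hypothetical.
offense_example <- data.frame(
  ori              = c("AB0010000", "AB0010000", "AB0010000", "CD0020000"),
  year             = c(2022, 2022, 2023, 2022),
  incident_number  = c("A1", "A1", "A1", "A1"),
  ucr_offense_code = c("robbery",
                       "assault offenses - simple assault",
                       "robbery",
                       "robbery"),
  stringsAsFactors = FALSE
)

# An incident number is only unique within ORI and year, so combine all three.
offense_example$incident_id <- paste(offense_example$ori,
                                     offense_example$year,
                                     offense_example$incident_number,
                                     sep = "_")

# Number of offense rows in each incident.
offenses_per_incident <- table(offense_example$incident_id)
print(offenses_per_incident)
```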

Each crime is mutually exclusive, and crimes that are elements of another crime are not double-counted. For example, robbery is basically theft plus assault/intimidation: it is the use of force (assault) or the threat of force (intimidation) to take property (theft). A robbery in this data counts only as robbery, not as robbery and theft and assault/intimidation. If these crimes appear together in an incident, it is because each crime *also* occurred separately. For example, if someone is robbed and, after the robbery is over (i.e., they hand over their belongings), they are then punched repeatedly, that could be classified as a robbery and an assault. 

Table \@ref(tab:offenseCrimeCategories) shows each possible crime in the data and how common it was in 2022. It is sorted by frequency instead of alphabetically so it is easier to see which crimes are most common. There were about 13 million crimes reported to NIBRS in 2022. The most common crime is simple assault (an assault that did not use a weapon and did not result in serious injury), at 14% of crimes, or about 1.7 million crimes. If you think this is odd because property crimes are more common than violent crimes, you would be right. NIBRS data is very specific in its crime categories, so it splits certain crimes into a number of different categories. Theft is the most common crime committed in the United States, but NIBRS breaks it into several types, so you need to combine them to measure theft in its entirety. Of the six most common crimes, theft crimes make up ranks 3, 5, and 6 (all other larceny, theft from motor vehicle, and shoplifting), and there are other, less common theft offenses such as "theft from building" and "theft of motor vehicle parts/accessories." 
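
A minimal sketch of combining the theft categories in base R is below. It assumes the larceny offenses share a "larceny/theft offenses" prefix in the lowercase style used by this chapter's code; the labels are illustrative, so run `unique()` on your own offense column to confirm them. Whether to also include related offenses outside that prefix, such as motor vehicle theft, depends on how you define theft.

```r
# Hypothetical offense labels in the lowercase "category - subcategory" style.
ucr_offense_code <- c("larceny/theft offenses - all other larceny",
                      "larceny/theft offenses - shoplifting",
                      "larceny/theft offenses - theft from motor vehicle",
                      "assault offenses - simple assault",
                      "burglary/breaking & entering")

# Treat every offense that starts with the larceny/theft prefix as theft.
is_theft <- startsWith(ucr_offense_code, "larceny/theft offenses")

# Total theft offenses across all larceny subcategories.
total_theft <- sum(is_theft)
print(total_theft)
```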

This table also shows the first year each offense was included in the data. Most offenses have been included since NIBRS started in 1991, but new offenses have been added over time, and these additions have become more common recently. For example, the crimes "Failure to register as a sex offender" and "Illegal entry into the United States" were both added in 2021. There are even offenses that were not reported at all in 2022, such as "treason," an offense that only federal and tribal agencies are allowed to report. 

```{r }

temp <- make_frequency_table(offense,
                             "ucr_offense_code",
                             c("Crime Category",
                               "\\# of Offenses",
                               "\\% of Offenses")) %>%
  left_join(nibrs_offense_first_year %>% 
              mutate(ucr_offense_code = capitalize_words(ucr_offense_code)) %>%
              rename(`Crime Category` = ucr_offense_code)) %>%
  select(`Crime Category`,
         `First Year` = year,
         everything())
temp$`First Year`[is.na(temp$`First Year`)] <- "-"


kableExtra::kbl(temp, 
                # format = "html",
                digits = 2, 
                align = c("l", "l", "r", "r"),
                #booktabs = TRUE, 
                longtable = TRUE,
                escape = TRUE,
                label = "offenseCrimeCategories",
                caption = "The number and percent of crimes reported from all agencies in 2022, by crime category.") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))
```

Though each agency is supposed to report the same crimes, using the exact same definitions so the data is consistent, that is not always true in practice. For example, animal cruelty was first reported in 2015 (before that it was not an option, so agencies could not report it), and likely most agencies in the US have had at least one animal cruelty crime since then. In 2015, however, reporting was concentrated in a small number of states, meaning that not all agencies reported that offense. This concentration suggests that the low reporting reflects agencies in certain states choosing not to report the offense at all, or being unable to (for example, because their older reporting systems do not have animal cruelty as an option). Reporting has increased over time, suggesting that more agencies begin reporting new crimes in the years after they are added. Therefore, I strongly suggest examining your data over time and across geographic areas to check for biases before using it. 
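
One way to run that check is to count how many distinct agencies report a given offense in each state and year. Below is a base R sketch on made-up rows; the `state` column is an assumption (you may need to merge state in from another file), and all values are hypothetical. A state-year with no agencies reporting an offense that surely occurs there, like Tennessee in 2015 here, is a warning sign.

```r
# Toy data; the state column and all values are hypothetical.
offense_example <- data.frame(
  state            = c("CO", "CO", "CO", "TN", "TN", "TN"),
  year             = c(2015, 2015, 2016, 2015, 2016, 2016),
  ori              = c("CO1", "CO2", "CO1", "TN1", "TN1", "TN2"),
  ucr_offense_code = c("animal cruelty", "animal cruelty", "animal cruelty",
                       "robbery", "animal cruelty", "animal cruelty"),
  stringsAsFactors = FALSE
)

cruelty <- offense_example[offense_example$ucr_offense_code == "animal cruelty", ]

# Keep one row per state-year-agency so each agency is counted once.
cruelty_agencies <- unique(cruelty[, c("state", "year", "ori")])

# Number of distinct agencies reporting any animal cruelty, by state and year.
agencies_reporting <- aggregate(ori ~ state + year,
                                data = cruelty_agencies,
                                FUN = length)
print(agencies_reporting)
```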

## Offense subtype

In addition to the broader crime committed, NIBRS also has a crime "subtype" variable that gives a bit more information about what crime occurred (the variable is technically called the "type of criminal activity"). This is especially useful for crimes where the crime category alone does not make clear exactly what happened. For example, for drug crimes we generally differentiate possession from sale or manufacturing, but NIBRS combines everything into "drug/narcotic violations" (crimes involving drug materials such as syringes are classified as "drug equipment violations"). So we need to use the subtype variable to figure out what type of drug crime it is. Looking at the subtype, we can see whether the offense involved "distributing/selling," "operating/promoting/assisting," "possessing/concealing," "transporting/transmitting/importing," or "using/consuming." There can be up to three subtypes per offense, so a single drug offense may involve both possessing and selling drugs. 
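
Because each offense can have up to three subtypes, here is a base R sketch that checks all three columns at once. Only `type_criminal_activity_1` appears in this chapter's code; the `_2` and `_3` column names and the toy rows are assumptions. In this sketch an offense counts as "possession only" if possession appears and selling appears in none of the three columns.

```r
# Toy drug offenses; the _2 and _3 column names are assumed.
drugs <- data.frame(
  type_criminal_activity_1 = c("possessing/concealing", "distributing/selling",
                               "possessing/concealing", NA),
  type_criminal_activity_2 = c("distributing/selling", NA, NA, NA),
  type_criminal_activity_3 = c(NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

subtype_cols <- c("type_criminal_activity_1",
                  "type_criminal_activity_2",
                  "type_criminal_activity_3")

# TRUE if any of the three subtype columns equals `value`; NA counts as no match.
has_subtype <- function(data, value) {
  unname(rowSums(data[subtype_cols] == value, na.rm = TRUE) > 0)
}

drugs$any_selling     <- has_subtype(drugs, "distributing/selling")
drugs$possession_only <- has_subtype(drugs, "possessing/concealing") &
  !drugs$any_selling
print(drugs[, c("any_selling", "possession_only")])
```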

There are also some unexpected subtypes attached to certain offenses. For example, a few dozen drug offenses also have the subtype "exploiting children." This subtype is generally used for cases where a child is abused, but here it appears on drug offenses that have no associated child abuse offense (and, for some of them, no other offense at all). I believe the reason is that these offenses occurred in public, so could have been seen by children, and were labeled as exploiting children for that reason. Or it may simply be a data entry error. If you are studying crimes against children, pulling from this variable would likely overcount crimes, so, as always, you should carefully check your data for oddities like this.^[Whether children viewing a crime, even a drug crime, would count as a crime against children would, of course, depend on your definition.]
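
One way to look for this oddity is to flag drug offenses with the "exploiting children" subtype that are the only offense in their incident. Below is a base R sketch on made-up rows; `incident_id` is a hypothetical stand-in for the combination of ORI, year, and incident number, and the labels are assumed to follow the lowercase style of this chapter's code.

```r
# Toy rows; incident IDs, labels, and values are hypothetical.
offense_example <- data.frame(
  incident_id = c("i1", "i2", "i2", "i3"),
  ucr_offense_code = c("drug/narcotic offenses - drug/narcotic violations",
                       "drug/narcotic offenses - drug/narcotic violations",
                       "assault offenses - simple assault",
                       "drug/narcotic offenses - drug/narcotic violations"),
  type_criminal_activity_1 = c("exploiting children", "exploiting children",
                               NA, "possessing/concealing"),
  stringsAsFactors = FALSE
)

# Number of offense rows in each row's incident.
offense_counts <- table(offense_example$incident_id)
offense_example$n_offenses <- as.vector(offense_counts[offense_example$incident_id])

# Drug offenses flagged as exploiting children with no other offense in the
# incident. %in% treats a missing subtype as "no match" rather than NA.
is_drug <- startsWith(offense_example$ucr_offense_code, "drug/narcotic offenses")
suspicious <- offense_example$incident_id[
  is_drug &
    offense_example$type_criminal_activity_1 %in% "exploiting children" &
    offense_example$n_offenses == 1
]
print(suspicious)
```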

This variable is only available for the subset of crimes below, and is not always present even for these crimes. Table \@ref(tab:offenseCrimeSubcategories) shows the breakdown of subtypes for each of these offenses, based on the first subtype reported. 

* Animal Cruelty
* Assault Offenses - Aggravated Assault
* Assault Offenses - Intimidation
* Assault Offenses - Simple Assault
* Commerce Violations - Federal Liquor Offenses
* Counterfeiting/Forgery
* Drug/Narcotic Offenses - Drug Equipment Violations
* Drug/Narcotic Offenses - Drug/Narcotic Violations
* Fugitive Offenses - Harboring Escapee/Concealing From Arrest
* Gambling Offenses - Gambling Equipment Violations
* Kidnapping/Abduction
* Murder/Nonnegligent Manslaughter
* Negligent Manslaughter
* Pornography/Obscene Material
* Robbery
* Sex Offenses - Fondling (Indecent Liberties/Child Molest)
* Sex Offenses - Rape
* Sex Offenses - Sexual Assault With An Object
* Sex Offenses - Sodomy
* Stolen Property Offenses (Receiving, Selling, Etc.)
* Weapon Law Violations - Explosives
* Weapon Law Violations - Violation of National Firearm Act of 1934
* Weapon Law Violations - Weapon Law Violations

```{r }
offenses_with_subtypes <- c("animal cruelty",
                            "assault offenses - aggravated assault",
                            "assault offenses - intimidation",
                            "assault offenses - simple assault",
                            "commerce violations - federal liquor offenses",
                            "counterfeiting/forgery",
                            "drug/narcotic offenses - drug equipment violations",
                            "drug/narcotic offenses - drug/narcotic violations",
                            "fugitive offenses - harboring escapee/concealing from arrest",
                            "gambling offenses - gambling equipment violations",
                            "kidnapping/abduction",
                            "murder/nonnegligent manslaughter",
                            "negligent manslaughter",
                            "pornography/obscene material",
                            "robbery",
                            "sex offenses - fondling (indecent liberties/child molest)",
                            "sex offenses - rape",
                            "sex offenses - sexual assault with an object",
                            "sex offenses - sodomy",
                            "stolen property offenses (receiving, selling, etc.)",
                            "weapon law violations - explosives",
                            "weapon law violations - violation of national firearm act of 1934",
                            "weapon law violations - weapon law violations" )

final <- data.frame()
offense$type_criminal_activity_1[is.na(offense$type_criminal_activity_1)] <- "None"
for (crime in offenses_with_subtypes) {
  temp <-
    offense %>% filter(ucr_offense_code %in% crime)
  if (nrow(temp) > 0) {
    temp <- make_frequency_table(temp,
                                 "type_criminal_activity_1",
                                 c("Crime Subcategory",
                                   "\\# of Offenses",
                                   "\\% of Offenses")) %>%
      mutate(Crime = capitalize_words(crime)) %>%
      select(`Crime`,
             `Crime Subcategory`,
             "\\# of Offenses",
             "\\% of Offenses")

    final <-
      final %>%
      bind_rows(temp)
  }
}

kableExtra::kbl(final, 
                #  format = "html",
                digits = 2, 
                align = c("l", "l", "r", "r"),
                #booktabs = TRUE, 
                longtable = TRUE,
                escape = TRUE,
                label = "offenseCrimeSubcategories",
                caption = "The number and percent of crime subtypes by offense, 2023. This breakdown is only available for a subset of offenses. There can be up to three subtypes per offense; in this table we only use the first subtype.") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))
```

## Offense completed

For each offense, this segment also tells you whether the offense was completed or only attempted. Nearly all offenses reported in NIBRS are completed offenses. This is likely in part because completed crimes are easier to detect than attempted crimes. For example, if someone breaks into your house you will likely discover that and alert the police. If someone tries to break in but fails (even something as minor as trying your front door to see if it is locked and then leaving could be considered attempted burglary), there is much less evidence, so it is less likely to come to the police's attention.

Some offenses, such as simple and aggravated assault or homicide, are only labeled as completed. This is because an attempted murder, for example, would be classified as aggravated assault. Since crimes in NIBRS are mutually exclusive, there cannot be both attempted murder and aggravated assault, so only aggravated assault is included. This does limit the data as it is important to know when an aggravated assault is done with the intent to kill the victim and when it is just to seriously harm the victim (though measuring this would likely be extremely imprecise since it requires knowing the motives of the offender).

Table \@ref(tab:offensesCompleted) shows the percent of each crime category in 2022 NIBRS data that was completed or was only attempted.

```{r }
temp <- offense %>%
  group_by(ucr_offense_code) %>%
  count(offense_attempted_or_completed)

completed <- temp[temp$offense_attempted_or_completed %in% "completed", ] %>%
  select(-offense_attempted_or_completed) %>%
  rename(completed = n)
attempted <- temp[temp$offense_attempted_or_completed %in% "attempted", ] %>%
  select(-offense_attempted_or_completed) %>%
  rename(attempted = n)

final <- completed %>%
  left_join(attempted, by = "ucr_offense_code") %>%
  ungroup() %>%
  replace_na(list(attempted = 0)) %>%
  mutate(total = completed + attempted,
         completed = round(completed / total * 100, 2),
         attempted = round(attempted / total * 100, 2)) %>%
  # Sort while the percents are still numeric; sorting the pasted strings
  # is alphabetical and would put "100 %" after "99.5 %".
  arrange(desc(completed)) %>%
  mutate(completed = paste(completed, "\\%"),
         attempted = paste(attempted, "\\%"),
         ucr_offense_code = capitalize_words(ucr_offense_code)) %>%
  select(-total)

names(final) <- c("Crime Category", "\\% Completed", "\\% Attempted")

kableExtra::kbl(final, 
                #format = "html",
                digits = 2, 
                align = c("l", "r", "r"),
                #booktabs = TRUE, 
                longtable = TRUE,
                label = "offensesCompleted",
                escape = TRUE,
                caption = "The percent of crimes completed or attempted, by crime category.") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))
```

In Figure \@ref(fig:nibrsOffenseCompleted) we see the share of all offenses per year that are reported as completed. In every year of data, nearly all offenses were reported as completed. 

```{r nibrsOffenseCompleted, fig.cap="The annual percent of offenses reported as completed, 1991-2023."}
nibrs_offense_summary_stats  %>%
  ungroup() %>%
  ggplot(aes(x = year, y = percent_offense_completed )) +
  geom_line(linewidth = 1.05) +
  xlab("Year") +
  ylab("% Completed") +
  theme_crim() +
  scale_y_continuous(labels = scales::percent, expand = c(0, 0), limits = c(0, NA)) +
  labs(color = "") +
  expand_limits(y = 0) +
  scale_x_continuous(breaks = time_series_x_axis_year_breaks)
```

## Drug, alcohol, or computer use

Intoxication, mainly by alcohol, is known to be a major correlate (and cause) of crime. Drunk people commit a lot of crime (even though most drunk people never commit crime). Drunk people are also easier targets, so certain offenders who want an easy victim choose them. NIBRS tries to capture this by telling us whether the offender is *suspected of using* drugs, alcohol, or "computer equipment," which also includes cell phones. For drugs we only know that it was "drugs," not which drug was involved, though we could look in the Property Segment to see what drug (if any) was seized by the police. Computer equipment is more relevant for certain crimes such as fraud or pornography/obscene materials. 

Each offense has three suspected-use variables, so an offender could be suspected of using all three. The data does not get any more specific than whether the offender is *suspected of using* these items, so we do not know how intoxicated they were or what they used the computer equipment for, just that they are suspected of using it. And "suspected" is key: we do not know for sure that they used it. If, for example, a victim says that their mugger was drunk, NIBRS will say the offender is suspected of using alcohol, even though there is no definitive proof such as a blood test or breathalyzer. Unlike offense subtype, which applies to only a subset of crimes, this variable is available for every crime. 
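
Because there are three suspected-use variables, looking only at the first one can undercount. Below is a base R sketch on made-up rows; only `offender_suspected_of_using_1` appears in this chapter's code, so the `_2` and `_3` column names and the value labels are assumptions.

```r
# Toy data; the _2 and _3 column names and the value labels are assumed.
uses <- data.frame(
  offender_suspected_of_using_1 = c("alcohol", "drugs/narcotics",
                                    "not applicable", "drugs/narcotics"),
  offender_suspected_of_using_2 = c(NA, "alcohol", NA, NA),
  offender_suspected_of_using_3 = c(NA, NA, NA, NA),
  stringsAsFactors = FALSE
)
use_cols <- paste0("offender_suspected_of_using_", 1:3)

# TRUE if any of the three columns equals `value`; NA counts as no match.
suspected_any <- function(data, value) {
  unname(rowSums(data[use_cols] == value, na.rm = TRUE) > 0)
}

alcohol_first_only <- uses$offender_suspected_of_using_1 %in% "alcohol"
alcohol_any        <- suspected_any(uses, "alcohol")

# The first column alone finds one alcohol offense; all three columns find two.
sum(alcohol_first_only)
sum(alcohol_any)
```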

Figure \@ref(fig:offenseDrugAlcoholComputer) shows the distribution of suspected usage for all offenses in 2022 NIBRS. For simplicity, it uses only the first suspected-use variable. The most common outcome is "Not Applicable," at 89% of offenses, which just means that the offender was not suspected of using drugs, alcohol, or computer equipment. 

```{r offenseDrugAlcoholComputer, fig.cap = "The distribution of drug, alcohol, or computer use for all offenses in 2022"}
offense %>%
  mutate(offender_suspected_of_using_1 = capitalize_words(offender_suspected_of_using_1)) %>%
  crimeutils::make_barplots("offender_suspected_of_using_1", count = FALSE, ylab = "% of Offenses") +
  ggplot2::scale_y_continuous(labels = scales::percent, expand = c(0, 0), limits = c(0, NA))
```

Figure \@ref(fig:offenseDrugAlcoholComputerAny) shows the distribution of suspected use when excluding "Not Applicable." Drugs are the most common thing offenders are suspected of using: in about 61% of offenses where the offender is suspected of using something (drugs, alcohol, or "computer equipment"), that something is drugs. Again, we do not know what type of drug was used, only that it was not alcohol. Alcohol follows at 30%, while computer equipment is only 6%. 

```{r offenseDrugAlcoholComputerAny, fig.cap = "The distribution of drug, alcohol, or computer use for offenses where there was usage of one of these items. For easier viewing of how this variable is distributed, this figure excludes all offenses where there was no drug, alcohol, or computer use or the variable was NA."}
offense %>%
  filter(offender_suspected_of_using_1 != "not applicable") %>%
  mutate(offender_suspected_of_using_1 = capitalize_words(offender_suspected_of_using_1)) %>%
  crimeutils::make_barplots("offender_suspected_of_using_1", count = FALSE, ylab = "% of Offenses") +
  ggplot2::scale_y_continuous(labels = scales::percent, expand = c(0, 0), limits = c(0, NA))
```
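
The percentages above use a smaller denominator: only offenses where the offender was suspected of using something. A base R sketch of that calculation on made-up values is below; the labels are assumptions, and, like the figure, it drops both "not applicable" and missing values before computing shares.

```r
# Toy first suspected-use values; labels are assumed.
suspected <- c("not applicable", "not applicable", "drugs/narcotics",
               "alcohol", "drugs/narcotics", "not applicable", NA)

# Drop "not applicable" and missing values before computing shares.
used <- suspected[!is.na(suspected) & suspected != "not applicable"]

# Percent of the remaining offenses by what the offender was suspected of using.
shares <- prop.table(table(used)) * 100
round(shares, 1)
```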

## Crime location

This dataset tells us where each crime happened, though it gives the type of location rather than the precise location (e.g., coordinates). Table \@ref(tab:offenseLocation) shows the different location types where each offense could occur, sorted from most to least common, and includes the first year each location was reported. Most locations have been part of the data since 1991, but there have been some changes, such as adding "Cyberspace" in 2009 and splitting "school/college" into "school - college/university" and "school - elementary/secondary" in 2009. 

The most common place for a crime to occur is in someone's own home, at 38% of offenses. This makes sense, as people spend a lot of time at home and certain crimes, such as burglary and domestic violence, commonly occur in the victim's home. Roads or alleys are the second most common location at 17%, and parking lots or garages follow at 10%. Each remaining location makes up 5% or fewer of offense locations. A careful reader may notice a problem with this table. 

Incidents can involve multiple offenses, and these offenses would likely, though not always, occur in the same location. So if certain locations are more likely to have multiple offenses per incident, then we will count those locations more often. That may be okay if what you're really interested in is data at the offense level rather than the more commonly used incident level. But it is important to be careful to measure the data correctly and present results clearly. 
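
A base R sketch of the difference is below, using made-up rows where `incident_id` is a hypothetical stand-in for ORI, year, and incident number. Keeping one row per incident-location pair counts each incident once per location; an incident whose offenses happened in two different locations would still count once for each.

```r
# Toy rows; incident IDs and location labels are hypothetical.
offense_example <- data.frame(
  incident_id   = c("i1", "i1", "i1", "i2", "i3"),
  location_type = c("residence/home", "residence/home", "residence/home",
                    "parking/drop lot/garage", "residence/home"),
  stringsAsFactors = FALSE
)

# Offense level: every offense row is counted.
offense_level <- table(offense_example$location_type)

# Incident level: keep one row per incident-location pair first.
incident_locations <- unique(offense_example[, c("incident_id", "location_type")])
incident_level <- table(incident_locations$location_type)

offense_level
incident_level
```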

```{r offenseLocation}

temp <- make_frequency_table(offense,
                             "location_type",
                             c("Crime Location",
                               "\\# of Offenses",
                               "\\% of Offenses")) %>%
  left_join(nibrs_location_first_year %>% 
              mutate(location_type = capitalize_words(location_type)) %>%
              rename(`Crime Location` = location_type)) %>%
  select(`Crime Location`,
         `First Year` = year,
         everything())
temp$`First Year`[is.na(temp$`First Year`)] <- "-"

kableExtra::kbl(temp, 
                #   format = "html",
                digits = 2, 
                align = c("l", "l", "r", "r"),
                #booktabs = TRUE, 
                longtable = TRUE,
                label = "offenseLocation",
                escape = TRUE,
                caption = "The location of crimes for all offenses reported in 2022.") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))
```

Keep in mind that some locations may be overly specific and fit well into a broader category that you are interested in. For example, if you care about crimes that happen at businesses, you would look at "Bank/Savings and Loan," "Restaurant," and "Bar/Nightclub," among other locations, which combined have far more offenses than any one individually. This is a recurring theme of NIBRS data: you have a lot of data, and some of it is so specific that you need to do extra work to aggregate it into the units you want.
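
A minimal base R sketch of that aggregation is below. The list of "business" locations is a hypothetical grouping for illustration, not an official category, and the labels are assumed to follow the lowercase style of this chapter's code.

```r
# Toy location values; labels are illustrative.
location_type <- c("bank/savings and loan", "restaurant", "bar/nightclub",
                   "residence/home", "restaurant")

# Hypothetical list of locations to treat as businesses.
business_locations <- c("bank/savings and loan", "restaurant", "bar/nightclub")

# Collapse detailed locations into a broader grouping.
location_group <- ifelse(location_type %in% business_locations,
                         "business", "other")
table(location_group)
```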

```{r}
locations <- unique(offense$location_type)
locations <- sort(locations)

location_offense <- 
  make_top_5_table(data = offense,
                   filter_variable = "location_type",
                   filter_values = locations,
                   filter_variable_name = "Crime Location",
                   other_variable = "ucr_offense_code",
                   other_variable_name = "Offense",
                   unit = "Offenses")

kableExtra::kbl(location_offense, 
                #   format = "html",
                digits = 2, 
                align = c("l", "l", "r", "r"),
                #booktabs = TRUE, 
                longtable = TRUE,
                label = "offenseLocationTopOffenses",
                escape = TRUE,
                caption = "The most common offenses for each crime location, 2023.") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))

```


## Weapons {#offenseWeapons}

Using a weapon during a crime can greatly increase the severity of the offense, as evidenced by increased sanctions for using a weapon (particularly a gun) and the enormous amount of attention on gun crimes from the media, the public, and researchers. Luckily, this data tells us the weapon used in certain offenses, and there can be up to three different weapon types included in an offense. The data does not provide a weapon for all offenses, just for the ones the FBI deems to be violent crimes that could involve a weapon. Please note that these are the weapons used in some capacity during the crime, not necessarily to harm the victim.^[The Victim Segment does have data on victim injuries, though it does not say which weapon caused the injuries.] For example, if a gun is involved in a crime, that gun may have been fired and missed the victim, fired and hit the victim, used to beat the victim, or merely brandished. From this data alone we do not know how it was used. 

The list of offenses where there is data on weapon usage is below:

* Assault Offenses - Aggravated Assault
* Assault Offenses - Simple Assault
* Extortion/Blackmail
* Human Trafficking - Commercial Sex Acts
* Human Trafficking - Involuntary Servitude
* Justifiable Homicide - Not A Crime
* Kidnapping/Abduction
* Murder/Nonnegligent Manslaughter
* Negligent Manslaughter
* Robbery
* Sex Offenses - Fondling (Indecent Liberties/Child Molest)
* Sex Offenses - Rape
* Sex Offenses - Sexual Assault With An Object
* Sex Offenses - Sodomy
* Weapon Law Violations - Explosives
* Weapon Law Violations - Violation of National Firearm Act of 1934
* Weapon Law Violations - Weapon Law Violations

Table \@ref(tab:offenseWeapon) shows the breakdown in the weapons used in 2022 data, by the offense type. This table aggregates data at the offense-level, meaning that an incident with two offenses that both involved a weapon would count that weapon twice. Depending on your use case you may want to aggregate data to the incident-level, such as by top-coding to the most serious weapon per incident. 
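
A minimal base R sketch of top-coding is below. The severity ranking is a hypothetical choice for illustration (NIBRS does not define one), and the weapon labels and `incident_id` values are made up; any weapon missing from the ranking becomes NA and is dropped by `aggregate()`.

```r
# Hypothetical severity ranking, most serious first; adjust to your own judgment.
weapon_rank <- c("handgun", "knife/cutting instrument",
                 "personal weapons", "none")

# Toy offense-level weapons; IDs and labels are illustrative.
weapons <- data.frame(
  incident_id = c("i1", "i1", "i2", "i2", "i3"),
  weapon      = c("personal weapons", "handgun",
                  "none", "knife/cutting instrument", "none"),
  stringsAsFactors = FALSE
)

# Position in the ranking; weapons missing from weapon_rank become NA,
# and aggregate() drops those rows by default.
weapons$rank <- match(weapons$weapon, weapon_rank)

# Keep the most serious (lowest-ranked) weapon within each incident.
most_serious <- aggregate(rank ~ incident_id, data = weapons, FUN = min)
most_serious$weapon <- weapon_rank[most_serious$rank]
most_serious
```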

```{r }
offenses_with_weapons <-
  c("assault offenses - simple assault",
    "assault offenses - aggravated assault",
    "weapon law violations - weapon law violations",
    "sex offenses - sexual assault with an object",
    "sex offenses - rape",
    "robbery",
    "sex offenses - sodomy",
    "murder/nonnegligent manslaughter",
    "kidnapping/abduction",
    "sex offenses - fondling (indecent liberties/child molest)",
    "human trafficking - commercial sex acts",
    "negligent manslaughter",
    "extortion/blackmail",
    "justifiable homicide - not a crime",
    "human trafficking - involuntary servitude",
    "weapon law violations - explosives",
    "weapon law violations - violation of national firearm act of 1934")


final <- data.frame()
offense$type_weapon_force_involved_1[is.na(offense$type_weapon_force_involved_1)] <- "Unknown"
offense$type_weapon_force_involved_1[offense$type_weapon_force_involved_1 %in% 16:17] <- "Unknown"
for (crime in offenses_with_weapons) {
  temp <-
    offense %>% filter(ucr_offense_code %in% crime)
  if (nrow(temp) > 0) {
    temp <- make_frequency_table(temp,
                                 "type_weapon_force_involved_1",
                                 c("Weapon",
                                   "\\# of Offenses",
                                   "\\% of Offenses")) %>%
      mutate(Crime = capitalize_words(crime)) %>%
      select(`Crime`,
             `Weapon`,
             "\\# of Offenses",
             "\\% of Offenses")
    final <-
      final %>%
      bind_rows(temp)
  }
}

kableExtra::kbl(final, 
                # format = "html",
                digits = 2, 
                align = c("l", "l", "r", "r"),
                #booktabs = TRUE, 
                longtable = TRUE,
                label = "offenseWeapon",
                escape = TRUE,
                caption = "The weapon used by an offender, by offense, 2023. Use means that the weapon was part of the crime, though it may not have been physically discharged. For example, pointing a gun at someone without firing it still counts as using it.") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))
```

We can use this dataset to look at, for example, trends in the type of weapon used in murders and nonnegligent manslaughters over time, as seen in Figure \@ref(fig:nibrsMurdersWeapon). Guns are the most common weapon, used in over 60% of murders in most years. Most of these guns are handguns, with about 35% of all murders involving a handgun. Other weapons are far less common, making up fewer than 20% of offenses in most years. Different agencies report each year, so differences in trends may simply be due to different agencies being in the data. For your own analysis you will need to be far more careful than the figure shown here.

```{r nibrsMurdersWeapon, fig.cap="The annual percent of murders and nonnegligent manslaughters, by offender weapon, 1991-2023."}
nibrs_offense_summary_stats  %>%
  ggplot(aes(x = year)) +
  geom_line(aes(y = percent_murders_gun, color = "Gun"), linewidth = 1.05) +
  geom_line(aes(y = percent_murders_knife, color = "Knife"), linewidth = 1.05) +
  geom_line(aes(y = percent_murders_other_weapon, color = "Other"), linewidth = 1.05) +
  geom_line(aes(y = percent_murders_unarmed, color = "Unarmed"), linewidth = 1.05) +
  geom_line(aes(y = percent_murders_handgun, color = "Handgun"), linewidth = 1.05) +
  xlab("Year") +
  ylab("% of Murders") +
  theme_crim() +
  scale_y_continuous(labels = scales::percent, expand = c(0, 0), limits = c(0, NA)) +
  labs(color = "") +
  expand_limits(y = 0) +
  scale_color_manual(values = c("Gun" = "#a6cee3",
                                "Knife" = "#1f78b4",
                                "Other" = "#b2df8a",
                                "Unarmed" = "#33a02c",
                                "Handgun" = "black")) +
  scale_x_continuous(breaks = time_series_x_axis_year_breaks)
```

### Automatic weapons

When the weapon involved was a firearm, there is a variable which indicates whether the firearm was fully automatic. To be clear, this means that when you pull the trigger once, the gun fires multiple bullets. Semi-automatic firearms are **not** automatic firearms. Of course, saying a gun is fully automatic requires either the police seizing the gun or the gun being fired (and witnesses accurately determining that it is fully automatic). Since many crimes are never solved (and even those that lead to an arrest may not lead to the gun being seized^[Though some guns are seized even without an arrest, such as if the gun is left at the crime scene.]), this variable is likely imprecise. Still, Figure \@ref(fig:offenseAutomaticWeapon) shows the annual percent of firearms used in offenses that are reported to be fully automatic. Even though there can be up to three weapons used in an offense, this figure only looks at the first weapon. The most common guns to be automatic are rifles and handguns, both with about 5% of all uses being of an automatic weapon. The remaining categories are all under 3% of uses. 

```{r offenseAutomaticWeapon, fig.cap = "The percent of firearms used that were fully automatic, for all offenses, 1991-2023."}
nibrs_offense_summary_stats  %>%
  ggplot(aes(x = year)) +
  geom_line(aes(y = percent_rifle_automatic, color = "Rifle"), linewidth = 1.05) +
  geom_line(aes(y = percent_shotgun_automatic, color = "Shotgun"), linewidth = 1.05) +
  geom_line(aes(y = percent_handgun_automatic, color = "Handgun"), linewidth = 1.05) +
  geom_line(aes(y = percent_firearm_type_not_stated_automatic, color = "Firearm\n (type not\n stated)"), linewidth = 1.05) +
  geom_line(aes(y = percent_other_firearm_automatic, color = "Other\n Firearm"), linewidth = 1.05) +
  xlab("Year") +
  ylab("% Automatic Weapon") +
  theme_crim() +
  scale_y_continuous(labels = scales::percent, expand = c(0, 0), limits = c(0, NA)) +
  labs(color = "") +
  expand_limits(y = 0) +
  scale_color_manual(values = c("Rifle" = "#1b9e77",
                                "Shotgun" = "#d95f02",
                                "Handgun" = "#7570b3",
                                "Firearm\n (type not\n stated)" = "#1f78b4",
                                "Other\n Firearm" = "black")) +
  scale_x_continuous(breaks = time_series_x_axis_year_breaks)


```

## Burglary info

For burglary offenses there are two variables that provide a little more information on the offense. The first is the number of "premises" that the burglar entered. This is only available when the location of the offense is a hotel/motel or a rental storage facility, so a "premise" is best thought of as a room or unit within the building rather than a separate hotel or facility. Figure \@ref(fig:offensePremisesEntered) shows the distribution of the number of premises entered during a burglary incident. The graph is capped at ten or more for simplicity, though in the data itself the number can be higher. The vast majority of hotel/motel and storage facility burglaries involve only one room, with 87% of these burglaries entering a single room. This drops sharply to 5% entering two rooms and then by more than half to 2% entering three rooms, and the decline continues as the number of rooms increases.
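
The cap at ten can be sketched in a few lines of base R. This is a minimal sketch that assumes the count arrives as a number (or a numeric string); converting to numeric first matters because comparing character values with `>=` is alphabetical, so "2" would otherwise sort after "10".

```r
# Collapse counts of premises entered into "1" ... "9" and "10 or more",
# then compute the share of burglaries in each bucket.
premises_distribution <- function(premises) {
  premises <- as.numeric(premises)
  premises <- premises[!is.na(premises)]
  bucket <- ifelse(premises >= 10, "10 or more", as.character(premises))
  bucket <- factor(bucket, levels = c(1:9, "10 or more"))
  prop.table(table(bucket))
}

premises_distribution(c(1, 1, 1, 2, 14, NA))
```

In this toy input, three of the five non-missing burglaries entered one premise (60%), and the single burglary with 14 premises falls into the "10 or more" bucket.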

```{r offensePremisesEntered, fig.cap = "The distribution in the number of premises entered during burglaries, 2023. This information is only available for burglaries in a hotel/motel or a rental storage facility."}
# Convert to numeric first so the cap is a numeric (not alphabetical)
# comparison, then collapse values of ten or more into one category
number_of_premises <- as.numeric(as.character(offense$number_of_premises_entered))
offense$number_of_premises_entered <- ifelse(number_of_premises >= 10,
                                             "10 or more",
                                             as.character(number_of_premises))
offense$number_of_premises_entered <- factor(offense$number_of_premises_entered, 
                                             levels = rev(c(1:9, "10 or more")))

offense %>%
  filter(ucr_offense_code %in% "burglary/breaking and entering",
         !is.na(number_of_premises_entered)) %>%
  ggplot(aes(x = number_of_premises_entered)) + 
  # after_stat() replaces the deprecated aes_string("(..count..)/sum(..count..)")
  geom_bar(aes(y = after_stat(count / sum(count)))) + 
  coord_flip() +
  xlab("# of Premises") +
  ylab("% of Burglaries") +
  theme_crim() +
  scale_y_continuous(labels = scales::percent, 
                     expand = c(0, 0))
```

The second variable, which is available for every burglary regardless of location, says whether the burglar entered the building forcibly. A burglary counts as no force only when the burglar enters *only* through unlocked doors or windows. The *only* matters: if the burglar enters through an unlocked door or window and then forces open another door or window, the entire burglary is classified as forcible entry. Forcible entry is any entry in which the burglar gets past a locked door or window *through any means of entering*. This is very broad and includes actions ranging from breaking a window, which is what people generally think of as forcible entry, to less obvious uses of force such as picking the lock or even using a passcard (e.g. a hotel room card) to unlock the door. The FBI also counts as forcible entry cases where the burglar enters a building legally and then stays past the allowed time, such as walking into a store and hiding until after closing.
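
The "only" rule amounts to an "any" check across entry points. NIBRS records just the final force/no force classification, so the per-entry-point input below is a hypothetical representation used only to illustrate the rule; `classify_burglary_entry()` is not a function from any package.

```r
# A burglary is forcible entry if force was used at ANY point of entry;
# it is "no force" only if every entry point was unlocked and unforced.
classify_burglary_entry <- function(entries_forced) {
  if (length(entries_forced) == 0) {
    return(NA_character_)
  }
  forced <- any(entries_forced)
  # any() returns NA when there is no TRUE but some values are unknown
  if (is.na(forced)) NA_character_ else if (forced) "force" else "no force"
}

# Entered through an unlocked door, then forced open an interior door
classify_burglary_entry(c(FALSE, TRUE))   # "force"
# Entered only through an unlocked window
classify_burglary_entry(FALSE)            # "no force"
```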

Figure \@ref(fig:nibrsBurglaryForce) shows the annual share of burglaries with and without force. Nearly all burglaries at the start of our data used force, and that share has steadily declined to fewer than 60% in 2022. However, this trend is likely affected by differences in reporting depending on whether force was used. For example, consider two burglaries in which the victim does not notice any property stolen. If you come home and find your front door kicked in, you'll almost certainly call the police, regardless of whether anything was taken. But if you come home to find the door merely unlocked and notice nothing stolen, you may chalk it up to forgetting to lock the door and never alert the police.
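
A quick calculation shows how differential reporting alone can distort the observed share. All of the rates here are hypothetical, chosen only to illustrate the mechanism, and are not estimates from any data.

```r
# Observed share of forcible burglaries when forcible and non-forcible
# burglaries are reported to police at different rates.
observed_force_share <- function(true_force_share,
                                 report_rate_force,
                                 report_rate_no_force) {
  reported_force    <- true_force_share * report_rate_force
  reported_no_force <- (1 - true_force_share) * report_rate_no_force
  reported_force / (reported_force + reported_no_force)
}

# Hypothetical: half of all burglaries use force, but forcible ones are
# reported 90% of the time versus 50% for non-forcible ones
observed_force_share(0.5, 0.9, 0.5)   # about 0.64
```

Because only reported burglaries can appear in NIBRS, a change over time in these reporting rates could move the observed share even if the true share stayed constant.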

```{r nibrsBurglaryForce, fig.cap="The annual percent of burglaries, by whether entry used force, 1991-2023."}
nibrs_offense_summary_stats  %>%
  ggplot(aes(x = year)) +
  geom_line(aes(y = percent_burglary_force, color = "Force"), linewidth = 1.05) +
  geom_line(aes(y = percent_burglary_no_force, color = "No Force"), linewidth = 1.05) +
  xlab("Year") +
  ylab("% of Burglaries") +
  theme_crim() +
  scale_y_continuous(labels = scales::percent, expand = c(0, 0), limits = c(0, NA)) +
  labs(color = "") +
  expand_limits(y = 0) +
  scale_color_manual(values = c("Force" = "#1b9e77",
                                "No Force" = "#d95f02")) +
  scale_x_continuous(breaks = time_series_x_axis_year_breaks)
```

## Hate crime indicator (bias motivation)

For each offense, NIBRS indicates whether it had a bias motivation, which is NIBRS's way of saying whether it was a hate crime. An offense is considered a hate crime when the police have some evidence that it was motivated, at least in part, by the offender's bias against the victim's group (e.g. race, religion, or sexual orientation). Since not all hate crimes leave evidence of bias (e.g. a person targeted because of bias whose offender gives no indication of that motive), many hate crimes will likely not be reported as such. The FBI classifies hate crimes in NIBRS the same way as in the Hate Crime dataset discussed in detail in Chapter \@ref(hate_crimes). For more information on how hate crimes are defined and important caveats with these data, please read that chapter.

Table \@ref(tab:offenseBiasMotivation) shows the percent of all incidents in 2023 that were classified with or without a bias motivation. Nearly all incidents (99.9%) have no bias motivation or an unknown bias motivation, meaning that they are not considered hate crimes.

```{r }
temp <- offense
temp <- 
  temp %>%
  distinct(unique_incident_id, .keep_all = TRUE)
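# Collapse bias motivations into a binary hate crime indicator; "no bias",
# "unknown bias", and missing values are all treated as not a hate crime
temp$bias_motivation_binary <- "bias motivation"
temp$bias_motivation_binary[is.na(temp$bias_motivation) |
                              temp$bias_motivation %in% c("no bias motivation",
                                                          "unknown bias motivation")] <- "no bias motivation"
temp <- make_frequency_table(temp,
                             "bias_motivation_binary",
                             c("Bias Motivation",
                               "\\# of Incidents",
                               "\\% of Incidents"))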
kableExtra::kbl(temp, 
                #  format = "html",
                digits = 2, 
                align = c("l", "l", "r", "r"),
                #booktabs = TRUE, 
                longtable = TRUE,
                escape = TRUE,
                label = "offenseBiasMotivation",
                caption = "The number and percent of incidents with and without a known bias motivation, for all incidents reported in 2023.") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))
```

Table \@ref(tab:offenseBiasMotivationBiases) breaks down hate crime incidents by bias motivation. The most common bias motivation is anti-Black, which accounts for 31% of all hate crimes in the data. This is followed by anti-White at 10% and "anti-male homosexual (gay)" at almost 9%. The only other biases that make up more than 5% of hate crimes are anti-Jewish, anti-Hispanic, and "anti-Lesbian, Gay, Bisexual, Or Transgender (Mixed Group)."^[Looking at raw percents is a rather naive measure, as it assumes that all groups face equal risk of hate crimes. Certain groups, such as Jews and transgender people, make up a relatively small share of hate crimes but, relative to their share of the overall population (itself only a slightly better measure, since total population does not capture the true opportunity to be victimized), are victimized at much higher rates than many other groups.]
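
The difference between a group's share of hate crimes and its per-capita rate can be made concrete with a small calculation. The group names, incident counts, and populations below are entirely hypothetical and are not drawn from NIBRS or Census data.

```r
# Compare each group's share of hate crimes with its rate per 100,000 people.
# Assumes `incidents` and `population` list the groups in the same order.
hate_crime_rates <- function(incidents, population) {
  data.frame(
    group = names(incidents),
    share_of_hate_crimes = incidents / sum(incidents),
    rate_per_100k = incidents / population * 100000,
    row.names = NULL
  )
}

hate_crime_rates(
  incidents  = c(group_a = 1000, group_b = 200),
  population = c(group_a = 40e6, group_b = 2e6)
)
```

In this example group_b accounts for only about 17% of the hypothetical hate crimes, yet its rate (10 per 100,000) is four times that of group_a (2.5 per 100,000).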

Some of these groups are also subsets of larger groups. For example, anti-Muslim, anti-Arab, and anti-Sikh hate crimes likely reflect much the same underlying bias (while Sikhs are neither Muslim nor Arab, some Sikhs have been targeted by people who incorrectly believe that they are). Likewise, attacks on LGBT people are spread across multiple categories, which allows for a more detailed understanding of these hate crimes but requires aggregation to study them as a group. While this aggregation is easy to do, accidentally missing any of the subcategories could substantially undercount offenses against the larger group.
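
One way to make this aggregation less error-prone is an explicit lookup table plus a check for categories that look related but were left out of it. This is a minimal sketch: the category strings in `bias_groups` are assumptions and should be compared against `unique(offense$bias_motivation)` before use, and `group_biases()` and `check_unmapped()` are hypothetical helpers, not functions from any package.

```r
# Hypothetical mapping from detailed bias categories to broader groups.
# Check these strings against the actual values in the data.
bias_groups <- c(
  "anti-male homosexual (gay)"                                  = "anti-LGBT",
  "anti-female homosexual (lesbian)"                            = "anti-LGBT",
  "anti-lesbian, gay, bisexual, or transgender (mixed group)"   = "anti-LGBT",
  "anti-bisexual"                                               = "anti-LGBT",
  "anti-transgender"                                            = "anti-LGBT",
  "anti-gender non-conforming"                                  = "anti-LGBT",
  "anti-islamic (muslim)"                                       = "anti-Muslim/Arab/Sikh",
  "anti-arab"                                                   = "anti-Muslim/Arab/Sikh",
  "anti-sikh"                                                   = "anti-Muslim/Arab/Sikh"
)

# Replace detailed categories with their broader group; categories not in
# the lookup keep their original label
group_biases <- function(bias, lookup = bias_groups) {
  grouped <- unname(lookup[bias])
  unmatched <- is.na(grouped)
  grouped[unmatched] <- bias[unmatched]
  grouped
}

# List categories matching a keyword pattern that the lookup does not cover,
# to catch subcategories that would otherwise be silently missed
check_unmapped <- function(bias, pattern, lookup = bias_groups) {
  candidates <- unique(bias[grepl(pattern, bias, ignore.case = TRUE)])
  setdiff(candidates, names(lookup))
}

group_biases(c("anti-arab", "anti-black or african american", "anti-transgender"))
check_unmapped(c("anti-gay", "anti-transgender"), "gay|transgender")
```

The second call returns `"anti-gay"`, flagging a label that matches the pattern but has no entry in the lookup, which is exactly the kind of omission that would undercount the larger group.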

```{r }
temp <- offense
temp <- 
  temp %>%
  distinct(unique_incident_id, .keep_all = TRUE)
temp <- make_frequency_table(temp[!temp$bias_motivation %in% c("no bias motivation",
                                                               "unknown bias motivation") &
                                    !is.na(temp$bias_motivation),],
                             "bias_motivation",
                             c("Bias Motivation",
                               "\\# of Incidents",
                               "\\% of Incidents")) %>%
  left_join(bias_motivation_first_year  %>% 
              mutate(bias_motivation = capitalize_words(bias_motivation)) %>%
              rename(`Bias Motivation` = bias_motivation)) %>%
  select(`Bias Motivation`,
         `First Year` = year,
         everything())
temp$`First Year`[is.na(temp$`First Year`)] <- "-"

kableExtra::kbl(temp, 
                #  format = "html",
                digits = 2, 
                align = c("l", "l", "r", "r"),
                #booktabs = TRUE, 
                longtable = TRUE,
                label = "offenseBiasMotivationBiases",
                escape = TRUE,
                caption = "The bias motivation (i.e. if it was a hate crime and what type of hate crime) for all incidents reported in 2023 that were classified as hate crimes.") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))
```

Even as the number of agencies reporting to NIBRS has increased over time, the share of offenses considered hate crimes has remained fairly steady, as shown in Figure \@ref(fig:nibrsOffenseBias), with no year having more than 0.1% of offenses considered hate crimes.
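
The annual percent in the figure is read from a precomputed summary file, so exactly how it was built is not shown here. A plausible sketch, assuming one row per offense and treating no bias, unknown bias, and missing values as not a hate crime, is:

```r
# Annual share of offenses with a known bias motivation. "No bias",
# "unknown bias", and missing values all count as not a hate crime.
annual_bias_share <- function(year, bias_motivation) {
  is_hate_crime <- !is.na(bias_motivation) &
    !bias_motivation %in% c("no bias motivation", "unknown bias motivation")
  # The mean of a logical vector is the proportion of TRUE values
  tapply(is_hate_crime, year, mean)
}

annual_bias_share(
  year = c(2022, 2022, 2023, 2023),
  bias_motivation = c("no bias motivation", "anti-black or african american",
                      NA, "unknown bias motivation")
)
```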

```{r}
biases <- unique(offense$bias_motivation)
biases <- biases[!biases %in% c("no bias motivation",
                                "unknown bias motivation")]
biases <- sort(biases)

final <- data.frame()
for (bias in biases) {
  # Count each incident-offense once for this bias motivation; the last row
  # returned by make_frequency_table() is the total
  temp <- make_frequency_table(offense %>%
                                 filter(bias_motivation %in% bias) %>%
                                 distinct(unique_incident_id,
                                          bias_motivation,
                                          ucr_offense_code),
                               "ucr_offense_code",
                               c("Offense",
                                 "\\# of Offenses",
                                 "\\% of Offenses")) %>%
    mutate(bias = capitalize_words(bias)) %>%
    select(bias,
           Offense,
           "\\# of Offenses",
           "\\% of Offenses")
  
  # Split off the total row so it stays at the bottom
  temp_final <- temp[nrow(temp), ]
  temp <- temp[-nrow(temp), ]
  
  # Keep the first five offenses and collapse the rest into "All Other".
  # The guard avoids NA rows when a bias has five or fewer offense types.
  if (nrow(temp) > 5) {
    temp_rest <- temp[6:nrow(temp), ]
    temp_other <- data.frame(bias = capitalize_words(bias),
                             Offense = "All Other",
                             count = sum(parse_number(temp_rest[["\\# of Offenses"]])),
                             percent = sum(parse_number(temp_rest[["\\% of Offenses"]]))) %>%
      mutate(count = prettyNum(count, big.mark = ","),
             percent = paste0(round(percent, 2), "\\%")) %>%
      rename(`\\# of Offenses` = count,
             `\\% of Offenses` = percent)
    temp <- bind_rows(temp[1:5, ], temp_other)
  }
  
  final <-
    final %>%
    bind_rows(temp,
              temp_final)
}


kableExtra::kbl(final, 
                #  format = "html",
                digits = 2, 
                align = c("l", "l", "r", "r"),
                #booktabs = TRUE, 
                longtable = TRUE,
                escape = TRUE,
                label = "offenseBiasOffense",
                caption = "The number and percent of offenses by bias motivation, 2023. ") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))
```


```{r nibrsOffenseBias, fig.cap="The annual percent of offenses reported as having a bias motivation (i.e. hate crime), 1993-2023."}
nibrs_offense_summary_stats  %>%
  filter(year > 1992) %>%
  ggplot(aes(x = year, y = percent_with_bias )) +
  geom_line(linewidth = 1.05) +
  xlab("Year") +
  ylab("% With Bias Motivation") +
  theme_crim() +
  scale_y_continuous(labels = scales::percent, expand = c(0, 0), limits = c(0, NA)) +
  labs(color = "") +
  expand_limits(y = 0)
```