
Future Year Population

Alex Bettinardi edited this page Apr 24, 2019 · 27 revisions

Introduction

Developing the Synthetic Population control files and seed data for a base or existing year is a relatively straightforward exercise. Near-current data (usually just two years back) is available from census.gov, and a user just needs to associate the census data of interest with the appropriate zones or geographies for the Synthetic Population (there's a little more to it than that, but at a high level it's a fairly simple process). For future years, there is no official data set to use. Therefore, a methodology and process need to be established to develop the control files and seed data for future years.

One foundational element of the future year population is the total population. In Oregon, the Population Research Center (PRC) at Portland State University (PSU) is responsible for forecasting the population into the future for all counties, cities and MPOs in Oregon. Unfortunately, model boundaries usually do not perfectly align with political boundaries, so it is very rare that PRC future totals can be used exactly as provided. But the PRC forecasts do provide the basis for Oregon models' future year population totals.

PRC provides those future year populations in various time steps and by age and gender demographics. Because PRC is a trusted source for both total population and population by age and gender, ODOT is working to use that information to properly "age" model populations into the future. However, it's not as simple as just providing new age controls to the Population Synthesizer. All the controls need to be consistent. As an example, older households generally have fewer jobs, fewer persons, and fewer children. If only the age control is updated, but the total number of children stays the same, the controls are not consistent, and the population synthesizer will do a poor job of matching all the controls provided. Therefore, the process of aging the population into the future has several steps, which are covered further in the following section.

Steps To Adjust PUMS Weights to Future Age Distribution

As introduced above, building a future population should include aging the population. Aging the population requires work to ensure that all the population synthesis controls are consistent. Here are the steps that have been applied for the ABM to ensure that the controls are consistent and will produce the intended synthetic population output for the future year ABM:

  • Base year controls are provided including base year weighted PUMS seed
  • An iterative proportional fitting (IPF) function is applied to the original PUMS weights to create a starting point where the PUMS seed weights produce the ABM's base year age distribution
  • New (Future) age distribution is provided
  • The IPF function is used again, this time to adjust the PUMS weights from the base year age distribution to the future year age distribution
  • The adjusted PUMS weights now sum to the future year age distribution. The re-weighted PUMS records make it possible to tabulate any demographic summary (workers, children, income, occupation) for the local area's residents, re-weighted to align with an aged population for the area. This gives a method to assess how the aged population could change other demographics and a consistent way to project those other census level demographic controls.

Each of these steps is now discussed in greater detail.

Base Year Controls and Seed Data

The process begins with reading in base year level controls, including the raw PUMS data. The PUMS data is then processed (adding ABM specific fields, like the 6 occupation types) in the exact way that it is processed prior to population synthesis, giving access to all the fields used to control the population synthesis. Part of processing also includes filtering down to just the PUMAs for the region.
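As a rough illustration of the filtering and field-adding step, here is a minimal sketch that assumes each PUMS record is held as a dictionary. `SERIALNO`, `PUMA`, and `WGTP` follow PUMS field names, and the PUMA codes are the three Southern Oregon PUMAs named later on this page. The `prepare_seed` function, its `derive_fields` hook, and the toy records are hypothetical and are not taken from the ABM's actual scripts.

```python
# PUMAs covering the model region (the three Southern Oregon PUMAs named on this page).
REGION_PUMAS = {800, 901, 902}


def prepare_seed(records, region_pumas=REGION_PUMAS, derive_fields=None):
    """Keep only records in the region's PUMAs and add model-specific fields.

    derive_fields is a hypothetical hook: a function that takes one record
    and returns a dict of extra fields (for example an occupation group).
    """
    seed = []
    for rec in records:
        if rec["PUMA"] not in region_pumas:
            continue  # record is outside the model region
        out = dict(rec)  # copy so the raw record is left untouched
        if derive_fields is not None:
            out.update(derive_fields(rec))
        seed.append(out)
    return seed


# Toy records; real PUMS files carry many more fields.
raw = [
    {"SERIALNO": 20, "PUMA": 901, "WGTP": 12},
    {"SERIALNO": 49, "PUMA": 1305, "WGTP": 30},
    {"SERIALNO": 66, "PUMA": 800, "WGTP": 18},
]
seed = prepare_seed(raw, derive_fields=lambda rec: {"IN_REGION": True})
```

Copying each record before adding fields keeps the raw weights available for the later comparisons against adjusted weights.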

While not required, the process has some initial review steps and plots for understanding the different characteristics of the older population (greater than 65 years old) and those less than 65 years old. This review and plotting is done with the original PUMS weights before any adjusting is done, and provides some insight into trends to expect. Specifically, the following distributions differ for older (greater than 65) populations:

  • Household Type shifts to mobile home from multifamily and duplex (single family is roughly the same).
  • Household size shifts to 1 and 2 person households
  • Household income decreases
  • The number of zero worker households increases
  • The number of zero children households increases
  • Jobs decrease across the board, but specifically blue collar and "Natural Resources, Construction, and Maintenance" occupations get the biggest decreases

Adjust PUMS Weights to Base Year Age Distribution

This may be considered an optional step. Overall, the process re-weights PUMS data to the future year age distribution, and it could be applied directly to the original (raw) PUMS weights. However, the original PUMS weights don't align perfectly with the base year age distribution that has been specified. This could be because PUMAs don't nest perfectly within the model boundary, because the model age distribution is established separately from the census data, and/or because there are ranges of uncertainty around any of the ACS data within the PUMS records. All of these contribute to the slight differences between the age distribution produced by the original PUMS weights and the age distribution input to the population synthesizer for the base year. Because of these differences, the first step is to apply the IPF process to shift the PUMS weights to match the base year age distribution, before the IPF is used to shift the PUMS weights further to match the future year age distribution.

In either case (for the base year age distribution or the future year age distribution) the steps to IPF (adjust) the PUMS weights are the same:

  1. An Age category field is added to the PUMS records to bin each person record into the 12 age bins used in the ABM's population synthesis.
  2. A table is created where each row is an individual PUMS household ID and the columns are the 12 age categories. So every household record has a count of how many persons fall in each age bin.
     HHID  AGE1  AGE2  AGE3  AGE4  AGE5  AGE6  AGE7  AGE8  AGE9  AGE10  AGE11  AGE12
     20    1     2     0     0     0     0     2     0     0     0      0      0
     49    0     3     1     0     0     0     2     0     0     0      0      0
     66    0     0     0     0     0     0     0     0     0     1      1      0
     78    0     0     0     0     0     0     0     0     1     0      0      0
     86    0     0     0     0     0     0     0     0     0     1      0      0
     98    0     0     0     0     0     0     0     0     0     2      0      0
  3. The starting household weights are saved out.
  4. The IPF control is set, which is the total number of person records per bin (meaning that the column sums of the table above, weighted by the household weights, need to sum to the control given).
  5. A while loop is then run. In each pass of the loop, the change needed for each of the 12 age categories is calculated as a factor. Because a given household weight can apply to multiple age categories, the factor for each age category is repeated for every person in the household in that category, and then an average is taken. For household ID 20 in the table above, this means that the factor for Age1 is included once, Age2 twice, and Age7 twice. So the factor applied to HHID 20 = (Age1_Factor + Age2_Factor + Age2_Factor + Age7_Factor + Age7_Factor) / 5 persons
  6. The while loop is iterated until the adjusted household weights times the persons in each age category, summed by age category, equal the global control totals for the 12 age categories within a set tolerance, or until a maximum number of iterations is hit. For the base year, it takes 25 iterations of this loop for all 12 age category totals to be within 0.01% of the total number of people specified per age category.
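The loop described in the steps above can be sketched as follows. This is a minimal illustration, not the ABM's own script: the function name, the starting weights of 10, and the control totals are assumptions, and each age category factor is assumed to be the control total divided by the current weighted total. The household rows are the six example rows from the table above.

```python
def ipf_age_weights(age_table, start_weights, controls, tol=1e-4, max_iter=1000):
    """Adjust household weights so weighted persons per age bin match controls.

    age_table:     {household_id: [persons in each age bin]}
    start_weights: {household_id: starting weight}
    controls:      [target number of persons in each age bin]
    Returns (adjusted_weights, iterations_used).
    """
    weights = dict(start_weights)  # copy, so the starting weights stay saved
    bins = range(len(controls))
    iterations = 0
    while True:
        # Weighted person total per age bin under the current weights.
        totals = [sum(weights[hh] * row[b] for hh, row in age_table.items())
                  for b in bins]
        # Stop once every bin is within tolerance, or the iteration cap is hit.
        converged = all(abs(totals[b] - controls[b]) <= tol * controls[b]
                        for b in bins)
        if converged or iterations >= max_iter:
            return weights, iterations
        # Change needed per age bin, as a factor. Bins with no seed persons
        # cannot be scaled, so their factor is left at 1.
        factors = [controls[b] / totals[b] if totals[b] > 0 else 1.0
                   for b in bins]
        for hh, row in age_table.items():
            persons = sum(row)
            if persons:
                # Average of the bin factors, counted once per person.
                weights[hh] *= sum(factors[b] * row[b] for b in bins) / persons
        iterations += 1


# Example rows from the table above (AGE1..AGE12 per household ID).
age_table = {
    20: [1, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0],
    49: [0, 3, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0],
    66: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
    78: [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    86: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    98: [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0],
}
start = {hh: 10.0 for hh in age_table}  # hypothetical starting weights
# Hypothetical controls: 20% fewer persons in the younger bins, 20% more in the older.
controls = [8, 40, 8, 0, 0, 0, 32, 0, 12, 48, 12, 0]

adjusted, iterations = ipf_age_weights(age_table, start, controls)
```

This toy case converges after a single pass, because every person in a given household needs the same factor (0.8 for households 20 and 49, 1.2 for the rest). Real seed data mixes age categories within households, so the averaged factor only partly corrects each bin and more passes are needed, which is consistent with the 25 iterations reported above.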

Future Year Controls (Age Category Totals)

This step simply reads in the desired future year totals by age category, providing the IPF with the controls needed to rebalance the PUMS weights to the future year (aged) populations by age category.

Adjust PUMS Weights to Future Year Age Distribution

This process is identical to the steps for the base year, except that the starting point is now the weights that were adjusted for the base year age distribution, as opposed to raw PUMS weights, and the controls are now the future age distribution. In this SOABM example, it took 30 iterations of the IPF while loop for all 12 age category totals to be within 0.01% of the total number of people specified per age category.

Resulting Future Population Outcomes

The adjusted (aged) PUMS data for the model area provides the relationship needed to grow the future population controls in a way that is consistent with the demographics of the region. The following summarizes how each of the population synthesizer controls for the ABM was impacted by using the aged PUMS weights.

Total Households

As discussed in the introduction, the population total by age is the one "known" input to the future population synthesis. The total number of households is an outcome of that population input. After using the aged PUMS weights, the 2045 SOABM average household size was determined to be 2.12 across the three PUMAs covered by SOABM, a decrease from the 2010 model region average household size of 2.45. For SOABM, 351,200 persons were generated for 2045, with a corresponding 165,600 households. This total number of households was then distributed across the MAZs (as informed by land use allocation processes and consultation with local partners). The MAZ totals and the aged PUMS data were then used to proportion the total future households across the various household level demographics at the MAZ and TAZ level, as discussed in the following sections.
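The household figures above can be cross-checked with simple arithmetic. The short sketch below uses only the totals quoted on this page; the variable names are illustrative.

```python
persons_2045 = 351_200
households_2045 = 165_600

# Average household size implied by the 2045 totals.
avg_hh_size_2045 = persons_2045 / households_2045
print(round(avg_hh_size_2045, 2))  # 2.12

# Households the same population would form at the 2010 average size of 2.45.
households_at_2010_size = round(persons_2045 / 2.45)
print(households_at_2010_size)  # 143347
```

In other words, for the same 2045 population, the smaller aged household size implies roughly 22,000 more households than the 2010 household size would.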

It should be further noted that the existing year demographic trends by MAZ and TAZ are used as the starting point. The region wide trends from the aged PUMS analysis are then used (along with the future year total households by MAZ) to adjust the existing year demographic trends into the future. The adjustments are made consistently across all zones in the region.
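The page does not spell out the exact adjustment formula, so the following is only one plausible reading, stated as an assumption: each zone's existing shares are scaled by the region wide ratio of future share to current share, renormalized, and then multiplied by the zone's future household total. The regional shares are the household size percentages reported on this page; the zone's shares and household total are made up for illustration.

```python
def adjust_zone_shares(zone_shares, region_current, region_future):
    """Scale one zone's category shares by the region-wide change, then renormalize."""
    scaled = {cat: share * region_future[cat] / region_current[cat]
              for cat, share in zone_shares.items()}
    total = sum(scaled.values())
    return {cat: value / total for cat, value in scaled.items()}


# Region-wide household size shares from this page (current and future).
region_current = {"HHSIZE1": 0.27, "HHSIZE2": 0.39, "HHSIZE3": 0.15, "HHSIZE4": 0.19}
region_future = {"HHSIZE1": 0.33, "HHSIZE2": 0.42, "HHSIZE3": 0.12, "HHSIZE4": 0.13}

# Hypothetical zone: existing-year shares and future household total.
zone_shares = {"HHSIZE1": 0.20, "HHSIZE2": 0.40, "HHSIZE3": 0.20, "HHSIZE4": 0.20}
zone_future_households = 500

future_shares = adjust_zone_shares(zone_shares, region_current, region_future)
future_households = {cat: share * zone_future_households
                     for cat, share in future_shares.items()}
```

Applying the same regional ratios to every zone keeps each zone's local character as the starting point while moving all zones in the direction of the aged PUMS trends.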

Households by Housing Type

After the aged PUMS data trends were applied the overall housing type by MAZ shifted as follows:

         SF_HH  DUPLEX_HH  MF_HH  MH_HH
Current  73%    3%         14%    9%
Future   64%    4%         16%    16%

This is a slight shift away from single family to other types, specifically mobile home.

Households by Income

         HHINC1  HHINC2  HHINC3  HHINC4  HHINC5  HHINC6  HHINC7  HHINC8
Current  15%     13%     13%     16%     19%     11%     8%      4%
Future   19%     15%     14%     16%     17%     9%      7%      4%

This table shows that the aged PUMS data shifts overall income slightly lower.

Households by Size

         HHSIZE1  HHSIZE2  HHSIZE3  HHSIZE4
Current  27%      39%      15%      19%
Future   33%      42%      12%      13%

This table shows that the aged PUMS data shifts overall to smaller households.

Households with Children

         HHWCHILD  HHWOCHILD
Current  29%       71%
Future   20%       80%

This table shows that the aged PUMS data shifts overall to more households without children.

Households by Workers

         HHWORK0  HHWORK1  HHWORK2  HHWORK3
Current  37%      35%      24%      3%
Future   47%      30%      19%      3%

This table shows that the aged PUMS data shifts to fewer workers overall, with a large shift to zero worker households. This represents an overall shift to about 38% of the total population having a full time occupation (~132,000 workers).

Total Workers by Occupation

As noted above, the share of the total population with a full time job decreases from 41% to 38%. Overall, the distribution of those jobs by occupation type doesn't shift significantly.

         OCCP1  OCCP2  OCCP3  OCCP4  OCCP5  OCCP6
Current  23%    19%    12%    25%    9%     12%
Future   22%    17%    13%    24%    10%    13%

Review of Weight Adjustments

As part of the review of the results, the extent to which the PUMS weights were adjusted was also reviewed. The following figure has a black line depicting the unaltered PUMS weight for each Southern Oregon (PUMAs 901, 902, and 800) PUMS household record used, ordered from smallest (left) to largest (right). The red scatter dots around the black line show how each PUMS household weight was altered by the IPF in order to achieve the 2017 age distribution for the model area. The blue dots represent the further adjustment needed to achieve the 2045 age distribution using the existing PUMS data. One can see that there is a distinct upper adjustment line in the blue dots. These are likely older households that get used more often via the weights to achieve the older age distribution. The larger population (the majority population) is shifted down, but not as much, as the weight adjustment is distributed over a larger record (population) sample.

An important note for both of the following plots: the weights have been normalized so that each set of weights produces the same number of total households (in this case, each set of weights produces 154,898 households). In actual execution the 2045 weights produce the number of households needed in 2045 (~165,600) and the 2017 weights produce the number of households specified for 2017 (115,275). But for an apples-to-apples comparison, all the weights were normalized to the original household total of the three PUMA region so that all the weights would sum to the same total, and no overall skewing to a higher or lower total population would be present in the comparisons.

The second plot below shows how the percent change is established when adjusting the weights to 2017 and then increases when adjusting the weights to 2045. The 2017 percent change is clustered around zero, with no percent change being greater than 30%. When pushing the PUMS weights to 2045, the older households are pushed to change by as much as 100%. The younger households are decreased by up to ~50% to compensate for the aging. Fewer households have a change near 0%, as this overall shifting seems relatively evenly distributed across all households (weights).
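The normalization and percent change behind these comparisons can be sketched as below. The function names and the three toy households are assumptions for illustration; in the actual review, the target total is the 154,898 households of the three PUMA region.

```python
def normalize(weights, target_total):
    """Scale a set of weights so that they sum to target_total."""
    scale = target_total / sum(weights.values())
    return {hh: weight * scale for hh, weight in weights.items()}


def percent_change(original, adjusted):
    """Percent change per household from the original to the adjusted weight."""
    return {hh: 100.0 * (adjusted[hh] - original[hh]) / original[hh]
            for hh in original}


# Toy weights for three households; real runs cover every PUMS household record.
raw = {"a": 10.0, "b": 20.0, "c": 10.0}   # sums to 40
aged = {"a": 30.0, "b": 20.0, "c": 10.0}  # sums to 60 before normalizing

target = sum(raw.values())
aged_norm = normalize(aged, target)       # now sums to 40, like the raw weights
change = percent_change(raw, aged_norm)   # a: about +100%, b and c: about -33%
```

Normalizing first means the percent changes reflect only how weight is redistributed among households, not growth in the total number of households.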

Overall the adjustments to the PUMS weights seem reasonable.
