Discrete Choice Models #134

jiffyclub · 2014-12-09T01:07:12Z

This is a big set of changes associated with ActivitySim/activitysim#3.

The goal here is to generalize our existing location choice model classes into discrete choice models with varying capabilities. The LCMs had baked in some assumptions that made them inappropriate for something like automobile ownership, even though the underlying MNL code is fully compatible with general discrete choice modeling. These are some of those assumptions:

only one chooser was used for calculating probabilities
choices were made for choosers in aggregate because all choosers had the same probabilities, locations were unavailable once chosen, and doing so gave better performance
locations were removed from the alternatives pool at the group level because those locations were no longer available to others once chosen

To support LCMs we need to keep those capabilities, but we also need to be able to calculate probabilities and make choices on a per-chooser basis, as well as not modify alternatives between making choices for different segments.

In this PR I've changed all class names from "LocationChoice" to "DiscreteChoice" and cleaned up docstrings and variable names that referred to locations. I've also added some new options: probability_mode (can be single_chooser or full_product) and choice_mode (can be individual or aggregate) for controlling how probabilities are calculated and choices are made. At the group level there's a new option remove_alts that controls whether alternatives are filtered after performing prediction for a segment.

The defaults are full_product, individual, and False for probability_mode, choice_mode, and remove_alts, respectively. These are the settings you'd use for something like automobile ownership.

For something like LCMs you'd set those to single_chooser, aggregate, and True.

The Travis runs are failing right now because these changes break sanfran_urbansim. I'll open a PR there shortly.

We'd hardcoded that when doing prediction a DCM would select only the first chooser from the choosers table in order to get a PDF used when assigning choosers to alternatives. I've removed that, so now all choosers go into making the interaction dataset and probabilities come back calculated one per chooser across all alternatives. Obviously we'll need to figure out a way to make this manageable for doing LCMs. I think the unit_choice method still needs work: bits and pieces of the code seem to assume that there will be only one set of probabilities for all choosers, not probabilities per chooser.

Instead of separately returning probabilities and alternatives information this groups them all together. The probabilities have a MultiIndex with chooser IDs on the outside and alternative IDs on the inside.
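To make the layout concrete, here is a minimal pandas sketch of a probabilities Series with chooser IDs on the outer index level and alternative IDs on the inner level. The IDs, level names, and probability values are made up for illustration; this is not the output of the actual DCM code.

```python
import numpy as np
import pandas as pd

# Hypothetical chooser and alternative IDs, for illustration only.
chooser_ids = [101, 102]
alt_ids = ['a', 'b', 'c']

# Outer level: chooser IDs. Inner level: alternative IDs.
index = pd.MultiIndex.from_product(
    [chooser_ids, alt_ids], names=['chooser_id', 'alternative_id'])

# Made-up probabilities; each chooser's probabilities sum to 1.
probabilities = pd.Series(
    [0.2, 0.3, 0.5,
     0.6, 0.3, 0.1],
    index=index)

# Selecting on the outer level gives one chooser's probabilities,
# indexed by alternative ID.
probs_101 = probabilities.loc[101]

# Per-chooser totals, computed by grouping on the outer level.
totals = probabilities.groupby(level='chooser_id').sum()
```

With this layout a single chooser's distribution over alternatives is one outer-level lookup away, and the alternative IDs travel with the probabilities instead of being returned separately.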

We need to make choices per chooser because each chooser has a different probability across the alternatives. For many discrete choice situations we *don't* want to remove alternatives that are chosen, so remove that functionality.

Note that none of this functionality is implemented yet; this commit introduces the arguments to calls, docstrings, YAML, and tests. The options will let users choose whether probabilities are calculated for all choosers or just one, and whether choices are made per chooser or for all choosers at once. Users can also provide their own functions for calculating those things.

Added support for modes 'single_chooser' and 'full_product' when calculating probabilities. These actually affect the merging of choosers and alternatives into an interaction dataset. In 'single_chooser' mode only the first chooser in the choosers table (after filtering) is used to construct the merged interaction table. This is the same behavior as UrbanSim's previous LCM class. In 'full_product' mode all choosers are merged with all alternatives.
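The difference between the two modes can be sketched with plain pandas. The function below is an illustration of the merging idea only, not UrbanSim's implementation: the function name, table contents, and index level names are all assumptions.

```python
import numpy as np
import pandas as pd


def interaction_dataset(choosers, alternatives,
                        probability_mode='full_product'):
    """Illustrative merge of choosers and alternatives (not UrbanSim's code)."""
    if probability_mode == 'single_chooser':
        # Only the first chooser (after any filtering) is used.
        choosers = choosers.iloc[:1]
    elif probability_mode != 'full_product':
        raise ValueError(
            'unrecognized probability_mode: {}'.format(probability_mode))

    n_choosers, n_alts = len(choosers), len(alternatives)
    # Repeat each chooser once per alternative, and cycle through the
    # alternatives once per chooser, so the rows line up pairwise.
    chooser_rows = choosers.take(np.repeat(np.arange(n_choosers), n_alts))
    alt_rows = alternatives.take(np.tile(np.arange(n_alts), n_choosers))

    merged = pd.concat(
        [chooser_rows.reset_index(drop=True),
         alt_rows.reset_index(drop=True)],
        axis=1)
    merged.index = pd.MultiIndex.from_arrays(
        [chooser_rows.index, alt_rows.index],
        names=['chooser_id', 'alternative_id'])
    return merged


# Made-up tables for illustration.
choosers = pd.DataFrame({'income': [30, 60, 90]}, index=[10, 11, 12])
alternatives = pd.DataFrame({'price': [1.0, 2.0]}, index=['x', 'y'])

full = interaction_dataset(choosers, alternatives, 'full_product')
single = interaction_dataset(choosers, alternatives, 'single_chooser')
```

With three choosers and two alternatives, 'full_product' yields six rows while 'single_chooser' yields two, which is why the latter is so much cheaper when all choosers can share one set of probabilities.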

Users of MNLDiscreteChoiceModel can choose either 'individual' or 'aggregate' mode for matching choosers to alternatives during prediction. In 'individual' mode a choice is made individually for every chooser and each chooser has access to all alternatives. In 'aggregate' mode choices are made for every chooser at the same time, which implies that alternatives are unavailable to others once chosen and that all choosers have the same probabilities over alternatives. 'aggregate' mode should only be used with 'single_chooser' probability mode.
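As a rough illustration of the two choice modes, the NumPy sketch below draws one choice per chooser in the individual case and samples without replacement from a single shared distribution in the aggregate case. The function names and data are hypothetical, and the real model code may make its draws differently.

```python
import numpy as np


def individual_choices(probabilities, alt_ids, rng):
    """One independent draw per chooser.

    `probabilities` has one row per chooser and one column per alternative.
    Alternatives stay available, so two choosers may pick the same one.
    """
    return np.array([rng.choice(alt_ids, p=row) for row in probabilities])


def aggregate_choices(probabilities, alt_ids, n_choosers, rng):
    """Draw for all choosers at once from a single shared distribution.

    Sampling without replacement means an alternative is unavailable to
    other choosers once chosen.
    """
    return rng.choice(
        alt_ids, size=n_choosers, replace=False, p=probabilities)


rng = np.random.default_rng(0)
alt_ids = np.array([10, 20, 30, 40])

# Degenerate per-chooser probabilities so the individual result is
# deterministic: choosers 0 and 2 both pick alternative 10.
per_chooser_probs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
])
individual = individual_choices(per_chooser_probs, alt_ids, rng)

# One shared distribution for everyone, as in 'single_chooser' mode.
shared_probs = np.array([0.4, 0.3, 0.2, 0.1])
aggregate = aggregate_choices(shared_probs, alt_ids, n_choosers=3, rng=rng)
```

The aggregate draw only makes sense when there is a single probability vector to sample from, which is the reason 'aggregate' is paired with 'single_chooser'.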

The remove_alts option controls whether alternatives are removed from the alternatives pool between doing prediction for different segments. When doing LCMs (e.g. probability_mode='single_chooser' and choice_mode='aggregate') this should be set to True. For doing something like automobile ownership this should be False. False is the default.
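The effect of filtering alternatives between segments can be shown with a small pure-Python sketch. The segment names, the predict_by_segment function, and the deterministic take_first_available stand-in are hypothetical; they only mimic the bookkeeping, not the group model's actual prediction.

```python
def take_first_available(choosers, alternatives):
    """Hypothetical stand-in for a segment's predict step.

    Chooser i simply gets alternative i, so results are deterministic.
    """
    return dict(zip(choosers, alternatives))


def predict_by_segment(segments, alternatives, remove_alts=False):
    """Run prediction segment by segment, optionally filtering alternatives."""
    alternatives = list(alternatives)
    choices = {}
    for name, choosers in segments.items():
        segment_choices = take_first_available(choosers, alternatives)
        choices.update(segment_choices)
        if remove_alts:
            # Chosen alternatives are not offered to later segments.
            chosen = set(segment_choices.values())
            alternatives = [a for a in alternatives if a not in chosen]
    return choices


segments = {'low_income': [1, 2], 'high_income': [3, 4]}
alts = ['a', 'b', 'c', 'd']

shared = predict_by_segment(segments, alts, remove_alts=False)
exclusive = predict_by_segment(segments, alts, remove_alts=True)
```

With remove_alts=False both segments see the full pool, so choosers 3 and 4 end up with the same alternatives as choosers 1 and 2. With remove_alts=True the second segment can only pick from what the first left behind.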

.loc is really slow with large indexes. We can do things much faster using location based indexing instead of label based indexing. Here I'm replacing a .loc with a .take.
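A small sketch of the substitution, using a made-up table: labels are translated to integer positions once with Index.get_indexer, and rows are then pulled with .take. It only demonstrates that the two approaches return the same rows; it makes no timing claim, and the actual change in the commit may obtain its positions differently.

```python
import numpy as np
import pandas as pd

# Made-up alternatives table with a unique label index.
alternatives = pd.DataFrame(
    {'price': [10.0, 20.0, 30.0, 40.0]},
    index=['w', 'x', 'y', 'z'])
wanted_labels = ['y', 'w', 'y']

# Label-based lookup.
by_label = alternatives.loc[wanted_labels]

# Location-based lookup: convert labels to integer positions, then take.
positions = alternatives.index.get_indexer(wanted_labels)
by_position = alternatives.take(positions)
```

If the code already works with integer positions (for example, positions produced by sampling), the get_indexer step is unnecessary and .take can be used directly.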

This was referenced Dec 9, 2014

Discrete Choice Model Capabilities ActivitySim/activitysim#3

Closed

Update for Discrete Choice Models UDST/sanfran_urbansim#8

Merged

fscottfoti mentioned this pull request Dec 17, 2014

Progress report and discussion topics for 12-19-14 meeting ActivitySim/activitysim#10

Closed

jiffyclub added 15 commits February 24, 2015 14:06

rename location choice stuff to "discrete choice"

7a38210

basic smoke test for mnl_interaction_dataset

aaa0ea5

Return DCM probabilities as MultiIndexed Series

679166d

update module/class names in models/__init__.py

c25b861

Make choices per-chooser and don't remove alts

b7ce91c

Series will create NaN values if no data given

2f0d5a2

location language -> "discrete" or "alternative"

2c0dae6

Refactor DCM tests with a DCM fixture

865de2e

more concrete tests for mnl_interaction_dataset

1aa4683

Use numeric indexing and .take instead of .loc

6f7a6e7

jiffyclub force-pushed the dcm branch from f88f665 to 6f7a6e7 Compare February 24, 2015 22:07

pep8 fix

dcd84f0

jiffyclub added a commit that referenced this pull request Feb 24, 2015

Merge pull request #134 from synthicity/dcm

a94f05e

Discrete Choice Models

jiffyclub merged commit a94f05e into master Feb 24, 2015

jiffyclub deleted the dcm branch February 24, 2015 22:59
