Separate core and dense state space. #310

Merged
merged 164 commits into from
Feb 28, 2020

@tobiasraabe tobiasraabe commented Dec 30, 2019

Closes #237.

Current behavior

Currently, one part of the state space is duplicated for every value of the remaining dimensions. The duplicated part comprises experiences and similar variables which are mutually exclusive. We call this the core state space. Types and observables duplicate the core state space, which is why we call them the dense dimensions of the state space.

The duplication is a problem because it requires a lot of memory unnecessarily.

Desired behavior

Remove the duplication to save memory and try to exploit the division for better parallelization.

Solution / Implementation

Core changes

  • I started by extracting the core state space, which is a DataFrame containing not only the core state space dimensions but also all covariates which can be computed using only information from the state space. This costs memory but saves some runtime because we need this information frequently.
  • There are two kinds of state spaces:
    • _SingleDimStateSpace is similar to the state space of KW94 and has no dense dimension.
    • _MultiDimStateSpace comprises one of the former state spaces for each element of the product of the dense dimensions. The attribute state_space.sub_state_spaces is a dictionary whose keys are tuples of the values of the dense state space dimensions. For a model with four types, the keys are [(0,), (1,), (2,), (3,)]. The value for each key is a dictionary which contains the covariates specific to this part of the state space. Because the dense dimensions are constant within a sub state space, these covariates are constant as well. The dictionary maps covariate names to their values.
  • Getting and setting attributes of the state space works via get_attribute, set_attribute and the accessor for data in one period.
  • There exists a decorator called parallelize_across_dense_dimensions which can be applied to functions whose calculations have no side effects on other dense dimensions. For now, this is a simple for-loop, but it is easy to replace with joblib. The decorator recognizes whether arguments of the wrapped function are dictionaries with dense state space dimensions as keys and automatically parallelizes over them.
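The dictionary layout and the decorator described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the detection rule (a non-empty dict whose keys are all tuples), the helper `_is_dense_dict`, the example function `shift_wage` and the observable with two levels are assumptions made for the example. Only the name `parallelize_across_dense_dimensions` and the tuple keys come from the description.

```python
import functools
import itertools


def _is_dense_dict(obj):
    # Assumption for this sketch: an argument is indexed by dense dimensions
    # if it is a non-empty dict whose keys are all tuples.
    return isinstance(obj, dict) and bool(obj) and all(isinstance(k, tuple) for k in obj)


def parallelize_across_dense_dimensions(func):
    """Call ``func`` once per dense key if any argument is a dense-indexed dict."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        dense_args = [a for a in args if _is_dense_dict(a)]
        dense_args += [v for v in kwargs.values() if _is_dense_dict(v)]
        if not dense_args:
            # Nothing to split, behave like the undecorated function.
            return func(*args, **kwargs)

        out = {}
        # A plain loop. Iterations are independent, so the loop could be
        # handed to a parallel backend instead.
        for key in dense_args[0]:
            sub_args = [a[key] if _is_dense_dict(a) else a for a in args]
            sub_kwargs = {
                name: v[key] if _is_dense_dict(v) else v for name, v in kwargs.items()
            }
            out[key] = func(*sub_args, **sub_kwargs)
        return out

    return wrapper


# Keys are the product of the dense dimensions, here four types and a
# hypothetical observable with two levels.
dense_keys = list(itertools.product(range(4), range(2)))
sub_state_spaces = {key: {"type": key[0], "observable": key[1]} for key in dense_keys}


@parallelize_across_dense_dimensions
def shift_wage(covariates, base_wage):
    # The covariates are constant within one sub state space.
    return base_wage + 10 * covariates["type"] + covariates["observable"]


wages = shift_wage(sub_state_spaces, base_wage=100)
```

Scalar arguments such as `base_wage` are passed unchanged to every call, while dense-indexed dictionaries are sliced by key, so the wrapped function only ever sees the data of one sub state space.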

Additional changes

  • Covariates are handled better than before.
    • Only relevant covariates are used.
    • The order in options is irrelevant as compute_covariates iterates over the covariates until no additional covariate can be computed.
    • A covariate is only computed if its dependencies are present without NaNs.
    • Covariates are separated into core, dense and mixed covariates.
    • There is a function to identify all relevant covariates for a subset of covariates instead of simply computing all covariates.
  • Reduced the setup runtime for estimation via ML from 50s to 7s for data with 40k observations by vectorizing the data checks.
  • Faster simulation.
  • Random models no longer have observables with just one level.
  • Solving the model now follows the same pattern as the simulation and the other functions. First, create the solve function with solve = rp.get_solve_func(params, options), then solve with state_space = solve(params).
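The covariate handling listed above (order independence, the NaN check on dependencies, and selecting only relevant covariates) can be illustrated with a small sketch. It is not the implementation from this PR: covariates are modeled here as a plain mapping from a name to a tuple of dependencies and a callable, and the names `compute_covariates_sketch`, `relevant_covariates` and all example variables are hypothetical.

```python
import math


def _is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def compute_covariates_sketch(state, covariates):
    """Compute covariates in any order by iterating until nothing new is computable."""
    state = dict(state)
    remaining = dict(covariates)
    progress = True
    while remaining and progress:
        progress = False
        for name, (deps, func) in list(remaining.items()):
            # A covariate is computed only if all dependencies exist without NaNs.
            if all(d in state and not _is_nan(state[d]) for d in deps):
                state[name] = func(*(state[d] for d in deps))
                del remaining[name]
                progress = True
    # Covariates whose dependencies never became available are skipped.
    return state, sorted(remaining)


def relevant_covariates(targets, covariates):
    """Collect the targets and every covariate they depend on, directly or not."""
    relevant, stack = set(), list(targets)
    while stack:
        name = stack.pop()
        if name in covariates and name not in relevant:
            relevant.add(name)
            stack.extend(covariates[name][0])
    return relevant


state = {"exp_a": 3, "period": 5, "lagged_choice": float("nan")}
covariates = {
    # Deliberately listed before its dependency to show that order is irrelevant.
    "exp_a_squared_plus_period": (("exp_a_squared", "period"), lambda s, p: s + p),
    "exp_a_squared": (("exp_a",), lambda e: e**2),
    "worked_last_period": (("lagged_choice",), lambda c: c == "a"),
}

result, skipped = compute_covariates_sketch(state, covariates)
needed = relevant_covariates(["exp_a_squared_plus_period"], covariates)
```

The loop stops as soon as a full pass over the remaining covariates computes nothing new, which is what makes the order of the definitions irrelevant.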

codecov bot commented Feb 12, 2020

Codecov Report

Merging #310 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #310   +/-   ##
=======================================
  Coverage   84.22%   84.22%           
=======================================
  Files          42       42           
  Lines        2732     2732           
=======================================
  Hits         2301     2301           
  Misses        431      431           


@tobiasraabe tobiasraabe changed the title from "[WIP] Separate core and dense state space." to "Separate core and dense state space." on Feb 20, 2020
@janosg janosg left a comment

Very nice PR. All comments are minor and it would be ok to merge as is.

respy/tests/utils.py (outdated, resolved)
respy/parallelization.py (outdated, resolved)
respy/pre_processing/process_covariates.py (resolved)
respy/shared.py (outdated, resolved)
respy/solve.py (outdated, resolved)
respy/state_space.py (outdated, resolved)
@tobiasraabe tobiasraabe merged commit 582b469 into master Feb 28, 2020
@tobiasraabe tobiasraabe deleted the separate-core-dense-state-space branch March 10, 2020 14:26
Successfully merging this pull request may close these issues.

Removing redundancy from the indexer by types and observed variables.
2 participants