Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package naming for when there are substantial changes & remove hard-coding of package name in Call specifications #191

Closed
gregorbj opened this issue Aug 6, 2018 · 4 comments

Comments

@gregorbj
Copy link
Owner

gregorbj commented Aug 6, 2018

For a module that calls another module, the Call specification assigns to an alias the name of the module that is being called using the standard 'package_name::module_name' notation. For example, the BudgetHouseholdDvmt module calls among other modules, the CalculateHouseholdDvmt module. The alias assignment for this is:
CalcDvmt = "VEHouseholdTravel::CalculateHouseholdDvmt"

The issue this raises is that if or when some develops alternative CalculateHouseholdDvmt module, they have to do so in a package named VEHouseholdTravel in order for the call the work. Either that, or all other modules that call the CalculateHouseholdDvmt module need to be modified to reference the newly named module and package. So, for example when Liming's multimodel travel demand module is brought into VisionEval, either of these things would have to be done.

Of the two alternatives, the latter is less desirable because it requires changes to a number of other modules and makes the code less plug and play. It subverts the point of making the model system plug-and-play. If a new or modified module produces all data produced by the old module (i.e. the new Set specifications include all the old Set specifications) it should work in the system without other changes to the system.

The first alternative, however, also has an issue. While it is reasonable to expect that the new module to have the same name as the module it is replacing, it is problematical for the package name to be the same. Keeping the same package name makes it confusing and difficult to maintain both new and old versions for use because of naming conflicts. For example, say that a MPO uses the current version of the CalculateHouseholdDvmt module in some model runs. Then when a new CalculateHouseholdDvmt module comes along, they want to use that in some model runs. However, if they do so with the first alternative, the old package needs to be removed or renamed and replaced with the new package. If the MPO wants to revert to the older package they have to do a bunch of folder name changes to do that. The whole process can lead to confusion and uncertain results.

The solution to this would be to put new modules in new packages with different names and modify the Call specification for module calling to eliminate the hard-coding of the package name. So for example, to incorporate Liming's multimodal travel demand modules into the model system, the new package might be called something like 'CalculateHouseholdDvmtMM'. This package would include modules that have the same names and produce the same datasets as the existing package. These modules may produce additional datasets and the package may include additional modules.

Having a new package name won't cause any problems for including the new modules in a module run script because the package name is included as an argument in the runModule function call. It will require changes to how the framework processes Call specifications. To eliminate hard-coding the package name in the call alias assignment, the framework needs to know what package the module is in. One way to do this would be to use the parsed model run script to identify the package name associated with each module name. However, this will only work if a called module is also run by itself. If that is not the case, then this approach would not work. Another possibility would be use the parsed model run script to identify all the packages that are used and then search the packages for the name of the called module. This would work in all cases except where a called module exists in a package that is not otherwise used in a model run or where there are duplicate module names. In current implementations, neither of these will be a problem but some thought should be put into ways to avoid.

The key code management issue to resolve is whether packages that are substantially changed (i.e. where modules are substantially changed, not bug fixes, documentation fixes, etc.) should have new names. I believe that where there are substantial changes, the package should have a new name.

@gregorbj gregorbj changed the title Package naming for Remove hard-coding of package name in Call specifications Package naming for when there are substantial changes & remove hard-coding of package name in Call specifications Aug 6, 2018
@gregorbj
Copy link
Owner Author

There is another instance where hard-coding the package name needs to be corrected. The VEPowertrainsAndFuels::PowertrainFuelDefaults_ls data is loaded by several packages. In the future, there will be several VEPowertrainsAndFuels packages, each of which will represent a different set of assumptions. For example, one package could represent assumptions consistent with ZEV (zero emissions vehicles) rules while another could represent current CAFE standards only, while a third could represent a retrenchment from the CAFE standards. The main reason for having different packages to represent different powertrain/fuel scenarios is that these are difficult to research and develop and require knowledge the most transportation modelers and planners are unfamiliar with, so it makes sense to have some premade packages to reflect different scenarios. These packages would have slightly different names to make it easy for users to run the VE-RSPM or VE-State models with different vehicle and fuel assumptions. For example, a package which represents the ZEV rules could be called VEPowertrainsAndFuelsZEV. If the packages have different names, however, then other modules that use resources in them can't refer to a fixed package name.

@gregorbj
Copy link
Owner Author

gregorbj commented Jan 2, 2019

This issue is addressed in the ve_rspm_state branch. This needed to be done in order to serve ODOT's applications of VE-RSPM and VE-State which will use at least 2 different versions of the VEPowertrainsAndFuels package to reflect different assumptions about how the light-duty vehicle market will develop.

The initializeModel function was modified to create two correspondence tables (data frames) which map modules to packages and datasets to packages for all packages that are referred to in the run_model.R script. These data frames (ModulesByPackage_df, DatasetsByPackage_df) are saved in the model state list and file. The initializeModel and runModule functions were modified to recognize module calls (in the Call spec) that have either hard-wired references (i.e. package::module format) or soft (i.e. only identify module). If the reference is soft, the functions look up the corresponding package in the ModulesByPackage_df data frame in the model state list. This solves the problem of making module calls more generic so that a new package such as the VETravelDemandMM can be substituted for VEHouseholdTravel package without having to revise module call specifications to change the package name.

In addition a new function, loadPackageDataset, was added to enable loading a dataset from a VisionEval package using only the dataset name. This function uses the DatasetsByPackage_df data frame to look up the package the dataset is in. This enables a model dataset to be load by a module using just the dataset name. This is a necessary feature in order to have different VEPowertrainsAndFuels packages (e.g. VEPowertrainsAndFuelsOdotBau) which have different assumptions about future vehicle fleets.

The checkModuleSpecs function was also modified to remove the requirement that Call specifications be in package::module format.

VE-RSPM modules were changed to replace hard-wired references to soft references except in the case of model estimation code which loads model estimation data from another package. No changes were made to VE-RPAT modules.

The new code was tested with modules that have hard-wired references and modules that have soft references. Both the VE-RSPM and VE-RPAT models were run using the new code without error.

@gregorbj
Copy link
Owner Author

gregorbj commented Jan 3, 2019

Made changes to address 2 use cases that the previous approach would fail on. The previous approach relied on parsing the model run script to identify all VE packages that might be referenced. While this works for a full model run, it does not work for module testing and may not work when a model run is split up and datastore references are used. To address this, an additional parameter was added to the run_parameters.json file. The new parameter, RequiredVEPackages, is a listing of all VE packages used to implement a model (e.g. RSPM or RPAT). The visioneval code was modified to check whether this parameter exists and whether all the listed packages are installed. It compiles the module and dataset correspondence tables (as noted above) from this listing rather than from the model run script. The test run_parameters.json files for the RSPM and RPAT model tests were modified to include the RequiredVEPackages parameter and those tests were run successfully with the changes. In addition, the run_parameters.json file in the tests directory of the VETravelPerformance package was modified to add the parameter and those module tests completed satisfactorily. This change will need to be made in all of the tests directories of VEPackages.

bstabler referenced this issue Jan 7, 2019
…VE package that contains it. This enables loading datasets without hard-wiring the reference to a particular package using package::dataset notation
bstabler referenced this issue Jan 7, 2019
…e test function so that soft references to module calls and package datasets are supported.
bstabler referenced this issue Jan 7, 2019
…between modules and packages (ModulesByPackage_df) and between model datasets and packages (DatasetsByPackage_df) and incorparate into model state list. Add code which enables module calls to only identify the module name instead of hard-wiring in package::module format (which is also supported)
@gregorbj
Copy link
Owner Author

This has been implemented in the ve_rspm_state branch. All Travis tests pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants