Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Settings validation resolution when user can't be prompted #651

Closed
keckler opened this issue May 2, 2022 · 4 comments
Closed

Settings validation resolution when user can't be prompted #651

keckler opened this issue May 2, 2022 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@keckler
Copy link
Member

keckler commented May 2, 2022

Within the settingsValidation module, there is the ability for a case settings object to be inspected using an Inspector. The Inspector will go through all valid queries and check the user's case settings to see that a bunch of common-sense criteria are met. All of those queries in the framework are located here:

def _inspectSettings(self):
"""Check settings for inconsistencies."""
# import here to avoid cyclic issues
from armi import operators
self.addQueryBadLocationWillLikelyFail("operatorLocation")
self.addQuery(
lambda: self.cs["outputFileExtension"] == "pdf" and self.cs["genReports"],
"Output files of '.pdf' format are not supported by the reporting HTML generator. '.pdf' "
"images will not be included.",
"Switch to '.png'?",
lambda: self._assignCS("outputFileExtension", "png"),
)
self.addQuery(
lambda: (
self.cs[globalSettings.CONF_ZONING_STRATEGY] == "manual"
and not self.cs["zoneDefinitions"]
),
"`manual` zoningStrategy requires that `zoneDefinitions` setting be defined. Run will have "
"no zones.",
"",
self.NO_ACTION,
)
self.addQuery(
lambda: (
(
self.cs["beta"]
and isinstance(self.cs["beta"], list)
and not self.cs["decayConstants"]
)
or (self.cs["decayConstants"] and not self.cs["beta"])
),
"Both beta components and decay constants should be provided if either are "
"being supplied.",
"",
self.NO_ACTION,
),
self.addQuery(
lambda: self.cs["skipCycles"] > 0 and not self.cs["reloadDBName"],
"You have chosen to do a restart case without specifying a database to load from. "
"Run will load from output files, if they exist but burnup, etc. will not be updated.",
"",
self.NO_ACTION,
)
self.addQuery(
lambda: self.cs["runType"] != operators.RunTypes.SNAPSHOTS
and self.cs["loadStyle"] == "fromDB"
and self.cs["startCycle"] == 0
and self.cs["startNode"] == 0,
"Starting from cycle 0, and time node 0 was chosen. Restart runs load from "
"the time node just before the restart. There is no time node to load from "
"before cycle 0 node 0. Either switch to the snapshot operator, start from "
"a different time step or load from inputs rather than database as "
"`loadStyle`.",
"",
self.NO_ACTION,
)
self.addQuery(
lambda: self.cs["runType"] == operators.RunTypes.SNAPSHOTS
and not (self.cs["dumpSnapshot"] or self.cs["defaultSnapshots"]),
"The Snapshots operator was specified, but no dump snapshots were chosen."
"Please specify snapshot steps with the `dumpSnapshot` setting.",
"",
self.NO_ACTION,
)
self.addQuery(
lambda: self.cs.caseTitle.lower()
== os.path.splitext(os.path.basename(self.cs["reloadDBName"].lower()))[0],
"Snapshot DB ({0}) and main DB ({1}) cannot have the same name."
"Change name of settings file and resubmit.".format(
self.cs["reloadDBName"], self.cs.caseTitle
),
"",
self.NO_ACTION,
)
self.addQuery(
lambda: self.cs["reloadDBName"] != ""
and not os.path.exists(self.cs["reloadDBName"]),
"Reload database {} does not exist. \nPlease point to an existing DB, "
"or set to empty and load from input.".format(self.cs["reloadDBName"]),
"",
self.NO_ACTION,
)
def _willBeCopiedFrom(fName):
for copyFile in self.cs["copyFilesFrom"]:
if fName == os.path.split(copyFile)[1]:
return True
return False
self.addQuery(
lambda: self.cs["explicitRepeatShuffles"]
and not self._csRelativePathExists(self.cs["explicitRepeatShuffles"])
and not _willBeCopiedFrom(self.cs["explicitRepeatShuffles"]),
"The specified repeat shuffle file `{0}` does not exist, and won't be copied from elsewhere. "
"Run will crash.".format(self.cs["explicitRepeatShuffles"]),
"",
self.NO_ACTION,
)
self.addQuery(
lambda: not self.cs["power"],
"No power level set. You must always start by importing a base settings file.",
"",
self.NO_ACTION,
)
# The gamma cross sections generated for MC2-3 by ANL were done with NJOY with
# P3 scattering. MC2-3 would have to be modified and the gamma cross sections
# re-generated with NJOY for MC2-3 to allow any other scattering order with
# gamma cross sections enabled.
self.addQuery(
lambda: (
"MC2v3" in self.cs["xsKernel"]
and neutronics.gammaXsAreRequested(self.cs)
and self.cs["xsScatteringOrder"] != 3
),
"MC2-3 will crash if a scattering order is not set to 3 when generating gamma XS.",
"Would you like to set the `xsScatteringOrder` to 3?",
lambda: self._assignCS("xsScatteringOrder", 3),
)
self.addQuery(
lambda: self.cs["outputCacheLocation"]
and not os.path.exists(self.cs["outputCacheLocation"]),
"`outputCacheLocation` path {} does not exist. Please specify a location that exists.".format(
self.cs["outputCacheLocation"]
),
"",
self.NO_ACTION,
)
self.addQuery(
lambda: self.cs["numCoupledIterations"] > 0,
"You have {0} coupling iterations selected.".format(
self.cs["numCoupledIterations"]
),
"1 coupling iteration doubles run time (2 triples, etc). Do you want to use 0 instead? ",
lambda: self._assignCS("numCoupledIterations", 0),
)
def _factorsAreValid(factors, maxVal=1.0):
try:
expandedList = expandRepeatedFloats(factors)
except (ValueError, IndexError):
return False
return (
all(0.0 <= val <= maxVal for val in expandedList)
and len(expandedList) == self.cs["nCycles"]
)
self.addQuery(
lambda: self.cs["availabilityFactors"]
and not _factorsAreValid(self.cs["availabilityFactors"]),
"`availabilityFactors` was not set to a list compatible with the number of cycles. "
"Please update input or use constant duration.",
"Use constant availability factor specified in `availabilityFactor` setting?",
lambda: self._assignCS("availabilityFactors", []),
)
self.addQuery(
lambda: self.cs["powerFractions"]
and not _factorsAreValid(self.cs["powerFractions"]),
"`powerFractions` was not set to a compatible list. "
"Please update input or use full power at all cycles.",
"Use full power for all cycles?",
lambda: self._assignCS("powerFractions", []),
)
self.addQuery(
lambda: (
self.cs["cycleLengths"]
and not _factorsAreValid(self.cs["cycleLengths"], maxVal=1e10)
),
"The number of cycles defined in `cycleLengths` is not equal to the number of cycles in "
"the run `nCycles`."
"Please ensure that there is exactly one duration for each cycle in the run or use "
"{} days for all cycles.".format(self.cs["cycleLength"]),
"Use {} days for all cycles?".format(self.cs["cycleLength"]),
lambda: self._assignCS("cycleLengths", []),
)
def _correctCycles():
newSettings = {"nCycles": 1, "burnSteps": 0}
self.cs = self.cs.modified(newSettings=newSettings)
self.addQuery(
lambda: not self.cs["cycleLengths"] and self.cs["nCycles"] == 0,
"Cannot run 0 cycles. Set burnSteps to 0 to activate a single time-independent case.",
"Set 1 cycle and 0 burnSteps for single time-independent case?",
_correctCycles,
)
self.addQuery(
lambda: (
self.cs["runType"] == "Standard"
and self.cs["burnSteps"] == 0
and (len(self.cs["cycleLengths"]) > 1 or self.cs["nCycles"] > 1)
),
"Cannot run multi-cycle standard cases with 0 burnSteps per cycle. Please update settings.",
"",
self.NO_ACTION,
)
def decayCyclesHaveInputThatWillBeIgnored():
"""Check if there is any decay-related input that will be ignored."""
try:
powerFracs = expandRepeatedFloats(self.cs["powerFractions"])
availabilities = expandRepeatedFloats(
self.cs["availabilityFactors"]
) or ([self.cs["availabilityFactor"]] * self.cs["nCycles"])
except: # pylint: disable=bare-except
return True
for pf, af in zip(powerFracs, availabilities):
if pf > 0.0 and af == 0.0:
# this will be a full decay step and any power fraction will be ignored. May be ok, but warn.
return True
return False
self.addQuery(
lambda: (
self.cs["cycleLengths"]
and self.cs["powerFractions"]
and decayCyclesHaveInputThatWillBeIgnored()
),
"At least one cycle has a non-zero power fraction but an availability of zero. Please "
"update the input.",
"",
self.NO_ACTION,
)
self.addQuery(
lambda: self.cs["operatorLocation"]
and self.cs["runType"] != operators.RunTypes.STANDARD,
"The `runType` setting is set to `{0}` but there is a `custom operator location` defined".format(
self.cs["runType"]
),
"",
self.NO_ACTION,
)
self.addQuery(
lambda: self.cs["operatorLocation"]
and self.cs["runType"] != operators.RunTypes.STANDARD,
"The `runType` setting is set to `{0}` but there is a `custom operator location` defined".format(
self.cs["runType"]
),
"",
self.NO_ACTION,
)
self.addQuery(
lambda: self.cs["skipCycles"] > 0
and not os.path.exists(self.cs.caseTitle + ".restart.dat"),
"This is a restart case, but the required restart file {0}.restart.dat is not found".format(
self.cs.caseTitle
),
"",
self.NO_ACTION,
)
self.addQuery(
lambda: self.cs["deferredInterfacesCycle"] > self.cs["nCycles"],
"The deferred interface activation cycle exceeds set cycle occurrence. "
"Interfaces will not be activated in this run!",
"",
self.NO_ACTION,
)
self.addQuery(
lambda: (
self.cs["boundaries"] != neutronics.GENERAL_BC
and self.cs["bcCoefficient"]
),
"General neutronic boundary condition was not selected, but `bcCoefficient` was defined. "
"Please enable `Generalized` neutronic boundary condition or disable `bcCoefficient`.",
"",
self.NO_ACTION,
)
self.addQuery(
lambda: self.cs["geomFile"]
and str(self.geomType) not in geometry.VALID_GEOMETRY_TYPE,
"{} is not a valid geometry Please update geom type on the geom file. "
"Valid (case insensitive) geom types are: {}".format(
self.geomType, geometry.VALID_GEOMETRY_TYPE
),
"",
self.NO_ACTION,
)
self.addQuery(
lambda: self.cs["geomFile"]
and not geometry.checkValidGeomSymmetryCombo(
self.geomType, self.coreSymmetry
),
"{}, {} is not a valid geometry and symmetry combination. Please update "
"either geometry or symmetry on the geom file.".format(
str(self.geomType), str(self.coreSymmetry)
),
"",
self.NO_ACTION,
)

When the user submits a job locally and one or more of the queries fails inspection, a prompt will be sent to the user asking how to resolve the issue. For instance
image

Each of the queries is also associated with a potential resolution (or sometimes only the option to do nothing), and if the user opts to use that resolution, then ARMI will make the necessary change to the case settings object, re-inspect the settings, and keep prompting the user until everything looks kosher.

However, in cases where the user cannot be prompted for whatever reason (perhaps they are running on a remote machine), the question is instead printed in the log file, but the user cannot make any decisions, nothing is changed in the case settings, and the run continues even though ostensibly something is wrong.

I think that this system of just continuing the run when a setting-clash is known to exist is probably bad practice, because it gives the user the impression that their run is valid when in fact something is known to be wrong.

I think there are two alternatives that we might consider for cases when the user cannot be prompted to make a decision about clashing settings:

  1. When a query fails, the run should fail, forcing the user to actually update their case settings to be compatible with the queries and resubmit. This is annoying to the user and might break some existing workflows, but feels more in line with the intent of the settings validation system in the first place.
  2. When a query fails, the default action is auto-implemented and a note is printed in the log that some case setting has been changed. This is nicer to the user, but may somewhat hide from them that they did something wrong in the first place. Also, this option would need to handle the possibility that an infinite loop may be created through the resolution of one query breaking another.

Any thoughts?

@ntouran @jakehader @onufer @john-science

@ntouran
Copy link
Member

ntouran commented May 3, 2022

We have fought with the settings validation system a lot and gone back and forth. Lots of times the questions are just to guide you through things that might be wrong. When the system just crashes in a validation error, lots of power users complain that they can't get their jobs to run even though they know what they're doing. This can be really frustrating.

Running remotely must be automatically triggering the --batch argument, right (https://terrapower.github.io/armi/.apidocs/armi.context.html#armi.context.Mode)? That's why they're not crashing?

Maybe we could have it crash automatically in INTERACTIVE mode but run no matter what in BATCH mode.

@keckler
Copy link
Member Author

keckler commented May 3, 2022

I sympathize with the frustration of the power users.

But I also see the other side of things, where a developer (such as myself) might try to use the settings validation to force a user to make decisions about certain case settings that don't have clear defaults. So if I, as a developer, need a value to exist in the case settings under certain conditions but can't pick a reasonable default for that value, the settings validation system looks like a pretty good way to force the user to pick something. But if they just submit their job in BATCH mode and it effectively bypasses the settings validation, then that won't work.

Maybe we could have it crash automatically in INTERACTIVE mode but run no matter what in BATCH mode.

I think with this suggestion you are effectively saying that we no longer even prompt the user for a response at all? From my understanding, INTERACTIVE mode is currently the scenario in which a user can answer the prompt and fix the issue. So I don't think understand your suggestion, but that could be a misunderstanding on my part.

Another option I could think of is to add a new case setting to ARMI that let's the user decide how to proceed when the settings validation has questions. Something like settingsValidationDefaultBehavior with the following options:

  • askOrFail : The default case. Prompt the user, if possible, or kill the job if not
  • askOrAutoCorrect : Prompt the user, if possible, or just accept the suggested answer to the query if not
  • autoCorrect : Don't even bother prompting the user, just accept the suggested answer to all queries
  • ignore : Don't even bother prompting the user and do not take any action on all queries. Continue running the case as-is

This should satisfy power users, and makes it very explicitly clear that they are overriding behavior that a developer believes is incorrect. It's like SUDO capabilities to override the settings validator.

@keckler keckler mentioned this issue May 3, 2022
6 tasks
@Nebbychadnezzar
Copy link
Contributor

I think that this system of just continuing the run when a setting-clash is known to exist is probably bad practice, because it gives the user the impression that their run is valid when in fact something is known to be wrong.

I'd like to log here that I didn't know this was an issue and ended up making this assumption, and so was burned by it. 😢

@keckler
Copy link
Member Author

keckler commented Sep 11, 2023

Ultimately it is on the user to check their output and justify any warnings. On this note, I will close this ticket because otherwise we would have to make an API-breaking change.

I still don't like it, but that's okay 😄

@keckler keckler closed this as completed Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants