Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature 1684 bss and 1685 single reference model #1689

Merged
merged 19 commits into from
Mar 2, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
c0dab33
Per #1684, move an instance of the ClimoCDFInfo class into PairBase. …
JohnHalleyGotway Feb 26, 2021
89e9481
Add to VxPairDataPoint and VxPairDataEnsemble functions to set the Cl…
JohnHalleyGotway Feb 26, 2021
f541a81
Per #1684, update ensemble_stat and point_stat to set the ClimoCDFInf…
JohnHalleyGotway Feb 26, 2021
4d655d8
Per #1684, update the vx_statistics library and stat_analysis to make…
JohnHalleyGotway Feb 26, 2021
7d19d06
Per #1684, since cdf_info is a member of PairBase class, need to hand…
JohnHalleyGotway Mar 1, 2021
687b00a
Per #1684, during development, I ran across and then updated this log…
JohnHalleyGotway Mar 1, 2021
9abfbcc
Per #1684, working on log messages and figured that the regridding cl…
JohnHalleyGotway Mar 1, 2021
a759c17
Per #1684 and #1685, update the logic for the derive_climo_vals() uti…
JohnHalleyGotway Mar 1, 2021
f8f1b9f
Per #1684, just fixing the format of this log message.
JohnHalleyGotway Mar 1, 2021
0737877
Per #1684, add a STATLine::get_offset() member function.
JohnHalleyGotway Mar 1, 2021
dc6a8af
Per #1684, update parse_orank_line() logic. Rather than calling NumAr…
JohnHalleyGotway Mar 1, 2021
4b0a816
Per #1684, call EnsemblePairData::extend() when parsing ORANK data to…
JohnHalleyGotway Mar 1, 2021
4b5c718
Per #1684 and #1685, add another call to Ensemble-Stat to test comput…
JohnHalleyGotway Mar 1, 2021
716f777
Per #1684 and #1685, update ensemble-stat docs about computing CRPSS_…
JohnHalleyGotway Mar 1, 2021
97d622e
Merge branch 'develop' into feature_1684_bss
JohnHalleyGotway Mar 1, 2021
d2497ae
Per #1684, need to update Grid-Stat to store the climo cdf info in th…
JohnHalleyGotway Mar 1, 2021
7190179
Per #1684, remove debug print statements.
JohnHalleyGotway Mar 1, 2021
82b5898
Per #1684, need to set cdf_info when aggregating MPR lines in Stat-An…
JohnHalleyGotway Mar 1, 2021
9a792c5
Per #1684 and #1685, update PairDataEnsemble::compute_pair_vals() to …
JohnHalleyGotway Mar 2, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion met/docs/Users_Guide/ensemble-stat.rst
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,11 @@ The ranked probability score (RPS) is included in the Ranked Probability Score (
Climatology data
~~~~~~~~~~~~~~~~

The Ensemble-Stat output includes at least three statistics computed relative to external climatology data. The climatology is defined by mean and standard deviation fields, and both are required in the computation of ensemble skill score statistics. MET assumes that the climatology follows a normal distribution, defined by the mean and standard deviation at each point. When computing the CRPS skill score for (:ref:`Gneiting et al., 2004 <Gneiting-2004>`) the reference CRPS statistic is computed using the climatological mean and standard deviation directly. When computing the CRPS skill score for (:ref:`Hersbach, 2000 <Hersbach-2000>`) the reference CRPS statistic is computed by selecting equal-area-spaced values from the assumed normal climatological distribution. The number of points selected is determined by the *cdf_bins* setting in the *climo_cdf* dictionary. The reference CRPS is computed empirically from this ensemble of climatology values. The climatological distribution is also used for the RPSS. The forecast RPS statistic is computed from a probabilistic contingency table in which the probabilities are derived from the ensemble member values. In a simliar fashion, the climatogical probability for each observed value is derived from the climatological distribution. The area of the distribution to the left of the observed value is interpreted as the climatological probability. These climatological probabilities are also evaluated using a probabilistic contingency table from which the reference RPS score is computed. The skill scores are derived by comparing the forecast statistic to the reference climatology statistic.
The Ensemble-Stat output includes at least three statistics computed relative to external climatology data. The climatology is defined by mean and standard deviation fields, and typically both are required in the computation of ensemble skill score statistics. MET assumes that the climatology follows a normal distribution, defined by the mean and standard deviation at each point.

When computing the CRPS skill score for (:ref:`Gneiting et al., 2004 <Gneiting-2004>`) the reference CRPS statistic is computed using the climatological mean and standard deviation directly. When computing the CRPS skill score for (:ref:`Hersbach, 2000 <Hersbach-2000>`) the reference CRPS statistic is computed by selecting equal-area-spaced values from the assumed normal climatological distribution. The number of points selected is determined by the *cdf_bins* setting in the *climo_cdf* dictionary. The reference CRPS is computed empirically from this ensemble of climatology values. If the number bins is set to 1, the climatological CRPS is computed using only the climatological mean value. In this way, the empirical CRPSS may be computed relative to a single model rather than a climatological distribution.

The climatological distribution is also used for the RPSS. The forecast RPS statistic is computed from a probabilistic contingency table in which the probabilities are derived from the ensemble member values. In a simliar fashion, the climatogical probability for each observed value is derived from the climatological distribution. The area of the distribution to the left of the observed value is interpreted as the climatological probability. These climatological probabilities are also evaluated using a probabilistic contingency table from which the reference RPS score is computed. The skill scores are derived by comparing the forecast statistic to the reference climatology statistic.

Ensemble observation error
~~~~~~~~~~~~~~~~~~~~~~~~~~
Expand Down
16 changes: 14 additions & 2 deletions met/src/libcode/vx_analysis_util/stat_line.cc
Original file line number Diff line number Diff line change
Expand Up @@ -327,6 +327,18 @@ bool STATLine::has(const char *col_str) const

{

return ( !is_bad_data(get_offset(col_str)) );

}


////////////////////////////////////////////////////////////////////////


int STATLine::get_offset(const char *col_str) const

{

int offset = bad_data_int;
int dim = bad_data_int;

Expand All @@ -353,10 +365,10 @@ if ( is_bad_data(offset) ) {
}

//
// Return whether a valid offset value was found
// Return the offset value
//

return ( !is_bad_data(offset) );
return ( offset );

}

Expand Down
9 changes: 5 additions & 4 deletions met/src/libcode/vx_analysis_util/stat_line.h
Original file line number Diff line number Diff line change
Expand Up @@ -61,10 +61,11 @@ class STATLine : public DataLine {
// retrieve values of the header columns
//

bool has (const char *) const;
ConcatString get (const char *, bool check_na = true) const;
const char * get_item (const char *, bool check_na = true) const;
const char * get_item (int, bool check_na = true) const;
bool has (const char *) const;
int get_offset(const char *) const;
ConcatString get (const char *, bool check_na = true) const;
const char * get_item (const char *, bool check_na = true) const;
const char * get_item (int, bool check_na = true) const;

const char * version () const;
const char * model () const;
Expand Down
3 changes: 2 additions & 1 deletion met/src/libcode/vx_statistics/compute_stats.cc
Original file line number Diff line number Diff line change
Expand Up @@ -743,7 +743,8 @@ void compute_pctinfo(const PairDataPoint &pd, bool pstd_flag,
// Use input climatological probabilities or derive them
if(cmn_flag) {
if(cprob_in) climo_prob = *cprob_in;
else climo_prob = derive_climo_prob(pd.cmn_na, pd.csd_na,
else climo_prob = derive_climo_prob(pd.cdf_info,
pd.cmn_na, pd.csd_na,
pct_info.othresh);
}

Expand Down
11 changes: 6 additions & 5 deletions met/src/libcode/vx_statistics/ens_stats.cc
Original file line number Diff line number Diff line change
Expand Up @@ -484,10 +484,10 @@ void RPSInfo::set(const PairDataEnsemble &pd) {
// Check that thresholds are actually defined
if(fthresh.n() == 0) {
mlog << Error << "\nRPSInfo::set(const PairDataEnsemble &) -> "
<< "no thresholds provided to compute the RPS line type! "
<< "Specify thresholds using the \""
<< conf_key_prob_cat_thresh
<< "\" configuration file option.\n\n";
<< "no thresholds provided to compute the RPS line type!\n"
<< "Specify thresholds using the \"" << conf_key_prob_cat_thresh
<< "\" configuration file option or by providing climatological "
<< "mean and standard deviation data.\n\n";
exit(1);
}

Expand Down Expand Up @@ -522,7 +522,8 @@ void RPSInfo::set(const PairDataEnsemble &pd) {
climo_pct.zero_out();

// Derive climatological probabilities
if(cmn_flag) climo_prob = derive_climo_prob(pd.cmn_na, pd.csd_na,
if(cmn_flag) climo_prob = derive_climo_prob(pd.cdf_info,
pd.cmn_na, pd.csd_na,
fthresh[i]);

// Loop over the observations
Expand Down
100 changes: 82 additions & 18 deletions met/src/libcode/vx_statistics/pair_base.cc
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,8 @@ void PairBase::clear() {
interp_mthd = InterpMthd_None;
interp_shape = GridTemplateFactory::GridTemplate_None;

cdf_info.clear();

o_na.clear();
x_na.clear();
y_na.clear();
Expand Down Expand Up @@ -121,6 +123,8 @@ void PairBase::erase() {
interp_mthd = InterpMthd_None;
interp_shape = GridTemplateFactory::GridTemplate_None;

cdf_info.clear();

o_na.erase();
x_na.erase();
y_na.erase();
Expand Down Expand Up @@ -267,6 +271,15 @@ void PairBase::set_interp_shape(GridTemplateFactory::GridTemplates shape) {

////////////////////////////////////////////////////////////////////////

void PairBase::set_climo_cdf_info(const ClimoCDFInfo &info) {

cdf_info = info;

return;
}

////////////////////////////////////////////////////////////////////////

void PairBase::set_fcst_ut(unixtime ut){

fcst_ut = ut;
Expand Down Expand Up @@ -1010,11 +1023,49 @@ bool set_climo_flag(const NumArray &f_na, const NumArray &c_na) {

////////////////////////////////////////////////////////////////////////

NumArray derive_climo_prob(const NumArray &mn_na, const NumArray &sd_na,
void derive_climo_vals(const ClimoCDFInfo &cdf_info,
double m, double s,
NumArray &climo_vals) {

// Initialize
climo_vals.erase();

// cdf_info.cdf_ta starts with >=0.0 and ends with >=1.0.
// The number of bins is the number of thresholds minus 1.

// Check for bad mean value
if(is_bad_data(m) || cdf_info.cdf_ta.n() < 2) {
return;
}
// Single climo bin
else if(cdf_info.cdf_ta.n() == 2) {
climo_vals.add(m);
}
// Check for bad standard deviation value
else if(is_bad_data(s)) {
return;
}
// Extract climo distribution values
else {

// Skip the first and last thresholds
for(int i=1; i<cdf_info.cdf_ta.n()-1; i++) {
climo_vals.add(
normal_cdf_inv(cdf_info.cdf_ta[i].get_value(), m, s));
}
}

return;
}

////////////////////////////////////////////////////////////////////////

NumArray derive_climo_prob(const ClimoCDFInfo &cdf_info,
const NumArray &mn_na, const NumArray &sd_na,
const SingleThresh &othresh) {
int i, n_mn, n_sd;
double prob;
NumArray climo_prob;
NumArray climo_prob, climo_vals;

// Number of valid climo mean and standard deviation
n_mn = mn_na.n_valid();
Expand Down Expand Up @@ -1045,7 +1096,7 @@ NumArray derive_climo_prob(const NumArray &mn_na, const NumArray &sd_na,
break;

default:
mlog << Error << "\n\nderive_climo_prob() -> "
mlog << Error << "\nderive_climo_prob() -> "
<< "climatological threshold \"" << othresh.get_str()
<< "\" cannot be converted to a probability!\n\n";
exit(1);
Expand All @@ -1060,23 +1111,17 @@ NumArray derive_climo_prob(const NumArray &mn_na, const NumArray &sd_na,
// threshold
else if(n_mn > 0 && n_sd > 0) {

mlog << Debug(2)
<< "Deriving normal approximation of climatological "
<< "probabilities for threshold " << othresh.get_str()
<< ".\n";
// The first (>=0.0) and last (>=1.0) climo thresholds are omitted
mlog << Debug(4)
<< "Deriving climatological probabilities for threshold "
<< othresh.get_str() << " by sampling " << cdf_info.cdf_ta.n()-2
<< " values from the normal climatological distribution.\n";

// Compute probability value for each point
// Compute the probability by sampling from the climo distribution
// and deriving the event frequency
for(i=0; i<mn_na.n(); i++) {

prob = normal_cdf(othresh.get_value(), mn_na[i], sd_na[i]);

// Handle greater-than probabilities
if(!is_bad_data(prob) &&
(othresh.get_type() == thresh_gt ||
othresh.get_type() == thresh_ge)) {
prob = 1.0 - prob;
}
climo_prob.add(prob);
derive_climo_vals(cdf_info, mn_na[i], sd_na[i], climo_vals);
climo_prob.add(derive_prob(climo_vals, othresh));
}
}
// If only climatological mean was provided, it should already
Expand Down Expand Up @@ -1107,3 +1152,22 @@ NumArray derive_climo_prob(const NumArray &mn_na, const NumArray &sd_na,
}

////////////////////////////////////////////////////////////////////////

double derive_prob(const NumArray &na, const SingleThresh &st) {
int i, n_vld, n_event;
double prob;

// Count up the number of events
for(i=0,n_vld=0,n_event=0; i<na.n(); i++) {
if(is_bad_data(na[i])) continue;
n_vld++;
if(st.check(na[i])) n_event++;
}

if(n_vld == 0) prob = bad_data_double;
else prob = (double) n_event / n_vld;

return(prob);
}

////////////////////////////////////////////////////////////////////////
48 changes: 29 additions & 19 deletions met/src/libcode/vx_statistics/pair_base.h
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,9 @@ class PairBase {
int interp_wdth;
GridTemplateFactory::GridTemplates interp_shape;

// Climo distribution info
ClimoCDFInfo cdf_info;

// Point and Grid Observation Information
NumArray o_na; // Observation value [n_obs]
NumArray x_na; // X [n_obs]
Expand Down Expand Up @@ -131,6 +134,8 @@ class PairBase {
void set_interp_wdth(int);
void set_interp_shape(GridTemplateFactory::GridTemplates);

void set_climo_cdf_info(const ClimoCDFInfo &);

void set_fcst_ut(unixtime ut);
void set_check_unique(bool check);
void set_obs_summary(ObsSummary s);
Expand Down Expand Up @@ -159,8 +164,7 @@ class PairBase {

void add_grid_obs(double, double, double,
double, double, double);



void add_climo(double, double, double);
void set_climo(int, double, double, double);
void add_climo_cdf();
Expand All @@ -187,29 +191,35 @@ extern void find_vert_lvl(const DataPlaneArray &, const double,
int &, int &);

extern double compute_interp(const DataPlaneArray &dpa,
const double obs_x, const double obs_y,
const double obs_v, const double cmn, const double csd,
const InterpMthd method, const int width,
const GridTemplateFactory::GridTemplates shape,
const double thresh,
const bool spfh_flag, const LevelType lvl_typ,
const double to_lvl, const int i_blw, const int i_abv,
const SingleThresh *cat_thresh = 0);
const double obs_x, const double obs_y,
const double obs_v, const double cmn, const double csd,
const InterpMthd method, const int width,
const GridTemplateFactory::GridTemplates shape,
const double thresh,
const bool spfh_flag, const LevelType lvl_typ,
const double to_lvl, const int i_blw, const int i_abv,
const SingleThresh *cat_thresh = 0);

extern void get_interp_points(const DataPlaneArray &dpa,
const double obs_x, const double obs_y,
const InterpMthd method, const int width,
const GridTemplateFactory::GridTemplates shape,
const double thresh,
const bool spfh_flag, const LevelType lvl_typ,
const double to_lvl, const int i_blw, const int i_abv,
NumArray &interp_points);
const double obs_x, const double obs_y,
const InterpMthd method, const int width,
const GridTemplateFactory::GridTemplates shape,
const double thresh,
const bool spfh_flag, const LevelType lvl_typ,
const double to_lvl, const int i_blw, const int i_abv,
NumArray &interp_points);

extern bool set_climo_flag(const NumArray &, const NumArray &);
extern bool set_climo_flag(const NumArray &, const NumArray &);

extern NumArray derive_climo_prob(const NumArray &, const NumArray &,
extern void derive_climo_vals(const ClimoCDFInfo &,
double, double, NumArray &);

extern NumArray derive_climo_prob(const ClimoCDFInfo &,
const NumArray &, const NumArray &,
const SingleThresh &);

extern double derive_prob(const NumArray &, const SingleThresh &);

////////////////////////////////////////////////////////////////////////

#endif // __PAIR_BASE_H__
Expand Down
Loading