DDI Handling of missing values in the QDDT

hildeorten edited this page Jun 27, 2018 · 2 revisions

Principles for missing values in the ESS

The following missing values system is used in the ESS:

Not applicable: 6, 66, 666 etc., where the respondent has been routed away from the question. Note regarding questionnaires: These values are not displayed in the PAPI source questionnaire.

Refusals: 7, 77, 777 etc., where the respondent has explicitly refused. Note regarding questionnaires: In some questions an explicit refusal code is available. Otherwise, “Refusal” shall not be included in the scale or the list of answer categories presented to the interviewer in the questionnaire.

Don't know: 8, 88, 888 etc. Note regarding questionnaires: These codes are almost always available. They can be included in the list of answering options presented to the interviewers, or placed in administrative columns, menus etc., depending on the routines of the survey organization.

No answer: 9, 99, 999 etc. are codes for missing data not elsewhere explained, for example respondent/interviewer errors and production/system errors. Note regarding questionnaires: These values are not displayed in the PAPI source questionnaire.

For further info about the missing values in the ESS, please see section C in http://www.europeansocialsurvey.org/docs/round6/survey/ESS6_data_protocol_e01_4.pdf
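The code families above follow one pattern: the digit 6, 7, 8 or 9 repeated to fill the width of the field. A minimal sketch of that pattern, assuming the field width in digits is known; the function name `ess_missing_codes` is hypothetical and not part of any ESS or QDDT tool:

```python
# ESS missing-value families, keyed by the digit that is repeated.
ESS_MISSING_DIGITS = {
    6: "Not applicable",
    7: "Refusal",
    8: "Don't know",
    9: "No answer",
}


def ess_missing_codes(width: int) -> dict[int, str]:
    """Return the ESS missing codes for a field that is `width` digits wide."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return {
        int(str(digit) * width): label
        for digit, label in ESS_MISSING_DIGITS.items()
    }


print(ess_missing_codes(2))
```

For a two-digit field this gives 66, 77, 88 and 99. Which of these are actually offered for a given question or variable is a separate decision, as described in the notes on questionnaires above.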

Usage in the ESS source questionnaire (currently in PAPI), examples:

In the examples below, value n(8) (Don’t know) is regarded as a value that the respondent should not see, but that the interviewer can tick if the respondent answers “don’t know”. This can be interpreted as a missing value at the level of the question.

Code domain example:

[Figure: code domain example]

Scale domain example:

[Figure: scale domain example]

Numeric domain example:

[Figure: numeric domain example]

Usage in Variables:

Sets of missing values are often used in variables. Below is the variable documentation for the variable resulting from question A1 above. Values 77, 88 and 99 are missing values of the variable.

Blank is handled as a missing value in ESS variables.

Variable tvtot

A1. On an average weekday, how much time, in total, do you spend watching television?

Values Categories
0 No time at all
1 Less than 0,5 hour
2 0,5 hour to 1 hour
3 More than 1 hour, up to 1,5 hours
4 More than 1,5 hours, up to 2 hours
5 More than 2 hours, up to 2,5 hours
6 More than 2,5 hours, up to 3 hours
7 More than 3 hours
77 Refusal
88 Don't know
99 No answer
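The tvtot documentation above can be read as two separate code sets plus the blank rule. The following is only an illustrative sketch of how a raw value could be classified; the function `classify` and its `blank_is_missing` flag are hypothetical names, not part of the QDDT:

```python
# Valid categories and missing values of tvtot, copied from the table above.
TVTOT_VALID = {
    0: "No time at all",
    1: "Less than 0,5 hour",
    2: "0,5 hour to 1 hour",
    3: "More than 1 hour, up to 1,5 hours",
    4: "More than 1,5 hours, up to 2 hours",
    5: "More than 2 hours, up to 2,5 hours",
    6: "More than 2,5 hours, up to 3 hours",
    7: "More than 3 hours",
}
TVTOT_MISSING = {77: "Refusal", 88: "Don't know", 99: "No answer"}


def classify(raw, blank_is_missing=True):
    """Return ("valid" | "missing" | "invalid", label) for one raw value."""
    # Blank (None or whitespace only) counts as missing in ESS variables.
    if raw is None or str(raw).strip() == "":
        return ("missing", "Blank") if blank_is_missing else ("invalid", None)
    try:
        code = int(raw)
    except (TypeError, ValueError):
        return ("invalid", None)
    if code in TVTOT_VALID:
        return ("valid", TVTOT_VALID[code])
    if code in TVTOT_MISSING:
        return ("missing", TVTOT_MISSING[code])
    return ("invalid", None)


for value in (7, "77", "", 42):
    print(repr(value), classify(value))
```

Note that 7 is a valid category here while 77 is the refusal code, which is why the missing codes must match the width of the field.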

Conceptually, response domains for questions and value representations for variables consist of two parts: the valid domain or representation, and the missing values.

Valid domains or representations are reusable within and between studies and can also be reused between variables and questions.

Sets of missing values are also reusable in themselves, especially within studies. Combinations of valid and missing values are less reusable.

Questions regarding requirements for the tools that need to be clarified:

  1. Is there a requirement to store missing values sets as separate elements? A probable answer is yes. Each component (valid/missing) is then more reusable.

  2. Can the same meaningful missing sets be reused in the response domain of questions and in the value representations of variables? For the ESS as it is now this is less probable. More missing values are usually included in the variable than in the question (see example A1 from ESS6 above). One open question is whether the ESS will need this in the future, or whether other surveys will need it.

Agreement regarding this was reached at a meeting in the domain group at NSD 2015-10-21. Yes to both.
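The two requirements can be pictured as a valid set and a missing set that are stored separately and combined only at the point of use. The sketch below is illustrative only: the class names `CodeSet` and `Representation` are hypothetical, and it assumes that question A1 offers only 88 (Don't know) as a missing value while the variable tvtot carries 77, 88 and 99.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeSet:
    """A reusable set of codes: either a valid domain or a missing set."""
    name: str
    codes: tuple  # ((value, label), ...)


@dataclass(frozen=True)
class Representation:
    """A valid set and a missing set, combined only where they are used."""
    valid: CodeSet
    missing: CodeSet

    def all_codes(self) -> dict:
        return dict(self.valid.codes + self.missing.codes)


tv_time = CodeSet("tv_time", (
    (0, "No time at all"),
    (1, "Less than 0,5 hour"),
    (2, "0,5 hour to 1 hour"),
    (3, "More than 1 hour, up to 1,5 hours"),
    (4, "More than 1,5 hours, up to 2 hours"),
    (5, "More than 2 hours, up to 2,5 hours"),
    (6, "More than 2,5 hours, up to 3 hours"),
    (7, "More than 3 hours"),
))
# Assumption: the question shows only "Don't know" to the interviewer.
question_missing = CodeSet("question_missing", ((88, "Don't know"),))
variable_missing = CodeSet("variable_missing", (
    (77, "Refusal"), (88, "Don't know"), (99, "No answer"),
))

question_a1 = Representation(tv_time, question_missing)
variable_tvtot = Representation(tv_time, variable_missing)

# The valid part is the very same object in both places.
print(question_a1.valid is variable_tvtot.valid)
print(sorted(variable_tvtot.all_codes()))
```

Keeping the two parts separate means the valid set can be shared between the question and the variable even when their missing sets differ.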

Possibilities in DDI3.2

1. Missing values elements for questions and variables

DDI 3.2 has a separate missing values structure, which allows reuse of missing values and missing values sets in variables and question structures by reference.

For Variables, this can be done by adding in VariableRepresentation a MissingValuesReference to a ManagedMissingValueRepresentation, which can be a MissingCodeRepresentation, a MissingNumericRepresentation or a MissingTextRepresentation.

For questions, a MissingValuesDomainReference can be added as a substitution for a ResponseDomainReference in QuestionGrid, GridResponseDomain, QuestionItem or ResponseDomainInMixed. Each of these can have only one response domain reference. In most known instances this reference will need to point to the set of valid response options. ResponseDomainInMixed is an exception: it is meant to cover different response domains that can be added by reference in a StructuredMixedResponseDomain in a QuestionItem or a QuestionGrid.
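The reference mechanism described in this section can be sketched with plain dictionaries. This is a simplified illustration, not DDI XML: the identifiers in `REGISTRY` are invented, the dictionary keys only borrow the element names used in the text, and the assumption that the question carries only 88 (Don't know) is ours.

```python
TV_TIME = {
    0: "No time at all",
    1: "Less than 0,5 hour",
    2: "0,5 hour to 1 hour",
    3: "More than 1 hour, up to 1,5 hours",
    4: "More than 1,5 hours, up to 2 hours",
    5: "More than 2 hours, up to 2,5 hours",
    6: "More than 2,5 hours, up to 3 hours",
    7: "More than 3 hours",
}

# Hypothetical store of reusable items, keyed by made-up identifiers.
REGISTRY = {
    "codes-tv-time": TV_TIME,
    "missing-question": {88: "Don't know"},
    "missing-variable": {77: "Refusal", 88: "Don't know", 99: "No answer"},
}

# The variable points to its valid set and its missing set by reference.
VARIABLE_TVTOT = {
    "ValueRepresentationReference": "codes-tv-time",
    "MissingValuesReference": "missing-variable",
}

# The question uses a mixed domain; each entry holds exactly one reference.
QUESTION_A1 = {
    "StructuredMixedResponseDomain": [
        {"ResponseDomainReference": "codes-tv-time"},
        {"MissingValuesDomainReference": "missing-question"},
    ]
}


def resolve(ref):
    """Look up a referenced item, failing loudly on a dangling reference."""
    try:
        return REGISTRY[ref]
    except KeyError:
        raise LookupError(f"unresolved reference: {ref}") from None


def variable_codes(variable):
    codes = dict(resolve(variable["ValueRepresentationReference"]))
    codes.update(resolve(variable["MissingValuesReference"]))
    return codes


def question_codes(question):
    codes = {}
    for domain_in_mixed in question["StructuredMixedResponseDomain"]:
        if len(domain_in_mixed) != 1:
            raise ValueError("each entry must hold exactly one reference")
        (ref,) = domain_in_mixed.values()
        codes.update(resolve(ref))
    return codes


print(sorted(variable_codes(VARIABLE_TVTOT)))
print(sorted(question_codes(QUESTION_A1)))
```

Both the question and the variable resolve the same `codes-tv-time` entry, so the valid set is stored once and only the missing sets differ.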

2. Assigning missing values at the level of a value set, or to individual categories

For Variables there is a possibility to assign which values are missing using the ‘missingValue’ attribute at the level of the ValueRepresentationReference, which is on a lower hierarchical level than VariableRepresentation. This presupposes, however, that the value set already includes the missing values in addition to the valid ones, which makes the value set less reusable.

For response domains that are included in-line in a question it is possible to assign missing values at the level of the response domain. Again this presupposes, however, that the value set already includes the missing values in addition to the valid ones, which makes the value set less reusable. It is also possible to assign missing values at the category level, using attribute ‘isMissing’. This makes the category less reusable.

3. Missing blanks

Attribute blankIsMissingValue = ‘true’ can be used to define blanks as missing values at the level of ManagedMissingValueRepresentation. (This option is also available in ValueRepresentation, ValueRepresentationReference and ResponseDomains when the ManagedMissingValueRepresentation structure is not used.)

4. Use CV in UserAttributePair to distinguish between valid and missing value sets

For questions, one possibility is not to use the missing structure, but to add the missing values to the response domain together with the valid values. Another possibility for questions is to use the StructuredMixedResponseDomain and split the valid and missing values into separate response domains, without using the MissingValuesDomain structure. Missing value sets such as 7 (Refusal) and 8 (Don’t know) could, for example, be entered as a CodeDomain with a reference to a CodeList. A controlled vocabulary in UserAttributePair could be used to distinguish between valid and missing value sets.
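A rough sketch of this fourth alternative follows. The attribute key `valueSetRole` and the vocabulary terms `valid` and `missing` are hypothetical; the text above does not name a concrete controlled vocabulary.

```python
# Hypothetical controlled vocabulary for the role of a value set.
VALUE_SET_ROLES = {"valid", "missing"}

# Two code lists, each tagged through a user attribute (key/value pair).
CODE_LISTS = [
    {
        "name": "agree_scale",
        "codes": {1: "Agree", 2: "Disagree"},
        "user_attributes": {"valueSetRole": "valid"},
    },
    {
        "name": "ess_missing_1digit",
        "codes": {7: "Refusal", 8: "Don't know"},
        "user_attributes": {"valueSetRole": "missing"},
    },
]


def split_by_role(code_lists):
    """Group code list names by their tagged role, rejecting unknown terms."""
    grouped = {role: [] for role in sorted(VALUE_SET_ROLES)}
    for code_list in code_lists:
        role = code_list["user_attributes"].get("valueSetRole")
        if role not in VALUE_SET_ROLES:
            raise ValueError(
                f"{code_list['name']}: role {role!r} is not in the vocabulary"
            )
        grouped[role].append(code_list["name"])
    return grouped


print(split_by_role(CODE_LISTS))
```

With this approach the distinction between valid and missing lives only in the tag, so every tool reading the metadata has to know the vocabulary.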

Conclusion

For variables, in our view it makes good sense to use the reusable missing values structure to represent the missing values (alternative 1). A good point in DDI is that this structure is added at the level of the variable representation. Variable can then inherit the value domain from RepresentedVariable, which is the more reusable part of Variable, and missing values can be added by reference in VariableRepresentation. This is a recommendation for the QVDB. For questions, it makes sense in any case to use the StructuredMixedResponseDomain when missing values exist. This is also recommended in the DDI.

As the same missing sets can be reused in the response domain of questions and in the value representations of variables, it makes sense to introduce the missing values structure for questions already now (alternative 1). Attribute blankIsMissingValue = ‘true’ can be used to define blanks as missing values at the level of ManagedMissingValueRepresentation (alternative 3).

Agreement regarding this was reached at a meeting in the domain group at NSD 2015-10-21. Yes to all.

Questions

The following missing values representations are available in DDI 3.2:

  • MissingCodeRepresentation
  • MissingNumericRepresentation
  • MissingTextRepresentation

MissingCodeRepresentation is used in the QDDT.

We plan to use the following response domains by reference:

  • CategoryDomain
  • CodeDomain
  • DateTimeDomain
  • NumericDomain
  • ScaleDomain
  • TextDomain