3.19 Missing values

Availability: ncap2, ncbo, ncea, ncflint, ncpdq, ncra, ncwa
Short options: None

The phrase missing data refers to data points that are missing, invalid, or for any reason not intended to be arithmetically processed in the same fashion as valid data. The NCO arithmetic operators attempt to handle missing data in an intelligent fashion. There are four steps in the NCO treatment of missing data:

  1. Identifying variables which may contain missing data.

    NCO follows the convention that missing data should be stored with the missing_value specified in the variable's missing_value attribute. The only way NCO recognizes that a variable may contain missing data is if the variable has a missing_value attribute. In this case, any elements of the variable which are numerically equal to the missing_value are treated as missing data.

  2. Converting the missing_value to the type of the variable, if necessary.

    Consider a variable var of type var_type with a missing_value attribute of type att_type containing the value missing_value. As a guideline, the type of the missing_value attribute should be the same as the type of the variable it is attached to. If var_type equals att_type then NCO straightforwardly compares each value of var to missing_value to determine which elements of var are to be treated as missing data. If not, then NCO converts missing_value from att_type to var_type by using the implicit conversion rules of C, or, if att_type is NC_CHAR [1], by typecasting the results of the C function strtod(missing_value). You may use the NCO operator ncatted to change the missing_value attribute, and all data whose value is missing_value, to a new value (see ncatted netCDF Attribute Editor).

  3. Identifying missing data during arithmetic operations.

    When an NCO arithmetic operator processes a variable var with a missing_value attribute, it compares each value of var to missing_value before performing an operation. Note that the missing_value comparison inflicts a performance penalty on the operator. Arithmetic processing of variables which contain the missing_value attribute always incurs this penalty, even when none of the data are missing. Conversely, arithmetic processing of variables which do not contain the missing_value attribute never incurs this penalty. Therefore, do not attach a missing_value attribute to a variable which does not contain missing data. This exhortation can usually be obeyed for model-generated data, but it may be harder to know in advance whether all observational data will be valid or not.

  4. Treatment of any data identified as missing in arithmetic operators.

    NCO averagers (ncra, ncea, ncwa) do not count any element with the value missing_value towards the average. ncbo and ncflint define a missing_value result when either of the input values is a missing_value. Sometimes the missing_value may change from file to file in a multi-file operator, e.g., ncra. NCO is written to account for this (it always compares a variable to the missing_value assigned to that variable in the current file). Suffice it to say that, in all known cases, NCO does “the right thing”.

    In some corner cases it is impossible to determine and store the correct result of a binary operation in a single variable. One such corner case occurs when the two operands have differing missing_value attributes, i.e., attributes with different numerical values. Since the output (result) of the operation can only have one missing_value, some information may be lost. In this case, NCO always defines the output variable to have the same missing_value as the first input variable. Prior to performing the arithmetic operation, all values of the second operand equal to the second missing_value are replaced with the first missing_value. Then the arithmetic operation proceeds as normal, comparing each element of each operand to a single missing_value. Comparing each element to two distinct missing_value values would be much slower and would be no likelier to yield a more satisfactory answer. In practice, judicious choice of missing_value values prevents any important information from being lost.
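
The four steps above can be illustrated with a short Python sketch. This is not NCO source code, only a minimal model of the behavior described in this section. The function names are hypothetical, float() is used as a rough stand-in for the C function strtod() (unlike strtod(), it rejects strings with trailing non-numeric characters), and returning missing_value for an all-missing average is an assumption made for the sketch.

```python
def coerce_missing_value(att_value, var_is_integer=False):
    """Step 2: convert the attribute value to the variable's type.

    A string attribute (the NC_CHAR case) is parsed with float(), a rough
    stand-in for strtod(); numeric attribute values are simply cast.
    """
    number = float(att_value)
    return int(number) if var_is_integer else number


def average_skipping_missing(values, missing_value):
    """Steps 3 and 4 for an averager: missing elements are not counted."""
    valid = [v for v in values if v != missing_value]
    if not valid:
        # Assumption for this sketch: an all-missing input yields missing.
        return missing_value
    return sum(valid) / len(valid)


def subtract_with_missing(first, second, mss_first, mss_second):
    """Binary operation on operands whose missing values differ.

    Missing elements of the second operand are first rewritten to the
    first operand's missing value, so that only one value has to be
    compared during the arithmetic itself.
    """
    second = [mss_first if v == mss_second else v for v in second]
    return [
        mss_first if (a == mss_first or b == mss_first) else a - b
        for a, b in zip(first, second)
    ]


# A character-valued attribute such as '-99999.' becomes the number -99999.0.
mss = coerce_missing_value("-99999.")
print(average_skipping_missing([1.0, mss, 3.0], mss))  # 2.0
print(subtract_with_missing([5.0, mss, 7.0], [1.0, 2.0, 1e36], mss, 1e36))
```

In this sketch the result of the subtraction carries only the first missing value. A valid element of the second operand that happened to equal the first missing value would be treated as missing, which is the kind of information loss described above.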


Footnotes

[1] For example, the DOE ARM program often uses att_type = NC_CHAR and missing_value = '-99999.'.