Missing Data Concepts

Missing Data Mechanisms

Missing data mechanisms describe the relationship between the measured variables and the probability that data are missing. Types:
• Missing at random (MAR): the data are missing conditionally at random. The probability that a variable is missing depends on other observed variables, but not on the missing value itself once those variables are accounted for. (Missingness in one variable is related to another, observed variable.)
• Missing completely at random (MCAR): the propensity for a data point to be missing is completely random. There is no relationship between whether a data point is missing and any values in the data set, missing or observed. (Completely random.)
• Missing not at random (MNAR): the probability of missing data on a variable is related to the values of that variable itself. (The variable's own unobserved value influences whether it is missing.)
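The three mechanisms can be contrasted with a small simulation. This is only a sketch under stated assumptions: the variables x and y, the missingness probabilities (0.3 and 0.6), and the function names simulate and observed_mean are all made up for illustration. It uses only the Python standard library.

```python
import random
import statistics

def simulate(mechanism, n=10000, seed=0):
    """Return (x, y, y_obs); y_obs holds None where y was made missing.

    x is always observed; y is correlated with x. All numbers are illustrative.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]
    y_obs = []
    for xi, yi in zip(x, y):
        if mechanism == "MCAR":
            p = 0.3                       # same probability for every case
        elif mechanism == "MAR":
            p = 0.6 if xi > 0 else 0.0    # depends only on the observed x
        elif mechanism == "MNAR":
            p = 0.6 if yi > 0 else 0.0    # depends on the value of y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        y_obs.append(None if rng.random() < p else yi)
    return x, y, y_obs

def observed_mean(values):
    """Mean of the non-missing entries."""
    return statistics.fmean([v for v in values if v is not None])

if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        _, y, y_obs = simulate(mech)
        print(mech, "true mean:", round(statistics.fmean(y), 3),
              "observed mean:", round(observed_mean(y_obs), 3))
```

In this setup the observed mean of y should stay close to the true mean under MCAR and drift downward under MAR and MNAR, because high values are preferentially removed. Under MAR the drift can in principle be corrected using x; under MNAR it cannot be corrected from the observed data alone.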

Handling missing values

Listwise deletion: discards the data for any case that has one or more missing values.
Pairwise deletion: attempts to mitigate the loss of data by eliminating cases on an analysis-by-analysis basis, so each analysis uses every case that has the variables it needs.
Single imputation: generates a single replacement value for each missing data point.
• Arithmetic mean imputation: replaces missing values with the arithmetic mean of the available cases.
• Regression imputation: replaces missing values with predicted scores from a regression equation.
• Stochastic regression imputation: adds random residuals to the predicted values generated by standard regression imputation.
Imputation with K-Nearest Neighbours: uses the values of the K nearest neighbours to impute the missing value.
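The methods above can be sketched on a toy two-column data set. This is a minimal illustration, not a reference implementation: the data, the function names, the choice of k, and the use of a simple least-squares line of y on x are all assumptions made for the example. It uses only the Python standard library.

```python
import math
import random
import statistics

# Hypothetical toy data: rows of (x, y); None marks a missing value.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, None), (4.0, 8.2), (None, 9.8), (6.0, 12.1)]

def listwise_delete(rows):
    """Keep only the cases with no missing value in any column."""
    return [r for r in rows if None not in r]

def available_case_means(rows):
    """Pairwise-deletion style: each column's mean uses every case observed for it."""
    return [statistics.fmean([v for v in col if v is not None]) for col in zip(*rows)]

def mean_impute(rows):
    """Replace each missing value with its column's available-case mean."""
    means = available_case_means(rows)
    return [tuple(m if v is None else v for v, m in zip(r, means)) for r in rows]

def fit_line(rows):
    """Least-squares line of y on x from complete cases: (slope, intercept, residuals)."""
    complete = listwise_delete(rows)
    xs, ys = zip(*complete)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in complete) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept, [y - (intercept + slope * x) for x, y in complete]

def regression_impute(rows, stochastic=False, seed=0):
    """Fill missing y from the fitted line; optionally add a random residual."""
    slope, intercept, residuals = fit_line(rows)
    # Residual standard error with n - 2 degrees of freedom.
    sd = math.sqrt(sum(e * e for e in residuals) / (len(residuals) - 2))
    rng = random.Random(seed)
    out = []
    for x, y in rows:
        if y is None and x is not None:
            y = intercept + slope * x
            if stochastic:
                y += rng.gauss(0, sd)  # restores some of the lost variability
        out.append((x, y))
    return out

def knn_impute(rows, k=2):
    """Fill missing y with the mean y of the k complete cases nearest in x."""
    complete = listwise_delete(rows)
    out = []
    for x, y in rows:
        if y is None and x is not None:
            nearest = sorted(complete, key=lambda r: abs(r[0] - x))[:k]
            y = statistics.fmean([r[1] for r in nearest])
        out.append((x, y))
    return out

if __name__ == "__main__":
    print("listwise:", listwise_delete(rows))
    print("available-case means:", available_case_means(rows))
    print("mean imputation:", mean_impute(rows))
    print("regression:", regression_impute(rows))
    print("stochastic regression:", regression_impute(rows, stochastic=True))
    print("kNN:", knn_impute(rows, k=2))
```

Listwise deletion keeps four of the six cases here, while the available-case means use five values per column. The regression and kNN sketches fill only a missing y, since they predict y from x; a case with a missing x is left unchanged.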