Missing Data Concepts
Missing Data Mechanisms
Missing data mechanisms describe the relationship between the measured variables and the probability
of missing data.
Types:
• Missing at random (MAR): also called missing conditionally at random. The probability that a
variable is missing depends on other observed variables, but not on the variable's own value once
those variables are taken into account.
• Missing completely at random (MCAR): the propensity for a data point to be missing is completely
random. There is no relationship between whether a data point is missing and any values in the data
set, missing or observed.
• Missing not at random (MNAR): the probability of missing data on a variable is related to the
values of the variable itself, even after controlling for the other observed variables.
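The three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only: the variables `age` and `income`, the 30% missingness rate, and the logistic missingness rules are all assumptions chosen for the example, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical variables: age is always observed, income may go missing.
age = rng.normal(40, 10, n)
income = 1000 * age + rng.normal(0, 5000, n)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# MCAR: every income value has the same 30% chance of being missing.
mcar = rng.random(n) < 0.3
# MAR: the chance of missing income depends only on the observed variable (age).
mar = rng.random(n) < sigmoid((age - 40) / 5)
# MNAR: the chance of missing income depends on the income value itself.
mnar = rng.random(n) < sigmoid((income - 40_000) / 5000)

true_mean = income.mean()
for name, mask in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    observed_mean = income[~mask].mean()
    print(f"{name}: observed mean = {observed_mean:.0f} (true mean = {true_mean:.0f})")
```

In this sketch the observed mean stays close to the true mean only under MCAR. Under MAR it is also shifted, because income is correlated with age, but that shift can be explained by the observed variable. Under MNAR the shift is driven by the unobserved income values themselves.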
Handling missing values
Listwise deletion: Discards the data for any case that has one or more missing values.
Pairwise deletion: Attempts to mitigate the loss of data by eliminating cases on an
analysis-by-analysis basis, so each analysis uses every case that has data on the variables it involves.
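A minimal sketch of the difference between the two deletion strategies, assuming pandas is available. The toy data frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three variables, each with one missing value.
df = pd.DataFrame({
    "x": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "y": [2.0, np.nan, 6.0, 8.0, 10.0, 12.0],
    "z": [1.0, 3.0, 2.0, np.nan, 5.0, 4.0],
})

# Listwise deletion: keep only the rows that have no missing value at all.
listwise = df.dropna()
print("rows kept by listwise deletion:", len(listwise))

# Pairwise deletion: each statistic uses every row that is complete for the
# variables involved in that statistic. DataFrame.corr() excludes missing
# values pair by pair, so each correlation can rest on a different set of rows.
pairwise_corr = df.corr()

# Number of rows available for each pair of variables.
present = df.notna().astype(int)
pairwise_n = present.T @ present
print(pairwise_n)
print(pairwise_corr)
```

Here listwise deletion keeps fewer rows than any single pairwise correlation uses, which is the data loss that pairwise deletion tries to mitigate. The trade-off is that different statistics are then computed on different subsets of cases.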
Single imputation: generates a single replacement value for each missing data point.
• Arithmetic mean imputation: Replaces missing values with the arithmetic mean of the available cases.
• Regression imputation: Replaces missing values with predicted scores from a regression equation.
• Stochastic regression imputation: Adds random residuals to the predicted values generated by
standard regression imputation.
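The three single-imputation methods above can be compared on simulated data. This is a sketch under stated assumptions: the data are synthetic, `x` is a hypothetical fully observed predictor, and the random residuals are drawn from a normal distribution with the residual standard deviation of the fitted regression, which is one common choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical data: x is fully observed, y is missing for about 30% of cases.
x = rng.normal(0, 1, n)
y = 2.0 * x + rng.normal(0, 1, n)
missing = rng.random(n) < 0.3
y_obs = np.where(missing, np.nan, y)

# Arithmetic mean imputation: fill every gap with the mean of the observed values.
y_mean = np.where(missing, np.nanmean(y_obs), y_obs)

# Regression imputation: fit y ~ x on the complete cases, fill gaps with predictions.
slope, intercept = np.polyfit(x[~missing], y_obs[~missing], deg=1)
predicted = intercept + slope * x
y_reg = np.where(missing, predicted, y_obs)

# Stochastic regression imputation: add a random residual to each prediction.
residual_sd = np.std(y_obs[~missing] - predicted[~missing], ddof=2)
noise = rng.normal(0, residual_sd, n)
y_stoch = np.where(missing, predicted + noise, y_obs)

for name, filled in [("mean", y_mean), ("regression", y_reg), ("stochastic", y_stoch)]:
    corr = np.corrcoef(x, filled)[0, 1]
    print(f"{name:>10}: var = {filled.var():.2f}, corr(x, y) = {corr:.3f}")
print(f"{'true':>10}: var = {y.var():.2f}, corr(x, y) = {np.corrcoef(x, y)[0, 1]:.3f}")
```

Mean imputation shrinks the variance of `y` and weakens its correlation with `x`. Regression imputation places the filled values exactly on the regression line, so it understates the variance and overstates the correlation. Adding random residuals is meant to restore the lost variability.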
Imputation with K-Nearest Neighbours: uses the values of the K nearest neighbours (the K most similar
cases) to impute the missing value.
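A hand-written sketch of the idea, kept small so the mechanism stays visible. The function name `knn_impute`, the root-mean-square distance over shared columns, and the use of the neighbours' mean are assumptions made for this example. Libraries such as scikit-learn provide a `KNNImputer` class, and this sketch does not claim to reproduce its behaviour.

```python
import numpy as np

def knn_impute(data, k=2):
    """Fill each NaN with the mean of that column over the k nearest rows.

    Distance is the root-mean-square difference over the columns that are
    observed in both rows, which is one simple choice among several.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    for i, row in enumerate(data):
        for j in np.flatnonzero(np.isnan(row)):
            candidates = []
            for m, other in enumerate(data):
                if m == i or np.isnan(other[j]):
                    continue  # a neighbour must have column j observed
                shared = ~np.isnan(row) & ~np.isnan(other)
                if not shared.any():
                    continue  # no common columns to measure distance on
                dist = np.sqrt(np.mean((row[shared] - other[shared]) ** 2))
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                filled[i, j] = np.mean([value for _, value in candidates[:k]])
    return filled

# Hypothetical data: the second row is missing its third value.
X = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, np.nan],
    [0.9, 1.9, 2.8],
    [8.0, 9.0, 10.0],
])
result = knn_impute(X, k=2)
print(result)
```

With `k=2`, the missing value is filled from the first and third rows, which are the closest to the incomplete row on the two shared columns. The distant fourth row does not contribute.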