Notes: Gaussian Process Regression - When it does and does not work

In machine learning, regression refers to fitting a function to observed data. For a given set of observations, a very large number of functions could fit them. Gaussian processes address this by placing a probability distribution over these functions. The mean of the resulting predictive distribution (a Gaussian distribution) is used as the most probable outcome for a given input.

Although the probabilistic model makes it straightforward to derive a predictive distribution for the regression outcome, a key characteristic of this technique is that the variance of the predictive distribution for a new observation x (i.e., the indicator of its prediction reliability) depends only on the input features (given the kernel and its hyperparameters), and in particular on the location of x relative to the other observations in the training data (e.g., distance computed from feature values), and not on the observed target (outcome) values (Rasmussen 2004). Because the modeled prediction variance is independent of the outcome values, it is not well suited to capturing the magnitude of prediction error that is due to variability in the outcomes, which makes it a less informative measure of individual prediction reliability (as considered in our paper). This independence of the prediction variance from the outcome values is an explicit property of Gaussian process regression, which we illustrate mathematically below.
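As a quick numerical check of this property, the following is a minimal sketch, not the implementation used in the paper. It assumes a zero-mean Gaussian process with a squared-exponential kernel, hyperparameters held fixed (length scale 1, signal variance 1, noise variance 0.1), and synthetic data; the helper names rbf_kernel and gp_predict are hypothetical. It fits the same inputs to two very different outcome vectors and compares the predictive variances.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between row-wise inputs a (n, d) and b (m, d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_predict(x_train, y_train, x_new, noise_var=0.1):
    """Posterior mean and variance of a zero-mean GP with fixed hyperparameters."""
    k_train = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_cross = rbf_kernel(x_train, x_new)
    k_new = rbf_kernel(x_new, x_new)
    chol = np.linalg.cholesky(k_train)
    # The mean uses the outcomes: mean = k_cross^T K^{-1} y.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross.T @ alpha
    # The variance uses only kernel terms built from the inputs; y never appears.
    v = np.linalg.solve(chol, k_cross)
    var = np.diag(k_new) - np.sum(v**2, axis=0)
    return mean, var

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, size=(20, 1))   # synthetic inputs (assumption)
x_new = np.array([[0.0], [2.5], [6.0]])      # last point lies far from the data

y_smooth = np.sin(x_train[:, 0])             # low-variability outcomes
y_noisy = 10.0 * rng.normal(size=20)         # highly variable outcomes

mean_a, var_a = gp_predict(x_train, y_smooth, x_new)
mean_b, var_b = gp_predict(x_train, y_noisy, x_new)

print(np.allclose(var_a, var_b))    # True: variances match despite different outcomes
print(np.allclose(mean_a, mean_b))  # the predicted means, by contrast, differ
print(var_a)                        # largest at x = 6.0, the point farthest from training inputs
```

In this sketch the variance changes only when the inputs change, for example when x moves away from the training data. Note that the check holds the hyperparameters fixed by assumption; the sketch does not cover the case where they are re-estimated from the outcomes.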