Covariance part 3

Previously: Analysis of covariance and Covariance part 2.

On this page we explore the philosophy and justification of covariate adjustment.

To begin, we define the context. There is a variable of interest, the variate. There is some other variable, the covariate. The participants or subjects in our experiment or survey are not equal on the covariate. We want to know about differences in the variate when the participants or subjects are equal on the covariate.

At the start of the introduction to the analysis of covariance, we introduced the covariate as a "nuisance" variable. That label neatly encapsulates the six criteria listed below. It is a useful exercise to apply "nuisance" to the covariate we have in mind and see whether the label sticks: whether it seems appropriate, or whether the covariate is something more (or less) than a nuisance.

The justification of the analysis of covariance rests on the following points.

First (importantly), there needs to be some (theoretical) reason why the relationship between the variate and the covariate is relevant to the investigation. Why, exactly, is the covariate a nuisance?

Second (trivially), there does need to be an actual correlation between variate and covariate; doing an Ancova when r = 0 is pretty pointless. Is the covariate actually a nuisance?
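The second criterion can be checked numerically. Because the Ancova adjustment rests on the relationship between variate and covariate within the groups, the sketch below computes a pooled within-group correlation rather than the overall r, which group differences on both variables can inflate or mask. The function name `pooled_within_r` and the data are hypothetical, for illustration only.

```python
import math

def pooled_within_r(groups):
    """Pooled within-group correlation between covariate x and variate y.

    `groups` maps a group label to a list of (x, y) pairs (hypothetical format).
    """
    sxx = syy = sxy = 0.0
    for pairs in groups.values():
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        # Deviations are taken from each group's own means, not the grand means.
        sxx += sum((x - mx) ** 2 for x in xs)
        syy += sum((y - my) ** 2 for y in ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("covariate or variate has no within-group variation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (covariate, variate) pairs for two groups.
groups = {
    "A": [(1, 2), (2, 4), (3, 5)],
    "B": [(2, 5), (3, 6), (4, 8)],
}
print(round(pooled_within_r(groups), 3))  # 0.982
```

A value near zero would suggest the covariate is not much of a nuisance, and the Ancova adjustment would have little to work with.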

Third (crucially), in an experimental setting, the covariate should not be affected by the experimental treatments; or in an observational (survey) setting, the covariate should be a measure which is prior (conceptually or temporally) to the variate measure if it is to support a cause-effect interpretation. Does the amount of the nuisance depend on the experimental treatment? Is the nuisance measured earlier and independently?
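One partial diagnostic for the third criterion, assuming an experiment with several treatment groups, is a one-way analysis of variance on the covariate itself. If the covariate was measured after treatment began, a large F suggests the treatment may have shifted it; if it was measured before random assignment, any difference reflects chance imbalance. A small F cannot prove the covariate is unaffected, which is a matter of design and timing. The function `covariate_f` and the scores are hypothetical, and comparison with a critical value on (k − 1, N − k) degrees of freedom is left out.

```python
def covariate_f(groups):
    """One-way ANOVA F statistic for differences in covariate means across groups.

    `groups` maps a treatment label to a list of covariate scores (hypothetical format).
    """
    all_x = [x for xs in groups.values() for x in xs]
    n, k = len(all_x), len(groups)
    grand = sum(all_x) / n
    group_means = {label: sum(xs) / len(xs) for label, xs in groups.items()}
    # Between-groups SS: how far each group's covariate mean sits from the grand mean.
    ss_between = sum(len(xs) * (group_means[label] - grand) ** 2
                     for label, xs in groups.items())
    # Within-groups SS: spread of covariate scores around their own group mean.
    ss_within = sum((x - group_means[label]) ** 2
                    for label, xs in groups.items() for x in xs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical covariate scores recorded after treatment.
print(covariate_f({"placebo": [1, 2, 3], "caffeine": [4, 5, 6]}))  # 13.5
```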

Fourth (hopefully obviously), the Ancova we have been discussing only provides for, and therefore assumes, a linear relationship between variate and covariate. The Ancova makes two adjustments to the variate based on its relationship to the covariate, one to the error variance and the other to the group means, and both are linear. Is the nuisance linear in its effect?
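Both linear adjustments can be sketched from the within-group sums of squares and cross-products, assuming one covariate and a common slope in every group. The error sum of squares loses the part of the variate predicted by the pooled within-group slope b_w, and each group mean is moved along that same straight line to the grand mean of the covariate: adjusted mean = group mean − b_w × (group covariate mean − grand covariate mean). The function name and data are hypothetical; a curved relationship would not be captured by either adjustment.

```python
def ancova_adjustments(groups):
    """Return (b_w, adjusted error SS, adjusted group means) for a one-covariate Ancova.

    `groups` maps a group label to a list of (covariate, variate) pairs (hypothetical format).
    """
    all_x = [x for pairs in groups.values() for x, _ in pairs]
    grand_x = sum(all_x) / len(all_x)
    sxx = syy = sxy = 0.0
    means = {}
    for label, pairs in groups.items():
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        means[label] = (mx, my)
        sxx += sum((x - mx) ** 2 for x in xs)
        syy += sum((y - my) ** 2 for y in ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b_w = sxy / sxx  # one common straight-line slope for all groups
    # Adjustment 1: remove the linearly predictable part from the error SS.
    ss_error_adj = syy - sxy ** 2 / sxx
    # Adjustment 2: slide each group mean along the common slope to the grand covariate mean.
    adj_means = {label: my - b_w * (mx - grand_x) for label, (mx, my) in means.items()}
    return b_w, ss_error_adj, adj_means

# Hypothetical (covariate, variate) pairs for two groups.
groups = {
    "A": [(1, 2), (2, 4), (3, 5)],
    "B": [(2, 5), (3, 6), (4, 8)],
}
b_w, ss_error_adj, adj_means = ancova_adjustments(groups)
print(round(b_w, 4))                                    # 1.5
print(round(ss_error_adj, 4))                           # 0.3333
print({k: round(v, 4) for k, v in adj_means.items()})   # {'A': 4.4167, 'B': 5.5833}
```

With these made-up numbers the within-group error SS falls from about 9.33 to 0.33 (on N − k − 1 = 3 degrees of freedom), and the gap between group means shrinks from about 2.67 to about 1.17, because group B also sits higher on the covariate.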

Fifth (conceptual reality of the adjusted variate), the adjusted data should be "real" in some sense: an adjusted variate should be capable of being observed, if only in principle, rather than existing as a theoretical construct that could have no physical instantiation. When we remove the nuisance effect from the data, can we sensibly discuss what is left behind?

Sixth (conceptual reality of the average covariate), making the data equal on the covariate should be "real" in some sense: an average covariate should be capable of being observed, if only in principle, rather than existing as a theoretical construct that could have no physical instantiation (§1). Is there such a thing as an average amount of nuisance?

(§1) The classic example here is a covariate which simplistically records Gender as "Male" or "Female", coded 0 and 1, and a variate which is adjusted by making the participants or subjects "equal" on Gender, that is, predicting a value of the variate at the mean of the Gender codes, which is 0.5 when the sample is evenly split. There is no possibility of observing an "average" Gender = 0.5 in the real world, and discussing the predicted value of the variate for an "average" Gender requires exceptional mastery of the relevant literature and social context.
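The point at which an Ancova reports adjusted means is the grand mean of the covariate. With 0/1 codes that mean is simply the proportion coded 1, so an evenly split sample lands on 0.5 and an uneven one on some other fraction; in neither case does any participant have that value. The codes below are hypothetical, and `covariate_grand_mean` is an illustrative name.

```python
def covariate_grand_mean(codes):
    """Grand mean of the covariate: the value at which adjusted means are evaluated."""
    return sum(codes) / len(codes)

# Hypothetical Gender codes: 0 = "Male", 1 = "Female".
balanced = [0, 1, 0, 1, 0, 1, 0, 1]
unbalanced = [0, 0, 0, 0, 0, 1, 1, 1]

for codes in (balanced, unbalanced):
    point = covariate_grand_mean(codes)
    # True only if some participant actually has this covariate value.
    observable = point in set(codes)
    print(point, observable)
# 0.5 False
# 0.375 False
```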

It should be clear that the series of example data sets we have explored in these pages on the analysis of covariance are playful inventions for the purpose of explanation, and do not meet the third or fifth criterion: the Alertness covariate is inextricably tied to the caffeine dose and to the performance of the participants. There is no useful sense in which we could discuss the effects of caffeine on performance while holding Alertness constant, adjusting for the effects of Alertness, or eliminating Alertness from a model of how caffeine works.

These considerations bring us to the conclusion that an Ancova requires thoughtful justification when we deploy it, and careful scrutiny, if not suspicion, when we find it deployed elsewhere.