One of the most important tools of science is statistical inference, the practice of reasoning from the analysis of samples to conclusions about the larger populations from which the samples were drawn. Entire courses are devoted to the topic, and to its detailed techniques — the t, F, and Chi-Squared tests, linear modeling, and many more. Today it is generally accepted that any result obtained from an experiment, through the analysis of a sample of measurements or responses to a survey, will be subjected to a significance test to determine what conclusions can reasonably be drawn about the larger world that is represented by the measurements or responses.
The earliest developments in statistical inference were made in the context of controlled experiments. For example, much was learned about agriculture by sowing test plots with new kinds of seeds, or submitting test plots to specific treatments. In such experiments it is reasonable to assume that every sample was created through a random process, and that the samples collectively represent what might have happened had the sample been much larger; in other words, what might be expected to happen under similar conditions in the universe of all similar experiments, whether conducted by the experimenter or by a farmer. Because there is variation in the experiment, it is important to know whether the variation observed in the sample is large enough to support conclusions about variation in the universe as a whole. Figure 2‑12, below, illustrates this process of statistical inference. A sample is drawn from the population by a random process. Data are then collected about the sample and analyzed. Finally, inferences are made about the characteristics of the population, within the bounds of uncertainty inherent in the sampling process.
Figure 2‑12 The process of statistical inference
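The sequence of steps described above (random sampling, analysis of the sample, inference about the population) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended procedure: the population of 10,000 plot yields is simulated and entirely hypothetical, the function name `infer_population_mean` is our own, and the interval uses the normal approximation (z = 1.96) rather than a t distribution.

```python
import math
import random
import statistics


def infer_population_mean(population, sample_size, seed=0):
    """Draw a simple random sample and return (estimate, (low, high)).

    The interval is an approximate 95% confidence interval based on the
    normal approximation; it assumes independent, random selection.
    """
    rng = random.Random(seed)
    # Step 1: draw the sample by a random process (without replacement).
    sample = rng.sample(population, sample_size)
    # Step 2: analyze the sample.
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    # Step 3: infer a range for the population mean, with uncertainty.
    z = 1.96  # normal quantile for an approximate 95% interval
    return mean, (mean - z * std_error, mean + z * std_error)


# Hypothetical population: simulated yields for 10,000 test plots.
population_rng = random.Random(42)
population = [population_rng.gauss(50.0, 10.0) for _ in range(10_000)]

estimate, (low, high) = infer_population_mean(population, 100, seed=1)
true_mean = statistics.fmean(population)
print(f"estimate {estimate:.2f}, interval ({low:.2f}, {high:.2f}), "
      f"population mean {true_mean:.2f}")
```

The interval is meaningful only because the sample was drawn randomly and independently from a well-defined population, which is exactly the condition that is hard to meet in spatial analysis.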
These techniques have become part of the standard apparatus of science, and it is unusual for scientists to question the assumptions that underlie them. But the techniques of spatial analysis are applied in very different circumstances from the controlled experiments of the agricultural scientist or psychologist. Rather than creating a sample by laying out experimental plots or recruiting participants in a survey, the spatial analyst typically has to rely on so-called natural experiments, in which the variation among samples is the result of circumstances beyond the analyst’s control.
In this context the two fundamental principles of statistical inference raise three important questions: (i) Were the members of the sample selected randomly and independently from a larger population, or did Tobler’s First Law virtually ensure lack of independence, and/or did the basic heterogeneity of the Earth’s surface virtually ensure that samples drawn in another location would be different? (ii) What universe is represented by the samples? (iii) Is it possible to reason from the results of the analysis to conclusions about the universe?
All too often the answers to these questions are negative. Spatial analysis is often conducted on all of the available data, so there is no concept of a universe from which the data were drawn, and about which inferences are to be made. It is rarely possible to argue that sample observations were drawn independently, unless they are spaced far apart. Specialized methods have been devised that circumvent these problems to some degree, and they will be discussed at various points in the book. More often, however, the analyst must be content with results that apply only to the sample under analysis, and cannot be generalized to some larger universe.
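The point about independence and spacing can be illustrated with a small simulation. The sketch below is only an assumption-laden stand-in for real spatial data: it generates values along a hypothetical one-dimensional transect using a first-order autoregressive process (our choice, with an arbitrary dependence parameter `rho = 0.9`), and then compares the correlation between adjacent observations with the correlation between observations 50 positions apart. The function names are our own.

```python
import math
import random
import statistics


def simulate_transect(n, rho, seed=0):
    """Simulate n values along a transect where each value depends on
    its neighbor (a first-order autoregressive process, used here only
    as a simple stand-in for spatial dependence)."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho ** 2)  # keeps the variance close to 1
    for _ in range(n - 1):
        values.append(rho * values[-1] + scale * rng.gauss(0.0, 1.0))
    return values


def lag_correlation(values, lag):
    """Sample autocorrelation between observations `lag` positions apart."""
    mean = statistics.fmean(values)
    deviations = [v - mean for v in values]
    numerator = sum(a * b for a, b in zip(deviations, deviations[lag:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator


transect = simulate_transect(5000, rho=0.9, seed=7)
r_adjacent = lag_correlation(transect, 1)   # neighboring observations
r_distant = lag_correlation(transect, 50)   # widely spaced observations
print(f"adjacent: {r_adjacent:.2f}, 50 apart: {r_distant:.2f}")
```

In a simulation of this kind, neighboring observations are strongly correlated while widely spaced ones are nearly uncorrelated, which is why independence can be argued only for samples that are spaced far apart.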