<< Click to Display Table of Contents >>
## Classification and clustering |

“Classification is, perhaps, the basic procedure by which we impose some sort of order and coherence upon the vast inflow of information from the real world.” Harvey (1969)

Harvey provides a very useful background to the field of classification, especially in a spatial context, but for the basics of classification within a GIS context Mitchell (1999, pp 46-55) and Longley et al. (2015) are probably more useful and up-to-date starting points. These latter works address the question of how to classify and map attribute data associated with polygonal regions or provided as continuous fields. The manuals and online help facilities of the main GIS and image processing packages are also a good source of material for deciding which scheme might be the most appropriate for the dataset you have. Most of these schemes provide a range of standard alternatives for classifying univariate attributes, for example mapping population density by census tract. The attributes selected may be transformed in advance, for example: by producing spatially intensive variables such as densities rather than population totals; and/or by mathematically transforming data, e.g. by classifying log transformed data rather than the original dataset; and/or by normalizing datasets (as discussed in Section 4.3.3, Ratios, indices, normalization, standardization and rate smoothing). Alternatively the classification procedure selected may impose some form of normalization on the untransformed data (e.g. standard deviation based classification).

Classification needs to be seen in terms of purpose as well as method. However, GIS software places almost no constraints upon a user’s selection of classification method, so it is the responsibility of the analyst to choose appropriate procedures and settings that are fit for purpose. Classification is by no means a peculiarly spatial problem, and most methods apply to almost any kind of data, i.e. there is rarely a specific spatial aspect, such as a contiguity condition, applied. Such facilities are more likely to be found in the spatial autocorrelation and pattern analysis features of GIS toolsets.

Univariate classification schemes

Within GIS software, univariate classification facilities are found as tools to: aid in the production of choropleth or thematic maps; explore untransformed or transformed datasets; analyze (classify and re-classify) image data (see further, Section 4.2.12, Classification and clustering); and display continuous field data. In the majority of cases these procedures perform classification based purely on the input dataset, without reference to separate external evaluation criteria.

In almost all instances the objects to be classified are regarded as discrete, distinct items that can only reside in one class at a time (sometimes referred to as hard classification). Separate schemes exist for classifying objects that have uncertain class membership (soft or fuzzy classification) and/or unclear boundaries (as discussed briefly in Section 4.2.13, Boundaries and zone membership), or which require classification on the basis of multiple attributes (see Section 4.2.12, Classification and clustering). Typically the attributes used in classification have numerical values that are real or integer type. In most instances these numeric values represent interval or ratio-scaled variables. Purely nominal data are effectively already classified (see, for example, Figure 4‑21 in which each zone has been assigned a unique color from a pre-selected palette; and see Section 2.1.2, Attributes, for a brief discussion of commonly used scales). Of course, there is always the option to apply no predefined classification procedure, but instead when mapping such data to apply a continuous gradation of gray tones or colors, using a linear or non-linear (e.g. ogive) scaling of the data values (see further Tobler, 1973 and Cromley and Ye, 2006).

Table 4‑5 provides details of a number of univariate classification schemes together with comments on their use. Most of the main GIS packages provide classification options of the types listed, although some (such as the box and percentile methods) are only available in a limited number of software tools (e.g. GeoDa). A useful variant of the box method, known as hybrid equal interval, in which the inter-quartile range is itself divided into equal intervals, does not appear to be implemented in mainstream GIS packages. Schemes that take into account spatial contiguity, such as the so-called minimum boundary error method described by Cromley (1996), also do not appear to be readily available as a standard means of classification.

Brewer and Pickle (2002) examined many of these methods, with particular attention being paid to ease of interpretation and comparison of map series, e.g. mapping time series data, for example of disease incidence by health service area over a number of years. They concluded that simple quantile methods were amongst the most effective, and had the great advantage of consistent legends for each map in the series, despite the fact that the class interval values often vary widely. They also found the schemes that resulted in a very large central class (e.g. 40%+) were more difficult to interpret.

In addition to selection of the scheme to use, the number of breaks or intervals and the positioning of these breaks are fundamental decisions. The number of breaks is often selected as an odd value: 5, 7 or 9. With an even number of classes there is no central class, and with a number of classes less than 4 or 5 the level of detail obtained may be too limited. With more than 9 classes gradations may be too fine to distinguish key differences between zones, but this will depend a great deal on the data and the purpose of the classification being selected. In a number of cases the GIS package provides linked frequency diagrams with breaks identified and in some cases interactively adjustable. In other cases generation of frequency diagrams should be conducted in advance to help determine the ideal number of classes and type of classification to be used.

Table 4‑5 Selected univariate classification schemes

Classification scheme |
Description/application |
---|---|

Unique values |
Each value is treated separately, for example mapped as a distinct color |

Manual classification |
The analyst specifies the boundaries between classes required as a list, or specifies a lower bound and interval or lower and upper bound plus number of intervals required |

Equal interval, Slice |
The attribute values are divided into n classes with each interval having the same width=Range/n. For raster maps this operation is often called slice |

Defined interval |
A variant of manual and equal interval, in which the user defines each of the intervals required |

Exponential interval |
Intervals are selected so that the number of observations in each successive interval increases (or decreases) exponentially |

Equal count or quantile |
Intervals are selected so that the number of observations in each interval is the same. If each interval contains 25% of the observations the result is known as a quartile classification. Ideally the procedure should indicate the exact numbers assigned to each class, since they will rarely be exactly equal |

Percentile |
Percentile plots are a variant of equal count or quantile plots. In the standard version equal percentages (percentiles) are included in each class. In GeoDa’s implementation of percentile plots (specifically designed for exploratory spatial data analysis, ESDA) unequal numbers are assigned to provide classes that contain 6 intervals: <=1%, 1% to <10%, 10% to <50%, 50% to <90%, 90% to <99% and >=99% |

Natural breaks/Jenks |
Widely used within GIS packages, these are forms of variance-minimization classification. Breaks are typically uneven, and are selected to separate values where large changes in value occur. May be significantly affected by the number of classes selected and tends to have unusual class boundaries. Typically the method applied is due to Jenks, as described in Jenks and Caspall (1971), which in turn follows Fisher (1958). See Figure 4‑22 for more details. Unsuitable for map comparisons |

Standard deviation |
The mean and standard deviation of the attribute values are calculated, and values classified according to their deviation from the mean (z-transform). The transformed values are then mapped, usually at intervals of 1.0 or 0.5 standard deviations. Note that this often results in no central class, only classes either side of the mean and the number of classes is then even. SD classifications in which there is a central class (defined as the mean value +/-0.5SD) with additional classes at +/- 1SD intervals beyond this central class, are also used |

Box |
A variant of quartile classification designed to highlight outliers, due to Tukey (1977, Section 2C). Typically six classes are defined, these being the 4 quartiles, plus two further classifications based on outliers. These outliers are defined as being data items (if any) that are more than 1.5 times the inter-quartile range (IQR) from the median. An even more restrictive set is defined by 3.0 times the IQR. A slightly different formulation is sometimes used to determine these box ends or hinge values. Box plots (see Section 5.2.2, Outlier detection) are implemented in GeoDa and STARS, but are not generally found in mainstream GIS software. They are commonly implemented in statistics packages, for example the MATLab Statistics Toolbox |

For data with a very large range of values, smoothly varying across a range, a graduated set of classes and coloring may be entirely appropriate. Such classification schemes are common with field-like data and raster files. Continuous graded shading of thematic maps is also possible, with some packages, such as Manifold’s Thematic Formatting facilities providing a wide range of options that reduce or remove the dependence on formal class boundaries. The Jenks method, as implemented in a number of GIS packages such as ArcGIS, is not always well-documented so a brief description of the algorithm is provided in Figure 4‑22. This description also provides an initial model for some of the multivariate methods described in subsection 4.2.12, Classification and clustering.

Figure 4‑22 Jenks Natural Breaks algorithm

Step 1: The user selects the attribute, x, to be classified and specifies the number of classes required, k Step 2: A set of k‑1 random or uniform values are generated in the range [min{x},max{x}]. These are used as initial class boundaries Step 3: The mean values for each initial class are computed and the sum of squared deviations of class members from the mean values is computed. The total sum of squared deviations (TSSD) is recorded Step 4: Individual values in each class are then systematically assigned to adjacent classes by adjusting the class boundaries to see if the TSSD can be reduced. This is an iterative process, which ends when improvement in TSSD falls below a threshold level, i.e. when the within class variance is as small as possible and between class variance is as large as possible. True optimization is not assured. The entire process can be optionally repeated from Step 1 or 2 and TSSD values compared |

Positioning of breaks may be pre-determined by the classification procedure (e.g. Jenks Natural breaks), but these are often manually adjustable for some of the schemes provided — this is particularly useful if specific values or intervals are preferred, such as whole numbers or some convenient values such as 1000s, or if comparisons are to be made across a series of maps. In some instances options are provided for dealing with zero-valued and/or missing data prior to classification, but if not provided the data should be inspected in advance and subsets selected prior to classification where necessary. Some authors recommend that maps with large numbers of zero or missing data values should have such regions identified in a class of their own.

Multivariate classification and clustering

Harvey (1969, p.346) describes the key steps of multivariate classification as follows:

•quantitative analysis of the inter-relationships among the attributes or among the objects

•transformation or reduction of the correlations to a geometric structure with known properties (usually Euclidean)

•grouping or clustering of the objects or attributes on the basis of the distance measured in this transformed space, and once the classes have been firmly identified

•the development of rules for assigning phenomena to classes

Important additional steps to those listed by Harvey are in-process and post-process analysis of the quality of classification achieved. In-process quality analysis attempts to use multiple classification approaches (typically soft classifiers ― see further, below) to evaluate the robustness and strength of the classes that have been defined. Alternatively or additionally, random samples of classified data (e.g. pixels) are taken and compared with the ground truth, by field survey (using new or previously collected data).

Classification of datasets based purely on attribute information (i.e. ignoring spatial relationships) can readily be conducted using general purpose statistical and mathematical toolsets. Some techniques, such as clustering based on multivariate analysis of variance, require that input datasets be Normally distributed, but many methods make no such assumptions. Most statistical and mathematical libraries provide a range of tools to facilitate classification and assignment. These typically include:

•facilities that can be used to reduce the complexity (dimensionality) of the source datasets, such as factor analysis, principal components analysis and multidimensional scaling

•facilities to identify clusters within the dataset, either on a single level for a fixed number of clusters (e.g. K-means clustering) or on a hierarchical basis (typically techniques that rely on some form of linkage function, usually based on a function of the simple n-dimensional Euclidean distance between points within separate clusters); and

•facilities for optimally assigning new observations to existing classes (e.g. linear discriminant analysis)

Before we examine the application of such methods to the analysis of multi-band image data (the main application area for such methods within GIS and remotely-sensed image processing packages), we first summarize the K-means and linear discriminant analysis procedures mentioned above.

K-means clustering attempts to partition a multivariate dataset into K distinct (non-overlapping) clusters such that points within a cluster are as close as possible in multi-dimensional space, and as far away as possible from points in other clusters. The input dataset can be viewed as a set of objects with an associated set of n (typically real-valued) measures or attributes. Each object can then be viewed as a single point or vector x={x1,x2,x3…,xn} in n‑dimensional space. The cluster procedure is then as follows:

•The method starts with a set of K initial cluster centers. These might be assigned: (i) as K random locations in n-space within the observed data ranges; (ii) as K uniformly sited locations within the observed data ranges; (iii) as a user-defined set of K locations; (iv) by selecting K datapoints at random from the observed set as cluster centers; or (v) as a set of K locations obtained by clustering a small subset of the data using random or uniform allocation. A minimum separation distance for starting points may be specified

•The distance between every point and each of the K means is computed, based on some prior defined metric (often using squared Euclidean, L22, or City block, L1, as these are both fast to compute ― see further, Section 4.4.1, Metrics). The Idrisi software package, for example, uses Euclidean distances and the general expression:

where dik defines the distance from the ith pixel in an image to the kth centroid, Xin is the n-element (band) vector of the ith pixel, and Ckn is the n-element vector of the kth centroid

•Points are assigned to the nearest center (i.e. it is a locally optimizing greedy heuristic; see further, Section 7.2.2, Heuristic and meta-heuristic algorithms). Any entirely empty clusters may be discarded and (optionally) a new start point added

•The location of the center of each cluster is then re-computed based on the set of points allocated to that center, and the previous step is then repeated. This sequence is iterated until no further reassignments occur, or a preset number of iterations have been completed, or no cluster center is moved by more than a pre-specified small amount. The total distance (DSUM) of all points to the centers of the K clusters to which they have been assigned is calculated

•Each point is then re-examined in turn (Phase 2 of the process) and checked to see if DSUM is reduced if the point is assigned to another cluster. If DSUM is reduced the point is reassigned to the cluster that results in the maximum overall reduction

•Stopping criteria may be determined by setting a limit to the number of iterations of the process (e.g. 50), or the level of change in DSUM, or the number or percentage of data points that continue to be re-allocated (e.g. 1%), or a combination of these options

•Special consideration may be given to the re-allocation of data items where the cluster membership is very small (e.g. contain less than 1% of the data). The option to remove low membership clusters and re-allocate their members (i.e. consolidating the clusters to provide a smaller number of larger clusters) is quite common

•The entire process may then be repeated with a different choice of starting points. This may result in an overall reduction in the objective function, DSUM, since the algorithm is not guaranteed to provide an optimal partitioning

•Generic clustering algorithms, such as K-means, may result in many spatially isolated pixels or zones, which may be members of a well-represented class (>>1%). In some instances a smoothing filter can consolidate such occurrences with their surrounding zones, but must be applied with care since the resultant classification will knowingly contain mixtures

•It is often also the practice to assign different weights to clusters when finalizing the classes, in accordance with a priori interpretation of the resulting outcomes. The benefits of so doing should be set out in a clear and transparent manner in order to aid interpretation of the results by third parties

Linear Discriminant Analysis (LDA), as applied within Idrisi, assumes we have P classes and P linear discriminant functions. These functions, which are rather like linear regression equations, are computed by analysis of training site data comprised of m image bands. Having computed the weights for each of these linear functions, a set of P weighted sums is then calculated for each pixel in the target image and the sum that has the highest value or score, Si, is taken as the class to which the pixel is to be assigned. The sum is of the form:

where ci is a constant for class i, xj is the observed pixel value in class j, and wi,j is the pre-computed weight applying to band j and class i. Hence with m=7 spectral bands there would be a set of 7+1 associated weights for each discriminant function. If there were P=5 such functions (i.e. 5 classes), there would be a matrix, W, of 40 weight values, and each pixel would be assigned to one of the 5 classes.

Most GIS packages, such as such as TNTMips, Idrisi, GRASS and ArcGIS Spatial Analyst, include a range of multivariate classification procedures. As noted earlier, these are principally for image classification and pixel assignment, as are those provided within the specialized image processing suites, ERDAS and ENVI. The most commonly provided include K-means, ISODATA, and Maximum Likelihood procedures. More details, with specific reference to multi-band and hyperspectral image analysis, are provided in Sections 4.2.12, Classification and clustering, below. Details of image classification techniques that use artificial neural network (ANN) methods are provided in Section 8.3, Artificial Neural Networks (ANN). Details of many of the traditional methods described may be found in Tou and Gonzales (1974, Chapter 3), Ball and Hall (1965), Richards (1986) and Burrough and McDonnell (1998, 2015).

Multi-band image classification

In this subsection we provide a brief description of many of the multi-band classifier procedures generally available, with special reference to those provided within one of the main raster-focused GIS packages, Idrisi. Other packages, notably ENVI, also provide many of these facilities.

Within GIS and remotely-sensed image processing software multivariate classification methods are almost exclusively applied to image datasets — typically multi-band image files. The objective is to identify distinct areas or features (collections of pixels or cells) such as forest, water, grassland and buildings and assign all occurrences of such features to distinct classes. This is either performed semi-automatically (supervised, requiring training site data) or automatically (unsupervised).

Supervised classification (S) involves the manual analysis of selected images to identify features of interest and assign polygonal regions to user-defined classes. The statistical characteristics of these regions within and across rasters, sometimes described as signatures, can then be used to guide subsequent classification. Special toolsets are generally provided for signature development. It may also be necessary to normalize the datasets (to ensure the range of values in each band is similar) and/or to (dimensionally) transform the datasets in some way before proceeding to the formal clustering and assignment processes (for example using techniques such as principal components analysis, PCA, or factor analysis, FA – see further, CATMOGs 7 and 8).

Unsupervised classification (U) either does not require training data to be specified in advance, or performs a form of automated selection from the input images. With these techniques it is often helpful to specify more clusters than you would ultimately like or expect and then to combine some of those produced if they seem essentially the same in terms of your own objectives rather than purely computational factors. For example separate clusters may be produced for deciduous and coniferous vegetation cover, but if the objective is to classify areas as woodland without separating these by type, post classification combination may be effective. Most of the more sophisticated raster-based GIS packages provide such facilities, for example TNTMips (see the TNTMips Image Classification Guide) and Idrisi (see the Idrisi Guide to GIS and Image Processing), as well as the more generic ArcGIS Spatial Analyst toolset and the specialized image processing suites, ERDAS and ENVI. Several of these packages refer to Richards (1986) as a primary source for many of the standard procedures they adopt for classification and interested readers are recommended to refer to this volume for more information.

Each pixel of a multi-band image to be classified can be regarded as being comprised of a vector x={x1,x2,x3…,xm} of data values, one per band. Typically the data are real-valued and positive. In so-called remotely-sensed multi-band image processing the number of bands is generally not very large, typically 7 or fewer. In such multi-band images it is often the case that most of the useful information is contained in just two or three bands, for example the red, near infra-red and middle infra-red bands, and in some instances these may be used rather than the full set. Hyperspectral images, on the other hand, are comprised of a very large number of finely divided spectral bands, from 10s to 100s of such bands. A third type of multi-band image is one in which the images are co-registered but are not all derived from remote-sensing ― some of the bands or raster layers may be derived from other forms of survey and/or be computationally derived (e.g. rasters generated by rasterizing zonal, linear or point vector data). An example involving the analysis of such data using artificial neural network (ANN) methods is provided in Section 8.3.3, Self organizing networks.

Table 4‑6 summarizes the principal set of so-called hard classifiers (H), some of which also provide soft classification as an option. Hard classifiers assign every pixel in the target image to one of n classes, or to a null class (usually coded as 0). Soft classifiers (S) assign pixels to classes with a value that indicates their degree of membership of that class. This may be expressed as a probability value, pk, with the sum over all k=1..P classes being 1 (or 100%), or may be a value that is not regarded as a probability and may not sum to 1 (typically summing to less than 1). Display of soft classified images is often as a series of separate layers, one per class, with the pixels being color- or gray-scale coded to indicate their degree of membership.

The output of soft classifiers can be hardened, in a variety of ways. One approach is to take the output from a soft classification and generate a set of classification maps that provide the first (most likely/highest scoring), second etc. class assignments as separate displays. Another approach is to utilize ancillary data, such as (buffered) GIS vector data that is overlaid on the classified results. For example, this may assist in identifying classes that are or are not built-up land (i.e. increasing the strength of evidence for specific classes) based on the known distance from roads. Yet another approach is to utilize prior geostatistical analysis of higher resolution remotely-sensed imagery obtained from a smaller area, or field survey data, to provide enhanced guidance for the classification process.

Most of the classification procedures described in Table 4‑6 are non-hierarchical, i.e. they classify pixels in the target image into distinct classes that are not grouped into levels. Some do provide a tree-like structure (e.g. CTA). In this case, conceptually one might imagine high level groups such as Built, Wetland, Forest, Grass, and Rock. Forest, for example, might be divided into deciduous and evergreen. Deciduous might be further divided into species-type subgroups. In Table 4‑6 the H/S column identifies whether the classification is hard or soft, with some classifiers providing both options. The S/U column identifies whether the procedure is essentially supervised (S) or unsupervised (U). In addition, and not described in detail here, are mixture analyses, in which pixels may be allocated to one or more than one class simultaneously. For example, a pixel might be assigned to two different kinds of woodland or two different kinds of soil type, thus regarded as being of explicitly mixed rather than single class membership. Mixture assignments are usually on an hierarchical basis, e.g. with 3 classes [A], [B] and [C] the 7 possible mixtures would be [A], [B], [C], [A,B], [A,C], [B,C], and [A,B,C]. Clearly there may be a very large number of mixture class combinations, so this approach must be applied with caution.

Table 4‑6 Image classification facilities — Selected classifiers

Classifier (Idrisi function shown in CAPS) |
S/U |
H/S |
Description |
---|---|---|---|

Simple one-pass clustering |
U |
H |
A technique that generates up to P clusters by assigning each input cell to the nearest cluster if its Euclidean distance is less than a given threshold. If not the cell becomes a new cluster center. It principal merit is speed, but its quality of assignment may not be acceptable (see also, PIPED, below) |

Parallelepiped |
S |
H |
Based on a set of lower and upper threshold values determined for a signature on each band. To be assigned to a particular class, a pixel must exhibit values within this range (absolute limits or standard deviation, for training dataset) for every band considered. Non-unique assignment with no assignment (Class=0) if a pixel lies outside the threshold for all signatures. Very fast, but generally not recommended for use because the class ‘boxes’ defined by the thresholds typically overlap, meaning that pixel assignment to specific classes is arbitrary in these regions |

Minimum distance to mean |
S |
H |
Essentially the same as simple one-pass clustering but cluster centers are pre-determined by analysis of a training dataset, using the mean values on each band for a signature. Pixels are assigned to the class with the mean closest to the value of that pixel. Applied when the number of pixels used to define signatures is very small or when training sites are not well defined. No assignment (Class=0) is made if a pixel lies outside a maximum search distance set by the user for all signatures. Data may be raw values or pre-normalized. Use in preference to MAXLIKE if training sites are not well defined in terms of classes. Fast, often performing well, but does not account for the variability between classes since it uses mean values only ― MAXLIKE (below) is generally a preferable approach (see also ISOCLUST) |

Maximum likelihood (MAXLIKE) |
S |
H |
A method that uses statistical analysis (variance and covariance) of a training dataset (class signatures) whose contents are assumed to be Normally distributed (prior transformation of the input dataset may therefore be advisable). It seeks to determine the probability (or likelihood) that a cell should be assigned to a particular cluster, with assignment being based on the computed Maximum Likelihood value. MAXLIKE operates in a similar manner to MINDIST but takes account of correlation between bands. Pixels are assigned to the most likely class based on a comparison of the posterior probability that it belongs to, each of the signatures being considered (i.e. Bayesian). Prior probabilities may be set in various ways, including uniform (all classes equally likely) or using separate knowledge of some or all classes, e.g. 30% is expected to be woodland, 20% grassland, etc, where the labels woodland and grassland correspond to specific signatures. Unique assignment ― no assignment (Class=0) is made if a pixel lies outside a pre-specified probability level (e.g. 1%, 5%, i.e. less than x% likelihood of belonging to any of the signature classes). Requires number of pixels >10 times number of bands. Limitations (Idrisi): number of bands <65; number of signatures<256. Quality heavily affected by homogeneity of training sites. Relatively fast (see also ISOCLUST) |

(Stepwise) Linear Discriminant (FISHER) |
S |
H |
This is essentially a Linear Discriminant Analysis (LDA) method, which attempts to compute linear functions of the dataset variables that best explain or discriminate between values in a training dataset. With the stepwise variant new linear functions are added incrementally, orthogonal to each other, and then these functions are used to assign all data points to the classes. The criterion function minimized in such methods is usually Mahalanobis distance, or the D2 function. Linear Discriminant Analysis of the training site data are used to form a set of linear functions that express the degree of support for each class. The assigned class for each pixel is then that class which receives the highest support after evaluation of all functions. In the Idrisi implementation there are as many classification functions as there are classes. Each function allows classification scores to be computed for each pixel for each class. Limitations (Idrisi): number of bands <65; number of signatures<256. Unique assignment. All pixels are assigned to classes. Relatively fast |

K-nearest neighbors |
S |
H,S |
In this procedure the Euclidean distance is calculated for each pixel in an input image to all sampled training pixels in all classes. Then, for the specified k, only those k-nearest neighbors are examined in determining an image pixel’s class or degree of membership to a class. For the hard classification output, the class that dominates the k-nearest neighbors is assigned to that pixel. For the soft classification output, the proportion for each category among the k-nearest neighbors is evaluated. For each class an image is produced with each pixel assigned its respective proportion. Variable speed of operation, depending on parameters, especially size of k-value |

Histogram clustering |
U |
H |
CLUSTER uses a histogram peak technique of cluster analysis. This is equivalent to looking for the peaks in a one-dimensional histogram, where a peak is defined as a value with a greater frequency than its neighbors on either side (see Figure 4‑23). Once the peaks have been identified, all possible values are assigned to the nearest peak and the divisions between classes fall at the midpoints between peaks. In Idrisi a maximum of 7 image bands can be used as input, and pixels are assigned to a maximum of 255 classes. A 1-dimensional to 7-dimensional histogram is used to find the peaks. A peak is a class where the frequency is higher than all of its non-diagonal neighbors (broad level), or higher than all but one of its non-diagonal neighbors (fine level). Input images are pre-processed into 256 gray-scale values, generally with the tails of the histograms cut off (e.g. 2.5% at each end). CLUSTER can be used to develop class signatures that may then be applied to supervised classification systems like MAXLIKE. Fast |

ISOdata/ISOcluster (Iterative Self-Organizing) |
U |
H |
Similar to the K-means procedure but at each iteration the various clusters are examined to see if they would benefit from being combined or split, based on a number of criteria: (i) combination — if two cluster centers are closer than a pre-defined tolerance they are combined and a new mean calculated as the cluster center; (ii) if the number of members in a cluster is below a given level (e.g. 20) the cluster is discarded and the members re-assigned to the closest cluster; and (iii) separation — if the number of members, or the standard deviation, or the average distance from the cluster center exceed pre-defined values then the cluster may be split. ISOCLUST is an automated procedure that combines the CLUSTER operation with the MAXLIKE cluster assignment process, applied iteratively ― both are described above. It is similar to the ISODATA procedure (see Dunn, 1973). CLUSTER is used to generate initial cluster centers (seed locations) and these define the signatures which are then used by MAXLIKE to determine cluster assignments. Restrictions are as per CLUSTER and MAXLIKE. The procedure is quite slow, depending on the complexity of the images and the number of clusters, but is claimed to produce very strong cluster assignment. |

K-means |
U |
H |
The classical K-means algorithm has been described earlier. Limitations (Idrisi): n<65 image bands and K<256 clusters. Very dependent on selection of number of clusters and their initial center locations (seed locations). This is a heuristic, greedy algorithm for minimizing SSE (Sum of Squared Errors) hence it may not converge to a global optimum. Three options are provided in Idrisi for seed locations: random partitioning ― each pixel is assigned at random to one of K clusters and the cluster centers are computed from the assigned points; random centers ― randomly selects K pixels as the initial centers and assigns other pixels to these; and diagonal values ― the range of pixel data values in n-space is computed and values assigned at even spacing along this [min] to [max] n-vector. Relatively fast |

Fuzzy c-means (FCM) |
U |
H/S |
Similar to the K-means procedure, originally developed by Dunn (1973) but uses weighted distances rather than unweighted distances. Weights are computed from prior analysis of training data for a specified number of classes. These cluster centers then define the classes and all cells are assigned a membership weight for each cluster. The process then proceeds as for K-means but with distances weighted by the prior assigned membership coefficients (see also, Section 4.2.13, Boundaries and zone membership) |

Minimum distribution angle (MDA) |
U |
H |
An iterative procedure similar to K-means but instead of computing the distance from points to selected centers this method treats cell centers and data points as directed vectors from the origin. The angle between the data point and the cluster center vector provides a measure of similarity of attribute mix (ignoring magnitude). This concept is similar to considering mixes of red and blue paint to produce purple. It is the proportions that matter rather than the amounts of paint used. See also, Spectral angle clustering in this section |

Multi-level perceptron (MLP) |
S/U |
H/S |
A neural network technique. Described in detail in Section 8.3.1 et seq, Introduction to artificial neural networks, with a worked example. Fairly slow |

Self organizing maps (SOM) |
S/U |
H |
A neural network technique based on Kohonen (1990). Described in detail in Section 8.3.3, Self organizing networks, et seq. With the unsupervised model variant K-means is generally used to locate the initial clusters. Moderate speed of clustering |

Fuzzy ARTMAP |
S/U |
H |
ART stands for Adaptive Resonance Theory, a form of neural network model. Fuzzy ART is a clustering algorithm that operates on vectors with fuzzy input patterns (real numbers between 0.0 and 1.0) and incorporates an incremental learning approach which allows it to learn continuously (remembering previous learned states). The MAP part of the name refers to the use of an additional MAP layer in the neural network model when used in supervised mode. For more details see Mannan and Roy (1998), or Fischer (2006, Ch.11) |

Classified tree analysis |
S |
H/S |
A univariate hierarchical data splitting procedure, that progressively divides the training dataset pixels into two classes based on a splitting rule, and then further subdivides these two classes (i.e. a binary tree structure). Division (splitting) may be achieved using the entropy statistic or similar measures (e.g. gain ratio, Gini statistic). See further, Zambon et al. (2006) and Idrisi’s focus paper describing the technique |

Figure 4‑23 SPOT Band 1 image histogram — distinct peaks highlighted (CLUSTER)

Uncertainty and image processing

Uncertainty is an intrinsic feature of remotely-sensed image data for several reasons, including:

Resolution: the resolution of a remotely-sensed image defines the area represented by a single pixel. If a pixel represents a 100m by 100m region it is almost certain that its contents will be of mixed landcover types, and hence will generate data that reflects this mixture. Specific techniques (generally based on soft classifiers such as Idrisi’s UNMIX, BAYCLASS and FUZCLASS functions) are available for analyzing within-cell spectral variation, often described as sub-pixel classification

Temporal variables: time of day and season affect both the landcover (e.g. growth stages) and illumination patterns (e.g. lighting of slopes, vegetation surfaces, moisture levels) resulting in additional variations within and between image datasets and bands

Spectral response: remote-sensing may not measure the spectral response band or bands on which the particular surface provides maximum response. The response data may therefore be weak or mixed/easily confused with other surface responses

Soft classification procedures can provide an image displaying the degree of uncertainty of the classification. For example, for every soft classification image one pixel or set of contiguous pixels (region) may have the same or very similar degree of class membership, hence no one class is clearly favored. This may be contrasted with a hard classification process in which the winning class, i.e. the one with slightly higher probability or degree of membership, would be selected in preference to any other. An uncertainty map illustrates the degree to which one class (from a total of P classes) stands out from the rest for each pixel in the image. Definition of uncertainty could be based on Shannon’s information statistic (entropy) statistic or variations thereof, or on an application-specific measure. Idrisi adopts the latter approach, using an uncertainty measure, U, of the form:

Here, m is the maximum degree of membership recorded for the pixel in question, and

is the sum of the degree of membership for that pixel. Thus, with P=10 classes and m=0.3 and s=1, say, we have U=1‑0.22=0.78 whilst with m=0.1 we have U=1 (complete uncertainty) and with m=1 U=0 (complete certainty).

Hyperspectral image classification

Hyperspectral images are comprised of a large number of finely divided spectral bands, from 10s to 100s of such bands. The following analysis utilizes a hyperspectral dataset produced by the NASA AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) project. As noted previously, hyperspectral imagery is become increasingly available and is essentially the same as multi-band imagery but with far more bands provided. It has particular application in geological work, both terrestrial and for planetary/lunar data analysis.

The dataset we use in this subsection is comprised of 224 contiguous spectral bands in the range 400nm (nanometers) to approximately 2500 nm at 10 nm intervals. The raster image files cover 512x614 cells of 20m squares and relate to a mining region (Cuprite) in western Nevada, USA (hence c.10km by 12km in area). This dataset is available for download via the data section of the AVIRIS website and from the TNTMips web site as one of their tutorial datasets. It is discussed in more detail at the USGS spectral lab site: speclab.cr.usgs.gov, from which reference spectra may be obtained. The layers of raster datasets of this type are spatially matched and highly correlated. This means that storage of such data as 224 raster objects would be extremely inefficient and hence of much more compact (compressed) 3D object variant is stored by TNTMips which they describe as a hypercube.

Figure 4‑24 shows two alternative 2D representations of the region, where 3 spectral layers have been chosen to represent the red/green/blue (RGB) components of the image for display purposes. This provides a strong visual impression of the data, but each image is based on just 3 of the 224 layers and depends for its interpretation upon the choice of the three spectral bands and color scheme applied. There are a number of alternative representations for datasets of this type.

The 3D hypercube described above can be displayed and layers selected systematically or as desired (Figure 4‑25). This is essentially a visualization tool that enables closer inspection of the data and highlights the correlated nature of the information, but does not provide a direct route to analysis. Several software packages support analysis of this type of dataset, including Idrisi (the HYPERSAM and HYPERMIN facilities, and the new Earth Trends Modeler) and the TNTMips spectral angle mapping facility, which we have used in the current example. Each pixel of the 3D hypercube has a spectral value for the ordered set of frequencies (i.e. across the depth of the cube). This ordered set can be viewed as a graph or spectrogram and can be compared with reference spectrograms (e.g. ground survey reference samples) or using laboratory sets obtained from prior research for different surface materials and minerals. Similarity of (normalized) spectral plots to the image spectral plot provides a means of classifying individual pixels and hence the region as a whole.

Figure 4‑24 2D map of Cuprite mining district, Western Nevada, USA

Bands: R=207, G=123, B=36 |
Bands: R=12, G=6, B=1 |

Figure 4‑25 3D hypercube visualization of Cuprite mining district, Western Nevada, USA

Laboratory-generated reference spectra are available from various sources, including the USGS spectral analysis laboratory cited above. It should be noted, however, that laboratory-generated reference spectra will differ from remotely-sensed spectra owing to many factors, including non-uniform atmospheric distortions. Special facilities (e.g. the hyperspectral PROFILE facility in Idrisi) are available to identify and, where necessary, eliminate those bands that are subject to particularly high atmospheric distortion. As noted above, the analysis of hyperspectral images can be undertaken by regarding each cell or pixel as a single point in 224-dimensional space. The recorded value for each spectral band is then the coordinate value for this pixel for the selected dimension. The single point in n-space can be seen as a vector from the spectral origin, 0, with the full pixel set (512x614 pixels) being seen as clouds of points in n-space. The direction of these vectors is more important than their magnitude, since the latter are affected by average brightness. Simple spectral angle mapping (SAM) consists of comparing a known (reference) spectrum, which has its own characteristic m-dimensional vector, with those in the sample image. Pixels that have an angular proximity to the reference vector below a user-specified value are assigned to that class (cf. Table 4‑6, minimum distribution angle method). Every pixel in the source hypercube can be compared with the reference vectors in m-space and the angle between the pairs of vectors computed. The resulting set of scalar values can then be mapped as a form of “angular distance” raster (fuzzy, or soft class assignment) or just those values that are below a given angular distance threshold can be mapped (absolute, or hard class assignment). Figure 4‑26 shows an absolute (hard) class assignment for a single spectrum overlaid on Figure 4‑24B above (blue area).

Figure 4‑26 Single class assignment from spectral angle analysis