Once developed, the ABM computer program must be verified by checking that the model behaves as expected, a process often referred to as internal validation (or inner validity). Whether the model itself is an accurate representation of the real world is a different type of validity (see below). Achieving inner validity is harder than it might seem. For instance, hypotheses about the model’s output can be tested under a range of input parameter settings; the model might also be examined under an extreme situation in which the outcome is easily predictable. Because such hypotheses are derived from the conceptual model design, the model should reproduce the expected results. However, it is difficult for a programmer to know whether unexpected outcomes reflect a mistake in the computer program (a ‘bug’), logical errors in the model, or a surprising consequence of the model itself (Gilbert and Terna, 1999).
This predicament is compounded by the fact that complex systems often produce emergent and counter-intuitive results. A modeler can guard against programming errors by adopting ‘unit tests’ while developing a model: re-running the tests after each modification of the code checks that a bug has not been introduced. Nevertheless, a modeler must still determine whether unexpected results arise from errors in the model logic or are simply a feature of the system being modeled. These difficulties of verification are further compounded by the fact that most simulations rely on (pseudo-)random numbers to generate the effects of unmeasured variables and random choices; repeated runs can therefore be expected to produce different outcomes. Fortunately, one of the main advantages of ABMs is that they provide a natural method for describing and simulating a real-world system, which helps keep the model logic simple. Even so, it is not uncommon to spend more time confirming that a model has been programmed correctly than programming the model itself.
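As a minimal sketch of these two safeguards, consider a toy infection model (hypothetical; `run_epidemic` and its parameters are invented for illustration). Unit tests can check the model against extreme parameter settings whose outcomes are easy to predict, and a fixed seed makes pseudo-random runs reproducible:

```python
import random
import unittest

def run_epidemic(n_agents=100, infect_prob=0.5, steps=10, seed=0):
    """Toy agent model (hypothetical): each step, each susceptible agent
    becomes infected with probability infect_prob; returns final count."""
    rng = random.Random(seed)          # fixed seed -> reproducible runs
    infected = [False] * n_agents
    for _ in range(steps):
        for i in range(n_agents):
            if not infected[i] and rng.random() < infect_prob:
                infected[i] = True
    return sum(infected)

class TestEpidemic(unittest.TestCase):
    def test_extreme_low(self):
        # extreme case: zero infection probability -> nobody infected
        self.assertEqual(run_epidemic(infect_prob=0.0), 0)

    def test_extreme_high(self):
        # extreme case: certain infection -> everyone infected
        self.assertEqual(run_epidemic(infect_prob=1.0), 100)

    def test_reproducible(self):
        # same seed -> identical outcome despite pseudo-randomness
        self.assertEqual(run_epidemic(seed=42), run_epidemic(seed=42))
```

Running `python -m unittest` against a file containing this code executes the checks; re-running them after each code change confirms that no bug has been introduced, which is precisely the unit-testing discipline described above.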
The most thorough way of verifying a model is to re-implement it using a different programming language and, ideally, a different ABM toolkit, a process sometimes referred to as ‘docking’ or ‘alignment’ (Axtell et al., 1996). Although this method will never attain the status of a proof, it helps the modeler become more confident in the veracity of the model results (Hales et al., 2003). Docking is not always practical (owing to time or resource constraints) or feasible (it may require a modeler to learn a new programming language or how to develop their model in an alternative system). However, replication is one of the hallmarks of the scientific method, and it is an important means of confirming whether the claimed results of a simulation are reliable (i.e. can be reproduced by somebody else starting from scratch). Axelrod (2006) notes that without this confirmation it is possible that specific published results are in error as a result of programming mistakes, misrepresentation of the system being modeled, or errors in analyzing the simulation results. Furthermore, replication is required to determine whether a new model can subsume a previous one. If it is not practical or feasible for a modeler to dock their model, it is critical that a thorough description of the model is provided so that others can attempt replication. Unfortunately, the latter stipulation has rarely been met to date; a noteworthy exception is Railsback et al. (2006). Documentation should include information on the source code for running the model, how to run the program, and how to interpret the output. Attempts are being made to devise ontologies and protocols for model comparison, such as the ODD (overview, design concepts, details) protocol proposed and demonstrated by Grimm et al. (2006). A worked example applied to three land use and land cover models can be seen in Polhill et al. (2008).
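A docking exercise might look like the following sketch: two independently written implementations of the same toy random-walk model (both hypothetical, standing in for versions built in different languages or toolkits) are run repeatedly with different random seeds and compared on a summary statistic rather than on exact outputs, since pseudo-random runs cannot be expected to match step for step:

```python
import random
import statistics

def walk_loop(steps, rng):
    """Implementation A: explicit loop over unit moves."""
    pos = 0
    for _ in range(steps):
        pos += 1 if rng.random() < 0.5 else -1
    return pos

def walk_sum(steps, rng):
    """Implementation B: the same conceptual model, independently
    re-implemented (here as a generator expression)."""
    return sum(1 if rng.random() < 0.5 else -1 for _ in range(steps))

def mean_abs_displacement(model, runs=2000, steps=100, seed=1):
    """Summary statistic compared across the two implementations."""
    rng = random.Random(seed)
    return statistics.mean(abs(model(steps, rng)) for _ in range(runs))

a = mean_abs_displacement(walk_loop, seed=1)
b = mean_abs_displacement(walk_sum, seed=2)
# docking: the independently built versions should agree statistically
assert abs(a - b) < 1.0
```

A statistical tolerance of this kind is the usual criterion for alignment; exact agreement is only expected when both implementations consume an identical pseudo-random stream.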
Fortunately, the use of standardized ABM toolkits and programming languages facilitates replication. Unfortunately, it is often difficult for a modeler to provide a complete description of their model within the word limit of publications (digital media excepted), especially when addressing an interdisciplinary audience. Carley (1996) therefore stresses that computer models and their output should be described and presented as separate publications.
After a model has been verified, the final stages are to calibrate and validate it. Calibration entails setting the model structure and parameter values so that they accurately reflect the real-world system. It typically requires data on the micro-level processes that the agent-based model represents; these data can be acquired through various means, such as surveys, statistical analysis of empirical data, or experiments designed to elicit decision-making strategies and factors. Calibration occurs in stages, usually repeated iteratively, until the outcomes of the model fit the real-world data (within a reasonable tolerance). Calibration is therefore useful for assessing the model’s ability to simulate the real-world system (i.e. showing that the model can generate results that match the real-world data). If the model output cannot be fitted to the real-world data, the modeler may need to re-program aspects of the model (e.g. the rules dictating agent behavior and interaction); thus, calibration also aids the verification process. It should be recognized that the required level of correspondence between the model and the real-world data depends in part on the purpose of the model. The modeler must also have confidence in the accuracy of the real-world data (e.g. ensuring that the data do not represent only extreme or improbable situations). Finally, the impulse to calibrate can lead to a model being over-fitted: the model fits the real-world data but is insufficiently general to represent a diverse range of system outcomes or to be applied to other systems.
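The iterative fitting described above can be sketched as a simple grid search over a single parameter; the model, its `adoption_rate` parameter, and the observed value are all invented for illustration:

```python
import random

def model_output(adoption_rate, n_agents=1000, seed=0):
    """Hypothetical ABM summary statistic: the fraction of agents who
    adopt a behavior, given a candidate parameter value."""
    rng = random.Random(seed)
    return sum(rng.random() < adoption_rate for _ in range(n_agents)) / n_agents

observed = 0.37  # stand-in for a real-world calibration target

# grid search: keep the parameter value whose output best fits the data
candidates = [i / 100 for i in range(101)]
best = min(candidates, key=lambda p: abs(model_output(p) - observed))
print(best, model_output(best))  # fit should lie within a small tolerance
```

In practice the loop is richer (many parameters, stochastic replications, goodness-of-fit statistics), but the structure is the same: adjust, simulate, compare against the data, and stop once the discrepancy falls within tolerance, while remaining alert to the over-fitting risk noted above.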
It is often argued that a model with sufficient parameters can always be adjusted until the real-world data are matched; modelers should therefore be wary that calibration does not guarantee the validity of a model. For many agent-based models, however, this criticism is less applicable: models that represent processes with rules (e.g. the interaction and behavior of agents) rather than parameterized equations often have few, if any, free parameters. Conversely, there is no guarantee that a model with a large number of rules dictating the interaction and behavior of agents can be configured to generate the observed data.