Model selection in earthquake recurrence relationships: b-value bias in tectonic, volcanic and induced seismological settings
The Gutenberg-Richter (GR) law, log F = a − bm, relates the magnitude of earthquakes (m) to their frequency of occurrence (F) via two constants, a and b, which describe the seismicity rate and the ratio of small to large earthquakes, respectively, in a given region and time period. The b-value is a key parameter in earthquake statistics because it determines the recurrence relationships used in earthquake hazard assessment, and therefore the extrapolated likelihood of extreme events larger than any that have occurred so far in the instrumental or historical record. In practice, b-value estimates are sensitive to the estimates of other parameters that must be determined beforehand, such as the magnitude of completeness of a catalogue. Any estimate of b is also conditional on the assumption that the underlying distribution is the GR law, which may not always be the case. These dependencies can easily bias estimates of the hazard, and ultimately of the risk, at a given location. The aim of this research is to quantify these biases and to understand their potential impact on the seismic hazard, notably in defining earthquake recurrence relationships. Here I test both the GR law and a modified version, denoted MGR, which introduces a gradual taper to the frequencies at large magnitudes. The MGR has the advantage of maintaining a finite rate of energy release in the population, at the expense of an extra parameter representing a characteristic or ‘corner’ magnitude characterising the taper. I also examine a variety of existing methods for estimating the magnitude of completeness, assess how sensitive the b-value is to the amount of data, the dynamic range of magnitudes above the completeness threshold, and the model estimation itself, and then apply these methods to a wide variety of examples of seismicity.
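To illustrate why the b-value matters so much for extrapolation, the following minimal Python sketch evaluates the GR law in cumulative form, anchored at a completeness threshold. The function name, the choice to anchor the rate at the completeness magnitude, and all numerical values are illustrative assumptions and are not taken from the catalogues analysed in this work.

```python
def gr_annual_rate(m, rate_at_mc, b, m_c):
    """Annual rate of events with magnitude >= m under the GR law.

    Assumes the cumulative form N(>=m) = N(>=m_c) * 10**(-b * (m - m_c)),
    where rate_at_mc is the observed annual rate at or above the
    completeness magnitude m_c (hypothetical values in the usage below).
    """
    return rate_at_mc * 10.0 ** (-b * (m - m_c))


if __name__ == "__main__":
    # Hypothetical catalogue: 1000 events per year at or above m_c = 2.
    for b in (1.0, 1.2):
        rate = gr_annual_rate(7.0, rate_at_mc=1000.0, b=b, m_c=2.0)
        print(f"b = {b}: rate of m >= 7 is {rate:.4f} per year")
```

With these hypothetical numbers, raising b from 1.0 to 1.2 lowers the extrapolated rate of m ≥ 7 events by a factor of ten, because the difference in b is multiplied by the five magnitude units separating the threshold from the target magnitude.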
The seismological settings chosen include volcanic and induced earthquakes, both because these have not yet been researched as widely as tectonic seismicity in terms of the potential biases described above, and in order to test the common claim that b-values are systematically higher for volcanic and induced seismicity. This allows a comparative discussion of the extent to which the recurrence relationships are similar to or different from those of tectonic seismicity, and of how parameters in tectonic seismicity may be affected by the nature and scale of the study regions chosen. The analysis is carried out initially on synthetic data sampled from a known underlying distribution, and the lessons learned are then applied to the real data. The results show that induced and volcanic seismicity do not necessarily have systematically higher b-values, i.e. there are not necessarily relatively more small than large events in such settings. Instead, the b-value is often biased to high values, due to a combination of a narrow dynamic range of magnitudes and a small number of events, the application of incorrect methods of model estimation, and the prior assumption of an inappropriate underlying distribution associated with a particular physical process. Furthermore, I show that volcanic settings can have surprisingly low b-values when the data analysed do not cover only specific outbreaks or eruptions, but span a much longer period of time, thereby increasing both the amount of data and the dynamic range of magnitudes. While it is not simple to identify the true underlying model when investigating new, real catalogues, I suggest how alternative hypotheses may be tested before assuming the default GR law in all situations, and provide a protocol for doing so. I also show that the source of the bias in the b-value can be best explained by the convergence of the mean magnitude from below to an asymptotic limit in random samples.
As a direct consequence, the maximum likelihood b-value requires a large amount of data and a wide dynamic range of magnitudes to converge fully with high accuracy and precision. The results are important because they quantify the systematic error in the b-value that may arise in different settings from different aspects of the primary data, its sampling, the prior assumption of the form of the frequency-magnitude distribution, and the methods of analysis in common use at present. In turn, such systematic errors are important because they can have a large effect on forecasting the likelihood of rare, large events, and hence should be considered as a potential source of epistemic uncertainty in seismic hazard analysis.
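The small-sample behaviour of the maximum likelihood b-value can be sketched with a short synthetic experiment in Python. This is a minimal illustration under stated assumptions, not the procedure used in this work: it assumes the estimator takes the standard Aki (1965) form for continuous, unbinned magnitudes, b = log10(e) / (mean(m) − Mc), that the completeness magnitude Mc is known exactly, and that magnitudes follow a pure GR (exponential) distribution. The function names, the seed and all parameter values are hypothetical.

```python
import math
import random


def sample_gr_magnitudes(n, b, m_c, rng):
    """Draw n magnitudes above m_c from a pure GR (exponential) distribution."""
    beta = b * math.log(10.0)  # exponential rate equivalent to the b-value
    return [m_c + rng.expovariate(beta) for _ in range(n)]


def b_value_mle(magnitudes, m_c):
    """Maximum likelihood b-value, assuming continuous magnitudes and known m_c."""
    mean_excess = sum(magnitudes) / len(magnitudes) - m_c
    return math.log10(math.e) / mean_excess


def mean_b_estimate(n, trials=2000, b=1.0, m_c=2.0, seed=0):
    """Average the estimated b-value over many synthetic catalogues of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += b_value_mle(sample_gr_magnitudes(n, b, m_c, rng), m_c)
    return total / trials


if __name__ == "__main__":
    # The true b-value is 1.0 in every synthetic catalogue.
    for n in (10, 50, 1000):
        print(n, round(mean_b_estimate(n), 3))
```

Under these assumptions the estimate is inversely proportional to the sample mean of the magnitude excess, so the average estimated b-value lies above the true value for small catalogues and approaches it as the number of events grows. The sketch does not address binned magnitudes, an uncertain completeness threshold or a tapered distribution, each of which can introduce further bias.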