Bias Correction Tutorial

In this tutorial, we will explore how to work with the topt module from xtremes to apply bias correction techniques in extreme value analysis. Specifically, we will cover:

  • Generating index ranges for sliding and disjoint block methods (I_sb and D_n).

  • Computing exceedances using exceedances.

  • Estimating cluster size probability using hat_pi0.

  • Utilizing the Upsilon, Pi, and Psi functions for bias correction.

  • Solving for bias-corrected parameters with varpi and a1_asy.

Each section includes code examples for clarity.

Step 1: Generating Block Indices

In time series analysis, blocks of data are used to analyze cluster sizes and exceedances. We define two key functions:

I_sb(j, bs): Generate indices for a sliding block

import numpy as np

def I_sb(j, bs):
    return np.arange(j, j + bs)

# Example usage
print(I_sb(3, 5))  # Output: [3, 4, 5, 6, 7]

D_n(n, bs): Generate index pairs for disjoint blocks

def D_n(n, bs):
    idx = np.arange(1, n - bs + 2)
    pairs = []
    for i in idx:
        left_js = idx[idx + bs <= i]
        right_js = idx[idx >= i + bs]
        for j in left_js:
            pairs.append((i, j))
        for j in right_js:
            pairs.append((i, j))
    return pairs

# Example usage
print(D_n(5, 2))  # Output: [(1, 3), (1, 4), (2, 4), (3, 1), (3, 2), (4, 1), (4, 2)]

Step 2: Computing Exceedances

The exceedances function counts values exceeding a given threshold in a block.

def exceedances(data, maxima, bs, i, j, stride='DBM'):
    if stride == 'DBM':
        return np.sum(data[(j-1)*bs:j*bs] > maxima[i-1])
    if stride == 'SBM':
        return np.sum(data[j:j+bs] > maxima[i-1])

# Example usage
data = np.array([1, 3, 5, 2, 6, 4, 7, 9])
maxima = np.array([5, 7])
print(exceedances(data, maxima, bs=2, i=1, j=3, stride='DBM'))  # Output: 1

Step 3: Estimating Cluster Size Probability

The function hat_pi0 estimates the probability of cluster size 1.

import xtremes.topt as topt

def hat_pi0(data, maxima=None, bs=None, stride='DBM'):
    if maxima is not None:
        bs = len(data) // len(maxima)
    elif bs is not None:
        maxima = topt.extract_BM(data, bs, stride=stride)
    else:
        raise ValueError('Either maxima or block size must be provided')
    k = len(maxima)
    if stride == 'DBM':
        s = np.sum([exceedances(data, maxima, bs, i, j, stride='DBM') == 1
                    for i in range(1, 1 + k)
                    for j in np.delete(np.arange(1, 1 + k), i-1)])
        return 4 * s / (k * (k-1))

# Example usage
print(hat_pi0(data, maxima, bs=2, stride='DBM'))

Step 4: Using Bias Correction Functions

We define functions for the Upsilon, Pi, and Psi functions used in bias correction.

from scipy.special import gamma, digamma, polygamma
from scipy.optimize import root_scalar

def Upsilon(x, rho0):
    return rho0 * gamma(x+2) + (1-rho0) * gamma(x+1)

def Upsilon_derivative(x, rho0):
    return rho0 * gamma(x+2) * digamma(x+2) + (1-rho0) * gamma(x+1) * digamma(x+1)

def Pi(x, rho0):
    return 1/x - Upsilon_derivative(x, rho0)/Upsilon(x, rho0) + rho0/2 - np.euler_gamma

def Psi(a, a_true, rho0):
    vp = a / a_true
    term = 1 / vp - Upsilon_derivative(vp, rho0) / Upsilon(vp, rho0) + rho0/2 - np.euler_gamma
    return 2 / a_true * term

Step 5: Solving for Bias-Corrected Parameters

To obtain bias-corrected parameters, we use numerical root-finding methods.

def varpi(rho0):
    sol = root_scalar(Pi, args=(rho0,), bracket=[0.01, 10])
    return sol.root

def a1_asy(a_true, rho0):
    sol = root_scalar(Psi, args=(a_true, rho0), bracket=[1e-2, 100])
    return sol.root

# Example usage
rho0 = 0.5
print(varpi(rho0))  # Computes bias-corrected root

Conclusion

In this tutorial, we explored: - How to generate block indices. - Compute exceedances within blocks. - Estimate cluster size probability using hat_pi0. - Use the Upsilon, Pi, and Psi functions for bias correction. - Solve for bias-corrected parameters using numerical root-finding.

These methods are essential for extreme value analysis in time series data and improving MLE estimates by accounting for bias.