Bias Correction

This module computes bias corrections for the Maximum Likelihood Estimation (MLE) of cluster size distributions in stationary time series. So far, it only works for $t=2$.

Overview

The biascorrection module provides tools for estimating probabilities of cluster sizes, calculating related functions like ( Upsilon ) and ( Pi ), and finding roots for bias correction.

Functions

The I_sb Function

xtremes.biascorrection.I_sb(j, bs)

Generate a range of integers representing indices for a sliding block method.

This function is often used in time series analysis or statistical modeling contexts where data is divided into overlapping or non-overlapping blocks.

Parameters:

jint

The starting index for the block. This typically represents the position in the time series or dataset where the block begins.

bsint

The block size, which determines the number of elements in the block and the range of indices generated.

Returns:

numpy.ndarray

An array of consecutive integers starting from j and ending at j + bs - 1. This array can be used to reference indices of elements within a specific block.

Example:

>>> import numpy as np
>>> I_sb(3, 5)
array([3, 4, 5, 6, 7])

References:

Bücher, A., & Jennessen, T. (2022). Statistical analysis for stationary time series at extreme levels: New estimators for the limiting cluster size distribution. Stochastic Processes and their Applications, 149, 75-106.

The D_n Function

xtremes.biascorrection.D_n(n, bs)

Generate all pairs of indices representing disjoint blocks in a time series.

This function identifies pairs of block indices (i, j) such that the blocks starting at indices i and j, each of size bs, do not overlap. It is useful in time series analysis for constructing disjoint block structures, particularly in statistical estimators for cluster sizes.

Parameters:

nint

The length of the time series or dataset. This determines the range of valid indices for block positions.

bsint

The block size, which specifies the number of elements in each block. Blocks are constructed as contiguous subsets of the time series.

Returns:

list of tuples

A list of tuples (i, j), where i and j are the starting indices of two disjoint blocks. The blocks I_sb(i, bs) and I_sb(j, bs) have no overlap, i.e., their intersection is empty.

Example:

>>> import numpy as np
>>> def I_sb(j, bs):
>>>     return np.arange(j, j + bs)
>>> D_n(5, 2)
[(1, 3), (1, 4), (2, 4), (3, 1), (3, 2), (4, 1), (4, 2)]

Notes:

  • The function uses the helper function I_sb to define the range of indices covered by a block starting at a given index.

  • It checks for disjointness using np.intersect1d, which computes the intersection of two arrays.

  • This function is critical in methods involving disjoint blocks, such as certain statistical estimators that assume independence between blocks.

The exceedances Function

xtremes.biascorrection.exceedances(data, maxima, bs, i, j, stride='DBM')

Calculate the number of exceedances of a given maximum within a specified block. This function computes the number of values in the j-th block of the data that exceed the i-th maximum. The block can be defined using either a disjoint block method (DBM) or a sliding block method (SBM).

Parameters:

dataarray-like

The time series or dataset from which exceedances are calculated.

maximaarray-like

An array of maxima values, where maxima[i-1] is the reference maximum for which exceedances are counted.

bsint

The block size, specifying the number of consecutive elements in each block.

iint

The index (1-based) of the maximum in maxima used as the reference for exceedances.

jint

The index (1-based) of the block in the dataset to examine for exceedances.

stride{‘DBM’, ‘SBM’}, optional

Specifies the block method used: - ‘DBM’ (Disjoint Block Method): The blocks are non-overlapping and start at (j-1)*bs and end at j*bs. - ‘SBM’ (Sliding Block Method): The blocks can overlap and are defined to start at j and end at j+bs.

Default is ‘DBM’.

Returns:

int

The number of elements in the specified block that exceed the given maximum.

Example:

>>> import numpy as np
>>> data = np.array([1, 3, 5, 2, 6, 4, 7, 9])
>>> maxima = np.array([5, 7])
>>> exceedances(data, maxima, bs=2, i=1, j=3, stride='DBM')
1  # One value in the 3rd disjoint block exceeds maxima[0] = 5
>>> exceedances(data, maxima, bs=3, i=2, j=2, stride='SBM')
2  # Two values in the sliding block exceed maxima[1] = 7

Notes:

  • The stride parameter allows flexibility in block definition: - ‘DBM’ is suited for non-overlapping blocks, often used in classical block maxima methods. - ‘SBM’ is suited for sliding windows, offering finer granularity for overlap-based analysis.

  • Indexing of blocks and maxima is 1-based for user-friendliness but is internally converted to Python’s 0-based indexing.

The hat_pi0 Function

xtremes.biascorrection.hat_pi0(data, maxima=None, bs=None, stride='DBM')

Compute estimators ( hat{pi}(1) ) for the limiting cluster size probability ( pi(1) ).

This function estimates the first value of ( pi(m) ), which represent probabilities associated with cluster sizes in stationary time series.

Parameters:

dataarray-like

The dataset or time series used to compute probabilities and estimators.

maximaarray-like, optional

A pre-computed array of maxima (e.g., block maxima). If not provided, block maxima are computed internally using the specified bs.

bsint, optional

The block size, specifying the number of elements in each block. If not provided, it is inferred from the length of data divided by the number of maxima, if maxima are provided.

stride{‘DBM’, ‘SBM’}, optional

Defines the block method: - ‘DBM’ (Disjoint Block Method): Non-overlapping blocks are used. - ‘SBM’ (Sliding Block Method): Overlapping sliding blocks are used. Default is ‘DBM’.

Returns:

float

Estimated ( hat{pi}(1) ) value.

Raises:

ValueError

If neither maxima nor bs is provided.

Example:

>>> import numpy as np
>>> data = np.array([1, 3, 5, 2, 6, 4, 7, 9, 8, 10])
>>> maxima = np.array([5, 7, 10])
>>> hat_pis(1, data, bs=3, stride='SBM')
[1.5, 1.0]  # Using SBM and calculated block maxima

Notes:

  • The recursive formula for ( hat{pi}(m) ) is given by:

\[\hat{\pi}(1) = 4 \bar{p}(1) \hat{\pi}(m) = 4 \bar{p}(m) - 2 \sum_{k=1}^{m-1} \hat{\pi}(m-k) \bar{p}(k)\]
  • This ensures that each ( hat{pi}(m) ) is built upon the values of smaller ( m ).

  • If pbars is not provided, the function computes ( bar{p}(m) ) using pbar_dbm_fast for the DBM stride. Future extensions can integrate SBM support dynamically.

  • The function is optimized for the DBM stride but can handle SBM if provided with pbars.

References:

Bücher, A., & Jennessen, T. (2022). Statistical analysis for stationary time series at extreme levels: New estimators for the limiting cluster size distribution. Stochastic Processes and their Applications, 149, 75-106.

The Upsilon Function

xtremes.biascorrection.Upsilon(x, rho0)

Compute the ( Upsilon(x, rho_0) ) function specified in bias correction.

Parameters:

xfloat

The parameter ( x ) in the ( Upsilon(x, rho_0) ) function. Typically, ( x ) relates to the scaling or shape parameter in the analysis.

rho0 : float

Returns:

float

The computed value of ( Upsilon(x, rho_0) ).

The Upsilon_derivative Function

xtremes.biascorrection.Upsilon_derivative(x, rho0)

Compute the derivative of ( Upsilon(x, rho_0) ) function specified in bias correction.

Parameters:

xfloat

The parameter ( x ) in the ( Upsilon(x, rho_0) ) function. Typically, ( x ) relates to the scaling or shape parameter in the analysis.

rho0 : float

Returns:

float

The computed value of ( Upsilon’(x, rho_0) ).

The Upsilon_2ndderivative Function

xtremes.biascorrection.Upsilon_2ndderivative(x, rho0)

Compute the second derivative ( Upsilon(x, rho_0) ) function specified in bias correction.

Parameters:

xfloat

The parameter ( x ) in the ( Upsilon(x, rho_0) ) function. Typically, ( x ) relates to the scaling or shape parameter in the analysis.

rho0 : float

Returns:

float

The computed value of ( Upsilon’’(x, rho_0) ).

The Psi Function

xtremes.biascorrection.Psi(a, a_true, rho0)

Compute the ( Psi(x, rho_0) ) function specified in bias correction, closely related to (Upsilon).

Parameters:

xfloat

The parameter ( x ) in the ( Upsilon(x, rho_0) ) function. Typically, ( x ) relates to the scaling or shape parameter in the analysis.

rho0 : float

Returns:

float

The computed value of ( Psi(x, rho_0) ).

The a1_asy Function

xtremes.biascorrection.a1_asy(a_true, rho0)

Compute the numerical root of ( Psi(x, rho_0) ).

Parameters:

a_true : float

rho0 : float

Returns:

float

The computed root of ( Psi(x, rho_0) ).

The varpi Function

xtremes.biascorrection.varpi(rho0)

Compute the numerical root of ( Pi(x, rho_0) ).

Parameters:

rho0 : float

Returns:

float

The computed root of ( Pi(x, rho_0) ).

The z0 Function

xtremes.biascorrection.z0(rho0)

Compute (z_0), a special quantity related to bias-correcting (sigma) ( Pi(x, rho_0) ).

Parameters:

rho0 : float

Returns:

float

The computed root of ( Pi(x, rho_0) ).