Reference: Bootstrap

This module implements a bootstrap procedure for disjoint or sliding block maxima.

Overview

The xtremes.bootstrap module provides tools for performing bootstrap procedures on block maxima. It includes classes and functions for extracting block maxima, resampling, and estimating parameters using Maximum Likelihood Estimation (MLE).

Classes

The `FullBootstrap` Class

class xtremes.bootstrap.FullBootstrap(initial_sample, bs=10, stride='DBM', dist_type='Frechet')

Bases: object

A class to perform bootstrapping of Maximum Likelihood Estimates (MLE) for Fréchet or GEV distributions.

This class performs block maxima extraction from an initial sample using either disjoint or sliding blocks. It applies a bootstrap resampling procedure to estimate the variability of the MLE parameters for the specified distribution type (Fréchet or GEV). The bootstrap method is parallelized for efficiency and supports reproducibility through optional seed setting.

Parameters

initial_sample (list or numpy.ndarray): The initial dataset from which block maxima will be extracted and bootstrapped.
bs (int, optional): Block size for the block maxima extraction. Default is 10.
stride ({'DBM', 'SBM'}, optional): Stride type for block maxima extraction. Default is 'DBM'.
- 'DBM' (Disjoint Block Maxima): Non-overlapping blocks.
- 'SBM' (Sliding Block Maxima): Overlapping blocks.
dist_type ({'Frechet', 'GEV'}, optional): Distribution type to estimate the parameters for. Default is 'Frechet'.
- 'Frechet': Estimate parameters for the two-parameter Fréchet distribution.
- 'GEV': Estimate parameters for the three-parameter Generalized Extreme Value (GEV) distribution.

Attributes

circmaxs (list): The block maxima extracted from the initial sample using the specified block size and stride.
data (hos.Data): The hos.Data object containing the original dataset and its MLE results.
MLEvals (numpy.ndarray): The MLE estimates from the original dataset before bootstrapping.
values (numpy.ndarray): MLE estimates for each bootstrap sample after running the run_bootstrap method.
statistics (dict): Dictionary containing summary statistics (mean and standard deviation) of the bootstrap estimates.

Methods

run_bootstrap(num_bootstraps=100, set_seeds=False, max_workers=1): Runs the bootstrap procedure in parallel and calculates the MLE estimates for each bootstrap sample.

Example

>>> import numpy as np
>>> from xtremes.bootstrap import FullBootstrap
>>> sample = np.random.rand(100)
>>> bootstrap = FullBootstrap(sample, bs=10, stride='DBM', dist_type='Frechet')
>>> bootstrap.run_bootstrap(num_bootstraps=100, set_seeds=True, max_workers=4)
>>> bootstrap.statistics['mean']  # Mean of bootstrap estimates
>>> bootstrap.statistics['std']   # Standard deviation of bootstrap estimates

get_CI(alpha=0.05, method='bootstrap')

Compute the confidence interval (CI) for the Maximum Likelihood Estimate (MLE) parameters based on bootstrap samples.

Parameters

alpha (float, optional): Significance level for the confidence interval. Default is 0.05, corresponding to a 95% confidence interval.
method (str, optional): Method to compute the confidence interval. Two options are available. The default is 'symmetric'.
- 'symmetric': The confidence interval is computed using the symmetric quantiles.
- 'minimal_width': The confidence interval is computed by finding the minimal-width interval that contains a (1 - alpha) proportion of the bootstrap distribution.

Returns

numpy.ndarray: A 2D array with shape (n_parameters, 2) containing the lower and upper bounds of the confidence interval for each parameter. The first column represents the lower bounds, and the second column represents the upper bounds.

Notes

The confidence intervals are based on bootstrap estimates of the MLE parameters, which means the confidence intervals are derived from the empirical distribution of the parameter estimates obtained from multiple bootstrap samples.

There are two methods available for calculating the confidence intervals:
- 'symmetric': This method takes the alpha/2 and (1 - alpha/2) quantiles of the bootstrap distribution for each parameter. It is based on the assumption that the distribution is approximately symmetric and works well when the bootstrap distribution is roughly normal.
- 'minimal_width': This method identifies the interval with the minimal width that contains a (1 - alpha) proportion of the bootstrap samples. It is particularly useful when the bootstrap distribution is skewed or not symmetric.
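The two interval constructions described above can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: the helper names `symmetric_ci` and `minimal_width_ci` and the synthetic `fake_values` array are assumptions made for the example, and the input is assumed to be a 2D array of shape (n_bootstraps, n_parameters).

```python
import numpy as np

def symmetric_ci(estimates, alpha=0.05):
    """Equal-tailed interval from the alpha/2 and 1 - alpha/2 quantiles."""
    estimates = np.asarray(estimates, dtype=float)
    lower = np.quantile(estimates, alpha / 2, axis=0)
    upper = np.quantile(estimates, 1 - alpha / 2, axis=0)
    return np.column_stack([lower, upper])  # shape (n_parameters, 2)

def minimal_width_ci(estimates, alpha=0.05):
    """Shortest interval covering about (1 - alpha) of the estimates, per parameter."""
    estimates = np.asarray(estimates, dtype=float)
    n = estimates.shape[0]
    m = int(np.ceil((1 - alpha) * n))  # number of points the interval must cover
    bounds = []
    for column in estimates.T:
        s = np.sort(column)
        # Width of every window of m consecutive sorted points.
        widths = s[m - 1:] - s[: n - m + 1]
        i = int(np.argmin(widths))
        bounds.append((s[i], s[i + m - 1]))
    return np.array(bounds)  # shape (n_parameters, 2)

rng = np.random.default_rng(0)
# Hypothetical bootstrap estimates: 1000 draws of two parameters, the second one skewed.
fake_values = np.column_stack([rng.normal(1.0, 0.1, 1000), rng.gamma(2.0, 0.5, 1000)])
ci_sym = symmetric_ci(fake_values)
ci_min = minimal_width_ci(fake_values)
```

For a skewed parameter such as the second column here, the minimal-width interval is typically shifted toward the mode and shorter than the equal-tailed one.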

plot_bootstrap(param_idx=0, param_name='gamma', bins=30, output_file=None, show=True)

Plot the bootstrap distribution for a specified parameter.

Parameters

param_idx (int, optional): Index of the parameter to plot (0 for the first parameter, 1 for the second, etc.). Default is 0.
bins (int, optional): Number of bins to use for the histogram. Default is 30.

Notes

This method generates a histogram of the bootstrap estimates for the specified parameter and overlays the mean and confidence interval.

run_bootstrap(num_bootstraps=100, set_seeds=False, max_workers=1)

Run the bootstrap resampling procedure in parallel.

This method resamples the block maxima dataset, estimates the MLE parameters for each bootstrap sample, and computes summary statistics (mean and standard deviation) of the bootstrap estimates. The computation is parallelized using ProcessPoolExecutor with an adjustable number of worker processes.

Parameters

num_bootstraps (int, optional): Number of bootstrap samples to generate. Default is 100.
set_seeds (bool, optional): If True, sets the random seed for reproducibility in each bootstrap iteration. Default is False.
max_workers (int, optional): Maximum number of worker processes to use for parallelization. Default is 1 (no parallelism). Set to None to use all available CPU cores.

Returns

None: Results are stored in the values attribute and summary statistics in the statistics attribute.

Example

>>> bootstrap.run_bootstrap(num_bootstraps=500, set_seeds=True, max_workers=4)
>>> bootstrap.statistics['mean']  # Access the mean of bootstrap estimates
>>> bootstrap.statistics['std']   # Access the standard deviation of bootstrap estimates
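The resample, estimate, summarize loop described for run_bootstrap can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: it substitutes `ThreadPoolExecutor` for `ProcessPoolExecutor` so that it runs without a `__main__` guard, uses the sample mean as a stand-in for the MLE parameter vector, and the names `resample_mean` and `run_bootstrap_sketch` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def resample_mean(args):
    """One bootstrap iteration: resample with replacement and return a statistic."""
    idx, set_seeds, data = args
    # Seed with the iteration index when reproducibility is requested.
    rng = np.random.default_rng(idx if set_seeds else None)
    resampled = rng.choice(data, size=len(data), replace=True)
    return np.array([resampled.mean()])  # stand-in for the MLE parameter vector

def run_bootstrap_sketch(data, num_bootstraps=100, set_seeds=False, max_workers=1):
    """Run all iterations through a pool and summarize the estimates."""
    tasks = [(idx, set_seeds, data) for idx in range(num_bootstraps)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in task order, regardless of completion order.
        values = np.array(list(pool.map(resample_mean, tasks)))
    statistics = {"mean": values.mean(axis=0), "std": values.std(axis=0)}
    return values, statistics

block_maxima = np.array([2.1, 3.4, 1.9, 5.0, 2.7, 4.2, 3.3, 2.8])
values_a, stats_a = run_bootstrap_sketch(block_maxima, 200, set_seeds=True, max_workers=4)
values_b, stats_b = run_bootstrap_sketch(block_maxima, 200, set_seeds=True, max_workers=2)
```

Because each iteration derives its generator from its own index, the results do not depend on the number of workers or on scheduling order.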

Functions

The `circmax` Function

xtremes.bootstrap.circmax(sample, bs=10, stride='DBM')

Extract the block maxima (BM) from a given sample using different stride methods.

Parameters

sample (numpy.ndarray): A 1D array containing the sample from which block maxima will be extracted.
bs (int, optional): The block size (number of observations per block) used to divide the sample for block maxima extraction. Default is 10.
stride ({'DBM', 'SBM'}, optional): The stride method used for extracting block maxima. Default is 'DBM'.
- 'DBM' (Disjoint Block Maxima): Extracts maxima from non-overlapping blocks.
- 'SBM' (Sliding Block Maxima): Extracts maxima using overlapping blocks.

Returns

numpy.ndarray: A 1D or 2D array containing the block maxima extracted from the sample. The result depends on the stride method used:
- For 'DBM', returns a 1D array of block maxima.
- For 'SBM', returns a 2D array where each row contains the block maxima extracted from overlapping blocks.

Raises

ValueError: If an invalid stride method is specified.

Notes

'DBM' (Disjoint Block Maxima) extracts block maxima from non-overlapping blocks of size bs.
'SBM' (Sliding Block Maxima) creates overlapping blocks, which yields more block maxima than 'DBM' for the same sample.
In the 'SBM' setting, the circmax() method introduced by Bücher and Staud (2024) is used.

References

Bücher, A., & Staud, T. (2024). Bootstrapping Estimators based on the Block Maxima Method. arXiv preprint arXiv:2409.05529.

Example

>>> sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> circmax(sample, bs=5, stride='DBM')
array([5, 10])

>>> circmax(sample, bs=3, stride='SBM')
array([[3, 6, 9],
       [4, 7, 10]])
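To make the difference between the two stride types concrete, the sketch below computes disjoint and plain sliding block maxima with NumPy. It is an illustration only: it does not reproduce the circular construction of Bücher and Staud (2024) that circmax uses for 'SBM', the helper names are hypothetical, and dropping an incomplete trailing block is an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def disjoint_block_maxima(sample, bs):
    """Maxima of consecutive non-overlapping blocks of length bs."""
    sample = np.asarray(sample)
    n_blocks = len(sample) // bs  # assumption: an incomplete trailing block is dropped
    return sample[: n_blocks * bs].reshape(n_blocks, bs).max(axis=1)

def sliding_block_maxima(sample, bs):
    """Maxima of every window of length bs, moving one observation at a time."""
    return sliding_window_view(np.asarray(sample), bs).max(axis=1)

sample = np.arange(1, 11)                   # 1, 2, ..., 10
dbm = disjoint_block_maxima(sample, bs=5)   # two blocks -> two maxima
sbm = sliding_block_maxima(sample, bs=3)    # 10 - 3 + 1 = 8 windows -> eight maxima
```

A sample of length n gives about n / bs disjoint maxima but n - bs + 1 sliding maxima, which is why the sliding variant yields more (strongly dependent) block maxima from the same data.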

The `uniquening` Function

xtremes.bootstrap.uniquening(circmaxs, stride='DBM')

Identify unique values and their counts from a list of arrays.

Parameters

circmaxs (numpy.ndarray): A NumPy array containing block maxima values extracted from a sample.

Returns

list of tuples: A list where each element is a tuple containing two NumPy arrays:
- The first array contains the unique values from the corresponding row in circmaxs.
- The second array contains the counts of each unique value.
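The return structure described above matches what `numpy.unique` produces with `return_counts=True`. The following sketch shows one way such a result could be built; the function name `unique_with_counts` is hypothetical, and whether uniquening works exactly this way internally is an assumption.

```python
import numpy as np

def unique_with_counts(rows):
    """Return one (unique_values, counts) tuple per row of a 2D array."""
    return [np.unique(row, return_counts=True) for row in np.atleast_2d(rows)]

circmaxs = np.array([[1, 2, 2, 3], [2, 3, 3, 4]])
result = unique_with_counts(circmaxs)
first_values, first_counts = result[0]    # values 1, 2, 3 with counts 1, 2, 1
second_values, second_counts = result[1]  # values 2, 3, 4 with counts 1, 2, 1
```

Storing values with counts instead of the raw rows keeps repeated maxima compact, which matters for sliding blocks where the same maximum often appears many times in a row.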

The `Bootstrap` Function

xtremes.bootstrap.Bootstrap(xx)

Generate a bootstrap sample by resampling with replacement from the input data.

Parameters

xx (list or numpy.ndarray): The input sample to resample from.

Returns

list: A new sample of the same size, created by randomly selecting elements from xx with replacement.

Notes

This function creates a bootstrap sample, which is commonly used in statistical resampling methods to estimate the variability of a statistic.

Example

>>> sample = [1, 2, 3, 4, 5]
>>> Bootstrap(sample)
[2, 5, 3, 1, 2]  # Example output, actual result may vary

The `aggregate_boot` Function

xtremes.bootstrap.aggregate_boot(boot_samp, stride='DBM')

Aggregate counts of unique values from a list of tuples containing values and their counts.

Parameters

boot_samp (list of tuples): Each tuple contains two arrays: the first with values and the second with corresponding counts.

Returns

numpy.ndarray: A 2D array with two columns: the first column contains unique values, and the second column contains the aggregated counts.

Example

>>> boot_samp = [(np.array([1, 2, 3]), np.array([1, 1, 2])), (np.array([2, 3]), np.array([2, 1]))]
>>> aggregate_boot(boot_samp)
array([[1, 1],
       [2, 3],
       [3, 3]])
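The aggregation in the example above can be reproduced with a few NumPy calls. This is a sketch of one possible approach, not the library's implementation; the function name `aggregate_counts` is hypothetical.

```python
import numpy as np

def aggregate_counts(pairs):
    """Sum the counts belonging to equal values across all (values, counts) tuples."""
    values = np.concatenate([v for v, _ in pairs])
    counts = np.concatenate([c for _, c in pairs])
    # inverse[i] is the position of values[i] within unique_values.
    unique_values, inverse = np.unique(values, return_inverse=True)
    # bincount with weights adds up the counts that share the same unique value.
    totals = np.bincount(inverse, weights=counts).astype(counts.dtype)
    return np.column_stack([unique_values, totals])

boot_samp = [
    (np.array([1, 2, 3]), np.array([1, 1, 2])),
    (np.array([2, 3]), np.array([2, 1])),
]
aggregated = aggregate_counts(boot_samp)
```

With this input, the value 2 receives 1 + 2 = 3 and the value 3 receives 2 + 1 = 3, matching the documented output.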

The `bootstrap_worker` Function

xtremes.bootstrap.bootstrap_worker(args)

Auxiliary function to perform a single bootstrap resampling and MLE estimation.

This function is designed to be used in parallelized bootstrap procedures. It takes arguments for a single bootstrap iteration, performs resampling on the given block maxima, estimates MLE parameters using the specified distribution type, and returns the results.

Parameters

args (tuple): A tuple containing the following elements:
- idx (int): The iteration index, used for setting the random seed if set_seeds is True.
- set_seeds (bool): Whether to set the random seed for reproducibility.
- circmaxs (list or numpy.ndarray): The block maxima dataset to be resampled.
- aggregate_boot (callable): A function to aggregate the resampled data.
- ML_estimators_data (callable): A function or class to compute MLE parameters on the aggregated data.
- dist_type (str): The distribution type for MLE estimation ('Frechet' or 'GEV').

Returns

numpy.ndarray: The MLE parameter estimates for the current bootstrap sample.

Notes

This function is designed to be compatible with ProcessPoolExecutor or other parallel processing tools.
The random seed is set per iteration to ensure reproducibility when set_seeds is True.

Example

>>> args = (0, True, circmaxs, aggregate_boot, ML_estimators_data, 'GEV')
>>> bootstrap_worker(args)
array([param1, param2, param3])  # Example output for GEV distribution

Examples

Here are some examples of how to use the xtremes.bootstrap module:

FullBootstrap Class:

import numpy as np
from xtremes.bootstrap import FullBootstrap

sample = np.random.rand(100)
bootstrap = FullBootstrap(sample, bs=10, stride='DBM', dist_type='Frechet')
bootstrap.run_bootstrap(num_bootstraps=100)
print("Mean of bootstrap estimates:", bootstrap.statistics['mean'])
print("Standard deviation of bootstrap estimates:", bootstrap.statistics['std'])

circmax Function:

import numpy as np
from xtremes.bootstrap import circmax

sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
block_maxima = circmax(sample, bs=5, stride='DBM')
print("Block Maxima (DBM):", block_maxima)

block_maxima = circmax(sample, bs=3, stride='SBM')
print("Block Maxima (SBM):", block_maxima)

uniquening Function:

import numpy as np
from xtremes.bootstrap import uniquening

circmaxs = np.array([[1, 2, 2, 3], [2, 3, 3, 4]])
unique_values = uniquening(circmaxs)
print("Unique values and counts:", unique_values)

Bootstrap Function:

from xtremes.bootstrap import Bootstrap

sample = [1, 2, 3, 4, 5]
bootstrap_sample = Bootstrap(sample)
print("Bootstrap sample:", bootstrap_sample)

aggregate_boot Function:

import numpy as np
from xtremes.bootstrap import aggregate_boot

boot_samp = [(np.array([1, 2, 3]), np.array([1, 1, 2])), (np.array([2, 3]), np.array([2, 1]))]
aggregated_counts = aggregate_boot(boot_samp)
print("Aggregated counts:", aggregated_counts)

References

Bücher, A., & Staud, T. (2024). Bootstrapping Estimators based on the Block Maxima Method. arXiv preprint arXiv:2409.05529.
