Reference: Bootstrap

This module implements a bootstrap procedure for disjoint or sliding block maxima.

Overview

The xtremes.bootstrap module provides tools for performing bootstrap procedures on block maxima. It includes classes and functions for extracting block maxima, resampling, and estimating parameters using Maximum Likelihood Estimation (MLE).

Classes

The FullBootstrap Class

class xtremes.bootstrap.FullBootstrap(initial_sample, bs=10, stride='DBM', dist_type='Frechet')

Bases: object

A class to perform bootstrapping of Maximum Likelihood Estimates (MLE) for Fréchet or GEV distributions.

This class performs block maxima extraction from an initial sample using either disjoint or sliding blocks. It applies a bootstrap resampling procedure to estimate the variability of the MLE parameters for the specified distribution type (Fréchet or GEV). The bootstrap method is parallelized for efficiency and supports reproducibility through optional seed setting.

Parameters

initial_sample : list or numpy.ndarray

The initial dataset from which block maxima will be extracted and bootstrapped.

bs : int, optional

Block size for the block maxima extraction. Default is 10.

stride : {‘DBM’, ‘SBM’}, optional

Stride type for block maxima extraction:
  • ‘DBM’ (Disjoint Block Maxima): non-overlapping blocks.
  • ‘SBM’ (Sliding Block Maxima): overlapping blocks.
Default is ‘DBM’.

dist_type : {‘Frechet’, ‘GEV’}, optional

Distribution type to estimate the parameters for:
  • ‘Frechet’: the 2-parameter Fréchet distribution.
  • ‘GEV’: the 3-parameter Generalized Extreme Value (GEV) distribution.
Default is ‘Frechet’.
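As an illustration of what the ‘Frechet’ option estimates, the following is a minimal NumPy sketch of 2-parameter Fréchet maximum likelihood. It assumes the parametrisation F(x) = exp(-(x/σ)^(-α)) for x > 0, with shape α and scale σ. The function frechet_mle is hypothetical and is not the estimator xtremes uses internally (the class relies on hos.Data for its MLE results). For fixed α the scale has a closed form, so only a one-dimensional equation in α has to be solved, here by bisection.

```python
import numpy as np

def frechet_mle(x, tol=1e-10, max_iter=200):
    """Hypothetical sketch: ML estimates (alpha, sigma) of F(x) = exp(-(x / sigma) ** -alpha)."""
    logx = np.log(np.asarray(x, dtype=float))
    if not np.all(np.isfinite(logx)) or logx.max() == logx.min():
        raise ValueError("need strictly positive, finite, non-constant data")
    m = logx.min()

    def score(alpha):
        # Profile likelihood equation in alpha:
        #   1/alpha - mean(log x) + sum(x^-alpha * log x) / sum(x^-alpha) = 0.
        # Weights are rescaled by exp(alpha * m) so they never overflow.
        w = np.exp(-alpha * (logx - m))
        return 1.0 / alpha - logx.mean() + np.sum(w * logx) / np.sum(w)

    # score() decreases from +inf to min(log x) - mean(log x) < 0, so bracket the root.
    lo, hi = 1.0, 1.0
    while score(lo) <= 0:
        lo /= 2.0
    while score(hi) >= 0:
        hi *= 2.0
    for _ in range(max_iter):  # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    alpha = 0.5 * (lo + hi)
    # Closed-form scale for fixed alpha, sigma = mean(x^-alpha) ** (-1/alpha), in log space.
    sigma = np.exp(m - np.log(np.mean(np.exp(-alpha * (logx - m)))) / alpha)
    return alpha, sigma

# Usage: simulate Fréchet(alpha=3, sigma=2) by inversion, x = sigma * (-log U) ** (-1/alpha).
rng = np.random.default_rng(0)
sample = 2.0 * (-np.log(rng.uniform(size=5000))) ** (-1.0 / 3.0)
alpha_hat, sigma_hat = frechet_mle(sample)
```

Solving the profile equation rather than calling a generic two-dimensional optimiser keeps the sketch free of dependencies beyond NumPy; a library optimiser may be preferable in practice.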

Attributes

circmaxs : list

The block maxima extracted from the initial sample using the specified block size and stride.

data : hos.Data

The hos.Data object containing the original dataset and its MLE results.

MLEvals : numpy.ndarray

The MLE estimates from the original dataset before bootstrapping.

values : numpy.ndarray

MLE estimates for each bootstrap sample after running the run_bootstrap method.

statistics : dict

Dictionary containing summary statistics (mean and standard deviation) of the bootstrap estimates.

Dictionary containing summary statistics (mean and standard deviation) of the bootstrap estimates.

Methods

run_bootstrap(num_bootstraps=100, set_seeds=False, max_workers=1)

Runs the bootstrap procedure in parallel and calculates the MLE estimates for each bootstrap sample.

Example

>>> sample = np.random.rand(100)
>>> bootstrap = FullBootstrap(sample, bs=10, stride='DBM', dist_type='Frechet')
>>> bootstrap.run_bootstrap(num_bootstraps=100, set_seeds=True, max_workers=4)
>>> bootstrap.statistics['mean']  # Mean of bootstrap estimates
>>> bootstrap.statistics['std']   # Standard deviation of bootstrap estimates

get_CI(alpha=0.05, method='bootstrap')

Compute the confidence interval (CI) for the Maximum Likelihood Estimate (MLE) parameters based on bootstrap samples.

Parameters

alpha : float, optional

Significance level for the confidence interval. Default is 0.05, corresponding to a 95% confidence interval.

method : str, optional

Method to compute the confidence interval. Two options are available:
  • ‘symmetric’: the interval is computed from symmetric quantiles of the bootstrap distribution.
  • ‘minimal_width’: the interval is the minimal-width interval that contains a (1 - alpha) proportion of the bootstrap distribution.
The default is ‘symmetric’.

Returns

numpy.ndarray

A 2D array with shape (n_parameters, 2) containing the lower and upper bounds of the confidence interval for each parameter. The first column represents the lower bounds, and the second column represents the upper bounds.

Notes

The confidence intervals are derived from the empirical distribution of the MLE parameter estimates obtained across the bootstrap samples.

Two methods are available for calculating the confidence intervals:
  • ‘symmetric’: takes the alpha/2 and (1 - alpha/2) quantiles of the bootstrap distribution of each parameter. It assumes the bootstrap distribution is approximately symmetric and works well when it is roughly normal.
  • ‘minimal_width’: finds the shortest interval that contains a (1 - alpha) proportion of the bootstrap estimates. It is particularly useful when the bootstrap distribution is skewed.
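The two interval rules can be sketched in NumPy as follows, assuming the bootstrap estimates are stored as an array of shape (num_bootstraps, n_parameters), as the values attribute suggests. symmetric_ci and minimal_width_ci are hypothetical names, and the quantile interpolation rule (NumPy's default, linear) is an assumption; the library may use a different one.

```python
import numpy as np

def symmetric_ci(values, alpha=0.05):
    """Hypothetical sketch: equal-tailed percentile interval per parameter; values is (B, p)."""
    values = np.asarray(values, dtype=float)
    lower = np.quantile(values, alpha / 2, axis=0)
    upper = np.quantile(values, 1 - alpha / 2, axis=0)
    return np.column_stack([lower, upper])  # shape (p, 2): lower bounds, upper bounds

def minimal_width_ci(values, alpha=0.05):
    """Hypothetical sketch: shortest window covering ceil((1 - alpha) * B) sorted estimates."""
    values = np.sort(np.asarray(values, dtype=float), axis=0)
    B = values.shape[0]
    k = int(np.ceil((1 - alpha) * B))  # number of estimates the interval must cover
    widths = values[k - 1:] - values[: B - k + 1]  # width of every window of k sorted points
    start = np.argmin(widths, axis=0)  # index of the shortest window, per parameter
    cols = np.arange(values.shape[1])
    return np.column_stack([values[start, cols], values[start + k - 1, cols]])

# Usage: a right-skewed bootstrap distribution for two hypothetical parameters.
rng = np.random.default_rng(1)
boot = rng.exponential(size=(1000, 2))
ci_sym = symmetric_ci(boot)
ci_min = minimal_width_ci(boot)
```

For a skewed distribution such as this one, the minimal-width interval shifts towards the dense region near zero and is never wider than the symmetric interval.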

plot_bootstrap(param_idx=0, param_name='gamma', bins=30, output_file=None, show=True)

Plot the bootstrap distribution for a specified parameter.

Parameters

param_idx : int, optional

Index of the parameter to plot (0 for the first parameter, 1 for the second, etc.). Default is 0.

bins : int, optional

Number of bins to use for the histogram. Default is 30.

Notes

This method generates a histogram of the bootstrap estimates for the specified parameter and overlays the mean and confidence interval.

run_bootstrap(num_bootstraps=100, set_seeds=False, max_workers=1)

Run the bootstrap resampling procedure in parallel.

This method resamples the block maxima dataset, estimates the MLE parameters for each bootstrap sample, and computes summary statistics (mean and standard deviation) of the bootstrap estimates. The computation is parallelized using ProcessPoolExecutor with an adjustable number of worker processes.

Parameters

num_bootstraps : int, optional

Number of bootstrap samples to generate. Default is 100.

set_seeds : bool, optional

If True, sets the random seed for reproducibility in each bootstrap iteration. Default is False.

max_workers : int, optional

Maximum number of worker processes to use for parallelization. Default is 1 (no parallelism). Set to None to use all available CPU cores.

Returns

None

Results are stored in the values attribute and summary statistics in the statistics attribute.

Example

>>> bootstrap.run_bootstrap(num_bootstraps=500, set_seeds=True, max_workers=4)
>>> bootstrap.statistics['mean']  # Access the mean of bootstrap estimates
>>> bootstrap.statistics['std']   # Access the standard deviation of bootstrap estimates
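A sequential sketch of what run_bootstrap computes, assuming 1D disjoint block maxima and a hypothetical toy_estimator standing in for the MLE. run_bootstrap_sketch is a hypothetical name; it omits the parallel execution and the SBM resampling path, and it uses NumPy's default ddof=0 for the standard deviation, which may differ from the library.

```python
import numpy as np

def run_bootstrap_sketch(block_maxima, estimator, num_bootstraps=100, seed=None):
    """Hypothetical sketch: resample block maxima, re-estimate, summarise."""
    block_maxima = np.asarray(block_maxima, dtype=float)
    rng = np.random.default_rng(seed)
    # One row per bootstrap replicate, one column per parameter (like `values`).
    values = np.array([
        estimator(rng.choice(block_maxima, size=block_maxima.size, replace=True))
        for _ in range(num_bootstraps)
    ])
    # Column-wise summary, mirroring the `statistics` attribute.
    statistics = {"mean": values.mean(axis=0), "std": values.std(axis=0)}
    return values, statistics

def toy_estimator(s):
    """Hypothetical stand-in for the MLE: returns (mean, max) of a resample."""
    return np.array([s.mean(), s.max()])

bm = np.array([2.3, 4.1, 3.7, 5.2, 2.9, 6.8, 3.3, 4.4, 3.0, 5.9])
values, statistics = run_bootstrap_sketch(bm, toy_estimator, num_bootstraps=200, seed=7)
```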

Functions

The circmax Function

xtremes.bootstrap.circmax(sample, bs=10, stride='DBM')

Extract the block maxima (BM) from a given sample using different stride methods.

Parameters

sample : numpy.ndarray

A 1D array containing the sample from which block maxima will be extracted.

bs : int, optional

The block size (number of observations per block) used to divide the sample for block maxima extraction. Default is 10.

stride : {‘DBM’, ‘SBM’}, optional

The stride method used for extracting block maxima:
  • ‘DBM’ (Disjoint Block Maxima): extracts maxima from non-overlapping blocks.
  • ‘SBM’ (Sliding Block Maxima): extracts maxima using overlapping blocks.
Default is ‘DBM’.

Returns

numpy.ndarray

A 1D or 2D array containing the block maxima extracted from the sample. The result depends on the stride method used:
  • For ‘DBM’, a 1D array of block maxima.
  • For ‘SBM’, a 2D array in which each row contains the block maxima extracted from overlapping blocks.

Raises

ValueError

If an invalid stride method is specified.

Notes

  • ‘DBM’ (Disjoint Block Maxima) extracts block maxima from non-overlapping blocks of size bs.

  • ‘SBM’ (Sliding Block Maxima) creates overlapping blocks, effectively increasing the number of block maxima compared to ‘DBM’.

  • In the ‘SBM’ setting, the circular block maxima (circmax) approach introduced by Bücher and Staud (2024) is used.

References

Bücher, A., & Staud, T. (2024). Bootstrapping Estimators based on the Block Maxima Method. arXiv preprint arXiv:2409.05529.

Example

>>> sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> circmax(sample, bs=5, stride='DBM')
array([5, 10])
>>> circmax(sample, bs=3, stride='SBM')
array([[3, 6, 9],
       [4, 7, 10]])
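The two stride ideas can be illustrated with plain NumPy (sliding_window_view requires NumPy 1.20 or later). This is a simplified sketch under stated assumptions, not the library's circmax: disjoint_block_maxima drops an incomplete final block, and sliding_block_maxima returns a 1D array of window maxima (optionally wrapping around the end of the sample) rather than the 2D layout described above for ‘SBM’. Both function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def disjoint_block_maxima(sample, bs):
    """Hypothetical sketch: maxima of consecutive non-overlapping blocks of size bs.

    Assumption: an incomplete final block is dropped.
    """
    sample = np.asarray(sample)
    n_blocks = len(sample) // bs
    return sample[: n_blocks * bs].reshape(n_blocks, bs).max(axis=1)

def sliding_block_maxima(sample, bs, circular=False):
    """Hypothetical sketch: maximum of every window of bs consecutive observations.

    With circular=True the first bs - 1 values are appended to the end, so every
    observation starts exactly one window and the output has len(sample) entries.
    """
    sample = np.asarray(sample)
    if circular:
        sample = np.concatenate([sample, sample[: bs - 1]])
    return sliding_window_view(sample, bs).max(axis=1)

sample = np.arange(1, 11)
dbm = disjoint_block_maxima(sample, bs=5)                # dbm.tolist() == [5, 10]
sbm = sliding_block_maxima(sample, bs=3)                 # 8 maxima: 3, 4, ..., 10
cbm = sliding_block_maxima(sample, bs=3, circular=True)  # 10 maxima, last three equal 10
```

The sliding variants yield many more (dependent) maxima than the two disjoint ones, which is the effect described in the Notes.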

The uniquening Function

xtremes.bootstrap.uniquening(circmaxs, stride='DBM')

Identify the unique values and their counts in each row of an array of block maxima.

Parameters

circmaxs : numpy.ndarray

A NumPy array containing block maxima values extracted from a sample.

Returns

list of tuples

A list where each element is a tuple containing two NumPy arrays:
  • The first array contains the unique values from the corresponding row in circmaxs.
  • The second array contains the counts of each unique value.
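Per-row unique values and counts map directly onto numpy.unique with return_counts=True. A minimal sketch, assuming circmaxs is 2D (a 1D input is treated as a single row); unique_counts_per_row is a hypothetical name and ignores the stride argument of uniquening.

```python
import numpy as np

def unique_counts_per_row(circmaxs):
    """Hypothetical sketch: [(unique_values, counts), ...], one tuple per row."""
    circmaxs = np.atleast_2d(np.asarray(circmaxs))  # treat a 1D input as one row
    return [np.unique(row, return_counts=True) for row in circmaxs]

result = unique_counts_per_row(np.array([[1, 2, 2, 3], [2, 3, 3, 4]]))
# result[0] holds values [1, 2, 3] with counts [1, 2, 1]
```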

The Bootstrap Function

xtremes.bootstrap.Bootstrap(xx)

Generate a bootstrap sample by resampling with replacement from the input data.

Parameters

xx : list or numpy.ndarray

The input sample to resample from.

Returns

list

A new sample of the same size, created by randomly selecting elements from xx with replacement.

Notes

This function creates a bootstrap sample, which is commonly used in statistical resampling methods to estimate the variability of a statistic.

Example

>>> sample = [1, 2, 3, 4, 5]
>>> Bootstrap(sample)
[2, 5, 3, 1, 2]  # Example output, actual result may vary
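Resampling with replacement needs only the standard library. A minimal sketch using random.Random.choices; bootstrap_sample and its optional rng argument are hypothetical additions that make the draw reproducible, not part of the documented Bootstrap signature.

```python
import random

def bootstrap_sample(xx, rng=None):
    """Hypothetical sketch: draw len(xx) items from xx uniformly, with replacement."""
    rng = rng if rng is not None else random.Random()
    items = list(xx)
    return rng.choices(items, k=len(items))

sample = [1, 2, 3, 4, 5]
resample = bootstrap_sample(sample, random.Random(42))  # same seed gives the same resample
```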

The aggregate_boot Function

xtremes.bootstrap.aggregate_boot(boot_samp, stride='DBM')

Aggregate counts of unique values from a list of tuples containing values and their counts.

Parameters

boot_samp : list of tuples

Each tuple contains two arrays: the first with values and the second with corresponding counts.

Returns

numpy.ndarray

A 2D array with two columns: the first column contains unique values, and the second column contains the aggregated counts.

Example

>>> boot_samp = [(np.array([1, 2, 3]), np.array([1, 1, 2])), (np.array([2, 3]), np.array([2, 1]))]
>>> aggregate_boot(boot_samp)
array([[1, 1],
       [2, 3],
       [3, 3]])
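The aggregation step amounts to summing counts per distinct value. A minimal NumPy sketch, assuming each tuple holds 1D arrays of equal length; aggregate_counts is a hypothetical name and ignores the stride argument. np.add.at is used so that the summed counts keep their integer dtype.

```python
import numpy as np

def aggregate_counts(boot_samp):
    """Hypothetical sketch: sum counts per distinct value across (values, counts) tuples."""
    values = np.concatenate([v for v, _ in boot_samp])
    counts = np.concatenate([c for _, c in boot_samp])
    uniq, inverse = np.unique(values, return_inverse=True)  # inverse maps each value to its slot
    totals = np.zeros(len(uniq), dtype=counts.dtype)
    np.add.at(totals, inverse, counts)  # unbuffered add handles repeated indices
    return np.column_stack([uniq, totals])

boot_samp = [(np.array([1, 2, 3]), np.array([1, 1, 2])), (np.array([2, 3]), np.array([2, 1]))]
aggregated = aggregate_counts(boot_samp)  # rows: [1, 1], [2, 3], [3, 3]
```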

The bootstrap_worker Function

xtremes.bootstrap.bootstrap_worker(args)

Auxiliary function to perform a single bootstrap resampling and MLE estimation.

This function is designed to be used in parallelized bootstrap procedures. It takes arguments for a single bootstrap iteration, performs resampling on the given block maxima, estimates MLE parameters using the specified distribution type, and returns the results.

Parameters

args : tuple

A tuple containing the following elements:
  • idx (int): The iteration index, used for setting the random seed if set_seeds is True.
  • set_seeds (bool): Whether to set the random seed for reproducibility.
  • circmaxs (list or numpy.ndarray): The block maxima dataset to be resampled.
  • aggregate_boot (callable): A function to aggregate the resampled data.
  • ML_estimators_data (callable): A function or class to compute MLE parameters on the aggregated data.
  • dist_type (str): The distribution type for MLE estimation (‘Frechet’ or ‘GEV’).

Returns

numpy.ndarray

The MLE parameter estimates for the current bootstrap sample.

Notes

  • This function is designed to be compatible with ProcessPoolExecutor or other parallel processing tools.

  • The random seed is set per iteration to ensure reproducibility when set_seeds is True.

Example

>>> args = (0, True, circmaxs, aggregate_boot, ML_estimators_data, 'GEV')
>>> bootstrap_worker(args)
array([param1, param2, param3])  # Example output for GEV distribution
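Seeding per iteration makes each result depend only on its own index, not on which worker runs it or when it finishes. A minimal sketch, assuming a hypothetical argument tuple (idx, set_seeds, data, estimator) that is shorter than the documented one, and NumPy's Generator API instead of whatever global seeding the library performs. Running the tasks in reverse order reproduces the same estimates.

```python
import numpy as np

def bootstrap_worker_sketch(args):
    """Hypothetical sketch of one bootstrap iteration: (idx, set_seeds, data, estimator)."""
    idx, set_seeds, data, estimator = args
    # A generator seeded by the iteration index makes iteration idx reproducible,
    # independent of scheduling; with set_seeds=False the draw is unseeded.
    rng = np.random.default_rng(idx if set_seeds else None)
    resample = rng.choice(data, size=len(data), replace=True)
    return estimator(resample)

data = np.array([3.1, 4.7, 2.2, 5.9, 3.3, 4.0])
tasks = [(i, True, data, np.mean) for i in range(5)]
forward = list(map(bootstrap_worker_sketch, tasks))
backward = list(map(bootstrap_worker_sketch, reversed(tasks)))[::-1]  # same values
```

Because the worker is a module-level function taking one picklable tuple, concurrent.futures.ProcessPoolExecutor(max_workers=...).map(bootstrap_worker_sketch, tasks) can replace the built-in map; on platforms that start processes with spawn, that call must sit under an if __name__ == "__main__": guard.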

Examples

Here are some examples of how to use the xtremes.bootstrap module:

  1. FullBootstrap Class:

    import numpy as np
    from xtremes.bootstrap import FullBootstrap
    
    sample = np.random.rand(100)
    bootstrap = FullBootstrap(sample, bs=10, stride='DBM', dist_type='Frechet')
    bootstrap.run_bootstrap(num_bootstraps=100)
    print("Mean of bootstrap estimates:", bootstrap.statistics['mean'])
    print("Standard deviation of bootstrap estimates:", bootstrap.statistics['std'])
    
  2. circmax Function:

    import numpy as np
    from xtremes.bootstrap import circmax
    
    sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    block_maxima = circmax(sample, bs=5, stride='DBM')
    print("Block Maxima (DBM):", block_maxima)
    
    block_maxima = circmax(sample, bs=3, stride='SBM')
    print("Block Maxima (SBM):", block_maxima)
    
  3. uniquening Function:

    import numpy as np
    from xtremes.bootstrap import uniquening
    
    circmaxs = np.array([[1, 2, 2, 3], [2, 3, 3, 4]])
    unique_values = uniquening(circmaxs)
    print("Unique values and counts:", unique_values)
    
  4. Bootstrap Function:

    from xtremes.bootstrap import Bootstrap
    
    sample = [1, 2, 3, 4, 5]
    bootstrap_sample = Bootstrap(sample)
    print("Bootstrap sample:", bootstrap_sample)
    
  5. aggregate_boot Function:

    import numpy as np
    from xtremes.bootstrap import aggregate_boot
    
    boot_samp = [(np.array([1, 2, 3]), np.array([1, 1, 2])), (np.array([2, 3]), np.array([2, 1]))]
    aggregated_counts = aggregate_boot(boot_samp)
    print("Aggregated counts:", aggregated_counts)
    

References

  • Bücher, A., & Staud, T. (2024). Bootstrapping Estimators based on the Block Maxima Method. arXiv preprint arXiv:2409.05529.