Reference: Bootstrap
This module implements a bootstrap procedure for disjoint or sliding block maxima.
Overview
The xtremes.bootstrap module provides tools for performing bootstrap procedures on block maxima. It includes classes and functions for extracting block maxima, resampling, and estimating parameters using Maximum Likelihood Estimation (MLE).
Classes
The FullBootstrap Class
- class xtremes.bootstrap.FullBootstrap(initial_sample, bs=10, stride='DBM', dist_type='Frechet')
Bases: object
A class to perform bootstrapping of Maximum Likelihood Estimates (MLE) for Fréchet or GEV distributions.
This class performs block maxima extraction from an initial sample using either disjoint or sliding blocks. It applies a bootstrap resampling procedure to estimate the variability of the MLE parameters for the specified distribution type (Fréchet or GEV). The bootstrap method is parallelized for efficiency and supports reproducibility through optional seed setting.
Parameters
- initial_sample : list or numpy.ndarray
The initial dataset from which block maxima will be extracted and bootstrapped.
- bs : int, optional
Block size for the block maxima extraction. Default is 10.
- stride : {‘DBM’, ‘SBM’}, optional
Stride type for block maxima extraction. Default is ‘DBM’.
  - ‘DBM’ (Disjoint Block Maxima): Non-overlapping blocks.
  - ‘SBM’ (Sliding Block Maxima): Overlapping blocks.
- dist_type : {‘Frechet’, ‘GEV’}, optional
Distribution type to estimate the parameters for. Default is ‘Frechet’.
  - ‘Frechet’: Estimate parameters for the 2-parametric Fréchet distribution.
  - ‘GEV’: Estimate parameters for the 3-parametric Generalized Extreme Value (GEV) distribution.
Attributes
- circmaxs : list
The block maxima extracted from the initial sample using the specified block size and stride.
- data : hos.Data
The hos.Data object containing the original dataset and its MLE results.
- MLEvals : numpy.ndarray
The MLE estimates from the original dataset before bootstrapping.
- values : numpy.ndarray
MLE estimates for each bootstrap sample after running the run_bootstrap method.
- statistics : dict
Dictionary containing summary statistics (mean and standard deviation) of the bootstrap estimates.
Methods
- run_bootstrap(num_bootstraps=100, set_seeds=False, max_workers=1)
Runs the bootstrap procedure in parallel and calculates the MLE estimates for each bootstrap sample.
Example
>>> import numpy as np
>>> from xtremes.bootstrap import FullBootstrap
>>> sample = np.random.rand(100)
>>> bootstrap = FullBootstrap(sample, bs=10, stride='DBM', dist_type='Frechet')
>>> bootstrap.run_bootstrap(num_bootstraps=100, set_seeds=True, max_workers=4)
>>> bootstrap.statistics['mean']  # Mean of bootstrap estimates
>>> bootstrap.statistics['std']  # Standard deviation of bootstrap estimates
- get_CI(alpha=0.05, method='bootstrap')
Compute the confidence interval (CI) for the Maximum Likelihood Estimate (MLE) parameters based on bootstrap samples.
Parameters
- alpha : float, optional
Significance level for the confidence interval. Default is 0.05, corresponding to a 95% confidence interval.
- method : str, optional
Method to compute the confidence interval. Two options are available. The default is ‘symmetric’.
  - ‘symmetric’: The confidence interval is computed using the symmetric quantiles.
  - ‘minimal_width’: The confidence interval is computed by finding the minimal-width interval that contains (1 - alpha) proportion of the bootstrap distribution.
Returns
- numpy.ndarray
A 2D array with shape (n_parameters, 2) containing the lower and upper bounds of the confidence interval for each parameter. The first column represents the lower bounds, and the second column represents the upper bounds.
Notes
The confidence intervals are derived from the empirical distribution of the MLE parameter estimates obtained across the bootstrap samples.
There are two methods available for calculating the confidence intervals:
  - ‘symmetric’: This method takes the alpha/2 and (1 - alpha/2) quantiles of the bootstrap distribution for each parameter. It is based on the assumption that the distribution is approximately symmetric and works well when the bootstrap distribution is roughly normal.
  - ‘minimal_width’: This method identifies the interval with the minimal width that contains (1 - alpha) proportion of the bootstrap samples. It is particularly useful when the bootstrap distribution is skewed or not symmetric.
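The two interval constructions described above can be sketched in plain Python for the bootstrap estimates of a single parameter. This is only an illustration under stated assumptions: the helpers `quantile`, `symmetric_ci` and `minimal_width_ci` are hypothetical names, not part of xtremes, and the linear-interpolation quantile rule is one common convention that may differ from what get_CI uses internally.

```python
import math
import random

def quantile(sorted_xs, q):
    """Empirical quantile with linear interpolation between order statistics."""
    pos = q * (len(sorted_xs) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_xs) - 1)
    frac = pos - lower
    return sorted_xs[lower] * (1 - frac) + sorted_xs[upper] * frac

def symmetric_ci(estimates, alpha=0.05):
    """Equal-tailed interval: the alpha/2 and 1 - alpha/2 quantiles."""
    xs = sorted(estimates)
    return quantile(xs, alpha / 2), quantile(xs, 1 - alpha / 2)

def minimal_width_ci(estimates, alpha=0.05):
    """Shortest interval covering at least a (1 - alpha) share of the estimates."""
    xs = sorted(estimates)
    k = math.ceil((1 - alpha) * len(xs))  # number of estimates to cover
    # Slide a window of k consecutive order statistics and keep the narrowest.
    start = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Hypothetical, right-skewed stand-in for the bootstrap estimates of one parameter.
rng = random.Random(0)
estimates = [rng.expovariate(1.0) for _ in range(1000)]

sym_lo, sym_hi = symmetric_ci(estimates)
mw_lo, mw_hi = minimal_width_ci(estimates)
print("symmetric:    ", (sym_lo, sym_hi))
print("minimal width:", (mw_lo, mw_hi))
```

For a right-skewed sample like this one, the minimal-width interval sits closer to the lower end of the distribution and is no wider than the equal-tailed one.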
- plot_bootstrap(param_idx=0, param_name='gamma', bins=30, output_file=None, show=True)
Plot the bootstrap distribution for a specified parameter.
Parameters
- param_idx : int, optional
Index of the parameter to plot (0 for the first parameter, 1 for the second, etc.). Default is 0.
- bins : int, optional
Number of bins to use for the histogram. Default is 30.
Notes
This method generates a histogram of the bootstrap estimates for the specified parameter and overlays the mean and confidence interval.
- run_bootstrap(num_bootstraps=100, set_seeds=False, max_workers=1)
Run the bootstrap resampling procedure in parallel.
This method resamples the block maxima dataset, estimates the MLE parameters for each bootstrap sample, and computes summary statistics (mean and standard deviation) of the bootstrap estimates. The computation is parallelized using ProcessPoolExecutor with an adjustable number of worker processes.
Parameters
- num_bootstraps : int, optional
Number of bootstrap samples to generate. Default is 100.
- set_seeds : bool, optional
If True, sets the random seed for reproducibility in each bootstrap iteration. Default is False.
- max_workers : int, optional
Maximum number of worker processes to use for parallelization. Default is 1 (no parallelism). Set to None to use all available CPU cores.
Returns
- None
Results are stored in the values attribute and summary statistics in the statistics attribute.
Example
>>> bootstrap.run_bootstrap(num_bootstraps=500, set_seeds=True, max_workers=4)
>>> bootstrap.statistics['mean']  # Access the mean of bootstrap estimates
>>> bootstrap.statistics['std']  # Access the standard deviation of bootstrap estimates
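The resample, estimate, summarize loop that run_bootstrap describes can be illustrated without xtremes. The sketch below is an assumption-laden simplification: `bootstrap_estimates` is a hypothetical helper, the sample mean stands in for the MLE step, and the block maxima are resampled as independent draws, which does not reproduce the circmax-based scheme used for sliding blocks.

```python
import random
import statistics

def bootstrap_estimates(block_maxima, estimator, num_bootstraps=100, seed=None):
    """Apply `estimator` to `num_bootstraps` resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(block_maxima)
    return [estimator(rng.choices(block_maxima, k=n)) for _ in range(num_bootstraps)]

# Hypothetical block maxima; statistics.mean is a placeholder for the MLE.
maxima = [2.1, 3.4, 1.9, 5.0, 2.7, 4.2, 3.1, 2.5]
values = bootstrap_estimates(maxima, statistics.mean, num_bootstraps=500, seed=0)

# Summary statistics analogous to the 'mean' and 'std' entries described above.
summary = {"mean": statistics.mean(values), "std": statistics.stdev(values)}
print(summary)
```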
Functions
The circmax Function
- xtremes.bootstrap.circmax(sample, bs=10, stride='DBM')
Extract the block maxima (BM) from a given sample using different stride methods.
Parameters
- sample : numpy.ndarray
A 1D array containing the sample from which block maxima will be extracted.
- bs : int, optional
The block size (number of observations per block) used to divide the sample for block maxima extraction. Default is 10.
- stride : {‘DBM’, ‘SBM’}, optional
The stride method used for extracting block maxima. Default is ‘DBM’.
  - ‘DBM’ (Disjoint Block Maxima): Extracts maxima from non-overlapping blocks.
  - ‘SBM’ (Sliding Block Maxima): Extracts maxima using overlapping blocks.
Returns
- numpy.ndarray
A 1D or 2D array containing the block maxima extracted from the sample. The result depends on the stride method used:
  - For ‘DBM’, returns a 1D array of block maxima.
  - For ‘SBM’, returns a 2D array where each row contains the block maxima extracted from overlapping blocks.
Raises
- ValueError
If an invalid stride method is specified.
Notes
‘DBM’ (Disjoint Block Maxima) extracts block maxima from non-overlapping blocks of size bs.
‘SBM’ (Sliding Block Maxima) creates overlapping blocks, effectively increasing the number of block maxima compared to ‘DBM’.
In the ‘SBM’ setting, the circmax() method introduced by Bücher and Staud (2024) is used.
References
Bücher, A., & Staud, T. (2024). Bootstrapping Estimators based on the Block Maxima Method. arXiv preprint arXiv:2409.05529.
Example
>>> sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> circmax(sample, bs=5, stride='DBM')
array([5, 10])
>>> circmax(sample, bs=3, stride='SBM')
array([[3, 6, 9], [4, 7, 10]])
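To make the difference between disjoint and overlapping blocks concrete, here is a minimal plain-Python sketch. Both helper names are hypothetical and not part of xtremes. The sliding version is the textbook sliding-window definition, not the circmax() construction of Bücher and Staud (2024), so its output is a flat list rather than the 2D array documented above. Dropping an incomplete trailing block in the disjoint case is also an assumption.

```python
def disjoint_block_maxima(sample, bs):
    """Maxima of consecutive non-overlapping blocks of length bs.

    Assumption: an incomplete trailing block is dropped.
    """
    n_blocks = len(sample) // bs
    return [max(sample[i * bs:(i + 1) * bs]) for i in range(n_blocks)]

def sliding_block_maxima(sample, bs):
    """Maxima of every window of length bs, advanced one observation at a time."""
    return [max(sample[i:i + bs]) for i in range(len(sample) - bs + 1)]

sample = list(range(1, 11))
dbm = disjoint_block_maxima(sample, bs=5)  # [5, 10]
sbm = sliding_block_maxima(sample, bs=3)   # [3, 4, 5, 6, 7, 8, 9, 10]
print(dbm)
print(sbm)
```

With n observations, the disjoint scheme yields n // bs maxima, whereas the sliding scheme yields n - bs + 1, which is why overlapping blocks produce more (strongly dependent) maxima from the same sample.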
The uniquening Function
- xtremes.bootstrap.uniquening(circmaxs, stride='DBM')
Identify unique values and their counts from a list of arrays.
Parameters
- circmaxs : numpy.ndarray
A NumPy array containing block maxima values extracted from a sample.
Returns
- list of tuples
A list where each element is a tuple containing two NumPy arrays:
  - The first array contains the unique values from the corresponding row in circmaxs.
  - The second array contains the counts of each unique value.
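The return structure described above, one (values, counts) pair per row, can be illustrated with the standard library. This sketch uses a hypothetical helper `unique_with_counts` and plain lists instead of NumPy arrays; it shows the shape of the result rather than the library's implementation.

```python
from collections import Counter

def unique_with_counts(rows):
    """Return one (unique_values, counts) pair per row, values in sorted order."""
    result = []
    for row in rows:
        counts = Counter(row)
        values = sorted(counts)
        result.append((values, [counts[v] for v in values]))
    return result

rows = [[1, 2, 2, 3], [2, 3, 3, 4]]
pairs = unique_with_counts(rows)
print(pairs)  # [([1, 2, 3], [1, 2, 1]), ([2, 3, 4], [1, 2, 1])]
```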
The Bootstrap Function
- xtremes.bootstrap.Bootstrap(xx)
Generate a bootstrap sample by resampling with replacement from the input data.
Parameters
- xx : list or numpy.ndarray
The input sample to resample from.
Returns
- list
A new sample of the same size, created by randomly selecting elements from xx with replacement.
Notes
This function creates a bootstrap sample, which is commonly used in statistical resampling methods to estimate the variability of a statistic.
Example
>>> sample = [1, 2, 3, 4, 5]
>>> Bootstrap(sample)
[2, 5, 3, 1, 2]  # Example output, actual result may vary
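Resampling with replacement can be written in a single standard-library call. In the sketch below, `resample_with_replacement` is a hypothetical name, and the optional `rng` argument is an addition for reproducibility that the documented Bootstrap(xx) signature does not have.

```python
import random

def resample_with_replacement(xx, rng=None):
    """Draw len(xx) elements from xx uniformly at random, with replacement."""
    if rng is None:
        rng = random.Random()
    items = list(xx)
    return rng.choices(items, k=len(items))

sample = [1, 2, 3, 4, 5]
draw = resample_with_replacement(sample, rng=random.Random(42))
print(draw)
```

Because elements are drawn with replacement, some observations typically appear more than once in a draw while others are left out.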
The aggregate_boot Function
- xtremes.bootstrap.aggregate_boot(boot_samp, stride='DBM')
Aggregate counts of unique values from a list of tuples containing values and their counts.
Parameters
- boot_samp : list of tuples
Each tuple contains two arrays: the first with values and the second with corresponding counts.
Returns
- numpy.ndarray
A 2D array with two columns: the first column contains unique values, and the second column contains the aggregated counts.
Example
>>> boot_samp = [(np.array([1, 2, 3]), np.array([1, 1, 2])), (np.array([2, 3]), np.array([2, 1]))]
>>> aggregate_boot(boot_samp)
array([[1, 1], [2, 3], [3, 3]])
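The aggregation in the example above amounts to summing the counts of equal values across all tuples. A minimal sketch of that idea follows, using the hypothetical helper `aggregate_counts` and plain lists in place of NumPy arrays.

```python
from collections import Counter

def aggregate_counts(boot_samp):
    """Sum the counts of equal values across all (values, counts) pairs."""
    total = Counter()
    for values, counts in boot_samp:
        for value, count in zip(values, counts):
            total[value] += count
    # One [value, aggregated_count] row per distinct value, sorted by value.
    return [[value, total[value]] for value in sorted(total)]

boot_samp = [([1, 2, 3], [1, 1, 2]), ([2, 3], [2, 1])]
aggregated = aggregate_counts(boot_samp)
print(aggregated)  # [[1, 1], [2, 3], [3, 3]]
```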
The bootstrap_worker Function
- xtremes.bootstrap.bootstrap_worker(args)
Auxiliary function to perform a single bootstrap resampling and MLE estimation.
This function is designed to be used in parallelized bootstrap procedures. It takes arguments for a single bootstrap iteration, performs resampling on the given block maxima, estimates MLE parameters using the specified distribution type, and returns the results.
Parameters
- args : tuple
A tuple containing the following elements:
  - idx (int): The iteration index, used for setting the random seed if set_seeds is True.
  - set_seeds (bool): Whether to set the random seed for reproducibility.
  - circmaxs (list or numpy.ndarray): The block maxima dataset to be resampled.
  - aggregate_boot (callable): A function to aggregate the resampled data.
  - ML_estimators_data (callable): A function or class to compute MLE parameters on the aggregated data.
  - dist_type (str): The distribution type for MLE estimation (‘Frechet’ or ‘GEV’).
Returns
- numpy.ndarray
The MLE parameter estimates for the current bootstrap sample.
Notes
This function is designed to be compatible with ProcessPoolExecutor or other parallel processing tools.
The random seed is set per iteration to ensure reproducibility when set_seeds is True.
Example
>>> args = (0, True, circmaxs, aggregate_boot, ML_estimators_data, 'GEV')
>>> bootstrap_worker(args)
array([param1, param2, param3])  # Example output for GEV distribution
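The pattern of one args tuple per iteration with a per-iteration seed can be sketched as follows. Several things here are assumptions made for illustration: `worker` is a hypothetical function with a shorter args tuple than bootstrap_worker, the sample mean replaces the MLE step, and ThreadPoolExecutor is swapped in for ProcessPoolExecutor so the snippet runs without a `__main__` guard (both executors share the same `map` interface).

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

def worker(args):
    """Run one bootstrap iteration described by a single args tuple."""
    idx, set_seeds, block_maxima, estimator = args
    # A private generator per task: seeded by the iteration index when
    # reproducibility is requested, seeded from OS entropy otherwise.
    rng = random.Random(idx) if set_seeds else random.Random()
    resample = rng.choices(block_maxima, k=len(block_maxima))
    return estimator(resample)

# Hypothetical block maxima; statistics.mean is a placeholder estimator.
maxima = [2.1, 3.4, 1.9, 5.0, 2.7, 4.2, 3.1, 2.5]
tasks = [(i, True, maxima, statistics.mean) for i in range(20)]

with ThreadPoolExecutor(max_workers=4) as pool:
    first_run = list(pool.map(worker, tasks))
with ThreadPoolExecutor(max_workers=4) as pool:
    second_run = list(pool.map(worker, tasks))

print(first_run == second_run)
```

Seeding by iteration index rather than once globally keeps each result independent of how tasks are scheduled across workers, so repeated runs agree element by element.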
Examples
Here are some examples of how to use the xtremes.bootstrap module:
FullBootstrap Class:
import numpy as np
from xtremes.bootstrap import FullBootstrap

sample = np.random.rand(100)
bootstrap = FullBootstrap(sample, bs=10, stride='DBM', dist_type='Frechet')
bootstrap.run_bootstrap(num_bootstraps=100)
print("Mean of bootstrap estimates:", bootstrap.statistics['mean'])
print("Standard deviation of bootstrap estimates:", bootstrap.statistics['std'])
circmax Function:
import numpy as np
from xtremes.bootstrap import circmax

sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
block_maxima = circmax(sample, bs=5, stride='DBM')
print("Block Maxima (DBM):", block_maxima)
block_maxima = circmax(sample, bs=3, stride='SBM')
print("Block Maxima (SBM):", block_maxima)
uniquening Function:
import numpy as np
from xtremes.bootstrap import uniquening

circmaxs = np.array([[1, 2, 2, 3], [2, 3, 3, 4]])
unique_values = uniquening(circmaxs)
print("Unique values and counts:", unique_values)
Bootstrap Function:
from xtremes.bootstrap import Bootstrap

sample = [1, 2, 3, 4, 5]
bootstrap_sample = Bootstrap(sample)
print("Bootstrap sample:", bootstrap_sample)
aggregate_boot Function:
import numpy as np
from xtremes.bootstrap import aggregate_boot

boot_samp = [(np.array([1, 2, 3]), np.array([1, 1, 2])), (np.array([2, 3]), np.array([2, 1]))]
aggregated_counts = aggregate_boot(boot_samp)
print("Aggregated counts:", aggregated_counts)
References
Bücher, A., & Staud, T. (2024). Bootstrapping Estimators based on the Block Maxima Method. arXiv preprint arXiv:2409.05529.