Reference: topt

This module is specialized in implementing Top-t based Maximum Likelihood Estimators.

Overview

The xtremes.topt module provides tools for analyzing higher order statistics and their influence on Maximum Likelihood estimations. It includes classes and functions for handling time series data, extracting block maxima, and performing statistical analysis.

Classes

The TimeSeries Class and its Functionalities

The TimeSeries class is used to handle and manipulate time series data. It provides methods for extracting block maxima and high order statistics.

class xtremes.topt.TimeSeries(n, distr='GEV', correlation='IID', modelparams=[0], ts=0)

Bases: object

TimeSeries class for simulating and analyzing time series data with optional correlation structures.

This class is designed to simulate time series data based on specified distributions and correlation types. It also provides methods to extract block maxima and high order statistics from the simulated data.

Parameters

nint

The length of the time series.

distrstr, optional

The distribution to simulate the time series data from. Default is ‘GEV’ (Generalized Extreme Value).

correlationstr, optional

The type of correlation structure. Options include [‘IID’, ‘ARMAX’, ‘AR’]. Default is ‘IID’ (independent and identically distributed).

modelparamslist, optional

The parameters of the specified distribution. Default is [0].

tsfloat, optional

A parameter for controlling the time series characteristics, particularly for correlated series (e.g., in AR models). Must be in the range [0, 1]. Default is 0.

Attributes

valueslist

A list to store the simulated time series data.

distrstr

The type of distribution used for generating the time series data.

corrstr

The type of correlation structure applied to the time series.

modelparamslist

The parameters of the chosen distribution model.

tsfloat

A parameter controlling the correlation or time series structure, if applicable.

lenint

The length of the time series.

blockmaximalist

A list storing block maxima extracted from the simulated data.

high_order_statslist

A list storing the high order statistics extracted from the simulated data.

Methods

simulate(rep=10, seeds=’default’):

Simulates time series data based on the given distribution and correlation type.

get_blockmaxima(block_size=2, stride=’DBM’, rep=10):

Extracts block maxima from the simulated time series data.

get_HOS(orderstats=2, block_size=2, stride=’DBM’, rep=10):

Extracts high order statistics from the simulated time series data.

Examples

>>> # Create a TimeSeries object
>>> ts = TimeSeries(n=100, distr='GEV', correlation='ARMAX', modelparams=[0.5], ts=0.6)
>>> # Simulate time series data with 5 repetitions and specific seeds
>>> ts.simulate(rep=5, seeds=[42, 123, 456, 789, 1011])
>>> # Extract block maxima using a block size of 5
>>> ts.get_blockmaxima(block_size=5, stride='DBM', rep=5)
>>> # Extract high order statistics (order 3) using the same block size and stride
>>> ts.get_HOS(orderstats=3, block_size=5, stride='DBM', rep=5)
get_ABM_weights(recursively=True)

“Computes weights for MLE with ABM stride

get_HOS(orderstats=2, block_size=2, stride='DBM', rep=10)

Extract high order statistics from simulated time series data.

High order statistics include values such as the second-largest value within each block of the time series.

Parameters

orderstatsint, optional

The order of statistics to extract. Default is 2 (i.e., second-highest value).

block_sizeint, optional

The size of blocks from which to extract the statistics. Default is 2.

stridestr, optional

The stride or step type used for block extraction. Choose from [‘SBM’, ‘DBM’]. Default is ‘DBM’.

repint, optional

The number of repetitions for extracting high order statistics. Should match the number of simulations. Default is 10.

get_blockmaxima(block_size=2, stride='DBM', rep=10)

Extract block maxima from simulated time series data.

Block maxima are the maximum values extracted from blocks of the time series data.

Parameters

block_sizeint, optional

The size of blocks from which to extract maxima. Default is 2.

stride{str, int}, optional

The stride or step type used for block extraction. Choose from [‘SBM’ (Sliding Block Maxima), ‘DBM’ (Disjoint Block Maxima)] or specify an integer as a step size. Default is ‘DBM’.

repint, optional

The number of repetitions for maxima extraction. This should match the number of simulations. Default is 10.

plot(rep=1, filename=None)

Plot the simulated time series data.

This method generates a plot showing the simulated time series data. The user can choose to display data from specific repetitions or all repetitions.

Parameters

repint or list, optional

The repetition number(s) to plot. If 0, all repetitions are plotted. If a list is provided, only the specified repetitions are plotted. Default is 1.

filenamestr, optional

The name of the PNG file to save the plot. If None, the plot is displayed but not saved. Default is None.

plot_blockmaxima(rep=1, filename=None, plot_type='line')

Plot the extracted block maxima.

This method generates a plot showing the extracted block maxima from the simulated time series data. The user can choose to display block maxima from specific repetitions or all repetitions. The plot can be either a line plot or a histogram.

Parameters

repint or list, optional

The repetition number(s) to plot. If 0, all repetitions are plotted. If a list is provided, only the specified repetitions are plotted. Default is 1.

filenamestr, optional

The name of the PNG file to save the plot. If None, the plot is displayed but not saved. Default is None.

plot_typestr, optional

The type of plot to generate. Options are ‘line’ for a line plot and ‘hist’ for a histogram. Default is ‘line’.

plot_with_blockmaxima(rep=1, plotlim=300, filename=None)

Plot the simulated time series data along with block maxima.

This method generates a plot showing the simulated time series data along with the block maxima. The user can choose to display data from specific repetitions or all repetitions.

Parameters

repint, optional

The repetition number to plot. Default is 1.

plotlimint, optional

The limit for the number of data points to plot. Default is 300.

filenamestr, optional

The name of the PNG file to save the plot. If None, the plot is displayed but not saved. Default is None.

simulate(rep=10, seeds='default')

Simulate time series data.

This method generates time series data based on the specified distribution, correlation type, and other parameters.

Parameters

repint, optional

The number of repetitions (simulations) to run. Default is 10.

seeds{str, list}, optional

Seed(s) for random number generation. If ‘default’, a sequence of seeds is automatically generated based on the number of repetitions. If a list of seeds is provided, it must match the number of repetitions (rep). Default is ‘default’.

Raises

ValueError

If the number of provided seeds does not match the number of repetitions.

Examples

  1. Extract Block Maxima:

    from xtremes.topt import TimeSeries
    import numpy as np
    
    ts_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
    ts = TimeSeries(ts_data)
    block_maxima = ts.extract_BM(block_size=5, stride='DBM')
    print("Block Maxima:", block_maxima)
    
  2. Extract High Order Statistics:

    from xtremes.topt import TimeSeries
    import numpy as np
    
    ts_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
    ts = TimeSeries(ts_data)
    high_order_stats = ts.extract_HOS(orderstats=3, block_size=5, stride='DBM')
    print("High Order Statistics:", high_order_stats)
    

The HighOrderStats Class and its Functionalities

The HighOrderStats class is used to compute and analyze higher order statistics from the time series data. It provides methods for calculating log-likelihoods and performing Maximum Likelihood Estimation (MLE).

class xtremes.topt.HighOrderStats(TimeSeries)

Bases: object

HighOrderStats class for calculating and analyzing high-order statistics of time series data.

Notes

This class provides functionality for calculating Probability Weighted Moment (PWM) estimators and Maximum Likelihood (ML) estimators from a given TimeSeries object.

Methods

get_PWM_estimation():

Calculate the PWM estimators for the time series data.

get_ML_estimation(initParams=’auto’, FrechetOrGEV = ‘Frechet’, option=1, estimate_pi=False):

Calculate the ML estimators for the time series data.

Attributes

TimeSeriesTimeSeries

The TimeSeries object containing the time series data.

high_order_statslist

List of high-order statistics extracted from the TimeSeries object.

blockmaximalist

List of block maxima derived from the high-order statistics.

gamma_truefloat

True gamma parameter of the GEV distribution derived from the TimeSeries object.

PWM_estimatorsPWM_estimators

Instance of PWM_estimators class for calculating PWM estimators.

ML_estimatorsML_estimators

Instance of ML_estimators class for calculating ML estimators.

Example

>>> # Create a TimeSeries object
>>> ts = TimeSeries(n=100, distr='GEV', correlation='ARMAX', modelparams=[0.5], ts=0.6)
>>> # Simulate time series data
>>> ts.simulate(rep=5, seeds=[42, 123, 456, 789, 1011])
>>> # Initialize HighOrderStats object
>>> hos = HighOrderStats(ts)
>>> # Calculate PWM estimators
>>> hos.get_PWM_estimation()
>>> # Calculate ML estimators
>>> hos.get_ML_estimation(initParams='auto', r=1)
get_ML_estimation(initParams='auto', r=None, FrechetOrGEV='Frechet')

Calculate the Maximum Likelihood (ML) estimators.

Parameters

initParamsstr or array-like, optional

Method for initializing parameters. Default is ‘auto’, which uses automatic parameter initialization.

rint, optional

Number of order statistics to calculate the log-likelihood on. If not specified, use all provided.

FrechetOrGEVstr, optional

Whether to fit the Frechet or GEV distribution.

Notes

This function performs maximum likelihood estimation based on either the Frechet or GEV distribution.

get_PWM_estimation()

Calculate the Probability Weighted Moment (PWM) estimators.

Examples

  1. Log Likelihood:

    from xtremes.topt import HighOrderStats
    import numpy as np
    
    hos_data = np.array([[0.1, 0.2], [0.3, 0.4], [0.2, 0.5], [0.4, 0.6]])
    hos = HighOrderStats(hos_data)
    log_likelihood = hos.log_likelihood(gamma=0.5, mu=0, sigma=2, r=2)
    print("Log Likelihood:", log_likelihood)
    
  2. Frechet Log Likelihood:

    from xtremes.topt import HighOrderStats
    import numpy as np
    
    hos_data = np.array([[0.5, 1.0], [1.5, 2.0], [1.2, 2.2], [2.0, 3.0]])
    hos = HighOrderStats(hos_data)
    frechet_log_likelihood = hos.Frechet_log_likelihood(alpha=2, sigma=1.5, r=2)
    print("Frechet Log Likelihood:", frechet_log_likelihood)
    

The Data Class and its Functionalities

The Data class is used to handle and manipulate real data for analysis.

class xtremes.topt.Data(values)

Bases: object

A class for analyzing data with block maxima and high order statistics.

This class provides methods to extract block maxima and high order statistics from a given dataset. It also supports Maximum Likelihood (ML) estimation for the parameters of either the Frechet or GEV distributions.

Parameters

valueslist or numpy.ndarray

The dataset from which block maxima and high order statistics are extracted. This should be a 1D array or list of values representing the time series or data.

Attributes

valueslist or numpy.ndarray

The dataset on which operations are performed.

lenint

The length of the dataset.

blockmaximalist

List to store the block maxima extracted from the dataset.

bm_indiceslist

List of indices corresponding to the positions of the block maxima in the original dataset.

high_order_statslist

List to store the high order statistics extracted from the dataset.

ML_estimatorsML_estimators_data

Object that stores and handles the ML estimation results for the Frechet or GEV parameters.

Methods

get_blockmaxima(block_size=2, stride=’DBM’):

Extracts block maxima from the dataset.

get_HOS(orderstats=2, block_size=2, stride=’DBM’):

Extracts high order statistics from the dataset.

get_ML_estimation(r=None, FrechetOrGEV=’GEV’):

Computes ML estimations for the Frechet or GEV parameters.

Example

>>> # Initialize the Data object with a dataset
>>> data = Data([2.5, 3.1, 1.7, 4.6, 5.3, 2.2, 6.0])
>>> # Extract block maxima using a block size of 2
>>> data.get_blockmaxima(block_size=2, stride='DBM')
>>> print(data.blockmaxima)
>>> # Extract second-highest order statistics (HOS) with the same block size and stride
>>> data.get_HOS(orderstats=2, block_size=2, stride='DBM')
>>> print(data.high_order_stats)
>>> # Perform ML estimation using the extracted HOS, choosing between Frechet and GEV
>>> data.get_ML_estimation(FrechetOrGEV='GEV')
>>> print(data.ML_estimators.values)
get_ABM_weights(recursively=True)

“Computes weights for MLE with ABM stride

get_HOS(orderstats=2, block_size=2, stride='DBM')

Extract high order statistics (HOS) from the dataset.

High order statistics refer to statistics of interest beyond the maximum (e.g., second-highest, third-highest). This method extracts these statistics from blocks of the dataset.

Parameters

orderstatsint, optional

The order of the statistic to extract (e.g., 2 for second-highest). Default is 2.

block_sizeint, optional

The size of blocks from which to extract the statistics. Default is 2.

stridestr or int, optional

The type of stride to use when extracting blocks. Choose from: - ‘SBM’: Sliding Block Maxima (step size 1) - ‘DBM’: Disjoint Block Maxima (non-overlapping blocks) - int: Specifies the step size directly. Default is ‘DBM’.

Returns

None

The high order statistics are stored in the high_order_stats attribute.

get_ML_estimation(r=None, FrechetOrGEV='GEV', bounds=None)

Compute Maximum Likelihood (ML) estimations for the Frechet or GEV parameters.

This method computes ML estimators for the parameters of either the Frechet or GEV distribution based on the high order statistics extracted from the data. If no high order statistics are available, it will first extract them.

Parameters

rint, optional

The number of order statistics to use in the ML estimation. If not specified, it uses all the extracted statistics.

FrechetOrGEVstr, optional

The type of distribution to use for the ML estimation. Choose between ‘Frechet’ and ‘GEV’. Default is ‘GEV’.

Returns

None

The ML estimators are stored in the ML_estimators attribute.

get_blockmaxima(block_size=2, stride='DBM')

Extract block maxima from the dataset.

Block maxima are the largest values in each block of the dataset, where the block size and the stride (step size) determine how the blocks are divided.

Parameters

block_sizeint, optional

The size of blocks for maxima extraction. Default is 2.

stridestr or int, optional

The type of stride to use when extracting blocks. Choose from: - ‘SBM’: Sliding Block Maxima (step size 1) - ‘DBM’: Disjoint Block Maxima (non-overlapping blocks) - int: Specifies the step size directly. Default is ‘DBM’.

Returns

None

The block maxima and their corresponding indices are stored in the blockmaxima and bm_indices attributes.

The PWM_estimators Class

The PWM_estimators class is used to compute Probability Weighted Moment (PWM) estimators.

class xtremes.topt.PWM_estimators(TimeSeries)

Bases: object

Calculates Probability Weighted Moment (PWM) estimators from block maxima and computes statistics and confidence intervals for the GEV parameters.

Notes

This class provides methods to compute the Probability Weighted Moment (PWM) estimators and convert them into Generalized Extreme Value (GEV) distribution parameters (gamma, mu, sigma). It also allows users to compute confidence intervals for the estimated parameters and assess the statistical properties (mean, variance, bias, mean squared error) of the estimators compared to a true gamma value. Visualization options are provided to plot the estimators and confidence intervals.

Parameters

param TimeSeries:

TimeSeries object TimeSeries object containing block maxima or high order statistics.

Attributes

attribute blockmaxima:

numpy.ndarray Array of block maxima extracted from the TimeSeries object.

attribute values:

numpy.ndarray Array containing the PWM estimators (\(\gamma, \mu, \sigma\)) for each block maxima series.

attribute statistics:

dict Dictionary containing statistics (mean, variance, bias, mse) and confidence intervals of the PWM estimators.

Methods

method get_PWM_estimation():

Compute PWM estimators for each block maxima series and convert them into GEV parameters.

method get_statistics(gamma_true):

Compute statistics of the PWM estimators using a true gamma value.

method get_CIs(alpha=0.05, method=’symmetric’):

Compute confidence intervals (CIs) for the GEV parameters using either symmetric quantiles or minimal width intervals.

method plot(param=’gamma’, show_CI=True, show_true=True, filename=None):

Plot the PWM estimators for the GEV parameters (gamma, mu, sigma) with options to display confidence intervals and save the plot as an image.

Raises

raises ValueError:

If block maxima or high order statistics are not available in the TimeSeries object.

Examples

>>> from TimeSeries import TimeSeries
>>> from PWM_estimators import PWM_estimators
>>> ts = TimeSeries(data)  # Initialize TimeSeries object with data
>>> ts.get_blockmaxima(block_size=10, stride='SBM')  # Extract block maxima
>>> pwm = PWM_estimators(ts)  # Initialize PWM_estimators object
>>> pwm.get_PWM_estimation()  # Compute PWM estimators
>>> pwm.get_statistics(0.1)  # Compute statistics with true gamma value 0.1
>>> pwm.get_CIs(alpha=0.05, method='minimal_width')  # Compute confidence intervals
>>> pwm.plot(param='gamma', show_CI=True, show_true=True, filename='PWM_plot.png')  # Plot the results and save as image
>>> print(pwm.statistics)  # Print computed statistics
{'gamma_mean': 0.25, 'gamma_variance': 0.005, 'gamma_bias': 0.02, 'gamma_mse': 0.01,
 'mu_mean': 1.15, 'mu_variance': 0.04, 'sigma_mean': 0.9, 'sigma_variance': 0.02,
 'gamma_CI': (0.2, 0.5), 'mu_CI': (1.1, 1.5), 'sigma_CI': (0.8, 1.0)}
get_CIs(alpha=0.05, method='symmetric')

Compute confidence intervals (CIs) for the GEV parameters using different methods.

Notes

This function calculates confidence intervals (CIs) for the Generalized Extreme Value (GEV) parameters (gamma, mu, and sigma) estimated from Probability-Weighted Moments (PWM). The user can choose between two methods for computing the confidence intervals:

  • ‘symmetric’: This method uses the quantiles of the distribution of parameter estimates to compute symmetric confidence intervals.

  • ‘minimal_width’: This method finds the interval with minimal width that contains the desired proportion (1 - alpha) of the sorted parameter estimates.

For each block maxima series, confidence intervals for the GEV shape (gamma), location (mu), and scale (sigma) parameters are calculated. The results are stored in the self.statistics dictionary, with keys ‘gamma_CI’, ‘mu_CI’, and ‘sigma_CI’ corresponding to the computed confidence intervals.

Parameters

param alpha:

float, optional Significance level for the confidence intervals (default is 0.05, for a 95% CI).

param method:

str, optional Method for computing the confidence intervals. Options are: - ‘symmetric’: Uses quantiles to compute symmetric CIs (default). - ‘minimal_width’: Computes the minimal width interval containing (1 - alpha) of the estimates.

Example

>>> estimator = PWM_Estimators(timeseries_data)
>>> estimator.get_PWM_estimation()
>>> estimator.get_CIs(alpha=0.05, method='minimal_width')
>>> print(estimator.statistics)
{'gamma_CI': (0.2, 0.5), 'mu_CI': (1.1, 1.5), 'sigma_CI': (0.8, 1.0)}

Returns

None

The results are stored in self.statistics, which contains the confidence intervals for each GEV parameter.

get_PWM_estimation()

Compute Probability-Weighted Moment (PWM) estimators and convert them into GEV parameters.

Notes

This function iterates over each block maxima series, computes the Probability-Weighted Moments (PWMs), and converts the PWMs into the Generalized Extreme Value (GEV) distribution parameters: shape (gamma), location (mu), and scale (sigma). The function utilizes the misc.PWM_estimation function to calculate the PWMs and then applies misc.PWM2GEV to convert these moments into GEV parameters.

The results for each block maxima series are stored in the self.values attribute, which is a NumPy array where each row corresponds to the GEV parameters [gamma, mu, sigma] for a specific block maxima series.

This function clears any previously stored values in self.values before appending new estimates.

Parameters

None

Example

>>> estimator = PWM_estimators(timeseries_data)
>>> estimator.get_PWM_estimation()
>>> print(estimator.values)
array([[0.2, 1.1, 0.8],
    [0.3, 1.2, 0.9]])

Returns

None

The results are stored in self.values, a NumPy array containing the estimated GEV parameters for each block maxima series.

get_statistics(gamma_true)

Compute statistics of the PWM estimators using the true gamma value.

Notes

This function calculates various statistical measures (mean, variance, bias, and mean squared error) for the estimated Generalized Extreme Value (GEV) shape parameter (gamma) compared to a provided true value (gamma_true). It also computes the mean and variance for the location (mu) and scale (sigma) parameters across all block maxima series.

  • Mean Squared Error (MSE): Measures the average of the squares of the differences between the estimated and true gamma values.

  • Bias: Represents the systematic deviation of the estimated gamma values from the true gamma value.

  • Variance: Describes the spread of the estimated gamma values around their mean.

If only one block maxima series is available, a warning is raised since variance cannot be computed.

The computed statistics are stored in the self.statistics dictionary with the following keys: - ‘gamma_mean’, ‘gamma_variance’, ‘gamma_bias’, ‘gamma_mse’ - ‘mu_mean’, ‘mu_variance’ - ‘sigma_mean’, ‘sigma_variance’

Parameters

param gamma_true:

float The true value of the GEV shape parameter (gamma) used to compute bias and MSE.

Example

>>> estimator = PWM_estimators(timeseries_data)
>>> estimator.get_PWM_estimation()
>>> estimator.get_statistics(gamma_true=0.2)
>>> print(estimator.statistics)
{'gamma_mean': 0.25, 'gamma_variance': 0.005, 'gamma_bias': 0.02, 'gamma_mse': 0.01,
'mu_mean': 1.15, 'mu_variance': 0.04, 'sigma_mean': 0.9, 'sigma_variance': 0.02}

Returns

None

The results are stored in self.statistics, containing the calculated statistics for gamma, mu, and sigma.

plot(param='gamma', show_CI=True, show_true=True, filename=None)

Plot the PWM estimators and confidence intervals for the GEV parameters.

Notes

This function generates a plot showing the Probability-Weighted Moment (PWM) estimators for the Generalized Extreme Value (GEV) parameters (gamma, mu, sigma) computed from block maxima. The user can choose to display confidence intervals (CIs) for each parameter.

The plot is saved as a PNG image if the save parameter is set to True.

Parameters

param param:

str, optional GEV parameter to plot (default is ‘gamma’).

param show_CI:

bool, optional Flag indicating whether to display confidence intervals (default is True).

param show:

bool, optional Flag indicating whether to display the plot.

param save:

bool, optional Flag indicating whether to save the plot as a PNG image (default is False).

param filename:

str, optional Name of the PNG file to save the plot (default is None).

Example

>>> estimator = PWM_estimators(timeseries_data)
>>> estimator.get_PWM_estimation()
>>> estimator.get_CIs(alpha=0.05, method='minimal_width')
>>> estimator.plot(show_CI=True, show_true=True, save=True, filename='PWM_estimation.png')

Returns

None

The plot is displayed in the console and saved as a PNG image if the save parameter is set to True.

xtremes.topt.automatic_parameter_initialization(PWM_estimators, corr, ts=0.5)

Automatic parameter initialization for ML estimation.

Notes

This function is designed for initializing parameters for maximum likelihood estimation (ML) in statistical models. It automatically computes the probability weighted moment (PWM) estimators and adjusts them based on the specified correlation type (‘ARMAX’, ‘IID’, etc.). The ‘ts’ parameter is used to control the strength of temporal dependence in the model.

Parameters

param PWM_estimators:

list or numpy.array Probability weighted moment estimators.

param corr:

str Correlation type for the model. Supported values are ‘ARMAX’, ‘IID’, etc.

param ts:

float, optional Time series parameter controlling the strength of temporal dependence (default is 0.5).

return:

numpy.ndarray Initial parameters for ML estimation.

See also

misc.PWM_estimation : Function for computing probability weighted moment estimators.

Examples

>>> from test_xtremes import misc
>>> PWM_est = misc.PWM_estimators(data)
>>> init_params = automatic_parameter_initialization(PWM_est, 'ARMAX', ts=0.8)

The ML_estimators, Frechet_ML_estimators and ML_estimators_data Classes

The ML_estimators, Frechet_ML_estimators, and ML_estimators_data classes are used for performing Maximum Likelihood Estimation (MLE) and handling the results.

class xtremes.topt.ML_estimators(TimeSeries)

Bases: object

Maximum Likelihood Estimators (MLE) for GEV parameters.

This class calculates Maximum Likelihood Estimators (MLE) for the parameters of the Generalized Extreme Value (GEV) distribution using the method of maximum likelihood estimation.

Parameters

TimeSeriesTimeSeries

The TimeSeries object containing the data for which MLE estimators will be calculated.

Attributes

valuesnumpy.ndarray

An array containing the MLE estimators for each set of high order statistics.

statisticsdict

A dictionary containing statistics computed from the MLE estimators.

Methods

__len__()

Returns the number of MLE estimators calculated.

get_ML_estimation(PWM_estimators=None, initParams=’auto’, option=1, estimate_pi=False)

Computes the MLE estimators for the GEV parameters.

get_statistics(gamma_true)

Computes statistics from the MLE estimators.

Examples

>>> import numpy as np
>>> from hos import TimeSeries, ML_estimators, PWM_estimators
>>> blockmaxima_data = np.random.normal(loc=10, scale=2, size=100)
>>> high_order_stats_data = np.random.normal(loc=5, scale=1, size=(100, 3))
>>> ts = TimeSeries(blockmaxima=blockmaxima_data, high_order_stats=high_order_stats_data, corr='IID', ts=0.5)
>>> pwm = PWM_estimators(ts)
>>> pwm.get_PWM_estimation()
>>> ml = ML_estimators(ts)
>>> ml.get_ML_estimation(PWM_estimators=pwm)
>>> ml.get_statistics(gamma_true=0.1)
>>> print("ML Estimators:")
>>> print(ml.values)
>>> print("\nStatistics:")
>>> print(ml.statistics)
get_CIs(alpha=0.05, method='symmetric')

Compute confidence intervals (CIs) for the GEV parameters using different methods.

Notes

This function calculates confidence intervals (CIs) for the Generalized Extreme Value (GEV) parameters (gamma, mu, and sigma) estimated from Maximum Likelihood Estimators (MLE). The user can choose between two methods for computing the confidence intervals:

  • ‘symmetric’: This method uses the quantiles of the distribution of parameter estimates to compute symmetric confidence intervals.

  • ‘minimal_width’: This method finds the interval with minimal width that contains the desired proportion (1 - alpha) of the sorted parameter estimates.

For each block maxima series, confidence intervals for the GEV shape (gamma), location (mu), and scale (sigma) parameters are calculated. The results are stored in the self.statistics dictionary, with keys ‘gamma_CI’, ‘mu_CI’, and ‘sigma_CI’ corresponding to the computed confidence intervals.

Parameters

param alpha:

float, optional Significance level for the confidence intervals (default is 0.05, for a 95% CI).

param method:

str, optional Method for computing the confidence intervals. Options are: - ‘symmetric’: Uses quantiles to compute symmetric CIs (default). - ‘minimal_width’: Computes the minimal width interval containing (1 - alpha) of the estimates.

Example

>>> estimator = ML_stimator(timeseries_data)
>>> estimator.get_PWM_estimation()
>>> estimator.get_CIs(alpha=0.05, method='minimal_width')
>>> print(estimator.statistics)
{'gamma_CI': (0.2, 0.5), 'mu_CI': (1.1, 1.5), 'sigma_CI': (0.8, 1.0)}

Returns

None

The results are stored in self.statistics, which contains the confidence intervals for each GEV parameter.

get_ML_estimation(PWM_estimators=None, initParams='auto', r=None)

Compute Maximum Likelihood (ML) estimators for each series of high order statistics within the ML_estimators class.

This method fits the Generalized Extreme Value (GEV) distribution to each series of high order statistics using Maximum Likelihood Estimation (MLE) by optimizing the log-likelihood function.

Parameters

PWM_estimatorsPWM_estimators object, optional

An object containing PWM (Probability Weighted Moments) estimators. This is used for initializing the parameters in the ML estimation if initParams is set to ‘auto’. Required if initParams is ‘auto’.

initParamsstr or numpy.ndarray, optional

Initial parameters for the ML estimation. If ‘auto’, the initial parameters will be computed automatically using the PWM_estimators object. If a NumPy array is provided, these will be used as initial parameter values. Default is ‘auto’.

rint, optional

The number of order statistics to calculate the log-likelihood on. If not specified, all provided order statistics will be used.

Returns

None

This method updates the self.values attribute of the ML_estimators object with the estimated parameters (gamma, mu, sigma) for each series of high order statistics.

Raises

ValueError

If initParams is set to ‘auto’ and no PWM_estimators are provided, a ValueError is raised.

Notes

  • This method performs Maximum Likelihood Estimation (MLE) to fit the Generalized Extreme Value (GEV) distribution to the high order statistics within the ML_estimators class.

  • The method uses optimization techniques such as Nelder-Mead (and optionally COBYLA) to minimize the negative log-likelihood.

  • If initParams is set to ‘auto’, the initial parameters for the optimization are derived using the PWM_estimators object.

  • The optimization results (gamma, mu, sigma) are stored in the self.values list for each series of high order statistics.

get_statistics(gamma_true)

Compute statistics of the ML estimators using a true \(\gamma\) value.

Parameters

param gamma_true:

float True \(\gamma\) value for calculating statistics.

plot(param='gamma', show_CI=True, show_true=True, filename=None)

Plot the ML estimators and confidence intervals for the GEV parameters.

Notes

This function generates a plot showing the Maximum Likelihood estimators for the Generalized Extreme Value (GEV) parameters (gamma, mu, sigma) computed from block maxima. The user can choose to display confidence intervals (CIs) for each parameter

The plot is saved as a PNG image if the save parameter is set to True.

Parameters

param param:

str, optional GEV parameter to plot (default is ‘gamma’).

param show_CI:

bool, optional Flag indicating whether to display confidence intervals (default is True).

param show:

bool, optional Flag indicating whether to display the plot.

param save:

bool, optional Flag indicating whether to save the plot as a PNG image (default is False).

param filename:

str, optional Name of the PNG file to save the plot (default is None).

Example

>>> estimator = ML_estimators(timeseries_data)
>>> estimator.get_ML_estimation()
>>> estimator.get_CIs(alpha=0.05, method='minimal_width')
>>> estimator.plot(show_CI=True, show_true=True, save=True, filename='PWM_estimation.png')

Returns

None

The plot is displayed in the console and saved as a PNG image if the save parameter is set to True.

class xtremes.topt.Frechet_ML_estimators(TimeSeries)

Bases: object

Maximum Likelihood Estimators (MLE) for Frechet parameters.

This class calculates Maximum Likelihood Estimators (MLE) for the parameters of the 2-parameter Frechet distribution using the method of maximum likelihood estimation on a series of high order statistics.

Parameters

TimeSeriesTimeSeries

The TimeSeries object containing the data (high order statistics) for which MLE estimators will be calculated.

Attributes

valuesnumpy.ndarray

An array containing the MLE estimators (alpha, sigma) for each set of high order statistics.

statisticsdict

A dictionary containing computed statistics from the MLE estimators such as mean, variance, bias, and MSE.

Methods

__len__()

Returns the number of MLE estimators calculated.

get_ML_estimation(PWM_estimators=None, initParams=’auto’, r=None)

Computes the MLE estimators for the Frechet parameters (alpha, sigma).

get_statistics(alpha_true)

Computes statistics (mean, variance, bias, and MSE) of the MLE estimators using a true alpha value.

Examples

>>> import numpy as np
>>> from hos import TimeSeries, Frechet_ML_estimators, PWM_estimators
>>> blockmaxima_data = np.random.normal(loc=10, scale=2, size=100)
>>> high_order_stats_data = np.random.normal(loc=5, scale=1, size=(100, 3))
>>> ts = TimeSeries(blockmaxima=blockmaxima_data, high_order_stats=high_order_stats_data, corr='IID', ts=0.5)
>>> pwm = PWM_estimators(ts)
>>> pwm.get_PWM_estimation()
>>> ml = Frechet_ML_estimators(ts)
>>> ml.get_ML_estimation(PWM_estimators=pwm)
>>> ml.get_statistics(alpha_true=0.1)
>>> print("ML Estimators:")
>>> print(ml.values)
>>> print("\nStatistics:")
>>> print(ml.statistics)
get_CIs(alpha=0.05, method='symmetric')

Compute confidence intervals (CIs) for the Frechet parameters using different methods.

Notes

This function calculates confidence intervals (CIs) for the Frechet parameters (alpha and sigma) estimated from Maximum Likelihood Estimators (MLE). The user can choose between two methods for computing the confidence intervals:

  • ‘symmetric’: This method uses the quantiles of the distribution of parameter estimates to compute symmetric confidence intervals.

  • ‘minimal_width’: This method finds the interval with minimal width that contains the desired proportion (1 - alpha) of the sorted parameter estimates.

For each block maxima series, confidence intervals for the shape (alpha) and scale (sigma) parameters are calculated. The results are stored in the self.statistics dictionary, with keys ‘gamma_CI’, ‘mu_CI’, and ‘sigma_CI’ corresponding to the computed confidence intervals.

Do not get confused with alpha being the significance level as well as the shape parameter of the Frechet distribution.

Parameters

param alpha:

float, optional Significance level for the confidence intervals (default is 0.05, for a 95% CI).

param method:

str, optional Method for computing the confidence intervals. Options are: - ‘symmetric’: Uses quantiles to compute symmetric CIs (default). - ‘minimal_width’: Computes the minimal width interval containing (1 - alpha) of the estimates.

Example

>>> estimator = ML_stimator(timeseries_data)
>>> estimator.get_PWM_estimation()
>>> estimator.get_CIs(alpha=0.05, method='minimal_width')
>>> print(estimator.statistics)

Returns

None

The results are stored in self.statistics, which contains the confidence intervals for each GEV parameter.

get_ML_estimation(PWM_estimators=None, initParams='auto', r=None, weights=None)

Compute ML estimators (alpha, sigma) for each high order statistics series using Frechet distribution.

Parameters

PWM_estimatorsPWM_estimators object, optional

PWM_estimators object containing PWM estimators for initializing parameters. Required if initParams is set to ‘auto’.

initParamsstr or numpy.ndarray, optional

Initial parameters for ML estimation. ‘auto’ to calculate automatically using PWM estimators. If a numpy array is provided, these will be used as initial parameter values. Default is ‘auto’.

rint, optional

Number of order statistics to calculate the log-likelihood on. If not specified, all provided order statistics will be used.

Returns

None

Updates the self.values attribute with the estimated parameters (alpha, sigma) for each series of high order statistics.

Raises

ValueError

If initParams is set to ‘auto’ and no PWM_estimators are provided.

get_statistics(alpha_true)

Compute statistics (mean, variance, bias, and MSE) of the ML estimators using the true \(\alpha\) value.

Parameters

alpha_truefloat

The true value of \(\alpha\) to calculate bias, MSE, and other statistics.

Returns

None

Updates the self.statistics dictionary with calculated statistics such as mean, variance, bias, and MSE for both \(\alpha\) and \(\sigma\).

Notes

The statistics include: - Mean and variance for the estimated alpha and sigma values. - Bias and mean squared error (MSE) for the alpha estimates.

plot(param='alpha', show_CI=True, show_true=True, filename=None)

Plot the ML estimators and confidence intervals for the GEV parameters.

Notes

This function generates a plot showing the Maximum Likelihood estimators for the Generalized Extreme Value (GEV) parameters (gamma, mu, sigma) computed from block maxima. The user can choose to display confidence intervals (CIs) for each parameter

The plot is saved as a PNG image if the save parameter is set to True.

Parameters

param param:

str, optional GEV parameter to plot (default is ‘gamma’).

param show_CI:

bool, optional Flag indicating whether to display confidence intervals (default is True).

param show:

bool, optional Flag indicating whether to display the plot.

param save:

bool, optional Flag indicating whether to save the plot as a PNG image (default is False).

param filename:

str, optional Name of the PNG file to save the plot (default is None).

Example

>>> estimator = PWM_estimators(timeseries_data)
>>> estimator.get_PWM_estimation()
>>> estimator.get_CIs(alpha=0.05, method='minimal_width')
>>> estimator.plot(show_CI=True, show_true=True, save=True, filename='PWM_estimation.png')

Returns

None

The plot is displayed in the console and saved as a PNG image if the save parameter is set to True.

xtremes.topt.log_likelihood(high_order_statistics, gamma=0, mu=0, sigma=1, r=None)

Calculate the GEV log likelihood based on the two highest order statistics.

Parameters

high_order_statisticsnumpy.ndarray

A 2D array where each row contains the two highest order statistics for each observation.

gammafloat, optional

The shape parameter (γ) for the Generalized Extreme Value (GEV) distribution. Default is 0.

mufloat, optional

The location parameter (μ) for the GEV distribution. Default is 0.

sigmafloat, optional

The scale parameter (σ) for the GEV distribution. Must be positive. Default is 1.

rint, optional

The number of order statistics to calculate the log-likelihood on. If not specified, it uses all provided statistics.

Returns

float

The calculated log likelihood.

Notes

  • This function computes the log likelihood using the two highest order statistics and supports both the classical Gumbel case (γ = 0) and the generalized case (γ ≠ 0).

  • The high_order_statistics array should be structured with the two highest order statistics per observation as rows.

  • The shape parameter γ controls the tail behavior of the distribution. When γ = 0, the distribution becomes the Gumbel type.

  • The r parameter controls how many order statistics are used for the likelihood calculation, typically r=2 for two order statistics.

Example

>>> hos = np.array([[0.1, 0.2], [0.3, 0.4], [0.2, 0.5], [0.4, 0.6]])
>>> log_likelihood(hos, gamma=0.5, mu=0, sigma=2, r=2)
-7.494890426732856
xtremes.topt.Frechet_log_likelihood(high_order_statistics, alpha=1, sigma=1, r=None, weights=None)

Calculate the 2-parameter Frechet log likelihood based on the highest order statistics. The calculation can be done using either the joint likelihood of the top two order statistics or the product of their marginals.

Parameters

high_order_statisticsnumpy.ndarray

A 2D array where each row contains the two highest order statistics for each observation.

alphafloat, optional

The shape parameter (α) for the Frechet distribution. Default is 1. Controls the tail behavior of the distribution.

sigmafloat, optional

The scale parameter (σ) for the Frechet distribution. Must be positive. Default is 1.

rint, optional

The number of order statistics to calculate the log-likelihood on. If not specified, it uses all provided statistics.

weightsnumpy.ndarray, optional

ABM as stride provides weights. If they are provided, use them instead of counts

Returns

float

The calculated log likelihood for the given data under the Frechet distribution.

Notes

  • This function computes the log likelihood using the two highest order statistics from each observation, with a focus on the Frechet distribution.

  • The shape parameter alpha determines the heaviness of the tail in the distribution, and the scale parameter sigma must be strictly positive.

  • The high_order_statistics array should be structured such that each row represents the two highest order statistics for an observation.

  • The function can either calculate the joint likelihood of the top two order statistics or consider the product of their marginals, depending on the values used.

Example

>>> hos = np.array([[0.5, 1.0], [1.5, 2.0], [1.2, 2.2], [2.0, 3.0]])
>>> Frechet_log_likelihood(hos, alpha=2, sigma=1.5, r=2)
-15.78467219003245

Running Extensive Simulations

The xtremes.topt module also provides functions for running extensive simulations and performing multiple MLEs.

xtremes.topt.run_ML_estimation(file, corr='IID', gamma_true=0, block_sizes=[5, 10, 15, 20, 25, 30, 35, 40, 45, 50], stride='SBM', option=1, estimate_pi=False)

Run maximum likelihood estimation (ML) for a given time series file.

Notes

This function reads a time series from a file and performs ML estimation for the specified correlation type, GEV shape parameter, block sizes, and stride type. It iterates over each block size, extracts block maxima, computes high order statistics, and performs ML estimation. The results are stored in a dictionary with the GEV shape parameter as the key and ML estimation results for each block size as the value.

Parameters

param file:

str Path to the file containing the time series data.

param corr:

str, optional Correlation type for the model (default is ‘IID’).

param gamma_true:

float, optional True value of the GEV shape parameter (default is 0).

param block_sizes:

list, optional List of block sizes for extracting block maxima (default is [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]).

param stride:

str, optional Stride type for block maxima extraction. Options are ‘SBM’ (sliding block maxima) or ‘DBM’ (default block maxima) (default is ‘SBM’).

param r:

int, optional Number of orderstatistics to calculate the log-likelihood on. If not specified, use all provided

param estimate_pi:

bool, optional Flag indicating whether to estimate the pi parameter (default is False).

Example

>>> result = run_ML_estimation("timeseries_data.pkl", corr='ARMAX', gamma_true=0.5, block_sizes=[10, 20, 30], stride='DBM', option=2, estimate_pi=True)
>>> print(result)
{0.5: {10: HighOrderStats_object, 20: HighOrderStats_object, 30: HighOrderStats_object}}

Returns

return:

dict Dictionary containing ML estimation results for each block size.

xtremes.topt.run_multiple_ML_estimations(file, corr='IID', gamma_trues=array([-0.4, -0.3, -0.2, -0.1, 0., 0.1, 0.2, 0.3, 0.4]), block_sizes=[5, 10, 15, 20, 25, 30, 35, 40, 45, 50], stride='SBM', option=1, estimate_pi=False, parallelize=False)

Run multiple maximum likelihood estimations (ML) for a range of GEV shape parameter values.

Notes

This function performs ML estimation for a range of GEV shape parameter values specified in ‘gamma_trues’ and aggregates the results into a single dictionary. It iterates over each gamma value, calls the ‘run_ML_estimation’ function, and collects the results. If ‘parallelize’ is set to True, it runs the estimations concurrently using asyncio.

Parameters

param file:

str Path to the file containing the time series data.

param corr:

str, optional Correlation type for the model (default is ‘IID’).

param gamma_trues:

numpy.ndarray, optional Array of GEV shape parameter values to perform ML estimation for (default is np.arange(-4, 5, 1)/10).

param block_sizes:

list, optional List of block sizes for extracting block maxima (default is [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]).

param stride:

str, optional Stride type for block maxima extraction. Options are ‘SBM’ (sliding block maxima) or ‘DBM’ (default block maxima) (default is ‘SBM’).

param r:

int, optional Number of orderstatistics to calculate the log-likelihood on. If not specified, use all provided

param estimate_pi:

bool, optional Flag indicating whether to estimate the pi parameter (default is False).

param parallelize:

bool, optional Flag indicating whether to parallelize the ML estimations using asyncio (default is False).

Returns

return:

dict Dictionary containing ML estimation results for each gamma value and block size.

Examples

  1. Run ML Estimation:

    from xtremes.topt import run_ML_estimation
    
    result = run_ML_estimation("timeseries_data.pkl", corr='ARMAX', gamma_true=0.5, block_sizes=[10, 20, 30], stride='DBM', option=2, estimate_pi=True)
    print(result)
    
  2. Run Multiple ML Estimations:

    from xtremes.topt import run_multiple_ML_estimations
    import numpy as np
    
    result = run_multiple_ML_estimations("timeseries_data.pkl", corr='IID', gamma_trues=np.arange(-4, 5, 1)/10, block_sizes=[10, 20, 30], stride='SBM', option=1, estimate_pi=False, parallelize=True)
    print(result)