PyStatLab

Install

git clone https://github.com/carrollstreet/PyStatLab
cd PyStatLab
pip3 install .

Update

cd PyStatLab
git pull origin main
pip3 install . --upgrade

Docs

ab_tesing

The ab_testing module is essential for:

Conducting A/B tests with robust statistical backing.
Analyzing tests where traditional assumptions (e.g., normal distribution, equal variances) do not apply.
Incorporating Bayesian approaches into A/B testing.
Comparing quantiles in large datasets.
Visualizing the results of statistical tests for better interpretation and decision-making.

Classes

ParentTestInterface

Description

Base interface for conducting statistical tests, particularly for A/B testing. Provides foundational methods and attributes for subclass implementations.

Parameters

confidence_level: The confidence level for calculating confidence intervals.
n_resamples: The number of simulations or units to be considered in the test.
random_state: The seed for the random number generator to ensure reproducibility.

Methods

__init__: Initializes the class.
__setattr__: Custom attribute setter.
_compute_confidence_bounds: Computes the confidence bounds.
_compute_ci: Computes confidence intervals for given data.
_compute_uplift: Calculates uplift.
get_test_parameters: Retrieves initial test parameters.
_get_alternative_value: Calculates p-value for two-sided tests.
_get_readable_format: Prints results in a readable format.
_metric_distributions_chart: Draws a histogram chart.
_uplift_distribtuion_chart: Draws a cumulative distribution chart.
resample: Abstract method for resampling.
compute: Abstract method for computing test results.
get_charts: Abstract method for generating charts.

BayesBeta

Description

Implements Bayesian approach to A/B testing using beta distributions. Extends ParentTestInterface with specific methods for Bayesian analysis.

Parameters

Inherits parameters from ParentTestInterface.

Methods

__init__: Constructor for the class.
resample: Generates beta distributions for analysis.
compute: Calculates statistical significance and other metrics.
get_charts: Generates and displays charts.

Bootstrap

Description

Bootstrap class for conducting statistical tests using bootstrapping. Provides resampling method to estimate the distribution of a statistic.

Parameters

Inherits parameters from ParentTestInterface.
func: Statistical function to apply to the samples.

Methods

__init__: Constructor for the Bootstrap class.
resample: Performs the resampling process.
compute: Computes statistical significance and metrics.
get_charts: Generates and displays charts.

QuantileBootstrap

Description

Efficient implementation of the quantile comparison method using bootstrap resampling for large-scale A/B testing scenarios.

Parameters

Inherits parameters from ParentTestInterface.
q: Target quantile for comparison.

Methods

__init__: Constructor for the QuantileBootstrap class.
resample: Performs resampling for quantile distribution.
compute: Computes statistical significance of quantile comparison.
get_charts: Generates and displays charts.

ResamplingTtest

Description

Conducts t-distribution-based resampling tests in A/B testing scenarios, suitable for unequal variances and smaller sample sizes.

Parameters

Inherits parameters from ParentTestInterface.

Methods

__init__: Constructor for the class.
resample: Performs resampling using a t-distribution approach.
compute: Computes statistical significance using an analytical approach.
get_charts: Generates and displays charts.

Functions

permutation_ind

Performs an independent two-sample permutation test.

g_squared

Calculates the G-squared statistic for a given contingency table.

ttest_confidence_interval

Calculates the confidence interval for the difference between means using a t-test.

test_design

The test_design module is instrumental for:

Planning and conducting A/B tests with appropriate sample sizes to detect meaningful effects.
Evaluating the effectiveness of different statistical tests for specific distributions.
Calculating effect sizes and understanding their implications in experimental design.
Controlling error rates in multiple testing situations, ensuring the reliability of results.

Classes

DurationEstimatorInterface

Description: Base class for estimating test duration and sample size.
Attributes:
- uplift: Expected effect size or uplift.
- daily_size_per_sample: Daily observations per sample.
- alpha: Significance level for hypothesis testing.
- power_threshold: Desired test power.
- n_resamples: Number of resampling iterations.
- random_state: Seed for random number generator.
Methods:
- __init__: Initializes the class with specified parameters.
- __setattr__: Customizes attribute setting, especially for uplift.
- compute_size: Computes required sample size and duration.
- _compute_pvalues: Abstract method for p-value computation.

ProportionSizeEstim

Description: Estimates sample size and duration for proportion metrics.
Inherits: DurationEstimatorInterface
Attributes:
- cr_baseline: Baseline conversion rate.
- Additional attributes inherited from DurationEstimatorInterface.
Methods:
- __init__: Initializes the class with specific parameters for proportions.
- _compute_pvalues: Computes p-values for proportion metrics.

TtestSizeEstim

Description: Estimates sample size and duration for mean metrics using T-tests.
Inherits: DurationEstimatorInterface
Attributes:
- target_sample: Target sample data for analysis.
- Additional attributes inherited from DurationEstimatorInterface.
Methods:
- __init__: Initializes the class with specific parameters for T-tests.
- _compute_pvalues: Computes p-values for mean metrics.

Functions

cohens_d

Description: Calculates Cohen's d, a measure of effect size for the difference between two means.
Parameters:
- *args: Two sample arrays or four specific values (two means and two standard deviations).
- from_samples: Flag to indicate calculation method.
Returns: Cohen's d value.

proportion_size

Description: Calculates required sample size for proportion testing.
Parameters:
- p: Baseline proportion.
- uplift: Expected uplift.
- n_comparison: Number of comparisons.
- alpha: Significance level.
- power: Desired test power.
- groups: Number of groups in the experiment.
Returns: Total sample size.

ttest_size

Description: Calculates required sample size for T-tests on mean metrics.
Parameters: Similar to proportion_size, but focused on mean metrics.
Returns: Total sample size.

expected_proportion

Description: Estimates expected proportion based on effect size.
Parameters:
- effect_size: Anticipated effect size.
- proportion_1: Proportion in the first group.
Returns: Expected proportion in the second group and uplift.

normal_1samp_size

Description: Determines sample size for estimating population mean.
Parameters:
- sigma: Population standard deviation.
- d: Desired precision level.
- confidence_level: Desired confidence level.
Returns: Required sample size.

proportion_1samp_size

Description: Determines sample size for estimating population proportion.
Parameters: Similar to normal_1samp_size, but for proportions.
Returns: Required sample size.

fwer

Description: Calculates the family-wise error rate for multiple hypothesis testing.
Parameters:
- n_comparison: Number of hypothesis tests.
- alpha: Significance level for a single comparison.
Returns: Calculated FWER.

stat_analysis

Functions

correlation_ratio

Description: Calculates the correlation ratio between a categorical variable and a continuous variable.
Parameters:
- values: Continuous data values (dependent variable).
- categories: Categories corresponding to the values (independent variable).
Returns: Calculated correlation ratio, a value between 0 and 1.

cramers_v

Description: Calculates Cramér's V statistic for association between two categorical variables.
Parameters:
- rc_table: A two-dimensional contingency table of frequencies or counts.
- observations: Handling method for small sample sizes ('raise' or 'ignore').
Returns: Dictionary with 'correlation' (Cramér's V value), 'pvalue', and 'chi2'.

robust_mean

Description: Calculates a robust mean using truncated or Winsorized mean method.
Parameters:
- data: Data for calculation.
- trunc_level: Level for truncation or Winsorization.
- type_: Type of robust mean ('truncated' or 'winsorized').
Returns: Calculated robust mean.

binom_wilson_confidence_interval

Description: Calculates the Wilson confidence interval for a binomial proportion.
Parameters:
- p: Observed proportion.
- n: Total number of observations.
- confidence_level: Desired confidence level.
Returns: Lower and upper bounds of the Wilson confidence interval.

get_lognormal_params

Description: Estimates parameters of a lognormal distribution from mean and standard deviation.
Parameters:
- mean: Mean value for normal distribution generation.
- std: Standard deviation for normal distribution generation.
Returns: Estimates of mean and standard deviation for the lognormal distribution.

jackknife_samples

Description: Generates jackknife samples by omitting each observation systematically.
Parameters:
- sample: Original sample data.
Returns: Array of jackknife samples.

jackknife_estim

Description: Performs jackknife resampling to estimate a parameter and its confidence interval.
Parameters:
- sample: Original sample data.
- func: Statistical function to apply.
- confidence_level: Confidence level for the interval calculation.
Returns: Dictionary with estimated parameter, bias, standard error, and confidence interval.

bootstrap_ci

Description: Calculates bootstrap confidence intervals for a sample statistic.
Parameters:
- sample: Original sample data.
- func: Statistical function to apply.
- confidence_level: Confidence level for the interval calculation.
- n_resamples: Number of bootstrap resamples.
- method: Bootstrap method ('percentile', 'pivotal', 'bca').
- return_dist: Flag to return bootstrap sample distribution.
- random_state: Seed for the random number generator.
Returns: Lower and upper bounds of the confidence interval (and bootstrap samples if return_dist is True).

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
pystatlab		pystatlab
README.md		README.md
pyproject.toml		pyproject.toml
setup.py		setup.py

carrollstreet/PyStatLab

Folders and files

Latest commit

History

Repository files navigation

PyStatLab

Install

Update

Docs

ab_tesing

Classes

ParentTestInterface

Description

Parameters

Methods

BayesBeta

Description

Parameters

Methods

Bootstrap

Description

Parameters

Methods

QuantileBootstrap

Description

Parameters

Methods

ResamplingTtest

Description

Parameters

Methods

Functions

permutation_ind

g_squared

ttest_confidence_interval

test_design

Classes

DurationEstimatorInterface

ProportionSizeEstim

TtestSizeEstim

Functions

cohens_d

proportion_size

ttest_size

expected_proportion

normal_1samp_size

proportion_1samp_size

fwer

stat_analysis

Functions

correlation_ratio

cramers_v

robust_mean

binom_wilson_confidence_interval

get_lognormal_params

jackknife_samples

jackknife_estim

bootstrap_ci

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages