To install:
git clone https://github.com/carrollstreet/PyStatLab
cd PyStatLab
pip3 install .

To upgrade an existing clone:
cd PyStatLab
git pull origin main
pip3 install . --upgrade
The `ab_testing` module is essential for:
- Conducting A/B tests with robust statistical backing.
- Analyzing tests where traditional assumptions (e.g., normal distribution, equal variances) do not apply.
- Incorporating Bayesian approaches into A/B testing.
- Comparing quantiles in large datasets.
- Visualizing the results of statistical tests for better interpretation and decision-making.
Base interface for conducting statistical tests, particularly for A/B testing. Provides foundational methods and attributes for subclass implementations.
- Parameters:
  - `confidence_level`: The confidence level for calculating confidence intervals.
  - `n_resamples`: The number of simulations or units to be considered in the test.
  - `random_state`: The seed for the random number generator to ensure reproducibility.
- Methods:
  - `__init__`: Initializes the class.
  - `__setattr__`: Custom attribute setter.
  - `_compute_confidence_bounds`: Computes the confidence bounds.
  - `_compute_ci`: Computes confidence intervals for given data.
  - `_compute_uplift`: Calculates uplift.
  - `get_test_parameters`: Retrieves initial test parameters.
  - `_get_alternative_value`: Calculates p-value for two-sided tests.
  - `_get_readable_format`: Prints results in a readable format.
  - `_metric_distributions_chart`: Draws a histogram chart.
  - `_uplift_distribtuion_chart`: Draws a cumulative distribution chart.
  - `resample`: Abstract method for resampling.
  - `compute`: Abstract method for computing test results.
  - `get_charts`: Abstract method for generating charts.
Implements Bayesian approach to A/B testing using beta distributions. Extends ParentTestInterface with specific methods for Bayesian analysis.
- Parameters:
  - Inherits parameters from `ParentTestInterface`.
- Methods:
  - `__init__`: Constructor for the class.
  - `resample`: Generates beta distributions for analysis.
  - `compute`: Calculates statistical significance and other metrics.
  - `get_charts`: Generates and displays charts.
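To illustrate the Beta-distribution approach described above, here is a minimal sketch that draws posterior conversion rates for two groups and estimates how often B beats A. It uses only the Python standard library and is not PyStatLab's implementation: the function name `beta_ab_probability`, the uniform Beta(1, 1) prior, and the example counts are all assumptions made for illustration.

```python
import random

def beta_ab_probability(conv_a, n_a, conv_b, n_b, n_resamples=10_000, seed=42):
    """Estimate P(rate_B > rate_A) and the mean relative uplift from Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    uplift_sum = 0.0
    for _ in range(n_resamples):
        # Posterior of each conversion rate under an assumed Beta(1, 1) prior
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
        uplift_sum += (rate_b - rate_a) / rate_a
    return wins / n_resamples, uplift_sum / n_resamples

# Hypothetical data: 100/1000 conversions in A, 150/1000 in B
prob_b_better, mean_uplift = beta_ab_probability(100, 1000, 150, 1000)
```

The share of draws in which B exceeds A plays the role of the test's significance measure, and the averaged ratio gives an uplift estimate.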
Bootstrap class for conducting statistical tests using bootstrapping. Provides resampling method to estimate the distribution of a statistic.
- Parameters:
  - Inherits parameters from `ParentTestInterface`.
  - `func`: Statistical function to apply to the samples.
- Methods:
  - `__init__`: Constructor for the Bootstrap class.
  - `resample`: Performs the resampling process.
  - `compute`: Computes statistical significance and metrics.
  - `get_charts`: Generates and displays charts.
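The general idea of a two-sample bootstrap test can be sketched in a few lines of standard-library Python. This is an illustration of the technique, not the `Bootstrap` class itself: the function name `bootstrap_diff`, its return format, and the percentile-based interval and p-value are assumptions.

```python
import random
import statistics

def bootstrap_diff(control, test, func=statistics.mean, n_resamples=2000,
                   confidence_level=0.95, seed=0):
    """Percentile bootstrap for func(test) - func(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        diffs.append(func(t) - func(c))
    diffs.sort()
    alpha = 1 - confidence_level
    lower = diffs[int(n_resamples * alpha / 2)]
    upper = diffs[int(n_resamples * (1 - alpha / 2)) - 1]
    # Two-sided p-value: twice the smaller tail share around zero
    share_below = sum(d <= 0 for d in diffs) / n_resamples
    p_value = min(1.0, 2 * min(share_below, 1 - share_below))
    return {"ci": (lower, upper), "p_value": p_value}
```

Passing a different `func` (for example `statistics.median`) changes the statistic being compared without touching the resampling loop.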
Efficient implementation of the quantile comparison method using bootstrap resampling for large-scale A/B testing scenarios.
- Parameters:
  - Inherits parameters from `ParentTestInterface`.
  - `q`: Target quantile for comparison.
- Methods:
  - `__init__`: Constructor for the QuantileBootstrap class.
  - `resample`: Performs resampling for quantile distribution.
  - `compute`: Computes statistical significance of quantile comparison.
  - `get_charts`: Generates and displays charts.
Conducts t-distribution-based resampling tests in A/B testing scenarios, suitable for unequal variances and smaller sample sizes.
- Parameters:
  - Inherits parameters from `ParentTestInterface`.
- Methods:
  - `__init__`: Constructor for the class.
  - `resample`: Performs resampling using a t-distribution approach.
  - `compute`: Computes statistical significance using an analytical approach.
  - `get_charts`: Generates and displays charts.
Performs an independent two-sample permutation test.
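For readers unfamiliar with the technique, a two-sample permutation test can be sketched as follows. The function name `permutation_test`, the difference-in-means statistic, and the add-one correction are assumptions for illustration; the library's own function may differ in signature and details.

```python
import random

def permutation_test(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    mean = lambda xs: sum(xs) / len(xs)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        extreme += abs(diff) >= abs(observed)
    # Add-one correction keeps the estimated p-value strictly positive
    return (extreme + 1) / (n_resamples + 1)
```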
Calculates the G-squared statistic for a given contingency table.
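The G-squared (likelihood-ratio) statistic is defined as G = 2 Σ O · ln(O / E), where O are the observed counts and E the counts expected under independence. A minimal sketch of that formula, with the hypothetical name `g_squared` and a plain list-of-lists table as input:

```python
import math

def g_squared(table):
    """G = 2 * sum(O * ln(O / E)) over all cells of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:  # a zero cell contributes nothing (0 * ln 0 := 0)
                g += observed * math.log(observed / expected)
    return 2 * g
```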
Calculates the confidence interval for the difference between means using a t-test.
The `test_design` module is instrumental for:
- Planning and conducting A/B tests with appropriate sample sizes to detect meaningful effects.
- Evaluating the effectiveness of different statistical tests for specific distributions.
- Calculating effect sizes and understanding their implications in experimental design.
- Controlling error rates in multiple testing situations, ensuring the reliability of results.
- Description: Base class for estimating test duration and sample size.
- Attributes:
  - `uplift`: Expected effect size or uplift.
  - `daily_size_per_sample`: Daily observations per sample.
  - `alpha`: Significance level for hypothesis testing.
  - `power_threshold`: Desired test power.
  - `n_resamples`: Number of resampling iterations.
  - `random_state`: Seed for random number generator.
- Methods:
  - `__init__`: Initializes the class with specified parameters.
  - `__setattr__`: Customizes attribute setting, especially for `uplift`.
  - `compute_size`: Computes required sample size and duration.
  - `_compute_pvalues`: Abstract method for p-value computation.
- Description: Estimates sample size and duration for proportion metrics.
- Inherits: `DurationEstimatorInterface`
- Attributes:
  - `cr_baseline`: Baseline conversion rate.
  - Additional attributes inherited from `DurationEstimatorInterface`.
- Methods:
  - `__init__`: Initializes the class with specific parameters for proportions.
  - `_compute_pvalues`: Computes p-values for proportion metrics.
- Description: Estimates sample size and duration for mean metrics using T-tests.
- Inherits: `DurationEstimatorInterface`
- Attributes:
  - `target_sample`: Target sample data for analysis.
  - Additional attributes inherited from `DurationEstimatorInterface`.
- Methods:
  - `__init__`: Initializes the class with specific parameters for T-tests.
  - `_compute_pvalues`: Computes p-values for mean metrics.
- Description: Calculates Cohen's d, a measure of effect size for the difference between two means.
- Parameters:
  - `*args`: Two sample arrays or four specific values (two means and two standard deviations).
  - `from_samples`: Flag to indicate calculation method.
- Returns: Cohen's d value.
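As a worked illustration, Cohen's d from two samples is the difference in means divided by the pooled standard deviation. The sketch below covers only the from-samples case; the name `cohens_d_from_samples` is hypothetical, and the pooled-variance form is assumed to be the variant in use.

```python
import math
import statistics

def cohens_d_from_samples(x, y):
    """Cohen's d using the pooled (n - 1 weighted) standard deviation."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(x) - statistics.mean(y)) / pooled_sd

# Means 4 and 3, both sample variances 4, so d = 1 / 2
d = cohens_d_from_samples([2, 4, 6], [1, 3, 5])
```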
- Description: Calculates required sample size for proportion testing.
- Parameters:
  - `p`: Baseline proportion.
  - `uplift`: Expected uplift.
  - `n_comparison`: Number of comparisons.
  - `alpha`: Significance level.
  - `power`: Desired test power.
  - `groups`: Number of groups in the experiment.
- Returns: Total sample size.
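For orientation, the textbook normal-approximation formula for a two-proportion test gives a per-group size of n = (z₁₋α/₂ + z_power)² · [p₁(1 − p₁) + p₂(1 − p₂)] / (p₂ − p₁)². The sketch below applies it under stated assumptions: `uplift` is treated as relative, no multiple-comparison correction is applied (so `n_comparison` is left out), and the name `proportion_sample_size` is hypothetical. The library's own calculation may differ.

```python
import math
from statistics import NormalDist

def proportion_sample_size(p, uplift, alpha=0.05, power=0.8, groups=2):
    """Total sample size across groups for detecting a relative uplift in p."""
    p2 = p * (1 + uplift)  # assumption: uplift is relative to the baseline
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p * (1 - p) + p2 * (1 - p2)
    per_group = math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p) ** 2)
    return per_group * groups
```

Because the required size scales with 1 / (p₂ − p₁)², halving the detectable uplift roughly quadruples the sample.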
- Description: Calculates required sample size for T-tests on mean metrics.
- Parameters: Similar to `proportion_size`, but focused on mean metrics.
- Returns: Total sample size.
- Description: Estimates expected proportion based on effect size.
- Parameters:
  - `effect_size`: Anticipated effect size.
  - `proportion_1`: Proportion in the first group.
- Returns: Expected proportion in the second group and uplift.
- Description: Determines sample size for estimating population mean.
- Parameters:
  - `sigma`: Population standard deviation.
  - `d`: Desired precision level.
  - `confidence_level`: Desired confidence level.
- Returns: Required sample size.
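The usual formula behind this kind of calculation is n = (z · σ / d)², where z is the normal quantile for the chosen confidence level and d is the half-width of the interval. A minimal sketch, assuming that formula applies here and using the hypothetical name `normal_one_sample_size`:

```python
import math
from statistics import NormalDist

def normal_one_sample_size(sigma, d, confidence_level=0.95):
    """Smallest n for which the interval half-width z * sigma / sqrt(n) is at most d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil((z * sigma / d) ** 2)

# sigma = 15, precision of +/- 3 at 95% confidence
n_required = normal_one_sample_size(15, 3)
```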
- Description: Determines sample size for estimating population proportion.
- Parameters: Similar to `normal_1samp_size`, but for proportions.
- Returns: Required sample size.
- Description: Calculates the family-wise error rate for multiple hypothesis testing.
- Parameters:
  - `n_comparison`: Number of hypothesis tests.
  - `alpha`: Significance level for a single comparison.
- Returns: Calculated FWER.
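For independent tests, the family-wise error rate is 1 − (1 − α)^m for m comparisons. A one-line sketch of that formula (the independence assumption and the function name `fwer` are illustrative):

```python
def fwer(n_comparison, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_comparison

# Ten comparisons at alpha = 0.05 already give roughly a 40% chance of a false positive
rate = fwer(10, 0.05)
```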
- Description: Calculates the correlation ratio between a categorical variable and a continuous variable.
- Parameters:
  - `values`: Continuous data values (dependent variable).
  - `categories`: Categories corresponding to the values (independent variable).
- Returns: Calculated correlation ratio, a value between 0 and 1.
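The correlation ratio η is commonly computed as the square root of the between-group sum of squares divided by the total sum of squares. The sketch below follows that definition; whether the library returns η or η² is not stated above, so returning η is an assumption, as is the standalone function shown here.

```python
import math
from collections import defaultdict

def correlation_ratio(values, categories):
    """eta = sqrt(SS_between / SS_total) for values grouped by category."""
    groups = defaultdict(list)
    for value, category in zip(values, categories):
        groups[category].append(value)
    grand_mean = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    return math.sqrt(ss_between / ss_total) if ss_total > 0 else 0.0
```

A value of 1 means the category fully determines the value; 0 means all group means coincide.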
- Description: Calculates Cramér's V statistic for association between two categorical variables.
- Parameters:
  - `rc_table`: A two-dimensional contingency table of frequencies or counts.
  - `observations`: Handling method for small sample sizes ('raise' or 'ignore').
- Returns: Dictionary with 'correlation' (Cramér's V value), 'pvalue', and 'chi2'.
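Cramér's V is defined as V = sqrt(χ² / (n · min(r − 1, c − 1))). The sketch below computes χ² and V only; it omits the p-value and the small-sample handling, assumes every row and column total is positive, and is an illustration rather than the library's code.

```python
import math

def cramers_v(rc_table):
    """Return chi-squared and Cramér's V for a table given as a list of rows."""
    row_totals = [sum(row) for row in rc_table]
    col_totals = [sum(col) for col in zip(*rc_table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(rc_table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return {"correlation": math.sqrt(chi2 / (n * k)), "chi2": chi2}
```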
- Description: Calculates a robust mean using truncated or Winsorized mean method.
- Parameters:
  - `data`: Data for calculation.
  - `trunc_level`: Level for truncation or Winsorization.
  - `type_`: Type of robust mean ('truncated' or 'winsorized').
- Returns: Calculated robust mean.
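The difference between the two methods is easiest to see in code: truncation drops the extreme observations, while Winsorization replaces them with the nearest retained value. This sketch assumes `trunc_level` is the share of observations affected in each tail, which the description above does not confirm.

```python
def robust_mean(data, trunc_level=0.1, type_="truncated"):
    """Truncated or Winsorized mean, cutting `trunc_level` of the data per tail."""
    x = sorted(data)
    k = int(len(x) * trunc_level)  # observations affected in each tail
    core = x[k:len(x) - k]
    if type_ == "truncated":
        kept = core
    elif type_ == "winsorized":
        # Replace the k smallest/largest values with the nearest retained value
        kept = [core[0]] * k + core + [core[-1]] * k
    else:
        raise ValueError("type_ must be 'truncated' or 'winsorized'")
    return sum(kept) / len(kept)

data = [0, 2, 3, 4, 5, 6, 7, 8, 10, 100]
truncated = robust_mean(data, 0.1, "truncated")    # mean of 2..10 without 0 and 100
winsorized = robust_mean(data, 0.1, "winsorized")  # 0 becomes 2, 100 becomes 10
```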
- Description: Calculates the Wilson confidence interval for a binomial proportion.
- Parameters:
  - `p`: Observed proportion.
  - `n`: Total number of observations.
  - `confidence_level`: Desired confidence level.
- Returns: Lower and upper bounds of the Wilson confidence interval.
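For reference, the Wilson score interval can be written out directly from its standard formula. The sketch below uses the hypothetical name `wilson_interval` and no continuity correction.

```python
import math
from statistics import NormalDist

def wilson_interval(p, n, confidence_level=0.95):
    """Wilson score interval for an observed proportion p out of n trials."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width

low, high = wilson_interval(0.5, 100)  # roughly (0.404, 0.596)
```

Unlike the simple normal interval, the bounds stay inside [0, 1] even when `p` is close to 0 or 1.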
- Description: Estimates parameters of a lognormal distribution from mean and standard deviation.
- Parameters:
  - `mean`: Mean value for normal distribution generation.
  - `std`: Standard deviation for normal distribution generation.
- Returns: Estimates of mean and standard deviation for the lognormal distribution.
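One common reading of this conversion is moment matching: given the mean m and standard deviation s of a lognormal variable, the parameters of the underlying normal are σ² = ln(1 + s²/m²) and μ = ln(m) − σ²/2. The sketch below implements that reading; whether the library's function uses exactly this convention is an assumption.

```python
import math

def lognormal_params(mean, std):
    """Return (mu, sigma) of the underlying normal via moment matching."""
    sigma2 = math.log(1 + (std / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2
    return mu, math.sqrt(sigma2)

mu, sigma = lognormal_params(10, 5)
# Round trip: exp(mu + sigma^2 / 2) recovers the requested mean
recovered_mean = math.exp(mu + sigma ** 2 / 2)
```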
- Description: Generates jackknife samples by omitting each observation systematically.
- Parameters:
  - `sample`: Original sample data.
- Returns: Array of jackknife samples.
- Description: Performs jackknife resampling to estimate a parameter and its confidence interval.
- Parameters:
  - `sample`: Original sample data.
  - `func`: Statistical function to apply.
  - `confidence_level`: Confidence level for the interval calculation.
- Returns: Dictionary with estimated parameter, bias, standard error, and confidence interval.
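The jackknife procedure can be sketched end to end: build the leave-one-out samples, apply the statistic to each, then derive bias and standard error from the replicates. The function names, dictionary keys, and the normal-approximation interval below are assumptions; the library may use different names or a t quantile.

```python
import math
import statistics
from statistics import NormalDist

def jackknife_samples(sample):
    """Return the n leave-one-out subsamples of a list."""
    return [sample[:i] + sample[i + 1:] for i in range(len(sample))]

def jackknife_estimate(sample, func=statistics.mean, confidence_level=0.95):
    sample = list(sample)
    n = len(sample)
    estimate = func(sample)
    replicates = [func(s) for s in jackknife_samples(sample)]
    mean_rep = sum(replicates) / n
    bias = (n - 1) * (mean_rep - estimate)
    std_error = math.sqrt((n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates))
    # Interval around the bias-corrected estimate, using a normal quantile (assumption)
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    corrected = estimate - bias
    return {"estimate": estimate, "bias": bias, "std_error": std_error,
            "ci": (corrected - z * std_error, corrected + z * std_error)}

result = jackknife_estimate([1, 2, 3, 4, 5])
```

For the mean, the jackknife bias is zero and the standard error equals the familiar s / sqrt(n), which makes it a convenient sanity check.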
- Description: Calculates bootstrap confidence intervals for a sample statistic.
- Parameters:
  - `sample`: Original sample data.
  - `func`: Statistical function to apply.
  - `confidence_level`: Confidence level for the interval calculation.
  - `n_resamples`: Number of bootstrap resamples.
  - `method`: Bootstrap method ('percentile', 'pivotal', 'bca').
  - `return_dist`: Flag to return bootstrap sample distribution.
  - `random_state`: Seed for the random number generator.
- Returns: Lower and upper bounds of the confidence interval (and bootstrap samples if `return_dist` is True).
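To show how the 'percentile' and 'pivotal' methods differ, here is a minimal standard-library sketch. It is not the library's function: the name `bootstrap_ci_sketch`, the `seed` argument, and the simple index-based quantiles are assumptions, and the 'bca' method is left out.

```python
import random
import statistics

def bootstrap_ci_sketch(sample, func=statistics.mean, confidence_level=0.95,
                        n_resamples=2000, method="percentile", seed=0):
    """One-sample bootstrap confidence interval (percentile or pivotal)."""
    rng = random.Random(seed)
    stats = sorted(func(rng.choices(sample, k=len(sample)))
                   for _ in range(n_resamples))
    alpha = 1 - confidence_level
    q_low = stats[int(n_resamples * alpha / 2)]
    q_high = stats[min(n_resamples - 1, int(n_resamples * (1 - alpha / 2)))]
    if method == "percentile":
        return q_low, q_high
    if method == "pivotal":
        # Reflect the bootstrap quantiles around the observed statistic
        theta = func(sample)
        return 2 * theta - q_high, 2 * theta - q_low
    raise ValueError("method must be 'percentile' or 'pivotal'")
```

The percentile interval reads the bounds straight off the bootstrap distribution, while the pivotal interval mirrors them around the point estimate, which matters when that distribution is skewed.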