DSC381

DSC-381 Simulation Work

This repository contains a collection of simulation functions, written in Python, for statistical analysis related to DSC381.

Prerequisites:

Python 3
The following packages:
- numpy
- pandas
- matplotlib
- scipy
- statsmodels

Functionality

Currently, there is one Python script (statistics_functions.py) that provides the following functions:

Hypothesis Testing

Hypothesis test for a single mean

    hyptest_singlemean(data, null_hyp = 0, n = 10000, alt = 'two-sided')
        
        Returns simulation p-value (float) for hypothesis test of a single mean.

        Parameter(s):
        data (array) is the dataset for analysis
        null_hyp (float) is the null hypothesis
        n (int) is the number of bootstrap simulations
        alt (string) defines alt hypothesis ['two-sided', 'less', 'greater']
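
To illustrate the idea behind a simulation p-value for a single mean, here is a minimal sketch. It is not the code in statistics_functions.py: the name `bootstrap_mean_pvalue` and the `seed` parameter are hypothetical, and the sketch uses only the standard library rather than the packages listed above. It assumes the common approach of shifting the sample so that its mean equals the null value, then resampling with replacement.

```python
import random
from statistics import mean


def bootstrap_mean_pvalue(data, null_hyp=0.0, n=10000, alt="two-sided", seed=0):
    """Hypothetical sketch of a bootstrap test for a single mean."""
    rng = random.Random(seed)
    data = list(data)
    observed = mean(data)
    # Shift the sample so that its mean equals the null hypothesis value.
    shifted = [x - observed + null_hyp for x in data]
    # Resample with replacement and record the mean of each resample.
    sims = [mean(rng.choices(shifted, k=len(shifted))) for _ in range(n)]
    if alt == "less":
        extreme = sum(s <= observed for s in sims)
    elif alt == "greater":
        extreme = sum(s >= observed for s in sims)
    elif alt == "two-sided":
        distance = abs(observed - null_hyp)
        extreme = sum(abs(s - null_hyp) >= distance for s in sims)
    else:
        raise ValueError("alt must be 'two-sided', 'less', or 'greater'")
    # The p-value is the share of simulated means at least as extreme
    # as the observed mean.
    return extreme / n
```

A sample whose mean is far from the null value gives a p-value near 0, while a null value equal to the sample mean gives a two-sided p-value of 1.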

Hypothesis test for a single proportion

    hyptest_singleprop(p_sample, p_null = 0.5, size = 30, n = 10000, alt = 'two-sided', disp = 'count')

        Returns simulation p-value (float) for hypothesis test of a single proportion.

        Parameter(s):
        p_sample (float) is the sample proportion
        p_null (float) is the null hypothesis proportion
        size (int) is the sample size
        n (int) is the number of bootstrap simulations
        alt (string) defines alt hypothesis ['two-sided', 'less', 'greater']
        disp (string) chooses the plot display ['count', 'prop']
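
The same idea for a proportion can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: `simulate_prop_pvalue` and `seed` are hypothetical names, plotting is left out, and each simulated sample is assumed to be `size` independent draws that succeed with probability `p_null`.

```python
import random


def simulate_prop_pvalue(p_sample, p_null=0.5, size=30, n=10000,
                         alt="two-sided", seed=0):
    """Hypothetical sketch of a simulation test for a single proportion."""
    rng = random.Random(seed)
    tol = 1e-12  # guards the comparisons against floating-point round-off
    # Each simulated proportion is the success rate of `size` draws
    # made under the null hypothesis.
    sims = [
        sum(rng.random() < p_null for _ in range(size)) / size
        for _ in range(n)
    ]
    if alt == "less":
        extreme = sum(s <= p_sample + tol for s in sims)
    elif alt == "greater":
        extreme = sum(s >= p_sample - tol for s in sims)
    elif alt == "two-sided":
        distance = abs(p_sample - p_null)
        extreme = sum(abs(s - p_null) >= distance - tol for s in sims)
    else:
        raise ValueError("alt must be 'two-sided', 'less', or 'greater'")
    return extreme / n
```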

Hypothesis test for difference in two means

    hyptest_diffmeans(data_1, data_2, n = 10000, alt = 'two-sided')
        
        Returns simulation p-value (float) for hypothesis test of a difference in two means
        Null hypothesis is defined as the two means being equal

        Parameter(s):
        data_1 (array) is the dataset for first mean
        data_2 (array) is the dataset for second mean
        n (int) is the number of bootstrap simulations
        alt (string) defines alt hypothesis ['two-sided', 'less', 'greater']
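
One common way to simulate the null hypothesis of equal means is a permutation (randomization) test, sketched below. Whether the repository's function shuffles group labels or resamples in some other way is not stated here, so treat this as an assumption; `permutation_diffmeans_pvalue` and `seed` are hypothetical names.

```python
import random
from statistics import mean


def permutation_diffmeans_pvalue(data_1, data_2, n=10000,
                                 alt="two-sided", seed=0):
    """Hypothetical sketch of a permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = mean(data_1) - mean(data_2)
    pooled = list(data_1) + list(data_2)
    k = len(data_1)
    sims = []
    for _ in range(n):
        # Under the null the group labels are exchangeable, so shuffle
        # the pooled values and split them back into two groups.
        rng.shuffle(pooled)
        sims.append(mean(pooled[:k]) - mean(pooled[k:]))
    if alt == "less":
        extreme = sum(s <= observed for s in sims)
    elif alt == "greater":
        extreme = sum(s >= observed for s in sims)
    elif alt == "two-sided":
        extreme = sum(abs(s) >= abs(observed) for s in sims)
    else:
        raise ValueError("alt must be 'two-sided', 'less', or 'greater'")
    return extreme / n
```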

Hypothesis test for difference in two proportions

    hyptest_diffprops(p_1, size_1, p_2, size_2, n = 10000, alt = 'two-sided')
        
        Returns simulation p-value (float) for hypothesis test of a difference in two proportions
        Null hypothesis is defined as the two proportions being equal

        Parameter(s):
        p_1 (float) is proportion for dataset 1
        size_1 (int) is the size of dataset 1
        p_2 (float) is the proportion for dataset 2
        size_2 (int) is the size of dataset 2
        n (int) is the number of bootstrap simulations
        alt (string) defines alt hypothesis ['two-sided', 'less', 'greater']

Hypothesis test for a slope

    hyptest_slope(features, targets, n=10000, alt='two-sided')
        
        Returns simulation p-value (float) for hypothesis test of a slope (correlation) of two datasets
        Null hypothesis is that there is no correlation between the two datasets

        Parameter(s):
        features (array) is a single dimension array of features
        targets (array) is a single dimension array of targets
        n (int) is the number of bootstrap simulations
        alt (string) defines alt hypothesis ['two-sided', 'less', 'greater']
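
A slope test can be simulated by breaking the pairing between features and targets. The sketch below assumes a least-squares slope and a permutation of the targets; both are assumptions for illustration, and `ls_slope`, `permutation_slope_pvalue`, and `seed` are hypothetical names rather than functions from statistics_functions.py.

```python
import random
from statistics import mean


def ls_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def permutation_slope_pvalue(features, targets, n=10000,
                             alt="two-sided", seed=0):
    """Hypothetical sketch of a permutation test for a slope."""
    rng = random.Random(seed)
    features = list(features)
    shuffled = list(targets)
    observed = ls_slope(features, shuffled)
    sims = []
    for _ in range(n):
        # Shuffling the targets removes any association with the features.
        rng.shuffle(shuffled)
        sims.append(ls_slope(features, shuffled))
    if alt == "less":
        extreme = sum(s <= observed for s in sims)
    elif alt == "greater":
        extreme = sum(s >= observed for s in sims)
    elif alt == "two-sided":
        extreme = sum(abs(s) >= abs(observed) for s in sims)
    else:
        raise ValueError("alt must be 'two-sided', 'less', or 'greater'")
    return extreme / n
```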

Confidence Intervals

Confidence Interval for a statistic

    ci_statistic(data, n=10000, l_tail=5, r_tail=95, stat='mean')

        Returns the confidence interval (tuple) for a statistic as (left tail, right tail, length)

        Parameter(s):
        data (array) is the dataset for analysis
        n (int) is the number of bootstrap simulations
        l_tail (float) is the cut off percentile for the left tail [0,100]
        r_tail (float) is the cut off percentile for the right tail [0,100]
        stat (string) is the statistic being analyzed ["mean","median","stdev"]
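
A percentile bootstrap interval of the kind described above can be sketched as follows. The names `bootstrap_ci`, `percentile`, and `seed` are hypothetical, the linear-interpolation percentile is an assumption, and the sketch uses only the standard library; it is not the repository's code.

```python
import random
from statistics import mean, median, stdev

STATS = {"mean": mean, "median": median, "stdev": stdev}


def percentile(sorted_vals, pct):
    """Linearly interpolated percentile of a sorted list, pct in [0, 100]."""
    pos = (len(sorted_vals) - 1) * pct / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_ci(data, n=10000, l_tail=5, r_tail=95, stat="mean", seed=0):
    """Hypothetical sketch of a percentile bootstrap confidence interval."""
    rng = random.Random(seed)
    data = list(data)
    func = STATS[stat]
    # Compute the statistic on n resamples drawn with replacement.
    sims = sorted(func(rng.choices(data, k=len(data))) for _ in range(n))
    lower = percentile(sims, l_tail)
    upper = percentile(sims, r_tail)
    # Same tuple layout as described above: (left tail, right tail, length).
    return lower, upper, upper - lower
```

With the default tails of 5 and 95, the result is a 90% interval.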

Confidence Interval for a proportion

    ci_prop(p, size, n=10000, l_tail=5, r_tail=95)

        Returns the confidence interval (tuple) for a proportion as (left tail, right tail, length)

        Parameter(s):
        p (float) is proportion for the dataset
        size (int) is the size of the dataset
        n (int) is the number of bootstrap simulations
        l_tail (float) is the cut off percentile for the left tail [0,100]
        r_tail (float) is the cut off percentile for the right tail [0,100]

Confidence Interval for difference of two proportions

    ci_diffprops(p_1, size_1, p_2, size_2, n=10000, l_tail=5, r_tail=95)

        Returns the confidence interval (tuple) for a difference in proportions as (left tail, right tail, length)

        Parameter(s):
        p_1 (float) is proportion for dataset 1
        size_1 (int) is the size of dataset 1
        p_2 (float) is the proportion for dataset 2
        size_2 (int) is the size of dataset 2
        n (int) is the number of bootstrap simulations
        l_tail (float) is the cut off percentile for the left tail [0,100]
        r_tail (float) is the cut off percentile for the right tail [0,100]

Confidence Interval for difference of two statistics

    ci_diffstatistics(data_1, data_2, n=10000, l_tail=5, r_tail=95, stat='mean')

        Returns the confidence interval for the difference in two statistics

        Parameter(s):
        data_1 (array) is the dataset for the first statistic
        data_2 (array) is the dataset for the second statistic
        n (int) is the number of bootstrap simulations
        l_tail (float) is the cut off percentile for the left tail [0,100]
        r_tail (float) is the cut off percentile for the right tail [0,100]
        stat (string) is the statistic being analyzed ["mean","median","stdev"]

Confidence Interval for a slope

    ci_slope(features, targets, n=10000, l_tail=5, r_tail=95)
        
        Returns the confidence interval for the slope of two datasets

        Parameter(s):
        features (array) is a single dimension array of features
        targets (array) is a single dimension array of targets
        n (int) is the number of bootstrap simulations
        l_tail (float) is the cut off percentile for the left tail [0,100]
        r_tail (float) is the cut off percentile for the right tail [0,100]

Sample Size and Margin Estimation

    samplesize_stat(ci, stdev, margin)
        
        Returns the minimum required sample size (int) given a confidence level, standard deviation, and margin of error for a statistic

        Parameter(s):
        ci (float) is the desired confidence level in the interval (0, 1)
        stdev (float) is the estimated population standard deviation
        margin (float) is the desired margin of error of the statistic

    marginsize_stat(ci, stdev, size)
       
        Returns the margin size (float) given a confidence level, standard deviation, and sample size for a statistic

        Parameter(s):
        ci (float) is the desired confidence level in the interval (0, 1)
        stdev (float) is the estimated population standard deviation
        size (int) is the size of the sample
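
The relationship between sample size and margin of error can be sketched with the usual normal-approximation formulas, n = (z * stdev / margin)^2 rounded up and margin = z * stdev / sqrt(n), where z is the two-sided critical value for the confidence level. This is an assumption about the method, not a copy of the repository's functions; `z_critical`, `required_sample_size`, and `margin_of_error` are hypothetical names.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(ci):
    """Two-sided standard normal critical value for confidence level ci."""
    return NormalDist().inv_cdf(1 - (1 - ci) / 2)


def required_sample_size(ci, stdev, margin):
    """Smallest n whose margin of error does not exceed `margin`."""
    return ceil((z_critical(ci) * stdev / margin) ** 2)


def margin_of_error(ci, stdev, size):
    """Margin of error for a sample of `size` observations."""
    return z_critical(ci) * stdev / sqrt(size)


print(required_sample_size(0.95, 15, 3))  # 97
```

The two functions are inverses up to rounding: a sample of 97 gives a margin just under 3, while a sample of 96 gives a margin just over 3.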

    samplesize_prop(ci, p_est = 0.5, margin = 0.05)
        
        Returns the minimum required sample size (int) given a confidence level, estimated proportion, and margin of error for a proportion

        Parameter(s):
        ci (float) is the desired confidence level in the interval (0, 1)
        p_est (float) is the estimated population proportion
        margin (float) is the desired margin of error of the proportion

    marginsize_prop(ci, p_est = 0.5, size = 100)
        
        Returns the margin (float) given a confidence level, estimated proportion, and sample size for a proportion

        Parameter(s):
        ci (float) is the desired confidence level in the interval (0, 1)
        p_est (float) is the estimated population proportion
        size (int) is the size of the sample
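
For a proportion, the standard deviation in the normal-approximation formulas is replaced by sqrt(p_est * (1 - p_est)). The sketch below assumes that approach; `required_sample_size_prop` and `margin_of_error_prop` are hypothetical names and may not match how the repository computes these values.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(ci):
    """Two-sided standard normal critical value for confidence level ci."""
    return NormalDist().inv_cdf(1 - (1 - ci) / 2)


def required_sample_size_prop(ci, p_est=0.5, margin=0.05):
    """Smallest n whose margin of error does not exceed `margin`."""
    return ceil(z_critical(ci) ** 2 * p_est * (1 - p_est) / margin ** 2)


def margin_of_error_prop(ci, p_est=0.5, size=100):
    """Margin of error for a proportion estimated from `size` observations."""
    return z_critical(ci) * sqrt(p_est * (1 - p_est) / size)


print(required_sample_size_prop(0.95))  # 385
```

Using p_est = 0.5 is the conservative choice, since p_est * (1 - p_est) is largest there and so yields the largest required sample size.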
