fix
RobTillaart committed Jul 23, 2024
1 parent 4a6a5c7 commit 77067f7
Showing 3 changed files with 137 additions and 25 deletions.
58 changes: 41 additions & 17 deletions README.md
@@ -23,28 +23,48 @@ based upon a (very) small sample (selection).
Goal is to calculate the interval for which there is a certain confidence e.g. 95%,
that the population mean lies between the estimated mean from the sample +- some intervalDelta.

The 0.1.x version of the library uses a lookup table for sample size is 2..20, and for the
discrete confidence levels 80%, 90%, 95%, 98% and 99%.
This look up table is defined in the file **StudentTable.h** and generated from a spreadsheet.
Note: values in the LUT are single sided.
The 0.1.x version of the library is limited to a maximum of 20 samples to keep RAM usage
relatively low.
It uses a lookup table (LUT) for sample sizes up to 20 and for five discrete confidence
levels: 80%, 90%, 95%, 98% and 99%. The table therefore holds 20 x 5 = 100 values.
The table stores these 100 values as uint16_t instead of float, which halves the RAM needed.

The library allows to calculate the interval after every addition until the internal buffer
of 20 samples is full. (At least 2 samples needed)
of 20 samples is full. Note that at least 2 samples are needed to calculate the interval.


### StudentTable.h

The table is defined in the file **StudentTable.h** and generated from a spreadsheet.

If one needs to extend the sample size, the file contains commented values for sample size 21-100.
One has to adjust **STUDENT_MAX_SIZE** in **StudentTable.h** too (max 255).
Note this will cost extra RAM.

If one wants to reduce the sample size / RAM, one can comment out the part of the table that is not needed.
One has to adjust **STUDENT_MAX_SIZE** too.
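
To get a feeling for the RAM cost of a given **STUDENT_MAX_SIZE**, the arithmetic can be sketched as below. The helper names `lutBytes` and `bufferBytes` are hypothetical and not part of the library; the sketch only assumes the `uint16_t` table with 5 columns and the `float` sample buffer shown in **StudentTable.h** and **Student.h**.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers, not part of the library.
constexpr std::size_t kLevels = 5;  // columns: 80%, 90%, 95%, 98%, 99%

// Bytes used by the uint16_t lookup table with maxSize rows.
constexpr std::size_t lutBytes(std::size_t maxSize)
{
  return maxSize * kLevels * sizeof(std::uint16_t);
}

// Bytes used by the float sample buffer with maxSize slots.
constexpr std::size_t bufferBytes(std::size_t maxSize)
{
  return maxSize * sizeof(float);
}

// The default size of 20 gives the 200 bytes mentioned in StudentTable.h.
static_assert(lutBytes(20) == 200, "default table should be 200 bytes");
```

Extending the table to 100 rows would, under these assumptions, need 1000 bytes for the table alone, plus the larger sample buffer.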


### History

The T-distribution is developed by W.A. Gosset in 1908 while working for Guinness Beer.
His goal was to guard a constant quality of the beer. For this he needed a method to determine
the quality of the raw materials like grains, malt and hops to
The T-distribution was developed by William Gosset, head experimental brewer, in 1908 while
working for Guinness Beer. His goal was to safeguard a constant quality of the beer brewed.
He needed a method to determine the quality of the raw materials like grains, malt and hops,
based on small samples from the fields.
For this he invented the t-distribution and published it under the name Student.
Therefore this distribution is also known as the Student distribution.


### Accuracy / precision

The version 0.1.x uses float for internal storage. This means precision is at most 6-7 digits.

The version 0.1.x lookup table has 20 x 5 values with 3 decimals, coded as an uint16_t (for RAM)
This allows about 3-4 digits precision in the found interval.
This allows about 3-4 digits precision for the found interval.

The 0.1.x version does not interpolate the confidence level yet, and only support 5 distinct levels.
The 0.1.x version does not interpolate the confidence level (yet), and only supports 5 distinct levels.
This interpolation (between 80-99) is planned to be implemented in the future.
If an unsupported confidence level is used, the library will use 95%.

If you need only one confidence interval you could strip the lookup table to one column.
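
The lookup behaviour described above can be sketched as follows. This is a minimal illustration, not the library's actual code: the function names `confidenceColumn` and `decodeT` are hypothetical, and the column order (80%, 90%, 95%, 98%, 99%) is taken from the table header in **StudentTable.h**.

```cpp
#include <cstdint>

// Hypothetical helper: map a confidence level to a LUT column index.
// Unsupported levels fall back to the 95% column, as described above.
int confidenceColumn(int confidence)
{
  switch (confidence)
  {
    case 80: return 0;
    case 90: return 1;
    case 95: return 2;
    case 98: return 3;
    case 99: return 4;
    default: return 2;  // fallback: 95%
  }
}

// Hypothetical helper: decode a stored entry (t-value x 1000) to a float.
float decodeT(std::uint16_t raw)
{
  return raw * 0.001f;
}
```

Because the entries are rounded to 3 decimals before being stored, a decoded value such as `decodeT(2042)` carries 4 significant digits at most, which matches the 3-4 digits precision mentioned above.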

@@ -78,11 +98,10 @@ If you need only one confidence interval you could strip the lookup table to one
#include "Student.h"
```


### Constructor + meta

- **Student()** constructor. 0.1.x has a fixed max sample size = 20.
- **uint8_t getSize()** returns 20.
- **Student()** constructor. 0.1.x has a fixed max sample size STUDENT_MAX_SIZE = 20.
- **uint8_t getSize()** returns STUDENT_MAX_SIZE == 20.
- **uint8_t getCount()** returns the number of samples added.
Returns value between 0 .. getSize().
- **void reset()** resets internal counter to zero.
@@ -96,9 +115,10 @@ Returns false if the internal buffer would "overflow".

### Math

- **float mean()** returns estimated mean based upon samples added.
- **float variance()** idem
- **float deviation()** idem
- **float mean()** returns mean (average) of the samples added.
This is the estimated mean of the population from which the samples are taken.
- **float variance()** returns variance of the samples added.
- **float deviation()** returns standard deviation of the samples added.
- **float estimatedDeviation()** returns estimated deviation of the
estimated mean (based upon the samples).
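
The quantities listed above can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the library's implementation: the names `SampleStats`, `computeStats` and `sketchIntervalDelta` are hypothetical, the variance is assumed to be the sample variance (divided by n - 1), and the estimated deviation is assumed to be the standard deviation divided by the square root of the sample count. The t-value is passed in by the caller, where the library would take it from its lookup table.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical container for the statistics described above.
struct SampleStats
{
  float mean;
  float variance;
  float deviation;
  float estimatedDeviation;
};

// Compute the statistics for count samples (at least 2 are needed).
SampleStats computeStats(const float *values, std::size_t count)
{
  assert(count >= 2);
  float sum = 0;
  for (std::size_t i = 0; i < count; i++) sum += values[i];
  float mean = sum / count;

  float squares = 0;
  for (std::size_t i = 0; i < count; i++)
  {
    float d = values[i] - mean;
    squares += d * d;
  }
  float variance = squares / (count - 1);  // assumption: sample variance
  float deviation = std::sqrt(variance);
  float estimated = deviation / std::sqrt(static_cast<float>(count));
  return {mean, variance, deviation, estimated};
}

// Half width of the confidence interval for a given t-value.
float sketchIntervalDelta(const SampleStats &s, float t)
{
  return t * s.estimatedDeviation;
}
```

For the five samples 1, 2, 3, 4, 5 and t = 2.776 (the standard two-sided 95% value for 4 degrees of freedom), this gives a mean of 3 and a delta of about 1.96, so the interval runs from roughly 1.04 to 4.96.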

@@ -107,7 +127,6 @@ estimated mean (based upon the samples).
Confidence should be 80, 90, 95, 98 or 99.
The confidence level is not interpolated and incorrect values are replaced by 95%.


- **float intervalDelta(int confidence)** returns the delta to be added
or subtracted from the mean to determine the confidence interval.
- **float meanLower(int confidence)** returns mean - intervalDelta.
@@ -128,13 +147,18 @@ oor subtracted to the mean to determine the confidence interval.
- optimize lookup table, PROGMEM for footprint?
- dynamic allocation for sizes > 20
- or derived classes, student30, student40, student50, student100?
- linear interpolation for values > 10 (performance?)
- add interpolation to **intervalDelta()** so confidence level (0.2.x)
could be any integer value from 80-99 (maybe even float?)

#### Could

- add examples
- add unit tests
- replace look up table with a formula? (performance drop!!)
- access function for internal array to access samples?
- template class instead of STUDENT_MAX_SIZE? (becomes different types).
- circular buffer for the samples? Running T-test?

#### Won't (unless requested)

4 changes: 2 additions & 2 deletions Student.h
@@ -18,7 +18,7 @@ class Student
public:
Student()
{
_size = 10;
_size = STUDENT_MAX_SIZE;
_count = 0;
}

@@ -133,7 +133,7 @@ class Student
uint8_t _size;
uint8_t _count;
float _mean = 0;
float _value[20];
float _value[STUDENT_MAX_SIZE];

};

100 changes: 94 additions & 6 deletions StudentTable.h
@@ -6,18 +6,24 @@
// PURPOSE: Arduino library for Student or T-distribution math.
// DATE: 2024-07-22

// lookup table, do not change.
// lookup table, do not change (unless you know what you are doing).
//
// spreadsheet = ROUND(1000 * T.INV.2T(C$4;$B7))
//
// x = degrees of freedom
// y = confidence level 0.1, 0.05, 0.025, 0.01, 0.005 (right only)
// y = tail probability 0.1, 0.05, 0.025, 0.01, 0.005 (single sided),
//     i.e. two sided confidence level 80%, 90%, 95%, 98%, 99%
// values are multiplied by 1000, to save RAM.
//
//


// command line overrule possible
#ifndef STUDENT_MAX_SIZE
#define STUDENT_MAX_SIZE (20)
#endif


// uses STUDENT_MAX_SIZE x 5 x 2 bytes of RAM (200 bytes for the default size of 20)
uint16_t StudentLUT[20][5]
uint16_t StudentLUT[STUDENT_MAX_SIZE][5]
{
// 80% 90% 95% 98% 99%
//---------------------------------
@@ -60,12 +66,94 @@ uint16_t StudentLUT[20][5]
{ 1311, 1699, 2045, 2462, 2756},
{ 1310, 1697, 2042, 2457, 2750}, // n = 30
{ 1309, 1696, 2040, 2453, 2744},
{ 1309, 1694, 2037, 2449, 2738},
{ 1308, 1692, 2035, 2445, 2733},
{ 1307, 1691, 2032, 2441, 2728},
{ 1306, 1690, 2030, 2438, 2724},
{ 1306, 1688, 2028, 2434, 2719},
{ 1305, 1687, 2026, 2431, 2715},
{ 1304, 1686, 2024, 2429, 2712},
{ 1304, 1685, 2023, 2426, 2708},
{ 1303, 1684, 2021, 2423, 2704}, // n = 40
{ 1303, 1683, 2020, 2421, 2701},
{ 1302, 1682, 2018, 2418, 2698},
{ 1302, 1681, 2017, 2416, 2695},
{ 1301, 1680, 2015, 2414, 2692},
{ 1301, 1679, 2014, 2412, 2690},
{ 1300, 1679, 2013, 2410, 2687},
{ 1300, 1678, 2012, 2408, 2685},
{ 1299, 1677, 2011, 2407, 2682},
{ 1299, 1677, 2010, 2405, 2680},
{ 1299, 1676, 2009, 2403, 2678}, // n = 50
{ 1298, 1675, 2008, 2402, 2676},
{ 1298, 1675, 2007, 2400, 2674},
{ 1298, 1674, 2006, 2399, 2672},
{ 1297, 1674, 2005, 2397, 2670},
{ 1297, 1673, 2004, 2396, 2668},
{ 1297, 1673, 2003, 2395, 2667},
{ 1297, 1672, 2002, 2394, 2665},
{ 1296, 1672, 2002, 2392, 2663},
{ 1296, 1671, 2001, 2391, 2662},
{ 1296, 1671, 2000, 2390, 2660}, // n = 60
{ 1296, 1670, 2000, 2389, 2659},
{ 1295, 1670, 1999, 2388, 2657},
{ 1295, 1669, 1998, 2387, 2656},
{ 1295, 1669, 1998, 2386, 2655},
{ 1295, 1669, 1997, 2385, 2654},
{ 1295, 1668, 1997, 2384, 2652},
{ 1294, 1668, 1996, 2383, 2651},
{ 1294, 1668, 1995, 2382, 2650},
{ 1294, 1667, 1995, 2382, 2649},
{ 1294, 1667, 1994, 2381, 2648}, // n = 70
{ 1294, 1667, 1994, 2380, 2647},
{ 1293, 1666, 1993, 2379, 2646},
{ 1293, 1666, 1993, 2379, 2645},
{ 1293, 1666, 1993, 2378, 2644},
{ 1293, 1665, 1992, 2377, 2643},
{ 1293, 1665, 1992, 2376, 2642},
{ 1293, 1665, 1991, 2376, 2641},
{ 1292, 1665, 1991, 2375, 2640},
{ 1292, 1664, 1990, 2374, 2640},
{ 1292, 1664, 1990, 2374, 2639}, // n = 80
{ 1292, 1664, 1990, 2373, 2638},
{ 1292, 1664, 1989, 2373, 2637},
{ 1292, 1663, 1989, 2372, 2636},
{ 1292, 1663, 1989, 2372, 2636},
{ 1292, 1663, 1988, 2371, 2635},
{ 1291, 1663, 1988, 2370, 2634},
{ 1291, 1663, 1988, 2370, 2634},
{ 1291, 1662, 1987, 2369, 2633},
{ 1291, 1662, 1987, 2369, 2632},
{ 1291, 1662, 1987, 2368, 2632}, // n = 90
{ 1291, 1662, 1986, 2368, 2631},
{ 1291, 1662, 1986, 2368, 2630},
{ 1291, 1661, 1986, 2367, 2630},
{ 1291, 1661, 1986, 2367, 2629},
{ 1291, 1661, 1985, 2366, 2629},
{ 1290, 1661, 1985, 2366, 2628},
{ 1290, 1661, 1985, 2365, 2627},
{ 1290, 1661, 1984, 2365, 2627},
{ 1290, 1660, 1984, 2365, 2626},
{ 1290, 1660, 1984, 2364, 2626} // n = 100
40 1303 1684 2021 2423 2704
60 1296 1671 2000 2390 2660
120 1289 1658 1980 2358 2617
10000 1282 1645 1960 2327 2576 // effectively infinity.
*/


// -- END OF FILE --
