diff --git a/README.md b/README.md index 671bc62..99fa0d9 100644 --- a/README.md +++ b/README.md @@ -23,28 +23,48 @@ based upon a (very) small sample (selection). Goal is to calculate the interval for which there is a certain confidence e.g. 95%, that the population mean lies between the estimated mean from the sample +- some intervalDelta. -The 0.1.x version of the library uses a lookup table for sample size is 2..20, and for the -discrete confidence levels 80%, 90%, 95%, 98% and 99%. -This look up table is defined in the file **StudentTable.h** and generated from a spreadsheet. -Note: values in the LUT are single sided. +The 0.1.x version of the library is limited to a maximum of 20 samples to keep RAM usage +relatively low. +It uses a lookup table (LUT) for a sample size up to 20, and for five discrete confidence +levels 80%, 90%, 95%, 98% and 99%, so the table holds 20 x 5 = 100 values. +The table stores these 100 values as uint16_t instead of float, which saves 50% of the RAM needed. The library allows to calculate the interval after every addition until the internal buffer -of 20 samples is full. (At least 2 samples needed) +of 20 samples is full. Note that at least 2 samples are needed to calculate the interval. + + +### StudentTable.h + +The table is defined in the file **StudentTable.h** and generated from a spreadsheet. + +If one needs to extend the sample size, the file contains commented-out values for sample size 21-100. +One has to adjust **STUDENT_MAX_SIZE** in **StudentTable.h** too (max 255). +Note this will cost extra RAM. + +If one wants to reduce the sample size / RAM, one can comment out the part of the table that is not needed. +One has to adjust **STUDENT_MAX_SIZE** too. ### History -The T-distribution is developed by W.A. Gosset in 1908 while working for Guinness Beer. -His goal was to guard a constant quality of the beer. 
For this he needed a method to determine -the quality of the raw materials like grains, malt and hops to +The T-distribution was developed by William Gosset, head experimental brewer, in 1908 while +working for Guinness Beer. His goal was to guard a constant quality of the beer brewed. +He needed a method to determine the quality of the raw materials like grains, malt and hops, +based on small samples from the fields. +For this he invented the t-distribution and published it under the pseudonym Student. +Therefore this distribution is also known as the Student distribution. + ### Accuracy / precision +The version 0.1.x uses float for internal storage. This means precision is at most 6-7 digits. + The version 0.1.x lookup table has 20 x 5 values with 3 decimals, coded as an uint16_t (for RAM) -This allows about 3-4 digits precision in the found interval. +This allows about 3-4 digits of precision for the found interval. -The 0.1.x version does not interpolate the confidence level yet, and only support 5 distinct levels. +The 0.1.x version does not interpolate the confidence level (yet), and only supports 5 distinct levels. This interpolation (between 80-99) is planned to be implemented in the future. +If an unsupported confidence level is used, the library uses 95%. If you need only one confidence interval you could strip the lookup table to one column. @@ -78,11 +98,10 @@ If you need only one confidence interval you could strip the lookup table to one #include Student.h ``` - ### Constructor + meta -- **Student()** constructor. 0.1.x has a fixed max sample size = 20. -- **uint8_t getSize()** returns 20. +- **Student()** constructor. 0.1.x has a compile-time max sample size STUDENT_MAX_SIZE (default 20). +- **uint8_t getSize()** returns STUDENT_MAX_SIZE (default 20). - **uint8_t getCount()** returns the number of samples added. Returns value between 0 .. getSize(). - **void reset()** resets internal counter to zero. @@ -96,9 +115,10 @@ Returns false if the internal buffer would "overflow". 
### Math -- **float mean()** returns estimated mean based upon samples added. -- **float variance()** idem -- **float deviation()** idem +- **float mean()** returns mean (average) of the samples added. +This is the estimated mean of the population from which the samples are taken. +- **float variance()** returns variance of the samples added. +- **float deviation()** returns standard deviation of the samples added. - **float estimatedDeviation()** returns estimated deviation of the estimated mean (based upon the samples). @@ -107,7 +127,6 @@ estimated mean (based upon the samples). Confidence should be 80, 90, 95, 98 or 99. The confidence level is not interpolated and incorrect values are replaced by 95%. - - **float intervalDelta(int confidence)** returns the delta to be added oor subtracted to the mean to determine the confidence interval. - **float meanLower(int confidence)** returns mean - intervalDelta. @@ -128,6 +147,7 @@ oor subtracted to the mean to determine the confidence interval. - optimize lookup table, PROGMEM for footprint? - dynamic allocation for sizes > 20 - or derived classes, student30, student40, student50, student100? + - linear interpolation for values > 10 (performance?) - add interpolation to **intervalDelta()** so confidence level (0.2.x) could be any integer value from 80-99 (maybe even float?) @@ -135,6 +155,10 @@ oor subtracted to the mean to determine the confidence interval. - add examples - add unit tests +- replace look up table with a formula? (performance drop!!) +- access function for internal array to access samples? +- template class instead of STUDENT_MAX_SIZE? (becomes different types). +- circular buffer for the samples? Running T-test? 
#### Won't (unless requested) diff --git a/Student.h b/Student.h index d3a327c..d3a9807 100644 --- a/Student.h +++ b/Student.h @@ -18,7 +18,7 @@ class Student public: Student() { - _size = 10; + _size = STUDENT_MAX_SIZE; _count = 0; } @@ -133,7 +133,7 @@ class Student uint8_t _size; uint8_t _count; float _mean = 0; - float _value[20]; + float _value[STUDENT_MAX_SIZE]; }; diff --git a/StudentTable.h b/StudentTable.h index ff244c2..fdacfc0 100644 --- a/StudentTable.h +++ b/StudentTable.h @@ -6,18 +6,24 @@ // PURPOSE: Arduino library for Student or T-distribution math. // DATE: 2024-07-22 -// lookup table, do not change. +// lookup table, do not change (unless you know what you are doing). // // spreadsheet = ROUND(1000 * T.INV.2T(C$4;$B7)) // // x = degrees of freedom -// y = confidence level 0.1, 0.05, 0.025, 0.01, 0.005 (right only) +// y = confidence level 0.1, 0.05, 0.025, 0.01, 0.005 (one-sided) // values are multiplied by 1000, to save RAM. -// +// + + +// can be overridden from the command line +#ifndef STUDENT_MAX_SIZE +#define STUDENT_MAX_SIZE (20) +#endif // uses 200 bytes of RAM -uint16_t StudentLUT[20][5] +uint16_t StudentLUT[STUDENT_MAX_SIZE][5] { // 80% 90% 95% 98% 99% //--------------------------------- @@ -60,12 +66,94 @@ uint16_t StudentLUT[20][5] { 1311, 1699, 2045, 2462, 2756}, { 1310, 1697, 2042, 2457, 2750}, // n = 30 + { 1309, 1696, 2040, 2453, 2744}, + { 1309, 1694, 2037, 2449, 2738}, + { 1308, 1692, 2035, 2445, 2733}, + { 1307, 1691, 2032, 2441, 2728}, + { 1306, 1690, 2030, 2438, 2724}, + + { 1306, 1688, 2028, 2434, 2719}, + { 1305, 1687, 2026, 2431, 2715}, + { 1304, 1686, 2024, 2429, 2712}, + { 1304, 1685, 2023, 2426, 2708}, + { 1303, 1684, 2021, 2423, 2704}, // n = 40 + + { 1303, 1683, 2020, 2421, 2701}, + { 1302, 1682, 2018, 2418, 2698}, + { 1302, 1681, 2017, 2416, 2695}, + { 1301, 1680, 2015, 2414, 2692}, + { 1301, 1679, 2014, 2412, 2690}, + + { 1300, 1679, 2013, 2410, 2687}, + { 1300, 1678, 2012, 2408, 2685}, + { 1299, 1677, 2011, 2407, 2682}, + { 
1299, 1677, 2010, 2405, 2680}, + { 1299, 1676, 2009, 2403, 2678}, // n = 50 + + { 1298, 1675, 2008, 2402, 2676}, + { 1298, 1675, 2007, 2400, 2674}, + { 1298, 1674, 2006, 2399, 2672}, + { 1297, 1674, 2005, 2397, 2670}, + { 1297, 1673, 2004, 2396, 2668}, + + { 1297, 1673, 2003, 2395, 2667}, + { 1297, 1672, 2002, 2394, 2665}, + { 1296, 1672, 2002, 2392, 2663}, + { 1296, 1671, 2001, 2391, 2662}, + { 1296, 1671, 2000, 2390, 2660}, // n = 60 + + { 1296, 1670, 2000, 2389, 2659}, + { 1295, 1670, 1999, 2388, 2657}, + { 1295, 1669, 1998, 2387, 2656}, + { 1295, 1669, 1998, 2386, 2655}, + { 1295, 1669, 1997, 2385, 2654}, + + { 1295, 1668, 1997, 2384, 2652}, + { 1294, 1668, 1996, 2383, 2651}, + { 1294, 1668, 1995, 2382, 2650}, + { 1294, 1667, 1995, 2382, 2649}, + { 1294, 1667, 1994, 2381, 2648}, // n = 70 + + { 1294, 1667, 1994, 2380, 2647}, + { 1293, 1666, 1993, 2379, 2646}, + { 1293, 1666, 1993, 2379, 2645}, + { 1293, 1666, 1993, 2378, 2644}, + { 1293, 1665, 1992, 2377, 2643}, + + { 1293, 1665, 1992, 2376, 2642}, + { 1293, 1665, 1991, 2376, 2641}, + { 1292, 1665, 1991, 2375, 2640}, + { 1292, 1664, 1990, 2374, 2640}, + { 1292, 1664, 1990, 2374, 2639}, // n = 80 + + { 1292, 1664, 1990, 2373, 2638}, + { 1292, 1664, 1989, 2373, 2637}, + { 1292, 1663, 1989, 2372, 2636}, + { 1292, 1663, 1989, 2372, 2636}, + { 1292, 1663, 1988, 2371, 2635}, + + { 1291, 1663, 1988, 2370, 2634}, + { 1291, 1663, 1988, 2370, 2634}, + { 1291, 1662, 1987, 2369, 2633}, + { 1291, 1662, 1987, 2369, 2632}, + { 1291, 1662, 1987, 2368, 2632}, // n = 90 + + { 1291, 1662, 1986, 2368, 2631}, + { 1291, 1662, 1986, 2368, 2630}, + { 1291, 1661, 1986, 2367, 2630}, + { 1291, 1661, 1986, 2367, 2629}, + { 1291, 1661, 1985, 2366, 2629}, + + { 1290, 1661, 1985, 2366, 2628}, + { 1290, 1661, 1985, 2365, 2627}, + { 1290, 1661, 1984, 2365, 2627}, + { 1290, 1660, 1984, 2365, 2626}, + { 1290, 1660, 1984, 2364, 2626} // n = 100 -40 1303 1684 2021 2423 2704 -60 1296 1671 2000 2390 2660 120 1289 1658 1980 2358 2617 10000 1282 1645 
1960 2327 2576 // effectively infinity. */ + // -- END OF FILE --
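
A minimal sketch of how a table entry that was multiplied by 1000 could be turned back into a t-value and used for an interval delta. This is not the library's actual implementation. The names `kRow30`, `columnFor`, `decodeT` and `intervalDeltaSketch` are hypothetical, and the formula `t * s / sqrt(n)` is the textbook one, assumed here rather than taken from Student.h. The row values are copied from the row the diff labels `n = 30`.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical one-row excerpt: the row labelled "n = 30" in StudentTable.h.
// Columns are the 80%, 90%, 95%, 98% and 99% confidence levels.
static const uint16_t kRow30[5] = { 1310, 1697, 2042, 2457, 2750 };

// Map a confidence level in percent to a column index.
// Unsupported levels fall back to the 95% column, as the README describes.
int columnFor(int confidence)
{
  switch (confidence)
  {
    case 80: return 0;
    case 90: return 1;
    case 95: return 2;
    case 98: return 3;
    case 99: return 4;
    default: return 2;
  }
}

// The table stores ROUND(1000 * t), so scaling by 0.001 recovers t
// with 3 decimals.
float decodeT(uint16_t encoded)
{
  return encoded * 0.001f;
}

// Assumed textbook formula: delta = t * s / sqrt(n), where s is the sample
// standard deviation and n the number of samples.
float intervalDeltaSketch(uint16_t encoded, float deviation, int count)
{
  return decodeT(encoded) * deviation / std::sqrt(static_cast<float>(count));
}
```

For example, `decodeT(kRow30[columnFor(95)])` yields about 2.042, and the same call with an unsupported level such as 42 selects the same 95% column.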