diff --git a/adix.html b/adix.html new file mode 100644 index 0000000..4ecd334 --- /dev/null +++ b/adix.html @@ -0,0 +1,140 @@ + + + + +
+ + + + + + + + + + + + + +This module provides several alternative hashes to the Nim stdlib targeting high entropy in low order bits for and mask conversion to a table index. Activate via e.g., proc hash(x: K): Hash {.inline.} = hashRoMu1(x), where K is your integer key type.
+The fastest hash used here is likely the multiply & rotate hash (lately called a "Fibonacci hash", after Knuth's golden ratio fascination. Knuth immediately, in context, shows it works for any irrational number and then computer arithmetic is finite anyway). This propagates entropy in key bits to the double-width product in a Pascal's Binomial Triangle sort of pattern. Middle bits have the most key related entropy (there is even a PRNG method called "middle square" based on this pattern). More specifically, consider the product of a pair of 2 digit decimal numbers: (a1*10+a0)*(b1*10+b0) = a1*b1*100 + (a1*b0+a0*b1)*10 + a0*b0. This shows the "triangle" kind of pattern that makes the mixing strongest in the middle.
+The full product is easily accessed for half CPU register width and narrower numbers. For the widest integer type, C/backend arithmetic rules only give the modulus|lower order bits of said product (x86/amd64 does make the upper word available in another register). hashRoMu1 takes the simple portable way out and discards half of key entropy, just rotating down the more well mixed bits (bit reversal might be better, though expensive compared to rotr). Reducing hash codes to table address via shr is another way out. Getting high order avalanche is more natural than the low order avalanche needed for and mask, both may be https://burtleburtle.net/bob/hash/integer.html's "half avalanche". Anyway, rotating is portably fast (often 1 cycle latency, 1-per cycle tput for an immediate constant number of bits). Bit reversal is not so slow using a fully unrolled loop and 16 entry nibble lookup table. Different hashes will perform more/less well on different data. So, we just provide a few here, and one is based upon highly non-linear bit reversal.
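As a rough illustration of the multiply & rotate idea described above, here is a minimal sketch. The names `golden`, `mulRot` and `toIndex`, the salt and the rotation amount are all assumptions chosen for illustration; they are not claimed to match what hashRoMu1 actually uses.

```nim
import std/bitops

# Illustrative odd salt: the well-known 64-bit golden ratio constant.
# (Assumption: not necessarily the multiplier used by hashRoMu1.)
const golden = 0x9E3779B97F4A7C15'u64

proc mulRot(x: uint64; rot = 32): uint64 =
  ## Keep only the low 64 bits of the product (unsigned multiply wraps),
  ## then rotate the better mixed upper bits down to the low end.
  rotateRightBits(x * golden, rot)

proc toIndex(h: uint64; mask: int): int =
  ## `and mask` reduction to a power-of-2 table index.
  int(h and uint64(mask))
```

With a 16-slot table, `toIndex(mulRot(key), 15)` draws the index from bits that sat in the middle of the 64-bit product before rotation.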
+A stronger hash is hashWangYi1 which passes SMHasher's entropy tests, but "funnels" 64 input bits into about 62 bits of output entropy. It is still the default Hash-rehasher since it is fast & tables with 2^62 entries are rare. WangYi1's primitive is the xor of the high & low parts of a double-width product of a salt with a key. The xor blends well-mixed low order bits of the high output product word with less well-mixed low order bits of the low output product word, yielding "mostly level" mixing across all bits of the hash. WangYi1 takes two rounds of such mixing to achieve avalanche. It may be possible to "nearly pass" in only one round. hashWY0 is one attempt at that, but the salt may need to be re-optimized to have any hope of doing well on SMHasher. This hash became the default hash(int) in Nim stdlib.
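The primitive described above (xor of the high & low words of a double-width product) can be sketched portably with 32-bit partial products. `mulHiLo` and `foldMul` are hypothetical helper names, and this shows only one mixing round, not the actual hashWangYi1 with its salts.

```nim
proc mulHiLo(a, b: uint64): tuple[hi, lo: uint64] =
  ## Full 64x64 -> 128 bit product built from four 32x32 partial products.
  const m32 = 0xFFFF_FFFF'u64
  let
    aLo = a and m32
    aHi = a shr 32
    bLo = b and m32
    bHi = b shr 32
    ll = aLo * bLo
    hl = aHi * bLo
    lh = aLo * bHi
    hh = aHi * bHi
    cross = (ll shr 32) + (hl and m32) + lh  # cannot overflow 64 bits
  result.hi = (hl shr 32) + (cross shr 32) + hh
  result.lo = (cross shl 32) or (ll and m32)

proc foldMul(key, salt: uint64): uint64 =
  ## One round: blend the high & low product words with xor.
  let p = mulHiLo(key, salt)
  p.hi xor p.lo
```

On backends that expose the upper product word directly, the partial products would be unnecessary; the portable form just makes the data flow explicit.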
+There is a motley selection of other hashes here. The strongest fast hash is Pelle Evensen's bijective (64-bits -> 64-bits) hashNASAM. It's 2-3x slower than WangYi1 on some CPU architectures, but has no known statistical flaws.
+Incidentally, most "fast in GB/s" hashes are far too slow for just one int. Even assessing them so is misleading for lookup tables. You want time =~ a + b*nBytes where a & b maybe come from linear regressions. 1/b alone says too little, especially for integer keys where a dominates. Short string hashes or a alone can similarly mislead. TLDR, want >=2 "summary numbers" not one & whole curves are best.
+ + + +proc hashDegski[T: Ordinal | enum](x: T): Hash {.inline.}
proc hashIdentity[T: Ordinal | enum](x: T): Hash {.inline.}
proc hashMoreMur[T: Ordinal | enum](x: T): Hash {.inline.}
proc hashRevFib(x: int32 | uint32): Hash {.inline.}
proc hashRevFib(x: int64 | uint64): Hash {.inline.}
proc hashRevFib(x: uint): Hash {.inline, ...raises: [], tags: [].}
proc hashRoMu1(x: SomeOrdinal | Hash): Hash {.inline.}
proc hashRoMu2(x: SomeOrdinal | Hash): Hash {.inline.}
proc hashSplit64[T: Ordinal | enum](x: T): Hash {.inline.}
proc hashSplitMix[T: Ordinal | enum](x: T): Hash {.inline.}
proc hashWangYi1[T: Ordinal | enum](x: T): Hash {.inline.}
proc hashWY0[T: Ordinal | enum](x: T): Hash {.inline.}
proc normalized[T](x: openArray[T]): seq[float]
proc raiseNotFound[K](key: K)
proc secureSalt(x: pointer): Hash {.inline, ...raises: [], tags: [].}
proc vmaddrSalt(x: pointer): Hash {.inline, ...raises: [], tags: [].}
Approximately k-Most Oft. This is constant space & constant inc&query time with adjustably small error. AMOft[K,C] augments the sketch with an O(k) paired Table & HeapQueue. Present impl does NOT scale well to very large k (past ~top-1000). E.g.:
var amo = initAMOft[int, uint32](k=10) +for i in 0..<500: amo.inc i, i # Not v.skewed => not v.accurate +for (i, c) in amo.mostCommon(5): echo i, " ", c+
AMOft[K; C] = object + sketch: CtMnSketch[K, C] + top: HeapQueue[(C, int)] + no2key: seq[K] + key2no: Table[K, int] + k: int +
CtMnSketch[K; C] = object + data: seq[seq[C]] + salts: seq[Hash] + w: int +
proc inc[K, C](a: var AMOft[K, C]; key: K; r = 1)
proc inc[K, C](cs: var CtMnSketch[K, C]; key: K; r = 1): C {.discardable.}
proc initAMOft[K, C](k, w: int; d = 4; salts: seq[int] = @[]): AMOft[K, C]
proc initCtMnSketch[K, C](w: int; d = 4; salts: seq[int] = @[]): CtMnSketch[K, C]
iterator mostCommon[K, C](a: AMOft[K, C]; k = 0): (K, C)
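For readers unfamiliar with the count-min sketch underlying AMOft, the following is a minimal stand-alone sketch of that data structure alone. `CMSketch`, `initCMSketch`, `slot` and `count` are hypothetical names, keys are fixed to int, and the salting scheme is an assumption; it is not the CtMnSketch implementation.

```nim
import std/hashes

type CMSketch = object       # hypothetical; not adix's CtMnSketch
  rows: seq[seq[uint32]]     # d rows of w counters each
  salts: seq[Hash]           # one salt per row
  w: int

proc initCMSketch(w: int; d = 4): CMSketch =
  result.w = w
  for i in 0 ..< d:
    result.rows.add newSeq[uint32](w)
    result.salts.add hash(i + 1)

proc slot(s: CMSketch; row, key: int): int =
  ## Row-specific column for `key`: mix the key hash with the row salt.
  let h = !$(hash(key) !& s.salts[row])
  int(cast[uint](h) mod uint(s.w))

proc inc(s: var CMSketch; key: int; r = 1'u32) =
  for i in 0 ..< s.rows.len:
    s.rows[i][s.slot(i, key)] += r

proc count(s: CMSketch; key: int): uint32 =
  ## Minimum over rows; collisions can only inflate a count, never shrink it.
  result = uint32.high
  for i in 0 ..< s.rows.len:
    result = min(result, s.rows[i][s.slot(i, key)])
```

Space is w*d counters regardless of how many distinct keys arrive, which is the "constant space" property; the estimate is an upper bound on the true count.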
Binary Indexed Sum Tree (BIST); Invented by P.Fenwick in 1994. { Fenwick proposed "BIT" but that A) collides with many uses, B) takes partial sums as implied, while the trick applies more broadly (e.g.products), and C) does not rhyme with "dist" (for distribution, which is what this is mostly about). } While the Internet has many tutorials, to my knowledge, no one (yet) collects these algorithms all in one place. Fenwick1994 itself messed up on what we here call invCDF, correcting with a tech report a year later. This implementation is also careful to only allocate needed space/handle [0].
+The basic idea of a standard binary heap with kids(k)@[2k],[2k+1] for dynamic distributions goes back to Wong&Easton 1980 (or earlier?). Fenwick's clever index encoding/overlaid trees trick allows using 1/4 to 1/2 that space (only max index+1 array elements vs 2*lgCeil(n)). Meaningful explanations
+The Bist[T] type in this module is generic over the type of counters used for partial sums|counts. For few total items, you can use a Bist[uint8] while for many you want to use Bist[uint32]. This can be space-optimized up to 2X further with a sequint specialized to store an array of B-bit counters. Also, ranked B-trees start being faster for >28-bit index spaces.
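The core Fenwick update & prefix-sum loops can be sketched as below. This is a simplified illustration with hypothetical names (`Fenwick`, `initFenwick`) that spends one extra array element on a 1-based internal layout, unlike the space-careful Bist[T] described above.

```nim
type Fenwick = object
  tot: seq[int]              # 1-based internally; tot[0] is unused

proc initFenwick(n: int): Fenwick =
  Fenwick(tot: newSeq[int](n + 1))

proc inc(t: var Fenwick; i, d: int) =
  ## Add d to the count of (0-based) bin i.
  var j = i + 1
  while j < t.tot.len:
    t.tot[j] += d
    j += (j and -j)          # step to the next node covering bin i

proc cdf(t: Fenwick; i: int): int =
  ## Inclusive sum of counts over bins 0..i.
  var j = i + 1
  while j > 0:
    result += t.tot[j]
    j -= (j and -j)          # strip the lowest set bit

proc pmf(t: Fenwick; i: int): int =
  ## Count of bin i alone, as a difference of prefix sums.
  t.cdf(i) - (if i > 0: t.cdf(i - 1) else: 0)
```

Both loops touch at most about lg(n) elements, which is what makes updates and prefix sums operation-balanced.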
+ + +proc cdf[T](t: Bist[T]; i: int): T {.inline.}
proc fromCnts[T](t: var Bist[T])
proc inc[T](t: var Bist[T]; i, d: SomeInteger) {.inline.}
proc invCDF[T](t: Bist[T]; s: T; s0, s1: var T): int {.inline.}
proc invCDF[T](t: Bist[T]; s: T; s0: var T): int {.inline.}
proc max[T](t: Bist[T]): int {.inline.}
proc min[T](t: Bist[T]): int {.inline.}
proc pmf[T](t: Bist[T]; i: int): T {.inline.}
proc quantile[T](t: Bist[T]; q: float): float {.inline.}
proc quantile[T](t: Bist[T]; q: float; iL, iH: var int): float {.inline.}
This is a reimplementation of some things we need from bitops which has CT trouble due to importc's. (I feel it's a better naming/factoring, too).
+proc ceilPow2(x: int): int {.noSideEffect, inline, ...raises: [], tags: [].}
proc floorPow2(x: int): int {.noSideEffect, inline, ...raises: [], tags: [].}
proc lg(x: int): int {.inline, ...raises: [], tags: [].}
proc lgCeil(x: int): int {.inline, ...raises: [], tags: [].}
proc lgFloor(x: int): int {.inline, ...raises: [], tags: [].}
proc reverseBits(x: uint32): uint32 {....raises: [], tags: [].}
proc reverseBits(x: uint64): uint64 {....raises: [], tags: [].}
proc reverseBitsByte(x: uint8): uint8 {.inline, ...raises: [], tags: [].}
proc rotateLeftBits(a: uint64; numBits: int): uint64 {.inline, ...raises: [], + tags: [].}
proc rotateRightBits(a: uint64; numBits: int): uint64 {.inline, ...raises: [], + tags: [].}
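A few of the bit helpers listed above can be written in pure Nim (so they also work at compile time) roughly as follows. The names `floorLg`, `ceilLg` and `rev8` are hypothetical, and the exact edge-case conventions of the module's own procs are not assumed; the byte reversal uses a 16 entry nibble lookup table.

```nim
proc floorLg(x: int): int =
  ## floor(log2(x)) for x >= 1, by repeated halving.
  var v = x
  while v > 1:
    v = v shr 1
    inc result

proc ceilLg(x: int): int =
  ## ceil(log2(x)) for x >= 1.
  result = floorLg(x)
  if (1 shl result) < x: inc result

# nib[i] is the 4-bit reversal of i.
const nib = [0'u8, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]

proc rev8(x: uint8): uint8 =
  ## Reverse a byte: swap the two nibbles & reverse each via the table.
  (nib[int(x and 15)] shl 4) or nib[int(x shr 4)]
```

Wider reversals can be assembled from `rev8` a byte at a time, swapping byte positions as they go.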
This module specializes to the case where keys are 1..k-bit ints & values are 0..v-bit ints (k+v<=8*int.sizeof) using one SeqUInt as backing store. (Mnemonically, "BL" = "Bit Level" or start of "BLoom Filter", a sometimes competing data structure.) Users must give a number of bits for the key. Bits for values and the sentinel key default to 0. BLTab otherwise tries to be similar to hash variants of multisets.
+ +blGrowPow2 = 1'u8
blInitialSize = 2
blRobinHood = false
blSentinel = 0'u8
blValueBits = 0'u8
proc contains(s: BLTab; hc: Hash): bool {.inline, ...raises: [], tags: [].}
proc containsOrIncl(s: var BLTab; hc: Hash): bool {.inline, + ...raises: [ResourceExhaustedError], tags: [].}
proc init(s: var BLTab; size, mask: int) {.inline, ...raises: [], tags: [].}
proc initBLTab(size, mask: int): BLTab {.inline, ...raises: [], tags: [].}
proc missingOrExcl(s: var BLTab; hc: Hash): bool {.inline, ...raises: [], tags: [].}
I know of no other as-simple/general FOSS B-Tree (in any prog.langs|books). Theory people recurse; DB code clutters w/sync; Other APIs are less general. Correct me if I am wrong/cite this if it be your starting point. Little effort is made to explain these algos in comments as it's impractical to cram a course into this file with no figures. More details & resources are at https://en.wikipedia.org/wiki/B-tree or in Graefe & Kuno 2011.
+This module defines a template that defines procs on a pretty general B-Tree. The tree can be positional-only, keyed-only or keyed-ranked, be either set of keyed rows or (key,value)-style, have its nodes allocated in any way (via abstract ptrs & deref, eg. on disk via memfiles, via node pools hanging off another object, or GC'd refs), & manage dup keys either inlined into the structure or handled externally (within the same rank space!). This is >36 (3rk*2kv*3alloc*2du) styles from 1 instantiation harness. The template has many parameters to control all these choices. All but Ob are defaulted:
+Common notation/abbreviations in this code/APIs:
+There is no "Tree" type distinct from the "SubTree/Node" type. Once made a root node never moves. That root address is the only handle needed. "nil" ptrs (in whatever allocation arena is used) are just empty trees. Because linear ordering always has exactly 2 sides, parameterization into s|side often keeps life simple/organized (cf. Mehlhorn DS&Algos|common sense).
+Routines are all non-recursive. Instead a Path object is central to the API & we clearly separate cursor manipulation from mutation. This also makes the 3 main styles (ranked-only, keyed-only, keyed-ranked) symmetric & removes recursion overhead (big for tall trees/small m). Each instance can be a multiset/table with "sided" edits (stack/queue) of duplicate key series.
+Victim replacement selection in internal deletes biases toward uniform node occupancy rather than minimum node count. The bulk loader to build a minimum height tree from pre-ordered inputs also allows leaving 1 (generalizable to x?) spare slot in each node to speed early inserts later. A property check routine is provided for would-be extenders. There is presently no provision for concurrent access as the focus is just a good single-threaded tree.
+ +One limitation is that leaf&internal nodes have the same size&representation, wasting ~4..8*m B/leaf. This is <33% waste for Ob >=8..16 bytes. (Post aging, occupancy =~69% anyway.) This cost is spent to avoid complexity of either two node allocation pools with dynamic conversion or different-m orders for leaf & internal nodes. Max data density means wider 4..7 node split-merges (not 2..3) & specializing on key types, anyway; Eg. for string keys not duplicating prefixes in a leaf between ("aab1", "aab2").
+Nim TODOs incl: make easy to include inside other generic types, add easy HashSet & Table use in terms of this lower-level core, run-tm ->CT errs, do GC'd&memfiles Ln variants, do distinct int Ln for ovrld/figure out exports.
+ +gcc/clang error out if the generated C includes a tmmintrin.h header on CPUs without -march=enabling the instructions. An admittedly expensive staticExec lets us probe a build-time system for all pre-defined C preprocessor macros in one execution. We then postprocess these into a set of flags for Nim compile-time when checks to make "fall back" easy/natural.
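The postprocessing step described above might look something like this sketch, which turns a dump of `#define` lines into a set of flags. `CpuFlag` and `parseFlags` are hypothetical names, and the macro-to-flag mapping shown is an assumption; in a real build the input string would come from a compile-time `staticExec` of the C compiler rather than a literal.

```nim
import std/strutils

type CpuFlag = enum          # hypothetical stand-in for X86Feature
  cfSse2, cfSsse3, cfBmi2

proc parseFlags(preDefs: string): set[CpuFlag] =
  ## Scan "#define NAME VALUE" lines & collect the features of interest.
  for line in preDefs.splitLines:
    let cols = line.splitWhitespace
    if cols.len >= 2 and cols[0] == "#define":
      case cols[1]
      of "__SSE2__": result.incl cfSse2
      of "__SSSE3__": result.incl cfSsse3
      of "__BMI2__": result.incl cfBmi2
      else: discard
```

Since the proc is plain string handling, it can be evaluated in a `const` context, after which `when cfSsse3 in flags:` style checks can guard intrinsic-using code paths.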
+X86Feature = enum + x86sse2, x86ssse3, x86bmi2
ccPreDefs = "#define __SSP_STRONG__ 3\n#define __UINT_LEAST16_MAX__ 0xffff\n#define __ATOMIC_ACQUIRE 2\n#define __FLT128_MAX_10_EXP__ 4932\n#define __GCC_IEC_559_COMPLEX 2\n#define __UINT_LEAST8_TYPE__ unsigned char\n#define __SIZEOF_FLOAT80__ 16\n#define __INTMAX_C(c) c ## L\n#define __tune_haswell__ 1\n#define __MOVBE__ 1\n#define __UINT8_MAX__ 0xff\n#define __SCHAR_WIDTH__ 8\n#define __WINT_MAX__ 0xffffffffU\n#define __ORDER_LITTLE_ENDIAN__ 1234\n#define __SIZE_MAX__ 0xffffffffffffffffUL\n#define __SSE4_1__ 1\n#define __WCHAR_MAX__ 0x7fffffff\n#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1\n#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1\n#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1\n#define __DBL_DENORM_MIN__ ((double)4.94065645841246544176568792868221372e-324L)\n#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1\n#define __GCC_ATOMIC_CHAR_LOCK_FREE 2\n#define __GCC_IEC_559 2\n#define __FLT32X_DECIMAL_DIG__ 17\n#define __FLT_EVAL_METHOD__ 0\n#define __FLT64_DECIMAL_DIG__ 17\n#define __CET__ 3\n#define __DBL_MIN_EXP__ (-1021)\n#define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2\n#define __core_avx2__ 1\n#define __UINT_FAST64_MAX__ 0xffffffffffffffffUL\n#define __SIG_ATOMIC_TYPE__ int\n#define __DBL_MIN_10_EXP__ (-307)\n#define __FINITE_MATH_ONLY__ 0\n#define __FLT32X_MAX_EXP__ 1024\n#define __FLT32_HAS_DENORM__ 1\n#define __UINT_FAST8_MAX__ 0xff\n#define __FLT32_MAX_10_EXP__ 38\n#define __DEC64_MAX_EXP__ 385\n#define __INT8_C(c) c\n#define __INT_LEAST8_WIDTH__ 8\n#define __UINT_LEAST64_MAX__ 0xffffffffffffffffUL\n#define __SHRT_MAX__ 0x7fff\n#define __LDBL_MAX__ 1.18973149535723176502126385303097021e+4932L\n#define __FLT64X_MAX_10_EXP__ 4932\n#define __LDBL_IS_IEC_60559__ 2\n#define __FLT64X_HAS_QUIET_NAN__ 1\n#define __UINT_LEAST8_MAX__ 0xff\n#define __GCC_ATOMIC_BOOL_LOCK_FREE 2\n#define __LAHF_SAHF__ 1\n#define __FLT128_DENORM_MIN__ 6.47517511943802511092443895822764655e-4966F128\n#define __UINTMAX_TYPE__ long unsigned int\n#define __linux 1\n#define __DEC32_EPSILON__ 
1E-6DF\n#define __FLT_EVAL_METHOD_TS_18661_3__ 0\n#define __unix 1\n#define __UINT32_MAX__ 0xffffffffU\n#define __FLT128_MIN_EXP__ (-16381)\n#define __WINT_MIN__ 0U\n#define __CHAR_BIT__ 8\n#define __FLT128_MIN_10_EXP__ (-4931)\n#define __FLT32X_IS_IEC_60559__ 2\n#define __INT_LEAST16_WIDTH__ 16\n#define __SCHAR_MAX__ 0x7f\n#define __FLT128_MANT_DIG__ 113\n#define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1)\n#define __INT64_C(c) c ## L\n#define __GCC_ATOMIC_POINTER_LOCK_FREE 2\n#define __FLT32X_MANT_DIG__ 53\n#define __USER_LABEL_PREFIX__ \n#define __FLT64X_EPSILON__ 1.08420217248550443400745280086994171e-19F64x\n#define __STDC_HOSTED__ 1\n#define __FLT32_DIG__ 6\n#define __FLT_EPSILON__ 1.19209289550781250000000000000000000e-7F\n#define __ABM__ 1\n#define __SHRT_WIDTH__ 16\n#define __FLT32_IS_IEC_60559__ 2\n#define __LDBL_MIN__ 3.36210314311209350626267781732175260e-4932L\n#define __STDC_UTF_16__ 1\n#define __DBL_IS_IEC_60559__ 2\n#define __DEC32_MAX__ 9.999999E96DF\n#define __FLT64X_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951F64x\n#define __FP_FAST_FMA 1\n#define __CRC32__ 1\n#define __FLT32X_HAS_INFINITY__ 1\n#define __INT32_MAX__ 0x7fffffff\n#define __unix__ 1\n#define __INT_WIDTH__ 32\n#define __SIZEOF_LONG__ 8\n#define __STDC_IEC_559__ 1\n#define __STDC_ISO_10646__ 201706L\n#define __UINT16_C(c) c\n#define __DECIMAL_DIG__ 21\n#define __STDC_IEC_559_COMPLEX__ 1\n#define __FLT64_EPSILON__ 2.22044604925031308084726333618164062e-16F64\n#define __DBL_DIG__ 15\n#define __gnu_linux__ 1\n#define __FLT128_IS_IEC_60559__ 2\n#define __FLT64X_MIN_10_EXP__ (-4931)\n#define __LDBL_HAS_QUIET_NAN__ 1\n#define __FLT64_MANT_DIG__ 53\n#define __FLT_MIN__ 1.17549435082228750796873653722224568e-38F\n#define __FLT64X_MANT_DIG__ 64\n#define __GNUC__ 11\n#define __pie__ 2\n#define __MMX__ 1\n#define __FLT_HAS_DENORM__ 1\n#define __SIZEOF_LONG_DOUBLE__ 16\n#define __XSAVEOPT__ 1\n#define __BIGGEST_ALIGNMENT__ 32\n#define __PRFCHW__ 1\n#define __FLT64_MAX_10_EXP__ 308\n#define 
__DBL_MAX__ ((double)1.79769313486231570814527423731704357e+308L)\n#define __INT_FAST32_MAX__ 0x7fffffffffffffffL\n#define __DBL_HAS_INFINITY__ 1\n#define __SSE4_2__ 1\n#define __SIZEOF_FLOAT__ 4\n#define __DEC32_MIN_EXP__ (-94)\n#define __INTPTR_WIDTH__ 64\n#define __FLT64X_HAS_INFINITY__ 1\n#define __UINT_LEAST32_MAX__ 0xffffffffU\n#define __FLT32X_HAS_DENORM__ 1\n#define __INT_FAST16_TYPE__ long int\n#define __MMX_WITH_SSE__ 1\n#define __LDBL_HAS_DENORM__ 1\n#define __FLT_DIG__ 6\n#define __FLT128_HAS_INFINITY__ 1\n#define __DEC32_MIN__ 1E-95DF\n#define __POPCNT__ 1\n#define __DBL_MAX_EXP__ 1024\n#define __WCHAR_WIDTH__ 32\n#define __FLT32_MAX__ 3.40282346638528859811704183484516925e+38F32\n#define __DEC128_EPSILON__ 1E-33DL\n#define __SSE2_MATH__ 1\n#define __ATOMIC_HLE_RELEASE 131072\n#define __PTRDIFF_MAX__ 0x7fffffffffffffffL\n#define __FLT128_MAX_EXP__ 16384\n#define __amd64 1\n#define __AVX__ 1\n#define __LONG_LONG_MAX__ 0x7fffffffffffffffLL\n#define __SIZEOF_SIZE_T__ 8\n#define __LZCNT__ 1\n#define __FLT64X_MIN_EXP__ (-16381)\n#define __SIZEOF_WINT_T__ 4\n#define __LONG_LONG_WIDTH__ 64\n#define __FLT32_MAX_EXP__ 128\n#define __GXX_ABI_VERSION 1016\n#define __RTM__ 1\n#define __FLT_MIN_EXP__ (-125)\n#define __GCC_HAVE_DWARF2_CFI_ASM 1\n#define __core_avx2 1\n#define __INT16_MAX__ 0x7fff\n#define __x86_64 1\n#define __INT_FAST64_TYPE__ long int\n#define __FP_FAST_FMAF 1\n#define __FLT64_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F64\n#define __DBL_MIN__ ((double)2.22507385850720138309023271733240406e-308L)\n#define __PCLMUL__ 1\n#define __FLT128_EPSILON__ 1.92592994438723585305597794258492732e-34F128\n#define __FLT64X_NORM_MAX__ 1.18973149535723176502126385303097021e+4932F64x\n#define __SIZEOF_POINTER__ 8\n#define __F16C__ 1\n#define __LP64__ 1\n#define __DBL_HAS_QUIET_NAN__ 1\n#define __FLT32X_EPSILON__ 2.22044604925031308084726333618164062e-16F32x\n#define __DECIMAL_BID_FORMAT__ 1\n#define __FLT64_MIN_EXP__ (-1021)\n#define 
__FLT64_MIN_10_EXP__ (-307)\n#define __FLT64X_DECIMAL_DIG__ 21\n#define __DEC128_MIN__ 1E-6143DL\n#define __REGISTER_PREFIX__ \n#define __UINT16_MAX__ 0xffff\n#define __DBL_HAS_DENORM__ 1\n#define __LDBL_HAS_INFINITY__ 1\n#define __FLT32_MIN__ 1.17549435082228750796873653722224568e-38F32\n#define __UINT8_TYPE__ unsigned char\n#define __XSAVE__ 1\n#define __NO_INLINE__ 1\n#define __DEC_EVAL_METHOD__ 2\n#define __DEC128_MAX__ 9.999999999999999999999999999999999E6144DL\n#define __FLT32X_MAX_10_EXP__ 308\n#define __LDBL_DECIMAL_DIG__ 21\n#define __VERSION__ \"11.4.0\"\n#define __UINT64_C(c) c ## UL\n#define __FMA__ 1\n#define _STDC_PREDEF_H 1\n#define __INT_LEAST32_MAX__ 0x7fffffff\n#define __GCC_ATOMIC_INT_LOCK_FREE 2\n#define __FLT32_MANT_DIG__ 24\n#define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__\n#define __STDC_IEC_60559_COMPLEX__ 201404L\n#define __ATOMIC_HLE_ACQUIRE 65536\n#define __FLT128_HAS_DENORM__ 1\n#define __FLT32_DECIMAL_DIG__ 9\n#define __FLT128_DIG__ 33\n#define __INT32_C(c) c\n#define __DEC64_EPSILON__ 1E-15DD\n#define __ORDER_PDP_ENDIAN__ 3412\n#define __DEC128_MIN_EXP__ (-6142)\n#define __INT_FAST32_TYPE__ long int\n#define __UINT_LEAST16_TYPE__ short unsigned int\n#define unix 1\n#define __SIZE_TYPE__ long unsigned int\n#define __UINT64_MAX__ 0xffffffffffffffffUL\n#define __FLT_IS_IEC_60559__ 2\n#define __GNUC_WIDE_EXECUTION_CHARSET_NAME \"UTF-32LE\"\n#define __FLT64X_DIG__ 18\n#define __INT8_TYPE__ signed char\n#define __ELF__ 1\n#define __GCC_ASM_FLAG_OUTPUTS__ 1\n#define __UINT32_TYPE__ unsigned int\n#define __FLT_RADIX__ 2\n#define __INT_LEAST16_TYPE__ short int\n#define __LDBL_EPSILON__ 1.08420217248550443400745280086994171e-19L\n#define __UINTMAX_C(c) c ## UL\n#define __SSE_MATH__ 1\n#define __FLT32X_MIN__ 2.22507385850720138309023271733240406e-308F32x\n#define __SIG_ATOMIC_MAX__ 0x7fffffff\n#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2\n#define __STDC_IEC_60559_BFP__ 201404L\n#define __SIZEOF_PTRDIFF_T__ 8\n#define __haswell 1\n#define 
__RDSEED__ 1\n#define __BMI__ 1\n#define __LDBL_DIG__ 18\n#define __FLT64_IS_IEC_60559__ 2\n#define __x86_64__ 1\n#define __FLT32X_MIN_EXP__ (-1021)\n#define __DEC32_SUBNORMAL_MIN__ 0.000001E-95DF\n#define __INT_FAST16_MAX__ 0x7fffffffffffffffL\n#define __FLT64_DIG__ 15\n#define __UINT_FAST32_MAX__ 0xffffffffffffffffUL\n#define __UINT_LEAST64_TYPE__ long unsigned int\n#define __FLT_HAS_QUIET_NAN__ 1\n#define __FLT_MAX_10_EXP__ 38\n#define __LONG_MAX__ 0x7fffffffffffffffL\n#define __FLT64X_HAS_DENORM__ 1\n#define __DEC128_SUBNORMAL_MIN__ 0.000000000000000000000000000000001E-6143DL\n#define __FLT_HAS_INFINITY__ 1\n#define __GNUC_EXECUTION_CHARSET_NAME \"UTF-8\"\n#define __UINT_FAST16_TYPE__ long unsigned int\n#define __DEC64_MAX__ 9.999999999999999E384DD\n#define __INT_FAST32_WIDTH__ 64\n#define __tune_core_avx2__ 1\n#define __CHAR16_TYPE__ short unsigned int\n#define __PRAGMA_REDEFINE_EXTNAME 1\n#define __SIZE_WIDTH__ 64\n#define __SEG_FS 1\n#define __INT_LEAST16_MAX__ 0x7fff\n#define __DEC64_MANT_DIG__ 16\n#define __INT64_MAX__ 0x7fffffffffffffffL\n#define __SEG_GS 1\n#define __FLT32_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F32\n#define __SIG_ATOMIC_WIDTH__ 32\n#define __INT_LEAST64_TYPE__ long int\n#define __INT16_TYPE__ short int\n#define __INT_LEAST8_TYPE__ signed char\n#define __STDC_VERSION__ 201710L\n#define __SIZEOF_INT__ 4\n#define __DEC32_MAX_EXP__ 97\n#define __INT_FAST8_MAX__ 0x7f\n#define __FLT128_MAX__ 1.18973149535723176508575932662800702e+4932F128\n#define __INTPTR_MAX__ 0x7fffffffffffffffL\n#define linux 1\n#define __AVX2__ 1\n#define __FLT64_HAS_QUIET_NAN__ 1\n#define __FLT32_MIN_10_EXP__ (-37)\n#define __SSSE3__ 1\n#define __FLT32X_DIG__ 15\n#define __RDRND__ 1\n#define __PTRDIFF_WIDTH__ 64\n#define __LDBL_MANT_DIG__ 64\n#define __FLT64_HAS_INFINITY__ 1\n#define __FLT64X_MAX__ 1.18973149535723176502126385303097021e+4932F64x\n#define __SIG_ATOMIC_MIN__ (-__SIG_ATOMIC_MAX__ - 1)\n#define __code_model_small__ 1\n#define 
__GCC_ATOMIC_LONG_LOCK_FREE 2\n#define __DEC32_MANT_DIG__ 7\n#define __INTPTR_TYPE__ long int\n#define __UINT16_TYPE__ short unsigned int\n#define __WCHAR_TYPE__ int\n#define __pic__ 2\n#define __UINTPTR_MAX__ 0xffffffffffffffffUL\n#define __INT_FAST64_WIDTH__ 64\n#define __INT_FAST64_MAX__ 0x7fffffffffffffffL\n#define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1\n#define __FLT_NORM_MAX__ 3.40282346638528859811704183484516925e+38F\n#define __FLT32_HAS_INFINITY__ 1\n#define __FLT64X_MAX_EXP__ 16384\n#define __UINT_FAST64_TYPE__ long unsigned int\n#define __INT_MAX__ 0x7fffffff\n#define __linux__ 1\n#define __INT64_TYPE__ long int\n#define __FLT_MAX_EXP__ 128\n#define __ORDER_BIG_ENDIAN__ 4321\n#define __DBL_MANT_DIG__ 53\n#define __SIZEOF_FLOAT128__ 16\n#define __INT_LEAST64_MAX__ 0x7fffffffffffffffL\n#define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2\n#define __FP_FAST_FMAF32 1\n#define __DEC64_MIN__ 1E-383DD\n#define __WINT_TYPE__ unsigned int\n#define __UINT_LEAST32_TYPE__ unsigned int\n#define __SIZEOF_SHORT__ 2\n#define __FLT32_NORM_MAX__ 3.40282346638528859811704183484516925e+38F32\n#define __SSE__ 1\n#define __LDBL_MIN_EXP__ (-16381)\n#define __FLT64_MAX__ 1.79769313486231570814527423731704357e+308F64\n#define __WINT_WIDTH__ 32\n#define __FP_FAST_FMAF64 1\n#define __INT_LEAST8_MAX__ 0x7f\n#define __INT_LEAST64_WIDTH__ 64\n#define __LDBL_MAX_EXP__ 16384\n#define __SIZEOF_INT128__ 16\n#define __FLT64X_IS_IEC_60559__ 2\n#define __LDBL_MAX_10_EXP__ 4932\n#define __ATOMIC_RELAXED 0\n#define __DBL_EPSILON__ ((double)2.22044604925031308084726333618164062e-16L)\n#define __FLT32_MIN_EXP__ (-125)\n#define __FLT128_MIN__ 3.36210314311209350626267781732175260e-4932F128\n#define _LP64 1\n#define __UINT8_C(c) c\n#define __FLT64_MAX_EXP__ 1024\n#define __INT_LEAST32_TYPE__ int\n#define __SIZEOF_WCHAR_T__ 4\n#define __UINT64_TYPE__ long unsigned int\n#define __GNUC_PATCHLEVEL__ 0\n#define __FLT128_NORM_MAX__ 1.18973149535723176508575932662800702e+4932F128\n#define __amd64__ 1\n#define 
__FLT64_NORM_MAX__ 1.79769313486231570814527423731704357e+308F64\n#define __FLT128_HAS_QUIET_NAN__ 1\n#define __INTMAX_MAX__ 0x7fffffffffffffffL\n#define __INT_FAST8_TYPE__ signed char\n#define __FLT64X_MIN__ 3.36210314311209350626267781732175260e-4932F64x\n#define __GNUC_STDC_INLINE__ 1\n#define __FLT64_HAS_DENORM__ 1\n#define __FLT32_EPSILON__ 1.19209289550781250000000000000000000e-7F32\n#define __FP_FAST_FMAF32x 1\n#define __DBL_DECIMAL_DIG__ 17\n#define __STDC_UTF_32__ 1\n#define __INT_FAST8_WIDTH__ 8\n#define __FXSR__ 1\n#define __FLT32X_MAX__ 1.79769313486231570814527423731704357e+308F32x\n#define __DBL_NORM_MAX__ ((double)1.79769313486231570814527423731704357e+308L)\n#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__\n#define __INTMAX_WIDTH__ 64\n#define __UINT32_C(c) c ## U\n#define __FLT_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F\n#define __INT8_MAX__ 0x7f\n#define __LONG_WIDTH__ 64\n#define __PIC__ 2\n#define __UINT_FAST32_TYPE__ long unsigned int\n#define __FLT32X_NORM_MAX__ 1.79769313486231570814527423731704357e+308F32x\n#define __CHAR32_TYPE__ unsigned int\n#define __FLT_MAX__ 3.40282346638528859811704183484516925e+38F\n#define __SSE2__ 1\n#define __INT32_TYPE__ int\n#define __SIZEOF_DOUBLE__ 8\n#define __FLT_MIN_10_EXP__ (-37)\n#define __FLT_MANT_DIG__ 24\n#define __FLT64_MIN__ 2.22507385850720138309023271733240406e-308F64\n#define __INT_LEAST32_WIDTH__ 32\n#define __INTMAX_TYPE__ long int\n#define __DEC128_MAX_EXP__ 6145\n#define __FSGSBASE__ 1\n#define __FLT32X_HAS_QUIET_NAN__ 1\n#define __ATOMIC_CONSUME 1\n#define __GNUC_MINOR__ 4\n#define __INT_FAST16_WIDTH__ 64\n#define __UINTMAX_MAX__ 0xffffffffffffffffUL\n#define __PIE__ 2\n#define __FLT32X_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F32x\n#define __DBL_MAX_10_EXP__ 308\n#define __LDBL_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951L\n#define __BMI2__ 1\n#define __INT16_C(c) c\n#define __STDC__ 1\n#define __AES__ 1\n#define __PTRDIFF_TYPE__ long int\n#define 
__haswell__ 1\n#define __DEC64_MIN_EXP__ (-382)\n#define __ATOMIC_SEQ_CST 5\n#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 1\n#define __ADX__ 1\n#define __FLT32X_MIN_10_EXP__ (-307)\n#define __UINTPTR_TYPE__ long unsigned int\n#define __DEC64_SUBNORMAL_MIN__ 0.000000000000001E-383DD\n#define __DEC128_MANT_DIG__ 34\n#define __LDBL_MIN_10_EXP__ (-4931)\n#define __SIZEOF_LONG_LONG__ 8\n#define __HAVE_SPECULATION_SAFE_VALUE 1\n#define __FLT128_DECIMAL_DIG__ 36\n#define __GCC_ATOMIC_LLONG_LOCK_FREE 2\n#define __FLT32_HAS_QUIET_NAN__ 1\n#define __FLT_DECIMAL_DIG__ 9\n#define __UINT_FAST16_MAX__ 0xffffffffffffffffUL\n#define __LDBL_NORM_MAX__ 1.18973149535723176502126385303097021e+4932L\n#define __GCC_ATOMIC_SHORT_LOCK_FREE 2\n#define __SSE3__ 1\n#define __UINT_FAST8_TYPE__ unsigned char\n#define __ATOMIC_ACQ_REL 4\n#define __ATOMIC_RELEASE 3"
x86features = {x86sse2, x86ssse3, x86bmi2}
proc cumsum(c: ptr UncheckedArray[uint8]; n: uint) {....raises: [], tags: [].}
proc cumsum(c: ptr UncheckedArray[uint16]; n: uint) {....raises: [], tags: [].}
proc cumsum(c: ptr UncheckedArray[uint32]; n: uint) {....raises: [], tags: [].}
This module provides a directly indexed set/table representation via a dense seq[T] with an auxiliary sparse direct index to accelerate searching. "direct" means the same size as the key space or "alphabet". The index says what seq index has each (unique) key. This can do any unordered set op in guaranteed unit time cost per element -- make, insert, delete, member query, iterate, union/intersect/etc. -- all with arbitrarily "aged" sets.
+The catch is that the "unit" in said cost will only be small if the key space is small enough to afford allocating index space & if the working set of the index fits into low enough latency memory to not be too slow. Otherwise, other data structures like hash tables & B-trees will outperform this. Iteration is always in insertion order whenever no deletes have occurred. The "unit" is also 2 memory accesses per operation, vs. often 1 for lptabz. So, very large scale can make this guaranteed to be ~2X slower than a good average case for lptabz, all depending upon exact requirements, of course.
+The earliest reference I have elaborating the properties of this approach is An Efficient Representation for Sparse Sets by Preston Briggs & Linda Torczon from Rice in 1993. It's simple enough that the idea may date back to early 1960s DB work (likely by Codd), maybe under a term like "direct indexing". The key type here must have an available conversion to int. Duplicate keys are not allowed for this one.
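The dense array plus direct index idea from Briggs & Torczon can be sketched for int keys as follows. `SparseSet` and `initSparseSet` are hypothetical names used only for illustration; this is not the DITab implementation and handles sets only, not tables.

```nim
type SparseSet = object
  dense: seq[int]            # member keys, in insertion order
  sparse: seq[int]           # sparse[key] = position of key in dense

proc initSparseSet(universe: int): SparseSet =
  ## The index is as large as the key space ("direct").
  SparseSet(sparse: newSeq[int](universe))

proc contains(s: SparseSet; k: int): bool =
  ## Two memory accesses: the index entry, then the dense slot it names.
  let i = s.sparse[k]
  i < s.dense.len and s.dense[i] == k

proc incl(s: var SparseSet; k: int) =
  if k notin s:
    s.sparse[k] = s.dense.len
    s.dense.add k

proc excl(s: var SparseSet; k: int) =
  ## Move the last member into the hole so dense stays contiguous.
  if k in s:
    let i = s.sparse[k]
    let last = s.dense[^1]
    s.dense[i] = last
    s.sparse[last] = i
    s.dense.setLen s.dense.len - 1
```

Every operation is a fixed number of array accesses, and the swap in `excl` is why insertion order is only preserved while no deletes have occurred.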
+Below here is pretty standard except for the generic signatures + +diGrowPow2 = 0
diInitialSize = 0
diRobinHood = false
proc containsOrIncl[K](t: var DITab[K, void]; key: K): bool {.inline.}
proc del[K, V](t: var DITab[K, V]; key: K) {.inline.}
proc depthStats[K, V](s: DITab[K, V]): tuple[m1, m2: float, mx: int]
proc difference[K](s1, s2: DITab[K, void]): DITab[K, void]
proc getOrDefault[K, V](t: DITab[K, V]; key: K; default = default(V)): V {. + inline.}
proc hasKeyOrPut[K, V](t: var DITab[K, V]; key: K; val: V): bool {.inline.}
proc inc[K, V: SomeInteger](t: var DITab[K, V]; key: K; amount: SomeInteger = 1) {. + inline.}
proc indexBy[A, K, V](collection: A; index: proc (x: V): K): DITab[K, V]
proc init[K, V](t: var DITab[K, V]; initialSize = 0; numer = 0; denom = 0; + minFree = 0; growPow2 = 0; rehash = false; robinhood = false) {. + inline.}
proc initDISet[K](initialSize = 0; numer = diNumer; denom = diDenom; + minFree = diMinFree; growPow2 = diGrowPow2; rehash = diRehash; + robinhood = diRobinHood): DISet[K] {.inline.}
proc initDITab[K, V](initialSize = 0; numer = 0; denom = 0; minFree = 0; + growPow2 = 0; rehash = false; robinhood = false): DITab[K, + V] {.inline.}
proc intersection[K](s1, s2: DITab[K, void]): DITab[K, void]
proc map[K, A](data: DITab[K, void]; op: proc (x: K): A {.closure.}): DITab[A, + void]
proc mgetOrPut[K, V](t: var DITab[K, V]; key: K; val: V): var V {.inline.}
proc mgetOrPut[K, V](t: var DITab[K, V]; key: K; val: V; had: var bool): var V {. + inline.}
proc missingOrExcl[K, V](t: var DITab[K, V]; key: K): bool
proc nthPair[K, V: not void](t: DITab[K, V]; n: int): (K, V) {.inline.}
proc nthPair[K, V: not void](t: var DITab[K, V]; n: int): (K, ptr V) {.inline.}
proc pop[K, V: not void](t: var DITab[K, V]; key: K; val: var V): bool {.inline.}
proc rightSize(count: int; numer = 0; denom = 0; minFree = 0): int {.inline, + ...deprecated: "Deprecated since 0.2; identity function", raises: [], tags: [].}
proc setPolicy[K, V](t: var DITab[K, V]; numer = 0; denom = 0; minFree = 0; + growPow2 = 0; rehash = 0; robinhood = 0) {.inline.}
proc symmetricDifference[K](s1, s2: DITab[K, void]): DITab[K, void]
proc take[K, V: not void](t: var DITab[K, V]; key: K; val: var V): bool {.inline.}
LgHisto is a simple application of BISTs to log-spaced histograms that can yield efficient, dynamic quantiles. log-spacing supports high dynamic range without inordinate cost while Fenwick/BIST supports dynamic membership with operation-balanced perf.
+Quantile error is absolute {not relative to q*(1-q) like a t-Digest} & easily bounded as <~ 1/2 bin width {about 10^(lg(b/a)/n/2)}. So, if you need 3 places or your data is clustered within a few orders of magnitude then you can likely just use 1e4 bins and your counters will remain very L1 cache resident, depending upon resource competition. Cache is the main cost Re: speed. Re: space, since 99% of bins are 0 in many cases, net/disk transfer cost can be improved via simple run-length encoding.
+The way Fenwick BISTs work, the generic parameter C must be a wide enough integer type to hold both elemental bin counts AND grand totals. uint32 is likely enough for many applications, though some might sneak by with uint16 and a few might need uint64. This scales bin size/space cost.
+t-Digests are a well marketed competitor using ~10X less space BUT with >>5X slower quantiles of similar accuracy. Actual cost is sensitive to operation mixes. { This may change, but present t-Digest impls, even with trees, do a linear scan for quantiles/CDFs. None even offer "batch" APIs to do N quantiles in one such scan. "Histo B-trees" should allow better scaling for such. } A BIST basis differs from t-Digests in other important ways. First, BISTs are well suited for pop or moving data window operations with strict finite memory, e.g. for translation of full streams to moving quantiles as in Bollinger Band style smooths. Second, floating point weights for EWMA-like decaying memory are not possible since inexact FP arithmetic kind of breaks BISTs.
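The combination described above (equal-width bins in ln(x), counts kept in a Fenwick/BIST array, quantiles by binary descent) can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not the LgHisto implementation: the type `LogHist` and all proc names except the general idea are hypothetical, only positive data is handled, and out-of-range values are simply clamped to the edge bins.

```nim
import std/math

type LogHist = object      # hypothetical toy type, not adix's LgHisto
  a, b: float              # covered range; requires 0 < a < b
  tree: seq[uint32]        # 1-based Fenwick/BIST array; slot 0 is unused

proc initLogHist(a, b: float; n: int): LogHist =
  LogHist(a: a, b: b, tree: newSeq[uint32](n + 1))

proc nBins(h: LogHist): int = h.tree.len - 1

proc toIx(h: LogHist; x: float): int =
  ## 0-based bin of x; bins have equal width in ln(x). Out-of-range x clamps.
  if x <= h.a: return 0
  if x >= h.b: return h.nBins - 1
  let f = ln(x / h.a) / ln(h.b / h.a)
  result = min(int(f * float(h.nBins)), h.nBins - 1)

proc binLo(h: LogHist; i: int): float =
  ## Lower edge of bin i (which is also the upper edge of bin i-1).
  h.a * pow(h.b / h.a, float(i) / float(h.nBins))

proc add(h: var LogHist; x: float) =
  ## Count x: O(lg n) Fenwick point update.
  var i = h.toIx(x) + 1
  while i < h.tree.len:
    h.tree[i] += 1'u32
    i += (i and -i)          # next node whose range covers this slot

proc pop(h: var LogHist; x: float) =
  ## Un-count a previously added x (e.g. as it leaves a moving window).
  var i = h.toIx(x) + 1
  while i < h.tree.len:
    h.tree[i] -= 1'u32
    i += (i and -i)

proc total(h: LogHist): uint32 =
  ## Grand total = prefix sum over all bins.
  var i = h.nBins
  while i > 0:
    result += h.tree[i]
    i -= (i and -i)

proc quantile(h: LogHist; q: float): float =
  ## Geometric midpoint of the bin holding the ceil(q*total)-th smallest item.
  let n = h.total
  if n == 0: return NaN
  var rem = min(n, max(1'u32, uint32(ceil(q * float(n)))))
  var pos = 0
  var step = 1
  while step * 2 <= h.nBins: step *= 2
  while step > 0:            # binary descent over the implicit tree
    if pos + step <= h.nBins and h.tree[pos + step] < rem:
      pos += step
      rem -= h.tree[pos]
    step = step shr 1
  sqrt(h.binLo(pos) * h.binLo(pos + 1))

when isMainModule:
  var h = initLogHist(1.0, 1e4, 4000)
  for i in 1 .. 100: h.add(float(i))
  echo h.quantile(0.5)       # close to 50, within about half a bin width
```

Because `add`, `pop` and `quantile` all cost O(lg n) in the bin count, a moving-window median only needs one `add` and one `pop` per step.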
+ + +proc `$`[C](s: LgHisto[C]; nonZero = true): string
func add[F, C](s: var LgHisto[C]; x: F; w = 1.C)
func binAB[F, C](s: LgHisto[C]; x: F): (float, float)
func cdf[F, C](s: LgHisto[C]; x: F): C
func fromIx[F, C](s: LgHisto[C]; i: int; offset: F = 0.5): F
func init[C](s: var LgHisto[C]; a = 1e-16; b = 1e+20; n = 8300)
func initLgHisto[C](a = 1e-16; b = 1e+20; n = 8300): LgHisto[C]
func merge[C](dst: var LgHisto[C]; src: LgHisto[C])
func pop[F, C](s: var LgHisto[C]; x: F; w = 1.C)
func quantile[F, C](s: LgHisto[C]; q: F): F
func space[C](s: LgHisto[C]): int
func toIx[F, C](s: LgHisto[C]; x: F): int
func underflows[C](s: LgHisto[C]): int
This module provides an (in|un)ordered multiset/multitable representation via Linear Probing with aging-friendly Backshift Delete (Knuth TAOCPv3) & optional Robin Hood re-org (Celis, 1986). Linear probing collision clusters yield "no tuning needed" locality of reference: 1 DRAM hit per access for large tables of small items. RH sorts collision clusters by search depth, which adds nice properties: faster miss search (e.g. for inserts, usually compensating for data motion) and min depth variance (no unlucky keys). The latter enables ~100% table space utilization (we need one empty slot to halt some loops).
+Misuse/attack is always possible. Note that inserting many duplicates causes overlong scans, just as hash collisions can, and is thus "misuse". If this is likely then use V=seq[T] instead. We provide a few mitigations.
+Program-wide-tunable defaults are to rehash, warn, re-org & salt with vmAddr since this is the safest portable mode, but most can also be set at init time.
+MultiSET personality ensues when the V value type generic parameter is void. Otherwise the style of interface is multiTABLE. Every attempt is made for either personality to be drop-in compatible with Nim's standard library sets & tables, but extra features are provided here.
+Space-time optimization of a sentinel key (a value of K disallowed for ordinary keys) is supported through the final two generic parameters, Z and z. If Z is void, hash codes are saved and z is ignored. If Z==K, z is the sentinel key value.
+If Z is neither K nor void then compact, insertion-ordered mode is used and z means how many bits of hcode are saved beside an index into a dense seq[(K,V)]. 6..8 bits avoids most "double cache misses" for miss lookups/inserts. z=0 works if space matters more than time.
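To make the "Linear Probing with Backshift Delete" idea concrete, here is a minimal sketch of a fixed-capacity integer set. Everything in it is an assumption for illustration and not the LPTabz code: the names `LpSet`, `Slot`, `home`, `find`, `incl` and `excl` are hypothetical, the "hash" is just the key masked to the table size (so collisions are easy to provoke), there is no growth, no Robin Hood re-org and no duplicate keys.

```nim
type
  Slot = object
    used: bool
    key: int
  LpSet = object             # hypothetical toy; capacity fixed, power of 2
    slots: seq[Slot]
    count: int

proc initLpSet(cap = 16): LpSet =
  LpSet(slots: newSeq[Slot](cap))

proc home(s: LpSet; key: int): int =
  ## Identity "hash" reduced by `and` mask (an assumption to keep this short).
  key and (s.slots.len - 1)

proc find(s: LpSet; key: int): int =
  ## Slot index of key or -1. The scan halts at the first empty slot, which
  ## is why at least one slot must always stay empty.
  result = -1
  var i = s.home(key)
  while s.slots[i].used:
    if s.slots[i].key == key: return i
    i = (i + 1) and (s.slots.len - 1)

proc incl(s: var LpSet; key: int) =
  if s.find(key) >= 0: return
  doAssert s.count < s.slots.len - 1, "toy table does not grow"
  var i = s.home(key)
  while s.slots[i].used:
    i = (i + 1) and (s.slots.len - 1)
  s.slots[i] = Slot(used: true, key: key)
  inc s.count

proc excl(s: var LpSet; key: int) =
  ## Backshift delete: close the hole by shifting later cluster members back,
  ## so no tombstones accumulate as the table ages.
  var i = s.find(key)
  if i < 0: return
  let mask = s.slots.len - 1
  var j = i
  while true:
    j = (j + 1) and mask
    if not s.slots[j].used: break
    let h = s.home(s.slots[j].key)
    # Slot j may move into hole i only if i lies on its probe path h..j,
    # i.e. its probe distance is at least the distance from i to j.
    if ((j - h) and mask) >= ((j - i) and mask):
      s.slots[i] = s.slots[j]
      i = j
  s.slots[i] = Slot()
  dec s.count

when isMainModule:
  var s = initLpSet(8)
  for k in [1, 9, 17, 2]: s.incl(k)   # 1, 9, 17 all have home slot 1
  s.excl(9)
  echo s.find(17), " ", s.find(2)     # both shifted back by one slot
```

After `excl`, every remaining key is still reachable by a plain forward scan from its home slot, which is the invariant that lets lookups stop at the first empty slot.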
+ + +LPTabz[K; V; Z; z] = object + when Z is K or Z is void: + data: seq[HCell[K, V, Z, z]] + count: int + + else: + data: seq[HCell[K, V, Z, z]] + idx: SeqUint + hcBits: uint8 + + numer, denom, minFree, growPow2, pow2: uint8 + rehash, robin: bool + salt: Hash +
lpGrowPow2 = 1
lpInitialSize = 2
lpRobinHood = false
proc `$`[K, V: not void; Z; z: static int](t: LPTabz[K, V, Z, z]): string
proc `*`[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K, void, Z, + z]
proc `+`[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K, void, Z, + z]
proc `-+-`[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K, void, + Z, z]
proc `-`[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K, void, Z, + z]
proc `==`[K, V: not void; Z; z: static int](x, y: LPTabz[K, V, Z, z]): bool
proc `[]=`[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; val: V) {. + inline.}
proc `[]`[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K): V {.inline.}
proc `[]`[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K): var V {. + inline.}
proc `{}=`[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; val: V) {. + inline.}
proc `{}`[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K): V {.inline.}
proc `{}`[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K): var V {. + inline.}
proc add[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; val: V) {. + inline.}
proc add[K, Z; z: static int](t: var LPTabz[K, void, Z, z]; key: K) {.inline.}
proc allValues[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K; + vals: var seq[V]): bool {.inline.}
proc card[K, Z; z: static int](s: LPTabz[K, void, Z, z]): int {.inline.}
proc contains[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K): bool {. + inline.}
proc containsOrIncl[K, Z; z: static int](t: var LPTabz[K, void, Z, z]; key: K): bool {. + inline.}
proc debugDump[K, V, Z; z: static int](s: LPTabz[K, V, Z, z]; label = "")
proc del[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K) {.inline.}
proc depths[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]): seq[int]
proc depthStats[K, V, Z; z: static int](s: LPTabz[K, V, Z, z]): tuple[ + m1, m2: float, mx: int]
proc difference[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K, + void, Z, z]
proc disjoint[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): bool
proc editKey[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; old, new: K)
proc excl[K, Z; z: static int](s: var LPTabz[K, void, Z, z]; + other: LPTabz[K, void, Z, z])
proc excl[K, Z; z: static int](s: var LPTabz[K, void, Z, z]; elt: K) {.inline.}
proc getCap[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]): int {.inline.}
proc getOrDefault[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K; + default = default(V)): V {.inline.}
proc hasKey[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K): bool {. + inline.}
proc hasKeyOrPut[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; + val: V): bool {.inline.}
proc inc[K, V, Z; z: static int](c: var LPTabz[K, V, Z, z]; key: K; + amount: V = 1) {.inline.}
proc incl[K, Z; z: static int](s: var LPTabz[K, void, Z, z]; + other: LPTabz[K, void, Z, z])
proc incl[K, Z; z: static int](s: var LPTabz[K, void, Z, z]; elt: K) {.inline.}
proc indexBy[A, K, V, Z; z: static int](collection: A; index: proc (x: V): K): LPTabz[ + K, V, Z, z]
proc init[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; + initialSize = lpInitialSize; numer = lpNumer; + denom = lpDenom; minFree = lpMinFree; + growPow2 = lpGrowPow2; rehash = lpRehash; + robinhood = lpRobinHood) {.inline.}
proc initLPSet[K](initialSize = lpInitialSize; numer = lpNumer; denom = lpDenom; + minFree = lpMinFree; growPow2 = lpGrowPow2; rehash = lpRehash; + robinhood = lpRobinHood): LPSet[K] {.inline.}
proc initLPSetz[K, Z; z: static int](initialSize = lpInitialSize; + numer = lpNumer; denom = lpDenom; + minFree = lpMinFree; growPow2 = lpGrowPow2; + rehash = lpRehash; robinhood = lpRobinHood): LPSetz[ + K, Z, z] {.inline.}
proc initLPTab[K, V](initialSize = lpInitialSize; numer = lpNumer; + denom = lpDenom; minFree = lpMinFree; + growPow2 = lpGrowPow2; rehash = lpRehash; + robinhood = lpRobinHood): LPTab[K, V] {.inline.}
proc initLPTabz[K, V, Z; z: static int](initialSize = lpInitialSize; + numer = lpNumer; denom = lpDenom; + minFree = lpMinFree; + growPow2 = lpGrowPow2; + rehash = lpRehash; + robinhood = lpRobinHood): LPTabz[K, V, + Z, z] {.inline.}
proc intersection[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K, + void, Z, z]
proc len[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]): int {.inline.}
proc load[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; path: string)
proc loadLPTabz[K, V, Z; z: static int](path: string): LPTabz[K, V, Z, z]
proc map[K, A, Z; z: static int](data: LPTabz[K, void, Z, z]; + op: proc (x: K): A {.closure.}): LPTabz[A, + void, Z, z]
proc merge[K, V, Z; z: static int](c: var LPTabz[K, V, Z, z]; + b: LPTabz[K, V, Z, z])
proc mgetOrPut[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; val: V): var V {. + inline.}
proc mgetOrPut[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; + val: V; had: var bool): var V {.inline.}
proc missingOrExcl[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K): bool
proc mmap[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; path: string)
proc nthKey[K, Z; z: static int](t: LPTabz[K, void, Z, z]; n: int): K {.inline.}
proc nthPair[K, V: not void; Z; z: static int](t: LPTabz[K, V, Z, z]; n: int): ( + K, V)
proc nthPair[K, V: not void; Z; z: static int](t: var LPTabz[K, V, Z, z]; n: int): ( + K, ptr V) {.inline.}
proc numItems[K, Z; z: static int](t: LPTabz[K, void, Z, z]; key: K): int {. + inline.}
proc pop[K, V: not void; Z; z: static int](t: var LPTabz[K, V, Z, z]): (K, V) {. + inline.}
proc pop[K, V: not void; Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; + val: var V): bool {.inline.}
proc pop[K, Z; z: static int](s: var LPTabz[K, void, Z, z]; key: var K): bool
proc pop[K, Z; z: static int](t: var LPTabz[K, void, Z, z]): K {.inline.}
proc rightSize(count: int; numer = 0; denom = 0; minFree = 0): int {.inline, + ...deprecated: "Deprecated since 0.2; identity function", raises: [], tags: [].}
proc save[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; pathStub: string)
proc setCap[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; newSize = -1)
proc setOrIncl[K, Z; z: static int](t: var LPTabz[K, void, Z, z]; key: K): bool {. + inline.}
proc setPolicy[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; + numer = lpNumer; denom = lpDenom; + minFree = lpMinFree; + growPow2 = lpGrowPow2; rehash = lpRehash; + robinhood = lpRobinHood) {.inline.}
proc symmetricDifference[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[ + K, void, Z, z]
proc take[K, V: not void; Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; + val: var V): bool {.inline.}
proc take[K, Z; z: static int](t: var LPTabz[K, void, Z, z]; key: var K): bool
proc toLPTabz[K; V: not void; Z; z: static int](pairs: openArray[(K, V)]; + dups = false): LPTabz[K, V, Z, z]
iterator allItems[K, Z; z: static int](s: LPTabz[K, void, Z, z]; key: K): K
iterator allValues[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K): V
iterator allValues[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; + vals: var seq[V]): K
iterator hcodes[K, Z; z: static int](s: LPTabz[K, void, Z, z]): (int, Hash)
iterator mitems[K, Z; z: static int](s: var LPTabz[K, void, Z, z]): var K
iterator mostCommon[K](xs: openArray[K]; n = 10): (K, int)
iterator mpairs[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]): (K, var V)
iterator mvalues[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]): var V
iterator numItems[K, Z; z: static int](t: LPTabz[K, void, Z, z]): (K, int)
iterator pairs[K, Z; z: static int](t: LPTabz[K, void, Z, z]): (int, K)
template editOrInit[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; + v, found, missing: untyped): untyped
template getItOrFail[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; k: K; + found, missing): untyped
template withIt[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; k: K; + edit, init): untyped
This module provides an easy way to do compile-time switched impl swaps for various table/set reprs with various compile-time switched defaults. You should really just learn how to use LPTabz[..] directly, though.
+ +proc initSet[K](sz = lpInitialSize; numer = lpNumer; denom = lpDenom; + minFree = lpMinFree; growPow2 = lpGrowPow2; rehash = rDefault; + robinHood = rhDefault): Set[K] {.inline.}
proc initTab[K, V](sz = lpInitialSize; numer = lpNumer; denom = lpDenom; + minFree = lpMinFree; growPow2 = lpGrowPow2; + rehash = rDefault; robinHood = rhDefault): Tab[K, V] {.inline.}
Summary stats built in running/online fashion (as std/stats) BUT over (maybe) MOVING data windows (via pop) and (sometimes) 50X faster & a million X more accurate. Speed-up comes from SIMD auto-vectorization in whole openArray[] calls (in --passC:-ffast-math|-Ofast backend modes) aided by the "shift" idea at en.wikipedia.org/wiki/Algorithms_for_calculating_variance (both simpler & faster than Welford). Both var & non-var overloads are provided to allow caching 1.0/n, which may be identical-but-expensive (e.g. reporting at each cycle of a 1 pop per push (so fixed n) window over data).
+Note: this all costs more in both time & space than exponentially weighted equivalents but has precise rather than infinite memory which can be nice. I.e., it can perfectly "forget" a large spike when it leaves a window.
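A minimal sketch of the shifted-data idea with push/pop may help here. It is an illustration under assumptions, not the MovingStat implementation: the type `WinStat` and its fields are hypothetical, the shift is simply the first value ever pushed, only the first two moments are kept, and `variance` is the population (divide by n) form.

```nim
type WinStat = object      # hypothetical toy type, not adix's MovingStat
  n: int                   # number of values currently in the window
  hasShift: bool
  dx: float                # shift; here just the first value ever pushed
  s1, s2: float            # sums of (x - dx) and (x - dx)^2 over the window

proc push(s: var WinStat; x: float) =
  if not s.hasShift:
    s.dx = x
    s.hasShift = true
  let d = x - s.dx         # shifting keeps the sums small, reducing cancellation
  s.s1 += d
  s.s2 += d * d
  inc s.n

proc pop(s: var WinStat; x: float) =
  ## Forget a value that was pushed earlier (it leaves the window).
  let d = x - s.dx
  s.s1 -= d
  s.s2 -= d * d
  dec s.n

proc mean(s: WinStat): float =
  s.dx + s.s1 / float(s.n)

proc variance(s: WinStat): float =
  ## Population variance of the current window.
  let m = s.s1 / float(s.n)
  s.s2 / float(s.n) - m * m

when isMainModule:
  let xs = [1e9 + 1.0, 1e9 + 2.0, 1e9 + 3.0, 1e9 + 4.0, 1e9 + 5.0]
  var s: WinStat
  for i, x in xs:          # window of the 3 most recent values
    s.push(x)
    if i >= 3: s.pop(xs[i - 3])
  echo s.mean, " ", s.variance
```

With a large common offset like 1e9, unshifted sums of squares would lose most of their significant digits, while the shifted sums here stay small and exact.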
+ + +BasicStats[F] = object + n*: int ## sample size + min*, max*, mean*, sdev*: F ## the usual suspects. +
MovingStat[F; C] = object + options*: set[Option] + n*, n4Inv: int ## amount of pushed data + nIn, dx, s1, s2, s3, s4: F + min*, max*: F + lgHisto*: LgHisto[C] +
proc `$`[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]): string
func `+=`[F](a: var BasicStats[F]; b: BasicStats[F])
func `+`[F](a, b: BasicStats[F]): BasicStats[F]
func add[F](a: var BasicStats[F]; b: BasicStats[F])
func basicStats[F: SomeFloat](xs: openArray[F]): BasicStats[F] {. + codegenDecl: "__attribute__((optimize(\"Ofast\", \"fast-math\"))) $# $#$#".}
func cdf[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]; x: float): float {. + inline.}
func clear[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]) {.inline.}
func init[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]; a = 1e-16; + b = 1e+20; n = 8300; + options: set[Option] = {}) {.inline.}
func initMovingStat[F: SomeFloat; C: SomeInteger](a = 1e-16; b = 1e+20; + n = 8300; options: set[Option] = {}): MovingStat[F, C] {.inline.}
func kurtosis[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]): float {.inline.}
func kurtosis[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]): float {. + inline.}
func kurtosis[T: SomeNumber](xs: openArray[T]; accum = 32): float
func kurtosisS[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]): float
func kurtosisS[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]): float
func kurtosisS[T: SomeNumber](xs: openArray[T]; accum = 32): float
func mean[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]): float {.inline.}
func mean[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]): float {.inline.}
func mean[T: SomeNumber](xs: openArray[T]): float {. + codegenDecl: "__attribute__((optimize(\"Ofast\", \"fast-math\"))) $# $#$#".}
func nInv[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]): F {.inline.}
func nInv[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]): F {.inline.}
func pop[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]; x: SomeNumber)
func push[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]; x: SomeNumber)
func quantile[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]; q: float): float {. + inline.}
func range[F: SomeFloat](xs: openArray[F]): (F, F) {. + codegenDecl: "__attribute__((optimize(\"Ofast\", \"fast-math\"))) $# $#$#".}
func skewness[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]): float {.inline.}
func skewness[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]): float {. + inline.}
func skewness[T: SomeNumber](xs: openArray[T]; accum = 32): float
func skewnessS[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]): float
func skewnessS[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]): float
func skewnessS[T: SomeNumber](xs: openArray[T]; accum = 32): float
func standardDeviation[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]): float {. + inline.}
func standardDeviation[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]): float {. + inline.}
func standardDeviation[T: SomeNumber](xs: openArray[T]; accum = 32): float
func standardDeviationS[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]): float
func standardDeviationS[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]): float
func standardDeviationS[T: SomeNumber](xs: openArray[T]; accum = 32): float
func stderror[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]): float {.inline.}
func stderror[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]): float {. + inline.}
func stderror[T: SomeNumber](xs: openArray[T]; accum = 32): float
func sum[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]): float {.inline.}
func variance[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]): float {.inline.}
func variance[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]): float {. + inline.}
func variance[T: SomeNumber](xs: openArray[T]; accum = 32): float {. + codegenDecl: "__attribute__((optimize(\"Ofast\", \"fast-math\"))) $# $#$#".}
func varianceS[F: SomeFloat; C: SomeInteger](s: MovingStat[F, C]): float
func varianceS[F: SomeFloat; C: SomeInteger](s: var MovingStat[F, C]): float
This numeric sort module encapsulates sorting by native numeric keys embedded at some offset inside Nim objects of any size (well, <= 256 bytes for char-keyed obs, but you should "tag sort" objects > 64B anyway). This kind of interface allows low overhead generality and enables algorithms specialized to number types. Such algorithms are often many times faster than comparison sorts. The algo is roughly counting sort for 1-Byte keys and for [248]Byte native type keys an LSD radix sort with optional transforms from signed|float domains. This implementation has several sadly rare optimizations.
+FIRST, total order and order by digit0 is checked in the first histogramming pass to decide if remaining work can be skipped. Hence, if run on an already sorted array, only one read-only pass over the data is done to confirm order. Similarly, items already sorted by digit0 induce no re-ordering write phase. Reverse order is also detected. These non-branching integer comparisons add little work and potentially save a lot.
+SECOND, only 2 counter buffers are ever needed at once - current pass write pointers & next pass read histogram. Using more wastes limited L1/L2 space and/or shrinks max histogram size (increasing number of passes). The buffers for these two "alternate" histograms just toggle across passes. (A several-histogram-at-once counting pass can also achieve no excess re-fetches, but at higher cache usage. Cache use is same for 2B keys, but only high whole byte constancy can yield skipped 2nd passes. The best 2B method depends on keys.)
+THIRD, histogram details are optimized. For moderate n, prefix summing histograms into cumulative distribution functions (really output bin offsets) is a dominant cost. So, this impl does SIMD parallel prefix sum. This optim is more effective as more counters fit into vector registers. So, this impl uses the smallest [1248]Byte counter needed for n items & takes care to align the 2 power of 2-sized counter buffers to maximize vector use. Last, a time cost estimate formula & Newton's method is used to decide pass sizes.
+FOURTH, bits that vary across keys are measured (mask or=(x[i-1] xor x[i])) in the first read-pass. Key-wise constant bits are zero in said mask. Since an array MUST be sorted by constant key bits, smart digit plans can skip them to maybe shrink pass count. pext32/pext64 from Intel's Advanced Bit Manipulation subset (Haswell 2014) are a cheap way to use the above mask. This optimization does not help bitwise non-constant data (other than ease of digit formation afforded by pext). However, this impl is structured so there is almost no added cost to access the potentially large savings. Work on digit plans is ongoing. This optimization is novel to my knowledge. It seems small for an academic paper (maybe optimal digit plans would add enough to discuss) but likely to have been mentioned offhand already. I'd cite a reference if I could and please cite me if you don't have one. How well this does depends strongly upon bit-entropy of keys. E.g., time(2) outputs, over-allocated 8B ints, 64-bit VM addrs & positive floats spanning 1..2 octaves may be constant above 12-24 LSbits. Some samples may even get sorted in just one pass! I'd bet much real-world f32 data is 2pass w/b0=12.
+One undone optimization is multi-threads for counting phases. This can boost L3->L1 read throughput by 2-9x (but since scattered writes are much slower than streaming reads, overall speed-up is likely limited). Another maybe useful optimization is saving transformed keys in the tmp arrays (but this also needs inverse transforms on the final pass & current xforms are already cheap compared to L2 or worse bandwidth costs. So, 5-20% boost, I'd guess.)
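The varying-bits mask from the FOURTH point can be illustrated with a much simplified LSD radix sort. This sketch is not the nsort implementation and makes several assumptions: the names `varyingBits` and `radixSort` are hypothetical, keys are bare `uint32` values rather than fields inside objects, digits are fixed 8-bit bytes, the mask is computed in its own pass, and only whole constant bytes are skipped (no pext-style compaction of scattered bits).

```nim
proc varyingBits(xs: openArray[uint32]): uint32 =
  ## OR of XORs of neighbours: a bit is set iff it differs somewhere in xs.
  for i in 1 ..< xs.len:
    result = result or (xs[i - 1] xor xs[i])

proc radixSort(xs: var seq[uint32]): int {.discardable.} =
  ## Stable LSD radix sort by bytes; returns how many passes actually ran.
  let mask = varyingBits(xs)
  var tmp = newSeq[uint32](xs.len)
  for pass in 0 ..< 4:
    let shift = pass * 8
    if ((mask shr shift) and 0xFF'u32) == 0:
      continue                       # this byte is constant: already "sorted"
    var count: array[257, int]       # zeroed anew on every pass
    for x in xs:
      inc count[int((x shr shift) and 0xFF'u32) + 1]
    for d in 1 .. 256:               # prefix sum turns counts into offsets
      count[d] += count[d - 1]
    for x in xs:
      let d = int((x shr shift) and 0xFF'u32)
      tmp[count[d]] = x
      inc count[d]
    swap(xs, tmp)                    # output of this pass feeds the next
    inc result

when isMainModule:
  var keys = @[0xAB000005'u32, 0xAB000001'u32, 0xAB0000FF'u32, 0xAB000003'u32]
  echo radixSort(keys), " pass(es): ", keys
```

Keys that differ only in their low byte, as in the demo, need a single counting pass instead of four.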
+ + +proc nsort(obs, tmp: pointer; n, sz: int; off: uint8; xf: XForm; b0 = 0): pointer {. + ...raises: [], tags: [].}
proc nsort[O, W](obs: var openArray[O]; off: W; xf = xfNone; b0 = 0)
template nsortBy(x, field: untyped; b0 = 0): untyped
Convenience template around nsort proc to reduce typing. b0 is the number of bits for the first pass histogram. 0 means use a good default.
+You can only nsort by one numeric field at a time, but sorts are stable. Do x.nsortBy foo; x.nsortBy bar to do a multi-level sort.
+import nsort
+var recs = @[(f: 3.0, age: 50), (f: 6.0, age: 30), (f: 5.0, age: 30)]
+recs.nsortBy f         # Multi-level sort by `age` then `f`. Invoke
+recs.nsortBy age       # ..in REVERSE order of usual comparison-order.
+for r in recs: echo r  # Right output: @[(5.0,30), (6.0,30), (3.0,50)]
template nsortByRaw(x, field: untyped; b0 = 0): untyped
template nsortByTag(x, field: untyped; b0 = 0): untyped
This module provides a memory optimized seq[uint] for a user-given range of numbers (by default its own initial length). E.g., if the range is 0..7, it uses just 3 bits per number (plus rounding error). Other pithy descriptions are "the array version of bit fields" | "the matrix version of bit vectors".
+In the best case, this allows packing numbers 8x (e.g., 8/1) more densely than a "next biggest CPU int rounded" approach (like 8,16,32,64). The public API uses uint, usually a 64-bit unsigned integer.
+To store n indices from 0..n-1 takes n*ceil(lg(n)) bits. E.g., circa 2020 L3 CPU caches have become large enough to support any permutation of 0..2^24-1 since 2^24*24/8 B = 48 MiB.
+Using the widest type for backing store and limiting the design to hold only numbers up to said wide type ensures <= 2 consecutive backing items are ever needed to access a given number.
+While dynamically growing a SeqUint works, changing element size doesn't. So, callers must t=initSeqUint(s.len, 1 shl (s.bits+1)) & copy as needed.
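The packing scheme described above can be sketched as follows. This is an assumption-laden toy, not the SeqUint implementation: `PackedSeq`, `initPackedSeq` and the field names are hypothetical, the backing word is fixed at `uint64`, element width is limited to 1..63 bits, and there is no bounds checking or growth.

```nim
type PackedSeq = object    # hypothetical toy type, not adix's SeqUint
  bits: int                # bits per element; 1..63 in this sketch
  n: int                   # number of elements
  data: seq[uint64]        # backing store of 64-bit words

proc initPackedSeq(n, bits: int): PackedSeq =
  PackedSeq(bits: bits, n: n, data: newSeq[uint64]((n * bits + 63) div 64))

proc `[]`(s: PackedSeq; i: int): uint64 =
  let bit = i * s.bits
  let w = bit shr 6                    # index of the first backing word
  let o = bit and 63                   # bit offset inside that word
  let mask = (1'u64 shl s.bits) - 1'u64
  result = s.data[w] shr o
  if o + s.bits > 64:                  # element straddles two words
    result = result or (s.data[w + 1] shl (64 - o))
  result = result and mask

proc `[]=`(s: var PackedSeq; i: int; v: uint64) =
  let bit = i * s.bits
  let w = bit shr 6
  let o = bit and 63
  let mask = (1'u64 shl s.bits) - 1'u64
  let x = v and mask
  s.data[w] = (s.data[w] and not (mask shl o)) or (x shl o)
  if o + s.bits > 64:                  # write the spilled high bits
    let sh = 64 - o
    s.data[w + 1] = (s.data[w + 1] and not (mask shr sh)) or (x shr sh)

when isMainModule:
  var s = initPackedSeq(100, 3)        # 100 numbers in 0..7
  for i in 0 ..< 100: s[i] = uint64(i mod 8)
  echo s.data.len * 8, " bytes for ", s.n, " elements"
```

Since an element is never wider than one backing word, any access touches at most two consecutive words, which is the property the text relies on.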
+ + +This is a "tail digest" {github/tdunning/t-digest|arxiv.org/abs/1902.04023}. Counter decrements are not possible. (So, moving quantiles are unsupported.) In my experiments, quantiles are 5X slower than adix/lghisto. Tail quantiles do deliver better accuracy for less space (but lghisto already needs little local cache space | network BW in absolute terms). There may be a way to adapt my B-Tree to speed up the idea. { tDig is also very involved - folks just intuit histo(ln(x)), but that is a more subjective critique. }
+func add(s: var DigesT; others: var openArray[DigesT]) {....raises: [ValueError], + tags: [].}
func add(s: var DigesT; x: float; w = 1) {....raises: [ValueError], tags: [].}
func compress(s: var DigesT) {....raises: [], tags: [].}
func init(s: var DigesT; cpr = 100.0; scale = scLog; mainLen = 0; nBuf = 0) {. + ...raises: [], tags: [].}
func initDigesT(cpr = 100.0; scale = scLog; mainLen = 0; nBuf = 0): DigesT {. + ...raises: [], tags: [].}
func mergeNew(s: var DigesT; force = false; cpr = -1.0) {....raises: [], tags: [].}
The min-count sketch (NOT count-min) idea is to see hash(x) as a U(0,1) & use P(sampleMax<x)=x^n for sample size n. Low art inverts a confidence interval for h.max to estimate n. Tracking the k-most distinct h gives better accuracy and is usually called a KMV sketch. (Intuition is that the k-th edge val => average gap between k-1 uniques & averaging cuts noise.) See Bar-Yossef 2002 "Counting Distinct..", Giroire05 "Order statistics & estimating cardinalities" & Ting14 "Streamed approximate counting..".
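As a rough illustration of the KMV idea, the sketch below keeps the k smallest distinct values (by symmetry of U(0,1), equivalent to tracking the k largest) and applies the commonly used estimator (k-1)/h_k, where h_k is the k-th smallest value seen. It is not the UniqCe implementation: `Kmv`, `initKmv` and the sorted-seq storage are assumptions, and the caller is assumed to pass hashes already mapped into (0,1).

```nim
import std/algorithm

type Kmv = object          # hypothetical toy type, not adix's UniqCe
  k: int
  smallest: seq[float]     # up to k smallest distinct values, ascending

proc initKmv(k: int): Kmv = Kmv(k: k)

proc push(s: var Kmv; h: float) =
  ## h is assumed to be a hash of the item already mapped into (0,1).
  let i = s.smallest.lowerBound(h)
  if i < s.smallest.len and s.smallest[i] == h:
    return                             # duplicate item: same hash, ignore
  if i >= s.k:
    return                             # larger than all k kept values
  s.smallest.insert(h, i)
  if s.smallest.len > s.k:
    s.smallest.setLen(s.k)             # drop the value pushed out of the top k

proc nUnique(s: Kmv): float =
  ## Estimated number of distinct items pushed so far.
  if s.smallest.len < s.k:
    result = float(s.smallest.len)     # fewer than k distinct seen: exact
  else:
    result = float(s.k - 1) / s.smallest[^1]

when isMainModule:
  var s = initKmv(64)
  for rep in 0 ..< 2:                  # every "item" is pushed twice
    for i in 0 ..< 1000:
      # Evenly spread stand-ins for hashed values, in scrambled order.
      s.push((float((i * 7) mod 1000) + 0.5) / 1000.0)
  echo s.nUnique                       # an estimate near 1000
```

Memory is bounded by k no matter how many items stream through, and larger k trades space for lower estimation noise.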
+proc initUniqCe[F: SomeFloat](k = 1024): UniqCe[F]
proc jaccard[F: SomeFloat](a: UniqCe[F]; b: UniqCe[F]): float32
proc nUnique[F: SomeFloat](uc: UniqCe[F]): float32
proc nUniqueErr[F: SomeFloat](uc: UniqCe[F]): float32
proc push[F: SomeFloat](uc: var UniqCe[F]; h: F)