Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Adding Decimal to Std library #17699

Closed
wants to merge 3 commits into from
Closed

Conversation

JohnAD
Copy link
Contributor

@JohnAD JohnAD commented Apr 11, 2021

This is VERY early. Just the basic ability to store the IEEE 128-bit decimal format to/from binary; and the first passes at converting to/from strings are supported.

I'm starting the PR right now as the core is written. However, I have much to do still before this really needs review.

@JohnAD
Copy link
Contributor Author

JohnAD commented Apr 11, 2021

ToDo

Docs

  • write and have edited an introduction to the lib
  • fully document the public methods/procs

Targets

  • For 32-bit systems, using when, write an alternative to the 113-bit multiplication/division that does not use 64-bit integers.
  • For JS, using when, work around the 64-bit masking ops with equivalent 32-bit mask ops. This will probably be used by the 32-bit systems also.

String in/out

  • Support hex prefixed strings for input (but don't support the 'E' exponent marker)
  • Support octal strings in/out
  • Support binary strings in/out

Types

  • Int (all) in/out
  • Float in/out

Ops

  • Addition +
  • Subtraction -
  • Multiplication *
  • Division /. Note: per IEEE spec 1/0 is +Infinity; ignoring proper mathematics :) . However 0/0 is NaN.
  • equivalence (same scalar value, but different precision) e.g. equiv(123.4'm, 123.4000'm)
  • gt and gte >. Accounts for NaN and Inf.
  • lt and lte <. Accounts for NaN and Inf.

Collection Ops

  • mean for seq[Decimal]. Accounts for Nan and Inf.
  • median for seq[Decimal]. Accounts for NaN.
  • sum for seq[Decimal]. Accounts for NaN and Inf.
  • sort for seq[Decimal]. Accounts for NaN.
  • countReal for seq[Decimal]. Number of non-NaN/Inf. Infinity is not a real number; but a conceptual one.

Methods

  • significance= (changes variable) and withSignificance (returns a new value)
  • withPlaces (returns a new value) (note: places= already written)

Rounding

  • flush out default 'rounding' support using standard "round-half-to-even" aka "bankers rounding"
  • Add support for other rounding method with explicit procedures:
    • Truncation
    • Rounding Up
    • Rounding Down
    • Rounding Half Up
    • Rounding Half Down
    • Rounding Half Away From Zero

Tests

  • finish decimal/tconversion.nim with cases for int and float
  • write decimal/trounding.nim to handle rounding tests
  • write decimal/tops.nim to test operations.

@konsumlamm
Copy link
Contributor

According to this, new modules should go to std/ and prefer the plural, so this should go to lib/std/decimals.nim.

lib/pure/decimal.nim Outdated Show resolved Hide resolved
lib/pure/decimal.nim Outdated Show resolved Hide resolved
lib/pure/decimal.nim Outdated Show resolved Hide resolved
lib/pure/decimal.nim Outdated Show resolved Hide resolved
lib/pure/decimal.nim Outdated Show resolved Hide resolved
lib/pure/decimal.nim Outdated Show resolved Hide resolved
lib/pure/decimal.nim Outdated Show resolved Hide resolved
lib/pure/decimal.nim Outdated Show resolved Hide resolved
lib/pure/decimal.nim Outdated Show resolved Hide resolved
lib/pure/decimal.nim Outdated Show resolved Hide resolved
@@ -0,0 +1,32 @@
import std/[decimal, strutils]

template assertEquals*(actual, expected: untyped): untyped =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reuse unittest.check

@konsumlamm
Copy link
Contributor

I submitted some feedback, note that I didn't review the code thoroughly, as it's expected to change.

@JohnAD JohnAD marked this pull request as draft April 12, 2021 00:21
@@ -0,0 +1,1040 @@
##[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a performance comparison against established decimal128 libraries, let's followup on this point in timotheecour#690

lib/pure/decimal.nim Outdated Show resolved Hide resolved
Comment on lines 604 to 672
case state:
of psPre:
if ch == '-':
state = psLeadingMinus
elif ch == '0':
state = psLeadingZeroes
elif DIGITS.contains(ch):
state = psIntCoeff
elif ch == '.': # yes, we are allowing numbers like ".123" even though that is bad form
state = psDecimalPoint
of psLeadingMinus:
if ch == '0':
state = psLeadingZeroes
elif DIGITS.contains(ch):
state = psIntCoeff
elif ch == '.': # yes, we are allowing numbers like "-.123" even though that is bad form
state = psDecimalPoint
else:
state = psDone # anything else is not legit
of psLeadingZeroes:
if ch == '0':
discard
elif DIGITS.contains(ch):
state = psIntCoeff
elif ch == '.':
state = psDecimalPoint
elif ch == 'e':
state = psSignForExp
else:
state = psDone
of psIntCoeff:
if DIGITS.contains(ch):
discard
elif IGNORED_CHARS.contains(ch):
discard
elif ch == '.':
state = psDecimalPoint
elif ch == 'e':
state = psSignForExp
else:
state = psDone
of psDecimalPoint:
if DIGITS.contains(ch):
state = psFracCoeff
else:
state = psDone
of psFracCoeff:
if DIGITS.contains(ch):
discard
elif IGNORED_CHARS.contains(ch):
discard
elif ch == 'e':
state = psSignForExp
else:
state = psDone
of psSignForExp:
if DIGITS.contains(ch):
state = psExp
elif (ch == '-') or (ch == '+'):
discard
else:
state = psDone
of psExp:
if DIGITS.contains(ch):
discard
else:
state = psDone
of psDone:
discard
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case state:
of psPre:
if ch == '-':
state = psLeadingMinus
elif ch == '0':
state = psLeadingZeroes
elif DIGITS.contains(ch):
state = psIntCoeff
elif ch == '.': # yes, we are allowing numbers like ".123" even though that is bad form
state = psDecimalPoint
of psLeadingMinus:
if ch == '0':
state = psLeadingZeroes
elif DIGITS.contains(ch):
state = psIntCoeff
elif ch == '.': # yes, we are allowing numbers like "-.123" even though that is bad form
state = psDecimalPoint
else:
state = psDone # anything else is not legit
of psLeadingZeroes:
if ch == '0':
discard
elif DIGITS.contains(ch):
state = psIntCoeff
elif ch == '.':
state = psDecimalPoint
elif ch == 'e':
state = psSignForExp
else:
state = psDone
of psIntCoeff:
if DIGITS.contains(ch):
discard
elif IGNORED_CHARS.contains(ch):
discard
elif ch == '.':
state = psDecimalPoint
elif ch == 'e':
state = psSignForExp
else:
state = psDone
of psDecimalPoint:
if DIGITS.contains(ch):
state = psFracCoeff
else:
state = psDone
of psFracCoeff:
if DIGITS.contains(ch):
discard
elif IGNORED_CHARS.contains(ch):
discard
elif ch == 'e':
state = psSignForExp
else:
state = psDone
of psSignForExp:
if DIGITS.contains(ch):
state = psExp
elif (ch == '-') or (ch == '+'):
discard
else:
state = psDone
of psExp:
if DIGITS.contains(ch):
discard
else:
state = psDone
of psDone:
discard
state = case state
of psPre:
case ch
of '-': psLeadingMinus
of '0': psLeadingZeroes
of '1'..'9': psIntCoeff
of '.': psDecimalPoint
else: psDone
of psLeadingMinus:
case ch
of '0': psLeadingZeroes
of '1'..'9': psIntCoeff
of '.': psDecimalPoint
else: psDone # anything else is not legit
of psLeadingZeroes:
case ch
of '0': psLeadingZeroes
of '1'..'9': psIntCoeff
of '.': psDecimalPoint
of 'e': psSignForExp
else: psDone
of psIntCoeff:
case ch
of '0'..'9','_',',': psIntCoeff
of '.': psDecimalPoint
of 'e': psSignForExp
else: psDone
of psDecimalPoint:
case ch
of '0'..'9': psFracCoeff
else: psDone
of psFracCoeff:
case ch
of '0'..'9','_',',': psFracCoeff
of 'e': psSignForExp
else: psDone
of psSignForExp:
case ch
of '0'..'9': psExp
of '-','+': psSignForExp
else: psDone
of psExp:
case ch
of '0'..'9': psExp
else: psDone
of psDone: psDone

just a bit terser

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but don't indent of

@timotheecour
Copy link
Member

should this go under stdx/decimals instead of std/decimals ?
refs #17722 and see the readme in https://github.com/nim-lang/Nim/pull/17722/files#diff-f29febf9c63569addbb2d089f61ce5cf88d74679880fc7a522ded76babf99a87 for suggested semantics of stdx

/cc @Araq

@Araq
Copy link
Member

Araq commented Apr 17, 2021

std because decimal numbers are math and don't change. We also need std / bignums.

@timotheecour
Copy link
Member

ping on benchmarks, are there any benchmarks against some established high performance C libraries?
We shouldn't sacrifice performance IMO; as shown in #18008, we can wrap C libraries and still have it work in VM via vmops and offer idiomatic nim interface.

@JohnAD
Copy link
Contributor Author

JohnAD commented May 29, 2021

There is a well-used standard C library at https://www.bytereef.org/mpdecimal/ and is highly tweaked for performance. It is the library used by Python behind the scenes.

I don't know of any performance benchmarks; but one could certainly be made if there are none.

I seriously doubt the initial version of Decimal will be as fast as mpdecimal; but it could be later tweaked to get very close.

Among other things, the initial version will not take advantage of 128bit math ops available on some CPU platforms. But the c lib can be conditionally compiled to take advantage of them.

@timotheecour
Copy link
Member

timotheecour commented May 29, 2021

why can't we do like python and other languages and provide a wrapper against mpdecimal instead of re-implementing a decimal library with likely worse performance (by a unknown factor) and less features; it uses a permissive license (BSD) and has a C API so should be no problem to wrap (along with vmops for VM use).

Performance and features matters more than whether pure nim code was used for most users, and the efforts spent on playing catch up in terms of performance or API feature set could be spent elsewhere; it's potentially a massive effort.

@JohnAD
Copy link
Contributor Author

JohnAD commented May 30, 2021

I can see the benefit of that. If that is what is wanted by the core team, I could merge this code over to my existing pure nimble library. Then a new project using the mpdecimal could be started.

Are you thinking of us bringing in the c source of mpdecimal (definitely allowed by the license) into the Nim repo? Or, requiring an external dependency?

@timotheecour
Copy link
Member

timotheecour commented May 30, 2021

Are you thinking of us bringing in the c source of mpdecimal (definitely allowed by the license) into the Nim repo? Or, requiring an external dependency?

what I suggest is something very similar to #18008:

  • copy the c source of mpedecimal in nim repo in $nim/lib/vendor/mpdecimal/*.{c,h}
  • write a nim wrapper for it in lib/std/decimals.nim that importc's mpdecimal procs
  • add vmops for each importc'd proc so it works in VM (can be done in subsequent work)

whether it uses addDependency("mpdecimal") (as done in #18008) or compiles the C files during ./koch boot (or even a c2nim port) is TBD, but not an essential distinction for the main part, the wrapper.

The main work is the writing the nim wrapper with an idiomatic nim API (it's still much easier than re-implementing mpdecimal from scratch, obviously), and that's what I'd like to review.

The good news is this can be done gradually, starting with the most obvious procs first, but at least it'll work from day 1, with the same performance as mpdecimal.

I think std/jsbigints is a good example of the kind of API std/decimals could shoot for.

@nc-x
Copy link
Contributor

nc-x commented May 30, 2021

Please no. Adding C/C++ dependencies everywhere would make it very difficult to have backends such as wasm, llvm etc.
Also, if i am writing Nim programs, i want to debug Nim if any issues occur, not the underlying C or C++ code.
So, my vote goes to having Native Nim code for decimal, and gradually replacing any other c/c++ thingy being used in the stdlib by native nim also.
If anybody wants to use ultra-fast-but-almost-impossible-to-debug libraries then that should be a nimble package and not the stdlib.

@timotheecour
Copy link
Member

timotheecour commented May 30, 2021

Adding C/C++ dependencies everywhere would make it very difficult to have backends such as wasm, llvm etc.

you can compile complex C++ libraries like opencv to wasm; if a nim backend can't handle a C dependency, it has bigger problems.

Also, if i am writing Nim programs, i want to debug Nim if any issues occur, not the underlying C or C++ code.
If anybody wants to use ultra-fast-but-almost-impossible-to-debug libraries

how often did you trace into libpcre when using std/re, std/nre? or any of the other library wrappers nim relies on for that matter.

Even if mpdecimal ends up having a bug you run into, what makes you think a nim port with comparable feature set/performance would have less bugs than a library that's been developed for over a decade, is used extensively including by python in its decimal module?

This is pure NIH. I don't want a port with sub-par performance/feature just for the purpose of using pure nim code, when there are established alternatives that give state of the art performance and usable from day 1 with a wrapper. Don't underestimate the development effort that would go into writing and maintaining a port of something like mpdecimal.

@JohnAD
Copy link
Contributor Author

JohnAD commented May 30, 2021

mpdecimal itself consists of 17 files. I could make a wild attempt with c2nim and see just how far I get...

@Varriount
Copy link
Contributor

mpdecimal itself consists of 17 files. I could make a wild attempt with c2nim and see just how far I get...

Feel free to ping me on Discord if you want help.

@JohnAD
Copy link
Contributor Author

JohnAD commented May 30, 2021

Thanks, @Varriount ! I might do that when I get time tomorrow. Sadly, I'll be unable to work on it further till then.

In the short term, I've pulled in short pass with c2nim here: https://github.com/JohnAD/mpdecimal-nim

It is far from reaching the point of compilation. Among other things, c2nim's parser did not like seeing unsigned long longs (ULL) so I did a lot of pre-emptive commenting.

@nc-x
Copy link
Contributor

nc-x commented May 31, 2021

@timotheecour

This is pure NIH

Yeah, right. Every programming language community in the world reinventing the wheel because they do not use json/xml/yaml/etc parsers written in C. Oh wait, according to the internet, writing programming languages in itself is NIH.

I understand that writing anything from scratch is going to be time consuming, but it has plenty of benefits (because of which every programming language community "reinvents the wheel" instead of just wrapping everything over).

Just dismissing the benefits and labelling it NIH is ignorance and nothing else.

In any case, I do not have any use of Decimal, so I don't have any problems with wrapping a C library. But yeah, if your solution to everything is to wrap over a C library because "NIH", I would much rather move over to a different programming language, maybe C itself, because hey, it has tested libraries for everything and has no downsides whatsoever...

@Araq
Copy link
Member

Araq commented May 31, 2021

Most C code is full of manual RAII and error handling, you think you need to replicate the hard work of "thousands of lines of C code", but you don't, it's usually full of artificial complexity, just like you would expect from a "portable assembler" language.

@JohnAD
Copy link
Contributor Author

JohnAD commented May 31, 2021

Ok. More like 40 source files in the end. And, while I'm sure it's fast, it is crazy complicated:

  • dynamic object size based on context (64/128/256bit) with lot's of manual memory management further convoluted by memory management macros.
  • Regularly resorting to assembler all through-out; which Nim can do also, but can really kill portability.

So, a c2nim started conversion is possible. But it is not a weekend project by any stretch.

@JohnAD
Copy link
Contributor Author

JohnAD commented May 31, 2021

I get the impression it resorts to inline ASM so often because:

  • there are arithmetic instruction combinations not easily generated natively by C and effect performance in key areas
  • it wants to handle 32-bit, 64-bit, and 64-bit-with-128-bit-math CPU targets consistently.

@JohnAD
Copy link
Contributor Author

JohnAD commented May 31, 2021

There is another possible project that we can pull C code from. And is much more straightforward and clean: the BSON library for the MongoDb database supports 128bit decimal internally. Essentially, we would be taking out just the decimal portions.

http://mongoc.org/libbson/current/creating.html
https://github.com/mongodb/mongo-c-driver/tree/master/src/libbson
https://github.com/mongodb/mongo-c-driver/blob/master/src/libbson/src/bson/bson-decimal128.c

It is not likely be as fast at mpdecimal, but it would be no slouch either since it is maintained by MongoDB Inc; a database vendor. So they have been "tweaking" software performance.

The downside would be that not all the bells and whistles of a decimal library are written because the code really is for the database client (and server). But the core would be solid.

Just for kicks, I c2nim'd the main bson-decimal128.c file and it converted cleanly by default and is very readable.

I see that the file is covered by Apache 2.0 license. Not sure if that is a problem.

@timotheecour
Copy link
Member

It is not likely be as fast at mpdecimal, but it would be no slouch either

we should measure performance with a benchmark, not guess; this should be done earlier rather than later to avoid any surprises after spending significant effort

@JohnAD
Copy link
Contributor Author

JohnAD commented Jun 23, 2021

Finished up a crude benchmark: https://github.com/JohnAD/decimal-benchmark

The Nim versions are not nearly as fast at the C versions, but neither Nim version has been optimized to any real extent yet.

I'm actually surprised that the libbson library ran so fast. Unlike the mpdecimal monster, it does not resort to assembly tweaks and other unreadable measures. Unfortunately, libbson does not support math ops as it pretty much assumes that any higher-level (non-c) language that uses it will already have a built-in decimal library (with math); so libbson is mostly about the interface between BSON and the other language.

BTW, I've taken on another customer that will have me over-busy again. But I hope to begin work on decimal again in a month or so.

@stale
Copy link

stale bot commented Jun 23, 2022

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

@stale stale bot added the stale Staled PR/issues; remove the label after fixing them label Jun 23, 2022
@stale stale bot closed this Jul 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Staled PR/issues; remove the label after fixing them
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants