[WIP] Adding Decimal to Std library #17699

JohnAD · 2021-04-11T19:08:58Z

This is VERY early. Just the basic ability to store the IEEE 128-bit decimal format to/from binary; and the first passes at converting to/from strings are supported.

I'm starting the PR right now as the core is written. However, I have much to do still before this really needs review.

JohnAD · 2021-04-11T19:23:09Z

konsumlamm · 2021-04-11T21:05:00Z

According to this, new modules should go to std/ and prefer the plural, so this should go to lib/std/decimals.nim.

lib/pure/decimal.nim

timotheecour · 2021-04-11T21:35:02Z

tests/stdlib/decimal/helpers.nim

@@ -0,0 +1,32 @@
+import std/[decimal, strutils]
+
+template assertEquals*(actual, expected: untyped): untyped =


reuse unittest.check

konsumlamm · 2021-04-11T21:37:38Z

I submitted some feedback, note that I didn't review the code thoroughly, as it's expected to change.

timotheecour · 2021-04-12T01:18:47Z

lib/pure/decimal.nim

@@ -0,0 +1,1040 @@
+##[


We should have a performance comparison against established decimal128 libraries, let's followup on this point in timotheecour#690

lib/pure/decimal.nim

shirleyquirk · 2021-04-17T01:18:15Z

lib/pure/decimal.nim

+    case state:
+    of psPre:
+      if ch == '-':
+        state = psLeadingMinus
+      elif ch == '0':
+        state = psLeadingZeroes
+      elif DIGITS.contains(ch):
+        state = psIntCoeff
+      elif ch == '.':  # yes, we are allowing numbers like ".123" even though that is bad form
+        state = psDecimalPoint
+    of psLeadingMinus:
+      if ch == '0':
+        state = psLeadingZeroes
+      elif DIGITS.contains(ch):
+        state = psIntCoeff
+      elif ch == '.':  # yes, we are allowing numbers like "-.123" even though that is bad form
+        state = psDecimalPoint
+      else:
+        state = psDone  # anything else is not legit
+    of psLeadingZeroes:
+      if ch == '0':
+        discard
+      elif DIGITS.contains(ch):
+        state = psIntCoeff
+      elif ch == '.':
+        state = psDecimalPoint
+      elif ch == 'e':
+        state = psSignForExp
+      else:
+        state = psDone
+    of psIntCoeff:
+      if DIGITS.contains(ch):
+        discard
+      elif IGNORED_CHARS.contains(ch):
+        discard
+      elif ch == '.':
+        state = psDecimalPoint
+      elif ch == 'e':
+        state = psSignForExp
+      else:
+        state = psDone
+    of psDecimalPoint:
+      if DIGITS.contains(ch):
+        state = psFracCoeff
+      else:
+        state = psDone
+    of psFracCoeff:
+      if DIGITS.contains(ch):
+        discard
+      elif IGNORED_CHARS.contains(ch):
+        discard
+      elif ch == 'e':
+        state = psSignForExp
+      else:
+        state = psDone
+    of psSignForExp:
+      if DIGITS.contains(ch):
+        state = psExp
+      elif (ch == '-') or (ch == '+'):
+        discard
+      else:
+        state = psDone
+    of psExp:
+      if DIGITS.contains(ch):
+        discard
+      else:
+        state = psDone
+    of psDone:
+      discard


Suggested change

case state:

of psPre:

if ch == '-':

state = psLeadingMinus

elif ch == '0':

state = psLeadingZeroes

elif DIGITS.contains(ch):

state = psIntCoeff

elif ch == '.': # yes, we are allowing numbers like ".123" even though that is bad form

state = psDecimalPoint

of psLeadingMinus:

if ch == '0':

state = psLeadingZeroes

elif DIGITS.contains(ch):

state = psIntCoeff

elif ch == '.': # yes, we are allowing numbers like "-.123" even though that is bad form

state = psDecimalPoint

else:

state = psDone # anything else is not legit

of psLeadingZeroes:

if ch == '0':

discard

elif DIGITS.contains(ch):

state = psIntCoeff

elif ch == '.':

state = psDecimalPoint

elif ch == 'e':

state = psSignForExp

else:

state = psDone

of psIntCoeff:

if DIGITS.contains(ch):

discard

elif IGNORED_CHARS.contains(ch):

discard

elif ch == '.':

state = psDecimalPoint

elif ch == 'e':

state = psSignForExp

else:

state = psDone

of psDecimalPoint:

if DIGITS.contains(ch):

state = psFracCoeff

else:

state = psDone

of psFracCoeff:

if DIGITS.contains(ch):

discard

elif IGNORED_CHARS.contains(ch):

discard

elif ch == 'e':

state = psSignForExp

else:

state = psDone

of psSignForExp:

if DIGITS.contains(ch):

state = psExp

elif (ch == '-') or (ch == '+'):

discard

else:

state = psDone

of psExp:

if DIGITS.contains(ch):

discard

else:

state = psDone

of psDone:

discard

state = case state

of psPre:

case ch

of '-': psLeadingMinus

of '0': psLeadingZeroes

of '1'..'9': psIntCoeff

of '.': psDecimalPoint

else: psDone

of psLeadingMinus:

case ch

of '0': psLeadingZeroes

of '1'..'9': psIntCoeff

of '.': psDecimalPoint

else: psDone # anything else is not legit

of psLeadingZeroes:

case ch

of '0': psLeadingZeroes

of '1'..'9': psIntCoeff

of '.': psDecimalPoint

of 'e': psSignForExp

else: psDone

of psIntCoeff:

case ch

of '0'..'9','_',',': psIntCoeff

of '.': psDecimalPoint

of 'e': psSignForExp

else: psDone

of psDecimalPoint:

case ch

of '0'..'9': psFracCoeff

else: psDone

of psFracCoeff:

case ch

of '0'..'9','_',',': psFracCoeff

of 'e': psSignForExp

else: psDone

of psSignForExp:

case ch

of '0'..'9': psExp

of '-','+': psSignForExp

else: psDone

of psExp:

case ch

of '0'..'9': psExp

else: psDone

of psDone: psDone

just a bit terser

but don't indent of

timotheecour · 2021-04-17T01:21:09Z

should this go under stdx/decimals instead of std/decimals ?
refs #17722 and see the readme in https://github.com/nim-lang/Nim/pull/17722/files#diff-f29febf9c63569addbb2d089f61ce5cf88d74679880fc7a522ded76babf99a87 for suggested semantics of stdx

/cc @Araq

Araq · 2021-04-17T07:11:08Z

std because decimal numbers are math and don't change. We also need std / bignums.

timotheecour · 2021-05-29T20:53:09Z

ping on benchmarks, are there any benchmarks against some established high performance C libraries?
We shouldn't sacrifice performance IMO; as shown in #18008, we can wrap C libraries and still have it work in VM via vmops and offer idiomatic nim interface.

JohnAD · 2021-05-29T22:01:34Z

There is a well-used standard C library at https://www.bytereef.org/mpdecimal/ and is highly tweaked for performance. It is the library used by Python behind the scenes.

I don't know of any performance benchmarks; but one could certainly be made if there are none.

I seriously doubt the initial version of Decimal will be as fast as mpdecimal; but it could be later tweaked to get very close.

Among other things, the initial version will not take advantage of 128bit math ops available on some CPU platforms. But the c lib can be conditionally compiled to take advantage of them.

timotheecour · 2021-05-29T23:26:09Z

why can't we do like python and other languages and provide a wrapper against mpdecimal instead of re-implementing a decimal library with likely worse performance (by a unknown factor) and less features; it uses a permissive license (BSD) and has a C API so should be no problem to wrap (along with vmops for VM use).

Performance and features matters more than whether pure nim code was used for most users, and the efforts spent on playing catch up in terms of performance or API feature set could be spent elsewhere; it's potentially a massive effort.

JohnAD · 2021-05-30T02:19:06Z

I can see the benefit of that. If that is what is wanted by the core team, I could merge this code over to my existing pure nimble library. Then a new project using the mpdecimal could be started.

Are you thinking of us bringing in the c source of mpdecimal (definitely allowed by the license) into the Nim repo? Or, requiring an external dependency?

timotheecour · 2021-05-30T02:49:54Z

Are you thinking of us bringing in the c source of mpdecimal (definitely allowed by the license) into the Nim repo? Or, requiring an external dependency?

what I suggest is something very similar to #18008:

copy the c source of mpedecimal in nim repo in $nim/lib/vendor/mpdecimal/*.{c,h}
write a nim wrapper for it in lib/std/decimals.nim that importc's mpdecimal procs
add vmops for each importc'd proc so it works in VM (can be done in subsequent work)

whether it uses addDependency("mpdecimal") (as done in #18008) or compiles the C files during ./koch boot (or even a c2nim port) is TBD, but not an essential distinction for the main part, the wrapper.

The main work is the writing the nim wrapper with an idiomatic nim API (it's still much easier than re-implementing mpdecimal from scratch, obviously), and that's what I'd like to review.

The good news is this can be done gradually, starting with the most obvious procs first, but at least it'll work from day 1, with the same performance as mpdecimal.

I think std/jsbigints is a good example of the kind of API std/decimals could shoot for.

nc-x · 2021-05-30T07:45:08Z

Please no. Adding C/C++ dependencies everywhere would make it very difficult to have backends such as wasm, llvm etc.
Also, if i am writing Nim programs, i want to debug Nim if any issues occur, not the underlying C or C++ code.
So, my vote goes to having Native Nim code for decimal, and gradually replacing any other c/c++ thingy being used in the stdlib by native nim also.
If anybody wants to use ultra-fast-but-almost-impossible-to-debug libraries then that should be a nimble package and not the stdlib.

timotheecour · 2021-05-30T17:41:35Z

Adding C/C++ dependencies everywhere would make it very difficult to have backends such as wasm, llvm etc.

you can compile complex C++ libraries like opencv to wasm; if a nim backend can't handle a C dependency, it has bigger problems.

Also, if i am writing Nim programs, i want to debug Nim if any issues occur, not the underlying C or C++ code.
If anybody wants to use ultra-fast-but-almost-impossible-to-debug libraries

how often did you trace into libpcre when using std/re, std/nre? or any of the other library wrappers nim relies on for that matter.

Even if mpdecimal ends up having a bug you run into, what makes you think a nim port with comparable feature set/performance would have less bugs than a library that's been developed for over a decade, is used extensively including by python in its decimal module?

This is pure NIH. I don't want a port with sub-par performance/feature just for the purpose of using pure nim code, when there are established alternatives that give state of the art performance and usable from day 1 with a wrapper. Don't underestimate the development effort that would go into writing and maintaining a port of something like mpdecimal.

JohnAD · 2021-05-30T18:42:23Z

mpdecimal itself consists of 17 files. I could make a wild attempt with c2nim and see just how far I get...

Varriount · 2021-05-30T19:30:15Z

mpdecimal itself consists of 17 files. I could make a wild attempt with c2nim and see just how far I get...

Feel free to ping me on Discord if you want help.

JohnAD · 2021-05-30T19:52:04Z

Thanks, @Varriount ! I might do that when I get time tomorrow. Sadly, I'll be unable to work on it further till then.

In the short term, I've pulled in short pass with c2nim here: https://github.com/JohnAD/mpdecimal-nim

It is far from reaching the point of compilation. Among other things, c2nim's parser did not like seeing unsigned long longs (ULL) so I did a lot of pre-emptive commenting.

nc-x · 2021-05-31T05:27:41Z

@timotheecour

This is pure NIH

Yeah, right. Every programming language community in the world reinventing the wheel because they do not use json/xml/yaml/etc parsers written in C. Oh wait, according to the internet, writing programming languages in itself is NIH.

I understand that writing anything from scratch is going to be time consuming, but it has plenty of benefits (because of which every programming language community "reinvents the wheel" instead of just wrapping everything over).

Just dismissing the benefits and labelling it NIH is ignorance and nothing else.

In any case, I do not have any use of Decimal, so I don't have any problems with wrapping a C library. But yeah, if your solution to everything is to wrap over a C library because "NIH", I would much rather move over to a different programming language, maybe C itself, because hey, it has tested libraries for everything and has no downsides whatsoever...

Araq · 2021-05-31T08:48:36Z

Most C code is full of manual RAII and error handling, you think you need to replicate the hard work of "thousands of lines of C code", but you don't, it's usually full of artificial complexity, just like you would expect from a "portable assembler" language.

JohnAD · 2021-05-31T21:53:33Z

Ok. More like 40 source files in the end. And, while I'm sure it's fast, it is crazy complicated:

dynamic object size based on context (64/128/256bit) with lot's of manual memory management further convoluted by memory management macros.
Regularly resorting to assembler all through-out; which Nim can do also, but can really kill portability.

So, a c2nim started conversion is possible. But it is not a weekend project by any stretch.

JohnAD · 2021-05-31T21:59:34Z

I get the impression it resorts to inline ASM so often because:

there are arithmetic instruction combinations not easily generated natively by C and effect performance in key areas
it wants to handle 32-bit, 64-bit, and 64-bit-with-128-bit-math CPU targets consistently.

JohnAD · 2021-05-31T22:21:32Z

There is another possible project that we can pull C code from. And is much more straightforward and clean: the BSON library for the MongoDb database supports 128bit decimal internally. Essentially, we would be taking out just the decimal portions.

http://mongoc.org/libbson/current/creating.html
https://github.com/mongodb/mongo-c-driver/tree/master/src/libbson
https://github.com/mongodb/mongo-c-driver/blob/master/src/libbson/src/bson/bson-decimal128.c

It is not likely be as fast at mpdecimal, but it would be no slouch either since it is maintained by MongoDB Inc; a database vendor. So they have been "tweaking" software performance.

The downside would be that not all the bells and whistles of a decimal library are written because the code really is for the database client (and server). But the core would be solid.

Just for kicks, I c2nim'd the main bson-decimal128.c file and it converted cleanly by default and is very readable.

I see that the file is covered by Apache 2.0 license. Not sure if that is a problem.

timotheecour · 2021-06-01T01:41:03Z

It is not likely be as fast at mpdecimal, but it would be no slouch either

we should measure performance with a benchmark, not guess; this should be done earlier rather than later to avoid any surprises after spending significant effort

JohnAD · 2021-06-23T02:18:52Z

Finished up a crude benchmark: https://github.com/JohnAD/decimal-benchmark

The Nim versions are not nearly as fast at the C versions, but neither Nim version has been optimized to any real extent yet.

I'm actually surprised that the libbson library ran so fast. Unlike the mpdecimal monster, it does not resort to assembly tweaks and other unreadable measures. Unfortunately, libbson does not support math ops as it pretty much assumes that any higher-level (non-c) language that uses it will already have a built-in decimal library (with math); so libbson is mostly about the interface between BSON and the other language.

BTW, I've taken on another customer that will have me over-busy again. But I hope to begin work on decimal again in a month or so.

stale · 2022-06-23T05:03:23Z

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

Decimal storage and string conversion works.

30bce5a

konsumlamm reviewed Apr 11, 2021

View reviewed changes

timotheecour reviewed Apr 11, 2021

View reviewed changes

JohnAD marked this pull request as draft April 12, 2021 00:21

timotheecour mentioned this pull request Apr 12, 2021

misc decimal timotheecour/Nim#690

Open

timotheecour reviewed Apr 12, 2021

View reviewed changes

konsumlamm reviewed Apr 12, 2021

View reviewed changes

lib/pure/decimal.nim Outdated Show resolved Hide resolved

shirleyquirk reviewed Apr 17, 2021

View reviewed changes

Merge branch 'nim-lang:devel' into devel

442f6ba

Minor changes based on review.

fc16b47

timotheecour mentioned this pull request Jun 5, 2021

Regression in float stringification - dragonbox [devel] #18182

Closed

stale bot added the stale Staled PR/issues; remove the label after fixing them label Jun 23, 2022

stale bot closed this Jul 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Adding Decimal to Std library #17699

[WIP] Adding Decimal to Std library #17699

JohnAD commented Apr 11, 2021

JohnAD commented Apr 11, 2021 •

edited

Loading

konsumlamm commented Apr 11, 2021

timotheecour Apr 11, 2021

konsumlamm commented Apr 11, 2021

timotheecour Apr 12, 2021

shirleyquirk Apr 17, 2021

timotheecour Apr 17, 2021

timotheecour commented Apr 17, 2021

Araq commented Apr 17, 2021

timotheecour commented May 29, 2021

JohnAD commented May 29, 2021

timotheecour commented May 29, 2021 •

edited

Loading

JohnAD commented May 30, 2021 •

edited

Loading

timotheecour commented May 30, 2021 •

edited

Loading

nc-x commented May 30, 2021

timotheecour commented May 30, 2021 •

edited

Loading

JohnAD commented May 30, 2021

Varriount commented May 30, 2021

JohnAD commented May 30, 2021 •

edited

Loading

nc-x commented May 31, 2021 •

edited

Loading

Araq commented May 31, 2021

JohnAD commented May 31, 2021

JohnAD commented May 31, 2021 •

edited

Loading

JohnAD commented May 31, 2021 •

edited

Loading

timotheecour commented Jun 1, 2021

JohnAD commented Jun 23, 2021

stale bot commented Jun 23, 2022

		@@ -0,0 +1,32 @@
		import std/[decimal, strutils]

		template assertEquals*(actual, expected: untyped): untyped =

[WIP] Adding Decimal to Std library #17699

[WIP] Adding Decimal to Std library #17699

Conversation

JohnAD commented Apr 11, 2021

JohnAD commented Apr 11, 2021 • edited Loading

ToDo

Docs

Targets

String in/out

Types

Ops

Collection Ops

Methods

Rounding

Tests

konsumlamm commented Apr 11, 2021

timotheecour Apr 11, 2021

Choose a reason for hiding this comment

konsumlamm commented Apr 11, 2021

timotheecour Apr 12, 2021

Choose a reason for hiding this comment

shirleyquirk Apr 17, 2021

Choose a reason for hiding this comment

timotheecour Apr 17, 2021

Choose a reason for hiding this comment

timotheecour commented Apr 17, 2021

Araq commented Apr 17, 2021

timotheecour commented May 29, 2021

JohnAD commented May 29, 2021

timotheecour commented May 29, 2021 • edited Loading

JohnAD commented May 30, 2021 • edited Loading

timotheecour commented May 30, 2021 • edited Loading

nc-x commented May 30, 2021

timotheecour commented May 30, 2021 • edited Loading

JohnAD commented May 30, 2021

Varriount commented May 30, 2021

JohnAD commented May 30, 2021 • edited Loading

nc-x commented May 31, 2021 • edited Loading

Araq commented May 31, 2021

JohnAD commented May 31, 2021

JohnAD commented May 31, 2021 • edited Loading

JohnAD commented May 31, 2021 • edited Loading

timotheecour commented Jun 1, 2021

JohnAD commented Jun 23, 2021

stale bot commented Jun 23, 2022

JohnAD commented Apr 11, 2021 •

edited

Loading

timotheecour commented May 29, 2021 •

edited

Loading

JohnAD commented May 30, 2021 •

edited

Loading

timotheecour commented May 30, 2021 •

edited

Loading

timotheecour commented May 30, 2021 •

edited

Loading

JohnAD commented May 30, 2021 •

edited

Loading

nc-x commented May 31, 2021 •

edited

Loading

JohnAD commented May 31, 2021 •

edited

Loading

JohnAD commented May 31, 2021 •

edited

Loading