Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failures due to floating point precision issues on ppc64le #72226

Closed
amitsadaphule opened this issue Oct 29, 2021 · 11 comments · Fixed by #86126
Closed

Test failures due to floating point precision issues on ppc64le #72226

amitsadaphule opened this issue Oct 29, 2021 · 11 comments · Fixed by #86126
Labels
B-unsupported-arch Non-x86_64 architectures: PPC, MIPS, etc C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-queries SQL Queries Team X-blathers-triaged blathers was able to find an owner

Comments

@amitsadaphule
Copy link
Contributor

amitsadaphule commented Oct 29, 2021

Describe the problem

I built the cockroachdb v20.2.17 on ppc64le and when I executed the test suite, I found that there were test failures for the following packages:

github.com/cockroachdb/cockroach/pkg/geo
github.com/cockroachdb/cockroach/pkg/geo/geogfn
github.com/cockroachdb/cockroach/pkg/geo/geographiclib
github.com/cockroachdb/cockroach/pkg/geo/geoindex
github.com/cockroachdb/cockroach/pkg/sql/sem/tree
github.com/cockroachdb/cockroach/pkg/sql/sem/tree/eval_test

When I analysed the logs FP-precision-failures.zip, I found that the failing tests were due to floating point precision issues. A sample log of test failure for geogfn package pasted below (Please refer full package test logs in attached zip for more details):

--- FAIL: TestAzimuth/north_east (0.00s)
        azimuth_test.go:70: 
            	Error Trace:	azimuth_test.go:70
            	Error:      	Not equal: 
            	            	expected: 0.7886800845259658
            	            	actual  : 0.788680084525966
            	Test:       	TestAzimuth/north_east
				

--- FAIL: TestSegmentize/LINESTRING(1.0_1.0,_2.0_2.0,_3.0_3.0),_maximum_segment_length:_100000.000000 (0.00s)
	segmentize_test.go:118: 
			Error Trace:	segmentize_test.go:118
			Error:      	Not equal: 
							expected: geo.Geography{spatialObject:geopb.SpatialObject{Type:1, EWKB:geopb.EWKB{0x1, 0x2, 0x0, 0x0, 0x20, 0xe6, 0x10, 0x0, 0x0, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f, 0xed, 0x4e, 0xa4, 0x2f, 0x88, 0xff, 0xf7, 0x3f, 0x97, 0x93, 0x60, 0xdd, 0x3b, 0x0, 0xf8, 0x3f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x40, 0x9d, 0xc8, 0x4f, 0x1d, 0x9c, 0xff, 0x3, 0x40, 0x91, 0xbb, 0xc5, 0xd8, 0x31, 0x0, 0x4, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8, 0x40}, SRID:4326, ShapeType:2, BoundingBox:(*geopb.BoundingBox)(0xc000270560)}}
							actual  : geo.Geography{spatialObject:geopb.SpatialObject{Type:1, EWKB:geopb.EWKB{0x1, 0x2, 0x0, 0x0, 0x20, 0xe6, 0x10, 0x0, 0x0, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f, 0xed, 0x4e, 0xa4, 0x2f, 0x88, 0xff, 0xf7, 0x3f, 0x96, 0x93, 0x60, 0xdd, 0x3b, 0x0, 0xf8, 0x3f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x40, 0x9c, 0xc8, 0x4f, 0x1d, 0x9c, 0xff, 0x3, 0x40, 0x91, 0xbb, 0xc5, 0xd8, 0x31, 0x0, 0x4, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8, 0x40}, SRID:4326, ShapeType:2, BoundingBox:(*geopb.BoundingBox)(0xc000270540)}}
							
							Diff:
							--- Expected
							+++ Actual
							@@ -6,4 +6,4 @@
								00000010  00 00 00 f0 3f 00 00 00  00 00 00 f0 3f ed 4e a4  |....?.......?.N.|
							-   00000020  2f 88 ff f7 3f 97 93 60  dd 3b 00 f8 3f 00 00 00  |/...?..`.;..?...|
							-   00000030  00 00 00 00 40 00 00 00  00 00 00 00 40 9d c8 4f  |....@.......@..O|
							+   00000020  2f 88 ff f7 3f 96 93 60  dd 3b 00 f8 3f 00 00 00  |/...?..`.;..?...|
							+   00000030  00 00 00 00 40 00 00 00  00 00 00 00 40 9c c8 4f  |....@.......@..O|
								00000040  1d 9c ff 03 40 91 bb c5  d8 31 00 04 40 00 00 00  |....@....1..@...|
			Test:       	TestSegmentize/LINESTRING(1.0_1.0,_2.0_2.0,_3.0_3.0),_maximum_segment_length:_100000.000000

To Reproduce

Here are the steps to reproduce the issue:

#!/bin/bash

CWD=`pwd`

# Install all dependencies
dnf -y --disableplugin=subscription-manager install \
        http://mirror.centos.org/centos/8/BaseOS/ppc64le/os/Packages/centos-gpg-keys-8-2.el8.noarch.rpm \
        http://mirror.centos.org/centos/8/BaseOS/ppc64le/os/Packages/centos-linux-repos-8-2.el8.noarch.rpm \
        https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm

yum install -y git cmake make gcc-c++ autoconf ncurses-devel.ppc64le wget.ppc64le openssl-devel.ppc64le diffutils procps-ng

cd $HOME

set -eu

# Install nodejs
NODE_VERSION=v12.18.2
DISTRO=linux-ppc64le
wget https://nodejs.org/dist/$NODE_VERSION/node-$NODE_VERSION-$DISTRO.tar.gz
tar -xzf node-$NODE_VERSION-$DISTRO.tar.gz
export PATH=$HOME/node-$NODE_VERSION-$DISTRO/bin:$PATH

npm install yarn --global

cd $HOME

# Setup go environment and install go
GOPATH=$HOME/go
COCKROACH_HOME=$GOPATH/src/github.com/cockroachdb
mkdir -p $COCKROACH_HOME
export GOPATH
curl -O https://dl.google.com/go/go1.13.5.linux-ppc64le.tar.gz
tar -C /usr/local -xzf go1.13.5.linux-ppc64le.tar.gz
rm -rf go1.13.5.linux-ppc64le.tar.gz
export PATH=$PATH:/usr/local/go/bin

# Clone cockroach and build
COCKROACH_VERSION=v20.2.17
cd $COCKROACH_HOME
git clone https://github.com/cockroachdb/cockroach.git
cd cockroach
git checkout $COCKROACH_VERSION
make buildoss

export GOMAXPROCS=4
make test PKG=./pkg/geo TESTFLAGS='-v -count=1'
make test PKG=./pkg/geo/geogfn TESTFLAGS='-v -count=1'
make test PKG=./pkg/geo/geographiclib TESTFLAGS='-v -count=1'
make test PKG=./pkg/geo/geoindex TESTFLAGS='-v -count=1'
make test PKG=./pkg/sql/sem/tree TESTFLAGS='-v -count=1'
make test PKG=./pkg/sql/sem/tree/eval_test TESTFLAGS='-v -count=1'

Environment:

CockroachDB [v20.2.17]
Architecture and OS: [ppc64le/RHEL/UBI 8.4]
Go: [1.13.5]
Node: [12.18.2]

Any pointers to resolve the failures would be of great help.

Jira issue: CRDB-11048

@amitsadaphule amitsadaphule added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Oct 29, 2021
@blathers-crl
Copy link

blathers-crl bot commented Oct 29, 2021

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/bulk-io (found keywords: export)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added A-disaster-recovery O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Oct 29, 2021
@blathers-crl
Copy link

blathers-crl bot commented Oct 29, 2021

cc @cockroachdb/bulk-io

@blathers-crl blathers-crl bot added the T-server-and-security DB Server & Security label Nov 1, 2021
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Nov 9, 2021
@knz knz added B-unsupported-arch Non-x86_64 architectures: PPC, MIPS, etc and removed T-server-and-security DB Server & Security labels Nov 9, 2021
@knz
Copy link
Contributor

knz commented Nov 9, 2021

@amitsadaphule this is the result of a different floating-point behavior on PPC.

As explained in #72227, CockroachDB's team does not support alternative build platforms, so we're likely not going to work actively to resolve this particular problem. However, on the other hand we have many users that report that cockroachDB works on their platform, even if not officially supported.

Note that the difference in behavior wrt floating-point computations means that SQL queries are likely to return slightly differnt results than on other platforms. However, from experience the difference is often very small and depending on the application might be negligible.

Also, in the future please do not file common issues for multiple go packages. We would prefer different issues filed for different areas. Thank you.

@seth-priya
Copy link

Thanks @knz for your response again and point taken.

Just a quick confirmation/clarification on your statement above
"this is the result of a different floating-point behavior on PPC."

I saw probably similar references at a couple of additional places today, basically gohugoio/hugo#6387 (comment)
disintegration/gift#20 (comment)

which essentially say that
The Go compiler now implements a fused multiply and add (FMA) instruction on some architectures, which leads to floating-point rounding differences compared to amd64. is this what you are pointing to as well?

@knz
Copy link
Contributor

knz commented Nov 9, 2021

I wasn't aware about the FMA behavior in the go compiler, but yet it could contribute to the situation as well.

Note that even without FMA, there are subtle differences in the ordering of FP operations inside the micro-architecture that can cause rounding differences.

@prashantkhoje
Copy link
Contributor

The geo failures are fixed by #81734.

The geogfn and geomfn failures are fixed by:
#81894
#82250

One of the geoindex failure is fixed by #82456.

Following failures are not reproducible anymore:

  • pkg/geo/geographiclib
  • pkg/sql/sem/tree
  • pkg/sql/sem/tree/eval_test

@prashantkhoje
Copy link
Contributor

Hello @otan,
How are the fixes in vendored code handled?
I've a solution for test failures in pkg/geo/geomfn.
It'll be submitted to the original source repository. However, it'll take some time to get merged and then be synchronized in vendored repo.
Is it allowed to commit changes in vendored repo directly?

Thank you,
Prashant

@otan
Copy link
Contributor

otan commented Jul 14, 2022

if you let me know which change i can let you know further

@prashantkhoje
Copy link
Contributor

I need to force floating-point type conversion in twpayne/go-geom/xy/lineintersector/nonrobust_line_intersector.go to avoid FMA on ppc64le.

@prashantkhoje
Copy link
Contributor

prashantkhoje commented Jul 26, 2022

The go-geom v1.4.2 has the required fix in nonrobust_line_intersector.go.
The PR #214 resolves geomfn/TestShortestLineString test failure on ppc64le.
The PR #213 fixes test failures in go-geom.

@otan
Copy link
Contributor

otan commented Aug 15, 2022

gotcha, sorry just came back from holiday.
updating vendor here: #86126

@craig craig bot closed this as completed in 85ae31d Aug 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-unsupported-arch Non-x86_64 architectures: PPC, MIPS, etc C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-queries SQL Queries Team X-blathers-triaged blathers was able to find an owner
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants