Improved #3494: AVX2 bugfixes + no code duplication for the integer workhorses in there #3495

GerHobbelt · 2021-07-13T08:06:55Z

Same as patch-4 (#3494), but now with reduced code duplication: for TFloat to work, we don't need to duplicate the integer work functions, because only the ExtractResults16[8,16] functions need different implementations for float vs. double. The following functions are therefore common to both implementations:

static void PartialMatrixDotVector64(const int8_t *wi, const TFloat *scales, const int8_t *u,
                                     int num_in, TFloat *v);

static void PartialMatrixDotVector32(const int8_t *wi, const TFloat *scales, const int8_t *u,
                                     int num_in, TFloat *v);

static void PartialMatrixDotVector16(const int8_t *wi, const TFloat *scales, const int8_t *u,
                                     int num_in, TFloat *v);

static inline void PartialMatrixDotVector8(const int8_t *wi, const TFloat *scales, const int8_t *u,
                                           int num_in, TFloat *v);

static void matrixDotVector(int dim1, int dim2, const int8_t *wi, const TFloat *scales,
                            const int8_t *u, TFloat *v);

(extract from #3490)
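To make the split concrete, here is a minimal scalar sketch (no AVX2). It assumes TFloat is a typedef switched by FAST_FLOAT; IntDotProduct, ExtractResult and MatrixDotVectorSketch are hypothetical names, and the sketch ignores the SIMD register layout, blocking and any bias handling of the real code. What it shows: the int8 x int8 accumulation into int32 is identical for float and double, so only the final conversion and scaling step depends on TFloat.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for TFloat: float when FAST_FLOAT is defined,
// double otherwise.
#ifdef FAST_FLOAT
using TFloat = float;
#else
using TFloat = double;
#endif

// Integer "workhorse": int8 x int8 dot product accumulated in int32.
// Nothing here depends on TFloat, so one copy serves both builds.
static int32_t IntDotProduct(const int8_t *wi, const int8_t *u, int num_in) {
  int32_t total = 0;
  for (int i = 0; i < num_in; ++i) {
    total += static_cast<int32_t>(wi[i]) * static_cast<int32_t>(u[i]);
  }
  return total;
}

// Only the extraction step touches TFloat: it converts the int32 sum
// and applies the per-row scale (hypothetical scalar counterpart of the
// SIMD ExtractResults functions).
static TFloat ExtractResult(int32_t total, TFloat scale) {
  return static_cast<TFloat>(total) * scale;
}

// Scalar sketch of a matrix-vector product: each row of wi (dim2 int8
// weights) is dotted with u, then scaled by scales[row].
static void MatrixDotVectorSketch(int dim1, int dim2, const int8_t *wi,
                                  const TFloat *scales, const int8_t *u,
                                  TFloat *v) {
  assert(dim1 >= 0 && dim2 >= 0);
  for (int row = 0; row < dim1; ++row) {
    v[row] = ExtractResult(IntDotProduct(wi + row * dim2, u, dim2), scales[row]);
  }
}
```

For example, with wi = {1, 2, 3, 4, 5, 6} (2 rows of 3), u = {1, 1, 2} and scales = {0.5, 0.25}, MatrixDotVectorSketch yields v = {4.5, 5.25} in either configuration.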

Up to now, Tesseract used double for training and for recognition with "best" models. This commit replaces double with a new data type, TFloat, which is double by default but float if FAST_FLOAT is defined. Ideally this should allow faster training.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
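A minimal sketch of such a switch, assuming a plain preprocessor typedef (the actual header that defines TFloat is not shown in this PR). Mean and TFloatName are hypothetical helpers illustrating that code written against TFloat compiles unchanged in both configurations.

```cpp
#include <cassert>
#include <type_traits>

// Assumed form of the switch: float with FAST_FLOAT, double otherwise.
#ifdef FAST_FLOAT
using TFloat = float;
#else
using TFloat = double;
#endif

// Written once against TFloat; works in both float and double builds.
static TFloat Mean(const TFloat *x, int n) {
  assert(n >= 0);
  TFloat sum = 0;
  for (int i = 0; i < n; ++i) sum += x[i];
  return n > 0 ? sum / static_cast<TFloat>(n) : TFloat(0);
}

// Reports which precision this translation unit was compiled with.
static const char *TFloatName() {
  return std::is_same<TFloat, float>::value ? "float" : "double";
}
```

Compiling with -DFAST_FLOAT makes TFloatName() return "float"; without it, the result is "double".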


GerHobbelt · 2021-07-13T08:33:30Z

Closed because it was submitted against master, which is the wrong base. Will re-issue, as GitHub doesn't seem to allow changing a pull request's base (at least I haven't found out how, so re-issuing seems to be the only alternative). See #3494 (comment).

GerHobbelt · 2021-07-13T09:01:22Z

Re-issued as stweil#5

stweil and others added 11 commits July 13, 2021 07:18

Fix some compiler warnings

c64ab2e

Signed-off-by: Stefan Weil <sw@weilnetz.de>

Optimize DotProductStdInnerProduct for float

78871a9

Signed-off-by: Stefan Weil <sw@weilnetz.de>

Avoid double / float conversion

1b9e462

Signed-off-by: Stefan Weil <sw@weilnetz.de>

Implement TFloat for IntSimdMatrix

93e9022

Signed-off-by: Stefan Weil <sw@weilnetz.de>

Test more implementations of DotProduct

00e4283

Signed-off-by: Stefan Weil <sw@weilnetz.de>

Add unittest for dotproduct

e2529dd

Signed-off-by: Stefan Weil <sw@weilnetz.de>

Support Apple Accelerate framework for training and best models

01ae69e

Signed-off-by: Stefan Weil <sw@weilnetz.de>

Fix TFloat builds for Apple M1

a09531a

Signed-off-by: Stefan Weil <sw@weilnetz.de>

Fix DotProductNative for TFloat

1a59b6f

Signed-off-by: Stefan Weil <sw@weilnetz.de>
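The commits "Optimize DotProductStdInnerProduct for float" and "Avoid double / float conversion" are listed by title only. One well-known source of such conversions is std::inner_product, whose accumulator takes the type of its init argument. The following is a minimal sketch under that assumption, not the code from commit 78871a9; DotProductStdInnerProductSketch is a hypothetical name.

```cpp
#include <cassert>
#include <numeric>

#ifdef FAST_FLOAT
using TFloat = float;
#else
using TFloat = double;
#endif

// std::inner_product accumulates in the type of its init argument.
// With a double init such as 0.0 and float inputs, each float product
// is converted to double before being added, and the double result
// must be narrowed back to float on return. TFloat(0) keeps the whole
// accumulation in TFloat.
static TFloat DotProductStdInnerProductSketch(const TFloat *u, const TFloat *v, int n) {
  assert(n >= 0);
  return std::inner_product(u, u + n, v, TFloat(0));
}
```

For u = {1, 2, 3} and v = {4, 5, 6}, the result is 32 in both float and double builds.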

GerHobbelt mentioned this pull request Jul 13, 2021

Tfloat patch 4: bugfixes for AVX2 FAST_FLOAT Extract8+16 implementations #3494

Closed

GerHobbelt closed this Jul 13, 2021

This was referenced Jul 13, 2021

bugfixing the AVX2 Extract8+16 codes, where there's lines like [...] stweil/tesseract#4

Merged

Improved #4 / 3494: AVX2 bugfixes + no code duplication for the integer workhorses in there stweil/tesseract#5

Closed
