From 3d4ee46c8e8a1a983f34072385490365b312b3d6 Mon Sep 17 00:00:00 2001
From: anjakefala
Date: Fri, 26 Aug 2022 14:17:55 -0700
Subject: [PATCH 1/3] PARQUET-758: Add Float16/Half-float logical type

Type involves a trade-off of reduced precision, in exchange for more efficient storage.
---
 LogicalTypes.md | 8 ++++++++
 src/main/thrift/parquet.thrift | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/LogicalTypes.md b/LogicalTypes.md
index b860ea50..13acd12b 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -245,6 +245,14 @@ comparison.
 To support compatibility with older readers, implementations of parquet-format should
 write `DecimalType` precision and scale into the corresponding SchemaElement field in metadata.
 
+### FLOAT16
+
+The `FLOAT16` annotation represents half-precision floating-point numbers in the 2-byte IEEE little-endian format.
+
+Used in contexts where precision is traded off for smaller footprint and potentially better performance.
+
+The primitive type is a 2-byte fixed length binary.
+
 ## Temporal Types
 
 ### DATE
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index 81a7cf82..510c070a 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -232,6 +232,7 @@ struct MapType {} // see LogicalTypes.md
 struct ListType {} // see LogicalTypes.md
 struct EnumType {} // allowed for BINARY, must be encoded with UTF-8
 struct DateType {} // allowed for INT32
+struct Float16Type {} // allowed for FIXED[2], must be encoded as raw FLOAT16 bytes
 
 /**
  * Logical type to annotate a column that is always null.
@@ -342,6 +343,7 @@ union LogicalType {
   12: JsonType JSON // use ConvertedType JSON
   13: BsonType BSON // use ConvertedType BSON
   14: UUIDType UUID // no compatible ConvertedType
+  15: Float16Type FLOAT16 // no compatible ConvertedType
 }
 
 /**

From 333f61b76441b7035d6f8b2566713845bd337890 Mon Sep 17 00:00:00 2001
From: anjakefala
Date: Wed, 14 Dec 2022 14:19:37 -0800
Subject: [PATCH 2/3] PARQUET-758: specify sort order for Float16

---
 LogicalTypes.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/LogicalTypes.md b/LogicalTypes.md
index 13acd12b..51bc42f6 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -253,6 +253,8 @@ Used in contexts where precision is traded off for smaller footprint and potenti
 
 The primitive type is a 2-byte fixed length binary.
 
+The sort order for `FLOAT16` is signed (with special handling of NANs and signed zeros); it uses the same [logic](https://github.com/apache/parquet-format#sort-order) as `FLOAT32` and `FLOAT64`.
+
 ## Temporal Types
 
 ### DATE

From 5f26a451884429463ceb29149c54ea320687cc40 Mon Sep 17 00:00:00 2001
From: anjakefala
Date: Tue, 17 Oct 2023 12:33:50 -0700
Subject: [PATCH 3/3] PARQUET-758: correct names of primitive types

---
 LogicalTypes.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/LogicalTypes.md b/LogicalTypes.md
index 51bc42f6..dd56818c 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -253,7 +253,7 @@ Used in contexts where precision is traded off for smaller footprint and potenti
 
 The primitive type is a 2-byte fixed length binary.
 
-The sort order for `FLOAT16` is signed (with special handling of NANs and signed zeros); it uses the same [logic](https://github.com/apache/parquet-format#sort-order) as `FLOAT32` and `FLOAT64`.
+The sort order for `FLOAT16` is signed (with special handling of NANs and signed zeros); it uses the same [logic](https://github.com/apache/parquet-format#sort-order) as `FLOAT` and `DOUBLE`.
 
 ## Temporal Types
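
Review note: the byte layout described in the `FLOAT16` section can be illustrated with a small sketch. It is not taken from any Parquet implementation; the helper names `encode_float16` and `decode_float16` are hypothetical, and it assumes Python's standard `struct` module, whose `e` format code is IEEE 754 binary16. It shows the 2-byte little-endian encoding and why a plain byte-wise comparison of the stored values would not match the signed ordering the spec asks for. The special handling of NaNs and signed zeros is defined in the linked sort-order section and is not reproduced here.

```python
import struct


def encode_float16(value: float) -> bytes:
    """Pack a float into 2 bytes of little-endian IEEE 754 binary16 (hypothetical helper)."""
    return struct.pack("<e", value)


def decode_float16(raw: bytes) -> float:
    """Unpack 2 little-endian binary16 bytes into a Python float (hypothetical helper)."""
    if len(raw) != 2:
        raise ValueError("a FLOAT16 value is exactly 2 bytes")
    return struct.unpack("<e", raw)[0]


# 1.0 is 0x3C00 in binary16, so the little-endian bytes are 00 3C.
one = encode_float16(1.0)
minus_one = encode_float16(-1.0)

# Precision is reduced: 0.1 does not survive a round trip exactly.
approx = decode_float16(encode_float16(0.1))

# Byte-wise (unsigned) comparison puts -1.0 after 1.0, which is why
# the ordering has to be computed on the decoded, signed values.
bytewise_order = sorted([one, minus_one])
signed_order = sorted([one, minus_one], key=decode_float16)
```

Here `bytewise_order` places the encoding of `1.0` first, while `signed_order` places the encoding of `-1.0` first; only the latter agrees with the numeric order of the values.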