Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PARQUET-758: Add Float16/Half-float logical type #184

Merged
merged 3 commits into from
Oct 27, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions LogicalTypes.md
Original file line number Diff line number Diff line change
Expand Up @@ -245,6 +245,16 @@ comparison.
To support compatibility with older readers, implementations of parquet-format should
write `DecimalType` precision and scale into the corresponding SchemaElement field in metadata.

### FLOAT16

The `FLOAT16` annotation represents half-precision floating-point numbers in the 2-byte IEEE little-endian format.

Used in contexts where precision is traded off for smaller footprint and potentially better performance.

The primitive type is a 2-byte fixed length binary.

The sort order for `FLOAT16` is signed (with special handling of NANs and signed zeros); it uses the same [logic](https://github.com/apache/parquet-format#sort-order) as `FLOAT` and `DOUBLE`.

## Temporal Types

### DATE
Expand Down
2 changes: 2 additions & 0 deletions src/main/thrift/parquet.thrift
Original file line number Diff line number Diff line change
Expand Up @@ -232,6 +232,7 @@ struct MapType {} // see LogicalTypes.md
struct ListType {} // see LogicalTypes.md
struct EnumType {} // allowed for BINARY, must be encoded with UTF-8
struct DateType {} // allowed for INT32
struct Float16Type {} // allowed for FIXED[2], must encoded raw FLOAT16 bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not allow bit splitting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emkornfield What do you mean here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, perhaps you mean the BYTE_STREAM_SPLIT encoding?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. BYTE_STREAM_SPLIT

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I guess it wouldn't cost much to allow it (implementations would not support it at the start anyway).


/**
* Logical type to annotate a column that is always null.
Expand Down Expand Up @@ -342,6 +343,7 @@ union LogicalType {
12: JsonType JSON // use ConvertedType JSON
13: BsonType BSON // use ConvertedType BSON
14: UUIDType UUID // no compatible ConvertedType
15: Float16Type FLOAT16 // no compatible ConvertedType
}

/**
Expand Down