apacheGH-39489: [C++][Parquet] Revert apache#39491 and add timestamp behavior to doc
mapleFU committed Jan 11, 2024
1 parent c752bdb commit 407b7d7
Showing 3 changed files with 21 additions and 6 deletions.
3 changes: 2 additions & 1 deletion cpp/src/parquet/arrow/schema.cc
@@ -387,7 +387,8 @@ Status FieldToNode(const std::string& name, const std::shared_ptr<Field>& field,
       break;
     case ArrowTypeId::TIME64: {
       type = ParquetType::INT64;
-      auto time_type = static_cast<::arrow::Time64Type*>(field->type().get());
+      const auto* time_type =
+          static_cast<const ::arrow::Time64Type*>(field->type().get());
       if (time_type->unit() == ::arrow::TimeUnit::NANO) {
         logical_type =
             LogicalType::Time(/*is_adjusted_to_utc=*/true, LogicalType::TimeUnit::NANOS);
6 changes: 5 additions & 1 deletion cpp/src/parquet/arrow/schema_internal.cc
@@ -89,7 +89,11 @@ Result<std::shared_ptr<ArrowType>> MakeArrowTime64(const LogicalType& logical_ty
 
 Result<std::shared_ptr<ArrowType>> MakeArrowTimestamp(const LogicalType& logical_type) {
   const auto& timestamp = checked_cast<const TimestampLogicalType&>(logical_type);
-  const bool utc_normalized = timestamp.is_adjusted_to_utc();
+  // GH-39489: Parquet timestamps should follow the `timestamp.is_adjusted_to_utc()`,
+  // however, arrow timestamps should not carry this information if it is from
+  // a converted type and doesn't have `ARROW:schema` annotation set.
+  const bool utc_normalized =
+      timestamp.is_from_converted_type() ? false : timestamp.is_adjusted_to_utc();
   static const char* utc_timezone = "UTC";
   switch (timestamp.time_unit()) {
     case LogicalType::TimeUnit::MILLIS:
18 changes: 14 additions & 4 deletions docs/source/cpp/parquet.rst
@@ -467,12 +467,12 @@ physical type.
 +-------------------+-----------------------------+----------------------------+---------+
 | DATE              | INT32                       | Date32                     | \(3)    |
 +-------------------+-----------------------------+----------------------------+---------+
-| TIME              | INT32                       | Time32 (milliseconds)      |         |
+| TIME              | INT32                       | Time32 (milliseconds)      | \(7)    |
 +-------------------+-----------------------------+----------------------------+---------+
-| TIME              | INT64                       | Time64 (micro- or          |         |
+| TIME              | INT64                       | Time64 (micro- or          | \(7)    |
 |                   |                             | nanoseconds)               |         |
 +-------------------+-----------------------------+----------------------------+---------+
-| TIMESTAMP         | INT64                       | Timestamp (milli-, micro-  |         |
+| TIMESTAMP         | INT64                       | Timestamp (milli-, micro-  | \(8)    |
 |                   |                             | or nanoseconds)            |         |
 +-------------------+-----------------------------+----------------------------+---------+
 | STRING            | BYTE_ARRAY                  | Utf8                       | \(4)    |
@@ -499,6 +499,16 @@ physical type.
   in contradiction with the
   `Parquet specification <https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps>`__.
 
+* \(7) On the writer side, an Arrow Time32/Time64 is written as a Parquet
+  Time with the ``isAdjustedToUTC`` flag set to true.
+
+* \(8) On the writer side, an Arrow Timestamp is written as a Parquet
+  Timestamp with the ``isAdjustedToUTC`` flag taken from the Arrow timestamp
+  type. On the reader side, a Parquet Timestamp is read back as an Arrow
+  Timestamp honoring that flag. If the Parquet file was written with the
+  legacy TIMESTAMP ConvertedType, the flag is set to false in the resulting
+  Arrow Timestamp.
+
 *Unsupported logical types:* JSON, BSON, UUID. If such a type is encountered
 when reading a Parquet file, the default physical type mapping is used (for
 example, a Parquet JSON column may be read as Arrow Binary or FixedSizeBinary).
@@ -590,4 +600,4 @@ Miscellaneous
   data read APIs do not currently make any use of them.
 
 * \(2) APIs are provided for creating, serializing and deserializing Bloom
-  Filters, but they are not integrated into data read APIs.
+  Filters, but they are not integrated into data read/write APIs.
