Add from_iso8601_timestamp Presto function (facebookincubator#10062)
Summary:

from_iso8601_timestamp parses the ISO 8601 formatted string into a timestamp with time zone.

Accepts formats described by the following syntax::

        datetime          = time | date-opt-time
        time              = 'T' time-element [offset]
        date-opt-time     = date-element ['T' [time-element] [offset]]
        date-element      = yyyy ['-' MM ['-' dd]]
        time-element      = HH [minute-element] | [fraction]
        minute-element    = ':' mm [second-element] | [fraction]
        second-element    = ':' ss [fraction]
        fraction          = ('.' | ',') digit+
        offset            = 'Z' | (('+' | '-') HH [':' mm [':' ss [('.' | ',') SSS]]])

Examples of valid input strings:

* '2012'
* '2012-4'
* '2012-04'
* '2012-4-7'
* '2012-04-07'
* '2012-04-07   '
* '2012-04T01:02'
* 'T01:02:34'
* 'T01:02:34,123'
* '2012-04-07T01:02:34'
* '2012-04-07T01:02:34.123'
* '2012-04-07T01:02:34,123'
* '2012-04-07T01:02:34.123Z'
* '2012-04-07T01:02:34.123-05:00'
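
The top-level structure of the grammar above can be illustrated with a small standalone sketch. This is not the Velox implementation: `IsoParts` and `splitIso8601` are hypothetical names, and the helper only splits a string into its date, time and offset parts. It does not validate digits or trim whitespace.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper type for illustration; not part of Velox.
struct IsoParts {
  std::string date;   // date-element; empty for time-only input such as 'T01:02'
  std::string time;   // time-element, including any fraction
  std::string offset; // 'Z' or a signed offset; empty if absent
};

// Splits the input following the shape
//   datetime = 'T' time-element [offset]
//            | date-element ['T' [time-element] [offset]]
// An offset can only appear after 'T', so '-' in the date part is never
// treated as the start of an offset.
IsoParts splitIso8601(std::string_view input) {
  IsoParts parts;
  const auto tPos = input.find('T');
  if (tPos == std::string_view::npos) {
    // Date only, e.g. '2012' or '2012-04-07'.
    parts.date = std::string(input);
    return parts;
  }
  parts.date = std::string(input.substr(0, tPos));
  const auto rest = input.substr(tPos + 1);
  // The first 'Z', '+' or '-' after 'T' starts the offset.
  const auto offsetPos = rest.find_first_of("Z+-");
  if (offsetPos == std::string_view::npos) {
    parts.time = std::string(rest);
  } else {
    parts.time = std::string(rest.substr(0, offsetPos));
    parts.offset = std::string(rest.substr(offsetPos));
  }
  return parts;
}
```

For instance, `splitIso8601("2012-04-07T01:02:34.123-05:00")` yields date `2012-04-07`, time `01:02:34.123` and offset `-05:00`, while `splitIso8601("T01:02:34,123")` yields an empty date and time `01:02:34,123`.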

Limitations:

- Presto supports ordinal and week dates, but Velox doesn't. This limitation exists for from_iso8601_date as well: facebookincubator#10058

from_iso8601_timestamp is similar to CAST, but it is not the same. CAST accepts the following formats:

       date-opt-time     = date-element [' ' [time-element] [[' '] [offset]]]
       date-element      = yyyy ['-' MM ['-' dd]]
       time-element      = HH [minute-element] | [fraction]
       minute-element    = ':' mm [second-element] | [fraction]
       second-element    = ':' ss [fraction]
       fraction          = '.' digit+
       offset            = 'Z' | ZZZ

Notable differences are:

* Separator between date and time: space in CAST; T in from_iso.
* Separator between seconds and fractional seconds: period in CAST; period or comma in from_iso.
* Time zones: offsets or names in CAST; only offsets in from_iso.
* Partial date: not allowed in CAST; allowed in from_iso.
* Hour-only time: not allowed in CAST; allowed in from_iso.
* Leading and trailing whitespace: allowed in CAST; not allowed in from_iso.
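
The two separator differences in the list above can be sketched as mode-dependent predicates. This is a simplified illustration, not the actual parser code: the `ParseMode` enum here is a hypothetical stand-in that mirrors the `kPrestoCast` and `kIso8601` values of `util::TimestampParseMode` used in this change, and the function names are assumptions.

```cpp
// Hypothetical stand-in for the parse mode; the names mirror the
// kPrestoCast / kIso8601 values referenced in the diff.
enum class ParseMode { kPrestoCast, kIso8601 };

// Date/time separator: space in CAST, 'T' in from_iso8601_timestamp.
constexpr bool isDateTimeSeparator(char c, ParseMode mode) {
  return mode == ParseMode::kIso8601 ? c == 'T' : c == ' ';
}

// Fraction separator: period in CAST, period or comma in
// from_iso8601_timestamp.
constexpr bool isFractionSeparator(char c, ParseMode mode) {
  return c == '.' || (mode == ParseMode::kIso8601 && c == ',');
}
```

Under this sketch, `isDateTimeSeparator('T', ParseMode::kPrestoCast)` is false, which corresponds to CAST rejecting inputs such as '1970-01-01T00:00'.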

Also fixes CAST(varchar AS timestamp) and CAST(varchar AS timestamp with time
zone) so that they no longer accept 'T' as the separator between date and time.

Fixes facebookincubator#7258

Fixes facebookincubator#10059

Reviewed By: pedroerp

Differential Revision: D58182255
mbasmanova authored and facebook-github-bot committed Jun 6, 2024
1 parent 7164f92 commit b4a8152
Showing 24 changed files with 825 additions and 573 deletions.
57 changes: 42 additions & 15 deletions velox/docs/functions/presto/datetime.rst
@@ -95,25 +95,52 @@ Date and Time Functions
.. function:: from_iso8601_date(string) -> date

Parses the ISO 8601 formatted ``string`` into a ``date``.
ISO 8601 ``string`` can be formatted as any of the following:
``[+-][Y]Y*``

``[+-][Y]Y*-[M]M*``
Accepts formats described by the following syntax::

``[+-][Y]Y*-[M]M*-[D]D*``
date = yyyy ['-' MM ['-' dd]]

``[+-][Y]Y*-[M]M*-[D]D* *``
Examples of valid input strings:

Year value must contain at least one digit, and may contain up to six digits.
Month and day values are optional and may each contain one or two digits.
* '2012'
* '2012-4'
* '2012-04'
* '2012-4-7'
* '2012-04-07'
* '2012-04-07 '

Examples of supported input strings:
"2012",
"2012-4",
"2012-04",
"2012-4-7",
"2012-04-07",
"2012-04-07 ”
.. function:: from_iso8601_timestamp(string) -> timestamp with time zone

Parses the ISO 8601 formatted string into a timestamp with time zone.

Accepts formats described by the following syntax::

datetime = time | date-opt-time
time = 'T' time-element [offset]
date-opt-time = date-element ['T' [time-element] [offset]]
date-element = yyyy ['-' MM ['-' dd]]
time-element = HH [minute-element] | [fraction]
minute-element = ':' mm [second-element] | [fraction]
second-element = ':' ss [fraction]
fraction = ('.' | ',') digit+
offset = 'Z' | (('+' | '-') HH [':' mm [':' ss [('.' | ',') SSS]]])

Examples of valid input strings:

* '2012'
* '2012-4'
* '2012-04'
* '2012-4-7'
* '2012-04-07'
* '2012-04-07 '
* '2012-04T01:02'
* 'T01:02:34'
* 'T01:02:34,123'
* '2012-04-07T01:02:34'
* '2012-04-07T01:02:34.123'
* '2012-04-07T01:02:34,123'
* '2012-04-07T01:02:34.123Z'
* '2012-04-07T01:02:34.123-05:00'

.. function:: from_unixtime(unixtime) -> timestamp

@@ -412,4 +439,4 @@ list of supported timezones follow the definition `here
-- 2012-10-31 01:00:00.000 UTC

SELECT timestamp '2012-10-31 01:00 UTC' AT TIME ZONE 'America/Los_Angeles';
-- 2012-10-30 18:00:00.000 America/Los_Angeles
-- 2012-10-30 18:00:00.000 America/Los_Angeles
10 changes: 6 additions & 4 deletions velox/exec/tests/TableScanTest.cpp
@@ -4195,10 +4195,12 @@ TEST_F(TableScanTest, timestampPartitionKey) {
makeFlatVector<Timestamp>(
std::end(inputs) - std::begin(inputs),
[&](auto i) {
- auto t = util::fromTimestampString(inputs[i]).thenOrThrow(
- folly::identity, [&](const Status& status) {
- VELOX_USER_FAIL("{}", status.message());
- });
+ auto t = util::fromTimestampString(
+ inputs[i], util::TimestampParseMode::kPrestoCast)
+ .thenOrThrow(
+ folly::identity, [&](const Status& status) {
+ VELOX_USER_FAIL("{}", status.message());
+ });
t.toGMT(Timestamp::defaultTimezone());
return t;
}),
10 changes: 8 additions & 2 deletions velox/expression/ConstantExpr.cpp
@@ -175,14 +175,20 @@ void appendSqlLiteral(
}
case TypeKind::HUGEINT:
[[fallthrough]];
- case TypeKind::TIMESTAMP:
- [[fallthrough]];
  case TypeKind::REAL:
  [[fallthrough]];
  case TypeKind::DOUBLE:
  out << "'" << vector.wrappedVector()->toString(vector.wrappedIndex(row))
  << "'::" << vector.type()->toString();
  break;
+ case TypeKind::TIMESTAMP: {
+ TimestampToStringOptions options;
+ options.dateTimeSeparator = ' ';
+ const auto ts =
+ vector.wrappedVector()->as<SimpleVector<Timestamp>>()->valueAt(row);
+ out << "'" << ts.toString(options) << "'::" << vector.type()->toString();
+ break;
+ }
case TypeKind::VARCHAR:
appendSqlString(
vector.wrappedVector()->toString(vector.wrappedIndex(row)), out);
4 changes: 2 additions & 2 deletions velox/expression/PrestoCastHooks.cpp
@@ -34,8 +34,8 @@ PrestoCastHooks::PrestoCastHooks(const core::QueryConfig& config)

Expected<Timestamp> PrestoCastHooks::castStringToTimestamp(
const StringView& view) const {
- const auto conversionResult =
- util::fromTimestampWithTimezoneString(view.data(), view.size());
+ const auto conversionResult = util::fromTimestampWithTimezoneString(
+ view.data(), view.size(), util::TimestampParseMode::kPrestoCast);
if (conversionResult.hasError()) {
return folly::makeUnexpected(conversionResult.error());
}
5 changes: 5 additions & 0 deletions velox/expression/tests/CastExprTest.cpp
@@ -609,6 +609,11 @@ TEST_F(CastExprTest, stringToTimestamp) {
Timestamp(946729316, 0),
};
testCast<std::string, Timestamp>("timestamp", input, expected);

VELOX_ASSERT_THROW(
(evaluateOnce<Timestamp, std::string>(
"cast(c0 as timestamp)", "1970-01-01T00:00")),
"Cannot cast VARCHAR '1970-01-01T00:00' to TIMESTAMP. Unable to parse timestamp value");
}

TEST_F(CastExprTest, timestampToString) {
2 changes: 1 addition & 1 deletion velox/expression/tests/ExprTest.cpp
@@ -2660,7 +2660,7 @@ TEST_P(ParameterizedExprTest, constantToSql) {

ASSERT_EQ(
toSql(Timestamp(123'456, 123'000)),
- "'1970-01-02T10:17:36.000123000'::TIMESTAMP");
+ "'1970-01-02 10:17:36.000123000'::TIMESTAMP");
ASSERT_EQ(toSql(variant::null(TypeKind::TIMESTAMP)), "NULL::TIMESTAMP");

ASSERT_EQ(
8 changes: 5 additions & 3 deletions velox/functions/lib/tests/DateTimeFormatterTest.cpp
@@ -59,9 +59,11 @@ class DateTimeFormatterTest : public testing::Test {
};

static Timestamp fromTimestampString(const StringView& timestamp) {
- return util::fromTimestampString(timestamp).thenOrThrow(
- folly::identity,
- [&](const Status& status) { VELOX_USER_FAIL("{}", status.message()); });
+ return util::fromTimestampString(
+ timestamp, util::TimestampParseMode::kPrestoCast)
+ .thenOrThrow(folly::identity, [&](const Status& status) {
+ VELOX_USER_FAIL("{}", status.message());
+ });
}

void testTokenRange(
38 changes: 38 additions & 0 deletions velox/functions/prestosql/DateTimeFunctions.h
@@ -1257,6 +1257,44 @@ struct FromIso8601Date {
}
};

template <typename T>
struct FromIso8601Timestamp {
VELOX_DEFINE_FUNCTION_TYPES(T);

FOLLY_ALWAYS_INLINE void initialize(
const std::vector<TypePtr>& /*inputTypes*/,
const core::QueryConfig& config,
const arg_type<Varchar>* /*input*/) {
auto sessionTzName = config.sessionTimezone();
if (!sessionTzName.empty()) {
sessionTzID_ = util::getTimeZoneID(sessionTzName);
}
}

FOLLY_ALWAYS_INLINE Status call(
out_type<TimestampWithTimezone>& result,
const arg_type<Varchar>& input) {
const auto castResult = util::fromTimestampWithTimezoneString(
input.data(), input.size(), util::TimestampParseMode::kIso8601);
if (castResult.hasError()) {
return castResult.error();
}

auto [ts, tzID] = castResult.value();
// Input string may not contain a timezone - if so, it is interpreted in
// session timezone.
if (tzID == -1) {
tzID = sessionTzID_;
}
ts.toGMT(tzID);
result = pack(ts.toMillis(), tzID);
return Status::OK();
}

private:
int16_t sessionTzID_{0};
};

template <typename T>
struct DateParseFunction {
VELOX_DEFINE_FUNCTION_TYPES(T);
@@ -197,6 +197,8 @@ void registerSimpleFunctions(const std::string& prefix) {
{prefix + "date_parse"});
registerFunction<FromIso8601Date, Date, Varchar>(
{prefix + "from_iso8601_date"});
registerFunction<FromIso8601Timestamp, TimestampWithTimezone, Varchar>(
{prefix + "from_iso8601_timestamp"});
registerFunction<CurrentDateFunction, Date>({prefix + "current_date"});
registerFunction<ToISO8601Function, Varchar, Date>({prefix + "to_iso8601"});
registerFunction<
11 changes: 3 additions & 8 deletions velox/functions/prestosql/tests/ComparisonsTest.cpp
@@ -115,15 +115,10 @@ TEST_F(ComparisonsTest, betweenTimestamp) {
auto expr =
"c0 between cast(\'2019-02-28 10:00:00.500\' as timestamp) and"
" cast(\'2019-02-28 10:00:00.600\' as timestamp)";
- if (s.has_value()) {
- const auto ts =
- util::fromTimestampString((StringView)s.value())
- .thenOrThrow(folly::identity, [&](const Status& status) {
- VELOX_USER_FAIL("{}", status.message());
- });
- return evaluateOnce<bool>(expr, std::optional(ts));
+ if (!s.has_value()) {
+ return evaluateOnce<bool>(expr, std::optional<Timestamp>());
  }
- return evaluateOnce<bool>(expr, std::optional<Timestamp>());
+ return evaluateOnce<bool>(expr, std::optional{parseTimestamp(s.value())});
};

EXPECT_EQ(std::nullopt, between(std::nullopt));