Background
If you have historical search enabled and `file_format` is set to `parquet`, bad news: we will be screwed if we change the type(s) in a log schema, and we will get a `HIVE_PARTITION_SCHEMA_MISMATCH` error when we try to search historical data across all partitions in the table whose schema we changed.
For example, if we change the following `timestamp` to `string`, the `carbonblack_alert_watchlist_hit_feedsearch_bin` table partitions will be screwed.
streamalert/conf/schemas/carbonblack.json, line 51 in 19458d7
If we don't change the schema ever, happy life! Unfortunately, this is not the reality 😢
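To make the failure mode concrete, here is a minimal sketch of a check that flags type changes between two versions of a log schema before they reach the tables. It assumes a schema is a (possibly nested) dict mapping field names to type strings. The function name `changed_types` and the `float` starting type are hypothetical and only for illustration; they are not taken from the StreamAlert code base.

```python
def changed_types(old, new, prefix=""):
    """Return {dotted_field: (old_type, new_type)} for fields whose type differs.

    Only fields present in both schemas are compared; added or removed
    fields are ignored in this sketch.
    """
    changes = {}
    for key in old.keys() & new.keys():
        path = f"{prefix}{key}"
        old_type, new_type = old[key], new[key]
        if isinstance(old_type, dict) and isinstance(new_type, dict):
            # Recurse into nested schemas and report dotted paths.
            changes.update(changed_types(old_type, new_type, prefix=f"{path}."))
        elif old_type != new_type:
            changes[path] = (old_type, new_type)
    return changes


# Hypothetical before/after versions of one log schema.
old_schema = {"timestamp": "float", "type": "string"}
new_schema = {"timestamp": "string", "type": "string"}
print(changed_types(old_schema, new_schema))
```

Any field reported here is a candidate for the partition mismatch described above, since existing partitions were registered with the old type.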
Desired Change
A couple of things we can improve:
- Standardize everything on `string`. A string has a larger memory footprint, but it is the most permissive to future changes.
- Have a script that can fix this quickly. The script should drop the target table(s), rebuild them using the new schemas, and recreate the partitions. It may also need to fix the underlying data (which might be hard).
- Or other solutions we haven't thought about.
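As a starting point for the script idea, here is a rough sketch that only generates the Athena statements for dropping a table, recreating it from a new schema, and reloading its partitions. Everything here is an assumption for illustration: the `rebuild_statements` helper, the `TYPE_MAP` mapping from schema types to Athena types, the `dt` partition column, the S3 location, and the restriction to flat schemas. Running the statements (for example through the Athena API) is left out.

```python
# Hypothetical mapping from log schema types to Athena column types.
TYPE_MAP = {
    "string": "string",
    "integer": "bigint",
    "float": "double",
    "boolean": "boolean",
}


def rebuild_statements(table, schema, location, partition_column="dt"):
    """Return the DDL statements needed to rebuild one table, in order.

    Only flat schemas (field name -> type string) are handled in this sketch.
    """
    columns = ",\n  ".join(
        f"`{name}` {TYPE_MAP[type_name]}" for name, type_name in schema.items()
    )
    return [
        f"DROP TABLE IF EXISTS {table}",
        (
            f"CREATE EXTERNAL TABLE {table} (\n  {columns}\n)\n"
            f"PARTITIONED BY ({partition_column} string)\n"
            f"STORED AS PARQUET\n"
            f"LOCATION '{location}'"
        ),
        # Rediscovers partitions, assuming Hive-style dt=... prefixes in S3.
        f"MSCK REPAIR TABLE {table}",
    ]


statements = rebuild_statements(
    "carbonblack_alert_watchlist_hit_feedsearch_bin",
    {"timestamp": "string"},
    "s3://example-bucket/parquet/carbonblack_alert_watchlist_hit_feedsearch_bin/",
)
for statement in statements:
    print(statement + ";\n")
```

Dropping an external table leaves the files in S3 in place, so Parquet files written under the old type are untouched. That is the "fix underlying data" part noted above, and this sketch does not attempt it.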