Andrei Tupitcyn edited this page Dec 23, 2015 · 1 revision

SerDe is short for Serializer/Deserializer. Parsek uses the SerDe interface to parse source data and to serialize PValue objects.
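To make the serializer/deserializer idea concrete, here is a toy sketch. The names SimpleSerDe, Row and CommaSerDe are hypothetical and invented for illustration. They are not Parsek's SerDe interface, and Row is only a stand-in for PValue. Deserializing maps a raw line to named fields, and serializing reverses that mapping.

```scala
// Hypothetical, simplified stand-in for a SerDe; this is NOT Parsek's actual interface.
object SerDeSketch {
  trait SimpleSerDe[A] {
    // Deserialize: turn one raw record into a structured value.
    def deserialize(raw: String): A
    // Serialize: turn a structured value back into a raw record.
    def serialize(value: A): String
  }

  // Toy row type (field name -> raw value) standing in for PValue.
  type Row = Map[String, String]

  // Toy comma-separated SerDe; ignores quoting, escaping and complex types.
  final class CommaSerDe(fields: List[String]) extends SimpleSerDe[Row] {
    def deserialize(raw: String): Row =
      fields.zip(raw.split(",", -1).toList).toMap

    def serialize(value: Row): String =
      fields.map(f => value.getOrElse(f, "")).mkString(",")
  }
}
```

A round trip through deserialize and then serialize should reproduce the original line as long as the values contain no commas.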

Separated SerDe

Separated SerDe works with delimiter-separated values (DSV); the most popular DSV formats are CSV and TSV. It is also compatible with the Hive delimited text format and supports complex types such as Map, Struct and List.

| Configuration key | Default | Description |
|---|---|---|
| fields | (none) | List of fields: either field names or field definitions. |
| enclosure | " | Enclosure (quote) character. |
| escape | \ | Escape character. |
| delimiter | \u0001 | Field delimiter. Equivalent to Hive's FIELDS TERMINATED BY. |
| listDelimiter | \u0002 | Delimiter between collection items. Equivalent to Hive's COLLECTION ITEMS TERMINATED BY. |
| mapFieldDelimiter | \u0002 | Delimiter between a map key and its value. Equivalent to Hive's MAP KEYS TERMINATED BY. |
| nullValue | "" (empty string) | String that represents null values. |
| multiLine | false | Allow multiline values. Hive does not support multiline data. |
| timeFormat | yyyy-MM-dd HH:mm:ss | Format for parsing and writing time fields. Use the value "timestamp" for Unix timestamps. |
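The interaction of delimiter, enclosure, escape and nullValue when reading one record can be sketched as follows. DsvTokenizer and its Options class are hypothetical. The defaults mirror the configuration table, but the parsing rules are assumptions and not Parsek's actual implementation: an escape character makes the next character literal, an enclosure character toggles a quoted section, and a field equal to nullValue becomes None. Complex types and multiLine are not handled.

```scala
// Minimal sketch of DSV tokenizing with delimiter, enclosure and escape characters.
// Assumption: mirrors the documented defaults, not Parsek's actual parser.
object DsvTokenizer {
  final case class Options(
      delimiter: Char = '\u0001', // same default as Hive's FIELDS TERMINATED BY
      enclosure: Char = '"',
      escape: Char = '\\',
      nullValue: String = ""
  )

  /** Splits one record into fields; a field equal to nullValue becomes None. */
  def split(line: String, opts: Options = Options()): List[Option[String]] = {
    val fields = List.newBuilder[Option[String]]
    val current = new StringBuilder
    var inEnclosure = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (c == opts.escape && i + 1 < line.length) {
        // The escape character makes the next character literal.
        current.append(line.charAt(i + 1))
        i += 2
      } else {
        if (c == opts.enclosure) inEnclosure = !inEnclosure // toggle quoted section
        else if (c == opts.delimiter && !inEnclosure) {
          fields += toField(current.toString, opts)
          current.clear()
        } else current.append(c)
        i += 1
      }
    }
    fields += toField(current.toString, opts) // last field
    fields.result()
  }

  private def toField(raw: String, opts: Options): Option[String] =
    if (raw == opts.nullValue) None else Some(raw)
}
```

Passing `DsvTokenizer.Options(delimiter = ',')` turns the same tokenizer into a simple CSV reader, which reflects how the CSV and TSV variants differ from Separated SerDe only in their defaults.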

CSV SerDe

The same as Separated SerDe, but with these defaults overridden:

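// All settings not overridden here keep the Separated SerDe defaults.
case class CsvSerDe(config: Config) extends DelimitedSerDeTrait {
  override val delimiter = ','         // fields are separated by commas
  override val listDelimiter = '|'     // collection items are separated by '|'
  override val mapFieldDelimiter = ':' // a map key is separated from its value by ':'
}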
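With the CSV defaults, list and map values live inside a single comma-separated field. The following is a minimal sketch of splitting such values. It assumes, as in Hive's delimited format, that map entries are separated by listDelimiter and that each key is separated from its value by mapFieldDelimiter. CsvComplexFields is a hypothetical helper, not part of Parsek.

```scala
// Sketch of splitting CSV SerDe complex-type values using its overridden defaults.
// Assumption: map entries are separated by listDelimiter, as in Hive's delimited format.
object CsvComplexFields {
  val ListDelimiter: Char = '|'
  val MapFieldDelimiter: Char = ':'

  // "a|b|c" -> List("a", "b", "c"); an empty string yields an empty list.
  def parseList(raw: String): List[String] =
    if (raw.isEmpty) Nil
    else raw.split(java.util.regex.Pattern.quote(ListDelimiter.toString), -1).toList

  // "k1:v1|k2:v2" -> Map("k1" -> "v1", "k2" -> "v2"); each entry is split at its first ':'.
  def parseMap(raw: String): Map[String, String] =
    parseList(raw).map { entry =>
      val idx = entry.indexOf(MapFieldDelimiter)
      require(idx >= 0, s"missing '$MapFieldDelimiter' in map entry: $entry")
      entry.substring(0, idx) -> entry.substring(idx + 1)
    }.toMap
}
```

For a record such as `alice,a|b|c,k1:v1|k2:v2`, the line would first be split on ','. These helpers would then be applied to the second and third fields.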

TSV SerDe

The same as Separated SerDe, but with these defaults overridden:

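// All settings not overridden here keep the Separated SerDe defaults.
case class TsvSerDe(config: Config) extends DelimitedSerDeTrait {
  override val delimiter = '\t'        // fields are separated by tabs
  override val listDelimiter = '|'     // collection items are separated by '|'
  override val mapFieldDelimiter = ':' // a map key is separated from its value by ':'
}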
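TSV SerDe does not override enclosure or escape, so it keeps the Separated SerDe defaults `"` and `\`. Those characters are what let a field contain a tab. A minimal writing sketch follows. TsvWriterSketch is hypothetical, and enclosing only the fields that need it is an assumed policy, since a writer may instead enclose every field.

```scala
// Sketch of writing one TSV record with an enclosure and an escape character.
// Assumption: only fields containing the delimiter, enclosure, escape or a line break are enclosed.
object TsvWriterSketch {
  val Delimiter: Char = '\t'
  val Enclosure: Char = '"'
  val Escape: Char = '\\'

  private def needsEnclosure(field: String): Boolean =
    field.exists(c => c == Delimiter || c == Enclosure || c == Escape || c == '\n' || c == '\r')

  def writeField(field: String): String =
    if (!needsEnclosure(field)) field
    else {
      // Prefix every escape or enclosure character with the escape character.
      val escaped = field.toList.map { c =>
        if (c == Escape || c == Enclosure) s"$Escape$c" else c.toString
      }.mkString
      s"$Enclosure$escaped$Enclosure"
    }

  def writeRow(fields: List[String]): String =
    fields.map(writeField).mkString(Delimiter.toString)
}
```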

HiveTsv SerDe

The same as Separated SerDe, but with these defaults overridden. Unlike TSV SerDe, it also disables the enclosure character:

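// All settings not overridden here keep the Separated SerDe defaults.
case class HiveTsvSerDe(config: Config) extends DelimitedSerDeTrait {
  override val delimiter = '\t'        // fields are separated by tabs
  override val listDelimiter = '|'     // collection items are separated by '|'
  override val mapFieldDelimiter = ':' // a map key is separated from its value by ':'
  override val enclosure = CSVWriter.NO_ESCAPE_CHARACTER // no enclosure: fields are written unquoted
}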
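Because HiveTsv SerDe disables the enclosure, values are written verbatim. The sketch below uses the hypothetical HiveTsvWriterSketch object to write strings, lists and maps with the overridden delimiters. It assumes map entries are separated by listDelimiter, as in Hive's delimited format. A tab inside a plain string value is not protected, so the field would split when the line is read back.

```scala
// Sketch of writing a HiveTsv-style record: no enclosure, tab between fields,
// '|' between collection items and ':' between a map key and its value.
// Assumption: map entries are separated by listDelimiter, as in Hive's delimited format.
object HiveTsvWriterSketch {
  val Delimiter = "\t"
  val ListDelimiter = "|"
  val MapFieldDelimiter = ":"

  sealed trait Value
  final case class Str(s: String) extends Value
  final case class Items(items: List[String]) extends Value
  final case class Entries(entries: List[(String, String)]) extends Value

  private def render(v: Value): String = v match {
    case Str(s)       => s // written verbatim: nothing is quoted or escaped
    case Items(items) => items.mkString(ListDelimiter)
    case Entries(es)  => es.map { case (k, x) => k + MapFieldDelimiter + x }.mkString(ListDelimiter)
  }

  def writeRow(values: List[Value]): String = values.map(render).mkString(Delimiter)
}
```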

Json SerDe

Json SerDe works with data in JSON format.

| Configuration key | Default | Description |
|---|---|---|
| fields | (none) | List of field names. |
| timeFormat | yyyy-MM-dd HH:mm:ss | Format for parsing and writing time fields. Use the value "timestamp" for Unix timestamps. |
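Here is a sketch of how the timeFormat setting could be applied to time fields using java.time. TimeFieldSketch is hypothetical. Two points are assumptions because the page does not specify them: pattern-formatted times are interpreted in UTC, and "timestamp" means seconds (not milliseconds) since the Unix epoch.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Sketch of applying a timeFormat setting to time fields.
// Assumptions: pattern times are UTC; "timestamp" means seconds since the Unix epoch.
object TimeFieldSketch {
  val DefaultFormat = "yyyy-MM-dd HH:mm:ss"

  def parseTime(raw: String, timeFormat: String = DefaultFormat): Instant =
    if (timeFormat == "timestamp") Instant.ofEpochSecond(raw.trim.toLong)
    else LocalDateTime.parse(raw, DateTimeFormatter.ofPattern(timeFormat)).toInstant(ZoneOffset.UTC)

  def formatTime(t: Instant, timeFormat: String = DefaultFormat): String =
    if (timeFormat == "timestamp") t.getEpochSecond.toString
    else DateTimeFormatter.ofPattern(timeFormat).withZone(ZoneOffset.UTC).format(t)
}
```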