Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KSQL doesn't handle the column names with a leading @ character #647

Closed
rmoff opened this issue Jan 21, 2018 · 7 comments · Fixed by #581
Closed

KSQL doesn't handle the column names with a leading @ character #647

rmoff opened this issue Jan 21, 2018 · 7 comments · Fixed by #581

Comments

@rmoff
Copy link
Contributor

rmoff commented Jan 21, 2018

Topic contents (from Elasticsearch's filebeats tool):

{"ROWTIME":-1,"ROWKEY":"null","@timestamp":"2018-01-20T21:48:42.069Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.1.2","topic":"logs"},"message":"2018-01-20 21:48:40+00 asgard02 deleted[582]: diskmanagement: [DMManager(PrivateMethods) clientConforms:error:]: currentThread=12339=0x3033 expectedThread=20483=0x5003","prospector":{"type":"log"},"beat":{"name":"asgard02.local","hostname":"asgard02.local","version":"6.1.2"},"source":"/var/log/install.log","offset":44244986}
{"ROWTIME":-1,"ROWKEY":"null","@timestamp":"2018-01-20T21:48:42.069Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.1.2","topic":"logs"},"source":"/var/log/install.log","offset":44245154,"message":"2018-01-20 21:48:40+00 asgard02 deleted[582]: diskmanagement: [DMManager(PrivateMethods) clientConforms:error:]: currentThread=12339=0x3033 expectedThread=20483=0x5003","prospector":{"type":"log"},"beat":{"name":"asgard02.local","hostname":"asgard02.local","version":"6.1.2"}}
^CTopic printing ceased

KSQL doesn't handle the column names with a leading @ very well:

ksql> create stream logs2 (source varchar, message varchar, prospector varchar, "@timestamp" varchar) with (kafka_topic='logs', value_format='json');

 Message
----------------
 Stream created
----------------
ksql> describe logs2;

 Field      | Type
----------------------------------------
 ROWTIME    | BIGINT           (system)
 ROWKEY     | VARCHAR(STRING)  (system)
 SOURCE     | VARCHAR(STRING)
 MESSAGE    | VARCHAR(STRING)
 PROSPECTOR | VARCHAR(STRING)
 @timestamp | VARCHAR(STRING)
----------------------------------------
For runtime statistics and query details run: DESCRIBE EXTENDED <Stream,Table>;
ksql> select * from logs2 limit 1;
Code generation failed for SelectValueMapper
Caused by: Line 1, Column 8: Unexpected token "@"
ksql>
ksql> create stream logs3 (source varchar, message varchar, prospector varchar, "@timestamp" varchar) with (kafka_topic='logs', value_format='json', timestamp='"@timestamp"');
No column with the provided timestamp column name in the WITH clause, "@TIMESTAMP", exists in the defined schema.
ksql>
@hjafarpour
Copy link
Contributor

@rmoff You need to change your KSQL statements and use the column name without @. KSQL JSON serde automatically drops the @ prefix from the column name.
Try this for your DDL statement:

create stream logs2 (source varchar, message varchar, prospector varchar, timestamp varchar) with (kafka_topic='logs', value_format='json');

Then you will have a column with name timestamp.

@rmoff
Copy link
Contributor Author

rmoff commented Jan 24, 2018

@hjafarpour How would this work in this situation, in which @timestamp and timestamp are columns in the message:

{
  "request": "/content/images/2017/01/holt_-_Timelion_-_Kibana.png",
  "agent": "\"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36\"",
  "auth": "-",
  "ident": "-",
  "verb": "GET",
  "message": "193.71.175.102 - - [24/Jan/2018:09:04:15 +0000] \"GET /content/images/2017/01/holt_-_Timelion_-_Kibana.png HTTP/1.1\" 200 148930 \"https://www.google.no/\" \"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36\" \"-\"",
  "path": "/Users/Robin/Downloads/nginx_access.log",
  "referrer": "\"https://www.google.no/\"",
  "@timestamp": "2018-01-24T23:29:37.951Z",
  "response": "200",
  "bytes": "148930",
  "clientip": "193.71.175.102",
  "@version": "1",
  "host": "asgard02.local",
  "httpversion": "1.1",
  "timestamp": "24/Jan/2018:09:04:15 +0000"
}

Here the problem is that the parsed timestamp (@timestamp) is ignored and the literal timestamp column selected instead:

ksql> create stream logs4 (request varchar, agent varchar, response int, timestamp varchar) with (kafka_topic='logstash_logs_json' , value_format='json');

 Message
----------------
 Stream created
----------------
ksql>
ksql>
ksql> select * from logs4 limit 5;
1516836576053 | null | /rss/ | "Feedbin feed-id:1122344 - 2 subscribers" | 304 | 24/Jan/2018:04:08:44 +0000

@timestamp is in a standardised format, which has been derived from timestamp using logic in the source application.

@rmoff rmoff reopened this Jan 24, 2018
@hjafarpour
Copy link
Contributor

@rmoff This won't work correctly. Can you change the name of timestamp field to something else?

@rmoff
Copy link
Contributor Author

rmoff commented Jan 26, 2018

I could, but a proper fix would be for KSQL to have a way to handle @ properly, e.g. through escaping or quoting.

@apurvam
Copy link
Contributor

apurvam commented Jan 31, 2018

Yes, we should definitely figure out a way to handle @. How important do you think this is @rmoff ? Do you think this is something we should fix before GA?

@bluemonk3y
Copy link

bluemonk3y commented Jan 31, 2018

@hjafarpour - if the fieldname supports it - i.e. Create results in an entity where the name is listed correctly, then why do we strip it in the json serdes - is it to do with the json-mapper?

@apurvam
Copy link
Contributor

apurvam commented Feb 5, 2018

@hjafarpour which PR addresses this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants