Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inconsistent data type #2448

Merged
merged 2 commits into from
May 11, 2015
Merged

Conversation

cannium
Copy link
Contributor

@cannium cannium commented Apr 29, 2015

Otherwise panic

panic: interface conversion: interface is *influxql.firstLastMapOutput, not influxql.firstLastMapOutput

would occur when using first(), last() or spread().

@neonstalwart
Copy link
Contributor

@cannium i guess this isn't covered by any tests. might be a good idea to add something to make sure this doesn't regress. i've been adding integration tests for aggregations to https://github.com/influxdb/influxdb/blob/b0865984452938e77148530c4e10b33a6f8c03c5/cmd/influxd/server_integration_test.go#L581. in general it seems the aggregations are not very well tested.

@cannium cannium force-pushed the fix-data-type branch 2 times, most recently from 24412f5 to 1861f41 Compare April 30, 2015 03:40
@cannium
Copy link
Contributor Author

cannium commented Apr 30, 2015

Though I managed to make integration tests passed, I noticed that the returned time of first, last are always 1970-01-01T00:00:00Z. I think that should also be fixed.

@neonstalwart
Copy link
Contributor

i'm not sure that the current architecture supports having aggregate functions return anything more than just a value but i could be wrong.

@neonstalwart
Copy link
Contributor

@cannium actually, i realized that first and last aggregate functions are for a different purpose than when you would use order by with limit 1 to get the whole first or last row (which would include the time related to the point).

aggregates are applied across intervals of time (determined by group by, where, etc) and the time associated with each aggregated value is the time at the beginning of that interval. your tests are implicitly asking to put all data points in 1 big interval and by default that interval starts at the epoch. to verify what i'm saying, try changing your test like so:

{
    reset: true,
    name:  "last value",
    write: `{"database" : "%DB%", "retentionPolicy" : "%RP%", "points": [
        {"name": "cpu", "timestamp": "2000-01-01T00:00:00Z", "fields": {"value": 2}},
        {"name": "cpu", "timestamp": "2000-01-01T00:01:00Z", "fields": {"value": 7}},
        {"name": "cpu", "timestamp": "2000-01-01T00:01:10Z", "fields": {"value": 9}}
    ]}`,
    query:    `SELECT last(value) FROM cpu WHERE time >= '2000-01-01T00:00:00Z' AND time <= '2000-01-01T00:01:10Z' GROUP BY time(2m)`,
    queryDb:  "%DB%",
    expected: `{"results":[{"series":[{"name":"cpu","columns":["time","last"],"values":[["2000-01-01T00:00:00Z",9]]}]}]}`,
},

you can see that now we've specified an interval and the time showed in the results matches the way i've described it.

expected: `{"results":[{"series":[{"name":"cpu","columns":["time","last"],"values":[["1970-01-01T00:00:00Z",9]]}]}]}`,
},
{
reset: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data is the same with every write, so there is no need to set the reset flag between each test. It will just slow testing down without reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's by purpose. I think it's better to make every test to stand on its own in case of any future modifications.

在 2015年5月1日,06:49,otoolep notifications@github.com 写道:

In cmd/influxd/server_integration_test.go:

  •   },
    
  •   {
    
  •       reset: true,
    
  •       name: "last value",
    
  •       write: `{"database" : "%DB%", "retentionPolicy" : "%RP%", "points": [
    
  •           {"name": "cpu", "timestamp": "2000-01-01T00:00:00Z", "fields": {"value": 2}},
    
  •           {"name": "cpu", "timestamp": "2000-01-01T00:01:00Z", "fields": {"value": 7}},
    
  •           {"name": "cpu", "timestamp": "2000-01-01T00:01:10Z", "fields": {"value": 9}}
    
  •       ]}`,
    
  •       query: `SELECT last(value) FROM cpu`,
    
  •       queryDb:  "%DB%",
    
  •       // FIXME: returned time should be "2000-01-01T00:01:10Z"
    
  •       expected: `{"results":[{"series":[{"name":"cpu","columns":["time","last"],"values":[["1970-01-01T00:00:00Z",9]]}]}]}`,
    
  •   },
    
  •   {
    
  •       reset: true,
    
    The data is the same with every write, so there is no need to set the reset flag between each test. It will just slow testing down without reason.


Reply to this email directly or view it on GitHub.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise that, but test run times are also important to us. Happy to make changes in the future, but when data is the same, we don't use reset.

@cannium
Copy link
Contributor Author

cannium commented Apr 30, 2015

Yes, I've read the code. But I still feel the method name confusing. From a user's perspective, one might expect the corresponding time stamp also returned, like a normal select query.

在 2015年4月30日,23:15,Ben Hockey notifications@github.com 写道:

@cannium actually, i realized that first and last aggregate functions are for a different purpose than when you would use order by with limit 1 to get the whole first or last row (which would include the time related to the point).

aggregates are applied across intervals of time (determined by group by, where, etc) and the time associated with each aggregated value is the time at the beginning of that interval. your tests are implicitly asking to put all data points in 1 big interval and by default that interval starts at the epoch. to verify what i'm saying, try changing your test like so:

{
reset: true,
name: "last value",
write: {"database" : "%DB%", "retentionPolicy" : "%RP%", "points": [ {"name": "cpu", "timestamp": "2000-01-01T00:00:00Z", "fields": {"value": 2}}, {"name": "cpu", "timestamp": "2000-01-01T00:01:00Z", "fields": {"value": 7}}, {"name": "cpu", "timestamp": "2000-01-01T00:01:10Z", "fields": {"value": 9}} ]},
query: SELECT last(value) FROM cpu WHERE time >= '2000-01-01T00:00:00Z' AND time <= '2000-01-01T00:01:10Z' GROUP BY time(2m),
queryDb: "%DB%",
expected: {"results":[{"series":[{"name":"cpu","columns":["time","last"],"values":[["2000-01-01T00:00:00Z",9]]}]}]},
},
you can see that now we've specified an interval and the time showed in the results matches the way i've described it.


Reply to this email directly or view it on GitHub.

@cannium
Copy link
Contributor Author

cannium commented May 6, 2015

ping?

@toddboom
Copy link
Contributor

toddboom commented May 8, 2015

@cannium would you mind rebasing this? if you can, i'll try to get this merged today.

@cannium
Copy link
Contributor Author

cannium commented May 9, 2015

Rebased.

@toddboom
Copy link
Contributor

+1 from myself and verbal +1 from @otoolep - thanks!

toddboom added a commit that referenced this pull request May 11, 2015
@toddboom toddboom merged commit 6b3bd90 into influxdata:master May 11, 2015
@cannium cannium deleted the fix-data-type branch May 14, 2015 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants