Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"list series" should allow filtering using a regular expression #376

Merged
merged 1 commit into from
Aug 12, 2014

Conversation

jvshahid
Copy link
Contributor

The "list series" command should allow the return of a subset of series based on a regular expression similar to the way this is possible for select statements like "list series /stats..*/i;".

Since issue #351 talks about introducing a system database I'm wondering if it would perhaps be possible to get rid of the special "list series" syntax altogether and instead take an approach similar to MySQL's information_schema and use something like "select name from system.db.testdb.timeseries where name =~ /stats..*/".

@pauldix
Copy link
Member

pauldix commented Mar 30, 2014

You can kind of do this right now using the select from regex queries. Like:

select * from /some_regex/ limit 1

However, there is an open issue with the performance of doing a limit 1 query. It should be VERY fast, but #364 will track progress on fixing that.

The purpose of the system database is less about describing schema and more about storing performance and log information. Although maybe that would make sense? Not sure.

@daledude
Copy link

Being able to use a regex for "list series" (or whatever method of getting series names) would be useful in general. A method that returns only a part of the series name would be very useful in particular for grafana and mavino's patch grafana/grafana#375 for templated filters.

@Dieterbe
Copy link
Contributor

Dieterbe commented Jun 9, 2014

the ability to get all series matching a certain expression would be great for graphite-influxdb.
I've found that select * from /regex/ limit 1 takes several minutes (with only about 300k series),
so graphite-influxdb currently does a list series and then does the regex check in python, which is slow as well.
i have no strong preference over list series like X (like mysql has show tables like '%patt%' syntax) or using a system database. the latter seems more rigid and flexible, perhaps?

@Dieterbe
Copy link
Contributor

Dieterbe commented Jun 9, 2014

also I would appreciate the ability to not only retrieve series names (matching a regex), but also their respective column names. perhaps it would also make sense to filter not only on series names, but also on column names. ( see vimeo/graphite-influxdb#4)

@Dieterbe
Copy link
Contributor

Dieterbe commented Jul 7, 2014

having this would shave a few hundred ms off every graphite-influxdb /render request, which would be pretty great!

@pauldix
Copy link
Member

pauldix commented Jul 7, 2014

This will be trivial to implement after I merge in #689, which should be later this week.

@Dieterbe
Copy link
Contributor

the below seems correct but i couldn't quite figure out how to best update the query parsing stuff to pass the regex into this function. anyone else wants to do that part? :)

diff --git a/coordinator/coordinator.go b/coordinator/coordinator.go
index 3ff8304..5ad6bc8 100644
--- a/coordinator/coordinator.go
+++ b/coordinator/coordinator.go
@@ -173,11 +173,16 @@ func (self *CoordinatorImpl) runListSeriesQuery(querySpec *parser.QuerySpec, ser
        series := self.clusterConfiguration.MetaStore.GetSeriesForDatabase(querySpec.Database())
        name := "list_series_result"
        fields := []string{"name"}
-       points := make([]*protocol.Point, len(series), len(series))
+       points := make([]*protocol.Point, 0, len(series))
+       r, _ := regexp.Compile("some regex")

-       for i, s := range series {
-               fieldValues := []*protocol.FieldValue{{StringValue: proto.String(s)}}
-               points[i] = &protocol.Point{Values: fieldValues}
+       for _, s := range series {
+               if !r.MatchString(s) {
+                       continue
+               }
+               name := proto.String(s)
+               fieldValues := []*protocol.FieldValue{{StringValue: name}}
+               points = append(points, &protocol.Point{Values: fieldValues})
        }

        seriesResult := &protocol.Series{Name: &name, Fields: fields, Points: points}

@pauldix
Copy link
Member

pauldix commented Jul 30, 2014

You'll have to modify the query.yacc and possibly some other stuff. You can probably get it done by modeling it after how the drop series query is done since it has one argument: https://github.com/influxdb/influxdb/blob/master/parser/query.yacc#L171-L175

@ddieterly
Copy link

Is there an ETA for regex in 'list series' to be supported? We are waiting for this feature. Thanks.

@Dieterbe
Copy link
Contributor

Dieterbe commented Aug 1, 2014

i've been trying to work on it but i got stuck on some yacc errors. https://gist.github.com/Dieterbe/f84a2aaf5ce7c49663a7 and haven't made progress since then. i'm hoping somebody else just beats me to it ;) maybe you can have a look.

@ddieterly
Copy link

I'm checking with my team to see if we want to jump in on this and lend a hand.

@ddieterly
Copy link

Is there a design doc on how this feature is to be implemented? How will regex searches be implemented efficiently? There is some concern on my team about whether this can be done or not. Thanks.

@Dieterbe
Copy link
Contributor

Dieterbe commented Aug 1, 2014

i already got the regex stuff down (see the diff above), it's just a matter of parsing list series like /regex/ using yacc and passing the regex into runListSeriesQuery

@pauldix
Copy link
Member

pauldix commented Aug 1, 2014

@ddieterly The naive way would just be to loop through the series names and see which ones match. They're all kept in memory so it should be fairly fast. How often would you be issuing list series queries?

@ddieterly
Copy link

This particular query is initiated by user interaction. So, probably not very often. I think your ’naïve' solution is a good one. Thanks.

@TheOriginalAK47
Copy link

Hey @pauldix, do y'all plan on adding this feature into the InfluxDB source code and if so is there a ETA?

@pauldix
Copy link
Member

pauldix commented Aug 5, 2014

@TheOriginalAK47 no ETA, but this is a fairly important feature. I have a few other things on my plate but I'll try to take a look soon. Unless you can have a look at @Dieterbe's work and help him with the query language part? @Dieterbe, do you have this on a branch somewhere?

@Dieterbe
Copy link
Contributor

Dieterbe commented Aug 5, 2014

i put my WIP stuff here
https://github.com/Dieterbe/influxdb/tree/list-queries-like-regex
would love it if someone can chip in cause i don't have time for this anymore

pauldix added a commit that referenced this pull request Aug 12, 2014
"list series" should allow filtering using a regular expression
@pauldix pauldix merged commit 23b7ae0 into master Aug 12, 2014
@pauldix
Copy link
Member

pauldix commented Aug 12, 2014

Looks good, we'll push with the next 0.8 rc. Syntax: list series /.*foo.*/i

@Dieterbe
Copy link
Contributor

awesome. looks great. thanks @jvshahid !
just a tiny note.. i noticed if there's a regex, the code creates a nil slice and then calls append(). i wonder how the performance of that compares to the approach of creating a slice with a cap of len(series) and length 0 (which would use more memory but a lower append() overhead because there's only the initial array allocation, I think)

@jvshahid
Copy link
Contributor

That's a good point, I think we can get rid of the extra slice by just filtering on the fly. Which I initially wanted to do, but forgot for some reason.

@jvshahid jvshahid deleted the list-series-376 branch August 12, 2014 19:17
@jvshahid jvshahid added this to the 0.8.0 milestone Aug 12, 2014
@Dieterbe
Copy link
Contributor

You know a way to not even create 1 extra array? I'm intrigued.

meantime I decided to try this out

diff --git a/coordinator/coordinator.go b/coordinator/coordinator.go
index c936693..b63fb06 100644
--- a/coordinator/coordinator.go
+++ b/coordinator/coordinator.go
@@ -173,7 +173,7 @@ func (self *CoordinatorImpl) runListSeriesQuery(querySpec *parser.
        allSeries := self.clusterConfiguration.MetaStore.GetSeriesForDatabase(querySpe
        matchingSeries := allSeries
        if q := querySpec.Query().GetListSeriesQuery(); q.HasRegex() {
-               matchingSeries = nil
+               matchingSeries = make([]string, 0, len(allSeries))
                regex := q.GetRegex()
                for _, s := range allSeries {
                        if !regex.MatchString(s) {
lines 1-13/13 (END)
echo 'list series' | influx-cli -db graphite  | wc -l
168478

simple benchmark:

list series /.*/  # in ms

before:
842
868
831
864
834
after:
853
873
850
816
884


list series /.*df.*dfs2.*net.*multi.*/

before:
1.701663888s
1.63869869s
1.661913666s
1.651506558s
1.751640935s
after:
1.626757677s
1.673508082s
1.649400369s
1.692636814s
1.665108684s

result: no significant improvement.

@Dieterbe
Copy link
Contributor

(btw, with my influx-cli program you can type \t to activate tracking timings for queries

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants