Facet value of boolean mapped fields should be "true" or "false" #2462

kroepke · 2012-12-04T12:02:32Z

When performing the following index and search operations I would expect the facet values of boolean fields to be "true" or "false" and not "T" and "F", since "T" and "F" are implementation detail of the BooleanFieldMapper.

To reproduce (using any version I tested, including current master):

curl -XPUT http://localhost:9200/boolfacet
curl -XPUT http://localhost:9200/boolfacet/auto/1 -d '{"foo":true}'
curl -XPOST http://localhost:9200/boolfacet/auto/_search?pretty=true -d '{"query":{"match_all":{}}, "facets":{"bar":{"terms":{"field":"foo"}}}}'

{
   "took" : 1,
   "timed_out" : false,
   "_shards" : {
     "total" : 5,
    "successful" : 5,
     "failed" : 0
   },
   "hits" : {
     "total" : 1,
     "max_score" : 1.0,
     "hits" : [ {
       "_index" : "boolfacet",
       "_type" : "auto",
       "_id" : "1",
       "_score" : 1.0, "_source" : {"foo":true}
     } ]
   },
   "facets" : {
     "bar" : {
       "_type" : "terms",
       "missing" : 0,
       "total" : 1,
       "other" : 0,
       "terms" : [ {
         "term" : "T",
         "count" : 1
       } ]
     }
   }
 }

Note the "term": "T" line in the facets.
A preliminary patch follows shortly.

The text was updated successfully, but these errors were encountered:

pcomte · 2012-12-05T10:03:52Z

Hi,

Any idea when this will be included in the master branch?

Thanks!

martijnvg · 2012-12-06T16:24:41Z

@kroepke This makes sense. We will get this in after the lucene4 refactorings.

kroepke · 2012-12-06T19:56:46Z

@martijnvg ok, great. Until then we will build our own version with that patch then.
Thx!
Let me know if you want me to clean anything up when this comes up again.

kroepke · 2012-12-07T10:59:38Z

A workaround for this problem is using the following script in the term fact.
Not pretty, but it solves the immediate problem for boolean mappers.

{
  "query": {
    "match_all": {}
  },
  "facets": {
    "bla": {
      "terms": {
        "term": true,
        "size": 10,
        "script": "doc['foo'].value == 'T' ? true : false"
      }
    }
  }
}

Results in:

{
  "facets": {
     "bla": {
        "_type": "terms",
        "missing": 0,
        "total": 1,
        "other": 0,
        "terms": [
             {
                 "term": "true",
                 "count": 1
              }
         ]
    }
}

karmi · 2012-12-07T12:20:46Z

@kroepke Just interested, how does using script affect performance, do you have any numbers or a guess?

kroepke · 2012-12-10T12:58:54Z

@karmi We haven't done any benchmarks yet, we hope to get this deployed the coming week.

kroepke · 2012-12-13T16:22:03Z

@karmi I've just done a very unscientific test, using the queries above.
The index has 213198 documents and is 1.9gb in size.
The value distribution is extremely uneven towards "false" (upwards of 99% false), but that shouldn't actually matter too much.
It's running on a single server, all data fits into memory, nothing else is using that machine. Unfortunately I forgot to change the shard count, which is the default 5, so this is 5 shards on one box.
Without the script response times around between 8 and 9 ms, with the script field it's consistently between 48 and 50 ms, so the overhead is considerable.
It may or may not be feasible for people to use the workaround, but we will have to go with a patched version until this gets fixed upstream.

karmi · 2012-12-13T16:26:23Z

@kroepke Thanks for the numbers! That is still quite a good performance, considering it has to scan all the matching documents and aggregate the counts.

s1monw · 2013-05-03T21:17:43Z

this seems to be open for a long time... I will look into that soonish

lmenezes · 2013-06-04T08:39:55Z

@s1monw
Any updates on that?
I understand that changing this at this point could lead to problems to people who count on this values being returned the way it is.
Perhaps the mapping could be configured to either map to true/false or not, and defaulting to the current behavior?

kimchy · 2013-06-04T12:38:48Z

our plan is to look into this for the 1.0 release. Your solution is definitely a possible way for us to support it.

kroepke · 2013-06-05T12:10:48Z

The following request illustrates the new behavior:

curl -XPOST http://localhost:9200/boolfacet/auto/_search?pretty=true -d '{"query":{"match_all":{}}, "facets":{"bar":{"terms":{"field""foo", "map_boolean_values":"true"}}}}'

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "boolfacet",
      "_type" : "auto",
      "_id" : "1",
      "_score" : 1.0, "_source" : {"foo":true}
    } ]
  },
  "facets" : {
    "bar" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 1,
      "other" : 0,
      "terms" : [ {
        "term" : "true",
        "count" : 1
      } ]
    }
  }
}

Without the flag it's back to 'T' and 'F', also of course if it's set to 'false'.
Since you don't merge pull request here, feel free to pick that commit apart. Also I'm not sure what else needs to be done to properly support it in all code paths.
If you tell me what to test with I'd be very happy to implement the missing bits.

s1monw · 2013-06-06T06:57:03Z

hey there, I just briefly looked at the pull request and I wonder if we can make this a little less intrusive. At the end of the day what counts here is what kind of representation we have to "True" and "False" so we could just load this through FieldData adding a feature next to the FieldDataFilter. I'd call it a FieldDataTransformer that you can use to transform `"T" to "True"' when field data is loaded then your facets would just naturally come back as a full boolean string?

kroepke · 2013-06-06T09:23:30Z

I'm not sure I can follow you there :)

From what I can see you still need to have the field mapper to do that for you, since that's what it does, no?
Or do you suggest moving the responsibility for resolving the conversion somewhere else?

I'm asking because this problem affects every field that stores values in a different representation, e.g. IPs. For those you get back the int based representation, which I actually find worse than the boolean case...

clintongormley · 2013-06-06T11:32:09Z

Hiya @kroepke

Been talking to @s1monw about this and two features seem to make sense:

FieldData Transformer:

You would be able to specify a script on a field which allows you to transform each value as it is loaded into the fielddata cache. This would be useful for, eg:

converting "T"/"F" to "true"/"false"
truncating not_analyzed strings to eg 15 characters for sorting purposes (on the fly, as opposed to having to reindex with the truncate token filter
capitalizing tokens for display
rounding datetimes to (eg) seconds instead of milliseconds

The benefit here is that there would be a one-time cost of conversion (just when loading the fielddata) and the same values would be exposed for facets, scripting and sorting.

Facet Value Transformer:

A script which is applied just before the serialization of the facet responses into JSON which could, eg:

convert "T"/"F" to "true"/"false"
convert longs representing IPv4 addresses into their string representations
capitalize tokens for display
format datetime longs as date strings

These would be more efficient than the current facet scripts because they would only be run on the return values, not on every value. However, you would still have the raw longs loaded into memory which are more useful for calculations (eg easier to calc netmasks on a long rather than on a string IPv4 address)

I believe that the value transformers are planned for the new facets that we are planning for 1.0 (@uri?)

lmenezes · 2013-06-06T11:50:11Z

The idea of having to specify a script for executing these mappings feel awkward, imho.

I think that everybody using elastic search will eventually run into this issue, and the script solution seems more like a work around rather than an "option" of something that should be transparent in the first place.

I could see this feature as a generic transformer(which i like) for the facets, but for elastic search builtin types(boolean, ip..) I think this should be done without the need of defining anything(at most a "flag", saying you want it to be done, and even that is just because of backward compatibility).

If internally, this "automatic" transformation for the builtin types is done using the script implementation you just described, cool. but having everybody defining its own script for mappings that most likely everybody will do, feels wrong.

that's my 2 cents anyway...

@clintongormley maybe the wrong uri :D @uboness

clintongormley · 2013-06-06T11:57:51Z

(heh, too many nicks in too many places)

Yes, I understand what you mean about the core types in ES. I think boolean and ip are the only two types that this really applies to. "true" and "false" are less surprising than "T" and "F", and IP addresses expressed as longs are pretty much useless in facets.

Perhaps for these two types, this should be the default return value in the new aggregations rewrite. We don't have to maintain backward compatibility here because this is completely new functionality. The old facets will continue to work as before.

For any other "label" transformations, the only generically useful way of doing them is via a script, as everybody will have their own particular requirements which would result in an explosion of options.

lmenezes · 2013-06-06T12:02:02Z

@clintongormley Agreed. This way makes complete sense and I like the script approach as a generic way of doing this. Looking forward to it.

kroepke · 2013-06-06T12:23:19Z

@clintongormley In fact this applies to dates as well, if you use terms facets (even if you might argue that terms facets on dates could be done with the date histogram facet instead).
While I see the value of what @s1monw is proposing, I have to agree with @lmenezes that it feels wrong to have to fix this with specifying it for every query just to get back what I put in.
The use case of doing netmask calculations is a valid one, but right now I consider terms facets to be broken because they leak internal representation. I doubt that it's useful to keep this for backwards compatibility reasons beyond 1.0, especially since most of the code has been touched in the last 7 months anyway.
(Sorry for sounding a little bitter, but this has been a continuous problem for us.)

clintongormley · 2013-06-06T12:27:25Z

@kroepke As I understand it, there are also plans for converting datetimes into strings, by specifying formats (rather than scripts). However, for (eg) charting libraries, it is usually more efficient to hand them millis-since-the-epoch instead of date strings, so the default return value will probably remain as longs.

gpopovic · 2014-01-04T14:49:17Z

Any news on this?

uboness · 2014-01-10T02:44:01Z

we'll work on this one post 1.0

s1monw · 2014-07-04T10:28:43Z

We deprecated facets already in 1.3 we decided to close this. We will fix this in aggregations instead.

aaneja · 2014-10-08T23:25:57Z

Any updates on which ES release will have this ?

clintongormley · 2014-10-15T19:43:31Z

@aaneja will probably be 2.0

uboness mentioned this issue Dec 14, 2012

spurious terms in boolean field #2484

Closed

ghost assigned s1monw May 3, 2013

willkg mentioned this issue Jun 6, 2013

Make faceting on booleans return True and False instead of "T" and "F" mozilla/elasticutils#72

Closed

damienalexandre mentioned this issue Jul 3, 2013

Boolean fields in custom_score fails #609

Closed

lmenezes mentioned this issue Jul 20, 2013

Terms facet on ip field returns raw integers instead of ip addresses #3321

Closed

rashidkpc mentioned this issue Oct 29, 2013

ES type "ip" displayed as integer elastic/kibana#571

Closed

uboness mentioned this issue Jan 10, 2014

Create dedicated fielddata implementation for boolean fields #4678

Closed

kevinkluge added the v1.1.0 label Feb 11, 2014

kevinkluge removed the v1.0.1 label Feb 11, 2014

s1monw added v1.2.0 and removed v1.1.0 labels Mar 12, 2014

jpountz added v1.3.0 and removed v1.2.0 labels May 7, 2014

s1monw closed this as completed Jul 4, 2014

s1monw removed v2.0.0 labels Jul 4, 2014

cigolpl mentioned this issue Jun 27, 2016

Aggregation with boolean data type fields itemsapi/itemsapi#129

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Facet value of boolean mapped fields should be "true" or "false" #2462

Facet value of boolean mapped fields should be "true" or "false" #2462

kroepke commented Dec 4, 2012

pcomte commented Dec 5, 2012

martijnvg commented Dec 6, 2012

kroepke commented Dec 6, 2012

kroepke commented Dec 7, 2012

karmi commented Dec 7, 2012

kroepke commented Dec 10, 2012

kroepke commented Dec 13, 2012

karmi commented Dec 13, 2012

s1monw commented May 3, 2013

lmenezes commented Jun 4, 2013

kimchy commented Jun 4, 2013

kroepke commented Jun 5, 2013

s1monw commented Jun 6, 2013

kroepke commented Jun 6, 2013

clintongormley commented Jun 6, 2013

lmenezes commented Jun 6, 2013

clintongormley commented Jun 6, 2013

lmenezes commented Jun 6, 2013

kroepke commented Jun 6, 2013

clintongormley commented Jun 6, 2013

gpopovic commented Jan 4, 2014

uboness commented Jan 10, 2014

s1monw commented Jul 4, 2014

aaneja commented Oct 8, 2014

clintongormley commented Oct 15, 2014

Facet value of boolean mapped fields should be "true" or "false" #2462

Facet value of boolean mapped fields should be "true" or "false" #2462

Comments

kroepke commented Dec 4, 2012

pcomte commented Dec 5, 2012

martijnvg commented Dec 6, 2012

kroepke commented Dec 6, 2012

kroepke commented Dec 7, 2012

karmi commented Dec 7, 2012

kroepke commented Dec 10, 2012

kroepke commented Dec 13, 2012

karmi commented Dec 13, 2012

s1monw commented May 3, 2013

lmenezes commented Jun 4, 2013

kimchy commented Jun 4, 2013

kroepke commented Jun 5, 2013

s1monw commented Jun 6, 2013

kroepke commented Jun 6, 2013

clintongormley commented Jun 6, 2013

FieldData Transformer:

Facet Value Transformer:

lmenezes commented Jun 6, 2013

clintongormley commented Jun 6, 2013

lmenezes commented Jun 6, 2013

kroepke commented Jun 6, 2013

clintongormley commented Jun 6, 2013

gpopovic commented Jan 4, 2014

uboness commented Jan 10, 2014

s1monw commented Jul 4, 2014

aaneja commented Oct 8, 2014

clintongormley commented Oct 15, 2014