Terms facet on ip field returns raw integers instead of ip addresses #3321

Closed
mkaluza opened this issue Jul 12, 2013 · 10 comments

Comments

@mkaluza

mkaluza commented Jul 12, 2013

Terms facet on ip field returns raw integers instead of ip addresses.

When asked:

curl -XGET http://172.16.0.134:9200/nginx-2013.07.12/_search?pretty -d'
{
  "facets": {
    "pie": {
      "terms": {
        "field": "clientip",
        "size": 10,
        "exclude": []
      },
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "query_string": {
                  "query": "*"
                }
              },
              "filter": {
                "range": {
                  "@timestamp": {
                    "from": "2013-07-12T12:09:30.122Z",
                    "to": "2013-07-12T12:14:30.122Z"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}'

it says:

{
  "took" : 43,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1308357,
    "max_score" : 1.0,
    "hits" : [ ]
  },
  "facets" : {
    "pie" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 14220,
      "other" : 7100,
      "terms" : [ {
        "term" : 1472557106,
        "count" : 1727
      }, {
        "term" : 1501212747,
        "count" : 1621
      }, {
        "term" : 1556945832,
        "count" : 1616
      }, {
        "term" : 1498126311,
        "count" : 566
      }, {
        "term" : 1541613928,
        "count" : 438
      }, {
        "term" : 1541613412,
        "count" : 346
      }, {
        "term" : 2488386185,
        "count" : 227
      }, {
        "term" : 3245280414,
        "count" : 208
      }, {
        "term" : 2999036429,
        "count" : 198
      }, {
        "term" : 1299254797,
        "count" : 173
      } ]
    }
  }
}
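
Until the facet itself returns formatted addresses, the integer terms can be converted on the client side. A minimal sketch, assuming each term is an IPv4 address packed into a 32-bit integer with the first octet in the highest byte (which is consistent with the values above); `int_to_ip` is a hypothetical helper name, not something Elasticsearch provides:

```shell
# Convert a 32-bit integer term into dotted-quad notation.
# Assumption: big-endian packing, i.e. a.b.c.d == (a<<24) + (b<<16) + (c<<8) + d.
int_to_ip() {
  n=$1
  printf '%d.%d.%d.%d\n' \
    $(( (n >> 24) & 255 )) \
    $(( (n >> 16) & 255 )) \
    $(( (n >> 8) & 255 )) \
    $(( n & 255 ))
}

int_to_ip 1472557106   # prints 87.197.112.50
```

Terms above 2^31, such as 2488386185, rely on the shell doing arithmetic in more than 32 bits, which should hold for common shells on 64-bit systems.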

Mappings:

curl -XGET http://172.16.0.134:9200/nginx-2013.07.12/_mapping?pretty
{
  "nginx-2013.07.12" : {
    "nginx" : {
      "_all" : {
        "enabled" : false
      },
      "_source" : {
        "compress" : true
      },
      "properties" : {
        "@timestamp" : {
          "type" : "date",
          "format" : "dateOptionalTime"
        },
        "agent" : {
          "type" : "string"
        },
        "app_id" : {
          "type" : "string"
        },
        "auth" : {
          "type" : "string"
        },
        "bytes" : {
          "type" : "integer"
        },
        "clientip" : {
          "type" : "ip"
        },
        "duration" : {
          "type" : "float"
        },
        "hosting" : {
          "type" : "string"
        },
        "httpversion" : {
          "type" : "string"
        },
        "ident" : {
          "type" : "string"
        },
        "message" : {
          "type" : "string",
          "index" : "not_analyzed",
          "omit_norms" : true,
          "index_options" : "docs"
        },
        "physical" : {
          "type" : "string"
        },
        "referrer" : {
          "type" : "string"
        },
        "request" : {
          "type" : "string"
        },
        "request_size" : {
          "type" : "integer"
        },
        "response" : {
          "type" : "integer"
        },
        "source" : {
          "type" : "string",
          "index" : "not_analyzed",
          "omit_norms" : true,
          "index_options" : "docs"
        },
        "source_host" : {
          "type" : "string",
          "index" : "not_analyzed",
          "omit_norms" : true,
          "index_options" : "docs"
        },
        "source_path" : {
          "type" : "string",
          "index" : "not_analyzed",
          "omit_norms" : true,
          "index_options" : "docs"
        },
        "tags" : {
          "type" : "string",
          "index" : "not_analyzed",
          "omit_norms" : true,
          "index_options" : "docs"
        },
        "timestamp" : {
          "type" : "date",
          "format" : "dateOptionalTime"
        },
        "type" : {
          "type" : "string",
          "index" : "not_analyzed",
          "omit_norms" : true,
          "index_options" : "docs"
        },
        "verb" : {
          "type" : "string"
        },
        "vhost" : {
          "type" : "string"
        }
      }
    }
  }
}
@lmenezes
Contributor

This is the same as #2462.
I don't really see it being fixed anytime soon (maybe this will be different in 1.0?).
It probably makes sense for you to just use a multi field for this, indexing the value once as an ip and once as a string. Then you could use the first for whatever you are currently doing, and the second for faceting.

@ralphm

ralphm commented Jul 23, 2013

@lmenezes how does that help exactly? For facet queries like the above you'd still get facets that have to be processed before presenting them in a UI.

That said, I'm getting back IP addresses as strings when faceting with a field of type ip in a clean test.

However, I found that if there are other mappings, where a field by the same name has type string, I am getting responses like the following, which don't even look like IP addresses to me:

    "facets" : {
        "pie" : {
            "_type" : "terms",
            "missing" : 133599,
            "total" : 1683808,
            "other" : 701976,
            "terms" : [ {
                "term" : "\\\b",
                "count" : 105238
            }, {
                "term" : "X\u0001\u0000",
                "count" : 105238
            }, {
                "term" : "T\u0010\u0000",
                "count" : 105238
            }, {
                "term" : "P\u0002\u0000\u0000",
                "count" : 105238
            }, {
                "term" : "L \u0000\u0000",
                "count" : 105238
            }, {
                "term" : "H\u0004\u0000\u0000\u0000",
                "count" : 105238
            }, {
                "term" : "D@\u0000\u0000\u0000",
                "count" : 105238
            }, {
                "term" : "@\b\u0000\u0000\u0000\u0000",
                "count" : 105238
            }, {
                "term" : "<\u0001\u0000\u0000\u0000\u0000\n",
                "count" : 70859
            }, {
                "term" : "8\u0010\u0000\u0000\u0000\u0001&",
                "count" : 69069
            } ]
        }
    }

@lmenezes
Contributor

@ralphm I'm not really sure what you meant there, but running this:

curl -XPOST http://localhost:9200/foo

curl -XPUT http://localhost:9200/foo/bar/_mapping -d '{ "bar": { "properties": { "clientip": { "type": "multi_field", "fields": { "clientip": { "type": "ip" }, "clientip_facet": { "type": "string", "index": "not_analyzed" } } } } } }'

curl -XPUT http://localhost:9200/foo/bar/1 -d '{"clientip":"192.168.0.1"}'
curl -XPUT http://localhost:9200/foo/bar/2 -d '{"clientip":"192.168.0.2"}'
curl -XPUT http://localhost:9200/foo/bar/3 -d '{"clientip":"192.168.0.3"}'
curl -XPUT http://localhost:9200/foo/bar/4 -d '{"clientip":"192.168.0.4"}'

curl -XGET http://localhost:9200/foo/bar/_search -d '{ "facets": { "pie": { "terms": { "field": "clientip", "size": 10 } } } }'
curl -XGET http://localhost:9200/foo/bar/_search -d '{ "facets": { "pie": { "terms": { "field": "clientip_facet", "size": 10 } } } }'

might give you an idea of what I meant.
Of course this duplicates some data, but I currently see no better way to achieve the same result.

@ralphm

ralphm commented Jul 23, 2013

@lmenezes Ah, yes. I just did the same thing on my own.

Interestingly, I now also see those IPs as integers on my other installation. I'm not entirely sure what the difference is between these two installs. They should be identical.

@avleen

avleen commented Oct 31, 2013

Just for reference, I found that if you set "index" : "not_analyzed" on the ip field, it doesn't break.

@mkaluza
Author

mkaluza commented Oct 31, 2013

Thanks for the tip. Unfortunately I need an index on this field :/

@avleen

avleen commented Oct 31, 2013

That's fine :-)

Not analysing only means that the analyser isn't run against the content of
the field. The analyser tokenizes the contents for doing things like
frequency searches of characters in words. The field will still be indexed,
and you will still be able to do searches against it, including wildcard and
range searches.

For a while I was also under the impression that you had to analyse in
order to index; this is simply not the case. In fact, with my input from
Logstash, I make every field not analysed, except for the source field. It
speeds up indexing and reduces index size quite noticeably.

@stonith

stonith commented Nov 16, 2013

@avleen Do terms facets return IPs for you when your mapping looks like "clientip": { "type": "ip", "index": "not_analyzed" }? I'm getting integers.

UPDATE: I just realized that "index": "not_analyzed" isn't a valid option for type ip:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-ip-type.html

@rottenbytes

👍 on this one, it's pretty boring

@dadoonet
Member

dadoonet commented Dec 5, 2013

Heya,

The advice provided by @lmenezes could really help you guys to deal with this issue: #3321 (comment)

I think we can close this one as #3300 (see IP Range) will fix it.

Feel free to reopen if you don't think so.

@dadoonet dadoonet closed this as completed Dec 5, 2013