ECS: support composite region_iso_code #206

Merged
merged 5 commits into from
Jan 26, 2022
Changes from 3 commits
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,8 @@
## 7.2.11
- Improved compatibility with the Elastic Common Schema [#206](https://github.com/logstash-plugins/logstash-filter-geoip/pull/206)
- Added support for ECS's composite `region_iso_code`
- [DOC] Improve ECS-related documentation

## 7.2.10
- [DOC] Air-gapped environment requires both ASN and City databases [#204](https://github.com/logstash-plugins/logstash-filter-geoip/pull/204)

81 changes: 60 additions & 21 deletions docs/index.asciidoc
@@ -169,14 +169,56 @@ Example response:
}
--------------------------------------------------

[id="plugins-{type}s-{plugin}-field-mapping"]
==== Field mapping

When this plugin is run _without_ <<plugins-{type}s-{plugin}-ecs_compatibility>>, the MaxMind DB's fields are added directly to the <<plugins-{type}s-{plugin}-target>>, but when ECS compatibility is enabled, the fields are structured to fit into an ECS shape.
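As a rough illustration of the two shapes described above, the following Ruby sketch builds the field reference a value would land in for each mode. `field_reference` is a hypothetical helper, not part of the plugin; the dotted ECS paths follow the style used in `Fields.java` (for example `geo.country_iso_code`).

```ruby
# Hypothetical helper (not plugin API): returns the Logstash field reference
# for a database field, depending on whether ECS compatibility is enabled.
def field_reference(target, legacy_name, ecs_path, ecs_enabled)
  # Without ECS, the database field name is added directly under the target.
  return "[#{target}][#{legacy_name}]" unless ecs_enabled

  # With ECS, the dotted path is expanded into nested field references.
  "[#{target}]" + ecs_path.split(".").map { |part| "[#{part}]" }.join
end

field_reference("client", "country_code2", "geo.country_iso_code", false)
# => "[client][country_code2]"
field_reference("client", "country_code2", "geo.country_iso_code", true)
# => "[client][geo][country_iso_code]"
```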

[cols="3,5,3"]
|===========================
| Database Field Name | ECS Field | Example

| `ip` | `[ip]` | `12.34.56.78`

| `city_name` | `[geo][city_name]` | `Seattle`
| `country_name` | `[geo][country_name]` | `United States`
| `continent_code` | `[geo][continent_code]` | `NA`
| `continent_name` | `[geo][continent_name]` | `North America`
| `country_code2` | `[geo][country_iso_code]` | `US`
| `country_code3` | _N/A_ | `US`

_maintained for legacy
support, but populated
with 2-character country
code_

| `postal_code` | `[geo][postal_code]` | `98106`
| `region_name` | `[geo][region_name]` | `Washington`
| `region_code` | `[geo][region_code]` | `WA`
| `region_iso_code`* | `[geo][region_iso_code]` | `US-WA`
| `timezone` | `[geo][timezone]` | `America/Los_Angeles`
| `location`* | `[geo][location]` | `{"lat": 47.6062, "lon": -122.3321}`
| `latitude` | `[geo][location][lat]` | `47.6062`
| `longitude` | `[geo][location][lon]` | `-122.3321`

| `domain` | `[domain]` | `example.com`

| `asn` | `[as][number]` | `98765`
| `as_org` | `[as][organization][name]` | `Elastic, NV`

| `isp` | `[mmdb][isp]` | `InterLink Supra LLC`
| `dma_code` | `[mmdb][dma_code]` | `819`
| `organization` | `[mmdb][organization]` | `Elastic, NV`
|===========================

NOTE: `*` indicates a composite field, which is only populated if the GeoIP lookup result contains all of its components.
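The composite rule in the note can be sketched in Ruby as follows. This is a minimal illustration, assuming the components are the country ISO code plus the region code for `region_iso_code`, and the latitude plus longitude for `location`; both method names are hypothetical and do not exist in the plugin.

```ruby
# Hypothetical helpers illustrating the "all components or nothing" rule.

# Joins the country and region codes with a hyphen, e.g. "US" + "WA".
# Returns nil when either component is missing, so nothing is populated.
def compose_region_iso_code(country_iso_code, region_code)
  return nil if country_iso_code.nil? || region_code.nil?

  "#{country_iso_code}-#{region_code}"
end

# Combines latitude and longitude into a single lat/lon structure.
# Returns nil when either coordinate is missing.
def compose_location(latitude, longitude)
  return nil if latitude.nil? || longitude.nil?

  { "lat" => latitude, "lon" => longitude }
end

compose_region_iso_code("US", "WA") # => "US-WA"
compose_region_iso_code("US", nil)  # => nil
```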

==== Details

A `[geoip][location]` field is created if
the GeoIP lookup returns a latitude and longitude. The field is stored in
http://geojson.org/geojson-spec.html[GeoJSON] format. Additionally,
the default Elasticsearch template provided with the
{logstash-ref}/plugins-outputs-elasticsearch.html[elasticsearch output] maps
the `[geoip][location]` field to an {ref}/geo-point.html[Elasticsearch Geo_point datatype].
When using a City database, the enrichment is aborted if no latitude/longitude pair is available.

The `location` field combines the latitude and longitude into a structure called http://geojson.org/geojson-spec.html[GeoJSON].
When using a default <<plugins-{type}s-{plugin}-target>>, the templates provided by the {logstash-ref}/plugins-outputs-elasticsearch.html[elasticsearch output] map the field to an {ref}/geo-point.html[Elasticsearch Geo_point datatype].

As this field is a `geo_point` _and_ it is still valid GeoJSON, you get
the awesomeness of Elasticsearch's geospatial query, facet and filter functions
@@ -242,16 +284,16 @@ number of cache misses and waste memory.
===== `database`

* Value type is <<path,path>>
* If not specified, the database defaults to the GeoLite2 City database that ships with Logstash.
* If not specified, the database defaults to the `GeoLite2 City` database that ships with Logstash.

The path to MaxMind's database file that Logstash should use. The default database is GeoLite2-City.
GeoLite2-City, GeoLite2-Country, GeoLite2-ASN are the free databases from MaxMind that are supported.
GeoIP2-City, GeoIP2-ISP, GeoIP2-Country are the commercial databases from MaxMind that are supported.
The path to MaxMind's database file that Logstash should use.
The default database is `GeoLite2-City`.
This plugin supports several free databases (`GeoLite2-City`, `GeoLite2-Country`, `GeoLite2-ASN`)
and a selection of commercially-licensed databases (`GeoIP2-City`, `GeoIP2-ISP`, `GeoIP2-Country`).

Database auto-update applies to default distribution. When `database` points to user's database path,
auto-update will be disabled.
See
<<plugins-{type}s-{plugin}-database_license,Database License>> for more information.
Database auto-update applies to the default distribution.
When `database` points to a user-supplied database path, auto-update is disabled.
See <<plugins-{type}s-{plugin}-database_license,Database License>> for more information.

[id="plugins-{type}s-{plugin}-default_database_type"]
===== `default_database_type`
@@ -270,21 +312,18 @@ This plugin now includes both the GeoLite2-City and GeoLite2-ASN databases. If

An array of geoip fields to be included in the event.

Possible fields depend on the database type. By default, all geoip fields
are included in the event.
Possible fields depend on the database type.
By default, all geoip fields from the relevant database are included in the event.

For the built-in GeoLite2 City database, the following are available:
`city_name`, `continent_code`, `country_code2`, `country_code3`, `country_name`,
`dma_code`, `ip`, `latitude`, `location`, `longitude`, `postal_code`, `region_code`,
`region_name` and `timezone`.
For a complete list of available fields and how they map to an event's structure, see <<plugins-{type}s-{plugin}-field-mapping,field mapping>>.
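A rough Ruby sketch of how the default field set could differ between modes for a City database, modelled on the `Fields.java` change in this pull request. The constant and method names are illustrative only, and the legacy list is the one given in the earlier wording of this section.

```ruby
require "set"

# Legacy (non-ECS) defaults for the City database, per the earlier docs wording.
DEFAULT_CITY_FIELDS = Set[
  "city_name", "continent_code", "country_code2", "country_code3",
  "country_name", "dma_code", "ip", "latitude", "location", "longitude",
  "postal_code", "region_code", "region_name", "timezone"
].freeze

# In ECS mode the composite region_iso_code replaces the separate region_code.
DEFAULT_ECS_CITY_FIELDS =
  (DEFAULT_CITY_FIELDS - ["region_code"] + ["region_iso_code"]).freeze

# Hypothetical helper: an explicit `fields` list wins; otherwise the defaults
# depend on whether ECS compatibility is enabled.
def desired_fields(requested, ecs_enabled)
  return requested.to_set unless requested.nil? || requested.empty?

  ecs_enabled ? DEFAULT_ECS_CITY_FIELDS : DEFAULT_CITY_FIELDS
end

desired_fields(nil, true).include?("region_iso_code") # => true
desired_fields(nil, true).include?("region_code")     # => false
```

Keeping the ECS defaults as a derived copy means the legacy set stays untouched for pipelines that run with ECS compatibility disabled.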

[id="plugins-{type}s-{plugin}-ecs_compatibility"]
===== `ecs_compatibility`

* Value type is <<string,string>>
* Supported values are:
** `disabled`: unstructured geo data added at root level
** `v1`, `v8`: uses fields that are compatible with Elastic Common Schema (for example, `[client][geo][country_name]`)
** `v1`, `v8`: uses fields that are compatible with Elastic Common Schema (for example, `[client][geo][country_name]`; see <<plugins-{type}s-{plugin}-field-mapping,field mapping>>)
* Default value depends on which version of Logstash is running:
** When Logstash provides a `pipeline.ecs_compatibility` setting, its value is used as the default
** Otherwise, the default value is `disabled`.
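The default resolution described in the list above can be sketched like this. Both helpers are hypothetical; the check against `"disabled"` mirrors the `!ecsCompatibility.equals("disabled")` test in `GeoIPFilter.java`.

```ruby
# Hypothetical helper: an explicit plugin-level value wins, then the
# pipeline-level `pipeline.ecs_compatibility` value, then "disabled".
def effective_ecs_compatibility(plugin_setting, pipeline_setting)
  (plugin_setting || pipeline_setting || "disabled").to_s
end

# Any mode other than "disabled" (for example v1 or v8) enables ECS shaping.
def ecs_enabled?(mode)
  mode != "disabled"
end

effective_ecs_compatibility(nil, nil)         # => "disabled"
effective_ecs_compatibility(nil, "v8")        # => "v8"
effective_ecs_compatibility("disabled", "v8") # => "disabled"
```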
2 changes: 1 addition & 1 deletion logstash-filter-geoip.gemspec
@@ -1,7 +1,7 @@
Gem::Specification.new do |s|

s.name = 'logstash-filter-geoip'
s.version = '7.2.10'
s.version = '7.2.11'
s.licenses = ['Apache License (2.0)']
s.summary = "Adds geographical information about an IP address"
s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
16 changes: 14 additions & 2 deletions spec/filters/geoip_ecs_spec.rb
@@ -27,6 +27,10 @@
end

context "with city database" do
# example.com, has been static for 10+ years
# and has city-level details
let(:ip) { "93.184.216.34" }

let(:options) { common_options }

it "should return geo in target" do
@@ -36,15 +40,23 @@
expect( event.get ecs_select[disabled: "[#{target}][country_code2]", v1: "[#{target}][geo][country_iso_code]"] ).to eq 'US'
expect( event.get ecs_select[disabled: "[#{target}][country_name]", v1: "[#{target}][geo][country_name]"] ).to eq 'United States'
expect( event.get ecs_select[disabled: "[#{target}][continent_code]", v1: "[#{target}][geo][continent_code]"] ).to eq 'NA'
expect( event.get ecs_select[disabled: "[#{target}][location][lat]", v1: "[#{target}][geo][location][lat]"] ).to eq 37.751
expect( event.get ecs_select[disabled: "[#{target}][location][lon]", v1: "[#{target}][geo][location][lon]"] ).to eq -97.822
expect( event.get ecs_select[disabled: "[#{target}][location][lat]", v1: "[#{target}][geo][location][lat]"] ).to eq 42.1596
expect( event.get ecs_select[disabled: "[#{target}][location][lon]", v1: "[#{target}][geo][location][lon]"] ).to eq -70.8217
expect( event.get ecs_select[disabled: "[#{target}][city_name]", v1: "[#{target}][geo][city_name]"] ).to eq 'Norwell'
expect( event.get ecs_select[disabled: "[#{target}][dma_code]", v1: "[#{target}][mmdb][dma_code]"] ).to eq 506
expect( event.get ecs_select[disabled: "[#{target}][region_name]", v1: "[#{target}][geo][region_name]"] ).to eq 'Massachusetts'

if ecs_select.active_mode == :disabled
expect( event.get "[#{target}][country_code3]" ).to eq 'US'
expect( event.get "[#{target}][region_code]" ).to eq 'MA'
expect( event.get "[#{target}][region_iso_code]" ).to be_nil
else
expect( event.get "[#{target}][geo][country_code3]" ).to be_nil
expect( event.get "[#{target}][country_code3]" ).to be_nil
expect( event.get "[#{target}][geo][region_iso_code]" ).to eq 'US-MA'
expect( event.get "[#{target}][region_code]" ).to be_nil
end
Comment on lines 49 to 58
Contributor Author


If we want to keep populating the non-ECS [geo][region_code], this change is also needed:

Suggested change
if ecs_select.active_mode == :disabled
expect( event.get "[#{target}][country_code3]" ).to eq 'US'
expect( event.get "[#{target}][region_code]" ).to eq 'MA'
expect( event.get "[#{target}][region_iso_code]" ).to be_nil
else
expect( event.get "[#{target}][geo][country_code3]" ).to be_nil
expect( event.get "[#{target}][country_code3]" ).to be_nil
expect( event.get "[#{target}][geo][region_iso_code]" ).to eq 'US-MA'
expect( event.get "[#{target}][region_code]" ).to be_nil
end
if ecs_select.active_mode == :disabled
expect( event.get "[#{target}][country_code3]" ).to eq 'US'
expect( event.get "[#{target}][region_code]" ).to eq 'MA'
expect( event.get "[#{target}][region_iso_code]" ).to be_nil
else
expect( event.get "[#{target}][geo][country_code3]" ).to be_nil
expect( event.get "[#{target}][country_code3]" ).to be_nil
expect( event.get "[#{target}][geo][region_iso_code]" ).to eq 'US-MA'
expect( event.get "[#{target}][geo][region_code]" ).to eq 'MA'
end

puts event.to_hash.inspect
end
end

9 changes: 9 additions & 0 deletions src/main/java/org/logstash/filters/geoip/Fields.java
@@ -39,6 +39,7 @@ enum Fields {
DMA_CODE("mmdb.dma_code", "dma_code"),
REGION_NAME("geo.region_name", "region_name"),
REGION_CODE("geo.region_code", "region_code"),
REGION_ISO_CODE("geo.region_iso_code", "region_iso_code"),
TIMEZONE("geo.timezone", "timezone"),
LOCATION("geo.location", "location"),
LATITUDE("geo.location.lat", "latitude"),
@@ -96,6 +97,14 @@ public String getFieldReferenceECSv1() {
Fields.COUNTRY_CODE3, Fields.IP, Fields.POSTAL_CODE, Fields.DMA_CODE, Fields.REGION_NAME,
Fields.REGION_CODE, Fields.TIMEZONE, Fields.LOCATION, Fields.LATITUDE, Fields.LONGITUDE);

// When ECS is enabled, the composite REGION_ISO_CODE field is preferred to separate REGION_CODE
static final EnumSet<Fields> DEFAULT_ECS_CITY_FIELDS;
static {
DEFAULT_ECS_CITY_FIELDS = EnumSet.copyOf(DEFAULT_CITY_FIELDS);
DEFAULT_ECS_CITY_FIELDS.remove(REGION_CODE);
Contributor Author


This is perhaps a breaking change from a certain perspective, for a user who was relying on the non-ECS [geo][region_code] being populated in ECS mode. I'm inclined to call this a bugfix, since users have to explicitly opt into ECS to hit this behaviour.

But if we want to continue populating [geo][region_code] in ECS mode, this change and a small change to the specs would suffice:

Suggested change
DEFAULT_ECS_CITY_FIELDS.remove(REGION_CODE);

Contributor


With ECS default on in 8.x, this is a breaking change. Users will no longer get the same `region_code` data, nor a drop-in replacement for it. I thought about keeping both `region_code` and `region_iso_code`, but in some sense the `region_code` data is also a kind of ISO code, so having both seems confusing. +1 to the breaking change.

Contributor Author


With ECS default on in 8.x, this is a breaking change

A user upgrading from Logstash 7 to Logstash 8 should be expecting to be consuming breaking changes, and the path to ECS provides appropriate guidance on how to keep a plugin in non-ECS mode (hint: set pipeline.ecs_compatibility: disabled for the pipeline to lock in 7.x behaviour in all its plugins).

It is a little breaking for users who already opt into ECS for the pipeline or an instance of this plugin, but it is also a bugfix since users who are already going out of their way to get ECS have a reasonable expectation that the result will be ECS (which [geo][region_code] is not).

DEFAULT_ECS_CITY_FIELDS.add(REGION_ISO_CODE);
}

static final EnumSet<Fields> DEFAULT_COUNTRY_FIELDS = EnumSet.of(Fields.IP, Fields.COUNTRY_CODE2,
Fields.IP, Fields.COUNTRY_NAME, Fields.CONTINENT_NAME);

12 changes: 9 additions & 3 deletions src/main/java/org/logstash/filters/geoip/GeoIPFilter.java
@@ -91,7 +91,7 @@ public GeoIPFilter(String sourceField, String targetField, List<String> fields,
} catch (IOException e) {
throw new IllegalArgumentException("The database provided was not found in the path", e);
}
this.desiredFields = createDesiredFields(fields);
this.desiredFields = createDesiredFields(fields, !ecsCompatibility.equals("disabled"));
}

public static boolean isDatabaseValid(String databasePath) {
@@ -107,7 +107,7 @@ public static boolean isDatabaseValid(String databasePath) {
return false;
}

private Set<Fields> createDesiredFields(List<String> fields) {
private Set<Fields> createDesiredFields(List<String> fields, final boolean ecsCompatibilityEnabled) {
Set<Fields> desiredFields = EnumSet.noneOf(Fields.class);
if (fields == null || fields.isEmpty()) {
switch (databaseReader.getMetadata().getDatabaseType()) {
@@ -118,7 +118,7 @@ private Set<Fields> createDesiredFields(List<String> fields) {
case CITY_EUROPE_DB_TYPE:
case CITY_NORTH_AMERICA_DB_TYPE:
case CITY_SOUTH_AMERICA_DB_TYPE:
desiredFields = Fields.DEFAULT_CITY_FIELDS;
desiredFields = ecsCompatibilityEnabled ? Fields.DEFAULT_ECS_CITY_FIELDS : Fields.DEFAULT_CITY_FIELDS;
break;
case COUNTRY_LITE_DB_TYPE:
case COUNTRY_DB_TYPE:
@@ -311,6 +311,12 @@ private Map<Fields,Object> retrieveCityGeoData(InetAddress ipAddress) throws Geo
geoData.put(Fields.REGION_CODE, subdivisionCode);
}
break;
case REGION_ISO_CODE:
// Composite field: only populated when both components are present.
String countryCodeForRegion = country.getIsoCode();
String regionCode2 = subdivision.getIsoCode();
if (countryCodeForRegion != null && regionCode2 != null) {
geoData.put(Fields.REGION_ISO_CODE, String.format("%s-%s", countryCodeForRegion, regionCode2));
}
break; // without this, execution falls through into the TIMEZONE case
case TIMEZONE:
String locationTimeZone = location.getTimeZone();
if (locationTimeZone != null) {