Implement GET API of ip2geo datasource #279
@heemin32 Why does this PR need to change the PUT action?
It doesn't. I thought it was a small change, so I mixed it into this PR. Let me split them up.
@jmazanec15 @junqiu-lei Could you take a look?
public List<Route> routes() {
    return List.of(
        new Route(GET, String.join(URL_DELIMITER, getPluginURLPrefix(), "ip2geo/datasource")),
        new Route(GET, String.join(URL_DELIMITER, getPluginURLPrefix(), "ip2geo/datasource/{name}"))
    );
}
- Will this name be unique?
- Do we support regex?

We don't support regex.
What do you mean by unique?

You are changing the GET parameter from id to name. I assume you are validating that the datasource name is unique?

The rename is only for labeling. The name is still used as the doc ID internally.
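
To make the exchange above concrete, here is a minimal, self-contained sketch of the two GET paths and of why a name that doubles as the document ID identifies at most one datasource. Everything except the `String.join` call shape is an assumption for illustration: `ASSUMED_PREFIX` stands in for whatever `getPluginURLPrefix()` returns, and the in-memory map is a hypothetical stand-in for the datasource index, not the plugin's storage code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DatasourceRouteSketch {
    static final String URL_DELIMITER = "/";
    // Assumption: stand-in for the value returned by getPluginURLPrefix().
    static final String ASSUMED_PREFIX = "/_plugins/geospatial";

    // Hypothetical stand-in for the datasource index: document ID (the name) -> endpoint.
    private final Map<String, String> docsById = new LinkedHashMap<>();

    /** Builds the two GET paths the same way the handler joins them. */
    public static List<String> routePaths() {
        return List.of(
            String.join(URL_DELIMITER, ASSUMED_PREFIX, "ip2geo/datasource"),
            String.join(URL_DELIMITER, ASSUMED_PREFIX, "ip2geo/datasource/{name}")
        );
    }

    /**
     * Stores a datasource under its name. Returns false when the ID is already taken;
     * how the real index resolves such a collision is not modeled here.
     */
    public boolean put(String name, String endpoint) {
        return docsById.putIfAbsent(name, endpoint) == null;
    }

    /** GET without a name: every stored datasource name. */
    public List<String> getAll() {
        return List.copyOf(docsById.keySet());
    }

    /** GET with a name: exact-match lookup, no regex or wildcard expansion. */
    public String getByName(String name) {
        return docsById.get(name);
    }

    public static void main(String[] args) {
        DatasourceRouteSketch store = new DatasourceRouteSketch();
        store.put("my-datasource", "https://example.com/manifest.json");
        System.out.println(routePaths());
        System.out.println(store.put("my-datasource", "https://example.com/other.json"));
        System.out.println(store.getAll());
        System.out.println(store.getByName("my-.*"));
    }
}
```

Because the lookup key is the document ID, two datasources with the same name cannot coexist in the map, and a pattern such as `my-.*` is treated as a literal name that matches nothing.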
@@ -142,7 +145,7 @@ public class Datasource implements ScheduledJobParameter {
         "datasource_metadata",
         true,
         args -> {
-            String id = (String) args[0];
+            String name = (String) args[0];

Shall we add an instance check and a length check?

What do you mean by "add instance"?
Also, the length check is done in PutDatasourceRequest.

You are casting args[0] to String without checking whether it is a String. It is safer to check the array length wherever you unpack it.

The value types and the array length are already defined in the code below. No need to validate again.
static {
    PARSER.declareString(ConstructingObjectParser.constructorArg(), NAME_FIELD);
    PARSER.declareLong(ConstructingObjectParser.constructorArg(), LAST_UPDATE_TIME_FIELD);
    PARSER.declareLong(ConstructingObjectParser.optionalConstructorArg(), ENABLED_TIME_FIELD);
    PARSER.declareBoolean(ConstructingObjectParser.constructorArg(), ENABLED_FIELD);
    PARSER.declareObject(ConstructingObjectParser.constructorArg(), (p, c) -> ScheduleParser.parse(p), SCHEDULE_FIELD);
    PARSER.declareString(ConstructingObjectParser.constructorArg(), ENDPOINT_FIELD);
    PARSER.declareString(ConstructingObjectParser.constructorArg(), STATE_FIELD);
    PARSER.declareStringArray(ConstructingObjectParser.constructorArg(), INDICES_FIELD);
    PARSER.declareObject(ConstructingObjectParser.constructorArg(), Database.PARSER, DATABASE_FIELD);
    PARSER.declareObject(ConstructingObjectParser.constructorArg(), UpdateStats.PARSER, UPDATE_STATS_FIELD);
}

What is the data type of args here?

An array of Object, I believe.
This is not a new pattern that I invented. This is how we parse XContent in OpenSearch generally.

I don't want to block you from merging the PR. My point is that whenever we cast, it is safer to do an instance check first.

In this case, we know the instance type because the instance does not come from somewhere else; we provide it in our own code right below.
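
The author's argument can be illustrated with a small, self-contained sketch. `MiniParser` below is a hypothetical miniature written for this example and is not the OpenSearch `ConstructingObjectParser` API; it only imitates the idea discussed in the thread: each field is declared with a type and a position, the parser enforces both before the builder runs, and so the casts inside the builder lambda cannot see an unexpected type or a short array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ConstructorArgsSketch {

    /** Hypothetical miniature parser: declared fields fix the type and position of each argument. */
    public static final class MiniParser<T> {
        private final List<String> names = new ArrayList<>();
        private final List<Class<?>> types = new ArrayList<>();
        private final Function<Object[], T> builder;

        public MiniParser(Function<Object[], T> builder) {
            this.builder = builder;
        }

        public MiniParser<T> declare(String name, Class<?> type) {
            names.add(name);
            types.add(type);
            return this;
        }

        public T parse(Map<String, Object> source) {
            // The array length always equals the number of declared fields.
            Object[] args = new Object[names.size()];
            for (int i = 0; i < args.length; i++) {
                Object value = source.get(names.get(i));
                // The type is enforced here, before the builder ever sees the value.
                if (!types.get(i).isInstance(value)) {
                    throw new IllegalArgumentException(
                        "[" + names.get(i) + "] must be a " + types.get(i).getSimpleName());
                }
                args[i] = value;
            }
            return builder.apply(args);
        }
    }

    /** Simplified stand-in for the real Datasource class. */
    public static final class Datasource {
        public final String name;
        public final long lastUpdateTime;

        Datasource(String name, long lastUpdateTime) {
            this.name = name;
            this.lastUpdateTime = lastUpdateTime;
        }
    }

    public static final MiniParser<Datasource> PARSER = new MiniParser<Datasource>(args -> {
        // Safe casts: position 0 was declared as String, position 1 as Long.
        String name = (String) args[0];
        long lastUpdateTime = (Long) args[1];
        return new Datasource(name, lastUpdateTime);
    }).declare("name", String.class).declare("last_update_time", Long.class);

    public static void main(String[] args) {
        Datasource ok = PARSER.parse(Map.of("name", "my-datasource", "last_update_time", 1L));
        System.out.println(ok.name + " " + ok.lastUpdateTime);
        try {
            PARSER.parse(Map.of("name", 42, "last_update_time", 1L));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch a wrongly typed value is rejected inside `parse`, so an extra `instanceof` guard in the lambda would be unreachable. The reviewer's advice still applies wherever the array comes from code you do not control.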
/**
 * Convert long to instant
 *
 * This method is static so that it can be used in child class
 *
 * @param epochMilli the epoch milliseconds
 * @return the instant
 */

nit: I don't think you need this comment, since the behavior is obvious from the implementation and the method name, and it is a private method.

Can you use a private method in a child class? Do you mean in an inner class?

You can use a private method in an inner class.
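
The distinction raised here can be shown in a few lines. The sketch below uses hypothetical class names (`PrivateStaticSketch`, `Inner`) and assumes the helper simply wraps `Instant.ofEpochMilli`; it is not the plugin's code. A private static method is reachable from a class nested inside the declaring class, but a subclass declared elsewhere would not see it, because private members are not inherited.

```java
import java.time.Instant;

public class PrivateStaticSketch {

    // private: accessible within this top-level class, including its nested classes.
    private static Instant toInstant(long epochMilli) {
        return Instant.ofEpochMilli(epochMilli);
    }

    /** Nested class: it may call the enclosing class's private static method directly. */
    public static final class Inner {
        private final long epochMilli;

        public Inner(long epochMilli) {
            this.epochMilli = epochMilli;
        }

        public Instant asInstant() {
            return toInstant(epochMilli);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Inner(0L).asInstant()); // prints 1970-01-01T00:00:00Z
    }
}
```

A class in another file that extends `PrivateStaticSketch` and calls `toInstant(...)` would fail to compile, which is why "inner class" is the accurate wording for the Javadoc.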
Signed-off-by: Heemin Kim <heemin@amazon.com>
* Update gradle version to 7.6 (#265)
  Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
* Exclude lombok generated code from jacoco coverage report (#268)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Make jacoco report to be generated faster in local (#267)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Update dependency org.json:json to v20230227 (#273)
  Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>
* Baseline owners and maintainers (#275)
  Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
* Add Auto Release Workflow (#288)
  Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
* Change package for Strings.hasText (#314)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Adding release notes for 2.8 (#323)
  Signed-off-by: Martin Gaievski <gaievski@amazon.com>
* Add 2.9.0 release notes (#350)
  Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Update packages according to a change in OpenSearch core (#353)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Implement creation of ip2geo feature (#257)
  * Update gradle version to 7.6 (#265)
    Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
  * Implement creation of ip2geo feature
    * Implementation of ip2geo datasource creation
    * Implementation of ip2geo processor creation
    Signed-off-by: Heemin Kim <heemin@amazon.com>
  ---------
  Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
  Signed-off-by: Heemin Kim <heemin@amazon.com>
  Co-authored-by: Vijayan Balasubramanian <balasvij@amazon.com>
* Added unit tests with some refactoring of codes (#271)
  * Add Unit tests
  * Set cache true for search query
  * Remove in memory cache implementation (Two way door decision)
    * Relying on search cache without custom cache
  * Renamed datasource state from FAILED to CREATE_FAILED
  * Renamed class name from *Helper to *Facade
  * Changed updateIntervalInDays to updateInterval
  * Changed value type of default update_interval from TimeValue to Long
  * Read setting value from cluster settings directly
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Sync from main (#280)
  * Update gradle version to 7.6 (#265)
    Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
  * Exclude lombok generated code from jacoco coverage report (#268)
    Signed-off-by: Heemin Kim <heemin@amazon.com>
  * Make jacoco report to be generated faster in local (#267)
    Signed-off-by: Heemin Kim <heemin@amazon.com>
  * Update dependency org.json:json to v20230227 (#273)
    Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>
  * Baseline owners and maintainers (#275)
    Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
  ---------
  Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
  Signed-off-by: Heemin Kim <heemin@amazon.com>
  Co-authored-by: Vijayan Balasubramanian <balasvij@amazon.com>
  Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>
* Add datasource name validation (#281)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Refactoring of code (#282)
  1. Change variable name from datasourceName to name
  2. Change variable name from id to name
  3. Added helper methods in test code
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Change field name from md5 to sha256 (#285)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Implement get datasource api (#279)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Update index option (#284)
  1. Make geodata index as hidden
  2. Make geodata index as read only allow delete after creation is done
  3. Refresh datasource index immediately after update
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Make some fields in manifest file as mandatory (#289)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Create datasource index explicitly (#283)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Add wrapper class of job scheduler lock service (#290)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Remove all unused client attributes (#293)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Update copyright header (#298)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Run system index handling code with stashed thread context (#297)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Reduce lock duration and renew the lock during update (#299)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Implements delete datasource API (#291)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Set User-Agent in http request (#300)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Implement datasource update API (#292)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Refactoring test code (#302)
  Make buildGeoJSONFeatureProcessorConfig method to be more general
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Add ip2geo processor integ test for failure case (#303)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Bug fix and refactoring of code (#305)
  1. Bugfix: Ingest metadata can be null if there is no processor created
  2. Refactoring: Moved private method to another class for better testing support
  3. Refactoring: Set some private static final variable as public so that unit test can use it
  4. Refactoring: Changed string value to static variable
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Add integration test for Ip2GeoProcessor (#306)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Add ConcurrentModificationException (#308)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Add integration test for UpdateDatasource API (#307)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Bug fix on lock management and few performance improvements (#310)
  * Release lock before response back to caller for update/delete API
  * Release lock in background task for creation API
  * Change index settings to improve indexing performance
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Change index setting from read_only_allow_delete to write (#311)
  read_only_allow_delete does not block write to an index. The disk-based shard allocator may add and remove this block automatically. Therefore, use index.blocks.write instead.
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Fix bug in get datasource API and improve memory usage (#313)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Change package for Strings.hasText (#314) (#317)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Remove jitter and move index setting from DatasourceFacade to DatasourceExtension (#319)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Do not index blank value and do not enrich null property (#320)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Move index setting keys to constants (#321)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Return null index name for expired data (#322)
  Return null index name for expired data so that it can be deleted by the clean up process. The clean up process excludes the current index from deletion.
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Add new fields in datasource (#325)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Delete index once it is expired (#326)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Add restoring event listener (#328)
  In the listener, we trigger a geoip data update
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Reverse forcemerge and refresh order (#331)
  Otherwise, opensearch does not clear old segment files
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Removed parameter and settings (#332)
  * Removed first_only parameter
  * Removed max_concurrency and batch_size setting
  The first_only parameter was added because the current geoip processor has it. However, the parameter has no benefit for the ip2geo processor, as we don't do a sequential search for array data but use multi search. The max_concurrency and batch_size settings are removed because they only reveal internal implementation and could block future performance improvements.
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Add a field in datasource for current index name (#333)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Delete GeoIP data indices after restoring complete (#334)
  We don't want to use restored GeoIP data indices. Therefore we delete the indices once the restoring process completes. When the GeoIP metadata index is restored, we create a new GeoIP data index instead.
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Use bool query for array form of IPs (#335)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Run update/delete request in a new thread (#337)
  This is not to block transport thread
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Remove IP2Geo processor validation (#336)
  Cannot query index to get data to validate IP2Geo processor. Will add validation when we decide to store some of the data in cluster state metadata.
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Acquire lock synchronously (#339)
  When the lock is acquired asynchronously, the remaining part of the code is run by a transport thread, which does not allow blocking code. We want only a single update to happen in a node, using a single thread. However, that cannot be achieved if I acquire the lock asynchronously and pass the listener.
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Added a cache to store datasource metadata (#338)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Changed class name and package (#341)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Refactoring of code (#342)
  1. Changed class name from Ip2GeoCache to Ip2GeoCachedDao
  2. Moved the Ip2GeoCachedDao from cache to dao package
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Add geo data cache (#340)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Add cache layer to reduce GeoIp data retrieval latency (#343)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Use _primary in query preference and few changes (#347)
  1. Use _primary preference to get datasource metadata so that it can read the latest data. RefreshPolicy.IMMEDIATE won't refresh replica shards immediately according to #346
  2. Update datasource metadata index mapping
  3. Move batch size from static value to setting
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Wait until GeoIP data to be replicated to all data nodes (#348)
  Signed-off-by: Heemin Kim <heemin@amazon.com>
* Update packages according to a change in OpenSearch core (#354)
  * Update packages according to a change in OpenSearch core
    Signed-off-by: Heemin Kim <heemin@amazon.com>
  * Update packages according to a change in OpenSearch core (#353)
    Signed-off-by: Heemin Kim <heemin@amazon.com>
  ---------
  Signed-off-by: Heemin Kim <heemin@amazon.com>
---------
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
Signed-off-by: Heemin Kim <heemin@amazon.com>
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Junqiu Lei <junqiu@amazon.com>
Co-authored-by: Vijayan Balasubramanian <balasvij@amazon.com>
Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>
Co-authored-by: Naveen Tatikonda <navtat@amazon.com>
Co-authored-by: Martin Gaievski <gaievski@amazon.com>
Co-authored-by: Junqiu Lei <junqiu@amazon.com>
Signed-off-by: Heemin Kim <heemin@amazon.com> * Acquire lock sychronously (#339) By acquiring lock asychronously, the remaining part of the code is being run by transport thread which does not allow blocking code. We want only single update happen in a node using single thread. However, it cannot be acheived if I acquire lock asynchronously and pass the listener. Signed-off-by: Heemin Kim <heemin@amazon.com> * Added a cache to store datasource metadata (#338) Signed-off-by: Heemin Kim <heemin@amazon.com> * Changed class name and package (#341) Signed-off-by: Heemin Kim <heemin@amazon.com> * Refactoring of code (#342) 1. Changed class name from Ip2GeoCache to Ip2GeoCachedDao 2. Moved the Ip2GeoCachedDao from cache to dao package Signed-off-by: Heemin Kim <heemin@amazon.com> * Add geo data cache (#340) Signed-off-by: Heemin Kim <heemin@amazon.com> * Add cache layer to reduce GeoIp data retrieval latency (#343) Signed-off-by: Heemin Kim <heemin@amazon.com> * Use _primary in query preference and few changes (#347) 1. Use _primary preference to get datasource metadata so that it can read the latest data. RefreshPolicy.IMMEDIATE won't refresh replica shards immediately according to #346 2. Update datasource metadata index mapping 3. 
Move batch size from static value to setting Signed-off-by: Heemin Kim <heemin@amazon.com> * Wait until GeoIP data to be replicated to all data nodes (#348) Signed-off-by: Heemin Kim <heemin@amazon.com> * Update packages according to a change in OpenSearch core (#354) * Update packages according to a change in OpenSearch core Signed-off-by: Heemin Kim <heemin@amazon.com> * Update packages according to a change in OpenSearch core (#353) Signed-off-by: Heemin Kim <heemin@amazon.com> --------- Signed-off-by: Heemin Kim <heemin@amazon.com> --------- Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com> Signed-off-by: Heemin Kim <heemin@amazon.com> Co-authored-by: Vijayan Balasubramanian <balasvij@amazon.com> Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com> (cherry picked from commit 0cd9153)
Description
Implement GET API of ip2geo datasource
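For reviewers who want to try the API, a request could be built roughly as in the sketch below. The base path `/_plugins/geospatial/ip2geo/datasource`, the comma-separated name list, and the convention that omitting names lists every datasource are assumptions that this description does not state; the class and method names are hypothetical and are not part of this PR.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class GetDatasourceExample {
    // Assumed REST base path for ip2geo datasources; not stated in this PR description.
    static final String BASE_PATH = "/_plugins/geospatial/ip2geo/datasource";

    // Builds (but does not send) a GET request for the given datasource names.
    // Assumption: no names means "list all datasources"; several names are comma-separated.
    static HttpRequest buildGetRequest(String host, String... names) {
        String suffix = names.length == 0 ? "" : "/" + String.join(",", names);
        return HttpRequest.newBuilder()
            .uri(URI.create(host + BASE_PATH + suffix))
            .header("Accept", "application/json")
            .GET()
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildGetRequest("http://localhost:9200", "my-datasource");
        // Print the method and target so the request shape can be checked without a cluster.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The request is only constructed here; sending it with `java.net.http.HttpClient` against a cluster that has the plugin installed is left to the reader.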
Issues Resolved
N/A
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.