Evil Dataset Proof of Concept (updated proposal) #31
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
# Evil Dataset | ||
|
||
This directory is intended to house the "evil dataset"--a collection of data & associated queries that test edge cases and behavior changes between versions. | ||
|
||
## Datapoint Library | ||
The datapoints are contained in `/datapoint-library`. Each datapoint has its own directory, and the directory name is the unique identifier for that test case. | ||
|
||
Within the datapoint-library directory, the directory structure should be as follows: | ||
|
||
``` | ||
datapoint-library/ | ||
├─ example-datapoint/ | ||
[blocker] Happy to iterate on the design as appropriate, but I'm curious to hear your thoughts on the number of "datapoints" you think we'll have in the medium and long term and whether we'll want to stick with this approach of separate directory structures for each of them? Or are we thinking of making a custom file format that encompasses all the data for a "datapoint"? Or maybe multiple "datapoints"? |
||
│ ├─ README.md # human-friendly description of the edge case or query involved | ||
│ ├─ data.json # Bulk API (newline-delimited JSON) document with the data to index | ||
│ ├─ query.json # Query as an OpenSearch DSL query | ||
│ ├─ expected.json # The expected result from the query | ||
│ ├─ expected.7.x.txt # Optional: the expected result from the query for a specific version | ||
│ ├─ filter.jq # Optional: a jq filter that pulls out relevant portions of the query response to be compared | ||
├─ second-example-datapoint/ | ||
│ ├─ README.md | ||
│ ├─ data.json | ||
│ ├─ query.??? | ||
│ ├─ expected.json | ||
│ | ||
... | ||
``` | ||
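As a rough illustration of how this layout could be checked mechanically, here is a minimal sketch. It assumes the four non-optional file names from the first example are required; the function names `missing_files` and `check_library` are hypothetical and not part of this proposal.

```python
from pathlib import Path

# Assumed from the example layout above; the proposal may still change these names.
REQUIRED_FILES = ("README.md", "data.json", "query.json", "expected.json")


def missing_files(datapoint_dir: Path) -> list[str]:
    """Return the required file names that are absent from one datapoint directory."""
    return [name for name in REQUIRED_FILES if not (datapoint_dir / name).is_file()]


def check_library(library_dir: Path) -> dict[str, list[str]]:
    """Map each incomplete datapoint (keyed by directory name) to its missing files."""
    problems = {}
    for entry in sorted(library_dir.iterdir()):
        if entry.is_dir():
            missing = missing_files(entry)
            if missing:
                problems[entry.name] = missing
    return problems
```

Calling `check_library(Path("datapoint-library"))` would return an empty dict when every datapoint directory is complete. Optional files such as `filter.jq` are deliberately not checked.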
|
||
## Usage | ||
|
||
For the time being, these datapoints are invoked manually by the user. | ||
|
||
The following example shows how to use the provided files. It requires a running Elasticsearch/OpenSearch (ES/OS) cluster; in this example, the cluster runs locally. | ||
|
||
``` | ||
> cd example-datapoint | ||
|
||
> curl -XPOST 'https://localhost:9200/_bulk?pretty' -ku "admin:admin" -H "Content-Type: application/x-ndjson" --data-binary @data.json | ||
{ | ||
"took": 65, | ||
"errors": false, | ||
"items": [ ... ] | ||
} | ||
|
||
# The following command shows the full output from the query | ||
> curl -XGET 'https://localhost:9200/_search?pretty' -ku "admin:admin" -H "Content-Type: application/json" --data-binary @query.json | ||
{ | ||
"took" : 14, | ||
"timed_out" : false, | ||
"_shards" : { | ||
"total" : 6, | ||
"successful" : 6, | ||
"skipped" : 0, | ||
"failed" : 0 | ||
}, | ||
"hits" : { | ||
"total" : { | ||
"value" : 2, | ||
"relation" : "eq" | ||
}, | ||
"max_score" : 1.0, | ||
"hits" : [ ... ] | ||
} | ||
} | ||
|
||
# For ease of comparison, a jq filter can be provided, which allows a one-line command to compare the actual and expected output. Any output from this command indicates a mismatch; no output means the query result is as expected. | ||
> curl -s -XGET 'https://localhost:9200/_search?pretty' -ku "admin:admin" -H "Content-Type: application/json" --data-binary @query.json | jq -f filter.jq | diff - expected.json | ||
|
||
# An unsuccessful comparison might look like the following: | ||
> curl -s -XGET 'https://localhost:9200/_search?pretty' -ku "admin:admin" -H "Content-Type: application/json" --data-binary @query.json | jq -f filter.jq | diff - expected.json | ||
[Blocker] Trying to understand your longer-term intent here and what the scripting implications will be. Do you see us using curl directly for the foreseeable future? Or are we going to use the client SDKs for whatever language the test script is written in? |
||
2c2 | ||
< "count": 4, | ||
--- | ||
> "count": 2, | ||
5,6d4 | ||
< "C", | ||
< "B", | ||
# Here the query returned 4 hits instead of the expected 2. | ||
``` |
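If the comparison step ever moves out of the shell, one option is sketched below. It is not the method proposed in this PR, only an illustration under stated assumptions: the inputs are already-filtered results (what `jq -f filter.jq` would emit), the helper names `canonical` and `compare` are hypothetical, and the sample values are invented for illustration.

```python
import difflib
import json


def canonical(obj) -> list[str]:
    """Serialize to indented JSON lines with sorted keys so diffs are stable."""
    return json.dumps(obj, indent=2, sort_keys=True).splitlines()


def compare(actual, expected) -> list[str]:
    """Return unified-diff lines; an empty list means the datapoint passed."""
    return list(
        difflib.unified_diff(
            canonical(expected),
            canonical(actual),
            fromfile="expected.json",
            tofile="actual",
            lineterm="",
        )
    )


# Hypothetical filtered results, for illustration only.
expected = {"count": 2, "names": ["B", "C"]}
actual = {"count": 4, "names": ["A", "B", "C", "D"]}

for line in compare(actual, expected):
    print(line)
```

As with the `diff` pipeline, no output means a match. One difference worth noting: sorting keys makes the comparison insensitive to object key order, while array order (for example, the order of hits) still matters.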
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Trivial Example | ||
|
||
This example doesn't demonstrate an edge case. It is a simple example of loading data and querying it, intended as a proof of concept and a template for future datapoints. | ||
|
||
It loads three documents with different dates and then runs a date range query with an exclusive lower bound and an inclusive upper bound, which should match two of the three documents. | ||
|
||
The jq filter extracts the number of hits and the name of each hit, which confirms that the query returns the correct two documents. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
{ "create": { "_index": "date-range-test"} } | ||
{ "created_at": "2022-12-03", "name": "A" } | ||
{ "create": { "_index": "date-range-test"} } | ||
{ "created_at": "2022-12-04", "name": "B" } | ||
{ "create": { "_index": "date-range-test"} } | ||
{ "created_at": "2022-12-05", "name": "C" } |
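The file above follows the Bulk API's newline-delimited JSON shape: each action line is followed by the document it applies to, and the final line must end with a newline (which `--data-binary` preserves). The sketch below pairs those lines up to make the structure explicit. The `parse_bulk` helper is hypothetical, and it only handles actions that carry a source document, such as `create`.

```python
import json

# The bulk body from the file above, reproduced as a string.
BULK_BODY = """\
{ "create": { "_index": "date-range-test"} }
{ "created_at": "2022-12-03", "name": "A" }
{ "create": { "_index": "date-range-test"} }
{ "created_at": "2022-12-04", "name": "B" }
{ "create": { "_index": "date-range-test"} }
{ "created_at": "2022-12-05", "name": "C" }
"""


def parse_bulk(body: str) -> list[tuple[dict, dict]]:
    """Pair each action line with the document line that follows it.

    Delete actions, which have no document line, are out of scope here.
    """
    lines = [json.loads(line) for line in body.splitlines() if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of NDJSON lines")
    return list(zip(lines[0::2], lines[1::2]))


pairs = parse_bulk(BULK_BODY)
names = [doc["name"] for _action, doc in pairs]
print(names)
```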
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
{ | ||
"count": 2, | ||
"names": [ | ||
"B", | ||
"C" | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
. | {count: .hits.total.value, names: [.hits.hits[]._source.name]} |
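For readers less familiar with jq, the filter above can be read as the following Python, offered as an equivalent sketch rather than a proposed replacement. The `apply_filter` name is hypothetical, and the response is trimmed from the sample response included in this PR.

```python
def apply_filter(response: dict) -> dict:
    """Mirror the jq filter: keep the total hit count and each hit's name, in order."""
    hits = response["hits"]
    return {
        "count": hits["total"]["value"],
        "names": [hit["_source"]["name"] for hit in hits["hits"]],
    }


# Trimmed from the sample response in this PR; unrelated fields are omitted.
response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_source": {"created_at": "2022-12-04", "name": "B"}},
            {"_source": {"created_at": "2022-12-05", "name": "C"}},
        ],
    }
}

print(apply_filter(response))
```

Fields that vary between runs or versions, such as `took`, `_id`, and `_shards`, are dropped, which is what makes the result comparable against a fixed expected file.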
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
{ | ||
"query": { | ||
"range": { | ||
"created_at": { | ||
"gt": "2022-12-03", | ||
"lte": "2022-12-05" | ||
} | ||
} | ||
} | ||
} |
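To make the expected outcome of this query easy to verify by hand, the sketch below models the `gt`/`lte` bounds against the three documents from the data file. It is a simplified model that assumes whole-day ISO dates; it does not reproduce how the cluster actually parses, rounds, or applies time zones to date values. The `in_range` helper is hypothetical.

```python
from datetime import date

# The three documents from the data file and the bounds from the query above.
DOCS = [
    {"created_at": "2022-12-03", "name": "A"},
    {"created_at": "2022-12-04", "name": "B"},
    {"created_at": "2022-12-05", "name": "C"},
]
BOUNDS = {"gt": "2022-12-03", "lte": "2022-12-05"}


def in_range(value: str, bounds: dict) -> bool:
    """Check a whole-day ISO date against gt/gte/lt/lte bounds."""
    d = date.fromisoformat(value)
    checks = {
        "gt": lambda bound: d > bound,
        "gte": lambda bound: d >= bound,
        "lt": lambda bound: d < bound,
        "lte": lambda bound: d <= bound,
    }
    return all(checks[op](date.fromisoformat(raw)) for op, raw in bounds.items())


matches = [doc["name"] for doc in DOCS if in_range(doc["created_at"], BOUNDS)]
print(matches)  # ['B', 'C']
```

Document A is excluded because the lower bound is exclusive, and C is included because the upper bound is inclusive.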
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
{ | ||
"took" : 32, | ||
"timed_out" : false, | ||
"_shards" : { | ||
"total" : 6, | ||
"successful" : 6, | ||
"skipped" : 0, | ||
"failed" : 0 | ||
}, | ||
"hits" : { | ||
"total" : { | ||
"value" : 2, | ||
"relation" : "eq" | ||
}, | ||
"max_score" : 1.0, | ||
"hits" : [ | ||
{ | ||
"_index" : "date-range-test", | ||
"_type" : "_doc", | ||
"_id" : "sw6G5IQB2vy30Mw7fgct", | ||
"_score" : 1.0, | ||
"_source" : { | ||
"created_at" : "2022-12-04", | ||
"name" : "B" | ||
} | ||
}, | ||
{ | ||
"_index" : "date-range-test", | ||
"_type" : "_doc", | ||
"_id" : "tA6G5IQB2vy30Mw7fgct", | ||
"_score" : 1.0, | ||
"_source" : { | ||
"created_at" : "2022-12-05", | ||
"name" : "C" | ||
} | ||
} | ||
] | ||
} | ||
} |
[Blocker] Just trying to understand your intent. Are you envisioning this housing all of our "expectations"?