Evil Dataset Proof of Concept (updated proposal) #31

Closed · wants to merge 1 commit
78 changes: 78 additions & 0 deletions evil-dataset/README.md
@@ -0,0 +1,78 @@
# Evil Dataset

This directory is intended to house the "evil dataset": a collection of data and associated queries that test edge cases and behavior changes between versions.
> **Review comment (Member):** [Blocker] Just trying to understand your intent. Are you envisioning this housing all of our "expectations"?


## Datapoint Library
The datapoints are contained in `/datapoint-library`; each datapoint's directory name serves as the unique identifier for its test case.

Within `datapoint-library`, the structure should be as follows:

```
datapoint-library/
├─ example-datapoint/
│  ├─ README.md          # human-friendly description of the edge case or query involved
│  ├─ data.json          # bulk-API JSON-formatted document with the data to index
│  ├─ query.json         # query as an OpenSearch DSL query
│  ├─ expected.json      # the expected result from the query
│  ├─ expected.7.x.txt   # optional: the expected result from the query for a specific version
│  ├─ filter.jq          # optional: a jq filter that pulls out relevant portions of the query response to be compared
├─ second-example-datapoint/
│  ├─ README.md
│  ├─ bulk.json
│  ├─ query.???
│  ├─ expected.txt
...
```

> **Review comment (Member):** [blocker] Happy to iterate on the design as appropriate, but I'm curious to hear your thoughts on the number of "datapoints" you think we'll have in the medium and long term and whether we'll want to stick with this approach of separate directory structures for each of them? Or are we thinking of making a custom file format that encompasses all the data for a "datapoint"? Or maybe multiple "datapoints"?

## Usage

For the time being, these datapoints are invoked manually by the user.

The following is an example of how to use the provided files. It requires a running Elasticsearch/OpenSearch cluster; in this example, the cluster is running locally.
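If no cluster is handy, one way to bring up a throwaway single-node OpenSearch instance is with Docker. This is only a suggestion, not part of the proposal: the image tag below is an example, and releases from 2.12 onward require setting an initial admin password instead of relying on the default `admin:admin` credentials used in the curl commands here.

```
> docker run -d --name evil-dataset-cluster -p 9200:9200 -e "discovery.type=single-node" opensearchproject/opensearch:2.11.1
```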

```
> cd example-datapoint

> curl -XPOST 'https://localhost:9200/_bulk?pretty' -ku "admin:admin" -H "Content-Type: application/x-ndjson" --data-binary @data.json
{
  "took": 65,
  "errors": false,
  "items": [ ... ]
}

# The following command shows the full output from the query
> curl -XGET 'https://localhost:9200/_search?pretty' -ku "admin:admin" -H "Content-Type: application/x-ndjson" --data-binary @query.json
{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [ ... ]
  }
}

# For ease of comparison, a jq filter can be provided, which allows a one-line curl command to compare
# the actual output against the expected output. Any output from this command indicates a mismatch;
# silence means the result is as expected.
> curl -s -XGET 'https://localhost:9200/_search?pretty' -ku "admin:admin" -H "Content-Type: application/x-ndjson" --data-binary @query.json | jq -f filter.jq | diff - expected.json

# An unsuccessful comparison might look like the following:
> curl -s -XGET 'https://localhost:9200/_search?pretty' -ku "admin:admin" -H "Content-Type: application/x-ndjson" --data-binary @query.json | jq -f filter.jq | diff - expected.json
2c2
< "count": 4,
---
> "count": 2,
5,6d4
< "C",
< "B",
# Here the query returned 4 hits instead of the expected 2.
```

> **Review comment (Member):** [Blocker] Trying to understand your longer-term intent here and what the scripting implications will be. Do you see us using curl directly for the foreseeable future? Or are we going to use the client SDKs for whatever language the test script is written in?
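The manual steps above could eventually be driven by a script. Purely as an illustration of how the pieces fit together (and not part of this proposal), a runner might look something like the sketch below; it assumes a local cluster with `admin:admin` credentials and datapoints that follow the `data.json`/`query.json`/`filter.jq`/`expected.json` layout:

```
#!/usr/bin/env bash
# Hypothetical runner sketch -- illustrative only, not part of this PR.
set -u

ES="https://localhost:9200"
AUTH="admin:admin"

for dir in datapoint-library/*/; do
  dir="${dir%/}"
  name=$(basename "$dir")

  # Index the datapoint's documents; refresh=true makes them searchable immediately.
  curl -s -XPOST "$ES/_bulk?refresh=true" -ku "$AUTH" \
       -H "Content-Type: application/x-ndjson" --data-binary @"$dir/data.json" > /dev/null

  # Run the query, filter the response, and compare against the expectation.
  # NOTE: like the manual example, this searches across all indices, so a real runner
  # would also need to clean up between datapoints so results don't leak across cases.
  if curl -s -XGET "$ES/_search" -ku "$AUTH" \
          -H "Content-Type: application/x-ndjson" --data-binary @"$dir/query.json" \
       | jq -f "$dir/filter.jq" | diff - "$dir/expected.json" > /dev/null; then
    echo "PASS $name"
  else
    echo "FAIL $name"
  fi
done
```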
7 changes: 7 additions & 0 deletions evil-dataset/datapoint-library/trivial-example/README.md
@@ -0,0 +1,7 @@
# Trivial Example

This example doesn't demonstrate an edge case; it is intended as a simple example of loading data and querying it, serving as a proof of concept and a template for future development.

It loads three documents with different dates and then queries with a date range with an inclusive upper bound that should catch two of the three documents.

The jq filter pulls out the number of hits and the names of the hits; this ensures that we're getting the correct two documents.
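Since one raw query response is checked in as `results.json`, the filter can also be sanity-checked offline, without a running cluster; for example (whitespace may differ slightly from `expected.json` depending on jq formatting):

```
> jq -f filter.jq results.json
{
  "count": 2,
  "names": [
    "B",
    "C"
  ]
}
```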
6 changes: 6 additions & 0 deletions evil-dataset/datapoint-library/trivial-example/data.json
@@ -0,0 +1,6 @@
{ "create": { "_index": "date-range-test"} }
{ "created_at": "2022-12-03", "name": "A" }
{ "create": { "_index": "date-range-test"} }
{ "created_at": "2022-12-04", "name": "B" }
{ "create": { "_index": "date-range-test"} }
{ "created_at": "2022-12-05", "name": "C" }
7 changes: 7 additions & 0 deletions evil-dataset/datapoint-library/trivial-example/expected.json
@@ -0,0 +1,7 @@
{
  "count": 2,
  "names": [
    "B",
    "C"
  ]
}
1 change: 1 addition & 0 deletions evil-dataset/datapoint-library/trivial-example/filter.jq
@@ -0,0 +1 @@
. | {count: .hits.total.value, names: [.hits.hits[]._source.name]}
10 changes: 10 additions & 0 deletions evil-dataset/datapoint-library/trivial-example/query.json
@@ -0,0 +1,10 @@
{
  "query": {
    "range": {
      "created_at": {
        "gt": "2022-12-03",
        "lte": "2022-12-05"
      }
    }
  }
}
39 changes: 39 additions & 0 deletions evil-dataset/datapoint-library/trivial-example/results.json
@@ -0,0 +1,39 @@
{
  "took" : 32,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "date-range-test",
        "_type" : "_doc",
        "_id" : "sw6G5IQB2vy30Mw7fgct",
        "_score" : 1.0,
        "_source" : {
          "created_at" : "2022-12-04",
          "name" : "B"
        }
      },
      {
        "_index" : "date-range-test",
        "_type" : "_doc",
        "_id" : "tA6G5IQB2vy30Mw7fgct",
        "_score" : 1.0,
        "_source" : {
          "created_at" : "2022-12-05",
          "name" : "C"
        }
      }
    ]
  }
}