Evil Dataset Proof of Concept (updated proposal) #31

Closed · wants to merge 1 commit
78 changes: 78 additions & 0 deletions evil-dataset/README.md
@@ -0,0 +1,78 @@
# Evil Dataset

This directory is intended to house the "evil dataset": a collection of data and associated queries that test edge cases and behavior changes between versions.
> **Review comment (Member):** [Blocker] Just trying to understand your intent. Are you envisioning this housing all of our "expectations"?


## Datapoint Library
The datapoints are contained in `/datapoint-library`; each datapoint's directory name serves as the unique identifier for its test case.

Within `datapoint-library`, the structure should be as follows:

```
datapoint-library/
├─ example-datapoint/
│  ├─ README.md          # human-friendly description of the edge case or query involved
│  ├─ data.json          # bulk-API JSON-formatted document with the data to index
│  ├─ query.json         # query as an OpenSearch DSL query
│  ├─ expected.json      # the expected result from the query
│  ├─ expected.7.x.txt   # optional: the expected result from the query for a specific version
│  ├─ filter.jq          # optional: a jq filter that pulls out relevant portions of the query response to be compared
├─ second-example-datapoint/
│  ├─ README.md
│  ├─ bulk.json
│  ├─ query.???
│  ├─ expected.txt
...
```

> **Review comment (Member):** [blocker] Happy to iterate on the design as appropriate, but I'm curious to hear your thoughts on the number of "datapoints" you think we'll have in the medium and long term and whether we'll want to stick with this approach of separate directory structures for each of them? Or are we thinking of making a custom file format that encompasses all the data for a "datapoint"? Or maybe multiple "datapoints"?

## Usage

For the time being, these datapoints are invoked manually by the user.

The following is an example of how to use the provided files. It requires a running Elasticsearch/OpenSearch cluster; in this example, the cluster is running locally.
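If no cluster is handy, one way to bring up a throwaway single-node OpenSearch instance is with Docker. This is only a suggestion, not part of the proposal: the image tag below is an example, and releases from 2.12 onward require setting an initial admin password instead of relying on the default `admin:admin` credentials used in the curl commands here.

```
> docker run -d --name evil-dataset-cluster -p 9200:9200 -e "discovery.type=single-node" opensearchproject/opensearch:2.11.1
```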

```
> cd example-datapoint

> curl -XPOST 'https://localhost:9200/_bulk?pretty' -ku "admin:admin" -H "Content-Type: application/x-ndjson" --data-binary @data.json
{
  "took": 65,
  "errors": false,
  "items": [ ... ]
}

# The following command shows the full output from the query
> curl -XGET 'https://localhost:9200/_search?pretty' -ku "admin:admin" -H "Content-Type: application/x-ndjson" --data-binary @query.json
{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [ ... ]
  }
}

# For ease of comparison, a jq filter can be provided, which allows a one-line curl command to compare
# the actual output against the expected output. Any output from this command indicates a mismatch;
# silence means the result is as expected.
> curl -s -XGET 'https://localhost:9200/_search?pretty' -ku "admin:admin" -H "Content-Type: application/x-ndjson" --data-binary @query.json | jq -f filter.jq | diff - expected.json

# An unsuccessful comparison might look like the following:
> curl -s -XGET 'https://localhost:9200/_search?pretty' -ku "admin:admin" -H "Content-Type: application/x-ndjson" --data-binary @query.json | jq -f filter.jq | diff - expected.json
2c2
< "count": 4,
---
> "count": 2,
5,6d4
< "C",
< "B",
# Here the query returned 4 hits instead of the expected 2.
```

> **Review comment (Member):** [Blocker] Trying to understand your longer-term intent here and what the scripting implications will be. Do you see us using curl directly for the foreseeable future? Or are we going to use the client SDKs for whatever language the test script is written in?
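The manual steps above could eventually be driven by a script. Purely as an illustration of how the pieces fit together (and not part of this proposal), a runner might look something like the sketch below; it assumes a local cluster with `admin:admin` credentials and datapoints that follow the `data.json`/`query.json`/`filter.jq`/`expected.json` layout:

```
#!/usr/bin/env bash
# Hypothetical runner sketch -- illustrative only, not part of this PR.
set -u

ES="https://localhost:9200"
AUTH="admin:admin"

for dir in datapoint-library/*/; do
  dir="${dir%/}"
  name=$(basename "$dir")

  # Index the datapoint's documents; refresh=true makes them searchable immediately.
  curl -s -XPOST "$ES/_bulk?refresh=true" -ku "$AUTH" \
       -H "Content-Type: application/x-ndjson" --data-binary @"$dir/data.json" > /dev/null

  # Run the query, filter the response, and compare against the expectation.
  # NOTE: like the manual example, this searches across all indices, so a real runner
  # would also need to clean up between datapoints so results don't leak across cases.
  if curl -s -XGET "$ES/_search" -ku "$AUTH" \
          -H "Content-Type: application/x-ndjson" --data-binary @"$dir/query.json" \
       | jq -f "$dir/filter.jq" | diff - "$dir/expected.json" > /dev/null; then
    echo "PASS $name"
  else
    echo "FAIL $name"
  fi
done
```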
7 changes: 7 additions & 0 deletions evil-dataset/datapoint-library/trivial-example/README.md
@@ -0,0 +1,7 @@
# Trivial Example

This example doesn't demonstrate an edge case; it is intended as a simple example of loading data and querying it, serving as a proof of concept and a template for future development.

It loads three documents with different dates and then queries with a date range with an inclusive upper bound that should catch two of the three documents.

The jq filter pulls out the number of hits and the names of the hits; this ensures that we're getting the correct two documents.
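Since one raw query response is checked in as `results.json`, the filter can also be sanity-checked offline, without a running cluster; for example (whitespace may differ slightly from `expected.json` depending on jq formatting):

```
> jq -f filter.jq results.json
{
  "count": 2,
  "names": [
    "B",
    "C"
  ]
}
```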
6 changes: 6 additions & 0 deletions evil-dataset/datapoint-library/trivial-example/data.json
@@ -0,0 +1,6 @@
{ "create": { "_index": "date-range-test"} }
{ "created_at": "2022-12-03", "name": "A" }
{ "create": { "_index": "date-range-test"} }
{ "created_at": "2022-12-04", "name": "B" }
{ "create": { "_index": "date-range-test"} }
{ "created_at": "2022-12-05", "name": "C" }
7 changes: 7 additions & 0 deletions evil-dataset/datapoint-library/trivial-example/expected.json
@@ -0,0 +1,7 @@
{
  "count": 2,
  "names": [
    "B",
    "C"
  ]
}
1 change: 1 addition & 0 deletions evil-dataset/datapoint-library/trivial-example/filter.jq
@@ -0,0 +1 @@
. | {count: .hits.total.value, names: [.hits.hits[]._source.name]}
10 changes: 10 additions & 0 deletions evil-dataset/datapoint-library/trivial-example/query.json
@@ -0,0 +1,10 @@
{
  "query": {
    "range": {
      "created_at": {
        "gt": "2022-12-03",
        "lte": "2022-12-05"
      }
    }
  }
}
39 changes: 39 additions & 0 deletions evil-dataset/datapoint-library/trivial-example/results.json
@@ -0,0 +1,39 @@
{
  "took" : 32,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "date-range-test",
        "_type" : "_doc",
        "_id" : "sw6G5IQB2vy30Mw7fgct",
        "_score" : 1.0,
        "_source" : {
          "created_at" : "2022-12-04",
          "name" : "B"
        }
      },
      {
        "_index" : "date-range-test",
        "_type" : "_doc",
        "_id" : "tA6G5IQB2vy30Mw7fgct",
        "_score" : 1.0,
        "_source" : {
          "created_at" : "2022-12-05",
          "name" : "C"
        }
      }
    ]
  }
}