This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Backporting latest master commit to opendistro 1.4.0 branch #460

Merged
28 commits
d0c719c
Update the opendistro sql 1.4.0 release notes (#359)
penghuo Jan 28, 2020
f1ff95f
adding DATETIME cast support (#310)
davidcui1225 Jan 30, 2020
a927c13
Documentation for simple query (#366)
dai-chen Feb 17, 2020
f989909
Return Correct Type Information for Fields (#365)
davidcui1225 Feb 21, 2020
a9ef08d
Report date data as a standardized format (#367)
jordanw-bq Feb 28, 2020
5ea731a
Integration test with external ES cluster (#374)
abbashus Mar 5, 2020
71d2adb
Bug fix, return object type for field which has implicit object datat…
penghuo Mar 12, 2020
0bf0b3c
Pagination doc (#379)
abbashus Mar 13, 2020
5f620cd
Handle the elasticsearch exceptions in JDBC formatted outputs (#362)
chloe-zh Mar 17, 2020
470ee0a
Modified the wording of exception messages and created the troublesho…
chloe-zh Mar 17, 2020
667debb
Sql CI/CD (#384)
rishabh6788 Mar 18, 2020
6ad305c
FIX: field function name letter case preserved in select with group b…
chenqi0805 Mar 19, 2020
4cf20b9
Fix broken LICENSE link in README.md (#394)
abbashus Mar 30, 2020
e3032b7
New SQL cluster settings endpoint (#400)
abbashus Mar 31, 2020
b6eb292
Bug Fix, add support for strict_date_optional_time (#412)
penghuo Apr 7, 2020
c63b3b8
Invalidate HTTP GET method (#414)
chloe-zh Apr 8, 2020
014ae01
More docs in reference manual and add architecture doc (#417)
dai-chen Apr 8, 2020
bd9fbe6
Bug fix, support subquery in from doesn't have alias (#418)
penghuo Apr 9, 2020
79600e3
Anonymize sensitive data in queries exposed to RestSqlAction logs (#419)
chloe-zh Apr 10, 2020
6100e68
Bug fix, ignore the term query rewrite if there is no index found (#425)
penghuo Apr 12, 2020
d47f18a
Simple Query Cursor support (#390)
abbashus Apr 14, 2020
fda5f6c
[BugFix] Enforce AVG to return double data type (#437)
zhongnansu Apr 22, 2020
4c824de
Bug fix, count(distinct field) should transalte to cardinality aggreg…
penghuo Apr 27, 2020
e397120
Fix CSV injection issue (#447)
dai-chen Apr 29, 2020
c56ed25
[BugFix] mock LocalClusterState settings in QueryPlanner base class (…
zhongnansu Apr 29, 2020
05bef16
Escape comma for CSV header and all queries (#456)
dai-chen May 4, 2020
0ab12c7
Bug Fix, support using aggregation function in order by clause (#452)
penghuo May 4, 2020
e6f8cd9
Remove CI and change release workflow based on tags (#457)
penghuo May 4, 2020
43 changes: 43 additions & 0 deletions .github/workflows/release-workflow.yml
@@ -0,0 +1,43 @@
name: Release SQL Artifacts
# This workflow is triggered on creating tags to master or an opendistro release branch
on:
  push:
    tags:
      - 'v*'

jobs:
  build:
    strategy:
      matrix:
        java: [12]

    name: Build and Release SQL Plugin
    runs-on: [ubuntu-16.04]

    steps:
      - name: Checkout SQL
        uses: actions/checkout@v1

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Setup Java ${{ matrix.java }}
        uses: actions/setup-java@v1
        with:
          java-version: ${{ matrix.java }}

      - name: Run build
        run: |
          ./gradlew buildPackages --refresh-dependencies --console=plain -Dbuild.snapshot=false
          artifact=`ls build/distributions/*.zip`
          rpm_artifact=`ls build/distributions/*.rpm`
          deb_artifact=`ls build/distributions/*.deb`

          aws s3 cp $artifact s3://artifacts.opendistroforelasticsearch.amazon.com/downloads/elasticsearch-plugins/opendistro-sql/
          aws s3 cp $rpm_artifact s3://artifacts.opendistroforelasticsearch.amazon.com/downloads/rpms/opendistro-sql/
          aws s3 cp $deb_artifact s3://artifacts.opendistroforelasticsearch.amazon.com/downloads/debs/opendistro-sql/
          aws cloudfront create-invalidation --distribution-id E1VG5HMIWI4SA2 --paths "/downloads/*"
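
For review purposes, the upload step's routing from package type to S3 destination can be sketched in Python. The bucket name and prefixes are copied from the `aws s3 cp` commands in this workflow; the function name `s3_destination` and the sample file name in the usage note are hypothetical and only illustrate the mapping.

```python
from pathlib import PurePosixPath

# Bucket and prefixes as they appear in the workflow's `aws s3 cp` commands.
BUCKET = "s3://artifacts.opendistroforelasticsearch.amazon.com"
PREFIX_BY_SUFFIX = {
    ".zip": "downloads/elasticsearch-plugins/opendistro-sql/",
    ".rpm": "downloads/rpms/opendistro-sql/",
    ".deb": "downloads/debs/opendistro-sql/",
}


def s3_destination(artifact_path: str) -> str:
    """Return the S3 destination for a build artifact (hypothetical helper)."""
    suffix = PurePosixPath(artifact_path).suffix
    if suffix not in PREFIX_BY_SUFFIX:
        raise ValueError(f"unexpected artifact type: {artifact_path}")
    return f"{BUCKET}/{PREFIX_BY_SUFFIX[suffix]}"
```

For example, `s3_destination("build/distributions/example-plugin.zip")` resolves to the `downloads/elasticsearch-plugins/opendistro-sql/` prefix, while `.rpm` and `.deb` files go to the `rpms` and `debs` prefixes respectively.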
File renamed without changes.
5 changes: 1 addition & 4 deletions README.md
@@ -29,9 +29,6 @@ The package uses the [Gradle](https://docs.gradle.org/4.10.2/userguide/userguide
To use the feature, send requests to the `_opendistro/_sql` URI. You can use a request parameter or the request body (recommended).
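
A minimal Python sketch of the recommended request-body form follows. It only builds the request and does not send it. The host `localhost`, port `9200`, and the helper name `build_sql_request` are placeholders, and the JSON body is assumed to carry the statement under a `query` key, which this hunk does not show.

```python
import json
import urllib.request


def build_sql_request(host: str, port: int, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the _opendistro/_sql endpoint."""
    # Assumption: the plugin reads the SQL statement from the "query" field.
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}:{port}/_opendistro/_sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and port; pass the result to urllib.request.urlopen to send it.
request = build_sql_request("localhost", 9200, "SELECT * FROM my-index LIMIT 50")
```

Putting the statement in the body avoids URL-encoding the query text, which is one practical reason to prefer it over the request parameter.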

* Simple query
```
GET https://<host>:<port>/_opendistro/_sql?sql=select * from my-index limit 50
```

```
POST https://<host>:<port>/_opendistro/_sql
@@ -186,7 +183,7 @@ If you discover a potential security issue in this project we ask that you notif

## Licensing

See the [LICENSE](./LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
See the [LICENSE](./LICENSE.txt) file for our project's licensing. We will ask you to confirm the licensing of your contribution.


## Copyright
22 changes: 19 additions & 3 deletions build.gradle
@@ -63,7 +63,9 @@ version = "${opendistroVersion}.0"

apply plugin: 'elasticsearch.esplugin'
apply plugin: 'jacoco'
apply from: 'build-tools/sqlplugin-coverage.gradle'
if (!System.properties.containsKey('tests.rest.cluster') && !System.properties.containsKey('tests.cluster')){
apply from: 'build-tools/sqlplugin-coverage.gradle'
}
apply plugin: 'antlr'

jacoco {
@@ -134,6 +136,15 @@ integTestRunner {
// allows integration test classes to access test resource from project root path
systemProperty('project.root', project.rootDir.absolutePath)

// Tell the test JVM if the cluster JVM is running under a debugger so that tests can use longer timeouts for
// requests. The 'doFirst' delays reading the debug setting on the cluster till execution time.
doFirst { systemProperty 'cluster.debug', integTestCluster.debug }

// The --debug-jvm command-line option makes the cluster debuggable; this makes the tests debuggable
if (System.getProperty("test.debug") != null) {
jvmArgs '-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=*:5005'
}

// Run different task based on test type. "exclude" is required for each task.
def testType = System.getProperty("testType")
if (testType == 'doctest') { // Doctest to generate documentation
@@ -155,9 +166,9 @@ integTestRunner {
systemProperty "otherDbUrls", System.getProperty("otherDbUrls")
systemProperty "queries", System.getProperty("queries")

} else { // Run all other integration tests and doctest by default
} else { // Run all other integration tests. Skip doctest for now due to randomness in our stats API.
include 'com/amazon/opendistroforelasticsearch/sql/esintgtest/**/*IT.class'
include 'com/amazon/opendistroforelasticsearch/sql/doctest/**/*IT.class'
exclude 'com/amazon/opendistroforelasticsearch/sql/doctest/**/*IT.class'
exclude 'com/amazon/opendistroforelasticsearch/sql/correctness/**'
}
}
@@ -167,6 +178,10 @@ integTestCluster {
distribution = "oss-zip"
}

run {
distribution = "oss-zip"
}

generateGrammarSource {
arguments += ['-visitor', '-package', 'com.amazon.opendistroforelasticsearch.sql.antlr.parser']
source = sourceSets.main.antlr
@@ -223,6 +238,7 @@ dependencies {
compile group: "org.elasticsearch.plugin", name: 'reindex-client', version: "${es_version}"
compile group: 'com.google.guava', name: 'guava', version:'15.0'
compile group: 'org.json', name: 'json', version:'20180813'
compile group: 'org.apache.commons', name: 'commons-lang3', version: '3.9'

// ANTLR gradle plugin and runtime dependency
antlr "org.antlr:antlr4:4.7.1"
39 changes: 39 additions & 0 deletions docs/dev/Architecture.md
@@ -0,0 +1,39 @@
# OpenDistro SQL Engine Architecture

---
## 1.Overview

The OpenDistro SQL (OD-SQL) project is based on the NLPChina project (https://github.com/NLPchina/elasticsearch-sql), which is now deprecated ([attributions](https://github.com/opendistro-for-elasticsearch/sql/blob/master/docs/attributions.md)). Over one year of development, many features have been added to OD-SQL on top of the original NLPChina code. This document explains the current OD-SQL architecture.

---
## 2.High Level View

At a high level, the OD-SQL engine can be divided into four major sub-modules.

* *Parser*: Two Lex&Parsers currently coexist. The Druid Lex&Parser is the original one from NLPChina, and the AST that the Core Engine takes as input comes from it. The [ANTLR](https://github.com/opendistro-for-elasticsearch/sql/blob/master/src/main/antlr/OpenDistroSqlParser.g4) Lex&Parser was added by us to customize verification and exception handling.
* *Analyzer*: The analyzer module takes the output of the ANTLR Lex&Parser and performs syntax and semantic analysis.
* *Core Engine*: The QueryAction takes the output of the Druid Lex&Parser and translates it to Elasticsearch DSL where possible. This is an original NLPChina module. The QueryPlanner Builder was added by us to support JOIN and post-processing logic. The QueryPlanner takes the output of the Druid Lex&Parser and builds the PhysicalPlan.
* *Execution*: The execution module executes the QueryAction or QueryPlanner and returns the response to the client. Unlike the Frontend, Analyzer, and Core Engine, which run on the transport thread and cannot perform any blocking operation, the Execution module runs on the client thread pool and can perform blocking operations.
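
The split between non-blocking planning and blocking execution can be illustrated with a small Python sketch. This is an analogy built on `concurrent.futures`, not the plugin's Java code: `worker_pool` stands in for the client thread pool, and `handle_request` and `blocking_execute` are hypothetical names.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for the client thread pool that is allowed to block.
worker_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="sql-worker")


def blocking_execute(plan: str) -> str:
    """Stand-in for execution, which may block while waiting on search results."""
    return f"executed {plan} on {threading.current_thread().name}"


def handle_request(query: str) -> Future:
    """Stand-in for the transport thread: plan without blocking, then hand off."""
    plan = f"plan({query})"  # parsing, analysis, and planning happen here
    return worker_pool.submit(blocking_execute, plan)


future = handle_request("SELECT 1")
print(future.result())
worker_pool.shutdown()
```

The caller gets a future back immediately, so the thread that accepted the request is never tied up by the blocking step.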

The OD-SQL engine also includes other modules.

* _Documentation_: auto-generates documentation.
* _Metrics_: collects OD-SQL related metrics.
* _Resource Manager_: monitors memory consumption during join operations to avoid impacting Elasticsearch availability.

![Architecture Overview](img/architecture-overview.png)

---
## 3.Journey of the query in OD-SQL engine.

The following diagram takes a sample query and explains how it flows through the different modules.

![Architecture Journey](img/architecture-journey.png)

1. The ANTLR parser auto-generates the AST based on the grammar file (https://github.com/opendistro-for-elasticsearch/sql/blob/master/src/main/antlr/OpenDistroSqlParser.g4).
2. The Syntax and Semantic Analyzer walks the AST and verifies that the query follows the grammar and is supported by OD-SQL. For example, `SELECT * FROM semantics WHERE LOG(age, city) = 1` throws an exception with the message `Function [LOG] cannot work with [INTEGER, KEYWORD].` and the sample usage message `Usage: LOG(NUMBER T) → DOUBLE.`
3. The Druid Lex&Parser takes the input query and generates the Druid AST, which is different from the AST generated by ANTLR. This module is the open source library (https://github.com/alibaba/druid) originally used by NLPChina.
4. The QueryPlanner Builder takes the AST as input and generates the LogicalPlan from it, then optimizes the LogicalPlan into the PhysicalPlan. (In the current implementation, only the rule-based model is implemented.) Most of the PhysicalPlan generation uses NLPChina's original logic to translate the SQL expressions in the AST to Elasticsearch DSL.
5. The QueryPlanner executor executes the PhysicalPlan in a worker thread.
6. The formatter reformats the response data into the required format. The default format is JDBC.
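
The six steps above can be sketched as a simple staged pipeline. This Python outline is only a reading aid under stated assumptions: every function and class name is hypothetical, each stage merely records that it ran, and none of it reflects the plugin's actual Java classes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QueryContext:
    """Carries the SQL text and a trace of the stages that have run."""
    sql: str
    trace: List[str] = field(default_factory=list)


def make_stage(name: str) -> Callable[[QueryContext], None]:
    """Create a placeholder stage that only records its own name."""
    def stage(ctx: QueryContext) -> None:
        ctx.trace.append(name)
    return stage


# Stage order mirrors steps 1 to 6 of the list above.
STAGES = [
    make_stage("antlr_parse"),          # 1. ANTLR AST from the grammar file
    make_stage("analyze"),              # 2. syntax and semantic checks
    make_stage("druid_parse"),          # 3. Druid AST for the core engine
    make_stage("build_physical_plan"),  # 4. LogicalPlan, then PhysicalPlan
    make_stage("execute"),              # 5. run the plan in a worker thread
    make_stage("format_response"),      # 6. JDBC format by default
]


def run_query(sql: str) -> QueryContext:
    """Run every stage in order and return the context with its trace."""
    ctx = QueryContext(sql)
    for stage in STAGES:
        stage(ctx)
    return ctx
```

Note that the query is parsed twice: the ANTLR pass exists for validation and error reporting, while the Druid pass produces the AST that the planner actually consumes.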
