diff --git a/dev-tools/checkstyle_suppressions.xml b/dev-tools/checkstyle_suppressions.xml index af8c48fc9f0..8ed9353e0db 100644 --- a/dev-tools/checkstyle_suppressions.xml +++ b/dev-tools/checkstyle_suppressions.xml @@ -29,7 +29,6 @@ - diff --git a/sql/server/src/main/java/org/elasticsearch/xpack/sql/package-info.java b/sql/server/src/main/java/org/elasticsearch/xpack/sql/package-info.java index 100a8ccef87..9528df0332d 100644 --- a/sql/server/src/main/java/org/elasticsearch/xpack/sql/package-info.java +++ b/sql/server/src/main/java/org/elasticsearch/xpack/sql/package-info.java @@ -6,140 +6,178 @@ /** * X-Pack SQL module is a SQL interface to Elasticsearch.
- * In a nutshell, currently, SQL acts as a translator, allowing traditional SQL queries to be executed against - * Elasticsearch indices without any modifications. Do note that SQL does not try to hide Elasticsearch or - * abstract it in any way; rather it maps the given SQL, if possible, to one (at the moment) query DSL. Of course, - * this means not all SQL queries are supported.
- * + * In a nutshell, currently, SQL acts as a translator, allowing + * traditional SQL queries to be executed against Elasticsearch indices + * without any modifications. Do note that SQL does not try to hide + * Elasticsearch or abstract it in any way; rather it maps the given SQL, + * if possible, to one (at the moment) query DSL. Of course, this means + * not all SQL queries are supported.
* *

Premise

- * Since Elasticsearch is not a database nor does it support arbitrary {@code JOIN}s (a cornerstone of SQL), - * SQL module is built from the ground up with Elasticsearch in mind first and SQL second. - * In fact, even the grammar introduces Elasticsearch-specific components that have no counterpart in ANSI SQL. + * Since Elasticsearch is not a database nor does it support arbitrary + * {@code JOIN}s (a cornerstone of SQL), SQL module is built from the + * ground up with Elasticsearch in mind first and SQL second. In fact, + * even the grammar introduces Elasticsearch-specific components that + * have no counterpart in ANSI SQL. * *

Architecture

- * SQL module is roughly based on the Volcano project (by Graefe {@code &} co) + * SQL module is roughly based on the Volcano project (by Graefe + * {@code &} co) * [1] * [2] * [3] * - * which argues for several design principles, of which 2 are relevant to this project, namely: + * which argues for several design principles, of which 2 are relevant + * to this project, namely: * *
*
Logical and Physical algebra
- *
Use of an extensible set of algebraic and logical operators to describe the operation to the underlying engine. - * The engine's job is to map a user query into logical algebra and then translate this into physical algebra.
+ *
Use of an extensible set of algebraic and logical operators to + * describe the operation to the underlying engine. The engine's job is to + * map a user query into logical algebra and then translate this into + * physical algebra.
*
Rules to identify patterns
- *
The use of rules as a way to identify relevant patterns inside the plans that can be worked upon
+ *
The use of rules as a way to identify relevant + * patterns inside the plans that can be worked upon
*
* - * In other words, the use of a logical plan, which represents what the user has requested, and a physical plan, which is - * what the engine needs to execute based on the user request. - * To manipulate the plans, the engine does pattern matching implemented as rules that get applied over and over until - * none matches. - * An example of a rule would be expanding {@code *} to actual concrete references. + * In other words, the use of a logical plan, which represents what the + * user has requested, and a physical plan, which is what the engine needs + * to execute based on the user request. To manipulate the plans, the + * engine does pattern matching implemented as rules that get applied over + * and over until none matches. + * An example of a rule would be expanding {@code *} to actual concrete + * references. * - * As a side-note, the Volcano model has proved quite popular, being used (to different degrees) by the majority of SQL - * engines out there such as Apache Calcite, Apache Impala, Apache Spark and Facebook Presto. + * As a side-note, the Volcano model has proved quite popular, being used + * (to different degrees) by the majority of SQL engines out there such + * as Apache Calcite, Apache Impala, Apache Spark and Facebook Presto. * *

Concepts

* - * The building operation of the SQL engine is defined by an action, namely a rule (defined in - * {@link org.elasticsearch.xpack.sql.rule rule} package) that accepts one - * immutable tree (defined in {@link org.elasticsearch.xpack.sql.tree tree} package) and transforms it to another immutable - * tree. - * Each rule looks for a certain pattern that it can identify and then transform. + * The building operation of the SQL engine is defined by an action, + * namely a rule (defined in {@link org.elasticsearch.xpack.sql.rule rule} + * package) that accepts one immutable tree (defined in + * {@link org.elasticsearch.xpack.sql.tree tree} package) and transforms + * it to another immutable tree. + * Each rule looks for a certain pattern that it can identify and + * then transform. * * The engine works with 3 main types of trees: * *
*
Logical plan
- *
Logical representation of a user query. Any transformation of this plan should result in an - * equivalent plan - meaning for the same input, - * it will generate the same output.
+ *
Logical representation of a user query. Any transformation + * of this plan should result in an equivalent plan - meaning for + * the same input, it will generate the same output.
*
Physical plan
- *
Execution representation of a user query. This plan needs to translate to (currently) one query to - * Elasticsearch. It is likely that in the future (once - * we look into supporting {@code JOIN}s), different strategies for generating a physical plan will be available - * depending on the cost.
+ *
Execution representation of a user query. This plan needs + * to translate to (currently) one query to Elasticsearch. It is likely + * that in the future (once we look into supporting {@code JOIN}s), different + * strategies for generating a physical plan will be available depending + * on the cost.
*
Expression tree
- *
Both the logical and physical plan contain expression trees that need to be incorporated into the query. - * For the most part, most of the work inside the engine - * revolves around expressions.
+ *
Both the logical and physical plan contain expression trees + * that need to be incorporated into the query. For the most part, most + * of the work inside the engine revolves around expressions.
*
* * All types of tree inside the engine have the following properties: *
*
Immutability
- *
Each node and its properties are immutable. A change in a property results in a new node which results in a - * new tree.
+ *
Each node and its properties are immutable. A change in a property + * results in a new node which results in a new tree.
*
Resolution
- *
Due to the algebraic nature of SQL, each tree has the notion of resolution which indicates whether it has been - * resolved or not. A node can be resolved only if it - * and its children have all been resolved.
+ *
Due to the algebraic nature of SQL, each tree has the notion of + * resolution which indicates whether it has been resolved or not. A node + * can be resolved only if it and its children have all been + * resolved.
*
Traversal
- *
Each tree can be traversed top-to-bottom/pre-order/parents-first or bottom-up/post-order/children-first. The - * difference in the traversal depends on the pattern that is being - * identified.
+ *
Each tree can be traversed top-to-bottom/pre-order/parents-first or + * bottom-up/post-order/children-first. The difference in the traversal + * depends on the pattern that is being identified.
*
* * A typical flow inside the engine is the following: * *
    *
  1. The engine is given a query
  2. - *
  3. The query is parsed and transformed into an unresolved AST or logical plan
  4. + *
  5. The query is parsed and transformed into an unresolved AST or + * logical plan
  6. *
  7. The logical plan gets analyzed and resolved
  8. *
  9. The logical plan gets optimized
  10. *
  11. The logical plan gets transformed into a physical plan
  12. - *
  13. The physical plan gets mapped and then folded into an Elasticsearch query
  14. + *
  15. The physical plan gets mapped and then folded into an Elasticsearch + * query
  16. *
  17. The Elasticsearch query gets executed
  18. *
* - *

Digression - Visitors, pattern matching, {@code instanceof} and Java 10/11/12

+ *

Digression - Visitors, pattern matching, {@code instanceof} and + * Java 10/11/12

* - * To implement the above concepts, several choices have been made in the engine (which are not common in the rest of the X-Pack code base). - * In particular the conventions/signatures of {@link org.elasticsearch.xpack.sql.tree.Node tree}s and usage of {@code instanceof} inside {@link org.elasticsearch.xpack.sql.rule.Rule rule}s. - * Java doesn't provide any utilities for tree abstractions or pattern matching for that matter. - * Typically for tree traversal one would employ the Visitor + * To implement the above concepts, several choices have been made in the + * engine (which are not common in the rest of the X-Pack code base). In + * particular the conventions/signatures of + * {@link org.elasticsearch.xpack.sql.tree.Node tree}s and usage of + * {@code instanceof} inside + * {@link org.elasticsearch.xpack.sql.rule.Rule rule}s. + * Java doesn't provide any utilities for tree abstractions or pattern + * matching for that matter. Typically for tree traversal one would employ + * the Visitor * pattern; however, that is not a suitable candidate for SQL because: *
    - *
  • the visitor granularity is a node and patterns are likely to involve multiple nodes
  • - *
  • transforming a tree and identifying a pattern requires holding a state which means either the tree or the visitor becomes - * stateful
  • + *
  • the visitor granularity is a node and patterns are likely to involve + * multiple nodes
  • + *
  • transforming a tree and identifying a pattern requires holding a + * state which means either the tree or the visitor becomes stateful
  • *
  • a node can stop traversal (which is not desired)
  • - *
  • it's unwieldy - every node type requires a dedicated {@code visit} method
  • + *
  • it's unwieldy - every node type requires a dedicated {@code visit} + * method
  • *
* - * While in Java, there might be hope for the future, - * Scala has made it a core feature. Its byte-code implementation - * is less pretty as it relies on {@code instanceof} checks. - * Which is how many rules are implemented in the SQL engine as well. Where possible though, one can use typed traversal by passing - * a {@code Class} token to the lambdas (e.g. {@link org.elasticsearch.xpack.sql.tree.Node#transformDown(java.util.function.Function, Class) pre-order transformation}). + * While in Java, there might be hope for + * the future, + * Scala has made it a + * core feature. + * Its byte-code implementation is less pretty as it relies on + * {@code instanceof} checks. Which is how many rules are implemented in + * the SQL engine as well. Where possible though, one can use typed + * traversal by passing a {@code Class} token to the lambdas (e.g. + * {@link org.elasticsearch.xpack.sql.tree.Node#transformDown(java.util.function.Function, Class) + * pre-order transformation}). * *

Components

* * The SQL engine is made up of the following components: *
*
{@link org.elasticsearch.xpack.sql.parser Parser} package
- *
Tokenizer and Lexer of the SQL grammar. Translates user query into an AST ({@code LogicalPlan}). Makes sure the user query is syntactically valid.
+ *
Tokenizer and Lexer of the SQL grammar. Translates user query into an + * AST ({@code LogicalPlan}). Makes sure the user query is syntactically + * valid.
*
{@link org.elasticsearch.xpack.sql.analysis.analyzer.PreAnalyzer PreAnalyzer}
- *
Performs basic inspection of the {@code LogicalPlan} for gathering critical information for the main analysis. This stage is separate from {@code Analysis} + *
Performs basic inspection of the {@code LogicalPlan} for gathering critical + * information for the main analysis. This stage is separate from {@code Analysis} * since it performs async/remote calls to the cluster.
*
{@link org.elasticsearch.xpack.sql.analysis.analyzer.Analyzer Analyzer}
- *
Performs {@code LogicalPlan} analysis, resolution and verification. Makes sure the user query is semantically valid.
+ *
Performs {@code LogicalPlan} analysis, resolution and verification. Makes + * sure the user query is semantically valid.
*
{@link org.elasticsearch.xpack.sql.optimizer.Optimizer Optimizer}
- *
Transforms the resolved {@code LogicalPlan} into a semantically equivalent tree, meaning for the same input, the same output is produced.
+ *
Transforms the resolved {@code LogicalPlan} into a semantically + * equivalent tree, meaning for the same input, the same output is produced.
*
{@link org.elasticsearch.xpack.sql.planner.Planner Planner}
*
Performs query planning. The planning is made up of two components: *
*
{@code Mapper}
*
Maps the {@code LogicalPlan} to a {@code PhysicalPlan}
*
{@code Folder}
- *
Folds or rolls-up the {@code PhysicalPlan} into an Elasticsearch {@link org.elasticsearch.xpack.sql.plan.physical.EsQueryExec executable query}
+ *
Folds or rolls-up the {@code PhysicalPlan} into an Elasticsearch + * {@link org.elasticsearch.xpack.sql.plan.physical.EsQueryExec executable query} + *
*
*
*
{@link org.elasticsearch.xpack.sql.execution Execution}
- *
Actual execution of the query, results retrieval, extractions and translation into a {@link org.elasticsearch.xpack.sql.session.RowSet tabular} format.
+ *
Actual execution of the query, results retrieval, extractions and translation + * into a {@link org.elasticsearch.xpack.sql.session.RowSet tabular} format.
*
*/ -package org.elasticsearch.xpack.sql; \ No newline at end of file +package org.elasticsearch.xpack.sql;