docs: update future development blurbs (#16939)

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
317brian 2024-10-01 15:02:05 -07:00 committed by GitHub
parent 878adff9aa
commit 1fc82a96bd
GPG Key ID: B5690EEEBB952194
4 changed files with 11 additions and 17 deletions

@@ -105,10 +105,9 @@ for reading from external data sources and publishing new Druid segments.
[**Indexer**](../design/indexer.md) services are an alternative to Middle Managers and Peons. Instead of
forking separate JVM processes per-task, the Indexer runs tasks as individual threads within a single JVM process.
The Indexer is designed to be easier to configure and deploy compared to the Middle Manager + Peon system and to better enable resource sharing across tasks. The Indexer is a newer feature and is currently designated [experimental](../development/experimental.md) due to the fact that its memory management system is still under
development. It will continue to mature in future versions of Druid.
The Indexer is designed to be easier to configure and deploy compared to the MiddleManager + Peon system and to better enable resource sharing across tasks, which can help streaming ingestion. The Indexer is currently designated [experimental](../development/experimental.md).
Typically, you would deploy either Middle Managers or Indexers, but not both.
Typically, you would deploy one of the following: MiddleManagers, [MiddleManager-less ingestion using Kubernetes](../development/extensions-contrib/k8s-jobs.md), or Indexers. You wouldn't deploy more than one of these options.
## Colocation of services

@@ -24,8 +24,7 @@ sidebar_label: "Indexer"
-->
:::info
The Indexer is an optional and [experimental](../development/experimental.md) feature.
Its memory management system is still under development and will be significantly enhanced in later releases.
The Indexer is an optional and experimental feature. If you're primarily performing batch ingestion, we recommend you use either the MiddleManager and Peon task execution system or [MiddleManager-less ingestion using Kubernetes](../development/extensions-contrib/k8s-jobs.md). If you're primarily doing streaming ingestion, you may want to try either [MiddleManager-less ingestion using Kubernetes](../development/extensions-contrib/k8s-jobs.md) or the Indexer service.
:::
The Apache Druid Indexer service is an alternative to the Middle Manager + Peon task execution system. Instead of forking a separate JVM process per-task, the Indexer runs tasks as separate threads within a single JVM process.

@@ -54,7 +54,7 @@ Additionally, this extension has following configuration.
### Gotchas
- Label/Annotation path in each pod spec MUST EXIST, which is easily satisfied if there is at least one label/annotation in the pod spec already. This limitation may be removed in future.
- Label/Annotation path in each pod spec MUST EXIST, which is easily satisfied if there is at least one label/annotation in the pod spec already.
- All Druid Pods belonging to one Druid cluster must be inside same kubernetes namespace.
- All Druid Pods need permissions to be able to add labels to self-pod, List and Watch other Pods, create and read ConfigMap for leader election. Assuming, "default" service account is used by Druid pods, you might need to add following or something similar Kubernetes Role and Role Binding.

@@ -431,25 +431,21 @@ and how to detect it.
3. One common reason for implicit subquery generation is if the types of the two halves of an equality do not match.
For example, since lookup keys are always strings, the condition `druid.d JOIN lookup.l ON d.field = l.field` will
perform best if `d.field` is a string.
4. The join operator must evaluate the condition for each row. In the future, we expect
to implement both early and deferred condition evaluation, which we expect to improve performance considerably for
common use cases.
4. The join operator must evaluate the condition for each row.
5. Currently, Druid does not support pushing down predicates (condition and filter) past a Join (i.e. into
Join's children). Druid only supports pushing predicates into the join if they originated from
above the join. Hence, the location of predicates and filters in your Druid SQL is very important.
Also, as a result of this, comma joins should be avoided.
#### Future work for joins
#### Limitations for joins
Joins are an area of active development in Druid. The following features are missing today but may appear in
future versions:
Joins in Druid have the following limitations:
- Reordering of join operations to get the most performant plan.
- Preloaded dimension tables that are wider than lookups (i.e. supporting more than a single key and single value).
- RIGHT OUTER and FULL OUTER joins in the native query engine. Currently, they are partially implemented. Queries run
- The order of joins is not entirely optimized. Join operations are not reordered to get the most performant plan.
- Preloaded dimension tables that are wider than lookups (i.e. supporting more than a single key and single value) are not supported.
- RIGHT OUTER and FULL OUTER joins in the native query engine are not fully implemented. Queries run
but results are not always correct.
- Performance-related optimizations as mentioned in the [previous section](#join-performance).
- Join conditions on a column containing a multi-value dimension.
- Join conditions on a column can't contain a multi-value dimension.
### `unnest`