* update link and title
* Discard changes to website/package.json
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
---------
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
index_realtime tasks were removed from the documentation in #13107. Even
at that time, they weren't really documented per se; they were merely
mentioned. They existed solely to support Tranquility, an obsolete ingestion
method that predates the migration of Druid to the ASF and is no longer
maintained. Tranquility docs were also de-linked from the sidebars and
the other doc pages in #11134. Only a stub remains, so people with
links to the page can see that it's no longer recommended.
index_realtime_appenderator tasks existed in the code base but were
never documented, nor, as far as I am aware, were they used for any purpose.
This patch removes both task types completely, along with all
supporting code that was otherwise unused. It also updates the stub
doc for Tranquility to state more firmly that it is not compatible.
(Previously, the stub doc said it wasn't recommended and pointed out
that it is built against an ancient 0.9.2 version of Druid.)
ITUnionQueryTest has been migrated to the new integration tests framework and updated to use Kafka ingestion.
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
* Initial support for bootstrap segments.
- Adds a new API in the coordinator.
- All processes that have storage locations configured (including tasks)
talk to the coordinator if they can, and fetch bootstrap segments from it
(sketched below).
- The fetched segments are then loaded into the segment cache as part of startup.
- This addresses the segment bootstrapping logic required by processes before
they can start serving queries or ingesting.
This patch also lays the foundation to speed up upgrades.
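A minimal sketch of this fetch-and-load flow, including the fail-open behavior noted in the follow-up below. The `CoordinatorClient` and `SegmentCache` interfaces here are illustrative assumptions, not Druid's actual classes:

```java
import java.util.Collections;
import java.util.List;

class BootstrapSegmentsSketch
{
  interface CoordinatorClient
  {
    List<String> fetchBootstrapSegments() throws Exception;
  }

  interface SegmentCache
  {
    void load(String segmentId);
  }

  // Fetch bootstrap segments from the coordinator and load them into the
  // local segment cache during startup, failing open on any error.
  static void bootstrap(CoordinatorClient coordinator, SegmentCache cache)
  {
    List<String> segments;
    try {
      segments = coordinator.fetchBootstrapSegments();
    }
    catch (Exception e) {
      // Fail open: proceed with startup even if the coordinator is unreachable.
      segments = Collections.emptyList();
    }
    for (String segmentId : segments) {
      cache.load(segmentId);
    }
  }
}
```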
* Fail open by default if there are any errors talking to the coordinator.
* Add test for failure scenario and cleanup logs.
* Cleanup and add debug log
* Assert the events so we know the list exactly.
* Revert RunRules test.
The rules aren't evaluated if there are no clusters.
* Revert RunRulesTest too.
* Remove debug info.
* Make the API POST and update log.
* Fix up UTs.
* Throw 503 from MetadataResource; clean up exception handling and DruidException.
* Remove unused logger, add verification of metrics and docs.
* Update error message
* Update server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* Apply suggestions from code review
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* Adjust test metric expectations with the rename.
* Add BootstrapSegmentResponse container in the response for future extensibility.
* Rename to BootstrapSegmentsInfo for internal consistency.
* Remove unused log.
* Use a member variable for broadcast segments instead of segmentAssigner.
* Minor cleanup
* Add test for loadable bootstrap segments and clarify comment.
* Review suggestions.
---------
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* Link fix
* Update docs/operations/auth.md
Co-authored-by: Andreas Maechler <amaechler@gmail.com>
---------
Co-authored-by: Andreas Maechler <amaechler@gmail.com>
Update the default value of `druid.indexer.tasklock.batchAllocationWaitTime` to 0.
A segment allocation request is thus processed immediately unless there are already other requests queued ahead of it. While in the queue, a segment allocation request may be clubbed together with other similar requests into a batch to reduce load on the metadata store.
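A simplified, single-threaded sketch of this batching behavior; the class and method names are hypothetical, and the real queue is concurrent and allocates each batch in one metadata-store transaction:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class SegmentAllocationQueueSketch
{
  private final long batchAllocationWaitTimeMillis; // now 0 by default
  private final Queue<String> pending = new ArrayDeque<>();

  SegmentAllocationQueueSketch(long batchAllocationWaitTimeMillis)
  {
    this.batchAllocationWaitTimeMillis = batchAllocationWaitTimeMillis;
  }

  void submit(String request) throws InterruptedException
  {
    boolean wasEmpty = pending.isEmpty();
    pending.add(request);
    if (wasEmpty) {
      // Wait (0 ms by default) so that near-simultaneous requests can be
      // clubbed into the same batch before hitting the metadata store.
      Thread.sleep(batchAllocationWaitTimeMillis);
      processBatch();
    }
    // If the queue was non-empty, this request simply rides along with the
    // batch triggered by the earlier request.
  }

  private void processBatch()
  {
    List<String> batch = new ArrayList<>(pending);
    pending.clear();
    System.out.println("Allocated " + batch.size() + " request(s) in one metadata-store call");
  }
}
```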
* Simplify serialized form of JsonInputFormat.
Use JsonInclude for keepNullColumns, assumeNewlineDelimited, and
useJsonNodeReader. Because the default value of keepNullColumns is
variable, we store the originally configured value rather than the
derived value, and include it in the serialized form only if the
original value is non-null.
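A hedged sketch of what the serialized form might look like using Jackson's `JsonInclude`; the field names follow the description above, but the real `JsonInputFormat` differs in detail:

```java
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;

class JsonInputFormatSketch
{
  // Store the originally configured value (possibly null), not the derived
  // default, and serialize it only when the user actually set it.
  @JsonProperty("keepNullColumns")
  @JsonInclude(JsonInclude.Include.NON_NULL)
  private final Boolean keepNullColumnsConfigured;

  @JsonProperty("assumeNewlineDelimited")
  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
  private final boolean assumeNewlineDelimited;

  @JsonProperty("useJsonNodeReader")
  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
  private final boolean useJsonNodeReader;

  JsonInputFormatSketch(Boolean keepNullColumns, boolean assumeNewlineDelimited, boolean useJsonNodeReader)
  {
    this.keepNullColumnsConfigured = keepNullColumns;
    this.assumeNewlineDelimited = assumeNewlineDelimited;
    this.useJsonNodeReader = useJsonNodeReader;
  }
}
```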
* Fix test.
Thanks for your contribution @amit-git-account
* Added new use cases and descriptions - 5/14/24
The use case listing has not changed in a long time. While speaking with users, I came across several other use cases not listed in the index, so I added them along with a description for each.
* Apply suggestions from code review
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* update spelling file
* Update docs/design/index.md
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Speed up SQL IN using SCALAR_IN_ARRAY.
Main changes:
1) DruidSqlValidator now includes a rewrite of IN to SCALAR_IN_ARRAY when the size of
the IN list is above inFunctionThreshold (see the sketch after this list). The default
value of inFunctionThreshold is 100. Users can restore the prior behavior by setting it
to Integer.MAX_VALUE.
2) SearchOperatorConversion now generates SCALAR_IN_ARRAY when converting to a regular
expression, when the size of the SEARCH is above inFunctionExprThreshold. The default
value of inFunctionExprThreshold is 2. Users can restore the prior behavior by setting
it to Integer.MAX_VALUE.
3) ReverseLookupRule generates SCALAR_IN_ARRAY if the set of reverse-looked-up values is
greater than inFunctionThreshold.
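A rough illustration of the threshold check in item 1; the class and method here are hypothetical, and the actual rewrite happens during Calcite planning:

```java
import java.util.List;

class InFunctionRewriteSketch
{
  static final int DEFAULT_IN_FUNCTION_THRESHOLD = 100;

  // Decide whether "x IN (v1, ..., vn)" should be planned as
  // SCALAR_IN_ARRAY(x, ARRAY[v1, ..., vn]) instead of a chain of ORs.
  static boolean shouldUseScalarInArray(List<?> inValues, int inFunctionThreshold)
  {
    return inValues.size() > inFunctionThreshold;
  }
}
```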
* Revert test.
* Additional coverage.
* Update docs/querying/sql-query-context.md
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* New test.
---------
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Add validation for reindex with realtime sources.
With the addition of concurrent compaction, it is possible for MSQ to ingest data into a datasource while simultaneously querying realtime sources of that same datasource. This could lead to issues if the interval being ingested into is replaced by an MSQ job that has queried only some of the data from the realtime task.
This PR adds validation to check that the datasource being ingested into is not also being queried from, if the query includes realtime sources.
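A minimal sketch of that validation under assumed names; the real check runs inside MSQ's query planning with its own datasource analysis:

```java
import java.util.Set;

class ReindexValidationSketch
{
  // Reject an MSQ ingest query that reads realtime data from the datasource
  // it is about to replace, since the replacement could drop data the query
  // never saw.
  static void validate(String targetDataSource, Set<String> queriedDataSources, boolean usesRealtimeSources)
  {
    if (usesRealtimeSources && queriedDataSources.contains(targetDataSource)) {
      throw new IllegalArgumentException(
          "Cannot ingest into datasource[" + targetDataSource
          + "] while also querying it with realtime sources included"
      );
    }
  }
}
```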
Follow up to #15705
Changes:
- Remove references to ZK-based segment loading in the docs
- Fix doc for existing config `druid.coordinator.loadqueuepeon.http.repeatDelay`
* Delta Lake support for filters.
* Updates
* cleanup comments
* Docs
* Remove Enclosed runner
* Rename
* Cleanup test
* Serde test for the Delta input source and fix jackson annotation.
* Updates and docs.
* Update error messages to be clearer
* Fixes
* Handle NumberFormatException to provide a nicer error message.
* Apply suggestions from code review
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
* Doc fixes based on feedback
* Yes -> yes in docs; reword slightly.
* Update docs/ingestion/input-sources.md
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
* Update docs/ingestion/input-sources.md
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
* Documentation, javadoc and more updates.
* Add an end-to-end test for NOT with an OR expression.
* Break up =, >, >=, <, <= into their own types instead of subclassing.
---------
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
Changes:
- Add new config `lagAggregate` to `LagBasedAutoScalerConfig`
- Add field `aggregateForScaling` to `LagStats`
- Use the new field/config to determine which aggregate to use to compute lag (sketched below)
- Remove method `Supervisor.computeLagForAutoScaler()`
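A sketch of how the aggregate selection might look; the enum values and method name are assumptions based on this summary, not the actual API:

```java
class LagAggregateSketch
{
  enum AggregateFunction
  {
    MAX, SUM, AVERAGE
  }

  // Pick the lag value the autoscaler should compare against its thresholds,
  // based on the configured aggregate.
  static long lagForScaling(AggregateFunction lagAggregate, long maxLag, long totalLag, long avgLag)
  {
    switch (lagAggregate) {
      case MAX:
        return maxLag;
      case SUM:
        return totalLag;
      default:
        return avgLag;
    }
  }
}
```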
* Four changes to scalar_in_array as follow-ups to #16306:
1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`.
2) Rename the class to more closely match the function name.
3) Add a specialization for constant arrays, where we build a `HashSet`.
4) Use `castForEqualityComparison` to properly handle cross-type comparisons.
Additional tests verify that comparisons between LONG and DOUBLE are now
handled properly.
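An illustrative sketch of items 1 and 3 above, assuming a hypothetical matcher class; the real filter operates on typed column selectors:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ScalarInArraySketch
{
  private final Set<Object> values;   // HashSet specialization for constant arrays
  private final boolean containsNull;

  ScalarInArraySketch(Object[] constantArray)
  {
    this.values = new HashSet<>(Arrays.asList(constantArray));
    this.containsNull = this.values.contains(null);
  }

  // Three-valued logic matching the native "in" filter: a null scalar matches
  // (true) only if the array itself contains null; otherwise the result is
  // unknown (null).
  Boolean matches(Object scalar)
  {
    if (scalar == null) {
      return containsNull ? Boolean.TRUE : null;
    }
    return values.contains(scalar);
  }
}
```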
* Fix spelling.
* Adjustments from review.
Issue: #14989
The initial step in optimizing segment metadata was to centralize the construction of datasource schemas in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schemas for realtime segments (#15475). The remaining goal is to eliminate the need to regularly execute queries just to obtain segment schema information.
This is the final change: it publishes segment schemas for finalized segments from tasks and periodically polls them in the Coordinator.
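A high-level sketch of that publish/poll cycle, with assumed interface and method names for illustration only:

```java
import java.util.List;

class SegmentSchemaFlowSketch
{
  interface SchemaMetadataStore
  {
    void publishSchema(String segmentId, String schemaPayload); // written by tasks
    List<String> pollPublishedSchemas();                        // read by the Coordinator
  }

  // Task side: publish the schema of each finalized segment along with the
  // segment itself, so no follow-up metadata query is needed.
  static void onSegmentFinalized(SchemaMetadataStore store, String segmentId, String schemaPayload)
  {
    store.publishSchema(segmentId, schemaPayload);
  }

  // Coordinator side: periodically poll published schemas and merge them into
  // the centrally built datasource schema.
  static void pollOnce(SchemaMetadataStore store)
  {
    for (String schema : store.pollPublishedSchemas()) {
      System.out.println("Merging polled schema: " + schema);
    }
  }
}
```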