Commit Graph

324 Commits

Author SHA1 Message Date
Martijn van Groningen 636e85e5b7
percolator: Hint what clauses are important in a conjunction query based on fields
The percolator field mapper doesn't need to extract all terms and ranges from a bool query with must or filter clauses.
In order to help to default extraction behavior, boost fields can be configured, so that fields that are known for not being
selective enough can be ignored in favor for other fields or clauses with specific fields can forcefully take precedence over other clauses.
This can help selecting clauses for fields that don't match with a lot of percolator queries over other clauses and thus improving performance of the percolate query.

For example a status like field is something that should configured as an ignore field.
Queries on this field tend to match with more documents and so if clauses for this fields
get selected as best clause then that isn't very helpful for the candidate query that the
percolate query generates to filter out percolator queries that are likely not going to match.
2017-08-11 15:32:01 +02:00
Martijn van Groningen b88cfe2008
docs: Use stackexchange based example to make documentation easier to understand 2017-08-04 16:04:26 +02:00
Martijn van Groningen ec7ac32772
docs: document work around for the percolator if query time text analysis is expensive. 2017-07-28 15:04:15 +02:00
Martijn van Groningen 7c3735bdc4
percolator: Store the QueryBuilder's Writable representation instead of its XContent representation.
The Writeble representation is less heavy to parse and that will benefit percolate performance and throughput.

The query builder's binary format has now the same bwc guarentees as the xcontent format.

Added a qa test that verifies that percolator queries written in older versions are still readable by the current version.
2017-07-28 12:24:10 +02:00
Martijn van Groningen 5cf56a846a
docs: Remove incorrect warning
Closes #25935
2017-07-28 10:53:47 +02:00
Colin Goodheart-Smithe f1f1725fcf [DOCS] improve explanation of dynamic mapping setting (#25829)
Closes #25825
2017-07-21 12:24:38 +01:00
Clinton Gormley febb4bf7bc Update removal_of_types.asciidoc
Fixed `include_in_type` -> `include_type_name`
2017-07-20 19:18:51 +02:00
Clinton Gormley f69decf509 NOCONSOLE -> NOTCONSOLE in removal-of-types 2017-07-19 14:06:04 +02:00
Clinton Gormley ff4a2519f2 Update experimental labels in the docs (#25727)
Relates https://github.com/elastic/elasticsearch/issues/19798

Removed experimental label from:
* Painless
* Diversified Sampler Agg
* Sampler Agg
* Significant Terms Agg
* Terms Agg document count error and execution_hint
* Cardinality Agg precision_threshold
* Pipeline Aggregations
* index.shard.check_on_startup
* index.store.type (added warning)
* Preloading data into the file system cache
* foreach ingest processor
* Field caps API
* Profile API

Added experimental label to:
* Moving Average Agg Prediction


Changed experimental to beta for:
* Adjacency matrix agg
* Normalizers
* Tasks API
* Index sorting

Labelled experimental in Lucene:
* ICU plugin custom rules file
* Flatten graph token filter
* Synonym graph token filter
* Word delimiter graph token filter
* Simple pattern tokenizer
* Simple pattern split tokenizer

Replaced experimental label with warning that details may change in the future:
* Analysis explain output format
* Segments verbose output format
* Percentile Agg compression and HDR Histogram
* Percentile Rank Agg HDR Histogram
2017-07-18 14:06:22 +02:00
Simon Willnauer e81804cfa4 Add a shard filter search phase to pre-filter shards based on query rewriting (#25658)
Today if we search across a large amount of shards we hit every shard. Yet, it's quite
common to search across an index pattern for time based indices but filtering will exclude
all results outside a certain time range ie. `now-3d`. While the search can potentially hit
hundreds of shards the majority of the shards might yield 0 results since there is not document
that is within this date range. Kibana for instance does this regularly but used `_field_stats`
to optimize the indexes they need to query. Now with the deprecation of `_field_stats` and it's upcoming removal a single dashboard in kibana can potentially turn into searches hitting hundreds or thousands of shards and that can easily cause search rejections even though the most of the requests are very likely super cheap and only need a query rewriting to early terminate with 0 results.

This change adds a pre-filter phase for searches that can, if the number of shards are higher than a the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards
and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further it's completely transparent to the user and improves scalability of elasticsearch in general on large clusters.
2017-07-12 22:19:20 +02:00
James Baiera 847378a43b Add another parent value option to join documentation (#25609)
Indexing a join field on a document requires a value of type "object" and two sub fields "name" 
and "parent". The "parent" field is only required on child documents, but the "name" field which 
denotes the name of the relation is always needed. Previously, only the short-hand version of the 
join field was documented. This adds documentation for the long-hand join field data, and 
explicitly points out that just specifying the name of the relation for the field value is a 
convenience shortcut.
2017-07-11 15:36:59 -04:00
Martijn van Groningen d0f9f425bd
parent/child: Removed ParentJoinFieldSubFetchPhase 2017-07-06 13:15:02 +02:00
Adrien Grand 26de905f1e Fix the documentation to state that the `_id` field is indexed. (#25540) 2017-07-05 16:09:31 +02:00
Clinton Gormley 0170e0e8d3 Remove usage of multi-types from the docs and added a page explaining type removal (#25543)
Closes #25401
2017-07-05 12:30:19 +02:00
Martijn van Groningen 9ce9c21b83
docs: added percolator script query limitation 2017-06-28 17:10:30 +02:00
Nathan Taylor 645bb9d0fb Docs: Removed duplicated line in mapping docs 2017-06-21 10:47:19 +02:00
Jim Ferenczi afada69ea9 [Docs] more fix for the parent-join docs 2017-06-16 12:49:16 +02:00
Jim Ferenczi 664193185e [Docs] Fix cross reference for parent-join field 2017-06-16 11:53:16 +02:00
Jim Ferenczi ccb3c9aae7 Add documentation for the new parent-join field (#25227)
* Add documentation for the new parent-join field

This commit adds the docs for the new parent-join field.
It explains how to define, index and query this new field.

Relates #20257
2017-06-16 11:13:23 +02:00
Russ Cam f6821c41d8 Add half_float and scaled float (#22988)
to numeric datatypes
(cherry picked from commit 67ea06145a80d5ec52ba55d1f2e1e8287e1882b1)
2017-06-13 09:54:44 +10:00
Ryan Ernst a03b6c2fa5 Scripting: Change keys for inline/stored scripts to source/id (#25127)
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
2017-06-09 08:29:25 -07:00
Jim Ferenczi 8250aa4267 Remove the postings highlighter and make unified the default highlighter choice (#25028)
This change removes the `postings` highlighter. This highlighter has been removed from Lucene master (7.x) because it behaves
exactly like the `unified` highlighter when index_options is set to `offsets`:
https://issues.apache.org/jira/browse/LUCENE-7815

It also makes the `unified` highlighter the default choice for highlighting a field (if `type` is not provided).
The strategy used internally by this highlighter remain the same as before, it checks `term_vectors` first, then `postings` and ultimately it re-analyzes the text.
Ultimately it rewrites the docs so that the options that the `unified` highlighter cannot handle are clearly marked as such.
There are few features that the `unified` highlighter is not able to handle which is why the other highlighters (`plain` and `fvh`) are still available.
I'll open separate issues for these features and we'll deprecate the `fvh` and `plain` highlighters when full support for these features have been added to the `unified`.
2017-06-09 14:09:57 +02:00
Andrey Groshev e4fd8485ce Made the same length of opening and closing lines (#23583) 2017-06-09 00:50:43 -07:00
Jim Ferenczi ad905924ae update docs that claim that classic is the default similarity 2017-06-09 09:22:48 +02:00
Adrien Grand ebf806d38f Reorganize docs of global ordinals. (#24982)
Currently global ordinals are documented under `fielddata`. It moves them to
their own file since they also work with doc values and fielddata is on the way
out.

Closes #23101
2017-06-01 16:47:44 +02:00
markharwood b7197f5e21 SignificantText aggregation - like significant_terms, but for text (#24432)
* SignificantText aggregation - like significant_terms but doesn’t require fielddata=true, recommended used with `sampler` agg to limit expense of tokenizing docs and takes optional `filter_duplicate_text`:true setting to avoid stats skew from repeated sections of text in search results.

Closes #23674
2017-05-24 13:46:43 +01:00
Adrien Grand a72eaa8e0f Identify documents by their `_id`. (#24460)
Now that indices have a single type by default, we can move to the next step
and identify documents using their `_id` rather than the `_uid`.

One notable change in this commit is that I made deletions implicitly create
types. This helps with the live version map in the case that documents are
deleted before the first type is introduced. Otherwise there would be no way
to differenciate `DELETE index/foo/1` followed by `PUT index/foo/1` from
`DELETE index/bar/1` followed by `PUT index/foo/1`, even though those are
different if versioning is involved.
2017-05-09 16:33:52 +02:00
Nicholas Knize 0c4eb0a029 Add new ip_range field type
This commit adds support for indexing and searching a new ip_range field type. Both IPv4 and IPv6 formats are supported. Tests are updated and docs are added.
2017-05-05 09:43:42 -05:00
Nik Everett a01f846226 CONSOLEify a few more docs
Adds CONSOLE to cross-cluster-search docs but skips them for testing
because we don't have a second cluster set up. This gets us the
`VIEW IN CONSOLE` and `COPY AS CURL` links and makes sure that they
are valid yaml (not json, technically) but doesn't get testing.
Which is better than we had before.

Adds CONSOLE to the dynamic templates docs and ingest-node docs.
The ingest-node docs contain a *ton* of non-console snippets. We
might want to convert them to full examples later, but that can be
a separate thing.

Relates to #18160
2017-05-04 21:01:14 -04:00
Adrien Grand 1be2800120 Only allow one type on 7.0 indices (#24317)
This adds the `index.mapping.single_type` setting, which enforces that indices
have at most one type when it is true. The default value is true for 6.0+ indices
and false for old indices.

Relates #15613
2017-04-27 08:43:20 +02:00
Danilo Akamine 0adaf9fb4c Drop `search_analyzer` parameter from keyword.asciidoc (#24221)
`search_analyzer` isn't supported by `keyword` fields so this removes
it from the documentation for them.
2017-04-25 12:49:50 -04:00
Nik Everett e429d66956 CONSOLEify some more docs
Relates to #18160
2017-04-24 16:08:19 -04:00
Fabien Baligand 4a45579506 token_count type : add an option to count tokens (fix #23227) (#24175)
Add option "enable_position_increments" with default value true.
If option is set to false, indexed value is the number of tokens
(not position increments count)
2017-04-21 00:53:28 +02:00
Loek van Gool e11d892562 Update field-names-field.asciidoc (#24178)
fix typo in field name
2017-04-19 11:57:37 +02:00
Martijn van Groningen 3d9671a668
[PERCOLATOR] Allowing range queries with now ranges inside percolator queries.
Before now ranges where forbidden, because the percolator query itself could get cached and then the percolator queries with now ranges that should no longer match, incorrectly will continue to match.
By disabling caching when the `percolator` is being used, the percolator can now correctly support range queries with now based ranges.

 I think this is the right tradeoff. The percolator query is likely to not be the same between search requests and disabling range queries with now ranges really disabled people using the percolator for their use cases.

 Also fixed an issue that existed in the percolator fieldmapper, it was unable to find forbidden queries inside `dismax` queries.

 Closes #23859
2017-04-07 08:44:43 +02:00
Lee Hinman b6b9ef8e26 [DOCS] Remove line about eager loading global ordinals
Fielddata can no longer be configured to be loaded eagerly (it only accepts
`true` and `false`), so this line is a little misleading because it talks about
a procedure we can no longer do.
2017-04-03 12:56:21 -06:00
Nik Everett 653f50973a CONSOLEify geo-shape docs
`CONSOLE`ify geo-shape type and geo-shape query docs.

Relates to #18160
2017-03-31 09:11:54 -04:00
Nik Everett 5f91241f57 CONSOLEify geo aggregation docs
Turns the top example in each of the geo aggregation docs into a working
example that can be opened in CONSOLE. Subsequent examples can all also
be opened in console and will work after you've run the first example.
All examples are tested as part of the build.
2017-03-30 21:28:52 -04:00
Ali Beyad 8359dd05c9 Adds boolean similarity to Elasticsearch (#23637)
This commit adds the boolean similarity scoring from Lucene to
Elasticsearch.  The boolean similarity provides a means to specify that
a field should not be scored with typical full-text ranking algorithms,
but rather just whether the query terms match the document or not.
Boolean similarity scores a query term equal to its query boost only.
Boolean similarity is available as a default similarity option and thus
a field can be specified to have boolean similarity by declaring in its
mapping:
    "similarity": "boolean"

Closes #6731
2017-03-28 10:17:23 -04:00
Martijn van Groningen b116b8f0cb
[DOCS] Update the docs about the fact that global ordinals for _parent field are loaded eagerly instead of lazily by default.
Relates to #8053
2017-03-22 10:39:39 +01:00
Lee Hinman b3c27a7fdd Disallow include_in_all for 6.0+ indices
Since `_all` is now deprecated and cannot be set for new indices, we should also
disallow any field that has the `include_in_all` parameter set.

Resolves #22923
2017-02-07 19:31:51 -07:00
AlexNodex fb8bdbc57a Update typo in date (#22955)
your example has yyy and it should be yyyy
2017-02-03 13:16:17 +01:00
Clinton Gormley 19ce039d2d Update type-field.asciidoc
Wildcard type names are not supported
2017-01-27 17:50:28 +01:00
Yannick Welsch 881993de3a [Docs] Remove outdated info about enabling/disabling doc_values (#22694) 2017-01-19 17:33:40 +01:00
Daniel Mitterdorfer aece89d6a1 Make boolean conversion strict (#22200)
This PR removes all leniency in the conversion of Strings to booleans: "true"
is converted to the boolean value `true`, "false" is converted to the boolean
value `false`. Everything else raises an error.
2017-01-19 07:59:18 +01:00
Scott Somerville 372812da98 Allow an index to be partitioned with custom routing (#22274)
This change makes it possible for custom routing values to go to a subset of shards rather than
just a single shard. This enables the ability to utilize the spatial locality that custom routing can
provide while mitigating the likelihood of ending up with an imbalanced cluster or suffering
from a hot shard.

This is ideal for large multi-tenant indices with custom routing that suffer from one or both of
the following:
- The big tenants cannot fit into a single shard or there is so many of them that they will likely
end up on the same shard
- Tenants often have a surge in write traffic and a single shard cannot process it fast enough

Beyond that, this should also be useful for use cases where most queries are done under the context
of a specific field (e.g. a category) since it gives a hint at how the data can be stored to minimize
the number of shards to check per query. While a similar solution can be achieved with multiple
concrete indices or aliases per value today, those approaches breakdown for high cardinality fields.

A partitioned index enforces that mappings have routing required, that the partition size does not
change when shrinking an index (the partitions will shrink proportionally), and rejects mappings
that have parent/child relationships.

Closes #21585
2017-01-18 08:51:23 +01:00
Alex a0c83c4511 Minor doc changes to clarify mapping index param for string type (#22652)
* Grammatical correction

* Add note for legacy string mapping type

* Update truncate token filter to not mention the keyword tokenizer

The advice predates the existence of the keyword field

Closes #22650
2017-01-17 16:43:11 +01:00
Lee Hinman 7a18bb50fc Disable _all by default
This change disables the _all meta field by default.

Now that we have the "all-fields" method of query execution, we can save both
indexing time and disk space by disabling it.

_all can no longer be configured for indices created after 6.0.

Relates to #20925 and #21341
Resolves #19784
2017-01-11 16:47:13 -07:00
Nik Everett 75d5b3d9eb Fix parent_id example in docs
And fix some indentation I noticed while looking up the query.
2017-01-10 10:01:31 -05:00
Clinton Gormley cb7952e71d Docs: Parent field is no longer indexed and should use parent_id instead of term query
Closes #22517
2017-01-10 13:48:07 +01:00