OpenSearch

Commit Graph

Author	SHA1	Message	Date
Christoph Büscher	254c1b28e9	[Docs] Clarify behaviour of Pattern Capture Token Filter during search (#26278 ) There was some confusion about the fact that tokens emitted from a Pattern Capture Token Filter are treated as synonyms when used to analyze a search query. This commit adds an explanation to the note in the docs to emphasize this behaviour. Closes #25746	2017-08-21 14:56:52 +02:00
Clinton Gormley	ff4a2519f2	Update experimental labels in the docs (#25727 ) Relates https://github.com/elastic/elasticsearch/issues/19798 Removed experimental label from: * Painless * Diversified Sampler Agg * Sampler Agg * Significant Terms Agg * Terms Agg document count error and execution_hint * Cardinality Agg precision_threshold * Pipeline Aggregations * index.shard.check_on_startup * index.store.type (added warning) * Preloading data into the file system cache * foreach ingest processor * Field caps API * Profile API Added experimental label to: * Moving Average Agg Prediction Changed experimental to beta for: * Adjacency matrix agg * Normalizers * Tasks API * Index sorting Labelled experimental in Lucene: * ICU plugin custom rules file * Flatten graph token filter * Synonym graph token filter * Word delimiter graph token filter * Simple pattern tokenizer * Simple pattern split tokenizer Replaced experimental label with warning that details may change in the future: * Analysis explain output format * Segments verbose output format * Percentile Agg compression and HDR Histogram * Percentile Rank Agg HDR Histogram	2017-07-18 14:06:22 +02:00
Neil Rickards	5189bd14f1	[Docs] Fix typo in pattern-tokenizer.asciidoc (#25626 )	2017-07-13 18:43:48 +02:00
Simon Willnauer	e81804cfa4	Add a shard filter search phase to pre-filter shards based on query rewriting (#25658 ) Today if we search across a large number of shards we hit every shard. Yet it's quite common to search across an index pattern for time-based indices while a filter excludes all results outside a certain time range, e.g. `now-3d`. While the search can potentially hit hundreds of shards, the majority of the shards might yield 0 results since there is no document within this date range. Kibana, for instance, does this regularly but used `_field_stats` to optimize the indexes they need to query. Now, with the deprecation of `_field_stats` and its upcoming removal, a single dashboard in Kibana can potentially turn into searches hitting hundreds or thousands of shards, and that can easily cause search rejections even though most of the requests are very likely super cheap and only need a query rewrite to terminate early with 0 results. This change adds a pre-filter phase for searches that can, if the number of shards is higher than the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the Kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further, it's completely transparent to the user and improves the scalability of Elasticsearch in general on large clusters.	2017-07-12 22:19:20 +02:00
Jun Ohtani	62d1969595	Parse synonyms with the same analysis chain (#8049 ) * [Analysis] Parse synonyms with the same analysis chain Synonym Token Filter / Synonym Graph Filter tokenize synonyms with whatever tokenizer and token filters appear before it in the chain. Close #7199	2017-06-20 21:50:33 +09:00
Andy Bristol	4c5bd57619	Rename simple pattern tokenizers (#25300 ) Changed names to be snake case for consistency Related to #25159, original issue #23363	2017-06-19 13:48:43 -07:00
debadair	c161d90524	[DOCS] Defined es-test-dir and plugins-examples-dir in index.asciidoc. (#25232 ) Use these attributes when specifying the location of included tests.	2017-06-15 08:54:10 -07:00
Adrien Grand	0c117145f6	Upgrade to lucene-7.0.0-snapshot-92b1783. (#25222 ) This snapshot has faster range queries on range fields (LUCENE-7828), more accurate norms (LUCENE-7730) and the ability to use fake term frequencies (LUCENE-7854).	2017-06-15 09:52:07 +02:00
Andy Bristol	48696ab544	expose simple pattern tokenizers (#25159 ) Expose the experimental simplepattern and simplepatternsplit tokenizers in the common analysis plugin. They provide tokenization based on regular expressions, using Lucene's deterministic regex implementation that is usually faster than Java's and has protections against creating too-deep stacks during matching. Both have a not-very-useful default pattern of the empty string because all tokenizer factories must be able to be instantiated at index creation time. They should always be configured by the user in practice.	2017-06-13 12:46:59 -07:00
Jim Ferenczi	2508df6cc8	Add missing link for the WordDelimiterGraphFilter	2017-04-28 17:12:38 +02:00
Adrien Grand	1be2800120	Only allow one type on 7.0 indices (#24317 ) This adds the `index.mapping.single_type` setting, which enforces that indices have at most one type when it is true. The default value is true for 6.0+ indices and false for old indices. Relates #15613	2017-04-27 08:43:20 +02:00
Nik Everett	ad69503dce	CONSOLEify analysis docs Converts the analysis docs that were marked as json into `CONSOLE` format. A few of them were in yaml but marked as json for historical reasons. I added more complete examples for a few of the less obvious sounding ones. Relates to #18160	2017-04-02 11:17:14 -04:00
Nik Everett	514187be8e	Fix language in some docs The pattern-analyzer docs contained a snippet that was an expanded regex that was marked as `[source,js]`. This changes it to `[source,regex]`. The htmlstrip-charfilter and pattern-replace-charfilter docs had examples that were actually a list of tokens but marked `[source,js]`. This marks them as `[source,text]` so they don't count as unconverted CONSOLE snippets. The pattern-replace-charfilter also had a doc whose test was skipped because of a funny interaction with the test framework. This fixes the test. Three more down, eighty-two to go. Relates to #18160	2017-04-01 14:45:44 -04:00
Nik Everett	9baa48a928	CONSOLEify lang-analyzer docs CONSOLEifies the lang-analyzer docs and replaces the (invalid) empty `keyword_marker` setups that were on the page with one that contains the word "example" translated into the appropriate language. Relates to #18160	2017-04-01 14:21:58 -04:00
Abdon Pijpelink	ef1329727d	Update compound-word-tokenfilter.asciidoc (#23817 ) Updated URL to OFFO Sourceforge project	2017-03-30 12:27:32 +02:00
Ali Beyad	2120086d82	Adds pattern keyword marker filter support (#23600 ) This commit adds support for the pattern keyword marker filter in Lucene. Previously, the keyword marker filter in Elasticsearch supported specifying a keywords set or a path to a set of keywords. This commit exposes the regular expression pattern based keyword marker filter also available in Lucene, so that any token matching the pattern specified by the `keywords_pattern` setting is excluded from being stemmed by any stemming filters. Closes #4877	2017-03-28 11:13:34 -04:00
Nik Everett	a783c6c85c	CONSOLEify some more docs And expand on the `stemmer_override` examples, including the file on disk and an example of specifying the rules inline. Relates to #18160	2017-03-22 17:58:06 -04:00
Nik Everett	e860fe7363	CONSOLEify some more docs Relates to #18160	2017-03-22 17:15:14 -04:00
Nik Everett	1dee2f32a4	Docs: CONSOLEify synonym tokenfilter docs Relates to #18160	2017-03-22 16:30:52 -04:00
Nik Everett	1c1b29400b	Docs: Fix language on a few snippets They aren't `js`, they are their own thing. Relates to #18160	2017-03-22 15:57:28 -04:00
Jim Ferenczi	63bdd01eb7	Expose WordDelimiterGraphTokenFilter (#23327 ) This change exposes the new Lucene graph based word delimiter token filter in the analysis filters. Unlike the `word_delimiter` this token filter named `word_delimiter_graph` correctly handles multi terms expansion at query time. Closes #23104	2017-02-24 00:53:38 +01:00
markwalkom	ced99dde50	Update stop-analyzer.asciidoc (#23195 ) Clarified where the stopwords file needs to live	2017-02-16 13:36:15 +01:00
Adrien Grand	f3509b8003	Consolify docs/reference/analysis/tokenfilters/pattern-capture-tokenfilter.asciidoc. (#23050 )	2017-02-13 11:00:12 +01:00
Clinton Gormley	f5e7c25e24	Update normalizers.asciidoc analyzers -> normalizers	2017-02-07 12:09:39 +01:00
Shubham Aggarwal	e07e4cc4dd	Fix incorrect heading for Whitespace Tokenizer (#22883 )	2017-01-31 12:51:37 +01:00
Daniel Mitterdorfer	aece89d6a1	Make boolean conversion strict (#22200 ) This PR removes all leniency in the conversion of Strings to booleans: "true" is converted to the boolean value `true`, "false" is converted to the boolean value `false`. Everything else raises an error.	2017-01-19 07:59:18 +01:00
Michael McCandless	1d1bdd476c	Finish exposing FlattenGraphTokenFilter (#22667 )	2017-01-18 11:05:34 -05:00
Clinton Gormley	519a9c469d	Update truncate token filter to not mention the keyword tokenizer The advice predates the existence of the keyword field Closes #22650	2017-01-17 12:15:22 +01:00
Matt Weber	609d2aab15	QueryString and SimpleQueryString Graph Support (#22541 ) Add support for graph token streams to "query_string" and "simple_query_string" queries.	2017-01-11 18:59:43 +01:00
Achraf	5dc85c25d9	Hindu-Arabico-Latino Numerals (#22476 ) Hi, same edit as for : https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer-anatomy.html	2017-01-10 15:24:56 +01:00
Adrien Grand	3f805d68cb	Add the ability to set an analyzer on keyword fields. (#21919 ) This adds a new `normalizer` property to `keyword` fields that pre-processes the field value prior to indexing, but without altering the `_source`. Note that only the normalization components that work on a per-character basis are applied, so for instance stemming filters will be ignored while lowercasing or ascii folding will be applied. Closes #18064	2016-12-30 09:36:10 +01:00
Francesc Gil	dec6fc2d40	Repeated language analyzers (#22240 ) * Repeated language analyzers The `catalan` analyzer was repeated on the supported list :) * Reordered the languages to have alphabetic order * Added space for format * Reordered the languages and removed repeated	2016-12-21 17:32:02 +01:00
Thibault Pierre	e494d6a94e	Fix wrong link (#22019 )	2016-12-07 17:58:46 +01:00
Allen Torres	887fbb6387	Update lowercase-tokenizer.asciidoc (#21896 ) Fixed typo	2016-12-02 10:49:51 -05:00
Matt Weber	04e07bcdb6	Synonym Graph Support (LUCENE-6664) (#21517 ) Integrate the patch from LUCENE-6664 into elasticsearch and add support for handling a graph token stream in match/multi-match queries. This fixes longstanding bugs with multi-token synonyms returning incorrect results with proximity queries.	2016-11-28 09:25:49 -08:00
Achraf	d81a928b1f	Correction of the names of numerals (#21531 ) What was called Arabic numerals is actually Hindu - Eastern Arabic notation. And the Latin numerals you refer to are the Arabic numbers.	2016-11-25 14:30:49 +01:00
Pascal Borreli	fcb01deb34	Fixed typos (#20843 )	2016-10-10 14:51:47 -06:00
Clinton Gormley	22f1acde94	Docs: Pattern analyzer does not support a max_token_length parameter Closes #20713	2016-10-08 12:27:33 +02:00
Alexander Lin	7cd0316b51	Fix minhash docs level Relates #20547	2016-09-19 07:54:04 -04:00
Clinton Gormley	2f6d0119f1	Added warning messages about the dangers of pathological regexes to: * pattern-replace charfilter * pattern-capture and pattern-replace token filters * pattern tokenizer * pattern analyzer Relates to #20038	2016-09-09 09:53:07 +02:00
Alexander Lin	f825e8f4cb	Exposing lucene 6.x minhash filter. (#20206 ) Exposing lucene 6.x minhash tokenfilter Generate min hash tokens from an incoming stream of tokens that can be used to estimate document similarity. Closes #20149	2016-09-07 09:38:12 +02:00
Jim Ferenczi	4682fc34ae	Add the ability to disable the retrieval of the stored fields entirely This change adds a special field named `_none_` that disables the retrieval of the stored fields in a search request or in a TopHitsAggregation. To completely disable stored fields retrieval (including disabling retrieval of metadata fields such as `_id` or `_type`) use `_none_` like this: `POST _search { "stored_fields": "_none_" }`	2016-08-24 16:40:08 +02:00
markwalkom	f556424ab9	Update synonym-tokenfilter.asciidoc (#19988 ) * Update synonym-tokenfilter.asciidoc * Update synonym-tokenfilter.asciidoc	2016-08-17 13:39:22 +02:00
Nik Everett	7aeea764ba	Remove wait_for_status=yellow from the docs It is no longer required after `687e2e12b3`.	2016-07-15 16:02:07 -04:00
Clinton Gormley	6f17736eb1	Fixed asciidoc	2016-07-15 12:58:38 +02:00
Jim Ferenczi	881afcba60	Fixed tests that failed now that BM25 is the default similarity.	2016-06-21 15:42:42 +02:00
Nik Everett	a0585269be	[docs] s/lags/Flags/ Copy and paste lost an `F`.	2016-06-09 13:08:53 -04:00
Nik Everett	09cc4c449a	[docs] Pattern replace char filter now support flags	2016-06-09 12:41:20 -04:00
Clinton Gormley	5da9e5dcbc	Docs: Improved tokenizer docs (#18356 ) * Docs: Improved tokenizer docs Added descriptions and runnable examples * Addressed Nik's comments * Added TESTRESPONSEs for all tokenizer examples * Added TESTRESPONSEs for all analyzer examples too * Added docs, examples, and TESTRESPONSES for character filters * Skipping two tests: One interprets "$1" as a stack variable - same problem exists with the REST tests The other because the "took" value is always different * Fixed tests with "took" * Fixed failing tests and removed preserve_original from fingerprint analyzer	2016-05-19 19:42:23 +02:00
Nik Everett	8155e1efda	[docs] Add wait_for_status=yellow Another unstable snippet.... https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-os-compatibility/os=sles/402/console	2016-05-12 17:53:34 -04:00

158 Commits