Commit Graph

121 Commits

Author SHA1 Message Date
Clinton Gormley 22f1acde94 Docs: Pattern analyzer does not support a max_token_length parameter
Closes #20713
2016-10-08 12:27:33 +02:00
Alexander Lin 7cd0316b51 Fix minhash docs level
Relates #20547
2016-09-19 07:54:04 -04:00
Clinton Gormley 2f6d0119f1 Added warning messages about the dangers of pathological regexes to:
* pattern-replace charfilter
* pattern-capture and pattern-replace token filters
* pattern tokenizer
* pattern analyzer

Relates to #20038
2016-09-09 09:53:07 +02:00
Alexander Lin f825e8f4cb Exposing lucene 6.x minhash filter. (#20206)
Exposing lucene 6.x minhash tokenfilter

Generate min hash tokens from an incoming stream of tokens that can
be used to estimate document similarity.

Closes #20149
2016-09-07 09:38:12 +02:00
Jim Ferenczi 4682fc34ae Add the ability to disable the retrieval of the stored fields entirely
This change adds a special field named _none_ that allows to disable the retrieval of the stored fields in a search request or in a TopHitsAggregation.

To completely disable stored fields retrieval (including disabling metadata fields retrieval such as _id or _type) use _none_ like this:

````
POST _search
{
   "stored_fields": "_none_"
}
````
2016-08-24 16:40:08 +02:00
markwalkom f556424ab9 Update synonym-tokenfilter.asciidoc (#19988)
* Update synonym-tokenfilter.asciidoc

* Update synonym-tokenfilter.asciidoc
2016-08-17 13:39:22 +02:00
Nik Everett 7aeea764ba Remove wait_for_status=yellow from the docs
It is no longer required after 687e2e12b3.
2016-07-15 16:02:07 -04:00
Clinton Gormley 6f17736eb1 Fixed asciidoc 2016-07-15 12:58:38 +02:00
Jim Ferenczi 881afcba60 Fixed tests that failed now that BM25 is the default similarity. 2016-06-21 15:42:42 +02:00
Nik Everett a0585269be [docs] s/lags/Flags/
Copy and paste lots an `F`.
2016-06-09 13:08:53 -04:00
Nik Everett 09cc4c449a [docs] Pattern replace char filter now support flags 2016-06-09 12:41:20 -04:00
Clinton Gormley 5da9e5dcbc Docs: Improved tokenizer docs (#18356)
* Docs: Improved tokenizer docs

Added descriptions and runnable examples

* Addressed Nik's comments

* Added TESTRESPONSEs for all tokenizer examples

* Added TESTRESPONSEs for all analyzer examples too

* Added docs, examples, and TESTRESPONSES for character filters

* Skipping two tests:

One interprets "$1" as a stack variable - same problem exists with the REST tests

The other because the "took" value is always different

* Fixed tests with "took"

* Fixed failing tests and removed preserve_original from fingerprint analyzer
2016-05-19 19:42:23 +02:00
Nik Everett 8155e1efda [docs] Add wait_for_status=yellow
Another unstable snippet....

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-os-compatibility/os=sles/402/console
2016-05-12 17:53:34 -04:00
Zachary Tong 5ee5cc25cc Move AsciiFolding earlier in FingerprintAnalyzer filter chain
Rearranges the FingerprintAnalyzer so that AsciiFolding comes earlier in the chain (after lowercasing, before stop removal, for maximum deduping power)

Closes #18266
2016-05-12 09:34:15 -04:00
Clinton Gormley 97a41ee973 First pass at improving analyzer docs (#18269)
* Docs: First pass at improving analyzer docs

I've rewritten the intro to analyzers plus the docs
for all analyzers to provide working examples.

I've also removed:

* analyzer aliases (see #18244)
* analyzer versions (see #18267)
* snowball analyzer (see #8690)

Next steps will be tokenizers, token filters, char filters

* Fixed two typos
2016-05-11 14:17:56 +02:00
Clinton Gormley 3f594089c2 Renamed all AUTOSENSE snippets to CONSOLE (#18210) 2016-05-09 15:42:23 +02:00
Nik Everett 3912761572 [docs] Add wait_until_yellow to fix build failure
The snippet in the docs creates and index and uses it with the
_analyze api. The trouble is that if the index hasn't been created
fully the _analyze API will fail. This adds a
GET _cluster/health?wait_for_status=yellow
which fixes the issue.

While this does make the docs more cluttered, it also makes the snippets
actually runnable.

Closes #18165
2016-05-05 16:02:00 -04:00
Nik Everett 4b1c116461 Generate and run tests from the docs
Adds infrastructure so `gradle :docs:check` will extract tests from
snippets in the documentation and execute the tests. This is included
in `gradle check` so it should happen on CI and during a normal build.

By default each `// AUTOSENSE` snippet creates a unique REST test. These
tests are executed in a random order and the cluster is wiped between
each one. If multiple snippets chain together into a test you can annotate
all snippets after the first with `// TEST[continued]` to have the
generated tests for both snippets joined.

Snippets marked as `// TESTRESPONSE` are checked against the response
of the last action.

See docs/README.asciidoc for lots more.

Closes #12583. That issue is about catching bugs in the docs during build.
This catches *some* bugs in the docs during build which is a good start.
2016-05-05 13:58:03 -04:00
Zachary Tong 80288ad60c Add `fingerprint` token filter and `fingerprint` analyzer
Adds a `fingerprint` token filter which uses Lucene's FingerprintFilter,
and a `fingerprint` analyzer that combines the Fingerprint filter with
lowercasing, stop word removal and asciifolding.

Closes #13325
2016-04-20 16:10:56 -04:00
Clinton Gormley a62b9296c6 Docs: Fixed link to phonetic plugin 2016-04-13 10:17:46 +02:00
Adrien Grand b42f66c8ac Document 5.0 mapping changes. 2016-03-22 16:22:58 +01:00
Clinton Gormley dc21ab7576 Docs: Corrected behaviour of max_token_length in standard tokenizer 2016-03-18 10:58:16 +01:00
Clinton Gormley a5a9bbfe88 Update compound-word-tokenfilter.asciidoc
Only FOP v1.2 compatible hyphenation files are supported by the hyphenation decompounder
2016-03-11 15:08:36 +01:00
Lee Hinman 6adbbff97c Fix organization rename in all files in project
Basically a query-replace of "https://github.com/elasticsearch/" with "https://github.com/elastic/"
2016-03-03 12:04:13 -07:00
Andrey Ryaguzov f744c3f724 Docs: Added migration description for custom analysis file path
Closes #15597
Closes #15556
2016-02-29 20:56:19 +01:00
Dongjoon Hyun 21ea552070 Fix typos in docs. 2016-02-09 02:07:32 -08:00
Adrien Grand f8e802c028 Merge pull request #15794 from damienalexandre/french-doc
[Doc] Fix french analyzer elision token filter doc
2016-01-06 18:39:26 +01:00
Damien Alexandre 23a64f8214 Fix french analyzer elision token filter doc
Fix #15774
2016-01-06 18:26:03 +01:00
David Pilato 995e796eab [doc] Fix cross link with ICU plugin
Doc bug introduced with #15695
2015-12-30 12:07:33 +01:00
David Pilato 3076377fdb Remove ICU Plugin in reference guide
This documentation lives now in plugins documentation at https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu.html.

We don't need a copy in analysis reference guide.
2015-12-29 11:23:28 +01:00
socurites 485915bbe7 comma(,) was duplicated
deleted it.
2015-12-24 14:31:26 +01:00
socurites 25d23091e2 Edge NGram: "side" setting was depercated
Edge NGram: "side" setting was depercated
2015-12-24 14:26:24 +01:00
Jason Tedor d9a24961c5 Fix minor issues in delimited payload token filter docs
This commit addresses a few minor issues in the delimited payload token
filter docs:
  - the provided example reversed the payloads associated with the
    tokens "the" and "fox"
  - two additional typos in the same sentence
    - "per default" -> "by default"
    - "default int to" -> "default into"
  - adds two serial commas
2015-12-16 13:00:20 -05:00
tomoya yokota 82d26c852a property name is not right
`ignore_script` is not right. `ignored_script' is right.

See org.elasticsearch.index.analysis.CJKBigramFilterFactory
2015-11-26 14:22:23 +09:00
Clinton Gormley 98028419a5 Merge pull request #14610 from yokotaso/patch-1
Update snowball document page.
2015-11-17 14:17:30 +01:00
Jason O'Donnell 42fb690a1c Fixing typo 2015-10-26 16:46:36 -04:00
Adrien Grand d3aa3565db Deprecate `index.analysis.analyzer.default_index` in favor of `index.analysis.analyzer.default`.
Close #11861
2015-10-12 22:19:16 +02:00
Clinton Gormley 1f76f49003 Update compound-word-tokenfilter.asciidoc
Improved the docs for compound work token filter.

Closes #13670
Closes #13595
2015-09-21 11:22:14 +02:00
Robert Muir f216d92d19 Upgrade to lucene 5.4-snapshot r1701068 2015-09-03 15:13:33 -04:00
Robert Muir 0d3e3f81fc Lithuanian analysis 2015-09-01 08:52:10 -04:00
xuzha fb2be6d6a1 The name "position_offset_gap" is confusing because Lucene has three
similar sounding things:

* Analyzer#getPositionIncrementGap
* Analyzer#getOffsetGap
* IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS and
* FieldType#storeTermVectorOffsets

Rename position_offset_gap to position_increment_gap
closes #13056
2015-08-26 14:56:35 -07:00
Nik Everett 4b9664beeb Mapping: Default position_offset_gap to 100
This is much more fiddly than you'd expect it to be because of the way
position_offset_gap is applied in StringFieldMapper. Instead of setting
the default to 100 its simpler to make sure that all the analyzers default
to 100 and that StringFieldMapper doesn't override the default unless the
user specifies something different. Unless the index was created before
2.1, in which case the old default of 0 has to take.

Also postition_offset_gaps less than 0 aren't allowed at all.

New tests test that:
1. the new default doesn't match phrases across values with reasonably low
slop (5)
2. the new default doest match phrases across values with reasonably high
slop (50)
3. you can override the value and phrases work as you'd expect
4. if you leave the value undefined in the mapping and define it on a
custom analyzer the the value from the custom analyzer shines through

Closes #7268
2015-08-25 14:21:50 -04:00
Clinton Gormley 2b512f1f29 Docs: Use "js" instead of "json" and "sh" instead of "shell" for source highlighting 2015-07-14 18:14:09 +02:00
Britta Weber eeeb29f900 spell correct and add single quotes 2015-05-26 11:41:19 +02:00
Britta Weber 37782c1745 analyzers: custom analyzers names and aliases must not start with _
closes #9596
2015-05-26 11:38:15 +02:00
Clinton Gormley 3a69b65e88 Docs: Fixed the backslash escaping on the pattern analyzer docs
Closes #11099
2015-05-15 18:40:16 +02:00
Ryan Ernst ba68d354c4 Merge pull request #10934 from mattweber/custom_analyzer_pos_offset_gap
document and test custom analyzer position offset gap
2015-05-04 08:56:50 -07:00
Matt Weber 63c4a214db document and test custom analyzer position offset gap 2015-05-04 08:53:45 -07:00
Robert Muir 4b3672b7df Add migration note for hunspell dictionaries 2015-05-04 10:00:05 -04:00
Clinton Gormley cf177c32d4 Docs: Fixed pattern-capture token filter example
Closes #10690
2015-04-25 19:27:55 +02:00