Commit Graph

115 Commits

Author SHA1 Message Date
Nik Everett 7aeea764ba Remove wait_for_status=yellow from the docs
It is no longer required after 687e2e12b3.
2016-07-15 16:02:07 -04:00
Clinton Gormley 6f17736eb1 Fixed asciidoc 2016-07-15 12:58:38 +02:00
Jim Ferenczi 881afcba60 Fixed tests that failed now that BM25 is the default similarity. 2016-06-21 15:42:42 +02:00
Nik Everett a0585269be [docs] s/lags/Flags/
Copy and paste lots an `F`.
2016-06-09 13:08:53 -04:00
Nik Everett 09cc4c449a [docs] Pattern replace char filter now support flags 2016-06-09 12:41:20 -04:00
Clinton Gormley 5da9e5dcbc Docs: Improved tokenizer docs (#18356)
* Docs: Improved tokenizer docs

Added descriptions and runnable examples

* Addressed Nik's comments

* Added TESTRESPONSEs for all tokenizer examples

* Added TESTRESPONSEs for all analyzer examples too

* Added docs, examples, and TESTRESPONSES for character filters

* Skipping two tests:

One interprets "$1" as a stack variable - same problem exists with the REST tests

The other because the "took" value is always different

* Fixed tests with "took"

* Fixed failing tests and removed preserve_original from fingerprint analyzer
2016-05-19 19:42:23 +02:00
Nik Everett 8155e1efda [docs] Add wait_for_status=yellow
Another unstable snippet....

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-os-compatibility/os=sles/402/console
2016-05-12 17:53:34 -04:00
Zachary Tong 5ee5cc25cc Move AsciiFolding earlier in FingerprintAnalyzer filter chain
Rearranges the FingerprintAnalyzer so that AsciiFolding comes earlier in the chain (after lowercasing, before stop removal, for maximum deduping power)

Closes #18266
2016-05-12 09:34:15 -04:00
Clinton Gormley 97a41ee973 First pass at improving analyzer docs (#18269)
* Docs: First pass at improving analyzer docs

I've rewritten the intro to analyzers plus the docs
for all analyzers to provide working examples.

I've also removed:

* analyzer aliases (see #18244)
* analyzer versions (see #18267)
* snowball analyzer (see #8690)

Next steps will be tokenizers, token filters, char filters

* Fixed two typos
2016-05-11 14:17:56 +02:00
Clinton Gormley 3f594089c2 Renamed all AUTOSENSE snippets to CONSOLE (#18210) 2016-05-09 15:42:23 +02:00
Nik Everett 3912761572 [docs] Add wait_until_yellow to fix build failure
The snippet in the docs creates and index and uses it with the
_analyze api. The trouble is that if the index hasn't been created
fully the _analyze API will fail. This adds a
GET _cluster/health?wait_for_status=yellow
which fixes the issue.

While this does make the docs more cluttered, it also makes the snippets
actually runnable.

Closes #18165
2016-05-05 16:02:00 -04:00
Nik Everett 4b1c116461 Generate and run tests from the docs
Adds infrastructure so `gradle :docs:check` will extract tests from
snippets in the documentation and execute the tests. This is included
in `gradle check` so it should happen on CI and during a normal build.

By default each `// AUTOSENSE` snippet creates a unique REST test. These
tests are executed in a random order and the cluster is wiped between
each one. If multiple snippets chain together into a test you can annotate
all snippets after the first with `// TEST[continued]` to have the
generated tests for both snippets joined.

Snippets marked as `// TESTRESPONSE` are checked against the response
of the last action.

See docs/README.asciidoc for lots more.

Closes #12583. That issue is about catching bugs in the docs during build.
This catches *some* bugs in the docs during build which is a good start.
2016-05-05 13:58:03 -04:00
Zachary Tong 80288ad60c Add `fingerprint` token filter and `fingerprint` analyzer
Adds a `fingerprint` token filter which uses Lucene's FingerprintFilter,
and a `fingerprint` analyzer that combines the Fingerprint filter with
lowercasing, stop word removal and asciifolding.

Closes #13325
2016-04-20 16:10:56 -04:00
Clinton Gormley a62b9296c6 Docs: Fixed link to phonetic plugin 2016-04-13 10:17:46 +02:00
Adrien Grand b42f66c8ac Document 5.0 mapping changes. 2016-03-22 16:22:58 +01:00
Clinton Gormley dc21ab7576 Docs: Corrected behaviour of max_token_length in standard tokenizer 2016-03-18 10:58:16 +01:00
Clinton Gormley a5a9bbfe88 Update compound-word-tokenfilter.asciidoc
Only FOP v1.2 compatible hyphenation files are supported by the hyphenation decompounder
2016-03-11 15:08:36 +01:00
Lee Hinman 6adbbff97c Fix organization rename in all files in project
Basically a query-replace of "https://github.com/elasticsearch/" with "https://github.com/elastic/"
2016-03-03 12:04:13 -07:00
Andrey Ryaguzov f744c3f724 Docs: Added migration description for custom analysis file path
Closes #15597
Closes #15556
2016-02-29 20:56:19 +01:00
Dongjoon Hyun 21ea552070 Fix typos in docs. 2016-02-09 02:07:32 -08:00
Adrien Grand f8e802c028 Merge pull request #15794 from damienalexandre/french-doc
[Doc] Fix french analyzer elision token filter doc
2016-01-06 18:39:26 +01:00
Damien Alexandre 23a64f8214 Fix french analyzer elision token filter doc
Fix #15774
2016-01-06 18:26:03 +01:00
David Pilato 995e796eab [doc] Fix cross link with ICU plugin
Doc bug introduced with #15695
2015-12-30 12:07:33 +01:00
David Pilato 3076377fdb Remove ICU Plugin in reference guide
This documentation lives now in plugins documentation at https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu.html.

We don't need a copy in analysis reference guide.
2015-12-29 11:23:28 +01:00
socurites 485915bbe7 comma(,) was duplicated
deleted it.
2015-12-24 14:31:26 +01:00
socurites 25d23091e2 Edge NGram: "side" setting was depercated
Edge NGram: "side" setting was depercated
2015-12-24 14:26:24 +01:00
Jason Tedor d9a24961c5 Fix minor issues in delimited payload token filter docs
This commit addresses a few minor issues in the delimited payload token
filter docs:
  - the provided example reversed the payloads associated with the
    tokens "the" and "fox"
  - two additional typos in the same sentence
    - "per default" -> "by default"
    - "default int to" -> "default into"
  - adds two serial commas
2015-12-16 13:00:20 -05:00
tomoya yokota 82d26c852a property name is not right
`ignore_script` is not right. `ignored_script' is right.

See org.elasticsearch.index.analysis.CJKBigramFilterFactory
2015-11-26 14:22:23 +09:00
Clinton Gormley 98028419a5 Merge pull request #14610 from yokotaso/patch-1
Update snowball document page.
2015-11-17 14:17:30 +01:00
Jason O'Donnell 42fb690a1c Fixing typo 2015-10-26 16:46:36 -04:00
Adrien Grand d3aa3565db Deprecate `index.analysis.analyzer.default_index` in favor of `index.analysis.analyzer.default`.
Close #11861
2015-10-12 22:19:16 +02:00
Clinton Gormley 1f76f49003 Update compound-word-tokenfilter.asciidoc
Improved the docs for compound work token filter.

Closes #13670
Closes #13595
2015-09-21 11:22:14 +02:00
Robert Muir f216d92d19 Upgrade to lucene 5.4-snapshot r1701068 2015-09-03 15:13:33 -04:00
Robert Muir 0d3e3f81fc Lithuanian analysis 2015-09-01 08:52:10 -04:00
xuzha fb2be6d6a1 The name "position_offset_gap" is confusing because Lucene has three
similar sounding things:

* Analyzer#getPositionIncrementGap
* Analyzer#getOffsetGap
* IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS and
* FieldType#storeTermVectorOffsets

Rename position_offset_gap to position_increment_gap
closes #13056
2015-08-26 14:56:35 -07:00
Nik Everett 4b9664beeb Mapping: Default position_offset_gap to 100
This is much more fiddly than you'd expect it to be because of the way
position_offset_gap is applied in StringFieldMapper. Instead of setting
the default to 100 its simpler to make sure that all the analyzers default
to 100 and that StringFieldMapper doesn't override the default unless the
user specifies something different. Unless the index was created before
2.1, in which case the old default of 0 has to take.

Also postition_offset_gaps less than 0 aren't allowed at all.

New tests test that:
1. the new default doesn't match phrases across values with reasonably low
slop (5)
2. the new default doest match phrases across values with reasonably high
slop (50)
3. you can override the value and phrases work as you'd expect
4. if you leave the value undefined in the mapping and define it on a
custom analyzer the the value from the custom analyzer shines through

Closes #7268
2015-08-25 14:21:50 -04:00
Clinton Gormley 2b512f1f29 Docs: Use "js" instead of "json" and "sh" instead of "shell" for source highlighting 2015-07-14 18:14:09 +02:00
Britta Weber eeeb29f900 spell correct and add single quotes 2015-05-26 11:41:19 +02:00
Britta Weber 37782c1745 analyzers: custom analyzers names and aliases must not start with _
closes #9596
2015-05-26 11:38:15 +02:00
Clinton Gormley 3a69b65e88 Docs: Fixed the backslash escaping on the pattern analyzer docs
Closes #11099
2015-05-15 18:40:16 +02:00
Ryan Ernst ba68d354c4 Merge pull request #10934 from mattweber/custom_analyzer_pos_offset_gap
document and test custom analyzer position offset gap
2015-05-04 08:56:50 -07:00
Matt Weber 63c4a214db document and test custom analyzer position offset gap 2015-05-04 08:53:45 -07:00
Robert Muir 4b3672b7df Add migration note for hunspell dictionaries 2015-05-04 10:00:05 -04:00
Clinton Gormley cf177c32d4 Docs: Fixed pattern-capture token filter example
Closes #10690
2015-04-25 19:27:55 +02:00
Benoit Delbosc 4a94e1f14b Docs: Warning about the conflict with the Standard Tokenizer
The examples given requires a specific Tokenizer to work.

Closes: 10645
2015-04-23 21:16:30 +02:00
Glen Smith 3d5fbfb997 Docs: Update pattern-replace-charfilter.asciidoc
Remove invalid trailing comma from json

Closes #9477
2015-01-29 20:24:08 +01:00
Lee Hinman 2f6527f491 [DOCS] Update documentation for `max_token_length`
In 1.4 the behavior is different due to
https://issues.apache.org/jira/browse/LUCENE-5897
2015-01-27 13:52:14 -07:00
David Haney 395960feef Docs: Updated standard token filter docs to indicate true behavior: doing nothing
Closes #9300
2015-01-15 21:33:29 +01:00
Tomoya Hirano 15d46988dc Fix typo in sample json
Fixes #9253
2015-01-12 15:58:16 +00:00
Michael McCandless dfb6d6081c Core: upgrade to current Lucene 5.0.0 snapshot
Elasticsearch no longer unlocks the Lucene index on startup (this was
dangerous, and could possibly lead to corruption).

Added the new serbian_normalization TokenFilter from Lucene.

NoLockFactory is no longer supported (index.store.fs.fs_lock = none),
and if you have a typo in your fs_lock you'll now hit a StoreException
instead of silently using NoLockFactory.

Closes #8588
2014-11-24 05:08:42 -05:00