Commit Graph

1916 Commits

Author SHA1 Message Date
Boaz Leskes 3c5106ae98 Added cluster health status to the Cluster Stats API
Relates to #4460
2013-12-18 12:03:49 +01:00
Chris Simpson 4f8c916eed [Docs] Fix Typo
Fixes small typo in the geo_distance aggregation docs.
2013-12-18 11:21:21 +01:00
Boaz Leskes 2b6214cff7 Added Cluster Stats API
Closes #4460
2013-12-17 13:14:46 +01:00
Grégory Quatannens c64abaae7e Fixing typo and grammar 2013-12-17 11:39:02 +01:00
Adrien Grand 33599d9a34 Compressed geo-point field data.
This commit allows to trade precision for memory when storing geo points.
This new field data impl accepts a `precision` parameter that controls the
maximum expected error for storing coordinates. This option can be updated on
a live index with the PUT mapping API.

Default precision is 1cm, which requires 8 bytes per geo-point (50% memory
saving compared to using 2 doubles).

Close #4386
2013-12-17 11:29:48 +01:00
Clinton Gormley 684affa5c7 [DOCS] Removed unused file 2013-12-17 11:28:19 +01:00
Alexander Reelsen b713cf56ed Allow to provide parameters not only through -D but as long parameters
All getopt long style parameters are now set as es. properties,

elasticsearch --path.data=/some/path

results in -Des.path.data=/some/path

Closes #4393
2013-12-17 10:43:27 +01:00
Alexander Reelsen c30945a3d8 Start elasticsearch in the foreground by default
Instead of using the '-f' parameter to start elasticsearch in the
foreground, this is now the default modus.

In order to start elasticsearch in the background, the '-d' parameter
can be used.

Closes #4392
2013-12-17 10:39:22 +01:00
Clinton Gormley 34b9b16233 [DOCS] Fixed some bad link refs 2013-12-16 18:07:33 +01:00
Martijn van Groningen 23d2b1ea7b Renamed top level `filter` to `post_filter`.
Closes #4119
2013-12-16 17:10:14 +01:00
Lee Hinman db431b7cb3 Remove the `field` and `text` queries.
The `text` query was replaced by the `match` query and has been
deprecated for quite a while.

The `field` query should be replaced by a `query_string` query with
the `default_field` specified.

Fixes #4033
2013-12-16 08:59:36 -07:00
Adrien Grand 4e7ce4ee02 Make field data changes immediately taken into account and add the ability to disallow field data loading.
This commit changes field data configuration updates so that they are
immediately taken into account for loading new segments. The way it works
is that field data configuration is now cached separately from the field
data cache, meaning that it is now possible to clear the field data
configuration from IndexFieldDataService while the cache will stay around. On
the next time that Elasticsearch will reload field data configuration, it will
check if there is already a cache entry, and reuse it if it exists.

To disable field data loading, all that is required is to change the field
data format to "none" (supported by all field data types) using the update
mapping API. Elasticsearch will then refuse to load field data on any new
segment, but field data which has been loaded on the previous segments will
remain available. So you need to clear the field data cache in order to
reclaim memory (otherwise memory will be reclaimed slower, as segments get
merged).

Close #4430
Close #4431
2013-12-16 14:34:33 +01:00
Adrien Grand 36bd9cc432 Aggregations: Ordinals-based string bucketing support.
When the ValuesSource has ordinals, terms ordinals are used as a cache key to
bucket ordinals. This can make terms aggregations on String terms significantly
faster.

Close #4350
2013-12-13 15:34:02 +01:00
Martijn van Groningen 10e2528cce Added the `force_source` option to highlighting that enforces to use of the _source even if there are stored fields.
The percolator uses this option to deal with the fact that the MemoryIndex doesn't support stored fields,
this is possible b/c the _source of the document being percolated is always present.

Closes #4348
2013-12-13 13:39:53 +01:00
Lee Hinman 77fcf71338 Add new `simple_query_string` query type
This adds support for Lucene's SimpleQueryParser by adding a new type
of query called the `simple_query_string`. The `simple_query_string`
query is designed to be able to parse human-entered queries without
throwing any exceptions.

Resolves #4159
2013-12-12 12:09:32 -07:00
Alexander Reelsen 81e13a870b Packaging: Ensure setting of sysctl vm.max_map_count
In order to be sure that memory mapped lucene directories are working
one can configure the kernel about how many memory mapped areas
a process may have. This setting ensure for the debian and redhat initscripts
as well as the systemd startup, that this setting is set high enough.

Closes #4397
2013-12-11 09:19:22 +01:00
Boaz Leskes 99b421925f Add wildcard support to field resolving in the Get Field Mapping API
Closes #4367
2013-12-10 23:46:37 +01:00
Simon Willnauer 6c189310b9 Remove 'term_index_interval' and 'term_index_divisor'
These settings are no longer relevant since they are codec /
postingsformat level settings since Lucene 4.0

Closes #3912
2013-12-10 16:54:08 +01:00
Martijn van Groningen ebf6519965 Added aggs option to percolate api documentation. 2013-12-10 14:09:37 +01:00
Lee Hinman bc9698a347 Support 'yaml' as a format for the Analyze API
Fixes #4311
2013-12-08 15:08:00 -07:00
Martijn van Groningen 8c1de501e7 Update percolator highlighting docs. 2013-12-07 16:40:49 -05:00
Adrien Grand 32eb5ffa92 [Docs] Document which encoding should be used in order to make sense of the offsets returned by the term vectors API.
Close #4363
2013-12-06 22:39:08 +01:00
Shay Banon 28eff2ba29 remove help command, list all cat commands in /_cat?h endpoint 2013-12-05 14:36:27 +01:00
Markus Fischer 2da0611dfb [DOCS] Completion suggest: Clarify de-duplication, optimize/merge
This contribution is based on the feedback given in issue #4254 and
issue #4255, and should clear things up, when suggestions are being
removed and not displayed anymore after deletion of data.
2013-12-05 11:10:56 +01:00
Nik Everett 8e34057bc0 Add support for combining fields to the FVH
The Fast Vector Highlighter can combine matches on multiple fields to
highlight a single field using `matched_fields`.  This is most
intuitive for multifields that analyze the same string in different
ways.  Example:
{
    "query": {
        "query_string": {
            "query": "content.plain:running scissors",
            "fields": ["content"]
        }
    },
    "highlight": {
        "order": "score",
        "fields": {
            "content": {
                "matched_fields": ["content", "content.plain"],
                "type" : "fvh"
            }
        }
    }
}

Closes #3750
2013-12-03 11:10:01 +01:00
Yousef 302c762d5e Wrong link to Token Filter 2013-12-03 10:39:13 +01:00
Nik Everett 7690b40ec6 Allow string fields to store token counts
To use this one you send a string to a field of type 'token_count'.  This
makes the most sense with a multi-field.
2013-12-03 09:39:32 +01:00
Alexander Reelsen 6528df2764 [DOCS] Test framework documentation
The java test framework using randomized testing is explained with a couple of examples.
2013-12-02 18:01:45 +01:00
Clinton Gormley 7d993fd917 [DOCS] Another cat?v change 2013-12-02 15:30:49 +01:00
Clinton Gormley 5b15ed73fa [DOCS] Linked cat-pending to cluster-pending 2013-12-02 15:29:47 +01:00
Clinton Gormley 992b2d82b0 [DOCS] Changed the _cat docs to use ?v instead of ?v=true 2013-12-02 15:27:41 +01:00
Clinton Gormley d9a480c97a [DOCS] Typos in aggregations 2013-12-02 15:14:25 +01:00
Conrad Pankoff 87246af256 [DOCS] Fixed typos and corrected grammar 2013-12-02 10:08:26 +01:00
uboness cdc7dfbb2c Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs. 2013-12-02 02:01:03 +01:00
Clinton Gormley bc393b6d79 Changed the minScore comparator from > to >=
Closes #4303
2013-11-29 20:29:20 +01:00
uboness 0d6a35b9a7 - Added support for term filtering based on include/exclude regex on the terms agg
- Added javadoc to the TermsBuilder

Closes #4267
2013-11-29 13:46:48 +01:00
uboness afb0d119e4 - Added docs for the value_count aggregation
- Fixed typos in the terms facets docs
- Fixed aggregation docs layout
- Added docs for shard_size in term aggregation
2013-11-29 12:35:42 +01:00
Clinton Gormley b48344f296 [DOCS] Doc'ed cluster pending tasks 2013-11-29 08:21:26 +01:00
Andrew Raines 91999e14ce Add _cat/pending_tasks.
Closes #4251.
2013-11-29 01:09:06 -06:00
Lee Hinman 9939e81d88 [DOCS] Fix porter stem filter name in other stemming docs 2013-11-28 22:14:47 -07:00
Lee Hinman fb4e903e35 [DOCS] Fix name of porter stemming token filter 2013-11-28 22:01:19 -07:00
Clinton Gormley 6ce3495029 [DOCS] Fixed a bad link 2013-11-27 17:54:25 +01:00
Clinton Gormley cdc1935b6e [DOCS] Documented rest.action.multi.allow_explicit_index 2013-11-27 17:33:09 +01:00
Boaz Leskes c63d8c4fb5 [Docs] Added _source filtering to documentation
Relates to #3301
2013-11-26 19:16:24 +01:00
Britta Weber dbef64009f [DOC] add doc for multi term vector api
closes #3998
2013-11-26 17:03:14 +01:00
Alexander Reelsen bf74f49fdd Updated Analyzing/Fuzzysuggester from lucene trunk
* Minor alignments (like setter to ctor)
* FuzzySuggester has a unicode aware flag, which is not exposed in the fuzzy completion request parameters
* Made XAnalyzingSuggester flags (PAYLOAD_SEP, END_BYTE, SEP_LABEL) to be written into the postings format, so we can retain backwards compatibility
* The above change also implies, that these flags can be set per instantiated XAnalyzingSuggester
* CompletionPostingsFormatTest now uses a randomProvider for writing data to check for bwc
2013-11-26 12:52:06 +01:00
Martijn van Groningen a03556daa0 Added execution option to `range` filter, with the `index` and `fielddata` as values.
Deprecated `numeric_range` filter in favor for the `range` filter with `fielddata` as execution.

Closes #4034
2013-11-25 23:43:40 +01:00
uboness c7f6c5266d initial commit of the aggregations module
Closes #3300
2013-11-24 03:13:08 -08:00
Jun Ohtani 7bbe453273 [DOCS] Added elasticsearch-extended-analyze plugin 2013-11-21 09:48:00 +01:00
Clinton Gormley 7c59ed4087 [DOCS] Fixed duplicate docs ID in delete 2013-11-21 17:38:51 +11:00
Shay Banon a9880dcbf1 add timeout doc to delete 2013-11-20 12:50:03 -08:00
Matt Weber a841a422f6 Add a field data based TermsFilter
Add FieldDataTermsFilter that compares terms out of
the fielddata cache. When filtering on a large
set of terms this filter can be considerably faster
than using a standard lucene terms filter.

Add the "fielddata" execution mode to the
terms filter parser to enable the use of
the new FieldDataTermsFilter.

Add supporting tests and documentation.

Closes #4209
2013-11-19 19:18:16 +01:00
Andrew Raines 8fabeb1c0b First pass at cat docs. 2013-11-14 21:37:02 -05:00
Andrew Raines 5c085c1204 Fix misspellings. 2013-11-14 20:10:36 -05:00
Luca Cavanna 0aaa39d00a Minor improvements to indices filter and query & updated docs
Slightly simplified indices filter and query parsers code
Trimmed down tests where possible
2013-11-14 17:25:34 +01:00
Olivier Favre fa80ca97b2 Indices query/filter skip parsing altogether for irrelevant indices when possible
Closes #2416
2013-11-14 17:24:49 +01:00
Igor Motov 510397aecd Initial implementation of Snapshot/Restore API
Closes #3826
2013-11-10 18:26:56 -05:00
Lee Hinman f7d5d1e5c9 [DOCS] Update store docs to indicate mmapfs is now the default on 64-bit Linux 2013-11-09 11:42:43 -07:00
Clinton Gormley 5af4e02d6c [DOCS] Fix link to statsd plugin
Fixes #4128
2013-11-08 20:29:51 +01:00
Clinton Gormley 7189310764 In ctor of GeoPointFieldMapper, geohash_prefix now implicitly enables geohash option
Also improved docs for geopoint type and geohash_cell filte

Closes #3951
2013-11-08 13:52:17 +01:00
Clinton Gormley b27976fbed [DOCS] Fixed the fielddata regex example on core mapping 2013-11-07 17:09:18 +01:00
Clinton Gormley 3465e69e83 [DOCS] Changed all store:yes/no to store:true/false
which is how this setting is stored internally
2013-11-07 16:57:18 +01:00
Simon Willnauer 77bc5d5ecf release [1.0.0.Beta1] 2013-11-06 15:32:43 +01:00
Simon Willnauer 9654631186 Change 'standart' analyzer to use emtpy stopword list by default.
The 'default' / 'standard' analyzer can be a trappy default sicne it filters
english stopwords by default. Yet a default should not be dedicated to a certain language
since elasticsearch is used in many different scenarios where a standard analysis chain
with specialization to english full-text might be rather counter productive.

This commit changes the 'standard' analyzer to use an empty stopword list for indices
that are created from 1.0.0.Beta1 version onwards but will maintain backwards compatibiliy
for older indices.

Closes #3775
2013-11-05 21:07:21 +01:00
Shay Banon 7c32269f4f Dist. Percolation: Use .percolator instead of _percolator for type name
Use .percolator as the internal (hidden) type name for percolators within the index. Seems nicer name to represent "hidden" types within an index.
closes #4090
2013-11-05 20:02:59 +01:00
Boaz Leskes a9fdcadf01 [DOCS] Added documentation for the keep word token filter 2013-11-04 18:38:44 +01:00
Clinton Gormley 356de95840 Added simplified range syntax to query string docs 2013-11-04 18:18:36 +01:00
Ben McCann 46edfc484a [DOCS] Add some documentation about the performance of `_source` usage in scripts. 2013-11-04 11:05:55 +01:00
Igor Motov c724f0de5d Initial implementation of ResourceWatcherService
Closes #4062
2013-11-03 21:55:54 -05:00
Dan Everton 6df60b7271 [DOC] Improve documentation on search stats groups
Document the ability to return all search statistics groups and provide examples of returning search statistics for groups.
2013-11-01 13:53:39 +01:00
Martijn van Groningen 30ab6f841d [DOCS] Fixed percolate docs errors 2013-11-01 11:44:07 +01:00
Clinton Gormley 4206cc988e [DOCS] Typo on shingle tokenfilter 2013-10-31 20:18:00 +01:00
Alexander Reelsen dfcb3ca2d4 RegexpQueryBuilder now implements MultiTermQueryBuilder
This allows the RegexpQueryBuilder to be used in span queries

Added tests for all span multi term queries.
Also updated the documentation and removed mentioning of numeric range
queries for span queries (they have to be terms).

Closes #3392
2013-10-31 09:12:57 +01:00
Boaz Leskes 8819f91d47 Add a GetFieldMapping API
This new API allows to get the mapping for a specific set of fields rather than get the whole index mapping and traverse it.
The fields to be retrieved can be specified by their full path, index name and field name and will be resolved in this order.
In case multiple field match, the first one will be returned.

Since we are now generating the output (rather then fall back to the stored mapping), you can specify `include_defaults`=true on the request to have default values returned.

Closes #3941
2013-10-30 16:16:36 +01:00
Clinton Gormley 8b2efd4849 [DOCS] Added a version flag to percolation 2013-10-30 13:59:03 +01:00
Clinton Gormley 0585890a5f [DOCS] Fixed a typo 2013-10-30 13:57:18 +01:00
Alexander Reelsen 2ec9742147 [DOCS] Extending setup as a service documentation
* Tell people to use ES_JAVA_OPTS for es.node.name or similar parameters
* Showing a simple way to install Oracle JDK on ubuntu/debian

Closes #3999
2013-10-29 13:58:06 +01:00
David Pilato 5d90abf701 mget API should support global routing parameter
mget API support `_routing` field but not `routing` parameter.

Reproduction here:

```sh
curl -XDELETE "http://localhost:9200/test/"; echo
curl -XPUT "http://localhost:9200/test/" -d'{
   "settings": {
      "number_of_replicas": 0,
      "number_of_shards": 5
   }
}'; echo

curl -XPUT 'http://localhost:9200/test/order/1-1?routing=key1' -d '{
   "productName":"doc 1"
}'; echo
curl -XPUT 'http://localhost:9200/test/order/1-2?routing=key1' -d '{
   "productName":"doc 2"
}'; echo
curl -XPUT 'http://localhost:9200/test/order/1-3?routing=key1&refresh=true' -d '{
   "productName":"doc 3"
}'; echo

curl -XPOST 'http://localhost:9200/test/order/_mget?pretty' -d '{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "order",
            "_id" : "1-1",
            "_routing" : "key1"
        },
        {
            "_index" : "test",
            "_type" : "order",
            "_id" : "1-2",
            "_routing" : "key1"
        },
        {
            "_index" : "test",
            "_type" : "order",
            "_id" : "1-3",
            "_routing" : "key1"
        }
    ]
}'; echo

curl -XPOST 'http://localhost:9200/test/order/_mget?pretty&routing=key1' -d '{
	"ids": [
		"1-1",
		"1-2",
		"1-3"
	]
}'; echo
```

Closes #3996.
2013-10-28 21:05:55 +01:00
Britta Weber c9dab6991e rename and document "index.mapping.date.parse_upper_inclusive" setting for date fields
The setting causes the upper bound for a range query/filter to be rounded up,
therefore the name `round_ceil` seems to make more sense.

Also this commit removes the redundant fourth parameter to DateMathParser.parse(..)
which was never used.
was:    parse(String text, long now, boolean roundUp, boolean upperInclusive)
is now: parse(String text, long now, boolean roundCeil)

closes #3914
2013-10-28 15:48:31 +01:00
Ben McCann cc4bc7d57d Fix nonsensical sentence in standard analyzer documentation so that it is more understandable 2013-10-25 00:18:32 +02:00
Luca Cavanna 48ac9747a8 Added third highlighter type based on lucene postings highlighter
Requires field index_options set to "offsets" in order to store positions and offsets in the postings list.
Considerably faster than the plain highlighter since it doesn't require to reanalyze the text to be highlighted: the larger the documents the better the performance gain should be.
Requires less disk space than term_vectors, needed for the fast_vector_highlighter.
Breaks the text into sentences and highlights them. Uses a BreakIterator to find sentences in the text. Plays really well with natural text, not quite the same if the text contains html markup for instance.
Treats the document as the whole corpus, and scores individual sentences as if they were documents in this corpus, using the BM25 algorithm.

Uses forked version of lucene postings highlighter to support:
- per value discrete highlighting for fields that have multiple values, needed when number_of_fragments=0 since we want to return a snippet per value
- manually passing in query terms to avoid calling extract terms multiple times, since we use a different highlighter instance per doc/field, but the query is always the same

The lucene postings highlighter api is  quite different compared to the existing highlighters api, the main difference being that it allows to highlight multiple fields in multiple docs with a single call, ensuring sequential IO.
The way it is introduced in elasticsearch in this first round is a compromise trying not to change the current highlight api, which works per document, per field. The main disadvantage is that we lose the sequential IO, but we can always refactor the highlight api to work with multiple documents.

Supports pre_tag, post_tag, number_of_fragments (0 highlights the whole field), require_field_match, no_match_size, order by score and html encoding.

Closes #3704
2013-10-24 23:38:00 +02:00
Luca Cavanna e981e411d7 [DOCS] rephrased docs for highlight no_match_size parameter
(removed 0.90.6 coming tag as it's needed only in 0.90 branch)
2013-10-24 14:38:32 +02:00
Nik Everett 14a709f563 Highlighting can return excerpt with no highlights
You can configure the highlighting api to return an excerpt of a field
even if there wasn't a match on the field.

The FVH makes excerpts from the beginning of the string to the first
boundary character after the requested length or the boundary_max_scan,
whichever comes first.  The Plain highlighter makes excerpts from the
beginning of the string to the end of the last token before the requested
length.

Closes #1171
2013-10-24 14:38:32 +02:00
Boaz Leskes 0e6e6f97dc Merge pull request #3940 from rboulton/patch-1
[Docs] Clean up wording in cluster health api doc
2013-10-22 04:09:13 -07:00
Markus Fischer 782d315da3 Fix markup 2013-10-21 16:11:09 +02:00
Richard Boulton b62cc7c716 Clean up wording to reduce confusion
The description of the timeout parameter was worded misleadingly; it implied that the API would wait until the cluster reached the desired level and then stayed at that level for the timeout. I've tweaked the sentence to remove the risk of confusion.
2013-10-21 12:37:50 +01:00
Clinton Gormley b2d82d7e75 [DOCS] Reorganised the highlight_query docs and added a version flag 2013-10-18 18:03:31 +02:00
Matt Weber 1e0a834c68 Document strict dynamic type mapping. 2013-10-18 08:29:31 -07:00
Nik Everett 60550e4cc2 phrase_len is not called phrase_length 2013-10-18 09:29:53 -04:00
Clinton Gormley adf0c8424b [DOCS] How to check max_file_descriptors 2013-10-17 11:54:36 +02:00
Martijn van Groningen b7c4adeea3 [Docs] update reference to remove documentation about percolating during an index, bulk or update request. 2013-10-16 16:31:36 +02:00
Martijn van Groningen 1d0841e2b8 Added initial documentation for the redesigned percolator. 2013-10-16 14:12:19 +02:00
Boaz Leskes 18e12ef66c [Docs] updated refrences to dynamic_date_formats 2013-10-16 12:04:31 +02:00
Boaz Leskes 57b2d45142 [Docs] added document for the lenient option in match queries 2013-10-16 10:53:25 +02:00
Alexander Reelsen 4d19239ec4 Add support for Lucene SuggestStopFilter
The suggest stop filter is an improved version of the stop filter, which
takes stopwords only into account if the last char of a query is a
whitespace. This allows you to keep stopwords, but to allow suggesting for
"a".

Example: Index document content "a word". You are now able to suggest for
"a" and get back results in the completion suggester, if the suggest stop
filter is used on the query side, but will not get back any results for
"a " as this is identified as a stopword.

The implementation allows to set the `remove_trailing` parameter for a
custom stop filter and thus use the suggest stop filter instead of the
standard stop filter.
2013-10-15 16:12:02 +02:00
Clinton Gormley 870346070e [DOCS] Added compound_on_flush docs and updated compound_format
docs to include note about accepting a float
2013-10-15 13:30:56 +02:00
Clinton Gormley d67331b554 [DOCS] Added script.disable_dynamic to the scripting page 2013-10-15 12:25:07 +02:00
steve mayzak 48656fd1ed removed a duplicate paragraphin config docs 2013-10-14 15:33:56 -07:00
Britta Weber 34441f3897 fix naming in function_score
- "boost" should be "boost_factor"
    - "mult" should be "multiply"

Also, store combine function names in ImmutableMap instead of iterating
over all possible names each time.

closes #3872 for master
2013-10-14 14:56:59 +02:00
Simon Willnauer 25d6f04f13 [DOCS] Note that cutoff_frequency doesn't handle stacked tokens gracefully 2013-10-14 14:09:38 +02:00
Britta Weber c3ab79a10e [DOCS] Add doc for delimited payload token filter 2013-10-14 13:41:35 +02:00
Clinton Gormley 9a062e465c [DOCS] Reorganised common API conventions 2013-10-13 16:46:56 +02:00
Clinton Gormley 4316b13880 [DOCS] Render common options on the same page 2013-10-13 14:14:50 +02:00
Shay Banon 420b3396f4 Set queue sizes by default on bulk/index thread pools
Now that we properly fixed the ability to set the queue size on the index / bulk thread pool, we should actually set them to a somehow reasonable value to protect from users potentially overflowing our system.

I suggest defaults to be 50 for bulk, and 200 for indexing.

Also, set the thread pool for get, which we should set (in a similar value to a "read" queue size we have today).
closes #3888
2013-10-12 21:51:37 +02:00
Subhash Gopalakrishnan b758b76da4 Support year units in date math expressions
According to http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-date-format.html, the date math expressions support M (month), w (week), h (hour), m (minute), and s (second) units. Why years are not supported? Please add support for year units.

Closes #3828.
Closes #3874.
2013-10-11 09:24:52 +02:00
Clinton Gormley 8462f88c39 [DOCS] Added more specific versions to the suggesters 2013-10-10 20:59:12 +02:00
Adrien Grand f2d75654bf Add clear warnings that only the default codec, postings format and doc values format have backward compatibility warranties. 2013-10-10 13:30:08 +02:00
Clinton Gormley ba1b4886e3 [DOCS] Moved "named filters/queries" up one level 2013-10-10 11:23:08 +02:00
Adrien Grand 4fa8f6f61f Doc values integration.
This commit allows for using Lucene doc values as a backend for field data,
moving the cost of building field data from the refresh operation to indexing.
In addition, Lucene doc values can be stored on disk (partially, or even
entirely), so that memory management is done at the operating system level
(file-system cache) instead of the JVM, avoiding long pauses during major
collections due to large heaps.

So far doc values are supported on numeric types and non-analyzed strings
(index:no or index:not_analyzed). Under the hood, it uses SORTED_SET doc values
which is the only type to support multi-valued fields. Since the field data API
set is a bit wider than the doc values API set, some operations are not
supported:
 - field data filtering: this will fail if doc values are enabled,
 - field data cache clearing, even for memory-based doc values formats,
 - getting the memory usage for a specific field,
 - knowing whether a field is actually multi-valued.

This commit also allows for configuring doc-values formats on a per-field basis
similarly to postings formats. In particular the doc values format of the
_version field can be configured through its own field mapper (it used to be
handled in UidFieldMapper previously).

Closes #3806
2013-10-09 16:34:30 +02:00
Lee Hinman dede6ee874 Remove extra 'processors' anchor in threadpool docs 2013-10-09 01:56:49 -06:00
Adrien Grand 97958ed02a Improved warm-up of new segments.
* Merged segments are now warmed-up at the end of the merge operation instead
  of _refresh, so that _refresh doesn't pay the price for the warm-up of merged
  segments, which is often higher than flushed segments because of their size.
* Even when no _warmer is registered, some basic warm-up of the segments is
  performed: norms, doc values (_version). This should help a bit people who
  forget to register warmers.
* Eager loading support for the parent id cache and field data: when one
  can't predict what terms will be present in the index, it is tempting to use
  a match_all query in a warmer, but in that case, query execution might not be
  much faster than field data loading so having a warmer that only loads field
  data without running a query can be useful.

Closes #3819
2013-10-08 23:06:55 +02:00
Clinton Gormley 264a00a40f [DOCS] Added pages explaining lucene query parser syntax and regular expression syntax 2013-10-07 14:42:49 +02:00
Clinton Gormley 7a53d41446 [DOCS] Changed capitalization of operator in rescore query 2013-10-05 17:18:15 +02:00
Clinton Gormley d062409309 [DOCS] Removed enable_position_increments in stop filter 2013-10-05 17:06:13 +02:00
Clinton Gormley ea05f4538c [DOCS] Updated ICU-Plugin docs from the repo README 2013-10-05 16:31:52 +02:00
Luca Cavanna b0fee6c01b Changed nested filter example to use an inner bool filter instead of a bool query, to demonstrate the usage of a filter rather than a query. 2013-10-04 14:08:37 +02:00
Clinton Gormley e53a26ff21 [DOCS] Fixed a typo in indices.get_templates 2013-10-03 11:40:29 +02:00
uboness f3c6108b71 introduced support for "shard_size" for terms & terms_stats facets. The "shard_size" is the number of term entries each shard will send back to the coordinating node. "shard_size" > "size" will increase the accuracy (both in terms of the counts associated with each term and the terms that will actually be returned the user) - of course, the higher "shard_size" is, the more expensive the processing becomes as bigger queues are maintained on a shard level and larger lists are streamed back from the shards.
closes #3821
2013-10-02 22:02:00 +02:00
Nik Everett 6b000d8c6d Support specifing score query on highlight.
This is useful if you want to highlight terms not in the search query or
you want sort highlighted snippets based on another query.

Closes #3630
2013-10-02 15:46:24 -04:00
Lee Hinman ba40aa374e Uniquify anchor links to fix asciidoc/docbook generation 2013-09-30 15:32:00 -06:00
Lee Hinman 0442b737be Add more anchor links to documentation
Related to #3679
2013-09-30 13:13:16 -06:00
Alexander Reelsen c63869b0be Documentation: Removed service wrapper, added rpm/deb package information 2013-09-26 14:30:25 +02:00
gtt116 6304d58e36 Remove a comma in doc to make example a valid json.
This will help reader to do a hurry up copy-paste test.
2013-09-24 15:23:23 +08:00
Costin Leau 3685a22e4a add docs on new service.bat facility 2013-09-23 18:24:31 +03:00
Martijn van Groningen d365a4ccba Added nested filter join option to the docs.
Closes #3738
2013-09-20 21:22:56 +02:00
Shay Banon 359d14ddc5 doc processors setting 2013-09-20 14:55:35 +02:00
Shay Banon 29c0f27a9e fix thread pool docs to remove blocking 2013-09-20 12:31:17 +02:00
Adrien Grand 90524d7ad2 Fix formatting of the documentation.
Remaining '@'s have been replaced with '`'s.
2013-09-18 12:35:44 +02:00
Britta Weber b7c3b50909 add date field to decay function doc 2013-09-17 19:54:31 +02:00
David Pilato 1e3ffa0df7 Add distance supported units 2013-09-17 14:21:45 +02:00
Clinton Gormley 85bba668f7 [DOCS] Tidied up various doc formatting errors 2013-09-16 16:13:01 +02:00
Clinton Gormley c2eb4a1c40 [DOCS] Tidied up function score 2013-09-16 15:57:08 +02:00
Clinton Gormley 422eed7985 [Docs] Added an added[0.90.4] flag to the disk based allocator 2013-09-16 15:57:07 +02:00
Simon Willnauer 85fcefc60d Allow include / exclude of completion stats via REST parameters
Stats can be retrieved on a per-feature / per-component  basis including the fields
they apply to. This commit add support for a 'completion' flag to include statistics
for the complition feature as well as 'completion_fields' to only
include certain fields into the returned statistics.
To disambiguate between 'fielddata' and 'completion' fields this commit
uses 'fields' as the default inclusion filter for stats fields only used
if not dedicated '[completion|fielddata]_fields' paramter is provided.

Relates to #3522
2013-09-16 11:28:32 +02:00
Martijn van Groningen f6f4b5014f Added docs for named queries.
Relates to #3581
2013-09-16 11:17:01 +02:00
Shay Banon 20745adadd Add dedicated Suggest Thread Pool
Add a dedicated suggest thread pool for the suggest API. With the new completion suggest type, which is purely CPU bounded, it makes more sense to have a dedicated thread pool for suggest compared to having it share the search thread pool and "competing" against other search operations.
closes #3698
2013-09-15 01:54:27 +02:00
Shay Banon df3f681ef0 Optimize API: Remove refresh flag
Refresh flag in optimize is problematic, since the shards refresh is allowed to execute on is different compared to the optimize shards. In order to do optimize and then refresh, they should be executed as separate APIs when needed.
closes #3690
2013-09-13 21:44:38 +02:00
Shay Banon 7cc48c8e87 Flush API: remove refresh flag
Refresh flag in flush is problematic, since the shards refresh is allowed to execute on is different compared to the flush shards. In order to do flush and then refresh, they should be executed as separate APIs when needed.
closes #3689
2013-09-13 21:09:45 +02:00
David Pilato ea4988e9dc Support for REST get ALL templates.
/_template shows: No handler found for uri [/_template] and method [GET]

It would make sense to list the templates as they are listed in the /_cluster/state call.

Closes #2532.
2013-09-13 15:08:59 +02:00
Clinton Gormley d6ecdecc19 [DOCS] Deprecated the from/to/include_lower/include_upper params
in the range query, range filter and numeric range filter.
Better to use gt/gte/lt/lte as they are explicit.
2013-09-12 15:07:36 +02:00
David Pilato 169cd007b5 Fix typo
Thanks to @ybonnel for finding it ;-)
2013-09-12 11:00:59 +02:00
Martijn van Groningen 8ddb809f98 If all scroll ids should be removed then the `_all` value should be used instead of not specifying any scroll ids. 2013-09-12 10:41:38 +02:00
Martijn van Groningen 0efa78710b Added clear scroll api.
The clear scroll api allows clear all resources associated with a `scroll_id` by deleting the `scroll_id` and its associated SearchContext.

Closes #3657
2013-09-10 21:17:34 +02:00
David Pilato fafc4eef98 Plugin Manager: add silent mode.
Now with have proper exit codes for elasticsearch plugin manager (see #3463), we can add a silent mode to plugin manager.

```sh
bin/plugin --install karmi/elasticsearch-paramedic --silent
```

Closes #3628.
2013-09-10 18:31:35 +02:00
David Pilato 764aa54f2d Plugin Manager should support -remove group/artifact/version naming
When installing a plugin, we use:

```sh
bin/plugin --install groupid/artifactid/version
```

But when removing the plugin, we only support:

```sh
bin/plugin --remove dirname
```

where `dirname` is the directory name of the plugin under `/plugins` dir.

Closes #3421.
2013-09-09 21:17:16 +02:00
Brad Fritz f3c0e39380 key is "index.store.type", not "index.storage.type" 2013-09-09 13:06:09 -04:00
Lee Hinman 7d52d58747 Add AllocationDecider that takes free disk space into account
This commit adds two main pieces, the first is a ClusterInfoService
that provides a service running on the master nodes that fetches the
total/free bytes for each data node in the cluster as well as the
sizes of all shards in the cluster. This information is gathered by
default every 30 seconds, and can be changed dynamically by setting
the `cluster.info.update.interval` setting. This ClusterInfoService
can hopefully be used in the future to weight nodes for allocation
based on their disk usage, if desired.

The second main piece is the DiskThresholdDecider, which can disallow
a shard from being allocated to a node, or from remaining on the node
depending on configuration parameters. There are three main
configuration parameters for the DiskThresholdDecider:

`cluster.routing.allocation.disk.threshold_enabled` controls whether
the decider is enabled. It defaults to false (disabled). Note that the
decider is also disabled for clusters with only a single data node.

`cluster.routing.allocation.disk.watermark.low` controls the low
watermark for disk usage. It defaults to 0.70, meaning ES will not
allocate new shards to nodes once they have more than 70% disk
used. It can also be set to an absolute byte value (like 500mb) to
prevent ES from allocating shards if less than the configured amount
of space is available.

`cluster.routing.allocation.disk.watermark.high` controls the high
watermark. It defaults to 0.85, meaning ES will attempt to relocate
shards to another node if the node disk usage rises above 85%. It can
also be set to an absolute byte value (similar to the low watermark)
to relocate shards once less than the configured amount of space is
available on the node.

Closes #3480
2013-09-09 09:49:30 -06:00
Clinton Gormley 9e6d30a14a [DOCS] Changed the deprecation of custom_boost/score/filters_score
queries to 0.90.4
2013-09-05 12:14:10 +02:00
Clinton Gormley 2b3a762c27 [DOCS] Function score was added in 0.90.4 not 1.00.Beta 2013-09-05 11:25:06 +02:00
Clinton Gormley 8257aba166 [DOCS] Fixed fielddata regex syntax 2013-09-04 23:20:56 +02:00
Clinton Gormley 6d667e5d41 [DOCS] Missing sort values now works for all field types 2013-09-04 23:20:55 +02:00
Clinton Gormley 765bd026f5 [DOCS] Added function score query 2013-09-04 23:20:55 +02:00
Clinton Gormley aa59ef2e84 [DOCS] Added the human flag 2013-09-04 23:20:55 +02:00
Clinton Gormley 9d0dd545cb [DOCS] Tidied up the plugins page and added Graphite and Statsd 2013-09-04 23:20:55 +02:00
Clinton Gormley e1c6f45ff0 [DOCS] Added clarification about global scope in facets 2013-09-04 23:20:55 +02:00
Clinton Gormley 08f8e77b8f [DOCS] Added fuzzy options to completion suggester 2013-09-04 23:20:55 +02:00
Clinton Gormley 047c86e3b2 [DOCS] Added wildcard template matching 2013-09-04 23:20:55 +02:00
Clinton Gormley 9f5d0b6e89 [DOCS] Added a few clarifications to the docs from the issues list 2013-09-04 23:20:55 +02:00
Clinton Gormley 94be785726 [DOCS] Added multi-index open/close 2013-09-04 23:20:55 +02:00
Clinton Gormley 5b60506b2e [DOCS] Added highlighting to the phrase suggester 2013-09-04 23:20:54 +02:00
Clinton Gormley 53ad7330fc [DOCS] Added docs for term vectors 2013-09-04 23:20:54 +02:00
Clinton Gormley eac2b3a52e [DOCS] Fixed typo 2013-09-04 23:20:54 +02:00
Clinton Gormley 393c28bee4 [DOCS] Removed outdated new/deprecated version notices 2013-09-03 21:28:31 +02:00
Simon Willnauer eb2fed85f1 Add 'min_input_len' to completion suggester
Restrict the size of the input length to a reasonable size otherwise very
long strings can cause StackOverflowExceptions deep down in lucene land.
Yet, this is simply a saftly limit set to `50` UTF-16 codepoints by default.
This limit is only present at index time and not at query time. If prefix
completions > 50 UTF-16 codepoints are expected / desired this limit should be raised.
Critical string sizes are beyone the 1k UTF-16 Codepoints limit.

Closes #3596
2013-09-03 10:26:37 +02:00
Boaz Leskes e807c99f27 Fixed a typo in the config of light finnish stemmer (old last_finish is still supported for backward compatibility)
Closes #3594
2013-08-29 10:15:40 +02:00
Clinton Gormley 822043347e Migrated documentation into the main repo 2013-08-29 01:24:34 +02:00