OpenSearch

Commit Graph

Author	SHA1	Message	Date
Simon Willnauer	9654631186	Change 'standard' analyzer to use an empty stopword list by default. The 'default' / 'standard' analyzer can be a trappy default since it filters English stopwords by default. Yet a default should not be dedicated to a certain language, since elasticsearch is used in many different scenarios where a standard analysis chain specialized for English full-text might be rather counterproductive. This commit changes the 'standard' analyzer to use an empty stopword list for indices that are created from version 1.0.0.Beta1 onwards, but will maintain backwards compatibility for older indices. Closes #3775	2013-11-05 21:07:21 +01:00
Shay Banon	7c32269f4f	Dist. Percolation: Use .percolator instead of _percolator for type name Use .percolator as the internal (hidden) type name for percolators within the index. It seems a nicer name to represent "hidden" types within an index. closes #4090	2013-11-05 20:02:59 +01:00
Boaz Leskes	a9fdcadf01	[DOCS] Added documentation for the keep word token filter	2013-11-04 18:38:44 +01:00
Clinton Gormley	356de95840	Added simplified range syntax to query string docs	2013-11-04 18:18:36 +01:00
Ben McCann	46edfc484a	[DOCS] Add some documentation about the performance of `_source` usage in scripts.	2013-11-04 11:05:55 +01:00
Igor Motov	c724f0de5d	Initial implementation of ResourceWatcherService Closes #4062	2013-11-03 21:55:54 -05:00
Dan Everton	6df60b7271	[DOC] Improve documentation on search stats groups Document the ability to return all search statistics groups and provide examples of returning search statistics for groups.	2013-11-01 13:53:39 +01:00
Martijn van Groningen	30ab6f841d	[DOCS] Fixed percolate docs errors	2013-11-01 11:44:07 +01:00
Clinton Gormley	4206cc988e	[DOCS] Typo on shingle tokenfilter	2013-10-31 20:18:00 +01:00
Alexander Reelsen	dfcb3ca2d4	RegexpQueryBuilder now implements MultiTermQueryBuilder This allows the RegexpQueryBuilder to be used in span queries. Added tests for all span multi term queries. Also updated the documentation and removed the mention of numeric range queries for span queries (they have to be terms). Closes #3392	2013-10-31 09:12:57 +01:00
Boaz Leskes	8819f91d47	Add a GetFieldMapping API This new API allows getting the mapping for a specific set of fields rather than getting the whole index mapping and traversing it. The fields to be retrieved can be specified by their full path, index name and field name and will be resolved in this order. In case multiple fields match, the first one will be returned. Since we are now generating the output (rather than falling back to the stored mapping), you can specify `include_defaults`=true on the request to have default values returned. Closes #3941	2013-10-30 16:16:36 +01:00
Clinton Gormley	8b2efd4849	[DOCS] Added a version flag to percolation	2013-10-30 13:59:03 +01:00
Clinton Gormley	0585890a5f	[DOCS] Fixed a typo	2013-10-30 13:57:18 +01:00
Alexander Reelsen	2ec9742147	[DOCS] Extending setup as a service documentation * Tell people to use ES_JAVA_OPTS for es.node.name or similar parameters * Showing a simple way to install Oracle JDK on Ubuntu/Debian Closes #3999	2013-10-29 13:58:06 +01:00
David Pilato	5d90abf701	mget API should support global routing parameter. The mget API supports the `_routing` field but not the `routing` parameter. Reproduction here: ```sh curl -XDELETE "http://localhost:9200/test/"; echo curl -XPUT "http://localhost:9200/test/" -d'{ "settings": { "number_of_replicas": 0, "number_of_shards": 5 } }'; echo curl -XPUT 'http://localhost:9200/test/order/1-1?routing=key1' -d '{ "productName":"doc 1" }'; echo curl -XPUT 'http://localhost:9200/test/order/1-2?routing=key1' -d '{ "productName":"doc 2" }'; echo curl -XPUT 'http://localhost:9200/test/order/1-3?routing=key1&refresh=true' -d '{ "productName":"doc 3" }'; echo curl -XPOST 'http://localhost:9200/test/order/_mget?pretty' -d '{ "docs" : [ { "_index" : "test", "_type" : "order", "_id" : "1-1", "_routing" : "key1" }, { "_index" : "test", "_type" : "order", "_id" : "1-2", "_routing" : "key1" }, { "_index" : "test", "_type" : "order", "_id" : "1-3", "_routing" : "key1" } ] }'; echo curl -XPOST 'http://localhost:9200/test/order/_mget?pretty&routing=key1' -d '{ "ids": [ "1-1", "1-2", "1-3" ] }'; echo ``` Closes #3996.	2013-10-28 21:05:55 +01:00
Britta Weber	c9dab6991e	rename and document "index.mapping.date.parse_upper_inclusive" setting for date fields The setting causes the upper bound for a range query/filter to be rounded up, therefore the name `round_ceil` seems to make more sense. Also this commit removes the redundant fourth parameter to DateMathParser.parse(..) which was never used. was: parse(String text, long now, boolean roundUp, boolean upperInclusive) is now: parse(String text, long now, boolean roundCeil) closes #3914	2013-10-28 15:48:31 +01:00
Ben McCann	cc4bc7d57d	Fix nonsensical sentence in standard analyzer documentation so that it is more understandable	2013-10-25 00:18:32 +02:00
Luca Cavanna	48ac9747a8	Added third highlighter type based on lucene postings highlighter Requires field index_options set to "offsets" in order to store positions and offsets in the postings list. Considerably faster than the plain highlighter since it doesn't require reanalyzing the text to be highlighted: the larger the documents, the better the performance gain should be. Requires less disk space than term_vectors, which are needed for the fast_vector_highlighter. Breaks the text into sentences and highlights them. Uses a BreakIterator to find sentences in the text. Works really well with natural text, but not as well if the text contains HTML markup, for instance. Treats the document as the whole corpus, and scores individual sentences as if they were documents in this corpus, using the BM25 algorithm. Uses a forked version of the lucene postings highlighter to support: - per-value discrete highlighting for fields that have multiple values, needed when number_of_fragments=0 since we want to return a snippet per value - manually passing in query terms to avoid calling extract terms multiple times, since we use a different highlighter instance per doc/field, but the query is always the same The lucene postings highlighter api is quite different compared to the existing highlighters api, the main difference being that it allows highlighting multiple fields in multiple docs with a single call, ensuring sequential IO. The way it is introduced in elasticsearch in this first round is a compromise that tries not to change the current highlight api, which works per document, per field. The main disadvantage is that we lose the sequential IO, but we can always refactor the highlight api to work with multiple documents. Supports pre_tag, post_tag, number_of_fragments (0 highlights the whole field), require_field_match, no_match_size, order by score and html encoding. Closes #3704	2013-10-24 23:38:00 +02:00
Luca Cavanna	e981e411d7	[DOCS] rephrased docs for highlight no_match_size parameter (removed 0.90.6 coming tag as it's needed only in 0.90 branch)	2013-10-24 14:38:32 +02:00
Nik Everett	14a709f563	Highlighting can return excerpt with no highlights You can configure the highlighting api to return an excerpt of a field even if there wasn't a match on the field. The FVH makes excerpts from the beginning of the string to the first boundary character after the requested length or the boundary_max_scan, whichever comes first. The Plain highlighter makes excerpts from the beginning of the string to the end of the last token before the requested length. Closes #1171	2013-10-24 14:38:32 +02:00
Boaz Leskes	0e6e6f97dc	Merge pull request #3940 from rboulton/patch-1 [Docs] Clean up wording in cluster health api doc	2013-10-22 04:09:13 -07:00
Markus Fischer	782d315da3	Fix markup	2013-10-21 16:11:09 +02:00
Richard Boulton	b62cc7c716	Clean up wording to reduce confusion The description of the timeout parameter was worded misleadingly; it implied that the API would wait until the cluster reached the desired level and then stayed at that level for the timeout. I've tweaked the sentence to remove the risk of confusion.	2013-10-21 12:37:50 +01:00
Clinton Gormley	b2d82d7e75	[DOCS] Reorganised the highlight_query docs and added a version flag	2013-10-18 18:03:31 +02:00
Matt Weber	1e0a834c68	Document strict dynamic type mapping.	2013-10-18 08:29:31 -07:00
Nik Everett	60550e4cc2	phrase_len is not called phrase_length	2013-10-18 09:29:53 -04:00
Clinton Gormley	adf0c8424b	[DOCS] How to check max_file_descriptors	2013-10-17 11:54:36 +02:00
Martijn van Groningen	b7c4adeea3	[Docs] update reference to remove documentation about percolating during an index, bulk or update request.	2013-10-16 16:31:36 +02:00
Martijn van Groningen	1d0841e2b8	Added initial documentation for the redesigned percolator.	2013-10-16 14:12:19 +02:00
Boaz Leskes	18e12ef66c	[Docs] updated references to dynamic_date_formats	2013-10-16 12:04:31 +02:00
Boaz Leskes	57b2d45142	[Docs] added document for the lenient option in match queries	2013-10-16 10:53:25 +02:00
Alexander Reelsen	4d19239ec4	Add support for Lucene SuggestStopFilter The suggest stop filter is an improved version of the stop filter, which only removes a stopword at the end of a query if it is followed by whitespace. This allows you to keep stopwords while still allowing suggestions for "a". Example: Index document content "a word". You are now able to suggest for "a" and get back results in the completion suggester, if the suggest stop filter is used on the query side, but will not get back any results for "a " as this is identified as a stopword. The implementation allows setting the `remove_trailing` parameter for a custom stop filter and thus using the suggest stop filter instead of the standard stop filter.	2013-10-15 16:12:02 +02:00
Clinton Gormley	870346070e	[DOCS] Added compound_on_flush docs and updated compound_format docs to include note about accepting a float	2013-10-15 13:30:56 +02:00
Clinton Gormley	d67331b554	[DOCS] Added script.disable_dynamic to the scripting page	2013-10-15 12:25:07 +02:00
steve mayzak	48656fd1ed	removed a duplicate paragraph in config docs	2013-10-14 15:33:56 -07:00
Britta Weber	34441f3897	fix naming in function_score - "boost" should be "boost_factor" - "mult" should be "multiply" Also, store combine function names in ImmutableMap instead of iterating over all possible names each time. closes #3872 for master	2013-10-14 14:56:59 +02:00
Simon Willnauer	25d6f04f13	[DOCS] Note that cutoff_frequency doesn't handle stacked tokens gracefully	2013-10-14 14:09:38 +02:00
Britta Weber	c3ab79a10e	[DOCS] Add doc for delimited payload token filter	2013-10-14 13:41:35 +02:00
Clinton Gormley	9a062e465c	[DOCS] Reorganised common API conventions	2013-10-13 16:46:56 +02:00
Clinton Gormley	4316b13880	[DOCS] Render common options on the same page	2013-10-13 14:14:50 +02:00
Shay Banon	420b3396f4	Set queue sizes by default on bulk/index thread pools Now that we properly fixed the ability to set the queue size on the index / bulk thread pool, we should actually set them to a somewhat reasonable value to protect against users potentially overflowing our system. I suggest defaults of 50 for bulk and 200 for indexing. Also, set the queue size for the get thread pool, to a value similar to the "read" queue size we have today. closes #3888	2013-10-12 21:51:37 +02:00
Subhash Gopalakrishnan	b758b76da4	Support year units in date math expressions According to http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-date-format.html, the date math expressions support M (month), w (week), h (hour), m (minute), and s (second) units. Why are years not supported? Please add support for year units. Closes #3828. Closes #3874.	2013-10-11 09:24:52 +02:00
Clinton Gormley	8462f88c39	[DOCS] Added more specific versions to the suggesters	2013-10-10 20:59:12 +02:00
Adrien Grand	f2d75654bf	Add clear warnings that only the default codec, postings format and doc values format have backward compatibility guarantees.	2013-10-10 13:30:08 +02:00
Clinton Gormley	ba1b4886e3	[DOCS] Moved "named filters/queries" up one level	2013-10-10 11:23:08 +02:00
Adrien Grand	4fa8f6f61f	Doc values integration. This commit allows for using Lucene doc values as a backend for field data, moving the cost of building field data from the refresh operation to indexing. In addition, Lucene doc values can be stored on disk (partially, or even entirely), so that memory management is done at the operating system level (file-system cache) instead of the JVM, avoiding long pauses during major collections due to large heaps. So far doc values are supported on numeric types and non-analyzed strings (index:no or index:not_analyzed). Under the hood, it uses SORTED_SET doc values which is the only type to support multi-valued fields. Since the field data API set is a bit wider than the doc values API set, some operations are not supported: - field data filtering: this will fail if doc values are enabled, - field data cache clearing, even for memory-based doc values formats, - getting the memory usage for a specific field, - knowing whether a field is actually multi-valued. This commit also allows for configuring doc-values formats on a per-field basis similarly to postings formats. In particular the doc values format of the _version field can be configured through its own field mapper (it used to be handled in UidFieldMapper previously). Closes #3806	2013-10-09 16:34:30 +02:00
Lee Hinman	dede6ee874	Remove extra 'processors' anchor in threadpool docs	2013-10-09 01:56:49 -06:00
Adrien Grand	97958ed02a	Improved warm-up of new segments. * Merged segments are now warmed-up at the end of the merge operation instead of _refresh, so that _refresh doesn't pay the price for the warm-up of merged segments, which is often higher than for flushed segments because of their size. * Even when no _warmer is registered, some basic warm-up of the segments is performed: norms, doc values (_version). This should help people who forget to register warmers a bit. * Eager loading support for the parent id cache and field data: when one can't predict what terms will be present in the index, it is tempting to use a match_all query in a warmer, but in that case, query execution might not be much faster than field data loading, so having a warmer that only loads field data without running a query can be useful. Closes #3819	2013-10-08 23:06:55 +02:00
Clinton Gormley	264a00a40f	[DOCS] Added pages explaining lucene query parser syntax and regular expression syntax	2013-10-07 14:42:49 +02:00
Clinton Gormley	7a53d41446	[DOCS] Changed capitalization of operator in rescore query	2013-10-05 17:18:15 +02:00

103 Commits